Allow SHA1 instances to be reused to compute another hash value, and
resume caching them in ObjectInserter and PackParser. This shaves a
small amount of running time off parsing git.git's pack file:
  before   after
  ------   ------
  25.25s   25.55s
  25.48s   25.06s
  25.26s   24.94s
The difference is almost noise, but recycling the instances reduces some
stress on the memory allocator, which otherwise must find the two 80-word
message block arrays needed for hashing and collision detection.
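A minimal sketch of the reuse pattern, assuming JGit's
org.eclipse.jgit.util.sha1.SHA1 API and the reset() method this change adds:
  import org.eclipse.jgit.util.sha1.SHA1;
  class Sha1ReuseSketch {
    // Hash several buffers with one SHA1 instance instead of allocating a
    // new instance (and its message block arrays) for every value.
    static void hashAll(byte[][] inputs) {
      SHA1 md = SHA1.newInstance();
      for (byte[] buf : inputs) {
        md.update(buf, 0, buf.length);
        byte[] hash = md.digest(); // finish this hash value
        // ... use hash ...
        md.reset(); // recycle the instance for the next value (assumed API)
      }
    }
  }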
Change-Id: I4af88a720e81460293bc5c5d1d3db1a831e7e228
5e8e2179b2 (Update Jetty to 9.4.1.v20170120, 2017-01-26) updated Jetty in
the maven build.
Update the buck build to match so buck builds work again.
The buck build will go away soon, but in the meantime (until the bazel
build gets the same level of support) it is convenient as a faster way
of running tests than using maven.
The bazel build doesn't need this change since it doesn't build or run
http tests yet.
Change-Id: Ibbdaf2880e76b32fc9f6b5605a2ff29e3deffda2
Generate names for objects using only the pure Java SHA1
implementation, but continue using MessageDigest in tests.
This opens the possibility of changing the hashing function
to incorporate additional safety measures, such as those
used in sha1dc[1].
Since MessageDigest has higher throughput, continue using
MessageDigest for computing pack, idx and DirCache trailers.
These are less likely to be sensitive to SHAttered[2] types
of attacks, as Git uses them to detect random bit flips
during transfer, and not for content identity.
[1] https://github.com/cr-marcstevens/sha1collisiondetection
[2] https://shattered.it/
Change-Id: If6da98334201f7f20cb916e46f782c45f373784e
This implementation is derived straight from the description written
in RFC 3174. On Mac OS X with Java 1.8.0_91 it offers throughput similar
to MessageDigest SHA-1:
system 239.75 MiB/s
system 244.71 MiB/s
system 245.00 MiB/s
system 244.92 MiB/s
sha1 234.08 MiB/s
sha1 244.50 MiB/s
sha1 242.99 MiB/s
sha1 241.73 MiB/s
This is the fastest implementation I could come up with. Common SHA-1
implementation tricks such as unrolling loops create a method too large
for the JIT to effectively optimize, resulting in lower overall hashing
throughput. Using a preprocessor to perform the register renaming of A-E
also didn't help, as again the method was too large for the JIT to
effectively optimize.
Fortunately the fastest version is a naive, straightforward
implementation very close to the description in RFC 3174.
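As a rough illustration of that straightforward structure, here are the
round functions, constants, and block compression from RFC 3174 sections 5
and 6.1; this is a sketch, not JGit's actual code:
  class Sha1RoundSketch {
    // Round function f(t) per RFC 3174 section 5.
    static int f(int t, int b, int c, int d) {
      if (t < 20) return (b & c) | (~b & d);
      if (t < 40) return b ^ c ^ d;
      if (t < 60) return (b & c) | (b & d) | (c & d);
      return b ^ c ^ d;
    }
    // Round constant K(t) per RFC 3174 section 5.
    static int k(int t) {
      if (t < 20) return 0x5A827999;
      if (t < 40) return 0x6ED9EBA1;
      if (t < 60) return 0x8F1BBCDC;
      return 0xCA62C1D6;
    }
    // One compression of a 512-bit block (method 1 in RFC 3174 section 6.1);
    // w is the 80-word message schedule already expanded from the block.
    static void compress(int[] h, int[] w) {
      int a = h[0], b = h[1], c = h[2], d = h[3], e = h[4];
      for (int t = 0; t < 80; t++) {
        int tmp = Integer.rotateLeft(a, 5) + f(t, b, c, d) + e + w[t] + k(t);
        e = d;
        d = c;
        c = Integer.rotateLeft(b, 30);
        b = a;
        a = tmp;
      }
      h[0] += a; h[1] += b; h[2] += c; h[3] += d; h[4] += e;
    }
  }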
Change-Id: I228b05c4a294ca2ad51386cf0e47978c68e1aa42
Since the introduction of generic type parameter inference in Java 7,
it is no longer necessary to explicitly specify the type arguments of
generic instantiations.
Enable the warning in Eclipse, and fix all occurrences.
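For example, the redundant type arguments on the right-hand side can be
replaced with the diamond operator:
  import java.util.ArrayList;
  import java.util.HashMap;
  import java.util.List;
  import java.util.Map;
  class DiamondExample {
    // Before: type arguments repeated on the instantiation.
    Map<String, List<String>> explicit = new HashMap<String, List<String>>();
    // After: since Java 7 the compiler infers them from the declared type.
    Map<String, List<String>> inferred = new HashMap<>();
    List<String> names = new ArrayList<>();
  }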
Change-Id: I9158caf1beca5e4980b6240ac401f3868520aad0
Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>
FileRepository is in the package org.eclipse.jgit.internal, and is
thus non-API. This causes warnings in Eclipse when FileRepository is
used.
Add a filter to prevent the warnings.
Change-Id: I9a8ae106c085bb0e826031fa183b4c4bdabcc5fc
Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>
In 0bff481d45, to accurately use the two limits, it was necessary to move
the LimitedInputStream out of the PacketLineIn and further down to the
PackParser. Unfortunately this didn't survive review, as a buggy test
failed and the "fix" was to drop this part of the code.
The maxPackSizeLimit should apply to the pack stream, not the pkt-line
framing used to send commands to control the ReceivePack instance. The
commands are controlled using a different limit. The failing test allowed
too many bytes in the pack and was only failing because it was including
the command framing. The correct fix for the test was simply to drop the
limit lower, to more closely match the actual pack size.
Change-Id: I47d3885b9d7d527e153df7ac9c62fc2865ceecf4
RevCommit.getCommitTime returns time in seconds since the epoch.
ZipArchiveEntry.setTime expects time in milliseconds.
Add the missing unit conversion to get the correct result.
Correct formatting to be consistent with the rest of the code.
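A minimal sketch of the conversion, assuming Commons Compress's
ZipArchiveEntry and JGit's RevCommit:
  import org.apache.commons.compress.archivers.zip.ZipArchiveEntry;
  import org.eclipse.jgit.revwalk.RevCommit;
  class EntryTimeSketch {
    // getCommitTime() is seconds since the epoch; setTime() expects
    // milliseconds, so multiply by 1000L to avoid int overflow.
    static void setEntryTime(ZipArchiveEntry entry, RevCommit commit) {
      entry.setTime(commit.getCommitTime() * 1000L);
    }
  }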
Change-Id: I990b92f1d996ec8538d4857755694d91b142eb53
Set missingOverrideAnnotation=warning in Eclipse compiler preferences
which enables the warning:
The method <method> of type <type> should be tagged with @Override
since it actually overrides a superclass method
Justification for this warning is described in:
http://stackoverflow.com/a/94411/381622
Enabling this causes in excess of 1000 warnings across the entire
code-base. They are very easy to fix automatically with Eclipse's
"Quick Fix" tool.
Fix all of them except 2 which cause compilation failure when the
project is built with mvn; add TODO comments on those for further
investigation.
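For illustration, the annotation the Quick Fix inserts looks like this:
  class Base {
    void close() {
    }
  }
  class Sub extends Base {
    @Override // flags a compile error if the superclass signature changes
    void close() {
    }
  }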
Change-Id: I5772061041fd361fe93137fd8b0ad356e748a29c
Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>
MappedLoginService is no longer available in Jetty 9.4, so base
TestLoginService on AbstractLoginService.
Jetty now uses slf4j, so adapt RecordingLogger accordingly so that we can
log error messages containing slf4j-style formatting anchors "{}".
Change-Id: Ibb36aba8782882936849b6102001a88b699bb65c
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Add a new method setTagOpt which sets the annotated tag behavior during
fetch. Pass the option to the fetch command.
No explicit tests are added; the fetch with tags functionality is already
covered by the tests of the fetch command.
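A hedged usage sketch of the tag option on a fetch, using FetchCommand's
existing setTagOpt (the repository path is illustrative):
  import java.io.File;
  import org.eclipse.jgit.api.Git;
  import org.eclipse.jgit.transport.TagOpt;
  class FetchTagsSketch {
    static void fetchWithTags() throws Exception {
      try (Git git = Git.open(new File("/path/to/repo"))) {
        git.fetch()
            .setRemote("origin")
            .setTagOpt(TagOpt.FETCH_TAGS) // also fetch annotated tags
            .call();
      }
    }
  }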
Change-Id: I131e1f68d8fcced178d8fa48abf7ffab17f8e173
Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>
Archived zip files for the same commit have different MD5 hashes because
the mtime and mdate in the header of the zip entries are not specified.
In this case, Commons Compress sets the archiving time instead.
The original git implementation sets the commit time:
e2b2d6a172/archive.c (L378)
With this fix, the archive command sets the commit time on the
ZipArchiveEntry when a RevCommit is given as the archiving target.
Change-Id: I30dd8710e910cdf42d57742f8709e9803930a123
Signed-off-by: Naoki Takezoe <takezoe@gmail.com>
Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
If the pruneexpire config is set to "now", then any unreferenced loose
objects are immediately eligible for gc. So there is no need to
actually write the loose objects.
Users who run hosting services which sometimes accept large, entirely
garbage packs might set the following configurations:
gc.pruneExpire = now
gc.prunePackExpire = 2.weeks
Then garbage objects will be kept around in packs, but after two weeks
the packs themselves will get deleted.
For client-side users of jgit, the default settings will loosen
garbage objects, and, after an hour, delete the old packs in which
they resided.
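A sketch of setting that configuration through JGit's config API (section
and key names as above; error handling elided):
  import java.io.IOException;
  import org.eclipse.jgit.lib.Repository;
  import org.eclipse.jgit.lib.StoredConfig;
  class GcConfigSketch {
    // Keep garbage objects packed, and let those packs expire after 2 weeks.
    static void configure(Repository repo) throws IOException {
      StoredConfig cfg = repo.getConfig();
      cfg.setString("gc", null, "pruneExpire", "now");
      cfg.setString("gc", null, "prunePackExpire", "2.weeks");
      cfg.save();
    }
  }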
Change-Id: I8f686ac60b40181b1ee92ac6c313c3f33b55c44c
Signed-off-by: David Turner <dturner@twosigma.com>
Without this, using bazel 0.4.4 to build fails:
ERROR: jgit/org.eclipse.jgit/BUILD:29:1: Java compilation in rule '//org.eclipse.jgit:insecure_cipher_factory' failed: Worker process sent response with exit code: 1.
jgit/src/org/eclipse/jgit/transport/InsecureCipherFactory.java:63: error: [InsecureCryptoUsage] Insecure usage of a crypto API: the transformation is not a compile-time constant expression.
return Cipher.getInstance(algo);
^
(see http://errorprone.info/bugpattern/InsecureCryptoUsage)
Change-Id: I7f9a3a5117e42cb68544674f5312df0368aa3674
At the beginning of the OBJECT_SCAN loop, the reader first checks whether
the object exists in the last pack; however, it forgot to avoid the
garbage packs on that first iteration.
Change-Id: I8a99c0f439218d19c49cd4dae891b8cc4a57099d
Signed-off-by: Zhen Chen <czhen@google.com>
There are multiple places in DfsReader that skip a garbage pack if both of
the following conditions are satisfied:
* AvoidUnreachable flag is set
* The pack is a garbage pack
Refactor them into a shared private method.
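The extracted check amounts to something like the following sketch
(stand-in types and names, not DfsReader's actual code):
  class DfsReaderSketch {
    boolean avoidUnreachable; // set when the AvoidUnreachable flag is given
    // The shared predicate the duplicated call sites collapse into.
    boolean skipGarbagePack(PackSketch pack) {
      return avoidUnreachable && pack.isGarbage();
    }
    static class PackSketch {
      boolean garbage;
      boolean isGarbage() {
        return garbage;
      }
    }
  }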
Change-Id: I67d6bb601db55f904437c807c6a3c36f0a723265
Signed-off-by: Zhen Chen <czhen@google.com>
Place a configurable upper bound on the amount of command data
received from clients during `git push`. The limit is applied to the
encoded wire protocol format, not the JGit in-memory representation.
This allows clients to flexibly use the limit; shorter reference names
allow for more commands, longer reference names permit fewer commands
per batch.
Based on data gathered from many repositories at $DAY_JOB, the average
reference name is well under 200 bytes when encoded in UTF-8 (the wire
encoding). The new 3 MiB default receive.maxCommandBytes allows about
11,155 references in a single `git push` invocation. A Gerrit Code
Review system with six-digit change numbers could still encode 29,399
references in the 3 MiB maxCommandBytes limit.
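A rough check of the arithmetic, and a sketch of applying the limit
(setMaxCommandBytes is assumed to be the setter this change adds to
ReceivePack):
  import org.eclipse.jgit.transport.ReceivePack;
  class CommandLimitSketch {
    // Each pkt-line command carries roughly 4 (length) + 40 (old id) + 1
    // + 40 (new id) + 1 bytes of fixed framing plus the UTF-8 ref name,
    // so ~200-byte names give ~286 bytes per command, and
    // 3 MiB / ~286 bytes is on the order of 11,000 commands.
    static void applyDefaultLimit(ReceivePack rp) {
      rp.setMaxCommandBytes(3 << 20); // 3 MiB, the new default (assumed API)
    }
  }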
Change-Id: I84317d396d25ab1b46820e43ae2b73943646032c
Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
In one place version 3.0.4 is used, and in another place 3.0.3 is
used.
Define the version (3.0.4) in a property and use that in both places,
so it doesn't get inconsistent again next time the version is bumped.
Change-Id: If3a2489cec78c0c9ef76aa6b941fda51b098e04b
Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>
Clarify that 'true' means 'auto close'. This makes it consistent with
other calls that have a boolean argument for 'bare'. It also makes it a
bit easier to see what's going on while stepping in the debugger, because
it's not necessary to scroll around to find the method declaration.
Change-Id: Idacd749407dcfd258af3efaaf44d129069925dd3
Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>
submoduleStandalone is created by createWorkRepository() which adds
the created repository to the set of repositories to be closed in
the test teardown. It is therefore not necessary to explicitly close
it.
Change-Id: Ib6f525b644fdeaaf1934df39cc2d3583a0d883dc
Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>
The renameDetector member returned by this method will be null when
following file renames has been disabled by a previous call to
setFollowFileRenames(false).
Annotate it as @Nullable and update the Javadoc to explicitly
document the null return.
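Callers should guard the return accordingly; a hedged sketch, assuming the
method lives on BlameGenerator (which exposes setFollowFileRenames) and
using an illustrative tuning value:
  import org.eclipse.jgit.blame.BlameGenerator;
  import org.eclipse.jgit.diff.RenameDetector;
  class NullableReturnSketch {
    static void tuneRenames(BlameGenerator generator) {
      // May be null once rename following has been disabled.
      RenameDetector rd = generator.getRenameDetector();
      if (rd != null) {
        rd.setRenameLimit(1000); // illustrative value
      }
    }
  }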
Change-Id: I9bdf443a64cf3c45352d3ab023051a2e11f7426d
Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>
When rebasing, force-pushing has a race condition: someone else might
have pushed a commit since the one you just rewrote. The force-with-lease
option prevents this by ensuring that the ref's old value is the one
that you expected.
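A hedged sketch of the new option, assuming the RefLeaseSpec and
PushCommand.setRefLeaseSpecs API introduced here (branch names and the
expected id are illustrative):
  import org.eclipse.jgit.api.Git;
  import org.eclipse.jgit.transport.RefLeaseSpec;
  class ForceWithLeaseSketch {
    // Force-push refs/heads/topic only if the remote ref still points at
    // expectedOldId, i.e. nobody pushed since our rewrite.
    static void push(Git git, String expectedOldId) throws Exception {
      git.push()
          .setRemote("origin")
          .add("refs/heads/topic")
          .setForce(true)
          .setRefLeaseSpecs(new RefLeaseSpec("refs/heads/topic", expectedOldId))
          .call();
    }
  }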
Change-Id: I97ca9f8395396c76332bdd07c486e60549ca4401
Signed-off-by: David Turner <dturner@twosigma.com>
In a DFS repository the DfsGarbageCollector will typically attempt
delta compression while creating the three main pack files: GC,
GC_REST and GC_TXN. Include all of these in the wasDeltaAttempted()
decision so that future packers can bypass delta compression of
non-delta objects.
Change-Id: Ic2330c69fab0c494b920b4df0a290f3c2e1a03d7
In 8ac65d33ed PackWriter changed its
behavior to always prefer the last object representation presented
to it by the ObjectReuseAsIs implementation. This was a fix to avoid
delta chain cycles.
Unfortunately it can lead to suboptimal compression when concurrent
GCs are run on the same repository. One case is automatic GC running
(with default settings) in parallel to a manual GC that has disabled
delta reuse in order to generate new smaller deltas for the entire
history of the repository.
Running GC with no-reuse generally requires more CPU time, which
also translates to a longer running time. This can lead to a race
where the automatic GC completes before the no-reuse GC, leaving
the repository in a state such as:
no-reuse GC: size 1 GiB, mtime = 18:45
auto GC: size 8 GiB, mtime = 17:30
With the default sort ordering, the smaller no-reuse GC pack is
sorted earlier in the pack list, due to its more recent mtime.
During object reuse in a future GC, these smaller representations
are considered first by PackWriter, but are all discarded when the
auto GC file from 17:30 is examined second (due to its older mtime).
Work around this in two ways.
Well formed DFS repositories should have at most 1 GC pack. If
2 or more GC packs exist, break the sorting tie by selecting the
smaller file earlier in the pack list. This allows all normal read
code paths to favor the smaller file, which places less pressure
on the DfsBlockCache. If any GC race happens, readers serving clone
requests will prefer the file that is smaller.
During object reuse, flip this ordering so that the smaller file is
last. This allows PackWriter to see smaller deltas last, replacing
larger representations that were previously considered from other
pack files.
Change-Id: I0b7dc8bb9711c82abd6bd16643f518cfccc6d31a
Delta search was discarding discovered deltas if an object appeared
near a type boundary in the delta search window. This has caused JGit
to produce larger pack files than other implementations of the packing
algorithm.
Delta search works by pushing prior objects into a search window, an
ordered list of objects to attempt to delta compress the next object
against. (The window size is bounded, avoiding O(N^2) behavior.)
For implementation reasons multiple object types can appear in the
input list, and the window. PackWriter commonly passes both trees and
blobs in the input list handed to the DeltaWindow algorithm. The pack
file format requires an object to only delta compress against the same
type, so the DeltaWindow algorithm must stop doing comparisons if a
blob would be compared to a tree.
Because the input list is sorted by object type and the window is
recently considered prior objects, once a wrong type is discovered in
the window the search algorithm stops and uses the current result.
Unfortunately the termination condition was discarding any found
delta by setting deltaBase and deltaBuf to null when it was trying
to break the window search.
When this bug occurs, the state of the DeltaWindow looks like this:
                                current
                                   |
                                  \/
  input list:  tree0 tree1 blob1 blob2
  window:      blob1 tree1 tree0
                  /\
                  |
              res.prev
As the loop iterates to the right across the window, it first finds
that blob1 is a suitable delta base for blob2, and temporarily holds
this in the bestDelta/deltaBuf fields. It then considers tree1, but
tree1 has the wrong type (blob != tree), so the window loop must give
up and fall through the remaining code.
Moving the condition up and discarding the window contents allows
the bestDelta/deltaBuf to be kept, letting the final file delta
compress blob2 against blob1.
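An illustrative sketch of the corrected control flow (stand-in types, not
JGit's DeltaWindow code):
  import java.util.List;
  class WindowSearchSketch {
    static class Candidate {
      int type;
      byte[] deltaBuf;
    }
    // Walk the window; stop at the first wrong-typed entry, but keep any
    // delta already found instead of discarding it (the old bug).
    static byte[] search(int currentType, List<Candidate> window) {
      byte[] bestDelta = null;
      for (Candidate c : window) {
        if (c.type != currentType) {
          break; // give up searching, but do NOT null out bestDelta
        }
        if (bestDelta == null || c.deltaBuf.length < bestDelta.length) {
          bestDelta = c.deltaBuf;
        }
      }
      return bestDelta;
    }
  }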
The impact of this bug (and its fix) on real world repositories is
likely minimal. The boundary from blob to tree happens approximately
once in the search, as the input list is sorted by type. Only the
first window size worth of blobs (e.g. 10 or 250) were failing to
produce a delta in the final file.
This bug fix does produce significantly different results for small
test repositories created in the unit test suite, such as when a pack
may contain 6 objects (2 commits, 2 trees, 2 blobs). Packing test
cases can now better sample different output pack file sizes depending
on delta compression and object reuse flags in PackConfig.
Change-Id: Ibec09398d0305d4dbc0c66fce1daaf38eb71148f
Disabling garbage pack coalescing when garbageTtl > 0 can result in a lot
of garbage packs if they are created within the garbageTtl time.
To avoid a large number of garbage packs, re-introduce garbage pack
coalescing for packs created within a single calendar day when the
garbageTtl is more than one day, or within one third of the garbageTtl
otherwise.
Change-Id: If969716aeb55fb4fd0ff71d75f41a07638cd5a69
Signed-off-by: Thirumala Reddy Mutchukota <thirumala@google.com>