The getCopyBuffer() is almost always used during output. All known
implementations of ObjectReuseAsIs rely on the buffer to be present,
and the only sane way to get good performance from PackWriter is to
reuse objects during packing.
Avoid a branch and test when obtaining this buffer by making sure
it is always populated.
Change-Id: I200baa0bde5dcdd11bab7787291ad64535c9f7fb
If a server is streaming 3GiB worth of pack data to a client there
is no reason to compute the CRC32 checksum on the objects. The
CRC32 code computed by PackWriter is used only in the new index
created by writeIndex(), which is never invoked for the native Git
network protocols.
Object reuse may still compute its own CRC32 to verify the data
being copied from an existing pack has not been corrupted. This
check is done by the ObjectReader that implements ObjectReuseAsIs
and has no relationship to the CRC32 being skipped during output.
Change-Id: I05626f2e0d6ce19119b57d8a27193922636d60a7
If the configuration wants to run 4 threads the delta search work
is initially split somewhat evenly across the 4 threads. During
execution some threads will finish early due to the work not being
split fairly, as the initial partitions were based on object count
and not cost to inflate or size of DeltaIndex.
When a thread finishes early it now tries to take 50% of the work
remaining on a sibling thread, and executes that before exiting.
This repeats as each thread completes until a thread has only 1
object remaining.
Repacking Blink, Chromium's new fork of WebKit (2.2M objects 3.9G):
[pack]
reuseDeltas = false
reuseObjects = false
depth = 50
threads = 8
window = 250
windowMemory = 800m
before: ~105% CPU after 80%
after: >780% CPU to 100%
Change-Id: I65e45422edd96778aba4b6e5a0fd489ea48e8ca3
* changes:
Fix plugin provider names to conform with release train requirement
Add missing @since tags for new API methods
DfsReaderOptions are options for a DFS stored repository
According to release train requirements [1] the provider name for all
artifacts of Eclipse projects is "Eclipse <project name>".
[1] http://wiki.eclipse.org/Development_Resources/HOWTO/Release_Reviews#Checklist
Change-Id: I8445070d1d96896d378bfc49ed062a5e7e0f201f
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Some packs built by JGit have incredibly long delta chains due to a
long standing bug in PackWriter. Google has packs created by JGit's
DfsGarbageCollector with chains of 6000 objects long, or more.
Inflating objects at the end of this 6000 long chain is impossible
to complete within a reasonable time bound. It could take a beefy
system hours to perform even using the heavily optimized native C
implementation of Git, let alone with JGit.
Enable pack.cutDeltaChains to be set in a configuration file to
permit the PackWriter to determine the length of each delta chain
and clip the chain at arbitrary points to fit within pack.depth.
Delta chain cycles are still possible, but no attempt is made to
detect them. A trivial chain of A->B->A will iterate for the full
pack.depth configured limit (e.g. 50) and then pick an object to
store as non-delta.
When cutting chains the object list is walked in reverse to try
and take advantage of existing chain computations. The assumption
here is most deltas are near the end of the list, and their bases
are near the front of the list. Going up from the tail attempts to
reuse chainLength computations by relying on the memoized value in
the delta base.
The chainLength field in ObjectToPack is overloaded into the depth
field normally used by DeltaWindow. This is acceptable because the
chain cut happens before delta search, and the chainLength is reset
to 0 if delta search will follow.
Change-Id: Ida4fde9558f3abbbb77ade398d2af3941de9c812
This switch is called mostly for OBJ_TREE and OBJ_BLOB types, which
typically make up 66% of the objects in a repository. Simplify the
test for these common types by testing for the one bit they have in
common and returning early.
Object type 5 is currently undefined. In the old code it would hit
the default and return true. In the new code it will match the early
case and also return true. In either implementation 5 should never show
up as it is not a valid type known to Git.
Object type 6 OFS_DELTA is not permitted to be supplied here.
Object type 7 REF_DELTA is not permitted to be supplied here.
Change-Id: I0ede8acee928bb3e73c744450863942064864e9c
Now that WANT_WRITE is gone renumber the flags to move the unused
bit next to the type. Recluster AS_IS and DELTA_ATTEMPTED to be
next to each other since these bits are tested as a pair.
Change-Id: I42994b5ff1f67435e15c3f06d02e3b82141e8f08
Free up the WANT_WRITE flag in ObjectToPack by switching the test
to use the special offset value of 1. The Git pack file format
calls for the first 4 bytes to be 'PACK', which means any object
must start at an offset >= 4. Current versions require another 8
bytes in the header, placing the first object at offset = 12.
So offset = 1 is an invalid location for an object, and can be
used as a marker signal to indicate the writing loop has tried
to write the object, but recursed into the base first. When an
object is visited with offset == 1 it means there is a cycle in
the delta base path, and the cycle must be broken.
Change-Id: I2d05b9017c5f9bd9464b91d43e8d4b4a085e55bc
Garbage is randomly ordered and unlikely to delta compress against
other garbage. Disable delta compression allowing objects to switch
to whole form when moving to the garbage pack.
Because the garbage is not well compressed assume deltas were not
attempted during a normal GC cycle.
Override the reuse settings, garbage that can be reused should be
reused as-is into the garbage pack rather than switching something
like the compression level during a GC. It is intended that garbage
will eventually be removed from the repository so expending CPU
time on a compression switch is not worthwhile.
Change-Id: I0e8e58ee99e5011d375d3d89c94f2957de8402b9
This implementation has been proven to deadlock in production server
loads. Google has been running with it disabled for a quite a while,
as the bugs have been difficult to identify and fix.
Instead of suggesting it works and is useful, drop the code. JGit
should not advertise support for functionality that is known to
be broken.
In a few of the places where read-ahead was enabled by DfsReader
there is more information about what blocks should be loaded when.
During object representation selection, or size lookup, or sending
object as-is to a PackWriter, or sending an entire pack as-is the
reader knows exactly which blocks are required in the cache, and it
also can compute when those will be needed. The broken read-ahead
code was stupid and just read a fixed amount ahead of the current
offset, which can waste IOs if more precise data was available.
DFS systems are usually slow to respond so read-ahead is still
a desired feature, but it needs to be rebuilt from scratch and
make better use of the offset information.
Change-Id: Ibaed8288ec3340cf93eb269dc0f1f23ab5ab1aea
String.valueOf is an overloaded and the compiler unfortunately picks
the wrong one since null contains no type information.
Change-Id: Icd197eaa046421f3cfcc5bf3e7601dc5bc7486b6
Rewrite this complicated logic to examine each pack file exactly
once. This reduces thrashing when there are many large pack files
present and the reader needs to locate each object's header.
The intermediate temporary list is now smaller, it is bounded to
the same length as the input object list. In the prior version of
this code the list contained one entry for every representation of
every object being packed.
Only one representation object is allocated, reducing the overall
memory footprint to be approximately one reference per object found
in the current pack file (the pointer in the BlockList). This saves
considerable working set memory compared to the prior version that
made and held onto a new representation for every ObjectToPack.
Change-Id: I2c1f18cd6755643ac4c2cf1f23b5464ca9d91b22
Clip the configured limit to Integer.MAX_VALUE at the top of the
loop, saving a compare branch per object considered. This can cut
2M branches out of a repacking of the Linux kernel.
Rewrite the logic so the primary path is to match the conditional;
most objects are larger than BLKSZ (16 bytes) and less than limit.
This may help branch prediction on CPUs if the CPU tries to assume
execution takes the side of the branch and not the second.
Change-Id: I5133d1651640939afe9fbcfd8cfdb59965c57d5a
There is no reasonable way for a subclass to correctly override and
implement these methods. They depend on internal state that cannot
otherwise be managed.
Most of these methods are also in critical paths of PackWriter.
Declare them final so subclasses do not try to replace them,
and so the JIT knows the smaller ones can be safely inlined.
Change-Id: I9026938e5833ac0b94246d21c69a143a9224626c
None of these methods should ever be overridden at runtime by an
extension class. Given how small they are the JIT should perform
inlining where reasonable. Hint this is possible by marking all
methods final so its clear no replacement can be loaded later on.
Change-Id: Ia75a5d36c6bd25b24169e2bdfa360c8f52b669cd
This flag is never checked on its own. It is only checked as part
of a pair through the doNotAttemptDelta() method. Delete the method
so there is less confusion about the flag being used on its own.
Change-Id: Id7088caa649599f4f11d633412c2a2af0fd45dd8
This method is only invoked with true as the argument.
Remove the unnecessary parameter and branch, making
the code easier for the JIT to optimize.
Change-Id: I68a9cd82f197b7d00a524ea3354260a0828083c6
Added also tests and the associated option for the command line Merge
command.
Bug: 335091
Change-Id: Ie321c572284a6f64765a81674089fc408a10d059
Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Due to the Git internal sort order a directory is sorted as if it ended
with a '/', this means that the path filter didn't set the last possible
matching entry to the correct value. In the reported issue we had the
following filters.
org.eclipse.jgit.console
org.eclipse.jgit
As an optimization we throw a StopWalkException when the walked tree
passes the last possible filter, which was this:
org.eclipse.jgit.console
Due to the git sorting order, the tree was processed in this order:
org.eclipse.jgit.console
org.eclipse.jgit.test
org.eclipse.jgit
At org.eclipse.jgit.test we threw the StopWalkException preventing the
walk from completing successfully.
A correct last possible match should be:
org.eclipse.jgit/
For simplicit we define it as:
org/eclipse/jgit/
This filter would be the maximum if we also had e.g. org and org.eclipse
in the filter, but that would require more work so we simply replace all
characters lower than '/' by a slash.
We believe the possible extra walking does not not warrant the extra
analysis.
Bug: 362430
Change-Id: I4869019ea57ca07d4dff6bfa8e81725f56596d9f
1. I have authored 100% of the content I'm contributing,
2. I have the rights to donate the content to Eclipse,
3. I contribute the content under the EDL
Change-Id: I48b1828e0b1304f76276ec07ebac7ee9f521b194
Problem:
LogCommand.all() throws an IncorrectObjectTypeException when
there are tag references, and the repository does not contain
the file "packed-refs". It seems that the references were not properly
peeled before being added to the markStart() method.
Solution:
Call getRepository().peel() on every Ref that has isPeeled()==false
in LogCommand.all() .
Added test case for LogCommand.all() on repo with a tag.
1. I have authored 100% of the content I'm contributing,
2. I have the rights to donate the content to Eclipse,
3. I contribute the content under the EDL
Bug: 402025
Change-Id: Idb8881eeb6ccce8530f2837b25296e8e83636eb7
Instead of re-reading all refs after each update, execute
the deletes first, then read all refs once and perform
the check for conflicting ref names in memory.
Change-Id: I17d0b3ccc27f868c8497607d8e57bf7082e65ba3
Allowed ipv6-address in a uri like:
http://[::1]:8080/repo.git
Change-Id: Ia00a20f694b2e9314892df77f9b11f551bb1d34e
Signed-off-by: Chris Aniszczyk <zx@twitter.com>
Writing CLI test cases is tedious because of all the formatting and
escaping subtleties needed when comparing actual output with what's
expected. While creating a test case the two new functions are to be
used instead of the existing execute() in order to prepare the correct
command and expected output and to generate the corresponding test code
that can be pasted into the test case function.
Change-Id: Ia66dc449d3f6fb861c300fef8b56fba83a56c94c
Signed-off-by: Chris Aniszczyk <zx@twitter.com>
This wrong book-keeping caused IOExceptions to be thrown because
LockFile.unlock() erroneously tried to delete the non-existing lock
file. These IOExeptions were hidden since they were silently caught.
Change-Id: If42b6192d92c5a2d8f2bf904b16567ef08c32e89
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Instead of forcing the implementation of the DFS backend to handle
making sure the extension bits are set correctly, have the common
callers in JGit set the extension at the same time they supply the
file sizes to the pack description. This simplifies assumptions for
an implementation of the DFS backend.
Change-Id: I55142ad8ea08a3e2e8349f72b3714578eba9c342
On Windows renameTo will not overwrite a file, so it must be deleted
first. The fix for Bug 402834 did not account for that.
Bug: 403685
Change-Id: I3453342c17e064dcb50906a540172978941a10a6
Unlike the OS or Java rename this method will (on *nix) try (on Windows)
replace the target with the source provided the target does not exist,
the target does exist and is a file, or if it is a directory which only
contains directories. In the latter case the directory hierarchy will be
deleted.
If the initial rename fails and the target is an existing file the the
target file will be deleted first and then the rename is retried.
Change-Id: Iae75c49c85445ada7795246a02ce02f7c248d956
Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>