This implementation has been proven to deadlock in production server
loads. Google has been running with it disabled for a quite a while,
as the bugs have been difficult to identify and fix.
Instead of suggesting it works and is useful, drop the code. JGit
should not advertise support for functionality that is known to
be broken.
In a few of the places where read-ahead was enabled by DfsReader
there is more information about what blocks should be loaded when.
During object representation selection, or size lookup, or sending
object as-is to a PackWriter, or sending an entire pack as-is the
reader knows exactly which blocks are required in the cache, and it
also can compute when those will be needed. The broken read-ahead
code was stupid and just read a fixed amount ahead of the current
offset, which can waste IOs if more precise data was available.
DFS systems are usually slow to respond so read-ahead is still
a desired feature, but it needs to be rebuilt from scratch and
make better use of the offset information.
Change-Id: Ibaed8288ec3340cf93eb269dc0f1f23ab5ab1aea
String.valueOf is an overloaded and the compiler unfortunately picks
the wrong one since null contains no type information.
Change-Id: Icd197eaa046421f3cfcc5bf3e7601dc5bc7486b6
Rewrite this complicated logic to examine each pack file exactly
once. This reduces thrashing when there are many large pack files
present and the reader needs to locate each object's header.
The intermediate temporary list is now smaller, it is bounded to
the same length as the input object list. In the prior version of
this code the list contained one entry for every representation of
every object being packed.
Only one representation object is allocated, reducing the overall
memory footprint to be approximately one reference per object found
in the current pack file (the pointer in the BlockList). This saves
considerable working set memory compared to the prior version that
made and held onto a new representation for every ObjectToPack.
Change-Id: I2c1f18cd6755643ac4c2cf1f23b5464ca9d91b22
Clip the configured limit to Integer.MAX_VALUE at the top of the
loop, saving a compare branch per object considered. This can cut
2M branches out of a repacking of the Linux kernel.
Rewrite the logic so the primary path is to match the conditional;
most objects are larger than BLKSZ (16 bytes) and less than limit.
This may help branch prediction on CPUs if the CPU tries to assume
execution takes the side of the branch and not the second.
Change-Id: I5133d1651640939afe9fbcfd8cfdb59965c57d5a
There is no reasonable way for a subclass to correctly override and
implement these methods. They depend on internal state that cannot
otherwise be managed.
Most of these methods are also in critical paths of PackWriter.
Declare them final so subclasses do not try to replace them,
and so the JIT knows the smaller ones can be safely inlined.
Change-Id: I9026938e5833ac0b94246d21c69a143a9224626c
None of these methods should ever be overridden at runtime by an
extension class. Given how small they are the JIT should perform
inlining where reasonable. Hint this is possible by marking all
methods final so its clear no replacement can be loaded later on.
Change-Id: Ia75a5d36c6bd25b24169e2bdfa360c8f52b669cd
This flag is never checked on its own. It is only checked as part
of a pair through the doNotAttemptDelta() method. Delete the method
so there is less confusion about the flag being used on its own.
Change-Id: Id7088caa649599f4f11d633412c2a2af0fd45dd8
This method is only invoked with true as the argument.
Remove the unnecessary parameter and branch, making
the code easier for the JIT to optimize.
Change-Id: I68a9cd82f197b7d00a524ea3354260a0828083c6
Added also tests and the associated option for the command line Merge
command.
Bug: 335091
Change-Id: Ie321c572284a6f64765a81674089fc408a10d059
Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Due to the Git internal sort order a directory is sorted as if it ended
with a '/', this means that the path filter didn't set the last possible
matching entry to the correct value. In the reported issue we had the
following filters.
org.eclipse.jgit.console
org.eclipse.jgit
As an optimization we throw a StopWalkException when the walked tree
passes the last possible filter, which was this:
org.eclipse.jgit.console
Due to the git sorting order, the tree was processed in this order:
org.eclipse.jgit.console
org.eclipse.jgit.test
org.eclipse.jgit
At org.eclipse.jgit.test we threw the StopWalkException preventing the
walk from completing successfully.
A correct last possible match should be:
org.eclipse.jgit/
For simplicit we define it as:
org/eclipse/jgit/
This filter would be the maximum if we also had e.g. org and org.eclipse
in the filter, but that would require more work so we simply replace all
characters lower than '/' by a slash.
We believe the possible extra walking does not not warrant the extra
analysis.
Bug: 362430
Change-Id: I4869019ea57ca07d4dff6bfa8e81725f56596d9f
1. I have authored 100% of the content I'm contributing,
2. I have the rights to donate the content to Eclipse,
3. I contribute the content under the EDL
Change-Id: I48b1828e0b1304f76276ec07ebac7ee9f521b194
Problem:
LogCommand.all() throws an IncorrectObjectTypeException when
there are tag references, and the repository does not contain
the file "packed-refs". It seems that the references were not properly
peeled before being added to the markStart() method.
Solution:
Call getRepository().peel() on every Ref that has isPeeled()==false
in LogCommand.all() .
Added test case for LogCommand.all() on repo with a tag.
1. I have authored 100% of the content I'm contributing,
2. I have the rights to donate the content to Eclipse,
3. I contribute the content under the EDL
Bug: 402025
Change-Id: Idb8881eeb6ccce8530f2837b25296e8e83636eb7
Instead of re-reading all refs after each update, execute
the deletes first, then read all refs once and perform
the check for conflicting ref names in memory.
Change-Id: I17d0b3ccc27f868c8497607d8e57bf7082e65ba3
Allowed ipv6-address in a uri like:
http://[::1]:8080/repo.git
Change-Id: Ia00a20f694b2e9314892df77f9b11f551bb1d34e
Signed-off-by: Chris Aniszczyk <zx@twitter.com>
This wrong book-keeping caused IOExceptions to be thrown because
LockFile.unlock() erroneously tried to delete the non-existing lock
file. These IOExeptions were hidden since they were silently caught.
Change-Id: If42b6192d92c5a2d8f2bf904b16567ef08c32e89
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Instead of forcing the implementation of the DFS backend to handle
making sure the extension bits are set correctly, have the common
callers in JGit set the extension at the same time they supply the
file sizes to the pack description. This simplifies assumptions for
an implementation of the DFS backend.
Change-Id: I55142ad8ea08a3e2e8349f72b3714578eba9c342
On Windows renameTo will not overwrite a file, so it must be deleted
first. The fix for Bug 402834 did not account for that.
Bug: 403685
Change-Id: I3453342c17e064dcb50906a540172978941a10a6
Unlike the OS or Java rename this method will (on *nix) try (on Windows)
replace the target with the source provided the target does not exist,
the target does exist and is a file, or if it is a directory which only
contains directories. In the latter case the directory hierarchy will be
deleted.
If the initial rename fails and the target is an existing file the the
target file will be deleted first and then the rename is retried.
Change-Id: Iae75c49c85445ada7795246a02ce02f7c248d956
Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>
Allow users to provide their OutputStream (via Transport#
push(monitor, refUpdates, out)) so that server messages can be written
to it (in SideBandInputStream) while they're coming in.
CQ: 7065
Bug: 398404
Change-Id: I670782784b38702d52bca98203909aca0496d1c0
Signed-off-by: Andre Dietisheim <andre.dietisheim@gmail.com>
Signed-off-by: Chris Aniszczyk <zx@twitter.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
When running the garbage collection for a repository it is often
interesting to compare the repository statistics from before and after
the garbage collection to understand the effect of the garbage
collection. This is why it makes sense that the
GarbageCollectionCommand provides a method to retrieve the repository
statistics before running the garbage collection.
So far without running the garbage collection the repository statistics
can only be retrieved by using JGit internal classes. This is what EGit
and Gerrit do at the moment, but it would be better to have an API for
this.
Change-Id: Id7e579157e9fbef5cfd1fc9f97ada45f0ca8c379
Signed-off-by: Edwin Kempin <edwin.kempin@sap.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
This will simplify to adapt EGit to the removal of FileRepository from
jgit's public API in change I2ab1327c202ef2003565e1b0770a583970e432e9.
Change-Id: If98b0b97e8f13a94d4ea7ba1be0f90d82b0fba4b
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Only on Windows the rename operation which renames temporary Packfiles
(and index-files and bitmap-files) sometime fails. This happens only
when renaming a temporary Packfile to a Packfile which already exists.
Such situations occur if you run GC twice on a repo without modifying
the repo inbetween.
In such situations there was bug in GC which led to a corrupted repo
whithout any packfiles anymore. This commit fixes the problem by
introducing a utility method which renames a file and throws an
IOException if it fails. This method also takes care to repeat a
failing rename if our FS class has found out we are running on a
platform with a unreliable File.renameTo() method.
I am searching for a better solution because even with this utility
method in hand a GC on a already GC'ed repo will fail on Windows. But
at least with this fix we will not produce corrupted repos anymore.
Bug: 389305
Change-Id: Iac1ab3e0b8c419c90404f2e2f3559672eb8f6d28
Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
With JGit it is possible to write reflog entries where new objectid and
old objectid is null. Such reflogs cause FileRepository GC to crash
because it doesn't expect the new objectid to be null. One case where
this happened is in Gerrit's allProjects repo. In the same way as we
expect the old objectid to be potentially null we should also ignore
null values in the new objectid column.
Change-Id: Icf666c7ef803179b84306ca8deb602369b8df16e
This breaks all existing callers once. Applications are not supposed
to build against the internal storage API unless they can accept API
churn and make necessary updates as versions change.
Change-Id: I2ab1327c202ef2003565e1b0770a583970e432e9
As noticed by Robin Rosenberg in review of
I4eb87c850078ca187b38b81cc91c92afb1176945.
Change-Id: If96d66b6c025ad8f2f47829c933f3c65ab6cbeef
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Continuing is trickier, as .git/rebase-apply contains no message file
and no git-rebase-todo.
Bug: 336820
Change-Id: I4eb87c850078ca187b38b81cc91c92afb1176945
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Treat first parent traversals as 1 and higher parents as MERGE_COST,
to match git name-rev. Allow overriding the merge cost during tests to
avoid creating 2^16 commits on the fly.
Change-Id: I0175e0c3ab1abe6722e4241abe2f106d1fe92a69
This is helpful for writing the pack configuration into a log file.
Change-Id: I5e7f5ff7e01c9538ca12a1860844ba9b467bdf05
Signed-off-by: Edwin Kempin <edwin.kempin@sap.com>
This is helpful for writing the repository statistics into a log file.
Change-Id: I0e8cd9ad05f123ab3851960890a50213f353a373
Signed-off-by: Edwin Kempin <edwin.kempin@sap.com>
The bitmap code in PackWriter knows exactly when to use a pack as
a "cached pack". It enables cached pack usage only when the pack
has a bitmap and its entire closure of objects needs to be sent.
This is a much simpler code path to maintain, and JGit actually
has a way to write the necessary index.
Change-Id: I2645d482f8733fdf0c4120cc59ba9aa4d4ba6881
Just counting objects is not sufficient. There are some race
conditions with receive packs and delta base completion that
may confuse such a simple algorithm.
Instead always do the larger set computations, and rely on the
PackWriter having no objects pending as the way to avoid creating
an empty pack file.
Change-Id: Ic81fefb158ed6ef8d6522062f2be0338a49f6bc4
Let the pack description copy the relevant stats values. This
moves it out of the garbage collector and compactor algorithms,
co-locating with something that might care.
Remove some unnecessary code from the DfsPackCompactor, the stats
tracks the same information and can supply it.
Change-Id: Id64ab38d507c0ed19ae0d106862d175b7364eba3
Prefer ~(N+1) to ^1~N. Although both are correct, the former is
cleaner and matches "git name-rev".
Change-Id: I772001a219e5eb346f5552c92e6d98c70b2cfa98
The walk logic does not use RevWalk because it needs to walk all paths
to each of the requested commits, keeping track of each path along which
the commit was found in the RevCommit subclass. From these paths, a
single "best" path is chosen based on the total path length, with a
penalty applied for paths that traverse merges.
This functionality parallels "git name-rev".
Change-Id: I92bfb47dd16c898313d2ee525395609c3bf72ebe
This fixes two cases:
- A folder without tracked content exist both in the workdir and merged
commit, as long as there names within that folder does not conflict.
- An empty folder structure exists with the same name as a file in the
merged commit.
Bug: 402834
Change-Id: I4c5b9f11313dd1665fcbdae2d0755fdb64deb3ef
Clients send a bunch of unknown objects to UploadPack on each round
of negotiation. Many of these are not known to the server, which
leads the implementation to be looking at indexes for garbage packs.
Disable examining the index of a garbage pack, allowing servers to
avoid reading them from disk during negotiation.
The effect of this change is the server will only ACK a have line
if the object was reachable during the last garbage collection,
or was recently added to the repository. For most repositories
there is no impact in this behavior change.
If a repository rewinds a branch, runs GC, and then resets the
branch back to where it was before, the now current tip is going to
be skipped by this change. A client that has the commit may wind up
getting a slightly larger data transfer from the server as an older
common ancestor will be chosen during negotiation. This is fixable
on the server side by running GC again to correct the layout of
objects in pack files.
Change-Id: Icd550359ef70fc7b701980f9b13d923fd13c744b
The DHT backend was very slow at parsing objects. To work around
that performance limitation I obfuscated UploadPack by folding both
the want and have sets together in a single parse queue. Since DHT
was removed the complexity is no longer constructive to JGit.
Doing this refactoring prepares the code for a slightly future
change where the have lines need to be handled specially from the
want lines. Splitting the parsing up into two phases makes such
a modification trivial.
Change-Id: If7aad533b82448bbb688278e21f709282e5ccf4b
Garbage is unlikely to be used by a reader. Ensure they always
cluster at the end of the search list, no matter what timestamp
was used on the pack files.
Change-Id: I3bed89e9569ee3363c36bb3f73fcd34057a3883f
If a repository has significant amounts of unreachable garbage the
final phase to coalesce it can take longer than any other part of the
garbage collection phase. Provide a setting for applications to tweak
the threshold where coalescing ends and files just remain on disk.
Change-Id: I5f11a998a7185c75ece3271d8bc6181bb83f54c1