A CompressedBitmap represents a pair (EWAH bit vector, PackIndex
assigning bit positions to git objects). The bit vector is a member
field and the PackIndex is implicit via the 'this' reference to the
outer class.
Make this clearer by making CompressedBitmap a static class and
replacing the 'this' reference by an explicit field.
Likewise for CompressedBitmapBuilder.
Change-Id: Id85659fc4fc3ad82034db3370cce4cdbe0c5492c
Suggested-by: Terry Parker <tparker@google.com>
When creating bitmaps during gc, the bitmaps in tipCommitBitmaps are
built in setupTipCommitBitmaps using the following procedure:
0. create a bitmap ('reuse') that lists all ancestors of commits
whose existing bitmaps will be reused. I will call this the
reused part of history.
1. initialize a bitmap for each of the pack's "want"s by taking
a copy of the 'reuse' bitmap and setting the bit corresponding
to the one wanted commit.
2. walk through ancestors of wants, excluding the reused part of
history. Add parents of visited commits to bitmaps that have
those commits.
3. AND-NOT each tipCommitBitmap against the 'reuse' bitmap
4. Sort the bitmaps and AND-NOT each against the previous so they
partition the new commits.
The OR against 'reuse' in step 1 and the AND-NOT against 'reuse'
cancel each other out, except when commits from the reused part of
history are added to a bitmap in step 2. So avoid adding commits from
the reused part of history in step 2 and skip the OR and AND-NOT.
Performance impact (thanks to Terry for measuring):
The initial "selecting bitmaps" phase of garbage collection decreased
from (83 + 81 + 85) / 3 = 83 to (56 + 57 + 56) / 3 = 56.3, meaning
nearly a ~50% speedup of that phase.
Tested-by: Terry Parker <tparker@google.com>
Change-Id: I26ea695809594347575d14a1d8e6721b8608eb9c
Instead of using the newRevFilter helper, call the appropriate
RevFilter constructor directly. This means one less hop to find
documentation about what the RevFilter will do.
Change-Id: Ida6ff1c0457a47a1bd1e4ed0fd1dd42a616d214f
When setupTipCommitBitmaps is called, writeBitmaps does not have any
bitmaps saved, so these calls to .add always add a single commit and
do not OR in a bitmap.
The objects returned by nextObject after a commit walk is finished
are trees and blobs. Non-commit objects do not have bitmaps
associated so the call to .add also can only add a single object.
Change-Id: I7b816145251a7fa4f1fffe0d03644257ea7f5440
This is the caller that the BitmapBuilder.add method was designed
around. Moving away from .add makes it more verbose but hopefully
clearer.
Change-Id: I57b1d7c1dc8fb800b242b76c606922b5aa36b9b2
This puts the code for include() in each RevFilter returned by
newRevFilter in one place and should make the code easier to
understand and modify.
Change-Id: I590ac4604b61fc1ffe7aba2ed89f8139847ccac3
The count of loaded commits is equal to the number of commits returned
by the walk. Simplify BitmapRevFilter by counting them in the caller.
Change-Id: Ia95da47831d9e89d6f8068470ec4529aaabfa7dd
Every Bitmap in current JGit code has an associated BitmapIndex. Make
it public in BitmapBuilder to make retrieving bitmaps to OR in from
that index easier.
Change-Id: I2773aa94d8b67f12194608e6317c0792a5de21e2
The BitmapIndex.BitmapBuilder.add API is subtle:
/**
* Adds the id and the existing bitmap for the id, if one
* exists, to the bitmap.
*
* @return true if the value was not contained or able to be
* loaded.
*/
boolean add(AnyObjectId objectId, int type);
Reading the name of the method does not make it obvious what it will
do. Does it add the named object to the bitmap, or all objects
reachable from it? It depends on whether the BitmapIndex owns an
existing bitmap for that object. I did not notice this subtlety when
skimming the javadoc, either. This resulted in enough confusion to
subtly break the bitmap building code (see change
I30844134bfde0cbabdfaab884c84b9809dd8bdb8 for details).
So discourage use of the add() API by deprecating it.
To replace it, provide a addObject() method that adds a single object.
This way, callers can decide whether to use addObject() or or() based
on the context.
For example,
if (bitmap.add(c, OBJ_COMMIT)) {
for (RevCommit p : c.getParents()) {
rememberToAlsoHandle(p);
}
}
can be replaced with
if (bitmap.contains(c)) {
// already included
} else if (index.getBitmap(c) != null) {
bitmap.or(index.getBitmap(c));
} else {
bitmap.addObject(c, OBJ_COMMIT);
for (RevCommit p : c.getParents()) {
rememberToAlsoHandle(p);
}
}
which is more verbose but makes it clearer that the behavior
depends on the content of index.getBitmaps().
Change-Id: Ib745645f187e1b1483f8587e99501dfcb7b867a5
This makes it easier to distinguish between implementations of methods
from the interface from helpers internal to org.eclipse.jgit.internal.storage.*.
This was illegal in Java 5 but JGit requires newer Java these days.
Change-Id: I92c65f3407a334acddd32ec9e09ab7d1d39c4dc6
This RevWalk filters out reused bitmap commits via the 'reuse' bitmap.
Avoid possible wasted time and complexity by not also redundantly
marking them UNINTERESTING.
Change-Id: Ibb714002ddac599963d148a9aab90645fcc73141
When building fullBitmap in order to determine which ancestor chain to
add this commit to, we were excluding the ancestors of reusedCommits
using markUninteresting. This use of markUninteresting is a bit
wasteful because we already have a bitmap indicating exactly which
commits should be excluded (which can save some walking). Use it.
A separate commit will remove the now-redundant markUninteresting
call.
No behavior change intended (except for performance improvement).
Change-Id: I1112641852d72aa05c9a8bd08a552c70342ccedb
This RevWalk filters out reused bitmap commits via the 'reuse' bitmap.
Avoid possible wasted time and complexity by not redundantly marking
them UNINTERESTING any more.
Change-Id: If467ccd1d75e17cf9367b2a0399fca3f9d52adf9
When garbage collecting, we decide to reuse some bitmaps in older
history from the previous pack to save time. The remainder of commit
selection only involves commits not covered by those bitmaps.
Currently we carry that out in two ways:
1. by building a bitmap representing the already-covered commits,
for easy containment checks and AND-NOT-ing against
2. by marking the reused bitmap commits as uninteresting in the
RevWalk that finds new commits
The mechanism in (2) is less efficient than (1): rw.next() will walk
back from reused bitmap commits to check whether the commit it is
about to emit is an ancestor of them, when using the bitmap from (1)
would let us perform the same check with a single contains() call.
Add a RevFilter teaching the RevWalk to perform that same check
directly using the bitmap from (1).
The next time the RevWalk is used, a different RevFilter is installed
so this does not break that.
A later commit will drop the markUninteresting calls.
No functional change intended except a possible speedup.
Change-Id: Ic375210fa69330948be488bf9dbbe4cb0d069ae6
Prior to this change, DfsInserter would not insert an object into a pack
if it already existed in another pack in the repository, even if that
pack was unreachable. Consider this sequence of events:
- Object FOO is pushed to a repository.
- Subsequent ref changes make FOO UNREACHABLE_GARBAGE.
- FOO is subsequently re-inserted using a DfsInserter, but skipped
due to existing in UNREACHABLE_GARBAGE.
- The repository is repacked; FOO will not be written into a new pack
because it is not yet reachable from a reference. If the
UNREACHABLE_GARBAGE packs are deleted, FOO disappears.
- A reference is updated to reference FOO. This reference is now broken
as FOO was removed when the repacking process deleted the
UNREACHABLE_GARBAGE pack that stored the only copy of FOO.
The garbage collector can't safely delete the UNREACHABLE_GARBAGE
pack because FOO might be in the middle of being re-inserted/re-packed.
This change writes a duplicate copy of an object if it only exists in
UNREACHABLE_GARBAGE. This "freshens" the object to give it a chance to
survive long enough to be made reachable through a reference.
Change-Id: I20f2062230f3af3bccd6f21d3b7342f1152a5532
Signed-off-by: Mike Williams <miwilliams@google.com>
Until 320a4142 (Update bitmap selection throttling to fully span
active branches, 2015-10-20), setupTipCommitBitmaps contained code
along the following lines:
for (PackBitmapIndexRemapper.Entry entry : bitmapRemapper) {
if (!reuse(entry))
continue;
RevCommit rc = (RevCommit) rw.peel(rw.parseAny(entry));
reuseCommits.add(new BitmapCommit(rc));
EWAHCompressedBitmap bitmap =
remap.ofObjectType(remap.getBitmap(rc), OBJ_COMMIT);
writeBitmaps.addBitmap(rc, bitmap, 0);
reuse.add(rc, OBJ_COMMIT);
}
writeBitmaps.clearBitmaps(); // Remove temporary bitmaps
This loop OR-ed together bitmaps for commits whose bitmaps would be
reused. A subtle point is the use of the add() method, which ORs in a
bitmap from the BitmapIndex when it exists and falls back to OR-ing in
a single bit when that bitmap does not exist in the BitmapIndex.
Commit 320a4142 removed the addBitmap step, so the bitmap does not
exist in the BitmapIndex and the fallback behavior is triggered.
Simplify and restore the intended behavior by avoiding use of the
subtle use of the add() method --- use or() directly instead.
Change-Id: I30844134bfde0cbabdfaab884c84b9809dd8bdb8
When packing is able to reuse lots of deltas from existing packs, those
objects are marked as "doNotAttemptDelta" and do not contribute to
DeltaTask's computeTopPaths() "totalWeight" calculation.
In the extreme case when all packs are reusable, "totalWeight" will be
zero. DeltaTask.partitionTasks() uses "totalWeight" to determine a
"weightPerThread" size it uses to set up DeltaTasks. When "totalWeight"
is small, partitionTasks() ends up creating a DeltaTask for every
unique path.
For a large repository, the small "weightPerThread" can result in the
creation of >100k tasks (for the MSM 3.10 Linux repository, the count
was ~150k). This makes the "task stealing" mechanism in DeltaTask very
inefficient, because every attempt to steal work does a linear walk
through all tasks, searching for the one with the most work remaining,
which is O(N^2) comparisons. For the MSM 3.10 repository when all
deltas were reusable, PackWriter.parallelDeltaSearch() took
(1615+1633+1458)/3 = 1568 seconds.
The error is that DeltaTask treats the weights of objects marked as
"doNotAttemptDelta" inconsistently. It ignores the weights when
calculating "totalWeight" but uses them when partitioning the tasks.
The fix is to also ignore them when partitioning the tasks.
With this patch applied, PackWriter.parallelDeltaSearch() on the
MSM 3.10 repository when all deltas are reused went from taking
1568 seconds to 62ms (>25k speedup).
This patch also fixes a totalWeight initialization error in
DeltaTask.computeTopPaths().
Change-Id: I2ae37efa83bca42b0e716266ae6aa9d182e76d9c
Signed-off-by: Terry Parker <tparker@google.com>
Avoid leaving the reader in suspense by handling the unusual
(!RevCommit) case first. As a nice side effect, there is less nesting
to keep track of in the rest of the loop body.
No functional change intended.
Change-Id: I1580de444fccde08070f696218c12041151a924a
When the file <git-dir>/hooks/pre-push exists make sure that is is
executing during a push. The pre-push hook runs during git push, after
the remote refs have been updated but before any objects have been
transferred.
Change-Id: Ibbb58ee3227742d1a2f913134ce11e7a135c7f4c
In order to support filters in gitattributes FS.runProcess() is made
public. Support for stdin redirection has been added. Support for binary
data on stdin/stdout (as used be clean/smudge filters) has been added.
Change-Id: Ice2c152e9391368dc5748d7b825a838e3eb755f9
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Fixed random errors in discoverGitSystemConfig() on Linux where the
process error stream was closed by readPipe() before or while
GobblerThread was reading from it.
Marked readPipe() as @Nullable and fixed potential NPE in
discoverGitSystemConfig() on readPipe() return value.
Fixed process error output randomly mixed with other threads log
messages.
Change-Id: Id882af2762cfb75f010f693b2e1c46eb6968ee82
Signed-off-by: Andrey Loskutov <loskutov@gmx.de>
Add comments and rename variables in PackWriterBitmapPreparer to improve
readability.
Change-Id: I49e7a1c35307298f7bc32ebfc46ab08e94290125
Signed-off-by: Terry Parker <tparker@google.com>
Java compiler must generate synthetic access methods for private methods
and fields of the enclosing class if they are accessed from inner
classes and vice versa.
While invisible in the code, those synthetic access methods exist in the
bytecode and seem to produce some extra execution overhead at runtime
(compared with the direct access to this fields or methods), see
https://git.eclipse.org/r/58948/.
By removing the "private" access modifier from affected methods and
fields we help compiler to avoid generation of synthetic access methods
and hope to improve execution performance.
To validate changes, one can either use javap or use Bytecode Outline
plugin in Eclipse. In both cases one should look for "synthetic
access$<number>" methods at the end of the class and inner class files
in question - there should be none.
NB: don't mix this "synthetic access$" methods up with "public synthetic
bridge" methods generated to allow generic method override return types.
Change-Id: I94fb481b68c84841c1db1a5ebe678b13e13c962b
Signed-off-by: Andrey Loskutov <loskutov@gmx.de>
Java compiler must generate synthetic access methods for private methods
and fields of the enclosing class if they are accessed from inner
classes and vice versa.
While invisible in the code, those synthetic access methods exist in the
bytecode and seem to produce some extra execution overhead at runtime
(compared with the direct access to this fields or methods), see
https://git.eclipse.org/r/58948/.
By removing the "private" access modifier from affected methods and
fields we help compiler to avoid generation of synthetic access methods
and hope to improve execution performance.
To validate changes, one can either use javap or use Bytecode Outline
plugin in Eclipse. In both cases one should look for "synthetic
access$<number>" methods at the end of the class and inner class files
in question - there should be none.
NB: don't mix this "synthetic access$" methods up with "public synthetic
bridge" methods generated to allow generic method override return types.
Change-Id: Ie7b65f251ec4452d5a5ed48aa0f272cf49a9aecd
Signed-off-by: Andrey Loskutov <loskutov@gmx.de>
Java compiler must generate synthetic access methods for private methods
and fields of the enclosing class if they are accessed from inner
classes and vice versa.
While invisible in the code, those synthetic access methods exist in the
bytecode and seem to produce some extra execution overhead at runtime
(compared with the direct access to this fields or methods), see
https://git.eclipse.org/r/58948/.
By removing the "private" access modifier from affected methods and
fields we help compiler to avoid generation of synthetic access methods
and hope to improve execution performance.
To validate changes, one can either use javap or use Bytecode Outline
plugin in Eclipse. In both cases one should look for "synthetic
access$<number>" methods at the end of the class and inner class files
in question - there should be none.
NB: don't mix this "synthetic access$" methods up with "public synthetic
bridge" methods generated to allow generic method override return types.
Change-Id: I0ebaeb2bc454cd8051b901addb102c1a6688688b
Signed-off-by: Andrey Loskutov <loskutov@gmx.de>
Java compiler must generate synthetic access methods for private methods
and fields of the enclosing class if they are accessed from inner
classes and vice versa.
While invisible in the code, those synthetic access methods exist in the
bytecode and seem to produce some extra execution overhead at runtime
(compared with the direct access to this fields or methods), see
https://git.eclipse.org/r/58948/.
By removing the "private" access modifier from affected methods and
fields we help compiler to avoid generation of synthetic access methods
and hope to improve execution performance.
To validate changes, one can either use javap or use Bytecode Outline
plugin in Eclipse. In both cases one should look for "synthetic
access$<number>" methods at the end of the class and inner class files
in question - there should be none.
NB: don't mix this "synthetic access$" methods up with "public synthetic
bridge" methods generated to allow generic method override return types.
Change-Id: If53ec94145bae47b74e2561305afe6098012715c
Signed-off-by: Andrey Loskutov <loskutov@gmx.de>
Expose the following bitmap selection parameters via PackConfig:
"bitmapContiguousCommitCount", "bitmapRecentCommitCount",
"bitmapRecentCommitSpan", "bitmapDistantCommitSpan",
"bitmapExcessiveBranchCount", and "bitmapInactiveBranchAge".
The value of bitmapContiguousCommitCount, whereby bitmaps are
created for the most recent N commits in a branch, has never
been verified. If experiments show that they are not valuable,
then we can simplify the implementation so that there is only
a concept of recent and distant commit history (defined by
"bitmapRecentCommitCount"), and the only controls we need are
"bitmapRecentCommitSpan" and "bitmapDistantCommitSpan".
Change-Id: I288bf3f97d6fbfdfcd5dde2699eff433a7307fb9
Signed-off-by: Terry Parker <tparker@google.com>
Replace the “bitmapCommitRange” parameter that was recently introduced
with two new parameters: “bitmapExcessiveBranchCount” and
“bitmapInactiveBranchAgeInDays”. If the count of branches does not
exceed “bitmapExcessiveBranchCount”, then the current algorithm is kept
for all branches.
If the branch count is excessive, then the commit time for the tip
commit for each branch is used to determine if a branch is “inactive”.
"Active" branches get full commit selection using the existing
algorithm. "Inactive" branches get fewer bitmaps near the branch tips.
Introduce a "contiguousCommitCount" parameter that always enforces that
the N most recent commits in a branch are selected for bitmaps. The
previous nextSelectionDistance() algorithm created anywhere from 1-100
contiguous bitmaps at branch tips.
For example, consider a branch with commits numbering 0-300, with 0
being the most recent commit. If the most recent 200 commits are not
merge commits and the 200th commit was the last one selected,
nextSelectionDistance() returned 100, causing commits 200-101 to be
ignored. Then a window of size 100 was evaluated, searching for merge
commits. Since no merge commits are found, the next commit (commit 0)
was selected, for a total of 1 commit in the topmost 100 commits.
If instead the 250th commit was selected, then by the same logic
commit 50 is selected. At that point nextSelectionDistance() switches to
selecting consecutive commits, so commits 0-50 in the topmost 100
commits are selected. The "contiguousCommitCount" parameter provides
more determinism by always selecting a constant number or topmost
commits.
Add an optimization to break out of the inner loop of selectCommits() if
all of the commits for the current branch have already been found.
When reusing bitmaps from an existing pack, remove unnecessary
populating and clearing of the writeBitmaps/PackBitmapIndexBuilder.
Add comments to PackWriterBitmapPreparer, rename methods and variables
for readability.
Add tests for bitmap selection with and without merge commits and with
excessive branch pruning triggered.
Note: I will follow up with an additional change that exposes the new
parameters through PackConfig.
Change-Id: I5ccbb96c8849f331c302d9f7840e05f9650c4608
Signed-off-by: Terry Parker <tparker@google.com>
Since Git 2.6 wildcard restrictions for refspecs have been loosened:
refspecs like "refs/heads/*foo:refs/heads/foo*" are valid now.
See Git commit 8d3981ccbed9fc211b4e67105015179d9d2a5692
Change-Id: Icb78afbd282c425173b3a7bc10eadc4015689bb8
Signed-off-by: Marc Strapetz <marc.strapetz@syntevo.com>
Building on top of https://git.eclipse.org/r/#/c/56391/
Here we preserve compatibility with JetS3t
and add 2 new native JGit encryption implementations.
For reference, see connection configuration files:
* Version 0: jgit-s3-connection-v-0.properties
* Version 1: jgit-s3-connection-v-1.properties
* Version 2: jgit-s3-connection-v-2.properties
Change-Id: I713290bcacbe92d88e5ef28ce137de73dd1abe2f
Signed-off-by: Andrei Pozolotin <andrei.pozolotin@gmail.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
See previous attempt: https://git.eclipse.org/r/#/c/16674/
Here we preserve as much of JetS3t mode as possible
while allowing to use new Java 8+ PBE algorithms
such as PBEWithHmacSHA512AndAES_256
Summary of changes:
* change pom.xml to control long tests
* add WalkEncryptionTest.launch to run long tests
* add AmazonS3.Keys to to normalize use of constants
* change WalkEncryption to support AES in JetS3t mode
* add WalkEncryptionTest to test remote encryption pipeline
* add support for CI configuration for live Amazon S3 testing
* add log4j based logging for tests in both Eclipse and Maven build
To test locally, check out the review branch, then:
* create amazon test configuration file
* located your home dir: ${user.home}
* named jgit-s3-config.properties
* file format follows AmazonS3 connection settings file:
accesskey = your-amazon-access-key
secretkey = your-amazon-secret-key
test.bucket = your-bucket-for-testing
* finally:
* run in Eclipse: WalkEncryptionTest.launch
* or
* run in Shell: mvn test --define test=WalkEncryptionTest
Change-Id: I6f455fd9fb4eac261ca73d0bec6a4e7dae9f2e91
Signed-off-by: Andrei Pozolotin <andrei.pozolotin@gmail.com>
If the checkout path is currently a non-empty directory (and was a link
or a regular file before), this directory will be removed before
performing checkout, but only if the checkout path is specified.
Bug: 474973
Change-Id: Ifc6c61592d9b54d26c66367163acdebea369145c
Signed-off-by: Andrey Loskutov <loskutov@gmx.de>
A bitmap index contains bitmaps for a set of commits in a pack file.
Creating a bitmap for every commit is too expensive, so heuristics
select the most "important" commits. The most recent commits are the
most valuable. To clone a repository only those for the branch tips are
needed. When fetching, only commits since the last fetch are needed.
The commit selection heuristics generally work, but for some
repositories the number of selected commits is prohibitively high. One
example is the MSM 3.10 Linux kernel. With over 1 million commits on
2820 branches, the current heuristics resulted in +36k selected commits.
Each uncompressed bitmap for that repository is ~413k, making it
difficult to complete a GC operation in available memory.
The benefit of creating bitmaps over the entire history of a repository
like the MSM 3.10 Linux kernel isn't clear. For that repository, most
history for the last year appears to be in the last 100k commits.
Limiting bitmap commit selection to just those commits reduces the count
of selected commits from ~36k to ~10.5k. Dropping bitmaps for older
commits does not affect object counting times for clones or for fetches
on clients that are reasonably up-to-date.
This patch defines a new "bitmapCommitRange" PackConfig parameter to
limit the commit selection process when building bitmaps. The range
starts with the most recent commit and walks backwards. A range of 10k
considers only the 10000 most recent commits. A range of zero creates
bitmaps only for branch tips. A range of -1 (the default) does not limit
the range--all commits in the pack are used in the commit selection
process.
Change-Id: Ied92c70cfa0778facc670e0f14a0980bed5e3bfb
Signed-off-by: Terry Parker <tparker@google.com>
On a server also running Gerrit that is using RepoCommand to
convert from an XML manifest to a git submodule superproject
periodically, it would be handy to be able to use Gerrit's
submodule subscription feature[1] to update the superproject
automatically between RepoCommand runs as changes are merged
in each subprojects.
This requires setting the 'branch' field for each submodule
so that Gerrit knows what branch to watch. Add an option to
do that.
Setting the branch field also is useful for plain Git users,
since it allows them to use "git submodule update --remote" to
manually update all submodules between RepoCommand runs.
[1] https://gerrit-review.googlesource.com/Documentation/user-submodules.html
Change-Id: I1a10861bcd0df3b3673fc2d481c8129b2bdac5f9
Signed-off-by: Stefan Beller <sbeller@google.com>
Clirr doesn't support Java 8 hence use japicmp instead.
See https://github.com/siom79/japicmp
Change-Id: If4b30a6d6aa849b4d6b3b0c900558c609822840c
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
This patch makes JGit parsing of ignore rules containing unmatched '['
bracket compatible to the Git CLI.
Since '[' starts character group, Git tries to parse the ignore rule as
a shell glob pattern and if the character group is not closed, the glob
pattern is invalid and so the ignore rule never matches anything.
See also http://article.gmane.org/gmane.comp.version-control.git/278699.
Bug: 478490
Change-Id: I734a4d14fcdd721070e3f75d57e33c2c0700d503
Signed-off-by: Andrey Loskutov <loskutov@gmx.de>
- Remove declaration of IOException that is no longer thrown
- Add missing //$NON-NLS-1$ to prevent "Non-externalized string literal"
warning.
These warnings seem to have been introduced by If13f7b406.
Change-Id: I30058eed31b92067a6ab22e787732b08e29f8d63
Signed-off-by: David Pursehouse <david.pursehouse@sonymobile.com>