When garbage collecting, we decide to reuse some bitmaps in older
history from the previous pack to save time. The remainder of commit
selection only involves commits not covered by those bitmaps.
Currently we carry that out in two ways:
1. by building a bitmap representing the already-covered commits,
for easy containment checks and AND-NOT-ing against
2. by marking the reused bitmap commits as uninteresting in the
RevWalk that finds new commits
The mechanism in (2) is less efficient than (1): rw.next() will walk
back from reused bitmap commits to check whether the commit it is
about to emit is an ancestor of them, whereas using the bitmap from (1)
lets us perform the same check with a single contains() call.
Add a RevFilter teaching the RevWalk to perform that same check
directly using the bitmap from (1).
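A rough sketch of such a filter (class and field names here are
illustrative, not the actual implementation), assuming the covered
commits are available as a BitmapBuilder:

  class NotInBitmapFilter extends RevFilter {
    private final BitmapBuilder covered; // the bitmap from (1)

    NotInBitmapFilter(BitmapBuilder covered) {
      this.covered = covered;
    }

    @Override
    public boolean include(RevWalk walker, RevCommit c) {
      // one containment check instead of walking back from the
      // reused bitmap commits
      return !covered.contains(c);
    }

    @Override
    public RevFilter clone() {
      return this;
    }
  }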
The next time the RevWalk is used, a different RevFilter is installed,
so this change does not break that later use.
A later commit will drop the markUninteresting calls.
No functional change intended except a possible speedup.
Change-Id: Ic375210fa69330948be488bf9dbbe4cb0d069ae6
Prior to this change, DfsInserter would not insert an object into a pack
if it already existed in another pack in the repository, even if that
pack was unreachable. Consider this sequence of events:
- Object FOO is pushed to a repository.
- Subsequent ref changes make FOO UNREACHABLE_GARBAGE.
- FOO is subsequently re-inserted using a DfsInserter, but skipped
due to existing in UNREACHABLE_GARBAGE.
- The repository is repacked; FOO will not be written into a new pack
because it is not yet reachable from a reference. If the
UNREACHABLE_GARBAGE packs are deleted, FOO disappears.
- A reference is updated to reference FOO. This reference is now broken
as FOO was removed when the repacking process deleted the
UNREACHABLE_GARBAGE pack that stored the only copy of FOO.
The garbage collector can't safely delete the UNREACHABLE_GARBAGE
pack because FOO might be in the middle of being re-inserted/re-packed.
This change writes a duplicate copy of an object if it only exists in
UNREACHABLE_GARBAGE. This "freshens" the object to give it a chance to
survive long enough to be made reachable through a reference.
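A simplified sketch of the new decision (db stands for the inserter's
DfsObjDatabase; packContains and writeToCurrentPack are stand-ins for
the real DfsInserter internals, not actual API):

  // Only treat the object as already present if some copy lives in a
  // pack that GC will not delete out from under us.
  boolean existsOutsideGarbage = false;
  for (DfsPackFile pack : db.getPacks()) {
    if (pack.getPackDescription().getPackSource()
        == PackSource.UNREACHABLE_GARBAGE)
      continue; // such packs may be pruned at any time
    if (packContains(pack, objectId)) {
      existsOutsideGarbage = true;
      break;
    }
  }
  if (!existsOutsideGarbage)
    writeToCurrentPack(type, data); // duplicate copy freshens the object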
Change-Id: I20f2062230f3af3bccd6f21d3b7342f1152a5532
Signed-off-by: Mike Williams <miwilliams@google.com>
Until 320a4142 (Update bitmap selection throttling to fully span
active branches, 2015-10-20), setupTipCommitBitmaps contained code
along the following lines:
  for (PackBitmapIndexRemapper.Entry entry : bitmapRemapper) {
    if (!reuse(entry))
      continue;
    RevCommit rc = (RevCommit) rw.peel(rw.parseAny(entry));
    reuseCommits.add(new BitmapCommit(rc));
    EWAHCompressedBitmap bitmap =
        remap.ofObjectType(remap.getBitmap(rc), OBJ_COMMIT);
    writeBitmaps.addBitmap(rc, bitmap, 0);
    reuse.add(rc, OBJ_COMMIT);
  }
  writeBitmaps.clearBitmaps(); // Remove temporary bitmaps
This loop OR-ed together bitmaps for commits whose bitmaps would be
reused. A subtle point is the use of the add() method, which ORs in a
bitmap from the BitmapIndex when it exists and falls back to OR-ing in
a single bit when that bitmap does not exist in the BitmapIndex.
Commit 320a4142 removed the addBitmap step, so the bitmap does not
exist in the BitmapIndex and the fallback behavior is triggered.
Simplify and restore the intended behavior by avoiding the subtle use
of the add() method --- use or() directly instead.
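For illustration, the difference between the two BitmapBuilder calls
(variable names are made up for the example):

  BitmapBuilder reuse = bitmapIndex.newBitmapBuilder();

  // add(): ORs in the commit's stored bitmap only if the BitmapIndex
  // has one; otherwise it silently falls back to setting a single bit.
  reuse.add(rc, Constants.OBJ_COMMIT);

  // or(): always folds in a bitmap we already hold, regardless of what
  // the BitmapIndex contains.
  Bitmap full = bitmapIndex.getBitmap(rc);
  if (full != null)
    reuse.or(full);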
Change-Id: I30844134bfde0cbabdfaab884c84b9809dd8bdb8
When packing is able to reuse lots of deltas from existing packs, those
objects are marked as "doNotAttemptDelta" and do not contribute to
DeltaTask's computeTopPaths() "totalWeight" calculation.
In the extreme case when all packs are reusable, "totalWeight" will be
zero. DeltaTask.partitionTasks() uses "totalWeight" to determine a
"weightPerThread" size it uses to set up DeltaTasks. When "totalWeight"
is small, partitionTasks() ends up creating a DeltaTask for every
unique path.
For a large repository, the small "weightPerThread" can result in the
creation of >100k tasks (for the MSM 3.10 Linux repository, the count
was ~150k). This makes the "task stealing" mechanism in DeltaTask very
inefficient, because every attempt to steal work does a linear walk
through all tasks, searching for the one with the most work remaining,
which is O(N^2) comparisons. For the MSM 3.10 repository when all
deltas were reusable, PackWriter.parallelDeltaSearch() took
(1615+1633+1458)/3 = 1568 seconds.
The error is that DeltaTask treats the weights of objects marked as
"doNotAttemptDelta" inconsistently. It ignores the weights when
calculating "totalWeight" but uses them when partitioning the tasks.
The fix is to also ignore them when partitioning the tasks.
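A sketch of the idea (accessor names are assumptions about the
ObjectToPack API, not the exact DeltaTask code):

  // Weight of a span of objects, for partitioning. Objects that will
  // not be delta compressed are skipped so the sum stays consistent
  // with how totalWeight is computed.
  long weightOf(ObjectToPack[] list, int start, int end) {
    long weight = 0;
    for (int i = start; i < end; i++) {
      ObjectToPack o = list[i];
      if (o.isDoNotAttemptDelta())
        continue;
      weight += o.getWeight();
    }
    return weight;
  }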
With this patch applied, PackWriter.parallelDeltaSearch() on the
MSM 3.10 repository when all deltas are reused went from taking
1568 seconds to 62ms (a more than 25,000x speedup).
This patch also fixes a totalWeight initialization error in
DeltaTask.computeTopPaths().
Change-Id: I2ae37efa83bca42b0e716266ae6aa9d182e76d9c
Signed-off-by: Terry Parker <tparker@google.com>
Avoid leaving the reader in suspense by handling the unusual
(!RevCommit) case first. As a nice side effect, there is less nesting
to keep track of in the rest of the loop body.
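Illustrative shape only (not the actual loop; handleNonCommit is a
hypothetical handler):

  for (RevObject obj : pending) {
    if (!(obj instanceof RevCommit)) {
      handleNonCommit(obj); // unusual case dealt with up front
      continue;
    }
    RevCommit commit = (RevCommit) obj;
    // ... common case, one nesting level shallower ...
  }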
No functional change intended.
Change-Id: I1580de444fccde08070f696218c12041151a924a
When the file <git-dir>/hooks/pre-push exists, make sure that it is
executed during a push. The pre-push hook runs during git push, after
the remote refs have been updated but before any objects have been
transferred.
Change-Id: Ibbb58ee3227742d1a2f913134ce11e7a135c7f4c
In order to support filters defined in gitattributes, FS.runProcess()
is made public. Support for stdin redirection has been added. Support
for binary data on stdin/stdout (as used by clean/smudge filters) has
been added.
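A usage sketch, assuming the newly public method keeps the
(ProcessBuilder, OutputStream, OutputStream, InputStream) shape;
"my-clean-filter" and rawBytes are placeholders for an external filter
command and its binary input:

  FS fs = FS.DETECTED;
  ProcessBuilder pb = fs.runInShell("my-clean-filter", new String[0]);
  ByteArrayOutputStream stdout = new ByteArrayOutputStream();
  ByteArrayOutputStream stderr = new ByteArrayOutputStream();
  // binary content is streamed to the filter's stdin unmodified
  InputStream stdin = new ByteArrayInputStream(rawBytes);
  int rc = fs.runProcess(pb, stdout, stderr, stdin);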
Change-Id: Ice2c152e9391368dc5748d7b825a838e3eb755f9
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Fixed random errors in discoverGitSystemConfig() on Linux where the
process error stream was closed by readPipe() before or while
GobblerThread was reading from it.
Marked readPipe() as @Nullable and fixed potential NPE in
discoverGitSystemConfig() on readPipe() return value.
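A sketch of the null handling (the readPipe() arguments shown are
assumptions for the example):

  String w = readPipe(gitExe.getParentFile(),
      new String[] { "git", "config", "--system", "--edit" },
      Charset.defaultCharset().name());
  if (StringUtils.isEmptyOrNull(w))
    return null; // readPipe() is @Nullable; bail out instead of NPE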
Fixed process error output being randomly mixed with other threads'
log messages.
Change-Id: Id882af2762cfb75f010f693b2e1c46eb6968ee82
Signed-off-by: Andrey Loskutov <loskutov@gmx.de>
When commits are selected for bitmap generation, they are reordered
so that related "chains" of commits are grouped together. Chains are
"subbranches" of commits that may branch off of and re-merge with the
main line. Grouping by chains means that the XOR difference between
consecutive selected commits will be smaller, resulting in better
run-length compression of the XORed bitmaps.
Add a new testSelectionOrderingWithChains() test in a new
GcCommitSelectionTest test class. Also move related GC commit selection
tests out of GcBasicPackingTest and into GcCommitSelectionTest.
Change-Id: I8e80cac29c4ca8193b41c9898e5436c22a659f11
Signed-off-by: Terry Parker <tparker@google.com>
Add comments and rename variables in PackWriterBitmapPreparer to improve
readability.
Change-Id: I49e7a1c35307298f7bc32ebfc46ab08e94290125
Signed-off-by: Terry Parker <tparker@google.com>
The Java compiler must generate synthetic access methods for private
methods and fields of the enclosing class if they are accessed from
inner classes and vice versa.
While invisible in the code, those synthetic access methods exist in the
bytecode and seem to produce some extra execution overhead at runtime
(compared with direct access to these fields or methods), see
https://git.eclipse.org/r/58948/.
By removing the "private" access modifier from affected methods and
fields we help the compiler avoid generating synthetic access methods
and hope to improve execution performance.
To validate the changes, one can either use javap or the Bytecode
Outline plugin in Eclipse. In both cases one should look for "synthetic
access$<number>" methods at the end of the class and inner class files
in question - there should be none.
NB: don't mix these "synthetic access$" methods up with the "public
synthetic bridge" methods generated when overriding generic methods or
using covariant return types.
Change-Id: I94fb481b68c84841c1db1a5ebe678b13e13c962b
Signed-off-by: Andrey Loskutov <loskutov@gmx.de>
The Java compiler must generate synthetic access methods for private
methods and fields of the enclosing class if they are accessed from
inner classes and vice versa.
While invisible in the code, those synthetic access methods exist in the
bytecode and seem to produce some extra execution overhead at runtime
(compared with direct access to these fields or methods), see
https://git.eclipse.org/r/58948/.
By removing the "private" access modifier from affected methods and
fields we help the compiler avoid generating synthetic access methods
and hope to improve execution performance.
To validate the changes, one can either use javap or the Bytecode
Outline plugin in Eclipse. In both cases one should look for "synthetic
access$<number>" methods at the end of the class and inner class files
in question - there should be none.
NB: don't mix these "synthetic access$" methods up with the "public
synthetic bridge" methods generated when overriding generic methods or
using covariant return types.
Change-Id: Ie7b65f251ec4452d5a5ed48aa0f272cf49a9aecd
Signed-off-by: Andrey Loskutov <loskutov@gmx.de>
The Java compiler must generate synthetic access methods for private
methods and fields of the enclosing class if they are accessed from
inner classes and vice versa.
While invisible in the code, those synthetic access methods exist in the
bytecode and seem to produce some extra execution overhead at runtime
(compared with direct access to these fields or methods), see
https://git.eclipse.org/r/58948/.
By removing the "private" access modifier from affected methods and
fields we help the compiler avoid generating synthetic access methods
and hope to improve execution performance.
To validate the changes, one can either use javap or the Bytecode
Outline plugin in Eclipse. In both cases one should look for "synthetic
access$<number>" methods at the end of the class and inner class files
in question - there should be none.
NB: don't mix these "synthetic access$" methods up with the "public
synthetic bridge" methods generated when overriding generic methods or
using covariant return types.
Change-Id: I0ebaeb2bc454cd8051b901addb102c1a6688688b
Signed-off-by: Andrey Loskutov <loskutov@gmx.de>
The Java compiler must generate synthetic access methods for private
methods and fields of the enclosing class if they are accessed from
inner classes and vice versa.
While invisible in the code, those synthetic access methods exist in the
bytecode and seem to produce some extra execution overhead at runtime
(compared with direct access to these fields or methods), see
https://git.eclipse.org/r/58948/.
By removing the "private" access modifier from affected methods and
fields we help the compiler avoid generating synthetic access methods
and hope to improve execution performance.
To validate the changes, one can either use javap or the Bytecode
Outline plugin in Eclipse. In both cases one should look for "synthetic
access$<number>" methods at the end of the class and inner class files
in question - there should be none.
NB: don't mix these "synthetic access$" methods up with the "public
synthetic bridge" methods generated when overriding generic methods or
using covariant return types.
Change-Id: If53ec94145bae47b74e2561305afe6098012715c
Signed-off-by: Andrey Loskutov <loskutov@gmx.de>
Expose the following bitmap selection parameters via PackConfig:
"bitmapContiguousCommitCount", "bitmapRecentCommitCount",
"bitmapRecentCommitSpan", "bitmapDistantCommitSpan",
"bitmapExcessiveBranchCount", and "bitmapInactiveBranchAge".
The value of bitmapContiguousCommitCount, whereby bitmaps are
created for the most recent N commits in a branch, has never
been verified. If experiments show that they are not valuable,
then we can simplify the implementation so that there is only
a concept of recent and distant commit history (defined by
"bitmapRecentCommitCount"), and the only controls we need are
"bitmapRecentCommitSpan" and "bitmapDistantCommitSpan".
Change-Id: I288bf3f97d6fbfdfcd5dde2699eff433a7307fb9
Signed-off-by: Terry Parker <tparker@google.com>
Replace the “bitmapCommitRange” parameter that was recently introduced
with two new parameters: “bitmapExcessiveBranchCount” and
“bitmapInactiveBranchAgeInDays”. If the count of branches does not
exceed “bitmapExcessiveBranchCount”, then the current algorithm is kept
for all branches.
If the branch count is excessive, then the commit time for the tip
commit for each branch is used to determine if a branch is “inactive”.
"Active" branches get full commit selection using the existing
algorithm. "Inactive" branches get fewer bitmaps near the branch tips.
Introduce a "contiguousCommitCount" parameter that always enforces that
the N most recent commits in a branch are selected for bitmaps. The
previous nextSelectionDistance() algorithm created anywhere from 1-100
contiguous bitmaps at branch tips.
For example, consider a branch with commits numbering 0-300, with 0
being the most recent commit. If the most recent 200 commits are not
merge commits and the 200th commit was the last one selected,
nextSelectionDistance() returned 100, causing commits 200-101 to be
ignored. Then a window of size 100 was evaluated, searching for merge
commits. Since no merge commits are found, the next commit (commit 0)
was selected, for a total of 1 commit in the topmost 100 commits.
If instead the 250th commit was selected, then by the same logic
commit 50 is selected. At that point nextSelectionDistance() switches to
selecting consecutive commits, so commits 0-50 in the topmost 100
commits are selected. The "contiguousCommitCount" parameter provides
more determinism by always selecting a constant number of topmost
commits.
Add an optimization to break out of the inner loop of selectCommits() if
all of the commits for the current branch have already been found.
When reusing bitmaps from an existing pack, remove unnecessary
populating and clearing of the writeBitmaps/PackBitmapIndexBuilder.
Add comments to PackWriterBitmapPreparer, rename methods and variables
for readability.
Add tests for bitmap selection with and without merge commits and with
excessive branch pruning triggered.
Note: I will follow up with an additional change that exposes the new
parameters through PackConfig.
Change-Id: I5ccbb96c8849f331c302d9f7840e05f9650c4608
Signed-off-by: Terry Parker <tparker@google.com>
LocalDiskRepositoryTestCase and TestRepository have competing ideas
about time. Push them into MockSystemReader so they can
cooperate.
Rename getClock() methods that return Dates to getDate().
Change-Id: Ibbd9fe7f85d0064b0a19e3b675b9718a9e67c479
Signed-off-by: Terry Parker <tparker@google.com>
Since we use the reporting plugins only in the parent pom.xml, there is
no point in using the new pluginManagement tag (introduced to fix
https://issues.apache.org/jira/browse/MSITE-443) in the reporting
section.
Change-Id: I750ca3765e95afb06609a362fb3354afc3b66b90
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Building on top of https://git.eclipse.org/r/#/c/56391/
Here we preserve compatibility with JetS3t
and add 2 new native JGit encryption implementations.
For reference, see connection configuration files:
* Version 0: jgit-s3-connection-v-0.properties
* Version 1: jgit-s3-connection-v-1.properties
* Version 2: jgit-s3-connection-v-2.properties
Change-Id: I713290bcacbe92d88e5ef28ce137de73dd1abe2f
Signed-off-by: Andrei Pozolotin <andrei.pozolotin@gmail.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
See previous attempt: https://git.eclipse.org/r/#/c/16674/
Here we preserve as much of JetS3t mode as possible
while allowing the use of new Java 8+ PBE algorithms
such as PBEWithHmacSHA512AndAES_256
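For reference, a standalone JCE sketch (not JGit code) of using such an
algorithm; password, salt, iteration count and IV are placeholders:

  SecretKeyFactory skf =
      SecretKeyFactory.getInstance("PBEWithHmacSHA512AndAES_256");
  SecretKey key = skf.generateSecret(new PBEKeySpec(password));
  Cipher cipher = Cipher.getInstance("PBEWithHmacSHA512AndAES_256");
  cipher.init(Cipher.ENCRYPT_MODE, key,
      new PBEParameterSpec(salt, 10000, new IvParameterSpec(iv)));
  byte[] enc = cipher.doFinal(plainText);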
Summary of changes:
* change pom.xml to control long tests
* add WalkEncryptionTest.launch to run long tests
* add AmazonS3.Keys to normalize the use of constants
* change WalkEncryption to support AES in JetS3t mode
* add WalkEncryptionTest to test remote encryption pipeline
* add support for CI configuration for live Amazon S3 testing
* add log4j based logging for tests in both Eclipse and Maven build
To test locally, check out the review branch, then:
* create an Amazon test configuration file
* located in your home dir: ${user.home}
* named jgit-s3-config.properties
* file format follows AmazonS3 connection settings file:
accesskey = your-amazon-access-key
secretkey = your-amazon-secret-key
test.bucket = your-bucket-for-testing
* finally:
* run in Eclipse: WalkEncryptionTest.launch
* or
* run in Shell: mvn test --define test=WalkEncryptionTest
Change-Id: I6f455fd9fb4eac261ca73d0bec6a4e7dae9f2e91
Signed-off-by: Andrei Pozolotin <andrei.pozolotin@gmail.com>
At least on Windows, the test failed every other run on the last assert.
Adding a small timeout before gc.prune() makes the test stable again.
Change-Id: I23d98dd565912c58dcf2f24f3ebc24824670cff3
Signed-off-by: Andrey Loskutov <loskutov@gmx.de>
RepoCommandTest was failing because of an open file handle being left
behind.
IgnoreNodeTest was failing because of problems with creation of files
with trailing spaces on Windows.
HookTest was failing because of a wrong line delimiter.
Change-Id: I34f074ac447eb4c3ada8b250309bb568b426189d
Signed-off-by: Andrey Loskutov <loskutov@gmx.de>
If the checkout path is currently a non-empty directory (and was a link
or a regular file before), this directory will be removed before
performing checkout, but only if the checkout path is specified.
Bug: 474973
Change-Id: Ifc6c61592d9b54d26c66367163acdebea369145c
Signed-off-by: Andrey Loskutov <loskutov@gmx.de>
A bitmap index contains bitmaps for a set of commits in a pack file.
Creating a bitmap for every commit is too expensive, so heuristics
select the most "important" commits. The most recent commits are the
most valuable. To clone a repository only those for the branch tips are
needed. When fetching, only commits since the last fetch are needed.
The commit selection heuristics generally work, but for some
repositories the number of selected commits is prohibitively high. One
example is the MSM 3.10 Linux kernel. With over 1 million commits on
2820 branches, the current heuristics resulted in more than 36k
selected commits.
Each uncompressed bitmap for that repository is ~413k, making it
difficult to complete a GC operation in available memory.
The benefit of creating bitmaps over the entire history of a repository
like the MSM 3.10 Linux kernel isn't clear. For that repository, most
history for the last year appears to be in the last 100k commits.
Limiting bitmap commit selection to just those commits reduces the count
of selected commits from ~36k to ~10.5k. Dropping bitmaps for older
commits does not affect object counting times for clones or for fetches
on clients that are reasonably up-to-date.
This patch defines a new "bitmapCommitRange" PackConfig parameter to
limit the commit selection process when building bitmaps. The range
starts with the most recent commit and walks backwards. A range of 10k
considers only the 10000 most recent commits. A range of zero creates
bitmaps only for branch tips. A range of -1 (the default) does not limit
the range--all commits in the pack are used in the commit selection
process.
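A usage sketch (the setter name is assumed from the option name):

  PackConfig cfg = new PackConfig(repo);
  cfg.setBitmapCommitRange(10000); // only the 10000 most recent commits
  // 0  -> bitmaps for branch tips only
  // -1 -> no limit (default): all commits in the pack are considered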
Change-Id: Ied92c70cfa0778facc670e0f14a0980bed5e3bfb
Signed-off-by: Terry Parker <tparker@google.com>
Previously the method DirCacheCheckoutTest#assertWorkDir() silently
skipped over empty folders. If tests left unexpected empty folders in
the worktree, this would be overlooked. Now empty folders have to be
specified by something like mkmap("<foldername>", "/", ...)
Change-Id: Idb8b270e92daf02ecdc381d148a5958bd83ec057
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
On a server also running Gerrit that is using RepoCommand to
convert from an XML manifest to a git submodule superproject
periodically, it would be handy to be able to use Gerrit's
submodule subscription feature[1] to update the superproject
automatically between RepoCommand runs as changes are merged
in each subproject.
This requires setting the 'branch' field for each submodule
so that Gerrit knows what branch to watch. Add an option to
do that.
Setting the branch field also is useful for plain Git users,
since it allows them to use "git submodule update --remote" to
manually update all submodules between RepoCommand runs.
[1] https://gerrit-review.googlesource.com/Documentation/user-submodules.html
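A usage sketch (the setter name for the new option and the manifest
URI are assumptions for the example):

  RepoCommand cmd = new RepoCommand(repo);
  cmd.setPath("default.xml")
     .setURI("https://example.googlesource.com/")
     .setRecordRemoteBranch(true) // record "branch" for each submodule
     .call();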
Change-Id: I1a10861bcd0df3b3673fc2d481c8129b2bdac5f9
Signed-off-by: Stefan Beller <sbeller@google.com>