The TreeWalk filtering classes need to support the three different
meanings of the return value the path comparison generates.
A new path comparison method (isPathMatch) is created with
three distinct return values (isPathPrefix use value '0' to
encode two of these) which will makes it possible for the logical
operators (especially NOT) to aggregate a correct verdict.
A filter like: AND(Path("path"), NOT(Path("path/to/other")))
Should filter out 'path/to/other/file', but not 'path/to/my/file'.
The path-limiting feature when testing path/to/my/file, would
result to run test for the following paths:
path
path/to
path/to/my
path/to/my/file
isPathPrefix('path/to/other') will return '0' for the first two
and since there is no way for NOT to distinguish between an exact
match and a match indicating that the tested path is a 'parent',
it will incorrectly return false and thus remove everything below
'path' immediately.
isPathMatch has a distinguished value for 'parent' matches that
will be preserved through the logic operators and should not
cause an over-eager removal of paths.
The functionality of isPathPrefix is required by other parts
and is untouched.
Unit tests are included to ensure that the logical functionality
is correct and can be preserved.
Change-Id: Ice2ca9406f09f1b179569e99b86a0e5d77baa20d
Signed-off-by: Magnus Vigerlöf <magnus.vigerlof@gmail.com>
Allow SHA1 instances to be reused to compute another hash value, and
resume caching them in ObjectInserter and PackParser. This shaves a
small amount of running time off parsing git.git's pack file:
before after
------ ------
25.25s 25.55s
25.48s 25.06s
25.26s 24.94s
Almost noise (small difference), but recycling the instances reduces
some stress on the memory allocator finding two 80 word message block
arrays needed for hashing and collision detection.
Change-Id: I4af88a720e81460293bc5c5d1d3db1a831e7e228
This implementation is derived straight from the description written
in RFC 3174. On Mac OS X with Java 1.8.0_91 it offers similar
throughput as MessageDigest SHA-1:
system 239.75 MiB/s
system 244.71 MiB/s
system 245.00 MiB/s
system 244.92 MiB/s
sha1 234.08 MiB/s
sha1 244.50 MiB/s
sha1 242.99 MiB/s
sha1 241.73 MiB/s
This is the fastest implementation I could come up with. Common SHA-1
implementation tricks such as unrolling loops creates a method too
large for the JIT to effectively optimize, resulting in lower overall
hashing throughput. Using a preprocessor to perform the register
renaming of A-E also didn't help, as again the method was too large
for the JIT to effectively optimize.
Fortunately the fastest version is a naive, straight-forward
implementation very close to the description in RFC 3174.
Change-Id: I228b05c4a294ca2ad51386cf0e47978c68e1aa42
Since the introduction of generic type parameter inference in Java 7,
it's not necessary to explicitly specify the type of generic parameters.
Enable the warning in Eclipse, and fix all occurrences.
Change-Id: I9158caf1beca5e4980b6240ac401f3868520aad0
Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>
Set missingOverrideAnnotation=warning in Eclipse compiler preferences
which enables the warning:
The method <method> of type <type> should be tagged with @Override
since it actually overrides a superclass method
Justification for this warning is described in:
http://stackoverflow.com/a/94411/381622
Enabling this causes in excess of 1000 warnings across the entire
code-base. They are very easy to fix automatically with Eclipse's
"Quick Fix" tool.
Fix all of them except 2 which cause compilation failure when the
project is built with mvn; add TODO comments on those for further
investigation.
Change-Id: I5772061041fd361fe93137fd8b0ad356e748a29c
Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>
Archived zip files for a same commit have different MD5 hash because
mdate and mdate in the header of zip entries are not specified. In
this case, Commons Compress sets an archived time.
In the original git implementation, it's set a commit time:
e2b2d6a172/archive.c (L378)
By this fix, archive command sets the commit time to ZipArchiveEntry
when RevCommit is given as an archiving target.
Change-Id: I30dd8710e910cdf42d57742f8709e9803930a123
Signed-off-by: Naoki Takezoe <takezoe@gmail.com>
Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
If the pruneexpire config is set to "now", then any unreferenced loose
objects are immediately eligible for gc. So there is no need to
actually write the loose objects.
Users who run hosting services which sometimes accept large, entirely
garbage packs might set the following configurations:
gc.pruneExpire = now
gc.prunePackExpire = 2.weeks
Then garbage objects will be kept around in packs, but after two weeks
the packs themselves will get deleted.
For client-side users of jgit, the default settings will loosen
garbage objects, and, after an hour, delete the old packs in which
they resided.
Change-Id: I8f686ac60b40181b1ee92ac6c313c3f33b55c44c
Signed-off-by: David Turner <dturner@twosigma.com>
Place a configurable upper bound on the amount of command data
received from clients during `git push`. The limit is applied to the
encoded wire protocol format, not the JGit in-memory representation.
This allows clients to flexibly use the limit; shorter reference names
allow for more commands, longer reference names permit fewer commands
per batch.
Based on data gathered from many repositories at $DAY_JOB, the average
reference name is well under 200 bytes when encoded in UTF-8 (the wire
encoding). The new 3 MiB default receive.maxCommandBytes allows about
11,155 references in a single `git push` invocation. A Gerrit Code
Review system with six-digit change numbers could still encode 29,399
references in the 3 MiB maxCommandBytes limit.
Change-Id: I84317d396d25ab1b46820e43ae2b73943646032c
Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
submoduleStandalone is created by createWorkRepository() which adds
the created repository to the set of repositories to be closed in
the test teardown. It is therefore not necessary to explicitly close
it.
Change-Id: Ib6f525b644fdeaaf1934df39cc2d3583a0d883dc
Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>
When rebasing, force-pushing has a race condition: someone else might
have pushed a commit since the one you just rewrote. The force-with-lease
option prevents this by ensuring that the ref's old value is the one
that you expected.
Change-Id: I97ca9f8395396c76332bdd07c486e60549ca4401
Signed-off-by: David Turner <dturner@twosigma.com>
In 8ac65d33ed PackWriter changed its
behavior to always prefer the last object representation presented
to it by the ObjectReuseAsIs implementation. This was a fix to avoid
delta chain cycles.
Unfortunately it can lead to suboptimal compression when concurrent
GCs are run on the same repository. One case is automatic GC running
(with default settings) in parallel to a manual GC that has disabled
delta reuse in order to generate new smaller deltas for the entire
history of the repository.
Running GC with no-reuse generally requires more CPU time, which
also translates to a longer running time. This can lead to a race
where the automatic GC completes before the no-reuse GC, leaving
the repository in a state such as:
no-reuse GC: size 1 GiB, mtime = 18:45
auto GC: size 8 GiB, mtime = 17:30
With the default sort ordering, the smaller no-reuse GC pack is
sorted earlier in the pack list, due to its more recent mtime.
During object reuse in a future GC, these smaller representations
are considered first by PackWriter, but are all discarded when the
auto GC file from 17:30 is examined second (due to its older mtime).
Work around this in two ways.
Well formed DFS repositories should have at most 1 GC pack. If
2 or more GC packs exist, break the sorting tie by selecting the
smaller file earlier in the pack list. This allows all normal read
code paths to favor the smaller file, which places less pressure
on the DfsBlockCache. If any GC race happens, readers serving clone
requests will prefer the file that is smaller.
During object reuse, flip this ordering so that the smaller file is
last. This allows PackWriter to see smaller deltas last, replacing
larger representations that were previously considered from other
pack files.
Change-Id: I0b7dc8bb9711c82abd6bd16643f518cfccc6d31a
Delta search was discarding discovered deltas if an object appeared
near a type boundary in the delta search window. This has caused JGit
to produce larger pack files than other implementations of the packing
algorithm.
Delta search works by pushing prior objects into a search window, an
ordered list of objects to attempt to delta compress the next object
against. (The window size is bounded, avoiding O(N^2) behavior.)
For implementation reasons multiple object types can appear in the
input list, and the window. PackWriter commonly passes both trees and
blobs in the input list handed to the DeltaWindow algorithm. The pack
file format requires an object to only delta compress against the same
type, so the DeltaWindow algorithm must stop doing comparisions if a
blob would be compared to a tree.
Because the input list is sorted by object type and the window is
recently considered prior objects, once a wrong type is discovered in
the window the search algorithm stops and uses the current result.
Unfortunately the termination condition was discarding any found
delta by setting deltaBase and deltaBuf to null when it was trying
to break the window search.
When this bug occurs, the state of the DeltaWindow looks like this:
current
|
\ /
input list: tree0 tree1 blob1 blob2
window: blob1 tree1 tree0
/ \
|
res.prev
As the loop iterates to the right across the window, it first finds
that blob1 is a suitable delta base for blob2, and temporarily holds
this in the bestDelta/deltaBuf fields. It then considers tree1, but
tree1 has the wrong type (blob != tree), so the window loop must give
up and fall through the remaining code.
Moving the condition up and discarding the window contents allows
the bestDelta/deltaBuf to be kept, letting the final file delta
compress blob1 against blob0.
The impact of this bug (and its fix) on real world repositories is
likely minimal. The boundary from blob to tree happens approximately
once in the search, as the input list is sorted by type. Only the
first window size worth of blobs (e.g. 10 or 250) were failing to
produce a delta in the final file.
This bug fix does produce significantly different results for small
test repositories created in the unit test suite, such as when a pack
may contains 6 objects (2 commits, 2 trees, 2 blobs). Packing test
cases can now better sample different output pack file sizes depending
on delta compression and object reuse flags in PackConfig.
Change-Id: Ibec09398d0305d4dbc0c66fce1daaf38eb71148f
Disabling the garbage pack coalescing when garbageTtl > 0 can result in
lot of garbage packs if they are created within the garbageTtl time.
To avoid a large number of garbage packs, re-introducing garbage pack
coalescing for the packs that are created within a single calendar day
when the garbageTtl is more than one day or one third of the garbageTtl.
Change-Id: If969716aeb55fb4fd0ff71d75f41a07638cd5a69
Signed-off-by: Thirumala Reddy Mutchukota <thirumala@google.com>
Cover the case where the exception is wrapped up as a
cause, e.g., PackIndex#open(File).
Change-Id: I0df5b1e9c2ff886bdd84dee3658b6a50866699d1
Signed-off-by: Hongkai Liu <hongkai.liu@ericsson.com>
JGit now requires Java 8, so it is no longer necessary to have a
separate class for Java 7 specific tests. Remove it and merge its
tests into the existing FileTreeIteratorTest.
FileTreeIteratorTest has an @Before annotated method that sets up
some files in the git, which breaks the tests which have assumptions
on the file names. Add adjustments.
Change-Id: I14f88d8e079e1677c8dfbc1fcbf4444ea8265365
Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Rename the test class to match the name of the class under test.
JGit now requires Java 8 so it is no longer necessary to have a
separate class (FileUtils7Test) for Java 7 tests. Merge those into
FileUtilsTest.
Change-Id: I39dd7e76a2e4ce97319c7d52261b0a1546879788
Signed-off-by: Hongkai Liu <hongkai.liu@ericsson.com>
Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>
An orphan file is either a bitmap or an idx file in pack folder,
and its corresponding pack file is missing.
Change-Id: I3c4cb1f7aa99dd7b398bdb8d513f528d7761edff
Signed-off-by: Hongkai Liu <hongkai.liu@ericsson.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
The tearDown() of the superclass closed the repository once more which
led to a negative use count warning logged by Repository.close().
Change-Id: I331f85a540c68264a53456276c32f72b79113d61
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
createBareRepository adds the created repo to the list of repos to be
closed in the superclass's teardown. Wrapping it in try-with-resource
causes it to be closed too many times, resulting in a corrupt use
count.
Change-Id: I4c70630bf6008544324dda453deb141f4f89472c
Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>
The repositories get added to the "toClose" set by createBareRepository,
and are then closed in the superclass's tearDown method.
Explicitly closing them in this test class's teardown causes the use
count to go negative when subsequently closed again by the superclass.
Change-Id: Idcbb16b4cf4bf0640d7e4ac15d1926d8a27c1251
Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>
These methods add the created Repository into "toClose", and they are
then closed by LocalDiskRepositoryTestCase's tearDown method.
Calling them in try-with-resource causes them to first be closed in
the test method, and then again in tearDown, which results in the use
count going negative and a log message on the console.
While this is not a serious problem, having so many false positives
in the logs will potentially drown out real cases of Repository being
closed too many times.
Change-Id: Ib374445e101dc11cb840957b8b19ee1caf777392
Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>
The repositories are already closed in the superclass teardown due
to being added to the "toClose" set.
Change-Id: I768ba8a02fc585907687caf37e2e283434016c04
Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>
Otherwise these methods may produce unexpected results if used for
strings that are intended to be interpreted locale independently.
Examples are programming language identifiers, protocol keys, and HTML
tags. For instance, "TITLE".toLowerCase() in a Turkish locale returns
"t\u0131tle", where '\u0131' is the LATIN SMALL LETTER DOTLESS I
character.
See
https://docs.oracle.com/javase/8/docs/api/java/lang/String.html#toLowerCase--http://blog.thetaphi.de/2012/07/default-locales-default-charsets-and.html
Bug: 511238
Change-Id: Id8d8f37d84d62239c918b81f8d883ed798d87656
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
The Compacter and Garbage Collector will record the estimated size of
the newly going to be created compact, gc or garbage packs. This
information can be used by the clients to better make a call on how to
actually store the pack based on the approximated expected size.
Added a new protected method DfsObjDatabase.newPack(PackSource
packSource, long estimatedPackSize), so that the clients can override
this method to make use of the estimatedPackSize while creating a new
PackDescription object. The default implementation of this method is
equivalent to
newPack(packSource).setEstimatedPackSize(estimatedPackSize). I didn't
make it abstract because that would force all the existing sub classes
of DfsObjDatabase to implement this method. Due to this default
implementation, the estimatedPackSize is added to DfsPackDescription
using a setter instead of a constructor parameter (even though
constructor parameter would be a better choice as this value is set only
during the object creation).
Change-Id: Iade1122633ea774c2e842178a6a6cbb4a57b598b
Signed-off-by: Thirumala Reddy Mutchukota <thirumala@google.com>
An unreferenced object might appear in a pack. This could only happen
because it was previously referenced, and then later that reference
was removed. When we gc, we copy the referenced objects into a new
pack, and delete the old pack. This would remove the unreferenced
object. Now we first create a loose object from any unreferenced
object in the doomed pack. This kicks off the two-week grace period
for that object, after which it will be collected if it's not
referenced.
This matches the behavior of regular git.
Change-Id: I59539aca1d0d83622c41aa9bfbdd72fa868ee9fb
Signed-off-by: David Turner <dturner@twosigma.com>
Signed-off-by: Jonathan Nieder <jrn@google.com>
The new --preserve-oldpacks option moves old pack files into the
preserved subdirectory instead of deleting them after repacking.
The new --prune-preserved option prunes old pack files from the
preserved subdirectory after repacking, but before potentially
moving the latest old packfiles to this subdirectory.
These options are designed to prevent stale file handle exceptions
during git operations which can happen on users of NFS repos when
repacking is done on them. The strategy is to preserve old pack files
around until the next repack with the hopes that they will become
unreferenced by then and not cause any exceptions to running processes
when they are finally deleted (pruned).
Change-Id: If3f729f0d9ce920ee2c3e6acdde46f2068be61d2
Signed-off-by: James Melvin <jmelvin@codeaurora.org>
Generic normalization method for a possible invalid branch name.
The method compresses dividers between spaces, then replaces spaces
and non word characters with underscores.
This method is needed in preparation for subsequent EGit changes.
Bug: 509878
Change-Id: Ic0d12f098f90f912a45bcc5693d6accf751d4e58
Signed-off-by: Wim Jongman <wim.jongman@remainsoftware.com>
If there are untracked changes, apply only the untracked tree
after a successful merge. The merge tree from merging untracked
with HEAD would also contain files already reset before (changes
in tracked files) and try to reset those again,leading to false
checkout conflicts.
Bug: 505804
Change-Id: Iaced4d277623334d11e3d1cca5969590d7c5093e
Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
ObjectDirectory.getShallowCommits should throw an IOException
instead of an InvalidArgumentException if invalid SHAs are present
in .git/shallow (as this file is usually edited by a human).
Change-Id: Ia3a39d38f7aec4282109c7698438f0795fbec905
Signed-off-by: Marc Strapetz <marc.strapetz@syntevo.com>
Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>
This fixes a nasty performance issue for repositories that have many
objects referenced through refs/tags/, but not in refs/heads/.
Situations like this can arise when a project has made releases like
refs/tags/v1.0, and then decides to orphan history and start over for
version 2. The v1.0 objects are not reachable from master anymore,
but are still live due to the v1.0 tag.
When tags are packed in the GC_OTHER pack, bitmaps are not able to
cover the repository's contents. This may cause very slow counting
times during git clone, as the server must enumerate the ancient
history under refs/tags/ to respond to the client.
Clients by default always ask for all tags when asking for all heads
during clone. This has been true since git-core commit 8434c2f1afedb
(Apr 27 2008), when clone was converted to a builtin. Including tags
in the main GC pack should still allow servers to benefit from the
fast full pack reuse path when serving a clone to a client.
Change-Id: I22e29517b5bc6fa3d6b19a19f13bef0c68afdca3
Previously it was looking for a keep file with the name of a pack file
(extenstion included) appended with a '.keep'. However, the keep file
name should be the pack file name with a '.keep' extension
Change-Id: I9dc4c7c393ae20aefa0b9507df8df83610ce4d42
Signed-off-by: James Melvin <jmelvin@codeaurora.org>
FileSnapshot.isModified may have reported a file to be clean although it
was actually dirty.
Imagine you have a FileSnapshot on file f. lastmodified and lastread are
both t0. Now time is t1 and you
1) modify the file
2) update the FileSnapshot of the file (lastModified=t1, lastRead=t1)
3) modify the file again
4) wait 3 seconds
5) ask the Filesnapshot whether the file is dirty or not. It erroneously
answered it's clean.
Any file which has been modified longer than 2.5 seconds ago was
reported to be clean. As the test shows that's not always correct.
The real-world problem fixed by this change is the following:
* A gerrit server using JGit to serve git repositories is processing
fetch requests while simultaneously a native git garbage collection
runs on the repo.
* At time t1 native git writes temporary files in the pack folder
setting the mtime of the pack folder to t1.
* A fetch request causes JGit to search for new packfiles and JGit
remembers this scan in a Filesnapshot on the packs folder. Since the gc
is not finished JGit doesn't see any new packfiles.
* The fetch is processed and the gc ends while the filesystem timer is
still t1. GC writes a new packfile and deletes the old packfile.
* 3 seconds later another request arrives. JGit does not yet know about
the new packfile but is also not rescanning the pack folder because it
cached that the last scan happened at time t1 and pack folder's mtime is
also t1. Now JGit will not be able to resolve any object contained in
this new pack. This behavior may be persistent if objects referenced by
the ref/meta/config branch are affected so gerrit can't read permissions
stored in the refs/meta/config branch anymore and will not allow any
pushes anymore. The pack folder will not change its mtime and therefore
no rescan will take place.
Change-Id: I3efd0ccffeb97b01207dc3e7a6b85c6b06928fad
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Fix JGits merge-base calculation in case of inconsistent commit times.
JGit was potentially failing to compute correct merge-bases when the
commit times where inconsistent (a parent commit was younger than a
child commit). The code in MergeBaseGenerator was aware of the fact that
sometimes the discovery of a merge base x can occur after the parents of
x have been seen (see comment in #carryOntoOne()). But in the light of
inconsistent commit times it was possible that these parents of a
merge-base have already been returned as a merge-base.
This commit fixes the bug by buffering all commits generated by
MergeBaseGenerator. It is expected that this buffer will be small
because the number of merge-bases will be small. Additionally a new
flag is used to mark the ancestors of merge-bases. This allows to filter
out the unwanted commits.
Bug: 507584
Change-Id: I9cc140b784c3231b972bd2c3de61a789365237ab
Use Oxygen M3 Orbit repository which provides the bundles built using
the new orbit-recipe based build.
CQ: 11658
Change-Id: I7f3dcc966732b32830c75d5daa55383bd028d182
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Java 8 fixed the silent flush during close issue by
FilterOutputStream (base class of BufferedOutputStream)
using try-with-resources to close the stream, getting a
behavior matching what JGit's SafeBufferedOutputStream
was doing:
try {
flush();
} finally {
out.close();
}
With Java 8 as the minimum required version to run JGit
it is no longer necessary to override close() or have
this class. Deprecate the class, and use the JRE's version
of close.
Change-Id: Ic0584c140010278dbe4062df2e71be5df9a797b3
This method pair allows the caller to read and modify the description
file that is traditionally used by gitweb and cgit when rendering a
repository on the web.
Gerrit Code Review has offered this feature for years as part of
its GitRepositoryManager interface, but its fundamentally a feature
of JGit and its Repository abstraction.
git-core typically initializes a repository with a default value
inside the description file. During getDescription() this string
is converted to null as it is never a useful description.
Change-Id: I0a457026c74e9c73ea27e6f070d5fbaca3439be5
In case a value is used which isn’t a power of 2 there will be a high
chance of java.lang.ArrayIndexOutBoundsException and
org.eclipse.jgit.errors.CorruptObjectException due to a mismatching
assumption for the DfsBlockCache#blockSizeShift parameter.
Change-Id: Ib348b3704edf10b5f93a3ffab4fa6f09cbbae231
Signed-off-by: Philipp Marx <smigfu@googlemail.com>
* GC.tooManyLooseObjects() always responded true since the loop missed
to advance the iterator so it always incremented until the threshold was
exceeded.
* Also fix loop exit criterion which was off by 1.
* Add some tests.
Change-Id: I70976dfaa026efbcf3c46bd45941f37277a18e04
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Previously, the streamFileThreshold, the threshold at which a file
would be streamed rather than loaded entirely into memory, was only
configurable on a global basis.
This commit makes this threshold configurable on a per-loader basis.
Bug: 490404
Change-Id: I492c18c3155dbf56eedda9044a61d76120fd75f9
Signed-off-by: Kevin Corcoran <kevin.corcoran@puppetlabs.com>
Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>