When creating a new PackFile instance it is specified whether this pack
has an associated bitmap index file or not. This information is cached
and the public method getBitmapIndex() will always assume a bitmap index
file must exist if the cached data tells so. But it may happen that the
packfiles are repacked during a gc in a different process causing the
packfile, bitmap-index and index file to be deleted. Since JGit still
has an open FileHandle on the packfile this file is not really deleted
and can still be accessed. But index and bitmap index file are deleted.
Fix getBitmapIndex() to invalidate the cached packfile instance if such
a situation occurs.
This problem showed up when a gerrit server was serving repositories
which where garbage collected with native git regularly. Fetch and
clone commands for certain repositories failed permanently after a
native git gc had deleted old bitmap index files.
Change-Id: I8e620bec74dd3f310ba42024f9a657062f868f0e
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Per the git config documentation[1], pushInsteadOf is ignored when
a remote has explicit pushUris.
Implement this, and adapt tests.
Up to now JGit mistakenly applied pushInsteadOf also to existing
pushUris. If some repositories had relied on this mis-feature,
pushes may newly suddenly fail (the uncritical case; the config
just needs to be fixed) or even still succeed but push to unexpected
places, namely to the non-rewritten pushUrls (the critical case).
The release notes should point out this change.
[1] https://git-scm.com/docs/git-config
Bug: 393170
Change-Id: I38c83204d2ac74f88f3d22d0550bf5ff7ee86daf
Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
According to [1], pushInsteadOf is
1. applied to the uris, not to the pushUris
2. ignored if a remote has an explicit pushUri
JGit applied it only to the pushUris. As a result, pushInsteadOf was
ignored for remotes having only a uri, but no pushUri.
This commit implements (1) if there are no pushUris. I did not dare
implement (2) because:
* there are explicit tests for it that expect that pushInsteadOf gets
applied to existing pushUrls, and
* people may actually use and rely on this JGit behavior.
[1] https://git-scm.com/docs/git-config
Bug: 393170
Change-Id: I6dacbf1768a105190c2a8c5272e7880c1c9c943a
Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
This matches the proposal that has been discussed at length on
git-core mailing list and seems to be the accepted convention.
Change-Id: I9f6ab15144826893d1e2a4b48a2d657d6dd445ec
Otherwise fancy combinations of attributes (binary or -text in
combination with crlf or eol) may result in the corruption of binary
data.
Bug: 520910
Change-Id: I3ffc666c13d1b9d2ed987b69a67bfc7f42ccdbfc
Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
Attribute rules must match against the entry path relative to the
attribute node containing the rule. The global entry path is to be
used only for the init and the global node (and of course the root
node).
Bug: 520677
Change-Id: I80389a2dc272a72312729ccd5358d7c75e1ea20a
Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
This avoids executing mergeAlgorithm.merge on binary data, which is
unlikely to be useful.
Arguably, binary data should not make it to
ResolveMerger#contentMerge, but this approach has the following
advantages:
* binary detection is exact, since it doesn't only look at the start
of the blob.
* it is cheap, as we have to iterate over the bytes anyway to find
'\n'.
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Change-Id: I424295df1dc60a719859d9d7c599067891b15792
Very short abbreviations that are under 8 hex digits do not
have values in w2. Use w1 as the Java hashCode() instead, so
that the prefix of the abbreviation is always included in the
hashing function used by any java.util.Collection type.
Change-Id: Idaf69f86b62630ba4a022d31b4c293c6d138f557
In a server scenario such as Gerrit Code Review, there may be many
atomic BatchRefUpdates contending for locks on both the packed-refs file
and some subset of loose refs. We already retry lock acquisition to
improve this situation slightly, but we can do better by using an
in-process lock. This way, instead of retrying and potentially exceeding
their timeout, different threads sharing the same Repository instance
can wait on a fair lock without having to touch the disk lock. Since a
server is probably already using RepositoryCache anyway, there is a high
likelihood of reusing the Repository instance.
Change-Id: If5dd1dc58f0ce62f26131fd5965a0e21a80e8bd3
If a repo frequently uses PackedBatchRefUpdates, there is likely to be
contention on the packed-refs file, so it's not appropriate to fail
immediately the first time we fail to acquire a lock. Add some logic to
RefDirectory to support general retrying of lock acquisition.
Currently, there is a hard-coded wait starting at 100ms and backing off
exponentially to 1600ms, for about 3s of total wait. This is no worse
than the hard-coded backoff that JGit does elsewhere, e.g. in
FileUtils#delete. One can imagine a scheme that uses per-repository
configuration of backoff, and the current interface would support this
without changing any callers.
Change-Id: I4764e11270d9336882483eb698f67a78a401c251
Make sure all objects referenced by references are reachable. Stop at
the first missing object.
Change-Id: Ifcd7392c4321b17d9290bd87f038bc62bc10dabb
Signed-off-by: Zhen Chen <czhen@google.com>
JGit already had some fsck-like classes like ObjectChecker which can
check for an individual object.
The read-only FsckPackParser which will parse all objects within a pack
file and check it with ObjectChecker. It will also check the pack index
file against the object information from the pack parser.
Change-Id: Ifd8e0d28eb68ff0b8edd2b51b2fa3a50a544c855
Signed-off-by: Zhen Chen <czhen@google.com>
On-disk reflogs are not stored in the packed-refs file, so we cannot
ensure atomic updates. We choose the lesser evil of dropping failed
reflog updates on the floor, rather than throwing an exception even
though the underlying ref updates succeeded.
Add tests for reflogs to BatchRefUpdateTest.
Change-Id: Ia456ba9e36af8e01fde81b19af46a72378e614cd
* Factor out helpers for setting up and executing updates.
* Use common assert methods, with a special enum type that papers over
the fact that there is no ReceiveCommand.Result for transaction
aborted.
* Static import ReceiveCommand.Type constants.
* Add blank lines to separate repo setup, update execution, and asserts.
Change-Id: Ic3717f94331abfc7ae3e92065f3fe32026bf7cea
Run with @Parameterized, so we don't have to duplicate test setup for
each atomic/non-atomic test. We still have to have two different sets of
asserts for the cases where the behavior is different. In fact, this is
a readability win: it emphasizes that performing the exact same setup
except for the atomic setting will have different behavior.
Change-Id: I78a8214075e204732a423341f14c09de273a7854
The existing packed-refs file provides a mechanism for implementing
atomic multi-ref updates without any changes to the on-disk format or
lockfile protocol. We just need to make sure that there are no loose
refs involved in the transaction, which we can achieve by packing the
refs while holding locks on all loose refs. Full details of the
algorithm are in the PackedBatchRefUpdate javadoc.
This change does not implement reflog support, which will come in a
later change.
Change-Id: I09829544a0d4e8dbb141d28c748c3b96ef66fee1
ReceiveCommand.Result has a slightly richer set of possibilities, so it
makes sense for RefUpdate.Result to have more values in order to match.
In particular, this allows us to return REJECTED_MISSING_OBJECT from
RefUpdate when an object is missing.
The comment in RefUpdate#safeParse about expecting some old objects to be
missing is only applicable to the old ID, not the new ID. A missing new
ID is a bug or programmer error, and we should not update a ref to point
to one.
Fix various tests that started failing because they depended for no good
reason on setting refs to point to nonexistent objects; it's always easy
to create a real object when necessary.
It is possible that some downstream users of RefUpdate.Result might
choose to handle one of the new statuses differently, for example by
providing a more user-readable error message; that is not done in this
change.
Change-Id: I734b1c32d5404752447d9e20329471436ffe05fc
Fix patch matching for patterns of form a/b/** : this should not match
paths like a/b but still match a/b/ and a/b/c.
Change-Id: Iacbf496a43f01312e7d9052f29c3f9c33807c85d
Signed-off-by: Dmitry Pavlenko <pavlenko@tmatesoft.com>
Signed-off-by: Andrey Loskutov <loskutov@gmx.de>
Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>
When a file uses a different block size (e.g. 500) than the cache
(e.g. 512), and the DfsPackFile's blockSize field has not been
initialized, the cache misaligns block loads. The cache uses its
default of 512 to compute the block alignment instead of the file's
500.
This causes DfsReader try to set an empty range into an Inflater,
resulting in an object being unable to load.
Change-Id: I7d6352708225f62ef2f216d1ddcbaa64be113df6
Simple test to verify two DfsRepository instances will reuse the same
DfsBlocks in the DfsBlockCache, even though the DfsStreamKey instance
is now different between their DfsPackFile instances.
Change-Id: I409c109142dea488d189b9ac0d3c319755dce7b4
By making this a deterministic function, DfsBlockCache can stop
retaining a map of every DfsPackDescription it has ever seen. This
fixes a long standing memory leak in DfsBlockCache.
This refactoring also simplifies the idea of setting up more
lightweight objects around streams.
Change-Id: I051e7b96f5454c6b0a0e652d8f4a69c0bed7f6f4
The reader may find it surprising that this succeeds without incident
unless there is peeling or a fast-forward check involved. This behavior
may be changed in the future, but for now, just document the current
behavior.
Change-Id: I348b37e93e0264dc0905c4d58ce881852d1dfe5e
The RefDirectory implementation of doDelete never considered whether to
delete a symref or its leaf, because the detachingSymbolicRef bit was
never exposed from RefUpdate. The behavior was thus incorrectly to
always delete the symref, never the leaf.
There was no test for this behavior. The only thing that attempted to be
a test was testDeleteHeadInBareRepo, but this test was broken for
reasons unrelated to this bug. Specifically, it set the leaf to point to
a completely nonexistent object, and then asserted that deleting HEAD
resulted in NO_CHANGE. The only reason this test ever passed is because
of a quirk of updateImpl, which treats a missing object as the same as
null. This quirk aside, the test wasn't really testing the right thing.
Turn this into a real test by writing out a real object and pointing the
leaf at that.
Also, add a test for the detachingSymbolicRef case, i.e. deleting the
symref and leaving the leaf alone.
Change-Id: Ib96d2a35b4f99eba0734725486085fc6f9d78aa5
The contents of the packedRefList AtomicReference should never differ
from what we expect prior to writing, because this segment of the code
is protected by the packed-refs lock file on disk. If it does happen,
whether due to programmer error or a rogue process not respecting the
locking protocol, it's better to let the caller know than to silently
drop the whole commit operation on the floor.
The existing concurrentOnlyOneWritesPackedRefs test is inherently
nondeterministic as written, and was already about 6% flaky as measured
by bazel:
$ bazel test --runs_per_test=200 //org.eclipse.jgit.test:org_eclipse_jgit_internal_storage_file_GcPackRefsTest
...
INFO: Elapsed time: 42.608s, Critical Path: 10.35s
//org.eclipse.jgit.test:org_eclipse_jgit_internal_storage_file_GcPackRefsTest FAILED in 12 out of 200 in 1.6s
Stats over 200 runs: max = 1.6s, min = 1.1s, avg = 1.3s, dev = 0.1s
This flakiness was caused by the assumption that exactly one of the 2
threads would fail, when both might actually succeed in practice due to
racing on the compare-and-swap.
For whatever reason, this change affected the interleaving behavior in
such a way that the flakiness jumped to around 50%. Making the
interleaving of the test fully deterministic is beyond the scope of this
change, but a simple tweak to the assertion is enough to make it pass
consistently 200+ times both before and after this change.
Change-Id: I5ff4dc39ee05bda88d47909acb70118f3d0c8f74
Some downstream code checks whether a ReceiveCommand is a create or a
delete based on the type field. Other downstream code (in particular a
good chunk of Gerrit code I wrote) checks the same thing by comparing
oldId/newId to zeroId. Unfortunately, there were no strict checks in the
constructor that ensures that zeroId is only set for oldId/newId if the
type argument corresponds, so a caller that passed mismatched IDs and
types would observe completely undefined behavior as a result. This is
and always has been a misuse of the API; throw IllegalArgumentException
so the caller knows that it is a misuse.
Similarly, throw from the constructor if oldId/newId are null. The
non-nullness requirement was already documented. Fix RefDirectoryTest to
not do the wrong thing.
Change-Id: Ie2d0bfed8a2d89e807a41925d548f0f0ce243ecf
This renaming supports reusing DfsStreamKey in a future commit
to index other PackExt type streams inside of the DfsBlockCache.
Change-Id: Ib52d374e47724ccb837f4fbab1fc85c486c5b408
The merger is now able to react to the use of the merge attribute.
The value unset and the custom value 'binary' are handled (-merge
and merge=binary)
Since the specification of the merge attribute states that when the
attribute is unset, ours version must be kept in case of a conflict, we
don't overwrite the file but keep the local version.
Bug: 517128
Change-Id: Ib5fbf17bdaf727bc5d0e106ce88f2620d9f87a6f
Signed-off-by: Mathieu Cartaud <mathieu.cartaud@obeo.fr>
These config options allow overriding the message type (error, warn or
ignore) of a specific message ID such as missingEmail.
The supported fsck message IDs are defined in ObjectChecker.ErrorType.
Since TransferConfig.FsckMode wasn't public parsing fsck configuration
options like e.g. fsck.missingEmail=ignore failed with an
IllegalAccessException. Fix this by declaring this enum public.
Change-Id: I3f41ff7a76a846250a63ce92a9fd111eb347269f
Signed-off-by: David Turner <dturner@twosigma.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
In the case of multiple tags on the same commit, jgit previously
only ever looked at the last of those tags; git behaviour is to
return the first tag (or first matching one if --match is
specified).
Bug: 518377
Change-Id: I3b6b58ad9f8aa3879ae35b84542b7bddc74a27d6
Signed-off-by: Oliver Lockwood <oliver.lockwood@cantab.net>
A `match()` method has been added to the DescribeCommand, allowing
users to specify one or more `glob(7)` matchers as per Git convention.
Bug: 518377
Change-Id: Ib4cf34ce58128eed0334adf6c4a052dbea62c601
Signed-off-by: Oliver Lockwood <oliver.lockwood@cantab.net>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Change-Id: Idcc93c2ca95938995d489cffda649c7d7b26c50e
Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
If set, "singlePack" will create a single GC pack file for all
objects reachable from refs/*. If not set, the GC pack will contain
object reachable from refs/heads/* and refs/tags/*, and the GC_REST
pack will contain all other reachable objects.
Change-Id: I56bcb6a9da2c10a0909c2f940c025db6f3acebcb
Signed-off-by: Terry Parker <tparker@google.com>