MergedReftable combines multiple reference tables together in a stack,
allowing higher/later tables to shadow earlier/lower tables. This
forms the basis of a transaction system, where each transaction writes
a new reftable containing only the modified references, and readers
perform a merge on the fly to get the latest value.
Change-Id: Ic2cb750141e8c61a8b2726b2eb95195acb6ddc83
ReftableReader provides sequential scanning support over all
references, a range of references within a subtree (such as
"refs/heads/"), and lookup of a single reference. Reads can be
accelerated by an index block, if it was created by the writer.
The BlockSource interface provides an abstraction to read from the
reftable's backing storage, supporting a future commit to connect
to JGit DFS and the DfsBlockCache.
Change-Id: Ib0dc5fa937d0c735f2a9ff4439d55c457fea7aa8
This is a simple writer to create reftable formatted files. Follow-up
commits will add support for reading from reftable, debugging
utilities, and tests.
Change-Id: I3d520c3515c580144490b0b45433ea175a3e6e11
git-core follows HTTP redirects so JGit should also provide this.
Implement config setting http.followRedirects with possible values
"false" (= never), "true" (= always), and "initial" (only on GET, but
not on POST).[1]
We must do our own redirect handling and cannot rely on the support
that the underlying real connection may offer. At least the JDK's
HttpURLConnection has two features that get in the way:
* it does not allow cross-protocol redirects and thus fails on
http->https redirects (for instance, on Github).
* it translates a redirect after a POST to a GET unless the system
property "http.strictPostRedirect" is set to true. We don't want
to manipulate that system setting nor require it.
Additionally, git has its own rules about what redirects it accepts;[2]
for instance, it does not allow a redirect that adds query arguments.
We handle response codes 301, 302, 303, and 307 as per RFC 2616.[3]
On POST we do not handle 303, and we follow redirects only if
http.followRedirects == true.
Redirects are followed only a certain number of times. There are two
ways to control that limit:
* by default, the limit is given by the http.maxRedirects system
property that is also used by the JDK. If the system property is
not set, the default is 5. (This is much lower than the JDK default
of 20, but I don't see the value of following so many redirects.)
* this can be overwritten by a http.maxRedirects git config setting.
The JGit http.* git config settings are currently all global; JGit has
no support yet for URI-specific settings "http.<pattern>.name". Adding
support for that is well beyond the scope of this change.
Like git-core, we log every redirect attempt (LOG.info) so that users
may know about the redirection having occurred.
Extends the test framework to configure an AppServer with HTTPS support
so that we can test cloning via HTTPS and redirections involving HTTPS.
[1] https://git-scm.com/docs/git-config
[2] 6628eb41db
[3] https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html
CQ: 13987
Bug: 465167
Change-Id: I86518cb76842f7d326b51f8715e3bbf8ada89859
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
Currently there is no way to determine the precise changes done
to the working tree by a JGit command. Only the CheckoutCommand
actually provides access to the lists of modified, deleted, and
to-be-deleted files, but those lists may be inaccurate (since they
are determined up-front before the working tree is modified) if
the actual checkout then fails halfway through. Moreover, other
JGit commands that modify the working tree do not offer any way to
figure out which files were changed.
This poses problems for EGit, which may need to refresh parts of the
Eclipse workspace when JGit has done java.io file operations.
Provide the foundations for better file change tracking: the working
tree is modified exclusively in DirCacheCheckout. Make it emit a new
type of RepositoryEvent that lists all files that were modified or
deleted, even if the checkout failed halfway through. We update the
'updated' and 'removed' lists determined up-front in case of file
system problems to reflect the actual state of changes made.
EGit thus can register a listener for these events and then knows
exactly which parts of the Eclipse workspace may need to be refreshed.
Two commands manage checking out individual DirCacheEntries themselves:
checkout specific paths, and applying a stash with untracked files.
Make those two also emit such a new WorkingTreeModifiedEvent.
Furthermore, merges may modify files, and clean, rm, and stash create
may delete files.
CQ: 13969
Bug: 500106
Change-Id: I7a100aee315791fa1201f43bbad61fbae60b35cb
Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
When creating a new PackFile instance it is specified whether this pack
has an associated bitmap index file or not. This information is cached
and the public method getBitmapIndex() will always assume a bitmap index
file must exist if the cached data tells so. But it may happen that the
packfiles are repacked during a gc in a different process causing the
packfile, bitmap-index and index file to be deleted. Since JGit still
has an open FileHandle on the packfile this file is not really deleted
and can still be accessed. But index and bitmap index file are deleted.
Fix getBitmapIndex() to invalidate the cached packfile instance if such
a situation occurs.
This problem showed up when a gerrit server was serving repositories
which where garbage collected with native git regularly. Fetch and
clone commands for certain repositories failed permanently after a
native git gc had deleted old bitmap index files.
Change-Id: I8e620bec74dd3f310ba42024f9a657062f868f0e
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Per the git config documentation[1], pushInsteadOf is ignored when
a remote has explicit pushUris.
Implement this, and adapt tests.
Up to now JGit mistakenly applied pushInsteadOf also to existing
pushUris. If some repositories had relied on this mis-feature,
pushes may newly suddenly fail (the uncritical case; the config
just needs to be fixed) or even still succeed but push to unexpected
places, namely to the non-rewritten pushUrls (the critical case).
The release notes should point out this change.
[1] https://git-scm.com/docs/git-config
Bug: 393170
Change-Id: I38c83204d2ac74f88f3d22d0550bf5ff7ee86daf
Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
According to [1], pushInsteadOf is
1. applied to the uris, not to the pushUris
2. ignored if a remote has an explicit pushUri
JGit applied it only to the pushUris. As a result, pushInsteadOf was
ignored for remotes having only a uri, but no pushUri.
This commit implements (1) if there are no pushUris. I did not dare
implement (2) because:
* there are explicit tests for it that expect that pushInsteadOf gets
applied to existing pushUrls, and
* people may actually use and rely on this JGit behavior.
[1] https://git-scm.com/docs/git-config
Bug: 393170
Change-Id: I6dacbf1768a105190c2a8c5272e7880c1c9c943a
Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
This matches the proposal that has been discussed at length on
git-core mailing list and seems to be the accepted convention.
Change-Id: I9f6ab15144826893d1e2a4b48a2d657d6dd445ec
Otherwise fancy combinations of attributes (binary or -text in
combination with crlf or eol) may result in the corruption of binary
data.
Bug: 520910
Change-Id: I3ffc666c13d1b9d2ed987b69a67bfc7f42ccdbfc
Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
Attribute rules must match against the entry path relative to the
attribute node containing the rule. The global entry path is to be
used only for the init and the global node (and of course the root
node).
Bug: 520677
Change-Id: I80389a2dc272a72312729ccd5358d7c75e1ea20a
Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
Happily, most anonymous SectionParser implementations can be replaced
with FooConfig::new, as long as the constructor takes a single Config
arg. Many of these, the non-public ones, can in turn be inlined. A few
remaining SectionParsers can be lambdas.
Change-Id: I3f563e752dfd2007dd3a48d6d313d20e2685943a
This avoids executing mergeAlgorithm.merge on binary data, which is
unlikely to be useful.
Arguably, binary data should not make it to
ResolveMerger#contentMerge, but this approach has the following
advantages:
* binary detection is exact, since it doesn't only look at the start
of the blob.
* it is cheap, as we have to iterate over the bytes anyway to find
'\n'.
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Change-Id: I424295df1dc60a719859d9d7c599067891b15792
Very short abbreviations that are under 8 hex digits do not
have values in w2. Use w1 as the Java hashCode() instead, so
that the prefix of the abbreviation is always included in the
hashing function used by any java.util.Collection type.
Change-Id: Idaf69f86b62630ba4a022d31b4c293c6d138f557
SpotBugs [1] is the spiritual successor of FindBugs, carrying on from
the point where it left off with support of its community.
[1] http://spotbugs.readthedocs.io/
Change-Id: I127f2c54b04265b6565e780116617ffa8a4d7eaf
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>
In a server scenario such as Gerrit Code Review, there may be many
atomic BatchRefUpdates contending for locks on both the packed-refs file
and some subset of loose refs. We already retry lock acquisition to
improve this situation slightly, but we can do better by using an
in-process lock. This way, instead of retrying and potentially exceeding
their timeout, different threads sharing the same Repository instance
can wait on a fair lock without having to touch the disk lock. Since a
server is probably already using RepositoryCache anyway, there is a high
likelihood of reusing the Repository instance.
Change-Id: If5dd1dc58f0ce62f26131fd5965a0e21a80e8bd3
If a repo frequently uses PackedBatchRefUpdates, there is likely to be
contention on the packed-refs file, so it's not appropriate to fail
immediately the first time we fail to acquire a lock. Add some logic to
RefDirectory to support general retrying of lock acquisition.
Currently, there is a hard-coded wait starting at 100ms and backing off
exponentially to 1600ms, for about 3s of total wait. This is no worse
than the hard-coded backoff that JGit does elsewhere, e.g. in
FileUtils#delete. One can imagine a scheme that uses per-repository
configuration of backoff, and the current interface would support this
without changing any callers.
Change-Id: I4764e11270d9336882483eb698f67a78a401c251
Make sure all objects referenced by references are reachable. Stop at
the first missing object.
Change-Id: Ifcd7392c4321b17d9290bd87f038bc62bc10dabb
Signed-off-by: Zhen Chen <czhen@google.com>
JGit already had some fsck-like classes like ObjectChecker which can
check for an individual object.
The read-only FsckPackParser which will parse all objects within a pack
file and check it with ObjectChecker. It will also check the pack index
file against the object information from the pack parser.
Change-Id: Ifd8e0d28eb68ff0b8edd2b51b2fa3a50a544c855
Signed-off-by: Zhen Chen <czhen@google.com>
On-disk reflogs are not stored in the packed-refs file, so we cannot
ensure atomic updates. We choose the lesser evil of dropping failed
reflog updates on the floor, rather than throwing an exception even
though the underlying ref updates succeeded.
Add tests for reflogs to BatchRefUpdateTest.
Change-Id: Ia456ba9e36af8e01fde81b19af46a72378e614cd
The existing packed-refs file provides a mechanism for implementing
atomic multi-ref updates without any changes to the on-disk format or
lockfile protocol. We just need to make sure that there are no loose
refs involved in the transaction, which we can achieve by packing the
refs while holding locks on all loose refs. Full details of the
algorithm are in the PackedBatchRefUpdate javadoc.
This change does not implement reflog support, which will come in a
later change.
Change-Id: I09829544a0d4e8dbb141d28c748c3b96ef66fee1
ReceiveCommand.Result has a slightly richer set of possibilities, so it
makes sense for RefUpdate.Result to have more values in order to match.
In particular, this allows us to return REJECTED_MISSING_OBJECT from
RefUpdate when an object is missing.
The comment in RefUpdate#safeParse about expecting some old objects to be
missing is only applicable to the old ID, not the new ID. A missing new
ID is a bug or programmer error, and we should not update a ref to point
to one.
Fix various tests that started failing because they depended for no good
reason on setting refs to point to nonexistent objects; it's always easy
to create a real object when necessary.
It is possible that some downstream users of RefUpdate.Result might
choose to handle one of the new statuses differently, for example by
providing a more user-readable error message; that is not done in this
change.
Change-Id: I734b1c32d5404752447d9e20329471436ffe05fc
Fix patch matching for patterns of form a/b/** : this should not match
paths like a/b but still match a/b/ and a/b/c.
Change-Id: Iacbf496a43f01312e7d9052f29c3f9c33807c85d
Signed-off-by: Dmitry Pavlenko <pavlenko@tmatesoft.com>
Signed-off-by: Andrey Loskutov <loskutov@gmx.de>
Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>
This change makes it possible to configure the 'inCoreLimit' of LocalFile
used in ResolveMerger#insertMergeResult. Since LocalFile itself has some
risks, e.g. it may be left behind as garbage in case of failure. It should
be good to be able to control the size limit for using LocalFile.
Change-Id: I3dc545ade370b2bbdb7c610ed45d5dd4d39b9e8e
Signed-off-by: Changcheng Xiao <xchangcheng@google.com>
Allow a DFS implementation to report blockSize to DfsPackFile,
bypassing alignment errors and corrections in the DfsBlockCache when
the blockSize of a specific file differs from the cache's configured
blockSize.
Change-Id: Ic376314d4a86a0bd528c033e169d93eef035b233
When a file uses a different block size (e.g. 500) than the cache
(e.g. 512), and the DfsPackFile's blockSize field has not been
initialized, the cache misaligns block loads. The cache uses its
default of 512 to compute the block alignment instead of the file's
500.
This causes DfsReader try to set an empty range into an Inflater,
resulting in an object being unable to load.
Change-Id: I7d6352708225f62ef2f216d1ddcbaa64be113df6
Holding the current DfsBlock in a local variable 'b' may prevent the
Java GC from reclaiming it while loading the next block. Remove the
local variable and rely only on the field.
Change-Id: Ibfc8394cac717b485fdc94d5c8479c3f8ca78ee4
In the future with reftable a DFS implementation may choose to create
a PackDescription that contains only a REFTABLE extension. Filter
these out by only creating a DfsPackFile if the PackDescription as the
expected PackExt.PACK.
Change-Id: I4c831622378156ae6b68f82c1ee1db5e150893be
Not all DFS implementations use globally unique pack names in the
DfsPackDescription. Most require the DfsRepositoryDescription to
qualify the pack. Include DfsRepositoryDescription in the default
DfsStreamKey implementation, to prevent cache collisions.
Change-Id: I9ebf0c76bf2b414a702ae050b32e42588067bc44
Using a HashMap is overkill for this storage. PackExt is a
constrained type that permits no more than 32 unique values in the JVM.
Each is assigned a unique index (getPosition), which can be used as
indexes in a simple long[].
Change-Id: Ib8e3b2db15d3fde28989b6f4b9897f8a7bb36f3b
When 07f98a8b71 ("Derive DfsStreamKey from DfsPackDescription")
stopped caching DfsPackFile in the DfsBlockCache, the DfsPackFile began
to always load the idx, bitmap, or compute reverse index, as the cache
handles were no longer populated by prior requests.
Rework caching to lookup the objects from the DfsBlockCache if the
local DfsPackFile handle is invalid. This allows the DfsPackFile to
be more of a flyweight instance across requests.
Change-Id: Ic7b42ce2d90692cccea36deb30c2c76ccc81638b
While implementing a custom subclass of DfsStreamKey it became obvious
the required derive(String) was making it impossible to construct an
efficient key in all cases.
Instead, use a special wrapper type ForReverseIndex around the INDEX's
own DfsStreamKey to denote the reverse index stream in the
DfsBlockCache. This adds a smaller layer of boxing, but eliminates
weird issues for DFS implementors using specialized DfsStreamKey
implementations for space efficiency reasons.
Now that DfsStreamKey is reasonably light-weight, avoid allocating the
index and reverse index keys until necessary. DfsPackFile mostly
holds the DfsBlockCache.Ref handle to the object, and only needs the
DfsStreamKey when its looking up the handle.
Change-Id: Icea78e8f7f1514087b94ef5f525d9573ea2913f2
By making this a deterministic function, DfsBlockCache can stop
retaining a map of every DfsPackDescription it has ever seen. This
fixes a long standing memory leak in DfsBlockCache.
This refactoring also simplifies the idea of setting up more
lightweight objects around streams.
Change-Id: I051e7b96f5454c6b0a0e652d8f4a69c0bed7f6f4