Reftable is a binary, block-based storage format for the ref-database.
It provides several advantages over the traditional packed + loose
storage format:
* O(1) write performance, even for deletions and transactions.
* atomic updates to the ref database.
* O(log N) lookup and prefix scans
* free from restrictions imposed by the file system: it is
case-sensitive even on case-insensitive file systems, and has
no inherent limitations for directory/file conflicts
* prefix compression reduces space usage for repetitive ref names,
such as gerrit's refs/changes/xx/xxxxx format.
FileReftableDatabase is based on FileReftableStack, which does
compactions inline. This is simple, and has good median performance,
but every so often it will rewrite the entire ref database.
For testing, a FileReftableTest (mirroring RefUpdateTest) is added to
check for Reftable specific behavior. This must be done separately, as
reflogs have different semantics.
Add a reftable flavor of BatchRefUpdateTest.
Add a FileReftableStackTest to exercise compaction.
Add FileRepository#convertToReftable so existing testdata can be
reused.
CQ: 21007
Change-Id: I1837f268e91c6b446cb0155061727dbaccb714b8
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Prevent adding a null parent to a commit's parent array. Doing so
can cause NPEs later on.
Bug: 552160
Change-Id: Ib24b7b9b7b08e0b6f246006b4a4cade7eeb830b9
Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
core.quotePath = false means that "bytes higher than 0x80 are not
considered "unusal" anymore"[1], i.e., they are not escaped. In
essence this preserves non-ASCII characters in path names in output.
Note that control characters and other special characters in the
ASCII range will still be escaped.
Add a new QuotedString.GIT_PATH_MINIMAL singleton implementing this.
Change the normal GIT_PATH algorithm to use bytes instead of characters
so it can be re-used. Provide a setter in DiffFormatter for the quoting
style so that an application can override the default, which is the
setting from the git config (and by default "true"). Use the new
QuotedString.GIT_PATH_MINIMAL when core.quotePath == false.
[1] https://git-scm.com/docs/git-config#Documentation/git-config.txt-corequotePath
Bug: 552467
Change-Id: Ifcb233e7d10676333bf42011e32d01a4e1138059
Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
IndexDiff would apply ignore mode ALL from .gitmodules to all remaining
submodules, and would ignore other settings from .gitignore and always
apply the setting defined on the IndexDiff instead. Correct that.
In canonical git the ignore setting from .gitmodules can also be
overridden by .git/config.[1] Implement that override in SubmoduleWalk.
[1] https://git-scm.com/docs/gitmodules#Documentation/gitmodules.txt-submoduleltnamegtignore
Bug: 521613
Change-Id: I9199fd447e41c7838924856dce40678370b66395
Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
Move the BaseReceivePack implementation back into ReceivePack. This is a
backward-incompatible change. For example, BaseReceivePack.FirstLine no
longer exists and cannot be referenced. However, most of the code
should just work by replacing BaseReceivePack with ReceivePack.
Although this is an API change, it only affects callers using JGit as a
server, and there are very few of those in the wild.
Change-Id: I1ce92869435d5eebb7d671be44561e69c6233134
Signed-off-by: Masaya Suzuki <masayasuzuki@google.com>
This avoids polluting hand-crafted user level config with
auto-configured options which might disturb in environments where
the user level config is replicated between different machines.
Add a jgit config as parent of the system level config. Persist
measured timestamp resolutions always in this jgit config and read it
via the user global config. This has the effect that auto-configured
timestamp resolution will be used by default and can be overridden in
either the system level or user level config.
Store the jgit config under the XDG_CONFIG_HOME directory following the
XDG base directory specification [1] in order to ensure that we have
write permissions to persist the file. This has the effect that each OS
user will use its jgit config since they typically use different
XDG_CONFIG_HOME directories.
If the environment variable XDG_CONFIG_HOME is defined the jgit config
file is located at $XDG_CONFIG_HOME/jgit/config otherwise the default is
~/.config/jgit/config.
If you want to avoid redundant measurement for different OS users
manually copy the values measured and auto-configured for one OS user to
the system level git config.
[1] https://wiki.archlinux.org/index.php/XDG_Base_Directory
Bug: 551850
Change-Id: I0022bd40ae62f82e5b964c2ea25822eb55d94687
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
A merge may write files to the working tree. After a successful
merge one must fire a WorkingTreeModifiedEvent explicitly if
getModifiedFiles() is not empty.
Also, any touched files must be reported by the
WorkingTreeModifiedEvent fired by DirCacheCheckout.checkout().
Bug: 552636
Change-Id: I5fab8279ed8be8a4ae34cddfa726836b9277aea6
Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
MergedReftableTest#scanDuplicates tests whether we can write duplicate
keys in a merged reftable. Apparently, the first key appearing should
get precedence, and this works because the sort() algorithm on ordered
collections is stable.
This is potentially confusing behavior, because you can write data
into the table that cannot be retrieved (Merged table can only have
one entry per key), and the APIs such as exactRef() only return a
single value.
Make this consistent with behavior introduced in I04f55c481 "reftable:
enforce ordering for ref and log writes" by considering a duplicate key
in sortAndWriteRefs as a fatal runtime error.
Change-Id: I1eedd18f028180069f78c5c467169dcfe1521157
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Some URLs cannot be converted via URL.toURI(). So don't convert
the full URL but only the bits that are needed to find a proxy
via java.net.ProxySelector.
Bug: 549690
Change-Id: I55b5ecee70c6b52f72f9bdba9ce552fde7f33976
Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
Instead of just looking for a substring match of user.signingKey
in a key's user ID implement the GPG matching formats[1] for:
'=' Full exact match
'<' Full exact match of the e-mail address
'@' Substring match within the e-mail address only
'*' General case-insensitive substring match (default)
When user.signingKey is not set, the committer's e-mail address is
used by default. In that case, use '<', i.e., require an exact match
on the OpenPGP e-mail address.
Also handle the optional "0x" prefix for (partial) key fingerprints.
[1] https://www.gnupg.org/documentation/manuals/gnupg/Specify-a-User-ID.html
Bug: 550335
Change-Id: I6ce482a099ff1a0dc9de45435cd4d3ec5b504f12
Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
TreeRevFilterTest uses an unncessarily complicated TreeFilter - an
AndTreeFilter - when it should be as simple as possible because this
class tests TreeRevFilter, not AndTreeFilter. Replace the filter with a
simpler one.
Change-Id: I3256a65f6e0042d32fd76a9224b79a835674ff3a
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Support the core.hooksPath git config. This can be an absolute or
relative path of a directory where to find git hooks; a relative
path is resolved relative to the directory the hook will run in.
Bug: 500266
Change-Id: I671999a6386a837e897c31718583c91d8035f3ba
Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
This doesn't yet ensure that _all_ repositories are closed. It only
handles the obvious, local, and easy cases.
Change-Id: I0f9f8607791f0f03ed1f5ad71e9595e78b78892f
Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
This comment was probably copied from testCloneRepositoryWithBranch() to
testBareCloneRepositoryOnlyOneBranch() where it doesn't make sense.
Hence remove it.
Change-Id: I846debd084dd77fd473c3602a799f195a8390f77
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Since [1] the gerrit project includes jgit as a submodule, and has this
warning enabled, resulting in 100s of warnings in the console.
Also enable the warning here, and fix them.
At the same time, add missing braces around adjacent and nearby one-line
blocks.
[1] https://gerrit-review.googlesource.com/c/gerrit/+/227897
Change-Id: I81df3fc7ed6eedf6874ce1a3bedfa727a1897e4c
Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>
On changing a ref, the old SHA1 is not updated in the object => ref
mapping. This means search by object ID may still turn up a ref from
deeper within the stack. To fix this, check all refs produced by the
merged iterator against the merged reftables.
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Change-Id: I41e9cd395b0608eedeeaead0a9fd997238d747c9
The flag enabling sideband-all is used in two places: in UploadPack
for advertisement and in the protocol parser to read it from the
request.
This leds to problems in distributed deployments where the two requests of
a fetch can go to different servers with different configurations.
Use the existing allowsidebandall to accept the sideband-all request
(and respond to it) and introduce a new "advertisesidebandall" to toggle
the advertising of the feature.
Change-Id: I892d541bc3f321606c89bad1d333b079dce6b5fa
Signed-off-by: Ivan Frade <ifrade@google.com>
Note that TreeWalk.forPath() needs not be closed; the ObjectReader
_is_ closed when that method returns.
Change-Id: I6e022e4a2fde0c88d610a82de092ea541b33f75c
Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
The object identifying packfiles to send them via packfile-uri contains
only the uri and the hash. This is the information that goes through the
wire. It would be useful to know also the size of those packfile, for
example to track how many bytes have been offloaded to HTTP.
Add size field the CachedPackUriProvider.PackInfo object.
Change-Id: If6b921b48a4764d936141c777879b148cc80bbd3
Signed-off-by: Ivan Frade <ifrade@google.com>
The DELIM and END constants are deprecated and using them causes
warnings. Replace them with the accessor methods.
Change-Id: Iadb27000755e8fd8c61d9218591f9d110b8265c8
Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>
Also correctly parse the "always" value (allowed in canonical git
since git 2.12.0[1]). Adapt the ReflogWriter.
[1] https://github.com/git/git/commit/341fb2862
Bug: 551664
Change-Id: I051c76ca355a2ac8d6092de65f44b18bf9aeb125
Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
MergedReftable is not used as an AutoCloseable, because closing tables
is currently handled by DfsReftableStack#close.
Encode that a MergedReftable is a list of ReftableReaders. The previous
code suggested that we could form nested trees of MergedReftables,
which is not how we use reftables.
Change-Id: Icbe2fee8a5a12373f45fc5f97d8b1a2b14231c96
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Otherwise the paths modified by a cherry-pick with conflicts won't be
reported as modified via WorkingTreeModifiedEvents.
Change-Id: I875b67c0d2f68efdf90a9c32b80a2e074ed3570d
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
This makes the intended use of the classes more clear. It also
simplifies generic functions that write reftables: they only need a
ReftableWriter as argument, as the stream is carried within the
ReftableWriter.
Change-Id: Idbb06f89ae33100f0c0b562cc38e5b3b026d5181
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
In https://git.eclipse.org/r/c/144009/ UploadPack tests moved from
thrown to assertThrows, but newly introduced tests are still using
the thrown.
Update test so all of them use assertThrows.
Change-Id: I0ff19a6f8ba9e978d8ffc7a912c0572d9f00c7fa
Signed-off-by: Ivan Frade <ifrade@google.com>
Make a general test with all the cases, like request
advertised/unadvertised tips, reachable/unreachable from those tips,
commits/blobs.
Implement specific validator tests as subclasses. Each test provides the
validator instance and tells what cases are valid.
Change-Id: I7f961fcc05f7fabbeae1ba8ff73d99072ce8fc72
Signed-off-by: Ivan Frade <ifrade@google.com>
UploadPackTest is already too long and it is covering too many aspects
of UploadPack. This makes difficult to see what is tests and if all
cases are covered.
Move the reachability-related tests to its own file. This moves also an
auxiliary function, reducing the length of UploadPack. Complete also the
coverage, adding combinations of bitmap availability/commits or
blobs/reachable or not.
Change-Id: Id5cfc9d0118d997da30e3886c91db996a86250fc
Signed-off-by: Ivan Frade <ifrade@google.com>
Older JGit stored only milliseconds timestamps in the index. Newer
JGit may get finer timestamps from the file system. This leads to
slow index diffs when a new JGit runs against an index produced
by older JGit because many timestamps will differ and JGit will
then do many content checks. See [1].
Handle this migration case by only comparing milliseconds if the
index entry has only millisecond precision.
The inverse may also occur; also compare only milliseconds if the
file timestamp has only millisecond precision.
Do the same also for microsecond resolution. On Windows, NTFS may
provide 100ns resolution and may be used by external programs writing
the index, but Java's WindowsFileAttributes may provide only
microseconds.
File timestamp precision in Java depends not only on the Java APIs
used by different JGit versions but may also change when running the
same Java code on different VMs. And of course the resolution may
vary among operating and file systems. Moreover, timestamp precision
in the index depends on the program that wrote the index. Canonical
git may use a different resolution, maybe even different between git
versions.
[1] https://www.eclipse.org/forums/index.php/t/1100344/
Change-Id: Idfd08606c883cb98787b2138f9baf0cc89a57b56
Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
allRefs determined the end of the ref block without accounting for
index or log blocks. This could cause other blocks to be interpreted
as ref blocks, leading to "invalid block" error messages.
Change-Id: I7b9323e7d5e0e7d64535b3ec1efd576aed1e9870
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Since commit c3b0dec509fe136c5417422f31898b5a4e2d5e02, Git has
disallowed writing a non-commit to refs/heads/* refs. JGit still
allows that, which can put users in a bad situation. For example,
git push origin v1.0:master
pushes the tag object v1.0 to refs/heads/master, instead of the
intended commit object v1.0^{commit}.
Prevent that by validating that the target of the ref points to a
commit object when pushing to refs/heads/*.
Git performs the same check at a lower level (in the RefDatabase). We
could do the same here, but for now let's start conservatively by
handling it in pushes first.
[jn: fleshed out commit message]
Change-Id: I8f98ae6d8acbcd5ef7553ec732bc096cb6eb7c4e
Signed-off-by: Yunjie Li <yunjieli@google.com>
Signed-off-by: Jonathan Nieder <jrn@google.com>
PackedBatchRefUpdate was creating a new packed-refs list that was
potentially unsorted. This would be papered over when the list was
read back from disk in parsePackedRef, which detects unsorted ref
lists on reading, and sorts them. However, the BatchRefUpdate also
installed the new (unsorted) list in-memory in
RefDirectory#packedRefs.
With the timestamp granularity code committed to stable-5.1, we can
more often accurately decide that the packed-refs file is clean, and
will return the erroneous unsorted data more often. Unluckily timed
delays also cause the file to be clean, hence this problem was
exacerbated under load.
The symptom is that refs added by a BatchRefUpdate would stop being
visible directly after they were added. In particular, the Gerrit
integration tests uses BatchRefUpdate in its setup for creating the
Admin group, and then tries to read it out directly afterward.
The tests recreates one failure case. A better approach would be to
revise RefList.Builder, so it detects out-of-order lists and
automatically sorts them.
Fixes https://bugs.eclipse.org/bugs/show_bug.cgi?id=548716 and
https://bugs.chromium.org/p/gerrit/issues/detail?id=11373.
Bug: 548716
Change-Id: I613c8059964513ce2370543620725b540b3cb6d1
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
The only usage of this test iterator was removed in df637928d. Hence
delete this iterator and associated test.
Change-Id: I47710133ec3edc675c21db210960c024982668c6
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
(cherry picked from commit a024759cf5)
This test case assumed file system timestamp resolution of 1 second. On
filesystems with a finer resolution this test fails since the index
entry is only smudged if the file index entry's lastModified and the
lastModified of the git index itself are within the same filesystem
timer tick. Fix this by ensuring that these timestamps are identical
which should work for any filesystem timer resolution.
Bug: 548188
Change-Id: Id84d59e1cfeb48fa008f8f27f2f892c4f73985de
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>
(cherry picked from commit 1159f9dd7c)
By using File#setLastModified, we can create a racy git situation
stably.
Tested with --runs_per_test=100
Bug: 526111
Change-Id: I60b3632d353e19f335668325aa603640be423f58
Signed-off-by: Masaya Suzuki <masayasuzuki@google.com>
(cherry picked from commit df637928d2)
Move the handling of cached user and system config to getSystemConfig
and getUserConfig methods and revert the implementation of
openSystemConfig and openUserConfig to the old stateless
implementation.
This ensures the open methods respect the passed-in parent config, which
may be different on each invocation. Additionally, returning a new
instance matches the behavior of the previous implementation of the
default system reader, which downstream callers may be depending on.
Move the implementation of the new caching methods getSystemConfig and
getUserConfig up to SystemReader. This avoids that we break the ABI for
subclasses of SystemReader.
Also see [1] which fixed a similar problem with Gerrit's custom
SystemReader.
[1] https://gerrit-review.googlesource.com/c/gerrit/+/225458
Change-Id: If54a2491932d8fc914d4649cb73c9e837c5b8ad0
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
When firstParent is set, RevWalk traverses only the first parent of a
commit, even though that commit is UNINTERESTING. Since we want the
maximal UNINTERESTING set, we shouldn't prune any parents here. This
issue is apparent only when some of the commits being traversed are
unparsed, since walker.carryFlagsImpl() propagates the UNINTERESTING
flag to all parsed ancestors, masking the issue.
Therefore teach RevWalk to traverse all parents when a commit is
UNINTERESTING and not only the first parent. Since this issue is
masked by commit parsing, also test situations when the commits
involved are unparsed.
Signed-off-by: Alex Spradlin <alexaspradlin@google.com>
Change-Id: I95e2ad9ae8f1f50fbecae674367ee7e0855519b1
It's expected that jgit should work without native git installation.
In such case Security Manager can be configured to deny access to the
files outside of git repository. JGit tries to find cygwin
installation. If Security manager restricts access to some folders
in PATH, it should be considered that those folders are absent
for jgit.
Also JGit tries to detect if symbolic links are supported by OS. If
security manager forbids creation of symlinks, it should be assumed
that symlinks aren't supported.
Bug: 550115
Change-Id: Ic4b243cada604bc1090db6cc1cfd74f0fa324b98
Signed-off-by: Nail Samatov <sanail@yandex.ru>
Teach UploadPack to take a provider of URIs corresponding to cached
packs. When fetching, if the client supports the packfile-uri feature,
and if such a cached pack were to be streamed, instead send the
corresponding URI.
This packfile-uri feature is implemented in the jt/fetch-cdn-offload
branch of Git. There is interest in this feature [1], but it is not yet
merged.
[1] https://public-inbox.org/git/cover.1552073690.git.jonathantanmy@google.com/
Change-Id: I9a32dae131c9c56ad2ff4a8a9638ae3b5e44dc15
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Previously, the API did not enforce ordering of writes. Misuse of
this API would lead to data effectively being lost.
Guard against that with IllegalArgumentException, and add a test.
Change-Id: I04f55c481d60532fc64d35fa32c47037a03988ae
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Small reftables omit the log index. Currently,
ReftableWriter#shouldHaveIndex does this if there is a single-block
log, but other writers could decide on different criteria.
In the case that the log index is missing, we have to linearly search
for the right block. It is never appropriate to use binary search on
blocks for log data, as the blocks are compressed and therefore
irregularly sized.
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Change-Id: Id59874edf6bf45c7dec502d9465888e077ffe198