Android an Chrome have several repos with >300k refs. We sometimes see
negotiations of >100k rounds. This change provides a "minimal negotiation"
feature on the client side that limits how many "have" lines the client
sends. The client extracts the current SHA-1 values for the refs in its
wants set, and terminates negotiation early when all of those values have
been sent as haves. If a new branch is being fetched then that set will
be empty and the client will terminate after current default minimum
of two rounds.
This feature is gated behind a "fetch.useminimalnegotiation" configuration
flag, which defaults to false.
Change-Id: Ib12b095cac76a59da6e8f72773c4129e3b32ff2b
Signed-off-by: Terry Parker <tparker@google.com>
This would allow compact and GC process to clean up duplicate ref names in the reftables.
Change-Id: I2b9df0bf72dba63cc3525e374982e60559a776c2
Signed-off-by: Minh Thai <mthai@google.com>
When CleanCommand is collecting the files and folders to be deleted
it may happen that the list of directories contains obsolete entries.
E.g. a folder and its parent folder may be in the list. Only the
parent folder would be sufficient.
This was a reason for hitting FileNotFoundExceptions when finally
trying to delete the files and folders. Improve CleanCommand
to ignore files to be deleted which are already gone.
Bug: 514434
Change-Id: I10caa01bfb9cec5967dfdaea50c6e4a713eeeabd
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
After packaging references, the folders containing these references are
not deleted. In a busy repository, this causes operations to slow down
as traversing the references tree becomes longer.
Delete empty reference folders after the loose references have been
packed.
To avoid deleting a folder that was just created by another concurrent
operation, only delete folders that were not modified in the last 30
seconds.
Signed-off-by: Hector Oswaldo Caballero <hector.caballero@ericsson.com>
Change-Id: Ie79447d6121271cf5e25171be377ea396c7028e0
This doesn't handle the really hard thing, which is merging spurious
conflicts inside .gitmodules files. That's OK: git.git doesn't
either. Users can resolve the conflict themselves and then commit
the merge.
Previously, jgit would crash when attempting to merge conflicting
submodule changes. Even if there was no conflict, after a merge which
adds submodules, the repository would have been missing empty
directories for newly-added submodules.
This patch fixes the crash, and adds the empty directories where
necessary. It ensures that the index is in a conflicted state when
submodule changes conflict.
Reported-by: Alexey Korobkov
Bug: 494551
Change-Id: I79db6798c2bdcc1159b5b2589b02da198dc906a1
Signed-off-by: David Turner <dturner@twosigma.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Commit fc7d407 corrected line endings for working tree files resulting
from merges when CRLF translations are to be done. However, that also
resulted in the file content being put as-is into the index, which is
wrong. The index must contain the file content with reverse CRLF
translations applied.
With core.autocrlf=true, the working tree file should have CR-LF, but
the index blob must still contain only LF.
Fix this oversight and apply the inverse translation when updating the
index, similar to what is done in AddCommand.
Bug: 499615
Change-Id: I3a33931318bdb580b2390f3450f91ea8f258a6a4
Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
Merges are performed using the raw text as stored in the git
repository. When we write the merge result, we must apply the
correct CRLF settings. Otherwise the line endings in the result
will be wrong.
Bug: 499615
Change-Id: I37a9b987e9404c97645d2720cd1c7c04c076a96b
Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
* Section and key names in git config files are case-insensitive.
* If an include directive is invalid, include the line in the
exception message.
* If inclusion of the included file fails, put the file name into
the exception message so that the user knows in which file the
problem is.
Change-Id: If920943af7ff93f5321b3d315dfec5222091256c
Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
Jsch caches keys (aka identities) specified in ~/.ssh/config via
IndentityFile only for the current Jsch Session. This results in
multiple password prompts for successive sessions.
Do the handling of IdentityFile exclusively in JGit, as it was before
4.9. JGit uses different Jsch instances per host and caches the
IdentityFile there, allowing it to be re-used in different sessions
for the same host.
* Add comments to explain this.
* Move the JschBugFixingConfig from OpenSshConfig to
JschConfigSessionFactory to have all these Jsch work-arounds
in one place.
* Make that config hide the IdentityFile config from Jsch to avoid
that Jsch overrides the JGit behavior.
Bug: 529173
Change-Id: Ib36c34a2921ba736adeb64de71323c2b91151613
Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
Instead of instantiating a new Random on each invocation of newLargeBlob,
create it once and reuse it.
This fixes a warning raised by Spotbugs about the Random object being
created and only used once.
Change-Id: I5b8e6ccbbc92641811537808aed9eae2034c1133
Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>
Change-Id: Iaaefc2cbafbf083d6ab158b1c378ec69cc76d282
Signed-off-by: David Turner <dturner@twosigma.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Originally the patterns were escaped twice leading
to wrong matching results.
Bug: 528886
Change-Id: I26e201b4b0ef51cac08f940b76f381260fa925ca
Signed-off-by: Dmitry Pavlenko <pavlenko@tmatesoft.com>
Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>
These are ignored by C git when parsing:
$ git config -f - --list <<EOF
[foo "x\0y"]
bar = baz
[foo "x\qy"]
bar = baz
[foo "x\by"]
bar = baz
[foo "x\ny"]
bar = baz
[foo "x\ty"]
bar = baz
EOF
foo.x0y.bar=baz
foo.xqy.bar=baz
foo.xby.bar=baz
foo.xny.bar=baz
foo.xty.bar=baz
This behavior is different from value parsing, where an invalid escape
sequence is an error (which JGit already does as well):
$ git config -f - --list <<EOF
[foo]
bar = x\qy
EOF
fatal: bad config line 2 in standard input
Change-Id: Ifd40129b37d9a62df3d886d8d7e22f766f54e9d1
When interleaving reads and writes from an unflushed pack, we forgot to
reset the file pointer back to the end of the file before writing more
new objects. This had at least two unfortunate effects:
* The pack data was potentially corrupt, since we could overwrite
previous portions of the file willy-nilly.
* The CountingOutputStream would report more bytes read than the size
of the file, which stored the wrong PackedObjectInfo, which would
cause EOFs during reading.
We already had a test in PackInserterTest which was supposed to catch
bugs like this, by interleaving reads and writes. Unfortunately, it
didn't catch the bug, since as an implementation detail we always read a
full buffer's worth of data from the file when inflating during
readback. If the size of the file was less than the offset of the object
we were reading back plus one buffer (8192 bytes), we would completely
accidentally end up back in the right place in the file.
So, add another test for this case where we read back a small object
positioned before a large object. Before the fix, this test exhibited
exactly the "Unexpected EOF" error reported at crbug.com/gerrit/7668.
Change-Id: I74f08f3d5d9046781d59e5bd7c84916ff8225c3b
Previously, Config was using the same method for both escaping and
parsing subsection names and config values. The goal was presumably code
savings, but unfortunately, these two pieces of the git config format
are simply different.
In git v2.15.1, Documentation/config.txt says the following about
subsection names:
"Subsection names are case sensitive and can contain any characters
except newline (doublequote `"` and backslash can be included by
escaping them as `\"` and `\\`, respectively). Section headers cannot
span multiple lines. Variables may belong directly to a section or to
a given subsection."
And, later in the same documentation section, about values:
"A line that defines a value can be continued to the next line by
ending it with a `\`; the backquote and the end-of-line are stripped.
Leading whitespaces after 'name =', the remainder of the line after
the first comment character '#' or ';', and trailing whitespaces of
the line are discarded unless they are enclosed in double quotes.
Internal whitespaces within the value are retained verbatim.
Inside double quotes, double quote `"` and backslash `\` characters
must be escaped: use `\"` for `"` and `\\` for `\`.
The following escape sequences (beside `\"` and `\\`) are recognized:
`\n` for newline character (NL), `\t` for horizontal tabulation (HT,
TAB) and `\b` for backspace (BS). Other char escape sequences
(including octal escape sequences) are invalid."
The main important differences are that subsection names have a limited
set of supported escape sequences, and do not support newlines at all,
either escaped or unescaped. Arguably, it would be easy to support
escaped newlines, but C git simply does not:
$ git config -f foo.config $'foo.bar\nbaz.quux' value
error: invalid key (newline): foo.bar
baz.quux
I468106ac was an attempt to fix one bug in escapeValue, around leading
whitespace, without having to rewrite the whole escaping/parsing code.
Unfortunately, because escapeValue was used for escaping subsection
names as well, this made it possible to write invalid config files, any
time Config#toText is called with a subsection name with trailing
whitespace, like {foo }.
Rather than pile hacks on top of hacks, fix it for real by largely
rewriting the escaping and parsing code.
In addition to fixing escape sequences, fix (and write tests for) a few
more issues in the old implementation:
* Now that we can properly parse it, always emit newlines as "\n" from
escapeValue, rather than the weird (but still supported) syntax with a
non-quoted trailing literal "\n\" before the newline. In addition to
producing more readable output and matching the behavior of C git,
this makes the escaping code much simpler.
* Disallow '\0' entirely within both subsection names and values, since
due to Unix command line argument conventions it is impossible to pass
such values to "git config".
* Properly preserve intra-value whitespace when parsing, rather than
collapsing it all to a single space.
Change-Id: I304f626b9d0ad1592c4e4e449a11b136c0f8b3e3
It was already increased in 61a943e and 661232b but is still not
enough to take into account snapshot versions that are 100 or more
commits ahead of tag, i.e. 4.9.2.201712150930-r.105-gc1d37ca27
Change-Id: Ibeff73adae06b92fe5bb9c5eced9e4c6a08c437c
Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>
The Config class must be safe to run against untrusted input files.
Reading arbitrary local system paths using include.path is risky for
servers, including Gerrit Code Review.
This was fixed on master [1] by making "readIncludedConfig" a noop
by default. This allows only FileBasedConfig, which originated from
local disk, to read local system paths.
However, the "readIncludedConfig" method was only introduced in [2]
which was needed by [3], both of which are only on the master branch.
On the stable branch only Config supports includes. Therefore this
commit simply disables the include functionality.
[1] https://git.eclipse.org/r/#/c/113371/
[2] https://git.eclipse.org/r/#/c/111847/
[3] https://git.eclipse.org/r/#/c/111848/
Bug: 528781
Change-Id: I9a3be3f1d07c4b6772bff535a2556e699a61381c
Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>
The Config class must be safe to run against untrusted input files.
Reading arbitrary local system paths using include.path is risky for
servers, including Gerrit Code Review. Return null by default to
incide the include should be ignored.
Only FileBasedConfig which originated from local disk should be trying
to read local system paths. FileBasedConfig already overrides this
method with its own implementation.
Change-Id: I2ff31753868aa1bbac4a6843a4c23e50bd6f46f3
Path.getFileName() may return null if the path has zero elements.
Enclose the dereference in a null-check.
Change-Id: I7ea3d3f07edc13a80b593d28e8fd512a4e1ed56b
Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>
Instead of hard-coding the charset strings "US-ASCII", "UTF-8", and
"ISO-8859-1", use the corresponding constants from StandardCharsets.
UnsupportedEncodingException is not thrown when the StandardCharset
constants are used, so remove the now redundant handling.
Because the encoding names are no longer hard-coded strings, also
remove redundant $NON-NLS warning suppressions.
Also replace existing usages of the constants with static imports.
Change-Id: I0a4510d3d992db5e277f009a41434276f95bda4e
Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>
ConfigTest#pathToString is not visible to FileBasedConfigTest when
bulding with bazel.
Move it to FileUtils rather than messing about with the bazel build
rules to make it visible.
Change-Id: Idcfd4822699dac9dc4a426088a929a9cd31bf53f
Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>
Relative include.path are now resolved against the config's parent
directory. include.path starting with ~/ are resolved against the
user's home directory
Change-Id: I91911ef404126618b1ddd3589294824a0ad919e6
Signed-off-by: Marc Strapetz <marc.strapetz@syntevo.com>
Change-Id: I37a2ef611aef97faf1b891a9660c1745435a915d
Signed-off-by: Marc Strapetz <marc.strapetz@syntevo.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
When a GC operation is interrupted, temporary packs and indexes can be
left on the pack folder. In big, busy repositories this can lead to
significant amounts of wasted disk space if this interruption is done
with a certain frequency.
Remove stale temporary packs and indexes at the end of the GC process so
they do not accumulate. To avoid interfering with a possible concurrent
JGit GC process in the same repository, only delete temporary files that
are older than one day.
Change-Id: If9b6c1e57fac8a6a0ecc0a703089634caba4caae
Signed-off-by: Hector Caballero <hector.caballero@ericsson.com>
Jsch 0.1.54 passes on the values from ~/.ssh/config for
"ServerAliveInterval" and "ConnectTimeout" as read from
the config file to java.net.Socket.setSoTimeout(). That
method expects milliseconds, but the values in the config
file are seconds!
The missing conversion in Jsch means that the timeout is
set way too low, and if the server doesn't respond within
that very short time frame, Jsch kills the connection and
then throws an exception with a message such as "session is
down" or "timeout in waiting for rekeying process".
As a work-around, do the conversion to milliseconds in the
Jsch-facing Config interface of OpenSshConfig. That way Jsch
already gets these values as milliseconds.
Bug: 526867
Change-Id: Ibc9b93f7722fffe10f3e770dfe7fdabfb3b97e74
Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
A tombstone will prevent a delayed reference update from resurrecting the
deleted reference.
Change-Id: Id9f4df43d435a299ff16cef614821439edef9b11
Signed-off-by: Minh Thai <mthai@google.com>
Do not use 0 as the unset value for minUpdateIndex, as input reftables
may have minUpdateIndex starting at 0.
Change-Id: Ie040a6b73d4a5eba5521e51d0ee4580713c84a3e
Signed-off-by: Minh Thai <mthai@google.com>
So far, in order to get the pack directory it was necessary to resolve
it from the object directory. This resolution is already done when
creating the object directory, so simplify the call by just adding a
getter to the pack directory.
Change-Id: I69e783141dc6739024e8b3d5acc30843edd651a7
Signed-off-by: Hector Caballero <hector.caballero@ericsson.com>
Currently, unless RequestPolicy#ANY is used, UploadPack rejects all
non-commit "want" lines unless they were advertized. This is fine,
except when "uploadpack.allowreachablesha1inwant" is true
(corresponding to RequestPolicy#REACHABLE_COMMIT), in which case one
would expect that "want"-ing anything reachable would work.
(There is no restriction that "want" lines must only contain commits -
it is allowed for refs to directly point to trees and blobs, and
requesting for them using "want" lines works.)
This commit has been written to avoid performance regressions as much
as possible. In the usual (and currently working) case where the only
unadvertized things requested are commits, we do a standard RevWalk in
order to avoid incurring the cost of loading bitmaps. However, if
unadvertized non-commits are requested, bitmaps are used instead, and
if there are no bitmaps, a WantNotValidException is thrown (as is
currently done).
Change-Id: I68ed4abd0e477ff415c696c7544ccaa234df7f99
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
JGit's delta handling code requires the target to be a single byte
array. Any attempt to inflate a delta larger than fits in the 2GiB
limit will fail with some form of array index exceptions. Check for
this overflow early and abort pack parsing.
Change-Id: I5bb3a71f1e4f4e0e89b8a177c7019a74ee6194da
This new warning was introduced in Eclipse 4.7 Oxygen [1].
The only instances of the warning are in test code that is asserting
that some class does not compare equal to Strings. As in the Gerrit
project [2] these asserts are arguably overkill, but arguably also
a reasonable test of an equals implementation. Ignore the warning in
these cases.
Note that if the project is opened in an earlier version of Eclipse,
a warning "Unsupported @SuppressWarnings" will be emitted.
[1] https://www.eclipse.org/eclipse/news/4.7/M6/
[2] https://gerrit-review.googlesource.com/#/c/gerrit/+/110339/
Change-Id: I08ea33d71e6009cf0f37e6492a475931f447256b
Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>
In a subsequent commit, more tests will be added. This commit allows
those tests to reuse fields.
Change-Id: Icbd17d158cfe3ba4dacbd8a11a67f9e7607b41b3
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Applications that use ObjectInserters to create lots of individual
objects may prefer to avoid cluttering up the object directory with
loose objects. Add a specialized inserter implementation that produces a
single pack file no matter how many objects. This inserter is loosely
based on the existing DfsInserter implementation, but is simpler since
we don't need to buffer blocks in memory before writing to storage.
An alternative for such applications would be to write out the loose
objects and then repack just those objects later. This operation is not
currently supported with the GC class, which always repacks existing
packs when compacting loose objects. This in turn requires more
CPU-intensive reachability checks and extra I/O to copy objects from old
packs to new packs.
So, the choice was between implementing a new variant of repack, or not
writing loose objects in the first place. The latter approach is likely
less code overall, and avoids unnecessary I/O at runtime.
The current implementation does not yet support newReader() for reading
back objects.
Change-Id: I2074418f4e65853b7113de5eaced3a6b037d1a17
This makes detection of binaries exact for ResolveMerger and
DiffFormatter: they will classify files as binary regardless of where
the '\0' occurs in the text.
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Change-Id: Id4342a199628d9406bfa04af1b023c27a47d4014
This method creates a RawText from a blob, but avoids reading the blob
if the start contains null bytes. This should reduce the amount of
garbage that Gerrit produces for changes with binaries.
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Change-Id: Idd202d20251f2d1653e5f1ca374fe644c2cf205f