If I create a repository containing an empty file and clone it
with
git clone --no-checkout --filter=blob:none \
https://url/of/repository
then I would expect no blobs to be transferred over the wire. Alas,
JGit rewrites filter=blob:none to filter=blob:limit=0, so if the
repository contains an empty file then the empty blob gets
transferred.
Fix it by teaching JGit about filters based on object type to
complement the existing filters based on object size. This prepares
us for other future filters such as object:none.
In particular, this means we do not need to look up the size of the
filtered blobs, which should speed up clones. Noticed by Anna
Pologova and Terry Parker.
Change-Id: Id4b234921a190c108d8be2c87f54dcbfa811602a
Signed-off-by: Jonathan Nieder <jrn@google.com>
Teach the FilterSpec serialization code about tree filters so they can
be communicated over the wire and understood by the server.
While we're here, harden the FilterSpec serialization code to throw
IllegalStateException if we encounter a FilterSpec that cannot be
expressed as a "filter" line. The only public API for creating a
Filterspec is to pass in a "filter" line to be parsed, so these should
not appear in practice.
Change-Id: I9664844059ffbc9c36eb829e2d860f198b9403a0
Signed-off-by: Jonathan Nieder <jrn@google.com>
OS X ships with a default /usr/bin/git that is just a wrapper that
at run-time delegates to the selected XCode toolchain, and that
prompts the user to install the XCode command line tools if not
already installed.
This is annoying for people who don't want to do so, since they'll
be prompted on each Eclipse start. Also, since on OS X the $PATH for
applications started via the GUI is not the same as the $PATH as set
via the shell profile, just using /usr/bin/git (which will normally
be found when JGit runs inside Eclipse) may give slightly surprising
results if the user has installed a non-Apple git and changed his
$PATH in the shell such that the non-Apple git is used in the shell.
(For instance by placing /usr/local/bin earlier on the path.) Eclipse
and the shell will use different git executables, and thus different
git system configs.
Therefore, try to find git via bash --login -c 'which git' not only
if we couldn't find it on $PATH but also if we found the default git
/usr/bin/git. If that finds some other git, use that. If the bash
approach also finds /usr/bin/git, double check via xcode-select -p
that an XCode git is present. If not, assume there is no git installed,
and work without any system config.
Bug: 564372
Change-Id: Ie9d010ebd9437a491ba5d92b4ffd1860c203f8ca
Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
Using a fixed thread pool with unbounded LinkedBlockingQueue fixes the
RejectedExecutionException thrown if too many threads try to
concurrently determine filesystem attributes.
Comparing that to an alternative implementation using an unbounded
thread pool instead showed similar performance with the reproducer (in
range of 100-1000 threads in reproducer) on my mac:
threads time
fixed threadpool up to 5 threads with LinkedBlockingQueue of unlimited
queue size
100 1103 ms
200 1602 ms
300 2369 ms
500 4002 ms
1000 11071 ms
unbounded cached threadpool
100 1108 ms
200 1591 ms
300 2299 ms
500 4577 ms
1000 11196 ms
Bug: 564202
Change-Id: I773da7414a1dca8e548349442dca9b56643be946
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Include implementation version in jgit library. This version is used
by other products that depend on JGit, and built using Bazel and not
consume officially released artifact from Central or Eclipse own Maven
repository.
Most notably, in Gerrit Code Review JGit agent that was previously
reported as "unknown", is now reported as:
JGit/v5.8.0.202006091008-r-16-g14c43828d
using this change [1].
[1] https://gerrit-review.googlesource.com/c/gerrit/+/272505
Change-Id: Ia50de9ac35b8dbe9e92d8ad7d0d14cd00f057863
Signed-off-by: David Ostrovsky <david@ostrovsky.org>
In JGit 5.0, the FileTreeIterator was changed to skip ignored folders
by default. To catch tracked files inside ignored folders, the tree
walk needs to have a DirCacheIterator, and the FileTreeIterator has
to know about that DirCacheIterator via setDirCacheIterator(). (Or
the optimization has to be switched off explicitly via
setWalkIgnoredDirectories(true).)
Skipping ignored directories is an important optimization in some
cases, for instance in node.js/npm projects, where we'd otherwise
traverse the whole huge and deep hierarchy of the typically ignored
node_modules folder.
While all uses of WorkingTreeIterator in JGit had been adapted,
DiffFormatter was forgotten. To make it work correctly (again) also
for such cases, make it set up a WorkingTreeeIterator automatically,
and make sure the WorkingTreeSource can find such files, too. Also
pass the repository to the TreeWalks used inside the DiffFormatter
to pick up the correct attributes, filters, and line-ending settings.
Bug: 565081
Change-Id: Ie88ac81166dc396ba28b83313964c1712b6ca199
Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
Make sure we don't produce a spurious empty line at the end.
Bug: 564428
Change-Id: Ib991d93fbd052baca65d32a7842f07f9ddeb8130
Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
The message "Too many commands" implies there is a hard limit on the
number of commands, which isn't the case. The limit is on the total
size of the received data, as explained in change I84317d396 which
introduced the configuration setting receive.maxCommandBytes:
shorter reference names allow for more commands, longer reference
names permit fewer commands per batch.
Change the message to:
Commands size exceeds limit defined in receive.maxCommandBytes
Change-Id: I678b78f919b2fec8f8058f3403f2541c26a5d00e
Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>
MergedReftable ignores the last reftable in the stack while calculating the
minUpdateIndex.
Update the loop indices to include all reftables in the minUpdateIndex
calculation, while skipping position 0 as it is read outside the loop.
Change-Id: I12d3e714581e93d178be79c02408a67ab2bd838e
Signed-off-by: Minh Thai <mthai@google.com>
Currently we're buffering the inflated bitmap entry in
BasePackBitmapIndex to optimize running time. However, this will use
lots of memory during the creation of the pack bitmap index file.
And change 161456, which rewrote the entire getBitmap method, increased
the fetch latency significantly.
This commit introduces getBitmapWithoutCaching method which is used in
the pack bitmap index file creation only and aims to save memory during
garbage collection and not increase fetch latency.
Change-Id: I7b982c9d4e38f5f6193eaa03894e894ba992b33b
Signed-off-by: Yunjie Li <yunjieli@google.com>
The current mechanism for updating the unpack error handler requires
that the error handler is replaced entirely, including communicating
the error to the user. Adding a getter means that delegating
implementations can be constructed so that the error can be processed
before sending to the user, for example for logging.
Change-Id: I4b6f78a041d0f6f5b4076a9a5781565ca3857817
Signed-off-by: Jack Wickham <jwickham@palantir.com>
ObjectWritingException and FileNotFoundException are subclasses
of IOException, which is already declared. Error does not need
to be explicitly declared.
Change-Id: I879820a33e10ec3a7ef676adc9c9148d2b3c4b27
Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>
- The code to move the file is repeated. Split it out into a
utility method.
- Remove the catch block for AtomicMoveNotSupportedException which
is redundant because it's handled in exactly the same way as the
IOException further down. The only exception we need to explicitly
handle differently in this block is NoSuchFileException.
- Improve the comments.
Change-Id: Ifc5490953ffb25ecd1c48a06289eccb3f19910c6
Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>
The shutdown method releases
- ThreadLocal held by NLS
- GlobalBundleCache used by NLS
- Executor held by WorkQueue
Bug: 437855
Bug: 550529
Change-Id: Icfdccd63668ca90c730ee47a52a17dbd58695ada
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
If a hunk does not apply at the position stated in the hunk header
try to determine its position using the old lines (context and
deleted lines).
This is still a far cry from a full git apply: it doesn't do binary
patches, it doesn't handle git's whitespace options, and it's perhaps
not the fastest on big patches. C git hashes the lines and uses these
hashes to speed up matching hunks (and to do its whitespace magic).
Bug: 562348
Change-Id: Id0796bba059d84e648769d5896f497fde0b787dd
Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
Running recent error prone version complaining on that code:
CharacterHead.java:22: error: [ProtectedMembersInFinalClass] Make
members of final classes package-private: <init>
protected CharacterHead(char expectedCharacter) {
^
(see https://errorprone.info/bugpattern/ProtectedMembersInFinalClass)
Did you mean 'CharacterHead(char expectedCharacter) {'
Bug: 562756
Change-Id: Ic46a0b07e46235592f6e63db631f583303420b73
Signed-off-by: David Ostrovsky <david@ostrovsky.org>
On the first attempt to move the temp file, NoSuchFileException can
be raised if the destination folder does not exist. Instead of handling
this implicitly in the catch of IOException and then continuing to
create the destination folder and try again, explicitly catch it and
create the destination folder. If any other IOException occurs, treat
it as an unexpected error and return FAILURE.
Subsequently, on the second attempt to move the temp file, if ANY kind
of IOException occurs, also consider this an unexpected error and
return FAILURE.
In both catch blocks for IOException, add logging at ERROR level.
Change-Id: I9de9ee3d2b368be36e02ee1c0daf8e844f7e46c8
Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>
If atomic move is not supported, AtomicMoveNotSupportedException will
be thrown on the first attempt to move the temp file. There is no
point attempting the move operation a second time because it will only
fail for the same reason.
Add an immediate return of FAILURE on the first occasion. Remove the
unnecessary handling of the exception in the second block.
Change-Id: I4658a8b37cfec2d7ef0217c8346e512968d0964c
Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>
Running recent error prone version complaining on that code:
RefDatabase.java:444: error: [InvalidInlineTag] Tag name `linkObjectId`
is unknown.
* Includes peeled {@linkObjectId}s. This is the inverse lookup of
^
(see https://errorprone.info/bugpattern/InvalidInlineTag)
Bug: 562756
Change-Id: If91da51d5138fb753c0550eeeb9e3883a394123d
Signed-off-by: David Ostrovsky <david@ostrovsky.org>
Motivation: JSch serves as 'default' implementations of the SSH
transport. If a client application does not use it then there is no need
to pull in this dependency.
Move the classes depending on JSch to an OSGi fragment extending the
org.eclipse.jgit bundle and keep them in the same package as before
since moving them to another package would break API. Defer moving them
to a separate package to the next major release.
Add a new feature org.eclipse.jgit.ssh.jsch feature to enable
installation. With that users can now decide which of the ssh client
integrations (JCraft JSch or Apache Mina SSHD) they want to install.
We will remove the JCraft JSch integration in a later step due to the
reasons discussed in bug 520927.
Bug: 553625
Change-Id: I5979c8a9dbbe878a2e8ac0fbfde7230059d74dc2
Also-by: Michael Dardis <git@md-5.net>
Signed-off-by: Michael Dardis <git@md-5.net>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Signed-off-by: David Ostrovsky <david@ostrovsky.org>
Motivation: BouncyCastle serves as 'default' implementation of
the GPG Signer. If a client application does not use it there is no need
to pull in this dependency, especially since BouncyCastle is a large
library.
Move the classes depending on BouncyCastle to an OSGi fragment extending
the org.eclipse.jgit bundle. They are moved to a distinct internal
package in order to avoid split packages. This doesn't break public API
since these classes were already in an internal package before this
change.
Add a new feature org.eclipse.jgit.gpg.bc to enable installation. With
that users can now decide if they want to install it.
Attempts to sign a commit if org.eclipse.jgit.gpg.bc isn't available
will result in ServiceUnavailableException being thrown.
Bug: 559106
Change-Id: I42fd6c00002e17aa9a7be96ae434b538ea86ccf8
Also-by: Michael Dardis <git@md-5.net>
Signed-off-by: Michael Dardis <git@md-5.net>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Signed-off-by: David Ostrovsky <david@ostrovsky.org>
If the determination of the user home directory produces a Java File
object with an invalid path, spurious exceptions may occur at the
most inopportune moments anytime later. In the case in the linked bug
report, start-up of EGit failed, leading to numerous user-visible
problems in Eclipse.
So validate the return value of FS.userHomeImpl(). If converting that
File to a Path throws an exception, log the problem and fall back to
Java system property user.home. If that also is not valid, use null.
(A null user home directory is allowed by FS, and calling in Java
new File(null, "some_string") is fine and produces a File relative
to the current working directory.)
Bug: 563739
Change-Id: If9eec0f9a31a45bd815231706285c71b09f8cf56
Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
Make it possible to programmatically suppress the JMX bean
registration. In EGit it is not needed but can be rather costly
because it occurs during plug-in activation and accesses the
git user config.
Bug: 563740
Change-Id: I07ef7ae2f0208d177d2a03862846a8efe0191956
Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
Only the presence or absence of whitespace is significant; but not the
actual whitespace characters. Don't compare whitespace bytes.
Compare the C git implementation at [1].
[1] https://github.com/git/git/blob/0d0e1e8/xdiff/xutils.c#L173
Bug: 563570
Change-Id: I2d0522b637ba6b5c8b911b3376a9df5daa9d4c27
Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
This reverts commit 3aee92478c, which
increased fetch latency significantly.
Change-Id: Id31a94dff83bf7ab2121718ead819bd08306a0b6
Signed-off-by: Yunjie Li <yunjieli@google.com>
A builder API provides a more convenient way to define a customized
SshdSessionFactory by hiding the subclassing.
Also provide a new interface SshConfigStore to abstract away the
specifics of reading a ssh config file, and provide a way to customize
the concrete ssh config implementation to be used. This facilitates
using an alternate ssh config implementation that may or may not be
based on files.
Change-Id: Ib9038e8ff2a4eb3a9ce7b3554d1450befec8e1e1
Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
Avoid trying other authentication methods on SocketException or on
InterruptedIOException. SocketException is rather fatal, such as
nothing listening on the peer's port, connection reset, or it could
be a connection time-out.
Time-outs enforced by Timeout{Input,Output}Stream may result in
InterruptedIOException being thrown.
In both cases, it makes no sense to try other authentication methods,
and doing so may wrongly report "authentication not supported" or
"cannot open git-upload-pack" or some such instead of reporting a
time-out.
Bug: 563138
Change-Id: I0191b1e784c2471035e550205abd06ec9934fd00
Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
Config core.eol is to be ignored if core.autocrlf is true or input.[1]
JGit didn't do so when core.autocrlf=input was set.
[1] https://git-scm.com/docs/git-config#Documentation/git-config.txt-coreeol
Bug: 561877
Change-Id: I5e62e0510d160b5113c1090319af09c2bc1bcb59
Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
In Git 2.10.0 the interpretation of gitattributes changed or was fixed
such that "* text=auto eol=crlf" would indeed still do auto-detection
of text vs. binary content.[1] Previously this was identical to
"* text eol=crlf", i.e., treating all files as text.
JGit still did the latter, which caused surprises because it changed
binary files.
[1] https://github.com/git/git/blob/master/Documentation/RelNotes/2.10.0.txt#L248
Bug: 561341
Change-Id: I5b6fb97b5e86fd950a98537b6b8574f768ae30e5
Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
Update dependency to Bouncy Castle to 1.65.
Add the IssuerFingerprint as a hashed sub-packet in the signature. If
added unhashed, GPG ignores it.
Bug: 553206
Change-Id: I6807e8e2385e6ec5790f388e4753a44aa9474ebb
Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
OSGi semantic versioning allows breaking implementers in a minor
release.
Change-Id: Ib55dc43dd3b50b0ef39a7094190f230210aee4b6
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Setting the distance threshold to 2000 in PackWriterBitmapPreparer to
reduce memory usage in garbage collection. When the threshold is 0, GC
for the msm repository would use about 37 GB memory to complete. After
setting it to 2000, GC can finish in 75 min with about 10 GB memory.
Change-Id: I39783eeecbae58261c883735499e61ee1cac75fe
Signed-off-by: Yunjie Li <yunjieli@google.com>
Currently we're buffering the inflated bitmap entry in BasePackBitmapIndex
to optimize running time. However, this will use lots of memory during
the construction of the pack bitmap index file which may cause failure of
garbage collection.
The running time didn't increase significantly, if there's any increase,
after removing the buffering here. The report about usage of time/memory
will come in the next commit.
Change-Id: I874503ecc85714acab7ca62a6a7968c2dc0b56b3
Signed-off-by: Yunjie Li <yunjieli@google.com>
The convertedBitmaps serves for time-optimization purpose. But it's
actually not saving time much but using lots of memory. So remove the
field here to save memory.
Currently the remapper class is only used in the construction of the
bitmap index file. And during the preparation of the file, we're only
getting bitmaps from the remapper when finding objects accessible from
a commit, so bitmap associated with each commit will only be fetched once
and thus the convertedBitmaps would hardly be read, which means that it's
not saving time.
Change-Id: Ic942a8e485135fb177ec21d09282d08ca6646fdb
Signed-off-by: Yunjie Li <yunjieli@google.com>
Currently, the garbage collection is consistently failing for some large
repositories in the building bitmap phase, e.g.Linux-MSM project:
https://source.codeaurora.org/quic/la/kernel/msm-3.18
Historically, bitmap index creation happened in 3 phases:
1. Select the commits to which bitmaps should be attached.
2. Create all bitmaps for these commits, stored in uncompressed format
in the PackBitmapIndexBuilder.
3. Deltify the bitmaps and write them to disk.
We investigated the process. For phase 2 it's most efficient to create
bitmaps starting with oldest commit and moving to the newest commit,
because the newer commits are able to reuse the work for the old ones.
But for bitmap deltification in phase 3, it's better when a newer
commit's bitmap is the base, and the current disk format writes bitmaps
out for the newest commits first.
This change introduces a new collection to hold the deltified and
compressed representations of the bitmaps, keeping a smaller subset of
commits in the PackBitmapIndexBuilder to help make the bitmap index
creation more memory efficient.
And in this commit, we're setting DISTANCE_THRESHOLD to 0 in the
PackWriterBitmapPreparer, which means the garbage collection will not
have much behavoir change and will still use as much memory as before.
Change-Id: I6ec2c3e8dde11805af47874d67d33cf1ef83660e
Signed-off-by: Yunjie Li <yunjieli@google.com>
Add a new revwalk filter, AddToBitmapWithCachedFilter. This filter updates
a client-provided {@code BitmapBuilder} as a side effect of a revwalk.
Similar to {@code AddToBitmapFilter}, it short circuits the walk when it
encounters a commit which is included in the provided bitmap's BitmapIndex.
It also short circuits the walk if it encounters the client-provided
cached commit.
Change-Id: I62cb503016f4d3995d648d92b82baab7f93549a9
Signed-off-by: Yunjie Li <yunjieli@google.com>
Add some utility methods and a builder class for BitmapCommit class in
preparation for improving the memory footprint of GC's bitmap generation
phase.
Change-Id: Ice3d257fc26f3917a65a64eaf53b508b89043caa
Signed-off-by: Yunjie Li <yunjieli@google.com>
Move BitmapCommit from inside the PackWriterBitmapPreparer to a new
top-level class in preparation for improving the memory footprint of GC's
bitmap generation phase.
Change-Id: I4d404a5b3a34998b441d23105197f33d32d39670
Signed-off-by: Yunjie Li <yunjieli@google.com>
Make retrieveCompressed() a method of Bitmap interface to avoid type
casting and later reuse in improving the memory footprint of GC's bitmap
generation phase.
Change-Id: I098d85105cf17af845d43b8c71b4ca48b02fd7da
Signed-off-by: Yunjie Li <yunjieli@google.com>
When downloading LFS objects also accept response code 203 as successful
download. This response may be seen when downloading via a proxy.
Bug: 563022
Change-Id: Iee85fdb451b33369d08859872e5bfc2a67dffa6d
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Introduce an IterativeConnectivityChecker which runs a connectivity
check with a filtered set of references, and falls back to using the
full set of advertised references.
It uses references during first check attempt:
- References that are ancestors of an incoming commits (e.g., pushing
a commit onto an existing branch or pushing a new branch based on
another branch)
- Additional list of references we know client can be interested in
(e.g. list of open changes for Gerrit)
We tested it inside Google and it improves connectivity for certain
topologies. For example connectivity counts for
chromium.googlesource.com/chromium/src:
percentile_50: 1923 (was: 22777)
percentile_90: 23272 (was: 353003)
percentile_99: 345522 (was: 353435)
This saved ~2 seconds on every push to this repository.
Signed-off-by: Demetr Starshov <dstarshov@google.com>
Change-Id: I6543c2e10ed04622ca795b195665133e690d3b10
Moving transport related internal classes into dedicated subpackage in
o/e/j/internal package.
Signed-off-by: Demetr Starshov <dstarshov@google.com>
Change-Id: I21ed029d359f5f7d8298f102efbb4b1dcdf404ad