The main concern are submodule urls starting with '-' that could pass as
options to an unguarded tool.
Pass through the parser the ids of blobs identified as .gitmodules
files in the ObjectChecker. Load the blobs and parse/validate them
in SubmoduleValidator.
Change-Id: Ia0cc32ce020d288f995bf7bc68041fda36be1963
Signed-off-by: Ivan Frade <ifrade@google.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
In order to validate .gitmodules files, we first need to find them
in the incoming pack.
Do it in the ObjectChecker stage. Check in the tree objects if they
point to a .gitmodules file and report the tree id and the .gitmodules
blob id.
This can be used later to check if the file is in the root of the
project and if the contents are good.
While we're here, make isMacHFSGit more accurate by detecting variants
of filenames that vary in case.
[jn: tweaked NTFS and HFS+ checking; added more tests]
Change-Id: I70802e7d2c1374116149de4f89836b9498f39582
Signed-off-by: Ivan Frade <ifrade@google.com>
Signed-off-by: Jonathan Nieder <jrn@google.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
In C git versions before 2.19.1, the submodule is fetched by running
"git clone <uri> <path>". A URI starting with "-" would be interpreted
as an option, causing security problems. See CVE-2018-17456.
Refuse to add submodules with URIs, names or paths starting with "-",
that could be confused with command line arguments.
[jn: backported to JGit 4.7.y, bringing portions of Masaya Suzuki's
dotdot check code in v5.1.0.201808281540-m3~57 (Add API to specify
the submodule name, 2018-07-12) along for the ride]
Change-Id: I2607c3acc480b75ab2b13386fe2cac435839f017
Signed-off-by: Ivan Frade <ifrade@google.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
This test was never being run. Since it was introduced it was
named "notest.." which meant it didn't run with JUnit3, and
since it is not annotated @Test it also doesn't run with JUnit4.
When compiling with Bazel 0.6.0, error-prone raises an error
that the public method is not annotated with @Ignore or @Test.
Given that the test has never been run anyway, we can just
remove it.
Bug: 525415
Change-Id: Ie9a54f89fe42e0c201f547ff54ff1d419ce37864
Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>
Remove completely the empty directories under refs/<namespace>
including the first level partition of the changes, when they are
completely empty.
Bug: 536777
Change-Id: I88304d34cc42435919c2d1480258684d993dfdca
Signed-off-by: Luca Milanesio <luca.milanesio@gmail.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
After packaging references, the folders containing these references are
not deleted. In a busy repository, this causes operations to slow down
as traversing the references tree becomes longer.
Delete empty reference folders after the loose references have been
packed.
To avoid deleting a folder that was just created by another concurrent
operation, only delete folders that were not modified in the last 30
seconds.
Signed-off-by: Hector Oswaldo Caballero <hector.caballero@ericsson.com>
Change-Id: Ie79447d6121271cf5e25171be377ea396c7028e0
Signed-off-by: Luca Milanesio <luca.milanesio@gmail.com>
Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>
When creating a new PackFile instance it is specified whether this pack
has an associated bitmap index file or not. This information is cached
and the public method getBitmapIndex() will always assume a bitmap index
file must exist if the cached data tells so. But it may happen that the
packfiles are repacked during a gc in a different process causing the
packfile, bitmap-index and index file to be deleted. Since JGit still
has an open FileHandle on the packfile this file is not really deleted
and can still be accessed. But index and bitmap index file are deleted.
Fix getBitmapIndex() to invalidate the cached packfile instance if such
a situation occurs.
This problem showed up when a gerrit server was serving repositories
which where garbage collected with native git regularly. Fetch and
clone commands for certain repositories failed permanently after a
native git gc had deleted old bitmap index files.
Change-Id: I8e620bec74dd3f310ba42024f9a657062f868f0e
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
This is a workaround for
https://bugs.openjdk.java.net/browse/JDK-4666701.
Change-Id: Idd04657e8d95a841d72230f8881b6b899daadbc2
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
In the repo manifest documentation [1] the fetch attribute is marked
as "#REQUIRED".
If the fetch attribute is not specified, this would previously result in
NullPointerException. Throw a SAXException instead.
[1] https://gerrit.googlesource.com/git-repo/+/master/docs/manifest-format.txt
Change-Id: Ib8ed8cee6074fe6bf8f9ac6fc7a1664a547d2d49
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>
It was already increased in 61a943e, but that was still not enough to
take into account the length of snapshot versions.
Change-Id: Ib54cec97e97042fe274b87a3a1afa9bb06c8bf19
Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>
A higher limit is required to account for proper JGit version number
being sent in the UserAgent.
The version string "4.7.0.201704031717-r" is 20 characters, however
the strings used during development are shorter:
- When running from mvn, "4.7.0.qualifier" is used; 15 characters
- When running in Eclipse, "unknown" is used; 7 characters
Change-Id: I9aca2f71389a42fedce305e9078db016869c3d1a
Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>
Add a new API method to set the recurse mode, and pass the mode into
the fetch command.
Extend the existing FetchCommandRecurseSubmodulesTest to also perform
the same tests for fetch. Rename the test class accordingly.
Change-Id: I12553af47774b4778f7011e1018bd575a7909bd0
Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>
In I3ab958ce8 explicit dependency in lib/BUILD were defined and most
of the bazel build implementation was switched to using it. Switch
test.bzl test implementation to using explicit dependencies as well.
Change-Id: I4413d1a45addeeb2a980d07669fa034c2eebb3a4
Signed-off-by: David Ostrovsky <david@ostrovsky.org>
Add bazel build for ui and junit.http, and the test packages.
A number of different test labels are supported:
api
attributes
dfs
diff
http
lfs
lfs-server
nls
notes
pack
patch
pgm
reftree
revplot
revwalk
storage
submodule
symlinks
transport
treewalk
util
To run all tests:
bazel test //...
To run specific tests, using labels:
bazel test --test_tag_filters=api,dfs,revplot,treewalk //...
Change-Id: Ic41b05a79d855212e67b1b4707e9c6b4dc9ea70d
Signed-off-by: David Ostrovsky <david@ostrovsky.org>
Signed-off-by: Jonathan Nieder <jrn@google.com>
This is a preparation change to Bazel build implementation. Error
Prone rejects the code with variable crypto algorithm as insecure
see: [1].
[1] http://errorprone.info/bugpattern/InsecureCryptoUsage
Change-Id: I92db70a7da454bc364597a995e8be5dccc2d6427
Signed-off-by: David Ostrovsky <david@ostrovsky.org>
With the filename suffix "Tests", the module was not included in tests
when building with Maven, and without the @Test annotations the tests
didn't get executed under Eclipse or buck test.
testRacyGitDetection was failing because the index file did not exist.
Add the missing configuration, the missing annotations, and add a call
to reset() in testRacyGitDetection to force creation of the index file.
Change-Id: I29dd8f89c36fef4ab40bedce7f4a26bd9b2390e4
Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>
This fixes error flagged by error prone:
Java compilation in rule '//org.eclipse.jgit.test:jgit' failed: Worker
process sent response with exit code: 1.
org.eclipse.jgit.test/tst/org/eclipse/jgit/revwalk/RevFlagSetTest.java:149:
error: [CollectionIncompatibleType] Argument '"bob"' should not be
passed to this method; its type String is not compatible with its
collection's type argument RevFlag
assertFalse(set.contains("bob"));
Change-Id: I4a971ce92fee55e28b2ab0c7b716ac20fa9c6709
Signed-off-by: David Ostrovsky <david@ostrovsky.org>
The submodule.name.fetchRecurseSubmodules value was being read from the
configuration of the submodule, but it should be read from the config
of the parent repository.
Also, the fetch.recurseSubmodules value from the parent repository's
configuration was not being considered at all.
Fix both of these and add tests. Now the precedence of the recurse mode
is determined as follows:
1. Value passed to the API
2. Value configured in submodule.name.fetchRecurseSubmodules
3. Value configured in fetch.recurseSubmodules
4. Default to "on demand"
Change-Id: Ic23b7c40b5f39135fb3fd754c597dd4bcc94240c
Extend FetchCommand to expose a new method, setRecurseSubmodules(mode),
which allows to set the mode to ON, OFF or ON_DEMAND.
After fetching a repository, its submodules are recursively fetched:
- When the mode is YES, submodules are always fetched.
- When the mode is NO, submodules are not fetched.
- When the mode is ON_DEMAND, submodules are only fetched when the
parent repository receives an update of the submodule and the new
revision is not already in the submodule.
The mode is determined in the following order of precedence:
- Value specified in the API call using setRecurseSubmodules.
- Value specified in the repository's config under the key
submodule.name.fetchRecurseSubmodules
- Defaults to ON_DEMAND if neither of the previous is set.
Extend FetchResult to recursively include results for submodules, as
a map of the submodule path to an instance of FetchResult.
Test setup is based on testCloneRepositoryWithNestedSubmodules.
Change-Id: Ibc841683763307cb76e78e142e0da5b11b1add2a
Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>
This operation was added recently with the goal to provide some
way to auto-correct invalid user input, or to provide a correction
suggestion to the user -- EGit uses it now that way. But the initial
implementation was very restrictive; it removed all non-ASCII
characters and even slashes.
Understandably end users were not happy with that. Git has no such
restriction to ASCII-only; nor does JGit. Branch names should be
meaningful to the end user, and if a user-supplied branch name is
invalid for technical reasons, a "normalized" name should still
be meaningful to the user.
Rewrite to attempt a minimal fix such that the result will pass
isValidRefName.
* Replace all Unicode whitespace by underscore.
* Replace troublesome special characters by dash.
* Collapse sequences of underscores, dots, and dashes.
* Remove underscores, dots, and dashes following slashes, and
collapse sequences of slashes.
* Strip leading and trailing sequences of slashes, dots, dashes,
and underscores.
* Avoid the ".lock" extension.
* Avoid the Windows reserved device names.
* If input name is null return an empty String so callers don't need to
check for null.
This still allows branch names with single slashes as separators
between components, avoids some pitfalls that isValidRefName() tests
for, and leaves other character untouched and thus allows non-ASCII
branch names.
Also move the function from the bottom of the file up to where
isValidRefName is implemented.
Bug: 512508
Change-Id: Ia0576d9b2489162208c05e51c6d54e9f0c88c3a7
Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Update SHA1 class to include a Java port of sha1dc[1]'s ubc_check,
which can detect the attack pattern used by the SHAttered[2] authors.
Given the shattered example files that have the same SHA-1, this
modified implementation can identify there is risk of collision given
only one file in the pair:
$ jgit ...
[main] WARN org.eclipse.jgit.util.sha1.SHA1 - SHA-1 collision 38762cf7f55934b34d179ae6a4c80cadccbb7f0a
When JGit detects probability of a collision the SHA1 class now warns
on the logger, reporting the object's SHA-1 hash, and then throws a
Sha1CollisionException to the caller.
From the paper[3] by Marc Stevens, the probability of a false positive
identification of a collision is about 14 * 2^(-160), sufficiently low
enough for any detected collision to likely be a real collision.
git-core[4] may adopt sha1dc before the system migrates to an entirely
new hash function. This commit enables JGit to remain compatible with
that move to sha1dc, and help protect users by warning if similar
attacks as SHAttered are identified.
Performance declined about 8% (detection off), now:
MessageDigest 238.41 MiB/s
MessageDigest 244.52 MiB/s
MessageDigest 244.06 MiB/s
MessageDigest 242.58 MiB/s
SHA1 216.77 MiB/s (was ~240.83 MiB/s)
SHA1 220.98 MiB/s
SHA1 221.76 MiB/s
SHA1 221.34 MiB/s
This decline in throughput is attributed to the step loop unrolling in
compress(), which was necessary to easily fit the UbcCheck logic into
the hash function. Using helper functions s1-s4 reduces the code
explosion, providing acceptable throughput.
With detection enabled (default):
SHA1 detectCollision 180.12 MiB/s
SHA1 detectCollision 181.59 MiB/s
SHA1 detectCollision 181.64 MiB/s
SHA1 detectCollision 182.24 MiB/s
sha1dc (native C) ~206.28 MiB/s
sha1dc (native C) ~204.47 MiB/s
sha1dc (native C) ~203.74 MiB/s
Average time across 100,000 calls to hash 4100 bytes (such as a commit
or tree) for the various algorithms available to JGit also shows SHA1
is slower than MessageDigest, but by an acceptable margin:
MessageDigest 17 usec
SHA1 18 usec
SHA1 detectCollision 22 usec
Time to index-pack for git.git (217982 objects, 69 MiB) has increased:
MessageDigest SHA1 w/ detectCollision
------------- -----------------------
20.12s 25.25s
19.87s 25.48s
20.04s 25.26s
avg 20.01s 25.33s +26%
Being implemented in Java with these additional safety checks is
clearly a penalty, but throughput is still acceptable given the
increased security against object name collisions.
[1] https://github.com/cr-marcstevens/sha1collisiondetection
[2] https://shattered.it/
[3] https://marc-stevens.nl/research/papers/C13-S.pdf
[4] https://public-inbox.org/git/20170223230621.43anex65ndoqbgnf@sigill.intra.peff.net/
Change-Id: I9fe4c6d8fc5e5a661af72cd3246c9e67b1b9fee6
The TreeWalk filtering classes need to support the three different
meanings of the return value the path comparison generates.
A new path comparison method (isPathMatch) is created with
three distinct return values (isPathPrefix use value '0' to
encode two of these) which will makes it possible for the logical
operators (especially NOT) to aggregate a correct verdict.
A filter like: AND(Path("path"), NOT(Path("path/to/other")))
Should filter out 'path/to/other/file', but not 'path/to/my/file'.
The path-limiting feature when testing path/to/my/file, would
result to run test for the following paths:
path
path/to
path/to/my
path/to/my/file
isPathPrefix('path/to/other') will return '0' for the first two
and since there is no way for NOT to distinguish between an exact
match and a match indicating that the tested path is a 'parent',
it will incorrectly return false and thus remove everything below
'path' immediately.
isPathMatch has a distinguished value for 'parent' matches that
will be preserved through the logic operators and should not
cause an over-eager removal of paths.
The functionality of isPathPrefix is required by other parts
and is untouched.
Unit tests are included to ensure that the logical functionality
is correct and can be preserved.
Change-Id: Ice2ca9406f09f1b179569e99b86a0e5d77baa20d
Signed-off-by: Magnus Vigerlöf <magnus.vigerlof@gmail.com>
Allow SHA1 instances to be reused to compute another hash value, and
resume caching them in ObjectInserter and PackParser. This shaves a
small amount of running time off parsing git.git's pack file:
before after
------ ------
25.25s 25.55s
25.48s 25.06s
25.26s 24.94s
Almost noise (small difference), but recycling the instances reduces
some stress on the memory allocator finding two 80 word message block
arrays needed for hashing and collision detection.
Change-Id: I4af88a720e81460293bc5c5d1d3db1a831e7e228
This implementation is derived straight from the description written
in RFC 3174. On Mac OS X with Java 1.8.0_91 it offers similar
throughput as MessageDigest SHA-1:
system 239.75 MiB/s
system 244.71 MiB/s
system 245.00 MiB/s
system 244.92 MiB/s
sha1 234.08 MiB/s
sha1 244.50 MiB/s
sha1 242.99 MiB/s
sha1 241.73 MiB/s
This is the fastest implementation I could come up with. Common SHA-1
implementation tricks such as unrolling loops creates a method too
large for the JIT to effectively optimize, resulting in lower overall
hashing throughput. Using a preprocessor to perform the register
renaming of A-E also didn't help, as again the method was too large
for the JIT to effectively optimize.
Fortunately the fastest version is a naive, straight-forward
implementation very close to the description in RFC 3174.
Change-Id: I228b05c4a294ca2ad51386cf0e47978c68e1aa42
Since the introduction of generic type parameter inference in Java 7,
it's not necessary to explicitly specify the type of generic parameters.
Enable the warning in Eclipse, and fix all occurrences.
Change-Id: I9158caf1beca5e4980b6240ac401f3868520aad0
Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>