In order to validate .gitmodules files, we first need to find them
in the incoming pack.
Do it in the ObjectChecker stage. Check in the tree objects if they
point to a .gitmodules file and report the tree id and the .gitmodules
blob id.
This can be used later to check if the file is in the root of the
project and if the contents are good.
While we're here, make isMacHFSGit more accurate by detecting variants
of filenames that vary in case.
[jn: tweaked NTFS and HFS+ checking; added more tests]
Change-Id: I70802e7d2c1374116149de4f89836b9498f39582
Signed-off-by: Ivan Frade <ifrade@google.com>
Signed-off-by: Jonathan Nieder <jrn@google.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Instead of hard-coding the charset strings "US-ASCII", "UTF-8", and
"ISO-8859-1", use the corresponding constants from StandardCharsets.
UnsupportedEncodingException is not thrown when the StandardCharset
constants are used, so remove the now redundant handling.
Because the encoding names are no longer hard-coded strings, also
remove redundant $NON-NLS warning suppressions.
Also replace existing usages of the constants with static imports.
Change-Id: I0a4510d3d992db5e277f009a41434276f95bda4e
Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>
The tests:
- testCheckBlobNotCorrupt
- testCheckBlobCorrupt
create instances of ObjectChecker that are the same.
The tests:
- testCheckBlobWithBlobObjectCheckerNotCorrupt
- testCheckBlobWithBlobObjectCheckerCorrupt
also create instances of ObjectChecker that are the same.
Factor these instances out to constants instead of creating them
in the tests.
The `checker` member is still created anew in each test, since some
of the tests change its state.
Change-Id: I2d90263829d01d208632185b1ec2f678ae1a3f4c
Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>
Some repositories can have a policy that do not accept certain blobs. To
check if the incoming pack file contains such blobs, ObjectChecker can
be used. However, this ObjectChecker is not called by PackParser if the
blob is stored as a whole. This is because the object can be so large
that it doesn't fit in memory.
This change introduces BlobObjectChecker. This interface takes chunks of
a blob instead of the entire object. ObjectChecker can optionally return
a BlobObjectChecker. This won't change existing ObjectChecker
implementation; existing implementation continues to receive deltified
blob objects only.
Change-Id: Ic33a92c2de42bd7a89786a4da26b7a648b25218d
Signed-off-by: Masaya Suzuki <masayasuzuki@google.com>
JGit already had some fsck-like classes like ObjectChecker which can
check for an individual object.
The read-only FsckPackParser which will parse all objects within a pack
file and check it with ObjectChecker. It will also check the pack index
file against the object information from the pack parser.
Change-Id: Ifd8e0d28eb68ff0b8edd2b51b2fa3a50a544c855
Signed-off-by: Zhen Chen <czhen@google.com>
Consolidate copies of this function into one location.
Add some unit tests to prevent bugs that were accidentally
introduced while trying to make this refactoring.
Change-Id: I82f64bbb8601ca2d8316ca57ae8119df32bb5c08
Accept some of the same section keys that fsck does in git-core,
allowing repositories to skip over specific kinds of acceptable
broken objects, e.g.:
[fsck]
duplicateEntries = ignore
zeroPaddedFilemode = ignore
The zeroPaddedFilemode = ignore is a synonym for the JGit specific
allowLeadingZeroFileMode = true. Only accept the JGit key if git-core
key was not specified.
Change-Id: Idaed9310e2a5ce5511670ead1aaea2b30aac903c
Some ancient objects may be broken, but in a relatively harmless way.
Allow the ObjectChecker caller to whitelist specific objects that are
going to fail checks, but that have been reviewed by a human and decided
the objects are OK enough to permit continued use of.
This avoids needing to rewrite history to scrub the broken objects out.
Honor the git-core fsck.skipList configuration setting when receiving a
push or fetching from a remote repository.
Change-Id: I62bd7c0b0848981f73dd7c752860fd02794233a6
A larger than expected number of real-world repositories found on
the Internet contain invalid author, committer and tagger lines
in their history. Many of these seem to be caused by users misusing
the user.name and user.email fields, e.g.:
[user]
name = Au Thor <author@example.com>
email = author@example.com
that some version of Git (or a reimplementation thereof) copied
directly into the object header. These headers are not valid and
are rejected by a strict fsck, making it impossible to transfer
the repository with JGit/EGit.
Another form is an invalid committer line with double negative for
the time zone, e.g.
committer Au Thor <a@b> 1288373970 --700
The real world is messy. :(
Allow callers and users to weaken the fsck settings to accept these
sorts of breakages if they really want to work on a repo that has
broken history. Most routines will still function fine, however
commit timestamp sorting in RevWalk may become confused by a corrupt
committer line and sort commits out of order. This is mostly fine if
the corrupted chain is shorter than the slop window.
Change-Id: I6d529542c765c131de590f4f7ef8e7c1c8cb9db9
Mac's HFS+ folds concatentations of ".git" and ignorable Unicode
characters [1] to ".git" [2]. Hence we need to disallow all names which
could potentially be a shortname for ".git". Example: in an empty
directory create a folder ".g\U+200Cit". Now you can't create another
folder ".git".
The following characters are ignorable Unicode which are ignored on
HFS+:
unicode hex name
-------------------------------------------------
U+200C 0xe2808c ZERO WIDTH NON-JOINER
U+200D 0xe2808d ZERO WIDTH JOINER
U+200E 0xe2808e LEFT-TO-RIGHT MARK
U+200F 0xe2808f RIGHT-TO-LEFT MARK
U+202A 0xe280aa LEFT-TO-RIGHT EMBEDDING
U+202B 0xe280ab RIGHT-TO-LEFT EMBEDDING
U+202C 0xe280ac POP DIRECTIONAL FORMATTING
U+202D 0xe280ad LEFT-TO-RIGHT OVERRIDE
U+202E 0xe280ae RIGHT-TO-LEFT OVERRIDE
U+206A 0xe281aa INHIBIT SYMMETRIC SWAPPING
U+206B 0xe281ab ACTIVATE SYMMETRIC SWAPPING
U+206C 0xe281ac INHIBIT ARABIC FORM SHAPING
U+206D 0xe281ad ACTIVATE ARABIC FORM SHAPING
U+206E 0xe281ae NATIONAL DIGIT SHAPES
U+206F 0xe281af NOMINAL DIGIT SHAPES
U+FEFF 0xefbbbf ZERO WIDTH NO-BREAK SPACE
[1] http://www.unicode.org/versions/Unicode7.0.0/ch05.pdf#G40025http://www.unicode.org/reports/tr31/#Layout_and_Format_Control_Characters
[2] http://dubeiko.com/development/FileSystems/HFSPLUS/tn1150.html#UnicodeSubtleties
Change-Id: Ib6a1dd090b2649bdd8ec16387c994ed29de2860d
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Windows creates shortnames for all non-8.3 files (see [1]). Hence we
need to disallow all names which could potentially be a shortname for
".git". Example: in an empty directory create a folder "GIT~1". Now you
can't create another folder ".git".
The path "GIT~1" may map to ".git" on Windows. A potential victim to
such an attack first has to initialize a git repository in order to
receive any git commits. Hence the .git folder created by init will get
the shortname "GIT~1". ".git" will only get a different shortname if the
user has created a file "GIT~1" before initialization of the git
repository.
[1] http://en.wikipedia.org/wiki/8.3_filename
Change-Id: I9978ab8f2d2951c46c1b9bbde57986d64d26b9b2
Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Windows treats "foo." and "foo " as "foo". The ".git" directory is
special, as it contains metadata for a local Git repository. Disallow
variations that Windows considers to be the same.
Change-Id: I28eb48859a95a89111b4987c91de97557e3bb539
The component name ".GIT" inside a tree entry could confuse a
case insensitive filesystem into looking at a submodule and
not a directory entry.
Disallow any case permutations of ".git" to prevent this
confusion from entering a repository and showing up at a
later date on a case insensitive system.
Change-Id: Iaa3f768931d0d5764bf07ac5f6f3ff2b1fdda01b
When safeForMacOS is enabled the checker verifies a name does not
match against another name in the same tree after normalization to
NFC. The check was incorrect and failed when the first name was put
in, rejecting simple trees containing only one file like "F".
Add a test for this simple tree to verify it is accepted.
Fix the test for NFC normalization to actually normalize
and have a collision.
Change-Id: I39e7d71150948872bff6cd2b06bf8dae52aa3c33
Mac OS X and Windows filesystems are generally case insensitive and
will fold 'a' and 'A' to the same directory entry. If the checker is
enforcing safe semantics for these platforms, track all names and
look for duplicates after folding case and normalizing to NFC.
Change-Id: I170b6f649a72d6ef322b7254943d4c604a8d25b9
Reuse the generic logic in ObjectChecker to examine paths.
This required extracting the scanner loop to check for bad
characters within the path name segment.
Change-Id: I02e964d114fb544a0c1657790d5367c3a2b09dff
Most Mac OS X systems use a case insensitive HFS+ volume. Like
Windows ".git" and ".GIT" are the same path and can confuse a Git
program into expecting a repository where one does not exist.
Change-Id: Iec6ce9e6c2872f8b0850cc6aec023fa0fcb05ae4
If Windows rejection is enabled reject special device names like
NUL and PRN, including NUL.txt. This prevents a tree that might
be used on a Windows client from referencing a confusing name.
Change-Id: Ic700ea8fa68724509e0357d4b758a41178c4d70c
Repositories that are frequently checked out on Windows platforms
may need to ensure trees do not contain strange names that cause
problems on those systems. Follow the MSDN guidelines and refuse
to accept a tree containing a special character, or names that end
with " " (space) or "." (dot).
Since Windows filesystems are usually case insensitive, also reject
mixed case versions of the reserved ".git" name.
Change-Id: Ic3042444b1e162c6d01b88c7e6ea39b2a73c4eca
Using .git as a name in a tree is invalid for most Git repositories.
This can confuse clients into thinking there is a submodule or another
repository deeper in the tree, which is incorrect.
Change-Id: I90a1eaf25d45e91557f3f548b69cdcd8f7cddce1
The leading '0' is a broken mode that although incorrect in the
Git canonical tree format was created by a couple of libraries
frequently used on a popular Git hosting site. Some projects have
these modes stuck in their ancient history and cannot easily
repair the damage without a full history rewrite. Optionally permit
ObjectChecker to ignore them.
Bug: 307291
Change-Id: Ib921dfd77ce757e89280d1c00328a88430daef35
Invoke the wrapper types' valueOf via static imports.
For booleans used in asserts, add a new assert in
the JUnit utility package since out current version of JUnit
does not have the assert(boolean, boolean) method.
Change-Id: I9099bd8efbc8c133479344d51ce7dabed8958a2b
Eclipse has some problem re-running single JUnit tests if
the tests are in Junit 3 format, but the JUnit 4 launcher
is used. This was quite unnecessary and the move was not
completed. We still have no JUnit4 test.
This completes the extermination of JUnit3. Most of the
work was global searce/replace using regular expression,
followed by numerous invocarions of quick-fix and organize
imports and verification that we had the same number of
tests before and after.
- Annotations were introduced.
- All references to JUnit3 classes removed
- Half-good replacement for getting the test name. This was
needed to make the TestRngs work. The initialization of
TestRngs was also made lazily since we can not longer find
out the test name in runtime in the @Before methods.
- Renamed test classes to end with Test, with the exception
of TestTranslateBundle, which fails from Maven
- Moved JGitTestUtil to the junit support bundle
Change-Id: Iddcd3da6ca927a7be773a9c63ebf8bb2147e2d13
Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
The strings are externalized into the root resource bundles.
The resource bundles are stored under the new "resources" source
folder to get proper maven build.
Strings from tests are, in general, not externalized. Only in
cases where it was necessary to make the test pass the strings
were externalized. This was typically necessary in cases where
e.getMessage() was used in assert and the exception message was
slightly changed due to reuse of the externalized strings.
Change-Id: Ic0f29c80b9a54fcec8320d8539a3e112852a1f7b
Signed-off-by: Sasa Zivkov <sasa.zivkov@sap.com>
Annotated tags created with C Git versions before the introduction
of c818566 ([PATCH] Update tags to record who made them, 2005-07-14),
do not have a "tagger" line present in the object header. This line
did not appear in C Git until v0.99.1~9.
Ancient projects such as the Linux kernel contain such tags, for
example Linux 2.6.12 is older than when this feature first appeared
in C Git. Linux v2.6.13-rc4 in late July 2005 is the first kernel
version tag to actually contain a tagger line.
It is therefore acceptable for the header to be missing, and for
the RevTag.getTaggerIdent() method to return null.
Since the Javadoc for getTaggerIdent() already explained that the
identity may be null, we just need to test that this is true when
the header is missing, and allow the ObjectChecker to pass anyway.
Change-Id: I34ba82e0624a0d1a7edcf62ffba72260af6f7e5d
See: http://code.google.com/p/gerrit/issues/detail?id=399
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Per CQ 3448 this is the initial contribution of the JGit project
to eclipse.org. It is derived from the historical JGit repository
at commit 3a2dd9921c8a08740a9e02c421469e5b1a9e47cb.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>