Some ancient objects may be broken, but in a relatively harmless way.
Allow the ObjectChecker caller to whitelist specific objects that are
going to fail checks, but that have been reviewed by a human and decided
the objects are OK enough to permit continued use of.
This avoids needing to rewrite history to scrub the broken objects out.
Honor the git-core fsck.skipList configuration setting when receiving a
push or fetching from a remote repository.
Change-Id: I62bd7c0b0848981f73dd7c752860fd02794233a6
Hoist ObjectIdSet up to lib as part of the public API and add
the interface to some common types like PackIndex and JGit custom
ObjectId map types. This cleans up wrapper code in a number of
places by allowing direct use of the types as an ObjectIdSet.
Future commits can now rely on ObjectIdSet as a simple read-only
type to check a set of objects from a number of storage options.
Change-Id: Ib62b062421d475bd52abd6c84a73916ef36e084b
If a file (e.g. "A") and a subtree file (e.g. "A/foo.c") both appear
in the DirCache this cache should not be written out as a tree object.
The "A" file and "A" subtree conflict with each other in the same tree
and will fail fsck.
Detect this condition during DirCacheBuilder and DirCacheEditor
finish() so the application can be halted early before it updates a
DirCache that might later write an invalid tree structure.
Change-Id: I95660787e88df336297949b383f4c5fda52e75f5
If a PathEdit tries to store a file where a subtree was, or a subtree
where a file was, replace the entry in the DirCache with the new
name(s). This supports switching between file and tree entry types
using a DirCacheEditor.
Add new unit tests to cover the conditions where these can happen.
Change-Id: Ie843d9388825f9e3d918a5666aa04e47cd6306e7
Adding a path that already exists but is changing type such as
from symlink to subdirectory requires a NameConflictTreeWalk to
match up the two different entry types that share the same name.
NameConflictTreeWalk needs a bug fix to pop conflicting entries
when PathFilterGroup aborts the walk early so that it does not
allow DirCacheBuilderIterator to copy conflicting entries into
the output cache.
Change-Id: I61b49cbe949ca8b4b98f9eb6dbe7b1f82eabb724
Similar to nested directories, nested copyfiles won't work with git submodule
either.
Change-Id: Idbe965ec20a682fca0432802858162f8238f05de
Signed-off-by: Yuxuan 'fishy' Wang <fishywang@google.com>
Handle existing symlink as a file, not as directory if deleting a file
before creating (overriding) a symlink.
Bug: 484491
Change-Id: I29dbf57d1daec2ba98454975b093e1d381d05196
Signed-off-by: Andrey Loskutov <loskutov@gmx.de>
Previously, non-reuse deltas were only included in packStatistics if they
were not cached by the deltaWindow.
Change-Id: I7684d8214875f0a7569b34614f8a3ba341dbde9c
Signed-off-by: James Kolb <jkolb@google.com>
PathFilter and PathFilterGroup form JGit's implementation of git's
path-limiting feature in commands like log and diff. To save time
when traversing trees, a path specification
foo/bar/baz
tells the tree walker not to traverse unrelated trees like qux/. It
does that by returning false from include when the tree walker is
visiting qux and true when it is visiting foo.
Unfortunately that test was implemented to be slightly over-eager: it
doesn't only return true when asked whether to visit a subtree "foo"
but when asked about a plain file "foo" as well. As a result, diffs
and logs restricted to some-file/non-existing-suffix unexpectedly
match against some-file:
$ jgit log -- LICENSE/no-such-file
commit 629fd0d594
Author: Shawn O. Pearce <spearce@spearce.org>
Date: Fri Jul 02 14:52:49 2010 -0700
Clean up LICENSE file
[...]
Fix it by checking against the entry's mode.
Gitiles +log has the same bug and benefits from the same fix.
Callers know not to worry about what subtrees are included in the tree
walk because shouldBeRecursive() returns true in this case, so this
behavior change should be safe. This also better matches the behavior
of C git:
$ empty=$(git mktree </dev/null)
$ git diff-tree --abbrev $empty HEAD -- LICENSE/no-such-file
$ git diff-tree --abbrev $empty HEAD -- tools/no-such-file
:000000 040000 0000000... b62648d... A tools
Bug: 484266
Change-Id: Ib4d53bddd8413a9548622c7b25b338d287d8889d
Expand the existing PathFilterGroup tests to check which paths the
tree entry matches. This expands test coverage by ensuring that
PathFilterGroup's simpler code path to match against a single
PathFilter works correctly.
While at it, move the check on tree entry d/e/f/g.y into two separate
tests: one to check that it doesn't match any of the configured paths,
and another to check that it does not throw StopWalkException to end
the walk early.
Change-Id: I55bd512cd049fc2018659e2f86a4b8650f171fda
If an application uses PushConnection directly on the native Git wire
protocols JGit should send along the application's expected oldId, not
the advertised value. This allows the remote peer to compare-and-swap
since it was not tested inside JGit.
Discovered when I tried to use a PushConnection (bypassing the
standard PushProcess) and the client blindly overwrote the remote
reference, even though my app had supplied the wrong ObjectId for
the expectedOldObjectId. This was not expected and cost me over an
hour of debugging, plus "corruption" in the remote repository.
By passing along the exact expectedOldObjectId from the app the
remote side can do the check that the application skipped, and
avoid data loss.
Change-Id: Id3920837e6c47100376225bb4dd61fa3e88c64db
FileTreeIterator was calling by mistake
WorkingTreeIterator.idSubmodule(Entry). Instead it should always compute
idSubmodule on its own.
Change-Id: Id1b988aded06939b1d7edd2671e34bf756896c0e
This should mirror the behavior of `git push --atomic` where the
client asks the server to apply all-or-nothing. Some JGit servers
already support this based on a custom DFS backend. InMemoryRepository
is extended to support atomic push for unit testing purposes.
Local disk server side support inside of JGit is a more complex animal
due to the excessive amount of file locking required to protect every
reference as a loose reference.
Change-Id: I15083fbe48447678e034afeffb4639572a32f50c
Instead of checking every entry for .gitattributes only look for the
entry on request by TreeWalk. This avoids impacting uses like RevWalk
filtering history.
When the attrs is requested skip to the start of the tree and look for
.gitattributes until either it is found, or it is impossible to be
present. Due to the sorting rules of tree entries .gitattributes
should be among the first or second entries in the tree so very few
entries will need to be considered.
Waiting to find the .gitattributes file by native ordering may miss
attrs for files like .config, which sorts before .gitattributes.
Starting from the front of the tree on demand ensures the attributes
are parsed as early as necessary to process any entry in the tree.
Due to TreeWalk recursively processing up the tree of iterators we
cannot just reset the current CanonicalTreeParser to the start as
parent parsers share the same path buffer as their children.
Resetting a parent to look for .gitattributes may overwrite path
buffer data used by a child iterator. Work around this by building a
new temporary CanonicalTreeParser instance.
Change-Id: Ife950253b687be325340d27e9915c9a40df2641c
The checkPath function is available as a byte[] form, in fact the
String form just converts to byte[] to run the algorithm.
Having DirCacheEntry take a byte[] -> String -> byte[] to check if
each path is valid is a huge waste of CPU time. On some systems it
can double the time required to read 38,999 files from trees to the
DirCache. This slows down any operation using a DirCache.
Expose the byte[] form and use it for DirCacheEntry creation.
Change-Id: I6db7bc793ece99ff3c356338d793c07c061aeac7
If defined in .gitattributes call smudge filter during checkout.
To support checkout where current HEAD,index do not contain attributes
we need to also consider attributes from the tree we checkout. Therefore
CanonicalTreeParser has to learn how to provide attributes.
Change-Id: I168fdb81a8e1a9f991587b3e95a36550ea845f0a
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
When filters are defined for certain paths in gitattributes make
sure that clean filters are processed when adding new content to the
object database.
Change-Id: Iffd72914cec5b434ba4d0de232e285b7492db868
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Attributes represents a semantic collector of Attribute(s) and replaces
the anonymous Map<String,Attribute>. This class will be returned by
TreeWalk.getAttributes(). It offers convenient access to the attributes
wrapped in the Attributes object. Adds preparations for a future
Attribute Macro Expansion
Change-Id: I8348c8c457a2a7f1f0c48050e10399b0fa1cdbe1
Signed-off-by: Ivan Motsch <ivan.motsch@bsiag.com>
PushCommandTest and RunExternalScriptTest didn't succeed on Windows.
Fix this by expecting a simple line-feed as line ending (instead of the
platform dependent line separator. Additionally correct the computation
of expected URLs in PushCommandTest.
Change-Id: Idcdc41cd7e535ff88df33ea0a249333ed8fc91b0
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
This test expected that the test scripts emit a platform-dependent
newline (crlf on windows, lf on linux). But that's not true. Expected
result should always be a trailing "\n" because the test scripts
explicitly echo a "\n" in the end.
Change-Id: I604e08cda8cebe276b5214ba0f618b6112c3441f
The Repository class provides only one method to look up a ref by
name, getRef. If I request refs/heads/master and that ref does not
exist, getRef will look further in the search path:
ref/refs/heads/master
refs/heads/refs/heads/master
refs/remotes/refs/heads/master
This behavior is counterintuitive, needlessly inexpensive, and usually
not what the caller expects.
Allow callers to specify whether to use the search path by providing
two separate methods:
- exactRef, which looks up a ref when its exact name is known
- findRef, which looks for a ref along the search path
For backward compatibility, keep getRef as a deprecated synonym for
findRef.
This change introduces findRef and exactRef but does not update
callers outside tests to use them yet.
Change-Id: I35375d942baeb3ded15520388f8ebb9c0cc86f8c
Signed-off-by: Jonathan Nieder <jrn@google.com>
The class FS_Win32 was always trying out to create a temporary symlink
in order to find out whether symlinks are supported. FS_Win32_Cygwin was
overwriting this method and always returned true. But when the user
running JGit does not have administrative rights then the creation of
symlinks is forbidden even if he is running on FS_Win32_Cygwin. A lot of
tests failed only on the Windows platform because of this. It was
correctly detected that FS_Win32_Cygwin is the filesystem abstraction to
be used but creation of symlinks always failed because of lacking
privileges of the user running the tests.
This fix teaches FS_Win32_Cygwin to behave like FS_Win32 and to test
whether symlinks can be created in order to find out whether symlinks
are supported.
Change-Id: Ie2394631ffc4c489bd37c3ec142ed44bbfcac726
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Remove references to the bundle org.eclipse.jgit.java7 which was removed
in 4.0.
Change-Id: I85527eb2a34bb94979fdab1311043ae77a2b5ecd
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Prevent that WalkEncryptionTest fails when it can't determine the public
IP address using http://checkip.amazonws.com. Also set timeouts when
determining IP address in order to prevent long wait times during tests.
Change-Id: I1d2fe09f99df2a5f75f8077811a72fb2271cdddb
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
getRef() loops over its search path to find a ref:
Ref ref = null;
for (String prefix : SEARCH_PATH) {
ref = readRef(prefix + needle, packed);
if (ref != null) {
ref = resolve(ref, 0, null, null, packed);
break;
}
}
fireRefsChanged();
return ref;
If readRef returns null (indicating that the ref does not exist), the
loop continues so we can find the ref later in the search path. And
resolve should never return null, so if we return null it should mean
we exhausted the entire search path and didn't find the ref.
... except that resolve can return null: it does so when it has
followed too many symrefs and concluded that there is a symref loop:
if (MAX_SYMBOLIC_REF_DEPTH <= depth)
return null; // claim it doesn't exist
Continue the loop instead of returning null immediately. This makes
the behavior more consistent.
Arguably getRef should throw an exception when a symref loop is
detected. That would be a more invasive change, so if it's a good
idea it will have to wait for another patch.
Change-Id: Icb1c7fafd4f1e34c9b43538e27ab5bbc17ad9eef
Signed-off-by: Jonathan Nieder <jrn@google.com>
Adds the getAttributes feature to the tree walk. The computation of
attributes needs to be done by the TreeWalk since it needs both a
WorkingTreeIterator and a DirCacheIterator.
Bug: 342372
CQ: 9120
Change-Id: I5e33257fd8c9895869a128bad3fd1e720409d361
Signed-off-by: Arthur Daussy <arthur.daussy@obeo.fr>
Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>
When attempting to determine the size of a blob that does not exist,
the RenameDetector throws a MissingObjectException.
The fix is to return a size of zero if the size is requested for a blob
id that doesn't exist.
Bug: 481577
Change-Id: I4e86276039c630617610cc51d0eefa56d7d3952f
Signed-off-by: Rüdiger Herrmann <ruediger.herrmann@gmx.de>
When asked to read a symref pointing to a branch-yet-to-be-born (such
as HEAD in a newly initialized repository), DfsRepository and
FileRepository return different results.
FileRepository:
exactRef("HEAD") => null
DfsRepository:
exactRef("HEAD") => SymbolicRef[HEAD -> refs/heads/master=00000000]
getRef("HEAD") returns the same as DfsRepository's exactRef in both
backends.
The intended behavior is the DfsRepository one: exactRef() is supposed
to be like getRef(), but more exact because it doesn't need to
traverse the search path.
The discrepancy is because DfsRefDatabase implements exactRef()
directly with the intended semantics, while RefDirectory uses a
fallback implementation built on top of getRefs(). getRefs() skips
symrefs to an unborn branch.
Override the fallback implementation with a correct implementation
that is similar to getRef() to avoid this. A followup change will fix
the fallback.
Change-Id: Ic138a5564a099ebf32248d86b93e2de9ab3c94ee
Reported-by: David Pursehouse <david.pursehouse@sonymobile.com>
Improved-by: Christian Halstrick <christian.halstrick@sap.com>
Bug: 478865
Prior to this change, DfsInserter would not insert an object into a pack
if it already existed in another pack in the repository, even if that
pack was unreachable. Consider this sequence of events:
- Object FOO is pushed to a repository.
- Subsequent ref changes make FOO UNREACHABLE_GARBAGE.
- FOO is subsequently re-inserted using a DfsInserter, but skipped
due to existing in UNREACHABLE_GARBAGE.
- The repository is repacked; FOO will not be written into a new pack
because it is not yet reachable from a reference. If the
UNREACHABLE_GARBAGE packs are deleted, FOO disappears.
- A reference is updated to reference FOO. This reference is now broken
as FOO was removed when the repacking process deleted the
UNREACHABLE_GARBAGE pack that stored the only copy of FOO.
The garbage collector can't safely delete the UNREACHABLE_GARBAGE
pack because FOO might be in the middle of being re-inserted/re-packed.
This change writes a duplicate copy of an object if it only exists in
UNREACHABLE_GARBAGE. This "freshens" the object to give it a chance to
survive long enough to be made reachable through a reference.
Change-Id: I20f2062230f3af3bccd6f21d3b7342f1152a5532
Signed-off-by: Mike Williams <miwilliams@google.com>
When the file <git-dir>/hooks/pre-push exists make sure that is is
executing during a push. The pre-push hook runs during git push, after
the remote refs have been updated but before any objects have been
transferred.
Change-Id: Ibbb58ee3227742d1a2f913134ce11e7a135c7f4c
In order to support filters in gitattributes FS.runProcess() is made
public. Support for stdin redirection has been added. Support for binary
data on stdin/stdout (as used be clean/smudge filters) has been added.
Change-Id: Ice2c152e9391368dc5748d7b825a838e3eb755f9
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
When commits are selected for bitmap generation, they are reordered
so that related "chains" of commits are grouped together. Chains are
"subbranches" of commits that may branch off of and re-merge with the
main line. Grouping by chains means that the XOR difference between
consecutive selected commits will be smaller, resulting in better
run-length compression of the XORed bitmaps.
Add a new testSelectionOrderingWithChains() test in a new
GcCommitSelectionTest test class. Also move related GC commit selection
tests out of GcBasicPackingTest and into GcCommitSelectionTest.
Change-Id: I8e80cac29c4ca8193b41c9898e5436c22a659f11
Signed-off-by: Terry Parker <tparker@google.com>
Expose the following bitmap selection parameters via PackConfig:
"bitmapContiguousCommitCount", "bitmapRecentCommitCount",
"bitmapRecentCommitSpan", "bitmapDistantCommitSpan",
"bitmapExcessiveBranchCount", and "bitmapInactiveBranchAge".
The value of bitmapContiguousCommitCount, whereby bitmaps are
created for the most recent N commits in a branch, has never
been verified. If experiments show that they are not valuable,
then we can simplify the implementation so that there is only
a concept of recent and distant commit history (defined by
"bitmapRecentCommitCount"), and the only controls we need are
"bitmapRecentCommitSpan" and "bitmapDistantCommitSpan".
Change-Id: I288bf3f97d6fbfdfcd5dde2699eff433a7307fb9
Signed-off-by: Terry Parker <tparker@google.com>
Replace the “bitmapCommitRange” parameter that was recently introduced
with two new parameters: “bitmapExcessiveBranchCount” and
“bitmapInactiveBranchAgeInDays”. If the count of branches does not
exceed “bitmapExcessiveBranchCount”, then the current algorithm is kept
for all branches.
If the branch count is excessive, then the commit time for the tip
commit for each branch is used to determine if a branch is “inactive”.
"Active" branches get full commit selection using the existing
algorithm. "Inactive" branches get fewer bitmaps near the branch tips.
Introduce a "contiguousCommitCount" parameter that always enforces that
the N most recent commits in a branch are selected for bitmaps. The
previous nextSelectionDistance() algorithm created anywhere from 1-100
contiguous bitmaps at branch tips.
For example, consider a branch with commits numbering 0-300, with 0
being the most recent commit. If the most recent 200 commits are not
merge commits and the 200th commit was the last one selected,
nextSelectionDistance() returned 100, causing commits 200-101 to be
ignored. Then a window of size 100 was evaluated, searching for merge
commits. Since no merge commits are found, the next commit (commit 0)
was selected, for a total of 1 commit in the topmost 100 commits.
If instead the 250th commit was selected, then by the same logic
commit 50 is selected. At that point nextSelectionDistance() switches to
selecting consecutive commits, so commits 0-50 in the topmost 100
commits are selected. The "contiguousCommitCount" parameter provides
more determinism by always selecting a constant number or topmost
commits.
Add an optimization to break out of the inner loop of selectCommits() if
all of the commits for the current branch have already been found.
When reusing bitmaps from an existing pack, remove unnecessary
populating and clearing of the writeBitmaps/PackBitmapIndexBuilder.
Add comments to PackWriterBitmapPreparer, rename methods and variables
for readability.
Add tests for bitmap selection with and without merge commits and with
excessive branch pruning triggered.
Note: I will follow up with an additional change that exposes the new
parameters through PackConfig.
Change-Id: I5ccbb96c8849f331c302d9f7840e05f9650c4608
Signed-off-by: Terry Parker <tparker@google.com>
Since Git 2.6 wildcard restrictions for refspecs have been loosened:
refspecs like "refs/heads/*foo:refs/heads/foo*" are valid now.
See Git commit 8d3981ccbed9fc211b4e67105015179d9d2a5692
Change-Id: Icb78afbd282c425173b3a7bc10eadc4015689bb8
Signed-off-by: Marc Strapetz <marc.strapetz@syntevo.com>
LocalDiskRepositoryTestCase and TestRepository have competing ideas
about time. Push them into MockSystemReader so they can
cooperate.
Rename getClock() methods that return Dates to getDate().
Change-Id: Ibbd9fe7f85d0064b0a19e3b675b9718a9e67c479
Signed-off-by: Terry Parker <tparker@google.com>
Building on top of https://git.eclipse.org/r/#/c/56391/
Here we preserve compatibility with JetS3t
and add 2 new native JGit encryption implementations.
For reference, see connection configuration files:
* Version 0: jgit-s3-connection-v-0.properties
* Version 1: jgit-s3-connection-v-1.properties
* Version 2: jgit-s3-connection-v-2.properties
Change-Id: I713290bcacbe92d88e5ef28ce137de73dd1abe2f
Signed-off-by: Andrei Pozolotin <andrei.pozolotin@gmail.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
See previous attempt: https://git.eclipse.org/r/#/c/16674/
Here we preserve as much of JetS3t mode as possible
while allowing to use new Java 8+ PBE algorithms
such as PBEWithHmacSHA512AndAES_256
Summary of changes:
* change pom.xml to control long tests
* add WalkEncryptionTest.launch to run long tests
* add AmazonS3.Keys to to normalize use of constants
* change WalkEncryption to support AES in JetS3t mode
* add WalkEncryptionTest to test remote encryption pipeline
* add support for CI configuration for live Amazon S3 testing
* add log4j based logging for tests in both Eclipse and Maven build
To test locally, check out the review branch, then:
* create amazon test configuration file
* located your home dir: ${user.home}
* named jgit-s3-config.properties
* file format follows AmazonS3 connection settings file:
accesskey = your-amazon-access-key
secretkey = your-amazon-secret-key
test.bucket = your-bucket-for-testing
* finally:
* run in Eclipse: WalkEncryptionTest.launch
* or
* run in Shell: mvn test --define test=WalkEncryptionTest
Change-Id: I6f455fd9fb4eac261ca73d0bec6a4e7dae9f2e91
Signed-off-by: Andrei Pozolotin <andrei.pozolotin@gmail.com>