Counting the objects needed for packing is the most expensive part of
an UploadPack request that has no uninteresting objects (otherwise
known as an initial clone). During this phase the PackWriter is
enumerating the entire set of objects in this repository, so they can
be sent to the client for their new clone.
Allow the ObjectReader (and therefore the underlying storage system)
to keep a cached list of all reachable objects from a small number of
points in the project's history. If one of those points is reached
during enumeration of the commit graph, most objects are obtained from
the cached list instead of direct traversal.
PackWriter uses the list by discarding the current object lists and
restarting a traversal from all refs but marking the object list name
as uninteresting. This allows PackWriter to enumerate all objects
that are more recent than the list creation, or that were on side
branches that the list does not include.
However, ObjectWalk tags all of the trees and commits within the list
commit as UNINTERESTING, which would normally cause PackWriter to
construct a thin pack that excludes these objects. To avoid that,
addObject() was refactored to allow this list-based enumeration to
always include an object, even if it has been tagged UNINTERESTING by
the ObjectWalk. This implies the list-based enumeration may only be
used for initial clones, where all objects are being sent.
The UNINTERESTING labeling occurs because StartGenerator always
enables the BoundaryGenerator if the walker is an ObjectWalk and a
commit was marked UNINTERESTING, even if RevSort.BOUNDARY was not
enabled. This is the default reasonable behavior for an ObjectWalk,
but isn't desired here in PackWriter with the list-based enumeration.
Rather than trying to change all of this behavior, PackWriter works
around it.
Because the list name commit's immediate files and trees were all
enumerated before the list enumeration itself starts (and are also
within the list itself) PackWriter runs the risk of adding the same
objects to its ObjectIdSubclassMap twice. Since this breaks the
internal map data structure (and also may cause the object to transmit
twice), PackWriter needs to use a new "added" RevFlag to track whether
or not an object has been put into the outgoing list yet.
Change-Id: Ie99ed4d969a6bb20cc2528ac6b8fb91043cee071
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
It can be very handy for the implementation to resort the
object list based on data locality, improving prefetch in
the operating system's buffer cache.
Export the list to the implementation was a proper List,
and document that its mutable and OK to be modified. The
only caller in PackWriter is already OK with these rules.
Change-Id: I3f51cf4388898917b2be36670587a5aee902ff10
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
When performing an initial clone of a repository there are no
uninteresting commits, and the resulting pack will be completely
self-contained. Therefore PackWriter does not need to honor C
Git standard TOPO ordering as described in JGit commit ba984ba2e0
("Fix checkReferencedIsReachable to use correct base list").
Switching to COMMIT_TIME_DESC when there are no uninteresting commits
allows the "Counting objects" phase to emit progress earlier, as the
RevWalk will not buffer the commit list. When TOPO is set the RevWalk
enumerates all commits first, before outputing any for PackWriter to
mark progress updates from.
Change-Id: If2b6a9903b536c7fb3c45f85d0a67ff6c6e66f22
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
These two methods are specific to the FileRepository implementation
and should not be exposed as part of the base Repository API. Now
that PackParser is generic and does not require these two methods
to import a pack stream into a repostiory, it is safe to remove
these and get them out of the public view.
Change-Id: I8990004d08074657f467849dabfdaa7e6674e69a
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
This problem surfaced since EGit Core ResetOperationTest is failing
since change I26806d21. JGit detected checkout conflict for untracked
files which never were tracked by the repository.
"git reset --hard" in c git also doesn't remove such untracked files.
Change-Id: Icc8e1c548ecf6ed48bd2979c81eeb6f578d347bd
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
This information is generally useful - have followed the
accessor pattern of 'children' and 'parents'
Change-Id: I79b3ddd6f390152aa49e6b7a4c72a4aca0d6bc72
Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
The change Ie0350e032a97e0d09626d6143c5c692873a5f6a2 was not
done properly. The renamed file was not write protected, and
this broke a test.
Bug: 335388
Change-Id: I41b2235b7677bc5fddc70dda2a56cdd2cb53ce5d
Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com>
This is needed for implementing Fetch in EGit using the API.
Change-Id: Ibdcc95906ef0f93e3798ae20d4de353fb394f2e2
Signed-off-by: Mathias Kinzler <mathias.kinzler@sap.com>
When DirCacheCheckout was checking out it was silently
overwriting untracked files. This is only ok if the
files are also ignored. Untracked and not ignored files
should not be overwritten. This fix adds checks for
this situation.
Because this change in the behaviour also broke tests
which expected that a checkout will overwrite untracked
files (PullCommandTest) these tests have to be modified
also.
Bug: 333093
Change-Id: I26806d2108ceb64c51abaa877e11b584bf527fc9
Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>
Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
We cannot always rename read-only files on network shares,
so rename the temp file for a new loose object first, and
then set it as read-only.
Bug: 335388
Change-Id: Ie0350e032a97e0d09626d6143c5c692873a5f6a2
Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com>
Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
When debugging and enhancing DirCacheCheckout.processEntry() I found
that some of if-statements where hard to read/understand. This
change just splits some long if statements and adds more comments
explaining in which state we are. This change is only a preparation
for followup commits which introduce checks for untracked+ignored
files.
Change-Id: I670ff08310b72c858709b9e395f0aebb4b290a56
Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>
If HEAD exists but points to an not-existing branch the merge
command should silently create the missing branch and check
it out. This happens if you pull into freshly initalized repo.
HEAD points to refs/heads/master but refs/heads/master doesn't
exist. If you know merge a commit X into HEAD then the branch
master should be created (pointing to X) the working tree should
be updated to reflect X. That is achieved by checkout with one
tree only (HEAD is missing).
A test for this functionality will come the the next proposal
in PullCommandTest.
Change-Id: Id4a0d56d944e0acebd4b3157428bb50bd3fdd872
Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>
Add possibility to disable ssl verification, just as i can do with git
using: git config --global http.sslVerify false
To enable the feature, configure
Window->Preferences->Team->Git->Configuration
and add a new key/value: http.sslVerify=false
When handling repos over https, JGit will then check that flag to see
if security is loose and the ssl verification should be ignored.
Having it implemented as a key/value makes it not too obvious in the
GUI - so the user must know what he/she is doing when adding it. Being
aware of the risks etc.
Bug: 332487
Change-Id: I2a1b8098b5890bf512b8dbe07da41036c0fc9b72
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Reading a repository for millions of missing objects might be very
expensive to perform, especially if the repository is on a network
filesystem or some other costly RPC backend. A repository owner
might choose to accept some risk in return for better performance,
so allow disabling collision checking when receiving a pack.
Currently there is no way for an end-user to disable this feature.
This is intentional, because it is generally *NOT* a good idea to
skip this check. Instead this feature is supplied for storage
implementations to bypass the default checking logic, should they
have their own custom routines that is just as effective but can
be handled more efficiently.
Change-Id: I90c801bb40e86412209de0c43e294a28f6a767a5
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
If a pack uses OFS_DELTA only (e.g. its an initial push to a
repository) and PackParser's implementation is broken such that the
delta chain that hangs below a particular object offset is empty, the
entryCount won't match the expected objectCount. Fail fast rather
than claiming the stream was parsed correctly.
The current implementation is not broken as described above. I broke
the code when I implemented my own new subclass of PackParser (which
incorrectly mucked with the object offset information), leading me to
discover this consistency check was missing.
Change-Id: I07540f0ae1144ef6f3bda48774dbdefb8876e1d3
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
By moving the logic that parses a pack stream from the network (or
a bundle) into a type that can be constructed by an ObjectInserter,
repository implementations have a chance to inject their own logic
for storing object data received into the destination repository.
The API isn't completely generic yet, there are still quite a few
assumptions that the PackParser subclass is storing the data onto
the local filesystem as a single file. But its about the simplest
split of IndexPack I can come up with without completely ripping
the code apart.
Change-Id: I5b167c9cc6d7a7c56d0197c62c0fd0036a83ec6c
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
RevFilter.include()'s documentation promises the RevCommit's
body is parsed before include is invoked. This wasn't always
true if the commit was parsed once, had its body discarded,
the RevWalk was reset() and started a new traversal.
Change-Id: Ie5cafde09ae870712b165d8a97a2c9daf90b1dbd
Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
This is needed for commands that use Transport internally.
Change-Id: I9417c85255b160723968c647063b9c7e05995ea4
Signed-off-by: Mathias Kinzler <mathias.kinzler@sap.com>
Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
Additionally, defined the NoteMap.getNote method which returns a Note
instance. These changes were necessary to enable implementation of
the NoteMerger interface (the merge method needs to instantiate a
Note) and to enable direct use of NoteMerger which expects instances
of Note class as its paramters. Implementing creation of code review
summary notes in Gerrit [1] will make use of both of these features.
[1] https://review.source.android.com/#change,20045
Change-Id: I627aefcedcd3434deecd63fa1d3e90e303b385ac
Signed-off-by: Sasa Zivkov <sasa.zivkov@sap.com>
Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
Instead of offering only a high-level isModified() method a new
method compareMetadata() is introduced which compares a working tree entry
and a index entry by looking at metadata only. Some use-cases
(e.g. computing the content-id in idBuffer()) may use this new method
instead of isModified().
Change-Id: I4de7501d159889fbac5ae6951f4fef8340461b47
Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
Change-Id: I4f05bdb0c58b039bd379341a6093f06a2cdfec6e
Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
The java.io.File.createNewFile() method for creating new empty files
reports failure by returning false. To ease proper checking of return
values provide a utility method wrapping createNewFile() throwing
IOException on failure.
Change-Id: I42a3dc9d8ff70af62e84de396e6a740050afa896
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
If remote branches are present they can not be added
to the RefMap from the local branches - the two RefMaps
have a different value of 'prefix' and consequently an
IllegalArgumentException is thrown.
Java's user.home is not the same as $HOME so EGit did see the
same global configuration as C Git does.
Bug: 333269
Change-Id: Id54fc5292bf8c5a67177f9097ee692717a7df336
Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com>
There is a space missing between <from> and "to" in the reflog
message produced by the CheckoutCommand, which is of the form
moving from <from> to <to>
Change-Id: I3dc57ab0a6589292db77a17d9029ee9499dfc725
Signed-off-by: Mathias Kinzler <mathias.kinzler@sap.com>
This is needed by a EGit change
http://egit.eclipse.org/r/#change,2232
Change-Id: I3d62f904b769fc2f1b7b8f0f24f7dd757fc9c379
Signed-off-by: Mathias Kinzler <mathias.kinzler@sap.com>
Instead just return success. In the case that no commit has been
cherry-picked or reverted, just return the old HEAD.
Bug: 333814
Change-Id: I67db2b77b52c43932436d22a8daa5a6556423484
Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com>
Merging Git notes branches has several differences from merging "normal"
branches. Although Git notes are initially stored as one flat tree the
tree may fanout when the number of notes becomes too large for efficient
access. In this case the first two hex digits of the note name will be
used as a subdirectory name and the rest 38 hex digits as the file name
under that directory. Similarly, when number of notes decreases a fanout
tree may collapse back into a flat tree. The Git notes merge algorithm
must take into account possibly different tree structures in different
note branches and must properly match them against each other.
Any conflict on a Git note is, by default, resolved by concatenating
the two conflicting versions of the note. A delete-edit conflict is, by
default, resolved by keeping the edit version.
The note merge logic is pluggable and the caller may provide custom
note merger that will perform different merging strategy.
Additionally, it is possible to have non-note entries inside a notes
tree. The merge algorithm must also take this fact into account and
will try to merge such non-note entries. However, in case of any merge
conflicts the merge operation will fail. Git notes merge algorithm is
currently not trying to do content merge of non-note entries.
Thanks to Shawn Pearce for patiently answering my questions related to
this topic, giving hints and providing code snippets.
Change-Id: I3b2335c76c766fd7ea25752e54087f9b19d69c88
Signed-off-by: Sasa Zivkov <sasa.zivkov@sap.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Patterns containing only a trailing slash have to be treated
as "global" patterns. For example: "classes/" matches "classes"
as well as "dir/classes" directory.
When an application asks for the names in a section, it may want to
see the existing case that was stored by the user. For example,
Gerrit Code Review wants to store a configuration block like:
[access "refs/heads/master"]
label-Code-Review = group Developers
and although the name label-Code-Review is case-insensitive, it wants
to display the case as it appeared in the configuration file.
When enumerating section names or variable names (both of which are
case-insensitive), Config now keeps track of the string that first
appeared, and presents them in file order, permitting applications to
use this information. To maintain case-insensitive behavior, the
contains() method of the returned Set<String> still performs a
case-insensitive compare.
This is a behavior change if the caller enumerates the returned
Set<String> and copies it to his own Set<String>, and then performs
contains() tests against that, as the strings are now the original
case from the configuration block. But I don't think anyone actually
does this, as the returned sets are immutable and are cached.
Change-Id: Ie4e060ef7772958b2062679e462c34c506371740
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Instead of using the current thread's stack to recurse through the
delta chain, use a linked list that is stored in the heap. This
permits the any thread to load a deep delta chain without running out
of thread stack space.
Despite needing to allocate a stack entry object for each delta
visited along the chain being loaded, the object allocation count is
kept the same as in the prior version by removing the transient
ObjectLoaders from the intermediate objects accessed in the chain.
Instead the byte[] for the raw data is passed, and null is used as a
magic value to signal isLarge() and enter the large object code path.
Like the old version, this implementation minimizes the amount of
memory that must be live at once. The current delta instruction
sequence, the base it applies onto, and the result are the only live
data arrays. As each level is processed, the prior base is discarded
and replaced with the new result.
Each Delta frame on the stack is slightly larger than the standard
ObjectLoader.SmallObject type that was used before, however the Delta
instances should be smaller than the old method stack frames, so total
memory usage should actually be lower with this new implementation.
Change-Id: I6faca2a440020309658ca23fbec4c95aa637051c
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
We will need to iterate over all notes of a NoteMap, at least this will be
needed for testing purposes. This change also implied making the Note class
public.
Change-Id: I9b0639f9843f457ee9de43504b2499a673cd0e77
Signed-off-by: Sasa Zivkov <sasa.zivkov@sap.com>
This is almost reverted cherry-pick, and the implementation is
almost identical. It orders the input to merge differently to get
the effect and produces a different commit message with the
default author, rather than the original author.
Change-Id: I39970091d9f7406ae7168b8efaab23a5e2c16bad
Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com>
Eclipse has some problem re-running single JUnit tests if
the tests are in Junit 3 format, but the JUnit 4 launcher
is used. This was quite unnecessary and the move was not
completed. We still have no JUnit4 test.
This completes the extermination of JUnit3. Most of the
work was global searce/replace using regular expression,
followed by numerous invocarions of quick-fix and organize
imports and verification that we had the same number of
tests before and after.
- Annotations were introduced.
- All references to JUnit3 classes removed
- Half-good replacement for getting the test name. This was
needed to make the TestRngs work. The initialization of
TestRngs was also made lazily since we can not longer find
out the test name in runtime in the @Before methods.
- Renamed test classes to end with Test, with the exception
of TestTranslateBundle, which fails from Maven
- Moved JGitTestUtil to the junit support bundle
Change-Id: Iddcd3da6ca927a7be773a9c63ebf8bb2147e2d13
Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
These settings are stored in <prefix>/etc/gitconfig. The C Git
binary is installed in <prefix>/bin, so we look for the C Git
executable to find this location, first by looking at the PATH
environment variable and then by attemting to launch bash as
a login shell to find out.
Bug: 333216
Change-Id: I1bbee9fb123a81714a34a9cc242b92beacfbb4a8
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com>
Replace 'method' with 'heap'-based recursion for resolving deltas.
Git packfile delta-chain depth can exceed 50 levels in certain files
(the packfile of the JGit project itself has >800 objects with
chain-length >50). Using method-based recursion on such packfiles will
quickly throw a StackOverflowError on VMs with constrained stack.
Benefits:
* packfile delta-resolution no longer limited by the maximum number
of stack frames permitted on the current thread.
* slight performance improvement
(3% speed increase on the packfile of the JGit project)
Change-Id: I1d9b3a8ba3c6d874d83cb93ebf171c6ab193e6cc
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
We cannot use SystemReader to get the time, unless we do that consistently,
which is harder to do and be sure we are really testing what we want.
Then we need to update our lastRead variable whenever we conclude that
our file is not racily clean according to lastRead. It may well be clean,
but we do not know that until we check the system clock again.
Finally add a test for this class.
Change-Id: I1894b032b9bd359d1b5325e5472d48e372599e4c
Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com>
If the 'TREE' extension contains an invalid subtree that has
been removed, DirCacheIterator still tried to access it due to
an invalid childCnt field within the parent DirCacheTree object.
This is easy for a user to do, they just need to move all files
out of a subdirectory.
For example, the input for the JUnit test case for this bug was
built using the following C Git sequence:
mkdir -p a/b
touch a/b/c q
git add a/b/c q
git write-tree
git mv a/b/c a/a
After the last step, the subdirectory a/b is empty, as its only
file was moved into the parent directory. Because of the earlier
`git write-tree` operation, there is a 'TREE' extension present, but
the a and a/b subdirectories have been marked invalid by the rename.
When JGit tried to iterate over the a tree, it tried to correct
childCnt to be zero as a/b no longer exists, but it failed to
update childCnt.
Change-Id: I7a0f78fc48a36b1a83252d354618f6807fca0426
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
As discussed in
http://egit.eclipse.org/r/#change,2127
we should use paths relative the working directory instead of Files to
notify the caller about conflicts and nondeleted files.
Change-Id: I034c7bd846f0df78d97bc246f38d411f29713dde
Signed-off-by: Mathias Kinzler <mathias.kinzler@sap.com>
The CheckoutCommand does not handle names other than local branch
names properly; it must detach HEAD if such a name is encountered (for
example a commit ID or a remote tracking branch).
Change-Id: I5d55177f4029bcc34fc2649fd564b125a2929cc4
Signed-off-by: Mathias Kinzler <mathias.kinzler@sap.com>
Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
This is needed by callers to determine checkout conflicts and
possible files that were not deleted during the checkout so that they
can present the end user with a better Exception description and retry
to delete the undeleted files later, respectively.
Change-Id: I037930da7b1a4dfb24cfa3205afb51dc29e4a5b8
Signed-off-by: Mathias Kinzler <mathias.kinzler@sap.com>