Update the ObjectReuseAsIs API to support creating a new
ObjectToPack from only the AnyObjectId and Git object type. This is
needed to support the future pack index bitmaps, which only contain
this information and do not want the overhead of creating a temporary
object for every ObjectId.
Change-Id: I906360b471412688bf429ecef74fd988f47875dc
CloneCommand has been creating fetch refspecs like this on bare clones:
    [remote "origin"]
        url = ssh://example.com/my-repo.git
        fetch = +refs/heads/*:refs/heads//*
As you can see, the destination ref pattern has a superfluous slash.
It looks like this behaviour has always been the case for CloneCommand,
at least since cc2197ed when code catering to bare-clone fetch refspecs
was added. That was released with JGit v1.0 almost 2 years ago, so
there will probably be some bare repos in the wild which will have been
cloned with JGit and have these corrupted refspecs.
The effect of the corrupted fetch refspec is quite interesting. Up to
and including JGit 2.0, the corrupt refspec was tolerated and fetches
would work as intended with no indication to the user that anything was
amiss. With JGit 2.1, a change was introduced which made JGit less
tolerant, and fetches now attempt to update the non-existing ref
"refs/heads//master". No exception is raised, but the real ref -
"refs/heads/master" - is not updated.
This behaviour was noticed by a user of Agit (which does bare clones by
default and recently updated from JGit v2.0 to v2.2), reported here:
https://github.com/rtyley/agit/issues/92
If you run C-Git fetch on a bare repo cloned by JGit, it flat-out
rejects the refspec (checked against v1.7.10.4):
    fatal: Invalid refspec '+refs/heads/*:refs/heads//*'
Incidentally, C-Git does not create an explicit fetch refspec at all
when performing a bare clone - the full remote config generated by C-Git
looks like this:
    [remote "origin"]
        url = ssh://example.com/my-repo.git
Using JGit on such a repository works fine, so omitting the fetch
refspec entirely is also an option.
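As a rough illustration of how such a double slash can arise, here is a
minimal sketch. It is not the actual CloneCommand code: the class and
method names are hypothetical, and the two constants only mirror the
conventional ref prefixes. It assumes the destination prefix for bare
clones already ends in a slash while the non-bare one does not.

```java
// Hypothetical sketch, not the real CloneCommand implementation.
final class BareCloneRefSpecSketch {
    // Assumed prefixes; note that both end with a slash.
    static final String R_HEADS = "refs/heads/";
    static final String R_REMOTES = "refs/remotes/";

    // Appends "/*" unconditionally, which doubles the slash for bare clones.
    static String buggyDestination(boolean bare, String remoteName) {
        String dst = bare ? R_HEADS : R_REMOTES + remoteName;
        return dst + "/*";
    }

    // Adds the separating slash only where the prefix lacks one.
    static String fixedDestination(boolean bare, String remoteName) {
        String dst = bare ? R_HEADS : R_REMOTES + remoteName + "/";
        return dst + "*";
    }

    // Builds a forced fetch refspec for all branches.
    static String fetchSpec(String destination) {
        return "+" + R_HEADS + "*:" + destination;
    }
}
```

With these assumptions, only the bare case differs between the two
variants; the non-bare destination is the same in both.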
Change-Id: I14b0d359dc69b8908f68e02cea7a756ac34bf881
No longer invoke the expensive RefDatabase.isNameConflicting() check on
updating existing refs, reducing batch ref update time by ~97%.
The RefDirectory implementation of isNameConflicting() is quite
slow (it has to do an expensive loose-ref scan) but it's only necessary
to perform this check on ref update if the ref is being *created* - if
the ref already exists, we can already guarantee that it does not
conflict with any other refs.
C-Git seems to use a similar condition before making the
is_refname_available() check:
https://github.com/git/git/blob/v1.8.1.4/refs.c#L1660-L1670
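The idea can be sketched with a toy in-memory ref store. Everything here
is hypothetical (class, fields, and the simplified conflict rule) and is
only meant to show that the expensive check runs on creation, not on
update of an existing ref.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy ref store, not the RefDirectory implementation.
final class RefUpdateSketch {
    private final Map<String, String> refs = new HashMap<>();
    int conflictChecks = 0; // counts how often the expensive scan runs

    // Stand-in for the expensive scan: a name conflicts if an existing
    // ref is a directory prefix of it, or it is a prefix of an existing ref.
    boolean isNameConflicting(String name) {
        conflictChecks++;
        for (String existing : refs.keySet()) {
            if (existing.startsWith(name + "/") || name.startsWith(existing + "/"))
                return true;
        }
        return false;
    }

    // Only newly created refs need the conflict check; an existing ref
    // cannot conflict with the other refs already stored.
    boolean update(String name, String newId) {
        boolean exists = refs.containsKey(name);
        if (!exists && isNameConflicting(name))
            return false;
        refs.put(name, newId);
        return true;
    }
}
```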
As an example of the effects on performance, here's a simple timing
experiment using The BFG to remove one file from the JGit repo:
---
$ wget http://repo1.maven.org/maven2/com/madgag/bfg-repo-cleaner/1.0.1/bfg-1.0.1.jar
$ git clone --mirror https://git.eclipse.org/r/p/jgit/jgit.git
$ java -jar bfg-1.0.1.jar -D make_jgit.sh jgit.git
....
Updating references: 100% (5760/5760)
...Ref update completed in 148,949 ms.
BFG run is complete!
---
The execution time for the run is completely dominated by the batch ref
update at the end. Repeating the experiment with BFG v1.0.2 (using JGit
patched with this change), the ref update time is dramatically reduced:
---
Updating references: 100% (5760/5760)
...Ref update completed in 4,327 ms.
---
Change-Id: I9057bc4ee22f9cc269b1cc00c493841c71527cd6
Previously the PackFile class was assumed to support only .pack and .idx
files. Update the constructor to enumerate the supported extensions for
the pack file. This will allow the bitmap code to only be executed if
the bitmap extension file is known to exist.
Change-Id: Ie59041dffec5f60d7ea2771026ffd945106bd4bf
When a lot of commits are added to DateRevQueue, the
sort-on-insertion approach is very heavy on CPU cycles.
One approach to fix this was made by Dave Borowitz:
https://git.eclipse.org/r/#/c/5491/
But using Java's PriorityQueue seems to have brought some
extra overhead, and the desired performance could not be
reached.
This fix takes another approach to the insertion problem,
without changing the expected behaviour or bringing extra
memory overhead:
If we detect over 1000 commits in the DateRevQueue, a
"seek-index" is rebuilt every 1000th added commit.
The index keeps track of every 100th commit in the
DateRevQueue. During insertions, it is used for a
preliminary scan (binary search) of the queue, with
the intention of helping add() find a good starting point
to start walking from. After finding this starting point,
add() steps commit-by-commit until the correct
insertion place in the queue is found (today, the queue
is expected to be sorted at all times).
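The scheme described above can be sketched as follows. This is a
simplified illustration under stated assumptions, not the DateRevQueue
code: it stores plain timestamps instead of commits, all names are
hypothetical, and it only supports insertion. A real queue also removes
entries from the head and would have to drop index entries as they are
consumed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: linked list sorted newest-first with a sparse seek-index.
final class SeekIndexQueueSketch {
    static final int REBUILD_EVERY = 1000; // rebuild the index every 1000th add
    static final int INDEX_STEP = 100;     // index every 100th node

    private static final class Node {
        final long time;
        Node next;
        Node(long time) { this.time = time; }
    }

    private Node head;
    private int size;
    private int sinceRebuild;
    private Node[] index = new Node[0];

    void add(long time) {
        size++;
        sinceRebuild++;
        if (size > REBUILD_EVERY && sinceRebuild >= REBUILD_EVERY)
            rebuildIndex();

        Node n = new Node(time);
        Node start = seek(time);
        if (start == null) {
            // No indexed node at or before the insertion point.
            if (head == null || head.time < time) {
                n.next = head;
                head = n;
                return;
            }
            start = head;
        }
        // Walk node by node from the starting point; ties keep insertion order.
        Node p = start;
        while (p.next != null && p.next.time >= time)
            p = p.next;
        n.next = p.next;
        p.next = n;
    }

    // Binary search for the last indexed node whose time is >= the new time.
    // Indexed nodes stay in list order, so the index is sorted newest-first
    // even after later insertions make it sparse.
    private Node seek(long time) {
        int lo = 0, hi = index.length - 1;
        Node best = null;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (index[mid].time >= time) {
                best = index[mid];
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return best;
    }

    private void rebuildIndex() {
        List<Node> picked = new ArrayList<>();
        int i = 0;
        for (Node p = head; p != null; p = p.next, i++) {
            if (i % INDEX_STEP == 0)
                picked.add(p);
        }
        index = picked.toArray(new Node[0]);
        sinceRebuild = 0;
    }

    List<Long> toList() {
        List<Long> out = new ArrayList<>(size);
        for (Node p = head; p != null; p = p.next)
            out.add(p.time);
        return out;
    }
}
```

Between rebuilds the index becomes stale in spacing but not in order,
so the binary search still yields a valid starting point and only the
length of the final walk grows.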
When applied to repositories with many refs, this approach
has proven to bring huge performance gains and scales quite
well.
For instance, in a repository with close to 80000 refs,
we could cut down the time a typical Gerrit replication
of 1 commit would take (just a push from JGit's point of
view) from 32sec down to 3.5sec.
Below you see some typical times to add a specific amount
of commits (with random commit times) to the DateRevQueue
and the difference the preliminary seek-index makes:
Commits | Index | No Index
   1024 |   8ms |      8ms
   2048 |  13ms |      9ms
   4096 |   5ms |     59ms
   8192 |  11ms |    595ms
  16384 |  22ms |   3058ms
  32768 |  64ms |  13811ms
  65536 | 201ms |  62677ms
 131072 | 783ms | 331585ms
Only one extra reference is needed for every 100 inserted
commits (and only when we see more than 1000 commits in
the queue), so the memory overhead should be negligible.
Various index-stepping values were tested, and 100 seemed to
scale very well and be effective from start.
In the future, it should probably be dynamic and based on
the number of refs in the queue, but this should serve well
as a starting point.
Note: While other fundamentally different data structures may
be more suitable, the DateRevQueue is extremely central to
many of the Git core operations. This approach was chosen
because the effect of the patch is easy to predict in conjunction
with the current implementation. A totally new data structure
will make it harder to predict behaviour in many common and
uncommon cases (in terms of breaking ties, memory usage, cost
when using few elements, object creation/disposing overhead,
etc).
Change-Id: Ie7b99f40eacf6324bfb4716d82073adeda64d10f
Extend ResolveMerger with RecursiveMerger to merge two tips
that have up to 200 bases.
Bug: 380314
CQ: 6854
Change-Id: I6292bb7bda55c0242a448a94956f2d6a94fddbaa
Also-by: Christian Halstrick <christian.halstrick@sap.com>
Signed-off-by: Chris Aniszczyk <zx@twitter.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
* stable-2.3:
  Prepare 2.3.2-SNAPSHOT builds
  JGit v2.3.1.201302201838-r
  Accept Change-Id even if footer contains not well-formed entries
  Fix false positives in hashing used by PathFilterGroup
Change-Id: I5882aa3b482d6bcd40a45bed51e5ab03f018a5bc
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Instead of looking for a Change-Id in the last section only when that
section consists entirely of well-formed "key: value" lines, replace the
last occurrence of a valid Change-Id line in the last section. Some tools
require footer lines that are not well-formed, e.g. lines without a colon.
Gerrit doesn't accept a Change-Id line in the footer unless it starts
at the beginning of the line.
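A minimal sketch of this lookup, assuming a Change-Id line has the form
"Change-Id: I" followed by 40 lowercase hex digits and that the last
section is the final block of non-blank lines. The class, method, and
pattern are hypothetical and not taken from JGit.

```java
import java.util.regex.Pattern;

// Hypothetical sketch of locating the Change-Id line to replace.
final class ChangeIdFooterSketch {
    // Assumed shape of a valid Change-Id line; must start at column 0.
    private static final Pattern CHANGE_ID =
            Pattern.compile("Change-Id: *I[0-9a-f]{40}");

    // Returns the index of the last valid Change-Id line in the last
    // section of the message, or -1 if there is none. Other lines in
    // that section need not be well-formed "key: value" pairs.
    static int indexOfLastChangeId(String message) {
        String[] lines = message.split("\n", -1);
        int i = lines.length - 1;
        // Skip trailing blank lines.
        while (i >= 0 && lines[i].trim().isEmpty())
            i--;
        // Scan the last section bottom-up; the first hit is the last occurrence.
        for (; i >= 0 && !lines[i].trim().isEmpty(); i--) {
            if (CHANGE_ID.matcher(lines[i]).matches())
                return i;
        }
        return -1;
    }
}
```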
Bug: 400818
Change-Id: Icce54872adc8c566994beea848448a2f7ca87085
Signed-off-by: Stefan Lay <stefan.lay@sap.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
The ByteArraySet failed to check the length of the entry correctly,
leading to matches where there should be none.
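The kind of false positive a missing length check can cause is shown in
the sketch below. This is an assumption about the general shape of the
problem, not the actual ByteArraySet code; both methods are hypothetical.

```java
// Hypothetical sketch of comparing a stored entry against a buffer prefix.
final class LengthCheckedMatchSketch {
    // Compares only the first 'length' bytes, so a longer stored entry
    // that merely shares a prefix with the buffer is reported as a match.
    // (It would also fail on a stored entry shorter than 'length'.)
    static boolean buggyEquals(byte[] stored, byte[] buf, int length) {
        for (int i = 0; i < length; i++) {
            if (stored[i] != buf[i])
                return false;
        }
        return true;
    }

    // Requires the stored entry to be exactly 'length' bytes long.
    static boolean fixedEquals(byte[] stored, byte[] buf, int length) {
        if (stored.length != length || buf.length < length)
            return false;
        for (int i = 0; i < length; i++) {
            if (stored[i] != buf[i])
                return false;
        }
        return true;
    }
}
```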
Bug: 401249
Change-Id: I925bc48d9cafcdf13e1a797bb09fc2555eb270c5
Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com>
These imports have been unused since commit cb349da017.
Change-Id: I74ea2a17bf4976d9c74255500e5deeff18208e87
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
* stable-2.3:
  Prepare post 2.3.0.201302130906 builds
  JGit v2.3.0.201302130906
  Replace explicit version by property where possible
  Add better documentation to DirCacheCheckout
  Prepare post 2.3rc1 builds
  JGit v2.3.0.201302060400-rc1
Change-Id: I5a7626014dc38e7623937a4241dbf8a6db05c1f9
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
This variable has been populated and never used ever since it was
introduced in v0.4.9~336 (Add "jgit clone", 2008-12-23). Remove it
to make the function easier to understand.
Change-Id: Idb7eb80bc236a20f7385ad2d6141b4d1c5c3f1cc
There is a huge performance issue when using both JGit (EGit) and Git
because JGit does not fill all dircache stat fields with the values Git
would expect. As a result thereof Git would typically revalidate a large
number of tracked files. This can take several minutes for large
repositories with many large files.
Since 1.8.2 Git will restrict stat checking to the size and whole second
part of the modification time stamp, if core.checkstat is set to
"minimal".
As JGit checks only size and modification time, this is close to what
JGit already does. To make the match perfect, ignore the sub-second part
of the modification time stamp if core.checkstat = minimal.
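The comparison can be sketched as below, assuming time stamps are held
in milliseconds. The enum, class, and method names are hypothetical and
only illustrate dropping the sub-second part in the "minimal" mode.

```java
// Hypothetical sketch of a size + mtime comparison with a "minimal" mode.
final class MinimalStatCompareSketch {
    enum CheckStat { DEFAULT, MINIMAL }

    // Returns true if the file looks modified relative to the cached entry.
    static boolean isModified(long cachedSize, long cachedMillis,
            long fileSize, long fileMillis, CheckStat mode) {
        if (cachedSize != fileSize)
            return true;
        if (mode == CheckStat.MINIMAL) {
            // Compare whole seconds only, ignoring the sub-second part.
            return Math.floorDiv(cachedMillis, 1000L)
                    != Math.floorDiv(fileMillis, 1000L);
        }
        return cachedMillis != fileMillis;
    }
}
```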
Change-Id: I8eaff1858a891571075a86db043f9d80da3d7503
This has the same logic as isNameConflicting, but instead of only
returning a boolean, it returns a collection of names that conflict.
It will be used in EGit to provide a better message to the user when
validating a ref name, see Ibea9984121ae88c488858b8a8e73b593195b15e0.
Existing implementations of isNameConflicting could be rewritten like
this:
    return !getConflictingNames(name).isEmpty();
But I'm not sure about that, as isNameConflicting can be implemented in
a faster way than getConflictingNames.
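To illustrate the relationship between the two methods, here is a sketch
over a plain set of ref names. It is an assumption-laden toy, not the
RefDatabase API: the methods are static, take the existing names as a
parameter, and use a simplified conflict rule (an existing ref that is a
directory prefix of the name, or an existing ref below the name).

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; the real methods live on the ref database.
final class ConflictingNamesSketch {
    static Collection<String> getConflictingNames(Set<String> existing, String name) {
        List<String> conflicts = new ArrayList<>();
        // Existing refs that occupy a parent "directory" of the name.
        int slash = name.lastIndexOf('/');
        while (slash > 0) {
            String prefix = name.substring(0, slash);
            if (existing.contains(prefix))
                conflicts.add(prefix);
            slash = name.lastIndexOf('/', slash - 1);
        }
        // Existing refs that live below the name.
        String dir = name + "/";
        for (String ref : existing) {
            if (ref.startsWith(dir))
                conflicts.add(ref);
        }
        return conflicts;
    }

    // Simple but potentially slower: collects every conflict even though
    // the first one found would already answer the question.
    static boolean isNameConflicting(Set<String> existing, String name) {
        return !getConflictingNames(existing, name).isEmpty();
    }
}
```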
Change-Id: I11e0ba2f300adb8b3612943c304ba68bbe73db8a
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Update 3rd party dependencies to their respective latest approved versions.
args4j 2.0.21 is not yet available on Maven central, hence compile
against 2.0.12 and package 2.0.21 until 2.0.21 has been published on
Maven central.
Change-Id: I41a34485970af41b4b5b2404e3d29c98979ddb48
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Instead of the complicated strange stuff, implement stash
apply as cherry-pick.
Provided there are no conflicts and it is requested that
the index should be applied, perform yet another cherry-pick,
but discard the results thereof if that would result in conflicts.
Bug: 376035
Change-Id: I553f3a753e0124b102a51f8edbb53ddeff2912e2
In order to be able to determine the range of the first header line
(e.g. "diff --git a/file1 b/file2") in subclasses, the code that formats
the first header line is extracted.
Required by egit's change: Ia61398146c0336ab332234f24d341561292554db
Change-Id: I9dd5eb964ed8b6869745c3162159b7425ac2c44a
Signed-off-by: Tobias Pfeifer <to.pfeifer@sap.com>