JGit merge algorithm behaved differently from C Git when
we had adjacent modifications. If line 9 was modified by
OURS and line 10 by theirs then C Git will return a
conflict while JGit was seeing this as independent
modifications. This change is not only there to achieve
compatibility, but there where also some really wrong
merge results produced by JGit in the area of adjacent
modifications.
Change-Id: I8d77cb59e82638214e45b3cf9ce3a1f1e9b35c70
Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>
Introduced similar helper methods than in AbstractDiffTestCase.
Then the test cases are much smaller and better understandable.
Change-Id: I2beb4db5a93bd8c0c1238d5d3039cbd6719eee90
Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>
When adding a new method near the end of the sequence we want to
show the full method inserted, and not tear the prior method due
to the common trailing curly brace being consumed as part of the
common end region of the sequences.
Bug: 328895
Change-Id: I233bc40445fb5452863f5fb082bc3097433a8da6
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
This test isn't that useful. The better way to evaluate diff
algorithm performance is to run `jgit debug-diff-algorithms` over
real-world repositories, such as linux-2.6.git. Whenever we modify
an algorithm we should manually verify that its runtime performance
doesn't get any worse than it already is.
Change-Id: I0beed3a5a8a537c958a5a6438a1283f97fa2097a
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
HistogramDiff failed on cases where the initial element for the LCS
was actually very common (e.g. has 20 occurrences), and the first
element of the inserted region after the LCS was also common but
had fewer occurrences (e.g. 10), while the LCS also contained a
unique element (1 occurrence).
This happens often in Java source code. The initial element for
the LCS might be the empty line ("\n"), and the inserted but common
element might be "\t/**\n", with the LCS being a large span of
lines that contains unique method declarations. Even though "/**"
occurs less often than the empty line its not a better LCS if the
LCS we already have contains a unique element.
The logic in HistogramDiff would normally have worked fine, except I
tried to optimize scanning of B by making tryLongestCommonSequence
return the end of the region when there are matching elements
found in A. This allows us to skip over the current LCS region,
as it has already been examined, but caused us to fail to identify
an element that had a lower occurrence count within the region.
The solution used here is to trade space-for-time by keeping a
table of A positions to their occurrence counts. This allows the
matching logic to always use the smallest count for this region,
even if the smallest count doesn't appear on the initial element.
The new unit test testEdit_LcsContainsUnique() verifies this new
behavior works as expected.
Bug: 328895
Change-Id: Id170783b891f645b6a8cf6f133c6682b8de40aaf
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
As described in Bug 328551 there was a bug that the merge algorithm
was not always reporting conflicts when the same line was deleted
and modified. This problem was introduced during commit
0c017188b4 when reported conflicts have
been checked for common pre- and suffixes.
This was fixed here by better determining whether after stripping
off common prefixes and suffixes from a conflicting region there
is still some conflicting part left.
I also added a unit test to test this situation.
Bug: 328551
Change-Id: Iec6c9055d00e5049938484a27ab98dda2577afc4
Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>
When creating a local branch based on another local branch, the
upstream configuration contains "." as origin and the source branch
as "merge". The PullCommand should support this by skipping the
fetch step altogether and use the base branch to merge with.
Change-Id: I260a1771aeeffca5b0161d1494fd63c672ecc2a6
Signed-off-by: Mathias Kinzler <mathias.kinzler@sap.com>
Don't permit transient worker threads to access the underlying output
stream of a ProgressMonitor, as they might get marked as the stream's
writer thread. Instead proxy update events from the workers back onto
the application's real work thread. This ensures that the stream only
sees a single thread, and its the thread that will remain alive for
the entire life cycle of the operation.
This fixes IOException("Write end dead") during local repository fetch
when threaded delta search is enabled. One of the transient delta
search threads became the designated writer for the pipe, and when it
terminated the reader end thought the writer was dead, even though the
main writer thread was still executing in PackWriter.
Bug: 326557
Change-Id: I01d1b20a3d7be1c0b480c7fb5c9773c161fe5c15
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Implemented the initial version of a cherry-pick command.
A correct error handling is missing (what happens if the
checkout fails, the cherry-pick leads to conflicts etc).
But straightforward cherry-picks works.
Change-Id: I235c0eb3a7a2d5bdfe40400f1deed06f29d746e1
Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Since the introduction of HashedSequence we no longer need to supply
the RawTextComparator at the time of constructing a RawText. Drop the
definition from the constructor, because it doesn't make sense as part
of our public API.
Change-Id: Iaab34611d60eee4a2036830142b089b2dae81842
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
When an empty line was inserted at the beginning of the common end
part of a RawText the comparator incorrectly considered it to be
common, which meant the DiffAlgorithm would later not even have it be
part of the region it examines. This would cause JGit to skip a line
of insertion, which later confused Gerrit Code Review when it tried to
match up the pre and post RawText files for a difference that had this
type of insertion.
Define two new unit tests to check for this insertion of a blank line
condition and correct for it by removing the LF from the common region
when the condition is detected.
Change-Id: I2108570eb2929803b9a56f9fb9c400c758e7156b
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
It wrongly uses the full name of the branch to remove the
configuration entries but must use the shortened one.
Change-Id: Ie386a128a6c6beccc20bafd15c2e36254c5f560d
Signed-off-by: Mathias Kinzler <mathias.kinzler@sap.com>
When creating a branch with CreateBranchCommand.call() without
specifying an explicit startPoint HEAD should be used as startPoint.
There was a bug leading to an NPE in such a case.
Change-Id: Ic0a5dc1f33a0987d66c09996c8012c45785500ff
Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>
We wanted to wrap all LowLevel JGit excpetions into a
JGitInternalException so that users of this high-level interface
don't have to explicitly catch all of them. This
was forgotten on BranchCreateCommand.call() and I added
it.
Change-Id: Ie140e99574fb004137c66e80fb92eb6c6d0fa5e1
Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>
HistogramDiff outperforms it for any case where PatienceDiff needs to
fallback to another algorithm. Consequently it's not worth keeping
around, because we would always want a fallback enabled.
Change-Id: I39b99cb1db4b3be74a764dd3d68cd4c9ecd91481
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
The need for branching becomes more pressing with pull
support: we need to make sure the upstream configuration entries
are written correctly when creating and renaming branches
(and of course are cleaned up when deleting them).
This adds support for listing, adding, deleting and renaming
branches including the more common options.
Bug: 326938
Change-Id: I00bcc19476e835d6fd78fd188acde64946c1505c
Signed-off-by: Mathias Kinzler <mathias.kinzler@sap.com>
Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
Application code, including unit tests for storage implementations,
should not extend RevCommit outside of the scope of using it for a
RevWalk. Its a lot of overhead and unlikely to work long-term.
Instead for the one test that matters, use a custom subclass of the
ObjectId type. This lets us measure exactly what we are looking for,
which is that the subclass isn't retained.
A lot of other tests were unnecessarily wrapping an object with a
RevCommit and storing that back into the RefUpdate. This is just
retesting what the earlier no-cache test was doing, and complicated
the test considerably. Drop that code and just rely on the value that
was configured by the helper method.
Change-Id: I5b31813484eaa306e9bc4de9622dd5bd4846b16d
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
In bug 323571 it is mentioned that if you call
'toURI().toURL().toString()' on a java.io.File you cannot pass
that string to jgit as an URIish. Problem is that the passed
URI looks like 'file:/C:/a/b.txt' and that we where expecting
double slashes after scheme':'. This fix adds support for this
single-slash file URLs.
Bug: 323571
Change-Id: I866a76a4fcd0c3b58e0d26a104fc4564e7ba5999
Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>
This is the minimal implementation of a "Pull" command. It does not
have any parameters besides the generic progress monitor and timeout.
It works on the currently checked-out branch and assumes that the
configuration contains the keys "branch.<branch name>.remote" and
"branch.<branch name>.merge" to determine the remote configuration
for the fetch and the remote branch name for the merge.
Bug: 303404
Change-Id: I7fe09029996d0cfc09a7d8f097b5d6af1488fa93
Signed-off-by: Mathias Kinzler <mathias.kinzler@sap.com>
Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
There where quite some bugs regarding wrong URI parsing. In order
to solve them the parsing has to be refactored. We now have
specialized regexps for 'scheme://host/...', scp URIs and local
file names. Now we can detect problems while parsing 'git://host:/abc' which
was previously not possible.
Bug: 315571
Bug: 292897
Bug: 307017
Bug: 323571
Bug: 317388
Change-Id: If72576576ebb6b9d9dc8b7e51ddd87c9909e8b62
Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
The regular expression which should handle the
user/password part in an URI was potentially
processing too many chars. This led to problems
when user/pwd and port was specified
Change-Id: I87db02494c4b367283e1d00437b1c06d2c8fdd28
Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
URIs for the git protocol have to have a hostname.
(see http://www.kernel.org/pub/software/scm/git/docs
/git-clone.html#_git_urls_a_id_urls_a) Some tests tested
URIs like git:/abc.git which is not allowed. Fixed this.
Change-Id: Ia3b8b681ad6592f03b090a874a6e91068a8301fe
Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>
Projects like org.eclipse.mdt contain large XML files about 6 MiB
in size. So does the Android project platform/frameworks/base.
Doing a clone of either project with JGit takes forever to checkout
the files into the working directory, because delta decompression
tends to be very expensive as we need to constantly reposition the
base stream for each copy instruction. This can be made worse by
a very bad ordering of offsets, possibly due to an XML editor that
doesn't preserve the order of elements in the file very well.
Increasing the threshold to the same limit PackWriter uses when
doing delta compression (50 MiB) permits a default configured
JGit to decompress these XML file objects using the faster
random-access arrays, rather than re-seeking through an inflate
stream, significantly reducing checkout time after a clone.
Since this new limit may be dangerously close to the JVM maximum
heap size, every allocation attempt is now wrapped in a try/catch
so that JGit can degrade by switching to the large object stream
mode when the allocation is refused. It will run slower, but the
operation will still complete.
The large stream mode will run very well for big objects that aren't
delta compressed, and is acceptable for delta compressed objects that
are using only forward referencing copy instructions. Copies using
prior offsets are still going to be horrible, and there is nothing
we can do about it except increase core.streamFileThreshold.
We might in the future want to consider changing the way the delta
generators work in JGit and native C Git to avoid prior offsets once
an object reaches a certain size, even if that causes the delta
instruction stream to be slightly larger. Unfortunately native
C Git won't want to do that until its also able to stream objects
rather than malloc them as contiguous blocks.
Change-Id: Ief7a3896afce15073e80d3691bed90c6a3897307
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
HistogramDiff is an alternative implementation of patience diff,
performing a search over all matching locations and picking the
longest common subsequence that has the lowest occurrence count.
If there are unique common elements, its behavior is identical to
that of patience diff.
Actual performance on real-world source files usually beats
MyersDiff, sometimes by a factor of 3, especially for complex
comparators that ignore whitespace.
Change-Id: I1806cd708087e36d144fb824a0e5ab7cdd579d73
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Each algorithm should produce a particular number of results
given one of the standard inputs used during the performance
tests. To help ensure those tests are accurate, assert that
the edit list length is correct.
Change-Id: I292f8fde0cec6a60a75ce09e70814a00ca47cb99
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Because PatienceDiff works by looking for common unique lines within
the region, the DiffTestDataGenerator needs to be modified to produce
a unique character for each region. If we don't give PatienceDiff
a few unique points, it will just offer back a single REPLACE edit
that covers the entire files, and this doesn't tell us very much.
Change-Id: I5129faea1e763c74739118ca20d86bd62e0deaef
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
For certain tiny input sequences, every DiffAlgorithm should produce
exactly the same results, as there should be no ambiguity. Package
these up in an abstract TestCase that algorithms can extend from in
order to perform basic validation of their implementation.
Since these tests are more complete than what we used to have for
the MyersDiff algorithm, throw away Johannes' tests and only use
this new package.
Change-Id: I9a044330887c849ad4c78aa5c7aa04c783c10252
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
This is a faster exact match based form that tries to improve
performance for the common case of the header and trailer of
a text file not changing at all. After this fast path we use
the slower path based on the super class' using equals() to
allow for whitespace ignore modes to still work.
Some simple performance testing showed a major improvement over the
older implementation for a common edit we see in JGit. The test
compared blob 29a89bc and 372a978, which is the ObjectDirectory.java
file difference in commit 41dd9ed1c0.
The two text files are approximately 22 KiB in size.
DEFAULT old 203900 ns
DEFAULT new 100400 ns
This new version is 2x faster for the DEFAULT comparator, which does
not treat space specially. This is because we can now examine a
larger swath of text with fewer instructions per byte compared. The
older algorithm had to stop at each line break and recompute how to
examine the next line, while the new algorithm only stops when the
first difference is found.
WS_IGNORE_ALL old 298500 ns
WS_IGNORE_ALL new 63300 ns
Its 4.7x faster for the whitespace ignore comparator, as the common
header and footer do not have a whitespace difference. Avoiding the
special case handling for whitespace on each byte considered saves a
lot of time.
Since most edits to source code (and other text like files) appears in
the interior of the file, fast elimination of common header/footer
means faster diff throughput. In the less common case of an actual
header or footer edit, the common header/footer elimination is stopped
rather quickly either way, so there is very little downside to the
optimiation applied here.
Change-Id: I1d501b4c3ff80ed086b20bf12faf51ae62167db7
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Adds API for performing git fetch operations.
Change-Id: Idd95664fd4e3bca03211e4ffda3e354849f92a35
Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
Method test006_readCaseInsensitive in TestConfig already does the
same thing, and doesn't require an OS specific test for the value being
asserted.
This is additionally a fast fix for the failing JUnit test after
change 3fe5276.
Change-Id: I96d2794dbc7db55bdd0fbfcf675aabb15cc8419f
Signed-off-by: Stefan Lay <stefan.lay@sap.com>
In PlotCommitList.enter() commits are positioned on lanes for visual
presentation. This implementation was buggy: commits without
children (often the starting points for the RevWalk) are not positioned
on separate lanes.
The problem was that when handling commits with multiple children
(that's where branches fork out) it was not handled that some of the
children may not have been positioned on a lane yet. I fixed that and
added a number of tests which specifically test the layout of commits
on lanes.
Bug: 300282
Bug: 320263
Change-Id: I267b97ecccb5251cec54cec90207e075ab50503e
Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
This makes it easier to parametrize DiffFormatter with a different
implementation, as we later plan to add PatienceDiff to JGit.
Change-Id: Id35ef478d5fa20fe10a1ba297f9436fd7adde9ce
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
git allows remotes to be relative paths, but the regex
validating urls wouldn't accept anything starting with "..".
Other functionality works fine with these paths.
Bug: 311300
Change-Id: Ib74de0450a1c602b22884e19d994ce2f52634c77
The core.autocrlf variable can take on three values: false, true,
and input. Parsing it as a boolean is wrong, we instead need to
parse a tri-state enumeration.
Add support for parsing and setting enum values from Java from and
to the text based configuration file, and use that to handle the
autocrlf variable.
Bug: 301775
Change-Id: I81b9e33087a33d2ef2eac89ba93b9e83b7ecc223
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Instead of making the sequence itself responsible for the equivalence
function, use an external function that is supplied by the caller.
This cleans up the code because we now say cmp.equals(a, ai, b, bi)
instead of a.equals(ai, b, bi).
This refactoring also removes the odd concept of creating different
types of sequences to have different behaviors for whitespace
ignoring. Instead DiffComparator now supports singleton functions
that apply a particular equivalence algorithm to a type of sequence.
Change-Id: I559f494d81cdc6f06bfb4208f60780c0ae251df9
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
When checkReferencedIsReachable is set in ReceivePack we are trying
to prove that the push client is permitted to access an object that
it did not send to us, but that the received objects link to either
via a link inside of an object (e.g. commit parent pointer or tree
member) or by a delta base reference.
To do this check we are making a list of every potential delta base,
and then ensuring that every delta base used appears on this list.
If a delta base does not appear on this list, we abort with an error,
letting the client know we are missing a particular object.
Preventing spurious errors about missing delta base objects requires
us to use the exact same list of potential delta bases as the remote
push client used. This means we must use TOPO ordering, and we
need to enable BOUNDARY sorting so that ObjectWalk will correctly
include any trees found during the enumeration back to the common
merge base between the interesting and uninteresting heads.
To ensure JGit's own push client matches this same potential delta
base list, we need to undo 60aae90d4d ("Disable topological
sorting in PackWriter") and switch back to using the conventional
TOPO ordering for commits in a pack file. This ensures that our
own push client will use the same potential base object list as
checkReferencedIsReachable uses on the receiving side.
Change-Id: I14d0a326deb62a43f987b375cfe519711031e172
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
If the copy instruction was larger than the input buffer given to us,
we copied the wrong part of the base stream during the next read().
This occurred on really big binary files where a copy instruction
of 64k wasn't unreasonable, but the caller's buffer was only 8192
bytes long. We copied the first 8192 bytes correctly, but then
reseeked the base stream back to the start of the copy region on
the second read of 8192 bytes. Instead of a sequence like ABCD
being read into the caller, we read AAAA.
Change-Id: I240a3f722a3eda1ce8ef5db93b380e3bceb1e201
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Exposing isEmpty, getLengthA, getLengthB make it easier to examine
the state of an edit and work with it from higher level code.
The before and after cut routines make it easy to split an edit
that contains another edit, such as to decompose a REPLACE that
contains a common sequence within it.
Change-Id: Id63d6476a7a6b23acb7ab237d414a0a1a7200290
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
We shouldn't escape non-special ASCII characters such as '@' or '~'.
These are valid in a path name on POSIX systems, and may appear as
part of a path in a GNU or Git style patch script. Escaping them
into octal just obfuscates the user's intent, with no gain.
When parsing an escaped octal sequence, we must parse no more
than 3 digits. That is, "\1002" is actually "@2", not the Unicode
character \u0202.
Change-Id: I3a849a0d318e69b654f03fd559f5d7f99dd63e5c
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
If we get an exception while indexing the incoming pack, its likely
a stream corruption. We already report an error to the client, but
we eat the stack trace, which makes debugging issues related to a
bug inside of JGit nearly impossible. Rethrow it under a new type
UnpackException, so embedding servers or applications can catch the
error and provide it to a human who might be able to forward such
traces onto a JGit developer for evaluation.
Change-Id: Icad41148bbc0c76f284c7033a195a6b51911beab
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Applications just want a quick way to configure our diff
implementation, and then just want to use it without a lot of fuss.
Move all of the rename detection logic and path following logic
out of our pgm package and into DiffFormatter itself, making it
much easier for a GUI to take advantage of the features without
duplicating a lot of code.
Change-Id: I4b54e987bb6dc804fb270cbc495fe4cae26c7b0e
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>