The diff algorithm which is used by Merge, Cherry-Pick, Rebase
should be configurable. A new configuration parameter "diff.algorithm"
is introduced which currently accepts the values "myers" or
"histogram". Based on this parameter for example the ResolveMerger
will choose a diff algorithm. The reason for this is bug 331078.
This bug shows that JGit is more compatible with C Git when
histogram diff is in place. But since histogram diff is quite new we
need an easy way to fall back to Myers diff.
Bug: 331078
Change-Id: I2549c992e478d991c61c9508ad826d1a9e539ae3
Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>
Signed-off-by: Philipp Thun <philipp.thun@sap.com>
Coverage tests showed that we are missing to test certain areas
in the rebase command. Add the missing tests.
Change-Id: Ia4a272d26cde7e1861dac30496e4b6799fc8187a
Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>
The implementation delegates to the CheckoutCommand and
therefore only supports some of the options supported by
the CheckoutCommand.
Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
Add the ability to checkout a branch to the working tree.
Bug: 330860
Change-Id: Ie06b9e799a9e1be384da0b8996efa7209b32eac3
Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
There was a bug introduced by commit 0e815fe. For non-versioned files
the merge algorithm detected an incoming deletion from THEIRS.
Consequently such files were deleted. That's a severe bug which was
fixed by more precisely detecting incoming deletions.
Change-Id: I4385d3c990db11d62e371a385dc8ee89841db84a
Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>
Signed-off-by: Philipp Thun <philipp.thun@sap.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Since we have the RmCommand API now, update Rm to use it.
Change-Id: I6e2cb37573cc8a29846f01e09e8c07e0dc279dbe
Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
This is a first iteration to implement Rebase. At the moment, this
does not implement --continue and --skip, so if the first
conflict is found, the only option is to --abort the command.
Bug: 328217
Change-Id: I24d60c0214e71e5572955f8261e10a42e9e95298
Signed-off-by: Mathias Kinzler <mathias.kinzler@sap.com>
Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
Instead of copying up to 4 fields from the parent iterator each time a
child iterator is initialized and used, construct a single state
object that contains the 4 fields, and pass that one state object
through to the child. This makes it easier to add additional state
fields that must be inherited, at the slight expense of an extra
object allocation per TreeWalk, and an extra level of field
indirection whenever the options, nameEncoder, or read buffer is
required by the iterator.
Change-Id: Ic4603c33b772d7a45f9c81140537d51945688fcb
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Naming these inner classes ensures that stack traces which contain
them will give us useful information about which filter is involved in
the trace, rather than the generated names $1, $2, etc. This makes it
much easier to understand a stack trace at a glance.
Change-Id: Ia6a75fdb382ff6461e02054d94baf011bdeee5aa
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
In the case where DirCacheCheckout was used to checkout a tree
without taking HEAD into account (e.g. during a clone or hard reset)
we didn't handle conflicts correctly. E.g. if there are conflicts
(entries with stage != 0) in the index and we tried to hard reset
we have been processing the conflicting pathes multiple times (once
for every stage). With this fix we will update the index with the
entry from the "merge" state (the state we want checkout) when we
detect existing conflicts.
Change-Id: Iffbddccaa588cf0d1460a5e44dabaf540d996e26
Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>
* rename-detection:
RenameDetector: Only scan deletes if adds exist
SimilarityRenameDetector: Initialize sizes to 0
SimilarityRenameDetector: Avoid allocating source index
SimilarityRenameDetector: Only attempt to index large files once
SimilarityIndex: Don't overflow internal counter fields
SimilarityIndex: Accept files larger than 8 MB
SimilarityIndex: Correct comment explaining the logic
* fs-fsync:
Remove unnecessary flush calls from LockFile
Remove unnecessary region locking from LockFile
Support core.fsyncRefFiles option
Support core.fsyncObjectFiles option
Simplify LockFile write(ObjectId) case
Rewrite the initialization of the encoding tables to be more clear,
but slightly slower to setup. We generally perfer a clear definition
of the data over a slightly slower class load time.
Change-Id: I0c7f89b6ab82dcf71525ffb69a388c312c195913
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Since we have already modified this class to localize an error
message, we might as well strip it down to contain only the
functionality we need, or might ever use.
To keep this simple to review we don't adjust formatting right
away, so code that was buried inside of an if or else block whose
condition was removed might not have the correct indentation anymore.
We can fix this with a later reformatting change.
Change-Id: I2996aaa704e9d6182e5500c7a63240d5e9d722cc
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
When DirCacheCheckout should be used to checkout only one
tree (reset --hard, clone) then we had to use the standard
constructor and specify null as value for head. This change
adds explicit constructors not taking HEAD and documents
that.
Bug: 330021
Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>
Fanout level notes trees are combined back together into a flat leaf
level tree if during a removal of a subtree there are less than 3/4 of
the fanout subtrees still existing, and the size of the combined leaf
is under the 256 split limit noted above.
This rule is used because deletes are less common than insertions, and
SHA-1's relatively uniform distribution suggests that with only 192
subtrees existing in the fanout, there should be approximately 192
names in the combined replacement leaf tree.
Change-Id: Ia9d145ffd5454982509fc40906bc4dbbf2b13952
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Leaf level notes trees are split into a new fan-out tree if an
insertion occurs and the tree already contains >= 256 notes in it.
The splitting may occur multiple times if all of the notes have the
same prefix; in the worst case this produces a tree path such as
"00/00/00/00/00/00/00/00/00/00/00/00/00/00/00/00/00/00/00/be" if all
of the notes begin with zeros.
Change-Id: I2d7d98f35108def9ec49936ddbdc34b13822a3c7
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Some algorithms need to be able to iterate through all notes within a
particular bucket, such as when splitting or combining a bucket.
Exposing an Iterator<Note> makes this traversal possible.
For a LeafBucket the iteration is simple, its over the sorted array of
elements. For FanoutBucket its a bit more complex as the iteration
needs to union the iterators of each fanout bucket, lazily loading any
buckets that aren't already in-memory.
Change-Id: I3d5279b11984f44dcf0ddb14a82a4b4e51d4632d
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
This is necessary to allow applications to wrap the note tree in
a commit and update the note branch with the new state.
Change-Id: Idbd7ead4a1b16ae2b64a30a4a01a29cfed548cdf
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
NoteMap now supports editing in-memory, allowing applications to
modify the NoteMap once it has been loaded from the branch. The
ability to write the branch back to tree objects is not yet done,
so the edits are strictly transient.
Change-Id: I63448954abfca2a8e3e95369cd84c0d1176cdb79
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
The lock file protocol relies on the atomic creation of a standardized
name in the parent directory of the file being updated. Since the
creation is atomic, at most one thread in any process can succeed on
this creation, and all others will fail. While the lock file exists,
that file is private to the thread that is writing it, and no others
will attempt to read or modify the file.
Consequently the use of the region level locks around the file are
unnecessary, and may actually reduce performance when using NFS, SMB,
or some other sort of remote filesystem that supports locking.
Change-Id: Ice312b6fb4fdf9d36c734c3624c6d0537903913b
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
If core.fsyncRefFiles is set to true, fsync is used whenever a
reference file is updated, ensuring the file contents are also
written to disk. This can help to prevent empty ref files after
a system crash when using a filesystem such as HFS+ where data
writes may be delayed.
Change-Id: Ie508a974da50f63b0409c38afe68772322dc19f1
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Some repositories may be on really unstable filesystems, but still
want to have good reliability when objects are written to disk. If
core.fsyncObjectFiles is set to true, request the JVM to ensure the
data is written before returning success to the caller of insert.
The option defaults to false because it should be useless on any
filesystem that orders writes and metadata, such as ext3 mounted with
data=ordered (or data=journal). But it may be useful on some systems
(especially HFS+) where file content may flush to the disk
independently of filesystem structure changes.
Because FileChannel.force(boolean) only claims to ensure data is
written if it was written using the write(ByteBuffer) method of
FileChannel, redirect all writes when using fsyncObjectFiles to go
through the FileChannel interface instead of through the older style
OutputStream interface. This may not be necessary on all JVMs, but
its more portable to follow the definition than the common behavior.
Change-Id: I57f6b6bb7e403c07fbae989dbf3758eaf5edbc78
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
If there are only deletes, don't need perform rename or copy
detection. There are no adds (aka destinations) for the deletes
to match against.
Change-Id: I00fb90c509fa26a053de561dd8506cc1e0f5799a
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Setting the array elements to -1 is more expensive than relying on
the allocator to zero the array for us first. Shifting the code to
always add 1 to the size (so an empty file is actually 1 byte long)
allows us to detect an unloaded size by comparing to 0, thus saving
the array fill calls.
Change-Id: Iad859e910655675b53ba70de8e6fceaef7cfcdd1
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
If the only file added is really small, and all of the deleted
files are really big, none of the permutations will match up due
to the sizes being too far apart to fit the current rename score.
Avoid allocating the really big deleted SimilarityIndex by deferring
its construction until at least one add along that row has a
reasonable chance of matching it.
This avoids expending a lot of CPU time looking at big deleted
binary files when a small modified text file was broken due to a
high percentage of changed lines.
Change-Id: I11ae37edb80a7be1eef8cc01d79412017c2fc075
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
If a file fails to index the first time the loop encounters it, the
file is likely to fail to index again on the next row. Rather than
wasting a huge amount of CPU to index it again and fail, remember
which destination files failed to index and skip over them on each
subsequent row.
Because this condition is very unlikely, avoid allocating the BitSet
until its actually needed. This keeps the memory usage unaffected
for the common case.
Change-Id: I93509b28b61a9bba8f681a7b4df4c6127bca2a09
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
The counter portion of each pair is only 32 bits wide, but is part
of a larger 64 bit integer. If the file size was larger than 4 GB
the counter could overflow and impact the key, changing the hash,
and later resulting in an incorrect similarity score.
Guard against this overflow condition by capping the count for each
record at 2^32-1. If any record contains more than that many bytes
the table aborts hashing and throws TableFullException.
This permits the index to scan and work on files that exceed 4 GB
in size, but only if the file contains more than one unique block.
The index throws TableFullException on a 4 GB file containing all
zeros, but should succeed on a 6 GB file containing unique lines.
The index now uses a 64 bit accumulator during the common scoring
algorithm, possibly resulting in slower summations. However this
index is already heavily dependent upon 64 bit integer operations
being efficient, so increasing from 32 bits to 64 bits allows us
to correctly handle 6 GB files.
Change-Id: I14e6dbc88d54ead19336a4c0c25eae18e73e6ec2
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Files bigger than 8 MB (2^23 bytes) tended to overflow the internal
hashtable, as the table was capped in size to 2^17 records. If a
file contained 2^17 unique data blocks/lines, the table insertion
got stuck in an infinite loop as the able couldn't grow, and there
was no open slot for the new item.
Remove the artifical 2^17 table limit and instead allow the table
to grow to be as big as 2^30. With a 64 byte block size, this
permits hashing inputs as large as 64 GB.
If the table reaches 2^30 (or cannot be allocated) hashing is
aborted. RenameDetector no longer tries to break a modify file pair,
and it does not try to match the file for rename or copy detection.
Change-Id: Ibb4d756844f4667e181e24a34a468dc3655863ac
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
This comment was wrong, due to a copy-and-paste error. Here the
code is looking at records of dst that do not exist in src, and
are skipping past them to find another match.
Change-Id: I07c1fba7dee093a1eeffcf7e0c7ec85446777ffb
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
In order to safely edit a notes tree, NoteMap needs to retain any
non-note tree entries it read from the source tree and put them
back out into the modified tree when it commits a new version of
the note branch.
Remember any tree entries that didn't look like a note during
the parsing of the tree, so they can be put into a TreeFormatter
later when the tree writes to the repository.
Change-Id: Ia284af7e7866da35db35374c6c5869f00c857944
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Instead of reading a note tree recursively up front when the NoteMap
is loaded, read only the root tree and load subtrees on demand when
they are accessed by the application. This gives a lower latency
to read a note for the recent commits on a branch, as only the paths
that are needed get read.
Given a 2/38 style fanout, the tree will fully load when 256 objects
have been accessed by the application. But unlike the prior version
of NoteMap, the NoteMap will load faster and answer lookups sooner,
as the loading time for all 256 levels is spread out across each of
the get() requests.
Given a 2/2/36 style fanout, the tree won't need to fully load until
about 65,536 objects are accessed.
To simplify the implementation we only support the flat layout (all
notes in the top level tree), or a 2/38, 2/2/36, 2/2/2/34, through
2/.../2 style fanout. Unlike C Git we don't support reading the old
experimental 4/36 fanout. This is sufficient because C Git won't
create the 4/36 style fanout when creating or updating a notes tree,
and there really aren't any in the wild today.
Change-Id: I6099b35916a8404762f31e9c11f632e43e0c1bfd
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
The NoteMap makes it easy to read a small notes tree as created by
the `git notes` command in C Git. To make the initial implementation
simple a notes tree is read recursively into a map in memory.
This is reasonable if the application will need to access all notes,
or if there are less than 256 notes in the tree, but doesn't behave
well when the number of notes exceeds 256 and the application
doesn't need to access all of them.
We can later add support for lazily loading different subpaths,
thus fixing the large note tree problem described above.
Currently the implementation only supports reading. Writing notes
is more complex because trees need to be expanded or collapsed at
the exact 256 entry cut-off in order to retain the same tree SHA-1
that C Git would use for the same content. It also needs to retain
non-note tree entries such as ".gitignore" or ".gitattribute" files
that might randomly appear within a notes tree. We can also add
writing support later.
Change-Id: I93704bd84ebf650d51de34da3f1577ef0f7a9144
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>