Implementation delegates all work to the AddCommand class and,
therefore, supports only those options currently supported by the
AddCommand which means: --update and the filepattern... arguments.
Change-Id: I4827d37e08b4c988c2458d9ba60a61b6ad414d10
Signed-off-by: Sasa Zivkov <sasa.zivkov@sap.com>
The diff command in the pgm package was enhanced to allow
choosing the diff algorithm (currently myers or histogram)
Change-Id: I72083e78fb5c92868eb5d8ec512277d212a39349
Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>
3rd party packages that use repository types other than FileRepository
may wish to extend our pgm package and implement their own resolution
scheme for repository "names" that are passed in by the --git-dir
command line option. Make that possible by allowing the package to
extend the Main class and override the lookup.
This is primarily useful when developing new storage implementations
and trying to experiment with the results, without linking all of it
into the core JGit package.
Change-Id: Id30e168da16341e5da43365688a63aa30c7b7e2c
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Command names like MakeCacheTree weren't coming up with hyphens between
every word, so they read "debug-make-cachetree" rather than the
expected "debug-make-cache-tree". On each lowercase character reset
the lastWasDash flag so the next uppercase will insert a hyphen before
the next word.
Change-Id: I539fabb339e60896165619c307dec71e3317b0d8
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
HistogramDiff outperforms it for any case where PatienceDiff needs to
fallback to another algorithm. Consequently it's not worth keeping
around, because we would always want a fallback enabled.
Change-Id: I39b99cb1db4b3be74a764dd3d68cd4c9ecd91481
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
When working on a difference algorithm's implementation, its generally
more important to care about how it behaves on real-world inputs than
it does on fake inputs created for unit test cases. Run each
implementation against a number of real-world repositories, looking at
changes between files in each commit. This gives a better picture of
how a particular algorithm performs.
This test suite run against JGit and linux-2.6 with the current
available algorithms shows HistogramDiff always out-performs
MyersDiff, and by a wide margin on the linux-2.6 sources. As
HistogramDiff has similar output properties as PatienceDiff, the
resulting edits are probably also more human-readable. These test
results show that HistogramDiff is a good choice for the default
implementation, and also show that PatienceDiff isn't worth keeping.
jgit: start at baa83ae
2686 files, 760 commits
N= 3 min lines, 3016 max lines
Algorithm Time(ns) ( Time(ns) on Time(ns) on )
( N=3 N=3016 )
---------------------------------------------------------------------
histogram_myers 314652100 ( 3900 298100 )
histogram 315973000 ( 3800 302100 )
patience 774724900 ( 4500 347900 )
patience_histogram_myers 786332800 ( 3700 351200 )
myers 819359300 ( 4100 379100 )
patience_myers 843416700 ( 3800 348000 )
linux-2.6.git: start at 85a3318
4001 files, 2680 commits
N= 2 min lines, 39098 max lines
Algorithm Time(ns) ( Time(ns) on Time(ns) on )
( N=2 N=39098 )
---------------------------------------------------------------------
histogram_myers 1229870000 ( 5900 2642700 )
histogram 1235654100 ( 6000 2695400 )
patience 3856546000 ( 5900 2627700 )
patience_histogram_myers 3866728100 ( 7000 2624000 )
patience_myers 4004875300 ( 8000 2651700 )
myers 9794679000 ( 7200 2716200 )
Change-Id: I2502684d31f7851e720356820d04d8cf767f7229
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
This is the test suite I was using to help understand why we had
such a high collision rate with RawTextComparator, and to select
a replacement function.
Since its not something we will run very often, lets make it a
program in the debug package rather than a JUnit test. This way
we can run it on demand against any corpus of files we choose,
but we aren't bottlenecking our daily builds running tests with
no assertions.
Adding a new hash function to this suite is simple, just define
a new instance member of type "Hash" with the logic applied to
the region passed in.
Change-Id: Iec0b176adb464cf95b06cda157932b79c0b59886
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Allow our command line commands like Glog, Log to accept the
--all option to walk all known refs.
Change-Id: I6a0c84fc19e7fa80ddaa2315851c58ba89d43ca5
Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>
We used the wrong format method, which lead to this confusing output:
$ ./jgit clone git://...
Initialized empty Git repository in {0}
remote: Counting objects: 201783
...
remote: {0}
We need to use MessageFormat.format() as the message translations
use {0} syntax and not %s syntax for placeholders.
Change-Id: I8bf0fd3f7dbecf9edf47419c46aed0493d405f9e
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Instead of making the sequence itself responsible for the equivalence
function, use an external function that is supplied by the caller.
This cleans up the code because we now say cmp.equals(a, ai, b, bi)
instead of a.equals(ai, b, bi).
This refactoring also removes the odd concept of creating different
types of sequences to have different behaviors for whitespace
ignoring. Instead DiffComparator now supports singleton functions
that apply a particular equivalence algorithm to a type of sequence.
Change-Id: I559f494d81cdc6f06bfb4208f60780c0ae251df9
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
We weren't flushing the commit message before the diff output, which
meant the headers and message showed randomly interleaved with the
diff rather than immediately before.
Change-Id: I6cefab8d40e9d40c937e9deb12911188fec41b26
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Similar to C Git, default our difference when no trees are given
to us to something that makes a tiny bit of sense to the human.
We also now support the --cached flag, and have its meaning work the
same way as C Git.
Change-Id: I2f19dad4e018404e280ea3e95ebd448a4b667f59
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Applications just want a quick way to configure our diff
implementation, and then just want to use it without a lot of fuss.
Move all of the rename detection logic and path following logic
out of our pgm package and into DiffFormatter itself, making it
much easier for a GUI to take advantage of the features without
duplicating a lot of code.
Change-Id: I4b54e987bb6dc804fb270cbc495fe4cae26c7b0e
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Create a new 'org.eclipse.jgit.api.errors' package to contain
exceptions related to using the Git porcelain API.
Change-Id: Iac1781bd74fbd520dffac9d347616c3334994470
Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
This reverts commit db4c516f67 since
it breaks compatibility with Eclipse 3.5 which can no longer import
the projects
Bug: 323390
Change-Id: I3cc91364a6747cfcb4c611a9be5258f81562f726
Updates the project level settings to run the formatter
on save on only on the edited lines.
Change-Id: I26dd69d0c95e6d73f9fdf7031f3c1dbf3becbb79
Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
PersonIdent should be parsable for an invalid commit which
contains multiple authors, like "A <a@a.org>, B <b@b.org>".
PersonIdent(String) constructor now delegates to
RawParseUtils.parsePersonIdent().
Change-Id: Ie9798d36d9ecfcc0094ca795f5a44b003136eaf7
ObjectReader implementations are now responsible for creating the
unique abbreviation of an ObjectId, or for resolving an abbreviation
back to its full form. In this latter case the reader can offer up
multiple candidates to the caller, who may be able to disambiguate
them based on context.
Repository.resolve() doesn't take multiple candidates into account
right now, but it could in the future by looking for a remaining
^0 or ^{commit} suffix and take an expansion if there is only one
commit that matches the input abbreviation. It could also use
the distance from an annotated tag to resolve "tag-NNN-gcommit"
style strings that are often output by `git describe`.
Change-Id: Icd3250adc8177ae05278b858933afdca0cbbdb56
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
These objects should be responsible for their own formatting,
rather than delegating it to some obtuse type called ObjectInserter.
While we are at it, simplify the way we insert these into a database.
Passing in the type and calling format in application code turned
out to be a huge mistake in terms of ease-of-use of the insert API.
Change-Id: Id5bb95ee56aa2a002243e9b7853b84ec8df1d7bf
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Since these types no longer support reading, calling them a Builder
is a better description of what they do. They help the caller to
build a commit or a tag object.
Change-Id: I53cae5a800a66ea1721b0fe5e702599df31da05d
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
WorkingTreeIterator now optionally performs CRLF to LF conversion for
text files. A basic framework is left in place to support enabling
(or disabling) this feature based on gitattributes, and also to
support the more generic smudge/clean filter system. As there is
no gitattribute support yet in JGit this is left unimplemented,
but the mightNeedCleaning(), isBinary() and filterClean() methods
will provide reasonable places to plug that into in the future.
[sp: All bugs inside of WorkingTreeIterator are my fault, I wrote
most of it while cherry-picking this patch and building it on
top of Marc's original work.]
CQ: 4419
Bug: 301775
Change-Id: I0ca35cfbfe3f503729cbfc1d5034ad4abcd1097e
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
ObjectReader implementations may wish to use multiple threads in
order to evaluate object reuse faster. Let the reader make that
decision by passing the iteration down into the reader.
Because the work is pushed into the reader, it may need to locate a
given ObjectToPack given its ObjectId. This can easily occur if the
reader has sent a list of ObjectIds to the object database and gets
back information keyed only by ObjectId, without the ObjectToPack
handle. Expose lookup using the PackWriter's own internal map,
so the reader doesn't need to build a redundant copy to track the
assocation of ObjectId back to ObjectToPack.
Change-Id: I0c536405a55034881fb5db92a2d2a99534faed34
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
The Tag class now only supports the creation of an annotated tag
object. To read an annotated tag, applictions should use RevTag.
This permits us to have exactly one implementation, and RevTag's
is faster and more bug-free.
Change-Id: Ib573f7e15f36855112815269385c21dea532e2cf
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
The Commit class now only supports the creation of a commit object.
To read a commit, applictions should use RevCommit. This permits
us to have exactly one implementation, and RevCommit's is faster
and more bug-free.
Change-Id: Ib573f7e15f36855112815269385c21dea532e2cf
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
When RevObject overrode equals() to provide only reference equality
we used to need to convert a RevObject into an ObjectId by copy()
just to use standard Java tools like JUnit assertEquals(), or to
use contains() or get() on standard java.util collection types.
Now that we have removed this override and made ObjectId's equals()
final (preventing any of this mess in the future), some copy()
calls are unnecessary. Anytime the value is being used as an input
to a lookup routine, or to an equals, we can avoid the copy().
However we still want to use copy() anytime we are given an ObjectId
that may exist long-term, where we don't want the high cost of the
additional storage from a RevCommit extension. So we can't remove
all uses of copy(), just some of them.
Change-Id: Ief275dace435c0ddfa362ac8e5d93558bc7e9fc3
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
TreeWalk calls this value "path", while "name" is the stuff after the
last slash. FileHeader should do the same thing to be consistent.
Rename getOldName to getOldPath and getNewName to getNewPath.
Bug: 318526
Change-Id: Ib2e372ad4426402d37939b48d8f233154cc637da
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
The version builtin should be able to run without a git directory to call
it from wherever you want.
Change-Id: I1a3bce662e6788b860a275ee50315af8d5cc094a
Signed-off-by: Benjamin Muskalla <bmuskalla@eclipsesource.com>
When we are creating a pack the higher level application should be able
to override the PackConfig used, allowing it to control the number of
threads used or how much memory is allocated per writer.
Change-Id: I47795987bb0d161d3642082acc2f617d7cb28d8c
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
LockFile.commit fails if another thread concurrently reads
the base file. The problem is fixed by retrying the rename
operation if it fails.
Change-Id: I6bb76ea7f2e6e90e3ddc45f9dd4d69bd1b6fa1eb
Bug: 308506
Signed-off-by: Jens Baumgart <jens.baumgart@sap.com>
This is a horribly crude application, it doesn't even verify that
the object its dumping is delta encoded. Its method of getting the
delta is pretty abusive to the public PackWriter API, because right
now we don't want to expose the real internal low-level methods
actually required to do this.
Change-Id: I437a17ceb98708b5603a2061126eb251e82f4ed4
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
The FollowFilter can be installed on a RevWalk to cause the path
to be updated through rename detection when the affected file is
found to be added to the project.
The filter works reasonably well, for example we can follow the
history of the fsck command in git-core:
$ jgit log --name-status --follow builtin/fsck.c | grep ^R
R100 builtin-fsck.c builtin/fsck.c
R099 fsck.c builtin-fsck.c
R099 fsck-objects.c fsck.c
R099 fsck-cache.c fsck-objects.c
Change-Id: I4017bcfd150126aa342fdd423a688493ca660a1f
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Similar to what we did with diff, implement whitespace ignore options
for log too. This requires us to define some means of creating any
RawText object type at will inside of DiffFormatter, so we define a
new factory interface to construct RawText instances on demand.
Unfortunately we have to copy the entire block of common options.
args4j only processes the options/arguments on the one command class
and Java doesn't support multiple inheritance.
Change-Id: Ia16cd3a11b850fffae9fbe7b721d7e43f1d0e8a5
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Passing around the OutputStream and the Repository is crazy. Instead
put the stream in the constructor, since this formatter exists only to
output to the stream, and put the repository as a member variable that
can be optionally set.
Change-Id: I2bad012fee7f40dc1346700ebd19f1e048982878
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Implement rename detection in the command line diff and log commands.
Also support --name-status, -p and -U flags, as these can be quite
useful to view more detail.
All of the Git patch file formatting code is now moved over to the
DiffFormatter class. This permits us to reuse it in any context,
including inside of IDEs.
Change-Id: I687ccba34e18105a07e0a439d2181c323209d96c
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
During code review, Alex raised a few comments about commit
532421d989 ("Refactor repository construction to builder class").
Due to the size of the related series we aren't going to go back
and rebase in something this minor, so resolve them as a follow-up
commit instead.
Change-Id: Ied52f7a8f7252743353c58d20bfc3ec498933e00
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Instead of creating the DirCache from a static factory method, use
an instance method on Repository, permitting the implementation to
override the method with a completely different type of DirCache
reading and writing. This would better support a repository in the
cloud strategy, or even just an in-memory unit test environment.
Change-Id: I6399894b12d6480c4b3ac84d10775dfd1b8d13e7
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Added a check in Diff to ensure that files that are most likely
not text are not line-by-line diffed. Files are determined to be
binary by checking the first 8000 bytes for a null character. This
is a similar heuristic to what C Git uses.
Change-Id: I2b6f05674c88d89b3f549a5db483f850f7f46c26
Update a number of calling sites of RevWalk to ensure the walker's
internal ObjectReader is released after the walk is no longer used.
Because the ObjectReader is likely to hold onto a native resource
like an Inflater, we don't want to leak them outside of their
useful scope.
Where possible we also try to share ObjectReaders across several
walk pools, or between a walker and a PackWriter. This permits
the ObjectReader to actually do some caching if it felt inclined
to do so.
Not everything was updated, we'll probably need to come back and
update even more call sites, but these are some of the biggest
offenders. Test cases in particular aren't updated. My plan is to
move most storage-agnostic tests onto some purely in-memory storage
solution that doesn't do compression.
Change-Id: I04087ec79faeea208b19848939898ad7172b6672
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>