github/jgit - jgit - FanRuan third-party plugin repository

Commit Graph

Author	SHA1	Message	Date
Shawn O. Pearce	14da6e0b9d	debug-diff-algorithms: Real world performance test implementations When working on a difference algorithm's implementation, it's generally more important to care about how it behaves on real-world inputs than it does on fake inputs created for unit test cases. Run each implementation against a number of real-world repositories, looking at changes between files in each commit. This gives a better picture of how a particular algorithm performs. This test suite run against JGit and linux-2.6 with the current available algorithms shows HistogramDiff always out-performs MyersDiff, and by a wide margin on the linux-2.6 sources. As HistogramDiff has similar output properties as PatienceDiff, the resulting edits are probably also more human-readable. These test results show that HistogramDiff is a good choice for the default implementation, and also show that PatienceDiff isn't worth keeping. jgit: start at baa83ae 2686 files, 760 commits N= 3 min lines, 3016 max lines Algorithm Time(ns) ( Time(ns) on Time(ns) on ) ( N=3 N=3016 ) --------------------------------------------------------------------- histogram_myers 314652100 ( 3900 298100 ) histogram 315973000 ( 3800 302100 ) patience 774724900 ( 4500 347900 ) patience_histogram_myers 786332800 ( 3700 351200 ) myers 819359300 ( 4100 379100 ) patience_myers 843416700 ( 3800 348000 ) linux-2.6.git: start at 85a3318 4001 files, 2680 commits N= 2 min lines, 39098 max lines Algorithm Time(ns) ( Time(ns) on Time(ns) on ) ( N=2 N=39098 ) --------------------------------------------------------------------- histogram_myers 1229870000 ( 5900 2642700 ) histogram 1235654100 ( 6000 2695400 ) patience 3856546000 ( 5900 2627700 ) patience_histogram_myers 3866728100 ( 7000 2624000 ) patience_myers 4004875300 ( 8000 2651700 ) myers 9794679000 ( 7200 2716200 ) Change-Id: I2502684d31f7851e720356820d04d8cf767f7229 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 years ago
Chris Aniszczyk	7429a9a5aa	Merge "Define LowLevelDiffAlgorithm to bypass re-hashing"	14 years ago
Chris Aniszczyk	033ab7f6f0	Merge changes I50dcec81,Ieab28bb3 * changes: Fix empty block corner case in PatienceDiff Fix infinite loop in PatienceDiff	14 years ago
Shawn O. Pearce	4522b07d0f	Fix corrupted large deltas Large objects stored as deltas get unpacked by JGit into a loose object, so they are cheaper to access later on. This unpacking was broken because TeeInputStream copied the wrong length into the loose object, sometimes copying too many bytes into the result. This created a loose object that did not have the correct content, and whose length did not match the length denoted in the object header. Change-Id: I3ce1fd9f3dc5bd195249c7872b3bec49570424a2 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 years ago
Shawn O. Pearce	1bd24a23f9	Define LowLevelDiffAlgorithm to bypass re-hashing When passing to a fallback algorithm, we can avoid creating a new copy of the hash codes for each sequence by passing in the hashed sequences directly. This makes it cheaper to switch from HistogramDiff down to MyersDiff in a single pass. Change-Id: Ibf2e81be57c083862eeb134279aed676653bf9b5 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 years ago
Shawn O. Pearce	4fc50df97d	Fix empty block corner case in PatienceDiff There is a corner case where we get an EMPTY region during recursion, but we didn't expect to receive that. It's harmless to ignore the region since the region is empty and has no content, so do so rather than throwing an exception. Change-Id: I50dcec81ecba763072bb739adfab5879fb48b23a Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 years ago
Shawn O. Pearce	7a0c126d5f	Fix infinite loop in PatienceDiff Certain inputs caused an infinite loop because the prior match data couldn't be used as expected. Rather than incrementing the match pointer before looking at an element, do it after, so the loop breaks when we wrap around to the starting point. Change-Id: Ieab28bb3485a914eeddc68aa38c256f255dd778c Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 years ago
Chris Aniszczyk	782dbfc60f	Update Push to use latest API Change-Id: I57ea8634a46472f40046f4ec69de505abbf5f6cf Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>	14 years ago
Chris Aniszczyk	ae22630bd8	Merge "Cleanup RefUpdateTest"	14 years ago
Mathias Kinzler	7bdef4583b	Add "Branch" command The need for branching becomes more pressing with pull support: we need to make sure the upstream configuration entries are written correctly when creating and renaming branches (and of course are cleaned up when deleting them). This adds support for listing, adding, deleting and renaming branches including the more common options. Bug: 326938 Change-Id: I00bcc19476e835d6fd78fd188acde64946c1505c Signed-off-by: Mathias Kinzler <mathias.kinzler@sap.com> Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>	14 years ago
Shawn O. Pearce	1739af643e	Cleanup RefUpdateTest Application code, including unit tests for storage implementations, should not extend RevCommit outside of the scope of using it for a RevWalk. It's a lot of overhead and unlikely to work long-term. Instead for the one test that matters, use a custom subclass of the ObjectId type. This lets us measure exactly what we are looking for, which is that the subclass isn't retained. A lot of other tests were unnecessarily wrapping an object with a RevCommit and storing that back into the RefUpdate. This is just retesting what the earlier no-cache test was doing, and complicated the test considerably. Drop that code and just rely on the value that was configured by the helper method. Change-Id: I5b31813484eaa306e9bc4de9622dd5bd4846b16d Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 years ago
Christian Halstrick	be2ddff6a7	Add support for single-slash URI In bug 323571 it is mentioned that if you call 'toURI().toURL().toString()' on a java.io.File you cannot pass that string to jgit as an URIish. Problem is that the passed URI looks like 'file:/C:/a/b.txt' and that we were expecting double slashes after scheme':'. This fix adds support for these single-slash file URLs. Bug: 323571 Change-Id: I866a76a4fcd0c3b58e0d26a104fc4564e7ba5999 Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>	14 years ago
Chris Aniszczyk	467f3de0f5	Merge "Update build to use tycho 0.10.0"	14 years ago
Mathias Kinzler	db55d13f5f	Add "Pull" command This is the minimal implementation of a "Pull" command. It does not have any parameters besides the generic progress monitor and timeout. It works on the currently checked-out branch and assumes that the configuration contains the keys "branch.<branch name>.remote" and "branch.<branch name>.merge" to determine the remote configuration for the fetch and the remote branch name for the merge. Bug: 303404 Change-Id: I7fe09029996d0cfc09a7d8f097b5d6af1488fa93 Signed-off-by: Mathias Kinzler <mathias.kinzler@sap.com> Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>	14 years ago
Christian Halstrick	2160c09dd4	Refactored URI parsing to detect wrong URIs There were quite a few bugs regarding wrong URI parsing. In order to solve them the parsing has to be refactored. We now have specialized regexps for 'scheme://host/...', scp URIs and local file names. Now we can detect problems while parsing 'git://host:/abc' which was previously not possible. Bug: 315571 Bug: 292897 Bug: 307017 Bug: 323571 Bug: 317388 Change-Id: If72576576ebb6b9d9dc8b7e51ddd87c9909e8b62 Signed-off-by: Christian Halstrick <christian.halstrick@sap.com> Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>	14 years ago
Christian Halstrick	2136095203	Fixed URI regexp regarding user/password part The regular expression which should handle the user/password part in an URI was potentially processing too many chars. This led to problems when user/pwd and port was specified Change-Id: I87db02494c4b367283e1d00437b1c06d2c8fdd28 Signed-off-by: Christian Halstrick <christian.halstrick@sap.com> Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>	14 years ago
Matthias Sohn	dac4386a6d	Update build to use tycho 0.10.0 Change-Id: Ib3328379841fa79641fe1cd70cd87ee057eefb1a Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>	14 years ago
Christian Halstrick	cee08c3027	Fix URIish tests to contain a hostname for git protocol URIs for the git protocol have to have a hostname. (see http://www.kernel.org/pub/software/scm/git/docs/git-clone.html#_git_urls_a_id_urls_a) Some tests tested URIs like git:/abc.git which is not allowed. Fixed this. Change-Id: Ia3b8b681ad6592f03b090a874a6e91068a8301fe Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>	14 years ago
Christian Halstrick	a1b0ca1807	Introduce commented constants for the segments of an URI regex The regular expressions used to parse URIs are constructed by concatenating different segments to a big String. Introduce String constants for these segments and document them. Change-Id: If8b9dbaaf57ca333ac0b6c9610c3d3a515c540f9 Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>	14 years ago
Matthias Sohn	784d388c49	Externalize strings in TransportHttp Some strings were not externalized. Also use them in HTTP tests to ensure that they will also succeed when message bundles are translated. Change-Id: Id02717176557e7d57e676e1339cd89f2be88d330 Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>	14 years ago
Matthias Sohn	2b98a878b4	Fix HTTP tests Since `858b2c92` we have a HTTP authentication implementation hence we now get different exception messages when required authentication headers are not available. This broke the HTTP tests. Change-Id: Ie08c1ec37e497c2a6f70a75f7c59f0805812a5cc Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>	14 years ago
Chris Aniszczyk	7a6efe1dfc	Merge "Support HTTP basic and digest authentication"	14 years ago
Christian Halstrick	0a2b4c1455	Split URI regex strings differently The strings used to construct the regex to parse URIs are split differently. This makes it easier to introduce meaningful String constants later on. Change-Id: I9355fd42e57e0983204465c5d6fe5b6b93655074 Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>	14 years ago
Chris Aniszczyk	47e9e165b8	Add pull operation related constants Change-Id: Idb7526800e80e17624ec05fb10bbc19e7f744f49 Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>	14 years ago
Chris Aniszczyk	98a41bd4d0	Add PushCommand API Change-Id: Iff144a51fdc9a1112a21492c390a873a2b293bc9 Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>	14 years ago
Shawn O. Pearce	7ba31474a3	Increase core.streamFileThreshold default to 50 MiB Projects like org.eclipse.mdt contain large XML files about 6 MiB in size. So does the Android project platform/frameworks/base. Doing a clone of either project with JGit takes forever to check out the files into the working directory, because delta decompression tends to be very expensive as we need to constantly reposition the base stream for each copy instruction. This can be made worse by a very bad ordering of offsets, possibly due to an XML editor that doesn't preserve the order of elements in the file very well. Increasing the threshold to the same limit PackWriter uses when doing delta compression (50 MiB) permits a default configured JGit to decompress these XML file objects using the faster random-access arrays, rather than re-seeking through an inflate stream, significantly reducing checkout time after a clone. Since this new limit may be dangerously close to the JVM maximum heap size, every allocation attempt is now wrapped in a try/catch so that JGit can degrade by switching to the large object stream mode when the allocation is refused. It will run slower, but the operation will still complete. The large stream mode will run very well for big objects that aren't delta compressed, and is acceptable for delta compressed objects that are using only forward referencing copy instructions. Copies using prior offsets are still going to be horrible, and there is nothing we can do about it except increase core.streamFileThreshold. We might in the future want to consider changing the way the delta generators work in JGit and native C Git to avoid prior offsets once an object reaches a certain size, even if that causes the delta instruction stream to be slightly larger. Unfortunately native C Git won't want to do that until it's also able to stream objects rather than malloc them as contiguous blocks. Change-Id: Ief7a3896afce15073e80d3691bed90c6a3897307 Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>	14 years ago
Chris Aniszczyk	b0bfa8044a	Merge "Update Fetch to use FetchCommand API"	14 years ago
Chris Aniszczyk	44b4f458a8	Merge "Add reflog message to TagCommand"	14 years ago
Robin Rosenberg	afedfc2530	Comment the use of System.gc in LocalDiskRepositoryTestCase Change-Id: Ic5e9bda4275006ef3bf6ea6255ddf1c0eecc3770 Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com>	14 years ago
Robin Rosenberg	96f45e35f3	Shut up findbugs/protect the shutdownHook in LocalDiskRepositoryTestcase Singleton references should be protected from multiple threads. As far as we know this cannot happen as JUnit is used today since we currently don't run tests in parallel, but now this code will not prevent anyone. Change-Id: I29109344d2e8025fa2a3ccaf7c2c16469544ce05 Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com>	14 years ago
Shawn O. Pearce	858b2c92e8	Support HTTP basic and digest authentication Natively support the HTTP basic and digest authentication methods by setting the Authorization header without going through the JRE's java.net.Authenticator API. The Authenticator API is difficult to work with in a multi-threaded server environment, where it's using a singleton for the entire JVM. Instead compute the Authorization header from the URIish user and pass, if available. Change-Id: Ibf83fea57cfb17964020d6aeb3363982be944f87 Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>	14 years ago
Chris Aniszczyk	e5c217bcf3	Merge "Use only a single instance for NLS translation bundles"	14 years ago
Chris Aniszczyk	6b6c8dd01b	Update Fetch to use FetchCommand API Change-Id: I06ddc74f1ef658f4876e2bbcc3eaad3475a5371e Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>	14 years ago
Chris Aniszczyk	153c796bce	Merge "Update FetchCommand with dry run and thin options"	14 years ago
Robin Rosenberg	65ed25b34e	Return the documented value from DirCacheCheckout.checkout Change-Id: I34d773b18e6a1ee05774d7b9471f9915c48aa63e Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com>	14 years ago
Christian Halstrick	82d75f31d4	Merge "Extend merge support for bare repositories"	14 years ago
Robin Rosenberg	be9d096986	Use only a single instance for NLS translation bundles As findbugs pointed out, there was a small risk for creating multiple instances of translation bundles. If that happens, drop the second instance. Change-Id: I3aacda86251d511f6bbc2ed7481d561449ce3b6c Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com>	14 years ago
Shawn O. Pearce	b533a72934	Implement HistogramDiff HistogramDiff is an alternative implementation of patience diff, performing a search over all matching locations and picking the longest common subsequence that has the lowest occurrence count. If there are unique common elements, its behavior is identical to that of patience diff. Actual performance on real-world source files usually beats MyersDiff, sometimes by a factor of 3, especially for complex comparators that ignore whitespace. Change-Id: I1806cd708087e36d144fb824a0e5ab7cdd579d73 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 years ago
Shawn O. Pearce	e7a3e590ed	Reuse DiffPerformanceTest support code to validate algorithms Each algorithm should produce a particular number of results given one of the standard inputs used during the performance tests. To help ensure those tests are accurate, assert that the edit list length is correct. Change-Id: I292f8fde0cec6a60a75ce09e70814a00ca47cb99 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 years ago
Shawn O. Pearce	9bcf391355	Micro-optimize EditList.addAll Pass through the addAll request to our underlying ArrayList. This way the underlying ArrayList grows no more than once during the call, which may be important if the list was originally allocated at the default size of 16, but 64 Edits are being added. Change-Id: I31c3261e895766f82c3c832b251a09f6e37e8860 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 years ago
Chris Aniszczyk	39734f2908	Update FetchCommand with dry run and thin options FetchCommand was missing the ability to set dry run and thin preferences on the transport operation. Change-Id: I0bef388a9b8f2e3a01ecc9e7782aaed7f9ac82ce Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>	14 years ago
Shawn O. Pearce	af3fbb13f6	debug-text-hashfunctions: Test suite for content hashes This is the test suite I was using to help understand why we had such a high collision rate with RawTextComparator, and to select a replacement function. Since it's not something we will run very often, let's make it a program in the debug package rather than a JUnit test. This way we can run it on demand against any corpus of files we choose, but we aren't bottlenecking our daily builds running tests with no assertions. Adding a new hash function to this suite is simple: just define a new instance member of type "Hash" with the logic applied to the region passed in. Change-Id: Iec0b176adb464cf95b06cda157932b79c0b59886 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 years ago
Dmitry Fink	906887a735	Extend merge support for bare repositories Optional inCore parameter to Resolver/Strategy will instruct it to perform all the operations in memory and avoid modifying working folder even if there is one. Change-Id: I5b873dead3682f79110f58d7806e43f50bcc5045	14 years ago
Shawn O. Pearce	11f99fecfd	Reduce content hash function collisions The hash code returned by RawTextComparator (or that is used by the SimilarityIndex) plays an important role in the speed of any algorithm that is based upon them. The lower the number of collisions produced by the hash function, the shorter the hash chains within hash tables will be, and the less likely we are to fall into O(N^2) runtime behaviors for algorithms like PatienceDiff. Our prior hash function was absolutely horrid, so replace it with the proper definition of the DJB hash that was originally published by Professor Daniel J. Bernstein. To support this assertion, below is a table listing the maximum number of collisions that result when hashing the unique lines in each source code file of 3 randomly chosen projects: test_jgit: 931 files; 122 avg. unique lines/file Algorithm | Collisions -------------+----------- prior_hash 418 djb 5 sha1 6 string_hash31 11 test_linux26: 30198 files; 258 avg. unique lines/file Algorithm | Collisions -------------+----------- prior_hash 8675 djb 32 sha1 8 string_hash31 32 test_frameworks_base: 8381 files; 184 avg. unique lines/file Algorithm | Collisions -------------+----------- prior_hash 4615 djb 10 sha1 6 string_hash31 13 We can clearly see that prior_hash performed very poorly, resulting in 8,675 collisions (elements in the same hash bucket) for at least one file in the Linux kernel repository. This leads to some very bad O(N) style insertion and lookup performance, even though the hash table was sized to be the next power-of-2 larger than the total number of unique lines in the file. The djb hash we are replacing prior_hash with performs closer to SHA-1 in terms of having very few collisions. This indicates it provides a reasonably distributed output for this type of input, despite being a much simpler algorithm (and therefore will be much faster to execute). The string_hash31 function is provided just to compare results with; it is the algorithm commonly used by java.lang.String hashCode(). However, life isn't quite this simple. djb produces a 32 bit hash code, but our hash tables are always smaller than 2^32 buckets. Mashing the 32 bit code into an array index used to be done by simply taking the lower bits of the hash code by a bitwise AND operator. This unfortunately still produces many collisions, e.g. 32 on the linux-2.6 repository files. From [1] we can apply a final "cleanup" step to the hash code to mix the bits together a little better, and give priority to the higher order bits as they include data from more bytes of input: test_jgit: 931 files; 122 avg. unique lines/file Algorithm | Collisions -------------+----------- prior_hash 418 djb 5 djb + cleanup 6 test_linux26: 30198 files; 258 avg. unique lines/file Algorithm | Collisions -------------+----------- prior_hash 8675 djb 32 djb + cleanup 7 test_frameworks_base: 8381 files; 184 avg. unique lines/file Algorithm | Collisions -------------+----------- prior_hash 4615 djb 10 djb + cleanup 7 This is a massive improvement, as the number of collisions for common inputs drops to acceptable levels, and we haven't really made the hash functions any more complex than they were before. [1] http://lkml.org/lkml/2009/10/27/404 Change-Id: Ia753b695de9526a157ddba265824240bd05dead1 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 years ago
Shawn O. Pearce	4447d76a41	Fix PatienceDiffTest Because PatienceDiff works by looking for common unique lines within the region, the DiffTestDataGenerator needs to be modified to produce a unique character for each region. If we don't give PatienceDiff a few unique points, it will just offer back a single REPLACE edit that covers the entire files, and this doesn't tell us very much. Change-Id: I5129faea1e763c74739118ca20d86bd62e0deaef Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 years ago
Chris Aniszczyk	fcc3349cfc	Add reflog message to TagCommand Ensure we update the reflog when tagging. Change-Id: I3f4a4d68cbfc62d2276e3a47e3e3720f02cb2522 Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>	14 years ago
Shawn O. Pearce	b60eefb247	Define an abstract DiffAlgorithm test framework For certain tiny input sequences, every DiffAlgorithm should produce exactly the same results, as there should be no ambiguity. Package these up in an abstract TestCase that algorithms can extend from in order to perform basic validation of their implementation. Since these tests are more complete than what we used to have for the MyersDiff algorithm, throw away Johannes' tests and only use this new package. Change-Id: I9a044330887c849ad4c78aa5c7aa04c783c10252 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 years ago
Shawn O. Pearce	857d68d173	Perform common start/end elimination by default for DiffAlgorithm As it turns out, every single diff algorithm we might try to implement can benefit from using the SequenceComparator's native concept of the simple reduceCommonStartEnd() step. For most inputs, there can be a significant number of elements that can be removed from the space the DiffAlgorithm needs to consider, which will reduce the overall running time for the final solution. Pool this logic inside of DiffAlgorithm itself as a default, but permit a specific algorithm to override it when necessary. Convert MyersDiff to use this reduction to reduce the space it needs to search, making it perform slightly better on common inputs. Change-Id: I14004d771117e4a4ab2a02cace8deaeda9814bc1 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 years ago
Shawn O. Pearce	e84d826eb6	Remove unnecessary hash cache from PatienceDiffIndex PatienceDiff always uses a HashedSequence, which promises to provide constant time access for hash codes during the equals method and aborts fast if the hash codes don't match. Therefore we don't need to cache the hash codes inside of the index, saving us memory. Change-Id: I80bf1e95094b7670e6c0acc26546364a1012d60e Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 years ago
Shawn O. Pearce	a67afbfee1	Implement Bram Cohen's Patience Diff Change-Id: Ic7a76df2861ea6c569ab9756a62018987912bd13 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 years ago
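
The hashing change described in commit 11f99fecfd (DJB hash plus a final "cleanup" step before indexing into a power-of-2 table) can be sketched roughly as below. This is a minimal illustration, not JGit's actual code: the class and method names are hypothetical, and the multiplier 0x9e370001 is an assumption chosen for illustration (a common multiplicative-hash constant), not a value taken from the commit message. The point is only that taking the top bits of a multiplied hash lets the higher-order bits influence the bucket, whereas a plain bitwise AND discards them.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; names and the mixing constant are assumptions.
final class DjbHashSketch {
    /** Classic DJB hash over raw[ptr..end): h = h * 33 + byte, seeded with 5381. */
    static int djb(byte[] raw, int ptr, int end) {
        int hash = 5381;
        for (; ptr < end; ptr++)
            hash = ((hash << 5) + hash) + (raw[ptr] & 0xff);
        return hash;
    }

    /** Naive bucket selection: keep only the low tableBits bits (1..31). */
    static int lowBitsIndex(int hash, int tableBits) {
        return hash & ((1 << tableBits) - 1);
    }

    /**
     * "Cleanup" bucket selection: multiply by an odd constant so every input
     * bit spreads upward, then keep the top tableBits bits (1..31).
     */
    static int mixedIndex(int hash, int tableBits) {
        return (hash * 0x9e370001) >>> (32 - tableBits);
    }

    public static void main(String[] args) {
        byte[] line = "int x = 0;".getBytes(StandardCharsets.UTF_8);
        int h = djb(line, 0, line.length);
        System.out.println("low bits bucket: " + lowBitsIndex(h, 8));
        System.out.println("mixed bucket:    " + mixedIndex(h, 8));
    }
}
```

Both index functions return a value in the range 0 to 2^tableBits - 1, so either can address a table of that size; only the distribution of lines across buckets differs.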

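The common start/end elimination described in commit 857d68d173 can be illustrated with a short sketch. It assumes plain java.util.List inputs and a hypothetical trim helper rather than JGit's SequenceComparator.reduceCommonStartEnd(), whose exact signature is not shown in the log above. The idea is simply to skip the identical prefix and suffix so the diff algorithm only has to search the middle region.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; not the JGit implementation.
final class CommonEdgesSketch {
    /**
     * Returns {beginA, endA, beginB, endB}: the half-open region of each list
     * that remains after dropping the shared prefix and shared suffix.
     */
    static <T> int[] trim(List<T> a, List<T> b) {
        int begin = 0;
        int endA = a.size();
        int endB = b.size();

        // Advance past elements that match at the start of both lists.
        while (begin < endA && begin < endB && a.get(begin).equals(b.get(begin)))
            begin++;

        // Retreat past elements that match at the end, never crossing 'begin'.
        while (begin < endA && begin < endB
                && a.get(endA - 1).equals(b.get(endB - 1))) {
            endA--;
            endB--;
        }
        return new int[] { begin, endA, begin, endB };
    }

    public static void main(String[] args) {
        List<String> a = Arrays.asList("a", "b", "c", "d");
        List<String> b = Arrays.asList("a", "x", "d");
        System.out.println(Arrays.toString(trim(a, b)));
    }
}
```

For the two lists in main, only "b", "c" on one side and "x" on the other are left for the diff algorithm to compare.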

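Commit 858b2c92e8 mentions computing the Authorization header directly from the user and password instead of going through java.net.Authenticator. For the basic scheme, that computation can be sketched as follows. The class and method names are hypothetical, and this covers only basic authentication (digest needs a server challenge and is not shown); it is an illustration of the standard header format, not JGit's code.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical sketch of building an HTTP Basic Authorization header value.
final class BasicAuthSketch {
    /** Encodes "user:pass" as Base64 and prefixes the scheme name. */
    static String basicAuthorization(String user, String pass) {
        String credentials = user + ":" + pass;
        byte[] raw = credentials.getBytes(StandardCharsets.UTF_8);
        return "Basic " + Base64.getEncoder().encodeToString(raw);
    }

    public static void main(String[] args) {
        // prints "Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ=="
        System.out.println(basicAuthorization("Aladdin", "open sesame"));
    }
}
```

Because the value is computed per request from the credentials at hand, no JVM-wide singleton is involved, which is the multi-threading concern the commit message raises.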
767 Commits (14da6e0b9d19ec7e0cb9b3427044e178f751a59c)