github/jgit - jgit - 帆软第三方插件仓库

Commit Graph

Author	SHA1	Message	Date
Shawn O. Pearce	461b012e95	PackWriter: Support reuse of entire packs The most expensive part of packing a repository for transport to another system is enumerating all of the objects in the repository. Once this gets to the size of the linux-2.6 repository (1.8 million objects), enumeration can take several CPU minutes and costs a lot of temporary working set memory. Teach PackWriter to efficiently reuse an existing "cached pack" by answering a clone request with a thin pack followed by a larger cached pack appended to the end. This requires the repository owner to first construct the cached pack by hand, and record the tip commits inside of $GIT_DIR/objects/info/cached-packs: cd $GIT_DIR root=$(git rev-parse master) tmp=objects/.tmp-$$ names=$(echo $root \| git pack-objects --keep-true-parents --revs $tmp) for n in $names; do chmod a-w $tmp-$n.pack $tmp-$n.idx touch objects/pack/pack-$n.keep mv $tmp-$n.pack objects/pack/pack-$n.pack mv $tmp-$n.idx objects/pack/pack-$n.idx done (echo "+ $root"; for n in $names; do echo "P $n"; done; echo) >>objects/info/cached-packs git repack -a -d When a clone request needs to include $root, the corresponding cached pack will be copied as-is, rather than enumerating all of the objects that are reachable from $root. For a linux-2.6 kernel repository that should be about 376 MiB, the above process creates two packs of 368 MiB and 38 MiB[1]. This is a local disk usage increase of ~26 MiB, due to reduced delta compression between the large cached pack and the smaller recent activity pack. The overhead is similar to 1 full copy of the compressed project sources. With this cached pack in hand, JGit daemon completes a clone request in 1m17s less time, but a slightly larger data transfer (+2.39 MiB): Before: remote: Counting objects: 1861830, done remote: Finding sources: 100% (1861830/1861830) remote: Getting sizes: 100% (88243/88243) remote: Compressing objects: 100% (88184/88184) Receiving objects: 100% (1861830/1861830), 376.01 MiB \| 19.01 MiB/s, done. remote: Total 1861830 (delta 4706), reused 1851053 (delta 1553844) Resolving deltas: 100% (1564621/1564621), done. real 3m19.005s After: remote: Counting objects: 1601, done remote: Counting objects: 1828460, done remote: Finding sources: 100% (50475/50475) remote: Getting sizes: 100% (18843/18843) remote: Compressing objects: 100% (7585/7585) remote: Total 1861830 (delta 2407), reused 1856197 (delta 37510) Receiving objects: 100% (1861830/1861830), 378.40 MiB \| 31.31 MiB/s, done. Resolving deltas: 100% (1559477/1559477), done. real 2m2.938s Repository owners can periodically refresh their cached packs by repacking their repository, folding all newer objects into a larger cached pack. Since repacking is already considered to be a normal Git maintenance activity, this isn't a very big burden. [1] In this test $root was set back about two weeks. Change-Id: Ib87131d5c4b5e8c5cacb0f4fe16ff4ece554734b Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 years ago
Shawn O. Pearce	71f168fcd7	PackWriter: Display totals after sending objects CGit pack-objects displays a totals line after the pack data was fully written. This can be useful to understand some of the decisions made by the packer, and has been a great tool for helping to debug some of that code. Track some of the basic values, and send it to the client when packing is done: remote: Counting objects: 1826776, done remote: Finding sources: 100% (55121/55121) remote: Getting sizes: 100% (25654/25654) remote: Compressing objects: 100% (11434/11434) remote: Total 1861830 (delta 3926), reused 1854705 (delta 38306) Receiving objects: 100% (1861830/1861830), 386.03 MiB \| 30.32 MiB/s, done. Change-Id: If3b039017a984ed5d5ae80940ce32bda93652df5 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 years ago
Shawn O. Pearce	8f63dface2	PackWriter: Correct 'Compressing objects' progress message The first 'Compressing objects' progress message is wrong, its actually PackWriter looking up the sizes of each object in the ObjectDatabase, so objects can be sorted correctly in the later type-size sort that tries to take advantage of "Linus' Law" to improve delta compression. Rename the progress to say 'Getting sizes', which is an accurate description of what it is doing. Change-Id: Ida0a052ad2f6e994996189ca12959caab9e556a3 Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>	14 years ago
Mathias Kinzler	b15b9d5df2	Proper handling of rebase during pull After consulting with Christian Halstrick, it turned out that the handling of rebase during pull was implemented incorrectly. Change-Id: I40f03409e080cdfeceb21460150f5e02a016e7f4 Signed-off-by: Mathias Kinzler <mathias.kinzler@sap.com>	14 years ago
Christian Halstrick	c62882191f	Introduce metaData compare between working tree and index entries Instead of offering only a high-level isModified() method a new method compareMetadata() is introduced which compares a working tree entry and a index entry by looking at metadata only. Some use-cases (e.g. computing the content-id in idBuffer()) may use this new method instead of isModified(). Change-Id: I4de7501d159889fbac5ae6951f4fef8340461b47 Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>	14 years ago
Matthias Sohn	c45f2aec56	File utility for creating a new empty file The java.io.File.createNewFile() method for creating new empty files reports failure by returning false. To ease proper checking of return values provide a utility method wrapping createNewFile() throwing IOException on failure. Change-Id: I42a3dc9d8ff70af62e84de396e6a740050afa896 Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>	14 years ago
Sasa Zivkov	1993cf8a27	Merging Git notes Merging Git notes branches has several differences from merging "normal" branches. Although Git notes are initially stored as one flat tree the tree may fanout when the number of notes becomes too large for efficient access. In this case the first two hex digits of the note name will be used as a subdirectory name and the rest 38 hex digits as the file name under that directory. Similarly, when number of notes decreases a fanout tree may collapse back into a flat tree. The Git notes merge algorithm must take into account possibly different tree structures in different note branches and must properly match them against each other. Any conflict on a Git note is, by default, resolved by concatenating the two conflicting versions of the note. A delete-edit conflict is, by default, resolved by keeping the edit version. The note merge logic is pluggable and the caller may provide custom note merger that will perform different merging strategy. Additionally, it is possible to have non-note entries inside a notes tree. The merge algorithm must also take this fact into account and will try to merge such non-note entries. However, in case of any merge conflicts the merge operation will fail. Git notes merge algorithm is currently not trying to do content merge of non-note entries. Thanks to Shawn Pearce for patiently answering my questions related to this topic, giving hints and providing code snippets. Change-Id: I3b2335c76c766fd7ea25752e54087f9b19d69c88 Signed-off-by: Sasa Zivkov <sasa.zivkov@sap.com> Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>	14 years ago
Robin Rosenberg	b3e59bd9d6	Implement a revert command This is almost reverted cherry-pick, and the implementation is almost identical. It orders the input to merge differently to get the effect and produces a different commit message with the default author, rather than the original author. Change-Id: I39970091d9f7406ae7168b8efaab23a5e2c16bad Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com>	14 years ago
Robin Rosenberg	797ebba307	Add support for getting the system wide configuration These settings are stored in <prefix>/etc/gitconfig. The C Git binary is installed in <prefix>/bin, so we look for the C Git executable to find this location, first by looking at the PATH environment variable and then by attemting to launch bash as a login shell to find out. Bug: 333216 Change-Id: I1bbee9fb123a81714a34a9cc242b92beacfbb4a8 Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com>	14 years ago
Matthias Sohn	c6ca443b61	File utilities for creating directories The java.io.File methods for creating directories report failure by returning false. To ease proper checking of return values provide utility methods wrapping mkdir() and mkdirs() which throw IOException on failure. Also fix the tests to store test data under a trash folder and cleanup after test. Change-Id: I09c7f9909caf7e25feabda9d31e21ce154e7fcd5 Signed-off-by: Matthias Sohn <matthias.sohn@sap.com> Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>	14 years ago
Mathias Kinzler	6bca46e168	Implement rebase --continue and --skip For --continue, the Rebase command asserts that there are no unmerged paths in the current repository. Then it checks if a commit is needed. If yes, the commit message and author are taken from the author_script and message files, respectively, and a commit is performed before the next step is applied. For --skip, the workspace is reset to the current HEAD before applying the next step. Includes some tests and a refactoring that extracts Strings in the code into constants. Change-Id: I72d9968535727046e737ec20e23239fe79976179 Signed-off-by: Mathias Kinzler <mathias.kinzler@sap.com> Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>	14 years ago
Matthias Sohn	e22f9552a8	Provide file utilities for file deletion Provide file helper methods in a reusable utility class to replace many local implementations. java.io.File has some methods reporting failure by returning false. We prefer to throw IOException on failure so that callers can't forget checking the return value. Change-Id: I430c77b5d2cffcf8b47584326ad4817a7291845e Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>	14 years ago
Christian Halstrick	7e298c9ed5	Add more tests for rebase and externalized missing Strings Coverage tests showed that we are missing to test certain areas in the rebase command. Add the missing tests. Change-Id: Ia4a272d26cde7e1861dac30496e4b6799fc8187a Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>	14 years ago
Chris Aniszczyk	923443f94f	Add CheckoutCommand Add the ability to checkout a branch to the working tree. Bug: 330860 Change-Id: Ie06b9e799a9e1be384da0b8996efa7209b32eac3 Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>	14 years ago
Chris Aniszczyk	f7690cceef	Add RmCommand to Git API Bug: 330827 Change-Id: I0b74bb92254d0ee988139d25022d06d16ed89d58 Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>	14 years ago
Mathias Kinzler	e5b96a7848	Initial implementation of a Rebase command This is a first iteration to implement Rebase. At the moment, this does not implement --continue and --skip, so if the first conflict is found, the only option is to --abort the command. Bug: 328217 Change-Id: I24d60c0214e71e5572955f8261e10a42e9e95298 Signed-off-by: Mathias Kinzler <mathias.kinzler@sap.com> Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>	14 years ago
Shawn O. Pearce	308e074f65	Enable providing credentials for HTTP authentication This change is based on http://egit.eclipse.org/r/#change,1652 by David Green. The change adds the concept of a CredentialsProvider which can be registered for git transports and which is responsible to return credential-related data like passwords and usernames. Whenenver the transports detects that an authentication with certain credentials has to be done it will ask the CredentialsProvider for this data. Foreseen implementations for such a Provider may be a EGitCredentialsProvider (caching credential data entered e.g. in the Clone-Wizzard) or a NetRcProvider (gathering data out of ~/.netrc file). Bug: 296201 Change-Id: Ibe13e546b45eed3e193c09ecb414bbec2971d362 Signed-off-by: Matthias Sohn <matthias.sohn@sap.com> Signed-off-by: Christian Halstrick <christian.halstrick@sap.com> Signed-off-by: Stefan Lay <stefan.lay@sap.com> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> CC: David Green <dgreen99@gmail.com>	14 years ago
Mathias Kinzler	7668a46282	PullCommand: support upstream configuration for local branches When creating a local branch based on another local branch, the upstream configuration contains "." as origin and the source branch as "merge". The PullCommand should support this by skipping the fetch step altogether and use the base branch to merge with. Change-Id: I260a1771aeeffca5b0161d1494fd63c672ecc2a6 Signed-off-by: Mathias Kinzler <mathias.kinzler@sap.com>	14 years ago
Christian Halstrick	9b4876cedf	Add Cherry-Pick command Implemented the initial version of a cherry-pick command. A correct error handling is missing (what happens if the checkout fails, the cherry-pick leads to conflicts etc). But straightforward cherry-picks works. Change-Id: I235c0eb3a7a2d5bdfe40400f1deed06f29d746e1 Signed-off-by: Christian Halstrick <christian.halstrick@sap.com> Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>	14 years ago
Mathias Kinzler	7bdef4583b	Add "Branch" command The need for branching becomes more pressing with pull support: we need to make sure the upstream configuration entries are written correctly when creating and renaming branches (and of course are cleaned up when deleting them). This adds support for listing, adding, deleting and renaming branches including the more common options. Bug: 326938 Change-Id: I00bcc19476e835d6fd78fd188acde64946c1505c Signed-off-by: Mathias Kinzler <mathias.kinzler@sap.com> Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>	14 years ago
Mathias Kinzler	db55d13f5f	Add "Pull" command This is the minimal implementation of a "Pull" command. It does not have any parameters besides the generic progress monitor and timeout. It works on the currently checked-out branch and assumes that the configuration contains the keys "branch.<branch name>.remote" and "branch.<branch name>.merge" to determine the remote configuration for the fetch and the remote branch name for the merge. Bug: 303404 Change-Id: I7fe09029996d0cfc09a7d8f097b5d6af1488fa93 Signed-off-by: Mathias Kinzler <mathias.kinzler@sap.com> Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>	14 years ago
Matthias Sohn	784d388c49	Externalize strings in TransportHttp Some strings were not externalized. Also use them in HTTP tests to ensure that they will also succeed when message bundles are translated. Change-Id: Id02717176557e7d57e676e1339cd89f2be88d330 Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>	14 years ago
Chris Aniszczyk	98a41bd4d0	Add PushCommand API Change-Id: Iff144a51fdc9a1112a21492c390a873a2b293bc9 Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>	14 years ago
Shawn O. Pearce	7ba31474a3	Increase core.streamFileThreshold default to 50 MiB Projects like org.eclipse.mdt contain large XML files about 6 MiB in size. So does the Android project platform/frameworks/base. Doing a clone of either project with JGit takes forever to checkout the files into the working directory, because delta decompression tends to be very expensive as we need to constantly reposition the base stream for each copy instruction. This can be made worse by a very bad ordering of offsets, possibly due to an XML editor that doesn't preserve the order of elements in the file very well. Increasing the threshold to the same limit PackWriter uses when doing delta compression (50 MiB) permits a default configured JGit to decompress these XML file objects using the faster random-access arrays, rather than re-seeking through an inflate stream, significantly reducing checkout time after a clone. Since this new limit may be dangerously close to the JVM maximum heap size, every allocation attempt is now wrapped in a try/catch so that JGit can degrade by switching to the large object stream mode when the allocation is refused. It will run slower, but the operation will still complete. The large stream mode will run very well for big objects that aren't delta compressed, and is acceptable for delta compressed objects that are using only forward referencing copy instructions. Copies using prior offsets are still going to be horrible, and there is nothing we can do about it except increase core.streamFileThreshold. We might in the future want to consider changing the way the delta generators work in JGit and native C Git to avoid prior offsets once an object reaches a certain size, even if that causes the delta instruction stream to be slightly larger. Unfortunately native C Git won't want to do that until its also able to stream objects rather than malloc them as contiguous blocks. Change-Id: Ief7a3896afce15073e80d3691bed90c6a3897307 Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>	14 years ago
Shawn O. Pearce	b533a72934	Implement HistogramDiff HistogramDiff is an alternative implementation of patience diff, performing a search over all matching locations and picking the longest common subsequence that has the lowest occurrence count. If there are unique common elements, its behavior is identical to that of patience diff. Actual performance on real-world source files usually beats MyersDiff, sometimes by a factor of 3, especially for complex comparators that ignore whitespace. Change-Id: I1806cd708087e36d144fb824a0e5ab7cdd579d73 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 years ago
Matthias Sohn	048d7342df	Remove duplicate resource bundle entry Change-Id: Ifdf9fa5dd49bc3f4a0cc8a1ed505d77ec3fa526b Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>	14 years ago
Chris Aniszczyk	bbabc19e2f	Add FetchCommand Adds API for performing git fetch operations. Change-Id: Idd95664fd4e3bca03211e4ffda3e354849f92a35 Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>	14 years ago
Shawn O. Pearce	9f61c615e8	Support core.autocrlf = input The core.autocrlf variable can take on three values: false, true, and input. Parsing it as a boolean is wrong, we instead need to parse a tri-state enumeration. Add support for parsing and setting enum values from Java from and to the text based configuration file, and use that to handle the autocrlf variable. Bug: 301775 Change-Id: I81b9e33087a33d2ef2eac89ba93b9e83b7ecc223 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 years ago
Shawn O. Pearce	9239c10385	ReceivePack: Rethrow exceptions caught during indexing If we get an exception while indexing the incoming pack, its likely a stream corruption. We already report an error to the client, but we eat the stack trace, which makes debugging issues related to a bug inside of JGit nearly impossible. Rethrow it under a new type UnpackException, so embedding servers or applications can catch the error and provide it to a human who might be able to forward such traces onto a JGit developer for evaluation. Change-Id: Icad41148bbc0c76f284c7033a195a6b51911beab Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 years ago
Marc Strapetz	253b36d27a	Partial support for index file format "3". Extended flags are processed and available via DirCacheEntry's new isSkipWorkTree() and isIntentToAdd() methods. "resolve-undo" information is completely ignored since its an optional extension. Change-Id: Ie6e9c6784c9f265ca3c013c6dc0e6bd29d3b7233	14 years ago
Shawn O. Pearce	e6bd689d2c	Improve LargeObjectException reporting Use 3 different types of LargeObjectException for the 3 major ways that we can fail to load an object. For each of these use a unique string translation which describes the root cause better than just the ObjectId.name() does. Change-Id: I810c98d5691b74af9fc6cbd46fc9879e35a7bdca Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 years ago
Shawn O. Pearce	1709800f27	Undo translation of protocol string 'unpack error' This string is part of the network protocol, and isn't meant to be translated into another language. Clients actually scan for the string "unpack error " off the wire and react magically to this information. If it were translated, they would instead have a protocol exception, which isn't very useful when there is already an error occurring. Change-Id: Ia5dc8d36ba65ad2552f683bb637e80b77a7d92f0 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 years ago
Chris Aniszczyk	f54e883566	Add TagCommand A tag command is added to the Git porcelain API. Tests were also added to stress test the tag command. Change-Id: Iab282a918eb51b0e9c55f628a3396ff01c9eb9eb Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com> Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>	14 years ago
Christian Halstrick	2059ed205e	Implement a Dircache checkout (needed for merge) Implementation of a checkout (or 'git read-tree') operation which works together with DirCache. This implementation does similar things as WorkDirCheckout which main problem is that it works with deprecated GitIndex. Since GitIndex doesn't support multiple stages of a file which is required in merge situations this new implementation is required to enable merge support. Change-Id: I13f0f23ad60d98e5168118a7e7e7308e066ecf9c Signed-off-by: Christian Halstrick <christian.halstrick@sap.com> Signed-off-by: Matthias Sohn <matthias.sohn@sap.com> Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>	14 years ago
Shawn O. Pearce	c44495fa2f	Complete an abbreviation when formatting a patch If we are given a DiffEntry header that already has abbreviated ObjectIds on it, we may still be able to resolve those locally and output the difference. Try to do that through the new resolve API on ObjectReader. Change-Id: I0766aa5444b7b8fff73620290f8c9f54adc0be96 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 years ago
Shawn O. Pearce	edd8029558	Add setLength(long) to DirCacheEntry Applications should favor the long style interface, especially when their source input is a long type, e.g. coming from java.io.File. This way when the index format is later changed to support a larger file size than 2 GiB we can handle it by just changing the entry code, and not need to fix a lot of applications. Change-Id: I332563caeb110014e2d544dc33050ce67ae9e897 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 years ago
Shawn O. Pearce	b85af06324	Allow object reuse selection to occur in parallel ObjectReader implementations may wish to use multiple threads in order to evaluate object reuse faster. Let the reader make that decision by passing the iteration down into the reader. Because the work is pushed into the reader, it may need to locate a given ObjectToPack given its ObjectId. This can easily occur if the reader has sent a list of ObjectIds to the object database and gets back information keyed only by ObjectId, without the ObjectToPack handle. Expose lookup using the PackWriter's own internal map, so the reader doesn't need to build a redundant copy to track the assocation of ObjectId back to ObjectToPack. Change-Id: I0c536405a55034881fb5db92a2d2a99534faed34 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 years ago
Shawn O. Pearce	707912b35d	Make Tag class only for writing The Tag class now only supports the creation of an annotated tag object. To read an annotated tag, applictions should use RevTag. This permits us to have exactly one implementation, and RevTag's is faster and more bug-free. Change-Id: Ib573f7e15f36855112815269385c21dea532e2cf Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	14 years ago
Christian Halstrick	75c9b24385	Enhance MergeResult to report conflicts, etc The MergeResult class is enhanced to report more data about a three-way merge. Information about conflicts and the base, ours, theirs commits can be retrived. Change-Id: Iaaf41a1f4002b8fe3ddfa62dc73c787f363460c2 Signed-off-by: Christian Halstrick <christian.halstrick@sap.com> Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>	14 years ago
Mathias Kinzler	6e59e6dab9	Meaningful error message when trying to check-out submodules Currently, a NullPointerException occurs in this case. We should instead throw a more meaningful Exception with a proper message. This is a very "stupid" implementation which simply checks for the existence of a ".gitmodules" file. Bug: 300731 Bug: 306765 Bug: 308452 Bug: 314853 Change-Id: I155aa340a85cbc5d7d60da31dba199fc30689b67 Signed-off-by: Mathias Kinzler <mathias.kinzler@sap.com>	15 years ago
Jeff Schumacher	396fe6da45	Break dissimilar file pairs during diff File pairs that are very dissimilar during a diff were not being broken apart into their constituent ADD/DELETE pairs. The leads to sub-optimal rename detection. Take, for example, this situation: A file exists at src/a.txt containing "foo". A user renames src/a.txt to src/b.txt, then adds a new src/a.txt containing "bar". Even though the old a.txt and the new b.txt are identical, the rename detection algorithm would not detect it as a rename since it was already paired in a MODIFY. I added code to split all MODIFYs below a certain score into their constituent ADD/DELETE pairs. This allows situations like the one I described above to be more correctly handled. Change-Id: I22c04b70581f206bbc68c4cd1ee87a1f663b418e Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	15 years ago
Shawn O. Pearce	0b46e70155	Fix infinite loop in IndexPack A programming error using the Inflater API led to an infinite loop within IndexPack, caused by the Inflater returning 0 from the inflate() method, but it didn't want more input. This happens when it has reached the end of the stream, or has reached a spot asking for an external dictionary. Such a case is a failure for us, and we should abort out. Thanks to Alex for pointing out that we had 3 implementations of the inflate rountine, which should be consolidated into one and use a switch to determine where to load data from. Bug: 317416 Change-Id: I34120482375b687ea36ed9154002d77047e94b1f Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	15 years ago
Stefan Lay	233e0130b5	Git Porcelain API: Add Command The new Add command adds files to the Git Index. It uses the DirCache to access the git index. It works also in case of an existing conflict. Fileglobs (e.g. *.c) are not yet supported. The new Add command does add ignored files because there is no gitignore support in jgit yet. Bug: 318440 Change-Id: If16fdd4443e46b27361c2a18ed8f51668af5d9ff Signed-off-by: Stefan Lay <stefan.lay@sap.com>	15 years ago
Robin Rosenberg	d787a82e50	Internationalize RepositoryState descriptions Change-Id: I104cd62f3e89acf010b1d40a2b08e7f68f63bb85 Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com>	15 years ago
Shawn O. Pearce	8a0c58394d	log: Add whitespace ignore options Similar to what we did with diff, implement whitespace ignore options for log too. This requires us to define some means of creating any RawText object type at will inside of DiffFormatter, so we define a new factory interface to construct RawText instances on demand. Unfortunately we have to copy the entire block of common options. args4j only processes the options/arguments on the one command class and Java doesn't support multiple inheritance. Change-Id: Ia16cd3a11b850fffae9fbe7b721d7e43f1d0e8a5 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	15 years ago
Shawn O. Pearce	5be90be996	Redo DiffFormatter API to be easier to use Passing around the OutputStream and the Repository is crazy. Instead put the stream in the constructor, since this formatter exists only to output to the stream, and put the repository as a member variable that can be optionally set. Change-Id: I2bad012fee7f40dc1346700ebd19f1e048982878 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	15 years ago
Shawn O. Pearce	978535b090	Implement similarity based rename detection Content similarity based rename detection is performed only after a linear time detection is performed using exact content match on the ObjectIds. Any names which were paired up during that exact match phase are excluded from the inexact similarity based rename, which reduces the space that must be considered. During rename detection two entries cannot be marked as a rename if they are different types of files. This prevents a symlink from being renamed to a regular file, even if their blob content appears to be similar, or is identical. Efficiently comparing two files is performed by building up two hash indexes and hashing lines or short blocks from each file, counting the number of bytes that each line or block represents. Instead of using a standard java.util.HashMap, we use a custom open hashing scheme similiar to what we use in ObjecIdSubclassMap. This permits us to have a very light-weight hash, with very little memory overhead per cell stored. As we only need two ints per record in the map (line/block key and number of bytes), we collapse them into a single long inside of a long array, making very efficient use of available memory when we create the index table. We only need object headers for the index structure itself, and the index table, but not per-cell. This offers a massive space savings over using java.util.HashMap. The score calculation is done by approximating how many bytes are the same between the two inputs (which for a delta would be how much is copied from the base into the result). The score is derived by dividing the approximate number of bytes in common into the length of the larger of the two input files. Right now the SimilarityIndex table should average about 1/2 full, which means we waste about 50% of our memory on empty entries after we are done indexing a file and sort the table's contents. If memory becomes an issue we could discard the table and copy all records over to a new array that is properly sized. Building the index requires O(M + N log N) time, where M is the size of the input file in bytes, and N is the number of unique lines/blocks in the file. The N log N time constraint comes from the sort of the index table that is necessary to perform linear time matching against another SimilarityIndex created for a different file. To actually perform the rename detection, a SxD matrix is created, placing the sources (aka deletions) along one dimension and the destinations (aka additions) along the other. A simple O(S x D) loop examines every cell in this matrix. A SimilarityIndex is built along the row and reused for each column compare along that row, avoiding the costly index rebuild at the row level. A future improvement would be to load a smaller square matrix into SimilarityIndexes and process everything in that sub-matrix before discarding the column dimension and moving down to the next sub-matrix block along that same grid of rows. An optional ProgressMonitor is permitted to be passed in, allowing applications to see the progress of the detector as it works through the matrix cells. This provides some indication of current status for very long running renames. The default line/block hash function used by the SimilarityIndex may not be optimal, and may produce too many collisions. It is borrowed from RawText's hash, which is used to quickly skip out of a longer equality test if two lines have different hash functions. We may need to refine this hash in the future, in order to minimize the number of collisions we get on common source files. Based on a handful of test commits in JGit (especially my own recent rename repository refactoring series), this rename detector produces output that is very close to C Git. The content similarity scores are sometimes off by 1%, which is most probably caused by our SimilarityIndex type using a different hash function than C Git uses when it computes the delta size between any two objects in the rename matrix. Bug: 318504 Change-Id: I11dff969e8a2e4cf252636d857d2113053bdd9dc Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	15 years ago
Shawn O. Pearce	08d349a27b	amend commit: Refactor repository construction to builder class During code review, Alex raised a few comments about commit `532421d989` ("Refactor repository construction to builder class"). Due to the size of the related series we aren't going to go back and rebase in something this minor, so resolve them as a follow-up commit instead. Change-Id: Ied52f7a8f7252743353c58d20bfc3ec498933e00 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	15 years ago
Jeff Schumacher	cb8e1e6014	Added a preliminary version of rename detection JGit does not currently do rename detection during diffs. I added a class that, given a TreeWalk to iterate over, can output a list of DiffEntry's for that TreeWalk, taking into account renames. This class only detects renames by SHA1's. More complex rename detection, along the lines of what C Git does will be added later. Change-Id: I93606ce15da70df6660651ec322ea50718dd7c04	15 years ago
Shawn O. Pearce	71aace52f7	Simplify ObjectLoaders coming from PackFile We no longer need an ObjectLoader to be lazy and try to delay the materialization of the object content. That was done only to support PackWriter searching for a good reuse candidate. Instead, simplify the code base by doing the materialization immediately when the loader asks for it, because any caller asking for the loader is going to need the content. Change-Id: Id867b1004529744f234ab8f9cfab3d2c52ca3bd0 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	15 years ago

... 4 5 6 7 8

363 Commits (5e0eca69432c4a382ab6b0d08728f52b1e305a30)