github/jgit - jgit - FanRuan Third-Party Plugin Repository

Commit Graph

Author	SHA1	Message	Date
Shawn O. Pearce	6e155d5f41	Merge branch 'js/rename' * js/rename: Implemented file path based tie breaking to exact rename detection Added more test cases for RenameDetector Added very small optimization to exact rename detection Fixed Misleading Javadoc Added file path similarity to scoring metric in rename detection Fixed potential div by zero bug Added file size based rename detection optimization Create FileHeader from DiffEntry log: Implement --follow Cache the diff configuration section log: Add whitespace ignore options Format submodule links during differences Redo DiffFormatter API to be easier to use log, diff: Add rename detection support Implement similarity based rename detection Added a preliminary version of rename detection Refactored code out of FileHeader to facilitate rename detection	15 years ago
Shawn O. Pearce	0b46e70155	Fix infinite loop in IndexPack A programming error using the Inflater API led to an infinite loop within IndexPack, caused by the Inflater returning 0 from the inflate() method, but it didn't want more input. This happens when it has reached the end of the stream, or has reached a spot asking for an external dictionary. Such a case is a failure for us, and we should abort out. Thanks to Alex for pointing out that we had 3 implementations of the inflate routine, which should be consolidated into one and use a switch to determine where to load data from. Bug: 317416 Change-Id: I34120482375b687ea36ed9154002d77047e94b1f Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	15 years ago
Jeff Schumacher	31311cacfd	Implemented file path based tie breaking to exact rename detection During the exact rename detection phase in RenameDetector, ties were resolved on a first-found basis. I added support for file path based tie breaking during that phase. Basically, there are four situations that have to be handled: One add matching one delete: In this simple case, we pair them as a rename. One add matching many deletes: Find the delete whose path matches the add the closest, and pair them as a rename. Many adds matching one delete: Similar to the above case, we find the add that matches the delete the closest, and pair them as a rename. The other adds are marked as copies of the delete. Many adds matching many deletes: Build a scoring matrix similar to the one used for content-based matching, scoring instead by file path. Some of the utility functions in SimilarityRenameDetector are used in this case, as we use the same encoding scheme. Once the matrix is built, scan it for the best matches, marking them as renames. The rest are marked as copies. I don't particularly like the idea of using utility functions right out of SimilarityRenameDetector, but it works for the moment. A later commit will likely refactor this into a common utility class, as well as bringing exact rename detection out of RenameDetector and into a separate class, much like SimilarityRenameDetector. Change-Id: I1fb08390aebdcbf20d049aecf402a36506e55611	15 years ago
Christian Halstrick	b840ed0121	Added dirty-detection to WorkingTreeIterator Added possibility to compare the current entry of a WorkingTreeIterator to a given DirCacheEntry. This is done to detect whether an entry in the index is dirty or not. 'Dirty' means that the file in the working tree is different from what's in the index. Merge algorithms will make use of this to detect conflicts. Change-Id: I3ff847f4bf392553dcbd6ee236c6ca32a13eedeb Signed-off-by: Christian Halstrick <christian.halstrick@sap.com> Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>	15 years ago
Shawn Pearce	ff59ce4bff	Merge "Remove an unused File reference in test code"	15 years ago
Robin Rosenberg	9d589c88f7	Remove an unused File reference in test code Change-Id: Ib0d6c36811df719a53c66e9fa7460b89b2faf98b Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com>	15 years ago
Shawn Pearce	19473b1dbc	Merge "Handle the tilde notation (~user) of git url"	15 years ago
Robin Rosenberg	845714158a	Handle the tilde notation (~user) of git url When the path is prefixed with ~ the URI parser thought about this as /~. Strip the / if the next character is the tilde. Bug: 307017 Change-Id: I58203e5617956b46d83e8987d1f8042beddffac3 Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com>	15 years ago
Stefan Lay	233e0130b5	Git Porcelain API: Add Command The new Add command adds files to the Git Index. It uses the DirCache to access the git index. It works also in case of an existing conflict. Fileglobs (e.g. *.c) are not yet supported. The new Add command does add ignored files because there is no gitignore support in jgit yet. Bug: 318440 Change-Id: If16fdd4443e46b27361c2a18ed8f51668af5d9ff Signed-off-by: Stefan Lay <stefan.lay@sap.com>	15 years ago
Shawn Pearce	0ef99921fa	Merge changes I104cd62f,I1d0238b4 * changes: Internationalize RepositoryState descriptions Say that commit is allowed during bisect	15 years ago
Christian Halstrick	33160cd2da	Fix ReadTreeTest After refactoring ReadTreeTest the tests failed for filesystems with coarse modification time granularity. This is fixed by explicitly telling the repo to reread the index after we build a new index. Additionally the test testDirectoryFileSimple was simplified by using buildTree() instead of misusing GitIndex to construct trees. Change-Id: I20d2f097491e4cc8c657a696beabc7026b485017 Signed-off-by: Christian Halstrick <christian.halstrick@sap.com> Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>	15 years ago
Charley Wang	b878cdcf6b	Add compatibility with gitignore specifications This patch adds ignore compatibility to jgit. It encompasses exclude files as well as .gitignore. Uses TreeWalk and FileTreeIterator to find nodes and parses .gitignore files when required. The patch includes a simple cache that can be used to save results and avoid excessive gitignore parsing. CQ: 4302 Bug: 303925 Change-Id: Iebd7e5bb534accca4bf00d25bbc1f561d7cad11b Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com> Signed-off-by: Stefan Lay <stefan.lay@sap.com> Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>	15 years ago
Jeff Schumacher	f666cc755b	Added more test cases for RenameDetector I added test cases to cover the majority of the code. It's not 100% coverage yet, but the remaining bits are small. Change-Id: Ib534c8e94b13358b8b22cf54e2ff84132bae6d14	15 years ago
Jeff Schumacher	bc08fafb41	Added very small optimization to exact rename detection Optimized a small loop in findExactRenames. The loop would go through all the items in a list of DiffEntries even after it already found what it was looking for. I made it break out of the loop as soon as a good match was found. Change-Id: I28741e0c49ce52d8008930a87cd1db7037700a61	15 years ago
Jeff Schumacher	a20e6f6fec	Fixed Misleading Javadoc The javadoc for the setRenameLimit method in RenameDetector said that you could only have limits in the range (0,100), implying that 0 and 100 were illegal inputs. The code, however, allowed 0 and 100. I changed the javadoc to say that the range [0,100] was legal. I also documented the IllegalArgumentException that is thrown if the limit is outside that range. Change-Id: I916838f254859f6f0e1516bb55b8e7dc87e57dc2	15 years ago
Jeff Schumacher	9a48de86d8	Added file path similarity to scoring metric in rename detection The scoring method was not taking into account the similarity of the file paths and file names. I changed the metric so that it is 99% based on content (which used to be 100% of the old metric), and 1% based on path similarity. Of that 1%, half (.5% of the total final score) is based on the actual file names (e.g. "foo.java"), and half on the directory (e.g. "src/com/foo/bar/"). Change-Id: I94f0c23bf6413c491b10d5625f6ad7d2ecfb4def	15 years ago
Jeff Schumacher	4c14b7869d	Fixed potential div by zero bug The scoring logic in SimilarityIndex was dividing by the max file size. If both files are empty, this would cause a div by zero error. This case cannot currently happen, since two empty files would have the same SHA1, and would therefore be caught in the earlier SHA1 based detection pass. Still, if this logic eventually gets separated from that pass, a div by zero error would occur. I changed the logic to instead consider two empty files to have a similarity score of 100. Change-Id: Ic08e18a066b8fef25bb5e7c62418106a8cee762a	15 years ago
Jeff Schumacher	64b9458640	Added file size based rename detection optimization Prior to this change, files that were very different in size (enough so that they could not have enough in common to be detected as renames) were still having their scores calculated. I added an optimization to skip such files. For example, if the rename detection threshold is 60%, the larger file is 200kb, and the smaller file is 50kb, the pair cannot be counted as a rename since they cannot possibly share 60% of their content in common. (200*.6=120, 120>50) Change-Id: Icd8315412d5de6292839778e7cea7fe6f061b0fc	15 years ago
Robin Rosenberg	d787a82e50	Internationalize RepositoryState descriptions Change-Id: I104cd62f3e89acf010b1d40a2b08e7f68f63bb85 Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com>	15 years ago
Robin Rosenberg	a1492f1922	Say that commit is allowed during bisect C Git allows this and it is quite handy. Change-Id: I1d0238b43fca931ad2079649fb7b431e2815c351 Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com>	15 years ago
Christian Halstrick	d1378e4c51	Merge "Allow ReadTreeTest to test arbitrary Checkouts"	15 years ago
Matthias Sohn	b8f2bb7d2a	Add support for updateNeeded flag in DirCacheEntry Change-Id: If06ff41d9ccd422afbc79ecbc3cfdf8bb2508dcd Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>	15 years ago
Jeff Schumacher	a8b29afd82	Create FileHeader from DiffEntry Added support for converting DiffEntrys to FileHeaders. FileHeaders are DiffEntrys with a buffer containing the diff output as well as a list of HunkHeaders. The HunkHeaders contain EditLists. The createFileHeader(DiffEntry) method in DiffFormatter performs a Myers Diff on the files referred to by the DiffEntry, then puts the returned EditList into a single HunkHeader, which is then put into the FileHeader to be returned. It also generates the appropriate diff header and puts it into the FileHeader's buffer. The rest of the diff output, which would normally be parsed to generate the HunkHeaders, is not generated. In fact, the purpose of this method is to avoid the costly diff output generation and parsing normally required to create a FileHeader. Change-Id: I7d8b18c0f6c85e3d02ad58995d3d231e69af5887	15 years ago
Christian Halstrick	4be88168b6	Allow ReadTreeTest to test arbitrary Checkouts ReadTreeTest was hardcoded to test WorkDirCheckout. Since we want alternative checkout implementations (especially DirCacheCheckout) this class has been refactored so that the tests can be reused to test other implementations. The following changes have been done: - abstract methods for checkout and prescanTwoTrees have been introduced. Parameters are only the two trees. As index we will implicitly use the current index of the repo. - whenever tests needed a manipulated index before checkout and prescanTwoTrees it was ensured that the correct index was persisted (before we could use not-persisted instantiations of GitIndex passed as parameters to checkout, prescanTwoTrees) - abstract methods for getting updated, conflicting, removed entries resulting from the last checkout, prescanTwoTrees have been introduced - an implementation for all these abstract methods using WorkDirCheckout has been added - method to assert a certain state of the index and the working tree has been added Signed-off-by: Christian Halstrick <christian.halstrick@sap.com> Change-Id: Icf177cf8043487169a32ddd72b6f8f9246a433f7	15 years ago
Stefan Lay	354b90131a	Fix javadoc typos in JGit API There were some small errors which made it difficult to read the JavaDoc. Change-Id: Ib3b34353465162adebaca3514d596d0edf5aea51 Signed-off-by: Stefan Lay <stefan.lay@sap.com>	15 years ago
Shawn O. Pearce	384a19eee0	Deprecate all of the older Tree related code We want to get rid of these APIs, because they don't perform as well as DirCache/TreeWalk, or don't offer nearly as many features. Bug: 319145 Change-Id: I2b28f9cddc36482e1ad42d53e86e9d6461ba3bfc Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	15 years ago
Stefan Lay	311da9b211	Fix comparison of nanoseconds NB.decodeInt32(info, base + 4) already returns nanoseconds. Therefore it must not be divided by 1000000. Change-Id: Ie8f5c4a03f984d98935dccedc2b1ba4457094899 Signed-off-by: Stefan Lay <stefan.lay@sap.com>	15 years ago
Shawn O. Pearce	1913b41bc7	log: Implement --follow The FollowFilter can be installed on a RevWalk to cause the path to be updated through rename detection when the affected file is found to be added to the project. The filter works reasonably well, for example we can follow the history of the fsck command in git-core: $ jgit log --name-status --follow builtin/fsck.c | grep ^R R100 builtin-fsck.c builtin/fsck.c R099 fsck.c builtin-fsck.c R099 fsck-objects.c fsck.c R099 fsck-cache.c fsck-objects.c Change-Id: I4017bcfd150126aa342fdd423a688493ca660a1f Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	15 years ago
Shawn O. Pearce	e9de5643fa	Cache the diff configuration section This way we don't have to reparse for the rename limit every time we create a new rename detector for a repository. Change-Id: I669d031690b85ef4da5e39189be7173fb773fc56 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	15 years ago
Shawn O. Pearce	8a0c58394d	log: Add whitespace ignore options Similar to what we did with diff, implement whitespace ignore options for log too. This requires us to define some means of creating any RawText object type at will inside of DiffFormatter, so we define a new factory interface to construct RawText instances on demand. Unfortunately we have to copy the entire block of common options. args4j only processes the options/arguments on the one command class and Java doesn't support multiple inheritance. Change-Id: Ia16cd3a11b850fffae9fbe7b721d7e43f1d0e8a5 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	15 years ago
Shawn O. Pearce	bd8740dc14	Format submodule links during differences Instead of crashing, output a submodule link with the simple "Subproject commit $fullid\n" syntax used by C Git. Change-Id: Iae8646941683fb19b73fb038217d2e3bf5f77fa9 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	15 years ago
Shawn O. Pearce	5be90be996	Redo DiffFormatter API to be easier to use Passing around the OutputStream and the Repository is crazy. Instead put the stream in the constructor, since this formatter exists only to output to the stream, and put the repository as a member variable that can be optionally set. Change-Id: I2bad012fee7f40dc1346700ebd19f1e048982878 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	15 years ago
Shawn O. Pearce	04a9d23b9a	log, diff: Add rename detection support Implement rename detection in the command line diff and log commands. Also support --name-status, -p and -U flags, as these can be quite useful to view more detail. All of the Git patch file formatting code is now moved over to the DiffFormatter class. This permits us to reuse it in any context, including inside of IDEs. Change-Id: I687ccba34e18105a07e0a439d2181c323209d96c Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	15 years ago
Shawn O. Pearce	978535b090	Implement similarity based rename detection Content similarity based rename detection is performed only after a linear time detection is performed using exact content match on the ObjectIds. Any names which were paired up during that exact match phase are excluded from the inexact similarity based rename, which reduces the space that must be considered. During rename detection two entries cannot be marked as a rename if they are different types of files. This prevents a symlink from being renamed to a regular file, even if their blob content appears to be similar, or is identical. Efficiently comparing two files is performed by building up two hash indexes and hashing lines or short blocks from each file, counting the number of bytes that each line or block represents. Instead of using a standard java.util.HashMap, we use a custom open hashing scheme similar to what we use in ObjectIdSubclassMap. This permits us to have a very light-weight hash, with very little memory overhead per cell stored. As we only need two ints per record in the map (line/block key and number of bytes), we collapse them into a single long inside of a long array, making very efficient use of available memory when we create the index table. We only need object headers for the index structure itself, and the index table, but not per-cell. This offers a massive space savings over using java.util.HashMap. The score calculation is done by approximating how many bytes are the same between the two inputs (which for a delta would be how much is copied from the base into the result). The score is derived by dividing the approximate number of bytes in common into the length of the larger of the two input files. Right now the SimilarityIndex table should average about 1/2 full, which means we waste about 50% of our memory on empty entries after we are done indexing a file and sort the table's contents. If memory becomes an issue we could discard the table and copy all records over to a new array that is properly sized. Building the index requires O(M + N log N) time, where M is the size of the input file in bytes, and N is the number of unique lines/blocks in the file. The N log N time constraint comes from the sort of the index table that is necessary to perform linear time matching against another SimilarityIndex created for a different file. To actually perform the rename detection, a SxD matrix is created, placing the sources (aka deletions) along one dimension and the destinations (aka additions) along the other. A simple O(S x D) loop examines every cell in this matrix. A SimilarityIndex is built along the row and reused for each column compare along that row, avoiding the costly index rebuild at the row level. A future improvement would be to load a smaller square matrix into SimilarityIndexes and process everything in that sub-matrix before discarding the column dimension and moving down to the next sub-matrix block along that same grid of rows. An optional ProgressMonitor is permitted to be passed in, allowing applications to see the progress of the detector as it works through the matrix cells. This provides some indication of current status for very long running renames. The default line/block hash function used by the SimilarityIndex may not be optimal, and may produce too many collisions. It is borrowed from RawText's hash, which is used to quickly skip out of a longer equality test if two lines have different hash functions. We may need to refine this hash in the future, in order to minimize the number of collisions we get on common source files. Based on a handful of test commits in JGit (especially my own recent rename repository refactoring series), this rename detector produces output that is very close to C Git. The content similarity scores are sometimes off by 1%, which is most probably caused by our SimilarityIndex type using a different hash function than C Git uses when it computes the delta size between any two objects in the rename matrix. Bug: 318504 Change-Id: I11dff969e8a2e4cf252636d857d2113053bdd9dc Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	15 years ago
Shawn O. Pearce	629fd0d594	Clean up LICENSE file We used our LICENSE file to describe both the license of the package, and also the header template that should appear at the start of all Java files we create. This creates a confusing situation for readers who just want to consume the package, because our file header template starts off in the middle of a sentence. Move our template header to a separate file, and reformat the text of the license to be something more readable by a person reviewing the project's terms of use. Change-Id: If318e64c06683ea14e0240914c2d057c9199ce98 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	15 years ago
Jeff Schumacher	cb8e1e6014	Added a preliminary version of rename detection JGit does not currently do rename detection during diffs. I added a class that, given a TreeWalk to iterate over, can output a list of DiffEntry's for that TreeWalk, taking into account renames. This class only detects renames by SHA1's. More complex rename detection, along the lines of what C Git does will be added later. Change-Id: I93606ce15da70df6660651ec322ea50718dd7c04	15 years ago
Jeff Schumacher	7b0b4110ed	Refactored code out of FileHeader to facilitate rename detection Refactored a superclass out of FileHeader called DiffEntry that holds the more general data from FileHeader that is useful in rename detection (old/new Ids, modes, names, as well as changeType and score). FileHeader is now a DiffEntry that adds Hunks, parsing abilities, etc. Change-Id: I8398728cd218f8c6e98f7a4a7f2f342391d865e4	15 years ago
Dmitry Neverov	44854741c5	Fix missing flush in StreamCopyThread It is possible that StreamCopyThread will not flush everything from its src to its dst. In most cases StreamCopyThread works like this: in loop: n = src.read(buf); dst.write(buf, 0, n); and when we want to flush, we interrupt() StreamCopyThread and it flushes everything it wrote to dst. The problem is that our interrupt() could interrupt reading. In this case we will flush everything we wrote to dst, but not everything we wrote to src. Change-Id: Ifaf4d8be87535c7364dd59b217dfc631460018ff Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	15 years ago
Jeff Schumacher	9f2249bd26	Added check for binary files while diffing Added a check in Diff to ensure that files that are most likely not text are not line-by-line diffed. Files are determined to be binary by checking the first 8000 bytes for a null character. This is a similar heuristic to what C Git uses. Change-Id: I2b6f05674c88d89b3f549a5db483f850f7f46c26	15 years ago
Matthias Sohn	730b708dae	Merge "Update build to use Tycho 0.9.0"	15 years ago
Shawn Pearce	3fd4918852	Merge changes Ie56301aa,Ic2f79e85 * changes: Added further support for whitespace ignoring during diff Added support for whitespace ignoring	15 years ago
Jeff Schumacher	9869ef2592	Added further support for whitespace ignoring during diff Added code to support ignoring leading, trailing, and changed whitespace when performing a diff operation. I also added command line options to Diff to enable the various whitespace ignoring methods. These match the flags for git diff. Change-Id: Ie56301aafad59ee3f0fe5de62719f5023cd702c8	15 years ago
Matthias Sohn	a2325f6885	Update build to use Tycho 0.9.0 Change-Id: I589267e6cfd0514383c2a3da51c9b7a659f77844 Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>	15 years ago
Jeff Schumacher	543235b805	Added support for whitespace ignoring JGit did not have support for skipping whitespace when comparing lines in RawText objects. I added a subclass of RawText that skips whitespace in its equals and hashCode methods. I used a subclass rather than adding functionality into RawText so that performance would not be impacted by extra logic. This class only supports ignoring all whitespace. Others will follow that allow other forms of whitespace ignoring. Change-Id: Ic2f79e85215e48d3fd53ec1b4ad13373dd183a4a	15 years ago
Shawn O. Pearce	5ed96eb7f4	UploadPack: Avoid unnecessary flush in smart HTTP Under smart HTTP the biDirectionalPipe flag is false, and we return back immediately at this point in the negotiation process. There is no need to flush the stream to the client, the request is over and it will be automatically flushed out by the higher level servlet that invoked us. Avoiding flush here allows us to only use flush after a progress message is sent during pack generation. Change-Id: Id0c8b7e95e3be6ca4c1b479e096bed6b0283b828 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	15 years ago
Shawn O. Pearce	066df3d1a1	Add MutableObjectId.copyFrom(AnyObjectId) This simplifies the PackIndex code, which is trying to quickly copy an existing ObjectId into a MutableObjectId. Rather than having the PackIndex violate the ObjectId's internals, expose a copy from function similar to the other ones for copying from raw byte arrays or hex formatted strings. Change-Id: I142635cbece54af2ab83c58477961ce925dc8255 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	15 years ago
Shawn O. Pearce	677b9b17e2	Expose AnyObjectId compareTo(byte[]) and compareTo(int[]) Storage systems can use these implementations to compare a passed AnyObjectId with a stored representation of an ObjectId in the canonical network byte order format. This can be useful to do a binary search, or just linear scan, over an encoded storage file. Change-Id: I8c72993c4f4c6e98d599ac2c9867453752f25fd2 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	15 years ago
Shawn O. Pearce	864cc3de10	Expose RefWriter constructor taking RefList An implementation might prefer to use the RefList type here, and RefList is part of our public API. Expose the constructor so callers who have a RefList can take advantage of the existing sorting. Change-Id: I545867f85aa2c479d2d610024ebbe318144709c8 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	15 years ago
Shawn O. Pearce	bfc43c13bc	Expose RefUpdate constructor to any subclass When we finally move RefDirectory to the new storage.file package, its associated RefDirectoryUpdate will need visibility to this constructor in order to initialize itself. This is true of any other repository implementation, so make it protected rather than package level visible. Change-Id: If838aec9baeb80ee2f12dcbca717657c725a9242 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	15 years ago
Shawn O. Pearce	8e40697047	Expose repository change event constructors Repository implementations outside of .lib need to be able to create these events and deliver them to listening application code. Expose and document the constructors so that they are visible when we move FileRepository into storage.file.FileRepository. Change-Id: I7fb6e8f4f5fdab683c5ebb5267673aa6d5b560bb Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	15 years ago
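
The file-size pre-filter (commit 64b9458640) and the empty-file guard (commit 4c14b7869d) described in the messages above can be illustrated with a small sketch. This is not JGit's actual code: the class name `SizePrefilter` and the methods `canSkip` and `score` are hypothetical, and the only assumptions carried over from the commit messages are that the score is the number of common bytes divided by the size of the larger file, and that two empty files score 100.

```java
// Hypothetical illustration; names are not taken from JGit.
final class SizePrefilter {
    private SizePrefilter() {
    }

    /**
     * Returns true when a pair of files can be skipped without scoring.
     * The bytes two files share can never exceed the smaller file, so the
     * best possible score is min/max. If that is below the threshold, the
     * pair cannot be a rename.
     */
    static boolean canSkip(long sizeA, long sizeB, int thresholdPercent) {
        long max = Math.max(sizeA, sizeB);
        long min = Math.min(sizeA, sizeB);
        // Compare min/max < threshold/100 without division.
        return min * 100 < max * thresholdPercent;
    }

    /**
     * Similarity score in percent: common bytes relative to the larger file.
     * Two empty files are treated as identical (score 100) instead of
     * dividing by zero.
     */
    static int score(long commonBytes, long sizeA, long sizeB) {
        long max = Math.max(sizeA, sizeB);
        if (max == 0) {
            return 100;
        }
        return (int) (commonBytes * 100 / max);
    }

    public static void main(String[] args) {
        // Example from the commit message: 200 KB vs 50 KB at a 60% threshold.
        System.out.println(canSkip(200_000, 50_000, 60));
        // Even if the whole smaller file is shared, the score stays low.
        System.out.println(score(50_000, 200_000, 50_000));
    }
}
```

With the numbers from the commit message, the larger file would need 120 KB in common (200 × 0.6), which is more than the 50 KB the smaller file contains, so the pair is skipped before any content hashing takes place.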
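
The binary-file check from commit 9f2249bd26 (scan the first 8000 bytes for a null character) can be sketched as follows. The class name `BinaryHeuristic`, the constant, and the method `looksBinary` are hypothetical names chosen for illustration; only the 8000-byte window and the null-byte test come from the commit message.

```java
// Hypothetical illustration; names are not taken from JGit.
final class BinaryHeuristic {
    /** Number of leading bytes inspected, per the commit message. */
    static final int SCAN_LIMIT = 8000;

    private BinaryHeuristic() {
    }

    /**
     * Returns true if a NUL byte appears within the first SCAN_LIMIT bytes,
     * which suggests the content is probably not text.
     */
    static boolean looksBinary(byte[] content) {
        int limit = Math.min(content.length, SCAN_LIMIT);
        for (int i = 0; i < limit; i++) {
            if (content[i] == 0) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        byte[] text = "line one\nline two\n".getBytes(java.nio.charset.StandardCharsets.US_ASCII);
        byte[] blob = new byte[] {'P', 'K', 0, 3};
        System.out.println(looksBinary(text));
        System.out.println(looksBinary(blob));
    }
}
```

Because only a fixed-size prefix is inspected, the check costs the same for any file size, at the price of missing a NUL byte that first appears after the window.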


374 Commits (6e155d5f415e7f62f3f25e082dbee558e5be0b2d)
