github/jgit - jgit - FanRuan Third-Party Plugin Repository

Commit Graph

Author	SHA1	Message	Date
Jeff Schumacher	396fe6da45	Break dissimilar file pairs during diff File pairs that are very dissimilar during a diff were not being broken apart into their constituent ADD/DELETE pairs. This leads to sub-optimal rename detection. Take, for example, this situation: A file exists at src/a.txt containing "foo". A user renames src/a.txt to src/b.txt, then adds a new src/a.txt containing "bar". Even though the old a.txt and the new b.txt are identical, the rename detection algorithm would not detect it as a rename since it was already paired in a MODIFY. I added code to split all MODIFYs below a certain score into their constituent ADD/DELETE pairs. This allows situations like the one I described above to be more correctly handled. Change-Id: I22c04b70581f206bbc68c4cd1ee87a1f663b418e Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	15 years ago
Christian Halstrick	f56a459966	Add methods which write MERGE_HEAD and MERGE_MSG Add methods to the Repository class which write into MERGE_HEAD and MERGE_MSG files. Since we have the read methods in the same class this seems to be the right place. Change-Id: I5dd65306ceb06e008fcc71b37ca3a649632ba462 Signed-off-by: Christian Halstrick <christian.halstrick@sap.com> Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	15 years ago
Jens Baumgart	db82b8d7eb	Fix concurrent read / write issue in LockFile on Windows LockFile.commit fails if another thread concurrently reads the base file. The problem is fixed by retrying the rename operation if it fails. Change-Id: I6bb76ea7f2e6e90e3ddc45f9dd4d69bd1b6fa1eb Bug: 308506 Signed-off-by: Jens Baumgart <jens.baumgart@sap.com>	15 years ago
Robin Stocker	a00377a7e2	Fix Javadoc warnings There were some broken links, incorrect uses of @value, an invalid tag and an outdated comment. Change-Id: I22886bcc869a4b62bd606ebed40669f7b4723664 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	15 years ago
Shawn O. Pearce	80fe789690	Make forPath(ObjectReader) variant in TreeWalk This simplifies the logic for those who already have an ObjectReader on hand and want to reuse it to look up a single path. Change-Id: Ief17d6b2a0674ddb34bbc9f43121b756eae960fb Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	15 years ago
Shawn O. Pearce	7ff18f3ec9	Make StoredConfig an abstraction above FileBasedConfig This exposes a load and save method, allowing a Repository to denote that it has a persistent configuration of some kind which can be accessed by the application, without needing to know exact details of how it's stored. Change-Id: I7c414bc0f975b80f083084ea875eca25c75a07b2 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	15 years ago
Shawn O. Pearce	fa9b225e06	Merge branch 'delta' * delta: (103 commits) Discard the uncompressed delta as soon as its compressed Honor pack.windowlimit to cap memory usage during packing Honor pack.threads and perform delta search in parallel Cache small deltas during packing Implement delta generation during packing debug-show-packdelta: Dump a pack delta to the console Initial pack format delta generator Add debugging toString() method to ObjectToPack Make ObjectToPack clearReuseAsIs signal available to subclasses Correctly classify the compressing objects phase Refactor ObjectToPack's delta depth setting Configure core.bigFileThreshold into PackWriter Add doNotDelta flag to ObjectToPack Add more configuration options to PackWriter Save object path hash codes during packing Add path hash code to ObjectWalk Add getObjectSize to ObjectReader Allow TemporaryBuffer.Heap to allocate smaller than 8 KiB Define a constant for 127 in DeltaEncoder Cap delta copy instructions at 64k ... Conflicts: org.eclipse.jgit.pgm/src/org/eclipse/jgit/pgm/Diff.java org.eclipse.jgit/resources/org/eclipse/jgit/JGitText.properties org.eclipse.jgit/src/org/eclipse/jgit/JGitText.java org.eclipse.jgit/src/org/eclipse/jgit/revwalk/RewriteTreeFilter.java Change-Id: I7c7a05e443a48d32c836173a409ee7d340c70796	15 years ago
Stefan Lay	ab062caa22	Allow client of Add command to set a WorkingTreeIterator This is e.g. useful when a client of the AddCommand has additional rules to ignore files. In Eclipse a resource can be set to derived or be excluded by preferences. Change-Id: I6c47e54a1ce26315faf5ed0723298ad2c2db197c Signed-off-by: Stefan Lay <stefan.lay@sap.com>	15 years ago
Stefan Lay	88957f6c5a	Allow for filepattern "." in AddCommand Enable adding on repository root level. Change-Id: I415b10dc74cc9435578424d9f106c972fd703055 Signed-off-by: Stefan Lay <stefan.lay@sap.com>	15 years ago
Stefan Lay	aa86cfc339	Do not add ignored files in Add command Signed-off-by: Stefan Lay <stefan.lay@sap.com>	15 years ago
Shawn O. Pearce	09910ffa32	Move ignore node handling into WorkingTreeIterator The working tree iterator has perfect knowledge of the path structure as well as immediate information about whether or not an ignore file even exists at this level. We can exploit that to simplify the logic and running time for testing ignored file status by pushing all of the checks down into the iterator itself. Change-Id: I22ff534853e8c5672cc5c2d9444aeb14e294070e Signed-off-by: Shawn O. Pearce <spearce@spearce.org> CC: Charley Wang <chwang@redhat.com> CC: Chris Aniszczyk <caniszczyk@gmail.com> CC: Stefan Lay <stefan.lay@sap.com> CC: Matthias Sohn <matthias.sohn@sap.com>	15 years ago
Shawn Pearce	0ec0e21fdf	Merge "Fix concurrent read / write issue in GitIndex on Windows"	15 years ago
Jens Baumgart	e99c48a61a	Fix concurrent read / write issue in GitIndex on Windows GitIndex.write fails if another thread concurrently reads the index file. The problem is fixed by retrying the rename operation if it fails. Bug: 311051 Change-Id: Ib243d2a90adae312712d02521de4834d06804944 Signed-off-by: Jens Baumgart <jens.baumgart@sap.com>	15 years ago
Christian Halstrick	5c94321b47	Check for racy git in WorkingTreeIterator The WorkingTreeIterator has a method to check whether the current file differs from the corresponding index entry. This commit improves this check to also handle racy git situations. See http://git.kernel.org/?p=git/git.git;a=blob;f=Documentation/technical/racy-git.txt;hb=HEAD Change-Id: I3ad0897211dcbb2eac9eebcb19d095a5052fb06b Signed-off-by: Christian Halstrick <christian.halstrick@sap.com> Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>	15 years ago
Christian Halstrick	c98d97731b	Smudge racily clean index entries by truncating length (like git.git) To mark an entry racily clean we set its length to 0 (like native git does). Entries which are not racily clean and have zero length can be distinguished from racily clean entries by checking P_OBJECTID against the SHA1 of empty content. When length is 0 and P_OBJECTID is different from SHA1 of empty content we know the entry is marked racily clean. See http://dev.eclipse.org/mhonarc/lists/jgit-dev/msg00488.html Change-Id: I689552931441ab51964b430b303160c9126b66af Signed-off-by: Christian Halstrick <christian.halstrick@sap.com> Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>	15 years ago
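The smudge test described in the row above can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not JGit's actual code: the class `SmudgeCheck` and its methods are hypothetical names, and the index entry is reduced to a recorded length plus a 20-byte object id. The one fixed fact it relies on is that Git hashes a blob as the header `blob <size>\0` followed by the content, so empty content has a single well-known SHA-1.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

// Hypothetical helper, not part of JGit: models the "length == 0" smudge rule.
final class SmudgeCheck {
    /** SHA-1 of an empty blob: the hash of the header "blob 0\0" with no content. */
    static byte[] emptyBlobId() {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            return md.digest("blob 0\0".getBytes(StandardCharsets.US_ASCII));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    /**
     * An entry counts as smudged (marked racily clean) when its recorded
     * length is 0 but its object id is not the id of empty content.
     */
    static boolean isSmudged(int length, byte[] objectId) {
        return length == 0 && !Arrays.equals(objectId, emptyBlobId());
    }

    /** Lower-case hex rendering of an object id, for display. */
    static String hex(byte[] id) {
        StringBuilder sb = new StringBuilder(id.length * 2);
        for (byte b : id) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }
}
```

A genuinely empty file keeps length 0 and the empty-blob id, so `isSmudged` returns false for it; any other id combined with length 0 signals a smudged entry that has to be re-checked against the working tree.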
Shawn O. Pearce	938943d674	Use proper constants for .gitignore and .git directory We have a constant for .gitignore, so use it. While we are in the same method, correct the reference of ".git" to be the actual GIT_DIR given. This might not be within the work tree if the GIT_DIR and GIT_WORK_TREE environment variables were used. Change-Id: I38e1cec13405109b9c347858b38dd9fb2f1f2560 Signed-off-by: Shawn O. Pearce <spearce@spearce.org> CC: Charley Wang <chwang@redhat.com> CC: Chris Aniszczyk <caniszczyk@gmail.com> CC: Stefan Lay <stefan.lay@sap.com> CC: Matthias Sohn <matthias.sohn@sap.com>	15 years ago
Shawn O. Pearce	c59db09bc5	Remove gitIgnoreTimestamp from abstract iterator API This never should have been exposed on the top of the AbstractTreeIterator type hierarchy. There is no concept of a timestamp in a canonical tree read from the object database, and the time in the DirCache isn't what we want here either. Actually all that we need is to find the files whose names are ".gitignore" and are below the root directory. We can accomplish that with a suffix filter, and process them immediately. Change-Id: Ib09cbf81a9e038452ce491385c65498312e2916b Signed-off-by: Shawn O. Pearce <spearce@spearce.org> CC: Charley Wang <chwang@redhat.com> CC: Chris Aniszczyk <caniszczyk@gmail.com> CC: Stefan Lay <stefan.lay@sap.com> CC: Matthias Sohn <matthias.sohn@sap.com>	15 years ago
Shawn O. Pearce	395d236058	Fix NPE in RenameDetector If we have two adds of the same object but no deletes the detector threw an NPE because the entry that came back from the deleted map was null (no matching objects). In this case we need to put the adds all back onto the list of left over additions since they did not match a delete. Change-Id: Ie68fbe7426b4dc0cb571a08911c7adbffff755d5 Signed-off-by: Shawn O. Pearce <spearce@spearce.org> CC: Jeffrey Schumacher <jeffschu@google.com>	15 years ago
Shawn O. Pearce	b518189b5c	IndexPack: Fix spurious pack file corruption errors We didn't correctly handle the zlib trailer for an object. If the trailer bytes were outside of the current buffer window but we had fully inflated the object itself, we broke out of the loop (as we had our target size) but inflate wasn't finished (as it did not yet get the trailer) so we failed the test and threw a corruption exception. Use an infinite loop and only break out when the inflater is done. Change-Id: I7c9bbbeb577a990d9bc56a50ebd485935460f6c8 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	15 years ago
Jonathan Gossage	ec13e0382a	Fully implement Logger interface On April 27, 2010 the Logger interface was upgraded with a number of new methods to make it consistent with the implementations it was meant to support. This patch makes RecordingLogger consistent with the Logger interface and allows to also use Jetty 7.1.5 released with Helios which can be installed from the p2 repository at http://download.eclipse.org/jetty/7.1.5.v20100705/repository Change-Id: I5645436bbe7492f82d4069e4d9cbebede0bf764e Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>	15 years ago
Shawn O. Pearce	12fe0f2d1e	Discard the uncompressed delta as soon as its compressed The DeltaCache will most likely need to copy the compressed delta into a new buffer in order to compact away the wasted space at the end caused by over allocation. Since we don't need the uncompressed format anymore, null out our only reference to it so the GC can reclaim this memory if it needs to perform a collection in order to satisfy the cache's allocation attempt. Change-Id: I50403cfd2e3001b093f93a503cccf7adab43cc9d Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	15 years ago
Shawn O. Pearce	6e155d5f41	Merge branch 'js/rename' * js/rename: Implemented file path based tie breaking to exact rename detection Added more test cases for RenameDetector Added very small optimization to exact rename detection Fixed Misleading Javadoc Added file path similarity to scoring metric in rename detection Fixed potential div by zero bug Added file size based rename detection optimization Create FileHeader from DiffEntry log: Implement --follow Cache the diff configuration section log: Add whitespace ignore options Format submodule links during differences Redo DiffFormatter API to be easier to use log, diff: Add rename detection support Implement similarity based rename detection Added a preliminary version of rename detection Refactored code out of FileHeader to facilitate rename detection	15 years ago
Shawn O. Pearce	0b46e70155	Fix infinite loop in IndexPack A programming error using the Inflater API led to an infinite loop within IndexPack, caused by the Inflater returning 0 from the inflate() method, but it didn't want more input. This happens when it has reached the end of the stream, or has reached a spot asking for an external dictionary. Such a case is a failure for us, and we should abort out. Thanks to Alex for pointing out that we had 3 implementations of the inflate routine, which should be consolidated into one and use a switch to determine where to load data from. Bug: 317416 Change-Id: I34120482375b687ea36ed9154002d77047e94b1f Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	15 years ago
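The two IndexPack fixes listed here (b518189b5c and 0b46e70155) both concern the loop around `java.util.zip.Inflater`. The sketch below is not the IndexPack code; `InflateLoop` and its `chunk` parameter are hypothetical and exist only to feed input in small pieces, standing in for a buffer window. It shows the two rules the messages describe: keep looping until `finished()` reports true, so the zlib trailer is consumed even after all data bytes are out, and treat a 0-byte `inflate()` result as fatal when no further input can help.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical illustration of a safe Inflater loop; not JGit's IndexPack.
final class InflateLoop {
    /** Inflates a zlib stream, handing the inflater at most 'chunk' bytes at a time. */
    static byte[] inflate(byte[] compressed, int chunk) {
        Inflater inf = new Inflater();
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[256];
            int pos = 0;
            // Only stop once the inflater has seen the whole stream, trailer included.
            while (!inf.finished()) {
                int n = inf.inflate(buf);
                if (n > 0) {
                    out.write(buf, 0, n);
                    continue;
                }
                if (inf.finished()) {
                    break;
                }
                if (inf.needsInput() && pos < compressed.length) {
                    int len = Math.min(chunk, compressed.length - pos);
                    inf.setInput(compressed, pos, len);
                    pos += len;
                    continue;
                }
                // 0 bytes produced and nothing more to feed (truncated stream,
                // or a preset dictionary is requested): abort instead of spinning.
                throw new IllegalStateException("inflater stalled before end of stream");
            }
            return out.toByteArray();
        } catch (DataFormatException e) {
            throw new IllegalStateException(e);
        } finally {
            inf.end();
        }
    }

    /** Test helper: compresses raw bytes into a zlib stream. */
    static byte[] deflate(byte[] raw) {
        Deflater def = new Deflater();
        def.setInput(raw);
        def.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        while (!def.finished()) {
            out.write(buf, 0, def.deflate(buf));
        }
        def.end();
        return out.toByteArray();
    }
}
```

Feeding one byte at a time (`chunk = 1`) exercises the case where the trailer arrives after the last data byte has already been produced; cutting bytes off the end of the stream exercises the abort path.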
Jeff Schumacher	31311cacfd	Implemented file path based tie breaking to exact rename detection During the exact rename detection phase in RenameDetector, ties were resolved on a first-found basis. I added support for file path based tie breaking during that phase. Basically, there are four situations that have to be handled: One add matching one delete: In this simple case, we pair them as a rename. One add matching many deletes: Find the delete whose path matches the add the closest, and pair them as a rename. Many adds matching one delete: Similar to the above case, we find the add that matches the delete the closest, and pair them as a rename. The other adds are marked as copies of the delete. Many adds matching many deletes: Build a scoring matrix similar to the one used for content-based matching, scoring instead by file path. Some of the utility functions in SimilarityRenameDetector are used in this case, as we use the same encoding scheme. Once the matrix is built, scan it for the best matches, marking them as renames. The rest are marked as copies. I don't particularly like the idea of using utility functions right out of SimilarityRenameDetector, but it works for the moment. A later commit will likely refactor this into a common utility class, as well as bringing exact rename detection out of RenameDetector and into a separate class, much like SimilarityRenameDetector. Change-Id: I1fb08390aebdcbf20d049aecf402a36506e55611	15 years ago
Christian Halstrick	b840ed0121	Added dirty-detection to WorkingTreeIterator Added possibility to compare the current entry of a WorkingTreeIterator to a given DirCacheEntry. This is done to detect whether an entry in the index is dirty or not. 'Dirty' means that the file in the working tree is different from what's in the index. Merge algorithms will make use of this to detect conflicts. Change-Id: I3ff847f4bf392553dcbd6ee236c6ca32a13eedeb Signed-off-by: Christian Halstrick <christian.halstrick@sap.com> Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>	15 years ago
Shawn Pearce	ff59ce4bff	Merge "Remove an unused File reference in test code"	15 years ago
Robin Rosenberg	9d589c88f7	Remove an unused File reference in test code Change-Id: Ib0d6c36811df719a53c66e9fa7460b89b2faf98b Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com>	15 years ago
Shawn Pearce	19473b1dbc	Merge "Handle the tilde notation (~user) of git url"	15 years ago
Robin Rosenberg	845714158a	Handle the tilde notation (~user) of git url When the path is prefixed with ~ the URI parser thought about this as /~. Strip the / if the next character is the tilde. Bug: 307017 Change-Id: I58203e5617956b46d83e8987d1f8042beddffac3 Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com>	15 years ago
Stefan Lay	233e0130b5	Git Porcelain API: Add Command The new Add command adds files to the Git Index. It uses the DirCache to access the git index. It works also in case of an existing conflict. Fileglobs (e.g. *.c) are not yet supported. The new Add command does add ignored files because there is no gitignore support in jgit yet. Bug: 318440 Change-Id: If16fdd4443e46b27361c2a18ed8f51668af5d9ff Signed-off-by: Stefan Lay <stefan.lay@sap.com>	15 years ago
Shawn Pearce	0ef99921fa	Merge changes I104cd62f,I1d0238b4 * changes: Internationalize RepositoryState descriptions Say that commit is allowed during bisect	15 years ago
Christian Halstrick	33160cd2da	Fix ReadTreeTest After refactoring ReadTreeTest the tests failed for filesystems with coarse modification time granularity. This is fixed by explicitly telling the repo to reread the index after we build a new index. Additionally the test testDirectoryFileSimple was simplified by using buildTree() instead of misusing GitIndex to construct trees. Change-Id: I20d2f097491e4cc8c657a696beabc7026b485017 Signed-off-by: Christian Halstrick <christian.halstrick@sap.com> Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>	15 years ago
Charley Wang	b878cdcf6b	Add compatibility with gitignore specifications This patch adds ignore compatibility to jgit. It encompasses exclude files as well as .gitignore. Uses TreeWalk and FileTreeIterator to find nodes and parses .gitignore files when required. The patch includes a simple cache that can be used to save results and avoid excessive gitignore parsing. CQ: 4302 Bug: 303925 Change-Id: Iebd7e5bb534accca4bf00d25bbc1f561d7cad11b Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com> Signed-off-by: Stefan Lay <stefan.lay@sap.com> Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>	15 years ago
Jeff Schumacher	f666cc755b	Added more test cases for RenameDetector I added test cases to cover the majority of the code. It's not 100% coverage yet, but the remaining bits are small. Change-Id: Ib534c8e94b13358b8b22cf54e2ff84132bae6d14	15 years ago
Jeff Schumacher	bc08fafb41	Added very small optimization to exact rename detection Optimized a small loop in findExactRenames. The loop would go through all the items in a list of DiffEntries even after it already found what it was looking for. I made it break out of the loop as soon as a good match was found. Change-Id: I28741e0c49ce52d8008930a87cd1db7037700a61	15 years ago
Jeff Schumacher	a20e6f6fec	Fixed Misleading Javadoc The javadoc for the setRenameLimit method in RenameDetector said that you could only have limits in the range (0,100), implying that 0 and 100 were illegal inputs. The code, however, allowed 0 and 100. I changed the javadoc to say that the range [0,100] was legal. I also documented the IllegalArgumentException that is thrown if the limit is outside that range. Change-Id: I916838f254859f6f0e1516bb55b8e7dc87e57dc2	15 years ago
Jeff Schumacher	9a48de86d8	Added file path similarity to scoring metric in rename detection The scoring method was not taking into account the similarity of the file paths and file names. I changed the metric so that it is 99% based on content (which used to be 100% of the old metric), and 1% based on path similarity. Of that 1%, half (.5% of the total final score) is based on the actual file names (e.g. "foo.java"), and half on the directory (e.g. "src/com/foo/bar/"). Change-Id: I94f0c23bf6413c491b10d5625f6ad7d2ecfb4def	15 years ago
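The weighting described in 9a48de86d8 comes down to a small piece of arithmetic. The sketch below is a hypothetical helper, not the JGit scoring code: the class and method names are invented, and it assumes each component score is already an integer on a 0..100 scale. Integer math with a common denominator of 1000 keeps the 99% / 0.5% / 0.5% split exact.

```java
// Hypothetical illustration of the 99% content / 1% path weighting; not JGit code.
final class PathWeightedScore {
    /**
     * Combines three scores (each assumed to be 0..100) into one 0..100 score:
     * 99% content, 0.5% file name, 0.5% directory.
     */
    static int combine(int contentScore, int nameScore, int dirScore) {
        // Weights expressed in thousandths: 990 + 5 + 5 = 1000.
        return (contentScore * 990 + nameScore * 5 + dirScore * 5) / 1000;
    }
}
```

With these weights, identical content under a completely different path still scores 99, while the path alone can contribute at most 1 point, enough to break ties between candidates whose content scores are equal.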
Jeff Schumacher	4c14b7869d	Fixed potential div by zero bug The scoring logic in SimilarityIndex was dividing by the max file size. If both files are empty, this would cause a div by zero error. This case cannot currently happen, since two empty files would have the same SHA1, and would therefore be caught in the earlier SHA1 based detection pass. Still, if this logic eventually gets separated from that pass, a div by zero error would occur. I changed the logic to instead consider two empty files to have a similarity score of 100. Change-Id: Ic08e18a066b8fef25bb5e7c62418106a8cee762a	15 years ago
Jeff Schumacher	64b9458640	Added file size based rename detection optimization Prior to this change, files that were very different in size (enough so that they could not have enough in common to be detected as renames) were still having their scores calculated. I added an optimization to skip such files. For example, if the rename detection threshold is 60%, the larger file is 200kb, and the smaller file is 50kb, the pair cannot be counted as a rename since they cannot possibly share 60% of their content in common. (200*.6=120, 120>50) Change-Id: Icd8315412d5de6292839778e7cea7fe6f061b0fc	15 years ago
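The size check in 64b9458640 can be written as a one-line predicate. The following is a minimal sketch, assuming (as the division by the max file size in 4c14b7869d suggests) that the best score a pair can reach is the smaller size divided by the larger size; `SizePrefilter` and `canSkip` are hypothetical names, not JGit API.

```java
// Hypothetical illustration of the size-based pre-filter; not JGit code.
final class SizePrefilter {
    /**
     * Returns true when the two sizes alone rule out reaching the threshold.
     * Assumes the best possible score is min(size) / max(size), which happens
     * when the smaller file is contained entirely in the larger one.
     *
     * @param thresholdPercent rename threshold on a 0..100 scale
     */
    static boolean canSkip(long sizeA, long sizeB, int thresholdPercent) {
        long max = Math.max(sizeA, sizeB);
        long min = Math.min(sizeA, sizeB);
        // Compare min/max < threshold/100 without floating point.
        return min * 100 < max * thresholdPercent;
    }
}
```

For the example in the commit message, `canSkip(200 * 1024, 50 * 1024, 60)` is true: the larger file would need 120 KB in common with a file that only has 50 KB.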
Robin Rosenberg	d787a82e50	Internationalize RepositoryState descriptions Change-Id: I104cd62f3e89acf010b1d40a2b08e7f68f63bb85 Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com>	15 years ago
Shawn O. Pearce	9734194917	Honor pack.windowlimit to cap memory usage during packing The pack.windowlimit configuration parameter places an upper bound on the number of bytes used by the DeltaWindow class as it scans through the object list. If memory usage would exceed the limit the window is temporarily decreased in size to keep memory used within that bound. Change-Id: I09521b8f335475d8aee6125826da8ba2e545060d Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	15 years ago
Shawn O. Pearce	74e0835012	Honor pack.threads and perform delta search in parallel If we have multiple CPUs available, packing usually goes faster when each CPU is assigned a slice of the available search space. The number of threads to use is guessed from the runtime if it wasn't set by the caller, or wasn't set in the configuration. Change-Id: If554fd8973db77632a52a0f45377dd6ec13fc220 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	15 years ago
Shawn O. Pearce	a960d1429e	Cache small deltas during packing PackWriter now caches small deltas, or deltas that are very tiny compared to their source inputs, so that the writing phase goes faster by reusing those cached deltas. The cached data is stored compressed, which usually translates to a bigger footprint due to deltas being very hard to compress, but saves time during writing by avoiding the deflate step. They are held under SoftReferences so that the JVM GC can clear out deltas if memory gets very tight. We would rather continue working and spend a bit more CPU time during writing than crash due to OOME. To avoid OutOfMemoryErrors during the caching phase we also trap OOME and just abort out of the caching. Because deflateBound() always produces something larger than what we need to actually store the deflated data, we copy it over into a new buffer if the actual length doesn't match the buffer length. When packing jgit.git this saves over 111 KiB in the cache, and is thus a worthwhile hit on CPU time. To further save memory we store the inflated size of the delta (which we need for the object header) in the same field as the pathHash, as the pathHash is no longer necessary by this phase of the packing algorithm. Change-Id: I0da0c600d845e8ec962289751f24e65b5afa56d7 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	15 years ago
Shawn O. Pearce	dfad23bf3d	Implement delta generation during packing PackWriter now produces new deltas if there is not a suitable delta available for reuse from an existing pack file. This permits JGit to send less data on the wire by sending a delta relative to an object the other side already has, instead of sending the whole object. The delta searching algorithm is similar in style to what C Git uses, but apparently has some differences (see below for more on). Briefly, objects that should be considered for delta compression are pushed onto a list. This list is then sorted by a rough similarity score, which is derived from the path name the object was discovered at in the repository during object counting. The list is then walked in order. At each position in the list, up to $WINDOW objects prior to it are attempted as delta bases. Each object in the window is tried, and the shortest delta instruction sequence selects the base object. Some rough rules are used to prevent pathological behavior during this matching phase, like skipping pairings of objects that are not similar enough in size. PackWriter intentionally excludes commits and annotated tags from this new delta search phase. In the JGit repository only 28 out of 2600+ commits can be delta compressed by C Git. As the commit count tends to be a fair percentage of the total number of objects in the repository, and they generally do not delta compress well, skipping over them can improve performance with little increase in the output pack size. Because this implementation was rebuilt from scratch based on my own memory of how the packing algorithm has evolved over the years in C Git, PackWriter, DeltaWindow, and DeltaEncoder don't use exactly the same rules everywhere, and that leads JGit to produce different (but logically equivalent) pack files. 
Repository | Pack Size (bytes)                | Packing Time
           | JGit     - CGit     = Difference | JGit    / CGit
-----------+----------------------------------+------------------
git        | 25094348 - 24322890 = +771458    | 59.434s / 59.133s
jgit       |  5669515 -  5709046 = - 39531    |  6.654s /  6.806s
linux-2.6  |     389M -     386M = +3M        | 20m02s  / 18m01s
For the above tests pack.threads was set to 1, window size=10, delta depth=50, and delta and object reuse was disabled for both implementations. Both implementations were reading from an already fully packed repository on local disk. The running time reported is after 1 warm-up run of the tested implementation. PackWriter is writing 771 KiB more data on git.git, 3M more on linux-2.6, but is actually 39.5 KiB smaller on jgit.git. Being larger by less than 0.7% on linux-2.6 isn't bad, nor is taking an extra 2 minutes to pack. On the running time side, JGit is at a major disadvantage because linux-2.6 doesn't fit into the default WindowCache of 20M, while C Git is able to mmap the entire pack and have it available instantly in physical memory (assuming hot cache). CGit also has a feature where it caches deltas that were created during the compression phase, and uses those cached deltas during the writing phase. PackWriter does not implement this (yet), and therefore must create every delta twice. This could easily account for the increased running time we are seeing. Change-Id: I6292edc66c2e95fbe45b519b65fdb3918068889c Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	15 years ago
Shawn O. Pearce	074055d747	debug-show-packdelta: Dump a pack delta to the console This is a horribly crude application; it doesn't even verify that the object it's dumping is delta encoded. Its method of getting the delta is pretty abusive to the public PackWriter API, because right now we don't want to expose the real internal low-level methods actually required to do this. Change-Id: I437a17ceb98708b5603a2061126eb251e82f4ed4 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	15 years ago
Shawn O. Pearce	8612c0ace1	Initial pack format delta generator DeltaIndex is a simple pack style delta generator. The function works by creating a compact index of a source buffer's blocks, and then walking a sliding window along a desired result buffer, searching for the window in the index. When a match is found, the window is stretched to the longest possible length that is common with the source buffer, and a copy instruction is created. Rabin's polynomial hash function is used to compute the hash for a block, permitting efficient sliding of the window in single byte increments. The update function to slide one byte originated from David Mazieres' work in LBFS, and our implementation of the update step was certainly inspired by the initial work Geert Bosch proposed for C Git in http://marc.info/?l=git&m=114565424620771&w=2. To ensure the encoder runs in linear time with respect to the size of the two input buffers (source and result), the maximum number of blocks that can share the same position in the index's hashtable is capped at a constant number. This prevents bad inputs from causing the encoder to run in quadratic time, but comes with a penalty of creating a longer delta due to fewer considered copy positions. Strange hackery is used to cap the amount of memory used by the index to be no more than 12 bytes for every 16 bytes of source buffer, no matter what the JVM per-object overhead is. This permits an index to always be no larger than 1.75x the source buffer length, which is an important feature to support large windows of candidates to match against while packing. Here the strange hackery is nothing more than a manually managed chained hashtable, where pointers are array indexes into storage arrays rather than object references. Computation of the hash function for a single fixed sized block is done through an unrolled loop, where the first 4 iterations have been manually reduced down to eliminate unnecessary instructions. 
The pattern is derived from ObjectId.equals(byte[], int, byte[], int), where we have unrolled the loop required to compare two 20 byte arrays. Hours of testing with the Sun 1.6 JRE concluded that the non-obvious "foo[idx + 1]" style of reference is faster than "foo[idx++]", and so that is what we use here during hashing. Change-Id: If9fb2a1524361bc701405920560d8ae752221768 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	15 years ago
Shawn O. Pearce	b38426ae8c	Add debugging toString() method to ObjectToPack It's useful to know what the flags are or what the base that was selected is. Dump these out as part of the object's toString. Change-Id: I8810067fb8337b08b4fcafd5f9ea3e1e31ca6726 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	15 years ago
Shawn O. Pearce	699e4aa7c5	Make ObjectToPack clearReuseAsIs signal available to subclasses A subclass may want to use this method to release handles that are caching reuse information. Make it protected so they can override it and update themselves. Change-Id: I2277a56ad28560d2d2d97961cbc74bc7405a70d4 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	15 years ago
Shawn O. Pearce	4569d77e13	Correctly classify the compressing objects phase Searching for reuse candidates should be fast compared to actually doing delta compression. So pull the progress monitor out of this phase and rename it back to identify the compressing objects state. Change-Id: I5eb80919f21c1251e0e3420ff7774126f1f79b27 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	15 years ago
Shawn O. Pearce	85b7a53d52	Refactor ObjectToPack's delta depth setting Long ago, when PackWriter was first written, we thought that the delta depth could be updated automatically, but it's never used. Instead make this a simple standard setter so the caller can more directly set the delta depth of this object. This permits us to configure a depth that takes into account more than just the depth of another object in this same pack. Change-Id: I1d71b74f2edd7029b8743a2c13b591098ce8cc8f Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	15 years ago


6547 Commits (4678710c6862602df1bb45ee1035705a0c6a200b)