@@ -45,11 +45,11 @@ package org.eclipse.jgit.diff;
 
 /**
  * An extended form of Bram Cohen's patience diff algorithm.
- *
+ * <p>
  * This implementation was derived by using the 4 rules that are outlined in
  * Bram Cohen's <a href="http://bramcohen.livejournal.com/73318.html">blog</a>,
  * and then was further extended to support low-occurrence common elements.
- *
+ * <p>
  * The basic idea of the algorithm is to create a histogram of occurrences for
  * each element of sequence A. Each element of sequence B is then considered in
  * turn. If the element also exists in sequence A, and has a lower occurrence
@@ -58,34 +58,34 @@ package org.eclipse.jgit.diff;
  * lowest number of occurrences is chosen as a split point. The region is split
  * around the LCS, and the algorithm is recursively applied to the sections
  * before and after the LCS.
- *
+ * <p>
  * By always selecting a LCS position with the lowest occurrence count, this
  * algorithm behaves exactly like Bram Cohen's patience diff whenever there is a
  * unique common element available between the two sequences. When no unique
  * elements exist, the lowest occurrence element is chosen instead. This offers
  * more readable diffs than simply falling back on the standard Myers' O(ND)
  * algorithm would produce.
- *
+ * <p>
  * To prevent the algorithm from having an O(N^2) running time, an upper limit
  * on the number of unique elements in a histogram bucket is configured by
  * {@link #setMaxChainLength(int)}. If sequence A has more than this many
  * elements that hash into the same hash bucket, the algorithm passes the region
  * to {@link #setFallbackAlgorithm(DiffAlgorithm)}. If no fallback algorithm is
  * configured, the region is emitted as a replace edit.
- *
+ * <p>
  * During scanning of sequence B, any element of A that occurs more than
  * {@link #setMaxChainLength(int)} times is never considered for an LCS match
  * position, even if it is common between the two sequences. This limits the
  * number of locations in sequence A that must be considered to find the LCS,
  * and helps maintain a lower running time bound.
- *
+ * <p>
  * So long as {@link #setMaxChainLength(int)} is a small constant (such as 64),
  * the algorithm runs in O(N * D) time, where N is the sum of the input lengths
  * and D is the number of edits in the resulting EditList. If the supplied
  * {@link SequenceComparator} has a good hash function, this implementation
  * typically out-performs {@link MyersDiff}, even though its theoretical running
  * time is the same.
- *
+ * <p>
  * This implementation has an internal limitation that prevents it from handling
  * sequences with more than 268,435,456 (2^28) elements.
  */
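The Javadoc in this patch describes the histogram strategy only in prose. Below is a minimal Java sketch of that idea under simplifying assumptions. It is not JGit's `HistogramDiff`.

- It compares `List<String>` elements with `equals` and keys an exact-match `HashMap`, instead of using a `SequenceComparator` and a hash-bucket chain.
- It treats any element that occurs more than `maxChainLength` times in the A region as ineligible for a match.
- It has no fallback algorithm. Any region without a usable common element is therefore emitted as a single replace edit.

The class name `HistogramSketch` and its `Edit` record are hypothetical. The sketch needs Java 16+ for records.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified sketch of the histogram strategy; not JGit's HistogramDiff.
final class HistogramSketch {
    /** Replace region: a[beginA, endA) becomes b[beginB, endB). */
    record Edit(int beginA, int endA, int beginB, int endB) {}

    private final int maxChainLength;

    HistogramSketch(int maxChainLength) {
        this.maxChainLength = maxChainLength;
    }

    List<Edit> diff(List<String> a, List<String> b) {
        List<Edit> edits = new ArrayList<>();
        diff(a, b, 0, a.size(), 0, b.size(), edits);
        return edits;
    }

    private void diff(List<String> a, List<String> b,
            int beginA, int endA, int beginB, int endB, List<Edit> edits) {
        if (beginA == endA && beginB == endB) {
            return; // both regions empty: nothing to report
        }
        if (beginA == endA || beginB == endB) {
            edits.add(new Edit(beginA, endA, beginB, endB)); // pure insert or delete
            return;
        }
        // Histogram of the A region: element -> positions where it occurs.
        Map<String, List<Integer>> hist = new HashMap<>();
        for (int i = beginA; i < endA; i++) {
            hist.computeIfAbsent(a.get(i), k -> new ArrayList<>()).add(i);
        }
        int[] best = null; // {startA, endA, startB, endB} of the chosen common run
        int bestCnt = maxChainLength + 1;
        int bestLen = 0;
        for (int j = beginB; j < endB; j++) {
            List<Integer> positions = hist.get(b.get(j));
            // Skip elements absent from A, too common (over maxChainLength),
            // or more common than the best run found so far.
            if (positions == null || positions.size() > maxChainLength
                    || positions.size() > bestCnt) {
                continue;
            }
            for (int i : positions) {
                int sa = i, sb = j, ea = i + 1, eb = j + 1;
                // Grow the match backwards and forwards inside the region.
                while (sa > beginA && sb > beginB && a.get(sa - 1).equals(b.get(sb - 1))) {
                    sa--;
                    sb--;
                }
                while (ea < endA && eb < endB && a.get(ea).equals(b.get(eb))) {
                    ea++;
                    eb++;
                }
                // A run is weighted by the occurrence count of its rarest element.
                int cnt = Integer.MAX_VALUE;
                for (int k = sa; k < ea; k++) {
                    cnt = Math.min(cnt, hist.get(a.get(k)).size());
                }
                // Lowest count wins; ties go to the longer run.
                if (cnt < bestCnt || (cnt == bestCnt && ea - sa > bestLen)) {
                    best = new int[] {sa, ea, sb, eb};
                    bestCnt = cnt;
                    bestLen = ea - sa;
                }
            }
        }
        if (best == null) {
            // No usable common element and no fallback: emit one replace edit.
            edits.add(new Edit(beginA, endA, beginB, endB));
            return;
        }
        // Split around the chosen run and recurse on the sections before and after it.
        diff(a, b, beginA, best[0], beginB, best[2], edits);
        diff(a, b, best[1], endA, best[3], endB, edits);
    }

    public static void main(String[] args) {
        HistogramSketch d = new HistogramSketch(64);
        System.out.println(d.diff(List.of("a", "b", "c", "d"), List.of("a", "x", "c", "d")));
        // prints [Edit[beginA=1, endA=2, beginB=1, endB=2]]
    }
}
```

The sketch picks the run whose rarest element has the lowest occurrence count. This is why a unique shared element acts as an anchor whenever one exists, which is the patience-diff behaviour the Javadoc describes. With `new HistogramSketch(2)`, three copies of `"z"` in A exceed the limit, so the whole region becomes one replace edit. The real class bounds hash-bucket chains rather than exact-element counts, and it can instead hand such a region to the configured fallback algorithm.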
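The Javadoc states the 2^28-element cap without saying where it comes from. One common way such a cap arises is packing several fields into a single 64-bit `long`. Here those fields are a chain link, an element position, and a saturating occurrence count, using 28 + 28 + 8 = 64 bits. This is a purely hypothetical illustration: the field layout and the `PackedRecord` name are assumptions and are not taken from this patch. Under this layout, a 28-bit position field addresses positions 0 through 268,435,455, which is exactly 268,435,456 elements.

```java
// Hypothetical layout, for illustration only: [next:28 | ptr:28 | count:8] in one long.
final class PackedRecord {
    static final int CNT_BITS = 8;
    static final int PTR_BITS = 28;
    static final int CNT_MASK = (1 << CNT_BITS) - 1;   // 255
    static final int PTR_MASK = (1 << PTR_BITS) - 1;   // 268,435,455
    static final int NEXT_SHIFT = PTR_BITS + CNT_BITS; // 36, leaving 28 bits for next

    static long pack(int next, int ptr, int cnt) {
        if (next < 0 || next > PTR_MASK || ptr < 0 || ptr > PTR_MASK) {
            throw new IllegalArgumentException("index does not fit in 28 bits");
        }
        int c = Math.min(Math.max(cnt, 0), CNT_MASK); // counts saturate at 255
        return ((long) next << NEXT_SHIFT) | ((long) ptr << CNT_BITS) | c;
    }

    static int next(long rec) { return (int) (rec >>> NEXT_SHIFT); }

    static int ptr(long rec) { return (int) (rec >>> CNT_BITS) & PTR_MASK; }

    static int cnt(long rec) { return (int) rec & CNT_MASK; }

    public static void main(String[] args) {
        // Positions 0 .. PTR_MASK fit, i.e. at most 2^28 elements.
        System.out.println(PTR_MASK + 1); // prints 268435456
    }
}
```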