If a file fails to index the first time the loop encounters it, the
file is likely to fail to index again on the next row. Rather than
wasting a huge amount of CPU to index it again and fail, remember
which destination files failed to index and skip over them on each
subsequent row.
Because this condition is very unlikely, avoid allocating the BitSet
until its actually needed. This keeps the memory usage unaffected
for the common case.
Change-Id: I93509b28b61a9bba8f681a7b4df4c6127bca2a09
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
The counter portion of each pair is only 32 bits wide, but is part
of a larger 64 bit integer. If the file size was larger than 4 GB
the counter could overflow and impact the key, changing the hash,
and later resulting in an incorrect similarity score.
Guard against this overflow condition by capping the count for each
record at 2^32-1. If any record contains more than that many bytes
the table aborts hashing and throws TableFullException.
This permits the index to scan and work on files that exceed 4 GB
in size, but only if the file contains more than one unique block.
The index throws TableFullException on a 4 GB file containing all
zeros, but should succeed on a 6 GB file containing unique lines.
The index now uses a 64 bit accumulator during the common scoring
algorithm, possibly resulting in slower summations. However this
index is already heavily dependent upon 64 bit integer operations
being efficient, so increasing from 32 bits to 64 bits allows us
to correctly handle 6 GB files.
Change-Id: I14e6dbc88d54ead19336a4c0c25eae18e73e6ec2
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Files bigger than 8 MB (2^23 bytes) tended to overflow the internal
hashtable, as the table was capped in size to 2^17 records. If a
file contained 2^17 unique data blocks/lines, the table insertion
got stuck in an infinite loop as the able couldn't grow, and there
was no open slot for the new item.
Remove the artifical 2^17 table limit and instead allow the table
to grow to be as big as 2^30. With a 64 byte block size, this
permits hashing inputs as large as 64 GB.
If the table reaches 2^30 (or cannot be allocated) hashing is
aborted. RenameDetector no longer tries to break a modify file pair,
and it does not try to match the file for rename or copy detection.
Change-Id: Ibb4d756844f4667e181e24a34a468dc3655863ac
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
This comment was wrong, due to a copy-and-paste error. Here the
code is looking at records of dst that do not exist in src, and
are skipping past them to find another match.
Change-Id: I07c1fba7dee093a1eeffcf7e0c7ec85446777ffb
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
In order to safely edit a notes tree, NoteMap needs to retain any
non-note tree entries it read from the source tree and put them
back out into the modified tree when it commits a new version of
the note branch.
Remember any tree entries that didn't look like a note during
the parsing of the tree, so they can be put into a TreeFormatter
later when the tree writes to the repository.
Change-Id: Ia284af7e7866da35db35374c6c5869f00c857944
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Instead of reading a note tree recursively up front when the NoteMap
is loaded, read only the root tree and load subtrees on demand when
they are accessed by the application. This gives a lower latency
to read a note for the recent commits on a branch, as only the paths
that are needed get read.
Given a 2/38 style fanout, the tree will fully load when 256 objects
have been accessed by the application. But unlike the prior version
of NoteMap, the NoteMap will load faster and answer lookups sooner,
as the loading time for all 256 levels is spread out across each of
the get() requests.
Given a 2/2/36 style fanout, the tree won't need to fully load until
about 65,536 objects are accessed.
To simplify the implementation we only support the flat layout (all
notes in the top level tree), or a 2/38, 2/2/36, 2/2/2/34, through
2/.../2 style fanout. Unlike C Git we don't support reading the old
experimental 4/36 fanout. This is sufficient because C Git won't
create the 4/36 style fanout when creating or updating a notes tree,
and there really aren't any in the wild today.
Change-Id: I6099b35916a8404762f31e9c11f632e43e0c1bfd
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
The NoteMap makes it easy to read a small notes tree as created by
the `git notes` command in C Git. To make the initial implementation
simple a notes tree is read recursively into a map in memory.
This is reasonable if the application will need to access all notes,
or if there are less than 256 notes in the tree, but doesn't behave
well when the number of notes exceeds 256 and the application
doesn't need to access all of them.
We can later add support for lazily loading different subpaths,
thus fixing the large note tree problem described above.
Currently the implementation only supports reading. Writing notes
is more complex because trees need to be expanded or collapsed at
the exact 256 entry cut-off in order to retain the same tree SHA-1
that C Git would use for the same content. It also needs to retain
non-note tree entries such as ".gitignore" or ".gitattribute" files
that might randomly appear within a notes tree. We can also add
writing support later.
Change-Id: I93704bd84ebf650d51de34da3f1577ef0f7a9144
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
Instead of configuring the JSch session factory, configure a more
generic CredentialsProvider, which will work for other transport
types such as http, in addition to the existing ssh.
Change-Id: I22b13303c17e654ba6720edf4be2ef15fe29537a
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
When setting up an SSH connection, use the caller supplied
CredentialsProvider, if one has been given to the Transport
or was defined as the default.
The CredentialsProvider is re-wrapped as a JSch UserInfo,
allowing the connection to use this for user interactive
prompts. This give a unified API for authentication on
any transport type.
Change-Id: Id3b4cf5bfd27a23207cdfb188bae3b78e71e02c0
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
This permits applications to set their preferred credentials UI
implementation once, rather than needing to define it on every
single Transport instance they open.
Change-Id: I010550de1a6becab27f7aa5a9901df5a1c7e74bd
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
This change is based on http://egit.eclipse.org/r/#change,1652
by David Green. The change adds the concept of a CredentialsProvider
which can be registered for git transports and which is
responsible to return credential-related data like passwords and
usernames. Whenenver the transports detects that an authentication
with certain credentials has to be done it will ask the
CredentialsProvider for this data. Foreseen implementations for
such a Provider may be a EGitCredentialsProvider (caching
credential data entered e.g. in the Clone-Wizzard) or a NetRcProvider
(gathering data out of ~/.netrc file).
Bug: 296201
Change-Id: Ibe13e546b45eed3e193c09ecb414bbec2971d362
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>
Signed-off-by: Stefan Lay <stefan.lay@sap.com>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
CC: David Green <dgreen99@gmail.com>
The auth-scheme token (like "Basic" or "Digest") is not specified in a
case sensitive way. RFC2617 (http://tools.ietf.org/html/rfc2617) specifies
in section 1.2 the use of a "case-insensitive token to identify the
authentication scheme". Jetty, for example, uses "basic" as token.
Change-Id: I635a94eb0a741abcb3e68195da6913753bdbd889
Signed-off-by: Stefan Lay <stefan.lay@sap.com>
The ObjectId (for a ref) can be easily reformatted into a temporary
byte[] and then passed off to write(byte[]), removing the duplicated
code that existed in both write methods.
Change-Id: I09740658e070d5f22682333a2e0d325fd1c4a6cb
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Test was broken by commit b087bba3 changing formatting of merge
commit messages.
Change-Id: I98b1b936b9b6cbaa50fbc59d243a43e66a6ee9f9
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
We stopped handling URIs such as "example.com:/some/p ath", because
this was confused with the Windows absolute path syntax of "c:/path".
Support absolute style scp URIs again, but only when the host name
is more than 2 characters long.
Change-Id: I9ab049bc9aad2d8d42a78c7ab34fa317a28efc1a
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
This reverts commit 1e510ec20e.
Instead work around the warning by defining our constant by
constructing it through a StringBuilder.
Change-Id: If139509e769d649609c62eff359ebaea5dd286b2
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
CC: Matthias Sohn <matthias.sohn@sap.com>
CC: Chris Aniszczyk <caniszczyk@gmail.com>
IndexDiff was extended to detect files which are both removed from the
index and untracked. Before this change these files were only added
to the removed collection.
Change-Id: I971d8261d2e8932039fce462b59c12e143f79f90
Signed-off-by: Jens Baumgart <jens.baumgart@sap.com>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
jgit.sh <command> --help was not working for the commands Diff
and ShowCommands because of missing metaVar information. Missing
information is added here.
Change-Id: I0ab7e35006b6aa7d4326a634309dddfcdb78f2a6
Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>
Implementation delegates all work to the AddCommand class and,
therefore, supports only those options currently supported by the
AddCommand which means: --update and the filepattern... arguments.
Change-Id: I4827d37e08b4c988c2458d9ba60a61b6ad414d10
Signed-off-by: Sasa Zivkov <sasa.zivkov@sap.com>
The code analyzer can't know that passing a value known to be null is
not a problem. Hence better pass null explicitly instead of the
parameters being null.
Change-Id: I8db6f8014de6c00dd95974d60f61ecc66191e6d4
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
There was a bug in ResolveMerger which is one reason for
bug 328841. If a merge was failing because of conflicts
deletions where not handled correctly. Files which have
to be deleted (because there was a non-conflicting deletion
coming in from THEIRS) are not deleted. In the
non-conflicting case we also forgot to delete the file but
in this case we explicitly checkout in the end these files
get deleted during that checkout.
This is fixed by handling incoming deletions explicitly.
Bug: 328841
Change-Id: I7f4c94ab54138e1b2f3fcdf34fb803d68e209ad0
Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>
The automatically generated commit message of a merge should have the
same structure as in C Git for consistency (as per git fmt-merge-msg).
Before this change:
merging refs/heads/a into refs/heads/master
After:
Merge branch 'a'
Plurals, "into" and joining by "," and "and" also work.
Change-Id: I9658ce2817adc90d2df1060e8ac508d7bd0571cb
When --git-dir=X is given JGit creates a bare repository in the
directory X. However, when the --bare option is not explicitly
given, this is not properly reflected in the X/config file i.e.
the bare=true is missing. This change fixes this minor issue.
Signed-off-by: Sasa Zivkov <sasa.zivkov@sap.com>
There were also some compiler warning due to empty catch blocks that
were fixed.
Change-Id: I165bcddcdfacd34f020d1b938a41954916eb106e
Signed-off-by: Mathias Kinzler <mathias.kinzler@sap.com>
This mirrors the getByte() API in ObjectId and allows the caller to
modify a single byte, which is useful when updating it as part of a
loop walking through 0x00..0xff inside of a range of objects.
Change-Id: I57fa8420011fe5ed5fc6bfeb26f87a02b3197dab
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Processing git notes requires random access to part of the raw data
of each ObjectId... which isn't easy because ObjectIds are stored
with an internal representation of 5 ints. Expose random access
to the individual data bytes through new methods, avoiding the
need to convert first to a byte[20].
Change-Id: I99e64700b27fc0c95aa14ef8ad46a0e8832d4441
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Instead of hiding this logic inside of DirCacheTree and the legacy
Tree type, pull it into a common place where we can reuse it by
creating tree records in a buffer that can be passed directly into
the ObjectInserter. This allows us to avoid some copying, as the
inserter can be given the internal buffer of the formatter.
Because we trust these two callers to feed us records in the proper
order, without '/' in the names, and without duplicate names in the
same tree, we don't do any validation inside of the formatter itself.
To protect themselves from making ordering errors, developers should
continue to use DirCache to process edits to source code trees.
Change-Id: Idf7f10e736d4a44ccdf8afe060535d7b0554a92f
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>