Its wrong to return null if we are resolving an abbreviation and we
have proven it matches more than one object. We know how to resolve
it if we had more nybbles, as there are two or more objects with the
same prefix. Declare that to the caller quite clearly by giving them
an AmbiguousObjectException.
Change-Id: I01bb48e587e6d001b93da8575c2c81af3eda5a32
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
If we are given a DiffEntry header that already has abbreviated
ObjectIds on it, we may still be able to resolve those locally and
output the difference. Try to do that through the new resolve API
on ObjectReader.
Change-Id: I0766aa5444b7b8fff73620290f8c9f54adc0be96
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Parsing is rewritten to use the size limited form of getCachedBytes,
thus freeing the revwalk infrastructure from needing to care about
a large object vs. a small object when it gets an ObjectLoader.
Right now we hardcode our upper bound for a commit or annotated
tag to be 15 MiB. I don't know of any that is more than 1 MiB in
the wild, so going 15x that should give us some reasonable headroom.
Change-Id: If296c211d8b257d76e44908504e71dd9ba70ffa8
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Rather than duplicating this block everywhere, reuse the limited size
form of getCachedBytes to acquire the content of an object.
Change-Id: I2e26a823e6fd0964d8f8dbfaa0fc2e8834c179c1
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
Some algorithms are coded in a way that requires us to provide them
the entire object contents as a contiguous byte array. The parsers
in RevCommit and RevTag, or our RawText objects are really good
examples of these.
Instead of duplicating this logic everywhere, lets put it into the
base ObjectLoader type. That way the caller only needs to give us
their upper size bound, and we'll do the rest of the heavy work to
figure out if the object still fits within that bound, and get them
an array that has the complete contents.
Change-Id: Id95a7f79d2b97e39f6949370ccca2f2c9cfb1a0f
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
A chunk of code that throws LargeObjectException may or may not have
the specific ObjectId on hand when its thrown. If it does, we want
to cache it in the exception, and put that in the message. If it is
missing we want to be able to set it later from a higher level stack
frame that does have the object handy.
Change-Id: Ife25546158868bdfa886037e4493ef8235ebe4b9
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
If the loader's stream is broken and returns to us more content
than it originally declared as the size of the object, don't
copy that onto the output stream. Instead throw EOFException
and abort fast. This way we don't follow an infinite stream,
but instead will at least stop when the size was reached.
Change-Id: I7ec0c470c875f03b1f12a74a9b4d2f6e73b659bb
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
If the stream is a delta decompression stream, getting the size
can be expensive. Its cheaper to get it from the stream itself
rather than from the object loader.
Change-Id: Ia7f0af98681f6d56ea419a48c6fa8eea09274b28
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
If we can't resolve an abbreviation, it might be because there is
a new pack file we haven't picked up yet. Try scanning the packs
again and recheck each pack if there were differences from the last
scan we did.
Because of this, we don't have to open a pack during the test where
we generate a pack on the fly. We'll miss on the first loop during
which the PackList is the NO_PACKS magic initialization constant,
and pick up the newly created index during this retry logic.
Change-Id: I7b97efb29a695ee60c90818be380f7ea23ad13a3
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
ObjectReader implementations are now responsible for creating the
unique abbreviation of an ObjectId, or for resolving an abbreviation
back to its full form. In this latter case the reader can offer up
multiple candidates to the caller, who may be able to disambiguate
them based on context.
Repository.resolve() doesn't take multiple candidates into account
right now, but it could in the future by looking for a remaining
^0 or ^{commit} suffix and take an expansion if there is only one
commit that matches the input abbreviation. It could also use
the distance from an annotated tag to resolve "tag-NNN-gcommit"
style strings that are often output by `git describe`.
Change-Id: Icd3250adc8177ae05278b858933afdca0cbbdb56
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
ObjectWriter is a deprecated API that people shouldn't be using.
So get rid of it in favor of the ObjectInserter API.
Change-Id: I6218bcb26b6b9ffb64e3e470dba5dca2e0a62fd4
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
This makes it easier for abstract tools like AddCommand to open the
file from the working tree, without knowing internal details about
how the tree is managed.
Change-Id: Ie64a552f07895d67506fbffb3ecf1c1be8a7b407
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Applications should favor the long style interface, especially when
their source input is a long type, e.g. coming from java.io.File.
This way when the index format is later changed to support a
larger file size than 2 GiB we can handle it by just changing the
entry code, and not need to fix a lot of applications.
Change-Id: I332563caeb110014e2d544dc33050ce67ae9e897
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
These objects should be responsible for their own formatting,
rather than delegating it to some obtuse type called ObjectInserter.
While we are at it, simplify the way we insert these into a database.
Passing in the type and calling format in application code turned
out to be a huge mistake in terms of ease-of-use of the insert API.
Change-Id: Id5bb95ee56aa2a002243e9b7853b84ec8df1d7bf
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Since these types no longer support reading, calling them a Builder
is a better description of what they do. They help the caller to
build a commit or a tag object.
Change-Id: I53cae5a800a66ea1721b0fe5e702599df31da05d
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Since we stopped supporting these types for reading, but their
name is a natural candidate for someone to try and use in code,
explain where they should be looking instead.
Change-Id: I091a1b0ef71b842016020f938ba3161431aab9c9
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
There where 3 cases where a JGitInternalExcption was created
without specifying the root cause. This has been fixed.
Change-Id: I2ee08d04732371cd9e30874b1437b61217770b6a
Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>
WorkingTreeIterator now optionally performs CRLF to LF conversion for
text files. A basic framework is left in place to support enabling
(or disabling) this feature based on gitattributes, and also to
support the more generic smudge/clean filter system. As there is
no gitattribute support yet in JGit this is left unimplemented,
but the mightNeedCleaning(), isBinary() and filterClean() methods
will provide reasonable places to plug that into in the future.
[sp: All bugs inside of WorkingTreeIterator are my fault, I wrote
most of it while cherry-picking this patch and building it on
top of Marc's original work.]
CQ: 4419
Bug: 301775
Change-Id: I0ca35cfbfe3f503729cbfc1d5034ad4abcd1097e
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
These classes need to be visible if an application wants to define
its own native pack based protocol embedded within another layer,
much like we already support for smart HTTP.
Change-Id: I7e2ac3ad01d15b94d340128a395fe0b2f560ff35
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
The reuse system used by an object database may be able to benefit
from knowing what objects are coming next, and even improve data
throughput by delaying (or moving up) objects that are stored near
each other in the source database.
Pushing the iteration down into the reuse code makes it possible
for a smarter implementation to aggregate reuse. But for the
standard pack file format on disk we don't bother, its quite
efficient already.
Change-Id: I64f0048ca7071a8b44950d6c2a5dfbca3be6bba6
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Some instances may benefit from having access to memory efficient
storage for some small values, like single flag bits. Give up a
portion of our delta depth field to make 4 bits available to any
subclass that wants it.
This still gives us room for delta chains of 1,048,576 objects,
and that is just insane. Unpacking 1 million objects to get to
something is longer than most users are willing to wait for data
from Git.
Change-Id: If17ea598dc0ddbde63d69a6fcec0668106569125
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
An ObjectReader implementation may be very slow for a single object,
but yet support bulk queries efficiently by batching multiple small
requests into a single larger request. This easily happens when the
reader is built on top of a database that is stored on another host,
as the network round-trip time starts to dominate the operation cost.
RevWalk, ObjectWalk, UploadPack and PackWriter are the first major
users of this new bulk interface, with the goal being to support an
efficient way to pack a repository for a fetch/clone client when the
source repository is stored in a high-latency storage system.
Processing the want/have lists is now done in bulk, to remove
the high costs associated with common ancestor negotiation.
PackWriter already performs object reuse selection in bulk, but it
now can also do the object size lookup and object counting phases
with higher efficiency. Actual object reuse, deltification, and
final output are still doing sequential lookups, making them a bit
more expensive to perform.
Change-Id: I4c966f84917482598012074c370b9831451404ee
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
By giving the reader information about the roots of a revision
traversal, some readers may be able to prefetch information from
their backing store using background threads in order to reduce
data access latency. However this isn't typically necessary so
the default reader implementation doesn't react to the advice.
Change-Id: I72c6cbd05cff7d8506826015f50d9f57d5cda77e
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
ObjectReader implementations may wish to use multiple threads in
order to evaluate object reuse faster. Let the reader make that
decision by passing the iteration down into the reader.
Because the work is pushed into the reader, it may need to locate a
given ObjectToPack given its ObjectId. This can easily occur if the
reader has sent a list of ObjectIds to the object database and gets
back information keyed only by ObjectId, without the ObjectToPack
handle. Expose lookup using the PackWriter's own internal map,
so the reader doesn't need to build a redundant copy to track the
assocation of ObjectId back to ObjectToPack.
Change-Id: I0c536405a55034881fb5db92a2d2a99534faed34
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
When the output stream is deeply buffered (e.g. 1 MiB or more in
an HTTP servlet on some containers) trying to kick out the header
earlier will prevent the client from stalling hard while the first
1 MiB is received and it can process the pack header. Forcing a
flush here lets the client see the header and start its progress
monitor for "Receiving objects: (1/N)" so the user knows there
is still activity occurring, even though the buffering may cause
there to be some lag as the buffer fills up on the sending side.
Change-Id: I3edf39e8f703fe87a738dc236d426b194db85e3a
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Callers catching a MissingObjectException may need programmatic
access to the ObjectId that wasn't available in the repository.
Change-Id: I2be0380251ebe7e4921fa74e246724e48ad88b0e
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Storage implementations or application code using an ObjectReader
may want to access this constant without being inside of a subclass
of the reader.
Change-Id: I6c871a03d5846b9bb899de4d14a265e8b204d8e0
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Storage implementations may find this useful when implementing the
ObjectReuseAsIs interface on their ObjectReader. Expose it so we
don't force them to create a redundant copy of the information.
Change-Id: I802ec8113c00884fccde5d0e92b9849716316f62
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
This permits formatting in hex into an existing byte array
supplied by the caller, and mirrors our copyRawTo method
with the same parameter signature.
Change-Id: Ia078d83e338b09b903bfd2d04284e5283f885a19
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Callers might have a canonical tag encoding on hand that they
wish to convert into a clean structure for presentation purposes,
and the object may not be available in a repository. (E.g. maybe
its a "draft" tag being written in an editor.)
Change-Id: I387a462afb70754aa7ee20891e6c0262438fdf32
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Callers might have a canonical commit encoding on hand that they
wish to convert into a clean structure for presentation purposes,
and the object may not be available in a repository. (E.g. maybe
its a "draft" commit being written in an editor.)
Change-Id: I21759cff337cbbb34dbdde91aec5aa4448a1ef37
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
The Tag class now only supports the creation of an annotated tag
object. To read an annotated tag, applictions should use RevTag.
This permits us to have exactly one implementation, and RevTag's
is faster and more bug-free.
Change-Id: Ib573f7e15f36855112815269385c21dea532e2cf
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
The Commit class now only supports the creation of a commit object.
To read a commit, applictions should use RevCommit. This permits
us to have exactly one implementation, and RevCommit's is faster
and more bug-free.
Change-Id: Ib573f7e15f36855112815269385c21dea532e2cf
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Git doesn't store millisecond accuracy in person identity lines,
so a line that we create in Java and round-trip through a Git object
wouldn't compare as being equal. Truncate to seconds when comparing
values to ensure the same identity is equal.
Change-Id: Ie4ebde64061f52c612714e89ad34de8ac2694b07
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
When we need the canonical form of a commit or a tag in order to
parse it into our RevCommit or RevTag fields, we really need it as a
single contiguous byte array. However the ObjectDatabase may choose
to give us a large loader. In general commits or tags are always
under the several MiB limit, so even if the loader calls it "large"
we should still be able to afford the JVM heap memory required to
get a single byte array. Coerce even large loaders into a single
byte array anyway.
Change-Id: I04efbaa7b31c5f4b0a68fc074821930b1132cfcf
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
ReadTreeTests relied on Repository.getIndex() which on
platforms which coarse FileSystemTimers failed to detect
index modifications. By explicitly reloading and writing
the index this problem is solved.
Change-Id: I0a98babfc2068a3b6b7d2257834988e1154f5b26
Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Since equals() is now final and does not permit being overridden,
we should do the same thing with compareTo() to prevent different
subclasses from having different ordering behaviors. This could
lead to the same mess that we had with different equals() behaviors.
Change-Id: I35a849b6efccee5fe74cc5788a3566a1516004b7
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Since equals() is now final and does not permit being overridden,
we should do the same thing with hashCode() to prevent different
subclasses from having different hashing behaviors. This could
lead to the same mess that we had with different equals() behaviors.
Change-Id: I35a849b6efccee5fe74cc5788a3566a1516004b7
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
When RevObject overrode equals() to provide only reference equality
we used to need to convert a RevObject into an ObjectId by copy()
just to use standard Java tools like JUnit assertEquals(), or to
use contains() or get() on standard java.util collection types.
Now that we have removed this override and made ObjectId's equals()
final (preventing any of this mess in the future), some copy()
calls are unnecessary. Anytime the value is being used as an input
to a lookup routine, or to an equals, we can avoid the copy().
However we still want to use copy() anytime we are given an ObjectId
that may exist long-term, where we don't want the high cost of the
additional storage from a RevCommit extension. So we can't remove
all uses of copy(), just some of them.
Change-Id: Ief275dace435c0ddfa362ac8e5d93558bc7e9fc3
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Configuration change events were not being triggered, now they are
forwarded from the FileConfig up to the Repository's listeners.
Change-Id: Ida94a59f5a2b7fa8ae0126e33c13343275483ee5
Signed-off-by: Mathias Kinzler <mathias.kinzler@sap.com>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
The MergeResult class is enhanced to report more data about a
three-way merge. Information about conflicts and the base, ours,
theirs commits can be retrived.
Change-Id: Iaaf41a1f4002b8fe3ddfa62dc73c787f363460c2
Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>
Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
We should be more lenient when tagging without an
tagger or message. Currently, we will throw an NPE
which is incorrect behavior.
Change-Id: I04e30ce25a9432e4ca56c3f29658ecb24fb18d24
Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
There was a duplicated getter and setter for tagger in Tag.
There's no needed to have two getters and setters that represent
the same things. The appropriate tests were updated also.
Change-Id: If46dc00c4c0f31ea4234c6d3bda3c03e6ebbafac
Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
indexState() encodes the complete state of the index
into one readable String. This helps to write tests
against the index. indexState() is enhanced to optionally
also contain the content of the files in the index.
Change-Id: Ie988f93768d864f4cbd55809a786bd5759fc24a5
Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>