If UploadPack invokes flush() on the output stream we pass it, its
most likely the progress messages coming down the side band stream.
As pack generation can take a while, we want to push that down
at the client as early as we can, to keep the connection alive,
and to let the user know we are still working on their behalf.
Ensure we dump the temporary buffer whenever flush() is invoked,
otherwise the messages don't get sent in a timely fashion to the
user agent (in this case, git fetch).
We specifically don't implement flush() for ReceivePack right now,
as that protocol currently does not provide progress messages to
the user, but it does invoke flush several times, as the different
streams include '0000' type flush-pkts to denote various end points.
Change-Id: I797c90a2c562a416223dc0704785f61ac64e0220
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Created wrong tags for 0.8.3 hence creating another version.
Change-Id: I4e00bbcffe1cf872e2d7e3f3d88d068701fb5330
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
On Windows, FS_Win32_Cygwin has been used if a Cygwin Git installation
is present in the PATH. Assuming that the user works with the Cygwin
Git installation may result in unnecessary overhead if he actually
does not.
Applications built on top of jgit may have more knowledge on the
actually used Git client (Cygwin or not) and hence should be able to
configure which FS to use accordingly.
Change-Id: Ifc4278078b298781d55cf5421e9647a21fa5db24
The J2SE NIO APIs require that FileChannel close the underlying file
descriptor if a thread is interrupted while it is inside of a read or
write operation on that channel. This is insane, because it means we
cannot share the file descriptor between threads. If a thread is in
the middle of the FileChannel variant of IO.readFully() and it
receives an interrupt, the pack will be automatically closed on us.
This causes the other threads trying to use that same FileChannel to
receive IOExceptions, which leads to the pack getting marked as
invalid. Once the pack is marked invalid, JGit loses access to its
entire contents and starts to report MissingObjectExceptions.
Because PackWriter must ensure that the chosen pack file stays
available until the current object's data is fully copied to the
output, JGit cannot simply reopen the pack when its automatically
closed due to an interrupt being sent at the wrong time. The pack may
have been deleted by a concurrent `git gc` process, and that open file
descriptor might be the last reference to the inode on disk. Once its
closed, the PackWriter loses access to that object representation, and
it cannot complete sending the object the client.
Fortunately, RandomAccessFile's readFully method does not have this
problem. Interrupts during readFully() are ignored. However, it
requires us to first seek to the offset we need to read, then issue
the read call. This requires locking around the file descriptor to
prevent concurrent threads from moving the pointer before the read.
This reduces the concurrency level, as now only one window can be
paged in at a time from each pack. However, the WindowCache should
already be holding most of the pages required to handle the working
set for a process, and its own internal locking was already limiting
us on the number of concurrent loads possible. Provided that most
concurrent accesses are getting hits in the WindowCache, or are for
different repositories on the same server, we shouldn't see a major
performance hit due to the more serialized loading.
I would have preferred to use a pool of RandomAccessFiles for each
pack, with threads borrowing an instance dedicated to that thread
whenever they needed to page in a window. This would permit much
higher levels of concurrency by using multiple file descriptors (and
file pointers) for each pack. However the code became too complex to
develop in any reasonable period of time, so I've chosen to retrofit
the existing code with more serialization instead.
Bug: 308945
Change-Id: I2e6e11c6e5a105e5aef68871b66200fd725134c9
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
The strings are externalized into the root resource bundles.
The resource bundles are stored under the new "resources" source
folder to get proper maven build.
Strings from tests are, in general, not externalized. Only in
cases where it was necessary to make the test pass the strings
were externalized. This was typically necessary in cases where
e.getMessage() was used in assert and the exception message was
slightly changed due to reuse of the externalized strings.
Change-Id: Ic0f29c80b9a54fcec8320d8539a3e112852a1f7b
Signed-off-by: Sasa Zivkov <sasa.zivkov@sap.com>
Since the API is changing relative to 0.7.0, we'll call our next
release 0.8.1. But until that gets released, builds from master
will be 0.8.0.qualifier.
Change-Id: I921e984f51ce498610c09e0db21be72a533fee88
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
The HTTP server side code now uses the same approach that the smart
HTTP client code uses when preparing a request body. The payload
is streamed into a TemporaryBuffer of limited size. If the entire
data fits, its compressed with gzip if the user agent supports that,
and a Content-Length header is used to transmit the fixed length
body to the peer. If however the data overflows the limited memory
segment, its streamed uncompressed to the peer.
One might initially think that larger contents which overflow
the buffer should also be compressed, rather than sent raw, since
they were deemed "large". But usually these larger contents are
actually a pack file which has been already heavily compressed by
Git specific routines. Trying to deflate that with gzip is probably
going to take up more space, not less, so the compression overhead
isn't worthwhile.
This buffer and compress optimization helps repositories with a
large number of references, as their text based advertisements
compress well. For example jgit's own native repository currently
requires 32,628 bytes for its full advertisement of 489 references.
Most repositories have fewer references, and thus could compress
their entire response in one buffer.
Change-Id: I790609c9f763339e0a1db9172aa570e29af96f42
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Actually set the range of versions we are willing to accept for
each package we import, lest we import something in the future
that isn't compatible with our needs.
Change-Id: I25dbbb9eaabe852631b677e0c608792b3ed97532
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Technically our project name is "JGit", not "Java Git". In fact
there is already another project called "JavaGit" (no space) that we
don't want to become confused with. Ensure we always call ourselves
"JGit" in user visible assets, like the bundle name.
Other Eclipse products list their provider as "Eclipse.org",
not "eclipse.org". So list ourselves that way in all of our
plugin.properties files.
Change-Id: Ibcea1cd6dda2af757a8584099619fc23b7779a84
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Don't copy and sort the set of references if they are passed through
in a RefMap or a SortedMap using the key's natural sort ordering.
Either map is already in the order we want to present the items
to the client in, so copying and sorting is a waste of local CPU
and memory.
Change-Id: I49ada7c1220e0fc2a163b9752c2b77525d9c82c1
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
This commit actually does three major changes to the way references
are handled within JGit. Unfortunately they were easier to do as
a single massive commit than to break them up into smaller units.
Disambiguate symbolic references:
---------------------------------
Reporting a symbolic reference such as HEAD as though it were
any other normal reference like refs/heads/master causes subtle
programming errors. We have been bitten by this error on several
occasions, as have some downstream applications written by myself.
Instead of reporting HEAD as a reference whose name differs from
its "original name", report it as an actual SymbolicRef object
that the application can test the type and examine the target of.
With this change, Ref is now an abstract type with different
subclasses for the different types.
In the classical example of "HEAD" being a symbolic reference to
branch "refs/heads/master", the Repository.getAllRefs() method
will now return:
Map<String, Ref> all = repository.getAllRefs();
SymbolicRef HEAD = (SymbolicRef) all.get("HEAD");
ObjectIdRef master = (ObjectIdRef) all.get("refs/heads/master");
assertSame(master, HEAD.getTarget());
assertSame(master.getObjectId(), HEAD.getObjectId());
assertEquals("HEAD", HEAD.getName());
assertEquals("refs/heads/master", master.getName());
A nice side-effect of this change is the storage type of the
symbolic reference is no longer ambiguous with the storge type
of the underlying reference it targets. In the above example,
if master was only available in the packed-refs file, then the
following is also true:
assertSame(Ref.Storage.LOOSE, HEAD.getStorage());
assertSame(Ref.Storage.PACKED, master.getStorage());
(Prior to this change we returned the ambiguous storage of
LOOSE_PACKED for HEAD, which was confusing since it wasn't
actually true on disk).
Another nice side-effect of this change is all intermediate
symbolic references are preserved, and are therefore visible
to the application when they walk the target chain. We can
now correctly inspect chains of symbolic references.
As a result of this change the Ref.getOrigName() method has been
removed from the API. Applications should identify a symbolic
reference by testing for isSymbolic() and not by using an arcane
string comparsion between properties.
Abstract the RefDatabase storage:
---------------------------------
RefDatabase is now abstract, similar to ObjectDatabase, and a
new concrete implementation called RefDirectory is used for the
traditional on-disk storage layout. In the future we plan to
support additional implementations, such as a pure in-memory
RefDatabase for unit testing purposes.
Optimize RefDirectory:
----------------------
The implementation of the in-memory reference cache, reading, and
update routines has been completely rewritten. Much of the code
was heavily borrowed or cribbed from the prior implementation,
so copyright notices have been left intact as much as possible.
The RefDirectory cache no longer confuses symbolic references
with normal references. This permits the cache to resolve the
value of a symbolic reference as late as possible, ensuring it
is always current, without needing to maintain reverse pointers.
The cache is now 2 sorted RefLists, rather than 3 HashMaps.
Using sorted lists allows the implementation to reduce the
in-memory footprint when storing many refs. Using specialized
types for the elements allows the code to avoid additional map
lookups for auxiliary stat information.
To improve scan time during getRefs(), the lists are returned via
a copy-on-write contract. Most callers of getRefs() do not modify
the returned collections, so the copy-on-write semantics improves
access on repositories with a large number of packed references.
Iterator traversals of the returned Map<String,Ref> are performed
using a simple merge-join of the two cache lists, ensuring we can
perform the entire traversal in linear time as a function of the
number of references: O(PackedRefs + LooseRefs).
Scans of the loose reference space to update the cache run in
O(LooseRefs log LooseRefs) time, as the directory contents
are sorted before being merged against the in-memory cache.
Since the majority of stable references are kept packed, there
typically are only a handful of reference names to be sorted,
so the sorting cost should not be very high.
Locking is reduced during getRefs() by taking advantage of the
copy-on-write semantics of the improved cache data structure.
This permits concurrent readers to pull back references without
blocking each other. If there is contention updating the cache
during a scan, one or more updates are simply skipped and will
get picked up again in a future scan.
Writing to the $GIT_DIR/packed-refs during reference delete is
now fully atomic. The file is locked, reparsed fresh, and written
back out if a change is necessary. This avoids all race conditions
with concurrent external updates of the packed-refs file.
The RefLogWriter class has been fully folded into RefDirectory
and is therefore deleted. Maintaining the reference's log is
the responsiblity of the database implementation, and not all
implementations will use java.io for access.
Future work still remains to be done to abstract the ReflogReader
class away from local disk IO.
Change-Id: I26b9287c45a4b2d2be35ba2849daa316f5eec85d
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Translate the version qualifier using maven-antrun-plugin since we want
manifest-first and currently cannot rely on Tycho for the JGit build.
Introduce property for Eclipse p2 repository to enable builds against
other Eclipse versions.
Change-Id: I62c4e77ae91fe17f56c5a5338d53828d4e225395
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Clients can request smart fetch support by examining the info/refs URL
with the service parameter set to the magic git-upload-pack string:
GET /$GIT_DIR/info/refs?service=git-upload-pack HTTP/1.1
The response is formatted with the upload pack capabilities, using
the standard packet line formatter. A special header line is put
in front of the standard upload-pack advertisement to let clients
know the service was recognized and is supported.
If the requested service is disabled an authorization status code is
returned, allowing the user agent to retry once they have obtained
credentials from a human, in case authentication is required by
the configured UploadPackFactory implementation.
Change-Id: Ib0f1a458c88b4b5509b0f882f55f83f5752bc57a
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Clients can request smart push support by examining the info/refs URL
with the service parameter set to the magic git-receive-pack string:
GET /$GIT_DIR/info/refs?service=git-receive-pack HTTP/1.1
The response is formatted with the receive pack capabilities, using
the standard packet line formatter. A special header block is put
in front of the standard receive-pack advertisement to let clients
know the service was recognized and is supported.
If the requested service is disabled an authorization status code is
returned, allowing the user agent to retry once they have obtained
credentials from a human, in case authentication is required by
the configured ReceivePackFactory implementation.
Change-Id: Ie4f6e0c7b68a68ec4b7cdd5072f91dd406210d4f
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
This is a simple HTTP server that provides the minimum server side
support required for dumb (non-git aware) transport clients.
We produce the info/refs and objects/info/packs file on the fly
from the local repository state, but otherwise serve data as raw
files from the on-disk structure.
In the future we could better optimize the FileSender class and the
servlets that use it to take advantage of direct file to network
APIs in more advanced servlet containers like Jetty.
Our glue package borrows the idea of a micro embedded DSL from
Google Guice and uses it to configure a collection of Filters
and HttpServlets, all of which are matched against requests using
regular expressions. If a subgroup exists in the pattern, it is
extracted and used for the path info component of the request.
Change-Id: Ia0f1a425d07d035e344ae54faf8aeb04763e7487
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>