Smart HTTP clients may request both multi_ack_detailed and no-done in
the same request to prevent the client from needing to send a "done"
line to the server in response to a server's "ACK %s ready".
For smart HTTP, this can save 1 full HTTP RPC in the fetch exchange,
improving overall latency when incrementally updating a client that
has not diverged very far from the remote repository.
Unfortunately, this capability cannot be enabled for the traditional
bi-directional connections. multi_ack_detailed has the client sending
more "have" lines at the same time that the server is creating the
"ACK %s ready" and writing out the PACK stream, resulting in some race
conditions and/or deadlock, depending on how the pipe buffers are
implemented. For very small updates, a server might actually be able
to send "ACK %s ready", then the PACK, and disconnect before the
client even finishes sending its first batch of "have" lines. This
may cause the client to fail with a broken pipe exception. To avoid
all of these potential problems, "no-done" is restricted only to the
smart HTTP variant of the protocol.
Change-Id: Ie0d0a39320202bc096fec2e97cb58e9efd061b2d
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
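As a rough illustration of the restriction described above, the check below enables no-done only for stateless (smart HTTP) requests that also ask for multi_ack_detailed. This is a minimal sketch under assumptions: NoDonePolicy and allowNoDone are hypothetical names, not JGit's UploadPack API.

```java
import java.util.Set;

/** Hypothetical helper deciding whether the "no-done" optimization may be used. */
final class NoDonePolicy {
    private NoDonePolicy() {}

    static boolean allowNoDone(boolean statelessRpc, Set<String> clientCapabilities) {
        // "ACK %s ready" (which lets the client skip "done") comes from
        // multi_ack_detailed; only stateless HTTP avoids the pipe races.
        return statelessRpc
                && clientCapabilities.contains("multi_ack_detailed")
                && clientCapabilities.contains("no-done");
    }
}
```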
To run the static checks, run:
mvn -P static-checks clean install
Change-Id: I14077498a04be986ded123ddbfc97da8f9bc3130
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
When the client is clearly making a smart HTTP request to our smart
HTTP server, return any errors like RepositoryNotFoundException or
ServiceNotEnabledException inside of the payload as a Git level ERR
message, rather than an HTTP error code.
This prevents the C Git command line client from retrying a failed
"$URL/info/refs?service=git-upload-pack" request without the smart
service URL, only to fail again with "403 Forbidden" when the dumb
as-is service has been disabled by the server configuration, or is
unavailable because the repository is not on the local filesystem.
Change-Id: I57e8756d5026e885e0ca615979bfcd729703be6c
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
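A Git-level ERR message travels as an ordinary pkt-line: four hex digits giving the total length (including those four bytes), followed by the payload. The sketch below is a hypothetical helper, not JGit's own code, showing how such a line might be encoded; the 65520-byte ceiling is the maximum pkt-line size from Git's protocol documentation.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical pkt-line encoder used to illustrate an "ERR" reply. */
final class PktLine {
    // Largest pkt-line, header included, allowed by Git's protocol docs.
    private static final int MAX_PKT_LEN = 65520;

    private PktLine() {}

    static byte[] encode(String payload) {
        byte[] body = payload.getBytes(StandardCharsets.UTF_8);
        int len = body.length + 4; // the length prefix counts its own 4 bytes
        if (len > MAX_PKT_LEN) {
            throw new IllegalArgumentException("payload too long for one pkt-line");
        }
        byte[] out = new byte[len];
        byte[] header = String.format("%04x", len).getBytes(StandardCharsets.US_ASCII);
        System.arraycopy(header, 0, out, 0, 4);
        System.arraycopy(body, 0, out, 4, body.length);
        return out;
    }

    /** Wraps an error description as a Git-level "ERR" pkt-line. */
    static byte[] error(String message) {
        return encode("ERR " + message);
    }
}
```

For example, PktLine.error("repository not found") produces the bytes "001cERR repository not found".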
Embedding applications can use this hook to watch actions within
UploadPack and possibly reject them. This could be useful to prevent
clones of a large repository from this server, or to stop abusive
negotiation rounds that offer thousands of objects in a single batch.
Change-Id: Id96f1885ac4d61f22c80b6418fff54184b7348ba
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
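To make the idea of such a hook concrete, here is a minimal sketch of a callback that rejects negotiation rounds offering too many objects at once. NegotiationHook, HaveLimitHook and HookRejectedException are all hypothetical stand-ins; JGit's real hook interface has its own names and signatures.

```java
/** Hypothetical hook invoked before each negotiation round; throwing rejects it. */
interface NegotiationHook {
    void onNegotiateRound(int wantCount, int haveCount) throws HookRejectedException;
}

/** Hypothetical exception used to abort the upload. */
final class HookRejectedException extends Exception {
    HookRejectedException(String message) {
        super(message);
    }
}

/** Stops abusive rounds that offer an unreasonable number of "have" lines. */
final class HaveLimitHook implements NegotiationHook {
    private final int maxHavesPerRound;

    HaveLimitHook(int maxHavesPerRound) {
        this.maxHavesPerRound = maxHavesPerRound;
    }

    @Override
    public void onNegotiateRound(int wantCount, int haveCount) throws HookRejectedException {
        if (haveCount > maxHavesPerRound) {
            throw new HookRejectedException("too many haves in one round: " + haveCount);
        }
    }
}
```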
Permit applications embedding GitServlet to wrap the
info/refs?service=$name and /$name operations with a
servlet Filter.
To help applications inspect the state of the operation,
expose the UploadPack or ReceivePack object as a
request attribute. This can be useful for logging,
or to implement request throttling of the kind Gerrit
Code Review uses to prevent server overload.
Change-Id: Ib8773c14e2b7a650769bd578aad745e6651210cb
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
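One way an embedding application could throttle requests is to bound how many Git operations run at once. The sketch below uses java.util.concurrent.Semaphore; ConcurrencyThrottle is a hypothetical name, and this is not a description of how Gerrit Code Review implements its own throttling. A servlet Filter wrapping the operation could call tryRun() and answer with an HTTP error when no permit is free.

```java
import java.util.concurrent.Semaphore;

/** Hypothetical limiter on the number of concurrently running operations. */
final class ConcurrencyThrottle {
    private final Semaphore permits;

    ConcurrencyThrottle(int maxConcurrent) {
        permits = new Semaphore(maxConcurrent);
    }

    /** Runs work and returns true if a permit is free; otherwise returns false without running it. */
    boolean tryRun(Runnable work) {
        if (!permits.tryAcquire()) {
            return false; // the caller might reply with e.g. 503 Service Unavailable
        }
        try {
            work.run();
            return true;
        } finally {
            permits.release();
        }
    }
}
```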
Some clients coming through proxies may advertise a different
Accept-Encoding, for example "Accept-Encoding: gzip(proxy)".
Matching by substring produces a false positive: we conclude that
the client understands gzip encoding and will inflate the
response before reading it.
In this particular case, however, it doesn't. It's the reverse proxy
server in front of JGit letting us know the proxy<->JGit link can
be gzip compressed, while the client<->proxy part of the link is not:
client <-- no gzip --> proxy <-- gzip --> JGit
Use a more standard method of parsing by splitting the value into
tokens, and only using gzip if one of the tokens is exactly the
string "gzip". Add a unit test to make sure this isn't broken in
the future.
Change-Id: Ib4c40f9db177322c7a2640808a6c10b3c4a73819
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
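The token-based check described above might look roughly like this. It is a sketch, not JGit's actual code: it splits the header on commas and whitespace and accepts gzip only when a token is exactly "gzip", so "gzip(proxy)" no longer matches. Like the approach described, it does not interpret q-values; tokens carrying parameters such as "gzip;q=0.5" are conservatively treated as not matching.

```java
/** Hypothetical Accept-Encoding check based on exact token matches. */
final class AcceptEncoding {
    private AcceptEncoding() {}

    static boolean acceptsGzip(String header) {
        if (header == null) {
            return false;
        }
        for (String token : header.split("[\\s,]+")) {
            if (token.equals("gzip")) {
                return true; // only an exact token counts, never a substring
            }
        }
        return false;
    }
}
```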
As PackParser supports a progress meter for the "Resolving deltas"
phase of its work, we should export this to smart HTTP clients so
they know the server is still working on their (large) upload.
However, this isn't as simple as just dropping in a binding for
the SmartOutputStream to flush when it's told to. We want to
avoid spurious flushes triggered by the use of sideband, or the
status report formatting in the send-pack/receive-pack protocol.
Change-Id: Ibd88022a298c5fed0edb23dfaf2e90278807ba8b
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Some clients coming through proxies may advertise a different
Accept-Encoding, for example "Accept-Encoding: gzip(proxy)".
Matching by substring produces a false positive: we conclude that
the client understands gzip encoding and will inflate the
response before reading it.
In this particular case, however, it doesn't. It's the reverse proxy
server in front of JGit letting us know the proxy<->JGit link can
be gzip compressed, while the client<->proxy part of the link is not:
client <-- no gzip --> proxy <-- gzip --> JGit
Use a more standard method of parsing by splitting the value into
tokens, and only using gzip if one of the tokens is exactly the
string "gzip". Add a unit test to make sure this isn't broken in
the future.
Change-Id: I30cda8a6d11ad235b56457adf54a2d27095d964e
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Using a resolver and factory pattern for the anonymous git:// Daemon
class makes transport.Daemon more useful on non-file storage systems,
or in embedded applications where the caller wants more precise
control over the work tasks constructed within the daemon.
Rather than defining new interfaces, move the existing HTTP ones
into transport.resolver and make them generic on the connection
handle type. For HTTP, continue to use HttpServletRequest, and
for transport.Daemon use DaemonClient.
To remain compatible with transport.Daemon, FileResolver needs to
learn how to use multiple base directories, and how to export any
Repository instance at a fixed name.
Change-Id: I1efa6b2bd7c6567e983fbbf346947238ea2e847e
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
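Making a resolver generic on the connection handle type can be sketched as below. Resolver, FixedNameResolver and RepoNotFoundException are hypothetical names used only for illustration; C stands in for a handle such as HttpServletRequest or DaemonClient, and R for the repository type. FixedNameResolver mirrors the "export any Repository instance at a fixed name" behaviour mentioned above.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical exception for an unknown repository name. */
final class RepoNotFoundException extends Exception {
    RepoNotFoundException(String name) {
        super("repository not found: " + name);
    }
}

/** Hypothetical resolver, generic on the connection handle C and repository type R. */
interface Resolver<C, R> {
    R open(C connection, String name) throws RepoNotFoundException;
}

/** Serves explicitly exported repositories under fixed names, ignoring the handle. */
final class FixedNameResolver<C, R> implements Resolver<C, R> {
    private final Map<String, R> exports = new HashMap<>();

    void export(String name, R repository) {
        exports.put(name, repository);
    }

    @Override
    public R open(C connection, String name) throws RepoNotFoundException {
        R repository = exports.get(name);
        if (repository == null) {
            throw new RepoNotFoundException(name);
        }
        return repository;
    }
}
```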
It isn't strictly necessary to validate every reference's target
object is reachable in the repository before advertising it to a
client. This is an expensive operation when there are thousands of
references, and it's very unlikely that a reference uses a missing
object, because garbage collection proceeds from the references and
walks down through the graph. So trying to hide a dangling reference
from clients is relatively pointless.
Even if we are trying to avoid giving a client a corrupt repository,
this simple check isn't sufficient. It is possible for a reference to
point to a valid commit, but that commit to have a missing blob in its
root tree. This can be caused by staging a file into the index,
waiting several weeks, then committing that file while also racing
against a prune. The prune may delete the blob, since its
modification time is more than 2 weeks ago, but retain the commit,
since its modification time is right now.
Such graph corruption is already caught during PackWriter as it
enumerates the graph from the client's want list and digs back
to the roots or common base. Leave the reference validation also
for that same phase, where we know we have to parse the object to
support the enumeration.
Change-Id: Iee70ead0d3ed2d2fcc980417d09d7a69b05f5c2f
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
We need to use findbugs-maven-plugin:2.3.2-SNAPSHOT
since otherwise the build fails with maven-3.0 [1], [2].
We should switch to the release version as soon
as this becomes available.
[1] http://www.sonatype.com/people/2010/10/maven-3-0-has-landed/
[2] http://jira.codehaus.org/browse/MFINDBUGS-122
Bug: 327799
Change-Id: I1c57f81cf6f0450e56411881488c4ee754e458e3
Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
This reverts commit db4c516f67 since
it breaks compatibility with Eclipse 3.5, which can no longer import
the projects.
Bug: 323390
Change-Id: I3cc91364a6747cfcb4c611a9be5258f81562f726
Updates the project-level settings to run the formatter
on save, and only on the edited lines.
Change-Id: I26dd69d0c95e6d73f9fdf7031f3c1dbf3becbb79
Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
Update a number of calling sites of RevWalk to ensure the walker's
internal ObjectReader is released after the walk is no longer used.
Because the ObjectReader is likely to hold onto a native resource
like an Inflater, we don't want to leak them outside of their
useful scope.
Where possible we also try to share ObjectReaders across several
walk pools, or between a walker and a PackWriter. This permits
the ObjectReader to actually do some caching if it felt inclined
to do so.
Not everything was updated; we'll probably need to come back and
update even more call sites, but these are some of the biggest
offenders. Test cases in particular aren't updated. My plan is to
move most storage-agnostic tests onto some purely in-memory storage
solution that doesn't do compression.
Change-Id: I04087ec79faeea208b19848939898ad7172b6672
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
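The release pattern described above, where one reader holding a native Inflater is shared by several consumers and released once they are done, might be sketched as follows. SharedInflateReader is a hypothetical stand-in for an ObjectReader, and the sketch uses try-with-resources (a Java 7 idiom) rather than whatever release method JGit itself exposes.

```java
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

/** Hypothetical reader that owns one native Inflater and can be reused until closed. */
final class SharedInflateReader implements AutoCloseable {
    private final Inflater inflater = new Inflater();
    private boolean closed;

    byte[] inflate(byte[] compressed, int rawLength) throws DataFormatException {
        if (closed) {
            throw new IllegalStateException("reader already released");
        }
        inflater.reset(); // reuse the same native state for each object
        inflater.setInput(compressed);
        byte[] out = new byte[rawLength];
        int n = 0;
        while (n < rawLength && !inflater.finished()) {
            int k = inflater.inflate(out, n, rawLength - n);
            if (k == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                break;
            }
            n += k;
        }
        if (n != rawLength) {
            throw new DataFormatException("short inflate");
        }
        return out;
    }

    /** Frees the native resource; later inflate() calls fail fast. */
    @Override
    public void close() {
        if (!closed) {
            inflater.end();
            closed = true;
        }
    }
}
```

Several walkers (or a walker and a writer) would borrow the same instance, and only the owner closes it.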
This move isolates all of the local file specific implementation code
into a single package, where their package-private methods and support
classes are properly hidden away from the rest of the core library.
Because of the sheer number of files impacted, I have limited this
change to only the renames and the updated imports.
Change-Id: Icca4884e1a418f83f8b617d0c4c78b73d8a4bd17
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Some types of repositories might not be stored on local disk. These
will most likely return null for getDirectory(), as the java.io.File
type cannot describe where their storage is; it's not in the host's
filesystem.
Document that getDirectory() can return null now, and update all
current non-test callers in JGit that might run into problems on
such repositories. For the most part, just act like it's bare.
Change-Id: I061236a691372a267fd7d41f0550650e165d2066
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
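A caller that must cope with a null getDirectory() could follow the "act like it's bare" rule with a small guard like the one below. RepoDirs and its methods are hypothetical helpers, not part of JGit.

```java
import java.io.File;

/** Hypothetical caller-side helpers tolerating repositories without a local directory. */
final class RepoDirs {
    private RepoDirs() {}

    /** No local directory means no work tree we can reason about, so behave as if bare. */
    static boolean actAsBare(File gitDir, boolean configuredBare) {
        return gitDir == null || configuredBare;
    }

    /** Human-readable location for logs and messages. */
    static String describe(File gitDir) {
        return gitDir != null ? gitDir.getPath() : "(not stored on the local filesystem)";
    }
}
```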
If UploadPack invokes flush() on the output stream we pass it, it's
most likely the progress messages coming down the side band stream.
As pack generation can take a while, we want to push that down
at the client as early as we can, to keep the connection alive,
and to let the user know we are still working on their behalf.
Ensure we dump the temporary buffer whenever flush() is invoked,
otherwise the messages don't get sent in a timely fashion to the
user agent (in this case, git fetch).
We specifically don't implement flush() for ReceivePack right now,
as that protocol currently does not provide progress messages to
the user, but it does invoke flush several times, as the different
streams include '0000' type flush-pkts to denote various end points.
Change-Id: I797c90a2c562a416223dc0704785f61ac64e0220
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
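A minimal sketch of the flush behaviour described above: writes accumulate in memory, and flush() pushes whatever has been buffered so far to the underlying stream, so progress messages reach the client promptly. FlushingBuffer is a hypothetical class, not JGit's SmartOutputStream; unlike the real code it never spills to disk or enforces a size limit.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

/** Hypothetical buffer that only reaches the wire on flush() or close(). */
final class FlushingBuffer extends OutputStream {
    private final OutputStream out;
    private final ByteArrayOutputStream buf = new ByteArrayOutputStream();

    FlushingBuffer(OutputStream out) {
        this.out = out;
    }

    @Override
    public void write(int b) {
        buf.write(b);
    }

    @Override
    public void write(byte[] b, int off, int len) {
        buf.write(b, off, len);
    }

    /** Dump the temporary buffer (e.g. side-band progress) so the user agent sees it now. */
    @Override
    public void flush() throws IOException {
        buf.writeTo(out);
        buf.reset();
        out.flush();
    }

    @Override
    public void close() throws IOException {
        flush();
        out.close();
    }
}
```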
Created the wrong tags for 0.8.3, hence creating another version.
Change-Id: I4e00bbcffe1cf872e2d7e3f3d88d068701fb5330
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
On Windows, FS_Win32_Cygwin has been used if a Cygwin Git installation
is present in the PATH. Assuming that the user works with the Cygwin
Git installation may result in unnecessary overhead if they actually
do not.
Applications built on top of JGit may know more about which Git
client is actually in use (Cygwin or not) and hence should be able to
configure which FS to use accordingly.
Change-Id: Ifc4278078b298781d55cf5421e9647a21fa5db24
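The design choice above, letting the embedding application override autodetection, can be sketched as a small selector. FsKind and FsSelector are hypothetical names; JGit's actual FS classes and configuration hooks differ.

```java
/** Hypothetical file system flavours on Windows. */
enum FsKind { WIN32, WIN32_CYGWIN }

/** Hypothetical selector: an explicit application choice beats PATH-based detection. */
final class FsSelector {
    private FsSelector() {}

    static FsKind choose(FsKind configured, boolean cygwinGitOnPath) {
        if (configured != null) {
            return configured; // the application knows which Git client is in use
        }
        return cygwinGitOnPath ? FsKind.WIN32_CYGWIN : FsKind.WIN32;
    }
}
```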
The J2SE NIO APIs require that FileChannel close the underlying file
descriptor if a thread is interrupted while it is inside of a read or
write operation on that channel. This is insane, because it means we
cannot share the file descriptor between threads. If a thread is in
the middle of the FileChannel variant of IO.readFully() and it
receives an interrupt, the pack will be automatically closed on us.
This causes the other threads trying to use that same FileChannel to
receive IOExceptions, which leads to the pack getting marked as
invalid. Once the pack is marked invalid, JGit loses access to its
entire contents and starts to report MissingObjectExceptions.
Because PackWriter must ensure that the chosen pack file stays
available until the current object's data is fully copied to the
output, JGit cannot simply reopen the pack when it's automatically
closed due to an interrupt being sent at the wrong time. The pack may
have been deleted by a concurrent `git gc` process, and that open file
descriptor might be the last reference to the inode on disk. Once it's
closed, the PackWriter loses access to that object representation, and
it cannot finish sending the object to the client.
Fortunately, RandomAccessFile's readFully method does not have this
problem. Interrupts during readFully() are ignored. However, it
requires us to first seek to the offset we need to read, then issue
the read call. This requires locking around the file descriptor to
prevent concurrent threads from moving the pointer before the read.
This reduces the concurrency level, as now only one window can be
paged in at a time from each pack. However, the WindowCache should
already be holding most of the pages required to handle the working
set for a process, and its own internal locking was already limiting
us on the number of concurrent loads possible. Provided that most
concurrent accesses are getting hits in the WindowCache, or are for
different repositories on the same server, we shouldn't see a major
performance hit due to the more serialized loading.
I would have preferred to use a pool of RandomAccessFiles for each
pack, with threads borrowing an instance dedicated to that thread
whenever they needed to page in a window. This would permit much
higher levels of concurrency by using multiple file descriptors (and
file pointers) for each pack. However the code became too complex to
develop in any reasonable period of time, so I've chosen to retrofit
the existing code with more serialization instead.
Bug: 308945
Change-Id: I2e6e11c6e5a105e5aef68871b66200fd725134c9
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
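The seek-then-read approach described above can be sketched with java.io.RandomAccessFile. Because seek() and readFully() together must behave as one positioned read, the sketch holds the file's monitor across both calls. LockedPackReader is a hypothetical name; JGit's pack code is structured differently.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

/** Hypothetical positioned reader sharing one RandomAccessFile between threads. */
final class LockedPackReader {
    private final RandomAccessFile file;

    LockedPackReader(RandomAccessFile file) {
        this.file = file;
    }

    /** Reads exactly len bytes starting at pos into dst[off..off+len). */
    void read(long pos, byte[] dst, int off, int len) throws IOException {
        // Without the lock another thread could move the file pointer
        // between our seek() and readFully().
        synchronized (file) {
            file.seek(pos);
            file.readFully(dst, off, len);
        }
    }
}
```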
The strings are externalized into the root resource bundles.
The resource bundles are stored under the new "resources" source
folder to get a proper Maven build.
Strings from tests are, in general, not externalized. They were
externalized only where that was necessary to make a test pass,
typically where e.getMessage() was used in an assert and the
exception message changed slightly due to reuse of the
externalized strings.
Change-Id: Ic0f29c80b9a54fcec8320d8539a3e112852a1f7b
Signed-off-by: Sasa Zivkov <sasa.zivkov@sap.com>
Since the API is changing relative to 0.7.0, we'll call our next
release 0.8.1. But until that gets released, builds from master
will be 0.8.0.qualifier.
Change-Id: I921e984f51ce498610c09e0db21be72a533fee88
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
The HTTP server side code now uses the same approach that the smart
HTTP client code uses when preparing a request body. The payload
is streamed into a TemporaryBuffer of limited size. If the entire
data fits, it's compressed with gzip if the user agent supports that,
and a Content-Length header is used to transmit the fixed length
body to the peer. If, however, the data overflows the limited memory
segment, it's streamed uncompressed to the peer.
One might initially think that larger contents which overflow
the buffer should also be compressed, rather than sent raw, since
they were deemed "large". But usually these larger contents are
actually a pack file which has been already heavily compressed by
Git specific routines. Trying to deflate that with gzip is probably
going to take up more space, not less, so the compression overhead
isn't worthwhile.
This buffer and compress optimization helps repositories with a
large number of references, as their text based advertisements
compress well. For example jgit's own native repository currently
requires 32,628 bytes for its full advertisement of 489 references.
Most repositories have fewer references, and thus could compress
their entire response in one buffer.
Change-Id: I790609c9f763339e0a1db9172aa570e29af96f42
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
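The buffer-then-decide strategy can be sketched as an OutputStream that keeps up to limit bytes in memory. If the body fits, it is gzipped (when the client accepts gzip) and its length becomes known; if it overflows, the buffered bytes and everything after are streamed raw. BufferThenGzip is hypothetical: real servlet code would also set the Content-Length and Content-Encoding headers, which this sketch only records in fields.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical response body writer: small bodies are gzipped, large ones streamed raw. */
final class BufferThenGzip extends OutputStream {
    private final OutputStream wire;     // connection to the peer
    private final int limit;             // size of the in-memory buffer
    private final boolean acceptsGzip;   // parsed from Accept-Encoding
    private ByteArrayOutputStream mem = new ByteArrayOutputStream();
    private boolean closed;

    /** Value for Content-Length, or -1 if the body was streamed. */
    long contentLength = -1;
    /** Whether the body was gzip compressed. */
    boolean gzipped;

    BufferThenGzip(OutputStream wire, int limit, boolean acceptsGzip) {
        this.wire = wire;
        this.limit = limit;
        this.acceptsGzip = acceptsGzip;
    }

    @Override
    public void write(int b) throws IOException {
        write(new byte[] {(byte) b}, 0, 1);
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        if (mem != null && mem.size() + len > limit) {
            // Overflow: likely an already-compressed pack, so stop buffering and send raw.
            mem.writeTo(wire);
            mem = null;
        }
        if (mem != null) {
            mem.write(b, off, len);
        } else {
            wire.write(b, off, len);
        }
    }

    @Override
    public void close() throws IOException {
        if (closed) {
            return;
        }
        closed = true;
        if (mem != null) {
            byte[] body;
            if (acceptsGzip) {
                ByteArrayOutputStream z = new ByteArrayOutputStream();
                try (GZIPOutputStream gz = new GZIPOutputStream(z)) {
                    mem.writeTo(gz);
                }
                body = z.toByteArray();
                gzipped = true;
            } else {
                body = mem.toByteArray();
            }
            contentLength = body.length; // known up front, so no chunked encoding
            wire.write(body);
        }
        wire.close();
    }
}
```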
Actually set the range of versions we are willing to accept for
each package we import, lest we import something in the future
that isn't compatible with our needs.
Change-Id: I25dbbb9eaabe852631b677e0c608792b3ed97532
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>