The HTTP RFCs require a server to fully consume the request body before
it can return a non-error status code, which is any code below 400.
JGit returns most Git level errors inside of an HTTP 200 OK response,
and sometimes this happens before the entire request was consumed from
the servlet container. In such cases the body must be skipped or read
until EOF is reached, ensuring the HTTP keep-alive semantics will work
for the next request on the same TCP connection.
HTTP status codes >= 400 may be returned without consuming the body,
and a servlet container must set "Connection: close" in the response
headers when this happens, since the state of the request body is not
well defined with an early abort.
With the introduction of sendError() in GitSmartHttpTools there are
only a handful of locations that need to worry about the request body
being consumed, so sprinkle the call in as necessary.
Change-Id: I5381e110585f780c01a764df8e27c80aacf5146e
Error messages are typically short, below the 32 KiB in-memory buffer
size of the SmartOutputStream. When an error is queued up for sending
to a client and an exception is thrown up into the servlet handler we
discarded the message and sent nothing to the client, as the messages
were stuck inside of the SmartOutputStream buffer.
Hoist the creation of the output stream above the invocation of try
block of the service, and use close() in the few catch blocks that
assume there are buffered messages ready for transmission. This will
ensure errors from unpacking a stream in ReceivePack are sent off to
a client correctly, as previously these were causing no status report
to arrive at the client side as the data was stuck in the buffer.
Change-Id: I5534b560697731121f48979ae077aa7c95b8e39c
The stream must be closed to ensure the native resources associated
with its internal Deflater instance are cleaned up early, instead of
waiting for GC to identify the dead object and finialize it.
Change-Id: Ic31b5df563f19404ed4682556999f4332aa61562
The GitSmartHttpTools class started as utility functions to help report
useful error messages to users of the android.googlesource.com service.
Now that the GitServlet and GitFilter classes support filters before a
git-upload-pack or git-receive-pack request, server implementors may
these routines helpful to report custom messages to clients. Using the
sendError() method to return an HTTP 200 OK with error text embedded in
the payload prevents native Git clients from retrying the action with a
dumb Git or WebDAV HTTP request.
Refactor some of the existing code to use these new error functions and
protocol constants. The new sendError() function is very close to being
identical to the old error handling code in RepositoryFilter, however we
now use the POST Content-Type rather than the Accept HTTP header to check
if the client will accept the error data in the response body rather than
using the HTTP status code. This is a more reliable way of checking for
native Git clients, as the Accept header was not always populated with the
correct string in older versions of Git smart HTTP.
Change-Id: I828ac2deb085af12b6689c10f86662ddd39bd1a2
If removing the leading slash results in an empty string, return
with an HTTP 404 error before trying to use the RepositoryResolver.
Moving this into a loop ahead of the length check ensures there is
no empty string passed into the resolver.
Change-Id: I80e5b7cf25ae9f2164b5c396a29773e5c7d7286e
All Git URLs operate off a suffix approach, for example the default
binding is for paths such as:
*/info/refs
*/git-upload-pack
*/git-receive-pack
These names are not common on project hosting servers, especially
one like Gerrit Code Review.
In addition to offering Git-over-HTTP as a servlet, offer it as a
filter that triggers when a matching suffix appears, but otherwise
delegates the request through the chain. This filter would permit
Gerrit Code Review to place projects at the root of the server,
rather than within the "/p/" subdirectory, making the HTTP and SSH
URL structure exactly match each other.
To prevent breakage with existing users, the MetaServlet and
GitServlet are kept as wrappers delegating to their filters,
returning 404 Not Found when the filter has no match.
Change-Id: I2465c15c086497e0faaae5941159d80c028fa8b1
We should use a template for Mylyn commit messages that matches with our
guidelines for commit messages.
http://wiki.eclipse.org/EGit/Contributor_Guide#Commit_message_guidelines
Bug: 337401
Change-Id: I05812abf0eb0651d22c439142640f173fc2f2ba0
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
If an internal exception occurs while packing and the request
needs to abort, the HTTP response might already be committed due
to progress message having already been delivered to the client.
This prevents UploadPackServlet from resetting the response and
sending back an HTTP 500 response.
Try to catch all exceptions and report internal errors over the
sideband stream or as an ERR command during the initial ACK/NAK
negotiation phase. This allows JGit to transmit an error message
that the user will receive on their console without needing to
worry about resetting the (already gone) HTTP response.
Change-Id: Ie393fb8bb55d2b79ab1276adf71c781c1807f9fe
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
The filter did not correctly match smart HTTP client requests,
so it always fell back on HTTP status codes for errors. This
usually causes a smart client to retry a dumb request, which
is not what the server wants.
Change-Id: I42592378dc42fbe308ef30a2923786c690f668a9
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Smart HTTP clients may request both multi_ack_detailed and no-done in
the same request to prevent the client from needing to send a "done"
line to the server in response to a server's "ACK %s ready".
For smart HTTP, this can save 1 full HTTP RPC in the fetch exchange,
improving overall latency when incrementally updating a client that
has not diverged very far from the remote repository.
Unfortuantely this capability cannot be enabled for the traditional
bi-directional connections. multi_ack_detailed has the client sending
more "have" lines at the same time that the server is creating the
"ACK %s ready" and writing out the PACK stream, resulting in some race
conditions and/or deadlock, depending on how the pipe buffers are
implemented. For very small updates, a server might actually be able
to send "ACK %s ready", then the PACK, and disconnect before the
client even finishes sending its first batch of "have" lines. This
may cause the client to fail with a broken pipe exception. To avoid
all of these potential problems, "no-done" is restricted only to the
smart HTTP variant of the protocol.
Change-Id: Ie0d0a39320202bc096fec2e97cb58e9efd061b2d
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
In order to run the static checks run:
mvn -P static-checks clean install
Change-Id: I14077498a04be986ded123ddbfc97da8f9bc3130
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
When the client is clearly making a smart HTTP request to our smart
HTTP server, return any errors like RepositoryNotFoundException or
ServiceNotEnabledException inside of the payload as a Git level ERR
message, rather than an HTTP error code.
This prevents the C Git command line client from retrying a failed
"$URL/info/refs?service=git-upload-pack" request without the smart
service URL, only to fail again with "403 Forbidden" when the dumb
as-is service has been disabled by the server configuration, or is
unavailable because the repository is not on the local filesystem.
Change-Id: I57e8756d5026e885e0ca615979bfcd729703be6c
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Embedding applications can use this hook to watch actions within
UploadPack and possibly reject them. This could be useful to prevent
clones of a large repository from this server, or to stop abusive
negotiation rounds that offer thousands of objects in a single batch.
Change-Id: Id96f1885ac4d61f22c80b6418fff54184b7348ba
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Permit applications embedding GitServlet to wrap the
info/refs?service=$name and /$name operations with a
servlet Filter.
To help applications inspect state of the operation,
expose the UploadPack or ReceivePack object into a
request attribute. This can be useful for logging,
or to implement throttling of requests like Gerrit
Code Review uses to prevent server overload.
Change-Id: Ib8773c14e2b7a650769bd578aad745e6651210cb
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Some clients coming through proxies may advertise a different
Accept-Encoding, for example "Accept-Encoding: gzip(proxy)".
Matching by substring causes us to identify this as a false positive;
that the client understands gzip encoding and will inflate the
response before reading it.
In this particular case however it doesn't. Its the reverse proxy
server in front of JGit letting us know the proxy<->JGit link can
be gzip compressed, while the client<->proxy part of the link is not:
client <-- no gzip --> proxy <-- gzip --> JGit
Use a more standard method of parsing by splitting the value into
tokens, and only using gzip if one of the tokens is exactly the
string "gzip". Add a unit test to make sure this isn't broken in
the future.
Change-Id: Ib4c40f9db177322c7a2640808a6c10b3c4a73819
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
As PackParser supports a progress meter for the "Resolving deltas"
phase of its work, we should export this to smart HTTP clients so
they know the server is still working on their (large) upload.
However this isn't as simple as just dropping in a binding for
the SmartOutputStream to flush when its told to. We want to
avoid spurious flushes triggered by the use of sideband, or the
status report formatting in the send-pack/receive-pack protocol.
Change-Id: Ibd88022a298c5fed0edb23dfaf2e90278807ba8b
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Some clients coming through proxies may advertise a different
Accept-Encoding, for example "Accept-Encoding: gzip(proxy)".
Matching by substring causes us to identify this as a false positive;
that the client understands gzip encoding and will inflate the
response before reading it.
In this particular case however it doesn't. Its the reverse proxy
server in front of JGit letting us know the proxy<->JGit link can
be gzip compressed, while the client<->proxy part of the link is not:
client <-- no gzip --> proxy <-- gzip --> JGit
Use a more standard method of parsing by splitting the value into
tokens, and only using gzip if one of the tokens is exactly the
string "gzip". Add a unit test to make sure this isn't broken in
the future.
Change-Id: I30cda8a6d11ad235b56457adf54a2d27095d964e
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Using a resolver and factory pattern for the anonymous git:// Daemon
class makes transport.Daemon more useful on non-file storage systems,
or in embedded applications where the caller wants more precise
control over the work tasks constructed within the daemon.
Rather than defining new interfaces, move the existing HTTP ones
into transport.resolver and make them generic on the connection
handle type. For HTTP, continue to use HttpServletRequest, and
for transport.Daemon use DaemonClient.
To remain compatible with transport.Daemon, FileResolver needs to
learn how to use multiple base directories, and how to export any
Repository instance at a fixed name.
Change-Id: I1efa6b2bd7c6567e983fbbf346947238ea2e847e
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
It isn't strictly necessary to validate every reference's target
object is reachable in the repository before advertising it to a
client. This is an expensive operation when there are thousands of
references, and its very unlikely that a reference uses a missing
object, because garbage collection proceeds from the references and
walks down through the graph. So trying to hide a dangling reference
from clients is relatively pointless.
Even if we are trying to avoid giving a client a corrupt repository,
this simple check isn't sufficient. It is possible for a reference to
point to a valid commit, but that commit to have a missing blob in its
root tree. This can be caused by staging a file into the index,
waiting several weeks, then committing that file while also racing
against a prune. The prune may delete the blob, since its
modification time is more than 2 weeks ago, but retain the commit,
since its modification time is right now.
Such graph corruption is already caught during PackWriter as it
enumerates the graph from the client's want list and digs back
to the roots or common base. Leave the reference validation also
for that same phase, where we know we have to parse the object to
support the enumeration.
Change-Id: Iee70ead0d3ed2d2fcc980417d09d7a69b05f5c2f
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>