Add a storage implementation storing large objects in Amazon S3.
The AmazonS3Repository pre-signs download and upload requests.
AWS access and secret key are expected to be in the
$HOME/.aws/credentials file in the following format:
[default]
accessKey = ...
secretKey = ...
Use AWS version 4 request signing [1] because it is more secure and
supported by all regions. The version 3 signing is not supported in
newer regions.
In follow up changes we should:
- implement getVerifyAction() and do actual verification. Subclasses of
S3Repository can implement caching for object meta data (size) in order
to avoid extra roundtrips to S3. Verification should ensure that meta
data store and content of S3 storage are in sync
- HEAD request used in S3Repository.getSize() seems to always return
Content-length 0 in contrast to the documentation [2]. So getSize() does
detect if the object exists in S3 or not but in case the object exists
it always returns size 0
[1] http://docs.aws.amazon.com/general/latest/gr/signature-version-4.html
[2] https://forums.aws.amazon.com/thread.jspa?threadID=223616
Change-Id: Ic47f094928a259e5264c92b3aacf6d90210907a8
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Signed-off-by: Sasa Zivkov <sasa.zivkov@sap.com>
Implement LfsProtocolServlet handling the "Git LFS v1 Batch API"
protocol [1]. Add a simple file system based LFS content store and the
debug-lfs-store command to simplify testing.
Introduce a LargeFileRepository interface to enable additional storage
implementation while reusing the same protocol implementation.
At the client side we have to configure the lfs.url, specify that
we use the batch API and we don't use authentication:
[lfs]
url = http://host:port/lfs
batch = true
[lfs "http://host:port/lfs"]
access = none
the git-lfs client appends the "objects/batch" to the lfs.url.
Hard code an Authorization header in the FileLfsRepository.getAction
because then git-lfs client will skip asking for credentials. It will
just forward the Authorization header from the response to the
download/upload request.
The FileLfsServlet supports file content storage for "Large File
Storage" (LFS) server as defined by the Github LFS API [2].
- upload and download of large files is probably network bound hence use
an asynchronous servlet for good scalability
- simple object storage in file system with 2 level fan-out
- use LockFile to protect writing large objects against multiple
concurrent uploads of the same object
- to prevent corrupt uploads the uploaded file is rejected if its hash
doesn't match id given in URL
The debug-lfs-store command is used to run the LfsProtocolServlet and,
optionally, the FileLfsServlet which makes it easier to setup a
local test server.
[1]
https://github.com/github/git-lfs/blob/master/docs/api/http-v1-batch.md
[2] https://github.com/github/git-lfs/tree/master/docs/api
Bug: 472961
Change-Id: I7378da5575159d2195138d799704880c5c82d5f3
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Signed-off-by: Sasa Zivkov <sasa.zivkov@sap.com>
Experimental flag to turn on the KetchLeader within this daemon JVM.
This is a manually elected leader process, set from the command line.
Remote followers for each repository are configured per-repository
using remote sections with ketch-type = FULL. For example:
Manually elected leader's $GIT_DIR/config:
[ketch]
name = A
[remote "A"]
ketch-type = FULL
[remote "B"]
url = git://127.0.0.1:9421/sample.git
ketch-type = FULL
[remote "C"]
url = git://127.0.0.1:9422/sample.git
ketch-type = FULL
Replica B and C daemons:
git daemon \
--export-all \
--enable=receive-pack \
--listen=127.0.0.1 --port=9421 \
--base-path=$HOME/ketch_test/follower_one \
$HOME/ketch_test/follower_one &
git daemon \
--export-all \
--enable=receive-pack \
--listen=127.0.0.1 --port=9422 \
--base-path=$HOME/ketch_test/follower_two \
$HOME/ketch_test/follower_two &
Change-Id: I165f85970a77e16b5263115290d685d8a00566f5
This tool scans all references in the repository and writes out a new
reference pointing to a single commit whose root tree is a RefTree
containing the current refs of this repository.
It alway skips storing the reference it will write to, avoiding the
obvious cycle.
Change-Id: I20b1eeb81c55dc49dd600eac3bf8f90297394113
Pushing with JGit commandline to e.g. Github failed with "unauthorized"
since HttpUrlConnection calls the configured authenticator implicitly.
The problem is that during a push two requests are sent to the server,
first a GET and then a POST (containing the pack data). The first GET
request sent anonymously is rejected with 401 (unauthorized). When an
Authenticator is installed the java.net classes will use the
Authenticator to ask the user for credentials and retry the request.
But this happens under the hood and JGit level code doesn't see that
this happens.
The next request is the POST but since JGit thinks the first GET request
went through anonymously it doesn't add authentication headers to the
POST request. This POST of course also fails with 401 but since this
request contains a lot of body-data streamed from JGit (the pack file!)
the java.net classes can't simply retry the request with authorization
headers. The whole process fails.
Fix this by using Apache httpclient which doesn't use Authenticator to
retrieve credentials. Instead initialize TransportCommand to use the
default credential provider if no other credentials provider was set
explicitly. org.eclipse.jgit.pgm.Main sets this default for the JGit
command line client.
Change-Id: Ic4e0f8b60d4bd6e69d91eae0c7e1b44cdf851b00
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>