Before this change, jgit used to read packed-refs before scanning
loose refs. That was not a problem if gc didn't run concurrently. When
gc did run concurrently with such refs reading, that order sometimes
broke the latter. This lead to reading an older version of a ref's
tip, which meant "losing" the real tip or commit. The specific
read-Vs-gc concurrency scenario which broke reading that way follows:
1. let ref R be in packed-refs and R' be in loose
2. jgit starts reading packed-refs
3. gc also starts its business around that very time
4. jgit still has the time to read R from packed-refs
5. as gc is not done yet updating packed-refs with R'
6. jgit then starts scanning loose refs (or is about to)
7. gc quickly ends up being done moving loose R' to packed-refs
8. so gc (quickly) removes loose refs
9. -while jgit is scanning loose refs, now gone
10. so jgit assumes no loose to consider => packed-refs winning
11. so jgit wrongfully returns R (from 4.) as the tip, instead of R'.
This fix switches the order so loose refs are scanned (secured) before
taking the time to read packed-refs. This way, knowledge of the
likelier tip is guaranteed for ref reading to return the true tip
- despite concurrent gc. If there is no loose ref to scan, jgit reads
packed-refs and lands on R' (or S), which it then returns, as
expected. The gerrit issue [1] should be solved by this fix.
[1] https://code.google.com/p/gerrit/issues/detail?id=2302
Change-Id: Ibd120120a361a3a6ed565f3836afc1db706fbcdd
Signed-off-by: Marco Miller <marco.miller@ericsson.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
The special characters <> and '\n' interfere with parsing of
identities. C git strips these special characters, so we should too.
Rather than allocating extra strings by calling String#trim(), add a
few lines to our sanitization method to perform the same trimming as
described in String's Javadoc.
Change-Id: I96edcb93a2fc194ee354d60566d352299742a52f
This might be somewhat surprising behavior to users who might
naturally assume the following invariant:
ident.equals(parseIdent(ident.toExternalString()))
This invariant does not hold since whitespace is only trimmed during
serialization. We don't want to mess with the strings during
initialization, as this is called during the highly-optimized commit
parsing codepath.
Change-Id: I081a603f0ac0e33167462244779b0ff3ad51e80c
There was a bug in JGit which caused NPEs being thrown when Transport
errors should be reported. Avoid the NPE to let the original error show
up.
Change-Id: I9e1e2b0195bd61b7e531a09d0fc7bce109bd6515
We've found in Gerrit Code Review that it is common to pass around
both an ObjectReader (or more commonly a RevWalk wrapping one) and an
ObjectInserter. These code paths often assume that the ObjectReader
can read back any objects created by the ObjectInserter without
flushing. However, we previously had no way to enforce that constraint
programmatically, leading to hard-to-spot problems.
Provide a solution by exposing the ObjectInserter that created an
ObjectReader, when known. Callers can either continue passing both
objects and check:
reader.getCreatedFromInserter() == inserter
or they can just pass around ObjectReader and extract the inserter
when it's needed (checking that it's not null at usage time).
Change-Id: Ibbf5d1968b506f6b47030ab1b046ffccb47352ea
When CheckoutCommand or MergeCommand is called then not in all situation
the treewalks have been prepared to support clean/smudge filters. Fix
this
Bug: 491505
Change-Id: Iab5608049221c46d06812552ab97299e44d59e64
Repurpose RefDatabase#performsAtomicTransactions() slightly, to
indicate that the backend _supports_ atomic transactions, rather than
the current definition, which is that the backend always _uses_ atomic
transactions regardless of whether or not the caller actually wants
them. Allow BatchRefUpdate callers to turn off atomic transactions by
calling setAtomic(false). Defaulting to true means this is backwards
compatible.
Change-Id: I6df78d7df65ab147b4cce7764bd3101db985491c
Such hunks are identifiable by a zero value for "new start line". Prior
to the fix, JGit throws and ArrayIndexOutOfBoundsException on such
patches.
Change-Id: I4f3deb5e5f41a08af965fcc178d678c77270cddb
Signed-off-by: Jonathan Schneider <jkschneider@gmail.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Now if refs are unreadable when serving an upload pack the handler
will fail due to the actual underlying failure. Previously all wants
would be rejected as invalid because Repository.getAllRefs() returned
an empty map.
Testing this required a new subclass of InMemoryRepository so that
an IOException could be injected at the correct time.
Signed-off-by: Michael Edgar <adgar@google.com>
Change-Id: Iac708b1db9d0ccce08c4ef5ace599ea0b57afdc0
Change-Id: I5b3b7b0633354d5ccf0c6c320c0df9c93fdf8eeb
Signed-off-by: Ned Twigg <ned.twigg@diffplug.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
CommitCommand already provided a method to set the comment which should
be written into the reflog. The underlying RefUpdate class supported to
skip writing a reflog entry. But through the CommitCommand API it was
not possible to prevent writing a reflog entry. Fix this and allow
creating commits which don't occur in the reflog.
Change-Id: I193c53de71fb5958ea749c4bfa8360a51acc9b58
PackWriter.writeObject() can get into an infinite loop when corrupt
packs are present. When it finds a pack file with an object that can be
reused it calls DfsPackFile.copyAsIs(). If that method sees an invalid
CRC, it adds the object to the DfsPackFile's corrupt object list and
throws a CorruptObjectException, which it later catches as an
IOException and wraps in a
StoredObjectRepresentationNotAvailableException.
PackWriter.writeObjectImpl() catches that SORNAE and retries the
operation by calling DfsReader.selectObjectRepresentation(). But
currently that method returns the same object which was just seen to
be corrupt.
Change DfsPackFile.isCorrupt() from private to package private, and use
that method in DfsReader.selectObjectRepresentation() to filter out
corrupt objects.
The stack traces that show the problem are:
org.eclipse.jgit.errors.CorruptObjectException.<init>(CorruptObjectException.java:113)
org.eclipse.jgit.internal.storage.dfs.DfsPackFile.copyAsIs(DfsPackFile.java:624)
org.eclipse.jgit.internal.storage.dfs.DfsReader.copyObjectAsIs(DfsReader.java:491)
org.eclipse.jgit.internal.storage.pack.PackWriter.writeObjectImpl(PackWriter.java:1478)
org.eclipse.jgit.internal.storage.pack.PackWriter.writeObject(PackWriter.java:1455)
org.eclipse.jgit.internal.storage.dfs.DfsPackFile.getPackIndex(DfsPackFile.java:228)
org.eclipse.jgit.internal.storage.dfs.DfsReader.findAllFromPack(DfsReader.java:476)
org.eclipse.jgit.internal.storage.dfs.DfsReader.selectObjectRepresentation(DfsReader.java:455)
org.eclipse.jgit.internal.storage.pack.PackWriter.writeObjectImpl(PackWriter.java:1492)
org.eclipse.jgit.internal.storage.pack.PackWriter.writeObject(PackWriter.java:1455)
Change-Id: Iad7bbcaed1f11a6aa3b4f5af911a73a34c0fabfd
Signed-off-by: Terry Parker <tparker@google.com>
RepositoryCache has 2 methods to remove a repository from the cache but
they are never called when a repository is closed. Users of the cache
were expected to call one of those 2 methods but how could they have
called them at proper time without having visibility of the repository
usage count.
Ideally, I would have reworked the RepositoryCache to wrap any
repository it opens in a class that would be responsible to unregister
them from the cache when it's really closed, i.e. when usage counter
reaches 0. The problem preventing the wrapping solution is the
RepositoryCache.register method that allows to register an already
opened repository in the cache. Such repositories cannot be wrapped
because callers are still holding a reference on the unwrapped
repository.
Document that RepositoryCache.close method is removing the repository
from the cache as well as closing it and rework
RepositoryCache.unregister method to only remove the repository from the
cache. Use the latter to unregister repository when Repository.doClose
is getting executed.
Change-Id: Ia364816e4da8d7b6cfa72f10758ca31aa8a1f9db
Signed-off-by: Hugo Arès <hugo.ares@ericsson.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
When repositories are opened using the RepositoryCache, they are kept in
memory and when the repository usage counter reaches 0, the
Repository.close method is called which then calls close method on its
reference and object databases.
The problem is that RefDirectory.close method was a no-op and the
reference database was kept in memory. This problem is only happening
when opening a repository using the RepositoryCache because it never
evicts repositories, it's just calling the close method.
Change-Id: Iacb961de8e8b1f5b37824bf0d1a4caf4c6f1233f
Signed-off-by: Hugo Arès <hugo.ares@ericsson.com>
Repository has a usage counter that is initialized to 1 at
instantiation and this counter is decremented when Repository.close
method is called. There is also a Repository.incrementOpen method that
RepositoryCache uses to increment the usage count when it's returning a
repository that is already opened.
The problem was that RepositoryCache was incrementing the usage count
for repositories that it just opened or registered. The usage count was
2 when it should have been 1.
Incrementing usage count is now only be done for repository that are
served from the cache.
This bug is causing slow memory increase of our Gerrit server until the
server become slow. Even if the RepositoryCache is using SoftReference,
it seems that the JVM is not garbage collecting the repositories because
it's not yet on the edge of being out of memory.
To test this change, I replicated all repositories(11k) from Gerrit
master to one slave. The Gerrit master used memory after this test was
10GB without this change and 3.5GB with.
Change-Id: I86c7b36174e384f106b51fe92f306018fd1dbdf0
Signed-off-by: Hugo Arès <hugo.ares@ericsson.com>
When checking out commits/branches JGit was triggering correctly
configured smudge filters. But when checking out paths (either from
index or from commits) JGit was not triggering smudge filters. Fix
CheckoutCommand to properly call filters.
Bug: 486560
Also-by: Pascal Krause <pascal.krausek@sap.com>
Change-Id: I5ff893054defe57ab12e201d901fe74e1376efea
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Implement the DIR_NO_GITLINKS setting with the same functionality
it provides in cGit.
Bug: 436200
Change-Id: I8304e42df2d7e8d7925f515805e075a92ff6ce28
Signed-off-by: Preben Ingvaldsen <preben@puppetlabs.com>
JGit's Garbage Collector is repacking relevant objects into new
packfiles and is afterwards deleting the now obsolete packfiles. But to
prevent problems caused by race conditions JGit was not deleting
packfiles when they are too young. The same mechanism as for loose
objects and the config parameter gc.pruneExpire was used.
But JGit was reusing the parameter gc.pruneExpire also for packfiles
which may cause a lot of filesystem consumption if gc.pruneExpire was
set to the default of 2 weeks. Only two weeks after packfile creation gc
was allowed to delete this packfile.
This change introduces a new config paramter gc.prunePackExpire with a
default of "1.hour". This parameter is used when packfiles are deleted.
Only packfiles older than the specified time can be deleted.
For loose objects the behaviour is not changed and only the old
parameter gc.pruneExpire is relevant.
Change-Id: I6209efb05678b15153bd22479dc13486907a44f8
This change replaced some tabs with spaces introduced in
change I8b3765713599e34f1411f9bbc7f575ec7c2384e0.
Change-Id: Ia5c23b38c9fbbb46f150e527347b61c64c8d9e87
Signed-off-by: Yuxuan 'fishy' Wang <fishywang@google.com>
With ignoreRemoteFailures set to true, we can ignore remote failures
(e.g. the branch of a project described in the manifest file does not
exist), skip that project and continue to the next one, instead of fail
the whole operation.
Change-Id: I8b3765713599e34f1411f9bbc7f575ec7c2384e0
Signed-off-by: Yuxuan 'fishy' Wang <fishywang@google.com>
This commit introduces a FileModeStrategy to
the FileTreeIterator class. This provides a way to
allow different modes of traversing a file tree;
for example, to control whether or not a nested
.git directory should be treated as a gitlink.
Bug: 436200
Change-Id: Ibf85defee28cdeec1e1463e596d0dcd03090dddd
Signed-off-by: Preben Ingvaldsen <preben@puppetlabs.com>
Allow access to the ObjectId of a DirCacheTree if known for low level
integration code.
Change-Id: I6f05b10c9ac781f5e8b38af4a19e653313c91fa8
Signed-off-by: Philipp Marx <smigfu@googlemail.com>
TreeWalk provides the new method getEolStreamType. This new method can
be used with EolStreamTypeUtil in order to create a wrapped InputStream
or OutputStream when reading / writing files. The implementation
implements support for the git configuration options core.crlf, core.eol
and the .gitattributes "text", "eol" and "binary"
CQ: 10896
Bug: 486563
Change-Id: Ie4f6367afc2a6aec1de56faf95120fff0339a358
Signed-off-by: Ivan Motsch <ivan.motsch@bsiag.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
The reason is that URIish(URL) and URIish(String) make different parsing
of path / rawPath with regard to drive letters. /C:/... for URL and
C:/... for String. This patch fixes the issue.
Change-Id: I8e2013fff30b7bb198ff733c038e21366667b8a0
Signed-off-by: Ivan Motsch <ivan.motsch@bsiag.com>
Capture the internal "want X not valid" state as a specific subclass
of PackProtocolException, allowing this to be more easily identified
in server stack traces and wrapper application code.
Change-Id: I4b1adb7497f396432da420b0f600ad25a261f912
If the client sends a SHA-1 that the server does not recognize echo
this back to the client with an explicit error message instead of
the generic "internal server error".
This was always the intent of the implementation but it was being
dropped on smart HTTP due to the UploadPackServlet catching the
PackProtocolException, discarding the buffered message UploadPack
meant to send, and sending along a generic message instead.
Change-Id: I8d96b064ec655aef64ac2ef3e01853625af32cd1
The RevWalk given in the arguments is not used. According to the
comment at the top of the method, a new RevWalk is intentionally
used in the implementation.
Remove the unused argument.
Change-Id: Iec81a1341d5bf377801475845b96a465753096ef
Signed-off-by: David Pursehouse <david.pursehouse@sonymobile.com>
Attributes MacroExpander implements macros used in git attributes. This
is implemented inside the TreeWalk using a lazy created MacroExpander.
In addition, the macro expander caches the global and info attributes
node in order to provide fast merge of attributes.
Change-Id: I2e69c9fc84e9d7fb8df0a05817d688fc456d8f00
Signed-off-by: Ivan Motsch <ivan.motsch@bsiag.com>
Method parameter names were hiding class members of the same
name.
Change-Id: I182f2715894ac4259b09a371cb4e0eb24f52518a
Signed-off-by: David Pursehouse <david.pursehouse@sonymobile.com>
The disableSslVerify method will be used in the follow up change.
Change-Id: Ie00b5e14244a9a036cbdef94768007f1c25aa8d3
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Implement LfsProtocolServlet handling the "Git LFS v1 Batch API"
protocol [1]. Add a simple file system based LFS content store and the
debug-lfs-store command to simplify testing.
Introduce a LargeFileRepository interface to enable additional storage
implementation while reusing the same protocol implementation.
At the client side we have to configure the lfs.url, specify that
we use the batch API and we don't use authentication:
[lfs]
url = http://host:port/lfs
batch = true
[lfs "http://host:port/lfs"]
access = none
the git-lfs client appends the "objects/batch" to the lfs.url.
Hard code an Authorization header in the FileLfsRepository.getAction
because then git-lfs client will skip asking for credentials. It will
just forward the Authorization header from the response to the
download/upload request.
The FileLfsServlet supports file content storage for "Large File
Storage" (LFS) server as defined by the Github LFS API [2].
- upload and download of large files is probably network bound hence use
an asynchronous servlet for good scalability
- simple object storage in file system with 2 level fan-out
- use LockFile to protect writing large objects against multiple
concurrent uploads of the same object
- to prevent corrupt uploads the uploaded file is rejected if its hash
doesn't match id given in URL
The debug-lfs-store command is used to run the LfsProtocolServlet and,
optionally, the FileLfsServlet which makes it easier to setup a
local test server.
[1]
https://github.com/github/git-lfs/blob/master/docs/api/http-v1-batch.md
[2] https://github.com/github/git-lfs/tree/master/docs/api
Bug: 472961
Change-Id: I7378da5575159d2195138d799704880c5c82d5f3
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Signed-off-by: Sasa Zivkov <sasa.zivkov@sap.com>
The Large File Storage extension specified by GitHub [1] uses SHA-256 to
compute the ID of large files stored by the extension. Hence implement a
SHA-256 abstraction similar to the SHA-1 abstraction used by JGit.
[1] https://git-lfs.github.com/
Bug: 470333
Change-Id: I3a95954543c8570d73929e55f4a884b55dbf1b7a
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>