Previously jgit would attempt to clean git repositories that had not
been committed by calling a non-recursive delete on them, which would
fail as they are directories. This commit addresses that issue in the
following ways.
Repositories are skipped in a default clean, similarly to cgit, and are
only cleaned when the force flag is applied. When the force flag is
applied, repositories are deleted using a recursive delete call. The
force flag and the setForce method are added to CleanCommand to support
this change.
Bug: 498367
Change-Id: Ib6cfff65a033d0d0f76395060bf76719e13fc467
Signed-off-by: Matthaus Owens <matthaus@puppetlabs.com>
This commit adds some test coverage to cleaning a repository with a
submodule, which did not previously exist.
Bug: 498367
Change-Id: Ia5c4e4cc53488800dd486f8556dc57656783f1c4
Signed-off-by: Matthaus Owens <matthaus@puppetlabs.com>
Gerrit's superproject subscription feature uses RefSpecs to formalize
the ACLs of when the superproject subscription feature is allowed.
As this is a slightly different use case than describing a local/remote
pair of refs, we need to be more permissive. Specifically we want to allow:
refs/heads/*
refs/heads/*:refs/heads/master
refs/heads/master:refs/heads/*
Introduce a new constructor, that allows constructing these RefSpecs.
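The difference between the strict and the permissive rule can be
sketched as follows. This is a stdlib-only illustration and not JGit's
RefSpec: the class RefSpecSketch, the Mode enum and its constants are
hypothetical names, and the strict rule is assumed to be "a wildcard on
one side requires a wildcard on the other".

```java
final class RefSpecSketch {
    /** Hypothetical modes: strict pairing versus the permissive ACL use case. */
    enum Mode { REQUIRE_MATCH, ALLOW_MISMATCH }

    static boolean isWildcard(String ref) {
        return ref != null && ref.contains("*");
    }

    /** Returns true if the spec "src[:dst]" is acceptable under the given mode. */
    static boolean isValid(String spec, Mode mode) {
        int colon = spec.indexOf(':');
        String src = colon < 0 ? spec : spec.substring(0, colon);
        String dst = colon < 0 ? null : spec.substring(colon + 1);
        if (mode == Mode.ALLOW_MISMATCH) {
            return true; // wildcard on either side, both sides, or neither
        }
        if (dst == null) {
            return !isWildcard(src); // strict: a lone wildcard has no partner
        }
        return isWildcard(src) == isWildcard(dst); // strict: both or neither
    }
}
```

Under these assumptions all three specs listed above are rejected in
REQUIRE_MATCH mode and accepted in ALLOW_MISMATCH mode.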
Change-Id: I46c0bea9d876e61eb2c8d50f404b905792bc72b3
Signed-off-by: Stefan Beller <sbeller@google.com>
We had a case in Gerrit's superproject subscriptions where
'refs/heads/' was configured with the intention to mean 'refs/heads/*'.
The first expression lacks the '*', so it is not treated as a wildcard.
It was nevertheless considered valid, so the typo was not caught
early.
Refs are not allowed to end with '/' anyway, so add a check for that.
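A minimal sketch of such a check, assuming it is applied to each side of
a spec before it is accepted. The class and method names are
hypothetical and not part of JGit.

```java
final class RefNameCheck {
    /** Returns ref unchanged, or throws if it ends with '/', e.g. "refs/heads/". */
    static String requireNoTrailingSlash(String ref) {
        if (ref.endsWith("/")) {
            throw new IllegalArgumentException(
                    "invalid ref, ends with '/': " + ref);
        }
        return ref;
    }
}
```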
Change-Id: I3ffdd9002146382acafb4fbc310a64af4cc1b7a9
Signed-off-by: Stefan Beller <sbeller@google.com>
Example usage:
$ ./jgit push \
--push-option "Reviewer=j.doe@example.org" \
--push-option "<arbitrary string>" \
origin HEAD:refs/for/master
Stefan Beller has also made an equivalent change to CGit:
http://thread.gmane.org/gmane.comp.version-control.git/299872
Change-Id: I6797e50681054dce3bd179e80b731aef5e200d77
Signed-off-by: Dan Wang <dwwang@google.com>
When Repository.close() decrements the useCount to 0, the cache
currently evicts the repository from WindowCache and RepositoryCache
immediately.
This leads to I/O overhead on busy repositories because pack files and
references are inserted and deleted from the cache frequently.
This commit defers the eviction of a repository from the caches until
the last use of the repository is older than the time to live. The
eviction is handled by a background task that runs periodically.
Add two new configuration parameters:
* core.repositoryCacheExpireAfter: a cache entry is evicted if it has
not been accessed for longer than this time in milliseconds
* core.repositoryCacheCleanupDelay: defines the interval in milliseconds
for running a background task evicting expired cache entries. If set to
-1 the delay is set to min(repositoryCacheExpireAfter, 10 minutes). If
set to 0 the time based eviction is switched off and no background task
is started. If time based eviction is switched off the JVM can still
evict cache entries if heap memory is running low.
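The interplay of the two parameters can be sketched as below. This is
only an illustration of the rules stated above, not the actual
implementation; the class and method names are hypothetical.

```java
import java.util.concurrent.TimeUnit;

final class CacheCleanupDelay {
    static final long TEN_MINUTES_MS = TimeUnit.MINUTES.toMillis(10);

    /**
     * @param expireAfterMs  value of core.repositoryCacheExpireAfter
     * @param cleanupDelayMs value of core.repositoryCacheCleanupDelay
     * @return interval of the background task in milliseconds, or 0 if
     *         time based eviction is switched off
     */
    static long effectiveDelay(long expireAfterMs, long cleanupDelayMs) {
        if (cleanupDelayMs == 0) {
            return 0; // no background task is started
        }
        if (cleanupDelayMs == -1) {
            return Math.min(expireAfterMs, TEN_MINUTES_MS);
        }
        return cleanupDelayMs;
    }

    /** An entry expires once it has been unused for longer than expireAfterMs. */
    static boolean isExpired(long lastAccessMs, long nowMs, long expireAfterMs) {
        return nowMs - lastAccessMs > expireAfterMs;
    }
}
```

For example, with an expiry of one hour and a delay of -1 the task would
run every 10 minutes; with an expiry of one minute it would run every
minute.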
Change-Id: I4a0214ad8b4a193985dda6a0ade63b70bdb948d7
Also-by: Matthias Sohn <matthias.sohn@sap.com>
Also-by: Hugo Arès <hugo.ares@ericsson.com>
Also-by: Sasa Zivkov <sasa.zivkov@sap.com>
If the client sent a well-formed enough request to see it wants to use
side-band-64k for status reporting (meaning it's a modern client), but
any other command record was somehow invalid (e.g. corrupt SHA-1)
report the parsing exception using channel 3. This allows clients to
see the failure and know the server will not be continuing.
git-core and JGit clients send all commands and then start a sideband
demux before sending the pack. By consuming all commands first we get
the client into a state where it can see and respond to the channel 3
server failure.
This behavior is useful on HTTPS connections when the client is buggy
and sent a corrupt command, but still managed to request side-band-64k
in the first line.
Change-Id: If385b91ceb9f024ccae2d1645caf15bc6b206130
Some branches in WorkingTreeIterator.getIndexFileMode() have not been
covered by tests. Enhance the tests to increase test coverage.
Change-Id: I400a221048f0f6cbaa987350eaf998b0ebb50a4e
DfsGarbageCollector will now enforce a maximum time to live (TTL) for
UNREACHABLE_GARBAGE packs. The default TTL is 1 day, which should be
enough time to avoid races with other processes that are inserting
data into the repository.
Change-Id: Id719e6e2a03cfc9a0c0aef8ed71d261dda14bd0c
Signed-off-by: Mike Williams <miwilliams@google.com>
1f86350 added initial support for include.path. Relative paths and
paths with a tilde are not yet supported, but config load was failing
if one of those two unsupported forms was encountered. Another problem
was that config load was failing if the include.path file did not
exist.
Change the behavior to be consistent with native git. Ignore unsupported
or nonexistent include.path.
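The resulting decision can be sketched like this. It is a simplified
stand-in for the config loader, with hypothetical names; it only shows
which include.path values would be loaded and which would be ignored
without failing.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class IncludePathSketch {
    /**
     * Returns true only for an absolute path to an existing file. All
     * other values are ignored by the caller instead of raising an error.
     */
    static boolean shouldLoad(String includePath) {
        if (includePath == null || includePath.isEmpty()) {
            return false;
        }
        if (includePath.startsWith("~")) {
            return false; // tilde expansion is not supported
        }
        Path p = Paths.get(includePath);
        if (!p.isAbsolute()) {
            return false; // relative paths are not supported
        }
        return Files.isRegularFile(p); // nonexistent files are ignored
    }
}
```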
Bug: 495505
Bug: 496732
Change-Id: I7285d0e7abb6389ba6983e9c46021bea4344af68
Signed-off-by: Hugo Arès <hugo.ares@ericsson.com>
TreeWalk has a member 'attr' which caches the attributes for the
current entry. We did not always reset the cache when moving to the
next entry. As a result, when there were no attributes for an entry 'a'
but 'a' was skipped by a TreeWalk filter, TreeWalk stopped looking for
attributes until TreeWalk.next() was called again.
Change-Id: Ied39b7fb5f56afe7a237da17801003d0abe6b1c7
The problem occurs when the checkout wants to create a file 'd/f' but
the working tree contains a dirty file 'd'. In order to create d/f the
file 'd' would have to be deleted and since the file is dirty that
content would be lost. This should lead to a CheckoutConflictException
for d/f when failOnConflict was set to true.
This fix also changes jgit checkout semantics to be more like native
git's checkout semantics. If during a checkout jgit wants to delete a
folder but finds that the working tree contains a dirty file at this
path then JGit will now throw an exception instead of silently keeping
the dirty file. Like in this example:
git init
touch b
git add b
git commit -m addB
mkdir a
touch a/c
git add a/c
git commit -m addAC
rm -fr a
touch a
git checkout HEAD~
Change-Id: I9089123179e09dd565285d50b0caa308d290cccd
Signed-off-by: Rüdiger Herrmann <ruediger.herrmann@gmx.de>
Also-by: Rüdiger Herrmann <ruediger.herrmann@gmx.de>
The native wire protocol sends ref advertisements in the pkt-line
format, which requires encoding the ObjectId and ref name onto a byte
sequence. Busy servers show this is a very high source of garbage,
which pushes the garbage collector harder when there are many refs in
the repository (e.g. 70k, in a Gerrit managed repository).
Optimize the side band advertiser by retaining the CharsetEncoder,
minimizing the amount of temporary garbage built during encoding.
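The general technique of retaining an encoder and its output buffer
looks roughly like the sketch below. It is not the advertiser's code;
the class name and buffer sizing are assumptions made for illustration,
and encoding errors are ignored for brevity.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

final class ReusableLineEncoder {
    // Retained across calls so no encoder is created per line.
    private final CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder();
    private ByteBuffer out = ByteBuffer.allocate(256);

    /**
     * Encodes text into the shared buffer. The returned buffer is only
     * valid until the next call to encode().
     */
    ByteBuffer encode(CharSequence text) {
        int max = (int) Math.ceil(
                text.length() * (double) encoder.maxBytesPerChar());
        if (out.capacity() < max) {
            out = ByteBuffer.allocate(max); // grow only when needed
        }
        out.clear();
        encoder.reset();
        encoder.encode(CharBuffer.wrap(text), out, true);
        encoder.flush(out);
        out.flip();
        return out;
    }
}
```

The only per-call allocation left in this sketch is the small CharBuffer
wrapper around the input.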
Change-Id: I406c654bf82c1eb94b38862da2425e98396134cb
Before this fix, ref directory removal did not work. That was because
the ref lock file was still in the leaf directory at deletion time.
Hence no deep ref directories were ever deleted, which negatively
impacted performance under large directory structure circumstances.
This fix removes the ref lock file before attempting to delete the ref
directory (which includes it). The other deep parent directories are
therefore now successfully deleted in turn, since leaf's content
(lock file) gets removed first.
So, given a structure such as refs/any/directory[/**], this fix now
deletes all empty directories up to -and including- 'directory'. The
'any' directory (e.g.) does not get deleted even if empty, as before.
The ref lock file is still also removed in the calling block's finally
clause, just in case, as before. Such double-unlock brought by this
fix is harmless (a no-op).
A new (private) RefDirectory#delete method is introduced to support
this #pack-specific case; other RefDirectory#delete callers remain
untouched.
Change-Id: I47ba1eeb9bcf0cb93d2ed105d84fea2dac756a5a
Signed-off-by: Marco Miller <marco.miller@ericsson.com>
Git servers supporting HTTP transport can send multiple WWW-Authenticate
challenges [1] for different authentication schemes the server supports.
If authentication fails, now retry with all authentication types
proposed by the server.
[1] https://tools.ietf.org/html/rfc2617#page-3
Bug: 492057
Change-Id: I01d438a5896f9b1008bd6b751ad9c7cbf780af1a
Signed-off-by: Christian Pontesegger <christian.pontesegger@web.de>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
When using a DfsInserter for high-throughput insertion of many
objects (analogous to git-fast-import), we don't necessarily want to
do a random object lookup for each. It'll be faster from the
inserter's perspective to insert the duplicate objects and let a later
GC handle the deduplication.
Change-Id: Ic97f5f01657b4525f157e6df66023f1f07fc1851
Git core learned about the submodule.<name>.shallow option in
.gitmodules files, which is a recommendation to clone a submodule
shallow. A repo manifest may record a clone depth recommendation as
an optional field, which contains more information than a binary
shallow/nonshallow recommendation, so any attempted conversion may be
lossy. In practice the clone depth recommendation is either '1' or doesn't
exist, which is the binary behavior we have in Git core.
Change-Id: I51aa9cb6d1d9660dae6ab6d21ad7bae9bc5325e6
Signed-off-by: Stefan Beller <sbeller@google.com>
Git core learned about attributes in pathspecs:
pathspec: allow querying for attributes
The pathspec mechanism is extended via the new
":(attr:eol=input)pattern/to/match" syntax to filter paths so that it
requires paths to not just match the given pattern but also have the
specified attrs attached for them to be chosen.
(177161a5f7, 2016-05-20)
We intend to use these pathspec attribute patterns for submodule
grouping, similar to the grouping in repo. So the RepoCommand which
translates repo manifest files into submodules should propagate this
information along. This requires writing information to the
.gitattributes file instead of the .gitmodules file. For now we just
overwrite any existing .gitattributes file and do not care about prior
attributes set. If this becomes an issue we need to figure out how to
correctly amend the grouping information to an existing .gitattributes
file.
Change-Id: I0f55b45786b6b8fc3d5be62d7f6aab9ac00ed60e
Signed-off-by: Stefan Beller <sbeller@google.com>
JGit failed to do checkouts when the index contained smudged entries and
autocrlf was on. In such cases the WorkingTreeIterator sometimes
calculated the SHA1 on content which was not correctly filtered: the
SHA1 was computed on content which had gone through a lf->crlf
conversion twice.
We used to tell the treewalk whether it is a checkin or checkout
operation and always use the related filters when reading any content.
If on Windows and autocrlf is true and we do a checkout operation then
we always used a lf->crlf conversion on any text content. That's not
correct. Even during a checkout we sometimes need the crlf->lf
conversion. E.g. when calculating the content-id for working-tree
content we need to use crlf->lf filtering although the overall operation
type is checkout.
Often this bug does not have effects because we seldom compute the
content-id of filesystem content during a checkout. But we do need to
know whether a file is dirty or not before we overwrite it during a
checkout. And if the index entries are smudged we don't trust the index
and compute filesystem-content-sha1's explicitly.
This caused EGit not to be able to switch branches anymore on Windows
when autocrlf was true. EGit denied the checkout because it thought
working tree files were dirty, since the content-sha1s were computed on
wrongly filtered content.
Bug: 493360
Change-Id: I1072a57b4c529ba3aaa50b7b02d2b816bb64a9b8
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
As per [1], but limited to absolute paths indeed. No support yet for
tilde or $HOME expansion. Support for the --[no-]includes options
([1]) is not part of this commit scope either, but those options'
defaults are in effect as described in [1].
[1] https://git-scm.com/docs/git-config
An included path can be a config file that includes other paths in turn.
An exception is thrown if too many recursions (circular includes)
happen because of ill-specified config files.
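A depth-limited include resolution can be sketched as follows. The
in-memory map stands in for config files on disk, and the names and the
limit of 10 are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

final class IncludeResolver {
    static final int MAX_DEPTH = 10; // assumed limit, for illustration

    /**
     * @param includes maps a config file name to the files it includes
     * @return the files in the order in which they would be loaded
     */
    static List<String> resolve(String root, Map<String, List<String>> includes) {
        List<String> loaded = new ArrayList<>();
        load(root, includes, 1, loaded);
        return loaded;
    }

    private static void load(String file, Map<String, List<String>> includes,
            int depth, List<String> loaded) {
        if (depth > MAX_DEPTH) {
            // Circular includes end up here instead of recursing forever.
            throw new IllegalStateException(
                    "too many include recursions at " + file);
        }
        loaded.add(file);
        List<String> children = includes.getOrDefault(
                file, Collections.<String>emptyList());
        for (String child : children) {
            load(child, includes, depth + 1, loaded);
        }
    }
}
```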
Change-Id: I700bd7b7e1625eb7de0180f220c707d8e7b0930b
Signed-off-by: Marco Miller <marco.miller@ericsson.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Added the option to retrieve either merge or non-merge commits in the
LogCommand.
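The distinction rests on the number of parents: a merge commit has two
or more. The sketch below models that rule on plain data; the Commit
class and the Filter constants are hypothetical and do not reflect the
LogCommand API.

```java
import java.util.ArrayList;
import java.util.List;

final class MergeFilterSketch {
    /** Minimal stand-in for a commit: an id plus its number of parents. */
    static final class Commit {
        final String id;
        final int parentCount;

        Commit(String id, int parentCount) {
            this.id = id;
            this.parentCount = parentCount;
        }
    }

    enum Filter { ALL, ONLY_MERGES, NO_MERGES }

    /** Returns the ids of the commits that pass the filter, in log order. */
    static List<String> select(List<Commit> log, Filter filter) {
        List<String> ids = new ArrayList<>();
        for (Commit c : log) {
            boolean isMerge = c.parentCount > 1;
            if (filter == Filter.ALL
                    || (filter == Filter.ONLY_MERGES && isMerge)
                    || (filter == Filter.NO_MERGES && !isMerge)) {
                ids.add(c.id);
            }
        }
        return ids;
    }
}
```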
Change-Id: Ie0e1c515a823f2392783f1a47d385c31230e8167
Signed-off-by: Alcemir Santos <alcemir.santos@gmail.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
There was a bug regarding how JGit handled untracked files when applying
a stash. The problem was that untracked files were applied by doing a
merge of HEAD and the untrackedFiles commit with a merge base of the
stashed HEAD. That's wrong because the untrackedFiles commit has no
parent and contains only the untracked files. Using the stashed HEAD as
merge base leads to unnecessary conflicts on files not even included in
the untrackedFiles commit.
Imagine this graph directly before you want to apply a stash which was
based on 0. You want to apply the stash on current HEAD commit 5.
5 (HEAD,master)
/
0---+
\ \
1---3 (WIP on master)
/
2 (untracked files on master)
Imagine for a specific (tracked) file f
- commit 0 contains X
- HEAD contains Y
- commit 2 (the untracked files) does not contain file f
A merge of 2 and 5 with a merge base of 0 leads to a conflict. The 5
commit wants to modify the file and the 2 commit wants to delete the
file -> conflict.
If no merge base is set then the semantics are correct.
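The per-file reasoning can be reproduced with a textbook three-way
merge on a single path, where null means the file does not exist on
that side. This is a simplified model with hypothetical names, not the
merger JGit uses.

```java
import java.util.Objects;

final class UntrackedMergeSketch {
    static final String CONFLICT = "<conflict>";

    /** Three-way merge of one path; null means "absent on that side". */
    static String merge(String base, String ours, String theirs) {
        if (Objects.equals(ours, theirs)) {
            return ours; // both sides agree
        }
        if (Objects.equals(base, theirs)) {
            return ours; // only our side changed the file
        }
        if (Objects.equals(base, ours)) {
            return theirs; // only their side changed the file
        }
        return CONFLICT; // both sides changed it differently
    }
}
```

With commit 0 as base, merge("X", "Y", null) reports a conflict for
file f. With no base, merge(null, "Y", null) keeps Y, and an untracked
file that exists only in commit 2 is simply added.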
Thanks to Bow for finding this bug and providing the test case.
Bug: 485467
Change-Id: I453fa6ec337f81b2a52c4f51f23044faeec409e6
Also-by: Bow Ruggeri <bow@bow.net>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
When performing a "reset --hard" a checkout is done. The paths are
checked for potential checkout conflicts. But in the end for all
remaining conflicts these files are simply deleted from the working
tree. That's not the right strategy to handle checkout conflicts during
"reset --hard". Instead for every conflict we should simply checkout the
merge commit's content.
This is different from native git's behavior, which reports errors when
during a "reset --hard" a file shows up where a folder was expected.
"warning: unable to unlink d/c.txt: Not a directory"
Why it is like that in native git was asked in
http://permalink.gmane.org/gmane.comp.version-control.git/279482. Unless
it is explained why native git reports this error, JGit should
overwrite the files.
Bug: 474842
Change-Id: I08e23822a577aaf22120c5137eb169b6bd08447b
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
git-apply allows modifying file modes in patched files using either
"new mode" or "new file mode" headers. This patch adds support for
setting files as executables and vice-versa.
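Deciding the executable bit from such a header line can be sketched as
below. The parsing is deliberately minimal and the names are
hypothetical; it assumes the mode is given in octal, as in "100755".

```java
final class PatchModeSketch {
    private static final String NEW_FILE_MODE = "new file mode ";
    private static final String NEW_MODE = "new mode ";

    /**
     * Returns TRUE or FALSE for the executable bit, or null when the
     * line is not a mode header.
     */
    static Boolean executableFromHeader(String line) {
        String mode;
        if (line.startsWith(NEW_FILE_MODE)) {
            mode = line.substring(NEW_FILE_MODE.length());
        } else if (line.startsWith(NEW_MODE)) {
            mode = line.substring(NEW_MODE.length());
        } else {
            return null;
        }
        int bits = Integer.parseInt(mode.trim(), 8);
        return (bits & 0111) != 0; // any of the owner/group/other x bits
    }
}
```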
Change-Id: I24848966b46f686f540a8efa8150b42e0d9c3ad1
Signed-off-by: Nadav Cohen <nadavcoh@gmail.com>
Before this fix, getting the value of 'key' below used to return
value1. This fix makes it so that value3 gets returned instead,
just like native git's get.
[section]
key = value1
key = value2
key = value3
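The "last one wins" rule can be illustrated with a tiny parser. It only
handles plain [section] headers and key = value lines and is not the
JGit Config class; the names are hypothetical.

```java
import java.util.List;

final class LastValueWins {
    /** Returns the last value assigned to key in [section], or null. */
    static String get(List<String> lines, String section, String key) {
        String current = null;
        String result = null;
        for (String raw : lines) {
            String line = raw.trim();
            if (line.startsWith("[") && line.endsWith("]")) {
                current = line.substring(1, line.length() - 1).trim();
            } else if (section.equalsIgnoreCase(current)) {
                int eq = line.indexOf('=');
                if (eq > 0 && line.substring(0, eq).trim().equalsIgnoreCase(key)) {
                    result = line.substring(eq + 1).trim(); // later wins
                }
            }
        }
        return result;
    }
}
```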
Change-Id: Iccb24de9b63c3ad8646c909494ca3f8c9ed6e29c
Signed-off-by: Marco Miller <marco.miller@ericsson.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
The special characters <> and '\n' interfere with parsing of
identities. C git strips these special characters, so we should too.
Rather than allocating extra strings by calling String#trim(), add a
few lines to our sanitization method to perform the same trimming as
described in String's Javadoc.
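A sketch of that approach, assuming trimming means skipping leading and
trailing characters whose code is at most ' ' (the String#trim() rule).
The class and method names are hypothetical.

```java
final class IdentSanitizer {
    /**
     * Appends str to out without '<', '>' and '\n', and without the
     * leading and trailing whitespace String#trim() would remove. No
     * intermediate String is allocated.
     */
    static void sanitizeInto(StringBuilder out, String str) {
        int start = 0;
        while (start < str.length() && str.charAt(start) <= ' ') {
            start++;
        }
        int end = str.length();
        while (end > start && str.charAt(end - 1) <= ' ') {
            end--;
        }
        for (int i = start; i < end; i++) {
            char c = str.charAt(i);
            if (c != '\n' && c != '<' && c != '>') {
                out.append(c);
            }
        }
    }
}
```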
Change-Id: I96edcb93a2fc194ee354d60566d352299742a52f
This might be somewhat surprising behavior to users who might
naturally assume the following invariant:
ident.equals(parseIdent(ident.toExternalString()))
This invariant does not hold since whitespace is only trimmed during
serialization. We don't want to mess with the strings during
initialization, as this is called during the highly-optimized commit
parsing codepath.
Change-Id: I081a603f0ac0e33167462244779b0ff3ad51e80c
We've found in Gerrit Code Review that it is common to pass around
both an ObjectReader (or more commonly a RevWalk wrapping one) and an
ObjectInserter. These code paths often assume that the ObjectReader
can read back any objects created by the ObjectInserter without
flushing. However, we previously had no way to enforce that constraint
programmatically, leading to hard-to-spot problems.
Provide a solution by exposing the ObjectInserter that created an
ObjectReader, when known. Callers can either continue passing both
objects and check:
reader.getCreatedFromInserter() == inserter
or they can just pass around ObjectReader and extract the inserter
when it's needed (checking that it's not null at usage time).
Change-Id: Ibbf5d1968b506f6b47030ab1b046ffccb47352ea
When CheckoutCommand or MergeCommand is called, the treewalks have not
been prepared to support clean/smudge filters in all situations. Fix
this.
Bug: 491505
Change-Id: Iab5608049221c46d06812552ab97299e44d59e64
Such hunks are identifiable by a zero value for "new start line". Prior
to this fix, JGit threw an ArrayIndexOutOfBoundsException on such
patches.
Change-Id: I4f3deb5e5f41a08af965fcc178d678c77270cddb
Signed-off-by: Jonathan Schneider <jkschneider@gmail.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Change-Id: I5b3b7b0633354d5ccf0c6c320c0df9c93fdf8eeb
Signed-off-by: Ned Twigg <ned.twigg@diffplug.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
CommitCommand already provided a method to set the comment which should
be written into the reflog. The underlying RefUpdate class supported
skipping the reflog entry. But through the CommitCommand API it was
not possible to prevent writing a reflog entry. Fix this and allow
creating commits which don't occur in the reflog.
Change-Id: I193c53de71fb5958ea749c4bfa8360a51acc9b58
RepositoryCache has 2 methods to remove a repository from the cache but
they are never called when a repository is closed. Users of the cache
were expected to call one of those 2 methods, but how could they have
called them at the proper time without visibility of the repository
usage count?
Ideally, I would have reworked the RepositoryCache to wrap any
repository it opens in a class that would be responsible to unregister
them from the cache when it's really closed, i.e. when usage counter
reaches 0. The problem preventing the wrapping solution is the
RepositoryCache.register method, which allows registering an already
opened repository in the cache. Such repositories cannot be wrapped
because callers are still holding a reference to the unwrapped
repository.
Document that the RepositoryCache.close method removes the repository
from the cache as well as closing it, and rework the
RepositoryCache.unregister method to only remove the repository from the
cache. Use the latter to unregister a repository when
Repository.doClose is executed.
Change-Id: Ia364816e4da8d7b6cfa72f10758ca31aa8a1f9db
Signed-off-by: Hugo Arès <hugo.ares@ericsson.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Repository has a usage counter that is initialized to 1 at
instantiation and this counter is decremented when Repository.close
method is called. There is also a Repository.incrementOpen method that
RepositoryCache uses to increment the usage count when it's returning a
repository that is already opened.
The problem was that RepositoryCache was incrementing the usage count
for repositories that it just opened or registered. The usage count was
2 when it should have been 1.
Incrementing the usage count is now only done for repositories that are
served from the cache.
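The corrected counting rule can be modelled in a few lines. The classes
below are simplified stand-ins with hypothetical names, not the real
Repository and RepositoryCache.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

final class UseCountSketch {
    /** Stand-in for a repository with a usage counter. */
    static final class Repo {
        final AtomicInteger useCnt = new AtomicInteger(1); // 1 at creation

        void incrementOpen() {
            useCnt.incrementAndGet();
        }

        void close() {
            useCnt.decrementAndGet();
        }
    }

    private final Map<String, Repo> cache = new HashMap<>();

    /** Increments the count only when the repository comes from the cache. */
    synchronized Repo open(String key) {
        Repo repo = cache.get(key);
        if (repo == null) {
            repo = new Repo(); // freshly opened: the count is already 1
            cache.put(key, repo);
        } else {
            repo.incrementOpen(); // cache hit: one more user
        }
        return repo;
    }
}
```

In this model every open() is balanced by exactly one close(), so the
count returns to 0 once the last user is done.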
This bug is causing a slow memory increase on our Gerrit server until
the server becomes slow. Even though the RepositoryCache is using
SoftReference, it seems that the JVM is not garbage collecting the
repositories because it's not yet on the edge of being out of memory.
To test this change, I replicated all repositories (11k) from the
Gerrit master to one slave. The memory used by the Gerrit master after
this test was 10GB without this change and 3.5GB with it.
Change-Id: I86c7b36174e384f106b51fe92f306018fd1dbdf0
Signed-off-by: Hugo Arès <hugo.ares@ericsson.com>
When checking out commits/branches JGit was triggering correctly
configured smudge filters. But when checking out paths (either from
index or from commits) JGit was not triggering smudge filters. Fix
CheckoutCommand to properly call filters.
Bug: 486560
Also-by: Pascal Krause <pascal.krausek@sap.com>
Change-Id: I5ff893054defe57ab12e201d901fe74e1376efea
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Implement the DIR_NO_GITLINKS setting with the same functionality
it provides in cGit.
Bug: 436200
Change-Id: I8304e42df2d7e8d7925f515805e075a92ff6ce28
Signed-off-by: Preben Ingvaldsen <preben@puppetlabs.com>
JGit's Garbage Collector is repacking relevant objects into new
packfiles and is afterwards deleting the now obsolete packfiles. But to
prevent problems caused by race conditions JGit was not deleting
packfiles when they are too young. The same mechanism as for loose
objects and the config parameter gc.pruneExpire was used.
But JGit was reusing the parameter gc.pruneExpire also for packfiles,
which may consume a lot of disk space if gc.pruneExpire was set to the
default of 2 weeks: gc was only allowed to delete a packfile two weeks
after its creation.
This change introduces a new config parameter gc.prunePackExpire with a
default of "1.hour". This parameter is used when packfiles are deleted.
Only packfiles older than the specified time can be deleted.
For loose objects the behaviour is not changed and only the old
parameter gc.pruneExpire is relevant.
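The effect of the two thresholds can be sketched with plain java.time
values. The constants mirror the defaults named above; the class and
method names are hypothetical and the check is a simplification of what
gc actually does.

```java
import java.time.Duration;
import java.time.Instant;

final class PruneExpirySketch {
    /** Default of gc.pruneExpire, used for loose objects. */
    static final Duration PRUNE_EXPIRE = Duration.ofDays(14);
    /** Default of gc.prunePackExpire, used for packfiles. */
    static final Duration PRUNE_PACK_EXPIRE = Duration.ofHours(1);

    /** A file may be deleted only if it is older than the expiry time. */
    static boolean mayDelete(Instant lastModified, Instant now, Duration expire) {
        return lastModified.isBefore(now.minus(expire));
    }
}
```

A packfile that is two hours old may be deleted under PRUNE_PACK_EXPIRE,
while the same age would still be protected under PRUNE_EXPIRE.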
Change-Id: I6209efb05678b15153bd22479dc13486907a44f8
This commit introduces a FileModeStrategy to
the FileTreeIterator class. This provides a way to
allow different modes of traversing a file tree;
for example, to control whether or not a nested
.git directory should be treated as a gitlink.
Bug: 436200
Change-Id: Ibf85defee28cdeec1e1463e596d0dcd03090dddd
Signed-off-by: Preben Ingvaldsen <preben@puppetlabs.com>
TreeWalk provides the new method getEolStreamType. This new method can
be used with EolStreamTypeUtil in order to create a wrapped InputStream
or OutputStream when reading / writing files. The implementation
supports the git configuration options core.autocrlf and core.eol, and
the .gitattributes "text", "eol" and "binary".
CQ: 10896
Bug: 486563
Change-Id: Ie4f6367afc2a6aec1de56faf95120fff0339a358
Signed-off-by: Ivan Motsch <ivan.motsch@bsiag.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>