DHT based repository types don't use a java.io.File to name the
repository. Moving the type to a string starts to open up more types
of repository names, making the standard pgm package easier to reuse
on other storage systems.
Change-Id: I262ccc8c01cd6db88f832ef317b0e1e5db2d016a
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
Using a resolver and factory pattern for the anonymous git:// Daemon
class makes transport.Daemon more useful on non-file storage systems,
or in embedded applications where the caller wants more precise
control over the work tasks constructed within the daemon.
Rather than defining new interfaces, move the existing HTTP ones
into transport.resolver and make them generic on the connection
handle type. For HTTP, continue to use HttpServletRequest, and
for transport.Daemon use DaemonClient.
To remain compatible with transport.Daemon, FileResolver needs to
learn how to use multiple base directories, and how to export any
Repository instance at a fixed name.
Change-Id: I1efa6b2bd7c6567e983fbbf346947238ea2e847e
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
The UploadPackLogger interface allows applications that embed
GitServlet or otherwise use UploadPack to service clients to
track and log how PackWriter was used, and what it sent. This
provides more granularity into the request activity than might
be available from the HTTP server logs, helping administrators
to better understand utilization and Git server performance.
Change-Id: I1d36b060eb3385339d5f986e68192789ef70fc4e
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
If the RevFilter doesn't actually require the commit body,
we shouldn't reparse it if the body was disposed. This happens
often inside of UploadPack during common ancestor negotation, the
RevWalk is reset and re-run over roughly the same commit space,
but the bodies are discarded because the commit message is not
relevant to the process.
Change-Id: I87b6b6a5fb269669867047698abf718d366bd002
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Applications like UploadPack reset() and reuse the same RevWalk
multiple times in very rapid succession. Releasing the ObjectReader's
internal state on each use, only to allocate it again on the next
cycle kills performance if the ObjectReader has internal caches, or
even if the Inflater gets returned and pulled from the InflaterCache
too frequently.
Making releasing the ObjectReader the application's responsibility
when it is done with the RevWalk, which most already do by wrapping
their loop in a try/finally block.
Change-Id: I3ad188a719e8d7f6bf27d1a7ca16d465534713f4
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
When UploadPack has computed the merge base between the client's have
set and the want set, its already loaded and parsed all of the
interesting commits that PackWriter needs to transmit to the client.
Switching the RevWalk and its object pool over to be an ObjectWalk
saves PackWriter from needing to re-parse these same commits from the
ObjectDatabase, reducing the startup latency for the enumeration
phase of packing.
UploadPack doesn't want to use an ObjectWalk for the okToGiveUp()
tests because its slower, during each commit popped it needs to cache
the tree into the pendingObjects list, and during each reset() it
discards a bunch of ObjectWalk specific state and reallocates some
internal collections. ObjectWalk was never meant to be rapidly
reset() like UploadPack does, so its perhaps somewhat cleaner to allow
"upgrading" a RevWalk to an ObjectWalk.
Bug: 301639
Change-Id: I97ef52a0b79d78229c272880aedb7f74d0f7532f
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
The peeled reference information for tags is more efficient to
work with than parsing the tag objects, as usually its coming from
the packed-refs file, which stores the peeled information for us.
Rely on the peeled information to decide if the tag should be
included or not, instead of using our RevWalk to parse the object.
Change-Id: I6714a8560a1c04b5578e9c5b469ea3c77188dff3
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
When negotiate() starts there is at least one want, but no haves, and
thus no common base exists. Its not ok to give up yet, the client
should try to find a common base with the server. Avoid scanning our
history along the want chains until we have found at least one commit
in common with the client, this will trigger okToGiveUp to be set to
null, enabling okToGiveUp() to perform the scan.
Bug: 301639
Change-Id: I98a82a5424fd4c9995924375c7910f76ca4f03af
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
If the client presents a common commit on a side branch, and there is
a want for a disconnected branch UploadPack was walking back on the
entire history of the disconnected branch because it never would find
the common commit.
Limit our search back along any given want to be no earlier than the
oldest common commit received via a "have" line from our client. This
prevents us from looking at all of the project history.
Bug: 301639
Change-Id: Iffaaa2250907150d6efa1cf2f2fcf59851d5267d
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
isOutdated returns true iff the memory state differs from the index
file.
Change-Id: If35db06743f5f588ab19d360fd2a18a07c918edb
Signed-off-by: Jens Baumgart <jens.baumgart@sap.com>
When pulling into a local branch that has no upstream configuration,
pull should try to used the default remote ("origin") instead of
throwing an Exception.
Bug: 336504
Change-Id: Ife75858e89ea79c0d6d88ba73877fe8400448e34
Signed-off-by: Mathias Kinzler <mathias.kinzler@sap.com>
If the command contains spaces, it needs to be evaluated by the remote
shell. Quoting the command breaks this, making it impossible to run a
remote command that needs additional options.
Bug: 336301
Change-Id: Ib5d88f0b2151df2d1d2b4e08d51ee979f6da67b5
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
This gets non-commits out of the wantSatisfied() main loop by making
use of the cached SATISIFIED flag and its existing bypass. Anything
that isn't a commit cannot be discovered by the have negotiation, so
its always assumed to be SATISIFIED by the server.
Bug: 301639
Change-Id: I1ef354fbf2e2ed44c9020a4069d7179f2159f19f
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
When the walker resets, its going to scrub the COMMON and SATISIFIED
flags off a commit if the commit is contained within another commit
the client wants. This is common if the client asks for both a
'maint' and 'master' branch, and 'maint' is also fully merged into
'master'.
COMMON shouldn't be scrubbed during reset because its used to control
membership of the commonBase collection, which is a List. commonBase
should technically be a set, but membership is cheaper with a RevFlag.
COMMON appears on a commit reachable from a WANT when there is also a
PEER_HAS flag present, as this is a merge base. Scrubbing this off
when another branch is tested isn't useful.
SATISIFIED is a cache to tell us if wantSatisified() has already
completed for this particular WANT. If it has, there isn't a need to
recompute on that branch. Scrubbing it off 'maint' when we test
'master' just means we would later need to re-test 'maint', wasting
CPU time on the server.
Bug: 301639
Change-Id: I3bb67d68212e4f579e8c5dfb138f007b406d775f
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
okToGiveUpImp() has been missing a ! for a long time. This loop over
wantAll() is looking for an object where wantSatisfied() returns
false, because there is no common merge base present. Unfortunately
it was missing a !, causing the loop to break and return false after
at least one want was satisified.
Bug: 301639
Change-Id: Ifdbe0b22c9cd0a9181546d090b4990d792d70c82
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
JGit did not use sh -c to run the receive-pack or upload-pack programs
locally, which caused errors if these strings contained spaces and
needed the local shell to evaluate them.
Win32 support using cmd.exe /c is completely untested, but seems like
it should work based on the limited information I could get through
Google search results.
Bug: 336301
Change-Id: I22e5e3492fdebbae092d1ce6b47ad411e57cc1ba
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
If a client wants to perform a clone of the repository, it sends
wants, but no haves. There is no point in parsing the want list
within UploadPack, as there won't be a common merge base search.
Instead just defer the parsing to PackWriter, which will do its
own parsing and object enumeration.
If the client does have a "have" set, defer parsing of the want list
until the have list is also parsed, and parse them together in a
single batch queue. This lets the underlying storage system use a
larger lookup batch if there is significant latency involved when
resolving an ObjectId to a RevObject.
Change-Id: I9c30d34f8e344da05c8a2c041a6dc181d8e8bc19
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
The most expensive part of packing a repository for transport to
another system is enumerating all of the objects in the repository.
Once this gets to the size of the linux-2.6 repository (1.8 million
objects), enumeration can take several CPU minutes and costs a lot
of temporary working set memory.
Teach PackWriter to efficiently reuse an existing "cached pack"
by answering a clone request with a thin pack followed by a larger
cached pack appended to the end. This requires the repository
owner to first construct the cached pack by hand, and record the
tip commits inside of $GIT_DIR/objects/info/cached-packs:
cd $GIT_DIR
root=$(git rev-parse master)
tmp=objects/.tmp-$$
names=$(echo $root | git pack-objects --keep-true-parents --revs $tmp)
for n in $names; do
chmod a-w $tmp-$n.pack $tmp-$n.idx
touch objects/pack/pack-$n.keep
mv $tmp-$n.pack objects/pack/pack-$n.pack
mv $tmp-$n.idx objects/pack/pack-$n.idx
done
(echo "+ $root";
for n in $names; do echo "P $n"; done;
echo) >>objects/info/cached-packs
git repack -a -d
When a clone request needs to include $root, the corresponding
cached pack will be copied as-is, rather than enumerating all of
the objects that are reachable from $root.
For a linux-2.6 kernel repository that should be about 376 MiB,
the above process creates two packs of 368 MiB and 38 MiB[1].
This is a local disk usage increase of ~26 MiB, due to reduced
delta compression between the large cached pack and the smaller
recent activity pack. The overhead is similar to 1 full copy of
the compressed project sources.
With this cached pack in hand, JGit daemon completes a clone request
in 1m17s less time, but a slightly larger data transfer (+2.39 MiB):
Before:
remote: Counting objects: 1861830, done
remote: Finding sources: 100% (1861830/1861830)
remote: Getting sizes: 100% (88243/88243)
remote: Compressing objects: 100% (88184/88184)
Receiving objects: 100% (1861830/1861830), 376.01 MiB | 19.01 MiB/s, done.
remote: Total 1861830 (delta 4706), reused 1851053 (delta 1553844)
Resolving deltas: 100% (1564621/1564621), done.
real 3m19.005s
After:
remote: Counting objects: 1601, done
remote: Counting objects: 1828460, done
remote: Finding sources: 100% (50475/50475)
remote: Getting sizes: 100% (18843/18843)
remote: Compressing objects: 100% (7585/7585)
remote: Total 1861830 (delta 2407), reused 1856197 (delta 37510)
Receiving objects: 100% (1861830/1861830), 378.40 MiB | 31.31 MiB/s, done.
Resolving deltas: 100% (1559477/1559477), done.
real 2m2.938s
Repository owners can periodically refresh their cached packs by
repacking their repository, folding all newer objects into a larger
cached pack. Since repacking is already considered to be a normal
Git maintenance activity, this isn't a very big burden.
[1] In this test $root was set back about two weeks.
Change-Id: Ib87131d5c4b5e8c5cacb0f4fe16ff4ece554734b
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
CGit pack-objects displays a totals line after the pack data
was fully written. This can be useful to understand some of
the decisions made by the packer, and has been a great tool
for helping to debug some of that code.
Track some of the basic values, and send it to the client when
packing is done:
remote: Counting objects: 1826776, done
remote: Finding sources: 100% (55121/55121)
remote: Getting sizes: 100% (25654/25654)
remote: Compressing objects: 100% (11434/11434)
remote: Total 1861830 (delta 3926), reused 1854705 (delta 38306)
Receiving objects: 100% (1861830/1861830), 386.03 MiB | 30.32 MiB/s, done.
Change-Id: If3b039017a984ed5d5ae80940ce32bda93652df5
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
It isn't strictly necessary to validate every reference's target
object is reachable in the repository before advertising it to a
client. This is an expensive operation when there are thousands of
references, and its very unlikely that a reference uses a missing
object, because garbage collection proceeds from the references and
walks down through the graph. So trying to hide a dangling reference
from clients is relatively pointless.
Even if we are trying to avoid giving a client a corrupt repository,
this simple check isn't sufficient. It is possible for a reference to
point to a valid commit, but that commit to have a missing blob in its
root tree. This can be caused by staging a file into the index,
waiting several weeks, then committing that file while also racing
against a prune. The prune may delete the blob, since its
modification time is more than 2 weeks ago, but retain the commit,
since its modification time is right now.
Such graph corruption is already caught during PackWriter as it
enumerates the graph from the client's want list and digs back
to the roots or common base. Leave the reference validation also
for that same phase, where we know we have to parse the object to
support the enumeration.
Change-Id: Iee70ead0d3ed2d2fcc980417d09d7a69b05f5c2f
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Because of change I28ae5713, the commit message lost the "into HEAD" and
caused the MergeCommandTest to fail. This change fixes it.
Bug: 336059
Change-Id: Ifac0138c6c6d66c40d7295b5e11ff3cd98bc9e0c
PushCommand now does not set a null credentials provider on
Transport because in this case the default provider is replaced with
null and the default mechanism for providing credentials is not
working.
Bug: 336023
Change-Id: I7a7a9221afcfebe2e1595a5e59641e6c1ae4a207
Signed-off-by: Jens Baumgart <jens.baumgart@sap.com>
When MergeMessageFormatter was given a symbolic ref HEAD which points to
refs/heads/master (which is the case when merging a branch in EGit), it
would result in a merge message like the following:
Merge branch 'a' into HEAD
But it should print the following (as C Git does):
Merge branch 'a'
The solution is to use the leaf ref when checking for refs/heads/master.
Change-Id: I28ae5713b7e8123a0176fc6d7356e469900e7e97
There is no point in pushing all of the files within the edge
commits into the delta search when making a thin pack. This floods
the delta search window with objects that are unlikely to be useful
bases for the objects that will be written out, resulting in lower
data compression and higher transfer sizes.
Instead observe the path of a tree or blob that is being pushed
into the outgoing set, and use that path to locate up to WINDOW
ancestor versions from the edge commits. Push only those objects
into the edgeObjects set, reducing the number of objects seen by the
search window. This allows PackWriter to only look at ancestors
for the modified files, rather than all files in the project.
Limiting the search to WINDOW size makes sense, because more than
WINDOW edge objects will just skip through the window search as
none of them need to be delta compressed.
To further improve compression, sort edge objects into the front
of the window list, rather than randomly throughout. This puts
non-edges later in the window and gives them a better chance at
finding their base, since they search backwards through the window.
These changes make a significant difference in the thin-pack:
Before:
remote: Counting objects: 144190, done
remote: Finding sources: 100% (50275/50275)
remote: Getting sizes: 100% (101405/101405)
remote: Compressing objects: 100% (7587/7587)
Receiving objects: 100% (50275/50275), 24.67 MiB | 9.90 MiB/s, done.
Resolving deltas: 100% (40339/40339), completed with 2218 local objects.
real 0m30.267s
After:
remote: Counting objects: 61549, done
remote: Finding sources: 100% (50275/50275)
remote: Getting sizes: 100% (18862/18862)
remote: Compressing objects: 100% (7588/7588)
Receiving objects: 100% (50275/50275), 11.04 MiB | 3.51 MiB/s, done.
Resolving deltas: 100% (43160/43160), completed with 5014 local objects.
real 0m22.170s
The resulting pack is 13.63 MiB smaller, even though it contains the
same exact objects. 82,543 fewer objects had to have their sizes
looked up, which saved about 8s of server CPU time. 2,796 more
objects from the client were used as part of the base object set,
which contributed to the smaller transfer size.
Change-Id: Id01271950432c6960897495b09deab70e33993a9
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Sigend-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
Some of this code predates making ObjectId.equals() final
and fixing RevObject.equals() to match ObjectId.equals().
It was therefore more complex than it needs to be, because
it tried to work around RevObject's broken equals() rules
by converting to ObjectId in a different collection.
Also combine setUpWalker() and findObjectsToPack() methods,
these can be one method and the code is actually cleaner.
Change-Id: I0f4cf9997cd66d8b6e7f80873979ef1439e507fe
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
The first 'Compressing objects' progress message is wrong, its
actually PackWriter looking up the sizes of each object in the
ObjectDatabase, so objects can be sorted correctly in the later
type-size sort that tries to take advantage of "Linus' Law" to
improve delta compression.
Rename the progress to say 'Getting sizes', which is an accurate
description of what it is doing.
Change-Id: Ida0a052ad2f6e994996189ca12959caab9e556a3
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>