A linear search is somewhat acceptable for only 4 recent chunks, but
a HashMap based lookup would be better. The table will have 16 slots
by default and given the hashCode() of ChunkKey is derived from the
SHA-1 of the chunk, each chunk will fall into its own bucket within
the table and thus evaluate only 1 entry during lookup instead of 4.
Some users may also want to devote more memory to the recent chunks,
in which case expanding this list to a longer length will help to
reduce chunk faults, but would increase search time. Using a HashMap
will help this code to scale to larger sizes better.
Change-Id: Ia41b7a1cc69ad27b85749e3b74cbf8d0aa338044
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
The RecentChunks cache assumes there is always at least one recent
chunk in the maxSize that it receives from the DhtReaderOptions.
Ensure that is true by requiring the size to be at least 1.
Running with 0 recent chunk cache is very a bad idea, often
during commit walking the parents of a commit will be found
on the same chunk as the commit that was just accessed. In
these cases its a good idea to keep that last chunk around
so the parents can be quickly accessed.
Change-Id: I33b65286e8a4cbf6ef4ced28c547837f173e065d
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
If an internal exception occurs while packing and the request
needs to abort, the HTTP response might already be committed due
to progress message having already been delivered to the client.
This prevents UploadPackServlet from resetting the response and
sending back an HTTP 500 response.
Try to catch all exceptions and report internal errors over the
sideband stream or as an ERR command during the initial ACK/NAK
negotiation phase. This allows JGit to transmit an error message
that the user will receive on their console without needing to
worry about resetting the (already gone) HTTP response.
Change-Id: Ie393fb8bb55d2b79ab1276adf71c781c1807f9fe
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
The Prefetcher may have loaded a chunk that is a fragment, if the
DhtReader is scanning the Prefetcher's chunks for a particular
object fragment chunks will be missing the index and NPE during
the findOffset() call into the index itself.
Change-Id: Ie2823724c289f745655076c5209acec32361a1ea
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
If a fetch or push needs to apply more than a few references
to the local repository it may take more than 0.25 seconds to
process all of the updates. This is especially true in the DHT
storage system during an initial push of a project with many tags.
The backend database may need to use a transaction to ensure each
tag reference creation is unique, and there may be large delays
caused by these transactions.
Change-Id: Ib11a077adfbd525253e425d327f2e2c2380804c7
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Originally I put the first two digits of the object SHA-1 into the
start of a row key to try and spread the load of objects around a DHT
service. Unfortunately this tends to not work as well as I had hoped.
Servers reading a repository need to contact every node in a DHT
cluster if the cluster tries to evenly distribute the object rows.
This is a lot of connections, especially if the cluster has many
backend storage servers. If the library has an open connection
limit (possibly due to JVM file descriptor limitations) it may need
to open and close a lot of connections to access a repository,
rather than being able to reuse the same connection to a handful
of backend servers. This results in a lot of connection thrashing
for some DHT type databases, and is inefficient.
Some DHTs are able to operate even if part of the database space
is currently unavailable. For example, a DHT service might assign
some section of the key space to a node, and then fail that section
over to another node when the primary is noticed as being offline.
During that failover period that section of the key space is not
available, but other sections hosted by other backends are still
ready for service. Spreading keys all over the cluster makes it
likely that any single backend being temporarily down means the
entire cluster is down, rather than only some.
This is a massive schema change, but it should improve relability
and performance for any DHT system.
Change-Id: I6b65bfb4c14b6f7bd323c2bd0638b49d429245be
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
* stable-1.0:
Prepare post JGit v1.0.0.201106090707-r builds
JGit v1.0.0.201106090707-r
Include about.html files in maven build
Prepare post v1.0.0.201106081625-r builds
JGit v1.0.0.201106081625-r
Add missing about.html files to all shipped bundles
Prepare post v1.0.0.201106071701-r builds
JGit v1.0.0.201106071701-r
* stable-1.0:
Prepare post v1.0.0.201106011211-rc3 builds
JGit v1.0.0.201106011211-rc3
Remove incubation marker
blame: Compute the origin of lines in a result file
BlameGenerator digs through history and discovers the origin of each
line of some result file. BlameResult consumes the stream of regions
created by the generator and lays them out in a table for applications
to display alongside of source lines.
Applications may optionally push in the working tree copy of a file
using the push(String, byte[]) method, allowing the application to
receive accurate line annotations for the working tree version. Lines
that are uncommitted (difference between HEAD and working tree) will
show up with the description given by the application as the author,
or "Not Committed Yet" as a default string.
Applications may also run the BlameGenerator in reverse mode using the
reverse(AnyObjectId, AnyObjectId) method instead of push(). When
running in the reverse mode the generator annotates lines by the
commit they are removed in, rather than the commit they were added in.
This allows a user to discover where a line disappeared from when they
are looking at an older revision in the repository. For example:
blame --reverse 16e810b2..master -L 1080, org.eclipse.jgit.test/tst/org/eclipse/jgit/storage/file/RefDirectoryTest.java
( 1080) }
2302a6d3 (Christian Halstrick 2011-05-20 11:18:20 +0200 1081)
2302a6d3 (Christian Halstrick 2011-05-20 11:18:20 +0200 1082) /**
2302a6d3 (Christian Halstrick 2011-05-20 11:18:20 +0200 1083) * Kick the timestamp of a local file.
Above we learn that line 1080 (a closing curly brace of the prior
method) still exists in branch master, but the Javadoc comment below
it has been removed by Christian Halstrick on May 20th as part of
commit 2302a6d3. This result differs considerably from that of C
Git's blame --reverse feature. JGit tells the reader which commit
performed the delete, while C Git tells the reader the last commit
that still contained the line, leaving it an exercise to the reader
to discover the descendant that performed the removal.
This is still only a basic implementation. Quite notably it is
missing support for the smart block copy/move detection that the C
implementation of `git blame` is well known for. Despite being
incremental, the BlameGenerator can only be run once. After the
generator runs it cannot be reused. A better implementation would
support applications browsing through history efficiently.
In regards to CQ 5110, only a little of the original code survives.
CQ: 5110
Bug: 306161
Change-Id: I84b8ea4838bb7d25f4fcdd540547884704661b8f
Signed-off-by: Kevin Sawicki <kevin@github.com>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
* stable-1.0:
DHT: Support removing a repository name
DHT: Fix thread-safety issue in AbstractWriteBuffer
jgit.sh: Implement pager support
Change EditList to extend ArrayList
Ensure the HTTP request is fully consumed
Make sure test repositories are closed
Fix CloneCommand not to fetch into remote tracking branches when bare
Update Eclipse IP log for 1.0
Change-Id: I6340d551482e1dda01f82496296d2038b07fa68b
The first step to deleting a repository from the DHT storage is to
remove the name binding in the RepositoryIndexTable, making the
repository unavailable for lookup.
Change-Id: I469bf92f4bf2f555a15949569b21937c14cb142b
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
There is a data corruption issue with the 'running' list if a
background thread schedules something onto the buffer while the
application thread is also using it.
Change-Id: I5ba78b98b6632965d677a9c8f209f0cf8320cc3d
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
If the command is either `diff` or `log`, there is often a lot of
lines of output. Run these commands through $GIT_PAGER, $PAGER, or
`less` in order to make it easier to browse the output on a terminal.
Change-Id: I18b87ea4acf404b94788f2ac2101812bd13e6a0f
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
There is no reason for this type to contain an ArrayList and try to
hide the implementation. It only slows down execution by adding an
extra layer of method dispatch to each invocation.
Instead subclass from ArrayList.
Change-Id: Ifbb9c7060c2fe3d5a7397c1aa85fbade14088637
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Some servlet containers require the servlet to read the EOF marker
from the input stream before a response can be output if the stream
is using "Transfer-Encoding: chunked"... which is typical for any
sort of large push to a repository over smart HTTP.
Ensure the EOF is always read by the PackParser when it is handling
the stream, and fail fast if there is more data present than expected
since this does indicate a protocol error.
Also ensure the EOF is read by UploadPack before it starts to output
a partial response using packing progress meters.
Change-Id: I131db9dea20b2324cb7c3272a814f21296bc64bd
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Some repositories created during tests are not added to the 'toClose'
list in LocalDiskRepositoryTestCase. Therefore when the tests end
we may have open FileHandles and on Windows this may cause the
tests to fail because we can't delete those files.
This is fixed by adding the possibility to explicitly add
repositories to the list of repos which are closed automatically.
Change-Id: I1261baeef4c7d9aaedd7c34b546393bfa005bbcc
Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>
When cloning into a bare repository we should not create remote
tracking branches (e.g refs/remotes/origin/testX). Branches of the
remote repository should but fetched into into branches of the same
name (e.g refs/heads/testX). Also add the noCheckout option which
would prevent checkout after fetch.
Change-Id: I5d4cc0389f3f30c53aa0065f38119af2a1430909
Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>
RefDirectory was not using FileSnapshot correctly in all places. This
is fixed with this commit. Additionally the constructors for the
different types of refs have been changed to take a FileSnapshot
instead of a modification time.
Change-Id: Ifb6a59e87e8b058a398c38cdfb9d648f0bad4bf8
Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>
The teardown faile on Windows because the repos were not closed.
Change-Id: I16cf5645558680029682f898386b061796948237
Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com>
CQ "4876" is jgit's CQ for usage of protobuf, CQ "5135"
is the corresponding Orbit CQ.
Change-Id: I300cf2b5758c7da9c18494325f2f38bb3744e459
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
RefData now uses a sequence number as part of the field, ensuring
that updates always increase the sequence number by one whenever
a reference is modified.
Attaching a sequence number to RefData will help with storing
reference log entries during updates. As the sequence number should
be unique within the reference name space, log entries can be keyed
by the sequence number and remain unique. Making this work over
reference delete-create cycles will require an additional RefTable
API to return the oldest sequence number previously used in the
reference log to seed the recreated reference.
Change-Id: I11cfff2a96ef962e57f29925a3eef41bdbf9f9bb
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
The standard Google distribution of Protocol Buffers in Java is better
maintained than TinyProtobuf, and should be faster for most uses. It
does use slightly more memory due to many of our key types being
stored as strings in protobuf messages, but this is probably worth the
small hit to memory in exchange for better maintained code that is
easier to reuse in other applications.
Exposing all of our data members to the underlying implementation
makes it easier to develop reporting and data mining tools, or to
expand out a nested structure like RefData into a flat format in a SQL
database table.
Since the C++ `protoc` tool is necessary to convert the protobuf
script into Java code, the generated files are committed as part of
the source repository to make it easier for developers who do not have
this tool installed to still build the overall JGit package and make
use of it. Reviewers will need to be careful to ensure that any edits
made to a *.proto file come in a commit that also updates the
generated code to match.
CQ: 5135
Change-Id: I53e11e82c186b9cf0d7b368e0276519e6a0b2893
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
Performance testing has indicated the per-process ChunkCache isn't
very effective for the DHT storage implementation. If a server is
using the DHT storage backend, it is most likely part of a larger
cluster where requests are distributed in a round-robin fashion
between the member servers.
In such a scenario there is insufficient data locality between
requests to get a good hit ratio on the per-process ChunkCache. A low
hit ratio means the cache is actually hurting performance by eating up
memory that could otherwise be used for transient request data, and
increasing pressure on the GC when it needs to find free space.
Remove all of the ChunkCache code. Installations that want to cache
(to reduce database usage) should wrap their Database with a
CacheDatabase and use a network based CacheServer.
I left the ChunkCache in the original DHT storage commit because I
wanted to document in the history of the project that its probably
worth *not* having, but leave open a door for someone to revert this
change if they find otherwise at a later date.
Change-Id: I364d0725c46c5a19f7443642a40c89ba4d3fdd29
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
Adds a class which can be used to calculates a SHA1 of the diff
associated with a patch, similar to git patch-id.
In this version whitespace is not ignored.
Change-Id: I421d15ea905e23df543082786786841cbe3ef10d
Signed-off-by: Stefan Lay <stefan.lay@sap.com>
Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
EGit change Iba3b87293c22e5fe7d989fc312184aa7463c4387 is also required
to make this work for EGit.
Change-Id: Iedc80e133e66d72e78ff0980b6e12634f75eca36
Signed-off-by: Carsten Pfeiffer <carsten.pfeiffer@gebit.de>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Since this change may affect performance and memory consumption on every
access to a loose ref I explicitly made it a RFC to collect opinions.
Previously RefDirectory.scanRef() was not detecting an update of a
loose ref when the update didn't changed the modification time of
the backing file. RefDirectory cached loose refs and the way to detect
outdated cache entries was to compare lastmodification timestamp on the
file representing the ref. If two updates to the same ref happen faster
than the filesystem-timer granularity (for linux this is 2 seconds)
there is the possiblity that we don't detect the update.
Because of this bug EGit's PushOperationTest only works with 2 second
sleeps inside.
This change let RefDirectory use FileSnapshot to detect such situations.
FileSnapshot helps to remember when a file was last read from disk and
therefore enables to decide when to load a file from disk although
modification time has not changed.
Change-Id: I03b9a137af097ec69c4c5e2eaa512d2bdd7fe080
Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>
Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>