Callers may want to inspect the contents of the cache, which this allows
them to do in a read-only fashion without any locking.
Change-Id: Ifd78e8ce34e26e5cc33e9dd61d70c593ce479ee0
Just as DfsPackDescription describes a pack but does not imply it is
open in memory, a DfsRepositoryDescription describes a repository at a
basic level without it necessarily being open.
Change-Id: I890b5fccdda12c1090cfabf4083b5c0e98d717f6
The docstring was copied from the local filesystem cache code, which
actually attempted to reconfigure the cache on the fly. The DFS cache is
designed to be "reconfigured" exactly once.
Change-Id: Ia0b01f5d6b6b3d3a68d65a5c229ff67c1cede5bc
In practice the DHT storage layer has not been performing as well as
large scale server environments want to see from a Git server.
The performance of the DHT schema degrades rapidly as small changes
are pushed into the repository due to the chunk size being less than
1/3 of the pushed pack size. Small chunks cause poor prefetch
performance during reading, and require significantly longer prefetch
lists inside of the chunk meta field to work around the small size.
The DHT code is very complex (>17,000 lines of code) and is very
sensitive to the underlying database round-trip time, as well as the
way objects were written into the pack stream that was chunked and
stored on the database. A poor pack layout (from any version of C Git
prior to Junio reworking it) can cause the DHT code to be unable to
enumerate the objects of the linux-2.6 repository in a completable
time scale.
Performing a clone from a DHT stored repository of 2 million objects
takes 2 million row lookups in the DHT to locate the OBJECT_INDEX row
for each object being cloned. This is very difficult for some DHTs to
scale, even at 5000 rows/second the lookup stage alone takes 6 minutes
(on local filesystem, this is almost too fast to bother measuring).
Some servers like Apache Cassandra just fall over and cannot complete
the 2 million lookups in rapid fire.
On a ~400 MiB repository, the DHT schema has an extra 25 MiB of
redundant data that gets downloaded to the JGit process, and that is
before you consider the cost of the OBJECT_INDEX table also being
fully loaded, which is at least 223 MiB of data for the linux kernel
repository. In the DHT schema answering a `git clone` of the ~400 MiB
linux kernel needs to load 248 MiB of "index" data from the DHT, in
addition to the ~400 MiB of pack data that gets sent to the client.
This is 193 MiB more data to be accessed than the native filesystem
format, but it needs to come over a much smaller pipe (local Ethernet
typically) than the local SATA disk drive.
I also never got around to writing the "repack" support for the DHT
schema, as it turns out to be fairly complex to safely repack data in
the repository while also trying to minimize the amount of changes
made to the database, due to very common limitations on database
mutation rates..
This new DFS storage layer fixes a lot of those issues by taking the
simple approach for storing relatively standard Git pack and index
files on an abstract filesystem. Packs are accessed by an in-process
buffer cache, similar to the WindowCache used by the local filesystem
storage layer. Unlike the local file IO, there are some assumptions
that the storage system has relatively high latency and no concept of
"file handles". Instead it looks at the file more like HTTP byte range
requests, where a read channel is a simply a thunk to trigger a read
request over the network.
The DFS code in this change is still abstract, it does not store on
any particular filesystem, but is fairly well suited to the Amazon S3
or Apache Hadoop HDFS. Storing packs directly on HDFS rather than
HBase removes a layer of abstraction, as most HBase row reads turn
into an HDFS read.
Most of the DFS code in this change was blatently copied from the
local filesystem code. Most parts should be refactored to be shared
between the two storage systems, but right now I am hesistent to do
this due to how well tuned the local filesystem code currently is.
Change-Id: Iec524abdf172e9ec5485d6c88ca6512cd8a6eafb
* changes:
Cosmetic adjustment of relative date format, do not display "0 months"
Make use of the many date formatting options in the log command
Define a utility class for handling Git date formats
Though it may seem less precise, "0 months" looks bad and the reference
Git implementation also does not display "0 months"
Change-Id: I488e9c97656f9941788ae88d7c5c1562ab6c26f0
The egit history view shows the files associated with a commit by using
a PathFilter. When following renames with a FollowFilter, the PathFilter
cannot be configured anymore because the affected files are simply not
known.
Thus, it should be possible to get to know which files are renamed.
Bug: 302549
Change-Id: I4761e9f5cfb4f0ef0b0e1e38991401a1d5003bea
Introducing a new abstract method is not nice when one
expects other to subclass them. Create default implementations
so old code that implements SystemReader does not break.
The default methods just delegate to the JVM.
Change-Id: I42cdfdcb6b29f7203697a23833dca85185b0b9b3
Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com>
Besides the formats known by git-log(1) we also add "locale"
and "localelocal" that formats dates according to the user's locale.
"locale" does not translate into local timezone, while
localelocal does.
Change-Id: I1c088dcec992c107e43f6c17be4ac9ed6eb428bf
We deleted the entry if there was a file and an index
entry, but not when there was just an index entry. Now
delete the file in both cases since the missing file
just means our worktree is dirty. This affected the
implementation of reset --hard.
Bug: 347574
Change-Id: Ie66fa61303472422830f5e33614e93ad65094e5d
* changes:
UploadPack: Fix races in smart HTTP negotiation
PackWriter: Export more statistics
Do not requeue state vector in stateless RPC fetch
Wrap excessively long line in BasePackFetchConnection
Fix smart HTTP client stream alignment errors
IndexDiff was extended to calculate ignored files and folders.
The calculation only considers files that are NOT in the index.
This functionality is required by the new EGit decorator implementation.
Bug: 359264
Change-Id: I8f09d6a4d61b64aeea80fd22bf3a2963c2bca347
Signed-off-by: Jens Baumgart <jens.baumgart@sap.com>
This allows the following usage pattern:
PathFilterGroup.createFromStrings("path1", "path2");
Change-Id: I589e758cc55873ce75614602e017ac793435e24d
Signed-off-by: Kevin Sawicki <kevin@github.com>
Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
This constant determine the default start-point, if the user
don't want to create a branch from the current HEAD.
Change-Id: Iea944e11e80134fbafc4c47383457d5ed11a4164
Signed-off-by: Manuel Doninger <manuel.doninger@googlemail.com>
Signed-off-by: Chris Aniszczyk <caniszczyk@gmail.com>
Since we replaced GitIndex by DirCache JGit didn't fire
IndexChangedEvents anymore. For EGit this still worked with a high
latency since its RepositoryChangeScanner which is scheduled to
run each 10 seconds fires the event in case the index changes.
This scanner is meant to detect index changes induced by a different
process e.g. by calling "git add" from native git.
When the index is changed from within the same process we should fire
the event synchronously. Compare the index checksum on write to index
checksum when index was read earlier to determine if index really
changed. Use IndexChangedListener interface to keep DirCache decoupled
from Repository.
Change-Id: Id4311f7a7859ffe8738863b3d86c83c8b5f513af
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
The checkout command was producing an inconsistent state of the index
which even confuses native git. The content sha1 of the touched index
entries was updated, but the length and the filemode was not updated.
Later in coding the index entries got automatically corrected (through
Dircache.checkoutEntry()) but the correction was after persisting the
index to disk. So, the correction was lost and we ended up with an index
where length and sha1 don't fit together.
A similar problem is fixed with "lastModified" of DircacheEntry. When
checking out a path without specifying an explicit commit (you want to
checkout what's in the index) the index was not updated regarding
lastModified. Readers of the index will think the checked-out
file is dirty because the file has a younger lastmodified then what's
in the index.
Change-Id: Ifc6d806fbf96f53c94d9ded0befcc932d943aa04
Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>
Signed-off-by: Jens Baumgart <jens.baumgart@sap.com>
Bug: 355205
This is required to make org.eclipse.jgit.test compile when SWTBot isn't
installed which should only be necessary for EGit developers.
Change-Id: I7fc22ca9fc3048cdcf211c56612a3d1b8bed8f6e
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Change-Id: I319f09577b3e04f6c31399fe8e57e9a9ad2c8a6c
Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Change-Id: Ia0e73208b86c45a3d96698e973f6e70ec5cb7303
Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
We should see whether the commit was a regular commit or something
else.
Change-Id: I82d8300cf3c53cb2bdcb6495386aadb803e0c6f7
Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
We can detect index changes using FileSnapshot. This is more efficient
and removes usage of a deprecated class.
Change-Id: I4a679102c9a1bd8e82b9ca93eb9dbbde445e9be4
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>