* stable-4.9:
BatchRefUpdate: repro racy atomic update, and fix it
Delete unused FileTreeIteratorWithTimeControl
Fix RacyGitTests#testRacyGitDetection
Change RacyGitTests to create a racy git situation in a stable way
Silence API warnings
Change-Id: Id5bf44645655fca40ad22bb1f1ad20a7c2e8f6db
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
The latest revision includes various fixes to allow the build
to work with recent versions of Bazel.
Change-Id: I72c100b99762010946d9b2784286af560bbdf185
Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>
FileTreeIteratorWithTimeControl was deleted in a024759, but was
not removed from the BUILD file, thus causing the bazel build to
fail.
Change-Id: I892c0ffcac947298d0d6009374ee2c5d9afefb66
Signed-off-by: David Pursehouse <david.pursehouse@gmail.com>
(cherry picked from commit e54fde8616)
PackedBatchRefUpdate was creating a new packed-refs list that was
potentially unsorted. This would be papered over when the list was
read back from disk in parsePackedRef, which detects unsorted ref
lists on reading, and sorts them. However, the BatchRefUpdate also
installed the new (unsorted) list in-memory in
RefDirectory#packedRefs.
With the timestamp granularity code committed to stable-5.1, we can
more often accurately decide that the packed-refs file is clean, and
will return the erroneous unsorted data more often. Unluckily timed
delays also cause the file to be clean, hence this problem was
exacerbated under load.
The symptom is that refs added by a BatchRefUpdate would stop being
visible directly after they were added. In particular, the Gerrit
integration tests uses BatchRefUpdate in its setup for creating the
Admin group, and then tries to read it out directly afterward.
The tests recreates one failure case. A better approach would be to
revise RefList.Builder, so it detects out-of-order lists and
automatically sorts them.
Fixes https://bugs.eclipse.org/bugs/show_bug.cgi?id=548716 and
https://bugs.chromium.org/p/gerrit/issues/detail?id=11373.
Bug: 548716
Change-Id: I613c8059964513ce2370543620725b540b3cb6d1
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
The only usage of this test iterator was removed in df637928d. Hence
delete this iterator and associated test.
Change-Id: I47710133ec3edc675c21db210960c024982668c6
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
(cherry picked from commit a024759cf5)
This test case assumed file system timestamp resolution of 1 second. On
filesystems with a finer resolution this test fails since the index
entry is only smudged if the file index entry's lastModified and the
lastModified of the git index itself are within the same filesystem
timer tick. Fix this by ensuring that these timestamps are identical
which should work for any filesystem timer resolution.
Bug: 548188
Change-Id: Id84d59e1cfeb48fa008f8f27f2f892c4f73985de
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Signed-off-by: Christian Halstrick <christian.halstrick@sap.com>
(cherry picked from commit 1159f9dd7c)
By using File#setLastModified, we can create a racy git situation
stably.
Tested with --runs_per_test=100
Bug: 526111
Change-Id: I60b3632d353e19f335668325aa603640be423f58
Signed-off-by: Masaya Suzuki <masayasuzuki@google.com>
(cherry picked from commit df637928d2)
Move the handling of cached user and system config to getSystemConfig
and getUserConfig methods and revert the implementation of
openSystemConfig and openUserConfig to the old stateless
implementation.
This ensures the open methods respect the passed-in parent config, which
may be different on each invocation. Additionally, returning a new
instance matches the behavior of the previous implementation of the
default system reader, which downstream callers may be depending on.
Move the implementation of the new caching methods getSystemConfig and
getUserConfig up to SystemReader. This avoids that we break the ABI for
subclasses of SystemReader.
Also see [1] which fixed a similar problem with Gerrit's custom
SystemReader.
[1] https://gerrit-review.googlesource.com/c/gerrit/+/225458
Change-Id: If54a2491932d8fc914d4649cb73c9e837c5b8ad0
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
This ensures that only one instance of user and one instance of system
config is set.
Change-Id: Idd00150f91d2d40af79499dd7bf8ad5940f87c4e
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
deleteChildren was called on directory instead of gitDir, leading to a
potential null pointer exception if the git directory existed initially.
Bug: 550340
Change-Id: Iafc3b2961253a99862a59e81c7371f7bc564b412
Signed-off-by: Adrien Bustany <adrien-xx-eclipse@bustany.org>
Ensure we use the same type when comparing seconds since the epoch.
This does not prevent that in 2038 timestamps in seconds since the epoch
stored in a 32 bit integer will overflow. Integer.MAX_VALUE translates
to 2038-01-19T03:14:07Z. After this date we'll have an issue since we
store seconds since the epoch in a 32 bit integer in some places.
Bug: 319142
Change-Id: If0c03003d40b480f044686e2f7a2f62c9f4e2fe1
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Replace the two int variables smudge_s and smudge_ns by an Instant and
use the new method DirCacheEntry.mightBeRacilyClean(Instant).
Change-Id: Id70adbb0856a64909617acf65da1bae8e2ae934a
Signed-off-by: Michael Keppler <Michael.Keppler@gmx.de>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
So far the git configuration and the system wide git configuration were
always reloaded when jgit accessed these global configuration files to
access global configuration options which are not in the context of a
single git repository. Cache these configurations in SystemReader and
only reload them if their file metadata observed using FileSnapshot
indicates a modification.
Change-Id: I092fe11a5d95f1c5799273cacfc7a415d0b7786c
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Signed-off-by: Thomas Wolf <thomas.wolf@paranor.ch>
FS determines FileStore attributes in a background thread and tries to
save the results to the global git configuration. This competed with
LocalDiskRepositoryTestCase#setup trying to save changes to the same
file requiring the same lock. This frequently led to one of the threads
failing to acquire the lock.
Fix this by first initiating determination of FileStore attributes which
then uses a MockSystemReader not using a userConfig stored to disk which
avoids this race for the lock.
Change-Id: I30fcd96bc15100f8ef9b2a9eb3320bb5ace97c67
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
The existing javadoc was copied from another method and not adapted.
Change-Id: I39a7e5d719b2c379de9bd1a4710a55a73700c6f0
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
- fix handling of interrupts in FileStoreAttributes#saveToConfig
- increase retry wait time to 100ms
- don't wait after last retry
- dont retry if failure is caused by another exception than
LockFailedException
Change-Id: I108c012717d2bcce71f2c6cb9cf0879de704ebc2
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Tests shall not modify ~/.gitconfig. When running tests with bazel this
test failed since bazel isolates tests in a sandbox.
Change-Id: I7dd092afd14972da58a95eb7c200d353f0959fa1
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
The method org.eclipse.jgit.util.FS.supportsAtomicCreateNewFile()
should default to true as mentioned in docs [1]
org.eclipse.jgit.util.FS_POSIX.supportsAtomicCreateNewFile() method
will set the value to false if the git config
core.supportsatomiccreatenewfile is not set.
It should default to true if the configuration is undefined.
[1]
4169a95a65/org.eclipse.jgit/src/org/eclipse/jgit/util/FS_POSIX.java (L372)
Bug: 544164
Change-Id: I16ccf989a89da2cf4975c200b3228b25ba4c0d55
Signed-off-by: Vishal Devgire <vishaldevgire@gmail.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
It missed to call the setup() method of its super class which prepares
the MockSystemReader
Change-Id: I39858749f8d0115fc6ac7edc8847ffb2bbc85c33
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
FS#getFileStoreAttributes used the real userConfig and not the mocked
one. This led to test errors when running tests with Bazel since it
sandboxes tests which prevents they can write to ~/.gitconfig.
Fix this by first preparing the MockedSystemReader and the mocked config
before calling FS#getFileStoreAttributes.
Also fix ConfigTest which broke due to this change since it inherits
from LocalDiskRepositoryTestCase and calls its setup method which was
changed here. We can no longer assert by comparing plain text since FS
adds FileStoreAttributes to the mocked userConfig. Also the default
options seen by this test changed since we now use a mocked config.
Change-Id: I76bc7c94953fe979266147d3b309a68dda9d4dfe
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
If we use the default system reader FileStoreAttributes cannot persist
attributes in userConfig when tests run in Bazel due to sandboxing.
Hence we need to ensure that all tests use MockSystemReader (and
especially a mocked userConfig).
Change-Id: Ic1ad8e2ec5a150c5433434a5f6667d6c4674c87d
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
This ensures we don't try to persist MockConfig using its superclasses
save() method which fails with an NPE since MockConfig has no backing
file.
Change-Id: Ifba2d24c9438bb30d3828ed31a4c131f940b45eb
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
We can't add this method to the super class StoredConfig since that
abstracts from filesystem storage. MockSystemReader.MockConfig is a
StoredConfig and is also used by tests for dfs based storage. Hence
remove this leaky abstraction.
This implies we always use the fallback FileStoreAttributes which means
a config file modification is considered racy within the first 2
seconds. This should not be an issue since typically configs change
rarely and re-reading a config within the racy period is relatively
cheap since configs are small.
Change-Id: Ia2615addc24a7cadf3c566ee842c6f4f07e159a5
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
This was enabled unintentionally in 06fc6c7c and spams the test logs. We
can enable this when needed.
Change-Id: I9f3042c0e285ff236be65fcc02bdcfdb90efc3af
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
- use slf4j-simple for logging in test runs
- for log configuration see
https://www.slf4j.org/api/org/slf4j/impl/SimpleLogger.html
Change-Id: I9f0a532644b31162c867cd0d63f083296eaf6be5
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
- use FS.DETECTED instead of db.getFS() since the ssh config is
typically in a different place than the repository, the same is used in
OpenSshConfig
- reduce unnecessary repeated writes by introducing wait for one tick of
the file time resolution
Change-Id: Ifac915e97ff420ec5cf8e2f162e351f9f51b6b14
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Increase the safety factor to 2.5x for extra safety if max of measured
timestamp resolution and measured minimal racy threshold is < 100ms, use
1.25 otherwise since for large filesystem resolution values the
influence of finite resolution of the system clock should be negligible.
Before, not yet using the newly introduced minRacyThreshold measurement,
the threshold was 1.1x FS resolution, and we could issue the
following sequence of events,
start
create-file
read-file (currentTime)
end
which had the following timestamps:
create-file 1564589081998
start 1564589082002
read 1564589082003
end 1564589082004
In this case, the difference between create-file and read is 5ms,
which exceeded the 4ms FS resolution, even though the events together
took just 2ms of runtime.
Reproduce with:
bazel test --runs_per_test=100 \
//org.eclipse.jgit.test:org_eclipse_jgit_internal_storage_file_FileSnapshotTest
The file system timestamp resolution is 4ms in this case.
This code assumes that the kernel and the JVM use the same clock that
is synchronized with the file system clock. This seems plausible,
given the resolution of System.currentTimeMillis() and the latency for
a gettimeofday system call (typically ~1us), but it would be good to
justify this with specifications.
Also cover a source of flakiness: if the test runs under extreme load,
then we could have
start
create-file
<long delay>
read
end
which would register as an unmodified file. Avoid this by skipping the
test if end-start is too big.
[msohn]:
- downported from master to stable-5.1
- skip test if resolution is below 10ms
- adjust safety factor to 1.25 for resolutions above 100ms
Change-Id: I87d2cf035e01c44b7ba8364c410a860aa8e312ef
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Since we now measure file time resolution we can use it to replace the
hard coded wait time of 25ms. FileSnapshot#equals will return true until
the mtime of the old (o) and the new FileSnapshot (n) differ by at least
one file time resolution.
Change-Id: Icb713a80ce9eb929242ed083406bfb6650c72223
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Cache FileStoreAttributeCache entries since looking up FileStore for a
file may be expensive on some platforms.
Implement a simple LRU cache based on ConcurrentHashMap using a simple
long counter to order access to cache entries.
Change-Id: I4881fa938ad2f17712c05da857838073a2fc4ddb
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Signed-off-by: Marc Strapetz <marc.strapetz@syntevo.com>
Also-By: Marc Strapetz <marc.strapetz@syntevo.com>
Use the fallback timestamp resolution as already described in the
javadoc of these methods. Using zero file timestamp resolution doesn't
make sense.
Change-Id: Iaad2a0f99c3be3678e94980a0a368181b6aed38c
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
To enable persisting the minimal racy threshold per FileStore add a
new config option to the user global git configuration:
- Config section is "filesystem"
- Config subsection is concatenation of
- Java vendor (system property "java.vendor")
- Java version (system property "java.version")
- FileStore's name, on Windows we use the attribute volume:vsn instead
since the name is not necessarily unique.
- separated by '|'
e.g.
"AdoptOpenJDK|1.8.0_212-b03|/dev/disk1s1"
The same prefix is used as for filesystem timestamp resolution, so
both values are stored in the same config section
- The config key for minmal racy threshold is "minRacyThreshold" as a
time value, supported time units are those supported by
DefaultTypedConfigGetter#getTimeUnit
- measure for 3 seconds to limit runtime which depends on hardware, OS
and Java version being used
If the minimal racy threshold is configured for a given FileStore the
configured value is used instead of measuring it.
When the minimal racy threshold was measured it is persisted in the user
global git configuration.
Rename FileStoreAttributeCache to FileStoreAttributes since this class
is now declared public in order to enable exposing all attributes in one
object.
Example:
[filesystem "AdoptOpenJDK|11.0.3|/dev/disk1s1"]
timestampResolution = 7000 nanoseconds
minRacyThreshold = 3440 microseconds
Change-Id: I22195e488453aae8d011b0a8e3276fe3d99deaea
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>
Also-By: Marc Strapetz <marc.strapetz@syntevo.com>
By running FileSnapshotTest#detectFileModified we found that the sum of
measured filesystem timestamp resolution and measured clock resolution
may yield a too small interval after a file has been modified which we
need to consider racily clean. In our tests we didn't find this behavior
on all systems we tested on, e.g. on MacOS using APFS and Java 8 and 11
this effect was not observed.
On Linux (SLES 15, kernel 4.12.14-150.22-default) we collected the
following test results using Java 8 and 11:
In 23-98% of 10000 test runs (depending on filesystem type and Java
version) the test failed, which means the effective interval which needs
to be considered racily clean after a file was modified is larger than
the measured file timestamp resolution.
"delta" is the observed interval after a file has been modified but
FileSnapshot did not yet detect the modification:
"resolution" is the measured sum of file timestamp resolution and clock
resolution seen in Java.
Java version filesystem failures resolution min delta max delta
1.8.0_212-b04 btrfs 98.6% 1 ms 3.6 ms 6.6 ms
1.8.0_212-b04 ext4 82.6% 3 ms 1.1 ms 4.1 ms
1.8.0_212-b04 xfs 23.8% 4 ms 3.7 ms 3.9 ms
1.8.0_212-b04 zfs 23.1% 3 ms 4.8 ms 5.0 ms
11.0.3+7 btrfs 98.1% 3 us 0.7 ms 4.7 ms
11.0.3+7 ext4 98.1% 6 us 0.7 ms 4.7 ms
11.0.3+7 xfs 98.5% 7 us 0.1 ms 8.0 ms
11.0.3+7 zfs 98.4% 7 us 0.7 ms 5.2 ms
Mac OS
1.8.0_212 APFS 0% 1 s
11.0.3+7 APFS 0% 6 us
The observed delta is not distributed according to a normal gaussian
distribution but rather random in the observed range between "min delta"
and "max delta".
Run this test after measuring file timestamp resolution in
FS.FileAttributeCache to auto-configure JGit since it's unclear what
mechanism is causing this effect.
In FileSnapshot#isRacyClean use the maximum of the measured timestamp
resolution and the measured "delta" as explained above to decide if a
given FileSnapshot is to be considered racily clean. Add a 30% safety
margin to ensure we are on the safe side.
Change-Id: I1c8bb59f6486f174b7bbdc63072777ddbe06694d
Signed-off-by: Matthias Sohn <matthias.sohn@sap.com>