JGit Storage on DHT
-------------------

This implementation still has some pending issues:

* DhtInserter must skip existing objects

  DirCache writes all trees to the ObjectInserter, letting the
  inserter figure out which trees we already have and which are
  new. DhtInserter should buffer trees into a chunk, then, before
  writing the chunk to the DHT, do a batch lookup to find any
  existing ObjectInfo records. If any exist, the chunk should be
  compacted to eliminate those objects, and if there is room in the
  chunk for more objects, it should go back to the DhtInserter to
  be filled further before flushing (see the sketch at the end of
  this item).

  This implies the DhtInserter needs to work on multiple chunks at
  once, and may need to combine chunks together when there is more
  than one partial chunk.
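
  A rough sketch of that buffering flow, with a hypothetical
  ChunkBuffer type (only DhtInserter, ObjectInfo and the batch
  lookup idea come from this note):

    import java.util.LinkedHashMap;
    import java.util.Map;
    import java.util.Set;

    class ChunkBuffer {
      static final int MAX_SIZE = 1 << 20; // assumed chunk budget

      final Map<String, byte[]> objects = new LinkedHashMap<>();
      int size;

      void append(String id, byte[] data) {
        objects.put(id, data);
        size += data.length;
      }

      // Drop objects the batch lookup found already in the DHT.
      void compactAway(Set<String> existing) {
        for (String id : existing) {
          byte[] removed = objects.remove(id);
          if (removed != null)
            size -= removed.length;
        }
      }

      boolean hasRoomForMore() {
        return size < MAX_SIZE;
      }
    }

  Before flushing, the inserter would issue a single batch lookup
  for objects.keySet() against the object index (one round-trip
  instead of one per object), compactAway() the hits, and keep
  filling while hasRoomForMore() is true.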

* DhtPackParser must check for collisions

  Because ChunkCache blindly assumes any copy of an object is a
  valid copy, DhtPackParser needs to validate all new objects at
  the end of its importing phase, before it links the objects into
  the ObjectIndexTable. Most objects won't already exist, but some
  may, and those that do must either be removed from their chunk or
  have their content validated byte-for-byte.

  Removal from a chunk just means deleting it from the chunk's
  local index and not writing it to the global ObjectIndexTable.
  This leaves a hole of wasted space in the chunk, which isn't very
  useful. Fortunately, objects that fit fully within one chunk are
  small, so they should be easy to inflate and double-check. Big
  objects span multiple chunks, and their new chunks can simply be
  deleted from the ChunkTable, leaving the original chunks in
  place.
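
  For the small, single-chunk case the double check could be as
  simple as inflating and comparing, sketched here with standard
  java.util.zip (the surrounding plumbing is assumed):

    import java.io.ByteArrayOutputStream;
    import java.util.Arrays;
    import java.util.zip.DataFormatException;
    import java.util.zip.Inflater;

    // Sketch: inflate a small object's deflated chunk data and
    // compare it byte-for-byte with the copy we already have.
    static boolean sameContent(byte[] deflated, byte[] existingRaw)
        throws DataFormatException {
      Inflater inf = new Inflater();
      try {
        inf.setInput(deflated);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!inf.finished()) {
          int n = inf.inflate(buf);
          if (n == 0 && inf.needsInput())
            return false; // truncated stream cannot match
          out.write(buf, 0, n);
        }
        return Arrays.equals(out.toByteArray(), existingRaw);
      } finally {
        inf.end();
      }
    }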

  Deltas can be checked quickly by inflating the delta and checking
  only the inserted literal text, comparing that to the existing
  data in the repository. Unfortunately, the repository is likely
  to use a different delta representation, which means at least one
  of the two will need to be fully inflated to check the delta
  against.
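
  A sketch of that quick check, walking the pack-style delta
  opcodes and comparing only the inserted literals against the
  existing content at the same target offsets (all surrounding
  plumbing assumed):

    import java.util.Arrays;

    // Sketch: true if every literal the delta inserts matches the
    // existing object's bytes at the same target offset. Copy
    // commands are skipped; only their sizes are decoded.
    static boolean insertsMatch(byte[] delta, byte[] existing) {
      int p = skipVarInt(delta, 0); // base size (unused here)
      p = skipVarInt(delta, p);     // result size (unused here)
      int target = 0;
      while (p < delta.length) {
        int cmd = delta[p++] & 0xff;
        if ((cmd & 0x80) != 0) {
          // Copy from base: skip offset bytes, decode size.
          for (int bit = 0x01; bit <= 0x08; bit <<= 1)
            if ((cmd & bit) != 0)
              p++;
          int size = 0, shift = 0;
          for (int bit = 0x10; bit <= 0x40; bit <<= 1, shift += 8)
            if ((cmd & bit) != 0)
              size |= (delta[p++] & 0xff) << shift;
          target += (size == 0) ? 0x10000 : size;
        } else {
          // Insert literal of length cmd (1..127).
          if (target + cmd > existing.length
              || !Arrays.equals(
                  Arrays.copyOfRange(delta, p, p + cmd),
                  Arrays.copyOfRange(existing, target, target + cmd)))
            return false;
          p += cmd;
          target += cmd;
        }
      }
      return true;
    }

    static int skipVarInt(byte[] buf, int p) {
      while ((buf[p++] & 0x80) != 0)
        continue;
      return p;
    }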

* DhtPackParser should handle small-huge-small-huge

  Multiple chunks need to be open at once, in case we get a bad
  pattern of small-object, huge-object, small-object, huge-object.
  In this case the small objects should be put together into the
  same chunk, to prevent having too many tiny chunks. This is
  tricky to do with OFS_DELTA: a long OFS_DELTA requires all prior
  chunks to be closed out so we know their lengths.
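
  A sketch of the routing idea, reusing the hypothetical
  ChunkBuffer from the first sketch (the threshold and structure
  are illustrative only):

    import java.util.ArrayDeque;
    import java.util.Deque;

    // Sketch: keep one open chunk for small objects while huge
    // objects stream through chunks of their own, so the small
    // objects coalesce instead of yielding many tiny chunks.
    class ChunkRouter {
      static final int HUGE = 1 << 20; // assumed size threshold

      ChunkBuffer small = new ChunkBuffer();
      final Deque<ChunkBuffer> open = new ArrayDeque<>();

      void add(String id, byte[] data) {
        if (data.length >= HUGE) {
          ChunkBuffer huge = new ChunkBuffer();
          huge.append(id, data);
          open.add(huge);
        } else {
          small.append(id, data);
          if (!small.hasRoomForMore()) {
            open.add(small);
            small = new ChunkBuffer();
          }
        }
        // OFS_DELTA caveat: an offset that reaches back across the
        // still-open small chunk cannot be encoded until the chunks
        // ahead of it are closed and their lengths are known.
      }
    }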

* RepresentationSelector performance is bad on Cassandra

  The 1.8 million batch lookups done for linux-2.6 kill Cassandra;
  it cannot handle this read load.

* READ_REPAIR isn't fully accurate

  There are a lot of places where the generic DHT code should be
  helping to validate that the local replica is consistent and,
  where it is not, helping the underlying storage system heal the
  local replica by reading from a remote replica and putting the
  data back to the local one. Most of this should be handled in the
  DHT SPI layer, but the generic DHT code should be giving better
  hints during get() method calls.
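
  One hypothetical shape for such a hint, not an interface this
  code currently defines:

    // Sketch: the generic layer tells the SPI how much staleness a
    // read can tolerate, so the SPI knows when to read-repair.
    enum ReadHint {
      LOCAL_OK,      // any local replica answer is acceptable
      MUST_BE_FRESH  // compare replicas; repair a stale local copy
    }

    interface DhtTable {
      // With MUST_BE_FRESH the implementation would consult remote
      // replicas and write the winning value back locally.
      byte[] get(byte[] key, ReadHint hint);
    }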

* LOCAL / WORLD writes

  Many writes should be done locally first, before they replicate
  to the other replicas, as they might be backed out on an abort.

  Likewise some writes must take place across sufficient replicas
  to ensure the write is not lost... and this may include ensuring
  that earlier local-only writes have actually been committed to
  all replicas. This committing might happen in the background
  automatically after the local write (e.g. Cassandra will start to
  send writes made by one node to other nodes, but doesn't promise
  they finish), but parts of the code may need to force this
  replication to complete before the higher-level git operation
  ends.
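
  A hypothetical shape for these write levels and the barrier that
  forces earlier local writes out:

    import java.util.concurrent.CompletableFuture;

    // Sketch: writes carry a durability level; a barrier lets the
    // caller force earlier LOCAL writes to all replicas before an
    // operation-ending write (such as a ref update) proceeds.
    enum WriteLevel {
      LOCAL, // fast, may be rolled back on abort
      WORLD  // must be durable on sufficient replicas
    }

    interface DhtWriteBuffer {
      void put(byte[] key, byte[] value, WriteLevel level);

      // Completes only once every prior LOCAL write has been
      // replicated.
      CompletableFuture<Void> replicateBarrier();
    }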

* Forks/alternates

  Forking is common, but we should avoid duplicating content into
  the fork if the base repository has it. This requires some sort
  of change to the key structure so that chunks are owned by an
  object pool, and the object pool owns the repositories that use
  it. GC proceeds at the object pool level rather than the
  repository level, but might want to take some of the reference
  namespace into account to avoid placing forked, less-common
  content near primary content.
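
  One possible key structure, purely illustrative:

    // Sketch: scope chunk keys by object pool rather than by
    // repository, so forks that share a pool share chunks. All
    // names are hypothetical.
    class ChunkKey {
      final String poolId;  // the pool owns chunks and runs GC
      final String chunkId; // unique within the pool

      ChunkKey(String poolId, String chunkId) {
        this.poolId = poolId;
        this.chunkId = chunkId;
      }

      String rowKey() {
        // Repositories resolve to a pool; forks of the same base
        // resolve to the same poolId, and thus the same chunk rows.
        return poolId + "." + chunkId;
      }
    }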