Browse Source
Always use streaming (for SHA-checksum & collision detection) when indexing whole blobs, regardless of their size. Positives: * benefits of bugfix #312868 will apply to all runtimes, without additional conf for mem-constrained JVMs (5MB huge for some) * no byte array allocation (re-uses readBuffer instead of allocating new full-size array) * mildly better overall performance (given the usual blob-does-not-need-collision-checking case) * removes unnecessary code Negative: * doubles the disk IO for a blob comparision (comparitively rare occurance) I perf-tested a range of threshold sizes against a random selection of packfiles I found on my harddrive, the results are here: https://spreadsheets.google.com/ccc?key=tLCQElyyd2RKN9QevfvgwGQ&hl=en_GB#gid=1 My interpretation of the results is that the streaming size threshold isn't beneficial (actually seems to be very slightly detrimental) -so we should just get rid of it. This tallies with some of the comments Shawn & I had for the default value of streamFileThreshold in the review for I862afd4c: http://egit.eclipse.org/r/#patch,sidebyside,2040,2,org.eclipse.jgit/src/org/eclipse/jgit/transport/IndexPack.java The perf-test code is here: https://gist.github.com/735402 It's a bit scruffy but basically does 10 runs (in randomised order) for each threshold size on various packfiles, waiting a second between each pack-indexing to allow GC to catch up. I know it's not perfect - proper perf testing is hard to do :-)stable-0.10
roberto
14 years ago
2 changed files with 1 additions and 9 deletions
Loading…
Reference in new issue