shithub: furgit

ref: 1734207266752b9464f42c31f7728a7e3c692c50
dir: /research/dynamic_packfiles.txt/

dynamic packfiles to append objects

gc/refcount process punches page-sized holes in them for pages fully
within the space of unwanted objects, after setting a tombstone mark
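a minimal sketch of the "pages fully within the space of unwanted objects"
computation, assuming a 4096-byte page and a dead object occupying the byte
range [start, end). the name punchable_range and the fixed PAGE_SIZE are
illustrative assumptions, not part of the design; the actual hole punching
(e.g. via an OS facility) is left out.

```python
PAGE_SIZE = 4096  # assumption: real code would ask the OS for the page size


def punchable_range(start, end, page_size=PAGE_SIZE):
    """Return (hole_start, hole_end) covering only the whole pages that lie
    entirely inside the dead byte range [start, end), or None if there is
    no such page."""
    hole_start = -(-start // page_size) * page_size  # round start up
    hole_end = (end // page_size) * page_size        # round end down
    if hole_start >= hole_end:
        return None
    return hole_start, hole_end
```

the bytes in [start, hole_start) and [hole_end, end) are the parts of the
unwanted object that stick out past the page boundaries and cannot be punched.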

holes are recorded in an index and re-used
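one way the hole index could look, as a sketch only: a sorted in-memory list
of (offset, length) pairs with first-fit allocation. HoleIndex, add and
allocate are hypothetical names, and the sketch assumes holes never overlap;
it does not coalesce adjacent holes or persist anything.

```python
import bisect


class HoleIndex:
    """Hypothetical in-memory index of punched holes, kept sorted by offset."""

    def __init__(self):
        self.holes = []  # list of (offset, length), non-overlapping

    def add(self, offset, length):
        # Record a newly punched hole.
        bisect.insort(self.holes, (offset, length))

    def allocate(self, size):
        """First-fit: return the offset of a hole that can hold `size` bytes,
        shrinking or removing that hole, or None if nothing fits."""
        for i, (offset, length) in enumerate(self.holes):
            if length >= size:
                if length == size:
                    del self.holes[i]
                else:
                    # Keep the unused tail of the hole available.
                    self.holes[i] = (offset + size, length - size)
                return offset
        return None
```

when allocate returns None the writer would fall back to appending at the end
of the packfile.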

then, if desired, the repack process removes all the punched holes,
along with the leftover bytes of unwanted objects that fall just
outside the page boundaries

repack here is not really git's repack algorithm; it's basically just
defragmentation.

generational bloom filters

idx design
==========

so, let's first get our invariants and patterns clear.

* fixed-length cryptographic object IDs
* essentially uniform key distribution
* exact lookup only, no range scans, no ordered iteration requirements
* reads are extremely important
* writes are mostly append-like
* deletes/tombstones may happen later but are secondary

1st design
----------

* mutable front index
* immutable base index
* periodic merge/compaction into a new base generation
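a rough in-memory sketch of this shape, under assumptions: the front index is
a plain dict, the base is a sorted array searched by bisection, and compaction
builds a fresh base generation and empties the front. TwoLevelIndex and its
methods are illustrative names; on-disk layout and tombstones are ignored.

```python
import bisect


class TwoLevelIndex:
    """Hypothetical front/base index mapping object id -> pack offset."""

    def __init__(self):
        self.front = {}         # mutable front: recent writes
        self.base_ids = []      # immutable base generation: sorted object ids
        self.base_offsets = []  # pack offsets, parallel to base_ids

    def put(self, oid, offset):
        self.front[oid] = offset

    def get(self, oid):
        # Exact lookup only: check the front first, then bisect the base.
        if oid in self.front:
            return self.front[oid]
        i = bisect.bisect_left(self.base_ids, oid)
        if i < len(self.base_ids) and self.base_ids[i] == oid:
            return self.base_offsets[i]
        return None

    def compact(self):
        # Merge front into a new base generation; front entries win on conflict.
        merged = dict(zip(self.base_ids, self.base_offsets))
        merged.update(self.front)
        items = sorted(merged.items())
        self.base_ids = [oid for oid, _ in items]
        self.base_offsets = [off for _, off in items]
        self.front = {}
```

because the old base arrays are never modified in place, a reader holding the
previous generation could keep using it while the new one is built.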



upload-pack/send-pack/defrag
============================

take the current pack, remove dead objects/holes, filter out unwanted
objects, record the new offsets and adjust ofs_deltas (their bases always lie
earlier in the pack, so one forward pass suffices), write the pack back; then
stream the written pack to the client. the two-step process is necessary
because the pack header includes the object count; a custom new protocol
could omit the count and avoid this.
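a sketch of the offset bookkeeping for the first step, with assumptions stated
up front: objects are given as (old_offset, size, base_old_offset_or_None,
keep) tuples sorted by old offset, the new pack starts after the 12-byte pack
header, and every kept delta's base is also kept. plan_rewrite is a
hypothetical name. the sketch treats each object's size as fixed, although in
a real pack the ofs_delta distance is variable-length encoded, so a smaller
distance can shrink the entry.

```python
def plan_rewrite(objects, header_size=12):
    """Return a list of (old_offset, new_offset, new_delta_distance_or_None)
    for the kept objects, in pack order."""
    new_offsets = {}  # old offset -> new offset, for objects already placed
    plan = []
    cursor = header_size
    for old_offset, size, base_old, keep in objects:
        if not keep:
            continue  # dead or filtered object: its space is squeezed out
        new_offsets[old_offset] = cursor
        distance = None
        if base_old is not None:
            # The base lies earlier in the pack, so it has already been
            # remapped. A KeyError here means the base was dropped, which
            # violates the stated assumption.
            distance = cursor - new_offsets[base_old]
        plan.append((old_offset, cursor, distance))
        cursor += size
    return plan
```

len(plan) is then the object count that has to go into the pack header, which
is why the plan must be complete before the first byte is streamed.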