shithub: neoventi

Download patch

ref: 14883c018e531ff84ae4996268f3091e2e4cad45
parent: 22806f6ca81790a672b6b5c23916c15c211cc944
author: Noam Preil <noam@pixelhero.dev>
date: Sun Sep 7 02:00:25 EDT 2025

cache: change sentinel to -1, fix false hit

--- a/cache.c
+++ b/cache.c
@@ -70,6 +70,8 @@
 	}
 	// alignment
 	assert(((usize)buckets&0b111111) == 0);
+	for(int i = 0; i < BUCKETS; i += 1)
+		memset(buckets[i].index, 0xFF, 32);
 	bucketend = buckets + BUCKETS;
 }
 
@@ -81,7 +83,6 @@
 	int nbucket, i;
 	bucket_t *bucket;
 	u32int index;
-	fprint(2, "CL EVICT\n");
 	while(1){
 		nbucket = rand() % BUCKETS;
 		bucket = &buckets[nbucket];
@@ -88,10 +89,10 @@
 		if(!canlock(bucket))
 			continue;
 		i = rand() % BUCKETSIZE;
-		if(bucket->index[i] != 0){
+		if(bucket->index[i] != 0xFFFFFFFF){
 			// Found an entry; evict it!
 			index = bucket->values[i*3] | (bucket->values[i*3+1]<<8) | (bucket->values[i*3+2]<<16);
-			bucket->index[i] = 0;
+			bucket->index[i] = 0xFFFFFFFF;
 			unlock(bucket);
 			return index;
 		}
@@ -146,10 +147,14 @@
 static int
 trybucket(bucket_t *bucket, int *i, u32int key)
 {
+	if(key == 0xFFFFFFFF){
+		print("internal error: invalid key");
+		abort();
+	}
 	for(*i = 0; *i < BUCKETSIZE; *i += 1){
 		if(bucket->index[*i] == key)
 			return 1;
-		if(bucket->index[*i] == 0)
+		if(bucket->index[*i] == 0xFFFFFFFF)
 			return 0;
 	}
 	return -1;
@@ -202,13 +207,11 @@
 			// miss, but there's an empty slot in this bucket.
 			cacheindex = grabfree();
 			*buf = data+(usize)cacheindex*8192;
-			fprint(2, "CL %d %d => key %d:  %08x\n", arenaindex, blockindex, key, *buf);
 			put(bucket, i, key, cacheindex);
 			return 0;
 		case 1:
 			// already exists
 			get(bucket, i, buf);
-			fprint(2, "CL %d %d => key %d: HIT %08x\n", arenaindex, blockindex, key, *buf);
 			return 1;
 		}
 	}
--- a/notebook
+++ b/notebook
@@ -3069,4 +3069,139 @@
 
 ...I'm still cheating the cache though, but I'm not sure how necessary it is. Probably necessary.
 
-...if I vac the same data multiple times, I'm getting completely different scores each time. This is... not how vac is suposed to work. Each one seems valid, but... that's really, really not how this is intended to work.
+...if I vac the same data multiple times, I'm getting completely different scores each time. This is... not how vac is suposed to work. Each one seems valid, but... that's really, really not how this is intended to work... unless it is.
+
+It is; can repro on venti.
+
+For now, writing back the arena trailer is most important. Once that's in, 'just' testing and bugfixes.
+
+...probably worth taking a minute to reread all the venti man pages to make sure I'm not forgetting anything about the disk format, with how long it's been.
+
+Man pages are useless, /sys/doc/venti was a good refresher though. Nothing really new, but definitely helpful.
+
+Arena needs to be prepended to the directory before the trailer with clump info _before_ considering it as "fully in" the arena.
+
+and then, tbqh, the disk layer needs major cleanup. Maybe a rewrite. neoventi is too prototype-y. With knowledge from first pass, should be able to do a quick rewrite, but - well, should have been done with the prototype ages ago so lets just do that first before making more claims about how easy things are.
+
+Add a vtwritetrailer as the last part of writing a clump to the arena. Need to be able to load and modify the clump info from that trailer...
+
+venti has a function writeclumpinfo() that does this, as a function of the arena, the clump number, and the ClumpInfo to write. cib can be fetched based on clump number (getcib), then pack it, then just put the cib back.
+
+Their getcib is a function, as well, of whether or not it's writing - because it needs to pass a mode to their disk layer. That's not relevant for us, though - it probably should be? Disk layer design needs some reworking.
+
+For now, let's add a clean layer for the trailer in particular.
+
+Actual interfaces we need:
+
+- Read all blocks from an arena's trailer (used on startup to cross-reference against the index and add missing entries)
+- Add a block to the trailer, for use during writing
+- Read one back for checksum validation?
+
+The first two are the only definites.
+
+vtarenareadtrailer(VtArena *)
+
+Expected use is by loading code, where we're not looking at the data - we want it to be read _into the index_ - success code return only, then. And, should probably maintain the trailer vs directory terminology used in the original venti paper.
+
+int vtarenareaddirectory(VtArena*);
+int vtarenawritedirectory(VtArena*, int clumpindex, char *buf);
+
+ClumpInfo contains type (ignored by neoventi), size, uncsize, and score. This - what??? We're not actually including enough info here to find the clumps at all, are we? Ah, no, uncsize :)
+
+Start at arena base. Read trailer in reverse; for each block:
+	insertindex(score, addr)
+	addr += uncsize
+
+And we can use the diskstats vs memstats to compute which ones actually need processing.
+
+venti's process:
+
+- cib = getcib(arena, clumpindex, 1, &r)
+- packclumpinfo(clumpinfo, &cib->data->data[cib->offset])
+- putcib(arena, cib)
+
+getcib computes block index as `clump / arena->clumpmax`. Offset is `(clump - (block*clumpmax)) * ClumpInfoSize`. That offset is the offset of the _next write_ to the directory block.
+
+then, block is grabbed as `arena->base + arena->size - (block+1)*blocksize`.
+
+...TODO, we should keep the active directory block within the arena at all times. No reason to even be going through the cache lookups when we'll need it for every write.
+
+Actually, doing that later makes no sense. Better to do that now and have a cleaner interface from the beginning. I... guess I can clean up the disk interface a bit more while I'm at it?
+
+Arenas need to be able to lock, to prevent simultaneous writes. TODO: support striping writes across arenas for higher write throughput? What else do we actually _need_?
+
+- Read clumps arena
+	- Internally, built on reading blocks, and combining the needed parts
+- Write to arena
+	- Appends to blocks until full
+	- Grab the live block within the cache, read it back
+	- Advance blocks, flush, needs to be reasonably safe
+	- Write to directory - does this need to be its own API? Could just be part of arenawrite.
+		- Cleaner to have it separated internally, at least.
+- Scan trailer
+- Internally, we need to know the line between the data log and the directory, so that we can bound reads
+- Seal arena: compute checksum, mark as finished
+
+That's really all. We're not going to respect ctime or wtime; the concepts are at best misleading here, and can be implemented in the layer above venti if needed. The version is only needed when loading the arena, to make sure we know how. We probably don't need the name after loading either? Might be useful for logging though, no reason to drop it.
+
+TODO: dynamic index arena support. Needed for >32TiB instances. Large, but very plausible.
+
+vtreadarena is currently computing `u64 end = arena->size - arenadirsize(arena, arena->arenastats.clumps)` on every read. This is silliness; it's a constant for each arena, and that dirsize API is silliness too. Let's just store the directory offset internally.
+
+Arena APIs are mostly okay, now. Index APIs need work.
+
+Okay, tested with 	if(arena->indexstats.clumps != arena->arenastats.clumps && !vtarenareaddirectory(arena))
+		sysfatal("unable to load arena directory");
+
+on startup, but since we're not updating arenastats, this doesn't fail properly. When _should_ we be doing that update? On every write? Probably should just do  tthat for now, and then clean it after.
+
+Okay, so writing needs to update:
+
+- The data log
+- The arena stats in the header
+- The arena directory
+- The index
+
+It needs to be crash safe. Data log comes first. If we have that but nothing else, we'll reuse the space, and no client will have a reference yet. Then, arena directory. If we have both of those but not arena stats, we can scan on startup, validate the score, and finish the job. Then, arena stats. That'll make recovery easier. Index is last.
+
+So, we should unconditionally scan the directory; it may have entries missing from arenastats.
+
+...though, it's easier to do that detection the other way? Data log, then arena stats? If we have both of those, we can easily see there's something missing from the directory, scan through the log, and update the directory with that info. But, we don't _want_ to have to scan the log, that's what the directory is _for_! - but scanning the directory would show us where in the log the missing data is.
+
+And that makes testing and detection easy. Okay.
+
+BUG: venti/read e242ed3bffccdf271b7fbaf34ed72d089537b42f
+venti/read: could not read block: not connected
+
+This is correct; I just reformatted. But, the error isn't being handed from the server to the client properly. Later problem.
+
+BUG: if I don't force preads, cache is still giving bad results. ffs.
+
+This one's important enough to look into now. If I force cachelookup return code to 0 even on hit, trace is this on a brand-new venti, with a theoretically empty cache:
+
+CL 1 875 => key 66411:  001639b8
+reading from disk for readlookup...
+entry found in bucket: arena 0, offset 0
+CL 0 0 => key 0: HIT 001619b8
+
+Wait what???
+
+OH. OHHHHH
+
+static int
+trybucket(bucket_t *bucket, int *i, u32int key)
+{
+	if(key == 0)
+		sysfatal("internal error: invalid key");
+	for(*i = 0; *i < BUCKETSIZE; *i += 1){
+		if(bucket->index[*i] == key)
+			return 1;
+		if(bucket->index[*i] == 0)
+			return 0;
+	}
+	return -1;
+}
+
+Just added that assertion to the top and it's immediatley failing. That invariant is fine for arenas, not indices, and I didn't document it a year ago when I was working on this, oops. Fortunately, cache keys are largely arbitrary, they just need to be unique. Can just do bucket+1 and should be fine.
+
+But, that's not great. That's a clunky API. This shouldn't be necessary. Much cleaner: change the 'key missing' value to 0xFFFFFFFF (-1, but it's unsigned, so.), init it that way on startup, and make 0 a valid key entry.
--