diff --git a/index/scorch/README.md b/index/scorch/README.md
index cec982eb..690e7d3d 100644
--- a/index/scorch/README.md
+++ b/index/scorch/README.md
@@ -183,7 +183,7 @@ An ASCII art example:
              [ 0 1 1 ]
 
          Compute bitset segment-1-deleted-by-2:
-             [ 0 0 0 ]
+             [ 0 ]
 
          OR it with previous (nil) still just nil
 
@@ -418,3 +418,93 @@ state: 2, 4, 8
 2-X, 4-X, 8-X, nil
 merger finishes: new segment Y, is not valid, need to be recomputed
 
+
+### Bolt Segment Proposal
+
+Bucket
+
+"f" field storage
+
+    Key         Val
+    field name  field id (var uint16)
+
+    // TODO field location bits
+
+"d" term dictionary storage
+    Key                    Val
+    field id (var uint16)  Vellum FST (mapping term to posting id uint64)
+
+"p" postings list storage
+    Key                      Val
+    posting id (var uint64)  Roaring Bitmap Serialization (doc numbers) - see FromBuffer
+
+"x" chunked data storage
+    Key                    Val
+    chunk id (var uint64)  sub-bucket
+
+        Key                      Val
+        posting id (var uint64)  sub-bucket
+
+            ALL Compressed Integer Encoding []uint64
+            Key   Val
+            "f"   freqs   1 value per hit
+            "n"   norms   1 value per hit
+            "i"   fields  values per hit
+            "s"   start   values per hit
+            "e"   end     values per hit
+            "p"   pos     values per hit
+            "a"   array pos
+                    entries
+                    each entry is count
+                    followed by uint64
+
+"s" stored field data
+    Key                   Val
+    doc num (var uint64)  sub-bucket
+
+        Key   Val
+        "m"   mossy-like meta packed
+
+                16 bits - field id
+                 8 bits - field type
+                2? bits - array pos length
+
+                 X bits - offset
+                 X bits - length
+
+        "d"   raw []byte data (possibly compressed, need segment level config?)
+
+        "a"   array position info, packed slice uint64
+
+
+Notes:
+
+It is assumed that each IndexReader (snapshot) starts a new Bolt TX (read-only) immediately, and holds it open until it is no longer needed. This allows us to use (unsafely) the raw bytes coming out of BoltDB as return values. Bolt guarantees they will be safe for the duration of the transaction (which we arrange to be the life of the index snapshot).
+
+We only physically store the field mapping in one direction (name to id), even though at runtime we need both. Upon opening the index, we can read in all the k/v pairs in the "f" bucket, then use the unsafe package to create a []string inverted mapping that points at the underlying []byte in the BoltDB values.
+
+The term dictionary is stored opaquely as a Vellum FST for each field. When accessing these keys, the []byte returned to us is mmap'd by Bolt under the hood. We then pass it to vellum using its []byte API, which operates on it without ever forcing the whole thing into memory unless needed.
+
+We do not need to persist the dictkeys slice, since it exists only to support the dictionary iterator prefix/range searches, which the FST supports directly.
+
+The theory of operation of the chunked storage is as follows. The postings list iterators only allow starting at the beginning and have no "advance" capability. In the in-memory version, this means the Nth hit in the postings list is always the Nth entry in some other densely packed slice. While that is fine when everything is in RAM, it is not as suitable for a structure on disk, where wading through detailed info for records you don't care about is too expensive. Instead, we assume some fixed chunking, say 1024: all detailed info for document number N can be found inside chunk N/1024. The Advance operation still has to Next its way through the postings list, but now when it reaches a hit, it knows the chunk index as well as the hit index inside that chunk.
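+
+To make this concrete, here is a minimal Go sketch of that Advance-by-Next bookkeeping, assuming a fixed chunk factor of 1024; the iterator type and its fields are illustrative, not the actual scorch API:
+
+```go
+// chunkedPostingsIterator walks a postings list whose per-hit detail
+// (freqs, norms, locations) lives in fixed-size chunks on disk.
+type chunkedPostingsIterator struct {
+    docNums     []uint64 // doc numbers in the postings list, ascending
+    pos         int      // index of the next hit to return
+    chunkFactor uint64   // e.g. 1024
+    curChunk    uint64   // chunk index of the most recent hit
+    hitInChunk  uint64   // position of that hit within its chunk's slices
+}
+
+// Next returns the next doc number, tracking which chunk it lives in and
+// how far into that chunk's packed slices its detailed info sits.
+func (it *chunkedPostingsIterator) Next() (uint64, bool) {
+    if it.pos >= len(it.docNums) {
+        return 0, false
+    }
+    docNum := it.docNums[it.pos]
+    chunk := docNum / it.chunkFactor
+    if it.pos == 0 || chunk != it.docNums[it.pos-1]/it.chunkFactor {
+        it.hitInChunk = 0 // first hit we have seen in this chunk
+    } else {
+        it.hitInChunk++
+    }
+    it.curChunk = chunk
+    it.pos++
+    return docNum, true
+}
+
+// Advance has no random access; it simply Nexts until it reaches (or
+// passes) docNum. At that point curChunk and hitInChunk say exactly
+// where in the chunked storage to decode this hit's freq, norm, etc.
+func (it *chunkedPostingsIterator) Advance(docNum uint64) (uint64, bool) {
+    for {
+        d, ok := it.Next()
+        if !ok || d >= docNum {
+            return d, ok
+        }
+    }
+}
+```
+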
+Further, we push the chunk offsets to the top of the bolt structure, under the theory that we're likely to access data inside a chunk at the same time. For example, you're likely to access the frequency and norm values for a document hit together, so by organizing by chunk first, we increase the likelihood that this info is nearby on disk.
+
+The "f" and "n" sub-buckets inside a posting have 1 entry for each hit. (you must next-next-next within the chunk)
+
+The "i", "s", "e", "p" sub-buckets have freq entries for each hit. (you must have already read and know the freq)
+
+The "a" sub-bucket has groupings, where each grouping starts with a count, followed by the entries.
+
+For example, let's say hit docNum 27 has a freq of 2. The first location for the hit has array positions (0, 1), length 2, and the second location for the hit has array positions (1, 3, 2), length 3. The entries in the slice for this hit look like:
+
+    2 0 1 3 1 3 2
+    ^     ^
+    |     next entry, number of ints to follow for it
+    number of ints to follow for this entry
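+
+A small Go sketch of this count-prefixed packing, assuming the array positions for one hit arrive as a [][]uint64 with one inner slice per location (the helper names are illustrative):
+
+```go
+// packArrayPositions flattens the array positions for one hit into the
+// count-prefixed form above: for each location, the number of ints that
+// follow, then the ints themselves.
+func packArrayPositions(locs [][]uint64) []uint64 {
+    var out []uint64
+    for _, positions := range locs {
+        out = append(out, uint64(len(positions)))
+        out = append(out, positions...)
+    }
+    return out
+}
+
+// unpackArrayPositions reverses the packing for one hit, given its freq
+// (the number of locations). The returned slices alias the packed data.
+func unpackArrayPositions(packed []uint64, freq int) [][]uint64 {
+    locs := make([][]uint64, 0, freq)
+    i := 0
+    for len(locs) < freq && i < len(packed) {
+        n := int(packed[i])
+        i++
+        locs = append(locs, packed[i:i+n])
+        i += n
+    }
+    return locs
+}
+```
+
+For the docNum 27 example above, packArrayPositions([][]uint64{{0, 1}, {1, 3, 2}}) returns [2 0 1 3 1 3 2], and unpacking that slice with freq 2 gives back the two location groups.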