NOTE: this is a scorch zap file format change / bump to version 4.
In this optimization, the uint64 val stored in the vellum FST (term
dictionary) may now be either a uint64 postingsOffset (same as before
this change) or a uint64 encoding of the docNum + norm (in the case
where a term appears in just a single doc).
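As a rough illustration of how one uint64 can carry either meaning, the sketch below reserves the top bit as a tag. The bit layout, the constants, and the function names (encodeOneHit, decodeFSTVal) are assumptions made for this example; they are not taken from the zap code.

```go
package main

// Hypothetical layout, for illustration only:
//   bit 63      tag, set when the value encodes a single doc
//   bits 32-62  norm (31 bits)
//   bits 0-31   docNum (32 bits)
// A value with the tag bit clear is treated as a plain postingsOffset,
// which assumes real offsets stay below 1<<63.
const (
	tagOneHit = uint64(1) << 63
	mask31    = uint64(1)<<31 - 1
	mask32    = uint64(1)<<32 - 1
)

// encodeOneHit packs docNum and norm into a single tagged uint64.
// Inputs wider than their fields are truncated by the masks.
func encodeOneHit(docNum, norm uint64) uint64 {
	return tagOneHit | (norm&mask31)<<32 | (docNum & mask32)
}

// decodeFSTVal reports which of the two meanings v carries and
// returns the matching fields; the unused results are zero.
func decodeFSTVal(v uint64) (oneHit bool, docNum, norm, postingsOffset uint64) {
	if v&tagOneHit != 0 {
		return true, v & mask32, (v >> 32) & mask31, 0
	}
	return false, 0, 0, v
}
```

With a scheme like this, a term that appears in one doc needs no postings list on disk at all, since the FST value alone identifies the doc and its norm.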
This change adds a zap PostingsIterator.nextBytes() method, which is
similar to Next(), but instead of returning a Posting instance,
nextBytes() returns the encoded freq/norm and location byte slices.
The zap merge code then passes those byte slices directly to the
intCoders via a new method, intCoder.AddBytes(), thereby avoiding
having to decode and re-encode many uvarints.
The optimizations / changes include:
- reuse of a memory buf when serializing varints.
- reuse of a govarint.U64Base128Encoder instance, as it's a thin
wrapper around an underlying chunkBuf, so calling Reset() on the
chunkBuf is enough for encoder reuse.
- chunkedIntCoder.Write() method was changed to invoke w.Write() less
often by forming a larger, reused buf. Profiling and analysis
showed w.Write() was getting called a lot, often with tiny 1 or 2
byte inputs. The theory is w.Write() and its underlying memmove()
can be more efficient when provided with larger bufs.
- some repeated code removal, by reusing the Close() method.
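The buffer reuse and write batching described in the list above can be sketched like this. writeChunks and countingWriter are hypothetical names, and the layout (chunk count, then chunk lengths, then chunk data) is an assumption for the example rather than a statement of the zap on-disk format.

```go
package main

import (
	"encoding/binary"
	"io"
)

// countingWriter records how many times Write is called, to make the
// effect of batching visible.
type countingWriter struct {
	calls int
	data  []byte
}

func (cw *countingWriter) Write(p []byte) (int, error) {
	cw.calls++
	cw.data = append(cw.data, p...)
	return len(p), nil
}

// writeChunks gathers the varint header and all chunk bytes into buf
// and issues a single w.Write. It returns buf so the caller can pass
// it back in and reuse its capacity on the next call.
func writeChunks(w io.Writer, chunks [][]byte, buf []byte) ([]byte, int, error) {
	buf = buf[:0] // keep capacity, drop old contents
	var scratch [binary.MaxVarintLen64]byte

	n := binary.PutUvarint(scratch[:], uint64(len(chunks)))
	buf = append(buf, scratch[:n]...)
	for _, c := range chunks {
		n = binary.PutUvarint(scratch[:], uint64(len(c)))
		buf = append(buf, scratch[:n]...)
	}
	for _, c := range chunks {
		buf = append(buf, c...)
	}

	written, err := w.Write(buf)
	return buf, written, err
}
```

Writing each varint separately would call w.Write() once per header field, often with 1 or 2 bytes each; here the writer sees one call per invocation regardless of the number of chunks.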