The optimizations / changes include...
- reuse of a memory buf when serializing varint's.
- reuse of a govarint.U64Base128Encoder instance, as it's a thin,
wrapper around an underlying chunkBuf, so Reset()'s on the
chunkBuf is enough for encoder reuse.
- chunkedIntcoder.Write() method was changed to invoke w.Write() less
often by forming a larger, reused buf. Profiling and analysis
showed w.Write() was getting called a lot, often with tiny 1 or 2
byte inputs. The theory is w.Write() and its underlying memmove()
can be more efficient when provided with larger bufs.
- some repeated code removal, by reusing the Close() method.
As part of this, zap.MergeToWriter() now returns more information --
enough so that callers can now create their own SegmentBase instances.
Also, the fieldsMap maintained and returned by zap.MergeToWriter() is
now a mapping from fieldName ==> fieldID+1 (instead of the previous
mapping from fieldName ==> fieldID). This makes it similar to how
fieldsMap are handled in other parts of zap to avoid "zero value"
issues.
The theory with this change is that the dicts and itrs should be
positionally in "lock-step" with paired entries.
And, since later code also uses the same array indexing to access the
drops and newDocNums, those also need to be positionally in pair-wise
lock-step, too.
Unlike vellum's MergeIterator, the enumerator introduced in this
commit doesn't merge when there are matching keys across iterators.
Instead, the enumerator implementation provides a traversal of all the
tuples of (key, iteratorIndex, val) from the underlying vellum
iterators, ordered by key ASC, iteratorIndex ASC.
During zap segment merging, a new zap PostingsIterator was allocated
for every field X segment X term.
This change optimizes by reusing a single PostingsIterator instance
per persistMergedRest() invocation.
And, also unused fields are removed from the PostingsIterator.
Two cases in this commit...
If we're shutting down, the merger might not have handed off its
latest merged segment to the introducer yet, so the merger still owns
the segment and needs to Close() that segment itself.
In persistSnapshot(), there migth be cases where the persister might
not be able to swap in its newly persisted segments -- so, the
persistSnapshot() needs to Close() those segments itself.
Some codepaths in persistSnapshot() were saving errors into an err2
local variable, which might lead incorrectly to commit during an error
situation rather than abort.