Before this change, writeRoaringWithLen() would leverage a reused
bytes.Buffer (#A) and invoke the roaring.WriteTo() API.
But, it turns out the roaring.WriteTo() API has a suboptimal
implementation, in that underneath-the-hood it converts the roaring
bitmap to a byte buffer (using roaring.ToBytes()), and then calls
Write(). But, that Write() turns out to be an additional memcpy into
the provided bytes.Buffer (#A).
By directly invoking roaring.ToBytes(), this change to
writeRoaringWithLen() avoids the extra memory allocation and memcpy.
This change leverages the ability for the chunkedIntCoder.Add() method
to accept multiple input param values (via the '...' param signature),
meaning there are fewer Add() invocations.
This API (unexported) will estimate the amount of memory needed to execute
a search query over an index before the collector begins data collection.
Sample estimates for certain queries:
{Size: 10, BenchmarkUpsidedownSearchOverhead}
ESTIMATE BENCHMEM
TermQuery 4616 4796
MatchQuery 5210 5405
DisjunctionQuery (Match queries) 7700 8447
DisjunctionQuery (Term queries) 6514 6591
ConjunctionQuery (Match queries) 7524 8175
Nested disjunction query (disjunction of disjunctions) 10306 10708
…
This change adds a zap PostingsIterator.nextBytes() method, which is
similar to Next(), but instead of returning a Posting instance,
nextBytes() returns the encoded freq/norm and location byte slices.
The zap merge code then provides those byte slices directly to the
intCoder's via a new method, intCoder.AddBytes(), thereby avoiding
having to encode many uvarint's.
Due to the usage rules of iterators, mem.PostingsIterator.Next() can
reuse its returned Postings instance.
Also, there's a micro optimization in persistDocValues() for one fewer
access to the docTermMap in the inner-loop.
This change allows the introducer to become the only goroutine to
modify the root, which in turn allows the introducer to greatly reduce
its root lock holding surface area.
There's now multiple competing merge activities (file-merging and
in-memory merging during persistence), so the simple math to
precalculate capacity for the slice of segments in introduceMerge() no
longer works for all cases and might have negative capacity.
This change removes that (sometimes wrong) precalculation, and instead
depends on append() to grow the slice correctly.
The optimizations / changes include...
- reuse of a memory buf when serializing varint's.
- reuse of a govarint.U64Base128Encoder instance, as it's a thin,
wrapper around an underlying chunkBuf, so Reset()'s on the
chunkBuf is enough for encoder reuse.
- chunkedIntcoder.Write() method was changed to invoke w.Write() less
often by forming a larger, reused buf. Profiling and analysis
showed w.Write() was getting called a lot, often with tiny 1 or 2
byte inputs. The theory is w.Write() and its underlying memmove()
can be more efficient when provided with larger bufs.
- some repeated code removal, by reusing the Close() method.