NOTE: this is a zap file format change.
The separate "postings locations" roaring bitmap that encoded whether
a posting has location info is now replaced by the least significant
bit of the freq varint encoded in the freq-norm chunkedIntCoder.
encodeFreqHasLocs() and decodeFreqHasLocs() are added as helper functions.
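
The bit-packing can be sketched as follows. This is a minimal sketch: only the helper names come from the commit message, and the signatures are assumed. The frequency is shifted left by one bit, and the freed least significant bit records whether location data follows.

```go
package main

import "fmt"

// encodeFreqHasLocs packs a term frequency and a "has locations" flag into
// one uint64: freq is shifted left by one and the LSB holds the flag.
// (Signature assumed for illustration.)
func encodeFreqHasLocs(freq uint64, hasLocs bool) uint64 {
	rv := freq << 1
	if hasLocs {
		rv |= 0x01 // LSB set => this posting has location info
	}
	return rv
}

// decodeFreqHasLocs reverses encodeFreqHasLocs.
func decodeFreqHasLocs(freqHasLocs uint64) (uint64, bool) {
	return freqHasLocs >> 1, freqHasLocs&0x01 != 0
}

func main() {
	v := encodeFreqHasLocs(3, true)
	freq, hasLocs := decodeFreqHasLocs(v)
	fmt.Println(v, freq, hasLocs) // 7 3 true
}
```

The trade-off is one bit of varint headroom. A frequency of 64 or more now needs a second varint byte, and in exchange the whole per-segment bitmap goes away.
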
Use of sync.Pool to reuse the interim structure relied on resetting
the fieldsInv slice. However, actual segments continued to use
this same fieldsInv slice after the interim struct was returned to the pool.
The simple fix is to nil out the fieldsInv slice in the reset method and let the
newly built segment keep the one from the interim struct.
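
The aliasing hazard and the fix can be sketched with hypothetical stand-ins. The `interim` and `segment` types and the `build` function below are illustrative assumptions, not the real zap types. Setting the slice to nil in reset, rather than truncating it, means the next pooled user allocates a fresh backing array instead of overwriting the one a finished segment still holds.

```go
package main

import (
	"fmt"
	"sync"
)

// interim is a hypothetical stand-in for the pooled interim build structure.
type interim struct {
	FieldsInv []string
}

// reset prepares the interim for reuse. Assigning nil (rather than
// FieldsInv[:0]) drops the reference to the old backing array, so a later
// append cannot overwrite data a built segment still points at.
func (s *interim) reset() {
	s.FieldsInv = nil
}

var interimPool = sync.Pool{New: func() interface{} { return &interim{} }}

// segment is a hypothetical built segment that keeps the interim's slice.
type segment struct {
	fieldsInv []string
}

func build(fields ...string) *segment {
	s := interimPool.Get().(*interim)
	s.FieldsInv = append(s.FieldsInv, fields...)
	seg := &segment{fieldsInv: s.FieldsInv} // segment takes ownership of the slice
	s.reset()
	interimPool.Put(s)
	return seg
}

func main() {
	a := build("_id", "title")
	b := build("_id", "body")
	fmt.Println(a.fieldsInv, b.fieldsInv) // [_id title] [_id body]
}
```

With a `[:0]` truncation instead, a reused interim would append into the array that `a` still references. The resulting corruption would appear only when the pool happened to hand back the same object.
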
mergeFields depends on the fields from the various segments being
sorted for the fieldsSame comparison to work.
Of note, the 'fieldi > 1' guard skips the 0th field, which should
always be the '_id' field.
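
Why sort order matters can be sketched as follows. `fieldsSame` is a positional comparison against the first segment's field list, so two segments with the same field set in a different order are reported as different. Everything below is a hypothetical reconstruction taking plain `[]string` field lists. `fieldsSorted` is one assumed reading of the `fieldi > 1` guard: it checks sortedness pairwise but never compares field 0 (`_id`) with field 1.

```go
package main

import (
	"fmt"
	"sort"
)

// mergeFields (hypothetical sketch) reports whether every segment lists the
// same fields in the same positions, and returns the union with "_id" first
// and the remaining names sorted.
func mergeFields(segmentFields [][]string) (bool, []string) {
	fieldsSame := true
	var first []string
	if len(segmentFields) > 0 {
		first = segmentFields[0]
	}
	exists := map[string]struct{}{}
	for _, fields := range segmentFields {
		if len(fields) != len(first) {
			fieldsSame = false
		}
		for i, f := range fields {
			exists[f] = struct{}{}
			if i >= len(first) || first[i] != f { // positional comparison
				fieldsSame = false
			}
		}
	}
	rv := make([]string, 0, len(exists)+1)
	rv = append(rv, "_id") // _id always stays at position 0
	for f := range exists {
		if f != "_id" {
			rv = append(rv, f)
		}
	}
	sort.Strings(rv[1:])
	return fieldsSame, rv
}

// fieldsSorted (assumed reading of the guard) checks that fields[1:] is
// sorted; with fieldi > 1 the pair (fields[0], fields[1]) is never compared.
func fieldsSorted(fields []string) bool {
	for fieldi := range fields {
		if fieldi > 1 && fields[fieldi-1] > fields[fieldi] {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(mergeFields([][]string{{"_id", "body", "title"}, {"_id", "body", "title"}}))
	fmt.Println(mergeFields([][]string{{"_id", "body", "title"}, {"_id", "title", "body"}}))
}
```
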
correctly handle/print additional loc bitmap address
this fixes the bitmap length that is output
instantiate roaring bitmap and print it out
removed some unnecessary debug logging
updated the dict command to print 1-hit encoded values;
this makes the dict command usable for seeing which
doc IDs are in a segment and their corresponding doc numbers
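
Decoding a 1-hit value might look like the sketch below. The bit layout is an assumption for illustration and the real zap dictionary encoding may differ. Here bit 63 flags a 1-hit value, bits 31 to 61 hold the norm bits and bits 0 to 30 hold the doc number. The function names are hypothetical.

```go
package main

import "fmt"

// Assumed layout: bit 63 = 1-hit flag, bits 31..61 = norm bits,
// bits 0..30 = doc number.
const (
	encoding1Hit = uint64(1) << 63
	mask31Bits   = uint64(0x7fffffff)
)

// encode1Hit (hypothetical) packs a single hit into one dictionary value.
func encode1Hit(docNum, normBits uint64) uint64 {
	return encoding1Hit | (normBits&mask31Bits)<<31 | docNum&mask31Bits
}

// decode1Hit (hypothetical) reports whether v is a 1-hit value and, if so,
// extracts the doc number and norm bits that a dict dump could print.
func decode1Hit(v uint64) (is1Hit bool, docNum, normBits uint64) {
	if v&encoding1Hit == 0 {
		return false, 0, 0
	}
	return true, v & mask31Bits, (v >> 31) & mask31Bits
}

func main() {
	fmt.Println(decode1Hit(encode1Hit(42, 7))) // true 42 7
}
```
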
The optimization recently introduced in commit 530a3d24cf
("scorch zap optimize merge by byte copying freq/norm/loc's") was to
byte-copy freq/norm/loc data directly during merging. However, it was
incorrect when the fields differed across segments.
This change now performs that byte-copying merge optimization only
when the fields are the same across segments. Otherwise, it falls back to
the old approach of deserializing and re-serializing the freq/norm/loc
information, which includes the important step of remapping fieldIDs.
See also: https://issues.couchbase.com/browse/MB-28781
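
The remapping step can be sketched as follows. Segment-local field IDs are positions in that segment's field list, so they must be translated into positions in the merged list. When the fields are the same, the translation is the identity and copying raw bytes preserves valid IDs. Both functions are hypothetical illustrations, not zap's API.

```go
package main

import "fmt"

// fieldIDMapping (hypothetical) returns, for each segment-local field ID,
// the ID of the same field name in the merged segment.
func fieldIDMapping(segFields, mergedFields []string) []uint16 {
	pos := make(map[string]uint16, len(mergedFields))
	for i, f := range mergedFields {
		pos[f] = uint16(i)
	}
	rv := make([]uint16, len(segFields))
	for i, f := range segFields {
		rv[i] = pos[f]
	}
	return rv
}

// canByteCopy (hypothetical) is true only when no field ID would change,
// i.e. raw freq/norm/loc bytes can be copied without re-encoding.
func canByteCopy(mapping []uint16) bool {
	for i, id := range mapping {
		if int(id) != i {
			return false
		}
	}
	return true
}

func main() {
	m := fieldIDMapping([]string{"_id", "title"}, []string{"_id", "body", "title"})
	fmt.Println(m, canByteCopy(m)) // [0 2] false
}
```
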
err with update workload
The introducer was incorrectly updating the offsets slice
of segments by considering only the live doc count
while computing the "running" offset total. This can result in
incorrectly computing the residing segment, as well as
the local doc number, while loading a document after
a search hit.
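
The corrected bookkeeping can be sketched as follows. Each segment's offset is a running sum of full doc counts, live plus deleted, because deleted docs still occupy local doc numbers. A global doc number is then resolved by binary search. The function names are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// computeOffsets (hypothetical) returns the first global doc number of each
// segment, using full doc counts (live + deleted).
func computeOffsets(docCounts []uint64) []uint64 {
	offsets := make([]uint64, len(docCounts))
	var running uint64
	for i, n := range docCounts {
		offsets[i] = running
		running += n
	}
	return offsets
}

// segmentIndexAndLocalDocNum (hypothetical) finds the last segment whose
// offset is <= docNum and returns that index plus the local doc number.
func segmentIndexAndLocalDocNum(offsets []uint64, docNum uint64) (int, uint64) {
	i := sort.Search(len(offsets), func(x int) bool { return offsets[x] > docNum }) - 1
	return i, docNum - offsets[i]
}

func main() {
	offsets := computeOffsets([]uint64{3, 5, 2}) // [0 3 8]
	fmt.Println(segmentIndexAndLocalDocNum(offsets, 4)) // 1 1
}
```

Suppose the live counts [2, 5, 2] were used instead, with one deleted doc in segment 0. The offsets would become [0, 2, 7], and global doc 3, which is really local doc 0 of segment 1, would be misread as local doc 1.
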
pprof of bleve-blast showed file merging spending a lot of time in
syscall/write. bufio.NewWriter() provides a default buffer size of 4KB,
which is too small; using bufio.NewWriterSize() with a 1MB buffer
makes syscall/write drop out of the file merging flame graphs.
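
The effect of the buffer size can be measured without a real file. The sketch below counts how many Write calls reach the underlying writer. With an *os.File, each such call would be a write syscall. `countingWriter` and `countWrites` are illustrative helpers. The 4 MiB of 1 KiB chunks is an arbitrary workload.

```go
package main

import (
	"bufio"
	"fmt"
)

// countingWriter counts Write calls that reach it (one per syscall if it
// were an *os.File).
type countingWriter struct{ writes int }

func (c *countingWriter) Write(p []byte) (int, error) {
	c.writes++
	return len(p), nil
}

// countWrites pushes 4 MiB through a bufio.Writer of the given size in
// 1 KiB chunks and reports how many underlying writes occurred.
func countWrites(bufSize int) int {
	cw := &countingWriter{}
	w := bufio.NewWriterSize(cw, bufSize)
	chunk := make([]byte, 1024)
	for i := 0; i < 4096; i++ {
		if _, err := w.Write(chunk); err != nil {
			panic(err)
		}
	}
	if err := w.Flush(); err != nil {
		panic(err)
	}
	return cw.writes
}

func main() {
	fmt.Println(countWrites(4096), countWrites(1<<20)) // 1024 4
}
```

In real merge code, the equivalent would be wrapping the output file as `bufio.NewWriterSize(f, 1<<20)` and calling `Flush()` before closing it.
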
This change detects whether a deletion bitmap is empty, and treats
that as a nil bitmap, which allows further postings iterator codepaths
to avoid roaring bitmap operations (like AndNot(docNums, drops)).
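
The empty-as-nil normalization can be sketched with a map-based stand-in for a roaring bitmap, so that the block runs without dependencies. With the real library, the emptiness check would be the bitmap's `IsEmpty()` method. The `bitmap` type and both functions are hypothetical.

```go
package main

import "fmt"

// bitmap is a hypothetical stand-in for a roaring bitmap of doc numbers.
type bitmap map[uint64]struct{}

// normalizeDrops treats an empty deletion bitmap as nil, so later code can
// test drops == nil and skip set operations entirely.
func normalizeDrops(drops bitmap) bitmap {
	if len(drops) == 0 {
		return nil
	}
	return drops
}

// liveDocs returns docNums minus drops; with nil drops it returns docNums
// unchanged and does no AndNot-style work.
func liveDocs(docNums []uint64, drops bitmap) []uint64 {
	if drops == nil {
		return docNums
	}
	rv := make([]uint64, 0, len(docNums))
	for _, d := range docNums {
		if _, dropped := drops[d]; !dropped {
			rv = append(rv, d)
		}
	}
	return rv
}

func main() {
	fmt.Println(liveDocs([]uint64{1, 2, 3}, normalizeDrops(bitmap{}))) // [1 2 3]
	fmt.Println(liveDocs([]uint64{1, 2, 3}, bitmap{2: {}}))            // [1 3]
}
```
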
By memoizing the size of index snapshots and their
constituent parts, we significantly reduce the amount
of time that the lock is held in the app_herder when
calculating the total memory used.
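
Memoization fits here because snapshots are immutable, so their size can be computed once at construction. The sketch below uses hypothetical `segment` and `snapshot` types. After memoization, summing sizes under a lock is a short loop of field reads rather than a walk over every constituent structure.

```go
package main

import "fmt"

// segment is a hypothetical immutable segment with a precomputed size.
type segment struct{ size uint64 }

// snapshot is a hypothetical immutable index snapshot whose size is
// computed once in newSnapshot and cached.
type snapshot struct {
	segments []*segment
	size     uint64
}

func newSnapshot(segs []*segment) *snapshot {
	s := &snapshot{segments: segs}
	for _, seg := range segs {
		s.size += seg.size
	}
	return s
}

// Size is O(1): it returns the memoized value.
func (s *snapshot) Size() uint64 { return s.size }

// totalMemory is what a herder might call while holding its lock.
func totalMemory(snapshots []*snapshot) uint64 {
	var total uint64
	for _, s := range snapshots {
		total += s.Size()
	}
	return total
}

func main() {
	s1 := newSnapshot([]*segment{{size: 100}, {size: 50}})
	s2 := newSnapshot([]*segment{{size: 25}})
	fmt.Println(totalMemory([]*snapshot{s1, s2})) // 175
}
```
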
Since it's just the pointer size of the IndexReader that is
being accounted for when estimating the RAM needed to
execute a search query, get rid of the Size() API in the
IndexReader interface.
De-duplicate the list of fields provided by the client as part
of the search request, so as not to inadvertently load the same
stored field more than once.
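
An order-preserving de-duplication is sketched below. It keeps the first occurrence of each field name. The commit does not say whether the original preserves the client's ordering, so that behavior is an assumption, and `dedupeFields` is a hypothetical name.

```go
package main

import "fmt"

// dedupeFields returns fields with duplicates removed, keeping the first
// occurrence of each name so the requested order is preserved.
func dedupeFields(fields []string) []string {
	seen := make(map[string]struct{}, len(fields))
	rv := make([]string, 0, len(fields))
	for _, f := range fields {
		if _, ok := seen[f]; ok {
			continue // already requested; loading it again would be wasted work
		}
		seen[f] = struct{}{}
		rv = append(rv, f)
	}
	return rv
}

func main() {
	fmt.Println(dedupeFields([]string{"title", "body", "title"})) // [title body]
}
```
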