pprof of bleve-blast was showing file merging was in syscall/write a
lot. The bufio.NewWriter() provides a default buffer size of 4K,
which is too small, and using bufio.NewWriterSize(1MB buffer size)
leads to syscall/write dropping out of the file merging flame graphs.
This change detects whether a deletion bitmap is empty, and treats
that as a nil bitmap, which allows further postings iterator codepaths
to avoid roaring bitmap operations (like, AndNot(docNums, drops)).
by memoizing the size of index snapshots and their
constituent parts, we significantly reduce the amount
of time that the lock is held in the app_herder, when
calculating the total memory used
Since its just the pointer size of the IndexReader that is
being accounted for while estimating the RAM needed to
execute a search query, get rid of the Size() API in the
IndexReader interface.
De-duplicate the list of fields provided by the client as part
of the search request, so as to not inadvertantly load the same
stored field more than once.
In this change, if the postings/postingsLocs slices need to be grown,
then copy over and reuse any of the preallocated roaring Bitmap's from
the old slice.
merging the doc value length and loc
slices into a single offset slice as that
is enough to compute the starting offset and
length of the the doc values data for a given
document inside a docValue chunk.
This commit avoids creating roaring.Bitmap's (which would have just a
single entry) when a postings list/iterator represents a single
"1-hit" encoding.
AnalysisResultsToSegmentBase() allows analysis results to be directly
converted into a zap-encoded SegmentBase, which can then be introduced
onto the root, avoiding the creation of mem.Segment data structures.
This leads to some reduction of garbage memory allocations.
The grouping and sorting and shaping of the postings list information
is taken from the mem.Segment codepaths.
The encoding of stored fields reuses functions from zap's merger,
which has the largest savings of garbage memory avoidance.
And, the encoding of tf/loc chunks, postings & dictionary information
also follows the approach used by zap's merger, which also has some
savings of garbage memory avoidance.
In future changes, the mem.Segment dependencies will be removed from
zap, which should result in a smaller codebase.