This change preallocates more of the backing arrays for Locfields,
Locstarts, Locends, Locpos, Locaaraypos sub-slices of a scorch mem
segment.
On small bleve-blast tests (50K wiki docs) on a dev macbook, scorch
indexing throughput seems to improve from 15MB/sec to 20MB/sec after
the recent series of preallocation changes.
The scorch mem segment build phase uses the append() idiom to populate
various slices that are keyed by postings list id's. These slices
include...
* Postings
* PostingsLocs
* Freqs
* Norms
* Locfields
* Locstarts
* Locends
* Locpos
* Locarraypos
This change introduces an initialization step that preallocates those
slices up-front, by assigning postings list id's to terms up-front.
This change also has an additional effect of simplifying the
processDocument() logic to no longer have to worry about a first-time
initialization case, removing some duplicate'ish code.
The first time through, startNumFields should be 0, where there ought
to be more optimization assuming later docs have similar fields as the
first doc.
-VisitableDocValueFields API for persisted DV field list
-making dv configs overridable at field level
-enabling on the fly/runtime un inverting of doc values
-few UT updates
inserted to the list of dv enabled fields list -
DocValueFields in mem segment.
Moved back to the original type `DocValueFields map[uint16]bool`
for easy look up to check whether the fieldID is
configured for dv storage.
+ Track memory usage at a segment level
+ Add a new scorch API: MemoryUsed()
- Aggregate the memory consumption across
segments when API is invoked.
+ TODO:
- Revisit the second iteration if it can be gotten
rid off, and the size accounted for during the first
run while building an in-mem segment.
- Accounting for pointer and slice overhead.
docValues are persisted along with the index,
in a columnar fashion per field with variable
sized chunking for quick look up.
-naive chunk level caching is added per field
-data part inside a chunk is snappy compressed
-metaHeader inside the chunk index the dv values
inside the uncompressed data part
-all the fields are docValue persisted in this iteration