This API (unexported) will estimate the amount of memory needed to execute
a search query over an index before the collector begins data collection.
Sample estimates for certain queries:
{Size: 10, BenchmarkUpsidedownSearchOverhead}
ESTIMATE BENCHMEM
TermQuery 4616 4796
MatchQuery 5210 5405
DisjunctionQuery (Match queries) 7700 8447
DisjunctionQuery (Term queries) 6514 6591
ConjunctionQuery (Match queries) 7524 8175
Nested disjunction query (disjunction of disjunctions) 10306 10708
…
-VisitableDocValueFields API for persisted DV field list
-making dv configs overridable at field level
-enabling on the fly/runtime un inverting of doc values
-few UT updates
+ Track memory usage at a segment level
+ Add a new scorch API: MemoryUsed()
- Aggregate the memory consumption across
segments when API is invoked.
+ TODO:
- Revisit the second iteration if it can be gotten
rid off, and the size accounted for during the first
run while building an in-mem segment.
- Accounting for pointer and slice overhead.
With the previous commit, there can be a scenario where batches that
had internal-updates-only can be rapidly introduced by the app, but
the persisted notifications on only the very last IndexSnapshot would
be fired. The persisted notifications on the in-between batches might
be missed.
The solution was to track the persisted notification channels at a
higher Scorch struct level, instead of tracking the persisted channels
at the IndexSnapshot and SegmentSnapshot levels.
Also, the persister double-check looping was simplified, which avoids
a race where an introducer might incorrectly not notify the persister.
This commit improves handling when an incoming batch has internal-data
updates only and no doc updates. In this case, a nil segment instead
of an empty segment instance is used in the segmentIntroduction. The
segmentIntroduction, that is, might now hold only internal-data
updates only.
To handle synchronous persistence, a new field that's a slice of
persisted notification channels is added to the IndexSnapshot struct,
which the persister goroutine will close as each IndexSnapshot is
persisted.
Also, as part of this change, instead of checking the unsafeBatch flag
in several places, we instead check for non-nil'ness of these
persisted chan's.
The cachedDocs preparation has to happen for all docs in the field,
not just on the currently requested docNum.
Also, as part of this commit, there's a loop optimization where we no
longer use bytes.Split() on the terms buffer, thus avoiding garbage
creation.