inserted to the list of dv enabled fields list -
DocValueFields in mem segment.
Moved back to the original type `DocValueFields map[uint16]bool`
for easy look up to check whether the fieldID is
configured for dv storage.
+ Track memory usage at a segment level
+ Add a new scorch API: MemoryUsed()
- Aggregate the memory consumption across
segments when API is invoked.
+ TODO:
- Revisit the second iteration if it can be gotten
rid off, and the size accounted for during the first
run while building an in-mem segment.
- Accounting for pointer and slice overhead.
+ Adding new entries to the stats struct of scorch.
+ These stats are atomically incremented upon every segment
introduction, and upon successful persistence.
docValues are persisted along with the index,
in a columnar fashion per field with variable
sized chunking for quick look up.
-naive chunk level caching is added per field
-data part inside a chunk is snappy compressed
-metaHeader inside the chunk index the dv values
inside the uncompressed data part
-all the fields are docValue persisted in this iteration
Found with "versus" test (TestScorchVersusUpsideDownBoltSmallMNSAM),
which had a boolean query with a MustNot that was the same as the Must
parameters. This replicates a situation found by
Aruna/Mihir/testrunner/RQG (MB-27291). Example:
"query": {
"must_not": {"disjuncts": [
{"field": "body", "match": "hello"}
]},
"must": {"conjuncts": [
{"field": "body", "match": "hello"}
]}
}
The nested searchers along the MustNot pathway would end up looking
roughly like...
booleanSearcher
MustNot
=> disjunctionSearcher
=> disjunctionSearcher
=> termSearcher
On the first Next() call by the collector, the two disjunction
searchers would run through their respective Next() method processing,
which includes their initSearcher() processing on the first time.
This has the effect of driving the leaf termSearcher through two
Next() invocations.
That is, if there were 3 docs (doc-1, doc-2, doc-3), the leaf
termSearcher would at this point have moved to point to doc-3, while
the topmost MustNot would have received doc-1.
Next, the booleanSearcher's Must searcher would produce doc-2, so the
booleanSearcher would try to Advance() the MustNot searcher to doc-2.
But, in scorch, the leafmost termSearcher had already gotten past
doc-2 and would return its doc-3.
In upsidedown, in contrast, the leaf termSearcher would then drive the
KVStore iterator with a Seek(doc-2), and the KVStore iterator would
perform a backwards seek to reach doc-2.
In scorch, however, backwards iteration seeking isn't supported.
So, this fix checks the state of the disjunction searcher to see if we
already have the necessary state so that we don't have to perform
actual Advance()'es on the underlying searchers. This not only fixes
the behavior w.r.t. scorch, but also can have an effect of potentially
making upsidedown slightly faster as we're avoiding some backwards
KVStore iterator seeks.
This is somewhat like a simple, unit-test'ish version of testrunner's
random query generator, where this does not have a dependency on an
external elasticsearch server, and instead depends on functional
correctness when comparing to upsidedown/bolt.
Added an option to the kvconfig JSON, called "unsafe_batch" (bool).
Default is false, so Batch() calls are synchronously persisted by
default. Advanced users may want to unsafe, asynchronous persistence
to tradeoff performance (mutations are queryable sooner) over safety.
{
"index_type": "scorch",
"kvconfig": { "unsafe_batch": true }
}
This change replaces the previous kvstore=="moss" workaround.
With more pprof focusing (zooming in on a particular func), there were
still some memory allocations showing up with docNumberToBytes() in
micro benchmarks of bleve-query. On a dev macbook, on an index of 50K
wikipedia docs, using search of relatively common "text:date"...
400 qps - upsidedown/moss
680 qps - scorch before
775 qps - scorch after