+ Adding new entries to the stats struct of scorch.
+ These stats are atomically incremented upon every segment
introduction, and upon successful persistence.
Found with "versus" test (TestScorchVersusUpsideDownBoltSmallMNSAM),
which had a boolean query with a MustNot that was the same as the Must
parameters. This replicates a situation found by
Aruna/Mihir/testrunner/RQG (MB-27291). Example:
    "query": {
        "must_not": {"disjuncts": [
            {"field": "body", "match": "hello"}
        ]},
        "must": {"conjuncts": [
            {"field": "body", "match": "hello"}
        ]}
    }
The nested searchers along the MustNot pathway would end up looking
roughly like...
    booleanSearcher
      MustNot
        => disjunctionSearcher
          => disjunctionSearcher
            => termSearcher
On the first Next() call by the collector, the two disjunction
searchers would run through their respective Next() method processing,
which includes their initSearcher() processing on the first time.
This has the effect of driving the leaf termSearcher through two
Next() invocations.
That is, if there were 3 docs (doc-1, doc-2, doc-3), the leaf
termSearcher would at this point have moved to point to doc-3, while
the topmost MustNot would have received doc-1.
Next, the booleanSearcher's Must searcher would produce doc-2, so the
booleanSearcher would try to Advance() the MustNot searcher to doc-2.
But, in scorch, the leafmost termSearcher had already gotten past
doc-2 and would return its doc-3.
In upsidedown, in contrast, the leaf termSearcher would then drive the
KVStore iterator with a Seek(doc-2), and the KVStore iterator would
perform a backwards seek to reach doc-2.
In scorch, however, backwards iteration seeking isn't supported.
So, this fix checks the state of the disjunction searcher to see if it
already holds the necessary state, so that we don't have to perform
actual Advance()'es on the underlying searchers. This not only fixes
the behavior w.r.t. scorch, but may also make upsidedown slightly
faster, as we're avoiding some backwards KVStore iterator seeks.
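
The idea can be sketched with a toy model. This is not the bleve code:
the searcher interface, the int doc IDs, and the newNested helper are
all assumptions made for illustration. The leaf only moves forward, and
the disjunction's Advance() skips any child whose buffered doc is
already at or past the target.

```go
package main

import "fmt"

// searcher is a simplified, hypothetical version of a search iterator.
type searcher interface {
	Next() (doc int, ok bool)
	Advance(target int) (doc int, ok bool)
}

// termSearcher walks a sorted doc list and can only move forward.
type termSearcher struct {
	docs []int
	pos  int
}

func (t *termSearcher) Next() (int, bool) {
	if t.pos >= len(t.docs) {
		return 0, false
	}
	d := t.docs[t.pos]
	t.pos++
	return d, true
}

func (t *termSearcher) Advance(target int) (int, bool) {
	for t.pos < len(t.docs) && t.docs[t.pos] < target {
		t.pos++
	}
	return t.Next()
}

// disjunctionSearcher buffers the current doc of each child.
type disjunctionSearcher struct {
	children []searcher
	currs    []int
	valid    []bool
	inited   bool
}

// init pulls the first doc from every child.
func (d *disjunctionSearcher) init() {
	d.currs = make([]int, len(d.children))
	d.valid = make([]bool, len(d.children))
	for i, c := range d.children {
		d.currs[i], d.valid[i] = c.Next()
	}
	d.inited = true
}

// Next returns the lowest buffered doc and refills the children that held it.
func (d *disjunctionSearcher) Next() (int, bool) {
	if !d.inited {
		d.init()
	}
	lowest, found := 0, false
	for i := range d.children {
		if d.valid[i] && (!found || d.currs[i] < lowest) {
			lowest, found = d.currs[i], true
		}
	}
	if !found {
		return 0, false
	}
	for i, c := range d.children {
		if d.valid[i] && d.currs[i] == lowest {
			d.currs[i], d.valid[i] = c.Next()
		}
	}
	return lowest, true
}

// Advance only touches children whose buffered doc is before target.
func (d *disjunctionSearcher) Advance(target int) (int, bool) {
	if !d.inited {
		d.init()
	}
	for i, c := range d.children {
		if !d.valid[i] || d.currs[i] >= target {
			continue // exhausted, or buffered state is already sufficient
		}
		d.currs[i], d.valid[i] = c.Advance(target)
	}
	return d.Next()
}

// newNested builds disjunction => disjunction => term, as on the MustNot path.
func newNested(docs ...int) *disjunctionSearcher {
	leaf := &termSearcher{docs: docs}
	inner := &disjunctionSearcher{children: []searcher{leaf}}
	return &disjunctionSearcher{children: []searcher{inner}}
}

func main() {
	mustNot := newNested(1, 2, 3)
	first, _ := mustNot.Next()
	second, ok := mustNot.Advance(2)
	fmt.Println(first, second, ok) // prints "1 2 true"
}
```

In this toy model, dropping the `d.currs[i] >= target` check would push
the Advance(2) down to the leaf, which has already moved past doc 2, so
doc 2 would be lost.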
This is somewhat like a simple, unit-test'ish version of testrunner's
random query generator, where this does not have a dependency on an
external elasticsearch server, and instead depends on functional
correctness when comparing to upsidedown/bolt.
Added an option to the kvconfig JSON, called "unsafe_batch" (bool).
Default is false, so Batch() calls are synchronously persisted by
default. Advanced users may want unsafe, asynchronous persistence
to trade safety for performance (mutations are queryable sooner).
    {
        "index_type": "scorch",
        "kvconfig": { "unsafe_batch": true }
    }
This change replaces the previous kvstore=="moss" workaround.
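
A minimal sketch of how such an option could be read with a safe
default follows. The unsafeBatch helper is hypothetical and uses only
encoding/json; it is not how scorch itself parses its config.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// unsafeBatch is a hypothetical helper that reports whether "unsafe_batch"
// is set in a kvconfig JSON document. A missing or non-bool value yields
// false, which is the safe, synchronous default.
func unsafeBatch(kvconfigJSON []byte) (bool, error) {
	var cfg map[string]interface{}
	if err := json.Unmarshal(kvconfigJSON, &cfg); err != nil {
		return false, err
	}
	v, _ := cfg["unsafe_batch"].(bool)
	return v, nil
}

func main() {
	for _, in := range []string{`{"unsafe_batch": true}`, `{}`} {
		v, err := unsafeBatch([]byte(in))
		fmt.Println(in, v, err)
	}
}
```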
With more pprof focusing (zooming in on a particular func), there were
still some memory allocations showing up with docNumberToBytes() in
micro benchmarks of bleve-query. On a dev macbook, on an index of 50K
wikipedia docs, using search of relatively common "text:date"...
400 qps - upsidedown/moss
680 qps - scorch before
775 qps - scorch after
With this change, there are no more memory allocations in the calls to
PostingsIterator.Next() in the micro benchmarks of bleve-query. On a
dev macbook, on an index of 50K wikipedia docs, using high frequency
search of "text:date"...
400 qps - upsidedown/moss
565 qps - scorch before
680 qps - scorch after
With the previous commit, there can be a scenario where batches that
had internal-updates-only are rapidly introduced by the app, but only
the persisted notifications on the very last IndexSnapshot would be
fired. The persisted notifications on the in-between batches might
be missed.
The solution was to track the persisted notification channels at a
higher Scorch struct level, instead of tracking the persisted channels
at the IndexSnapshot and SegmentSnapshot levels.
Also, the persister double-check looping was simplified, which avoids
a race where an introducer might incorrectly not notify the persister.
This commit improves handling when an incoming batch has internal-data
updates only and no doc updates. In this case, a nil segment instead
of an empty segment instance is used in the segmentIntroduction. The
segmentIntroduction, that is, might now hold internal-data updates
only.
To handle synchronous persistence, a new field that's a slice of
persisted notification channels is added to the IndexSnapshot struct,
which the persister goroutine will close as each IndexSnapshot is
persisted.
Also, as part of this change, instead of checking the unsafeBatch flag
in several places, we instead check for non-nil'ness of these
persisted chan's.
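
The notification scheme can be sketched roughly as follows. The
snapshot type, the persister loop, and the batch function are
simplified assumptions for illustration, not the scorch
implementation. Note how the caller checks the channel for nil rather
than checking a flag again.

```go
package main

import "fmt"

// snapshot is a hypothetical stand-in for IndexSnapshot, holding the
// channels to close once it has been persisted.
type snapshot struct {
	epoch     uint64
	persisted []chan struct{}
}

// introduce creates a snapshot; a notification channel is attached only
// when the caller wants synchronous persistence.
func introduce(epoch uint64, syncPersist bool) (*snapshot, chan struct{}) {
	s := &snapshot{epoch: epoch}
	var ch chan struct{}
	if syncPersist {
		ch = make(chan struct{})
		s.persisted = append(s.persisted, ch)
	}
	return s, ch
}

// persister "persists" each snapshot it receives, then closes its channels.
func persister(in <-chan *snapshot) {
	for s := range in {
		// Real persistence work would happen here.
		for _, ch := range s.persisted {
			close(ch)
		}
	}
}

// batch hands a snapshot to the persister and reports whether it waited
// for persistence. The decision is made by checking the channel for nil.
func batch(in chan<- *snapshot, epoch uint64, unsafeBatch bool) bool {
	s, ch := introduce(epoch, !unsafeBatch)
	in <- s
	if ch != nil {
		<-ch // blocks until the persister closes the channel
		return true
	}
	return false
}

func main() {
	in := make(chan *snapshot)
	go persister(in)
	fmt.Println(batch(in, 1, false)) // safe batch: waits for persistence
	fmt.Println(batch(in, 2, true))  // unsafe batch: returns immediately
	close(in)
}
```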
Previously, CalcBudget() was treating
MergePlanOptions.SegmentsPerMergeTask as the growth factor while
computing the idealized staircase of segments.
This change introduces a TierGrowth option to MergePlanOptions for
more control and so that SegmentsPerMergeTask can be tweaked
independently of the tier growth factor.
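
To illustrate how a separate tier growth factor shapes the idealized
staircase, here is a simplified sketch. The calcBudget function and its
parameters (including maxSegmentsPerTier) are assumptions for this
example and may differ from the real CalcBudget().

```go
package main

import (
	"fmt"
	"math"
)

// calcBudget is a hypothetical, simplified budget calculation: it fills
// tiers of growing segment size until totalSize is used up and returns
// the number of segments in that idealized staircase.
func calcBudget(totalSize, firstTierSize int64, maxSegmentsPerTier int,
	tierGrowth float64) int {
	tierSize := firstTierSize
	if tierSize < 1 {
		tierSize = 1
	}
	if maxSegmentsPerTier < 1 {
		maxSegmentsPerTier = 1
	}
	if tierGrowth < 1.0 {
		tierGrowth = 1.0
	}

	budget := 0
	for totalSize > 0 {
		segmentsInTier := float64(totalSize) / float64(tierSize)
		if segmentsInTier < float64(maxSegmentsPerTier) {
			// The remainder fits in this tier.
			budget += int(math.Ceil(segmentsInTier))
			break
		}
		budget += maxSegmentsPerTier
		totalSize -= int64(maxSegmentsPerTier) * tierSize
		// The next tier's segments are tierGrowth times larger.
		tierSize = int64(float64(tierSize) * tierGrowth)
	}
	return budget
}

func main() {
	fmt.Println(calcBudget(1000, 10, 2, 10.0)) // prints 5
	fmt.Println(calcBudget(1000, 10, 2, 2.0))  // prints 12
}
```

In this sketch a smaller growth factor yields more, smaller tiers and
so a larger segment budget, independent of how many segments each merge
task takes.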