Commit Graph

1533 Commits

Author SHA1 Message Date
Steve Yen c7a342bc7d scorch conjuncts match phrase test passes
The conjunction searcher's Advance() method now checks whether its
children's current doc matches already suffice before advancing them.
2017-12-23 09:19:40 -08:00
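The check in c7a342bc7d can be illustrated with a small Go sketch. Everything here is hypothetical:

- `cursor` stands in for bleve's searchers and DocumentMatch values. It is a forward-only iterator over sorted doc numbers, where 0 means exhausted.
- `conjunctionAdvance` stands in for the conjunction searcher. It only calls `advanceTo` on a child whose current match is still behind the target, then chases the largest doc until all children agree.

```go
package main

import "fmt"

// cursor is a hypothetical forward-only iterator over sorted doc numbers,
// standing in for a scorch postings iterator that cannot seek backwards.
// Doc numbers start at 1; cur() == 0 means the cursor is exhausted.
type cursor struct {
	docs  []uint64
	pos   int
	seeks int // number of advanceTo calls actually made
}

func (c *cursor) cur() uint64 {
	if c.pos < len(c.docs) {
		return c.docs[c.pos]
	}
	return 0
}

func (c *cursor) advanceTo(target uint64) {
	c.seeks++
	for c.pos < len(c.docs) && c.docs[c.pos] < target {
		c.pos++
	}
}

// conjunctionAdvance returns the first doc >= target matched by every child,
// or 0 if there is none. A child is advanced only when its current match is
// behind the target; a match already at or past it is left as-is.
func conjunctionAdvance(children []*cursor, target uint64) uint64 {
	for {
		for _, c := range children {
			if cur := c.cur(); cur != 0 && cur >= target {
				continue // current match already suffices
			}
			c.advanceTo(target)
		}
		// Every child is now at or past target, or exhausted.
		var hi uint64
		for _, c := range children {
			cur := c.cur()
			if cur == 0 {
				return 0
			}
			if cur > hi {
				hi = cur
			}
		}
		agree := true
		for _, c := range children {
			if c.cur() != hi {
				agree = false
			}
		}
		if agree {
			return hi
		}
		target = hi // some child is behind; chase the largest doc
	}
}

func main() {
	a := &cursor{docs: []uint64{1, 2, 3, 5}}
	b := &cursor{docs: []uint64{2, 3, 5}}
	kids := []*cursor{a, b}
	fmt.Println(conjunctionAdvance(kids, 2), conjunctionAdvance(kids, 2))
	fmt.Println("seeks:", a.seeks, b.seeks)
	fmt.Println(conjunctionAdvance(kids, 4), conjunctionAdvance(kids, 6))
}
```

Calling `conjunctionAdvance(kids, 2)` twice seeks `a` only once and never seeks `b`, because `b` is already sitting on doc 2.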
Steve Yen 903e8797c7
Merge pull request #689 from steveyen/scorch
MB-27291 - scorch compared to upsidedown/bolt using templated, generated searches
2017-12-21 18:36:02 -08:00
Steve Yen d425a3be86 scorch fix disjunction searcher Advance()
Found with "versus" test (TestScorchVersusUpsideDownBoltSmallMNSAM),
which had a boolean query with a MustNot that was the same as the Must
parameters.  This replicates a situation found by
Aruna/Mihir/testrunner/RQG (MB-27291).  Example:

  "query": {
    "must_not": {"disjuncts": [
      {"field": "body", "match": "hello"}
    ]},
    "must": {"conjuncts": [
      {"field": "body", "match": "hello"}
    ]}
  }

The nested searchers along the MustNot pathway would end up looking
roughly like...

  booleanSearcher
    MustNot
      => disjunctionSearcher
         => disjunctionSearcher
            => termSearcher

On the first Next() call by the collector, the two disjunction
searchers would run through their respective Next() method processing,
which includes their initSearcher() processing on the first time.
This has the effect of driving the leaf termSearcher through two
Next() invocations.

That is, if there were 3 docs (doc-1, doc-2, doc-3), the leaf
termSearcher would at this point have moved to point to doc-3, while
the topmost MustNot would have received doc-1.

Next, the booleanSearcher's Must searcher would produce doc-2, so the
booleanSearcher would try to Advance() the MustNot searcher to doc-2.

But, in scorch, the leafmost termSearcher had already gotten past
doc-2 and would return its doc-3.

In upsidedown, in contrast, the leaf termSearcher would then drive the
KVStore iterator with a Seek(doc-2), and the KVStore iterator would
perform a backwards seek to reach doc-2.

In scorch, however, backwards iteration seeking isn't supported.

So, this fix checks the state of the disjunction searcher's children to
see whether their current matches already satisfy the Advance() target,
so that it doesn't have to perform actual Advance() calls on the
underlying searchers.  This not only fixes the behavior w.r.t. scorch,
but may also make upsidedown slightly faster, since it avoids some
backwards KVStore iterator seeks.
2017-12-21 18:20:04 -08:00
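The failure mode in d425a3be86 can be reproduced with a small Go model. All of it is hypothetical and simplified:

- `leaf` is a forward-only iterator, like a scorch term searcher.
- `disj` is a disjunction that buffers each child's current doc in `currs` and then moves that child on. This is enough to reach the "leaf at doc-3, MustNot at doc-1" state described above.
- The `checkCurrs` flag toggles the fix: skip advancing any child whose buffered doc is already at or past the target.

```go
package main

import "fmt"

// node is a hypothetical, simplified searcher. Doc numbers start at 1 and
// cur() == 0 means exhausted.
type node interface {
	cur() uint64
	next()
	advance(target uint64) // move to the first doc >= target
}

// leaf is forward-only, like a scorch term searcher: asked to advance to a
// doc it already passed, it just stays where it is.
type leaf struct {
	docs []uint64
	pos  int
}

func (l *leaf) cur() uint64 {
	if l.pos < len(l.docs) {
		return l.docs[l.pos]
	}
	return 0
}

func (l *leaf) next() {
	if l.pos < len(l.docs) {
		l.pos++
	}
}

func (l *leaf) advance(target uint64) {
	for l.pos < len(l.docs) && l.docs[l.pos] < target {
		l.pos++
	}
}

// disj buffers each child's current doc in currs and then moves that child
// on, so a child can run ahead of what the disjunction reports.
type disj struct {
	kids       []node
	currs      []uint64
	checkCurrs bool // the fix: skip children already at or past the target
}

func newDisj(checkCurrs bool, kids ...node) *disj {
	d := &disj{kids: kids, currs: make([]uint64, len(kids)), checkCurrs: checkCurrs}
	for i := range kids {
		d.pull(i)
	}
	return d
}

// pull buffers child i's current doc, then moves the child forward.
func (d *disj) pull(i int) {
	d.currs[i] = d.kids[i].cur()
	d.kids[i].next()
}

// cur is the smallest buffered doc, or 0 if every child is exhausted.
func (d *disj) cur() uint64 {
	var lo uint64
	for _, c := range d.currs {
		if c != 0 && (lo == 0 || c < lo) {
			lo = c
		}
	}
	return lo
}

func (d *disj) next() {
	lo := d.cur()
	for i, c := range d.currs {
		if c != 0 && c == lo {
			d.pull(i)
		}
	}
}

func (d *disj) advance(target uint64) {
	for i, c := range d.currs {
		if d.checkCurrs && c >= target {
			continue // buffered match already suffices; no child advance
		}
		d.kids[i].advance(target)
		d.pull(i)
	}
}

func main() {
	for _, fixed := range []bool{false, true} {
		l := &leaf{docs: []uint64{1, 2, 3}}
		outer := newDisj(fixed, newDisj(fixed, l)) // MustNot-style nesting
		// The leaf now sits on doc 3 while outer still reports doc 1.
		outer.advance(2)
		fmt.Println("fixed:", fixed, "advance(2) ->", outer.cur())
	}
}
```

Without the check, `advance(2)` ends on doc 3 and silently skips doc 2. With the check, the buffered state answers the request and the leaf is never asked to go backwards.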
Steve Yen 93c787ca09 scorch versus_test.go passes errcheck 2017-12-21 16:49:39 -08:00
Steve Yen 33687260ca children of conjunct/disjunct's are not necessarily termSearchers
Rename termSearcher loop variable to searcher, as the child searchers
of a conjunction/disjunction searcher aren't necessarily
termSearchers.
2017-12-21 16:45:43 -08:00
Steve Yen a884f38bf6 scorch docInternalToNumber returns 0 on error 2017-12-21 16:44:31 -08:00
Steve Yen b3e41335e1 scorch compared to upsidedown/bolt using templated, generated searches
This is somewhat like a simple, unit-test'ish version of testrunner's
random query generator, but without a dependency on an external
elasticsearch server; instead, it checks functional correctness by
comparing results against upsidedown/bolt.
2017-12-21 16:43:52 -08:00
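The approach in b3e41335e1 is differential testing. Many queries are generated from a template, each one runs against a trusted implementation and against the one under test, and any disagreement is flagged.

A minimal, self-contained Go sketch of that idea follows. It is not bleve's actual versus_test.go, and the two "engines" are hypothetical stand-ins:

- `naiveAnd` is a map-based intersection playing the reference role.
- `mergeAnd` is a cursor-based merge playing the candidate role.

```go
package main

import (
	"fmt"
	"math/rand"
	"sort"
)

// naiveAnd is the trusted reference (the role upsidedown/bolt plays):
// it intersects postings lists with a map and sorts the result.
func naiveAnd(lists [][]uint64) []uint64 {
	counts := map[uint64]int{}
	for _, l := range lists {
		for _, d := range l {
			counts[d]++
		}
	}
	var out []uint64
	for d, c := range counts {
		if c == len(lists) {
			out = append(out, d)
		}
	}
	sort.Slice(out, func(i, j int) bool { return out[i] < out[j] })
	return out
}

// mergeAnd is the candidate: a cursor-based intersection of sorted lists,
// shaped like a conjunction searcher.
func mergeAnd(lists [][]uint64) []uint64 {
	if len(lists) == 0 {
		return nil
	}
	pos := make([]int, len(lists))
	var out []uint64
	for {
		var hi uint64
		for i, l := range lists {
			if pos[i] >= len(l) {
				return out
			}
			if l[pos[i]] > hi {
				hi = l[pos[i]]
			}
		}
		agree := true
		for i, l := range lists {
			for pos[i] < len(l) && l[pos[i]] < hi {
				pos[i]++
			}
			if pos[i] >= len(l) {
				return out
			}
			if l[pos[i]] != hi {
				agree = false
			}
		}
		if agree {
			out = append(out, hi)
			for i := range pos {
				pos[i]++
			}
		}
	}
}

// versus generates n queries from one template, {"conjuncts": [terms...]}
// with 1-3 random terms, runs both engines, and counts disagreements.
func versus(seed int64, n int) int {
	r := rand.New(rand.NewSource(seed))
	const nDocs, nTerms = 50, 8
	index := make([][]uint64, nTerms) // term -> sorted doc numbers
	for t := range index {
		for d := uint64(1); d <= nDocs; d++ {
			if r.Intn(3) == 0 {
				index[t] = append(index[t], d)
			}
		}
	}
	mismatches := 0
	for q := 0; q < n; q++ {
		lists := make([][]uint64, 1+r.Intn(3))
		for i := range lists {
			lists[i] = index[r.Intn(nTerms)]
		}
		if fmt.Sprint(naiveAnd(lists)) != fmt.Sprint(mergeAnd(lists)) {
			mismatches++
		}
	}
	return mismatches
}

func main() {
	fmt.Println("mismatches:", versus(1, 1000))
}
```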
Steve Yen 4c494216d6
Merge pull request #687 from steveyen/scorch
some scorch changes to check closeCh & merger memory usage
2017-12-20 15:30:55 -08:00
Steve Yen 67e0e5973b scorch mergeStoredAndRemap() memory reuse
In mergeStoredAndRemap(), instead of allocating new hashmaps for each
document, this commit reuses some arrays that are indexed by fieldId.
2017-12-20 15:18:22 -08:00
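The reuse pattern described in 67e0e5973b can be sketched as follows. Instead of allocating a new map per document, one slice indexed by fieldID is truncated between documents. The sketch assumes two things:

- field IDs are small, dense integers;
- the callback does not hold on to the grouped values after it returns.

`storedField` and `groupByField` are hypothetical names.

```go
package main

import "fmt"

// storedField is a hypothetical (fieldID, value) pair read from a segment's
// stored section; field IDs are assumed to be small, dense integers.
type storedField struct {
	fieldID uint16
	value   []byte
}

// groupByField visits each doc with its stored values grouped by field ID.
// Instead of a new map per doc, it keeps one slice indexed by fieldID and
// truncates each entry between docs, so capacity is reused once it has grown.
// visit must not retain vals (or its inner slices) after it returns.
func groupByField(numFields int, docs [][]storedField, visit func(docNum int, vals [][][]byte)) {
	vals := make([][][]byte, numFields)
	for n, fields := range docs {
		for i := range vals {
			vals[i] = vals[i][:0] // keep capacity, drop the previous doc's values
		}
		for _, f := range fields {
			vals[f.fieldID] = append(vals[f.fieldID], f.value)
		}
		visit(n, vals)
	}
}

func main() {
	docs := [][]storedField{
		{{0, []byte("title-1")}, {1, []byte("a")}, {1, []byte("b")}},
		{{1, []byte("c")}},
	}
	groupByField(2, docs, func(n int, vals [][][]byte) {
		fmt.Printf("doc %d: field0=%q field1=%q\n", n, vals[0], vals[1])
	})
}
```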
Steve Yen c155255506 scorch optimize zap.Merge() to reuse some buffers 2017-12-20 14:59:53 -08:00
Steve Yen ea4eb7301b scorch merger checks closeCh 2017-12-20 14:59:53 -08:00
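Commit ea4eb7301b only names the change. A common Go pattern for such a check is a non-blocking `select` on the close channel between units of work. The sketch below assumes merge work is split into discrete tasks; `mergeAll` and `errClosing` are hypothetical, not scorch's merger code.

```go
package main

import (
	"errors"
	"fmt"
)

// errClosing is a hypothetical sentinel for "stopped because of Close()".
var errClosing = errors.New("merger closing")

// mergeAll runs merge tasks in order, polling closeCh between tasks with a
// non-blocking select so that a closed channel stops further work promptly.
func mergeAll(closeCh <-chan struct{}, tasks []func()) (done int, err error) {
	for _, task := range tasks {
		select {
		case <-closeCh:
			return done, errClosing
		default: // not closed; keep going
		}
		task()
		done++
	}
	return done, nil
}

func main() {
	closeCh := make(chan struct{})
	tasks := []func(){
		func() { fmt.Println("merge 1") },
		func() {
			fmt.Println("merge 2")
			close(closeCh) // simulate Close() arriving mid-run
		},
		func() { fmt.Println("merge 3") },
	}
	done, err := mergeAll(closeCh, tasks)
	fmt.Println(done, err)
}
```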
Steve Yen 59797c35fa
Merge pull request #686 from steveyen/scorch
scorch removeOldBoltSnapshots() deletes from correct bucket
2017-12-20 14:59:36 -08:00
Steve Yen 04ac9d5b1f scorch removeOldBoltSnapshots() deletes from correct bucket 2017-12-20 14:46:48 -08:00
Steve Yen d55ef26c51
Merge pull request #682 from steveyen/scorch
scorch added kvconfig unsafe_batch option
2017-12-20 10:21:49 -08:00
Steve Yen df6c8f4074 scorch added kvconfig unsafe_batch option
Added an option to the kvconfig JSON, called "unsafe_batch" (bool).
Default is false, so Batch() calls are synchronously persisted by
default.  Advanced users may want to use unsafe, asynchronous persistence
to trade off safety for performance (mutations are queryable sooner).

    {
      "index_type": "scorch",
      "kvconfig": { "unsafe_batch": true }
    }

This change replaces the previous kvstore=="moss" workaround.
2017-12-20 10:11:55 -08:00
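The following sketch reads the option from the JSON config shown in df6c8f4074, with the documented default of false. The `unsafeBatchFromConfig` helper is hypothetical. It only illustrates the config shape, not how scorch itself receives its kvconfig.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// unsafeBatchFromConfig is a hypothetical helper, not bleve's code: it reads
// "unsafe_batch" from the kvconfig object and falls back to the documented
// default of false when the key is absent or not a bool.
func unsafeBatchFromConfig(raw []byte) (bool, error) {
	var cfg struct {
		IndexType string                 `json:"index_type"`
		KVConfig  map[string]interface{} `json:"kvconfig"`
	}
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return false, err
	}
	v, _ := cfg.KVConfig["unsafe_batch"].(bool) // false unless a JSON true
	return v, nil
}

func main() {
	for _, raw := range []string{
		`{"index_type": "scorch", "kvconfig": {"unsafe_batch": true}}`,
		`{"index_type": "scorch"}`,
	} {
		v, err := unsafeBatchFromConfig([]byte(raw))
		fmt.Println(v, err)
	}
}
```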
Steve Yen 43e3d4e1dd
Merge pull request #681 from steveyen/scorch
scorch simplify err check after vellum load
2017-12-19 23:07:24 -08:00
Steve Yen 1abbfadf0d scorch simplify err check after vellum load 2017-12-19 22:34:39 -08:00
Steve Yen 4d28b16896
Merge pull request #680 from steveyen/scorch
scorch docNumberToBytes() checks cap(buf) before allocating
2017-12-19 19:31:23 -08:00
Steve Yen dbc88cf6b3 scorch docNumberToBytes() checks cap(buf) before allocating
With more pprof focusing (zooming in on a particular func), there were
still some memory allocations showing up with docNumberToBytes() in
micro benchmarks of bleve-query.  On a dev macbook, on an index of 50K
wikipedia docs, using search of relatively common "text:date"...

   400 qps - upsidedown/moss
   680 qps - scorch before
   775 qps - scorch after
2017-12-19 19:15:19 -08:00
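The allocation-avoiding shape described in dbc88cf6b3, together with the prealloc'ed buffer from 620dcdb6f8, can be sketched as follows. The sketch assumes doc numbers are encoded as 8 big-endian bytes. This `docNumberToBytes` illustrates the idea and is not a verbatim copy of scorch's function.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// docNumberToBytes encodes a doc number as 8 big-endian bytes, reusing
// buf's backing array when its capacity is large enough instead of
// allocating a new slice on every call.
func docNumberToBytes(buf []byte, in uint64) []byte {
	if cap(buf) >= 8 {
		buf = buf[:8]
	} else {
		buf = make([]byte, 8)
	}
	binary.BigEndian.PutUint64(buf, in)
	return buf
}

func main() {
	buf := make([]byte, 0, 8) // allocated once, then reused for every doc number
	for _, n := range []uint64{1, 42, 1 << 40} {
		buf = docNumberToBytes(buf, n)
		fmt.Printf("%d -> %x\n", n, buf)
	}
}
```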
Steve Yen ed8bbded02
Merge pull request #679 from steveyen/scorch
scorch optimize zap Count()
2017-12-19 18:12:43 -08:00
Steve Yen 8f8333e01b scorch optimize zap Count()
This proposed approach avoids building a temporary AndNot() bitmap,
following the same kind of optimization used by mem segments.
2017-12-19 18:02:27 -08:00
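The Count() optimization in 8f8333e01b rests on the following identity, which lets the count be computed without building the temporary AndNot() bitmap:

|all AND NOT deleted| = |all| - |all AND deleted|

The sketch below uses a hypothetical minimal word-based `bitset` instead of a roaring bitmap, so it runs on the Go standard library alone.

```go
package main

import (
	"fmt"
	"math/bits"
)

// bitset is a hypothetical, minimal stand-in for a roaring bitmap:
// doc number n is bit n%64 of word n/64.
type bitset []uint64

func (b bitset) cardinality() int {
	n := 0
	for _, w := range b {
		n += bits.OnesCount64(w)
	}
	return n
}

// andCardinality counts |a AND b| word by word without materializing the
// intersection.
func andCardinality(a, b bitset) int {
	n := 0
	for i := 0; i < len(a) && i < len(b); i++ {
		n += bits.OnesCount64(a[i] & b[i])
	}
	return n
}

// countWithTemp is the "before" shape: build all AND NOT deleted, then count.
func countWithTemp(all, deleted bitset) int {
	tmp := make(bitset, len(all))
	for i := range all {
		tmp[i] = all[i]
		if i < len(deleted) {
			tmp[i] &^= deleted[i]
		}
	}
	return tmp.cardinality()
}

// countNoTemp is the "after" shape: |all| - |all AND deleted|, no temp bitmap.
func countNoTemp(all, deleted bitset) int {
	return all.cardinality() - andCardinality(all, deleted)
}

func main() {
	all := bitset{0b1111, 1 << 63} // docs 0-3 and 127
	deleted := bitset{0b0101}      // docs 0 and 2
	fmt.Println(countWithTemp(all, deleted), countNoTemp(all, deleted))
}
```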
Steve Yen c5aa2f997f
Merge pull request #678 from steveyen/scorch
scorch added more cases to TestIndexInsertThenDelete
2017-12-19 17:17:14 -08:00
Steve Yen a0556ad65b scorch added more cases to TestIndexInsertThenDelete 2017-12-19 16:41:56 -08:00
Steve Yen 8890e36025
Merge pull request #677 from steveyen/scorch
scorch remove leftover doc comment
2017-12-19 13:54:27 -08:00
Steve Yen 142ccdfaec scorch remove leftover doc comment
I suspect that Marty's editor is more exciting than mine. :-)
2017-12-19 13:53:04 -08:00
Steve Yen c0e09d8906
Merge pull request #676 from steveyen/scorch
scorch avoid extra clone by using roaring.AndNot(x, y)
2017-12-19 13:52:40 -08:00
Steve Yen f8b52f5e68
Merge pull request #674 from abhinavdangeti/scorch
scorch APIs to support rollback
2017-12-19 13:38:47 -08:00
Steve Yen d0e4f85026 scorch avoid extra clone by using roaring.AndNot(x, y) 2017-12-19 13:37:04 -08:00
Steve Yen b0e4936a71
Merge pull request #675 from steveyen/scorch
import couchbase/vellum instead of couchbaselabs/vellum
2017-12-19 11:07:35 -08:00
abhinavdangeti 679f1ce9c3 scorch APIs to support rollback
- PreviousPersistedSnapshot
- SnapshotRevert

+ unit test
2017-12-19 10:53:08 -08:00
Steve Yen f6b506134b import couchbase/vellum instead of couchbaselabs/vellum
Also, scrubbed an old couchbaselabs/moss reference in comments.

Also, go fmt.
2017-12-19 10:49:57 -08:00
Steve Yen 20972493d1
Merge pull request #661 from steveyen/scorchReusePosting
scorch reuses Posting instance in PostingsIterator.Next()
2017-12-18 16:37:07 -08:00
Steve Yen 730d906a50 scorch reuses Posting instance in PostingsIterator.Next()
With this change, there are no more memory allocations in the calls to
PostingsIterator.Next() in the micro benchmarks of bleve-query.  On a
dev macbook, on an index of 50K wikipedia docs, using high frequency
search of "text:date"...

   400 qps - upsidedown/moss
   565 qps - scorch before
   680 qps - scorch after
2017-12-18 16:15:38 -08:00
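The reuse idea in 730d906a50 can be sketched with a hypothetical, trimmed-down `posting` that has only a doc number and a frequency. The iterator embeds one `posting` and returns a pointer to it from every `Next()` call. The trade-off is that callers must copy any field they want to keep, since the following call overwrites it.

```go
package main

import "fmt"

// posting is a hypothetical, trimmed-down posting.
type posting struct {
	docNum uint64
	freq   uint64
}

// postingsIterator embeds one posting and hands out a pointer to it from
// every Next() call instead of allocating a new posting each time.
type postingsIterator struct {
	docs  []uint64
	freqs []uint64
	i     int
	next  posting // reused across Next() calls
}

// Next returns nil when exhausted. The returned pointer is only valid until
// the following call, which overwrites the same posting.
func (it *postingsIterator) Next() *posting {
	if it.i >= len(it.docs) {
		return nil
	}
	it.next = posting{docNum: it.docs[it.i], freq: it.freqs[it.i]}
	it.i++
	return &it.next
}

func main() {
	it := &postingsIterator{docs: []uint64{3, 7, 9}, freqs: []uint64{1, 2, 1}}
	var kept []uint64
	for p := it.Next(); p != nil; p = it.Next() {
		kept = append(kept, p.docNum) // copy the field; don't keep p itself
	}
	fmt.Println(kept)
}
```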
Steve Yen bf833a5eb8
Merge pull request #668 from steveyen/scorch
scorch mergeplan explicitly weeds out empty segments
2017-12-18 12:00:43 -08:00
Steve Yen 867bb2c031 scorch mergeplan explicitly weeds out empty segments
Rather than waiting on scoring to weed out empty segments, this commit
does the weeding out of empty segments explicitly and up front.
2017-12-18 11:33:19 -08:00
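The up-front weeding in 867bb2c031 can be sketched with a hypothetical `segment` type, assuming that "empty" means zero live (non-deleted) docs. What the planner then does with the weeded-out segments is not shown here.

```go
package main

import "fmt"

// segment is a hypothetical view of a segment as seen by a merge planner.
type segment struct {
	id       uint64
	liveSize int64 // live (non-deleted) doc count
}

// weedOutEmpty splits segments into those worth planning around and those
// with no live docs, before any scoring of merge candidates happens.
func weedOutEmpty(segs []segment) (live, empty []segment) {
	for _, s := range segs {
		if s.liveSize > 0 {
			live = append(live, s)
		} else {
			empty = append(empty, s)
		}
	}
	return live, empty
}

func main() {
	live, empty := weedOutEmpty([]segment{{1, 100}, {2, 0}, {3, 7}, {4, 0}})
	fmt.Println("live:", live, "empty:", empty)
}
```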
Steve Yen 4f03f0cecc
Merge pull request #666 from steveyen/scorch
scorch handle zero-doc segments
2017-12-17 13:27:14 -08:00
Steve Yen 20fe70770a scorch added some tests on # of expected segments 2017-12-17 12:39:15 -08:00
Steve Yen 34f5e2175f scorch fix persister for lost notifications on no-data batches
With the previous commit, there could be a scenario where the app
rapidly introduces batches that have internal updates only, but only
the persisted notifications of the very last IndexSnapshot would be
fired.  The persisted notifications of the in-between batches might
be missed.

The solution was to track the persisted notification channels at a
higher Scorch struct level, instead of tracking the persisted channels
at the IndexSnapshot and SegmentSnapshot levels.

Also, the persister double-check looping was simplified, which avoids
a race where an introducer might incorrectly not notify the persister.
2017-12-17 12:30:05 -08:00
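The fix in 34f5e2175f can be sketched as giving all pending "persisted" channels a single top-level owner. In this hypothetical Go model, `scorchLike`, `introduce`, and `persistOnce` are illustrative names, not bleve's API. The persister does three things:

1. Takes the snapshot epoch and every channel registered up to that epoch, in one critical section.
2. Writes the snapshot (only marked by a comment here).
3. Closes all of those channels, so rapid back-to-back batches are all notified.

```go
package main

import (
	"fmt"
	"sync"
)

// scorchLike is a hypothetical model: epoch counts introduced batches and
// pending holds the "persisted" channel of each batch not yet persisted.
type scorchLike struct {
	mu      sync.Mutex
	epoch   uint64
	pending []chan struct{}
}

// introduce records a batch and returns the channel its caller waits on.
func (s *scorchLike) introduce() chan struct{} {
	ch := make(chan struct{})
	s.mu.Lock()
	s.epoch++
	s.pending = append(s.pending, ch)
	s.mu.Unlock()
	return ch
}

// persistOnce takes the current epoch and every channel registered up to it
// in one critical section, "writes" that snapshot, then closes all of those
// channels, not just the newest batch's.
func (s *scorchLike) persistOnce() (uint64, int) {
	s.mu.Lock()
	epoch, chs := s.epoch, s.pending
	s.pending = nil
	s.mu.Unlock()
	// A real persister would durably write the snapshot for epoch here.
	for _, ch := range chs {
		close(ch)
	}
	return epoch, len(chs)
}

func main() {
	var s scorchLike
	a, b, c := s.introduce(), s.introduce(), s.introduce() // rapid batches
	epoch, n := s.persistOnce()
	<-a
	<-b
	<-c // none of the three waiters is left blocked
	fmt.Println("persisted epoch", epoch, "notified", n)
}
```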
Steve Yen ecbb3d2df4 scorch handles non-updating batches better
This commit improves handling when an incoming batch has internal-data
updates only and no doc updates.  In this case, a nil segment instead
of an empty segment instance is used in the segmentIntroduction.  That
is, the segmentIntroduction might now hold internal-data updates only.

To handle synchronous persistence, a new field that's a slice of
persisted notification channels is added to the IndexSnapshot struct,
which the persister goroutine will close as each IndexSnapshot is
persisted.

Also, as part of this change, rather than checking the unsafeBatch flag
in several places, we now check whether these persisted channels are
non-nil.
2017-12-17 08:51:23 -08:00
Steve Yen cf1dd4cb00
Merge pull request #665 from steveyen/scorch
scorch mergeplan added TierGrowth option
2017-12-16 14:44:27 -08:00
Steve Yen e98602600d scorch mergeplan added TierGrowth option
Previously, CalcBudget() was treating
MergePlanOptions.SegmentsPerMergeTask as the growth factor while
computing the idealized staircase of segments.

This change introduces a TierGrowth option to MergePlanOptions for
more control and so that SegmentsPerMergeTask can be tweaked
independently of the tier growth factor.
2017-12-16 14:22:15 -08:00
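The idealized staircase that CalcBudget() computes can be sketched with TierGrowth as its own knob. This is a simplified model built from the commit's description. The `planOptions` fields and the exact loop are assumptions, not bleve's mergeplan code.

```go
package main

import (
	"fmt"
	"math"
)

// planOptions holds hypothetical knobs modeled on the commit's description:
// MaxSegmentsPerTier caps each stair and TierGrowth sets how much larger each
// successive tier's segments are. SegmentsPerMergeTask does not feed in.
type planOptions struct {
	MaxSegmentsPerTier int
	TierGrowth         float64
}

// calcBudget walks an idealized staircase: fill a tier with up to
// MaxSegmentsPerTier segments of the current tier size, multiply that size
// by TierGrowth for the next tier, and repeat until totalSize is covered.
// The result is how many segments such an idealized index would have.
func calcBudget(totalSize, firstTierSize int64, o planOptions) int {
	tierSize := firstTierSize
	if tierSize < 1 {
		tierSize = 1
	}
	perTier := o.MaxSegmentsPerTier
	if perTier < 1 {
		perTier = 1
	}
	growth := o.TierGrowth
	if growth < 1.0 {
		growth = 1.0
	}
	budget := 0
	for totalSize > 0 {
		inTier := float64(totalSize) / float64(tierSize)
		if inTier < float64(perTier) {
			budget += int(math.Ceil(inTier))
			break
		}
		budget += perTier
		totalSize -= int64(perTier) * tierSize
		tierSize = int64(float64(tierSize) * growth)
	}
	return budget
}

func main() {
	for _, growth := range []float64{2, 10} {
		o := planOptions{MaxSegmentsPerTier: 10, TierGrowth: growth}
		fmt.Println("TierGrowth", growth, "-> budget", calcBudget(100, 1, o))
	}
}
```

With 100 units of data and a first tier of size 1, a growth factor of 10 budgets 19 segments, while a growth factor of 2 budgets 34. Smaller steps between tiers mean more, smaller segments.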
Steve Yen 62e37bcf2c
Merge pull request #664 from steveyen/scorch
scorch mergeplan.ToBarChart() refactored to callable API
2017-12-16 09:20:35 -08:00
Steve Yen 0539744e90 scorch mergeplan.ToBarChart() refactored to callable API
Refactored out API so it's usable from other places.
2017-12-16 08:39:10 -08:00
Steve Yen dc4df18001
Merge pull request #662 from steveyen/scorch
scorch mergeplan package comments tweak
2017-12-15 18:41:20 -08:00
Marty Schoch a575be4d56 fix issue where we incorrectly seed the nextSegmentID on Open() 2017-12-15 19:26:23 -05:00
Steve Yen 45c212a0c2 scorch mergeplan package comments tweak
Moving the package comment for mergeplan to the right place.
2017-12-15 13:25:39 -08:00
Marty Schoch 0c0f05e398
Merge pull request #659 from steveyen/scorch
scorch removed worker goroutines from TermFieldReader()
2017-12-15 15:20:07 -05:00
Steve Yen 620dcdb6f8 scorch uses prealloc'ed buffer for docNumberToBytes()
On a couple of micro benchmarks on a dev macbook using bleve-query on
an index of 50K wikipedia docs, scorch is now faster than
upsidedown/moss on high-freq term search "text:date"...

   400 qps - upsidedown/moss
   404 qps - scorch before
   565 qps - scorch after
2017-12-15 11:58:21 -08:00
Steve Yen f05794c6aa scorch removed worker goroutines from TermFieldReader()
On a couple of micro benchmarks on a dev macbook using bleve-query on
an index of 50K wikipedia docs, scorch is now in roughly the same
neighborhood as upsidedown/moss...

high-freq term search "text:date"...
   400 qps - upsidedown/moss
   360 qps - scorch before
   404 qps - scorch after

zero-freq term search "text:mschoch"...
  100K qps - upsidedown/moss
   55K qps - scorch before
   99K qps - scorch after

Of note, the scorch index had ~150 *.zap files in it, which likely
made the worker goroutine overhead more costly than in a case with few
segments; goroutine and channel related work appeared relatively
prominently in the pprof SVGs.
2017-12-15 11:11:18 -08:00
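The two shapes compared in f05794c6aa can be sketched in Go as follows. The `segment` map type is a hypothetical stand-in for a zap segment's term dictionary. The point is only that the fan-out version pays for a goroutine and a channel send per segment. With ~150 segments and cheap lookups, that overhead adds up.

```go
package main

import (
	"fmt"
	"sync"
)

// segment is a hypothetical stand-in for a zap segment's dictionary:
// term -> doc numbers.
type segment map[string][]uint64

// postingsFanOut is the "before" shape: one goroutine per segment, with
// results sent back over a channel.
func postingsFanOut(segs []segment, term string) [][]uint64 {
	type result struct {
		i    int
		docs []uint64
	}
	ch := make(chan result, len(segs)) // buffered so senders never block
	var wg sync.WaitGroup
	for i, s := range segs {
		wg.Add(1)
		go func(i int, s segment) {
			defer wg.Done()
			ch <- result{i, s[term]}
		}(i, s)
	}
	wg.Wait()
	close(ch)
	out := make([][]uint64, len(segs))
	for r := range ch {
		out[r.i] = r.docs
	}
	return out
}

// postingsSequential is the "after" shape: a plain loop with no goroutine
// or channel work per segment.
func postingsSequential(segs []segment, term string) [][]uint64 {
	out := make([][]uint64, len(segs))
	for i, s := range segs {
		out[i] = s[term]
	}
	return out
}

func main() {
	segs := []segment{{"date": {1, 2}}, {}, {"date": {7}}}
	fmt.Println(postingsSequential(segs, "date"), postingsFanOut(segs, "date"))
}
```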
Marty Schoch 562b473e36
Merge pull request #657 from steveyen/scorch
scorch fix data race w/ AddEligibleForRemoval
2017-12-14 17:56:06 -05:00