0
0
Fork 0
Commit Graph

30 Commits

Author SHA1 Message Date
abhinavdangeti 7e36109b3c MB-28162: Provide API to estimate memory needed to run a search query
This API (unexported) will estimate the amount of memory needed to execute
a search query over an index before the collector begins data collection.

Sample estimates for certain queries:
{Size: 10, BenchmarkUpsidedownSearchOverhead}
                                                           ESTIMATE    BENCHMEM
TermQuery                                                  4616        4796
MatchQuery                                                 5210        5405
DisjunctionQuery (Match queries)                           7700        8447
DisjunctionQuery (Term queries)                            6514        6591
ConjunctionQuery (Match queries)                           7524        8175
Nested disjunction query (disjunction of disjunctions)     10306       10708
…
2018-03-06 13:53:42 -08:00
Marty Schoch 30acc55d05 remove unnecessary scorch reader wrapper
we now use *IndexSnapshot directly
2018-03-02 14:03:54 -08:00
Sreekanth Sivasankaran 4b742505aa adding stats for scorch 2018-02-28 15:31:55 +05:30
Sreekanth Sivasankaran 4c256f5669 DocValue Config, new API Changes
-VisitableDocValueFields API for persisted DV field list
-making dv configs overridable at field level
-enabling on the fly/runtime un inverting of doc values
-few UT updates
2018-01-08 10:58:33 +05:30
Marty Schoch 57a075afdb improving command-line tool for scorch 2018-01-05 11:50:07 -05:00
Sreekanth Sivasankaran 61ba81e964 Merge branch 'scorch', remote-tracking branch 'origin' into docValue_persisted 2017-12-30 16:52:51 +05:30
Sreekanth Sivasankaran c8df014c0c Updated readme, zap version, added new docvalue cmd,
fixed the footer and fields cmd,
interface name updated
2017-12-29 21:39:29 +05:30
abhinavdangeti 4bede84fd0 Wiring up missing stats for scorch
- updates, deletes, batches, errors
- term_searchers_started, term_searchers_finished
- num_plain_test_bytes_indexed
2017-12-28 14:07:58 -07:00
Sreekanth Sivasankaran 76f827f469 docValue persist changes
docValues are persisted along with the index,
in a columnar fashion per field with variable
sized chunking for quick look up.
-naive chunk level caching is added per field
-data part inside a chunk is snappy compressed
-metaHeader inside the chunk index the dv values
 inside the uncompressed data part
-all the fields are docValue persisted in this iteration
2017-12-28 12:05:33 +05:30
Steve Yen c7a342bc7d scorch conjuncts match phrase test passes
The conjunction searcher Advance() method now checks if its curr
doc-matches suffices before advancing them.
2017-12-23 09:19:40 -08:00
Steve Yen a884f38bf6 scorch docInternalToNumber returns 0 on error 2017-12-21 16:44:31 -08:00
Steve Yen dbc88cf6b3 scorch docNumberToBytes() checks cap(buf) before allocating
With more pprof focusing (zooming in on a particular func), there were
still some memory allocations showing up with docNumberToBytes() in
micro benchmarks of bleve-query.  On a dev macbook, on an index of 50K
wikipedia docs, using search of relatively common "text:date"...

   400 qps - upsidedown/moss
   680 qps - scorch before
   775 qps - scorch after
2017-12-19 19:15:19 -08:00
Steve Yen 620dcdb6f8 scorch uses prealloc'ed buffer for docNumberToBytes()
On a couple of micro benchmarks on a dev macbook using bleve-query on
an index of 50K wikipedia docs, scorch is now faster than
upsidedown/moss on high-freq term search "text:date"...

       400 qps - upsidedown/moss
       404 qps - scorch before
       565 qps - scorch after
2017-12-15 11:58:21 -08:00
Steve Yen f05794c6aa scorch removed worker goroutines from TermFieldReader()
On a couple of micro benchmarks on a dev macbook using bleve-query on
an index of 50K wikipedia docs, scorch is now in more the same
neighborhood of upsidedown/moss...

high-freq term search "text:date"...
   400 qps - upsidedown/moss
   360 qps - scorch before
   404 qps - scorch after

zero-freq term search "text:mschoch"...
  100K qps - upsidedown/moss
   55K qps - scorch before
   99K qps - scorch after

Of note, the scorch index had ~150 *.zap files in it, which likely
made made the worker goroutine overhead more costly than for a case
with few segments, where goroutine and channel related work appeared
relatively prominently in the pprof SVG's.
2017-12-15 11:11:18 -08:00
Steve Yen eb2f541d4f scorch filters _id from Reader.Document() results 2017-12-14 13:52:28 -08:00
Steve Yen a8884e1011 scorch fix for TestSortMatchSearch
The cachedDocs preparation has to happen for all docs in the field,
not just on the currently requested docNum.

Also, as part of this commit, there's a loop optimization where we no
longer use bytes.Split() on the terms buffer, thus avoiding garbage
creation.
2017-12-14 13:22:13 -08:00
Marty Schoch 149a26b5c1 merge deletion and cacheddocs fixes discussed in meeting 2017-12-14 10:27:39 -05:00
Sreekanth Sivasankaran 1066ee7d22 DocumentVisitFieldTerms Scorch implementation level1 2017-12-14 12:38:29 +05:30
Steve Yen c0cc46a2be scorch cleanup of the rootBolt of old snapshots
A new global variable, NumSnapshotsToKeep, represents the default
number of old snapshots that each scorch instance should maintain -- 0
is the default.  Apps that need rollback'ability may want to increase
this value in early initialization.

The Scorch.eligibleForRemoval field tracks epoches which are safe to
delete from the rootBolt.  The eligibleForRemoval is appended to
whenever the ref-count on an IndexSnapshot drops to 0.

On startup, eligibleForRemoval is also initialized with any older
epoch's found in the rootBolt.

The newly introduced Scorch.removeOldSnapshots() method is called on
every cycle of the persisterLoop(), where it maintains the
eligibleForRemoval slice to under a size defined by the
NumSnapshotsToKeep.

A future commit will remove actual storage files in order to match the
"source of truth" information found in the rootBolt.
2017-12-13 15:53:31 -08:00
Steve Yen c13ff85aaf scorch ref-counting
Future commits will provide actual cleanup when ref-counts reach 0.
2017-12-13 14:48:07 -08:00
Marty Schoch f13b786609 fix up issues to get all bleve unit tests passing for scorch
make scorch default
2017-12-11 15:47:41 -05:00
Marty Schoch 690cd39921 add crazy slow but functional DocumentVisitFieldTerms 2017-12-10 08:55:59 -05:00
Marty Schoch adac4f41db initial version of scorch which persists index to disk 2017-12-06 18:33:47 -05:00
Marty Schoch 22ffc8940e update segment API to return error in key places 2017-12-04 18:06:06 -05:00
Marty Schoch b74cf4b081 add copyright header to all new files in scorch 2017-12-01 15:42:50 -05:00
Marty Schoch 7c964de8bf switch to binary search for finding segment from global doc num
added unit tests for this function specifically
2017-12-01 09:26:51 -05:00
Marty Schoch c2047dcdf9 refactor doc id reader creation to share more code
fix issue identified by steve
2017-12-01 08:54:39 -05:00
Steve Yen 67986d41bf scorch InternalID() handles case of unknown docId 2017-11-30 08:36:01 -08:00
Marty Schoch 848aca4639 fix issues identified by errcheck 2017-11-29 13:34:15 -05:00
Marty Schoch 23f6dc1cc6 working in-memory version 2017-11-29 11:33:35 -05:00