
773 Commits

Author SHA1 Message Date
Sreekanth Sivasankaran debbcd7d47 adding maxsegment size limit checks 2018-03-13 17:35:54 +05:30
Sreekanth Sivasankaran 19318194fa moving to new offset slice format 2018-03-13 14:06:48 +05:30
Sreekanth Sivasankaran 5271b582bb Merge branch 'master' of https://github.com/blevesearch/bleve into loadchunk_minor 2018-03-13 11:59:29 +05:30
Steve Yen dbfc5e9130 scorch zap reuse interim freq/norm/loc slices 2018-03-12 10:04:11 -07:00
Steve Yen 07901910e2 scorch zap reuse roaring Bitmap in prepareDicts() slice growth
With this change, if the postings/postingsLocs slices need to be grown,
any preallocated roaring Bitmaps from the old slice are copied over
and reused.
2018-03-12 09:19:38 -07:00
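The slice-growth reuse described above can be sketched roughly as follows. This is a minimal illustration, not zap's code: `bitmap` is a hypothetical stand-in for `*roaring.Bitmap`, and `growBitmaps` is a hypothetical helper name.

```go
package main

// bitmap is a hypothetical stand-in for *roaring.Bitmap; only Clear() matters here.
type bitmap struct{ bits []uint32 }

// Clear empties the bitmap while keeping its backing array for reuse.
func (b *bitmap) Clear() { b.bits = b.bits[:0] }

// growBitmaps returns a slice of length n. When s is too small, a larger
// slice is allocated and every already-allocated *bitmap from s (including
// ones beyond len(s) but within cap(s)) is copied over, so only the new tail
// needs fresh allocations.
func growBitmaps(s []*bitmap, n int) []*bitmap {
	if cap(s) < n {
		grown := make([]*bitmap, n)
		copy(grown, s[:cap(s)]) // carry over the preallocated bitmaps
		s = grown
	}
	s = s[:n]
	for i := range s {
		if s[i] == nil {
			s[i] = &bitmap{} // allocate only where nothing can be reused
		} else {
			s[i].Clear() // reuse the existing bitmap
		}
	}
	return s
}
```

Calling `growBitmaps(old, 5)` on a 2-element slice keeps `old[0]` and `old[1]` (cleared) as the first two entries and allocates only three new bitmaps.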
Steve Yen b1f3969521 scorch zap reuse roaring Bitmap in postings lists 2018-03-12 09:18:11 -07:00
Steve Yen cad88096ca scorch zap reuse roaring Bitmap during merge 2018-03-12 09:17:37 -07:00
Steve Yen c4ceffe584 scorch zap sync Pool for interim data 2018-03-12 09:17:37 -07:00
Steve Yen 531800c479 scorch zap use roaring Add() instead of AddInt()
This change invokes Add() directly as AddInt() is a convenience
wrapper around Add().
2018-03-12 09:17:37 -07:00
Sreekanth Sivasankaran f9545bef2f
Merge pull request #800 from blevesearch/numsnapshots_config
making NumSnapshotsToKeep configurable
2018-03-12 20:59:03 +05:30
Sreekanth Sivasankaran 90aa91105a handling only int, float64 values 2018-03-12 20:24:51 +05:30
Steve Yen 6df6a036d8
Merge pull request #817 from steveyen/zap-no-longer-uses-mem-segment
scorch zap no longer uses mem segment
2018-03-12 07:54:10 -07:00
Sreekanth Sivasankaran aaccf59191 docValue space savings
merging the doc value length and loc
slices into a single offset slice, as that
is enough to compute the starting offset and
length of the doc values data for a given
document inside a docValue chunk.
2018-03-12 15:36:46 +05:30
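The following sketch shows why one offset slice is enough. It assumes, for illustration only, that `offsets[i]` holds the cumulative end offset of document i's data within the chunk. The real zap layout may store more per entry. Both function names are hypothetical.

```go
package main

// buildOffsets converts per-document data lengths into cumulative end
// offsets, replacing separate length and location slices with one slice.
func buildOffsets(lengths []uint64) []uint64 {
	offsets := make([]uint64, len(lengths))
	var running uint64
	for i, l := range lengths {
		running += l
		offsets[i] = running
	}
	return offsets
}

// docValueBounds returns the [start, end) byte range of the i-th document's
// doc values: the start is the previous entry's end offset (0 for the first).
func docValueBounds(offsets []uint64, i int) (start, end uint64) {
	if i > 0 {
		start = offsets[i-1]
	}
	return start, offsets[i]
}
```

For lengths `{3, 0, 5}` the offsets are `{3, 3, 8}`, and the third document's data spans bytes 3 to 8. The length is `end - start`, so it no longer needs its own slice.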
Steve Yen 2a20a36e15 scorch zap optimize to avoid bitmaps for 1-hit posting lists
This commit avoids creating roaring.Bitmaps (which would have just a
single entry) when a postings list/iterator represents a single
"1-hit" encoding.
2018-03-10 06:33:09 -08:00
Steve Yen 5abf7b7a19 scorch zap remove mem.Segment usage from persist / build.go 2018-03-09 15:23:58 -08:00
Steve Yen eade78be2f scorch zap unit tests no longer use mem.Segment 2018-03-09 15:23:58 -08:00
Steve Yen e82774ad20 scorch zap AnalysisResultsToSegmentBase()
AnalysisResultsToSegmentBase() allows analysis results to be directly
converted into a zap-encoded SegmentBase, which can then be introduced
onto the root, avoiding the creation of mem.Segment data structures.
This leads to some reduction of garbage memory allocations.

The grouping and sorting and shaping of the postings list information
is taken from the mem.Segment codepaths.

The encoding of stored fields reuses functions from zap's merger,
which has the largest savings of garbage memory avoidance.

And, the encoding of tf/loc chunks, postings & dictionary information
also follows the approach used by zap's merger, which also has some
savings of garbage memory avoidance.

In future changes, the mem.Segment dependencies will be removed from
zap, which should result in a smaller codebase.
2018-03-09 15:22:30 -08:00
Steve Yen 3884cf4d12 scorch zap writePostings() helper func refactored out 2018-03-09 13:29:28 -08:00
Sreekanth Sivasankaran d6522e7e17 minor optimisation to loadChunk method 2018-03-09 16:10:39 +05:30
Sreekanth Sivasankaran b04909d3ee adding the integer parser utility 2018-03-09 11:05:17 +05:30
Steve Yen 25beba615d scorch mem processDocument reuses fieldLens/docMap arrays
This change produces less garbage by switching from map[uint16] maps to
arrays for the fieldLens and docMap, and then reusing those arrays
across multiple processDocument() calls.
2018-03-08 13:04:51 -08:00
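The map-to-array change can be pictured like this. It is a minimal sketch that assumes field IDs are small dense integers. `docProcessor` and its method are hypothetical simplifications, not scorch's actual processDocument().

```go
package main

// docProcessor keeps per-document scratch state in an array indexed by
// field ID, reused across calls instead of allocating a new map per document.
type docProcessor struct {
	fieldLens []int
}

// processDocument counts tokens per field for one document; fields[i] is the
// field ID of the i-th token and must be < numFields. The returned slice
// aliases the processor's buffer and is overwritten by the next call.
func (p *docProcessor) processDocument(fields []uint16, numFields int) []int {
	if cap(p.fieldLens) < numFields {
		p.fieldLens = make([]int, numFields)
	}
	p.fieldLens = p.fieldLens[:numFields]
	for i := range p.fieldLens {
		p.fieldLens[i] = 0 // reset in place rather than reallocating
	}
	for _, f := range fields {
		p.fieldLens[f]++
	}
	return p.fieldLens
}
```

After the first call, every later document with at most `numFields` fields reuses the same backing array. A `map[uint16]int` would instead be allocated, or cleared and rehashed, each time.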
Steve Yen eac9808990 scorch zap optimize FST val encoding for terms with 1 hit
NOTE: this is a scorch zap file format change / bump to version 4.

In this optimization, the uint64 val stored in the vellum FST (term
dictionary) now may either be a uint64 postingsOffset (same as before
this change) or a uint64 encoding of the docNum + norm (in the case
where a term appears in just a single doc).
2018-03-08 09:19:54 -08:00
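One way to pack "either a postings offset or docNum + norm" into a single uint64 FST value is sketched below. The bit layout is an assumption for illustration, not zap's documented version 4 format. Bit 63 flags a 1-hit value, bits 32..62 hold the float32 norm bits with the sign bit dropped (norms are non-negative), and bits 0..31 hold the docNum. All names are hypothetical.

```go
package main

import "math"

// oneHitFlag is a hypothetical marker bit distinguishing 1-hit values from
// plain postings offsets (which must therefore stay below 1<<63).
const oneHitFlag = uint64(1) << 63

// encodeOneHit packs a docNum and a non-negative float32 norm into one uint64.
func encodeOneHit(docNum uint32, norm float32) uint64 {
	normBits := uint64(math.Float32bits(norm)) & 0x7fffffff // drop sign bit
	return oneHitFlag | normBits<<32 | uint64(docNum)
}

// decodeFSTVal reports whether v is a 1-hit encoding. If it is, the docNum
// and norm are returned; otherwise v is returned as a postings offset.
func decodeFSTVal(v uint64) (oneHit bool, docNum uint32, norm float32, postingsOffset uint64) {
	if v&oneHitFlag == 0 {
		return false, 0, 0, v
	}
	docNum = uint32(v) // low 32 bits
	norm = math.Float32frombits(uint32(v>>32) & 0x7fffffff)
	return true, docNum, norm, 0
}
```

With this kind of encoding, a lookup of a term that occurs in one document can be answered from the dictionary value alone, without reading a postings list.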
Steve Yen 1e2bb14f13 added TestRoaringSizes() 2018-03-07 10:53:24 -08:00
Steve Yen 0ec4a1935a
Merge pull request #808 from steveyen/more-scorch-optimizing
err fix and more scorch optimizing
2018-03-07 10:39:20 -08:00
Abhinav Dangeti 06be1ad72e
Merge pull request #806 from abhinavdangeti/master
Fixing the scorch search request memory estimate
2018-03-07 10:11:24 -08:00
Steve Yen 2b5da7a819 go fmt 2018-03-07 09:12:55 -08:00
Steve Yen 59eb70d020 scorch zap remove unused chunkedIntCoder field 2018-03-07 09:11:10 -08:00
Steve Yen 79f28b7c93 scorch fix persistDocValues() err return 2018-03-07 09:11:10 -08:00
Steve Yen 8c0f402d4b scorch zap optimize processDocument() loc inner loop 2018-03-07 09:11:10 -08:00
Steve Yen 15242af465
Merge pull request #805 from steveyen/optimize-scorch-mem-processField
Optimize scorch processField() inner loop and writeRoaringWithLen()
2018-03-07 09:09:57 -08:00
Sreekanth Sivasankaran 73ed8e248d
fixing the indentation issues, which look like they were introduced
during the web-based conflict resolution.
2018-03-07 18:34:54 +05:30
Sreekanth Sivasankaran e0369a3553
Merge branch 'master' into compaction_bytes_stats 2018-03-07 14:47:33 +05:30
Sreekanth Sivasankaran 2a9739ee1b naming change, interface removal 2018-03-07 14:43:33 +05:30
abhinavdangeti 5c721226cf Fixing the scorch search request memory estimate
Do not re-account for certain referenced data in the zap structures.

New estimates:

                                    ESTIMATE    BENCHMEM
TermQuery                           11396       12437
MatchQuery                          12244       12951
DisjunctionQuery (Term queries)     20644       20709
2018-03-06 16:03:10 -08:00
Steve Yen 8841d79d26 scorch optimize mem processField inner-loop 2018-03-06 15:26:54 -08:00
Steve Yen dde6c2e01b scorch zap optimize writeRoaringWithLen()
Before this change, writeRoaringWithLen() would leverage a reused
bytes.Buffer (#A) and invoke the roaring.WriteTo() API.

But, it turns out the roaring.WriteTo() API has a suboptimal
implementation, in that underneath-the-hood it converts the roaring
bitmap to a byte buffer (using roaring.ToBytes()), and then calls
Write().  But, that Write() turns out to be an additional memcpy into
the provided bytes.Buffer (#A).

By directly invoking roaring.ToBytes(), this change to
writeRoaringWithLen() avoids the extra memory allocation and memcpy.
2018-03-06 14:59:20 -08:00
Steve Yen b62ca996f6 scorch zap optimize chunkedIntCoder.Add() calls to use multiple vals
This change leverages the ability for the chunkedIntCoder.Add() method
to accept multiple input param values (via the '...' param signature),
meaning there are fewer Add() invocations.
2018-03-06 14:11:41 -08:00
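The following hypothetical, simplified `intCoder` shows why a variadic `Add(vals ...uint64)` reduces call counts. It is not zap's chunkedIntCoder, which also tracks docNum-based chunks. A caller holding a freq and a norm can encode both with one `Add(freq, norm)` call.

```go
package main

import "encoding/binary"

// intCoder is a hypothetical stand-in that appends uvarint-encoded values
// to a byte buffer, using a reused scratch array for each varint.
type intCoder struct {
	buf     []byte
	scratch [binary.MaxVarintLen64]byte
}

// Add encodes any number of values in a single call.
func (c *intCoder) Add(vals ...uint64) {
	for _, v := range vals {
		n := binary.PutUvarint(c.scratch[:], v)
		c.buf = append(c.buf, c.scratch[:n]...)
	}
}
```

For example, `c.Add(1, 300)` appends the three bytes `0x01 0xac 0x02`.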
abhinavdangeti 38b6c522b0 Address build breakage after rebase
Removed attribute: iterator of type Posting
2018-03-06 14:00:54 -08:00
abhinavdangeti 7e36109b3c MB-28162: Provide API to estimate memory needed to run a search query
This API (unexported) will estimate the amount of memory needed to execute
a search query over an index before the collector begins data collection.

Sample estimates for certain queries:
{Size: 10, BenchmarkUpsidedownSearchOverhead}
                                                           ESTIMATE    BENCHMEM
TermQuery                                                  4616        4796
MatchQuery                                                 5210        5405
DisjunctionQuery (Match queries)                           7700        8447
DisjunctionQuery (Term queries)                            6514        6591
ConjunctionQuery (Match queries)                           7524        8175
Nested disjunction query (disjunction of disjunctions)     10306       10708
…
2018-03-06 13:53:42 -08:00
Steve Yen 5b86da85f3 scorch zap optimize postings itr with tf/loc reader/decoder reuse 2018-03-06 13:30:59 -08:00
Steve Yen 530a3d24cf scorch zap optimize merge by byte copying freq/norm/loc's
This change adds a zap PostingsIterator.nextBytes() method, which is
similar to Next(), but instead of returning a Posting instance,
nextBytes() returns the encoded freq/norm and location byte slices.

The zap merge code then provides those byte slices directly to the
intCoders via a new method, intCoder.AddBytes(), thereby avoiding
having to encode many uvarints.
2018-03-06 13:30:59 -08:00
Steve Yen 655268bec8 scorch zap postings iterator nextDocNum() helper method
Refactored out a nextDocNum() helper method from Next() that future
optimizations can use.
2018-03-06 07:55:26 -08:00
Sreekanth Sivasankaran fa5de8e09a making NumSnapshotsToKeep configurable 2018-03-06 16:22:11 +05:30
Steve Yen 502e64c256 scorch zap Posting doesn't use iterator field 2018-03-05 16:33:13 -08:00
Steve Yen 8f8fd511b7 scorch zap access freqs[offset] outside loop 2018-03-05 12:02:33 -08:00
Steve Yen a338386a03 scorch build optimize freq/loc slice capacity 2018-03-05 12:02:33 -08:00
Steve Yen 856778ad7b scorch zap build prealloc docNumbers capacity 2018-03-05 12:02:33 -08:00
Steve Yen 8c0881eab2 scorch zap build reuses mem postingsList/Iterator structs 2018-03-05 12:02:33 -08:00
Steve Yen 85761c6a57 go fmt 2018-03-05 12:02:33 -08:00
Steve Yen d44c5ad568 scorch stats MaxBatchIntroTime bug fix and more timing stats
Added timing stats for in-mem zap merging and file-based zap merging.
2018-03-05 12:02:33 -08:00
Sreekanth Sivasankaran 395b0a312d adding UTs 2018-03-05 17:02:58 +05:30
Sreekanth Sivasankaran dec265c481 adding compaction_written_bytes/sec stats to scorch 2018-03-05 16:32:57 +05:30
Steve Yen 884da6f93a scorch optimize mem processDocument() norm calculation
This change moves the norm calculation outside of the inner loop.
2018-03-03 11:58:30 -08:00
Steve Yen 6ae799052a scorch mem optimize processDocument() stored field 2018-03-03 11:52:33 -08:00
Steve Yen b7cfef81c9 scorch optimize mem processDocument() dict access
This change moves the dict lookup to outside of the loop.
2018-03-03 11:43:25 -08:00
Steve Yen 88c740095b scorch optimizations for mem.PostingsIterator.Next() & docTermMap
Due to the usage rules of iterators, mem.PostingsIterator.Next() can
reuse its returned Postings instance.

Also, there's a micro optimization in persistDocValues() for one fewer
access to the docTermMap in the inner-loop.
2018-03-03 11:31:18 -08:00
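The iterator reuse rule mentioned above can be sketched as below. The types are hypothetical simplifications of a postings iterator. `Next()` always returns a pointer to the same embedded posting, which is safe only because callers are expected to finish with one result before asking for the next.

```go
package main

// posting is a hypothetical, simplified postings entry.
type posting struct {
	docNum uint64
	freq   uint64
}

type postingsIterator struct {
	docNums []uint64
	freqs   []uint64
	i       int
	next    posting // single instance reused for every Next() result
}

// Next returns the iterator's reusable posting, or nil when exhausted.
// Callers must copy anything they need before calling Next() again.
func (it *postingsIterator) Next() *posting {
	if it.i >= len(it.docNums) {
		return nil
	}
	it.next = posting{docNum: it.docNums[it.i], freq: it.freqs[it.i]}
	it.i++
	return &it.next
}
```

Each call overwrites the same struct, so iterating over N postings costs no per-posting allocation.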
Steve Yen a5253bfe2b scorch persister goes through introducer to affect root
This change allows the introducer to become the only goroutine to
modify the root, which in turn allows the introducer to greatly reduce
its root lock holding surface area.
2018-03-02 16:14:28 -08:00
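The single-writer pattern described above can be sketched with a channel. All types and fields here are hypothetical. The persister sends a notice instead of modifying the root itself. The introducer is the only goroutine that writes `root`, so it can build the next snapshot outside the lock and hold the write lock only for the pointer swap.

```go
package main

import "sync"

// snapshot is a hypothetical immutable view of the index.
type snapshot struct {
	epoch          uint64
	persistedEpoch uint64
}

// persistNotice is what the persister sends instead of touching root.
type persistNotice struct {
	epoch uint64
	done  chan struct{}
}

type index struct {
	rootLock sync.RWMutex
	root     *snapshot
	persists chan persistNotice
}

// introducer is the only goroutine that replaces idx.root.
func (idx *index) introducer() {
	for n := range idx.persists {
		// Safe without the lock: no other goroutine ever writes idx.root.
		next := &snapshot{epoch: idx.root.epoch + 1, persistedEpoch: n.epoch}
		idx.rootLock.Lock() // held only for the pointer swap
		idx.root = next
		idx.rootLock.Unlock()
		close(n.done)
	}
}

// currentRoot is what readers such as searchers or the persister call.
func (idx *index) currentRoot() *snapshot {
	idx.rootLock.RLock()
	defer idx.rootLock.RUnlock()
	return idx.root
}
```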
Marty Schoch 30acc55d05 remove unnecessary scorch reader wrapper
we now use *IndexSnapshot directly
2018-03-02 14:03:54 -08:00
Steve Yen d61d9e4cf6 scorch stats MaxBatchIntroTime and TotBatchIntroTime 2018-03-02 13:33:06 -08:00
Steve Yen 868a66279e scorch indexing time stat
Looks like this was forgotten along the way -- the stat for analysis
time was tracked correctly, but indexing time wasn't.
2018-03-02 11:07:39 -08:00
Steve Yen 7e5bb0bd8d renamed to CurOnDiskBytes/Files as those are gauges 2018-03-01 14:13:43 -08:00
Marty Schoch 0363b24dd4 update to use new vellum Reset API 2018-03-01 09:37:39 -08:00
Steve Yen 39f9cee910
Merge pull request #789 from steveyen/sreekanth-cb-scorch_stats
adding stats for scorch, with no gauges
2018-02-28 17:41:10 -08:00
Steve Yen 1b661ef844 stats cleanup, renaming, gauges replaced with counters 2018-02-28 17:03:28 -08:00
Steve Yen 7d46d2c7ae scorch zap intcoder encoder is never nil 2018-02-28 10:09:21 -08:00
Sreekanth Sivasankaran 4b742505aa adding stats for scorch 2018-02-28 15:31:55 +05:30
Steve Yen dd7d93ee5e scorch zap loadChunk reuses Location slices 2018-02-27 18:01:48 -08:00
Steve Yen 4dbb4b1495 scorch zap posting reuses freqNorm & loc reader and decoder 2018-02-27 18:01:48 -08:00
Steve Yen a32362ba2e MB-28403: scorch introduceMerge doesn't prealloc segments capacity
There are now multiple competing merge activities (file-merging and
in-memory merging during persistence), so the simple math to
precalculate capacity for the slice of segments in introduceMerge() no
longer works for all cases and might even compute a negative capacity.

This change removes that (sometimes wrong) precalculation, and instead
depends on append() to grow the slice correctly.
2018-02-27 15:14:34 -08:00
Steve Yen 3f1dcb6078 scorch zap merge optimize drops lookup to outside of loop 2018-02-27 09:23:29 -08:00
Steve Yen 99ed127176 scorch zap merge optimize newDocNums lookup to outside of loop
And, also a "go fmt".
2018-02-26 14:23:55 -08:00
Steve Yen 98d5d7bd81 scorch zap chunkedIntCoder optimizations
The optimizations / changes include...

- reuse of a memory buf when serializing varints.

- reuse of a govarint.U64Base128Encoder instance, as it's a thin
  wrapper around an underlying chunkBuf, so calling Reset() on the
  chunkBuf is enough for encoder reuse.

- chunkedIntCoder.Write() method was changed to invoke w.Write() less
  often by forming a larger, reused buf.  Profiling and analysis
  showed w.Write() was getting called a lot, often with tiny 1 or 2
  byte inputs.  The theory is w.Write() and its underlying memmove()
  can be more efficient when provided with larger bufs.

- some repeated code removal, by reusing the Close() method.
2018-02-26 14:17:09 -08:00
Steve Yen ce2332e111 scorch zap merge reuses tf/locEncoder across terms
The finishTerm() helper func that's invoked on every outer loop resets
the tf/locEncoders so they can be safely reused.
2018-02-26 11:37:11 -08:00
Marty Schoch eca31dfd27
Merge pull request #777 from sreekanth-cb/persister_pause
pausing persister until merging catches up
2018-02-26 14:36:07 -05:00
Sreekanth Sivasankaran e02849fcda
fix the indentation 2018-02-26 16:21:33 +05:30
Sreekanth Sivasankaran c45822347f
Merge branch 'master' into mergeplanner_options 2018-02-26 15:59:20 +05:30
Sreekanth Sivasankaran e4cc79a9ad adopting json parsing on options,
fixed the inadvertent option modification
2018-02-26 15:56:30 +05:30
Sreekanth Sivasankaran f0a65f041d cleaning up the wait loop 2018-02-25 20:58:53 +05:30
Sreekanth Sivasankaran 3a571ad283
Merge branch 'master' into persister_pause 2018-02-24 23:57:20 +05:30
Sreekanth Sivasankaran 874829759b
cleaning up the wait loop 2018-02-24 23:53:49 +05:30
Sreekanth Sivasankaran 4109e327ff
Merge pull request #771 from sreekanth-cb/merge_handling_empty_seg_tasks
Fix for empty segment merge handling
2018-02-24 10:48:31 +05:30
Sreekanth Sivasankaran 683e195ac4 adding empty segment handling during introduction
cleaning up the segment live size check
2018-02-24 07:03:27 +05:30
abhinavdangeti da70758635 Handle case where store snapshot isn't closed in upsidedown's Batch() API 2018-02-23 14:47:22 -08:00
Steve Yen c50d9b4023 scorch conditional merging during persistSnapshot()
As part of this change, there are new helper methods --
persistSnapshotMaybeMerge() and persistSnapshotDirect().
2018-02-23 09:17:02 -08:00
Sreekanth Sivasankaran a1db057656 configurable mergePlanner options
mergePlanner options are parsed from the
scorch config parameters
2018-02-23 16:09:37 +05:30
Sreekanth Sivasankaran a8ebf2a553 lowering epochDistance to 5,
fixing the lastMergedEpoch value updates
2018-02-21 17:25:14 +05:30
Steve Yen a0b7508da7 scorch zap mergeSegmentBases() func
As part of this, zap.MergeToWriter() now returns more information --
enough so that callers can now create their own SegmentBase instances.

Also, the fieldsMap maintained and returned by zap.MergeToWriter() is
now a mapping from fieldName ==> fieldID+1 (instead of the previous
mapping from fieldName ==> fieldID).  This makes it similar to how
fieldsMap are handled in other parts of zap to avoid "zero value"
issues.
2018-02-19 14:13:31 -08:00
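The "fieldID+1" convention avoids zero-value ambiguity, as this minimal sketch with a hypothetical `fieldsMap` type shows. Because stored values are offset by one, a lookup that returns Go's zero value unambiguously means "missing", even for the field whose ID is 0.

```go
package main

// fieldsMap maps a field name to fieldID+1, so the zero value returned for
// a missing key can never be confused with a real field whose ID is 0.
type fieldsMap map[string]uint16

// add records fieldID for name, stored as fieldID+1.
func (m fieldsMap) add(name string, fieldID uint16) { m[name] = fieldID + 1 }

// lookup returns the field ID and whether the field exists.
func (m fieldsMap) lookup(name string) (uint16, bool) {
	v := m[name]
	if v == 0 {
		return 0, false
	}
	return v - 1, true
}
```

Go's comma-ok idiom would cover plain map lookups. The +1 offset also lets the stored value itself carry the "missing" marker wherever it is copied, for example into slices or serialized data.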
Steve Yen 720010783e scorch zap InitSegmentBase() helper func
Refactored out a zap.InitSegmentBase() func so that non-zap packages
can create SegmentBase instances.
2018-02-19 14:13:31 -08:00
Steve Yen 656220ca9d
Merge pull request #769 from steveyen/scorch-rollback-ignores-unsafeBatch
scorch rollback ignores unsafeBatch flag
2018-02-15 18:51:59 -08:00
Sreekanth Sivasankaran 606a270669 Fix for empty segment merge handling
Avoid creating new files for empty-segment tasks
during the merge operation, and skip the
incorrect appending of a newer segment during merge.
2018-02-15 16:44:20 +05:30
Sreekanth Sivasankaran 35611f4287
Merge branch 'master' into persister_pause 2018-02-14 16:53:06 +05:30
Sreekanth Sivasankaran 6f2797bec3 Adding a pause to persister until the merger
catches up
2018-02-14 16:39:26 +05:30
Steve Yen 030469a351
Merge pull request #767 from steveyen/persistSnapshot-err-handling
improvements to err handling in persistSnapshot(), etc
2018-02-13 14:53:42 -08:00
Steve Yen 57fc03258e scorch rollback ignores unsafeBatch flag
See also: https://github.com/blevesearch/bleve/issues/760
2018-02-13 10:21:42 -08:00
Steve Yen fe544f3352 scorch zap merge uses enumerator for vellum.Iterator's 2018-02-12 21:28:46 -08:00
Steve Yen a073424e5a scorch zap dict.postingsListFromOffset() method
A helper method that can create a PostingsList if the caller already
knows the postingsOffset.
2018-02-12 20:54:07 -08:00
Steve Yen 2158e06c40 scorch zap merge collects dicts & itrs in lock-step
The theory with this change is that the dicts and itrs should be
positionally in "lock-step" with paired entries.

And, since later code also uses the same array indexing to access the
drops and newDocNums, those also need to be positionally in pair-wise
lock-step, too.
2018-02-12 20:54:07 -08:00
Steve Yen 95a4f37e5c scorch zap enumerator impl that joins multiple vellum iterators
Unlike vellum's MergeIterator, the enumerator introduced in this
commit doesn't merge when there are matching keys across iterators.

Instead, the enumerator implementation provides a traversal of all the
tuples of (key, iteratorIndex, val) from the underlying vellum
iterators, ordered by key ASC, iteratorIndex ASC.
2018-02-12 20:54:06 -08:00
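The enumerator's ordering can be sketched over plain sorted slices standing in for vellum iterators. All names are hypothetical, and a real implementation would be lazy rather than collecting results into a slice. Every entry is emitted as a (key, iteratorIndex, val) tuple ordered by key ASC, then iteratorIndex ASC. Equal keys from different inputs are not merged.

```go
package main

// kv stands in for one (key, val) entry of a sorted vellum iterator.
type kv struct {
	key string
	val uint64
}

// tuple is one step of the enumeration.
type tuple struct {
	key   string
	index int // which input the entry came from
	val   uint64
}

// enumerate walks all sorted inputs together and emits every entry.
func enumerate(inputs [][]kv) []tuple {
	pos := make([]int, len(inputs)) // current cursor per input
	var out []tuple
	for {
		best := -1
		for i, in := range inputs {
			if pos[i] >= len(in) {
				continue // this input is exhausted
			}
			// Strict < keeps the lowest index on ties, giving iteratorIndex ASC.
			if best < 0 || in[pos[i]].key < inputs[best][pos[best]].key {
				best = i
			}
		}
		if best < 0 {
			return out
		}
		e := inputs[best][pos[best]]
		out = append(out, tuple{key: e.key, index: best, val: e.val})
		pos[best]++
	}
}
```

Given inputs `[("b",1) ("d",2)]` and `[("a",3) ("b",4)]`, the output is `("a",1,3) ("b",0,1) ("b",1,4) ("d",0,2)`. Both "b" entries survive, whereas a merging iterator would combine them.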
Steve Yen e37c563c56 scorch zap merge move fieldDvLocsOffset var declaration
Move the var declaration to nearer where it's used.
2018-02-08 18:03:09 -08:00
Steve Yen f177f07613 scorch zap segment merging reuses prealloc'ed PostingsIterator
During zap segment merging, a new zap PostingsIterator was allocated
for every field X segment X term.

This change optimizes by reusing a single PostingsIterator instance
per persistMergedRest() invocation.

Also, unused fields are removed from the PostingsIterator.
2018-02-08 17:24:30 -08:00