Commit Graph

186 Commits

Author SHA1 Message Date
Steve Yen
85b4a31e2a scorch zap getField() which panics if the field is unknown 2018-03-20 11:12:18 -07:00
Steve Yen
f65ba5c0f4 MB-28781 - scorch zap merge freq/loc copying only when fieldsSame
The optimization recently introduced in commit 530a3d24cf,
("scorch zap optimize merge by byte copying freq/norm/loc's") was to
byte-copy freq/norm/loc data directly during merging.  But, it was
incorrect if the fields were different across segments.

This change now performs that byte-copying merging optimization only
when the fields are the same across segments, and if not, leverages
the old approach of deserializing & re-serializing the freq/norm/loc
information, which has the important step of remapping fieldID's.

See also: https://issues.couchbase.com/browse/MB-28781
2018-03-19 11:26:51 -07:00
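As a rough illustration of the check described in the commit above, the sketch below decides between the two merge paths by comparing field lists across segments. The function name `fieldsSame` and the use of plain string slices are assumptions for illustration; the commit message does not show the actual signature.

```go
package main

import "fmt"

// fieldsSame reports whether every segment lists the same fields in the same
// order, so that a given field ID means the same thing in each segment.
// Hypothetical helper: the inputs are plain field-name slices, one per segment.
func fieldsSame(fieldsPerSegment [][]string) bool {
	if len(fieldsPerSegment) == 0 {
		return true
	}
	first := fieldsPerSegment[0]
	for _, fields := range fieldsPerSegment[1:] {
		if len(fields) != len(first) {
			return false
		}
		for i := range fields {
			if fields[i] != first[i] {
				return false
			}
		}
	}
	return true
}

func main() {
	same := [][]string{{"_id", "title"}, {"_id", "title"}}
	diff := [][]string{{"_id", "title"}, {"_id", "body", "title"}}
	fmt.Println(fieldsSame(same)) // byte-copying is safe on this input
	fmt.Println(fieldsSame(diff)) // field IDs differ, so decode and remap instead
}
```

When the check fails, a field ID stored in one segment may refer to a different field in another, which is why the slower deserialize and re-serialize path is still needed.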
Steve Yen
c881146270 scorch zap mergeTermFreqNormLocsByCopying() helper func 2018-03-19 10:36:23 -07:00
Steve Yen
5df53c8e1f scorch zap file merger uses 1MB buffered writer
pprof of bleve-blast was showing file merging was in syscall/write a
lot.  The bufio.NewWriter() provides a default buffer size of 4K,
which is too small, and using bufio.NewWriterSize(1MB buffer size)
leads to syscall/write dropping out of the file merging flame graphs.
2018-03-16 11:49:53 -07:00
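The effect described in the commit above can be reproduced in isolation with the standard library. This is a minimal sketch, not the merger's code: `countingWriter` and `writeChunks` are hypothetical helpers, and each call to the underlying `Write` stands in for one write syscall.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
)

// countingWriter counts how many Write calls reach the underlying sink.
type countingWriter struct {
	buf   bytes.Buffer
	calls int
}

func (c *countingWriter) Write(p []byte) (int, error) {
	c.calls++
	return c.buf.Write(p)
}

// writeChunks pushes n chunks of chunkSize bytes through a bufio.Writer with
// the given buffer size and returns the number of underlying Write calls.
func writeChunks(bufSize, n, chunkSize int) (int, error) {
	sink := &countingWriter{}
	w := bufio.NewWriterSize(sink, bufSize)
	chunk := make([]byte, chunkSize)
	for i := 0; i < n; i++ {
		if _, err := w.Write(chunk); err != nil {
			return 0, err
		}
	}
	if err := w.Flush(); err != nil {
		return 0, err
	}
	return sink.calls, nil
}

func main() {
	small, err := writeChunks(4096, 1024, 1024) // same size as bufio's default buffer
	if err != nil {
		panic(err)
	}
	large, err := writeChunks(1<<20, 1024, 1024) // 1MB buffer
	if err != nil {
		panic(err)
	}
	fmt.Println("underlying writes with 4K buffer:", small)
	fmt.Println("underlying writes with 1MB buffer:", large)
}
```

The larger buffer trades about 1MB of memory per writer for far fewer calls into the underlying file.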
Steve Yen
b411e65234 scorch zap optimize postingsIterator reuse of freq/locChunkOffsets 2018-03-16 11:22:50 -07:00
Steve Yen
e52eb84e37 scorch zap optimize merge when deletion bitmap is empty
This change detects whether a deletion bitmap is empty, and treats
that as a nil bitmap, which allows further postings iterator codepaths
to avoid roaring bitmap operations (like, AndNot(docNums, drops)).
2018-03-16 11:22:50 -07:00
Steve Yen
5411d9ae4f
Merge pull request #826 from steveyen/scorch-estimate-buf-size
estimate interim buffer size based on previous results
2018-03-16 11:22:42 -07:00
Marty Schoch
f1c26e29f0
Merge branch 'master' into avoid-app-herder-hot-lock 2018-03-16 10:30:34 -04:00
Sreekanth Sivasankaran
53c3cab512
Merge branch 'master' into minor_docvalue_space_savings 2018-03-16 08:53:57 +05:30
Sreekanth Sivasankaran
23cebae5a8
Merge pull request #815 from blevesearch/loadchunk_minor
minor optimisation to loadChunk method
2018-03-16 08:15:37 +05:30
Marty Schoch
45e0e5c666 memoize the size of an entire index snapshot
by memoizing the size of index snapshots and their
constituent parts, we significantly reduce the amount
of time that the lock is held in the app_herder, when
calculating the total memory used
2018-03-15 17:25:05 -04:00
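The memoization idea in the commit above might look roughly like the sketch below. The types `segmentSnapshot` and `indexSnapshot` are simplified, hypothetical stand-ins rather than bleve's actual structs, and the sketch assumes snapshots are immutable once built.

```go
package main

import "fmt"

// segmentSnapshot is a hypothetical stand-in for one constituent part.
type segmentSnapshot struct {
	data []byte
	size uint64 // memoized when the snapshot is built
}

func newSegmentSnapshot(data []byte) *segmentSnapshot {
	return &segmentSnapshot{data: data, size: uint64(len(data))}
}

// indexSnapshot sums its parts once, at construction time.
type indexSnapshot struct {
	segments []*segmentSnapshot
	size     uint64
}

func newIndexSnapshot(segs []*segmentSnapshot) *indexSnapshot {
	s := &indexSnapshot{segments: segs}
	for _, seg := range segs {
		s.size += seg.size
	}
	return s
}

// Size returns the memoized total, so a caller that holds a lock while
// summing memory use does no per-segment work.
func (s *indexSnapshot) Size() uint64 { return s.size }

func main() {
	snap := newIndexSnapshot([]*segmentSnapshot{
		newSegmentSnapshot(make([]byte, 100)),
		newSegmentSnapshot(make([]byte, 28)),
	})
	fmt.Println(snap.Size())
}
```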
Sreekanth Sivasankaran
d1155c223a zap version bump, changed the offset slice format, UTs
2018-03-15 23:25:53 +05:30
Sreekanth Sivasankaran
1775602958 posting iterator array positions clean up,
max segment size limit adjustment for hit-1
optimisation
2018-03-15 14:40:00 +05:30
Sreekanth Sivasankaran
441065a41b comments, simplification 2018-03-15 13:11:29 +05:30
Steve Yen
4af65a7846 scorch zap prealloc buf via estimate from previous interim work 2018-03-14 09:32:14 -07:00
Steve Yen
7578ff7cb8 scorch zap optimize interim's reuse of vellum builders
Since interim structs are now sync.Pool'ed, we can now also hold onto
and reuse the associated vellum builder.
2018-03-14 07:49:28 -07:00
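The pooling pattern referred to in the commit above can be sketched with `sync.Pool` from the standard library. Everything here is an assumption for illustration: the `interim` struct is a simplified stand-in, and a `bytes.Buffer` plays the role of the expensive, reusable builder.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// interim is a hypothetical scratch struct; buf stands in for a reusable
// builder that is costly to allocate from scratch.
type interim struct {
	buf   bytes.Buffer
	freqs []uint64
}

// reset clears the contents but keeps the allocated capacity for reuse.
func (s *interim) reset() {
	s.buf.Reset()
	s.freqs = s.freqs[:0]
}

var interimPool = sync.Pool{New: func() interface{} { return &interim{} }}

// encode borrows an interim from the pool, uses its buffers, and returns a
// copy of the result so the pooled memory can be reused safely.
func encode(vals []uint64) []byte {
	s := interimPool.Get().(*interim)
	defer func() {
		s.reset()
		interimPool.Put(s)
	}()
	s.freqs = append(s.freqs, vals...)
	for _, v := range s.freqs {
		fmt.Fprintf(&s.buf, "%d,", v)
	}
	out := make([]byte, s.buf.Len())
	copy(out, s.buf.Bytes())
	return out
}

func main() {
	fmt.Println(string(encode([]uint64{1, 2})))
	fmt.Println(string(encode([]uint64{3}))) // may reuse the same interim
}
```

Resetting before `Put` matters: a pooled value that still holds old data would leak it into the next caller's output.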
Sreekanth Sivasankaran
19318194fa moving to new offset slice format 2018-03-13 14:06:48 +05:30
Sreekanth Sivasankaran
5271b582bb Merge branch 'master' of https://github.com/blevesearch/bleve into loadchunk_minor 2018-03-13 11:59:29 +05:30
Steve Yen
dbfc5e9130 scorch zap reuse interim freq/norm/loc slices 2018-03-12 10:04:11 -07:00
Steve Yen
07901910e2 scorch zap reuse roaring Bitmap in prepareDicts() slice growth
In this change, if the postings/postingsLocs slices need to be grown,
then copy over and reuse any of the preallocated roaring Bitmap's from
the old slice.
2018-03-12 09:19:38 -07:00
Steve Yen
b1f3969521 scorch zap reuse roaring Bitmap in postings lists 2018-03-12 09:18:11 -07:00
Steve Yen
cad88096ca scorch zap reuse roaring Bitmap during merge 2018-03-12 09:17:37 -07:00
Steve Yen
c4ceffe584 scorch zap sync Pool for interim data 2018-03-12 09:17:37 -07:00
Steve Yen
531800c479 scorch zap use roaring Add() instead of AddInt()
This change invokes Add() directly as AddInt() is a convenience
wrapper around Add().
2018-03-12 09:17:37 -07:00
Steve Yen
6df6a036d8
Merge pull request #817 from steveyen/zap-no-longer-uses-mem-segment
scorch zap no longer uses mem segment
2018-03-12 07:54:10 -07:00
Sreekanth Sivasankaran
aaccf59191 docValue space savings
merging the doc value length and loc
slices into a single offset slice, as that
is enough to compute the starting offset and
length of the doc values data for a given
document inside a docValue chunk.
2018-03-12 15:36:46 +05:30
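To see why one offset slice is enough, here is a minimal sketch that assumes the slice stores the cumulative end offset of each document's data within a chunk. That layout and the name `docValueBounds` are assumptions; the commit message does not spell out the format.

```go
package main

import "fmt"

// docValueBounds derives the start and length of entry i from cumulative end
// offsets alone: entry i starts where entry i-1 ends.
func docValueBounds(endOffsets []uint64, i int) (start, length uint64) {
	if i > 0 {
		start = endOffsets[i-1]
	}
	return start, endOffsets[i] - start
}

func main() {
	chunk := []byte("goblevezap")      // values "go", "bleve", "zap" back to back
	endOffsets := []uint64{2, 7, 10}   // one offset per document instead of loc + length
	for i := range endOffsets {
		start, length := docValueBounds(endOffsets, i)
		fmt.Println(i, string(chunk[start:start+length]))
	}
}
```

Storing a single value per document instead of two is where the space saving comes from.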
Steve Yen
2a20a36e15 scorch zap optimize to avoid bitmaps for 1-hit posting lists
This commit avoids creating roaring.Bitmap's (which would have just a
single entry) when a postings list/iterator represents a single
"1-hit" encoding.
2018-03-10 06:33:09 -08:00
Steve Yen
5abf7b7a19 scorch zap remove mem.Segment usage from persist / build.go 2018-03-09 15:23:58 -08:00
Steve Yen
eade78be2f scorch zap unit tests no longer use mem.Segment 2018-03-09 15:23:58 -08:00
Steve Yen
e82774ad20 scorch zap AnalysisResultsToSegmentBase()
AnalysisResultsToSegmentBase() allows analysis results to be directly
converted into a zap-encoded SegmentBase, which can then be introduced
onto the root, avoiding the creation of mem.Segment data structures.
This leads to some reduction of garbage memory allocations.

The grouping and sorting and shaping of the postings list information
is taken from the mem.Segment codepaths.

The encoding of stored fields reuses functions from zap's merger,
which has the largest savings of garbage memory avoidance.

And, the encoding of tf/loc chunks, postings & dictionary information
also follows the approach used by zap's merger, which also has some
savings of garbage memory avoidance.

In future changes, the mem.Segment dependencies will be removed from
zap, which should result in a smaller codebase.
2018-03-09 15:22:30 -08:00
Steve Yen
3884cf4d12 scorch zap writePostings() helper func refactored out 2018-03-09 13:29:28 -08:00
Sreekanth Sivasankaran
d6522e7e17 minor optimisation to loadChunk method 2018-03-09 16:10:39 +05:30
Steve Yen
25beba615d scorch mem processDocument reuses fieldLens/docMap arrays
This change produces less garbage by switching from map[uint16]'s to
array's for the fieldLens and docMap, and then reusing those arrays
across multiple processDocument() calls.
2018-03-08 13:04:51 -08:00
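The map-to-array switch in the commit above can be sketched as follows. The `docProcessor` type and `fieldLengths` method are hypothetical; the sketch assumes field IDs are small, dense integers below `numFields`, which is what makes indexing a slice by field ID possible.

```go
package main

import "fmt"

// docProcessor keeps a slice indexed by field ID and reuses it across
// documents instead of allocating a fresh map[uint16]int each time.
type docProcessor struct {
	fieldLens []int
}

// fieldLengths counts tokens per field for one document. The returned slice
// is only valid until the next call, because its backing array is reused.
func (p *docProcessor) fieldLengths(tokenFieldIDs []uint16, numFields int) []int {
	if cap(p.fieldLens) < numFields {
		p.fieldLens = make([]int, numFields)
	} else {
		p.fieldLens = p.fieldLens[:numFields]
		for i := range p.fieldLens {
			p.fieldLens[i] = 0 // clear counts left over from the previous document
		}
	}
	for _, id := range tokenFieldIDs {
		p.fieldLens[id]++
	}
	return p.fieldLens
}

func main() {
	p := &docProcessor{}
	fmt.Println(p.fieldLengths([]uint16{0, 1, 1}, 2))
	fmt.Println(p.fieldLengths([]uint16{0}, 2)) // same backing array, cleared first
}
```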
Steve Yen
eac9808990 scorch zap optimize FST val encoding for terms with 1 hit
NOTE: this is a scorch zap file format change / bump to version 4.

In this optimization, the uint64 val stored in the vellum FST (term
dictionary) now may either be a uint64 postingsOffset (same as before
this change) or a uint64 encoding of the docNum + norm (in the case
where a term appears in just a single doc).
2018-03-08 09:19:54 -08:00
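One way a single uint64 can carry either a postings offset or a docNum plus norm is to reserve a flag bit. The layout below (top bit as the flag, then 31 bits of norm, then 31 bits of docNum) is an assumption for illustration only; the commit message does not state the actual bit layout, and all names here are hypothetical.

```go
package main

import "fmt"

const (
	oneHitFlag = uint64(1) << 63   // assumed marker: value is a 1-hit encoding
	mask31     = uint64(1)<<31 - 1 // low 31 bits
)

// encodeOneHit packs docNum and norm bits into one value and sets the flag.
func encodeOneHit(docNum, normBits uint64) uint64 {
	return oneHitFlag | (normBits&mask31)<<31 | (docNum & mask31)
}

// isOneHit tells a 1-hit encoding apart from a plain postings offset.
func isOneHit(v uint64) bool { return v&oneHitFlag != 0 }

// decodeOneHit reverses encodeOneHit.
func decodeOneHit(v uint64) (docNum, normBits uint64) {
	return v & mask31, (v >> 31) & mask31
}

func main() {
	v := encodeOneHit(42, 7)
	docNum, normBits := decodeOneHit(v)
	fmt.Println(isOneHit(v), docNum, normBits)
	fmt.Println(isOneHit(12345)) // a small postings offset has the flag clear
}
```

Under this assumed layout, postings offsets must stay below 2^63 and docNum and norm must each fit in 31 bits.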
Steve Yen
1e2bb14f13 added TestRoaringSizes() 2018-03-07 10:53:24 -08:00
Steve Yen
0ec4a1935a
Merge pull request #808 from steveyen/more-scorch-optimizing
err fix and more scorch optimizing
2018-03-07 10:39:20 -08:00
Abhinav Dangeti
06be1ad72e
Merge pull request #806 from abhinavdangeti/master
Fixing the scorch search request memory estimate
2018-03-07 10:11:24 -08:00
Steve Yen
59eb70d020 scorch zap remove unused chunkedIntCoder field 2018-03-07 09:11:10 -08:00
Steve Yen
79f28b7c93 scorch fix persistDocValues() err return 2018-03-07 09:11:10 -08:00
Steve Yen
8c0f402d4b scorch zap optimize processDocument() loc inner loop 2018-03-07 09:11:10 -08:00
Steve Yen
15242af465
Merge pull request #805 from steveyen/optimize-scorch-mem-processField
Optimize scorch processField() inner loop and writeRoaringWithLen()
2018-03-07 09:09:57 -08:00
Sreekanth Sivasankaran
e0369a3553
Merge branch 'master' into compaction_bytes_stats 2018-03-07 14:47:33 +05:30
Sreekanth Sivasankaran
2a9739ee1b naming change, interface removal 2018-03-07 14:43:33 +05:30
abhinavdangeti
5c721226cf Fixing the scorch search request memory estimate
Do not re-account for certain referenced data in the zap structures.

New estimates:

                                    ESTIMATE    BENCHMEM
TermQuery                           11396       12437
MatchQuery                          12244       12951
DisjunctionQuery (Term queries)     20644       20709
2018-03-06 16:03:10 -08:00
Steve Yen
8841d79d26 scorch optimize mem processField inner-loop 2018-03-06 15:26:54 -08:00
Steve Yen
dde6c2e01b scorch zap optimize writeRoaringWithLen()
Before this change, writeRoaringWithLen() would leverage a reused
bytes.Buffer (#A) and invoke the roaring.WriteTo() API.

But, it turns out the roaring.WriteTo() API has a suboptimal
implementation, in that underneath-the-hood it converts the roaring
bitmap to a byte buffer (using roaring.ToBytes()), and then calls
Write().  But, that Write() turns out to be an additional memcpy into
the provided bytes.Buffer (#A).

By directly invoking roaring.ToBytes(), this change to
writeRoaringWithLen() avoids the extra memory allocation and memcpy.
2018-03-06 14:59:20 -08:00
Steve Yen
b62ca996f6 scorch zap optimize chunkedIntCoder.Add() calls to use multiple vals
This change leverages the ability for the chunkedIntCoder.Add() method
to accept multiple input param values (via the '...' param signature),
meaning there are fewer Add() invocations.
2018-03-06 14:11:41 -08:00
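The variadic call style mentioned in the commit above can be illustrated with a toy coder. `intCoder` is a hypothetical stand-in, not the real chunkedIntCoder, and its `Add` signature is an assumption; it only shows that one call with several values produces the same bytes as several single-value calls.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// intCoder appends varint-encoded values to a buffer and counts Add calls.
type intCoder struct {
	buf   []byte
	calls int
}

// Add accepts any number of values via the '...' parameter.
func (c *intCoder) Add(vals ...uint64) {
	c.calls++
	var tmp [binary.MaxVarintLen64]byte
	for _, v := range vals {
		n := binary.PutUvarint(tmp[:], v)
		c.buf = append(c.buf, tmp[:n]...)
	}
}

func main() {
	var batched, single intCoder
	batched.Add(1, 2, 300) // one invocation
	single.Add(1)          // three invocations
	single.Add(2)
	single.Add(300)
	fmt.Println(string(batched.buf) == string(single.buf), batched.calls, single.calls)
}
```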
abhinavdangeti
38b6c522b0 Address build breakage after rebase
Removed attribute: iterator of type Posting
2018-03-06 14:00:54 -08:00
abhinavdangeti
7e36109b3c MB-28162: Provide API to estimate memory needed to run a search query
This API (unexported) will estimate the amount of memory needed to execute
a search query over an index before the collector begins data collection.

Sample estimates for certain queries:
{Size: 10, BenchmarkUpsidedownSearchOverhead}
                                                           ESTIMATE    BENCHMEM
TermQuery                                                  4616        4796
MatchQuery                                                 5210        5405
DisjunctionQuery (Match queries)                           7700        8447
DisjunctionQuery (Term queries)                            6514        6591
ConjunctionQuery (Match queries)                           7524        8175
Nested disjunction query (disjunction of disjunctions)     10306       10708
…
2018-03-06 13:53:42 -08:00
Steve Yen
5b86da85f3 scorch zap optimize postings itr with tf/loc reader/decoder reuse 2018-03-06 13:30:59 -08:00