Commit Graph

212 Commits

Author SHA1 Message Date
Sreekanth Sivasankaran
127aaa06bf
Merge 1ef41101ba into 7a98d75fc5 2018-03-28 20:51:54 +00:00
Steve Yen
596d990eb9 scorch zap optimize when zero hits
Instead of allocating brand-new empty postingsList/Iterator instances,
reuse some empty singletons.
2018-03-27 15:39:33 -07:00
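
The zero-hits optimization above can be illustrated with a minimal sketch. The types `postingsList` and `postingsIterator` and the helper `lookup` are hypothetical stand-ins, not the actual zap structs; the point is only that every miss returns the same preallocated empty value.

```go
package main

import "fmt"

// postingsList and postingsIterator are hypothetical stand-ins for the
// zap types named in the commit message; the real structs differ.
type postingsList struct {
	docNums []uint32
}

type postingsIterator struct {
	docNums []uint32
	pos     int
}

// Shared empty instances, handed out whenever a term has zero hits so
// that the miss path allocates nothing.
var (
	emptyPostingsList     = &postingsList{}
	emptyPostingsIterator = &postingsIterator{}
)

// lookup returns the postings list for term, or the shared empty
// singleton when the term is absent.
func lookup(dict map[string]*postingsList, term string) *postingsList {
	if pl, ok := dict[term]; ok {
		return pl
	}
	return emptyPostingsList
}

// iterator returns the shared empty iterator for an empty list.
func (pl *postingsList) iterator() *postingsIterator {
	if len(pl.docNums) == 0 {
		return emptyPostingsIterator
	}
	return &postingsIterator{docNums: pl.docNums}
}

// next returns the next doc number, or false when exhausted. On the
// empty singleton it never mutates state, so sharing it is safe.
func (it *postingsIterator) next() (uint32, bool) {
	if it.pos >= len(it.docNums) {
		return 0, false
	}
	d := it.docNums[it.pos]
	it.pos++
	return d, true
}

func main() {
	dict := map[string]*postingsList{"bleve": {docNums: []uint32{1, 4}}}
	a := lookup(dict, "missing")
	b := lookup(dict, "absent")
	// Both misses share the same list and the same iterator.
	fmt.Println(a == b, a.iterator() == b.iterator())
}
```

Sharing is only safe here because the empty iterator is never written to; a singleton that callers could mutate would need a different design.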
Sreekanth Sivasankaran
6c6c1419b5
Merge pull request #855 from blevesearch/tfr_advance
TermFieldReader Advance optimisation
2018-03-27 22:49:48 +05:30
Abhinav Dangeti
1fcfc0a5f1
Merge pull request #842 from abhinavdangeti/segment-tests
Unit tests for segments with docs with non-overlapping fields
2018-03-27 09:54:02 -07:00
Sreekanth Sivasankaran
db6a2c274f
adding nil check 2018-03-27 22:10:09 +05:30
Sreekanth Sivasankaran
72ac352961 TermFieldReader Advance optimization
Skips to the target segment and avoids
unnecessary reads of freq, loc, and norm details.
2018-03-27 20:18:16 +05:30
Steve Yen
1cab701f85 scorch zap postingsIter skips freq/norm/locs parsing if allowed
In this optimization, the zap PostingsIterator skips the parsing of
freq/norm/locs chunks based on the includeFreq|Norm|Locs flags.

In bleve-query microbenchmark on dev macbookpro, with 50K en-wiki
docs, on a medium frequency term search that does not ask for term
vectors, throughput was ~750 q/sec before the change and
went to ~1400 q/sec after the change.
2018-03-26 09:49:44 -07:00
Steve Yen
192621f402 scorch includeFreq/Norm/Locs params for postingsList.Iterator API
This commit adds boolean flag params to the scorch
PostingsList.Iterator() method, so that the caller can specify whether
freq/norm/locs information is needed or not.

Future changes can leverage these params for optimizations.
2018-03-26 09:49:44 -07:00
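
The two commits above (the include flags and the parsing skip that uses them) can be sketched together as follows. This is a simplified illustration, not the zap implementation: the struct layout, the `decoded` counter, and the uvarint pair encoding are assumptions, and location data is left out.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

type posting struct {
	docNum, freq, norm uint64
}

// postingsIterator is a hypothetical, simplified iterator. freqNorm
// holds uvarint-encoded (freq, norm) pairs, one pair per document.
type postingsIterator struct {
	docNums     []uint64
	freqNorm    []byte
	includeFreq bool
	includeNorm bool
	pos, off    int
	decoded     int // uvarints parsed so far, for illustration only
}

// next returns the next posting. When the caller asked for neither
// freq nor norm, the freq/norm bytes are never touched.
func (it *postingsIterator) next() (posting, bool) {
	if it.pos >= len(it.docNums) {
		return posting{}, false
	}
	p := posting{docNum: it.docNums[it.pos]}
	it.pos++
	if !it.includeFreq && !it.includeNorm {
		return p, true // skip parsing entirely
	}
	freq, n := binary.Uvarint(it.freqNorm[it.off:])
	it.off += n
	norm, m := binary.Uvarint(it.freqNorm[it.off:])
	it.off += m
	it.decoded += 2
	if it.includeFreq {
		p.freq = freq
	}
	if it.includeNorm {
		p.norm = norm
	}
	return p, true
}

// appendUvarints appends each value to dst in uvarint form.
func appendUvarints(dst []byte, vals ...uint64) []byte {
	var tmp [binary.MaxVarintLen64]byte
	for _, v := range vals {
		n := binary.PutUvarint(tmp[:], v)
		dst = append(dst, tmp[:n]...)
	}
	return dst
}

func main() {
	data := appendUvarints(nil, 3, 10, 1, 20) // (freq, norm) for two docs
	for _, want := range []bool{true, false} {
		it := &postingsIterator{
			docNums:     []uint64{5, 9},
			freqNorm:    data,
			includeFreq: want,
			includeNorm: want,
		}
		for {
			if _, ok := it.next(); !ok {
				break
			}
		}
		fmt.Println("include:", want, "uvarints decoded:", it.decoded)
	}
}
```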
Steve Yen
fc7584f5a0 scorch zap prealloc extra locs for future growth 2018-03-26 09:49:44 -07:00
Steve Yen
3f4b161850 scorch zap postingsIter reuses array positions slice 2018-03-26 09:49:44 -07:00
Steve Yen
db792717a6 scorch zap postingsIter reuses nextLocs/nextSegmentLocs
The previous code would inefficiently throw away the nextLocs and
would also throw away the []segment.Location slice if there were no
locations, such as if it was a 1-hit postings list.

This change tries to reuse the nextLocs/nextSegmentLocs for all cases.
2018-03-26 09:49:44 -07:00
Steve Yen
6540b197d4 scorch zap provide full buffer capacity to snappy Encode/Decode()
The snappy Encode/Decode() APIs accept an optional destination buffer
param where their encoded/decoded output results will be placed, but
they only check that the buffer has enough len() rather than enough
capacity before deciding to allocate a new buffer.  This change
therefore passes the buffer resliced to its full capacity.
2018-03-26 09:49:44 -07:00
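
The len() versus capacity pitfall described above can be shown without the snappy library itself. In this sketch `encodeInto` is a hypothetical stand-in that follows the same destination-buffer convention (reuse dst only when len(dst) is large enough); it merely copies bytes and does no compression.

```go
package main

import "fmt"

// encodeInto is a hypothetical stand-in for an encoder that reuses dst
// only if len(dst) is large enough, ignoring spare capacity. It copies
// src and reports whether the caller's buffer was reused.
func encodeInto(dst, src []byte) (out []byte, reused bool) {
	if len(dst) < len(src) {
		dst = make([]byte, len(src)) // caller's buffer is ignored
	} else {
		reused = true
	}
	n := copy(dst, src)
	return dst[:n], reused
}

func main() {
	buf := make([]byte, 0, 64) // plenty of capacity, zero length
	src := []byte("hello")

	// Passing the buffer as-is: len is 0, so a new buffer is allocated.
	_, reused := encodeInto(buf, src)
	fmt.Println("as-is reused:", reused)

	// Reslicing to full capacity lets the length check succeed.
	_, reused = encodeInto(buf[:cap(buf)], src)
	fmt.Println("full-capacity reused:", reused)
}
```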
Steve Yen
84424edcad scorch zap sync.Pool for reusable VisitDocument() data structures
As part of this, snappy.Decode() is also provided a reused buffer for
decompression.
2018-03-26 09:49:44 -07:00
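
A rough sketch of pooling per-call scratch state in the way the commit above describes. The `visitData` struct and the `visit` function are hypothetical, and a plain copy stands in for decompression.

```go
package main

import (
	"fmt"
	"sync"
)

// visitData is a hypothetical bundle of scratch state reused across
// calls; the real VisitDocument() state holds more than one buffer.
type visitData struct {
	buf []byte
}

var visitPool = sync.Pool{
	New: func() interface{} { return &visitData{} },
}

// visit copies stored into a pooled scratch buffer (the copy stands in
// for decompression) and hands the result to fn. fn must not retain
// the slice, because the buffer goes back to the pool afterwards.
func visit(stored []byte, fn func([]byte)) {
	vd := visitPool.Get().(*visitData)
	defer visitPool.Put(vd)

	if cap(vd.buf) < len(stored) {
		vd.buf = make([]byte, len(stored))
	}
	vd.buf = vd.buf[:len(stored)]
	copy(vd.buf, stored)
	fn(vd.buf)
}

func main() {
	visit([]byte("stored field"), func(b []byte) {
		fmt.Println(string(b))
	})
}
```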
Steve Yen
ba644f3893 scorch zap fix postingsIter.nextBytes() when 1-bit encoded
The previous commit's optimization that replaced the locsBitmap was
incorrectly handling the case when there was a 1-bit encoding
optimization in the postingsIterator.nextBytes() method,
incorrectly generating the freq-norm bytes.

Also as part of this change, more unused locsBitmap's were removed.
2018-03-26 09:19:00 -07:00
Steve Yen
7a19e6fd7e scorch zap replace locsBitmap w/ 1 bit from freq-norm varint encoding
This is attempt #2 of the optimization that replaces the locsBitmap,
without any changes from the original commit attempt.  A commit that
follows this one contains the actual fix.

See also...
- commit 621b58dd83 (the 1st attempt)
- commit 49a4ee60ba (the revert)

-------------
The original commit message body from 621b58 was...

NOTE: this is a zap file format change.

The separate "postings locations" roaring Bitmap that encoded whether
a posting has locations info is now replaced by the least significant
bit in the freq varint encoded in the freq-norm chunkedIntCoder.

encode/decodeFreqHasLocs() are added as helper functions.
2018-03-23 12:50:24 -07:00
Steve Yen
49a4ee60ba Revert "scorch zap replace locsBitmap w/ 1 bit from freq-norm varint encoding"
Testing with the cbft application led to cbft process exits...

  AsyncError exit()... error reading location field: EOF --
  main.initBleveOptions.func1() at init_bleve.go:85

This reverts commit 621b58dd83.
2018-03-23 10:01:30 -07:00
Steve Yen
621b58dd83 scorch zap replace locsBitmap w/ 1 bit from freq-norm varint encoding
NOTE: this is a zap file format change.

The separate "postings locations" roaring Bitmap that encoded whether
a posting has locations info is now replaced by the least significant
bit in the freq varint encoded in the freq-norm chunkedIntCoder.

encode/decodeFreqHasLocs() are added as helper functions.
2018-03-22 17:43:07 -07:00
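
The encoding described above packs a has-locations flag into the least significant bit of the freq value. The helper names come from the commit message; the bodies below are a minimal sketch of that idea and are not copied from the zap source.

```go
package main

import "fmt"

// encodeFreqHasLocs shifts freq left by one bit and stores the
// has-locations flag in the freed least significant bit.
func encodeFreqHasLocs(freq uint64, hasLocs bool) uint64 {
	rv := freq << 1
	if hasLocs {
		rv |= 1
	}
	return rv
}

// decodeFreqHasLocs reverses encodeFreqHasLocs.
func decodeFreqHasLocs(freqHasLocs uint64) (freq uint64, hasLocs bool) {
	return freqHasLocs >> 1, freqHasLocs&1 != 0
}

func main() {
	enc := encodeFreqHasLocs(3, true)
	freq, hasLocs := decodeFreqHasLocs(enc)
	fmt.Println(enc, freq, hasLocs) // 7 3 true
}
```

One consequence of this layout is that the top bit of freq is lost, which is harmless for realistic term frequencies.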
Steve Yen
b506fae4f7 scorch zap postingsItr remove unused offset/locoffset fields 2018-03-21 18:00:14 -07:00
Steve Yen
d1e2b55c72 scorch zap postingsItr.nextDocNum() maintains allNChunk correctly
When PostingsIterator.nextDocNum() moved the 'all' roaring bitmap
iterator forwards, it failed to keep the allNChunk value
aligned.
2018-03-21 17:57:54 -07:00
Abhinav Dangeti
ae27aa2f14
Merge pull request #848 from abhinavdangeti/curr
Getting rid of panics added for debugging MB-28719, MB-28781
2018-03-20 15:14:22 -07:00
abhinavdangeti
0e3c57c465 Revert "scorch zap getField() which panics if the field is unknown"
This reverts commit 85b4a31e2a.
2018-03-20 14:51:33 -07:00
abhinavdangeti
844845b5d2 Revert "scorch zap panic if mergeFields() sees unsorted fields"
This reverts commit 2f4d3d8587.
2018-03-20 14:51:25 -07:00
Marty Schoch
35ea1d4423 fix MB-28719 and MB-28781 invalid/missing field in scorch
Use of sync.Pool to reuse the interim structure relied on resetting
the fieldsInv slice.  However, actual segments continued to use
this same fieldsInv slice after returning it to the pool.  The simple
fix is to nil out the fieldsInv slice in the reset method and let the
newly built segment keep the one from the interim struct.
2018-03-20 17:41:56 -04:00
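
The aliasing bug described above can be reproduced in a few lines. The `interim` and `segment` types, `build`, and the two reset methods are hypothetical simplifications; the pool itself is omitted because reusing one struct twice is enough to show the effect.

```go
package main

import "fmt"

// interim is a hypothetical reusable builder; segment is what it builds.
type interim struct {
	fieldsInv []string
}

type segment struct {
	fieldsInv []string
}

// build appends the field names and hands the same slice to the segment.
func build(s *interim, fields ...string) *segment {
	s.fieldsInv = append(s.fieldsInv, fields...)
	return &segment{fieldsInv: s.fieldsInv}
}

// resetBuggy truncates the slice but keeps its backing array, which an
// already built segment still points at.
func (s *interim) resetBuggy() { s.fieldsInv = s.fieldsInv[:0] }

// resetFixed drops the slice, so the segment becomes its sole owner.
func (s *interim) resetFixed() { s.fieldsInv = nil }

func main() {
	s := &interim{}
	seg := build(s, "_id", "title")
	s.resetBuggy()
	build(s, "_id", "body") // overwrites seg's backing array
	fmt.Println("buggy reset:", seg.fieldsInv)

	s = &interim{}
	seg = build(s, "_id", "title")
	s.resetFixed()
	build(s, "_id", "body") // allocates a fresh array
	fmt.Println("fixed reset:", seg.fieldsInv)
}
```

The fixed variant gives up reuse of that one slice in exchange for correctness, which matches the trade-off the commit message describes.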
Steve Yen
2f4d3d8587 scorch zap panic if mergeFields() sees unsorted fields
mergeFields depends on the fields from the various segments being
sorted for the fieldsSame comparison to work.

Of note, the 'fieldi > 1' guard skips the 0th field, which should
always be the '_id' field.
2018-03-20 11:17:46 -07:00
Steve Yen
85b4a31e2a scorch zap getField() which panics if the field is unknown 2018-03-20 11:12:18 -07:00
abhinavdangeti
85df86ba17 Unit tests for segments with docs with non-overlapping fields 2018-03-19 12:37:50 -07:00
Steve Yen
f65ba5c0f4 MB-28781 - scorch zap merge freq/loc copying only when fieldsSame
The optimization recently introduced in commit 530a3d24cf,
("scorch zap optimize merge by byte copying freq/norm/loc's") was to
byte-copy freq/norm/loc data directly during merging.  But, it was
incorrect if the fields were different across segments.

This change now performs that byte-copying merging optimization only
when the fields are the same across segments, and if not, leverages
the old approach of deserializing & re-serializing the freq/norm/loc
information, which has the important step of remapping fieldID's.

See also: https://issues.couchbase.com/browse/MB-28781
2018-03-19 11:26:51 -07:00
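
Why byte-copying is only safe when the fields match can be sketched as below. The `location` type, `fieldsSame` body, and `mergeLocations` are illustrative assumptions; a slice copy stands in for the byte-copy fast path, and the real merger works on encoded freq/norm/loc chunks.

```go
package main

import "fmt"

// location is a hypothetical decoded location entry. fieldID indexes
// into the field list of the segment that owns the entry.
type location struct {
	fieldID uint16
	pos     uint64
}

// fieldsSame reports whether two sorted field lists are identical.
func fieldsSame(a, b []string) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

// mergeLocations copies entries unchanged when the segment and the
// merged result use the same field numbering, and otherwise rewrites
// each fieldID to its number in the merged field list.
func mergeLocations(locs []location, segFields, mergedFields []string) []location {
	out := make([]location, len(locs))
	if fieldsSame(segFields, mergedFields) {
		copy(out, locs) // fast path: IDs already agree
		return out
	}
	newID := make(map[string]uint16, len(mergedFields))
	for i, name := range mergedFields {
		newID[name] = uint16(i)
	}
	for i, l := range locs {
		out[i] = location{fieldID: newID[segFields[l.fieldID]], pos: l.pos}
	}
	return out
}

func main() {
	merged := []string{"_id", "body", "title"}
	seg := []string{"_id", "title"} // "title" is field 1 here, 2 after merge
	locs := []location{{fieldID: 1, pos: 7}}
	fmt.Println(mergeLocations(locs, seg, merged)) // [{2 7}]
}
```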
Steve Yen
c881146270 scorch zap mergeTermFreqNormLocsByCopying() helper func 2018-03-19 10:36:23 -07:00
Sreekanth Sivasankaran
1ef41101ba vellum adoption for regex and fuzzy queries 2018-03-19 17:29:29 +05:30
Steve Yen
5df53c8e1f scorch zap file merger uses 1MB buffered writer
pprof of bleve-blast was showing file merging was in syscall/write a
lot.  The bufio.NewWriter() provides a default buffer size of 4K,
which is too small, and using bufio.NewWriterSize(1MB buffer size)
leads to syscall/write dropping out of the file merging flame graphs.
2018-03-16 11:49:53 -07:00
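
The effect of the larger buffer can be seen by counting writes that reach the underlying writer. `countingWriter` and `writeChunks` are hypothetical helpers; the counter stands in for the file and its write syscalls.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
)

// countingWriter stands in for a file: each Write call here would be
// one write syscall on a real *os.File.
type countingWriter struct {
	calls, bytes int
}

func (c *countingWriter) Write(p []byte) (int, error) {
	c.calls++
	c.bytes += len(p)
	return len(p), nil
}

// writeChunks writes n chunks of 100 bytes through a bufio.Writer with
// the given buffer size, then flushes.
func writeChunks(w io.Writer, bufSize, n int) error {
	bw := bufio.NewWriterSize(w, bufSize)
	chunk := make([]byte, 100)
	for i := 0; i < n; i++ {
		if _, err := bw.Write(chunk); err != nil {
			return err
		}
	}
	return bw.Flush()
}

func main() {
	small := &countingWriter{}
	large := &countingWriter{}
	// 4096 is the default size used by bufio.NewWriter.
	if err := writeChunks(small, 4096, 10000); err != nil {
		panic(err)
	}
	if err := writeChunks(large, 1<<20, 10000); err != nil {
		panic(err)
	}
	fmt.Println("4K buffer writes:", small.calls)
	fmt.Println("1MB buffer writes:", large.calls)
}
```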
Steve Yen
b411e65234 scorch zap optimize postingsIterator reuse of freq/locChunkOffsets 2018-03-16 11:22:50 -07:00
Steve Yen
e52eb84e37 scorch zap optimize merge when deletion bitmap is empty
This change detects whether a deletion bitmap is empty, and treats
that as a nil bitmap, which allows further postings iterator codepaths
to avoid roaring bitmap operations (like, AndNot(docNums, drops)).
2018-03-16 11:22:50 -07:00
Steve Yen
5411d9ae4f
Merge pull request #826 from steveyen/scorch-estimate-buf-size
estimate interim buffer size based on previous results
2018-03-16 11:22:42 -07:00
Marty Schoch
f1c26e29f0
Merge branch 'master' into avoid-app-herder-hot-lock 2018-03-16 10:30:34 -04:00
Sreekanth Sivasankaran
53c3cab512
Merge branch 'master' into minor_docvalue_space_savings 2018-03-16 08:53:57 +05:30
Sreekanth Sivasankaran
23cebae5a8
Merge pull request #815 from blevesearch/loadchunk_minor
minor optimisation to loadChunk method
2018-03-16 08:15:37 +05:30
Marty Schoch
45e0e5c666 memoize the size of an entire index snapshot
By memoizing the size of index snapshots and their
constituent parts, we significantly reduce the amount
of time that the lock is held in the app_herder when
calculating the total memory used.
2018-03-15 17:25:05 -04:00
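
A minimal sketch of the memoization idea, assuming snapshots are immutable once built. All type and function names here (`segmentSnapshot`, `indexSnapshot`, `herder`, `totalMemory`) are hypothetical; the point is that the locked section only sums precomputed integers.

```go
package main

import (
	"fmt"
	"sync"
)

// segmentSnapshot records its size once, at construction time.
type segmentSnapshot struct {
	data []byte
	size int
}

func newSegmentSnapshot(data []byte) *segmentSnapshot {
	return &segmentSnapshot{data: data, size: len(data)}
}

// indexSnapshot sums its segments' sizes once and caches the total.
type indexSnapshot struct {
	segments []*segmentSnapshot
	size     int
}

func newIndexSnapshot(segs []*segmentSnapshot) *indexSnapshot {
	is := &indexSnapshot{segments: segs}
	for _, s := range segs {
		is.size += s.size
	}
	return is
}

// Size returns the cached total without walking the segments.
func (is *indexSnapshot) Size() int { return is.size }

// herder is a hypothetical tracker of live snapshots.
type herder struct {
	mu        sync.Mutex
	snapshots []*indexSnapshot
}

// totalMemory holds the lock only long enough to add cached sizes.
func (h *herder) totalMemory() int {
	h.mu.Lock()
	defer h.mu.Unlock()
	total := 0
	for _, s := range h.snapshots {
		total += s.Size()
	}
	return total
}

func main() {
	a := newIndexSnapshot([]*segmentSnapshot{
		newSegmentSnapshot(make([]byte, 10)),
		newSegmentSnapshot(make([]byte, 20)),
	})
	b := newIndexSnapshot([]*segmentSnapshot{newSegmentSnapshot(make([]byte, 5))})
	h := &herder{snapshots: []*indexSnapshot{a, b}}
	fmt.Println(h.totalMemory()) // 35
}
```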
Sreekanth Sivasankaran
d1155c223a zap version bump, changed the offset slice format,
unit tests
2018-03-15 23:25:53 +05:30
Sreekanth Sivasankaran
1775602958 posting iterator array positions cleanup,
max segment size limit adjustment for the 1-hit
optimisation
2018-03-15 14:40:00 +05:30
Sreekanth Sivasankaran
441065a41b comments, simplification 2018-03-15 13:11:29 +05:30
Steve Yen
4af65a7846 scorch zap prealloc buf via estimate from previous interim work 2018-03-14 09:32:14 -07:00
Steve Yen
7578ff7cb8 scorch zap optimize interim's reuse of vellum builders
Since interim structs are now sync.Pool'ed, we can now also hold onto
and reuse the associated vellum builder.
2018-03-14 07:49:28 -07:00
Sreekanth Sivasankaran
19318194fa moving to new offset slice format 2018-03-13 14:06:48 +05:30
Sreekanth Sivasankaran
5271b582bb Merge branch 'master' of https://github.com/blevesearch/bleve into loadchunk_minor 2018-03-13 11:59:29 +05:30
Steve Yen
dbfc5e9130 scorch zap reuse interim freq/norm/loc slices 2018-03-12 10:04:11 -07:00
Steve Yen
07901910e2 scorch zap reuse roaring Bitmap in prepareDicts() slice growth
In this change, if the postings/postingsLocs slices need to be grown,
then copy over and reuse any of the preallocated roaring Bitmap's from
the old slice.
2018-03-12 09:19:38 -07:00
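
The slice-growth reuse described above might look roughly like this. The `bitmap` type is a hypothetical stand-in for a roaring Bitmap and `growBitmaps` is an illustrative helper, not the prepareDicts() code.

```go
package main

import "fmt"

// bitmap is a hypothetical stand-in for a roaring Bitmap.
type bitmap struct {
	bits []uint64
}

// growBitmaps returns a slice of length n. Bitmaps already allocated
// in old are carried over by pointer; only the new tail is allocated.
// Callers are expected to clear a carried-over bitmap before reuse.
func growBitmaps(old []*bitmap, n int) []*bitmap {
	if n <= len(old) {
		return old[:n]
	}
	rv := make([]*bitmap, n)
	copy(rv, old) // reuse the preallocated bitmaps
	for i := len(old); i < n; i++ {
		rv[i] = &bitmap{}
	}
	return rv
}

func main() {
	old := growBitmaps(nil, 2)
	grown := growBitmaps(old, 4)
	fmt.Println(grown[0] == old[0], grown[1] == old[1], len(grown)) // true true 4
}
```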
Steve Yen
b1f3969521 scorch zap reuse roaring Bitmap in postings lists 2018-03-12 09:18:11 -07:00
Steve Yen
cad88096ca scorch zap reuse roaring Bitmap during merge 2018-03-12 09:17:37 -07:00
Steve Yen
c4ceffe584 scorch zap sync Pool for interim data 2018-03-12 09:17:37 -07:00
Steve Yen
531800c479 scorch zap use roaring Add() instead of AddInt()
This change invokes Add() directly as AddInt() is a convenience
wrapper around Add().
2018-03-12 09:17:37 -07:00