
1887 Commits

Author SHA1 Message Date
Steve Yen 7a98d75fc5
Merge pull request #860 from steveyen/optimize-docInternalToNumber
optimize docInternalToNumber() to avoid allocations
2018-03-28 10:21:30 -07:00
Steve Yen b955bdcd72 scorch optimize docInternalToNumber() to avoid allocations
docInternalToNumber() no longer allocates a reader instance and a
heap uint64 to hold the result.
2018-03-28 10:08:21 -07:00
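A minimal sketch of the allocation-free idea described in this commit, assuming the internal ID is an 8-byte big-endian value. The function name `docInternalToNumberSketch` and the fixed-width layout are assumptions for illustration, not the actual bleve code.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// docInternalToNumberSketch decodes an 8-byte big-endian ID straight from the
// slice, so no reader instance or heap-held uint64 is needed.
// Hypothetical name and layout, used only for illustration.
func docInternalToNumberSketch(in []byte) (uint64, error) {
	if len(in) != 8 {
		return 0, errors.New("internal ID must be exactly 8 bytes")
	}
	return binary.BigEndian.Uint64(in), nil
}

func main() {
	id := make([]byte, 8)
	binary.BigEndian.PutUint64(id, 42)
	n, err := docInternalToNumberSketch(id)
	fmt.Println(n, err)
}
```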
Steve Yen fd07fdb862
Merge pull request #859 from steveyen/optimize-when-zero-hits
optimizations in the case of zero hits
2018-03-27 16:04:03 -07:00
Steve Yen 013d06d756 scorch TermFieldReader() reuses string(term) 2018-03-27 15:39:33 -07:00
Steve Yen 596d990eb9 scorch zap optimize when zero hits
Instead of allocating brand-new empty postingsList/Iterator instances,
reuse some empty singletons.
2018-03-27 15:39:33 -07:00
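The empty-singleton reuse described above could look roughly like the following sketch. The `postingsList` type, `emptyPostingsList` variable, and `lookup` function are hypothetical stand-ins, not the zap types themselves.

```go
package main

import "fmt"

// postingsList is a hypothetical stand-in for a segment's postings list.
type postingsList struct {
	docNums []uint64
}

// Count never mutates the receiver, so the shared empty instance stays safe.
func (p *postingsList) Count() int { return len(p.docNums) }

// emptyPostingsList is shared by every lookup that finds nothing.
var emptyPostingsList = &postingsList{}

// lookup returns the shared empty singleton on a miss instead of allocating
// a brand-new empty list.
func lookup(dict map[string][]uint64, term string) *postingsList {
	docNums, ok := dict[term]
	if !ok || len(docNums) == 0 {
		return emptyPostingsList
	}
	return &postingsList{docNums: docNums}
}

func main() {
	dict := map[string][]uint64{"bleve": {1, 7}}
	fmt.Println(lookup(dict, "bleve").Count(), lookup(dict, "missing").Count())
}
```

The singleton must be treated as read-only by callers; any method that mutates a postings list would need to copy first.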
Sreekanth Sivasankaran 6c6c1419b5
Merge pull request #855 from blevesearch/tfr_advance
TermFieldReader Advance optimisation
2018-03-27 22:49:48 +05:30
Abhinav Dangeti 1fcfc0a5f1
Merge pull request #842 from abhinavdangeti/segment-tests
Unit tests for segments with docs with non-overlapping fields
2018-03-27 09:54:02 -07:00
Sreekanth Sivasankaran db6a2c274f
adding nil check 2018-03-27 22:10:09 +05:30
Sreekanth Sivasankaran 72ac352961 TermFieldReader Advance optimization
skips to the target segment and avoids
unnecessary reads of freq,loc,norm details
2018-03-27 20:18:16 +05:30
Steve Yen e9ca76be78
Merge pull request #850 from steveyen/more-reuse-optimizations
More buffer & slice reuse optimizations
2018-03-26 13:07:46 -07:00
Steve Yen 1cab701f85 scorch zap postingsIter skips freq/norm/locs parsing if allowed
In this optimization, the zap PostingsIterator skips the parsing of
freq/norm/locs chunks based on the includeFreq|Norm|Locs flags.

In bleve-query microbenchmark on dev macbookpro, with 50K en-wiki
docs, on a medium frequency term search that does not ask for term
vectors, throughput was ~750 q/sec before the change and
went to ~1400 q/sec after the change.
2018-03-26 09:49:44 -07:00
Steve Yen 192621f402 scorch includeFreq/Norm/Locs params for postingsList.Iterator API
This commit adds boolean flag params to the scorch
PostingsList.Iterator() method, so that the caller can specify whether
freq/norm/locs information is needed or not.

Future changes can leverage these params for optimizations.
2018-03-26 09:49:44 -07:00
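As an illustration of how such flags let an iterator skip work, here is a small sketch. The `rawPosting`, `posting`, and `iterate` names are hypothetical, and only two of the three flags are modelled; it is not the scorch PostingsList.Iterator() signature.

```go
package main

import "fmt"

// rawPosting is a hypothetical stored posting whose details cost work to copy out.
type rawPosting struct {
	docNum uint64
	freq   uint64
	locs   []uint64
}

// posting is what the iterator hands back to the caller.
type posting struct {
	docNum uint64
	freq   uint64
	locs   []uint64
}

// iterate visits every posting but fills in only the details the caller asked for.
func iterate(raw []rawPosting, includeFreq, includeLocs bool) []posting {
	out := make([]posting, 0, len(raw))
	for _, r := range raw {
		p := posting{docNum: r.docNum}
		if includeFreq {
			p.freq = r.freq
		}
		if includeLocs {
			p.locs = append([]uint64(nil), r.locs...)
		}
		out = append(out, p)
	}
	return out
}

func main() {
	raw := []rawPosting{{docNum: 5, freq: 3, locs: []uint64{10, 20}}}
	fmt.Println(iterate(raw, false, false))
	fmt.Println(iterate(raw, true, true))
}
```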
Steve Yen fc7584f5a0 scorch zap prealloc extra locs for future growth 2018-03-26 09:49:44 -07:00
Steve Yen 3f4b161850 scorch zap postingsIter reuses array positions slice 2018-03-26 09:49:44 -07:00
Steve Yen db792717a6 scorch zap postingsIter reuses nextLocs/nextSegmentLocs
The previous code would inefficiently throw away the nextLocs and
would also throw away the []segment.Location slice if there were no
locations, such as if it was a 1-hit postings list.

This change tries to reuse the nextLocs/nextSegmentLocs for all cases.
2018-03-26 09:49:44 -07:00
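The reuse pattern described here is commonly written in Go by truncating a slice to zero length and appending into it. The sketch below uses a hypothetical `locIter` type rather than the zap PostingsIterator.

```go
package main

import "fmt"

// locIter is a hypothetical iterator that keeps one scratch slice across calls.
type locIter struct {
	nextLocs []uint64
}

// load refills the scratch slice in place; truncating to [:0] keeps the
// backing array, so no allocation happens while capacity suffices.
func (it *locIter) load(src []uint64) []uint64 {
	it.nextLocs = append(it.nextLocs[:0], src...)
	return it.nextLocs
}

func main() {
	it := &locIter{}
	it.load([]uint64{1, 2, 3, 4})
	it.load(nil) // a posting with no locations keeps the buffer
	fmt.Println(it.load([]uint64{9, 8}), cap(it.nextLocs) >= 4)
}
```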
Steve Yen 6540b197d4 scorch zap provide full buffer capacity to snappy Encode/Decode()
The snappy Encode/Decode() APIs accept an optional destination buffer
param where their encoded/decoded output results will be placed, but
they only check that the buffer has enough len() rather than enough
capacity before deciding to allocate a new buffer.
2018-03-26 09:49:44 -07:00
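To show why passing the full capacity matters, the sketch below uses `copyInto`, a hypothetical stand-in that mimics a len()-only check on the destination buffer. It is not the snappy library itself.

```go
package main

import "fmt"

// copyInto is a stand-in for an API that reuses dst only when len(dst) is
// large enough, ignoring spare capacity. Hypothetical, for illustration.
func copyInto(dst, src []byte) []byte {
	if len(dst) < len(src) {
		dst = make([]byte, len(src))
	}
	n := copy(dst, src)
	return dst[:n]
}

func main() {
	buf := make([]byte, 0, 64) // zero length, plenty of capacity
	src := []byte("hello")

	a := copyInto(buf, src)            // len(buf) == 0, so a new buffer is allocated
	b := copyInto(buf[:cap(buf)], src) // full capacity exposed, so buf is reused

	fmt.Println(&a[0] == &buf[:1][0], &b[0] == &buf[:1][0])
}
```

Reslicing with `buf[:cap(buf)]` costs nothing and lets the callee see all of the space that was already allocated.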
Steve Yen 84424edcad scorch zap sync.Pool for reusable VisitDocument() data structures
As part of this, snappy.Decode() is also provided a reused buffer for
decompression.
2018-03-26 09:49:44 -07:00
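A rough sketch of pooling per-call scratch structures with sync.Pool follows. The `visitData` type and `visit` function are hypothetical, and a plain copy stands in for the decompression step.

```go
package main

import (
	"fmt"
	"sync"
)

// visitData is a hypothetical bundle of scratch state needed per call.
type visitData struct {
	buf []byte
}

// visitPool hands out reusable visitData values.
var visitPool = sync.Pool{
	New: func() interface{} { return &visitData{} },
}

// visit borrows scratch state, uses it, and returns it to the pool.
// The copy into vd.buf stands in for decompressing into a reused buffer.
func visit(stored []byte) int {
	vd := visitPool.Get().(*visitData)
	defer visitPool.Put(vd)

	vd.buf = append(vd.buf[:0], stored...)
	return len(vd.buf)
}

func main() {
	fmt.Println(visit([]byte("abc")), visit([]byte("abcdef")))
}
```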
Steve Yen 33b1f065dc
Merge pull request #857 from steveyen/replace-locsBitmap-attempt2
optimization to replace locations bitmap, attempt #2
2018-03-26 09:49:17 -07:00
Steve Yen ba644f3893 scorch zap fix postingsIter.nextBytes() when 1-bit encoded
The previous commit's optimization that replaced the locsBitmap was
incorrectly handling the case when there was a 1-bit encoding
optimization in the postingsIterator.nextBytes() method,
incorrectly generating the freq-norm bytes.

Also as part of this change, more unused locsBitmap's were removed.
2018-03-26 09:19:00 -07:00
Steve Yen 7a19e6fd7e scorch zap replace locsBitmap w/ 1 bit from freq-norm varint encoding
This is attempt #2 of the optimization that replaces the locsBitmap,
without any changes from the original commit attempt.  A commit that
follows this one contains the actual fix.

See also...
- commit 621b58dd83 (the 1st attempt)
- commit 49a4ee60ba (the revert)

-------------
The original commit message body from 621b58 was...

NOTE: this is a zap file format change.

The separate "postings locations" roaring Bitmap that encoded whether
a posting has locations info is now replaced by the least significant
bit in the freq varint encoded in the freq-norm chunkedIntCoder.

encode/decodeFreqHasLocs() are added as helper functions.
2018-03-23 12:50:24 -07:00
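One plausible shape for the helpers named above is sketched here, assuming the flag sits in the least significant bit and the frequency is shifted left by one. The function names come from the commit message; the bodies and the varint round trip are assumptions, not the zap source.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// encodeFreqHasLocs packs the hasLocs flag into the least significant bit.
// Assumed layout: (freq << 1) | flag.
func encodeFreqHasLocs(freq uint64, hasLocs bool) uint64 {
	rv := freq << 1
	if hasLocs {
		rv |= 1
	}
	return rv
}

// decodeFreqHasLocs reverses encodeFreqHasLocs.
func decodeFreqHasLocs(v uint64) (uint64, bool) {
	return v >> 1, v&1 != 0
}

func main() {
	// Round-trip through a varint, as a freq-norm stream might store it.
	buf := make([]byte, binary.MaxVarintLen64)
	n := binary.PutUvarint(buf, encodeFreqHasLocs(3, true))
	v, _ := binary.Uvarint(buf[:n])
	freq, hasLocs := decodeFreqHasLocs(v)
	fmt.Println(freq, hasLocs)
}
```

Under this layout the flag costs one bit per posting and no separate bitmap lookup, but a reader that expects the old layout would misread every frequency, which is why the message flags a file format change.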
Steve Yen 1f7faf7e01
Merge pull request #856 from steveyen/revert-locsBitmap-replacement
Revert "scorch zap replace locsBitmap w/ 1 bit from freq-norm varint …
2018-03-23 11:20:45 -07:00
Steve Yen 49a4ee60ba Revert "scorch zap replace locsBitmap w/ 1 bit from freq-norm varint encoding"
Testing with the cbft application led to cbft process exits...

  AsyncError exit()... error reading location field: EOF --
  main.initBleveOptions.func1() at init_bleve.go:85

This reverts commit 621b58dd83.
2018-03-23 10:01:30 -07:00
Steve Yen c6df65286c
Merge pull request #854 from steveyen/replace-locsBitmap
replace locs bitmap with 1 bit from freq-norm varint encoding
2018-03-22 19:39:32 -07:00
Abhinav Dangeti 2384c41098
Merge pull request #851 from abhinavdangeti/master
MB-28782: Error handling in merger/persister when index is closed
2018-03-22 18:07:11 -07:00
Steve Yen 67f75005c4 fix cmd/bleve help string for internal command 2018-03-22 17:43:07 -07:00
Steve Yen 621b58dd83 scorch zap replace locsBitmap w/ 1 bit from freq-norm varint encoding
NOTE: this is a zap file format change.

The separate "postings locations" roaring Bitmap that encoded whether
a posting has locations info is now replaced by the least significant
bit in the freq varint encoded in the freq-norm chunkedIntCoder.

encode/decodeFreqHasLocs() are added as helper functions.
2018-03-22 17:43:07 -07:00
abhinavdangeti 18cfcd11d1 MB-28782: Error handling in merger/persister when index is closed
When the index is closed, do not fire an AsyncError (fatal) from either
the merger or the persister that is actively working. This is quite a
probable situation, so exit the loop within the goroutine.
2018-03-22 14:29:59 -07:00
Steve Yen a7c4237d00
Merge pull request #852 from steveyen/scorch-zap-postingsIterator-allNChunk-bug
PostingsIterator.nextDocNum() maintains allNChunk correctly
2018-03-22 13:26:14 -07:00
Steve Yen 7e32a35af5
Merge pull request #853 from steveyen/scorch-cmd-ascii-help-fix
fix cmd/bleve scorch ascii cmd help text
2018-03-22 11:00:49 -07:00
Steve Yen 6b78dd4184 fix cmd/bleve scorch ascii cmd help text
Initially, there was a typo with an extra space char, but then I
realized there were also some copy-paste errors to correct.
2018-03-22 06:48:42 -07:00
Steve Yen b506fae4f7 scorch zap postingsItr remove unused offset/locoffset fields 2018-03-21 18:00:14 -07:00
Steve Yen d1e2b55c72 scorch zap postingsItr.nextDocNum() maintains allNChunk correctly
When PostingsIterator.nextDocNum() moved the 'all' roaring bitmap
iterator forwards, it failed to keep the allNChunk value
aligned.
2018-03-21 17:57:54 -07:00
Abhinav Dangeti ae27aa2f14
Merge pull request #848 from abhinavdangeti/curr
Getting rid of panics added for debugging MB-28719,MB-28781
2018-03-20 15:14:22 -07:00
Abhinav Dangeti a4d88f8a12
Merge pull request #833 from abhinavdangeti/master
Return an error when the snapshotEpoch is invalid
2018-03-20 15:04:23 -07:00
Marty Schoch 110cfa3074
Merge pull request #847 from mschoch/fix-scorch-missing-invalid-field
fix MB-28719 and MB-28781 invalid/missing field in scorch
2018-03-20 17:52:52 -04:00
abhinavdangeti 0e3c57c465 Revert "scorch zap getField() which panics if the field is unknown"
This reverts commit 85b4a31e2a.
2018-03-20 14:51:33 -07:00
abhinavdangeti 844845b5d2 Revert "scorch zap panic if mergeFields() sees unsorted fields"
This reverts commit 2f4d3d8587.
2018-03-20 14:51:25 -07:00
Marty Schoch 35ea1d4423 fix MB-28719 and MB-28781 invalid/missing field in scorch
Use of sync.Pool to reuse the interim structure relied on resetting
the fieldsInv slice.  However, actual segments continued to use
this same fieldsInv slice after returning it to the pool. Simple
fix is to nil out fieldsInv slice in reset method and let the
newly built segment keep the one from the interim struct.
2018-03-20 17:41:56 -04:00
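The aliasing hazard described in this fix can be reproduced in a few lines. The `interim` type and the `resetBuggy`/`resetFixed` methods below are hypothetical simplifications, not the actual scorch code.

```go
package main

import "fmt"

// interim is a hypothetical pooled builder that hands its fieldsInv slice
// to the segment it produces.
type interim struct {
	fieldsInv []string
}

// resetBuggy keeps the backing array, so a segment still holding the old
// slice will see whatever the next user of the pooled struct writes.
func (s *interim) resetBuggy() { s.fieldsInv = s.fieldsInv[:0] }

// resetFixed drops the slice, leaving the old backing array to the segment.
func (s *interim) resetFixed() { s.fieldsInv = nil }

// build simulates building one segment, resetting, then reusing the struct.
// It returns the first segment's view of field 0 after the reuse.
func build(reset func(*interim)) string {
	s := &interim{}
	s.fieldsInv = append(s.fieldsInv, "_id", "title")
	segmentFields := s.fieldsInv // the built segment keeps this slice

	reset(s)
	s.fieldsInv = append(s.fieldsInv, "body") // next build reuses the struct

	return segmentFields[0]
}

func main() {
	fmt.Println(build((*interim).resetBuggy), build((*interim).resetFixed))
}
```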
Steve Yen e88cb783e2
Merge pull request #845 from steveyen/MB-28719-related-assertions
MB-28719 related assertions
2018-03-20 11:38:31 -07:00
Steve Yen 2f4d3d8587 scorch zap panic if mergeFields() sees unsorted fields
mergeFields depends on the fields from the various segments being
sorted for the fieldsSame comparison to work.

Of note, the 'fieldi > 1' guard skips the 0th field, which should
always be the '_id' field.
2018-03-20 11:17:46 -07:00
Steve Yen 85b4a31e2a scorch zap getField() which panics if the field is unknown 2018-03-20 11:12:18 -07:00
abhinavdangeti 85df86ba17 Unit tests for segments with docs with non-overlapping fields 2018-03-19 12:37:50 -07:00
Marty Schoch 65e16a7d96
Merge pull request #841 from mschoch/improve-cmdline
improve command-line tool for zap
2018-03-19 15:06:17 -04:00
Steve Yen 0492b33c2e
Merge pull request #840 from steveyen/MB-28781
MB-28781 - check if fields are the same before using merge optimization of copying term/norm/loc bytes
2018-03-19 11:58:53 -07:00
Marty Schoch e9b228bcdd improve command-line tool for zap
correctly handle/print additional loc bitmap address
this fixes bitmap length that is output
instantiate roaring bitmap and print it out
removed some unnecessary debug logging

updated dict command to print 1-hit encoded vals
this makes dict command usable for seeing which
doc ids are in a segment and their corresponding doc number
2018-03-19 14:57:30 -04:00
Steve Yen f65ba5c0f4 MB-28781 - scorch zap merge freq/loc copying only when fieldsSame
The optimization recently introduced in commit 530a3d24cf,
("scorch zap optimize merge by byte copying freq/norm/loc's") was to
byte-copy freq/norm/loc data directly during merging.  But, it was
incorrect if the fields were different across segments.

This change now performs that byte-copying merging optimization only
when the fields are the same across segments, and if not, leverages
the old approach of deserializing & re-serializing the freq/norm/loc
information, which has the important step of remapping fieldID's.

See also: https://issues.couchbase.com/browse/MB-28781
2018-03-19 11:26:51 -07:00
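The guard described above amounts to comparing field lists before choosing a merge path. This is a simplified sketch: `fieldsSame` is named in the log, but its body here and the `canByteCopy` helper are assumptions for illustration.

```go
package main

import "fmt"

// fieldsSame reports whether two ordered field lists are identical, which
// means field IDs map to the same names in both.
func fieldsSame(a, b []string) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

// canByteCopy is a hypothetical helper: byte-copying is only safe when every
// segment agrees on the field list, since copied bytes embed field IDs.
func canByteCopy(segFields [][]string) bool {
	for i := 1; i < len(segFields); i++ {
		if !fieldsSame(segFields[0], segFields[i]) {
			return false
		}
	}
	return true
}

func main() {
	same := [][]string{{"_id", "body"}, {"_id", "body"}}
	diff := [][]string{{"_id", "body"}, {"_id", "title"}}
	fmt.Println(canByteCopy(same), canByteCopy(diff))
}
```

When the check fails, the slower path that deserializes and re-serializes each posting is the one that can remap field IDs.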
Steve Yen c881146270 scorch zap mergeTermFreqNormLocsByCopying() helper func 2018-03-19 10:36:23 -07:00
Sreekanth Sivasankaran cf8e0d63bb
Merge pull request #837 from blevesearch/docnum_missing_fix
MB-28753 - docNumber "xx" not found err with updates
2018-03-19 14:36:30 +05:30
Sreekanth Sivasankaran 980ce9ebb3 MB-28753 - document number "xxx" not found
err with update workload

Introducer was incorrectly updating the offsets slice
of segments, by considering only the live doc count
while computing the "running". This can result in
incorrectly computing the residing segment as well as
the local doc numbers while loading a document after
a search hit.
2018-03-19 12:11:37 +05:30
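To illustrate why the running total must count every document rather than only live ones, here is a sketch assuming local doc numbers are assigned to all documents in a segment, including deleted ones. The `segment` type and the `offsets` and `locate` functions are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// segment is a hypothetical summary of one segment's document counts.
type segment struct {
	total   uint64 // all docs ever written to the segment
	deleted uint64 // docs since obsoleted by updates or deletes
}

// offsets returns each segment's starting global doc number. The running sum
// uses total, not total-deleted, because local doc numbers span deleted docs.
func offsets(segs []segment) []uint64 {
	out := make([]uint64, len(segs))
	running := uint64(0)
	for i, s := range segs {
		out[i] = running
		running += s.total
	}
	return out
}

// locate maps a global doc number to (segment index, local doc number).
func locate(offs []uint64, docNum uint64) (int, uint64) {
	i := sort.Search(len(offs), func(i int) bool { return offs[i] > docNum }) - 1
	return i, docNum - offs[i]
}

func main() {
	offs := offsets([]segment{{total: 10, deleted: 3}, {total: 5}})
	seg, local := locate(offs, 12)
	fmt.Println(offs, seg, local)
}
```

Had the first segment contributed only its 7 live documents, global doc number 12 would resolve to a local number past the end of the second segment.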
Steve Yen 6693a89441
Merge pull request #835 from steveyen/use-1MB-buffer-for-file-merger
scorch zap file merger uses 1MB buffered writer
2018-03-18 09:06:52 -07:00