bleve

Author	SHA1	Message	Date
Steve Yen	192621f402	scorch includeFreq/Norm/Locs params for postingsList.Iterator API This commit adds boolean flag params to the scorch PostingsList.Iterator() method, so that the caller can specify whether freq/norm/locs information is needed or not. Future changes can leverage these params for optimizations.	2018-03-26 09:49:44 -07:00
Steve Yen	6540b197d4	scorch zap provide full buffer capacity to snappy Encode/Decode() The snappy Encode/Decode() API's accept an optional destination buffer param where their encoded/decoded output results will be placed, but they only check that the buffer has enough len() rather than enough capacity before deciding to allocate a new buffer.	2018-03-26 09:49:44 -07:00
Steve Yen	ba644f3893	scorch zap fix postingsIter.nextBytes() when 1-bit encoded The previous commit's optimization that replaced the locsBitmap was incorrectly handling the case when there was a 1-bit encoding optimization in the postingsIterator.nextBytes() method, incorrectly generating the freq-norm bytes. Also as part of this change, more unused locsBitmap's were removed.	2018-03-26 09:19:00 -07:00
Steve Yen	7a19e6fd7e	scorch zap replace locsBitmap w/ 1 bit from freq-norm varint encoding This is attempt #2 of the optimization that replaces the locsBitmap, without any changes from the original commit attempt. A commit that follows this one contains the actual fix. See also... - commit `621b58dd83` (the 1st attempt) - commit `49a4ee60ba` (the revert) ------------- The original commit message body from 621b58 was... NOTE: this is a zap file format change. The separate "postings locations" roaring Bitmap that encoded whether a posting has locations info is now replaced by the least significant bit in the freq varint encoded in the freq-norm chunkedIntCoder. encode/decodeFreqHasLocs() are added as helper functions.	2018-03-23 12:50:24 -07:00
Steve Yen	49a4ee60ba	Revert "scorch zap replace locsBitmap w/ 1 bit from freq-norm varint encoding" Testing with the cbft application led to cbft process exits... AsyncError exit()... error reading location field: EOF -- main.initBleveOptions.func1() at init_bleve.go:85 This reverts commit `621b58dd83`.	2018-03-23 10:01:30 -07:00
Steve Yen	621b58dd83	scorch zap replace locsBitmap w/ 1 bit from freq-norm varint encoding NOTE: this is a zap file format change. The separate "postings locations" roaring Bitmap that encoded whether a posting has locations info is now replaced by the least significant bit in the freq varint encoded in the freq-norm chunkedIntCoder. encode/decodeFreqHasLocs() are added as helper functions.	2018-03-22 17:43:07 -07:00
abhinavdangeti	844845b5d2	Revert "scorch zap panic if mergeFields() sees unsorted fields" This reverts commit `2f4d3d8587`.	2018-03-20 14:51:25 -07:00
Steve Yen	2f4d3d8587	scorch zap panic if mergeFields() sees unsorted fields mergeFields depends on the fields from the various segments being sorted for the fieldsSame comparison to work. Of note, the 'fieldi > 1' guard skips the 0th field, which should always be the '_id' field.	2018-03-20 11:17:46 -07:00
Steve Yen	f65ba5c0f4	MB-28781 - scorch zap merge freq/loc copying only when fieldsSame The optimization recently introduced in commit `530a3d24cf`, ("scorch zap optimize merge by byte copying freq/norm/loc's") was to byte-copy freq/norm/loc data directly during merging. But, it was incorrect if the fields were different across segments. This change now performs that byte-copying merging optimization only when the fields are the same across segments, and if not, leverages the old approach of deserializing & re-serializing the freq/norm/loc information, which has the important step of remapping fieldID's. See also: https://issues.couchbase.com/browse/MB-28781	2018-03-19 11:26:51 -07:00
Steve Yen	c881146270	scorch zap mergeTermFreqNormLocsByCopying() helper func	2018-03-19 10:36:23 -07:00
Steve Yen	5df53c8e1f	scorch zap file merger uses 1MB buffered writer pprof of bleve-blast was showing file merging was in syscall/write a lot. The bufio.NewWriter() provides a default buffer size of 4K, which is too small, and using bufio.NewWriterSize(1MB buffer size) leads to syscall/write dropping out of the file merging flame graphs.	2018-03-16 11:49:53 -07:00
Steve Yen	e52eb84e37	scorch zap optimize merge when deletion bitmap is empty This change detects whether a deletion bitmap is empty, and treats that as a nil bitmap, which allows further postings iterator codepaths to avoid roaring bitmap operations (like, AndNot(docNums, drops)).	2018-03-16 11:22:50 -07:00
Steve Yen	cad88096ca	scorch zap reuse roaring Bitmap during merge	2018-03-12 09:17:37 -07:00
Steve Yen	3884cf4d12	scorch zap writePostings() helper func refactored out	2018-03-09 13:29:28 -08:00
Steve Yen	eac9808990	scorch zap optimize FST val encoding for terms with 1 hit NOTE: this is a scorch zap file format change / bump to version 4. In this optimization, the uint64 val stored in the vellum FST (term dictionary) now may either be a uint64 postingsOffset (same as before this change) or a uint64 encoding of the docNum + norm (in the case where a term appears in just a single doc).	2018-03-08 09:19:54 -08:00
Steve Yen	15242af465	Merge pull request #805 from steveyen/optimize-scorch-mem-processField Optimize scorch processField() inner loop and writeRoaringWithLen()	2018-03-07 09:09:57 -08:00
Sreekanth Sivasankaran	e0369a3553	Merge branch 'master' into compaction_bytes_stats	2018-03-07 14:47:33 +05:30
Sreekanth Sivasankaran	2a9739ee1b	naming change, interface removal	2018-03-07 14:43:33 +05:30
Steve Yen	dde6c2e01b	scorch zap optimize writeRoaringWithLen() Before this change, writeRoaringWithLen() would leverage a reused bytes.Buffer (#A) and invoke the roaring.WriteTo() API. But, it turns out the roaring.WriteTo() API has a suboptimal implementation, in that under the hood it converts the roaring bitmap to a byte buffer (using roaring.ToBytes()), and then calls Write(). But, that Write() turns out to be an additional memcpy into the provided bytes.Buffer (#A). By directly invoking roaring.ToBytes(), this change to writeRoaringWithLen() avoids the extra memory allocation and memcpy.	2018-03-06 14:59:20 -08:00
Steve Yen	530a3d24cf	scorch zap optimize merge by byte copying freq/norm/loc's This change adds a zap PostingsIterator.nextBytes() method, which is similar to Next(), but instead of returning a Posting instance, nextBytes() returns the encoded freq/norm and location byte slices. The zap merge code then provides those byte slices directly to the intCoder's via a new method, intCoder.AddBytes(), thereby avoiding having to encode many uvarint's.	2018-03-06 13:30:59 -08:00
Sreekanth Sivasankaran	395b0a312d	adding UTs	2018-03-05 17:02:58 +05:30
Sreekanth Sivasankaran	dec265c481	adding compaction_written_bytes/sec stats to scorch	2018-03-05 16:32:57 +05:30
Marty Schoch	0363b24dd4	update to use new vellum Reset API	2018-03-01 09:37:39 -08:00
Steve Yen	3f1dcb6078	scorch zap merge optimize drops lookup to outside of loop	2018-02-27 09:23:29 -08:00
Steve Yen	99ed127176	scorch zap merge optimize newDocNums lookup to outside of loop And, also a "go fmt".	2018-02-26 14:23:55 -08:00
Steve Yen	ce2332e111	scorch zap merge reuses tf/locEncoder across terms The finishTerm() helper func that's invoked on every outer loop resets the tf/locEncoders so they can be safely reused.	2018-02-26 11:37:11 -08:00
Steve Yen	a0b7508da7	scorch zap mergeSegmentBases() func As part of this, zap.MergeToWriter() now returns more information -- enough so that callers can now create their own SegmentBase instances. Also, the fieldsMap maintained and returned by zap.MergeToWriter() is now a mapping from fieldName ==> fieldID+1 (instead of the previous mapping from fieldName ==> fieldID). This makes it similar to how fieldsMap are handled in other parts of zap to avoid "zero value" issues.	2018-02-19 14:13:31 -08:00
Steve Yen	fe544f3352	scorch zap merge uses enumerator for vellum.Iterator's	2018-02-12 21:28:46 -08:00
Steve Yen	2158e06c40	scorch zap merge collects dicts & itrs in lock-step The theory with this change is that the dicts and itrs should be positionally in "lock-step" with paired entries. And, since later code also uses the same array indexing to access the drops and newDocNums, those also need to be positionally in pair-wise lock-step, too.	2018-02-12 20:54:07 -08:00
Steve Yen	e37c563c56	scorch zap merge move fieldDvLocsOffset var declaration Move the var declaration closer to where it's used.	2018-02-08 18:03:09 -08:00
Steve Yen	f177f07613	scorch zap segment merging reuses prealloc'ed PostingsIterator During zap segment merging, a new zap PostingsIterator was allocated for every field X segment X term. This change optimizes by reusing a single PostingsIterator instance per persistMergedRest() invocation. And, also unused fields are removed from the PostingsIterator.	2018-02-08 17:24:30 -08:00
Steve Yen	ed4826b189	scorch zap merge optimization to byte-copy storedDocs The optimization to byte-copy all the storedDocs for a given segment during merging kicks in when the fields are the same across all segments and when there are no deletions for that given segment. This can happen, for example, during data loading or insert-only scenarios. As part of this commit, the Segment.copyStoredDocs() method was added, which uses a single Write() call to copy all the stored docs bytes of a segment to a writer in one shot. And, getDocStoredMetaAndCompressed() was refactored into a related helper function, getDocStoredOffsets(), which provides the storedDocs metadata (offsets & lengths) for a doc.	2018-02-08 09:08:35 -08:00
Steve Yen	0b50a20cac	scorch zap move docDropped const to earlier in file	2018-02-08 09:06:31 -08:00
Steve Yen	822457542e	scorch zap VERSION bump: check whether fields are the same at merge COMPATIBILITY NOTE: scorch zap version bumped in this commit. The version bump is because mergeFields() now computes whether fields are the same across segments and it relies on the previous commit where fieldID's are assigned in field name sorted order (albeit with _id field always having fieldID of 0). Potential future commits might rely on this info that "fields are the same across segments" for more optimizations, etc.	2018-02-08 09:06:30 -08:00
Steve Yen	ffdeb8055e	scorch sorts fields by name to assign fieldID's This is a stepping stone to allow easier future comparisons of field maps and potential merge optimizations. In bleve-blast tests on a 2015 macbook (50K wikipedia docs, 8 indexers, batch size 100, ssd), this does not seem to have a distinct effect on indexing throughput.	2018-02-08 09:06:30 -08:00
Steve Yen	a83ee0f364	scorch zap.MergeToWriter() takes SegmentBases instead of Segments This change turns zap.MergeToWriter() into a public func, so that it's now directly callable from outside packages (such as from scorch's top-level merger or persister). And, MergeToWriter() now takes input of SegmentBases instead of Segments, so that it can now work on either in-memory zap segments or file-based zap segments. This is yet another stepping stone towards in-memory merging of zap segments.	2018-02-07 14:38:13 -08:00
Steve Yen	8c2520d55c	scorch zap optimize via postingsList reuse pprof graphs were showing many postingsList allocations during merging, so this change optimizes by reusing postingsList memory in the merging loops.	2018-02-07 14:33:20 -08:00
Steve Yen	0dfd73d6cc	scorch zap mergeStoredAndRemap loop optimization This change avoids an array/slice access in a loop body.	2018-02-06 17:10:44 -08:00
Steve Yen	6578655758	scorch zap refactored out mergeToWriter() func This is a step towards supporting in-memory zap segment merging.	2018-02-05 07:39:16 -08:00
Steve Yen	eb21bf8315	scorch zap merge & build share persistStoredFieldValues() Refactored out a helper func, persistStoredFieldValues(), that both the persistence and merge codepaths now share.	2018-02-05 07:38:55 -08:00
Steve Yen	714f5321e0	scorch zap merge storedFieldVals inner loop optimization	2018-02-01 16:28:15 -08:00
Steve Yen	634cfa0560	scorch zap chunkedIntCoder optimization to prealloc some final buf	2018-01-29 11:03:53 -08:00
Steve Yen	a444c25ddf	scorch zap merge uses array for docTermMap with no sorting Instead of sorting docNum keys from a hashmap, this change instead iterates from docNum 0 to N and uses an array instead of hashmap. The array is also reused across outer loop iterations. This optimizes for when there's a lot of structural similarity between docs, where many/most docs have the same fields, e.g., beers, breweries. If every doc has completely different fields, then this change might produce worse behavior compared to the previous sparse hashmap approach.	2018-01-29 10:47:08 -08:00
Steve Yen	745575a6c1	scorch zap mergeStoredAndRemap uses array indexing, not append() Since we have the right array size preallocated, we don't need the extra capacity checking of append().	2018-01-27 11:35:10 -08:00
Steve Yen	8dd17a3b20	scorch zap mergeStoredAndRemap uses continue for less indentation	2018-01-27 11:35:10 -08:00
Steve Yen	0041664bc4	scorch zap merge computeNewDocCount() optimize 1 variable	2018-01-27 11:35:10 -08:00
Steve Yen	6985db13a0	scorch zap merge reuses docNumbers array	2018-01-27 11:35:10 -08:00
Steve Yen	916bbf4125	scorch zap merge prealloc's docTermMap capacity	2018-01-27 11:35:10 -08:00
Steve Yen	56cdb68f35	scorch zap merge checks err2 not err Also, optimize the appending of the termSeparator so that the docTermMap is accessed and updated just once.	2018-01-27 11:35:10 -08:00
Steve Yen	3030d4edb5	scorch zap merge preallocs segNewDocNums capacity	2018-01-27 11:35:10 -08:00
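Commits `621b58dd83` and `7a19e6fd7e` describe replacing the separate locsBitmap with the least significant bit of the freq varint. The Go sketch below illustrates that bit-packing idea. The function names echo the encode/decodeFreqHasLocs() helpers mentioned in the commit message, but their signatures and the uvarint round trip shown here are assumptions for illustration, not zap's actual code.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// encodeFreqHasLocs packs a term frequency and a "has locations" flag into
// one uint64: freq is shifted left by one bit and the flag occupies the
// least significant bit. Hypothetical signature; assumes freq < 2^63 so the
// shift loses nothing.
func encodeFreqHasLocs(freq uint64, hasLocs bool) uint64 {
	rv := freq << 1
	if hasLocs {
		rv |= 0x01
	}
	return rv
}

// decodeFreqHasLocs reverses encodeFreqHasLocs. Hypothetical signature.
func decodeFreqHasLocs(v uint64) (freq uint64, hasLocs bool) {
	return v >> 1, v&0x01 != 0
}

func main() {
	// Write the packed value as a uvarint, roughly as an int coder would.
	buf := make([]byte, binary.MaxVarintLen64)
	n := binary.PutUvarint(buf, encodeFreqHasLocs(3, true))

	// Read it back and unpack both pieces of information.
	v, _ := binary.Uvarint(buf[:n])
	freq, hasLocs := decodeFreqHasLocs(v)
	fmt.Println(freq, hasLocs) // 3 true
}
```

One consequence of using the low bit: frequencies up to 63 still fit in a single uvarint byte after packing (63<<1 | 1 = 127), so the common small-freq case pays no extra space for the flag.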

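Commit `a0b7508da7` notes that the fieldsMap returned by zap.MergeToWriter() maps fieldName ==> fieldID+1 to avoid "zero value" issues. In Go, indexing a map with a missing key returns the zero value, so if `_id` has fieldID 0 a plain lookup cannot tell `_id` apart from an absent field. A minimal sketch of the +1 convention follows; `buildFieldsMap`, `lookupFieldID`, and the uint16 width are hypothetical choices for illustration.

```go
package main

import "fmt"

// buildFieldsMap assigns fieldIDs in slice order and stores fieldID+1, so a
// stored value of 0 can only mean "field not present". Hypothetical helper;
// the uint16 width is an assumption.
func buildFieldsMap(fieldNames []string) map[string]uint16 {
	fieldsMap := make(map[string]uint16, len(fieldNames))
	for i, name := range fieldNames {
		fieldsMap[name] = uint16(i) + 1
	}
	return fieldsMap
}

// lookupFieldID returns the real fieldID and whether the field exists.
func lookupFieldID(fieldsMap map[string]uint16, name string) (uint16, bool) {
	v := fieldsMap[name] // 0 when name is missing
	if v == 0 {
		return 0, false
	}
	return v - 1, true
}

func main() {
	m := buildFieldsMap([]string{"_id", "body", "title"})
	fmt.Println(lookupFieldID(m, "_id"))   // 0 true
	fmt.Println(lookupFieldID(m, "title")) // 2 true
	fmt.Println(lookupFieldID(m, "desc"))  // 0 false
}
```

The comma-ok form (`id, ok := m[name]`) would also work; storing fieldID+1 instead lets code that only does plain map indexing treat 0 as "missing" safely.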

71 Commits
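Commit `eac9808990` says the uint64 value stored in the vellum FST may be either a postingsOffset or an encoding of docNum + norm for terms that appear in a single doc. The sketch below shows one way such a dual-purpose value could be packed. The bit layout (a flag in bit 63, 32 bits of float32 norm, a 31-bit docNum) is an assumption chosen for illustration and is not claimed to be zap's version 4 format.

```go
package main

import (
	"fmt"
	"math"
)

// Assumed layout, for illustration only:
//   bit 63       : 1 = one-hit encoding, 0 = plain postingsOffset
//   bits 31..62  : float32 norm bits
//   bits 0..30   : docNum
const (
	oneHitFlag = uint64(1) << 63
	docNumBits = 31
	docNumMask = uint64(1)<<docNumBits - 1
	normMask   = uint64(math.MaxUint32)
)

// encodeOneHit packs docNum and norm. ok is false when docNum does not fit,
// in which case a caller would fall back to storing a postingsOffset.
func encodeOneHit(docNum uint64, norm float32) (uint64, bool) {
	if docNum > docNumMask {
		return 0, false
	}
	normBits := uint64(math.Float32bits(norm))
	return oneHitFlag | normBits<<docNumBits | docNum, true
}

// decodeFSTVal distinguishes the two encodings by the flag bit. Assumes real
// postingsOffsets are always < 2^63.
func decodeFSTVal(v uint64) (isOneHit bool, docNum uint64, norm float32, postingsOffset uint64) {
	if v&oneHitFlag == 0 {
		return false, 0, 0, v
	}
	docNum = v & docNumMask
	norm = math.Float32frombits(uint32((v >> docNumBits) & normMask))
	return true, docNum, norm, 0
}

func main() {
	v, _ := encodeOneHit(42, 0.5)
	fmt.Println(decodeFSTVal(v))      // true 42 0.5 0
	fmt.Println(decodeFSTVal(123456)) // false 0 0 123456
}
```

With a layout like this, a single-doc term can be answered straight from the term dictionary without reading a postings list, which is the kind of saving the commit message targets.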