Do not re-account for certain referenced data in the zap structures.
New estimates:
                                     ESTIMATE   BENCHMEM
TermQuery                               11396      12437
MatchQuery                              12244      12951
DisjunctionQuery (Term queries)         20644      20709
Before this change, writeRoaringWithLen() leveraged a reused
bytes.Buffer (#A) and invoked the roaring.WriteTo() API.
It turns out the roaring.WriteTo() API has a suboptimal
implementation: under the hood it converts the roaring bitmap to a
byte slice (using roaring.ToBytes()), and then calls Write(). That
Write() amounts to an additional memcpy into the provided
bytes.Buffer (#A).
By directly invoking roaring.ToBytes(), this change to
writeRoaringWithLen() avoids the extra memory allocation and memcpy.
This change leverages the ability of the chunkedIntCoder.Add() method
to accept multiple input values (via its variadic '...' parameter
signature), meaning fewer Add() invocations are needed.
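A minimal sketch of that variadic pattern (intCoder here is an
illustrative stand-in, not zap's actual chunkedIntCoder):

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// intCoder mimics the relevant shape of zap's chunkedIntCoder: Add
// takes any number of values via a variadic parameter, so one call
// can replace several.
type intCoder struct {
	buf [binary.MaxVarintLen64]byte
	out bytes.Buffer
}

func (c *intCoder) Add(vals ...uint64) error {
	for _, v := range vals {
		n := binary.PutUvarint(c.buf[:], v)
		if _, err := c.out.Write(c.buf[:n]); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	var c intCoder
	// One invocation instead of four:
	_ = c.Add(42, 7, 1<<20, 0)
	fmt.Println(c.out.Len()) // 6: 1 + 1 + 3 + 1 uvarint bytes
}
```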
This API (unexported) will estimate the amount of memory needed to execute
a search query over an index before the collector begins data collection.
Sample estimates for certain queries:
{Size: 10, BenchmarkUpsidedownSearchOverhead}
                                                        ESTIMATE   BENCHMEM
TermQuery                                                   4616       4796
MatchQuery                                                  5210       5405
DisjunctionQuery (Match queries)                            7700       8447
DisjunctionQuery (Term queries)                             6514       6591
ConjunctionQuery (Match queries)                            7524       8175
Nested disjunction query (disjunction of disjunctions)     10306      10708
…
This change adds a zap PostingsIterator.nextBytes() method, which is
similar to Next(), but instead of returning a Posting instance,
nextBytes() returns the encoded freq/norm and location byte slices.
The zap merge code then provides those byte slices directly to the
intCoders via a new method, intCoder.AddBytes(), thereby avoiding
having to encode many uvarints.
Due to the usage rules of iterators, mem.PostingsIterator.Next() can
reuse its returned Postings instance.
Also, there's a micro optimization in persistDocValues() for one fewer
access to the docTermMap in the inner-loop.
The optimizations / changes include...
- reuse of a memory buf when serializing varints.
- reuse of a govarint.U64Base128Encoder instance, as it's a thin
  wrapper around an underlying chunkBuf, so a Reset() on the
  chunkBuf is enough for encoder reuse.
- the chunkedIntCoder.Write() method was changed to invoke w.Write()
  less often by forming a larger, reused buf. Profiling and analysis
  showed w.Write() was getting called a lot, often with tiny 1 or 2
  byte inputs. The theory is that w.Write() and its underlying
  memmove() can be more efficient when provided with larger bufs.
- some repeated code removal, by reusing the Close() method.
As part of this, zap.MergeToWriter() now returns more information --
enough so that callers can now create their own SegmentBase instances.
Also, the fieldsMap maintained and returned by zap.MergeToWriter() is
now a mapping from fieldName ==> fieldID+1 (instead of the previous
mapping from fieldName ==> fieldID). This matches how fieldsMaps are
handled in other parts of zap to avoid "zero value" issues.
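The "zero value" issue the +1 convention sidesteps: a Go map lookup
for a missing key returns 0, which is indistinguishable from a stored
fieldID of 0. A minimal sketch:

```go
package main

import "fmt"

func main() {
	// fieldsMap stores fieldID+1, so a lookup can distinguish
	// "field absent" (zero value 0) from "fieldID is 0".
	fieldsMap := map[string]uint16{} // fieldName ==> fieldID+1
	fieldsMap["_id"] = 0 + 1         // _id always gets fieldID 0
	fieldsMap["body"] = 1 + 1

	if v := fieldsMap["_id"]; v > 0 {
		fmt.Println("fieldID of _id:", v-1) // fieldID of _id: 0
	}
	fmt.Println("missing field:", fieldsMap["title"]) // missing field: 0
}
```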
The theory with this change is that the dicts and itrs should be
positionally in "lock-step" with paired entries.
And, since later code also uses the same array indexing to access the
drops and newDocNums, those also need to be positionally in pair-wise
lock-step, too.
Unlike vellum's MergeIterator, the enumerator introduced in this
commit doesn't merge when there are matching keys across iterators.
Instead, the enumerator implementation provides a traversal of all the
tuples of (key, iteratorIndex, val) from the underlying vellum
iterators, ordered by key ASC, iteratorIndex ASC.
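The traversal order can be sketched with string-slice "iterators"
standing in for vellum iterators (enumerate and tuple are illustrative
names, not the commit's actual types):

```go
package main

import "fmt"

// tuple mirrors what the enumerator yields: the key, the index of the
// underlying iterator it came from, and that iterator's value.
type tuple struct {
	key string
	itr int
	val uint64
}

// enumerate walks all iterators in (key ASC, iteratorIndex ASC)
// order, emitting one tuple per entry; equal keys are NOT merged.
// Each keys[i] must already be sorted, as vellum iterators are.
func enumerate(keys [][]string, vals [][]uint64) []tuple {
	pos := make([]int, len(keys))
	var out []tuple
	for {
		best := -1
		for i := range keys {
			if pos[i] >= len(keys[i]) {
				continue
			}
			// Strictly-less keeps ties on the lowest iterator index.
			if best < 0 || keys[i][pos[i]] < keys[best][pos[best]] {
				best = i
			}
		}
		if best < 0 {
			return out
		}
		out = append(out, tuple{keys[best][pos[best]], best, vals[best][pos[best]]})
		pos[best]++
	}
}

func main() {
	keys := [][]string{{"a", "c"}, {"a", "b"}}
	vals := [][]uint64{{1, 3}, {10, 20}}
	for _, t := range enumerate(keys, vals) {
		fmt.Println(t.key, t.itr, t.val)
	}
	// Both "a" entries appear, iterator 0 before iterator 1:
	// a 0 1 / a 1 10 / b 1 20 / c 0 3
}
```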
During zap segment merging, a new zap PostingsIterator was allocated
for every (field, segment, term) combination.
This change optimizes by reusing a single PostingsIterator instance
per persistMergedRest() invocation.
Also, unused fields were removed from the PostingsIterator.
The optimization to byte-copy all the storedDocs for a given segment
during merging kicks in when the fields are the same across all
segments and when there are no deletions for that given segment. This
can happen, for example, during data loading or insert-only scenarios.
As part of this commit, the Segment.copyStoredDocs() method was added,
which uses a single Write() call to copy all the stored docs bytes of
a segment to a writer in one shot.
And, getDocStoredMetaAndCompressed() was refactored into a related
helper function, getDocStoredOffsets(), which provides the storedDocs
metadata (offsets & lengths) for a doc.
COMPATIBILITY NOTE: scorch zap version bumped in this commit.
The version bump is because mergeFields() now computes whether fields
are the same across segments and it relies on the previous commit
where fieldID's are assigned in field name sorted order (albeit with
_id field always having fieldID of 0).
Potential future commits might rely on this info that "fields are the
same across segments" for more optimizations, etc.
This is a stepping stone to allow easier future comparisons of field
maps and potential merge optimizations.
In bleve-blast tests on a 2015 macbook (50K wikipedia docs, 8
indexers, batch size 100, ssd), this does not seem to have a distinct
effect on indexing throughput.
This change turns zap.MergeToWriter() into a public func, so that it's
now directly callable from outside packages (such as from scorch's
top-level merger or persister). And, MergeToWriter() now takes input
of SegmentBases instead of Segments, so that it can now work on either
in-memory zap segments or file-based zap segments.
This is yet another stepping stone towards in-memory merging of zap
segments.