bleve

Author	SHA1	Message	Date
abhinavdangeti	7e36109b3c	MB-28162: Provide API to estimate memory needed to run a search query. This API (unexported) will estimate the amount of memory needed to execute a search query over an index before the collector begins data collection. Sample estimates for certain queries {Size: 10, BenchmarkUpsidedownSearchOverhead}, given as ESTIMATE / BENCHMEM: TermQuery 4616 / 4796; MatchQuery 5210 / 5405; DisjunctionQuery (Match queries) 7700 / 8447; DisjunctionQuery (Term queries) 6514 / 6591; ConjunctionQuery (Match queries) 7524 / 8175; Nested disjunction query (disjunction of disjunctions) 10306 / 10708 …	2018-03-06 13:53:42 -08:00
Steve Yen	5b86da85f3	scorch zap optimize postings itr with tf/loc reader/decoder reuse	2018-03-06 13:30:59 -08:00
Steve Yen	530a3d24cf	scorch zap optimize merge by byte copying freq/norm/loc's This change adds a zap PostingsIterator.nextBytes() method, which is similar to Next(), but instead of returning a Posting instance, nextBytes() returns the encoded freq/norm and location byte slices. The zap merge code then provides those byte slices directly to the intCoder's via a new method, intCoder.AddBytes(), thereby avoiding having to encode many uvarint's.	2018-03-06 13:30:59 -08:00
Steve Yen	655268bec8	scorch zap postings iterator nextDocNum() helper method Refactored out a nextDocNum() helper method from Next() that future optimizations can use.	2018-03-06 07:55:26 -08:00
Steve Yen	502e64c256	scorch zap Posting doesn't use iterator field	2018-03-05 16:33:13 -08:00
Steve Yen	8f8fd511b7	scorch zap access freqs[offset] outside loop	2018-03-05 12:02:33 -08:00
Steve Yen	a338386a03	scorch build optimize freq/loc slice capacity	2018-03-05 12:02:33 -08:00
Steve Yen	856778ad7b	scorch zap build prealloc docNumbers capacity	2018-03-05 12:02:33 -08:00
Steve Yen	8c0881eab2	scorch zap build reuses mem postingsList/Iterator structs	2018-03-05 12:02:33 -08:00
Steve Yen	85761c6a57	go fmt	2018-03-05 12:02:33 -08:00
Steve Yen	884da6f93a	scorch optimize mem processDocument() norm calculation This change moves the norm calculation outside of the inner loop.	2018-03-03 11:58:30 -08:00
Steve Yen	6ae799052a	scorch mem optimize processDocument() stored field	2018-03-03 11:52:33 -08:00
Steve Yen	b7cfef81c9	scorch optimize mem processDocument() dict access This change moves the dict lookup to outside of the loop.	2018-03-03 11:43:25 -08:00
Steve Yen	88c740095b	scorch optimizations for mem.PostingsIterator.Next() & docTermMap Due to the usage rules of iterators, mem.PostingsIterator.Next() can reuse its returned Postings instance. Also, there's a micro optimization in persistDocValues() for one fewer access to the docTermMap in the inner-loop.	2018-03-03 11:31:18 -08:00
Marty Schoch	0363b24dd4	update to use new vellum Reset API	2018-03-01 09:37:39 -08:00
Steve Yen	7d46d2c7ae	scorch zap intcoder encoder is never nil	2018-02-28 10:09:21 -08:00
Steve Yen	dd7d93ee5e	scorch zap loadChunk reuses Location slices	2018-02-27 18:01:48 -08:00
Steve Yen	4dbb4b1495	scorch zap posting reuses freqNorm & loc reader and decoder	2018-02-27 18:01:48 -08:00
Steve Yen	3f1dcb6078	scorch zap merge optimize drops lookup to outside of loop	2018-02-27 09:23:29 -08:00
Steve Yen	99ed127176	scorch zap merge optimize newDocNums lookup to outside of loop And, also a "go fmt".	2018-02-26 14:23:55 -08:00
Steve Yen	98d5d7bd81	scorch zap chunkedIntCoder optimizations The optimizations / changes include... - reuse of a memory buf when serializing varint's. - reuse of a govarint.U64Base128Encoder instance, as it's a thin, wrapper around an underlying chunkBuf, so Reset()'s on the chunkBuf is enough for encoder reuse. - chunkedIntcoder.Write() method was changed to invoke w.Write() less often by forming a larger, reused buf. Profiling and analysis showed w.Write() was getting called a lot, often with tiny 1 or 2 byte inputs. The theory is w.Write() and its underlying memmove() can be more efficient when provided with larger bufs. - some repeated code removal, by reusing the Close() method.	2018-02-26 14:17:09 -08:00
Steve Yen	ce2332e111	scorch zap merge reuses tf/locEncoder across terms The finishTerm() helper func that's invoked on every outer loop resets the tf/locEncoders so they can be safely reused.	2018-02-26 11:37:11 -08:00
Steve Yen	a0b7508da7	scorch zap mergeSegmentBases() func As part of this, zap.MergeToWriter() now returns more information -- enough so that callers can now create their own SegmentBase instances. Also, the fieldsMap maintained and returned by zap.MergeToWriter() is now a mapping from fieldName ==> fieldID+1 (instead of the previous mapping from fieldName ==> fieldID). This makes it similar to how fieldsMap are handled in other parts of zap to avoid "zero value" issues.	2018-02-19 14:13:31 -08:00
Steve Yen	720010783e	scorch zap InitSegmentBase() helper func Refactored out a zap.InitSegmentBase() func so that non-zap packages can create SegmentBase instances.	2018-02-19 14:13:31 -08:00
Steve Yen	fe544f3352	scorch zap merge uses enumerator for vellum.Iterator's	2018-02-12 21:28:46 -08:00
Steve Yen	a073424e5a	scorch zap dict.postingsListFromOffset() method A helper method that can create a PostingsList if the caller already knows the postingsOffset.	2018-02-12 20:54:07 -08:00
Steve Yen	2158e06c40	scorch zap merge collects dicts & itrs in lock-step The theory with this change is that the dicts and itrs should be positionally in "lock-step" with paired entries. And, since later code also uses the same array indexing to access the drops and newDocNums, those also need to be positionally in pair-wise lock-step, too.	2018-02-12 20:54:07 -08:00
Steve Yen	95a4f37e5c	scorch zap enumerator impl that joins multiple vellum iterators Unlike vellum's MergeIterator, the enumerator introduced in this commit doesn't merge when there are matching keys across iterators. Instead, the enumerator implementation provides a traversal of all the tuples of (key, iteratorIndex, val) from the underlying vellum iterators, ordered by key ASC, iteratorIndex ASC.	2018-02-12 20:54:06 -08:00
Steve Yen	e37c563c56	scorch zap merge move fieldDvLocsOffset var declaration Move the var declaration nearer to where it's used.	2018-02-08 18:03:09 -08:00
Steve Yen	f177f07613	scorch zap segment merging reuses prealloc'ed PostingsIterator During zap segment merging, a new zap PostingsIterator was allocated for every field X segment X term. This change optimizes by reusing a single PostingsIterator instance per persistMergedRest() invocation. And, also unused fields are removed from the PostingsIterator.	2018-02-08 17:24:30 -08:00
Steve Yen	ed4826b189	scorch zap merge optimization to byte-copy storedDocs The optimization to byte-copy all the storedDocs for a given segment during merging kicks in when the fields are the same across all segments and when there are no deletions for that given segment. This can happen, for example, during data loading or insert-only scenarios. As part of this commit, the Segment.copyStoredDocs() method was added, which uses a single Write() call to copy all the stored docs bytes of a segment to a writer in one shot. And, getDocStoredMetaAndCompressed() was refactored into a related helper function, getDocStoredOffsets(), which provides the storedDocs metadata (offsets & lengths) for a doc.	2018-02-08 09:08:35 -08:00
Steve Yen	0b50a20cac	scorch zap move docDropped const to earlier in file	2018-02-08 09:06:31 -08:00
Steve Yen	822457542e	scorch zap VERSION bump: check whether fields are the same at merge COMPATIBILITY NOTE: scorch zap version bumped in this commit. The version bump is because mergeFields() now computes whether fields are the same across segments and it relies on the previous commit where fieldID's are assigned in field name sorted order (albeit with _id field always having fieldID of 0). Potential future commits might rely on this info that "fields are the same across segments" for more optimizations, etc.	2018-02-08 09:06:30 -08:00
Steve Yen	ffdeb8055e	scorch sorts fields by name to assign fieldID's This is a stepping stone to allow easier future comparisons of field maps and potential merge optimizations. In bleve-blast tests on a 2015 macbook (50K wikipedia docs, 8 indexers, batch size 100, ssd), this does not seem to have a distinct effect on indexing throughput.	2018-02-08 09:06:30 -08:00
Steve Yen	a83ee0f364	scorch zap.MergeToWriter() takes SegmentBases instead of Segments This change turns zap.MergeToWriter() into a public func, so that it's now directly callable from outside packages (such as from scorch's top-level merger or persister). And, MergeToWriter() now takes input of SegmentBases instead of Segments, so that it can now work on either in-memory zap segments or file-based zap segments. This is yet another stepping stone towards in-memory merging of zap segments.	2018-02-07 14:38:13 -08:00
Steve Yen	8c2520d55c	scorch zap optimize via postingsList reuse pprof graphs were showing many postingsList allocations during merging, so this change optimizes by reusing postingList memory in the merging loops.	2018-02-07 14:33:20 -08:00
Steve Yen	03c8b2b7ec	scorch mem segment optimizes DictEntry's across Next() calls This change optimizes the scorch/mem DictionaryIterator by reusing a DictEntry struct across multiple Next() calls. This follows the same optimization trick and Next() semantics as upsidedown's FieldDict implementation.	2018-02-07 14:17:48 -08:00
Steve Yen	0dfd73d6cc	scorch zap mergeStoredAndRemap loop optimization This change avoids an array/slice access in a loop body.	2018-02-06 17:10:44 -08:00
Steve Yen	c09e2a08ca	scorch zap chunkedContentCoder reuses chunk metadata slice memory And, renamed the chunk MetaData.DocID field to DocNum for naming correctness, where much of this commit is the mechanical effect of that rename.	2018-02-05 07:39:16 -08:00
Steve Yen	6578655758	scorch zap refactored out mergeToWriter() func This is a step towards supporting in-memory zap segment merging.	2018-02-05 07:39:16 -08:00
Steve Yen	eb21bf8315	scorch zap merge & build share persistStoredFieldValues() Refactored out a helper func, persistStoredFieldValues(), that both the persistence and merge codepaths now share.	2018-02-05 07:38:55 -08:00
Steve Yen	714f5321e0	scorch zap merge storedFieldVals inner loop optimization	2018-02-01 16:28:15 -08:00
Steve Yen	93b037cdbb	scorch zap TestMergeWithUpdates()	2018-01-31 11:44:41 -08:00
Steve Yen	4dd64b68fa	scorch zap TestMergeWithEmptySegment(s)	2018-01-30 22:27:40 -08:00
Steve Yen	684ee3c0e7	scorch zap DictIterator term count fixed and more merge unit tests The zap DictionaryIterator Next() was incorrectly returning the postingsList offset as the term count. As part of this, refactored out a PostingsList.read() helper method. Also added more merge unit test scenarios, including merging a segment for a few rounds to see if there are differences before/after merging.	2018-01-30 21:22:06 -08:00
Steve Yen	634cfa0560	scorch zap chunkedIntCoder optimization to prealloc some final buf	2018-01-29 11:03:53 -08:00
Steve Yen	a444c25ddf	scorch zap merge uses array for docTermMap with no sorting Instead of sorting docNum keys from a hashmap, this change instead iterates from docNum 0 to N and uses an array instead of hashmap. The array is also reused across outer loop iterations. This optimizes for when there's a lot of structural similarity between docs, where many/most docs have the same fields. i.e., beers, breweries. If every doc has completely different fields, then this change might produce worse behavior compared to the previous sparse hashmap approach.	2018-01-29 10:47:08 -08:00
Steve Yen	745575a6c1	scorch zap mergeStoredAndRemap uses array indexing, not append() Since we have right array size preallocated, we don't need the extra capacity checking of append().	2018-01-27 11:35:10 -08:00
Steve Yen	8dd17a3b20	scorch zap mergeStoredAndRemap uses continue for less indentation	2018-01-27 11:35:10 -08:00
Steve Yen	0041664bc4	scorch zap merge computeNewDocCount() optimize 1 variable	2018-01-27 11:35:10 -08:00
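
Commit 95a4f37e5c above describes an enumerator that traverses all (key, iteratorIndex, val) tuples from several iterators, ordered by key ASC and then iteratorIndex ASC, without merging matching keys. The following is a minimal sketch of that ordering only, not the zap implementation: the `kv` and `tuple` types and the `enumerate` function are hypothetical names, and plain sorted slices stand in for vellum iterators.

```go
package main

import "fmt"

// kv is one (key, value) entry; a sorted []kv stands in for a vellum iterator
// (an assumption made for illustration only).
type kv struct {
	key string
	val uint64
}

// tuple is one item produced by the enumeration.
type tuple struct {
	key     string
	iterIdx int
	val     uint64
}

// enumerate walks several key-sorted inputs and returns every entry as a
// (key, iterIdx, val) tuple ordered by key ASC, then iterIdx ASC.
// Equal keys from different inputs are not merged; each one is emitted.
func enumerate(inputs [][]kv) []tuple {
	pos := make([]int, len(inputs)) // current position in each input
	var out []tuple
	for {
		best := -1
		for i, in := range inputs {
			if pos[i] >= len(in) {
				continue // input i is exhausted
			}
			// Strict "<" keeps the lowest input index when keys tie.
			if best < 0 || in[pos[i]].key < inputs[best][pos[best]].key {
				best = i
			}
		}
		if best < 0 {
			return out // every input is exhausted
		}
		e := inputs[best][pos[best]]
		out = append(out, tuple{e.key, best, e.val})
		pos[best]++
	}
}

func main() {
	a := []kv{{"apple", 1}, {"cat", 3}}
	b := []kv{{"apple", 10}, {"bat", 20}}
	for _, t := range enumerate([][]kv{a, b}) {
		fmt.Println(t.key, t.iterIdx, t.val)
	}
}
```

Because both inputs contain "apple", the sketch emits that key twice, once per input index, which is the difference from a merging iterator that the commit message points out.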

136 Commits
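
Commit a0b7508da7 above notes that the fieldsMap maps fieldName ==> fieldID+1 to avoid "zero value" issues. A rough sketch of why the offset helps in Go is shown here; `buildFieldsMap`, `fieldIDFor`, and the `uint16` ID type are assumptions chosen for illustration and are not taken from the zap code.

```go
package main

import "fmt"

// buildFieldsMap assigns IDs in slice order and stores fieldID+1,
// so that a stored value of 0 can never mean "field 0".
func buildFieldsMap(names []string) map[string]uint16 {
	m := make(map[string]uint16, len(names))
	for i, name := range names {
		m[name] = uint16(i) + 1
	}
	return m
}

// fieldIDFor returns the real fieldID and whether the field exists.
// A missing key yields the map's zero value (0), which is unambiguous
// because every stored value is at least 1.
func fieldIDFor(fieldsMap map[string]uint16, name string) (uint16, bool) {
	idPlus1 := fieldsMap[name]
	if idPlus1 == 0 {
		return 0, false
	}
	return idPlus1 - 1, true
}

func main() {
	m := buildFieldsMap([]string{"_id", "desc", "name"})
	id, ok := fieldIDFor(m, "_id")
	fmt.Println(id, ok)
	id, ok = fieldIDFor(m, "missing")
	fmt.Println(id, ok)
}
```

With a plain fieldName ==> fieldID map, a single-value lookup of a missing field would return 0, which is indistinguishable from the ID of the first field.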