bleve

Author	SHA1	Message	Date
Steve Yen	3f1dcb6078	scorch zap merge optimize drops lookup to outside of loop	2018-02-27 09:23:29 -08:00
Steve Yen	99ed127176	scorch zap merge optimize newDocNums lookup to outside of loop And, also a "go fmt".	2018-02-26 14:23:55 -08:00
Steve Yen	ce2332e111	scorch zap merge reuses tf/locEncoder across terms The finishTerm() helper func that's invoked on every outer loop resets the tf/locEncoders so they can be safely reused.	2018-02-26 11:37:11 -08:00
Steve Yen	a0b7508da7	scorch zap mergeSegmentBases() func As part of this, zap.MergeToWriter() now returns more information -- enough so that callers can now create their own SegmentBase instances. Also, the fieldsMap maintained and returned by zap.MergeToWriter() is now a mapping from fieldName ==> fieldID+1 (instead of the previous mapping from fieldName ==> fieldID). This makes it similar to how fieldsMap are handled in other parts of zap to avoid "zero value" issues.	2018-02-19 14:13:31 -08:00
Steve Yen	fe544f3352	scorch zap merge uses enumerator for vellum.Iterator's	2018-02-12 21:28:46 -08:00
Steve Yen	2158e06c40	scorch zap merge collects dicts & itrs in lock-step The theory with this change is that the dicts and itrs should be positionally in "lock-step" with paired entries. And, since later code also uses the same array indexing to access the drops and newDocNums, those also need to be positionally in pair-wise lock-step, too.	2018-02-12 20:54:07 -08:00
Steve Yen	e37c563c56	scorch zap merge move fieldDvLocsOffset var declaration Move the var declaration to nearer where it's used.	2018-02-08 18:03:09 -08:00
Steve Yen	f177f07613	scorch zap segment merging reuses prealloc'ed PostingsIterator During zap segment merging, a new zap PostingsIterator was allocated for every field X segment X term. This change optimizes by reusing a single PostingsIterator instance per persistMergedRest() invocation. And, also unused fields are removed from the PostingsIterator.	2018-02-08 17:24:30 -08:00
Steve Yen	ed4826b189	scorch zap merge optimization to byte-copy storedDocs The optimization to byte-copy all the storedDocs for a given segment during merging kicks in when the fields are the same across all segments and when there are no deletions for that given segment. This can happen, for example, during data loading or insert-only scenarios. As part of this commit, the Segment.copyStoredDocs() method was added, which uses a single Write() call to copy all the stored docs bytes of a segment to a writer in one shot. And, getDocStoredMetaAndCompressed() was refactored into a related helper function, getDocStoredOffsets(), which provides the storedDocs metadata (offsets & lengths) for a doc.	2018-02-08 09:08:35 -08:00
Steve Yen	0b50a20cac	scorch zap move docDropped const to earlier in file	2018-02-08 09:06:31 -08:00
Steve Yen	822457542e	scorch zap VERSION bump: check whether fields are the same at merge COMPATIBILITY NOTE: scorch zap version bumped in this commit. The version bump is because mergeFields() now computes whether fields are the same across segments and it relies on the previous commit where fieldID's are assigned in field name sorted order (albeit with _id field always having fieldID of 0). Potential future commits might rely on this info that "fields are the same across segments" for more optimizations, etc.	2018-02-08 09:06:30 -08:00
Steve Yen	ffdeb8055e	scorch sorts fields by name to assign fieldID's This is a stepping stone to allow easier future comparisons of field maps and potential merge optimizations. In bleve-blast tests on a 2015 macbook (50K wikipedia docs, 8 indexers, batch size 100, ssd), this does not seem to have a distinct effect on indexing throughput.	2018-02-08 09:06:30 -08:00
Steve Yen	a83ee0f364	scorch zap.MergeToWriter() takes SegmentBases instead of Segments This change turns zap.MergeToWriter() into a public func, so that it's now directly callable from outside packages (such as from scorch's top-level merger or persister). And, MergeToWriter() now takes input of SegmentBases instead of Segments, so that it can now work on either in-memory zap segments or file-based zap segments. This is yet another stepping stone towards in-memory merging of zap segments.	2018-02-07 14:38:13 -08:00
Steve Yen	8c2520d55c	scorch zap optimize via postingsList reuse pprof graphs were showing many postingsList allocations during merging, so this change optimizes by reusing postingList memory in the merging loops.	2018-02-07 14:33:20 -08:00
Steve Yen	0dfd73d6cc	scorch zap mergeStoredAndRemap loop optimization This change avoids an array/slice access in a loop body.	2018-02-06 17:10:44 -08:00
Steve Yen	6578655758	scorch zap refactored out mergeToWriter() func This is a step towards supporting in-memory zap segment merging.	2018-02-05 07:39:16 -08:00
Steve Yen	eb21bf8315	scorch zap merge & build share persistStoredFieldValues() Refactored out a helper func, persistStoredFieldValues(), that both the persistence and merge codepaths now share.	2018-02-05 07:38:55 -08:00
Steve Yen	714f5321e0	scorch zap merge storedFieldVals inner loop optimization	2018-02-01 16:28:15 -08:00
Steve Yen	634cfa0560	scorch zap chunkedIntCoder optimization to prealloc some final buf	2018-01-29 11:03:53 -08:00
Steve Yen	a444c25ddf	scorch zap merge uses array for docTermMap with no sorting Instead of sorting docNum keys from a hashmap, this change instead iterates from docNum 0 to N and uses an array instead of hashmap. The array is also reused across outer loop iterations. This optimizes for when there's a lot of structural similarity between docs, where many/most docs have the same fields. e.g., beers, breweries. If every doc has completely different fields, then this change might produce worse behavior compared to the previous sparse hashmap approach.	2018-01-29 10:47:08 -08:00
Steve Yen	745575a6c1	scorch zap mergeStoredAndRemap uses array indexing, not append() Since we have the right array size preallocated, we don't need the extra capacity checking of append().	2018-01-27 11:35:10 -08:00
Steve Yen	8dd17a3b20	scorch zap mergeStoredAndRemap uses continue for less indentation	2018-01-27 11:35:10 -08:00
Steve Yen	0041664bc4	scorch zap merge computeNewDocCount() optimize 1 variable	2018-01-27 11:35:10 -08:00
Steve Yen	6985db13a0	scorch zap merge reuses docNumbers array	2018-01-27 11:35:10 -08:00
Steve Yen	916bbf4125	scorch zap merge prealloc's docTermMap capacity	2018-01-27 11:35:10 -08:00
Steve Yen	56cdb68f35	scorch zap merge checks err2 not err Also, optimize the appending of the termSeparator so that the docTermMap is accessed and updated just once.	2018-01-27 11:35:10 -08:00
Steve Yen	3030d4edb5	scorch zap merge preallocs segNewDocNums capacity	2018-01-27 11:35:10 -08:00
Steve Yen	9038d75c98	scorch zap allocate govarint.U64Base128Encoder just once Instead of allocating a govarint.U64Base128Encoder in the inner loop, allocate it just once on the outside, as it appears that it's just a thin wrapper around binary.PutUvarint().	2018-01-27 11:35:10 -08:00
Steve Yen	10dd5489c2	scorch zap Dict.postingsList() takes []byte for more mem control This allows callers that already have a []byte term to avoid string'ification garbage.	2018-01-27 11:35:10 -08:00
Steve Yen	6a17ff48c7	scorch zap removed unneeded []byte cast of term	2018-01-27 11:35:10 -08:00
Steve Yen	d389e2bb40	scorch zap merge file cleanup on error, and some minor prealloc's	2018-01-27 11:35:10 -08:00
Steve Yen	37121c3b49	scorch zap writeRoaringWithLen optimized with reused bufs	2018-01-27 11:35:10 -08:00
Steve Yen	5a035dc9aa	scorch zap in-memory segment representation (SegmentBase) The zap SegmentBase struct is a refactoring of the zap Segment into the subset of fields that are needed for read-only ops, without any persistence related info. This allows us to use zap's optimized data encoding as scorch's in-memory segments. The zap Segment struct now embeds a zap SegmentBase struct, and layers on persistence. Both the zap Segment and zap SegmentBase implement scorch's Segment interface.	2018-01-27 11:35:10 -08:00
Sreekanth Sivasankaran	448201243a	removed redundant buf writer, and checks	2017-12-30 16:54:06 +05:30
Sreekanth Sivasankaran	c8df014c0c	Updated readme, zap version, added new docvalue cmd, fixed the footer and fields cmd, interface name updated	2017-12-29 21:39:29 +05:30
Sreekanth Sivasankaran	76f827f469	docValue persist changes docValues are persisted along with the index, in a columnar fashion per field with variable sized chunking for quick look up. -naive chunk level caching is added per field -data part inside a chunk is snappy compressed -metaHeader inside the chunk index the dv values inside the uncompressed data part -all the fields are docValue persisted in this iteration	2017-12-28 12:05:33 +05:30
Steve Yen	67e0e5973b	scorch mergeStoredAndRemap() memory reuse In mergeStoredAndRemap(), instead of allocating new hashmaps for each document, this commit reuses some arrays that are indexed by fieldId.	2017-12-20 15:18:22 -08:00
Steve Yen	c155255506	scorch optimize zap.Merge() to reuse some buffers	2017-12-20 14:59:53 -08:00
Steve Yen	f6b506134b	import couchbase/vellum instead of couchbaselabs/vellum Also, scrubbed an old couchbaselabs/moss reference in comments. Also, go fmt.	2017-12-19 10:49:57 -08:00
Marty Schoch	e1b0c61e2a	fix bug in handling iterator-done	2017-12-13 22:08:06 -05:00
Marty Schoch	a0e12b2640	add license to a few files missing it	2017-12-13 16:12:29 -05:00
Marty Schoch	85e15628ee	major refactoring of posting details	2017-12-13 16:10:06 -05:00
Marty Schoch	6e2207c445	additional refactoring of build/merge	2017-12-13 15:22:13 -05:00
Marty Schoch	289dc398bd	more refactoring of build/merge	2017-12-13 14:26:11 -05:00
Marty Schoch	1cd3fd7fbe	extract common functionality between build/merge	2017-12-13 14:06:54 -05:00
Marty Schoch	f83c9f2a20	initial cut of merger that actually introduces changes	2017-12-13 13:41:03 -05:00
Marty Schoch	57121e40a8	fix issues identified by errcheck	2017-12-12 11:41:14 -05:00
Marty Schoch	665c3c80ff	initial cut of zap segment merging	2017-12-12 11:21:55 -05:00

48 Commits
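
Commits a0b7508da7, 822457542e and ffdeb8055e describe two conventions: fieldsMap stores fieldID+1 rather than fieldID, and fieldIDs are assigned in sorted field-name order with _id pinned to 0. The Go sketch below illustrates why they help. assignFieldIDs and fieldsSame are hypothetical helpers written for illustration, not zap's actual functions, and the uint16 ID type is an assumption.

```go
package main

import (
	"fmt"
	"sort"
)

// assignFieldIDs is a hypothetical helper (not zap's code). "_id" always gets
// fieldID 0; the remaining unique names get IDs in sorted order. It returns
// fieldsInv (fieldID -> name) and fieldsMap (name -> fieldID+1).
func assignFieldIDs(names []string) ([]string, map[string]uint16) {
	seen := map[string]bool{"_id": true}
	rest := make([]string, 0, len(names))
	for _, n := range names {
		if !seen[n] {
			seen[n] = true
			rest = append(rest, n)
		}
	}
	sort.Strings(rest)
	fieldsInv := append([]string{"_id"}, rest...)

	// Storing fieldID+1 means the zero value (0) returned for a missing name
	// can never be confused with the _id field's real fieldID of 0.
	fieldsMap := make(map[string]uint16, len(fieldsInv))
	for id, name := range fieldsInv {
		fieldsMap[name] = uint16(id + 1)
	}
	return fieldsInv, fieldsMap
}

// fieldsSame reports whether every segment has an identical fieldsInv.
// Because IDs follow sorted names, equal field sets give equal slices
// regardless of the order in which fields were first seen.
func fieldsSame(segFieldsInv [][]string) bool {
	if len(segFieldsInv) == 0 {
		return true
	}
	first := segFieldsInv[0]
	for _, fi := range segFieldsInv[1:] {
		if len(fi) != len(first) {
			return false
		}
		for i := range fi {
			if fi[i] != first[i] {
				return false
			}
		}
	}
	return true
}

func main() {
	a, fieldsMap := assignFieldIDs([]string{"name", "_id", "abv"})
	b, _ := assignFieldIDs([]string{"abv", "name"})
	fmt.Println(a)                                   // [_id abv name]
	fmt.Println(fieldsMap["_id"], fieldsMap["nope"]) // 1 0
	fmt.Println(fieldsSame([][]string{a, b}))        // true
}
```

With plain fieldIDs, fieldsMap["nope"] and fieldsMap["_id"] would both yield 0; the +1 offset keeps "absent" distinguishable without a second comma-ok lookup.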
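
Commit a444c25ddf swaps a hashmap keyed by docNum (whose keys then had to be sorted) for an array indexed by docNum and reused across outer-loop iterations, and 56cdb68f35 appends the termSeparator with a single access of the docTermMap entry. A minimal Go sketch of that pattern follows; the 0xff separator value, the posting type and the fillDocTermMap helper are assumptions made for illustration.

```go
package main

import (
	"bytes"
	"fmt"
)

// termSeparator is an assumed single-byte delimiter between terms.
const termSeparator byte = 0xff

// posting is a hypothetical (docNum, term) pair for one field, with docNum
// already remapped to the merged segment's numbering.
type posting struct {
	docNum uint64
	term   []byte
}

// fillDocTermMap truncates every slot (keeping its capacity, so buffers from
// the previous field are reused) and then appends each term plus a separator
// to its doc's slot. Each posting reads and writes its slot exactly once.
func fillDocTermMap(docTermMap [][]byte, postings []posting) {
	for i := range docTermMap {
		docTermMap[i] = docTermMap[i][:0]
	}
	for _, p := range postings {
		docTermMap[p.docNum] = append(append(docTermMap[p.docNum], p.term...), termSeparator)
	}
}

func main() {
	docTermMap := make([][]byte, 3) // one slot per doc in the merged segment
	fillDocTermMap(docTermMap, []posting{
		{2, []byte("ipa")}, {0, []byte("stout")}, {2, []byte("hazy")},
	})
	// Walk docNums 0..N in order; no sorting of map keys is needed.
	for docNum, terms := range docTermMap {
		if len(terms) == 0 {
			continue // empty slots still cost a visit (the sparse-doc tradeoff)
		}
		fmt.Println(docNum, string(bytes.ReplaceAll(terms, []byte{termSeparator}, []byte(" "))))
	}
}
```

The loop over every slot is what makes this approach worse when docs share few fields: the cost scales with the total doc count, not with the number of docs that actually have the field.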