bleve

Author	SHA1	Message	Date
Steve Yen	5b86da85f3	scorch zap optimize postings itr with tf/loc reader/decoder reuse	2018-03-06 13:30:59 -08:00
Steve Yen	530a3d24cf	scorch zap optimize merge by byte copying freq/norm/loc's This change adds a zap PostingsIterator.nextBytes() method, which is similar to Next(), but instead of returning a Posting instance, nextBytes() returns the encoded freq/norm and location byte slices. The zap merge code then provides those byte slices directly to the intCoder's via a new method, intCoder.AddBytes(), thereby avoiding having to encode many uvarint's.	2018-03-06 13:30:59 -08:00
Steve Yen	655268bec8	scorch zap postings iterator nextDocNum() helper method Refactored out a nextDocNum() helper method from Next() that future optimizations can use.	2018-03-06 07:55:26 -08:00
Sreekanth Sivasankaran	fa5de8e09a	making NumSnapshotsToKeep configurable	2018-03-06 16:22:11 +05:30
Steve Yen	502e64c256	scorch zap Posting doesn't use iterator field	2018-03-05 16:33:13 -08:00
Steve Yen	8f8fd511b7	scorch zap access freqs[offset] outside loop	2018-03-05 12:02:33 -08:00
Steve Yen	a338386a03	scorch build optimize freq/loc slice capacity	2018-03-05 12:02:33 -08:00
Steve Yen	856778ad7b	scorch zap build prealloc docNumbers capacity	2018-03-05 12:02:33 -08:00
Steve Yen	8c0881eab2	scorch zap build reuses mem postingsList/Iterator structs	2018-03-05 12:02:33 -08:00
Steve Yen	85761c6a57	go fmt	2018-03-05 12:02:33 -08:00
Steve Yen	d44c5ad568	scorch stats MaxBatchIntroTime bug fix and more timing stats Added timing stats for in-mem zap merging and file-based zap merging.	2018-03-05 12:02:33 -08:00
Sreekanth Sivasankaran	395b0a312d	adding UTs	2018-03-05 17:02:58 +05:30
Sreekanth Sivasankaran	dec265c481	adding compaction_written_bytes/sec stats to scorch	2018-03-05 16:32:57 +05:30
Steve Yen	884da6f93a	scorch optimize mem processDocument() norm calculation This change moves the norm calculation outside of the inner loop.	2018-03-03 11:58:30 -08:00
Steve Yen	6ae799052a	scorch mem optimize processDocument() stored field	2018-03-03 11:52:33 -08:00
Steve Yen	b7cfef81c9	scorch optimize mem processDocument() dict access This change moves the dict lookup to outside of the loop.	2018-03-03 11:43:25 -08:00
Steve Yen	88c740095b	scorch optimizations for mem.PostingsIterator.Next() & docTermMap Due to the usage rules of iterators, mem.PostingsIterator.Next() can reuse its returned Postings instance. Also, there's a micro optimization in persistDocValues() for one fewer access to the docTermMap in the inner-loop.	2018-03-03 11:31:18 -08:00
Steve Yen	a5253bfe2b	scorch persister goes through introducer to affect root This change allows the introducer to become the only goroutine to modify the root, which in turn allows the introducer to greatly reduce its root lock holding surface area.	2018-03-02 16:14:28 -08:00
Marty Schoch	30acc55d05	remove unnecessary scorch reader wrapper we now use *IndexSnapshot directly	2018-03-02 14:03:54 -08:00
Steve Yen	d61d9e4cf6	scorch stats MaxBatchIntroTime and TotBatchIntroTime	2018-03-02 13:33:06 -08:00
Steve Yen	868a66279e	scorch indexing time stat Looks like this was forgotten along the way -- the stat for analysis time was tracked correctly, but indexing time wasn't.	2018-03-02 11:07:39 -08:00
Steve Yen	7e5bb0bd8d	renamed to CurOnDiskBytes/Files as those are gauges	2018-03-01 14:13:43 -08:00
Marty Schoch	0363b24dd4	update to use new vellum Reset API	2018-03-01 09:37:39 -08:00
Steve Yen	39f9cee910	Merge pull request #789 from steveyen/sreekanth-cb-scorch_stats adding stats for scorch, with no gauges	2018-02-28 17:41:10 -08:00
Steve Yen	1b661ef844	stats cleanup, renaming, gauges replaced with counters	2018-02-28 17:03:28 -08:00
Steve Yen	7d46d2c7ae	scorch zap intcoder encoder is never nil	2018-02-28 10:09:21 -08:00
Sreekanth Sivasankaran	4b742505aa	adding stats for scorch	2018-02-28 15:31:55 +05:30
Steve Yen	dd7d93ee5e	scorch zap loadChunk reuses Location slices	2018-02-27 18:01:48 -08:00
Steve Yen	4dbb4b1495	scorch zap posting reuses freqNorm & loc reader and decoder	2018-02-27 18:01:48 -08:00
Steve Yen	a32362ba2e	MB-28403: scorch introduceMerge doesn't prealloc segments capacity There's now multiple competing merge activities (file-merging and in-memory merging during persistence), so the simple math to precalculate capacity for the slice of segments in introduceMerge() no longer works for all cases and might have negative capacity. This change removes that (sometimes wrong) precalculation, and instead depends on append() to grow the slice correctly.	2018-02-27 15:14:34 -08:00
Steve Yen	3f1dcb6078	scorch zap merge optimize drops lookup to outside of loop	2018-02-27 09:23:29 -08:00
Steve Yen	99ed127176	scorch zap merge optimize newDocNums lookup to outside of loop And, also a "go fmt".	2018-02-26 14:23:55 -08:00
Steve Yen	98d5d7bd81	scorch zap chunkedIntCoder optimizations The optimizations / changes include... - reuse of a memory buf when serializing varint's. - reuse of a govarint.U64Base128Encoder instance, as it's a thin wrapper around an underlying chunkBuf, so Reset()'s on the chunkBuf is enough for encoder reuse. - chunkedIntCoder.Write() method was changed to invoke w.Write() less often by forming a larger, reused buf. Profiling and analysis showed w.Write() was getting called a lot, often with tiny 1 or 2 byte inputs. The theory is w.Write() and its underlying memmove() can be more efficient when provided with larger bufs. - some repeated code removal, by reusing the Close() method.	2018-02-26 14:17:09 -08:00
Steve Yen	ce2332e111	scorch zap merge reuses tf/locEncoder across terms The finishTerm() helper func that's invoked on every outer loop resets the tf/locEncoders so they can be safely reused.	2018-02-26 11:37:11 -08:00
Marty Schoch	eca31dfd27	Merge pull request #777 from sreekanth-cb/persister_pause pausing persister until merging catches up	2018-02-26 14:36:07 -05:00
Sreekanth Sivasankaran	e02849fcda	fix the indentation	2018-02-26 16:21:33 +05:30
Sreekanth Sivasankaran	c45822347f	Merge branch 'master' into mergeplanner_options	2018-02-26 15:59:20 +05:30
Sreekanth Sivasankaran	e4cc79a9ad	adopting json parsing on options, fixed the inadvertent option modification	2018-02-26 15:56:30 +05:30
Sreekanth Sivasankaran	f0a65f041d	cleaning up the wait loop	2018-02-25 20:58:53 +05:30
Sreekanth Sivasankaran	3a571ad283	Merge branch 'master' into persister_pause	2018-02-24 23:57:20 +05:30
Sreekanth Sivasankaran	874829759b	cleaning up the wait loop	2018-02-24 23:53:49 +05:30
Sreekanth Sivasankaran	4109e327ff	Merge pull request #771 from sreekanth-cb/merge_handling_empty_seg_tasks Fix for empty segment merge handling	2018-02-24 10:48:31 +05:30
Sreekanth Sivasankaran	683e195ac4	adding empty segment handling during introduction cleaning up the segment live size check	2018-02-24 07:03:27 +05:30
abhinavdangeti	da70758635	Handle case where store snapshot isn't closed in upsidedown's Batch() API	2018-02-23 14:47:22 -08:00
Steve Yen	c50d9b4023	scorch conditional merging during persistSnapshot() As part of this change, there are new helper methods -- persistSnapshotMaybeMerge() and persistSnapshotDirect().	2018-02-23 09:17:02 -08:00
Sreekanth Sivasankaran	a1db057656	configurable mergePlanner options mergePlanner options are parsed from the scorch configs parameters	2018-02-23 16:09:37 +05:30
Sreekanth Sivasankaran	a8ebf2a553	lowering epochDistance to 5, fixing the lastMergedEpoch value updates	2018-02-21 17:25:14 +05:30
Steve Yen	a0b7508da7	scorch zap mergeSegmentBases() func As part of this, zap.MergeToWriter() now returns more information -- enough so that callers can now create their own SegmentBase instances. Also, the fieldsMap maintained and returned by zap.MergeToWriter() is now a mapping from fieldName ==> fieldID+1 (instead of the previous mapping from fieldName ==> fieldID). This makes it similar to how fieldsMap are handled in other parts of zap to avoid "zero value" issues.	2018-02-19 14:13:31 -08:00
Steve Yen	720010783e	scorch zap InitSegmentBase() helper func Refactored out a zap.InitSegmentBase() func so that non-zap packages can create SegmentBase instances.	2018-02-19 14:13:31 -08:00
Steve Yen	656220ca9d	Merge pull request #769 from steveyen/scorch-rollback-ignores-unsafeBatch scorch rollback ignores unsafeBatch flag	2018-02-15 18:51:59 -08:00
Sreekanth Sivasankaran	606a270669	Fix for empty segment merge handling Avoid creating new files with empty segments tasks during the merge operation, skips the incorrect appending of a newer segment during merge.	2018-02-15 16:44:20 +05:30
Sreekanth Sivasankaran	35611f4287	Merge branch 'master' into persister_pause	2018-02-14 16:53:06 +05:30
Sreekanth Sivasankaran	6f2797bec3	Adding a pause to persister until the merger catches up	2018-02-14 16:39:26 +05:30
Steve Yen	030469a351	Merge pull request #767 from steveyen/persistSnapshot-err-handling improvements to err handling in persistSnapshot(), etc	2018-02-13 14:53:42 -08:00
Steve Yen	57fc03258e	scorch rollback ignores unsafeBatch flag See also: https://github.com/blevesearch/bleve/issues/760	2018-02-13 10:21:42 -08:00
Steve Yen	fe544f3352	scorch zap merge uses enumerator for vellum.Iterator's	2018-02-12 21:28:46 -08:00
Steve Yen	a073424e5a	scorch zap dict.postingsListFromOffset() method A helper method that can create a PostingsList if the caller already knows the postingsOffset.	2018-02-12 20:54:07 -08:00
Steve Yen	2158e06c40	scorch zap merge collects dicts & itrs in lock-step The theory with this change is that the dicts and itrs should be positionally in "lock-step" with paired entries. And, since later code also uses the same array indexing to access the drops and newDocNums, those also need to be positionally in pair-wise lock-step, too.	2018-02-12 20:54:07 -08:00
Steve Yen	95a4f37e5c	scorch zap enumerator impl that joins multiple vellum iterators Unlike vellum's MergeIterator, the enumerator introduced in this commit doesn't merge when there are matching keys across iterators. Instead, the enumerator implementation provides a traversal of all the tuples of (key, iteratorIndex, val) from the underlying vellum iterators, ordered by key ASC, iteratorIndex ASC.	2018-02-12 20:54:06 -08:00
Steve Yen	e37c563c56	scorch zap merge move fieldDvLocsOffset var declaration Move the var declaration to nearer where it's used.	2018-02-08 18:03:09 -08:00
Steve Yen	f177f07613	scorch zap segment merging reuses prealloc'ed PostingsIterator During zap segment merging, a new zap PostingsIterator was allocated for every field X segment X term. This change optimizes by reusing a single PostingsIterator instance per persistMergedRest() invocation. And, also unused fields are removed from the PostingsIterator.	2018-02-08 17:24:30 -08:00
Steve Yen	6f5f90cd41	scorch zap segment cleanup handling for some edge cases Two cases in this commit... If we're shutting down, the merger might not have handed off its latest merged segment to the introducer yet, so the merger still owns the segment and needs to Close() that segment itself. In persistSnapshot(), there might be cases where the persister might not be able to swap in its newly persisted segments -- so, the persistSnapshot() needs to Close() those segments itself.	2018-02-08 14:04:04 -08:00
Steve Yen	83272a9629	scorch persistSnapshot() err handling & propagation	2018-02-08 14:03:59 -08:00
Steve Yen	dee6a2b1c6	scorch persistSnapshot() consistently uses err to commit vs abort Some codepaths in persistSnapshot() were saving errors into an err2 local variable, which might lead incorrectly to commit during an error situation rather than abort.	2018-02-08 14:02:35 -08:00
Steve Yen	91ac0d011a	scorch uses segment.id to encode boltdb sub-bucket key fixes #764	2018-02-08 13:25:16 -08:00
Steve Yen	8a7990427f	Merge pull request #765 from steveyen/more-TestIndexRollback-fixes fix for TestIndexRollback unit tests	2018-02-08 12:45:28 -08:00
Steve Yen	d0644fec12	scorch persistSnapshot comments update See also: https://github.com/blevesearch/bleve/issues/763	2018-02-08 12:22:58 -08:00
Steve Yen	99852accb0	scorch RollbackPoints() no error at start & fix TestIndexRollback When a scorch is just opened and is "empty", RollbackPoints() no longer considers that an error situation. Also, this commit makes the TestIndexRollback unit tests a bit more forgiving to races, as we were seeing failures sometimes in travis-CI environments (TestIndexRollback was passing fine on my dev macbook). The theory is the double-looping in the persisterLoop would sometimes be racy, leading to 1 or 2 rollback points.	2018-02-08 11:45:25 -08:00
Steve Yen	ed4826b189	scorch zap merge optimization to byte-copy storedDocs The optimization to byte-copy all the storedDocs for a given segment during merging kicks in when the fields are the same across all segments and when there are no deletions for that given segment. This can happen, for example, during data loading or insert-only scenarios. As part of this commit, the Segment.copyStoredDocs() method was added, which uses a single Write() call to copy all the stored docs bytes of a segment to a writer in one shot. And, getDocStoredMetaAndCompressed() was refactored into a related helper function, getDocStoredOffsets(), which provides the storedDocs metadata (offsets & lengths) for a doc.	2018-02-08 09:08:35 -08:00
Steve Yen	0b50a20cac	scorch zap move docDropped const to earlier in file	2018-02-08 09:06:31 -08:00
Steve Yen	822457542e	scorch zap VERSION bump: check whether fields are the same at merge COMPATIBILITY NOTE: scorch zap version bumped in this commit. The version bump is because mergeFields() now computes whether fields are the same across segments and it relies on the previous commit where fieldID's are assigned in field name sorted order (albeit with _id field always having fieldID of 0). Potential future commits might rely on this info that "fields are the same across segments" for more optimizations, etc.	2018-02-08 09:06:30 -08:00
Steve Yen	ffdeb8055e	scorch sorts fields by name to assign fieldID's This is a stepping stone to allow easier future comparisons of field maps and potential merge optimizations. In bleve-blast tests on a 2015 macbook (50K wikipedia docs, 8 indexers, batch size 100, ssd), this does not seem to have a distinct effect on indexing throughput.	2018-02-08 09:06:30 -08:00
Marty Schoch	1af90936c4	Merge pull request #751 from sreekanth-cb/merger_persister_handshake_fix fix for merger persister handshake stalemate	2018-02-08 11:03:01 -05:00
Marty Schoch	0bcfb15ace	Merge pull request #754 from sreekanth-cb/mergeplan_edge_tuning tuning the edge for merge-task execution loop	2018-02-08 10:59:03 -05:00
Marty Schoch	534bd5ef4d	Merge pull request #753 from steveyen/zap-rollback-test-fixes scorch zap TestIndexRollback fixes	2018-02-08 10:57:41 -05:00
Marty Schoch	f531a248e7	Merge pull request #749 from sreekanth-cb/zapfile_cleanup_fix unblock the files for clean up, esp for merged new segment files	2018-02-08 10:53:41 -05:00
Sreekanth Sivasankaran	feecce1eb2	fix for merger persister handshake stalemate The slow merger was lagging behind the fast persister to a persister notify send-loop while the persister awaits any new introductions from the introducer, totally blocking the merger. This fix, along with the deleted files eligibility flipping, brings the file count to around 6 to 11 files per shard for both travel and beer samples.	2018-02-08 11:00:21 +05:30
Steve Yen	a83ee0f364	scorch zap.MergeToWriter() takes SegmentBases instead of Segments This change turns zap.MergeToWriter() into a public func, so that it's now directly callable from outside packages (such as from scorch's top-level merger or persister). And, MergeToWriter() now takes input of SegmentBases instead of Segments, so that it can now work on either in-memory zap segments or file-based zap segments. This is yet another stepping stone towards in-memory merging of zap segments.	2018-02-07 14:38:13 -08:00
Steve Yen	8c2520d55c	scorch zap optimize via postingsList reuse pprof graphs were showing many postingsList allocations during merging, so this change optimizes by reusing postingsList memory in the merging loops.	2018-02-07 14:33:20 -08:00
Steve Yen	03c8b2b7ec	scorch mem segment optimizes DictEntry's across Next() calls This change optimizes the scorch/mem DictionaryIterator by reusing a DictEntry struct across multiple Next() calls. This follows the same optimization trick and Next() semantics as upsidedown's FieldDict implementation.	2018-02-07 14:17:48 -08:00
Steve Yen	0dfd73d6cc	scorch zap mergeStoredAndRemap loop optimization This change avoids an array/slice access in a loop body.	2018-02-06 17:10:44 -08:00
Steve Yen	eb1d269521	Merge pull request #748 from steveyen/master scorch zap merge related refactorings / optimizations	2018-02-06 07:52:17 -08:00
Sreekanth Sivasankaran	07274c036d	tuning the edge for merge-task execution loop Adjusting the merge task creation loop to accommodate the newly merged segments so that the eventual merge results/ number of segments stay within the calculated budget.	2018-02-06 13:48:16 +05:30
Steve Yen	a280ba7cf8	scorch zap TestIndexRollback fixes The TestIndexRollback unit test was failing more often than ever (perhaps raciness?), so this commit tries to remove avenues of raciness in the test... - The Scorch.Open() method is refactored into an Scorch.openBolt() helper method in order to allow unit tests to control which background goroutines are started. - TestIndexRollback() doesn't start the merger goroutine, to simulate a really slow merger that never gets around to merging old segments. - TestIndexRollback() creates a long-lived reader after the first batch, so that the first index snapshot isn't removed due to the long-lived reader's ref-count. - TestIndexRollback() temporarily bumps NumSnapshotsToKeep to a large number so the persister isn't tempted to removeOldData() that we're trying to rollback to.	2018-02-05 12:23:58 -08:00
Steve Yen	fdb240f5f9	more zap merge-planner CalcBudget tests at larger sizes Helps provide a sense of how # of segments grows as # of documents grows. Ex: 1B docs => budget of 54 segments.	2018-02-05 10:02:47 -08:00
Steve Yen	c09e2a08ca	scorch zap chunkedContentCoder reuses chunk metadata slice memory And, renamed the chunk MetaData.DocID field to DocNum for naming correctness, where much of this commit is the mechanical effect of that rename.	2018-02-05 07:39:16 -08:00
Steve Yen	3da191852d	scorch zap tighten up prepareSegment()'s lock area	2018-02-05 07:39:16 -08:00
Steve Yen	6578655758	scorch zap refactored out mergeToWriter() func This is a step towards supporting in-memory zap segment merging.	2018-02-05 07:39:16 -08:00
Steve Yen	eb21bf8315	scorch zap merge & build share persistStoredFieldValues() Refactored out a helper func, persistStoredFieldValues(), that both the persistence and merge codepaths now share.	2018-02-05 07:38:55 -08:00
Sreekanth Sivasankaran	9636209ae5	Update persister.go comment updated	2018-02-05 20:49:30 +05:30
Sreekanth Sivasankaran	678c412157	unblock the files for clean up, esp for merged new segment files	2018-02-02 14:44:02 +05:30
Steve Yen	714f5321e0	scorch zap merge storedFieldVals inner loop optimization	2018-02-01 16:28:15 -08:00
Steve Yen	175f80403a	Merge pull request #747 from steveyen/master scorch zap DictIterator term count fixed and more merge unit tests	2018-02-01 10:13:18 -08:00
Abhinav Dangeti	c24f8944c4	Merge pull request #738 from abhinavdangeti/scorch-stats Add support for certain disk stats	2018-02-01 08:35:59 -08:00
Steve Yen	93b037cdbb	scorch zap TestMergeWithUpdates()	2018-01-31 11:44:41 -08:00
Steve Yen	4dd64b68fa	scorch zap TestMergeWithEmptySegment(s)	2018-01-30 22:27:40 -08:00
Steve Yen	684ee3c0e7	scorch zap DictIterator term count fixed and more merge unit tests The zap DictionaryIterator Next() was incorrectly returning the postingsList offset as the term count. As part of this, refactored out a PostingsList.read() helper method. Also added more merge unit test scenarios, including merging a segment for a few rounds to see if there are differences before/after merging.	2018-01-30 21:22:06 -08:00
Steve Yen	634cfa0560	scorch zap chunkedIntCoder optimization to prealloc some final buf	2018-01-29 11:03:53 -08:00
Steve Yen	a444c25ddf	scorch zap merge uses array for docTermMap with no sorting Instead of sorting docNum keys from a hashmap, this change instead iterates from docNum 0 to N and uses an array instead of hashmap. The array is also reused across outer loop iterations. This optimizes for when there's a lot of structural similarity between docs, where many/most docs have the same fields. i.e., beers, breweries. If every doc has completely different fields, then this change might produce worse behavior compared to the previous sparse hashmap approach.	2018-01-29 10:47:08 -08:00
Steve Yen	745575a6c1	scorch zap mergeStoredAndRemap uses array indexing, not append() Since we have right array size preallocated, we don't need the extra capacity checking of append().	2018-01-27 11:35:10 -08:00
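
Commit 95a4f37e5c above describes an enumerator that joins several vellum iterators without merging matching keys, yielding tuples of (key, iteratorIndex, val) ordered by key ASC, then iteratorIndex ASC. The Go sketch below illustrates only that ordering. It is not the code from the commit and does not use vellum: the `entry` and `tuple` types and the `enumerate` function are hypothetical names, and each source is assumed to be a plain slice already sorted by key with no duplicate keys.

```go
package main

import "fmt"

// entry is one key/val pair from a single key-sorted source (hypothetical type).
type entry struct {
	Key string
	Val uint64
}

// tuple is one (key, iteratorIndex, val) result of the enumeration (hypothetical type).
type tuple struct {
	Key    string
	ItrIdx int
	Val    uint64
}

// enumerate walks several key-sorted sources and returns every entry,
// ordered by key ASC, then by source index ASC. Entries with matching
// keys are all kept rather than merged into one.
func enumerate(sources [][]entry) []tuple {
	pos := make([]int, len(sources)) // cursor into each source
	var out []tuple
	for {
		// Find the smallest current key across all non-exhausted sources.
		found := false
		var lowK string
		for i, src := range sources {
			if pos[i] < len(src) {
				k := src[pos[i]].Key
				if !found || k < lowK {
					lowK = k
					found = true
				}
			}
		}
		if !found {
			return out // every source is exhausted
		}
		// Emit that key from each source holding it, in source-index order.
		for i, src := range sources {
			if pos[i] < len(src) && src[pos[i]].Key == lowK {
				out = append(out, tuple{Key: lowK, ItrIdx: i, Val: src[pos[i]].Val})
				pos[i]++
			}
		}
	}
}

func main() {
	sources := [][]entry{
		{{"apple", 1}, {"cat", 3}},
		{{"apple", 10}, {"bat", 20}},
	}
	for _, t := range enumerate(sources) {
		fmt.Printf("%s %d %d\n", t.Key, t.ItrIdx, t.Val)
	}
}
```

With the two sample sources, "apple" is emitted twice (once per source, index 0 before index 1), which is the behavior the commit message contrasts with vellum's MergeIterator.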


734 Commits