bleve

Author	SHA1	Message	Date
Steve Yen	6f5f90cd41	scorch zap segment cleanup handling for some edge cases Two cases in this commit... If we're shutting down, the merger might not have handed off its latest merged segment to the introducer yet, so the merger still owns the segment and needs to Close() that segment itself. In persistSnapshot(), there might be cases where the persister might not be able to swap in its newly persisted segments -- so, the persistSnapshot() needs to Close() those segments itself.	2018-02-08 14:04:04 -08:00
Steve Yen	83272a9629	scorch persistSnapshot() err handling & propagation	2018-02-08 14:03:59 -08:00
Steve Yen	dee6a2b1c6	scorch persistSnapshot() consistently uses err to commit vs abort Some codepaths in persistSnapshot() were saving errors into an err2 local variable, which might lead incorrectly to commit during an error situation rather than abort.	2018-02-08 14:02:35 -08:00
Steve Yen	91ac0d011a	scorch uses segment.id to encode boltdb sub-bucket key fixes #764	2018-02-08 13:25:16 -08:00
Steve Yen	8a7990427f	Merge pull request #765 from steveyen/more-TestIndexRollback-fixes fix for TestIndexRollback unit tests	2018-02-08 12:45:28 -08:00
Steve Yen	d0644fec12	scorch persistSnapshot comments update See also: https://github.com/blevesearch/bleve/issues/763	2018-02-08 12:22:58 -08:00
Steve Yen	99852accb0	scorch RollbackPoints() no error at start & fix TestIndexRollback When a scorch is just opened and is "empty", RollbackPoints() no longer considers that an error situation. Also, this commit makes the TestIndexRollback unit tests a bit more forgiving to races, as we were seeing failures sometimes in travis-CI environments (TestIndexRollback was passing fine on my dev macbook). The theory is the double-looping in the persisterLoop would sometimes be racy, leading to 1 or 2 rollback points.	2018-02-08 11:45:25 -08:00
Steve Yen	ed4826b189	scorch zap merge optimization to byte-copy storedDocs The optimization to byte-copy all the storedDocs for a given segment during merging kicks in when the fields are the same across all segments and when there are no deletions for that given segment. This can happen, for example, during data loading or insert-only scenarios. As part of this commit, the Segment.copyStoredDocs() method was added, which uses a single Write() call to copy all the stored docs bytes of a segment to a writer in one shot. And, getDocStoredMetaAndCompressed() was refactored into a related helper function, getDocStoredOffsets(), which provides the storedDocs metadata (offsets & lengths) for a doc.	2018-02-08 09:08:35 -08:00
Steve Yen	0b50a20cac	scorch zap move docDropped const to earlier in file	2018-02-08 09:06:31 -08:00
Steve Yen	822457542e	scorch zap VERSION bump: check whether fields are the same at merge COMPATIBILITY NOTE: scorch zap version bumped in this commit. The version bump is because mergeFields() now computes whether fields are the same across segments and it relies on the previous commit where fieldID's are assigned in field name sorted order (albeit with _id field always having fieldID of 0). Potential future commits might rely on this info that "fields are the same across segments" for more optimizations, etc.	2018-02-08 09:06:30 -08:00
Steve Yen	ffdeb8055e	scorch sorts fields by name to assign fieldID's This is a stepping stone to allow easier future comparisons of field maps and potential merge optimizations. In bleve-blast tests on a 2015 macbook (50K wikipedia docs, 8 indexers, batch size 100, ssd), this does not seem to have a distinct effect on indexing throughput.	2018-02-08 09:06:30 -08:00
Marty Schoch	1af90936c4	Merge pull request #751 from sreekanth-cb/merger_persister_handshake_fix fix for merger persister handshake stalemate	2018-02-08 11:03:01 -05:00
Marty Schoch	0bcfb15ace	Merge pull request #754 from sreekanth-cb/mergeplan_edge_tuning tuning the edge for merge-task execution loop	2018-02-08 10:59:03 -05:00
Marty Schoch	534bd5ef4d	Merge pull request #753 from steveyen/zap-rollback-test-fixes scorch zap TestIndexRollback fixes	2018-02-08 10:57:41 -05:00
Marty Schoch	f531a248e7	Merge pull request #749 from sreekanth-cb/zapfile_cleanup_fix unblock the files for clean up, esp for merged new segment files	2018-02-08 10:53:41 -05:00
Sreekanth Sivasankaran	feecce1eb2	fix for merger persister handshake stalemate The slow merger was lagging behind the fast persister to a persister notify send-loop while the persister awaits for any new introductions from introducer, totally blocking the merger. This fix along with the deleted files eligibility flipping makes the file count to around 6 to 11 files per shard for both travel and beer samples	2018-02-08 11:00:21 +05:30
Steve Yen	a83ee0f364	scorch zap.MergeToWriter() takes SegmentBases instead of Segments This change turns zap.MergeToWriter() into a public func, so that it's now directly callable from outside packages (such as from scorch's top-level merger or persister). And, MergeToWriter() now takes input of SegmentBases instead of Segments, so that it can now work on either in-memory zap segments or file-based zap segments. This is yet another stepping stone towards in-memory merging of zap segments.	2018-02-07 14:38:13 -08:00
Steve Yen	8c2520d55c	scorch zap optimize via postingsList reuse pprof graphs were showing many postingsList allocations during merging, so this change optimizes by reusing postingList memory in the merging loops.	2018-02-07 14:33:20 -08:00
Steve Yen	03c8b2b7ec	scorch mem segment optimizes DictEntry's across Next() calls This change optimizes the scorch/mem DictionaryIterator by reusing a DictEntry struct across multiple Next() calls. This follows the same optimization trick and Next() semantics as upsidedown's FieldDict implementation.	2018-02-07 14:17:48 -08:00
Steve Yen	0dfd73d6cc	scorch zap mergeStoredAndRemap loop optimization This change avoids an array/slice access in a loop body.	2018-02-06 17:10:44 -08:00
Steve Yen	eb1d269521	Merge pull request #748 from steveyen/master scorch zap merge related refactorings / optimizations	2018-02-06 07:52:17 -08:00
Sreekanth Sivasankaran	07274c036d	tuning the edge for merge-task execution loop Adjusting the merge task creation loop to accommodate the newly merged segments so that the eventual merge results/ number of segments stay within the calculated budget.	2018-02-06 13:48:16 +05:30
Steve Yen	a280ba7cf8	scorch zap TestIndexRollback fixes The TestIndexRollback unit test was failing more often than ever (perhaps raciness?), so this commit tries to remove avenues of raciness in the test... - The Scorch.Open() method is refactored into an Scorch.openBolt() helper method in order to allow unit tests to control which background goroutines are started. - TestIndexRollback() doesn't start the merger goroutine, to simulate a really slow merger that never gets around to merging old segments. - TestIndexRollback() creates a long-lived reader after the first batch, so that the first index snapshot isn't removed due to the long-lived reader's ref-count. - TestIndexRollback() temporarily bumps NumSnapshotsToKeep to a large number so the persister isn't tempted to removeOldData() that we're trying to rollback to.	2018-02-05 12:23:58 -08:00
Steve Yen	fdb240f5f9	more zap merge-planner CalcBudget tests at larger sizes Helps provide a sense of how # of segments grows as # of documents grows. Ex: 1B docs => budget of 54 segments.	2018-02-05 10:02:47 -08:00
Steve Yen	c09e2a08ca	scorch zap chunkedContentCoder reuses chunk metadata slice memory And, renamed the chunk MetaData.DocID field to DocNum for naming correctness, where much of this commit is the mechanical effect of that rename.	2018-02-05 07:39:16 -08:00
Steve Yen	3da191852d	scorch zap tighten up prepareSegment()'s lock area	2018-02-05 07:39:16 -08:00
Steve Yen	6578655758	scorch zap refactored out mergeToWriter() func This is a step towards supporting in-memory zap segment merging.	2018-02-05 07:39:16 -08:00
Steve Yen	eb21bf8315	scorch zap merge & build share persistStoredFieldValues() Refactored out a helper func, persistStoredFieldValues(), that both the persistence and merge codepaths now share.	2018-02-05 07:38:55 -08:00
Sreekanth Sivasankaran	9636209ae5	Update persister.go comment updated	2018-02-05 20:49:30 +05:30
Sreekanth Sivasankaran	678c412157	unblock the files for clean up, esp for merged new segment files	2018-02-02 14:44:02 +05:30
Steve Yen	714f5321e0	scorch zap merge storedFieldVals inner loop optimization	2018-02-01 16:28:15 -08:00
Steve Yen	175f80403a	Merge pull request #747 from steveyen/master scorch zap DictIterator term count fixed and more merge unit tests	2018-02-01 10:13:18 -08:00
Abhinav Dangeti	c24f8944c4	Merge pull request #738 from abhinavdangeti/scorch-stats Add support for certain disk stats	2018-02-01 08:35:59 -08:00
Steve Yen	93b037cdbb	scorch zap TestMergeWithUpdates()	2018-01-31 11:44:41 -08:00
Steve Yen	4dd64b68fa	scorch zap TestMergeWithEmptySegment(s)	2018-01-30 22:27:40 -08:00
Steve Yen	684ee3c0e7	scorch zap DictIterator term count fixed and more merge unit tests The zap DictionaryIterator Next() was incorrectly returning the postingsList offset as the term count. As part of this, refactored out a PostingsList.read() helper method. Also added more merge unit test scenarios, including merging a segment for a few rounds to see if there are differences before/after merging.	2018-01-30 21:22:06 -08:00
Steve Yen	634cfa0560	scorch zap chunkedIntCoder optimization to prealloc some final buf	2018-01-29 11:03:53 -08:00
Steve Yen	a444c25ddf	scorch zap merge uses array for docTermMap with no sorting Instead of sorting docNum keys from a hashmap, this change instead iterates from docNum 0 to N and uses an array instead of hashmap. The array is also reused across outer loop iterations. This optimizes for when there's a lot of structural similarity between docs, where many/most docs have the same fields. i.e., beers, breweries. If every doc has completely different fields, then this change might produce worse behavior compared to the previous sparse hashmap approach.	2018-01-29 10:47:08 -08:00
Steve Yen	745575a6c1	scorch zap mergeStoredAndRemap uses array indexing, not append() Since we have the right array size preallocated, we don't need the extra capacity checking of append().	2018-01-27 11:35:10 -08:00
Steve Yen	8dd17a3b20	scorch zap mergeStoredAndRemap uses continue for less indentation	2018-01-27 11:35:10 -08:00
Steve Yen	0041664bc4	scorch zap merge computeNewDocCount() optimize 1 variable	2018-01-27 11:35:10 -08:00
Steve Yen	6985db13a0	scorch zap merge reuses docNumbers array	2018-01-27 11:35:10 -08:00
Steve Yen	916bbf4125	scorch zap merge prealloc's docTermMap capacity	2018-01-27 11:35:10 -08:00
Steve Yen	56cdb68f35	scorch zap merge checks err2 not err Also, optimize the appending of the termSeparator so that the docTermMap is accessed and updated just once.	2018-01-27 11:35:10 -08:00
Steve Yen	3030d4edb5	scorch zap merge preallocs segNewDocNums capacity	2018-01-27 11:35:10 -08:00
Steve Yen	9038d75c98	scorch zap allocate govarint.U64Base128Encoder just once Instead of allocating a govarint.U64Base128Encoder in the inner loop, allocate it just once on the outside, as it appears that it's just a thin wrapper around binary.PutUvarint().	2018-01-27 11:35:10 -08:00
Steve Yen	10dd5489c2	scorch zap Dict.postingsList() takes []byte for more mem control This allows callers that already have a []byte term to avoid string'ification garbage.	2018-01-27 11:35:10 -08:00
Steve Yen	6a17ff48c7	scorch zap removed unneeded []byte cast of term	2018-01-27 11:35:10 -08:00
Steve Yen	d389e2bb40	scorch zap merge file cleanup on error, and some minor prealloc's	2018-01-27 11:35:10 -08:00
Steve Yen	29d526a7c2	scorch zap merge uses DefaultChunkFactor	2018-01-27 11:35:10 -08:00
Steve Yen	603425c2c5	scorch zap mergerLoop missing fireAsyncError case	2018-01-27 11:35:10 -08:00
Steve Yen	37121c3b49	scorch zap writeRoaringWithLen optimized with reused bufs	2018-01-27 11:35:10 -08:00
Steve Yen	5a035dc9aa	scorch zap in-memory segment representation (SegmentBase) The zap SegmentBase struct is a refactoring of the zap Segment into the subset of fields that are needed for read-only ops, without any persistence related info. This allows us to use zap's optimized data encoding as scorch's in-memory segments. The zap Segment struct now embeds a zap SegmentBase struct, and layers on persistence. Both the zap Segment and zap SegmentBase implement scorch's Segment interface.	2018-01-27 11:35:10 -08:00
Steve Yen	dc62324e02	scorch zap miscellaneous typos	2018-01-27 11:35:10 -08:00
abhinavdangeti	567d756c27	Add support for certain disk stats + num_bytes_used_disk + num_files_on_disk	2018-01-24 14:10:14 -08:00
Steve Yen	34fd77709f	scorch unlocks in introduceSegment's DocNumbers() error codepath	2018-01-20 17:17:16 -08:00
abhinavdangeti	1176c73a9c	Include overhead from data structures in segment's SizeInBytes + Account for all the overhead incurred from the data structures within mem.Segment and zap.Segment. - SizeOfMap = 8 - SizeOfPointer = 8 - SizeOfSlice = 24 - SizeOfString = 16 + Include overhead from certain new fields as well.	2018-01-17 11:11:44 -08:00
Steve Yen	71d6d1691b	scorch zap optimizations of inner loops and easy preallocs	2018-01-15 23:04:23 -08:00
Steve Yen	d682c85a7b	scorch mem segments uses backing array trick even more This change invokes make() only once per distinct type to allocate the large, contiguous backing arrays for the mem segment.	2018-01-15 19:17:39 -08:00
Steve Yen	0f19b542a3	scorch mem segment prealloc's Locfields/starts/ends/pos/arraypos This change preallocates more of the backing arrays for Locfields, Locstarts, Locends, Locpos, Locarraypos sub-slices of a scorch mem segment. On small bleve-blast tests (50K wiki docs) on a dev macbook, scorch indexing throughput seems to improve from 15MB/sec to 20MB/sec after the recent series of preallocation changes.	2018-01-15 18:40:28 -08:00
Steve Yen	a84bd122d2	scorch mem segment preallocates sub-slices via # terms This change tracks the number of terms per posting list to preallocate the sub-slices for the Freqs & Norms.	2018-01-15 18:20:43 -08:00
Steve Yen	a4110d325c	scorch mem segment preallocates slices that are key'ed by postingId The scorch mem segment build phase uses the append() idiom to populate various slices that are keyed by postings list id's. These slices include... * Postings * PostingsLocs * Freqs * Norms * Locfields * Locstarts * Locends * Locpos * Locarraypos This change introduces an initialization step that preallocates those slices up-front, by assigning postings list id's to terms up-front. This change also has an additional effect of simplifying the processDocument() logic to no longer have to worry about a first-time initialization case, removing some duplicate'ish code.	2018-01-15 16:53:39 -08:00
Steve Yen	917c470791	scorch mem segment VisitDocument() accesses StoredTypes/Pos outside of loop	2018-01-15 11:54:46 -08:00
Steve Yen	e7bd6026eb	scorch mem segment preallocs docMap/fieldLens with capacity The first time through, startNumFields should be 0, where there ought to be more optimization assuming later docs have similar fields as the first doc.	2018-01-15 11:52:20 -08:00
Steve Yen	d777d7c365	scorch mem segment comments consistency	2018-01-15 11:08:21 -08:00
Marty Schoch	4e82a8a0ca	Merge pull request #726 from sreekanth-cb/docValue_configs DocValue Config, new API Changes	2018-01-10 18:11:18 -05:00
Sreekanth Sivasankaran	53aef2104e	fixing err handling in UTs, name changes	2018-01-10 22:00:26 +05:30
abhinavdangeti	43bfcc00c9	Do not account mmap'ed part of zap segments in MemoryUsed This API is designed to emit only the dirty "unpersisted" bytes. This does not include the mmap'ed part in the zap segments (disk).	2018-01-09 09:43:53 -08:00
Sreekanth Sivasankaran	4c256f5669	DocValue Config, new API Changes -VisitableDocValueFields API for persisted DV field list -making dv configs overridable at field level -enabling on the fly/runtime un inverting of doc values -few UT updates	2018-01-08 10:58:33 +05:30
Marty Schoch	1788a03803	remove junk from end of scorch readme	2018-01-06 21:09:53 -05:00
Marty Schoch	e756c7acf0	add initial support for async error callback	2018-01-05 16:43:16 -05:00
Marty Schoch	6237479605	fix race condition in setting up event callbacks previous approach used SetEventCallback method which allowed you to change the callback, unfortunately that also included times after the goroutines were started and potentially firing the callback. checking lock on this would be too expensive, so instead we go for an approach that allows callbacks to be registered by name during process init(), then upon opening up an index a string config key 'eventCallbackName' is used to look up the appropriate callback function. also, since this string config name is serializable, it fits into the existing bleve index metadata without any new issues.	2018-01-05 13:46:03 -05:00
Marty Schoch	57a075afdb	improving command-line tool for scorch	2018-01-05 11:50:07 -05:00
Marty Schoch	c691cd2bb5	refactor scorch/zap command-line tools under bleve zap command-line tool added to main bleve command-line tool this required physical relocation due to the vendoring used only on the bleve command-line tool (unforeseen limitation) a new scorch command-line tool has also been introduced and for the same reasons it is physically stored under the top-level bleve command-line tool as well	2018-01-05 10:17:18 -05:00
Abhinav Dangeti	dee1dd9bc8	Merge pull request #720 from abhinavdangeti/scorch Updated Rollback APIs	2018-01-04 14:51:33 -08:00
abhinavdangeti	111f0d0721	Updated Rollback APIs New APIs: + RollbackPoints() - Retrieves the available list of rollback points: epoch+meta. - The application will need to check with the meta to decide on the rollback point. + Rollback() - API requires a rollback point identified by the first API. - Atomically & Durably rolls back the index to specified point, provided the specified rollback point is still available. + Unit test: TestIndexRollback - Writes a batch. - Sets the rollback point. - Writes second batch. - Rollback to previously decided point. - Ensure that data is as is before the second batch.	2018-01-04 13:21:58 -08:00
Marty Schoch	71cdac785d	Merge pull request #703 from sreekanth-cb/docValue_persisted docValue persist changes	2018-01-04 10:34:58 -05:00
Sreekanth Sivasankaran	71a726bbf6	perf issue was due to duplicate fieldIDs getting inserted to the list of dv enabled fields list - DocValueFields in mem segment. Moved back to the original type `DocValueFields map[uint16]bool` for easy look up to check whether the fieldID is configured for dv storage.	2018-01-04 15:34:55 +05:30
Sreekanth Sivasankaran	f42ecb0ac7	docvalue "zap-path" cmd to print out the dv disk sizes	2018-01-04 13:58:51 +05:30
Marty Schoch	1a59a1bb99	attempt to fix core reference counting issues Observed problem: Persisted index state (in root bolt) would contain index snapshots which pointed to index files that did not exist. Debugging this uncovered two main problems: 1. At the end of persisting a snapshot, the persister creates a new index snapshot with the SAME epoch as the current root, only it replaces in-memory segments with the new disk based ones. This is problematic because reference counting an index segment triggers "eligible for deletion". And eligible for deletion is keyed by epoch. So having two separate instances going by the same epoch is problematic. Specifically, one of them gets to 0 before the other, and we wrongly conclude it's eligible for deletion, when in fact the "other" instance with same epoch is actually still in use. To address this problem, we have modified the behavior of the persister. Now, upon completion of persistence, ONLY if new files were actually created do we proceed to introduce a new snapshot. AND, this new snapshot now gets its own brand new epoch. BOTH of these are important because since the persister now also introduces a new epoch, it will see this epoch again in the future AND be expected to persist it. That is OK (mostly harmless), but we cannot allow it to form a loop. Checking that new files were actually introduced is what short-circuits the potential loop. The new epoch introduced by the persister, if seen again will not have any new segments that actually need persisting to disk, and the cycle is stopped. 2. The implementation of NumSnapshotsToKeep, and related code to delete old snapshots from the root bolt also contains problems. Specifically, the determination of which snapshots to keep vs delete did not consider which ones were actually persisted. 
So, let's say you had set NumSnapshotsToKeep to 3, if the introducer gets 3 snapshots ahead of the persister, what can happen is that the three snapshots we choose to keep are all in memory. We now wrongly delete all of the snapshots from the root bolt. But it gets worse, in this instant of time, we now have files on disk that nothing in the root bolt points to, so we also go ahead and delete those files. Those files were still being referenced by the in-memory snapshots. But, now even if they get persisted to disk, they simply have references to non-existent files. Opening up one of these indexes results in lost data (often everything). To address this problem, we made a large change to the way this section of code operates. First, we now start with a list of all epochs actually persisted in the root bolt. Second, we set aside NumSnapshotsToKeep of these snapshots to keep. Third, anything else in the eligibleForRemoval list will be deleted. I suspect this code is slower and less elegant, but I think it is more correct. Also, previously NumSnapshotsToKeep defaulted to 0, I have now defaulted it to 1, which feels like saner out-of-the-box behavior (though it's debatable if the original intent was perhaps instead for "extra" snapshots to keep, but with the variable named as it is, 1 makes more sense to me). Other minor changes included in this change: - Location of 'nextSnapshotEpoch', 'eligibleForRemoval', and 'ineligibleForRemoval' members of Scorch struct were moved into the paragraph with 'rootLock' to clarify that you must hold the lock to access it. - TestBatchRaceBug260 was updated to properly Close() the index, which leads to occasional test failures.	2018-01-03 12:05:00 -05:00
Sreekanth Sivasankaran	448201243a	removed redundant buf writer, and checks	2017-12-30 16:54:06 +05:30
Sreekanth Sivasankaran	61ba81e964	Merge branch 'scorch', remote-tracking branch 'origin' into docValue_persisted	2017-12-30 16:52:51 +05:30
Marty Schoch	29b63cfe43	Merge pull request #711 from abhinavdangeti/scorch3 Tracking memory consumption for a scorch index	2017-12-29 12:52:32 -08:00
abhinavdangeti	5c26f5a86d	Tracking memory consumption for a scorch index + Track memory usage at a segment level + Add a new scorch API: MemoryUsed() - Aggregate the memory consumption across segments when API is invoked. + TODO: - Revisit the second iteration if it can be gotten rid of, and the size accounted for during the first run while building an in-mem segment. - Accounting for pointer and slice overhead.	2017-12-29 10:20:11 -07:00
abhinavdangeti	055d3e12df	Adding onEvent callback support for scorch Event types: - EventKindCloseStart - EventKindClose - EventKindMergerProgress - EventKindPersisterProgress - EventKindBatchIntroductionStart - EventKindBatchIntroduction	2017-12-29 09:47:25 -07:00
Sreekanth Sivasankaran	c8df014c0c	Updated readme, zap version, added new docvalue cmd, fixed the footer and fields cmd, interface name updated	2017-12-29 21:39:29 +05:30
abhinavdangeti	4bede84fd0	Wiring up missing stats for scorch - updates, deletes, batches, errors - term_searchers_started, term_searchers_finished - num_plain_text_bytes_indexed	2017-12-28 14:07:58 -07:00
abhinavdangeti	becd4677cd	Adding num_items_introduced, num_items_persisted stats + Adding new entries to the stats struct of scorch. + These stats are atomically incremented upon every segment introduction, and upon successful persistence.	2017-12-28 14:07:44 -07:00
Sreekanth Sivasankaran	8abac42796	errCheck fixes	2017-12-28 13:23:57 +05:30
Sreekanth Sivasankaran	0272451093	adding checks for robustness	2017-12-28 13:05:25 +05:30
Sreekanth Sivasankaran	76f827f469	docValue persist changes docValues are persisted along with the index, in a columnar fashion per field with variable sized chunking for quick look up. -naive chunk level caching is added per field -data part inside a chunk is snappy compressed -metaHeader inside the chunk indexes the dv values inside the uncompressed data part -all the fields are docValue persisted in this iteration	2017-12-28 12:05:33 +05:30
abhinavdangeti	dcabc267a0	Wait for rollback'ed snapshot to persist	2017-12-27 10:06:29 -07:00
Steve Yen	c7a342bc7d	scorch conjuncts match phrase test passes The conjunction searcher Advance() method now checks if its curr doc-matches suffices before advancing them.	2017-12-23 09:19:40 -08:00
Steve Yen	a884f38bf6	scorch docInternalToNumber returns 0 on error	2017-12-21 16:44:31 -08:00
Steve Yen	67e0e5973b	scorch mergeStoredAndRemap() memory reuse In mergeStoredAndRemap(), instead of allocating new hashmaps for each document, this commit reuses some arrays that are indexed by fieldId.	2017-12-20 15:18:22 -08:00
Steve Yen	c155255506	scorch optimize zap.Merge() to reuse some buffers	2017-12-20 14:59:53 -08:00
Steve Yen	ea4eb7301b	scorch merger checks closeCh	2017-12-20 14:59:53 -08:00
Steve Yen	04ac9d5b1f	scorch removeOldBoltSnapshots() deletes from correct bucket	2017-12-20 14:46:48 -08:00
Steve Yen	df6c8f4074	scorch added kvconfig unsafe_batch option Added an option to the kvconfig JSON, called "unsafe_batch" (bool). Default is false, so Batch() calls are synchronously persisted by default. Advanced users may want unsafe, asynchronous persistence to favor performance (mutations are queryable sooner) over safety. { "index_type": "scorch", "kvconfig": { "unsafe_batch": true } } This change replaces the previous kvstore=="moss" workaround.	2017-12-20 10:11:55 -08:00
Steve Yen	1abbfadf0d	scorch simplify err check after vellum load	2017-12-19 22:34:39 -08:00
Steve Yen	dbc88cf6b3	scorch docNumberToBytes() checks cap(buf) before allocating With more pprof focusing (zooming in on a particular func), there were still some memory allocations showing up with docNumberToBytes() in micro benchmarks of bleve-query. On a dev macbook, on an index of 50K wikipedia docs, using search of relatively common "text:date"... 400 qps - upsidedown/moss 680 qps - scorch before 775 qps - scorch after	2017-12-19 19:15:19 -08:00
Steve Yen	8f8333e01b	scorch optimize zap Count() This proposed approach avoids building a temporary AndNot() bitmap, following the same kind of optimization used by mem segments.	2017-12-19 18:02:27 -08:00
Steve Yen	a0556ad65b	scorch added more cases to TestIndexInsertThenDelete	2017-12-19 16:41:56 -08:00
Steve Yen	142ccdfaec	scorch remove leftover doc comment I'm suspecting that Marty's editor is more exciting than mine. :-)	2017-12-19 13:53:04 -08:00
Steve Yen	c0e09d8906	Merge pull request #676 from steveyen/scorch scorch avoid extra clone by using roaring.AndNot(x, y)	2017-12-19 13:52:40 -08:00
Steve Yen	f8b52f5e68	Merge pull request #674 from abhinavdangeti/scorch scorch APIs to support rollback	2017-12-19 13:38:47 -08:00
Steve Yen	d0e4f85026	scorch avoid extra clone by using roaring.AndNot(x, y)	2017-12-19 13:37:04 -08:00
abhinavdangeti	679f1ce9c3	scorch APIs to support rollback - PreviousPersistedSnapshot - SnapshotRevert + unit test	2017-12-19 10:53:08 -08:00
Steve Yen	f6b506134b	import couchbase/vellum instead of couchbaselabs/vellum Also, scrubbed an old couchbaselabs/moss reference in comments. Also, go fmt.	2017-12-19 10:49:57 -08:00
Steve Yen	730d906a50	scorch reuses Posting instance in PostingsIterator.Next() With this change, there are no more memory allocations in the calls to PostingsIterator.Next() in the micro benchmarks of bleve-query. On a dev macbook, on an index of 50K wikipedia docs, using high frequency search of "text:date"... 400 qps - upsidedown/moss 565 qps - scorch before 680 qps - scorch after	2017-12-18 16:15:38 -08:00
Steve Yen	867bb2c031	scorch mergeplan explicitly weeds out empty segments Rather than waiting on scoring to weed out empty segments, this commit does the weeding out of empty segments explicitly and up front.	2017-12-18 11:33:19 -08:00
Steve Yen	20fe70770a	scorch added some tests on # of expected segments	2017-12-17 12:39:15 -08:00
Steve Yen	34f5e2175f	scorch fix persister for lost notifications on no-data batches With the previous commit, there can be a scenario where batches that had internal-updates-only can be rapidly introduced by the app, but the persisted notifications on only the very last IndexSnapshot would be fired. The persisted notifications on the in-between batches might be missed. The solution was to track the persisted notification channels at a higher Scorch struct level, instead of tracking the persisted channels at the IndexSnapshot and SegmentSnapshot levels. Also, the persister double-check looping was simplified, which avoids a race where an introducer might incorrectly not notify the persister.	2017-12-17 12:30:05 -08:00
Steve Yen	ecbb3d2df4	scorch handles non-updating batches better This commit improves handling when an incoming batch has internal-data updates only and no doc updates. In this case, a nil segment instead of an empty segment instance is used in the segmentIntroduction. The segmentIntroduction, that is, might now hold internal-data updates only. To handle synchronous persistence, a new field that's a slice of persisted notification channels is added to the IndexSnapshot struct, which the persister goroutine will close as each IndexSnapshot is persisted. Also, as part of this change, instead of checking the unsafeBatch flag in several places, we instead check for non-nil'ness of these persisted chan's.	2017-12-17 08:51:23 -08:00
Steve Yen	e98602600d	scorch mergeplan added TierGrowth option Previously, CalcBudget() was treating MergePlanOptions.SegmentsPerMergeTask as the growth factor while computing the idealized staircase of segments. This change introduces a TierGrowth option to MergePlanOptions for more control and so that SegmentsPerMergeTask can be tweaked independently of the tier growth factor.	2017-12-16 14:22:15 -08:00
Steve Yen	0539744e90	scorch mergeplan.ToBarChart() refactored to callable API Refactored out API so it's usable from other places.	2017-12-16 08:39:10 -08:00
Steve Yen	dc4df18001	Merge pull request #662 from steveyen/scorch scorch mergeplan package comments tweak	2017-12-15 18:41:20 -08:00
Marty Schoch	a575be4d56	fix issue where we incorrectly seed the nextSegmentID on Open()	2017-12-15 19:26:23 -05:00
Steve Yen	45c212a0c2	scorch mergeplan package comments tweak Moving the package comment for mergeplan to the right place.	2017-12-15 13:25:39 -08:00
Steve Yen	620dcdb6f8	scorch uses prealloc'ed buffer for docNumberToBytes() On a couple of micro benchmarks on a dev macbook using bleve-query on an index of 50K wikipedia docs, scorch is now faster than upsidedown/moss on high-freq term search "text:date"... 400 qps - upsidedown/moss 404 qps - scorch before 565 qps - scorch after	2017-12-15 11:58:21 -08:00
Steve Yen	f05794c6aa	scorch removed worker goroutines from TermFieldReader() On a couple of micro benchmarks on a dev macbook using bleve-query on an index of 50K wikipedia docs, scorch is now more in the same neighborhood as upsidedown/moss... high-freq term search "text:date"... 400 qps - upsidedown/moss 360 qps - scorch before 404 qps - scorch after zero-freq term search "text:mschoch"... 100K qps - upsidedown/moss 55K qps - scorch before 99K qps - scorch after Of note, the scorch index had ~150 *.zap files in it, which likely made the worker goroutine overhead more costly than for a case with few segments, where goroutine and channel related work appeared relatively prominently in the pprof SVG's.	2017-12-15 11:11:18 -08:00
Marty Schoch	562b473e36	Merge pull request #657 from steveyen/scorch scorch fix data race w/ AddEligibleForRemoval	2017-12-14 17:56:06 -05:00
Marty Schoch	b5aa4ed22b	return err not panic	2017-12-14 17:41:02 -05:00
Steve Yen	506aa1c325	scorch fix data race w/ AddEligibleForRemoval Found from "go test -race ./..." WARNING: DATA RACE Read at 0x00c420088060 by goroutine 48: github.com/blevesearch/bleve/index/scorch.(Scorch).AddEligibleForRemoval() /Users/steveyen/go/src/github.com/blevesearch/bleve/index/scorch/scorch.go:348 +0x6d Previous write at 0x00c420088060 by goroutine 31: github.com/blevesearch/bleve/index/scorch.(Scorch).loadFromBolt.func1() /Users/steveyen/go/src/github.com/blevesearch/bleve/index/scorch/persister.go:332 +0x87b github.com/boltdb/bolt.(DB).View() /Users/steveyen/go/src/github.com/boltdb/bolt/db.go:629 +0xc1 github.com/blevesearch/bleve/index/scorch.(Scorch).loadFromBolt() /Users/steveyen/go/src/github.com/blevesearch/bleve/index/scorch/persister.go:290 +0xa1 github.com/blevesearch/bleve/index/scorch.(Scorch).Open() /Users/steveyen/go/src/github.com/blevesearch/bleve/index/scorch/scorch.go:121 +0x77f github.com/blevesearch/bleve/index/scorch.TestIndexOpenReopen() /Users/steveyen/go/src/github.com/blevesearch/bleve/index/scorch/scorch_test.go:115 +0x1351 testing.tRunner() /usr/local/Cellar/go/1.9/libexec/src/testing/testing.go:746 +0x16c Goroutine 48 (running) created at: github.com/blevesearch/bleve/index/scorch.(IndexSnapshot).DecRef() /Users/steveyen/go/src/github.com/blevesearch/bleve/index/scorch/snapshot_index.go:72 +0x23e github.com/blevesearch/bleve/index/scorch.(Scorch).loadFromBolt.func1() /Users/steveyen/go/src/github.com/blevesearch/bleve/index/scorch/persister.go:330 +0x8f4 github.com/boltdb/bolt.(DB).View() /Users/steveyen/go/src/github.com/boltdb/bolt/db.go:629 +0xc1 github.com/blevesearch/bleve/index/scorch.(Scorch).loadFromBolt() /Users/steveyen/go/src/github.com/blevesearch/bleve/index/scorch/persister.go:290 +0xa1 github.com/blevesearch/bleve/index/scorch.(Scorch).Open() /Users/steveyen/go/src/github.com/blevesearch/bleve/index/scorch/scorch.go:121 +0x77f github.com/blevesearch/bleve/index/scorch.TestIndexOpenReopen() /Users/steveyen/go/src/github.com/blevesearch/bleve/index/scorch/scorch_test.go:115 +0x1351 testing.tRunner() /usr/local/Cellar/go/1.9/libexec/src/testing/testing.go:746 +0x16c	2017-12-14 14:40:33 -08:00
Marty Schoch	6ab27e4afa	quick hack to disable safe batches in fts	2017-12-14 17:19:50 -05:00
Steve Yen	eb2f541d4f	scorch filters _id from Reader.Document() results	2017-12-14 13:52:28 -08:00
Steve Yen	a8884e1011	scorch fix for TestSortMatchSearch The cachedDocs preparation has to happen for all docs in the field, not just on the currently requested docNum. Also, as part of this commit, there's a loop optimization where we no longer use bytes.Split() on the terms buffer, thus avoiding garbage creation.	2017-12-14 13:22:13 -08:00
Steve Yen	2be5eb4427	scorch tracks zap files that can't be removed yet A race & solution found by Marty Schoch... consider a case when the merger might grab a nextSegmentID, like 4, but takes awhile to complete. Meanwhile, the persister grabs the nextSegmentID of 5, but finishes its persistence work fast, and then loops to cleanup any old files. The simple approach of checking a "highest segment ID" of 5 is wrong now, because the deleter now thinks that segment 4's zap file is (incorrectly) ok to delete. The solution in this commit is to track an ephemeral map of filenames which are ineligibleForRemoval, because they're still being written (by the merger) and haven't been fully incorporated into the rootBolt yet. The merger adds to that ineligibleForRemoval map as it starts a merged zap file, the persister cleans up entries from that map when it persists zap filenames into the rootBolt, and the deleter (part of the persister's loop) consults the map before performing any actual zap file deletions.	2017-12-14 10:49:33 -08:00
Marty Schoch	bd742caf65	don't try to close a nil segment if err opening	2017-12-14 10:29:19 -05:00
Marty Schoch	149a26b5c1	merge deletion and cacheddocs fixes discussed in meeting	2017-12-14 10:27:39 -05:00
Sreekanth Sivasankaran	95b65ade3e	getting right internalID for doc in UT	2017-12-14 17:16:47 +05:30
Sreekanth Sivasankaran	1066ee7d22	DocumentVisitFieldTerms Scorch implementation level1	2017-12-14 12:38:29 +05:30
Marty Schoch	2b92e5ff99	Merge pull request #653 from steveyen/scorch scorch cleanup of the rootBolt of old snapshots	2017-12-13 22:47:14 -05:00
Marty Schoch	e1b0c61e2a	fix bug in handling iterator-done	2017-12-13 22:08:06 -05:00
Steve Yen	b7dff6669f	scorch cleanup of *.zap files not listed in the rootBolt	2017-12-13 17:09:50 -08:00
Steve Yen	c0cc46a2be	scorch cleanup of the rootBolt of old snapshots A new global variable, NumSnapshotsToKeep, represents the default number of old snapshots that each scorch instance should maintain -- 0 is the default. Apps that need rollback'ability may want to increase this value in early initialization. The Scorch.eligibleForRemoval field tracks epochs which are safe to delete from the rootBolt. The eligibleForRemoval is appended to whenever the ref-count on an IndexSnapshot drops to 0. On startup, eligibleForRemoval is also initialized with any older epochs found in the rootBolt. The newly introduced Scorch.removeOldSnapshots() method is called on every cycle of the persisterLoop(), where it maintains the eligibleForRemoval slice to under a size defined by the NumSnapshotsToKeep. A future commit will remove actual storage files in order to match the "source of truth" information found in the rootBolt.	2017-12-13 15:53:31 -08:00
Steve Yen	c13ff85aaf	scorch ref-counting Future commits will provide actual cleanup when ref-counts reach 0.	2017-12-13 14:48:07 -08:00
Marty Schoch	50471003dc	basic refactoring of introducer to make it more readable	2017-12-13 16:30:39 -05:00
Marty Schoch	a0e12b2640	add license to a few files missing it	2017-12-13 16:12:29 -05:00
Marty Schoch	85e15628ee	major refactoring of posting details	2017-12-13 16:10:06 -05:00
Marty Schoch	6e2207c445	additional refactoring of build/merge	2017-12-13 15:22:13 -05:00
Marty Schoch	50441e5065	refactor to reuse shared code	2017-12-13 14:41:20 -05:00
Marty Schoch	289dc398bd	more refactoring of build/merge	2017-12-13 14:26:11 -05:00
Marty Schoch	1cd3fd7fbe	extract common functionality between build/merge	2017-12-13 14:06:54 -05:00
Marty Schoch	cd45487cb3	fsync rootBolt when persisting snapshot	2017-12-13 13:55:06 -05:00
Marty Schoch	f83c9f2a20	initial cut of merger that actually introduces changes	2017-12-13 13:41:03 -05:00
Marty Schoch	c15c3c11cd	extra protection if dict address is 0 (empty segment)	2017-12-13 13:31:18 -05:00
Steve Yen	be7dd36ac6	mergeplan: more tests and bargraph tweaks	2017-12-12 10:37:27 -08:00
Steve Yen	59a1e26300	mergeplan: scoring implemented	2017-12-12 10:37:27 -08:00
Marty Schoch	57121e40a8	fix issues identified by errcheck	2017-12-12 11:41:14 -05:00
Marty Schoch	665c3c80ff	initial cut of zap segment merging	2017-12-12 11:21:55 -05:00
Marty Schoch	927216df8c	fix postings list count impl	2017-12-12 08:42:13 -05:00
Steve Yen	3461fb741f	mergeplan: a placeholder planner that merges all segments A stepping stone to fleshing out the API contract.	2017-12-11 14:53:08 -08:00
Marty Schoch	58ef21a88a	fix golint issue	2017-12-11 16:24:46 -05:00
Marty Schoch	f246e0e4c0	update README for zap file format changes	2017-12-11 16:22:29 -05:00
Marty Schoch	74b2eeb14d	refactor where we do some work so we can return error	2017-12-11 15:59:36 -05:00
Marty Schoch	f13b786609	fix up issues to get all bleve unit tests passing for scorch make scorch default	2017-12-11 15:47:41 -05:00
Marty Schoch	d7eb223e14	remove bolt segment format upcoming breaking changes and no desire to maintain	2017-12-11 10:20:26 -05:00
Marty Schoch	eada7b209b	fix test issue identified by sreekanth	2017-12-11 10:16:56 -05:00
Marty Schoch	8280859bb8	handle read-only and in-mem only cases	2017-12-11 09:07:01 -05:00
Marty Schoch	e8cc7ac0bf	add new fields command to zap cmd-line util	2017-12-11 09:05:50 -05:00
Marty Schoch	690cd39921	add crazy slow but functional DocumentVisitFieldTerms	2017-12-10 08:55:59 -05:00
Marty Schoch	dc0adc8827	add fsync	2017-12-09 20:52:01 -05:00
Marty Schoch	e0d9828cd0	add more detail to the readme	2017-12-09 14:42:36 -05:00
Marty Schoch	414899618b	switch from bolt format to zap in the persister	2017-12-09 14:28:50 -05:00
Marty Schoch	9781d9b089	add initial version of zap file format	2017-12-09 14:28:33 -05:00
Marty Schoch	ff2e6b98e4	added empty segment	2017-12-09 12:43:02 -05:00
Marty Schoch	e470105635	fix issues identified by errcheck	2017-12-06 18:36:14 -05:00
Marty Schoch	adac4f41db	initial version of scorch which persists index to disk	2017-12-06 18:33:47 -05:00
Marty Schoch	b1346b4c8a	add readme describing our use of bolt as a segment format	2017-12-05 16:09:00 -05:00
Marty Schoch	898a6b1e85	fix errcheck issues	2017-12-05 13:32:57 -05:00
Marty Schoch	ece27ef215	adding initial version of bolt persisted segment	2017-12-05 13:05:12 -05:00
Marty Schoch	f6be841668	add test for postings list count method	2017-12-05 13:01:36 -05:00
Marty Schoch	30e9d6daa5	add better testing of array positions	2017-12-05 12:54:44 -05:00
Marty Schoch	8d9d45115f	add test of location field	2017-12-05 12:20:06 -05:00
Marty Schoch	8f0350865b	add test for segment fields method	2017-12-05 12:17:56 -05:00
Marty Schoch	7a6b5483f2	add validation that all locations were seen	2017-12-05 11:58:05 -05:00
Marty Schoch	e08fdab54a	remove todo item	2017-12-05 10:13:27 -05:00
Marty Schoch	87e2627551	added dictionary tests to mem segment	2017-12-05 09:49:41 -05:00
Marty Schoch	ed067f45dd	added Close() method to Segment	2017-12-05 09:31:02 -05:00
Marty Schoch	22ffc8940e	update segment API to return error in key places	2017-12-04 18:06:06 -05:00
Marty Schoch	b74cf4b081	add copyright header to all new files in scorch	2017-12-01 15:42:50 -05:00
Marty Schoch	89aa02cf5b	fix highlighting of composite fields updated log statements for refactored names	2017-12-01 15:12:08 -05:00
Marty Schoch	cff14f1212	fix crash in DocNumbers when segment is empty	2017-12-01 09:50:27 -05:00
Marty Schoch	eb256f78bc	switch to constant referring to id field id 0 this avoids potentially mutating something that is intended to be immutable	2017-12-01 09:30:07 -05:00
Marty Schoch	7c964de8bf	switch to binary search for finding segment from global doc num added unit tests for this function specifically	2017-12-01 09:26:51 -05:00
Marty Schoch	c2047dcdf9	refactor doc id reader creation to share more code fix issue identified by steve	2017-12-01 08:54:39 -05:00
Marty Schoch	bcd4bdc3d1	added initial bolt thought to README	2017-12-01 07:27:04 -05:00
Marty Schoch	395458ce83	refactor to make mem segment contents exported	2017-12-01 07:26:47 -05:00
Steve Yen	398dcb19b3	scorch introducer uses the roaring.Or(x, y) API Instead of cloning an input bitmap, the roaring.Or(x, y) implementation fills a brand new result bitmap, which should allow for more efficient packing and memory utilization.	2017-11-30 10:37:10 -08:00
Steve Yen	67986d41bf	scorch InternalID() handles case of unknown docId	2017-11-30 08:36:01 -08:00
Marty Schoch	848aca4639	fix issues identified by errcheck	2017-11-29 13:34:15 -05:00
Marty Schoch	23f6dc1cc6	working in-memory version	2017-11-29 11:33:35 -05:00
Steve Yen	546700b2de	fix comment typo	2017-08-24 16:25:10 -07:00
Marty Schoch	cea119449e	fix data race in doc id search the implementation of the doc id search requires that the list of ids be sorted. however, when doing a multisearch across many indexes at once, the list of doc ids in the query is shared. deeper in the implementation, the search of each shard attempts to sort this list, resulting in a data race. this is one example of a potentially larger problem, however it has been decided to fix this data race, even though larger issues of data ownership may remain unresolved. this fix makes a copy of the list of doc ids, just prior to sorting the list. subsequently, all use of the list is on the copy that was made, not the original. fixes #518	2017-08-07 15:11:35 -04:00
abhinavdangeti	8ec88a6cb0	MB-24560: Add moss store|collection histograms to stats	2017-05-25 16:32:36 -07:00
Marty Schoch	3ad13236ec	fix geopoint fields to be able to be stored and retrieved	2017-03-31 09:40:54 -04:00
Marty Schoch	74140d4f2b	remove forestdb from bleve	2017-03-30 12:27:23 -04:00
Marty Schoch	1bcfe4efa1	Merge pull request #546 from sreekanth-cb/store_abort_close Store abort close	2017-03-07 12:35:18 -05:00
Sreekanth Sivasankaran	f759d841c2	Adding guards for config casting.	2017-03-07 22:51:27 +05:30

773 Commits