bleve

Author	SHA1	Message	Date
abhinavdangeti	111f0d0721	Updated Rollback APIs New APIs: + RollbackPoints() - Retrieves the available list of rollback points: epoch+meta. - The application will need to check with the meta to decide on the rollback point. + Rollback() - API requires a rollback point identified by the first API. - Atomically & Durably rolls back the index to specified point, provided the specified rollback point is still available. + Unit test: TestIndexRollback - Writes a batch. - Sets the rollback point. - Writes second batch. - Rollback to previously decided point. - Ensure that data is as is before the second batch.	2018-01-04 13:21:58 -08:00
Marty Schoch	6ad679bbb5	Merge pull request #717 from mschoch/scorch-fix-refcounting attempt to fix core reference counting issues	2018-01-03 09:32:40 -08:00
Marty Schoch	1a59a1bb99	attempt to fix core reference counting issues Observed problem: Persisted index state (in root bolt) would contain index snapshots which pointed to index files that did not exist. Debugging this uncovered two main problems: 1. At the end of persisting a snapshot, the persister creates a new index snapshot with the SAME epoch as the current root, only it replaces in-memory segments with the new disk based ones. This is problematic because reference counting an index segment triggers "eligible for deletion". And eligible for deletion is keyed by epoch. So having two separate instances going by the same epoch is problematic. Specifically, one of them gets to 0 before the other, and we wrongly conclude it's eligible for deletion, when in fact the "other" instance with same epoch is actually still in use. To address this problem, we have modified the behavior of the persister. Now, upon completion of persistence, ONLY if new files were actually created do we proceed to introduce a new snapshot. AND, this new snapshot now gets its own brand new epoch. BOTH of these are important because since the persister now also introduces a new epoch, it will see this epoch again in the future AND be expected to persist it. That is OK (mostly harmless), but we cannot allow it to form a loop. Checking that new files were actually introduced is what short-circuits the potential loop. The new epoch introduced by the persister, if seen again will not have any new segments that actually need persisting to disk, and the cycle is stopped. 2. The implementation of NumSnapshotsToKeep, and related code to delete old snapshots from the root bolt also contains problems. Specifically, the determination of which snapshots to keep vs delete did not consider which ones were actually persisted.
So, let's say you had set NumSnapshotsToKeep to 3, if the introducer gets 3 snapshots ahead of the persister, what can happen is that the three snapshots we choose to keep are all in memory. We now wrongly delete all of the snapshots from the root bolt. But it gets worse, in this instant of time, we now have files on disk that nothing in the root bolt points to, so we also go ahead and delete those files. Those files were still being referenced by the in-memory snapshots. But, now even if they get persisted to disk, they simply have references to non-existent files. Opening up one of these indexes results in lost data (often everything). To address this problem, we made a large change to the way this section of code operates. First, we now start with a list of all epochs actually persisted in the root bolt. Second, we set aside NumSnapshotsToKeep of these snapshots to keep. Third, anything else in the eligibleForRemoval list will be deleted. I suspect this code is slower and less elegant, but I think it is more correct. Also, previously NumSnapshotsToKeep defaulted to 0, I have now defaulted it to 1, which feels like saner out-of-the-box behavior (though it's debatable if the original intent was perhaps instead for "extra" snapshots to keep, but with the variable named as it is, 1 makes more sense to me) Other minor changes included in this change: - Location of 'nextSnapshotEpoch', 'eligibleForRemoval', and 'ineligibleForRemoval' members of Scorch struct were moved into the paragraph with 'rootLock' to clarify that you must hold the lock to access it. - TestBatchRaceBug260 was updated to properly Close() the index, which leads to occasional test failures.	2018-01-03 12:05:00 -05:00
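The first two steps of the retention logic described in 1a59a1bb99 (start from the epochs actually persisted, set aside the newest NumSnapshotsToKeep) can be sketched roughly as follows. This is only an illustration under assumptions: `epochsToRemove` is a hypothetical helper, not a function in bleve, and the real code works against the root bolt rather than a plain slice.

```go
package main

import (
	"fmt"
	"sort"
)

// epochsToRemove is a hypothetical helper (not bleve's actual code).
// Given the epochs that are actually persisted, it keeps the numToKeep
// newest ones and returns the remaining epochs, oldest first.
func epochsToRemove(persisted []uint64, numToKeep int) []uint64 {
	if numToKeep < 0 {
		numToKeep = 0
	}
	// Copy so the caller's slice is not reordered.
	sorted := append([]uint64(nil), persisted...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })
	if len(sorted) <= numToKeep {
		return nil // nothing beyond the snapshots we must keep
	}
	return sorted[:len(sorted)-numToKeep]
}

func main() {
	// Epochs that exist only in memory are deliberately absent from this
	// list, so they can never be chosen as the snapshots "kept" on disk.
	persisted := []uint64{5, 3, 7}
	fmt.Println(epochsToRemove(persisted, 1))
}
```

Because only persisted epochs are considered, an introducer running ahead of the persister cannot cause every on-disk snapshot to be dropped.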
Marty Schoch	29b63cfe43	Merge pull request #711 from abhinavdangeti/scorch3 Tracking memory consumption for a scorch index	2017-12-29 12:52:32 -08:00
Marty Schoch	780c3e9c43	Merge pull request #710 from abhinavdangeti/scorch2 Adding onEvent callback support for scorch	2017-12-29 09:29:24 -08:00
abhinavdangeti	5c26f5a86d	Tracking memory consumption for a scorch index + Track memory usage at a segment level + Add a new scorch API: MemoryUsed() - Aggregate the memory consumption across segments when API is invoked. + TODO: - Revisit the second iteration if it can be gotten rid of, and the size accounted for during the first run while building an in-mem segment. - Accounting for pointer and slice overhead.	2017-12-29 10:20:11 -07:00
abhinavdangeti	055d3e12df	Adding onEvent callback support for scorch Event types: - EventKindCloseStart - EventKindClose - EventKindMergerProgress - EventKindPersisterProgress - EventKindBatchIntroductionStart - EventKindBatchIntroduction	2017-12-29 09:47:25 -07:00
Marty Schoch	a475ee886d	Merge pull request #705 from abhinavdangeti/scorch Scorch specific stats	2017-12-28 13:57:30 -08:00
abhinavdangeti	4bede84fd0	Wiring up missing stats for scorch - updates, deletes, batches, errors - term_searchers_started, term_searchers_finished - num_plain_text_bytes_indexed	2017-12-28 14:07:58 -07:00
abhinavdangeti	becd4677cd	Adding num_items_introduced, num_items_persisted stats + Adding new entries to the stats struct of scorch. + These stats are atomically incremented upon every segment introduction, and upon successful persistence.	2017-12-28 14:07:44 -07:00
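A minimal sketch of atomically incremented counters of the kind becd4677cd describes. The `stats` struct and its method names are hypothetical stand-ins, not the actual scorch stats type.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// stats is a hypothetical stand-in for a stats struct with two counters.
type stats struct {
	numItemsIntroduced uint64
	numItemsPersisted  uint64
}

// introduced records n items after a segment introduction.
func (s *stats) introduced(n uint64) { atomic.AddUint64(&s.numItemsIntroduced, n) }

// persisted records n items after a successful persistence.
func (s *stats) persisted(n uint64) { atomic.AddUint64(&s.numItemsPersisted, n) }

func main() {
	var s stats
	var wg sync.WaitGroup
	// Several goroutines update the counters concurrently without a lock.
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			s.introduced(10)
			s.persisted(10)
		}()
	}
	wg.Wait()
	fmt.Println(atomic.LoadUint64(&s.numItemsIntroduced),
		atomic.LoadUint64(&s.numItemsPersisted))
}
```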
Marty Schoch	fd91a1b4b1	Merge pull request #700 from mschoch/scorch-phrase phrase searcher don't allow advance after end	2017-12-28 07:00:09 -08:00
Marty Schoch	ee9cc24a6f	Merge pull request #699 from abhinavdangeti/scorch1 Wait for rollback'ed snapshot to persist	2017-12-28 06:51:26 -08:00
Marty Schoch	272da43c16	phrase searcher don't allow advance after end	2017-12-27 10:24:33 -08:00
abhinavdangeti	dcabc267a0	Wait for rollback'ed snapshot to persist	2017-12-27 10:06:29 -07:00
Marty Schoch	7afeb1ae1d	Merge pull request #692 from steveyen/scorch scorch conjuncts match phrase	2017-12-24 06:21:35 -05:00
Steve Yen	c7a342bc7d	scorch conjuncts match phrase test passes The conjunction searcher Advance() method now checks whether its current doc-matches suffice before advancing them.	2017-12-23 09:19:40 -08:00
Steve Yen	903e8797c7	Merge pull request #689 from steveyen/scorch MB-27291 - scorch compared to upsidedown/bolt using templated, generated searches	2017-12-21 18:36:02 -08:00
Steve Yen	d425a3be86	scorch fix disjunction searcher Advance() Found with "versus" test (TestScorchVersusUpsideDownBoltSmallMNSAM), which had a boolean query with a MustNot that was the same as the Must parameters. This replicates a situation found by Aruna/Mihir/testrunner/RQG (MB-27291). Example: "query": { "must_not": {"disjuncts": [ {"field": "body", "match": "hello"} ]}, "must": {"conjuncts": [ {"field": "body", "match": "hello"} ]} } The nested searchers along the MustNot pathway would end up looking roughly like... booleanSearcher MustNot => disjunctionSearcher => disjunctionSearcher => termSearcher On the first Next() call by the collector, the two disjunction searchers would run through their respective Next() method processing, which includes their initSearcher() processing on the first time. This has the effect of driving the leaf termSearcher through two Next() invocations. That is, if there were 3 docs (doc-1, doc-2, doc-3), the leaf termSearcher would at this point have moved to point to doc-3, while the topmost MustNot would have received doc-1. Next, the booleanSearcher's Must searcher would produce doc-2, so the booleanSearcher would try to Advance() the MustNot searcher to doc-2. But, in scorch, the leafmost termSearcher had already gotten past doc-2 and would return its doc-3. In upsidedown, in contrast, the leaf termSearcher would then drive the KVStore iterator with a Seek(doc-2), and the KVStore iterator would perform a backwards seek to reach doc-2. In scorch, however, backwards iteration seeking isn't supported. So, this fix checks the state of the disjunction searcher to see if we already have the necessary state so that we don't have to perform actual Advance()'es on the underlying searchers. This not only fixes the behavior w.r.t. scorch, but also can have an effect of potentially making upsidedown slightly faster as we're avoiding some backwards KVStore iterator seeks.	2017-12-21 18:20:04 -08:00
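The idea behind d425a3be86 (consult already-buffered state before calling Advance() on a forward-only child) can be illustrated with a simplified model. All types here are hypothetical: `leaf` stands in for a forward-only term searcher and `lookahead` for a searcher that buffers one match from its child. Neither is bleve's real searcher code.

```go
package main

import "fmt"

// searcher is a hypothetical, simplified searcher interface.
type searcher interface {
	next() (uint64, bool)
	advance(target uint64) (uint64, bool)
}

// leaf is a forward-only iterator over sorted doc numbers; it cannot seek back.
type leaf struct {
	docs []uint64
	pos  int
}

func (l *leaf) next() (uint64, bool) {
	if l.pos >= len(l.docs) {
		return 0, false
	}
	d := l.docs[l.pos]
	l.pos++
	return d, true
}

func (l *leaf) advance(target uint64) (uint64, bool) {
	for l.pos < len(l.docs) && l.docs[l.pos] < target {
		l.pos++
	}
	return l.next()
}

// lookahead buffers one match from its child, so the child always runs ahead.
type lookahead struct {
	child   searcher
	curr    uint64
	ok      bool
	started bool
}

func (s *lookahead) init() {
	if !s.started {
		s.curr, s.ok = s.child.next()
		s.started = true
	}
}

func (s *lookahead) next() (uint64, bool) {
	s.init()
	d, ok := s.curr, s.ok
	if ok {
		s.curr, s.ok = s.child.next()
	}
	return d, ok
}

func (s *lookahead) advance(target uint64) (uint64, bool) {
	s.init()
	// The check: only advance the child when the buffered match is
	// still behind the target. Otherwise the buffered match is the answer.
	if s.ok && s.curr < target {
		s.curr, s.ok = s.child.advance(target)
	}
	return s.next()
}

func main() {
	l := &leaf{docs: []uint64{1, 2, 3}}
	outer := &lookahead{child: &lookahead{child: l}}
	first, _ := outer.next()   // the leaf has already moved past doc 2 here
	got, _ := outer.advance(2) // served from buffered state
	fmt.Println(first, got)
}
```

Without the check in `advance`, the request would reach the leaf, which has already passed doc 2 and could only return doc 3.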
Steve Yen	93c787ca09	scorch versus_test.go passes errcheck	2017-12-21 16:49:39 -08:00
Steve Yen	33687260ca	children of conjunct/disjunct's are not necessarily termSearchers Rename termSearcher loop variable to searcher, as the child searchers of a conjunction/disjunction searcher aren't necessarily termSearchers.	2017-12-21 16:45:43 -08:00
Steve Yen	a884f38bf6	scorch docInternalToNumber returns 0 on error	2017-12-21 16:44:31 -08:00
Steve Yen	b3e41335e1	scorch compared to upsidedown/bolt using templated, generated searches This is somewhat like a simple, unit-test'ish version of testrunner's random query generator, where this does not have a dependency on an external elasticsearch server, and instead depends on functional correctness when comparing to upsidedown/bolt.	2017-12-21 16:43:52 -08:00
Steve Yen	4c494216d6	Merge pull request #687 from steveyen/scorch some scorch changes to check closeCh & merger memory usage	2017-12-20 15:30:55 -08:00
Steve Yen	67e0e5973b	scorch mergeStoredAndRemap() memory reuse In mergeStoredAndRemap(), instead of allocating new hashmaps for each document, this commit reuses some arrays that are indexed by fieldId.	2017-12-20 15:18:22 -08:00
Steve Yen	c155255506	scorch optimize zap.Merge() to reuse some buffers	2017-12-20 14:59:53 -08:00
Steve Yen	ea4eb7301b	scorch merger checks closeCh	2017-12-20 14:59:53 -08:00
Steve Yen	59797c35fa	Merge pull request #686 from steveyen/scorch scorch removeOldBoltSnapshots() deletes from correct bucket	2017-12-20 14:59:36 -08:00
Steve Yen	04ac9d5b1f	scorch removeOldBoltSnapshots() deletes from correct bucket	2017-12-20 14:46:48 -08:00
Steve Yen	d55ef26c51	Merge pull request #682 from steveyen/scorch scorch added kvconfig unsafe_batch option	2017-12-20 10:21:49 -08:00
Steve Yen	df6c8f4074	scorch added kvconfig unsafe_batch option Added an option to the kvconfig JSON, called "unsafe_batch" (bool). Default is false, so Batch() calls are synchronously persisted by default. Advanced users may want unsafe, asynchronous persistence to trade safety for performance (mutations are queryable sooner). { "index_type": "scorch", "kvconfig": { "unsafe_batch": true } } This change replaces the previous kvstore=="moss" workaround.	2017-12-20 10:11:55 -08:00
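A small sketch of reading the "unsafe_batch" flag from the JSON shown in df6c8f4074, defaulting to false when it is absent. The `unsafeBatch` function is hypothetical and uses only the standard library; it does not reflect how bleve itself parses its configuration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// unsafeBatch is a hypothetical helper: it reports whether the config
// JSON sets kvconfig.unsafe_batch to true, treating a missing or
// non-boolean value as false (the safe, synchronous default).
func unsafeBatch(raw []byte) (bool, error) {
	var cfg struct {
		KVConfig map[string]interface{} `json:"kvconfig"`
	}
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return false, err
	}
	v, ok := cfg.KVConfig["unsafe_batch"].(bool)
	return ok && v, nil
}

func main() {
	raw := []byte(`{"index_type": "scorch", "kvconfig": {"unsafe_batch": true}}`)
	enabled, err := unsafeBatch(raw)
	fmt.Println(enabled, err)
}
```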
Steve Yen	43e3d4e1dd	Merge pull request #681 from steveyen/scorch scorch simplify err check after vellum load	2017-12-19 23:07:24 -08:00
Steve Yen	1abbfadf0d	scorch simplify err check after vellum load	2017-12-19 22:34:39 -08:00
Steve Yen	4d28b16896	Merge pull request #680 from steveyen/scorch scorch docNumberToBytes() checks cap(buf) before allocating	2017-12-19 19:31:23 -08:00
Steve Yen	dbc88cf6b3	scorch docNumberToBytes() checks cap(buf) before allocating With more pprof focusing (zooming in on a particular func), there were still some memory allocations showing up with docNumberToBytes() in micro benchmarks of bleve-query. On a dev macbook, on an index of 50K wikipedia docs, using search of relatively common "text:date"... 400 qps - upsidedown/moss 680 qps - scorch before 775 qps - scorch after	2017-12-19 19:15:19 -08:00
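The cap(buf) check named in dbc88cf6b3 might look roughly like this. The signature and the big-endian 8-byte encoding are assumptions for illustration, not confirmed by the log.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// docNumberToBytes is a sketch with an assumed signature: it encodes a
// doc number into 8 bytes, reusing buf when its capacity allows and
// allocating only when it does not.
func docNumberToBytes(buf []byte, in uint64) []byte {
	if cap(buf) < 8 {
		buf = make([]byte, 8) // buffer too small: allocate once
	} else {
		buf = buf[:8] // reuse the caller's backing array
	}
	binary.BigEndian.PutUint64(buf, in)
	return buf
}

func main() {
	buf := make([]byte, 0, 8)
	// The same buffer is reused across calls, so the hot loop does not allocate.
	for _, n := range []uint64{1, 42} {
		buf = docNumberToBytes(buf, n)
		fmt.Println(buf)
	}
}
```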
Steve Yen	ed8bbded02	Merge pull request #679 from steveyen/scorch scorch optimize zap Count()	2017-12-19 18:12:43 -08:00
Steve Yen	8f8333e01b	scorch optimize zap Count() This proposed approach avoids building a temporary AndNot() bitmap, following the same kind of optimization used by mem segments.	2017-12-19 18:02:27 -08:00
Steve Yen	c5aa2f997f	Merge pull request #678 from steveyen/scorch scorch added more cases to TestIndexInsertThenDelete	2017-12-19 17:17:14 -08:00
Steve Yen	a0556ad65b	scorch added more cases to TestIndexInsertThenDelete	2017-12-19 16:41:56 -08:00
Steve Yen	8890e36025	Merge pull request #677 from steveyen/scorch scorch remove leftover doc comment	2017-12-19 13:54:27 -08:00
Steve Yen	142ccdfaec	scorch remove leftover doc comment I'm suspecting that Marty's editor is more exciting than mine. :-)	2017-12-19 13:53:04 -08:00
Steve Yen	c0e09d8906	Merge pull request #676 from steveyen/scorch scorch avoid extra clone by using roaring.AndNot(x, y)	2017-12-19 13:52:40 -08:00
Steve Yen	f8b52f5e68	Merge pull request #674 from abhinavdangeti/scorch scorch APIs to support rollback	2017-12-19 13:38:47 -08:00
Steve Yen	d0e4f85026	scorch avoid extra clone by using roaring.AndNot(x, y)	2017-12-19 13:37:04 -08:00
Steve Yen	b0e4936a71	Merge pull request #675 from steveyen/scorch import couchbase/vellum instead of couchbaselabs/vellum	2017-12-19 11:07:35 -08:00
abhinavdangeti	679f1ce9c3	scorch APIs to support rollback - PreviousPersistedSnapshot - SnapshotRevert + unit test	2017-12-19 10:53:08 -08:00
Steve Yen	f6b506134b	import couchbase/vellum instead of couchbaselabs/vellum Also, scrubbed an old couchbaselabs/moss reference in comments. Also, go fmt.	2017-12-19 10:49:57 -08:00
Steve Yen	20972493d1	Merge pull request #661 from steveyen/scorchReusePosting scorch reuses Posting instance in PostingsIterator.Next()	2017-12-18 16:37:07 -08:00
Steve Yen	730d906a50	scorch reuses Posting instance in PostingsIterator.Next() With this change, there are no more memory allocations in the calls to PostingsIterator.Next() in the micro benchmarks of bleve-query. On a dev macbook, on an index of 50K wikipedia docs, using high frequency search of "text:date"... 400 qps - upsidedown/moss 565 qps - scorch before 680 qps - scorch after	2017-12-18 16:15:38 -08:00
Steve Yen	bf833a5eb8	Merge pull request #668 from steveyen/scorch scorch mergeplan explicitly weeds out empty segments	2017-12-18 12:00:43 -08:00
Steve Yen	867bb2c031	scorch mergeplan explicitly weeds out empty segments Rather than waiting on scoring to weed out empty segments, this commit does the weeding out of empty segments explicitly and up front.	2017-12-18 11:33:19 -08:00


1548 Commits