Commit Graph

658 Commits

Author SHA1 Message Date
Marty Schoch
4e82a8a0ca
Merge pull request #726 from sreekanth-cb/docValue_configs
DocValue Config, new API Changes
2018-01-10 18:11:18 -05:00
Sreekanth Sivasankaran
53aef2104e fixing err handling in UTs, name changes 2018-01-10 22:00:26 +05:30
abhinavdangeti
43bfcc00c9 Do not account mmap'ed part of zap segments in MemoryUsed
This API is designed to emit only the dirty "unpersisted"
bytes.  It does not include the mmap'ed part of the
zap segments (on disk).
2018-01-09 09:43:53 -08:00
Sreekanth Sivasankaran
4c256f5669 DocValue Config, new API Changes
-VisitableDocValueFields API for persisted DV field list
-making dv configs overridable at field level
-enabling on-the-fly/runtime un-inverting of doc values
-a few UT updates
2018-01-08 10:58:33 +05:30
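A hedged sketch of what the field-level doc-value override above might look like from the bleve mapping side; the DocValues toggle on the field mapping is an assumption based on the commit message, not a confirmed API:

    // Sketch only: per-field doc-value override in a bleve index mapping.
    // The DocValues field name is assumed from the commit description.
    import "github.com/blevesearch/bleve/mapping"

    func buildMapping() mapping.IndexMapping {
        im := mapping.NewIndexMapping()

        title := mapping.NewTextFieldMapping()
        title.DocValues = false // override the index-wide dv default for this field

        doc := mapping.NewDocumentMapping()
        doc.AddFieldMappingsAt("title", title)
        im.DefaultMapping = doc
        return im
    }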
Marty Schoch
1788a03803 remove junk from end of scorch readme 2018-01-06 21:09:53 -05:00
Marty Schoch
e756c7acf0 add initial support for async error callback 2018-01-05 16:43:16 -05:00
Marty Schoch
6237479605 fix race condition in setting up event callbacks
the previous approach used a SetEventCallback method which allowed
you to change the callback; unfortunately that also included
times after the goroutines were started and potentially firing
the callback.

checking a lock on this would be too expensive, so instead we go
for an approach that allows callbacks to be registered by name
during process init().  then, upon opening up an index, a string
config key 'eventCallbackName' is used to look up the
appropriate callback function.  also, since this string config
name is serializable, it fits into the existing bleve index
metadata without any new issues.
2018-01-05 13:46:03 -05:00
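A rough sketch of the flow described above: register a named callback during init(), then reference it through the 'eventCallbackName' config key when opening the index. RegisterEventCallbackName and the Event type reflect my reading of the commit; treat the exact signatures as assumptions.

    import (
        "log"

        "github.com/blevesearch/bleve"
        "github.com/blevesearch/bleve/index/scorch"
    )

    func init() {
        // registered once, before any index goroutines exist, so there is
        // no window where the callback changes while events are firing
        scorch.RegisterEventCallbackName("myAppEvents", func(e scorch.Event) {
            log.Printf("scorch event kind=%v took=%v", e.Kind, e.Duration)
        })
    }

    func openIndex(path string) (bleve.Index, error) {
        kvconfig := map[string]interface{}{
            // serializable string, so it fits in the existing index metadata
            "eventCallbackName": "myAppEvents",
        }
        return bleve.NewUsing(path, bleve.NewIndexMapping(), scorch.Name, scorch.Name, kvconfig)
    }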
Marty Schoch
57a075afdb improving command-line tool for scorch 2018-01-05 11:50:07 -05:00
Marty Schoch
c691cd2bb5 refactor scorch/zap command-line tools under bleve
zap command-line tool added to main bleve command-line tool
this required physical relocation due to the vendoring used
only on the bleve command-line tool (an unforeseen limitation)

a new scorch command-line tool has also been introduced
and for the same reasons it is physically stored under
the top-level bleve command-line tool as well
2018-01-05 10:17:18 -05:00
Abhinav Dangeti
dee1dd9bc8
Merge pull request #720 from abhinavdangeti/scorch
Updated Rollback APIs
2018-01-04 14:51:33 -08:00
abhinavdangeti
111f0d0721 Updated Rollback APIs
New APIs:
+ RollbackPoints()
    - Retrieves the available list of rollback points: epoch+meta.
    - The application will need to check with the meta to decide
    on the rollback point.
+ Rollback()
    - API requires a rollback point identified by the first API.
    - Atomically & Durably rolls back the index to specified point,
    provided the specified rollback point is still available.
+ Unit test: TestIndexRollback
    - Writes a batch.
    - Sets the rollback point.
    - Writes second batch.
    - Rollback to previously decided point.
    - Ensure that data is as is before the second batch.
2018-01-04 13:21:58 -08:00
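A hedged usage sketch of the two APIs above, assuming they hang off the scorch index type and that RollbackPoints() returns points newest-first (both assumptions; only the API names come from the commit message):

    import "github.com/blevesearch/bleve/index/scorch"

    // Sketch: roll back one persisted snapshot, if one is available.
    func rollbackOne(s *scorch.Scorch) error {
        points, err := s.RollbackPoints() // epoch + meta per point (assumed shape)
        if err != nil {
            return err
        }
        if len(points) < 2 {
            return nil // nothing older than the current state to roll back to
        }
        // A real application would inspect each point's meta before choosing.
        return s.Rollback(points[1]) // atomic & durable, if the point still exists
    }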
Marty Schoch
71cdac785d
Merge pull request #703 from sreekanth-cb/docValue_persisted
docValue persist changes
2018-01-04 10:34:58 -05:00
Sreekanth Sivasankaran
71a726bbf6 perf issue was due to duplicate fieldIDs getting
inserted to the list of dv enabled fields list -
DocValueFields in mem segment.
Moved back to the original type `DocValueFields map[uint16]bool`
for easy look up to check whether the fieldID is
configured for dv storage.
2018-01-04 15:34:55 +05:30
Sreekanth Sivasankaran
f42ecb0ac7 docvalue "zap-path" cmd to print out the dv disk sizes 2018-01-04 13:58:51 +05:30
Marty Schoch
1a59a1bb99 attempt to fix core reference counting issues
Observed problem:

Persisted index state (in root bolt) would contain index snapshots which
pointed to index files that did not exist.

Debugging this uncovered two main problems:

1.  At the end of persisting a snapshot, the persister creates a new index
snapshot with the SAME epoch as the current root, only it replaces in-memory
segments with the new disk based ones.  This is problematic because reference
counting an index segment triggers "eligible for deletion".  And eligible for
deletion is keyed by epoch.  So having two separate instances going by the same
epoch is problematic.  Specifically, one of them gets to 0 before the other,
and we wrongly conclude it's eligible for deletion, when in fact the "other"
instance with same epoch is actually still in use.

To address this problem, we have modified the behavior of the persister.  Now,
upon completion of persistence, ONLY if new files were actually created do we
proceed to introduce a new snapshot.  AND, this new snapshot now gets its own
brand new epoch.  BOTH of these are important because since the persister now
also introduces a new epoch, it will see this epoch again in the future AND be
expected to persist it.  That is OK (mostly harmless), but we cannot allow it
to form a loop.  Checking that new files were actually introduced is what
short-circuits the potential loop.  The new epoch introduced by the persister,
if seen again will not have any new segments that actually need persisting to
disk, and the cycle is stopped.

2.  The implementation of NumSnapshotsToKeep, and related code to deleted old
snapshots from the root bolt also contains problems.  Specifically, the
determination of which snapshots to keep vs delete did not consider which ones
were actually persisted.  So, let's say you had set NumSnapshotsToKeep to 3, if
the introducer gets 3 snapshots ahead of the persister, what can happen is that
the three snapshots we choose to keep are all in memory.  We now wrongly delete
all of the snapshots from the root bolt.  But it gets worse, in this instant of
time, we now have files on disk that nothing in the root bolt points to, so we
also go ahead and delete those files.  Those files were still being referenced
by the in-memory snapshots.  But, now even if they get persisted to disk, they
simply have references to non-existent files.  Opening up one of these indexes
results in lost data (often everything).

To address this problem, we made large change to the way this section of code
operates.  First, we now start with a list of all epochs actually persisted in
the root bolt.  Second, we set aside NumSnapshotsToKeep of these snapshots to
keep.  Third, anything else in the eligibleForRemoval list will be deleted.  I
suspect this code is slower and less elegant, but I think it is more correct.
Also, NumSnapshotsToKeep previously defaulted to 0; I have now defaulted it to
1, which feels like saner out-of-the-box behavior (it's debatable whether the
original intent was perhaps instead for "extra" snapshots to keep, but with the
variable named as it is, 1 makes more sense to me).

Other minor changes included in this change:

- The 'nextSnapshotEpoch', 'eligibleForRemoval', and 'ineligibleForRemoval'
members of the Scorch struct were moved into the paragraph with 'rootLock' to
clarify that you must hold the lock to access them.

- TestBatchRaceBug260 was updated to properly Close() the index; not doing so
leads to occasional test failures.
2018-01-03 12:05:00 -05:00
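The keep-vs-delete selection described in item 2 above, reduced to a sketch (illustrative names; the real code works against the root bolt and the eligibleForRemoval list):

    import "sort"

    // keepAndDrop starts from the epochs actually persisted in the root bolt,
    // keeps the newest numToKeep of them, and returns the rest as candidates
    // for removal.
    func keepAndDrop(persistedEpochs []uint64, numToKeep int) (keep, drop []uint64) {
        sorted := append([]uint64(nil), persistedEpochs...)
        sort.Slice(sorted, func(i, j int) bool { return sorted[i] > sorted[j] })
        if len(sorted) <= numToKeep {
            return sorted, nil
        }
        return sorted[:numToKeep], sorted[numToKeep:]
    }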
Sreekanth Sivasankaran
448201243a removed redundant buf writer, and checks 2017-12-30 16:54:06 +05:30
Sreekanth Sivasankaran
61ba81e964 Merge branch 'scorch', remote-tracking branch 'origin' into docValue_persisted 2017-12-30 16:52:51 +05:30
Marty Schoch
29b63cfe43
Merge pull request #711 from abhinavdangeti/scorch3
Tracking memory consumption for a scorch index
2017-12-29 12:52:32 -08:00
abhinavdangeti
5c26f5a86d Tracking memory consumption for a scorch index
+ Track memory usage at a segment level
+ Add a new scorch API: MemoryUsed()
    - Aggregate the memory consumption across
      segments when API is invoked.

+ TODO:
    - Revisit the second iteration to see if it can be gotten
      rid of, and the size accounted for during the first
      run while building an in-mem segment.
    - Accounting for pointer and slice overhead.
2017-12-29 10:20:11 -07:00
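A simplified sketch of the aggregation described above (types and field names here are illustrative, not the real scorch internals):

    import "sync"

    type segmentSnapshot struct{ sizeInBytes uint64 }

    type indexSnapshot struct{ segments []*segmentSnapshot }

    type scorchIndex struct {
        rootLock sync.RWMutex
        root     *indexSnapshot
    }

    // MemoryUsed sums the per-segment sizes when invoked, rather than
    // maintaining a running total.
    func (s *scorchIndex) MemoryUsed() uint64 {
        s.rootLock.RLock()
        defer s.rootLock.RUnlock()
        var total uint64
        for _, seg := range s.root.segments {
            total += seg.sizeInBytes
        }
        return total
    }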
abhinavdangeti
055d3e12df Adding onEvent callback support for scorch
Event types:
- EventKindCloseStart
- EventKindClose
- EventKindMergerProgress
- EventKindPersisterProgress
- EventKindBatchIntroductionStart
- EventKindBatchIntroduction
2017-12-29 09:47:25 -07:00
Sreekanth Sivasankaran
c8df014c0c Updated readme, zap version, added new docvalue cmd,
fixed the footer and fields cmd,
interface name updated
2017-12-29 21:39:29 +05:30
abhinavdangeti
4bede84fd0 Wiring up missing stats for scorch
- updates, deletes, batches, errors
- term_searchers_started, term_searchers_finished
- num_plain_text_bytes_indexed
2017-12-28 14:07:58 -07:00
abhinavdangeti
becd4677cd Adding num_items_introduced, num_items_persisted stats
+ Adding new entries to the stats struct of scorch.
+ These stats are atomically incremented upon every segment
  introduction, and upon successful persistence.
2017-12-28 14:07:44 -07:00
Sreekanth Sivasankaran
8abac42796 errCheck fixes 2017-12-28 13:23:57 +05:30
Sreekanth Sivasankaran
0272451093 adding checks for robustness 2017-12-28 13:05:25 +05:30
Sreekanth Sivasankaran
76f827f469 docValue persist changes
docValues are persisted along with the index,
in a columnar fashion per field with variable
sized chunking for quick look up.
-naive chunk level caching is added per field
-data part inside a chunk is snappy compressed
-metaHeader inside the chunk indexes the dv values
 inside the uncompressed data part
-all the fields are docValue persisted in this iteration
2017-12-28 12:05:33 +05:30
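A rough illustration of reading back one chunk under the layout described above: a snappy-compressed data part plus a metaHeader of (docNum, offset) pairs indexing into the uncompressed data. The varint encoding of the metaHeader is an assumption for the sake of the example; this is not the actual zap decoder.

    import (
        "encoding/binary"

        "github.com/golang/snappy"
    )

    type metaEntry struct {
        docNum uint64
        offset uint64 // start of this doc's value within the uncompressed data
    }

    func decodeChunk(compressed, meta []byte) (map[uint64][]byte, error) {
        data, err := snappy.Decode(nil, compressed)
        if err != nil {
            return nil, err
        }
        var entries []metaEntry
        for len(meta) > 0 {
            docNum, n := binary.Uvarint(meta)
            if n <= 0 {
                break
            }
            meta = meta[n:]
            offset, m := binary.Uvarint(meta)
            if m <= 0 {
                break
            }
            meta = meta[m:]
            entries = append(entries, metaEntry{docNum: docNum, offset: offset})
        }
        out := make(map[uint64][]byte, len(entries))
        for i, e := range entries {
            end := uint64(len(data))
            if i+1 < len(entries) {
                end = entries[i+1].offset
            }
            out[e.docNum] = data[e.offset:end]
        }
        return out, nil
    }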
abhinavdangeti
dcabc267a0 Wait for rollback'ed snapshot to persist 2017-12-27 10:06:29 -07:00
Steve Yen
c7a342bc7d scorch conjuncts match phrase test passes
The conjunction searcher Advance() method now checks if its curr
doc-matches suffices before advancing them.
2017-12-23 09:19:40 -08:00
Steve Yen
a884f38bf6 scorch docInternalToNumber returns 0 on error 2017-12-21 16:44:31 -08:00
Steve Yen
67e0e5973b scorch mergeStoredAndRemap() memory reuse
In mergeStoredAndRemap(), instead of allocating new hashmaps for each
document, this commit reuses some arrays that are indexed by fieldId.
2017-12-20 15:18:22 -08:00
Steve Yen
c155255506 scorch optimize zap.Merge() to reuse some buffers 2017-12-20 14:59:53 -08:00
Steve Yen
ea4eb7301b scorch merger checks closeCh 2017-12-20 14:59:53 -08:00
Steve Yen
04ac9d5b1f scorch removeOldBoltSnapshots() deletes from correct bucket 2017-12-20 14:46:48 -08:00
Steve Yen
df6c8f4074 scorch added kvconfig unsafe_batch option
Added an option to the kvconfig JSON, called "unsafe_batch" (bool).
Default is false, so Batch() calls are synchronously persisted by
default.  Advanced users may want unsafe, asynchronous persistence
to trade off safety for performance (mutations are queryable sooner).

    {
      "index_type": "scorch",
      "kvconfig": { "unsafe_batch": true }
    }

This change replaces the previous kvstore=="moss" workaround.
2017-12-20 10:11:55 -08:00
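Programmatically, the same kvconfig can be passed through bleve.NewUsing; a small sketch:

    import (
        "github.com/blevesearch/bleve"
        "github.com/blevesearch/bleve/index/scorch"
    )

    func newUnsafeScorchIndex(path string) (bleve.Index, error) {
        kvconfig := map[string]interface{}{
            // Batch() no longer waits for persistence; faster, but recent
            // mutations can be lost on a crash
            "unsafe_batch": true,
        }
        return bleve.NewUsing(path, bleve.NewIndexMapping(), scorch.Name, scorch.Name, kvconfig)
    }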
Steve Yen
1abbfadf0d scorch simplify err check after vellum load 2017-12-19 22:34:39 -08:00
Steve Yen
dbc88cf6b3 scorch docNumberToBytes() checks cap(buf) before allocating
With more pprof focusing (zooming in on a particular func), there were
still some memory allocations showing up with docNumberToBytes() in
micro benchmarks of bleve-query.  On a dev macbook, on an index of 50K
wikipedia docs, using search of relatively common "text:date"...

   400 qps - upsidedown/moss
   680 qps - scorch before
   775 qps - scorch after
2017-12-19 19:15:19 -08:00
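The cap-check pattern the commit describes, roughly (a sketch of the idea, not necessarily the exact scorch code):

    import "encoding/binary"

    // Reuse the caller-provided buffer when its capacity allows; only
    // allocate when it does not.
    func docNumberToBytes(buf []byte, in uint64) []byte {
        if cap(buf) >= 8 {
            buf = buf[0:8]
        } else {
            buf = make([]byte, 8)
        }
        binary.BigEndian.PutUint64(buf, in)
        return buf
    }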
Steve Yen
8f8333e01b scorch optimize zap Count()
This proposed approach avoids building a temporary AndNot() bitmap,
following the same kind of optimization used by mem segments.
2017-12-19 18:02:27 -08:00
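Sketched with the roaring API, the optimization amounts to counting without materializing the temporary bitmap (variable names illustrative):

    import "github.com/RoaringBitmap/roaring"

    // countLive is equivalent to roaring.AndNot(postings, deleted).GetCardinality(),
    // but avoids allocating the intermediate AndNot bitmap.
    func countLive(postings, deleted *roaring.Bitmap) uint64 {
        return postings.GetCardinality() - postings.AndCardinality(deleted)
    }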
Steve Yen
a0556ad65b scorch added more cases to TestIndexInsertThenDelete 2017-12-19 16:41:56 -08:00
Steve Yen
142ccdfaec scorch remove leftover doc comment
I'm suspecting that Marty's editor is more exciting than mine. :-)
2017-12-19 13:53:04 -08:00
Steve Yen
c0e09d8906
Merge pull request #676 from steveyen/scorch
scorch avoid extra clone by using roaring.AndNot(x, y)
2017-12-19 13:52:40 -08:00
Steve Yen
f8b52f5e68
Merge pull request #674 from abhinavdangeti/scorch
scorch APIs to support rollback
2017-12-19 13:38:47 -08:00
Steve Yen
d0e4f85026 scorch avoid extra clone by using roaring.AndNot(x, y) 2017-12-19 13:37:04 -08:00
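The before/after of the clone avoidance, in miniature:

    import "github.com/RoaringBitmap/roaring"

    func liveDocs(postings, deleted *roaring.Bitmap) *roaring.Bitmap {
        // before: result := postings.Clone(); result.AndNot(deleted)
        // after: build the result directly, skipping the clone
        return roaring.AndNot(postings, deleted)
    }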
abhinavdangeti
679f1ce9c3 scorch APIs to support rollback
- PreviousPersistedSnapshot
- SnapshotRevert

+ unit test
2017-12-19 10:53:08 -08:00
Steve Yen
f6b506134b import couchbase/vellum instead of couchbaselabs/vellum
Also, scrubbed an old couchbaselabs/moss reference in comments.

Also, go fmt.
2017-12-19 10:49:57 -08:00
Steve Yen
730d906a50 scorch reuses Posting instance in PostingsIterator.Next()
With this change, there are no more memory allocations in the calls to
PostingsIterator.Next() in the micro benchmarks of bleve-query.  On a
dev macbook, on an index of 50K wikipedia docs, using high frequency
search of "text:date"...

   400 qps - upsidedown/moss
   565 qps - scorch before
   680 qps - scorch after
2017-12-18 16:15:38 -08:00
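A generic sketch of the reuse pattern (the real Posting and PostingsIterator carry more state; this only shows the allocation-avoidance idea):

    type Posting struct {
        docNum uint64
        freq   uint64
    }

    type PostingsIterator struct {
        docNums []uint64 // stand-in for the real postings source
        pos     int
        next    Posting // reused across Next() calls
    }

    // Next returns a pointer to the iterator-owned Posting, so steady-state
    // iteration performs no allocations.  Callers must copy the value if they
    // need it beyond the following Next() call.
    func (it *PostingsIterator) Next() *Posting {
        if it.pos >= len(it.docNums) {
            return nil
        }
        it.next = Posting{} // reset the reused value
        it.next.docNum = it.docNums[it.pos]
        it.next.freq = 1
        it.pos++
        return &it.next
    }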
Steve Yen
867bb2c031 scorch mergeplan explicitly weeds out empty segments
Rather than waiting on scoring to weed out empty segments, this commit
does the weeding out of empty segments explicitly and up front.
2017-12-18 11:33:19 -08:00
Steve Yen
20fe70770a scorch added some tests on # of expected segments 2017-12-17 12:39:15 -08:00
Steve Yen
34f5e2175f scorch fix persister for lost notifications on no-data batches
With the previous commit, there can be a scenario where batches that
had internal-updates-only can be rapidly introduced by the app, but
the persisted notifications on only the very last IndexSnapshot would
be fired.  The persisted notifications on the in-between batches might
be missed.

The solution was to track the persisted notification channels at a
higher Scorch struct level, instead of tracking the persisted channels
at the IndexSnapshot and SegmentSnapshot levels.

Also, the persister double-check looping was simplified, which avoids
a race where an introducer might incorrectly not notify the persister.
2017-12-17 12:30:05 -08:00
Steve Yen
ecbb3d2df4 scorch handles non-updating batches better
This commit improves handling when an incoming batch has internal-data
updates only and no doc updates.  In this case, a nil segment instead
of an empty segment instance is used in the segmentIntroduction.  The
segmentIntroduction, that is, might now hold only internal-data
updates.

To handle synchronous persistence, a new field that's a slice of
persisted notification channels is added to the IndexSnapshot struct,
which the persister goroutine will close as each IndexSnapshot is
persisted.

Also, as part of this change, instead of checking the unsafeBatch flag
in several places, we now check for non-nil'ness of these
persisted chan's.
2017-12-17 08:51:23 -08:00
Steve Yen
e98602600d scorch mergeplan added TierGrowth option
Previously, CalcBudget() was treating
MergePlanOptions.SegmentsPerMergeTask as the growth factor while
computing the idealized staircase of segments.

This change introduces a TierGrowth option to MergePlanOptions for
more control and so that SegmentsPerMergeTask can be tweaked
independently of the tier growth factor.
2017-12-16 14:22:15 -08:00
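An illustrative (not actual mergeplan) version of the staircase computation, showing how the tier growth factor and segments-per-merge-task can now be tuned independently:

    // calcBudget estimates how many segments an index of totalSize bytes is
    // allowed to have: each tier holds up to segmentsPerTier segments, and the
    // per-segment size of the next tier grows by tierGrowth.
    func calcBudget(totalSize, floorSegmentSize int64, segmentsPerTier int, tierGrowth float64) int {
        if floorSegmentSize < 1 {
            floorSegmentSize = 1
        }
        if segmentsPerTier < 1 {
            segmentsPerTier = 1
        }
        budget := 0
        tierSize := floorSegmentSize
        for remaining := totalSize; remaining > 0; {
            budget += segmentsPerTier
            remaining -= int64(segmentsPerTier) * tierSize
            tierSize = int64(float64(tierSize) * tierGrowth)
            if tierSize < floorSegmentSize {
                tierSize = floorSegmentSize
            }
        }
        return budget
    }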
Steve Yen
0539744e90 scorch mergeplan.ToBarChart() refactored to callable API
Refactored out API so it's usable from other places.
2017-12-16 08:39:10 -08:00
Steve Yen
dc4df18001
Merge pull request #662 from steveyen/scorch
scorch mergeplan package comments tweak
2017-12-15 18:41:20 -08:00
Marty Schoch
a575be4d56 fix issue where we incorrectly seed the nextSegmentID on Open() 2017-12-15 19:26:23 -05:00
Steve Yen
45c212a0c2 scorch mergeplan package comments tweak
Moving the package comment for mergeplan to the right place.
2017-12-15 13:25:39 -08:00
Steve Yen
620dcdb6f8 scorch uses prealloc'ed buffer for docNumberToBytes()
On a couple of micro benchmarks on a dev macbook using bleve-query on
an index of 50K wikipedia docs, scorch is now faster than
upsidedown/moss on high-freq term search "text:date"...

       400 qps - upsidedown/moss
       404 qps - scorch before
       565 qps - scorch after
2017-12-15 11:58:21 -08:00
Steve Yen
f05794c6aa scorch removed worker goroutines from TermFieldReader()
On a couple of micro benchmarks on a dev macbook using bleve-query on
an index of 50K wikipedia docs, scorch is now much more in the same
neighborhood as upsidedown/moss...

high-freq term search "text:date"...
   400 qps - upsidedown/moss
   360 qps - scorch before
   404 qps - scorch after

zero-freq term search "text:mschoch"...
  100K qps - upsidedown/moss
   55K qps - scorch before
   99K qps - scorch after

Of note, the scorch index had ~150 *.zap files in it, which likely
made the worker goroutine overhead more costly than for a case
with few segments; goroutine and channel related work appeared
relatively prominently in the pprof SVGs.
2017-12-15 11:11:18 -08:00
Marty Schoch
562b473e36
Merge pull request #657 from steveyen/scorch
scorch fix data race w/ AddEligibleForRemoval
2017-12-14 17:56:06 -05:00
Marty Schoch
b5aa4ed22b return err not panic 2017-12-14 17:41:02 -05:00
Steve Yen
506aa1c325 scorch fix data race w/ AddEligibleForRemoval
Found from "go test -race ./..."

WARNING: DATA RACE
Read at 0x00c420088060 by goroutine 48:
  github.com/blevesearch/bleve/index/scorch.(*Scorch).AddEligibleForRemoval()
      /Users/steveyen/go/src/github.com/blevesearch/bleve/index/scorch/scorch.go:348 +0x6d

Previous write at 0x00c420088060 by goroutine 31:
  github.com/blevesearch/bleve/index/scorch.(*Scorch).loadFromBolt.func1()
      /Users/steveyen/go/src/github.com/blevesearch/bleve/index/scorch/persister.go:332 +0x87b
  github.com/boltdb/bolt.(*DB).View()
      /Users/steveyen/go/src/github.com/boltdb/bolt/db.go:629 +0xc1
  github.com/blevesearch/bleve/index/scorch.(*Scorch).loadFromBolt()
      /Users/steveyen/go/src/github.com/blevesearch/bleve/index/scorch/persister.go:290 +0xa1
  github.com/blevesearch/bleve/index/scorch.(*Scorch).Open()
      /Users/steveyen/go/src/github.com/blevesearch/bleve/index/scorch/scorch.go:121 +0x77f
  github.com/blevesearch/bleve/index/scorch.TestIndexOpenReopen()
      /Users/steveyen/go/src/github.com/blevesearch/bleve/index/scorch/scorch_test.go:115 +0x1351
  testing.tRunner()
      /usr/local/Cellar/go/1.9/libexec/src/testing/testing.go:746 +0x16c

Goroutine 48 (running) created at:
  github.com/blevesearch/bleve/index/scorch.(*IndexSnapshot).DecRef()
      /Users/steveyen/go/src/github.com/blevesearch/bleve/index/scorch/snapshot_index.go:72 +0x23e
  github.com/blevesearch/bleve/index/scorch.(*Scorch).loadFromBolt.func1()
      /Users/steveyen/go/src/github.com/blevesearch/bleve/index/scorch/persister.go:330 +0x8f4
  github.com/boltdb/bolt.(*DB).View()
      /Users/steveyen/go/src/github.com/boltdb/bolt/db.go:629 +0xc1
  github.com/blevesearch/bleve/index/scorch.(*Scorch).loadFromBolt()
      /Users/steveyen/go/src/github.com/blevesearch/bleve/index/scorch/persister.go:290 +0xa1
  github.com/blevesearch/bleve/index/scorch.(*Scorch).Open()
      /Users/steveyen/go/src/github.com/blevesearch/bleve/index/scorch/scorch.go:121 +0x77f
  github.com/blevesearch/bleve/index/scorch.TestIndexOpenReopen()
      /Users/steveyen/go/src/github.com/blevesearch/bleve/index/scorch/scorch_test.go:115 +0x1351
  testing.tRunner()
      /usr/local/Cellar/go/1.9/libexec/src/testing/testing.go:746 +0x16c
2017-12-14 14:40:33 -08:00
Marty Schoch
6ab27e4afa quick hack to disable safe batches in fts 2017-12-14 17:19:50 -05:00
Steve Yen
eb2f541d4f scorch filters _id from Reader.Document() results 2017-12-14 13:52:28 -08:00
Steve Yen
a8884e1011 scorch fix for TestSortMatchSearch
The cachedDocs preparation has to happen for all docs in the field,
not just on the currently requested docNum.

Also, as part of this commit, there's a loop optimization where we no
longer use bytes.Split() on the terms buffer, thus avoiding garbage
creation.
2017-12-14 13:22:13 -08:00
Steve Yen
2be5eb4427 scorch tracks zap files that can't be removed yet
A race & solution found by Marty Schoch... consider a case when the
merger might grab a nextSegmentID, like 4, but takes a while to
complete.  Meanwhile, the persister grabs the nextSegmentID of 5, but
finishes its persistence work fast, and then loops to cleanup any old
files.  The simple approach of checking a "highest segment ID" of 5 is
wrong now, because the deleter now thinks that segment 4's zap file is
(incorrectly) ok to delete.

The solution in this commit is to track an ephemeral map of filenames
which are ineligibleForRemoval, because they're still being written
(by the merger) and haven't been fully incorporated into the rootBolt
yet.

The merger adds to that ineligibleForRemoval map as it starts a merged
zap file, the persister cleans up entries from that map when it
persists zap filenames into the rootBolt, and the deleter (part of the
persister's loop) consults the map before performing any actual zap
file deletions.
2017-12-14 10:49:33 -08:00
Marty Schoch
bd742caf65 don't try to close a nil segment if err opening 2017-12-14 10:29:19 -05:00
Marty Schoch
149a26b5c1 merge deletion and cacheddocs fixes discussed in meeting 2017-12-14 10:27:39 -05:00
Sreekanth Sivasankaran
95b65ade3e getting right internalID for doc in UT 2017-12-14 17:16:47 +05:30
Sreekanth Sivasankaran
1066ee7d22 DocumentVisitFieldTerms Scorch implementation level1 2017-12-14 12:38:29 +05:30
Marty Schoch
2b92e5ff99
Merge pull request #653 from steveyen/scorch
scorch cleanup of the rootBolt of old snapshots
2017-12-13 22:47:14 -05:00
Marty Schoch
e1b0c61e2a fix bug in handling iterator-done 2017-12-13 22:08:06 -05:00
Steve Yen
b7dff6669f scorch cleanup of *.zap files not listed in the rootBolt 2017-12-13 17:09:50 -08:00
Steve Yen
c0cc46a2be scorch cleanup of the rootBolt of old snapshots
A new global variable, NumSnapshotsToKeep, represents the number of old
snapshots that each scorch instance should maintain -- 0 is the
default.  Apps that need rollback'ability may want to increase
this value in early initialization.

The Scorch.eligibleForRemoval field tracks epochs which are safe to
delete from the rootBolt.  The eligibleForRemoval is appended to
whenever the ref-count on an IndexSnapshot drops to 0.

On startup, eligibleForRemoval is also initialized with any older
epochs found in the rootBolt.

The newly introduced Scorch.removeOldSnapshots() method is called on
every cycle of the persisterLoop(), where it keeps the
eligibleForRemoval slice under the size defined by
NumSnapshotsToKeep.

A future commit will remove actual storage files in order to match the
"source of truth" information found in the rootBolt.
2017-12-13 15:53:31 -08:00
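Since NumSnapshotsToKeep is described as a package-level variable, apps that want rollback'ability would presumably bump it during early initialization, something like:

    import "github.com/blevesearch/bleve/index/scorch"

    func init() {
        // keep a few extra persisted snapshots around so there is something
        // to roll back to (0 is the default described above)
        scorch.NumSnapshotsToKeep = 3
    }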
Steve Yen
c13ff85aaf scorch ref-counting
Future commits will provide actual cleanup when ref-counts reach 0.
2017-12-13 14:48:07 -08:00
Marty Schoch
50471003dc basic refactoring of introducer to make it more readable 2017-12-13 16:30:39 -05:00
Marty Schoch
a0e12b2640 add license to a few files missing it 2017-12-13 16:12:29 -05:00
Marty Schoch
85e15628ee major refactoring of posting details 2017-12-13 16:10:06 -05:00
Marty Schoch
6e2207c445 additional refactoring of build/merge 2017-12-13 15:22:13 -05:00
Marty Schoch
50441e5065 refactor to reuse shared code 2017-12-13 14:41:20 -05:00
Marty Schoch
289dc398bd more refactoring of build/merge 2017-12-13 14:26:11 -05:00
Marty Schoch
1cd3fd7fbe extract common functionality between build/merge 2017-12-13 14:06:54 -05:00
Marty Schoch
cd45487cb3 fsync rootBolt when persisting snapshot 2017-12-13 13:55:06 -05:00
Marty Schoch
f83c9f2a20 initial cut of merger that actually introduces changes 2017-12-13 13:41:03 -05:00
Marty Schoch
c15c3c11cd extra protection if dict address is 0 (empty segment) 2017-12-13 13:31:18 -05:00
Steve Yen
be7dd36ac6 mergeplan: more tests and bargraph tweaks 2017-12-12 10:37:27 -08:00
Steve Yen
59a1e26300 mergeplan: scoring implemented 2017-12-12 10:37:27 -08:00
Marty Schoch
57121e40a8 fix issues identified by errcheck 2017-12-12 11:41:14 -05:00
Marty Schoch
665c3c80ff initial cut of zap segment merging 2017-12-12 11:21:55 -05:00
Marty Schoch
927216df8c fix postings list count impl 2017-12-12 08:42:13 -05:00
Steve Yen
3461fb741f mergeplan: a placeholder planner that merges all segments
A stepping stone to fleshing out the API contract.
2017-12-11 14:53:08 -08:00
Marty Schoch
58ef21a88a fix golint issue 2017-12-11 16:24:46 -05:00
Marty Schoch
f246e0e4c0 update README for zap file format changes 2017-12-11 16:22:29 -05:00
Marty Schoch
74b2eeb14d refactor where we do some work so we can return error 2017-12-11 15:59:36 -05:00
Marty Schoch
f13b786609 fix up issues to get all bleve unit tests passing for scorch
make scorch default
2017-12-11 15:47:41 -05:00
Marty Schoch
d7eb223e14 remove bolt segment format
upcoming breaking changes and no desire to maintain it
2017-12-11 10:20:26 -05:00
Marty Schoch
eada7b209b fix test issue identified by sreekanth 2017-12-11 10:16:56 -05:00
Marty Schoch
8280859bb8 handle read-only and in-mem only cases 2017-12-11 09:07:01 -05:00
Marty Schoch
e8cc7ac0bf add new fields command to zap cmd-line util 2017-12-11 09:05:50 -05:00
Marty Schoch
690cd39921 add crazy slow but functional DocumentVisitFieldTerms 2017-12-10 08:55:59 -05:00
Marty Schoch
dc0adc8827 add fsync 2017-12-09 20:52:01 -05:00
Marty Schoch
e0d9828cd0 add more detail to the readme 2017-12-09 14:42:36 -05:00
Marty Schoch
414899618b switch from bolt format to zap in the persister 2017-12-09 14:28:50 -05:00
Marty Schoch
9781d9b089 add initial version of zap file format 2017-12-09 14:28:33 -05:00
Marty Schoch
ff2e6b98e4 added empty segment 2017-12-09 12:43:02 -05:00
Marty Schoch
e470105635 fix issues identified by errcheck 2017-12-06 18:36:14 -05:00
Marty Schoch
adac4f41db initial version of scorch which persists index to disk 2017-12-06 18:33:47 -05:00
Marty Schoch
b1346b4c8a add readme describing our use of bolt as a segment format 2017-12-05 16:09:00 -05:00
Marty Schoch
898a6b1e85 fix errcheck issues 2017-12-05 13:32:57 -05:00
Marty Schoch
ece27ef215 adding initial version of bolt persisted segment 2017-12-05 13:05:12 -05:00
Marty Schoch
f6be841668 add test for postings list count method 2017-12-05 13:01:36 -05:00
Marty Schoch
30e9d6daa5 add better testing of array positions 2017-12-05 12:54:44 -05:00
Marty Schoch
8d9d45115f add test of location field 2017-12-05 12:20:06 -05:00
Marty Schoch
8f0350865b add test for segment fields method 2017-12-05 12:17:56 -05:00
Marty Schoch
7a6b5483f2 add validation that all locations were seen 2017-12-05 11:58:05 -05:00
Marty Schoch
e08fdab54a remove todo item 2017-12-05 10:13:27 -05:00
Marty Schoch
87e2627551 added dictionary tests to mem segment 2017-12-05 09:49:41 -05:00
Marty Schoch
ed067f45dd added Close() method to Segment 2017-12-05 09:31:02 -05:00
Marty Schoch
22ffc8940e update segment API to return error in key places 2017-12-04 18:06:06 -05:00
Marty Schoch
b74cf4b081 add copyright header to all new files in scorch 2017-12-01 15:42:50 -05:00
Marty Schoch
89aa02cf5b fix highlighting of composite fields
updated log statements for refactored names
2017-12-01 15:12:08 -05:00
Marty Schoch
cff14f1212 fix crash in DocNumbers when segment is empty 2017-12-01 09:50:27 -05:00
Marty Schoch
eb256f78bc switch to constant referring to id field id 0
this avoids potentially mutating something that is intended
to be immutable
2017-12-01 09:30:07 -05:00
Marty Schoch
7c964de8bf switch to binary search for finding segment from global doc num
added unit tests for this function specifically
2017-12-01 09:26:51 -05:00
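A sketch of the binary-search lookup; the cumulative-offset layout here is assumed for illustration:

    import "sort"

    // segmentForDocNum returns the index of the segment containing the given
    // global doc number, where offsets[i] holds the first global doc number
    // of segment i+1 (i.e. cumulative doc counts).
    func segmentForDocNum(docNum uint64, offsets []uint64) int {
        return sort.Search(len(offsets), func(i int) bool {
            return offsets[i] > docNum
        })
    }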
Marty Schoch
c2047dcdf9 refactor doc id reader creation to share more code
fix issue identified by steve
2017-12-01 08:54:39 -05:00
Marty Schoch
bcd4bdc3d1 added initial bolt thought to README 2017-12-01 07:27:04 -05:00
Marty Schoch
395458ce83 refactor to make mem segment contents exported 2017-12-01 07:26:47 -05:00
Steve Yen
398dcb19b3 scorch introducer uses the roaring.Or(x, y) API
Instead of cloning an input bitmap, the roaring.Or(x, y)
implementation fills a brand new result bitmap, which should allow
for more efficient packing and memory utilization.
2017-11-30 10:37:10 -08:00
Steve Yen
67986d41bf scorch InternalID() handles case of unknown docId 2017-11-30 08:36:01 -08:00
Marty Schoch
848aca4639 fix issues identified by errcheck 2017-11-29 13:34:15 -05:00
Marty Schoch
23f6dc1cc6 working in-memory version 2017-11-29 11:33:35 -05:00
Steve Yen
546700b2de fix comment typo 2017-08-24 16:25:10 -07:00
Marty Schoch
cea119449e fix data race in doc id search
the implementation of the doc id search requires that the list
of ids be sorted.  however, when doing a multisearch across
many indexes at once, the list of doc ids in the query is shared.
deeper in the implementation, the search of each shard attempts
to sort this list, resulting in a data race.

this is one example of a potentially larger problem, however
it has been decided to fix this data race, even though larger
issues of data ownership may remain unresolved.

this fix makes a copy of the list of doc ids, just prior to
sorting the list.  subsequently, all use of the list is on the
copy that was made, not the original.

fixes #518
2017-08-07 15:11:35 -04:00
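The fix in miniature: sort a private copy of the shared doc id list, never the original:

    import "sort"

    func sortedCopy(ids []string) []string {
        out := make([]string, len(ids))
        copy(out, ids)
        sort.Strings(out) // the shared input slice is left untouched
        return out
    }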
abhinavdangeti
8ec88a6cb0 MB-24560: Add moss store|collection histograms to stats 2017-05-25 16:32:36 -07:00
Marty Schoch
3ad13236ec fix geopoint fields to be able to be stored and retrieved 2017-03-31 09:40:54 -04:00
Marty Schoch
74140d4f2b remove forestdb from bleve 2017-03-30 12:27:23 -04:00
Marty Schoch
1bcfe4efa1 Merge pull request #546 from sreekanth-cb/store_abort_close
Store abort close
2017-03-07 12:35:18 -05:00
Sreekanth Sivasankaran
f759d841c2 Adding guards for config casting. 2017-03-07 22:51:27 +05:30
Sreekanth Sivasankaran
e88ff3c60a Merge branch 'store_abort_close' of https://github.com/sreekanth-cb/bleve into store_abort_close
Syntax change for errcheck tool
2017-03-07 19:56:08 +05:30
Sreekanth Sivasankaran
ee819f5950 MB-22410 - Configurable forced Store Abort API
Adding a configurable forced store close
Bumping the moss store version
2017-03-07 19:33:51 +05:30
Marty Schoch
0eba2a3f0c reduce garbage created while processing facets
previously we parsed/returned large sections of the document's
back index row in order to compute facet information.  this
would require parsing the protobuf of the entire back index row.
unfortunately this creates considerable garbage.

this new version introduces a visitor/callback approach to
working with data inside the back index row.  the benefit
of this approach is that we can let the higher-level code
see values, prior to any copies of data being made or
intermediate garbage being created.  implementations of
the callback must copy any value which they would like to
retain beyond the callback.

NOTE: this approach duplicates code from the
automatically generated protobuf code

NOTE: this approach assumes that the "field" field is serialized
before the "terms" field.  This is guaranteed by our currently
generated protobuf encoder, and is recommended by the protobuf
spec.  But, decoders SHOULD support them occurring in any order,
which we do not.
2017-03-02 17:00:46 -05:00
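A toy version of the visitor/callback shape described above (not the real upsidedown decoder); the key contract is that values are only valid during the callback, so anything retained must be copied out:

    // termsVisitor is invoked once per field as a back index row is decoded;
    // terms is only valid for the duration of the call.
    type termsVisitor func(field uint16, terms []string)

    // visitRow stands in for the incremental protobuf walk over the row.
    func visitRow(row map[uint16][]string, visit termsVisitor) {
        for field, terms := range row {
            visit(field, terms)
        }
    }

    // termsForField shows the copy-on-retain contract from the caller's side.
    func termsForField(row map[uint16][]string, fieldID uint16) []string {
        var kept []string
        visitRow(row, func(field uint16, terms []string) {
            if field == fieldID {
                kept = append(kept, terms...) // copy what we keep past the callback
            }
        })
        return kept
    }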
Marty Schoch
b04745abcc remove smolder indexing scheme
this was an experiment that we're no longer working on.
we learned from it, but carrying it forward now has
a maintenance burden we don't wish to pay
2017-03-01 14:38:17 -05:00
Sreekanth Sivasankaran
67a5814fbe MB-22410: deleting/editing index definition with large dirty write queue can be very slow
Adding a configurable forced store close
2017-03-01 18:58:32 +05:30
Sreekanth Sivasankaran
324e4237cf adding configurable Abort Close 2017-03-01 16:23:56 +05:30
Sundar Sridharan
74c7de0dcf re-order childSnapshot declaration 2017-02-21 15:54:04 -08:00
Sundar Sridharan
04d428656e Add Snapshot interface methods for moss child collections feature 2017-02-20 15:03:45 -08:00
Steve Yen
0b70a1bcb8 use inlined prealloc'ed termFreqRow in upsidedown termFieldReader 2017-02-08 18:23:13 -08:00
Steve Yen
31fecc3663 avoid row alloc's in upsidedown termFieldReader constructor 2017-02-08 18:14:30 -08:00
Marty Schoch
606fd6344b INDEX FORMAT CHANGE: change back index row value
Previously term entries were encoded pairwise (field/term), so
you'd have data like:

F1/T1 F1/T2 F1/T3 F2/T4 F3/T5

As you can see, even though field 1 has 3 terms, we repeat the F1
part in the encoded data.  This is a bit wasteful.

In the new format we encode it as a list of terms for each field:

F1/T1,T2,T3 F2/T4 F3/T5

When fields have multiple terms, this saves space.  In unit
tests there is no additional waste even in the case that a field
has only a single value.

Here are the results of an indexing test case (beer-search):

$ benchcmp indexing-before.txt indexing-after.txt
benchmark               old ns/op       new ns/op       delta
BenchmarkIndexing-4     11275835988     10745514321     -4.70%

benchmark               old allocs     new allocs     delta
BenchmarkIndexing-4     25230685       22480494       -10.90%

benchmark               old bytes      new bytes      delta
BenchmarkIndexing-4     4802816224     4741641856     -1.27%

And here are the results of a MatchAll search building a facet
on the "abv" field:

$ benchcmp facet-before.txt facet-after.txt
benchmark             old ns/op     new ns/op     delta
BenchmarkFacets-4     439762100     228064575     -48.14%

benchmark             old allocs     new allocs     delta
BenchmarkFacets-4     9460208        3723286        -60.64%

benchmark             old bytes     new bytes     delta
BenchmarkFacets-4     260784261     151746483     -41.81%

Although we expect the index to be smaller in many cases, the
beer-search index is about the same in this case.  However,
this may be due to the underlying storage (boltdb).

Finally, the index version was bumped from 5 to 7, since smolder
also used version 6, which could lead to some confusion.
2017-01-24 15:39:38 -05:00
Steve Yen
5927224e15 optimize mergeOldAndNew for case of first time a doc is seen 2017-01-09 22:48:58 -08:00
Steve Yen
790f2e3e32 optimize by alloc'ing arrays of TermFrequencyRow/TermVector 2017-01-09 22:42:00 -08:00
Steve Yen
8f4726ab10 use struct{}{} idiom instead of additional mark var 2017-01-09 10:17:26 -08:00
Steve Yen
302cac72c4 optimize mergeOldAndNew when non-update case 2017-01-08 17:59:49 -08:00