bleve

Author	SHA1	Message	Date
Marty Schoch	b5aa4ed22b	return err not panic	2017-12-14 17:41:02 -05:00
Steve Yen	506aa1c325	scorch fix data race w/ AddEligibleForRemoval Found from "go test -race ./..." WARNING: DATA RACE Read at 0x00c420088060 by goroutine 48: github.com/blevesearch/bleve/index/scorch.(Scorch).AddEligibleForRemoval() /Users/steveyen/go/src/github.com/blevesearch/bleve/index/scorch/scorch.go:348 +0x6d Previous write at 0x00c420088060 by goroutine 31: github.com/blevesearch/bleve/index/scorch.(Scorch).loadFromBolt.func1() /Users/steveyen/go/src/github.com/blevesearch/bleve/index/scorch/persister.go:332 +0x87b github.com/boltdb/bolt.(DB).View() /Users/steveyen/go/src/github.com/boltdb/bolt/db.go:629 +0xc1 github.com/blevesearch/bleve/index/scorch.(Scorch).loadFromBolt() /Users/steveyen/go/src/github.com/blevesearch/bleve/index/scorch/persister.go:290 +0xa1 github.com/blevesearch/bleve/index/scorch.(Scorch).Open() /Users/steveyen/go/src/github.com/blevesearch/bleve/index/scorch/scorch.go:121 +0x77f github.com/blevesearch/bleve/index/scorch.TestIndexOpenReopen() /Users/steveyen/go/src/github.com/blevesearch/bleve/index/scorch/scorch_test.go:115 +0x1351 testing.tRunner() /usr/local/Cellar/go/1.9/libexec/src/testing/testing.go:746 +0x16c Goroutine 48 (running) created at: github.com/blevesearch/bleve/index/scorch.(IndexSnapshot).DecRef() /Users/steveyen/go/src/github.com/blevesearch/bleve/index/scorch/snapshot_index.go:72 +0x23e github.com/blevesearch/bleve/index/scorch.(Scorch).loadFromBolt.func1() /Users/steveyen/go/src/github.com/blevesearch/bleve/index/scorch/persister.go:330 +0x8f4 github.com/boltdb/bolt.(DB).View() /Users/steveyen/go/src/github.com/boltdb/bolt/db.go:629 +0xc1 github.com/blevesearch/bleve/index/scorch.(Scorch).loadFromBolt() /Users/steveyen/go/src/github.com/blevesearch/bleve/index/scorch/persister.go:290 +0xa1 github.com/blevesearch/bleve/index/scorch.(Scorch).Open() /Users/steveyen/go/src/github.com/blevesearch/bleve/index/scorch/scorch.go:121 +0x77f github.com/blevesearch/bleve/index/scorch.TestIndexOpenReopen() /Users/steveyen/go/src/github.com/blevesearch/bleve/index/scorch/scorch_test.go:115 +0x1351 testing.tRunner() /usr/local/Cellar/go/1.9/libexec/src/testing/testing.go:746 +0x16c	2017-12-14 14:40:33 -08:00
Marty Schoch	6ab27e4afa	quick hack to disable safe batches in fts	2017-12-14 17:19:50 -05:00
Steve Yen	eb2f541d4f	scorch filters _id from Reader.Document() results	2017-12-14 13:52:28 -08:00
Steve Yen	a8884e1011	scorch fix for TestSortMatchSearch The cachedDocs preparation has to happen for all docs in the field, not just on the currently requested docNum. Also, as part of this commit, there's a loop optimization where we no longer use bytes.Split() on the terms buffer, thus avoiding garbage creation.	2017-12-14 13:22:13 -08:00
Steve Yen	2be5eb4427	scorch tracks zap files that can't be removed yet A race & solution found by Marty Schoch... consider a case when the merger might grab a nextSegmentID, like 4, but takes awhile to complete. Meanwhile, the persister grabs the nextSegmentID of 5, but finishes its persistence work fast, and then loops to cleanup any old files. The simple approach of checking a "highest segment ID" of 5 is wrong now, because the deleter now thinks that segment 4's zap file is (incorrectly) ok to delete. The solution in this commit is to track an ephemeral map of filenames which are ineligibleForRemoval, because they're still being written (by the merger) and haven't been fully incorporated into the rootBolt yet. The merger adds to that ineligibleForRemoval map as it starts a merged zap file, the persister cleans up entries from that map when it persists zap filenames into the rootBolt, and the deleter (part of the persister's loop) consults the map before performing any actual zap file deletions.	2017-12-14 10:49:33 -08:00
Marty Schoch	bd742caf65	don't try to close a nil segment if err opening	2017-12-14 10:29:19 -05:00
Marty Schoch	149a26b5c1	merge deletion and cacheddocs fixes discussed in meeting	2017-12-14 10:27:39 -05:00
Sreekanth Sivasankaran	95b65ade3e	getting right internalID for doc in UT	2017-12-14 17:16:47 +05:30
Sreekanth Sivasankaran	1066ee7d22	DocumentVisitFieldTerms Scorch implementation level1	2017-12-14 12:38:29 +05:30
Marty Schoch	2b92e5ff99	Merge pull request #653 from steveyen/scorch scorch cleanup of the rootBolt of old snapshots	2017-12-13 22:47:14 -05:00
Marty Schoch	e1b0c61e2a	fix bug in handling iterator-done	2017-12-13 22:08:06 -05:00
Steve Yen	b7dff6669f	scorch cleanup of *.zap files not listed in the rootBolt	2017-12-13 17:09:50 -08:00
Steve Yen	c0cc46a2be	scorch cleanup of the rootBolt of old snapshots A new global variable, NumSnapshotsToKeep, represents the default number of old snapshots that each scorch instance should maintain -- 0 is the default. Apps that need rollback'ability may want to increase this value in early initialization. The Scorch.eligibleForRemoval field tracks epoches which are safe to delete from the rootBolt. The eligibleForRemoval is appended to whenever the ref-count on an IndexSnapshot drops to 0. On startup, eligibleForRemoval is also initialized with any older epoch's found in the rootBolt. The newly introduced Scorch.removeOldSnapshots() method is called on every cycle of the persisterLoop(), where it maintains the eligibleForRemoval slice to under a size defined by the NumSnapshotsToKeep. A future commit will remove actual storage files in order to match the "source of truth" information found in the rootBolt.	2017-12-13 15:53:31 -08:00
Steve Yen	c13ff85aaf	scorch ref-counting Future commits will provide actual cleanup when ref-counts reach 0.	2017-12-13 14:48:07 -08:00
Marty Schoch	50471003dc	basic refactoring of introducer to make it more readable	2017-12-13 16:30:39 -05:00
Marty Schoch	a0e12b2640	add license to a few files missing it	2017-12-13 16:12:29 -05:00
Marty Schoch	85e15628ee	major refactoring of posting details	2017-12-13 16:10:06 -05:00
Marty Schoch	6e2207c445	additional refactoring of build/merge	2017-12-13 15:22:13 -05:00
Marty Schoch	50441e5065	refactor to reuse shared code	2017-12-13 14:41:20 -05:00
Marty Schoch	289dc398bd	more refacotring of build/merge	2017-12-13 14:26:11 -05:00
Marty Schoch	1cd3fd7fbe	extrac common functionality between build/merge	2017-12-13 14:06:54 -05:00
Marty Schoch	cd45487cb3	fsync rootBolt when persisting snapshot	2017-12-13 13:55:06 -05:00
Marty Schoch	f83c9f2a20	initial cut of merger that actually introduces changes	2017-12-13 13:41:03 -05:00
Marty Schoch	c15c3c11cd	extra protection if dict address is 0 (empty segment)	2017-12-13 13:31:18 -05:00
Steve Yen	be7dd36ac6	mergeplan: more tests and bargraph tweaks	2017-12-12 10:37:27 -08:00
Steve Yen	59a1e26300	mergeplan: scoring implemented	2017-12-12 10:37:27 -08:00
Marty Schoch	57121e40a8	fix issues identified by errcheck	2017-12-12 11:41:14 -05:00
Marty Schoch	665c3c80ff	initial cut of zap segment merging	2017-12-12 11:21:55 -05:00
Marty Schoch	927216df8c	fix postings list count impl	2017-12-12 08:42:13 -05:00
Steve Yen	3461fb741f	mergeplan: a placeholder planner that merges all segments A stepping stone to fleshing out the API contract.	2017-12-11 14:53:08 -08:00
Marty Schoch	58ef21a88a	fix golint issue	2017-12-11 16:24:46 -05:00
Marty Schoch	f246e0e4c0	update README for zap file format changes	2017-12-11 16:22:29 -05:00
Marty Schoch	74b2eeb14d	refactor where we do some work so we can return error	2017-12-11 15:59:36 -05:00
Marty Schoch	f13b786609	fix up issues to get all bleve unit tests passing for scorch make scorch default	2017-12-11 15:47:41 -05:00
Marty Schoch	d7eb223e14	remove bolt segment format upcomning breaking changes and no desire to maintain	2017-12-11 10:20:26 -05:00
Marty Schoch	eada7b209b	fix test issue identified by sreekanth	2017-12-11 10:16:56 -05:00
Marty Schoch	8280859bb8	handle read-only and in-mem only cases	2017-12-11 09:07:01 -05:00
Marty Schoch	e8cc7ac0bf	add new fields command to zap cmd-line util	2017-12-11 09:05:50 -05:00
Marty Schoch	690cd39921	add crazy slow but functional DocumentVisitFieldTerms	2017-12-10 08:55:59 -05:00
Marty Schoch	dc0adc8827	add fsync	2017-12-09 20:52:01 -05:00
Marty Schoch	e0d9828cd0	add more detail to the readme	2017-12-09 14:42:36 -05:00
Marty Schoch	414899618b	switch from bolt format to zap in the persister	2017-12-09 14:28:50 -05:00
Marty Schoch	9781d9b089	add initial version of zap file format	2017-12-09 14:28:33 -05:00
Marty Schoch	ff2e6b98e4	added empty segment	2017-12-09 12:43:02 -05:00
Marty Schoch	e470105635	fix issues identified by errcheck	2017-12-06 18:36:14 -05:00
Marty Schoch	adac4f41db	initial version of scorch which persists index to disk	2017-12-06 18:33:47 -05:00
Marty Schoch	b1346b4c8a	add readme describing our use of bolt as a segment format	2017-12-05 16:09:00 -05:00
Marty Schoch	898a6b1e85	fix errcheck issues	2017-12-05 13:32:57 -05:00
Marty Schoch	ece27ef215	adding initial version of bolt persisted segment	2017-12-05 13:05:12 -05:00
Marty Schoch	f6be841668	add test for postings list count method	2017-12-05 13:01:36 -05:00
Marty Schoch	30e9d6daa5	add better testing of array positions	2017-12-05 12:54:44 -05:00
Marty Schoch	8d9d45115f	add test of location field	2017-12-05 12:20:06 -05:00
Marty Schoch	8f0350865b	add test for segment fields method	2017-12-05 12:17:56 -05:00
Marty Schoch	7a6b5483f2	add validation that all locations were seen	2017-12-05 11:58:05 -05:00
Marty Schoch	e08fdab54a	remove todo item	2017-12-05 10:13:27 -05:00
Marty Schoch	87e2627551	added dictionary tests to mem segment	2017-12-05 09:49:41 -05:00
Marty Schoch	ed067f45dd	added Close() method to Segment	2017-12-05 09:31:02 -05:00
Marty Schoch	22ffc8940e	update segment API to return error in key places	2017-12-04 18:06:06 -05:00
Marty Schoch	b74cf4b081	add copyright header to all new files in scorch	2017-12-01 15:42:50 -05:00
Marty Schoch	89aa02cf5b	fix highlighting of composite fields updated log statements for refactored names	2017-12-01 15:12:08 -05:00
Marty Schoch	cff14f1212	fix crash in DocNumbers when segment is empty	2017-12-01 09:50:27 -05:00
Marty Schoch	eb256f78bc	switch to constant referring to id field id 0 this avoids potentially mutating something that is intended to be immutable	2017-12-01 09:30:07 -05:00
Marty Schoch	7c964de8bf	switch to binary search for finding segment from global doc num added unit tests for this function specifically	2017-12-01 09:26:51 -05:00
Marty Schoch	c2047dcdf9	refactor doc id reader creation to share more code fix issue identified by steve	2017-12-01 08:54:39 -05:00
Marty Schoch	bcd4bdc3d1	added initial bolt thought to README	2017-12-01 07:27:04 -05:00
Marty Schoch	395458ce83	refactor to make mem segment contents exported	2017-12-01 07:26:47 -05:00
Steve Yen	398dcb19b3	scorch introducer uses the roaring.Or(x, y) API Instead of cloning an input bitmap, the roaring.Or(x, y) implementation fills a brand new result bitmap, which should be allow for more efficient packing and memory utilization.	2017-11-30 10:37:10 -08:00
Steve Yen	67986d41bf	scorch InternalID() handles case of unknown docId	2017-11-30 08:36:01 -08:00
Marty Schoch	848aca4639	fix issues identified by errcheck	2017-11-29 13:34:15 -05:00
Marty Schoch	23f6dc1cc6	working in-memory version	2017-11-29 11:33:35 -05:00
Steve Yen	546700b2de	fix comment typo	2017-08-24 16:25:10 -07:00
Marty Schoch	cea119449e	fix data race in doc id search the implementation of the doc id search requires that the list of ids be sorted. however, when doing a multisearch across many indexes at once, the list of doc ids in the query is shared. deeper in the implementation, the search of each shard attempts to sort this list, resulting in a data race. this is one example of a potentially larger problem, however it has been decided to fix this data race, even though larger issues of data owernship may remain unresolved. this fix makes a copy of the list of doc ids, just prior to sorting the list. subsequently, all use of the list is on the copy that was made, not the original. fixes #518	2017-08-07 15:11:35 -04:00
abhinavdangeti	8ec88a6cb0	MB-24560: Add moss store\|collection histograms to stats	2017-05-25 16:32:36 -07:00
Marty Schoch	3ad13236ec	fix geopoint fields to be able to be stored and retrieved	2017-03-31 09:40:54 -04:00
Marty Schoch	74140d4f2b	remove forestdb from bleve	2017-03-30 12:27:23 -04:00
Marty Schoch	1bcfe4efa1	Merge pull request #546 from sreekanth-cb/store_abort_close Store abort close	2017-03-07 12:35:18 -05:00
Sreekanth Sivasankaran	f759d841c2	Adding guards for config casting.	2017-03-07 22:51:27 +05:30
Sreekanth Sivasankaran	e88ff3c60a	Merge branch 'store_abort_close' of https://github.com/sreekanth-cb/bleve into store_abort_close Syntax change for errcheck tool	2017-03-07 19:56:08 +05:30
Sreekanth Sivasankaran	ee819f5950	MB-22410 - Configurable forced Store Abort API Adding a configurable forced store close Bumping the moss store version	2017-03-07 19:33:51 +05:30
Marty Schoch	0eba2a3f0c	reduce garbage created while processing facets previously we parsed/returned large sections of the documents back index row in order to compute facet information. this would require parsing the protobuf of the entire back index row. unfortunately this creates considerable garbage. this new version introduces a visitor/callback approach to working with data inside the back index row. the benefit of this approach is that we can let the higher-level code see values, prior to any copies of data being made or intermediate garbage being created. implementations of the callback must copy any value which they would like to retain beyond the callback. NOTE: this approach is duplicates code from the automatically generated protobuf code NOTE: this approach assumes that the "field" field be serialized before the "terms" field. This is guaranteed by our currently generated protobuf encoder, and is recommended by the protobuf spec. But, decoders SHOULD support them occuring in any order, which we do not.	2017-03-02 17:00:46 -05:00
Marty Schoch	b04745abcc	remove smolder indexing scheme this was an experiment that we're no longer working on we learned from it, but now carrying it forward has a maintenance burden we don't wish to pay	2017-03-01 14:38:17 -05:00
Sreekanth Sivasankaran	67a5814fbe	MB-22410:deleting/editing index definition with large dirty write queue can be very slow Adding a configurable forced store close	2017-03-01 18:58:32 +05:30
Sreekanth Sivasankaran	324e4237cf	adding configurable Abort Close	2017-03-01 16:23:56 +05:30
Sundar Sridharan	74c7de0dcf	re-order childSnapshot declaration	2017-02-21 15:54:04 -08:00
Sundar Sridharan	04d428656e	Add Snapshot interface methods for moss child collections feature	2017-02-20 15:03:45 -08:00
Steve Yen	0b70a1bcb8	use inlined prealloc'ed termFreqRow in upsidedown termFieldReader	2017-02-08 18:23:13 -08:00
Steve Yen	31fecc3663	avoid row alloc's in upsidedown termFieldReader constructor	2017-02-08 18:14:30 -08:00
Marty Schoch	606fd6344b	INDEX FORMAT CHANGE: change back index row value Previously term entries were encoded pairwise (field/term), so you'd have data like: F1/T1 F1/T2 F1/T3 F2/T4 F3/T5 As you can see, even though field 1 has 3 terms, we repeat the F1 part in the encoded data. This is a bit wasteful. In the new format we encode it as a list of terms for each field: F1/T1,T2,T3 F2/T4 F3/T5 When fields have multiple terms, this saves space. In unit tests there is no additional waste even in the case that a field has only a single value. Here are the results of an indexing test case (beer-search): $ benchcmp indexing-before.txt indexing-after.txt benchmark old ns/op new ns/op delta BenchmarkIndexing-4 11275835988 10745514321 -4.70% benchmark old allocs new allocs delta BenchmarkIndexing-4 25230685 22480494 -10.90% benchmark old bytes new bytes delta BenchmarkIndexing-4 4802816224 4741641856 -1.27% And here are the results of a MatchAll search building a facet on the "abv" field: $ benchcmp facet-before.txt facet-after.txt benchmark old ns/op new ns/op delta BenchmarkFacets-4 439762100 228064575 -48.14% benchmark old allocs new allocs delta BenchmarkFacets-4 9460208 3723286 -60.64% benchmark old bytes new bytes delta BenchmarkFacets-4 260784261 151746483 -41.81% Although we expect the index to be smaller in many cases, the beer-search index is about the same in this case. However, this may be due to the underlying storage (boltdb) in this case. Finally, the index version was bumped from 5 to 7, since smolder also used version 6, which could lead to some confusion.	2017-01-24 15:39:38 -05:00
Steve Yen	5927224e15	optimize mergeOldAndNew for case of first time a doc is seen	2017-01-09 22:48:58 -08:00
Steve Yen	790f2e3e32	optimize by alloc'ing arrays of TermFrequencyRow/TermVector	2017-01-09 22:42:00 -08:00
Steve Yen	8f4726ab10	use struct{}{} idiom instead of additional mark var	2017-01-09 10:17:26 -08:00
Steve Yen	302cac72c4	optimize mergeOldAndNew when non-update case	2017-01-08 17:59:49 -08:00
Steve Yen	931d133024	go fmt and go vet	2017-01-07 22:14:22 -08:00
Steve Yen	40780254ae	optimize upsidedown mergeOldAndNew existing key maps The optimization is to provide a better initial size to the map constructor and to use a 0-byte-sized struct{} as the map values.	2017-01-07 22:05:55 -08:00
Steve Yen	c2bafa2a51	optimize term vectors/locations via preallocated arrays The change should hit the allocator less often when processing term vectors/locations as it preallocates larger, contiguous arrays of records upfront.	2017-01-07 12:34:06 -08:00
Steve Yen	8b140d84c4	minor optimization of upsidedown backIndexRowForDoc This change might allow a smart enough golang compiler to perhaps allocate a backIndexRow on the stack rather than the heap.	2017-01-07 11:49:42 -08:00
Steve Yen	c21d27e15a	upsidedown TermFieldReader checks includeTermVectors flag param The flag was part of the API, but wasn't previously checked.	2017-01-05 21:10:27 -08:00
Steve Yen	37490864ce	bleve/index/store/moss - accessor for underlying mossStore This change adds methods that provide access to the actual, underlying mossStore instance in the bleve/index/store/moss KVStore adaptor. This enables applications to utilize advanced, mossStore-specific features (such as partial rollback of indexes). See also https://issues.couchbase.com/browse/MB-17805	2016-12-05 12:25:29 -08:00
Patrick Mezard	c81fd6fdb0	index: DocIDReader.Next() returns nil when done not io.EOF	2016-11-20 19:05:35 +01:00

1 2 3 4 5 ...

551 Commits