bleve

Author	SHA1	Message	Date
Steve Yen	fb8c9a7475	firestorm.Batch() collects [][]IndexRows instead of []IndexRow Rather than append() all received rows into a flat []IndexRow during the result gathering loop, this change instead collects the analysis result rows into a [][]IndexRow, which avoids extra copying. As part of this, firestorm batchRows() now takes the [][]IndexRow as its input.	2016-01-02 12:30:47 -08:00
Steve Yen	1c5b84911d	firestorm DictUpdater NotifyBatch is more async	2016-01-02 12:21:25 -08:00
Steve Yen	b241242465	firestorm.Analyze() preallocs rows, with analyzeField() func The new analyzeField() helper func is used for both regular fields and for composite fields. With this change, all analysis is done up front, for both regular fields and composite fields. After analysis, this change counts up all the row capacity needed and extends the AnalysisResult.Rows in one shot, as opposed to the previous approach of dynamically growing the array as needed during append()'s. Also, in this change, the TermFreqRow for _id is added first, which seems more correct.	2016-01-02 12:21:25 -08:00
Steve Yen	5b2bc1c20f	firestorm.indexField() check for includeTermVectors moved out of loop	2016-01-02 12:21:25 -08:00
Steve Yen	45e9eaaacb	firestorm.indexField() allocs up-front array of TermFreqRow's This uses the "backing array" technique to allocate many TermFreqRow's at the front of firestorm.indexField(), instead of the previous one-by-one, as-needed TermFreqRow allocation approach. Results from micro-benchmark, null-firestorm, bleve-blast has this change producing a ~half MB/sec improvement.	2016-01-02 12:21:24 -08:00
Steve Yen	7ae696d661	firestorm lookuper notified via batch Previously, the firestorm.Batch() would notify the lookuper goroutine on a document by document basis. If the lookuper input channel became full, then that would block the firestorm.Batch() operation. With this change, lookuper is notified once, with a "batch" that is an []InFlightItem. This change also reuses that same []InFlightItem to invoke the compensator.MutateBatch(). This also has the advantage of converting the docIDs from string to []byte just once, outside of the lock that's used by the compensator. Micro-benchmark of this change with null-firestorm bleve-blast does not show large impact, neither degradation nor improvement.	2016-01-02 12:21:24 -08:00
Steve Yen	38d50ed8b5	renamed var to docsUpdated to match docsDeleted naming	2016-01-02 12:21:24 -08:00
Steve Yen	3feeb14b7d	firestorm.batchRows reuses buf for all IndexRows	2016-01-02 12:21:24 -08:00
Steve Yen	0a7f7e3df8	firestorm.Analyze() converts docID to bytes only once	2016-01-02 12:21:24 -08:00
Steve Yen	fd81d0364c	firestorm.indexField() uses capacity of len(tokenFreqs)	2016-01-02 12:21:24 -08:00
Steve Yen	ee5ccda112	use KeyTo/ValueTo in firestorm.batchRows After this change, with null kvstore micro-benchmark... GOMAXPROCS=8 ./bleve-blast -source=../../tmp/enwiki.txt \ -count=100000 -numAnalyzers=8 -numIndexers=8 \ -config=../../configs/null-firestorm.json -batch=100 Then TermFreqRow key and value methods disappear as large boxes from the cpu profile graphs.	2016-01-01 09:57:59 -08:00
Steve Yen	fd287bdfa4	firestorm.md markdown fixes	2016-01-01 09:57:59 -08:00
Steve Yen	b605224106	use shorter go idiom	2015-12-29 22:14:45 -08:00
Antoine Grondin	6806343677	firestorm: fix #296 for division by zero on GC	2015-12-25 11:34:19 +07:00
Antoine Grondin	a6f7abdfa3	firestorm: reproducer for division by zero on GC	2015-12-25 11:33:46 +07:00
Marty Schoch	8efbd556a3	fix indexing bug with data coming from arrays fixes #295	2015-12-21 14:59:32 -05:00
Marty Schoch	cf67fe2cbc	fix major synchronization issue in the field_cache The field cache is expected to be the authority on which field names are identified by which identifier. This code was optimized for the most common case in which fields already exist. However, if we determine the field is missing with the read lock (shared), we incorrectly immediately proceed to create a new row with the write lock (exclusive). The problem is that multiple goroutines might have come to the same conclusion, and they all proceed to add rows. The two choices were to do the whole operation with the write lock, or recheck the value again with the write lock. We have chosen to repeat the check inside the write-lock, as this optimizes for what we believe to be the most common case, in which most fields will already exist.	2015-12-15 16:39:38 -05:00
Marty Schoch	a73a178923	fix incorrect prefix search behavior avoids double incrementing of end term when reading term dict fixes #293	2015-12-04 14:07:16 -05:00
Marty Schoch	699c86073a	make existing integration tests work with firestorm	2015-12-01 12:29:56 -05:00
Marty Schoch	6d851cfcc2	fix bug in warmup which led to docs being deleted	2015-11-30 10:18:14 -05:00
Marty Schoch	aa8d98f5fa	include space after prefix in log output	2015-11-30 10:17:48 -05:00
Marty Schoch	68d8742826	correctly prefix internal rows with 'i' and print them in debug	2015-11-30 10:17:15 -05:00
Marty Schoch	c93de9734e	fix issues identified by errcheck	2015-11-24 14:32:33 -05:00
Marty Schoch	bbef1980d8	Merge branch 'master' into firestorm	2015-11-24 13:04:36 -05:00
Marty Schoch	ff11f83842	properly handle errors inside metrics kvstore reporting	2015-11-24 12:52:03 -05:00
Marty Schoch	a707d44e0b	Merge branch 'master' into firestorm	2015-11-24 09:44:47 -05:00
Patrick Mezard	e85c9c542e	row: expose TermFrequencyRow term and freq fields Rows content is an implementation detail of bleve index and may change in the future. That said, it also contains information valuable to assess the quality of the index or understand its performance. So, as long as we agree that type asserting rows should only be done if you know what you are doing and are ready to deal with future changes, I see no reason to hide the row fields from external packages. Fix #268	2015-11-17 17:21:26 +01:00
Kosov Eugene	45e670b99b	BoltDB wrapper nano optimization which makes code a bit prettier too	2015-11-05 00:27:28 +03:00
Marty Schoch	4791625b9b	Merge pull request #262 from pmezard/index-and-tokenizer-doc-and-fix Index and tokenizer doc and fix	2015-11-02 11:51:21 -05:00
Marty Schoch	30651065e9	fix panic on insufficiently sized buffer adds test case to reproduce original problem fixes #264	2015-10-30 18:25:38 -04:00
Marty Schoch	2bd3ef4080	copy relevant k/v pairs before advancing underlying iterator	2015-10-28 12:23:54 -04:00
Marty Schoch	d1b07f4909	fix dump methods to properly copy keys and values	2015-10-28 12:06:44 -04:00
Marty Schoch	01526e971f	Merge branch 'master' into firestorm	2015-10-28 11:26:01 -04:00
Patrick Mezard	f2b3d5698e	index: document TermFieldReader interface	2015-10-27 18:53:03 +01:00
Patrick Mezard	3df789d258	index: document empty strings behaviour when calling DocIDReader()	2015-10-27 18:53:03 +01:00
Marty Schoch	1a978a4591	fix go vet issues and cleanup reader/iterator	2015-10-26 16:41:58 -04:00
Marty Schoch	f0d282f5f8	add test case for seeing prefix iterators outside of range similar to #256 except for prefix iterators includes fix for boltdb and gtreap which had incorrect behavior	2015-10-26 16:14:29 -04:00
Patrick Mezard	5100e00f20	doc: DocIDReader.Advance() is no longer implementation dependent	2015-10-20 20:32:23 +02:00
Patrick Mezard	2fa334fc27	doc: talk about "documents" not "indexed or stored documents"	2015-10-20 20:24:24 +02:00
Patrick Mezard	b174c137fd	doc: document DocIDReader, and some Index bits	2015-10-20 20:24:24 +02:00
Patrick Mezard	da72d0c2b9	store_test: deduplicate store initialization	2015-10-20 19:21:01 +02:00
Patrick Mezard	873f483804	gtreap: RangeIterator.Seek should not move before start	2015-10-20 19:12:30 +02:00
Patrick Mezard	5d7628ba3b	boltdb: fix RangeIterator outside of range seeks Two issues: - Seeking before i.start and iterating returned keys before i.start - Seeking after the store last key did not invalidate the iterator and could cause infinite loops.	2015-10-20 19:09:51 +02:00
Patrick Mezard	aada2e7333	store_test: test RangeIterator.Seek on goleveldb	2015-10-20 19:09:38 +02:00
Marty Schoch	6cc21346dc	fix errcheck issues	2015-10-19 14:27:03 -04:00
Marty Schoch	817c317c90	Merge branch 'master' into newkvstore	2015-10-19 12:04:07 -04:00
Marty Schoch	faceecf87b	make row buffer size constant/configurable also handle case where it is insufficiently sized	2015-10-19 12:03:38 -04:00
Marty Schoch	f0ee9a3c66	removed commented code and unused functions	2015-10-19 11:13:03 -04:00
Marty Schoch	c9471d5739	Merge pull request #244 from kevgs/master reducing allocation count	2015-10-16 15:51:30 -04:00
Marty Schoch	e6d0fc8d95	Merge pull request #247 from pmezard/remove-update-goroutine upside_down: no need for a goroutine to enqueue AnalysisWork	2015-10-16 10:15:55 -04:00
Marty Schoch	4c6bc23043	rewrite to keep using same buffer when possible	2015-10-13 14:04:56 -07:00
Marty Schoch	8de860bf12	2 more places that used old Key()	2015-10-13 12:35:08 -07:00
Marty Schoch	5f594d1acc	Merge branch 'master' into newkvstore	2015-10-12 18:07:04 -07:00
Marty Schoch	08572e4925	move literals outside loop for more predictable test results	2015-10-12 18:06:38 -07:00
Patrick Mezard	8c928539ee	upside_down: no need for a goroutine to enqueue AnalysisWork It boils down to: 1. client sends some work and a notification channel to a single worker, then waits. 2. worker processes the work 3. worker sends the result to the client using the notification channel I do not see any problem with this, even with unbuffered channels.	2015-10-12 10:42:14 +02:00
Marty Schoch	95e06538f3	fix benchmarks for the x kvstores	2015-10-09 11:09:42 -04:00
Marty Schoch	0f05d1d3ca	Merge branch 'master' into newkvstore	2015-10-09 10:33:41 -04:00
Patrick Mezard	aee82f8b49	upside_down: simplify return code in batchRows()	2015-10-09 09:57:12 +02:00
Marty Schoch	e28eb749d7	bump up buffer size	2015-10-06 16:45:38 -04:00
Marty Schoch	71cbb13e07	modify code to reuse buffer for kv generation	2015-10-05 17:49:50 -04:00
Kosov Eugene	a61c350888	reducing allocation count	2015-10-05 22:57:10 +03:00
Patrick Mezard	9d5407be13	boltdb: add "nosync" option to force boltdb.DB.NoSync=true Use this option when rebuilding indexes from scratch. In my small case (~17000 json documents), it reduces indexing from 520s to 250s. I did not add any test, short of forced indexing termination it only has performance effects, which are hard to test. And unknown options are currently ignored. Issue #240	2015-10-03 14:26:48 +02:00
Marty Schoch	d06b526cbf	more refactoring	2015-09-28 16:50:27 -04:00
Marty Schoch	66aa1b020a	Merge branch 'master' into firestorm	2015-09-23 11:32:25 -07:00
Marty Schoch	900f1b4a67	major kvstore interface and impl overhaul clarified the interface contract	2015-09-23 11:25:47 -07:00
Marty Schoch	f81b2be334	major refactor of bleve configuration see #221 for full details	2015-09-16 17:10:59 -04:00
Marty Schoch	c308f611cf	skip unnecessary map before slice benchmark old ns/op new ns/op delta BenchmarkBatch-4 16950972 16377194 -3.38% benchmark old allocs new allocs delta BenchmarkBatch-4 136164 136161 -0.00% benchmark old bytes new bytes delta BenchmarkBatch-4 7168872 7109691 -0.83%	2015-09-10 08:21:26 -04:00
Marty Schoch	f6f1628b15	avoid doing unnecessary work: benchmark old ns/op new ns/op delta BenchmarkBatch-4 20738739 17047158 -17.80% benchmark old allocs new allocs delta BenchmarkBatch-4 136423 136160 -0.19% benchmark old bytes new bytes delta BenchmarkBatch-4 20277781 7168772 -64.65%	2015-09-10 08:19:05 -04:00
Marty Schoch	c8538c835f	Merge branch 'master' into firestorm	2015-09-10 08:14:14 -04:00
Marty Schoch	17c64d37c7	add similar benchmarks from firestorm	2015-09-10 08:13:52 -04:00
Marty Schoch	1e4d637761	adding more benchmarks	2015-09-10 08:01:11 -04:00
Marty Schoch	f74ed6a9ae	Merge remote-tracking branch 'origin' into firestorm catching up with changes from master	2015-09-02 13:29:03 -04:00
Marty Schoch	dbb93b75a4	refactoring to allow pluggable index encodings this lays the foundation for supporting the new firestorm indexing scheme. i'm merging these changes ahead of the rest of the firestorm branch so i can continue to make changes to the analysis pipeline in parallel	2015-09-02 13:12:08 -04:00
Marty Schoch	7ad7659ce5	add support for using null kvstore outside of bleve internals	2015-09-02 11:50:06 -04:00
Marty Schoch	07d37ca38a	add important rocksdb config options	2015-09-02 11:49:42 -04:00
Marty Schoch	18151862b5	fix go vet issues	2015-08-25 15:13:13 -04:00
Marty Schoch	84811cf5a0	made index type configurable + first version of firestorm	2015-08-25 14:52:42 -04:00
Marty Schoch	3e60ca24ec	support using end key on forestdb iterator for term freq lookup also additional forestdb configs	2015-08-18 16:22:02 -04:00
Marty Schoch	ae19d77b04	updated protobuf defs to be valid	2015-08-17 15:37:13 -04:00
Marty Schoch	1187436e46	changed Stored row Values to also use protobuf	2015-08-17 09:48:40 -04:00
Marty Schoch	8d8a05a842	fix more issues	2015-08-14 16:27:00 -04:00
Marty Schoch	e0802a2b39	fixed the worst of the formatting	2015-08-14 16:17:48 -04:00
Marty Schoch	f4df56eb7c	add first draft of firestorm proposal	2015-08-14 16:09:19 -04:00
Marty Schoch	d3dda3d0ea	fixup config parsing and add new options	2015-08-12 13:18:23 -04:00
Marty Schoch	01667dfff3	faster protobufs with gogo	2015-08-12 13:18:23 -04:00
Marty Schoch	7df66b4857	fix broken benchmark caused by index row encoding change	2015-08-06 14:48:04 -04:00
Marty Schoch	9db850a53e	Merge branch 'fix/MaxVarintLen64' of https://github.com/tukdesk/bleve into tukdesk-fix/MaxVarintLen64	2015-07-31 15:16:16 -04:00
Marty Schoch	3682c25467	update to correctly work with composite fields also updated search results to return array positions	2015-07-31 11:16:11 -04:00
Marty Schoch	c1c4941dde	Merge branch 'feature/term_vector' of https://github.com/tukdesk/bleve into tukdesk-feature/term_vector	2015-07-29 14:31:15 -04:00
Marty Schoch	bf8dcae76b	removing build tags	2015-07-28 18:59:10 -04:00
Marty Schoch	1b28f6218b	additional row validation	2015-07-13 15:22:54 -04:00
Marty Schoch	17ef48f82a	switching back to the canonical goleveldb repo	2015-07-08 12:21:17 -06:00
Marty Schoch	bf80f4628e	fix bug in current goleveldb (must copy during iteration) also changed over to mschoch fork of goleveldb (temporary) the change to my fork is pending some read-only issues described here: https://github.com/syndtr/goleveldb/issues/111 hopefully we can find a path forward, and get that addressed upstream	2015-07-06 18:00:05 -04:00
Marty Schoch	7be7ecdf8e	fix batch indexing bug, incremented docCount before commit fixes #211	2015-06-08 14:14:05 -04:00
Marty Schoch	2768c2da3c	fix previous sloppy fix which hadn't been adequately tested	2015-05-27 19:15:55 -07:00
Marty Schoch	201fb91171	fix up to correctly trim off separator even though it should never be present	2015-05-27 19:10:12 -07:00
Marty Schoch	a58592ceff	fix case where NewBackIndexRowKV returns nil, nil the logic for reading the docID from the keys in this row relies on the keys NEVER containing the byte separator character (0xff). this is OK as we require that all keys be valid utf-8. however, it turns out that in the case where this rule was violated, we would panic, because we return nil, nil and later try to print the doc id	2015-05-27 19:04:57 -07:00
dtynn	59c97ae577	use binary.MaxVarintLen64	2015-05-26 15:35:31 +08:00
Marty Schoch	e0887f9113	fix tests which deadlock boltdb due to deferred cleanup fixes #209	2015-05-21 12:29:31 -04:00
Marty Schoch	a52d3b5c07	put in hack to allow boltdb reader isolation test to pass in boltdb, long readers MAY block a writer. in particular if the write requires additional allocation, it must acquire a lock already held by the reader. in general this is not a problem for bleve (though it can affect performance in some cases), but it is a problem for the reader isolation test. this commit adds a hack to try and avoid the need for additional allocation closes #208	2015-05-21 11:39:59 -04:00
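
Commit cf67fe2cbc above describes rechecking the field cache inside the write lock after a miss under the read lock. The following is a minimal sketch of that pattern, not bleve's actual code: the `fieldCache` type, its fields, and the `fieldIndex` method are hypothetical names chosen for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// fieldCache is a hypothetical stand-in for a field cache: it maps field
// names to numeric identifiers and must assign each name exactly one.
type fieldCache struct {
	mu     sync.RWMutex
	index  map[string]uint16
	nextID uint16
}

func newFieldCache() *fieldCache {
	return &fieldCache{index: make(map[string]uint16)}
}

// fieldIndex returns the identifier for name, creating it if it is missing.
func (c *fieldCache) fieldIndex(name string) uint16 {
	// Fast path: most fields already exist, so a shared lock is enough.
	c.mu.RLock()
	id, ok := c.index[name]
	c.mu.RUnlock()
	if ok {
		return id
	}

	// Slow path: take the exclusive lock and check again, because another
	// goroutine may have added the field between RUnlock and Lock.
	c.mu.Lock()
	defer c.mu.Unlock()
	if id, ok := c.index[name]; ok {
		return id
	}
	id = c.nextID
	c.index[name] = id
	c.nextID++
	return id
}

func main() {
	c := newFieldCache()
	var wg sync.WaitGroup
	for i := 0; i < 64; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.fieldIndex("title")
		}()
	}
	wg.Wait()
	// Only one entry exists, however many goroutines raced on the miss.
	fmt.Println(len(c.index), c.fieldIndex("title")) // prints "1 0"
}
```

Without the second lookup under `Lock`, every goroutine that missed under `RLock` would assign its own identifier, which is the duplicate-row behaviour the commit message describes.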


295 Commits
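
Commit 45e9eaaacb mentions the "backing array" technique for allocating many TermFreqRow values up front. The sketch below illustrates the general idea under stated assumptions: `termFreqRow` is a simplified, hypothetical struct, and `rowsOneByOne` and `rowsBacked` are illustrative helpers rather than functions from bleve.

```go
package main

import "fmt"

// termFreqRow is a hypothetical, simplified row holding a term and its count.
type termFreqRow struct {
	term []byte
	freq uint64
}

// rowsOneByOne allocates every row separately: one row allocation per term.
func rowsOneByOne(freqs map[string]uint64) []*termFreqRow {
	rows := make([]*termFreqRow, 0, len(freqs))
	for term, freq := range freqs {
		rows = append(rows, &termFreqRow{term: []byte(term), freq: freq})
	}
	return rows
}

// rowsBacked allocates all rows in a single backing array and hands out
// pointers into it, so the row structs cost one allocation in total.
func rowsBacked(freqs map[string]uint64) []*termFreqRow {
	backing := make([]termFreqRow, len(freqs))
	rows := make([]*termFreqRow, 0, len(freqs))
	i := 0
	for term, freq := range freqs {
		r := &backing[i] // pointer into the shared backing array
		r.term = []byte(term)
		r.freq = freq
		rows = append(rows, r)
		i++
	}
	return rows
}

func main() {
	freqs := map[string]uint64{"bleve": 3, "index": 2, "row": 1}
	fmt.Println(len(rowsOneByOne(freqs)), len(rowsBacked(freqs)))
}
```

One trade-off of this approach is that the whole backing array stays reachable as long as any single row pointer is retained, so it suits rows that are produced and discarded together, as in a batch.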