Commit Graph

85 Commits

Author SHA1 Message Date
Steve Yen
be2800a8e4 MB-18715 - moss Merge() didn't bump bufUsed correctly
Also allocate more memory for both the partial and full merges.
2016-03-15 17:09:40 -07:00
Marty Schoch
d7292ed891 add support for gathering stats via map for easier consumption 2016-03-07 18:37:46 -05:00
Marty Schoch
23a323bc9d add support for numPlainTextBytesIndexed metric 2016-03-05 14:05:08 -05:00
Steve Yen
a29dd25a48 upside_down dict row value size accounts for large uvarints
This is somewhat unlikely, but if a term is (incredibly) popular, its
uvarint count value representation might go beyond 8 bytes.

Some KVStore implementations (like forestdb) provide a BatchEx cgo
optimization that depends on proper preallocated counting, so this
change provides a proper worst-case estimate based on the max-uvarint
of 10 bytes instead of the previously incorrect 8 bytes.
2016-02-22 11:52:51 -08:00
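
For context, Go's encoding/binary package caps a uvarint at 10 bytes (binary.MaxVarintLen64); a minimal sketch of why an 8-byte estimate can under-count:

  package main

  import (
      "encoding/binary"
      "fmt"
  )

  func main() {
      // A uint64 uvarint needs up to binary.MaxVarintLen64 (10) bytes,
      // so preallocating only 8 bytes under-counts for very large values.
      buf := make([]byte, binary.MaxVarintLen64)
      n := binary.PutUvarint(buf, 1<<63) // an (incredibly) popular term count
      fmt.Println(n)                     // prints 10
  }
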
Marty Schoch
40c95513b7 add support for including kvstore stats 2016-02-05 12:26:19 -05:00
Marty Schoch
c5dea9e882 fix accessing store via Advanced() method which was broken 2016-02-02 11:54:18 -05:00
Steve Yen
035d9d0e40 unneeded cast and parens 2016-01-17 00:16:05 -08:00
Steve Yen
6849e538be upside_down and firestorm use new NewBatchEx() API
With this change, the upside_down batchRows() and firestorm
batchRows() now use the new KVWriter.NewBatchEx() API, which can
improve performance by reducing the number of cgo hops.
2016-01-13 23:08:20 -08:00
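
The shape of a batch-with-preallocation API can be sketched generically; the interface below is a hypothetical stand-in, not bleve's actual KVWriter contract, but it shows why a single caller-sized buffer cuts per-operation cgo crossings:

  package sketch

  // Hypothetical sketch: the store hands back one preallocated buffer sized
  // from the caller's estimate, so the whole batch can cross cgo once.
  type BatchOptions struct {
      TotalBytes int // worst-case size of all keys and values in the batch
      NumSets    int
      NumDeletes int
      NumMerges  int
  }

  type Batch interface {
      Set(key, val []byte)
      Delete(key []byte)
      Merge(key, val []byte)
  }

  type Writer interface {
      // Callers slice keys/values out of the returned buffer, so the native
      // side sees one contiguous allocation instead of many small ones.
      NewBatchEx(opts BatchOptions) ([]byte, Batch, error)
      ExecuteBatch(b Batch) error
  }
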
Marty Schoch
af25e724f6 Merge branch 'master' of https://github.com/slavikm/bleve into slavikm-master 2016-01-13 16:10:59 -05:00
Steve Yen
0e72b949b3 upside_down batchRows() takes array of arrays
In order to spend less time in append(), this change in upside_down
(similar to another recent performance change in firestorm) builds up
an array of arrays as the eventual input to batchRows().
2016-01-11 18:11:21 -08:00
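
A minimal sketch of the pattern (types are illustrative): keep each document's rows in their own slice and pass the slice-of-slices through, so nothing is flattened with repeated append calls:

  package sketch

  type row struct{ key, val []byte }

  // batchRows-style consumer: accepts rows grouped per document, so callers
  // never pay for growing one flat slice row by row with append.
  func addToBatch(batch map[string][]byte, rowGroups ...[]row) {
      for _, group := range rowGroups {
          for _, r := range group {
              batch[string(r.key)] = r.val
          }
      }
  }
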
slavikm
680be52f87 Implemented boolean field support 2016-01-11 17:18:03 -08:00
Steve Yen
7ce7d98cba upside_down merge dictionary deltas before using batch.Merge()
This change performs more dictionary delta incr/decr math in
batchRows() instead of in the KVStore ExecuteBatch() machinery.
2016-01-11 16:52:07 -08:00
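
The gist, as a hedged sketch (names are illustrative): sum the per-term increments and decrements in a map first, then emit one Merge per dictionary key instead of one per occurrence:

  package sketch

  import "encoding/binary"

  // dictMerger is a stand-in for anything exposing a batch Merge operation.
  type dictMerger interface{ Merge(key, val []byte) }

  // flushDeltas does the incr/decr math in Go and emits a single Merge per
  // key, rather than pushing every delta through the KVStore machinery.
  func flushDeltas(batch dictMerger, deltas map[string]int64) {
      for term, d := range deltas {
          if d == 0 {
              continue // increments and decrements cancelled out
          }
          buf := make([]byte, binary.MaxVarintLen64)
          n := binary.PutVarint(buf, d) // encoding here is illustrative only
          batch.Merge([]byte(term), buf[:n])
      }
  }
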
Steve Yen
94273d5fa9 upside_down process internal rows earlier
With this change, internal rows are processed while we're waiting for
backIndex rows to be retrieved.
2016-01-11 16:25:35 -08:00
Steve Yen
bb5cd8f3d6 upside_down merge backIndexRow concurrently
Previously, the code would gather all the backIndexRows before
processing them.  This change instead merges the backIndexRows
concurrently on the theory that we might as well make progress on
compute & processing tasks while waiting for the rest of the back
index rows to be fetched from the KVStore.
2016-01-10 18:50:42 -08:00
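
A hedged sketch of the pattern (types are illustrative): consume each backIndexRow as the fetch goroutine delivers it, instead of collecting them all and only then starting the merge work:

  package sketch

  // backIndexRow and mergeRow stand in for the real upside_down types/logic.
  type backIndexRow struct{ docID string }

  func mergeRow(r backIndexRow) { /* compute adds/updates/deletes for this doc */ }

  // processAsFetched overlaps KVStore reads with merge work: rows are handled
  // one at a time as they arrive, not after the last read completes.
  func processAsFetched(fetched <-chan backIndexRow) {
      for r := range fetched {
          mergeRow(r) // make progress while later rows are still being read
      }
  }
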
Steve Yen
c3b5246b0c upside_down track analysis time tighter; and comments 2016-01-10 15:36:54 -08:00
Steve Yen
d3dd40d334 upside_down retrieves backindex concurrently with analysis
Start backindex reading concurrently with analysis to try to utilize
more I/O bandwidth.

The analysis time vs indexing time stats tracking is also now "off",
since there's now concurrency between those activities.

One tradeoff is that the lock area in upside_down Batch() is increased
as part of this change.
2016-01-10 15:18:28 -08:00
Steve Yen
860de28a28 fix memory leak by closing batches in batchRows() 2016-01-07 17:59:42 -08:00
Steve Yen
846912d083 upside_down udc.termVectorsFromTokenFreq rows append optimization 2016-01-07 00:48:34 -08:00
Steve Yen
8b980bd2ef firestorm avoid extra goroutine, similar to upside_down 2016-01-07 00:43:27 -08:00
Steve Yen
4eee8821f9 upside_down storeField/indexField append to provided arrays
Taking another optimization from firestorm, upside_down's
storeField()/indexField() funcs now also append() to passed-in arrays
rather than always allocating their own arrays.
2016-01-07 00:13:46 -08:00
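
The pattern in a hedged sketch (signature illustrative): append to the slice the caller passes in and return it, so one backing array is reused across fields instead of each call allocating its own:

  package sketch

  type row struct{ key, val []byte }

  // indexField-style helper: grows the caller's slice instead of allocating
  // a fresh one, so a single backing array is reused across many calls.
  func indexField(rows []row, fieldRows ...row) []row {
      return append(rows, fieldRows...)
  }
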
Steve Yen
82b8b3468e upside_down analysis converts to docIDBytes once 2016-01-06 23:38:02 -08:00
Steve Yen
89d17f01ef analyze locations only if includeTermVectors enabled
With this change, TermLocations are computed and maintained only if
includeTermVectors is enabled, for higher performance.
2016-01-05 12:46:46 -08:00
Marty Schoch
8efbd556a3 fix indexing bug with data coming from arrays
fixes #295
2015-12-21 14:59:32 -05:00
Marty Schoch
30651065e9 fix panic on insufficiently sized buffer
adds test case to reproduce original problem
fixes #264
2015-10-30 18:25:38 -04:00
Marty Schoch
817c317c90 Merge branch 'master' into newkvstore 2015-10-19 12:04:07 -04:00
Marty Schoch
faceecf87b make row buffer size constant/configurable
also handle case where it is insufficiently sized
2015-10-19 12:03:38 -04:00
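
A minimal sketch of that handling (constant and method names are illustrative): try the shared buffer first, and fall back to a right-sized allocation only when a row doesn't fit:

  package sketch

  // rowBufferSize is an illustrative default; the real value is configurable.
  const rowBufferSize = 4 * 1024

  type row interface {
      KeySize() int
      KeyTo(buf []byte) (int, error)
  }

  // keyFor reuses the shared buffer when the key fits, and allocates a
  // right-sized buffer only for the occasional oversized row.
  func keyFor(r row, shared []byte) ([]byte, error) {
      buf := shared
      if r.KeySize() > len(buf) {
          buf = make([]byte, r.KeySize())
      }
      n, err := r.KeyTo(buf)
      if err != nil {
          return nil, err
      }
      return buf[:n], nil
  }
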
Marty Schoch
c9471d5739 Merge pull request #244 from kevgs/master
reducing allocation count
2015-10-16 15:51:30 -04:00
Marty Schoch
4c6bc23043 rewrite to keep using same buffer when possible 2015-10-13 14:04:56 -07:00
Marty Schoch
8de860bf12 2 more places that used old Key() 2015-10-13 12:35:08 -07:00
Patrick Mezard
8c928539ee upside_down: no need for a goroutine to enqueue AnalysisWork
It boils down to:
1. client sends some work and a notification channel to a single worker,
   then waits.
2. worker processes the work
3. worker sends the result to the client using the notification channel

I do not see any problem with this, even with unbuffered channels.
2015-10-12 10:42:14 +02:00
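
A hedged Go sketch of that flow (types are illustrative): the client puts the work and its own notification channel on the worker queue directly, with no intermediate goroutine, then blocks on the reply:

  package sketch

  // AnalysisWork pairs a unit of work with the channel the worker uses to
  // notify the waiting client.
  type AnalysisWork struct {
      doc  string
      done chan struct{}
  }

  func worker(queue <-chan *AnalysisWork) {
      for w := range queue {
          _ = w.doc            // 2. worker processes the work
          w.done <- struct{}{} // 3. worker notifies the client on its channel
      }
  }

  func analyze(queue chan<- *AnalysisWork, doc string) {
      w := &AnalysisWork{doc: doc, done: make(chan struct{})}
      queue <- w // 1. client enqueues directly; no extra goroutine needed
      <-w.done   // ...then waits for the notification
  }
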
Marty Schoch
0f05d1d3ca Merge branch 'master' into newkvstore 2015-10-09 10:33:41 -04:00
Patrick Mezard
aee82f8b49 upside_down: simplify return code in batchRows() 2015-10-09 09:57:12 +02:00
Marty Schoch
e28eb749d7 bump up buffer size 2015-10-06 16:45:38 -04:00
Marty Schoch
71cbb13e07 modify code to reuse buffer for kv generation 2015-10-05 17:49:50 -04:00
Kosov Eugene
a61c350888 reducing allocation count 2015-10-05 22:57:10 +03:00
Marty Schoch
d06b526cbf more refactoring 2015-09-28 16:50:27 -04:00
Marty Schoch
900f1b4a67 major kvstore interface and impl overhaul
clarified the interface contract
2015-09-23 11:25:47 -07:00
Marty Schoch
dbb93b75a4 refactoring to allow pluggable index encodings
this lays the foundation for supporting the new firestorm
indexing scheme.  i'm merging these changes ahead of
the rest of the firestorm branch so i can continue
to make changes to the analysis pipeline in parallel
2015-09-02 13:12:08 -04:00
Marty Schoch
3682c25467 update to correctly work with composite fields
also updated search results to return array positions
2015-07-31 11:16:11 -04:00
Marty Schoch
c1c4941dde Merge branch 'feature/term_vector' of https://github.com/tukdesk/bleve into tukdesk-feature/term_vector 2015-07-29 14:31:15 -04:00
Marty Schoch
7be7ecdf8e fix batch indexing bug, incremented docCount before commit
fixes #211
2015-06-08 14:14:05 -04:00
dtynn
b4f7496031 update the index format version number 2015-05-18 15:16:35 +08:00
dtynn
89dc2c22bc update TermVector 2015-05-17 13:07:14 +08:00
Marty Schoch
8f70def63b properly use the stored array positions when loading a document
fixes #205
2015-05-15 15:47:54 -04:00
Marty Schoch
328bc73ed0 clarify Batch is not threadsafe in docs
in some limited cases we can detect unsafe usage
in these cases, do not trip over ourselves and panic
instead return a strongly typed error upside_down.UnsafeBatchUseDetected
also, introduced Batch.Reset() to allow batch reuse
this is currently still experimental
closes #195
2015-05-15 15:04:52 -04:00
Marty Schoch
57cd67fa88 fix data race on index metadata (docCount)
closes #198
2015-05-08 08:07:20 -04:00
Marty Schoch
a9c07acbfa refactor of kvstore api to support native merge in rocksdb
refactor to share code in emulated batch
refactor to share code in emulated merge
refactor index kvstore benchmarks to share more code
refactor index kvstore benchmarks to be more repeatable
2015-04-24 17:13:50 -04:00
Marty Schoch
f1ec73e764 fix issues identified by errcheck
part of #169
2015-04-07 13:26:54 -04:00
Marty Schoch
522f9d5cc7 significant change to index format, support dictionary rows
this introduces disk format v4
now the summary rows for a term are stored in their own
"dictionary row" format; previously the same information
was stored in special term frequency rows
this now allows us to easily iterate all the terms for a field
in sorted order (useful for many other fuzzy data structures)

at the top-level of bleve you can now browse terms within a field
using the following api on the Index interface:

  FieldDict(field string) (index.FieldDict, error)
  FieldDictRange(field string, startTerm []byte, endTerm []byte) (index.FieldDict, error)
  FieldDictPrefix(field string, termPrefix []byte) (index.FieldDict, error)

fixes #127
2015-03-10 16:22:19 -04:00
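
A hedged usage sketch of the new API; it assumes the returned dictionary iterator exposes Next()/Close() and that entries carry a Term and Count, as in later bleve releases:

  package sketch

  import (
      "fmt"

      "github.com/blevesearch/bleve"
  )

  // dumpFieldTerms walks every term indexed under a field, in sorted order.
  func dumpFieldTerms(idx bleve.Index, field string) error {
      dict, err := idx.FieldDict(field)
      if err != nil {
          return err
      }
      defer dict.Close()
      for {
          entry, err := dict.Next()
          if err != nil {
              return err
          }
          if entry == nil {
              return nil // end of dictionary
          }
          fmt.Printf("%s: %d\n", entry.Term, entry.Count)
      }
  }
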
Marty Schoch
a2ad7634f2 update term freq rows to use varint where possible
benchmark old ns/op new ns/op delta
BenchmarkLevelDBIndexing1Workers 1138292 657901 -42.20%
BenchmarkLevelDBIndexing2Workers 1619323 647628 -60.01%
BenchmarkLevelDBIndexing4Workers 1172845 636478 -45.73%
BenchmarkLevelDBIndexing1Workers10Batch 465556545 448153394 -3.74%
BenchmarkLevelDBIndexing2Workers10Batch 504203911 449657355 -10.82%
BenchmarkLevelDBIndexing4Workers10Batch 510766435 439839335 -13.89%
BenchmarkLevelDBIndexing1Workers100Batch 307657846 268976464 -12.57%
BenchmarkLevelDBIndexing2Workers100Batch 302257400 269110215 -10.97%
BenchmarkLevelDBIndexing4Workers100Batch 305320485 259084902 -15.14%
BenchmarkLevelDBIndexing1Workers1000Batch 301320576 258070231 -14.35%
BenchmarkLevelDBIndexing2Workers1000Batch 334174454 261175641 -21.84%
BenchmarkLevelDBIndexing4Workers1000Batch 267732436 261461739 -2.34%

closes #165
2015-03-06 13:00:53 -05:00
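
For context on where the savings come from: small counts dominate term frequency data, and a uvarint stores them in one or two bytes where a fixed-width field always takes eight. A minimal illustration:

  package main

  import (
      "encoding/binary"
      "fmt"
  )

  func main() {
      buf := make([]byte, binary.MaxVarintLen64)
      fmt.Println(binary.PutUvarint(buf, 3))     // 1 byte
      fmt.Println(binary.PutUvarint(buf, 300))   // 2 bytes
      fmt.Println(binary.PutUvarint(buf, 70000)) // 3 bytes
  }
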