In order to spend less time in append(), this change in upside_down
(similar to another recent performance change in firestorm) builds up
an array of arrays as the eventual input to batchRows().
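A minimal Go sketch of the shape of that change (the row type and function names here are illustrative, not the exact ones in the tree):

    // row stands in for upside_down's row type.
    type row interface{}

    // Before, every document's rows were append()'ed one-by-one into a
    // single flat slice, re-growing it repeatedly. Now each document
    // contributes its rows as one inner slice, so the per-document work
    // is a single append() on the outer slice.
    func gatherRows(perDoc [][]row) [][]row {
        all := make([][]row, 0, len(perDoc))
        for _, docRows := range perDoc {
            all = append(all, docRows)
        }
        return all // handed to batchRows() as the array of arrays
    }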
Previously, the code would gather all the backIndexRows before
processing them. This change instead merges the backIndexRows
concurrently on the theory that we might as well make progress on
compute & processing tasks while waiting for the rest of the back
index rows to be fetched from the KVStore.
Start back index reading concurrently with analysis to try to utilize
more I/O bandwidth.
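The shape of that concurrency, as a hedged sketch (fetchBackIndexRow and mergeBackIndexRow are hypothetical stand-ins for the real KVStore read and merge steps):

    type backIndexRow struct{ docID string }

    func fetchBackIndexRow(docID string) *backIndexRow { return &backIndexRow{docID} }
    func mergeBackIndexRow(r *backIndexRow)            {}

    func processBatch(docIDs []string) {
        // Kick off the back index reads immediately, overlapping them
        // with analysis work happening elsewhere.
        ch := make(chan *backIndexRow, len(docIDs))
        go func() {
            defer close(ch)
            for _, id := range docIDs {
                ch <- fetchBackIndexRow(id)
            }
        }()
        // Merge rows as they arrive instead of gathering them all first,
        // so compute proceeds while the remaining reads are in flight.
        for r := range ch {
            mergeBackIndexRow(r)
        }
    }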
The analysis time vs indexing time stats tracking is also now "off",
since there's now concurrency between those activities.
One tradeoff is that the locked region in upside_down's Batch() is
larger as part of this change.
Taking another optimization from firestorm, upside_down's
storeField()/indexField() funcs now also append() to passed-in arrays
rather than always allocating their own arrays.
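The append-to-caller pattern, sketched with an illustrative signature:

    type IndexRow struct{ Term string }

    // Previously the func allocated and returned a fresh slice on every
    // call. Now the caller owns the slice; the func extends it and
    // returns the (possibly re-grown) slice, so one backing array can be
    // reused across many fields of a document.
    func indexField(rows []IndexRow, tokens []string) []IndexRow {
        for _, t := range tokens {
            rows = append(rows, IndexRow{Term: t})
        }
        return rows
    }

Callers then write rows = indexField(rows, tokens) and keep reusing the same slice.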
Rather than append() all received rows into a flat []IndexRow during
the result gathering loop, this change instead collects the analysis
result rows into a [][]IndexRow, which avoids extra copying.
As part of this, firestorm batchRows() now takes the [][]IndexRow as
its input.
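A sketch of that gathering loop (resultCh and the result struct are illustrative names):

    type analysisResult struct{ Rows []IndexRow }

    func gather(resultCh chan *analysisResult) [][]IndexRow {
        // Before: rows = append(rows, result.Rows...) copied every row
        // into one flat []IndexRow. Now each result's slice is kept
        // whole, and only the outer slice grows.
        var allRows [][]IndexRow
        for result := range resultCh {
            allRows = append(allRows, result.Rows)
        }
        return allRows // handed to firestorm's batchRows()
    }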
The new analyzeField() helper func is used for both regular fields and
for composite fields.
With this change, all analysis is done up front, for both regular
fields and composite fields.
After analysis, this change counts up all the row capacity needed and
extends the AnalysisResult.Rows in one shot, as opposed to the
previous approach of dynamically growing the array as needed during
append() calls.
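A sketch of the one-shot growth (rv and perField are illustrative names for the destination slice and the per-field row slices):

    func extendRows(rv []IndexRow, perField [][]IndexRow) []IndexRow {
        // Count the capacity needed across all analyzed fields...
        total := 0
        for _, fr := range perField {
            total += len(fr)
        }
        // ...grow the destination once...
        if cap(rv)-len(rv) < total {
            grown := make([]IndexRow, len(rv), len(rv)+total)
            copy(grown, rv)
            rv = grown
        }
        // ...then append with no further reallocation.
        for _, fr := range perField {
            rv = append(rv, fr...)
        }
        return rv
    }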
Also, in this change, the TermFreqRow for _id is added first, which
seems more correct.
This uses the "backing array" technique to allocate many TermFreqRow's
at the front of firestorm.indexField(), instead of the previous
one-by-one, as-needed TermFreqRow allocation approach.
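The "backing array" technique in miniature (TermFreqRow's real fields are elided here):

    type TermFreqRow struct{ Term string }

    func makeTermFreqRows(tokens []string) []*TermFreqRow {
        // One slab allocation covers all rows for this field...
        backing := make([]TermFreqRow, len(tokens))
        rows := make([]*TermFreqRow, 0, len(tokens))
        for i, tok := range tokens {
            tfr := &backing[i] // ...and pointers into the slab replace
            tfr.Term = tok     // the old one-heap-allocation-per-row pattern.
            rows = append(rows, tfr)
        }
        return rows
    }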
Results from the micro-benchmark (null-firestorm, bleve-blast) show
this change producing a roughly half MB/sec improvement.
Previously, firestorm.Batch() would notify the lookuper goroutine on a
document-by-document basis. If the lookuper's input channel became
full, that would block the firestorm.Batch() operation.
With this change, lookuper is notified once, with a "batch" that is an
[]*InFlightItem.
This change also reuses that same []*InFlightItem to invoke the
compensator.MutateBatch().
This also has the advantage of only converting the docID's from string
to []byte just once, outside of the lock that's used by the
compensator.
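A sketch of the batched hand-off (InFlightItem's fields and the channel are simplified here):

    type InFlightItem struct {
        docID []byte // converted from string once, outside the compensator's lock
    }

    func notify(lookuperCh chan []*InFlightItem, docIDs []string,
        mutateBatch func([]*InFlightItem)) {
        items := make([]*InFlightItem, 0, len(docIDs))
        for _, id := range docIDs {
            items = append(items, &InFlightItem{docID: []byte(id)})
        }
        // One channel send for the whole batch, instead of one send per
        // document that could block Batch() when the lookuper fell behind.
        lookuperCh <- items
        // The same slice is then reused for the compensator.
        mutateBatch(items)
    }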
A micro-benchmark of this change with null-firestorm bleve-blast does
not show a large impact, neither degradation nor improvement.
After this change, with null kvstore micro-benchmark...
GOMAXPROCS=8 ./bleve-blast -source=../../tmp/enwiki.txt \
-count=100000 -numAnalyzers=8 -numIndexers=8 \
-config=../../configs/null-firestorm.json -batch=100
Then the TermFreqRow key and value methods disappear as large boxes from
the CPU profile graphs.
The field cache is expected to be the authority on which field
names are identified by which identifier. This code was
optimized for the most common case, in which fields already
exist. However, if we determine that the field is missing while
holding the read lock (shared), we incorrectly proceed straight
to creating a new row under the write lock (exclusive). The
problem is that multiple goroutines might have reached the
same conclusion, and they all proceed to add rows. The two
choices were to do the whole operation under the write lock, or
to recheck the value again under the write lock. We have chosen
to repeat the check inside the write lock, as this optimizes
for what we believe to be the most common case, in which most
fields will already exist.
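In Go terms, this is the standard re-check under an RWMutex; the cache shape below is a simplification, not the exact field cache code:

    import "sync"

    type fieldCache struct {
        mu     sync.RWMutex
        fields map[string]uint16
    }

    func (c *fieldCache) fieldIndex(name string) uint16 {
        // Fast path: shared lock for the common already-exists case.
        c.mu.RLock()
        if idx, ok := c.fields[name]; ok {
            c.mu.RUnlock()
            return idx
        }
        c.mu.RUnlock()

        // Slow path: take the exclusive lock, then re-check, since
        // another goroutine may have added the same field in the window
        // between the two locks.
        c.mu.Lock()
        defer c.mu.Unlock()
        if idx, ok := c.fields[name]; ok {
            return idx
        }
        idx := uint16(len(c.fields))
        c.fields[name] = idx // the real code also creates the field row here
        return idx
    }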