0
0
Commit Graph

262 Commits

Author SHA1 Message Date
Marty Schoch
48fcd5a7d5 Merge branch 'WIP-perf-20160106' of https://github.com/steveyen/bleve into steveyen-WIP-perf-20160106 2016-01-07 15:40:29 -05:00
Marty Schoch
665f5c58e1 fix errcheck violation 2016-01-07 11:11:43 -05:00
Marty Schoch
e54db33346 try testing slightly different way 2016-01-07 11:06:18 -05:00
Marty Schoch
cd940cc375 add another check to try to understand test failure on travis 2016-01-07 10:45:20 -05:00
Steve Yen
846912d083 upside_down udc.termVectorsFromTokenFreq rows append optimization 2016-01-07 00:48:34 -08:00
Steve Yen
8b980bd2ef firestorm avoid extra goroutine, similar to upside_down 2016-01-07 00:43:27 -08:00
Steve Yen
fbd0e7bfe9 upside_down backIndexTermEntries precalloc'ed capacity 2016-01-07 00:23:25 -08:00
Steve Yen
4eee8821f9 upside_down storeField/indexField append to provided arrays
Taking another optimization from firestorm, upside_down's
storeField()/indexField() funcs now also append() to passed-in arrays
rather than always allocating their own arrays.
2016-01-07 00:13:46 -08:00
Steve Yen
1af2927967 upside_down gets analysis perf rows optimizations from firestorm 2016-01-06 23:53:13 -08:00
Steve Yen
82b8b3468e upside_down analysis converts to docIDBytes once 2016-01-06 23:38:02 -08:00
Steve Yen
d6a997d8c1 firestorm gtreap lookup once per snapshot docID
Previously, firestorm would lookup docID's in the inFlight gtreap for
every candidate docNum, and this change moves the lookup to outside of
the loop.
2016-01-06 16:46:15 -08:00
Steve Yen
024848ac91 firestorm valid docNum finding, fixes #310 2016-01-06 16:04:56 -08:00
Steve Yen
7df07f94fa firestorm use the ParseKey() funcs to avoid unneeded value parsing
With this change, the row allocation also happens only once per loop,
instead of once per item.
2016-01-06 15:53:12 -08:00
Steve Yen
009d59222a firestorm StoredRow.ParseKey() func 2016-01-06 15:46:26 -08:00
Steve Yen
8389027ae8 firestorm TermFreqRow.ParseKey() func 2016-01-06 15:32:09 -08:00
Steve Yen
89d17f01ef analyze locations only if includeTermVectors enabled
With this change, TermLocations are computed and maintained only if
includeTermVectors is enabled, for higher performance.
2016-01-05 12:46:46 -08:00
Steve Yen
70b7e73c82 firestorm compensator inFlight.Get() might return nil 2016-01-03 10:21:54 -08:00
Steve Yen
fb8c9a7475 firestorm.Batch() collects [][]IndexRows instead of []IndexRow
Rather than append() all received rows into a flat []IndexRow during
the result gathering loop, this change instead collects the analysis
result rows into a [][]IndexRow, which avoids extra copying.

As part of this, firestorm batchRows() now takes the [][]IndexRow as
its input.
2016-01-02 12:30:47 -08:00
Steve Yen
1c5b84911d firestorm DictUpdater NotifyBatch is more async 2016-01-02 12:21:25 -08:00
Steve Yen
b241242465 firestorm.Analyze() preallocs rows, with analyzeField() func
The new analyzeField() helper func is used for both regular fields and
for composite fields.

With this change, all analysis is done up front, for both regular
fields and composite fields.

After analysis, this change counts up all the row capacity needed and
extends the AnalysisResult.Rows in one shot, as opposed to the
previous approach of dynamically growing the array as needed during
append()'s.

Also, in this change, the TermFreqRow for _id is added first, which
seems more correct.
2016-01-02 12:21:25 -08:00
Steve Yen
5b2bc1c20f firestorm.indexField() check for includeTermVectors moved out of loop 2016-01-02 12:21:25 -08:00
Steve Yen
45e9eaaacb firestorm.indexField() allocs up-front array of TermFreqRow's
This uses the "backing array" technique to allocate many TermFreqRow's
at the front of firestorm.indexField(), instead of the previous
one-by-one, as-needed TermFreqRow allocation approach.

Results from micro-benchmark, null-firestorm, bleve-blast has this
change producing a ~half MB/sec improvement.
2016-01-02 12:21:24 -08:00
Steve Yen
7ae696d661 firestorm lookuper notified via batch
Previously, the firestorm.Batch() would notify the lookuper goroutine
on a document by document basis.  If the lookuper input channel became
full, then that would block the firestorm.Batch() operation.

With this change, lookuper is notified once, with a "batch" that is an
[]*InFlightItem.

This change also reuses that same []*InFlightItem to invoke the
compensator.MutateBatch().

This also has the advantage of only converting the docID's from string
to []byte just once, outside of the lock that's used by the
compensator.

Micro-benchmark of this change with null-firestorm bleve-blast does
not show large impact, neither degradation or improvement.
2016-01-02 12:21:24 -08:00
Steve Yen
38d50ed8b5 renamed var to docsUpdated to match docsDeleted naming 2016-01-02 12:21:24 -08:00
Steve Yen
3feeb14b7d firestorm.batchRows reuses buf for all IndexRows 2016-01-02 12:21:24 -08:00
Steve Yen
0a7f7e3df8 firestorm.Analyze() converts docID to bytes only once 2016-01-02 12:21:24 -08:00
Steve Yen
fd81d0364c firestorm.indexField() uses capacity of len(tokenFreqs) 2016-01-02 12:21:24 -08:00
Steve Yen
ee5ccda112 use KeyTo/ValueTo in firestorm.batchRows
After this change, with null kvstore micro-benchmark...

  GOMAXPROCS=8 ./bleve-blast -source=../../tmp/enwiki.txt \
    -count=100000 -numAnalyzers=8 -numIndexers=8 \
    -config=../../configs/null-firestorm.json -batch=100

Then TermFreqRow key and value methods dissapear as large boxes from
the cpu profile graphs.
2016-01-01 09:57:59 -08:00
Steve Yen
fd287bdfa4 firestorm.md markdown fixes 2016-01-01 09:57:59 -08:00
Steve Yen
b605224106 use shorter go idiom 2015-12-29 22:14:45 -08:00
Antoine Grondin
6806343677 firestore: fix #296 for division by zero on GC 2015-12-25 11:34:19 +07:00
Antoine Grondin
a6f7abdfa3 firestore: reproducer for division by zero on GC 2015-12-25 11:33:46 +07:00
Marty Schoch
8efbd556a3 fix indexing bug with data coming from arrays
fixes #295
2015-12-21 14:59:32 -05:00
Marty Schoch
cf67fe2cbc fix major synchronization issue in the field_cache
The field cache is expected to be the authority on which field
names are identified by which identifier.  This code was
optimized for the most common case in which fields already
exist.  However, if we deterimine the field is missing with
the read lock (shared), we incorrectly immediately proceed
to create a new row with the write lock (exclusive).  The
problem is that multiple goroutines might have come to
the same conclusion, and they all proceed to add rows.  The two
choices were to do the whole operation with the write lock, or
recheck the value again with the write lock.  We have chosen
to repeat the check inside the write-lock, as this optimizes
for what we believe to be the most common case, in which most
fields will already exist.
2015-12-15 16:39:38 -05:00
Marty Schoch
a73a178923 fix incorrect prefix search behavior
avoids double incrementing of end term when reading term dict
fixes #293
2015-12-04 14:07:16 -05:00
Marty Schoch
699c86073a make existing integration tests work with firestorm 2015-12-01 12:29:56 -05:00
Marty Schoch
6d851cfcc2 fix bug in warmup which led to docs being deleted 2015-11-30 10:18:14 -05:00
Marty Schoch
aa8d98f5fa include space after prefix in log output 2015-11-30 10:17:48 -05:00
Marty Schoch
68d8742826 correctly prefix internal rows with 'i' and print them in debug 2015-11-30 10:17:15 -05:00
Marty Schoch
c93de9734e fix issues identified by errcheck 2015-11-24 14:32:33 -05:00
Marty Schoch
bbef1980d8 Merge branch 'master' into firestorm 2015-11-24 13:04:36 -05:00
Marty Schoch
ff11f83842 properly handle errors inside metrics kvstore reporting 2015-11-24 12:52:03 -05:00
Marty Schoch
a707d44e0b Merge branch 'master' into firestorm 2015-11-24 09:44:47 -05:00
Patrick Mezard
e85c9c542e row: expose TermFrequencyRow term and freq fields
Rows content is an implementation detail of bleve index and may change
in the future. That said, they also contains information valuable to
assess the quality of the index or understand its performances. So, as
long as we agree that type asserting rows should only be done if you
know what you are doing and are ready to deal with future changes, I see
no reason to hide the row fields from external packages.

Fix #268
2015-11-17 17:21:26 +01:00
Kosov Eugene
45e670b99b BoltDB wrapper nano optimization which makes code a bit prettier too 2015-11-05 00:27:28 +03:00
Marty Schoch
4791625b9b Merge pull request #262 from pmezard/index-and-tokenizer-doc-and-fix
Index and tokenizer doc and fix
2015-11-02 11:51:21 -05:00
Marty Schoch
30651065e9 fix panic on insufficiently sized buffer
adds test case to reproduce original problem
fixes #264
2015-10-30 18:25:38 -04:00
Marty Schoch
2bd3ef4080 copy relevant k/v pairs before advancing underlying iterator 2015-10-28 12:23:54 -04:00
Marty Schoch
d1b07f4909 fix dump methods to properly copy keys and values 2015-10-28 12:06:44 -04:00
Marty Schoch
01526e971f Merge branch 'master' into firestorm 2015-10-28 11:26:01 -04:00