0
0
Commit Graph

776 Commits

Author SHA1 Message Date
slavikm
680be52f87 Implemented boolean field support 2016-01-11 17:18:03 -08:00
Marty Schoch
d533b326f6 Merge pull request #315 from steveyen/WIP-perf-20160107
KVBatch Close() method
2016-01-08 09:47:37 -05:00
Steve Yen
860de28a28 fix memory leak by closing batches in batchRows() 2016-01-07 17:59:42 -08:00
Steve Yen
70105477cf added Close() method to KVBatch interface 2016-01-07 17:54:21 -08:00
Marty Schoch
2b947e8c14 Merge branch 'steveyen-WIP-perf-20160106' 2016-01-07 15:42:53 -05:00
Marty Schoch
48fcd5a7d5 Merge branch 'WIP-perf-20160106' of https://github.com/steveyen/bleve into steveyen-WIP-perf-20160106 2016-01-07 15:40:29 -05:00
Marty Schoch
665f5c58e1 fix errcheck violation 2016-01-07 11:11:43 -05:00
Marty Schoch
e54db33346 try testing slightly different way 2016-01-07 11:06:18 -05:00
Marty Schoch
cd940cc375 add another check to try to understand test failure on travis 2016-01-07 10:45:20 -05:00
Marty Schoch
155e496121 switch to containerized builds
switch to go 1.5 (required by errcheck now)
do not install go vet (part of Go now)
2016-01-07 09:57:23 -05:00
Steve Yen
846912d083 upside_down udc.termVectorsFromTokenFreq rows append optimization 2016-01-07 00:48:34 -08:00
Steve Yen
8b980bd2ef firestorm avoid extra goroutine, similar to upside_down 2016-01-07 00:43:27 -08:00
Steve Yen
fbd0e7bfe9 upside_down backIndexTermEntries precalloc'ed capacity 2016-01-07 00:23:25 -08:00
Steve Yen
4eee8821f9 upside_down storeField/indexField append to provided arrays
Taking another optimization from firestorm, upside_down's
storeField()/indexField() funcs now also append() to passed-in arrays
rather than always allocating their own arrays.
2016-01-07 00:13:46 -08:00
Steve Yen
1af2927967 upside_down gets analysis perf rows optimizations from firestorm 2016-01-06 23:53:13 -08:00
Steve Yen
82b8b3468e upside_down analysis converts to docIDBytes once 2016-01-06 23:38:02 -08:00
Steve Yen
d6a997d8c1 firestorm gtreap lookup once per snapshot docID
Previously, firestorm would lookup docID's in the inFlight gtreap for
every candidate docNum, and this change moves the lookup to outside of
the loop.
2016-01-06 16:46:15 -08:00
Steve Yen
024848ac91 firestorm valid docNum finding, fixes #310 2016-01-06 16:04:56 -08:00
Steve Yen
7df07f94fa firestorm use the ParseKey() funcs to avoid unneeded value parsing
With this change, the row allocation also happens only once per loop,
instead of once per item.
2016-01-06 15:53:12 -08:00
Steve Yen
009d59222a firestorm StoredRow.ParseKey() func 2016-01-06 15:46:26 -08:00
Steve Yen
8389027ae8 firestorm TermFreqRow.ParseKey() func 2016-01-06 15:32:09 -08:00
Marty Schoch
83cd8da394 Merge pull request #307 from steveyen/WIP-perf-20160105
analyze locations only if includeTermVectors enabled
2016-01-05 16:04:59 -05:00
Steve Yen
89d17f01ef analyze locations only if includeTermVectors enabled
With this change, TermLocations are computed and maintained only if
includeTermVectors is enabled, for higher performance.
2016-01-05 12:46:46 -08:00
Marty Schoch
e5c1af4164 add travis config to run integration tests against firestorm 2016-01-05 13:00:36 -05:00
Marty Schoch
ab67b2f642 Merge pull request #267 from pmezard/doc-dump-methods
index: document DumpAll, DumpDoc and DumpFields methods
2016-01-05 09:55:35 -05:00
Marty Schoch
db7363fba1 Merge pull request #305 from steveyen/WIP-perf-20160102
perf 20160102
2016-01-05 08:54:47 -05:00
Steve Yen
70b7e73c82 firestorm compensator inFlight.Get() might return nil 2016-01-03 10:21:54 -08:00
Steve Yen
fb8c9a7475 firestorm.Batch() collects [][]IndexRows instead of []IndexRow
Rather than append() all received rows into a flat []IndexRow during
the result gathering loop, this change instead collects the analysis
result rows into a [][]IndexRow, which avoids extra copying.

As part of this, firestorm batchRows() now takes the [][]IndexRow as
its input.
2016-01-02 12:30:47 -08:00
Steve Yen
1c5b84911d firestorm DictUpdater NotifyBatch is more async 2016-01-02 12:21:25 -08:00
Steve Yen
b241242465 firestorm.Analyze() preallocs rows, with analyzeField() func
The new analyzeField() helper func is used for both regular fields and
for composite fields.

With this change, all analysis is done up front, for both regular
fields and composite fields.

After analysis, this change counts up all the row capacity needed and
extends the AnalysisResult.Rows in one shot, as opposed to the
previous approach of dynamically growing the array as needed during
append()'s.

Also, in this change, the TermFreqRow for _id is added first, which
seems more correct.
2016-01-02 12:21:25 -08:00
Steve Yen
325a616993 unicode.Tokenize() avoids array growth via array of arrays 2016-01-02 12:21:25 -08:00
Steve Yen
918732f3d8 unicode.Tokenize() allocs backing array of Tokens
Previously, unicode.Tokenize() would allocate a Token one-by-one, on
an as-needed basis.

This change allocates a "backing array" of Tokens, so that it goes to
the runtime object allocator much less often.  It takes a heuristic
guess as to the backing array size by using the average token
(segment) length seen so far.

Results from micro-benchmark (null-firestorm, bleve-blast) seem to
give perhaps less than ~0.5 MB/second throughput improvement.
2016-01-02 12:21:25 -08:00
Steve Yen
5b2bc1c20f firestorm.indexField() check for includeTermVectors moved out of loop 2016-01-02 12:21:25 -08:00
Steve Yen
45e9eaaacb firestorm.indexField() allocs up-front array of TermFreqRow's
This uses the "backing array" technique to allocate many TermFreqRow's
at the front of firestorm.indexField(), instead of the previous
one-by-one, as-needed TermFreqRow allocation approach.

Results from micro-benchmark, null-firestorm, bleve-blast has this
change producing a ~half MB/sec improvement.
2016-01-02 12:21:24 -08:00
Steve Yen
7ae696d661 firestorm lookuper notified via batch
Previously, the firestorm.Batch() would notify the lookuper goroutine
on a document by document basis.  If the lookuper input channel became
full, then that would block the firestorm.Batch() operation.

With this change, lookuper is notified once, with a "batch" that is an
[]*InFlightItem.

This change also reuses that same []*InFlightItem to invoke the
compensator.MutateBatch().

This also has the advantage of only converting the docID's from string
to []byte just once, outside of the lock that's used by the
compensator.

Micro-benchmark of this change with null-firestorm bleve-blast does
not show large impact, neither degradation or improvement.
2016-01-02 12:21:24 -08:00
Steve Yen
38d50ed8b5 renamed var to docsUpdated to match docsDeleted naming 2016-01-02 12:21:24 -08:00
Steve Yen
3feeb14b7d firestorm.batchRows reuses buf for all IndexRows 2016-01-02 12:21:24 -08:00
Steve Yen
0a7f7e3df8 firestorm.Analyze() converts docID to bytes only once 2016-01-02 12:21:24 -08:00
Steve Yen
fd81d0364c firestorm.indexField() uses capacity of len(tokenFreqs) 2016-01-02 12:21:24 -08:00
Steve Yen
a345e7951e TokenFrequency() alloc's all TokenLocations up front 2016-01-02 12:21:17 -08:00
Steve Yen
ee5ccda112 use KeyTo/ValueTo in firestorm.batchRows
After this change, with null kvstore micro-benchmark...

  GOMAXPROCS=8 ./bleve-blast -source=../../tmp/enwiki.txt \
    -count=100000 -numAnalyzers=8 -numIndexers=8 \
    -config=../../configs/null-firestorm.json -batch=100

Then TermFreqRow key and value methods dissapear as large boxes from
the cpu profile graphs.
2016-01-01 09:57:59 -08:00
Steve Yen
fd287bdfa4 firestorm.md markdown fixes 2016-01-01 09:57:59 -08:00
Steve Yen
b605224106 use shorter go idiom 2015-12-29 22:14:45 -08:00
Marty Schoch
6ddcde4c04 Merge pull request #294 from Shugyousha/fuzzytest
Add tests for fuzzy search
2015-12-25 11:38:35 -08:00
Marty Schoch
8ae2aee0bc Merge pull request #297 from aybabtme/firestorm-dont-gc-if-no-documents
Firestorm: dont gc if no documents
2015-12-25 11:23:49 -08:00
Antoine Grondin
6806343677 firestore: fix #296 for division by zero on GC 2015-12-25 11:34:19 +07:00
Antoine Grondin
a6f7abdfa3 firestore: reproducer for division by zero on GC 2015-12-25 11:33:46 +07:00
Marty Schoch
8efbd556a3 fix indexing bug with data coming from arrays
fixes #295
2015-12-21 14:59:32 -05:00
Marty Schoch
7bb58e1be4 add ability for integration test to check hit locations 2015-12-21 14:42:43 -05:00
Silvan Jegen
84c755cdb0 Add tests for fuzzy search 2015-12-20 17:00:46 +01:00