Commit Graph

358 Commits

Author SHA1 Message Date
Marty Schoch
724684a4f1 additional firestorm fixes for 64-bit alignment
part of #359
2016-03-20 11:02:13 -04:00
Marty Schoch
3dc64de478 moved fields requiring 64-bit alignment to start of struct
several data structures had a pointer at the start of the struct
on some 32-bit systems, this causes the remaining fields to no longer
be aligned on 64-bit boundaries

the fix identified by @pmezard is to put the counters first in the
struct, which guarantees correct alignment

fixes #359
2016-03-20 10:38:28 -04:00
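A minimal Go sketch of the layout described in the commit above. The struct and its field names are hypothetical, not bleve's actual types. It relies on the documented sync/atomic rule that the first word of an allocated struct can be assumed to be 64-bit aligned, so counters placed first stay safe for atomic access on 32-bit platforms.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"unsafe"
)

// indexStats is a hypothetical struct used only for illustration.
// The uint64 counters come first, so they start at offset 0 of the
// allocated struct, which Go arranges to be 64-bit aligned.
type indexStats struct {
	updates uint64 // accessed via sync/atomic
	deletes uint64 // accessed via sync/atomic
	// A pointer is only 4 bytes on 32-bit platforms. If it came first,
	// the counters after it could land on a 4-byte boundary.
	name *string
}

// recordUpdate atomically increments the update counter.
func (s *indexStats) recordUpdate() uint64 {
	return atomic.AddUint64(&s.updates, 1)
}

// counterOffsets reports the byte offsets of the two counters.
func counterOffsets() (uintptr, uintptr) {
	var s indexStats
	return unsafe.Offsetof(s.updates), unsafe.Offsetof(s.deletes)
}

func main() {
	s := &indexStats{}
	s.recordUpdate()
	first, second := counterOffsets()
	fmt.Println(atomic.LoadUint64(&s.updates), first, second)
}
```

The guarantee applies to the start of an allocated value, so a struct like this should be allocated on its own or embedded as the first field of its parent.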
Steve Yen
be2800a8e4 MB-18715 - moss Merge() didn't bump bufUsed correctly
And, also allocate more memory for both the partial and full merges.
2016-03-15 17:09:40 -07:00
Steve Yen
c1597842d0 moss lowerLevelUpdate didn't handle batches of size 1 2016-03-11 15:47:23 -08:00
Steve Yen
f1dac8b497 moss defaults to non-nil options.Log 2016-03-09 10:15:11 -08:00
Steve Yen
1d63c55f7c parse mossLowerLevelMaxBatchSize only when lower-level-store exists 2016-03-09 10:09:15 -08:00
Steve Yen
76b9365928 added moss RegistryCollectionOptions
The moss RegistryCollectionOptions allows applications to register
moss-related callback API functions and other advanced feature usage
at process initialization time.

For example, this could be used for moss's OnError(), OnEvent() and
logging callback options.
2016-03-09 09:40:29 -08:00
Marty Schoch
d7292ed891 add support for gathering stats via map for easier consumption 2016-03-07 18:37:46 -05:00
Marty Schoch
e51f4d5450 changing async test strategy, was failing in go 1.6 2016-03-07 09:39:20 -05:00
Marty Schoch
23a323bc9d add support for numPlainTextBytesIndexed metric 2016-03-05 14:05:08 -05:00
Marty Schoch
81780f97d0 add term search stats 2016-03-05 07:50:25 -05:00
Marty Schoch
147debaa12 expose metrics and moss stats wrapping underlying stats as well 2016-03-04 13:43:39 -05:00
Steve Yen
f6d1bd2c87 moss option MaxPreMergerBatches renamed 2016-03-03 11:18:30 -08:00
Steve Yen
7d67d89a9c MB-18441 - moss lower-level iterator starts positioned on current
The iterator starts off positioned so that Current() is correct, so
invoking Next() right off the bat was incorrect.
2016-03-01 21:45:48 -08:00
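The iteration pattern behind the fix above can be sketched with a hypothetical in-memory iterator (not the moss API). It assumes only what the commit message states: the iterator starts positioned so that Current() is already valid, so calling Next() before the first Current() skips an entry.

```go
package main

import "fmt"

// sliceIter is a hypothetical iterator that starts positioned on its
// first item, so Current() is valid before any call to Next().
type sliceIter struct {
	items []string
	pos   int
}

// Current returns the item at the current position, or false when exhausted.
func (it *sliceIter) Current() (string, bool) {
	if it.pos >= len(it.items) {
		return "", false
	}
	return it.items[it.pos], true
}

// Next advances the iterator by one position.
func (it *sliceIter) Next() { it.pos++ }

// collect reads Current() first and only then advances.
func collect(it *sliceIter) []string {
	var out []string
	for {
		v, ok := it.Current()
		if !ok {
			break
		}
		out = append(out, v)
		it.Next()
	}
	return out
}

// collectNextFirst shows the mistake: advancing before the first read
// silently drops the first item.
func collectNextFirst(it *sliceIter) []string {
	it.Next()
	return collect(it)
}

func main() {
	items := []string{"a", "b", "c"}
	fmt.Println(collect(&sliceIter{items: items}))
	fmt.Println(collectNextFirst(&sliceIter{items: items}))
}
```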
Steve Yen
a29dd25a48 upside_down dict row value size accounts for large uvarint's
This is somewhat unlikely, but if a term is (incredibly) popular, its
uvarint count value representation might go beyond 8 bytes.

Some KVStore implementations (like forestdb) provide a BatchEx cgo
optimization that depends on proper preallocated counting, so this
change provides a proper worst-case estimate based on the max-uvarint
of 10 bytes instead of the previously incorrect 8 bytes.
2016-02-22 11:52:51 -08:00
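The 10-byte worst case mentioned above can be checked against Go's encoding/binary package. This is a small sketch, not bleve's code, and uvarintLen is a hypothetical helper. A uvarint stores 7 payload bits per byte, so 8 bytes cover only values below 1<<56, and a full 64-bit value needs 10 bytes.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

// uvarintLen returns how many bytes binary.PutUvarint writes for v.
// The buffer uses binary.MaxVarintLen64, the worst-case size.
func uvarintLen(v uint64) int {
	var buf [binary.MaxVarintLen64]byte
	return binary.PutUvarint(buf[:], v)
}

func main() {
	fmt.Println("max encoded size:", binary.MaxVarintLen64)
	fmt.Println("bytes for 1<<56 - 1:", uvarintLen(1<<56-1))
	fmt.Println("bytes for 1<<56:", uvarintLen(1<<56))
	fmt.Println("bytes for MaxUint64:", uvarintLen(math.MaxUint64))
}
```

Sizing a preallocated buffer with binary.MaxVarintLen64 per count value avoids the under-estimate described in the commit.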
Steve Yen
dd1718fa78 index/store/moss uses AllocMerge() instead of Merge()
Performance optimization.  Before this change, by using Merge()
instead of AllocMerge(), moss's internal batch buf's would be
wastefully, dramatically grown during append()'s to a mis-sized buf.
2016-02-22 11:48:02 -08:00
Steve Yen
ea1a52464d more index/store/moss err handling 2016-02-20 14:25:42 -08:00
Steve Yen
eb315fa500 integrate index/store/moss KV store 2016-02-20 14:25:42 -08:00
Marty Schoch
208b700e17 add missing build tag guarding cznicb benchmark 2016-02-09 15:57:35 -05:00
Marty Schoch
40c95513b7 add support for including kvstore stats 2016-02-05 12:26:19 -05:00
Marty Schoch
c5dea9e882 fix accessing store via Advanced() method which was broken 2016-02-02 11:54:18 -05:00
Marty Schoch
710d06e974 add support for native C merge operators 2016-01-27 17:51:07 -05:00
Steve Yen
d97e3caf4f fix comment typo 2016-01-22 09:04:24 -08:00
Steve Yen
d5de1d3da1 metrics implements BatchEx correctly 2016-01-21 11:00:41 -08:00
Marty Schoch
fc34a97875 copy locations on merge for more safe/predictable behavior
fixes #328
2016-01-19 14:21:48 -05:00
Steve Yen
035d9d0e40 unneeded cast and parens 2016-01-17 00:16:05 -08:00
Marty Schoch
1335eb2a7b Merge pull request #322 from steveyen/WIP-perf-20160113
KVReader.MultiGet and KVWriter.NewBatchEx API's
2016-01-15 14:28:59 -05:00
opennota
8517feb1c6 Fix some typos 2016-01-15 05:46:27 +07:00
Silvan Jegen
d326898f7b Remove unneeded brackets 2016-01-14 16:41:41 +01:00
Steve Yen
6849e538be upside_down and firestorm use new NewBatchEx() API
With this change, the upside_down batchRows() and firestorm
batchRows() now use the new KVWriter.NewBatchEx() API, which can
improve performance by reducing the number of cgo hops.
2016-01-13 23:08:20 -08:00
Steve Yen
d94ccf2d74 added KVWriter.NewBatchEx() method 2016-01-13 16:19:04 -08:00
Steve Yen
fb048f6c64 added KVReader.MultiGet() method 2016-01-13 15:12:10 -08:00
Steve Yen
8dc067b1d9 go fmt 2016-01-13 15:11:50 -08:00
Steve Yen
fe39b3fd13 avoid fieldTermFreqs loop if no composite fields 2016-01-13 14:45:04 -08:00
Marty Schoch
af25e724f6 Merge branch 'master' of https://github.com/slavikm/bleve into slavikm-master 2016-01-13 16:10:59 -05:00
Silvan Jegen
35ac2b2bee Run go fmt ./... 2016-01-12 22:15:50 +01:00
Steve Yen
0e72b949b3 upside_down batchRows() takes array of arrays
In order to spend less time in append(), this change in upside_down
(similar to another recent performance change in firestorm) builds up
an array of arrays as the eventual input to batchRows().
2016-01-11 18:11:21 -08:00
slavikm
680be52f87 Implemented boolean field support 2016-01-11 17:18:03 -08:00
Steve Yen
7ce7d98cba upside_down merge dictionary deltas before using batch.Merge()
This change performs more dictionary delta incr/decr math in
batchRows() instead of in the KVStore ExecuteBatch() machinery.
2016-01-11 16:52:07 -08:00
Steve Yen
94273d5fa9 upside_down process internal rows earlier
With this change, internal rows are processed while we're waiting for
backIndex rows to be retrieved.
2016-01-11 16:25:35 -08:00
Steve Yen
bb5cd8f3d6 upside_down merge backIndexRow concurrently
Previously, the code would gather all the backIndexRows before
processing them.  This change instead merges the backIndexRows
concurrently on the theory that we might as well make progress on
compute & processing tasks while waiting for the rest of the back
index rows to be fetched from the KVStore.
2016-01-10 18:50:42 -08:00
Steve Yen
c3b5246b0c upside_down track analysis time tighter; and comments 2016-01-10 15:36:54 -08:00
Steve Yen
d3dd40d334 upside_down retrieves backindex concurrently with analysis
Start backindex reading concurrently with analysis to try to utilize
more I/O bandwidth.

The analysis time vs indexing time stats tracking are also now "off",
since there's now concurrency between those activities.

One tradeoff is that the lock area in upside_down Batch() is increased
as part of this change.
2016-01-10 15:18:28 -08:00
Steve Yen
bff95eef70 firestorm close kvwriter sooner 2016-01-10 15:18:27 -08:00
Steve Yen
860de28a28 fix memory leak by closing batches in batchRows() 2016-01-07 17:59:42 -08:00
Steve Yen
70105477cf added Close() method to KVBatch interface 2016-01-07 17:54:21 -08:00
Marty Schoch
48fcd5a7d5 Merge branch 'WIP-perf-20160106' of https://github.com/steveyen/bleve into steveyen-WIP-perf-20160106 2016-01-07 15:40:29 -05:00
Marty Schoch
665f5c58e1 fix errcheck violation 2016-01-07 11:11:43 -05:00
Marty Schoch
e54db33346 try testing slightly different way 2016-01-07 11:06:18 -05:00
Marty Schoch
cd940cc375 add another check to try to understand test failure on travis 2016-01-07 10:45:20 -05:00
Steve Yen
846912d083 upside_down udc.termVectorsFromTokenFreq rows append optimization 2016-01-07 00:48:34 -08:00
Steve Yen
8b980bd2ef firestorm avoid extra goroutine, similar to upside_down 2016-01-07 00:43:27 -08:00
Steve Yen
fbd0e7bfe9 upside_down backIndexTermEntries prealloc'ed capacity 2016-01-07 00:23:25 -08:00
Steve Yen
4eee8821f9 upside_down storeField/indexField append to provided arrays
Taking another optimization from firestorm, upside_down's
storeField()/indexField() funcs now also append() to passed-in arrays
rather than always allocating their own arrays.
2016-01-07 00:13:46 -08:00
Steve Yen
1af2927967 upside_down gets analysis perf rows optimizations from firestorm 2016-01-06 23:53:13 -08:00
Steve Yen
82b8b3468e upside_down analysis converts to docIDBytes once 2016-01-06 23:38:02 -08:00
Steve Yen
d6a997d8c1 firestorm gtreap lookup once per snapshot docID
Previously, firestorm would lookup docID's in the inFlight gtreap for
every candidate docNum, and this change moves the lookup to outside of
the loop.
2016-01-06 16:46:15 -08:00
Steve Yen
024848ac91 firestorm valid docNum finding, fixes #310 2016-01-06 16:04:56 -08:00
Steve Yen
7df07f94fa firestorm use the ParseKey() funcs to avoid unneeded value parsing
With this change, the row allocation also happens only once per loop,
instead of once per item.
2016-01-06 15:53:12 -08:00
Steve Yen
009d59222a firestorm StoredRow.ParseKey() func 2016-01-06 15:46:26 -08:00
Steve Yen
8389027ae8 firestorm TermFreqRow.ParseKey() func 2016-01-06 15:32:09 -08:00
Steve Yen
89d17f01ef analyze locations only if includeTermVectors enabled
With this change, TermLocations are computed and maintained only if
includeTermVectors is enabled, for higher performance.
2016-01-05 12:46:46 -08:00
Steve Yen
70b7e73c82 firestorm compensator inFlight.Get() might return nil 2016-01-03 10:21:54 -08:00
Steve Yen
fb8c9a7475 firestorm.Batch() collects [][]IndexRows instead of []IndexRow
Rather than append() all received rows into a flat []IndexRow during
the result gathering loop, this change instead collects the analysis
result rows into a [][]IndexRow, which avoids extra copying.

As part of this, firestorm batchRows() now takes the [][]IndexRow as
its input.
2016-01-02 12:30:47 -08:00
Steve Yen
1c5b84911d firestorm DictUpdater NotifyBatch is more async 2016-01-02 12:21:25 -08:00
Steve Yen
b241242465 firestorm.Analyze() preallocs rows, with analyzeField() func
The new analyzeField() helper func is used for both regular fields and
for composite fields.

With this change, all analysis is done up front, for both regular
fields and composite fields.

After analysis, this change counts up all the row capacity needed and
extends the AnalysisResult.Rows in one shot, as opposed to the
previous approach of dynamically growing the array as needed during
append()'s.

Also, in this change, the TermFreqRow for _id is added first, which
seems more correct.
2016-01-02 12:21:25 -08:00
Steve Yen
5b2bc1c20f firestorm.indexField() check for includeTermVectors moved out of loop 2016-01-02 12:21:25 -08:00
Steve Yen
45e9eaaacb firestorm.indexField() allocs up-front array of TermFreqRow's
This uses the "backing array" technique to allocate many TermFreqRow's
at the front of firestorm.indexField(), instead of the previous
one-by-one, as-needed TermFreqRow allocation approach.

Results from micro-benchmark, null-firestorm, bleve-blast has this
change producing a ~half MB/sec improvement.
2016-01-02 12:21:24 -08:00
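A rough sketch of the "backing array" technique named in the commit above. The termFreqRow type and rowsFromBackingArray function are hypothetical stand-ins, not firestorm's code. All rows are allocated in a single slice, and the function returns pointers into that slice instead of allocating each row separately.

```go
package main

import "fmt"

// termFreqRow is a hypothetical stand-in for an index row.
type termFreqRow struct {
	term []byte
	freq uint64
}

// rowsFromBackingArray allocates every row in one slice (the backing
// array) and returns pointers into it, replacing len(freqs) small
// allocations with a single larger one.
func rowsFromBackingArray(freqs map[string]uint64) []*termFreqRow {
	backing := make([]termFreqRow, len(freqs))
	rows := make([]*termFreqRow, 0, len(freqs))
	i := 0
	for term, freq := range freqs {
		row := &backing[i]
		row.term = []byte(term)
		row.freq = freq
		rows = append(rows, row)
		i++
	}
	return rows
}

func main() {
	rows := rowsFromBackingArray(map[string]uint64{"bleve": 3, "moss": 1})
	for _, r := range rows {
		fmt.Printf("%s=%d\n", r.term, r.freq)
	}
}
```

One tradeoff of this approach is that the whole backing array stays reachable for as long as any single row pointer is still in use.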
Steve Yen
7ae696d661 firestorm lookuper notified via batch
Previously, the firestorm.Batch() would notify the lookuper goroutine
on a document by document basis.  If the lookuper input channel became
full, then that would block the firestorm.Batch() operation.

With this change, lookuper is notified once, with a "batch" that is an
[]*InFlightItem.

This change also reuses that same []*InFlightItem to invoke the
compensator.MutateBatch().

This also has the advantage of only converting the docID's from string
to []byte just once, outside of the lock that's used by the
compensator.

Micro-benchmark of this change with null-firestorm bleve-blast does
not show large impact, neither degradation nor improvement.
2016-01-02 12:21:24 -08:00
Steve Yen
38d50ed8b5 renamed var to docsUpdated to match docsDeleted naming 2016-01-02 12:21:24 -08:00
Steve Yen
3feeb14b7d firestorm.batchRows reuses buf for all IndexRows 2016-01-02 12:21:24 -08:00
Steve Yen
0a7f7e3df8 firestorm.Analyze() converts docID to bytes only once 2016-01-02 12:21:24 -08:00
Steve Yen
fd81d0364c firestorm.indexField() uses capacity of len(tokenFreqs) 2016-01-02 12:21:24 -08:00
Steve Yen
ee5ccda112 use KeyTo/ValueTo in firestorm.batchRows
After this change, with null kvstore micro-benchmark...

  GOMAXPROCS=8 ./bleve-blast -source=../../tmp/enwiki.txt \
    -count=100000 -numAnalyzers=8 -numIndexers=8 \
    -config=../../configs/null-firestorm.json -batch=100

Then TermFreqRow key and value methods disappear as large boxes from
the cpu profile graphs.
2016-01-01 09:57:59 -08:00
Steve Yen
fd287bdfa4 firestorm.md markdown fixes 2016-01-01 09:57:59 -08:00
Steve Yen
b605224106 use shorter go idiom 2015-12-29 22:14:45 -08:00
Antoine Grondin
6806343677 firestore: fix #296 for division by zero on GC 2015-12-25 11:34:19 +07:00
Antoine Grondin
a6f7abdfa3 firestore: reproducer for division by zero on GC 2015-12-25 11:33:46 +07:00
Marty Schoch
8efbd556a3 fix indexing bug with data coming from arrays
fixes #295
2015-12-21 14:59:32 -05:00
Marty Schoch
cf67fe2cbc fix major synchronization issue in the field_cache
The field cache is expected to be the authority on which field
names are identified by which identifier.  This code was
optimized for the most common case in which fields already
exist.  However, if we determine the field is missing with
the read lock (shared), we incorrectly immediately proceed
to create a new row with the write lock (exclusive).  The
problem is that multiple goroutines might have come to
the same conclusion, and they all proceed to add rows.  The two
choices were to do the whole operation with the write lock, or
recheck the value again with the write lock.  We have chosen
to repeat the check inside the write-lock, as this optimizes
for what we believe to be the most common case, in which most
fields will already exist.
2015-12-15 16:39:38 -05:00
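The locking pattern described in the commit above can be sketched as follows. The fieldCache type here is a simplified, hypothetical version, not bleve's actual field cache. The lookup first uses the shared read lock, and only on a miss takes the write lock and repeats the check before adding a new entry.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// fieldCache is a hypothetical, simplified cache mapping field names
// to numeric identifiers.
type fieldCache struct {
	mu      sync.RWMutex
	indexes map[string]uint16
	next    uint16
}

func newFieldCache() *fieldCache {
	return &fieldCache{indexes: make(map[string]uint16)}
}

// fieldIndex returns the identifier for name and reports whether this
// call created it.
func (c *fieldCache) fieldIndex(name string) (uint16, bool) {
	// Fast path: most fields already exist, so a read lock suffices.
	c.mu.RLock()
	idx, ok := c.indexes[name]
	c.mu.RUnlock()
	if ok {
		return idx, false
	}

	c.mu.Lock()
	defer c.mu.Unlock()
	// Recheck under the write lock: another goroutine may have added
	// the field between the RUnlock above and the Lock here.
	if idx, ok := c.indexes[name]; ok {
		return idx, false
	}
	idx = c.next
	c.next++
	c.indexes[name] = idx
	return idx, true
}

// concurrentCreates asks for the same field from many goroutines and
// counts how many of them created an entry.
func concurrentCreates(workers int) int {
	c := newFieldCache()
	var wg sync.WaitGroup
	var creates int32
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if _, created := c.fieldIndex("title"); created {
				atomic.AddInt32(&creates, 1)
			}
		}()
	}
	wg.Wait()
	return int(atomic.LoadInt32(&creates))
}

func main() {
	fmt.Println("entries created:", concurrentCreates(16))
}
```

Without the recheck under the write lock, several goroutines that all missed under the read lock would each add their own entry for the same field.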
Marty Schoch
a73a178923 fix incorrect prefix search behavior
avoids double incrementing of end term when reading term dict
fixes #293
2015-12-04 14:07:16 -05:00
Marty Schoch
699c86073a make existing integration tests work with firestorm 2015-12-01 12:29:56 -05:00
Marty Schoch
6d851cfcc2 fix bug in warmup which led to docs being deleted 2015-11-30 10:18:14 -05:00
Marty Schoch
aa8d98f5fa include space after prefix in log output 2015-11-30 10:17:48 -05:00
Marty Schoch
68d8742826 correctly prefix internal rows with 'i' and print them in debug 2015-11-30 10:17:15 -05:00
Marty Schoch
c93de9734e fix issues identified by errcheck 2015-11-24 14:32:33 -05:00
Marty Schoch
bbef1980d8 Merge branch 'master' into firestorm 2015-11-24 13:04:36 -05:00
Marty Schoch
ff11f83842 properly handle errors inside metrics kvstore reporting 2015-11-24 12:52:03 -05:00
Marty Schoch
a707d44e0b Merge branch 'master' into firestorm 2015-11-24 09:44:47 -05:00
Patrick Mezard
e85c9c542e row: expose TermFrequencyRow term and freq fields
Rows content is an implementation detail of bleve index and may change
in the future. That said, they also contain information valuable to
assess the quality of the index or understand its performance. So, as
long as we agree that type asserting rows should only be done if you
know what you are doing and are ready to deal with future changes, I see
no reason to hide the row fields from external packages.

Fix #268
2015-11-17 17:21:26 +01:00
Kosov Eugene
45e670b99b BoltDB wrapper nano optimization which makes code a bit prettier too 2015-11-05 00:27:28 +03:00
Marty Schoch
4791625b9b Merge pull request #262 from pmezard/index-and-tokenizer-doc-and-fix
Index and tokenizer doc and fix
2015-11-02 11:51:21 -05:00
Marty Schoch
30651065e9 fix panic on insufficiently sized buffer
adds test case to reproduce original problem
fixes #264
2015-10-30 18:25:38 -04:00
Marty Schoch
2bd3ef4080 copy relevant k/v pairs before advancing underlying iterator 2015-10-28 12:23:54 -04:00
Marty Schoch
d1b07f4909 fix dump methods to properly copy keys and values 2015-10-28 12:06:44 -04:00
Marty Schoch
01526e971f Merge branch 'master' into firestorm 2015-10-28 11:26:01 -04:00
Patrick Mezard
f2b3d5698e index: document TermFieldReader interface 2015-10-27 18:53:03 +01:00
Patrick Mezard
3df789d258 index: document empty strings behaviour when calling DocIDReader() 2015-10-27 18:53:03 +01:00
Marty Schoch
1a978a4591 fix go vet issues and cleanup reader/iterator 2015-10-26 16:41:58 -04:00
Marty Schoch
f0d282f5f8 add test case for seeing prefix iterators outside of range
similar to #256 except for prefix iterators
includes fix for boltdb and gtreap which had incorrect behavior
2015-10-26 16:14:29 -04:00