Commit Graph

542 Commits

Author SHA1 Message Date
Steve Yen
4822cff63a optimize Advance() with pre-allocated in-out param
This perf-related change brings the code and API closer to the Next()
methods, which now take a pre-allocated param.
2016-07-29 14:15:00 -07:00
Steve Yen
3c82086805 optimize upside_down reader & 64-bit struct alignments
The UpsideDownCouchTermFieldReader.Next() only needs the doc ID from
the key, so this change provides a specialized parseKDoc() method for
that optimization.

Additionally, fields in various structs are more 64-bit aligned, in an
attempt to reduce the invocations of runtime.typedmemmove() and
runtime.heapBitsBulkBarrier(), which the go compiler seems to
automatically insert to transparently handle misaligned data.
2016-07-23 10:37:40 -07:00
Steve Yen
5094d2d097 optimize moss PrefixIterator
Previously, the PrefixIterator() for moss was implemented by comparing
the prefix bytes on every Next().

With this optimization, the next larger endKeyExclusive is computed at
the iterator's initialization, which allows us to avoid all those
prefix comparisons.
2016-07-21 18:33:34 -07:00
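The endKeyExclusive idea in 5094d2d097 can be sketched as follows. This is a minimal illustration, not moss's actual code: prefixEndExclusive is a hypothetical helper name, and the rule it applies (increment the last byte that is not 0xff and drop everything after it) is the usual way to turn a prefix into an exclusive upper bound for a range iterator.

```go
package main

import "fmt"

// prefixEndExclusive returns the smallest key that sorts after every key
// starting with prefix. It returns nil when no such bound exists (empty
// prefix, or a prefix made only of 0xff bytes).
// Hypothetical helper for illustration; not taken from moss.
func prefixEndExclusive(prefix []byte) []byte {
	end := make([]byte, len(prefix))
	copy(end, prefix)
	for i := len(end) - 1; i >= 0; i-- {
		if end[i] < 0xff {
			end[i]++
			return end[:i+1] // drop any trailing 0xff bytes
		}
	}
	return nil
}

func main() {
	// The bound is computed once, at iterator creation; afterwards the
	// iterator only needs a "key < end" range check.
	fmt.Printf("%q\n", prefixEndExclusive([]byte("term:")))
}
```

With the bound computed once up front, a plain range iterator over [prefix, end) yields exactly the keys with that prefix, so no per-Next() prefix comparison is needed.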
Steve Yen
5271a0f62b optimize termFieldVectorsFromTermVectors when empty 2016-07-21 11:46:14 -07:00
Steve Yen
cbb174b074 optimize moss iterator Next() done/k/v maintenance 2016-07-21 11:10:49 -07:00
Steve Yen
b744148449 optimization to actually reuse the TermFrequencyRow 2016-07-21 11:10:49 -07:00
Steve Yen
6d7fa0b964 optimize moss iterator checkDone() 2016-07-21 11:10:49 -07:00
Steve Yen
39d3e2f028 optimize upside_down reader Next() with TermFieldDoc reuse
This optimization changes the index.TermFieldReader.Next() interface
API, adding an optional, pre-allocated *TermFieldDoc parameter, which
can help prevent garbage creation.
2016-07-21 11:10:49 -07:00
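The optional pre-allocated parameter described in 39d3e2f028 follows a common Go reuse pattern. The sketch below uses hypothetical stand-in types (termFieldDoc, reader) rather than bleve's real index.TermFieldReader, and only illustrates the shape of the call, assuming that passing nil falls back to a fresh allocation.

```go
package main

import "fmt"

// termFieldDoc and reader are hypothetical stand-ins, not bleve's types.
type termFieldDoc struct {
	ID   string
	Freq uint64
}

type reader struct {
	ids []string
	pos int
}

// Next fills preAlloced when the caller supplies one and allocates only
// when it is nil. It returns nil once the reader is exhausted.
func (r *reader) Next(preAlloced *termFieldDoc) *termFieldDoc {
	if r.pos >= len(r.ids) {
		return nil
	}
	rv := preAlloced
	if rv == nil {
		rv = &termFieldDoc{}
	}
	rv.ID = r.ids[r.pos]
	rv.Freq = 1
	r.pos++
	return rv
}

func main() {
	r := &reader{ids: []string{"a", "b", "c"}}
	var scratch termFieldDoc // allocated once, reused on every call
	for d := r.Next(&scratch); d != nil; d = r.Next(&scratch) {
		fmt.Println(d.ID, d.Freq)
	}
}
```

The trade-off is that the returned value is only valid until the next call, so callers that keep results around must copy them.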
Steve Yen
2498ccc913 optimize upside_down reader Next() to reuse TermFrequencyRow
Before this change, upside down's reader would alloc a new
TermFrequencyRow on every Next(), which would be immediately
transformed into an index.TermFieldDoc{}.  This change reuses a
pre-allocated TermFrequencyRow that's a field in the reader.
2016-07-21 11:10:49 -07:00
Steve Yen
68af6aef62 optimize upside_down reader Next() when 0-length term field vectors
From some bleve-query perf profiling, term field vectors appeared to
be alloc'ed, which was unnecessary as term field vectors are disabled
in the bleve-blast/bleve-query tests.
2016-07-21 11:10:49 -07:00
Marty Schoch
5934a185f3 Merge pull request #398 from slavikm/master
Make facets much faster
2016-07-21 09:12:28 -04:00
slavikm
fc990bc2d1 Remove the field IDs from outside of the index 2016-07-19 20:42:45 -07:00
slavikm
ce64c17be1 Do field cache only once per search 2016-07-17 16:29:17 -07:00
slavikm
9a9b630a6d Make facets much faster 2016-07-17 15:31:35 -07:00
Steve Yen
80623f4a8a MB-20101 - moss KV fix Get() of 0-length vals
The moss KV store adapter's Get() implementation was incorrectly
transforming a 0-length val (e.g., []byte{}) into a nil val.
2016-07-15 14:41:30 -07:00
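The commit message for 80623f4a8a does not say how the 0-length val became nil. As an assumption for illustration only, the sketch below shows one common way a Go copy helper can lose the distinction, and a variant that keeps it. Both function names are hypothetical.

```go
package main

import "fmt"

// copyLossy turns a 0-length, non-nil value into nil: appending zero
// elements to a nil slice leaves it nil.
func copyLossy(v []byte) []byte {
	return append([]byte(nil), v...)
}

// copyPreserving keeps the nil vs. empty distinction.
func copyPreserving(v []byte) []byte {
	if v == nil {
		return nil
	}
	out := make([]byte, len(v))
	copy(out, v)
	return out
}

func main() {
	empty := []byte{}
	fmt.Println(copyLossy(empty) == nil)      // true
	fmt.Println(copyPreserving(empty) == nil) // false
}
```

The distinction matters for a KV store because a nil val is commonly used to signal "key not found", while an empty val means the key exists with no data.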
Marty Schoch
bd2a23fb6d remove firestorm index scheme
firestorm was an experiment
we learned a lot, but it did not result in a usable index scheme
2016-06-26 07:51:41 -04:00
Mark Mindenhall
c3c827aded Add boltdb config test 2016-06-14 13:36:40 -06:00
Mark Mindenhall
d369bd5c3c Add bucket fill percent option for boltdb 2016-06-13 18:47:38 -06:00
Marty Schoch
1be5699c54 Merge pull request #381 from MachineShop-IOT/master
Compact for boltdb (workaround for #374)
2016-06-08 00:01:20 -04:00
Steve Yen
4e531ae11b configurable mossStoreOptions and DeferredSort defaults to true 2016-06-07 17:38:43 -07:00
Mark Mindenhall
09fcc69516 rename defaultBatchSize to defaultCompactBatchSize 2016-06-01 14:25:57 -06:00
Mark Mindenhall
b5a4378a46 Cleanup godoc comments in PR 2016-06-01 13:59:57 -06:00
Mark Mindenhall
fecf7ab5c4 Compact for boltdb (workaround for #374) 2016-06-01 13:16:43 -06:00
Marty Schoch
92cf2a8974 Merge pull request #376 from MachineShop-IOT/master
Remove DictionaryTerm with count 0 during compact (workaround for #374)
2016-06-01 13:39:30 -04:00
Steve Yen
bf318b489b enable mossStore as configurable lower-level store
Also, bumped moss vendor SHA to latest moss with mossStore.
2016-05-26 13:33:22 -07:00
Mark Mindenhall
04351eb8f1 Move creation of iterator within transaction 2016-05-26 12:29:49 -06:00
Mark Mindenhall
686b20be4f Remove DictionaryTerm with count 0 during compact (workaround for #374) 2016-05-26 11:04:53 -06:00
Mark Mindenhall
3aa1d72233 Add compact method to goleveldb store 2016-05-17 16:58:17 -06:00
Marty Schoch
73b514fa4f do not put +/-Inf or NaN values into the stats map 2016-04-15 13:39:30 -04:00
Marty Schoch
b8a2fbb887 fix data race in bleve batch reuse
Currently bleve batch is built by user goroutine
Then read by bleve goroutine
This is still safe when used correctly
However, Reset() will modify the map, which is now a data race

This fix is to simply make batch.Reset() alloc new maps.
This provides a data-access pattern that can be used safely.
Also, this thread argues that creating a new map may be faster
than trying to reuse an existing one:

https://groups.google.com/d/msg/golang-nuts/UvUm3LA1u8g/jGv_FobNpN0J

Separate but related, I have opted to remove the "unsafe batch"
checking that we did.  This was always limited anyway, and now
users of Go 1.6 are just as likely to get a panic from the
runtime for concurrent map access anyway.  So, the price paid
by us (additional mutex) is not worth it.

fixes #360 and #260
2016-04-08 15:32:13 -04:00
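The Reset() fix described in b8a2fbb887 can be sketched like this. The batch type and its fields are hypothetical stand-ins for bleve's Batch; the point is only that Reset() swaps in new maps rather than clearing the old ones in place.

```go
package main

import "fmt"

// batch is a hypothetical stand-in for a bleve batch.
type batch struct {
	indexOps    map[string]string
	internalOps map[string][]byte
}

func newBatch() *batch {
	return &batch{
		indexOps:    make(map[string]string),
		internalOps: make(map[string][]byte),
	}
}

// Reset swaps in fresh maps instead of deleting keys in place, so a
// goroutine still reading the previous maps never sees them mutated.
func (b *batch) Reset() {
	b.indexOps = make(map[string]string)
	b.internalOps = make(map[string][]byte)
}

func main() {
	b := newBatch()
	b.indexOps["doc1"] = "hello"
	inFlight := b.indexOps // what an indexing goroutine might still hold
	b.Reset()
	fmt.Println(len(inFlight), len(b.indexOps)) // 1 0
}
```

The old maps are left untouched and are reclaimed by the garbage collector once the reader drops its reference.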
Marty Schoch
2a703376ea fix ineffectual assignments 2016-04-02 22:42:56 -04:00
Marty Schoch
7892882519 fix typos 2016-04-02 21:59:30 -04:00
Marty Schoch
194ee82c80 gofmt simplifications 2016-04-02 21:54:33 -04:00
Marty Schoch
639fb1ab89 remove NativeMergeOperator from core, it requires unsafe 2016-03-24 12:06:43 -04:00
Marty Schoch
724684a4f1 additional firestorm fixes for 64-bit alignment
part of #359
2016-03-20 11:02:13 -04:00
Marty Schoch
3dc64de478 moved fields requiring 64-bit alignment to start of struct
several data structures had a pointer at the start of the struct
on some 32-bit systems, this causes the remaining fields to no longer
be aligned on 64-bit boundaries

the fix identified by @pmezard is to put the counters first in the
struct, which guarantees correct alignment

fixes #359
2016-03-20 10:38:28 -04:00
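The layout rule behind 3dc64de478 can be illustrated with a small sketch. The stats struct is hypothetical; it relies on the sync/atomic guarantee that the first word of an allocated struct is 64-bit aligned, which is what makes atomic 64-bit operations on that field safe on 32-bit platforms.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"unsafe"
)

// stats is a hypothetical struct. The uint64 counter comes first, so it
// sits at offset 0 of the allocated struct.
type stats struct {
	updates uint64 // accessed with sync/atomic; keep first
	name    *string
	ready   bool
}

// incr atomically bumps the counter and returns the new value.
func (s *stats) incr() uint64 {
	return atomic.AddUint64(&s.updates, 1)
}

// counterOffset reports where the counter lives inside the struct.
func counterOffset() uintptr {
	var s stats
	return unsafe.Offsetof(s.updates)
}

func main() {
	s := &stats{}
	s.incr()
	fmt.Println(counterOffset(), atomic.LoadUint64(&s.updates)) // 0 1
}
```

Had the pointer field come first, the counter would land at offset 4 on a 32-bit build, which is not a 64-bit boundary.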
Steve Yen
be2800a8e4 MB-18715 - moss Merge() didn't bump bufUsed correctly
And, also allocate more memory for both the partial and full merges.
2016-03-15 17:09:40 -07:00
Steve Yen
c1597842d0 moss lowerLevelUpdate didn't handle batches of size 1 2016-03-11 15:47:23 -08:00
Steve Yen
f1dac8b497 moss defaults to non-nil options.Log 2016-03-09 10:15:11 -08:00
Steve Yen
1d63c55f7c parse mossLowerLevelMaxBatchSize only when lower-level-store exists 2016-03-09 10:09:15 -08:00
Steve Yen
76b9365928 added moss RegistryCollectionOptions
The moss RegistryCollectionOptions allows applications to register
moss-related callback API functions and other advanced feature usage
at process initialization time.

For example, this could be used for moss's OnError(), OnEvent() and
logging callback options.
2016-03-09 09:40:29 -08:00
Marty Schoch
d7292ed891 add support for gathering stats via map for easier consumption 2016-03-07 18:37:46 -05:00
Marty Schoch
e51f4d5450 changing async test strategy, was failing in go 1.6 2016-03-07 09:39:20 -05:00
Marty Schoch
23a323bc9d add support for numPlainTextBytesIndexed metric 2016-03-05 14:05:08 -05:00
Marty Schoch
81780f97d0 add term search stats 2016-03-05 07:50:25 -05:00
Marty Schoch
147debaa12 expose metrics and moss stats wrapping underlying stats as well 2016-03-04 13:43:39 -05:00
Steve Yen
f6d1bd2c87 moss option MaxPreMergerBatches renamed 2016-03-03 11:18:30 -08:00
Steve Yen
7d67d89a9c MB-18441 - moss lower-level iterator starts positioned on current
The iterator starts off positioned so that Current() is correct, so
invoking Next() right off the bat was incorrect.
2016-03-01 21:45:48 -08:00
Steve Yen
a29dd25a48 upside_down dict row value size accounts for large uvarint's
This is somewhat unlikely, but if a term is (incredibly) popular, its
uvarint count value representation might go beyond 8 bytes.

Some KVStore implementations (like forestdb) provide a BatchEx cgo
optimization that depends on proper preallocated counting, so this
change provides a proper worst-case estimate based on the max uvarint
of 10 bytes instead of the previously incorrect 8 bytes.
2016-02-22 11:52:51 -08:00
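The 8-byte versus 10-byte sizing in a29dd25a48 can be checked directly with the standard encoding/binary package. In this sketch uvarintLen is a hypothetical helper; PutUvarint and MaxVarintLen64 are the real standard library names.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

// uvarintLen reports how many bytes the uvarint encoding of v occupies.
func uvarintLen(v uint64) int {
	var buf [binary.MaxVarintLen64]byte // 10 bytes, the 64-bit worst case
	return binary.PutUvarint(buf[:], v)
}

func main() {
	fmt.Println(uvarintLen(1))              // 1
	fmt.Println(uvarintLen(1 << 56))        // 9, already past 8 bytes
	fmt.Println(uvarintLen(math.MaxUint64)) // 10
}
```

Each uvarint byte carries 7 payload bits, so 8 bytes cover only 56 bits; any count of 2^56 or more needs a ninth or tenth byte.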
Steve Yen
dd1718fa78 index/store/moss uses AllocMerge() instead of Merge()
Performance optimization.  Before this change, by using Merge()
instead of AllocMerge(), moss's internal batch buf's would be
wastefully, dramatically grown during append()'s to a mis-sized buf.
2016-02-22 11:48:02 -08:00
Steve Yen
ea1a52464d more index/store/moss err handling 2016-02-20 14:25:42 -08:00
Steve Yen
eb315fa500 integrate index/store/moss KV store 2016-02-20 14:25:42 -08:00
Marty Schoch
208b700e17 add missing build tag guarding cznicb benchmark 2016-02-09 15:57:35 -05:00
Marty Schoch
40c95513b7 add support for including kvstore stats 2016-02-05 12:26:19 -05:00
Marty Schoch
c5dea9e882 fix accessing store via Advanced() method which was broken 2016-02-02 11:54:18 -05:00
Marty Schoch
710d06e974 add support for native C merge operators 2016-01-27 17:51:07 -05:00
Steve Yen
d97e3caf4f fix comment typo 2016-01-22 09:04:24 -08:00
Steve Yen
d5de1d3da1 metrics implements BatchEx correctly 2016-01-21 11:00:41 -08:00
Marty Schoch
fc34a97875 copy locations on merge for more safe/predictable behavior
fixes #328
2016-01-19 14:21:48 -05:00
Steve Yen
035d9d0e40 unneeded cast and parens 2016-01-17 00:16:05 -08:00
Marty Schoch
1335eb2a7b Merge pull request #322 from steveyen/WIP-perf-20160113
KVReader.MultiGet and KVWriter.NewBatchEx API's
2016-01-15 14:28:59 -05:00
opennota
8517feb1c6 Fix some typos 2016-01-15 05:46:27 +07:00
Silvan Jegen
d326898f7b Remove unneeded brackets 2016-01-14 16:41:41 +01:00
Steve Yen
6849e538be upside_down and firestorm use new NewBatchEx() API
With this change, the upside_down batchRows() and firestorm
batchRows() now use the new KVWriter.NewBatchEx() API, which can
improve performance by reducing the number of cgo hops.
2016-01-13 23:08:20 -08:00
Steve Yen
d94ccf2d74 added KVWriter.NewBatchEx() method 2016-01-13 16:19:04 -08:00
Steve Yen
fb048f6c64 added KVReader.MultiGet() method 2016-01-13 15:12:10 -08:00
Steve Yen
8dc067b1d9 go fmt 2016-01-13 15:11:50 -08:00
Steve Yen
fe39b3fd13 avoid fieldTermFreqs loop if no composite fields 2016-01-13 14:45:04 -08:00
Marty Schoch
af25e724f6 Merge branch 'master' of https://github.com/slavikm/bleve into slavikm-master 2016-01-13 16:10:59 -05:00
Silvan Jegen
35ac2b2bee Run go fmt ./... 2016-01-12 22:15:50 +01:00
Steve Yen
0e72b949b3 upside_down batchRows() takes array of arrays
In order to spend less time in append(), this change in upside_down
(similar to another recent performance change in firestorm) builds up
an array of arrays as the eventual input to batchRows().
2016-01-11 18:11:21 -08:00
slavikm
680be52f87 Implemented boolean field support 2016-01-11 17:18:03 -08:00
Steve Yen
7ce7d98cba upside_down merge dictionary deltas before using batch.Merge()
This change performs more dictionary delta incr/decr math in
batchRows() instead of in the KVStore ExecuteBatch() machinery.
2016-01-11 16:52:07 -08:00
Steve Yen
94273d5fa9 upside_down process internal rows earlier
With this change, internal rows are processed while we're waiting for
backIndex rows to be retrieved.
2016-01-11 16:25:35 -08:00
Steve Yen
bb5cd8f3d6 upside_down merge backIndexRow concurrently
Previously, the code would gather all the backIndexRows before
processing them.  This change instead merges the backIndexRows
concurrently on the theory that we might as well make progress on
compute & processing tasks while waiting for the rest of the back
index rows to be fetched from the KVStore.
2016-01-10 18:50:42 -08:00
Steve Yen
c3b5246b0c upside_down track analysis time tighter; and comments 2016-01-10 15:36:54 -08:00
Steve Yen
d3dd40d334 upside_down retrieves backindex concurrently with analysis
Start backindex reading concurrently with analysis to try to utilize
more I/O bandwidth.

The analysis time vs indexing time stats tracking are also now "off",
since there's now concurrency between those activities.

One tradeoff is that the lock area in upside_down Batch() is increased
as part of this change.
2016-01-10 15:18:28 -08:00
Steve Yen
bff95eef70 firestorm close kvwriter sooner 2016-01-10 15:18:27 -08:00
Steve Yen
860de28a28 fix memory leak by closing batches in batchRows() 2016-01-07 17:59:42 -08:00
Steve Yen
70105477cf added Close() method to KVBatch interface 2016-01-07 17:54:21 -08:00
Marty Schoch
48fcd5a7d5 Merge branch 'WIP-perf-20160106' of https://github.com/steveyen/bleve into steveyen-WIP-perf-20160106 2016-01-07 15:40:29 -05:00
Marty Schoch
665f5c58e1 fix errcheck violation 2016-01-07 11:11:43 -05:00
Marty Schoch
e54db33346 try testing slightly different way 2016-01-07 11:06:18 -05:00
Marty Schoch
cd940cc375 add another check to try to understand test failure on travis 2016-01-07 10:45:20 -05:00
Steve Yen
846912d083 upside_down udc.termVectorsFromTokenFreq rows append optimization 2016-01-07 00:48:34 -08:00
Steve Yen
8b980bd2ef firestorm avoid extra goroutine, similar to upside_down 2016-01-07 00:43:27 -08:00
Steve Yen
fbd0e7bfe9 upside_down backIndexTermEntries prealloc'ed capacity 2016-01-07 00:23:25 -08:00
Steve Yen
4eee8821f9 upside_down storeField/indexField append to provided arrays
Taking another optimization from firestorm, upside_down's
storeField()/indexField() funcs now also append() to passed-in arrays
rather than always allocating their own arrays.
2016-01-07 00:13:46 -08:00
Steve Yen
1af2927967 upside_down gets analysis perf rows optimizations from firestorm 2016-01-06 23:53:13 -08:00
Steve Yen
82b8b3468e upside_down analysis converts to docIDBytes once 2016-01-06 23:38:02 -08:00
Steve Yen
d6a997d8c1 firestorm gtreap lookup once per snapshot docID
Previously, firestorm would lookup docID's in the inFlight gtreap for
every candidate docNum, and this change moves the lookup to outside of
the loop.
2016-01-06 16:46:15 -08:00
Steve Yen
024848ac91 firestorm valid docNum finding, fixes #310 2016-01-06 16:04:56 -08:00
Steve Yen
7df07f94fa firestorm use the ParseKey() funcs to avoid unneeded value parsing
With this change, the row allocation also happens only once per loop,
instead of once per item.
2016-01-06 15:53:12 -08:00
Steve Yen
009d59222a firestorm StoredRow.ParseKey() func 2016-01-06 15:46:26 -08:00
Steve Yen
8389027ae8 firestorm TermFreqRow.ParseKey() func 2016-01-06 15:32:09 -08:00
Steve Yen
89d17f01ef analyze locations only if includeTermVectors enabled
With this change, TermLocations are computed and maintained only if
includeTermVectors is enabled, for higher performance.
2016-01-05 12:46:46 -08:00
Steve Yen
70b7e73c82 firestorm compensator inFlight.Get() might return nil 2016-01-03 10:21:54 -08:00
Steve Yen
fb8c9a7475 firestorm.Batch() collects [][]IndexRows instead of []IndexRow
Rather than append() all received rows into a flat []IndexRow during
the result gathering loop, this change instead collects the analysis
result rows into a [][]IndexRow, which avoids extra copying.

As part of this, firestorm batchRows() now takes the [][]IndexRow as
its input.
2016-01-02 12:30:47 -08:00
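The [][]IndexRow approach from fb8c9a7475 can be sketched as below. indexRow, totalRows and visitRows are hypothetical names; the sketch only shows that a consumer can walk the per-document groups directly instead of first copying them into one flat slice.

```go
package main

import "fmt"

// indexRow is a hypothetical stand-in for bleve's IndexRow.
type indexRow struct{ key string }

// totalRows counts rows across all groups without copying any of them.
func totalRows(groups [][]indexRow) int {
	n := 0
	for _, g := range groups {
		n += len(g)
	}
	return n
}

// visitRows walks every row in group order, again without flattening.
func visitRows(groups [][]indexRow, visit func(indexRow)) {
	for _, g := range groups {
		for _, r := range g {
			visit(r)
		}
	}
}

func main() {
	// One inner slice per analyzed document; gathering a result costs a
	// single slice-header append rather than a copy of all its rows.
	var results [][]indexRow
	results = append(results, []indexRow{{key: "a"}, {key: "b"}})
	results = append(results, []indexRow{{key: "c"}})

	fmt.Println(totalRows(results))
	visitRows(results, func(r indexRow) { fmt.Print(r.key, " ") })
	fmt.Println()
}
```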
Steve Yen
1c5b84911d firestorm DictUpdater NotifyBatch is more async 2016-01-02 12:21:25 -08:00
Steve Yen
b241242465 firestorm.Analyze() preallocs rows, with analyzeField() func
The new analyzeField() helper func is used for both regular fields and
for composite fields.

With this change, all analysis is done up front, for both regular
fields and composite fields.

After analysis, this change counts up all the row capacity needed and
extends the AnalysisResult.Rows in one shot, as opposed to the
previous approach of dynamically growing the array as needed during
append()'s.

Also, in this change, the TermFreqRow for _id is added first, which
seems more correct.
2016-01-02 12:21:25 -08:00
Steve Yen
5b2bc1c20f firestorm.indexField() check for includeTermVectors moved out of loop 2016-01-02 12:21:25 -08:00
Steve Yen
45e9eaaacb firestorm.indexField() allocs up-front array of TermFreqRow's
This uses the "backing array" technique to allocate many TermFreqRow's
at the front of firestorm.indexField(), instead of the previous
one-by-one, as-needed TermFreqRow allocation approach.

Results from micro-benchmark, null-firestorm, bleve-blast has this
change producing a ~half MB/sec improvement.
2016-01-02 12:21:24 -08:00
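The "backing array" technique named in 45e9eaaacb can be sketched as follows. The types and the buildRows function are hypothetical; the idea is to allocate all row structs in one slice and hand out pointers into it, instead of allocating each row separately.

```go
package main

import "fmt"

// tokenFreq and termFreqRow are hypothetical stand-ins for illustration.
type tokenFreq struct {
	term string
	freq uint64
}

type termFreqRow struct {
	term string
	freq uint64
}

// buildRows makes one allocation for all row structs (the backing array)
// and returns pointers into it.
func buildRows(tokens []tokenFreq) []*termFreqRow {
	backing := make([]termFreqRow, len(tokens)) // single allocation
	rows := make([]*termFreqRow, len(tokens))
	for i, t := range tokens {
		backing[i] = termFreqRow{term: t.term, freq: t.freq}
		rows[i] = &backing[i]
	}
	return rows
}

func main() {
	rows := buildRows([]tokenFreq{{"bleve", 2}, {"moss", 1}})
	for _, r := range rows {
		fmt.Println(r.term, r.freq)
	}
}
```

One consequence of this design is that the whole backing array stays alive for as long as any single row pointer is retained.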
Steve Yen
7ae696d661 firestorm lookuper notified via batch
Previously, the firestorm.Batch() would notify the lookuper goroutine
on a document by document basis.  If the lookuper input channel became
full, then that would block the firestorm.Batch() operation.

With this change, lookuper is notified once, with a "batch" that is an
[]*InFlightItem.

This change also reuses that same []*InFlightItem to invoke the
compensator.MutateBatch().

This also has the advantage of only converting the docID's from string
to []byte just once, outside of the lock that's used by the
compensator.

Micro-benchmark of this change with null-firestorm bleve-blast does
not show large impact, neither degradation nor improvement.
2016-01-02 12:21:24 -08:00
Steve Yen
38d50ed8b5 renamed var to docsUpdated to match docsDeleted naming 2016-01-02 12:21:24 -08:00
Steve Yen
3feeb14b7d firestorm.batchRows reuses buf for all IndexRows 2016-01-02 12:21:24 -08:00
Steve Yen
0a7f7e3df8 firestorm.Analyze() converts docID to bytes only once 2016-01-02 12:21:24 -08:00
Steve Yen
fd81d0364c firestorm.indexField() uses capacity of len(tokenFreqs) 2016-01-02 12:21:24 -08:00
Steve Yen
ee5ccda112 use KeyTo/ValueTo in firestorm.batchRows
After this change, with null kvstore micro-benchmark...

  GOMAXPROCS=8 ./bleve-blast -source=../../tmp/enwiki.txt \
    -count=100000 -numAnalyzers=8 -numIndexers=8 \
    -config=../../configs/null-firestorm.json -batch=100

Then TermFreqRow key and value methods disappear as large boxes from
the cpu profile graphs.
2016-01-01 09:57:59 -08:00
Steve Yen
fd287bdfa4 firestorm.md markdown fixes 2016-01-01 09:57:59 -08:00
Steve Yen
b605224106 use shorter go idiom 2015-12-29 22:14:45 -08:00
Antoine Grondin
6806343677 firestore: fix #296 for division by zero on GC 2015-12-25 11:34:19 +07:00
Antoine Grondin
a6f7abdfa3 firestore: reproducer for division by zero on GC 2015-12-25 11:33:46 +07:00
Marty Schoch
8efbd556a3 fix indexing bug with data coming from arrays
fixes #295
2015-12-21 14:59:32 -05:00
Marty Schoch
cf67fe2cbc fix major synchronization issue in the field_cache
The field cache is expected to be the authority on which field
names are identified by which identifier.  This code was
optimized for the most common case in which fields already
exist.  However, if we determine the field is missing with
the read lock (shared), we incorrectly immediately proceed
to create a new row with the write lock (exclusive).  The
problem is that multiple goroutines might have come to
the same conclusion, and they all proceed to add rows.  The two
choices were to do the whole operation with the write lock, or
recheck the value again with the write lock.  We have chosen
to repeat the check inside the write-lock, as this optimizes
for what we believe to be the most common case, in which most
fields will already exist.
2015-12-15 16:39:38 -05:00
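The "recheck inside the write lock" choice described in cf67fe2cbc is the classic double-checked pattern with sync.RWMutex. The sketch below uses a hypothetical fieldCache type, not bleve's actual field_cache code.

```go
package main

import (
	"fmt"
	"sync"
)

// fieldCache is a hypothetical stand-in mapping field names to IDs.
type fieldCache struct {
	mu     sync.RWMutex
	ids    map[string]uint16
	nextID uint16
}

func newFieldCache() *fieldCache {
	return &fieldCache{ids: make(map[string]uint16)}
}

// fieldID returns the ID for name, assigning a new one if needed.
func (c *fieldCache) fieldID(name string) uint16 {
	// Fast path: most fields already exist, so a shared lock suffices.
	c.mu.RLock()
	id, ok := c.ids[name]
	c.mu.RUnlock()
	if ok {
		return id
	}

	// Slow path: another goroutine may have added the field between the
	// RUnlock above and this Lock, so check again before adding.
	c.mu.Lock()
	defer c.mu.Unlock()
	if id, ok := c.ids[name]; ok {
		return id
	}
	id = c.nextID
	c.ids[name] = id
	c.nextID++
	return id
}

func main() {
	c := newFieldCache()
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.fieldID("title")
		}()
	}
	wg.Wait()
	fmt.Println(c.fieldID("title"), c.fieldID("body")) // 0 1
}
```

Without the second check, several goroutines racing on the same missing field would each assign it a different ID.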
Marty Schoch
a73a178923 fix incorrect prefix search behavior
avoids double incrementing of end term when reading term dict
fixes #293
2015-12-04 14:07:16 -05:00
Marty Schoch
699c86073a make existing integration tests work with firestorm 2015-12-01 12:29:56 -05:00
Marty Schoch
6d851cfcc2 fix bug in warmup which led to docs being deleted 2015-11-30 10:18:14 -05:00
Marty Schoch
aa8d98f5fa include space after prefix in log output 2015-11-30 10:17:48 -05:00
Marty Schoch
68d8742826 correctly prefix internal rows with 'i' and print them in debug 2015-11-30 10:17:15 -05:00
Marty Schoch
c93de9734e fix issues identified by errcheck 2015-11-24 14:32:33 -05:00
Marty Schoch
bbef1980d8 Merge branch 'master' into firestorm 2015-11-24 13:04:36 -05:00
Marty Schoch
ff11f83842 properly handle errors inside metrics kvstore reporting 2015-11-24 12:52:03 -05:00
Marty Schoch
a707d44e0b Merge branch 'master' into firestorm 2015-11-24 09:44:47 -05:00
Patrick Mezard
e85c9c542e row: expose TermFrequencyRow term and freq fields
Rows content is an implementation detail of bleve index and may change
in the future. That said, they also contain information valuable to
assess the quality of the index or understand its performance. So, as
long as we agree that type asserting rows should only be done if you
know what you are doing and are ready to deal with future changes, I see
no reason to hide the row fields from external packages.

Fix #268
2015-11-17 17:21:26 +01:00
Kosov Eugene
45e670b99b BoltDB wrapper nano optimization which makes code a bit prettier too 2015-11-05 00:27:28 +03:00
Marty Schoch
4791625b9b Merge pull request #262 from pmezard/index-and-tokenizer-doc-and-fix
Index and tokenizer doc and fix
2015-11-02 11:51:21 -05:00
Marty Schoch
30651065e9 fix panic on insufficiently sized buffer
adds test case to reproduce original problem
fixes #264
2015-10-30 18:25:38 -04:00
Marty Schoch
2bd3ef4080 copy relevant k/v pairs before advancing underlying iterator 2015-10-28 12:23:54 -04:00
Marty Schoch
d1b07f4909 fix dump methods to properly copy keys and values 2015-10-28 12:06:44 -04:00
Marty Schoch
01526e971f Merge branch 'master' into firestorm 2015-10-28 11:26:01 -04:00
Patrick Mezard
f2b3d5698e index: document TermFieldReader interface 2015-10-27 18:53:03 +01:00
Patrick Mezard
3df789d258 index: document empty strings behaviour when calling DocIDReader() 2015-10-27 18:53:03 +01:00
Marty Schoch
1a978a4591 fix go vet issues and cleanup reader/iterator 2015-10-26 16:41:58 -04:00
Marty Schoch
f0d282f5f8 add test case for seeing prefix iterators outside of range
similar to #256 except for prefix iterators
includes fix for boltdb and gtreap which had incorrect behavior
2015-10-26 16:14:29 -04:00
Patrick Mezard
5100e00f20 doc: DocIDReader.Advance() is no longer implementation dependent 2015-10-20 20:32:23 +02:00
Patrick Mezard
2fa334fc27 doc: talk about "documents" not "indexed or stored documents" 2015-10-20 20:24:24 +02:00
Patrick Mezard
b174c137fd doc: document DocIDReader, and some Index bits 2015-10-20 20:24:24 +02:00
Patrick Mezard
da72d0c2b9 store_test: deduplicate store initialization 2015-10-20 19:21:01 +02:00
Patrick Mezard
873f483804 gtreap: RangeIterator.Seek should not move before start 2015-10-20 19:12:30 +02:00
Patrick Mezard
5d7628ba3b boltdb: fix RangeIterator outside of range seeks
Two issues:
- Seeking before i.start and iterating returned keys before i.start
- Seeking after the store last key did not invalidate the iterator and
  could cause infinite loops.
2015-10-20 19:09:51 +02:00
Patrick Mezard
aada2e7333 store_test: test RangeIterator.Seek on goleveldb 2015-10-20 19:09:38 +02:00
Marty Schoch
6cc21346dc fix errcheck issues 2015-10-19 14:27:03 -04:00
Marty Schoch
817c317c90 Merge branch 'master' into newkvstore 2015-10-19 12:04:07 -04:00
Marty Schoch
faceecf87b make row buffer size constant/configurable
also handle case where it is insufficiently sized
2015-10-19 12:03:38 -04:00
Marty Schoch
f0ee9a3c66 removed commented code and unused functions 2015-10-19 11:13:03 -04:00
Marty Schoch
c9471d5739 Merge pull request #244 from kevgs/master
reducing allocation count
2015-10-16 15:51:30 -04:00
Marty Schoch
e6d0fc8d95 Merge pull request #247 from pmezard/remove-update-goroutine
upside_down: no need for a goroutine to enqueue AnalysisWork
2015-10-16 10:15:55 -04:00
Marty Schoch
4c6bc23043 rewrite to keep using same buffer when possible 2015-10-13 14:04:56 -07:00
Marty Schoch
8de860bf12 2 more places that used old Key() 2015-10-13 12:35:08 -07:00
Marty Schoch
5f594d1acc Merge branch 'master' into newkvstore 2015-10-12 18:07:04 -07:00
Marty Schoch
08572e4925 move literals outside loop for more predictable test results 2015-10-12 18:06:38 -07:00
Patrick Mezard
8c928539ee upside_down: no need for a goroutine to enqueue AnalysisWork
It boils down to:
1. client sends some work and a notification channel to a single worker,
   then waits.
2. worker processes the work
3. worker sends the result to the client using the notification channel

I do not see any problem with this, even with unbuffered channels.
2015-10-12 10:42:14 +02:00
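The three-step flow listed in 8c928539ee can be sketched with unbuffered channels. analysisWork, worker and analyze are hypothetical names, and the "analysis" here is just a string length; the point is that the client can send to the worker directly, with no intermediate goroutine to enqueue the work.

```go
package main

import "fmt"

// analysisWork pairs a unit of work with its notification channel.
type analysisWork struct {
	doc string
	rc  chan int // the worker sends the result here
}

// worker processes queued work and replies on each item's channel.
func worker(queue <-chan *analysisWork) {
	for w := range queue {
		w.rc <- len(w.doc) // steps 2 and 3: process, then notify
	}
}

// analyze sends the work straight to the worker and waits for the result.
func analyze(queue chan<- *analysisWork, doc string) int {
	rc := make(chan int)
	queue <- &analysisWork{doc: doc, rc: rc} // step 1: send, then wait
	return <-rc
}

func main() {
	queue := make(chan *analysisWork) // unbuffered
	go worker(queue)
	fmt.Println(analyze(queue, "hello"))
	close(queue)
}
```

The send on queue completes only when the worker receives it, and the client is already waiting on rc when the worker replies, so neither side can block forever.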
Marty Schoch
95e06538f3 fix benchmarks for the x kvstores 2015-10-09 11:09:42 -04:00
Marty Schoch
0f05d1d3ca Merge branch 'master' into newkvstore 2015-10-09 10:33:41 -04:00
Patrick Mezard
aee82f8b49 upside_down: simplify return code in batchRows() 2015-10-09 09:57:12 +02:00
Marty Schoch
e28eb749d7 bump up buffer size 2015-10-06 16:45:38 -04:00
Marty Schoch
71cbb13e07 modify code to reuse buffer for kv generation 2015-10-05 17:49:50 -04:00
Kosov Eugene
a61c350888 reducing allocation count 2015-10-05 22:57:10 +03:00
Patrick Mezard
9d5407be13 boltdb: add "nosync" option to force boltdb.DB.NoSync=true
Use this option when rebuilding indexes from scratch. In my small case
(~17000 json documents), it reduces indexing from 520s to 250s.

I did not add any test, short of forced indexing termination it only
has performance effects, which are hard to test. And unknown options are
currently ignored.

Issue #240
2015-10-03 14:26:48 +02:00
Marty Schoch
d06b526cbf more refactoring 2015-09-28 16:50:27 -04:00
Marty Schoch
66aa1b020a Merge branch 'master' into firestorm 2015-09-23 11:32:25 -07:00
Marty Schoch
900f1b4a67 major kvstore interface and impl overhaul
clarified the interface contract
2015-09-23 11:25:47 -07:00
Marty Schoch
f81b2be334 major refactor of bleve configuration
see #221 for full details
2015-09-16 17:10:59 -04:00
Marty Schoch
c308f611cf skip unnecessary map before slice
benchmark            old ns/op     new ns/op     delta
BenchmarkBatch-4     16950972      16377194      -3.38%

benchmark            old allocs     new allocs     delta
BenchmarkBatch-4     136164         136161         -0.00%

benchmark            old bytes     new bytes     delta
BenchmarkBatch-4     7168872       7109691       -0.83%
2015-09-10 08:21:26 -04:00
Marty Schoch
f6f1628b15 avoid doing unnecessary work:
benchmark            old ns/op     new ns/op     delta
BenchmarkBatch-4     20738739      17047158      -17.80%

benchmark            old allocs     new allocs     delta
BenchmarkBatch-4     136423         136160         -0.19%

benchmark            old bytes     new bytes     delta
BenchmarkBatch-4     20277781      7168772       -64.65%
2015-09-10 08:19:05 -04:00
Marty Schoch
c8538c835f Merge branch 'master' into firestorm 2015-09-10 08:14:14 -04:00
Marty Schoch
17c64d37c7 add similar benchmarks from firestorm 2015-09-10 08:13:52 -04:00
Marty Schoch
1e4d637761 adding more benchmarks 2015-09-10 08:01:11 -04:00
Marty Schoch
f74ed6a9ae Merge remote-tracking branch 'origin' into firestorm
catching up with changes from master
2015-09-02 13:29:03 -04:00
Marty Schoch
dbb93b75a4 refactoring to allow pluggable index encodings
this lays the foundation for supporting the new firestorm
indexing scheme.  i'm merging these changes ahead of
the rest of the firestorm branch so i can continue
to make changes to the analysis pipeline in parallel
2015-09-02 13:12:08 -04:00
Marty Schoch
7ad7659ce5 add support for using null kvstore outside of bleve internals 2015-09-02 11:50:06 -04:00
Marty Schoch
07d37ca38a add important rocksdb config options 2015-09-02 11:49:42 -04:00
Marty Schoch
18151862b5 fix go vet issues 2015-08-25 15:13:13 -04:00
Marty Schoch
84811cf5a0 made index type configurable + first version of firestorm 2015-08-25 14:52:42 -04:00
Marty Schoch
3e60ca24ec support using end key on forestdb iterator for term freq lookup
also additional forestdb configs
2015-08-18 16:22:02 -04:00
Marty Schoch
ae19d77b04 updated protobuf defs to be valid 2015-08-17 15:37:13 -04:00
Marty Schoch
1187436e46 changed Stored row Values to also use protobuf 2015-08-17 09:48:40 -04:00
Marty Schoch
8d8a05a842 fix more issues 2015-08-14 16:27:00 -04:00
Marty Schoch
e0802a2b39 fixed the worst of the formatting 2015-08-14 16:17:48 -04:00
Marty Schoch
f4df56eb7c add first draft of firestorm proposal 2015-08-14 16:09:19 -04:00
Marty Schoch
d3dda3d0ea fixup config parsing and add new options 2015-08-12 13:18:23 -04:00
Marty Schoch
01667dfff3 faster protobufs with gogo 2015-08-12 13:18:23 -04:00
Marty Schoch
7df66b4857 fix broken benchmark caused by index row encoding change 2015-08-06 14:48:04 -04:00
Marty Schoch
9db850a53e Merge branch 'fix/MaxVarintLen64' of https://github.com/tukdesk/bleve into tukdesk-fix/MaxVarintLen64 2015-07-31 15:16:16 -04:00
Marty Schoch
3682c25467 update to correctly work with composite fields
also updated search results to return array positions
2015-07-31 11:16:11 -04:00
Marty Schoch
c1c4941dde Merge branch 'feature/term_vector' of https://github.com/tukdesk/bleve into tukdesk-feature/term_vector 2015-07-29 14:31:15 -04:00
Marty Schoch
bf8dcae76b removing build tags 2015-07-28 18:59:10 -04:00
Marty Schoch
1b28f6218b additional row validation 2015-07-13 15:22:54 -04:00
Marty Schoch
17ef48f82a switching back to the canonical goleveldb repo 2015-07-08 12:21:17 -06:00
Marty Schoch
bf80f4628e fix bug in current goleveldb (must copy during iteration)
also changed over to mschoch fork of goleveldb (temporary)

the change to my fork is pending some read-only issues described
here:  https://github.com/syndtr/goleveldb/issues/111

hopefully we can find a path forward, and get that addressed upstream
2015-07-06 18:00:05 -04:00
Marty Schoch
7be7ecdf8e fix batch indexing bug, incremented docCount before commit
fixes #211
2015-06-08 14:14:05 -04:00
Marty Schoch
2768c2da3c fix previous sloppy fix which hadn't been adequately tested 2015-05-27 19:15:55 -07:00
Marty Schoch
201fb91171 fix up to correctly trim off separator
even though it should never be present
2015-05-27 19:10:12 -07:00
Marty Schoch
a58592ceff fix case where NewBackIndexRowKV returns nil, nil
the logic for reading the docID from the keys
in this row relies on the keys NEVER containing
the byte separator character (0xff), this is OK
as we require that all keys be valid utf-8
however, it turns out that in the case where this
rule was violated, we would panic, because we
return nil, nil and later try to print the doc id
2015-05-27 19:04:57 -07:00
dtynn
59c97ae577 use binary.MaxVarintLen64 2015-05-26 15:35:31 +08:00
Marty Schoch
e0887f9113 fix tests which deadlock boltdb due to deferred cleanup
fixes #209
2015-05-21 12:29:31 -04:00
Marty Schoch
a52d3b5c07 put in hack to allow boltdb reader isolation test to pass
in boltdb, long readers *MAY* block a writer.  in particular if
the write requires additional allocation, it must acquire a lock
already held by the reader.  in general this is not a problem
for bleve (though it can affect performance in some cases), but
it is a problem for the reader isolation test.  this commit
adds a hack to try and avoid the need for additional allocation
closes #208
2015-05-21 11:39:59 -04:00
dtynn
b4f7496031 update the index format version number 2015-05-18 15:16:35 +08:00
dtynn
89dc2c22bc update TermVector 2015-05-17 13:07:14 +08:00
Marty Schoch
8f70def63b properly use the stored array positions when loading a document
fixes #205
2015-05-15 15:47:54 -04:00
Marty Schoch
328bc73ed0 clarify Batch is not threadsafe in docs
in some limited cases we can detect unsafe usage
in these cases, do not trip over ourselves and panic
instead return a strongly typed error upside_down.UnsafeBatchUseDetected
also, introduced Batch.Reset() to allow batch reuse
this is currently still experimental
closes #195
2015-05-15 15:04:52 -04:00
Marty Schoch
57cd67fa88 fix data race on index metadata (docCount)
closes #198
2015-05-08 08:07:20 -04:00
Marty Schoch
57358088ec fix row merging bug
trying to be clever, we reused the memory allocated for the left
operand when doing partial merges
this had been tested to be safe, in general.  however, the
implementation was then written such that we always reused
globally defined operands, this meant that we mutated
the operands which were intended to always represent
+1/-1
this then cascades quickly to making increment/decrement
values much larger/smaller than they should be
related to #197
2015-05-06 11:00:04 -04:00
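The aliasing problem described in the row merging fix above can be illustrated with a small sketch. All names here (encodeDelta, decodeDelta, partialMergeInPlace, partialMergeCopy, plusOne) and the 8-byte little-endian encoding are assumptions made for illustration; this is not bleve's actual merge code.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// plusOne stands in for a globally defined operand that should always mean +1.
// (hypothetical name and encoding, for illustration only)
var plusOne = encodeDelta(1)

// encodeDelta stores a signed delta in a fresh 8-byte slice.
func encodeDelta(d int64) []byte {
	buf := make([]byte, 8)
	binary.LittleEndian.PutUint64(buf, uint64(d))
	return buf
}

// decodeDelta reads a delta written by encodeDelta.
func decodeDelta(b []byte) int64 {
	return int64(binary.LittleEndian.Uint64(b))
}

// partialMergeInPlace reuses the left operand's memory for the result.
// If left is a shared operand, that operand is silently mutated.
func partialMergeInPlace(left, right []byte) []byte {
	sum := decodeDelta(left) + decodeDelta(right)
	binary.LittleEndian.PutUint64(left, uint64(sum))
	return left
}

// partialMergeCopy allocates a new slice, leaving both operands untouched.
func partialMergeCopy(left, right []byte) []byte {
	return encodeDelta(decodeDelta(left) + decodeDelta(right))
}

func main() {
	merged := partialMergeCopy(plusOne, plusOne)
	fmt.Println("copy merge:", decodeDelta(merged), "operand:", decodeDelta(plusOne))

	// Merging in place with the shared operand changes what "+1" means
	// for every later merge.
	partialMergeInPlace(plusOne, plusOne)
	fmt.Println("after in-place merge, operand:", decodeDelta(plusOne))
}
```

Once the shared operand has been overwritten, every subsequent increment applies the corrupted value, which matches the cascading growth described in the commit message.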
Marty Schoch
30a0ba1f9b fix bug, dictionary row encoding buffer too small
we incorrectly created a []byte of length 8
but the max for a uvarint is 10
closes #197
2015-05-06 10:04:02 -04:00
Steve Yen
e98ae8ab71 update metrics store to latest kvstore api 2015-04-27 11:01:53 -07:00
Marty Schoch
16f538d7b7 close documents returned by iterator before losing their reference
fixes #194
2015-04-24 17:48:21 -04:00
Marty Schoch
b54a59139c change forestdb imports to couchbase not couchbaselabs 2015-04-24 17:35:01 -04:00
Marty Schoch
ee47d1c21a standardize on including 1000 sized batches 2015-04-24 17:31:34 -04:00
Marty Schoch
452fea6a24 adding initial impl of rocksdb kv store 2015-04-24 17:19:44 -04:00
Marty Schoch
a9c07acbfa refactor of kvstore api to support native merge in rocksdb
refactor to share code in emulated batch
refactor to share code in emulated merge
refactor index kvstore benchmarks to share more code
refactor index kvstore benchmarks to be more repeatable
2015-04-24 17:13:50 -04:00
indraniel
a62320a50e + fix goleveldb's BytesSafeAfterClose() on reader
- it should be set to false
2015-04-10 15:45:22 -05:00
Marty Schoch
d5dc66313f change variable name conflicting when both LevelDB benchmarks run 2015-04-10 15:03:44 -04:00
Marty Schoch
d5caad4405 changed GoLevelDB benchmark names to be different from LevelDB
this will allow for easier comparison when running both
versions at the same time
2015-04-10 15:00:56 -04:00
Marty Schoch
5f66bd84c7 fix issues identified by errcheck 2015-04-10 14:59:05 -04:00
indraniel
54ab493b3e + correctly copy bytes from the goleveldb store
- this is part of a recent bleve KVStore API change.

    See the following two google group threads for more details:

    * [help adding goleveldb as an alternative Key/Value store for bleve][1]
    * [bleve search performance improvement][2]

    [1]: https://groups.google.com/forum/#!topic/bleve/aHZ8gmihLiY
    [2]: https://groups.google.com/forum/#!topic/bleve/aTyqsSnbhik
2015-04-10 11:25:23 -05:00
indraniel
81bef38cce Revert "+ make copies of the []bytes returned by goleveldb"
This reverts commit cb8c1741289a0f00b30733e0d52d9d81d1199603.

This commit is no longer desired. The KV store API has been changed to
better address this issue.

For more details, see the google group conversation thread at:

https://groups.google.com/forum/#!topic/bleve/aHZ8gmihLiY
2015-04-10 11:12:44 -05:00
indraniel
3a70401835 + make copies of the []bytes returned by goleveldb
- The byte strings returned by goleveldb aren't necessarily safe.  See
    the following google group thread:

    https://groups.google.com/forum/#!topic/bleve/aHZ8gmihLiY

    This code change is based on the gist created here:

    https://groups.google.com/forum/#!topic/bleve/aHZ8gmihLiY
2015-04-10 11:08:02 -05:00
indraniel
a88d714778 + add a goleveldb index upside-down benchmark test 2015-04-10 11:08:02 -05:00
indraniel
a0a2a61050 + keep 'get' consistent with levigo implementation
- this change keeps the method behavior consistent with the
     levigo/leveldb implementation.

   - don't issue an err if a key isn't found
2015-04-10 11:08:02 -05:00
indraniel
5e55fa2866 + keep 'getWithSnapshot' consistent with levigo implementation
- this change keeps the method behavior consistent with the
     levigo/leveldb implementation.

   - the leveldb store_test.go and goleveldb store_test.go are now
     identical.
2015-04-10 11:08:02 -05:00
indraniel
caa19e6c36 + initial stub of goleveldb package
- This is a first-pass introduction. Things may not be working
    correctly yet.
2015-04-10 11:08:02 -05:00
Marty Schoch
8581e73cef added String method for Batch
also changed Batch methods to pointer receiver
closes #180
2015-04-08 10:41:42 -04:00
Marty Schoch
539aeb8dc7 fix errors identified by errcheck
part of #169
2015-04-07 18:05:41 -04:00
Marty Schoch
ba6b3c8bb3 fix more issues identified by errcheck
part of #169
2015-04-07 16:45:23 -04:00
Marty Schoch
ab24772bf0 fix issues identified by errcheck
part of #169
2015-04-07 16:34:29 -04:00
Marty Schoch
56c4a09de1 fix issues identified by errcheck
part of #169
2015-04-07 15:39:56 -04:00
Marty Schoch
93e01a803e fix issues identified by errcheck
part of #169
2015-04-07 14:52:00 -04:00
Marty Schoch
f1ec73e764 fix issues identified by errcheck
part of #169
2015-04-07 13:26:54 -04:00
Marty Schoch
56a30a3574 fix issues identified by errcheck
part of #169
2015-04-07 13:05:47 -04:00
Marty Schoch
d2e9409413 fix issues identified by errcheck
part of #169
2015-04-07 12:04:59 -04:00
Marty Schoch
dd921d31e3 undoing f92ab131e4
we now guarantee bytes were copied earlier in the chain
the kv store is NOT responsible for making an additional copy
closes #181
2015-04-07 11:12:28 -04:00
Marty Schoch
443c0252e0 fix another metrics BytesSafeAfterClose() loop
closes #184
2015-04-03 21:17:23 -04:00
Steve Yen
efc39a6857 fix metrics BytesSafeAfterClose() loop
fixes issue 184
2015-04-03 16:36:32 -07:00
Marty Schoch
867110e03b major improvements to index row encoding
improvements uncovered some issues with how k/v data was copied
or not.  to address this, kv abstraction layer now lets impl
specify if the bytes returned are safe to use after a reader
(or writer since writers are also readers) is closed
See index/store/KVReader - BytesSafeAfterClose() bool
false is the safe value if you're not sure
it will cause index impls to copy the data
Some kv impls already have created a copy at the C-api barrier
in which case they can safely return true.

Overall this yields ~25% speedup for searches with leveldb.
It yields ~10% speedup for boltdb.
Returning stored fields is now slower with boltdb, as previously
we were returning unsafe bytes.
2015-04-03 16:50:48 -04:00
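The copy-or-not decision described above can be sketched as follows. Only the method name BytesSafeAfterClose() comes from the commit message; the closeSafety interface, fakeReader type, and bytesToKeep helper are hypothetical names used for illustration and are not the real index/store API.

```go
package main

import "fmt"

// closeSafety is a hypothetical one-method view of a KV reader.
type closeSafety interface {
	BytesSafeAfterClose() bool
}

// fakeReader is a stand-in reader whose answer is configurable.
type fakeReader struct{ safe bool }

func (r fakeReader) BytesSafeAfterClose() bool { return r.safe }

// bytesToKeep returns a slice the caller may hold after the reader is
// closed. It copies only when the store does not promise the bytes stay
// valid; false is the conservative answer and always triggers the copy.
func bytesToKeep(r closeSafety, b []byte) []byte {
	if r.BytesSafeAfterClose() {
		return b
	}
	out := make([]byte, len(b))
	copy(out, b)
	return out
}

func main() {
	storeOwned := []byte("term")
	kept := bytesToKeep(fakeReader{safe: false}, storeOwned)
	storeOwned[0] = 'X' // simulate the store reusing its memory
	fmt.Println(string(kept))
}
```

Skipping the copy when the store already hands back safe bytes is where the reported speedup would come from; stores that return memory they still own must answer false.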
Steve Yen
dbf50b7f29 KVStore gtreap allows only 1 writer at a time 2015-03-26 16:40:18 -07:00
Steve Yen
f92ab131e4 KVStore gtreap implementation copies value bytes 2015-03-26 14:46:37 -07:00
Steve Yen
78453dab7d metrics KVStore now tracks last 100 errors 2015-03-19 18:41:16 -07:00
Marty Schoch
a44a7c01af rewrite to use fixed size []byte instead of buffer
removes unchecked errors in calls to buffer.Write
and also benchmarks considerably faster
2015-03-11 15:12:13 -04:00
Marty Schoch
522f9d5cc7 significant change to index format, support dictionary rows
this introduces disk format v4
now the summary rows for a term are stored in their own
"dictionary row" format, previously the same information
was stored in special term frequency rows
this now allows us to easily iterate all the terms for a field
in sorted order (useful for many other fuzzy data structures)

at the top-level of bleve you can now browse terms within a field
using the following api on the Index interface:

  FieldDict(field string) (index.FieldDict, error)
  FieldDictRange(field string, startTerm []byte, endTerm []byte) (index.FieldDict, error)
  FieldDictPrefix(field string, termPrefix []byte) (index.FieldDict, error)

fixes #127
2015-03-10 16:22:19 -04:00
Marty Schoch
4e14f4e4ef change path for forestdb test to correctly cleanup
this is due to forestdb auto-compaction using the provided
path as just the prefix, so if we're not careful we end
up with many stray files lying around
here, we create a sub-directory first, and just nuke the
whole subdir when we're done
2015-03-10 14:05:58 -04:00
Marty Schoch
300ec79c96 first pass at checking errors that were ignored
part of #169
2015-03-06 14:46:29 -05:00
Marty Schoch
a2ad7634f2 update term freq rows to use varint where possible
benchmark                                  old ns/op  new ns/op  delta
BenchmarkLevelDBIndexing1Workers           1138292    657901     -42.20%
BenchmarkLevelDBIndexing2Workers           1619323    647628     -60.01%
BenchmarkLevelDBIndexing4Workers           1172845    636478     -45.73%
BenchmarkLevelDBIndexing1Workers10Batch    465556545  448153394  -3.74%
BenchmarkLevelDBIndexing2Workers10Batch    504203911  449657355  -10.82%
BenchmarkLevelDBIndexing4Workers10Batch    510766435  439839335  -13.89%
BenchmarkLevelDBIndexing1Workers100Batch   307657846  268976464  -12.57%
BenchmarkLevelDBIndexing2Workers100Batch   302257400  269110215  -10.97%
BenchmarkLevelDBIndexing4Workers100Batch   305320485  259084902  -15.14%
BenchmarkLevelDBIndexing1Workers1000Batch  301320576  258070231  -14.35%
BenchmarkLevelDBIndexing2Workers1000Batch  334174454  261175641  -21.84%
BenchmarkLevelDBIndexing4Workers1000Batch  267732436  261461739  -2.34%

closes #165
2015-03-06 13:00:53 -05:00
Marty Schoch
c566d34264 bump index format version number, start checking version on open 2015-02-17 17:16:31 +05:30
Steve Yen
38ee9be353 added some batch size 1000 microbenchmarks 2015-01-30 15:58:39 -08:00
Steve Yen
7d6a6aeaa8 single append for inmem KVStore batch 2015-01-29 11:14:08 -08:00
Steve Yen
5a30d36b17 cznicb KVStore uses Put() for faster read-modify-write 2015-01-29 11:02:01 -08:00
Steve Yen
b054cddf76 gtreap KVStore does 1 append for batch Set/Delete 2015-01-29 10:49:39 -08:00
Steve Yen
05d222f490 cznicb KVStore batch uses <2 appends per Set/Delete 2015-01-29 10:22:13 -08:00
Steve Yen
c5c59e61f4 make leveldb faster with non-zero sized batch 2015-01-29 10:20:26 -08:00
Steve Yen
1c1774d4ad throw away data even faster in null KVStore 2015-01-29 10:17:21 -08:00