0
0
Commit Graph

419 Commits

Author SHA1 Message Date
Sreekanth Sivasankaran
67a5814fbe MB-22410:deleting/editing index definition with large dirty write queue can be very slow
Adding a configurable forced store close
2017-03-01 18:58:32 +05:30
Sreekanth Sivasankaran
324e4237cf adding configurable Abort Close 2017-03-01 16:23:56 +05:30
Sundar Sridharan
74c7de0dcf re-order childSnapshot declaration 2017-02-21 15:54:04 -08:00
Sundar Sridharan
04d428656e Add Snapshot interface methods for moss child collections feature 2017-02-20 15:03:45 -08:00
Steve Yen
0b70a1bcb8 use inlined prealloc'ed termFreqRow in upsidedown termFieldReader 2017-02-08 18:23:13 -08:00
Steve Yen
31fecc3663 avoid row alloc's in upsidedown termFieldReader constructor 2017-02-08 18:14:30 -08:00
Marty Schoch
606fd6344b INDEX FORMAT CHANGE: change back index row value
Previously term entries were encoded pairwise (field/term), so
you'd have data like:

F1/T1 F1/T2 F1/T3 F2/T4 F3/T5

As you can see, even though field 1 has 3 terms, we repeat the F1
part in the encoded data.  This is a bit wasteful.

In the new format we encode it as a list of terms for each field:

F1/T1,T2,T3 F2/T4 F3/T5

When fields have multiple terms, this saves space.  In unit
tests there is no additional waste even in the case that a field
has only a single value.

Here are the results of an indexing test case (beer-search):

$ benchcmp indexing-before.txt indexing-after.txt
benchmark               old ns/op       new ns/op       delta
BenchmarkIndexing-4     11275835988     10745514321     -4.70%

benchmark               old allocs     new allocs     delta
BenchmarkIndexing-4     25230685       22480494       -10.90%

benchmark               old bytes      new bytes      delta
BenchmarkIndexing-4     4802816224     4741641856     -1.27%

And here are the results of a MatchAll search building a facet
on the "abv" field:

$ benchcmp facet-before.txt facet-after.txt
benchmark             old ns/op     new ns/op     delta
BenchmarkFacets-4     439762100     228064575     -48.14%

benchmark             old allocs     new allocs     delta
BenchmarkFacets-4     9460208        3723286        -60.64%

benchmark             old bytes     new bytes     delta
BenchmarkFacets-4     260784261     151746483     -41.81%

Although we expect the index to be smaller in many cases, the
beer-search index is about the same in this case.  However,
this may be due to the underlying storage (boltdb) in this case.

Finally, the index version was bumped from 5 to 7, since smolder
also used version 6, which could lead to some confusion.
2017-01-24 15:39:38 -05:00
Steve Yen
5927224e15 optimize mergeOldAndNew for case of first time a doc is seen 2017-01-09 22:48:58 -08:00
Steve Yen
790f2e3e32 optimize by alloc'ing arrays of TermFrequencyRow/TermVector 2017-01-09 22:42:00 -08:00
Steve Yen
8f4726ab10 use struct{}{} idiom instead of additional mark var 2017-01-09 10:17:26 -08:00
Steve Yen
302cac72c4 optimize mergeOldAndNew when non-update case 2017-01-08 17:59:49 -08:00
Steve Yen
931d133024 go fmt and go vet 2017-01-07 22:14:22 -08:00
Steve Yen
40780254ae optimize upsidedown mergeOldAndNew existing key maps
The optimization is to provide a better initial size to the map
constructor and to use a 0-byte-sized struct{} as the map values.
2017-01-07 22:05:55 -08:00
Steve Yen
c2bafa2a51 optimize term vectors/locations via preallocated arrays
The change should hit the allocator less often when processing term
vectors/locations as it preallocates larger, contiguous arrays of
records upfront.
2017-01-07 12:34:06 -08:00
Steve Yen
8b140d84c4 minor optimization of upsidedown backIndexRowForDoc
This change might allow a smart enough golang compiler to perhaps
allocate a backIndexRow on the stack rather than the heap.
2017-01-07 11:49:42 -08:00
Steve Yen
c21d27e15a upsidedown TermFieldReader checks includeTermVectors flag param
The flag was part of the API, but wasn't previously checked.
2017-01-05 21:10:27 -08:00
Steve Yen
37490864ce bleve/index/store/moss - accessor for underlying mossStore
This change adds methods that provide access to the actual, underlying
mossStore instance in the bleve/index/store/moss KVStore adaptor.

This enables applications to utilize advanced, mossStore-specific
features (such as partial rollback of indexes).  See also
https://issues.couchbase.com/browse/MB-17805
2016-12-05 12:25:29 -08:00
Patrick Mezard
c81fd6fdb0 index: DocIDReader.Next() returns nil when done not io.EOF 2016-11-20 19:05:35 +01:00
Steve Yen
2a8237e8cc optimize FacetsBuilder with cached fields & avoid some allocs 2016-10-25 15:34:48 -07:00
Steve Yen
a941a0f318 simplify DocumentFieldTerms append() usage 2016-10-25 15:30:19 -07:00
Rob McColl
12c404aec0 Update kvstore.go 2016-10-19 10:31:00 -04:00
Rob McColl
2b26218591 Fix NumDeletes doc copy/paste err s/Merge/Delete/g 2016-10-17 12:42:21 -04:00
Marty Schoch
77b79a2684 Merge pull request #466 from steveyen/optimize-fieldDict-reader-with-prealloc
Optimize upside-down's field dict reader with preallocated objects
2016-10-13 14:09:54 +02:00
Steve Yen
62e6f1f648 reuse incrementBytes() in moss KV store integration
In this commit, I saw that there was a simple incrementBytes()
implementation elsewhere in bleve that seemed simpler than using the
big int package.

Edge case note: if the input bytes would overflow in incrementBytes(),
such as with an input of [0xff 0xff 0xff], it returns nil.  moss then
treats a nil endKeyExclusive iterator param as a logical
"higher-than-topmost" key, which produces the prefix iteration
behavior that we want for this edge situation.
2016-10-12 09:34:36 -07:00
Steve Yen
01fb59d293 optimize upside-down DictionaryRow for fewer parsing alloc's 2016-10-12 09:22:50 -07:00
Steve Yen
2d72b542c0 optimize upside-down FieldDict reader with prealloc'ed objects
As part of this commit, there's also a newly added
Dictionaryrow.parseDictionaryK() helper method.
2016-10-12 09:18:58 -07:00
Marty Schoch
2f48d7fb02 fix misspellings 2016-10-02 12:11:15 -04:00
Marty Schoch
2332455bd2 nicer formatting of license header 2016-10-02 10:13:14 -04:00
Marty Schoch
6bf9dd59ab BREAKING CHANGE - additional package renaming
i recently learned that package names should also prefer the
singular form, not the plural form
2016-10-01 17:20:59 -04:00
Steve Yen
004e157963 field cache also tracks fieldIndex -> fieldName reverse mapping 2016-10-01 13:06:03 -07:00
Steve Yen
c362ab302e fix tracking of termSearchersFinished stats 2016-09-30 16:11:30 -07:00
Marty Schoch
caf5256f74 don't export internal timers from metrics kvstore 2016-09-30 15:52:16 -04:00
Marty Schoch
f90856b8d3 BREAKING CHANGE - rename upside_down to upsidedown 2016-09-30 12:36:38 -04:00
Marty Schoch
35da361bfa BREAKING CHANGE - renamed packages to be shorter and not use _
this commit only addresses the analysis sub-package
2016-09-30 12:36:10 -04:00
Marty Schoch
73d0951b2a don't panic on missing backindex row
part of #419
2016-09-27 22:16:45 -04:00
Marty Schoch
fb0f4bbecd BREAKING CHANGE - new method to create memory only index
Previously bleve allowed you to create a memory-only index by
simply passing "" as the path argument to the New() method.

This was not clear when reading the code, and led to some
problematic error cases as well.

Now, to create a memory-only index one should use the
NewMemOnly() method.  Passing "" as the path argument
to the New() method will now return os.ErrInvalid.

Advanced users calling NewUsing() can create disk-based or
memory-only indexes, but the change here is that pass ""
as the path argument no longer defaults you into getting
a memory-only index.  Instead, the KV store is selected
manually, just as it is for the disk-based solutions.

Here is an example use of the NewUsing() method to create
a memory-only index:

NewUsing("", indexMapping, Config.DefaultIndexType,
         Config.DefaultMemKVStore, nil)

Config.DefaultMemKVStore is just a new default value
added to the configuration, it currently points to
gtreap.Name (which could have been used directly
instead for more control)

closes #427
2016-09-27 14:11:40 -04:00
Marty Schoch
1f79f65b6a Merge pull request #450 from mschoch/bug449
fix logic in Advance() of UpsideDownCouchDocIDReader
2016-09-26 12:44:09 -04:00
Marty Schoch
981812ff70 fix logic in Advance() of UpsideDownCouchDocIDReader
also added unit tests for newUpsideDownCouchDocIDReaderOnly
use cases
fixes #449
2016-09-26 12:36:24 -04:00
Steve Yen
10cab1826d added upside_down TermFrequencyRow.KeyAppendTo() API
This is a cleanup commit that's followup to a code review discussion
on a previous Advance() perf-optimization PR...

https://github.com/blevesearch/bleve/pull/443
2016-09-23 09:22:42 -07:00
Steve Yen
988dfb02e9 moss kvstore iterator Seek() invokes underlying moss SeekTo() API 2016-09-22 17:46:06 -07:00
Steve Yen
5f5b5d3b80 optimize upside_down TermFieldReader.Advance() to reuse memory
On a dev laptop, bleve-query benchmark on wiki dataset using
query-string of "+text:afternoon +text:coffee" previously had
throughput of 1222qps, and with this change hits 1940qps.
2016-09-22 17:46:06 -07:00
Steve Yen
bcec199c89 issue 441 - upside_down termFieldReader doesn't call Next() early
This change to upside_down term-field-reader no longer moves the
underlying iterator forward preemptively.  Instead, it will invoke
Next() on the underlying iterator only when the caller invokes the
term-field-reader's Next().

There's a special case to handle the situation on the first Next()
invocation after the term-field-reader is created.
2016-09-22 09:18:29 -07:00
slavikm
3eec1ae16c Satisfy errcheck 2016-09-21 17:56:03 +03:00
slavikm
40c1dc076f Now, without the rollback 2016-09-21 16:15:06 +03:00
slavikm
588f379962 Commit if there is no error, rollback otherwise 2016-09-21 16:13:47 +03:00
slavikm
ac49306077 Make sure that the transaction is closed if there is an error 2016-09-21 14:32:05 +03:00
Marty Schoch
e68f6ca9e6 Merge pull request #432 from steveyen/perf-skip-0xff-scan
skip termFrequencyRow 0xFF scan as term length is already known
2016-09-18 12:20:21 -04:00
Steve Yen
b5d2c32b46 skip termFrequencyRow 0xFF scan as term length is already known
This commit modifies the upside_down TermFrequencyRow parseKDoc() to
skip the ByteSeparator (0xFF) scan, as we already know the term's
length in the UpsideDownCouchTermFieldReader.

On my dev box, results from bleve-query test on high frequency terms
went from previous 107qps to 124qps.
2016-09-18 08:56:05 -07:00
Marty Schoch
3fd2a64872 BREAKING CHANGE - removed DumpXXX() methods from bleve.Index
The DumpXXX() methods were always documented as internal and
unsupported.  However, now they are being removed from the
public top-level API.  They are still available on the internal
IndexReader, which can be accessed using the Advanced() method.

The DocCount() and DumpXXX() methods on the internal index
have moved to the internal index reader, since they logically
operate on a snapshot of an index.
2016-09-13 12:40:01 -04:00
Marty Schoch
e1fb860a86 removed unused AsyncIndex interface 2016-09-13 08:42:36 -04:00