0
0
Commit Graph

658 Commits

Author SHA1 Message Date
Steve Yen
0539744e90 scorch mergeplan.ToBarChart() refactored to callable API
Refactored out API so it's usable from other places.
2017-12-16 08:39:10 -08:00
Steve Yen
dc4df18001
Merge pull request #662 from steveyen/scorch
scorch mergeplan package comments tweak
2017-12-15 18:41:20 -08:00
Marty Schoch
a575be4d56 fix issue where we incorrectly seed the nextSegmentID on Open() 2017-12-15 19:26:23 -05:00
Steve Yen
45c212a0c2 scorch mergeplan package comments tweak
Moving the package comment for mergeplan to the right place.
2017-12-15 13:25:39 -08:00
Steve Yen
620dcdb6f8 scorch uses prealloc'ed buffer for docNumberToBytes()
On a couple of micro benchmarks on a dev macbook using bleve-query on
an index of 50K wikipedia docs, scorch is now faster than
upsidedown/moss on high-freq term search "text:date"...

       400 qps - upsidedown/moss
       404 qps - scorch before
       565 qps - scorch after
2017-12-15 11:58:21 -08:00
Steve Yen
f05794c6aa scorch removed worker goroutines from TermFieldReader()
On a couple of micro benchmarks on a dev macbook using bleve-query on
an index of 50K wikipedia docs, scorch is now in more the same
neighborhood of upsidedown/moss...

high-freq term search "text:date"...
   400 qps - upsidedown/moss
   360 qps - scorch before
   404 qps - scorch after

zero-freq term search "text:mschoch"...
  100K qps - upsidedown/moss
   55K qps - scorch before
   99K qps - scorch after

Of note, the scorch index had ~150 *.zap files in it, which likely
made made the worker goroutine overhead more costly than for a case
with few segments, where goroutine and channel related work appeared
relatively prominently in the pprof SVG's.
2017-12-15 11:11:18 -08:00
Marty Schoch
562b473e36
Merge pull request #657 from steveyen/scorch
scorch fix data race w/ AddEligibleForRemoval
2017-12-14 17:56:06 -05:00
Marty Schoch
b5aa4ed22b return err not panic 2017-12-14 17:41:02 -05:00
Steve Yen
506aa1c325 scorch fix data race w/ AddEligibleForRemoval
Found from "go test -race ./..."

WARNING: DATA RACE
Read at 0x00c420088060 by goroutine 48:
  github.com/blevesearch/bleve/index/scorch.(*Scorch).AddEligibleForRemoval()
      /Users/steveyen/go/src/github.com/blevesearch/bleve/index/scorch/scorch.go:348 +0x6d

Previous write at 0x00c420088060 by goroutine 31:
  github.com/blevesearch/bleve/index/scorch.(*Scorch).loadFromBolt.func1()
      /Users/steveyen/go/src/github.com/blevesearch/bleve/index/scorch/persister.go:332 +0x87b
  github.com/boltdb/bolt.(*DB).View()
      /Users/steveyen/go/src/github.com/boltdb/bolt/db.go:629 +0xc1
  github.com/blevesearch/bleve/index/scorch.(*Scorch).loadFromBolt()
      /Users/steveyen/go/src/github.com/blevesearch/bleve/index/scorch/persister.go:290 +0xa1
  github.com/blevesearch/bleve/index/scorch.(*Scorch).Open()
      /Users/steveyen/go/src/github.com/blevesearch/bleve/index/scorch/scorch.go:121 +0x77f
  github.com/blevesearch/bleve/index/scorch.TestIndexOpenReopen()
      /Users/steveyen/go/src/github.com/blevesearch/bleve/index/scorch/scorch_test.go:115 +0x1351
  testing.tRunner()
      /usr/local/Cellar/go/1.9/libexec/src/testing/testing.go:746 +0x16c

Goroutine 48 (running) created at:
  github.com/blevesearch/bleve/index/scorch.(*IndexSnapshot).DecRef()
      /Users/steveyen/go/src/github.com/blevesearch/bleve/index/scorch/snapshot_index.go:72 +0x23e
  github.com/blevesearch/bleve/index/scorch.(*Scorch).loadFromBolt.func1()
      /Users/steveyen/go/src/github.com/blevesearch/bleve/index/scorch/persister.go:330 +0x8f4
  github.com/boltdb/bolt.(*DB).View()
      /Users/steveyen/go/src/github.com/boltdb/bolt/db.go:629 +0xc1
  github.com/blevesearch/bleve/index/scorch.(*Scorch).loadFromBolt()
      /Users/steveyen/go/src/github.com/blevesearch/bleve/index/scorch/persister.go:290 +0xa1
  github.com/blevesearch/bleve/index/scorch.(*Scorch).Open()
      /Users/steveyen/go/src/github.com/blevesearch/bleve/index/scorch/scorch.go:121 +0x77f
  github.com/blevesearch/bleve/index/scorch.TestIndexOpenReopen()
      /Users/steveyen/go/src/github.com/blevesearch/bleve/index/scorch/scorch_test.go:115 +0x1351
  testing.tRunner()
      /usr/local/Cellar/go/1.9/libexec/src/testing/testing.go:746 +0x16c
2017-12-14 14:40:33 -08:00
Marty Schoch
6ab27e4afa quick hack to disable safe batches in fts 2017-12-14 17:19:50 -05:00
Steve Yen
eb2f541d4f scorch filters _id from Reader.Document() results 2017-12-14 13:52:28 -08:00
Steve Yen
a8884e1011 scorch fix for TestSortMatchSearch
The cachedDocs preparation has to happen for all docs in the field,
not just on the currently requested docNum.

Also, as part of this commit, there's a loop optimization where we no
longer use bytes.Split() on the terms buffer, thus avoiding garbage
creation.
2017-12-14 13:22:13 -08:00
Steve Yen
2be5eb4427 scorch tracks zap files that can't be removed yet
A race & solution found by Marty Schoch... consider a case when the
merger might grab a nextSegmentID, like 4, but takes awhile to
complete.  Meanwhile, the persister grabs the nextSegmentID of 5, but
finishes its persistence work fast, and then loops to cleanup any old
files.  The simple approach of checking a "highest segment ID" of 5 is
wrong now, because the deleter now thinks that segment 4's zap file is
(incorrectly) ok to delete.

The solution in this commit is to track an ephemeral map of filenames
which are ineligibleForRemoval, because they're still being written
(by the merger) and haven't been fully incorporated into the rootBolt
yet.

The merger adds to that ineligibleForRemoval map as it starts a merged
zap file, the persister cleans up entries from that map when it
persists zap filenames into the rootBolt, and the deleter (part of the
persister's loop) consults the map before performing any actual zap
file deletions.
2017-12-14 10:49:33 -08:00
Marty Schoch
bd742caf65 don't try to close a nil segment if err opening 2017-12-14 10:29:19 -05:00
Marty Schoch
149a26b5c1 merge deletion and cacheddocs fixes discussed in meeting 2017-12-14 10:27:39 -05:00
Sreekanth Sivasankaran
95b65ade3e getting right internalID for doc in UT 2017-12-14 17:16:47 +05:30
Sreekanth Sivasankaran
1066ee7d22 DocumentVisitFieldTerms Scorch implementation level1 2017-12-14 12:38:29 +05:30
Marty Schoch
2b92e5ff99
Merge pull request #653 from steveyen/scorch
scorch cleanup of the rootBolt of old snapshots
2017-12-13 22:47:14 -05:00
Marty Schoch
e1b0c61e2a fix bug in handling iterator-done 2017-12-13 22:08:06 -05:00
Steve Yen
b7dff6669f scorch cleanup of *.zap files not listed in the rootBolt 2017-12-13 17:09:50 -08:00
Steve Yen
c0cc46a2be scorch cleanup of the rootBolt of old snapshots
A new global variable, NumSnapshotsToKeep, represents the default
number of old snapshots that each scorch instance should maintain -- 0
is the default.  Apps that need rollback'ability may want to increase
this value in early initialization.

The Scorch.eligibleForRemoval field tracks epoches which are safe to
delete from the rootBolt.  The eligibleForRemoval is appended to
whenever the ref-count on an IndexSnapshot drops to 0.

On startup, eligibleForRemoval is also initialized with any older
epoch's found in the rootBolt.

The newly introduced Scorch.removeOldSnapshots() method is called on
every cycle of the persisterLoop(), where it maintains the
eligibleForRemoval slice to under a size defined by the
NumSnapshotsToKeep.

A future commit will remove actual storage files in order to match the
"source of truth" information found in the rootBolt.
2017-12-13 15:53:31 -08:00
Steve Yen
c13ff85aaf scorch ref-counting
Future commits will provide actual cleanup when ref-counts reach 0.
2017-12-13 14:48:07 -08:00
Marty Schoch
50471003dc basic refactoring of introducer to make it more readable 2017-12-13 16:30:39 -05:00
Marty Schoch
a0e12b2640 add license to a few files missing it 2017-12-13 16:12:29 -05:00
Marty Schoch
85e15628ee major refactoring of posting details 2017-12-13 16:10:06 -05:00
Marty Schoch
6e2207c445 additional refactoring of build/merge 2017-12-13 15:22:13 -05:00
Marty Schoch
50441e5065 refactor to reuse shared code 2017-12-13 14:41:20 -05:00
Marty Schoch
289dc398bd more refacotring of build/merge 2017-12-13 14:26:11 -05:00
Marty Schoch
1cd3fd7fbe extrac common functionality between build/merge 2017-12-13 14:06:54 -05:00
Marty Schoch
cd45487cb3 fsync rootBolt when persisting snapshot 2017-12-13 13:55:06 -05:00
Marty Schoch
f83c9f2a20 initial cut of merger that actually introduces changes 2017-12-13 13:41:03 -05:00
Marty Schoch
c15c3c11cd extra protection if dict address is 0 (empty segment) 2017-12-13 13:31:18 -05:00
Steve Yen
be7dd36ac6 mergeplan: more tests and bargraph tweaks 2017-12-12 10:37:27 -08:00
Steve Yen
59a1e26300 mergeplan: scoring implemented 2017-12-12 10:37:27 -08:00
Marty Schoch
57121e40a8 fix issues identified by errcheck 2017-12-12 11:41:14 -05:00
Marty Schoch
665c3c80ff initial cut of zap segment merging 2017-12-12 11:21:55 -05:00
Marty Schoch
927216df8c fix postings list count impl 2017-12-12 08:42:13 -05:00
Steve Yen
3461fb741f mergeplan: a placeholder planner that merges all segments
A stepping stone to fleshing out the API contract.
2017-12-11 14:53:08 -08:00
Marty Schoch
58ef21a88a fix golint issue 2017-12-11 16:24:46 -05:00
Marty Schoch
f246e0e4c0 update README for zap file format changes 2017-12-11 16:22:29 -05:00
Marty Schoch
74b2eeb14d refactor where we do some work so we can return error 2017-12-11 15:59:36 -05:00
Marty Schoch
f13b786609 fix up issues to get all bleve unit tests passing for scorch
make scorch default
2017-12-11 15:47:41 -05:00
Marty Schoch
d7eb223e14 remove bolt segment format
upcomning breaking changes and no desire to maintain
2017-12-11 10:20:26 -05:00
Marty Schoch
eada7b209b fix test issue identified by sreekanth 2017-12-11 10:16:56 -05:00
Marty Schoch
8280859bb8 handle read-only and in-mem only cases 2017-12-11 09:07:01 -05:00
Marty Schoch
e8cc7ac0bf add new fields command to zap cmd-line util 2017-12-11 09:05:50 -05:00
Marty Schoch
690cd39921 add crazy slow but functional DocumentVisitFieldTerms 2017-12-10 08:55:59 -05:00
Marty Schoch
dc0adc8827 add fsync 2017-12-09 20:52:01 -05:00
Marty Schoch
e0d9828cd0 add more detail to the readme 2017-12-09 14:42:36 -05:00
Marty Schoch
414899618b switch from bolt format to zap in the persister 2017-12-09 14:28:50 -05:00
Marty Schoch
9781d9b089 add initial version of zap file format 2017-12-09 14:28:33 -05:00
Marty Schoch
ff2e6b98e4 added empty segment 2017-12-09 12:43:02 -05:00
Marty Schoch
e470105635 fix issues identified by errcheck 2017-12-06 18:36:14 -05:00
Marty Schoch
adac4f41db initial version of scorch which persists index to disk 2017-12-06 18:33:47 -05:00
Marty Schoch
b1346b4c8a add readme describing our use of bolt as a segment format 2017-12-05 16:09:00 -05:00
Marty Schoch
898a6b1e85 fix errcheck issues 2017-12-05 13:32:57 -05:00
Marty Schoch
ece27ef215 adding initial version of bolt persisted segment 2017-12-05 13:05:12 -05:00
Marty Schoch
f6be841668 add test for postings list count method 2017-12-05 13:01:36 -05:00
Marty Schoch
30e9d6daa5 add better testing of array positions 2017-12-05 12:54:44 -05:00
Marty Schoch
8d9d45115f add test of location field 2017-12-05 12:20:06 -05:00
Marty Schoch
8f0350865b add test for segment fields method 2017-12-05 12:17:56 -05:00
Marty Schoch
7a6b5483f2 add validation that all locations were seen 2017-12-05 11:58:05 -05:00
Marty Schoch
e08fdab54a remove todo item 2017-12-05 10:13:27 -05:00
Marty Schoch
87e2627551 added dictionary tests to mem segment 2017-12-05 09:49:41 -05:00
Marty Schoch
ed067f45dd added Close() method to Segment 2017-12-05 09:31:02 -05:00
Marty Schoch
22ffc8940e update segment API to return error in key places 2017-12-04 18:06:06 -05:00
Marty Schoch
b74cf4b081 add copyright header to all new files in scorch 2017-12-01 15:42:50 -05:00
Marty Schoch
89aa02cf5b fix highlighting of composite fields
updated log statements for refactored names
2017-12-01 15:12:08 -05:00
Marty Schoch
cff14f1212 fix crash in DocNumbers when segment is empty 2017-12-01 09:50:27 -05:00
Marty Schoch
eb256f78bc switch to constant referring to id field id 0
this avoids potentially mutating something that is intended
to be immutable
2017-12-01 09:30:07 -05:00
Marty Schoch
7c964de8bf switch to binary search for finding segment from global doc num
added unit tests for this function specifically
2017-12-01 09:26:51 -05:00
Marty Schoch
c2047dcdf9 refactor doc id reader creation to share more code
fix issue identified by steve
2017-12-01 08:54:39 -05:00
Marty Schoch
bcd4bdc3d1 added initial bolt thought to README 2017-12-01 07:27:04 -05:00
Marty Schoch
395458ce83 refactor to make mem segment contents exported 2017-12-01 07:26:47 -05:00
Steve Yen
398dcb19b3 scorch introducer uses the roaring.Or(x, y) API
Instead of cloning an input bitmap, the roaring.Or(x, y)
implementation fills a brand new result bitmap, which should be allow
for more efficient packing and memory utilization.
2017-11-30 10:37:10 -08:00
Steve Yen
67986d41bf scorch InternalID() handles case of unknown docId 2017-11-30 08:36:01 -08:00
Marty Schoch
848aca4639 fix issues identified by errcheck 2017-11-29 13:34:15 -05:00
Marty Schoch
23f6dc1cc6 working in-memory version 2017-11-29 11:33:35 -05:00
Steve Yen
546700b2de fix comment typo 2017-08-24 16:25:10 -07:00
Marty Schoch
cea119449e fix data race in doc id search
the implementation of the doc id search requires that the list
of ids be sorted.  however, when doing a multisearch across
many indexes at once, the list of doc ids in the query is shared.
deeper in the implementation, the search of each shard attempts
to sort this list, resulting in a data race.

this is one example of a potentially larger problem, however
it has been decided to fix this data race, even though larger
issues of data owernship may remain unresolved.

this fix makes a copy of the list of doc ids, just prior to
sorting the list.  subsequently, all use of the list is on the
copy that was made, not the original.

fixes #518
2017-08-07 15:11:35 -04:00
abhinavdangeti
8ec88a6cb0 MB-24560: Add moss store|collection histograms to stats 2017-05-25 16:32:36 -07:00
Marty Schoch
3ad13236ec fix geopoint fields to be able to be stored and retrieved 2017-03-31 09:40:54 -04:00
Marty Schoch
74140d4f2b remove forestdb from bleve 2017-03-30 12:27:23 -04:00
Marty Schoch
1bcfe4efa1 Merge pull request #546 from sreekanth-cb/store_abort_close
Store abort close
2017-03-07 12:35:18 -05:00
Sreekanth Sivasankaran
f759d841c2 Adding guards for config casting. 2017-03-07 22:51:27 +05:30
Sreekanth Sivasankaran
e88ff3c60a Merge branch 'store_abort_close' of https://github.com/sreekanth-cb/bleve into store_abort_close
Syntax change for errcheck tool
2017-03-07 19:56:08 +05:30
Sreekanth Sivasankaran
ee819f5950 MB-22410 - Configurable forced Store Abort API
Adding a configurable forced store close
Bumping the moss store version
2017-03-07 19:33:51 +05:30
Marty Schoch
0eba2a3f0c reduce garbage created while processing facets
previously we parsed/returned large sections of the documents
back index row in order to compute facet information.  this
would require parsing the protobuf of the entire back index row.
unfortunately this creates considerable garbage.

this new version introduces a visitor/callback approach to
working with data inside the back index row.  the benefit
of this approach is that we can let the higher-level code
see values, prior to any copies of data being made or
intermediate garbage being created.  implementations of
the callback must copy any value which they would like to
retain beyond the callback.

NOTE: this approach is duplicates code from the
automatically generated protobuf code

NOTE: this approach assumes that the "field" field be serialized
before the "terms" field.  This is guaranteed by our currently
generated protobuf encoder, and is recommended by the protobuf
spec.  But, decoders SHOULD support them occuring in any order,
which we do not.
2017-03-02 17:00:46 -05:00
Marty Schoch
b04745abcc remove smolder indexing scheme
this was an experiment that we're no longer working on
we learned from it, but now carrying it forward has
a maintenance burden we don't wish to pay
2017-03-01 14:38:17 -05:00
Sreekanth Sivasankaran
67a5814fbe MB-22410:deleting/editing index definition with large dirty write queue can be very slow
Adding a configurable forced store close
2017-03-01 18:58:32 +05:30
Sreekanth Sivasankaran
324e4237cf adding configurable Abort Close 2017-03-01 16:23:56 +05:30
Sundar Sridharan
74c7de0dcf re-order childSnapshot declaration 2017-02-21 15:54:04 -08:00
Sundar Sridharan
04d428656e Add Snapshot interface methods for moss child collections feature 2017-02-20 15:03:45 -08:00
Steve Yen
0b70a1bcb8 use inlined prealloc'ed termFreqRow in upsidedown termFieldReader 2017-02-08 18:23:13 -08:00
Steve Yen
31fecc3663 avoid row alloc's in upsidedown termFieldReader constructor 2017-02-08 18:14:30 -08:00
Marty Schoch
606fd6344b INDEX FORMAT CHANGE: change back index row value
Previously term entries were encoded pairwise (field/term), so
you'd have data like:

F1/T1 F1/T2 F1/T3 F2/T4 F3/T5

As you can see, even though field 1 has 3 terms, we repeat the F1
part in the encoded data.  This is a bit wasteful.

In the new format we encode it as a list of terms for each field:

F1/T1,T2,T3 F2/T4 F3/T5

When fields have multiple terms, this saves space.  In unit
tests there is no additional waste even in the case that a field
has only a single value.

Here are the results of an indexing test case (beer-search):

$ benchcmp indexing-before.txt indexing-after.txt
benchmark               old ns/op       new ns/op       delta
BenchmarkIndexing-4     11275835988     10745514321     -4.70%

benchmark               old allocs     new allocs     delta
BenchmarkIndexing-4     25230685       22480494       -10.90%

benchmark               old bytes      new bytes      delta
BenchmarkIndexing-4     4802816224     4741641856     -1.27%

And here are the results of a MatchAll search building a facet
on the "abv" field:

$ benchcmp facet-before.txt facet-after.txt
benchmark             old ns/op     new ns/op     delta
BenchmarkFacets-4     439762100     228064575     -48.14%

benchmark             old allocs     new allocs     delta
BenchmarkFacets-4     9460208        3723286        -60.64%

benchmark             old bytes     new bytes     delta
BenchmarkFacets-4     260784261     151746483     -41.81%

Although we expect the index to be smaller in many cases, the
beer-search index is about the same in this case.  However,
this may be due to the underlying storage (boltdb) in this case.

Finally, the index version was bumped from 5 to 7, since smolder
also used version 6, which could lead to some confusion.
2017-01-24 15:39:38 -05:00
Steve Yen
5927224e15 optimize mergeOldAndNew for case of first time a doc is seen 2017-01-09 22:48:58 -08:00
Steve Yen
790f2e3e32 optimize by alloc'ing arrays of TermFrequencyRow/TermVector 2017-01-09 22:42:00 -08:00
Steve Yen
8f4726ab10 use struct{}{} idiom instead of additional mark var 2017-01-09 10:17:26 -08:00
Steve Yen
302cac72c4 optimize mergeOldAndNew when non-update case 2017-01-08 17:59:49 -08:00
Steve Yen
931d133024 go fmt and go vet 2017-01-07 22:14:22 -08:00
Steve Yen
40780254ae optimize upsidedown mergeOldAndNew existing key maps
The optimization is to provide a better initial size to the map
constructor and to use a 0-byte-sized struct{} as the map values.
2017-01-07 22:05:55 -08:00
Steve Yen
c2bafa2a51 optimize term vectors/locations via preallocated arrays
The change should hit the allocator less often when processing term
vectors/locations as it preallocates larger, contiguous arrays of
records upfront.
2017-01-07 12:34:06 -08:00
Steve Yen
8b140d84c4 minor optimization of upsidedown backIndexRowForDoc
This change might allow a smart enough golang compiler to perhaps
allocate a backIndexRow on the stack rather than the heap.
2017-01-07 11:49:42 -08:00
Steve Yen
c21d27e15a upsidedown TermFieldReader checks includeTermVectors flag param
The flag was part of the API, but wasn't previously checked.
2017-01-05 21:10:27 -08:00
Steve Yen
37490864ce bleve/index/store/moss - accessor for underlying mossStore
This change adds methods that provide access to the actual, underlying
mossStore instance in the bleve/index/store/moss KVStore adaptor.

This enables applications to utilize advanced, mossStore-specific
features (such as partial rollback of indexes).  See also
https://issues.couchbase.com/browse/MB-17805
2016-12-05 12:25:29 -08:00
Patrick Mezard
c81fd6fdb0 index: DocIDReader.Next() returns nil when done not io.EOF 2016-11-20 19:05:35 +01:00
Steve Yen
2a8237e8cc optimize FacetsBuilder with cached fields & avoid some allocs 2016-10-25 15:34:48 -07:00
Steve Yen
a941a0f318 simplify DocumentFieldTerms append() usage 2016-10-25 15:30:19 -07:00
Rob McColl
12c404aec0 Update kvstore.go 2016-10-19 10:31:00 -04:00
Rob McColl
2b26218591 Fix NumDeletes doc copy/paste err s/Merge/Delete/g 2016-10-17 12:42:21 -04:00
Marty Schoch
77b79a2684 Merge pull request #466 from steveyen/optimize-fieldDict-reader-with-prealloc
Optimize upside-down's field dict reader with preallocated objects
2016-10-13 14:09:54 +02:00
Steve Yen
62e6f1f648 reuse incrementBytes() in moss KV store integration
In this commit, I saw that there was a simple incrementBytes()
implementation elsewhere in bleve that seemed simpler than using the
big int package.

Edge case note: if the input bytes would overflow in incrementBytes(),
such as with an input of [0xff 0xff 0xff], it returns nil.  moss then
treats a nil endKeyExclusive iterator param as a logical
"higher-than-topmost" key, which produces the prefix iteration
behavior that we want for this edge situation.
2016-10-12 09:34:36 -07:00
Steve Yen
01fb59d293 optimize upside-down DictionaryRow for fewer parsing alloc's 2016-10-12 09:22:50 -07:00
Steve Yen
2d72b542c0 optimize upside-down FieldDict reader with prealloc'ed objects
As part of this commit, there's also a newly added
Dictionaryrow.parseDictionaryK() helper method.
2016-10-12 09:18:58 -07:00
Marty Schoch
2f48d7fb02 fix misspellings 2016-10-02 12:11:15 -04:00
Marty Schoch
2332455bd2 nicer formatting of license header 2016-10-02 10:13:14 -04:00
Marty Schoch
6bf9dd59ab BREAKING CHANGE - additional package renaming
i recently learned that package names should also prefer the
singular form, not the plural form
2016-10-01 17:20:59 -04:00
Steve Yen
004e157963 field cache also tracks fieldIndex -> fieldName reverse mapping 2016-10-01 13:06:03 -07:00
Steve Yen
c362ab302e fix tracking of termSearchersFinished stats 2016-09-30 16:11:30 -07:00
Marty Schoch
caf5256f74 don't export internal timers from metrics kvstore 2016-09-30 15:52:16 -04:00
Marty Schoch
f90856b8d3 BREAKING CHANGE - rename upside_down to upsidedown 2016-09-30 12:36:38 -04:00
Marty Schoch
35da361bfa BREAKING CHANGE - renamed packages to be shorter and not use _
this commit only addresses the analysis sub-package
2016-09-30 12:36:10 -04:00
Marty Schoch
73d0951b2a don't panic on missing backindex row
part of #419
2016-09-27 22:16:45 -04:00
Marty Schoch
fb0f4bbecd BREAKING CHANGE - new method to create memory only index
Previously bleve allowed you to create a memory-only index by
simply passing "" as the path argument to the New() method.

This was not clear when reading the code, and led to some
problematic error cases as well.

Now, to create a memory-only index one should use the
NewMemOnly() method.  Passing "" as the path argument
to the New() method will now return os.ErrInvalid.

Advanced users calling NewUsing() can create disk-based or
memory-only indexes, but the change here is that pass ""
as the path argument no longer defaults you into getting
a memory-only index.  Instead, the KV store is selected
manually, just as it is for the disk-based solutions.

Here is an example use of the NewUsing() method to create
a memory-only index:

NewUsing("", indexMapping, Config.DefaultIndexType,
         Config.DefaultMemKVStore, nil)

Config.DefaultMemKVStore is just a new default value
added to the configuration, it currently points to
gtreap.Name (which could have been used directly
instead for more control)

closes #427
2016-09-27 14:11:40 -04:00
Marty Schoch
1f79f65b6a Merge pull request #450 from mschoch/bug449
fix logic in Advance() of UpsideDownCouchDocIDReader
2016-09-26 12:44:09 -04:00
Marty Schoch
981812ff70 fix logic in Advance() of UpsideDownCouchDocIDReader
also added unit tests for newUpsideDownCouchDocIDReaderOnly
use cases
fixes #449
2016-09-26 12:36:24 -04:00
Steve Yen
10cab1826d added upside_down TermFrequencyRow.KeyAppendTo() API
This is a cleanup commit that's followup to a code review discussion
on a previous Advance() perf-optimization PR...

https://github.com/blevesearch/bleve/pull/443
2016-09-23 09:22:42 -07:00
Steve Yen
988dfb02e9 moss kvstore iterator Seek() invokes underlying moss SeekTo() API 2016-09-22 17:46:06 -07:00
Steve Yen
5f5b5d3b80 optimize upside_down TermFieldReader.Advance() to reuse memory
On a dev laptop, bleve-query benchmark on wiki dataset using
query-string of "+text:afternoon +text:coffee" previously had
throughput of 1222qps, and with this change hits 1940qps.
2016-09-22 17:46:06 -07:00
Steve Yen
bcec199c89 issue 441 - upside_down termFieldReader doesn't call Next() early
This change to upside_down term-field-reader no longer moves the
underlying iterator forward preemptively.  Instead, it will invoke
Next() on the underlying iterator only when the caller invokes the
term-field-reader's Next().

There's a special case to handle the situation on the first Next()
invocation after the term-field-reader is created.
2016-09-22 09:18:29 -07:00
slavikm
3eec1ae16c Satisfy errcheck 2016-09-21 17:56:03 +03:00
slavikm
40c1dc076f Now, without the rollback 2016-09-21 16:15:06 +03:00
slavikm
588f379962 Commit if there is no error, rollback otherwise 2016-09-21 16:13:47 +03:00
slavikm
ac49306077 Make sure that the transaction is closed if there is an error 2016-09-21 14:32:05 +03:00
Marty Schoch
e68f6ca9e6 Merge pull request #432 from steveyen/perf-skip-0xff-scan
skip termFrequencyRow 0xFF scan as term length is already known
2016-09-18 12:20:21 -04:00
Steve Yen
b5d2c32b46 skip termFrequencyRow 0xFF scan as term length is already known
This commit modifies the upside_down TermFrequencyRow parseKDoc() to
skip the ByteSeparator (0xFF) scan, as we already know the term's
length in the UpsideDownCouchTermFieldReader.

On my dev box, results from bleve-query test on high frequency terms
went from previous 107qps to 124qps.
2016-09-18 08:56:05 -07:00
Marty Schoch
3fd2a64872 BREAKING CHANGE - removed DumpXXX() methods from bleve.Index
The DumpXXX() methods were always documented as internal and
unsupported.  However, now they are being removed from the
public top-level API.  They are still available on the internal
IndexReader, which can be accessed using the Advanced() method.

The DocCount() and DumpXXX() methods on the internal index
have moved to the internal index reader, since they logically
operate on a snapshot of an index.
2016-09-13 12:40:01 -04:00
Marty Schoch
e1fb860a86 removed unused AsyncIndex interface 2016-09-13 08:42:36 -04:00
Marty Schoch
23755049e8 slight tweak to API to only encode docNum->docNumBytes once 2016-09-11 20:29:16 -04:00
Marty Schoch
035b7c91fc fix unchecked err 2016-09-11 20:29:15 -04:00
Marty Schoch
bbfa6406ea fix test expectation to use ext ids not internal ones
the test had incorreclty been updated to compare the internal
document ids, but these are opaque and may not be the expected
ids in some cases, the test should simply check that it
corresponds to the correct external ids
2016-09-11 20:29:15 -04:00
Marty Schoch
36000f1a1b fix api changes and test after merge 2016-09-11 20:29:15 -04:00
Marty Schoch
1b68c4ec5b make backindex rows more compact, fix bug counting docs on start 2016-09-11 20:29:15 -04:00
Marty Schoch
d3ca5424e2 added cuckoo filter, perf improves overall from upside_down
though only slightly
2016-09-11 20:29:15 -04:00
Marty Schoch
07ab49f602 fix bug counting docs and make smolder selectable 2016-09-11 20:29:15 -04:00
Marty Schoch
04fd62dec3 further tweaks, now all bleve tests pass 2016-09-11 20:29:15 -04:00
Marty Schoch
1b10c286e7 adding initial attempt at numeric ids in index
index scheme is named smolder
compiles and unit tests pass, that is all
2016-09-11 20:29:15 -04:00
Marty Schoch
da9339bcdf refactor FinalizeID into ExternalID and InternalID 2016-09-11 20:29:14 -04:00
Steve Yen
e8cc3c6bdd index/store/moss KV backend propagates mossStore's Stats()
This change depends on the recently introduced mossStore Stats() API
in github.com/couchbase/moss 564bdbc0 commit.  So, gvt for moss has
been updated as part of this change.

Most of the change involves propagating the mossStore instance (the
statsFunc callback) so that it's accessible to the KVStore.Stats()
method.

See also: http://review.couchbase.org/#/c/67524/
2016-09-08 17:12:04 -07:00
Marty Schoch
ae4b354c72 Merge pull request #411 from steveyen/master
tighter moss KV store iterator handling
2016-08-27 08:00:45 -04:00
Steve Yen
eaa59621ff tighter moss KV store iterator handling 2016-08-19 09:10:03 -07:00
Marty Schoch
27ba6187bc adds support for more complex field sorts with object (not string)
previously from JSON we would just deserialize strings like
"-abv" or "city" or "_id" or "_score" as simple sorts
on fields, ids or scores respectively

while this is simple and compact, it can be ambiguous (for
example if you have a field starting with - or if you have a field
named "_id" already.  also, this simple syntax doesnt allow us
to specify more cmoplex options to deal with type/mode/missing

we keep support for the simple string syntax, but now also
recognize a more expressive syntax like:

{
  "by": "field",
  "field": "abv",
  "desc": true,
  "type": "string",
  "mode": "min",
  "missing": "first"
}

type, mode and missing are optional and default to
"auto", "default", and "last" respectively
2016-08-17 14:33:51 -07:00
Marty Schoch
750e0ac16c change sort field impl to use indexed values not stored values 2016-08-17 09:20:44 -07:00
Marty Schoch
5f1454106d Merge pull request #402 from mschoch/indexapiwork
Index/Search API work
2016-08-10 12:41:51 -04:00
Marty Schoch
aa3ae3d39c enable read_only mode for boltdb indexes
fixes #405
2016-08-06 10:47:34 -04:00
Marty Schoch
da794d3762 fix bug introduced by reuse of TermFrequencyRow values
in a recent commit, we changed the code to reuse
TermFrequencyRow objects intsead of constantly allocating new
ones.  unfortunately, one of the original methods was not coded
with this reuse in mind, and a lazy initialization cause us to
leak data from previous uses of the same object.

in particular this caused term vector information from previous
hits to still be applied to subsequent hits.  eventually this
causes the highlighter to try and highlight invalid regions
of a slice.

fixes #404
2016-08-05 08:33:04 -04:00
Marty Schoch
b857769217 document Reset behavior as its non-obvious 2016-08-03 17:16:15 -04:00
Marty Schoch
d7405a4d79 updated attempt to reuse []byte
previous attempt was flawed (but maked by Reset() method)
new approach is to do this work in the Reset() method itself,
logically this is where it belongs.

but further we acknowledge that IndexInternalID []byte lifetime
lives beyond the TermFieldDoc, so another copy is made into
the DocumentMatch.  Although this introduces yet another copy
the theory being tested is that it allows each of these
structuress to reuse memory without additional allocation.
2016-08-03 17:01:27 -04:00
Marty Schoch
89d83cb5a1 reuse memory already allocated for copies of docids
when the term field reader is copying ID values out of the
kv store's iterator, it is already attempting to reuse the
term frequency row data structure.  this change allows us
to also attempt to reuse the []byte allocated for previous
copies of the docid.  we reset the slice length to zero
then copy the data into the existing slice, avoiding
new allocation and garbage collection in the cases where
there is already enough space
2016-08-03 13:45:48 -04:00
Marty Schoch
36de4a7097 cleaner fix for the TermFrequencyRow reuse bug
reset to nil first, let remaining logic work as before
2016-08-01 17:17:29 -04:00
Marty Schoch
cfce9c5fc5 initialize term vector list in parseV
otherwise reusing previous term frequency row causes us to
keep tacking on to one gigantic list
2016-08-01 17:01:34 -04:00
Marty Schoch
172ca7e69e need to copy the doc ID for it to survive past next iteration 2016-08-01 17:01:04 -04:00
Marty Schoch
1aacd9bad5 changed approach
IndexInternalID is now []byte
this is still opaque, and should still work for any future
index implementations as it is a least common denominator
choice, all implementations must internally represent the
id as []byte at some point for storage to disk
2016-08-01 14:26:50 -04:00
Marty Schoch
5aa9e95468 major refactor of index/search API
index id's are now opaque (until finally returned to top-level user)
 - the TermFieldDoc's returned by TermFieldReader no longer contain doc id
 - instead they return an opaque IndexInternalID
 - items returned are still in the "natural index order"
 - but that is no longer guaranteed to be "doc id order"
 - correct behavior requires that they all follow the same order
 - but not any particular order

 - new API FinalizeDocID which converts index internal ID's to public string ID

 - APIs used internally which previously took doc id now take IndexInternalID
     - that is DocumentFieldTerms() and DocumentFieldTermsForFields()
 - however, APIs that are used externally do not reflect this change
     - that is Document()

 - DocumentIDReader follows the same changes, but this is less obvious
     - behavior clarified, used to iterate doc ids, BUT NOT in doc id order
     - method STILL available to iterate doc ids in range
     - but again, you won't get them in any meaningful order
     - new method to iterate actual doc ids from list of possible ids
         - this was introduced to make the DocIDSearcher continue working

searchers now work with the new opaque index internal doc ids
 - they return new DocumentMatchInternal (which does not have string ID)
scorerers also work with these opaque index internal doc ids
 - they return DocumentMatchInternal (which does not have string ID)
collectors now also perform a final step of converting the final result
 - they STILL return traditional DocumentMatch (with string ID)
 - but they now also require an IndexReader (so that they can do the conversion)
2016-07-31 13:46:18 -04:00
Marty Schoch
47ee69ae82 term field reader supports optionally omitting 3 details
at the time you create the term field reader, you can specify
that you don't need the term freq, the norm, or the term vectors

in that case, the index implementation can choose to not return
them in its subsequently returned values

this is advisory only, some simple implementations may ignore this
and continue to return the values anyway (as the current impl of
upside_down does today)

this change will allow future index implementations the
opportunity to do less work when it isn't required
2016-07-30 10:26:42 -04:00
Steve Yen
4822cff63a optimize Advance() with pre-allocated in-out param
This perf-related change helps the code and API reach more similarity
with the Next() methods, which now take a pre-allocate param.
2016-07-29 14:15:00 -07:00
Steve Yen
3c82086805 optimize upside_down reader & 64-bit struct alignments
The UpsideDownCouchTermFieldReader.Next() only needs the doc ID from
the key, so this change provides a specialized parseKDoc() method for
that optimization.

Additionally, fields in various structs are more 64-bit aligned, in an
attempt to reduce the invocations of runtime.typedmemmove() and
runtime.heapBitsBulkBarrier(), which the go compiler seems to
automatically insert to transparently handle misaligned data.
2016-07-23 10:37:40 -07:00
Steve Yen
5094d2d097 optimize moss PrefixIterator
Previously, the PrefixIterator() for moss was implemented by comparing
the prefix bytes on every Next().

With this optimization, the next larger endKeyExclusive is computed at
the iterator's initialization, which allows us to avoid all those
prefix comparisons.
2016-07-21 18:33:34 -07:00
Steve Yen
5271a0f62b optimize termFieldVectorsFromTermVectors when empty 2016-07-21 11:46:14 -07:00
Steve Yen
cbb174b074 optimize moss iterator Next() done/k/v maintenance 2016-07-21 11:10:49 -07:00
Steve Yen
b744148449 optimization to actually reuse the TermFrequencyRow 2016-07-21 11:10:49 -07:00
Steve Yen
6d7fa0b964 optimize moss iterator checkDone() 2016-07-21 11:10:49 -07:00
Steve Yen
39d3e2f028 optimize upside_down reader Next() with TermFieldDoc reuse
This optimization changes the index.TermFieldReader.Next() interface
API, adding an optional, pre-allocated *TermFieldDoc parameter, which
can help prevent garbage creation.
2016-07-21 11:10:49 -07:00
Steve Yen
2498ccc913 optimize upside_down reader Next() to reuse TermFrequencyRow
Before this change, upside down's reader would alloc a new
TermFrequencyRow on every Next(), which would be immediately
transformed into an index.TermFieldDoc{}.  This change reuses a
pre-allocated TermFrequencyRow that's a field in the reader.
2016-07-21 11:10:49 -07:00
Steve Yen
68af6aef62 optimize upside_down reader Next() when 0-length term field vectors
From some bleve-query perf profiling, term field vectors appeared to
be alloc'ed, which was unnecessary as term field vectors are disabled
in the bleve-blast/bleve-query tests.
2016-07-21 11:10:49 -07:00
Marty Schoch
5934a185f3 Merge pull request #398 from slavikm/master
Make facets much faster
2016-07-21 09:12:28 -04:00
slavikm
fc990bc2d1 Remove the field IDs from outside of the index 2016-07-19 20:42:45 -07:00
slavikm
ce64c17be1 Do field cache only once per search 2016-07-17 16:29:17 -07:00
slavikm
9a9b630a6d Make facets much faster 2016-07-17 15:31:35 -07:00
Steve Yen
80623f4a8a MB-20101 - moss KV fix Get() of 0-length vals
The moss KV store adapter's Get() implementation was incorrectly
transforming a 0-length val (e.g., []byte{}) into a nil val.
2016-07-15 14:41:30 -07:00
Marty Schoch
bd2a23fb6d remove firestorm index scheme
firestorm was an experiment
we learned a lot, but it did not result in a usable index scheme
2016-06-26 07:51:41 -04:00
Mark Mindenhall
c3c827aded Add boltdb config test 2016-06-14 13:36:40 -06:00
Mark Mindenhall
d369bd5c3c Add bucket fill percent option for boltdb 2016-06-13 18:47:38 -06:00
Marty Schoch
1be5699c54 Merge pull request #381 from MachineShop-IOT/master
Compact for boltdb (workaround for #374)
2016-06-08 00:01:20 -04:00
Steve Yen
4e531ae11b configurable mossStoreOptions and DeferredSort defaults to true 2016-06-07 17:38:43 -07:00
Mark Mindenhall
09fcc69516 rename defaultBatchSize to defaultCompactBatchSize 2016-06-01 14:25:57 -06:00
Mark Mindenhall
b5a4378a46 Cleanup godoc comments in PR 2016-06-01 13:59:57 -06:00
Mark Mindenhall
fecf7ab5c4 Compact for boltdb (workaround for #374) 2016-06-01 13:16:43 -06:00
Marty Schoch
92cf2a8974 Merge pull request #376 from MachineShop-IOT/master
Remove DictionaryTerm with count 0 during compact (workaround for #374)
2016-06-01 13:39:30 -04:00
Steve Yen
bf318b489b enable mossStore as configurable lower-level store
Also, bumped moss vendor SHA to latest moss with mossStore.
2016-05-26 13:33:22 -07:00
Mark Mindenhall
04351eb8f1 Move creation of iterator within transaction 2016-05-26 12:29:49 -06:00
Mark Mindenhall
686b20be4f Remove DictionaryTerm with count 0 during compact (workaround for #374) 2016-05-26 11:04:53 -06:00
Mark Mindenhall
3aa1d72233 Add compact method to goleveldb store 2016-05-17 16:58:17 -06:00
Marty Schoch
73b514fa4f do not put +/-Inf or NaN values into the stats map 2016-04-15 13:39:30 -04:00
Marty Schoch
b8a2fbb887 fix data race in bleve batch reuse
Currently bleve batch is build by user goroutine
Then read by bleve gourinte
This is still safe when used correctly
However, Reset() will modify the map, which is now a data race

This fix is to simply make batch.Reset() alloc new maps.
This provides a data-access pattern that can be used safely.
Also, this thread argues that creating a new map may be faster
than trying to reuse an existing one:

https://groups.google.com/d/msg/golang-nuts/UvUm3LA1u8g/jGv_FobNpN0J

Separate but related, I have opted to remove the "unsafe batch"
checking that we did.  This was always limited anyway, and now
users of Go 1.6 are just as likely to get a panic from the
runtime for concurrent map access anyway.  So, the price paid
by us (additional mutex) is not worth it.

fixes #360 and #260
2016-04-08 15:32:13 -04:00
Marty Schoch
2a703376ea fix ineffectual assignments 2016-04-02 22:42:56 -04:00
Marty Schoch
7892882519 fix typos 2016-04-02 21:59:30 -04:00
Marty Schoch
194ee82c80 gofmt simplifications 2016-04-02 21:54:33 -04:00
Marty Schoch
639fb1ab89 remove NativeMergeOperator from core, it requires unsafe 2016-03-24 12:06:43 -04:00