Commit Graph

371 Commits

Author SHA1 Message Date
Marty Schoch
3fd2a64872 BREAKING CHANGE - removed DumpXXX() methods from bleve.Index
The DumpXXX() methods were always documented as internal and
unsupported.  However, now they are being removed from the
public top-level API.  They are still available on the internal
IndexReader, which can be accessed using the Advanced() method.

The DocCount() and DumpXXX() methods on the internal index
have moved to the internal index reader, since they logically
operate on a snapshot of an index.
2016-09-13 12:40:01 -04:00
Marty Schoch
e1fb860a86 removed unused AsyncIndex interface 2016-09-13 08:42:36 -04:00
Marty Schoch
23755049e8 slight tweak to API to only encode docNum->docNumBytes once 2016-09-11 20:29:16 -04:00
Marty Schoch
035b7c91fc fix unchecked err 2016-09-11 20:29:15 -04:00
Marty Schoch
bbfa6406ea fix test expectation to use ext ids not internal ones
the test had incorrectly been updated to compare the internal
document ids, but these are opaque and may not be the expected
ids in some cases. the test should simply check that each hit
corresponds to the correct external id
2016-09-11 20:29:15 -04:00
Marty Schoch
36000f1a1b fix api changes and test after merge 2016-09-11 20:29:15 -04:00
Marty Schoch
1b68c4ec5b make backindex rows more compact, fix bug counting docs on start 2016-09-11 20:29:15 -04:00
Marty Schoch
d3ca5424e2 added cuckoo filter, perf improves overall from upside_down
though only slightly
2016-09-11 20:29:15 -04:00
Marty Schoch
07ab49f602 fix bug counting docs and make smolder selectable 2016-09-11 20:29:15 -04:00
Marty Schoch
04fd62dec3 further tweaks, now all bleve tests pass 2016-09-11 20:29:15 -04:00
Marty Schoch
1b10c286e7 adding initial attempt at numeric ids in index
index scheme is named smolder
compiles and unit tests pass, that is all
2016-09-11 20:29:15 -04:00
Marty Schoch
da9339bcdf refactor FinalizeID into ExternalID and InternalID 2016-09-11 20:29:14 -04:00
Steve Yen
e8cc3c6bdd index/store/moss KV backend propagates mossStore's Stats()
This change depends on the recently introduced mossStore Stats() API
in github.com/couchbase/moss 564bdbc0 commit.  So, gvt for moss has
been updated as part of this change.

Most of the change involves propagating the mossStore instance (the
statsFunc callback) so that it's accessible to the KVStore.Stats()
method.

See also: http://review.couchbase.org/#/c/67524/
2016-09-08 17:12:04 -07:00
Marty Schoch
ae4b354c72 Merge pull request #411 from steveyen/master
tighter moss KV store iterator handling
2016-08-27 08:00:45 -04:00
Steve Yen
eaa59621ff tighter moss KV store iterator handling 2016-08-19 09:10:03 -07:00
Marty Schoch
27ba6187bc adds support for more complex field sorts with object (not string)
previously from JSON we would just deserialize strings like
"-abv" or "city" or "_id" or "_score" as simple sorts
on fields, ids or scores respectively

while this is simple and compact, it can be ambiguous (for
example if you have a field starting with - or if you have a field
named "_id" already).  also, this simple syntax doesn't allow us
to specify more complex options to deal with type/mode/missing

we keep support for the simple string syntax, but now also
recognize a more expressive syntax like:

{
  "by": "field",
  "field": "abv",
  "desc": true,
  "type": "string",
  "mode": "min",
  "missing": "first"
}

type, mode and missing are optional and default to
"auto", "default", and "last" respectively
2016-08-17 14:33:51 -07:00
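The two sort syntaxes described in commit 27ba6187bc can be sketched as a small decoder that accepts either form. This is a minimal illustration, not bleve's actual implementation: the `SortSpec` type and the `ParseSort` and `parseSimple` functions are hypothetical names, and the "id" and "score" values for "by" are assumptions (the commit message only shows "field"). The defaults for type, mode and missing follow the commit message.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// SortSpec is a hypothetical type for illustration only.
type SortSpec struct {
	By      string `json:"by"`
	Field   string `json:"field,omitempty"`
	Desc    bool   `json:"desc"`
	Type    string `json:"type,omitempty"`
	Mode    string `json:"mode,omitempty"`
	Missing string `json:"missing,omitempty"`
}

// defaults returns a spec with the optional settings at the defaults
// named in the commit message: "auto", "default" and "last".
func defaults() SortSpec {
	return SortSpec{Type: "auto", Mode: "default", Missing: "last"}
}

// ParseSort accepts either a JSON string ("-abv") or a JSON object.
func ParseSort(raw []byte) (SortSpec, error) {
	var s string
	if err := json.Unmarshal(raw, &s); err == nil {
		return parseSimple(s), nil
	}
	// Keys absent from the object leave the pre-set defaults untouched.
	spec := defaults()
	if err := json.Unmarshal(raw, &spec); err != nil {
		return SortSpec{}, err
	}
	return spec, nil
}

// parseSimple handles the compact string syntax.
func parseSimple(s string) SortSpec {
	spec := defaults()
	if strings.HasPrefix(s, "-") {
		spec.Desc = true
		s = s[1:]
	}
	switch s {
	case "_id":
		spec.By = "id" // assumed value, not shown in the commit message
	case "_score":
		spec.By = "score" // assumed value, not shown in the commit message
	default:
		spec.By = "field"
		spec.Field = s
	}
	return spec
}

func main() {
	inputs := []string{`"-abv"`, `"_score"`, `{"by":"field","field":"abv","desc":true,"mode":"min"}`}
	for _, in := range inputs {
		spec, err := ParseSort([]byte(in))
		fmt.Printf("%s -> %+v (err=%v)\n", in, spec, err)
	}
}
```

The object form removes the ambiguity noted in the commit message: a field literally named "_id" or starting with "-" can be expressed as {"by": "field", "field": "_id"}.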
Marty Schoch
750e0ac16c change sort field impl to use indexed values not stored values 2016-08-17 09:20:44 -07:00
Marty Schoch
5f1454106d Merge pull request #402 from mschoch/indexapiwork
Index/Search API work
2016-08-10 12:41:51 -04:00
Marty Schoch
aa3ae3d39c enable read_only mode for boltdb indexes
fixes #405
2016-08-06 10:47:34 -04:00
Marty Schoch
da794d3762 fix bug introduced by reuse of TermFrequencyRow values
in a recent commit, we changed the code to reuse
TermFrequencyRow objects instead of constantly allocating new
ones.  unfortunately, one of the original methods was not coded
with this reuse in mind, and a lazy initialization caused us to
leak data from previous uses of the same object.

in particular this caused term vector information from previous
hits to still be applied to subsequent hits.  eventually this
causes the highlighter to try and highlight invalid regions
of a slice.

fixes #404
2016-08-05 08:33:04 -04:00
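The class of bug described in commit da794d3762 can be reproduced in a few lines. The sketch below is a simplified illustration, not the actual bleve code: the `row` type and both parse methods are hypothetical. It shows how lazy initialization on a reused object keeps appending to stale data, and how clearing the field first avoids that.

```go
package main

import "fmt"

// row is a hypothetical, simplified stand-in for a reusable row object.
type row struct {
	vectors []int
}

// parseLeaky only initializes vectors when nil, so on a reused row the
// entries from the previous call are still present and get appended to.
func (r *row) parseLeaky(vals []int) {
	if r.vectors == nil {
		r.vectors = make([]int, 0, len(vals))
	}
	r.vectors = append(r.vectors, vals...)
}

// parseReset clears the previous state, then runs the same logic.
func (r *row) parseReset(vals []int) {
	r.vectors = nil
	r.parseLeaky(vals)
}

func main() {
	var leaky, clean row
	for _, hit := range [][]int{{1, 2}, {3}} {
		leaky.parseLeaky(hit)
		clean.parseReset(hit)
		fmt.Println("leaky:", leaky.vectors, "clean:", clean.vectors)
	}
}
```

Resetting to nil gives up the old backing array; truncating with `r.vectors[:0]` would keep it, at the cost of sharing memory with anything that still holds the earlier slice.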
Marty Schoch
b857769217 document Reset behavior as it's non-obvious 2016-08-03 17:16:15 -04:00
Marty Schoch
d7405a4d79 updated attempt to reuse []byte
previous attempt was flawed (but masked by Reset() method)
new approach is to do this work in the Reset() method itself,
logically this is where it belongs.

but further we acknowledge that IndexInternalID []byte lifetime
lives beyond the TermFieldDoc, so another copy is made into
the DocumentMatch.  Although this introduces yet another copy
the theory being tested is that it allows each of these
structures to reuse memory without additional allocation.
2016-08-03 17:01:27 -04:00
Marty Schoch
89d83cb5a1 reuse memory already allocated for copies of docids
when the term field reader is copying ID values out of the
kv store's iterator, it is already attempting to reuse the
term frequency row data structure.  this change allows us
to also attempt to reuse the []byte allocated for previous
copies of the docid.  we reset the slice length to zero
then copy the data into the existing slice, avoiding
new allocation and garbage collection in the cases where
there is already enough space
2016-08-03 13:45:48 -04:00
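The slice reuse described in commit 89d83cb5a1 (reset the length to zero, then copy into the existing slice) corresponds to a common Go idiom. A minimal sketch, where `copyInto` is a hypothetical helper name rather than anything from bleve:

```go
package main

import "fmt"

// copyInto copies src into dst's existing backing array when it has
// enough capacity; append allocates a larger array only when it does not.
func copyInto(dst, src []byte) []byte {
	return append(dst[:0], src...)
}

func main() {
	buf := make([]byte, 0, 16)
	for _, id := range [][]byte{[]byte("doc-1"), []byte("doc-22"), []byte("d3")} {
		buf = copyInto(buf, id)
		fmt.Printf("%s len=%d cap=%d\n", buf, len(buf), cap(buf))
	}
}
```

The trade-off is that the buffer's contents are overwritten on the next copy, so any caller that needs the ID to outlive the next iteration has to make its own copy.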
Marty Schoch
36de4a7097 cleaner fix for the TermFrequencyRow reuse bug
reset to nil first, let remaining logic work as before
2016-08-01 17:17:29 -04:00
Marty Schoch
cfce9c5fc5 initialize term vector list in parseV
otherwise reusing previous term frequency row causes us to
keep tacking on to one gigantic list
2016-08-01 17:01:34 -04:00
Marty Schoch
172ca7e69e need to copy the doc ID for it to survive past next iteration 2016-08-01 17:01:04 -04:00
Marty Schoch
1aacd9bad5 changed approach
IndexInternalID is now []byte
this is still opaque, and should still work for any future
index implementations as it is a least common denominator
choice, all implementations must internally represent the
id as []byte at some point for storage to disk
2016-08-01 14:26:50 -04:00
Marty Schoch
5aa9e95468 major refactor of index/search API
index id's are now opaque (until finally returned to top-level user)
 - the TermFieldDoc's returned by TermFieldReader no longer contain doc id
 - instead they return an opaque IndexInternalID
 - items returned are still in the "natural index order"
 - but that is no longer guaranteed to be "doc id order"
 - correct behavior requires that they all follow the same order
 - but not any particular order

 - new API FinalizeDocID which converts index internal ID's to public string ID

 - APIs used internally which previously took doc id now take IndexInternalID
     - that is DocumentFieldTerms() and DocumentFieldTermsForFields()
 - however, APIs that are used externally do not reflect this change
     - that is Document()

 - DocumentIDReader follows the same changes, but this is less obvious
     - behavior clarified, used to iterate doc ids, BUT NOT in doc id order
     - method STILL available to iterate doc ids in range
     - but again, you won't get them in any meaningful order
     - new method to iterate actual doc ids from list of possible ids
         - this was introduced to make the DocIDSearcher continue working

searchers now work with the new opaque index internal doc ids
 - they return new DocumentMatchInternal (which does not have string ID)
scorers also work with these opaque index internal doc ids
 - they return DocumentMatchInternal (which does not have string ID)
collectors now also perform a final step of converting the final result
 - they STILL return traditional DocumentMatch (with string ID)
 - but they now also require an IndexReader (so that they can do the conversion)
2016-07-31 13:46:18 -04:00
Marty Schoch
47ee69ae82 term field reader supports optionally omitting 3 details
at the time you create the term field reader, you can specify
that you don't need the term freq, the norm, or the term vectors

in that case, the index implementation can choose to not return
them in its subsequently returned values

this is advisory only, some simple implementations may ignore this
and continue to return the values anyway (as the current impl of
upside_down does today)

this change will allow future index implementations the
opportunity to do less work when it isn't required
2016-07-30 10:26:42 -04:00
Steve Yen
4822cff63a optimize Advance() with pre-allocated in-out param
This perf-related change helps the code and API reach more similarity
with the Next() methods, which now take a pre-allocated param.
2016-07-29 14:15:00 -07:00
Steve Yen
3c82086805 optimize upside_down reader & 64-bit struct alignments
The UpsideDownCouchTermFieldReader.Next() only needs the doc ID from
the key, so this change provides a specialized parseKDoc() method for
that optimization.

Additionally, fields in various structs are more 64-bit aligned, in an
attempt to reduce the invocations of runtime.typedmemmove() and
runtime.heapBitsBulkBarrier(), which the go compiler seems to
automatically insert to transparently handle misaligned data.
2016-07-23 10:37:40 -07:00
Steve Yen
5094d2d097 optimize moss PrefixIterator
Previously, the PrefixIterator() for moss was implemented by comparing
the prefix bytes on every Next().

With this optimization, the next larger endKeyExclusive is computed at
the iterator's initialization, which allows us to avoid all those
prefix comparisons.
2016-07-21 18:33:34 -07:00
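Commit 5094d2d097 does not show how the end key is computed. One common way to derive an exclusive upper bound from a prefix is to increment the last byte that is below 0xff and drop everything after it. The sketch below assumes that approach; `prefixEndExclusive` is a hypothetical name, not the moss or bleve function.

```go
package main

import "fmt"

// prefixEndExclusive returns the smallest key that is greater than every
// key starting with prefix, or nil when no such bound exists (the prefix
// is empty or consists only of 0xff bytes).
func prefixEndExclusive(prefix []byte) []byte {
	end := append([]byte(nil), prefix...) // copy so the caller's slice is untouched
	for i := len(end) - 1; i >= 0; i-- {
		if end[i] < 0xff {
			end[i]++
			return end[:i+1]
		}
	}
	return nil
}

func main() {
	for _, p := range [][]byte{[]byte("ab"), {'a', 0xff}, {0xff, 0xff}} {
		fmt.Printf("%q -> %q\n", p, prefixEndExclusive(p))
	}
}
```

With the bound computed once at initialization, the iterator can run as a plain range scan over [prefix, end) and skip the per-Next() prefix comparison.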
Steve Yen
5271a0f62b optimize termFieldVectorsFromTermVectors when empty 2016-07-21 11:46:14 -07:00
Steve Yen
cbb174b074 optimize moss iterator Next() done/k/v maintenance 2016-07-21 11:10:49 -07:00
Steve Yen
b744148449 optimization to actually reuse the TermFrequencyRow 2016-07-21 11:10:49 -07:00
Steve Yen
6d7fa0b964 optimize moss iterator checkDone() 2016-07-21 11:10:49 -07:00
Steve Yen
39d3e2f028 optimize upside_down reader Next() with TermFieldDoc reuse
This optimization changes the index.TermFieldReader.Next() interface
API, adding an optional, pre-allocated *TermFieldDoc parameter, which
can help prevent garbage creation.
2016-07-21 11:10:49 -07:00
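The optional pre-allocated parameter described in commit 39d3e2f028 can be sketched as follows. The types here are simplified and hypothetical (`TermFieldDoc` has only two illustrative fields and `sliceReader` is invented for the example); only the shape of the Next() call is taken from the commit message.

```go
package main

import "fmt"

// TermFieldDoc is a simplified, hypothetical version of the real struct.
type TermFieldDoc struct {
	ID   []byte
	Freq uint64
}

// sliceReader is a toy reader over an in-memory list of IDs.
type sliceReader struct {
	ids []string
	pos int
}

// Next fills preAlloced when the caller supplies one and allocates a new
// TermFieldDoc otherwise. It returns nil, nil once the reader is exhausted.
func (r *sliceReader) Next(preAlloced *TermFieldDoc) (*TermFieldDoc, error) {
	if r.pos >= len(r.ids) {
		return nil, nil
	}
	rv := preAlloced
	if rv == nil {
		rv = &TermFieldDoc{}
	}
	rv.ID = append(rv.ID[:0], r.ids[r.pos]...) // reuse the ID buffer too
	rv.Freq = 1
	r.pos++
	return rv, nil
}

func main() {
	r := &sliceReader{ids: []string{"a", "b", "c"}}
	var scratch TermFieldDoc // allocated once, reused for every hit
	for {
		doc, err := r.Next(&scratch)
		if err != nil || doc == nil {
			break
		}
		fmt.Printf("id=%s freq=%d\n", doc.ID, doc.Freq)
	}
}
```

Passing nil keeps the older allocate-per-call behavior, so callers that retain each result are unaffected, while hot loops can pass one scratch value and avoid creating garbage.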
Steve Yen
2498ccc913 optimize upside_down reader Next() to reuse TermFrequencyRow
Before this change, upside down's reader would alloc a new
TermFrequencyRow on every Next(), which would be immediately
transformed into an index.TermFieldDoc{}.  This change reuses a
pre-allocated TermFrequencyRow that's a field in the reader.
2016-07-21 11:10:49 -07:00
Steve Yen
68af6aef62 optimize upside_down reader Next() when 0-length term field vectors
From some bleve-query perf profiling, term field vectors appeared to
be alloc'ed, which was unnecessary as term field vectors are disabled
in the bleve-blast/bleve-query tests.
2016-07-21 11:10:49 -07:00
Marty Schoch
5934a185f3 Merge pull request #398 from slavikm/master
Make facets much faster
2016-07-21 09:12:28 -04:00
slavikm
fc990bc2d1 Remove the field IDs from outside of the index 2016-07-19 20:42:45 -07:00
slavikm
ce64c17be1 Do field cache only once per search 2016-07-17 16:29:17 -07:00
slavikm
9a9b630a6d Make facets much faster 2016-07-17 15:31:35 -07:00
Steve Yen
80623f4a8a MB-20101 - moss KV fix Get() of 0-length vals
The moss KV store adapter's Get() implementation was incorrectly
transforming a 0-length val (e.g., []byte{}) into a nil val.
2016-07-15 14:41:30 -07:00
Marty Schoch
bd2a23fb6d remove firestorm index scheme
firestorm was an experiment
we learned a lot, but it did not result in a usable index scheme
2016-06-26 07:51:41 -04:00
Mark Mindenhall
c3c827aded Add boltdb config test 2016-06-14 13:36:40 -06:00
Mark Mindenhall
d369bd5c3c Add bucket fill percent option for boltdb 2016-06-13 18:47:38 -06:00
Marty Schoch
1be5699c54 Merge pull request #381 from MachineShop-IOT/master
Compact for boltdb (workaround for #374)
2016-06-08 00:01:20 -04:00
Steve Yen
4e531ae11b configurable mossStoreOptions and DeferredSort defaults to true 2016-06-07 17:38:43 -07:00
Mark Mindenhall
09fcc69516 rename defaultBatchSize to defaultCompactBatchSize 2016-06-01 14:25:57 -06:00