0
0
Fork 0
Commit Graph

19 Commits

Author SHA1 Message Date
abhinavdangeti 7e36109b3c MB-28162: Provide API to estimate memory needed to run a search query
This API (unexported) will estimate the amount of memory needed to execute
a search query over an index before the collector begins data collection.

Sample estimates for certain queries:
{Size: 10, BenchmarkUpsidedownSearchOverhead}
                                                           ESTIMATE    BENCHMEM
TermQuery                                                  4616        4796
MatchQuery                                                 5210        5405
DisjunctionQuery (Match queries)                           7700        8447
DisjunctionQuery (Term queries)                            6514        6591
ConjunctionQuery (Match queries)                           7524        8175
Nested disjunction query (disjunction of disjunctions)     10306       10708
…
2018-03-06 13:53:42 -08:00
Marty Schoch 0eba2a3f0c reduce garbage created while processing facets
previously we parsed/returned large sections of the documents
back index row in order to compute facet information.  this
would require parsing the protobuf of the entire back index row.
unfortunately this creates considerable garbage.

this new version introduces a visitor/callback approach to
working with data inside the back index row.  the benefit
of this approach is that we can let the higher-level code
see values, prior to any copies of data being made or
intermediate garbage being created.  implementations of
the callback must copy any value which they would like to
retain beyond the callback.

NOTE: this approach is duplicates code from the
automatically generated protobuf code

NOTE: this approach assumes that the "field" field be serialized
before the "terms" field.  This is guaranteed by our currently
generated protobuf encoder, and is recommended by the protobuf
spec.  But, decoders SHOULD support them occuring in any order,
which we do not.
2017-03-02 17:00:46 -05:00
Marty Schoch e5ec831250 numeric range facet merging compare range values not pointers
fix #492
2016-11-03 15:48:46 -04:00
Steve Yen 2a8237e8cc optimize FacetsBuilder with cached fields & avoid some allocs 2016-10-25 15:34:48 -07:00
Marty Schoch 2332455bd2 nicer formatting of license header 2016-10-02 10:13:14 -04:00
Marty Schoch 27ba6187bc adds support for more complex field sorts with object (not string)
previously from JSON we would just deserialize strings like
"-abv" or "city" or "_id" or "_score" as simple sorts
on fields, ids or scores respectively

while this is simple and compact, it can be ambiguous (for
example if you have a field starting with - or if you have a field
named "_id" already.  also, this simple syntax doesnt allow us
to specify more cmoplex options to deal with type/mode/missing

we keep support for the simple string syntax, but now also
recognize a more expressive syntax like:

{
  "by": "field",
  "field": "abv",
  "desc": true,
  "type": "string",
  "mode": "min",
  "missing": "first"
}

type, mode and missing are optional and default to
"auto", "default", and "last" respectively
2016-08-17 14:33:51 -07:00
Marty Schoch 750e0ac16c change sort field impl to use indexed values not stored values 2016-08-17 09:20:44 -07:00
Marty Schoch e188fe35f7 switch back to single DocumentMatch struct
instead of separate DocumentMatch/DocumentMatchInternal

rules are simple, everything operates on the IndexInternalID field
until the results are returned, then ID is set correctly
the IndexInternalID field is not exported to JSON
2016-08-01 14:58:02 -04:00
Marty Schoch 5aa9e95468 major refactor of index/search API
index id's are now opaque (until finally returned to top-level user)
 - the TermFieldDoc's returned by TermFieldReader no longer contain doc id
 - instead they return an opaque IndexInternalID
 - items returned are still in the "natural index order"
 - but that is no longer guaranteed to be "doc id order"
 - correct behavior requires that they all follow the same order
 - but not any particular order

 - new API FinalizeDocID which converts index internal ID's to public string ID

 - APIs used internally which previously took doc id now take IndexInternalID
     - that is DocumentFieldTerms() and DocumentFieldTermsForFields()
 - however, APIs that are used externally do not reflect this change
     - that is Document()

 - DocumentIDReader follows the same changes, but this is less obvious
     - behavior clarified, used to iterate doc ids, BUT NOT in doc id order
     - method STILL available to iterate doc ids in range
     - but again, you won't get them in any meaningful order
     - new method to iterate actual doc ids from list of possible ids
         - this was introduced to make the DocIDSearcher continue working

searchers now work with the new opaque index internal doc ids
 - they return new DocumentMatchInternal (which does not have string ID)
scorerers also work with these opaque index internal doc ids
 - they return DocumentMatchInternal (which does not have string ID)
collectors now also perform a final step of converting the final result
 - they STILL return traditional DocumentMatch (with string ID)
 - but they now also require an IndexReader (so that they can do the conversion)
2016-07-31 13:46:18 -04:00
slavikm fc990bc2d1 Remove the field IDs from outside of the index 2016-07-19 20:42:45 -07:00
slavikm ce64c17be1 Do field cache only once per search 2016-07-17 16:29:17 -07:00
slavikm 9a9b630a6d Make facets much faster 2016-07-17 15:31:35 -07:00
Marty Schoch f1abf6beb3 facets now also have secondary sort
in case of term facets, secondary sort (after count) is on the term
for date and numberic facets, secondary sort is on the facet name

fixes #335
2016-03-14 12:02:30 -04:00
Marty Schoch 6c988de5b5 fix date facet merging for searches on index aliases
previously we incorrectly identified matching buckets by
comparing string pointers.  this worked in the unit test
but not in real applications since the strings result from
date parsing inside the facet collector, and are therefore
different pointers
2016-02-23 15:33:07 -05:00
Marty Schoch 51a59cb05c initial impl of Index Aliases
an IndexAlias allows you easily work with one logical Index
while changing the actual Index its pointing to behind the scenes
Changing which actual Index is backing an IndexAlias can be done
atomically so that your application smoothly transitions from
one Index to another.
A separate use of IndexAlias is allowed when the IndexAlias is
defined to point to multiple Indexes.  In this case only the
Search() operation is supported, but the Search will be run
on each of the underlying indexes in parallel, and the results
will be merged.
2014-10-29 09:22:11 -04:00
Marty Schoch 198ca1ad4d major refactor of kvstore/index internals, see below
In the index/store package
introduce KVReader
  creates snapshot
  all read operations consistent from this snapshot
  must close to release

introduce KVWriter
  only one writer active
  access to all operations
  allows for consisten read-modify-write
  must close to release

introduce AssociativeMerge operation on batch
  allows efficient read-modify-write
  for associative operations
  used to consolidate updates to the term summary rows
  saves 1 set and 1 get op per shared instance of term in field

In the index package
introduced an IndexReader
  exposes a consisten snapshot of the index for searching

At top level
  All searches now operate on a consisten snapshot of the index
2014-09-12 17:21:35 -04:00
Marty Schoch 7a7eb2e94c add newline between license and package
this avoids cluttering godocs with the license
2014-09-02 10:54:50 -04:00
Marty Schoch 1161361bea rename imports from couchbaselabs to blevesearch 2014-08-28 15:38:57 -04:00
Marty Schoch 7bbaa8ecd5 added support for returning facet results with requests
supports terms, numeric ranges, and date ranges
closes #14
2014-08-11 11:03:29 -04:00