0
0
Commit Graph

13 Commits

Author SHA1 Message Date
Marty Schoch
ce0b299d6f switch sort impl to use interface
this improves perf in the case where we're not doing any sorting
as we avoid allocating memory and converting scores into
numeric terms
2016-08-24 19:02:22 -04:00
Marty Schoch
24a2b57e29 refactor search package to reuse DocumentMatch and ID []byte's
the motivation for this commit is long and detailed and has been
documented externally here:

https://gist.github.com/mschoch/5cc5c9cf4669a5fe8512cb7770d3c1a2

the core of the changes are:

1.  recognize that collector/searcher need only a fixed number
of DocumentMatch instances, and this number can be determined
from the structure of the query, not the size of the data

2.  knowing this, instances can be allocated in bulk, up front
and they can be reused without locking (since all search
operations take place in a single goroutine

3.  combined with previous commits which enabled reuse of
the IndexInternalID []byte, this allows for no allocation/copy
of these bytes as well (by using DocumentMatch Reset() method
when returning entries to the pool
2016-08-08 22:21:47 -04:00
Marty Schoch
e188fe35f7 switch back to single DocumentMatch struct
instead of separate DocumentMatch/DocumentMatchInternal

rules are simple, everything operates on the IndexInternalID field
until the results are returned, then ID is set correctly
the IndexInternalID field is not exported to JSON
2016-08-01 14:58:02 -04:00
Marty Schoch
1aacd9bad5 changed approach
IndexInternalID is now []byte
this is still opaque, and should still work for any future
index implementations as it is a least common denominator
choice, all implementations must internally represent the
id as []byte at some point for storage to disk
2016-08-01 14:26:50 -04:00
Marty Schoch
5aa9e95468 major refactor of index/search API
index id's are now opaque (until finally returned to top-level user)
 - the TermFieldDoc's returned by TermFieldReader no longer contain doc id
 - instead they return an opaque IndexInternalID
 - items returned are still in the "natural index order"
 - but that is no longer guaranteed to be "doc id order"
 - correct behavior requires that they all follow the same order
 - but not any particular order

 - new API FinalizeDocID which converts index internal ID's to public string ID

 - APIs used internally which previously took doc id now take IndexInternalID
     - that is DocumentFieldTerms() and DocumentFieldTermsForFields()
 - however, APIs that are used externally do not reflect this change
     - that is Document()

 - DocumentIDReader follows the same changes, but this is less obvious
     - behavior clarified, used to iterate doc ids, BUT NOT in doc id order
     - method STILL available to iterate doc ids in range
     - but again, you won't get them in any meaningful order
     - new method to iterate actual doc ids from list of possible ids
         - this was introduced to make the DocIDSearcher continue working

searchers now work with the new opaque index internal doc ids
 - they return new DocumentMatchInternal (which does not have string ID)
scorerers also work with these opaque index internal doc ids
 - they return DocumentMatchInternal (which does not have string ID)
collectors now also perform a final step of converting the final result
 - they STILL return traditional DocumentMatch (with string ID)
 - but they now also require an IndexReader (so that they can do the conversion)
2016-07-31 13:46:18 -04:00
Steve Yen
988ca62182 optimize upside_down reader Next() with doc match reuse
This optimization changes the search.Search.Next() interface API,
adding an optional, pre-allocated *DocumentMatch parameter.

When it's non-nil, the TermSearcher and TermQueryScorer will use that
pre-allocated *DocumentMatch, instead of allocating a brand new
DocumentMatch instance.
2016-07-21 11:10:49 -07:00
Marty Schoch
194ee82c80 gofmt simplifications 2016-04-02 21:54:33 -04:00
Marty Schoch
900f1b4a67 major kvstore interface and impl overhaul
clarified the interface contract
2015-09-23 11:25:47 -07:00
Marty Schoch
539aeb8dc7 fix errors identified by errcheck
part of #169
2015-04-07 18:05:41 -04:00
Marty Schoch
c7443fe52b refactored API a bit
more things can return error now
in a couple of places we had to swallow errors because they didn't
fit the existing API.  in these case and proactively in a few
others we now return error as well.

also the batch API has been updated to allow performing
set/delete internal within the batch
2014-10-31 09:40:23 -04:00
Marty Schoch
198ca1ad4d major refactor of kvstore/index internals, see below
In the index/store package
introduce KVReader
  creates snapshot
  all read operations consistent from this snapshot
  must close to release

introduce KVWriter
  only one writer active
  access to all operations
  allows for consisten read-modify-write
  must close to release

introduce AssociativeMerge operation on batch
  allows efficient read-modify-write
  for associative operations
  used to consolidate updates to the term summary rows
  saves 1 set and 1 get op per shared instance of term in field

In the index package
introduced an IndexReader
  exposes a consisten snapshot of the index for searching

At top level
  All searches now operate on a consisten snapshot of the index
2014-09-12 17:21:35 -04:00
Marty Schoch
7a7eb2e94c add newline between license and package
this avoids cluttering godocs with the license
2014-09-02 10:54:50 -04:00
Marty Schoch
2ee7289bc8 major refactor of search package
this started initially to relocate highlighting into
a self contained package, which would then also use
the registry
however, it turned into a much larger refactor in
order to avoid cyclic imports
now facets, searchers, scorers and collectors
are also broken out into subpackages of search
2014-09-01 11:15:38 -04:00