bleve

Author	SHA1	Message	Date
Marty Schoch	5aa9e95468	major refactor of index/search API index id's are now opaque (until finally returned to top-level user) - the TermFieldDoc's returned by TermFieldReader no longer contain doc id - instead they return an opaque IndexInternalID - items returned are still in the "natural index order" - but that is no longer guaranteed to be "doc id order" - correct behavior requires that they all follow the same order - but not any particular order - new API FinalizeDocID which converts index internal ID's to public string ID - APIs used internally which previously took doc id now take IndexInternalID - that is DocumentFieldTerms() and DocumentFieldTermsForFields() - however, APIs that are used externally do not reflect this change - that is Document() - DocumentIDReader follows the same changes, but this is less obvious - behavior clarified, used to iterate doc ids, BUT NOT in doc id order - method STILL available to iterate doc ids in range - but again, you won't get them in any meaningful order - new method to iterate actual doc ids from list of possible ids - this was introduced to make the DocIDSearcher continue working searchers now work with the new opaque index internal doc ids - they return new DocumentMatchInternal (which does not have string ID) scorerers also work with these opaque index internal doc ids - they return DocumentMatchInternal (which does not have string ID) collectors now also perform a final step of converting the final result - they STILL return traditional DocumentMatch (with string ID) - but they now also require an IndexReader (so that they can do the conversion)	2016-07-31 13:46:18 -04:00
Steve Yen	4822cff63a	optimize Advance() with pre-allocated in-out param This perf-related change helps the code and API reach more similarity with the Next() methods, which now take a pre-allocate param.	2016-07-29 14:15:00 -07:00
Steve Yen	b8c8478783	optimize collector to check ctx.Done() only occasionally	2016-07-21 11:10:49 -07:00
Steve Yen	b564ebbfbe	optimization comments on DocumentMatch instance reuse	2016-07-21 11:10:49 -07:00
Steve Yen	988ca62182	optimize upside_down reader Next() with doc match reuse This optimization changes the search.Search.Next() interface API, adding an optional, pre-allocated DocumentMatch parameter. When it's non-nil, the TermSearcher and TermQueryScorer will use that pre-allocated DocumentMatch, instead of allocating a brand new DocumentMatch instance.	2016-07-21 11:10:49 -07:00
Marty Schoch	2043bb4bf8	fix pagination bug introduced by collector optimization fixes #378 this bug was introduced by: `f2aba116c4` theory of operation for this collector (top N, skip K) - collect the highest scoring N+K results - if K > 0, skip K and return the next N internal details - the top N+K are kept in a list - the list is ordered from lowest scoring (first) to highest scoring (last) - as a hit comes in, we find where this new hit would fit into this list - if this caused the list to get too big, trim off the head (lowest scoring hit) theory of the optimization - we were not tracking the lowest score in the list - so if the score was lower than the lowest score, we would add/remove it - by keeping track of the lowest score in the list, we can avoid these ops problem with the optimization - the optimization worked by returning early - by returning early there was a subtle change to documents which had the same score - the reason is that which docs end up in the top N+K changed by returning early - why was that? docs are coming in, in order by key ascending - when finding the correct position to insert a hit into the list, we checked <, not <= the score - this has the subtle effect that docs with the same score end up in reverse order for example consider the following in progress list: doc ids [ c a b ] scores [ 1 5 9 ] if we now see doc d with score 5, we get: doc ids [ c a d b ] scores [ 1 5 5 9 ] While that appears in order (a, d) it is actually reverse order, because when we produce the top N we start at the end. theory of the fix - previous pagination depended on later hits with the same score "bumping" earlier hits with the same score off the bottom of the list - however, if we change the logic to <= instead of <, now the list in the previous example would look like: doc ids [ c d a b ] scores [ 1 5 5 9 ] - this small change means that now earlier (lower id) will score higher, and thus we no longer depend on later hits bumping things down, which means returning early is a valid thing to do NOTE: this does depend on the hits coming back in order by ID. this is not something strictly guaranteed, but it was the same assumption that allowed the original behavior This also has the side-effect that 2 hits with the same score come back in ascending ID order, which is somehow more pleasing to me than reverse order.	2016-06-01 11:35:18 -04:00
slavikm	f2aba116c4	Make top score collector about 7 times faster	2016-04-29 09:46:47 -07:00
Marty Schoch	0b2380d9bf	introduce ability for searches to timeout or be cancelled our implementation uses: golang.org/x/net/context New method SearchInContext() allows the user to run a search in the provided context. If that context is cancelled or exceeds its deadline Bleve will attempt to stop and return as soon as possible. This is a best effort attempt at this time and may not be in a timely manner. If the caller must return very near the timeout, the call should also be wrapped in a goroutine. The IndexAlias implementation is affected in a slightly more complex way. In order to return partial results when a timeout occurs on some indexes, the timeout is strictly enforced, and at the moment this does introduce an additional goroutine. The Bleve implementation honoring the context is currently very course-grained. Specifically we check the Done() channel between each DocumentMatch produced during the search. In the future we will propogate the context deeper into the internals of Bleve, and this will allow finer-grained timeout behavior.	2016-03-02 17:30:21 -05:00
Marty Schoch	c8d974048a	fix issues identified by errcheck part of #169	2015-04-07 14:59:35 -04:00
Marty Schoch	300ec79c96	first pass at checking errors that were ignored part of #169	2015-03-06 14:46:29 -05:00
Marty Schoch	7284c10020	added benchmark to collector	2015-03-06 12:59:44 -05:00
Marty Schoch	e1b77956d4	more golint cleanups	2014-09-03 18:47:02 -04:00
Marty Schoch	7a7eb2e94c	add newline between license and package this avoids cluttering godocs with the license	2014-09-02 10:54:50 -04:00
Marty Schoch	2ee7289bc8	major refactor of search package this started initially to relocate highlighting into a self contained package, which would then also use the registry however, it turned into a much larger refactor in order to avoid cyclic imports now facets, searchers, scorers and collectors are also broken out into subpackages of search	2014-09-01 11:15:38 -04:00

14 Commits