From diagnosing a recent issue where the termSearchersFinished stats
were incorrectly tracked, I ended up scouring the Close() / cleanup
codepaths.
This change takes more care in Close()'ing child searchers, especially
in error situations. This can be important to allow underlying
kvstore's to release resources.
the motivation for this commit is long and detailed and has been
documented externally here:
https://gist.github.com/mschoch/5cc5c9cf4669a5fe8512cb7770d3c1a2
the core of the changes are:
1. recognize that collector/searcher need only a fixed number
of DocumentMatch instances, and this number can be determined
from the structure of the query, not the size of the data
2. knowing this, instances can be allocated in bulk, up front
and they can be reused without locking (since all search
operations take place in a single goroutine
3. combined with previous commits which enabled reuse of
the IndexInternalID []byte, this allows for no allocation/copy
of these bytes as well (by using DocumentMatch Reset() method
when returning entries to the pool
instead of separate DocumentMatch/DocumentMatchInternal
rules are simple, everything operates on the IndexInternalID field
until the results are returned, then ID is set correctly
the IndexInternalID field is not exported to JSON
index id's are now opaque (until finally returned to top-level user)
- the TermFieldDoc's returned by TermFieldReader no longer contain doc id
- instead they return an opaque IndexInternalID
- items returned are still in the "natural index order"
- but that is no longer guaranteed to be "doc id order"
- correct behavior requires that they all follow the same order
- but not any particular order
- new API FinalizeDocID which converts index internal ID's to public string ID
- APIs used internally which previously took doc id now take IndexInternalID
- that is DocumentFieldTerms() and DocumentFieldTermsForFields()
- however, APIs that are used externally do not reflect this change
- that is Document()
- DocumentIDReader follows the same changes, but this is less obvious
- behavior clarified, used to iterate doc ids, BUT NOT in doc id order
- method STILL available to iterate doc ids in range
- but again, you won't get them in any meaningful order
- new method to iterate actual doc ids from list of possible ids
- this was introduced to make the DocIDSearcher continue working
searchers now work with the new opaque index internal doc ids
- they return new DocumentMatchInternal (which does not have string ID)
scorerers also work with these opaque index internal doc ids
- they return DocumentMatchInternal (which does not have string ID)
collectors now also perform a final step of converting the final result
- they STILL return traditional DocumentMatch (with string ID)
- but they now also require an IndexReader (so that they can do the conversion)
This optimization changes the search.Search.Next() interface API,
adding an optional, pre-allocated *DocumentMatch parameter.
When it's non-nil, the TermSearcher and TermQueryScorer will use that
pre-allocated *DocumentMatch, instead of allocating a brand new
DocumentMatch instance.
these searchers incorrectly called Next() on their underlying
searcher, instead of Advance(). this can cause values to be
returned with an ID less than the one that was Advanced() to,
which violates the contract, and causes other incorrect behavior.
fixes#342
regexp, fuzzy and numeric range searchers now check to see if
they will be exceeding a configured DisjunctionMaxClauseCount
and stop work earlier, this does a better job of avoiding
situations which consume all available memory for an operation
they cannot complete