if min and max are the same term
and the term is in the dictionary
and both min and max are set to exclusive,
then we would panic attempting to access element -1 of a slice.
now, after trimming the slice, we recheck that the length is > 0
before indexing into it.
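A minimal Go sketch of the guarded trim. It assumes the candidate terms arrive as a sorted []string already restricted to min <= t <= max; the helper name trimExclusive is hypothetical, not the searcher's actual code.

```go
package main

import "fmt"

// trimExclusive drops boundary terms for exclusive bounds. It assumes terms
// already holds every dictionary term t with minTerm <= t <= maxTerm, sorted.
func trimExclusive(terms []string, minTerm, maxTerm string, inclusiveMin, inclusiveMax bool) []string {
	if len(terms) > 0 && !inclusiveMin && terms[0] == minTerm {
		terms = terms[1:]
	}
	// Re-check the length: when minTerm == maxTerm and both bounds are
	// exclusive, the trim above can leave an empty slice, and
	// terms[len(terms)-1] would then read element -1 and panic.
	if len(terms) > 0 && !inclusiveMax && terms[len(terms)-1] == maxTerm {
		terms = terms[:len(terms)-1]
	}
	return terms
}

func main() {
	fmt.Println(trimExclusive([]string{"b"}, "b", "b", false, false))          // []
	fmt.Println(trimExclusive([]string{"a", "b", "c"}, "a", "c", false, true)) // [b c]
}
```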
there was a bug where if the circle described by the point
distance query crossed a pole, then we incorrectly built
a bounding box around it. this resulted in incorrect search results.
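The sketch below shows one standard way to bound a distance circle that may contain a pole, following Jan Matuschek's bounding-coordinates method. It assumes a spherical Earth with radius 6371008.8 m and a hypothetical helper circleBBox; it is not necessarily the exact fix made here. The key check is that a pole lies inside the circle exactly when the latitude range overflows +/-90 degrees, and in that case every longitude must be included.

```go
package main

import (
	"fmt"
	"math"
)

// earthRadiusMeters is an assumed mean Earth radius for a spherical model.
const earthRadiusMeters = 6371008.8

// circleBBox returns the bounding box, in degrees, of a circle of the given
// radius (meters) centred on (lon, lat).
func circleBBox(lon, lat, radius float64) (minLon, minLat, maxLon, maxLat float64) {
	const toRad, toDeg = math.Pi / 180, 180 / math.Pi
	radDist := radius / earthRadiusMeters // angular radius in radians
	minLatR := lat*toRad - radDist
	maxLatR := lat*toRad + radDist

	if minLatR > -math.Pi/2 && maxLatR < math.Pi/2 {
		// Neither pole is inside the circle: the longitude half-width
		// grows with latitude.
		deltaLon := math.Asin(math.Sin(radDist)/math.Cos(lat*toRad)) * toDeg
		minLon, maxLon = lon-deltaLon, lon+deltaLon
		// Wrap across the antimeridian; callers must treat minLon > maxLon
		// as a box that crosses +/-180.
		if minLon < -180 {
			minLon += 360
		}
		if maxLon > 180 {
			maxLon -= 360
		}
		return minLon, minLatR * toDeg, maxLon, maxLatR * toDeg
	}
	// The circle covers a pole: every longitude is reachable, so the box
	// spans all longitudes and latitude is clamped to [-90, 90].
	return -180, math.Max(minLatR*toDeg, -90), 180, math.Min(maxLatR*toDeg, 90)
}

func main() {
	fmt.Println(circleBBox(0, 89.9, 50000)) // pole case: longitudes -180..180, maxLat 90
	fmt.Println(circleBBox(0, 0, 1000))
}
```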
The term range query is not often used in full-text queries, but
it can be useful when filtering on keyword-indexed text terms in
the index.
The JSON syntax for a TermRange query is the same as for
NumericRange, but the min/max values must be strings, not
float64 values.
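To make the string-versus-float64 point concrete, this sketch decodes query JSON into a local struct with Go's encoding/json. The struct and its keys (min, max, inclusive_min, inclusive_max, field) are assumptions modeled on the NumericRange query's JSON, not an import of the real query type.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// termRange is a hypothetical local mirror of the TermRange JSON shape.
// Note that Min and Max are strings, unlike NumericRange's float64 bounds.
type termRange struct {
	Min          string `json:"min"`
	Max          string `json:"max"`
	InclusiveMin *bool  `json:"inclusive_min,omitempty"`
	InclusiveMax *bool  `json:"inclusive_max,omitempty"`
	Field        string `json:"field,omitempty"`
}

// parseTermRange decodes a query; numeric bounds produce a type error.
func parseTermRange(data string) (termRange, error) {
	var q termRange
	err := json.Unmarshal([]byte(data), &q)
	return q, err
}

func main() {
	q, err := parseTermRange(`{"field": "status", "min": "active", "max": "pending", "inclusive_max": false}`)
	fmt.Println(q.Min, q.Max, err)
	// A float64 bound cannot be decoded into a string field.
	_, err = parseTermRange(`{"field": "status", "min": 1.0, "max": 2.0}`)
	fmt.Println(err != nil)
}
```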
- TermSearcher has an alternate constructor that takes the term as []byte, which
can avoid copying in some cases. TermScorer was updated to accept a []byte
term. Also removed a few struct fields which were not being used.
- New MultiTermSearcher searches for documents containing any of a list of
terms. Current implementation simply uses DisjunctionSearcher.
- Several other searcher constructors now simply build a list of terms and
then delegate to the MultiTermSearcher:
- NewPrefixSearcher
- NewRegexpSearcher
- NewFuzzySearcher
- NewNumericRangeSearcher
- NewGeoBoundingBoxSearcher and NewGeoPointDistanceSearcher make use of
the MultiTermSearcher internally, and follow the pattern of returning
an existing search.Searcher, as opposed to their own wrapping struct.
- Callback filter functions used in NewGeoBoundingBoxSearcher and
NewGeoPointDistanceSearcher have been extracted into separate functions
which makes the code much easier to read.
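The delegation pattern can be sketched with a toy in-memory index: the prefix constructor only decides which terms qualify and then hands the list to a multi-term search that unions their postings. The index type and method names are hypothetical, and unlike a real DisjunctionSearcher, which merges postings lazily, this sketch materializes the whole result.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// index is a toy inverted index (term -> doc IDs), standing in for the
// real term dictionary and postings lists.
type index map[string][]int

// multiTermSearch returns the sorted union of documents containing any of
// the given terms, i.e. a disjunction of one term search per term.
func (ix index) multiTermSearch(terms []string) []int {
	seen := map[int]bool{}
	var docs []int
	for _, t := range terms {
		for _, d := range ix[t] {
			if !seen[d] {
				seen[d] = true
				docs = append(docs, d)
			}
		}
	}
	sort.Ints(docs)
	return docs
}

// prefixSearch only chooses the matching terms, then delegates.
func (ix index) prefixSearch(prefix string) []int {
	var terms []string
	for t := range ix {
		if strings.HasPrefix(t, prefix) {
			terms = append(terms, t)
		}
	}
	return ix.multiTermSearch(terms)
}

func main() {
	ix := index{"car": {1, 3}, "cart": {2, 3}, "dog": {4}}
	fmt.Println(ix.prefixSearch("car")) // [1 2 3]
}
```

Regexp, fuzzy and numeric-range constructors would differ only in how they pick terms; the search itself stays in one place.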
also added a flag to the bounding box searcher to optionally skip
the boundary check. this is useful when other searchers are going
to check every point anyway by some other criteria.
- added readme pointing back to lucene origins
- improved documentation of exported methods in geo package
- improved test coverage to 100% on geo package
- added support for parsing geojson style points
- removed some duplicated code in the geo bounding box searcher
New field type GeoPointField, or "geopoint" in mapping JSON.
Currently structs and maps are considered when a mapping explicitly
marks a field as type "geopoint". Several variants of "lon", "lng", and "lat"
are looked for in map keys, struct field names, or method names.
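A sketch of the kind of lookup described above, assuming decoded JSON-like values. extractLonLat is a hypothetical helper and the key variants it checks are an illustrative subset, not the full list the mapping code recognizes. GeoJSON coordinate arrays list longitude before latitude, which is easy to get backwards.

```go
package main

import (
	"fmt"
	"strings"
)

// extractLonLat is hypothetical. It accepts a map with longitude under
// "lon" or "lng" and latitude under "lat" (compared case-insensitively),
// or a GeoJSON-style []float64 in [lon, lat] order.
func extractLonLat(v interface{}) (lon, lat float64, ok bool) {
	switch val := v.(type) {
	case []float64:
		// GeoJSON puts longitude first, then latitude.
		if len(val) == 2 {
			return val[0], val[1], true
		}
	case map[string]interface{}:
		var foundLon, foundLat bool
		for k, raw := range val {
			f, isNum := raw.(float64)
			if !isNum {
				continue
			}
			switch strings.ToLower(k) {
			case "lon", "lng":
				lon, foundLon = f, true
			case "lat":
				lat, foundLat = f, true
			}
		}
		if foundLon && foundLat {
			return lon, lat, true
		}
	}
	return 0, 0, false
}

func main() {
	fmt.Println(extractLonLat(map[string]interface{}{"lng": -71.06, "Lat": 42.36}))
	fmt.Println(extractLonLat([]float64{-71.06, 42.36}))
}
```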
New query type GeoBoundingBoxQuery searches for documents which have a
GeoPointField indexed with a value that is inside the specified bounding box.
New query type GeoDistanceQuery searches for documents which have a
GeoPointField indexed with a value that is less than or equal to the
specified distance from the specified location.
New sort by method "geo_distance". Hits can be sorted by their distance
from the specified location.
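For the distance query and the geo_distance sort, a common building block is the haversine great-circle distance. This sketch assumes a spherical Earth (radius 6371008.8 m) and uses hypothetical point and withinDistance names; the geo package ported from Lucene may compute distance differently.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// earthRadiusMeters is an assumed mean Earth radius.
const earthRadiusMeters = 6371008.8

type point struct {
	ID       string
	Lon, Lat float64
}

// haversine returns the great-circle distance in meters between two
// points given in degrees.
func haversine(lon1, lat1, lon2, lat2 float64) float64 {
	const toRad = math.Pi / 180
	dLat := (lat2 - lat1) * toRad
	dLon := (lon2 - lon1) * toRad
	a := math.Sin(dLat/2)*math.Sin(dLat/2) +
		math.Cos(lat1*toRad)*math.Cos(lat2*toRad)*math.Sin(dLon/2)*math.Sin(dLon/2)
	return 2 * earthRadiusMeters * math.Asin(math.Sqrt(a))
}

// withinDistance keeps points at most maxMeters from (lon, lat), which is
// the "less than or equal" rule, and sorts them nearest first.
func withinDistance(pts []point, lon, lat, maxMeters float64) []point {
	var hits []point
	for _, p := range pts {
		if haversine(lon, lat, p.Lon, p.Lat) <= maxMeters {
			hits = append(hits, p)
		}
	}
	sort.Slice(hits, func(i, j int) bool {
		return haversine(lon, lat, hits[i].Lon, hits[i].Lat) <
			haversine(lon, lat, hits[j].Lon, hits[j].Lat)
	})
	return hits
}

func main() {
	pts := []point{{"a", 0, 0.01}, {"b", 0, 0.001}, {"c", 0, 1}}
	for _, h := range withinDistance(pts, 0, 0, 5000) {
		fmt.Println(h.ID)
	}
}
```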
New geo utility package with all routines ported from Lucene.
New FilteringSearcher, which wraps an existing Searcher, but filters
all hits with a user-provided callback.
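The wrapping idea can be sketched with simplified types. Hit, Searcher, sliceSearcher and filteringSearcher here are hypothetical stand-ins; the real search.Searcher interface has a richer signature and more methods.

```go
package main

import "fmt"

// Hit is a simplified stand-in for a search hit.
type Hit struct{ ID string }

// Searcher is a simplified stand-in: Next returns the next hit, or nil when done.
type Searcher interface {
	Next() *Hit
}

// sliceSearcher is a toy Searcher over a fixed list of hits.
type sliceSearcher struct {
	hits []*Hit
	i    int
}

func (s *sliceSearcher) Next() *Hit {
	if s.i >= len(s.hits) {
		return nil
	}
	h := s.hits[s.i]
	s.i++
	return h
}

// filteringSearcher wraps another Searcher and skips every hit the
// user-provided accept callback rejects.
type filteringSearcher struct {
	child  Searcher
	accept func(*Hit) bool
}

func (f *filteringSearcher) Next() *Hit {
	for h := f.child.Next(); h != nil; h = f.child.Next() {
		if f.accept(h) {
			return h
		}
	}
	return nil
}

func main() {
	child := &sliceSearcher{hits: []*Hit{{ID: "a"}, {ID: "b"}, {ID: "c"}}}
	f := &filteringSearcher{child: child, accept: func(h *Hit) bool { return h.ID != "b" }}
	for h := f.Next(); h != nil; h = f.Next() {
		fmt.Println(h.ID) // a, then c
	}
}
```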
the logic of how a phrase search works should be an internal
detail of the phrase searcher. further, these changes will
allow proper scoring of phrase matches, which requires access
to the underlying searcher objects, which were hidden in the
previous approach.
this originated from a misunderstanding of mine going back
several years. the values need not be float64 just because
we plan to serialize them as json.
there are still larger questions about what the right type should
be, and where any conversions should go. but this commit
simply attempts to address the most egregious problems.
at the core, the Next() method moves another searcher forward
and checks each hit to see if it also satisfies the phrase
constraints. the current implementation has 4 nested for loops.
these nested loops make it harder to read (indentation) and harder
to reason about (complexity).
this refactor does not remove any loops, it simply moves some
of the inner loops into separate methods so that one can
more easily reason about the parts separately.
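The phrase check itself can be split the same way. This sketch assumes each phrase term's positions within one field are already available as a []int; phraseMatches, phraseAt and contains are hypothetical names, each owning one of the loops.

```go
package main

import "fmt"

// phraseMatches reports whether some position p of the first term has
// term i at position p+i for every later term i.
func phraseMatches(positions [][]int) bool {
	if len(positions) == 0 {
		return false
	}
	for _, start := range positions[0] {
		if phraseAt(positions, start) {
			return true
		}
	}
	return false
}

// phraseAt checks every later term against one candidate start position.
func phraseAt(positions [][]int, start int) bool {
	for i := 1; i < len(positions); i++ {
		if !contains(positions[i], start+i) {
			return false
		}
	}
	return true
}

// contains is the innermost loop, pulled out on its own.
func contains(xs []int, want int) bool {
	for _, x := range xs {
		if x == want {
			return true
		}
	}
	return false
}

func main() {
	// "quick" at 2 and 7, "brown" at 3, "fox" at 4: the phrase starts at 2.
	fmt.Println(phraseMatches([][]int{{2, 7}, {3}, {4}})) // true
	fmt.Println(phraseMatches([][]int{{2}, {5}, {6}}))    // false
}
```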
While researching an observed performance issue with wildcard
queries, we found that the LiteralPrefix() method on
the regexp.Regexp struct did not always behave as expected.
In particular, when the pattern starts with ^ AND involves
some backtracking, LiteralPrefix() seems to always return the
empty string.
This matters because we rely on having a helpful prefix
to reduce the number of terms in the term dictionary
that need to be visited; with an empty prefix, every term
must be visited.
This change makes the searcher enforce start/end on the term
directly, by using FindStringIndex() instead of Match() and
checking that the match spans the entire term.
We also modified WildcardQuery and RegexpQuery to no
longer include the ^ and $ anchors.
Documentation was also updated to instruct users that they should
not include the ^ and $ anchors in their patterns.
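A small sketch of the approach using Go's standard regexp package: compile the pattern without ^ or $, read LiteralPrefix() to narrow the dictionary scan, and check that the FindStringIndex() result covers the whole term. The helper names are hypothetical. Calling Longest() is a choice made for this sketch, not something stated above: with Go's default leftmost-first semantics, a pattern like a|ab reports [0 1] on "ab" and would wrongly reject the term.

```go
package main

import (
	"fmt"
	"regexp"
)

// compileUnanchored compiles a pattern that must not contain ^ or $ and
// switches it to leftmost-longest matching.
func compileUnanchored(pattern string) *regexp.Regexp {
	re := regexp.MustCompile(pattern)
	re.Longest()
	return re
}

// matchesWholeTerm enforces the start/end anchoring in code: the match
// must begin at offset 0 and end at the last byte of the term.
func matchesWholeTerm(re *regexp.Regexp, term string) bool {
	loc := re.FindStringIndex(term)
	return loc != nil && loc[0] == 0 && loc[1] == len(term)
}

func main() {
	re := compileUnanchored("ab.*c")
	// Without a leading ^, the literal prefix survives and can be used to
	// seek into the term dictionary.
	prefix, _ := re.LiteralPrefix()
	fmt.Printf("literal prefix %q\n", prefix)
	for _, term := range []string{"abc", "abxxc", "xabc", "abcd"} {
		fmt.Println(term, matchesWholeTerm(re, term))
	}
}
```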
This is a change in search result behavior in that location
information is no longer provided by default with search results.
Although this looks like a wide-ranging change, it's mostly a
mechanical replacement of the explain bool flag with a new
search.SearcherOptions struct, which holds both the Explain bool flag
and the IncludeTermVectors bool flag.
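The mechanical change amounts to swapping a bare explain bool for an options struct. This sketch mirrors only the two flags named above; termSearcher, newTermSearcher and locations are hypothetical, and SearcherOptions here is a local stand-in for search.SearcherOptions.

```go
package main

import "fmt"

// SearcherOptions is a local stand-in carrying the two flags named above.
type SearcherOptions struct {
	Explain            bool
	IncludeTermVectors bool
}

// termSearcher is hypothetical; it exists only to show the signature change.
type termSearcher struct {
	term    string
	options SearcherOptions
}

// Previously a constructor like this took (term string, explain bool);
// now every flag travels together in one options value.
func newTermSearcher(term string, options SearcherOptions) *termSearcher {
	return &termSearcher{term: term, options: options}
}

// locations passes term-vector positions through only when requested, so
// with IncludeTermVectors unset no location data is attached to hits.
func (s *termSearcher) locations(positions []int) []int {
	if !s.options.IncludeTermVectors {
		return nil
	}
	return positions
}

func main() {
	plain := newTermSearcher("geo", SearcherOptions{Explain: true})
	fmt.Println(plain.locations([]int{3, 9})) // []
	withTV := newTermSearcher("geo", SearcherOptions{IncludeTermVectors: true})
	fmt.Println(withTV.locations([]int{3, 9})) // [3 9]
}
```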
This commit reverts a previous optimization attempt 3f588cd4a that
tried to trim or shrink the array of child searchers in a
search-disjunction.
Although I am not sure why at the moment, that optimization
broke higher-level boolean queries, so I am reverting it to
restore that functionality.
Disjunction searchers are used heavily by higher-level searchers, like
prefix searchers. In that case, a disjunction searcher might have
many thousands of child searchers.
This commit adds an optimization to close each child term searcher as
soon as it is finished and to remove it from the
disjunction searcher's children.
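A sketch of the close-and-drop idea with a hypothetical child type that yields sorted document IDs. Exhausted children are closed right after the step that finished them and filtered out in place, so later calls to Next scan only live children. The types and names are illustrative, not the disjunction searcher's actual code.

```go
package main

import "fmt"

// child is a hypothetical stand-in for a term searcher: it yields sorted
// doc IDs and records whether Close was called.
type child struct {
	docs   []int
	pos    int
	closed bool
}

func (c *child) current() (int, bool) {
	if c.pos < len(c.docs) {
		return c.docs[c.pos], true
	}
	return 0, false
}

func (c *child) Close() { c.closed = true }

// disjunction merges its children's doc IDs in ascending order.
type disjunction struct{ children []*child }

func newDisjunction(children []*child) *disjunction {
	// Copy so the in-place filtering below never touches the caller's slice.
	d := &disjunction{children: append([]*child(nil), children...)}
	d.dropFinished()
	return d
}

// dropFinished closes exhausted children and removes them in place.
func (d *disjunction) dropFinished() {
	live := d.children[:0]
	for _, c := range d.children {
		if _, ok := c.current(); ok {
			live = append(live, c)
		} else {
			c.Close() // release resources as soon as the child is done
		}
	}
	for i := len(live); i < len(d.children); i++ {
		d.children[i] = nil // let dropped children be garbage collected
	}
	d.children = live
}

// Next returns the smallest current doc ID and advances every child on it.
func (d *disjunction) Next() (int, bool) {
	if len(d.children) == 0 {
		return 0, false
	}
	next, _ := d.children[0].current()
	for _, c := range d.children[1:] {
		if doc, _ := c.current(); doc < next {
			next = doc
		}
	}
	for _, c := range d.children {
		if doc, _ := c.current(); doc == next {
			c.pos++
		}
	}
	d.dropFinished()
	return next, true
}

func main() {
	a := &child{docs: []int{1, 3}}
	b := &child{docs: []int{2, 3, 5}}
	d := newDisjunction([]*child{a, b})
	for doc, ok := d.Next(); ok; doc, ok = d.Next() {
		fmt.Println(doc, "live children:", len(d.children))
	}
}
```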