previously, the only way to get numeric matching was with the
range operators :> :>= :< :<=
now, when we encounter field:val if the val can be parsed as a
number, then we do a disjunction search, which includes
searching for val as a text term and as an exact numeric value
1) disjunction and conjunction queries now support a
"query string mode". By default they do not operate
in this mode. When in this mode, any disjunct/conjunct
which evaluates to MatchNone searcher, will be removed
from the disjunction/conjunction. If the query ends
up with NO conjuncts/disjuncts, it will itself
return the MatchNone seacher.
2) boolean query also supports a query string mode. when in
this mode, the Must, Should and MustNot searchers are all put
into query string mode.
3) rewriting of negation only queries (like -foo) now take into
account the rewriting rules above, and those are handled first.
this means that we rewrite correctly in case of +stoword -foo
4) the empty query string is now valid, and returns 0 hits.
previously this was considered a validation error.
we now use a multiphrase query in all cases
internally its optimized to be the same as regular phrase query
anyway, and we simplly map all the tokens in the stream into
a multi-phrase query with the appropriate structure
when parsing json, when we encounter the key "terms", we first
try to parse as traditional phrase query, then if that fails,
we also try parsing it as multi-phrase
the logic of how a phrase search works should be an internal
detail of the phrase searcher. further, these changes will
allow proper scoring of phrase matches, which require access
to the underlying searcher objects, which were hidden in the
previous approach.
While researching an observed performance issue with wildcard
queries, it was observed that the LiteralPrefix() method on
the regexp.Regexp struct did not always behave as expected.
In particular, when the pattern starts with ^, AND involves
some backtracking, the LiteralPrefix() seems to always be the
empty string.
The side-effect of this is that we rely on having a helpful
prefix, to reduce the number of terms in the term dictionary
that need to be visited.
This change now makes the searcher enforce start/end on the term
directly, by using FindStringIndex() instead of Match().
Next, we also modified WildcardQuery and RegexpQuery to no
longer include the ^ and $ modifiers.
Documentation was also udpated to instruct users that they should
not include the ^ and $ modifiers in their patterns.
This is a change in search result behavior in that location
information is no longer provided by default with search results.
Although this looks like a wide-ranging change, it's mostly a
mechanical replacement of the explain bool flag with a new
search.SearcherOptions struct, which holds both the Explain bool flag
and the IncludeTermVectors bool flag.
From diagnosing a recent issue where the termSearchersFinished stats
were incorrectly tracked, I ended up scouring the Close() / cleanup
codepaths.
This change takes more care in Close()'ing child searchers, especially
in error situations. This can be important to allow underlying
kvstore's to release resources.
as we are a Go library is this the much more natural way to
express such queries.
support for strings is still supported through json marshal
and unmarshal, as well as inside query string queries
as before we use the package level QueryDateTimeParser to
deterimine which date time parser to use for parsing
only serializing out to json, we consult a new package
variable: QueryDateTimeFormat
this addresses the longstanding PR #255
Boostable, Fieldable, Validatable broken out into separate
interfaces. This allows them to be discoverable when
needed, but ignorable otherwise. The top-level bleve package
only every cares about Validatable and even that is optional.
Also, this change goes further to make the structure names
more reasonable, for cases where you're directly interacting
with the structures.