fixes #378
this bug was introduced by:
f2aba116c4
theory of operation for this collector (top N, skip K)
- collect the highest scoring N+K results
- if K > 0, skip K and return the next N
internal details
- the top N+K are kept in a list
- the list is ordered from lowest scoring (first) to highest scoring (last)
- as a hit comes in, we find where this new hit would fit into this list
- if this caused the list to get too big, trim off the head (lowest scoring hit)
theory of the optimization
- we were not tracking the lowest score in the list
- so even when a new hit scored lower than the lowest score, we would
still insert it and then immediately trim it back off
- by keeping track of the lowest score in the list, we can skip these
hits entirely and avoid those ops (see the sketch below)
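as a sketch, the guard looks roughly like this (illustrative Go, not
bleve's actual code; DocumentMatch, topNCollector and its fields are
hypothetical stand-ins):

    // DocumentMatch is a hypothetical stand-in for a scored hit.
    type DocumentMatch struct {
        ID    string
        Score float64
    }

    // topNCollector is an illustrative sketch, not bleve's collector.
    type topNCollector struct {
        results  []*DocumentMatch // ordered lowest (first) to highest (last)
        capacity int              // N + K
        minScore float64          // lowest score currently in results
    }

    // Collect applies the early-return optimization: once the list is
    // full, hits scoring below the current minimum are dropped without
    // any insert-then-trim work. insertHit is sketched further below.
    func (c *topNCollector) Collect(d *DocumentMatch) {
        if len(c.results) >= c.capacity && d.Score < c.minScore {
            return
        }
        c.insertHit(d)
    }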
problem with the optimization
- the optimization worked by returning early
- returning early subtly changed how documents with the same score were handled
- the reason is that returning early changed which docs end up in the top N+K
- why was that? docs come in ordered by key, ascending
- when finding the correct position to insert a hit into the list, we
compared scores with <, not <=
- this has the subtle effect that docs with the same score end up in reverse order
for example consider the following in progress list:
doc ids [ c a b ]
scores [ 1 5 9 ]
if we now see doc d with score 5, we get:
doc ids [ c a d b ]
scores [ 1 5 5 9 ]
While that appears in order (a, d), it is actually reverse order, because
when we produce the top N we start at the end.
theory of the fix
- previous pagination depended on later hits with the same score "bumping" earlier
hits with the same score off the bottom of the list
- however, if we change the logic to <= instead of <, now the list in the previous
example would look like:
doc ids [ c d a b ]
scores [ 1 5 5 9 ]
- this small change means that earlier hits (lower ID) now sort higher, and
thus we no longer depend on later hits bumping things down, which means
returning early is a valid thing to do (sketched below)
NOTE: this does depend on the hits coming back in order by ID. this is not
something strictly guaranteed, but it was the same assumption that allowed the
original behavior
This also has the side-effect that 2 hits with the same score come back in
ascending ID order, which is somehow more pleasing to me than reverse order.
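continuing the sketch from above, insertHit shows the fix: comparing
with <= rather than < inserts a new hit before existing hits that share
its score, so same-score hits stay in ascending ID order and the early
return in Collect no longer changes which docs survive:

    // insertHit finds the insert position by comparing with <= (the
    // fix) instead of <, inserts the hit, and trims the head (lowest
    // scoring hit) if the list grew past capacity.
    func (c *topNCollector) insertHit(d *DocumentMatch) {
        pos := len(c.results) // default: new highest score, goes last
        for i, curr := range c.results {
            if d.Score <= curr.Score { // <= places d before equal scores
                pos = i
                break
            }
        }
        c.results = append(c.results, nil)
        copy(c.results[pos+1:], c.results[pos:])
        c.results[pos] = d
        if len(c.results) > c.capacity {
            c.results = c.results[1:] // trim lowest scoring hit
        }
        c.minScore = c.results[0].Score
    }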
parsing of date ranges in queries no longer consults the
index mapping. it was determined that this wasn't very useful
and led to overly complicated query syntax/behavior.
instead, applications can set the datetime parser used for
date range queries with the top-level config QueryDateTimeParser
also, we now support querying date ranges in the query string,
the syntax is:
field:>"date"
the >, >=, < and <= operators are supported
the date must be surrounded by quotes
and must parse in the configured date format
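for example (a minimal sketch; the index path and the created field
are hypothetical, and the field must parse with the configured date
format):

    package main

    import (
        "fmt"

        "github.com/blevesearch/bleve"
    )

    func main() {
        index, err := bleve.Open("example.bleve")
        if err != nil {
            panic(err)
        }
        // the date must be quoted and parse with the configured format
        q := bleve.NewQueryStringQuery(`created:>="2016-01-01"`)
        res, err := index.Search(bleve.NewSearchRequest(q))
        if err != nil {
            panic(err)
        }
        fmt.Println(res)
    }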
regexp, fuzzy and numeric range searchers now check whether
they will exceed a configured DisjunctionMaxClauseCount
and stop work earlier. this does a better job of avoiding
situations which consume all available memory for an operation
that cannot complete
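the shape of the guard is roughly this (an illustrative sketch, not
bleve's exact code; only DisjunctionMaxClauseCount is named by this
change, the helper names are hypothetical):

    package searcher

    import "fmt"

    // DisjunctionMaxClauseCount is the configured limit; 0 disables it.
    var DisjunctionMaxClauseCount = 0

    func tooManyClauses(count int) bool {
        return DisjunctionMaxClauseCount != 0 && count > DisjunctionMaxClauseCount
    }

    // checkCandidateTerms is a hypothetical helper: counting the
    // candidate terms up front lets a regexp/fuzzy/numeric range
    // searcher fail fast instead of allocating until memory runs out.
    func checkCandidateTerms(terms []string) error {
        if tooManyClauses(len(terms)) {
            return fmt.Errorf("too many clauses: %d > %d",
                len(terms), DisjunctionMaxClauseCount)
        }
        return nil
    }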
you can now use terms like:
test?string*
and similar text in query strings to perform wildcard
searches. also if you use:
/aregexp/
it will perform a regexp search as well
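for example (a small sketch using bleve's query string parser):

    package main

    import (
        "fmt"

        "github.com/blevesearch/bleve"
    )

    func main() {
        // ? matches a single character, * matches any run of characters
        wildcard := bleve.NewQueryStringQuery("test?string*")

        // text wrapped in slashes is parsed as a regexp query
        regex := bleve.NewQueryStringQuery("/aregexp/")

        fmt.Println(wildcard, regex)
    }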
Currently a bleve batch is built by a user goroutine
and then read by a bleve goroutine.
This is still safe when used correctly.
However, Reset() will modify the maps, which introduces a data race.
The fix is to simply make batch.Reset() allocate new maps.
This provides a data-access pattern that can be used safely.
Also, this thread argues that creating a new map may be faster
than trying to reuse an existing one:
https://groups.google.com/d/msg/golang-nuts/UvUm3LA1u8g/jGv_FobNpN0J
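The shape of the fix (a minimal sketch, assuming a Batch that keeps
its pending operations in maps; the field types here are illustrative):

    type Batch struct {
        IndexOps    map[string]interface{}
        InternalOps map[string][]byte
    }

    // Reset allocates fresh maps rather than deleting keys in place,
    // so a bleve goroutine still reading the old maps never observes
    // a concurrent modification.
    func (b *Batch) Reset() {
        b.IndexOps = make(map[string]interface{})
        b.InternalOps = make(map[string][]byte)
    }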
Separate but related, I have opted to remove the "unsafe batch"
checking that we did. This was always limited anyway, and
users of Go 1.6 are now just as likely to get a panic from the
runtime for concurrent map access. So, the price we paid
(an additional mutex) is not worth it.
fixes #360 and #260
this change improves compatibility with the simple analyzer
defined by Lucene. this has important implications for
some perf tests as well, since they often use the simple
analyzer.
this change only affects JSON parsing. any search request which
omits the size field entirely now defaults to 10, which
is the same behavior as NewSearchRequest().
0 is still a valid size, but must be set explicitly
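one way to implement this (a sketch, not necessarily bleve's exact
code) is to unmarshal into a temporary struct with a pointer field,
so an omitted size can be distinguished from an explicit 0:

    import "encoding/json"

    // SearchRequest is trimmed to the relevant field for this sketch.
    type SearchRequest struct {
        Size int `json:"size"`
    }

    func (r *SearchRequest) UnmarshalJSON(input []byte) error {
        var temp struct {
            Size *int `json:"size"`
        }
        if err := json.Unmarshal(input, &temp); err != nil {
            return err
        }
        if temp.Size == nil {
            r.Size = 10 // size omitted: same default as NewSearchRequest()
        } else {
            r.Size = *temp.Size // explicit value, including 0
        }
        return nil
    }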