
15 Commits

Author SHA1 Message Date
Marty Schoch
1eba5541f2 introduce new query TermRange
The term range query is not often used in full-text queries, but
can be useful when filtering on keyword indexed text terms in
the index.

The JSON syntax for a TermRange query is the same as for
NumericRange, but the min/max values must be strings, not
float64.
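
A minimal sketch of what the string-typed min/max implies, using only the Go standard library. The struct, the JSON field names ("field", "min", "max"), and the inclusive-min / exclusive-max comparison are assumptions made for illustration, not the library's actual types or defaults.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// termRange is a hypothetical stand-in for a term range query.
// Min and Max are *string, so a JSON number is rejected at parse time.
type termRange struct {
	Field string  `json:"field"`
	Min   *string `json:"min"`
	Max   *string `json:"max"`
}

// parseTermRange decodes the JSON form of the query.
func parseTermRange(data []byte) (*termRange, error) {
	var q termRange
	if err := json.Unmarshal(data, &q); err != nil {
		return nil, err
	}
	return &q, nil
}

// matches reports whether term falls in [min, max) using byte-wise
// string comparison. The inclusive/exclusive choice is an assumption.
func (q *termRange) matches(term string) bool {
	if q.Min != nil && term < *q.Min {
		return false
	}
	if q.Max != nil && term >= *q.Max {
		return false
	}
	return true
}

func main() {
	q, err := parseTermRange([]byte(`{"field":"tag","min":"apple","max":"mango"}`))
	if err != nil {
		panic(err)
	}
	fmt.Println(q.matches("banana"), q.matches("mango"))

	// Numeric bounds do not decode into string fields.
	_, err = parseTermRange([]byte(`{"field":"tag","min":1.5,"max":9}`))
	fmt.Println(err != nil)
}
```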
2017-03-31 22:04:00 -04:00
Marty Schoch
2043bb4bf8 fix pagination bug introduced by collector optimization
fixes #378

this bug was introduced by:
f2aba116c4

theory of operation for this collector (top N, skip K)

- collect the highest scoring N+K results
- if K > 0, skip K and return the next N

internal details

- the top N+K are kept in a list
- the list is ordered from lowest scoring (first) to highest scoring (last)
- as a hit comes in, we find where this new hit would fit into this list
- if this caused the list to get too big, trim off the head (lowest scoring hit)

theory of the optimization

- we were not tracking the lowest score in the list
- so if the score was lower than the lowest score, we would add/remove it
- by keeping track of the lowest score in the list, we can avoid these ops

problem with the optimization
- the optimization worked by returning early
- by returning early there was a subtle change to documents which had the same score
- the reason is that which docs end up in the top N+K changed by returning early
- why was that? docs are coming in, in order by key ascending
- when finding the correct position to insert a hit into the list, we checked <, not <= the score
- this has the subtle effect that docs with the same score end up in reverse order

for example consider the following in progress list:

doc ids [   c    a    b  ]
scores  [   1    5    9  ]

if we now see doc d with score 5, we get:

doc ids [   c    a    d    b  ]
scores  [   1    5    5    9  ]

While that appears in order (a, d) it is actually reverse order, because when we
produce the top N we start at the end.

theory of the fix

- previous pagination depended on later hits with the same score "bumping" earlier
hits with the same score off the bottom of the list
- however, if we change the logic to <= instead of <, now the list in the previous
example would look like:

doc ids [   c    d    a    b  ]
scores  [   1    5    5    9  ]

- this small change means that now earlier (lower id) hits will rank higher, and
thus we no longer depend on later hits bumping things down, which means returning
early is a valid thing to do

NOTE: this does depend on the hits coming back in order by ID.  this is not
something strictly guaranteed, but it was the same assumption that allowed the
original behavior

This also has the side-effect that 2 hits with the same score come back in
ascending ID order, which is somehow more pleasing to me than reverse order.
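
The collector described above can be sketched roughly as follows. This is an illustration only, not the actual collector code: the hit and collector types and the page helper are hypothetical names, and hits are assumed to arrive in ascending ID order, as the note above says.

```go
package main

import "fmt"

// hit is a hypothetical search result.
type hit struct {
	id    string
	score float64
}

// collector keeps the best size+skip hits, ordered lowest score first.
type collector struct {
	size, skip int
	hits       []hit
}

// collect inserts h in front of the first kept hit whose score is >= h.score
// (the "<=" placement), so among equal scores the earlier hit stays nearer
// the tail and ranks higher.
func (c *collector) collect(h hit) {
	limit := c.size + c.skip
	if limit == 0 {
		return
	}
	// Early return: the list is full and h would land at the head and be
	// trimmed straight away, so skipping it changes nothing.
	if len(c.hits) == limit && h.score <= c.hits[0].score {
		return
	}
	i := 0
	for i < len(c.hits) && c.hits[i].score < h.score {
		i++
	}
	c.hits = append(c.hits, hit{})
	copy(c.hits[i+1:], c.hits[i:])
	c.hits[i] = h
	if len(c.hits) > limit {
		c.hits = c.hits[1:] // trim the head (lowest scoring hit)
	}
}

// results walks from the tail (highest score), skips c.skip hits and
// returns up to c.size ids.
func (c *collector) results() []string {
	var out []string
	for i := len(c.hits) - 1 - c.skip; i >= 0 && len(out) < c.size; i-- {
		out = append(out, c.hits[i].id)
	}
	return out
}

// page runs one collection pass over hits for the given size and skip.
func page(hits []hit, size, skip int) []string {
	c := &collector{size: size, skip: skip}
	for _, h := range hits {
		c.collect(h)
	}
	return c.results()
}

func main() {
	hits := []hit{{"a", 5}, {"b", 9}, {"c", 1}, {"d", 5}}
	fmt.Println(page(hits, 2, 0)) // [b a]
	fmt.Println(page(hits, 2, 2)) // [d c]
}
```

With the same four hits, the first page is b, a and the second page is d, c, so the two pages agree on one overall order (b, a, d, c) even though the first pass returns early for c and d.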
2016-06-01 11:35:18 -04:00
Marty Schoch
7ec37d6533 add support for wildcard and regexp queries to query string
you can now use terms like:

test?string*

and similar text in query strings to perform wildcard
searches.  also if you use:

/aregexp/

it will perform a regexp search as well
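
To illustrate the wildcard syntax, here is a small sketch that turns a wildcard term into an anchored Go regexp, assuming ? stands for exactly one character and * for any run of characters. The wildcardToRegexp helper is hypothetical and is not how the query string parser is implemented.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// wildcardToRegexp converts a wildcard pattern into a regexp anchored at
// both ends. Assumed semantics: '?' is one character, '*' is zero or more.
// Every other character is quoted so it matches literally.
func wildcardToRegexp(pattern string) (*regexp.Regexp, error) {
	var b strings.Builder
	b.WriteString("^")
	for _, r := range pattern {
		switch r {
		case '*':
			b.WriteString(".*")
		case '?':
			b.WriteString(".")
		default:
			b.WriteString(regexp.QuoteMeta(string(r)))
		}
	}
	b.WriteString("$")
	return regexp.Compile(b.String())
}

func main() {
	re, err := wildcardToRegexp("test?string*")
	if err != nil {
		panic(err)
	}
	for _, term := range []string{"test1string", "test-strings", "teststring"} {
		fmt.Println(term, re.MatchString(term))
	}
}
```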
2016-04-08 15:56:02 -04:00
Marty Schoch
5408083ab5 regexp/wildcard queries parsed from JSON defaulted to boost of 0
having boost of 0 led to invalid scores of NaN
added integration test for wildcard query
added ability to run single integration test at a time
added assertion that score is not NaN/+Inf/-Inf
2016-02-23 09:22:39 -05:00
Marty Schoch
c07fa47551 added test case to verify boost is working
2016-02-05 13:10:01 -05:00
Marty Schoch
0bddafb9e1 properly anchor regexp patterns to end of term
added integration tests for regexp anchoring
fixes #329
2016-01-21 13:44:38 -05:00
Marty Schoch
b7c03dae1a boolean query defaults to minShould of 0
fixes #258
2016-01-12 16:30:10 -05:00
Marty Schoch
f7698f1f15 support match_all, match_none and docid queries via JSON
also fixed a bug in docIDQuery execution that failed to match
the highest docID passed in, even when it was in fact a
valid ID
2015-12-16 14:53:14 -05:00
Marty Schoch
a73a178923 fix incorrect prefix search behavior
avoids double incrementing of end term when reading term dict
fixes #293
2015-12-04 14:07:16 -05:00
Marty Schoch
f35e2e42df fix highlighting to work on fields containing arrays
fixes #170
2015-07-31 14:43:12 -04:00
Marty Schoch
2a8f319689 added test case for query string containing only MUST NOT clause
2015-07-13 15:30:19 -04:00
Marty Schoch
7f0961424d updated tests for <mark></mark>
2015-07-06 18:00:05 -04:00
Marty Schoch
a2c3fa262a add more test cases of index
tests fields, highlighting, document field loading
2014-11-26 15:36:58 -05:00
Marty Schoch
12ec3173fa added integration test for fuzzy search
2014-11-21 14:01:48 -05:00
Marty Schoch
68a2b9614d refactored integration tests into separate package
also made integration tests declarative
you can now easily define new datasets/mappings/searches/results
2014-11-19 15:58:15 -05:00