46 Commits

Author SHA1 Message Date
Steve Yen c7a342bc7d scorch conjuncts match phrase test passes
The conjunction searcher's Advance() method now checks whether its
current doc-matches already suffice before advancing them.
2017-12-23 09:19:40 -08:00
Steve Yen d425a3be86 scorch fix disjunction searcher Advance()
Found with "versus" test (TestScorchVersusUpsideDownBoltSmallMNSAM),
which had a boolean query with a MustNot that was the same as the Must
parameters.  This replicates a situation found by
Aruna/Mihir/testrunner/RQG (MB-27291).  Example:

  "query": {
    "must_not": {"disjuncts": [
      {"field": "body", "match": "hello"}
    ]},
    "must": {"conjuncts": [
      {"field": "body", "match": "hello"}
    ]}
  }

The nested searchers along the MustNot pathway would end up looking
roughly like...

  booleanSearcher
    MustNot
      => disjunctionSearcher
         => disjunctionSearcher
            => termSearcher

On the first Next() call by the collector, the two disjunction
searchers would run through their respective Next() method processing,
which includes their initSearcher() processing on the first time.
This has the effect of driving the leaf termSearcher through two
Next() invocations.

That is, if there were 3 docs (doc-1, doc-2, doc-3), the leaf
termSearcher would at this point have moved to point to doc-3, while
the topmost MustNot would have received doc-1.

Next, the booleanSearcher's Must searcher would produce doc-2, so the
booleanSearcher would try to Advance() the MustNot searcher to doc-2.

But, in scorch, the leafmost termSearcher had already gotten past
doc-2 and would return its doc-3.

In upsidedown, in contrast, the leaf termSearcher would then drive the
KVStore iterator with a Seek(doc-2), and the KVStore iterator would
perform a backwards seek to reach doc-2.

In scorch, however, backwards iteration seeking isn't supported.

So, this fix checks the state of the disjunction searcher to see whether
it already has the necessary state, so that it doesn't have to perform
actual Advance() calls on the underlying searchers.  This not only fixes
the behavior w.r.t. scorch, but may also make upsidedown slightly
faster, as it avoids some backwards KVStore iterator seeks.
2017-12-21 18:20:04 -08:00
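The Advance() fix above can be modeled in a few dozen lines of Go. This is a minimal sketch under stated assumptions, not bleve's searcher code: the `searcher` interface, `termSearcher`, `disjunctionSearcher`, and the use of plain string doc IDs are all hypothetical simplifications. It reproduces the nested MustNot pathway (disjunction over disjunction over term) with three docs, and shows that skipping Advance() on a child already positioned at or past the target lets the outer searcher return doc-2 even though the leaf has already moved on to doc-3.

```go
package main

import "fmt"

// searcher is a hypothetical stand-in for a bleve searcher: it yields doc IDs
// in ascending order and, like scorch's iterators, only moves forward.
type searcher interface {
	Next() (string, bool)
	Advance(target string) (string, bool)
}

// termSearcher walks a sorted postings list and never seeks backwards.
type termSearcher struct {
	docs []string
	pos  int
}

func (t *termSearcher) Next() (string, bool) {
	if t.pos >= len(t.docs) {
		return "", false
	}
	t.pos++
	return t.docs[t.pos-1], true
}

// Advance returns the first remaining doc >= target. If the iterator is
// already past target, it can only return something later.
func (t *termSearcher) Advance(target string) (string, bool) {
	for {
		d, ok := t.Next()
		if !ok || d >= target {
			return d, ok
		}
	}
}

// disjunctionSearcher buffers the current doc of each child.
type disjunctionSearcher struct {
	children []searcher
	curr     []string
	live     []bool
}

func (d *disjunctionSearcher) init() {
	d.curr = make([]string, len(d.children))
	d.live = make([]bool, len(d.children))
	for i, c := range d.children {
		d.curr[i], d.live[i] = c.Next()
	}
}

func (d *disjunctionSearcher) Next() (string, bool) {
	if d.curr == nil {
		d.init()
	}
	low, found := "", false
	for i := range d.children {
		if d.live[i] && (!found || d.curr[i] < low) {
			low, found = d.curr[i], true
		}
	}
	if !found {
		return "", false
	}
	// Consume low from every child positioned on it.
	for i, c := range d.children {
		if d.live[i] && d.curr[i] == low {
			d.curr[i], d.live[i] = c.Next()
		}
	}
	return low, true
}

func (d *disjunctionSearcher) Advance(target string) (string, bool) {
	if d.curr == nil {
		d.init()
	}
	for i, c := range d.children {
		// The fix: a child already at or beyond target holds the state we
		// need, so only children still behind target are advanced.
		if d.live[i] && d.curr[i] < target {
			d.curr[i], d.live[i] = c.Advance(target)
		}
	}
	return d.Next()
}

func main() {
	leaf := &termSearcher{docs: []string{"doc-1", "doc-2", "doc-3"}}
	outer := &disjunctionSearcher{children: []searcher{
		&disjunctionSearcher{children: []searcher{leaf}},
	}}
	first, _ := outer.Next()
	next, _ := outer.Advance("doc-2")
	fmt.Println(first, next, leaf.pos) // doc-1 doc-2 3
}
```

Removing the `d.curr[i] < target` guard makes the same Advance("doc-2") call miss doc-2 in this model, because the forward-only leaf has already handed out doc-3.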
Steve Yen 93c787ca09 scorch versus_test.go passes errcheck 2017-12-21 16:49:39 -08:00
Steve Yen b3e41335e1 scorch compared to upsidedown/bolt using templated, generated searches
This is somewhat like a simple, unit-test'ish version of testrunner's
random query generator, where this does not have a dependency on an
external elasticsearch server, and instead depends on functional
correctness when comparing to upsidedown/bolt.
2017-12-21 16:43:52 -08:00
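The versus-testing idea can be sketched without any external service: run many randomly generated queries against a slow, obviously correct reference and a faster implementation, and report the first disagreement. In this hypothetical sketch the two engines are a naive document scan and an in-memory inverted index (toy stand-ins for upsidedown/bolt and scorch), and queries are drawn with `math/rand` from a small vocabulary rather than rendered from templates.

```go
package main

import (
	"fmt"
	"math/rand"
	"reflect"
	"sort"
	"strings"
)

// engine is a hypothetical interface: Conjuncts returns the sorted IDs of
// docs containing every query term.
type engine interface {
	Conjuncts(terms []string) []string
}

// scanEngine is the slow reference: it re-tokenizes every doc per query.
type scanEngine struct{ docs map[string]string }

func (s scanEngine) Conjuncts(terms []string) []string {
	var out []string
	for id, body := range s.docs {
		words := map[string]bool{}
		for _, w := range strings.Fields(body) {
			words[w] = true
		}
		all := true
		for _, t := range terms {
			if !words[t] {
				all = false
				break
			}
		}
		if all {
			out = append(out, id)
		}
	}
	sort.Strings(out)
	return out
}

// invertedEngine answers from postings lists built once up front.
type invertedEngine struct{ postings map[string]map[string]bool }

func newInvertedEngine(docs map[string]string) invertedEngine {
	p := map[string]map[string]bool{}
	for id, body := range docs {
		for _, w := range strings.Fields(body) {
			if p[w] == nil {
				p[w] = map[string]bool{}
			}
			p[w][id] = true
		}
	}
	return invertedEngine{postings: p}
}

func (e invertedEngine) Conjuncts(terms []string) []string {
	var out []string
	for id := range e.postings[terms[0]] { // callers always pass >= 1 term
		all := true
		for _, t := range terms[1:] {
			if !e.postings[t][id] {
				all = false
				break
			}
		}
		if all {
			out = append(out, id)
		}
	}
	sort.Strings(out)
	return out
}

// compareEngines runs n generated conjunction queries (1 to 3 terms drawn
// from vocab) and returns the first query on which a and b disagree.
func compareEngines(a, b engine, vocab []string, n int, seed int64) ([]string, bool) {
	r := rand.New(rand.NewSource(seed))
	for i := 0; i < n; i++ {
		q := make([]string, 1+r.Intn(3))
		for j := range q {
			q[j] = vocab[r.Intn(len(vocab))]
		}
		if !reflect.DeepEqual(a.Conjuncts(q), b.Conjuncts(q)) {
			return q, false
		}
	}
	return nil, true
}

func main() {
	docs := map[string]string{"a": "hello world", "b": "hello there", "c": "goodbye world"}
	vocab := []string{"hello", "world", "there", "goodbye", "missing"}
	q, ok := compareEngines(scanEngine{docs}, newInvertedEngine(docs), vocab, 1000, 42)
	fmt.Println(ok, q)
}
```

A fixed seed keeps any failing query reproducible, which matters when the disagreement has to be turned into a regular unit test.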
Marty Schoch 1eba5541f2 introduce new query TermRange
The term range query is not often used in full-text queries, but
can be useful when filtering on keyword indexed text terms in
the index.

The JSON syntax to do a TermRange query is the same as for
NumericRange, but the min/max values must be string and not
float64.
2017-03-31 22:04:00 -04:00
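A sketch of what the string-bounded range could look like on the JSON side, assuming the same keys used for NumericRange (`field`, `min`, `max`, `inclusive_min`, `inclusive_max`) and assuming min is inclusive and max exclusive by default. `termRangeJSON` and its methods are hypothetical; the point is that declaring the bounds as `*string` makes `encoding/json` reject numeric bounds, and that membership uses lexicographic byte order rather than numeric order.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// termRangeJSON is a hypothetical mirror of the NumericRange-style syntax,
// with string rather than float64 bounds. Key names and inclusive defaults
// are assumptions for this sketch.
type termRangeJSON struct {
	Field        string  `json:"field,omitempty"`
	Min          *string `json:"min,omitempty"`
	Max          *string `json:"max,omitempty"`
	InclusiveMin *bool   `json:"inclusive_min,omitempty"`
	InclusiveMax *bool   `json:"inclusive_max,omitempty"`
}

func boolOr(b *bool, def bool) bool {
	if b == nil {
		return def
	}
	return *b
}

// parseTermRange fails if min/max are JSON numbers rather than strings.
func parseTermRange(data []byte) (termRangeJSON, error) {
	var q termRangeJSON
	err := json.Unmarshal(data, &q)
	return q, err
}

// contains reports whether a keyword term falls inside the range, using
// byte-wise lexicographic order; a nil bound is open.
func (q termRangeJSON) contains(term string) bool {
	if q.Min != nil && (term < *q.Min || (term == *q.Min && !boolOr(q.InclusiveMin, true))) {
		return false
	}
	if q.Max != nil && (term > *q.Max || (term == *q.Max && !boolOr(q.InclusiveMax, false))) {
		return false
	}
	return true
}

func main() {
	q, err := parseTermRange([]byte(`{"field": "category", "min": "b", "max": "d"}`))
	fmt.Println(err, q.contains("a"), q.contains("b"), q.contains("cat"), q.contains("d"))
	_, err = parseTermRange([]byte(`{"field": "category", "min": 1.5}`))
	fmt.Println(err != nil)
}
```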
Marty Schoch 9790574610 update to geo query parsing and top-level bleve accessibility
- make geo queries accessible from top-level bleve
- update query parsing to support same geo point formats as
  document parsing
- add constructor for easier sorting by geo distance in Go
- additional integration tests using alternate (GeoJSON) style points
2017-03-30 15:23:27 -04:00
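Supporting the same point formats in queries as in documents suggests a single extraction helper shared by both parsing paths. The sketch below handles two shapes: an object with `lon`/`lat` keys, and a GeoJSON-style array, which by the GeoJSON convention lists longitude before latitude. `extractPoint` and `parsePoint` are hypothetical, and any other point forms bleve may accept are not covered.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// extractPoint accepts an object with "lon"/"lat" keys or a GeoJSON-style
// [lon, lat] array, as decoded by encoding/json (numbers become float64).
func extractPoint(v interface{}) (lon, lat float64, ok bool) {
	switch p := v.(type) {
	case []interface{}:
		// GeoJSON order is longitude first, then latitude.
		if len(p) != 2 {
			return 0, 0, false
		}
		lo, ok1 := p[0].(float64)
		la, ok2 := p[1].(float64)
		return lo, la, ok1 && ok2
	case map[string]interface{}:
		lo, ok1 := p["lon"].(float64)
		la, ok2 := p["lat"].(float64)
		return lo, la, ok1 && ok2
	}
	return 0, 0, false
}

// parsePoint decodes raw JSON and extracts a point from it.
func parsePoint(data []byte) (lon, lat float64, ok bool) {
	var v interface{}
	if err := json.Unmarshal(data, &v); err != nil {
		return 0, 0, false
	}
	return extractPoint(v)
}

func main() {
	for _, s := range []string{`{"lon": -122.107799, "lat": 37.399285}`, `[-122.107799, 37.399285]`} {
		lon, lat, ok := parsePoint([]byte(s))
		fmt.Println(lon, lat, ok)
	}
}
```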
Marty Schoch 5636536583 fixed typo and formatted searches.json through jq . 2017-03-29 19:33:54 -04:00
Marty Schoch 7f89ff9493 add geo integration tests 2017-03-29 18:57:35 -04:00
Marty Schoch a5d1d7974c add query support for multi-phrase
when parsing json, when we encounter the key "terms", we first
try to parse as traditional phrase query, then if that fails,
we also try parsing it as multi-phrase
2017-02-10 16:46:38 -05:00
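The fallback can be sketched with two decode attempts using `encoding/json`: a flat `[]string` for an ordinary phrase, then `[][]string` (alternative terms per position) for a multi-phrase. The anonymous struct shapes and `parseTerms` are hypothetical; both results are normalized to one slice of alternatives per position so a caller can treat them uniformly.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// parseTerms decodes "terms" as a phrase first and, only if that fails,
// as a multi-phrase. Phrases are normalized to one alternative per position.
func parseTerms(data []byte) (terms [][]string, multi bool, err error) {
	var phrase struct {
		Terms []string `json:"terms"`
	}
	if err = json.Unmarshal(data, &phrase); err == nil {
		for _, t := range phrase.Terms {
			terms = append(terms, []string{t})
		}
		return terms, false, nil
	}
	var multiPhrase struct {
		Terms [][]string `json:"terms"`
	}
	if err = json.Unmarshal(data, &multiPhrase); err != nil {
		return nil, false, err
	}
	return multiPhrase.Terms, true, nil
}

func main() {
	fmt.Println(parseTerms([]byte(`{"field": "body", "terms": ["quick", "fox"]}`)))
	fmt.Println(parseTerms([]byte(`{"field": "body", "terms": [["quick", "fast"], ["fox"]]}`)))
}
```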
Steve Yen 89a1cefde1 API change: optional SearchRequest.IncludeLocations flag
This is a change in search result behavior in that location
information is no longer provided by default with search results.

Although this looks like a wide-ranging change, it's mostly a
mechanical replacement of the explain bool flag with a new
search.SearcherOptions struct, which holds both the Explain bool flag
and the IncludeTermVectors bool flag.
2017-01-05 21:11:22 -08:00
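A minimal sketch of the mechanical replacement described above: instead of threading a bare `explain bool` through every searcher, a single options struct is passed, so adding a second flag does not change signatures again. Only the `SearcherOptions` name and its `Explain` and `IncludeTermVectors` fields come from the commit message; `hit`, `location`, and `searchTerm` are hypothetical.

```go
package main

import "fmt"

// SearcherOptions carries the two flags named in the commit message.
type SearcherOptions struct {
	Explain            bool
	IncludeTermVectors bool
}

type location struct{ Pos int } // 1-based token position in this sketch

type hit struct {
	ID          string
	Explanation string
	Locations   []location
}

// searchTerm (hypothetical) previously would have taken "explain bool";
// taking the options struct lets new flags be added without new parameters.
func searchTerm(id string, tokens []string, term string, opts SearcherOptions) (hit, bool) {
	var locs []location
	for i, tok := range tokens {
		if tok == term {
			locs = append(locs, location{Pos: i + 1})
		}
	}
	if len(locs) == 0 {
		return hit{}, false
	}
	h := hit{ID: id}
	if opts.Explain {
		h.Explanation = fmt.Sprintf("termFreq(%s)=%d", term, len(locs))
	}
	// Locations are only attached when explicitly requested, matching the
	// new default of not returning them.
	if opts.IncludeTermVectors {
		h.Locations = locs
	}
	return h, true
}

func main() {
	tokens := []string{"hello", "big", "hello"}
	plain, _ := searchTerm("doc-1", tokens, "hello", SearcherOptions{})
	full, _ := searchTerm("doc-1", tokens, "hello", SearcherOptions{Explain: true, IncludeTermVectors: true})
	fmt.Println(len(plain.Locations), full.Locations, full.Explanation)
}
```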
Marty Schoch 2332455bd2 nicer formatting of license header 2016-10-02 10:13:14 -04:00
Marty Schoch d7298a6e97 remove commented out section found by @steveyen code review 2016-09-30 12:36:52 -04:00
Marty Schoch 35da361bfa BREAKING CHANGE - renamed packages to be shorter and not use _
this commit only addresses the analysis sub-package
2016-09-30 12:36:10 -04:00
Marty Schoch 79cc39a67e refactor mapping to interface and move into separate package
the index mapping contains some relatively messy logic
and the top-level bleve package only cares about a relatively
small portion of this
the motivation for this change is to codify the part that the
top-level bleve package cares about into an interface
then move all the details into its own package

NOTE: the top-level bleve package still has hard dependency on
the actual implementation (for now) because it must deserialize
mappings from JSON and simply assumes it is this one instance.
this is seen as OK for now, and this issue could be revisited
in a future change.  moving the logic into a separate package
is seen as a simplification of top-level bleve, even though
we still depend on the one particular implementation.
2016-09-29 14:53:18 -04:00
Marty Schoch e1fb860a86 removed unused AsyncIndex interface 2016-09-13 08:42:36 -04:00
Marty Schoch 04fd62dec3 further tweaks, now all bleve tests pass 2016-09-11 20:29:15 -04:00
Marty Schoch 1ae938b781 add integration tests for sorting 2016-08-20 14:45:53 -04:00
Marty Schoch 9089de251f remove byte_array_converters
fixes #392
fixes #100
2016-07-01 10:21:41 -04:00
Marty Schoch 2043bb4bf8 fix pagination bug introduced by collector optimization
fixes #378

this bug was introduced by:
f2aba116c4

theory of operation for this collector (top N, skip K)

- collect the highest scoring N+K results
- if K > 0, skip K and return the next N

internal details

- the top N+K are kept in a list
- the list is ordered from lowest scoring (first) to highest scoring (last)
- as a hit comes in, we find where this new hit would fit into this list
- if this caused the list to get too big, trim off the head (lowest scoring hit)

theory of the optimization

- we were not tracking the lowest score in the list
- so if the score was lower than the lowest score, we would add/remove it
- by keeping track of the lowest score in the list, we can avoid these ops

problem with the optimization
- the optimization worked by returning early
- by returning early there was a subtle change to documents which had the same score
- the reason is that which docs end up in the top N+K changed by returning early
- why was that? docs are coming in, in order by key ascending
- when finding the correct position to insert a hit into the list, we checked <, not <= the score
- this has the subtle effect that docs with the same score end up in reverse order

for example consider the following in progress list:

doc ids [   c    a    b  ]
scores  [   1    5    9  ]

if we now see doc d with score 5, we get:

doc ids [   c    a    d    b  ]
scores  [   1    5    5    9  ]

While that appears in order (a, d) it is actually reverse order, because when we
produce the top N we start at the end.

theory of the fix

- previous pagination depended on later hits with the same score "bumping" earlier
hits with the same score off the bottom of the list
- however, if we change the logic to <= instead of <, now the list in the previous
example would look like:

doc ids [   c    d    a    b  ]
scores  [   1    5    5    9  ]

- this small change means that earlier (lower id) hits now rank higher, and
thus we no longer depend on later hits bumping things down, which means returning
early is a valid thing to do

NOTE: this does depend on the hits coming back in order by ID.  this is not
something strictly guaranteed, but it was the same assumption that allowed the
original behavior

This also has the side-effect that 2 hits with the same score come back in
ascending ID order, which is somehow more pleasing to me than reverse order.
2016-06-01 11:35:18 -04:00
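The collector described above can be sketched as follows, assuming hits arrive in ascending ID order. `topNCollector`, `collect`, and `results` are hypothetical names; the sketch keeps the list ordered lowest score first, inserts with the `<=` comparison from the fix, and uses the early return from the optimization. Because a tied hit is inserted below existing hits and would be trimmed immediately, returning early gives the same result as inserting and trimming, which is what keeps pages consistent.

```go
package main

import "fmt"

type hit struct {
	ID    string
	Score float64
}

// topNCollector keeps the best size+skip hits, lowest score first.
type topNCollector struct {
	size, skip int
	hits       []hit
}

func (c *topNCollector) collect(h hit) {
	limit := c.size + c.skip
	// Optimization: when full, a hit that can't beat the current lowest
	// score would only be inserted and trimmed again, so return early.
	if len(c.hits) > 0 && len(c.hits) == limit && h.Score <= c.hits[0].Score {
		return
	}
	i := len(c.hits)
	for j, existing := range c.hits {
		// "<=" (the fix) puts h below existing hits with the same score, so
		// the earlier, lower-ID hit ranks higher. "<" gives the old order.
		if h.Score <= existing.Score {
			i = j
			break
		}
	}
	c.hits = append(c.hits, hit{})
	copy(c.hits[i+1:], c.hits[i:])
	c.hits[i] = h
	if len(c.hits) > limit {
		c.hits = c.hits[1:] // trim the lowest-scoring hit off the head
	}
}

// results walks from the highest score down, skipping K and returning N.
func (c *topNCollector) results() []hit {
	var out []hit
	for i := len(c.hits) - 1 - c.skip; i >= 0 && len(out) < c.size; i-- {
		out = append(out, c.hits[i])
	}
	return out
}

func main() {
	// Docs arrive in ascending ID order, all with the same score.
	for page := 0; page < 2; page++ {
		c := &topNCollector{size: 2, skip: 2 * page}
		for _, id := range []string{"a", "b", "c", "d", "e", "f"} {
			c.collect(hit{ID: id, Score: 1})
		}
		fmt.Println(c.results())
	}
}
```

Changing `<=` to `<` in `collect` reproduces the pre-fix ordering of tied hits (c a d b in the example above).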
Marty Schoch 7ec37d6533 add support for wildcard and regexp queries to query string
you can now use terms like:

test?string*

and similar text in query strings to perform wildcard
searches.  also if you use:

/aregexp/

it will perform a regexp search as well
2016-04-08 15:56:02 -04:00
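A small sketch of how a query-string token could be routed, using the two examples from the commit. `classify` and `wildcardToRegexp` are hypothetical helpers, and translating `*` to `.*` and `?` to `.` (with every other character escaped via `regexp.QuoteMeta`) is one common way to implement wildcard matching, assumed here rather than taken from bleve's parser.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// wildcardToRegexp rewrites a wildcard term into a regexp: '*' matches any
// run of characters, '?' matches exactly one, everything else is literal.
func wildcardToRegexp(w string) string {
	var b strings.Builder
	for _, r := range w {
		switch r {
		case '*':
			b.WriteString(".*")
		case '?':
			b.WriteString(".")
		default:
			b.WriteString(regexp.QuoteMeta(string(r)))
		}
	}
	return b.String()
}

// classify decides how a query-string token should be searched.
func classify(tok string) (kind, pattern string) {
	if len(tok) >= 2 && strings.HasPrefix(tok, "/") && strings.HasSuffix(tok, "/") {
		return "regexp", tok[1 : len(tok)-1]
	}
	if strings.ContainsAny(tok, "*?") {
		return "wildcard", wildcardToRegexp(tok)
	}
	return "term", tok
}

func main() {
	for _, tok := range []string{"test?string*", "/aregexp/", "plain"} {
		fmt.Println(classify(tok))
	}
}
```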
Marty Schoch 5badbfdb0e allow running integration tests on alternate kvstore 2016-03-07 08:40:15 -05:00
Marty Schoch 5408083ab5 from JSON parsing regexp/wildcard queries defaulted to boost of 0
having boost of 0 led to invalid scores of NaN
added integration test for wildcard query
added ability to run single integration test at a time
added assertion that score is not NaN/+Inf/-Inf
2016-02-23 09:22:39 -05:00
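The boost problem can be illustrated with a pointer field, which lets a decoder distinguish an omitted boost from an explicit 0, plus a simplified query-norm calculation showing how a zero weight turns into NaN (0 multiplied by +Inf). The scoring model, `wildcardQueryJSON`, and the helper names are assumptions for illustration; `validScore` corresponds to the NaN/+Inf/-Inf assertion mentioned in the commit.

```go
package main

import (
	"encoding/json"
	"fmt"
	"math"
)

// wildcardQueryJSON is a hypothetical JSON shape. A pointer lets the parser
// tell "boost omitted" apart from an explicit 0.
type wildcardQueryJSON struct {
	Wildcard string   `json:"wildcard"`
	Boost    *float64 `json:"boost,omitempty"`
}

func parseBoost(data []byte) (float64, error) {
	var q wildcardQueryJSON
	if err := json.Unmarshal(data, &q); err != nil {
		return 0, err
	}
	if q.Boost == nil {
		return 1.0, nil // neutral default instead of Go's zero value
	}
	return *q.Boost, nil
}

// normalizedWeight shows one way a zero boost becomes NaN under an assumed,
// simplified query-norm style normalization.
func normalizedWeight(boost, idf float64) float64 {
	w := boost * idf
	queryNorm := 1.0 / math.Sqrt(w*w) // +Inf when w == 0
	return w * queryNorm              // 0 * +Inf == NaN
}

// validScore rejects NaN, +Inf and -Inf.
func validScore(s float64) bool {
	return !math.IsNaN(s) && !math.IsInf(s, 0)
}

func main() {
	b, _ := parseBoost([]byte(`{"wildcard": "te*t"}`))
	fmt.Println(b, validScore(normalizedWeight(b, 2)), validScore(normalizedWeight(0, 2)))
}
```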
Marty Schoch c07fa47551 added test case to verify boost is working 2016-02-05 13:10:01 -05:00
Marty Schoch 0bddafb9e1 properly anchor regexp patterns to end of term
added integration tests for regexp anchoring
fixes #329
2016-01-21 13:44:38 -05:00
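Go's `regexp.MatchString` reports a match anywhere in the input, so a pattern like `tes` would match the term `attest` unless anchored. A minimal sketch of whole-term matching follows, assuming the intended semantics are that the pattern must match the entire term; wrapping the pattern in `^(?:...)$` keeps alternations such as `a|b` inside the anchors. `compileTermRegexp` and `matchTerm` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"regexp"
)

// compileTermRegexp wraps a user pattern so it must match the whole term.
// The (?:...) group keeps alternations like "a|b" from escaping the anchors.
func compileTermRegexp(pattern string) (*regexp.Regexp, error) {
	return regexp.Compile(`^(?:` + pattern + `)$`)
}

// matchTerm contrasts unanchored matching (Go's default, which finds a match
// anywhere in the string) with whole-term matching.
func matchTerm(pattern, term string, anchored bool) (bool, error) {
	if !anchored {
		return regexp.MatchString(pattern, term)
	}
	re, err := compileTermRegexp(pattern)
	if err != nil {
		return false, err
	}
	return re.MatchString(term), nil
}

func main() {
	for _, term := range []string{"tes", "test", "attest"} {
		loose, _ := matchTerm("tes", term, false)
		strict, _ := matchTerm("tes", term, true)
		fmt.Println(term, loose, strict)
	}
}
```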
Marty Schoch b7c03dae1a boolean query defaults to minShould of 0
fixes #258
2016-01-12 16:30:10 -05:00
Marty Schoch 8efbd556a3 fix indexing bug with data coming from arrays
fixes #295
2015-12-21 14:59:32 -05:00
Marty Schoch 7bb58e1be4 add ability for integration test to check hit locations 2015-12-21 14:42:43 -05:00
Marty Schoch f7698f1f15 support match_all, match_none and docid queries via JSON
also fixed a bug in docIDQuery execution that caused it to not
match the highest docID passed in, even when that was a
valid ID
2015-12-16 14:53:14 -05:00
Marty Schoch 84ec206fec add some tests for index names in results 2015-12-08 14:38:46 -05:00
Marty Schoch b4d4ee2fff fix incorrect results returned by phrase search
previously phrase searcher would not validate that consecutive
terms were actually occurring in the same array position

fixes #292
2015-12-06 15:55:00 -05:00
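A sketch of the missing check, using a hypothetical `location` type that records a token position plus the array element path it came from. It assumes, for illustration, that positions are not reset between array elements, so terms in adjacent elements can still have consecutive positions; the phrase only matches when the array paths are equal as well.

```go
package main

import "fmt"

// location is a hypothetical term occurrence: token position plus the
// array element path it came from (empty for non-array fields).
type location struct {
	Pos            int
	ArrayPositions []int
}

func samePath(a, b []int) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

// phraseMatches reports whether the phrase terms occur at consecutive
// positions within the same array element. locs[i] holds the occurrences
// of the i-th phrase term.
func phraseMatches(locs [][]location) bool {
	if len(locs) == 0 {
		return false
	}
	for _, start := range locs[0] {
		cur, ok := start, true
		for _, next := range locs[1:] {
			found := false
			for _, l := range next {
				if l.Pos == cur.Pos+1 && samePath(l.ArrayPositions, cur.ArrayPositions) {
					cur, found = l, true
					break
				}
			}
			if !found {
				ok = false
				break
			}
		}
		if ok {
			return true
		}
	}
	return false
}

func main() {
	hello := []location{{Pos: 1, ArrayPositions: []int{0}}}
	sameElem := []location{{Pos: 2, ArrayPositions: []int{0}}}
	otherElem := []location{{Pos: 2, ArrayPositions: []int{1}}}
	fmt.Println(phraseMatches([][]location{hello, sameElem}))  // true
	fmt.Println(phraseMatches([][]location{hello, otherElem})) // false
}
```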
Marty Schoch a73a178923 fix incorrect prefix search behavior
avoids double incrementing of end term when reading term dict
fixes #293
2015-12-04 14:07:16 -05:00
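For a prefix search over a sorted term dictionary, the scan can run from the prefix itself up to an exclusive end term obtained by incrementing the prefix's last byte exactly once (dropping any trailing 0xff bytes first). A sketch with hypothetical helper names; incrementing twice would turn the end for `ab` into `ad` and wrongly admit `ac` and `acd`.

```go
package main

import "fmt"

// prefixEnd returns the exclusive upper bound for keys starting with
// prefix, or nil if there is none (empty or all-0xff prefix).
func prefixEnd(prefix []byte) []byte {
	end := append([]byte(nil), prefix...)
	for i := len(end) - 1; i >= 0; i-- {
		if end[i] < 0xff {
			end[i]++ // increment exactly once
			return end[:i+1]
		}
	}
	return nil
}

// termsWithPrefix scans a sorted term dictionary over [prefix, prefixEnd).
func termsWithPrefix(sortedTerms []string, prefix string) []string {
	end := prefixEnd([]byte(prefix))
	var out []string
	for _, t := range sortedTerms {
		if t < prefix {
			continue
		}
		if end != nil && t >= string(end) {
			break
		}
		out = append(out, t)
	}
	return out
}

func main() {
	dict := []string{"aa", "ab", "abc", "ac", "acd", "b"}
	fmt.Println(string(prefixEnd([]byte("ab"))), termsWithPrefix(dict, "ab")) // ac [ab abc]
}
```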
Marty Schoch 699c86073a make existing integration tests work with firestorm 2015-12-01 12:29:56 -05:00
Marty Schoch f81b2be334 major refactor of bleve configuration
see #221 for full details
2015-09-16 17:10:59 -04:00
Marty Schoch f35e2e42df fix highlighting to work on fields containing arrays
fixes #170
2015-07-31 14:43:12 -04:00
Marty Schoch 2a8f319689 added test case for query string containing only MUST NOT clause 2015-07-13 15:30:19 -04:00
Marty Schoch 7f0961424d updated tests for <mark></mark> 2015-07-06 18:00:05 -04:00
Marty Schoch 539aeb8dc7 fix errors identified by errcheck
part of #169
2015-04-07 18:05:41 -04:00
Marty Schoch 56c4a09de1 fix issues identified by errcheck
part of #169
2015-04-07 15:39:56 -04:00
Marty Schoch 0df0a6fcb2 better logging on which test failed in integration tests 2015-03-10 14:05:30 -04:00
Marty Schoch a69fa1e91d adding tests based on problems found with fosdem dataset 2015-01-22 09:57:26 -05:00
Silvan Jegen ef18dfe4cd Fix typos in comments and strings 2014-12-18 18:43:12 +01:00
Marty Schoch a2c3fa262a add more test cases of index
tests fields, highlighting, document field loading
2014-11-26 15:36:58 -05:00
Marty Schoch 65fe69d705 added integration tests for facets 2014-11-25 17:18:16 -05:00
Marty Schoch 67beaca6d6 fix to phrase/phrase match search involving stop words
closes #122
2014-11-25 10:07:54 -05:00
Marty Schoch 12ec3173fa added integration test for fuzzy search 2014-11-21 14:01:48 -05:00
Marty Schoch 68a2b9614d refactored integration tests into separate package
also made integration tests declarative
you can now easily define new datasets/mappings/searches/results
2014-11-19 15:58:15 -05:00
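The declarative layout can be sketched as data plus a generic runner: each entry in a searches file pairs a query with its expected result, and the runner reports every mismatch. The JSON shape here (`comment`, `search`, `result`, `total_hits`, `ids`), the `dataset` type, and the single-term search are hypothetical simplifications of what a real `searches.json` would hold.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
	"strings"
)

// searchCase is a hypothetical, simplified searches-file entry.
type searchCase struct {
	Comment string `json:"comment"`
	Search  struct {
		Field string `json:"field"`
		Term  string `json:"term"`
	} `json:"search"`
	Result struct {
		TotalHits int      `json:"total_hits"`
		IDs       []string `json:"ids"`
	} `json:"result"`
}

// dataset maps doc ID -> field name -> field text.
type dataset map[string]map[string]string

// termSearch returns sorted IDs of docs whose field contains term as a token.
func termSearch(docs dataset, field, term string) []string {
	var ids []string
	for id, fields := range docs {
		for _, tok := range strings.Fields(strings.ToLower(fields[field])) {
			if tok == term {
				ids = append(ids, id)
				break
			}
		}
	}
	sort.Strings(ids)
	return ids
}

// runCases decodes the searches file and returns one message per failing case.
func runCases(searchesJSON []byte, docs dataset) ([]string, error) {
	var cases []searchCase
	if err := json.Unmarshal(searchesJSON, &cases); err != nil {
		return nil, err
	}
	var failures []string
	for _, c := range cases {
		got := termSearch(docs, c.Search.Field, c.Search.Term)
		if len(got) != c.Result.TotalHits || strings.Join(got, ",") != strings.Join(c.Result.IDs, ",") {
			failures = append(failures, fmt.Sprintf("%s: got %v, want %v", c.Comment, got, c.Result.IDs))
		}
	}
	return failures, nil
}

func main() {
	docs := dataset{"a": {"body": "Hello world"}, "b": {"body": "goodbye world"}}
	searches := []byte(`[{"comment": "term world", "search": {"field": "body", "term": "world"},
	  "result": {"total_hits": 2, "ids": ["a", "b"]}}]`)
	failures, err := runCases(searches, docs)
	fmt.Println(failures, err)
}
```

Adding a new test then means adding data and JSON, not Go code, which is the point of making the tests declarative.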