Found with "versus" test (TestScorchVersusUpsideDownBoltSmallMNSAM),
which had a boolean query with a MustNot that was the same as the Must
parameters. This replicates a situation found by
Aruna/Mihir/testrunner/RQG (MB-27291). Example:
  "query": {
    "must_not": {"disjuncts": [
      {"field": "body", "match": "hello"}
    ]},
    "must": {"conjuncts": [
      {"field": "body", "match": "hello"}
    ]}
  }
The nested searchers along the MustNot pathway would end up looking
roughly like...
booleanSearcher
MustNot
=> disjunctionSearcher
=> disjunctionSearcher
=> termSearcher
On the first Next() call by the collector, the two disjunction
searchers would run through their respective Next() method processing,
which includes their initSearcher() processing on the first time.
This has the effect of driving the leaf termSearcher through two
Next() invocations.
That is, if there were 3 docs (doc-1, doc-2, doc-3), the leaf
termSearcher would at this point have moved to point to doc-3, while
the topmost MustNot would have received doc-1.
Next, the booleanSearcher's Must searcher would produce doc-2, so the
booleanSearcher would try to Advance() the MustNot searcher to doc-2.
But, in scorch, the leafmost termSearcher had already gotten past
doc-2 and would return its doc-3.
In upsidedown, in contrast, the leaf termSearcher would then drive the
KVStore iterator with a Seek(doc-2), and the KVStore iterator would
perform a backwards seek to reach doc-2.
In scorch, however, backwards iteration seeking isn't supported.
So, this fix checks the disjunction searcher's current state to see
whether it already satisfies the Advance() target, so that we don't
have to perform actual Advance()'es on the underlying searchers.
This not only fixes
the behavior w.r.t. scorch, but also can have an effect of potentially
making upsidedown slightly faster as we're avoiding some backwards
KVStore iterator seeks.
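The short-circuit can be sketched roughly as follows. This is a simplified model, not bleve's actual DisjunctionSearcher code: the names are hypothetical, and it assumes doc IDs compare lexicographically (as in the doc-1/doc-2/doc-3 example above).

```go
package main

import "fmt"

// searcher is a minimal stand-in for bleve's search.Searcher; Advance moves
// to the first hit with ID >= the target and returns its ID.
type searcher interface {
	Advance(ID string) string
}

// advanceDisjunction sketches the fix: if the disjunction's current hit is
// already at or beyond the target, reuse it rather than driving Advance()
// into the child searchers, which scorch cannot seek backwards.
func advanceDisjunction(currentID, targetID string, child searcher) string {
	if currentID != "" && currentID >= targetID {
		return currentID // existing state is sufficient; no child Advance needed
	}
	return child.Advance(targetID)
}

// fakeChild records whether Advance was ever invoked.
type fakeChild struct{ called bool }

func (f *fakeChild) Advance(ID string) string { f.called = true; return ID }

func main() {
	child := &fakeChild{}
	// the current hit doc-3 already satisfies a request to advance to doc-2,
	// so the child searcher is never asked to seek backwards
	fmt.Println(advanceDisjunction("doc-3", "doc-2", child), child.called)
}
```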
a failing test was producing unhelpful pointer addresses as
the only debug output. this changes the output to print
the terms and locations as readable text
part of #629
previously the query string queries were modified to aid in
compatibility with other search systems. this change:
f391b991c2
has a problem when combined with:
77101ae424
due to the introduction of MatchNoneSearchers being returned
in a case where previously they never would.
the fix for now is to simply return disjunction queries on 0
terms instead. this ultimately also matches nothing, but avoids
triggering the logic which handles match none searchers in a
special way.
previously, all numeric terms required to implement a numeric
range search were passed to the disjunction query (possibly
exceeding the disjunction clause limit)
now, after producing the list of terms, we filter them against
the terms which actually exist in the term dictionary. the
theory is that this will often greatly reduce the number of terms
and therefore reduce the likelihood that you would run into the
disjunction term limit in practice.
because the term dictionary interface does not have a seek API
and we're reluctant to add that now, i chose to do a binary
search of the terms, which either finds the term, or not. then
subsequent binary searches can proceed from that position,
since both the list of terms and the term dictionary are sorted.
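A minimal sketch of that resuming binary search, assuming both the candidate list and the dictionary are sorted slices of strings (hypothetical helper, not bleve's actual term dictionary code):

```go
package main

import (
	"fmt"
	"sort"
)

// filterExisting keeps only the candidate terms that actually appear in the
// sorted term dictionary. Each binary search resumes from where the previous
// one left off, since both slices are sorted.
func filterExisting(candidates, dictionary []string) []string {
	var kept []string
	pos := 0
	for _, t := range candidates {
		rest := dictionary[pos:]
		i := sort.SearchStrings(rest, t) // binary search: finds the term, or not
		if i < len(rest) && rest[i] == t {
			kept = append(kept, t)
		}
		pos += i // later candidates sort after t, so resume from here
	}
	return kept
}

func main() {
	dict := []string{"apple", "cherry", "grape", "plum"}
	terms := []string{"apple", "banana", "cherry", "plum"}
	fmt.Println(filterExisting(terms, dict))
}
```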
if min and max are the same term
and the term is in dictionary
and both min and max are set to exclusive
then we would panic attempting to access element -1 of a slice.
now, after trimming the slice, we recheck that the length is > 0
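The guard can be sketched like this (hypothetical helper; the real code trims the term slice produced for the range):

```go
package main

import "fmt"

// trimExclusive sketches the fix: drop exclusive endpoints from a sorted
// term slice, rechecking the length before each index, so min == max with
// both bounds exclusive yields an empty slice instead of a panic on
// terms[len(terms)-1] (i.e. element -1 of an already-emptied slice).
func trimExclusive(terms []string, min, max string) []string {
	if len(terms) > 0 && terms[0] == min {
		terms = terms[1:]
	}
	// recheck the length: the first trim may have emptied the slice
	if len(terms) > 0 && terms[len(terms)-1] == max {
		terms = terms[:len(terms)-1]
	}
	return terms
}

func main() {
	// min == max == "cat", both exclusive: previously a panic, now empty
	fmt.Println(len(trimExclusive([]string{"cat"}, "cat", "cat")))
}
```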
there was a bug where if the circle described by the point
distance query crossed the poles, then we incorrectly built
a box around it. this resulted in incorrect search results.
The term range query is not often used in full-text queries, but
can be useful when filtering on keyword indexed text terms in
the index.
The JSON syntax to do a TermRange query is the same as for
NumericRange, but the min/max values must be string and not
float64.
- TermSearcher has alternate constructor if term is []byte, this can avoid
copying in some cases. TermScorer updated to accept []byte term. Also
removed a few struct fields which were not being used.
- New MultiTermSearcher searches for documents containing any of a list of
terms. Current implementation simply uses DisjunctionSearcher.
- Several other searcher constructors now simply build a list of terms and
then delegate to the MultiTermSearcher
- NewPrefixSearcher
- NewRegexpSearcher
- NewFuzzySearcher
- NewNumericRangeSearcher
- NewGeoBoundingBoxSearcher and NewGeoPointDistanceSearcher make use of
the MultiTermSearcher internally, and follow the pattern of returning
an existing search.Searcher, as opposed to their own wrapping struct.
- Callback filter functions used in NewGeoBoundingBoxSearcher and
NewGeoPointDistanceSearcher have been extracted into separate functions
which makes the code much easier to read.
also added flag for bounding box searcher to optionally not
check boundaries. this is useful when other searchers are going
to check every point anyway by some other criteria.
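The delegation pattern described above can be sketched roughly as follows. The names and signatures here are hypothetical stand-ins, and substring matching stands in for real index lookups; the point is only the shape: expand a pattern into concrete terms, then hand the list to one multi-term (disjunction) search.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// multiTermSearch stands in for NewMultiTermSearcher: match any document
// containing at least one of the terms (internally a disjunction).
func multiTermSearch(docs map[string]string, terms []string) []string {
	var hits []string
	for id, body := range docs {
		for _, t := range terms {
			if strings.Contains(body, t) {
				hits = append(hits, id)
				break
			}
		}
	}
	sort.Strings(hits) // map iteration order is random; sort for stable output
	return hits
}

// prefixSearch shows the delegation pattern: expand the prefix into the
// concrete terms it covers, then delegate to multiTermSearch, just as
// NewPrefixSearcher and friends delegate to the MultiTermSearcher.
func prefixSearch(docs map[string]string, dictionary []string, prefix string) []string {
	var terms []string
	for _, t := range dictionary {
		if strings.HasPrefix(t, prefix) {
			terms = append(terms, t)
		}
	}
	return multiTermSearch(docs, terms)
}

func main() {
	docs := map[string]string{"a": "cat sat", "b": "dog ran", "c": "cap fits"}
	fmt.Println(prefixSearch(docs, []string{"cap", "cat", "dog"}, "ca"))
}
```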
- added readme pointing back to lucene origins
- improved documentation of exported methods in geo package
- improved test coverage to 100% on geo package
- added support for parsing geojson style points
- removed some duplicated code in the geo bounding box searcher
New field type GeoPointField, or "geopoint" in mapping JSON.
Currently structs and maps are considered when a mapping explicitly
marks a field as type "geopoint". Several variants of "lon", "lng", and "lat"
are looked for in map keys, struct field names, or method names.
New query type GeoBoundingBoxQuery searches for documents which have a
GeoPointField indexed with a value that is inside the specified bounding box.
New query type GeoDistanceQuery searches for documents which have a
GeoPointField indexed with a value that is less than or equal to the
specified distance from the specified location.
New sort by method "geo_distance". Hits can be sorted by their distance
from the specified location.
New geo utility package with all routines ported from Lucene.
New FilteringSearcher, which wraps an existing Searcher, but filters
all hits with a user-provided callback.
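The FilteringSearcher idea can be sketched as a small wrapper (hypothetical types; the real one wraps a search.Searcher and its DocumentMatch stream):

```go
package main

import "fmt"

// FilterFunc reports whether a hit should be kept.
type FilterFunc func(docID string) bool

// filteringSearcher sketches the concept: wrap an existing stream of hits
// and drop any hit the user-provided callback rejects.
type filteringSearcher struct {
	hits   []string // stand-in for the wrapped Searcher's hits
	accept FilterFunc
	pos    int
}

// Next returns the next accepted hit, or "" when exhausted.
func (f *filteringSearcher) Next() string {
	for f.pos < len(f.hits) {
		hit := f.hits[f.pos]
		f.pos++
		if f.accept(hit) {
			return hit
		}
	}
	return ""
}

func main() {
	fs := &filteringSearcher{
		hits:   []string{"doc-1", "doc-2", "doc-3"},
		accept: func(id string) bool { return id != "doc-2" },
	}
	fmt.Println(fs.Next(), fs.Next(), fs.Next())
}
```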
the logic of how a phrase search works should be an internal
detail of the phrase searcher. further, these changes will
allow proper scoring of phrase matches, which require access
to the underlying searcher objects, which were hidden in the
previous approach.
this originated from a misunderstanding of mine going back
several years. the values need not be float64 just because
we plan to serialize them as json.
there are still larger questions about what the right type should
be, and where should any conversions go. but, this commit
simply attempts to address the most egregious problems
at the core, the Next() method moves another searcher forward
and checks each hit to see if it also satisfies the phrase
constraints. the current implementation has 4 nested for loops.
these nested loops make it harder to read (indentation) and harder
to reason about (complexity).
this refactor does not remove any loops, it simply moves some
of the inner loops into separate methods so that one can
more easily reason about the parts separately.
While researching an observed performance issue with wildcard
queries, it was observed that the LiteralPrefix() method on
the regexp.Regexp struct did not always behave as expected.
In particular, when the pattern starts with ^, AND involves
some backtracking, the LiteralPrefix() seems to always be the
empty string.
This matters because we rely on having a helpful literal prefix to
reduce the number of terms in the term dictionary that need to be
visited.
This change now makes the searcher enforce start/end on the term
directly, by using FindStringIndex() instead of Match().
Next, we also modified WildcardQuery and RegexpQuery to no
longer include the ^ and $ modifiers.
Documentation was also updated to instruct users that they should
not include the ^ and $ modifiers in their patterns.
This is a change in behavior: location information is no longer
provided by default with search results.
Although this looks like a wide-ranging change, it's mostly a
mechanical replacement of the explain bool flag with a new
search.SearcherOptions struct, which holds both the Explain bool flag
and the IncludeTermVectors bool flag.
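The shape of that replacement, as described above (sketch only; see bleve's search package for the real definition and any additional fields):

```go
package main

import "fmt"

// SearcherOptions bundles the old standalone explain bool together with the
// new IncludeTermVectors flag, so searcher constructors take one options
// struct instead of a growing list of bool parameters.
type SearcherOptions struct {
	Explain            bool
	IncludeTermVectors bool
}

func main() {
	// a caller that previously passed `explain = true` now passes options;
	// term vectors (location information) are off unless requested
	opts := SearcherOptions{Explain: true}
	fmt.Println(opts.Explain, opts.IncludeTermVectors)
}
```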
This commit reverts a previous optimization attempt 3f588cd4a that
tried to trim or shrink the array of child searchers in a
search-disjunction.
Although I am not sure why at the moment, that optimization
incorrectly broke higher level boolean queries, but reverting so that
functionality is restored.
Disjunction searchers are used heavily by higher-level searchers, like
prefix searchers. In that case, a disjunction searcher might have
many thousands of child searchers.
This commit adds an optimization to close each child term searcher as
soon as it is finished, and remove it from the disjunction searcher's
children.