Commit Graph

1267 Commits

Author SHA1 Message Date
Sreekanth Sivasankaran
2a09857657 MB-20793 : Validation for min/max/start/end params for numeric/date range facets
Updated the comments in UTs.
2017-02-02 13:10:09 +05:30
Sreekanth Sivasankaran
c6f96f081d MB-20793 : Validation for min/max/start/end params for numeric/date range facets
Added a few more cases in the unit tests.
2017-02-02 13:05:49 +05:30
Sreekanth Sivasankaran
78686c3fa3 MB-20793 : Validation for min/max/start/end params for numeric/date range facets
Corrected the validation and updated the unit tests.
2017-02-02 12:15:48 +05:30
Sreekanth Sivasankaran
f514ac7867 MB-20793: Validation for min/max/start/end params for numeric/date range facets
Improved the validations for date and numeric range queries for facets
2017-02-01 14:48:47 +05:30
Marty Schoch
12a7257b5f remove duplicate code suggested by review from @steveyen 2017-01-31 15:12:06 -05:00
Marty Schoch
a3ee71ddbb Merge pull request #525 from mschoch/refactor-phrase
refactor phrase search into separate methods
2017-01-31 15:06:42 -05:00
Marty Schoch
7fd8aeb50a refactor phrase search into separate methods
at the core, the Next() method moves another searcher forward
and checks each hit to see if it also satisfies the phrase
constraints.  the current implementation has 4 nested for loops.
these nested loops make it harder to read (indentation) and harder
to reason about (complexity).

this refactor does not remove any loops, it simply moves some
of the inner loops into separate methods so that one can
more easily reason about the parts separately.
2017-01-31 13:32:46 -05:00
Marty Schoch
d48d2b6c68 Merge pull request #524 from mschoch/fix523
fix edge ngram output when side=Back and input token len=max
2017-01-31 12:03:26 -05:00
Marty Schoch
782dbecfe1 fix edge ngram output when side=Back and input token len=max
edge condition was incorrectly checked
fixes #523
2017-01-30 20:29:20 -05:00
Marty Schoch
d40cfb0870 Merge pull request #521 from mschoch/improved-backindex-row
INDEX FORMAT CHANGE: change back index row value
2017-01-24 16:07:47 -05:00
Marty Schoch
606fd6344b INDEX FORMAT CHANGE: change back index row value
Previously term entries were encoded pairwise (field/term), so
you'd have data like:

F1/T1 F1/T2 F1/T3 F2/T4 F3/T5

As you can see, even though field 1 has 3 terms, we repeat the F1
part in the encoded data.  This is a bit wasteful.

In the new format we encode it as a list of terms for each field:

F1/T1,T2,T3 F2/T4 F3/T5

When fields have multiple terms, this saves space.  In unit
tests there is no additional waste even in the case that a field
has only a single value.

Here are the results of an indexing test case (beer-search):

$ benchcmp indexing-before.txt indexing-after.txt
benchmark               old ns/op       new ns/op       delta
BenchmarkIndexing-4     11275835988     10745514321     -4.70%

benchmark               old allocs     new allocs     delta
BenchmarkIndexing-4     25230685       22480494       -10.90%

benchmark               old bytes      new bytes      delta
BenchmarkIndexing-4     4802816224     4741641856     -1.27%

And here are the results of a MatchAll search building a facet
on the "abv" field:

$ benchcmp facet-before.txt facet-after.txt
benchmark             old ns/op     new ns/op     delta
BenchmarkFacets-4     439762100     228064575     -48.14%

benchmark             old allocs     new allocs     delta
BenchmarkFacets-4     9460208        3723286        -60.64%

benchmark             old bytes     new bytes     delta
BenchmarkFacets-4     260784261     151746483     -41.81%

Although we expect the index to be smaller in many cases, the
beer-search index is about the same size.  However, this may be
due to the underlying storage (boltdb).

Finally, the index version was bumped from 5 to 7, since smolder
also used version 6, which could lead to some confusion.
2017-01-24 15:39:38 -05:00
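The difference between the two back index layouts described in 606fd6344b can be sketched in a few lines of Go. This is only an illustration of the notation used in the commit message: the type fieldTerm and both functions are hypothetical, and plain strings stand in for the actual encoded row value.

```go
package main

import (
	"fmt"
	"strings"
)

// fieldTerm is a hypothetical (field, term) pair used only for this sketch.
type fieldTerm struct {
	Field string
	Term  string
}

// encodePairwise mimics the old layout: the field is repeated for every term.
func encodePairwise(entries []fieldTerm) string {
	parts := make([]string, 0, len(entries))
	for _, e := range entries {
		parts = append(parts, e.Field+"/"+e.Term)
	}
	return strings.Join(parts, " ")
}

// encodeGrouped mimics the new layout: each field appears once, followed by
// the list of its terms. It assumes entries for the same field are adjacent.
func encodeGrouped(entries []fieldTerm) string {
	var parts []string
	for i := 0; i < len(entries); {
		field := entries[i].Field
		var terms []string
		for i < len(entries) && entries[i].Field == field {
			terms = append(terms, entries[i].Term)
			i++
		}
		parts = append(parts, field+"/"+strings.Join(terms, ","))
	}
	return strings.Join(parts, " ")
}

func main() {
	entries := []fieldTerm{
		{"F1", "T1"}, {"F1", "T2"}, {"F1", "T3"}, {"F2", "T4"}, {"F3", "T5"},
	}
	fmt.Println(encodePairwise(entries)) // F1/T1 F1/T2 F1/T3 F2/T4 F3/T5
	fmt.Println(encodeGrouped(entries))  // F1/T1,T2,T3 F2/T4 F3/T5
}
```

The grouped form writes each field identifier once per field rather than once per term, which is where the saving for multi-term fields comes from.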
Marty Schoch
f94a790156 Merge pull request #520 from mschoch/faster_regexp
improve performance of regular expression and wildcard queries
2017-01-18 16:31:49 -05:00
Marty Schoch
b55c9043b9 improve performance of regular expression and wildcard queries
While researching a performance issue with wildcard queries,
we observed that the LiteralPrefix() method on the
regexp.Regexp struct did not always behave as expected.

In particular, when the pattern starts with ^, AND involves
some backtracking, the LiteralPrefix() seems to always be the
empty string.

This matters because we rely on having a helpful prefix
to reduce the number of terms in the term dictionary
that need to be visited.

This change now makes the searcher enforce start/end on the term
directly, by using FindStringIndex() instead of Match().
Next, we also modified WildcardQuery and RegexpQuery to no
longer include the ^ and $ modifiers.

Documentation was also updated to instruct users that they should
not include the ^ and $ modifiers in their patterns.
2017-01-18 16:22:16 -05:00
Marty Schoch
72731336bf Merge pull request #517 from minagawa-sho/fix-confusing-variable-name
fix the confusing variable name
2017-01-14 09:00:34 -05:00
Sho Minagawa
5537688394 fix the confusing variable name 2017-01-14 20:26:08 +09:00
Marty Schoch
269cc302e3 Merge pull request #514 from steveyen/master
more upsidedown optimizations
2017-01-10 09:15:04 -05:00
Steve Yen
5927224e15 optimize mergeOldAndNew for case of first time a doc is seen 2017-01-09 22:48:58 -08:00
Steve Yen
790f2e3e32 optimize by alloc'ing arrays of TermFrequencyRow/TermVector 2017-01-09 22:42:00 -08:00
Marty Schoch
8cd6040b63 Merge pull request #512 from steveyen/master
API change: optional SearchRequest.IncludeLocations flag
2017-01-09 14:19:17 -05:00
Marty Schoch
ae219d6397 Merge pull request #489 from Shugyousha/refactorphrasesearch
Refactor PhraseSearcher
2017-01-09 14:13:22 -05:00
Steve Yen
8f4726ab10 use struct{}{} idiom instead of additional mark var 2017-01-09 10:17:26 -08:00
Marty Schoch
d081ed712a Merge pull request #513 from mosuka/master
renamed detect_lang to detectlang
2017-01-09 09:17:57 -05:00
Minoru Osuka
63c0d9a4d2 renamed detect_lang to detectlang
renamed detect_lang to detectlang.
2017-01-09 16:51:48 +09:00
Steve Yen
302cac72c4 optimize mergeOldAndNew when non-update case 2017-01-08 17:59:49 -08:00
Steve Yen
931d133024 go fmt and go vet 2017-01-07 22:14:22 -08:00
Steve Yen
40780254ae optimize upsidedown mergeOldAndNew existing key maps
The optimization is to provide a better initial size to the map
constructor and to use a 0-byte-sized struct{} as the map values.
2017-01-07 22:05:55 -08:00
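The two ideas in 40780254ae, a capacity hint for the map constructor and zero-byte struct{} values, can be sketched as follows. The function keySet is hypothetical and is not the mergeOldAndNew code itself.

```go
package main

import "fmt"

// keySet builds a set of keys. The capacity hint passed to make avoids
// repeated map growth while inserting, and struct{} values occupy zero bytes.
func keySet(keys [][]byte) map[string]struct{} {
	set := make(map[string]struct{}, len(keys))
	for _, k := range keys {
		set[string(k)] = struct{}{}
	}
	return set
}

func main() {
	set := keySet([][]byte{[]byte("a"), []byte("b"), []byte("a")})
	_, seen := set["a"]
	fmt.Println(len(set), seen) // 2 true
}
```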
Steve Yen
c2bafa2a51 optimize term vectors/locations via preallocated arrays
The change should hit the allocator less often when processing term
vectors/locations as it preallocates larger, contiguous arrays of
records upfront.
2017-01-07 12:34:06 -08:00
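A rough sketch of the preallocation idea in c2bafa2a51, under the assumption that records are handed out as pointers. The termVector type and both functions are hypothetical and only contrast one allocation per record with one allocation for a contiguous array of records.

```go
package main

import "fmt"

// termVector is a hypothetical record type for this sketch.
type termVector struct {
	Field uint16
	Pos   uint64
	Start uint64
	End   uint64
}

// allocPerRecord allocates every record separately, so the allocator is hit
// once per record.
func allocPerRecord(n int) []*termVector {
	out := make([]*termVector, n)
	for i := range out {
		out[i] = &termVector{Pos: uint64(i)}
	}
	return out
}

// allocFromSlab allocates one contiguous backing array up front and hands out
// pointers into it, so the records themselves cost a single allocation.
func allocFromSlab(n int) []*termVector {
	slab := make([]termVector, n)
	out := make([]*termVector, n)
	for i := range slab {
		slab[i].Pos = uint64(i)
		out[i] = &slab[i]
	}
	return out
}

func main() {
	a := allocPerRecord(3)
	b := allocFromSlab(3)
	fmt.Println(len(a), len(b), a[2].Pos == b[2].Pos) // 3 3 true
}
```

A trade-off of the slab form is that the whole backing array stays reachable for as long as any single record pointer is still in use.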
Steve Yen
8b140d84c4 minor optimization of upsidedown backIndexRowForDoc
This change might allow a smart enough Go compiler to
allocate a backIndexRow on the stack rather than the heap.
2017-01-07 11:49:42 -08:00
Steve Yen
89a1cefde1 API change: optional SearchRequest.IncludeLocations flag
This is a change in search result behavior in that location
information is no longer provided by default with search results.

Although this looks like a wide-ranging change, it's mostly a
mechanical replacement of the explain bool flag with a new
search.SearcherOptions struct, which holds both the Explain bool flag
and the IncludeTermVectors bool flag.
2017-01-05 21:11:22 -08:00
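The mechanical replacement described in 89a1cefde1 can be pictured with the sketch below. The struct mirrors the two fields named in the commit message. The function describeSearcher is hypothetical and only stands in for any constructor that previously took a bare explain bool.

```go
package main

import "fmt"

// SearcherOptions mirrors the two fields named in the commit message.
// It is declared locally so the sketch is self-contained.
type SearcherOptions struct {
	Explain            bool
	IncludeTermVectors bool
}

// describeSearcher is a hypothetical stand-in for a searcher constructor.
// Before the change it would have taken (term string, explain bool).
func describeSearcher(term string, options SearcherOptions) string {
	return fmt.Sprintf("term=%s explain=%t termVectors=%t",
		term, options.Explain, options.IncludeTermVectors)
}

func main() {
	// The zero value requests neither explanations nor term vectors, which
	// matches the new default of not returning location information.
	fmt.Println(describeSearcher("beer", SearcherOptions{}))
	// Callers that still want locations opt in explicitly.
	fmt.Println(describeSearcher("beer", SearcherOptions{IncludeTermVectors: true}))
}
```

Passing a struct rather than a growing list of bool parameters means further options can be added later without touching every call site again.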
Steve Yen
c21d27e15a upsidedown TermFieldReader checks includeTermVectors flag param
The flag was part of the API, but wasn't previously checked.
2017-01-05 21:10:27 -08:00
Marty Schoch
3b2bc30b54 fix type identification when object indexed is pointer to struct
fixes #508
2016-12-08 08:07:38 -05:00
Marty Schoch
d4f21a6290 Merge pull request #503 from steveyen/master
bleve/index/store/moss - accessor for underlying mossStore
2016-12-05 16:41:05 -05:00
Steve Yen
37490864ce bleve/index/store/moss - accessor for underlying mossStore
This change adds methods that provide access to the actual, underlying
mossStore instance in the bleve/index/store/moss KVStore adaptor.

This enables applications to utilize advanced, mossStore-specific
features (such as partial rollback of indexes).  See also
https://issues.couchbase.com/browse/MB-17805
2016-12-05 12:25:29 -08:00
Marty Schoch
c351931701 Merge branch 'slavikm-master4' 2016-11-28 15:00:48 -05:00
Marty Schoch
c927e124dd Merge branch 'master' of https://github.com/slavikm/bleve into slavikm-master4 2016-11-28 14:03:35 -05:00
slavikm
75c8c0e2b1 Revert the nil protection which is not needed 2016-11-23 09:26:07 -08:00
slavikm
20b847f04e Added protection against nil Boost 2016-11-22 13:04:36 -08:00
slavikm
a4c94e440e Added missing boost getters 2016-11-22 12:50:08 -08:00
Marty Schoch
58fe9b9562 Merge pull request #502 from pmezard/fix-docidreader-next-doc
index: DocIDReader.Next() returns nil when done not io.EOF
2016-11-20 13:16:35 -05:00
Patrick Mezard
c81fd6fdb0 index: DocIDReader.Next() returns nil when done not io.EOF 2016-11-20 19:05:35 +01:00
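The convention named in c81fd6fdb0, signalling the end of iteration with a nil ID rather than an io.EOF error, might look like this from the caller's side. The sliceDocIDReader type, its Next() signature, and collectIDs are assumptions made for this sketch and are not copied from the index package.

```go
package main

import "fmt"

// sliceDocIDReader is a hypothetical reader over an in-memory list of IDs.
type sliceDocIDReader struct {
	ids [][]byte
	pos int
}

// Next returns the next ID, or a nil ID with a nil error once the reader is
// exhausted. Errors are reserved for real failures, not for end of data.
func (r *sliceDocIDReader) Next() ([]byte, error) {
	if r.pos >= len(r.ids) {
		return nil, nil
	}
	id := r.ids[r.pos]
	r.pos++
	return id, nil
}

// collectIDs drains the reader, stopping at the first nil ID.
func collectIDs(r *sliceDocIDReader) ([]string, error) {
	var out []string
	for {
		id, err := r.Next()
		if err != nil {
			return nil, err
		}
		if id == nil {
			return out, nil
		}
		out = append(out, string(id))
	}
}

func main() {
	r := &sliceDocIDReader{ids: [][]byte{[]byte("a"), []byte("b")}}
	ids, err := collectIDs(r)
	fmt.Println(ids, err) // [a b] <nil>
}
```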
Marty Schoch
3da28dfbc1 Merge pull request #499 from mschoch/498
add support for parsing BoolFieldQuery from JSON
2016-11-16 11:50:44 -05:00
Marty Schoch
d372602f3c add support for parsing BoolFieldQuery from JSON
presence of the "bool" key triggers parsing as a BoolFieldQuery
fixes #498
2016-11-15 10:29:11 -05:00
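Key-based dispatch of the kind described in d372602f3c can be sketched with encoding/json alone. The function queryKind and the sample payloads are hypothetical. Only the rule that the presence of the "bool" key selects a BoolFieldQuery comes from the commit message.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// queryKind decodes only the top-level keys of a JSON query and reports which
// kind of query it would be parsed as. Values are left undecoded.
func queryKind(input []byte) (string, error) {
	var fields map[string]json.RawMessage
	if err := json.Unmarshal(input, &fields); err != nil {
		return "", err
	}
	if _, ok := fields["bool"]; ok {
		return "BoolFieldQuery", nil
	}
	return "other", nil
}

func main() {
	for _, in := range []string{
		`{"bool": true, "field": "active"}`, // illustrative payload
		`{"term": "beer"}`,                  // illustrative payload
	} {
		kind, err := queryKind([]byte(in))
		fmt.Println(kind, err)
	}
}
```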
slavikm
187d6013df Make sure getters follow the Go convention 2016-11-14 15:30:07 -08:00
slavikm
339ddbe0fa Added getters to boost and field query interfaces 2016-11-14 14:02:43 -08:00
Silvan Jegen
1a6a4c493b Check locations in the phrase searcher as well 2016-11-08 20:05:36 +01:00
Silvan Jegen
33e2432fc6 Initialize the return value as late as possible 2016-11-08 20:05:36 +01:00
Silvan Jegen
3dd363afaa Don't search the same term twice
We have searched for the first term in the phrase query already so we
can skip it. Before doing so we have to add the location of the first
term.
2016-11-08 20:05:04 +01:00
Silvan Jegen
d87b4f88bf Refactor phrase searching
Reduce nesting by using early continues.
2016-11-08 20:04:28 +01:00
Marty Schoch
bcaea084c5 Merge pull request #496 from mschoch/fix495
fix date facets when using MultiSearch
2016-11-04 15:06:57 -04:00
Marty Schoch
8e2159cbe4 Merge pull request #494 from steveyen/MB-21474
simplified MultiSearch requires that indexes honor context deadlines
2016-11-04 15:06:47 -04:00