bleve

Author	SHA1	Message	Date
Marty Schoch	2ba915b929	add additional parens to clarify logic	2017-02-10 20:22:32 -05:00
Marty Schoch	56a79528c3	update match_phrase query to handle multiple tokens in same pos we now use a multiphrase query in all cases internally it's optimized to be the same as regular phrase query anyway, and we simply map all the tokens in the stream into a multi-phrase query with the appropriate structure	2017-02-10 17:12:13 -05:00
Marty Schoch	a5d1d7974c	add query support for multi-phrase when parsing json, when we encounter the key "terms", we first try to parse as traditional phrase query, then if that fails, we also try parsing it as multi-phrase	2017-02-10 16:46:38 -05:00
Marty Schoch	c6085d8cdc	address initial code review comments	2017-02-10 15:22:14 -05:00
Marty Schoch	09d00829db	phrase searcher now supports multi-phrase backwards compatibility maintained through previous constructor very basic test added (not sufficient)	2017-02-10 15:17:50 -05:00
Marty Schoch	9c8e1e82de	add initial low-level support for multi-phrase this adds basic multi-phrase support, a shim to keep the top-level working and unit tests for new multi-phrase cases	2017-02-10 13:16:05 -05:00
Marty Schoch	4e38c49287	move phrase search logic into phrase searcher the logic of how a phrase search works should be an internal detail of the phrase searcher. further, these changes will allow proper scoring of phrase matches, which require access to the underlying searcher objects, which were hidden in the previous approach.	2017-02-10 12:05:01 -05:00
Marty Schoch	97a428f5b0	Merge pull request #531 from mschoch/losefloat remove use of float64 to represent int things	2017-02-09 20:25:20 -05:00
Marty Schoch	8096d9fb90	remove use of float64 to represent int things this originated from a misunderstanding of mine going back several years. the values need not be float64 just because we plan to serialize them as json. there are still larger questions about what the right type should be, and where should any conversions go. but, this commit simply attempts to address the most egregious problems	2017-02-09 20:15:59 -05:00
Marty Schoch	0c87b7bff1	Merge pull request #527 from mschoch/recursive_phrase refactor phrase search to be recursive	2017-02-09 18:10:33 -05:00
Marty Schoch	87df597b21	add 'used by' badge	2017-02-09 16:40:37 -05:00
Marty Schoch	232fc80dad	add support for phrase slop to internals of phrase searcher phrase slop is not yet supported on the frontend added lots of tests around slop	2017-02-09 15:59:51 -05:00
Marty Schoch	4da7756f67	Merge pull request #530 from steveyen/master optimizations around search / DocumentMatchPool	2017-02-09 15:44:04 -05:00
Steve Yen	0b70a1bcb8	use inlined prealloc'ed termFreqRow in upsidedown termFieldReader	2017-02-08 18:23:13 -08:00
Steve Yen	31fecc3663	avoid row alloc's in upsidedown termFieldReader constructor	2017-02-08 18:14:30 -08:00
Steve Yen	470516d973	DocumentMatchPool hits allocator outside of loop	2017-02-06 14:26:59 -08:00
Marty Schoch	50c43bfef6	Merge pull request #528 from bfontaine/syntax README: Use go syntax highlighting	2017-02-04 09:52:58 -05:00
Baptiste Fontaine	7e2ce1cf9e	README: Use go syntax highlighting	2017-02-04 12:20:56 +01:00
Marty Schoch	f82638c117	refactor phrase search to be recursive a more correct solution that will enable us to extend in two important ways: 1) support slop 2) support multi-phrase	2017-02-03 16:05:21 -05:00
Marty Schoch	101ecfe972	Merge pull request #526 from sreekanth-cb/improved_facet_range_validations MB-20793: Validation for min/max/start/end params for numeric/date ra…	2017-02-03 14:34:19 -05:00
Sreekanth Sivasankaran	029d4c73d9	clean up of unit test.	2017-02-02 23:33:26 +05:30
Sreekanth Sivasankaran	c1d28bb2fc	Moving the tests to the table driven test pattern.	2017-02-02 23:31:00 +05:30
Sreekanth Sivasankaran	2a09857657	MB-20793 : Validation for min/max/start/end params for numeric/date range facets Updated the comments in UTs.	2017-02-02 13:10:09 +05:30
Sreekanth Sivasankaran	c6f96f081d	MB-20793 : Validation for min/max/start/end params for numeric/date range facets Added few more cases in the unit tests.	2017-02-02 13:05:49 +05:30
Sreekanth Sivasankaran	78686c3fa3	MB-20793 : Validation for min/max/start/end params for numeric/date range facets Corrected the validation and updated the unit tests.	2017-02-02 12:15:48 +05:30
Sreekanth Sivasankaran	f514ac7867	MB-20793: Validation for min/max/start/end params for numeric/date range facets Improved the validations for date and numeric range queries for facets	2017-02-01 14:48:47 +05:30
Marty Schoch	12a7257b5f	remove duplicate code suggested by review from @steveyen	2017-01-31 15:12:06 -05:00
Marty Schoch	a3ee71ddbb	Merge pull request #525 from mschoch/refactor-phrase refactor phrase search into separate methods	2017-01-31 15:06:42 -05:00
Marty Schoch	7fd8aeb50a	refactor phrase search into separate methods at the core, the Next() method moves another searcher forward and checks each hit to see if it also satisfies the phrase constraints. the current implementation has 4 nested for loops. these nested loops make it harder to read (indentation) and harder to reason about (complexity). this refactor does not remove any loops, it simply moves some of the inner loops into separate methods so that one can more easily reason about the parts separately.	2017-01-31 13:32:46 -05:00
Marty Schoch	d48d2b6c68	Merge pull request #524 from mschoch/fix523 fix edge ngram output when side=Back and input token len=max	2017-01-31 12:03:26 -05:00
Marty Schoch	782dbecfe1	fix edge ngram output when side=Back and input token len=max edge condition was incorrectly checked fixes #523	2017-01-30 20:29:20 -05:00
Marty Schoch	d40cfb0870	Merge pull request #521 from mschoch/improved-backindex-row INDEX FORMAT CHANGE: change back index row value	2017-01-24 16:07:47 -05:00
Marty Schoch	606fd6344b	INDEX FORMAT CHANGE: change back index row value Previously term entries were encoded pairwise (field/term), so you'd have data like: F1/T1 F1/T2 F1/T3 F2/T4 F3/T5 As you can see, even though field 1 has 3 terms, we repeat the F1 part in the encoded data. This is a bit wasteful. In the new format we encode it as a list of terms for each field: F1/T1,T2,T3 F2/T4 F3/T5 When fields have multiple terms, this saves space. In unit tests there is no additional waste even in the case that a field has only a single value. Here are the results of an indexing test case (beer-search): $ benchcmp indexing-before.txt indexing-after.txt benchmark old ns/op new ns/op delta BenchmarkIndexing-4 11275835988 10745514321 -4.70% benchmark old allocs new allocs delta BenchmarkIndexing-4 25230685 22480494 -10.90% benchmark old bytes new bytes delta BenchmarkIndexing-4 4802816224 4741641856 -1.27% And here are the results of a MatchAll search building a facet on the "abv" field: $ benchcmp facet-before.txt facet-after.txt benchmark old ns/op new ns/op delta BenchmarkFacets-4 439762100 228064575 -48.14% benchmark old allocs new allocs delta BenchmarkFacets-4 9460208 3723286 -60.64% benchmark old bytes new bytes delta BenchmarkFacets-4 260784261 151746483 -41.81% Although we expect the index to be smaller in many cases, the beer-search index is about the same in this case. However, this may be due to the underlying storage (boltdb) in this case. Finally, the index version was bumped from 5 to 7, since smolder also used version 6, which could lead to some confusion.	2017-01-24 15:39:38 -05:00
Marty Schoch	f94a790156	Merge pull request #520 from mschoch/faster_regexp improve performance of regular expression and wildcard queries	2017-01-18 16:31:49 -05:00
Marty Schoch	b55c9043b9	improve performance of regular expression and wildcard queries While researching an observed performance issue with wildcard queries, it was observed that the LiteralPrefix() method on the regexp.Regexp struct did not always behave as expected. In particular, when the pattern starts with ^, AND involves some backtracking, the LiteralPrefix() seems to always be the empty string. The side-effect of this is that we rely on having a helpful prefix, to reduce the number of terms in the term dictionary that need to be visited. This change now makes the searcher enforce start/end on the term directly, by using FindStringIndex() instead of Match(). Next, we also modified WildcardQuery and RegexpQuery to no longer include the ^ and $ modifiers. Documentation was also updated to instruct users that they should not include the ^ and $ modifiers in their patterns.	2017-01-18 16:22:16 -05:00
Marty Schoch	72731336bf	Merge pull request #517 from minagawa-sho/fix-confusing-variable-name fix the confusing variable name	2017-01-14 09:00:34 -05:00
Sho Minagawa	5537688394	fix the confusing variable name	2017-01-14 20:26:08 +09:00
Marty Schoch	269cc302e3	Merge pull request #514 from steveyen/master more upsidedown optimizations	2017-01-10 09:15:04 -05:00
Steve Yen	5927224e15	optimize mergeOldAndNew for case of first time a doc is seen	2017-01-09 22:48:58 -08:00
Steve Yen	790f2e3e32	optimize by alloc'ing arrays of TermFrequencyRow/TermVector	2017-01-09 22:42:00 -08:00
Marty Schoch	8cd6040b63	Merge pull request #512 from steveyen/master API change: optional SearchRequest.IncludeLocations flag	2017-01-09 14:19:17 -05:00
Marty Schoch	ae219d6397	Merge pull request #489 from Shugyousha/refactorphrasesearch Refactor PhraseSearcher	2017-01-09 14:13:22 -05:00
Steve Yen	8f4726ab10	use struct{}{} idiom instead of additional mark var	2017-01-09 10:17:26 -08:00
Marty Schoch	d081ed712a	Merge pull request #513 from mosuka/master renamed detect_lang to detectlang	2017-01-09 09:17:57 -05:00
Minoru Osuka	63c0d9a4d2	renamed detect_lang to detectlang renamed detect_lang to detectlang.	2017-01-09 16:51:48 +09:00
Steve Yen	302cac72c4	optimize mergeOldAndNew when non-update case	2017-01-08 17:59:49 -08:00
Steve Yen	931d133024	go fmt and go vet	2017-01-07 22:14:22 -08:00
Steve Yen	40780254ae	optimize upsidedown mergeOldAndNew existing key maps The optimization is to provide a better initial size to the map constructor and to use a 0-byte-sized struct{} as the map values.	2017-01-07 22:05:55 -08:00
Steve Yen	c2bafa2a51	optimize term vectors/locations via preallocated arrays The change should hit the allocator less often when processing term vectors/locations as it preallocates larger, contiguous arrays of records upfront.	2017-01-07 12:34:06 -08:00
Steve Yen	8b140d84c4	minor optimization of upsidedown backIndexRowForDoc This change might allow a smart enough golang compiler to perhaps allocate a backIndexRow on the stack rather than the heap.	2017-01-07 11:49:42 -08:00

1289 Commits