bleve

Author	SHA1	Message	Date
Marty Schoch	f531835d5c	Merge pull request #420 from steveyen/MB-20590 index/store/moss KV backend propagates mossStore's Stats()	2016-09-11 20:28:29 -04:00
Marty Schoch	5cf50ec338	Merge pull request #418 from dtylman/master fix for #416	2016-09-11 20:26:24 -04:00
Marty Schoch	ee61b2e866	Merge pull request #425 from mschoch/porterfaster improve perf of porter stemmer	2016-09-11 20:22:23 -04:00
Marty Schoch	f8e8c9d065	Merge pull request #426 from mschoch/fasterbuildterms encode runes directly into buffer	2016-09-11 20:19:09 -04:00
Marty Schoch	44ff6ced8a	improve perf of porter stemmer 1. porter stemmer offers method to NOT do lowercasing, however to use this we must convert to runes first ourselves, so we did this 2. now we can invoke the version that skips lowercasing, we already do this ourselves before stemming through separate filter. due to the fact that the stemmer modifies the runes in place we have no way to know if there were changes, thus we must always encode back into the term byte slice. added unit test which catches the problem found. NOTE this uses analysis.BuildTermFromRunes so perf gain is only visible with other PR also merged. future gains are possible if we update the stemmer to let us know if changes were made, thus skipping re-encoding to []byte when no changes were actually made	2016-09-11 20:13:15 -04:00
Marty Schoch	c13626be45	encode runes directly into buffer avoid allocating unnecessary intermediate buffer also introduce new method to let a user optimistically try and encode back into an existing buffer, if it isn't large enough, it silently allocates a new one and returns it	2016-09-11 20:10:03 -04:00
Marty Schoch	56c7b9f831	Merge pull request #423 from mschoch/stopfilterfaster avoid allocation in stop token filter	2016-09-11 13:59:31 -04:00
Marty Schoch	5ed9f67b0b	Merge pull request #424 from mschoch/possessivefaster speed up english possessive filter	2016-09-11 13:26:50 -04:00
Marty Schoch	9e9f172f81	speed up english possessive filter previous impl always did full utf8 decode of rune. if we assume most tokens are not possessive this is unnecessary, and even if they are, we only need to chop off last two runes. so, now we only decode last rune of token, and if it looks like s/S then we proceed to decode second to last rune, and then only if it looks like any form of apostrophe, do we make any changes to token, again by just reslicing original to chop off the possessive extension	2016-09-11 12:55:03 -04:00
Marty Schoch	faa07ac3a6	avoid allocation in stop token filter the token stream resulting from the removal of stop words must be shorter or the same length as the original, so we just reuse it and truncate it at the end.	2016-09-11 12:29:33 -04:00
Steve Yen	e8cc3c6bdd	index/store/moss KV backend propagates mossStore's Stats() This change depends on the recently introduced mossStore Stats() API in github.com/couchbase/moss 564bdbc0 commit. So, gvt for moss has been updated as part of this change. Most of the change involves propagating the mossStore instance (the statsFunc callback) so that it's accessible to the KVStore.Stats() method. See also: http://review.couchbase.org/#/c/67524/	2016-09-08 17:12:04 -07:00
Danny Tylman	6c52907f2b	fixes #416 : panic in collector_heap	2016-09-08 11:40:53 +03:00
Marty Schoch	b961d742c1	Merge branch 'bcampbell-sedtweak'	2016-09-01 13:56:11 -04:00
Marty Schoch	67755618e9	Merge branch 'sedtweak' of https://github.com/bcampbell/bleve into bcampbell-sedtweak	2016-09-01 13:55:15 -04:00
Marty Schoch	5023993895	replaced nex lexer with custom lexer this improvement was started to improve code coverage but also improves performance and adds support for escaping. escaping: The following quoted string enumerates the characters which may be escaped. "+-=&\|><!(){}[]^\"~*?:\\/ " Note that this list includes space. In order to escape these characters, they are prefixed with the \ (backslash) character. In all cases, using the escaped version produces the character itself and is not interpreted by the lexer. Two simple examples: my\ name Will be interpreted as a single argument to a match query with the value "my name". "contains a\" character" Will be interpreted as a single argument to a phrase query with the value `contains a " character`. Performance: before$ go test -v -run=xxx -bench=BenchmarkLexer BenchmarkLexer-4 100000 13991 ns/op PASS ok github.com/blevesearch/bleve 1.570s after$ go test -v -run=xxx -bench=BenchmarkLexer BenchmarkLexer-4 500000 3387 ns/op PASS ok github.com/blevesearch/bleve 1.740s	2016-09-01 13:16:07 -04:00
Marty Schoch	46f70bfa12	streamline boost just like tilde	2016-08-31 22:10:44 -04:00
Marty Schoch	37d3750157	simplify parser rules	2016-08-31 21:57:44 -04:00
Marty Schoch	bb285cd0f2	more lexer/parser simplification	2016-08-31 21:53:49 -04:00
Marty Schoch	6c75b7c646	tightening up lexer/parser to prep future work	2016-08-31 21:23:03 -04:00
Marty Schoch	c5465eccb1	change from const to var so apps can adjust value	2016-08-31 16:43:50 -04:00
Marty Schoch	521003d543	Merge pull request #415 from mschoch/fixbug408 cap preallocation by the collector to reasonable value	2016-08-31 16:05:30 -04:00
Marty Schoch	60efecc8e9	cap preallocation by the collector to reasonable value the collector has optimizations to avoid allocation and reslicing during the common case of searching for top hits. however, in some cases users request a very large number of search hits to be returned (attempting to get them all). this caused unnecessary allocation of ram. to address this we introduce a new constant PreAllocSizeSkipCap. it defaults to the value of 1000. if your search+skip is less than this constant, you get the optimized behavior. if your search+skip is greater than this, we cap the preallocations to this lower value. additional space is acquired on an as needed basis by growing the DocumentMatchPool and reslicing the collector backing slice. applications can change the value of PreAllocSizeSkipCap to suit their own needs. fixes #408	2016-08-31 15:25:17 -04:00
Marty Schoch	a771e344ae	don't count code coverage support tool in project coverage	2016-08-31 13:52:19 -04:00
Marty Schoch	81282b3c06	remove unused code	2016-08-31 13:52:02 -04:00
Marty Schoch	83a3eecb22	don't count kv store test against code coverage	2016-08-31 13:27:12 -04:00
Marty Schoch	ae4b354c72	Merge pull request #411 from steveyen/master tighter moss KV store iterator handling	2016-08-27 08:00:45 -04:00
Marty Schoch	56d7bbfe1c	fix benchmark names to match values used	2016-08-26 18:09:03 -04:00
Marty Schoch	4a25034ddd	Merge branch 'sort-by-field-try2'	2016-08-26 17:58:38 -04:00
Marty Schoch	b1b93d5ff9	remove unneeded code fixes code review comment from @steveyen	2016-08-26 17:27:19 -04:00
Marty Schoch	c9310b906d	introduced new collector store impl based on slice counter-intuitively the list impl was faster than the heap the theory was the heap did more comparisons and swapping so even though it benefited from no interface and some cache locality, it was still slower the idea was to just use a raw slice kept in order this avoids the need for interface, but can take same comparison approach as the list it seems to work out: go test -run=xxx -bench=. -benchmem -cpuprofile=cpu.out BenchmarkTop10of100000Scores-4 5000 299959 ns/op 2600 B/op 36 allocs/op BenchmarkTop100of100000Scores-4 2000 601104 ns/op 20720 B/op 216 allocs/op BenchmarkTop10of1000000Scores-4 500 3450196 ns/op 2616 B/op 36 allocs/op BenchmarkTop100of1000000Scores-4 500 3874276 ns/op 20856 B/op 216 allocs/op PASS ok github.com/blevesearch/bleve/search/collectors 7.440s	2016-08-26 11:52:49 -04:00
Marty Schoch	47c239ca7b	refactored data structure out of collector the TopNCollector now can either use a heap or a list i did not code it to use an interface, because this is a very hot loop during searching. rather, it lets bleve developers easily toggle between the two (or other ideas) by changing 2 lines The list is faster in the benchmark, but causes more allocations. The list is once again the default (for now). To switch to the heap implementation, change: store collectStoreList to store collectStoreHeap and newStoreList(... to newStoreHeap(...	2016-08-26 10:29:50 -04:00
Marty Schoch	3f8757c05b	slight fixup to last change to set the sort value i'd like the sort value to be correct even with the optimizations not using it	2016-08-25 23:13:22 -04:00
Marty Schoch	931ec677c4	completely avoid dynamic dispatch if only sorting on score	2016-08-25 22:59:08 -04:00
Marty Schoch	127f37212b	cache values to avoid dynamic dispatch inside hot loop	2016-08-25 16:24:26 -04:00
Marty Schoch	60750c1614	improved implementation to address perf regressions primary change is going back to sort values be []string and not []interface{}, this avoids allocations converting into the interface{} that sounds obvious, so why didn't we just do that first? because a common (default) sort is score, which is naturally a number, not a string (like terms). converting into the number was also expensive, and the common case. so, this solution also makes the change to NOT put the score into the sort value list. instead you see the dummy value "_score". this is just a placeholder, the actual sort impl knows that field of the sort is the score, and will sort using the actual score. also, several other aspects of the benchmark were cleaned up so that unnecessary allocations do not pollute the cpu profiles Here are the updated benchmarks: $ go test -run=xxx -bench=. -benchmem -cpuprofile=cpu.out BenchmarkTop10of100000Scores-4 3000 465809 ns/op 2548 B/op 33 allocs/op BenchmarkTop100of100000Scores-4 2000 626488 ns/op 21484 B/op 213 allocs/op BenchmarkTop10of1000000Scores-4 300 5107658 ns/op 2560 B/op 33 allocs/op BenchmarkTop100of1000000Scores-4 300 5275403 ns/op 21624 B/op 213 allocs/op PASS ok github.com/blevesearch/bleve/search/collectors 7.188s Prior to this PR, master reported: $ go test -run=xxx -bench=. -benchmem BenchmarkTop10of100000Scores-4 3000 453269 ns/op 360161 B/op 42 allocs/op BenchmarkTop100of100000Scores-4 2000 519131 ns/op 388275 B/op 219 allocs/op BenchmarkTop10of1000000Scores-4 200 7459004 ns/op 4628236 B/op 52 allocs/op BenchmarkTop100of1000000Scores-4 200 8064864 ns/op 4656596 B/op 232 allocs/op PASS ok github.com/blevesearch/bleve/search/collectors 7.385s So, we're pretty close on the smaller datasets, and we scale better on the larger datasets. We also show fewer allocations and bytes in all cases (some of this is artificial due to test cleanup).	2016-08-25 15:47:07 -04:00
Marty Schoch	ce0b299d6f	switch sort impl to use interface this improves perf in the case where we're not doing any sorting as we avoid allocating memory and converting scores into numeric terms	2016-08-24 19:02:22 -04:00
Marty Schoch	5e94145cf4	apply same collector benchmark change	2016-08-24 15:56:26 -04:00
Marty Schoch	94489fa778	change collector benchmark to not reuse collector instances they are never reused in practice and the original design did not consider reuse future alternate implementations are not reusable	2016-08-24 15:14:40 -04:00
Marty Schoch	0322ecd441	adjust new sort functionality to also work with MultiSearch	2016-08-24 14:07:10 -04:00
Marty Schoch	1ae938b781	add integration tests for sorting	2016-08-20 14:45:53 -04:00
Steve Yen	eaa59621ff	tighter moss KV store iterator handling	2016-08-19 09:10:03 -07:00
Marty Schoch	2311d060d1	add example usage of SortBy and SortByCustom	2016-08-18 13:03:48 -07:00
Marty Schoch	27f5c6ec92	expose simple string slice based sorting to top-level bleve this change means simple sort requirements no longer require importing the search package (high-level API goal) also the sort test at the top-level was changed to use this form	2016-08-17 14:49:06 -07:00
Marty Schoch	27ba6187bc	adds support for more complex field sorts with object (not string) previously from JSON we would just deserialize strings like "-abv" or "city" or "_id" or "_score" as simple sorts on fields, ids or scores respectively. while this is simple and compact, it can be ambiguous (for example if you have a field starting with - or if you have a field named "_id" already). also, this simple syntax doesn't allow us to specify more complex options to deal with type/mode/missing. we keep support for the simple string syntax, but now also recognize a more expressive syntax like: { "by": "field", "field": "abv", "desc": true, "type": "string", "mode": "min", "missing": "first" } type, mode and missing are optional and default to "auto", "default", and "last" respectively	2016-08-17 14:33:51 -07:00
Marty Schoch	750e0ac16c	change sort field impl to use indexed values not stored values	2016-08-17 09:20:44 -07:00
Marty Schoch	0d873916f0	support JSON marshal/unmarshal of search request sort The syntax used is an array of strings. The strings "_id" and "_score" are special and reserved to mean sorting on the document id and score respectively. All other strings refer to the literal field name with that value. If the string is prefixed with "-" the order of that sort is descending, without it, it defaults to ascending. Examples: "sort":["-abv","-_score"] This will sort results in decreasing order of the "abv" field. Results which have the same value of the "abv" field will then be sorted by their score, also decreasing. If no value for "sort" is provided in the search request the default sorting is the same as before, which is decreasing score.	2016-08-12 19:16:24 -04:00
Marty Schoch	be56380833	fix SearchRequest parsing to default to proper default sort order	2016-08-12 14:49:22 -04:00
Marty Schoch	0bb69a9a1c	Merge branch 'master' of https://github.com/dtylman/bleve into sort-by-field-try2	2016-08-12 14:23:55 -04:00
Danny Tylman	b585c5786b	removing mock data generation packages from unit-tests fixing wrong sort order on certain fields	2016-08-11 11:35:08 +03:00
Marty Schoch	5f1454106d	Merge pull request #402 from mschoch/indexapiwork Index/Search API work	2016-08-10 12:41:51 -04:00
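
The approach described in commit 9e9f172f81 (decode only the last rune, then the second to last, and reslice) can be sketched roughly as follows. This is not the code from that commit: the function name `stripPossessive` and the exact set of apostrophe runes checked are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// stripPossessive is a hypothetical helper, not bleve's actual filter.
// It removes a trailing possessive ('s or 'S) by reslicing the input,
// decoding at most the last two runes of the term.
func stripPossessive(term []byte) []byte {
	// Decode only the final rune; most tokens fail this check and return early.
	last, lastSize := utf8.DecodeLastRune(term)
	if last != 's' && last != 'S' {
		return term
	}
	// Decode the second-to-last rune only when the last one was s/S.
	prev, prevSize := utf8.DecodeLastRune(term[:len(term)-lastSize])
	switch prev {
	// Assumed apostrophe forms: ASCII, right single quote, fullwidth.
	case '\'', '\u2019', '\uFF07':
		// Reslice the original bytes; nothing is copied or allocated.
		return term[:len(term)-lastSize-prevSize]
	}
	return term
}

func main() {
	fmt.Printf("%s\n", stripPossessive([]byte("marty's")))
	fmt.Printf("%s\n", stripPossessive([]byte("glass")))
}
```

Because the result is a reslice of the input, the caller shares the backing array with the original term, which is what makes the no-allocation path possible.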


1155 Commits