Optimization for DisjunctionSearcher: an extra matchingIdxs slice
tracks which currs were matching. This avoids the previous
code's second loop through the currs slice.
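a minimal sketch of the idea, with stand-in types (illustrative,
not the actual searcher code):

    package sketch

    import "bytes"

    type docMatch struct{ id []byte }

    type disjunctionSearcher struct {
        currs        []*docMatch
        matching     []*docMatch
        matchingIdxs []int
    }

    // record which currs match the lowest ID in the same pass that
    // builds matching, instead of scanning currs a second time
    func (s *disjunctionSearcher) updateMatches(lowestID []byte) {
        s.matching = s.matching[:0]
        s.matchingIdxs = s.matchingIdxs[:0]
        for i, curr := range s.currs {
            if curr != nil && bytes.Equal(curr.id, lowestID) {
                s.matching = append(s.matching, curr)
                s.matchingIdxs = append(s.matchingIdxs, i)
            }
        }
    }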
This commit modifies the upside_down TermFrequencyRow parseKDoc() to
skip the ByteSeparator (0xFF) scan, as we already know the term's
length in the UpsideDownCouchTermFieldReader.
On my dev box, results from the bleve-query test on high frequency
terms went from the previous 107 qps to 124 qps.
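a hedged sketch of the fast path; the key layout in the comment
follows the upside_down encoding ('t' prefix, uint16 field, term,
0xFF separator, doc id), but this is an illustration rather than
the verbatim code:

    package sketch

    import "fmt"

    // when the caller (the term field reader) already knows the term,
    // jump straight past it instead of scanning for the 0xFF
    // ByteSeparator
    func parseKDoc(key []byte, term []byte) ([]byte, error) {
        docStart := 3 + len(term) + 1 // prefix + field + term + separator
        if docStart >= len(key) {
            return nil, fmt.Errorf("invalid term frequency key, no doc id")
        }
        return key[docStart:], nil
    }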
The DumpXXX() methods were always documented as internal and
unsupported. However, now they are being removed from the
public top-level API. They are still available on the internal
IndexReader, which can be accessed using the Advanced() method.
The DocCount() and DumpXXX() methods on the internal index
have moved to the internal index reader, since they logically
operate on a snapshot of an index.
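a hedged usage sketch of reaching the internal reader after this
change (exact signatures may vary by bleve version):

    package sketch

    import (
        "fmt"

        "github.com/blevesearch/bleve"
    )

    func printDocCount(idx bleve.Index) error {
        internalIndex, _, err := idx.Advanced() // internal index and kv store
        if err != nil {
            return err
        }
        reader, err := internalIndex.Reader() // a snapshot of the index
        if err != nil {
            return err
        }
        defer reader.Close()
        count, err := reader.DocCount() // DocCount now lives here
        if err != nil {
            return err
        }
        fmt.Println("doc count:", count)
        return nil
    }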
the test had incorrectly been updated to compare the internal
document ids, but these are opaque and may not be the expected
ids in some cases; the test should simply check that they
correspond to the correct external ids
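a hedged sketch of the corrected assertion, assuming the
IndexReader's ExternalID method of that era; names are
illustrative:

    package sketch

    import (
        "testing"

        "github.com/blevesearch/bleve/index"
        "github.com/blevesearch/bleve/search"
    )

    // assert on the external doc id rather than the opaque internal one
    func checkHitID(t *testing.T, reader index.IndexReader, hit *search.DocumentMatch, want string) {
        extID, err := reader.ExternalID(hit.IndexInternalID)
        if err != nil {
            t.Fatal(err)
        }
        if extID != want {
            t.Errorf("expected doc id %s, got %s", want, extID)
        }
    }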
1. the porter stemmer offers a method to NOT do lowercasing; however,
to use it we must first convert to runes ourselves, so we now do that
2. we now invoke the version that skips lowercasing, since we
already lowercase before stemming through a separate filter
because the stemmer modifies the runes in place,
we have no way to know whether anything changed, so we must
always encode back into the term byte slice
added a unit test which catches the problem found
NOTE: this uses analysis.BuildTermFromRunes, so the perf gain is
only visible with the other PR also merged; a sketch of the
resulting filter loop follows below
future gains are possible if we update the stemmer to let us
know whether changes were made, thus skipping the re-encode to
[]byte when no changes were actually made
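a hedged sketch of the resulting filter loop, assuming
go-porterstemmer's StemWithoutLowerCasing entry point;
BuildTermFromRunes is named above, and the surrounding type is
illustrative:

    package sketch

    import (
        "bytes"

        "github.com/blevesearch/bleve/analysis"
        "github.com/blevesearch/go-porterstemmer"
    )

    type porterStemmerFilter struct{}

    func (f *porterStemmerFilter) Filter(input analysis.TokenStream) analysis.TokenStream {
        for _, token := range input {
            // convert to runes ourselves, as the no-lowercasing entry
            // point requires
            runes := bytes.Runes(token.Term)
            stemmed := porterstemmer.StemWithoutLowerCasing(runes)
            // the stemmer mutates the runes in place with no change
            // signal, so always encode back into the term byte slice
            token.Term = analysis.BuildTermFromRunes(stemmed)
        }
        return input
    }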
avoid allocating an unnecessary intermediate buffer
also introduce a new method that lets a user optimistically
try to encode back into an existing buffer; if it isn't
large enough, it silently allocates a new one and returns it
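a hedged sketch of such an optimistic encoder; the function name
here is illustrative:

    package sketch

    import "unicode/utf8"

    // encode runes into buf when it has room; otherwise silently
    // allocate a new slice and return it
    func buildTermFromRunesOptimistic(buf []byte, runes []rune) []byte {
        size := 0
        for _, r := range runes {
            size += utf8.RuneLen(r)
        }
        if size > cap(buf) {
            buf = make([]byte, size)
        }
        buf = buf[:size]
        n := 0
        for _, r := range runes {
            n += utf8.EncodeRune(buf[n:], r)
        }
        return buf
    }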
the previous impl always did a full utf8 decode of the token;
if we assume most tokens are not possessive this is unnecessary,
and even when they are, we only need to chop off the last two runes
so now we only decode the last rune of the token, and if it looks
like s/S we proceed to decode the second-to-last rune; only if
that looks like any form of apostrophe do we make any
changes to the token, again by just reslicing the original to chop
off the possessive suffix
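a hedged sketch of the reverse-decode approach; the apostrophe set
is illustrative:

    package sketch

    import "unicode/utf8"

    const (
        apostrophe          = '\''
        rightSingleQuote    = '\u2019'
        fullWidthApostrophe = '\uff07'
    )

    // decode at most the last two runes; reslice only when the token
    // actually ends in an apostrophe followed by s/S
    func stripPossessive(term []byte) []byte {
        lastRune, lastSize := utf8.DecodeLastRune(term)
        if lastRune != 's' && lastRune != 'S' {
            return term
        }
        prevRune, prevSize := utf8.DecodeLastRune(term[:len(term)-lastSize])
        if prevRune == apostrophe || prevRune == rightSingleQuote || prevRune == fullWidthApostrophe {
            return term[:len(term)-lastSize-prevSize]
        }
        return term
    }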
the token stream resulting from the removal of stop words must
be shorter or the same length as the original, so we just
reuse it and truncate it at the end.
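a minimal sketch of the in-place rewrite, with a stand-in token
type:

    package sketch

    type token struct{ term []byte }

    // copy the keepers toward the front of the same slice, then
    // truncate; the result can never be longer than the input
    func removeStopTokens(input []*token, stopWords map[string]bool) []*token {
        j := 0
        for _, tok := range input {
            if !stopWords[string(tok.term)] {
                input[j] = tok
                j++
            }
        }
        return input[:j]
    }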
This change depends on the mossStore Stats() API recently introduced
in github.com/couchbase/moss commit 564bdbc0, so the gvt manifest
for moss has been updated as part of this change.
Most of the change involves propagating the mossStore instance (via
the statsFunc callback) so that it's accessible to the
KVStore.Stats() method.
See also: http://review.couchbase.org/#/c/67524/
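an illustrative sketch of the plumbing only; the actual moss and
bleve types differ:

    package sketch

    // statsFunc is the callback handed down when the lower-level
    // mossStore is opened, kept so the KVStore's Stats() can reach
    // its statistics later
    type statsFunc func() map[string]interface{}

    type mossKVStore struct {
        stats statsFunc
    }

    func (s *mossKVStore) Stats() map[string]interface{} {
        if s.stats != nil {
            return s.stats()
        }
        return nil
    }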
this improvement was started to increase code coverage,
but it also improves performance and adds support for escaping
escaping:
The following quoted string enumerates the characters which
may be escaped.
"+-=&|><!(){}[]^\"~*?:\\/ "
Note that this list includes space.
In order to escape these characters, they are prefixed with the \
(backslash) character. In all cases, using the escaped version
produces the character itself and is not interpreted by the
lexer.
Two simple examples:
my\ name
will be interpreted as a single argument to a match query
with the value "my name".
"contains a\" character"
will be interpreted as a single argument to a phrase query
with the value `contains a " character`.
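the same two examples as a hedged Go usage sketch:

    package sketch

    import "github.com/blevesearch/bleve"

    func exampleQueries() {
        // a single match query argument with the value "my name"
        q1 := bleve.NewQueryStringQuery(`my\ name`)
        // a single phrase query argument containing a literal " character
        q2 := bleve.NewQueryStringQuery(`"contains a\" character"`)
        _, _ = q1, q2
    }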
Performance:
before$ go test -v -run=xxx -bench=BenchmarkLexer
BenchmarkLexer-4 100000 13991 ns/op
PASS
ok github.com/blevesearch/bleve 1.570s
after$ go test -v -run=xxx -bench=BenchmarkLexer
BenchmarkLexer-4 500000 3387 ns/op
PASS
ok github.com/blevesearch/bleve 1.740s
the collector has optimizations to avoid allocation and reslicing
during the common case of searching for top hits
however, in some cases users request a very large number of
search hits to be returned (attempting to get them all); this
caused unnecessary allocation of RAM.
to address this we introduce a new package-level variable,
PreAllocSizeSkipCap, which defaults to 1000. if your size+skip is
less than this cap, you get the optimized behavior. if your
size+skip is greater, we cap the preallocations at this lower
value; additional space is acquired on an as-needed basis by
growing the DocumentMatchPool and reslicing the
collector's backing slice
applications can change the value of PreAllocSizeSkipCap to suit
their own needs
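a hedged sketch of the capping logic in the collector constructor
(the variable name matches the text above; the surrounding code is
illustrative):

    package sketch

    import "github.com/blevesearch/bleve/search/collector"

    func backingSizeFor(size, skip int) int {
        backingSize := size + skip + 1
        if size+skip > collector.PreAllocSizeSkipCap {
            // preallocate only up to the cap; the DocumentMatchPool
            // and the backing slice grow on demand past this
            backingSize = collector.PreAllocSizeSkipCap + 1
        }
        return backingSize
    }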
fixes #408