On my dev laptop, the bleve-query benchmark of the query-string
"+text:afternoon +text:coffee" (which gets parsed into a conjunction of
disjunctions) had a throughput of 308qps before this change and 342qps
after it.
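For context, a minimal example of running this kind of query-string query through the public bleve API (the index path is hypothetical, and the import path follows the v1-era layout):

    package main

    import (
        "fmt"
        "log"

        "github.com/blevesearch/bleve"
    )

    func main() {
        // hypothetical pre-built index over documents with a "text" field
        index, err := bleve.Open("wiki.bleve")
        if err != nil {
            log.Fatal(err)
        }
        defer index.Close()

        // both terms are required ("+"), and as noted above this parses
        // into a conjunction of disjunctions
        q := bleve.NewQueryStringQuery("+text:afternoon +text:coffee")
        req := bleve.NewSearchRequest(q)
        res, err := index.Search(req)
        if err != nil {
            log.Fatal(err)
        }
        fmt.Println(res.Total, "matches")
    }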
This change simplifies and removes the DisjunctionSearcher.currentID
tracking, and instead uses the matching/matchingIdxs slices to track the
required information.
At the core of the optimization, the previous code used two loop passes
to compare the internal IDs against the currentID field. This commit
instead uses a single pass that both compares the internal IDs and
maintains the matching/matchingIdxs slices.
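A self-contained sketch of that single-pass bookkeeping, using simplified stand-in types rather than bleve's actual DisjunctionSearcher internals (field and method names are illustrative):

    package main

    import (
        "bytes"
        "fmt"
    )

    // simplified stand-ins for the searcher's cursor state
    type documentMatch struct {
        internalID []byte
    }

    type disjunctionState struct {
        currs        []*documentMatch // current match from each child searcher
        matching     []*documentMatch // children sitting on the lowest internal ID
        matchingIdxs []int            // their positions within currs
    }

    // updateMatches rebuilds matching/matchingIdxs in one pass over currs,
    // tracking the lowest internal ID as it goes, instead of comparing each
    // curr against a separately maintained currentID in a second loop.
    func (s *disjunctionState) updateMatches() {
        s.matching = s.matching[:0]
        s.matchingIdxs = s.matchingIdxs[:0]
        var lowest []byte
        for i, curr := range s.currs {
            if curr == nil {
                continue
            }
            cmp := -1 // the first non-nil curr becomes the new lowest
            if lowest != nil {
                cmp = bytes.Compare(curr.internalID, lowest)
            }
            if cmp < 0 {
                // found a smaller internal ID: restart the matching set
                lowest = curr.internalID
                s.matching = s.matching[:0]
                s.matchingIdxs = s.matchingIdxs[:0]
            }
            if cmp <= 0 {
                s.matching = append(s.matching, curr)
                s.matchingIdxs = append(s.matchingIdxs, i)
            }
        }
    }

    func main() {
        s := &disjunctionState{
            currs: []*documentMatch{
                {internalID: []byte{0x02}},
                nil,
                {internalID: []byte{0x01}},
                {internalID: []byte{0x01}},
            },
        }
        s.updateMatches()
        fmt.Println(s.matchingIdxs) // prints [2 3]
    }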
On my dev box, using a bleve-query benchmark on a wiki dataset with a
query-string of "text:afternoon text:coffee", the previous code had a
throughput of 958qps, and this commit reaches 1174qps.
A common search case is when a user performs a query-string query, such
as "the lazy dog". That would be parsed into a boolean query
with a nil Must child, a nil MustNot child, and a non-nil Should child
(a disjunction query for "the", "lazy", "dog").
The optimization in this case is to return just the Should child
directly, skipping any additional Must and MustNot overhead.
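A minimal sketch of that short-circuit, using a stand-in Searcher interface since the exact constructor isn't shown here (names and signatures are illustrative, not bleve's actual API):

    package searcher

    // Searcher is a minimal stand-in for bleve's searcher interface.
    type Searcher interface {
        Close() error
    }

    // newBooleanSearcher: when a boolean query has no Must and no MustNot
    // children, the Should child's searcher is returned directly, skipping
    // the boolean searcher's per-document bookkeeping entirely.
    func newBooleanSearcher(must, should, mustNot Searcher) (Searcher, error) {
        if must == nil && mustNot == nil && should != nil {
            return should, nil
        }
        // otherwise fall back to constructing the full boolean searcher
        return newFullBooleanSearcher(must, should, mustNot)
    }

    func newFullBooleanSearcher(must, should, mustNot Searcher) (Searcher, error) {
        // full implementation omitted in this sketch
        return nil, nil
    }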
In a bleve-query benchmark on my dev box against a wiki index, with a
query-string of "text:afternoon text:coffee", throughput was previously
873qps and with this change reaches 940qps.
Optimization for the DisjunctionSearcher: an extra matchingIdxs slice
tracks which currs were matching, which avoids the previous code's second
loop through the currs slice.
This commit modifies the upside_down TermFrequencyRow parseKDoc() to
skip the ByteSeparator (0xFF) scan, as we already know the term's
length in the UpsideDownCouchTermFieldReader.
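A sketch of the idea, assuming the upside_down key layout of type byte, 2-byte field, term, 0xff separator, then doc ID; the helper name and signature below are illustrative, not bleve's actual parseKDoc:

    package upsidedown

    import "fmt"

    const ByteSeparator byte = 0xff

    // parseKDocKnownTermLen slices the doc ID out of a term frequency row key
    // directly, using the already-known term length plus a cheap sanity check,
    // instead of scanning the key for the 0xff separator.
    func parseKDocKnownTermLen(key []byte, termLen int) ([]byte, error) {
        docStart := 1 + 2 + termLen + 1 // type byte + field + term + separator
        if docStart >= len(key) || key[docStart-1] != ByteSeparator {
            return nil, fmt.Errorf("invalid term frequency key, no doc id")
        }
        return key[docStart:], nil
    }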
On my dev box, results from a bleve-query test on high-frequency terms
went from 107qps previously to 124qps.
The DumpXXX() methods were always documented as internal and
unsupported. However, now they are being removed from the
public top-level API. They are still available on the internal
IndexReader, which can be accessed using the Advanced() method.
The DocCount() and DumpXXX() methods on the internal index
have moved to the internal index reader, since they logically
operate on a snapshot of an index.
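A self-contained sketch of where things land after these changes; the interfaces below are illustrative stand-ins, since the exact Advanced() signature and reader interfaces have varied across bleve versions:

    package sketch

    // Stand-in for the internal index reader (a snapshot of the index). After
    // this change, DocCount() and the DumpXXX() methods live here rather than
    // on the public bleve.Index or on the internal index itself.
    type InternalIndexReader interface {
        DocCount() (uint64, error)
        DumpAll() chan interface{}
        DumpDoc(id string) chan interface{}
        DumpFields() chan interface{}
        Close() error
    }

    // Stand-in for the internal index, reachable from the public API via the
    // Advanced() method; per-snapshot operations require obtaining a reader.
    type InternalIndex interface {
        Reader() (InternalIndexReader, error)
    }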
The test had incorrectly been updated to compare the internal document
IDs, but these are opaque and may not be the expected IDs in some cases.
The test should simply check that the results correspond to the correct
external IDs.
1. the porter stemmer offers a method that does NOT lowercase; however,
to use it we must first convert the term to runes ourselves, so we now do
that
2. we can now invoke the version that skips lowercasing, since we already
lowercase before stemming via a separate filter
Because the stemmer modifies the runes in place, we have no way to know
whether there were changes, so we must always encode back into the term
byte slice (see the sketch below).
Added a unit test which catches the problem found.
NOTE: this uses analysis.BuildTermFromRunes, so the perf gain is only
visible with the other PR also merged.
Future gains are possible if we update the stemmer to let us know whether
changes were made, allowing us to skip re-encoding to []byte when no
changes were actually made.
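A rough sketch of the filter step described above, assuming the v1-era bleve analysis package layout and go-porterstemmer's no-lowercasing entry point (StemWithoutLowerCasing); treat the exact names and paths as assumptions:

    package porter

    import (
        "bytes"

        "github.com/blevesearch/bleve/analysis"
        "github.com/blevesearch/go-porterstemmer"
    )

    // Filter stems each token without lowercasing, since a separate lowercase
    // filter has already run earlier in the analysis chain. The stemmer mutates
    // the rune slice in place and gives no signal about whether anything
    // changed, so the runes are always re-encoded into the token's term bytes.
    func Filter(input analysis.TokenStream) analysis.TokenStream {
        for _, token := range input {
            runes := bytes.Runes(token.Term)
            stemmed := porterstemmer.StemWithoutLowerCasing(runes)
            token.Term = analysis.BuildTermFromRunes(stemmed)
        }
        return input
    }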
avoid allocating an unnecessary intermediate buffer
also introduce a new method that lets a caller optimistically try to
encode back into an existing buffer; if it isn't large enough, a new one
is silently allocated and returned
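A self-contained sketch of such an optimistic helper; the name and exact semantics here are illustrative, not necessarily the final bleve API:

    package analysisutil

    import "unicode/utf8"

    // buildTermFromRunesOptimistic tries to encode the runes back into buf so
    // the common case needs no new allocation. If buf turns out to be too
    // small, it silently falls back to a freshly allocated buffer and returns
    // that instead.
    func buildTermFromRunesOptimistic(buf []byte, runes []rune) []byte {
        rv := buf[:0]
        used := 0
        for _, r := range runes {
            nextLen := utf8.RuneLen(r)
            if nextLen < 0 {
                nextLen = 3 // invalid runes encode as U+FFFD (3 bytes)
            }
            if used+nextLen > cap(buf) {
                // not enough room in the caller's buffer: give up and allocate
                return buildTermFromRunes(runes)
            }
            rv = rv[:used+nextLen]
            utf8.EncodeRune(rv[used:], r)
            used += nextLen
        }
        return rv
    }

    // buildTermFromRunes always allocates; included only so the sketch compiles.
    func buildTermFromRunes(runes []rune) []byte {
        return []byte(string(runes))
    }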