Commit Graph

1087 Commits

Author SHA1 Message Date
Marty Schoch
0236043f65 rewrite links suitable for blevesearch website 2016-09-21 12:58:18 -04:00
Marty Schoch
949ea6397c Merge pull request #438 from mschoch/buildtagdocs
add build tag protecting merge-coverprofile
2016-09-20 14:44:14 -04:00
Marty Schoch
85b61a8631 add build tag protecting merge-coverprofile
this should prevent people that run:
go get github.com/blevesearch/bleve/...
from getting a useless "docs" program in their bin/ dir
2016-09-20 14:29:01 -04:00
Marty Schoch
60ef1c89dc Merge pull request #430 from mschoch/newblevetool
migrated all bleve utils into single bleve command
2016-09-20 14:11:35 -04:00
Marty Schoch
0d52d2f8ea add build tag to ignore gendocs by default 2016-09-20 13:58:59 -04:00
Marty Schoch
81e676de79 improved usage and added utility to generate markdown docs 2016-09-20 13:42:45 -04:00
Marty Schoch
b896537eff Merge pull request #437 from steveyen/perf-boolean-searcher
optimize boolean search Next() with fewer id comparisons
2016-09-20 12:59:38 -04:00
Steve Yen
3acad78875 optimize boolean search Next() with fewer id comparisons
This change to the BooleanSearcher.Next() tries to perform fewer
internal id comparisons.
2016-09-20 09:43:01 -07:00
Marty Schoch
58a5ac2c45 Merge pull request #433 from steveyen/perf-misc
miscellaneous search perf tweaks
2016-09-18 14:12:34 -04:00
Steve Yen
26b621e916 reuse backing array of matches for boolean searcher
The reused backing array of constituent matches should help avoid
additional memory allocations.
2016-09-18 10:43:29 -07:00
Steve Yen
dd7cb14a56 disjunction searcher avoids second ID.Equals() comparison
Optimization for DisjunctionSearcher, where an extra matchingIdxs
helps track the currs that were matching.  This avoids the previous
code's second loop through the currs slice.
2016-09-18 10:43:16 -07:00
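The single-pass idea above can be sketched roughly as follows. This is an illustration under simplifying assumptions, not the DisjunctionSearcher code: document IDs are plain integers here (bleve compares byte-slice IDs), `-1` marks an exhausted constituent, and `minMatching` is a hypothetical name. The point is that the indexes of matching constituents are collected while the minimum is found, so no second loop over `currs` is needed.

```go
package main

import "fmt"

// docID is a simplified stand-in for an internal document ID; smaller
// values sort first. -1 marks an exhausted constituent (an assumption).
type docID int

// minMatching makes one pass over the current ID of each constituent
// searcher. It returns the smallest ID and the indexes of all constituents
// positioned on it, gathered during the same pass instead of in a second
// loop that would re-compare every ID against the minimum.
func minMatching(currs []docID, matchingIdxs []int) (docID, []int) {
	matchingIdxs = matchingIdxs[:0] // reuse the caller's backing array
	var lowest docID = -1
	for i, c := range currs {
		if c < 0 {
			continue // exhausted constituent
		}
		switch {
		case lowest < 0 || c < lowest:
			lowest = c
			matchingIdxs = append(matchingIdxs[:0], i) // new minimum: restart list
		case c == lowest:
			matchingIdxs = append(matchingIdxs, i)
		}
	}
	return lowest, matchingIdxs
}

func main() {
	lowest, idxs := minMatching([]docID{7, 3, -1, 3, 9}, nil)
	fmt.Println(lowest, idxs)
}
```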
Steve Yen
090c08eb46 upside_down disjunction searcher reuses matching slice 2016-09-18 10:43:16 -07:00
Marty Schoch
e68f6ca9e6 Merge pull request #432 from steveyen/perf-skip-0xff-scan
skip termFrequencyRow 0xFF scan as term length is already known
2016-09-18 12:20:21 -04:00
Steve Yen
b5d2c32b46 skip termFrequencyRow 0xFF scan as term length is already known
This commit modifies the upside_down TermFrequencyRow parseKDoc() to
skip the ByteSeparator (0xFF) scan, as we already know the term's
length in the UpsideDownCouchTermFieldReader.

On my dev box, results from bleve-query test on high frequency terms
went from previous 107qps to 124qps.
2016-09-18 08:56:05 -07:00
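To illustrate why knowing the term length makes the scan unnecessary, here is a minimal sketch. It assumes a key layout of `'t' | field (2 bytes) | term | 0xFF | docID`, which is an illustration rather than a statement of the exact upside_down encoding, and both function names are hypothetical. Because 0xFF never appears in valid UTF-8, a scan for it is correct, but it is redundant when the caller already knows where the term ends.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
)

// byteSeparator separates the term from the document ID in this sketch's
// assumed key layout: 't' | field (2 bytes) | term | 0xFF | docID.
const byteSeparator = 0xFF

var errShortKey = errors.New("key too short")

// docIDByScan finds the document ID by searching for the separator,
// paying for a scan over the term bytes on every row.
func docIDByScan(key []byte) ([]byte, error) {
	if len(key) < 3 {
		return nil, errShortKey
	}
	sep := bytes.IndexByte(key[3:], byteSeparator)
	if sep < 0 {
		return nil, errors.New("separator not found")
	}
	return key[3+sep+1:], nil
}

// docIDByLength skips the scan: a reader iterating a single term already
// knows that term's length, so the docID offset is fixed.
func docIDByLength(key []byte, termLen int) ([]byte, error) {
	start := 3 + termLen + 1 // prefix + term + separator
	if len(key) < start {
		return nil, errShortKey
	}
	return key[start:], nil
}

func main() {
	key := append([]byte{'t', 1, 0}, "beer"...)
	key = append(key, byteSeparator)
	key = append(key, "doc-42"...)

	a, _ := docIDByScan(key)
	b, _ := docIDByLength(key, len("beer"))
	fmt.Println(string(a), string(b))
}
```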
Marty Schoch
c5159251a9 make shingle token filter stateless
the previous implementation was incorrectly stateful, which
violates the contract for token filters

fixes #431
2016-09-15 08:59:43 -04:00
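A rough sketch of what "stateless" means for a token filter follows. It is a simplified illustration, not bleve's shingle filter: `Token` is a hypothetical minimal type, and only fixed-size shingles are produced (no min/max range). The filter struct carries configuration only. All working data lives in local variables, so calling `Filter` twice on the same input gives the same output.

```go
package main

import (
	"fmt"
	"strings"
)

// Token is a simplified, hypothetical token type for this sketch.
type Token struct {
	Term string
}

// ShingleFilter holds configuration only. It keeps no buffer of tokens
// from earlier calls, so repeated or concurrent calls on unrelated
// streams cannot affect each other.
type ShingleFilter struct {
	Size      int    // number of adjacent tokens per shingle
	Separator string // placed between the joined terms
}

// Filter returns the original tokens followed by every shingle of Size
// adjacent tokens. All working state is local to the call.
func (f *ShingleFilter) Filter(input []Token) []Token {
	out := make([]Token, 0, 2*len(input))
	out = append(out, input...)
	if f.Size < 1 {
		return out
	}
	parts := make([]string, f.Size)
	for i := 0; i+f.Size <= len(input); i++ {
		for j := range parts {
			parts[j] = input[i+j].Term
		}
		out = append(out, Token{Term: strings.Join(parts, f.Separator)})
	}
	return out
}

func main() {
	f := &ShingleFilter{Size: 2, Separator: " "}
	in := []Token{{Term: "quick"}, {Term: "brown"}, {Term: "fox"}}
	// Both calls print the same result: nothing carries over between them.
	fmt.Println(f.Filter(in))
	fmt.Println(f.Filter(in))
}
```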
Marty Schoch
ffee3c3764 fixed regexp tokenizers to not produce empty tokens 2016-09-14 16:22:20 -04:00
Marty Schoch
c87cf35ace migrated all bleve utils into single bleve command
used spf13/cobra to make it awesome
and attempting to vendor this new dep
2016-09-14 11:52:29 -04:00
Marty Schoch
d01ff4ad8a Merge pull request #429 from mschoch/apicleanup
BREAKING CHANGE - removed DumpXXX() methods from bleve.Index
2016-09-13 15:42:42 -04:00
Marty Schoch
3fd2a64872 BREAKING CHANGE - removed DumpXXX() methods from bleve.Index
The DumpXXX() methods were always documented as internal and
unsupported.  However, now they are being removed from the
public top-level API.  They are still available on the internal
IndexReader, which can be accessed using the Advanced() method.

The DocCount() and DumpXXX() methods on the internal index
have moved to the internal index reader, since they logically
operate on a snapshot of an index.
2016-09-13 12:40:01 -04:00
Marty Schoch
e1fb860a86 removed unused AsyncIndex interface 2016-09-13 08:42:36 -04:00
Marty Schoch
34ebd6ab08 Merge pull request #421 from mschoch/smolder
Smolder
2016-09-12 14:47:53 -04:00
Marty Schoch
0574ba4979 add cuckoofilter and gofarmhash to manifest 2016-09-12 14:34:07 -04:00
Marty Schoch
23755049e8 slight tweak to API to only encode docNum->docNumBytes once 2016-09-11 20:29:16 -04:00
Marty Schoch
035b7c91fc fix unchecked err 2016-09-11 20:29:15 -04:00
Marty Schoch
bbfa6406ea fix test expectation to use ext ids not internal ones
the test had incorrectly been updated to compare the internal
document ids, but these are opaque and may not be the expected
ids in some cases; the test should simply check that they
correspond to the correct external ids
2016-09-11 20:29:15 -04:00
Marty Schoch
36000f1a1b fix api changes and test after merge 2016-09-11 20:29:15 -04:00
Marty Schoch
1b68c4ec5b make backindex rows more compact, fix bug counting docs on start 2016-09-11 20:29:15 -04:00
Marty Schoch
d3ca5424e2 added cuckoo filter, perf improves overall from upside_down
though only slightly
2016-09-11 20:29:15 -04:00
Marty Schoch
07ab49f602 fix bug counting docs and make smolder selectable 2016-09-11 20:29:15 -04:00
Marty Schoch
04fd62dec3 further tweaks, now all bleve tests pass 2016-09-11 20:29:15 -04:00
Marty Schoch
1b10c286e7 adding initial attempt at numeric ids in index
index scheme is named smolder
compiles and unit tests pass, that is all
2016-09-11 20:29:15 -04:00
Marty Schoch
da9339bcdf refactor FinalizeID into ExternalID and InternalID 2016-09-11 20:29:14 -04:00
Marty Schoch
f531835d5c Merge pull request #420 from steveyen/MB-20590
index/store/moss KV backend propagates mossStore's Stats()
2016-09-11 20:28:29 -04:00
Marty Schoch
5cf50ec338 Merge pull request #418 from dtylman/master
fix for #416
2016-09-11 20:26:24 -04:00
Marty Schoch
ee61b2e866 Merge pull request #425 from mschoch/porterfaster
improve perf of porter stemmer
2016-09-11 20:22:23 -04:00
Marty Schoch
f8e8c9d065 Merge pull request #426 from mschoch/fasterbuildterms
encode runes directly into buffer
2016-09-11 20:19:09 -04:00
Marty Schoch
44ff6ced8a improve perf of porter stemmer
1.  porter stemmer offers a method to NOT do lowercasing, however
to use this we must convert to runes ourselves first, so we did this

2.  now we can invoke the version that skips lowercasing, we
already do this ourselves before stemming through separate filter

due to the fact that the stemmer modifies the runes in place
we have no way to know if there were changes, thus we must
always encode back into the term byte slice

added unit test which catches the problem found

NOTE this uses analysis.BuildTermFromRunes so perf gain is
only visible with other PR also merged

future gains are possible if we update the stemmer to let us
know if changes were made, thus skipping re-encoding to
[]byte when no changes were actually made
2016-09-11 20:13:15 -04:00
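The flow described above (convert to runes, stem in place without lowercasing, always re-encode) can be sketched as below. `stemInPlace` is a hypothetical toy that only strips a trailing plural "s"; it is not the Porter algorithm and not the stemmer library's API. It exists only to show why the caller must always re-encode: the stemmer reports nothing about whether it changed the runes.

```go
package main

import (
	"bytes"
	"fmt"
)

// stemInPlace is a hypothetical stand-in for a stemmer that expects
// already-lowercased runes, edits them in place and returns the (possibly
// shorter) result. It only strips a plural "s"; it is NOT Porter stemming.
func stemInPlace(rs []rune) []rune {
	if n := len(rs); n > 3 && rs[n-1] == 's' && rs[n-2] != 's' {
		return rs[:n-1]
	}
	return rs
}

// stemTerm converts the term to runes, stems them, and always encodes the
// result back to bytes, since there is no signal telling whether anything
// changed. Lowercasing is assumed to have been done by an earlier filter.
func stemTerm(term []byte) []byte {
	rs := bytes.Runes(term)
	rs = stemInPlace(rs)
	return []byte(string(rs)) // re-encode unconditionally
}

func main() {
	for _, w := range []string{"cats", "glass", "bus"} {
		fmt.Println(w, "->", string(stemTerm([]byte(w))))
	}
}
```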
Marty Schoch
c13626be45 encode runes directly into buffer
avoid allocating unnecessary intermediate buffer

also introduce a new method to let a user optimistically
try to encode back into an existing buffer; if it isn't
large enough, it silently allocates a new one and returns it
2016-09-11 20:10:03 -04:00
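The "optimistically reuse the buffer" behaviour might look roughly like this sketch. `encodeRunesInto` is a hypothetical name, not the method added in this commit. It computes the exact UTF-8 size first, reuses `buf` when its capacity suffices, and otherwise allocates, so callers must always use the returned slice.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// encodeRunesInto encodes runes as UTF-8, reusing buf when its capacity is
// large enough and silently allocating a new slice otherwise. The caller
// must use the returned slice, which may or may not share memory with buf.
func encodeRunesInto(runes []rune, buf []byte) []byte {
	size := 0
	for _, r := range runes {
		n := utf8.RuneLen(r)
		if n < 0 { // invalid rune: EncodeRune writes RuneError instead
			n = utf8.RuneLen(utf8.RuneError)
		}
		size += n
	}
	if cap(buf) < size {
		buf = make([]byte, size)
	}
	buf = buf[:size]
	pos := 0
	for _, r := range runes {
		pos += utf8.EncodeRune(buf[pos:], r)
	}
	return buf
}

func main() {
	buf := make([]byte, 0, 64) // reusable scratch buffer
	buf = encodeRunesInto([]rune("naïve"), buf)
	fmt.Println(string(buf), len(buf))
}
```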
Marty Schoch
56c7b9f831 Merge pull request #423 from mschoch/stopfilterfaster
avoid allocation in stop token filter
2016-09-11 13:59:31 -04:00
Marty Schoch
5ed9f67b0b Merge pull request #424 from mschoch/possessivefaster
speed up english possessive filter
2016-09-11 13:26:50 -04:00
Marty Schoch
9e9f172f81 speed up english possessive filter
previous impl always did full utf8 decode of rune
if we assume most tokens are not possessive this is unnecessary
and even if they are, we only need to chop off the last two runes
so, now we only decode last rune of token, and if it looks like
s/S then we proceed to decode second to last rune, and then
only if it looks like any form of apostrophe, do we make any
changes to token, again by just reslicing original to chop
off the possessive extension
2016-09-11 12:55:03 -04:00
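A minimal sketch of the check described above, using `unicode/utf8` to decode only from the end of the token. The set of apostrophe forms (ASCII `'`, U+2019 right single quotation mark, U+FF07 fullwidth apostrophe) is an assumption of this sketch, and `stripPossessive` is a hypothetical name. The possessive suffix is removed by reslicing the original bytes, so nothing is allocated.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// isApostrophe reports whether r is one of the apostrophe forms assumed
// here: ASCII ', right single quotation mark, fullwidth apostrophe.
func isApostrophe(r rune) bool {
	return r == '\'' || r == '\u2019' || r == '\uFF07'
}

// stripPossessive removes a trailing 's or 'S by reslicing term. Only the
// last rune is decoded unless it is s/S; the second-to-last rune is decoded
// only in that case.
func stripPossessive(term []byte) []byte {
	last, lastSize := utf8.DecodeLastRune(term)
	if last != 's' && last != 'S' {
		return term // common case: one decode, no changes
	}
	prev, prevSize := utf8.DecodeLastRune(term[:len(term)-lastSize])
	if !isApostrophe(prev) {
		return term
	}
	return term[:len(term)-lastSize-prevSize]
}

func main() {
	for _, w := range []string{"marty's", "STEVE\u2019S", "bus"} {
		fmt.Printf("%s -> %s\n", w, stripPossessive([]byte(w)))
	}
}
```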
Marty Schoch
faa07ac3a6 avoid allocation in stop token filter
the token stream resulting from the removal of stop words must
be shorter than or the same length as the original, so we just
reuse it and truncate it at the end.
2016-09-11 12:29:33 -04:00
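The reuse-and-truncate approach is the standard in-place filtering idiom in Go. In this sketch, tokens are simplified to plain strings and `removeStopWords` is a hypothetical name. Kept tokens are copied toward the front of the same slice, which is then truncated, so no new slice is allocated.

```go
package main

import "fmt"

// removeStopWords filters tokens in place: kept tokens are written toward
// the front of the input slice, which is then truncated. The result can
// never be longer than the input, so the backing array is always big enough.
func removeStopWords(tokens []string, stop map[string]bool) []string {
	j := 0
	for _, t := range tokens {
		if !stop[t] {
			tokens[j] = t
			j++
		}
	}
	return tokens[:j]
}

func main() {
	stop := map[string]bool{"the": true, "and": true}
	fmt.Println(removeStopWords([]string{"the", "quick", "and", "fox"}, stop))
}
```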
Steve Yen
e8cc3c6bdd index/store/moss KV backend propagates mossStore's Stats()
This change depends on the recently introduced mossStore Stats() API
in github.com/couchbase/moss 564bdbc0 commit.  So, gvt for moss has
been updated as part of this change.

Most of the change involves propagating the mossStore instance (the
statsFunc callback) so that it's accessible to the KVStore.Stats()
method.

See also: http://review.couchbase.org/#/c/67524/
2016-09-08 17:12:04 -07:00
Danny Tylman
6c52907f2b fixes #416:
panic in collector_heap
2016-09-08 11:40:53 +03:00
Marty Schoch
b961d742c1 Merge branch 'bcampbell-sedtweak' 2016-09-01 13:56:11 -04:00
Marty Schoch
67755618e9 Merge branch 'sedtweak' of https://github.com/bcampbell/bleve into bcampbell-sedtweak 2016-09-01 13:55:15 -04:00
Marty Schoch
5023993895 replaced nex lexer with custom lexer
this improvement was started to improve code coverage
but also improves performance and adds support for escaping

escaping:

The following quoted string enumerates the characters which
may be escaped.

"+-=&|><!(){}[]^\"~*?:\\/ "

Note that this list includes space.

In order to escape these characters, they are prefixed with the \
(backslash) character.  In all cases, using the escaped version
produces the character itself and is not interpreted by the
lexer.

Two simple examples:

my\ name

Will be interpreted as a single argument to a match query
with the value "my name".

"contains a\" character"

Will be interpreted as a single argument to a phrase query
with the value `contains a " character`.

Performance:

before$ go test -v -run=xxx -bench=BenchmarkLexer
BenchmarkLexer-4   	  100000	     13991 ns/op
PASS
ok  	github.com/blevesearch/bleve	1.570s

after$ go test -v -run=xxx -bench=BenchmarkLexer
BenchmarkLexer-4   	  500000	      3387 ns/op
PASS
ok  	github.com/blevesearch/bleve	1.740s
2016-09-01 13:16:07 -04:00
Marty Schoch
46f70bfa12 streamline boost just like tilde 2016-08-31 22:10:44 -04:00
Marty Schoch
37d3750157 simplify parser rules 2016-08-31 21:57:44 -04:00
Marty Schoch
bb285cd0f2 more lexer/parser simplification 2016-08-31 21:53:49 -04:00