Commit Graph

1830 Commits

Author SHA1 Message Date
Marty Schoch
3f83149ed3 adding back the forestdb kv store impl 2014-10-31 09:42:32 -04:00
Marty Schoch
c7443fe52b refactored API a bit
more things can return error now
in a couple of places we had to swallow errors because they didn't
fit the existing API.  in these cases, and proactively in a few
others we now return error as well.

also the batch API has been updated to allow performing
set/delete internal within the batch
2014-10-31 09:40:23 -04:00
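The batch change above can be pictured with a minimal sketch, assuming a toy in-memory store: one batch carries both document operations and internal (side-channel) key/value operations, and applying it returns an error instead of swallowing one. `Store`, `Batch`, `Apply` and `ErrEmptyKey` are hypothetical names for illustration, not bleve's actual API.

```go
package main

import (
	"errors"
	"fmt"
)

// Store is a hypothetical toy store holding documents and internal values.
type Store struct {
	docs     map[string][]byte
	internal map[string][]byte
}

func NewStore() *Store {
	return &Store{docs: map[string][]byte{}, internal: map[string][]byte{}}
}

type opKind int

const (
	setDoc opKind = iota
	deleteDoc
	setInternal
	deleteInternal
)

type op struct {
	kind  opKind
	key   string
	value []byte
}

// Batch records document and internal operations to be applied together.
type Batch struct{ ops []op }

func (b *Batch) Index(id string, doc []byte)     { b.ops = append(b.ops, op{setDoc, id, doc}) }
func (b *Batch) Delete(id string)                { b.ops = append(b.ops, op{deleteDoc, id, nil}) }
func (b *Batch) SetInternal(key string, v []byte) { b.ops = append(b.ops, op{setInternal, key, v}) }
func (b *Batch) DeleteInternal(key string)       { b.ops = append(b.ops, op{deleteInternal, key, nil}) }

// ErrEmptyKey is a hypothetical validation error.
var ErrEmptyKey = errors.New("empty key")

// Apply validates every op first, so an invalid batch is rejected with an
// error rather than being partially applied.
func (s *Store) Apply(b *Batch) error {
	for _, o := range b.ops {
		if o.key == "" {
			return ErrEmptyKey
		}
	}
	for _, o := range b.ops {
		switch o.kind {
		case setDoc:
			s.docs[o.key] = o.value
		case deleteDoc:
			delete(s.docs, o.key)
		case setInternal:
			s.internal[o.key] = o.value
		case deleteInternal:
			delete(s.internal, o.key)
		}
	}
	return nil
}

func main() {
	s := NewStore()
	b := &Batch{}
	b.Index("doc1", []byte(`{"name":"bleve"}`))
	b.SetInternal("last_seq", []byte("42"))
	if err := s.Apply(b); err != nil {
		fmt.Println("apply failed:", err)
		return
	}
	fmt.Println(len(s.docs), string(s.internal["last_seq"]))
}
```

Validating before mutating is one simple way to keep a batch all-or-nothing in this toy; a real store would rely on its own transaction or write-batch support.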
Marty Schoch
84d1cdf216 fix go vet issues 2014-10-29 09:31:03 -04:00
Marty Schoch
51a59cb05c initial impl of Index Aliases
an IndexAlias allows you to easily work with one logical Index
while changing the actual Index it's pointing to behind the scenes
Changing which actual Index is backing an IndexAlias can be done
atomically so that your application smoothly transitions from
one Index to another.
A separate use of IndexAlias is allowed when the IndexAlias is
defined to point to multiple Indexes.  In this case only the
Search() operation is supported, but the Search will be run
on each of the underlying indexes in parallel, and the results
will be merged.
2014-10-29 09:22:11 -04:00
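Both uses of an alias described above (atomic retargeting, and parallel search over several indexes with merged results) can be sketched as follows. `Hit`, `Searcher`, `IndexAlias.Swap` and `staticIndex` are hypothetical stand-ins, not bleve's types, and merging by raw score is a simplification that assumes scores are comparable across indexes.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Hit is a hypothetical search result: a document ID and its score.
type Hit struct {
	ID    string
	Score float64
}

// Searcher is a hypothetical stand-in for a searchable index.
type Searcher interface {
	Search(query string) []Hit
}

// IndexAlias points at one or more underlying indexes.
type IndexAlias struct {
	mu      sync.RWMutex
	indexes []Searcher
}

// Swap atomically replaces the set of indexes the alias points to.
// The old slice is never mutated, so in-flight searches keep a stable view.
func (a *IndexAlias) Swap(indexes ...Searcher) {
	a.mu.Lock()
	a.indexes = indexes
	a.mu.Unlock()
}

// Search queries every underlying index in parallel and merges the hits,
// highest score first.
func (a *IndexAlias) Search(query string) []Hit {
	a.mu.RLock()
	indexes := a.indexes
	a.mu.RUnlock()

	results := make([][]Hit, len(indexes))
	var wg sync.WaitGroup
	for i, idx := range indexes {
		wg.Add(1)
		go func(i int, idx Searcher) {
			defer wg.Done()
			results[i] = idx.Search(query)
		}(i, idx)
	}
	wg.Wait()

	var merged []Hit
	for _, r := range results {
		merged = append(merged, r...)
	}
	sort.SliceStable(merged, func(i, j int) bool { return merged[i].Score > merged[j].Score })
	return merged
}

// staticIndex is a toy Searcher that always returns the same hits.
type staticIndex []Hit

func (s staticIndex) Search(string) []Hit { return s }

func main() {
	alias := &IndexAlias{}
	alias.Swap(staticIndex{{"a", 0.5}}, staticIndex{{"b", 0.9}})
	fmt.Println(alias.Search("q")) // [{b 0.9} {a 0.5}]
}
```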
Marty Schoch
3a0263bb72 finished initial impl of fuzzy search
you can do a manual fuzzy term search using the FuzzyQuery struct
or, more suitable for most users, the MatchQuery now supports
some fuzzy options.  Here you can specify fuzziness and
prefix_length, to turn the underlying term search into a fuzzy
term search.  This has the benefit that analysis is performed
on your input, just like the analyzed field, prior to computing
the fuzzy variants.

closes #82
2014-10-24 13:39:48 -04:00
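The meaning of the fuzziness and prefix_length options can be illustrated with a brute-force check: a candidate term matches if it shares the first prefix_length characters with the (already analyzed) query term and is within `fuzziness` Levenshtein edits of it. This is a sketch under that assumption, not bleve's FuzzyQuery implementation, which would enumerate candidates from the term dictionary rather than comparing terms one by one; `levenshtein` and `fuzzyMatch` are hypothetical helpers.

```go
package main

import "fmt"

// min3 returns the smallest of three ints.
func min3(a, b, c int) int {
	m := a
	if b < m {
		m = b
	}
	if c < m {
		m = c
	}
	return m
}

// levenshtein returns the number of single-rune insertions, deletions and
// substitutions needed to turn a into b (two-row dynamic programming).
func levenshtein(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	prev := make([]int, len(rb)+1)
	curr := make([]int, len(rb)+1)
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(ra); i++ {
		curr[0] = i
		for j := 1; j <= len(rb); j++ {
			cost := 1
			if ra[i-1] == rb[j-1] {
				cost = 0
			}
			curr[j] = min3(prev[j]+1, curr[j-1]+1, prev[j-1]+cost)
		}
		prev, curr = curr, prev
	}
	return prev[len(rb)]
}

// fuzzyMatch reports whether term shares its first prefixLength runes with
// query and is at most fuzziness edits away from it.
func fuzzyMatch(query, term string, fuzziness, prefixLength int) bool {
	rq, rt := []rune(query), []rune(term)
	if prefixLength < 0 {
		prefixLength = 0
	}
	if prefixLength > len(rq) {
		prefixLength = len(rq)
	}
	if len(rt) < prefixLength || string(rt[:prefixLength]) != string(rq[:prefixLength]) {
		return false
	}
	return levenshtein(query, term) <= fuzziness
}

func main() {
	for _, t := range []string{"water", "waiter", "later", "wanted"} {
		fmt.Println(t, fuzzyMatch("water", t, 1, 2))
	}
}
```

Here "later" is one edit from "water" but is rejected because it does not share the two-character prefix; a non-zero prefix_length is what makes that cheap pruning possible.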
Marty Schoch
78467c0836 refactored yacc parsing code to be threadsafe, removed mutex 2014-10-23 15:56:59 -04:00
Marty Schoch
d485b0ef26 initial impl of fuzzy search 2014-10-23 13:02:29 -04:00
Steve Yen
91501262b6 typo with NewListIndexesHandler/NewCreateIndexHandler 2014-10-22 16:40:21 -07:00
Marty Schoch
0500a572af exposed Get/Set/Delete Internal methods
these are to be used to store side-channel information
along with the index
2014-10-22 16:03:55 -04:00
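One way to keep side-channel values in the same key/value store as the index, without colliding with index rows, is to namespace them under a reserved key prefix. The sketch below assumes that approach; the `"i:"` prefix and the `kvIndex` type are hypothetical and not necessarily how bleve lays out its rows.

```go
package main

import "fmt"

// internalPrefix is a hypothetical reserved prefix for side-channel keys,
// keeping them apart from the index's own rows in the same store.
const internalPrefix = "i:"

// kvIndex is a toy index backed by a single key/value map.
type kvIndex struct {
	kv map[string][]byte
}

func (x *kvIndex) SetInternal(key, val []byte) error {
	x.kv[internalPrefix+string(key)] = val
	return nil
}

// GetInternal returns nil if the key is absent.
func (x *kvIndex) GetInternal(key []byte) ([]byte, error) {
	return x.kv[internalPrefix+string(key)], nil
}

func (x *kvIndex) DeleteInternal(key []byte) error {
	delete(x.kv, internalPrefix+string(key))
	return nil
}

func main() {
	idx := &kvIndex{kv: map[string][]byte{}}
	_ = idx.SetInternal([]byte("checkpoint"), []byte("seq-42"))
	v, _ := idx.GetInternal([]byte("checkpoint"))
	fmt.Println(string(v))
}
```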
Marty Schoch
40a8154bab changed en analyzer to use pure go components
behavior should be similar with unicode segmentation
and a porter stemmer
2014-10-21 16:38:58 -04:00
Marty Schoch
c4d1782689 new pure go porter stemmer integrated
renamed original libstemmer porter to "stemmer_porter_classic"
new pure go stemmer is "stemmer_porter"
2014-10-20 16:55:24 -04:00
Marty Schoch
cf3643f292 added pure go tokenizer to do unicode word boundary segmentation 2014-10-17 18:07:48 -04:00
Marty Schoch
dcb90ad176 added benchmark for tokenizing English text 2014-10-17 18:07:01 -04:00
Marty Schoch
febb8d2df1 renamed unicode_word_boundary package to icu
this is in preparation for alternative unicode word boundary impls
2014-10-17 15:15:13 -04:00
Marty Schoch
7bf44e1ba7 added ability to return all document fields by requesting field * 2014-10-15 19:16:16 -04:00
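The `*` field request can be read as "return every stored field; otherwise only the named ones". A minimal sketch of that selection rule, with `selectFields` as a hypothetical helper:

```go
package main

import (
	"fmt"
	"sort"
)

// selectFields returns the stored fields to include in a result. If the
// request contains "*", every stored field is returned; otherwise only the
// named fields that exist.
func selectFields(stored map[string]string, requested []string) map[string]string {
	out := make(map[string]string)
	for _, name := range requested {
		if name == "*" {
			for k, v := range stored {
				out[k] = v
			}
			return out
		}
		if v, ok := stored[name]; ok {
			out[name] = v
		}
	}
	return out
}

func main() {
	doc := map[string]string{"title": "bleve", "body": "full-text search", "lang": "go"}
	got := selectFields(doc, []string{"*"})
	keys := make([]string, 0, len(got))
	for k := range got {
		keys = append(keys, k)
	}
	sort.Strings(keys) // map iteration order is random; sort for stable output
	fmt.Println(keys)  // [body lang title]
}
```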
Marty Schoch
8222fbea57 improve lexer handling of special characters
characters like + and - are special
but they should only be special at the beginning of strings
inside something that would otherwise be a string, we should just
let them be characters
closes #103
2014-10-10 20:45:57 -07:00
Marty Schoch
af6a5d27eb allow term searches for numbers
closes #108
2014-10-10 20:36:31 -07:00
Marty Schoch
19d45dfdb6 fix compilation with the latest changes to kagome 2014-10-10 19:59:24 -07:00
Marty Schoch
8be0652dc8 better String() impl when request.Size=0
closes #107
2014-10-10 18:08:20 -07:00
Marty Schoch
64b0066121 added support for tracking index stats and exposing via expvar
closes #83
2014-10-02 11:12:49 -07:00
Marty Schoch
97902e2619 text analysis now moved out of index write lock onto goroutine
1. text analysis is now done before the write lock is acquired
2. there is now a pool of analysis workers
3. the size of this pool is configurable
4. this allows for documents in a batch to be analyzed concurrently

as a part of benchmarking these changes i've also introduced a new
null storage implementation.  this should never be used, as it
does not actually build an index.  it does however let us go
through all the normal indexing machinery, without incurring
any indexing I/O.  this is very helpful in measuring improvements
made to the text analysis pipeline, which are often overshadowed
by indexing times in benchmarks actually building an index.
2014-09-24 08:13:14 -04:00
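Points 1 to 4 above amount to: analyze a batch on a fixed-size pool of goroutines, and take the index write lock only to apply the already-analyzed results. A sketch of that shape, with a stand-in analyzer (lowercase plus whitespace split) and hypothetical `Doc`, `Analyzed` and `Index` types:

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

type Doc struct {
	ID   string
	Text string
}

type Analyzed struct {
	ID     string
	Tokens []string
}

// analyze is a stand-in analyzer: lowercase, then split on whitespace.
func analyze(d Doc) Analyzed {
	return Analyzed{ID: d.ID, Tokens: strings.Fields(strings.ToLower(d.Text))}
}

// analyzeBatch analyzes docs concurrently on `workers` goroutines.
// Results are stored by position, so output order matches input order.
func analyzeBatch(docs []Doc, workers int) []Analyzed {
	if workers < 1 {
		workers = 1
	}
	jobs := make(chan int)
	out := make([]Analyzed, len(docs))
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range jobs {
				out[i] = analyze(docs[i])
			}
		}()
	}
	for i := range docs {
		jobs <- i
	}
	close(jobs)
	wg.Wait()
	return out
}

// Index is a toy inverted index: term -> document IDs.
type Index struct {
	mu    sync.Mutex
	terms map[string][]string
}

// Batch analyzes first (no lock held), then locks only to write.
func (x *Index) Batch(docs []Doc, workers int) {
	analyzed := analyzeBatch(docs, workers)
	x.mu.Lock()
	defer x.mu.Unlock()
	for _, a := range analyzed {
		for _, t := range a.Tokens {
			x.terms[t] = append(x.terms[t], a.ID)
		}
	}
}

func main() {
	idx := &Index{terms: map[string][]string{}}
	idx.Batch([]Doc{{"1", "Hello World"}, {"2", "hello bleve"}}, 4)
	fmt.Println(idx.terms["hello"]) // [1 2]
}
```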
Marty Schoch
1dc466a800 modified token filters to avoid creating new token stream
often the result stream was the same length, so we can reuse the
existing token stream
also, in cases where a new stream was required, set capacity to
the length of the input stream.  most output streams are at least
as long as the input, so this may avoid some subsequent resizing
2014-09-23 18:41:32 -04:00
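Both allocation strategies above map onto common Go slice idioms: a filter whose output is never longer than its input can write into the input's backing array (`out := in[:0]`), while a filter that may grow the stream allocates a new slice with capacity `len(in)`. The `Token`/`TokenStream` types and the three filters are hypothetical illustrations; note that in-place filtering overwrites the input, so callers must not keep using the old stream.

```go
package main

import (
	"fmt"
	"strings"
)

type Token struct{ Term string }

type TokenStream []*Token

// lowerFilter rewrites each token in place: same length in and out,
// so the input stream itself is returned.
func lowerFilter(in TokenStream) TokenStream {
	for _, t := range in {
		t.Term = strings.ToLower(t.Term)
	}
	return in
}

// stopFilter only removes tokens, so it reuses the input's backing array.
func stopFilter(in TokenStream, stop map[string]bool) TokenStream {
	out := in[:0]
	for _, t := range in {
		if !stop[t.Term] {
			out = append(out, t)
		}
	}
	return out
}

// splitFilter may emit more tokens than it receives, so it needs a new
// stream; starting with capacity len(in) avoids the smallest regrowths.
func splitFilter(in TokenStream) TokenStream {
	out := make(TokenStream, 0, len(in))
	for _, t := range in {
		for _, part := range strings.Split(t.Term, "-") {
			out = append(out, &Token{Term: part})
		}
	}
	return out
}

func main() {
	ts := TokenStream{{Term: "The"}, {Term: "Wi-Fi"}, {Term: "Router"}}
	ts = stopFilter(lowerFilter(ts), map[string]bool{"the": true})
	ts = splitFilter(ts)
	for _, t := range ts {
		fmt.Print(t.Term, " ")
	}
	fmt.Println()
}
```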
Marty Schoch
95e6e37e67 added build tag to fix running tests without tag 2014-09-16 11:28:44 -04:00
Marty Schoch
608b9163a3 Merge branch 'master' of github.com:blevesearch/bleve 2014-09-16 11:22:01 -04:00
Marty Schoch
55c0e84665 relocated kagome tokenizer and introduced ja analyzer 2014-09-16 11:21:29 -04:00
Silvan Jegen
29bdc094a9 Use byte positions instead of character positions 2014-09-14 13:19:30 +02:00
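Byte and character positions diverge as soon as multi-byte UTF-8 text appears, which matters for tokenizers such as Kagome that handle Japanese. Byte offsets can slice the original Go string directly; character (rune) offsets cannot. `termOffsets` is a hypothetical helper showing both:

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// termOffsets returns the byte offset and the character (rune) offset of
// the first occurrence of term in text, or -1, -1 if it is not found.
func termOffsets(text, term string) (byteStart, runeStart int) {
	byteStart = strings.Index(text, term)
	if byteStart < 0 {
		return -1, -1
	}
	return byteStart, utf8.RuneCountInString(text[:byteStart])
}

func main() {
	text := "日本語 text"
	b, r := termOffsets(text, "text")
	fmt.Println(b, r)                   // 10 4
	fmt.Println(text[b : b+len("text")]) // text
}
```

Each of the three kanji is 3 bytes in UTF-8, so "text" starts at byte 10 but at character 4.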
Marty Schoch
3dc66b5338 Merge pull request #99 from jingweno/patch-1
Update README.md
2014-09-13 22:45:21 -04:00
Jingwen Owen Ou
79691770c4 Update README.md
Fix broken example.
2014-09-13 19:38:24 -07:00
Silvan Jegen
a8ec7f7af2 Add tests for the Kagome tokenizer 2014-09-13 17:45:22 +02:00
Silvan Jegen
ebf100c097 Add the Kagome tokenizer for Japanese 2014-09-13 17:45:19 +02:00
Marty Schoch
198ca1ad4d major refactor of kvstore/index internals, see below
In the index/store package
introduce KVReader
  creates snapshot
  all read operations consistent from this snapshot
  must close to release

introduce KVWriter
  only one writer active
  access to all operations
  allows for consistent read-modify-write
  must close to release

introduce AssociativeMerge operation on batch
  allows efficient read-modify-write
  for associative operations
  used to consolidate updates to the term summary rows
  saves 1 set and 1 get op per shared instance of term in field

In the index package
introduced an IndexReader
  exposes a consistent snapshot of the index for searching

At top level
  All searches now operate on a consistent snapshot of the index
2014-09-12 17:21:35 -04:00
Marty Schoch
7819deb447 added boltdb benchmark, same as others 2014-09-12 16:55:50 -04:00
Marty Schoch
2294b24b9d remove forestdb for now
not any benefit in maintaining this for the time being
2014-09-12 16:55:11 -04:00
Marty Schoch
8c16d68c00 include cjk analyzer in default config 2014-09-11 10:44:14 -04:00
Marty Schoch
1a1cf32a86 introducing cjk_bigram filter and cjk analyzer
closes #34
2014-09-11 10:39:05 -04:00
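A CJK bigram filter turns a run of single-character CJK tokens (the kind the whitespace tokenizer emits after 6b4c86b35a) into overlapping two-character tokens. The sketch below shows that core step; `cjkBigrams` is hypothetical, and emitting a lone character as a unigram is an assumed policy, not necessarily bleve's default.

```go
package main

import "fmt"

// cjkBigrams turns a run of single-character CJK tokens into overlapping
// bigrams. A lone character is kept as a unigram (an assumed policy).
func cjkBigrams(chars []string) []string {
	if len(chars) == 1 {
		return []string{chars[0]}
	}
	var out []string
	for i := 0; i+1 < len(chars); i++ {
		out = append(out, chars[i]+chars[i+1])
	}
	return out
}

func main() {
	fmt.Println(cjkBigrams([]string{"東", "京", "都"})) // [東京 京都]
}
```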
Marty Schoch
cb5ccd2b1d fix whitespace tokenizer
previously it would fail to split ASCII text running directly into ideographic characters
2014-09-11 10:38:02 -04:00
Marty Schoch
8debf26cb7 changed many components to not have defaults
many of these defaults were arbitrary, and not having
defaults lets us more easily flag them for configuration
added a shingle filter
introduce new token type for shingles
2014-09-09 18:15:14 -04:00
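A shingle filter emits every run of consecutive tokens within a configured size range. In keeping with the "no defaults" change above, the sketch requires the caller to pass the minimum size, maximum size and separator explicitly. `shingles` is a hypothetical helper, and it omits the separate token type the commit introduces.

```go
package main

import (
	"fmt"
	"strings"
)

// shingles returns every run of minSize..maxSize consecutive tokens,
// joined by sep, in order of starting position.
func shingles(tokens []string, minSize, maxSize int, sep string) []string {
	var out []string
	for i := range tokens {
		for n := minSize; n <= maxSize && i+n <= len(tokens); n++ {
			out = append(out, strings.Join(tokens[i:i+n], sep))
		}
	}
	return out
}

func main() {
	fmt.Printf("%q\n", shingles([]string{"please", "divide", "this"}, 2, 2, " "))
}
```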
Marty Schoch
8dd8fb8910 fix compilation 2014-09-07 14:13:32 -04:00
Marty Schoch
6b4c86b35a changed whitespace tokenizer to work better on cjk input
now it will return each cjk character as a separate token
this will pair well with a cjk bigram filter for indexing
2014-09-07 14:11:01 -04:00
Marty Schoch
933d99c576 rename the configurable token map from standard to custom
this makes it consistent with the "custom" analyzer
which operates similarly
also, added it to the config.go so it's registered and
available for use
2014-09-07 14:09:38 -04:00
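"Registered and available for use" refers to a registry pattern: each component type registers a constructor under a name, and configuration looks the name up to build the component. The sketch below assumes that pattern; the `"custom"` type name comes from this commit, but `register`, `build`, the constructor signature and the `"tokens"` config key are hypothetical, not bleve's registry API.

```go
package main

import "fmt"

// TokenMap is a set of tokens, e.g. a stop-word list.
type TokenMap map[string]bool

// constructor builds a token map from its (decoded JSON) config section.
type constructor func(config map[string]interface{}) (TokenMap, error)

var tokenMaps = map[string]constructor{}

// register makes a constructor available by type name.
func register(name string, c constructor) {
	if _, dup := tokenMaps[name]; dup {
		panic("duplicate registration: " + name)
	}
	tokenMaps[name] = c
}

// customTokenMap builds a token map from a "tokens" list in the config.
func customTokenMap(config map[string]interface{}) (TokenMap, error) {
	list, ok := config["tokens"].([]interface{})
	if !ok {
		return nil, fmt.Errorf("custom token map: 'tokens' must be a list")
	}
	tm := TokenMap{}
	for _, t := range list {
		s, ok := t.(string)
		if !ok {
			return nil, fmt.Errorf("custom token map: token %v is not a string", t)
		}
		tm[s] = true
	}
	return tm, nil
}

// Registration typically happens in the component package's init().
func init() { register("custom", customTokenMap) }

// build looks up a registered type and constructs it from config.
func build(typ string, config map[string]interface{}) (TokenMap, error) {
	c, ok := tokenMaps[typ]
	if !ok {
		return nil, fmt.Errorf("unknown token map type %q", typ)
	}
	return c(config)
}

func main() {
	tm, err := build("custom", map[string]interface{}{"tokens": []interface{}{"the", "a"}})
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(len(tm), tm["the"]) // 2 true
}
```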
Marty Schoch
22911888c4 refactor registry package and bleve_registry utility 2014-09-07 14:07:42 -04:00
Marty Schoch
9e78643bad icu tokenizer uses brk status to set token type
part of #34
2014-09-07 10:24:02 -04:00
Marty Schoch
44df73d317 apply doc fix patch from rakoo
closes #95
2014-09-07 09:09:47 -04:00
Marty Schoch
f87a22e24c added json struct tag to http doc count response 2014-09-05 12:16:26 -04:00
Marty Schoch
b1dd4215fc added features to readme 2014-09-04 15:09:19 -04:00
Marty Schoch
f384f9dead added link to wiki search to readme 2014-09-04 14:43:25 -04:00
Marty Schoch
d90697f725 added features to readme 2014-09-04 14:31:26 -04:00
Marty Schoch
afdb5f057f added convenience method to add field to highlight request 2014-09-04 10:13:13 -04:00
Marty Schoch
9d2187706e another round of golint 2014-09-03 19:53:59 -04:00
Marty Schoch
8b9255f52f even more golint cleanups 2014-09-03 19:32:27 -04:00