Commit Graph

349 Commits

Author SHA1 Message Date
Marty Schoch
67beaca6d6 fix to phrase/phrase match search involving stop words
closes #122
2014-11-25 10:07:54 -05:00
Marty Schoch
316970df13 better fix for returning on first error
closes #126
2014-11-25 10:04:15 -05:00
Marty Schoch
3c886276ed fix error message typo 2014-11-24 17:14:44 -05:00
Marty Schoch
69d69e4516 fix panic in MultiSearch when all indexes return error
fixes #126
2014-11-24 17:12:16 -05:00
Marty Schoch
5486c519b6 added index tests 2014-11-21 16:47:20 -05:00
Marty Schoch
a8ca4d67a0 improving test coverage of search 2014-11-21 15:37:09 -05:00
Marty Schoch
a313928be2 adding scripts for new code coverage 2014-11-21 14:02:17 -05:00
Marty Schoch
12ec3173fa added integration test for fuzzy search 2014-11-21 14:01:48 -05:00
Marty Schoch
560990d29c Merge pull request #125 from Shugyousha/simplereturn
Remove unneeded else clauses
2014-11-20 14:45:57 -05:00
Silvan Jegen
e3a2d3b58b Remove unneeded else clauses 2014-11-20 20:34:05 +01:00
Marty Schoch
68a2b9614d refactored integration tests into separate package
also made integration tests declarative
you can now easily define new datasets/mappings/searches/results
2014-11-19 15:58:15 -05:00
Marty Schoch
eb16b3c563 properly return multi-value fields in an array 2014-11-19 15:58:15 -05:00
Marty Schoch
19305c6b5f fix bug which prevented size=0 via JSON 2014-11-19 15:58:15 -05:00
Marty Schoch
d673f3404d Merge pull request #124 from miku/bleve_bulkindex
adding bleve_bulkindex utility
2014-11-19 14:03:46 -05:00
Martin Czygan
95ae51f59d adding bleve_bulkindex utility
Usage:

    bleve_bulkindex -index path file.ldj [file2.ldj, ...]

where file.ldj is a line-delimited JSON file,
with each line representing a document. docIDs are autogenerated.
2014-11-19 19:46:24 +01:00
Marty Schoch
4f61bbfede add ability to configure color for ANSI highlighter
closes #123
2014-11-19 09:27:57 -05:00
Marty Schoch
d452b2a10e add support for dictionary based compound word filter
partially addresses #115
2014-11-18 15:18:42 -05:00
Marty Schoch
47bc7caec3 added getRollbackID() and rollbackTo() to the ForestDB store 2014-11-04 08:34:49 -05:00
Marty Schoch
5f40396ce8 removes dependency on mux from bleve.http
handlers now use varLookupFunc's to get variable values from
the request object.  this allows applications to use whatever
mux/router they want, and extracting variables like
indexName and docID is up to the caller-provided function

closes #113
2014-10-31 14:44:15 -04:00
Marty Schoch
3f83149ed3 adding back the forestdb kv store impl 2014-10-31 09:42:32 -04:00
Marty Schoch
c7443fe52b refactored API a bit
more things can return error now
in a couple of places we had to swallow errors because they didn't
fit the existing API.  in these cases and proactively in a few
others we now return error as well.

also the batch API has been updated to allow performing
set/delete internal within the batch
2014-10-31 09:40:23 -04:00
Marty Schoch
84d1cdf216 fix go vet issues 2014-10-29 09:31:03 -04:00
Marty Schoch
51a59cb05c initial impl of Index Aliases
an IndexAlias allows you to easily work with one logical Index
while changing the actual Index it's pointing to behind the scenes
Changing which actual Index is backing an IndexAlias can be done
atomically so that your application smoothly transitions from
one Index to another.
A separate use of IndexAlias is allowed when the IndexAlias is
defined to point to multiple Indexes.  In this case only the
Search() operation is supported, but the Search will be run
on each of the underlying indexes in parallel, and the results
will be merged.
2014-10-29 09:22:11 -04:00
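The two uses of an alias described in the commit above (atomic swapping, and fan-out search with merged results) might look roughly like this. The `alias`, `searcher`, `hit`, and `staticIndex` types are hypothetical stand-ins, and sorting by descending score is an assumed merge policy; none of this is the actual IndexAlias implementation.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
	"sync"
)

type hit struct {
	ID    string
	Score float64
}

// searcher is a hypothetical stand-in for an index that can be searched.
type searcher interface {
	Search(q string) ([]hit, error)
}

// alias points at one or more indexes; Swap replaces the set atomically.
type alias struct {
	mu      sync.RWMutex
	indexes []searcher
}

func (a *alias) Swap(next ...searcher) {
	cp := append([]searcher(nil), next...) // copy so callers cannot mutate it
	a.mu.Lock()
	a.indexes = cp
	a.mu.Unlock()
}

// Search runs the query on every underlying index in parallel and
// merges the hits by descending score (assumed merge policy).
func (a *alias) Search(q string) ([]hit, error) {
	a.mu.RLock()
	targets := a.indexes
	a.mu.RUnlock()
	if len(targets) == 0 {
		return nil, errors.New("alias has no indexes")
	}
	results := make([][]hit, len(targets))
	errs := make([]error, len(targets))
	var wg sync.WaitGroup
	for i, t := range targets {
		wg.Add(1)
		go func(i int, t searcher) {
			defer wg.Done()
			results[i], errs[i] = t.Search(q)
		}(i, t)
	}
	wg.Wait()
	var merged []hit
	for i := range targets {
		if errs[i] != nil {
			return nil, errs[i]
		}
		merged = append(merged, results[i]...)
	}
	sort.SliceStable(merged, func(i, j int) bool {
		return merged[i].Score > merged[j].Score
	})
	return merged, nil
}

// staticIndex returns fixed hits; it exists only for the demo.
type staticIndex []hit

func (s staticIndex) Search(string) ([]hit, error) { return s, nil }

func main() {
	a := &alias{}
	a.Swap(staticIndex{{ID: "a", Score: 0.5}}, staticIndex{{ID: "b", Score: 0.9}})
	hits, err := a.Search("anything")
	fmt.Println(hits, err)
}
```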
Marty Schoch
3a0263bb72 finished initial impl of fuzzy search
you can do a manual fuzzy term search using the FuzzyQuery struct
or, more suitable for most users, the MatchQuery now supports
some fuzzy options.  Here you can specify fuzziness and
prefix_length, to turn the underlying term search into a fuzzy
term search.  This has the benefit that analysis is performed
on your input, just like the analyzed field, prior to computing
the fuzzy variants.

closes #82
2014-10-24 13:39:48 -04:00
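To make the two options concrete, here is a minimal sketch of how fuzziness and prefix_length could interact. It assumes fuzziness is a maximum Levenshtein edit distance and prefix_length is the number of leading characters that must match exactly; `fuzzyMatch` and `levenshtein` are hypothetical helpers, not the functions bleve uses.

```go
package main

import "fmt"

func min3(a, b, c int) int {
	if b < a {
		a = b
	}
	if c < a {
		a = c
	}
	return a
}

// levenshtein returns the edit distance between a and b, counted in runes.
func levenshtein(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	prev := make([]int, len(rb)+1)
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(ra); i++ {
		cur := make([]int, len(rb)+1)
		cur[0] = i
		for j := 1; j <= len(rb); j++ {
			cost := 1
			if ra[i-1] == rb[j-1] {
				cost = 0
			}
			cur[j] = min3(prev[j]+1, cur[j-1]+1, prev[j-1]+cost)
		}
		prev = cur
	}
	return prev[len(rb)]
}

// fuzzyMatch reports whether candidate shares the first prefixLen runes
// with term and is within fuzziness edits of it.
func fuzzyMatch(term, candidate string, fuzziness, prefixLen int) bool {
	rt, rc := []rune(term), []rune(candidate)
	if prefixLen < 0 {
		prefixLen = 0
	}
	if prefixLen > len(rt) {
		prefixLen = len(rt)
	}
	if len(rc) < prefixLen || string(rt[:prefixLen]) != string(rc[:prefixLen]) {
		return false
	}
	return levenshtein(term, candidate) <= fuzziness
}

func main() {
	fmt.Println(fuzzyMatch("water", "wafer", 1, 0)) // true: one substitution
	fmt.Println(fuzzyMatch("water", "later", 1, 1)) // false: first rune differs
}
```

A non-zero prefix length narrows the set of dictionary terms that need a distance computation at all, which is the usual reason for exposing it.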
Marty Schoch
78467c0836 refactored yacc parsing code to be threadsafe, removed mutex 2014-10-23 15:56:59 -04:00
Marty Schoch
d485b0ef26 initial impl of fuzzy search 2014-10-23 13:02:29 -04:00
Steve Yen
91501262b6 typo with NewListIndexesHandler/NewCreateIndexHandler 2014-10-22 16:40:21 -07:00
Marty Schoch
0500a572af exposed Get/Set/Delete Internal methods
these are to be used to store side-channel information
along with the index
2014-10-22 16:03:55 -04:00
Marty Schoch
40a8154bab changed en analyzer to use pure go components
behavior should be similar with unicode segmentation
and a porter stemmer
2014-10-21 16:38:58 -04:00
Marty Schoch
c4d1782689 new pure go porter stemmer integrated
renamed original libstemmer porter to "stemmer_porter_classic"
new pure go stemmer is "stemmer_porter"
2014-10-20 16:55:24 -04:00
Marty Schoch
cf3643f292 added pure go tokenizer to do unicode word boundary segmentation 2014-10-17 18:07:48 -04:00
Marty Schoch
dcb90ad176 added benchmark for tokenizing English text 2014-10-17 18:07:01 -04:00
Marty Schoch
febb8d2df1 renamed unicode_word_boundary package to icu
this is in preparation of alternative unicode word boundary impls
2014-10-17 15:15:13 -04:00
Marty Schoch
7bf44e1ba7 added ability to return all document fields by requesting field * 2014-10-15 19:16:16 -04:00
Marty Schoch
8222fbea57 improve lexer handling of special characters
characters like + and - are special
but they should only be special at the beginning of strings
inside something that would otherwise be a string we should just
let them be characters
closes #103
2014-10-10 20:45:57 -07:00
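The rule in the commit above can be shown with a toy tokenizer. This is only a sketch of the behavior, assuming whitespace-separated terms; the `clause` type and `parse` function are hypothetical and unrelated to the actual lexer code.

```go
package main

import (
	"fmt"
	"strings"
)

// clause is a term with an optional leading operator ('+', '-', or 0).
type clause struct {
	Op   byte
	Term string
}

// parse treats + and - as operators only at the start of a term;
// anywhere else they are ordinary characters.
func parse(q string) []clause {
	var out []clause
	for _, tok := range strings.Fields(q) {
		c := clause{Term: tok}
		if len(tok) > 1 && (tok[0] == '+' || tok[0] == '-') {
			c.Op, c.Term = tok[0], tok[1:]
		}
		out = append(out, c)
	}
	return out
}

func main() {
	for _, c := range parse("+beer -light well-known c++") {
		fmt.Printf("op=%q term=%s\n", c.Op, c.Term)
	}
}
```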
Marty Schoch
af6a5d27eb allow term searches for numbers
closes #108
2014-10-10 20:36:31 -07:00
Marty Schoch
19d45dfdb6 fix compilation with the latest changes to kagome 2014-10-10 19:59:24 -07:00
Marty Schoch
8be0652dc8 better String() impl when request.Size=0
closes #107
2014-10-10 18:08:20 -07:00
Marty Schoch
64b0066121 added support for tracking index stats and exposing via expvar
closes #83
2014-10-02 11:12:49 -07:00
Marty Schoch
97902e2619 text analysis now moved out of index write lock onto goroutine
1. text analysis is now done before the write lock is acquired
2. there is now a pool of analysis workers
3. the size of this pool is configurable
4. this allows for documents in a batch to be analyzed concurrently

as a part of benchmarking these changes i've also introduced a new
null storage implementation.  this should never be used, as it
does not actually build an index.  it does however let us go
through all the normal indexing machinery, without incurring
any indexing I/O.  this is very helpful in measuring improvements
made to the text analysis pipeline, which are often overshadowed
by indexing times in benchmarks actually building an index.
2014-09-24 08:13:14 -04:00
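The four points in the commit above can be sketched as a small worker pool that analyzes a batch before the write lock is taken. Everything here is hypothetical (`index`, `analyze`, `Batch`), and the pool is created per batch for brevity, whereas the commit describes a standing pool of workers.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

type analyzed struct {
	ID     string
	Tokens []string
}

// index is a toy index: a lock-guarded map of document ID to tokens.
type index struct {
	mu   sync.Mutex
	rows map[string][]string
}

// analyze stands in for text analysis (here: lowercase and split).
func analyze(id, text string) analyzed {
	return analyzed{ID: id, Tokens: strings.Fields(strings.ToLower(text))}
}

// Batch analyzes all documents concurrently with the given number of
// workers, and only then takes the write lock to store the results.
func (ix *index) Batch(docs map[string]string, workers int) {
	if workers < 1 {
		workers = 1
	}
	type job struct{ id, text string }
	jobs := make(chan job)
	results := make(chan analyzed, len(docs)) // buffered: workers never block
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := range jobs {
				results <- analyze(j.id, j.text)
			}
		}()
	}
	for id, text := range docs {
		jobs <- job{id, text}
	}
	close(jobs)
	wg.Wait()
	close(results)

	ix.mu.Lock() // the lock is held only while writing, not while analyzing
	defer ix.mu.Unlock()
	for r := range results {
		ix.rows[r.ID] = r.Tokens
	}
}

func main() {
	ix := &index{rows: map[string][]string{}}
	ix.Batch(map[string]string{"a": "Hello World", "b": "Go"}, 2)
	fmt.Println(len(ix.rows), ix.rows["a"])
}
```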
Marty Schoch
1dc466a800 modified token filters to avoid creating new token stream
often the result stream was the same length, so can reuse the
existing token stream
also, in cases where a new stream was required, set capacity to
the length of the input stream.  most output streams are at least
as long as the input, so this may avoid some subsequent resizing
2014-09-23 18:41:32 -04:00
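Both allocation strategies from the commit above can be seen in a short sketch. The `token` and `tokenStream` types and the two filters are simplified, hypothetical stand-ins rather than bleve's analysis types.

```go
package main

import (
	"bytes"
	"fmt"
)

type token struct{ Term []byte }

type tokenStream []*token

// lowerCaseFilter produces the same number of tokens as it receives,
// so it rewrites each token and returns the input stream itself.
func lowerCaseFilter(in tokenStream) tokenStream {
	for _, t := range in {
		t.Term = bytes.ToLower(t.Term)
	}
	return in
}

// splitHyphenFilter may emit more tokens than it receives, so it needs a
// new stream. Starting at capacity len(in) avoids the early regrowth steps,
// since the output is usually at least as long as the input.
func splitHyphenFilter(in tokenStream) tokenStream {
	out := make(tokenStream, 0, len(in))
	for _, t := range in {
		for _, part := range bytes.Split(t.Term, []byte("-")) {
			if len(part) > 0 {
				out = append(out, &token{Term: part})
			}
		}
	}
	return out
}

func main() {
	in := tokenStream{{Term: []byte("Well-Known")}, {Term: []byte("Fact")}}
	out := splitHyphenFilter(lowerCaseFilter(in))
	for _, t := range out {
		fmt.Println(string(t.Term))
	}
}
```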
Marty Schoch
95e6e37e67 added build tag to fix running tests without tag 2014-09-16 11:28:44 -04:00
Marty Schoch
608b9163a3 Merge branch 'master' of github.com:blevesearch/bleve 2014-09-16 11:22:01 -04:00
Marty Schoch
55c0e84665 relocated kagome tokenizer and introduced ja analyzer 2014-09-16 11:21:29 -04:00
Silvan Jegen
29bdc094a9 Use byte positions instead of character positions 2014-09-14 13:19:30 +02:00
Marty Schoch
3dc66b5338 Merge pull request #99 from jingweno/patch-1
Update README.md
2014-09-13 22:45:21 -04:00
Jingwen Owen Ou
79691770c4 Update README.md
Fix broken example.
2014-09-13 19:38:24 -07:00
Silvan Jegen
a8ec7f7af2 Add tests for the Kagome tokenizer 2014-09-13 17:45:22 +02:00
Silvan Jegen
ebf100c097 Add the Kagome tokenizer for Japanese 2014-09-13 17:45:19 +02:00
Marty Schoch
198ca1ad4d major refactor of kvstore/index internals, see below
In the index/store package
introduce KVReader
  creates snapshot
  all read operations consistent from this snapshot
  must close to release

introduce KVWriter
  only one writer active
  access to all operations
  allows for consistent read-modify-write
  must close to release

introduce AssociativeMerge operation on batch
  allows efficient read-modify-write
  for associative operations
  used to consolidate updates to the term summary rows
  saves 1 set and 1 get op per shared instance of term in field

In the index package
introduced an IndexReader
  exposes a consistent snapshot of the index for searching

At top level
  All searches now operate on a consistent snapshot of the index
2014-09-12 17:21:35 -04:00
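The reader/writer split described above can be sketched with an in-memory store. This is an illustration under stated assumptions: the `store`, `reader`, and `writer` types and their methods are hypothetical, snapshots are made by copying the map (a real KV store would use its own snapshot mechanism), and the AssociativeMerge operation is not shown.

```go
package main

import (
	"fmt"
	"sync"
)

// store is a toy in-memory KV store.
type store struct {
	mu   sync.RWMutex // guards data
	wmu  sync.Mutex   // held while a writer is open: only one writer active
	data map[string]string
}

func newStore() *store { return &store{data: map[string]string{}} }

// reader sees the store as it was when the reader was created.
type reader struct{ snap map[string]string }

func (s *store) Reader() *reader {
	s.mu.RLock()
	defer s.mu.RUnlock()
	snap := make(map[string]string, len(s.data))
	for k, v := range s.data {
		snap[k] = v
	}
	return &reader{snap: snap}
}

func (r *reader) Get(k string) (string, bool) { v, ok := r.snap[k]; return v, ok }

// Close releases the snapshot.
func (r *reader) Close() { r.snap = nil }

// writer has access to reads and writes; holding it excludes other
// writers, which makes read-modify-write sequences consistent.
type writer struct{ s *store }

func (s *store) Writer() *writer {
	s.wmu.Lock()
	return &writer{s: s}
}

func (w *writer) Get(k string) (string, bool) {
	w.s.mu.RLock()
	defer w.s.mu.RUnlock()
	v, ok := w.s.data[k]
	return v, ok
}

func (w *writer) Set(k, v string) {
	w.s.mu.Lock()
	w.s.data[k] = v
	w.s.mu.Unlock()
}

// Close releases the writer so another one can be opened.
func (w *writer) Close() { w.s.wmu.Unlock() }

func main() {
	s := newStore()
	w := s.Writer()
	w.Set("term", "1")
	w.Close()

	r := s.Reader() // snapshot taken here
	w = s.Writer()
	w.Set("term", "2")
	w.Close()

	old, _ := r.Get("term")
	r.Close()
	r2 := s.Reader()
	cur, _ := r2.Get("term")
	r2.Close()
	fmt.Println(old, cur)
}
```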