0
0
Commit Graph

28 Commits

Author SHA1 Message Date
Patrick Mezard
19230b2f8a searcher_docid: catch DocIDReader.Close() possible error 2015-11-04 19:24:01 +01:00
Patrick Mezard
ff7234d893 query_docid: add DocIDQuery to filter by document identifiers 2015-11-04 18:41:16 +01:00
Marty Schoch
900f1b4a67 major kvstore interface and impl overhaul
clarified the interface contract
2015-09-23 11:25:47 -07:00
Marty Schoch
dbb93b75a4 refactoring to allow pluggable index encodings
this lays the foundation for supporting the new firestorm
indexing scheme.  i'm merging these changes ahead of
the rest of the firestorm branch so i can continue
to make changes to the analysis pipeline in parallel
2015-09-02 13:12:08 -04:00
Marty Schoch
a9c07acbfa refactor of kvstore api to support native merge in rocksdb
refactor to share code in emulated batch
refactor to share code in emulated merge
refactor index kvstore benchmarks to share more code
refactor index kvstore benchmarks to be more repeatable
2015-04-24 17:13:50 -04:00
Marty Schoch
539aeb8dc7 fix errors identified by errcheck
part of #169
2015-04-07 18:05:41 -04:00
Marty Schoch
c8d974048a fix issues identified by errcheck
part of #169
2015-04-07 14:59:35 -04:00
Marty Schoch
867110e03b major improvements to index row encoding
improvements uncovered some issues with how k/v data was copied
or not.  to address this, kv abstraction layer now lets impl
specify if the bytes returned are safe to use after a reader
(or writer since writers are also readers) are closed
See index/store/KVReader - BytesSafeAfterClose() bool
false is the safe value if you're not sure
it will cause index impls to copy the data
Some kv impls already have created a copy a the C-api barrier
in which case they can safely return true.

Overall this yields ~25% speedup for searches with leveldb.
It yields ~10% speedup for boltdb.
Returning stored fields is now slower with boltdb, as previously
we were returning unsafe bytes.
2015-04-03 16:50:48 -04:00
Marty Schoch
a41f229b14 added regexp and wildcard queries
fixes #152
2015-03-11 16:57:22 -04:00
Marty Schoch
183fcd4b14 added a missing check for errors 2015-03-11 16:56:01 -04:00
Marty Schoch
522f9d5cc7 significant change to index format, support dictionary rows
this introduces disk format v4
now the summary rows for a term are stored in their own
"dictionary row" format, previously the same information
was stored in special term frequency rows
this now allows us to easily iterate all the terms for a field
in sorted order (useful for many other fuzzy data structures)

at the top-level of bleve you can now browse terms within a field
using the following api on the Index interface:

  FieldDict(field string) (index.FieldDict, error)
  FieldDictRange(field string, startTerm []byte, endTerm []byte) (index.FieldDict, error)
  FieldDictPrefix(field string, termPrefix []byte) (index.FieldDict, error)

fixes #127
2015-03-10 16:22:19 -04:00
Marty Schoch
300ec79c96 first pass at checking errors that were ignored
part of #169
2015-03-06 14:46:29 -05:00
Marty Schoch
0ed47f5343 fix advance logic to not skip over result 2015-01-22 09:56:40 -05:00
Marty Schoch
5a09ceeac8 fix traversal logic when not in expected order 2015-01-22 09:56:21 -05:00
Silvan Jegen
ef18dfe4cd Fix typos in comments and strings 2014-12-18 18:43:12 +01:00
Marty Schoch
fc33752c80 moved levenshtein code outside of fuzzy searcher
should allow easier reuse
2014-12-12 13:23:06 -05:00
Marty Schoch
67beaca6d6 fix to phrase/phrase match search involving stop words
closes #122
2014-11-25 10:07:54 -05:00
Marty Schoch
c7443fe52b refactored API a bit
more things can return error now
in a couple of places we had to swallow errors because they didn't
fit the existing API.  in these case and proactively in a few
others we now return error as well.

also the batch API has been updated to allow performing
set/delete internal within the batch
2014-10-31 09:40:23 -04:00
Marty Schoch
3a0263bb72 finished initial impl of fuzzy search
you can do a manual fuzzy term search using the FuzzyQuery struct
or, more suitable for most users the MatchQuery now supports
some fuzzy options.  Here you can specify fuzziness and
prefix_length, to turn the underlying term search into a fuzzy
term search.  This has the benefit that analysis is performed
on your input, just like the analyzed field, prior to computing
the fuzzy variants.

closes #82
2014-10-24 13:39:48 -04:00
Marty Schoch
d485b0ef26 initial impl of fuzzy search 2014-10-23 13:02:29 -04:00
Marty Schoch
97902e2619 text analysis now moved out of index write lock onto goroutine
1. text analysis is now done before the write lock is acquired
2. there is now a pool of analysis workers
3. the size of this pool is configurable
4. this allows for documents in a batch to be analyzed concurrently

as a part of benchmarking these changes i've also introduce a new
null storage implementation.  this should never be used, as it
does not actualy build an index.  it does however let us go
through all the normal indexing machinery, without incuring
any indexing I/O.  this is very helpful in measuring improvements
made to the text analsysis pipeline, which are often overshadowed
by indexing times in benchmarks actually building an index.
2014-09-24 08:13:14 -04:00
Marty Schoch
198ca1ad4d major refactor of kvstore/index internals, see below
In the index/store package
introduce KVReader
  creates snapshot
  all read operations consistent from this snapshot
  must close to release

introduce KVWriter
  only one writer active
  access to all operations
  allows for consisten read-modify-write
  must close to release

introduce AssociativeMerge operation on batch
  allows efficient read-modify-write
  for associative operations
  used to consolidate updates to the term summary rows
  saves 1 set and 1 get op per shared instance of term in field

In the index package
introduced an IndexReader
  exposes a consisten snapshot of the index for searching

At top level
  All searches now operate on a consisten snapshot of the index
2014-09-12 17:21:35 -04:00
Marty Schoch
9d2187706e another round of golint 2014-09-03 19:53:59 -04:00
Marty Schoch
8b9255f52f even more golint cleanups 2014-09-03 19:32:27 -04:00
Marty Schoch
e1b77956d4 more golint cleanups 2014-09-03 18:47:02 -04:00
Marty Schoch
8e6c8e5644 continued refactoring of the mapping code
also renamed some constant that didnt follow go convetions
2014-09-03 13:02:10 -04:00
Marty Schoch
7a7eb2e94c add newline between license and package
this avoids cluttering godocs with the license
2014-09-02 10:54:50 -04:00
Marty Schoch
2ee7289bc8 major refactor of search package
this started initially to relocate highlighting into
a self contained package, which would then also use
the registry
however, it turned into a much larger refactor in
order to avoid cyclic imports
now facets, searchers, scorers and collectors
are also broken out into subpackages of search
2014-09-01 11:15:38 -04:00