our implementation uses golang.org/x/net/context
New method SearchInContext() allows the user to run a search
in the provided context. If that context is cancelled or
exceeds its deadline Bleve will attempt to stop and return
as soon as possible. This is a *best effort* attempt at this
time, and the search may *not* stop in a timely manner. If the
caller must return very near the timeout, the call should also
be wrapped in a goroutine.
The IndexAlias implementation is affected in a slightly more
complex way. In order to return partial results when a timeout
occurs on some indexes, the timeout is strictly enforced, and
at the moment this does introduce an additional goroutine.
The Bleve implementation honoring the context is currently
very coarse-grained. Specifically, we check the Done() channel
between each DocumentMatch produced during the search. In the
future we will propagate the context deeper into the internals
of Bleve, and this will allow finer-grained timeout behavior.
the Status section can report the number of total/failed/successful
indexes when querying across multiple indexes through an IndexAlias.
Further, searching an IndexAlias will now return partial results;
the burden is on the caller to check the number of failed
indexes and decide how to handle this situation.
it appears that a document lookup can fail for an id
that was returned as a search hit;
since we're using a stable snapshot, this should not happen
this lays the foundation for supporting the new firestorm
indexing scheme. I'm merging these changes ahead of
the rest of the firestorm branch so I can continue
to make changes to the analysis pipeline in parallel
refactor to share code in emulated batch
refactor to share code in emulated merge
refactor index kvstore benchmarks to share more code
refactor index kvstore benchmarks to be more repeatable
this introduces disk format v4
now the summary rows for a term are stored in their own
"dictionary row" format; previously the same information
was stored in special term frequency rows
this now allows us to easily iterate all the terms for a field
in sorted order (useful for many other fuzzy data structures)
at the top level of bleve you can now browse the terms within a field
using the following api on the Index interface:
FieldDict(field string) (index.FieldDict, error)
FieldDictRange(field string, startTerm []byte, endTerm []byte) (index.FieldDict, error)
FieldDictPrefix(field string, termPrefix []byte) (index.FieldDict, error)
fixes #127
batches are now created through the index itself
mapping problems are reported early, at the time
data is added to the batch; previously these
were not reported until the batch was executed
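Creating the batch through the index lets the batch reach the index's mapping, so each document can be validated when it is added rather than when the batch executes. A minimal sketch; `Index`, `Batch`, and the "mapping" (reduced here to a required-field check) are hypothetical simplifications, not Bleve's types.

```go
package main

import (
	"errors"
	"fmt"
)

// mapping is a hypothetical stand-in for an index mapping: here it only
// requires certain fields to be present.
type mapping struct{ required []string }

func (m mapping) validate(doc map[string]string) error {
	for _, f := range m.required {
		if _, ok := doc[f]; !ok {
			return fmt.Errorf("missing required field %q", f)
		}
	}
	return nil
}

type Index struct{ m mapping }

// Batch keeps a reference to its index so it can apply the mapping early.
type Batch struct {
	idx  *Index
	docs map[string]map[string]string
}

// NewBatch creates a batch bound to this index.
func (i *Index) NewBatch() *Batch {
	return &Batch{idx: i, docs: map[string]map[string]string{}}
}

// Index validates the document immediately, so mapping errors surface
// here rather than when the batch is executed.
func (b *Batch) Index(id string, doc map[string]string) error {
	if id == "" {
		return errors.New("document id cannot be empty")
	}
	if err := b.idx.m.validate(doc); err != nil {
		return fmt.Errorf("doc %s: %w", id, err)
	}
	b.docs[id] = doc
	return nil
}

// Size reports how many documents were accepted into the batch.
func (b *Batch) Size() int { return len(b.docs) }

func main() {
	idx := &Index{m: mapping{required: []string{"title"}}}
	b := idx.NewBatch()
	fmt.Println(b.Index("1", map[string]string{"title": "hello"}))
	fmt.Println(b.Index("2", map[string]string{"body": "no title"}))
	fmt.Println(b.Size())
}
```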
new method OpenUsing allows the user to override values
in the persisted config
for example, opening the index but using a different
buffer size for leveldb (not actually supported yet, but that
is the idea)
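The idea behind OpenUsing can be sketched as a key-by-key overlay: runtime values win over persisted ones, and the stored config itself is left unchanged. `mergeConfig` and the `buffer_size` key are hypothetical; as noted above, the leveldb buffer-size override was not actually supported yet.

```go
package main

import "fmt"

// mergeConfig returns a copy of persisted with runtime values layered on
// top. Neither input map is modified, so the persisted config stays as
// it was.
func mergeConfig(persisted, runtime map[string]interface{}) map[string]interface{} {
	merged := make(map[string]interface{}, len(persisted)+len(runtime))
	for k, v := range persisted {
		merged[k] = v
	}
	for k, v := range runtime {
		merged[k] = v // runtime wins on conflict
	}
	return merged
}

func main() {
	persisted := map[string]interface{}{"kvstore": "leveldb", "buffer_size": 4 << 20}
	runtime := map[string]interface{}{"buffer_size": 64 << 20}
	cfg := mergeConfig(persisted, runtime)
	fmt.Println(cfg["kvstore"], cfg["buffer_size"], persisted["buffer_size"])
}
```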
closes #138
now you can access the underlying index/store implementations
using the Advanced() method. this is intended for advanced
usage only, and can lead to problems if misused.
also, there is a new method NewUsing(...) which allows callers
of the top-level API to choose which underlying k/v store they
want to use.
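A sketch of how a top-level constructor can pick a k/v store by name from a registry, and how an escape hatch like Advanced() exposes the store underneath. The `KVStore` interface, the `registry` map, and the `"mem"` store are hypothetical, not Bleve's actual registry or signatures.

```go
package main

import (
	"fmt"
	"sort"
)

// KVStore is a hypothetical minimal store interface.
type KVStore interface {
	Get(key string) (string, bool)
	Set(key, val string)
}

type memStore struct{ m map[string]string }

func (s *memStore) Get(k string) (string, bool) {
	v, ok := s.m[k]
	return v, ok
}

func (s *memStore) Set(k, v string) {
	s.m[k] = v
}

// registry maps store names to constructors.
var registry = map[string]func() KVStore{
	"mem": func() KVStore { return &memStore{m: map[string]string{}} },
}

type Index struct{ store KVStore }

// NewUsing builds an index on top of the named store.
func NewUsing(storeName string) (*Index, error) {
	ctor, ok := registry[storeName]
	if !ok {
		names := make([]string, 0, len(registry))
		for n := range registry {
			names = append(names, n)
		}
		sort.Strings(names)
		return nil, fmt.Errorf("unknown kv store %q (available: %v)", storeName, names)
	}
	return &Index{store: ctor()}, nil
}

// Advanced exposes the underlying store. Writing to it directly bypasses
// the index's own bookkeeping, which is why misuse causes problems.
func (i *Index) Advanced() KVStore { return i.store }

func main() {
	idx, err := NewUsing("mem")
	if err != nil {
		fmt.Println(err)
		return
	}
	idx.Advanced().Set("k", "v")
	fmt.Println(idx.Advanced().Get("k"))
	_, err = NewUsing("rocksdb")
	fmt.Println(err)
}
```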
more things can return error now
in a couple of places we had to swallow errors because they didn't
fit the existing API. in these cases, and proactively in a few
others, we now return error as well.
also the batch API has been updated to allow performing
set/delete internal within the batch
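A sketch of a batch that records internal (non-document) key/value operations next to document operations, so both are applied together when the batch executes, and that returns an error instead of swallowing it. All names here are hypothetical, not Bleve's batch API.

```go
package main

import (
	"errors"
	"fmt"
)

type opKind int

const (
	opIndex opKind = iota
	opDelete
	opSetInternal
	opDeleteInternal
)

type op struct {
	kind opKind
	key  string
	val  []byte
}

// Batch records operations; nothing is applied until Execute.
type Batch struct{ ops []op }

func (b *Batch) Index(id string, doc []byte) {
	b.ops = append(b.ops, op{opIndex, id, doc})
}

func (b *Batch) Delete(id string) {
	b.ops = append(b.ops, op{opDelete, id, nil})
}

func (b *Batch) SetInternal(k string, v []byte) {
	b.ops = append(b.ops, op{opSetInternal, k, v})
}

func (b *Batch) DeleteInternal(k string) {
	b.ops = append(b.ops, op{opDeleteInternal, k, nil})
}

// store keeps documents and internal values in separate namespaces.
type store struct {
	docs, internal map[string][]byte
}

// Execute validates every operation first, so a bad batch leaves the
// store untouched, then applies the operations in order.
func (s *store) Execute(b *Batch) error {
	for _, o := range b.ops {
		if o.key == "" {
			return errors.New("batch operation with empty key")
		}
	}
	for _, o := range b.ops {
		switch o.kind {
		case opIndex:
			s.docs[o.key] = o.val
		case opDelete:
			delete(s.docs, o.key)
		case opSetInternal:
			s.internal[o.key] = o.val
		case opDeleteInternal:
			delete(s.internal, o.key)
		}
	}
	return nil
}

func main() {
	s := &store{docs: map[string][]byte{}, internal: map[string][]byte{}}
	b := &Batch{}
	b.Index("doc1", []byte(`{"title":"hi"}`))
	b.SetInternal("last_seq", []byte("42"))
	fmt.Println(s.Execute(b), len(s.docs), string(s.internal["last_seq"]))
}
```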
1. text analysis is now done before the write lock is acquired
2. there is now a pool of analysis workers
3. the size of this pool is configurable
4. this allows for documents in a batch to be analyzed concurrently
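The four points above can be sketched as a fixed-size pool of goroutines that analyzes a batch's documents concurrently, with only the short write step running under the lock. The tokenizer (lowercase plus whitespace split) and every name here are hypothetical stand-ins for Bleve's analysis pipeline.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

type analyzed struct {
	id    string
	terms []string
}

// analyze is a hypothetical, deliberately simple analyzer.
func analyze(id, text string) analyzed {
	return analyzed{id: id, terms: strings.Fields(strings.ToLower(text))}
}

type index struct {
	mu    sync.Mutex
	terms map[string][]string // term -> doc ids
}

// indexBatch analyzes docs with a pool of `workers` goroutines before
// taking the write lock, then applies all results under the lock.
func (ix *index) indexBatch(docs map[string]string, workers int) {
	if workers < 1 {
		workers = 1 // with no workers, sending jobs would deadlock
	}
	type job struct{ id, text string }
	jobs := make(chan job)
	results := make(chan analyzed, len(docs))
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := range jobs {
				results <- analyze(j.id, j.text)
			}
		}()
	}
	for id, text := range docs {
		jobs <- job{id, text}
	}
	close(jobs)
	wg.Wait()
	close(results)

	// Only the cheap write step holds the lock.
	ix.mu.Lock()
	defer ix.mu.Unlock()
	for r := range results {
		for _, t := range r.terms {
			ix.terms[t] = append(ix.terms[t], r.id)
		}
	}
}

func main() {
	ix := &index{terms: map[string][]string{}}
	ix.indexBatch(map[string]string{"a": "Quick brown fox", "b": "lazy fox"}, 2)
	fmt.Println(len(ix.terms["fox"]), ix.terms["quick"])
}
```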
as a part of benchmarking these changes I've also introduced a new
null storage implementation. this should never be used, as it
does not actually build an index. it does, however, let us go
through all the normal indexing machinery without incurring
any indexing I/O. this is very helpful in measuring improvements
made to the text analysis pipeline, which are often overshadowed
by indexing times in benchmarks that actually build an index.
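A sketch of the null-store idea: it satisfies the same store interface as a real backend but discards writes and misses on every read, so a benchmark still runs analysis and row generation without any I/O. The `Store` interface, the write counter, and `indexDoc` are hypothetical simplifications.

```go
package main

import "fmt"

// Store is a hypothetical minimal k/v interface shared by all backends.
type Store interface {
	Set(key, val []byte) error
	Get(key []byte) ([]byte, error)
	Close() error
}

// nullStore discards every write and never finds anything. It does not
// build an index, so it is only useful for measuring everything else.
type nullStore struct{ writes int }

func (n *nullStore) Set(key, val []byte) error {
	n.writes++ // counted only so this sketch can show the machinery ran
	return nil
}

func (n *nullStore) Get(key []byte) ([]byte, error) {
	return nil, nil
}

func (n *nullStore) Close() error {
	return nil
}

// indexDoc is a hypothetical indexing step that emits one row per term.
func indexDoc(s Store, id string, terms []string) error {
	for _, t := range terms {
		if err := s.Set([]byte("t:"+t+":"+id), nil); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	ns := &nullStore{}
	var s Store = ns
	_ = indexDoc(s, "doc1", []string{"quick", "fox"})
	v, _ := s.Get([]byte("t:quick:doc1"))
	fmt.Println(v == nil, ns.writes)
}
```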
In the index/store package
  introduce KVReader
    creates snapshot
    all read operations are consistent with this snapshot
    must close to release
  introduce KVWriter
    only one writer active
    access to all operations
    allows for consistent read-modify-write
    must close to release
  introduce AssociativeMerge operation on batch
    allows efficient read-modify-write for associative operations
    used to consolidate updates to the term summary rows
    saves 1 set and 1 get op per shared instance of term in field
In the index package
  introduced an IndexReader
    exposes a consistent snapshot of the index for searching
At top level
  All searches now operate on a consistent snapshot of the index
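Why an associative merge saves work: instead of one get and one set for every document that shares a term, the batch combines increments per key in memory and performs a single read-modify-write per distinct key at execution time. This is a sketch only; the 8-byte big-endian counter encoding and all names are hypothetical, and the snapshot reader/writer are not modeled.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// kv is a hypothetical store that counts its get/set calls.
type kv struct {
	data       map[string][]byte
	gets, sets int
}

func (s *kv) get(k string) []byte {
	s.gets++
	return s.data[k]
}

func (s *kv) set(k string, v []byte) {
	s.sets++
	s.data[k] = v
}

// batch buffers associative (order-independent) increments per key.
type batch struct{ merges map[string]uint64 }

func newBatch() *batch { return &batch{merges: map[string]uint64{}} }

// Merge records an increment. Addition is associative, so increments for
// the same key can be pre-combined in memory.
func (b *batch) Merge(key string, delta uint64) { b.merges[key] += delta }

// execute does exactly one read-modify-write per distinct key.
func (b *batch) execute(s *kv) {
	for k, delta := range b.merges {
		var cur uint64
		if old := s.get(k); len(old) == 8 {
			cur = binary.BigEndian.Uint64(old)
		}
		buf := make([]byte, 8)
		binary.BigEndian.PutUint64(buf, cur+delta)
		s.set(k, buf)
	}
}

// count decodes the stored counter without touching the op counters.
func count(s *kv, k string) uint64 {
	v := s.data[k]
	if len(v) != 8 {
		return 0
	}
	return binary.BigEndian.Uint64(v)
}

func main() {
	s := &kv{data: map[string][]byte{}}
	b := newBatch()
	// Three documents in the batch share the term "fox" in the same field.
	for i := 0; i < 3; i++ {
		b.Merge("dict:body:fox", 1)
	}
	b.execute(s)
	fmt.Println(count(s, "dict:body:fox"), s.gets, s.sets)
}
```

Without merging, the three occurrences would cost three gets and three sets; here they cost one of each.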
this started as an effort to relocate highlighting into
a self-contained package, which would then also use
the registry
however, it turned into a much larger refactor in
order to avoid cyclic imports
now facets, searchers, scorers and collectors
are also broken out into subpackages of search