this lays the foundation for supporting the new firestorm
indexing scheme. i'm merging these changes ahead of
the rest of the firestorm branch so i can continue
to make changes to the analysis pipeline in parallel
refactor to share code in emulated batch
refactor to share code in emulated merge
refactor index kvstore benchmarks to share more code
refactor index kvstore benchmarks to be more repeatable
more things can return error now
in a couple of places we had to swallow errors because they didn't
fit the existing API. in these case and proactively in a few
others we now return error as well.
also the batch API has been updated to allow performing
set/delete internal within the batch
1. text analysis is now done before the write lock is acquired
2. there is now a pool of analysis workers
3. the size of this pool is configurable
4. this allows for documents in a batch to be analyzed concurrently
as a part of benchmarking these changes i've also introduce a new
null storage implementation. this should never be used, as it
does not actualy build an index. it does however let us go
through all the normal indexing machinery, without incuring
any indexing I/O. this is very helpful in measuring improvements
made to the text analsysis pipeline, which are often overshadowed
by indexing times in benchmarks actually building an index.
In the index/store package
introduce KVReader
creates snapshot
all read operations consistent from this snapshot
must close to release
introduce KVWriter
only one writer active
access to all operations
allows for consisten read-modify-write
must close to release
introduce AssociativeMerge operation on batch
allows efficient read-modify-write
for associative operations
used to consolidate updates to the term summary rows
saves 1 set and 1 get op per shared instance of term in field
In the index package
introduced an IndexReader
exposes a consisten snapshot of the index for searching
At top level
All searches now operate on a consisten snapshot of the index
now can track array positions for field values
stored fields now include this in the key
and the back index now uses protobufs to simplify serialization
closes#73
removed analyzers (these are now built as needed through config)
removed html chacter filter (now built as needed through config)
added missing license header
changed constructor signature of filters that cannot return errors
filter constructors that can have errors, now have Must variant which panics
change cdl2 tokenizer into filter (should only see lower-case input)
new top level index api, closes#5
refactored index tests to not rely directly on analyzers
moved query objects to top-level
new top level search api, closes#12
top score collector allows skipping results
index mapping supports _all by default, closes#3 and closes#6
index mapping supports disabled sections, closes#7
new http sub package with reusable http.Handler's, closes#22