In the index/store package
introduce KVReader
creates snapshot
all read operations consistent from this snapshot
must close to release
introduce KVWriter
only one writer active
access to all operations
allows for consistent read-modify-write
must close to release
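
The reader/writer contract above can be sketched with a toy in-memory store. This is only an illustration under stated assumptions: the method names (Get, Set, Close), the string key/value types, and the memStore type are hypothetical and are not the package's actual signatures.

```go
package main

import (
	"fmt"
	"sync"
)

// KVReader is a hypothetical read-only view; method names are assumptions.
type KVReader interface {
	Get(key string) (string, bool)
	Close() error
}

// KVWriter is a hypothetical read-write handle; only one may be open at a time.
type KVWriter interface {
	KVReader
	Set(key, val string)
}

// memStore is a toy in-memory store used only for illustration.
type memStore struct {
	mu      sync.Mutex // guards data
	writeMu sync.Mutex // held for the lifetime of the single active writer
	data    map[string]string
}

func newMemStore() *memStore {
	return &memStore{data: map[string]string{}}
}

type snapshotReader struct{ snap map[string]string }

func (r *snapshotReader) Get(k string) (string, bool) {
	v, ok := r.snap[k]
	return v, ok
}

// Close releases the snapshot.
func (r *snapshotReader) Close() error {
	r.snap = nil
	return nil
}

// Reader copies the current state, so later writes are invisible to it.
func (s *memStore) Reader() KVReader {
	s.mu.Lock()
	defer s.mu.Unlock()
	snap := make(map[string]string, len(s.data))
	for k, v := range s.data {
		snap[k] = v
	}
	return &snapshotReader{snap: snap}
}

type writer struct{ s *memStore }

// Writer blocks until no other writer is open.
func (s *memStore) Writer() KVWriter {
	s.writeMu.Lock()
	return &writer{s: s}
}

func (w *writer) Get(k string) (string, bool) {
	w.s.mu.Lock()
	defer w.s.mu.Unlock()
	v, ok := w.s.data[k]
	return v, ok
}

func (w *writer) Set(k, v string) {
	w.s.mu.Lock()
	defer w.s.mu.Unlock()
	w.s.data[k] = v
}

// Close releases the writer slot; it must be called exactly once.
func (w *writer) Close() error {
	w.s.writeMu.Unlock()
	return nil
}

func main() {
	s := newMemStore()
	w := s.Writer()
	w.Set("a", "1")
	w.Close()

	r := s.Reader() // snapshot taken here
	w = s.Writer()
	w.Set("a", "2")
	w.Close()

	v, _ := r.Get("a")
	fmt.Println(v) // prints 1: the reader still sees its snapshot
	r.Close()
}
```

Copying the whole map per reader is only for brevity; a real store would rely on the snapshot support of the underlying engine.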
introduce AssociativeMerge operation on batch
allows efficient read-modify-write
for associative operations
used to consolidate updates to the term summary rows
saves 1 set and 1 get op per shared instance of term in field
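
A rough sketch of why an associative merge on a batch saves operations: operands for the same key are folded together in memory, so the store sees one get and one set per key regardless of how many updates were queued. The store, batch, and mergeFunc types here are hypothetical, and the uint64 counter is an assumed stand-in for a term summary row.

```go
package main

import "fmt"

// store is a toy KV store that counts operations; purely illustrative.
type store struct {
	data       map[string]uint64
	gets, sets int
}

func (s *store) get(k string) uint64 {
	s.gets++
	return s.data[k]
}

func (s *store) set(k string, v uint64) {
	s.sets++
	s.data[k] = v
}

// mergeFunc must be associative: f(f(a, b), c) == f(a, f(b, c)).
type mergeFunc func(a, b uint64) uint64

// batch collects merge operands per key before touching the store.
type batch struct {
	merge   mergeFunc
	pending map[string]uint64
}

func newBatch(f mergeFunc) *batch {
	return &batch{merge: f, pending: map[string]uint64{}}
}

// AssociativeMerge folds the operand into any operand already queued for key.
func (b *batch) AssociativeMerge(key string, operand uint64) {
	if cur, ok := b.pending[key]; ok {
		b.pending[key] = b.merge(cur, operand)
		return
	}
	b.pending[key] = operand
}

// Execute performs one get and one set per distinct key.
func (b *batch) Execute(s *store) {
	for k, operand := range b.pending {
		s.set(k, b.merge(s.get(k), operand))
	}
	b.pending = map[string]uint64{}
}

func main() {
	s := &store{data: map[string]uint64{}}
	b := newBatch(func(x, y uint64) uint64 { return x + y })
	// Three documents in the batch share the same term in the same field.
	for i := 0; i < 3; i++ {
		b.AssociativeMerge("beer", 1)
	}
	b.Execute(s)
	fmt.Println(s.data["beer"], s.gets, s.sets) // prints 3 1 1
}
```

Without the merge, the three increments would each need their own get and set.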
In the index package
introduced an IndexReader
exposes a consistent snapshot of the index for searching
At top level
All searches now operate on a consistent snapshot of the index
by default we now use the pure go boltdb kv store
it is less tested at this point but appears to work
tests pass, and this moves us closer to the goal of being
able to just "go get" bleve
New is now used to create new indexes
Open is used to open existing indexes
calls to Open no longer specify a mapping because the mapping
is serialized and stored along with the index
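
The New/Open split can be pictured with a self-contained sketch in which New persists the mapping next to the index data and Open reads it back. This is not the library's implementation: the Mapping fields, the JSON encoding, and the mapping.json file name are all assumptions made for illustration.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"os"
	"path/filepath"
)

// Mapping is a stand-in for an index mapping; its fields are assumptions.
type Mapping struct {
	DefaultAnalyzer string `json:"default_analyzer"`
}

// Index is a stand-in for an opened index.
type Index struct {
	Path    string
	Mapping Mapping
}

const mappingFile = "mapping.json" // hypothetical file name

// New creates an index directory and persists the mapping inside it.
func New(path string, m Mapping) (*Index, error) {
	if _, err := os.Stat(path); err == nil {
		return nil, errors.New("index already exists")
	}
	if err := os.MkdirAll(path, 0o755); err != nil {
		return nil, err
	}
	raw, err := json.Marshal(m)
	if err != nil {
		return nil, err
	}
	if err := os.WriteFile(filepath.Join(path, mappingFile), raw, 0o644); err != nil {
		return nil, err
	}
	return &Index{Path: path, Mapping: m}, nil
}

// Open loads an existing index; the caller supplies no mapping.
func Open(path string) (*Index, error) {
	raw, err := os.ReadFile(filepath.Join(path, mappingFile))
	if err != nil {
		return nil, err
	}
	var m Mapping
	if err := json.Unmarshal(raw, &m); err != nil {
		return nil, err
	}
	return &Index{Path: path, Mapping: m}, nil
}

// roundTrip creates an index in a temporary directory and reopens it.
func roundTrip(m Mapping) (Mapping, error) {
	dir, err := os.MkdirTemp("", "index-sketch")
	if err != nil {
		return Mapping{}, err
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "example.idx")
	if _, err := New(path, m); err != nil {
		return Mapping{}, err
	}
	idx, err := Open(path)
	if err != nil {
		return Mapping{}, err
	}
	return idx.Mapping, nil
}

func main() {
	got, err := roundTrip(Mapping{DefaultAnalyzer: "standard"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(got.DefaultAnalyzer) // prints standard
}
```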
now can track array positions for field values
stored fields now include this in the key
and the back index now uses protobufs to simplify serialization
closes #73
ultimately this makes it more convenient for us to wire up
different elements of the analysis pipeline, without having to
preload everything into memory before we need it
separately the index layer now has a mechanism for storing
internal key/value pairs. this is expected to be used to
store the mapping, and possibly other pieces of data by the
top layer, but not exposed to the user at the top.
this change was then exposed at the higher levels
also the beer-sample app was upgraded to index in batches of 100
by default. this yielded an indexing speedup from 27s to 16s.
closes #57
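
Indexing in fixed-size batches, as the beer-sample change does, amounts to a loop like the one below. The indexInBatches helper and its apply callback are hypothetical names, not the app's code; the callback stands in for whatever submits one batch to the index.

```go
package main

import "fmt"

// indexInBatches splits docs into groups of at most batchSize and hands each
// group to apply. It returns the number of batches that were applied.
func indexInBatches(docs []string, batchSize int, apply func(batch []string) error) (int, error) {
	if batchSize <= 0 {
		return 0, fmt.Errorf("batch size must be positive, got %d", batchSize)
	}
	batches := 0
	for start := 0; start < len(docs); start += batchSize {
		end := start + batchSize
		if end > len(docs) {
			end = len(docs)
		}
		if err := apply(docs[start:end]); err != nil {
			return batches, err
		}
		batches++
	}
	return batches, nil
}

func main() {
	docs := make([]string, 250)
	var sizes []int
	n, err := indexInBatches(docs, 100, func(batch []string) error {
		sizes = append(sizes, len(batch)) // a real callback would index the batch
		return nil
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(n, sizes) // prints 3 [100 100 50]
}
```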
previously we used the format:
't' <utf-8 term> <byte separator> <16-bit field id> <utf-8 docID> <byte separator>
now we have moved the field before the term, resulting in:
't' <16-bit field id> <utf-8 term> <byte separator> <utf-8 docID> <byte separator>
this means now instead of all fields with the same term being grouped together
all terms within the same field are grouped together
this allows us to enumerate the terms used with a field
this allows us to implement prefix search, and possibly improve numeric range queries
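
The effect of putting the field before the term can be seen by building keys in the new layout, sorting them, and scanning by prefix. This is a sketch only: the separator value (0xff) and the little-endian encoding of the field id are assumptions, and the helper names are hypothetical.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"sort"
)

const sep = 0xff // byte separator; the actual value is an assumption

// termFreqKey builds 't' <16-bit field id> <term> <sep> <docID> <sep>.
func termFreqKey(field uint16, term, docID string) []byte {
	key := []byte{'t', 0, 0}
	binary.LittleEndian.PutUint16(key[1:], field) // byte order is an assumption
	key = append(key, term...)
	key = append(key, sep)
	key = append(key, docID...)
	return append(key, sep)
}

// sortedKeys returns the keys in byte order, as a KV store would iterate them.
func sortedKeys(keys ...[]byte) [][]byte {
	sort.Slice(keys, func(i, j int) bool { return bytes.Compare(keys[i], keys[j]) < 0 })
	return keys
}

// termsWithPrefix returns the distinct terms in field that start with termPrefix.
// Because the field id precedes the term, these keys form one contiguous range.
func termsWithPrefix(sorted [][]byte, field uint16, termPrefix string) []string {
	prefix := []byte{'t', 0, 0}
	binary.LittleEndian.PutUint16(prefix[1:], field)
	prefix = append(prefix, termPrefix...)

	// Seek to the first key >= prefix, then scan while the prefix matches.
	start := sort.Search(len(sorted), func(i int) bool {
		return bytes.Compare(sorted[i], prefix) >= 0
	})
	var terms []string
	for _, k := range sorted[start:] {
		if !bytes.HasPrefix(k, prefix) {
			break
		}
		rest := k[3:] // skip 't' and the 2-byte field id
		term := string(rest[:bytes.IndexByte(rest, sep)])
		if len(terms) == 0 || terms[len(terms)-1] != term {
			terms = append(terms, term)
		}
	}
	return terms
}

func main() {
	keys := sortedKeys(
		termFreqKey(1, "beer", "d1"),
		termFreqKey(1, "beer", "d2"),
		termFreqKey(1, "bean", "d1"),
		termFreqKey(2, "beer", "d1"),
		termFreqKey(1, "wine", "d3"),
	)
	fmt.Println(termsWithPrefix(keys, 1, ""))   // all terms in field 1
	fmt.Println(termsWithPrefix(keys, 1, "be")) // prefix search within field 1
}
```

With the old term-first layout, the terms of one field would be interleaved with those of every other field, so neither scan would be a single contiguous range.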
removed analyzers (these are now built as needed through config)
removed html character filter (now built as needed through config)
added missing license header
changed constructor signature of filters that cannot return errors
filter constructors that can return errors now have a Must variant which panics
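
The constructor convention can be illustrated as follows. The filter types here (LowerCaseFilter, RegexpFilter) and their Filter methods are hypothetical examples chosen to show the pattern, not the package's actual filters.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// LowerCaseFilter cannot fail to construct, so its constructor returns no error.
type LowerCaseFilter struct{}

func NewLowerCaseFilter() *LowerCaseFilter { return &LowerCaseFilter{} }

func (f *LowerCaseFilter) Filter(in string) string { return strings.ToLower(in) }

// RegexpFilter depends on a pattern that may be invalid, so construction can fail.
type RegexpFilter struct {
	re   *regexp.Regexp
	repl string
}

// NewRegexpFilter returns an error if the pattern does not compile.
func NewRegexpFilter(pattern, repl string) (*RegexpFilter, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return nil, err
	}
	return &RegexpFilter{re: re, repl: repl}, nil
}

// MustNewRegexpFilter panics instead of returning an error. It suits
// patterns known at compile time, such as package-level variables.
func MustNewRegexpFilter(pattern, repl string) *RegexpFilter {
	f, err := NewRegexpFilter(pattern, repl)
	if err != nil {
		panic(err)
	}
	return f
}

func (f *RegexpFilter) Filter(in string) string {
	return f.re.ReplaceAllString(in, f.repl)
}

func main() {
	strip := MustNewRegexpFilter(`<[^>]*>`, " ")
	lower := NewLowerCaseFilter()
	fmt.Printf("%q\n", lower.Filter(strip.Filter("<b>Hi</b>"))) // prints " hi "

	if _, err := NewRegexpFilter("(", ""); err != nil {
		fmt.Println("invalid pattern rejected")
	}
}
```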
change cld2 tokenizer into filter (should only see lower-case input)
new top level index api, closes #5
refactored index tests to not rely directly on analyzers
moved query objects to top-level
new top level search api, closes #12
top score collector allows skipping results
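
One way a top score collector can support skipping is to retain the best size+skip hits and drop the first skip of them when results are read. The sketch below assumes that approach; the TopScoreCollector and hit types, and their fields and methods, are hypothetical and not the project's actual collector.

```go
package main

import (
	"fmt"
	"sort"
)

// hit is a hypothetical search result.
type hit struct {
	ID    string
	Score float64
}

// TopScoreCollector keeps the best size+skip hits, ordered by descending score.
type TopScoreCollector struct {
	size, skip int
	results    []hit
}

func NewTopScoreCollector(size, skip int) *TopScoreCollector {
	return &TopScoreCollector{size: size, skip: skip}
}

// Collect inserts h in score order and trims anything beyond size+skip.
func (c *TopScoreCollector) Collect(h hit) {
	// First position whose score is lower than the new hit's score.
	i := sort.Search(len(c.results), func(i int) bool {
		return c.results[i].Score < h.Score
	})
	c.results = append(c.results, hit{})
	copy(c.results[i+1:], c.results[i:])
	c.results[i] = h
	if len(c.results) > c.size+c.skip {
		c.results = c.results[:c.size+c.skip]
	}
}

// Results returns up to size hits after skipping the first skip hits.
func (c *TopScoreCollector) Results() []hit {
	if c.skip >= len(c.results) {
		return nil
	}
	return c.results[c.skip:]
}

func main() {
	c := NewTopScoreCollector(2, 1) // page size 2, skip the single best hit
	for _, h := range []hit{{"a", 0.3}, {"b", 0.9}, {"c", 0.5}, {"d", 0.7}, {"e", 0.1}} {
		c.Collect(h)
	}
	for _, h := range c.Results() {
		fmt.Println(h.ID, h.Score)
	}
	// prints:
	// d 0.7
	// c 0.5
}
```

Skipping this way gives simple pagination: page n of size s uses skip = n*s.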
index mapping supports _all by default, closes #3 and closes #6
index mapping supports disabled sections, closes #7
new http sub package with reusable http.Handlers, closes #22