ultimately this makes it more convenient for us to wire up
different elements of the analysis pipeline, without having to
preload everything into memory before we need it
separately the index layer now has a mechanism for storing
internal key/value pairs. this is expected to be used to
store the mapping, and possibly other pieces of data by the
top layer, but not exposed to the user at the top.
this change was then exposed at the higher levels
also the beer-sample app was upgraded to index in batches of 100
by default. this yielded an indexing speedup from 27s to 16s.
closes #57
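
A minimal sketch of the batching idea, not the beer-sample code itself. `indexInBatches` and the `flush` callback are hypothetical names; `flush` stands in for whatever call writes one batch to the index.

```go
package main

import (
	"errors"
	"fmt"
)

// indexInBatches hands docs to flush in slices of at most size elements.
// flush is a hypothetical stand-in for the call that writes one batch.
func indexInBatches(docs []string, size int, flush func(batch []string) error) error {
	if size <= 0 {
		return errors.New("batch size must be positive")
	}
	for start := 0; start < len(docs); start += size {
		end := start + size
		if end > len(docs) {
			end = len(docs) // the last batch may be short
		}
		if err := flush(docs[start:end]); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	docs := make([]string, 250)
	err := indexInBatches(docs, 100, func(batch []string) error {
		fmt.Println("flushing", len(batch), "docs")
		return nil
	})
	if err != nil {
		fmt.Println("error:", err)
	}
}
```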
introduced concept of byte array converters
right now only wired up to top-level index mapping
allowing the removal of the JSON methods. now at the top level
we default to parsing []byte as JSON; override if that's not
the behavior you want.
future enhancements will allow use of these byte array converters
to control how byte arrays are handled elsewhere in documents
this would allow for handling binary attachments, etc. in the future
closes #59
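
A rough sketch of what a byte array converter could look like. The interface shape, the two converter types, and `toDocument` are assumptions made for illustration; the commit does not spell out the actual signatures.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ByteArrayConverter is an assumed shape: turn raw bytes into a value
// that the mapping layer can walk.
type ByteArrayConverter interface {
	Convert(in []byte) (interface{}, error)
}

// jsonConverter parses the bytes as JSON (the default described above).
type jsonConverter struct{}

func (jsonConverter) Convert(in []byte) (interface{}, error) {
	var v interface{}
	if err := json.Unmarshal(in, &v); err != nil {
		return nil, err
	}
	return v, nil
}

// stringConverter is an example override: treat the bytes as plain text.
type stringConverter struct{}

func (stringConverter) Convert(in []byte) (interface{}, error) {
	return string(in), nil
}

// toDocument is a hypothetical helper: fall back to JSON when no
// converter has been configured.
func toDocument(data []byte, c ByteArrayConverter) (interface{}, error) {
	if c == nil {
		c = jsonConverter{}
	}
	return c.Convert(data)
}

func main() {
	v, err := toDocument([]byte(`{"name":"pale ale"}`), nil)
	fmt.Println(v, err)
	v, err = toDocument([]byte("not json"), stringConverter{})
	fmt.Println(v, err)
}
```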
previously we used the format:
't' <utf-8 term> <byte separator> <16-bit field id> <utf-8 docID> <byte separator>
now we have moved the field before the term, resulting in:
't' <16-bit field id> <utf-8 term> <byte separator> <utf-8 docID> <byte separator>
this means that now, instead of all fields with the same term being grouped together,
all terms within the same field are grouped together
this allows us to enumerate the terms used with a field
this allows us to implement prefix search, and possibly improve numeric range queries
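
A small sketch of why the new layout helps. The separator byte (0xff), the little-endian field id, and all function names here are assumptions for illustration. With the field id ahead of the term, every key for one field shares a prefix, so a sorted scan from 't' + field + term prefix visits exactly the matching terms.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"sort"
)

// byteSeparator is an assumed value; 0xff never occurs in valid UTF-8.
const byteSeparator byte = 0xff

// termFreqKey builds 't' <field id> <term> <sep> <docID> <sep>.
// Little-endian field ids are an assumption of this sketch.
func termFreqKey(field uint16, term, docID string) []byte {
	key := []byte{'t', 0, 0}
	binary.LittleEndian.PutUint16(key[1:3], field)
	key = append(key, term...)
	key = append(key, byteSeparator)
	key = append(key, docID...)
	key = append(key, byteSeparator)
	return key
}

// fieldTermPrefix builds the scan prefix for one field and term prefix.
func fieldTermPrefix(field uint16, termPrefix string) []byte {
	p := []byte{'t', 0, 0}
	binary.LittleEndian.PutUint16(p[1:3], field)
	return append(p, termPrefix...)
}

// sortedKeys orders keys bytewise, as an ordered key/value store would.
func sortedKeys(keys [][]byte) [][]byte {
	sort.Slice(keys, func(i, j int) bool { return bytes.Compare(keys[i], keys[j]) < 0 })
	return keys
}

// termsWithPrefix returns the distinct terms in field that start with prefix.
func termsWithPrefix(keys [][]byte, field uint16, prefix string) []string {
	start := fieldTermPrefix(field, prefix)
	i := sort.Search(len(keys), func(i int) bool {
		return bytes.Compare(keys[i], start) >= 0
	})
	var terms []string
	for ; i < len(keys) && bytes.HasPrefix(keys[i], start); i++ {
		rest := keys[i][3:] // skip 't' and the 2-byte field id
		end := bytes.IndexByte(rest, byteSeparator)
		if end < 0 {
			continue
		}
		term := string(rest[:end])
		if len(terms) == 0 || terms[len(terms)-1] != term {
			terms = append(terms, term)
		}
	}
	return terms
}

func main() {
	keys := sortedKeys([][]byte{
		termFreqKey(1, "beer", "doc1"),
		termFreqKey(1, "bear", "doc1"),
		termFreqKey(1, "wine", "doc3"),
		termFreqKey(2, "beer", "doc9"),
	})
	fmt.Println(termsWithPrefix(keys, 1, "be"))
}
```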
defined article word maps for french, italian, irish and catalan
defined elision filters for these same languages
updated analyzers for french and italian to use this new filter
irish and catalan still depend on other missing pieces
closes #25
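
A minimal sketch of how an elision filter can use an article word map. The article list below is an illustrative subset, not the map defined in this change, and `elide` is a hypothetical name.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// frenchArticles is an illustrative subset of elided French articles.
var frenchArticles = map[string]bool{
	"l": true, "m": true, "t": true, "qu": true, "n": true,
	"s": true, "j": true, "d": true, "c": true,
}

// elide strips a leading article plus apostrophe from token when the
// part before the apostrophe is in articles; otherwise token is returned
// unchanged. Both ' and the typographic apostrophe (U+2019) are handled.
func elide(token string, articles map[string]bool) string {
	i := strings.IndexAny(token, "'\u2019")
	if i < 0 {
		return token
	}
	if !articles[strings.ToLower(token[:i])] {
		return token
	}
	_, size := utf8.DecodeRuneInString(token[i:])
	return token[i+size:]
}

func main() {
	for _, tok := range []string{"l'avion", "qu'il", "aujourd'hui", "avion"} {
		fmt.Println(tok, "->", elide(tok, frenchArticles))
	}
}
```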
this encoding scheme matches the one used by lucene
it has been packaged separately so that others may
more easily reuse it without using the rest of bleve
stop words can be loaded from files/bytes, closes #19
stop words loaded for large list of languages, closes #20
defined language specific analyzers for as much as possible right now, closes #21
opened new issues for some of the remaining gaps
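
A rough sketch of loading stop words from bytes. The file format is an assumption (one or more words per line, with `#` or `|` starting a comment), and `loadStopWords` is a hypothetical name rather than the function added here.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"strings"
)

// loadStopWords parses a stop word list from raw bytes.
// Assumed format: whitespace-separated words, '#' or '|' starts a comment.
func loadStopWords(data []byte) (map[string]bool, error) {
	words := make(map[string]bool)
	sc := bufio.NewScanner(bytes.NewReader(data))
	for sc.Scan() {
		line := sc.Text()
		if i := strings.IndexAny(line, "#|"); i >= 0 {
			line = line[:i] // drop the comment part
		}
		for _, w := range strings.Fields(line) {
			words[w] = true
		}
	}
	return words, sc.Err()
}

func main() {
	data := []byte("the\n# a comment line\n\nand | trailing comment\n")
	words, err := loadStopWords(data)
	fmt.Println(len(words), words["the"], words["and"], err)
}
```

Loading from a file then reduces to reading the file into a byte slice and passing it to the same function.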