Commit Graph

3 Commits

Author SHA1 Message Date
Marty Schoch
043a3bfb7c change cjk analyzer to use unicode tokenizer
change cjk bigram analyzer to work with multi-rune terms
add cjk width filter, which replaces full unicode normalization

these changes make the cjk analyzer behave more like elasticsearch
they also remove the dependency on the whitespace analyzer
which is now free to also behave more like lucene/es

fixes #33
2016-06-10 13:04:40 -04:00
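
To illustrate what a width filter does in place of full unicode normalization, here is a minimal sketch in Go. It is not the code from this commit: the function name `foldWidth` is hypothetical, and it only folds fullwidth ASCII variants (U+FF01 through U+FF5E) to their basic Latin forms. Folding halfwidth katakana, which a complete width filter would also handle, is left out.

```go
package main

import (
	"fmt"
	"strings"
)

// foldWidth is a hypothetical helper: it maps fullwidth ASCII variants
// (U+FF01..U+FF5E) to their basic Latin equivalents by subtracting the
// fixed offset 0xFEE0. All other runes are returned unchanged.
func foldWidth(s string) string {
	return strings.Map(func(r rune) rune {
		if r >= 0xFF01 && r <= 0xFF5E {
			return r - 0xFEE0
		}
		return r
	}, s)
}

func main() {
	// Fullwidth letters and digits fold to plain ASCII; kanji pass through.
	fmt.Println(foldWidth("ＡＢＣ１２３ 東京"))
}
```

Because only one block of code points is remapped, this is far cheaper than running a full normalization pass over every term.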
Marty Schoch
1dc466a800 modified token filters to avoid creating new token stream
often the result stream was the same length, so can reuse the
existing token stream
also, in cases where a new stream was required, set capacity to
the length of the input stream.  most output streams are at least
as long as the input, so this may avoid some subsequent resizing
2014-09-23 18:41:32 -04:00
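
The two ideas in this commit message can be sketched as follows. This is an illustration under assumptions, not the project's code: the `Token` and `TokenStream` types and both filter functions are simplified, hypothetical stand-ins. A one-to-one filter mutates tokens and returns the input stream itself, while a filter that may emit extra tokens allocates its result with capacity equal to the input length.

```go
package main

import (
	"bytes"
	"fmt"
)

// Token and TokenStream are simplified, hypothetical stand-ins for the
// analysis types used by token filters.
type Token struct {
	Term []byte
}

type TokenStream []*Token

// lowerInPlace is a one-to-one filter: the output has the same length as
// the input, so it mutates each token and returns the same stream
// instead of allocating a new one.
func lowerInPlace(input TokenStream) TokenStream {
	for _, tok := range input {
		tok.Term = bytes.ToLower(tok.Term)
	}
	return input
}

// splitOnHyphen may emit more tokens than it receives, so a new stream is
// required. Its capacity starts at len(input), a lower bound on the
// output length, which avoids the earliest rounds of slice growth.
func splitOnHyphen(input TokenStream) TokenStream {
	rv := make(TokenStream, 0, len(input))
	for _, tok := range input {
		for _, part := range bytes.Split(tok.Term, []byte("-")) {
			rv = append(rv, &Token{Term: part})
		}
	}
	return rv
}

func main() {
	in := TokenStream{{Term: []byte("Full-Text")}, {Term: []byte("Search")}}
	out := splitOnHyphen(lowerInPlace(in))
	for _, tok := range out {
		fmt.Println(string(tok.Term))
	}
}
```

Returning the input stream from the one-to-one filter means callers must not assume the original tokens are left untouched.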
Marty Schoch
1a1cf32a86 introducing cjk_bigram filter and cjk analyzer
closes #34
2014-09-11 10:39:05 -04:00
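
For readers unfamiliar with CJK bigramming, here is a minimal sketch of the general technique in Go. It is an assumption-laden illustration, not the filter introduced in this commit: the function name `bigrams` is hypothetical, it works on a plain string rather than a token stream, and it does not check which script each rune belongs to. It emits every overlapping pair of adjacent runes, and falls back to a single unigram when the input has only one rune.

```go
package main

import "fmt"

// bigrams is a hypothetical helper that returns the overlapping two-rune
// sequences of s. It operates on runes rather than bytes so multi-byte
// characters are never split. A single-rune input yields that rune alone.
func bigrams(s string) []string {
	runes := []rune(s)
	switch len(runes) {
	case 0:
		return nil
	case 1:
		return []string{s}
	}
	out := make([]string, 0, len(runes)-1)
	for i := 0; i+1 < len(runes); i++ {
		out = append(out, string(runes[i:i+2]))
	}
	return out
}

func main() {
	for _, bg := range bigrams("こんにちは") {
		fmt.Println(bg)
	}
}
```

Indexing overlapping pairs lets a search match substrings of unsegmented text without requiring a dictionary-based word splitter.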