gibheer/bleve

Commit Graph

Author

SHA1

Message

Date

Marty Schoch

043a3bfb7c

change cjk analyzer to use unicode tokenizer

change cjk bigram analyzer to work with multi-rune terms
add cjk width filter, which replaces full unicode normalization

these changes make the cjk analyzer behave more like elasticsearch
they also remove the dependency on the whitespace analyzer
which is now free to also behave more like lucene/es

fixes #33

2016-06-10 13:04:40 -04:00
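
The "multi-rune terms" change in the commit above can be illustrated with a small sketch. Once a tokenizer emits a run of several CJK characters as a single term, a bigram step has to break that term into overlapping two-rune tokens. The function name `bigrams` and its handling of single-rune terms are assumptions made for illustration; this is not bleve's actual filter code.

```go
package main

import "fmt"

// bigrams is a hypothetical helper (not part of bleve) that splits a term
// into overlapping two-rune tokens. A term with exactly one rune is
// returned unchanged as a unigram; an empty term yields no tokens.
func bigrams(term string) []string {
	// Convert to runes so that indexing is per character, not per byte:
	// CJK characters occupy several bytes in UTF-8.
	runes := []rune(term)
	if len(runes) == 0 {
		return nil
	}
	if len(runes) == 1 {
		return []string{term}
	}
	// n runes produce n-1 overlapping bigrams.
	out := make([]string, 0, len(runes)-1)
	for i := 0; i+1 < len(runes); i++ {
		out = append(out, string(runes[i:i+2]))
	}
	return out
}

func main() {
	fmt.Println(bigrams("こんにちは")) // prints [こん んに にち ちは]
}
```

A real filter would also need to carry token positions and byte offsets and to leave non-CJK terms alone; the sketch omits both.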

Marty Schoch

1dc466a800

modified token filters to avoid creating new token stream

often the result stream was the same length, so can reuse the
existing token stream
also, in cases where a new stream was required, set capacity to
the length of the input stream.  most output streams are at least
as long as the input, so this may avoid some subsequent resizing

2014-09-23 18:41:32 -04:00
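
The two optimizations described in the commit above can be sketched as follows. The `Token` and `TokenStream` types and both filter functions are simplified stand-ins invented for this example, not bleve's actual definitions. The first filter produces exactly one output token per input token, so it modifies the tokens in place and returns the input stream. The second may produce more tokens than it receives, so it allocates a new stream whose capacity starts at the input length.

```go
package main

import (
	"fmt"
	"strings"
)

// Token and TokenStream are simplified, hypothetical types for this sketch.
type Token struct {
	Term string
}

type TokenStream []*Token

// lowercase emits one token per input token, so it rewrites the tokens
// in place and returns the same stream instead of building a new one.
func lowercase(input TokenStream) TokenStream {
	for _, t := range input {
		t.Term = strings.ToLower(t.Term)
	}
	return input
}

// splitHyphens can emit more tokens than it receives, so a new stream is
// required. Its capacity starts at len(input): the output is at least that
// long, which avoids the first few reallocations during append.
func splitHyphens(input TokenStream) TokenStream {
	out := make(TokenStream, 0, len(input))
	for _, t := range input {
		for _, part := range strings.Split(t.Term, "-") {
			out = append(out, &Token{Term: part})
		}
	}
	return out
}

func main() {
	stream := TokenStream{&Token{Term: "Full-Text"}, &Token{Term: "Search"}}
	for _, t := range splitHyphens(lowercase(stream)) {
		fmt.Println(t.Term)
	}
}
```

Reusing the input stream is only safe when the caller no longer needs the original terms, since the in-place filter overwrites them.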

Marty Schoch

1a1cf32a86

introducing cjk_bigram filter and cjk analyzer

closes #34

2014-09-11 10:39:05 -04:00

3 Commits