change cjk bigram analyzer to work with multi-rune terms
add cjk width filter replaces full unicode normailzation
these changes make the cjk analyzer behave more like elasticsearch
they also remove the depenency on the whitespace analyzer
which is now free to also behave more like lucene/es
fixes#33
often the result stream was the same length, so can reuse the
existing token stream
also, in cases where a new stream was required, set capacity to
the length of the input stream. most output stream are at least
as long as the input, so this may avoid some subsequent resizing