Commit Graph

5 Commits

Author SHA1 Message Date
Steve Yen
325a616993 unicode.Tokenize() avoids array growth via array of arrays 2016-01-02 12:21:25 -08:00
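A minimal sketch of the array-of-arrays idea behind this commit, assuming a simplified Token type; the tokenChunks name and chunk size are hypothetical, not bleve's actual internals. The point is that when a backing array fills up, a fresh one is appended rather than growing (and copying) a single slice:

```go
package main

import "fmt"

// Token is a stand-in for bleve's analysis.Token (the real struct also
// carries Term, Position, and Type).
type Token struct {
	Start int
	End   int
}

// tokenChunks (hypothetical) hands out Tokens from fixed-size backing
// arrays. When a chunk fills up, a new one is appended to the outer
// slice instead of growing a single slice, so nothing is copied and
// pointers into earlier chunks stay valid.
type tokenChunks struct {
	chunks    [][]Token // array of backing arrays
	used      int       // Tokens handed out from the newest chunk
	chunkSize int
}

func (tc *tokenChunks) next() *Token {
	if len(tc.chunks) == 0 || tc.used == tc.chunkSize {
		tc.chunks = append(tc.chunks, make([]Token, tc.chunkSize))
		tc.used = 0
	}
	t := &tc.chunks[len(tc.chunks)-1][tc.used]
	tc.used++
	return t
}

func main() {
	tc := &tokenChunks{chunkSize: 4}
	for i := 0; i < 10; i++ {
		t := tc.next()
		t.Start, t.End = i, i+1
	}
	fmt.Println(len(tc.chunks), "chunks for 10 tokens") // 3 chunks, no reallocation copies
}
```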
Steve Yen
918732f3d8 unicode.Tokenize() allocs backing array of Tokens
Previously, unicode.Tokenize() would allocate Tokens one by one, on
an as-needed basis.

This change allocates a "backing array" of Tokens, so that it goes to
the runtime object allocator much less often.  It takes a heuristic
guess as to the backing array size by using the average token
(segment) length seen so far.

Results from micro-benchmarks (null-firestorm, bleve-blast) suggest a
modest throughput improvement, likely a bit under ~0.5 MB/second.
2016-01-02 12:21:25 -08:00
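A sketch of the batch-allocation heuristic this message describes, under stated assumptions: Token is simplified, the segments parameter (start/end byte offsets) stands in for the segmenter's output, and the default guess of 8 bytes per token is arbitrary:

```go
package main

import "fmt"

// Token is a stand-in for bleve's analysis.Token.
type Token struct {
	Start, End int
}

// tokenize sketches the heuristic: rather than one allocation per Token,
// guess a backing-array size from the average segment length seen so far
// and carve Tokens out of that array until it runs dry.
func tokenize(input []byte, segments [][2]int) []*Token {
	out := make([]*Token, 0, len(segments))
	var backing []Token
	bytesSeen, tokensSeen := 0, 0
	for _, seg := range segments {
		if len(backing) == 0 {
			avg := 8 // arbitrary default before any tokens are seen
			if tokensSeen > 0 && bytesSeen/tokensSeen > 0 {
				avg = bytesSeen / tokensSeen
			}
			// guess how many tokens the rest of the input holds
			backing = make([]Token, (len(input)-seg[0])/avg+1)
		}
		t := &backing[0]
		backing = backing[1:]
		t.Start, t.End = seg[0], seg[1]
		out = append(out, t)
		bytesSeen += seg[1] - seg[0]
		tokensSeen++
	}
	return out
}

func main() {
	input := []byte("hello wide world")
	segs := [][2]int{{0, 5}, {6, 10}, {11, 16}}
	fmt.Println(len(tokenize(input, segs)), "tokens allocated in batches")
}
```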
Marty Schoch
0a4844f9d0 change unicode tokenizer to use direct segmenter api 2015-01-12 17:57:45 -05:00
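A sketch of what the direct API buys, assuming the scanner-style interface (Segment/Bytes/Type/Err) that github.com/blevesearch/segment exposes; NewWordSegmenterDirect consumes a byte slice directly, skipping the io.Reader indirection of segment.NewWordSegmenter:

```go
package main

import (
	"fmt"

	"github.com/blevesearch/segment"
)

func main() {
	segmenter := segment.NewWordSegmenterDirect([]byte("hello 世界"))
	// Iterate word segments bufio.Scanner-style.
	for segmenter.Segment() {
		fmt.Printf("%q type=%d\n", segmenter.Bytes(), segmenter.Type())
	}
	if err := segmenter.Err(); err != nil {
		panic(err)
	}
}
```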
Marty Schoch
fcab645f96 add test to cover kana/ideographic case 2014-11-26 08:42:40 -05:00
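The shape of such a test might look like the following sketch; the exact expected segments are an assumption about the package's UAX#29 behavior, under which a katakana run such as "テスト" stays one segment while each ideograph in "世界" stands alone:

```go
package unicode_test

import (
	"reflect"
	"testing"

	"github.com/blevesearch/segment"
)

// Sketch of a kana/ideographic coverage test.
func TestKanaIdeographic(t *testing.T) {
	segmenter := segment.NewWordSegmenterDirect([]byte("テスト世界"))
	var got []string
	for segmenter.Segment() {
		got = append(got, string(segmenter.Bytes()))
	}
	if err := segmenter.Err(); err != nil {
		t.Fatal(err)
	}
	want := []string{"テスト", "世", "界"}
	if !reflect.DeepEqual(got, want) {
		t.Fatalf("got %v, want %v", got, want)
	}
}
```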
Marty Schoch
cf3643f292 added pure go tokenizer to do unicode word boundary segmentation 2014-10-17 18:07:48 -04:00
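For context, using the tokenizer this commit introduced looks roughly like the sketch below; the import path reflects the repository layout of that era and may differ in later bleve versions:

```go
package main

import (
	"fmt"

	"github.com/blevesearch/bleve/analysis/tokenizers/unicode"
)

func main() {
	tokenizer := unicode.NewUnicodeTokenizer()
	// Tokenize returns an analysis.TokenStream of word-boundary segments.
	for _, tok := range tokenizer.Tokenize([]byte("hello 世界")) {
		fmt.Printf("%s [%d:%d] type=%d\n", tok.Term, tok.Start, tok.End, tok.Type)
	}
}
```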