bleve/analysis/tokenizers
Steve Yen 918732f3d8 unicode.Tokenize() allocs backing array of Tokens
Previously, unicode.Tokenize() would allocate Tokens one at a time, on
an as-needed basis.

This change allocates a "backing array" of Tokens, so that it goes to
the runtime object allocator much less often.  It takes a heuristic
guess as to the backing array size by using the average token
(segment) length seen so far.

Results from micro-benchmarks (null-firestorm, bleve-blast) suggest a
modest throughput improvement of up to roughly 0.5 MB/second.
2016-01-02 12:21:25 -08:00
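The allocation strategy the commit describes can be sketched as follows. This is a simplified illustration, not bleve's actual code: the `Token` fields, the space-splitting loop, and the `avgTokenLen` parameter are assumptions standing in for the real tokenizer's segmenter and running average.

```go
package main

import "fmt"

// Token is a simplified stand-in for bleve's analysis.Token
// (hypothetical fields, for illustration only).
type Token struct {
	Start int
	End   int
	Term  []byte
}

// tokenize splits input on spaces. Instead of allocating each Token
// individually, it pre-allocates one backing array of Tokens, sized by
// a heuristic guess: the input length divided by the average token
// length seen so far. Pointers into the backing array are handed out,
// so the runtime allocator is hit once rather than once per token.
func tokenize(input []byte, avgTokenLen int) []*Token {
	if avgTokenLen < 1 {
		avgTokenLen = 1
	}
	guess := len(input)/avgTokenLen + 1
	backing := make([]Token, 0, guess) // single allocation for many Tokens
	var out []*Token
	start := -1
	for i, b := range input {
		switch {
		case b == ' ' && start >= 0:
			backing = append(backing, Token{Start: start, End: i, Term: input[start:i]})
			out = append(out, &backing[len(backing)-1])
			start = -1
		case b != ' ' && start < 0:
			start = i
		}
	}
	if start >= 0 {
		backing = append(backing, Token{Start: start, End: len(input), Term: input[start:]})
		out = append(out, &backing[len(backing)-1])
	}
	return out
}

func main() {
	for _, t := range tokenize([]byte("quick brown fox"), 5) {
		fmt.Printf("%d-%d %s\n", t.Start, t.End, t.Term)
	}
}
```

If the guess is too small, `append` grows the backing slice; previously returned pointers still reference the old array and remain valid under Go's garbage collector, so correctness does not depend on the heuristic being exact — only the allocation count does.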
exception exception: fail if pattern is empty, name tokenizer in error 2015-10-27 18:53:03 +01:00
regexp_tokenizer changed whitespace tokenizer to work better on cjk input 2014-09-07 14:11:01 -04:00
single_token add newline between license and package 2014-09-02 10:54:50 -04:00
unicode unicode.Tokenize() allocs backing array of Tokens 2016-01-02 12:21:25 -08:00
web add support for a "web" tokenizer/analyzer 2015-11-30 14:27:18 -05:00
whitespace_tokenizer added benchmark for tokenizing English text 2014-10-17 18:07:01 -04:00