918732f3d8
Previously, unicode.Tokenize() allocated each Token individually, on an as-needed basis. This change allocates a "backing array" of Tokens up front, so that it goes to the runtime object allocator much less often. The backing-array size is a heuristic guess based on the average token (segment) length seen so far. Micro-benchmarks (null-firestorm, bleve-blast) suggest a throughput improvement of perhaps slightly less than ~0.5 MB/second.
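A minimal sketch of the backing-array pattern described above, with assumed names and a simplified space-splitting tokenizer (not the actual bleve/segment code). The key points are sizing the backing array from an average-token-length heuristic, and allocating a fresh backing array when one fills up rather than growing it, since growing would move Tokens whose pointers were already handed out:

```go
package main

import "fmt"

// Token is a simplified stand-in for the analyzer's token type
// (hypothetical fields, for illustration only).
type Token struct {
	Term  []byte
	Start int
	End   int
}

// tokenize splits input on spaces, carving Tokens out of a shared
// backing array instead of allocating each Token individually.
func tokenize(input []byte, avgTokenLen int) []*Token {
	if avgTokenLen < 1 {
		avgTokenLen = 6 // fallback guess
	}
	// Heuristic guess at how many tokens we'll need; allocate them
	// in one shot so the runtime allocator is hit far less often.
	guess := len(input)/avgTokenLen + 1
	backing := make([]Token, 0, guess)

	tokens := make([]*Token, 0, guess)
	start := 0
	for i := 0; i <= len(input); i++ {
		if i == len(input) || input[i] == ' ' {
			if i > start {
				if len(backing) == cap(backing) {
					// Backing array exhausted: allocate a fresh one
					// rather than growing, which would relocate the
					// Tokens already handed out as pointers.
					backing = make([]Token, 0, guess)
				}
				backing = append(backing, Token{Term: input[start:i], Start: start, End: i})
				tokens = append(tokens, &backing[len(backing)-1])
			}
			start = i + 1
		}
	}
	return tokens
}

func main() {
	for _, t := range tokenize([]byte("hello world foo"), 5) {
		fmt.Printf("%s [%d:%d]\n", t.Term, t.Start, t.End)
	}
}
```

In the real change the average length would be tracked across calls and fed back in, so the guess improves as more text is tokenized.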