Ethan Koenig
0433f05d9c
Fix test
2017-06-22 18:56:28 -04:00
Ethan Koenig
8994ad2e00
Fix token start/end/position values in camelCase tokenizer
2017-06-22 17:42:39 -04:00
Marty Schoch
782dbecfe1
fix edge ngram output when side=Back and input token len=max
...
edge condition was incorrectly checked
fixes #523
2017-01-30 20:29:20 -05:00
Steve Yen
6a38fa3719
go fmt
2016-10-12 09:39:43 -07:00
Michael Nitschinger
7e656dad32
Address special Unicode sigma at end of term when lowercasing.
...
Σ maps to σ, except at the end of a word where it maps to ς.
This is the only conditional (contextual) but language-independent
mapping in Unicode.
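A minimal sketch of the rule described above, not the filter's actual code. The function name `lowerWithFinalSigma` is hypothetical, and it assumes a term is a single word, so "end of a word" is simply the last rune of the term.

```go
package main

import "unicode"

// lowerWithFinalSigma is a hypothetical helper illustrating the contextual
// sigma rule: a capital sigma in the last position of the term becomes the
// final form ς (U+03C2); every other rune goes through unicode.ToLower,
// which maps Σ to the regular σ (U+03C3).
func lowerWithFinalSigma(term string) string {
	runes := []rune(term)
	last := len(runes) - 1
	for i, r := range runes {
		if r == 'Σ' && i == last {
			runes[i] = 'ς'
			continue
		}
		runes[i] = unicode.ToLower(r)
	}
	return string(runes)
}
```

Under these assumptions, "ΟΔΟΣ" lowercases to "οδος" with the final form, while the leading sigma in "ΣΟΦΙΑ" becomes the regular σ.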
2016-10-11 12:37:08 +02:00
Michael Nitschinger
ff35d75aa4
Skip already lowercased runes on transformation.
...
The LowerCaseFilter works on the original slice to avoid allocations,
so skipping already lowercased runes avoids unnecessary work.
benchmark                      old ns/op     new ns/op     delta
BenchmarkLowerCaseFilter-8     1302          815           -37.40%
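The idea can be sketched roughly as follows. This is a simplified illustration, not the LowerCaseFilter code: it assumes the term is held as a `[]rune` (the commit does not say which slice type the filter uses), and the name `lowerInPlace` and its return value are hypothetical.

```go
package main

import "unicode"

// lowerInPlace is a hypothetical helper that lowercases runes directly in
// the caller's slice, so no new slice is allocated. Runes that are already
// lowercase (or have no case at all) are skipped without being written.
// It returns how many runes were actually rewritten.
func lowerInPlace(runes []rune) int {
	changed := 0
	for i, r := range runes {
		l := unicode.ToLower(r)
		if l == r {
			continue // nothing to do for this rune
		}
		runes[i] = l
		changed++
	}
	return changed
}
```

With the input "Hello World", only the two uppercase runes are rewritten and the slice ends up holding "hello world".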
2016-10-11 12:03:26 +02:00
Marty Schoch
2332455bd2
nicer formatting of license header
2016-10-02 10:13:14 -04:00
Marty Schoch
6bf9dd59ab
BREAKING CHANGE - additional package renaming
...
I recently learned that package names should also prefer the
singular form, not the plural form
2016-10-01 17:20:59 -04:00