Ethan Koenig
012d436dd7
Add UniqueTerm token filter
2018-01-16 22:24:51 -08:00
Marty Schoch
af198c833f
Merge branch 'ru_analyzer' of https://github.com/sokolovstas/bleve into sokolovstas-ru_analyzer
2018-01-10 10:29:15 -05:00
Ethan Koenig
0433f05d9c
Fix test
2017-06-22 18:56:28 -04:00
Ethan Koenig
8994ad2e00
Fix token start/end/position values in camelCase tokenizer
2017-06-22 17:42:39 -04:00
Stanislav Sokolov
d8d57e6990
Added Russian analyzer with snowball stemmer
2017-06-05 18:01:01 +05:00
Marty Schoch
782dbecfe1
fix edge ngram output when side=Back and input token len=max
...
edge condition was incorrectly checked
fixes #523
2017-01-30 20:29:20 -05:00
Steve Yen
6a38fa3719
go fmt
2016-10-12 09:39:43 -07:00
Michael Nitschinger
7e656dad32
Address special unicode sigma at end of term when lowercasing.
...
Σ maps to σ, except at the end of a word where it maps to ς.
This is the only conditional (contextual) but language-independent
mapping in unicode.
2016-10-11 12:37:08 +02:00
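The final-sigma rule described in the commit above can be sketched in Go as follows. This is a minimal illustration, not the code from the commit: the helper name `lowerWithFinalSigma` is hypothetical, and the "end of a word" test is simplified to "last rune of the term, preceded by a letter".

```go
package main

import (
	"fmt"
	"unicode"
)

// lowerWithFinalSigma is a hypothetical helper (not bleve's API).
// It lowercases a term rune by rune and applies the contextual rule
// for capital sigma: Σ becomes ς at the end of the term, σ elsewhere.
func lowerWithFinalSigma(term string) string {
	runes := []rune(term)
	for i, r := range runes {
		if r == 'Σ' {
			// Simplifying assumption: "end of word" means the last rune
			// of the term, with a letter immediately before it.
			if i == len(runes)-1 && i > 0 && unicode.IsLetter(runes[i-1]) {
				runes[i] = 'ς'
			} else {
				runes[i] = 'σ'
			}
			continue
		}
		runes[i] = unicode.ToLower(r)
	}
	return string(runes)
}

func main() {
	fmt.Println(lowerWithFinalSigma("ΟΔΟΣ"))  // prints "οδος"
	fmt.Println(lowerWithFinalSigma("ΣΟΦΙΑ")) // prints "σοφια"
}
```

The sketch allocates a new rune slice for clarity; a filter that works in place on the token bytes would avoid that allocation.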
Michael Nitschinger
ff35d75aa4
Skip already lowercased runes on transformation.
...
The LowerCaseFilter works on the original slice to avoid allocations,
so skipping already lowercased runes avoids unnecessary work.
benchmark                      old ns/op     new ns/op     delta
BenchmarkLowerCaseFilter-8     1302          815           -37.40%
2016-10-11 12:03:26 +02:00
Marty Schoch
2332455bd2
nicer formatting of license header
2016-10-02 10:13:14 -04:00
Marty Schoch
6bf9dd59ab
BREAKING CHANGE - additional package renaming
...
I recently learned that package names should also prefer the
singular form, not the plural form
2016-10-01 17:20:59 -04:00