11 Commits

Author SHA1 Message Date
Ethan Koenig 012d436dd7 Add UniqueTerm token filter 2018-01-16 22:24:51 -08:00
Marty Schoch af198c833f Merge branch 'ru_analyzer' of https://github.com/sokolovstas/bleve into sokolovstas-ru_analyzer 2018-01-10 10:29:15 -05:00
Ethan Koenig 0433f05d9c Fix test 2017-06-22 18:56:28 -04:00
Ethan Koenig 8994ad2e00 Fix token start/end/position values in camelCase tokenizer 2017-06-22 17:42:39 -04:00
Stanislav Sokolov d8d57e6990 Added Russian analyzer with snowball stemmer 2017-06-05 18:01:01 +05:00
Marty Schoch 782dbecfe1 fix edge ngram output when side=Back and input token len=max
edge condition was incorrectly checked
fixes #523
2017-01-30 20:29:20 -05:00
Steve Yen 6a38fa3719 go fmt 2016-10-12 09:39:43 -07:00
Michael Nitschinger 7e656dad32 Address special unicode sigma at end of term when lowercasing.
Σ maps to σ, except at the end of a word where it maps to ς.
This is the only conditional (contextual) but language-independent
mapping in unicode.
2016-10-11 12:37:08 +02:00
Michael Nitschinger ff35d75aa4 Skip already lowercased runes on transformation.
The LowerCaseFilter works on the original slice to avoid allocations,
so skipping already lowercased runes avoids unnecessary work.

benchmark                      old ns/op     new ns/op     delta
BenchmarkLowerCaseFilter-8     1302          815           -37.40%
2016-10-11 12:03:26 +02:00
Marty Schoch 2332455bd2 nicer formatting of license header 2016-10-02 10:13:14 -04:00
Marty Schoch 6bf9dd59ab BREAKING CHANGE - additional package renaming
I recently learned that package names should also prefer the
singular form, not the plural form
2016-10-01 17:20:59 -04:00
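
The two lowercasing commits above (7e656dad32 and ff35d75aa4) describe a rule that is easy to misread in prose, so here is a minimal Go sketch of it. It is an illustration under stated assumptions, not the code from those commits: `lowerWithFinalSigma` is a hypothetical helper name, it works on a copy of the term rather than on the original slice, and treating a lone Σ as non-final is a choice made only for this example.

```go
package main

import (
	"fmt"
	"unicode"
)

// lowerWithFinalSigma is a hypothetical helper (not taken from the commits).
// It lowercases a single term, skips runes that are already lowercase, and
// maps a capital sigma at the end of the term to the final form.
func lowerWithFinalSigma(term string) string {
	runes := []rune(term)
	for i, r := range runes {
		if unicode.IsLower(r) {
			continue // already lowercase: no work needed
		}
		if r == 'Σ' {
			// Assumption for this sketch: a sigma is "final" only when it is
			// the last rune of a term that has at least two runes.
			if i == len(runes)-1 && i > 0 {
				runes[i] = 'ς'
			} else {
				runes[i] = 'σ'
			}
			continue
		}
		runes[i] = unicode.ToLower(r)
	}
	return string(runes)
}

func main() {
	fmt.Println(lowerWithFinalSigma("ΟΔΟΣ"))  // sigma at the end of the term
	fmt.Println(lowerWithFinalSigma("ΣΟΦΙΑ")) // sigma at the start of the term
}
```

A plain `unicode.ToLower` call maps every Σ to σ regardless of position, which is why the end-of-term case needs its own branch.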