bleve/analysis/token_filters
Marty Schoch 44ff6ced8a improve perf of porter stemmer
1.  the porter stemmer offers a method that does NOT do lowercasing; however,
to use it we must first convert to runes ourselves, so we now do this

2.  now we can invoke the version that skips lowercasing, since we
already lowercase before stemming through a separate filter

due to the fact that the stemmer modifies the runes in place
we have no way to know if there were changes, thus we must
always encode back into the term byte slice

added unit test which catches the problem found

NOTE this uses analysis.BuildTermFromRunes so perf gain is
only visible with other PR also merged

future gains are possible if we update the stemmer to let us
know if changes were made, thus skipping re-encoding to
[]byte when no changes were actually made
2016-09-11 20:13:15 -04:00
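
The flow described in the commit message above (decode the term to runes, stem without lowercasing, always re-encode to bytes) can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: `stemInPlace`, `encodeRunes`, and `stemTerm` are hypothetical names, `stemInPlace` is a toy stand-in that only strips a trailing "s" rather than running the Porter algorithm, and `encodeRunes` stands in for the `analysis.BuildTermFromRunes` helper that the message mentions.

```go
package main

import (
	"bytes"
	"fmt"
	"unicode/utf8"
)

// stemInPlace is a hypothetical stand-in for a stemmer entry point that
// skips lowercasing and works directly on runes. It is NOT the Porter
// algorithm: it only drops a single trailing 's' (but leaves "ss" alone).
func stemInPlace(runes []rune) []rune {
	n := len(runes)
	if n > 2 && runes[n-1] == 's' && runes[n-2] != 's' {
		return runes[:n-1]
	}
	return runes
}

// encodeRunes re-encodes a rune slice as UTF-8 with a single allocation.
// Assumption: it plays the role the commit message assigns to
// analysis.BuildTermFromRunes.
func encodeRunes(runes []rune) []byte {
	size := 0
	for _, r := range runes {
		size += utf8.RuneLen(r)
	}
	out := make([]byte, size)
	n := 0
	for _, r := range runes {
		n += utf8.EncodeRune(out[n:], r)
	}
	return out
}

// stemTerm assumes the term was already lowercased by an earlier filter.
// The stemmer gives no signal about whether it changed anything, so the
// result is always re-encoded into a byte slice.
func stemTerm(term []byte) []byte {
	runes := bytes.Runes(term)
	stemmed := stemInPlace(runes)
	return encodeRunes(stemmed)
}

func main() {
	fmt.Println(string(stemTerm([]byte("walks")))) // prints "walk"
}
```

If the stemmer reported whether it modified its input, `stemTerm` could return the original `term` unchanged and skip the re-encoding step, which is the future gain the commit message points to.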
apostrophe_filter token_filters: fix typo in right single quotation mark name 2015-11-04 10:29:56 +01:00
camelcase_filter add couchbase copyright and license now that CLA has been signed 2016-06-10 13:08:50 -04:00
compound add support for dictionary based compound word filter 2014-11-18 15:18:42 -05:00
edge_ngram_filter removing duplicate code by reusing util.go in analysis 2016-06-09 15:13:30 -04:00
elision_filter elision_filter: correctly strip multi-bytes quotation marks 2015-11-04 10:59:10 +01:00
keyword_marker_filter modified token filters to avoid creating new token stream 2014-09-23 18:41:32 -04:00
length_filter modified token filters to avoid creating new token stream 2014-09-23 18:41:32 -04:00
lower_case_filter added some godoc documentation for the en analyzer 2015-11-18 15:28:57 +13:00
ngram_filter removing duplicate code by reusing util.go in analysis 2016-06-09 15:13:30 -04:00
porter improve perf of porter stemmer 2016-09-11 20:13:15 -04:00
shingle modified token filters to avoid creating new token stream 2014-09-23 18:41:32 -04:00
stop_tokens_filter token_map: document it along with stop_token_filter 2015-11-05 14:07:54 +01:00
truncate_token_filter removing duplicate code by reusing util.go in analysis 2016-06-09 15:13:30 -04:00
unicode_normalize fix typo in unicode normalization form constant 2015-01-26 14:09:20 -05:00