e472b3e807
The goal of the "web" tokenizer is to recognize web things like:

- email addresses
- URLs
- twitter @handles and #hashtags

This implementation uses regexp exceptions. There will most likely be endless debate about the regular expressions; these were chosen as "good enough for now".

There is also a "web" analyzer. This is just the "standard" analyzer, but using the "web" tokenizer instead of the "unicode" one. NOTE: after processing the exceptions, it still falls back to the standard "unicode" one. For many users, you can simply set your mapping's default analyzer to "web".

closes #269
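The commit message only describes the approach, so here is a minimal, self-contained sketch of what "regexp exceptions" tokenization can look like in Go. The patterns and names (`exceptions`, `tokenize`, `fallback`) are illustrative assumptions, not the library's actual regular expressions or API; in particular, a plain whitespace split stands in for the real "unicode" tokenizer that the implementation falls back to.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Hypothetical "good enough for now" patterns; the library's real regular
// expressions may differ.
var exceptions = regexp.MustCompile(
	`[\w.+-]+@[\w-]+\.[\w.-]+` + // email addresses
		`|https?://\S+` + // URLs
		`|@\w+` + // twitter @handles
		`|#\w+`, // #hashtags
)

// tokenize emits exception matches as single tokens and hands the text in
// between to the fallback tokenizer (a stand-in for the "unicode" one).
func tokenize(input string) []string {
	var tokens []string
	last := 0
	for _, loc := range exceptions.FindAllStringIndex(input, -1) {
		tokens = append(tokens, fallback(input[last:loc[0]])...)
		tokens = append(tokens, input[loc[0]:loc[1]])
		last = loc[1]
	}
	return append(tokens, fallback(input[last:])...)
}

// fallback splits on anything that is not a letter or digit.
func fallback(s string) []string {
	return strings.FieldsFunc(s, func(r rune) bool {
		return !('a' <= r && r <= 'z' || 'A' <= r && r <= 'Z' ||
			'0' <= r && r <= '9')
	})
}

func main() {
	fmt.Println(tokenize("mail jane@example.com or ping @jane #search https://example.com/docs"))
	// [mail jane@example.com or ping @jane #search https://example.com/docs]
}
```

The design keeps the exception pass cheap: only spans matching one of the patterns bypass the fallback tokenizer, so ordinary text is tokenized exactly as before. As the commit note says, most users then only need to point their mapping's default analyzer at "web" (in bleve-style index mappings this is typically a one-line change, e.g. setting `DefaultAnalyzer = "web"`).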
analyzers
byte_array_converters
char_filters
datetime_parsers
language
token_filters
token_map
tokenizers
benchmark_test.go
freq_test.go
freq.go
test_words.txt
token_map_test.go
token_map.go
type.go
util_test.go
util.go