e472b3e807
The goal of the "web" tokenizer is to recognize web things like:

- email addresses
- URLs
- twitter @handles and #hashtags

This implementation uses regexp exceptions. There will most likely be endless debate about the regular expressions; these were chosen as "good enough for now".

There is also a "web" analyzer. This is just the "standard" analyzer, but using the "web" tokenizer instead of the "unicode" one. NOTE: after processing the exceptions, it still falls back to the standard "unicode" tokenizer. For many users, you can simply set your mapping's default analyzer to "web".

closes #269
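A minimal sketch of the exception-tokenizer idea described above: text matching any of the "web" regexps is emitted as a single token, and the spans between matches fall through to a simpler tokenizer. The patterns and the whitespace fallback here are simplified stand-ins, not bleve's actual regexps or its "unicode" tokenizer.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Simplified illustrative patterns -- the real tokenizer's regexps differ.
var webExceptions = regexp.MustCompile(
	`[\w.+-]+@[\w-]+\.[\w.]+` + // email address
		`|https?://\S+` + // URL
		`|@\w+` + // twitter @handle
		`|#\w+`) // #hashtag

// tokenize emits each exception match as a single token and hands the
// text between matches to a fallback tokenizer (a whitespace split
// stands in for the "unicode" tokenizer here).
func tokenize(input string) []string {
	var tokens []string
	last := 0
	for _, loc := range webExceptions.FindAllStringIndex(input, -1) {
		tokens = append(tokens, strings.Fields(input[last:loc[0]])...)
		tokens = append(tokens, input[loc[0]:loc[1]])
		last = loc[1]
	}
	return append(tokens, strings.Fields(input[last:])...)
}

func main() {
	fmt.Println(tokenize("mail me@example.com or see https://example.com #search @handle"))
}
```

Without the exception pass, a plain tokenizer would split "me@example.com" into several tokens; with it, the whole address survives as one term.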
Changed paths:

- exception
- regexp_tokenizer
- single_token
- unicode
- web
- whitespace_tokenizer