Many tools and applications that use bleve rely on the config
package to support many languages out of the box, because it
forces the import of optional packages.
The goal of the "web" tokenizer is to recognize web things like:
- email addresses
- URLs
- Twitter @handles and #hashtags
This implementation uses regexp exceptions. There will most
likely be endless debate about the regular expressions. These
were chosen as "good enough for now".
There is also a "web" analyzer. This is just the "standard"
analyzer, but using the "web" tokenizer instead of the "unicode"
one. NOTE: after processing the exceptions, any remaining text
still falls back to the standard "unicode" tokenizer.
Most users can simply set their mapping's default analyzer to
"web".
closes #269