bleve/analysis/analyzers
Marty Schoch e472b3e807 add support for a "web" tokenizer/analyzer
The goal of the "web" tokenizer is to recognize web things like
- email addresses
- URLs
- twitter @handles and #hashtags

This implementation uses regexp exceptions.  There will most
likely be endless debate about the regular expressions. These
were chosen as "good enough for now".

There is also a "web" analyzer.  This is just the "standard"
analyzer, but using the "web" tokenizer instead of the "unicode"
one.  NOTE: after processing the exceptions, it still falls back
to the standard "unicode" one.
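
The exception-then-fallback behaviour described above can be sketched with
the standard library alone. This is only an illustration, not the code in
this commit: the three regular expressions, the function names
webTokenize and fallbackTokenize, and the letter/digit splitter standing in
for the "unicode" tokenizer are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
	"unicode"
)

// Illustrative patterns only (assumed for this sketch); they are NOT the
// expressions shipped with the "web" tokenizer.
var exceptionPatterns = []string{
	`[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}`, // email addresses
	`https?://[^\s]+`,                                // URLs
	`[@#][A-Za-z0-9_]+`,                              // @handles and #hashtags
}

// One combined expression; alternatives are tried in the order listed.
var exceptionRE = regexp.MustCompile(strings.Join(exceptionPatterns, "|"))

// fallbackTokenize is a stand-in for the "unicode" tokenizer: it splits on
// every rune that is neither a letter nor a digit.
func fallbackTokenize(s string) []string {
	return strings.FieldsFunc(s, func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
}

// webTokenize emits each exception match as a single token and passes the
// text between matches to the fallback tokenizer.
func webTokenize(input string) []string {
	var tokens []string
	last := 0
	for _, loc := range exceptionRE.FindAllStringIndex(input, -1) {
		tokens = append(tokens, fallbackTokenize(input[last:loc[0]])...)
		tokens = append(tokens, input[loc[0]:loc[1]])
		last = loc[1]
	}
	return append(tokens, fallbackTokenize(input[last:])...)
}

func main() {
	text := "mail bob@example.com or visit http://example.com/docs #search @marty"
	fmt.Printf("%q\n", webTokenize(text))
}
```

Without the exception pass, the fallback splitter alone would break the
email address and the URL into several tokens and drop the leading "@" and
"#" characters.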

Many users can simply set their mapping's default analyzer
to "web".

closes #269
2015-11-30 14:27:18 -05:00
..
custom_analyzer move custom_analyzer to custom_analyzer package 2015-08-11 21:22:03 +00:00
keyword_analyzer add newline between license and package 2014-09-02 10:54:50 -04:00
simple_analyzer switching to unicode tokenizer now that it's faster than regexp 2015-01-12 18:04:34 -05:00
standard_analyzer switching to unicode tokenizer now that it's faster than regexp 2015-01-12 18:04:34 -05:00
web add support for a "web" tokenizer/analyzer 2015-11-30 14:27:18 -05:00