bleve


Marty Schoch e472b3e807 add support for a "web" tokenizer/analyzer		2015-11-30 14:27:18 -05:00

The goal of the "web" tokenizer is to recognize web things like:
- email addresses
- URLs
- twitter @handles and #hashtags

This implementation uses regexp exceptions. There will most likely be endless debate about the regular expressions. These were chosen as "good enough for now".

There is also a "web" analyzer. This is just the "standard" analyzer, but using the "web" tokenizer instead of the "unicode" one. NOTE: after processing the exceptions, the "web" tokenizer still falls back to the standard "unicode" tokenizer for the remaining text.

For many users, you can simply set your mapping's default analyzer to be "web".

closes #269
exception	exception: fail if pattern is empty, name tokenizer in error	2015-10-27 18:53:03 +01:00
regexp_tokenizer	changed whitespace tokenizer to work better on cjk input	2014-09-07 14:11:01 -04:00
single_token	add newline between license and package	2014-09-02 10:54:50 -04:00
unicode	change unicode tokenizer to use direct segmenter api	2015-01-12 17:57:45 -05:00
web	add support for a "web" tokenizer/analyzer	2015-11-30 14:27:18 -05:00
whitespace_tokenizer	added benchmark for tokenizing English text	2014-10-17 18:07:01 -04:00