bleve/analysis
Marty Schoch 0f16eccd6b new tokenizer that allows you to pre-identify tokens with regexp
name "exception"
configure "exceptions" with a list of regexp strings. these
exception regexps match sequences you want treated as a single
token. these sequences are NOT sent to the underlying tokenizer
configure "tokenizer" with the name of the tokenizer that should
be used for processing all text regions not matching exceptions

An example configuration with simple patterns to match URLs and
email addresses:

map[string]interface{}{
	"type":      "exception",
	"tokenizer": "unicode",
	"exceptions": []interface{}{
		`[hH][tT][tT][pP][sS]?://(\S)*`,
		`[fF][iI][lL][eE]://(\S)*`,
		`[fF][tT][pP]://(\S)*`,
		`\S+@\S+`,
	},
}
2015-04-08 15:31:58 -04:00
..
analyzers              switching to unicode tokenizer now that its faster than regexp    2015-01-12 18:04:34 -05:00
byte_array_converters  moved byte array converts into the analysis package               2014-08-29 19:23:21 -04:00
char_filters           add newline between license and package                           2014-09-02 10:54:50 -04:00
datetime_parsers       additional golint issues resolved                                 2014-09-03 18:17:26 -04:00
language               fixed issues with portuguese analyzer                             2015-03-11 14:22:11 -04:00
token_filters          fix typo in unicode normalization form constant                   2015-01-26 14:09:20 -05:00
token_map              Fix typos in comments and strings                                 2014-12-18 18:43:12 +01:00
tokenizers             new tokenizer that allows you to pre-identify tokens with regexp  2015-04-08 15:31:58 -04:00
freq_test.go           add newline between license and package                           2014-09-02 10:54:50 -04:00
freq.go                Remove unneeded import statements                                 2014-11-29 14:25:24 +01:00
test_words.txt         major refactor of analysis files, now wired up to registry        2014-08-13 21:14:47 -04:00
token_map_test.go      fix issues identified by errcheck                                 2015-04-07 14:52:00 -04:00
token_map.go           first pass at checking errors that were ignored                   2015-03-06 14:46:29 -05:00
type.go                introducing cjk_bigram filter and cjk analyzer                    2014-09-11 10:39:05 -04:00
util_test.go           add newline between license and package                           2014-09-02 10:54:50 -04:00
util.go                fix issues with lucene stemmer                                    2015-03-11 11:14:29 -04:00