name "exception"
Configure with a list of regexp strings under the key "exceptions".
These regexps match sequences that should be treated as a single
token. Matched sequences are NOT sent to the underlying tokenizer.
configure "tokenizer" is the named tokenizer that should be
used for processing all text regions not matching exceptions
An example configuration with simple patterns to match URLs and
email addresses:
map[string]interface{}{
	"type":      "exception",
	"tokenizer": "unicode",
	"exceptions": []interface{}{
		`[hH][tT][tT][pP][sS]?://(\S)*`,
		`[fF][iI][lL][eE]://(\S)*`,
		`[fF][tT][pP]://(\S)*`,
		`\S+@\S+`,
	},
}
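
A minimal sketch of wiring this configuration into an index mapping is
shown below. It assumes the bleve search library context implied by the
config style above; the import path (v2 module), the custom names
"url_aware", and the "to_lower" token filter are assumptions for
illustration, not part of this tokenizer's definition.

package main

import (
	"fmt"
	"log"

	"github.com/blevesearch/bleve/v2"
)

func main() {
	mapping := bleve.NewIndexMapping()

	// Register the exception tokenizer under a custom name, using the
	// configuration shown above: URLs and email addresses are preserved
	// as single tokens, everything else goes to the unicode tokenizer.
	err := mapping.AddCustomTokenizer("url_aware",
		map[string]interface{}{
			"type":      "exception",
			"tokenizer": "unicode",
			"exceptions": []interface{}{
				`[hH][tT][tT][pP][sS]?://(\S)*`,
				`[fF][iI][lL][eE]://(\S)*`,
				`[fF][tT][pP]://(\S)*`,
				`\S+@\S+`,
			},
		})
	if err != nil {
		log.Fatal(err)
	}

	// Build a custom analyzer that uses the tokenizer and lowercases
	// the resulting tokens.
	err = mapping.AddCustomAnalyzer("url_aware",
		map[string]interface{}{
			"type":          "custom",
			"tokenizer":     "url_aware",
			"token_filters": []interface{}{"to_lower"},
		})
	if err != nil {
		log.Fatal(err)
	}
	mapping.DefaultAnalyzer = "url_aware"

	// Create an in-memory index using the mapping.
	index, err := bleve.NewMemOnly(mapping)
	if err != nil {
		log.Fatal(err)
	}
	defer index.Close()

	fmt.Println("index ready with url_aware analyzer")
}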