bleve/analysis/tokenizers/exception
Marty Schoch 0f16eccd6b new tokenizer that allows you to pre-identify tokens with regexp
name "exception"
configure "exceptions" with a list of regexp strings
these exceptions are regexps that match sequences you want treated
as a single token.  these sequences are NOT sent to the
underlying tokenizer
configure "tokenizer" with the name of the tokenizer that should be
used for processing all text regions not matching exceptions

An example configuration with simple patterns to match URLs and
email addresses:

map[string]interface{}{
	"type":      "exception",
	"tokenizer": "unicode",
	"exceptions": []interface{}{
		`[hH][tT][tT][pP][sS]?://(\S)*`,
		`[fF][iI][lL][eE]://(\S)*`,
		`[fF][tT][pP]://(\S)*`,
		`\S+@\S+`,
	},
}
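
a rough sketch of wiring such a configuration into an index mapping via
the exported bleve mapping API (assuming AddCustomTokenizer /
AddCustomAnalyzer; the names "url_email_exceptions" and
"url_email_analyzer" are illustrative, not part of this change):

package main

import (
	"log"

	"github.com/blevesearch/bleve"
)

func main() {
	indexMapping := bleve.NewIndexMapping()

	// register an instance of the exception tokenizer that protects
	// URLs and email addresses, delegating everything else to "unicode"
	err := indexMapping.AddCustomTokenizer("url_email_exceptions",
		map[string]interface{}{
			"type":      "exception",
			"tokenizer": "unicode",
			"exceptions": []interface{}{
				`[hH][tT][tT][pP][sS]?://(\S)*`,
				`\S+@\S+`,
			},
		})
	if err != nil {
		log.Fatal(err)
	}

	// expose it through a custom analyzer so field mappings can use it
	err = indexMapping.AddCustomAnalyzer("url_email_analyzer",
		map[string]interface{}{
			"type":          "custom",
			"tokenizer":     "url_email_exceptions",
			"token_filters": []interface{}{"to_lower"},
		})
	if err != nil {
		log.Fatal(err)
	}
	indexMapping.DefaultAnalyzer = "url_email_analyzer"

	index, err := bleve.New("example.bleve", indexMapping)
	if err != nil {
		log.Fatal(err)
	}
	defer index.Close()
}
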
2015-04-08 15:31:58 -04:00
..
exception_test.go new tokenizer that allows you to pre-identify tokens with regexp 2015-04-08 15:31:58 -04:00
exception.go new tokenizer that allows you to pre-identify tokens with regexp 2015-04-08 15:31:58 -04:00