name "exception"
Configure with a list of regexp strings under the key "exceptions".
These regexps match sequences that should be treated as a single
token. Matched sequences are NOT sent to the underlying tokenizer.
configure "tokenizer" is the named tokenizer that should be
used for processing all text regions not matching exceptions
An example configuration with simple patterns to match URLs and
email addresses:
map[string]interface{}{
	"type":      "exception",
	"tokenizer": "unicode",
	"exceptions": []interface{}{
		`[hH][tT][tT][pP][sS]?://(\S)*`,
		`[fF][iI][lL][eE]://(\S)*`,
		`[fF][tT][pP]://(\S)*`,
		`\S+@\S+`,
	},
}
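
A minimal sketch of wiring this configuration into an index mapping is
shown below. It assumes the bleve search library context implied by the
config style above; the import path (v2 module), the custom names
"url_aware", and the "to_lower" token filter are assumptions for
illustration, not part of this tokenizer's definition.

package main

import (
	"fmt"
	"log"

	"github.com/blevesearch/bleve/v2"
)

func main() {
	mapping := bleve.NewIndexMapping()

	// Register the exception tokenizer under a custom name, using the
	// configuration shown above: URLs and email addresses are preserved
	// as single tokens, everything else goes to the unicode tokenizer.
	err := mapping.AddCustomTokenizer("url_aware",
		map[string]interface{}{
			"type":      "exception",
			"tokenizer": "unicode",
			"exceptions": []interface{}{
				`[hH][tT][tT][pP][sS]?://(\S)*`,
				`[fF][iI][lL][eE]://(\S)*`,
				`[fF][tT][pP]://(\S)*`,
				`\S+@\S+`,
			},
		})
	if err != nil {
		log.Fatal(err)
	}

	// Build a custom analyzer that uses the tokenizer and lowercases
	// the resulting tokens.
	err = mapping.AddCustomAnalyzer("url_aware",
		map[string]interface{}{
			"type":          "custom",
			"tokenizer":     "url_aware",
			"token_filters": []interface{}{"to_lower"},
		})
	if err != nil {
		log.Fatal(err)
	}
	mapping.DefaultAnalyzer = "url_aware"

	// Create an in-memory index using the mapping.
	index, err := bleve.NewMemOnly(mapping)
	if err != nil {
		log.Fatal(err)
	}
	defer index.Close()

	fmt.Println("index ready with url_aware analyzer")
}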