The "exception" tokenizer is configured with a list of regular expression strings under the "exceptions" key. Sequences matching any of these expressions are treated as a single token and are NOT sent to the underlying tokenizer. The "tokenizer" option names the tokenizer used to process all text regions that do not match an exception.

An example configuration with simple patterns to match URLs and email addresses:

```go
map[string]interface{}{
	"type":      "exception",
	"tokenizer": "unicode",
	"exceptions": []interface{}{
		`[hH][tT][tT][pP][sS]?://(\S)*`,
		`[fF][iI][lL][eE]://(\S)*`,
		`[fF][tT][pP]://(\S)*`,
		`\S+@\S+`,
	},
}
```
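To illustrate the mechanism, here is a minimal self-contained sketch of the same idea using only the standard library: spans matching an exception pattern are emitted as single tokens, and the remaining text is handed to a stand-in tokenizer. The function name `tokenizeWithExceptions` and the use of whitespace splitting (in place of the real "unicode" tokenizer) are illustrative assumptions, not bleve's implementation.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// tokenizeWithExceptions sketches the exception-tokenizer mechanism:
// spans matching any "exception" pattern become single tokens and are
// never handed to the underlying tokenizer; everything else is split
// by a stand-in tokenizer (whitespace splitting here, for illustration).
func tokenizeWithExceptions(input string, exceptions []string) []string {
	// Combine the exception patterns into one alternation.
	re := regexp.MustCompile(strings.Join(exceptions, "|"))
	var tokens []string
	last := 0
	for _, loc := range re.FindAllStringIndex(input, -1) {
		// Tokenize the text before the exception normally.
		tokens = append(tokens, strings.Fields(input[last:loc[0]])...)
		// Emit the matched exception as a single, unsplit token.
		tokens = append(tokens, input[loc[0]:loc[1]])
		last = loc[1]
	}
	// Tokenize any remaining text after the last exception.
	tokens = append(tokens, strings.Fields(input[last:])...)
	return tokens
}

func main() {
	exceptions := []string{
		`[hH][tT][tT][pP][sS]?://(\S)*`,
		`\S+@\S+`,
	}
	fmt.Println(tokenizeWithExceptions(
		"contact me@example.com or visit https://example.com today",
		exceptions))
}
```

With this sketch, "https://example.com" and "me@example.com" survive as whole tokens, whereas a plain word-splitting tokenizer would break them on punctuation.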