Patrick Mezard
8b17787a65
analysis: document "exception" tokenizer, and Tokenizer interface
2015-10-27 18:53:03 +01:00
Patrick Mezard
e2fa3d6351
doc: document Token, TokenFrequencies and Field structs
It helps in understanding what is going on in the indexing code.
ArrayPositions() was particularly puzzling.
2015-10-09 12:32:44 +02:00
Marty Schoch
1a1cf32a86
introducing cjk_bigram filter and cjk analyzer
closes #34
2014-09-11 10:39:05 -04:00
Marty Schoch
8debf26cb7
changed many components to not have defaults
many of these defaults were arbitrary, and not having
defaults lets us more easily flag them for configuration
added a shingle filter
introduced a new token type for shingles
2014-09-09 18:15:14 -04:00
Marty Schoch
6b4c86b35a
changed whitespace tokenizer to work better on cjk input
now it will return each cjk character as a separate token
this will pair well with a cjk bigram filter for indexing
2014-09-07 14:11:01 -04:00
Marty Schoch
9e78643bad
icu tokenizer uses brk status to set token type
part of #34
2014-09-07 10:24:02 -04:00
Marty Schoch
377ae090d0
additional golint issues resolved
2014-09-03 18:17:26 -04:00
Marty Schoch
d534b0836b
converted ALL_CAPS constants to CamelCase
2014-09-03 17:48:40 -04:00
Marty Schoch
7a7eb2e94c
add newline between license and package
this avoids cluttering godocs with the license
2014-09-02 10:54:50 -04:00
Marty Schoch
7bfad18d40
moved byte array conversions into the analysis package
2014-08-29 19:23:21 -04:00
Marty Schoch
0e54fbd8da
added keyword marker filter
updated stemmer filter to not stem tokens marked as keyword
closes #48
2014-08-07 08:13:00 -04:00
Marty Schoch
00d6f9700b
added support for date range fields and queries
closes #9 and closes #11
2014-08-03 17:19:04 -04:00
Marty Schoch
25540c736a
introduced token type
2014-07-31 13:54:12 -04:00
Marty Schoch
3d842dfaf2
initial commit
2014-04-17 16:55:53 -04:00