Commit Graph

60 Commits

Author SHA1 Message Date
Marty Schoch
9e78643bad icu tokenizer uses brk status to set token type
part of #34
2014-09-07 10:24:02 -04:00
Marty Schoch
377ae090d0 additional golint issues resolved 2014-09-03 18:17:26 -04:00
Marty Schoch
d534b0836b converted ALL_CAPS constants to CamelCase 2014-09-03 17:48:40 -04:00
Marty Schoch
7a7eb2e94c add newline between license and package
this avoids cluttering godocs with the license
2014-09-02 10:54:50 -04:00
Marty Schoch
1dcd06e412 add ability to define custom analysis as part of index mapping
now, as part of your index mapping you can create custom
analysis components.  these custom analysis components
are serialized as part of the mapping, and reused
as you would expect on subsequent accesses.
2014-09-01 13:55:23 -04:00
Marty Schoch
7bfad18d40 moved byte array converts into the analysis package 2014-08-29 19:23:21 -04:00
Marty Schoch
1161361bea rename imports from couchbaselabs to blevesearch 2014-08-28 15:38:57 -04:00
Marty Schoch
e8959d03ae added build tag 'icu' to enable functionality dependent on it 2014-08-25 12:22:01 -04:00
Marty Schoch
21ef6e9878 added build tag for things depending on libstemmer 2014-08-25 12:06:10 -04:00
Marty Schoch
08db2eae42 added alternate build tag 'full' which will be an alias to enable all 2014-08-25 11:40:58 -04:00
Marty Schoch
f37bb77794 added build tag to enable cld2 2014-08-25 11:24:20 -04:00
Marty Schoch
092e30a38e tried to word the instructions for static and dynamic linking 2014-08-25 10:54:15 -04:00
deoxxa
22b7b3bc24 compile libcld2 statically 2014-08-24 03:44:57 +10:00
Marty Schoch
b48dc87afa added test case clarifying whitespace tokenizer on empty input 2014-08-19 10:43:52 -04:00
Marty Schoch
5dcd39ade7 added turkish analyzer test 2014-08-14 16:42:41 -04:00
Marty Schoch
21408e49eb added thai analyzer test 2014-08-14 16:39:37 -04:00
Marty Schoch
599ef6edce added swedish analyzer test 2014-08-14 16:12:48 -04:00
Marty Schoch
64255e3eb9 added russian analyzer test 2014-08-14 16:11:23 -04:00
Marty Schoch
8896de2039 added romanian analyzer test 2014-08-14 16:06:17 -04:00
Marty Schoch
c2937b4b81 added portuguese analyzer test
discrepancies found, logged in #70
failing tests commented out for now
2014-08-14 16:04:29 -04:00
Marty Schoch
81a9d325a2 added norwegian analyzer test 2014-08-14 16:01:03 -04:00
Marty Schoch
a3a97a09d3 added dutch analyzer test 2014-08-14 15:59:39 -04:00
Marty Schoch
6714d5d765 added italian analyzer test
discrepancies found between us and lucene, documented in #69
failing tests commented out for now
2014-08-14 15:56:47 -04:00
Marty Schoch
b9c0477762 added hungarian analyzer test 2014-08-14 15:51:55 -04:00
Marty Schoch
6a9f8e85ae added french analyzer test
many discrepancies noted, opened issue #68 to track this
failing tests commented out for now
2014-08-14 15:48:32 -04:00
Marty Schoch
f6f17c7a9e added finnish analyzer test 2014-08-14 15:27:45 -04:00
Marty Schoch
80d7c4f870 added persian analyzer test 2014-08-14 15:24:42 -04:00
Marty Schoch
2ef7c80c92 added spanish analyzer test 2014-08-14 14:44:46 -04:00
Marty Schoch
4398aab723 added sorani analyzer test 2014-08-14 14:42:36 -04:00
Marty Schoch
b22941ee37 added test for danish analyzer 2014-08-14 14:36:24 -04:00
Marty Schoch
8c9997f1e2 added test for german analyzer 2014-08-14 14:33:30 -04:00
Marty Schoch
6a951b9372 added analyzer test for english 2014-08-14 14:28:24 -04:00
Marty Schoch
c526a38369 major refactor of analysis files, now wired up to registry
ultimately this is to make it more convenient for us to wire up
different elements of the analysis pipeline, without having to
preload everything into memory before we need it

separately the index layer now has a mechanism for storing
internal key/value pairs.  this is expected to be used to
store the mapping, and possibly other pieces of data by the
top layer, but not exposed to the user at the top.
2014-08-13 21:14:47 -04:00
Marty Schoch
3481ec9cef added hindi stemmer
closes #40
2014-08-11 22:29:47 -04:00
Marty Schoch
c65f7415ff added hindi normalizer
closes #64
2014-08-11 19:51:47 -04:00
Marty Schoch
cd0e3fd85b added german normalizer
updated german analyzer to use this normalizer
closes #65
2014-08-11 19:25:37 -04:00
Marty Schoch
a4707ebb4e configured zero width non joiner char filter, and persian analyzer 2014-08-11 18:57:04 -04:00
Marty Schoch
4ccd69ed45 added arabic normalizer
closes #63
2014-08-11 18:35:35 -04:00
Marty Schoch
73b252f6a6 added persian normalizer
closes #67
2014-08-11 18:15:41 -04:00
Marty Schoch
e21b7f4436 added sorani normalizer and stemmer, now have analyzer
closes #43
2014-08-08 09:38:28 -04:00
Marty Schoch
ef35ea1985 added czech stop word list
closes #36
2014-08-07 22:32:49 -04:00
Marty Schoch
964b87f76e added rune tokenizer
not used directly right now, but basis for other simple tokenizers
2014-08-07 22:14:26 -04:00
Marty Schoch
0e54fbd8da added keyword marker filter
updated stemmer filter to not stem tokens marked as keyword
closes #48
2014-08-07 08:13:00 -04:00
Marty Schoch
c19270108c added ngram and edge ngram token filters
closes #46 and closes #47
2014-08-06 22:11:42 -04:00
Marty Schoch
9a777aaa80 added token truncate filter
closes #49
2014-08-06 20:39:42 -04:00
Marty Schoch
d84187fd24 added apostrophe filter to improve turkish analyzer
closes #27
2014-08-06 08:50:00 -04:00
Marty Schoch
79ab2b9b3d added unicode normalization filter 2014-08-04 21:59:57 -04:00
Marty Schoch
2c0bf23fac added elision filter
defined article word maps for french, italian, irish and catalan
defined elision filters for these same languages
updated analyzers for french and italian to use this new filter
irish and catalan still depend on other missing pieces
closes #25
2014-08-03 19:17:35 -04:00
Marty Schoch
0960cab0ae refactored StopWordsMap into WordMap so it can be reused
the ElisionFilter will need a word list of articles, and the plan is to reuse WordMap for this
2014-08-03 17:46:35 -04:00
Marty Schoch
00d6f9700b added support for date range fields and queries
closes #9 and closes #11
2014-08-03 17:19:04 -04:00