Commit Graph

58 Commits

Author SHA1 Message Date
Marty Schoch
043a3bfb7c change cjk analyzer to use unicode tokenizer
change cjk bigram analyzer to work with multi-rune terms
add cjk width filter, which replaces full unicode normalization

these changes make the cjk analyzer behave more like elasticsearch
they also remove the dependency on the whitespace analyzer
which is now free to also behave more like lucene/es

fixes #33
2016-06-10 13:04:40 -04:00
Marty Schoch
2a703376ea fix ineffectual assignments 2016-04-02 22:42:56 -04:00
Marty Schoch
7892882519 fix typos 2016-04-02 21:59:30 -04:00
Marty Schoch
194ee82c80 gofmt simplifications 2016-04-02 21:54:33 -04:00
Ben Campbell
4fafb2be3f Merge branch 'master' into documenting 2016-03-23 10:48:09 +13:00
Marty Schoch
cecdfcbc69 moving japanese analyzer to blevex package 2016-03-13 18:05:05 -04:00
ikawaha
fcebff60e9 Add a test case 2016-02-21 19:59:52 +09:00
ikawaha
4fe7688431 Use a small version of kagome 2016-02-21 19:58:36 +09:00
Ben Campbell
994f4b4d11 added some godoc documentation for the en analyzer 2015-11-18 15:28:57 +13:00
Marty Schoch
c3a4fab911 Merge pull request #238 from ikawaha/ja-morph-analyzer
fix compilation with the latest changes to kagome
2015-09-28 17:05:46 -04:00
ikawaha
89af7978a9 fix compilation with the latest changes to kagome 2015-09-28 15:53:08 +09:00
Marty Schoch
f81b2be334 major refactor of bleve configuration
see #221 for full details
2015-09-16 17:10:59 -04:00
Marty Schoch
1f4ef3da8b move elision filter after lowercase filter
this affects all languages using the elision filter
languages fr and it are updated now
languages ca and ga are still missing other components and
do not yet have an analyzer, but they should follow this lead
once they are ready

fixes #218
2015-07-21 10:43:53 -04:00
Marty Schoch
65556f45c7 added additional tests for bug #214 2015-07-06 18:00:05 -04:00
Marty Schoch
50bd082257 fixed issues with portuguese analyzer
fixes #70
2015-03-11 14:22:11 -04:00
Marty Schoch
7970f42c29 fix issues with italian analyzer
switch it to not require icu/libstemmer
fixes #69
2015-03-11 11:48:13 -04:00
Marty Schoch
eeaf514848 switch fr to not require icu/libstemmer
also corrected copy/paste bug in test
2015-03-11 11:46:33 -04:00
Marty Schoch
8ae30fb6f0 fix issues with lucene stemmer
fixes issue #68
2015-03-11 11:14:29 -04:00
Salmān Aljammāz
9444af9366 arabic: add unicode normalization to analyzer 2015-02-06 19:50:58 +03:00
Salmān Aljammāz
91a8d5da9f arabic: check minimum length before stemming
This involves converting tokens to a rune slice in the filter, but
at least we're now compatible with Lucene's stemmer.
2015-02-06 19:50:58 +03:00
Salmān Aljammāz
0470f93955 arabic: add more stemmer tests
These came from org.apache.lucene.analysis.ar.
2015-02-06 19:49:30 +03:00
Salmān Aljammāz
e461fed92a arabic stemmer: strip multiple suffixes
updates #150
2015-02-05 16:07:58 +03:00
Marty Schoch
4be974f489 added first implementation of arabic analyzer
one test case is not passing and is commented out temporarily
updates #150
2015-02-05 07:44:55 -05:00
Marty Schoch
b9c22fe50d Merge pull request #154 from saljam/arabic
add arabic light stemmer
2015-02-05 07:09:54 -05:00
Salmān Aljammāz
945ef8158f add arabic light stemmer
fixes #28
updates #150
2015-02-05 13:24:30 +03:00
Marty Schoch
dd1cd189a7 added initial implementation of hindi analyzer
closes #66
2015-02-04 15:12:08 -05:00
Marty Schoch
40a8154bab changed en analyzer to use pure go components
behavior should be similar with unicode segmentation
and a porter stemmer
2014-10-21 16:38:58 -04:00
Marty Schoch
c4d1782689 new pure go porter stemmer integrated
renamed original libstemmer porter to "stemmer_porter_classic"
new pure go stemmer is "stemmer_porter"
2014-10-20 16:55:24 -04:00
Marty Schoch
febb8d2df1 renamed unicode_word_boundary package to icu
this is in preparation of alternative unicode word boundary impls
2014-10-17 15:15:13 -04:00
Marty Schoch
19d45dfdb6 fix compilation with the latest changes to kagome 2014-10-10 19:59:24 -07:00
Marty Schoch
1dc466a800 modified token filters to avoid creating new token stream
often the result stream was the same length, so can reuse the
existing token stream
also, in cases where a new stream was required, set capacity to
the length of the input stream.  most output streams are at least
as long as the input, so this may avoid some subsequent resizing
2014-09-23 18:41:32 -04:00
Marty Schoch
95e6e37e67 added build tag to fix running tests without tag 2014-09-16 11:28:44 -04:00
Marty Schoch
55c0e84665 relocated kagome tokenizer and introduced ja analyzer 2014-09-16 11:21:29 -04:00
Marty Schoch
1a1cf32a86 introducing cjk_bigram filter and cjk analyzer
closes #34
2014-09-11 10:39:05 -04:00
Marty Schoch
d534b0836b converted ALL_CAPS constants to CamelCase 2014-09-03 17:48:40 -04:00
Marty Schoch
7a7eb2e94c add newline between license and package
this avoids cluttering godocs with the license
2014-09-02 10:54:50 -04:00
Marty Schoch
1161361bea rename imports from couchbaselabs to blevesearch 2014-08-28 15:38:57 -04:00
Marty Schoch
e8959d03ae added build tag 'icu' to enable functionality dependent on it 2014-08-25 12:22:01 -04:00
Marty Schoch
21ef6e9878 added build tag for things depending on libstemmer 2014-08-25 12:06:10 -04:00
Marty Schoch
5dcd39ade7 added turkish analyzer test 2014-08-14 16:42:41 -04:00
Marty Schoch
21408e49eb added thai analyzer test 2014-08-14 16:39:37 -04:00
Marty Schoch
599ef6edce added swedish analyzer test 2014-08-14 16:12:48 -04:00
Marty Schoch
64255e3eb9 added russian analyzer test 2014-08-14 16:11:23 -04:00
Marty Schoch
8896de2039 added romanian analyzer test 2014-08-14 16:06:17 -04:00
Marty Schoch
c2937b4b81 added portuguese analyzer test
discrepancies found, logged in #70
failing tests commented out for now
2014-08-14 16:04:29 -04:00
Marty Schoch
81a9d325a2 added norwegian analyzer test 2014-08-14 16:01:03 -04:00
Marty Schoch
a3a97a09d3 added dutch analyzer test 2014-08-14 15:59:39 -04:00
Marty Schoch
6714d5d765 added italian analyzer test
discrepancies found between us and lucene, documented in #69
failing tests commented out for now
2014-08-14 15:56:47 -04:00
Marty Schoch
b9c0477762 added hungarian analyzer test 2014-08-14 15:51:55 -04:00
Marty Schoch
6a9f8e85ae added french analyzer test
many discrepancies noted, opened issue #68 to track this
failing tests commented out for now
2014-08-14 15:48:32 -04:00