Marty Schoch
2a703376ea
fix ineffectual assignments
2016-04-02 22:42:56 -04:00
Marty Schoch
7892882519
fix typos
2016-04-02 21:59:30 -04:00
Marty Schoch
194ee82c80
gofmt simplifications
2016-04-02 21:54:33 -04:00
Ben Campbell
4fafb2be3f
Merge branch 'master' into documenting
2016-03-23 10:48:09 +13:00
Marty Schoch
cecdfcbc69
moving japanese analyzer to blevex package
2016-03-13 18:05:05 -04:00
ikawaha
fcebff60e9
Add a test case
2016-02-21 19:59:52 +09:00
ikawaha
4fe7688431
Use a small version of kagome
2016-02-21 19:58:36 +09:00
Ben Campbell
994f4b4d11
added some godoc documentation for the en analyzer
2015-11-18 15:28:57 +13:00
Marty Schoch
c3a4fab911
Merge pull request #238 from ikawaha/ja-morph-analyzer
...
fix compilation with the latest changes to kagome
2015-09-28 17:05:46 -04:00
ikawaha
89af7978a9
fix compilation with the latest changes to kagome
2015-09-28 15:53:08 +09:00
Marty Schoch
f81b2be334
major refactor of bleve configuration
...
see #221 for full details
2015-09-16 17:10:59 -04:00
Marty Schoch
1f4ef3da8b
move elision filter after lowercase filter
...
this affects all languages using the elision filter
languages fr and it are updated now
languages ca and ga are still missing other components and
do not yet have an analyzer, but they should follow this lead
once they are ready
fixes #218
2015-07-21 10:43:53 -04:00
Marty Schoch
65556f45c7
added additional tests for bug #214
2015-07-06 18:00:05 -04:00
Marty Schoch
50bd082257
fixed issues with portuguese analyzer
...
fixes #70
2015-03-11 14:22:11 -04:00
Marty Schoch
7970f42c29
fix issues with italian analyzer
...
switch it to not require icu/libstemmer
fixes #69
2015-03-11 11:48:13 -04:00
Marty Schoch
eeaf514848
switch fr to not require icu/libstemmer
...
also corrected copy/paste bug in test
2015-03-11 11:46:33 -04:00
Marty Schoch
8ae30fb6f0
fix issues with lucene stemmer
...
fixes issue #68
2015-03-11 11:14:29 -04:00
Salmān Aljammāz
9444af9366
arabic: add unicode normalization to analyzer
2015-02-06 19:50:58 +03:00
Salmān Aljammāz
91a8d5da9f
arabic: check minimum length before stemming
...
This involves converting tokens to a rune slice in the filter, but
at least we're now compatible with Lucene's stemmer.
2015-02-06 19:50:58 +03:00
Salmān Aljammāz
0470f93955
arabic: add more stemmer tests
...
These came from org.apache.lucene.analysis.ar.
2015-02-06 19:49:30 +03:00
Salmān Aljammāz
e461fed92a
arabic stemmer: strip multiple suffixes
...
updates #150
2015-02-05 16:07:58 +03:00
Marty Schoch
4be974f489
added first implementation of arabic analyzer
...
one test case is not passing and is commented out temporarily
updates #150
2015-02-05 07:44:55 -05:00
Marty Schoch
b9c22fe50d
Merge pull request #154 from saljam/arabic
...
add arabic light stemmer
2015-02-05 07:09:54 -05:00
Salmān Aljammāz
945ef8158f
add arabic light stemmer
...
fixes #28
updates #150
2015-02-05 13:24:30 +03:00
Marty Schoch
dd1cd189a7
added initial implementation of hindi analyzer
...
closes #66
2015-02-04 15:12:08 -05:00
Marty Schoch
40a8154bab
changed en analyzer to use pure go components
...
behavior should be similar with unicode segmentation
and a porter stemmer
2014-10-21 16:38:58 -04:00
Marty Schoch
c4d1782689
new pure go porter stemmer integrated
...
renamed original libstemmer porter to "stemmer_porter_classic"
new pure go stemmer is "stemmer_porter"
2014-10-20 16:55:24 -04:00
Marty Schoch
febb8d2df1
renamed unicode_word_boundary package to icu
...
this is in preparation of alternative unicode word boundary impls
2014-10-17 15:15:13 -04:00
Marty Schoch
19d45dfdb6
fix compilation with the latest changes to kagome
2014-10-10 19:59:24 -07:00
Marty Schoch
1dc466a800
modified token filters to avoid creating new token stream
...
often the result stream was the same length, so can reuse the
existing token stream
also, in cases where a new stream was required, set capacity to
the length of the input stream. most output streams are at least
as long as the input, so this may avoid some subsequent resizing
2014-09-23 18:41:32 -04:00
Marty Schoch
95e6e37e67
added build tag to fix running tests without tag
2014-09-16 11:28:44 -04:00
Marty Schoch
55c0e84665
relocated kagome tokenizer and introduced ja analyzer
2014-09-16 11:21:29 -04:00
Marty Schoch
1a1cf32a86
introducing cjk_bigram filter and cjk analyzer
...
closes #34
2014-09-11 10:39:05 -04:00
Marty Schoch
d534b0836b
converted ALL_CAPS constants to CamelCase
2014-09-03 17:48:40 -04:00
Marty Schoch
7a7eb2e94c
add newline between license and package
...
this avoids cluttering godocs with the license
2014-09-02 10:54:50 -04:00
Marty Schoch
1161361bea
rename imports from couchbaselabs to blevesearch
2014-08-28 15:38:57 -04:00
Marty Schoch
e8959d03ae
added build tag 'icu' to enable functionality dependent on it
2014-08-25 12:22:01 -04:00
Marty Schoch
21ef6e9878
added build tag for things depending on libstemmer
2014-08-25 12:06:10 -04:00
Marty Schoch
5dcd39ade7
added turkish analyzer test
2014-08-14 16:42:41 -04:00
Marty Schoch
21408e49eb
added thai analyzer test
2014-08-14 16:39:37 -04:00
Marty Schoch
599ef6edce
added swedish analyzer test
2014-08-14 16:12:48 -04:00
Marty Schoch
64255e3eb9
added russian analyzer test
2014-08-14 16:11:23 -04:00
Marty Schoch
8896de2039
added romanian analyzer test
2014-08-14 16:06:17 -04:00
Marty Schoch
c2937b4b81
added portuguese analyzer test
...
discrepancies found, logged in #70
failing tests commented out for now
2014-08-14 16:04:29 -04:00
Marty Schoch
81a9d325a2
added norwegian analyzer test
2014-08-14 16:01:03 -04:00
Marty Schoch
a3a97a09d3
added dutch analyzer test
2014-08-14 15:59:39 -04:00
Marty Schoch
6714d5d765
added italian analyzer test
...
discrepancies found between us and lucene, documented in #69
failing tests commented out for now
2014-08-14 15:56:47 -04:00
Marty Schoch
b9c0477762
added hungarian analyzer test
2014-08-14 15:51:55 -04:00
Marty Schoch
6a9f8e85ae
added french analyzer test
...
many discrepancies noted, opened issue #68 to track this
failing tests commented out for now
2014-08-14 15:48:32 -04:00
Marty Schoch
f6f17c7a9e
added finnish analyzer test
2014-08-14 15:27:45 -04:00