Marty Schoch
1161361bea
rename imports from couchbaselabs to blevesearch
2014-08-28 15:38:57 -04:00
Marty Schoch
e8959d03ae
added build tag 'icu' to enable functionality dependent on it
2014-08-25 12:22:01 -04:00
Marty Schoch
21ef6e9878
added build tag for things depending on libstemmer
2014-08-25 12:06:10 -04:00
Marty Schoch
08db2eae42
added alternate build tag 'full' which will be an alias to enable all
2014-08-25 11:40:58 -04:00
Marty Schoch
f37bb77794
added build tag to enable cld2
2014-08-25 11:24:20 -04:00
Marty Schoch
092e30a38e
tried to word the instructions for static and dynamic linking
2014-08-25 10:54:15 -04:00
deoxxa
22b7b3bc24
compile libcld2 statically
2014-08-24 03:44:57 +10:00
Marty Schoch
b48dc87afa
added test case clarifying whitespace tokenizer on empty input
2014-08-19 10:43:52 -04:00
Marty Schoch
5dcd39ade7
added turkish analyzer test
2014-08-14 16:42:41 -04:00
Marty Schoch
21408e49eb
added thai analyzer test
2014-08-14 16:39:37 -04:00
Marty Schoch
599ef6edce
added swedish analyzer test
2014-08-14 16:12:48 -04:00
Marty Schoch
64255e3eb9
added russian analyzer test
2014-08-14 16:11:23 -04:00
Marty Schoch
8896de2039
added romanian analyzer test
2014-08-14 16:06:17 -04:00
Marty Schoch
c2937b4b81
added portuguese analyzer test
...
discrepancies found, logged in #70
failing tests commented out for now
2014-08-14 16:04:29 -04:00
Marty Schoch
81a9d325a2
added norwegian analyzer test
2014-08-14 16:01:03 -04:00
Marty Schoch
a3a97a09d3
added dutch analyzer test
2014-08-14 15:59:39 -04:00
Marty Schoch
6714d5d765
added italian analyzer test
...
discrepancies found between us and lucene, documented in #69
failing tests commented out for now
2014-08-14 15:56:47 -04:00
Marty Schoch
b9c0477762
added hungarian analyzer test
2014-08-14 15:51:55 -04:00
Marty Schoch
6a9f8e85ae
added french analyzer test
...
many discrepancies noted, opened issue #68 to track this
failing tests commented out for now
2014-08-14 15:48:32 -04:00
Marty Schoch
f6f17c7a9e
added finnish analyzer test
2014-08-14 15:27:45 -04:00
Marty Schoch
80d7c4f870
added persian analyzer test
2014-08-14 15:24:42 -04:00
Marty Schoch
2ef7c80c92
added spanish analyzer test
2014-08-14 14:44:46 -04:00
Marty Schoch
4398aab723
added sorani analyzer test
2014-08-14 14:42:36 -04:00
Marty Schoch
b22941ee37
added test for danish analyzer
2014-08-14 14:36:24 -04:00
Marty Schoch
8c9997f1e2
added test for german analyzer
2014-08-14 14:33:30 -04:00
Marty Schoch
6a951b9372
added analyzer test for english
2014-08-14 14:28:24 -04:00
Marty Schoch
c526a38369
major refactor of analysis files, now wired up to registry
...
ultimately this is to make it more convenient for us to wire up
different elements of the analysis pipeline, without having to
preload everything into memory before we need it
separately the index layer now has a mechanism for storing
internal key/value pairs. this is expected to be used to
store the mapping, and possibly other pieces of data by the
top layer, but not exposed to the user at the top.
2014-08-13 21:14:47 -04:00
Marty Schoch
3481ec9cef
added hindi stemmer
...
closes #40
2014-08-11 22:29:47 -04:00
Marty Schoch
c65f7415ff
added hindi normalizer
...
closes #64
2014-08-11 19:51:47 -04:00
Marty Schoch
cd0e3fd85b
added german normalizer
...
updated german analyzer to use this normalizer
closes #65
2014-08-11 19:25:37 -04:00
Marty Schoch
a4707ebb4e
configured zero width non joiner char filter, and persian analyzer
2014-08-11 18:57:04 -04:00
Marty Schoch
4ccd69ed45
added arabic normalizer
...
closes #63
2014-08-11 18:35:35 -04:00
Marty Schoch
73b252f6a6
added persian normalizer
...
closes #67
2014-08-11 18:15:41 -04:00
Marty Schoch
e21b7f4436
added sorani normalizer and stemmer, now have analyzer
...
closes #43
2014-08-08 09:38:28 -04:00
Marty Schoch
ef35ea1985
added czech stop word list
...
closes #36
2014-08-07 22:32:49 -04:00
Marty Schoch
964b87f76e
added rune tokenizer
...
not used directly right now, but basis for other simple tokenizers
2014-08-07 22:14:26 -04:00
Marty Schoch
0e54fbd8da
added keyword marker filter
...
updated stemmer filter to not stem tokens marked as keyword
closes #48
2014-08-07 08:13:00 -04:00
Marty Schoch
c19270108c
added ngram and edge ngram token filters
...
closes #46 and closes #47
2014-08-06 22:11:42 -04:00
Marty Schoch
9a777aaa80
added token truncate filter
...
closes #49
2014-08-06 20:39:42 -04:00
Marty Schoch
d84187fd24
added apostrophe filter to improve turkish analyzer
...
closes #27
2014-08-06 08:50:00 -04:00
Marty Schoch
79ab2b9b3d
added unicode normalization filter
2014-08-04 21:59:57 -04:00
Marty Schoch
2c0bf23fac
added elision filter
...
defined article word maps for french, italian, irish and catalan
defined elision filters for these same languages
updated analyzers for french and italian to use this new filter
irish and catalan still depend on other missing pieces
closes #25
2014-08-03 19:17:35 -04:00
Marty Schoch
0960cab0ae
refactored StopWordsMap into WordMap so it can be reused
...
the ElisionFilter will need a word list of articles and plan to reuse this
2014-08-03 17:46:35 -04:00
Marty Schoch
00d6f9700b
added support for date range fields and queries
...
closes #9 and closes #11
2014-08-03 17:19:04 -04:00
Marty Schoch
25540c736a
introduced token type
2014-07-31 13:54:12 -04:00
Marty Schoch
3eb63a887b
improved stop word support and related config
...
stop words can be loaded from files/bytes, closes #19
stop words loaded for large list of languages, closes #20
defined language specific analyzers for as much as possible right now, closes #21
opened new issues for some of the remaining gaps
2014-07-30 19:29:52 -04:00
Marty Schoch
2968d3538a
major refactor, apologies for the large commit
...
removed analyzers (these are now built as needed through config)
removed html character filter (now built as needed through config)
added missing license header
changed constructor signature of filters that cannot return errors
filter constructors that can have errors, now have Must variant which panics
change cld2 tokenizer into filter (should only see lower-case input)
new top level index api, closes #5
refactored index tests to not rely directly on analyzers
moved query objects to top-level
new top level search api, closes #12
top score collector allows skipping results
index mapping supports _all by default, closes #3 and closes #6
index mapping supports disabled sections, closes #7
new http sub package with reusable http.Handler's, closes #22
2014-07-30 12:30:38 -04:00
Marty Schoch
d7341524aa
trying to fix compilation on drone
2014-07-21 18:00:59 -04:00
Marty Schoch
737dcb6118
fixing c++ issues on drone.io
2014-07-21 17:49:53 -04:00
Marty Schoch
b629636424
new tokenizer which uses cld2 to guess the field's language
2014-07-21 17:21:31 -04:00
Marty Schoch
70a8b03bed
added support for composite fields
2014-07-21 17:05:55 -04:00
Marty Schoch
900b54e240
changed to not use pkg-config, brittle on some platforms
2014-04-18 11:50:14 -04:00
Marty Schoch
9058db20ec
fix commit of old file
2014-04-18 11:09:36 -04:00
Marty Schoch
3d842dfaf2
initial commit
2014-04-17 16:55:53 -04:00