bleve

Author	SHA1	Message	Date
Sacheendra Talluri	4b3967a68e	rewrite custom analyzer without using reflect	2015-01-08 00:25:16 +05:30
Sacheendra Talluri	4abf2a638e	adds handling of []string type attributes to custom analyzer	2015-01-08 00:08:20 +05:30
Marty Schoch	0ddfa774ec	clean up logging to use package level *log.Logger; by default, messages go to ioutil.Discard	2014-12-28 12:14:48 -08:00
Silvan Jegen	ef18dfe4cd	Fix typos in comments and strings	2014-12-18 18:43:12 +01:00
Sergey Avseyev	570109a983	Update "code.google.com" import paths https://github.com/couchbase/sync_gateway/issues/492	2014-12-10 01:17:49 +03:00
Silvan Jegen	412049d63c	Remove unneeded import statements	2014-11-29 14:25:24 +01:00
Marty Schoch	fcab645f96	add test to cover kana/ideographic case	2014-11-26 08:42:40 -05:00
Marty Schoch	d452b2a10e	add support for dictionary based compound word filter; partially addresses #115	2014-11-18 15:18:42 -05:00
Marty Schoch	40a8154bab	changed en analyzer to use pure go components; behavior should be similar with unicode segmentation and a porter stemmer	2014-10-21 16:38:58 -04:00
Marty Schoch	c4d1782689	new pure go porter stemmer integrated; renamed original libstemmer porter to "stemmer_porter_classic"; new pure go stemmer is "stemmer_porter"	2014-10-20 16:55:24 -04:00
Marty Schoch	cf3643f292	added pure go tokenizer to do unicode word boundary segmentation	2014-10-17 18:07:48 -04:00
Marty Schoch	dcb90ad176	added benchmark for tokenizing English text	2014-10-17 18:07:01 -04:00
Marty Schoch	febb8d2df1	renamed unicode_word_boundary package to icu; this is in preparation for alternative unicode word boundary impls	2014-10-17 15:15:13 -04:00
Marty Schoch	19d45dfdb6	fix compilation with the latest changes to kagome	2014-10-10 19:59:24 -07:00
Marty Schoch	1dc466a800	modified token filters to avoid creating new token stream; often the result stream was the same length, so can reuse the existing token stream. also, in cases where a new stream was required, set capacity to the length of the input stream. most output streams are at least as long as the input, so this may avoid some subsequent resizing	2014-09-23 18:41:32 -04:00
Marty Schoch	95e6e37e67	added build tag to fix running tests without tag	2014-09-16 11:28:44 -04:00
Marty Schoch	55c0e84665	relocated kagome tokenizer and introduced ja analyzer	2014-09-16 11:21:29 -04:00
Silvan Jegen	29bdc094a9	Use byte positions instead of character positions	2014-09-14 13:19:30 +02:00
Silvan Jegen	a8ec7f7af2	Add tests for the Kagome tokenizer	2014-09-13 17:45:22 +02:00
Silvan Jegen	ebf100c097	Add the Kagome tokenizer for Japanese	2014-09-13 17:45:19 +02:00
Marty Schoch	1a1cf32a86	introducing cjk_bigram filter and cjk analyzer; closes #34	2014-09-11 10:39:05 -04:00
Marty Schoch	cb5ccd2b1d	fix whitespace tokenizer; previously it would fail to split ascii running into ideographic	2014-09-11 10:38:02 -04:00
Marty Schoch	8debf26cb7	changed many components to not have defaults; many of these defaults were arbitrary, and not having defaults lets us more easily flag them for configuration. added a shingle filter; introduced new token type for shingles	2014-09-09 18:15:14 -04:00
Marty Schoch	6b4c86b35a	changed whitespace tokenizer to work better on cjk input; now it will return each cjk character as a separate token. this will pair well with a cjk bigram filter for indexing	2014-09-07 14:11:01 -04:00
Marty Schoch	933d99c576	rename the configurable token map from standard to custom; this makes it consistent with the "custom" analyzer, which operates similarly. also, added it to config.go so it's registered and available for use	2014-09-07 14:09:38 -04:00
Marty Schoch	9e78643bad	icu tokenizer uses brk status to set token type; part of #34	2014-09-07 10:24:02 -04:00
Marty Schoch	377ae090d0	additional golint issues resolved	2014-09-03 18:17:26 -04:00
Marty Schoch	d534b0836b	converted ALL_CAPS constants to CamelCase	2014-09-03 17:48:40 -04:00
Marty Schoch	7a7eb2e94c	add newline between license and package; this avoids cluttering godocs with the license	2014-09-02 10:54:50 -04:00
Marty Schoch	1dcd06e412	add ability to define custom analysis as part of index mapping; now, as part of your index mapping you can create custom analysis components. these custom analysis components are serialized as part of the mapping, and reused as you would expect on subsequent accesses.	2014-09-01 13:55:23 -04:00
Marty Schoch	7bfad18d40	moved byte array converts into the analysis package	2014-08-29 19:23:21 -04:00
Marty Schoch	1161361bea	rename imports from couchbaselabs to blevesearch	2014-08-28 15:38:57 -04:00
Marty Schoch	e8959d03ae	added build tag 'icu' to enable functionality dependent on it	2014-08-25 12:22:01 -04:00
Marty Schoch	21ef6e9878	added build tag for things depending on libstemmer	2014-08-25 12:06:10 -04:00
Marty Schoch	08db2eae42	added alternate build tag 'full' which will be an alias to enable all	2014-08-25 11:40:58 -04:00
Marty Schoch	f37bb77794	added build tag to enable cld2	2014-08-25 11:24:20 -04:00
Marty Schoch	092e30a38e	tried to word the instructions for static and dynamic linking	2014-08-25 10:54:15 -04:00
deoxxa	22b7b3bc24	compile libcld2 statically	2014-08-24 03:44:57 +10:00
Marty Schoch	b48dc87afa	added test case clarifying whitespace tokenizer on empty input	2014-08-19 10:43:52 -04:00
Marty Schoch	5dcd39ade7	added turkish analyzer test	2014-08-14 16:42:41 -04:00
Marty Schoch	21408e49eb	added thai analyzer test	2014-08-14 16:39:37 -04:00
Marty Schoch	599ef6edce	added swedish analyzer test	2014-08-14 16:12:48 -04:00
Marty Schoch	64255e3eb9	added russian analyzer test	2014-08-14 16:11:23 -04:00
Marty Schoch	8896de2039	added romanian analyzer test	2014-08-14 16:06:17 -04:00
Marty Schoch	c2937b4b81	added portuguese analyzer test; discrepancies found, logged in #70; failing tests commented out for now	2014-08-14 16:04:29 -04:00
Marty Schoch	81a9d325a2	added norwegian analyzer test	2014-08-14 16:01:03 -04:00
Marty Schoch	a3a97a09d3	added dutch analyzer test	2014-08-14 15:59:39 -04:00
Marty Schoch	6714d5d765	added italian analyzer test; discrepancies found between us and lucene, documented in #69; failing tests commented out for now	2014-08-14 15:56:47 -04:00
Marty Schoch	b9c0477762	added hungarian analyzer test	2014-08-14 15:51:55 -04:00
Marty Schoch	6a9f8e85ae	added french analyzer test; many discrepancies noted, opened issue #68 to track this; failing tests commented out for now	2014-08-14 15:48:32 -04:00
Marty Schoch	f6f17c7a9e	added finnish analyzer test	2014-08-14 15:27:45 -04:00
Marty Schoch	80d7c4f870	added persian analyzer test	2014-08-14 15:24:42 -04:00
Marty Schoch	2ef7c80c92	added spanish analyzer test	2014-08-14 14:44:46 -04:00
Marty Schoch	4398aab723	added sorani analyzer test	2014-08-14 14:42:36 -04:00
Marty Schoch	b22941ee37	added test for danish analyzer	2014-08-14 14:36:24 -04:00
Marty Schoch	8c9997f1e2	added test for german analyzer	2014-08-14 14:33:30 -04:00
Marty Schoch	6a951b9372	added analyzer test for english	2014-08-14 14:28:24 -04:00
Marty Schoch	c526a38369	major refactor of analysis files, now wired up to registry; ultimately this is to make it more convenient for us to wire up different elements of the analysis pipeline without having to preload everything into memory before we need it. separately, the index layer now has a mechanism for storing internal key/value pairs. this is expected to be used by the top layer to store the mapping, and possibly other pieces of data, but not exposed to the user at the top.	2014-08-13 21:14:47 -04:00
Marty Schoch	3481ec9cef	added hindi stemmer; closes #40	2014-08-11 22:29:47 -04:00
Marty Schoch	c65f7415ff	added hindi normalizer; closes #64	2014-08-11 19:51:47 -04:00
Marty Schoch	cd0e3fd85b	added german normalizer; updated german analyzer to use this normalizer; closes #65	2014-08-11 19:25:37 -04:00
Marty Schoch	a4707ebb4e	configured zero width non joiner char filter, and persian analyzer	2014-08-11 18:57:04 -04:00
Marty Schoch	4ccd69ed45	added arabic normalizer; closes #63	2014-08-11 18:35:35 -04:00
Marty Schoch	73b252f6a6	added persian normalizer; closes #67	2014-08-11 18:15:41 -04:00
Marty Schoch	e21b7f4436	added sorani normalizer and stemmer, now have analyzer; closes #43	2014-08-08 09:38:28 -04:00
Marty Schoch	ef35ea1985	added czech stop word list; closes #36	2014-08-07 22:32:49 -04:00
Marty Schoch	964b87f76e	added rune tokenizer; not used directly right now, but basis for other simple tokenizers	2014-08-07 22:14:26 -04:00
Marty Schoch	0e54fbd8da	added keyword marker filter; updated stemmer filter to not stem tokens marked as keyword; closes #48	2014-08-07 08:13:00 -04:00
Marty Schoch	c19270108c	added ngram and edge ngram token filters; closes #46 and closes #47	2014-08-06 22:11:42 -04:00
Marty Schoch	9a777aaa80	added token truncate filter; closes #49	2014-08-06 20:39:42 -04:00
Marty Schoch	d84187fd24	added apostrophe filter to improve turkish analyzer; closes #27	2014-08-06 08:50:00 -04:00
Marty Schoch	79ab2b9b3d	added unicode normalization filter	2014-08-04 21:59:57 -04:00
Marty Schoch	2c0bf23fac	added elision filter; defined article word maps for french, italian, irish and catalan; defined elision filters for these same languages; updated analyzers for french and italian to use this new filter; irish and catalan still depend on other missing pieces; closes #25	2014-08-03 19:17:35 -04:00
Marty Schoch	0960cab0ae	refactored StopWordsMap into WordMap so it can be reused; the ElisionFilter will need a word list of articles, and the plan is to reuse this	2014-08-03 17:46:35 -04:00
Marty Schoch	00d6f9700b	added support for date range fields and queries; closes #9 and closes #11	2014-08-03 17:19:04 -04:00
Marty Schoch	25540c736a	introduced token type	2014-07-31 13:54:12 -04:00
Marty Schoch	3eb63a887b	improved stop word support and related config: stop words can be loaded from files/bytes, closes #19; stop words loaded for a large list of languages, closes #20; defined language specific analyzers for as much as possible right now, closes #21; opened new issues for some of the remaining gaps	2014-07-30 19:29:52 -04:00
Marty Schoch	2968d3538a	major refactor, apologies for the large commit: removed analyzers (these are now built as needed through config); removed html character filter (now built as needed through config); added missing license header; changed constructor signature of filters that cannot return errors; filter constructors that can have errors now have a Must variant which panics; changed cld2 tokenizer into filter (should only see lower-case input); new top level index api, closes #5; refactored index tests to not rely directly on analyzers; moved query objects to top-level; new top level search api, closes #12; top score collector allows skipping results; index mapping supports _all by default, closes #3 and closes #6; index mapping supports disabled sections, closes #7; new http sub package with reusable http.Handlers, closes #22	2014-07-30 12:30:38 -04:00
Marty Schoch	d7341524aa	trying to fix compilation on drone	2014-07-21 18:00:59 -04:00
Marty Schoch	737dcb6118	fixing c++ issues on drone.io	2014-07-21 17:49:53 -04:00
Marty Schoch	b629636424	new tokenizer which uses cld2 to guess the field's language	2014-07-21 17:21:31 -04:00
Marty Schoch	70a8b03bed	added support for composite fields	2014-07-21 17:05:55 -04:00
Marty Schoch	900b54e240	changed to not use pkg-config, brittle on some platforms	2014-04-18 11:50:14 -04:00
Marty Schoch	9058db20ec	fix commit of old file	2014-04-18 11:09:36 -04:00
Marty Schoch	3d842dfaf2	initial commit	2014-04-17 16:55:53 -04:00

185 Commits