
776 Commits

Author SHA1 Message Date
Marty Schoch
80d7c4f870 added persian analyzer test 2014-08-14 15:24:42 -04:00
Marty Schoch
2ef7c80c92 added spanish analyzer test 2014-08-14 14:44:46 -04:00
Marty Schoch
4398aab723 added sorani analyzer test 2014-08-14 14:42:36 -04:00
Marty Schoch
b22941ee37 added test for danish analyzer 2014-08-14 14:36:24 -04:00
Marty Schoch
8c9997f1e2 added test for german analyzer 2014-08-14 14:33:30 -04:00
Marty Schoch
6a951b9372 added analyzer test for english 2014-08-14 14:28:24 -04:00
Marty Schoch
c526a38369 major refactor of analysis files, now wired up to registry
ultimately this is to make it more convenient for us to wire up
different elements of the analysis pipeline, without having to
preload everything into memory before we need it

separately, the index layer now has a mechanism for storing
internal key/value pairs. this is expected to be used by the
top layer to store the mapping, and possibly other pieces of data,
without exposing them to the user at the top.
2014-08-13 21:14:47 -04:00
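The registry commit describes wiring analysis components up by name so they are only built when first needed. A minimal sketch of that idea, assuming a name-to-constructor map with a cache; `Registry`, `Register`, `Get` and the stand-in `Analyzer` type are hypothetical names, not the library's API:

```go
package main

import (
	"fmt"
	"sync"
)

// Analyzer is a hypothetical stand-in for any analysis component.
type Analyzer struct {
	Name string
}

// Constructor builds a component on demand.
type Constructor func() (*Analyzer, error)

// Registry maps names to constructors and caches what has been built,
// so nothing is constructed until it is first requested.
type Registry struct {
	mu           sync.Mutex
	constructors map[string]Constructor
	cache        map[string]*Analyzer
}

func NewRegistry() *Registry {
	return &Registry{
		constructors: make(map[string]Constructor),
		cache:        make(map[string]*Analyzer),
	}
}

// Register records how to build a component without building it.
func (r *Registry) Register(name string, c Constructor) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.constructors[name] = c
}

// Get builds the named component on first use and reuses it afterwards.
// The lock is held during construction to keep the sketch simple.
func (r *Registry) Get(name string) (*Analyzer, error) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if a, ok := r.cache[name]; ok {
		return a, nil
	}
	c, ok := r.constructors[name]
	if !ok {
		return nil, fmt.Errorf("no component registered as %q", name)
	}
	a, err := c()
	if err != nil {
		return nil, err
	}
	r.cache[name] = a
	return a, nil
}

func main() {
	built := 0
	r := NewRegistry()
	r.Register("en", func() (*Analyzer, error) {
		built++
		return &Analyzer{Name: "en"}, nil
	})
	a1, _ := r.Get("en")
	a2, _ := r.Get("en")
	fmt.Println(a1 == a2, built) // true 1
}
```

Registering only stores a function, so the memory cost of a language's stop lists or stemmer is paid only by indexes that actually use it.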
Marty Schoch
3481ec9cef added hindi stemmer
closes #40
2014-08-11 22:29:47 -04:00
Marty Schoch
c65f7415ff added hindi normalizer
closes #64
2014-08-11 19:51:47 -04:00
Marty Schoch
cd0e3fd85b added german normalizer
updated german analyzer to use this normalizer
closes #65
2014-08-11 19:25:37 -04:00
Marty Schoch
a4707ebb4e configured zero width non joiner char filter, and persian analyzer 2014-08-11 18:57:04 -04:00
Marty Schoch
4ccd69ed45 added arabic normalizer
closes #63
2014-08-11 18:35:35 -04:00
Marty Schoch
73b252f6a6 added persian normalizer
closes #67
2014-08-11 18:15:41 -04:00
Marty Schoch
e5d4e6f1e4 refactored index layer to support batch operations
this change was then exposed at the higher levels
also the beer-sample app was upgraded to index in batches of 100
by default.  this yielded an indexing speedup from 27s to 16s.
closes #57
2014-08-11 16:27:18 -04:00
Marty Schoch
cac707b5b7 upgraded beer-search to index in background
this allows the app to be usable while indexing takes place
also prints out indexing performance stats to console
2014-08-11 13:20:32 -04:00
Marty Schoch
42895649de further streamlined the API
introduced concept of byte array converters
right now only wired up to top-level index mapping
allowing the removal of the JSON methods. now at the top level
we default to parsing []byte as JSON; override this if that's not
the behavior you want.

future enhancements will allow use of these byte array converters
to control how byte arrays are handled elsewhere in documents
this would allow for handling binary attachments, etc. in the future

closes #59
2014-08-11 12:47:29 -04:00
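The byte array converter commit makes "parse []byte as JSON" a default that can be overridden. A minimal sketch of that pattern, assuming an interface with a single conversion method; `ByteArrayConverter`, `Convert`, `JSONConverter`, `StringConverter` and `convertDoc` are hypothetical names chosen for illustration:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ByteArrayConverter turns raw bytes into a value the mapping can walk.
// The name and signature are assumptions for illustration.
type ByteArrayConverter interface {
	Convert(in []byte) (interface{}, error)
}

// JSONConverter is the default: treat the bytes as a JSON document.
type JSONConverter struct{}

func (JSONConverter) Convert(in []byte) (interface{}, error) {
	var v interface{}
	err := json.Unmarshal(in, &v)
	return v, err
}

// StringConverter is an alternative for callers whose bytes are plain text.
type StringConverter struct{}

func (StringConverter) Convert(in []byte) (interface{}, error) {
	return string(in), nil
}

// convertDoc uses the override when one is given, otherwise the JSON default.
func convertDoc(data []byte, override ByteArrayConverter) (interface{}, error) {
	var c ByteArrayConverter = JSONConverter{}
	if override != nil {
		c = override
	}
	return c.Convert(data)
}

func main() {
	v, _ := convertDoc([]byte(`{"name":"IPA"}`), nil)
	fmt.Println(v.(map[string]interface{})["name"]) // IPA
	s, _ := convertDoc([]byte(`{"name":"IPA"}`), StringConverter{})
	fmt.Println(s) // {"name":"IPA"}
}
```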
Marty Schoch
7bbaa8ecd5 added support for returning facet results with requests
supports terms, numeric ranges, and date ranges
closes #14
2014-08-11 11:03:29 -04:00
Marty Schoch
e21b7f4436 added sorani normalizer and stemmer, now have analyzer
closes #43
2014-08-08 09:38:28 -04:00
Marty Schoch
ef35ea1985 added czech stop word list
closes #36
2014-08-07 22:32:49 -04:00
Marty Schoch
964b87f76e added rune tokenizer
not used directly right now, but it is the basis for other simple tokenizers
2014-08-07 22:14:26 -04:00
Marty Schoch
a3ac85c0de added prefix search to beer-search example app 2014-08-07 13:46:34 -04:00
Marty Schoch
292af78b9e implemented prefix search
closes #4
2014-08-07 13:45:39 -04:00
Marty Schoch
b16c1d7f79 changed term row encoding
previously we used the format:
't' <utf-8 term> <byte separator> <16-bit field id> <utf-8 docID> <byte separator>

now we have moved the field before the term, resulting in:
't' <16-bit field id> <utf-8 term> <byte separator> <utf-8 docID> <byte separator>

this means that instead of all fields with the same term being grouped together,
all terms within the same field are grouped together

this allows us to enumerate the terms used within a field

this allows us to implement prefix search, and possibly improve numeric range queries
2014-08-07 09:39:04 -04:00
Marty Schoch
0e54fbd8da added keyword marker filter
updated stemmer filter to not stem tokens marked as keyword
closes #48
2014-08-07 08:13:00 -04:00
Marty Schoch
c19270108c added ngram and edge ngram token filters
closes #46 and closes #47
2014-08-06 22:11:42 -04:00
Marty Schoch
9a777aaa80 added token truncate filter
closes #49
2014-08-06 20:39:42 -04:00
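A minimal sketch of a token truncate filter, assuming the limit is measured in runes so multi-byte characters are never split; `truncateToken` is a hypothetical name:

```go
package main

import "fmt"

// truncateToken keeps at most length runes of term. Working on runes rather
// than bytes avoids producing invalid UTF-8 by cutting a character in half.
func truncateToken(term string, length int) string {
	if length < 0 {
		length = 0
	}
	runes := []rune(term)
	if len(runes) <= length {
		return term
	}
	return string(runes[:length])
}

func main() {
	fmt.Println(truncateToken("weißbier", 4)) // weiß
}
```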
Marty Schoch
0441c6bef6 refactored names, removing Term from things that were more general
closes #60
2014-08-06 20:03:41 -04:00
Marty Schoch
f69838d670 added support for boost with ^boostval in syntax query
closes #51
also found/fixed major bug in scoring, closes #61
2014-08-06 19:36:23 -04:00
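The boost commit adds `^boostval` to the query syntax. A minimal sketch of splitting a clause such as `name:ipa^2.5` into the clause and its boost, assuming a default boost of 1.0 when no `^` is present; `splitBoost` is a hypothetical helper, not the query parser itself:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// splitBoost separates "clause^2.5" into the clause and its boost.
// Clauses without a "^" get the neutral boost 1.0 (an assumption).
func splitBoost(s string) (string, float64, error) {
	i := strings.LastIndex(s, "^")
	if i < 0 {
		return s, 1.0, nil
	}
	boost, err := strconv.ParseFloat(s[i+1:], 64)
	if err != nil {
		return "", 0, fmt.Errorf("invalid boost in %q: %v", s, err)
	}
	return s[:i], boost, nil
}

func main() {
	t, b, _ := splitBoost("name:ipa^2.5")
	fmt.Println(t, b) // name:ipa 2.5
}
```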
Marty Schoch
41d4f67ee2 fix storing/retrieving numeric and date fields
also includes new ability to request stored fields be returned with results

closes #55 and closes #56 and closes #58
2014-08-06 13:52:20 -04:00
Marty Schoch
d84187fd24 added apostrophe filter to improve turkish analyzer
closes #27
2014-08-06 08:50:00 -04:00
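The commit does not describe the apostrophe filter itself. Assuming it behaves like Lucene's ApostropheFilter, it drops the first apostrophe and everything after it, which removes Turkish case suffixes attached to proper nouns. A minimal sketch; `apostropheFilter` is a hypothetical name:

```go
package main

import (
	"fmt"
	"strings"
)

// apostrophes covers both the ASCII apostrophe and U+2019.
const apostrophes = "'\u2019"

// apostropheFilter drops the first apostrophe and everything after it, so
// Turkish "Türkiye'de" ("in Turkey") is indexed as "Türkiye".
func apostropheFilter(term string) string {
	if i := strings.IndexAny(term, apostrophes); i >= 0 {
		return term[:i]
	}
	return term
}

func main() {
	fmt.Println(apostropheFilter("Türkiye'de")) // Türkiye
}
```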
Marty Schoch
649a4999a1 fix broken test 2014-08-06 08:42:57 -04:00
Marty Schoch
da26a24031 removed optional element, and fixed took time formatting 2014-08-06 08:25:42 -04:00
Marty Schoch
893efa670e added some more explicit mappings 2014-08-06 08:25:10 -04:00
Marty Schoch
78da6fd65d added support for a default field
this works at the config and index mapping levels
2014-08-06 08:23:29 -04:00
Marty Schoch
79ab2b9b3d added unicode normalization filter 2014-08-04 21:59:57 -04:00
Marty Schoch
2c0bf23fac added elision filter
defined article word maps for french, italian, irish and catalan
defined elision filters for these same languages
updated analyzers for french and italian to use this new filter
irish and catalan still depend on other missing pieces
closes #25
2014-08-03 19:17:35 -04:00
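The elision filter uses a per-language word map of articles. A minimal sketch of the idea, assuming the article is the text before the first apostrophe and is matched case-insensitively; `elide` is a hypothetical name and the French article set is a small illustrative subset, not the list shipped with the library:

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// elide removes a leading article plus the apostrophe that follows it, but
// only when the text before the first apostrophe is in the articles word map.
func elide(term string, articles map[string]bool) string {
	i := strings.IndexAny(term, "'\u2019")
	if i < 0 || !articles[strings.ToLower(term[:i])] {
		return term
	}
	// Skip the apostrophe itself, which is 1 byte (') or 3 bytes (U+2019).
	_, size := utf8.DecodeRuneInString(term[i:])
	return term[i+size:]
}

func main() {
	// A small, illustrative subset of French articles; not a complete list.
	french := map[string]bool{"l": true, "d": true, "j": true, "qu": true}
	fmt.Println(elide("l'avion", french))     // avion
	fmt.Println(elide("aujourd'hui", french)) // aujourd'hui
}
```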
Marty Schoch
0960cab0ae refactored StopWordsMap into WordMap so it can be reused
the ElisionFilter will need a word list of articles and we plan to reuse this
2014-08-03 17:46:35 -04:00
Marty Schoch
1e5cc5c89f fix issues identified by go vet 2014-08-03 17:21:41 -04:00
Marty Schoch
41ee1028c9 added date range query to beer-search sample app 2014-08-03 17:20:00 -04:00
Marty Schoch
00d6f9700b added support for date range fields and queries
closes #9 and closes #11
2014-08-03 17:19:04 -04:00
Marty Schoch
65b2faeaa2 fix go vet 2014-08-02 19:17:53 -04:00
Marty Schoch
6d6819ed50 added range query to beer-sample app 2014-08-02 19:07:33 -04:00
Marty Schoch
78465ca686 added initial support for indexing and querying numeric values
closes #8 and closes #10
2014-08-02 19:05:58 -04:00
Marty Schoch
07eb6311a8 added utility package for encoding numbers as byte terms
this encoding scheme matches the one used by lucene
it has been packaged separately so that others may
more easily reuse it without using the rest of bleve
2014-08-02 19:03:16 -04:00
Marty Schoch
dd36f916c4 set token type 2014-07-31 14:10:27 -04:00
Marty Schoch
25540c736a introduced token type 2014-07-31 13:54:12 -04:00
Marty Schoch
c8918fe41a adding beer-sample to examples 2014-07-31 11:51:27 -04:00
Marty Schoch
4ae9eb895c added method to list fields in the index
also added a corresponding http handler
2014-07-31 11:47:36 -04:00
Marty Schoch
7a174d7d05 updated README
closes #16
2014-07-31 10:58:20 -04:00
Marty Schoch
3eb63a887b improved stop word support and related config
stop words can be loaded from files/bytes, closes #19
stop words loaded for large list of languages, closes #20
defined language specific analyzers for as much as possible right now, closes #21
opened new issues for some of the remaining gaps
2014-07-30 19:29:52 -04:00