bleve

Author	SHA1	Message	Date
Marty Schoch	2332455bd2	nicer formatting of license header	2016-10-02 10:13:14 -04:00
Marty Schoch	6bf9dd59ab	BREAKING CHANGE - additional package renaming i recently learned that package names should also prefer the singular form, not the plural form	2016-10-01 17:20:59 -04:00
Marty Schoch	f90856b8d3	BREAKING CHANGE - rename upside_down to upsidedown	2016-09-30 12:36:38 -04:00
Marty Schoch	9ec2ddd757	initial refactor of query into separate package	2016-09-29 14:54:16 -04:00
Marty Schoch	fb0f4bbecd	BREAKING CHANGE - new method to create memory only index Previously bleve allowed you to create a memory-only index by simply passing "" as the path argument to the New() method. This was not clear when reading the code, and led to some problematic error cases as well. Now, to create a memory-only index one should use the NewMemOnly() method. Passing "" as the path argument to the New() method will now return os.ErrInvalid. Advanced users calling NewUsing() can create disk-based or memory-only indexes, but the change here is that passing "" as the path argument no longer defaults you into getting a memory-only index. Instead, the KV store is selected manually, just as it is for the disk-based solutions. Here is an example use of the NewUsing() method to create a memory-only index: NewUsing("", indexMapping, Config.DefaultIndexType, Config.DefaultMemKVStore, nil) Config.DefaultMemKVStore is just a new default value added to the configuration; it currently points to gtreap.Name (which could have been used directly instead for more control) closes #427	2016-09-27 14:11:40 -04:00
Marty Schoch	389e18a779	attempt to support google app engine the default configuration, which sets the default kv engine to boltdb is now done in file protected with the !appengine build tag. this at least lets the analysis-wizzard app run locally in the appengine simulator. this still has not been tested on the real appengine, and further changes may be required.	2016-07-29 21:29:05 -04:00
Marty Schoch	bd2a23fb6d	remove firestorm index scheme firestorm was an experiment we learned a lot, but it did not result in a usable index scheme	2016-06-26 07:51:41 -04:00
Marty Schoch	8f8bb91439	simplify date parsing in queries, add date to query string parsing of date ranges in queries no longer consults the index mapping. it was determined that this wasn't very useful and led to overly complicated query syntax/behavior. instead, applications can set the datetime parser used for date range queries with the top-level config QueryDateTimeParser also, we now support querying date ranges in the query string, the syntax is: field:>"date" >,>=,<,<= operators are supported the date must be surrounded by quotes and must parse in the configured date format	2016-04-22 17:12:10 -04:00
Marty Schoch	aa7658bbb0	give indexes names, make stats available via expvar by default	2015-12-06 14:01:03 -05:00
Marty Schoch	699c86073a	make existing integration tests work with firestorm	2015-12-01 12:29:56 -05:00
Marty Schoch	f81b2be334	major refactor of bleve configuration see #221 for full details	2015-09-16 17:10:59 -04:00
Marty Schoch	dbb93b75a4	refactoring to allow pluggable index encodings this lays the foundation for supporting the new firestorm indexing scheme. i'm merging these changes ahead of the rest of the firestorm branch so i can continue to make changes to the analysis pipeline in parallel	2015-09-02 13:12:08 -04:00
Marty Schoch	4840aaaa5a	make analysis queue size changeable	2015-09-02 11:55:30 -04:00
Marty Schoch	e2223f5121	changed HTML highlighter to use html mark tag	2015-07-06 18:00:05 -04:00
Marty Schoch	00e5412e73	moving goleveldb into main config as it has no build tags	2015-04-24 17:21:35 -04:00
Marty Schoch	a9c07acbfa	refactor of kvstore api to support native merge in rocksdb refactor to share code in emulated batch refactor to share code in emulated merge refactor index kvstore benchmarks to share more code refactor index kvstore benchmarks to be more repeatable	2015-04-24 17:13:50 -04:00
Marty Schoch	0f16eccd6b	new tokenizer that allows you to pre-identify tokens with regexp name "exception" configure with list of regexp string "exceptions" these exception regexps match sequences you want treated as a single token. these sequences are NOT sent to the underlying tokenizer configure "tokenizer" is the named tokenizer that should be used for processing all text regions not matching exceptions An example configuration with simple patterns to match URLs and email addresses: map[string]interface{}{ "type": "exception", "tokenizer": "unicode", "exceptions": []interface{}{ `[hH][tT][tT][pP][sS]?://(\S)*`, `[fF][iI][lL][eE]://(\S)*`, `[fF][tT][pP]://(\S)*`, `\S+@\S+`, } }	2015-04-08 15:31:58 -04:00
Marty Schoch	300ec79c96	first pass at checking errors that were ignored part of #169	2015-03-06 14:46:29 -05:00
Marty Schoch	dd1cd189a7	added initial implementation of hindi analyzer closes #66	2015-02-04 15:12:08 -05:00
Steve Yen	12dc2aff93	add go1.4 build tag to cznicb KVStore This is because github.com/cznic/b depends on sync.Pool.	2015-01-15 15:54:25 -08:00
Steve Yen	ea0a8657f3	added cznicb in-memory kvstore (no reader isolation)	2015-01-13 17:35:28 -08:00
Steve Yen	db82eae3f4	go fmt	2015-01-13 11:04:45 -08:00
Steve Yen	603c3af8bb	added gtreap in-memory, copy-on-write KVStore	2015-01-12 11:26:21 -08:00
Marty Schoch	5978f50b8c	added ability to log slow searches closes #88	2014-12-28 19:34:16 -08:00
Marty Schoch	0ddfa774ec	clean up logging to use package level *log.Logger by default messages go to ioutil.Discard	2014-12-28 12:14:48 -08:00
Marty Schoch	d452b2a10e	add support for dictionary based compound word filter partially addresses #115	2014-11-18 15:18:42 -05:00
Marty Schoch	cf3643f292	added pure go tokenizer to do unicode word boundary segmentation	2014-10-17 18:07:48 -04:00
Marty Schoch	97902e2619	text analysis now moved out of index write lock onto goroutine 1. text analysis is now done before the write lock is acquired 2. there is now a pool of analysis workers 3. the size of this pool is configurable 4. this allows for documents in a batch to be analyzed concurrently as a part of benchmarking these changes i've also introduced a new null storage implementation. this should never be used, as it does not actually build an index. it does however let us go through all the normal indexing machinery, without incurring any indexing I/O. this is very helpful in measuring improvements made to the text analysis pipeline, which are often overshadowed by indexing times in benchmarks actually building an index.	2014-09-24 08:13:14 -04:00
Marty Schoch	8c16d68c00	include cjk analyzer in default config	2014-09-11 10:44:14 -04:00
Marty Schoch	8debf26cb7	changed many components to not have defaults many of these defaults were arbitrary, and not having defaults lets us more easily flag them for configuration added a shingle filter introduce new token type for shingles	2014-09-09 18:15:14 -04:00
Marty Schoch	933d99c576	rename the configurable token map from standard to custom this makes it consistent with the "custom" analyzer which operates similarly also, added it to the config.go so it's registered and available for use	2014-09-07 14:09:38 -04:00
Marty Schoch	1dcd06e412	add ability to define custom analysis as part of index mapping now, as part of your index mapping you can create custom analysis components. these custom analysis components are serialized as part of the mapping, and reused as you would expect on subsequent accesses.	2014-09-01 13:55:23 -04:00
Marty Schoch	2ee7289bc8	major refactor of search package this started initially to relocate highlighting into a self contained package, which would then also use the registry however, it turned into a much larger refactor in order to avoid cyclic imports now facets, searchers, scorers and collectors are also broken out into subpackages of search	2014-09-01 11:15:38 -04:00
Marty Schoch	209f808722	improve go docs at the top level part of #79	2014-08-31 10:55:22 -04:00
Marty Schoch	7bfad18d40	moved byte array converts into the analysis package	2014-08-29 19:23:21 -04:00
Marty Schoch	77c998a7a2	made config private and fixed broken test	2014-08-29 15:32:36 -04:00
Marty Schoch	37d3f0205d	cleanup spacing between license and package	2014-08-29 14:18:36 -04:00
Marty Schoch	1161361bea	rename imports from couchbaselabs to blevesearch	2014-08-28 15:38:57 -04:00
Marty Schoch	ef59abe4c9	added build tag 'leveldb' to enable this kv store by default we now use the pure go boltdb kv store it is less tested at this point but appears to work test pass, and moves us closer to the goal of being able to just "go get" bleve	2014-08-25 15:18:24 -04:00
Marty Schoch	e8959d03ae	added build tag 'icu' to enable functionality dependent on it	2014-08-25 12:22:01 -04:00
Marty Schoch	21ef6e9878	added build tag for things depending on libstemmer	2014-08-25 12:06:10 -04:00
Marty Schoch	f37bb77794	added build tag to enable cld2	2014-08-25 11:24:20 -04:00
Marty Schoch	27f001bc14	overhauled top-level New/Open API New is now used to create new indexes Open is used to open existing indexes calls to Open no longer specify a mapping because the mapping is serialized and stored along with the index	2014-08-20 16:58:20 -04:00
Marty Schoch	c526a38369	major refactor of analysis files, now wired up to registry ultimately this is to make it more convenient for us to wire up different elements of the analysis pipeline, without having to preload everything into memory before we need it separately the index layer now has a mechanism for storing internal key/value pairs. this is expected to be used to store the mapping, and possibly other pieces of data by the top layer, but not exposed to the user at the top.	2014-08-13 21:14:47 -04:00
Marty Schoch	3481ec9cef	added hindi stemmer closes #40	2014-08-11 22:29:47 -04:00
Marty Schoch	c65f7415ff	added hindi normalizer closes #64	2014-08-11 19:51:47 -04:00
Marty Schoch	cd0e3fd85b	added german normalizer updated german analyzer to use this normalizer closes #65	2014-08-11 19:25:37 -04:00
Marty Schoch	a4707ebb4e	configured zero width non joiner char filter, and persian analyzer	2014-08-11 18:57:04 -04:00
Marty Schoch	4ccd69ed45	added arabic normalizer closes #63	2014-08-11 18:35:35 -04:00
Marty Schoch	73b252f6a6	added persian normalizer closes #67	2014-08-11 18:15:41 -04:00

63 Commits