65 Commits

Author SHA1 Message Date
Marty Schoch 94b0367e47 switch back to upsidedown as default index before merge to master 2018-01-05 16:53:16 -05:00
Marty Schoch f13b786609 fix up issues to get all bleve unit tests passing for scorch
make scorch default
2017-12-11 15:47:41 -05:00
Marty Schoch 2332455bd2 nicer formatting of license header 2016-10-02 10:13:14 -04:00
Marty Schoch 6bf9dd59ab BREAKING CHANGE - additional package renaming
i recently learned that package names should also prefer the
singular form, not the plural form
2016-10-01 17:20:59 -04:00
Marty Schoch f90856b8d3 BREAKING CHANGE - rename upside_down to upsidedown 2016-09-30 12:36:38 -04:00
Marty Schoch 9ec2ddd757 initial refactor of query into separate package 2016-09-29 14:54:16 -04:00
Marty Schoch fb0f4bbecd BREAKING CHANGE - new method to create memory only index
Previously bleve allowed you to create a memory-only index by
simply passing "" as the path argument to the New() method.

This was not clear when reading the code, and led to some
problematic error cases as well.

Now, to create a memory-only index one should use the
NewMemOnly() method.  Passing "" as the path argument
to the New() method will now return os.ErrInvalid.

Advanced users calling NewUsing() can create disk-based or
memory-only indexes, but the change here is that passing ""
as the path argument no longer defaults you into getting
a memory-only index.  Instead, the KV store is selected
manually, just as it is for the disk-based solutions.

Here is an example use of the NewUsing() method to create
a memory-only index:

NewUsing("", indexMapping, Config.DefaultIndexType,
         Config.DefaultMemKVStore, nil)

Config.DefaultMemKVStore is just a new default value
added to the configuration; it currently points to
gtreap.Name (which could have been used directly
instead for more control).

closes #427
2016-09-27 14:11:40 -04:00
Marty Schoch 389e18a779 attempt to support google app engine
the default configuration, which sets the default kv engine
to boltdb, is now done in a file protected with the !appengine
build tag.  this at least lets the analysis-wizzard app
run locally in the appengine simulator.

this still has not been tested on the real appengine, and further
changes may be required.
2016-07-29 21:29:05 -04:00
Marty Schoch bd2a23fb6d remove firestorm index scheme
firestorm was an experiment
we learned a lot, but it did not result in a usable index scheme
2016-06-26 07:51:41 -04:00
Marty Schoch 8f8bb91439 simplify date parsing in queries, add date to query string
parsing of date ranges in queries no longer consults the
index mapping.  it was determined that this wasn't very useful
and led to overly complicated query syntax/behavior.

instead, applications can set the datetime parser used for
date range queries with the top-level config QueryDateTimeParser

also, we now support querying date ranges in the query string,
the syntax is:

field:>"date"

>,>=,<,<= operators are supported
the date must be surrounded by quotes
and must parse in the configured date format
2016-04-22 17:12:10 -04:00
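The query string syntax described in commit 8f8bb91439 can be illustrated with a small standard-library sketch. It is not bleve's query string grammar: the type `dateClause`, the function `parseDateClause`, and the use of a `time` layout string as a stand-in for the configured date format are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"regexp"
	"time"
)

// dateClause is a hypothetical parsed form of one field:OP"date" clause.
type dateClause struct {
	Field string
	Op    string // one of >, >=, <, <=
	When  time.Time
}

// The two-character operators are listed first so that ">=" is not read as ">".
// The date must be surrounded by double quotes.
var clauseRe = regexp.MustCompile(`^(\w+):(>=|<=|>|<)"([^"]+)"$`)

// parseDateClause is an illustrative parser (not bleve's implementation).
// layout stands in for the configured date format.
func parseDateClause(s, layout string) (dateClause, error) {
	m := clauseRe.FindStringSubmatch(s)
	if m == nil {
		return dateClause{}, fmt.Errorf("not a date range clause: %q", s)
	}
	when, err := time.Parse(layout, m[3])
	if err != nil {
		return dateClause{}, fmt.Errorf("date does not parse in the configured format: %w", err)
	}
	return dateClause{Field: m[1], Op: m[2], When: when}, nil
}

func main() {
	c, err := parseDateClause(`created:>="2016-04-22T17:12:10Z"`, time.RFC3339)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(c.Field, c.Op, c.When.Year())
}
```

An unquoted date such as `created:>2016-04-22` is rejected by this sketch, mirroring the rule that the date must be quoted.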
Marty Schoch aa7658bbb0 give indexes names, make stats available via expvar by default 2015-12-06 14:01:03 -05:00
Marty Schoch 699c86073a make existing integration tests work with firestorm 2015-12-01 12:29:56 -05:00
Marty Schoch f81b2be334 major refactor of bleve configuration
see #221 for full details
2015-09-16 17:10:59 -04:00
Marty Schoch dbb93b75a4 refactoring to allow pluggable index encodings
this lays the foundation for supporting the new firestorm
indexing scheme.  i'm merging these changes ahead of
the rest of the firestorm branch so i can continue
to make changes to the analysis pipeline in parallel
2015-09-02 13:12:08 -04:00
Marty Schoch 4840aaaa5a make analysis queue size changeable 2015-09-02 11:55:30 -04:00
Marty Schoch e2223f5121 changed HTML highlighter to use html mark tag 2015-07-06 18:00:05 -04:00
Marty Schoch 00e5412e73 moving goleveldb into main config as it has no build tags 2015-04-24 17:21:35 -04:00
Marty Schoch a9c07acbfa refactor of kvstore api to support native merge in rocksdb
refactor to share code in emulated batch
refactor to share code in emulated merge
refactor index kvstore benchmarks to share more code
refactor index kvstore benchmarks to be more repeatable
2015-04-24 17:13:50 -04:00
Marty Schoch 0f16eccd6b new tokenizer that allows you to pre-identify tokens with regexp
name "exception"
configure with list of regexp string "exceptions"
these exceptions are regexps that match sequences you want treated
as a single token.  these sequences are NOT sent to the
underlying tokenizer
configure "tokenizer" is the named tokenizer that should be
used for processing all text regions not matching exceptions

An example configuration with simple patterns to match URLs and
email addresses:

map[string]interface{}{
	"type":      "exception",
	"tokenizer": "unicode",
	"exceptions": []interface{}{
		`[hH][tT][tT][pP][sS]?://(\S)*`,
		`[fF][iI][lL][eE]://(\S)*`,
		`[fF][tT][pP]://(\S)*`,
		`\S+@\S+`,
	},
}
2015-04-08 15:31:58 -04:00
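The idea behind the "exception" tokenizer in commit 0f16eccd6b can be sketched with the standard library alone. This is not bleve's implementation: the function name `tokenizeWithExceptions` is hypothetical, and `strings.Fields` is only a stand-in for the configured underlying tokenizer.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// tokenizeWithExceptions is a hypothetical helper (not bleve's code).
// Spans matching any exception pattern are emitted as single tokens and are
// never sent to the underlying tokenizer; all other text regions are passed
// to a stand-in tokenizer (strings.Fields).
// The sketch assumes at least one exception pattern is supplied.
func tokenizeWithExceptions(text string, exceptions []string) []string {
	// Join the patterns into one alternation so matches are found left to right.
	re := regexp.MustCompile("(?:" + strings.Join(exceptions, ")|(?:") + ")")

	var tokens []string
	last := 0
	for _, loc := range re.FindAllStringIndex(text, -1) {
		// Text before the match goes through the underlying tokenizer.
		tokens = append(tokens, strings.Fields(text[last:loc[0]])...)
		// The match itself is kept whole.
		tokens = append(tokens, text[loc[0]:loc[1]])
		last = loc[1]
	}
	return append(tokens, strings.Fields(text[last:])...)
}

func main() {
	exceptions := []string{
		`[hH][tT][tT][pP][sS]?://(\S)*`,
		`\S+@\S+`,
	}
	text := "mail bob@example.com or see http://blevesearch.com now"
	fmt.Printf("%q\n", tokenizeWithExceptions(text, exceptions))
}
```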
Marty Schoch 300ec79c96 first pass at checking errors that were ignored
part of #169
2015-03-06 14:46:29 -05:00
Marty Schoch dd1cd189a7 added initial implementation of hindi analyzer
closes #66
2015-02-04 15:12:08 -05:00
Steve Yen 12dc2aff93 add go1.4 build tag to cznicb KVStore
This is because github.com/cznic/b depends on sync.Pool.
2015-01-15 15:54:25 -08:00
Steve Yen ea0a8657f3 added cznicb in-memory kvstore (no reader isolation) 2015-01-13 17:35:28 -08:00
Steve Yen db82eae3f4 go fmt 2015-01-13 11:04:45 -08:00
Steve Yen 603c3af8bb added gtreap in-memory, copy-on-write KVStore 2015-01-12 11:26:21 -08:00
Marty Schoch 5978f50b8c added ability to log slow searches
closes #88
2014-12-28 19:34:16 -08:00
Marty Schoch 0ddfa774ec clean up logging to use package level *log.Logger
by default messages go to ioutil.Discard
2014-12-28 12:14:48 -08:00
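The logging pattern named in commit 0ddfa774ec, a package-level `*log.Logger` that discards output until the caller opts in, can be sketched as follows. The names `logger`, `SetLog`, `loggingEnabled`, and `enableStderrLogging` are chosen for illustration and are not taken from the commit.

```go
package main

import (
	"io/ioutil"
	"log"
	"os"
)

// logger is package level and silent by default: everything written to it
// goes to ioutil.Discard until a caller replaces it.
var logger = log.New(ioutil.Discard, "search ", log.LstdFlags)

// SetLog swaps in a caller-supplied logger (hypothetical name).
func SetLog(l *log.Logger) { logger = l }

// loggingEnabled reports whether messages currently go anywhere.
func loggingEnabled() bool { return logger.Writer() != ioutil.Discard }

// enableStderrLogging is a convenience wrapper used by the example below.
func enableStderrLogging() {
	SetLog(log.New(os.Stderr, "search ", log.LstdFlags))
}

func main() {
	logger.Printf("dropped: logging is off by default")
	enableStderrLogging()
	logger.Printf("visible: logging enabled=%v", loggingEnabled())
}
```

Defaulting to a discarding writer keeps a library quiet unless the embedding application asks for output.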
Marty Schoch d452b2a10e add support for dictionary based compound word filter
partially addresses #115
2014-11-18 15:18:42 -05:00
Marty Schoch cf3643f292 added pure go tokenizer to do unicode word boundary segmentation 2014-10-17 18:07:48 -04:00
Marty Schoch 97902e2619 text analysis now moved out of index write lock onto goroutine
1. text analysis is now done before the write lock is acquired
2. there is now a pool of analysis workers
3. the size of this pool is configurable
4. this allows for documents in a batch to be analyzed concurrently

as a part of benchmarking these changes i've also introduced a new
null storage implementation.  this should never be used, as it
does not actually build an index.  it does however let us go
through all the normal indexing machinery, without incurring
any indexing I/O.  this is very helpful in measuring improvements
made to the text analysis pipeline, which are often overshadowed
by indexing times in benchmarks actually building an index.
2014-09-24 08:13:14 -04:00
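The four points in commit 97902e2619 describe a common Go pattern: do the expensive analysis on a fixed pool of goroutines, then take the write lock only to store the results. A minimal sketch, assuming a hypothetical `memIndex` type and a trivial lowercase-and-split step in place of real text analysis (none of this is bleve's code):

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// memIndex is a hypothetical in-memory index guarded by a write lock.
type memIndex struct {
	mu    sync.Mutex
	terms map[int][]string
}

func newMemIndex() *memIndex { return &memIndex{terms: make(map[int][]string)} }

// IndexBatch analyzes all documents concurrently on a pool of `workers`
// goroutines, then acquires the write lock only to store the results.
func (idx *memIndex) IndexBatch(docs []string, workers int) {
	if workers < 1 {
		workers = 1 // avoid blocking forever with an empty pool
	}
	results := make([][]string, len(docs))
	jobs := make(chan int)

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range jobs {
				// Stand-in analysis; each worker writes a distinct slot.
				results[i] = strings.Fields(strings.ToLower(docs[i]))
			}
		}()
	}
	for i := range docs {
		jobs <- i
	}
	close(jobs)
	wg.Wait()

	// Analysis is finished before the write lock is acquired.
	idx.mu.Lock()
	defer idx.mu.Unlock()
	base := len(idx.terms)
	for i, t := range results {
		idx.terms[base+i] = t
	}
}

func main() {
	idx := newMemIndex()
	idx.IndexBatch([]string{"Hello World", "Bleve Search"}, 2)
	fmt.Println(len(idx.terms), idx.terms[0], idx.terms[1])
}
```

The pool size is a parameter here, matching the commit's note that it is configurable.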
Marty Schoch 8c16d68c00 include cjk analyzer in default config 2014-09-11 10:44:14 -04:00
Marty Schoch 8debf26cb7 changed many components to not have defaults
many of these defaults were arbitrary, and not having
defaults lets us more easily flag them for configuration
added a shingle filter
introduce new token type for shingles
2014-09-09 18:15:14 -04:00
Marty Schoch 933d99c576 rename the configurable token map from standard to custom
this makes it consistent with the "custom" analyzer
which operates similarly
also, added it to the config.go so it's registered and
available for use
2014-09-07 14:09:38 -04:00
Marty Schoch 1dcd06e412 add ability to define custom analysis as part of index mapping
now, as part of your index mapping you can create custom
analysis components.  these custom analysis components
are serialized as part of the mapping, and reused
as you would expect on subsequent accesses.
2014-09-01 13:55:23 -04:00
Marty Schoch 2ee7289bc8 major refactor of search package
this started initially to relocate highlighting into
a self contained package, which would then also use
the registry
however, it turned into a much larger refactor in
order to avoid cyclic imports
now facets, searchers, scorers and collectors
are also broken out into subpackages of search
2014-09-01 11:15:38 -04:00
Marty Schoch 209f808722 improve go docs at the top level
part of #79
2014-08-31 10:55:22 -04:00
Marty Schoch 7bfad18d40 moved byte array converts into the analysis package 2014-08-29 19:23:21 -04:00
Marty Schoch 77c998a7a2 made config private and fixed broken test 2014-08-29 15:32:36 -04:00
Marty Schoch 37d3f0205d cleanup spacing between license and package 2014-08-29 14:18:36 -04:00
Marty Schoch 1161361bea rename imports from couchbaselabs to blevesearch 2014-08-28 15:38:57 -04:00
Marty Schoch ef59abe4c9 added build tag 'leveldb' to enable this kv store
by default we now use the pure go boltdb kv store
it is less tested at this point but appears to work
tests pass, and this moves us closer to the goal of being
able to just "go get" bleve
2014-08-25 15:18:24 -04:00
Marty Schoch e8959d03ae added build tag 'icu' to enable functionality dependent on it 2014-08-25 12:22:01 -04:00
Marty Schoch 21ef6e9878 added build tag for things depending on libstemmer 2014-08-25 12:06:10 -04:00
Marty Schoch f37bb77794 added build tag to enable cld2 2014-08-25 11:24:20 -04:00
Marty Schoch 27f001bc14 overhauled top-level New/Open API
New is now used to create new indexes
Open is used to open existing indexes
calls to Open no longer specify a mapping because the mapping
is serialized and stored along with the index
2014-08-20 16:58:20 -04:00
Marty Schoch c526a38369 major refactor of analysis files, now wired up to registry
ultimately this is to make it more convenient for us to wire up
different elements of the analysis pipeline, without having to
preload everything into memory before we need it

separately the index layer now has a mechanism for storing
internal key/value pairs.  this is expected to be used to
store the mapping, and possibly other pieces of data by the
top layer, but not exposed to the user at the top.
2014-08-13 21:14:47 -04:00
Marty Schoch 3481ec9cef added hindi stemmer
closes #40
2014-08-11 22:29:47 -04:00
Marty Schoch c65f7415ff added hindi normalizer
closes #64
2014-08-11 19:51:47 -04:00
Marty Schoch cd0e3fd85b added german normalizer
updated german analyzer to use this normalizer
closes #65
2014-08-11 19:25:37 -04:00
Marty Schoch a4707ebb4e configured zero width non joiner char filter, and persian analyzer 2014-08-11 18:57:04 -04:00