29 Commits

Author SHA1 Message Date
Marty Schoch 8063132766 fix new issues found by go vet when using stdlib context pkg 2018-02-27 11:57:21 -08:00
Marty Schoch c74e08f039 BREAKING API CHANGE - use stdlib context pkg
update all references to context to use std lib pkg
2018-02-27 11:33:43 -08:00
Steve Yen dc2b6cd656 simplified MultiSearch requires that indexes honor context deadlines
MultiSearch previously had its own timeout checking.  This commit
removes that timeout checking, so that now MultiSearch instead depends
upon the bleve.Index implementations to perform their own context
deadline/timeout checking.

Because deadline/timeout checking is now handled by the bleve.Index
implementations, this change allows applications to provide richer
error and status results during timeouts.
2016-11-03 16:44:20 -07:00
Marty Schoch 2332455bd2 nicer formatting of license header 2016-10-02 10:13:14 -04:00
Marty Schoch c487f29a46 BREAKING CHANGE - rename numeric_util to numeric 2016-09-30 12:36:43 -04:00
Marty Schoch 79cc39a67e refactor mapping to interface and move into separate package
the index mapping contains some relatively messy logic
and the top-level bleve package only cares about a relatively
small portion of this
the motivation for this change is to codify the part that the
top-level bleve package cares about into an interface
then move all the details into its own package

NOTE: the top-level bleve package still has hard dependency on
the actual implementation (for now) because it must deserialize
mappings from JSON and simply assumes it is this one instance.
this is seen as OK for now, and this issue could be revisited
in a future change.  moving the logic into a separate package
is seen as a simplification of top-level bleve, even though
we still depend on the one particular implementation.
2016-09-29 14:53:18 -04:00
Marty Schoch 3fd2a64872 BREAKING CHANGE - removed DumpXXX() methods from bleve.Index
The DumpXXX() methods were always documented as internal and
unsupported.  However, now they are being removed from the
public top-level API.  They are still available on the internal
IndexReader, which can be accessed using the Advanced() method.

The DocCount() and DumpXXX() methods on the internal index
have moved to the internal index reader, since they logically
operate on a snapshot of an index.
2016-09-13 12:40:01 -04:00
Marty Schoch 60750c1614 improved implementation to address perf regressions
primary change is going back to sort values being []string
and not []interface{}; this avoids allocations when converting
into the interface{}

that sounds obvious, so why didn't we just do that first?
because a common (default) sort is score, which is naturally
a number, not a string (like terms).  converting into the
number was also expensive, and the common case.

so, this solution also makes the change to NOT put the score
into the sort value list.  instead you see the dummy value
"_score".  this is just a placeholder, the actual sort impl
knows that field of the sort is the score, and will sort
using the actual score.

also, several other aspects of the benchmark were cleaned up
so that unnecessary allocations do not pollute the cpu profiles

Here are the updated benchmarks:

$ go test -run=xxx -bench=. -benchmem -cpuprofile=cpu.out
BenchmarkTop10of100000Scores-4     	    3000	    465809 ns/op	    2548 B/op	      33 allocs/op
BenchmarkTop100of100000Scores-4    	    2000	    626488 ns/op	   21484 B/op	     213 allocs/op
BenchmarkTop10of1000000Scores-4    	     300	   5107658 ns/op	    2560 B/op	      33 allocs/op
BenchmarkTop100of1000000Scores-4   	     300	   5275403 ns/op	   21624 B/op	     213 allocs/op
PASS
ok  	github.com/blevesearch/bleve/search/collectors	7.188s

Prior to this PR, master reported:

$ go test -run=xxx -bench=. -benchmem
BenchmarkTop10of100000Scores-4          3000        453269 ns/op      360161 B/op         42 allocs/op
BenchmarkTop100of100000Scores-4         2000        519131 ns/op      388275 B/op        219 allocs/op
BenchmarkTop10of1000000Scores-4          200       7459004 ns/op     4628236 B/op         52 allocs/op
BenchmarkTop100of1000000Scores-4         200       8064864 ns/op     4656596 B/op        232 allocs/op
PASS
ok      github.com/blevesearch/bleve/search/collectors  7.385s

So, we're pretty close on the smaller datasets, and we scale better on the larger datasets.
We also show fewer allocations and bytes in all cases (some of this is artificial due to test cleanup).
2016-08-25 15:47:07 -04:00
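The sort-value scheme described in 60750c1614 can be illustrated with a minimal sketch. This is not the bleve collector code: `docMatch`, `sortMatches`, and the descending order for scores are assumptions made for illustration. It only shows the idea of keeping sort values as `[]string` while the `"_score"` entry acts as a placeholder that the comparison resolves against the numeric score.

```go
package main

import "sort"

// docMatch is a hypothetical stand-in for a search hit.
type docMatch struct {
	ID    string
	Score float64
	// Sort holds one string per sort field. For the score field it
	// holds only the placeholder "_score"; the number stays in Score.
	Sort []string
}

// sortMatches orders ms by the given sort fields, left to right.
// A field named "_score" is compared using the numeric Score
// (descending here, which is an assumption); every other field is
// compared as a plain string taken from Sort.
// It assumes every match carries one Sort value per field.
func sortMatches(ms []docMatch, fields []string) {
	sort.SliceStable(ms, func(i, j int) bool {
		a, b := ms[i], ms[j]
		for k, f := range fields {
			if f == "_score" {
				if a.Score != b.Score {
					return a.Score > b.Score
				}
				continue
			}
			if a.Sort[k] != b.Sort[k] {
				return a.Sort[k] < b.Sort[k]
			}
		}
		return false
	})
}
```

Because the score is never converted to a string or boxed into an interface value, the common sort-by-score case needs no per-hit conversion in this sketch.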
Marty Schoch ce0b299d6f switch sort impl to use interface
this improves perf in the case where we're not doing any sorting
as we avoid allocating memory and converting scores into
numeric terms
2016-08-24 19:02:22 -04:00
Marty Schoch 0322ecd441 adjust new sort functionality to also work with MultiSearch 2016-08-24 14:07:10 -04:00
Marty Schoch 2a703376ea fix ineffectual assignments 2016-04-02 22:42:56 -04:00
Marty Schoch 194ee82c80 gofmt simplifications 2016-04-02 21:54:33 -04:00
Marty Schoch d7292ed891 add support for gathering stats via map for easier consumption 2016-03-07 18:37:46 -05:00
Marty Schoch 0b2380d9bf introduce ability for searches to timeout or be cancelled
our implementation uses: golang.org/x/net/context

New method SearchInContext() allows the user to run a search
in the provided context.  If that context is cancelled or
exceeds its deadline Bleve will attempt to stop and return
as soon as possible.  This is a *best effort* attempt at this
time and may *not* be in a timely manner.  If the caller must
return very near the timeout, the call should also be wrapped
in a goroutine.

The IndexAlias implementation is affected in a slightly more
complex way.  In order to return partial results when a timeout
occurs on some indexes, the timeout is strictly enforced, and
at the moment this does introduce an additional goroutine.

The Bleve implementation honoring the context is currently
very coarse-grained.  Specifically we check the Done() channel
between each DocumentMatch produced during the search.  In the
future we will propagate the context deeper into the internals
of Bleve, and this will allow finer-grained timeout behavior.
2016-03-02 17:30:21 -05:00
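The behavior described in 0b2380d9bf can be sketched with the standard library alone. The names `hit`, `collectHits`, `slowSource`, and `searchWithin` are hypothetical and are not part of bleve. The sketch shows a coarse-grained check of the `Done()` channel between hits, and a caller that wraps the search in a goroutine so that it returns near the deadline even when a single hit is slow to produce.

```go
package main

import (
	"context"
	"time"
)

// hit is a hypothetical stand-in for one search match.
type hit struct{ ID int }

// collectHits drains next until it is exhausted, checking ctx.Done()
// between hits. A slow call to next is not interrupted, so this is
// best effort only.
func collectHits(ctx context.Context, next func() (hit, bool)) ([]hit, error) {
	var hits []hit
	for {
		select {
		case <-ctx.Done():
			return hits, ctx.Err() // partial hits plus the reason we stopped
		default:
		}
		h, ok := next()
		if !ok {
			return hits, nil
		}
		hits = append(hits, h)
	}
}

// slowSource yields n hits, sleeping for delay before each one.
func slowSource(n int, delay time.Duration) func() (hit, bool) {
	i := 0
	return func() (hit, bool) {
		if i >= n {
			return hit{}, false
		}
		time.Sleep(delay)
		i++
		return hit{ID: i}, true
	}
}

// searchWithin runs collectHits in a goroutine so the caller returns
// as soon as the deadline passes, whatever the search is doing.
func searchWithin(timeout time.Duration, next func() (hit, bool)) ([]hit, error) {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	type outcome struct {
		hits []hit
		err  error
	}
	done := make(chan outcome, 1) // buffered: the goroutine never blocks on send
	go func() {
		hits, err := collectHits(ctx, next)
		done <- outcome{hits, err}
	}()

	select {
	case o := <-done:
		return o.hits, o.err
	case <-ctx.Done():
		return nil, ctx.Err()
	}
}
```

A call such as `searchWithin(50*time.Millisecond, slowSource(10, time.Second))` would return a deadline error after roughly 50 ms, while the abandoned goroutine finishes in the background and its result is discarded.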
Marty Schoch 496fd365fd fix broken test expectations 2016-02-23 13:05:16 -05:00
Marty Schoch 214b67ad66 SearchResult now includes a Status section
the Status section can report on the number of total/fail/success
indexes when querying across multiple indexes through IndexAlias

Further, searching an IndexAlias will now return partial results,
the burden is on the caller to check the number of failed
indexes and decide how to handle this situation.
2016-02-22 16:50:40 -05:00
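The partial-results contract described in 214b67ad66 can be sketched as follows. The types `searchStatus` and `partialResult` and the helpers `okIndex`, `failingIndex`, `searchAll`, and `checkStatus` are hypothetical and only mirror the total/fail/success counts mentioned in the message; they are not bleve's definitions.

```go
package main

import "fmt"

// searchStatus mirrors the total/failed/successful counts.
type searchStatus struct {
	Total, Failed, Successful int
	Errors                    map[string]error // keyed by index name
}

// partialResult carries whatever hits were gathered plus the status.
type partialResult struct {
	Status searchStatus
	Hits   []string
}

// okIndex returns a fake index search that always succeeds.
func okIndex(hits ...string) func() ([]string, error) {
	return func() ([]string, error) { return hits, nil }
}

// failingIndex returns a fake index search that always fails.
func failingIndex(msg string) func() ([]string, error) {
	return func() ([]string, error) { return nil, fmt.Errorf("index error: %s", msg) }
}

// searchAll queries every index and keeps going when one fails,
// recording the failure in the status instead of aborting.
func searchAll(indexes map[string]func() ([]string, error)) partialResult {
	res := partialResult{Status: searchStatus{
		Total:  len(indexes),
		Errors: map[string]error{},
	}}
	for name, search := range indexes {
		hits, err := search()
		if err != nil {
			res.Status.Failed++
			res.Status.Errors[name] = err
			continue
		}
		res.Status.Successful++
		res.Hits = append(res.Hits, hits...)
	}
	return res
}

// checkStatus is the caller-side decision: here any failed index is
// treated as an error, but a caller could equally accept partial hits.
func checkStatus(res partialResult) error {
	if res.Status.Failed > 0 {
		return fmt.Errorf("%d of %d indexes failed", res.Status.Failed, res.Status.Total)
	}
	return nil
}
```

The point of the design is that a failed index no longer hides the hits from the healthy ones; the caller inspects the status and picks its own policy.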
Marty Schoch 84ec206fec add some tests for index names in results 2015-12-08 14:38:46 -05:00
Marty Schoch aa7658bbb0 give indexes names, make stats available via expvar by default 2015-12-06 14:01:03 -05:00
Marty Schoch 93e01a803e fix issues identified by errcheck
part of #169
2015-04-07 14:52:00 -04:00
Marty Schoch 522f9d5cc7 significant change to index format, support dictionary rows
this introduces disk format v4
now the summary rows for a term are stored in their own
"dictionary row" format, previously the same information
was stored in special term frequency rows
this now allows us to easily iterate all the terms for a field
in sorted order (useful for many other fuzzy data structures)

at the top-level of bleve you can now browse terms within a field
using the following api on the Index interface:

  FieldDict(field string) (index.FieldDict, error)
  FieldDictRange(field string, startTerm []byte, endTerm []byte) (index.FieldDict, error)
  FieldDictPrefix(field string, termPrefix []byte) (index.FieldDict, error)

fixes #127
2015-03-10 16:22:19 -04:00
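The three browsing modes listed in 522f9d5cc7 (all terms, a term range, a term prefix) follow naturally once a field's terms are stored in sorted order. The sketch below uses an in-memory sorted slice rather than dictionary rows in a key/value store; `termDict` and its methods are hypothetical, and treating the end of the range as inclusive is an assumption.

```go
package main

import (
	"sort"
	"strings"
)

// termDict holds the terms of one field in sorted order.
type termDict struct{ terms []string }

// newTermDict copies and sorts the given terms.
func newTermDict(terms ...string) termDict {
	sorted := append([]string(nil), terms...)
	sort.Strings(sorted)
	return termDict{terms: sorted}
}

// allTerms returns every term in sorted order.
func (d termDict) allTerms() []string { return d.terms }

// rangeTerms returns the terms t with start <= t <= end.
func (d termDict) rangeTerms(start, end string) []string {
	lo := sort.SearchStrings(d.terms, start)
	hi := sort.Search(len(d.terms), func(i int) bool { return d.terms[i] > end })
	if lo >= hi {
		return nil
	}
	return d.terms[lo:hi]
}

// prefixTerms returns the terms that begin with prefix. Sorted order
// guarantees they form one contiguous run starting at the seek point.
func (d termDict) prefixTerms(prefix string) []string {
	lo := sort.SearchStrings(d.terms, prefix)
	hi := lo
	for hi < len(d.terms) && strings.HasPrefix(d.terms[hi], prefix) {
		hi++
	}
	return d.terms[lo:hi]
}
```

Both lookups seek to a starting position with a binary search and then read forward, which is the access pattern a sorted key/value store also supports.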
Marty Schoch af356acff0 changed batch behavior
now created through the index itself
mapping problems reported early at the time
data is added to the batch, previously these
were not reported until the batch was executed
2015-03-09 08:20:39 -04:00
Marty Schoch 0771f813ce SearchResult Took field now returns full time in Search()
likewise, MultiSearch used by aliases spanning multiple indexes
will also return full time in MultiSearch()
closes #163
2015-02-19 12:11:40 +05:30
Marty Schoch daeaa2c129 fix bad math in multi search, and return original request in res
related to #164
2015-02-18 17:24:22 +05:30
Marty Schoch 68712cd142 support for accessing the underlying index/store impls
now you can access the underlying index/store implementations
using the Advanced() method.  this is intended for advanced
usage only, and can lead to problems if misused.

also, there is a new method NewUsing(...) which allows callers
of the top-level API to choose which underlying k/v store they
want to use.
2014-12-27 13:23:46 -08:00
Marty Schoch 0355525d93 added another set of tests for IndexAlias with single Index 2014-11-25 14:56:42 -05:00
Marty Schoch 5fa93c8540 added index alias tests for multiple aliases 2014-11-25 14:25:56 -05:00
Marty Schoch b3841fa335 added more tests for MultiSearch 2014-11-25 13:50:15 -05:00
Marty Schoch 3c886276ed fix error message typo 2014-11-24 17:14:44 -05:00
Marty Schoch 69d69e4516 fix panic in MultiSearch when all indexes return error
fixes #126
2014-11-24 17:12:16 -05:00