primary change is going back to sort values being []string
and not []interface{}. this avoids allocations when converting
into the interface{}
that sounds obvious, so why didn't we just do that first?
because a common (default) sort is score, which is naturally
a number, not a string (like terms). converting between the
number and a string was also expensive, and score is the common case.
so, this solution also makes the change to NOT put the score
into the sort value list. instead you see the dummy value
"_score". this is just a placeholder. the actual sort impl
knows that this field of the sort is the score, and will sort
using the actual score.
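a minimal sketch of the placeholder idea, not the actual collector code. the hit type, the isScore slice, and the descending score order are all assumptions made for illustration; the point is only that the comparison never parses "_score", it reads the numeric score instead.

```go
package main

import (
	"fmt"
	"sort"
)

// hit is a hypothetical stand-in for a search result, not bleve's own type.
type hit struct {
	ID    string
	Score float64
	Sort  []string // one value per sort field; "_score" where the field is the score
}

// less reports whether a sorts before b. isScore marks which sort positions
// refer to the score. String positions compare ascending; score positions
// compare the numeric scores, highest first (an assumption of this sketch).
func less(a, b hit, isScore []bool) bool {
	for i := range isScore {
		if isScore[i] {
			if a.Score != b.Score {
				return a.Score > b.Score
			}
			continue
		}
		if a.Sort[i] != b.Sort[i] {
			return a.Sort[i] < b.Sort[i]
		}
	}
	return false
}

// sortHits orders hits in place using less.
func sortHits(hits []hit, isScore []bool) {
	sort.SliceStable(hits, func(i, j int) bool {
		return less(hits[i], hits[j], isScore)
	})
}

func main() {
	hits := []hit{
		{ID: "a", Score: 0.5, Sort: []string{"x", "_score"}},
		{ID: "b", Score: 2.0, Sort: []string{"x", "_score"}},
		{ID: "c", Score: 1.0, Sort: []string{"w", "_score"}},
	}
	sortHits(hits, []bool{false, true})
	for _, h := range hits {
		fmt.Println(h.ID, h.Score)
	}
}
```

because the score stays a float64 on the hit, no string conversion or interface{} boxing is needed for the common sort-by-score case.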
also, several other aspects of the benchmark were cleaned up
so that unnecessary allocations do not pollute the cpu profiles.
Here are the updated benchmarks:
$ go test -run=xxx -bench=. -benchmem -cpuprofile=cpu.out
BenchmarkTop10of100000Scores-4 3000 465809 ns/op 2548 B/op 33 allocs/op
BenchmarkTop100of100000Scores-4 2000 626488 ns/op 21484 B/op 213 allocs/op
BenchmarkTop10of1000000Scores-4 300 5107658 ns/op 2560 B/op 33 allocs/op
BenchmarkTop100of1000000Scores-4 300 5275403 ns/op 21624 B/op 213 allocs/op
PASS
ok github.com/blevesearch/bleve/search/collectors 7.188s
Prior to this PR, master reported:
$ go test -run=xxx -bench=. -benchmem
BenchmarkTop10of100000Scores-4 3000 453269 ns/op 360161 B/op 42 allocs/op
BenchmarkTop100of100000Scores-4 2000 519131 ns/op 388275 B/op 219 allocs/op
BenchmarkTop10of1000000Scores-4 200 7459004 ns/op 4628236 B/op 52 allocs/op
BenchmarkTop100of1000000Scores-4 200 8064864 ns/op 4656596 B/op 232 allocs/op
PASS
ok github.com/blevesearch/bleve/search/collectors 7.385s
So, we're pretty close on the smaller datasets, and we scale better on the larger datasets.
We also show fewer allocations and bytes in all cases (some of this is artificial due to test cleanup).
our implementation uses: golang.org/x/net/context
New method SearchInContext() allows the user to run a search
in the provided context. If that context is cancelled or
exceeds its deadline, Bleve will attempt to stop and return
as soon as possible. This is a *best effort* attempt at this
time and may *not* happen in a timely manner. If the caller must
return very near the timeout, the call should also be wrapped
in a goroutine.
The IndexAlias implementation is affected in a slightly more
complex way. In order to return partial results when a timeout
occurs on some indexes, the timeout is strictly enforced, and
at the moment this does introduce an additional goroutine.
The Bleve implementation honoring the context is currently
very coarse-grained. Specifically we check the Done() channel
between each DocumentMatch produced during the search. In the
future we will propagate the context deeper into the internals
of Bleve, and this will allow finer-grained timeout behavior.
the Status section can report on the number of total/fail/success
indexes when querying across multiple indexes through IndexAlias.
Further, searching an IndexAlias will now return partial results;
the burden is on the caller to check the number of failed
indexes and decide how to handle this situation.
this introduces disk format v4.
the summary rows for a term are now stored in their own
"dictionary row" format. previously the same information
was stored in special term frequency rows.
this allows us to easily iterate all the terms for a field
in sorted order (useful for many other fuzzy data structures)
at the top-level of bleve you can now browse terms within a field
using the following api on the Index interface:
FieldDict(field string) (index.FieldDict, error)
FieldDictRange(field string, startTerm []byte, endTerm []byte) (index.FieldDict, error)
FieldDictPrefix(field string, termPrefix []byte) (index.FieldDict, error)
fixes #127
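a minimal sketch of why sorted term storage makes range and prefix browsing cheap. it works on an in-memory sorted []string rather than the real dictionary rows; termsInRange and termsWithPrefix are hypothetical helpers, and treating the end term as inclusive is an assumption of this sketch, not a statement about FieldDictRange.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// termsInRange returns the terms t with start <= t <= end.
// sorted must already be in ascending order.
func termsInRange(sorted []string, start, end string) []string {
	lo := sort.SearchStrings(sorted, start)
	hi := sort.Search(len(sorted), func(i int) bool { return sorted[i] > end })
	if lo > hi {
		return nil // start sorts after end
	}
	return sorted[lo:hi]
}

// termsWithPrefix returns the terms that begin with prefix. Because the
// terms are sorted, all matches form one contiguous run.
func termsWithPrefix(sorted []string, prefix string) []string {
	lo := sort.SearchStrings(sorted, prefix)
	hi := lo
	for hi < len(sorted) && strings.HasPrefix(sorted[hi], prefix) {
		hi++
	}
	return sorted[lo:hi]
}

func main() {
	terms := []string{"apple", "bat", "beer", "cat", "dog"}
	fmt.Println(termsInRange(terms, "bat", "cat"))
	fmt.Println(termsWithPrefix(terms, "b"))
}
```

both helpers seek to the first candidate with a binary search and then walk forward, which is the same seek-then-scan access pattern a sorted key/value store offers.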
now created through the index itself
mapping problems are now reported early, at the time
data is added to the batch. previously these
were not reported until the batch was executed
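a minimal sketch of reporting problems at add time rather than at execution time. the batch type, its Add method, and the toy requireStringName rule are all hypothetical and only illustrate the behavior described above; they are not the bleve batch API.

```go
package main

import (
	"errors"
	"fmt"
)

// document is a hypothetical document representation.
type document map[string]interface{}

// batch is a hypothetical batch that checks each document when it is added.
type batch struct {
	check func(document) error
	docs  map[string]document
}

// newBatch creates an empty batch that validates documents with check.
func newBatch(check func(document) error) *batch {
	return &batch{check: check, docs: make(map[string]document)}
}

// Add validates doc immediately, so the caller learns about a bad
// document here instead of when the batch is executed.
func (b *batch) Add(id string, doc document) error {
	if err := b.check(doc); err != nil {
		return fmt.Errorf("document %q: %w", id, err)
	}
	b.docs[id] = doc
	return nil
}

// requireStringName is a toy rule: the "name" field must be a string.
func requireStringName(doc document) error {
	if _, ok := doc["name"].(string); !ok {
		return errors.New("field name is not a string")
	}
	return nil
}

func main() {
	b := newBatch(requireStringName)
	fmt.Println(b.Add("1", document{"name": "bleve"}))
	fmt.Println(b.Add("2", document{"name": 42}))
	fmt.Println(len(b.docs))
}
```

a rejected document never enters the batch, so one bad document does not put the rest of the batch at risk when it is executed.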
now you can access the underlying index/store implementations
using the Advanced() method. this is intended for advanced
usage only, and can lead to problems if misused.
also, there is a new method NewUsing(...) which allows callers
of the top-level API to choose which underlying k/v store they
want to use.