Before this change, upside down's reader would alloc a new
TermFrequencyRow on every Next(), which would be immediately
transformed into an index.TermFieldDoc{}. This change reuses a
pre-allocated TermFrequencyRow that's a field in the reader.
From some bleve-query perf profiling, term field vectors appeared to
be alloc'ed, which was unnecessary as term field vectors are disabled
in the bleve-blast/bleve-query tests.
Currently bleve batch is build by user goroutine
Then read by bleve gourinte
This is still safe when used correctly
However, Reset() will modify the map, which is now a data race
This fix is to simply make batch.Reset() alloc new maps.
This provides a data-access pattern that can be used safely.
Also, this thread argues that creating a new map may be faster
than trying to reuse an existing one:
https://groups.google.com/d/msg/golang-nuts/UvUm3LA1u8g/jGv_FobNpN0J
Separate but related, I have opted to remove the "unsafe batch"
checking that we did. This was always limited anyway, and now
users of Go 1.6 are just as likely to get a panic from the
runtime for concurrent map access anyway. So, the price paid
by us (additional mutex) is not worth it.
fixes#360 and #260
several data structures had a pointer at the start of the struct
on some 32-bit systems, this causes the remaining fields no longer
be aligned on 64-bit boundaries
the fix identifed by @pmezard is to put the counters first in the
struct, which guarantees correct alignment
fixes#359
This is somewhat unlikely, but if a term is (incredibly) popular, its
uvarint count value representation might go beyond 8 bytes.
Some KVStore implementations (like forestdb) provide a BatchEx cgo
optimization that depends on proper preallocated counting, so this
change provides a proper worst-case estimate based on the max-unvarint
of 10 bytes instead of the previously incorrect 8 bytes.
With this change, the upside_down batchRows() and firestorm
batchRows() now use the new KVWriter.NewBatchEx() API, which can
improve performance by reducing the number of cgo hops.
In order to spend less time in append(), this change in upside_down
(similar to another recent performance change in firestorm) builds up
an array of arrays as the eventual input to batchRows().
Previously, the code would gather all the backIndexRows before
processing them. This change instead merges the backIndexRows
concurrently on the theory that we might as well make progress on
compute & processing tasks while waiting for the rest of the back
index rows to be fetched from the KVStore.
Start backindex reading concurrently with analysi to try to utilize
more I/O bandwidth.
The analysis time vs indexing time stats tracking are also now "off",
since there's now concurrency between those actiivties.
One tradeoff is that the lock area in upside_down Batch() is increased
as part of this change.
Taking another optimization from firestorm, upside_down's
storeField()/indexField() funcs now also append() to passed-in arrays
rather than always allocating their own arrays.
Rows content is an implementation detail of bleve index and may change
in the future. That said, they also contains information valuable to
assess the quality of the index or understand its performances. So, as
long as we agree that type asserting rows should only be done if you
know what you are doing and are ready to deal with future changes, I see
no reason to hide the row fields from external packages.
Fix#268