Previously, the code would gather all the backIndexRows before
processing them. This change instead merges the backIndexRows
concurrently on the theory that we might as well make progress on
compute & processing tasks while waiting for the rest of the back
index rows to be fetched from the KVStore.
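
A minimal sketch of merging rows as they arrive, rather than gathering them all first. The names here (backIndexRow, fetchBackIndexRows, mergeBackIndexRows) and the map standing in for the KVStore are hypothetical; this is not the actual upside_down code.

```go
package main

import (
	"fmt"
	"sort"
)

// backIndexRow is a hypothetical stand-in for a row read from the KV store.
type backIndexRow struct {
	docID string
	terms []string
}

// fetchBackIndexRows simulates reading rows from a KV store in its own
// goroutine, streaming each row over a channel as soon as it is available.
func fetchBackIndexRows(docIDs []string, store map[string][]string) <-chan backIndexRow {
	out := make(chan backIndexRow)
	go func() {
		defer close(out)
		for _, id := range docIDs {
			out <- backIndexRow{docID: id, terms: store[id]}
		}
	}()
	return out
}

// mergeBackIndexRows consumes rows as they arrive instead of waiting for
// the full set, counting how many fetched rows mention each term.
func mergeBackIndexRows(rows <-chan backIndexRow) map[string]int {
	counts := make(map[string]int)
	for row := range rows {
		for _, t := range row.terms {
			counts[t]++
		}
	}
	return counts
}

func main() {
	store := map[string][]string{
		"a": {"beer", "ale"},
		"b": {"beer"},
	}
	counts := mergeBackIndexRows(fetchBackIndexRows([]string{"a", "b"}, store))

	// Sort the terms so the printed order is stable.
	terms := make([]string, 0, len(counts))
	for t := range counts {
		terms = append(terms, t)
	}
	sort.Strings(terms)
	for _, t := range terms {
		fmt.Println(t, counts[t])
	}
}
```

The merge loop starts working on the first row while later rows are still being fetched, which is the overlap this change is after.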
Start backindex reading concurrently with analysis to try to utilize
more I/O bandwidth.
The analysis time vs. indexing time stats are also now "off",
since those activities now run concurrently.
One tradeoff is that the lock area in upside_down Batch() is increased
as part of this change.
Taking another optimization from firestorm, upside_down's
storeField()/indexField() funcs now also append() to passed-in arrays
rather than always allocating their own arrays.
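
A rough sketch of the append-to-passed-in pattern, assuming a simplified row type. The indexField signature below is illustrative only and does not match the real upside_down function.

```go
package main

import "fmt"

// row is a hypothetical stand-in for an index row.
type row struct{ key string }

// indexField appends the rows for one field to the caller's slice and returns
// the (possibly regrown) slice, instead of allocating a fresh slice per call.
func indexField(field string, terms []string, rows []row) []row {
	for _, t := range terms {
		rows = append(rows, row{key: field + ":" + t})
	}
	return rows
}

func main() {
	// One up-front allocation is shared by every field of the document.
	rows := make([]row, 0, 8)
	rows = indexField("name", []string{"bleve", "index"}, rows)
	rows = indexField("desc", []string{"search"}, rows)
	fmt.Println(len(rows), cap(rows))
}
```

The caller can size the slice once per document, so per-field calls only allocate when the capacity estimate turns out too small.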
Row content is an implementation detail of the bleve index and may change
in the future. That said, rows also contain information valuable for
assessing the quality of the index or understanding its performance. So, as
long as we agree that type asserting rows should only be done if you
know what you are doing and are ready to deal with future changes, I see
no reason to hide the row fields from external packages.
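
For illustration, type asserting rows from an external package might look like the sketch below. The indexRow interface and both row types are hypothetical stand-ins defined here, not bleve's actual types, and their fields could change just as the text warns.

```go
package main

import "fmt"

// indexRow is a hypothetical interface standing in for a generic index row.
type indexRow interface {
	Key() []byte
}

// termFrequencyRow is a hypothetical row type with exported fields.
type termFrequencyRow struct {
	Term []byte
	Doc  []byte
	Freq uint64
}

func (r *termFrequencyRow) Key() []byte { return append([]byte("t"), r.Term...) }

// fieldRow is a second hypothetical row type.
type fieldRow struct {
	Index uint16
	Name  string
}

func (r *fieldRow) Key() []byte { return []byte{'f', byte(r.Index)} }

// describe type-switches on the concrete row type to read its fields.
// Callers must be ready for unknown or changed row types.
func describe(r indexRow) string {
	switch row := r.(type) {
	case *termFrequencyRow:
		return fmt.Sprintf("term %q in doc %q, freq %d", row.Term, row.Doc, row.Freq)
	case *fieldRow:
		return fmt.Sprintf("field %d named %q", row.Index, row.Name)
	default:
		return "unknown row type"
	}
}

func main() {
	fmt.Println(describe(&termFrequencyRow{Term: []byte("beer"), Doc: []byte("a"), Freq: 3}))
	fmt.Println(describe(&fieldRow{Index: 1, Name: "desc"}))
}
```

The default branch is the part that matters for forward compatibility: code that inspects rows should degrade gracefully when it meets a row type it does not know.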
Fix #268
It boils down to:
1. client sends some work and a notification channel to a single worker,
then waits.
2. worker processes the work
3. worker sends the result to the client using the notification channel
I do not see any problem with this, even with unbuffered channels.
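
The three steps above can be sketched with unbuffered channels as follows. The work type, the doubling "processing", and the function names are assumptions made for illustration.

```go
package main

import "fmt"

// work carries the input plus the notification channel the worker replies on.
type work struct {
	input  int
	result chan int
}

// worker processes each item and sends the result back on the item's own
// notification channel. The send blocks until the client receives.
func worker(queue <-chan work) {
	for w := range queue {
		w.result <- w.input * 2
	}
}

// submit sends one work item to the worker and then waits for its result.
func submit(queue chan<- work, input int) int {
	w := work{input: input, result: make(chan int)} // unbuffered
	queue <- w                                      // 1. send work and channel
	return <-w.result                               // 3. wait for the reply
}

func main() {
	queue := make(chan work) // unbuffered as well
	go worker(queue)
	fmt.Println(submit(queue, 21))
	close(queue)
}
```

Because the client is always waiting on the notification channel by the time the worker sends, the unbuffered send cannot block indefinitely in this arrangement.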
This lays the foundation for supporting the new firestorm
indexing scheme. I'm merging these changes ahead of
the rest of the firestorm branch so I can continue
to make changes to the analysis pipeline in parallel.
The logic for reading the docID from the keys
in this row relies on the keys NEVER containing
the byte separator character (0xff). This is OK,
as we require that all keys be valid UTF-8.
However, it turns out that when this
rule was violated we would panic, because we
return nil, nil and later try to print the doc id
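
One way to avoid that kind of panic is to return an explicit error for malformed keys, sketched below. The key layout (term, 0xff, docID) and the names buildKey and parseDocID are assumptions for illustration, not bleve's actual row encoding or the fix that was applied. The sketch relies on the fact that the byte 0xff never occurs in valid UTF-8.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"unicode/utf8"
)

// byteSeparator splits the parts of a key. It cannot appear in valid UTF-8.
const byteSeparator byte = 0xff

var errMalformedKey = errors.New("malformed key: expected term, separator, docID")

// buildKey rejects input that is not valid UTF-8, which guarantees that
// neither part can contain the separator byte.
func buildKey(term, docID string) ([]byte, error) {
	if !utf8.ValidString(term) || !utf8.ValidString(docID) {
		return nil, errors.New("term and docID must be valid UTF-8")
	}
	key := append([]byte(term), byteSeparator)
	return append(key, docID...), nil
}

// parseDocID returns an error for malformed keys instead of a nil value
// with a nil error, so callers never use a missing docID by accident.
func parseDocID(key []byte) (string, error) {
	sep := bytes.IndexByte(key, byteSeparator)
	if sep < 0 || sep == len(key)-1 {
		return "", errMalformedKey
	}
	docID := key[sep+1:]
	if bytes.IndexByte(docID, byteSeparator) >= 0 {
		return "", errMalformedKey
	}
	return string(docID), nil
}

func main() {
	key, err := buildKey("beer", "doc1")
	if err != nil {
		fmt.Println("build error:", err)
		return
	}
	docID, err := parseDocID(key)
	fmt.Println(docID, err)

	_, err = parseDocID([]byte("no-separator"))
	fmt.Println(err)
}
```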