Since its just the pointer size of the IndexReader that is
being accounted for while estimating the RAM needed to
execute a search query, get rid of the Size() API in the
IndexReader interface.
This API (unexported) will estimate the amount of memory needed to execute
a search query over an index before the collector begins data collection.
Sample estimates for certain queries:
{Size: 10, BenchmarkUpsidedownSearchOverhead}
ESTIMATE BENCHMEM
TermQuery 4616 4796
MatchQuery 5210 5405
DisjunctionQuery (Match queries) 7700 8447
DisjunctionQuery (Term queries) 6514 6591
ConjunctionQuery (Match queries) 7524 8175
Nested disjunction query (disjunction of disjunctions) 10306 10708
…
a failing test was producing unhelpful pointer addresses as
the only debug output. this changes the output to print
the terms and locations as readable text
part of #629
the logic of how a phrase search works should be an internal
detail of the phrase searcher. further, these changes will
allow proper scoring of phrase matches, which require access
to the underlying searcher objects, which were hidden in the
previous approach.
this originated from a misunderstanding of mine going back
several years. the values need not be float64 just because
we plan to serialize them as json.
there are still larger questions about what the right type should
be, and where should any conversions go. but, this commit
simply attempts to address the most egregious problems
at the core, the Next() method moves another searcher forward
and checks each hit to see if it also satisfies the phrase
constraints. the current implementation has 4 nested for loops.
these nested loops make it harder read (indentation) and harder
to reason about (complexity).
this refactor does not remove any loops, it simply moves some
of the inner loops into separate methods so that one can
more easily reason about the parts separately.