counter-intuitively, the list implementation was faster than the heap.
the theory is that the heap did more comparisons and swaps,
so even though it benefited from no interface and some cache
locality, it was still slower.
the idea here is to use a raw slice kept in sorted order.
this avoids the need for an interface, but can take the same
comparison approach as the list.
it seems to work out:
go test -run=xxx -bench=. -benchmem -cpuprofile=cpu.out
BenchmarkTop10of100000Scores-4 5000 299959 ns/op 2600 B/op 36 allocs/op
BenchmarkTop100of100000Scores-4 2000 601104 ns/op 20720 B/op 216 allocs/op
BenchmarkTop10of1000000Scores-4 500 3450196 ns/op 2616 B/op 36 allocs/op
BenchmarkTop100of1000000Scores-4 500 3874276 ns/op 20856 B/op 216 allocs/op
PASS
ok github.com/blevesearch/bleve/search/collectors 7.440s