Commit Graph

17 Commits

Author SHA1 Message Date
Marty Schoch
606fd6344b INDEX FORMAT CHANGE: change back index row value
Previously term entries were encoded pairwise (field/term), so
you'd have data like:

F1/T1 F1/T2 F1/T3 F2/T4 F3/T5

As you can see, even though field 1 has 3 terms, we repeat the F1
part in the encoded data.  This is a bit wasteful.

In the new format we encode it as a list of terms for each field:

F1/T1,T2,T3 F2/T4 F3/T5

When fields have multiple terms, this saves space.  In unit
tests there is no additional waste even in the case that a field
has only a single value.
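
The regrouping described above can be sketched in Go as follows. This is an illustration only, not the actual row encoding: the `fieldTerm` and `fieldTerms` types and the `groupByField` and `render` helpers are hypothetical names, and the sketch assumes the pairs arrive already ordered by field.

```go
package main

import (
	"fmt"
	"strings"
)

// fieldTerm is a hypothetical (field, term) pair, mirroring the old pairwise layout.
type fieldTerm struct {
	Field uint16
	Term  string
}

// fieldTerms is a hypothetical grouped entry: one field ID followed by all of its terms.
type fieldTerms struct {
	Field uint16
	Terms []string
}

// groupByField collapses consecutive pairs that share a field into a single
// grouped entry, so the field ID is stored once per field instead of once per term.
// It assumes the input is already ordered by field.
func groupByField(pairs []fieldTerm) []fieldTerms {
	var out []fieldTerms
	for _, p := range pairs {
		n := len(out)
		if n > 0 && out[n-1].Field == p.Field {
			out[n-1].Terms = append(out[n-1].Terms, p.Term)
			continue
		}
		out = append(out, fieldTerms{Field: p.Field, Terms: []string{p.Term}})
	}
	return out
}

// render prints grouped entries in the F1/T1,T2 notation used in the commit message.
func render(groups []fieldTerms) string {
	parts := make([]string, 0, len(groups))
	for _, g := range groups {
		parts = append(parts, fmt.Sprintf("F%d/%s", g.Field, strings.Join(g.Terms, ",")))
	}
	return strings.Join(parts, " ")
}

func main() {
	pairs := []fieldTerm{{1, "T1"}, {1, "T2"}, {1, "T3"}, {2, "T4"}, {3, "T5"}}
	fmt.Println(render(groupByField(pairs))) // prints "F1/T1,T2,T3 F2/T4 F3/T5"
}
```

A field with a single term still produces exactly one entry, which is consistent with the note that the single-value case carries no extra cost.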

Here are the results of an indexing test case (beer-search):

$ benchcmp indexing-before.txt indexing-after.txt
benchmark               old ns/op       new ns/op       delta
BenchmarkIndexing-4     11275835988     10745514321     -4.70%

benchmark               old allocs     new allocs     delta
BenchmarkIndexing-4     25230685       22480494       -10.90%

benchmark               old bytes      new bytes      delta
BenchmarkIndexing-4     4802816224     4741641856     -1.27%

And here are the results of a MatchAll search building a facet
on the "abv" field:

$ benchcmp facet-before.txt facet-after.txt
benchmark             old ns/op     new ns/op     delta
BenchmarkFacets-4     439762100     228064575     -48.14%

benchmark             old allocs     new allocs     delta
BenchmarkFacets-4     9460208        3723286        -60.64%

benchmark             old bytes     new bytes     delta
BenchmarkFacets-4     260784261     151746483     -41.81%

Although we expect the index to be smaller in many cases, the
beer-search index is about the same size here.  However, this
may be due to the underlying storage (boltdb).

Finally, the index version was bumped from 5 to 7, since smolder
also used version 6, which could lead to some confusion.
2017-01-24 15:39:38 -05:00
Steve Yen
5927224e15 optimize mergeOldAndNew for case of first time a doc is seen 2017-01-09 22:48:58 -08:00
Steve Yen
790f2e3e32 optimize by alloc'ing arrays of TermFrequencyRow/TermVector 2017-01-09 22:42:00 -08:00
Steve Yen
8f4726ab10 use struct{}{} idiom instead of additional mark var 2017-01-09 10:17:26 -08:00
Steve Yen
302cac72c4 optimize mergeOldAndNew when non-update case 2017-01-08 17:59:49 -08:00
Steve Yen
40780254ae optimize upsidedown mergeOldAndNew existing key maps
The optimization is to provide a better initial size to the map
constructor and to use a 0-byte-sized struct{} as the map values.
2017-01-07 22:05:55 -08:00
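The map optimization described in 40780254ae (a size hint for the map constructor plus zero-byte struct{} values) follows a common Go idiom. A minimal sketch, assuming a hypothetical `keySet` helper rather than the actual mergeOldAndNew code:

```go
package main

import "fmt"

// keySet builds a set of keys. keySet is a hypothetical helper used only
// to illustrate the idiom; it is not taken from the upsidedown package.
func keySet(keys [][]byte) map[string]struct{} {
	// The size hint lets the runtime size the map once up front
	// instead of growing it repeatedly while keys are inserted.
	set := make(map[string]struct{}, len(keys))
	for _, k := range keys {
		// struct{}{} occupies zero bytes, so only the keys consume space.
		set[string(k)] = struct{}{}
	}
	return set
}

func main() {
	set := keySet([][]byte{[]byte("a"), []byte("b"), []byte("a")})
	_, found := set["b"]
	fmt.Println(len(set), found) // prints "2 true"
}
```

Membership is tested with the two-value lookup form, so no separate boolean marker is needed per key.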
Steve Yen
c2bafa2a51 optimize term vectors/locations via preallocated arrays
The change should hit the allocator less often when processing term
vectors/locations as it preallocates larger, contiguous arrays of
records upfront.
2017-01-07 12:34:06 -08:00
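The preallocation idea in c2bafa2a51 can be sketched as below: allocate one contiguous backing array of records and hand out pointers into it, rather than allocating each record separately. The `termVector` type and `allocVectors` helper here are hypothetical stand-ins, not the actual TermVector code.

```go
package main

import "fmt"

// termVector is a hypothetical record type used only for illustration.
type termVector struct {
	Field uint16
	Pos   uint64
	Start uint64
	End   uint64
}

// allocVectors returns n record pointers backed by a single contiguous array.
// This costs two allocations (the backing array and the pointer slice)
// regardless of n, instead of one allocation per record.
func allocVectors(n int) []*termVector {
	backing := make([]termVector, n)
	ptrs := make([]*termVector, n)
	for i := range backing {
		ptrs[i] = &backing[i]
	}
	return ptrs
}

func main() {
	vecs := allocVectors(3)
	vecs[0].Pos = 10
	vecs[2].Pos = 30
	fmt.Println(vecs[0].Pos, vecs[1].Pos, vecs[2].Pos) // prints "10 0 30"
}
```

One trade-off of this pattern is that the whole backing array stays reachable as long as any single pointer into it is retained.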
Steve Yen
8b140d84c4 minor optimization of upsidedown backIndexRowForDoc
This change might allow a smart enough golang compiler to perhaps
allocate a backIndexRow on the stack rather than the heap.
2017-01-07 11:49:42 -08:00
Steve Yen
c21d27e15a upsidedown TermFieldReader checks includeTermVectors flag param
The flag was part of the API, but wasn't previously checked.
2017-01-05 21:10:27 -08:00
Steve Yen
a941a0f318 simplify DocumentFieldTerms append() usage 2016-10-25 15:30:19 -07:00
Steve Yen
01fb59d293 optimize upside-down DictionaryRow for fewer parsing alloc's 2016-10-12 09:22:50 -07:00
Steve Yen
2d72b542c0 optimize upside-down FieldDict reader with prealloc'ed objects
As part of this commit, there's also a newly added
DictionaryRow.parseDictionaryK() helper method.
2016-10-12 09:18:58 -07:00
Marty Schoch
2f48d7fb02 fix misspellings 2016-10-02 12:11:15 -04:00
Marty Schoch
2332455bd2 nicer formatting of license header 2016-10-02 10:13:14 -04:00
Marty Schoch
6bf9dd59ab BREAKING CHANGE - additional package renaming
I recently learned that package names should also prefer the
singular form, not the plural form
2016-10-01 17:20:59 -04:00
Steve Yen
c362ab302e fix tracking of termSearchersFinished stats 2016-09-30 16:11:30 -07:00
Marty Schoch
f90856b8d3 BREAKING CHANGE - rename upside_down to upsidedown 2016-09-30 12:36:38 -04:00