bleve/index/upsidedown/upsidedown.proto

message BackIndexTermsEntry {
	required uint32 field = 1;
	repeated string terms = 2;
}

message BackIndexStoreEntry {
	required uint32 field = 1;
	repeated uint64 arrayPositions = 2;
}

message BackIndexRowValue {
	repeated BackIndexTermsEntry termsEntries = 1;
	repeated BackIndexStoreEntry storedEntries = 2;
}
Commit history (from the blame annotations on the definitions above)

2017-01-24 21:33:54 +01:00 (message BackIndexTermsEntry and the termsEntries field)

INDEX FORMAT CHANGE: change back index row value

Previously term entries were encoded pairwise (field/term), so you'd have data like:

    F1/T1 F1/T2 F1/T3 F2/T4 F3/T5

As you can see, even though field 1 has 3 terms, we repeat the F1 part in the encoded data. This is a bit wasteful.

In the new format we encode it as a list of terms for each field:

    F1/T1,T2,T3 F2/T4 F3/T5

When fields have multiple terms, this saves space. In unit tests there is no additional waste even in the case that a field has only a single value.

Here are the results of an indexing test case (beer-search):

    $ benchcmp indexing-before.txt indexing-after.txt
    benchmark               old ns/op       new ns/op       delta
    BenchmarkIndexing-4     11275835988     10745514321     -4.70%

    benchmark               old allocs      new allocs      delta
    BenchmarkIndexing-4     25230685        22480494        -10.90%

    benchmark               old bytes       new bytes       delta
    BenchmarkIndexing-4     4802816224      4741641856      -1.27%

And here are the results of a MatchAll search building a facet on the "abv" field:

    $ benchcmp facet-before.txt facet-after.txt
    benchmark             old ns/op     new ns/op     delta
    BenchmarkFacets-4     439762100     228064575     -48.14%

    benchmark             old allocs    new allocs    delta
    BenchmarkFacets-4     9460208       3723286       -60.64%

    benchmark             old bytes     new bytes     delta
    BenchmarkFacets-4     260784261     151746483     -41.81%

Although we expect the index to be smaller in many cases, the beer-search index is about the same in this case. However, this may be due to the underlying storage (boltdb) in this case.

Finally, the index version was bumped from 5 to 7, since smolder also used version 6, which could lead to some confusion.

2014-08-19 14:58:26 +02:00 (message BackIndexStoreEntry, message BackIndexRowValue, and the storedEntries field)

major change to fields
now can track array positions for field values
stored fields now include this in the key
and the back index now uses protobufs to simplify serialization
closes #73

2016-09-30 17:30:17 +02:00 (closing brace of BackIndexRowValue)

BREAKING CHANGE - rename upside_down to upsidedown