update README for zap file format changes
parent 74b2eeb14d
commit f246e0e4c0
@@ -19,6 +19,7 @@ Current usage:
 - next use dictionary to navigate to posting list for a specific term
 - walk posting list
 - if necessary, walk posting details as we go
+- if location info is desired, consult location bitmap to see if it is there

 ## stored fields section

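The read-side step added above (consult the location bitmap before walking location details) can be sketched in Go. This is a minimal sketch under stated assumptions, not zap's code. `bitmap` is a hypothetical uncompressed bitset standing in for the roaring bitmap. Bits are assumed to be keyed by doc number. `shouldWalkLocations` is a made-up name.

```go
package main

import "fmt"

// bitmap is a hypothetical, uncompressed stand-in for the roaring bitmap
// used by zap: bit i is set when hit i (assumed here to be doc number i)
// has location details indexed.
type bitmap []uint64

// add sets bit i, growing the backing slice as needed.
func (b *bitmap) add(i uint64) {
	for uint64(len(*b)) <= i/64 {
		*b = append(*b, 0)
	}
	(*b)[i/64] |= 1 << (i % 64)
}

// contains reports whether bit i is set; bits past the end are unset.
func (b bitmap) contains(i uint64) bool {
	w := i / 64
	return w < uint64(len(b)) && b[w]&(1<<(i%64)) != 0
}

// shouldWalkLocations is the lookup step: location details are only
// consulted when the caller wants them and the bitmap says they exist.
func shouldWalkLocations(locBitmap bitmap, docNum uint64, wantLocations bool) bool {
	return wantLocations && locBitmap.contains(docNum)
}

func main() {
	var locs bitmap
	locs.add(3)
	locs.add(70)
	for _, doc := range []uint64{1, 3, 70} {
		fmt.Println(doc, shouldWalkLocations(locs, doc, true))
	}
}
```

The bitmap check is cheap compared with decoding location details. A reader can therefore skip location work entirely for hits that have none.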
@@ -89,6 +90,16 @@ If you know the doc number you're interested in, this format lets you jump to th

 If you know the doc number you're interested in, this format lets you jump to the correct chunk (docNum/chunkFactor) directly and then seek within that chunk until you find it.

+## bitmaps of hits with location info
+
+- for each posting list
+- preparation phase:
+- encode roaring bitmap (indicating which hits have location details indexed) posting list to bytes (so we know the length)
+- file writing phase:
+- remember the start position for this bitmap
+- write length of encoded roaring bitmap
+- write the serialized roaring bitmap data
+
 ## postings list section

 - for each posting list
@@ -98,6 +109,7 @@ If you know the doc number you're interested in, this format lets you jump to th
 - remember the start position for this posting list
 - write freq/norm details offset (remembered from previous, as varint uint64)
 - write location details offset (remembered from previous, as varint uint64)
+- write location bitmap offset (remembered from previous, as varint uint64)
 - write length of encoded roaring bitmap
 - write the serialized roaring bitmap data

@@ -116,7 +128,6 @@ If you know the doc number you're interested in, this format lets you jump to th
 - for each field
 - file writing phase:
 - remember start offset for each field
-- write 1 if field has location info indexed, 0 if not (varint uint64)
 - write dictionary address (remembered from previous) (varint uint64)
 - write length of field name (varint uint64)
 - write field name bytes