Steve Yen
c09e2a08ca
scorch zap chunkedContentCoder reuses chunk metadata slice memory
...
And, renamed the chunk MetaData.DocID field to DocNum for naming
correctness, where much of this commit is the mechanical effect of
that rename.
2018-02-05 07:39:16 -08:00
Steve Yen
3da191852d
scorch zap tighten up prepareSegment()'s lock area
2018-02-05 07:39:16 -08:00
Steve Yen
6578655758
scorch zap refactored out mergeToWriter() func
...
This is a step towards supporting in-memory zap segment merging.
2018-02-05 07:39:16 -08:00
Steve Yen
eb21bf8315
scorch zap merge & build share persistStoredFieldValues()
...
Refactored out a helper func, persistStoredFieldValues(), that both
the persistence and merge codepaths now share.
2018-02-05 07:38:55 -08:00
Sreekanth Sivasankaran
9636209ae5
Update persister.go
...
comment updated
2018-02-05 20:49:30 +05:30
Sreekanth Sivasankaran
678c412157
unblock the files for clean up, esp for merged new segment files
2018-02-02 14:44:02 +05:30
Steve Yen
714f5321e0
scorch zap merge storedFieldVals inner loop optimization
2018-02-01 16:28:15 -08:00
Abhinav Dangeti
ff210fbc6d
Merge pull request #744 from abhinavdangeti/geopoint-fix
...
MB-26396: Handling docs with geopoints in slice format
2018-02-01 10:20:06 -08:00
Steve Yen
175f80403a
Merge pull request #747 from steveyen/master
...
scorch zap DictIterator term count fixed and more merge unit tests
2018-02-01 10:13:18 -08:00
Abhinav Dangeti
c24f8944c4
Merge pull request #738 from abhinavdangeti/scorch-stats
...
Add support for certain disk stats
2018-02-01 08:35:59 -08:00
Steve Yen
93b037cdbb
scorch zap TestMergeWithUpdates()
2018-01-31 11:44:41 -08:00
Steve Yen
4dd64b68fa
scorch zap TestMergeWithEmptySegment(s)
2018-01-30 22:27:40 -08:00
Steve Yen
684ee3c0e7
scorch zap DictIterator term count fixed and more merge unit tests
...
The zap DictionaryIterator Next() was incorrectly returning the
postingsList offset as the term count. As part of this, refactored
out a PostingsList.read() helper method.
Also added more merge unit test scenarios, including merging a segment
for a few rounds to see if there are differences before/after merging.
2018-01-30 21:22:06 -08:00
abhinavdangeti
6451c8c37f
MB-26396: Handling documents with geopoints in slice format
...
+ The issue lies with parsing documents containing a geopoint
in slice format - which wasn't handled.
+ Unit test that verifies the fix.
2018-01-29 18:31:56 -08:00
Steve Yen
a3b125508b
Merge pull request #746 from steveyen/master
...
more scorch zap optimizations (array for docTermMap, etc)
2018-01-29 15:50:04 -08:00
Steve Yen
634cfa0560
scorch zap chunkedIntCoder optimization to prealloc some final buf
2018-01-29 11:03:53 -08:00
Steve Yen
a444c25ddf
scorch zap merge uses array for docTermMap with no sorting
...
Instead of sorting docNum keys from a hashmap, this change instead
iterates from docNum 0 to N and uses an array instead of hashmap.
The array is also reused across outer loop iterations.
This optimizes for when there's a lot of structural similarity between
docs, where many/most docs have the same fields, e.g., beers,
breweries. If every doc has completely different fields, then this
change might produce worse behavior compared to the previous sparse
hashmap approach.
2018-01-29 10:47:08 -08:00
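The dense-array idea in a444c25ddf can be sketched roughly as below. This is only an illustration, not the code from the commit: the posting type, the collectTerms helper, and the termSeparator value are all hypothetical names chosen for the example. It shows a slice indexed by docNum that is reused across outer loop iterations (here, one iteration per field), so no docNum keys need to be sorted.

```go
package main

import "fmt"

// termSeparator is a hypothetical delimiter written after each term.
const termSeparator byte = 0xff

// posting is a hypothetical stand-in: one term and the docNums containing it.
type posting struct {
	term    string
	docNums []uint64
}

// collectTerms gathers, for one field, the terms seen per document into a
// dense slice indexed by docNum. buf is reused across calls when it is large
// enough, so only the first call (or a larger numDocs) allocates the outer slice.
func collectTerms(buf [][]byte, numDocs int, postings []posting) [][]byte {
	if cap(buf) < numDocs {
		buf = make([][]byte, numDocs)
	} else {
		buf = buf[:numDocs]
		for i := range buf {
			buf[i] = buf[i][:0] // keep each doc's capacity, drop old contents
		}
	}
	for _, p := range postings {
		for _, docNum := range p.docNums {
			buf[docNum] = append(append(buf[docNum], p.term...), termSeparator)
		}
	}
	return buf
}

func main() {
	fields := [][]posting{
		{{"ale", []uint64{0, 2}}, {"ipa", []uint64{2}}},
		{{"stout", []uint64{1}}},
	}
	var docTerms [][]byte
	for fieldID, postings := range fields {
		docTerms = collectTerms(docTerms, 3, postings)
		// Iterating docNum 0..N is already in order; nothing to sort.
		for docNum, terms := range docTerms {
			fmt.Printf("field %d doc %d: %q\n", fieldID, docNum, terms)
		}
	}
}
```

As the commit message notes, a dense slice like this pays for every docNum even when a field is rare, which is the trade-off against a sparse hashmap.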
Steve Yen
5d1a2b0ad7
Merge pull request #743 from steveyen/master
...
zap-based in-memory segment impl & various merge optimizations
2018-01-29 09:22:12 -08:00
Steve Yen
745575a6c1
scorch zap mergeStoredAndRemap uses array indexing, not append()
...
Since we have the right array size preallocated, we don't need the extra
capacity checking of append().
2018-01-27 11:35:10 -08:00
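A minimal sketch of the pattern named in 745575a6c1, assuming a hypothetical remapWithIndex helper (not a function from the repository): when the final length is known up front, the slice can be allocated at full length and filled by index rather than grown with append().

```go
package main

import "fmt"

// remapWithIndex is a hypothetical example: it shifts every old docNum by
// offset. The output length is known, so the slice is allocated once at its
// final length and written by index, with no append() capacity checks.
func remapWithIndex(oldDocNums []uint64, offset uint64) []uint64 {
	out := make([]uint64, len(oldDocNums))
	for i, d := range oldDocNums {
		out[i] = d + offset
	}
	return out
}

func main() {
	fmt.Println(remapWithIndex([]uint64{0, 1, 2}, 10))
}
```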
Steve Yen
8dd17a3b20
scorch zap mergeStoredAndRemap uses continue for less indentation
2018-01-27 11:35:10 -08:00
Steve Yen
0041664bc4
scorch zap merge computeNewDocCount() optimize 1 variable
2018-01-27 11:35:10 -08:00
Steve Yen
6985db13a0
scorch zap merge reuses docNumbers array
2018-01-27 11:35:10 -08:00
Steve Yen
916bbf4125
scorch zap merge prealloc's docTermMap capacity
2018-01-27 11:35:10 -08:00
Steve Yen
56cdb68f35
scorch zap merge checks err2 not err
...
Also, optimize the appending of the termSeparator so that the
docTermMap is accessed and updated just once.
2018-01-27 11:35:10 -08:00
Steve Yen
3030d4edb5
scorch zap merge preallocs segNewDocNums capacity
2018-01-27 11:35:10 -08:00
Steve Yen
9038d75c98
scorch zap allocate govarint.U64Base128Encoder just once
...
Instead of allocating a govarint.U64Base128Encoder in the inner loop,
allocate it just once on the outside, as it appears that it's just a
thin wrapper around binary.PutUvarint().
2018-01-27 11:35:10 -08:00
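The hoisting described in 9038d75c98 might look like the sketch below. It does not use govarint; it calls the standard library's binary.PutUvarint() directly, and encodeUvarints is a hypothetical helper. The point is only that the scratch state lives outside the loop and is reused for every value.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// encodeUvarints appends each value to dst as a base-128 varint. The scratch
// buffer is declared once, outside the loop, and reused for every value.
func encodeUvarints(dst []byte, vals []uint64) []byte {
	var scratch [binary.MaxVarintLen64]byte
	for _, v := range vals {
		n := binary.PutUvarint(scratch[:], v)
		dst = append(dst, scratch[:n]...)
	}
	return dst
}

func main() {
	fmt.Printf("% x\n", encodeUvarints(nil, []uint64{1, 300}))
}
```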
Steve Yen
10dd5489c2
scorch zap Dict.postingsList() takes []byte for more mem control
...
This allows callers that already have a []byte term to avoid
string'ification garbage.
2018-01-27 11:35:10 -08:00
Steve Yen
6a17ff48c7
scorch zap removed unneeded []byte cast of term
2018-01-27 11:35:10 -08:00
Steve Yen
d389e2bb40
scorch zap merge file cleanup on error, and some minor prealloc's
2018-01-27 11:35:10 -08:00
Steve Yen
29d526a7c2
scorch zap merge uses DefaultChunkFactor
2018-01-27 11:35:10 -08:00
Steve Yen
603425c2c5
scorch zap mergerLoop missing fireAsyncError case
2018-01-27 11:35:10 -08:00
Steve Yen
37121c3b49
scorch zap writeRoaringWithLen optimized with reused bufs
2018-01-27 11:35:10 -08:00
Steve Yen
5a035dc9aa
scorch zap in-memory segment representation (SegmentBase)
...
The zap SegmentBase struct is a refactoring of the zap Segment into
the subset of fields that are needed for read-only ops, without any
persistence related info. This allows us to use zap's optimized data
encoding as scorch's in-memory segments.
The zap Segment struct now embeds a zap SegmentBase struct, and layers
on persistence. Both the zap Segment and zap SegmentBase implement
scorch's Segment interface.
2018-01-27 11:35:10 -08:00
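The embedding arrangement described in 5a035dc9aa can be illustrated with a trimmed-down sketch. The types below are hypothetical stand-ins: the Segment interface here has a single method, and FileSegment plays the role that the commit describes for the persisted zap Segment. The real structs carry many more fields.

```go
package main

import "fmt"

// Segment is a hypothetical, one-method stand-in for scorch's interface.
type Segment interface {
	Count() uint64
}

// SegmentBase holds only what read-only operations need: the encoded bytes
// and some derived metadata. It has no file or mmap details.
type SegmentBase struct {
	mem     []byte
	numDocs uint64
}

// Count reports the number of documents in the segment.
func (sb *SegmentBase) Count() uint64 { return sb.numDocs }

// FileSegment embeds SegmentBase and layers persistence info on top. The
// embedded methods are promoted, so it also satisfies Segment.
type FileSegment struct {
	SegmentBase
	path string
}

// Compile-time checks that both types implement the interface.
var (
	_ Segment = (*SegmentBase)(nil)
	_ Segment = (*FileSegment)(nil)
)

func main() {
	inMem := &SegmentBase{mem: []byte{0x01}, numDocs: 2}
	onDisk := &FileSegment{SegmentBase: SegmentBase{numDocs: 5}, path: "example.zap"}
	for _, s := range []Segment{inMem, onDisk} {
		fmt.Println(s.Count())
	}
	fmt.Println(len(inMem.mem), onDisk.path)
}
```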
Steve Yen
dc62324e02
scorch zap miscellaneous typos
2018-01-27 11:35:10 -08:00
abhinavdangeti
567d756c27
Add support for certain disk stats
...
+ num_bytes_used_disk
+ num_files_on_disk
2018-01-24 14:10:14 -08:00
Marty Schoch
0fc9b4b74a
Merge pull request #742 from steveyen/scorch-unlock-needed
...
scorch unlocks in introduceSegment's DocNumbers() error codepath
2018-01-23 12:09:23 -05:00
Steve Yen
34fd77709f
scorch unlocks in introduceSegment's DocNumbers() error codepath
2018-01-20 17:17:16 -08:00
Marty Schoch
cb6391e75e
Merge pull request #733 from abhinavdangeti/scorch-segment-sizeinbytes
...
Include overhead from data structures in segment's SizeInBytes
2018-01-19 09:10:03 -05:00
Marty Schoch
5a812ee9ce
Merge pull request #732 from sreekanth-cb/facet_merge
...
MB-27498 - date range facet query panics
2018-01-19 09:02:57 -05:00
Sreekanth Sivasankaran
47f1c66889
adding UT
2018-01-19 11:47:28 +05:30
abhinavdangeti
1176c73a9c
Include overhead from data structures in segment's SizeInBytes
...
+ Account for all the overhead incurred from the data structures
within mem.Segment and zap.Segment.
- SizeOfMap = 8
- SizeOfPointer = 8
- SizeOfSlice = 24
- SizeOfString = 16
+ Include overhead from certain new fields as well.
2018-01-17 11:11:44 -08:00
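To show how the constants listed in 1176c73a9c could feed a size estimate, here is a rough sketch. The constant values are the ones from the commit message; the dict type and its sizeInBytes method are hypothetical, and the estimate deliberately ignores map bucket overhead.

```go
package main

import "fmt"

// Header sizes as listed in the commit message (64-bit platforms).
const (
	sizeOfMap     = 8
	sizeOfPointer = 8
	sizeOfSlice   = 24
	sizeOfString  = 16
)

// dict is a hypothetical structure holding a term map and a key list.
type dict struct {
	terms map[string]uint64
	keys  []string
}

// sizeInBytes estimates the footprint: the map and slice headers, plus a
// string header and payload for each key, plus 8 bytes per uint64 map value.
func (d *dict) sizeInBytes() int {
	size := sizeOfMap + sizeOfSlice
	for k := range d.terms {
		size += sizeOfString + len(k) + 8
	}
	for _, k := range d.keys {
		size += sizeOfString + len(k)
	}
	return size
}

func main() {
	d := &dict{terms: map[string]uint64{"ale": 1}, keys: []string{"ale"}}
	fmt.Println(d.sizeInBytes(), sizeOfPointer)
}
```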
Marty Schoch
44c371582a
Merge pull request #739 from ethantkoenig/unique_token_filter
...
Add UniqueTerm token filter
2018-01-17 13:10:10 -05:00
Ethan Koenig
012d436dd7
Add UniqueTerm token filter
2018-01-16 22:24:51 -08:00
Steve Yen
f4c3f984a4
Merge pull request #734 from steveyen/master
...
scorch mem segment optimizations
2018-01-16 08:57:02 -08:00
Marty Schoch
423d7dc4e4
Merge pull request #736 from ethantkoenig/readme
...
Fix coverage badge in README
2018-01-16 08:01:46 -05:00
Steve Yen
71d6d1691b
scorch zap optimizations of inner loops and easy preallocs
2018-01-15 23:04:23 -08:00
Ethan Koenig
d14b290235
Fix coverage badge in README
2018-01-15 22:23:41 -08:00
Steve Yen
d682c85a7b
scorch mem segments uses backing array trick even more
...
This change invokes make() only once per distinct type to allocate the
large, contiguous backing arrays for the mem segment.
2018-01-15 19:17:39 -08:00
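One way to picture the backing array trick from d682c85a7b is the sketch below. The carve helper is hypothetical and is not taken from the mem segment code: it calls make() once for a contiguous array, then hands out zero-length sub-slices whose capacity is capped, so later append() calls fill the preallocated space without touching a neighbor.

```go
package main

import "fmt"

// carve allocates one contiguous backing array and returns sub-slices with
// length 0 and the requested capacities. The three-index slice expression
// caps each capacity so an overflowing append() reallocates instead of
// overwriting the next sub-slice.
func carve(capacities []int) [][]uint64 {
	total := 0
	for _, n := range capacities {
		total += n
	}
	backing := make([]uint64, total) // the only large allocation
	out := make([][]uint64, len(capacities))
	off := 0
	for i, n := range capacities {
		out[i] = backing[off : off : off+n]
		off += n
	}
	return out
}

func main() {
	freqs := carve([]int{2, 3})
	freqs[0] = append(freqs[0], 1, 2)
	freqs[1] = append(freqs[1], 9)
	fmt.Println(freqs, cap(freqs[0]), cap(freqs[1]))
}
```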
Steve Yen
0f19b542a3
scorch mem segment prealloc's Locfields/starts/ends/pos/arraypos
...
This change preallocates more of the backing arrays for Locfields,
Locstarts, Locends, Locpos, Locarraypos sub-slices of a scorch mem
segment.
On small bleve-blast tests (50K wiki docs) on a dev macbook, scorch
indexing throughput seems to improve from 15MB/sec to 20MB/sec after
the recent series of preallocation changes.
2018-01-15 18:40:28 -08:00
Steve Yen
a84bd122d2
scorch mem segment preallocates sub-slices via # terms
...
This change tracks the number of terms per posting list to
preallocate the sub-slices for the Freqs & Norms.
2018-01-15 18:20:43 -08:00