
1755 Commits

Author SHA1 Message Date
Marty Schoch f531a248e7
Merge pull request #749 from sreekanth-cb/zapfile_cleanup_fix
unblock the files for clean up, esp for merged new segment files
2018-02-08 10:53:41 -05:00
Steve Yen 3d729c73c1
Merge pull request #758 from steveyen/scorch-optimizations-20180207
scorch optimizations via struct reuse
2018-02-08 06:16:27 -08:00
Sreekanth Sivasankaran feecce1eb2 fix for merger persister handshake stalemate
The slow merger was lagging behind the fast persister and got
stuck in a persister-notify send loop, while the persister waited
for new introductions from the introducer, totally blocking
the merger.

This fix, along with the deleted-files eligibility flipping,
brings the file count to around 6 to 11 files per shard
for both the travel and beer samples.
2018-02-08 11:00:21 +05:30
Steve Yen a83ee0f364 scorch zap.MergeToWriter() takes SegmentBases instead of Segments
This change turns zap.MergeToWriter() into a public func, so that it's
now directly callable from outside packages (such as from scorch's
top-level merger or persister).  And, MergeToWriter() now takes input
of SegmentBases instead of Segments, so that it can now work on either
in-memory zap segments or file-based zap segments.

This is yet another stepping stone towards in-memory merging of zap
segments.
2018-02-07 14:38:13 -08:00
Steve Yen 8c2520d55c scorch zap optimize via postingsList reuse
pprof graphs were showing many postingsList allocations during
merging, so this change optimizes by reusing postingsList memory in the
merging loops.
2018-02-07 14:33:20 -08:00
Steve Yen 03c8b2b7ec scorch mem segment optimizes DictEntry's across Next() calls
This change optimizes the scorch/mem DictionaryIterator by reusing a
DictEntry struct across multiple Next() calls.  This follows the same
optimization trick and Next() semantics as upsidedown's FieldDict
implementation.
2018-02-07 14:17:48 -08:00
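The reuse pattern described in the commit above can be sketched roughly as follows. This is a minimal illustration, not the scorch/mem code: the `DictEntry` fields, the `DictIterator` layout, and the sample terms are all assumptions made for the example. The point it shows is that `Next()` hands back a pointer to one entry owned by the iterator, so callers must copy anything they want to keep past the next call.

```go
package main

import "fmt"

// DictEntry is a hypothetical stand-in for a dictionary entry.
type DictEntry struct {
	Term  string
	Count uint64
}

// DictIterator is a hypothetical iterator that owns a single reusable entry.
type DictIterator struct {
	terms  []string
	counts []uint64
	pos    int
	entry  DictEntry // overwritten on every Next() call instead of reallocated
}

// Next returns a pointer to the iterator's own entry, or nil when the
// terms are exhausted. The pointed-to value is only valid until the
// following Next() call.
func (it *DictIterator) Next() *DictEntry {
	if it.pos >= len(it.terms) {
		return nil
	}
	it.entry.Term = it.terms[it.pos]
	it.entry.Count = it.counts[it.pos]
	it.pos++
	return &it.entry
}

func main() {
	it := &DictIterator{
		terms:  []string{"ale", "stout"},
		counts: []uint64{3, 1},
	}
	for e := it.Next(); e != nil; e = it.Next() {
		fmt.Println(e.Term, e.Count)
	}
}
```

The trade-off is one allocation for the whole iteration at the cost of aliasing: two results of `Next()` compare equal as pointers, so a caller that stores `e` rather than `*e` will see it change.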
Steve Yen 78a7ae562f
Merge pull request #756 from steveyen/optimize-storedIndexOffset-loop
scorch zap mergeStoredAndRemap loop optimization
2018-02-06 18:00:34 -08:00
Steve Yen 0dfd73d6cc scorch zap mergeStoredAndRemap loop optimization
This change avoids an array/slice access in a loop body.
2018-02-06 17:10:44 -08:00
Steve Yen eb1d269521
Merge pull request #748 from steveyen/master
scorch zap merge related refactorings / optimizations
2018-02-06 07:52:17 -08:00
Sreekanth Sivasankaran 07274c036d tuning the edge for merge-task execution loop
Adjust the merge-task creation loop to account for
the newly merged segments, so that the eventual merge
result (the number of segments) stays within the calculated budget.
2018-02-06 13:48:16 +05:30
Steve Yen 1e36cdf358
Merge pull request #752 from steveyen/more-calc-budget-tests
more zap merge-planner CalcBudget tests at larger sizes
2018-02-05 12:51:42 -08:00
Steve Yen a280ba7cf8 scorch zap TestIndexRollback fixes
The TestIndexRollback unit test was failing more often than ever
(perhaps raciness?), so this commit tries to remove avenues of
raciness in the test...

- The Scorch.Open() method is refactored into a Scorch.openBolt()
  helper method in order to allow unit tests to control which
  background goroutines are started.

- TestIndexRollback() doesn't start the merger goroutine, to simulate
  a really slow merger that never gets around to merging old segments.

- TestIndexRollback() creates a long-lived reader after the first
  batch, so that the first index snapshot isn't removed due to the
  long-lived reader's ref-count.

- TestIndexRollback() temporarily bumps NumSnapshotsToKeep to a large
  number so the persister isn't tempted to removeOldData() that we're
  trying to roll back to.
2018-02-05 12:23:58 -08:00
Steve Yen fdb240f5f9 more zap merge-planner CalcBudget tests at larger sizes
Helps provide a sense of how # of segments grows as # of documents
grows.  Ex: 1B docs => budget of 54 segments.
2018-02-05 10:02:47 -08:00
Steve Yen c09e2a08ca scorch zap chunkedContentCoder reuses chunk metadata slice memory
And, renamed the chunk MetaData.DocID field to DocNum for naming
correctness, where much of this commit is the mechanical effect of
that rename.
2018-02-05 07:39:16 -08:00
Steve Yen 3da191852d scorch zap tighten up prepareSegment()'s lock area 2018-02-05 07:39:16 -08:00
Steve Yen 6578655758 scorch zap refactored out mergeToWriter() func
This is a step towards supporting in-memory zap segment merging.
2018-02-05 07:39:16 -08:00
Steve Yen eb21bf8315 scorch zap merge & build share persistStoredFieldValues()
Refactored out a helper func, persistStoredFieldValues(), that both
the persistence and merge codepaths now share.
2018-02-05 07:38:55 -08:00
Sreekanth Sivasankaran 9636209ae5
Update persister.go
comment updated
2018-02-05 20:49:30 +05:30
Sreekanth Sivasankaran 678c412157 unblock the files for clean up, esp for merged new segment files 2018-02-02 14:44:02 +05:30
Steve Yen 714f5321e0 scorch zap merge storedFieldVals inner loop optimization 2018-02-01 16:28:15 -08:00
Abhinav Dangeti ff210fbc6d
Merge pull request #744 from abhinavdangeti/geopoint-fix
MB-26396: Handling docs with geopoints in slice format
2018-02-01 10:20:06 -08:00
Steve Yen 175f80403a
Merge pull request #747 from steveyen/master
scorch zap DictIterator term count fixed and more merge unit tests
2018-02-01 10:13:18 -08:00
Abhinav Dangeti c24f8944c4
Merge pull request #738 from abhinavdangeti/scorch-stats
Add support for certain disk stats
2018-02-01 08:35:59 -08:00
Steve Yen 93b037cdbb scorch zap TestMergeWithUpdates() 2018-01-31 11:44:41 -08:00
Steve Yen 4dd64b68fa scorch zap TestMergeWithEmptySegment(s) 2018-01-30 22:27:40 -08:00
Steve Yen 684ee3c0e7 scorch zap DictIterator term count fixed and more merge unit tests
The zap DictionaryIterator Next() was incorrectly returning the
postingsList offset as the term count.  As part of this, refactored
out a PostingsList.read() helper method.

Also added more merge unit test scenarios, including merging a segment
for a few rounds to see if there are differences before/after merging.
2018-01-30 21:22:06 -08:00
abhinavdangeti 6451c8c37f MB-26396: Handling documents with geopoints in slice format
+ The issue lies with parsing documents containing a geopoint
  in slice format - which wasn't handled.
+ Unit test that verifies the fix.
2018-01-29 18:31:56 -08:00
Steve Yen a3b125508b
Merge pull request #746 from steveyen/master
more scorch zap optimizations (array for docTermMap, etc)
2018-01-29 15:50:04 -08:00
Steve Yen 634cfa0560 scorch zap chunkedIntCoder optimization to prealloc some final buf 2018-01-29 11:03:53 -08:00
Steve Yen a444c25ddf scorch zap merge uses array for docTermMap with no sorting
Instead of sorting docNum keys from a hashmap, this change instead
iterates from docNum 0 to N and uses an array instead of hashmap.
The array is also reused across outer loop iterations.

This optimizes for when there's a lot of structural similarity between
docs, where many/most docs have the same fields, e.g., beers,
breweries.  If every doc has completely different fields, then this
change might produce worse behavior compared to the previous sparse
hashmap approach.
2018-01-29 10:47:08 -08:00
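As a rough illustration of the dense-array approach in the commit above, the sketch below indexes a reusable slice by docNum, so visiting docNum 0 to N needs no key sorting. It is not the zap merge code: the `hit` type, the `collect` helper, and the `0xff` separator byte are assumptions chosen for the example.

```go
package main

import "fmt"

// hit is a hypothetical (docNum, term) pair produced while merging.
type hit struct {
	docNum uint64
	term   string
}

// collect gathers terms per docNum into a dense slice of length numDocs.
// It reuses prev's backing storage when it is large enough, so repeated
// calls from an outer loop avoid reallocating.
func collect(prev [][]byte, numDocs int, hits []hit) [][]byte {
	if cap(prev) < numDocs {
		prev = make([][]byte, numDocs)
	}
	prev = prev[:numDocs]
	for i := range prev {
		prev[i] = prev[i][:0] // keep capacity, drop old contents
	}
	for _, h := range hits {
		// Look up the slot once, append term and separator, store once.
		b := append(prev[h.docNum], h.term...)
		prev[h.docNum] = append(b, 0xff) // 0xff: assumed separator byte
	}
	return prev
}

func main() {
	docTerms := collect(nil, 3, []hit{{2, "ale"}, {0, "stout"}, {2, "ipa"}})
	// Iterating the slice visits docNums in order with no sort step.
	for docNum, terms := range docTerms {
		if len(terms) > 0 {
			fmt.Printf("%d %q\n", docNum, terms)
		}
	}
}
```

With a map, the same traversal would need the keys extracted and sorted on every outer iteration. The dense slice pays for one slot per document even when a document has no terms, which is the sparse-field downside the commit message notes.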
Steve Yen 5d1a2b0ad7
Merge pull request #743 from steveyen/master
zap-based in-memory segment impl & various merge optimizations
2018-01-29 09:22:12 -08:00
Steve Yen 745575a6c1 scorch zap mergeStoredAndRemap uses array indexing, not append()
Since we have the right array size preallocated, we don't need the extra
capacity checking of append().
2018-01-27 11:35:10 -08:00
Steve Yen 8dd17a3b20 scorch zap mergeStoredAndRemap uses continue for less indentation 2018-01-27 11:35:10 -08:00
Steve Yen 0041664bc4 scorch zap merge computeNewDocCount() optimize 1 variable 2018-01-27 11:35:10 -08:00
Steve Yen 6985db13a0 scorch zap merge reuses docNumbers array 2018-01-27 11:35:10 -08:00
Steve Yen 916bbf4125 scorch zap merge prealloc's docTermMap capacity 2018-01-27 11:35:10 -08:00
Steve Yen 56cdb68f35 scorch zap merge checks err2 not err
Also, optimize the appending of the termSeparator so that the
docTermMap is accessed and updated just once.
2018-01-27 11:35:10 -08:00
Steve Yen 3030d4edb5 scorch zap merge preallocs segNewDocNums capacity 2018-01-27 11:35:10 -08:00
Steve Yen 9038d75c98 scorch zap allocate govarint.U64Base128Encoder just once
Instead of allocating a govarint.U64Base128Encoder in the inner loop,
allocate it just once on the outside, as it appears that it's just a
thin wrapper around binary.PutUvarint().
2018-01-27 11:35:10 -08:00
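The hoisting idea in the commit above can be sketched with the standard library alone. This example does not use govarint; it calls `binary.PutUvarint()` directly, which the commit message says the encoder thinly wraps. The `encodeUvarints` helper is a hypothetical name for illustration.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// encodeUvarints encodes each value as a uvarint and concatenates the
// results. The scratch buffer is set up once, outside the loop, rather
// than once per value.
func encodeUvarints(vals []uint64) []byte {
	var scratch [binary.MaxVarintLen64]byte
	out := make([]byte, 0, len(vals)*binary.MaxVarintLen64)
	for _, v := range vals {
		n := binary.PutUvarint(scratch[:], v)
		out = append(out, scratch[:n]...)
	}
	return out
}

func main() {
	fmt.Printf("% x\n", encodeUvarints([]uint64{1, 300}))
}
```

Any per-loop encoder object that only forwards to a stateless encoding call is a candidate for the same treatment: construct it once and reuse it for every value.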
Steve Yen 10dd5489c2 scorch zap Dict.postingsList() takes []byte for more mem control
This allows callers that already have a []byte term to avoid
string'ification garbage.
2018-01-27 11:35:10 -08:00
Steve Yen 6a17ff48c7 scorch zap removed unneeded []byte cast of term 2018-01-27 11:35:10 -08:00
Steve Yen d389e2bb40 scorch zap merge file cleanup on error, and some minor prealloc's 2018-01-27 11:35:10 -08:00
Steve Yen 29d526a7c2 scorch zap merge uses DefaultChunkFactor 2018-01-27 11:35:10 -08:00
Steve Yen 603425c2c5 scorch zap mergerLoop missing fireAsyncError case 2018-01-27 11:35:10 -08:00
Steve Yen 37121c3b49 scorch zap writeRoaringWithLen optimized with reused bufs 2018-01-27 11:35:10 -08:00
Steve Yen 5a035dc9aa scorch zap in-memory segment representation (SegmentBase)
The zap SegmentBase struct is a refactoring of the zap Segment into
the subset of fields that are needed for read-only ops, without any
persistence related info.  This allows us to use zap's optimized data
encoding as scorch's in-memory segments.

The zap Segment struct now embeds a zap SegmentBase struct, and layers
on persistence.  Both the zap Segment and zap SegmentBase implement
scorch's Segment interface.
2018-01-27 11:35:10 -08:00
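The layering described in the commit above maps onto Go struct embedding. The sketch below is a much-reduced illustration under stated assumptions: the one-method `Segment` interface, the field names, the `FileSegment` type name, and the `totalDocs` helper are all hypothetical, and the real zap types carry far more state.

```go
package main

import "fmt"

// Segment is a hypothetical, heavily reduced read-only interface.
type Segment interface {
	Count() uint64
}

// SegmentBase holds only what read-only operations need, with no
// persistence details, so it can serve as an in-memory segment.
type SegmentBase struct {
	mem     []byte // encoded segment data (unused in this sketch)
	numDocs uint64
}

// Count reports the number of documents in the segment.
func (sb *SegmentBase) Count() uint64 { return sb.numDocs }

// FileSegment embeds SegmentBase and layers persistence info on top.
// The embedded methods are promoted, so it satisfies Segment as well.
type FileSegment struct {
	SegmentBase
	path string // assumed persistence detail
}

// totalDocs works on either kind of segment through the interface.
func totalDocs(segs []Segment) uint64 {
	var total uint64
	for _, s := range segs {
		total += s.Count()
	}
	return total
}

func main() {
	segs := []Segment{
		&SegmentBase{numDocs: 2},
		&FileSegment{SegmentBase: SegmentBase{numDocs: 5}, path: "example.zap"},
	}
	fmt.Println(totalDocs(segs))
}
```

Code written against the base type or the interface then runs unchanged on in-memory and file-backed segments, which is the property the later merge commits rely on.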
Steve Yen dc62324e02 scorch zap miscellaneous typos 2018-01-27 11:35:10 -08:00
abhinavdangeti 567d756c27 Add support for certain disk stats
+ num_bytes_used_disk
+ num_files_on_disk
2018-01-24 14:10:14 -08:00
Marty Schoch 0fc9b4b74a
Merge pull request #742 from steveyen/scorch-unlock-needed
scorch unlocks in introduceSegment's DocNumbers() error codepath
2018-01-23 12:09:23 -05:00
Steve Yen 34fd77709f scorch unlocks in introduceSegment's DocNumbers() error codepath 2018-01-20 17:17:16 -08:00