Commit Graph

1887 Commits

Author SHA1 Message Date
Sreekanth Sivasankaran 606a270669 Fix for empty segment merge handling
Avoid creating new files for empty-segment tasks
during the merge operation, and skip the
incorrect appending of a newer segment during merge.
2018-02-15 16:44:20 +05:30
Sreekanth Sivasankaran 35611f4287
Merge branch 'master' into persister_pause 2018-02-14 16:53:06 +05:30
Sreekanth Sivasankaran 6f2797bec3 Adding a pause to persister until the merger
catches up
2018-02-14 16:39:26 +05:30
Steve Yen 030469a351
Merge pull request #767 from steveyen/persistSnapshot-err-handling
improvements to err handling in persistSnapshot(), etc
2018-02-13 14:53:42 -08:00
Steve Yen 2651ba4b19
Merge pull request #773 from steveyen/merge-enumerator
scorch zap segment merging via a new enumerator instead of vellum.MergeIterator
2018-02-13 13:05:39 -08:00
Steve Yen 57fc03258e scorch rollback ignores unsafeBatch flag
See also: https://github.com/blevesearch/bleve/issues/760
2018-02-13 10:21:42 -08:00
Steve Yen 29663c2795
Merge pull request #770 from steveyen/optimize-prealloced-postings-iterator
scorch zap segment merging reuses prealloc'ed PostingsIterator
2018-02-13 10:02:42 -08:00
Steve Yen fe544f3352 scorch zap merge uses enumerator for vellum.Iterator's 2018-02-12 21:28:46 -08:00
Steve Yen a073424e5a scorch zap dict.postingsListFromOffset() method
A helper method that can create a PostingsList if the caller already
knows the postingsOffset.
2018-02-12 20:54:07 -08:00
Steve Yen 2158e06c40 scorch zap merge collects dicts & itrs in lock-step
The theory with this change is that the dicts and itrs should be
positionally in "lock-step" with paired entries.

And, since later code also uses the same array indexing to access the
drops and newDocNums, those also need to be positionally in pair-wise
lock-step, too.
2018-02-12 20:54:07 -08:00
Steve Yen 95a4f37e5c scorch zap enumerator impl that joins multiple vellum iterators
Unlike vellum's MergeIterator, the enumerator introduced in this
commit doesn't merge when there are matching keys across iterators.

Instead, the enumerator implementation provides a traversal of all the
tuples of (key, iteratorIndex, val) from the underlying vellum
iterators, ordered by key ASC, iteratorIndex ASC.
2018-02-12 20:54:06 -08:00
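A minimal sketch of the traversal order described in the commit above, not the actual zap enumerator. The `sliceIter` type is a hypothetical stand-in for a vellum iterator (keys already sorted and unique within one iterator), and `enumerate` and `tuple` are illustrative names. The point is that equal keys are not merged: each one is emitted once per iterator, lowest iterator index first.

```go
package main

import "fmt"

// sliceIter is a hypothetical stand-in for a vellum iterator: it walks
// (key, val) pairs that are already sorted by key, with no duplicate keys.
type sliceIter struct {
	keys []string
	vals []uint64
	pos  int
}

func (s *sliceIter) valid() bool { return s.pos < len(s.keys) }

// tuple is one (key, iteratorIndex, val) entry produced by the traversal.
type tuple struct {
	key    string
	itrIdx int
	val    uint64
}

// enumerate returns every tuple from all iterators, ordered by key ASC and
// then iteratorIndex ASC. Matching keys across iterators are NOT merged.
func enumerate(itrs []*sliceIter) []tuple {
	var out []tuple
	for {
		lowest := -1
		for i, it := range itrs {
			if !it.valid() {
				continue
			}
			// Strict "<" keeps the lowest iterator index when keys tie.
			if lowest < 0 || it.keys[it.pos] < itrs[lowest].keys[itrs[lowest].pos] {
				lowest = i
			}
		}
		if lowest < 0 {
			return out // every iterator is exhausted
		}
		it := itrs[lowest]
		out = append(out, tuple{it.keys[it.pos], lowest, it.vals[it.pos]})
		it.pos++
	}
}

func main() {
	itrs := []*sliceIter{
		{keys: []string{"a", "c"}, vals: []uint64{1, 3}},
		{keys: []string{"a", "b"}, vals: []uint64{10, 20}},
	}
	for _, t := range enumerate(itrs) {
		fmt.Println(t.key, t.itrIdx, t.val)
	}
}
```

A linear scan over the iterators keeps the sketch short; with many iterators a heap keyed on (key, iteratorIndex) would be the usual alternative.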
Steve Yen a4c54c4389
Merge pull request #772 from abhinavdangeti/master
Update vendor'ed revision for moss to the latest
2018-02-12 11:12:44 -08:00
abhinavdangeti 846235593c Update vendor'ed revision for moss to the latest 2018-02-12 10:04:34 -08:00
Steve Yen e37c563c56 scorch zap merge move fieldDvLocsOffset var declaration
Move the var declaration nearer to where it's used.
2018-02-08 18:03:09 -08:00
Steve Yen f177f07613 scorch zap segment merging reuses prealloc'ed PostingsIterator
During zap segment merging, a new zap PostingsIterator was allocated
for every field X segment X term.

This change optimizes by reusing a single PostingsIterator instance
per persistMergedRest() invocation.

Also, unused fields are removed from the PostingsIterator.
2018-02-08 17:24:30 -08:00
Steve Yen 6f5f90cd41 scorch zap segment cleanup handling for some edge cases
Two cases in this commit...

If we're shutting down, the merger might not have handed off its
latest merged segment to the introducer yet, so the merger still owns
the segment and needs to Close() that segment itself.

In persistSnapshot(), there might be cases where the persister might
not be able to swap in its newly persisted segments -- so, the
persistSnapshot() needs to Close() those segments itself.
2018-02-08 14:04:04 -08:00
Steve Yen 83272a9629 scorch persistSnapshot() err handling & propagation 2018-02-08 14:03:59 -08:00
Steve Yen dee6a2b1c6 scorch persistSnapshot() consistently uses err to commit vs abort
Some codepaths in persistSnapshot() were saving errors into an err2
local variable, which might lead incorrectly to commit during an error
situation rather than abort.
2018-02-08 14:02:35 -08:00
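To illustrate the hazard described in the commit above, here is a hedged sketch of the general pattern rather than the real persistSnapshot() code. `fakeTx`, `persist`, `okStep`, and `failStep` are hypothetical names. Every step assigns to the single named return value `err`, so the deferred commit-or-abort decision cannot miss a failure that would otherwise sit in a second variable such as `err2`.

```go
package main

import (
	"errors"
	"fmt"
)

// fakeTx is a hypothetical stand-in for a storage transaction.
type fakeTx struct{ committed, rolledBack bool }

func (tx *fakeTx) Commit() error   { tx.committed = true; return nil }
func (tx *fakeTx) Rollback() error { tx.rolledBack = true; return nil }

// Illustrative steps: one that succeeds and one that fails.
func okStep() error   { return nil }
func failStep() error { return errors.New("write failed") }

// persist runs each step and uses the single named return value err to
// decide between commit and abort.
func persist(tx *fakeTx, steps ...func() error) (err error) {
	defer func() {
		if err == nil {
			err = tx.Commit()
		} else {
			_ = tx.Rollback()
		}
	}()
	for _, step := range steps {
		// Assigning to err (not a second variable such as err2) ensures
		// the deferred func above sees the failure.
		if err = step(); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	tx := &fakeTx{}
	err := persist(tx, okStep, failStep)
	fmt.Println(err, tx.committed, tx.rolledBack)
}
```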
Steve Yen 7b9fe0a216
Merge pull request #768 from steveyen/issue-764
scorch uses segment.id to encode boltdb sub-bucket key
2018-02-08 13:51:11 -08:00
Steve Yen 91ac0d011a scorch uses segment.id to encode boltdb sub-bucket key
fixes #764
2018-02-08 13:25:16 -08:00
Steve Yen 8a7990427f
Merge pull request #765 from steveyen/more-TestIndexRollback-fixes
fix for TestIndexRollback unit tests
2018-02-08 12:45:28 -08:00
Steve Yen 1552caeab9
Merge pull request #766 from steveyen/scorch-persistSnapshot-comment
scorch persistSnapshot comments update
2018-02-08 12:41:01 -08:00
Steve Yen d0644fec12 scorch persistSnapshot comments update
See also: https://github.com/blevesearch/bleve/issues/763
2018-02-08 12:22:58 -08:00
Steve Yen 99852accb0 scorch RollbackPoints() no error at start & fix TestIndexRollback
When a scorch is just opened and is "empty", RollbackPoints() no
longer considers that an error situation.

Also, this commit makes the TestIndexRollback unit tests a bit more
forgiving to races, as we were seeing failures sometimes in travis-CI
environments (TestIndexRollback was passing fine on my dev macbook).
The theory is the double-looping in the persisterLoop would sometimes
be racy, leading to 1 or 2 rollback points.
2018-02-08 11:45:25 -08:00
Marty Schoch ea20b1be42
Merge pull request #755 from steveyen/optimize-zap-merge-byte-copy-storedDocs
optimize zap merge byte copy stored docs
2018-02-08 12:27:50 -05:00
Steve Yen ed4826b189 scorch zap merge optimization to byte-copy storedDocs
The optimization to byte-copy all the storedDocs for a given segment
during merging kicks in when the fields are the same across all
segments and when there are no deletions for that given segment.  This
can happen, for example, during data loading or insert-only scenarios.

As part of this commit, the Segment.copyStoredDocs() method was added,
which uses a single Write() call to copy all the stored docs bytes of
a segment to a writer in one shot.

And, getDocStoredMetaAndCompressed() was refactored into a related
helper function, getDocStoredOffsets(), which provides the storedDocs
metadata (offsets & lengths) for a doc.
2018-02-08 09:08:35 -08:00
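The byte-copy fast path described in the commit above could look roughly like the following. This is a sketch under stated assumptions, not the zap implementation: the `segment` layout, its field names, and the free function `copyStoredDocs` are all hypothetical, and `countingWriter` exists only to show that a single Write() call is made.

```go
package main

import (
	"fmt"
	"io"
)

// countingWriter is a hypothetical io.Writer that records how many
// Write calls it receives.
type countingWriter struct {
	buf    []byte
	writes int
}

func (w *countingWriter) Write(p []byte) (int, error) {
	w.buf = append(w.buf, p...)
	w.writes++
	return len(p), nil
}

// segment is a simplified, hypothetical view of a segment: mem holds the
// raw bytes and [storedStart, storedEnd) is the stored-docs region.
type segment struct {
	mem                    []byte
	storedStart, storedEnd int
	numDeleted             int
}

// copyStoredDocs copies the whole stored-docs region with one Write call,
// but only when the fast path applies: fields are the same across all
// segments and this segment has no deletions. It reports whether it copied.
func copyStoredDocs(w io.Writer, seg *segment, fieldsSame bool) (bool, error) {
	if !fieldsSame || seg.numDeleted > 0 {
		return false, nil // caller falls back to doc-by-doc merging
	}
	_, err := w.Write(seg.mem[seg.storedStart:seg.storedEnd])
	return err == nil, err
}

func main() {
	seg := &segment{mem: []byte("HDRdoc0doc1FTR"), storedStart: 3, storedEnd: 11}
	w := &countingWriter{}
	copied, err := copyStoredDocs(w, seg, true)
	fmt.Println(copied, err, string(w.buf), w.writes)
}
```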
Steve Yen 0b50a20cac scorch zap move docDropped const to earlier in file 2018-02-08 09:06:31 -08:00
Steve Yen 822457542e scorch zap VERSION bump: check whether fields are the same at merge
COMPATIBILITY NOTE: scorch zap version bumped in this commit.

The version bump is because mergeFields() now computes whether fields
are the same across segments and it relies on the previous commit
where fieldID's are assigned in field name sorted order (albeit with
_id field always having fieldID of 0).

Potential future commits might rely on this info that "fields are the
same across segments" for more optimizations, etc.
2018-02-08 09:06:30 -08:00
Steve Yen ffdeb8055e scorch sorts fields by name to assign fieldID's
This is a stepping stone to allow easier future comparisons of field
maps and potential merge optimizations.

In bleve-blast tests on a 2015 macbook (50K wikipedia docs, 8
indexers, batch size 100, ssd), this does not seem to have a distinct
effect on indexing throughput.
2018-02-08 09:06:30 -08:00
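As a rough illustration of why name-sorted fieldID's make the "fields are the same across segments" check cheap, consider the sketch below. It is not the scorch code: `assignFieldIDs` and `sameFields` are hypothetical helpers, field names are assumed to be unique, and `_id` is always given fieldID 0 even if the caller does not list it.

```go
package main

import (
	"fmt"
	"sort"
)

// assignFieldIDs gives "_id" the fieldID 0 and numbers the remaining
// field names in sorted order, starting at 1. Names are assumed unique.
func assignFieldIDs(names []string) map[string]uint16 {
	rest := make([]string, 0, len(names))
	for _, n := range names {
		if n != "_id" {
			rest = append(rest, n)
		}
	}
	sort.Strings(rest)
	ids := map[string]uint16{"_id": 0}
	for i, n := range rest {
		ids[n] = uint16(i + 1)
	}
	return ids
}

// sameFields reports whether two segments map the same field names to the
// same fieldID's. With deterministic assignment, equal name sets imply
// equal maps regardless of the order in which fields were first seen.
func sameFields(a, b map[string]uint16) bool {
	if len(a) != len(b) {
		return false
	}
	for name, id := range a {
		if other, ok := b[name]; !ok || other != id {
			return false
		}
	}
	return true
}

func main() {
	segA := assignFieldIDs([]string{"title", "_id", "body"})
	segB := assignFieldIDs([]string{"body", "title", "_id"})
	fmt.Println(segA["_id"], segA["body"], segA["title"], sameFields(segA, segB))
}
```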
Marty Schoch 1af90936c4
Merge pull request #751 from sreekanth-cb/merger_persister_handshake_fix
fix for merger persister handshake stalemate
2018-02-08 11:03:01 -05:00
Marty Schoch 0bcfb15ace
Merge pull request #754 from sreekanth-cb/mergeplan_edge_tuning
tuning the edge for merge-task execution loop
2018-02-08 10:59:03 -05:00
Marty Schoch 534bd5ef4d
Merge pull request #753 from steveyen/zap-rollback-test-fixes
scorch zap TestIndexRollback fixes
2018-02-08 10:57:41 -05:00
Marty Schoch f531a248e7
Merge pull request #749 from sreekanth-cb/zapfile_cleanup_fix
unblock the files for clean up, esp for merged new segment files
2018-02-08 10:53:41 -05:00
Steve Yen 3d729c73c1
Merge pull request #758 from steveyen/scorch-optimizations-20180207
scorch optimizations via struct reuse
2018-02-08 06:16:27 -08:00
Sreekanth Sivasankaran feecce1eb2 fix for merger persister handshake stalemate
The slow merger, lagging behind the fast persister, was stuck
in a persister-notify send loop while the persister waited
for new introductions from the introducer, totally blocking
the merger.

This fix, along with the deleted-files eligibility flipping,
brings the file count to around 6 to 11 files per shard
for both the travel and beer samples.
2018-02-08 11:00:21 +05:30
Steve Yen a83ee0f364 scorch zap.MergeToWriter() takes SegmentBases instead of Segments
This change turns zap.MergeToWriter() into a public func, so that it's
now directly callable from outside packages (such as from scorch's
top-level merger or persister).  And, MergeToWriter() now takes input
of SegmentBases instead of Segments, so that it can now work on either
in-memory zap segments or file-based zap segments.

This is yet another stepping stone towards in-memory merging of zap
segments.
2018-02-07 14:38:13 -08:00
Steve Yen 8c2520d55c scorch zap optimize via postingsList reuse
pprof graphs were showing many postingsList allocations during
merging, so this change optimizes by reusing postingsList memory in the
merging loops.
2018-02-07 14:33:20 -08:00
Steve Yen 03c8b2b7ec scorch mem segment optimizes DictEntry's across Next() calls
This change optimizes the scorch/mem DictionaryIterator by reusing a
DictEntry struct across multiple Next() calls.  This follows the same
optimization trick and Next() semantics as upsidedown's FieldDict
implementation.
2018-02-07 14:17:48 -08:00
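The reuse trick in the commit above can be sketched as follows. This is a simplified illustration, not the scorch/mem code: `dictIterator` and this two-field `DictEntry` are hypothetical stand-ins. Next() returns a pointer to the same struct on every call, so a caller that wants to keep an entry must copy it before calling Next() again.

```go
package main

import "fmt"

// DictEntry is a simplified, hypothetical dictionary entry.
type DictEntry struct {
	Term  string
	Count uint64
}

// dictIterator walks parallel term/count slices and reuses one DictEntry
// across Next() calls instead of allocating a new one each time.
type dictIterator struct {
	terms  []string
	counts []uint64
	pos    int
	entry  DictEntry // reused across Next() calls
}

// Next returns the next entry, or nil when the iterator is exhausted.
// The returned pointer is only valid until the following Next() call.
func (d *dictIterator) Next() *DictEntry {
	if d.pos >= len(d.terms) {
		return nil
	}
	d.entry.Term = d.terms[d.pos]
	d.entry.Count = d.counts[d.pos]
	d.pos++
	return &d.entry
}

func main() {
	it := &dictIterator{terms: []string{"a", "b"}, counts: []uint64{2, 5}}
	for e := it.Next(); e != nil; e = it.Next() {
		kept := *e // copy if the entry must outlive the next call
		fmt.Println(kept.Term, kept.Count)
	}
}
```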
Steve Yen 78a7ae562f
Merge pull request #756 from steveyen/optimize-storedIndexOffset-loop
scorch zap mergeStoredAndRemap loop optimization
2018-02-06 18:00:34 -08:00
Steve Yen 0dfd73d6cc scorch zap mergeStoredAndRemap loop optimization
This change avoids an array/slice access in a loop body.
2018-02-06 17:10:44 -08:00
Steve Yen eb1d269521
Merge pull request #748 from steveyen/master
scorch zap merge related refactorings / optimizations
2018-02-06 07:52:17 -08:00
Sreekanth Sivasankaran 07274c036d tuning the edge for merge-task execution loop
Adjusting the merge task creation loop to accommodate
the newly merged segments so that the eventual merge
results/ number of segments stay within the calculated budget.
2018-02-06 13:48:16 +05:30
Steve Yen 1e36cdf358
Merge pull request #752 from steveyen/more-calc-budget-tests
more zap merge-planner CalcBudget tests at larger sizes
2018-02-05 12:51:42 -08:00
Steve Yen a280ba7cf8 scorch zap TestIndexRollback fixes
The TestIndexRollback unit test was failing more often than ever
(perhaps raciness?), so this commit tries to remove avenues of
raciness in the test...

- The Scorch.Open() method is refactored into a Scorch.openBolt()
  helper method in order to allow unit tests to control which
  background goroutines are started.

- TestIndexRollback() doesn't start the merger goroutine, to simulate
  a really slow merger that never gets around to merging old segments.

- TestIndexRollback() creates a long-lived reader after the first
  batch, so that the first index snapshot isn't removed due to the
  long-lived reader's ref-count.

- TestIndexRollback() temporarily bumps NumSnapshotsToKeep to a large
  number so the persister isn't tempted to removeOldData() for the
  snapshots that we're trying to roll back to.
2018-02-05 12:23:58 -08:00
Steve Yen fdb240f5f9 more zap merge-planner CalcBudget tests at larger sizes
Helps provide a sense of how # of segments grows as # of documents
grows.  Ex: 1B docs => budget of 54 segments.
2018-02-05 10:02:47 -08:00
Steve Yen c09e2a08ca scorch zap chunkedContentCoder reuses chunk metadata slice memory
And, renamed the chunk MetaData.DocID field to DocNum for naming
correctness, where much of this commit is the mechanical effect of
that rename.
2018-02-05 07:39:16 -08:00
Steve Yen 3da191852d scorch zap tighten up prepareSegment()'s lock area 2018-02-05 07:39:16 -08:00
Steve Yen 6578655758 scorch zap refactored out mergeToWriter() func
This is a step towards supporting in-memory zap segment merging.
2018-02-05 07:39:16 -08:00
Steve Yen eb21bf8315 scorch zap merge & build share persistStoredFieldValues()
Refactored out a helper func, persistStoredFieldValues(), that both
the persistence and merge codepaths now share.
2018-02-05 07:38:55 -08:00
Sreekanth Sivasankaran 9636209ae5
Update persister.go
comment updated
2018-02-05 20:49:30 +05:30