261 Commits

Author SHA1 Message Date
Abhinav Dangeti
c24f8944c4
Merge pull request #738 from abhinavdangeti/scorch-stats
Add support for certain disk stats
2018-02-01 08:35:59 -08:00
Steve Yen
93b037cdbb scorch zap TestMergeWithUpdates() 2018-01-31 11:44:41 -08:00
Steve Yen
4dd64b68fa scorch zap TestMergeWithEmptySegment(s) 2018-01-30 22:27:40 -08:00
Steve Yen
684ee3c0e7 scorch zap DictIterator term count fixed and more merge unit tests
The zap DictionaryIterator Next() was incorrectly returning the
postingsList offset as the term count.  As part of this, refactored
out a PostingsList.read() helper method.

Also added more merge unit test scenarios, including merging a segment
for a few rounds to see if there are differences before/after merging.
2018-01-30 21:22:06 -08:00
Steve Yen
634cfa0560 scorch zap chunkedIntCoder optimization to prealloc some final buf 2018-01-29 11:03:53 -08:00
Steve Yen
a444c25ddf scorch zap merge uses array for docTermMap with no sorting
Instead of sorting docNum keys from a hashmap, this change instead
iterates from docNum 0 to N and uses an array instead of hashmap.
The array is also reused across outer loop iterations.

This optimizes for when there's a lot of structural similarity between
docs, where many/most docs have the same fields.  i.e., beers,
breweries.  If every doc has completely different fields, then this
change might produce worse behavior compared to the previous sparse
hashmap approach.
2018-01-29 10:47:08 -08:00
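A minimal sketch of the idea in a444c25ddf, not the actual zap merge code: a dense slice indexed by docNum replaces a hashmap whose keys would need sorting, and the slice is reused for each field. The `posting` type, `termSeparator` value, and `visitDocTerms` function are hypothetical names used only for illustration.

```go
package main

import "fmt"

// termSeparator is a hypothetical delimiter byte, chosen only for this sketch.
const termSeparator byte = 0xff

// posting is a hypothetical (docNum, term) pair for one field.
type posting struct {
	docNum uint64
	term   string
}

// visitDocTerms calls visit for every doc that has terms in each field,
// in ascending docNum order, without sorting any keys.
func visitDocTerms(numDocs int, fields [][]posting, visit func(fieldID, docNum int, terms []byte)) {
	// One dense slice indexed by docNum, allocated once and reused per field.
	docTermMap := make([][]byte, numDocs)
	for fieldID, postings := range fields {
		for i := range docTermMap {
			docTermMap[i] = docTermMap[i][:0] // keep capacity, drop contents
		}
		for _, p := range postings {
			docTermMap[p.docNum] = append(append(docTermMap[p.docNum], p.term...), termSeparator)
		}
		// Walking 0..N-1 yields docNums already in order.
		for docNum := 0; docNum < numDocs; docNum++ {
			if len(docTermMap[docNum]) > 0 {
				visit(fieldID, docNum, docTermMap[docNum])
			}
		}
	}
}

func main() {
	fields := [][]posting{{{2, "ale"}, {0, "stout"}}}
	visitDocTerms(3, fields, func(fieldID, docNum int, terms []byte) {
		fmt.Printf("field=%d doc=%d terms=%q\n", fieldID, docNum, terms)
	})
}
```

As the commit message notes, the dense slice pays off when most docs share the same fields; with very sparse fields the scan over 0..N-1 touches many empty entries.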
Steve Yen
745575a6c1 scorch zap mergeStoredAndRemap uses array indexing, not append()
Since we have right array size preallocated, we don't need the extra
capacity checking of append().
2018-01-27 11:35:10 -08:00
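For 745575a6c1, the difference can be sketched as follows. Both functions are hypothetical stand-ins rather than the real mergeStoredAndRemap code; they produce the same result, but the indexed version skips the capacity check that append() performs on each call.

```go
package main

import "fmt"

// remapAppend preallocates capacity and fills the slice with append().
func remapAppend(old []uint64, base uint64) []uint64 {
	out := make([]uint64, 0, len(old))
	for _, d := range old {
		out = append(out, base+d)
	}
	return out
}

// remapIndexed preallocates the full length and writes by index.
func remapIndexed(old []uint64, base uint64) []uint64 {
	out := make([]uint64, len(old))
	for i, d := range old {
		out[i] = base + d
	}
	return out
}

func main() {
	fmt.Println(remapAppend([]uint64{0, 1, 2}, 10))
	fmt.Println(remapIndexed([]uint64{0, 1, 2}, 10))
}
```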
Steve Yen
8dd17a3b20 scorch zap mergeStoredAndRemap uses continue for less indentation 2018-01-27 11:35:10 -08:00
Steve Yen
0041664bc4 scorch zap merge computeNewDocCount() optimize 1 variable 2018-01-27 11:35:10 -08:00
Steve Yen
6985db13a0 scorch zap merge reuses docNumbers array 2018-01-27 11:35:10 -08:00
Steve Yen
916bbf4125 scorch zap merge prealloc's docTermMap capacity 2018-01-27 11:35:10 -08:00
Steve Yen
56cdb68f35 scorch zap merge checks err2 not err
Also, optimize the appending of the termSeparator so that the
docTermMap is accessed and updated just once.
2018-01-27 11:35:10 -08:00
Steve Yen
3030d4edb5 scorch zap merge preallocs segNewDocNums capacity 2018-01-27 11:35:10 -08:00
Steve Yen
9038d75c98 scorch zap allocate govarint.U64Base128Encoder just once
Instead of allocating a govarint.U64Base128Encoder in the inner loop,
allocate it just once on the outside, as it appears that it's just a
thin wrapper around binary.PutUvarint().
2018-01-27 11:35:10 -08:00
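A rough illustration of the pattern in 9038d75c98, using only the standard library's binary.PutUvarint() rather than the govarint wrapper: the scratch buffer is set up once outside the loop instead of once per value. The `encodeAll` helper is a hypothetical name.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// encodeAll appends the uvarint encoding of each value to dst.
// The scratch array is declared once, outside the loop.
func encodeAll(dst []byte, vals []uint64) []byte {
	var scratch [binary.MaxVarintLen64]byte
	for _, v := range vals {
		n := binary.PutUvarint(scratch[:], v)
		dst = append(dst, scratch[:n]...)
	}
	return dst
}

func main() {
	fmt.Printf("% x\n", encodeAll(nil, []uint64{1, 300}))
}
```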
Steve Yen
10dd5489c2 scorch zap Dict.postingsList() takes []byte for more mem control
This allows callers that already have a []byte term to avoid
string'ification garbage.
2018-01-27 11:35:10 -08:00
Steve Yen
6a17ff48c7 scorch zap removed unneeded []byte cast of term 2018-01-27 11:35:10 -08:00
Steve Yen
d389e2bb40 scorch zap merge file cleanup on error, and some minor prealloc's 2018-01-27 11:35:10 -08:00
Steve Yen
29d526a7c2 scorch zap merge uses DefaultChunkFactor 2018-01-27 11:35:10 -08:00
Steve Yen
603425c2c5 scorch zap mergerLoop missing fireAsyncError case 2018-01-27 11:35:10 -08:00
Steve Yen
37121c3b49 scorch zap writeRoaringWithLen optimized with reused bufs 2018-01-27 11:35:10 -08:00
Steve Yen
5a035dc9aa scorch zap in-memory segment representation (SegmentBase)
The zap SegmentBase struct is a refactoring of the zap Segment into
the subset of fields that are needed for read-only ops, without any
persistence related info.  This allows us to use zap's optimized data
encoding as scorch's in-memory segments.

The zap Segment struct now embeds a zap SegmentBase struct, and layers
on persistence.  Both the zap Segment and zap SegmentBase implement
scorch's Segment interface.
2018-01-27 11:35:10 -08:00
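The layering described in 5a035dc9aa can be sketched with Go struct embedding. This is a simplified illustration: the one-method `Segment` interface, the field names, and the `PersistedSegment` type are assumptions, not the actual zap definitions.

```go
package main

import "fmt"

// Segment is a hypothetical one-method stand-in for scorch's Segment interface.
type Segment interface {
	Count() uint64
}

// SegmentBase holds only what read-only operations need.
type SegmentBase struct {
	mem     []byte
	numDocs uint64
}

func (sb *SegmentBase) Count() uint64 { return sb.numDocs }

// PersistedSegment embeds SegmentBase and layers persistence info on top.
// The embedded methods are promoted, so it also satisfies Segment.
type PersistedSegment struct {
	SegmentBase
	path string
}

func main() {
	segs := []Segment{
		&SegmentBase{numDocs: 2},
		&PersistedSegment{SegmentBase: SegmentBase{numDocs: 3}, path: "0001.zap"},
	}
	for _, s := range segs {
		fmt.Println(s.Count())
	}
}
```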
Steve Yen
dc62324e02 scorch zap miscellaneous typos 2018-01-27 11:35:10 -08:00
abhinavdangeti
567d756c27 Add support for certain disk stats
+ num_bytes_used_disk
+ num_files_on_disk
2018-01-24 14:10:14 -08:00
Steve Yen
34fd77709f scorch unlocks in introduceSegment's DocNumbers() error codepath 2018-01-20 17:17:16 -08:00
abhinavdangeti
1176c73a9c Include overhead from data structures in segment's SizeInBytes
+ Account for all the overhead incurred from the data structures
  within mem.Segment and zap.Segment.
    - SizeOfMap = 8
    - SizeOfPointer = 8
    - SizeOfSlice = 24
    - SizeOfString = 16
+ Include overhead from certain new fields as well.
2018-01-17 11:11:44 -08:00
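The constants listed in 1176c73a9c match the header sizes Go uses on 64-bit platforms. A quick way to check them, assuming nothing beyond the standard `unsafe` package (the `overheads` helper is a hypothetical name):

```go
package main

import (
	"fmt"
	"unsafe"
)

// overheads reports the in-struct size of a map, pointer, slice, and
// string header on the current platform.
func overheads() (m, p, s, str uintptr) {
	var mp map[string]int
	var ptr *int
	var sl []int
	var st string
	return unsafe.Sizeof(mp), unsafe.Sizeof(ptr), unsafe.Sizeof(sl), unsafe.Sizeof(st)
}

func main() {
	m, p, s, str := overheads()
	fmt.Println("map:", m, "pointer:", p, "slice:", s, "string:", str)
}
```

These are header sizes only; the memory a map, slice, or string points to has to be accounted for separately.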
Steve Yen
71d6d1691b scorch zap optimizations of inner loops and easy preallocs 2018-01-15 23:04:23 -08:00
Steve Yen
d682c85a7b scorch mem segments uses backing array trick even more
This change invokes make() only once per distinct type to allocate the
large, contiguous backing arrays for the mem segment.
2018-01-15 19:17:39 -08:00
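One way to picture the backing array trick from d682c85a7b: a single make() provides one contiguous array, and each sub-slice is carved out of it with its own capacity. The `carve` helper below is a hypothetical sketch, not the mem segment code.

```go
package main

import "fmt"

// carve allocates one backing array and returns empty sub-slices whose
// capacities are given by sizes. The three-index slice expression caps
// each sub-slice so an append cannot spill into its neighbor.
func carve(sizes []int) [][]uint64 {
	total := 0
	for _, n := range sizes {
		total += n
	}
	backing := make([]uint64, total) // the only large allocation
	out := make([][]uint64, len(sizes))
	off := 0
	for i, n := range sizes {
		out[i] = backing[off : off : off+n]
		off += n
	}
	return out
}

func main() {
	subs := carve([]int{2, 3})
	subs[0] = append(subs[0], 1, 2)
	subs[1] = append(subs[1], 9)
	fmt.Println(subs, cap(subs[0]), cap(subs[1]))
}
```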
Steve Yen
0f19b542a3 scorch mem segment prealloc's Locfields/starts/ends/pos/arraypos
This change preallocates more of the backing arrays for Locfields,
Locstarts, Locends, Locpos, Locarraypos sub-slices of a scorch mem
segment.

On small bleve-blast tests (50K wiki docs) on a dev macbook, scorch
indexing throughput seems to improve from 15MB/sec to 20MB/sec after
the recent series of preallocation changes.
2018-01-15 18:40:28 -08:00
Steve Yen
a84bd122d2 scorch mem segment preallocates sub-slices via # terms
This change tracks the number of terms per posting list to
preallocate the sub-slices for the Freqs & Norms.
2018-01-15 18:20:43 -08:00
Steve Yen
a4110d325c scorch mem segment preallocates slices that are key'ed by postingId
The scorch mem segment build phase uses the append() idiom to populate
various slices that are keyed by postings list id's.  These slices
include...

* Postings
* PostingsLocs
* Freqs
* Norms
* Locfields
* Locstarts
* Locends
* Locpos
* Locarraypos

This change introduces an initialization step that preallocates those
slices up-front, by assigning postings list id's to terms up-front.

This change also has an additional effect of simplifying the
processDocument() logic to no longer have to worry about a first-time
initialization case, removing some duplicate'ish code.
2018-01-15 16:53:39 -08:00
Steve Yen
917c470791 scorch mem segment VisitDocument() accesses StoredTypes/Pos outside of loop 2018-01-15 11:54:46 -08:00
Steve Yen
e7bd6026eb scorch mem segment preallocs docMap/fieldLens with capacity
The first time through, startNumFields should be 0, where there ought
to be more optimization assuming later docs have similar fields as the
first doc.
2018-01-15 11:52:20 -08:00
Steve Yen
d777d7c365 scorch mem segment comments consistency 2018-01-15 11:08:21 -08:00
Marty Schoch
4e82a8a0ca
Merge pull request #726 from sreekanth-cb/docValue_configs
DocValue Config, new API Changes
2018-01-10 18:11:18 -05:00
Sreekanth Sivasankaran
53aef2104e fixing err handling in UTs, name changes 2018-01-10 22:00:26 +05:30
abhinavdangeti
43bfcc00c9 Do not account mmap'ed part of zap segments in MemoryUsed
This API is designed to emit only the dirty "unpersisted"
bytes. This does not include the mmap'ed part of the
zap segments (disk).
2018-01-09 09:43:53 -08:00
Sreekanth Sivasankaran
4c256f5669 DocValue Config, new API Changes
-VisitableDocValueFields API for persisted DV field list
-making dv configs overridable at field level
-enabling on the fly/runtime uninverting of doc values
-few UT updates
2018-01-08 10:58:33 +05:30
Marty Schoch
1788a03803 remove junk from end of scorch readme 2018-01-06 21:09:53 -05:00
Marty Schoch
e756c7acf0 add initial support for async error callback 2018-01-05 16:43:16 -05:00
Marty Schoch
6237479605 fix race condition in setting up event callbacks
previous approach used SetEventCallback method which allowed
you to change the callback; unfortunately that also included
times after the goroutines were started and potentially firing
the callback.

checking lock on this would be too expensive, so instead we go
for an approach that allows callbacks to be registered by name
during process init(), then upon opening up an index a string
config key 'eventCallbackName' is used to look up the
appropriate callback function.  also, since this string config
name is serializable, it fits into the existing bleve index
metadata without any new issues.
2018-01-05 13:46:03 -05:00
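The approach in 6237479605 might look roughly like the sketch below: callbacks are registered by name in init(), before any goroutines start, and later resolved through the 'eventCallbackName' config key. The `EventCallback` type, `RegisterEventCallback`, and `lookupCallback` are hypothetical names for illustration only.

```go
package main

import "fmt"

// EventCallback is a hypothetical callback signature.
type EventCallback func(kind int)

// registry is written only during init(), so later reads need no lock.
var registry = map[string]EventCallback{}

// RegisterEventCallback is meant to be called from init() functions.
func RegisterEventCallback(name string, cb EventCallback) {
	registry[name] = cb
}

// lookupCallback resolves the callback named in the index config, if any.
func lookupCallback(config map[string]interface{}) EventCallback {
	name, ok := config["eventCallbackName"].(string)
	if !ok {
		return nil
	}
	return registry[name]
}

func init() {
	RegisterEventCallback("logger", func(kind int) {
		fmt.Println("event kind:", kind)
	})
}

func main() {
	cfg := map[string]interface{}{"eventCallbackName": "logger"}
	if cb := lookupCallback(cfg); cb != nil {
		cb(1)
	}
}
```

Because the config value is a plain string, it serializes with the rest of the index metadata, which is the property the commit message points out.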
Marty Schoch
57a075afdb improving command-line tool for scorch 2018-01-05 11:50:07 -05:00
Marty Schoch
c691cd2bb5 refactor scorch/zap command-line tools under bleve
zap command-line tool added to main bleve command-line tool
this required physical relocation due to the vendoring used
only on the bleve command-line tool (unforeseen limitation)

a new scorch command-line tool has also been introduced
and for the same reasons it is physically stored under
the top-level bleve command-line tool as well
2018-01-05 10:17:18 -05:00
Abhinav Dangeti
dee1dd9bc8
Merge pull request #720 from abhinavdangeti/scorch
Updated Rollback APIs
2018-01-04 14:51:33 -08:00
abhinavdangeti
111f0d0721 Updated Rollback APIs
New APIs:
+ RollbackPoints()
    - Retrieves the available list of rollback points: epoch+meta.
    - The application will need to check with the meta to decide
    on the rollback point.
+ Rollback()
    - API requires a rollback point identified by the first API.
    - Atomically & Durably rolls back the index to specified point,
    provided the specified rollback point is still available.
+ Unit test: TestIndexRollback
    - Writes a batch.
    - Sets the rollback point.
    - Writes second batch.
    - Rollback to previously decided point.
    - Ensure that data is as is before the second batch.
2018-01-04 13:21:58 -08:00
Marty Schoch
71cdac785d
Merge pull request #703 from sreekanth-cb/docValue_persisted
docValue persist changes
2018-01-04 10:34:58 -05:00
Sreekanth Sivasankaran
71a726bbf6 perf issue was due to duplicate fieldIDs getting
inserted to the list of dv enabled fields list -
DocValueFields in mem segment.
Moved back to the original type `DocValueFields map[uint16]bool`
for easy look up to check whether the fieldID is
configured for dv storage.
2018-01-04 15:34:55 +05:30
Sreekanth Sivasankaran
f42ecb0ac7 docvalue "zap-path" cmd to print out the dv disk sizes 2018-01-04 13:58:51 +05:30
Marty Schoch
1a59a1bb99 attempt to fix core reference counting issues
Observed problem:

Persisted index state (in root bolt) would contain index snapshots which
pointed to index files that did not exist.

Debugging this uncovered two main problems:

1.  At the end of persisting a snapshot, the persister creates a new index
snapshot with the SAME epoch as the current root, only it replaces in-memory
segments with the new disk based ones.  This is problematic because reference
counting an index segment triggers "eligible for deletion".  And eligible for
deletion is keyed by epoch.  So having two separate instances going by the same
epoch is problematic.  Specifically, one of them gets to 0 before the other,
and we wrongly conclude it's eligible for deletion, when in fact the "other"
instance with same epoch is actually still in use.

To address this problem, we have modified the behavior of the persister.  Now,
upon completion of persistence, ONLY if new files were actually created do we
proceed to introduce a new snapshot.  AND, this new snapshot now gets its own
brand new epoch.  BOTH of these are important because since the persister now
also introduces a new epoch, it will see this epoch again in the future AND be
expected to persist it.  That is OK (mostly harmless), but we cannot allow it
to form a loop.  Checking that new files were actually introduced is what
short-circuits the potential loop.  The new epoch introduced by the persister,
if seen again will not have any new segments that actually need persisting to
disk, and the cycle is stopped.

2.  The implementation of NumSnapshotsToKeep, and related code to delete old
snapshots from the root bolt also contains problems.  Specifically, the
determination of which snapshots to keep vs delete did not consider which ones
were actually persisted.  So, let's say you had set NumSnapshotsToKeep to 3, if
the introducer gets 3 snapshots ahead of the persister, what can happen is that
the three snapshots we choose to keep are all in memory.  We now wrongly delete
all of the snapshots from the root bolt.  But it gets worse, in this instant of
time, we now have files on disk that nothing in the root bolt points to, so we
also go ahead and delete those files.  Those files were still being referenced
by the in-memory snapshots.  But, now even if they get persisted to disk, they
simply have references to non-existent files.  Opening up one of these indexes
results in lost data (often everything).

To address this problem, we made a large change to the way this section of code
operates.  First, we now start with a list of all epochs actually persisted in
the root bolt.  Second, we set aside NumSnapshotsToKeep of these snapshots to
keep.  Third, anything else in the eligibleForRemoval list will be deleted.  I
suspect this code is slower and less elegant, but I think it is more correct.
Also, previously NumSnapshotsToKeep defaulted to 0, I have now defaulted it to
1, which feels like saner out-of-the-box behavior (though it's debatable if the
original intent was perhaps instead for "extra" snapshots to keep, but with the
variable named as it is, 1 makes more sense to me)

Other minor changes included in this change:

- Location of 'nextSnapshotEpoch', 'eligibleForRemoval', and
'ineligibleForRemoval' members of Scorch struct were moved into the
paragraph with 'rootLock' to clarify that you must hold the lock to access it.

- TestBatchRaceBug260 was updated to properly Close() the index; not closing it
led to occasional test failures.
2018-01-03 12:05:00 -05:00
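The three-step cleanup described under problem 2 of 1a59a1bb99 could be sketched like this. It is an illustration under stated assumptions, not the scorch implementation: it assumes the snapshots set aside are the newest persisted epochs (the message does not say which ones are kept), and `epochsToRemove` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"sort"
)

// epochsToRemove starts from the epochs actually persisted in the root
// bolt, protects the numSnapshotsToKeep newest of them (an assumption),
// and returns every other epoch found in eligibleForRemoval.
func epochsToRemove(persisted, eligibleForRemoval []uint64, numSnapshotsToKeep int) []uint64 {
	sorted := append([]uint64(nil), persisted...) // copy; do not reorder the input
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] > sorted[j] })
	if numSnapshotsToKeep < 0 {
		numSnapshotsToKeep = 0
	}
	if numSnapshotsToKeep > len(sorted) {
		numSnapshotsToKeep = len(sorted)
	}
	protected := make(map[uint64]struct{}, numSnapshotsToKeep)
	for _, e := range sorted[:numSnapshotsToKeep] {
		protected[e] = struct{}{}
	}
	var remove []uint64
	for _, e := range eligibleForRemoval {
		if _, keep := protected[e]; !keep {
			remove = append(remove, e)
		}
	}
	return remove
}

func main() {
	fmt.Println(epochsToRemove([]uint64{1, 2, 3, 4}, []uint64{1, 2, 3}, 2))
}
```

The key point from the commit message is that the keep decision is made over persisted epochs only, so snapshots that exist solely in memory can no longer crowd out the ones on disk.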
Sreekanth Sivasankaran
448201243a removed redundant buf writer, and checks 2017-12-30 16:54:06 +05:30
Sreekanth Sivasankaran
61ba81e964 Merge branch 'scorch', remote-tracking branch 'origin' into docValue_persisted 2017-12-30 16:52:51 +05:30
Marty Schoch
29b63cfe43
Merge pull request #711 from abhinavdangeti/scorch3
Tracking memory consumption for a scorch index
2017-12-29 12:52:32 -08:00
abhinavdangeti
5c26f5a86d Tracking memory consumption for a scorch index
+ Track memory usage at a segment level
+ Add a new scorch API: MemoryUsed()
    - Aggregate the memory consumption across
      segments when API is invoked.

+ TODO:
    - Revisit whether the second iteration can be gotten
      rid of, with the size accounted for during the first
      run while building an in-mem segment.
    - Accounting for pointer and slice overhead.
2017-12-29 10:20:11 -07:00
abhinavdangeti
055d3e12df Adding onEvent callback support for scorch
Event types:
- EventKindCloseStart
- EventKindClose
- EventKindMergerProgress
- EventKindPersisterProgress
- EventKindBatchIntroductionStart
- EventKindBatchIntroduction
2017-12-29 09:47:25 -07:00
Sreekanth Sivasankaran
c8df014c0c Updated readme, zap version, added new docvalue cmd,
fixed the footer and fields cmd,
interface name updated
2017-12-29 21:39:29 +05:30
abhinavdangeti
4bede84fd0 Wiring up missing stats for scorch
- updates, deletes, batches, errors
- term_searchers_started, term_searchers_finished
- num_plain_text_bytes_indexed
2017-12-28 14:07:58 -07:00
abhinavdangeti
becd4677cd Adding num_items_introduced, num_items_persisted stats
+ Adding new entries to the stats struct of scorch.
+ These stats are atomically incremented upon every segment
  introduction, and upon successful persistence.
2017-12-28 14:07:44 -07:00
Sreekanth Sivasankaran
8abac42796 errCheck fixes 2017-12-28 13:23:57 +05:30
Sreekanth Sivasankaran
0272451093 adding checks for robustness 2017-12-28 13:05:25 +05:30
Sreekanth Sivasankaran
76f827f469 docValue persist changes
docValues are persisted along with the index,
in a columnar fashion per field with variable
sized chunking for quick look up.
-naive chunk level caching is added per field
-data part inside a chunk is snappy compressed
-metaHeader inside the chunk indexes the dv values
 inside the uncompressed data part
-all the fields are docValue persisted in this iteration
2017-12-28 12:05:33 +05:30
abhinavdangeti
dcabc267a0 Wait for rollback'ed snapshot to persist 2017-12-27 10:06:29 -07:00
Steve Yen
c7a342bc7d scorch conjuncts match phrase test passes
The conjunction searcher Advance() method now checks if its curr
doc-matches suffices before advancing them.
2017-12-23 09:19:40 -08:00
Steve Yen
a884f38bf6 scorch docInternalToNumber returns 0 on error 2017-12-21 16:44:31 -08:00
Steve Yen
67e0e5973b scorch mergeStoredAndRemap() memory reuse
In mergeStoredAndRemap(), instead of allocating new hashmaps for each
document, this commit reuses some arrays that are indexed by fieldId.
2017-12-20 15:18:22 -08:00
Steve Yen
c155255506 scorch optimize zap.Merge() to reuse some buffers 2017-12-20 14:59:53 -08:00
Steve Yen
ea4eb7301b scorch merger checks closeCh 2017-12-20 14:59:53 -08:00
Steve Yen
04ac9d5b1f scorch removeOldBoltSnapshots() deletes from correct bucket 2017-12-20 14:46:48 -08:00
Steve Yen
df6c8f4074 scorch added kvconfig unsafe_batch option
Added an option to the kvconfig JSON, called "unsafe_batch" (bool).
Default is false, so Batch() calls are synchronously persisted by
default.  Advanced users may want unsafe, asynchronous persistence
to trade off safety for performance (mutations are queryable sooner).

    {
      "index_type": "scorch",
      "kvconfig": { "unsafe_batch": true }
    }

This change replaces the previous kvstore=="moss" workaround.
2017-12-20 10:11:55 -08:00
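As a sketch of how the JSON from df6c8f4074 might be read, the following uses only encoding/json. The `indexConfig` struct and `unsafeBatch` helper are hypothetical; a missing key yields false, matching the default stated in the commit message.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// indexConfig is a hypothetical view of the index metadata JSON.
type indexConfig struct {
	IndexType string                 `json:"index_type"`
	KVConfig  map[string]interface{} `json:"kvconfig"`
}

// unsafeBatch reports the "unsafe_batch" option, defaulting to false
// when the key is absent or not a bool.
func unsafeBatch(raw []byte) (bool, error) {
	var c indexConfig
	if err := json.Unmarshal(raw, &c); err != nil {
		return false, err
	}
	v, _ := c.KVConfig["unsafe_batch"].(bool)
	return v, nil
}

func main() {
	raw := []byte(`{"index_type": "scorch", "kvconfig": {"unsafe_batch": true}}`)
	v, err := unsafeBatch(raw)
	fmt.Println(v, err)
}
```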
Steve Yen
1abbfadf0d scorch simplify err check after vellum load 2017-12-19 22:34:39 -08:00
Steve Yen
dbc88cf6b3 scorch docNumberToBytes() checks cap(buf) before allocating
With more pprof focusing (zooming in on a particular func), there were
still some memory allocations showing up with docNumberToBytes() in
micro benchmarks of bleve-query.  On a dev macbook, on an index of 50K
wikipedia docs, using search of relatively common "text:date"...

   400 qps - upsidedown/moss
   680 qps - scorch before
   775 qps - scorch after
2017-12-19 19:15:19 -08:00
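The cap(buf) check from dbc88cf6b3 can be illustrated as follows. The signature and the big-endian encoding are assumptions made for this sketch; the point is that a caller-supplied buffer with enough capacity is resliced rather than reallocated.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// docNumberToBytes writes in as 8 big-endian bytes, reusing buf when its
// capacity allows and allocating only otherwise.
func docNumberToBytes(buf []byte, in uint64) []byte {
	if cap(buf) < 8 {
		buf = make([]byte, 8)
	} else {
		buf = buf[:8]
	}
	binary.BigEndian.PutUint64(buf, in)
	return buf
}

func main() {
	var buf []byte
	for _, n := range []uint64{1, 258} {
		buf = docNumberToBytes(buf, n) // allocates on the first call only
		fmt.Printf("% x\n", buf)
	}
}
```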
Steve Yen
8f8333e01b scorch optimize zap Count()
This proposed approach avoids building a temporary AndNot() bitmap,
following the same kind of optimization used by mem segments.
2017-12-19 18:02:27 -08:00
Steve Yen
a0556ad65b scorch added more cases to TestIndexInsertThenDelete 2017-12-19 16:41:56 -08:00
Steve Yen
142ccdfaec scorch remove leftover doc comment
I'm suspecting that Marty's editor is more exciting than mine. :-)
2017-12-19 13:53:04 -08:00
Steve Yen
c0e09d8906
Merge pull request #676 from steveyen/scorch
scorch avoid extra clone by using roaring.AndNot(x, y)
2017-12-19 13:52:40 -08:00
Steve Yen
f8b52f5e68
Merge pull request #674 from abhinavdangeti/scorch
scorch APIs to support rollback
2017-12-19 13:38:47 -08:00
Steve Yen
d0e4f85026 scorch avoid extra clone by using roaring.AndNot(x, y) 2017-12-19 13:37:04 -08:00
abhinavdangeti
679f1ce9c3 scorch APIs to support rollback
- PreviousPersistedSnapshot
- SnapshotRevert

+ unit test
2017-12-19 10:53:08 -08:00
Steve Yen
f6b506134b import couchbase/vellum instead of couchbaselabs/vellum
Also, scrubbed an old couchbaselabs/moss reference in comments.

Also, go fmt.
2017-12-19 10:49:57 -08:00
Steve Yen
730d906a50 scorch reuses Posting instance in PostingsIterator.Next()
With this change, there are no more memory allocations in the calls to
PostingsIterator.Next() in the micro benchmarks of bleve-query.  On a
dev macbook, on an index of 50K wikipedia docs, using high frequency
search of "text:date"...

   400 qps - upsidedown/moss
   565 qps - scorch before
   680 qps - scorch after
2017-12-18 16:15:38 -08:00
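The instance reuse in 730d906a50 follows a common Go iterator pattern, sketched here with hypothetical, heavily simplified types. Next() overwrites one Posting owned by the iterator and returns a pointer to it, so no allocation happens per call.

```go
package main

import "fmt"

// Posting is a simplified stand-in for a postings entry.
type Posting struct {
	DocNum uint64
	Freq   uint64
}

// PostingsIterator owns a single Posting that Next() overwrites.
type PostingsIterator struct {
	docs []uint64
	pos  int
	next Posting
}

// Next returns a pointer to the reused Posting, or nil when done.
// The result is only valid until the following call to Next().
func (it *PostingsIterator) Next() *Posting {
	if it.pos >= len(it.docs) {
		return nil
	}
	it.next = Posting{DocNum: it.docs[it.pos], Freq: 1}
	it.pos++
	return &it.next
}

func main() {
	it := &PostingsIterator{docs: []uint64{5, 7}}
	for p := it.Next(); p != nil; p = it.Next() {
		fmt.Println(p.DocNum)
	}
}
```

The trade-off is that callers must copy a Posting if they need it past the next call.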
Steve Yen
867bb2c031 scorch mergeplan explicitly weeds out empty segments
Rather than waiting on scoring to weed out empty segments, this commit
does the weeding out of empty segments explicitly and up front.
2017-12-18 11:33:19 -08:00
Steve Yen
20fe70770a scorch added some tests on # of expected segments 2017-12-17 12:39:15 -08:00
Steve Yen
34f5e2175f scorch fix persister for lost notifications on no-data batches
With the previous commit, there can be a scenario where batches that
had internal-updates-only can be rapidly introduced by the app, but
the persisted notifications on only the very last IndexSnapshot would
be fired.  The persisted notifications on the in-between batches might
be missed.

The solution was to track the persisted notification channels at a
higher Scorch struct level, instead of tracking the persisted channels
at the IndexSnapshot and SegmentSnapshot levels.

Also, the persister double-check looping was simplified, which avoids
a race where an introducer might incorrectly not notify the persister.
2017-12-17 12:30:05 -08:00
Steve Yen
ecbb3d2df4 scorch handles non-updating batches better
This commit improves handling when an incoming batch has internal-data
updates only and no doc updates.  In this case, a nil segment instead
of an empty segment instance is used in the segmentIntroduction.  The
segmentIntroduction, that is, might now hold internal-data updates
only.

To handle synchronous persistence, a new field that's a slice of
persisted notification channels is added to the IndexSnapshot struct,
which the persister goroutine will close as each IndexSnapshot is
persisted.

Also, as part of this change, instead of checking the unsafeBatch flag
in several places, we instead check for non-nil'ness of these
persisted chan's.
2017-12-17 08:51:23 -08:00
Steve Yen
e98602600d scorch mergeplan added TierGrowth option
Previously, CalcBudget() was treating
MergePlanOptions.SegmentsPerMergeTask as the growth factor while
computing the idealized staircase of segments.

This change introduces a TierGrowth option to MergePlanOptions for
more control and so that SegmentsPerMergeTask can be tweaked
independently of the tier growth factor.
2017-12-16 14:22:15 -08:00
Steve Yen
0539744e90 scorch mergeplan.ToBarChart() refactored to callable API
Refactored out API so it's usable from other places.
2017-12-16 08:39:10 -08:00
Steve Yen
dc4df18001
Merge pull request #662 from steveyen/scorch
scorch mergeplan package comments tweak
2017-12-15 18:41:20 -08:00
Marty Schoch
a575be4d56 fix issue where we incorrectly seed the nextSegmentID on Open() 2017-12-15 19:26:23 -05:00
Steve Yen
45c212a0c2 scorch mergeplan package comments tweak
Moving the package comment for mergeplan to the right place.
2017-12-15 13:25:39 -08:00
Steve Yen
620dcdb6f8 scorch uses prealloc'ed buffer for docNumberToBytes()
On a couple of micro benchmarks on a dev macbook using bleve-query on
an index of 50K wikipedia docs, scorch is now faster than
upsidedown/moss on high-freq term search "text:date"...

       400 qps - upsidedown/moss
       404 qps - scorch before
       565 qps - scorch after
2017-12-15 11:58:21 -08:00
Steve Yen
f05794c6aa scorch removed worker goroutines from TermFieldReader()
On a couple of micro benchmarks on a dev macbook using bleve-query on
an index of 50K wikipedia docs, scorch is now more in the same
neighborhood of upsidedown/moss...

high-freq term search "text:date"...
   400 qps - upsidedown/moss
   360 qps - scorch before
   404 qps - scorch after

zero-freq term search "text:mschoch"...
  100K qps - upsidedown/moss
   55K qps - scorch before
   99K qps - scorch after

Of note, the scorch index had ~150 *.zap files in it, which likely
made the worker goroutine overhead more costly than for a case
with few segments, where goroutine and channel related work appeared
relatively prominently in the pprof SVG's.
2017-12-15 11:11:18 -08:00
Marty Schoch
562b473e36
Merge pull request #657 from steveyen/scorch
scorch fix data race w/ AddEligibleForRemoval
2017-12-14 17:56:06 -05:00
Marty Schoch
b5aa4ed22b return err not panic 2017-12-14 17:41:02 -05:00
Steve Yen
506aa1c325 scorch fix data race w/ AddEligibleForRemoval
Found from "go test -race ./..."

WARNING: DATA RACE
Read at 0x00c420088060 by goroutine 48:
  github.com/blevesearch/bleve/index/scorch.(*Scorch).AddEligibleForRemoval()
      /Users/steveyen/go/src/github.com/blevesearch/bleve/index/scorch/scorch.go:348 +0x6d

Previous write at 0x00c420088060 by goroutine 31:
  github.com/blevesearch/bleve/index/scorch.(*Scorch).loadFromBolt.func1()
      /Users/steveyen/go/src/github.com/blevesearch/bleve/index/scorch/persister.go:332 +0x87b
  github.com/boltdb/bolt.(*DB).View()
      /Users/steveyen/go/src/github.com/boltdb/bolt/db.go:629 +0xc1
  github.com/blevesearch/bleve/index/scorch.(*Scorch).loadFromBolt()
      /Users/steveyen/go/src/github.com/blevesearch/bleve/index/scorch/persister.go:290 +0xa1
  github.com/blevesearch/bleve/index/scorch.(*Scorch).Open()
      /Users/steveyen/go/src/github.com/blevesearch/bleve/index/scorch/scorch.go:121 +0x77f
  github.com/blevesearch/bleve/index/scorch.TestIndexOpenReopen()
      /Users/steveyen/go/src/github.com/blevesearch/bleve/index/scorch/scorch_test.go:115 +0x1351
  testing.tRunner()
      /usr/local/Cellar/go/1.9/libexec/src/testing/testing.go:746 +0x16c

Goroutine 48 (running) created at:
  github.com/blevesearch/bleve/index/scorch.(*IndexSnapshot).DecRef()
      /Users/steveyen/go/src/github.com/blevesearch/bleve/index/scorch/snapshot_index.go:72 +0x23e
  github.com/blevesearch/bleve/index/scorch.(*Scorch).loadFromBolt.func1()
      /Users/steveyen/go/src/github.com/blevesearch/bleve/index/scorch/persister.go:330 +0x8f4
  github.com/boltdb/bolt.(*DB).View()
      /Users/steveyen/go/src/github.com/boltdb/bolt/db.go:629 +0xc1
  github.com/blevesearch/bleve/index/scorch.(*Scorch).loadFromBolt()
      /Users/steveyen/go/src/github.com/blevesearch/bleve/index/scorch/persister.go:290 +0xa1
  github.com/blevesearch/bleve/index/scorch.(*Scorch).Open()
      /Users/steveyen/go/src/github.com/blevesearch/bleve/index/scorch/scorch.go:121 +0x77f
  github.com/blevesearch/bleve/index/scorch.TestIndexOpenReopen()
      /Users/steveyen/go/src/github.com/blevesearch/bleve/index/scorch/scorch_test.go:115 +0x1351
  testing.tRunner()
      /usr/local/Cellar/go/1.9/libexec/src/testing/testing.go:746 +0x16c
2017-12-14 14:40:33 -08:00
Marty Schoch
6ab27e4afa quick hack to disable safe batches in fts 2017-12-14 17:19:50 -05:00
Steve Yen
eb2f541d4f scorch filters _id from Reader.Document() results 2017-12-14 13:52:28 -08:00
Steve Yen
a8884e1011 scorch fix for TestSortMatchSearch
The cachedDocs preparation has to happen for all docs in the field,
not just on the currently requested docNum.

Also, as part of this commit, there's a loop optimization where we no
longer use bytes.Split() on the terms buffer, thus avoiding garbage
creation.
2017-12-14 13:22:13 -08:00
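The loop optimization mentioned in a8884e1011 can be sketched as a scan with bytes.IndexByte that hands out sub-slices of the original buffer, instead of letting bytes.Split() build a slice of slices. The `visitTerms` helper and the separator byte in the example are assumptions for illustration.

```go
package main

import (
	"bytes"
	"fmt"
)

// visitTerms calls visit for each sep-delimited term in buf without
// allocating an intermediate [][]byte.
func visitTerms(buf []byte, sep byte, visit func(term []byte)) {
	for len(buf) > 0 {
		i := bytes.IndexByte(buf, sep)
		if i < 0 {
			visit(buf) // trailing term with no separator
			return
		}
		visit(buf[:i])
		buf = buf[i+1:]
	}
}

func main() {
	visitTerms([]byte("ale\xffstout\xff"), 0xff, func(term []byte) {
		fmt.Printf("%s\n", term)
	})
}
```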
Steve Yen
2be5eb4427 scorch tracks zap files that can't be removed yet
A race & solution found by Marty Schoch... consider a case when the
merger might grab a nextSegmentID, like 4, but takes a while to
complete.  Meanwhile, the persister grabs the nextSegmentID of 5, but
finishes its persistence work fast, and then loops to cleanup any old
files.  The simple approach of checking a "highest segment ID" of 5 is
wrong now, because the deleter now thinks that segment 4's zap file is
(incorrectly) ok to delete.

The solution in this commit is to track an ephemeral map of filenames
which are ineligibleForRemoval, because they're still being written
(by the merger) and haven't been fully incorporated into the rootBolt
yet.

The merger adds to that ineligibleForRemoval map as it starts a merged
zap file, the persister cleans up entries from that map when it
persists zap filenames into the rootBolt, and the deleter (part of the
persister's loop) consults the map before performing any actual zap
file deletions.
2017-12-14 10:49:33 -08:00
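A simplified sketch of the bookkeeping described in 2be5eb4427: an ephemeral set of filenames, guarded by a mutex, that the deleter consults before removing anything. The `fileTracker` type and its method names are hypothetical and do not mirror the Scorch struct.

```go
package main

import (
	"fmt"
	"sync"
)

// fileTracker records zap filenames that are still being written.
type fileTracker struct {
	mu                   sync.Mutex
	ineligibleForRemoval map[string]struct{}
}

func newFileTracker() *fileTracker {
	return &fileTracker{ineligibleForRemoval: map[string]struct{}{}}
}

// markIneligible is what the merger would call as it starts a merged file.
func (t *fileTracker) markIneligible(name string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.ineligibleForRemoval[name] = struct{}{}
}

// markEligible is what the persister would call once the filename has
// been recorded in the root bolt.
func (t *fileTracker) markEligible(name string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	delete(t.ineligibleForRemoval, name)
}

// removable filters candidates down to files that are safe to delete.
func (t *fileTracker) removable(candidates []string) []string {
	t.mu.Lock()
	defer t.mu.Unlock()
	var out []string
	for _, name := range candidates {
		if _, busy := t.ineligibleForRemoval[name]; !busy {
			out = append(out, name)
		}
	}
	return out
}

func main() {
	ft := newFileTracker()
	ft.markIneligible("4.zap")
	fmt.Println(ft.removable([]string{"3.zap", "4.zap"}))
}
```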
Marty Schoch
bd742caf65 don't try to close a nil segment if err opening 2017-12-14 10:29:19 -05:00
Marty Schoch
149a26b5c1 merge deletion and cacheddocs fixes discussed in meeting 2017-12-14 10:27:39 -05:00
Sreekanth Sivasankaran
95b65ade3e getting right internalID for doc in UT 2017-12-14 17:16:47 +05:30
Sreekanth Sivasankaran
1066ee7d22 DocumentVisitFieldTerms Scorch implementation level1 2017-12-14 12:38:29 +05:30
Marty Schoch
2b92e5ff99
Merge pull request #653 from steveyen/scorch
scorch cleanup of the rootBolt of old snapshots
2017-12-13 22:47:14 -05:00
Marty Schoch
e1b0c61e2a fix bug in handling iterator-done 2017-12-13 22:08:06 -05:00
Steve Yen
b7dff6669f scorch cleanup of *.zap files not listed in the rootBolt 2017-12-13 17:09:50 -08:00
Steve Yen
c0cc46a2be scorch cleanup of the rootBolt of old snapshots
A new global variable, NumSnapshotsToKeep, represents the default
number of old snapshots that each scorch instance should maintain -- 0
is the default.  Apps that need rollback'ability may want to increase
this value in early initialization.

The Scorch.eligibleForRemoval field tracks epochs which are safe to
delete from the rootBolt.  The eligibleForRemoval is appended to
whenever the ref-count on an IndexSnapshot drops to 0.

On startup, eligibleForRemoval is also initialized with any older
epochs found in the rootBolt.

The newly introduced Scorch.removeOldSnapshots() method is called on
every cycle of the persisterLoop(), where it maintains the
eligibleForRemoval slice to under a size defined by the
NumSnapshotsToKeep.

A future commit will remove actual storage files in order to match the
"source of truth" information found in the rootBolt.
2017-12-13 15:53:31 -08:00
Steve Yen
c13ff85aaf scorch ref-counting
Future commits will provide actual cleanup when ref-counts reach 0.
2017-12-13 14:48:07 -08:00
Marty Schoch
50471003dc basic refactoring of introducer to make it more readable 2017-12-13 16:30:39 -05:00
Marty Schoch
a0e12b2640 add license to a few files missing it 2017-12-13 16:12:29 -05:00
Marty Schoch
85e15628ee major refactoring of posting details 2017-12-13 16:10:06 -05:00
Marty Schoch
6e2207c445 additional refactoring of build/merge 2017-12-13 15:22:13 -05:00
Marty Schoch
50441e5065 refactor to reuse shared code 2017-12-13 14:41:20 -05:00
Marty Schoch
289dc398bd more refactoring of build/merge 2017-12-13 14:26:11 -05:00
Marty Schoch
1cd3fd7fbe extract common functionality between build/merge 2017-12-13 14:06:54 -05:00
Marty Schoch
cd45487cb3 fsync rootBolt when persisting snapshot 2017-12-13 13:55:06 -05:00
Marty Schoch
f83c9f2a20 initial cut of merger that actually introduces changes 2017-12-13 13:41:03 -05:00
Marty Schoch
c15c3c11cd extra protection if dict address is 0 (empty segment) 2017-12-13 13:31:18 -05:00
Steve Yen
be7dd36ac6 mergeplan: more tests and bargraph tweaks 2017-12-12 10:37:27 -08:00
Steve Yen
59a1e26300 mergeplan: scoring implemented 2017-12-12 10:37:27 -08:00
Marty Schoch
57121e40a8 fix issues identified by errcheck 2017-12-12 11:41:14 -05:00
Marty Schoch
665c3c80ff initial cut of zap segment merging 2017-12-12 11:21:55 -05:00
Marty Schoch
927216df8c fix postings list count impl 2017-12-12 08:42:13 -05:00
Steve Yen
3461fb741f mergeplan: a placeholder planner that merges all segments
A stepping stone to fleshing out the API contract.
2017-12-11 14:53:08 -08:00
Marty Schoch
58ef21a88a fix golint issue 2017-12-11 16:24:46 -05:00
Marty Schoch
f246e0e4c0 update README for zap file format changes 2017-12-11 16:22:29 -05:00
Marty Schoch
74b2eeb14d refactor where we do some work so we can return error 2017-12-11 15:59:36 -05:00
Marty Schoch
f13b786609 fix up issues to get all bleve unit tests passing for scorch
make scorch default
2017-12-11 15:47:41 -05:00
Marty Schoch
d7eb223e14 remove bolt segment format
upcoming breaking changes and no desire to maintain
2017-12-11 10:20:26 -05:00
Marty Schoch
eada7b209b fix test issue identified by sreekanth 2017-12-11 10:16:56 -05:00
Marty Schoch
8280859bb8 handle read-only and in-mem only cases 2017-12-11 09:07:01 -05:00
Marty Schoch
e8cc7ac0bf add new fields command to zap cmd-line util 2017-12-11 09:05:50 -05:00
Marty Schoch
690cd39921 add crazy slow but functional DocumentVisitFieldTerms 2017-12-10 08:55:59 -05:00
Marty Schoch
dc0adc8827 add fsync 2017-12-09 20:52:01 -05:00
Marty Schoch
e0d9828cd0 add more detail to the readme 2017-12-09 14:42:36 -05:00
Marty Schoch
414899618b switch from bolt format to zap in the persister 2017-12-09 14:28:50 -05:00
Marty Schoch
9781d9b089 add initial version of zap file format 2017-12-09 14:28:33 -05:00
Marty Schoch
ff2e6b98e4 added empty segment 2017-12-09 12:43:02 -05:00
Marty Schoch
e470105635 fix issues identified by errcheck 2017-12-06 18:36:14 -05:00
Marty Schoch
adac4f41db initial version of scorch which persists index to disk 2017-12-06 18:33:47 -05:00
Marty Schoch
b1346b4c8a add readme describing our use of bolt as a segment format 2017-12-05 16:09:00 -05:00
Marty Schoch
898a6b1e85 fix errcheck issues 2017-12-05 13:32:57 -05:00
Marty Schoch
ece27ef215 adding initial version of bolt persisted segment 2017-12-05 13:05:12 -05:00
Marty Schoch
f6be841668 add test for postings list count method 2017-12-05 13:01:36 -05:00
Marty Schoch
30e9d6daa5 add better testing of array positions 2017-12-05 12:54:44 -05:00
Marty Schoch
8d9d45115f add test of location field 2017-12-05 12:20:06 -05:00
Marty Schoch
8f0350865b add test for segment fields method 2017-12-05 12:17:56 -05:00
Marty Schoch
7a6b5483f2 add validation that all locations were seen 2017-12-05 11:58:05 -05:00
Marty Schoch
e08fdab54a remove todo item 2017-12-05 10:13:27 -05:00
Marty Schoch
87e2627551 added dictionary tests to mem segment 2017-12-05 09:49:41 -05:00
Marty Schoch
ed067f45dd added Close() method to Segment 2017-12-05 09:31:02 -05:00
Marty Schoch
22ffc8940e update segment API to return error in key places 2017-12-04 18:06:06 -05:00
Marty Schoch
b74cf4b081 add copyright header to all new files in scorch 2017-12-01 15:42:50 -05:00