Steve Yen
f05794c6aa
scorch removed worker goroutines from TermFieldReader()
On a couple of micro benchmarks on a dev MacBook, using bleve-query on
an index of 50K Wikipedia docs, scorch is now roughly in the same
neighborhood as upsidedown/moss...
high-freq term search "text:date"...
400 qps - upsidedown/moss
360 qps - scorch before
404 qps - scorch after
zero-freq term search "text:mschoch"...
100K qps - upsidedown/moss
55K qps - scorch before
99K qps - scorch after
Of note, the scorch index had ~150 *.zap files in it, which likely
made the worker goroutine overhead more costly than it would be with
only a few segments; goroutine- and channel-related work appeared
relatively prominently in the pprof SVGs.
2017-12-15 11:11:18 -08:00
Marty Schoch
562b473e36
Merge pull request #657 from steveyen/scorch
scorch fix data race w/ AddEligibleForRemoval
2017-12-14 17:56:06 -05:00
Marty Schoch
b5aa4ed22b
return err not panic
2017-12-14 17:41:02 -05:00
Steve Yen
506aa1c325
scorch fix data race w/ AddEligibleForRemoval
Found from "go test -race ./..."
WARNING: DATA RACE
Read at 0x00c420088060 by goroutine 48:
  github.com/blevesearch/bleve/index/scorch.(*Scorch).AddEligibleForRemoval()
      /Users/steveyen/go/src/github.com/blevesearch/bleve/index/scorch/scorch.go:348 +0x6d

Previous write at 0x00c420088060 by goroutine 31:
  github.com/blevesearch/bleve/index/scorch.(*Scorch).loadFromBolt.func1()
      /Users/steveyen/go/src/github.com/blevesearch/bleve/index/scorch/persister.go:332 +0x87b
  github.com/boltdb/bolt.(*DB).View()
      /Users/steveyen/go/src/github.com/boltdb/bolt/db.go:629 +0xc1
  github.com/blevesearch/bleve/index/scorch.(*Scorch).loadFromBolt()
      /Users/steveyen/go/src/github.com/blevesearch/bleve/index/scorch/persister.go:290 +0xa1
  github.com/blevesearch/bleve/index/scorch.(*Scorch).Open()
      /Users/steveyen/go/src/github.com/blevesearch/bleve/index/scorch/scorch.go:121 +0x77f
  github.com/blevesearch/bleve/index/scorch.TestIndexOpenReopen()
      /Users/steveyen/go/src/github.com/blevesearch/bleve/index/scorch/scorch_test.go:115 +0x1351
  testing.tRunner()
      /usr/local/Cellar/go/1.9/libexec/src/testing/testing.go:746 +0x16c

Goroutine 48 (running) created at:
  github.com/blevesearch/bleve/index/scorch.(*IndexSnapshot).DecRef()
      /Users/steveyen/go/src/github.com/blevesearch/bleve/index/scorch/snapshot_index.go:72 +0x23e
  github.com/blevesearch/bleve/index/scorch.(*Scorch).loadFromBolt.func1()
      /Users/steveyen/go/src/github.com/blevesearch/bleve/index/scorch/persister.go:330 +0x8f4
  github.com/boltdb/bolt.(*DB).View()
      /Users/steveyen/go/src/github.com/boltdb/bolt/db.go:629 +0xc1
  github.com/blevesearch/bleve/index/scorch.(*Scorch).loadFromBolt()
      /Users/steveyen/go/src/github.com/blevesearch/bleve/index/scorch/persister.go:290 +0xa1
  github.com/blevesearch/bleve/index/scorch.(*Scorch).Open()
      /Users/steveyen/go/src/github.com/blevesearch/bleve/index/scorch/scorch.go:121 +0x77f
  github.com/blevesearch/bleve/index/scorch.TestIndexOpenReopen()
      /Users/steveyen/go/src/github.com/blevesearch/bleve/index/scorch/scorch_test.go:115 +0x1351
  testing.tRunner()
      /usr/local/Cellar/go/1.9/libexec/src/testing/testing.go:746 +0x16c
2017-12-14 14:40:33 -08:00
Marty Schoch
6ab27e4afa
quick hack to disable safe batches in fts
2017-12-14 17:19:50 -05:00
Steve Yen
eb2f541d4f
scorch filters _id from Reader.Document() results
2017-12-14 13:52:28 -08:00
Steve Yen
a8884e1011
scorch fix for TestSortMatchSearch
The cachedDocs preparation has to happen for all docs in the field,
not just on the currently requested docNum.
Also, as part of this commit, there's a loop optimization where we no
longer use bytes.Split() on the terms buffer, thus avoiding garbage
creation.
2017-12-14 13:22:13 -08:00
Steve Yen
2be5eb4427
scorch tracks zap files that can't be removed yet
A race & solution found by Marty Schoch: consider a case where the
merger grabs a nextSegmentID, like 4, but takes a while to
complete. Meanwhile, the persister grabs the nextSegmentID of 5,
finishes its persistence work quickly, and then loops to clean up any
old files. The simple approach of checking a "highest segment ID" of 5
is now wrong, because the deleter would think that segment 4's zap
file is (incorrectly) OK to delete.
The solution in this commit is to track an ephemeral map of filenames
which are ineligibleForRemoval, because they're still being written
(by the merger) and haven't been fully incorporated into the rootBolt
yet.
The merger adds to that ineligibleForRemoval map as it starts a merged
zap file, the persister cleans up entries from that map when it
persists zap filenames into the rootBolt, and the deleter (part of the
persister's loop) consults the map before performing any actual zap
file deletions.
2017-12-14 10:49:33 -08:00
Marty Schoch
bd742caf65
don't try to close a nil segment if err opening
2017-12-14 10:29:19 -05:00
Marty Schoch
149a26b5c1
merge deletion and cacheddocs fixes discussed in meeting
2017-12-14 10:27:39 -05:00
Sreekanth Sivasankaran
95b65ade3e
getting right internalID for doc in UT
2017-12-14 17:16:47 +05:30
Sreekanth Sivasankaran
1066ee7d22
DocumentVisitFieldTerms Scorch implementation level1
2017-12-14 12:38:29 +05:30
Marty Schoch
2b92e5ff99
Merge pull request #653 from steveyen/scorch
scorch cleanup of the rootBolt of old snapshots
2017-12-13 22:47:14 -05:00
Marty Schoch
e1b0c61e2a
fix bug in handling iterator-done
2017-12-13 22:08:06 -05:00
Steve Yen
b7dff6669f
scorch cleanup of *.zap files not listed in the rootBolt
2017-12-13 17:09:50 -08:00
Steve Yen
c0cc46a2be
scorch cleanup of the rootBolt of old snapshots
A new global variable, NumSnapshotsToKeep, represents the default
number of old snapshots that each scorch instance should maintain; 0
is the default. Apps that need rollback capability may want to
increase this value during early initialization.
The Scorch.eligibleForRemoval field tracks epochs which are safe to
delete from the rootBolt. The eligibleForRemoval slice is appended to
whenever the ref-count on an IndexSnapshot drops to 0.
On startup, eligibleForRemoval is also initialized with any older
epochs found in the rootBolt.
The newly introduced Scorch.removeOldSnapshots() method is called on
every cycle of the persisterLoop(), where it keeps the
eligibleForRemoval slice within the size defined by
NumSnapshotsToKeep.
A future commit will remove actual storage files in order to match the
"source of truth" information found in the rootBolt.
2017-12-13 15:53:31 -08:00
Steve Yen
c13ff85aaf
scorch ref-counting
Future commits will provide actual cleanup when ref-counts reach 0.
2017-12-13 14:48:07 -08:00
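A minimal ref-counting sketch using sync/atomic, where the caller that drops the last reference runs a cleanup hook. The snapshot type and onZero hook are hypothetical; a later commit (c0cc46a2be) describes appending the epoch to eligibleForRemoval at that point.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// snapshot is a hypothetical ref-counted object. onZero runs once, when the
// last reference is dropped, which is where cleanup can hook in.
type snapshot struct {
	refs   int64 // first field, keeping it 64-bit aligned for atomic ops on 32-bit platforms
	epoch  uint64
	onZero func(epoch uint64)
}

// IncRef takes an additional reference.
func (s *snapshot) IncRef() { atomic.AddInt64(&s.refs, 1) }

// DecRef releases a reference; whoever brings the count to 0 runs onZero.
func (s *snapshot) DecRef() {
	if atomic.AddInt64(&s.refs, -1) == 0 && s.onZero != nil {
		s.onZero(s.epoch)
	}
}

func main() {
	s := &snapshot{refs: 1, epoch: 7, onZero: func(epoch uint64) {
		fmt.Println("released epoch", epoch)
	}}
	s.IncRef() // e.g. a reader starts using the snapshot
	s.DecRef() // the reader is done; one reference remains
	s.DecRef() // the owner lets go; prints "released epoch 7"
}
```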
Marty Schoch
50471003dc
basic refactoring of introducer to make it more readable
2017-12-13 16:30:39 -05:00
Marty Schoch
a0e12b2640
add license to a few files missing it
2017-12-13 16:12:29 -05:00
Marty Schoch
85e15628ee
major refactoring of posting details
2017-12-13 16:10:06 -05:00
Marty Schoch
6e2207c445
additional refactoring of build/merge
2017-12-13 15:22:13 -05:00
Marty Schoch
50441e5065
refactor to reuse shared code
2017-12-13 14:41:20 -05:00
Marty Schoch
289dc398bd
more refactoring of build/merge
2017-12-13 14:26:11 -05:00
Marty Schoch
1cd3fd7fbe
extract common functionality between build/merge
2017-12-13 14:06:54 -05:00
Marty Schoch
cd45487cb3
fsync rootBolt when persisting snapshot
2017-12-13 13:55:06 -05:00
Marty Schoch
f83c9f2a20
initial cut of merger that actually introduces changes
2017-12-13 13:41:03 -05:00
Marty Schoch
c15c3c11cd
extra protection if dict address is 0 (empty segment)
2017-12-13 13:31:18 -05:00
Steve Yen
be7dd36ac6
mergeplan: more tests and bargraph tweaks
2017-12-12 10:37:27 -08:00
Steve Yen
59a1e26300
mergeplan: scoring implemented
2017-12-12 10:37:27 -08:00
Marty Schoch
57121e40a8
fix issues identified by errcheck
2017-12-12 11:41:14 -05:00
Marty Schoch
665c3c80ff
initial cut of zap segment merging
2017-12-12 11:21:55 -05:00
Marty Schoch
927216df8c
fix postings list count impl
2017-12-12 08:42:13 -05:00
Steve Yen
3461fb741f
mergeplan: a placeholder planner that merges all segments
A stepping stone to fleshing out the API contract.
2017-12-11 14:53:08 -08:00
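A sketch of what a merge-everything placeholder planner might look like, with illustrative types (Segment, MergeTask, MergePlan, planMergeAll) that are assumptions rather than the mergeplan package's actual API.

```go
package main

import "fmt"

// Segment is a hypothetical, minimal view of a segment for planning.
type Segment interface {
	ID() uint64
	LiveSize() int64 // e.g. number of non-deleted docs; unused by this placeholder
}

// MergeTask names a set of segments to be merged into one.
type MergeTask struct {
	Segments []Segment
}

// MergePlan is the planner's output: zero or more independent merge tasks.
type MergePlan struct {
	Tasks []*MergeTask
}

// planMergeAll is a placeholder policy that ignores sizes and merges every
// segment into a single task; with fewer than two segments there is nothing
// to merge.
func planMergeAll(segments []Segment) *MergePlan {
	if len(segments) < 2 {
		return &MergePlan{}
	}
	return &MergePlan{Tasks: []*MergeTask{{Segments: segments}}}
}

// demoSegment is a trivial Segment implementation for the example.
type demoSegment struct {
	id   uint64
	size int64
}

func (d demoSegment) ID() uint64      { return d.id }
func (d demoSegment) LiveSize() int64 { return d.size }

func main() {
	plan := planMergeAll([]Segment{demoSegment{1, 100}, demoSegment{2, 5}, demoSegment{3, 40}})
	for _, task := range plan.Tasks {
		for _, seg := range task.Segments {
			fmt.Print(seg.ID(), " ")
		}
		fmt.Println()
	}
}
```

Keeping the output a list of tasks, even when there is only one, leaves room for a smarter, scoring-based planner to return several independent merges through the same contract.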
Marty Schoch
58ef21a88a
fix golint issue
2017-12-11 16:24:46 -05:00
Marty Schoch
f246e0e4c0
update README for zap file format changes
2017-12-11 16:22:29 -05:00
Marty Schoch
74b2eeb14d
refactor where we do some work so we can return error
2017-12-11 15:59:36 -05:00
Marty Schoch
f13b786609
fix up issues to get all bleve unit tests passing for scorch
make scorch default
2017-12-11 15:47:41 -05:00
Marty Schoch
d7eb223e14
remove bolt segment format
upcoming breaking changes and no desire to maintain it
2017-12-11 10:20:26 -05:00
Marty Schoch
eada7b209b
fix test issue identified by sreekanth
2017-12-11 10:16:56 -05:00
Marty Schoch
8280859bb8
handle read-only and in-mem only cases
2017-12-11 09:07:01 -05:00
Marty Schoch
e8cc7ac0bf
add new fields command to zap cmd-line util
2017-12-11 09:05:50 -05:00
Marty Schoch
690cd39921
add crazy slow but functional DocumentVisitFieldTerms
2017-12-10 08:55:59 -05:00
Marty Schoch
dc0adc8827
add fsync
2017-12-09 20:52:01 -05:00
Marty Schoch
e0d9828cd0
add more detail to the readme
2017-12-09 14:42:36 -05:00
Marty Schoch
414899618b
switch from bolt format to zap in the persister
2017-12-09 14:28:50 -05:00
Marty Schoch
9781d9b089
add initial version of zap file format
2017-12-09 14:28:33 -05:00
Marty Schoch
ff2e6b98e4
added empty segment
2017-12-09 12:43:02 -05:00
Marty Schoch
e470105635
fix issues identified by errcheck
2017-12-06 18:36:14 -05:00
Marty Schoch
adac4f41db
initial version of scorch which persists index to disk
2017-12-06 18:33:47 -05:00
Marty Schoch
b1346b4c8a
add readme describing our use of bolt as a segment format
2017-12-05 16:09:00 -05:00
Marty Schoch
898a6b1e85
fix errcheck issues
2017-12-05 13:32:57 -05:00
Marty Schoch
ece27ef215
adding initial version of bolt persisted segment
2017-12-05 13:05:12 -05:00
Marty Schoch
f6be841668
add test for postings list count method
2017-12-05 13:01:36 -05:00
Marty Schoch
30e9d6daa5
add better testing of array positions
2017-12-05 12:54:44 -05:00
Marty Schoch
8d9d45115f
add test of location field
2017-12-05 12:20:06 -05:00
Marty Schoch
8f0350865b
add test for segment fields method
2017-12-05 12:17:56 -05:00
Marty Schoch
7a6b5483f2
add validation that all locations were seen
2017-12-05 11:58:05 -05:00
Marty Schoch
e08fdab54a
remove todo item
2017-12-05 10:13:27 -05:00
Marty Schoch
87e2627551
added dictionary tests to mem segment
2017-12-05 09:49:41 -05:00
Marty Schoch
ed067f45dd
added Close() method to Segment
2017-12-05 09:31:02 -05:00
Marty Schoch
22ffc8940e
update segment API to return error in key places
2017-12-04 18:06:06 -05:00
Marty Schoch
b74cf4b081
add copyright header to all new files in scorch
2017-12-01 15:42:50 -05:00
Marty Schoch
89aa02cf5b
fix highlighting of composite fields
updated log statements for refactored names
2017-12-01 15:12:08 -05:00
Marty Schoch
cff14f1212
fix crash in DocNumbers when segment is empty
2017-12-01 09:50:27 -05:00
Marty Schoch
eb256f78bc
switch to constant referring to id field id 0
this avoids potentially mutating something that is intended
to be immutable
2017-12-01 09:30:07 -05:00
Marty Schoch
7c964de8bf
switch to binary search for finding segment from global doc num
added unit tests for this function specifically
2017-12-01 09:26:51 -05:00
Marty Schoch
c2047dcdf9
refactor doc id reader creation to share more code
fix issue identified by steve
2017-12-01 08:54:39 -05:00
Marty Schoch
bcd4bdc3d1
added initial bolt thought to README
2017-12-01 07:27:04 -05:00
Marty Schoch
395458ce83
refactor to make mem segment contents exported
2017-12-01 07:26:47 -05:00
Steve Yen
398dcb19b3
scorch introducer uses the roaring.Or(x, y) API
Instead of cloning an input bitmap, the roaring.Or(x, y)
implementation fills a brand new result bitmap, which should allow
for more efficient packing and memory utilization.
2017-11-30 10:37:10 -08:00
Steve Yen
67986d41bf
scorch InternalID() handles case of unknown docId
2017-11-30 08:36:01 -08:00
Marty Schoch
848aca4639
fix issues identified by errcheck
2017-11-29 13:34:15 -05:00
Marty Schoch
23f6dc1cc6
working in-memory version
2017-11-29 11:33:35 -05:00