bleve

Author	SHA1	Message	Date
Steve Yen	0f19b542a3	scorch mem segment prealloc's Locfields/starts/ends/pos/arraypos This change preallocates more of the backing arrays for Locfields, Locstarts, Locends, Locpos, Locarraypos sub-slices of a scorch mem segment. On small bleve-blast tests (50K wiki docs) on a dev macbook, scorch indexing throughput seems to improve from 15MB/sec to 20MB/sec after the recent series of preallocation changes.	2018-01-15 18:40:28 -08:00
Steve Yen	a84bd122d2	scorch mem segment preallocates sub-slices via # terms This change tracks the number of terms per posting list to preallocate the sub-slices for the Freqs & Norms.	2018-01-15 18:20:43 -08:00
Steve Yen	a4110d325c	scorch mem segment preallocates slices that are key'ed by postingId The scorch mem segment build phase uses the append() idiom to populate various slices that are keyed by postings list id's. These slices include... * Postings * PostingsLocs * Freqs * Norms * Locfields * Locstarts * Locends * Locpos * Locarraypos This change introduces an initialization step that preallocates those slices up-front, by assigning postings list id's to terms up-front. This change also has an additional effect of simplifying the processDocument() logic to no longer have to worry about a first-time initialization case, removing some duplicate'ish code.	2018-01-15 16:53:39 -08:00
Steve Yen	917c470791	scorch mem segment VisitDocument() accesses StoredTypes/Pos outside of loop	2018-01-15 11:54:46 -08:00
Steve Yen	e7bd6026eb	scorch mem segment preallocs docMap/fieldLens with capacity The first time through, startNumFields should be 0, where there ought to be more optimization assuming later docs have similar fields as the first doc.	2018-01-15 11:52:20 -08:00
Steve Yen	d777d7c365	scorch mem segment comments consistency	2018-01-15 11:08:21 -08:00
Marty Schoch	4e82a8a0ca	Merge pull request #726 from sreekanth-cb/docValue_configs DocValue Config, new API Changes	2018-01-10 18:11:18 -05:00
Sreekanth Sivasankaran	53aef2104e	fixing err handling in UTs, name changes	2018-01-10 22:00:26 +05:30
abhinavdangeti	43bfcc00c9	Do not account mmap'ed part of zap segments in MemoryUsed This API is designed to emit only the dirty "unpersisted" bytes. This does not include the mmap'ed part in the zap segments (disk).	2018-01-09 09:43:53 -08:00
Sreekanth Sivasankaran	4c256f5669	DocValue Config, new API Changes -VisitableDocValueFields API for persisted DV field list -making dv configs overridable at field level -enabling on the fly/runtime un inverting of doc values -few UT updates	2018-01-08 10:58:33 +05:30
Marty Schoch	c691cd2bb5	refactor scorch/zap command-line tools under bleve zap command-line tool added to main bleve command-line tool this required physical relocation due to the vendoring used only on the bleve command-line tool (unforeseen limitation) a new scorch command-line tool has also been introduced and for the same reasons it is physically stored under the top-level bleve command-line tool as well	2018-01-05 10:17:18 -05:00
Sreekanth Sivasankaran	71a726bbf6	perf issue was due to duplicate fieldIDs getting inserted to the list of dv enabled fields list - DocValueFields in mem segment. Moved back to the original type `DocValueFields map[uint16]bool` for easy look up to check whether the fieldID is configured for dv storage.	2018-01-04 15:34:55 +05:30
Sreekanth Sivasankaran	f42ecb0ac7	docvalue "zap-path" cmd to print out the dv disk sizes	2018-01-04 13:58:51 +05:30
Sreekanth Sivasankaran	448201243a	removed redundant buf writer, and checks	2017-12-30 16:54:06 +05:30
Sreekanth Sivasankaran	61ba81e964	Merge branch 'scorch', remote-tracking branch 'origin' into docValue_persisted	2017-12-30 16:52:51 +05:30
abhinavdangeti	5c26f5a86d	Tracking memory consumption for a scorch index + Track memory usage at a segment level + Add a new scorch API: MemoryUsed() - Aggregate the memory consumption across segments when API is invoked. + TODO: - Revisit the second iteration if it can be gotten rid of, and the size accounted for during the first run while building an in-mem segment. - Accounting for pointer and slice overhead.	2017-12-29 10:20:11 -07:00
Sreekanth Sivasankaran	c8df014c0c	Updated readme, zap version, added new docvalue cmd, fixed the footer and fields cmd, interface name updated	2017-12-29 21:39:29 +05:30
Sreekanth Sivasankaran	8abac42796	errCheck fixes	2017-12-28 13:23:57 +05:30
Sreekanth Sivasankaran	0272451093	adding checks for robustness	2017-12-28 13:05:25 +05:30
Sreekanth Sivasankaran	76f827f469	docValue persist changes docValues are persisted along with the index, in a columnar fashion per field with variable sized chunking for quick look up. -naive chunk level caching is added per field -data part inside a chunk is snappy compressed -metaHeader inside the chunk index the dv values inside the uncompressed data part -all the fields are docValue persisted in this iteration	2017-12-28 12:05:33 +05:30
Steve Yen	67e0e5973b	scorch mergeStoredAndRemap() memory reuse In mergeStoredAndRemap(), instead of allocating new hashmaps for each document, this commit reuses some arrays that are indexed by fieldId.	2017-12-20 15:18:22 -08:00
Steve Yen	c155255506	scorch optimize zap.Merge() to reuse some buffers	2017-12-20 14:59:53 -08:00
Steve Yen	1abbfadf0d	scorch simplify err check after vellum load	2017-12-19 22:34:39 -08:00
Steve Yen	8f8333e01b	scorch optimize zap Count() This proposed approach avoids building a temporary AndNot() bitmap, following the same kind of optimization used by mem segments.	2017-12-19 18:02:27 -08:00
Steve Yen	d0e4f85026	scorch avoid extra clone by using roaring.AndNot(x, y)	2017-12-19 13:37:04 -08:00
Steve Yen	f6b506134b	import couchbase/vellum instead of couchbaselabs/vellum Also, scrubbed an old couchbaselabs/moss reference in comments. Also, go fmt.	2017-12-19 10:49:57 -08:00
Steve Yen	730d906a50	scorch reuses Posting instance in PostingsIterator.Next() With this change, there are no more memory allocations in the calls to PostingsIterator.Next() in the micro benchmarks of bleve-query. On a dev macbook, on an index of 50K wikipedia docs, using high frequency search of "text:date"... 400 qps - upsidedown/moss 565 qps - scorch before 680 qps - scorch after	2017-12-18 16:15:38 -08:00
Marty Schoch	b5aa4ed22b	return err not panic	2017-12-14 17:41:02 -05:00
Marty Schoch	e1b0c61e2a	fix bug in handling iterator-done	2017-12-13 22:08:06 -05:00
Steve Yen	c13ff85aaf	scorch ref-counting Future commits will provide actual cleanup when ref-counts reach 0.	2017-12-13 14:48:07 -08:00
Marty Schoch	a0e12b2640	add license to a few files missing it	2017-12-13 16:12:29 -05:00
Marty Schoch	85e15628ee	major refactoring of posting details	2017-12-13 16:10:06 -05:00
Marty Schoch	6e2207c445	additional refactoring of build/merge	2017-12-13 15:22:13 -05:00
Marty Schoch	50441e5065	refactor to reuse shared code	2017-12-13 14:41:20 -05:00
Marty Schoch	289dc398bd	more refactoring of build/merge	2017-12-13 14:26:11 -05:00
Marty Schoch	1cd3fd7fbe	extract common functionality between build/merge	2017-12-13 14:06:54 -05:00
Marty Schoch	f83c9f2a20	initial cut of merger that actually introduces changes	2017-12-13 13:41:03 -05:00
Marty Schoch	c15c3c11cd	extra protection if dict address is 0 (empty segment)	2017-12-13 13:31:18 -05:00
Marty Schoch	57121e40a8	fix issues identified by errcheck	2017-12-12 11:41:14 -05:00
Marty Schoch	665c3c80ff	initial cut of zap segment merging	2017-12-12 11:21:55 -05:00
Marty Schoch	927216df8c	fix postings list count impl	2017-12-12 08:42:13 -05:00
Marty Schoch	58ef21a88a	fix golint issue	2017-12-11 16:24:46 -05:00
Marty Schoch	f246e0e4c0	update README for zap file format changes	2017-12-11 16:22:29 -05:00
Marty Schoch	74b2eeb14d	refactor where we do some work so we can return error	2017-12-11 15:59:36 -05:00
Marty Schoch	f13b786609	fix up issues to get all bleve unit tests passing for scorch make scorch default	2017-12-11 15:47:41 -05:00
Marty Schoch	d7eb223e14	remove bolt segment format upcoming breaking changes and no desire to maintain	2017-12-11 10:20:26 -05:00
Marty Schoch	8280859bb8	handle read-only and in-mem only cases	2017-12-11 09:07:01 -05:00
Marty Schoch	e8cc7ac0bf	add new fields command to zap cmd-line util	2017-12-11 09:05:50 -05:00
Marty Schoch	dc0adc8827	add fsync	2017-12-09 20:52:01 -05:00
Marty Schoch	e0d9828cd0	add more detail to the readme	2017-12-09 14:42:36 -05:00
Marty Schoch	9781d9b089	add initial version of zap file format	2017-12-09 14:28:33 -05:00
Marty Schoch	ff2e6b98e4	added empty segment	2017-12-09 12:43:02 -05:00
Marty Schoch	adac4f41db	initial version of scorch which persists index to disk	2017-12-06 18:33:47 -05:00
Marty Schoch	b1346b4c8a	add readme describing our use of bolt as a segment format	2017-12-05 16:09:00 -05:00
Marty Schoch	898a6b1e85	fix errcheck issues	2017-12-05 13:32:57 -05:00
Marty Schoch	ece27ef215	adding initial version of bolt persisted segment	2017-12-05 13:05:12 -05:00
Marty Schoch	f6be841668	add test for postings list count method	2017-12-05 13:01:36 -05:00
Marty Schoch	30e9d6daa5	add better testing of array positions	2017-12-05 12:54:44 -05:00
Marty Schoch	8d9d45115f	add test of location field	2017-12-05 12:20:06 -05:00
Marty Schoch	8f0350865b	add test for segment fields method	2017-12-05 12:17:56 -05:00
Marty Schoch	7a6b5483f2	add validation that all locations were seen	2017-12-05 11:58:05 -05:00
Marty Schoch	e08fdab54a	remove todo item	2017-12-05 10:13:27 -05:00
Marty Schoch	87e2627551	added dictionary tests to mem segment	2017-12-05 09:49:41 -05:00
Marty Schoch	ed067f45dd	added Close() method to Segment	2017-12-05 09:31:02 -05:00
Marty Schoch	22ffc8940e	update segment API to return error in key places	2017-12-04 18:06:06 -05:00
Marty Schoch	b74cf4b081	add copyright header to all new files in scorch	2017-12-01 15:42:50 -05:00
Marty Schoch	89aa02cf5b	fix highlighting of composite fields updated log statements for refactored names	2017-12-01 15:12:08 -05:00
Marty Schoch	cff14f1212	fix crash in DocNumbers when segment is empty	2017-12-01 09:50:27 -05:00
Marty Schoch	eb256f78bc	switch to constant referring to id field id 0 this avoids potentially mutating something that is intended to be immutable	2017-12-01 09:30:07 -05:00
Marty Schoch	395458ce83	refactor to make mem segment contents exported	2017-12-01 07:26:47 -05:00
Marty Schoch	848aca4639	fix issues identified by errcheck	2017-11-29 13:34:15 -05:00
Marty Schoch	23f6dc1cc6	working in-memory version	2017-11-29 11:33:35 -05:00
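
The preallocation commits above (a4110d325c, a84bd122d2, 0f19b542a3) describe assigning postings list ids to terms up-front, counting entries per postings list, and carving per-list sub-slices out of one backing array instead of growing each with append(). A minimal sketch of that idea follows. It is not the scorch code: `buildPrealloc`, its parameters, and the single `freqs` slice are hypothetical names chosen for illustration, and the real mem segment tracks many more slices.

```go
package main

import "fmt"

// buildPrealloc is a hypothetical illustration, not bleve's implementation.
// It returns a term -> postings list id map and, per id, the term frequency
// in each document that contains the term.
func buildPrealloc(docs [][]string) (map[string]int, [][]uint64) {
	// Pass 1: assign postings list ids up-front and count, per term,
	// how many documents contain it.
	ids := map[string]int{}
	var counts []int
	for _, doc := range docs {
		seen := make(map[string]bool, len(doc))
		for _, term := range doc {
			if seen[term] {
				continue
			}
			seen[term] = true
			id, ok := ids[term]
			if !ok {
				id = len(counts)
				ids[term] = id
				counts = append(counts, 0)
			}
			counts[id]++
		}
	}

	// Allocate one backing array and carve it into per-id sub-slices.
	total := 0
	for _, c := range counts {
		total += c
	}
	backing := make([]uint64, total)
	freqs := make([][]uint64, len(counts))
	off := 0
	for id, c := range counts {
		// Length 0, capacity c: appends fill this region and cannot
		// spill into the neighbouring sub-slice.
		freqs[id] = backing[off : off : off+c]
		off += c
	}

	// Pass 2: fill the sub-slices; no first-time initialization branch
	// and no reallocation, because every id and capacity is already known.
	for _, doc := range docs {
		tf := make(map[string]uint64, len(doc))
		for _, term := range doc {
			tf[term]++
		}
		for term, n := range tf {
			id := ids[term]
			freqs[id] = append(freqs[id], n)
		}
	}
	return ids, freqs
}

func main() {
	docs := [][]string{{"a", "b", "a"}, {"b", "c"}}
	ids, freqs := buildPrealloc(docs)
	for _, term := range []string{"a", "b", "c"} {
		fmt.Println(term, freqs[ids[term]])
	}
}
```

The trade-off in this sketch is an extra pass over the documents in exchange for a single allocation per slice family. Whether that helps in practice depends on the workload, which is why the commit message reports throughput only for one test setup.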


122 Commits