By memoizing the size of index snapshots and their
constituent parts, we significantly reduce the amount
of time that the lock is held in the app_herder when
calculating the total memory used.
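A minimal sketch of the idea, assuming hypothetical snapshot types
(the field and method names below are illustrative, not the actual
scorch ones): compute each part's size once, at snapshot creation,
so Size() becomes a constant-time read.

    // Illustrative memoization of snapshot sizes; not the actual scorch code.
    type segmentSnapshot struct {
        size uint64 // computed once when the segment snapshot is built
    }

    type indexSnapshot struct {
        segments []*segmentSnapshot
        size     uint64 // memoized sum of the constituent segment sizes
    }

    func newIndexSnapshot(segments []*segmentSnapshot) *indexSnapshot {
        s := &indexSnapshot{segments: segments}
        for _, seg := range segments {
            s.size += seg.size
        }
        return s
    }

    // Size is now a cheap field read, so the app_herder's lock is held
    // only briefly while totaling memory across snapshots.
    func (s *indexSnapshot) Size() uint64 { return s.size }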
Since it's just the pointer size of the IndexReader that is
being accounted for while estimating the RAM needed to
execute a search query, get rid of the Size() API in the
IndexReader interface.
This API (unexported) will estimate the amount of memory needed to execute
a search query over an index before the collector begins data collection;
a rough sizing sketch follows the sample numbers below.
Sample estimates for certain queries:
{Size: 10, BenchmarkUpsidedownSearchOverhead}

                                                       ESTIMATE   BENCHMEM
TermQuery                                                  4616       4796
MatchQuery                                                 5210       5405
DisjunctionQuery (Match queries)                           7700       8447
DisjunctionQuery (Term queries)                            6514       6591
ConjunctionQuery (Match queries)                           7524       8175
Nested disjunction query (disjunction of disjunctions)    10306      10708
…
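A rough sketch of how such an estimate could be composed, under the
assumption that it simply sums the reported sizes of the pieces a
search will hold in memory (the helper and interface names here are
hypothetical, not the real unexported API):

    package sketch

    // Sizer is an assumed interface for anything whose in-memory footprint
    // can be reported (searchers, readers, collector state, ...).
    type Sizer interface {
        Size() int
    }

    // estimateSearchRAM sums the sizes of the parts a search will hold
    // before the collector starts collecting; hypothetical helper.
    func estimateSearchRAM(parts ...Sizer) int {
        total := 0
        for _, p := range parts {
            total += p.Size()
        }
        return total
    }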
-VisitableDocValueFields API for the persisted doc value field list
-making doc value configs overridable at the field level (see the mapping sketch after this list)
-enabling on-the-fly/runtime un-inverting of doc values
-a few unit test updates
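For instance, a field-level override of doc values in an index mapping
might look like the following; this is a sketch assuming the DocValues
toggle on bleve's FieldMapping, and defaults may differ across versions:

    package main

    import (
        "github.com/blevesearch/bleve"
    )

    func main() {
        // Opt a single field out of persisted doc values while the rest of
        // the mapping keeps its defaults. (Assumes the FieldMapping.DocValues
        // flag; check the mapping options for your bleve version.)
        nameField := bleve.NewTextFieldMapping()
        nameField.DocValues = false

        docMapping := bleve.NewDocumentMapping()
        docMapping.AddFieldMappingsAt("name", nameField)

        indexMapping := bleve.NewIndexMapping()
        indexMapping.DefaultMapping = docMapping

        idx, err := bleve.New("example.bleve", indexMapping)
        if err != nil {
            panic(err)
        }
        defer idx.Close()
    }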
docValues are persisted along with the index,
in a columnar fashion per field, with variable-sized
chunking for quick lookup.
-naive chunk-level caching is added per field
-the data part inside a chunk is snappy-compressed
-the metaHeader inside the chunk indexes the dv values
 within the uncompressed data part (see the chunk-read sketch after this list)
-all the fields are docValue-persisted in this iteration
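An illustrative reading path for one such chunk, not the actual zap
code: decompress the snappy data part, then use the chunk's metaHeader
offsets to slice out one doc's values. The metaEntry layout here is an
assumption made for the sketch:

    package docvalues

    import (
        "fmt"

        "github.com/golang/snappy"
    )

    // metaEntry stands in for the per-doc entries of a chunk's metaHeader:
    // where that doc's values live inside the uncompressed data part.
    type metaEntry struct {
        docNum     uint64
        dataOffset uint64
        dataLen    uint64
    }

    // readDocValues returns one doc's values from a chunk, given its parsed
    // metaHeader and the snappy-compressed data part.
    func readDocValues(meta []metaEntry, compressedData []byte, docNum uint64) ([]byte, error) {
        uncompressed, err := snappy.Decode(nil, compressedData)
        if err != nil {
            return nil, err
        }
        for _, m := range meta {
            if m.docNum == docNum {
                return uncompressed[m.dataOffset : m.dataOffset+m.dataLen], nil
            }
        }
        return nil, fmt.Errorf("doc %d not found in this chunk", docNum)
    }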
With more pprof focusing (zooming in on a particular func), there were
still some memory allocations showing up in docNumberToBytes() in
microbenchmarks of bleve-query; an illustrative buffer-reuse sketch
follows the numbers below. On a dev MacBook, on an index of 50K
Wikipedia docs, using a search for the relatively common "text:date"...
400 qps - upsidedown/moss
680 qps - scorch before
775 qps - scorch after
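One way to eliminate that kind of per-call allocation, offered here as
an assumption about the shape of the change rather than the exact diff,
is to encode into a caller-provided buffer that hot paths can reuse:

    package sketch

    import "encoding/binary"

    // docNumberToBytes encodes a doc number as 8 big-endian bytes into a
    // reusable buffer, allocating only when the caller passes nothing
    // usable. Illustrative; the actual scorch change may differ.
    func docNumberToBytes(buf []byte, in uint64) []byte {
        if cap(buf) >= 8 {
            buf = buf[0:8]
        } else {
            buf = make([]byte, 8)
        }
        binary.BigEndian.PutUint64(buf, in)
        return buf
    }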
On a couple of microbenchmarks on a dev MacBook using bleve-query on
an index of 50K Wikipedia docs, scorch is now faster than
upsidedown/moss on the high-freq term search "text:date"...
400 qps - upsidedown/moss
404 qps - scorch before
565 qps - scorch after
On a couple of microbenchmarks on a dev MacBook using bleve-query on
an index of 50K Wikipedia docs, scorch is now in roughly the same
neighborhood as upsidedown/moss...
high-freq term search "text:date"...
400 qps - upsidedown/moss
360 qps - scorch before
404 qps - scorch after
zero-freq term search "text:mschoch"...
100K qps - upsidedown/moss
55K qps - scorch before
99K qps - scorch after
Of note, the scorch index had ~150 *.zap files in it, which likely
made the worker goroutine overhead more costly than for a case with
few segments; goroutine- and channel-related work appeared relatively
prominently in the pprof SVGs.
The cachedDocs preparation has to happen for all docs in the field,
not just for the currently requested docNum.
Also, as part of this commit, there's a loop optimization where we no
longer use bytes.Split() on the terms buffer, thus avoiding garbage
creation.
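A minimal sketch of that kind of loop, with illustrative names and an
assumed single-byte separator, walking the terms buffer with
bytes.IndexByte() instead of materializing the slice-of-slices that
bytes.Split() would allocate:

    package sketch

    import "bytes"

    // visitTerms calls visit for each separator-delimited term in termsBuf
    // without allocating intermediate slices. Illustrative of the loop
    // optimization; the real cachedDocs code may differ in detail.
    func visitTerms(termsBuf []byte, sep byte, visit func(term []byte)) {
        for len(termsBuf) > 0 {
            i := bytes.IndexByte(termsBuf, sep)
            if i < 0 {
                visit(termsBuf)
                return
            }
            visit(termsBuf[:i])
            termsBuf = termsBuf[i+1:]
        }
    }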
A new global variable, NumSnapshotsToKeep, represents the number of
old snapshots that each scorch instance should maintain; 0 is the
default. Apps that need rollback'ability may want to increase this
value during early initialization.
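For example, an app that wants a few rollback points could bump the
value at startup, before opening any scorch indexes (the import path
and default shown here are assumptions that may vary by bleve version):

    package main

    import (
        "fmt"

        "github.com/blevesearch/bleve/index/scorch"
    )

    func main() {
        // Keep a few older snapshots around so rollback has something to
        // return to; the default keeps only the current snapshot.
        scorch.NumSnapshotsToKeep = 3
        fmt.Println("snapshots to keep:", scorch.NumSnapshotsToKeep)
    }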
The Scorch.eligibleForRemoval field tracks epochs which are safe to
delete from the rootBolt. The eligibleForRemoval slice is appended to
whenever the ref-count on an IndexSnapshot drops to 0.
On startup, eligibleForRemoval is also initialized with any older
epochs found in the rootBolt.
The newly introduced Scorch.removeOldSnapshots() method is called on
every cycle of the persisterLoop(), where it trims the
eligibleForRemoval slice down to the size defined by
NumSnapshotsToKeep.
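A simplified sketch of that bookkeeping; the names mirror the
description above, but the body is illustrative and omits the bolt
deletes, locking, and persister plumbing of the real method:

    package sketch

    import "sort"

    type scorch struct {
        numSnapshotsToKeep int
        eligibleForRemoval []uint64 // epochs safe to delete from the rootBolt
    }

    // addEligibleForRemoval records an epoch once its IndexSnapshot's
    // ref-count drops to 0 (or when an older epoch is found at startup).
    func (s *scorch) addEligibleForRemoval(epoch uint64) {
        s.eligibleForRemoval = append(s.eligibleForRemoval, epoch)
    }

    // removeOldSnapshots trims eligibleForRemoval down to the configured
    // number of snapshots to keep and returns the epochs to delete.
    func (s *scorch) removeOldSnapshots() (toRemove []uint64) {
        if len(s.eligibleForRemoval) <= s.numSnapshotsToKeep {
            return nil
        }
        // Keep the newest epochs; everything older is returned for removal.
        sort.Slice(s.eligibleForRemoval, func(i, j int) bool {
            return s.eligibleForRemoval[i] > s.eligibleForRemoval[j]
        })
        toRemove = append(toRemove, s.eligibleForRemoval[s.numSnapshotsToKeep:]...)
        s.eligibleForRemoval = s.eligibleForRemoval[:s.numSnapshotsToKeep]
        return toRemove
    }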
A future commit will remove actual storage files in order to match the
"source of truth" information found in the rootBolt.