Instead of allocating a govarint.U64Base128Encoder in the inner loop,
allocate it just once on the outside, as it appears that it's just a
thin wrapper around binary.PutUvarint().
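For illustration, the hoisting looks roughly like this (the loop structure and names are made up, and the govarint method names are assumed from the library's usual usage, so treat this as a sketch only):

```go
package zapsketch

import (
	"bytes"

	"github.com/Smerity/govarint"
)

// encodeFreqs shows the pattern: the encoder is built once, outside the
// inner loop, since it is only a thin wrapper around binary.PutUvarint().
func encodeFreqs(postings [][]uint64) []byte {
	var buf bytes.Buffer

	// Allocate the encoder a single time, not per iteration.
	encoder := govarint.NewU64Base128Encoder(&buf)

	for _, freqs := range postings {
		for _, freq := range freqs {
			encoder.PutU64(freq)
		}
	}
	encoder.Close()

	return buf.Bytes()
}
```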
The zap SegmentBase struct is a refactoring of the zap Segment into
the subset of fields that are needed for read-only ops, without any
persistence-related info. This allows us to use zap's optimized data
encoding for scorch's in-memory segments.
The zap Segment struct now embeds a zap SegmentBase struct, and layers
on persistence. Both the zap Segment and zap SegmentBase implement
scorch's Segment interface.
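A rough sketch of the layering (field names here are abbreviated and assumed, not copied from the actual zap source):

```go
package zapsketch

import "os"

// SegmentBase holds only what read-only operations need: the zap-encoded
// bytes plus lookup structures built from them, with no persistence info.
type SegmentBase struct {
	mem       []byte            // zap-encoded segment data, held in memory
	numDocs   uint64
	fieldsMap map[string]uint16 // field name -> field id
}

// Segment layers persistence on top by embedding SegmentBase.
type Segment struct {
	SegmentBase          // read-only behavior is promoted from here
	f           *os.File // backing file on disk
	path        string
	refs        int64 // reference count governing file cleanup
}

// Count is defined on *SegmentBase, so both *SegmentBase (scorch's in-memory
// segment representation) and *Segment (the persisted form) can satisfy an
// interface that requires it.
func (sb *SegmentBase) Count() uint64 { return sb.numDocs }
```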
+ Account for all the overhead incurred from the data structures
within mem.Segment and zap.Segment.
- SizeOfMap = 8
- SizeOfPointer = 8
- SizeOfSlice = 24
- SizeOfString = 16
+ Include overhead from certain new fields as well.
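For illustration, the accounting boils down to constants like the ones listed above plus walks over the structures that use them; the helper function below is hypothetical:

```go
// Per-type overhead on 64-bit platforms, used when summing up a segment's
// memory footprint.
const (
	SizeOfMap     uint64 = 8  // a map variable is a pointer to the runtime map header
	SizeOfPointer uint64 = 8
	SizeOfSlice   uint64 = 24 // data pointer + len + cap
	SizeOfString  uint64 = 16 // data pointer + len
)

// mapOfSlicesOverhead is a hypothetical example of applying the constants:
// the overhead of a map[uint16][]uint64 member, counting the keys, the
// slice headers, and the uint64 contents.
func mapOfSlicesOverhead(m map[uint16][]uint64) uint64 {
	size := SizeOfMap
	for _, vals := range m {
		size += 2 /* uint16 key */ + SizeOfSlice + uint64(len(vals))*8
	}
	return size
}
```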
This change preallocates more of the backing arrays for the Locfields,
Locstarts, Locends, Locpos, and Locarraypos sub-slices of a scorch mem
segment, as sketched below.
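A minimal sketch of the backing-array idiom, assuming the per-postings location counts are already known (only two of the five Loc* slices are shown, and the function and argument names are illustrative):

```go
// preallocLocSlices carves per-postings sub-slices out of single contiguous
// backing arrays, so that later append()s fill pre-reserved capacity
// instead of reallocating.
func preallocLocSlices(numLocsPerPosting []int) (locfields [][]uint16, locstarts [][]uint64) {
	totalLocs := 0
	for _, n := range numLocsPerPosting {
		totalLocs += n
	}

	fieldsBacking := make([]uint16, totalLocs)
	startsBacking := make([]uint64, totalLocs)

	locfields = make([][]uint16, len(numLocsPerPosting))
	locstarts = make([][]uint64, len(numLocsPerPosting))

	for pid, n := range numLocsPerPosting {
		locfields[pid] = fieldsBacking[:0:n] // len 0, cap n, shared backing
		locstarts[pid] = startsBacking[:0:n]
		fieldsBacking = fieldsBacking[n:]
		startsBacking = startsBacking[n:]
	}
	return locfields, locstarts
}
```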
On small bleve-blast tests (50K wiki docs) on a dev macbook, scorch
indexing throughput seems to improve from 15MB/sec to 20MB/sec after
the recent series of preallocation changes.
The scorch mem segment build phase uses the append() idiom to populate
various slices that are keyed by postings list ids. These slices
include...
* Postings
* PostingsLocs
* Freqs
* Norms
* Locfields
* Locstarts
* Locends
* Locpos
* Locarraypos
This change introduces an initialization step that preallocates those
slices up-front, by assigning postings list ids to terms up-front.
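A rough sketch of the up-front sizing, using a struct that mirrors the slice fields listed above (the initialization helper and its argument are illustrative, not the actual mem segment code):

```go
package memsketch

import "github.com/RoaringBitmap/roaring"

// Segment mirrors a subset of the scorch mem segment slices named above.
type Segment struct {
	Postings     []*roaring.Bitmap
	PostingsLocs []*roaring.Bitmap
	Freqs        [][]uint64
	Norms        [][]float32
	Locfields    [][]uint16
	Locstarts    [][]uint64
	Locends      [][]uint64
	Locpos       [][]uint64
	Locarraypos  [][][]uint64
}

// initPostings sizes every per-postings slice once the number of postings
// lists is known (i.e. after postings list ids have been assigned to
// terms), so processDocument() no longer needs append()-driven growth or a
// first-time initialization branch.
func (s *Segment) initPostings(numPostingsLists int) {
	s.Postings = make([]*roaring.Bitmap, numPostingsLists)
	s.PostingsLocs = make([]*roaring.Bitmap, numPostingsLists)
	for i := 0; i < numPostingsLists; i++ {
		s.Postings[i] = roaring.New()
		s.PostingsLocs[i] = roaring.New()
	}
	s.Freqs = make([][]uint64, numPostingsLists)
	s.Norms = make([][]float32, numPostingsLists)
	s.Locfields = make([][]uint16, numPostingsLists)
	s.Locstarts = make([][]uint64, numPostingsLists)
	s.Locends = make([][]uint64, numPostingsLists)
	s.Locpos = make([][]uint64, numPostingsLists)
	s.Locarraypos = make([][][]uint64, numPostingsLists)
}
```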
This change also has the additional effect of simplifying the
processDocument() logic to no longer have to worry about a first-time
initialization case, removing some near-duplicate code.
The first time through, startNumFields should be 0; there ought to be
more optimization possible here, assuming later docs have fields similar
to those of the first doc.
- VisitableDocValueFields API for the persisted DV field list
- making dv configs overridable at the field level
- enabling on-the-fly/runtime un-inverting of doc values
- a few unit test updates
The previous approach used a SetEventCallback method which allowed
you to change the callback; unfortunately, that also included
times after the goroutines were started and potentially firing
the callback.
Checking a lock for this would be too expensive, so instead we go
for an approach that allows callbacks to be registered by name
during process init(). Then, upon opening an index, a string
config key 'eventCallbackName' is used to look up the
appropriate callback function. Also, since this string config
name is serializable, it fits into the existing bleve index
metadata without any new issues.
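A minimal sketch of the registration-by-name pattern (identifiers here are illustrative, not the actual scorch names, apart from the 'eventCallbackName' config key):

```go
package eventsketch

import "fmt"

type Event struct {
	Kind string
}

type EventCallback func(Event)

// eventCallbacks is only written during process init(), before any index
// (and its goroutines) is opened, so it never changes while callbacks may
// be firing.
var eventCallbacks = map[string]EventCallback{}

// RegisterEventCallback is meant to be called from an application's init().
func RegisterEventCallback(name string, fn EventCallback) {
	eventCallbacks[name] = fn
}

// resolveEventCallback is what index-open would do with the serializable
// string config value stored under "eventCallbackName".
func resolveEventCallback(config map[string]interface{}) (EventCallback, error) {
	name, ok := config["eventCallbackName"].(string)
	if !ok || name == "" {
		return nil, nil // no callback configured
	}
	cb, ok := eventCallbacks[name]
	if !ok {
		return nil, fmt.Errorf("unknown eventCallbackName: %q", name)
	}
	return cb, nil
}
```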
The zap command-line tool has been added to the main bleve
command-line tool. This required physical relocation due to the
vendoring used only by the bleve command-line tool (an unforeseen
limitation).
A new scorch command-line tool has also been introduced, and for
the same reason it is physically stored under the top-level bleve
command-line tool as well.
New APIs:
+ RollbackPoints()
- Retrieves the available list of rollback points: epoch + meta.
- The application will need to inspect the meta to decide
on the rollback point.
+ Rollback()
- The API requires a rollback point identified by the first API.
- Atomically & durably rolls back the index to the specified point,
provided the specified rollback point is still available
(see the usage sketch after this list).
+ Unit test: TestIndexRollback
- Writes a first batch.
- Sets the rollback point.
- Writes a second batch.
- Rolls back to the previously decided point.
- Ensures the data is as it was before the second batch.
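A hypothetical usage sketch; the method shapes, the acceptable() helper, and the way the rollback point's meta is consulted are inferred from the description above, not copied from the real signatures:

```go
package rollbacksketch

import (
	"fmt"

	"github.com/blevesearch/bleve/index/scorch"
)

// chooseAndRollback lists the available rollback points, lets the
// application decide (via its own meta check) which one to use, and then
// rolls the index back to it.
func chooseAndRollback(idx *scorch.Scorch,
	acceptable func(point *scorch.RollbackPoint) bool) error {

	points, err := idx.RollbackPoints() // each point carries an epoch + meta
	if err != nil {
		return err
	}

	for _, point := range points {
		if !acceptable(point) { // application-level check against the meta
			continue
		}
		// Atomically & durably rolls back, provided this rollback point
		// is still available.
		return idx.Rollback(point)
	}

	return fmt.Errorf("no acceptable rollback point available")
}
```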
Inserted into the list of dv-enabled fields - DocValueFields -
in the mem segment.
Moved back to the original type `DocValueFields map[uint16]bool`
for easy lookup to check whether a fieldID is
configured for dv storage.
Observed problem:
Persisted index state (in root bolt) would contain index snapshots which
pointed to index files that did not exist.
Debugging this uncovered two main problems:
1. At the end of persisting a snapshot, the persister creates a new index
snapshot with the SAME epoch as the current root, only it replaces in-memory
segments with the new disk based ones. This is problematic because reference
counting an index segment triggers "eligible for deletion". And eligible for
deletion is keyed by epoch. So having two separate instances going by the same
epoch is problematic. Specifically, one of them gets to 0 before the other,
and we wrongly conclude it's eligible for deletion, when in fact the "other"
instance with the same epoch is actually still in use.
To address this problem, we have modified the behavior of the persister. Now,
upon completion of persistence, ONLY if new files were actually created do we
proceed to introduce a new snapshot. AND, this new snapshot now gets its own
brand new epoch. BOTH of these are important because since the persister now
also introduces a new epoch, it will see this epoch again in the future AND be
expected to persist it. That is OK (mostly harmless), but we cannot allow it
to form a loop. Checking that new files were actually introduced is what
short-circuits the potential loop. The new epoch introduced by the persister,
if seen again will not have any new segments that actually need persisting to
disk, and the cycle is stopped.
2. The implementation of NumSnapshotsToKeep, and the related code to delete
old snapshots from the root bolt, also contain problems. Specifically, the
determination of which snapshots to keep vs delete did not consider which ones
were actually persisted. So, let's say you had set NumSnapshotsToKeep to 3; if
the introducer gets 3 snapshots ahead of the persister, what can happen is that
the three snapshots we choose to keep are all in memory. We now wrongly delete
all of the snapshots from the root bolt. But it gets worse: at this instant in
time, we now have files on disk that nothing in the root bolt points to, so we
also go ahead and delete those files. Those files were still being referenced
by the in-memory snapshots. But, now even if they get persisted to disk, they
simply have references to non-existent files. Opening up one of these indexes
results in lost data (often everything).
To address this problem, we made a large change to the way this section of code
operates. First, we now start with a list of all epochs actually persisted in
the root bolt. Second, we set aside NumSnapshotsToKeep of these snapshots to
keep. Third, anything else in the eligibleForRemoval list will be deleted
(a simplified sketch follows below). I suspect this code is slower and less
elegant, but I think it is more correct.
Also, previously NumSnapshotsToKeep defaulted to 0, I have now defaulted it to
1, which feels like saner out-of-the-box behavior (though it's debatable if the
original intent was perhaps instead for "extra" snapshots to keep, but with the
variable named as it is, 1 makes more sense to me).
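A simplified sketch of the revised retention logic, under assumed identifiers (the real code works with snapshot metadata in the root bolt rather than bare epoch slices):

```go
// epochsToRemove keeps the newest numSnapshotsToKeep persisted epochs and
// returns everything else from eligibleForRemoval for deletion.
// persistedEpochs is assumed to be sorted newest first.
func epochsToRemove(persistedEpochs, eligibleForRemoval []uint64,
	numSnapshotsToKeep int) []uint64 {

	// Set aside the persisted snapshots we intend to keep.
	keep := map[uint64]bool{}
	for i, epoch := range persistedEpochs {
		if i >= numSnapshotsToKeep {
			break
		}
		keep[epoch] = true
	}

	// Anything else that is eligible for removal gets deleted.
	var rv []uint64
	for _, epoch := range eligibleForRemoval {
		if !keep[epoch] {
			rv = append(rv, epoch)
		}
	}
	return rv
}
```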
Other minor changes included in this change:
- Location of 'nextSnapshotEpoch', 'eligibleForRemoval', and
'ineligibleForRemoval' members of Scorch struct were moved into the
paragraph with 'rootLock' to clarify that you must hold the lock to access them.
- TestBatchRaceBug260 was updated to properly Close() the index, which leads to
occasional test failures.
+ Track memory usage at a segment level
+ Add a new scorch API: MemoryUsed()
- Aggregate the memory consumption across
segments when the API is invoked (see the sketch after this list).
+ TODO:
- Revisit whether the second iteration can be gotten
rid of, with the size accounted for during the first
pass while building an in-mem segment.
- Accounting for pointer and slice overhead.
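For illustration, the aggregation behind MemoryUsed() amounts to something like the following, assuming each segment can report its own size (the interface and function names here are illustrative):

```go
// sizer is the one capability the aggregation needs from each segment.
type sizer interface {
	SizeInBytes() uint64
}

// memoryUsed sums the per-segment sizes; scorch's MemoryUsed() does this
// over the segments of the current root snapshot, holding the root lock
// while it reads them.
func memoryUsed(segments []sizer) uint64 {
	var total uint64
	for _, seg := range segments {
		total += seg.SizeInBytes()
	}
	return total
}
```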
+ Adding new entries to the stats struct of scorch.
+ These stats are atomically incremented upon every segment
introduction, and upon successful persistence.
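A minimal sketch of the atomic-counter pattern, with hypothetical field and method names:

```go
package statssketch

import "sync/atomic"

// Stats holds counters that are bumped with atomic operations, so the
// introducer and persister goroutines can update them without locks.
type Stats struct {
	TotIntroducedSegments uint64
	TotPersistedSnapshots uint64
}

// onSegmentIntroduced is called for every segment introduction.
func (s *Stats) onSegmentIntroduced() {
	atomic.AddUint64(&s.TotIntroducedSegments, 1)
}

// onPersistSuccess is called after each successful persistence.
func (s *Stats) onPersistSuccess() {
	atomic.AddUint64(&s.TotPersistedSnapshots, 1)
}
```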