bleve

Author	SHA1	Message	Date
Sreekanth Sivasankaran	683e195ac4	adding empty segment handling during introduction cleaning up the segment live size check	2018-02-24 07:03:27 +05:30
Sreekanth Sivasankaran	606a270669	Fix for empty segment merge handling Avoid creating new files with empty segments tasks during the merge operation, skips the incorrect appending of a newer segment during merge.	2018-02-15 16:44:20 +05:30
Steve Yen	175f80403a	Merge pull request #747 from steveyen/master scorch zap DictIterator term count fixed and more merge unit tests	2018-02-01 10:13:18 -08:00
Abhinav Dangeti	c24f8944c4	Merge pull request #738 from abhinavdangeti/scorch-stats Add support for certain disk stats	2018-02-01 08:35:59 -08:00
Steve Yen	93b037cdbb	scorch zap TestMergeWithUpdates()	2018-01-31 11:44:41 -08:00
Steve Yen	4dd64b68fa	scorch zap TestMergeWithEmptySegment(s)	2018-01-30 22:27:40 -08:00
Steve Yen	684ee3c0e7	scorch zap DictIterator term count fixed and more merge unit tests The zap DictionaryIterator Next() was incorrectly returning the postingsList offset as the term count. As part of this, refactored out a PostingsList.read() helper method. Also added more merge unit test scenarios, including merging a segment for a few rounds to see if there are differences before/after merging.	2018-01-30 21:22:06 -08:00
Steve Yen	634cfa0560	scorch zap chunkedIntCoder optimization to prealloc some final buf	2018-01-29 11:03:53 -08:00
Steve Yen	a444c25ddf	scorch zap merge uses array for docTermMap with no sorting Instead of sorting docNum keys from a hashmap, this change instead iterates from docNum 0 to N and uses an array instead of hashmap. The array is also reused across outer loop iterations. This optimizes for when there's a lot of structural similarity between docs, where many/most docs have the same fields. i.e., beers, breweries. If every doc has completely different fields, then this change might produce worse behavior compared to the previous sparse hashmap approach.	2018-01-29 10:47:08 -08:00
Steve Yen	745575a6c1	scorch zap mergeStoredAndRemap uses array indexing, not append() Since we have right array size preallocated, we don't need the extra capacity checking of append().	2018-01-27 11:35:10 -08:00
Steve Yen	8dd17a3b20	scorch zap mergeStoredAndRemap uses continue for less indentation	2018-01-27 11:35:10 -08:00
Steve Yen	0041664bc4	scorch zap merge computeNewDocCount() optimize 1 variable	2018-01-27 11:35:10 -08:00
Steve Yen	6985db13a0	scorch zap merge reuses docNumbers array	2018-01-27 11:35:10 -08:00
Steve Yen	916bbf4125	scorch zap merge prealloc's docTermMap capacity	2018-01-27 11:35:10 -08:00
Steve Yen	56cdb68f35	scorch zap merge checks err2 not err Also, optimize the appending of the termSeparator so that the docTermMap is accessed and updated just once.	2018-01-27 11:35:10 -08:00
Steve Yen	3030d4edb5	scorch zap merge preallocs segNewDocNums capacity	2018-01-27 11:35:10 -08:00
Steve Yen	9038d75c98	scorch zap allocate govarint.U64Base128Encoder just once Instead of allocating a govarint.U64Base128Encoder in the inner loop, allocate it just once on the outside, as it appears that it's just a thin wrapper around binary.PutUvarint().	2018-01-27 11:35:10 -08:00
Steve Yen	10dd5489c2	scorch zap Dict.postingsList() takes []byte for more mem control This allows callers that already have a []byte term to avoid string'ification garbage.	2018-01-27 11:35:10 -08:00
Steve Yen	6a17ff48c7	scorch zap removed unneeded []byte cast of term	2018-01-27 11:35:10 -08:00
Steve Yen	d389e2bb40	scorch zap merge file cleanup on error, and some minor prealloc's	2018-01-27 11:35:10 -08:00
Steve Yen	29d526a7c2	scorch zap merge uses DefaultChunkFactor	2018-01-27 11:35:10 -08:00
Steve Yen	603425c2c5	scorch zap mergerLoop missing fireAsyncError case	2018-01-27 11:35:10 -08:00
Steve Yen	37121c3b49	scorch zap writeRoaringWithLen optimized with reused bufs	2018-01-27 11:35:10 -08:00
Steve Yen	5a035dc9aa	scorch zap in-memory segment representation (SegmentBase) The zap SegmentBase struct is a refactoring of the zap Segment into the subset of fields that are needed for read-only ops, without any persistence related info. This allows us to use zap's optimized data encoding as scorch's in-memory segments. The zap Segment struct now embeds a zap SegmentBase struct, and layers on persistence. Both the zap Segment and zap SegmentBase implement scorch's Segment interface.	2018-01-27 11:35:10 -08:00
Steve Yen	dc62324e02	scorch zap miscellaneous typos	2018-01-27 11:35:10 -08:00
abhinavdangeti	567d756c27	Add support for certain disk stats + num_bytes_used_disk + num_files_on_disk	2018-01-24 14:10:14 -08:00
Steve Yen	34fd77709f	scorch unlocks in introduceSegment's DocNumbers() error codepath	2018-01-20 17:17:16 -08:00
abhinavdangeti	1176c73a9c	Include overhead from data structures in segment's SizeInBytes + Account for all the overhead incurred from the data structures within mem.Segment and zap.Segment. - SizeOfMap = 8 - SizeOfPointer = 8 - SizeOfSlice = 24 - SizeOfString = 16 + Include overhead from certain new fields as well.	2018-01-17 11:11:44 -08:00
Steve Yen	71d6d1691b	scorch zap optimizations of inner loops and easy preallocs	2018-01-15 23:04:23 -08:00
Steve Yen	d682c85a7b	scorch mem segments uses backing array trick even more This change invokes make() only once per distinct type to allocate the large, contiguous backing arrays for the mem segment.	2018-01-15 19:17:39 -08:00
Steve Yen	0f19b542a3	scorch mem segment prealloc's Locfields/starts/ends/pos/arraypos This change preallocates more of the backing arrays for Locfields, Locstarts, Locends, Locpos, Locarraypos sub-slices of a scorch mem segment. On small bleve-blast tests (50K wiki docs) on a dev macbook, scorch indexing throughput seems to improve from 15MB/sec to 20MB/sec after the recent series of preallocation changes.	2018-01-15 18:40:28 -08:00
Steve Yen	a84bd122d2	scorch mem segment preallocates sub-slices via # terms This change tracks the number of terms per posting list to preallocate the sub-slices for the Freqs & Norms.	2018-01-15 18:20:43 -08:00
Steve Yen	a4110d325c	scorch mem segment preallocates slices that are key'ed by postingId The scorch mem segment build phase uses the append() idiom to populate various slices that are keyed by postings list id's. These slices include... * Postings * PostingsLocs * Freqs * Norms * Locfields * Locstarts * Locends * Locpos * Locarraypos This change introduces an initialization step that preallocates those slices up-front, by assigning postings list id's to terms up-front. This change also has an additional effect of simplifying the processDocument() logic to no longer have to worry about a first-time initialization case, removing some duplicate'ish code.	2018-01-15 16:53:39 -08:00
Steve Yen	917c470791	scorch mem segment VisitDocument() accesses StoredTypes/Pos outside of loop	2018-01-15 11:54:46 -08:00
Steve Yen	e7bd6026eb	scorch mem segment preallocs docMap/fieldLens with capacity The first time through, startNumFields should be 0, where there ought to be more optimization assuming later docs have similar fields as the first doc.	2018-01-15 11:52:20 -08:00
Steve Yen	d777d7c365	scorch mem segment comments consistency	2018-01-15 11:08:21 -08:00
Marty Schoch	4e82a8a0ca	Merge pull request #726 from sreekanth-cb/docValue_configs DocValue Config, new API Changes	2018-01-10 18:11:18 -05:00
Sreekanth Sivasankaran	53aef2104e	fixing err handling in UTs, name changes	2018-01-10 22:00:26 +05:30
abhinavdangeti	43bfcc00c9	Do not account mmap'ed part of zap segments in MemoryUsed This API is designed to emit only the dirty "unpersisted" bytes. This does not include the mmap'ed part in the zap segments (disk).	2018-01-09 09:43:53 -08:00
Sreekanth Sivasankaran	4c256f5669	DocValue Config, new API Changes -VisitableDocValueFields API for persisted DV field list -making dv configs overridable at field level -enabling on the fly/runtime un inverting of doc values -few UT updates	2018-01-08 10:58:33 +05:30
Marty Schoch	1788a03803	remove junk from end of scorch readme	2018-01-06 21:09:53 -05:00
Marty Schoch	e756c7acf0	add initial support for async error callback	2018-01-05 16:43:16 -05:00
Marty Schoch	6237479605	fix race condition in setting up event callbacks previous approach used SetEventCallback method which allowed you to change the callback, unfortunately that also included times after the goroutines were started and potentially firing the callback. checking lock on this would be too expensive, so instead we go for an approach that allows callbacks to be registered by name during process init(), then upon opening up an index a string config key 'eventCallbackName' is used to look up the appropriate callback function. also, since this string config name is serializable, it fits into the existing bleve index metadata without any new issues.	2018-01-05 13:46:03 -05:00
Marty Schoch	57a075afdb	improving command-line tool for scorch	2018-01-05 11:50:07 -05:00
Marty Schoch	c691cd2bb5	refactor scorch/zap command-line tools under bleve zap command-line tool added to main bleve command-line tool this required physical relocation due to the vendoring used only on the bleve command-line tool (unforeseen limitation) a new scorch command-line tool has also been introduced and for the same reasons it is physically stored under the top-level bleve command-line tool as well	2018-01-05 10:17:18 -05:00
Abhinav Dangeti	dee1dd9bc8	Merge pull request #720 from abhinavdangeti/scorch Updated Rollback APIs	2018-01-04 14:51:33 -08:00
abhinavdangeti	111f0d0721	Updated Rollback APIs New APIs: + RollbackPoints() - Retrieves the available list of rollback points: epoch+meta. - The application will need to check with the meta to decide on the rollback point. + Rollback() - API requires a rollback point identified by the first API. - Atomically & Durably rolls back the index to specified point, provided the specified rollback point is still available. + Unit test: TestIndexRollback - Writes a batch. - Sets the rollback point. - Writes second batch. - Rollback to previously decided point. - Ensure that data is as is before the second batch.	2018-01-04 13:21:58 -08:00
Marty Schoch	71cdac785d	Merge pull request #703 from sreekanth-cb/docValue_persisted docValue persist changes	2018-01-04 10:34:58 -05:00
Sreekanth Sivasankaran	71a726bbf6	perf issue was due to duplicate fieldIDs getting inserted to the list of dv enabled fields list - DocValueFields in mem segment. Moved back to the original type `DocValueFields map[uint16]bool` for easy look up to check whether the fieldID is configured for dv storage.	2018-01-04 15:34:55 +05:30
Sreekanth Sivasankaran	f42ecb0ac7	docvalue "zap-path" cmd to print out the dv disk sizes	2018-01-04 13:58:51 +05:30

164 Commits