This commit adds boolean flag params to the scorch
PostingsList.Iterator() method, so that the caller can specify whether
freq/norm/locs information is needed or not.
Future changes can leverage these params for optimizations.
The previous code would inefficiently throw away the nextLocs slice and
would also throw away the []segment.Location slice when there were no
locations, such as with a 1-hit postings list.
This change reuses the nextLocs/nextSegmentLocs slices in all cases.
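A minimal sketch of what an Iterator() with such boolean flags might look like; the struct and field names here are illustrative, not the exact bleve API:

```go
package main

import "fmt"

// Hypothetical sketch of a postings list whose Iterator() takes
// boolean flags so callers can skip decoding data they don't need.
type PostingsIterator struct {
	includeFreqNorm bool
	includeLocs     bool
}

type PostingsList struct{}

// Iterator returns an iterator that only decodes the requested parts.
func (p *PostingsList) Iterator(includeFreq, includeNorm, includeLocs bool) *PostingsIterator {
	return &PostingsIterator{
		includeFreqNorm: includeFreq || includeNorm,
		includeLocs:     includeLocs,
	}
}

func main() {
	it := (&PostingsList{}).Iterator(true, false, false)
	fmt.Println(it.includeFreqNorm, it.includeLocs) // freq wanted, locs skipped
}
```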
The snappy Encode()/Decode() APIs accept an optional destination buffer
param into which the encoded/decoded output is placed, but they only
check that the buffer has enough len() rather than enough cap()
before deciding to allocate a new buffer.
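The practical consequence is that a zero-length buffer with plenty of capacity still triggers an allocation, and the caller-side workaround is to re-slice the buffer to its full capacity first. A sketch, using a hypothetical stand-in that mimics snappy's len()-based check rather than the real library:

```go
package main

import "fmt"

// encodeInto mimics snappy.Encode's behavior: it allocates a fresh
// dst when len(dst) is too small, even if cap(dst) would suffice.
func encodeInto(dst, src []byte) []byte {
	if len(dst) < len(src) { // checks len, not cap
		dst = make([]byte, len(src))
	}
	copy(dst, src)
	return dst[:len(src)]
}

func main() {
	src := []byte("hello")
	buf := make([]byte, 0, 64) // plenty of capacity, zero length
	out := encodeInto(buf, src)
	fmt.Println(cap(out) == cap(buf)) // false: reallocated despite capacity

	// Workaround: re-slice to full capacity before passing it in.
	out = encodeInto(buf[:cap(buf)], src)
	fmt.Println(cap(out) == cap(buf)) // true: the buffer was reused
}
```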
The previous commit's optimization that replaced the locsBitmap
mishandled the 1-hit encoding optimization in the
postingsIterator.nextBytes() method, generating incorrect
freq-norm bytes.
Also as part of this change, more unused locsBitmap's were removed.
This is attempt #2 of the optimization that replaces the locsBitmap,
without any changes from the original commit attempt. A commit that
follows this one contains the actual fix.
See also...
- commit 621b58dd83 (the 1st attempt)
- commit 49a4ee60ba (the revert)
-------------
The original commit message body from 621b58 was...
NOTE: this is a zap file format change.
The separate "postings locations" roaring Bitmap that encoded whether
a posting has locations info is now replaced by the least significant
bit in the freq varint encoded in the freq-norm chunkedIntCoder.
encode/decodeFreqHasLocs() are added as helper functions.
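A sketch of the helper pair described above: the has-locations flag rides in the least significant bit of the freq varint, replacing the separate locations roaring Bitmap. This is illustrative; see the zap source for the exact helpers:

```go
package main

import "fmt"

// encodeFreqHasLocs packs the has-locations flag into the freq's LSB.
func encodeFreqHasLocs(freq uint64, hasLocs bool) uint64 {
	rv := freq << 1
	if hasLocs {
		rv |= 1 // LSB == 1 means this posting has locations
	}
	return rv
}

// decodeFreqHasLocs recovers the freq and the has-locations flag.
func decodeFreqHasLocs(v uint64) (freq uint64, hasLocs bool) {
	return v >> 1, v&1 != 0
}

func main() {
	v := encodeFreqHasLocs(3, true)
	freq, hasLocs := decodeFreqHasLocs(v)
	fmt.Println(v, freq, hasLocs) // 7 3 true
}
```

Since freqs are varint-encoded, the extra low bit costs at most one byte per posting, versus a whole roaring Bitmap per postings list.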
Testing with the cbft application led to cbft process exits...
AsyncError exit()... error reading location field: EOF --
main.initBleveOptions.func1() at init_bleve.go:85
This reverts commit 621b58dd83.
NOTE: this is a zap file format change.
The separate "postings locations" roaring Bitmap that encoded whether
a posting has locations info is now replaced by the least significant
bit in the freq varint encoded in the freq-norm chunkedIntCoder.
encode/decodeFreqHasLocs() are added as helper functions.
Use of sync.Pool to reuse the interim structure relied on resetting
the fieldsInv slice. However, actual segments continued to use
this same fieldsInv slice after it was returned to the pool. The
simple fix is to nil out the fieldsInv slice in the reset method and
let the newly built segment keep the one from the interim struct.
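A hypothetical sketch of that fix; the struct here is reduced to the one field that matters:

```go
package main

import (
	"fmt"
	"sync"
)

// interim is a reduced stand-in for the pooled build structure.
type interim struct {
	fieldsInv []string
}

func (s *interim) reset() {
	// Do NOT do s.fieldsInv = s.fieldsInv[:0] here: the newly built
	// segment keeps the old backing array, so reusing it would let
	// the pooled interim scribble over the live segment's fields.
	s.fieldsInv = nil
}

var interimPool = sync.Pool{New: func() interface{} { return &interim{} }}

func main() {
	s := interimPool.Get().(*interim)
	s.fieldsInv = []string{"_id", "body"}
	segmentFields := s.fieldsInv // the segment keeps this slice
	s.reset()
	interimPool.Put(s)
	fmt.Println(segmentFields, s.fieldsInv == nil) // [_id body] true
}
```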
mergeFields depends on the fields from the various segments being
sorted for the fieldsSame comparison to work.
Of note, the 'fieldi > 1' guard skips the 0th field, which should
always be the '_id' field.
The optimization recently introduced in commit 530a3d24cf,
("scorch zap optimize merge by byte copying freq/norm/loc's") was to
byte-copy freq/norm/loc data directly during merging. But, it was
incorrect if the fields were different across segments.
This change now performs the byte-copying merge optimization only
when the fields are the same across segments; otherwise it falls back
to the old approach of deserializing & re-serializing the
freq/norm/loc information, which includes the important step of
remapping fieldIDs.
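A sketch of the guard this implies: the fast path is only safe when every segment has an identical (sorted) fields list, since otherwise the same fieldID can mean different fields in different segments. Illustrative, not the exact bleve code:

```go
package main

import "fmt"

// fieldsSame reports whether every segment's fields list is identical
// to the first segment's, which is what makes byte-copying freq/norm/loc
// data safe (no fieldID remapping needed).
func fieldsSame(segFields [][]string) bool {
	if len(segFields) == 0 {
		return true
	}
	first := segFields[0]
	for _, fields := range segFields[1:] {
		if len(fields) != len(first) {
			return false
		}
		for i, f := range fields {
			if f != first[i] {
				return false
			}
		}
	}
	return true
}

func main() {
	a := []string{"_id", "body", "title"}
	b := []string{"_id", "body", "title"}
	c := []string{"_id", "title"}
	fmt.Println(fieldsSame([][]string{a, b})) // true: fast path OK
	fmt.Println(fieldsSame([][]string{a, c})) // false: must remap fieldIDs
}
```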
See also: https://issues.couchbase.com/browse/MB-28781
pprof of bleve-blast showed file merging spending a lot of time in
syscall/write. bufio.NewWriter() provides a default buffer size of
4KB, which is too small; using bufio.NewWriterSize() with a 1MB
buffer makes syscall/write drop out of the file-merging flame graphs.
This change detects whether a deletion bitmap is empty, and treats
that as a nil bitmap, which allows further postings iterator codepaths
to avoid roaring bitmap operations (like, AndNot(docNums, drops)).
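A hypothetical sketch of that check; a tiny stand-in type is used here in place of roaring.Bitmap, whose emptiness test the real code would call:

```go
package main

import "fmt"

// bitmap is a stand-in for roaring.Bitmap, just enough for the sketch.
type bitmap struct{ docs map[uint32]bool }

func (b *bitmap) IsEmpty() bool { return len(b.docs) == 0 }

// normalizeDrops treats an empty deletion bitmap as nil so later
// postings-iterator codepaths can skip AndNot(docNums, drops) entirely.
func normalizeDrops(drops *bitmap) *bitmap {
	if drops != nil && drops.IsEmpty() {
		return nil
	}
	return drops
}

func main() {
	empty := &bitmap{docs: map[uint32]bool{}}
	fmt.Println(normalizeDrops(empty) == nil) // true: empty treated as nil
	live := &bitmap{docs: map[uint32]bool{7: true}}
	fmt.Println(normalizeDrops(live) == live) // true: real deletions kept
}
```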
By memoizing the size of index snapshots and their constituent
parts, we significantly reduce the amount of time the lock is held
in the app_herder when calculating the total memory used.
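The shape of the memoization: compute the size once when the snapshot is built, so Size() under the lock becomes a field read instead of a recursive walk. Names here are illustrative, not the exact bleve types:

```go
package main

import "fmt"

type segmentSnapshot struct{ size uint64 }

type indexSnapshot struct {
	segments []*segmentSnapshot
	size     uint64 // memoized at construction
}

// newIndexSnapshot walks the constituent parts exactly once.
func newIndexSnapshot(segments []*segmentSnapshot) *indexSnapshot {
	s := &indexSnapshot{segments: segments}
	for _, seg := range segments {
		s.size += seg.size
	}
	return s
}

// Size is O(1), so any lock around it is held only briefly.
func (s *indexSnapshot) Size() uint64 { return s.size }

func main() {
	snap := newIndexSnapshot([]*segmentSnapshot{{size: 100}, {size: 250}})
	fmt.Println(snap.Size()) // 350
}
```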
In this change, if the postings/postingsLocs slices need to be grown,
then copy over and reuse any of the preallocated roaring Bitmap's from
the old slice.
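A sketch of that grow-and-reuse pattern, with a stub type standing in for roaring.Bitmap:

```go
package main

import "fmt"

// bitmap stands in for roaring.Bitmap; the real code reuses
// preallocated roaring Bitmaps the same way.
type bitmap struct{}

// growBitmaps grows a slice of bitmap pointers to n entries, copying
// over (and thus reusing) Bitmaps already allocated in the old slice
// rather than letting them become garbage, and filling any gaps.
func growBitmaps(old []*bitmap, n int) []*bitmap {
	var grown []*bitmap
	if n <= cap(old) {
		grown = old[:n]
	} else {
		grown = make([]*bitmap, n)
		copy(grown, old) // reuse previously allocated Bitmaps
	}
	for i := range grown {
		if grown[i] == nil {
			grown[i] = &bitmap{}
		}
	}
	return grown
}

func main() {
	old := []*bitmap{{}, {}}
	first := old[0]
	grown := growBitmaps(old, 5)
	fmt.Println(len(grown), grown[0] == first) // 5 true
}
```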
Merge the doc value length and loc slices into a single offset
slice, as that is enough to compute the starting offset and length
of the doc values data for a given document inside a docValue chunk.
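The arithmetic this relies on: with a single offsets slice (plus one trailing end marker), the i'th doc's doc-values bytes are data[offsets[i]:offsets[i+1]], so one slice yields both start and length. A small illustrative sketch:

```go
package main

import "fmt"

// docValueRange computes a doc's start offset and byte length within
// a chunk from a single offsets slice. offsets carries one extra
// trailing entry marking the end of the chunk's data.
func docValueRange(offsets []uint64, docIdx int) (start, length uint64) {
	start = offsets[docIdx]
	length = offsets[docIdx+1] - offsets[docIdx]
	return start, length
}

func main() {
	offsets := []uint64{0, 12, 12, 40} // doc 1 has no doc values
	s, l := docValueRange(offsets, 2)
	fmt.Println(s, l) // 12 28
}
```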
This commit avoids creating roaring.Bitmap's (which would have just a
single entry) when a postings list/iterator represents a single
"1-hit" encoding.
AnalysisResultsToSegmentBase() allows analysis results to be directly
converted into a zap-encoded SegmentBase, which can then be introduced
onto the root, avoiding the creation of mem.Segment data structures.
This leads to some reduction of garbage memory allocations.
The grouping and sorting and shaping of the postings list information
is taken from the mem.Segment codepaths.
The encoding of stored fields reuses functions from zap's merger,
which provides the largest savings in avoided garbage allocations.
And, the encoding of tf/loc chunks, postings & dictionary information
also follows the approach used by zap's merger, yielding further
savings in avoided garbage.
In future changes, the mem.Segment dependencies will be removed from
zap, which should result in a smaller codebase.
NOTE: this is a scorch zap file format change / bump to version 4.
In this optimization, the uint64 val stored in the vellum FST (term
dictionary) may now be either a uint64 postingsOffset (as before
this change) or a uint64 encoding of the docNum + norm (for the case
where a term appears in just a single doc).
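One hypothetical way to pack such a 1-hit value, using a marker bit to distinguish it from a plain postingsOffset; the bit layout here is illustrative, not the exact zap version 4 encoding:

```go
package main

import "fmt"

// oneHitMarker distinguishes a direct docNum+norm encoding from a
// postingsOffset (which, being a file offset, stays well below 2^63).
const oneHitMarker = uint64(1) << 63

// encode1Hit packs the docNum into the low 31 bits and the norm's
// float32 bits above it. Illustrative layout only.
func encode1Hit(docNum uint64, normBits uint64) uint64 {
	return oneHitMarker | (normBits << 31) | (docNum & ((1 << 31) - 1))
}

// decode1Hit returns ok=false when the value is a plain postingsOffset.
func decode1Hit(v uint64) (docNum uint64, normBits uint64, ok bool) {
	if v&oneHitMarker == 0 {
		return 0, 0, false
	}
	return v & ((1 << 31) - 1), (v >> 31) & 0xffffffff, true
}

func main() {
	v := encode1Hit(42, 0x3f800000) // 0x3f800000 == bits of float32(1.0)
	docNum, normBits, ok := decode1Hit(v)
	fmt.Println(docNum, normBits == 0x3f800000, ok) // 42 true true
}
```

The payoff is that a single-doc term needs no postings list, no roaring Bitmap, and no freq/norm chunk at all: the FST entry alone answers the lookup.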