0
0
Fork 0
Commit Graph

43 Commits

Author SHA1 Message Date
Steve Yen 7a19e6fd7e scorch zap replace locsBitmap w/ 1 bit from freq-norm varint encoding
This is attempt #2 of the optimization that replaces the locsBitmap,
without any changes from the original commit attempt.  A commit that
follows this one contains the actual fix.

See also...
- commit 621b58dd83 (the 1st attempt)
- commit 49a4ee60ba (the revert)

-------------
The original commit message body from 621b58 was...

NOTE: this is a zap file format change.

The separate "postings locations" roaring Bitmap that encoded whether
a posting has locations info is now replaced by the least significant
bit in the freq varint encoded in the freq-norm chunkedIntCoder.

encode/decodeFreqHasLocs() are added as helper functions.
2018-03-23 12:50:24 -07:00
Steve Yen 49a4ee60ba Revert "scorch zap replace locsBitmap w/ 1 bit from freq-norm varint encoding"
Testing with the cbft application led to cbft process exits...

  AsyncError exit()... error reading location field: EOF --
  main.initBleveOptions.func1() at init_bleve.go:85

This reverts commit 621b58dd83.
2018-03-23 10:01:30 -07:00
Steve Yen 621b58dd83 scorch zap replace locsBitmap w/ 1 bit from freq-norm varint encoding
NOTE: this is a zap file format change.

The separate "postings locations" roaring Bitmap that encoded whether
a posting has locations info is now replaced by the least significant
bit in the freq varint encoded in the freq-norm chunkedIntCoder.

encode/decodeFreqHasLocs() are added as helper functions.
2018-03-22 17:43:07 -07:00
Marty Schoch f1c26e29f0
Merge branch 'master' into avoid-app-herder-hot-lock 2018-03-16 10:30:34 -04:00
Sreekanth Sivasankaran 53c3cab512
Merge branch 'master' into minor_docvalue_space_savings 2018-03-16 08:53:57 +05:30
Marty Schoch 45e0e5c666 memoize the size of an entire index snapshot
by memoizing the size of index snapshots and their
constituent parts, we significantly reduce the amount
of time that the lock is held in the app_herder, when
calculating the total memory used
2018-03-15 17:25:05 -04:00
Sreekanth Sivasankaran d1155c223a zap version bump, changed the offset slice format
,UTs
2018-03-15 23:25:53 +05:30
Sreekanth Sivasankaran 19318194fa moving to new offset slice format 2018-03-13 14:06:48 +05:30
Steve Yen 5abf7b7a19 scorch zap remove mem.Segment usage from persist / build.go 2018-03-09 15:23:58 -08:00
Steve Yen eac9808990 scorch zap optimize FST val encoding for terms with 1 hit
NOTE: this is a scorch zap file format change / bump to version 4.

In this optimization, the uint64 val stored in the vellum FST (term
dictionary) now may either be a uint64 postingsOffset (same as before
this change) or a uint64 encoding of the docNum + norm (in the case
where a term appears in just a single doc).
2018-03-08 09:19:54 -08:00
Steve Yen 79f28b7c93 scorch fix persistDocValues() err return 2018-03-07 09:11:10 -08:00
Steve Yen dde6c2e01b scorch zap optimize writeRoaringWithLen()
Before this change, writeRoaringWithLen() would leverage a reused
bytes.Buffer (#A) and invoke the roaring.WriteTo() API.

But, it turns out the roaring.WriteTo() API has a suboptimal
implementation, in that underneath-the-hood it converts the roaring
bitmap to a byte buffer (using roaring.ToBytes()), and then calls
Write().  But, that Write() turns out to be an additional memcpy into
the provided bytes.Buffer (#A).

By directly invoking roaring.ToBytes(), this change to
writeRoaringWithLen() avoids the extra memory allocation and memcpy.
2018-03-06 14:59:20 -08:00
Steve Yen b62ca996f6 scorch zap optimize chunkedIntCoder.Add() calls to use multiple vals
This change leverages the ability for the chunkedIntCoder.Add() method
to accept multiple input param values (via the '...' param signature),
meaning there are fewer Add() invocations.
2018-03-06 14:11:41 -08:00
Steve Yen 8f8fd511b7 scorch zap access freqs[offset] outside loop 2018-03-05 12:02:33 -08:00
Steve Yen a338386a03 scorch build optimize freq/loc slice capacity 2018-03-05 12:02:33 -08:00
Steve Yen 856778ad7b scorch zap build prealloc docNumbers capacity 2018-03-05 12:02:33 -08:00
Steve Yen 8c0881eab2 scorch zap build reuses mem postingsList/Iterator structs 2018-03-05 12:02:33 -08:00
Steve Yen 88c740095b scorch optimizations for mem.PostingsIterator.Next() & docTermMap
Due to the usage rules of iterators, mem.PostingsIterator.Next() can
reuse its returned Postings instance.

Also, there's a micro optimization in persistDocValues() for one fewer
access to the docTermMap in the inner-loop.
2018-03-03 11:31:18 -08:00
Marty Schoch 0363b24dd4 update to use new vellum Reset API 2018-03-01 09:37:39 -08:00
Steve Yen 720010783e scorch zap InitSegmentBase() helper func
Refactored out a zap.InitSegmentBase() func so that non-zap packages
can create SegmentBase instances.
2018-02-19 14:13:31 -08:00
Steve Yen 822457542e scorch zap VERSION bump: check whether fields are the same at merge
COMPATIBILITY NOTE: scorch zap version bumped in this commit.

The version bump is because mergeFields() now computes whether fields
are the same across segments and it relies on the previous commit
where fieldID's are assigned in field name sorted order (albeit with
_id field always having fieldID of 0).

Potential future commits might rely on this info that "fields are the
same across segments" for more optimizations, etc.
2018-02-08 09:06:30 -08:00
Steve Yen c09e2a08ca scorch zap chunkedContentCoder reuses chunk metadata slice memory
And, renamed the chunk MetaData.DocID field to DocNum for naming
correctness, where much of this commit is the mechanical effect of
that rename.
2018-02-05 07:39:16 -08:00
Steve Yen eb21bf8315 scorch zap merge & build share persistStoredFieldValues()
Refactored out a helper func, persistStoredFieldValues(), that both
the persistence and merge codepaths now share.
2018-02-05 07:38:55 -08:00
Steve Yen 634cfa0560 scorch zap chunkedIntCoder optimization to prealloc some final buf 2018-01-29 11:03:53 -08:00
Steve Yen 37121c3b49 scorch zap writeRoaringWithLen optimized with reused bufs 2018-01-27 11:35:10 -08:00
Steve Yen 5a035dc9aa scorch zap in-memory segment representation (SegmentBase)
The zap SegmentBase struct is a refactoring of the zap Segment into
the subset of fields that are needed for read-only ops, without any
persistence related info.  This allows us to use zap's optimized data
encoding as scorch's in-memory segments.

The zap Segment struct now embeds a zap SegmentBase struct, and layers
on persistence.  Both the zap Segment and zap SegmentBase implement
scorch's Segment interface.
2018-01-27 11:35:10 -08:00
Steve Yen dc62324e02 scorch zap miscellaneous typos 2018-01-27 11:35:10 -08:00
Steve Yen 71d6d1691b scorch zap optimizations of inner loops and easy preallocs 2018-01-15 23:04:23 -08:00
Sreekanth Sivasankaran 71a726bbf6 perf issue was due to duplicate fieldIDs getting
inserted to the list of dv enabled fields list -
DocValueFields in mem segment.
Moved back to the original type `DocValueFields map[uint16]bool`
for easy look up to check whether the fieldID is
configured for dv storage.
2018-01-04 15:34:55 +05:30
Sreekanth Sivasankaran 448201243a removed redundant buf writer, and checks 2017-12-30 16:54:06 +05:30
Sreekanth Sivasankaran c8df014c0c Updated readme, zap version, added new docvalue cmd,
fixed the footer and fields cmd,
interface name updated
2017-12-29 21:39:29 +05:30
Sreekanth Sivasankaran 76f827f469 docValue persist changes
docValues are persisted along with the index,
in a columnar fashion per field with variable
sized chunking for quick look up.
-naive chunk level caching is added per field
-data part inside a chunk is snappy compressed
-metaHeader inside the chunk index the dv values
 inside the uncompressed data part
-all the fields are docValue persisted in this iteration
2017-12-28 12:05:33 +05:30
Steve Yen f6b506134b import couchbase/vellum instead of couchbaselabs/vellum
Also, scrubbed an old couchbaselabs/moss reference in comments.

Also, go fmt.
2017-12-19 10:49:57 -08:00
Marty Schoch 85e15628ee major refactoring of posting details 2017-12-13 16:10:06 -05:00
Marty Schoch 6e2207c445 additional refactoring of build/merge 2017-12-13 15:22:13 -05:00
Marty Schoch 50441e5065 refactor to reuse shared code 2017-12-13 14:41:20 -05:00
Marty Schoch 289dc398bd more refacotring of build/merge 2017-12-13 14:26:11 -05:00
Marty Schoch 1cd3fd7fbe extrac common functionality between build/merge 2017-12-13 14:06:54 -05:00
Marty Schoch 58ef21a88a fix golint issue 2017-12-11 16:24:46 -05:00
Marty Schoch f13b786609 fix up issues to get all bleve unit tests passing for scorch
make scorch default
2017-12-11 15:47:41 -05:00
Marty Schoch 8280859bb8 handle read-only and in-mem only cases 2017-12-11 09:07:01 -05:00
Marty Schoch dc0adc8827 add fsync 2017-12-09 20:52:01 -05:00
Marty Schoch 9781d9b089 add initial version of zap file format 2017-12-09 14:28:33 -05:00