32 Commits

Author SHA1 Message Date
Steve Yen
d682c85a7b scorch mem segments uses backing array trick even more
This change invokes make() only once per distinct type to allocate the
large, contiguous backing arrays for the mem segment.
2018-01-15 19:17:39 -08:00
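The backing array idea in the commit above can be sketched roughly as follows. This is a minimal illustration rather than the scorch code: `carveSubSlices` and its `sizes` argument are hypothetical names, and the sketch assumes the per-list sizes are already known before allocation.

```go
package main

import "fmt"

// carveSubSlices (hypothetical helper) allocates one contiguous backing
// array with a single make() call and hands out one sub-slice per list.
// sizes[i] is the number of entries that list i is expected to hold.
func carveSubSlices(sizes []int) [][]uint64 {
	total := 0
	for _, n := range sizes {
		total += n
	}
	backing := make([]uint64, total) // the only large allocation
	out := make([][]uint64, len(sizes))
	offset := 0
	for i, n := range sizes {
		// Length 0, capacity n: the full slice expression caps the
		// capacity so append() on one sub-slice cannot spill into
		// the region owned by the next one.
		out[i] = backing[offset : offset : offset+n]
		offset += n
	}
	return out
}

func main() {
	lists := carveSubSlices([]int{2, 3})
	lists[0] = append(lists[0], 7, 8) // fills in place, no reallocation
	lists[1] = append(lists[1], 1)
	fmt.Println(lists[0], lists[1], cap(lists[1]))
}
```

Appending beyond a sub-slice's capacity would still work, but Go would then copy that sub-slice to a fresh array, losing the contiguity benefit for that list.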
Steve Yen
0f19b542a3 scorch mem segment prealloc's Locfields/starts/ends/pos/arraypos
This change preallocates more of the backing arrays for the Locfields,
Locstarts, Locends, Locpos, Locarraypos sub-slices of a scorch mem
segment.

On small bleve-blast tests (50K wiki docs) on a dev macbook, scorch
indexing throughput seems to improve from 15MB/sec to 20MB/sec after
the recent series of preallocation changes.
2018-01-15 18:40:28 -08:00
Steve Yen
a84bd122d2 scorch mem segment preallocates sub-slices via # terms
This change tracks the number of terms per posting list to
preallocate the sub-slices for the Freqs & Norms.
2018-01-15 18:20:43 -08:00
Steve Yen
a4110d325c scorch mem segment preallocates slices that are key'ed by postingId
The scorch mem segment build phase uses the append() idiom to populate
various slices that are keyed by postings list id's.  These slices
include...

* Postings
* PostingsLocs
* Freqs
* Norms
* Locfields
* Locstarts
* Locends
* Locpos
* Locarraypos

This change introduces an initialization step that preallocates those
slices up-front, by assigning postings list id's to terms up-front.

This change also has an additional effect of simplifying the
processDocument() logic to no longer have to worry about a first-time
initialization case, removing some duplicate'ish code.
2018-01-15 16:53:39 -08:00
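A rough sketch of the two-pass approach described in the commit above: assign ids to terms first, then size the id-keyed slice once. The names `assignIDs` and `buildCounts`, the 1-based numbering, and the simple occurrence counter are all assumptions made for illustration, not the scorch implementation.

```go
package main

import "fmt"

// assignIDs (hypothetical) walks every term once and gives each distinct
// term an id. Ids are 1-based here; that numbering is an assumption.
func assignIDs(docs [][]string) map[string]int {
	ids := make(map[string]int)
	for _, doc := range docs {
		for _, term := range doc {
			if _, ok := ids[term]; !ok {
				ids[term] = len(ids) + 1
			}
		}
	}
	return ids
}

// buildCounts (hypothetical) preallocates a slice keyed by id-1 and then
// fills it. Because every id already exists, the fill loop needs no
// first-time initialization branch.
func buildCounts(docs [][]string) (map[string]int, []int) {
	ids := assignIDs(docs)
	counts := make([]int, len(ids)) // sized up-front, never grown
	for _, doc := range docs {
		for _, term := range doc {
			counts[ids[term]-1]++
		}
	}
	return ids, counts
}

func main() {
	ids, counts := buildCounts([][]string{{"a", "b", "a"}, {"b", "c"}})
	fmt.Println(ids["b"], counts)
}
```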
Steve Yen
917c470791 scorch mem segment VisitDocument() accesses StoredTypes/Pos outside of loop 2018-01-15 11:54:46 -08:00
Steve Yen
e7bd6026eb scorch mem segment preallocs docMap/fieldLens with capacity
The first time through, startNumFields should be 0; later docs should
see more benefit from the preallocation, assuming they have fields
similar to the first doc's.
2018-01-15 11:52:20 -08:00
Steve Yen
d777d7c365 scorch mem segment comments consistency 2018-01-15 11:08:21 -08:00
Sreekanth Sivasankaran
4c256f5669 DocValue Config, new API Changes
- VisitableDocValueFields API for persisted DV field list
- making dv configs overridable at field level
- enabling on-the-fly/runtime uninverting of doc values
- a few unit test updates
2018-01-08 10:58:33 +05:30
Sreekanth Sivasankaran
71a726bbf6 perf issue was due to duplicate fieldIDs getting
inserted into the list of dv-enabled fields -
DocValueFields in mem segment.
Moved back to the original type `DocValueFields map[uint16]bool`
for easy lookup to check whether the fieldID is
configured for dv storage.
2018-01-04 15:34:55 +05:30
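To illustrate why the map type in the commit above sidesteps duplicates, here is a small sketch. Only the type shape `map[uint16]bool` comes from the commit message; `markEnabled` and the field ids are hypothetical.

```go
package main

import "fmt"

// DocValueFields mirrors the type shape named in the commit message.
type DocValueFields map[uint16]bool

// markEnabled (hypothetical helper) records field ids as dv-enabled.
// Inserting the same id repeatedly collapses onto a single key, so the
// set cannot accumulate duplicates the way an appended list can.
func markEnabled(dv DocValueFields, fieldIDs ...uint16) {
	for _, id := range fieldIDs {
		dv[id] = true
	}
}

func main() {
	dv := DocValueFields{}
	markEnabled(dv, 3, 7, 3, 3)
	// A missing key reads as false, so membership is one map lookup.
	fmt.Println(len(dv), dv[7], dv[9])
}
```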
Sreekanth Sivasankaran
61ba81e964 Merge branch 'scorch', remote-tracking branch 'origin' into docValue_persisted 2017-12-30 16:52:51 +05:30
abhinavdangeti
5c26f5a86d Tracking memory consumption for a scorch index
+ Track memory usage at a segment level
+ Add a new scorch API: MemoryUsed()
    - Aggregate the memory consumption across
      segments when API is invoked.

+ TODO:
    - Revisit the second iteration to see if it can be gotten
      rid of, and the size accounted for during the first
      run while building an in-mem segment.
    - Accounting for pointer and slice overhead.
2017-12-29 10:20:11 -07:00
Sreekanth Sivasankaran
c8df014c0c Updated readme, zap version, added new docvalue cmd,
fixed the footer and fields cmd,
interface name updated
2017-12-29 21:39:29 +05:30
Sreekanth Sivasankaran
76f827f469 docValue persist changes
docValues are persisted along with the index,
in a columnar fashion per field with variable-sized
chunking for quick lookup.
- naive chunk-level caching is added per field
- the data part inside a chunk is snappy compressed
- the metaHeader inside the chunk indexes the dv values
  inside the uncompressed data part
- all the fields are docValue persisted in this iteration
2017-12-28 12:05:33 +05:30
Steve Yen
c13ff85aaf scorch ref-counting
Future commits will provide actual cleanup when ref-counts reach 0.
2017-12-13 14:48:07 -08:00
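For readers unfamiliar with the pattern, reference counting with cleanup at zero might look like the sketch below. Every name here (`refCounted`, `AddRef`, `DecRef`, the `cleanup` callback) is hypothetical; the commit above only introduces the counts and defers the actual cleanup to later commits.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// refCounted is a hypothetical holder, not a scorch type.
type refCounted struct {
	refs    int64
	cleanup func()
}

// newRefCounted starts with one reference owned by the creator.
func newRefCounted(cleanup func()) *refCounted {
	return &refCounted{refs: 1, cleanup: cleanup}
}

// AddRef registers one more user of the resource.
func (r *refCounted) AddRef() { atomic.AddInt64(&r.refs, 1) }

// DecRef drops one reference and runs cleanup when the count reaches 0.
func (r *refCounted) DecRef() {
	if atomic.AddInt64(&r.refs, -1) == 0 && r.cleanup != nil {
		r.cleanup()
	}
}

func main() {
	r := newRefCounted(func() { fmt.Println("cleaned up") })
	r.AddRef() // a second user, e.g. a reader
	r.DecRef() // reader done, resource still alive
	r.DecRef() // creator done, count hits 0 and cleanup runs
}
```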
Marty Schoch
f13b786609 fix up issues to get all bleve unit tests passing for scorch
make scorch default
2017-12-11 15:47:41 -05:00
Marty Schoch
adac4f41db initial version of scorch which persists index to disk 2017-12-06 18:33:47 -05:00
Marty Schoch
f6be841668 add test for postings list count method 2017-12-05 13:01:36 -05:00
Marty Schoch
30e9d6daa5 add better testing of array positions 2017-12-05 12:54:44 -05:00
Marty Schoch
8d9d45115f add test of location field 2017-12-05 12:20:06 -05:00
Marty Schoch
8f0350865b add test for segment fields method 2017-12-05 12:17:56 -05:00
Marty Schoch
7a6b5483f2 add validation that all locations were seen 2017-12-05 11:58:05 -05:00
Marty Schoch
e08fdab54a remove todo item 2017-12-05 10:13:27 -05:00
Marty Schoch
87e2627551 added dictionary tests to mem segment 2017-12-05 09:49:41 -05:00
Marty Schoch
ed067f45dd added Close() method to Segment 2017-12-05 09:31:02 -05:00
Marty Schoch
22ffc8940e update segment API to return error in key places 2017-12-04 18:06:06 -05:00
Marty Schoch
b74cf4b081 add copyright header to all new files in scorch 2017-12-01 15:42:50 -05:00
Marty Schoch
89aa02cf5b fix highlighting of composite fields
updated log statements for refactored names
2017-12-01 15:12:08 -05:00
Marty Schoch
cff14f1212 fix crash in DocNumbers when segment is empty 2017-12-01 09:50:27 -05:00
Marty Schoch
eb256f78bc switch to constant referring to id field id 0
this avoids potentially mutating something that is intended
to be immutable
2017-12-01 09:30:07 -05:00
Marty Schoch
395458ce83 refactor to make mem segment contents exported 2017-12-01 07:26:47 -05:00
Marty Schoch
848aca4639 fix issues identified by errcheck 2017-11-29 13:34:15 -05:00
Marty Schoch
23f6dc1cc6 working in-memory version 2017-11-29 11:33:35 -05:00