Commit Graph

1625 Commits

Author SHA1 Message Date
abhinavdangeti 6451c8c37f MB-26396: Handling documents with geopoints in slice format
+ The issue lies with parsing documents containing a geopoint
  in slice format - which wasn't handled.
+ Unit test that verifies the fix.
2018-01-29 18:31:56 -08:00
Steve Yen 5d1a2b0ad7
Merge pull request #743 from steveyen/master
zap-based in-memory segment impl & various merge optimizations
2018-01-29 09:22:12 -08:00
Steve Yen 745575a6c1 scorch zap mergeStoredAndRemap uses array indexing, not append()
Since we have right array size preallocated, we don't need the extra
capacity checking of append().
2018-01-27 11:35:10 -08:00
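As a rough illustration of the idea in this commit (not the actual mergeStoredAndRemap code), the sketch below contrasts append() into a preallocated slice with direct index assignment. The function names and the offset-based remapping are assumptions made only for the example.

```go
package main

import "fmt"

// remapWithAppend fills a preallocated slice via append(), which still
// checks length against capacity on every call.
func remapWithAppend(oldNums []uint64, offset uint64) []uint64 {
	out := make([]uint64, 0, len(oldNums))
	for _, n := range oldNums {
		out = append(out, n+offset)
	}
	return out
}

// remapWithIndex allocates the slice at its final length and assigns
// by index, so no per-element capacity check is needed.
func remapWithIndex(oldNums []uint64, offset uint64) []uint64 {
	out := make([]uint64, len(oldNums))
	for i, n := range oldNums {
		out[i] = n + offset
	}
	return out
}

func main() {
	in := []uint64{0, 1, 2}
	fmt.Println(remapWithAppend(in, 10), remapWithIndex(in, 10))
}
```

Both variants produce the same result; the indexed form is only valid when the final length is known up front.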
Steve Yen 8dd17a3b20 scorch zap mergeStoredAndRemap uses continue for less indentation 2018-01-27 11:35:10 -08:00
Steve Yen 0041664bc4 scorch zap merge computeNewDocCount() optimize 1 variable 2018-01-27 11:35:10 -08:00
Steve Yen 6985db13a0 scorch zap merge reuses docNumbers array 2018-01-27 11:35:10 -08:00
Steve Yen 916bbf4125 scorch zap merge prealloc's docTermMap capacity 2018-01-27 11:35:10 -08:00
Steve Yen 56cdb68f35 scorch zap merge checks err2 not err
Also, optimize the appending of the termSeparator so that the
docTermMap is accessed and updated just once.
2018-01-27 11:35:10 -08:00
Steve Yen 3030d4edb5 scorch zap merge preallocs segNewDocNums capacity 2018-01-27 11:35:10 -08:00
Steve Yen 9038d75c98 scorch zap allocate govarint.U64Base128Encoder just once
Instead of allocating a govarint.U64Base128Encoder in the inner loop,
allocate it just once on the outside, as it appears that it's just a
thin wrapper around binary.PutUvarint().
2018-01-27 11:35:10 -08:00
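A minimal sketch of the hoisting described above, using only the standard library's binary.PutUvarint() rather than govarint itself. The uvarintWriter type is a hypothetical stand-in for the encoder, not govarint's API.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// uvarintWriter is a hypothetical stand-in for an encoder: a scratch
// buffer plus a destination, wrapping binary.PutUvarint().
type uvarintWriter struct {
	dst *bytes.Buffer
	buf [binary.MaxVarintLen64]byte
}

// put encodes v as a uvarint and appends it to the destination.
func (w *uvarintWriter) put(v uint64) {
	n := binary.PutUvarint(w.buf[:], v)
	w.dst.Write(w.buf[:n])
}

// encodeAll creates the writer once, outside the loop, and reuses it
// for every value.
func encodeAll(vals []uint64) []byte {
	var out bytes.Buffer
	w := &uvarintWriter{dst: &out}
	for _, v := range vals {
		w.put(v)
	}
	return out.Bytes()
}

func main() {
	fmt.Println(encodeAll([]uint64{1, 300}))
}
```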
Steve Yen 10dd5489c2 scorch zap Dict.postingsList() takes []byte for more mem control
This allows callers that already have a []byte term to avoid
string'ification garbage.
2018-01-27 11:35:10 -08:00
Steve Yen 6a17ff48c7 scorch zap removed unneeded []byte cast of term 2018-01-27 11:35:10 -08:00
Steve Yen d389e2bb40 scorch zap merge file cleanup on error, and some minor prealloc's 2018-01-27 11:35:10 -08:00
Steve Yen 29d526a7c2 scorch zap merge uses DefaultChunkFactor 2018-01-27 11:35:10 -08:00
Steve Yen 603425c2c5 scorch zap mergerLoop missing fireAsyncError case 2018-01-27 11:35:10 -08:00
Steve Yen 37121c3b49 scorch zap writeRoaringWithLen optimized with reused bufs 2018-01-27 11:35:10 -08:00
Steve Yen 5a035dc9aa scorch zap in-memory segment representation (SegmentBase)
The zap SegmentBase struct is a refactoring of the zap Segment into
the subset of fields that are needed for read-only ops, without any
persistence related info.  This allows us to use zap's optimized data
encoding as scorch's in-memory segments.

The zap Segment struct now embeds a zap SegmentBase struct, and layers
on persistence.  Both the zap Segment and zap SegmentBase implement
scorch's Segment interface.
2018-01-27 11:35:10 -08:00
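The layering described in this commit can be sketched with Go struct embedding. Everything below is a simplified assumption for illustration: the one-method Segment interface, the field names, and the FileSegment type are hypothetical and much smaller than the real zap types.

```go
package main

import "fmt"

// Segment is a hypothetical, heavily reduced stand-in for scorch's
// Segment interface.
type Segment interface {
	Count() uint64
}

// SegmentBase holds only what read-only operations need.
type SegmentBase struct {
	data    []byte
	numDocs uint64
}

// Count reports the number of documents in the segment.
func (sb *SegmentBase) Count() uint64 { return sb.numDocs }

// FileSegment embeds SegmentBase and layers persistence details on top.
type FileSegment struct {
	SegmentBase
	path string
}

// Path returns where the segment is persisted.
func (s *FileSegment) Path() string { return s.path }

// Compile-time checks that both types satisfy the interface.
var (
	_ Segment = (*SegmentBase)(nil)
	_ Segment = (*FileSegment)(nil)
)

func main() {
	mem := &SegmentBase{numDocs: 3}
	file := &FileSegment{SegmentBase: SegmentBase{numDocs: 5}, path: "seg.zap"}
	for _, s := range []Segment{mem, file} {
		fmt.Println(s.Count())
	}
	fmt.Println(file.Path())
}
```

Because the embedded type's methods are promoted, the persisted type satisfies the same interface without re-implementing the read-only operations.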
Steve Yen dc62324e02 scorch zap miscellaneous typos 2018-01-27 11:35:10 -08:00
Marty Schoch 0fc9b4b74a
Merge pull request #742 from steveyen/scorch-unlock-needed
scorch unlocks in introduceSegment's DocNumbers() error codepath
2018-01-23 12:09:23 -05:00
Steve Yen 34fd77709f scorch unlocks in introduceSegment's DocNumbers() error codepath 2018-01-20 17:17:16 -08:00
Marty Schoch cb6391e75e
Merge pull request #733 from abhinavdangeti/scorch-segment-sizeinbytes
Include overhead from data structures in segment's SizeInBytes
2018-01-19 09:10:03 -05:00
Marty Schoch 5a812ee9ce
Merge pull request #732 from sreekanth-cb/facet_merge
MB-27498 - date range facet query panics
2018-01-19 09:02:57 -05:00
Sreekanth Sivasankaran 47f1c66889 adding UT 2018-01-19 11:47:28 +05:30
abhinavdangeti 1176c73a9c Include overhead from data structures in segment's SizeInBytes
+ Account for all the overhead incurred from the data structures
  within mem.Segment and zap.Segment.
    - SizeOfMap = 8
    - SizeOfPointer = 8
    - SizeOfSlice = 24
    - SizeOfString = 16
+ Include overhead from certain new fields as well.
2018-01-17 11:11:44 -08:00
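A minimal sketch of this kind of accounting, using the header sizes quoted in the commit message. The segment type and its two fields are hypothetical; the real mem.Segment and zap.Segment have many more fields.

```go
package main

import "fmt"

// Per-value header sizes as quoted in the commit message.
const (
	SizeOfMap     = 8
	SizeOfPointer = 8
	SizeOfSlice   = 24
	SizeOfString  = 16
)

// segment is a hypothetical, reduced stand-in for a mem segment.
type segment struct {
	fieldNames []string
	postings   [][]uint64
}

// SizeInBytes counts payload bytes plus the slice and string headers
// that hold them.
func (s *segment) SizeInBytes() int {
	size := SizeOfSlice // header of fieldNames
	for _, name := range s.fieldNames {
		size += SizeOfString + len(name)
	}
	size += SizeOfSlice // header of postings
	for _, p := range s.postings {
		size += SizeOfSlice + 8*len(p) // inner header + uint64 payload
	}
	return size
}

func main() {
	s := &segment{
		fieldNames: []string{"title", "body"},
		postings:   [][]uint64{{1, 2, 3}, {4}},
	}
	fmt.Println(s.SizeInBytes())
}
```

For the sample segment, the field names contribute 24 + (16+5) + (16+4) = 65 bytes and the postings contribute 24 + (24+24) + (24+8) = 104 bytes, for a total of 169.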
Marty Schoch 44c371582a
Merge pull request #739 from ethantkoenig/unique_token_filter
Add UniqueTerm token filter
2018-01-17 13:10:10 -05:00
Ethan Koenig 012d436dd7 Add UniqueTerm token filter 2018-01-16 22:24:51 -08:00
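One plausible reading of a unique-term filter is sketched below: keep the first token seen for each distinct term and drop later repeats. The Token type and the uniqueTerms function are hypothetical and are not bleve's analysis API; whether the real filter keeps the first or a later occurrence is an assumption here.

```go
package main

import "fmt"

// Token is a hypothetical, simplified token: a term and its position.
type Token struct {
	Term     string
	Position int
}

// uniqueTerms keeps the first token for each distinct term, preserving
// the original order.
func uniqueTerms(in []Token) []Token {
	seen := make(map[string]struct{}, len(in))
	out := make([]Token, 0, len(in))
	for _, tok := range in {
		if _, dup := seen[tok.Term]; dup {
			continue
		}
		seen[tok.Term] = struct{}{}
		out = append(out, tok)
	}
	return out
}

func main() {
	tokens := []Token{{"go", 1}, {"search", 2}, {"go", 3}}
	fmt.Println(uniqueTerms(tokens))
}
```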
Steve Yen f4c3f984a4
Merge pull request #734 from steveyen/master
scorch mem segment optimizations
2018-01-16 08:57:02 -08:00
Marty Schoch 423d7dc4e4
Merge pull request #736 from ethantkoenig/readme
Fix coverage badge in README
2018-01-16 08:01:46 -05:00
Steve Yen 71d6d1691b scorch zap optimizations of inner loops and easy preallocs 2018-01-15 23:04:23 -08:00
Ethan Koenig d14b290235 Fix coverage badge in README 2018-01-15 22:23:41 -08:00
Steve Yen d682c85a7b scorch mem segments uses backing array trick even more
This change invokes make() only once per distinct type to allocate the
large, contiguous backing arrays for the mem segment.
2018-01-15 19:17:39 -08:00
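The backing array trick can be sketched as one make() call whose result is carved into sub-slices. The carve function below is a hypothetical illustration, not the mem segment's code; it uses a full slice expression to cap each sub-slice's capacity so that appends cannot spill into a neighbor.

```go
package main

import "fmt"

// carve allocates one contiguous backing array and hands out
// zero-length sub-slices with the requested capacities.
func carve(sizes []int) [][]uint64 {
	total := 0
	for _, n := range sizes {
		total += n
	}
	backing := make([]uint64, total) // the only large allocation
	out := make([][]uint64, len(sizes))
	for i, n := range sizes {
		out[i] = backing[0:0:n] // len 0, cap n
		backing = backing[n:]
	}
	return out
}

func main() {
	subs := carve([]int{2, 3})
	subs[0] = append(subs[0], 7, 8) // fits within cap 2, no reallocation
	subs[1] = append(subs[1], 9)
	fmt.Println(subs, cap(subs[0]), cap(subs[1]))
}
```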
Steve Yen 0f19b542a3 scorch mem segment prealloc's Locfields/starts/ends/pos/arraypos
This change preallocates more of the backing arrays for Locfields,
Locstarts, Locends, Locpos, Locarraypos sub-slices of a scorch mem
segment.

On small bleve-blast tests (50K wiki docs) on a dev macbook, scorch
indexing throughput seems to improve from 15MB/sec to 20MB/sec after
the recent series of preallocation changes.
2018-01-15 18:40:28 -08:00
Steve Yen a84bd122d2 scorch mem segment preallocates sub-slices via # terms
This change tracks the number of terms per posting list to
preallocate the sub-slices for the Freqs & Norms.
2018-01-15 18:20:43 -08:00
Steve Yen a4110d325c scorch mem segment preallocates slices that are key'ed by postingId
The scorch mem segment build phase uses the append() idiom to populate
various slices that are keyed by postings list id's.  These slices
include...

* Postings
* PostingsLocs
* Freqs
* Norms
* Locfields
* Locstarts
* Locends
* Locpos
* Locarraypos

This change introduces an initialization step that preallocates those
slices up-front, by assigning postings list id's to terms up-front.

This change also has an additional effect of simplifying the
processDocument() logic to no longer have to worry about a first-time
initialization case, removing some duplicate'ish code.
2018-01-15 16:53:39 -08:00
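The two-phase approach described above can be sketched as follows: a first pass assigns an id to each term and counts how many documents contain it, then the per-id slices are preallocated, and a second pass fills them with no first-time initialization branch. The buildPostings function and its input shape are assumptions for illustration only.

```go
package main

import "fmt"

// buildPostings returns term ids and, per id, the doc numbers that
// contain the term.
func buildPostings(docs [][]string) (map[string]int, [][]uint64) {
	termIDs := make(map[string]int)
	var docCounts []int // docCounts[id]: docs containing the term
	var lastDoc []int   // lastDoc[id]: last doc counted for the term

	// Pass 1: assign ids up front and count docs per term.
	for docNum, doc := range docs {
		for _, term := range doc {
			id, ok := termIDs[term]
			if !ok {
				id = len(termIDs)
				termIDs[term] = id
				docCounts = append(docCounts, 0)
				lastDoc = append(lastDoc, -1)
			}
			if lastDoc[id] != docNum {
				lastDoc[id] = docNum
				docCounts[id]++
			}
		}
	}

	// Preallocate every slice keyed by postings id.
	postings := make([][]uint64, len(termIDs))
	for id, n := range docCounts {
		postings[id] = make([]uint64, 0, n)
	}

	// Pass 2: fill; append never needs to grow a slice.
	for docNum, doc := range docs {
		for _, term := range doc {
			id := termIDs[term]
			p := postings[id]
			if len(p) > 0 && p[len(p)-1] == uint64(docNum) {
				continue // term repeated within the same doc
			}
			postings[id] = append(p, uint64(docNum))
		}
	}
	return termIDs, postings
}

func main() {
	ids, postings := buildPostings([][]string{{"a", "b", "a"}, {"b", "c"}})
	fmt.Println(ids, postings)
}
```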
Steve Yen 917c470791 scorch mem segment VisitDocument() accesses StoredTypes/Pos outside of loop 2018-01-15 11:54:46 -08:00
Steve Yen e7bd6026eb scorch mem segment preallocs docMap/fieldLens with capacity
The first time through, startNumFields should be 0, where there ought
to be more optimization assuming later docs have similar fields as the
first doc.
2018-01-15 11:52:20 -08:00
Steve Yen d777d7c365 scorch mem segment comments consistency 2018-01-15 11:08:21 -08:00
Marty Schoch 4d71e901e8 make new analyzers available to consumers of the config pkg
many tools and applications using bleve use the config pkg to
include support for many languages out of the box by forcing
import of optional packages.
2018-01-11 11:01:35 -05:00
Sreekanth Sivasankaran 039a4df33b initialize only with an imminent merge 2018-01-11 15:09:27 +05:30
Sreekanth Sivasankaran 3afc5458e0 MB-27498 - date range facet query panics
Initialise the facet results map in case of
empty partial hits in a multi-node cluster
2018-01-11 14:44:05 +05:30
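In Go, assigning to an entry of a nil map panics, which is the general kind of failure a missing map initialization produces. A minimal sketch of the guard, assuming a hypothetical mergeFacets helper over plain maps rather than bleve's facet result types:

```go
package main

import "fmt"

// mergeFacets adds the counts from src into dst, creating dst first
// if the caller passed a nil map.
func mergeFacets(dst, src map[string]int) map[string]int {
	if dst == nil {
		dst = make(map[string]int, len(src)) // writing to a nil map would panic
	}
	for name, count := range src {
		dst[name] += count
	}
	return dst
}

func main() {
	var partial map[string]int // nil, as with an empty partial result
	merged := mergeFacets(partial, map[string]int{"2018": 2})
	fmt.Println(merged)
}
```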
Marty Schoch 4e82a8a0ca
Merge pull request #726 from sreekanth-cb/docValue_configs
DocValue Config, new API Changes
2018-01-10 18:11:18 -05:00
Marty Schoch 09a61a7a38 add analyzers for several languages
Having pure Go snowball stemmers allows us to add support for
many languages into the core of bleve.  Specifically we just
added: Russian, Danish, Finnish, Hungarian, Dutch, Norwegian,
Romanian, Swedish, Turkish
2018-01-10 16:00:29 -05:00
Marty Schoch e68b70aa82 Merge branch 'sokolovstas-ru_analyzer' 2018-01-10 15:16:45 -05:00
Marty Schoch a9532e510a refactor slightly to use our new hosted snowball stemmers
rather than having each package include it directly inside of
bleve, we have decided to host them all in one repo

https://github.com/blevesearch/snowballstem

this makes them easier for the rest of the community to use
outside of bleve contexts
2018-01-10 15:15:31 -05:00
Sreekanth Sivasankaran 53aef2104e fixing err handling in UTs, name changes 2018-01-10 22:00:26 +05:30
Marty Schoch af198c833f Merge branch 'ru_analyzer' of https://github.com/sokolovstas/bleve into sokolovstas-ru_analyzer 2018-01-10 10:29:15 -05:00
Marty Schoch b1a079fe57
Merge pull request #725 from tomkralidis/patch-1
fix minor typo
2018-01-10 09:56:02 -05:00
Abhinav Dangeti 9c73d37987
Merge pull request #730 from abhinavdangeti/scorch-master
Do not account mmap'ed part of zap segments in MemoryUsed
2018-01-09 09:56:24 -08:00
abhinavdangeti 43bfcc00c9 Do not account mmap'ed part of zap segments in MemoryUsed
This API is designed to emit only the dirty "unpersisted"
bytes. This does not include the mmap'ed part of the
zap segments (disk).
2018-01-09 09:43:53 -08:00
Sreekanth Sivasankaran 4c256f5669 DocValue Config, new API Changes
-VisitableDocValueFields API for persisted DV field list
-making dv configs overridable at field level
-enabling on-the-fly (runtime) uninverting of doc values
-few UT updates
2018-01-08 10:58:33 +05:30