bleve

Author	SHA1	Message	Date
Marty Schoch	d7405a4d79	updated attempt to reuse []byte. The previous attempt was flawed (but masked by the Reset() method); the new approach is to do this work in the Reset() method itself, which is logically where it belongs. Further, we acknowledge that the IndexInternalID []byte lifetime extends beyond the TermFieldDoc, so another copy is made into the DocumentMatch. Although this introduces yet another copy, the theory being tested is that it allows each of these structures to reuse memory without additional allocation.	2016-08-03 17:01:27 -04:00
Marty Schoch	89d83cb5a1	reuse memory already allocated for copies of docids. When the term field reader is copying ID values out of the kv store's iterator, it already attempts to reuse the term frequency row data structure; this change also lets it reuse the []byte allocated for previous copies of the docid. We reset the slice length to zero, then copy the data into the existing slice, avoiding new allocation and garbage collection whenever there is already enough space.	2016-08-03 13:45:48 -04:00
Marty Schoch	4b1b866e0f	remove commented out old code	2016-08-02 16:48:00 -04:00
Marty Schoch	36de4a7097	cleaner fix for the TermFrequencyRow reuse bug: reset to nil first, and let the remaining logic work as before	2016-08-01 17:17:29 -04:00
Marty Schoch	cfce9c5fc5	initialize term vector list in parseV; otherwise, reusing the previous term frequency row causes us to keep tacking on to one gigantic list	2016-08-01 17:01:34 -04:00
Marty Schoch	172ca7e69e	need to copy the doc ID for it to survive past the next iteration	2016-08-01 17:01:04 -04:00
Marty Schoch	e188fe35f7	switch back to a single DocumentMatch struct instead of separate DocumentMatch/DocumentMatchInternal. The rules are simple: everything operates on the IndexInternalID field until the results are returned, then ID is set correctly. The IndexInternalID field is not exported to JSON.	2016-08-01 14:58:02 -04:00
Marty Schoch	1aacd9bad5	changed approach: IndexInternalID is now []byte. This is still opaque, and should still work for any future index implementations, as it is a least-common-denominator choice: all implementations must internally represent the id as []byte at some point for storage to disk.	2016-08-01 14:26:50 -04:00
Marty Schoch	5aa9e95468	major refactor of index/search API. Index IDs are now opaque (until finally returned to the top-level user): the TermFieldDocs returned by TermFieldReader no longer contain the doc id; instead they return an opaque IndexInternalID. Items returned are still in the "natural index order", but that is no longer guaranteed to be "doc id order"; correct behavior requires that they all follow the same order, but not any particular order. A new API, FinalizeDocID, converts index internal IDs to public string IDs. APIs used internally which previously took a doc id now take an IndexInternalID, that is DocumentFieldTerms() and DocumentFieldTermsForFields(); however, APIs that are used externally, that is Document(), do not reflect this change. DocumentIDReader follows the same changes, but this is less obvious: its behavior is clarified as iterating doc ids, BUT NOT in doc id order; the method to iterate doc ids in a range is STILL available, but again, you won't get them in any meaningful order; a new method iterates actual doc ids from a list of possible ids, introduced to keep the DocIDSearcher working. Searchers now work with the new opaque index internal doc ids and return the new DocumentMatchInternal (which does not have a string ID). Scorers also work with these opaque index internal doc ids and return DocumentMatchInternal (which does not have a string ID). Collectors now also perform a final step of converting the final result: they STILL return the traditional DocumentMatch (with string ID), but they now also require an IndexReader so that they can do the conversion.	2016-07-31 13:46:18 -04:00
Marty Schoch	47ee69ae82	term field reader supports optionally omitting 3 details. At the time you create the term field reader, you can specify that you don't need the term freq, the norm, or the term vectors; in that case, the index implementation can choose not to return them in its subsequently returned values. This is advisory only: some simple implementations may ignore it and continue to return the values anyway (as the current impl of upside_down does today). This change will give future index implementations the opportunity to do less work when it isn't required.	2016-07-30 10:26:42 -04:00
Marty Schoch	389e18a779	attempt to support google app engine. The default configuration, which sets the default kv engine to boltdb, is now done in a file protected with the !appengine build tag. This at least lets the analysis-wizzard app run locally in the appengine simulator. This still has not been tested on the real appengine, and further changes may be required.	2016-07-29 21:29:05 -04:00
Marty Schoch	b158fb147d	Merge pull request #400 from steveyen/WIP-search-optimizations search optimizations	2016-07-29 17:30:35 -04:00
Steve Yen	4822cff63a	optimize Advance() with pre-allocated in-out param. This perf-related change helps the code and API reach more similarity with the Next() methods, which now take a pre-allocated param.	2016-07-29 14:15:00 -07:00
Steve Yen	3c82086805	optimize upside_down reader & 64-bit struct alignments The UpsideDownCouchTermFieldReader.Next() only needs the doc ID from the key, so this change provides a specialized parseKDoc() method for that optimization. Additionally, fields in various structs are more 64-bit aligned, in an attempt to reduce the invocations of runtime.typedmemmove() and runtime.heapBitsBulkBarrier(), which the go compiler seems to automatically insert to transparently handle misaligned data.	2016-07-23 10:37:40 -07:00
Steve Yen	e33ae65cd2	optimize SqrtCache as just-an-array	2016-07-21 19:41:33 -07:00
Steve Yen	5094d2d097	optimize moss PrefixIterator Previously, the PrefixIterator() for moss was implemented by comparing the prefix bytes on every Next(). With this optimization, the next larger endKeyExclusive is computed at the iterator's initialization, which allows us to avoid all those prefix comparisons.	2016-07-21 18:33:34 -07:00
Steve Yen	5271a0f62b	optimize termFieldVectorsFromTermVectors when empty	2016-07-21 11:46:14 -07:00
Steve Yen	b8c8478783	optimize collector to check ctx.Done() only occasionally	2016-07-21 11:10:49 -07:00
Steve Yen	cbb174b074	optimize moss iterator Next() done/k/v maintenance	2016-07-21 11:10:49 -07:00
Steve Yen	b564ebbfbe	optimization comments on DocumentMatch instance reuse	2016-07-21 11:10:49 -07:00
Steve Yen	b744148449	optimization to actually reuse the TermFrequencyRow	2016-07-21 11:10:49 -07:00
Steve Yen	6d7fa0b964	optimize moss iterator checkDone()	2016-07-21 11:10:49 -07:00
Steve Yen	39d3e2f028	optimize upside_down reader Next() with TermFieldDoc reuse This optimization changes the index.TermFieldReader.Next() interface API, adding an optional, pre-allocated *TermFieldDoc parameter, which can help prevent garbage creation.	2016-07-21 11:10:49 -07:00
Steve Yen	988ca62182	optimize upside_down reader Next() with doc match reuse. This optimization changes the search.Searcher.Next() interface API, adding an optional, pre-allocated DocumentMatch parameter. When it's non-nil, the TermSearcher and TermQueryScorer will use that pre-allocated DocumentMatch instead of allocating a brand new DocumentMatch instance.	2016-07-21 11:10:49 -07:00
Steve Yen	2498ccc913	optimize upside_down reader Next() to reuse TermFrequencyRow Before this change, upside down's reader would alloc a new TermFrequencyRow on every Next(), which would be immediately transformed into an index.TermFieldDoc{}. This change reuses a pre-allocated TermFrequencyRow that's a field in the reader.	2016-07-21 11:10:49 -07:00
Steve Yen	68af6aef62	optimize upside_down reader Next() when 0-length term field vectors From some bleve-query perf profiling, term field vectors appeared to be alloc'ed, which was unnecessary as term field vectors are disabled in the bleve-blast/bleve-query tests.	2016-07-21 11:10:49 -07:00
Marty Schoch	5934a185f3	Merge pull request #398 from slavikm/master Make facets much faster	2016-07-21 09:12:28 -04:00
slavikm	fc990bc2d1	Remove the field IDs from outside of the index	2016-07-19 20:42:45 -07:00
Marty Schoch	d8ffa8fb5e	Merge pull request #397 from steveyen/master MB-20101 - moss KV fix Get() of 0-length vals	2016-07-19 11:58:59 -04:00
slavikm	ce64c17be1	Do field cache only once per search	2016-07-17 16:29:17 -07:00
slavikm	9a9b630a6d	Make facets much faster	2016-07-17 15:31:35 -07:00
Steve Yen	80623f4a8a	MB-20101 - moss KV fix Get() of 0-length vals The moss KV store adapter's Get() implementation was incorrectly transforming a 0-length val (e.g., []byte{}) into a nil val.	2016-07-15 14:41:30 -07:00
Marty Schoch	412d50f1d7	Merge pull request #394 from jakubkulhan/match-query-operator Added match query operator	2016-07-11 17:46:53 -06:00
Jakub Kulhan	7a695d1189	added match query operator	2016-07-06 13:15:56 +02:00
Marty Schoch	9089de251f	remove byte_array_converters; fixes #392, fixes #100	2016-07-01 10:21:41 -04:00
Marty Schoch	63f2eb6740	update godocs for date range querying; fixes #382	2016-06-26 10:25:09 -04:00
Marty Schoch	2807a2c8bd	adding CONTRIBUTING.md to repo; closes #327	2016-06-26 09:48:43 -04:00
Marty Schoch	bd2a23fb6d	remove firestorm index scheme. Firestorm was an experiment; we learned a lot, but it did not result in a usable index scheme.	2016-06-26 07:51:41 -04:00
Marty Schoch	7e02e616ce	fix indexing of primitives not inside map/struct; fixes #389	2016-06-21 21:15:36 -04:00
Marty Schoch	54b06ce0f6	fix bug in regexp, prefix and fuzzy searchers. These searchers incorrectly called Next() on their underlying searcher instead of Advance(). This can cause values to be returned with an ID less than the one that was Advanced() to, which violates the contract and causes other incorrect behavior; fixes #342	2016-06-21 09:00:05 -04:00
Marty Schoch	9f31ea6805	standardize behavior of mapping anonymous fields. The behavior has been defined in a way that is compatible with encoding/json, as follows: anonymous fields which are structs will have their struct fields get field names as if they were directly in the parent struct; anonymous fields which are not structs, or which are interfaces that may or may not point to structs, will get field names that correspond to the name of the type. The exception to the rules above is that you can always override this behavior by using a JSON struct tag; fixes #101	2016-06-16 16:27:24 -04:00
Marty Schoch	58457e7d66	Merge pull request #388 from MachineShop-IOT/master Add bucket fillPercent option for boltdb	2016-06-15 11:18:55 -04:00
Mark Mindenhall	c3c827aded	Add boltdb config test	2016-06-14 13:36:40 -06:00
Mark Mindenhall	d369bd5c3c	Add bucket fill percent option for boltdb	2016-06-13 18:47:38 -06:00
Marty Schoch	fedb46269e	updated whitespace to behave more like lucene/es	2016-06-10 15:30:43 -04:00
Marty Schoch	9c9dbcc90a	fix another test issue	2016-06-10 13:21:27 -04:00
Marty Schoch	5ec47500ae	fix format issue identified by go vet	2016-06-10 13:13:15 -04:00
Marty Schoch	80f1117a6c	add couchbase copyright and license now that CLA has been signed	2016-06-10 13:08:50 -04:00
Marty Schoch	043a3bfb7c	change cjk analyzer to use unicode tokenizer; change cjk bigram analyzer to work with multi-rune terms; add cjk width filter, which replaces full unicode normalization. These changes make the cjk analyzer behave more like elasticsearch; they also remove the dependency on the whitespace analyzer, which is now free to also behave more like lucene/es; fixes #33	2016-06-10 13:04:40 -04:00
Marty Schoch	b91c5375e4	enhance bleve_dump to print dictionary and counts	2016-06-10 13:01:29 -04:00

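Commits 172ca7e69e and 89d83cb5a1 rest on a common Go idiom: a key handed out by an iterator may share a buffer that the next call overwrites, so it must be copied to outlive the iteration, and that copy can reuse an existing slice by truncating it to length zero and appending. The sketch below illustrates the idiom under that assumption; `copyInto` and the `iter` type are hypothetical names, not bleve or KV-store APIs.

```go
package main

import "fmt"

// copyInto copies src into dst, reusing dst's backing array when its
// capacity is large enough. Truncating to dst[:0] and appending only
// allocates when src does not fit.
func copyInto(dst, src []byte) []byte {
	return append(dst[:0], src...)
}

// iter is a hypothetical iterator that, like many KV-store iterators,
// reuses one internal buffer for every key it returns.
type iter struct {
	keys [][]byte
	pos  int
	buf  []byte
}

// Next returns the next key; the returned slice is only valid until the
// following call to Next, because the buffer is overwritten in place.
func (it *iter) Next() ([]byte, bool) {
	if it.pos >= len(it.keys) {
		return nil, false
	}
	it.buf = copyInto(it.buf, it.keys[it.pos])
	it.pos++
	return it.buf, true
}

func main() {
	it := &iter{keys: [][]byte{[]byte("doc-1"), []byte("doc-2")}}
	var saved []byte // caller-owned buffer, reused across iterations
	for k, ok := it.Next(); ok; k, ok = it.Next() {
		saved = copyInto(saved, k) // copy so the ID no longer aliases the iterator's buffer
		fmt.Println(string(saved))
	}
}
```

Holding on to the slice returned by `Next` without copying would silently change its contents on the following call; copying into a caller-owned buffer avoids that while still allocating only when the buffer is too small.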
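Commits 2498ccc913, 988ca62182, 39d3e2f028 and 4822cff63a change Next() and Advance() to accept an optional pre-allocated struct. A minimal sketch of that calling convention follows. `TermFieldDoc` here is a hypothetical, cut-down stand-in (only an ID and a frequency), and `reader` is not the real upside_down reader; its Reset() zeroes the struct so nothing accumulates across reuses (the class of bug fixed in cfce9c5fc5) while keeping the ID slice's capacity, in the spirit of d7405a4d79.

```go
package main

import "fmt"

// TermFieldDoc is a hypothetical, cut-down stand-in for a reader's result
// type: just an internal ID and a term frequency.
type TermFieldDoc struct {
	ID   []byte
	Freq uint64
}

// Reset zeroes every field, so no state leaks from the previous use,
// but keeps the ID slice's capacity for reuse.
func (d *TermFieldDoc) Reset() *TermFieldDoc {
	id := d.ID[:0]
	*d = TermFieldDoc{}
	d.ID = id
	return d
}

// reader is a hypothetical term field reader over in-memory postings.
type reader struct {
	ids   [][]byte
	freqs []uint64
	pos   int
}

// Next fills preAlloced when it is non-nil and otherwise allocates a new
// TermFieldDoc. It returns nil when the postings are exhausted.
func (r *reader) Next(preAlloced *TermFieldDoc) *TermFieldDoc {
	if r.pos >= len(r.ids) {
		return nil
	}
	rv := preAlloced
	if rv == nil {
		rv = &TermFieldDoc{}
	} else {
		rv.Reset()
	}
	rv.ID = append(rv.ID, r.ids[r.pos]...) // reuses ID capacity after Reset
	rv.Freq = r.freqs[r.pos]
	r.pos++
	return rv
}

func main() {
	r := &reader{
		ids:   [][]byte{[]byte("id-1"), []byte("id-2")},
		freqs: []uint64{3, 1},
	}
	var doc *TermFieldDoc
	// Passing the previous result back in lets one struct serve the loop.
	for doc = r.Next(doc); doc != nil; doc = r.Next(doc) {
		fmt.Println(string(doc.ID), doc.Freq)
	}
}
```

The trade-off is the usual one for reuse: a caller that wants to keep a result past the next call has to copy it first, because the reader will overwrite it.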
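Commit 5094d2d097 computes the "next larger endKeyExclusive" for a prefix once, at iterator initialization. One standard way to compute it is to increment the last byte of the prefix that is below 0xff and drop everything after it; every key with that prefix then falls in the half-open range [prefix, end). The sketch below shows that computation and a range-bounded scan over a sorted slice; `prefixEndKey` and `scanPrefix` are hypothetical helpers, not the moss or bleve code.

```go
package main

import (
	"fmt"
	"sort"
)

// prefixEndKey returns the smallest key strictly greater than every key
// that starts with prefix, or nil when no such key exists (an empty
// prefix, or one made only of 0xff bytes), meaning no upper bound.
func prefixEndKey(prefix []byte) []byte {
	end := make([]byte, len(prefix))
	copy(end, prefix)
	for i := len(end) - 1; i >= 0; i-- {
		if end[i] < 0xff {
			end[i]++
			return end[:i+1]
		}
	}
	return nil
}

// scanPrefix returns the keys of sortedKeys that start with prefix by
// scanning the half-open range [prefix, end): each key is compared only
// against the precomputed end key, never re-checked for the prefix.
func scanPrefix(sortedKeys []string, prefix string) []string {
	end := prefixEndKey([]byte(prefix))
	var out []string
	for _, k := range sortedKeys[sort.SearchStrings(sortedKeys, prefix):] {
		if end != nil && k >= string(end) {
			break
		}
		out = append(out, k)
	}
	return out
}

func main() {
	keys := []string{"aa", "ab", "abc", "ac", "b"}
	fmt.Println(scanPrefix(keys, "ab"))
}
```

Expressing the prefix as a [start, end) range also means an iterator that already supports an exclusive end key can enforce the bound itself.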

1194 Commits