bleve

Author	SHA1	Message	Date
Marty Schoch	d7405a4d79	updated attempt to reuse []byte previous attempt was flawed (but masked by Reset() method) new approach is to do this work in the Reset() method itself, logically this is where it belongs. but further we acknowledge that IndexInternalID []byte lifetime lives beyond the TermFieldDoc, so another copy is made into the DocumentMatch. Although this introduces yet another copy the theory being tested is that it allows each of these structures to reuse memory without additional allocation.	2016-08-03 17:01:27 -04:00
Marty Schoch	89d83cb5a1	reuse memory already allocated for copies of docids when the term field reader is copying ID values out of the kv store's iterator, it is already attempting to reuse the term frequency row data structure. this change allows us to also attempt to reuse the []byte allocated for previous copies of the docid. we reset the slice length to zero then copy the data into the existing slice, avoiding new allocation and garbage collection in the cases where there is already enough space	2016-08-03 13:45:48 -04:00
Marty Schoch	172ca7e69e	need to copy the doc ID for it to survive past next iteration	2016-08-01 17:01:04 -04:00
Marty Schoch	1aacd9bad5	changed approach IndexInternalID is now []byte this is still opaque, and should still work for any future index implementations as it is a least common denominator choice, all implementations must internally represent the id as []byte at some point for storage to disk	2016-08-01 14:26:50 -04:00
Marty Schoch	5aa9e95468	major refactor of index/search API index id's are now opaque (until finally returned to top-level user) - the TermFieldDoc's returned by TermFieldReader no longer contain doc id - instead they return an opaque IndexInternalID - items returned are still in the "natural index order" - but that is no longer guaranteed to be "doc id order" - correct behavior requires that they all follow the same order - but not any particular order - new API FinalizeDocID which converts index internal ID's to public string ID - APIs used internally which previously took doc id now take IndexInternalID - that is DocumentFieldTerms() and DocumentFieldTermsForFields() - however, APIs that are used externally do not reflect this change - that is Document() - DocumentIDReader follows the same changes, but this is less obvious - behavior clarified, used to iterate doc ids, BUT NOT in doc id order - method STILL available to iterate doc ids in range - but again, you won't get them in any meaningful order - new method to iterate actual doc ids from list of possible ids - this was introduced to make the DocIDSearcher continue working searchers now work with the new opaque index internal doc ids - they return new DocumentMatchInternal (which does not have string ID) scorers also work with these opaque index internal doc ids - they return DocumentMatchInternal (which does not have string ID) collectors now also perform a final step of converting the final result - they STILL return traditional DocumentMatch (with string ID) - but they now also require an IndexReader (so that they can do the conversion)	2016-07-31 13:46:18 -04:00
Marty Schoch	47ee69ae82	term field reader supports optionally omitting 3 details at the time you create the term field reader, you can specify that you don't need the term freq, the norm, or the term vectors in that case, the index implementation can choose to not return them in its subsequently returned values this is advisory only, some simple implementations may ignore this and continue to return the values anyway (as the current impl of upside_down does today) this change will allow future index implementations the opportunity to do less work when it isn't required	2016-07-30 10:26:42 -04:00
Steve Yen	4822cff63a	optimize Advance() with pre-allocated in-out param This perf-related change helps the code and API reach more similarity with the Next() methods, which now take a pre-allocated param.	2016-07-29 14:15:00 -07:00
Steve Yen	3c82086805	optimize upside_down reader & 64-bit struct alignments The UpsideDownCouchTermFieldReader.Next() only needs the doc ID from the key, so this change provides a specialized parseKDoc() method for that optimization. Additionally, fields in various structs are more 64-bit aligned, in an attempt to reduce the invocations of runtime.typedmemmove() and runtime.heapBitsBulkBarrier(), which the go compiler seems to automatically insert to transparently handle misaligned data.	2016-07-23 10:37:40 -07:00
Steve Yen	b744148449	optimization to actually reuse the TermFrequencyRow	2016-07-21 11:10:49 -07:00
Steve Yen	39d3e2f028	optimize upside_down reader Next() with TermFieldDoc reuse This optimization changes the index.TermFieldReader.Next() interface API, adding an optional, pre-allocated *TermFieldDoc parameter, which can help prevent garbage creation.	2016-07-21 11:10:49 -07:00
Steve Yen	2498ccc913	optimize upside_down reader Next() to reuse TermFrequencyRow Before this change, upside down's reader would alloc a new TermFrequencyRow on every Next(), which would be immediately transformed into an index.TermFieldDoc{}. This change reuses a pre-allocated TermFrequencyRow that's a field in the reader.	2016-07-21 11:10:49 -07:00
Marty Schoch	81780f97d0	add term search stats	2016-03-05 07:50:25 -05:00
Steve Yen	82b8b3468e	upside_down analysis converts to docIDBytes once	2016-01-06 23:38:02 -08:00
Marty Schoch	2bd3ef4080	copy relevant k/v pairs before advancing underlying iterator	2015-10-28 12:23:54 -04:00
Marty Schoch	900f1b4a67	major kvstore interface and impl overhaul clarified the interface contract	2015-09-23 11:25:47 -07:00
Marty Schoch	3e60ca24ec	support using end key on forestdb iterator for term freq lookup also additional forestdb configs	2015-08-18 16:22:02 -04:00
Marty Schoch	867110e03b	major improvements to index row encoding improvements uncovered some issues with how k/v data was copied or not. to address this, kv abstraction layer now lets impl specify if the bytes returned are safe to use after a reader (or writer since writers are also readers) is closed See index/store/KVReader - BytesSafeAfterClose() bool false is the safe value if you're not sure it will cause index impls to copy the data Some kv impls already have created a copy at the C-api barrier in which case they can safely return true. Overall this yields ~25% speedup for searches with leveldb. It yields ~10% speedup for boltdb. Returning stored fields is now slower with boltdb, as previously we were returning unsafe bytes.	2015-04-03 16:50:48 -04:00
Marty Schoch	522f9d5cc7	significant change to index format, support dictionary rows this introduces disk format v4 now the summary rows for a term are stored in their own "dictionary row" format, previously the same information was stored in special term frequency rows this now allows us to easily iterate all the terms for a field in sorted order (useful for many other fuzzy data structures) at the top-level of bleve you can now browse terms within a field using the following api on the Index interface: FieldDict(field string) (index.FieldDict, error) FieldDictRange(field string, startTerm []byte, endTerm []byte) (index.FieldDict, error) FieldDictPrefix(field string, termPrefix []byte) (index.FieldDict, error) fixes #127	2015-03-10 16:22:19 -04:00
Marty Schoch	300ec79c96	first pass at checking errors that were ignored part of #169	2015-03-06 14:46:29 -05:00
Marty Schoch	198ca1ad4d	major refactor of kvstore/index internals, see below In the index/store package introduce KVReader creates snapshot all read operations consistent from this snapshot must close to release introduce KVWriter only one writer active access to all operations allows for consistent read-modify-write must close to release introduce AssociativeMerge operation on batch allows efficient read-modify-write for associative operations used to consolidate updates to the term summary rows saves 1 set and 1 get op per shared instance of term in field In the index package introduced an IndexReader exposes a consistent snapshot of the index for searching At top level All searches now operate on a consistent snapshot of the index	2014-09-12 17:21:35 -04:00
Marty Schoch	9d2187706e	another round of golint	2014-09-03 19:53:59 -04:00
Marty Schoch	e1b77956d4	more golint cleanups	2014-09-03 18:47:02 -04:00
Marty Schoch	7a7eb2e94c	add newline between license and package this avoids cluttering godocs with the license	2014-09-02 10:54:50 -04:00
Marty Schoch	1161361bea	rename imports from couchbaselabs to blevesearch	2014-08-28 15:38:57 -04:00
Marty Schoch	c33f1668f7	refactor dump methods improved test coverage	2014-08-15 13:12:55 -04:00
Marty Schoch	2c86a731b4	added DocIdReader to Index interface added more debug capabilities removed hard-coded limitation on number of fields in doc	2014-07-11 14:24:28 -04:00
Marty Schoch	d48eee948e	refactored index to separate out kv storage now have pluggable options for leveldb, gouchstore, in memory only	2014-05-09 16:37:04 -04:00
Marty Schoch	f92f274665	refactored to remove panics, return errors, and fewer type assertions	2014-04-18 21:07:41 -04:00
Marty Schoch	bb2f66be92	Revert "refactor to use less panics, return more errors" This reverts commit `dec37fed07`.	2014-04-18 16:09:34 -04:00
Marty Schoch	dec37fed07	refactor to use less panics, return more errors	2014-04-18 15:54:29 -04:00
Marty Schoch	3d842dfaf2	initial commit	2014-04-17 16:55:53 -04:00

31 Commits
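
The two most recent commits (89d83cb5a1 and d7405a4d79) describe reusing a previously allocated []byte when copying doc IDs out of the kv store's iterator: reset the slice length to zero, then copy into the existing backing array. The following is a minimal sketch of that general Go idiom, not bleve's actual code. The `termDoc` type and its `setID` method are hypothetical names introduced only for illustration.

```go
package main

import "fmt"

// termDoc is a hypothetical stand-in for a reusable result struct
// that carries a copy of an ID whose source buffer may be overwritten.
type termDoc struct {
	ID []byte
}

// Reset truncates the ID to zero length but keeps its backing array,
// so a later copy can reuse the memory instead of allocating.
func (d *termDoc) Reset() {
	d.ID = d.ID[:0]
}

// setID copies src into the existing buffer. append allocates a new
// array only when the current capacity is too small for src.
func (d *termDoc) setID(src []byte) {
	d.ID = append(d.ID[:0], src...)
}

func main() {
	// Pretend this buffer is owned by an iterator that overwrites it
	// on every step.
	iterBuf := []byte("doc-1")

	d := &termDoc{}
	d.setID(iterBuf)

	// The iterator advances and reuses its own buffer.
	copy(iterBuf, "doc-2")
	fmt.Println(string(d.ID)) // the earlier copy is unaffected

	d.Reset()
	d.setID(iterBuf)
	fmt.Println(string(d.ID))
}
```

Because the copy is owned by the struct, it survives the iterator's next step (the concern raised in 172ca7e69e), and repeated calls settle on a buffer large enough for the longest ID seen so far.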