bleve

Author	SHA1	Message	Date
Marty Schoch	27ba6187bc	adds support for more complex field sorts with object (not string). previously from JSON we would just deserialize strings like "-abv" or "city" or "_id" or "_score" as simple sorts on fields, ids or scores respectively. while this is simple and compact, it can be ambiguous (for example if you have a field starting with - or if you have a field named "_id" already). also, this simple syntax doesn't allow us to specify more complex options to deal with type/mode/missing. we keep support for the simple string syntax, but now also recognize a more expressive syntax like: { "by": "field", "field": "abv", "desc": true, "type": "string", "mode": "min", "missing": "first" } type, mode and missing are optional and default to "auto", "default", and "last" respectively	2016-08-17 14:33:51 -07:00
Marty Schoch	750e0ac16c	change sort field impl to use indexed values not stored values	2016-08-17 09:20:44 -07:00
Marty Schoch	d7405a4d79	updated attempt to reuse []byte. previous attempt was flawed (but masked by Reset() method). new approach is to do this work in the Reset() method itself, logically this is where it belongs. but further we acknowledge that IndexInternalID []byte lifetime lives beyond the TermFieldDoc, so another copy is made into the DocumentMatch. Although this introduces yet another copy, the theory being tested is that it allows each of these structures to reuse memory without additional allocation.	2016-08-03 17:01:27 -04:00
Marty Schoch	89d83cb5a1	reuse memory already allocated for copies of docids when the term field reader is copying ID values out of the kv store's iterator, it is already attempting to reuse the term frequency row data structure. this change allows us to also attempt to reuse the []byte allocated for previous copies of the docid. we reset the slice length to zero then copy the data into the existing slice, avoiding new allocation and garbage collection in the cases where there is already enough space	2016-08-03 13:45:48 -04:00
Marty Schoch	36de4a7097	cleaner fix for the TermFrequencyRow reuse bug reset to nil first, let remaining logic work as before	2016-08-01 17:17:29 -04:00
Marty Schoch	cfce9c5fc5	initialize term vector list in parseV otherwise reusing previous term frequency row causes us to keep tacking on to one gigantic list	2016-08-01 17:01:34 -04:00
Marty Schoch	172ca7e69e	need to copy the doc ID for it to survive past next iteration	2016-08-01 17:01:04 -04:00
Marty Schoch	1aacd9bad5	changed approach IndexInternalID is now []byte this is still opaque, and should still work for any future index implementations as it is a least common denominator choice, all implementations must internally represent the id as []byte at some point for storage to disk	2016-08-01 14:26:50 -04:00
Marty Schoch	5aa9e95468	major refactor of index/search API index id's are now opaque (until finally returned to top-level user) - the TermFieldDoc's returned by TermFieldReader no longer contain doc id - instead they return an opaque IndexInternalID - items returned are still in the "natural index order" - but that is no longer guaranteed to be "doc id order" - correct behavior requires that they all follow the same order - but not any particular order - new API FinalizeDocID which converts index internal ID's to public string ID - APIs used internally which previously took doc id now take IndexInternalID - that is DocumentFieldTerms() and DocumentFieldTermsForFields() - however, APIs that are used externally do not reflect this change - that is Document() - DocumentIDReader follows the same changes, but this is less obvious - behavior clarified, used to iterate doc ids, BUT NOT in doc id order - method STILL available to iterate doc ids in range - but again, you won't get them in any meaningful order - new method to iterate actual doc ids from list of possible ids - this was introduced to make the DocIDSearcher continue working searchers now work with the new opaque index internal doc ids - they return new DocumentMatchInternal (which does not have string ID) scorers also work with these opaque index internal doc ids - they return DocumentMatchInternal (which does not have string ID) collectors now also perform a final step of converting the final result - they STILL return traditional DocumentMatch (with string ID) - but they now also require an IndexReader (so that they can do the conversion)	2016-07-31 13:46:18 -04:00
Marty Schoch	47ee69ae82	term field reader supports optionally omitting 3 details at the time you create the term field reader, you can specify that you don't need the term freq, the norm, or the term vectors in that case, the index implementation can choose to not return them in its subsequently returned values this is advisory only, some simple implementations may ignore this and continue to return the values anyway (as the current impl of upside_down does today) this change will allow future index implementations the opportunity to do less work when it isn't required	2016-07-30 10:26:42 -04:00
Steve Yen	4822cff63a	optimize Advance() with pre-allocated in-out param This perf-related change helps the code and API reach more similarity with the Next() methods, which now take a pre-allocated param.	2016-07-29 14:15:00 -07:00
Steve Yen	3c82086805	optimize upside_down reader & 64-bit struct alignments The UpsideDownCouchTermFieldReader.Next() only needs the doc ID from the key, so this change provides a specialized parseKDoc() method for that optimization. Additionally, fields in various structs are more 64-bit aligned, in an attempt to reduce the invocations of runtime.typedmemmove() and runtime.heapBitsBulkBarrier(), which the go compiler seems to automatically insert to transparently handle misaligned data.	2016-07-23 10:37:40 -07:00
Steve Yen	5271a0f62b	optimize termFieldVectorsFromTermVectors when empty	2016-07-21 11:46:14 -07:00
Steve Yen	b744148449	optimization to actually reuse the TermFrequencyRow	2016-07-21 11:10:49 -07:00
Steve Yen	39d3e2f028	optimize upside_down reader Next() with TermFieldDoc reuse This optimization changes the index.TermFieldReader.Next() interface API, adding an optional, pre-allocated *TermFieldDoc parameter, which can help prevent garbage creation.	2016-07-21 11:10:49 -07:00
Steve Yen	2498ccc913	optimize upside_down reader Next() to reuse TermFrequencyRow Before this change, upside down's reader would alloc a new TermFrequencyRow on every Next(), which would be immediately transformed into an index.TermFieldDoc{}. This change reuses a pre-allocated TermFrequencyRow that's a field in the reader.	2016-07-21 11:10:49 -07:00
Steve Yen	68af6aef62	optimize upside_down reader Next() when 0-length term field vectors From some bleve-query perf profiling, term field vectors appeared to be alloc'ed, which was unnecessary as term field vectors are disabled in the bleve-blast/bleve-query tests.	2016-07-21 11:10:49 -07:00
slavikm	fc990bc2d1	Remove the field IDs from outside of the index	2016-07-19 20:42:45 -07:00
slavikm	ce64c17be1	Do field cache only once per search	2016-07-17 16:29:17 -07:00
slavikm	9a9b630a6d	Make facets much faster	2016-07-17 15:31:35 -07:00
Marty Schoch	b8a2fbb887	fix data race in bleve batch reuse. Currently bleve batch is built by user goroutine, then read by bleve goroutine. This is still safe when used correctly. However, Reset() will modify the map, which is now a data race. This fix is to simply make batch.Reset() alloc new maps. This provides a data-access pattern that can be used safely. Also, this thread argues that creating a new map may be faster than trying to reuse an existing one: https://groups.google.com/d/msg/golang-nuts/UvUm3LA1u8g/jGv_FobNpN0J Separate but related, I have opted to remove the "unsafe batch" checking that we did. This was always limited anyway, and now users of Go 1.6 are just as likely to get a panic from the runtime for concurrent map access anyway. So, the price paid by us (additional mutex) is not worth it. fixes #360 and #260	2016-04-08 15:32:13 -04:00
Marty Schoch	7892882519	fix typos	2016-04-02 21:59:30 -04:00
Marty Schoch	194ee82c80	gofmt simplifications	2016-04-02 21:54:33 -04:00
Marty Schoch	3dc64de478	moved fields requiring 64-bit alignment to start of struct. several data structures had a pointer at the start of the struct; on some 32-bit systems, this causes the remaining fields to no longer be aligned on 64-bit boundaries. the fix identified by @pmezard is to put the counters first in the struct, which guarantees correct alignment. fixes #359	2016-03-20 10:38:28 -04:00
Steve Yen	be2800a8e4	MB-18715 - moss Merge() didn't bump bufUsed correctly And, also allocate more memory for both the partial and full merges.	2016-03-15 17:09:40 -07:00
Marty Schoch	d7292ed891	add support for gathering stats via map for easier consumption	2016-03-07 18:37:46 -05:00
Marty Schoch	23a323bc9d	add support for numPlainTextBytesIndexed metric	2016-03-05 14:05:08 -05:00
Marty Schoch	81780f97d0	add term search stats	2016-03-05 07:50:25 -05:00
Steve Yen	a29dd25a48	upside_down dict row value size accounts for large uvarint's This is somewhat unlikely, but if a term is (incredibly) popular, its uvarint count value representation might go beyond 8 bytes. Some KVStore implementations (like forestdb) provide a BatchEx cgo optimization that depends on proper preallocated counting, so this change provides a proper worst-case estimate based on the max-uvarint of 10 bytes instead of the previously incorrect 8 bytes.	2016-02-22 11:52:51 -08:00
Marty Schoch	40c95513b7	add support for including kvstore stats	2016-02-05 12:26:19 -05:00
Marty Schoch	c5dea9e882	fix accessing store via Advanced() method which was broken	2016-02-02 11:54:18 -05:00
Marty Schoch	fc34a97875	copy locations on merge for more safe/predictable behavior fixes #328	2016-01-19 14:21:48 -05:00
Steve Yen	035d9d0e40	unneeded cast and parens	2016-01-17 00:16:05 -08:00
Marty Schoch	1335eb2a7b	Merge pull request #322 from steveyen/WIP-perf-20160113 KVReader.MultiGet and KVWriter.NewBatchEx API's	2016-01-15 14:28:59 -05:00
Silvan Jegen	d326898f7b	Remove unneeded brackets	2016-01-14 16:41:41 +01:00
Steve Yen	6849e538be	upside_down and firestorm use new NewBatchEx() API With this change, the upside_down batchRows() and firestorm batchRows() now use the new KVWriter.NewBatchEx() API, which can improve performance by reducing the number of cgo hops.	2016-01-13 23:08:20 -08:00
Steve Yen	fe39b3fd13	avoid fieldTermFreqs loop if no composite fields	2016-01-13 14:45:04 -08:00
Marty Schoch	af25e724f6	Merge branch 'master' of https://github.com/slavikm/bleve into slavikm-master	2016-01-13 16:10:59 -05:00
Steve Yen	0e72b949b3	upside_down batchRows() takes array of arrays In order to spend less time in append(), this change in upside_down (similar to another recent performance change in firestorm) builds up an array of arrays as the eventual input to batchRows().	2016-01-11 18:11:21 -08:00
slavikm	680be52f87	Implemented boolean field support	2016-01-11 17:18:03 -08:00
Steve Yen	7ce7d98cba	upside_down merge dictionary deltas before using batch.Merge() This change performs more dictionary delta incr/decr math in batchRows() instead of in the KVStore ExecuteBatch() machinery.	2016-01-11 16:52:07 -08:00
Steve Yen	94273d5fa9	upside_down process internal rows earlier With this change, internal rows are processed while we're waiting for backIndex rows to be retrieved.	2016-01-11 16:25:35 -08:00
Steve Yen	bb5cd8f3d6	upside_down merge backIndexRow concurrently Previously, the code would gather all the backIndexRows before processing them. This change instead merges the backIndexRows concurrently on the theory that we might as well make progress on compute & processing tasks while waiting for the rest of the back index rows to be fetched from the KVStore.	2016-01-10 18:50:42 -08:00
Steve Yen	c3b5246b0c	upside_down track analysis time tighter; and comments	2016-01-10 15:36:54 -08:00
Steve Yen	d3dd40d334	upside_down retrieves backindex concurrently with analysis Start backindex reading concurrently with analysis to try to utilize more I/O bandwidth. The analysis time vs indexing time stats tracking are also now "off", since there's now concurrency between those activities. One tradeoff is that the lock area in upside_down Batch() is increased as part of this change.	2016-01-10 15:18:28 -08:00
Steve Yen	860de28a28	fix memory leak by closing batches in batchRows()	2016-01-07 17:59:42 -08:00
Steve Yen	846912d083	upside_down udc.termVectorsFromTokenFreq rows append optimization	2016-01-07 00:48:34 -08:00
Steve Yen	8b980bd2ef	firestorm avoid extra goroutine, similar to upside_down	2016-01-07 00:43:27 -08:00
Steve Yen	fbd0e7bfe9	upside_down backIndexTermEntries prealloc'ed capacity	2016-01-07 00:23:25 -08:00
Steve Yen	4eee8821f9	upside_down storeField/indexField append to provided arrays Taking another optimization from firestorm, upside_down's storeField()/indexField() funcs now also append() to passed-in arrays rather than always allocating their own arrays.	2016-01-07 00:13:46 -08:00
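
Commit 27ba6187bc above describes accepting either the compact string form ("-abv", "_id", "_score") or the object form with optional type, mode and missing. The following is a minimal sketch of that idea using only encoding/json. The type sortSpec, the function parseSort, and the "by" values "id" and "score" are hypothetical names chosen for illustration; they are not bleve's actual API.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// sortSpec is a hypothetical struct mirroring the object syntax quoted in
// the commit message; it is not bleve's real type.
type sortSpec struct {
	By      string `json:"by"`
	Field   string `json:"field"`
	Desc    bool   `json:"desc"`
	Type    string `json:"type"`
	Mode    string `json:"mode"`
	Missing string `json:"missing"`
}

// defaults returns a spec with the defaults named in the commit message.
func defaults() sortSpec {
	return sortSpec{Type: "auto", Mode: "default", Missing: "last"}
}

// parseSort accepts either a JSON string or a JSON object.
func parseSort(raw []byte) (sortSpec, error) {
	spec := defaults()

	// Try the simple string syntax first.
	var s string
	if err := json.Unmarshal(raw, &s); err == nil {
		if strings.HasPrefix(s, "-") {
			spec.Desc = true
			s = s[1:]
		}
		switch s {
		case "_id":
			spec.By = "id" // assumed value, for illustration only
		case "_score":
			spec.By = "score" // assumed value, for illustration only
		default:
			spec.By = "field"
			spec.Field = s
		}
		return spec, nil
	}

	// Otherwise expect the object syntax; keys absent from the JSON
	// leave the pre-filled defaults untouched.
	if err := json.Unmarshal(raw, &spec); err != nil {
		return sortSpec{}, err
	}
	return spec, nil
}

func main() {
	for _, in := range []string{
		`"-abv"`,
		`{"by": "field", "field": "abv", "desc": true, "mode": "min"}`,
	} {
		spec, err := parseSort([]byte(in))
		fmt.Printf("%+v %v\n", spec, err)
	}
}
```

The object form removes the ambiguity the commit mentions: a field literally named "_id" or starting with "-" can be given as "field" without being reinterpreted.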


180 Commits