bleve

gibheer

bleve

Author	SHA1	Message	Date
abhinavdangeti	7e36109b3c	MB-28162: Provide API to estimate memory needed to run a search query This API (unexported) will estimate the amount of memory needed to execute a search query over an index before the collector begins data collection. Sample estimates for certain queries: {Size: 10, BenchmarkUpsidedownSearchOverhead} ESTIMATE BENCHMEM TermQuery 4616 4796 MatchQuery 5210 5405 DisjunctionQuery (Match queries) 7700 8447 DisjunctionQuery (Term queries) 6514 6591 ConjunctionQuery (Match queries) 7524 8175 Nested disjunction query (disjunction of disjunctions) 10306 10708 …	2018-03-06 13:53:42 -08:00
Marty Schoch	0eba2a3f0c	reduce garbage created while processing facets previously we parsed/returned large sections of the documents back index row in order to compute facet information. this would require parsing the protobuf of the entire back index row. unfortunately this creates considerable garbage. this new version introduces a visitor/callback approach to working with data inside the back index row. the benefit of this approach is that we can let the higher-level code see values, prior to any copies of data being made or intermediate garbage being created. implementations of the callback must copy any value which they would like to retain beyond the callback. NOTE: this approach is duplicates code from the automatically generated protobuf code NOTE: this approach assumes that the "field" field be serialized before the "terms" field. This is guaranteed by our currently generated protobuf encoder, and is recommended by the protobuf spec. But, decoders SHOULD support them occuring in any order, which we do not.	2017-03-02 17:00:46 -05:00
Patrick Mezard	c81fd6fdb0	index: DocIDReader.Next() returns nil when done not io.EOF	2016-11-20 19:05:35 +01:00
Steve Yen	2a8237e8cc	optimize FacetsBuilder with cached fields & avoid some allocs	2016-10-25 15:34:48 -07:00
Marty Schoch	2332455bd2	nicer formatting of license header	2016-10-02 10:13:14 -04:00
Marty Schoch	3fd2a64872	BREAKING CHANGE - removed DumpXXX() methods from bleve.Index The DumpXXX() methods were always documented as internal and unsupported. However, now they are being removed from the public top-level API. They are still available on the internal IndexReader, which can be accessed using the Advanced() method. The DocCount() and DumpXXX() methods on the internal index have moved to the internal index reader, since they logically operate on a snapshot of an index.	2016-09-13 12:40:01 -04:00
Marty Schoch	e1fb860a86	removed unused AsyncIndex interface	2016-09-13 08:42:36 -04:00
Marty Schoch	1b68c4ec5b	make backindex rows more compact, fix bug counting docs on start	2016-09-11 20:29:15 -04:00
Marty Schoch	da9339bcdf	refactor FinalizeID into ExternalID and InternalID	2016-09-11 20:29:14 -04:00
Marty Schoch	750e0ac16c	change sort field impl to use indexed values not stored values	2016-08-17 09:20:44 -07:00
Marty Schoch	b857769217	document Reset behavior as its non-obvious	2016-08-03 17:16:15 -04:00
Marty Schoch	d7405a4d79	updated attempt to reuse []byte previous attempt was flawed (but maked by Reset() method) new approach is to do this work in the Reset() method itself, logically this is where it belongs. but further we acknowledge that IndexInternalID []byte lifetime lives beyond the TermFieldDoc, so another copy is made into the DocumentMatch. Although this introduces yet another copy the theory being tested is that it allows each of these structuress to reuse memory without additional allocation.	2016-08-03 17:01:27 -04:00
Marty Schoch	1aacd9bad5	changed approach IndexInternalID is now []byte this is still opaque, and should still work for any future index implementations as it is a least common denominator choice, all implementations must internally represent the id as []byte at some point for storage to disk	2016-08-01 14:26:50 -04:00
Marty Schoch	5aa9e95468	major refactor of index/search API index id's are now opaque (until finally returned to top-level user) - the TermFieldDoc's returned by TermFieldReader no longer contain doc id - instead they return an opaque IndexInternalID - items returned are still in the "natural index order" - but that is no longer guaranteed to be "doc id order" - correct behavior requires that they all follow the same order - but not any particular order - new API FinalizeDocID which converts index internal ID's to public string ID - APIs used internally which previously took doc id now take IndexInternalID - that is DocumentFieldTerms() and DocumentFieldTermsForFields() - however, APIs that are used externally do not reflect this change - that is Document() - DocumentIDReader follows the same changes, but this is less obvious - behavior clarified, used to iterate doc ids, BUT NOT in doc id order - method STILL available to iterate doc ids in range - but again, you won't get them in any meaningful order - new method to iterate actual doc ids from list of possible ids - this was introduced to make the DocIDSearcher continue working searchers now work with the new opaque index internal doc ids - they return new DocumentMatchInternal (which does not have string ID) scorerers also work with these opaque index internal doc ids - they return DocumentMatchInternal (which does not have string ID) collectors now also perform a final step of converting the final result - they STILL return traditional DocumentMatch (with string ID) - but they now also require an IndexReader (so that they can do the conversion)	2016-07-31 13:46:18 -04:00
Marty Schoch	47ee69ae82	term field reader supports optionally omitting 3 details at the time you create the term field reader, you can specify that you don't need the term freq, the norm, or the term vectors in that case, the index implementation can choose to not return them in its subsequently returned values this is advisory only, some simple implementations may ignore this and continue to return the values anyway (as the current impl of upside_down does today) this change will allow future index implementations the opportunity to do less work when it isn't required	2016-07-30 10:26:42 -04:00
Steve Yen	4822cff63a	optimize Advance() with pre-allocated in-out param This perf-related change helps the code and API reach more similarity with the Next() methods, which now take a pre-allocate param.	2016-07-29 14:15:00 -07:00
Steve Yen	39d3e2f028	optimize upside_down reader Next() with TermFieldDoc reuse This optimization changes the index.TermFieldReader.Next() interface API, adding an optional, pre-allocated *TermFieldDoc parameter, which can help prevent garbage creation.	2016-07-21 11:10:49 -07:00
slavikm	fc990bc2d1	Remove the field IDs from outside of the index	2016-07-19 20:42:45 -07:00
slavikm	ce64c17be1	Do field cache only once per search	2016-07-17 16:29:17 -07:00
slavikm	9a9b630a6d	Make facets much faster	2016-07-17 15:31:35 -07:00
Marty Schoch	b8a2fbb887	fix data race in bleve batch reuse Currently bleve batch is build by user goroutine Then read by bleve gourinte This is still safe when used correctly However, Reset() will modify the map, which is now a data race This fix is to simply make batch.Reset() alloc new maps. This provides a data-access pattern that can be used safely. Also, this thread argues that creating a new map may be faster than trying to reuse an existing one: https://groups.google.com/d/msg/golang-nuts/UvUm3LA1u8g/jGv_FobNpN0J Separate but related, I have opted to remove the "unsafe batch" checking that we did. This was always limited anyway, and now users of Go 1.6 are just as likely to get a panic from the runtime for concurrent map access anyway. So, the price paid by us (additional mutex) is not worth it. fixes #360 and #260	2016-04-08 15:32:13 -04:00
Marty Schoch	194ee82c80	gofmt simplifications	2016-04-02 21:54:33 -04:00
Marty Schoch	d7292ed891	add support for gathering stats via map for easier consumption	2016-03-07 18:37:46 -05:00
Marty Schoch	c5dea9e882	fix accessing store via Advanced() method which was broken	2016-02-02 11:54:18 -05:00
Marty Schoch	699c86073a	make existing integration tests work with firestorm	2015-12-01 12:29:56 -05:00
Patrick Mezard	f2b3d5698e	index: document TermFieldReader interface	2015-10-27 18:53:03 +01:00
Patrick Mezard	3df789d258	index: document empty strings behaviour when calling DocIDReader()	2015-10-27 18:53:03 +01:00
Patrick Mezard	5100e00f20	doc: DocIDReader.Advance() is no longer implementation dependent	2015-10-20 20:32:23 +02:00
Patrick Mezard	2fa334fc27	doc: talk about "documents" not "indexed or stored documents"	2015-10-20 20:24:24 +02:00
Patrick Mezard	b174c137fd	doc: document DocIDReader, and some Index bits	2015-10-20 20:24:24 +02:00
Marty Schoch	900f1b4a67	major kvstore interface and impl overhaul clarified the interface contract	2015-09-23 11:25:47 -07:00
Marty Schoch	dbb93b75a4	refactoring to allow pluggable index encodings this lays the foundation for supporting the new firestorm indexing scheme. i'm merging these changes ahead of the rest of the firestorm branch so i can continue to make changes to the analysis pipeline in parallel	2015-09-02 13:12:08 -04:00
dtynn	89dc2c22bc	update TermVector	2015-05-17 13:07:14 +08:00
Marty Schoch	328bc73ed0	clarify Batch is not threadsafe in docs in some limited cases we can detect unsafe usage in these cases, do not trip over ourselves and panic instead return a strongly typed error upside_down.UnsafeBatchUseDetected also, introduced Batch.Reset() to allow batch reuse this is currently still experimental closes #195	2015-05-15 15:04:52 -04:00
Marty Schoch	8581e73cef	added String method for Batch also changed Batch methods to pointer receiver closes #180	2015-04-08 10:41:42 -04:00
Marty Schoch	522f9d5cc7	significant change to index format, support dictionary rows this introduces disk format v4 now the summary rows for a term are stored in their own "dictionary row" format, previously the same information was stored in special term frequency rows this now allows us to easily iterate all the terms for a field in sorted order (useful for many other fuzzy data structures) at the top-level of bleve you can now browse terms within a field using the following api on the Index interface: FieldDict(field string) (index.FieldDict, error) FieldDictRange(field string, startTerm []byte, endTerm []byte) (index.FieldDict, error) FieldDictPrefix(field string, termPrefix []byte) (index.FieldDict, error) fixes #127	2015-03-10 16:22:19 -04:00
Marty Schoch	300ec79c96	first pass at checking errors that were ignored part of #169	2015-03-06 14:46:29 -05:00
Marty Schoch	c7443fe52b	refactored API a bit more things can return error now in a couple of places we had to swallow errors because they didn't fit the existing API. in these case and proactively in a few others we now return error as well. also the batch API has been updated to allow performing set/delete internal within the batch	2014-10-31 09:40:23 -04:00
Marty Schoch	64b0066121	added support for tracking index stats and exposing via expvar closes #83	2014-10-02 11:12:49 -07:00
Marty Schoch	198ca1ad4d	major refactor of kvstore/index internals, see below In the index/store package introduce KVReader creates snapshot all read operations consistent from this snapshot must close to release introduce KVWriter only one writer active access to all operations allows for consisten read-modify-write must close to release introduce AssociativeMerge operation on batch allows efficient read-modify-write for associative operations used to consolidate updates to the term summary rows saves 1 set and 1 get op per shared instance of term in field In the index package introduced an IndexReader exposes a consisten snapshot of the index for searching At top level All searches now operate on a consisten snapshot of the index	2014-09-12 17:21:35 -04:00
Marty Schoch	9d2187706e	another round of golint	2014-09-03 19:53:59 -04:00
Marty Schoch	7a7eb2e94c	add newline between license and package this avoids cluttering godocs with the license	2014-09-02 10:54:50 -04:00
Marty Schoch	1161361bea	rename imports from couchbaselabs to blevesearch	2014-08-28 15:38:57 -04:00
Marty Schoch	c33f1668f7	refactor dump methods improved test coverage	2014-08-15 13:12:55 -04:00
Marty Schoch	c526a38369	major refactor of analysis files, now wired up to registry ultimately this is make it more convenient for us to wire up different elements of the analysis pipeline, without having to preload everything into memory before we need it separately the index layer now has a mechanism for storing internal key/value pairs. this is expected to be used to store the mapping, and possibly other pieces of data by the top layer, but not exposed to the user at the top.	2014-08-13 21:14:47 -04:00
Marty Schoch	e5d4e6f1e4	refactored index layer to support batch operations this change was then exposed at the higher levels also the beer-sample app was upgraded to index in batches of 100 by default. this yieled an indexing speed up from 27s to 16s. closes #57	2014-08-11 16:27:18 -04:00
Marty Schoch	7bbaa8ecd5	added support for returning facet results with requests supports terms, numeric ranges, and date ranges closes #14	2014-08-11 11:03:29 -04:00
Marty Schoch	292af78b9e	implemented prefix search closes #4	2014-08-07 13:45:39 -04:00
Marty Schoch	4ae9eb895c	added method to list fields in the index also added a corresponding http handler	2014-07-31 11:47:36 -04:00
Marty Schoch	216767953c	introduced a config option to disable creating indexes if they don't already exist closes #23 and closes #24	2014-07-30 14:29:26 -04:00

1 2

53 Commits