bleve

Author	SHA1	Message	Date
Marty Schoch	ae4b354c72	Merge pull request #411 from steveyen/master tighter moss KV store iterator handling	2016-08-27 08:00:45 -04:00
Steve Yen	eaa59621ff	tighter moss KV store iterator handling	2016-08-19 09:10:03 -07:00
Marty Schoch	27ba6187bc	adds support for more complex field sorts with object (not string) previously from JSON we would just deserialize strings like "-abv" or "city" or "_id" or "_score" as simple sorts on fields, ids or scores respectively while this is simple and compact, it can be ambiguous (for example if you have a field starting with - or if you have a field named "_id" already). also, this simple syntax doesn't allow us to specify more complex options to deal with type/mode/missing we keep support for the simple string syntax, but now also recognize a more expressive syntax like: { "by": "field", "field": "abv", "desc": true, "type": "string", "mode": "min", "missing": "first" } type, mode and missing are optional and default to "auto", "default", and "last" respectively	2016-08-17 14:33:51 -07:00
Marty Schoch	750e0ac16c	change sort field impl to use indexed values not stored values	2016-08-17 09:20:44 -07:00
Marty Schoch	5f1454106d	Merge pull request #402 from mschoch/indexapiwork Index/Search API work	2016-08-10 12:41:51 -04:00
Marty Schoch	aa3ae3d39c	enable read_only mode for boltdb indexes fixes #405	2016-08-06 10:47:34 -04:00
Marty Schoch	da794d3762	fix bug introduced by reuse of TermFrequencyRow values in a recent commit, we changed the code to reuse TermFrequencyRow objects instead of constantly allocating new ones. unfortunately, one of the original methods was not coded with this reuse in mind, and a lazy initialization caused us to leak data from previous uses of the same object. in particular this caused term vector information from previous hits to still be applied to subsequent hits. eventually this causes the highlighter to try and highlight invalid regions of a slice. fixes #404	2016-08-05 08:33:04 -04:00
Marty Schoch	b857769217	document Reset behavior as it's non-obvious	2016-08-03 17:16:15 -04:00
Marty Schoch	d7405a4d79	updated attempt to reuse []byte previous attempt was flawed (but masked by Reset() method) new approach is to do this work in the Reset() method itself, logically this is where it belongs. but further we acknowledge that IndexInternalID []byte lifetime lives beyond the TermFieldDoc, so another copy is made into the DocumentMatch. Although this introduces yet another copy the theory being tested is that it allows each of these structures to reuse memory without additional allocation.	2016-08-03 17:01:27 -04:00
Marty Schoch	89d83cb5a1	reuse memory already allocated for copies of docids when the term field reader is copying ID values out of the kv store's iterator, it is already attempting to reuse the term frequency row data structure. this change allows us to also attempt to reuse the []byte allocated for previous copies of the docid. we reset the slice length to zero then copy the data into the existing slice, avoiding new allocation and garbage collection in the cases where there is already enough space	2016-08-03 13:45:48 -04:00
Marty Schoch	36de4a7097	cleaner fix for the TermFrequencyRow reuse bug reset to nil first, let remaining logic work as before	2016-08-01 17:17:29 -04:00
Marty Schoch	cfce9c5fc5	initialize term vector list in parseV otherwise reusing previous term frequency row causes us to keep tacking on to one gigantic list	2016-08-01 17:01:34 -04:00
Marty Schoch	172ca7e69e	need to copy the doc ID for it to survive past next iteration	2016-08-01 17:01:04 -04:00
Marty Schoch	1aacd9bad5	changed approach IndexInternalID is now []byte this is still opaque, and should still work for any future index implementations as it is a least common denominator choice, all implementations must internally represent the id as []byte at some point for storage to disk	2016-08-01 14:26:50 -04:00
Marty Schoch	5aa9e95468	major refactor of index/search API index id's are now opaque (until finally returned to top-level user) - the TermFieldDoc's returned by TermFieldReader no longer contain doc id - instead they return an opaque IndexInternalID - items returned are still in the "natural index order" - but that is no longer guaranteed to be "doc id order" - correct behavior requires that they all follow the same order - but not any particular order - new API FinalizeDocID which converts index internal ID's to public string ID - APIs used internally which previously took doc id now take IndexInternalID - that is DocumentFieldTerms() and DocumentFieldTermsForFields() - however, APIs that are used externally do not reflect this change - that is Document() - DocumentIDReader follows the same changes, but this is less obvious - behavior clarified, used to iterate doc ids, BUT NOT in doc id order - method STILL available to iterate doc ids in range - but again, you won't get them in any meaningful order - new method to iterate actual doc ids from list of possible ids - this was introduced to make the DocIDSearcher continue working searchers now work with the new opaque index internal doc ids - they return new DocumentMatchInternal (which does not have string ID) scorers also work with these opaque index internal doc ids - they return DocumentMatchInternal (which does not have string ID) collectors now also perform a final step of converting the final result - they STILL return traditional DocumentMatch (with string ID) - but they now also require an IndexReader (so that they can do the conversion)	2016-07-31 13:46:18 -04:00
Marty Schoch	47ee69ae82	term field reader supports optionally omitting 3 details at the time you create the term field reader, you can specify that you don't need the term freq, the norm, or the term vectors in that case, the index implementation can choose to not return them in its subsequently returned values this is advisory only, some simple implementations may ignore this and continue to return the values anyway (as the current impl of upside_down does today) this change will allow future index implementations the opportunity to do less work when it isn't required	2016-07-30 10:26:42 -04:00
Steve Yen	4822cff63a	optimize Advance() with pre-allocated in-out param This perf-related change helps the code and API reach more similarity with the Next() methods, which now take a pre-allocate param.	2016-07-29 14:15:00 -07:00
Steve Yen	3c82086805	optimize upside_down reader & 64-bit struct alignments The UpsideDownCouchTermFieldReader.Next() only needs the doc ID from the key, so this change provides a specialized parseKDoc() method for that optimization. Additionally, fields in various structs are more 64-bit aligned, in an attempt to reduce the invocations of runtime.typedmemmove() and runtime.heapBitsBulkBarrier(), which the go compiler seems to automatically insert to transparently handle misaligned data.	2016-07-23 10:37:40 -07:00
Steve Yen	5094d2d097	optimize moss PrefixIterator Previously, the PrefixIterator() for moss was implemented by comparing the prefix bytes on every Next(). With this optimization, the next larger endKeyExclusive is computed at the iterator's initialization, which allows us to avoid all those prefix comparisons.	2016-07-21 18:33:34 -07:00
Steve Yen	5271a0f62b	optimize termFieldVectorsFromTermVectors when empty	2016-07-21 11:46:14 -07:00
Steve Yen	cbb174b074	optimize moss iterator Next() done/k/v maintenance	2016-07-21 11:10:49 -07:00
Steve Yen	b744148449	optimization to actually reuse the TermFrequencyRow	2016-07-21 11:10:49 -07:00
Steve Yen	6d7fa0b964	optimize moss iterator checkDone()	2016-07-21 11:10:49 -07:00
Steve Yen	39d3e2f028	optimize upside_down reader Next() with TermFieldDoc reuse This optimization changes the index.TermFieldReader.Next() interface API, adding an optional, pre-allocated *TermFieldDoc parameter, which can help prevent garbage creation.	2016-07-21 11:10:49 -07:00
Steve Yen	2498ccc913	optimize upside_down reader Next() to reuse TermFrequencyRow Before this change, upside down's reader would alloc a new TermFrequencyRow on every Next(), which would be immediately transformed into an index.TermFieldDoc{}. This change reuses a pre-allocated TermFrequencyRow that's a field in the reader.	2016-07-21 11:10:49 -07:00
Steve Yen	68af6aef62	optimize upside_down reader Next() when 0-length term field vectors From some bleve-query perf profiling, term field vectors appeared to be alloc'ed, which was unnecessary as term field vectors are disabled in the bleve-blast/bleve-query tests.	2016-07-21 11:10:49 -07:00
Marty Schoch	5934a185f3	Merge pull request #398 from slavikm/master Make facets much faster	2016-07-21 09:12:28 -04:00
slavikm	fc990bc2d1	Remove the field IDs from outside of the index	2016-07-19 20:42:45 -07:00
slavikm	ce64c17be1	Do field cache only once per search	2016-07-17 16:29:17 -07:00
slavikm	9a9b630a6d	Make facets much faster	2016-07-17 15:31:35 -07:00
Steve Yen	80623f4a8a	MB-20101 - moss KV fix Get() of 0-length vals The moss KV store adapter's Get() implementation was incorrectly transforming a 0-length val (e.g., []byte{}) into a nil val.	2016-07-15 14:41:30 -07:00
Marty Schoch	bd2a23fb6d	remove firestorm index scheme firestorm was an experiment we learned a lot, but it did not result in a usable index scheme	2016-06-26 07:51:41 -04:00
Mark Mindenhall	c3c827aded	Add boltdb config test	2016-06-14 13:36:40 -06:00
Mark Mindenhall	d369bd5c3c	Add bucket fill percent option for boltdb	2016-06-13 18:47:38 -06:00
Marty Schoch	1be5699c54	Merge pull request #381 from MachineShop-IOT/master Compact for boltdb (workaround for #374)	2016-06-08 00:01:20 -04:00
Steve Yen	4e531ae11b	configurable mossStoreOptions and DeferredSort defaults to true	2016-06-07 17:38:43 -07:00
Mark Mindenhall	09fcc69516	rename defaultBatchSize to defaultCompactBatchSize	2016-06-01 14:25:57 -06:00
Mark Mindenhall	b5a4378a46	Cleanup godoc comments in PR	2016-06-01 13:59:57 -06:00
Mark Mindenhall	fecf7ab5c4	Compact for boltdb (workaround for #374 )	2016-06-01 13:16:43 -06:00
Marty Schoch	92cf2a8974	Merge pull request #376 from MachineShop-IOT/master Remove DictionaryTerm with count 0 during compact (workaround for #374)	2016-06-01 13:39:30 -04:00
Steve Yen	bf318b489b	enable mossStore as configurable lower-level store Also, bumped moss vendor SHA to latest moss with mossStore.	2016-05-26 13:33:22 -07:00
Mark Mindenhall	04351eb8f1	Move creation of iterator within transaction	2016-05-26 12:29:49 -06:00
Mark Mindenhall	686b20be4f	Remove DictionaryTerm with count 0 during compact (workaround for #374 )	2016-05-26 11:04:53 -06:00
Mark Mindenhall	3aa1d72233	Add compact method to goleveldb store	2016-05-17 16:58:17 -06:00
Marty Schoch	73b514fa4f	do not put +/-Inf or NaN values into the stats map	2016-04-15 13:39:30 -04:00
Marty Schoch	b8a2fbb887	fix data race in bleve batch reuse Currently bleve batch is built by user goroutine Then read by bleve goroutine This is still safe when used correctly However, Reset() will modify the map, which is now a data race This fix is to simply make batch.Reset() alloc new maps. This provides a data-access pattern that can be used safely. Also, this thread argues that creating a new map may be faster than trying to reuse an existing one: https://groups.google.com/d/msg/golang-nuts/UvUm3LA1u8g/jGv_FobNpN0J Separate but related, I have opted to remove the "unsafe batch" checking that we did. This was always limited anyway, and now users of Go 1.6 are just as likely to get a panic from the runtime for concurrent map access anyway. So, the price paid by us (additional mutex) is not worth it. fixes #360 and #260	2016-04-08 15:32:13 -04:00
Marty Schoch	2a703376ea	fix ineffectual assignments	2016-04-02 22:42:56 -04:00
Marty Schoch	7892882519	fix typos	2016-04-02 21:59:30 -04:00
Marty Schoch	194ee82c80	gofmt simplifications	2016-04-02 21:54:33 -04:00
Marty Schoch	639fb1ab89	remove NativeMergeOperator from core, it requires unsafe	2016-03-24 12:06:43 -04:00
Marty Schoch	724684a4f1	additional firestorm fixes for 64-bit alignment part of #359	2016-03-20 11:02:13 -04:00
Marty Schoch	3dc64de478	moved fields requiring 64-bit alignment to start of struct several data structures had a pointer at the start of the struct on some 32-bit systems, this causes the remaining fields to no longer be aligned on 64-bit boundaries the fix identified by @pmezard is to put the counters first in the struct, which guarantees correct alignment fixes #359	2016-03-20 10:38:28 -04:00
Steve Yen	be2800a8e4	MB-18715 - moss Merge() didn't bump bufUsed correctly And, also allocate more memory for both the partial and full merges.	2016-03-15 17:09:40 -07:00
Steve Yen	c1597842d0	moss lowerLevelUpdate didn't handle batches of size 1	2016-03-11 15:47:23 -08:00
Steve Yen	f1dac8b497	moss defaults to non-nil options.Log	2016-03-09 10:15:11 -08:00
Steve Yen	1d63c55f7c	parse mossLowerLevelMaxBatchSize only when lower-level-store exists	2016-03-09 10:09:15 -08:00
Steve Yen	76b9365928	added moss RegistryCollectionOptions The moss RegistryCollectionOptions allows applications to register moss-related callback API functions and other advanced feature usage at process initialization time. For example, this could be used for moss's OnError(), OnEvent() and logging callback options.	2016-03-09 09:40:29 -08:00
Marty Schoch	d7292ed891	add support for gathering stats via map for easier consumption	2016-03-07 18:37:46 -05:00
Marty Schoch	e51f4d5450	changing async test strategy, was failing in go 1.6	2016-03-07 09:39:20 -05:00
Marty Schoch	23a323bc9d	add support for numPlainTextBytesIndexed metric	2016-03-05 14:05:08 -05:00
Marty Schoch	81780f97d0	add term search stats	2016-03-05 07:50:25 -05:00
Marty Schoch	147debaa12	expose metrics and moss stats wrapping underlying stats as well	2016-03-04 13:43:39 -05:00
Steve Yen	f6d1bd2c87	moss option MaxPreMergerBatches renamed	2016-03-03 11:18:30 -08:00
Steve Yen	7d67d89a9c	MB-18441 - moss lower-level iterator starts positioned on current The iterator starts off positioned so that Current() is correct, so invoking Next() right off the bat was incorrect.	2016-03-01 21:45:48 -08:00
Steve Yen	a29dd25a48	upside_down dict row value size accounts for large uvarint's This is somewhat unlikely, but if a term is (incredibly) popular, its uvarint count value representation might go beyond 8 bytes. Some KVStore implementations (like forestdb) provide a BatchEx cgo optimization that depends on proper preallocated counting, so this change provides a proper worst-case estimate based on the max-uvarint of 10 bytes instead of the previously incorrect 8 bytes.	2016-02-22 11:52:51 -08:00
Steve Yen	dd1718fa78	index/store/moss uses AllocMerge() instead of Merge() Performance optimization. Before this change, by using Merge() instead of AllocMerge(), moss's internal batch buf's would be wastefully, dramatically grown during append()'s to a mis-sized buf.	2016-02-22 11:48:02 -08:00
Steve Yen	ea1a52464d	more index/store/moss err handling	2016-02-20 14:25:42 -08:00
Steve Yen	eb315fa500	integrate index/store/moss KV store	2016-02-20 14:25:42 -08:00
Marty Schoch	208b700e17	add missing build tag guarding cznicb benchmark	2016-02-09 15:57:35 -05:00
Marty Schoch	40c95513b7	add support for including kvstore stats	2016-02-05 12:26:19 -05:00
Marty Schoch	c5dea9e882	fix accessing store via Advanced() method which was broken	2016-02-02 11:54:18 -05:00
Marty Schoch	710d06e974	add support for native C merge operators	2016-01-27 17:51:07 -05:00
Steve Yen	d97e3caf4f	fix comment typo	2016-01-22 09:04:24 -08:00
Steve Yen	d5de1d3da1	metrics implements BatchEx correctly	2016-01-21 11:00:41 -08:00
Marty Schoch	fc34a97875	copy locations on merge for more safe/predictable behavior fixes #328	2016-01-19 14:21:48 -05:00
Steve Yen	035d9d0e40	unneeded cast and parens	2016-01-17 00:16:05 -08:00
Marty Schoch	1335eb2a7b	Merge pull request #322 from steveyen/WIP-perf-20160113 KVReader.MultiGet and KVWriter.NewBatchEx API's	2016-01-15 14:28:59 -05:00
opennota	8517feb1c6	Fix some typos	2016-01-15 05:46:27 +07:00
Silvan Jegen	d326898f7b	Remove unneeded brackets	2016-01-14 16:41:41 +01:00
Steve Yen	6849e538be	upside_down and firestorm use new NewBatchEx() API With this change, the upside_down batchRows() and firestorm batchRows() now use the new KVWriter.NewBatchEx() API, which can improve performance by reducing the number of cgo hops.	2016-01-13 23:08:20 -08:00
Steve Yen	d94ccf2d74	added KVWriter.NewBatchEx() method	2016-01-13 16:19:04 -08:00
Steve Yen	fb048f6c64	added KVReader.MultiGet() method	2016-01-13 15:12:10 -08:00
Steve Yen	8dc067b1d9	go fmt	2016-01-13 15:11:50 -08:00
Steve Yen	fe39b3fd13	avoid fieldTermFreqs loop if no composite fields	2016-01-13 14:45:04 -08:00
Marty Schoch	af25e724f6	Merge branch 'master' of https://github.com/slavikm/bleve into slavikm-master	2016-01-13 16:10:59 -05:00
Silvan Jegen	35ac2b2bee	Run go fmt ./...	2016-01-12 22:15:50 +01:00
Steve Yen	0e72b949b3	upside_down batchRows() takes array of arrays In order to spend less time in append(), this change in upside_down (similar to another recent performance change in firestorm) builds up an array of arrays as the eventual input to batchRows().	2016-01-11 18:11:21 -08:00
slavikm	680be52f87	Implemented boolean field support	2016-01-11 17:18:03 -08:00
Steve Yen	7ce7d98cba	upside_down merge dictionary deltas before using batch.Merge() This change performs more dictionary delta incr/decr math in batchRows() instead of in the KVStore ExecuteBatch() machinery.	2016-01-11 16:52:07 -08:00
Steve Yen	94273d5fa9	upside_down process internal rows earlier With this change, internal rows are processed while we're waiting for backIndex rows to be retrieved.	2016-01-11 16:25:35 -08:00
Steve Yen	bb5cd8f3d6	upside_down merge backIndexRow concurrently Previously, the code would gather all the backIndexRows before processing them. This change instead merges the backIndexRows concurrently on the theory that we might as well make progress on compute & processing tasks while waiting for the rest of the back index rows to be fetched from the KVStore.	2016-01-10 18:50:42 -08:00
Steve Yen	c3b5246b0c	upside_down track analysis time tighter; and comments	2016-01-10 15:36:54 -08:00
Steve Yen	d3dd40d334	upside_down retrieves backindex concurrently with analysis Start backindex reading concurrently with analysis to try to utilize more I/O bandwidth. The analysis time vs indexing time stats tracking are also now "off", since there's now concurrency between those activities. One tradeoff is that the lock area in upside_down Batch() is increased as part of this change.	2016-01-10 15:18:28 -08:00
Steve Yen	bff95eef70	firestorm close kvwriter sooner	2016-01-10 15:18:27 -08:00
Steve Yen	860de28a28	fix memory leak by closing batches in batchRows()	2016-01-07 17:59:42 -08:00
Steve Yen	70105477cf	added Close() method to KVBatch interface	2016-01-07 17:54:21 -08:00
Marty Schoch	48fcd5a7d5	Merge branch 'WIP-perf-20160106' of https://github.com/steveyen/bleve into steveyen-WIP-perf-20160106	2016-01-07 15:40:29 -05:00
Marty Schoch	665f5c58e1	fix errcheck violation	2016-01-07 11:11:43 -05:00
Marty Schoch	e54db33346	try testing slightly different way	2016-01-07 11:06:18 -05:00
Marty Schoch	cd940cc375	add another check to try to understand test failure on travis	2016-01-07 10:45:20 -05:00
Steve Yen	846912d083	upside_down udc.termVectorsFromTokenFreq rows append optimization	2016-01-07 00:48:34 -08:00
Steve Yen	8b980bd2ef	firestorm avoid extra goroutine, similar to upside_down	2016-01-07 00:43:27 -08:00
Steve Yen	fbd0e7bfe9	upside_down backIndexTermEntries prealloc'ed capacity	2016-01-07 00:23:25 -08:00
Steve Yen	4eee8821f9	upside_down storeField/indexField append to provided arrays Taking another optimization from firestorm, upside_down's storeField()/indexField() funcs now also append() to passed-in arrays rather than always allocating their own arrays.	2016-01-07 00:13:46 -08:00
Steve Yen	1af2927967	upside_down gets analysis perf rows optimizations from firestorm	2016-01-06 23:53:13 -08:00
Steve Yen	82b8b3468e	upside_down analysis converts to docIDBytes once	2016-01-06 23:38:02 -08:00
Steve Yen	d6a997d8c1	firestorm gtreap lookup once per snapshot docID Previously, firestorm would lookup docID's in the inFlight gtreap for every candidate docNum, and this change moves the lookup to outside of the loop.	2016-01-06 16:46:15 -08:00
Steve Yen	024848ac91	firestorm valid docNum finding, fixes #310	2016-01-06 16:04:56 -08:00
Steve Yen	7df07f94fa	firestorm use the ParseKey() funcs to avoid unneeded value parsing With this change, the row allocation also happens only once per loop, instead of once per item.	2016-01-06 15:53:12 -08:00
Steve Yen	009d59222a	firestorm StoredRow.ParseKey() func	2016-01-06 15:46:26 -08:00
Steve Yen	8389027ae8	firestorm TermFreqRow.ParseKey() func	2016-01-06 15:32:09 -08:00
Steve Yen	89d17f01ef	analyze locations only if includeTermVectors enabled With this change, TermLocations are computed and maintained only if includeTermVectors is enabled, for higher performance.	2016-01-05 12:46:46 -08:00
Steve Yen	70b7e73c82	firestorm compensator inFlight.Get() might return nil	2016-01-03 10:21:54 -08:00
Steve Yen	fb8c9a7475	firestorm.Batch() collects [][]IndexRows instead of []IndexRow Rather than append() all received rows into a flat []IndexRow during the result gathering loop, this change instead collects the analysis result rows into a [][]IndexRow, which avoids extra copying. As part of this, firestorm batchRows() now takes the [][]IndexRow as its input.	2016-01-02 12:30:47 -08:00
Steve Yen	1c5b84911d	firestorm DictUpdater NotifyBatch is more async	2016-01-02 12:21:25 -08:00
Steve Yen	b241242465	firestorm.Analyze() preallocs rows, with analyzeField() func The new analyzeField() helper func is used for both regular fields and for composite fields. With this change, all analysis is done up front, for both regular fields and composite fields. After analysis, this change counts up all the row capacity needed and extends the AnalysisResult.Rows in one shot, as opposed to the previous approach of dynamically growing the array as needed during append()'s. Also, in this change, the TermFreqRow for _id is added first, which seems more correct.	2016-01-02 12:21:25 -08:00
Steve Yen	5b2bc1c20f	firestorm.indexField() check for includeTermVectors moved out of loop	2016-01-02 12:21:25 -08:00
Steve Yen	45e9eaaacb	firestorm.indexField() allocs up-front array of TermFreqRow's This uses the "backing array" technique to allocate many TermFreqRow's at the front of firestorm.indexField(), instead of the previous one-by-one, as-needed TermFreqRow allocation approach. Results from micro-benchmark, null-firestorm, bleve-blast has this change producing a ~half MB/sec improvement.	2016-01-02 12:21:24 -08:00
Steve Yen	7ae696d661	firestorm lookuper notified via batch Previously, the firestorm.Batch() would notify the lookuper goroutine on a document by document basis. If the lookuper input channel became full, then that would block the firestorm.Batch() operation. With this change, lookuper is notified once, with a "batch" that is an []InFlightItem. This change also reuses that same []InFlightItem to invoke the compensator.MutateBatch(). This also has the advantage of only converting the docID's from string to []byte just once, outside of the lock that's used by the compensator. Micro-benchmark of this change with null-firestorm bleve-blast does not show large impact, neither degradation nor improvement.	2016-01-02 12:21:24 -08:00
Steve Yen	38d50ed8b5	renamed var to docsUpdated to match docsDeleted naming	2016-01-02 12:21:24 -08:00
Steve Yen	3feeb14b7d	firestorm.batchRows reuses buf for all IndexRows	2016-01-02 12:21:24 -08:00
Steve Yen	0a7f7e3df8	firestorm.Analyze() converts docID to bytes only once	2016-01-02 12:21:24 -08:00
Steve Yen	fd81d0364c	firestorm.indexField() uses capacity of len(tokenFreqs)	2016-01-02 12:21:24 -08:00
Steve Yen	ee5ccda112	use KeyTo/ValueTo in firestorm.batchRows After this change, with null kvstore micro-benchmark... GOMAXPROCS=8 ./bleve-blast -source=../../tmp/enwiki.txt \ -count=100000 -numAnalyzers=8 -numIndexers=8 \ -config=../../configs/null-firestorm.json -batch=100 Then TermFreqRow key and value methods disappear as large boxes from the cpu profile graphs.	2016-01-01 09:57:59 -08:00
Steve Yen	fd287bdfa4	firestorm.md markdown fixes	2016-01-01 09:57:59 -08:00
Steve Yen	b605224106	use shorter go idiom	2015-12-29 22:14:45 -08:00
Antoine Grondin	6806343677	firestorm: fix #296 for division by zero on GC	2015-12-25 11:34:19 +07:00
Antoine Grondin	a6f7abdfa3	firestorm: reproducer for division by zero on GC	2015-12-25 11:33:46 +07:00
Marty Schoch	8efbd556a3	fix indexing bug with data coming from arrays fixes #295	2015-12-21 14:59:32 -05:00
Marty Schoch	cf67fe2cbc	fix major synchronization issue in the field_cache The field cache is expected to be the authority on which field names are identified by which identifier. This code was optimized for the most common case in which fields already exist. However, if we determine the field is missing with the read lock (shared), we incorrectly immediately proceed to create a new row with the write lock (exclusive). The problem is that multiple goroutines might have come to the same conclusion, and they all proceed to add rows. The two choices were to do the whole operation with the write lock, or recheck the value again with the write lock. We have chosen to repeat the check inside the write-lock, as this optimizes for what we believe to be the most common case, in which most fields will already exist.	2015-12-15 16:39:38 -05:00
Marty Schoch	a73a178923	fix incorrect prefix search behavior avoids double incrementing of end term when reading term dict fixes #293	2015-12-04 14:07:16 -05:00
Marty Schoch	699c86073a	make existing integration tests work with firestorm	2015-12-01 12:29:56 -05:00
Marty Schoch	6d851cfcc2	fix bug in warmup which led to docs being deleted	2015-11-30 10:18:14 -05:00
Marty Schoch	aa8d98f5fa	include space after prefix in log output	2015-11-30 10:17:48 -05:00
Marty Schoch	68d8742826	correctly prefix internal rows with 'i' and print them in debug	2015-11-30 10:17:15 -05:00
Marty Schoch	c93de9734e	fix issues identified by errcheck	2015-11-24 14:32:33 -05:00
Marty Schoch	bbef1980d8	Merge branch 'master' into firestorm	2015-11-24 13:04:36 -05:00
Marty Schoch	ff11f83842	properly handle errors inside metrics kvstore reporting	2015-11-24 12:52:03 -05:00
Marty Schoch	a707d44e0b	Merge branch 'master' into firestorm	2015-11-24 09:44:47 -05:00
Patrick Mezard	e85c9c542e	row: expose TermFrequencyRow term and freq fields Rows content is an implementation detail of bleve index and may change in the future. That said, they also contain information valuable to assess the quality of the index or understand its performance. So, as long as we agree that type asserting rows should only be done if you know what you are doing and are ready to deal with future changes, I see no reason to hide the row fields from external packages. Fix #268	2015-11-17 17:21:26 +01:00
Kosov Eugene	45e670b99b	BoltDB wrapper nano optimization which makes code a bit prettier too	2015-11-05 00:27:28 +03:00
Marty Schoch	4791625b9b	Merge pull request #262 from pmezard/index-and-tokenizer-doc-and-fix Index and tokenizer doc and fix	2015-11-02 11:51:21 -05:00
Marty Schoch	30651065e9	fix panic on insufficiently sized buffer adds test case to reproduce original problem fixes #264	2015-10-30 18:25:38 -04:00
Marty Schoch	2bd3ef4080	copy relevant k/v pairs before advancing underlying iterator	2015-10-28 12:23:54 -04:00
Marty Schoch	d1b07f4909	fix dump methods to properly copy keys and values	2015-10-28 12:06:44 -04:00
Marty Schoch	01526e971f	Merge branch 'master' into firestorm	2015-10-28 11:26:01 -04:00
Patrick Mezard	f2b3d5698e	index: document TermFieldReader interface	2015-10-27 18:53:03 +01:00
Patrick Mezard	3df789d258	index: document empty strings behaviour when calling DocIDReader()	2015-10-27 18:53:03 +01:00
Marty Schoch	1a978a4591	fix go vet issues and cleanup reader/iterator	2015-10-26 16:41:58 -04:00
Marty Schoch	f0d282f5f8	add test case for seeing prefix iterators outside of range similar to #256 except for prefix iterators includes fix for boltdb and gtreap which had incorrect behavior	2015-10-26 16:14:29 -04:00
Patrick Mezard	5100e00f20	doc: DocIDReader.Advance() is no longer implementation dependent	2015-10-20 20:32:23 +02:00
Patrick Mezard	2fa334fc27	doc: talk about "documents" not "indexed or stored documents"	2015-10-20 20:24:24 +02:00
Patrick Mezard	b174c137fd	doc: document DocIDReader, and some Index bits	2015-10-20 20:24:24 +02:00
Patrick Mezard	da72d0c2b9	store_test: deduplicate store initialization	2015-10-20 19:21:01 +02:00
Patrick Mezard	873f483804	gtreap: RangeIterator.Seek should not move before start	2015-10-20 19:12:30 +02:00
Patrick Mezard	5d7628ba3b	boltdb: fix RangeIterator outside of range seeks Two issues: - Seeking before i.start and iterating returned keys before i.start - Seeking after the store last key did not invalidate the iterator and could cause infinite loops.	2015-10-20 19:09:51 +02:00
Patrick Mezard	aada2e7333	store_test: test RangeIterator.Seek on goleveldb	2015-10-20 19:09:38 +02:00
Marty Schoch	6cc21346dc	fix errcheck issues	2015-10-19 14:27:03 -04:00
Marty Schoch	817c317c90	Merge branch 'master' into newkvstore	2015-10-19 12:04:07 -04:00
Marty Schoch	faceecf87b	make row buffer size constant/configurable also handle case where it is insufficiently sized	2015-10-19 12:03:38 -04:00
Marty Schoch	f0ee9a3c66	removed commented code and unused functions	2015-10-19 11:13:03 -04:00
Marty Schoch	c9471d5739	Merge pull request #244 from kevgs/master reducing allocation count	2015-10-16 15:51:30 -04:00
Marty Schoch	e6d0fc8d95	Merge pull request #247 from pmezard/remove-update-goroutine upside_down: no need for a goroutine to enqueue AnalysisWork	2015-10-16 10:15:55 -04:00
Marty Schoch	4c6bc23043	rewrite to keep using same buffer when possible	2015-10-13 14:04:56 -07:00
Marty Schoch	8de860bf12	2 more places that used old Key()	2015-10-13 12:35:08 -07:00
Marty Schoch	5f594d1acc	Merge branch 'master' into newkvstore	2015-10-12 18:07:04 -07:00
Marty Schoch	08572e4925	move literals outside loop for more predictable test results	2015-10-12 18:06:38 -07:00
Patrick Mezard	8c928539ee	upside_down: no need for a goroutine to enqueue AnalysisWork It boils down to: 1. client sends some work and a notification channel to a single worker, then waits. 2. worker processes the work 3. worker sends the result to the client using the notification channel I do not see any problem with this, even with unbuffered channels.	2015-10-12 10:42:14 +02:00
Marty Schoch	95e06538f3	fix benchmarks for the x kvstores	2015-10-09 11:09:42 -04:00
Marty Schoch	0f05d1d3ca	Merge branch 'master' into newkvstore	2015-10-09 10:33:41 -04:00
Patrick Mezard	aee82f8b49	upside_down: simplify return code in batchRows()	2015-10-09 09:57:12 +02:00
Marty Schoch	e28eb749d7	bump up buffer size	2015-10-06 16:45:38 -04:00
Marty Schoch	71cbb13e07	modify code to reuse buffer for kv generation	2015-10-05 17:49:50 -04:00
Kosov Eugene	a61c350888	reducing allocation count	2015-10-05 22:57:10 +03:00
Patrick Mezard	9d5407be13	boltdb: add "nosync" option to force boltdb.DB.NoSync=true Use this option when rebuilding indexes from scratch. In my small case (~17000 json documents), it reduces indexing from 520s to 250s. I did not add any test, short of forced indexing termination it only has performance effects, which are hard to test. And unknown options are currently ignored. Issue #240	2015-10-03 14:26:48 +02:00
Marty Schoch	d06b526cbf	more refactoring	2015-09-28 16:50:27 -04:00
Marty Schoch	66aa1b020a	Merge branch 'master' into firestorm	2015-09-23 11:32:25 -07:00
Marty Schoch	900f1b4a67	major kvstore interface and impl overhaul clarified the interface contract	2015-09-23 11:25:47 -07:00
Marty Schoch	f81b2be334	major refactor of bleve configuration see #221 for full details	2015-09-16 17:10:59 -04:00
Marty Schoch	c308f611cf	skip unnecessary map before slice benchmark old ns/op new ns/op delta BenchmarkBatch-4 16950972 16377194 -3.38% benchmark old allocs new allocs delta BenchmarkBatch-4 136164 136161 -0.00% benchmark old bytes new bytes delta BenchmarkBatch-4 7168872 7109691 -0.83%	2015-09-10 08:21:26 -04:00
Marty Schoch	f6f1628b15	avoid doing unnecessary work: benchmark old ns/op new ns/op delta BenchmarkBatch-4 20738739 17047158 -17.80% benchmark old allocs new allocs delta BenchmarkBatch-4 136423 136160 -0.19% benchmark old bytes new bytes delta BenchmarkBatch-4 20277781 7168772 -64.65%	2015-09-10 08:19:05 -04:00
Marty Schoch	c8538c835f	Merge branch 'master' into firestorm	2015-09-10 08:14:14 -04:00
Marty Schoch	17c64d37c7	add similar benchmarks from firestorm	2015-09-10 08:13:52 -04:00
Marty Schoch	1e4d637761	adding more benchmarks	2015-09-10 08:01:11 -04:00
Marty Schoch	f74ed6a9ae	Merge remote-tracking branch 'origin' into firestorm catching up with changes from master	2015-09-02 13:29:03 -04:00
Marty Schoch	dbb93b75a4	refactoring to allow pluggable index encodings this lays the foundation for supporting the new firestorm indexing scheme. i'm merging these changes ahead of the rest of the firestorm branch so i can continue to make changes to the analysis pipeline in parallel	2015-09-02 13:12:08 -04:00
Marty Schoch	7ad7659ce5	add support for using null kvstore outside of bleve internals	2015-09-02 11:50:06 -04:00
Marty Schoch	07d37ca38a	add important rocksdb config options	2015-09-02 11:49:42 -04:00
Marty Schoch	18151862b5	fix go vet issues	2015-08-25 15:13:13 -04:00
Marty Schoch	84811cf5a0	made index type configurable + first version of firestorm	2015-08-25 14:52:42 -04:00
Marty Schoch	3e60ca24ec	support using end key on forestdb iterator for term freq lookup also additional forestdb configs	2015-08-18 16:22:02 -04:00
Marty Schoch	ae19d77b04	updated protobuf defs to be valid	2015-08-17 15:37:13 -04:00
Marty Schoch	1187436e46	changed Stored row Values to also use protobuf	2015-08-17 09:48:40 -04:00
Marty Schoch	8d8a05a842	fix more issues	2015-08-14 16:27:00 -04:00
Marty Schoch	e0802a2b39	fixed the worst of the formatting	2015-08-14 16:17:48 -04:00
Marty Schoch	f4df56eb7c	add first draft of firestorm proposal	2015-08-14 16:09:19 -04:00
Marty Schoch	d3dda3d0ea	fixup config parsing and add new options	2015-08-12 13:18:23 -04:00
Marty Schoch	01667dfff3	faster protobufs with gogo	2015-08-12 13:18:23 -04:00
Marty Schoch	7df66b4857	fix broken benchmark caused by index row encoding change	2015-08-06 14:48:04 -04:00
Marty Schoch	9db850a53e	Merge branch 'fix/MaxVarintLen64' of https://github.com/tukdesk/bleve into tukdesk-fix/MaxVarintLen64	2015-07-31 15:16:16 -04:00
Marty Schoch	3682c25467	update to correctly work with composite fields also updated search results to return array positions	2015-07-31 11:16:11 -04:00
Marty Schoch	c1c4941dde	Merge branch 'feature/term_vector' of https://github.com/tukdesk/bleve into tukdesk-feature/term_vector	2015-07-29 14:31:15 -04:00
Marty Schoch	bf8dcae76b	removing build tags	2015-07-28 18:59:10 -04:00
Marty Schoch	1b28f6218b	additional row validation	2015-07-13 15:22:54 -04:00
Marty Schoch	17ef48f82a	switching back to the canonical goleveldb repo	2015-07-08 12:21:17 -06:00
Marty Schoch	bf80f4628e	fix bug in current goleveldb (must copy during iteration) also changed over to mschoch fork of goleveldb (temporary) the change to my fork is pending some read-only issues described here: https://github.com/syndtr/goleveldb/issues/111 hopefully we can find a path forward, and get that addressed upstream	2015-07-06 18:00:05 -04:00
Marty Schoch	7be7ecdf8e	fix batch indexing bug, incremented docCount before commit fixes #211	2015-06-08 14:14:05 -04:00
Marty Schoch	2768c2da3c	fix previous sloppy fix which hadn't been adequately tested	2015-05-27 19:15:55 -07:00
Marty Schoch	201fb91171	fix up to correctly trim off separator even though it should never be present	2015-05-27 19:10:12 -07:00
Marty Schoch	a58592ceff	fix case where NewBackIndexRowKV returns nil, nil the logic for reading the docID from the keys in this row relies on the keys NEVER containing the byte separator character (0xff), this is OK as we require that all keys be valid utf-8 however, it turns out that in the case where this rule was violated, we would panic, because we return nil, nil and later try to print the doc id	2015-05-27 19:04:57 -07:00
dtynn	59c97ae577	use binary.MaxVarintLen64	2015-05-26 15:35:31 +08:00
Marty Schoch	e0887f9113	fix tests which deadlock boltdb due to deferred cleanup fixes #209	2015-05-21 12:29:31 -04:00
Marty Schoch	a52d3b5c07	put in hack to allow boltdb reader isolation test to pass in boltdb, long readers MAY block a writer. in particular if the write requires additional allocation, it must acquire a lock already held by the reader. in general this is not a problem for bleve (though it can affect performance in some cases), but it is a problem for the reader isolation test. this commit adds a hack to try and avoid the need for additional allocation closes #208	2015-05-21 11:39:59 -04:00
dtynn	b4f7496031	update the index format version number	2015-05-18 15:16:35 +08:00
dtynn	89dc2c22bc	update TermVector	2015-05-17 13:07:14 +08:00
Marty Schoch	8f70def63b	properly use the stored array positions when loading a document fixes #205	2015-05-15 15:47:54 -04:00
Marty Schoch	328bc73ed0	clarify Batch is not threadsafe in docs in some limited cases we can detect unsafe usage in these cases, do not trip over ourselves and panic instead return a strongly typed error upside_down.UnsafeBatchUseDetected also, introduced Batch.Reset() to allow batch reuse this is currently still experimental closes #195	2015-05-15 15:04:52 -04:00
Marty Schoch	57cd67fa88	fix data race on index metadata (docCount) closes #198	2015-05-08 08:07:20 -04:00
Marty Schoch	57358088ec	fix row merging bug trying to be clever, we reused the memory allocated for the left operand when doing partial merges this had been tested to be safe, in general. however, the implementation was then written such that we always reused globally defined operands, this meant that we mutated the operands which were intended to always represent +1/-1 this then cascades quickly to making increment/decrement values much larger/smaller than they should be related to #197	2015-05-06 11:00:04 -04:00
Marty Schoch	30a0ba1f9b	fix bug, dictionary row encoding buffer too small we incorrectly created a []byte of length 8 but the max for a uvarint is 10 closes #197	2015-05-06 10:04:02 -04:00
Steve Yen	e98ae8ab71	update metrics store to latest kvstore api	2015-04-27 11:01:53 -07:00
Marty Schoch	16f538d7b7	close documents returned by iterator before losing their reference fixes #194	2015-04-24 17:48:21 -04:00
Marty Schoch	b54a59139c	change forestdb imports to couchbase not couchbaselabs	2015-04-24 17:35:01 -04:00
Marty Schoch	ee47d1c21a	standardize on including 1000 sized batches	2015-04-24 17:31:34 -04:00
Marty Schoch	452fea6a24	adding initial impl of rocksdb kv store	2015-04-24 17:19:44 -04:00
Marty Schoch	a9c07acbfa	refactor of kvstore api to support native merge in rocksdb refactor to share code in emulated batch refactor to share code in emulated merge refactor index kvstore benchmarks to share more code refactor index kvstore benchmarks to be more repeatable	2015-04-24 17:13:50 -04:00
indraniel	a62320a50e	+ fix goleveldb's BytesSafeAfterClose() on reader - it should be set to false	2015-04-10 15:45:22 -05:00
Marty Schoch	d5dc66313f	change variable name conflicting when both LevelDB benchmarks run	2015-04-10 15:03:44 -04:00
Marty Schoch	d5caad4405	changed GoLevelDB benchmark names to be different from LevelDB this will allow for easier comparison when running both versions at the same time	2015-04-10 15:00:56 -04:00
Marty Schoch	5f66bd84c7	fix issues identified by errcheck	2015-04-10 14:59:05 -04:00
indraniel	54ab493b3e	+ correctly copy bytes from the goleveldb store - this is part of a recent bleve KVStore API change. See the following two google group threads for more details: * [help adding goleveldb as an alternative Key/Value store for bleve][1] * [bleve search performance improvement][2] [1]: https://groups.google.com/forum/#!topic/bleve/aHZ8gmihLiY [2]: https://groups.google.com/forum/#!topic/bleve/aTyqsSnbhik	2015-04-10 11:25:23 -05:00
indraniel	81bef38cce	Revert "+ make copies of the []bytes returned by goleveldb" This reverts commit cb8c1741289a0f00b30733e0d52d9d81d1199603. This commit is no longer desired. The KV store API has been changed to better address this issue. For more details, see the google group conversation thread at: https://groups.google.com/forum/#!topic/bleve/aHZ8gmihLiY	2015-04-10 11:12:44 -05:00
indraniel	3a70401835	+ make copies of the []bytes returned by goleveldb - The byte strings returned by goleveldb aren't necessarily safe. See the following google group thread: https://groups.google.com/forum/#!topic/bleve/aHZ8gmihLiY This code change is based on the gist created here: https://groups.google.com/forum/#!topic/bleve/aHZ8gmihLiY	2015-04-10 11:08:02 -05:00
indraniel	a88d714778	+ add a goleveldb index upside-down benchmark test	2015-04-10 11:08:02 -05:00
indraniel	a0a2a61050	+ keep 'get' consistent with levigo implementation - this change keeps the method behavior consistent with the levigo/leveldb implementation. - don't issue an err if a key isn't found	2015-04-10 11:08:02 -05:00
indraniel	5e55fa2866	+ keep 'getWithSnapshot' consistent with levigo implementation - this change keeps the method behavior consistent with the levigo/leveldb implementation. - the leveldb store_test.go and goleveldb store_test.go are now identical.	2015-04-10 11:08:02 -05:00
indraniel	caa19e6c36	+ initial stub of goleveldb package - This is a first-pass introduction. Things may not be working correctly yet.	2015-04-10 11:08:02 -05:00
Marty Schoch	8581e73cef	added String method for Batch also changed Batch methods to pointer receiver closes #180	2015-04-08 10:41:42 -04:00
Marty Schoch	539aeb8dc7	fix errors identified by errcheck part of #169	2015-04-07 18:05:41 -04:00
Marty Schoch	ba6b3c8bb3	fix more issues identified by errcheck part of #169	2015-04-07 16:45:23 -04:00
Marty Schoch	ab24772bf0	fix issues identified by errcheck part of #169	2015-04-07 16:34:29 -04:00
Marty Schoch	56c4a09de1	fix issues identified by errcheck part of #169	2015-04-07 15:39:56 -04:00
Marty Schoch	93e01a803e	fix issues identified by errcheck part of #169	2015-04-07 14:52:00 -04:00
Marty Schoch	f1ec73e764	fix issues identified by errcheck part of #169	2015-04-07 13:26:54 -04:00
Marty Schoch	56a30a3574	fix issues identified by errcheck part of #169	2015-04-07 13:05:47 -04:00
Marty Schoch	d2e9409413	fix issues identified by errcheck part of #169	2015-04-07 12:04:59 -04:00
Marty Schoch	dd921d31e3	undoing `f92ab131e4` we now guarantee bytes were copied earlier in the chain the kv store is NOT responsible for making an additional copy closes #181	2015-04-07 11:12:28 -04:00
Marty Schoch	443c0252e0	fix another metrics BytesSafeAfterClose() loop closes #184	2015-04-03 21:17:23 -04:00
Steve Yen	efc39a6857	fix metrics BytesSafeAfterClose() loop fixes issue 184	2015-04-03 16:36:32 -07:00
Marty Schoch	867110e03b	major improvements to index row encoding improvements uncovered some issues with how k/v data was copied or not. to address this, kv abstraction layer now lets impl specify if the bytes returned are safe to use after a reader (or writer since writers are also readers) are closed See index/store/KVReader - BytesSafeAfterClose() bool false is the safe value if you're not sure it will cause index impls to copy the data Some kv impls already have created a copy at the C-api barrier in which case they can safely return true. Overall this yields ~25% speedup for searches with leveldb. It yields ~10% speedup for boltdb. Returning stored fields is now slower with boltdb, as previously we were returning unsafe bytes.	2015-04-03 16:50:48 -04:00
Steve Yen	dbf50b7f29	KVStore gtreap allows only 1 writer at a time	2015-03-26 16:40:18 -07:00
Steve Yen	f92ab131e4	KVStore gtreap implementation copies value bytes	2015-03-26 14:46:37 -07:00
Steve Yen	78453dab7d	metrics KVStore now tracks last 100 errors	2015-03-19 18:41:16 -07:00
Marty Schoch	a44a7c01af	rewrite to use fixed size []byte instead of buffer removes unchecked errors in calls to buffer.Write and also benchmarks considerably faster	2015-03-11 15:12:13 -04:00
Marty Schoch	522f9d5cc7	significant change to index format, support dictionary rows this introduces disk format v4 now the summary rows for a term are stored in their own "dictionary row" format, previously the same information was stored in special term frequency rows this now allows us to easily iterate all the terms for a field in sorted order (useful for many other fuzzy data structures) at the top-level of bleve you can now browse terms within a field using the following api on the Index interface: FieldDict(field string) (index.FieldDict, error) FieldDictRange(field string, startTerm []byte, endTerm []byte) (index.FieldDict, error) FieldDictPrefix(field string, termPrefix []byte) (index.FieldDict, error) fixes #127	2015-03-10 16:22:19 -04:00
Marty Schoch	4e14f4e4ef	change path for forestdb test to correctly cleanup this is due to forestdb auto-compaction using the provided path as just the prefix, so if we're not careful we end up with many stray files laying around here, we create a sub-directory first, and just nuke the whole subdir when we're done	2015-03-10 14:05:58 -04:00
Marty Schoch	300ec79c96	first pass at checking errors that were ignored part of #169	2015-03-06 14:46:29 -05:00
Marty Schoch	a2ad7634f2	update term freq rows to use varint where possible benchmark old ns/op new ns/op delta BenchmarkLevelDBIndexing1Workers 1138292 657901 -42.20% BenchmarkLevelDBIndexing2Workers 1619323 647628 -60.01% BenchmarkLevelDBIndexing4Workers 1172845 636478 -45.73% BenchmarkLevelDBIndexing1Workers10Batch 465556545 448153394 -3.74% BenchmarkLevelDBIndexing2Workers10Batch 504203911 449657355 -10.82% BenchmarkLevelDBIndexing4Workers10Batch 510766435 439839335 -13.89% BenchmarkLevelDBIndexing1Workers100Batch 307657846 268976464 -12.57% BenchmarkLevelDBIndexing2Workers100Batch 302257400 269110215 -10.97% BenchmarkLevelDBIndexing4Workers100Batch 305320485 259084902 -15.14% BenchmarkLevelDBIndexing1Workers1000Batch 301320576 258070231 -14.35% BenchmarkLevelDBIndexing2Workers1000Batch 334174454 261175641 -21.84% BenchmarkLevelDBIndexing4Workers1000Batch 267732436 261461739 -2.34% closes #165	2015-03-06 13:00:53 -05:00
Marty Schoch	c566d34264	bump index format version number, start checking version on open	2015-02-17 17:16:31 +05:30
Steve Yen	38ee9be353	added some batch size 1000 microbenchmarks	2015-01-30 15:58:39 -08:00
Steve Yen	7d6a6aeaa8	single append for inmem KVStore batch	2015-01-29 11:14:08 -08:00
Steve Yen	5a30d36b17	cznicb KVStore uses Put() for faster read-modify-write	2015-01-29 11:02:01 -08:00
Steve Yen	b054cddf76	gtreap KVStore does 1 append for batch Set/Delete	2015-01-29 10:49:39 -08:00
Steve Yen	05d222f490	cznicb KVStore batch uses <2 appends per Set/Delete	2015-01-29 10:22:13 -08:00
Steve Yen	c5c59e61f4	make leveldb faster with non-zero sized batch	2015-01-29 10:20:26 -08:00
Steve Yen	1c1774d4ad	throw away data even faster in null KVStore	2015-01-29 10:17:21 -08:00
Steve Yen	782ad94e01	added debug tag for metrics KVStore	2015-01-16 11:18:40 -08:00
Marty Schoch	eebc8e7825	more debugging around forestdb snapshots	2015-01-16 14:18:28 -05:00
Marty Schoch	ba978ea27e	improving log messages	2015-01-16 14:07:47 -05:00
Marty Schoch	09fe749913	default to autocompaction for forestdb	2015-01-16 13:35:43 -05:00
Steve Yen	12dc2aff93	add go1.4 build tag to cznicb KVStore This is because github.com/cznic/b depends on sync.Pool.	2015-01-15 15:54:25 -08:00
Steve Yen	11ee0209ad	no leading zeros for metrics CSV output	2015-01-15 15:09:53 -08:00
Steve Yen	202191201c	added WriteCSV() to metrics KVStore	2015-01-15 14:11:15 -08:00
Steve Yen	9be4e217bc	metrics KVStore tracks perf metrics on a wrapped KVStore	2015-01-15 11:42:41 -08:00
Steve Yen	ea0a8657f3	added cznicb in-memory kvstore (no reader isolation)	2015-01-13 17:35:28 -08:00
Marty Schoch	362d240b09	added configurable options to leveldb	2015-01-13 16:24:51 -05:00
Steve Yen	d6e6f655c9	initialize forestdb config if provided	2015-01-13 12:03:24 -08:00
Steve Yen	1fa80ffc40	pass config to forestdb Open()	2015-01-13 11:04:02 -08:00
Steve Yen	3a00a968f2	close levigo's read & write options	2015-01-12 18:42:19 -08:00
Steve Yen	c20726bb93	close levigo.Options when db is closed	2015-01-12 18:42:19 -08:00
Steve Yen	603c3af8bb	added gtreap in-memory, copy-on-write KVStore	2015-01-12 11:26:21 -08:00
Marty Schoch	d68c52e621	adding forestdb benchmark	2015-01-12 12:56:37 -05:00
Steve Yen	ae3600aeea	expose forestdb rollback methods	2015-01-06 18:59:02 -08:00
Steve Yen	5467e0a385	forestdb registered name fixed	2015-01-06 17:36:05 -08:00
Marty Schoch	38bdcbeb62	update to new forestdb iterator api	2014-12-27 13:15:14 -08:00
Silvan Jegen	ef18dfe4cd	Fix typos in comments and strings	2014-12-18 18:43:12 +01:00
Sergey Avseyev	a8351be5a6	Update protobuf imports	2014-12-10 01:24:59 +03:00
Silvan Jegen	412049d63c	Remove unneeded import statements	2014-11-29 14:25:24 +01:00
Marty Schoch	6c7237ade9	added test for null kvstore	2014-11-26 15:50:57 -05:00
Marty Schoch	453d4cf770	change to always return stored fields in UTC	2014-11-26 15:36:34 -05:00
Marty Schoch	8ad0f64459	upgrade to current forestdb api	2014-11-25 21:52:35 -05:00
Marty Schoch	d5c1f4a9ab	refactored store tests	2014-11-25 21:52:23 -05:00
Silvan Jegen	e3a2d3b58b	Remove unneeded else clauses	2014-11-20 20:34:05 +01:00
Marty Schoch	47bc7caec3	added getRollbackID() and rollbackTo() to the ForestDB store	2014-11-04 08:34:49 -05:00
Marty Schoch	3f83149ed3	adding back the forestdb kv store impl	2014-10-31 09:42:32 -04:00
Marty Schoch	c7443fe52b	refactored API a bit more things can return error now in a couple of places we had to swallow errors because they didn't fit the existing API. in these case and proactively in a few others we now return error as well. also the batch API has been updated to allow performing set/delete internal within the batch	2014-10-31 09:40:23 -04:00
Marty Schoch	64b0066121	added support for tracking index stats and exposing via expvar closes #83	2014-10-02 11:12:49 -07:00
Marty Schoch	97902e2619	text analysis now moved out of index write lock onto goroutine 1. text analysis is now done before the write lock is acquired 2. there is now a pool of analysis workers 3. the size of this pool is configurable 4. this allows for documents in a batch to be analyzed concurrently as a part of benchmarking these changes i've also introduced a new null storage implementation. this should never be used, as it does not actually build an index. it does however let us go through all the normal indexing machinery, without incurring any indexing I/O. this is very helpful in measuring improvements made to the text analysis pipeline, which are often overshadowed by indexing times in benchmarks actually building an index.	2014-09-24 08:13:14 -04:00
Marty Schoch	198ca1ad4d	major refactor of kvstore/index internals, see below In the index/store package introduce KVReader creates snapshot all read operations consistent from this snapshot must close to release introduce KVWriter only one writer active access to all operations allows for consistent read-modify-write must close to release introduce AssociativeMerge operation on batch allows efficient read-modify-write for associative operations used to consolidate updates to the term summary rows saves 1 set and 1 get op per shared instance of term in field In the index package introduced an IndexReader exposes a consistent snapshot of the index for searching At top level All searches now operate on a consistent snapshot of the index	2014-09-12 17:21:35 -04:00
Marty Schoch	7819deb447	added boltdb benchmark, same as others	2014-09-12 16:55:50 -04:00
Marty Schoch	2294b24b9d	remove forestdb for now not any benefit in maintaining this for the time being	2014-09-12 16:55:11 -04:00
Marty Schoch	9d2187706e	another round of golint	2014-09-03 19:53:59 -04:00
Marty Schoch	e21935f850	another round of golint cleanup	2014-09-03 19:16:46 -04:00
Marty Schoch	e1b77956d4	more golint cleanups	2014-09-03 18:47:02 -04:00
Marty Schoch	377ae090d0	additional golint issues resolved	2014-09-03 18:17:26 -04:00
Marty Schoch	d534b0836b	converted ALL_CAPS constants to CamelCase	2014-09-03 17:48:40 -04:00
Marty Schoch	8e6c8e5644	continued refactoring of the mapping code also renamed some constants that didn't follow go conventions	2014-09-03 13:02:10 -04:00
Marty Schoch	45e1b2dfc6	removing gouchstore store impl this implementation didn't really adhere to the contract and now that we have boltdb we have a better pure go impl	2014-09-02 13:56:35 -04:00
Marty Schoch	7a7eb2e94c	add newline between license and package this avoids cluttering godocs with the license	2014-09-02 10:54:50 -04:00
Marty Schoch	1161361bea	rename imports from couchbaselabs to blevesearch	2014-08-28 15:38:57 -04:00
Marty Schoch	ef59abe4c9	added build tag 'leveldb' to enable this kv store by default we now use the pure go boltdb kv store it is less tested at this point but appears to work test pass, and moves us closer to the goal of being able to just "go get" bleve	2014-08-25 15:18:24 -04:00
Marty Schoch	45a7a6dd8e	fix two missing Close calls holding iterators open	2014-08-25 15:13:15 -04:00
Marty Schoch	8bcf6adb60	changed close of read only tx to Rollback from Commit i was seeing deadlocks before this change using Rollback to close read only tx is what the built-in View() impl does, so i think its safe	2014-08-25 15:11:21 -04:00
Marty Schoch	d67ee483ba	change default bucket name to bleve	2014-08-25 15:11:04 -04:00
Marty Schoch	e7a8a1fbe6	fixing test	2014-08-25 12:34:16 -04:00
Marty Schoch	fbf3636a34	Merge pull request #86 from deoxxa/boltdb-storage add boltdb storage type	2014-08-25 12:27:26 -04:00
Marty Schoch	3309c698f8	fixed Document() behavior to return nil when doc doesn't exist	2014-08-25 08:55:14 -04:00
deoxxa	a993fa4f74	add boltdb storage type	2014-08-24 18:37:56 +10:00
Marty Schoch	27f001bc14	overhauled top-level New/Open API New is now used to create new indexes Open is used to open existing indexes calls to Open no longer specify a mapping because the mapping is serialized and stored along with the index	2014-08-20 16:58:20 -04:00
Marty Schoch	a08a7f5b2a	fix broken tests	2014-08-19 10:02:33 -04:00
Marty Schoch	082a5b0b03	major change to fields now can track array positions for field values stored fields now include this in the key and the back index now uses protobufs to simplify serialization closes #73	2014-08-19 08:58:26 -04:00
Marty Schoch	c33f1668f7	refactor dump methods improved test coverage	2014-08-15 13:12:55 -04:00
Marty Schoch	4d53db9fc8	fixed bug with internal get/set/delete, added tests	2014-08-15 09:39:41 -04:00
Marty Schoch	c526a38369	major refactor of analysis files, now wired up to registry ultimately this is make it more convenient for us to wire up different elements of the analysis pipeline, without having to preload everything into memory before we need it separately the index layer now has a mechanism for storing internal key/value pairs. this is expected to be used to store the mapping, and possibly other pieces of data by the top layer, but not exposed to the user at the top.	2014-08-13 21:14:47 -04:00
Marty Schoch	e5d4e6f1e4	refactored index layer to support batch operations this change was then exposed at the higher levels also the beer-sample app was upgraded to index in batches of 100 by default. this yielded an indexing speed up from 27s to 16s. closes #57	2014-08-11 16:27:18 -04:00
Marty Schoch	7bbaa8ecd5	added support for returning facet results with requests supports terms, numeric ranges, and date ranges closes #14	2014-08-11 11:03:29 -04:00
Marty Schoch	292af78b9e	implemented prefix search closes #4	2014-08-07 13:45:39 -04:00
Marty Schoch	b16c1d7f79	changed term row encoding previously we used the format: 't' <utf-8 term> <byte separator> <16-bit field id> <utf-8 docID> <byte separator> now we have moved the field before the term, resulting in: 't' <16-bit field id> <utf-8 term> <byte separator> <utf-8 docID> <byte separator> this means now instead of all fields with the same term being grouped together all terms within the same field are grouped together this allows us to enumerate the terms used with a field this allows us to implement prefix search, and possibly improve numeric range queries	2014-08-07 09:39:04 -04:00
Marty Schoch	41d4f67ee2	fix storing/retrieving numeric and date fields also includes new ability to request stored fields be returned with results closes #55 and closes #56 and closes #58	2014-08-06 13:52:20 -04:00
Marty Schoch	4ae9eb895c	added method to list fields in the index also added a corresponding http handler	2014-07-31 11:47:36 -04:00
Marty Schoch	216767953c	introduced a config option to disable creating indexes if they don't already exist closes #23 and closes #24	2014-07-30 14:29:26 -04:00
Marty Schoch	2968d3538a	major refactor, apologies for the large commit removed analyzers (these are now built as needed through config) removed html character filter (now built as needed through config) added missing license header changed constructor signature of filters that cannot return errors filter constructors that can have errors, now have Must variant which panics change cdl2 tokenizer into filter (should only see lower-case input) new top level index api, closes #5 refactored index tests to not rely directly on analyzers moved query objects to top-level new top level search api, closes #12 top score collector allows skipping results index mapping supports _all by default, closes #3 and closes #6 index mapping supports disabled sections, closes #7 new http sub package with reusable http.Handler's, closes #22	2014-07-30 12:30:38 -04:00
Marty Schoch	70a8b03bed	added support for composite fields	2014-07-21 17:05:55 -04:00
Marty Schoch	d3466f3919	refactored field from struct to interface	2014-07-14 14:47:05 -04:00
Marty Schoch	2c86a731b4	added DocIdReader to Index interface added more debug capabilities removed hard-coded limitation on number of fields in doc	2014-07-11 14:24:28 -04:00
Marty Schoch	fda861d4e7	add formatted printing of stored rows fix critical bug in prefix matching on stored row keys	2014-07-03 14:51:06 -04:00
Marty Schoch	9bebbec267	added support for stored fields and highlighting results	2014-06-26 11:43:13 -04:00
Marty Schoch	4af76f539d	fewer allocations building byte array encodings	2014-05-19 11:02:15 -04:00
Marty Schoch	ed308eb253	tweaking perf of gouchstore	2014-05-16 15:00:51 -04:00
Marty Schoch	1b8c353787	adding some benchmarking	2014-05-16 10:09:05 -04:00
Marty Schoch	eac4dee56d	fix bug in Get impl of ForestDB store	2014-05-16 10:08:23 -04:00
Marty Schoch	1c4726c16d	added build tag to include forestdb (not yet public)	2014-05-15 10:32:07 -04:00
Marty Schoch	456b002d64	adding store implementation for forestdb	2014-05-15 10:25:45 -04:00
Marty Schoch	cd5ea0991f	refactored store tests to share common code	2014-05-15 10:18:43 -04:00
Marty Schoch	d48eee948e	refactored index to separate out kv storage now has pluggable options for leveldb, gouchstore, in memory only	2014-05-09 16:37:04 -04:00
Marty Schoch	0be5cffd21	subsequent calls to advance on the same key should keep returning the same thing only increment on initial call	2014-04-24 16:08:28 -06:00
Marty Schoch	aeebcdd7fe	improved test coverage	2014-04-22 13:57:13 -04:00
Marty Schoch	f1926093de	improve coverage of the mock package	2014-04-22 13:14:17 -04:00
Marty Schoch	9ab4f97f26	fix bug when calling Advance on new reader	2014-04-22 13:13:56 -04:00
Marty Schoch	d0cdf639f3	added test of Advance()	2014-04-20 09:43:02 -04:00

658 Commits