Observed problem:
Persisted index state (in the root bolt) would contain index snapshots that
pointed to index files that no longer existed.
Debugging this uncovered two main problems:
1. At the end of persisting a snapshot, the persister creates a new index
snapshot with the SAME epoch as the current root, only it replaces the in-memory
segments with the new disk-based ones. This is problematic because reference
counting an index segment triggers "eligible for deletion", and eligible for
deletion is keyed by epoch. So having two separate instances going by the same
epoch is problematic: one of them reaches 0 before the other, and we wrongly
conclude it is eligible for deletion, when in fact the "other" instance with
the same epoch is still in use.
To address this problem, we have modified the behavior of the persister. Now,
upon completion of persistence, we proceed to introduce a new snapshot ONLY if
new files were actually created, AND this new snapshot now gets its own brand
new epoch. BOTH of these are important: since the persister now also introduces
a new epoch, it will see this epoch again in the future AND be expected to
persist it. That is OK (mostly harmless), but we cannot allow it to form a
loop. Checking that new files were actually introduced is what short-circuits
the potential loop. The new epoch introduced by the persister, if seen again,
will not have any new segments that actually need persisting to disk, and the
cycle is stopped.
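A minimal sketch of the revised persister step (the type and field names below
are illustrative stand-ins, not the actual scorch identifiers):

    import "sync/atomic"

    // illustrative stand-ins for the real scorch types
    type IndexSnapshot struct{ epoch uint64 }

    type Scorch struct {
        nextSnapshotEpoch uint64
        introductions     chan *IndexSnapshot
    }

    // introduce a replacement snapshot ONLY when new files were written,
    // and always give it a brand new epoch
    func (s *Scorch) maybeIntroducePersistedSnapshot(persisted *IndexSnapshot,
        newFilesCreated bool) {
        if !newFilesCreated {
            // nothing new reached disk, so introduce nothing; this is
            // what short-circuits the potential persist/introduce loop
            return
        }
        // a fresh epoch means reference counting never sees two live
        // snapshot instances sharing the same epoch
        persisted.epoch = atomic.AddUint64(&s.nextSnapshotEpoch, 1)
        s.introductions <- persisted
    }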
2. The implementation of NumSnapshotsToKeep, and the related code to delete old
snapshots from the root bolt, also contained problems. Specifically, the
determination of which snapshots to keep vs delete did not consider which ones
were actually persisted. So, let's say you had set NumSnapshotsToKeep to 3: if
the introducer gets 3 snapshots ahead of the persister, what can happen is that
the three snapshots we choose to keep are all in memory. We then wrongly delete
all of the snapshots from the root bolt. But it gets worse: at this instant,
we have files on disk that nothing in the root bolt points to, so we also go
ahead and delete those files. Those files were still being referenced by the
in-memory snapshots, so now, even if those snapshots get persisted to disk, they
simply have references to non-existent files. Opening up one of these indexes
results in lost data (often everything).
To address this problem, we made a large change to the way this section of code
operates. First, we now start with a list of all epochs actually persisted in
the root bolt. Second, we set aside NumSnapshotsToKeep of these snapshots to
keep. Third, anything else in the eligibleForRemoval list will be deleted. I
suspect this code is slower and less elegant, but I think it is more correct.
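Roughly, the new selection logic does something like the following (a sketch
only; the actual scorch function and variable names differ):

    // epochsToRemove returns the epochs that may safely be removed:
    // everything eligible for removal except the newest
    // numSnapshotsToKeep epochs that are actually persisted in the
    // root bolt (persistedEpochs is assumed sorted newest first)
    func epochsToRemove(persistedEpochs, eligibleForRemoval []uint64,
        numSnapshotsToKeep int) []uint64 {
        keep := map[uint64]struct{}{}
        for i, epoch := range persistedEpochs {
            if i >= numSnapshotsToKeep {
                break
            }
            keep[epoch] = struct{}{}
        }
        var rv []uint64
        for _, epoch := range eligibleForRemoval {
            if _, ok := keep[epoch]; !ok {
                rv = append(rv, epoch)
            }
        }
        return rv
    }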
Also, NumSnapshotsToKeep previously defaulted to 0; I have now defaulted it to
1, which feels like saner out-of-the-box behavior (it's debatable whether the
original intent was perhaps for "extra" snapshots to keep, but with the
variable named as it is, 1 makes more sense to me).
Other minor changes included in this change:
- The 'nextSnapshotEpoch', 'eligibleForRemoval', and 'ineligibleForRemoval'
members of the Scorch struct were moved into the paragraph with 'rootLock' to
clarify that you must hold the lock to access them.
- TestBatchRaceBug260 was updated to properly Close() the index; not doing so
leads to occasional test failures.
this originated from a misunderstanding of mine going back
several years. the values need not be float64 just because
we plan to serialize them as json.
there are still larger questions about what the right type should
be, and where any conversions should go. but, this commit
simply attempts to address the most egregious problems
This is a change in search result behavior in that location
information is no longer provided by default with search results.
Although this looks like a wide-ranging change, it's mostly a
mechanical replacement of the explain bool flag with a new
search.SearcherOptions struct, which holds both the Explain bool flag
and the IncludeTermVectors bool flag.
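The shape of the new struct is roughly the following (a sketch; the actual
definition lives in the search package and may carry additional fields):

    package search

    // SearcherOptions replaces the bare explain bool that was
    // previously threaded through the searcher constructors
    type SearcherOptions struct {
        Explain            bool
        IncludeTermVectors bool
    }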
as we are a Go library, this is the much more natural way to
express such queries.
strings are still supported through json marshal
and unmarshal, as well as inside query string queries
as before, we use the package level QueryDateTimeParser to
determine which date time parser to use for parsing.
when serializing out to json, we consult a new package
variable: QueryDateTimeFormat
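for example, a date range query can now be built directly from time.Time
values (a hedged sketch; the exact constructor signature depends on the bleve
version in use):

    start := time.Date(2016, time.January, 1, 0, 0, 0, 0, time.UTC)
    end := time.Now()
    q := bleve.NewDateRangeQuery(start, end)
    // when q is marshaled to json, the endpoints are formatted using
    // the package level QueryDateTimeFormat variable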
this addresses the longstanding PR #255
Boostable, Fieldable, Validatable broken out into separate
interfaces. This allows them to be discoverable when
needed, but ignorable otherwise. The top-level bleve package
only ever cares about Validatable, and even that is optional.
Also, this change goes further to make the structure names
more reasonable, for cases where you're directly interacting
with the structures.
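The broken-out interfaces look roughly like the following (a sketch; the
method sets shown here are illustrative rather than the exact definitions):

    type Boostable interface {
        SetBoost(b float64)
    }

    type Fieldable interface {
        SetField(f string)
    }

    type Validatable interface {
        Validate() error
    }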
the index mapping contains some relatively messy logic
and the top-level bleve package only cares about a relatively
small portion of this
the motivation for this change is to codify the part that the
top-level bleve package cares about into an interface
then move all the details into its own package
NOTE: the top-level bleve package still has a hard dependency on
the actual implementation (for now) because it must deserialize
mappings from JSON and simply assumes it is this one implementation.
this is seen as OK for now, and this issue could be revisited
in a future change. moving the logic into a separate package
is seen as a simplification of top-level bleve, even though
we still depend on the one particular implementation.
Previously bleve allowed you to create a memory-only index by
simply passing "" as the path argument to the New() method.
This was not clear when reading the code, and led to some
problematic error cases as well.
Now, to create a memory-only index one should use the
NewMemOnly() method. Passing "" as the path argument
to the New() method will now return os.ErrInvalid.
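For example (a minimal sketch, assuming the usual top-level bleve import):

    indexMapping := bleve.NewIndexMapping()
    index, err := bleve.NewMemOnly(indexMapping)
    if err != nil {
        // handle error
    }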
Advanced users calling NewUsing() can create disk-based or
memory-only indexes, but the change here is that passing ""
as the path argument no longer defaults you into getting
a memory-only index. Instead, the KV store is selected
manually, just as it is for the disk-based solutions.
Here is an example use of the NewUsing() method to create
a memory-only index:

    NewUsing("", indexMapping, Config.DefaultIndexType,
        Config.DefaultMemKVStore, nil)
Config.DefaultMemKVStore is just a new default value
added to the configuration; it currently points to
gtreap.Name (which could have been used directly
instead for more control).
closes #427
the collector has optimizations to avoid allocation and reslicing
during the common case of searching for top hits
however, in some cases users request a very large number of
search hits to be returned (attempting to get them all); this
caused unnecessary allocation of ram.
to address this we introduce a new constant PreAllocSizeSkipCap.
it defaults to a value of 1000. if your size+skip is less than
this constant, you get the optimized behavior. if your
size+skip is greater than this, we cap the preallocations to
this lower value. additional space is acquired on an as-needed
basis by growing the DocumentMatchPool and reslicing the
collector backing slice
applications can change the value of PreAllocSizeSkipCap to suit
their own needs
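the capping itself looks roughly like this (a sketch of the idea, not the
exact collector code):

    backingSize := size + skip + 1
    if size+skip > PreAllocSizeSkipCap {
        backingSize = PreAllocSizeSkipCap + 1
    }
    // backingSize bounds both the DocumentMatchPool and the initial
    // capacity of the collector's backing slice; if more hits are
    // actually collected, both grow on demand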
fixes #408
this change means simple sort requirements no longer require
importing the search package (high-level API goal)
also the sort test at the top-level was changed to use this form
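for example, sorting by descending score and then by a hypothetical "name"
field can now be expressed with plain strings:

    req := bleve.NewSearchRequest(bleve.NewMatchQuery("golang"))
    req.SortBy([]string{"-_score", "name"})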
Currently a bleve batch is built by a user goroutine,
then read by a bleve goroutine.
This is still safe when used correctly.
However, Reset() will modify the map, which is now a data race.
This fix is to simply make batch.Reset() allocate new maps.
This provides a data-access pattern that can be used safely.
Also, this thread argues that creating a new map may be faster
than trying to reuse an existing one:
https://groups.google.com/d/msg/golang-nuts/UvUm3LA1u8g/jGv_FobNpN0J
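The fix is essentially the following (a sketch; the real Batch wraps
additional state and the field names here are illustrative):

    // illustrative shape of the change
    type Batch struct {
        IndexOps    map[string]interface{}
        InternalOps map[string][]byte
    }

    func (b *Batch) Reset() {
        // allocate fresh maps instead of clearing the old ones in place,
        // so a goroutine still reading the previous maps never races
        // with Reset()
        b.IndexOps = make(map[string]interface{})
        b.InternalOps = make(map[string][]byte)
    }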
Separate but related, I have opted to remove the "unsafe batch"
checking that we did. It was always limited anyway, and users of
Go 1.6 are now just as likely to get a panic from the runtime for
concurrent map access. So, the price paid by us (an additional
mutex) is not worth it.
fixes #360 and #260
previously we just used a Go builtin map
this was not safe for concurrent read/write and upon upgrading
to Go 1.6 we were notified of the problem
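the fix follows the usual pattern of guarding the map with a mutex (a generic
sketch under that assumption, not the exact bleve code):

    import "sync"

    // a generic concurrency-safe wrapper around a builtin map
    type safeMap struct {
        mutex sync.RWMutex
        m     map[string]interface{}
    }

    func (s *safeMap) get(k string) (interface{}, bool) {
        s.mutex.RLock()
        v, ok := s.m[k]
        s.mutex.RUnlock()
        return v, ok
    }

    func (s *safeMap) set(k string, v interface{}) {
        s.mutex.Lock()
        s.m[k] = v
        s.mutex.Unlock()
    }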
fixes #349
our implementation uses: golang.org/x/net/context
New method SearchInContext() allows the user to run a search
in the provided context. If that context is cancelled or
exceeds its deadline, Bleve will attempt to stop and return
as soon as possible. This is a *best effort* attempt at this
time and may *not* happen in a timely manner. If the caller must
return very near the timeout, the call should also be wrapped
in a goroutine.
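Example usage (the import path matches the implementation noted above;
newer code would use the standard library context package instead):

    ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
    defer cancel()
    req := bleve.NewSearchRequest(bleve.NewMatchQuery("bleve"))
    res, err := index.SearchInContext(ctx, req)
    // check err and inspect res as usual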
The IndexAlias implementation is affected in a slightly more
complex way. In order to return partial results when a timeout
occurs on some indexes, the timeout is strictly enforced, and
at the moment this does introduce an additional goroutine.
The Bleve implementation honoring the context is currently
very coarse-grained. Specifically, we check the Done() channel
between each DocumentMatch produced during the search. In the
future we will propagate the context deeper into the internals
of Bleve, which will allow finer-grained timeout behavior.
I stumbled onto that while trying to understand how analyzers are
resolved. The new code looks simpler to me and removes useless calls to
DocumentMapping.defaultAnalyzerName() when an analyzer is set at
FieldMapping level.
The slight change to TestStoredFieldPreserved avoids a stacktrace when
the test fails.
in some limited cases we can detect unsafe usage.
in these cases, do not trip over ourselves and panic;
instead return a strongly typed error, upside_down.UnsafeBatchUseDetected
also, introduced Batch.Reset() to allow batch reuse
this is currently still experimental
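for example, a batch can now be reused across flushes like this (a sketch;
the docs slice and the flush threshold are illustrative):

    batch := index.NewBatch()
    for i, doc := range docs {
        _ = batch.Index(doc.ID, doc)
        if i%100 == 99 {
            if err := index.Batch(batch); err != nil {
                // may be upside_down.UnsafeBatchUseDetected on misuse
            }
            batch.Reset()
        }
    }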
closes #195