Observed problem:
Persisted index state (in the root bolt) would contain index snapshots that
pointed to index files that no longer existed.
Debugging this uncovered two main problems:
1. At the end of persisting a snapshot, the persister creates a new index
snapshot with the SAME epoch as the current root, only it replaces the in-memory
segments with the new disk-based ones. This is problematic because reference
counting an index segment triggers "eligible for deletion", and eligible for
deletion is keyed by epoch. So having two separate instances going by the same
epoch is problematic: one of them reaches 0 before the other, and we wrongly
conclude it is eligible for deletion, when in fact the "other" instance with
the same epoch is still in use.
To address this problem, we have modified the behavior of the persister. Now,
upon completion of persistence, we proceed to introduce a new snapshot ONLY if
new files were actually created, AND this new snapshot now gets its own brand
new epoch. BOTH of these are important: since the persister now also introduces
a new epoch, it will see this epoch again in the future AND be expected to
persist it. That is OK (mostly harmless), but we cannot allow it to form a
loop. Checking that new files were actually introduced is what short-circuits
the potential loop. The new epoch introduced by the persister, if seen again,
will not have any new segments that actually need persisting to disk, and the
cycle is stopped.
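A minimal sketch of the revised persister step (the type and field names below
are illustrative stand-ins, not the actual scorch identifiers):

    import "sync/atomic"

    // illustrative stand-ins for the real scorch types
    type IndexSnapshot struct{ epoch uint64 }

    type Scorch struct {
        nextSnapshotEpoch uint64
        introductions     chan *IndexSnapshot
    }

    // introduce a replacement snapshot ONLY when new files were written,
    // and always give it a brand new epoch
    func (s *Scorch) maybeIntroducePersistedSnapshot(persisted *IndexSnapshot,
        newFilesCreated bool) {
        if !newFilesCreated {
            // nothing new reached disk, so introduce nothing; this is
            // what short-circuits the potential persist/introduce loop
            return
        }
        // a fresh epoch means reference counting never sees two live
        // snapshot instances sharing the same epoch
        persisted.epoch = atomic.AddUint64(&s.nextSnapshotEpoch, 1)
        s.introductions <- persisted
    }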
2. The implementation of NumSnapshotsToKeep, and the related code to delete old
snapshots from the root bolt, also contained problems. Specifically, the
determination of which snapshots to keep vs delete did not consider which ones
were actually persisted. So, let's say you had set NumSnapshotsToKeep to 3: if
the introducer gets 3 snapshots ahead of the persister, what can happen is that
the three snapshots we choose to keep are all in memory. We then wrongly delete
all of the snapshots from the root bolt. But it gets worse: at this instant,
we have files on disk that nothing in the root bolt points to, so we also go
ahead and delete those files. Those files were still being referenced by the
in-memory snapshots, so now, even if those snapshots get persisted to disk, they
simply have references to non-existent files. Opening up one of these indexes
results in lost data (often everything).
To address this problem, we made a large change to the way this section of code
operates. First, we now start with a list of all epochs actually persisted in
the root bolt. Second, we set aside NumSnapshotsToKeep of these snapshots to
keep. Third, anything else in the eligibleForRemoval list will be deleted. I
suspect this code is slower and less elegant, but I think it is more correct.
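Roughly, the new selection logic does something like the following (a sketch
only; the actual scorch function and variable names differ):

    // epochsToRemove returns the epochs that may safely be removed:
    // everything eligible for removal except the newest
    // numSnapshotsToKeep epochs that are actually persisted in the
    // root bolt (persistedEpochs is assumed sorted newest first)
    func epochsToRemove(persistedEpochs, eligibleForRemoval []uint64,
        numSnapshotsToKeep int) []uint64 {
        keep := map[uint64]struct{}{}
        for i, epoch := range persistedEpochs {
            if i >= numSnapshotsToKeep {
                break
            }
            keep[epoch] = struct{}{}
        }
        var rv []uint64
        for _, epoch := range eligibleForRemoval {
            if _, ok := keep[epoch]; !ok {
                rv = append(rv, epoch)
            }
        }
        return rv
    }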
Also, NumSnapshotsToKeep previously defaulted to 0; I have now defaulted it to
1, which feels like saner out-of-the-box behavior (it's debatable whether the
original intent was perhaps for "extra" snapshots to keep, but with the
variable named as it is, 1 makes more sense to me).
Other minor changes included in this change:
- The 'nextSnapshotEpoch', 'eligibleForRemoval', and 'ineligibleForRemoval'
members of the Scorch struct were moved into the paragraph with 'rootLock' to
clarify that you must hold the lock to access them.
- TestBatchRaceBug260 was updated to properly Close() the index; not doing so
leads to occasional test failures.
this originated from a misunderstanding of mine going back
several years. the values need not be float64 just because
we plan to serialize them as json.
there are still larger questions about what the right type should
be, and where any conversions should go. but, this commit
simply attempts to address the most egregious problems
This is a change in search result behavior in that location
information is no longer provided by default with search results.
Although this looks like a wide-ranging change, it's mostly a
mechanical replacement of the explain bool flag with a new
search.SearcherOptions struct, which holds both the Explain bool flag
and the IncludeTermVectors bool flag.
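The shape of the new struct is roughly the following (a sketch; the actual
definition lives in the search package and may carry additional fields):

    package search

    // SearcherOptions replaces the bare explain bool that was
    // previously threaded through the searcher constructors
    type SearcherOptions struct {
        Explain            bool
        IncludeTermVectors bool
    }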
as we are a Go library, this is the much more natural way to
express such queries.
strings are still supported through json marshal
and unmarshal, as well as inside query string queries
as before, we use the package level QueryDateTimeParser to
determine which date time parser to use for parsing.
when serializing out to json, we consult a new package
variable: QueryDateTimeFormat
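for example, a date range query can now be built directly from time.Time
values (a hedged sketch; the exact constructor signature depends on the bleve
version in use):

    start := time.Date(2016, time.January, 1, 0, 0, 0, 0, time.UTC)
    end := time.Now()
    q := bleve.NewDateRangeQuery(start, end)
    // when q is marshaled to json, the endpoints are formatted using
    // the package level QueryDateTimeFormat variable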
this addresses the longstanding PR #255
Boostable, Fieldable, Validatable broken out into separate
interfaces. This allows them to be discoverable when
needed, but ignorable otherwise. The top-level bleve package
only ever cares about Validatable, and even that is optional.
Also, this change goes further to make the structure names
more reasonable, for cases where you're directly interacting
with the structures.
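The broken-out interfaces look roughly like the following (a sketch; the
method sets shown here are illustrative rather than the exact definitions):

    type Boostable interface {
        SetBoost(b float64)
    }

    type Fieldable interface {
        SetField(f string)
    }

    type Validatable interface {
        Validate() error
    }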
the index mapping contains some relatively messy logic
and the top-level bleve package only cares about a relatively
small portion of this
the motivation for this change is to codify the part that the
top-level bleve package cares about into an interface
then move all the details into its own package
NOTE: the top-level bleve package still has a hard dependency on
the actual implementation (for now) because it must deserialize
mappings from JSON and simply assumes it is this one implementation.
this is seen as OK for now, and this issue could be revisited
in a future change. moving the logic into a separate package
is seen as a simplification of top-level bleve, even though
we still depend on the one particular implementation.
Previously bleve allowed you to create a memory-only index by
simply passing "" as the path argument to the New() method.
This was not clear when reading the code, and led to some
problematic error cases as well.
Now, to create a memory-only index one should use the
NewMemOnly() method. Passing "" as the path argument
to the New() method will now return os.ErrInvalid.
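For example (a minimal sketch, assuming the usual top-level bleve import):

    indexMapping := bleve.NewIndexMapping()
    index, err := bleve.NewMemOnly(indexMapping)
    if err != nil {
        // handle error
    }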
Advanced users calling NewUsing() can create disk-based or
memory-only indexes, but the change here is that passing ""
as the path argument no longer defaults you into getting
a memory-only index. Instead, the KV store is selected
manually, just as it is for the disk-based solutions.
Here is an example use of the NewUsing() method to create
a memory-only index:

    NewUsing("", indexMapping, Config.DefaultIndexType,
        Config.DefaultMemKVStore, nil)
Config.DefaultMemKVStore is just a new default value
added to the configuration; it currently points to
gtreap.Name (which could have been used directly
instead for more control).
closes #427
the collector has optimizations to avoid allocation and reslicing
during the common case of searching for top hits
however, in some cases users request a very large number of
search hits to be returned (attempting to get them all); this
caused unnecessary allocation of ram.
to address this we introduce a new constant PreAllocSizeSkipCap.
it defaults to a value of 1000. if your size+skip is less than
this constant, you get the optimized behavior. if your
size+skip is greater than this, we cap the preallocations to
this lower value. additional space is acquired on an as-needed
basis by growing the DocumentMatchPool and reslicing the
collector backing slice
applications can change the value of PreAllocSizeSkipCap to suit
their own needs
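the capping itself looks roughly like this (a sketch of the idea, not the
exact collector code):

    backingSize := size + skip + 1
    if size+skip > PreAllocSizeSkipCap {
        backingSize = PreAllocSizeSkipCap + 1
    }
    // backingSize bounds both the DocumentMatchPool and the initial
    // capacity of the collector's backing slice; if more hits are
    // actually collected, both grow on demand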
fixes #408
this change means simple sort requirements no longer require
importing the search package (high-level API goal)
also the sort test at the top-level was changed to use this form
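for example, sorting by descending score and then by a hypothetical "name"
field can now be expressed with plain strings:

    req := bleve.NewSearchRequest(bleve.NewMatchQuery("golang"))
    req.SortBy([]string{"-_score", "name"})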
Currently a bleve batch is built by a user goroutine,
then read by a bleve goroutine.
This is still safe when used correctly.
However, Reset() will modify the map, which is now a data race.
This fix is to simply make batch.Reset() allocate new maps.
This provides a data-access pattern that can be used safely.
Also, this thread argues that creating a new map may be faster
than trying to reuse an existing one:
https://groups.google.com/d/msg/golang-nuts/UvUm3LA1u8g/jGv_FobNpN0J
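The fix is essentially the following (a sketch; the real Batch wraps
additional state and the field names here are illustrative):

    // illustrative shape of the change
    type Batch struct {
        IndexOps    map[string]interface{}
        InternalOps map[string][]byte
    }

    func (b *Batch) Reset() {
        // allocate fresh maps instead of clearing the old ones in place,
        // so a goroutine still reading the previous maps never races
        // with Reset()
        b.IndexOps = make(map[string]interface{})
        b.InternalOps = make(map[string][]byte)
    }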
Separate but related, I have opted to remove the "unsafe batch"
checking that we did. It was always limited anyway, and users of
Go 1.6 are now just as likely to get a panic from the runtime for
concurrent map access. So, the price paid by us (an additional
mutex) is not worth it.
fixes #360 and #260
previously we just used a Go builtin map
this was not safe for concurrent read/write and upon upgrading
to Go 1.6 we were notified of the problem
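the fix follows the usual pattern of guarding the map with a mutex (a generic
sketch under that assumption, not the exact bleve code):

    import "sync"

    // a generic concurrency-safe wrapper around a builtin map
    type safeMap struct {
        mutex sync.RWMutex
        m     map[string]interface{}
    }

    func (s *safeMap) get(k string) (interface{}, bool) {
        s.mutex.RLock()
        v, ok := s.m[k]
        s.mutex.RUnlock()
        return v, ok
    }

    func (s *safeMap) set(k string, v interface{}) {
        s.mutex.Lock()
        s.m[k] = v
        s.mutex.Unlock()
    }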
fixes #349
our implementation uses: golang.org/x/net/context
New method SearchInContext() allows the user to run a search
in the provided context. If that context is cancelled or
exceeds its deadline, Bleve will attempt to stop and return
as soon as possible. This is a *best effort* attempt at this
time and may *not* happen in a timely manner. If the caller must
return very near the timeout, the call should also be wrapped
in a goroutine.
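Example usage (the import path matches the implementation noted above;
newer code would use the standard library context package instead):

    ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
    defer cancel()
    req := bleve.NewSearchRequest(bleve.NewMatchQuery("bleve"))
    res, err := index.SearchInContext(ctx, req)
    // check err and inspect res as usual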
The IndexAlias implementation is affected in a slightly more
complex way. In order to return partial results when a timeout
occurs on some indexes, the timeout is strictly enforced, and
at the moment this does introduce an additional goroutine.
The Bleve implementation honoring the context is currently
very coarse-grained. Specifically, we check the Done() channel
between each DocumentMatch produced during the search. In the
future we will propagate the context deeper into the internals
of Bleve, which will allow finer-grained timeout behavior.
I stumbled onto that while trying to understand how analyzers are
resolved. The new code looks simpler to me and removes useless calls to
DocumentMapping.defaultAnalyzerName() when an analyzer is set at
FieldMapping level.
The slight change to TestStoredFieldPreserved avoids a stacktrace when
the test fails.
in some limited cases we can detect unsafe usage.
in these cases, do not trip over ourselves and panic;
instead return a strongly typed error, upside_down.UnsafeBatchUseDetected
also, introduced Batch.Reset() to allow batch reuse
this is currently still experimental
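for example, a batch can now be reused across flushes like this (a sketch;
the docs slice and the flush threshold are illustrative):

    batch := index.NewBatch()
    for i, doc := range docs {
        _ = batch.Index(doc.ID, doc)
        if i%100 == 99 {
            if err := index.Batch(batch); err != nil {
                // may be upside_down.UnsafeBatchUseDetected on misuse
            }
            batch.Reset()
        }
    }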
closes #195