0
0
Fork 0
Commit Graph

86 Commits

Author SHA1 Message Date
abhinavdangeti 40f63baeb9 MB-28562: Support search query callbacks before and after execution
+ SearchQueryStartCallback
+ SearchQueryEndCallback
2018-03-08 13:35:51 -08:00
abhinavdangeti 96071c085c MB-28163: Register a callback with context to estimate RAM for search
This callback if registered with context will invoke the api to estimate
the memory needed to execute a search query. The callback defined at
the client side will be responsible for determining whether to
continue with the search or abort based on the threshold settings.
2018-03-06 13:53:42 -08:00
abhinavdangeti 7e36109b3c MB-28162: Provide API to estimate memory needed to run a search query
This API (unexported) will estimate the amount of memory needed to execute
a search query over an index before the collector begins data collection.

Sample estimates for certain queries:
{Size: 10, BenchmarkUpsidedownSearchOverhead}
                                                           ESTIMATE    BENCHMEM
TermQuery                                                  4616        4796
MatchQuery                                                 5210        5405
DisjunctionQuery (Match queries)                           7700        8447
DisjunctionQuery (Term queries)                            6514        6591
ConjunctionQuery (Match queries)                           7524        8175
Nested disjunction query (disjunction of disjunctions)     10306       10708
…
2018-03-06 13:53:42 -08:00
Marty Schoch 8063132766 fix new issues found by go vet when using stdlib context pkg 2018-02-27 11:57:21 -08:00
Marty Schoch c74e08f039 BREAKING API CHANGE - use stdlib context pkg
update all references to context to use std lib pkg
2018-02-27 11:33:43 -08:00
Marty Schoch 1a59a1bb99 attempt to fix core reference counting issues
Observed problem:

Persisted index state (in root bolt) would contain index snapshots which
pointed to index files that did not exist.

Debugging this uncovered two main problems:

1.  At the end of persisting a snapshot, the persister creates a new index
snapshot with the SAME epoch as the current root, only it replaces in-memory
segments with the new disk based ones.  This is problematic because reference
counting an index segment triggers "eligible for deletion".  And eligible for
deletion is keyed by epoch.  So having two separate instances going by the same
epoch is problematic.  Specifically, one of them gets to 0 before the other,
and we wrongly conclude it's eligible for deletion, when in fact the "other"
instance with same epoch is actually still in use.

To address this problem, we have modified the behavior of the persister.  Now,
upon completion of persistence, ONLY if new files were actually created do we
proceed to introduce a new snapshot.  AND, this new snapshot now gets it's own
brand new epoch.  BOTH of these are important because since the persister now
also introduces a new epoch, it will see this epoch again in the future AND be
expected to persist it.  That is OK (mostly harmless), but we cannot allow it
to form a loop.  Checking that new files were actually introduced is what
short-circuits the potential loop.  The new epoch introduced by the persister,
if seen again will not have any new segments that actually need persisting to
disk, and the cycle is stopped.

2.  The implementation of NumSnapshotsToKeep, and related code to deleted old
snapshots from the root bolt also contains problems.  Specifically, the
determination of which snapshots to keep vs delete did not consider which ones
were actually persisted.  So, lets say you had set NumSnapshotsToKeep to 3, if
the introducer gets 3 snapshots ahead of the persister, what can happen is that
the three snapshots we choose to keep are all in memory.  We now wrongly delete
all of the snapshots from the root bolt.  But it gets worse, in this instant of
time, we now have files on disk that nothing in the root bolt points to, so we
also go ahead and delete those files.  Those files were still being referenced
by the in-memory snapshots.  But, now even if they get persisted to disk, they
simply have references to non-existent files.  Opening up one of these indexes
results in lost data (often everything).

To address this problem, we made large change to the way this section of code
operates.  First, we now start with a list of all epochs actually persisted in
the root bolt.  Second, we set aside NumSnapshotsToKeep of these snapshots to
keep.  Third, anything else in the eligibleForRemoval list will be deleted.  I
suspect this code is slower and less elegant, but I think it is more correct.
Also, previously NumSnapshotsToKeep defaulted to 0, I have now defaulted it to
1, which feels like saner out-of-the-box behavior (though it's debatable if the
original intent was perhaps instead for "extra" snapshots to keep, but with the
variable named as it is, 1 makes more sense to me)

Other minor changes included in this change:

- Location of 'nextSnapshotEpoch', 'eligibleForRemoval', and
'ineligibleForRemoval' members of Scorch struct were moved into the
paragraph with 'rootLock' to clarify that you must hold the lock to access it.

- TestBatchRaceBug260 was updated to properly Close() the index, which leads to
occasional test failures.
2018-01-03 12:05:00 -05:00
Marty Schoch a4acd53c54 try to make racey test safe 2017-12-14 07:53:26 -05:00
Seif Lotfy 06b4daed87 Add new IndexAdvanced function 2017-04-12 00:31:51 +02:00
Marty Schoch 8096d9fb90 remove use of float64 to represent int things
this originated from a misunderstanding of mine going back
several years.  the values need not be float64 just because
we plan to serialize them as json.

there are still larger questions about what the right type should
be, and where should any conversions go.  but, this commit
simply attempts to address the most egregious problems
2017-02-09 20:15:59 -05:00
Steve Yen 89a1cefde1 API change: optional SearchRequest.IncludeLocations flag
This is a change in search result behavior in that location
information is no longer provided by default with search results.

Although this looks like a wide-ranging change, it's mostly a
mechanical replacement of the explain bool flag with a new
search.SearcherOptions struct, which holds both the Explain bool flag
and the IncludeTermVectors bool flag.
2017-01-05 21:11:22 -08:00
Marty Schoch 2332455bd2 nicer formatting of license header 2016-10-02 10:13:14 -04:00
Marty Schoch 6bf9dd59ab BREAKING CHANGE - additional package renaming
i recently learned that package names should also prefer the
singular form, not the plural form
2016-10-01 17:20:59 -04:00
Marty Schoch 35da361bfa BREAKING CHANGE - renamed packages to be shorter and not use _
this commit only addresses the analysis sub-package
2016-09-30 12:36:10 -04:00
Marty Schoch ee17941f7f switch DateRangeQuery to use time.Time instead of string
as we are a Go library is this the much more natural way to
express such queries.

support for strings is still supported through json marshal
and unmarshal, as well as inside query string queries

as before we use the package level QueryDateTimeParser to
deterimine which date time parser to use for parsing

only serializing out to json, we consult a new package
variable: QueryDateTimeFormat

this addresses the longstanding PR #255
2016-09-29 14:54:16 -04:00
Marty Schoch a265218f76 heavier refactor of Query interface to simplify
Boostable, Fieldable, Validatable broken out into separate
interfaces.  This allows them to be discoverable when
needed, but ignorable otherwise.  The top-level bleve package
only every cares about Validatable and even that is optional.

Also, this change goes further to make the structure names
more reasonable, for cases where you're directly interacting
with the structures.
2016-09-29 14:54:16 -04:00
Marty Schoch 9ec2ddd757 initial refactor of query into separate package 2016-09-29 14:54:16 -04:00
Marty Schoch 79cc39a67e refactor mapping to inteface and move into separate package
the index mapping contains some relatively messy logic
and the top-level bleve package only cares about a relatively
small portion of this
the motivation for this change is to codify the part that the
top-level bleve package cares about into an interface
then move all the details into its own package

NOTE: the top-level bleve package still has hard dependency on
the actual implementation (for now) because it must deserialize
mappings from JSON and simply assumes it is this one instance.
this is seen as OK for now, and this issue could be revisited
in a future change.  moving the logic into a separate package
is seen as a simplification of top-level bleve, even though
we still depend on the one particular implementation.
2016-09-29 14:53:18 -04:00
Marty Schoch fb0f4bbecd BREAKING CHANGE - new method to create memory only index
Previously bleve allowed you to create a memory-only index by
simply passing "" as the path argument to the New() method.

This was not clear when reading the code, and led to some
problematic error cases as well.

Now, to create a memory-only index one should use the
NewMemOnly() method.  Passing "" as the path argument
to the New() method will now return os.ErrInvalid.

Advanced users calling NewUsing() can create disk-based or
memory-only indexes, but the change here is that pass ""
as the path argument no longer defaults you into getting
a memory-only index.  Instead, the KV store is selected
manually, just as it is for the disk-based solutions.

Here is an example use of the NewUsing() method to create
a memory-only index:

NewUsing("", indexMapping, Config.DefaultIndexType,
         Config.DefaultMemKVStore, nil)

Config.DefaultMemKVStore is just a new default value
added to the configuration, it currently points to
gtreap.Name (which could have been used directly
instead for more control)

closes #427
2016-09-27 14:11:40 -04:00
Marty Schoch 04fd62dec3 further tweaks, now all bleve tests pass 2016-09-11 20:29:15 -04:00
Marty Schoch 60efecc8e9 cap preallocation by the collector to reasonable value
the collector has optimizations to avoid allocation and reslicing
during the common case of searching for top hits

however, in some cases users request an a very large number of
search hits to be returned (attempting to get them all)  this
caused unnecessary allocation of ram.

to address this we introduce a new constant PreAllocSizeSkipCap
it defaults the value of 1000.  if your search+skip is less than
this constant, you get the optimized behavior.  if your
search+skip is greater than this, we cap the preallcations to
this lower value.  additional space is acquired on an as needed
basis by growing the DocumentMatchPool and reslicing the
collector backing slice

applications can change the value of PreAllocSizeSkipCap to suit
their own needs

fixes #408
2016-08-31 15:25:17 -04:00
Marty Schoch 27f5c6ec92 expose simple string slice based sorting to top-level bleve
this change means simple sort requirements no longer require
importing the search package (high-level API goal)

also the sort test at the top-level was changed to use this form
2016-08-17 14:49:06 -07:00
Marty Schoch 750e0ac16c change sort field impl to use indexed values not stored values 2016-08-17 09:20:44 -07:00
Marty Schoch 0bb69a9a1c Merge branch 'master' of https://github.com/dtylman/bleve into sort-by-field-try2 2016-08-12 14:23:55 -04:00
Danny Tylman b585c5786b removing mock data generation packages from unit-tests
fixing wrong sort order on certain fields
2016-08-11 11:35:08 +03:00
Danny Tylman 5164e70f6e Adding sort to SearchRequest. 2016-08-09 16:18:53 +03:00
Marty Schoch aa3ae3d39c enable read_only mode for boltdb indexes
fixes #405
2016-08-06 10:47:34 -04:00
Marty Schoch 9089de251f remove byte_array_conveters
fixes #392
fixes #100
2016-07-01 10:21:41 -04:00
Marty Schoch b8a2fbb887 fix data race in bleve batch reuse
Currently bleve batch is build by user goroutine
Then read by bleve gourinte
This is still safe when used correctly
However, Reset() will modify the map, which is now a data race

This fix is to simply make batch.Reset() alloc new maps.
This provides a data-access pattern that can be used safely.
Also, this thread argues that creating a new map may be faster
than trying to reuse an existing one:

https://groups.google.com/d/msg/golang-nuts/UvUm3LA1u8g/jGv_FobNpN0J

Separate but related, I have opted to remove the "unsafe batch"
checking that we did.  This was always limited anyway, and now
users of Go 1.6 are just as likely to get a panic from the
runtime for concurrent map access anyway.  So, the price paid
by us (additional mutex) is not worth it.

fixes #360 and #260
2016-04-08 15:32:13 -04:00
Marty Schoch e00577f265 change registry cache implementation to allow concurrent use
previously we just used a Go builtin map
this was not safe for concurrent read/write and upon upgrading
to Go 1.6 we were notified of the problem

fixes #349
2016-03-07 16:05:34 -05:00
Marty Schoch 0b2380d9bf introduce ability for searches to timeout or be cancelled
our implementation uses: golang.org/x/net/context

New method SearchInContext() allows the user to run a search
in the provided context.  If that context is cancelled or
exceeds its deadline Bleve will attempt to stop and return
as soon as possible.  This is a *best effort* attempt at this
time and may *not* be in a timely manner.  If the caller must
return very near the timeout, the call should also be wrapped
in a goroutine.

The IndexAlias implementation is affected in a slightly more
complex way.  In order to return partial results when a timeout
occurs on some indexes, the timeout is strictly enforced, and
at the moment this does introduce an additional goroutine.

The Bleve implementation honoring the context is currently
very course-grained.  Specifically we check the Done() channel
between each DocumentMatch produced during the search.  In the
future we will propogate the context deeper into the internals
of Bleve, and this will allow finer-grained timeout behavior.
2016-03-02 17:30:21 -05:00
Marty Schoch e523bf757e test slow timer with different way to avoid windows 15ms timer 2016-02-09 15:48:08 -05:00
Marty Schoch 9a1e6e1905 fix some test failures on windows 2016-02-09 13:33:11 -05:00
Marty Schoch 2479ddef2e fixed errcheck issues 2016-01-13 17:10:13 -05:00
slavikm 680be52f87 Implemented boolean field support 2016-01-11 17:18:03 -08:00
Marty Schoch 8efbd556a3 fix indexing bug with data coming from arrays
fixes #295
2015-12-21 14:59:32 -05:00
Steve Yen 2d4cd7a696 go fmt index_text.go 2015-11-23 09:28:09 -08:00
Marty Schoch 97735ac2b6 set github issue number in testcase name 2015-11-23 08:41:34 -05:00
Mark Mindenhall 17d8391b2f Fixes datetime mapping from JSON, using DateTimeFieldMapping 2015-11-20 19:15:35 -07:00
Patrick Mezard 498e4a0de7 simplify FieldMapping.analyzerForField()
I stumbled onto that while trying to understand how analyzers are
resolved. The new code looks simpler to me and removes useless calls to
DocumentMapping.defaultAnalyzerName() when an analyzer is set at
FieldMapping level.

The slight change to TestStoredFieldPreserved avoids a stacktrace when
the test fails.
2015-10-02 15:45:43 +02:00
Patrick Mezard 7db27aeba1 implement document static mappings
DocumentMapping.Dynamic was ignored by everything but the marshalling
code.

Issue #235
2015-09-29 11:32:36 +02:00
Marty Schoch cddf90c0ee don't allow operations on empty doc id
fixes #239
2015-09-28 17:00:08 -04:00
Marty Schoch f81b2be334 major refactor of bleve configuration
see #221 for full details
2015-09-16 17:10:59 -04:00
Marty Schoch 3682c25467 update to correctly work with composite fields
also updated search results to return array positions
2015-07-31 11:16:11 -04:00
Marty Schoch 2af47cea75 fix query string query syntax when term starts with a number
fixes #207
2015-05-21 15:43:13 -04:00
Marty Schoch 5b8c9f2d73 added unit test for bug #207 2015-05-21 07:49:41 -04:00
Marty Schoch 8f70def63b properly use the stored array positions when loading a document
fixes #205
2015-05-15 15:47:54 -04:00
Marty Schoch 328bc73ed0 clarify Batch is not threadsafe in docs
in some limited cases we can detect unsafe usage
in these cases, do not trip over ourselves and panic
instead return a strongly typed error upside_down.UnsafeBatchUseDetected
also, introduced Batch.Reset() to allow batch reuse
this is currently still experimental
closes #195
2015-05-15 15:04:52 -04:00
Marty Schoch 7b871fde6a add test comparing search that matches everyting with doc count 2015-05-09 14:51:07 -04:00
Marty Schoch 6f28f3e5bd check error returned in test to make errcheck happy 2015-05-08 08:40:46 -04:00
Marty Schoch 57cd67fa88 fix data race on index metadata (docCount)
closes #198
2015-05-08 08:07:20 -04:00