
1356 Commits

Author SHA1 Message Date
Marty Schoch 8df8d4e797 fix geo point distance search
there was a bug where, if the circle described by the point
distance query crossed one of the poles, we incorrectly built
the bounding box around it.  this resulted in incorrect search results.
2017-04-27 17:28:07 -04:00
Marty Schoch 92c5f3e2e6 Merge pull request #584 from mschoch/more-collector-benchmarks
topn collector switch approach based on size+skip
2017-04-27 09:27:08 -04:00
Marty Schoch a4a34cc3b2 topn collector switch approach based on size+skip
we now use the slice store when size+skip <= 10
and use the heap store when size+skip > 10

here are the new perf numbers:

go test -run=xxx -bench=. -benchmem
BenchmarkTop10of0Scores-4            	 1000000	      1150 ns/op	    2304 B/op	      15 allocs/op
BenchmarkTop10of3Scores-4            	 1000000	      1417 ns/op	    2304 B/op	      18 allocs/op
BenchmarkTop10of10Scores-4           	 1000000	      2133 ns/op	    2312 B/op	      25 allocs/op
BenchmarkTop10of25Scores-4           	  500000	      3410 ns/op	    2464 B/op	      26 allocs/op
BenchmarkTop10of50Scores-4           	  300000	      5174 ns/op	    2464 B/op	      26 allocs/op
BenchmarkTop10of10000Scores-4        	    5000	    342955 ns/op	    2488 B/op	      26 allocs/op
BenchmarkTop100of0Scores-4           	  300000	      4796 ns/op	   18320 B/op	      15 allocs/op
BenchmarkTop100of3Scores-4           	  300000	      5160 ns/op	   18352 B/op	      19 allocs/op
BenchmarkTop100of10Scores-4          	  200000	      6354 ns/op	   18408 B/op	      26 allocs/op
BenchmarkTop100of25Scores-4          	  200000	     10023 ns/op	   18568 B/op	      41 allocs/op
BenchmarkTop100of50Scores-4          	  100000	     16821 ns/op	   18832 B/op	      66 allocs/op
BenchmarkTop100of10000Scores-4       	    3000	    508989 ns/op	   19760 B/op	     117 allocs/op
BenchmarkTop1000of10000Scores-4      	    1000	   1814198 ns/op	  184768 B/op	    1017 allocs/op
BenchmarkTop10000of100000Scores-4    	      50	  26623920 ns/op	 1939592 B/op	   19024 allocs/op
BenchmarkTop10of100000Scores-4       	     500	   3730204 ns/op	    2496 B/op	      26 allocs/op
BenchmarkTop100of100000Scores-4      	     300	   4057127 ns/op	   19912 B/op	     117 allocs/op
BenchmarkTop1000of100000Scores-4     	     200	   6390180 ns/op	  186200 B/op	    1017 allocs/op
BenchmarkTop10000of1000000Scores-4   	      20	  82785756 ns/op	 1963897 B/op	   19024 allocs/op
PASS
ok  	github.com/blevesearch/bleve/search/collector	31.537s

Previously with heap:

go test -run=xxx -bench=. -benchmem
BenchmarkTop10of0Scores-4            	 1000000	      1216 ns/op	    2288 B/op	      15 allocs/op
BenchmarkTop10of3Scores-4            	 1000000	      1593 ns/op	    2320 B/op	      19 allocs/op
BenchmarkTop10of10Scores-4           	  500000	      2734 ns/op	    2376 B/op	      26 allocs/op
BenchmarkTop10of25Scores-4           	  300000	      5077 ns/op	    2520 B/op	      27 allocs/op
BenchmarkTop10of50Scores-4           	  200000	      6875 ns/op	    2528 B/op	      27 allocs/op
BenchmarkTop10of10000Scores-4        	    3000	    351210 ns/op	    2552 B/op	      27 allocs/op
BenchmarkTop100of0Scores-4           	  300000	      4846 ns/op	   18304 B/op	      15 allocs/op
BenchmarkTop100of3Scores-4           	  300000	      5357 ns/op	   18336 B/op	      19 allocs/op
BenchmarkTop100of10Scores-4          	  200000	      6462 ns/op	   18392 B/op	      26 allocs/op
BenchmarkTop100of25Scores-4          	  200000	     10012 ns/op	   18552 B/op	      41 allocs/op
BenchmarkTop100of50Scores-4          	  100000	     17089 ns/op	   18816 B/op	      66 allocs/op
BenchmarkTop100of10000Scores-4       	    3000	    528193 ns/op	   19744 B/op	     117 allocs/op
BenchmarkTop1000of10000Scores-4      	    1000	   1859447 ns/op	  184752 B/op	    1017 allocs/op
BenchmarkTop10000of100000Scores-4    	      50	  28005664 ns/op	 1939576 B/op	   19024 allocs/op
BenchmarkTop10of100000Scores-4       	     300	   4120091 ns/op	    2560 B/op	      27 allocs/op
BenchmarkTop100of100000Scores-4      	     300	   4325227 ns/op	   19896 B/op	     117 allocs/op
BenchmarkTop1000of100000Scores-4     	     200	   6799804 ns/op	  186184 B/op	    1017 allocs/op
BenchmarkTop10000of1000000Scores-4   	      20	  88494230 ns/op	 1963881 B/op	   19024 allocs/op
PASS
ok  	github.com/blevesearch/bleve/search/collector	30.198s

Previously with slice:

go test -run=xxx -bench=. -benchmem
BenchmarkTop10of0Scores-4            	 1000000	      1202 ns/op	    2288 B/op	      15 allocs/op
BenchmarkTop10of3Scores-4            	 1000000	      1453 ns/op	    2288 B/op	      18 allocs/op
BenchmarkTop10of10Scores-4           	 1000000	      2162 ns/op	    2296 B/op	      25 allocs/op
BenchmarkTop10of25Scores-4           	  500000	      3420 ns/op	    2448 B/op	      26 allocs/op
BenchmarkTop10of50Scores-4           	  300000	      5336 ns/op	    2448 B/op	      26 allocs/op
BenchmarkTop10of10000Scores-4        	    5000	    356733 ns/op	    2472 B/op	      26 allocs/op
BenchmarkTop100of0Scores-4           	  300000	      4877 ns/op	   18304 B/op	      15 allocs/op
BenchmarkTop100of3Scores-4           	  300000	      5132 ns/op	   18304 B/op	      18 allocs/op
BenchmarkTop100of10Scores-4          	  200000	      5787 ns/op	   18312 B/op	      25 allocs/op
BenchmarkTop100of25Scores-4          	  200000	      8083 ns/op	   18344 B/op	      40 allocs/op
BenchmarkTop100of50Scores-4          	  100000	     14419 ns/op	   18400 B/op	      65 allocs/op
BenchmarkTop100of10000Scores-4       	    2000	    665401 ns/op	   18848 B/op	     116 allocs/op
BenchmarkTop1000of10000Scores-4      	     100	  15417063 ns/op	  176560 B/op	    1016 allocs/op
BenchmarkTop10000of100000Scores-4    	       1	1860011022 ns/op	 1857960 B/op	   19023 allocs/op
BenchmarkTop10of100000Scores-4       	     300	   4099276 ns/op	    2480 B/op	      26 allocs/op
BenchmarkTop100of100000Scores-4      	     300	   4533645 ns/op	   18984 B/op	     116 allocs/op
BenchmarkTop1000of100000Scores-4     	      50	  30519235 ns/op	  178008 B/op	    1016 allocs/op
BenchmarkTop10000of1000000Scores-4   	       1	3483977385 ns/op	 1882072 B/op	   19023 allocs/op
PASS
ok  	github.com/blevesearch/bleve/search/collector	31.666s

It appears that this successfully gets the best of both, at least at these particular benchmark sizes.
2017-04-27 08:57:13 -04:00
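A rough sketch of the slice side of that rule, using hypothetical names (`sliceStore`, `useSliceStore`) rather than bleve's actual collector types. The store keeps the best size+skip scores in descending order by inserting each score at its sorted position, which costs O(size+skip) per hit and is cheap while that number is small; the caller then drops the first skip entries to get the requested page.

```go
package main

import (
	"fmt"
	"sort"
)

// sliceStore (hypothetical) keeps at most cap scores, highest first.
type sliceStore struct {
	cap    int
	scores []float64
}

func (s *sliceStore) Add(score float64) {
	// First position holding a lower score; the slice is sorted descending,
	// so this predicate is false...false,true...true as sort.Search requires.
	i := sort.Search(len(s.scores), func(i int) bool { return s.scores[i] < score })
	if i >= s.cap {
		return // not better than anything we are keeping
	}
	s.scores = append(s.scores, 0)
	copy(s.scores[i+1:], s.scores[i:])
	s.scores[i] = score
	if len(s.scores) > s.cap {
		s.scores = s.scores[:s.cap]
	}
}

// useSliceStore mirrors the rule in the commit message:
// slice store when size+skip <= 10, heap store otherwise.
func useSliceStore(size, skip int) bool {
	return size+skip <= 10
}

func main() {
	size, skip := 3, 2
	fmt.Println("use slice store:", useSliceStore(size, skip))
	s := &sliceStore{cap: size + skip}
	for _, sc := range []float64{0.3, 0.9, 0.1, 0.7, 0.5, 0.8, 0.2} {
		s.Add(sc)
	}
	// Drop the first skip hits to get the requested page.
	fmt.Println(s.scores[skip:])
}
```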
Marty Schoch 17e21be71a Merge pull request #578 from seiflotfy/index-advanced
Add new IndexAdvanced function
2017-04-19 09:14:44 -04:00
Marty Schoch d855acf7fb Merge pull request #581 from mschoch/more-collector-benchmarks
add more collector benchmarks
2017-04-18 21:35:32 -04:00
Marty Schoch 5b9e11ee5f add more collector benchmarks 2017-04-18 17:24:50 -04:00
Seif Lotfy 06b4daed87 Add new IndexAdvanced function 2017-04-12 00:31:51 +02:00
Marty Schoch 0b1034dcbe Merge pull request #576 from mschoch/fix-multisearch-sort-state
fix race condition in incorrectly shared state in MultiSearch
2017-04-06 18:05:36 -04:00
Marty Schoch a78e632bd6 fix race condition in incorrectly shared state in MultiSearch
When performing a MultiSearch, we create child SearchRequests
from the original SearchRequest.  In doing so we copy many fields.
However, the SortOrder was copied incorrectly: it contains
state, so each child request must use a distinct SortOrder object.  This change
introduces a Copy() method to the SearchSort interface, and
to the SortOrder types.  MultiSearch now creates a new copy of
the SortOrder for each child request.
2017-04-06 17:49:33 -04:00
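A trimmed-down sketch of the Copy() pattern, assuming a hypothetical `sortField` type that records the last value it visited. bleve's real SearchSort interface has a different and larger method set; the point here is only that each child request gets its own stateful instances, so concurrent children never write to shared state.

```go
package main

import (
	"fmt"
	"sync"
)

// SearchSort is a hypothetical, reduced stand-in for a stateful sort criterion.
type SearchSort interface {
	UpdateVisitor(value string)
	Value() string
	Copy() SearchSort
}

// sortField remembers the last value it saw, so sharing one instance
// across goroutines would be a data race.
type sortField struct {
	Field string
	last  string
}

func (s *sortField) UpdateVisitor(v string) { s.last = v }
func (s *sortField) Value() string          { return s.last }

// Copy returns a fresh instance with the same configuration and no state.
func (s *sortField) Copy() SearchSort { return &sortField{Field: s.Field} }

type SortOrder []SearchSort

// Copy copies every element so that child requests never share state.
func (so SortOrder) Copy() SortOrder {
	rv := make(SortOrder, len(so))
	for i, s := range so {
		rv[i] = s.Copy()
	}
	return rv
}

func main() {
	parent := SortOrder{&sortField{Field: "name"}}
	children := make([]SortOrder, 3)
	var wg sync.WaitGroup
	for i := range children {
		children[i] = parent.Copy() // one copy per child request
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			children[i][0].UpdateVisitor(fmt.Sprintf("doc-%d", i))
		}(i)
	}
	wg.Wait()
	for _, c := range children {
		fmt.Println(c[0].Value())
	}
}
```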
Marty Schoch 957812369d Merge pull request #572 from mschoch/fix-geo-fts
add option for multi term searcher to skip max disjunction check
2017-04-04 10:58:35 -04:00
Marty Schoch 6f62489f21 add option for multi term searcher to skip max disjunction check
- geo searches now use this option and skip the check
- export ComputeGeoTerms for geo debug visualizations
2017-04-04 10:46:57 -04:00
Marty Schoch 7dd52a69d2 Merge pull request #566 from mschoch/term-range
introduce new query TermRange
2017-03-31 22:09:51 -04:00
Marty Schoch 1eba5541f2 introduce new query TermRange
The term range query is not often used in full-text queries, but
can be useful when filtering on keyword indexed text terms in
the index.

The JSON syntax for a TermRange query is the same as for
NumericRange, but the min/max values must be strings rather than
float64.
2017-03-31 22:04:00 -04:00
Marty Schoch 4d00d863af Merge pull request #565 from mschoch/refactor-searchers
refactor searchers
2017-03-31 17:27:54 -04:00
Marty Schoch f8fdfebb6c refactor searchers
- TermSearcher has an alternate constructor for a []byte term, which can
  avoid copying in some cases.  TermScorer was updated to accept a []byte
  term.  Also removed a few struct fields which were not being used.

- New MultiTermSearcher searches for documents containing any of a list of
  terms.  Current implementation simply uses DisjunctionSearcher.

- Several other searcher constructors now simply build a list of terms and
  then delegate to the MultiTermSearcher
  - NewPrefixSearcher
  - NewRegexpSearcher
  - NewFuzzySearcher
  - NewNumericRangeSearcher

- NewGeoBoundingBoxSearcher and NewGeoPointDistanceSearcher make use of
  the MultiTermSearcher internally, and follow the pattern of returning
  an existing search.Searcher, as opposed to their own wrapping struct.

- Callback filter functions used in NewGeoBoundingBoxSearcher and
  NewGeoPointDistanceSearcher have been extracted into separate functions
  which makes the code much easier to read.
2017-03-31 17:21:46 -04:00
Marty Schoch 0d41e80b66 Merge pull request #563 from mschoch/fix-geo-stored
fix geopoint fields to be able to be stored and retrieved
2017-03-31 10:12:34 -04:00
Marty Schoch 3ad13236ec fix geopoint fields to be able to be stored and retrieved 2017-03-31 09:40:54 -04:00
Marty Schoch 647693f1b1 Merge pull request #562 from mschoch/geo-sreekanth
geo review comments from sreekanth
2017-03-31 08:49:15 -04:00
Marty Schoch 6554e9624f geo review comments from sreekanth
also one fix came from steve; I must have forgotten to push that
commit up before merging
2017-03-31 08:41:40 -04:00
Marty Schoch 024877f311 Merge pull request #556 from mschoch/geo-experiment
add experimental support for indexing/query geo points
2017-03-30 15:34:00 -04:00
Marty Schoch 9790574610 update to geo query parsing and top-level bleve accessibility
- make geo queries accessible from top-level bleve
- update query parsing to support same geo point formats as
  document parsing
- add constructor for easier sorting by geo distance in Go
- additional integration tests using alternate (GeoJSON) style points
2017-03-30 15:23:27 -04:00
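A sketch of accepting two of the point formats mentioned in these commits: map keys such as "lon"/"lng"/"lat", and a GeoJSON-style array, which by the GeoJSON convention lists longitude before latitude. It works on values produced by decoding JSON into interface{}. `extractPoint` is a hypothetical helper, not bleve's parser, and it ignores other formats bleve may accept.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// extractPoint (hypothetical) returns lon, lat and whether a point was found.
func extractPoint(v interface{}) (float64, float64, bool) {
	switch p := v.(type) {
	case []interface{}:
		// GeoJSON order: longitude first, then latitude.
		if len(p) == 2 {
			lon, ok1 := p[0].(float64)
			lat, ok2 := p[1].(float64)
			return lon, lat, ok1 && ok2
		}
	case map[string]interface{}:
		var lon, lat float64
		var haveLon, haveLat bool
		for k, raw := range p {
			f, isNum := raw.(float64)
			if !isNum {
				continue
			}
			switch strings.ToLower(k) {
			case "lon", "lng":
				lon, haveLon = f, true
			case "lat":
				lat, haveLat = f, true
			}
		}
		return lon, lat, haveLon && haveLat
	}
	return 0, 0, false
}

func main() {
	for _, doc := range []string{
		`{"lon": -71.34, "lat": 41.12}`,
		`{"lng": -71.34, "lat": 41.12}`,
		`[-71.34, 41.12]`,
	} {
		var v interface{}
		if err := json.Unmarshal([]byte(doc), &v); err != nil {
			panic(err)
		}
		fmt.Println(extractPoint(v))
	}
}
```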
Marty Schoch f025c9f229 Merge pull request #561 from mschoch/remove-fdb
remove forestdb from bleve
2017-03-30 12:33:40 -04:00
Marty Schoch 74140d4f2b remove forestdb from bleve 2017-03-30 12:27:23 -04:00
Marty Schoch 5636536583 fixed typo and formatted searches.json through jq . 2017-03-29 19:33:54 -04:00
Marty Schoch 7f89ff9493 add geo integration tests 2017-03-29 18:57:35 -04:00
Marty Schoch 6507e31787 improved geo searcher unit tests
also added flag for bounding box searcher to optionally not
check boundaries.  this is useful when other searchers are going
to check every point anyway by some other criteria.
2017-03-29 16:57:58 -04:00
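A sketch of the boundary check that such a flag would skip, with hypothetical names (`inBoundingBox`, `filterHits`, `checkBoundaries`). When minLon > maxLon, the box is treated as crossing the antimeridian, so longitudes wrap around 180°. A caller that will re-test every point with a stricter criterion, such as a distance filter, can skip this pass.

```go
package main

import "fmt"

// inBoundingBox reports whether (lon, lat) lies in the box.
func inBoundingBox(lon, lat, minLon, minLat, maxLon, maxLat float64) bool {
	if lat < minLat || lat > maxLat {
		return false
	}
	if minLon <= maxLon {
		return lon >= minLon && lon <= maxLon
	}
	// Box crosses the antimeridian.
	return lon >= minLon || lon <= maxLon
}

// filterHits keeps candidate points inside the box unless the
// hypothetical checkBoundaries flag is false.
func filterHits(points [][2]float64, minLon, minLat, maxLon, maxLat float64, checkBoundaries bool) [][2]float64 {
	if !checkBoundaries {
		return points
	}
	var out [][2]float64
	for _, p := range points {
		if inBoundingBox(p[0], p[1], minLon, minLat, maxLon, maxLat) {
			out = append(out, p)
		}
	}
	return out
}

func main() {
	pts := [][2]float64{{179.5, 10}, {-179.5, 10}, {0, 10}}
	// Box spanning the antimeridian from 170°E to 170°W.
	fmt.Println(filterHits(pts, 170, 0, -170, 20, true))
	fmt.Println(len(filterHits(pts, 170, 0, -170, 20, false)))
}
```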
Marty Schoch f44630a205 add support for customizing unit used in distance sorting 2017-03-29 16:04:30 -04:00
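A minimal sketch of converting a sort distance into a caller-chosen unit, assuming distances are computed in meters. The unit abbreviations and the `convertMeters` helper are hypothetical; the conversion factors are the standard international definitions.

```go
package main

import (
	"fmt"
	"strings"
)

// metersPer maps hypothetical unit names to their length in meters.
var metersPer = map[string]float64{
	"m":  1,
	"km": 1000,
	"mi": 1609.344,
	"yd": 0.9144,
	"ft": 0.3048,
	"in": 0.0254,
}

// convertMeters converts a distance in meters into the requested unit.
func convertMeters(meters float64, unit string) (float64, error) {
	f, ok := metersPer[strings.ToLower(unit)]
	if !ok {
		return 0, fmt.Errorf("unknown distance unit %q", unit)
	}
	return meters / f, nil
}

func main() {
	for _, u := range []string{"m", "km", "mi"} {
		d, err := convertMeters(5000, u)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%.3f %s\n", d, u)
	}
}
```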
Marty Schoch fdbe669fd5 several more items on the geo checklist
- added readme pointing back to lucene origins
- improved documentation of exported methods in geo package
- improved test coverage to 100% on geo package
- added support for parsing geojson style points
- removed some duplicated code in the geo bounding box searcher
2017-03-29 14:21:59 -04:00
Marty Schoch 6c259524c3 Merge pull request #558 from MTecknology/master
Added name to copyright notice
2017-03-28 14:30:38 -04:00
Michael Lustfield c26af21050 Added name to copyright notice 2017-03-28 12:17:26 -05:00
Marty Schoch a16efa5e78 add experimental support for indexing/query geo points
New field type GeoPointField, or "geopoint" in mapping JSON.

Currently structs and maps are considered when a mapping explicitly
marks a field as type "geopoint".  Several variants of "lon", "lng", and "lat"
are looked for in map keys, struct field names, or method names.

New query type GeoBoundingBoxQuery searches for documents which have a
GeoPointField indexed with a value that is inside the specified bounding box.

New query type GeoDistanceQuery searches for documents which have a
GeoPointField indexed with a value that is less than or equal to the
specified distance from the specified location.

New sort by method "geo_distance".  Hits can be sorted by their distance
from the specified location.

New geo utility package with all routines ported from Lucene.

New FilteringSearcher, which wraps an existing Searcher, but filters
all hits with a user-provided callback.
2017-03-24 17:22:21 -07:00
Marty Schoch 4702785f1f Merge pull request #555 from mschoch/change-collector-heap
switch collector store impl from slice to heap
2017-03-24 11:03:22 -07:00
Marty Schoch 952572718e switch collector store impl from slice to heap
Additional testing has shown that the heap collector performs
significantly better when larger numbers of hits are requested.

The heap is also faster (though very close) when fewer (10) hits
are requested.

Here are the numbers from my laptop:

slice:

go test -run=xxx -bench=. -benchmem
BenchmarkTop10of10000Scores-4        	    5000	    396943 ns/op	    2472 B/op	      26 allocs/op
BenchmarkTop100of10000Scores-4       	    2000	    630894 ns/op	   18848 B/op	     116 allocs/op
BenchmarkTop1000of10000Scores-4      	     100	  14996445 ns/op	  176552 B/op	    1016 allocs/op
BenchmarkTop10000of100000Scores-4    	       1	1878796320 ns/op	 1857768 B/op	   19023 allocs/op
BenchmarkTop10of100000Scores-4       	     500	   3858309 ns/op	    2480 B/op	      26 allocs/op
BenchmarkTop100of100000Scores-4      	     300	   4270086 ns/op	   19000 B/op	     116 allocs/op
BenchmarkTop1000of100000Scores-4     	      50	  30163705 ns/op	  178024 B/op	    1016 allocs/op
BenchmarkTop10000of1000000Scores-4   	       1	3429557237 ns/op	 1882008 B/op	   19023 allocs/op
PASS
ok  	github.com/blevesearch/bleve/search/collector	16.316s

heap:

go test -run=xxx -bench=. -benchmem
BenchmarkTop10of10000Scores-4        	    5000	    341064 ns/op	    2552 B/op	      27 allocs/op
BenchmarkTop100of10000Scores-4       	    3000	    501922 ns/op	   19744 B/op	     117 allocs/op
BenchmarkTop1000of10000Scores-4      	    1000	   1759088 ns/op	  184744 B/op	    1017 allocs/op
BenchmarkTop10000of100000Scores-4    	      50	  25954696 ns/op	 1939608 B/op	   19024 allocs/op
BenchmarkTop10of100000Scores-4       	     500	   3814933 ns/op	    2560 B/op	      27 allocs/op
BenchmarkTop100of100000Scores-4      	     300	   4009369 ns/op	   19896 B/op	     117 allocs/op
BenchmarkTop1000of100000Scores-4     	     200	   6397276 ns/op	  186184 B/op	    1017 allocs/op
BenchmarkTop10000of1000000Scores-4   	      20	  81815315 ns/op	 1963912 B/op	   19024 allocs/op
PASS
ok  	github.com/blevesearch/bleve/search/collector	14.980s
2017-03-24 09:38:06 -07:00
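A minimal sketch of a heap-based top-N store built on the standard library's container/heap, not bleve's actual implementation. It keeps a min-heap of the n best scores, so each new hit costs O(log n) instead of the O(n) shift a sorted slice needs. That difference is consistent with the gap the numbers above show once 1000 or more hits are requested.

```go
package main

import (
	"container/heap"
	"fmt"
	"sort"
)

// minHeap holds scores; the root is the lowest score currently retained.
type minHeap []float64

func (h minHeap) Len() int            { return len(h) }
func (h minHeap) Less(i, j int) bool  { return h[i] < h[j] }
func (h minHeap) Swap(i, j int)       { h[i], h[j] = h[j], h[i] }
func (h *minHeap) Push(x interface{}) { *h = append(*h, x.(float64)) }
func (h *minHeap) Pop() interface{} {
	old := *h
	n := len(old)
	x := old[n-1]
	*h = old[:n-1]
	return x
}

// topN keeps the n highest scores, then returns them in descending order.
func topN(scores []float64, n int) []float64 {
	h := &minHeap{}
	for _, s := range scores {
		if h.Len() < n {
			heap.Push(h, s)
		} else if n > 0 && s > (*h)[0] {
			// Replace the current minimum and restore heap order.
			(*h)[0] = s
			heap.Fix(h, 0)
		}
	}
	out := []float64(*h)
	sort.Sort(sort.Reverse(sort.Float64Slice(out)))
	return out
}

func main() {
	fmt.Println(topN([]float64{0.3, 0.9, 0.1, 0.7, 0.5, 0.8}, 3))
}
```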
Marty Schoch 4fe6f97f44 Merge pull request #552 from steveyen/collector-benchmarks
more collector benchmarks with larger sizes
2017-03-16 17:07:37 -04:00
Steve Yen 088953fbb6 more collector benchmarks with larger sizes 2017-03-16 13:46:28 -07:00
Marty Schoch 4eee341e04 Merge pull request #551 from mschoch/fix-query-string-neg
fix query string parsing of numeric ranges with negative value
2017-03-16 11:39:39 -04:00
Marty Schoch 0aab8d7fb9 fix query string parsing of numeric ranges with negative value
fixes #550
2017-03-16 11:11:28 -04:00
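A hypothetical sketch of parsing a clause such as `age:>=-5`, not bleve's query string grammar. The comparison operator is split off first, longest match first, and strconv.ParseFloat then handles the sign, so the leading '-' is read as part of the number rather than as a negation.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseRangeClause (hypothetical) splits "field:<op><number>".
func parseRangeClause(clause string) (field, op string, value float64, err error) {
	i := strings.IndexByte(clause, ':')
	if i < 0 {
		return "", "", 0, fmt.Errorf("missing ':' in %q", clause)
	}
	field, rest := clause[:i], clause[i+1:]
	// Check two-character operators before their one-character prefixes.
	for _, candidate := range []string{">=", "<=", ">", "<"} {
		if strings.HasPrefix(rest, candidate) {
			op = candidate
			break
		}
	}
	if op == "" {
		return "", "", 0, fmt.Errorf("missing range operator in %q", clause)
	}
	value, err = strconv.ParseFloat(rest[len(op):], 64)
	return field, op, value, err
}

func main() {
	for _, c := range []string{"age:>=-5", "temp:<-12.5"} {
		field, op, v, err := parseRangeClause(c)
		if err != nil {
			panic(err)
		}
		fmt.Println(field, op, v)
	}
}
```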
Marty Schoch 1bcfe4efa1 Merge pull request #546 from sreekanth-cb/store_abort_close
Store abort close
2017-03-07 12:35:18 -05:00
Sreekanth Sivasankaran f759d841c2 Adding guards for config casting. 2017-03-07 22:51:27 +05:30
Sreekanth Sivasankaran 0cdd0b38e2 Merge branch 'store_abort_close' of https://github.com/sreekanth-cb/bleve into store_abort_close 2017-03-07 19:57:16 +05:30
Sreekanth Sivasankaran e88ff3c60a Merge branch 'store_abort_close' of https://github.com/sreekanth-cb/bleve into store_abort_close
Syntax change for errcheck tool
2017-03-07 19:56:08 +05:30
Sreekanth Sivasankaran 9795e12d27 Merge branch 'store_abort_close' of https://github.com/sreekanth-cb/bleve into store_abort_close 2017-03-07 19:36:13 +05:30
Sreekanth Sivasankaran ee819f5950 MB-22410 - Configurable forced Store Abort API
Adding a configurable forced store close
Bumping the moss store version
2017-03-07 19:33:51 +05:30
Marty Schoch 9bdfb4c6cd Merge pull request #548 from mschoch/fix-perf-regression
fix perf regression, unnecessarily loading backindex
2017-03-04 15:39:00 -05:00
Marty Schoch bc7d8e3b35 fix perf regression, unnecessarily loading backindex 2017-03-04 15:23:16 -05:00
Marty Schoch 75d75bf1bc Merge pull request #547 from mschoch/facets_less_garbage
reduce garbage created while processing facets
2017-03-02 17:23:52 -05:00
Marty Schoch 0eba2a3f0c reduce garbage created while processing facets
previously we parsed/returned large sections of the document's
back index row in order to compute facet information.  this
required parsing the protobuf of the entire back index row,
which unfortunately creates considerable garbage.

this new version introduces a visitor/callback approach to
working with data inside the back index row.  the benefit
of this approach is that we can let the higher-level code
see values, prior to any copies of data being made or
intermediate garbage being created.  implementations of
the callback must copy any value which they would like to
retain beyond the callback.

NOTE: this approach duplicates code from the
automatically generated protobuf code

NOTE: this approach assumes that the "field" field is serialized
before the "terms" field.  This is guaranteed by our currently
generated protobuf encoder, and is recommended by the protobuf
spec.  But decoders SHOULD support them occurring in any order,
which ours does not.
2017-03-02 17:00:46 -05:00
Marty Schoch b04745abcc remove smolder indexing scheme
this was an experiment that we're no longer working on.
we learned from it, but carrying it forward now has
a maintenance burden we don't wish to pay
2017-03-01 14:38:17 -05:00
Sreekanth Sivasankaran 67a5814fbe MB-22410: deleting/editing index definition with large dirty write queue can be very slow
Adding a configurable forced store close
2017-03-01 18:58:32 +05:30
Sreekanth Sivasankaran 324e4237cf adding configurable Abort Close 2017-03-01 16:23:56 +05:30