Marty Schoch
b8de2df68d
add a pure Go Spanish analyzer
...
this introduces a new light Spanish stemmer
and moves the other pure Go Spanish analysis components back
in from the blevex package
the libstemmer version of the stemmer will remain in blevex
2017-04-29 19:31:43 -04:00
Marty Schoch
ce901a8870
add a pure Go German analyzer
...
this introduces a new German light stemmer
and moves the other pure Go German analysis components back
in from the blevex package
the libstemmer version of the stemmer will remain in blevex
2017-04-29 18:46:58 -04:00
Marty Schoch
11a45d6f9c
Merge pull request #585 from mschoch/fix-geo-point-dist
...
fix geo point distance search
2017-04-27 18:26:34 -04:00
Marty Schoch
8df8d4e797
fix geo point distance search
...
there was a bug where if the circle described by the point
distance query crossed the poles, then we incorrectly built
a box around it. this resulted in incorrect search results.
2017-04-27 17:28:07 -04:00
Marty Schoch
92c5f3e2e6
Merge pull request #584 from mschoch/more-collector-benchmarks
...
topn collector switch approach based on size+skip
2017-04-27 09:27:08 -04:00
Marty Schoch
a4a34cc3b2
topn collector switch approach based on size+skip
...
we now use the slice store when size+skip <= 10
and use the heap store when size+skip > 10
here are the new perf numbers:
go test -run=xxx -bench=. -benchmem
BenchmarkTop10of0Scores-4 1000000 1150 ns/op 2304 B/op 15 allocs/op
BenchmarkTop10of3Scores-4 1000000 1417 ns/op 2304 B/op 18 allocs/op
BenchmarkTop10of10Scores-4 1000000 2133 ns/op 2312 B/op 25 allocs/op
BenchmarkTop10of25Scores-4 500000 3410 ns/op 2464 B/op 26 allocs/op
BenchmarkTop10of50Scores-4 300000 5174 ns/op 2464 B/op 26 allocs/op
BenchmarkTop10of10000Scores-4 5000 342955 ns/op 2488 B/op 26 allocs/op
BenchmarkTop100of0Scores-4 300000 4796 ns/op 18320 B/op 15 allocs/op
BenchmarkTop100of3Scores-4 300000 5160 ns/op 18352 B/op 19 allocs/op
BenchmarkTop100of10Scores-4 200000 6354 ns/op 18408 B/op 26 allocs/op
BenchmarkTop100of25Scores-4 200000 10023 ns/op 18568 B/op 41 allocs/op
BenchmarkTop100of50Scores-4 100000 16821 ns/op 18832 B/op 66 allocs/op
BenchmarkTop100of10000Scores-4 3000 508989 ns/op 19760 B/op 117 allocs/op
BenchmarkTop1000of10000Scores-4 1000 1814198 ns/op 184768 B/op 1017 allocs/op
BenchmarkTop10000of100000Scores-4 50 26623920 ns/op 1939592 B/op 19024 allocs/op
BenchmarkTop10of100000Scores-4 500 3730204 ns/op 2496 B/op 26 allocs/op
BenchmarkTop100of100000Scores-4 300 4057127 ns/op 19912 B/op 117 allocs/op
BenchmarkTop1000of100000Scores-4 200 6390180 ns/op 186200 B/op 1017 allocs/op
BenchmarkTop10000of1000000Scores-4 20 82785756 ns/op 1963897 B/op 19024 allocs/op
PASS
ok github.com/blevesearch/bleve/search/collector 31.537s
Previously with heap:
go test -run=xxx -bench=. -benchmem
BenchmarkTop10of0Scores-4 1000000 1216 ns/op 2288 B/op 15 allocs/op
BenchmarkTop10of3Scores-4 1000000 1593 ns/op 2320 B/op 19 allocs/op
BenchmarkTop10of10Scores-4 500000 2734 ns/op 2376 B/op 26 allocs/op
BenchmarkTop10of25Scores-4 300000 5077 ns/op 2520 B/op 27 allocs/op
BenchmarkTop10of50Scores-4 200000 6875 ns/op 2528 B/op 27 allocs/op
BenchmarkTop10of10000Scores-4 3000 351210 ns/op 2552 B/op 27 allocs/op
BenchmarkTop100of0Scores-4 300000 4846 ns/op 18304 B/op 15 allocs/op
BenchmarkTop100of3Scores-4 300000 5357 ns/op 18336 B/op 19 allocs/op
BenchmarkTop100of10Scores-4 200000 6462 ns/op 18392 B/op 26 allocs/op
BenchmarkTop100of25Scores-4 200000 10012 ns/op 18552 B/op 41 allocs/op
BenchmarkTop100of50Scores-4 100000 17089 ns/op 18816 B/op 66 allocs/op
BenchmarkTop100of10000Scores-4 3000 528193 ns/op 19744 B/op 117 allocs/op
BenchmarkTop1000of10000Scores-4 1000 1859447 ns/op 184752 B/op 1017 allocs/op
BenchmarkTop10000of100000Scores-4 50 28005664 ns/op 1939576 B/op 19024 allocs/op
BenchmarkTop10of100000Scores-4 300 4120091 ns/op 2560 B/op 27 allocs/op
BenchmarkTop100of100000Scores-4 300 4325227 ns/op 19896 B/op 117 allocs/op
BenchmarkTop1000of100000Scores-4 200 6799804 ns/op 186184 B/op 1017 allocs/op
BenchmarkTop10000of1000000Scores-4 20 88494230 ns/op 1963881 B/op 19024 allocs/op
PASS
ok github.com/blevesearch/bleve/search/collector 30.198s
Previously with slice:
go test -run=xxx -bench=. -benchmem
BenchmarkTop10of0Scores-4 1000000 1202 ns/op 2288 B/op 15 allocs/op
BenchmarkTop10of3Scores-4 1000000 1453 ns/op 2288 B/op 18 allocs/op
BenchmarkTop10of10Scores-4 1000000 2162 ns/op 2296 B/op 25 allocs/op
BenchmarkTop10of25Scores-4 500000 3420 ns/op 2448 B/op 26 allocs/op
BenchmarkTop10of50Scores-4 300000 5336 ns/op 2448 B/op 26 allocs/op
BenchmarkTop10of10000Scores-4 5000 356733 ns/op 2472 B/op 26 allocs/op
BenchmarkTop100of0Scores-4 300000 4877 ns/op 18304 B/op 15 allocs/op
BenchmarkTop100of3Scores-4 300000 5132 ns/op 18304 B/op 18 allocs/op
BenchmarkTop100of10Scores-4 200000 5787 ns/op 18312 B/op 25 allocs/op
BenchmarkTop100of25Scores-4 200000 8083 ns/op 18344 B/op 40 allocs/op
BenchmarkTop100of50Scores-4 100000 14419 ns/op 18400 B/op 65 allocs/op
BenchmarkTop100of10000Scores-4 2000 665401 ns/op 18848 B/op 116 allocs/op
BenchmarkTop1000of10000Scores-4 100 15417063 ns/op 176560 B/op 1016 allocs/op
BenchmarkTop10000of100000Scores-4 1 1860011022 ns/op 1857960 B/op 19023 allocs/op
BenchmarkTop10of100000Scores-4 300 4099276 ns/op 2480 B/op 26 allocs/op
BenchmarkTop100of100000Scores-4 300 4533645 ns/op 18984 B/op 116 allocs/op
BenchmarkTop1000of100000Scores-4 50 30519235 ns/op 178008 B/op 1016 allocs/op
BenchmarkTop10000of1000000Scores-4 1 3483977385 ns/op 1882072 B/op 19023 allocs/op
PASS
ok github.com/blevesearch/bleve/search/collector 31.666s
It appears that this successfully gets the best of both, at these particular benchmark sizes.
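The switch described above can be sketched as follows. This is a minimal illustration, not the collector's actual code: the names `storeKind`, `chooseStore`, and `smallResultThreshold` are hypothetical, and only the cutoff of 10 on size+skip comes from the commit message.

```go
package main

import "fmt"

// storeKind names the two backing stores discussed in the commit message.
// The type itself is hypothetical and exists only for this sketch.
type storeKind string

const (
	sliceStore storeKind = "slice"
	heapStore  storeKind = "heap"
)

// smallResultThreshold is the cutoff of 10 stated in the commit message.
const smallResultThreshold = 10

// chooseStore returns the slice store when size+skip <= 10
// and the heap store when size+skip > 10.
func chooseStore(size, skip int) storeKind {
	if size+skip <= smallResultThreshold {
		return sliceStore
	}
	return heapStore
}

func main() {
	fmt.Println(chooseStore(10, 0))  // slice
	fmt.Println(chooseStore(10, 1))  // heap
	fmt.Println(chooseStore(100, 0)) // heap
}
```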
2017-04-27 08:57:13 -04:00
Nicholas Wiersma
b12c902457
Allow pre parsing query strings
2017-04-25 16:30:47 +02:00
Marty Schoch
17e21be71a
Merge pull request #578 from seiflotfy/index-advanced
...
Add new IndexAdvanced function
2017-04-19 09:14:44 -04:00
Marty Schoch
d855acf7fb
Merge pull request #581 from mschoch/more-collector-benchmarks
...
add more collector benchmarks
2017-04-18 21:35:32 -04:00
Marty Schoch
5b9e11ee5f
add more collector benchmarks
2017-04-18 17:24:50 -04:00
Seif Lotfy
06b4daed87
Add new IndexAdvanced function
2017-04-12 00:31:51 +02:00
Marty Schoch
0b1034dcbe
Merge pull request #576 from mschoch/fix-multisearch-sort-state
...
fix race condition in incorrectly shared state in MultiSearch
2017-04-06 18:05:36 -04:00
Marty Schoch
a78e632bd6
fix race condition in incorrectly shared state in MultiSearch
...
When performing a MultiSearch, we create child SearchRequests
from the original SearchRequest. In doing so we copy many fields.
But, copying of the SortOrder was incorrect, as this contains
state, and distinct SortOrder objects must be used. This change
introduces a Copy() method to the SearchSort interface, and
to the SortOrder types. MultiSearch now creates a new copy of
the SortOrder for each child request.
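A minimal sketch of why the copy matters, using reduced stand-ins rather than the real types. The interface `searchSort`, the struct `sortByField`, and the methods `Observe` and `Last` are hypothetical; only the idea of a `Copy()` method on each sort and on the sort order comes from the commit message.

```go
package main

import "fmt"

// searchSort is a hypothetical, reduced stand-in for a stateful sort criterion.
type searchSort interface {
	Copy() searchSort
	Observe(value string) // records per-request scratch state
	Last() string
}

// sortByField keeps scratch state, so two requests must not share one instance.
type sortByField struct {
	field string
	last  string
}

// Copy returns an independent instance with its own scratch state.
func (s *sortByField) Copy() searchSort { c := *s; return &c }

func (s *sortByField) Observe(value string) { s.last = value }
func (s *sortByField) Last() string         { return s.last }

// sortOrder is a list of sort criteria.
type sortOrder []searchSort

// Copy duplicates every element instead of copying only the slice header.
func (so sortOrder) Copy() sortOrder {
	out := make(sortOrder, len(so))
	for i, s := range so {
		out[i] = s.Copy()
	}
	return out
}

func main() {
	parent := sortOrder{&sortByField{field: "name"}}

	// One fresh copy per child request.
	childA, childB := parent.Copy(), parent.Copy()

	childA[0].Observe("alpha")
	fmt.Printf("A=%q B=%q parent=%q\n",
		childA[0].Last(), childB[0].Last(), parent[0].Last())
}
```

Assigning `parent` directly to both children would hand them the same `*sortByField`, so a write made by one request would be visible to the other.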
2017-04-06 17:49:33 -04:00
Marty Schoch
957812369d
Merge pull request #572 from mschoch/fix-geo-fts
...
add option for multi term searcher to skip max disjunction check
2017-04-04 10:58:35 -04:00
Marty Schoch
6f62489f21
add option for multi term searcher to skip max disjunction check
...
- geo searches now use this option and skip the check
- export ComputeGeoTerms for geo debug visualizations
2017-04-04 10:46:57 -04:00
Steve Yen
bd73d1bb75
optimize heap collector Final() for large counts
...
The previous heap Final() loop would decrement count all the way to 0
when it only has to fill enough of the return slice.
2017-04-01 12:54:49 -07:00
Marty Schoch
7dd52a69d2
Merge pull request #566 from mschoch/term-range
...
introduce new query TermRange
2017-03-31 22:09:51 -04:00
Marty Schoch
1eba5541f2
introduce new query TermRange
...
The term range query is not often used in full-text queries, but
can be useful when filtering on keyword indexed text terms in
the index.
The JSON syntax to do a TermRange query is the same as for
NumericRange, but the min/max values must be string and not
float64.
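A sketch of how the two forms could be told apart when decoding. The struct `rangeQuery`, the function `classify`, and the `field` key are assumptions made for this example. Only the rule that min/max are strings for a term range and numbers for a numeric range comes from the commit message.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// rangeQuery is a hypothetical, reduced shape shared by both query kinds.
// Min and Max are left untyped so the decoder reports what the JSON held.
type rangeQuery struct {
	Min   interface{} `json:"min"`
	Max   interface{} `json:"max"`
	Field string      `json:"field"`
}

// classify reports which kind of range query the JSON describes,
// based on the type of the first bound that is present.
func classify(data []byte) (string, error) {
	var q rangeQuery
	if err := json.Unmarshal(data, &q); err != nil {
		return "", err
	}
	bound := q.Min
	if bound == nil {
		bound = q.Max
	}
	switch bound.(type) {
	case string:
		return "term_range", nil
	case float64: // encoding/json decodes every JSON number as float64
		return "numeric_range", nil
	default:
		return "", errors.New("min/max must be a string or a number")
	}
}

func main() {
	for _, in := range []string{
		`{"min": "apple", "max": "mango", "field": "name"}`,
		`{"min": 1.5, "max": 10, "field": "price"}`,
	} {
		kind, err := classify([]byte(in))
		fmt.Println(kind, err)
	}
}
```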
2017-03-31 22:04:00 -04:00
Marty Schoch
4d00d863af
Merge pull request #565 from mschoch/refactor-searchers
...
refactor searchers
2017-03-31 17:27:54 -04:00
Marty Schoch
f8fdfebb6c
refactor searchers
...
- TermSearcher has alternate constructor if term is []byte, this can avoid
copying in some cases. TermScorer updated to accept []byte term. Also
removed a few struct fields which were not being used.
- New MultiTermSearcher searches for documents containing any of a list of
terms. Current implementation simply uses DisjunctionSearcher.
- Several other searcher constructors now simply build a list of terms and
then delegate to the MultiTermSearcher
- NewPrefixSearcher
- NewRegexpSearcher
- NewFuzzySearcher
- NewNumericRangeSearcher
- NewGeoBoundingBoxSearcher and NewGeoPointDistanceSearcher make use of
the MultiTermSearcher internally, and follow the pattern of returning
an existing search.Searcher, as opposed to their own wrapping struct.
- Callback filter functions used in NewGeoBoundingBoxSearcher and
NewGeoPointDistanceSearcher have been extracted into separate functions
which makes the code much easier to read.
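The delegation pattern in the list above can be sketched over a toy in-memory index. Everything here is hypothetical (the `index` type and the `multiTermSearch` and `prefixSearch` functions). It shows only the shape: a specialised search builds a list of terms, then hands it to one shared "any of these terms" search.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// index is a toy inverted index: term -> IDs of documents containing it.
type index map[string][]string

// multiTermSearch returns the sorted IDs of documents containing any of
// the given terms, which is a plain disjunction over the terms.
func multiTermSearch(idx index, terms []string) []string {
	seen := map[string]bool{}
	out := []string{}
	for _, t := range terms {
		for _, id := range idx[t] {
			if !seen[id] {
				seen[id] = true
				out = append(out, id)
			}
		}
	}
	sort.Strings(out)
	return out
}

// prefixSearch only collects the matching terms and then delegates.
func prefixSearch(idx index, prefix string) []string {
	var terms []string
	for t := range idx {
		if strings.HasPrefix(t, prefix) {
			terms = append(terms, t)
		}
	}
	return multiTermSearch(idx, terms)
}

func main() {
	idx := index{
		"search":   {"doc1", "doc3"},
		"searcher": {"doc2", "doc3"},
		"sort":     {"doc4"},
	}
	fmt.Println(prefixSearch(idx, "sea")) // [doc1 doc2 doc3]
}
```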
2017-03-31 17:21:46 -04:00
Marty Schoch
0d41e80b66
Merge pull request #563 from mschoch/fix-geo-stored
...
fix geopoint fields to be able to be stored and retrieved
2017-03-31 10:12:34 -04:00
Marty Schoch
3ad13236ec
fix geopoint fields to be able to be stored and retrieved
2017-03-31 09:40:54 -04:00
Marty Schoch
647693f1b1
Merge pull request #562 from mschoch/geo-sreekanth
...
geo review comments from sreekanth
2017-03-31 08:49:15 -04:00
Marty Schoch
6554e9624f
geo review comments from sreekanth
...
also one fix came from steve, i must have forgotten to push that
commit up before merging
2017-03-31 08:41:40 -04:00
Marty Schoch
024877f311
Merge pull request #556 from mschoch/geo-experiment
...
add experimental support for indexing/query geo points
2017-03-30 15:34:00 -04:00
Marty Schoch
9790574610
update to geo query parsing and top-level bleve accessibility
...
- make geo queries accessible from top-level bleve
- update query parsing to support same geo point formats as
document parsing
- add constructor for easier sorting by geo distance in Go
- additional integration tests using alternate (GeoJSON) style points
2017-03-30 15:23:27 -04:00
Marty Schoch
f025c9f229
Merge pull request #561 from mschoch/remove-fdb
...
remove forestdb from bleve
2017-03-30 12:33:40 -04:00
Marty Schoch
74140d4f2b
remove forestdb from bleve
2017-03-30 12:27:23 -04:00
Marty Schoch
5636536583
fixed typo and formatted searches.json through jq .
2017-03-29 19:33:54 -04:00
Marty Schoch
7f89ff9493
add geo integration tests
2017-03-29 18:57:35 -04:00
Marty Schoch
6507e31787
improved geo searcher unit tests
...
also added flag for bounding box searcher to optionally not
check boundaries. this is useful when other searchers are going
to check every point anyway by some other criteria.
2017-03-29 16:57:58 -04:00
Marty Schoch
f44630a205
add support for customizing unit used in distance sorting
2017-03-29 16:04:30 -04:00
Marty Schoch
fdbe669fd5
several more items on the geo checklist
...
- added readme pointing back to lucene origins
- improved documentation of exported methods in geo package
- improved test coverage to 100% on geo package
- added support for parsing geojson style points
- removed some duplicated code in the geo bounding box searcher
2017-03-29 14:21:59 -04:00
Marty Schoch
6c259524c3
Merge pull request #558 from MTecknology/master
...
Added name to copyright notice
2017-03-28 14:30:38 -04:00
Michael Lustfield
c26af21050
Added name to copyright notice
2017-03-28 12:17:26 -05:00
Marty Schoch
a16efa5e78
add experimental support for indexing/query geo points
...
New field type GeoPointField, or "geopoint" in mapping JSON.
Currently structs and maps are considered when a mapping explicitly
marks a field as type "geopoint". Several variants of "lon", "lng", and "lat"
are looked for in map keys, struct field names, or method names.
New query type GeoBoundingBoxQuery searches for documents which have a
GeoPointField indexed with a value that is inside the specified bounding box.
New query type GeoDistanceQuery searches for documents which have a
GeoPointField indexed with a value that is less than or equal to the
specified distance from the specified location.
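The "less than or equal to the specified distance" test can be illustrated with the haversine great-circle formula. This is a sketch under assumptions, not the ported routine: the names `point`, `haversineKm`, and `withinDistance` are hypothetical, and the Earth radius of 6371 km is a value chosen for this example.

```go
package main

import (
	"fmt"
	"math"
)

// earthRadiusKm is the mean Earth radius; an assumption of this sketch.
const earthRadiusKm = 6371.0

// point is a longitude/latitude pair in degrees.
type point struct{ Lon, Lat float64 }

// haversineKm returns the great-circle distance between a and b in kilometres.
func haversineKm(a, b point) float64 {
	toRad := func(deg float64) float64 { return deg * math.Pi / 180 }
	dLat := toRad(b.Lat - a.Lat)
	dLon := toRad(b.Lon - a.Lon)
	h := math.Sin(dLat/2)*math.Sin(dLat/2) +
		math.Cos(toRad(a.Lat))*math.Cos(toRad(b.Lat))*
			math.Sin(dLon/2)*math.Sin(dLon/2)
	// Clamp to 1 so rounding error cannot push Asin out of its domain.
	return 2 * earthRadiusKm * math.Asin(math.Min(1, math.Sqrt(h)))
}

// withinDistance reports whether p lies within maxKm of center, inclusive.
func withinDistance(center, p point, maxKm float64) bool {
	return haversineKm(center, p) <= maxKm
}

func main() {
	center := point{Lon: 0, Lat: 0}
	oneDegreeEast := point{Lon: 1, Lat: 0} // roughly 111.19 km away at this radius
	fmt.Println(withinDistance(center, oneDegreeEast, 112)) // true
	fmt.Println(withinDistance(center, oneDegreeEast, 111)) // false
}
```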
New sort by method "geo_distance". Hits can be sorted by their distance
from the specified location.
New geo utility package with all routines ported from Lucene.
New FilteringSearcher, which wraps an existing Searcher, but filters
all hits with a user-provided callback.
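The wrapping pattern can be sketched with a reduced searcher interface. The types `match`, `searcher`, `sliceSearcher`, and `filteringSearcher` are hypothetical stand-ins for this example. Only the idea of wrapping an existing searcher and filtering its hits through a user-provided callback comes from the commit message.

```go
package main

import "fmt"

// match is a hypothetical, reduced hit.
type match struct {
	ID    string
	Score float64
}

// searcher is a hypothetical, reduced iterator: Next returns nil when exhausted.
type searcher interface {
	Next() (*match, error)
}

// sliceSearcher yields hits from a fixed slice; it stands in for any real searcher.
type sliceSearcher struct {
	hits []*match
	pos  int
}

func (s *sliceSearcher) Next() (*match, error) {
	if s.pos >= len(s.hits) {
		return nil, nil
	}
	m := s.hits[s.pos]
	s.pos++
	return m, nil
}

// filteringSearcher wraps a child searcher and drops hits the callback rejects.
type filteringSearcher struct {
	child  searcher
	accept func(*match) bool
}

func (f *filteringSearcher) Next() (*match, error) {
	for {
		m, err := f.child.Next()
		if err != nil || m == nil {
			return m, err // error or end of results: pass through unchanged
		}
		if f.accept(m) {
			return m, nil
		}
	}
}

// collect drains a searcher and returns the IDs of its hits in order.
func collect(s searcher) ([]string, error) {
	ids := []string{}
	for {
		m, err := s.Next()
		if err != nil {
			return nil, err
		}
		if m == nil {
			return ids, nil
		}
		ids = append(ids, m.ID)
	}
}

func main() {
	child := &sliceSearcher{hits: []*match{{"a", 0.2}, {"b", 0.8}, {"c", 0.5}}}
	fs := &filteringSearcher{
		child:  child,
		accept: func(m *match) bool { return m.Score >= 0.5 },
	}
	ids, _ := collect(fs)
	fmt.Println(ids) // [b c]
}
```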
2017-03-24 17:22:21 -07:00
Marty Schoch
4702785f1f
Merge pull request #555 from mschoch/change-collector-heap
...
switch collector store impl from slice to heap
2017-03-24 11:03:22 -07:00
Marty Schoch
952572718e
switch collector store impl from slice to heap
...
Additional testing has shown that the heap collector performs
significantly better when larger numbers of hits are requested.
The heap is also faster (though very close) when fewer (10) hits
are requested.
Here are the numbers from my laptop:
slice:
go test -run=xxx -bench=. -benchmem
BenchmarkTop10of10000Scores-4 5000 396943 ns/op 2472 B/op 26 allocs/op
BenchmarkTop100of10000Scores-4 2000 630894 ns/op 18848 B/op 116 allocs/op
BenchmarkTop1000of10000Scores-4 100 14996445 ns/op 176552 B/op 1016 allocs/op
BenchmarkTop10000of100000Scores-4 1 1878796320 ns/op 1857768 B/op 19023 allocs/op
BenchmarkTop10of100000Scores-4 500 3858309 ns/op 2480 B/op 26 allocs/op
BenchmarkTop100of100000Scores-4 300 4270086 ns/op 19000 B/op 116 allocs/op
BenchmarkTop1000of100000Scores-4 50 30163705 ns/op 178024 B/op 1016 allocs/op
BenchmarkTop10000of1000000Scores-4 1 3429557237 ns/op 1882008 B/op 19023 allocs/op
PASS
ok github.com/blevesearch/bleve/search/collector 16.316s
heap:
go test -run=xxx -bench=. -benchmem
BenchmarkTop10of10000Scores-4 5000 341064 ns/op 2552 B/op 27 allocs/op
BenchmarkTop100of10000Scores-4 3000 501922 ns/op 19744 B/op 117 allocs/op
BenchmarkTop1000of10000Scores-4 1000 1759088 ns/op 184744 B/op 1017 allocs/op
BenchmarkTop10000of100000Scores-4 50 25954696 ns/op 1939608 B/op 19024 allocs/op
BenchmarkTop10of100000Scores-4 500 3814933 ns/op 2560 B/op 27 allocs/op
BenchmarkTop100of100000Scores-4 300 4009369 ns/op 19896 B/op 117 allocs/op
BenchmarkTop1000of100000Scores-4 200 6397276 ns/op 186184 B/op 1017 allocs/op
BenchmarkTop10000of1000000Scores-4 20 81815315 ns/op 1963912 B/op 19024 allocs/op
PASS
ok github.com/blevesearch/bleve/search/collector 14.980s
2017-03-24 09:38:06 -07:00
Marty Schoch
4fe6f97f44
Merge pull request #552 from steveyen/collector-benchmarks
...
more collector benchmarks with larger sizes
2017-03-16 17:07:37 -04:00
Steve Yen
088953fbb6
more collector benchmarks with larger sizes
2017-03-16 13:46:28 -07:00
Marty Schoch
4eee341e04
Merge pull request #551 from mschoch/fix-query-string-neg
...
fix query string parsing of numeric ranges with negative value
2017-03-16 11:39:39 -04:00
Marty Schoch
0aab8d7fb9
fix query string parsing of numeric ranges with negative value
...
fixes #550
2017-03-16 11:11:28 -04:00
Marty Schoch
1bcfe4efa1
Merge pull request #546 from sreekanth-cb/store_abort_close
...
Store abort close
2017-03-07 12:35:18 -05:00
Sreekanth Sivasankaran
f759d841c2
Adding guards for config casting.
2017-03-07 22:51:27 +05:30
Sreekanth Sivasankaran
0cdd0b38e2
Merge branch 'store_abort_close' of https://github.com/sreekanth-cb/bleve into store_abort_close
2017-03-07 19:57:16 +05:30
Sreekanth Sivasankaran
e88ff3c60a
Merge branch 'store_abort_close' of https://github.com/sreekanth-cb/bleve into store_abort_close
...
Syntax change for errcheck tool
2017-03-07 19:56:08 +05:30
Sreekanth Sivasankaran
9795e12d27
Merge branch 'store_abort_close' of https://github.com/sreekanth-cb/bleve into store_abort_close
2017-03-07 19:36:13 +05:30
Sreekanth Sivasankaran
ee819f5950
MB-22410 - Configurable forced Store Abort API
...
Adding a configurable forced store close
Bumping the moss store version
2017-03-07 19:33:51 +05:30
Marty Schoch
9bdfb4c6cd
Merge pull request #548 from mschoch/fix-perf-regression
...
fix perf regression, unnecessarily loading backindex
2017-03-04 15:39:00 -05:00
Marty Schoch
bc7d8e3b35
fix perf regression, unnecessarily loading backindex
2017-03-04 15:23:16 -05:00