A modern text indexing library for Go. (this is a mirror of the github repository) http://www.blevesearch.com/
Marty Schoch 2043bb4bf8 fix pagination bug introduced by collector optimization
fixes #378

this bug was introduced by:
f2aba116c4

theory of operation for this collector (top N, skip K)

- collect the highest scoring N+K results
- if K > 0, skip K and return the next N

internal details

- the top N+K are kept in a list
- the list is ordered from lowest scoring (first) to highest scoring (last)
- as a hit comes in, we find where this new hit would fit into this list
- if this caused the list to get too big, trim off the head (lowest scoring hit)
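A minimal Go sketch of the collector described above, for illustration only: the `hit` type, the `topNSkipK` collector and its methods are hypothetical names, not bleve's API. It keeps at most N+K hits in a slice ordered lowest score first, inserts each hit using the `<=` comparison discussed below, trims the head when the list grows too big, and produces results by reading from the end and skipping K.

```go
package main

import "fmt"

// hit is a hypothetical stand-in for a search result: a doc ID and its score.
type hit struct {
	id    string
	score float64
}

// topNSkipK is a hypothetical collector that keeps at most n+k hits, ordered
// from lowest score (first) to highest score (last).
type topNSkipK struct {
	n, k int
	list []hit
}

// collect inserts h before the first kept hit whose score is >= h.score and
// trims the head (lowest-scoring hit) if the list grew past n+k.
func (c *topNSkipK) collect(h hit) {
	pos := len(c.list)
	for i, cur := range c.list {
		if h.score <= cur.score {
			pos = i
			break
		}
	}
	c.list = append(c.list, hit{})
	copy(c.list[pos+1:], c.list[pos:])
	c.list[pos] = h
	if len(c.list) > c.n+c.k {
		c.list = c.list[1:]
	}
}

// results starts at the end (highest score), skips k hits and returns up to n.
func (c *topNSkipK) results() []hit {
	var out []hit
	for i := len(c.list) - 1 - c.k; i >= 0 && len(out) < c.n; i-- {
		out = append(out, c.list[i])
	}
	return out
}

func main() {
	c := &topNSkipK{n: 2, k: 1}
	for _, h := range []hit{{"a", 3}, {"b", 7}, {"c", 1}, {"d", 9}, {"e", 5}} {
		c.collect(h)
	}
	for _, h := range c.results() {
		fmt.Println(h.id, h.score)
	}
}
```

With N=2 and K=1, the list ends up holding `e`, `b`, `d`; the highest hit `d` is skipped and the returned page is `b`, `e`.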

theory of the optimization

- we were not tracking the lowest score in the list
- so a hit scoring lower than the lowest score in a full list was still inserted, only to be trimmed off again immediately
- by keeping track of the lowest score in the list, we can avoid these ops
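The optimization can be sketched as a guard at the top of the insert routine. This is a hypothetical illustration, not bleve's code: once the list is full, `list[0]` holds the lowest kept score, and a hit scoring strictly below it is skipped instead of being inserted and immediately trimmed. The `skipped` counter is an assumed addition that only makes the saved work visible.

```go
package main

import "fmt"

// hit is a hypothetical stand-in for a search result.
type hit struct {
	id    string
	score float64
}

// collector keeps at most size hits, ordered lowest score first.
type collector struct {
	size    int
	list    []hit
	skipped int // hits rejected by the early return
}

func (c *collector) collect(h hit) {
	// Early return: once the list is full, list[0] holds the lowest kept score.
	// A hit scoring below it would be inserted at the head and trimmed right
	// away, so skip the insert/remove work entirely.
	if len(c.list) > 0 && len(c.list) == c.size && h.score < c.list[0].score {
		c.skipped++
		return
	}
	pos := len(c.list)
	for i, cur := range c.list {
		if h.score <= cur.score {
			pos = i
			break
		}
	}
	c.list = append(c.list, hit{})
	copy(c.list[pos+1:], c.list[pos:])
	c.list[pos] = h
	if len(c.list) > c.size {
		c.list = c.list[1:] // trim the lowest-scoring hit
	}
}

func main() {
	c := &collector{size: 2}
	for _, h := range []hit{{"a", 5}, {"b", 8}, {"c", 1}, {"d", 2}, {"e", 9}} {
		c.collect(h)
	}
	fmt.Println(c.list, "skipped:", c.skipped)
}
```

Here `c` and `d` never touch the list, while `e` still displaces the lowest kept hit `a`.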

problem with the optimization
- the optimization worked by returning early
- by returning early there was a subtle change to documents which had the same score
- the reason is that which docs end up in the top N+K changed by returning early
- why was that? docs are coming in, in order by key ascending
- when finding the correct position to insert a hit into the list, we compared scores with <, not <=
- this has the subtle effect that docs with the same score end up in reverse order

for example consider the following in progress list:

doc ids [   c    a    b  ]
scores  [   1    5    9  ]

if we now see doc d with score 5, we get:

doc ids [   c    a    d    b  ]
scores  [   1    5    5    9  ]

While that appears to be in order (a, d), it is actually reverse order, because when we
produce the top N we start at the end.

theory of the fix

- previous pagination depended on later hits with the same score "bumping" earlier
hits with the same score off the bottom of the list
- however, if we change the logic to <= instead of <, now the list in the previous
example would look like:

doc ids [   c    d    a    b  ]
scores  [   1    5    5    9  ]

- this small change means that, among hits with the same score, earlier hits (lower IDs)
now rank higher, so we no longer depend on later hits bumping things down, which means
returning early is valid

NOTE: this does depend on the hits coming back in order by ID. This is not
something strictly guaranteed, but it was the same assumption that allowed the
original behavior.

This also has the side-effect that 2 hits with the same score come back in
ascending ID order, which is somehow more pleasing to me than reverse order.
2016-06-01 11:35:18 -04:00
analysis fix ineffectual assignments 2016-04-02 22:42:56 -04:00
config adding config options to use cellar 2016-05-26 17:33:42 -04:00
docs update travis to build with deps specified in manifest 2016-04-08 17:06:25 -04:00
document add support for numPlainTextBytesIndexed metric 2016-03-05 14:05:08 -05:00
http try to close indexes at end of http handler test 2016-02-09 16:26:03 -05:00
index enable mossStore as configurable lower-level store 2016-05-26 13:33:22 -07:00
numeric_util simplify prefix coding 2015-10-12 14:53:17 -07:00
registry a few more gofmt simplifications 2016-04-02 22:48:00 -04:00
search fix pagination bug introduced by collector optimization 2016-06-01 11:35:18 -04:00
test fix pagination bug introduced by collector optimization 2016-06-01 11:35:18 -04:00
utils more enhancements to bleve_query 2015-12-16 14:52:33 -05:00
vendor enable mossStore as configurable lower-level store 2016-05-26 13:33:22 -07:00
.gitignore add initial manifest 2016-04-08 15:03:05 -04:00
.travis.yml update travis to build with deps specified in manifest 2016-04-08 17:06:25 -04:00
config.go simplify date parsing in queries, add date to query string 2016-04-22 17:12:10 -04:00
doc.go apply doc fix patch from rakoo 2014-09-07 09:09:47 -04:00
error.go remove temporary error and replace with permanent check 2016-02-03 10:23:49 -05:00
examples_test.go fix some test failures on windows 2016-02-09 13:33:11 -05:00
index_alias_impl_test.go fix ineffectual assignments 2016-04-02 22:42:56 -04:00
index_alias_impl.go more defensive merging of errors in search result status 2016-03-10 16:04:05 -05:00
index_alias.go Fix typos in comments and strings 2014-12-18 18:43:12 +01:00
index_impl.go Load the document only once for both fields and highlighter 2016-04-28 11:12:33 -07:00
index_meta_test.go fix issues identified by errcheck 2015-04-07 15:39:56 -04:00
index_meta.go Minor fix to ensure full index path exists 2016-03-13 21:44:21 -06:00
index_stats.go moved fields requiring 64-bit alignment to start of struct 2016-03-20 10:38:28 -04:00
index_test.go fix data race in bleve batch reuse 2016-04-08 15:32:13 -04:00
index.go add support for gathering stats via map for easier consumption 2016-03-07 18:37:46 -05:00
LICENSE adding license file 2014-04-17 17:03:15 -04:00
mapping_document.go export the Validate method on mapping objects 2016-03-28 17:14:41 -04:00
mapping_field.go add support for toggling Store/Index Dynamic in IndexMapping 2016-03-08 07:58:29 -05:00
mapping_index.go gofmt simplifications 2016-04-02 21:54:33 -04:00
mapping_test.go fix handling of dynamic property in mappings of sub-documents 2016-03-11 12:18:24 -05:00
query_bool_field.go Implemented boolean field support 2016-01-11 17:18:03 -08:00
query_boolean.go boolean query defaults to minShould of 0 2016-01-12 16:30:10 -05:00
query_conjunction.go simplify date parsing in queries, add date to query string 2016-04-22 17:12:10 -04:00
query_date_range.go simplify date parsing in queries, add date to query string 2016-04-22 17:12:10 -04:00
query_disjunction.go simplify date parsing in queries, add date to query string 2016-04-22 17:12:10 -04:00
query_docid.go query_docid: add DocIDQuery to filter by document identifiers 2015-11-04 18:41:16 +01:00
query_fuzzy.go Fix some typos 2016-01-15 05:46:27 +07:00
query_match_all.go fix marshaling of MatchAll queries 2016-04-07 18:20:35 -04:00
query_match_none.go fix marshaling of MatchNone queries 2016-04-16 20:51:27 -04:00
query_match_phrase.go mapping_field: document IncludeTermVectors 2015-11-19 15:38:16 +01:00
query_match.go clean up logging to use package level *log.Logger 2014-12-28 12:14:48 -08:00
query_numeric_range.go Update NumericRangeQuery comments 2015-11-12 22:16:10 +01:00
query_phrase.go mapping_field: document IncludeTermVectors 2015-11-19 15:38:16 +01:00
query_prefix.go major refactor of kvstore/index internals, see below 2014-09-12 17:21:35 -04:00
query_regexp.go more correct fix, handles case where validate is called 2016-01-21 17:26:24 -05:00
query_string_parser_test.go simplify date parsing in queries, add date to query string 2016-04-22 17:12:10 -04:00
query_string_parser.go simplify date parsing in queries, add date to query string 2016-04-22 17:12:10 -04:00
query_string.go simplify date parsing in queries, add date to query string 2016-04-22 17:12:10 -04:00
query_string.nex add support for wildcard and regexp queries to query string 2016-04-08 15:56:02 -04:00
query_string.nn.go add support for wildcard and regexp queries to query string 2016-04-08 15:56:02 -04:00
query_string.y simplify date parsing in queries, add date to query string 2016-04-22 17:12:10 -04:00
query_string.y.go simplify date parsing in queries, add date to query string 2016-04-22 17:12:10 -04:00
query_term.go major refactor of kvstore/index internals, see below 2014-09-12 17:21:35 -04:00
query_test.go boolean query defaults to minShould of 0 2016-01-12 16:30:10 -05:00
query_wildcard.go added regexp and wildcard queries 2015-03-11 16:57:22 -04:00
query.go simplify date parsing in queries, add date to query string 2016-04-22 17:12:10 -04:00
README.md adding goreportcard 2016-04-02 21:01:23 -04:00
reflect.go typo in lookupPropertyPathPart() func name 2015-11-23 09:27:22 -08:00
search_test.go parse search results by converting strings back to errors 2016-04-26 17:56:37 -04:00
search.go parse search results by converting strings back to errors 2016-04-26 17:56:37 -04:00

bleve

Join the chat at https://gitter.im/blevesearch/bleve

modern text indexing in Go - blevesearch.com

Try out bleve live by searching our wiki.

Features

  • Index any Go data structure (including JSON)
  • Intelligent defaults backed up by powerful configuration
  • Supported field types:
    • Text, Numeric, Date
  • Supported query types:
    • Term, Phrase, Match, Match Phrase, Prefix
    • Conjunction, Disjunction, Boolean
    • Numeric Range, Date Range
    • Simple query syntax for human entry
  • tf-idf Scoring
  • Search result match highlighting
  • Supports Aggregating Facets:
    • Terms Facet
    • Numeric Range Facet
    • Date Range Facet

Discussion

Discuss usage and development of bleve in the Google group.

Indexing

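	message := struct {
		Id   string
		From string
		Body string
	}{
		Id:   "example",
		From: "marty.schoch@gmail.com",
		Body: "bleve indexing is easy",
	}

	// create a new index on disk using the default mapping
	mapping := bleve.NewIndexMapping()
	index, err := bleve.New("example.bleve", mapping)
	if err != nil {
		panic(err)
	}
	defer index.Close()

	// index the message, using its Id as the document ID
	if err := index.Index(message.Id, message); err != nil {
		panic(err)
	}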

Querying

	index, err := bleve.Open("example.bleve")
	if err != nil {
		panic(err)
	}
	query := bleve.NewQueryStringQuery("bleve")
	searchResult, err := index.Search(bleve.NewSearchRequest(query))
	if err != nil {
		panic(err)
	}

License

Apache License Version 2.0