A modern text indexing library for go. (this is a mirror of the github repository) http://www.blevesearch.com/
Marty Schoch 1a59a1bb99 attempt to fix core reference counting issues
Observed problem:

Persisted index state (in root bolt) would contain index snapshots which
pointed to index files that did not exist.

Debugging this uncovered two main problems:

1.  At the end of persisting a snapshot, the persister creates a new index
snapshot with the SAME epoch as the current root, only it replaces in-memory
segments with the new disk based ones.  This is problematic because reference
counting an index segment triggers "eligible for deletion".  And eligible for
deletion is keyed by epoch.  So having two separate instances going by the same
epoch is problematic.  Specifically, one of them gets to 0 before the other,
and we wrongly conclude it's eligible for deletion, when in fact the "other"
instance with same epoch is actually still in use.

To address this problem, we have modified the behavior of the persister.  Now,
upon completion of persistence, ONLY if new files were actually created do we
proceed to introduce a new snapshot.  AND, this new snapshot now gets its own
brand new epoch.  BOTH of these are important because since the persister now
also introduces a new epoch, it will see this epoch again in the future AND be
expected to persist it.  That is OK (mostly harmless), but we cannot allow it
to form a loop.  Checking that new files were actually introduced is what
short-circuits the potential loop.  The new epoch introduced by the persister,
if seen again will not have any new segments that actually need persisting to
disk, and the cycle is stopped.
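
The shared-epoch failure described in item 1 can be reproduced in a few lines.
This is a minimal sketch, not scorch's actual code: the `snapshot` and
`tracker` types and their methods are hypothetical stand-ins. It assumes only
what the text above states, namely that each snapshot instance has its own
reference count and that "eligible for deletion" is keyed by epoch.

```go
package main

import "fmt"

// snapshot is a hypothetical stand-in for an index snapshot: it carries an
// epoch and its own reference count.
type snapshot struct {
	epoch uint64
	refs  int
}

// tracker records which epochs are considered eligible for deletion.
// As in the behavior described above, it is keyed by epoch only.
type tracker struct {
	eligible map[uint64]bool
}

func newTracker() *tracker {
	return &tracker{eligible: make(map[uint64]bool)}
}

// decRef drops one reference; when the count reaches zero, the snapshot's
// epoch is marked eligible for deletion.
func (t *tracker) decRef(s *snapshot) {
	s.refs--
	if s.refs == 0 {
		t.eligible[s.epoch] = true
	}
}

// sharedEpochBug models the old persister: the persisted snapshot reuses the
// epoch of the in-memory one. It reports whether the epoch was marked
// eligible while the persisted instance was still referenced.
func sharedEpochBug() bool {
	t := newTracker()
	inMemory := &snapshot{epoch: 7, refs: 1}
	// Same epoch, separate instance.
	persisted := &snapshot{epoch: 7, refs: 1}
	t.decRef(inMemory)
	return t.eligible[persisted.epoch] && persisted.refs > 0
}

// freshEpochFix models the new persister: the persisted snapshot gets its
// own epoch, so releasing the in-memory instance does not affect it.
func freshEpochFix() bool {
	t := newTracker()
	inMemory := &snapshot{epoch: 7, refs: 1}
	// Brand new epoch for the persisted instance.
	persisted := &snapshot{epoch: 8, refs: 1}
	t.decRef(inMemory)
	return t.eligible[persisted.epoch] && persisted.refs > 0
}

func main() {
	fmt.Println("shared epoch wrongly eligible:", sharedEpochBug())
	fmt.Println("fresh epoch wrongly eligible:", freshEpochFix())
}
```

With a shared epoch, releasing one instance marks the epoch eligible even
though the other instance is still in use; with a fresh epoch it does not.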

2.  The implementation of NumSnapshotsToKeep, and related code to delete old
snapshots from the root bolt also contains problems.  Specifically, the
determination of which snapshots to keep vs delete did not consider which ones
were actually persisted.  So, let's say you had set NumSnapshotsToKeep to 3, if
the introducer gets 3 snapshots ahead of the persister, what can happen is that
the three snapshots we choose to keep are all in memory.  We now wrongly delete
all of the snapshots from the root bolt.  But it gets worse, in this instant of
time, we now have files on disk that nothing in the root bolt points to, so we
also go ahead and delete those files.  Those files were still being referenced
by the in-memory snapshots.  But, now even if they get persisted to disk, they
simply have references to non-existent files.  Opening up one of these indexes
results in lost data (often everything).

To address this problem, we made a large change to the way this section of code
operates.  First, we now start with a list of all epochs actually persisted in
the root bolt.  Second, we set aside NumSnapshotsToKeep of these snapshots to
keep.  Third, anything else in the eligibleForRemoval list will be deleted.  I
suspect this code is slower and less elegant, but I think it is more correct.
Also, previously NumSnapshotsToKeep defaulted to 0, I have now defaulted it to
1, which feels like saner out-of-the-box behavior (though it's debatable if the
original intent was perhaps instead for "extra" snapshots to keep, but with the
variable named as it is, 1 makes more sense to me)
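
The three steps above can be sketched as follows.  This is an illustration
under stated assumptions, not the code in this commit: `epochsToRemove` is a
hypothetical helper, epochs are plain `uint64` values, and "keep" is assumed to
mean the highest (most recent) persisted epochs, which the text above does not
spell out.

```go
package main

import (
	"fmt"
	"sort"
)

// epochsToRemove returns the epochs from eligible that may be deleted after
// protecting up to numToKeep persisted epochs.
// Assumption: the epochs to keep are the highest (most recent) ones.
func epochsToRemove(persisted, eligible []uint64, numToKeep int) []uint64 {
	// Step 1: start from the epochs actually persisted, newest first.
	sorted := append([]uint64(nil), persisted...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] > sorted[j] })

	// Step 2: set aside up to numToKeep of them.
	if numToKeep < 0 {
		numToKeep = 0
	}
	if numToKeep > len(sorted) {
		numToKeep = len(sorted)
	}
	keep := make(map[uint64]bool, numToKeep)
	for _, e := range sorted[:numToKeep] {
		keep[e] = true
	}

	// Step 3: anything else on the eligible list can be deleted.
	var remove []uint64
	for _, e := range eligible {
		if !keep[e] {
			remove = append(remove, e)
		}
	}
	return remove
}

func main() {
	persisted := []uint64{3, 4, 5}
	eligible := []uint64{3, 4, 5}
	fmt.Println(epochsToRemove(persisted, eligible, 1))
}
```

Because the kept set is drawn only from persisted epochs, snapshots that exist
only in memory can no longer crowd out the ones the root bolt relies on.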

Other minor changes included in this change:

- Location of 'nextSnapshotEpoch', 'eligibleForRemoval', and
'ineligibleForRemoval' members of Scorch struct were moved into the
paragraph with 'rootLock' to clarify that you must hold the lock to access them.

- TestBatchRaceBug260 was updated to properly Close() the index; not closing it
led to occasional test failures.
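
The struct layout convention from the first item above, grouping guarded fields
directly under their lock, might look like the sketch below.  `scorchLike` and
its methods are hypothetical; only the field names are taken from this commit
message, and their types are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// scorchLike is a hypothetical, trimmed-down struct, not the real Scorch type.
type scorchLike struct {
	path string // set once at construction, read without the lock

	// The fields in this paragraph are protected by rootLock.
	rootLock             sync.RWMutex
	nextSnapshotEpoch    uint64
	eligibleForRemoval   []uint64
	ineligibleForRemoval map[string]bool
}

// allocEpoch hands out the next epoch while holding the write lock.
func (s *scorchLike) allocEpoch() uint64 {
	s.rootLock.Lock()
	defer s.rootLock.Unlock()
	epoch := s.nextSnapshotEpoch
	s.nextSnapshotEpoch++
	return epoch
}

// markEligible appends an epoch to the removal list under the write lock.
func (s *scorchLike) markEligible(epoch uint64) {
	s.rootLock.Lock()
	defer s.rootLock.Unlock()
	s.eligibleForRemoval = append(s.eligibleForRemoval, epoch)
}

// eligibleCount reads the list length under the read lock.
func (s *scorchLike) eligibleCount() int {
	s.rootLock.RLock()
	defer s.rootLock.RUnlock()
	return len(s.eligibleForRemoval)
}

// allocConcurrently allocates and records n epochs from n goroutines.
func allocConcurrently(s *scorchLike, n int) {
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			s.markEligible(s.allocEpoch())
		}()
	}
	wg.Wait()
}

func main() {
	s := &scorchLike{path: "example.bleve", nextSnapshotEpoch: 1}
	allocConcurrently(s, 100)
	fmt.Println(s.allocEpoch(), s.eligibleCount())
}
```

Keeping the lock and the fields it guards in one block makes the locking rule
visible at the declaration site instead of relying on comments elsewhere.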
2018-01-03 12:05:00 -05:00
analysis Fix test 2017-06-22 18:56:28 -04:00
cmd/bleve added hyphen in query sort by option 2017-05-18 11:27:51 -07:00
config remove forestdb from bleve 2017-03-30 12:27:23 -04:00
docs nicer formatting of license header 2016-10-02 10:13:14 -04:00
document remove unused Document.Number property 2017-08-24 16:21:26 -07:00
geo fix geo point distance search 2017-04-27 17:28:07 -04:00
http disable http unit test which relied on debug functionality 2017-12-11 15:38:44 -05:00
index attempt to fix core reference counting issues 2018-01-03 12:05:00 -05:00
mapping working in-memory version 2017-11-29 11:33:35 -05:00
numeric add experimental support for indexing/query geo points 2017-03-24 17:22:21 -07:00
registry optimize FacetsBuilder with cached fields & avoid some allocs 2016-10-25 15:34:48 -07:00
search phrase searcher don't allow advance after end 2017-12-27 10:24:33 -08:00
test scorch conjuncts match phrase test passes 2017-12-23 09:19:40 -08:00
vendor try newer version of bolt (seeing random crashes on travis) 2017-12-11 22:09:26 -05:00
.gitignore initial refactor of query into separate package 2016-09-29 14:54:16 -04:00
.travis.yml travis: update go versions 2017-09-12 10:56:33 +02:00
config_app.go nicer formatting of license header 2016-10-02 10:13:14 -04:00
config_disk.go nicer formatting of license header 2016-10-02 10:13:14 -04:00
config.go fix up issues to get all bleve unit tests passing for scorch 2017-12-11 15:47:41 -05:00
CONTRIBUTING.md adding CONTRIBUTING.md to repo 2016-06-26 09:48:43 -04:00
doc.go nicer formatting of license header 2016-10-02 10:13:14 -04:00
error.go nicer formatting of license header 2016-10-02 10:13:14 -04:00
examples_test.go additional golint cleanups 2016-10-02 12:00:01 -04:00
index_alias_impl_test.go simplified MultiSearch requires that indexes honor context deadlines 2016-11-03 16:44:20 -07:00
index_alias_impl.go fix race condition in incorrectly shared state in MultiSearch 2017-04-06 17:49:33 -04:00
index_alias.go nicer formatting of license header 2016-10-02 10:13:14 -04:00
index_impl.go Add new IndexAdvanced function 2017-04-12 00:31:51 +02:00
index_meta_test.go nicer formatting of license header 2016-10-02 10:13:14 -04:00
index_meta.go nicer formatting of license header 2016-10-02 10:13:14 -04:00
index_stats.go nicer formatting of license header 2016-10-02 10:13:14 -04:00
index_test.go attempt to fix core reference counting issues 2018-01-03 12:05:00 -05:00
index.go add support for BleveType() alternative for type detection 2017-05-19 09:22:12 -04:00
LICENSE adding license file 2014-04-17 17:03:15 -04:00
mapping.go add experimental support for indexing/query geo points 2017-03-24 17:22:21 -07:00
query.go introduce new query TermRange 2017-03-31 22:04:00 -04:00
README.md Apache2 license badge 2017-02-13 16:09:54 -08:00
search_test.go clean up of unit test. 2017-02-02 23:33:26 +05:30
search.go Adding a new bucket setter method for dateTimeRange 2017-06-12 15:53:27 +05:30

bleve


modern text indexing in go - blevesearch.com

Try out bleve live by searching the bleve website.

Features

  • Index any go data structure (including JSON)
  • Intelligent defaults backed up by powerful configuration
  • Supported field types:
    • Text, Numeric, Date
  • Supported query types:
    • Term, Phrase, Match, Match Phrase, Prefix
    • Conjunction, Disjunction, Boolean
    • Numeric Range, Date Range
    • Simple query syntax for human entry
  • tf-idf Scoring
  • Search result match highlighting
  • Supports Aggregating Facets:
    • Terms Facet
    • Numeric Range Facet
    • Date Range Facet
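
To give a feel for what tf-idf scoring means, here is a small standard-library
sketch of the textbook formula (term frequency times the log of inverse
document frequency).  It is an assumption made for illustration only: the
`tfidf` function is hypothetical and is not bleve's scorer, which may differ in
details such as normalization.

```go
package main

import (
	"fmt"
	"math"
	"strings"
)

// tfidf scores term against doc, given the whole corpus.
//   tf  = occurrences of term in doc / number of tokens in doc
//   idf = ln(N / df), where df is the number of documents containing term
// Tokens are lowercased and split on whitespace; term should be lowercase.
func tfidf(term, doc string, corpus []string) float64 {
	tokens := strings.Fields(strings.ToLower(doc))
	if len(tokens) == 0 {
		return 0
	}
	count := 0
	for _, tok := range tokens {
		if tok == term {
			count++
		}
	}
	df := 0
	for _, d := range corpus {
		for _, tok := range strings.Fields(strings.ToLower(d)) {
			if tok == term {
				df++
				break
			}
		}
	}
	if df == 0 {
		return 0
	}
	tf := float64(count) / float64(len(tokens))
	idf := math.Log(float64(len(corpus)) / float64(df))
	return tf * idf
}

func main() {
	corpus := []string{
		"bleve indexing is easy",
		"indexing text in go",
		"go is fun",
	}
	for _, term := range []string{"bleve", "indexing"} {
		fmt.Printf("%s: %.4f\n", term, tfidf(term, corpus[0], corpus))
	}
}
```

A term that appears in fewer documents ("bleve") scores higher than one that
appears in many ("indexing"), even when both occur once in the same document.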

Discussion

Discuss usage and development of bleve in the Google group.

Indexing

message := struct{
	Id   string
	From string
	Body string
}{
	Id:   "example",
	From: "marty.schoch@gmail.com",
	Body: "bleve indexing is easy",
}

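// Create a new on-disk index using the default mapping.
mapping := bleve.NewIndexMapping()
index, err := bleve.New("example.bleve", mapping)
if err != nil {
	panic(err)
}
// Index the document under its Id and check the returned error.
if err := index.Index(message.Id, message); err != nil {
	panic(err)
}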

Querying

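index, err := bleve.Open("example.bleve")
if err != nil {
	panic(err)
}
query := bleve.NewQueryStringQuery("bleve")
searchRequest := bleve.NewSearchRequest(query)
searchResult, err := index.Search(searchRequest)
fmt.Println(searchResult, err) // needs "fmt"; handle err properly in real code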

License

Apache License Version 2.0