74 Commits

Author SHA1 Message Date
Steve Yen
7ce7d98cba upside_down merge dictionary deltas before using batch.Merge()
This change performs more dictionary delta incr/decr math in
batchRows() instead of in the KVStore ExecuteBatch() machinery.
2016-01-11 16:52:07 -08:00
Steve Yen
94273d5fa9 upside_down process internal rows earlier
With this change, internal rows are processed while we're waiting for
backIndex rows to be retrieved.
2016-01-11 16:25:35 -08:00
Steve Yen
bb5cd8f3d6 upside_down merge backIndexRow concurrently
Previously, the code would gather all the backIndexRows before
processing them.  This change instead merges the backIndexRows
concurrently on the theory that we might as well make progress on
compute & processing tasks while waiting for the rest of the back
index rows to be fetched from the KVStore.
2016-01-10 18:50:42 -08:00
Steve Yen
c3b5246b0c upside_down track analysis time tighter; and comments 2016-01-10 15:36:54 -08:00
Steve Yen
d3dd40d334 upside_down retrieves backindex concurrently with analysis
Start backindex reading concurrently with analysis to try to utilize
more I/O bandwidth.

The analysis time vs indexing time stats tracking are also now "off",
since there's now concurrency between those activities.

One tradeoff is that the lock area in upside_down Batch() is increased
as part of this change.
2016-01-10 15:18:28 -08:00
Steve Yen
860de28a28 fix memory leak by closing batches in batchRows() 2016-01-07 17:59:42 -08:00
Steve Yen
846912d083 upside_down udc.termVectorsFromTokenFreq rows append optimization 2016-01-07 00:48:34 -08:00
Steve Yen
8b980bd2ef firestorm avoid extra goroutine, similar to upside_down 2016-01-07 00:43:27 -08:00
Steve Yen
4eee8821f9 upside_down storeField/indexField append to provided arrays
Taking another optimization from firestorm, upside_down's
storeField()/indexField() funcs now also append() to passed-in arrays
rather than always allocating their own arrays.
2016-01-07 00:13:46 -08:00
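
The append-to-passed-in-arrays idea described in 4eee8821f9 can be sketched roughly as below. This is only an illustration: the `row` type and `appendFieldRows` function are hypothetical stand-ins, not the actual storeField()/indexField() signatures.

```go
package main

import "fmt"

// row is a hypothetical stand-in for an index row; it is not bleve's type.
type row struct {
	field uint16
	term  string
}

// appendFieldRows appends one row per term to the caller's slice and returns
// the (possibly regrown) slice, rather than allocating a new slice per call.
func appendFieldRows(rows []row, field uint16, terms []string) []row {
	for _, t := range terms {
		rows = append(rows, row{field: field, term: t})
	}
	return rows
}

func main() {
	// One backing array, sized up front, is shared across all fields.
	rows := make([]row, 0, 8)
	rows = appendFieldRows(rows, 0, []string{"quick", "brown"})
	rows = appendFieldRows(rows, 1, []string{"fox"})
	fmt.Println(len(rows), cap(rows)) // prints "3 8"
}
```

Because the caller owns the slice, a well-chosen initial capacity means the per-field calls add no allocations at all.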
Steve Yen
82b8b3468e upside_down analysis converts to docIDBytes once 2016-01-06 23:38:02 -08:00
Steve Yen
89d17f01ef analyze locations only if includeTermVectors enabled
With this change, TermLocations are computed and maintained only if
includeTermVectors is enabled, for higher performance.
2016-01-05 12:46:46 -08:00
Marty Schoch
8efbd556a3 fix indexing bug with data coming from arrays
fixes #295
2015-12-21 14:59:32 -05:00
Marty Schoch
30651065e9 fix panic on insufficiently sized buffer
adds test case to reproduce original problem
fixes #264
2015-10-30 18:25:38 -04:00
Marty Schoch
817c317c90 Merge branch 'master' into newkvstore 2015-10-19 12:04:07 -04:00
Marty Schoch
faceecf87b make row buffer size constant/configurable
also handle case where it is insufficiently sized
2015-10-19 12:03:38 -04:00
Marty Schoch
c9471d5739 Merge pull request #244 from kevgs/master
reducing allocation count
2015-10-16 15:51:30 -04:00
Marty Schoch
4c6bc23043 rewrite to keep using same buffer when possible 2015-10-13 14:04:56 -07:00
Marty Schoch
8de860bf12 2 more places that used old Key() 2015-10-13 12:35:08 -07:00
Patrick Mezard
8c928539ee upside_down: no need for a goroutine to enqueue AnalysisWork
It boils down to:
1. client sends some work and a notification channel to a single worker,
   then waits.
2. worker processes the work
3. worker sends the result to the client using the notification channel

I do not see any problem with this, even with unbuffered channels.
2015-10-12 10:42:14 +02:00
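
The three steps listed in 8c928539ee can be sketched as follows. The `analysisWork` struct, the word-count "analysis" and the function names are assumptions made for illustration; only the shape (direct send on an unbuffered channel, reply on a per-request channel) comes from the commit message.

```go
package main

import (
	"fmt"
	"strings"
)

// analysisWork is hypothetical: a document plus the channel to reply on.
type analysisWork struct {
	doc string
	rc  chan int
}

// worker processes each item and sends the result on the item's own channel.
func worker(queue <-chan analysisWork) {
	for w := range queue {
		w.rc <- len(strings.Fields(w.doc))
	}
}

// analyze enqueues the work directly (no helper goroutine) and then waits.
func analyze(queue chan<- analysisWork, doc string) int {
	rc := make(chan int) // unbuffered notification channel
	queue <- analysisWork{doc: doc, rc: rc}
	return <-rc
}

func main() {
	queue := make(chan analysisWork) // unbuffered work queue
	go worker(queue)
	fmt.Println(analyze(queue, "the quick brown fox")) // prints "4"
	close(queue)
}
```

The client blocks on the send until the worker receives, and the worker blocks on the reply until the client reads it, so neither side needs buffering or an extra goroutine.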
Marty Schoch
0f05d1d3ca Merge branch 'master' into newkvstore 2015-10-09 10:33:41 -04:00
Patrick Mezard
aee82f8b49 upside_down: simplify return code in batchRows() 2015-10-09 09:57:12 +02:00
Marty Schoch
e28eb749d7 bump up buffer size 2015-10-06 16:45:38 -04:00
Marty Schoch
71cbb13e07 modify code to reuse buffer for kv generation 2015-10-05 17:49:50 -04:00
Kosov Eugene
a61c350888 reducing allocation count 2015-10-05 22:57:10 +03:00
Marty Schoch
d06b526cbf more refactoring 2015-09-28 16:50:27 -04:00
Marty Schoch
900f1b4a67 major kvstore interface and impl overhaul
clarified the interface contract
2015-09-23 11:25:47 -07:00
Marty Schoch
dbb93b75a4 refactoring to allow pluggable index encodings
this lays the foundation for supporting the new firestorm
indexing scheme.  i'm merging these changes ahead of
the rest of the firestorm branch so i can continue
to make changes to the analysis pipeline in parallel
2015-09-02 13:12:08 -04:00
Marty Schoch
3682c25467 update to correctly work with composite fields
also updated search results to return array positions
2015-07-31 11:16:11 -04:00
Marty Schoch
c1c4941dde Merge branch 'feature/term_vector' of https://github.com/tukdesk/bleve into tukdesk-feature/term_vector 2015-07-29 14:31:15 -04:00
Marty Schoch
7be7ecdf8e fix batch indexing bug, incremented docCount before commit
fixes #211
2015-06-08 14:14:05 -04:00
dtynn
b4f7496031 update the index format version number 2015-05-18 15:16:35 +08:00
dtynn
89dc2c22bc update TermVector 2015-05-17 13:07:14 +08:00
Marty Schoch
8f70def63b properly use the stored array positions when loading a document
fixes #205
2015-05-15 15:47:54 -04:00
Marty Schoch
328bc73ed0 clarify Batch is not threadsafe in docs
in some limited cases we can detect unsafe usage
in these cases, do not trip over ourselves and panic
instead return a strongly typed error upside_down.UnsafeBatchUseDetected
also, introduced Batch.Reset() to allow batch reuse
this is currently still experimental
closes #195
2015-05-15 15:04:52 -04:00
Marty Schoch
57cd67fa88 fix data race on index metadata (docCount)
closes #198
2015-05-08 08:07:20 -04:00
Marty Schoch
a9c07acbfa refactor of kvstore api to support native merge in rocksdb
refactor to share code in emulated batch
refactor to share code in emulated merge
refactor index kvstore benchmarks to share more code
refactor index kvstore benchmarks to be more repeatable
2015-04-24 17:13:50 -04:00
Marty Schoch
f1ec73e764 fix issues identified by errcheck
part of #169
2015-04-07 13:26:54 -04:00
Marty Schoch
522f9d5cc7 significant change to index format, support dictionary rows
this introduces disk format v4
now the summary rows for a term are stored in their own
"dictionary row" format, previously the same information
was stored in special term frequency rows
this now allows us to easily iterate all the terms for a field
in sorted order (useful for many other fuzzy data structures)

at the top-level of bleve you can now browse terms within a field
using the following api on the Index interface:

  FieldDict(field string) (index.FieldDict, error)
  FieldDictRange(field string, startTerm []byte, endTerm []byte) (index.FieldDict, error)
  FieldDictPrefix(field string, termPrefix []byte) (index.FieldDict, error)

fixes #127
2015-03-10 16:22:19 -04:00
Marty Schoch
a2ad7634f2 update term freq rows to use varint where possible
benchmark                                  old ns/op  new ns/op    delta
BenchmarkLevelDBIndexing1Workers             1138292     657901  -42.20%
BenchmarkLevelDBIndexing2Workers             1619323     647628  -60.01%
BenchmarkLevelDBIndexing4Workers             1172845     636478  -45.73%
BenchmarkLevelDBIndexing1Workers10Batch    465556545  448153394   -3.74%
BenchmarkLevelDBIndexing2Workers10Batch    504203911  449657355  -10.82%
BenchmarkLevelDBIndexing4Workers10Batch    510766435  439839335  -13.89%
BenchmarkLevelDBIndexing1Workers100Batch   307657846  268976464  -12.57%
BenchmarkLevelDBIndexing2Workers100Batch   302257400  269110215  -10.97%
BenchmarkLevelDBIndexing4Workers100Batch   305320485  259084902  -15.14%
BenchmarkLevelDBIndexing1Workers1000Batch  301320576  258070231  -14.35%
BenchmarkLevelDBIndexing2Workers1000Batch  334174454  261175641  -21.84%
BenchmarkLevelDBIndexing4Workers1000Batch  267732436  261461739   -2.34%

closes #165
2015-03-06 13:00:53 -05:00
Marty Schoch
c566d34264 bump index format version number, start checking version on open 2015-02-17 17:16:31 +05:30
Marty Schoch
ba978ea27e improving log messages 2015-01-16 14:07:47 -05:00
Silvan Jegen
ef18dfe4cd Fix typos in comments and strings 2014-12-18 18:43:12 +01:00
Sergey Avseyev
a8351be5a6 Update protobuf imports 2014-12-10 01:24:59 +03:00
Marty Schoch
c7443fe52b refactored API a bit
more things can return error now
in a couple of places we had to swallow errors because they didn't
fit the existing API.  in these cases and proactively in a few
others we now return error as well.

also the batch API has been updated to allow performing
set/delete internal within the batch
2014-10-31 09:40:23 -04:00
Marty Schoch
64b0066121 added support for tracking index stats and exposing via expvar
closes #83
2014-10-02 11:12:49 -07:00
Marty Schoch
97902e2619 text analysis now moved out of index write lock onto goroutine
1. text analysis is now done before the write lock is acquired
2. there is now a pool of analysis workers
3. the size of this pool is configurable
4. this allows for documents in a batch to be analyzed concurrently

as a part of benchmarking these changes i've also introduced a new
null storage implementation.  this should never be used, as it
does not actually build an index.  it does however let us go
through all the normal indexing machinery, without incurring
any indexing I/O.  this is very helpful in measuring improvements
made to the text analysis pipeline, which are often overshadowed
by indexing times in benchmarks actually building an index.
2014-09-24 08:13:14 -04:00
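
The four points in 97902e2619 can be illustrated with a small worker-pool sketch. Everything here is an assumption for illustration: `analyzeBatch`, the whitespace tokenizer, and the map-plus-mutex "index" are stand-ins, not the project's code.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// analyzeBatch tokenizes docs concurrently with a pool of `workers`
// goroutines (workers must be at least 1). Each worker writes only to its
// own result slot, so no locking is needed during analysis.
func analyzeBatch(docs []string, workers int) [][]string {
	results := make([][]string, len(docs))
	jobs := make(chan int)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range jobs {
				results[i] = strings.Fields(strings.ToLower(docs[i]))
			}
		}()
	}
	for i := range docs {
		jobs <- i
	}
	close(jobs)
	wg.Wait()
	return results
}

func main() {
	var mu sync.Mutex          // stands in for the index write lock
	index := map[string]int{} // stands in for the index itself

	// Analysis runs before the write lock is taken.
	analyzed := analyzeBatch([]string{"Hello World", "hello bleve"}, 2)

	mu.Lock()
	for _, tokens := range analyzed {
		for _, t := range tokens {
			index[t]++
		}
	}
	mu.Unlock()
	fmt.Println(index["hello"]) // prints "2"
}
```

The lock is held only for the cheap update step, while the CPU-heavy tokenizing is spread across the pool.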
Marty Schoch
198ca1ad4d major refactor of kvstore/index internals, see below
In the index/store package
introduce KVReader
  creates snapshot
  all read operations consistent from this snapshot
  must close to release

introduce KVWriter
  only one writer active
  access to all operations
  allows for consistent read-modify-write
  must close to release

introduce AssociativeMerge operation on batch
  allows efficient read-modify-write
  for associative operations
  used to consolidate updates to the term summary rows
  saves 1 set and 1 get op per shared instance of term in field

In the index package
introduced an IndexReader
  exposes a consistent snapshot of the index for searching

At top level
  All searches now operate on a consistent snapshot of the index
2014-09-12 17:21:35 -04:00
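
The AssociativeMerge idea introduced in 198ca1ad4d can be sketched as below, under the assumption that the merged values are simple counters. The `batch` type, its methods, and the key format are hypothetical; a plain map stands in for the KV store.

```go
package main

import "fmt"

// batch is a hypothetical write batch that accumulates counter deltas.
type batch struct {
	merges map[string]int64
}

func newBatch() *batch { return &batch{merges: map[string]int64{}} }

// Merge records a delta for key. Because addition is associative, deltas
// for the same key can be combined in memory before touching the store.
func (b *batch) Merge(key string, delta int64) { b.merges[key] += delta }

// execute applies each combined delta with one read-modify-write per key.
func (b *batch) execute(store map[string]int64) {
	for k, d := range b.merges {
		store[k] += d
	}
}

func main() {
	store := map[string]int64{"d:body:fox": 5}
	b := newBatch()
	b.Merge("d:body:fox", 1)
	b.Merge("d:body:fox", 1)
	b.Merge("d:body:fox", -1)
	b.execute(store)
	fmt.Println(store["d:body:fox"]) // prints "6"
}
```

Three updates to the same term summary collapse into a single store operation, which is the saving the commit message describes per shared instance of a term.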
Marty Schoch
9d2187706e another round of golint 2014-09-03 19:53:59 -04:00
Marty Schoch
377ae090d0 additional golint issues resolved 2014-09-03 18:17:26 -04:00
Marty Schoch
d534b0836b converted ALL_CAPS constants to CamelCase 2014-09-03 17:48:40 -04:00