Commit Graph

104 Commits

Author SHA1 Message Date
Marty Schoch
522f9d5cc7 significant change to index format, support dictionary rows
this introduces disk format v4
now the summary rows for a term are stored in their own
"dictionary row" format, previously the same information
was stored in special term frequency rows
this now allows us to easily iterate all the terms for a field
in sorted order (useful for many other fuzzy data structures)

at the top-level of bleve you can now browse terms within a field
using the following api on the Index interface:

  FieldDict(field string) (index.FieldDict, error)
  FieldDictRange(field string, startTerm []byte, endTerm []byte) (index.FieldDict, error)
  FieldDictPrefix(field string, termPrefix []byte) (index.FieldDict, error)

fixes #127
2015-03-10 16:22:19 -04:00
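
The benefit of keeping a field's terms in sorted order can be sketched as follows. This is a minimal, self-contained illustration and not bleve's implementation: the `fieldDict` type and its `rng` and `prefix` methods are hypothetical names, a sorted in-memory slice stands in for the dictionary rows in the KV store, and treating the end term as inclusive is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// fieldDict is a hypothetical stand-in for the dictionary rows of one field.
// Terms are kept sorted, the way keys are ordered in a KV store.
type fieldDict struct {
	terms []string
}

// newFieldDict copies and sorts the given terms.
func newFieldDict(terms []string) *fieldDict {
	sorted := append([]string(nil), terms...)
	sort.Strings(sorted)
	return &fieldDict{terms: sorted}
}

// rng returns all terms t with start <= t <= end.
// The inclusive end is a choice made for this sketch.
func (d *fieldDict) rng(start, end string) []string {
	lo := sort.SearchStrings(d.terms, start)
	hi := sort.Search(len(d.terms), func(i int) bool { return d.terms[i] > end })
	if lo >= hi {
		return nil
	}
	return d.terms[lo:hi]
}

// prefix returns all terms that begin with p. Because the terms are sorted,
// the matches form one contiguous run starting at the first term >= p.
func (d *fieldDict) prefix(p string) []string {
	lo := sort.SearchStrings(d.terms, p)
	hi := lo
	for hi < len(d.terms) && strings.HasPrefix(d.terms[hi], p) {
		hi++
	}
	return d.terms[lo:hi]
}

func main() {
	d := newFieldDict([]string{"cat", "car", "dog", "cart", "apple"})
	fmt.Println(d.prefix("car"))   // [car cart]
	fmt.Println(d.rng("b", "cat")) // [car cart cat]
}
```

Both lookups start with a binary search and then walk forward, which is the access pattern a sorted key space makes cheap.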
Marty Schoch
4e14f4e4ef change path for forestdb test to correctly cleanup
this is due to forestdb auto-compaction using the provided
path as just the prefix, so if we're not careful we end
up with many stray files lying around
here, we create a sub-directory first, and just nuke the
whole subdir when we're done
2015-03-10 14:05:58 -04:00
Marty Schoch
300ec79c96 first pass at checking errors that were ignored
part of #169
2015-03-06 14:46:29 -05:00
Marty Schoch
a2ad7634f2 update term freq rows to use varint where possible
benchmark                                    old ns/op    new ns/op    delta
BenchmarkLevelDBIndexing1Workers             1138292      657901       -42.20%
BenchmarkLevelDBIndexing2Workers             1619323      647628       -60.01%
BenchmarkLevelDBIndexing4Workers             1172845      636478       -45.73%
BenchmarkLevelDBIndexing1Workers10Batch      465556545    448153394    -3.74%
BenchmarkLevelDBIndexing2Workers10Batch      504203911    449657355    -10.82%
BenchmarkLevelDBIndexing4Workers10Batch      510766435    439839335    -13.89%
BenchmarkLevelDBIndexing1Workers100Batch     307657846    268976464    -12.57%
BenchmarkLevelDBIndexing2Workers100Batch     302257400    269110215    -10.97%
BenchmarkLevelDBIndexing4Workers100Batch     305320485    259084902    -15.14%
BenchmarkLevelDBIndexing1Workers1000Batch    301320576    258070231    -14.35%
BenchmarkLevelDBIndexing2Workers1000Batch    334174454    261175641    -21.84%
BenchmarkLevelDBIndexing4Workers1000Batch    267732436    261461739    -2.34%

closes #165
2015-03-06 13:00:53 -05:00
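
Why varints shrink rows that mostly hold small numbers can be shown with a short sketch using Go's `encoding/binary` package. It assumes, for comparison only, that the earlier rows stored fixed 8-byte integers (the commit message does not say so), and the three sample values are hypothetical. It is not the actual term frequency row layout.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// encodeFixed writes each value as a fixed 8-byte big-endian integer.
func encodeFixed(vals []uint64) []byte {
	buf := make([]byte, 0, 8*len(vals))
	for _, v := range vals {
		var b [8]byte
		binary.BigEndian.PutUint64(b[:], v)
		buf = append(buf, b[:]...)
	}
	return buf
}

// encodeVarint writes each value as an unsigned varint, which uses
// one byte per 7 bits of payload, so small values take few bytes.
func encodeVarint(vals []uint64) []byte {
	buf := make([]byte, 0, binary.MaxVarintLen64*len(vals))
	var tmp [binary.MaxVarintLen64]byte
	for _, v := range vals {
		n := binary.PutUvarint(tmp[:], v)
		buf = append(buf, tmp[:n]...)
	}
	return buf
}

// decodeVarint reads count varints back from buf.
func decodeVarint(buf []byte, count int) ([]uint64, error) {
	out := make([]uint64, 0, count)
	for i := 0; i < count; i++ {
		v, n := binary.Uvarint(buf)
		if n <= 0 {
			return nil, fmt.Errorf("bad varint at value %d", i)
		}
		out = append(out, v)
		buf = buf[n:]
	}
	return out, nil
}

func main() {
	vals := []uint64{3, 1, 300} // hypothetical sample values
	fmt.Println(len(encodeFixed(vals)), len(encodeVarint(vals))) // 24 4
}
```

Smaller values mean fewer bytes written per row, which is one plausible contributor to the indexing speedups reported in the benchmark table.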
Marty Schoch
c566d34264 bump index format version number, start checking version on open 2015-02-17 17:16:31 +05:30
Steve Yen
38ee9be353 added some batch size 1000 microbenchmarks 2015-01-30 15:58:39 -08:00
Steve Yen
7d6a6aeaa8 single append for inmem KVStore batch 2015-01-29 11:14:08 -08:00
Steve Yen
5a30d36b17 cznicb KVStore uses Put() for faster read-modify-write 2015-01-29 11:02:01 -08:00
Steve Yen
b054cddf76 gtreap KVStore does 1 append for batch Set/Delete 2015-01-29 10:49:39 -08:00
Steve Yen
05d222f490 cznicb KVStore batch uses <2 appends per Set/Delete 2015-01-29 10:22:13 -08:00
Steve Yen
c5c59e61f4 make leveldb faster with non-zero sized batch 2015-01-29 10:20:26 -08:00
Steve Yen
1c1774d4ad throw away data even faster in null KVStore 2015-01-29 10:17:21 -08:00
Steve Yen
782ad94e01 added debug tag for metrics KVStore 2015-01-16 11:18:40 -08:00
Marty Schoch
eebc8e7825 more debugging around forestdb snapshots 2015-01-16 14:18:28 -05:00
Marty Schoch
ba978ea27e improving log messages 2015-01-16 14:07:47 -05:00
Marty Schoch
09fe749913 default to autocompaction for forestdb 2015-01-16 13:35:43 -05:00
Steve Yen
12dc2aff93 add go1.4 build tag to cznicb KVStore
This is because github.com/cznic/b depends on sync.Pool.
2015-01-15 15:54:25 -08:00
Steve Yen
11ee0209ad no leading zeros for metrics CSV output 2015-01-15 15:09:53 -08:00
Steve Yen
202191201c added WriteCSV() to metrics KVStore 2015-01-15 14:11:15 -08:00
Steve Yen
9be4e217bc metrics KVStore tracks perf metrics on a wrapped KVStore 2015-01-15 11:42:41 -08:00
Steve Yen
ea0a8657f3 added cznicb in-memory kvstore (no reader isolation) 2015-01-13 17:35:28 -08:00
Marty Schoch
362d240b09 added configurable options to leveldb 2015-01-13 16:24:51 -05:00
Steve Yen
d6e6f655c9 initialize forestdb config if provided 2015-01-13 12:03:24 -08:00
Steve Yen
1fa80ffc40 pass config to forestdb Open() 2015-01-13 11:04:02 -08:00
Steve Yen
3a00a968f2 close levigo's read & write options 2015-01-12 18:42:19 -08:00
Steve Yen
c20726bb93 close levigo.Options when db is closed 2015-01-12 18:42:19 -08:00
Steve Yen
603c3af8bb added gtreap in-memory, copy-on-write KVStore 2015-01-12 11:26:21 -08:00
Marty Schoch
d68c52e621 adding forestdb benchmark 2015-01-12 12:56:37 -05:00
Steve Yen
ae3600aeea expose forestdb rollback methods 2015-01-06 18:59:02 -08:00
Steve Yen
5467e0a385 forestdb registered name fixed 2015-01-06 17:36:05 -08:00
Marty Schoch
38bdcbeb62 update to new forestdb iterator api 2014-12-27 13:15:14 -08:00
Silvan Jegen
ef18dfe4cd Fix typos in comments and strings 2014-12-18 18:43:12 +01:00
Sergey Avseyev
a8351be5a6 Update protobuf imports 2014-12-10 01:24:59 +03:00
Silvan Jegen
412049d63c Remove unneeded import statements 2014-11-29 14:25:24 +01:00
Marty Schoch
6c7237ade9 added test for null kvstore 2014-11-26 15:50:57 -05:00
Marty Schoch
453d4cf770 change to always return stored fields in UTC 2014-11-26 15:36:34 -05:00
Marty Schoch
8ad0f64459 upgrade to current forestdb api 2014-11-25 21:52:35 -05:00
Marty Schoch
d5c1f4a9ab refactored store tests 2014-11-25 21:52:23 -05:00
Silvan Jegen
e3a2d3b58b Remove unneeded else clauses 2014-11-20 20:34:05 +01:00
Marty Schoch
47bc7caec3 added getRollbackID() and rollbackTo() to the ForestDB store 2014-11-04 08:34:49 -05:00
Marty Schoch
3f83149ed3 adding back the forestdb kv store impl 2014-10-31 09:42:32 -04:00
Marty Schoch
c7443fe52b refactored API a bit
more things can return error now
in a couple of places we had to swallow errors because they didn't
fit the existing API.  in these cases, and proactively in a few
others we now return error as well.

also the batch API has been updated to allow performing
set/delete internal within the batch
2014-10-31 09:40:23 -04:00
Marty Schoch
64b0066121 added support for tracking index stats and exposing via expvar
closes #83
2014-10-02 11:12:49 -07:00
Marty Schoch
97902e2619 text analysis now moved out of index write lock onto goroutine
1. text analysis is now done before the write lock is acquired
2. there is now a pool of analysis workers
3. the size of this pool is configurable
4. this allows for documents in a batch to be analyzed concurrently

as a part of benchmarking these changes i've also introduced a new
null storage implementation.  this should never be used, as it
does not actually build an index.  it does however let us go
through all the normal indexing machinery, without incurring
any indexing I/O.  this is very helpful in measuring improvements
made to the text analysis pipeline, which are often overshadowed
by indexing times in benchmarks actually building an index.
2014-09-24 08:13:14 -04:00
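
The four points in the commit above can be sketched as a small worker pool. This is an illustration under stated assumptions, not bleve's code: `analyze`, `analyzeBatch`, the `index` type and its `postings` map are all hypothetical names, and whitespace tokenization stands in for real text analysis.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// analyze is a hypothetical stand-in for text analysis:
// lowercase the text and split it on whitespace.
func analyze(text string) []string {
	return strings.Fields(strings.ToLower(text))
}

// analyzeBatch analyzes docs concurrently using a configurable number of
// workers. results[i] always corresponds to docs[i].
func analyzeBatch(docs []string, workers int) [][]string {
	if workers < 1 {
		workers = 1
	}
	results := make([][]string, len(docs))
	jobs := make(chan int)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range jobs {
				// Each index is handed to exactly one worker,
				// so these writes never overlap.
				results[i] = analyze(docs[i])
			}
		}()
	}
	for i := range docs {
		jobs <- i
	}
	close(jobs)
	wg.Wait()
	return results
}

// index holds its write lock only while applying analyzed results.
type index struct {
	mu       sync.Mutex
	postings map[string][]int
}

func (ix *index) addBatch(docs []string, workers int) {
	analyzed := analyzeBatch(docs, workers) // done before taking the lock
	ix.mu.Lock()
	defer ix.mu.Unlock()
	for docID, terms := range analyzed {
		for _, t := range terms {
			ix.postings[t] = append(ix.postings[t], docID)
		}
	}
}

func main() {
	ix := &index{postings: map[string][]int{}}
	ix.addBatch([]string{"Hello world", "hello bleve"}, 4)
	fmt.Println(ix.postings["hello"]) // [0 1]
}
```

Keeping the CPU-heavy analysis outside the critical section means the write lock is held only for the cheap step of applying results.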
Marty Schoch
198ca1ad4d major refactor of kvstore/index internals, see below
In the index/store package
introduce KVReader
  creates snapshot
  all read operations consistent from this snapshot
  must close to release

introduce KVWriter
  only one writer active
  access to all operations
  allows for consistent read-modify-write
  must close to release

introduce AssociativeMerge operation on batch
  allows efficient read-modify-write
  for associative operations
  used to consolidate updates to the term summary rows
  saves 1 set and 1 get op per shared instance of term in field

In the index package
introduced an IndexReader
  exposes a consistent snapshot of the index for searching

At top level
  All searches now operate on a consistent snapshot of the index
2014-09-12 17:21:35 -04:00
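
The saving described for the AssociativeMerge operation can be sketched as follows. This is a hypothetical illustration, not the index/store API: the `store` and `batch` types, their methods, and the `term:fox` key are invented for the example, and integer addition stands in for whatever associative operation updates the term summary rows.

```go
package main

import "fmt"

// store is a hypothetical in-memory KV store that counts its operations.
type store struct {
	data       map[string]int64
	gets, sets int
}

func (s *store) get(k string) int64 {
	s.gets++
	return s.data[k]
}

func (s *store) set(k string, v int64) {
	s.sets++
	s.data[k] = v
}

// batch accumulates additive deltas per key instead of doing a
// read-modify-write against the store for every update.
type batch struct {
	deltas map[string]int64
}

func newBatch() *batch {
	return &batch{deltas: map[string]int64{}}
}

// merge records a delta without touching the store. Addition is
// associative, so deltas for the same key can be combined up front.
func (b *batch) merge(k string, delta int64) {
	b.deltas[k] += delta
}

// apply performs one get and one set per distinct key in the batch.
func (b *batch) apply(s *store) {
	for k, d := range b.deltas {
		s.set(k, s.get(k)+d)
	}
}

func main() {
	s := &store{data: map[string]int64{"term:fox": 5}}
	b := newBatch()
	for i := 0; i < 3; i++ {
		b.merge("term:fox", 1) // three updates to the same key
	}
	b.apply(s)
	fmt.Println(s.data["term:fox"], s.gets, s.sets) // 8 1 1
}
```

Without merging, three increments of the same key would cost three gets and three sets. With merging, each additional occurrence of a key in the batch costs no further store operations.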
Marty Schoch
7819deb447 added boltdb benchmark, same as others 2014-09-12 16:55:50 -04:00
Marty Schoch
2294b24b9d remove forestdb for now
not any benefit in maintaining this for the time being
2014-09-12 16:55:11 -04:00
Marty Schoch
9d2187706e another round of golint 2014-09-03 19:53:59 -04:00
Marty Schoch
e21935f850 another round of golint cleanup 2014-09-03 19:16:46 -04:00
Marty Schoch
e1b77956d4 more golint cleanups 2014-09-03 18:47:02 -04:00