Commit Graph

775 Commits

Author SHA1 Message Date
Marty Schoch
1e4d637761 adding more benchmarks 2015-09-10 08:01:11 -04:00
Marty Schoch
f74ed6a9ae Merge remote-tracking branch 'origin' into firestorm
catching up with changes from master
2015-09-02 13:29:03 -04:00
Marty Schoch
dbb93b75a4 refactoring to allow pluggable index encodings
this lays the foundation for supporting the new firestorm
indexing scheme.  i'm merging these changes ahead of
the rest of the firestorm branch so i can continue
to make changes to the analysis pipeline in parallel
2015-09-02 13:12:08 -04:00
Marty Schoch
7ad7659ce5 add support for using null kvstore outside of bleve internals 2015-09-02 11:50:06 -04:00
Marty Schoch
07d37ca38a add important rocksdb config options 2015-09-02 11:49:42 -04:00
Marty Schoch
18151862b5 fix go vet issues 2015-08-25 15:13:13 -04:00
Marty Schoch
84811cf5a0 made index type configurable + first version of firestorm 2015-08-25 14:52:42 -04:00
Marty Schoch
3e60ca24ec support using end key on forestdb iterator for term freq lookup
also additional forestdb configs
2015-08-18 16:22:02 -04:00
Marty Schoch
ae19d77b04 updated protobuf defs to be valid 2015-08-17 15:37:13 -04:00
Marty Schoch
1187436e46 changed Stored row Values to also use protobuf 2015-08-17 09:48:40 -04:00
Marty Schoch
8d8a05a842 fix more issues 2015-08-14 16:27:00 -04:00
Marty Schoch
e0802a2b39 fixed the worst of the formatting 2015-08-14 16:17:48 -04:00
Marty Schoch
f4df56eb7c add first draft of firestorm proposal 2015-08-14 16:09:19 -04:00
Marty Schoch
d3dda3d0ea fixup config parsing and add new options 2015-08-12 13:18:23 -04:00
Marty Schoch
01667dfff3 faster protobufs with gogo 2015-08-12 13:18:23 -04:00
Marty Schoch
7df66b4857 fix broken benchmark caused by index row encoding change 2015-08-06 14:48:04 -04:00
Marty Schoch
9db850a53e Merge branch 'fix/MaxVarintLen64' of https://github.com/tukdesk/bleve into tukdesk-fix/MaxVarintLen64 2015-07-31 15:16:16 -04:00
Marty Schoch
3682c25467 update to correctly work with composite fields
also updated search results to return array positions
2015-07-31 11:16:11 -04:00
Marty Schoch
c1c4941dde Merge branch 'feature/term_vector' of https://github.com/tukdesk/bleve into tukdesk-feature/term_vector 2015-07-29 14:31:15 -04:00
Marty Schoch
bf8dcae76b removing build tags 2015-07-28 18:59:10 -04:00
Marty Schoch
1b28f6218b additional row validation 2015-07-13 15:22:54 -04:00
Marty Schoch
17ef48f82a switching back to the canonical goleveldb repo 2015-07-08 12:21:17 -06:00
Marty Schoch
bf80f4628e fix bug in current goleveldb (must copy during iteration)
also changed over to mschoch fork of goleveldb (temporary)

the change to my fork is pending some read-only issues described
here:  https://github.com/syndtr/goleveldb/issues/111

hopefully we can find a path forward, and get that addressed upstream
2015-07-06 18:00:05 -04:00
Marty Schoch
7be7ecdf8e fix batch indexing bug, incremented docCount before commit
fixes #211
2015-06-08 14:14:05 -04:00
Marty Schoch
2768c2da3c fix previous sloppy fix which hadn't been adequately tested 2015-05-27 19:15:55 -07:00
Marty Schoch
201fb91171 fix up to correctly trim off separator
even though it should never be present
2015-05-27 19:10:12 -07:00
Marty Schoch
a58592ceff fix case where NewBackIndexRowKV returns nil, nil
the logic for reading the docID from the keys
in this row relies on the keys NEVER containing
the byte separator character (0xff), this is OK
as we require that all keys be valid utf-8
however, it turns out that in the case where this
rule was violated, we would panic, because we
return nil, nil and later try to print the doc id
2015-05-27 19:04:57 -07:00
dtynn
59c97ae577 use binary.MaxVarintLen64 2015-05-26 15:35:31 +08:00
Marty Schoch
e0887f9113 fix tests which deadlock boltdb due to deferred cleanup
fixes #209
2015-05-21 12:29:31 -04:00
Marty Schoch
a52d3b5c07 put in hack to allow boltdb reader isolation test to pass
in boltdb, long readers *MAY* block a writer.  in particular if
the write requires additional allocation, it must acquire a lock
already held by the reader.  in general this is not a problem
for bleve (though it can affect performance in some cases), but
it is a problem for the reader isolation test.  this commit
adds a hack to try and avoid the need for additional allocation
closes #208
2015-05-21 11:39:59 -04:00
dtynn
b4f7496031 update the index format version number 2015-05-18 15:16:35 +08:00
dtynn
89dc2c22bc update TermVector 2015-05-17 13:07:14 +08:00
Marty Schoch
8f70def63b properly use the stored array positions when loading a document
fixes #205
2015-05-15 15:47:54 -04:00
Marty Schoch
328bc73ed0 clarify Batch is not threadsafe in docs
in some limited cases we can detect unsafe usage
in these cases, do not trip over ourselves and panic
instead return a strongly typed error upside_down.UnsafeBatchUseDetected
also, introduced Batch.Reset() to allow batch reuse
this is currently still experimental
closes #195
2015-05-15 15:04:52 -04:00
Marty Schoch
57cd67fa88 fix data race on index metadata (docCount)
closes #198
2015-05-08 08:07:20 -04:00
Marty Schoch
57358088ec fix row merging bug
trying to be clever, we reused the memory allocated for the left
operand when doing partial merges
this had been tested to be safe, in general.  however, the
implementation was then written such that we always reused
globally defined operands, this meant that we mutated
the operands which were intended to always represent
+1/-1
this then cascades quickly to making increment/decrement
values much larger/smaller than they should be
related to #197
2015-05-06 11:00:04 -04:00
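The aliasing problem described in the commit above can be sketched in a few lines of Go. This is an illustration only, not the code from the commit: the 8-byte little-endian encoding and the names `delta`, `value`, `mergeInPlace` and `mergeCopy` are assumptions made for the example.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// delta encodes a signed increment as 8 bytes (hypothetical encoding).
func delta(v int64) []byte {
	b := make([]byte, 8)
	binary.LittleEndian.PutUint64(b, uint64(v))
	return b
}

// value decodes an operand produced by delta.
func value(b []byte) int64 {
	return int64(binary.LittleEndian.Uint64(b))
}

// mergeInPlace reuses the left operand's memory for the result.
// This saves an allocation but mutates whatever the caller passed as left.
func mergeInPlace(left, right []byte) []byte {
	binary.LittleEndian.PutUint64(left, uint64(value(left)+value(right)))
	return left
}

// mergeCopy writes the result into a fresh buffer and leaves both operands intact.
func mergeCopy(left, right []byte) []byte {
	return delta(value(left) + value(right))
}

func main() {
	plusOne := delta(1) // stands in for a globally shared +1 operand
	total := delta(0)
	total = mergeCopy(total, plusOne)
	total = mergeCopy(total, plusOne)
	fmt.Println(value(total), value(plusOne)) // 2 1

	shared := delta(1)
	mergeInPlace(shared, delta(1))
	fmt.Println(value(shared)) // 2: the shared operand no longer means +1
}
```

Once a shared operand has been overwritten, every later merge that uses it adds the wrong amount, which matches the cascading effect the message describes.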
Marty Schoch
30a0ba1f9b fix bug, dictionary row encoding buffer too small
we incorrectly created a []byte of length 8
but the max for a uvarint is 10
closes #197
2015-05-06 10:04:02 -04:00
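As a minimal sketch of the buffer-size issue in the commit above: an 8-byte buffer is too small for the largest uvarint, and the standard library constant `binary.MaxVarintLen64` (10) covers the worst case. The helper name `encodeCount` is hypothetical and is not taken from the bleve source.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

// encodeCount writes v as a uvarint into a buffer sized for the worst case.
// binary.PutUvarint panics if the buffer is too small, so a length of 8
// would fail for sufficiently large values.
func encodeCount(v uint64) []byte {
	buf := make([]byte, binary.MaxVarintLen64) // 10 bytes
	n := binary.PutUvarint(buf, v)
	return buf[:n]
}

func main() {
	fmt.Println(len(encodeCount(1)))              // 1
	fmt.Println(len(encodeCount(math.MaxUint64))) // 10
}
```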
Steve Yen
e98ae8ab71 update metrics store to latest kvstore api 2015-04-27 11:01:53 -07:00
Marty Schoch
16f538d7b7 close documents returned by iterator before losing their reference
fixes #194
2015-04-24 17:48:21 -04:00
Marty Schoch
b54a59139c change forestdb imports to couchbase not couchbaselabs 2015-04-24 17:35:01 -04:00
Marty Schoch
ee47d1c21a standardize on including 1000 sized batches 2015-04-24 17:31:34 -04:00
Marty Schoch
452fea6a24 adding initial impl of rocksdb kv store 2015-04-24 17:19:44 -04:00
Marty Schoch
a9c07acbfa refactor of kvstore api to support native merge in rocksdb
refactor to share code in emulated batch
refactor to share code in emulated merge
refactor index kvstore benchmarks to share more code
refactor index kvstore benchmarks to be more repeatable
2015-04-24 17:13:50 -04:00
indraniel
a62320a50e + fix goleveldb's BytesSafeAfterClose() on reader
- it should be set to false
2015-04-10 15:45:22 -05:00
Marty Schoch
d5dc66313f change variable name conflicting when both LevelDB benchmarks run 2015-04-10 15:03:44 -04:00
Marty Schoch
d5caad4405 changed GoLevelDB benchmark names to be different from LevelDB
this will allow for easier comparison when running both
versions at the same time
2015-04-10 15:00:56 -04:00
Marty Schoch
5f66bd84c7 fix issues identified by errcheck 2015-04-10 14:59:05 -04:00
indraniel
54ab493b3e + correctly copy bytes from the goleveldb store
- this is part of a recent bleve KVStore API change.

    See the following two google group threads for more details:

    * [help adding goleveldb as an alternative Key/Value store for bleve][1]
    * [bleve search performance improvement][2]

    [1]: https://groups.google.com/forum/#!topic/bleve/aHZ8gmihLiY
    [2]: https://groups.google.com/forum/#!topic/bleve/aTyqsSnbhik
2015-04-10 11:25:23 -05:00
indraniel
81bef38cce Revert "+ make copies of the []bytes returned by goleveldb"
This reverts commit cb8c1741289a0f00b30733e0d52d9d81d1199603.

This commit is no longer desired. The KV store API has been changed to
better address this issue.

For more details, see the google group conversation thread at:

https://groups.google.com/forum/#!topic/bleve/aHZ8gmihLiY
2015-04-10 11:12:44 -05:00
indraniel
3a70401835 + make copies of the []bytes returned by goleveldb
- The byte strings returned by goleveldb aren't necessarily safe.  See
    the following google group thread:

    https://groups.google.com/forum/#!topic/bleve/aHZ8gmihLiY

    This code change is based on the gist created here:

    https://groups.google.com/forum/#!topic/bleve/aHZ8gmihLiY
2015-04-10 11:08:02 -05:00
indraniel
a88d714778 + add a goleveldb index upside-down benchmark test 2015-04-10 11:08:02 -05:00
indraniel
a0a2a61050 + keep 'get' consistent with levigo implementation
- this change keeps the method behavior consistent with the
     levigo/leveldb implementation.

   - don't issue an err if a key isn't found
2015-04-10 11:08:02 -05:00
indraniel
5e55fa2866 + keep 'getWithSnapshot' consistent with levigo implementation
- this change keeps the method behavior consistent with the
     levigo/leveldb implementation.

   - the leveldb store_test.go and goleveldb store_test.go are now
     identical.
2015-04-10 11:08:02 -05:00
indraniel
caa19e6c36 + initial stub of goleveldb package
- This is a first-pass introduction. Things may not be working
    correctly yet.
2015-04-10 11:08:02 -05:00
Marty Schoch
8581e73cef added String method for Batch
also changed Batch methods to pointer receiver
closes #180
2015-04-08 10:41:42 -04:00
Marty Schoch
539aeb8dc7 fix errors identified by errcheck
part of #169
2015-04-07 18:05:41 -04:00
Marty Schoch
ba6b3c8bb3 fix more issues identified by errcheck
part of #169
2015-04-07 16:45:23 -04:00
Marty Schoch
ab24772bf0 fix issues identified by errcheck
part of #169
2015-04-07 16:34:29 -04:00
Marty Schoch
56c4a09de1 fix issues identified by errcheck
part of #169
2015-04-07 15:39:56 -04:00
Marty Schoch
93e01a803e fix issues identified by errcheck
part of #169
2015-04-07 14:52:00 -04:00
Marty Schoch
f1ec73e764 fix issues identified by errcheck
part of #169
2015-04-07 13:26:54 -04:00
Marty Schoch
56a30a3574 fix issues identified by errcheck
part of #169
2015-04-07 13:05:47 -04:00
Marty Schoch
d2e9409413 fix issues identified by errcheck
part of #169
2015-04-07 12:04:59 -04:00
Marty Schoch
dd921d31e3 undoing f92ab131e4
we now guarantee bytes were copied earlier in the chain
the kv store is NOT responsible for making an additional copy
closes #181
2015-04-07 11:12:28 -04:00
Marty Schoch
443c0252e0 fix another metrics BytesSafeAfterClose() loop
closes #184
2015-04-03 21:17:23 -04:00
Steve Yen
efc39a6857 fix metrics BytesSafeAfterClose() loop
fixes issue 184
2015-04-03 16:36:32 -07:00
Marty Schoch
867110e03b major improvements to index row encoding
improvements uncovered some issues with how k/v data was copied
or not.  to address this, kv abstraction layer now lets impl
specify if the bytes returned are safe to use after a reader
(or writer since writers are also readers) is closed
See index/store/KVReader - BytesSafeAfterClose() bool
false is the safe value if you're not sure
it will cause index impls to copy the data
Some kv impls already have created a copy at the C-api barrier
in which case they can safely return true.

Overall this yields ~25% speedup for searches with leveldb.
It yields ~10% speedup for boltdb.
Returning stored fields is now slower with boltdb, as previously
we were returning unsafe bytes.
2015-04-03 16:50:48 -04:00
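The copy-or-not decision described in the commit above might look roughly like the following sketch. Only the `BytesSafeAfterClose() bool` method comes from the message; the pared-down `reader` interface, the `Get` and `Close` signatures, the `unsafeReader` type and the `getSafe` helper are all assumptions made for illustration.

```go
package main

import "fmt"

// reader is a hypothetical, pared-down stand-in for a KV reader.
type reader interface {
	Get(key []byte) ([]byte, error)
	BytesSafeAfterClose() bool
	Close() error
}

// unsafeReader simulates a store whose returned bytes are invalidated on Close.
type unsafeReader struct{ buf []byte }

func (r *unsafeReader) Get(key []byte) ([]byte, error) { return r.buf, nil }
func (r *unsafeReader) BytesSafeAfterClose() bool      { return false }
func (r *unsafeReader) Close() error {
	for i := range r.buf {
		r.buf[i] = 0 // the store reclaims its memory
	}
	return nil
}

// getSafe returns bytes that remain valid after the reader is closed,
// copying only when the store does not already guarantee that.
func getSafe(r reader, key []byte) ([]byte, error) {
	val, err := r.Get(key)
	if err != nil || val == nil || r.BytesSafeAfterClose() {
		return val, err
	}
	out := make([]byte, len(val))
	copy(out, val)
	return out, nil
}

func main() {
	r := &unsafeReader{buf: []byte("value")}
	v, _ := getSafe(r, []byte("k"))
	r.Close()
	fmt.Println(string(v)) // value
}
```

A store that already copies at its own boundary would return true and skip the second copy, which is where the reported speedup would come from.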
Steve Yen
dbf50b7f29 KVStore gtreap allows only 1 writer at a time 2015-03-26 16:40:18 -07:00
Steve Yen
f92ab131e4 KVStore gtreap implementation copies value bytes 2015-03-26 14:46:37 -07:00
Steve Yen
78453dab7d metrics KVStore now tracks last 100 errors 2015-03-19 18:41:16 -07:00
Marty Schoch
a44a7c01af rewrite to use fixed size []byte instead of buffer
removes unchecked errors in calls to buffer.Write
and also benchmarks considerably faster
2015-03-11 15:12:13 -04:00
Marty Schoch
522f9d5cc7 significant change to index format, support dictionary rows
this introduces disk format v4
now the summary rows for a term are stored in their own
"dictionary row" format, previously the same information
was stored in special term frequency rows
this now allows us to easily iterate all the terms for a field
in sorted order (useful for many other fuzzy data structures)

at the top-level of bleve you can now browse terms within a field
using the following api on the Index interface:

  FieldDict(field string) (index.FieldDict, error)
  FieldDictRange(field string, startTerm []byte, endTerm []byte) (index.FieldDict, error)
  FieldDictPrefix(field string, termPrefix []byte) (index.FieldDict, error)

fixes #127
2015-03-10 16:22:19 -04:00
Marty Schoch
4e14f4e4ef change path for forestdb test to correctly cleanup
this is due to forestdb auto-compaction using the provided
path as just the prefix, so if we're not careful we end
up with many stray files lying around
here, we create a sub-directory first, and just nuke the
whole subdir when we're done
2015-03-10 14:05:58 -04:00
Marty Schoch
300ec79c96 first pass at checking errors that were ignored
part of #169
2015-03-06 14:46:29 -05:00
Marty Schoch
a2ad7634f2 update term freq rows to use varint where possible
benchmark                                  old ns/op  new ns/op  delta
BenchmarkLevelDBIndexing1Workers           1138292    657901     -42.20%
BenchmarkLevelDBIndexing2Workers           1619323    647628     -60.01%
BenchmarkLevelDBIndexing4Workers           1172845    636478     -45.73%
BenchmarkLevelDBIndexing1Workers10Batch    465556545  448153394  -3.74%
BenchmarkLevelDBIndexing2Workers10Batch    504203911  449657355  -10.82%
BenchmarkLevelDBIndexing4Workers10Batch    510766435  439839335  -13.89%
BenchmarkLevelDBIndexing1Workers100Batch   307657846  268976464  -12.57%
BenchmarkLevelDBIndexing2Workers100Batch   302257400  269110215  -10.97%
BenchmarkLevelDBIndexing4Workers100Batch   305320485  259084902  -15.14%
BenchmarkLevelDBIndexing1Workers1000Batch  301320576  258070231  -14.35%
BenchmarkLevelDBIndexing2Workers1000Batch  334174454  261175641  -21.84%
BenchmarkLevelDBIndexing4Workers1000Batch  267732436  261461739  -2.34%

closes #165
2015-03-06 13:00:53 -05:00
Marty Schoch
c566d34264 bump index format version number, start checking version on open 2015-02-17 17:16:31 +05:30
Steve Yen
38ee9be353 added some batch size 1000 microbenchmarks 2015-01-30 15:58:39 -08:00
Steve Yen
7d6a6aeaa8 single append for inmem KVStore batch 2015-01-29 11:14:08 -08:00
Steve Yen
5a30d36b17 cznicb KVStore uses Put() for faster read-modify-write 2015-01-29 11:02:01 -08:00
Steve Yen
b054cddf76 gtreap KVStore does 1 append for batch Set/Delete 2015-01-29 10:49:39 -08:00
Steve Yen
05d222f490 cznicb KVStore batch uses <2 appends per Set/Delete 2015-01-29 10:22:13 -08:00
Steve Yen
c5c59e61f4 make leveldb faster with non-zero sized batch 2015-01-29 10:20:26 -08:00
Steve Yen
1c1774d4ad throw away data even faster in null KVStore 2015-01-29 10:17:21 -08:00
Steve Yen
782ad94e01 added debug tag for metrics KVStore 2015-01-16 11:18:40 -08:00
Marty Schoch
eebc8e7825 more debugging around forestdb snapshots 2015-01-16 14:18:28 -05:00
Marty Schoch
ba978ea27e improving log messages 2015-01-16 14:07:47 -05:00
Marty Schoch
09fe749913 default to autocompaction for forestdb 2015-01-16 13:35:43 -05:00
Steve Yen
12dc2aff93 add go1.4 build tag to cznicb KVStore
This is because github.com/cznic/b depends on sync.Pool.
2015-01-15 15:54:25 -08:00
Steve Yen
11ee0209ad no leading zeros for metrics CSV output 2015-01-15 15:09:53 -08:00
Steve Yen
202191201c added WriteCSV() to metrics KVStore 2015-01-15 14:11:15 -08:00
Steve Yen
9be4e217bc metrics KVStore tracks perf metrics on a wrapped KVStore 2015-01-15 11:42:41 -08:00
Steve Yen
ea0a8657f3 added cznicb in-memory kvstore (no reader isolation) 2015-01-13 17:35:28 -08:00
Marty Schoch
362d240b09 added configurable options to leveldb 2015-01-13 16:24:51 -05:00
Steve Yen
d6e6f655c9 initialize forestdb config if provided 2015-01-13 12:03:24 -08:00
Steve Yen
1fa80ffc40 pass config to forestdb Open() 2015-01-13 11:04:02 -08:00
Steve Yen
3a00a968f2 close levigo's read & write options 2015-01-12 18:42:19 -08:00
Steve Yen
c20726bb93 close levigo.Options when db is closed 2015-01-12 18:42:19 -08:00
Steve Yen
603c3af8bb added gtreap in-memory, copy-on-write KVStore 2015-01-12 11:26:21 -08:00
Marty Schoch
d68c52e621 adding forestdb benchmark 2015-01-12 12:56:37 -05:00
Steve Yen
ae3600aeea expose forestdb rollback methods 2015-01-06 18:59:02 -08:00
Steve Yen
5467e0a385 forestdb registered name fixed 2015-01-06 17:36:05 -08:00
Marty Schoch
38bdcbeb62 update to new forestdb iterator api 2014-12-27 13:15:14 -08:00
Silvan Jegen
ef18dfe4cd Fix typos in comments and strings 2014-12-18 18:43:12 +01:00
Sergey Avseyev
a8351be5a6 Update protobuf imports 2014-12-10 01:24:59 +03:00
Silvan Jegen
412049d63c Remove unneeded import statements 2014-11-29 14:25:24 +01:00
Marty Schoch
6c7237ade9 added test for null kvstore 2014-11-26 15:50:57 -05:00
Marty Schoch
453d4cf770 change to always return stored fields in UTC 2014-11-26 15:36:34 -05:00
Marty Schoch
8ad0f64459 upgrade to current forestdb api 2014-11-25 21:52:35 -05:00
Marty Schoch
d5c1f4a9ab refactored store tests 2014-11-25 21:52:23 -05:00
Silvan Jegen
e3a2d3b58b Remove unneeded else clauses 2014-11-20 20:34:05 +01:00
Marty Schoch
47bc7caec3 added getRollbackID() and rollbackTo() to the ForestDB store 2014-11-04 08:34:49 -05:00
Marty Schoch
3f83149ed3 adding back the forestdb kv store impl 2014-10-31 09:42:32 -04:00
Marty Schoch
c7443fe52b refactored API a bit
more things can return error now
in a couple of places we had to swallow errors because they didn't
fit the existing API.  in these cases and proactively in a few
others we now return error as well.

also the batch API has been updated to allow performing
set/delete internal within the batch
2014-10-31 09:40:23 -04:00
Marty Schoch
64b0066121 added support for tracking index stats and exposing via expvar
closes #83
2014-10-02 11:12:49 -07:00
Marty Schoch
97902e2619 text analysis now moved out of index write lock onto goroutine
1. text analysis is now done before the write lock is acquired
2. there is now a pool of analysis workers
3. the size of this pool is configurable
4. this allows for documents in a batch to be analyzed concurrently

as a part of benchmarking these changes i've also introduced a new
null storage implementation.  this should never be used, as it
does not actually build an index.  it does however let us go
through all the normal indexing machinery, without incurring
any indexing I/O.  this is very helpful in measuring improvements
made to the text analysis pipeline, which are often overshadowed
by indexing times in benchmarks actually building an index.
2014-09-24 08:13:14 -04:00
Marty Schoch
198ca1ad4d major refactor of kvstore/index internals, see below
In the index/store package
introduce KVReader
  creates snapshot
  all read operations consistent from this snapshot
  must close to release

introduce KVWriter
  only one writer active
  access to all operations
  allows for consistent read-modify-write
  must close to release

introduce AssociativeMerge operation on batch
  allows efficient read-modify-write
  for associative operations
  used to consolidate updates to the term summary rows
  saves 1 set and 1 get op per shared instance of term in field

In the index package
introduced an IndexReader
  exposes a consistent snapshot of the index for searching

At top level
  All searches now operate on a consistent snapshot of the index
2014-09-12 17:21:35 -04:00
Marty Schoch
7819deb447 added boltdb benchmark, same as others 2014-09-12 16:55:50 -04:00
Marty Schoch
2294b24b9d remove forestdb for now
not any benefit in maintaining this for the time being
2014-09-12 16:55:11 -04:00
Marty Schoch
9d2187706e another round of golint 2014-09-03 19:53:59 -04:00
Marty Schoch
e21935f850 another round of golint cleanup 2014-09-03 19:16:46 -04:00
Marty Schoch
e1b77956d4 more golint cleanups 2014-09-03 18:47:02 -04:00
Marty Schoch
377ae090d0 additional golint issues resolved 2014-09-03 18:17:26 -04:00
Marty Schoch
d534b0836b converted ALL_CAPS constants to CamelCase 2014-09-03 17:48:40 -04:00
Marty Schoch
8e6c8e5644 continued refactoring of the mapping code
also renamed some constants that didn't follow Go conventions
2014-09-03 13:02:10 -04:00
Marty Schoch
45e1b2dfc6 removing gouchstore store impl
this implementation didn't really adhere to the contract
and now that we have boltdb we have a better pure go impl
2014-09-02 13:56:35 -04:00
Marty Schoch
7a7eb2e94c add newline between license and package
this avoids cluttering godocs with the license
2014-09-02 10:54:50 -04:00
Marty Schoch
1161361bea rename imports from couchbaselabs to blevesearch 2014-08-28 15:38:57 -04:00
Marty Schoch
ef59abe4c9 added build tag 'leveldb' to enable this kv store
by default we now use the pure go boltdb kv store
it is less tested at this point but appears to work
test pass, and moves us closer to the goal of being
able to just "go get" bleve
2014-08-25 15:18:24 -04:00
Marty Schoch
45a7a6dd8e fix two missing Close calls holding iterators open 2014-08-25 15:13:15 -04:00
Marty Schoch
8bcf6adb60 changed close of read only tx to Rollback from Commit
i was seeing deadlocks before this change
using Rollback to close read only tx is what the
built-in View() impl does, so i think it's safe
2014-08-25 15:11:21 -04:00
Marty Schoch
d67ee483ba change default bucket name to bleve 2014-08-25 15:11:04 -04:00
Marty Schoch
e7a8a1fbe6 fixing test 2014-08-25 12:34:16 -04:00
Marty Schoch
fbf3636a34 Merge pull request #86 from deoxxa/boltdb-storage
add boltdb storage type
2014-08-25 12:27:26 -04:00
Marty Schoch
3309c698f8 fixed Document() behavior to return nil when doc doesn't exist 2014-08-25 08:55:14 -04:00
deoxxa
a993fa4f74 add boltdb storage type 2014-08-24 18:37:56 +10:00
Marty Schoch
27f001bc14 overhauled top-level New/Open API
New is now used to create new indexes
Open is used to open existing indexes
calls to Open no longer specify a mapping because the mapping
is serialized and stored along with the index
2014-08-20 16:58:20 -04:00
Marty Schoch
a08a7f5b2a fix broken tests 2014-08-19 10:02:33 -04:00
Marty Schoch
082a5b0b03 major change to fields
now can track array positions for field values
stored fields now include this in the key
and the back index now uses protobufs to simplify serialization
closes #73
2014-08-19 08:58:26 -04:00
Marty Schoch
c33f1668f7 refactor dump methods
improved test coverage
2014-08-15 13:12:55 -04:00
Marty Schoch
4d53db9fc8 fixed bug with internal get/set/delete, added tests 2014-08-15 09:39:41 -04:00
Marty Schoch
c526a38369 major refactor of analysis files, now wired up to registry
ultimately this is to make it more convenient for us to wire up
different elements of the analysis pipeline, without having to
preload everything into memory before we need it

separately the index layer now has a mechanism for storing
internal key/value pairs.  this is expected to be used to
store the mapping, and possibly other pieces of data by the
top layer, but not exposed to the user at the top.
2014-08-13 21:14:47 -04:00
Marty Schoch
e5d4e6f1e4 refactored index layer to support batch operations
this change was then exposed at the higher levels
also the beer-sample app was upgraded to index in batches of 100
by default.  this yielded an indexing speed up from 27s to 16s.
closes #57
2014-08-11 16:27:18 -04:00
Marty Schoch
7bbaa8ecd5 added support for returning facet results with requests
supports terms, numeric ranges, and date ranges
closes #14
2014-08-11 11:03:29 -04:00
Marty Schoch
292af78b9e implemented prefix search
closes #4
2014-08-07 13:45:39 -04:00
Marty Schoch
b16c1d7f79 changed term row encoding
previously we used the format:
't' <utf-8 term> <byte separator> <16-bit field id> <utf-8 docID> <byte separator>

now we have moved the field before the term, resulting in:
't' <16-bit field id> <utf-8 term> <byte separator> <utf-8 docID> <byte separator>

this means now instead of all fields with the same term being grouped together
all terms within the same field are grouped together

this allows us to enumerate the terms used with a field

this allows us to implement prefix search, and possibly improve numeric range queries
2014-08-07 09:39:04 -04:00
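The effect of the new key layout in the commit above can be sketched as follows. The layout and the 0xff separator come from the commit messages on this page; the little-endian field id and the helper names `termRowKey`, `fieldTermPrefix` and `scanPrefix` are assumptions made for the example.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"sort"
)

const byteSeparator = 0xff // never appears in valid UTF-8

// termRowKey builds 't' <16-bit field id> <term> <sep> <docID> <sep>.
// Little-endian field ids are an assumption of this sketch.
func termRowKey(field uint16, term, docID string) []byte {
	key := []byte{'t', 0, 0}
	binary.LittleEndian.PutUint16(key[1:3], field)
	key = append(key, term...)
	key = append(key, byteSeparator)
	key = append(key, docID...)
	key = append(key, byteSeparator)
	return key
}

// fieldTermPrefix builds the key prefix shared by all terms in a field
// that start with termPrefix.
func fieldTermPrefix(field uint16, termPrefix string) []byte {
	p := []byte{'t', 0, 0}
	binary.LittleEndian.PutUint16(p[1:3], field)
	return append(p, termPrefix...)
}

// scanPrefix returns the keys that start with prefix, in sorted order,
// imitating a range scan over an ordered key/value store.
func scanPrefix(keys [][]byte, prefix []byte) [][]byte {
	sorted := append([][]byte(nil), keys...)
	sort.Slice(sorted, func(i, j int) bool { return bytes.Compare(sorted[i], sorted[j]) < 0 })
	var out [][]byte
	for _, k := range sorted {
		if bytes.HasPrefix(k, prefix) {
			out = append(out, k)
		}
	}
	return out
}

func main() {
	keys := [][]byte{
		termRowKey(1, "beer", "doc2"),
		termRowKey(0, "beer", "doc1"),
		termRowKey(1, "bear", "doc1"),
		termRowKey(0, "ale", "doc3"),
	}
	matches := scanPrefix(keys, fieldTermPrefix(1, "be"))
	fmt.Println(len(matches)) // 2
}
```

Because the field id now precedes the term, all rows for one field are contiguous, so a prefix search becomes a single range scan rather than a scan across every field.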
Marty Schoch
41d4f67ee2 fix storing/retrieving numeric and date fields
also includes new ability to request stored fields be returned with results

closes #55 and closes #56 and closes #58
2014-08-06 13:52:20 -04:00
Marty Schoch
4ae9eb895c added method to list fields in the index
also added a corresponding http handler
2014-07-31 11:47:36 -04:00
Marty Schoch
216767953c introduced a config option to disable creating indexes if they don't already exist
closes #23 and closes #24
2014-07-30 14:29:26 -04:00
Marty Schoch
2968d3538a major refactor, apologies for the large commit
removed analyzers (these are now built as needed through config)
removed html character filter (now built as needed through config)
added missing license header
changed constructor signature of filters that cannot return errors
filter constructors that can have errors, now have Must variant which panics
change cdl2 tokenizer into filter (should only see lower-case input)
new top level index api, closes #5
refactored index tests to not rely directly on analyzers
moved query objects to top-level
new top level search api, closes #12
top score collector allows skipping results
index mapping supports _all by default, closes #3 and closes #6
index mapping supports disabled sections, closes #7
new http sub package with reusable http.Handler's, closes #22
2014-07-30 12:30:38 -04:00
Marty Schoch
70a8b03bed added support for composite fields 2014-07-21 17:05:55 -04:00
Marty Schoch
d3466f3919 refactored field from struct to interface 2014-07-14 14:47:05 -04:00
Marty Schoch
2c86a731b4 added DocIdReader to Index interface
added more debug capabilities
removed hard-coded limitation on number of fields in doc
2014-07-11 14:24:28 -04:00
Marty Schoch
fda861d4e7 add formatted printing of stored rows
fix critical bug in prefix matching on stored row keys
2014-07-03 14:51:06 -04:00
Marty Schoch
9bebbec267 added support for stored fields and highlighting results 2014-06-26 11:43:13 -04:00
Marty Schoch
4af76f539d fewer allocations building byte array encodings 2014-05-19 11:02:15 -04:00
Marty Schoch
ed308eb253 tweaking perf of gouchstore 2014-05-16 15:00:51 -04:00
Marty Schoch
1b8c353787 adding some benchmarking 2014-05-16 10:09:05 -04:00
Marty Schoch
eac4dee56d fix bug in Get impl of ForestDB store 2014-05-16 10:08:23 -04:00
Marty Schoch
1c4726c16d added build tag to include forestdb (not yet public) 2014-05-15 10:32:07 -04:00
Marty Schoch
456b002d64 adding store implementation for forestdb 2014-05-15 10:25:45 -04:00
Marty Schoch
cd5ea0991f refactored store tests to share common code 2014-05-15 10:18:43 -04:00
Marty Schoch
d48eee948e refactored index to separate out kv storage
now have pluggable options for
leveldb
gouchstore
in memory only
2014-05-09 16:37:04 -04:00
Marty Schoch
0be5cffd21 subsequent calls to advance on the same key
should keep returning the same thing
only increment on initial call
2014-04-24 16:08:28 -06:00
Marty Schoch
aeebcdd7fe improved test coverage 2014-04-22 13:57:13 -04:00
Marty Schoch
f1926093de improve coverage of the mock package 2014-04-22 13:14:17 -04:00
Marty Schoch
9ab4f97f26 fix bug when calling Advance on new reader 2014-04-22 13:13:56 -04:00
Marty Schoch
d0cdf639f3 added test of Advance() 2014-04-20 09:43:02 -04:00
Marty Schoch
63fdd841ac fix bug returning results after end 2014-04-20 09:10:41 -04:00
Marty Schoch
1f1ac3e4a8 added some negative tests to row 2014-04-18 22:31:13 -04:00
Marty Schoch
15726437eb fix issue identified by go vet 2014-04-18 21:11:32 -04:00
Marty Schoch
f92f274665 refactored to remove panics, return errors, and fewer type assertions 2014-04-18 21:07:41 -04:00
Marty Schoch
a3e04d8697 rewrote to not handle errors which cannot occur 2014-04-18 16:36:03 -04:00
Marty Schoch
bb2f66be92 Revert "refactor to use less panics, return more errors"
This reverts commit dec37fed07.
2014-04-18 16:09:34 -04:00
Marty Schoch
dec37fed07 refactor to use less panics, return more errors 2014-04-18 15:54:29 -04:00
Marty Schoch
3d842dfaf2 initial commit 2014-04-17 16:55:53 -04:00