bleve

Author	SHA1	Message	Date
Marty Schoch	9781d9b089	add initial version of zap file format	2017-12-09 14:28:33 -05:00
Marty Schoch	ff2e6b98e4	added empty segment	2017-12-09 12:43:02 -05:00
Marty Schoch	e470105635	fix issues identified by errcheck	2017-12-06 18:36:14 -05:00
Marty Schoch	adac4f41db	initial version of scorch which persists index to disk	2017-12-06 18:33:47 -05:00
Marty Schoch	b1346b4c8a	add readme describing our use of bolt as a segment format	2017-12-05 16:09:00 -05:00
Marty Schoch	898a6b1e85	fix errcheck issues	2017-12-05 13:32:57 -05:00
Marty Schoch	ece27ef215	adding initial version of bolt persisted segment	2017-12-05 13:05:12 -05:00
Marty Schoch	f6be841668	add test for postings list count method	2017-12-05 13:01:36 -05:00
Marty Schoch	30e9d6daa5	add better testing of array positions	2017-12-05 12:54:44 -05:00
Marty Schoch	8d9d45115f	add test of location field	2017-12-05 12:20:06 -05:00
Marty Schoch	8f0350865b	add test for segment fields method	2017-12-05 12:17:56 -05:00
Marty Schoch	7a6b5483f2	add validation that all locations were seen	2017-12-05 11:58:05 -05:00
Marty Schoch	e08fdab54a	remove todo item	2017-12-05 10:13:27 -05:00
Marty Schoch	87e2627551	added dictionary tests to mem segment	2017-12-05 09:49:41 -05:00
Marty Schoch	ed067f45dd	added Close() method to Segment	2017-12-05 09:31:02 -05:00
Marty Schoch	22ffc8940e	update segment API to return error in key places	2017-12-04 18:06:06 -05:00
Marty Schoch	b74cf4b081	add copyright header to all new files in scorch	2017-12-01 15:42:50 -05:00
Marty Schoch	89aa02cf5b	fix highlighting of composite fields updated log statements for refactored names	2017-12-01 15:12:08 -05:00
Marty Schoch	cff14f1212	fix crash in DocNumbers when segment is empty	2017-12-01 09:50:27 -05:00
Marty Schoch	eb256f78bc	switch to constant referring to id field id 0 this avoids potentially mutating something that is intended to be immutable	2017-12-01 09:30:07 -05:00
Marty Schoch	7c964de8bf	switch to binary search for finding segment from global doc num added unit tests for this function specifically	2017-12-01 09:26:51 -05:00
Marty Schoch	c2047dcdf9	refactor doc id reader creation to share more code fix issue identified by steve	2017-12-01 08:54:39 -05:00
Marty Schoch	bcd4bdc3d1	added initial bolt thought to README	2017-12-01 07:27:04 -05:00
Marty Schoch	395458ce83	refactor to make mem segment contents exported	2017-12-01 07:26:47 -05:00
Steve Yen	398dcb19b3	scorch introducer uses the roaring.Or(x, y) API Instead of cloning an input bitmap, the roaring.Or(x, y) implementation fills a brand new result bitmap, which should allow for more efficient packing and memory utilization.	2017-11-30 10:37:10 -08:00
Steve Yen	67986d41bf	scorch InternalID() handles case of unknown docId	2017-11-30 08:36:01 -08:00
Marty Schoch	848aca4639	fix issues identified by errcheck	2017-11-29 13:34:15 -05:00
Marty Schoch	23f6dc1cc6	working in-memory version	2017-11-29 11:33:35 -05:00
Steve Yen	546700b2de	fix comment typo	2017-08-24 16:25:10 -07:00
Marty Schoch	cea119449e	fix data race in doc id search the implementation of the doc id search requires that the list of ids be sorted. however, when doing a multisearch across many indexes at once, the list of doc ids in the query is shared. deeper in the implementation, the search of each shard attempts to sort this list, resulting in a data race. this is one example of a potentially larger problem, however it has been decided to fix this data race, even though larger issues of data ownership may remain unresolved. this fix makes a copy of the list of doc ids, just prior to sorting the list. subsequently, all use of the list is on the copy that was made, not the original. fixes #518	2017-08-07 15:11:35 -04:00
abhinavdangeti	8ec88a6cb0	MB-24560: Add moss store|collection histograms to stats	2017-05-25 16:32:36 -07:00
Marty Schoch	3ad13236ec	fix geopoint fields to be able to be stored and retrieved	2017-03-31 09:40:54 -04:00
Marty Schoch	74140d4f2b	remove forestdb from bleve	2017-03-30 12:27:23 -04:00
Marty Schoch	1bcfe4efa1	Merge pull request #546 from sreekanth-cb/store_abort_close Store abort close	2017-03-07 12:35:18 -05:00
Sreekanth Sivasankaran	f759d841c2	Adding guards for config casting.	2017-03-07 22:51:27 +05:30
Sreekanth Sivasankaran	e88ff3c60a	Merge branch 'store_abort_close' of https://github.com/sreekanth-cb/bleve into store_abort_close Syntax change for errcheck tool	2017-03-07 19:56:08 +05:30
Sreekanth Sivasankaran	ee819f5950	MB-22410 - Configurable forced Store Abort API Adding a configurable forced store close Bumping the moss store version	2017-03-07 19:33:51 +05:30
Marty Schoch	0eba2a3f0c	reduce garbage created while processing facets previously we parsed/returned large sections of the documents back index row in order to compute facet information. this would require parsing the protobuf of the entire back index row. unfortunately this creates considerable garbage. this new version introduces a visitor/callback approach to working with data inside the back index row. the benefit of this approach is that we can let the higher-level code see values, prior to any copies of data being made or intermediate garbage being created. implementations of the callback must copy any value which they would like to retain beyond the callback. NOTE: this approach duplicates code from the automatically generated protobuf code NOTE: this approach assumes that the "field" field be serialized before the "terms" field. This is guaranteed by our currently generated protobuf encoder, and is recommended by the protobuf spec. But, decoders SHOULD support them occurring in any order, which we do not.	2017-03-02 17:00:46 -05:00
Marty Schoch	b04745abcc	remove smolder indexing scheme this was an experiment that we're no longer working on we learned from it, but now carrying it forward has a maintenance burden we don't wish to pay	2017-03-01 14:38:17 -05:00
Sreekanth Sivasankaran	67a5814fbe	MB-22410:deleting/editing index definition with large dirty write queue can be very slow Adding a configurable forced store close	2017-03-01 18:58:32 +05:30
Sreekanth Sivasankaran	324e4237cf	adding configurable Abort Close	2017-03-01 16:23:56 +05:30
Sundar Sridharan	74c7de0dcf	re-order childSnapshot declaration	2017-02-21 15:54:04 -08:00
Sundar Sridharan	04d428656e	Add Snapshot interface methods for moss child collections feature	2017-02-20 15:03:45 -08:00
Steve Yen	0b70a1bcb8	use inlined prealloc'ed termFreqRow in upsidedown termFieldReader	2017-02-08 18:23:13 -08:00
Steve Yen	31fecc3663	avoid row alloc's in upsidedown termFieldReader constructor	2017-02-08 18:14:30 -08:00
Marty Schoch	606fd6344b	INDEX FORMAT CHANGE: change back index row value Previously term entries were encoded pairwise (field/term), so you'd have data like: F1/T1 F1/T2 F1/T3 F2/T4 F3/T5 As you can see, even though field 1 has 3 terms, we repeat the F1 part in the encoded data. This is a bit wasteful. In the new format we encode it as a list of terms for each field: F1/T1,T2,T3 F2/T4 F3/T5 When fields have multiple terms, this saves space. In unit tests there is no additional waste even in the case that a field has only a single value. Here are the results of an indexing test case (beer-search): $ benchcmp indexing-before.txt indexing-after.txt benchmark old ns/op new ns/op delta BenchmarkIndexing-4 11275835988 10745514321 -4.70% benchmark old allocs new allocs delta BenchmarkIndexing-4 25230685 22480494 -10.90% benchmark old bytes new bytes delta BenchmarkIndexing-4 4802816224 4741641856 -1.27% And here are the results of a MatchAll search building a facet on the "abv" field: $ benchcmp facet-before.txt facet-after.txt benchmark old ns/op new ns/op delta BenchmarkFacets-4 439762100 228064575 -48.14% benchmark old allocs new allocs delta BenchmarkFacets-4 9460208 3723286 -60.64% benchmark old bytes new bytes delta BenchmarkFacets-4 260784261 151746483 -41.81% Although we expect the index to be smaller in many cases, the beer-search index is about the same in this case. However, this may be due to the underlying storage (boltdb) in this case. Finally, the index version was bumped from 5 to 7, since smolder also used version 6, which could lead to some confusion.	2017-01-24 15:39:38 -05:00
Steve Yen	5927224e15	optimize mergeOldAndNew for case of first time a doc is seen	2017-01-09 22:48:58 -08:00
Steve Yen	790f2e3e32	optimize by alloc'ing arrays of TermFrequencyRow/TermVector	2017-01-09 22:42:00 -08:00
Steve Yen	8f4726ab10	use struct{}{} idiom instead of additional mark var	2017-01-09 10:17:26 -08:00
Steve Yen	302cac72c4	optimize mergeOldAndNew when non-update case	2017-01-08 17:59:49 -08:00

458 Commits
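
Commit 606fd6344b above describes changing the back index row from pairwise field/term entries to one term list per field. The following is a minimal sketch of that regrouping only, not bleve's actual encoder: the types `fieldTerm` and `fieldTerms` and the functions `groupByField` and `render` are hypothetical names introduced for illustration, and the real row is protobuf-encoded rather than rendered as text.

```go
package main

import (
	"fmt"
	"strings"
)

// fieldTerm is a hypothetical (field, term) pair, standing in for one
// entry of the old pairwise layout (F1/T1 F1/T2 ...).
type fieldTerm struct {
	Field uint16
	Term  string
}

// fieldTerms is a hypothetical grouped entry: one field and all of its
// terms, standing in for the new layout (F1/T1,T2,T3 ...).
type fieldTerms struct {
	Field uint16
	Terms []string
}

// groupByField collapses pairwise entries into one entry per field,
// keeping fields in first-seen order and terms in input order.
func groupByField(pairs []fieldTerm) []fieldTerms {
	index := make(map[uint16]int) // field id -> position in grouped
	var grouped []fieldTerms
	for _, p := range pairs {
		i, ok := index[p.Field]
		if !ok {
			i = len(grouped)
			index[p.Field] = i
			grouped = append(grouped, fieldTerms{Field: p.Field})
		}
		grouped[i].Terms = append(grouped[i].Terms, p.Term)
	}
	return grouped
}

// render formats grouped entries in the F1/T1,T2 notation used in the
// commit message, purely for display.
func render(grouped []fieldTerms) string {
	parts := make([]string, 0, len(grouped))
	for _, g := range grouped {
		parts = append(parts, fmt.Sprintf("F%d/%s", g.Field, strings.Join(g.Terms, ",")))
	}
	return strings.Join(parts, " ")
}

func main() {
	// The example data from the commit message: field 1 has three terms.
	pairs := []fieldTerm{
		{1, "T1"}, {1, "T2"}, {1, "T3"}, {2, "T4"}, {3, "T5"},
	}
	fmt.Println(render(groupByField(pairs)))
	// prints: F1/T1,T2,T3 F2/T4 F3/T5
}
```

In this sketch the field identifier is stored once per group instead of once per term, which is the space saving the commit message refers to when a field has multiple terms.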