Commit Graph

1830 Commits

Author SHA1 Message Date
Marty Schoch
adac4f41db initial version of scorch which persists index to disk 2017-12-06 18:33:47 -05:00
Marty Schoch
b1346b4c8a add readme describing our use of bolt as a segment format 2017-12-05 16:09:00 -05:00
Marty Schoch
898a6b1e85 fix errcheck issues 2017-12-05 13:32:57 -05:00
Marty Schoch
ece27ef215 adding initial version of bolt persisted segment 2017-12-05 13:05:12 -05:00
Marty Schoch
f6be841668 add test for postings list count method 2017-12-05 13:01:36 -05:00
Marty Schoch
30e9d6daa5 add better testing of array positions 2017-12-05 12:54:44 -05:00
Marty Schoch
8d9d45115f add test of location field 2017-12-05 12:20:06 -05:00
Marty Schoch
8f0350865b add test for segment fields method 2017-12-05 12:17:56 -05:00
Marty Schoch
7a6b5483f2 add validation that all locations were seen 2017-12-05 11:58:05 -05:00
Marty Schoch
e08fdab54a remove todo item 2017-12-05 10:13:27 -05:00
Marty Schoch
87e2627551 added dictionary tests to mem segment 2017-12-05 09:49:41 -05:00
Marty Schoch
ed067f45dd added Close() method to Segment 2017-12-05 09:31:02 -05:00
Marty Schoch
22ffc8940e update segment API to return error in key places 2017-12-04 18:06:06 -05:00
Marty Schoch
b74cf4b081 add copyright header to all new files in scorch 2017-12-01 15:42:50 -05:00
Marty Schoch
89aa02cf5b fix highlighting of composite fields
updated log statements for refactored names
2017-12-01 15:12:08 -05:00
Marty Schoch
cff14f1212 fix crash in DocNumbers when segment is empty 2017-12-01 09:50:27 -05:00
Marty Schoch
eb256f78bc switch to constant referring to id field id 0
this avoids potentially mutating something that is intended
to be immutable
2017-12-01 09:30:07 -05:00
Marty Schoch
7c964de8bf switch to binary search for finding segment from global doc num
added unit tests for this function specifically
2017-12-01 09:26:51 -05:00
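
A minimal sketch of the binary search described in the commit above, not the actual scorch code. It assumes a hypothetical `offsets` slice in which `offsets[i]` is the first global doc number of segment `i`, sorted ascending; the name `segmentIndex` is illustrative.

```go
package main

import (
	"fmt"
	"sort"
)

// segmentIndex returns the index of the segment that contains the global
// doc number docNum. offsets[i] is assumed (hypothetically) to hold the
// first global doc number of segment i, in ascending order.
// It returns -1 when offsets is empty.
func segmentIndex(offsets []uint64, docNum uint64) int {
	// sort.Search finds the first segment that starts after docNum;
	// the segment just before it is the one containing docNum.
	return sort.Search(len(offsets), func(i int) bool {
		return offsets[i] > docNum
	}) - 1
}

func main() {
	offsets := []uint64{0, 5, 12} // three segments starting at 0, 5 and 12
	fmt.Println(segmentIndex(offsets, 0), segmentIndex(offsets, 7), segmentIndex(offsets, 12))
}
```

Because the search looks for the first start strictly greater than `docNum`, an empty segment that shares its start with the next one is skipped automatically.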
Marty Schoch
c2047dcdf9 refactor doc id reader creation to share more code
fix issue identified by steve
2017-12-01 08:54:39 -05:00
Marty Schoch
bcd4bdc3d1 added initial bolt thought to README 2017-12-01 07:27:04 -05:00
Marty Schoch
395458ce83 refactor to make mem segment contents exported 2017-12-01 07:26:47 -05:00
Marty Schoch
f521d80835 Merge pull request #645 from steveyen/scorch
scorch InternalID() handles case of unknown docId
2017-12-01 07:21:26 -05:00
Steve Yen
398dcb19b3 scorch introducer uses the roaring.Or(x, y) API
Instead of cloning an input bitmap, the roaring.Or(x, y)
implementation fills a brand new result bitmap, which should allow
for more efficient packing and memory utilization.
2017-11-30 10:37:10 -08:00
Steve Yen
67986d41bf scorch InternalID() handles case of unknown docId 2017-11-30 08:36:01 -08:00
Marty Schoch
848aca4639 fix issues identified by errcheck 2017-11-29 13:34:15 -05:00
Marty Schoch
23f6dc1cc6 working in-memory version 2017-11-29 11:33:35 -05:00
Joachim Schwarm
4ddc50e86d typo in documentation 2017-11-21 16:35:07 +01:00
Marty Schoch
6eea5b78da Merge pull request #631 from dvrkps/patch-1
travis: update go versions
2017-09-12 09:10:15 -04:00
Davor Kapsa
f0503355da travis: update go versions 2017-09-12 10:56:33 +02:00
Marty Schoch
c048833fcd added stringer method to phrase part
a failing test was producing unhelpful pointer addresses as
the only debug output.  this changes the output to print
the terms and locations as readable text

part of #629
2017-09-01 09:16:08 -04:00
Marty Schoch
930c06dfec rewrote logic to be more obvious
found during code walkthrough on 8/24/2017
2017-08-25 09:30:16 -07:00
Marty Schoch
b7a51dae2a Merge pull request #625 from steveyen/master
remove unused Document.Number property
2017-08-24 17:08:20 -07:00
Steve Yen
546700b2de fix comment typo 2017-08-24 16:25:10 -07:00
Steve Yen
87115cbfb7 remove unused Document.Number property 2017-08-24 16:21:26 -07:00
Marty Schoch
82a101aedd Merge pull request #623 from mschoch/fix-race-518
fix data race in doc id search
2017-08-08 08:17:03 -04:00
Marty Schoch
cea119449e fix data race in doc id search
the implementation of the doc id search requires that the list
of ids be sorted.  however, when doing a multisearch across
many indexes at once, the list of doc ids in the query is shared.
deeper in the implementation, the search of each shard attempts
to sort this list, resulting in a data race.

this is one example of a potentially larger problem, however
it has been decided to fix this data race, even though larger
issues of data ownership may remain unresolved.

this fix makes a copy of the list of doc ids, just prior to
sorting the list.  subsequently, all use of the list is on the
copy that was made, not the original.

fixes #518
2017-08-07 15:11:35 -04:00
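
The copy-before-sort approach described in the commit above can be sketched as follows. This is an illustration only; the helper name `sortedCopy` and the use of a plain `[]string` for the doc ids are assumptions, not the library's code.

```go
package main

import (
	"fmt"
	"sort"
)

// sortedCopy returns a sorted copy of ids and leaves the caller's slice
// untouched, so concurrent searchers sharing ids never write to it.
func sortedCopy(ids []string) []string {
	cp := make([]string, len(ids))
	copy(cp, ids)
	sort.Strings(cp) // sorts only the private copy
	return cp
}

func main() {
	shared := []string{"c", "a", "b"} // list shared across shards
	local := sortedCopy(shared)
	fmt.Println(shared, local)
}
```

Each shard pays for one extra allocation, but no goroutine mutates memory that another may be reading.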
Andrey Khomenko
dc9f994d95 Update index.go 2017-07-20 12:06:45 -04:00
Marty Schoch
174f8ed44a Merge pull request #615 from ethantkoenig/fix/camel_case
Fix token start/end/position values in camelCase tokenizer
2017-06-28 13:18:15 -04:00
Ethan Koenig
0433f05d9c Fix test 2017-06-22 18:56:28 -04:00
Ethan Koenig
8994ad2e00 Fix token start/end/position values in camelCase tokenizer 2017-06-22 17:42:39 -04:00
Marty Schoch
011b168f7b Merge pull request #612 from sreekanth-cb/extend_setter_dateTimeRange
Adding a new bucket setter method for dateTimeRange
2017-06-14 12:31:07 -04:00
Sreekanth Sivasankaran
71afa918fe Adding a new bucket setter method for dateTimeRange 2017-06-12 15:53:27 +05:30
Marty Schoch
48ac9862db Merge pull request #607 from mschoch/fix-query-string-numeric
fix issue with numeric range queries in query string
2017-06-06 16:57:00 -04:00
Marty Schoch
4c801f2f01 fix issue with numeric range queries in query string
previously the query string queries were modified to aid in
compatibility with other search systems.  this change:
f391b991c2
has a problem when combined with:
77101ae424
due to the introduction of MatchNoneSearchers being returned
in a case where previously they never would.

the fix for now is to simply return disjunction queries on 0
terms instead.  this ultimately also matches nothing, but avoids
triggering the logic which handles match none searchers in a
special way.
2017-06-06 16:03:05 -04:00
Marty Schoch
9234339472 Merge pull request #605 from mschoch/fix-nil-ptr
fix nil ptr panic on newly introduced text marshaler support
2017-06-05 10:58:29 -04:00
Marty Schoch
7274dddd2e fix nil ptr panic on newly introduced text marshaler support
We recently introduced support for indexing the content of
things implementing TextMarshaler.  Since often times interfaces
are implemented via pointer receivers, we added support to
introspect pointers (previously we just dereferenced them and
traversed into their underlying structs).  However, in doing so
we neglected to consider the case where the pointer does
implement the interface we care about, but happens to be nil.

fixes #603
2017-06-05 10:08:10 -04:00
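
A minimal sketch of the case described in the commit above: a pointer type that implements `encoding.TextMarshaler` but is nil. The `label` type and the `textOf` helper are hypothetical and exist only for illustration; the library's actual traversal code differs.

```go
package main

import (
	"encoding"
	"fmt"
	"reflect"
)

// label is a hypothetical type that implements TextMarshaler
// through a pointer receiver.
type label struct{ name string }

func (l *label) MarshalText() ([]byte, error) { return []byte(l.name), nil }

// textOf returns the marshaled text of v and true when v implements
// encoding.TextMarshaler and is not a nil pointer.
func textOf(v interface{}) (string, bool) {
	tm, ok := v.(encoding.TextMarshaler)
	if !ok {
		return "", false
	}
	// A nil *label still satisfies the interface, so check for nil
	// before calling a method that would dereference it.
	if rv := reflect.ValueOf(v); rv.Kind() == reflect.Ptr && rv.IsNil() {
		return "", false
	}
	b, err := tm.MarshalText()
	if err != nil {
		return "", false
	}
	return string(b), true
}

func main() {
	var missing *label
	fmt.Println(textOf(&label{name: "x"}))
	fmt.Println(textOf(missing))
}
```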
Stanislav Sokolov
d8d57e6990 Added Russian analyzer with snowball stemmer 2017-06-05 18:01:01 +05:00
Marty Schoch
3351c3b046 Merge pull request #602 from mschoch/filter-numeric-range
filter numeric range terms against the term dictionary
2017-05-31 13:20:32 -04:00
Marty Schoch
77101ae424 filter numeric range terms against the term dictionary
previously, all numeric terms required to implement a numeric
range search were passed to the disjunction query (possibly
exceeding the disjunction clause limit)

now, after producing the list of terms, we filter them against
the terms which actually exist in the term dictionary.  the
theory is that this will often greatly reduce the number of terms
and therefore reduce the likelihood that you would run into the
disjunction term limit in practice.

because the term dictionary interface does not have a seek API
and we're reluctant to add that now, i chose to do a binary
search of the terms, which either finds the term, or not. then
subsequent binary searches can proceed from that position,
since both the list of terms and the term dictionary are sorted.
2017-05-31 13:15:13 -04:00
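
The filtering step described in the commit above can be sketched roughly as follows. This assumes the term dictionary is available as a sorted in-memory `[]string`, which is a simplification for illustration; `filterExisting` is a hypothetical name, and the real term dictionary is accessed through an interface.

```go
package main

import (
	"fmt"
	"sort"
)

// filterExisting keeps only the candidate terms that are present in dict.
// Both slices must be sorted ascending.
func filterExisting(candidates, dict []string) []string {
	var kept []string
	start := 0
	for _, term := range candidates {
		// Binary search only the part of dict not yet ruled out.
		i := start + sort.SearchStrings(dict[start:], term)
		if i < len(dict) && dict[i] == term {
			kept = append(kept, term)
		}
		// Later candidates are >= term, so they cannot appear before i.
		start = i
		if start >= len(dict) {
			break
		}
	}
	return kept
}

func main() {
	candidates := []string{"a", "c", "d", "z"}
	dict := []string{"b", "c", "d", "e"}
	fmt.Println(filterExisting(candidates, dict))
}
```

Resuming each search from the previous position shrinks the search window as the scan advances, which is the benefit the commit message attributes to both lists being sorted.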
Marty Schoch
cd5b307cde Merge pull request #598 from abhinavdangeti/master
MB-24560: Add moss store|collection histograms to stats
2017-05-26 12:07:14 -04:00