1133 Commits

Author SHA1 Message Date
Mark Mindenhall
fecf7ab5c4 Compact for boltdb (workaround for #374) 2016-06-01 13:16:43 -06:00
Marty Schoch
92cf2a8974 Merge pull request #376 from MachineShop-IOT/master
Remove DictionaryTerm with count 0 during compact (workaround for #374)
2016-06-01 13:39:30 -04:00
Marty Schoch
4688b437bf Merge pull request #380 from mschoch/fix-collector
fix pagination bug introduced by collector optimization
2016-06-01 13:11:41 -04:00
Marty Schoch
2043bb4bf8 fix pagination bug introduced by collector optimization
fixes #378

this bug was introduced by:
f2aba116c4

theory of operation for this collector (top N, skip K)

- collect the highest scoring N+K results
- if K > 0, skip K and return the next N

internal details

- the top N+K are kept in a list
- the list is ordered from lowest scoring (first) to highest scoring (last)
- as a hit comes in, we find where this new hit would fit into this list
- if this caused the list to get too big, trim off the head (lowest scoring hit)
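A minimal Go sketch of the collector described above. The top N+K hits are kept in a slice ordered from lowest to highest score. The head is trimmed on overflow, and K hits are skipped from the top when results are produced. The `hit` type and the `topNSkipK`, `collect` and `results` names are hypothetical, not bleve's API. Insertion uses the `<=` check adopted by this fix.

```go
package main

import "fmt"

// hit is an illustrative search hit: a document ID plus its score.
type hit struct {
	id    string
	score float64
}

// topNSkipK keeps the highest-scoring n+k hits in a slice ordered
// from lowest score (first) to highest score (last).
type topNSkipK struct {
	n, k int
	list []hit
}

// collect inserts h before the first element whose score is >= h.score
// (the "<=" rule), then trims the head if the list grew too big.
func (c *topNSkipK) collect(h hit) {
	pos := len(c.list)
	for i, e := range c.list {
		if h.score <= e.score {
			pos = i
			break
		}
	}
	c.list = append(c.list, hit{})
	copy(c.list[pos+1:], c.list[pos:]) // shift right to open a slot
	c.list[pos] = h
	if len(c.list) > c.n+c.k {
		c.list = c.list[1:] // drop the lowest-scoring hit
	}
}

// results walks from the end (highest score), skips k hits and
// returns at most n.
func (c *topNSkipK) results() []hit {
	var out []hit
	for i := len(c.list) - 1 - c.k; i >= 0 && len(out) < c.n; i-- {
		out = append(out, c.list[i])
	}
	return out
}

func main() {
	c := &topNSkipK{n: 2, k: 1}
	// hits arrive in ascending ID order
	for _, h := range []hit{{"a", 5}, {"b", 9}, {"c", 1}, {"d", 5}} {
		c.collect(h)
	}
	fmt.Println(c.results()) // [{a 5} {d 5}]
}
```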

theory of the optimization

- we were not tracking the lowest score in the list
- so if a new hit scored lower than the lowest score in a full list, we would still add it and then immediately remove it
- by keeping track of the lowest score in the list, we can avoid these ops

problem with the optimization
- the optimization worked by returning early
- by returning early there was a subtle change to documents which had the same score
- the reason is that which docs end up in the top N+K changed by returning early
- why was that? docs arrive in ascending order by key
- when finding the correct position to insert a hit into the list, we checked <, not <= the score
- this has the subtle effect that docs with the same score end up in reverse order

for example consider the following in progress list:

doc ids [   c    a    b  ]
scores  [   1    5    9  ]

if we now see doc d with score 5, we get:

doc ids [   c    a    d    b  ]
scores  [   1    5    5    9  ]

While that appears in order (a, d) it is actually reverse order, because when we
produce the top N we start at the end.

theory of the fix

- previous pagination depended on later hits with the same score "bumping" earlier
hits with the same score off the bottom of the list
- however, if we change the logic to <= instead of <, now the list in the previous
example would look like:

doc ids [   c    d    a    b  ]
scores  [   1    5    5    9  ]

- this small change means that earlier hits (lower id) now rank higher, and
thus we no longer depend on later hits bumping things down, which means returning
early is a valid thing to do
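The two insertion checks can be compared directly. This hypothetical `insert` helper reproduces the example lists above. With the original `<` check, d lands after a; with `<=` it lands before a.

```go
package main

import "fmt"

// hit is an illustrative search hit: a document ID plus its score.
type hit struct {
	id    string
	score float64
}

// insert returns a new list (ordered lowest to highest score) with h added.
// strict=true inserts before the first element with a strictly greater
// score (the original "<" check); strict=false inserts before the first
// element with a greater or equal score (the "<=" fix).
func insert(list []hit, h hit, strict bool) []hit {
	pos := len(list)
	for i, e := range list {
		if (strict && h.score < e.score) || (!strict && h.score <= e.score) {
			pos = i
			break
		}
	}
	out := make([]hit, 0, len(list)+1)
	out = append(out, list[:pos]...)
	out = append(out, h)
	return append(out, list[pos:]...)
}

// ids concatenates the document IDs in list order.
func ids(list []hit) string {
	s := ""
	for _, h := range list {
		s += h.id
	}
	return s
}

func main() {
	list := []hit{{"c", 1}, {"a", 5}, {"b", 9}}
	fmt.Println(ids(insert(list, hit{"d", 5}, true)))  // cadb
	fmt.Println(ids(insert(list, hit{"d", 5}, false))) // cdab
}
```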

NOTE: this does depend on the hits coming back in order by ID.  this is not
something strictly guaranteed, but it was the same assumption that allowed the
original behavior

This also has the side-effect that 2 hits with the same score come back in
ascending ID order, which is somehow more pleasing to me than reverse order.
2016-06-01 11:35:18 -04:00
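A sketch of why returning early is only safe with `<=`. It assumes the optimization skips any hit scoring at or below the current minimum once the list is full; the exact condition used in f2aba116c4 is not shown in this log. All names are hypothetical. With `<`, the shortcut changes which tied hits survive. With `<=`, results are the same with or without it.

```go
package main

import "fmt"

// hit is an illustrative search hit: a document ID plus its score.
type hit struct {
	id    string
	score float64
}

// collector keeps the top size hits ordered lowest..highest score.
// strict selects the original "<" insertion check; early enables the
// lowest-score shortcut (assumed: skip hits <= the minimum when full).
type collector struct {
	size          int
	strict, early bool
	list          []hit
}

func (c *collector) collect(h hit) {
	if c.early && len(c.list) == c.size && h.score <= c.list[0].score {
		return // shortcut: skip the insert/trim work entirely
	}
	pos := len(c.list)
	for i, e := range c.list {
		if (c.strict && h.score < e.score) || (!c.strict && h.score <= e.score) {
			pos = i
			break
		}
	}
	c.list = append(c.list, hit{})
	copy(c.list[pos+1:], c.list[pos:])
	c.list[pos] = h
	if len(c.list) > c.size {
		c.list = c.list[1:] // trim the lowest-scoring hit
	}
}

// top returns the collected IDs from highest to lowest score.
func (c *collector) top() string {
	s := ""
	for i := len(c.list) - 1; i >= 0; i-- {
		s += c.list[i].id
	}
	return s
}

func run(strict, early bool, hits []hit) string {
	c := &collector{size: 2, strict: strict, early: early}
	for _, h := range hits {
		c.collect(h)
	}
	return c.top()
}

func main() {
	// hits arrive in ascending ID order; b, c and d tie on score
	hits := []hit{{"a", 9}, {"b", 5}, {"c", 5}, {"d", 5}}
	fmt.Println(run(true, false, hits), run(true, true, hits))   // ad ab
	fmt.Println(run(false, false, hits), run(false, true, hits)) // ab ab
}
```

With `<=`, tied hits also come back in ascending ID order (a, then b), as the message notes.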
Marty Schoch
105626269c adding config options to use cellar 2016-05-26 17:33:42 -04:00
Marty Schoch
44f0883ef2 Merge pull request #375 from steveyen/mossStore
enable mossStore as configurable lower-level store
2016-05-26 16:50:57 -04:00
Steve Yen
bf318b489b enable mossStore as configurable lower-level store
Also, bumped moss vendor SHA to latest moss with mossStore.
2016-05-26 13:33:22 -07:00
Mark Mindenhall
04351eb8f1 Move creation of iterator within transaction 2016-05-26 12:29:49 -06:00
Mark Mindenhall
686b20be4f Remove DictionaryTerm with count 0 during compact (workaround for #374) 2016-05-26 11:04:53 -06:00
Marty Schoch
d8ccda94f1 Merge pull request #373 from MachineShop-IOT/master
Add compact method to goleveldb store
2016-05-18 09:06:53 -04:00
Mark Mindenhall
3aa1d72233 Add compact method to goleveldb store 2016-05-17 16:58:17 -06:00
Marty Schoch
c6666d4674 Merge pull request #372 from slavikm/master
Load the document only once for both fields and highlighter
2016-04-29 15:10:37 -04:00
slavikm
f2aba116c4 Make top score collector about 7 times faster 2016-04-29 09:46:47 -07:00
slavikm
6d830a9f3e Load the document only once for both fields and highlighter 2016-04-28 11:12:33 -07:00
Marty Schoch
760057afb6 parse search results by converting strings back to errors 2016-04-26 17:56:37 -04:00
Marty Schoch
3badeb5fe1 add Validate() method to SearchRequest 2016-04-22 20:33:21 -04:00
Marty Schoch
8f8bb91439 simplify date parsing in queries, add date to query string
parsing of date ranges in queries no longer consults the
index mapping.  it was determined that this wasn't very useful
and led to overly complicated query syntax/behavior.

instead, applications can set the datetime parser used for
date range queries with the top-level config QueryDateTimeParser

also, we now support querying date ranges in the query string,
the syntax is:

field:>"date"

>,>=,<,<= operators are supported
the date must be surrounded by quotes
and must parse in the configured date format
2016-04-22 17:12:10 -04:00
Marty Schoch
d0c6dbc9cf unregister index from expvar stats on close 2016-04-20 11:43:14 -04:00
Marty Schoch
709b418823 properly initialize index stats object for in memory indexes 2016-04-20 11:37:51 -04:00
Marty Schoch
53f7eb2891 multi-term searches check DisjunctionMaxClauseCount earlier
regexp, fuzzy and numeric range searchers now check to see if
they will be exceeding a configured DisjunctionMaxClauseCount
and stop work earlier, this does a better job of avoiding
situations which consume all available memory for an operation
they cannot complete
2016-04-18 10:06:34 -04:00
Marty Schoch
c7ae842b33 fix marshaling of MatchNone queries 2016-04-16 20:51:27 -04:00
Marty Schoch
95b03f9b54 Merge pull request #370 from mschoch/safermetrics
do not put +/-Inf or NaN values into the stats map
2016-04-15 13:56:25 -04:00
Marty Schoch
73b514fa4f do not put +/-Inf or NaN values into the stats map 2016-04-15 13:39:30 -04:00
Marty Schoch
18b50305e4 update travis to build with deps specified in manifest 2016-04-08 17:06:25 -04:00
Marty Schoch
d06f4b3860 fix manifest versions
I was too eager to commit the manifest and do the release
the original manifest contained "too new" versions for some
of the deps
2016-04-08 16:49:18 -04:00
Marty Schoch
7ec37d6533 add support for wildcard and regexp queries to query string
you can now use terms like:

test?string*

and similar text in query strings to perform wildcard
searches.  also if you use:

/aregexp/

it will perform a regexp search as well
2016-04-08 15:56:02 -04:00
Marty Schoch
2f59b75ce4 Merge branch 'mschoch-fix-data-race' 2016-04-08 15:32:47 -04:00
Marty Schoch
b8a2fbb887 fix data race in bleve batch reuse
Currently a bleve batch is built by the user goroutine
Then read by a bleve goroutine
This is still safe when used correctly
However, Reset() will modify the map, which is now a data race

This fix is to simply make batch.Reset() alloc new maps.
This provides a data-access pattern that can be used safely.
Also, this thread argues that creating a new map may be faster
than trying to reuse an existing one:

https://groups.google.com/d/msg/golang-nuts/UvUm3LA1u8g/jGv_FobNpN0J

Separate but related, I have opted to remove the "unsafe batch"
checking that we did.  This was always limited anyway, and now
users of Go 1.6 are just as likely to get a panic from the
runtime for concurrent map access.  So, the price paid
by us (additional mutex) is not worth it.

fixes #360 and #260
2016-04-08 15:32:13 -04:00
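A simplified sketch of the fix. The `batch` type here is a stand-in, not bleve's `Batch`. Because `Reset` swaps in a new map instead of clearing the old one, a reference taken before the reset (for example by an indexing goroutine) is never written to again.

```go
package main

import "fmt"

// batch is a hypothetical stand-in: pending documents keyed by ID.
type batch struct {
	ops map[string]string
}

func newBatch() *batch { return &batch{ops: make(map[string]string)} }

func (b *batch) Index(id, doc string) { b.ops[id] = doc }

// Reset allocates a fresh map rather than deleting keys from the old one,
// so anyone still holding the previous map never observes a write.
func (b *batch) Reset() { b.ops = make(map[string]string) }

func main() {
	b := newBatch()
	b.Index("doc1", "hello")
	submitted := b.ops // e.g. handed off to an indexing goroutine
	b.Reset()
	b.Index("doc2", "world")
	fmt.Println(len(submitted), len(b.ops)) // 1 1
}
```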
Marty Schoch
617fcf693e add initial manifest 2016-04-08 15:03:05 -04:00
Marty Schoch
74b7872987 fix marshaling of MatchAll queries 2016-04-07 18:20:35 -04:00
Marty Schoch
36170a4736 more gofmt simplifications 2016-04-03 00:03:33 -04:00
Marty Schoch
dd209dcbf2 tweak syntax to make older go vet happy 2016-04-02 23:54:36 -04:00
Marty Schoch
4433964f78 tweak syntax 2016-04-02 23:17:47 -04:00
Marty Schoch
020ea1b8ee tweak syntax slightly 2016-04-02 23:08:43 -04:00
Marty Schoch
1c69112820 a few more gofmt simplifications 2016-04-02 22:48:00 -04:00
Marty Schoch
2a703376ea fix ineffectual assignments 2016-04-02 22:42:56 -04:00
Marty Schoch
7892882519 fix typos 2016-04-02 21:59:30 -04:00
Marty Schoch
194ee82c80 gofmt simplifications 2016-04-02 21:54:33 -04:00
Marty Schoch
7594daad01 adding goreportcard 2016-04-02 21:01:23 -04:00
Marty Schoch
0b171c85da change "simple" analyzer to use "letter" tokenizer
this change improves compatibility with the simple analyzer
defined by Lucene.  this has important implications for
some perf tests as well, as they often use the simple
analyzer.
2016-03-31 15:13:17 -04:00
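An approximation of a Lucene-style simple analyzer (a letter tokenizer followed by lowercasing) using only the standard library. `simpleAnalyze` is a hypothetical name, and bleve's tokenizer may differ in edge cases.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// simpleAnalyze splits text into maximal runs of letters (any non-letter
// is a separator) and lowercases each token.
func simpleAnalyze(text string) []string {
	fields := strings.FieldsFunc(text, func(r rune) bool { return !unicode.IsLetter(r) })
	for i, f := range fields {
		fields[i] = strings.ToLower(f)
	}
	return fields
}

func main() {
	fmt.Println(simpleAnalyze("Bleve's v2.0 index!")) // [bleve s v index]
}
```

Note that digits are dropped entirely, which is the main behavioral difference from a whitespace- or Unicode-word-based tokenizer.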
Marty Schoch
d774f980d3 search request JSON omitting size, now defaults to 10
this change only affects JSON parsing, any search request which
omits the size field entirely now defaults to 10 which
is the same behavior as NewSearchRequest()

0 is still a valid size, but must be set explicitly
2016-03-31 09:56:06 -04:00
Marty Schoch
2b82387eae export the Validate method on mapping objects 2016-03-28 17:14:41 -04:00
Marty Schoch
639fb1ab89 remove NativeMergeOperator from core, it requires unsafe 2016-03-24 12:06:43 -04:00
Marty Schoch
1f0509fe48 Merge pull request #365 from bcampbell/documenting
some minor godoc additions
2016-03-22 18:10:40 -04:00
Ben Campbell
4fafb2be3f Merge branch 'master' into documenting 2016-03-23 10:48:09 +13:00
Marty Schoch
724684a4f1 additional firestorm fixes for 64-bit alignment
part of #359
2016-03-20 11:02:13 -04:00
Marty Schoch
3dc64de478 moved fields requiring 64-bit alignment to start of struct
several data structures had a pointer at the start of the struct
on some 32-bit systems, this causes the remaining fields to no longer
be aligned on 64-bit boundaries

the fix identified by @pmezard is to put the counters first in the
struct, which guarantees correct alignment

fixes #359
2016-03-20 10:38:28 -04:00
Marty Schoch
5ea6b063ad Merge pull request #358 from steveyen/master
MB-18715 - moss Merge fix
2016-03-15 20:26:24 -04:00
Steve Yen
be2800a8e4 MB-18715 - moss Merge() didn't bump bufUsed correctly
And, also allocate more memory for both the partial and full merges.
2016-03-15 17:09:40 -07:00
Marty Schoch
f1abf6beb3 facets now also have secondary sort
in case of term facets, secondary sort (after count) is on the term
for date and numeric facets, secondary sort is on the facet name

fixes #335
2016-03-14 12:02:30 -04:00
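A sketch of term facet ordering with a tie-break on the term. The `termFacet` type is hypothetical. The sort directions (count descending, term ascending) are assumptions of this sketch; the commit does not state them.

```go
package main

import (
	"fmt"
	"sort"
)

// termFacet is a hypothetical term facet entry.
type termFacet struct {
	Term  string
	Count int
}

// sortTermFacets orders by count descending, then by term ascending so
// that facets with equal counts come back in a stable, predictable order.
func sortTermFacets(fs []termFacet) {
	sort.Slice(fs, func(i, j int) bool {
		if fs[i].Count != fs[j].Count {
			return fs[i].Count > fs[j].Count
		}
		return fs[i].Term < fs[j].Term
	})
}

func main() {
	fs := []termFacet{{"go", 3}, {"c", 5}, {"rust", 3}, {"ada", 3}}
	sortTermFacets(fs)
	fmt.Println(fs) // [{c 5} {ada 3} {go 3} {rust 3}]
}
```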