Commit Graph

957 Commits

Author SHA1 Message Date
Marty Schoch
9089de251f remove byte_array_converters
fixes #392
fixes #100
2016-07-01 10:21:41 -04:00
Marty Schoch
63f2eb6740 update godocs for date range querying
fixes #382
2016-06-26 10:25:09 -04:00
Marty Schoch
2807a2c8bd adding CONTRIBUTING.md to repo
closes #327
2016-06-26 09:48:43 -04:00
Marty Schoch
bd2a23fb6d remove firestorm index scheme
firestorm was an experiment
we learned a lot, but it did not result in a usable index scheme
2016-06-26 07:51:41 -04:00
Marty Schoch
7e02e616ce fix indexing of primitives not inside map/struct
fixes #389
2016-06-21 21:15:36 -04:00
Marty Schoch
54b06ce0f6 fix bug in regexp, prefix and fuzzy searchers
these searchers incorrectly called Next() on their underlying
searcher, instead of Advance().  this can cause values to be
returned with an ID less than the one that was Advanced() to,
which violates the contract, and causes other incorrect behavior.

fixes #342
2016-06-21 09:00:05 -04:00
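The Next()/Advance() contract described in commit 54b06ce0f6 can be illustrated with a minimal sketch. The `searcher` interface, `listSearcher`, and `wrapper` types below are hypothetical simplifications written for this illustration, not bleve's actual types; they only show why a wrapping searcher must delegate Advance() to Advance().

```go
package main

import "fmt"

// searcher is a hypothetical, simplified iterator over ascending document IDs.
type searcher interface {
	Next() (string, bool)
	// Advance must return the first ID that is >= id.
	Advance(id string) (string, bool)
}

// listSearcher iterates over a sorted slice of IDs.
type listSearcher struct {
	ids []string // sorted ascending
	pos int
}

func (s *listSearcher) Next() (string, bool) {
	if s.pos >= len(s.ids) {
		return "", false
	}
	id := s.ids[s.pos]
	s.pos++
	return id, true
}

func (s *listSearcher) Advance(id string) (string, bool) {
	// Skip everything below the requested ID, then return the next entry.
	for s.pos < len(s.ids) && s.ids[s.pos] < id {
		s.pos++
	}
	return s.Next()
}

// wrapper stands in for a searcher built on top of another searcher.
type wrapper struct{ inner searcher }

func (w *wrapper) Next() (string, bool) { return w.inner.Next() }

// Advance delegates to the inner Advance, which preserves the contract.
func (w *wrapper) Advance(id string) (string, bool) { return w.inner.Advance(id) }

// buggyAdvance reproduces the kind of mistake described above: it calls
// Next() on the inner searcher and can return an ID lower than requested.
func (w *wrapper) buggyAdvance(id string) (string, bool) { return w.inner.Next() }

func main() {
	good := &wrapper{inner: &listSearcher{ids: []string{"a", "b", "c", "d"}}}
	id, _ := good.Advance("c")
	fmt.Println("Advance via Advance:", id)

	bad := &wrapper{inner: &listSearcher{ids: []string{"a", "b", "c", "d"}}}
	id, _ = bad.buggyAdvance("c")
	fmt.Println("Advance via Next:", id)
}
```

In this sketch the buggy variant hands back an ID below the one requested, which is the contract violation the commit message describes.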
Marty Schoch
9f31ea6805 standardize behavior of mapping anonymous fields
the behavior has been defined in a way that is compatible with
encoding/json.  this behavior is as follows:

anonymous fields which are structs will have struct fields get
field names as if they were directly in the parent struct.

anonymous fields which are not structs, or which are interfaces
which may or may not point to structs will get field names that
correspond to the name of the type

the exception to the rules above is that you can always override
this behavior by using a JSON struct tag

fixes #101
2016-06-16 16:27:24 -04:00
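Since commit 9f31ea6805 defines the mapping behavior as compatible with encoding/json, the three rules can be observed in encoding/json itself. The sketch below uses only the standard library; the types `Address`, `Nickname`, `Meta`, and `Person` and the helper `topLevelKeys` are made up for illustration, and the example shows JSON encoding, not bleve's mapping code.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

type Address struct{ City string }

type Nickname string

type Meta struct{ Version int }

type Person struct {
	Address            // anonymous struct: City is promoted to the parent
	Nickname           // anonymous non-struct: named after its type
	Meta `json:"meta"` // JSON struct tag overrides the default behavior
}

// topLevelKeys marshals v to JSON and returns the sorted top-level keys.
func topLevelKeys(v interface{}) ([]string, error) {
	raw, err := json.Marshal(v)
	if err != nil {
		return nil, err
	}
	var m map[string]interface{}
	if err := json.Unmarshal(raw, &m); err != nil {
		return nil, err
	}
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys, nil
}

func main() {
	p := Person{
		Address:  Address{City: "Oslo"},
		Nickname: "mo",
		Meta:     Meta{Version: 2},
	}
	keys, err := topLevelKeys(p)
	if err != nil {
		panic(err)
	}
	fmt.Println(keys)
}
```

The top-level keys are `City` (promoted from the anonymous struct), `Nickname` (the type name), and `meta` (from the tag), matching the three cases listed in the commit message.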
Marty Schoch
58457e7d66 Merge pull request #388 from MachineShop-IOT/master
Add bucket fillPercent option for boltdb
2016-06-15 11:18:55 -04:00
Mark Mindenhall
c3c827aded Add boltdb config test 2016-06-14 13:36:40 -06:00
Mark Mindenhall
d369bd5c3c Add bucket fill percent option for boltdb 2016-06-13 18:47:38 -06:00
Marty Schoch
fedb46269e updated whitespace to behave more like lucene/es 2016-06-10 15:30:43 -04:00
Marty Schoch
9c9dbcc90a fix another test issue 2016-06-10 13:21:27 -04:00
Marty Schoch
5ec47500ae fix format issue identified by go vet 2016-06-10 13:13:15 -04:00
Marty Schoch
80f1117a6c add couchbase copyright and license now that CLA has been signed 2016-06-10 13:08:50 -04:00
Marty Schoch
043a3bfb7c change cjk analyzer to use unicode tokenizer
change cjk bigram analyzer to work with multi-rune terms
add cjk width filter replaces full unicode normalization

these changes make the cjk analyzer behave more like elasticsearch
they also remove the dependency on the whitespace analyzer
which is now free to also behave more like lucene/es

fixes #33
2016-06-10 13:04:40 -04:00
Marty Schoch
b91c5375e4 enhance bleve_dump to print dictionary and counts 2016-06-10 13:01:29 -04:00
Marty Schoch
d4097c1f29 Merge pull request #386 from a-little-srdjan/BuildTermFromRunes-in-token-filters
removing duplicate code by reusing util.go in analysis
2016-06-10 12:49:56 -04:00
a-little-srdjan
efe573bc10 removing duplicate code by reusing util.go in analysis 2016-06-09 15:13:30 -04:00
Marty Schoch
5722d7b1d1 Merge pull request #384 from a-little-srdjan/ngram_int_bounds
Preventing panic on ngram initialization, and extending the type conversion
2016-06-08 23:45:32 -04:00
Marty Schoch
52b533a027 Merge pull request #383 from a-little-srdjan/camelcase_token_filter
init. simple camel case parser.
2016-06-08 23:44:07 -04:00
Marty Schoch
1be5699c54 Merge pull request #381 from MachineShop-IOT/master
Compact for boltdb (workaround for #374)
2016-06-08 00:01:20 -04:00
Marty Schoch
c8c2baf399 Merge pull request #385 from steveyen/mossDeferredSort
configurable mossStoreOptions and DeferredSort defaults to true
2016-06-07 23:00:57 -04:00
Steve Yen
4e531ae11b configurable mossStoreOptions and DeferredSort defaults to true 2016-06-07 17:38:43 -07:00
a-little-srdjan
9341cc835e Preventing panic on ngram initialization, and extending the type conversion. 2016-06-06 15:54:18 -04:00
a-little-srdjan
3f2701a97c init. simple camel case parser. 2016-06-03 11:04:21 -04:00
Mark Mindenhall
09fcc69516 rename defaultBatchSize to defaultCompactBatchSize 2016-06-01 14:25:57 -06:00
Mark Mindenhall
b5a4378a46 Cleanup godoc comments in PR 2016-06-01 13:59:57 -06:00
Mark Mindenhall
fecf7ab5c4 Compact for boltdb (workaround for #374) 2016-06-01 13:16:43 -06:00
Marty Schoch
92cf2a8974 Merge pull request #376 from MachineShop-IOT/master
Remove DictionaryTerm with count 0 during compact (workaround for #374)
2016-06-01 13:39:30 -04:00
Marty Schoch
4688b437bf Merge pull request #380 from mschoch/fix-collector
fix pagination bug introduced by collector optimization
2016-06-01 13:11:41 -04:00
Marty Schoch
2043bb4bf8 fix pagination bug introduced by collector optimization
fixes #378

this bug was introduced by:
f2aba116c4

theory of operation for this collector (top N, skip K)

- collect the highest scoring N+K results
- if K > 0, skip K and return the next N

internal details

- the top N+K are kept in a list
- the list is ordered from lowest scoring (first) to highest scoring (last)
- as a hit comes in, we find where this new hit would fit into this list
- if this caused the list to get too big, trim off the head (lowest scoring hit)

theory of the optimization

- we were not tracking the lowest score in the list
- so if the score was lower than the lowest score, we would add/remove it
- by keeping track of the lowest score in the list, we can avoid these ops

problem with the optimization
- the optimization worked by returning early
- by returning early there was a subtle change to documents which had the same score
- the reason is that which docs end up in the top N+K changed by returning early
- why was that? docs are coming in, in order by key ascending
- when finding the correct position to insert a hit into the list, we checked <, not <= the score
- this has the subtle effect that docs with the same score end up in reverse order

for example consider the following in progress list:

doc ids [   c    a    b  ]
scores  [   1    5    9  ]

if we now see doc d with score 5, we get:

doc ids [   c    a    d    b  ]
scores  [   1    5    5    9  ]

While that appears in order (a, d) it is actually reverse order, because when we
produce the top N we start at the end.

theory of the fix

- previous pagination depended on later hits with the same score "bumping" earlier
hits with the same score off the bottom of the list
- however, if we change the logic to <= instead of <, now the list in the previous
example would look like:

doc ids [   c    d    a    b  ]
scores  [   1    5    5    9  ]

- this small change means that earlier hits (lower ID) now rank higher among equal scores, and
thus we no longer depend on later hits bumping things down, which means returning
early is a valid thing to do

NOTE: this does depend on the hits coming back in order by ID.  this is not
something strictly guaranteed, but it was the same assumption that allowed the
original behavior

This also has the side-effect that 2 hits with the same score come back in
ascending ID order, which is somehow more pleasing to me than reverse order.
2016-06-01 11:35:18 -04:00
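The collector logic described in commit 2043bb4bf8 (top N, skip K, `<=` insertion, early return) can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual bleve collector: the `hit` and `topCollector` types and the `ids` helper are hypothetical, and hits are assumed to arrive in ascending ID order, as the commit message notes.

```go
package main

import (
	"fmt"
	"strings"
)

type hit struct {
	ID    string
	Score float64
}

// topCollector keeps the best size+skip hits, ordered from lowest
// scoring (first) to highest scoring (last).
type topCollector struct {
	size, skip int
	hits       []hit
}

func (c *topCollector) Collect(h hit) {
	limit := c.size + c.skip
	if limit <= 0 {
		return
	}
	// Early return: with <= insertion, a hit that does not beat the
	// current lowest score would be inserted at the head and trimmed.
	if len(c.hits) == limit && h.Score <= c.hits[0].Score {
		return
	}
	// Insert before the first entry whose score is >= the new score.
	pos := len(c.hits)
	for i, e := range c.hits {
		if h.Score <= e.Score {
			pos = i
			break
		}
	}
	c.hits = append(c.hits, hit{})
	copy(c.hits[pos+1:], c.hits[pos:])
	c.hits[pos] = h
	// Trim the head (lowest scoring hit) if the list grew too big.
	if len(c.hits) > limit {
		c.hits = c.hits[1:]
	}
}

// Results walks from the end (highest score), skips `skip` hits,
// and returns up to `size` hits.
func (c *topCollector) Results() []hit {
	var out []hit
	for i := len(c.hits) - 1 - c.skip; i >= 0 && len(out) < c.size; i-- {
		out = append(out, c.hits[i])
	}
	return out
}

// ids joins hit IDs for display.
func ids(hs []hit) string {
	parts := make([]string, len(hs))
	for i, h := range hs {
		parts[i] = h.ID
	}
	return strings.Join(parts, ",")
}

func main() {
	c := &topCollector{size: 3}
	for _, h := range []hit{
		{ID: "a", Score: 5}, {ID: "b", Score: 9},
		{ID: "c", Score: 1}, {ID: "d", Score: 5},
	} {
		c.Collect(h)
	}
	fmt.Println(ids(c.Results()))
}
```

With the hits fed in ascending ID order, the two hits that share score 5 come back as a before d, which is the ascending-ID tie order the commit message describes.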
Marty Schoch
105626269c adding config options to use cellar 2016-05-26 17:33:42 -04:00
Marty Schoch
44f0883ef2 Merge pull request #375 from steveyen/mossStore
enable mossStore as configurable lower-level store
2016-05-26 16:50:57 -04:00
Steve Yen
bf318b489b enable mossStore as configurable lower-level store
Also, bumped moss vendor SHA to latest moss with mossStore.
2016-05-26 13:33:22 -07:00
Mark Mindenhall
04351eb8f1 Move creation of iterator within transaction 2016-05-26 12:29:49 -06:00
Mark Mindenhall
686b20be4f Remove DictionaryTerm with count 0 during compact (workaround for #374) 2016-05-26 11:04:53 -06:00
Marty Schoch
d8ccda94f1 Merge pull request #373 from MachineShop-IOT/master
Add compact method to goleveldb store
2016-05-18 09:06:53 -04:00
Mark Mindenhall
3aa1d72233 Add compact method to goleveldb store 2016-05-17 16:58:17 -06:00
Marty Schoch
c6666d4674 Merge pull request #372 from slavikm/master
Load the document only once for both fields and highlighter
2016-04-29 15:10:37 -04:00
slavikm
f2aba116c4 Make top score collector about 7 times faster 2016-04-29 09:46:47 -07:00
slavikm
6d830a9f3e Load the document only once for both fields and highlighter 2016-04-28 11:12:33 -07:00
Marty Schoch
760057afb6 parse search results by converting strings back to errors 2016-04-26 17:56:37 -04:00
Marty Schoch
3badeb5fe1 add Validate() method to SearchRequest 2016-04-22 20:33:21 -04:00
Marty Schoch
8f8bb91439 simplify date parsing in queries, add date to query string
parsing of date ranges in queries no longer consults the
index mapping.  it was determined that this wasn't very useful
and led to overly complicated query syntax/behavior.

instead, applications can set the datetime parser used for
date range queries with the top-level config QueryDateTimeParser

also, we now support querying date ranges in the query string,
the syntax is:

field:>"date"

>,>=,<,<= operators are supported
the date must be surrounded by quotes
and must parse in the configured date format
2016-04-22 17:12:10 -04:00
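The query string syntax from commit 8f8bb91439 (`field:>"date"` with the four comparison operators and a quoted date) can be sketched with a small standalone parser. This is only an illustration: `parseDateRange` and `dateRangeRe` are hypothetical names, the regular expression is not bleve's query string grammar, and the date layout is passed in by the caller to stand in for the configured date format.

```go
package main

import (
	"fmt"
	"regexp"
	"time"
)

// dateRangeRe matches field:OP"date" where OP is one of >, >=, <, <=.
// The date must be surrounded by double quotes.
var dateRangeRe = regexp.MustCompile(`^(\w+):(>=|<=|>|<)"([^"]+)"$`)

// parseDateRange splits a clause into field, operator, and parsed date.
// layout is a Go time layout standing in for the configured date format.
func parseDateRange(expr, layout string) (field, op string, when time.Time, err error) {
	m := dateRangeRe.FindStringSubmatch(expr)
	if m == nil {
		return "", "", time.Time{}, fmt.Errorf("not a date range clause: %q", expr)
	}
	when, err = time.Parse(layout, m[3])
	if err != nil {
		return "", "", time.Time{}, err
	}
	return m[1], m[2], when, nil
}

func main() {
	field, op, when, err := parseDateRange(`created:>="2016-04-22T17:12:10-04:00"`, time.RFC3339)
	if err != nil {
		panic(err)
	}
	fmt.Println(field, op, when.Year())

	// An unquoted date is rejected, as the syntax requires quotes.
	_, _, _, err = parseDateRange(`created:>2016-04-22`, time.RFC3339)
	fmt.Println(err != nil)
}
```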
Marty Schoch
d0c6dbc9cf unregister index from expvar stats on close 2016-04-20 11:43:14 -04:00
Marty Schoch
709b418823 properly initialize index stats object for in memory indexes 2016-04-20 11:37:51 -04:00
Marty Schoch
53f7eb2891 multi-term searches check DisjunctionMaxClauseCount earlier
regexp, fuzzy and numeric range searchers now check to see if
they will be exceeding a configured DisjunctionMaxClauseCount
and stop work earlier, this does a better job of avoiding
situations which consume all available memory for an operation
they cannot complete
2016-04-18 10:06:34 -04:00
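The idea in commit 53f7eb2891, checking the clause limit while candidate terms are still being gathered, can be sketched as below. The function `matchingTerms`, the error value, and the convention that a limit of 0 means unlimited are assumptions of this sketch, not bleve's implementation.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

var errTooManyClauses = errors.New("too many clauses")

// matchingTerms collects dictionary terms accepted by match, but gives up
// as soon as the count exceeds maxClauses instead of building the full
// list first. In this sketch, maxClauses == 0 means no limit.
func matchingTerms(dict []string, match func(string) bool, maxClauses int) ([]string, error) {
	var out []string
	for _, term := range dict {
		if !match(term) {
			continue
		}
		out = append(out, term)
		if maxClauses > 0 && len(out) > maxClauses {
			return nil, errTooManyClauses
		}
	}
	return out, nil
}

func main() {
	dict := []string{"bar", "bat", "baz", "bed", "bee", "cat"}
	hasB := func(s string) bool { return strings.HasPrefix(s, "b") }

	_, err := matchingTerms(dict, hasB, 3)
	fmt.Println(err)

	terms, err := matchingTerms(dict, hasB, 10)
	fmt.Println(len(terms), err)
}
```

Stopping inside the loop bounds the memory used to roughly the limit itself, rather than to the number of matching terms in the dictionary.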
Marty Schoch
c7ae842b33 fix marshaling of MatchNone queries 2016-04-16 20:51:27 -04:00
Marty Schoch
95b03f9b54 Merge pull request #370 from mschoch/safermetrics
do not put +/-Inf or NaN values into the stats map
2016-04-15 13:56:25 -04:00
Marty Schoch
73b514fa4f do not put +/-Inf or NaN values into the stats map 2016-04-15 13:39:30 -04:00
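A minimal sketch of the filtering described in commit 73b514fa4f follows. The commit message does not state the motivation; one known consequence of such values is that encoding/json refuses to marshal NaN and infinite floats, so a stats map containing them cannot be serialized as JSON. The function name `safeStats` and the map shapes are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"math"
)

// safeStats copies raw into a new map, leaving out any value that is
// NaN or +/-Inf.
func safeStats(raw map[string]float64) map[string]interface{} {
	out := make(map[string]interface{}, len(raw))
	for name, v := range raw {
		if math.IsNaN(v) || math.IsInf(v, 0) {
			continue
		}
		out[name] = v
	}
	return out
}

func main() {
	raw := map[string]float64{
		"avg_latency": 1.5,
		"ratio":       math.NaN(),
		"rate":        math.Inf(1),
	}

	// Marshaling the unfiltered map fails on the NaN / Inf values.
	_, err := json.Marshal(raw)
	fmt.Println("unfiltered marshal failed:", err != nil)

	b, err := json.Marshal(safeStats(raw))
	fmt.Println(string(b), err)
}
```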