[
{
"search": {
"from": 0,
"size": 10,
"query": {
"field": "name",
"term": "marti"
}
},
"result": {
"total_hits": 1,
"hits": [
{
"id": "a"
}
]
}
},
{
"search": {
"from": 0,
"size": 10,
"query": {
"field": "name",
"term": "noone"
}
},
"result": {
"total_hits": 0,
"hits": []
}
},
{
"search": {
"from": 0,
"size": 10,
"query": {
"match_phrase": "long name"
}
},
"result": {
"total_hits": 1,
"hits": [
{
"id": "b"
}
]
}
},
{
"search": {
"from": 0,
"size": 10,
"query": {
"field": "name",
"term": "walking"
}
},
"result": {
"total_hits": 0,
"hits": []
}
},
{
"search": {
"from": 0,
"size": 10,
"query": {
"fuzziness": 0,
"prefix_length": 0,
"field": "name",
"match": "walking"
}
},
"result": {
"total_hits": 1,
"hits": [
{
"id": "c"
}
]
}
},
{
"search": {
"from": 0,
"size": 10,
"query": {
"field": "name",
"prefix": "bobble"
}
},
"result": {
"total_hits": 1,
"hits": [
{
"id": "d"
}
]
}
},
{
"search": {
"from": 0,
"size": 10,
"query": {
"query": "+name:phone"
}
},
"result": {
"total_hits": 1,
"hits": [
{
"id": "d"
}
]
}
},
{
"search": {
"from": 0,
"size": 10,
"query": {
"field": "age",
"max": 30
}
},
"result": {
"total_hits": 2,
"hits": [
{
"id": "a"
},
{
"id": "b"
}
]
}
},
{
"search": {
"from": 0,
"size": 10,
"query": {
"field": "age",
"max": 30,
"min": 20
}
},
"result": {
"total_hits": 1,
"hits": [
{
"id": "b"
}
]
}
},
{
"search": {
"from": 0,
"size": 10,
"query": {
"conjuncts": [
{
"boost": 1,
"field": "age",
"min": 20
},
{
"boost": 1,
"field": "age",
"max": 30
}
]
}
},
"result": {
"total_hits": 1,
"hits": [
{
"id": "b"
}
]
}
},
{
"search": {
"from": 0,
"size": 10,
"query": {
"field": "birthday",
"start": "2010-01-01"
}
},
"result": {
"total_hits": 2,
"hits": [
{
"id": "c"
},
{
"id": "d"
}
]
}
},
{
"search": {
"from": 0,
"size": 10,
"query": {
"field": "birthday",
"end": "2010-01-01"
}
},
"result": {
"total_hits": 1,
"hits": [
{
"id": "b"
}
]
}
},
{
"search": {
"from": 0,
"size": 10,
"query": {
"field": "tags",
"term": "gopher"
}
},
"result": {
"total_hits": 1,
"hits": [
{
"id": "a"
}
]
}
},
{
"search": {
"from": 0,
"size": 10,
"query": {
"field": "tags",
"term": "belieber"
}
},
"result": {
"total_hits": 1,
"hits": [
{
"id": "a"
}
]
}
},
{
"search": {
"from": 0,
"size": 10,
"query": {
"field": "tags",
"term": "notintagsarray"
}
},
"result": {
"total_hits": 0,
"hits": []
}
},
{
"comment": "with size 0, total should be 1, but hits empty",
"search": {
"from": 0,
"size": 0,
"query": {
"field": "name",
"term": "marti"
}
},
"result": {
"total_hits": 1,
"hits": []
}
},
{
"comment": "a search for doc a that includes tags field, verifies both values come back",
"search": {
"from": 0,
"size": 10,
"fields": ["tags"],
"query": {
"field": "name",
"term": "marti"
}
},
"result": {
"total_hits": 1,
"hits": [
{
"id": "a",
"fields": {
"tags": ["gopher", "belieber"]
}
}
]
}
},
{
"search": {
"from": 0,
"size": 10,
"query": {
"field": "name",
"term": "msrti",
"fuzziness": 1
}
},
"result": {
"total_hits": 1,
"hits": [
{
"id": "a"
}
]
}
},
{
"comment": "highlight results",
"search": {
"from": 0,
"size": 10,
"query": {
"field": "name",
"match": "long"
},
"highlight": {
"fields": ["name"]
}
},
"result": {
"total_hits": 1,
"hits": [
{
"id": "b",
"fragments": {
"name": ["steve has a <mark>long</mark> name"]
}
}
]
}
},
{
"comment": "highlight results without specifying fields",
"search": {
"from": 0,
"size": 10,
"query": {
"field": "name",
"match": "long"
},
"highlight": {}
},
"result": {
"total_hits": 1,
"hits": [
{
"id": "b",
"fragments": {
"name": ["steve has a <mark>long</mark> name"]
}
}
]
}
},
{
"comment": "request fields",
"search": {
"from": 0,
"size": 10,
"fields": ["age","birthday"],
"query": {
"field": "name",
"match": "long"
}
},
"result": {
"total_hits": 1,
"hits": [
{
"id": "b",
"fields": {
"age": 27,
"birthday": "2001-09-09T01:46:40Z"
}
}
]
}
},
{
"comment": "tests query string only containing MUST NOT clause, bug #193",
"search": {
"from": 0,
"size": 10,
"query": {
"query": "-title:mista"
}
},
"result": {
"total_hits": 3,
"hits": [
{
"id": "b"
},
{
"id": "c"
},
{
"id": "d"
}
]
}
},
{
"comment": "highlight results including non-matching field (which should be produced in its entirety, though unhighlighted)",
"search": {
"from": 0,
"size": 10,
"query": {
"field": "name",
"match": "long"
},
"highlight": {
"fields": ["name", "title"]
}
},
"result": {
"total_hits": 1,
"hits": [
{
"id": "b",
"fragments": {
"name": ["steve has a <mark>long</mark> name"],
"title": ["missess"]
}
}
]
}
},
{
"comment": "search and highlight an array field",
"search": {
"from": 0,
"size": 10,
"query": {
"field": "tags",
"match": "gopher"
},
"highlight": {
"fields": ["tags"]
}
},
"result": {
"total_hits": 1,
"hits": [
{
"id": "a",
"fragments": {
"tags": ["<mark>gopher</mark>"]
}
}
]
}
},
{
"comment": "reproduce bug in prefix search",
"search": {
"from": 0,
"size": 10,
"query": {
"field": "title",
"prefix": "miss"
}
},
"result": {
"total_hits": 1,
"hits": [
{
"id": "b"
}
]
}
},
{
"comment": "test match none",
"search": {
"from": 0,
"size": 10,
"query": {
"match_none": {}
}
},
"result": {
"total_hits": 0,
"hits": []
}
},
{
"comment": "test match all",
"search": {
"from": 0,
"size": 10,
"query": {
"match_all": {}
}
},
"result": {
"total_hits": 4,
"hits": [
{
"id": "a"
},
{
"id": "b"
},
{
"id": "c"
},
{
"id": "d"
}
]
}
},
{
"comment": "test doc id query",
"search": {
"from": 0,
"size": 10,
"query": {
"ids": ["b", "c"]
}
},
"result": {
"total_hits": 2,
"hits": [
{
"id": "b"
},
{
"id": "c"
}
]
}
},
{
"comment": "test query string MUST and SHOULD",
"search": {
"from": 0,
"size": 10,
"query": {
"query": "+age:>20 missess"
}
},
"result": {
"total_hits": 3,
"hits": [
{
"id": "b"
},
{
"id": "c"
},
{
"id": "d"
}
]
}
},
{
"comment": "test regexp matching term",
"search": {
"from": 0,
"size": 10,
"query": {
"field": "name",
"regexp": "mar.*"
}
},
"result": {
"total_hits": 1,
"hits": [
{
"id": "a"
}
]
}
},
{
"comment": "test regexp that should not match when properly anchored",
"search": {
"from": 0,
"size": 10,
"query": {
"field": "name",
"regexp": "mar."
}
},
"result": {
"total_hits": 0,
"hits": []
}
},
{
"comment": "test wildcard matching term",
"search": {
"from": 0,
"size": 10,
"query": {
"field": "name",
"wildcard": "mar*"
}
},
"result": {
"total_hits": 1,
"hits": [
{
"id": "a"
}
]
}
},
{
"comment": "test boost - term query",
"search": {
"from": 0,
"size": 10,
"query": {
"disjuncts": [
{
"field": "name",
"term": "marti",
"boost": 1.0
},
{
"field": "name",
"term": "steve",
"boost": 5.0
}
]
}
},
"result": {
"total_hits": 2,
"hits": [
{
"id": "b"
},
{
"id": "a"
}
]
}
},
{
    "comment": "test boost - fuzzy query",
"search": {
"from": 0,
"size": 10,
"query": {
"disjuncts": [
{
"field": "name",
"term": "marti",
"boost": 1.0
},
{
"fuzziness": 1,
"field": "name",
"term": "steve",
"boost": 5.0
}
]
}
},
"result": {
"total_hits": 2,
"hits": [
{
"id": "b"
},
{
"id": "a"
}
]
}
},
{
"comment": "test boost - numeric range query",
"search": {
"from": 0,
"size": 10,
"query": {
"disjuncts": [
{
"field": "name",
"term": "marti",
"boost": 1.0
},
{
"field": "age",
"min": 25,
"max": 29,
"boost": 50.0
}
]
}
},
"result": {
"total_hits": 2,
"hits": [
{
"id": "b"
},
{
"id": "a"
}
]
}
},
{
"comment": "test boost - regexp query",
"search": {
"from": 0,
"size": 10,
"query": {
"disjuncts": [
{
"field": "name",
"term": "marti",
"boost": 1.0
},
{
"field": "name",
"regexp": "stev.*",
"boost": 5.0
}
]
}
},
"result": {
"total_hits": 2,
"hits": [
{
"id": "b"
},
{
"id": "a"
}
]
}
},
{
"comment": "test wildcard inside query string",
"search": {
"from": 0,
"size": 10,
"query": {
"query": "name:mar*"
}
},
"result": {
"total_hits": 1,
"hits": [
{
"id": "a"
}
]
}
},
{
"comment": "test regexp inside query string",
"search": {
"from": 0,
"size": 10,
"query": {
"query": "name:/mar.*/"
}
},
"result": {
"total_hits": 1,
"hits": [
{
"id": "a"
}
]
}
},
{
"comment": "test term range",
"search": {
"from": 0,
"size": 10,
"query": {
"field": "title",
"max": "miz",
"min": "mis"
}
},
"result": {
"total_hits": 2,
"hits": [
{
"id": "a"
},
{
"id": "b"
}
]
}
}
]