Order by sum aggregation throws error in elastic search - c#

I am trying to sort the aggregated result by applying another aggregation that does the summing and then applying order by descending to that sum.
if I try like below, the aggregation result get sorted by doc count.
"order": {
"revrsenestedowners": "desc"
}
Below code explains the problem am facing. (field names are changed just to illustrate the problem)
"machines" is my nested object but the "owners" is not nested and it belongs to parent object.
I need to get the top 10 machines name by owners machine count (require sum as the owners object are list and can have more than one value).
{
"query": {
"range": {
"createdDate": {
"gte": "2015-04-28T00:00:00",
"lte": "2015-05-01T23:59:59"
}
}
},
"aggs": {
"nestedagg": {
"nested": {
"path": "machines"
},
"aggs": {
"terms": {
"terms": {
"field": "machines.machineName",
"size": 10,
"order": {
"sumowners": "desc"
}
},
"aggs": {
"revrsenestedowners": {
"reverse_nested": {},
"aggs": {
"sumowners": {
"sum": {
"field": "owners.machinesCount"
}
}
}
}
}
}
}
}
}
}
I require the sum ordering and not the doc count ordering.
for it to work I may require something like :
"order": {
"revrsenestedowners.sumowners": "desc"
}
Is there a way to achieve what I'm looking for.
Or Is this the limitation with elastic search? or a bug?
I'm stuck and really appreciate any help

I raised same issue on elastic search forum and they replied with the correct syntax,
https://github.com/elastic/elasticsearch/issues/11059
The Answer is:
"order": {
"revrsenestedowners > sumowners.value" : "desc"
}
Hope it may help to others,

Related

ElasticSearch NEST - FieldValueFactorFunction score function outputting invalid json query

Let me preface this to say I'm newer to ElasticSearch and NEST, and may be doing something wrong. This is using NEST 7.6.2.
I'm following the documentation to create a field_value_factor score function containing a filter and weight using object initializer syntax, i.e.:
new FieldValueFactorFunction
{
Field = "foo",
Modifier = FieldValueFactorModifier.Log1P,
Missing = 1,
Filter = new MatchQuery
{
Field = "bar",
Query = "1"
},
Weight = .2
}
However, at runtime it appears to output an invalid json format in the query itself:
{
"filter": {
"match": {
"bar": {
"query": "1"
}
}
},
"field_value_factor": {
"field": "foo",
"missing": 1.0,
"modifier": "log1p",
"filter": {
"match": {
"bar": {
"query": "1"
}
}
},
"weight": 0.2
},
"weight": 0.2
}
Which fails with the error field_value_factor query does not support [value]. I do know the valid function syntax I'm trying to emulate is the following:
{
"filter": {
"match": {
"bar": {
"query": "1"
}
}
},
"field_value_factor": {
"field": "foo",
"missing": 1.0,
"modifier": "log1p"
},
"weight": 0.2
}
Is this a bug in NEST/Elasticsearch.net? Is my syntax incorrect? Is there an alternate way to do what I'm trying to do?
This was apparently an issue in the version of NEST I was using. Updating the nuget package resolved it.

Azure Cosmos DB Add Composite Index for Array of String

I am trying to add a new composite index to do multiple fields search.
I would like to know thing to consider while adding a new Composite index and will it work for array string?
Sample Cosmos Document
{
"id": "ed78b9b5-764b-4ebc-a4f2-6b764679",
"OrderReference": "X000011380",
"SetReferences": [
"000066474884"
],
"TransactionReference": "ed78b9b5-764b-4ebc-6b7644f06679",
"TransactionType": "Debit",
"Amount": 73.65,
"Currency": "USD",
"BrandCode": "TestBrand",
"PartitionKey": "Test-21052020-255",
"SettlementDateTime": "2020-05-21T04:35:35.133Z",
"ReasonCode": "TestReason",
"IsProcessed": true,
}
My Existing index policy
{
"indexingMode": "consistent",
"automatic": true,
"includedPaths": [
{
"path": "/PartitionKey/?"
},
{
"path": "/BrandCode/?"
}
],
"excludedPaths": [
{
"path": "/*"
},
{
"path": "/\"_etag\"/?"
}
],
"compositeIndexes": [
[
{
"path": "/PartitionKey",
"order": "ascending"
},
{
"path": "/IsProcessed",
"order": "ascending"
}
]
]
}
To fetch data from Array of string SettlementReferences, IsProcessed, ReasonCode.
SELECT * FROM c WHERE ARRAY_CONTAINS(c.SettlementReferences, '00884') and c.IsProcessed = true and c.ReasonCode = 'TestReason'
I am planning to add the following policy
{
"indexingMode": "consistent",
"automatic": true,
"includedPaths": [
{
"path": "/PartitionKey/?"
},
{
"path": "/BrandCode/?"
}
],
"excludedPaths": [
{
"path": "/*"
},
{
"path": "/\"_etag\"/?"
}
],
"compositeIndexes": [
[
{
"path": "/PartitionKey",
"order": "ascending"
},
{
"path": "/IsProcessed",
"order": "ascending"
}
],
[
{
"path": "/SettlementReferences",
"order": "ascending"
},
{
"path": "/IsProcessed",
"order": "ascending"
},
{
"path": "/ReasonCode",
"order": "ascending"
}
]
]
}
Please let me know if this change is sufficient?
Moreover, I tried to compare RU's before and after the change. I don't see any massive difference, both coming around 133.56 Rus.
Is there anything more I need to consider for optimized performance?
Composite Indexes will not help with this query and overall don't have any impact on equality statements. They are useful when doing order by's in your queries. This is why you don't see any RU/s reduction in your query. However, you will notice increased RU/s on writes.
If you want to improve your query performance you should add any properties in your where clauses into the "includedPaths" in your index policy.
Another thing to point out too is generally it is a best practice to by default index everything and selectively add properties to excludedPaths. This way, if your schema changes it will be indexed automatically without having to rebuilt your index.
As mark mentioned we need to add include path for Array "/SettlementReferences /[]/?". After adding my number of Ru's reduced from 115 to 5 ru's.

Mongodb Lookup with sorting and grouping in C#

I have the following db config:
db={
"order": [
{
"id": 1,
"version": 1
},
{
"id": 1,
"version": 2
},
{
"id": 2,
"version": 1
},
{
"id": 2,
"version": 2
}
],
"orderDetail": [
{
"orderId": 1,
"orderDate": new Date("2020-01-18T16:00:00Z")
},
{
"orderId": 1,
"orderDate": new Date("2020-01-11T16:00:00Z")
},
{
"orderId": 1,
"orderDate": new Date("2020-01-12T16:00:00Z")
}
]
}
I'm using the fluent interface to perform a Lookup joining the orderDetails to the order collection (as shown in this post). Now that I have the join in place what's the best method to:
Sort the joined array such that the details are sorted by orderDate
Group the Orders (by OrderID) and sort by version to select the latest (largest Version #)
The workaround I implemented for #1 involves sorting the list after performing the lookup, but that's only because I wasn't able to apply a sort to the "as" of collection as part of the Lookup.
If anyone has any ideas, I'd appreciate it. Thanks!
If you are using MongoDB v3.6 or higher, you can use the $lookup with uncorrelated subqueries to use the inner pipelines to archive what you want.
Join Conditions and Uncorrelated Sub-queries
Since you didn't provide what collections or fields you are using, I will give a generic example:
db.customers.aggregate([
{
$lookup: {
from: "orders",
let: { customer_id: "$_id" },
pipeline: [
{ $match: { $expr: { $eq: [ "$customer_id", "$$customer_id" ] } } },
{ $sort: { orderDate: -1 } }
],
as: "orders"
}
}
]);
I hope that gives you a way to get where you want. =]

ElasticSearch combining MultiMatch with Must

So I have this object model:
string Name; // name of the person
int Age; // age of the person
string CreatedBy; // operator who created person
My query sounds like this: all documents WHERE Age > 40 AND CreatedBy == 'callum' AND Name contains 'll'
CreatedBy is a necessary, scope of control.
Age is also a necessary (but isn't a security issue)
Name is where it can get fuzzy, because that is what the user is querying. Akin to sort of contains
The query below works for the first two parts:
"query": {
"bool": {
"must": [
{
"range": {
"age": {
"gt": 40
}
}
},
{
"match": {
"createdBy": "Callum"
}
}
]
}
}
I tried adding a multi_match because ultimately it maybe a search across Name, Address and other bits of information. I couldn't make sense of where to fit it in.
In my, nested queries would be useful. So first filter out all irrelevant users, then filter out irrelevant ages. Then do some fuzzier matching on relevant fields.
So, the answer to this isn't straightforward.
First of all you need to create an Analyser for Compound Words.
So in the .NET client it looks like:
this.elasticClient.CreateIndex("customer", p => p
.Settings(s => s
.Analysis(a => a
.TokenFilters(t => t
.NGram("bigrams_filter", ng => ng
.MaxGram(2)
.MinGram(2)))
.Analyzers(al => al
.Custom("bigrams", l => l
.Tokenizer("standard")
.Filters("lowercase", "bigrams_filter"))))));
this.elasticClient.Map<Person>(m => m
.Properties(props => props
.String(s => s
.Name(p => p.Name)
.Index(FieldIndexOption.Analyzed)
.Analyzer("bigrams"))
.String(s => s
.Name(p => p.CreatedBy)
.NotAnalyzed())
.Number(n => n
.Name(p => p.Age))));
Which is a sort of direct translation of the the first link provided. This now means that all names will be broken into their bigram representation:
Callum
ca
al
ll
lu
um
Then you need the actual query to take advantage of this. Now this is bit I like, because we've set up that index on the name column, it means that all term queries can have partial words in them, so take this for example (Sense query):
GET customer/_search
{
"query": {
"filtered": {
"query": {
"multi_match": {
"query": "ll",
"fields": ["name"]
}
},
"filter": {
"bool": {
"must": [
{
"range": {
"age": {
"gt": 40
}
}
},
{
"match": {
"createdBy": "Callum"
}
}
]
}
}
}
}
}
Here, we have a filtered query. So the query is always the first to be run (can't find documentation yet to cite that, but I have read it), and this will be the partial terms match. Then we simply filter - which is done after the query - to get the subset of results we need.
Because the ngrams analyser is only set on name that is the only column that will be partially matched against. So CreatedBy won't and thus we get our security around the results.
Basically what you can do is put the query into two blocks:
"query": {
"filter":{
"bool":
{
"must":[
{
"range": {
"age": {
"gt": 40
}
}
}
]
}
},
"query":{
"bool": {
"must": [
{
"multi_match" : {
"query": "ll",
"fields": [ "createdBy", "Address","Name" ] ,
"fuzziness":2
}
}
]
}
}
}
What you can do is in filter you can use condtions to filter things out, on then with the filtered data you can apply you multi-match query. The main reason why I included age in filter is because you dont need to perform any kind of free text search, you just need to check with a static value, you can include more conditions within the must block of filter.
You can also look into this article, which might give you some overview.
https://googleweblight.com/?lite_url=https://www.elastic.co/blog/found-optimizing-elasticsearch-searches&ei=EBaRAJDx&lc=en-IN&s=1&m=75&host=www.google.co.in&ts=1465153335&sig=APY536wHUUfGEjoafiVIzGx2H77aieiymw
Hope it helps!

Elasticsearch - Rolling up "other" results

I'm trying to rollup some of my 'other' results using Elasticsearch. Ideally, I'd like my query to return the top N hits and then roll the rest of the data up into an N+1 hit titled "Other".
So for example, if I'm trying to aggregate "Institutions by Total Value", I'd get back 10 Institutions with the most value and then the total aggregated value of the other institutions as another record. The purpose is that I'd like to see the total value aggregated across all institutions but not have to list thousands.
An example search I've been using is:
GET my_index/institution/_search?pretty=true
{
"query": {
"filtered": {
"filter": {
"bool": {
"must": [
... terms queries ...
]
}
}
}
},
"aggs": {
"dimension_type_name_agg": {
"terms": {
"field": "institution_name",
"order": {
"metric_sum_total_value_agg": "desc"
},
"size": 0
},
"aggs": {
"metric_sum_total_value_agg": {
"sum": {
"field": "total_value"
}
},
"metric_count_account_id_agg": {
"value_count": {
"field": "institution_id"
}
}
}
}
}
}
I'm curious as to if this can be done by modifying a query like the one given above. Also, I'm using C# and Nest/Elasticsearch.NET so any tips on how this translates to that side is appreciated as well.

Categories

Resources