Elasticsearch - Rolling up "other" results - c#

I'm trying to roll up some of my 'other' results using Elasticsearch. Ideally, I'd like my query to return the top N hits and then roll the rest of the data up into an N+1th hit titled "Other".
So for example, if I'm trying to aggregate "Institutions by Total Value", I'd get back 10 Institutions with the most value and then the total aggregated value of the other institutions as another record. The purpose is that I'd like to see the total value aggregated across all institutions but not have to list thousands.
An example search I've been using is:
GET my_index/institution/_search?pretty=true
{
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "must": [
            ... terms queries ...
          ]
        }
      }
    }
  },
  "aggs": {
    "dimension_type_name_agg": {
      "terms": {
        "field": "institution_name",
        "order": {
          "metric_sum_total_value_agg": "desc"
        },
        "size": 0
      },
      "aggs": {
        "metric_sum_total_value_agg": {
          "sum": {
            "field": "total_value"
          }
        },
        "metric_count_account_id_agg": {
          "value_count": {
            "field": "institution_id"
          }
        }
      }
    }
  }
}
I'm curious whether this can be done by modifying a query like the one given above. Also, I'm using C# and NEST/Elasticsearch.NET, so any tips on how this translates to that side are appreciated as well.
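No answer is recorded here for this question. One possible approach is to keep the query as it is (it already returns every bucket) and do the roll-up on the client. The sketch below is in Python purely for illustration; the same logic could be written with LINQ in C#. It assumes each bucket has the usual terms-aggregation shape (`key`, `doc_count`, and a sub-aggregation object with a `value`). The function name `roll_up_other` and the sample data are hypothetical.

```python
def roll_up_other(buckets, top_n, metric="metric_sum_total_value_agg"):
    """Keep the top_n buckets by metric value and fold the rest into one "Other" bucket."""
    ranked = sorted(buckets, key=lambda b: b[metric]["value"], reverse=True)
    top, rest = ranked[:top_n], ranked[top_n:]
    if rest:
        top.append({
            "key": "Other",
            "doc_count": sum(b["doc_count"] for b in rest),
            metric: {"value": sum(b[metric]["value"] for b in rest)},
        })
    return top


# Hypothetical buckets, shaped like the "buckets" array of a terms aggregation.
sample_buckets = [
    {"key": "A", "doc_count": 3, "metric_sum_total_value_agg": {"value": 500.0}},
    {"key": "B", "doc_count": 2, "metric_sum_total_value_agg": {"value": 300.0}},
    {"key": "C", "doc_count": 4, "metric_sum_total_value_agg": {"value": 120.0}},
    {"key": "D", "doc_count": 1, "metric_sum_total_value_agg": {"value": 80.0}},
]

rolled_up = roll_up_other(sample_buckets, top_n=2)
for bucket in rolled_up:
    print(bucket["key"], bucket["metric_sum_total_value_agg"]["value"])
```

The trade-off is that all buckets still travel over the wire; this only reduces what is shown to the user.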

Related

ElasticSearch NEST - FieldValueFactorFunction score function outputting invalid json query

Let me preface this by saying I'm new to Elasticsearch and NEST, and may be doing something wrong. This is using NEST 7.6.2.
I'm following the documentation to create a field_value_factor score function containing a filter and weight using object initializer syntax, i.e.:
new FieldValueFactorFunction
{
    Field = "foo",
    Modifier = FieldValueFactorModifier.Log1P,
    Missing = 1,
    Filter = new MatchQuery
    {
        Field = "bar",
        Query = "1"
    },
    Weight = .2
}
However, at runtime it appears to output invalid JSON in the query itself:
{
  "filter": {
    "match": {
      "bar": {
        "query": "1"
      }
    }
  },
  "field_value_factor": {
    "field": "foo",
    "missing": 1.0,
    "modifier": "log1p",
    "filter": {
      "match": {
        "bar": {
          "query": "1"
        }
      }
    },
    "weight": 0.2
  },
  "weight": 0.2
}
Which fails with the error field_value_factor query does not support [value]. I do know the valid function syntax I'm trying to emulate is the following:
{
  "filter": {
    "match": {
      "bar": {
        "query": "1"
      }
    }
  },
  "field_value_factor": {
    "field": "foo",
    "missing": 1.0,
    "modifier": "log1p"
  },
  "weight": 0.2
}
Is this a bug in NEST/Elasticsearch.net? Is my syntax incorrect? Is there an alternate way to do what I'm trying to do?
This was apparently an issue in the version of NEST I was using. Updating the nuget package resolved it.

Mongodb Lookup with sorting and grouping in C#

I have the following db config:
db={
  "order": [
    {
      "id": 1,
      "version": 1
    },
    {
      "id": 1,
      "version": 2
    },
    {
      "id": 2,
      "version": 1
    },
    {
      "id": 2,
      "version": 2
    }
  ],
  "orderDetail": [
    {
      "orderId": 1,
      "orderDate": new Date("2020-01-18T16:00:00Z")
    },
    {
      "orderId": 1,
      "orderDate": new Date("2020-01-11T16:00:00Z")
    },
    {
      "orderId": 1,
      "orderDate": new Date("2020-01-12T16:00:00Z")
    }
  ]
}
I'm using the fluent interface to perform a Lookup joining the orderDetails to the order collection (as shown in this post). Now that I have the join in place what's the best method to:
1. Sort the joined array such that the details are sorted by orderDate
2. Group the Orders (by OrderID) and sort by version to select the latest (largest Version #)
The workaround I implemented for #1 involves sorting the list after performing the lookup, but that's only because I wasn't able to apply a sort to the "as" of collection as part of the Lookup.
If anyone has any ideas, I'd appreciate it. Thanks!
If you are using MongoDB v3.6 or higher, you can use $lookup with uncorrelated subqueries, using the inner pipeline to achieve what you want.
Join Conditions and Uncorrelated Sub-queries
Since you didn't provide what collections or fields you are using, I will give a generic example:
db.customers.aggregate([
  {
    $lookup: {
      from: "orders",
      let: { customer_id: "$_id" },
      pipeline: [
        { $match: { $expr: { $eq: [ "$customer_id", "$$customer_id" ] } } },
        { $sort: { orderDate: -1 } }
      ],
      as: "orders"
    }
  }
]);
I hope that gives you a way to get where you want. =]
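To make the intended result concrete, here is a small in-memory sketch in Python using the sample data from the question. It is not MongoDB code; it only shows what the two requirements amount to: keep the highest version per order id, and attach that order's details sorted by orderDate. In a pipeline, the detail sort would sit inside the $lookup pipeline as above, and picking the latest version would need an additional sort-and-group step. The function name is hypothetical.

```python
from datetime import datetime

orders = [
    {"id": 1, "version": 1},
    {"id": 1, "version": 2},
    {"id": 2, "version": 1},
    {"id": 2, "version": 2},
]
order_details = [
    {"orderId": 1, "orderDate": datetime(2020, 1, 18, 16)},
    {"orderId": 1, "orderDate": datetime(2020, 1, 11, 16)},
    {"orderId": 1, "orderDate": datetime(2020, 1, 12, 16)},
]


def latest_orders_with_sorted_details(orders, details):
    """Return one entry per order id (highest version) with details sorted by date."""
    latest = {}
    for order in orders:
        current = latest.get(order["id"])
        if current is None or order["version"] > current["version"]:
            latest[order["id"]] = order
    result = []
    for order_id in sorted(latest):
        matched = sorted(
            (d for d in details if d["orderId"] == order_id),
            key=lambda d: d["orderDate"],
        )
        result.append({**latest[order_id], "details": matched})
    return result


joined = latest_orders_with_sorted_details(orders, order_details)
for row in joined:
    print(row["id"], row["version"], [d["orderDate"].day for d in row["details"]])
```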

ElasticSearch combining MultiMatch with Must

So I have this object model:
string Name; // name of the person
int Age; // age of the person
string CreatedBy; // operator who created person
My query sounds like this: all documents WHERE Age > 40 AND CreatedBy == 'callum' AND Name contains 'll'
CreatedBy is mandatory; it defines the scope of control.
Age is also mandatory (but isn't a security issue).
Name is where it can get fuzzy, because that is what the user is querying; it should behave roughly like a "contains" match.
The query below works for the first two parts:
"query": {
  "bool": {
    "must": [
      {
        "range": {
          "age": {
            "gt": 40
          }
        }
      },
      {
        "match": {
          "createdBy": "Callum"
        }
      }
    ]
  }
}
I tried adding a multi_match because ultimately it may be a search across Name, Address and other bits of information. I couldn't make sense of where to fit it in.
In my mind, nested queries would be useful: first filter out all irrelevant users, then filter out irrelevant ages, then do some fuzzier matching on the relevant fields.
So, the answer to this isn't straightforward.
First of all you need to create an Analyser for Compound Words.
So in the .NET client it looks like:
this.elasticClient.CreateIndex("customer", p => p
    .Settings(s => s
        .Analysis(a => a
            .TokenFilters(t => t
                .NGram("bigrams_filter", ng => ng
                    .MaxGram(2)
                    .MinGram(2)))
            .Analyzers(al => al
                .Custom("bigrams", l => l
                    .Tokenizer("standard")
                    .Filters("lowercase", "bigrams_filter"))))));
this.elasticClient.Map<Person>(m => m
    .Properties(props => props
        .String(s => s
            .Name(p => p.Name)
            .Index(FieldIndexOption.Analyzed)
            .Analyzer("bigrams"))
        .String(s => s
            .Name(p => p.CreatedBy)
            .NotAnalyzed())
        .Number(n => n
            .Name(p => p.Age))));
Which is a sort of direct translation of the first link provided. This now means that all names will be broken into their bigram representation:
Callum
ca
al
ll
lu
um
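As a rough illustration of what the analyzer produces (not how Elasticsearch implements it), the following Python sketch lowercases a single word and emits its overlapping two-character grams. It assumes the input is one token, which is what the standard tokenizer would yield for a name without spaces; the function name `bigrams` is hypothetical.

```python
def bigrams(token, n=2):
    """Lowercase a single token and return its overlapping n-character grams."""
    lowered = token.lower()
    return [lowered[i:i + n] for i in range(len(lowered) - n + 1)]


callum_grams = bigrams("Callum")
print(callum_grams)
```

A search for "ll" is analyzed the same way, so it matches because "ll" is one of the indexed grams.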
Then you need the actual query to take advantage of this. Now this is the bit I like: because we've set up that index on the name column, all term queries can have partial words in them. Take this for example (Sense query):
GET customer/_search
{
  "query": {
    "filtered": {
      "query": {
        "multi_match": {
          "query": "ll",
          "fields": ["name"]
        }
      },
      "filter": {
        "bool": {
          "must": [
            {
              "range": {
                "age": {
                  "gt": 40
                }
              }
            },
            {
              "match": {
                "createdBy": "Callum"
              }
            }
          ]
        }
      }
    }
  }
}
Here, we have a filtered query. So the query is always the first to be run (can't find documentation yet to cite that, but I have read it), and this will be the partial terms match. Then we simply filter - which is done after the query - to get the subset of results we need.
Because the ngrams analyser is only set on name that is the only column that will be partially matched against. So CreatedBy won't and thus we get our security around the results.
Basically what you can do is put the query into two blocks inside a filtered query:
"query": {
  "filtered": {
    "filter": {
      "bool": {
        "must": [
          {
            "range": {
              "age": {
                "gt": 40
              }
            }
          }
        ]
      }
    },
    "query": {
      "bool": {
        "must": [
          {
            "multi_match": {
              "query": "ll",
              "fields": [ "createdBy", "Address", "Name" ],
              "fuzziness": 2
            }
          }
        ]
      }
    }
  }
}
In the filter you can use conditions to filter things out, and then you can apply your multi_match query to the filtered data. The main reason I included age in the filter is that it doesn't need any kind of free-text search; it only needs to be checked against a static value. You can include more conditions within the must block of the filter.
You can also look into this article, which might give you some overview.
https://googleweblight.com/?lite_url=https://www.elastic.co/blog/found-optimizing-elasticsearch-searches&ei=EBaRAJDx&lc=en-IN&s=1&m=75&host=www.google.co.in&ts=1465153335&sig=APY536wHUUfGEjoafiVIzGx2H77aieiymw
Hope it helps!

ElasticSearch NEST - Find results with special characters

I am trying to write a search query on an Elasticsearch index that will return results matching any part of the field value.
I have a Path field that contains values like C:\temp\ab-cd\abc.doc
I want to be able to send a query that returns any document whose value contains the text I typed:
QueryContainer currentQuery = new QueryStringQuery
{
    DefaultField = "Path",
    Query = string.Format("*{0}*", "abc"),
};
The above will return results, this will not:
QueryContainer currentQuery = new QueryStringQuery
{
    DefaultField = "Path",
    Query = string.Format("*{0}*", "ab-cd"),
};
The same goes for any other special character like ##$%^&* and so on.
Is there some generic way to send a query and to find exactly what I searched?
Each of my fields is a multi-field, so I can use the *.raw options, but I do not know exactly how, or whether I should.
Use nGrams to split the text in smaller chunks and use term filter to query it. Pro: it should be faster. Con: the size of the index (disk space) will be larger because more terms (from nGram filter) are generated.
PUT /test
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_ngram_analyzer": {
          "tokenizer": "keyword",
          "filter": [
            "substring"
          ]
        }
      },
      "filter": {
        "substring": {
          "type": "nGram",
          "min_gram": 1,
          "max_gram": 50
        }
      }
    }
  },
  "mappings": {
    "test": {
      "properties": {
        "Path": {
          "type": "string",
          "index_analyzer": "my_ngram_analyzer",
          "search_analyzer": "keyword"
        }
      }
    }
  }
}
And the query:
GET /test/test/_search
{
  "query": {
    "term": {
      "Path": {
        "value": "\\temp"
      }
    }
  }
}
If you wish, you can use the config above as a sub-field for whatever mapping you already have.
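The idea behind this mapping can be sketched in a few lines of Python. This is only an illustration of the principle, not of Elasticsearch internals: the keyword tokenizer keeps the whole path as one token, the nGram filter emits every substring between min_gram and max_gram characters, and a term query then succeeds when the searched text is exactly one of those substrings. The helper names are hypothetical.

```python
def ngrams(value, min_gram=1, max_gram=50):
    """Return every substring of value whose length is within [min_gram, max_gram]."""
    return {
        value[start:start + size]
        for size in range(min_gram, max_gram + 1)
        for start in range(len(value) - size + 1)
    }


path = r"C:\temp\ab-cd\abc.doc"
indexed_terms = ngrams(path)


def term_matches(search_text):
    """Emulate a term query: exact lookup among the indexed grams, no analysis."""
    return search_text in indexed_terms


print(term_matches("ab-cd"))   # special characters are part of the grams
print(term_matches(r"\temp"))
print(term_matches("AB-CD"))   # no lowercase filter in this analyzer, so case matters
```

Note that substrings longer than max_gram are never indexed, so searches longer than 50 characters would not match with these settings.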
If you want to use query_string, there is one thing you need to be aware of: you need to escape special characters, for example -, \ and : (complete list here). Also, when indexing, the \ character needs escaping; otherwise it will cause an error. This is what I tested, specifically with query_string: https://gist.github.com/astefan/a52fa4989bf5298102d1
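A minimal sketch of such escaping, in Python for illustration: it backslash-escapes the characters that query_string treats as reserved and drops < and >, which cannot be escaped. The function name `escape_query_string` is hypothetical, and the character set is my reading of the reserved-character list, so check it against the documentation for your version.

```python
import re

# Characters that carry special meaning in query_string syntax.
_RESERVED = re.compile(r'[+\-=!(){}\[\]^"~*?:\\/&|]')


def escape_query_string(text):
    """Backslash-escape reserved characters; remove < and >, which cannot be escaped."""
    text = text.replace("<", "").replace(">", "")
    return _RESERVED.sub(lambda match: "\\" + match.group(0), text)


escaped = escape_query_string("ab-cd")
print("*" + escaped + "*")
```

Escaping only stops the parser from treating the characters as operators. Whether the query then matches still depends on how the field was analyzed, which is why a not-analyzed sub-field or the nGram mapping above may still be needed.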

Order by sum aggregation throws error in elastic search

I am trying to sort the aggregated result by applying another aggregation that does the summing and then ordering by that sum, descending.
If I try it like below, the aggregation result gets sorted by doc count.
"order": {
  "revrsenestedowners": "desc"
}
The code below shows the problem I am facing (field names have been changed just to illustrate the problem).
"machines" is my nested object, but "owners" is not nested; it belongs to the parent object.
I need to get the top 10 machine names by the owners' machine count (a sum is required because owners is a list and can have more than one value).
{
  "query": {
    "range": {
      "createdDate": {
        "gte": "2015-04-28T00:00:00",
        "lte": "2015-05-01T23:59:59"
      }
    }
  },
  "aggs": {
    "nestedagg": {
      "nested": {
        "path": "machines"
      },
      "aggs": {
        "terms": {
          "terms": {
            "field": "machines.machineName",
            "size": 10,
            "order": {
              "sumowners": "desc"
            }
          },
          "aggs": {
            "revrsenestedowners": {
              "reverse_nested": {},
              "aggs": {
                "sumowners": {
                  "sum": {
                    "field": "owners.machinesCount"
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
I require ordering by the sum, not by doc count.
For it to work, I may need something like:
"order": {
  "revrsenestedowners.sumowners": "desc"
}
Is there a way to achieve what I'm looking for?
Or is this a limitation of Elasticsearch, or a bug?
I'm stuck and would really appreciate any help.
I raised the same issue with the Elasticsearch project and they replied with the correct syntax:
https://github.com/elastic/elasticsearch/issues/11059
The answer is:
"order": {
  "revrsenestedowners > sumowners.value" : "desc"
}
I hope this helps others.
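To show how such an order path reads, here is a small Python sketch that resolves a path of the form "agg > sub_agg.metric" against a bucket from the response and sorts by it. It mimics the path syntax on the client side and is not Elasticsearch code; the helper name `resolve_order_path` and the sample buckets are hypothetical, and it assumes aggregation names contain no dots.

```python
def resolve_order_path(bucket, path):
    """Follow 'agg > sub_agg.metric' through a bucket and return the metric value."""
    agg_path, _, metric_key = path.partition(".")
    node = bucket
    for name in agg_path.split(">"):
        node = node[name.strip()]
    return node[metric_key or "value"]


# Hypothetical terms buckets, each with a reverse_nested sub-aggregation holding a sum.
machine_buckets = [
    {"key": "m1", "doc_count": 9, "revrsenestedowners": {"doc_count": 4, "sumowners": {"value": 3.0}}},
    {"key": "m2", "doc_count": 2, "revrsenestedowners": {"doc_count": 1, "sumowners": {"value": 10.0}}},
    {"key": "m3", "doc_count": 5, "revrsenestedowners": {"doc_count": 2, "sumowners": {"value": 7.0}}},
]

order_path = "revrsenestedowners > sumowners.value"
by_sum = sorted(
    machine_buckets,
    key=lambda bucket: resolve_order_path(bucket, order_path),
    reverse=True,
)
print([bucket["key"] for bucket in by_sum])
```

The ">" steps down into a single-bucket sub-aggregation and the part after "." names the metric to read, which is why a plain "sumowners" could not be found directly under the terms aggregation.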
