Elasticsearch: Get the ordering number of a specific item - c#

Among the companies in the same industry, I need to get the top 5 companies by revenue, and also find out where a particular company ranks within that industry.
It is easy to write the first query:
GET myIndex/_search
{
"from": 0,
"size": 5,
"query": {
"match": {
"industryCode": "xxxx"
}
},
"sort": [
{
"revenue": {
"order": "desc"
}
}
]
}
But I don't know how to write the second query. Currently, I have to use the scroll function to scan all records of companies in the same industry, like this:
async Task<int> GetRank()
{
int rank = 0;
searchRequest.Size = 500;
searchRequest.From = 0;
searchRequest.Scroll = "1m";
var rs = await _elasticClient.SearchAsync<Tmp>(searchRequest);
while (rs.Documents.Count > 0)
{
foreach (var item in rs.Documents)
{
rank++;
if (item.OrganCode == request.OrganCode) return rank;
}
rs = _elasticClient.Scroll<Tmp>("1m", rs.ScrollId);
}
return rank;
}
This approach is very slow: if the company has very low revenue, it may take several minutes to produce a result. Is there any way to solve this problem? Thank you very much!

If I understand your question correctly, you want to get the top 5 companies with the highest revenue, grouped by industry code. This can be done with a terms aggregation and a top_hits sub-aggregation:
{
"aggs": {
"industry_codes": {
"aggs": {
"top_companies": {
"top_hits": {
"size": 5,
"sort": [
{
"revenue": {
"order": "desc"
}
}
]
}
}
},
"terms": {
"field": "industryCode"
}
}
},
"size": 0
}
In NEST, this would look something like
var client = new ElasticClient(settings);
var searchResponse = client.Search<Tmp>(s => s
.Size(0)
.Aggregations(a => a
.Terms("industry_codes", t => t
.Field(f => f.IndustryCode)
.Aggregations(aa => aa
.TopHits("top_companies", th => th
.Sort(so => so
.Descending(f => f.Revenue)
)
.Size(5)
)
)
)
)
);
To get the top hits for each industry code:
var termsAgg = searchResponse.Aggregations.Terms("industry_codes");
foreach (var bucket in termsAgg.Buckets)
{
var topHits = bucket.TopHits("top_companies");
foreach (var company in topHits.Documents<Tmp>())
{
// do something with company
}
}
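For the second part of the question (the rank of one particular company), scanning every record should not be necessary. A company's rank is 1 plus the number of companies in the same industry with strictly higher revenue, so a single count is enough. The sketch below shows only that counting idea on in-memory data; the Company class and the RankSketch helper are hypothetical names introduced for illustration. Against Elasticsearch, the same number could presumably be obtained with one count request that filters on industryCode and applies a range query (gt) on revenue, after first fetching the target company's revenue.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-in for the indexed document type.
public sealed class Company
{
    public Company(string organCode, string industryCode, decimal revenue)
    {
        OrganCode = organCode;
        IndustryCode = industryCode;
        Revenue = revenue;
    }

    public string OrganCode { get; }
    public string IndustryCode { get; }
    public decimal Revenue { get; }
}

public static class RankSketch
{
    // Rank = 1 + number of companies in the same industry with strictly
    // higher revenue. Returns -1 when the company is not found.
    public static int GetRank(IEnumerable<Company> companies, string organCode)
    {
        var all = companies.ToList();
        var target = all.FirstOrDefault(c => c.OrganCode == organCode);
        if (target == null) return -1;

        return 1 + all.Count(c =>
            c.IndustryCode == target.IndustryCode && c.Revenue > target.Revenue);
    }
}
```

With this definition, companies with equal revenue share the same rank. If ties need a deterministic order, a secondary sort key would have to be part of both the top 5 query and the count.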

Related

EF Core LINQ groups results of the query incorrectly

I have an entity like this:
public class Vehicle
{
public long Id { get; set; }
public string RegistrationNumber { get; set; }
public string Model { get; set; }
public string Code { get; set; }
//other properties
}
It has a unique constraint on { RegistrationNumber, Model, Code, /*two other properties*/ }.
I'm trying to query the database to get an object that's structured like this:
[
{
"name": "Model1",
"codes": [
{
"name": "AAA",
"registrationNumbers": ["2", "3"]
},
{
"name":"BBB",
"registrationNumbers": ["3", "4"]
}
]
},
{
"name": "Model2",
"codes": [
{
"name": "BBB",
"registrationNumbers": ["4", "5"]
}
]
}
]
I.e. a list of Models, where each model has a list of Codes that can co-appear with it, and each code has a list of Registration Numbers that can appear with that Model and that Code.
I'm doing a LINQ like this:
var vehicles = _context.Vehicles.Where(/*some additional filters*/);
return await vehicles.Select(v => v.Model).Distinct().Select(m => new ModelFilterDTO()
{
Name = m,
Codes = vehicles.Where(v => v.Model== m).Select(v => v.Code).Distinct().Select(c => new CodeFilterDTO()
{
Name = c,
RegistrationNumbers = vehicles.Where(v => v.Model == m && v.Code == c).Select(v => v.RegistrationNumber).Distinct()
})
}).ToListAsync();
Which gets translated into this SQL query:
SELECT [t].[Model], [t2].[Code], [t2].[RegistrationNumber], [t2].[Id]
FROM (
SELECT DISTINCT [v].[Model]
FROM [Vehicles] AS [v]
WHERE --additional filtering
) AS [t]
OUTER APPLY (
SELECT [t0].[Code], [t1].[RegistrationNumber], [t1].[Id]
FROM (
SELECT DISTINCT [v0].[Code]
FROM [Vehicles] AS [v0]
WHERE /* additional filtering */ AND ([v0].[Model] = [t].[Model])
) AS [t0]
LEFT JOIN (
SELECT DISTINCT [v1].[RegistrationNumber], [v1].[Id], [v1].[Code]
FROM [Vehicles] AS [v1]
WHERE /* additional filtering */ AND ([v1].[Model] = [t].[Model])
) AS [t1] ON [t0].[Code] = [t1].[Code]
) AS [t2]
ORDER BY [t2].[Id]
Running this query in the SQL Server gets me correct sets of values. But when I perform the LINQ, I get an object like this:
[
{
"name": "Model1",
"codes": [
{
"name": "AAA",
"registrationNumbers": [/* every single registration number that is present among the records that passed the filters*/]
}
]
}
]
What might the problem be, and how can I fix it?
Edit: After playing with it for a bit, I'm even more confused than I was.
This LINQ:
var vehicles = _context.Vehicles.Where(/*some additional filters*/);
return await vehicles.Select(v => v.Model).Distinct().Select(m => new ModelFilterDTO()
{
Name = m
}).ToListAsync();
Gives the expected result:
[
{
"name": "Model1"
},
{
"name": "Model2"
},
...
]
However, this LINQ:
var vehicles = _context.Vehicles.Where(/*some additional filters*/);
return await vehicles.Select(v => v.Model).Distinct().Select(m => new ModelFilterDTO()
{
Name = m,
Codes = vehicles.Select(v=>v.Code).Distinct().Select(c => new CodeFilterDTO()
{
Name = c
})
}).ToListAsync();
Gives result like this:
[
{
"name": "Model1",
"codes": [
{
"name": "AAA"
}
]
}
]
Take a look at the GroupBy operator. With two levels of grouping you can get the desired result:
var rawData = await _context.Vehicles
.Where(/*some additional filters*/)
.Select(v => new
{
v.Model,
v.RegistrationNumber,
v.Code
})
.ToListAsync(); // materialize minimum data
// perform grouping on the client side
var result = rawData
.GroupBy(v => v.Model)
.Select(gm => new ModelFilterDTO
{
Name = gm.Key,
Codes = gm
.GroupBy(x => x.Code)
.Select(gc => new CodeFilterDTO
{
Name = gc.Key,
RegistrationNumbers = gc.Select(x => x.RegistrationNumber).Distinct().ToList() // Distinct, as in the original query
}).ToList()
})
.ToList();

C# MongoDB Filter returns the whole object

I'm trying to create a MongoDB filter in C#.
For example, I have a JSON object like this:
{
"Username": "Tinwen",
"Foods": [
{
"Fruit": "Apple",
"Amount": 1
},
{
"Fruit": "Banana",
"Amount": 2
},
{
"Fruit": "Mango",
"Amount": 3
},
{
"Fruit": "Strawberry",
"Amount": 3
}
]
}
And I want to create a filter that returns only the objects in the array with Amount == 2 || Amount == 3:
{
"Username": "Tinwen",
"Foods": [
{
"Fruit": "Banana",
"Amount": 2
},
{
"Fruit": "Mango",
"Amount": 3
},
{
"Fruit": "Strawberry",
"Amount": 3
}
]
}
I've already tried a filter like this:
var amountFilter= Builders<MyObject>.Filter.ElemMatch(
m => m.Foods,
f => f.Amount == 2 || f.Amount == 3);
And this one:
var expected = new List<int>();
expected.Add(2);
expected.Add(3);
var amountFilter = Builders<MyObject>.Filter.And(Builders<MyObject>.Filter.ElemMatch(
x => x.Foods, Builders<Foods>.Filter.And(
Builders<Foods>.Filter.In(y => y.Amount, expected))));
But every time it returns the whole object (with the full array).
For now I'm using LINQ like this:
List<MyObject> res = _messageCollection.Find(amountFilter).ToEnumerable().ToList();
foreach (var msg in res)
{
// Iterate backwards so that RemoveAt does not shift elements that have not been visited yet
for (int j = msg.Foods.Count - 1; j >= 0; j--)
{
if (!expected.Contains(msg.Foods[j].Amount))
{
msg.Foods.RemoveAt(j);
}
}
}
for (int i = res.Count - 1; i >= 0; i--)
{
if (res[i].Foods.Count == 0)
{
res.RemoveAt(i);
}
}
But I'm pretty sure it can be done using a MongoDB filter (and the LINQ approach is pretty bad anyway). Does anyone have an answer that can help me?
You can do it with LINQ as follows, but you have to make the Foods property an IEnumerable<Food>:
public class User
{
public string Username { get; set; }
public IEnumerable<Food> Foods { get; set; }
}
The following query will do a $filter projection on the Foods array/list:
var result = await collection
.AsQueryable()
.Where(x => x.Foods.Any(f => f.Amount == 2 || f.Amount == 3))
.Select(x => new User
{
Username = x.Username,
Foods = x.Foods.Where(f => f.Amount == 2 || f.Amount == 3)
})
.ToListAsync();

Grouping data based on date entity framework and LINQ

I have an array of analytic events in my database and I would like to send this data, grouped by date, to my client app.
The data from the db looks something like this (but with hundreds of records):
[
{
"DateAdded": "2006-12-30 00:38:54",
"Event": "click",
"Category": "externalWebsite"
},
{
"DateAdded": "2006-07-20 00:36:44",
"Event": "click",
"Category": "social"
},
{
"DateAdded": "2006-09-20 00:36:44",
"Event": "click",
"Category": "social"
},
{
"DateAdded": "2006-09-22 00:12:34",
"Event": "load",
"Category": "profile"
}
]
What I would like to do is return the count of, say, all 'social' 'click' events, broken down by month, so that it looks like this:
[
{
"name": "socialclicks",
"series": [
{
"count": 259,
"name": "Jan"
},
{
"count": 0,
"name": "Feb"
},
{
"count": 52,
"name": "Mar"
}
... etc, etc up to Dec <====
]
}
]
So, what I have been trying is to get all the records that are associated with a particular user using their id. This is simple.
Now I need to split those records into monthly counts covering the last 12 months from the current month (returning 0 if a month has no records). This is proving to be complicated and difficult.
My approach was this:
var records = context.records.Where(r => r.Id == userId).ToList();
var jan
var feb
var mar
var apr
... etc, etc
for (int i = 0; i < records.Count ; i++)
{
if (record.DateAdded > "2005-12-31 00:00.00" && record.DateAdded < "2006-01-31 00:00.00") {
jan++;
}
if (record.DateAdded > "2006-01-31 00:00.00" && record.DateAdded < "2006-02-28 00:00.00") {
feb++;
}
...etc, etc
}
Then I use these variables for the counts and hard-code the names in the returned data.
As you can see, there are lots of "etc, etc" because the code has become ridiculous!
There must be a simpler way to do this, but I can't seem to find one!
Any assistance would be appreciated.
Thanks
The first thing to do is group all your data by the two properties you're interested in, Event and Category. For example:
var partialResult = entries.GroupBy(x => new {
x.Event,
x.Category
});
From there, when you project your result, you can group again by month and year. An anonymous object is used here for the demo, but you could easily define a struct or class instead:
var result = entries.GroupBy(x => new {
x.Event,
x.Category
}).Select(g => new {
g.Key.Event,
g.Key.Category,
Series = g.GroupBy(x => new {x.DateAdded.Month, x.DateAdded.Year})
.Select(i => new{
i.Key.Month,
i.Key.Year,
Count = i.Count()
}).ToArray()
});
foreach(var item in result)
{
Console.WriteLine($"Event:{item.Event} Category:{item.Category}");
foreach(var serie in item.Series)
Console.WriteLine($"\t{CultureInfo.CurrentCulture.DateTimeFormat.GetMonthName(serie.Month)}{serie.Year} Count={serie.Count}");
}
Edit: To satisfy your requirement that "if the month doesn't exist return 0", you need to add a few complexities. First, a method that works out all the Month/Year combinations between two dates:
private static IEnumerable<(int Month, int Year)> MonthsBetween(
DateTime startDate,
DateTime endDate)
{
DateTime iterator;
DateTime limit;
if (endDate > startDate)
{
iterator = new DateTime(startDate.Year, startDate.Month, 1);
limit = endDate;
}
else
{
iterator = new DateTime(endDate.Year, endDate.Month, 1);
limit = startDate;
}
while (iterator < limit)
{
yield return (iterator.Month,iterator.Year);
iterator = iterator.AddMonths(1);
}
}
Also you'll need some kind of range to both calculate all the months between, as well as filter your original query:
var dateRangeStart = DateTime.Parse("2006-01-01");
var dateRangeEnd = DateTime.Parse("2007-01-01");
var monthRange = MonthsBetween(dateRangeStart,dateRangeEnd);
var results = entries.Where(e => e.DateAdded>=dateRangeStart && e.DateAdded<dateRangeEnd)
..... etc
And then, when outputting results you need to effectively do a left join onto your list of years/months. For some reason this is easier using query syntax than lambda.
foreach(var item in results)
{
Console.WriteLine($"Event:{item.Event} Category:{item.Category}");
var joinedSeries = from month in monthRange
join serie in item.Series
on new{month.Year, month.Month} equals new {serie.Year, serie.Month} into joined
from data in joined.DefaultIfEmpty()
select new {
Month = data == null ? month.Month : data.Month,
Year = data == null ? month.Year : data.Year,
Count = data == null ? 0 : data.Count
};
foreach(var serie in joinedSeries)
Console.WriteLine($"\t{CultureInfo.CurrentCulture.DateTimeFormat.GetMonthName(serie.Month)}{serie.Year} Count={serie.Count}");
}
Live example: https://dotnetfiddle.net/K7ZoJN
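For the specific shape asked for in the question (the last 12 months counted back from the current month, with 0 for empty months and short month names), the steps above can be condensed into one helper. This is a minimal sketch under a few assumptions: the events have already been filtered to one Event/Category pair and reduced to their DateAdded values, and MonthCount and MonthlySeriesSketch are hypothetical names introduced for illustration. It swaps the left join for a dictionary lookup, which is a different technique from the one in the answer.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical DTO matching the "series" entries in the question.
public sealed class MonthCount
{
    public string Name { get; set; } = "";
    public int Count { get; set; }
}

public static class MonthlySeriesSketch
{
    // Builds 12 entries, oldest first, ending with the month that contains 'now'.
    public static List<MonthCount> LastTwelveMonths(IEnumerable<DateTime> dates, DateTime now)
    {
        // First day of the month 11 months before the current month.
        var firstMonth = new DateTime(now.Year, now.Month, 1).AddMonths(-11);
        var end = firstMonth.AddMonths(12); // exclusive upper bound

        // Count events per (year, month) once; months without events are absent.
        var counts = dates
            .Where(d => d >= firstMonth && d < end)
            .GroupBy(d => (d.Year, d.Month))
            .ToDictionary(g => g.Key, g => g.Count());

        // Walk all 12 months and fall back to 0 where no events were counted.
        return Enumerable.Range(0, 12)
            .Select(i => firstMonth.AddMonths(i))
            .Select(m => new MonthCount
            {
                Name = CultureInfo.InvariantCulture.DateTimeFormat
                    .GetAbbreviatedMonthName(m.Month),
                Count = counts.TryGetValue((m.Year, m.Month), out var c) ? c : 0
            })
            .ToList();
    }
}
```

Passing now as a parameter rather than reading DateTime.Now inside the method keeps the helper deterministic, which makes it easier to test.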

How to prioritize based on range/filter of Elastic Search DSL, such that a list can be filtered, first to show with availability > 60% first and then <

The applicants need to be sorted by relevance, based on their availability percentage in the month.
Applicants with an availability of more than 60% should come first, followed by applicants with an availability of less than 60%.
The fluent DSL queries I have tried with ElasticSearch.net are:
var response = await
_elasticClient.SearchAsync<ApplicantsWithDetailsResponse>(s =>
s.Aggregations(a => a
.Filter("higer_average", f => f.Filter(fd => fd.Range(r => r.Field(p
=> p.AvailablePercentage).GreaterThanOrEquals(60).Boost(5))))
.Filter("lower_average", f => f.Filter(fd => fd.Range(r => r.Field(p
=> p.AvailablePercentage).GreaterThan(0).LessThan(60).Boost(3)))
)));
or
var response = await _elasticClient.SearchAsync<ApplicantsWithDetailsResponse>(
s => s
.Query(q => q
.Bool(p =>
p.Must(queryFilter => queryFilter.MatchAll())
.Filter(f => f.Range(r => r.Field("AvailablePercentage").GreaterThanOrEquals(60)))
.Boost(5)
.Filter(f => f.Range(r => r.Field("AvailablePercentage").GreaterThan(0).LessThan(60)))
.Boost(1.2)
)));
The resulting list of applicants does not follow this logic; the two groups get mixed.
Even if I try to filter to show only values greater than 60, that does not work either.
Your query is not correct; it serializes to
{
"query": {
"bool": {
"boost": 1.2,
"filter": [
{
"range": {
"AvailablePercentage": {
"gt": 0.0,
"lt": 60.0
}
}
}
],
"must": [
{
"match_all": {}
}
]
}
}
}
There are three problems:
1. The boost is applied to the entire bool query.
2. The last Filter assigned overwrites any previous filters.
3. Filters are ANDed, so all of them would need to be satisfied for a match.
It's useful during development to be able to observe what JSON the client sends to Elasticsearch. There are numerous ways to do this, and one that is useful is
var defaultIndex = "default-index";
var pool = new SingleNodeConnectionPool(new Uri("http://localhost:9200"));
var settings = new ConnectionSettings(pool)
.DefaultIndex(defaultIndex)
.DisableDirectStreaming()
.PrettyJson()
.OnRequestCompleted(callDetails =>
{
if (callDetails.RequestBodyInBytes != null)
{
Console.WriteLine(
$"{callDetails.HttpMethod} {callDetails.Uri} \n" +
$"{Encoding.UTF8.GetString(callDetails.RequestBodyInBytes)}");
}
else
{
Console.WriteLine($"{callDetails.HttpMethod} {callDetails.Uri}");
}
Console.WriteLine();
if (callDetails.ResponseBodyInBytes != null)
{
Console.WriteLine($"Status: {callDetails.HttpStatusCode}\n" +
$"{Encoding.UTF8.GetString(callDetails.ResponseBodyInBytes)}\n" +
$"{new string('-', 30)}\n");
}
else
{
Console.WriteLine($"Status: {callDetails.HttpStatusCode}\n" +
$"{new string('-', 30)}\n");
}
});
var client = new ElasticClient(settings);
This will write all requests and responses out to the Console. Note that you may not want to do this in production for every request, as there is a performance overhead in buffering requests and responses in this way.
Your query should look something like
var response = client.Search<ApplicantsWithDetailsResponse>(s => s
.Query(q => q
.Bool(p => p
.Must(queryFilter => queryFilter
.MatchAll()
)
.Should(f => f
.Range(r => r
.Field("AvailablePercentage")
.GreaterThanOrEquals(60)
.Boost(5)
), f => f
.Range(r => r
.Field("AvailablePercentage")
.GreaterThan(0)
.LessThan(60)
.Boost(1.2)
)
)
.MinimumShouldMatch(1)
)
)
);
Which emits the following query
{
"query": {
"bool": {
"minimum_should_match": 1,
"must": [
{
"match_all": {}
}
],
"should": [
{
"range": {
"AvailablePercentage": {
"boost": 5.0,
"gte": 60.0
}
}
},
{
"range": {
"AvailablePercentage": {
"boost": 1.2,
"gt": 0.0,
"lt": 60.0
}
}
}
]
}
}
}
Combine the range queries as should clauses and specify that at least one must match using MinimumShouldMatch. This is needed because of the presence of a must clause: with a must clause, the should clauses act only as a boosting signal, and a document does not have to satisfy any of them to be considered a match. With MinimumShouldMatch set to 1, at least one of the should clauses has to be satisfied for a document to match.
Since the must clause is a match_all query in this case, we could simply omit it and remove MinimumShouldMatch. A should clause without a must clause implies that at least one of the clauses must match.
We can also combine queries using operator overloading, for brevity. The final query would look like
var response = client.Search<ApplicantsWithDetailsResponse>(s => s
.Query(q => q
.Range(r => r
.Field("AvailablePercentage")
.GreaterThanOrEquals(60)
.Boost(5)
) || q
.Range(r => r
.Field("AvailablePercentage")
.GreaterThan(0)
.LessThan(60)
.Boost(1.2)
)
)
);
which emits the query
{
"query": {
"bool": {
"should": [
{
"range": {
"AvailablePercentage": {
"boost": 5.0,
"gte": 60.0
}
}
},
{
"range": {
"AvailablePercentage": {
"boost": 1.2,
"gt": 0.0,
"lt": 60.0
}
}
}
]
}
}
}

NEST FunctionScore() returns all indexed items before adding functions, throws exceptions after adding them

Alright, so the query works perfectly in Sense in Chrome. I use the following query:
{
"size":127,
"query": {
"function_score": {
"query": {
"bool": {
"must": [
{
"prefix": {
"name": {
"value": "incomp"
}
}
},
{
"match": {
"name": "a word that is"
}
}
]
}
},
"functions": [
{
"exp": {
"date": {
"origin": "now/d",
"scale": "3w",
"offset": "10d",
"decay": "0.88"
}
}
}
]
}
}
}
In short, I match on the indexed "name" property of a custom type in ES, giving priority to recently added items and supporting "suggestions as you type" - thus the prefix query. It works perfectly well, tuned as it is, so my next step would be to reproduce in NEST.
However, I'm facing some issues with the .NET NEST code below:
var results4 = _client.Search<customDataType>(
s => s.Size(5030)
.Query(q => q
.FunctionScore(fs => fs
.Name("another_named_query")
.BoostMode(FunctionBoostMode.Multiply)
.ScoreMode(FunctionScoreMode.Multiply)
.Query(qu => qu
.Bool(b => b
.Must(m => m
.Prefix(p => p
.Field(ff => ff.Name)
.Value(prefixVal)))
.Must(m2 => m2
.Match(mh => mh
.Field(f2 => f2.Name)
.Query(stringBeforePrefixVal)))))
/*.Functions( fcs => fcs.ExponentialDate(
exp => exp
.Origin(DateMath.Now)
.Scale(new Time(1814400000))
.Offset(new Time(864000000))
.Decay(0.88d))
)*/)));
I can't figure out why any attempt to use the "FunctionScore" method results in what a MatchAll() would do - all records are returned.
Meanwhile, when adding the Functions (commented above) I get an UnexpectedElasticsearchClientException with a NullReference inner exception at Nest.FieldResolver.Resolve(Field field) in C:\code\elasticsearch-net\src\Nest\CommonAbstractions\Infer\Field\FieldResolver.cs:line 31.
I'm baffled by all of this, and there don't seem to be similar problems that I can use as a starting point. Is there anything I can do to get the query above running, or should I resort to manually doing a restful API call?
Almost correct, but you're missing the field on which the exponential date decay function should run. Assuming your POCO looks like
public class customDataType
{
public string Name { get; set; }
public DateTime Date { get; set; }
}
the query would be
var prefixVal = "incomp";
var stringBeforePrefixVal = "a word that is";
var results4 = client.Search<customDataType>(s => s
.Size(5030)
.Query(q => q
.FunctionScore(fs => fs
.Name("another_named_query")
.BoostMode(FunctionBoostMode.Multiply)
.ScoreMode(FunctionScoreMode.Multiply)
.Query(qu => qu
.Bool(b => b
.Must(m => m
.Prefix(p => p
.Field(ff => ff.Name)
.Value(prefixVal)))
.Must(m2 => m2
.Match(mh => mh
.Field(f2 => f2.Name)
.Query(stringBeforePrefixVal)))))
.Functions(fcs => fcs
.ExponentialDate(exp => exp
.Field(f => f.Date)
.Origin("now/d")
.Scale("3w")
.Offset("10d")
.Decay(0.88)
)
)
)
)
);
which yields
{
"size": 5030,
"query": {
"function_score": {
"_name": "another_named_query",
"query": {
"bool": {
"must": [
{
"match": {
"name": {
"query": "a word that is"
}
}
}
]
}
},
"functions": [
{
"exp": {
"date": {
"origin": "now/d",
"scale": "3w",
"offset": "10d",
"decay": 0.88
}
}
}
],
"score_mode": "multiply",
"boost_mode": "multiply"
}
}
}
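Note that the emitted bool query contains only the match clause: the second Must call overwrites the first, so the prefix query is lost. To keep both, pass both queries to a single Must call, or take advantage of operator overloading in NEST and && the prefix and match queries, which also shortens the bool query further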
var results4 = client.Search<customDataType>(s => s
.Size(5030)
.Query(q => q
.FunctionScore(fs => fs
.Name("another_named_query")
.BoostMode(FunctionBoostMode.Multiply)
.ScoreMode(FunctionScoreMode.Multiply)
.Query(qu => qu
.Prefix(p => p
.Field(ff => ff.Name)
.Value(prefixVal)
) && qu
.Match(mh => mh
.Field(f2 => f2.Name)
.Query(stringBeforePrefixVal)
)
)
.Functions(fcs => fcs
.ExponentialDate(exp => exp
.Field(f => f.Date)
.Origin("now/d")
.Scale("3w")
.Offset("10d")
.Decay(0.88)
)
)
)
)
);
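To get a feel for what origin, scale, offset and decay do in this query, it can help to compute the multiplier by hand. The sketch below follows the exponential decay formula given in the Elasticsearch function_score documentation, multiplier = exp(lambda * max(0, |value - origin| - offset)) with lambda = ln(decay) / scale. It is a plain arithmetic illustration, not part of NEST; ExpDecaySketch and its parameter names are hypothetical, and distances are expressed in days for simplicity.

```csharp
using System;

public static class ExpDecaySketch
{
    // distanceDays: |document date - origin| in days
    // scaleDays:    distance beyond the offset at which the multiplier equals 'decay'
    // offsetDays:   distance within which no decay is applied
    // decay:        multiplier reached at offset + scale (0.88 in the query above)
    public static double Multiplier(
        double distanceDays, double scaleDays, double offsetDays, double decay)
    {
        var lambda = Math.Log(decay) / scaleDays;
        var effective = Math.Max(0.0, Math.Abs(distanceDays) - offsetDays);
        return Math.Exp(lambda * effective);
    }
}
```

With the values from the query (scale 3w = 21 days, offset 10 days, decay 0.88), a document up to 10 days old keeps a multiplier of 1, a document 31 days old gets 0.88, and one 52 days old gets 0.88 squared, roughly 0.77. Since BoostMode is Multiply, this factor is multiplied with the score of the inner query.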
