GroupBy performs slowly - c#

I have the following query. It is very slow: it processes 3000 records and produces 370 entries. How can I improve its performance?
dealerResults = _results.GroupBy(x => new { x.DealerName, x.DealerId })
.Select(x => new MarketingReportResults()
{
DealerId = x.Key.DealerId,
DealerName = x.Key.DealerName,
LinkedTotal = linkedLeadCores.Count(y => y.DealerId == x.Key.DealerId),
LeadsTotal = x.Count(),
SalesTotal = x.Count(y => y.IsSold),
Percent = (decimal)(x.Count() * 100) / count,
ActiveTotal = x.Count(y => y.IsActive),
}).ToList();

I think linkedLeadCores.Count() is the bottleneck here, because you loop through the entire linkedLeadCores list each time a group of _results is processed. Your comments seem to confirm this assumption.
So to remove the bottleneck you could create a map (aka dictionary) that holds the count for each dealer before doing anything with _results like this ...
var linkedLeadCoresCountMap = linkedLeadCores
.GroupBy(y => y.DealerId )
.ToDictionary(y => y.Key, y => y.Count());
... and then you could write
LinkedTotal = linkedLeadCoresCountMap.ContainsKey(x.Key.DealerId) ?
linkedLeadCoresCountMap[x.Key.DealerId] : 0,
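To make the idea concrete, here is a minimal, self-contained sketch of the precomputed-count approach. The sample data, the anonymous types, and the names results, linkedCountByDealer, and total are placeholders I made up for illustration; they are not the asker's real classes. It also uses TryGetValue instead of ContainsKey plus the indexer, which saves one hash lookup per group.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data standing in for _results and linkedLeadCores.
var results = new[]
{
    new { DealerId = 1, DealerName = "North", IsSold = true,  IsActive = true },
    new { DealerId = 1, DealerName = "North", IsSold = false, IsActive = true },
    new { DealerId = 2, DealerName = "South", IsSold = false, IsActive = false },
};
var linkedLeadCores = new[]
{
    new { DealerId = 1 }, new { DealerId = 1 }, new { DealerId = 1 },
};

// Build the count-per-dealer map once: a single pass over linkedLeadCores.
Dictionary<int, int> linkedCountByDealer = linkedLeadCores
    .GroupBy(l => l.DealerId)
    .ToDictionary(g => g.Key, g => g.Count());

int total = results.Length;

var dealerResults = results
    .GroupBy(r => new { r.DealerName, r.DealerId })
    .Select(g => new
    {
        g.Key.DealerId,
        g.Key.DealerName,
        // One hash lookup per group; dealers without linked leads fall back to 0.
        LinkedTotal = linkedCountByDealer.TryGetValue(g.Key.DealerId, out var linked) ? linked : 0,
        LeadsTotal = g.Count(),
        SalesTotal = g.Count(r => r.IsSold),
        Percent = g.Count() * 100m / total,
        ActiveTotal = g.Count(r => r.IsActive),
    })
    .ToList();

foreach (var d in dealerResults)
    Console.WriteLine($"{d.DealerName}: linked={d.LinkedTotal}, leads={d.LeadsTotal}");
```

With this shape the linked-lead list is scanned once up front, rather than once per dealer group.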

Doing a Group Join to linkedLeadCores will use an internal hash table for lookup and should solve your problem.
var dealerResults =
(from r in _results.GroupBy(x => new { x.DealerName, x.DealerId })
join llc in linkedLeadCores on r.Key.DealerId equals llc.DealerId into g
select new MarketingReportResults()
{
DealerId = r.Key.DealerId,
DealerName = r.Key.DealerName,
LinkedTotal = g.Count(),
LeadsTotal = r.Count(),
SalesTotal = r.Count(y => y.IsSold),
Percent = (decimal)(r.Count() * 100) / count,
ActiveTotal = r.Count(y => y.IsActive),
}).ToList();

Related

LINQ Query with GroupBy, MAX and Count

What could be the LINQ query for this SQL?
SELECT PartId, BSId,
COUNT(PartId), MAX(EffectiveDateUtc)
FROM PartCostConfig (NOLOCK)
GROUP BY PartId, BSId
HAVING COUNT(PartId) > 1
I am grouping by two columns and trying to retrieve the max EffectiveDateUtc for each part.
This is what I have written so far. I am stuck on pulling the top record based on the date.
I am also not sure whether this is an optimal approach.
//Get all the parts which have more than ONE active record with the pat
//effective date and for the same BSId
var filters = (from p in configs
?.GroupBy(w => new
{
w.PartId,
w.BSId
})
?.Select(g => new
{
PartId = g.Key.PartId,
BSId = g.Key.BSId,
Count = g.Count()
})
?.Where(y => y.Count > 1)
select p)
?.Distinct()?.ToList();
var filteredData = (from p in configs
join f in filters on p.PartId equals f.PartId
select new Config
{
Id = p.Id,
PartId = p.PartId,
BSId = p.BSId,
//EffectiveDateUtc = MAX(??)
}).OrderByDescending(x => x.EffectiveDateUtc).GroupBy(g => new { g.PartId, g.BSId }).ToList();
NOTE: I need the top record (based on date) for each part. Was trying to see if I can avoid for loop.
The equivalent query would be:
var query =
from p in db.PartCostConfig
group p by new { p.PartId, p.BSId } into g
let count = g.Count()
where count > 1
select new
{
g.Key.PartId,
g.Key.BSId,
Count = count,
EffectiveDate = g.Max(x => x.EffectiveDateUtc),
};
If I understand correctly, you are trying to achieve something like this:
var query=configs.GroupBy(w => new{ w.PartId, w.BSId})
.Where(g=>g.Count()>1)
.Select(g=>new
{
g.Key.PartId,
g.Key.BSId,
Count = g.Count(),
EffectiveDate = g.Max(x => x.EffectiveDateUtc)
});
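The question's note asks for the top record (by date) for each part, not only the maximum date. One way to get the whole row is to sort each group by date and take the first element. The sketch below runs against in-memory data (LINQ to Objects) with made-up rows. Whether a particular EF provider can translate OrderByDescending(...).First() inside a group to SQL is something I have not verified, so treat that part as an assumption.

```csharp
using System;
using System.Linq;

// Hypothetical rows standing in for PartCostConfig.
var configs = new[]
{
    new { Id = 1, PartId = 10, BSId = 1, EffectiveDateUtc = new DateTime(2020, 1, 1) },
    new { Id = 2, PartId = 10, BSId = 1, EffectiveDateUtc = new DateTime(2021, 6, 1) },
    new { Id = 3, PartId = 20, BSId = 1, EffectiveDateUtc = new DateTime(2019, 3, 1) },
};

var latestPerPart = configs
    .GroupBy(c => new { c.PartId, c.BSId })
    .Where(g => g.Count() > 1)   // same filter as HAVING COUNT(PartId) > 1
    // Keep the full record with the most recent date in each group.
    .Select(g => g.OrderByDescending(c => c.EffectiveDateUtc).First())
    .ToList();

foreach (var c in latestPerPart)
    Console.WriteLine($"Part {c.PartId}, BS {c.BSId}: record {c.Id}");
```

This avoids an explicit for loop: each surviving group contributes exactly one record.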

How to rank a list with original order in c#

I want to rank the values in a list and output the ranks in the list's original order.
This is my code so far:
var data = new[] { 7.806468478, 7.806468478, 7.806468478, 7.173501754, 7.173501754, 7.173501754, 3.40877696, 3.40877696, 3.40877696,
4.097010736, 4.097010736, 4.097010736, 4.036494085, 4.036494085, 4.036494085, 38.94333318, 38.94333318, 38.94333318, 14.43588131, 14.43588131, 14.43588131 };
var rankings = data.OrderByDescending(x => x)
.GroupBy(x => x)
.SelectMany((g, i) =>
g.Select(e => new { Col1 = e, Rank = i + 1 }))
.ToList();
However, the result comes out in descending order.
What I want is to display the ranks in the original order of the data.
e.g.: Rank = 3, Rank = 3, Rank = 3, Rank = 4, Rank = 4, Rank = 4, etc...
Thank You.
Using what you have, one method would be to keep track of the original order and sort a second time (ugly and potentially slow):
var rankings = data.Select((x, i) => new {Item = x, Index = i})
.OrderByDescending(x => x.Item)
.GroupBy(x => x.Item)
.SelectMany((g, i) =>
g.Select(e => new {
Index = e.Index,
Item = new { Col1 = e.Item, Rank = i + 1 }
}))
.OrderBy(x => x.Index)
.Select(x => x.Item)
.ToList();
I would instead suggest creating a dictionary with your rankings and joining this back with your list:
var rankings = data.Distinct()
.OrderByDescending(x => x)
.Select((g, i) => new { Key = g, Rank = i + 1 })
.ToDictionary(x => x.Key, x => x.Rank);
var output = data.Select(x => new { Col1 = x, Rank = rankings[x] })
.ToList();
As @AntonínLejsek kindly pointed out, replacing the above GroupBy call with Distinct() is the way to go.
Note doubles are not a precise type and thus are really not a good candidate for values in a lookup table, nor would I recommend using GroupBy/Distinct with a floating-point value as a key. Be mindful of your precision and consider using an appropriate string conversion. In light of this, you may want to define an epsilon value and forgo LINQ's GroupBy entirely, opting instead to encapsulate each data point into a (non-anonymous) reference type, then loop through a sorted list and assign ranks. For example (disclaimer: untested):
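class DataPoint
{
public double Value { get; set; } // double, to match the element type of data
public int Rank { get; set; }     // public so the loop below can assign it
}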
var dataPointsPreservingOrder = data.Select(x => new DataPoint {Value = x}).ToList();
var sortedDescending = dataPointsPreservingOrder.OrderByDescending(x => x.Value).ToList();
var epsilon = 1E-15; //use a value that makes sense here
int rank = 0;
double? currentValue = null;
foreach(var x in sortedDescending)
{
if(currentValue == null || Math.Abs(x.Value - currentValue.Value) > epsilon)
{
currentValue = x.Value;
++rank;
}
x.Rank = rank;
}
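As a runnable variation on the epsilon idea, the sketch below ranks by sorting indices instead of wrapping each value in a class, so the ranks land directly in the original order. The sample values and the epsilon of 1e-9 are my own assumptions, chosen so that 0.1 + 0.2 and 0.3 (which differ in the last bit as doubles) end up sharing a rank.

```csharp
using System;
using System.Linq;

// 0.1 + 0.2 is 0.30000000000000004 as a double, so exact grouping would split it from 0.3.
var data = new[] { 0.3, 0.1 + 0.2, 1.5, 0.3 };
const double epsilon = 1e-9; // assumption: pick a tolerance that suits your data

// Indices of data sorted by value, descending; data itself is left untouched.
int[] order = Enumerable.Range(0, data.Length)
    .OrderByDescending(i => data[i])
    .ToArray();

var ranks = new int[data.Length];
int rank = 0;
double? current = null;
foreach (int i in order)
{
    // Start a new rank only when the value differs from the current run by more than epsilon.
    if (current == null || Math.Abs(data[i] - current.Value) > epsilon)
    {
        current = data[i];
        rank++;
    }
    ranks[i] = rank; // written at the original position
}

Console.WriteLine(string.Join(", ", ranks));
// prints: 2, 2, 1, 2
```

Each value is compared against the first value of its run rather than its immediate neighbour, so a long chain of tiny differences cannot drift into a single rank.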
Looking at the data, you will need to iterate over the result set twice.
The first iteration captures the rankings:
var sorted = data
.OrderByDescending(x => x)
.GroupBy(x => x)
.Select((g, i) => new { Col1 = g.First(), Rank = i + 1 })
.ToList();
Now we have a ranking of highest to lowest with the correct rank value. Next we iterate the data again to find where the value exists in the overall ranks as:
var rankings = (from i in data
let rank = sorted.First(x => x.Col1 == i)
select new
{
Col1 = i,
Rank = rank.Rank
}).ToList();
This results in a ranked list in the original order of the data.
A bit shorter:
var L = data.Distinct().ToList(); // because SortedSet<T> doesn't have BinarySearch :[
L.Sort();
var rankings = Array.ConvertAll(data,
x => new { Col1 = x, Rank = L.Count - L.BinarySearch(x) });

Linq left join with group join

I have a set of Users and a set of Visits (users make visits).
Visit has a User navigation property.
I need to find the users who have not made any visits.
I can do this by finding the users who have visited, finding all of the users, and then taking the difference.
I am looking for a faster solution.
This is what I have right now:
var users = _db.Users.AsNoTracking().Include(c => c.City).Where(x => x.City.Id == city);
var groupedUsers = _db.Visits.AsNoTracking().Include(c => c.City).Include(a=>a.VisitedBy).Where(x => x.City.Id == city).GroupBy(x => x.VisitedBy).Select(group => new { VisitedBy = group.Key, Count = group.Count() });
var visitingUsers = groupedUsers.Select(user => user.VisitedBy);
var dif = users.Except(visitingUsers);
However I was trying GroupJoin as below:
var results = _db.Users.Include(c => c.City).Where(c => c.City.Id == city).
GroupJoin(_db.Visits.Include(c => c.City).Include(u => u.VisitedBy), u => u.Id, v => v.VisitedBy.Id, (u, v) => new { User = u , Visits = v })
.Select(o=>o.User);
But this gives me all of the Users, I want the users who don't exist in the Visit set.
Any help?
You may be able to avoid the correlated sub-query in the other answer by actually doing the left join with null check. Here's a quick example:
var A = new [] { new Foo { Bar = 1 }, new Foo { Bar = 2 }};
var B = new [] { new Foo { Bar = 2 }};
var C = from x in A
join y in B on x.Bar equals y.Bar into z
from y in z.DefaultIfEmpty()
where y == null
select x;
Check the emitted SQL...
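For reference, here is a self-contained in-memory version of that left-join-with-null-check pattern, followed by a GroupJoin variant that is closer to the question's attempt. The users and visits arrays and their property names (such as VisitedById) are invented for the example. This runs as LINQ to Objects, so it says nothing about the SQL an EF provider would generate.

```csharp
using System;
using System.Linq;

// Hypothetical data; anonymous types are reference types, so DefaultIfEmpty() yields null.
var users = new[]
{
    new { Id = 1, Name = "Ann" },
    new { Id = 2, Name = "Bob" },
    new { Id = 3, Name = "Cy" },
};
var visits = new[]
{
    new { Id = 100, VisitedById = 2 },
    new { Id = 101, VisitedById = 2 },
};

// Left join, then keep only the users with no matching visit.
var nonVisiting =
    (from u in users
     join v in visits on u.Id equals v.VisitedById into userVisits
     from v in userVisits.DefaultIfEmpty()
     where v == null
     select u.Name).ToList();

// Same result with GroupJoin: filter on an empty visit group before selecting the user.
var nonVisitingViaGroupJoin = users
    .GroupJoin(visits, u => u.Id, v => v.VisitedById, (u, vs) => new { User = u, Visits = vs })
    .Where(x => !x.Visits.Any())
    .Select(x => x.User.Name)
    .ToList();

Console.WriteLine(string.Join(", ", nonVisiting));
// prints: Ann, Cy
```

The GroupJoin in the question returned every user because nothing filtered on the joined group; the Where(x => !x.Visits.Any()) step is what restricts it to non-visitors.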
I am not sure whether the city filtering is what you are after, but the following should achieve what you want:
var visitsToCity = _db.Visits.Where(v => v.City.Id == city);
var usersOfCity = _db.Users.Where(u => u.City.Id == city);
var nonVisitingUsers = usersOfCity.Where(u => !visitsToCity.Any(v => v.VisitedBy == u));
The last Where combined with the Any should result in a SQL statement like:
SELECT * FROM Users u WHERE u.CityId = @p0 AND
NOT EXISTS(SELECT 1 FROM Visits v WHERE v.CityId = @p0 AND
v.VisitedById = u.Id)
Where @p0 will be supplied with the value of city.

Converting SQL to Linq with groupby, sum and count

I would like to do a group by and on that a sum and a count. I don't seem to be able to create the solution in linq. How can I convert my query to linq?
SELECT HistoricalBillingProductGroup,
COUNT(*),
BillingPeriod,
SUM(TotalMonthlyChargesOtcAndMrc)
FROM [x].[dbo].[tblReport]
group by BillingPeriod, HistoricalBillingProductGroup
order by BillingPeriod
This is what I have so far in LINQ:
var result =
context.Reports.GroupBy(x => new {x.BillingPeriod, x.HistoricalBillingProductGroup})
.Select(x => new StatisticsReportLine
{
HistoricalBillingGroup = x.FirstOrDefault().HistoricalBillingProductGroup,
BillingPeriod = x.FirstOrDefault().BillingPeriod,
CountOfRows = x.Count(),
SumOfAmount = x.Sum(p => p.TotalMonthlyChargesOtcAndMrc) ?? 0
})
.ToString();
The query I get from this is enormous and takes a very long time to run. In SQL it's a matter of milliseconds. I highly doubt this is the right solution.
I believe the calls to x.FirstOrDefault() are the source of your problem. Each one of these will result in a very costly inner query inside the SELECT clause of the generated SQL.
Try using the Key property of the IGrouping<TKey, TElement> instead:
var result = context.Reports
.GroupBy(x => new {x.BillingPeriod, x.HistoricalBillingProductGroup})
.OrderBy(x => x.Key.BillingPeriod)
.Select(x => new StatisticsReportLine
{
HistoricalBillingProductGroup = x.Key.HistoricalBillingProductGroup,
BillingPeriod = x.Key.BillingPeriod,
CountOfRows = x.Count(),
SumOfAmount = x.Sum(p => p.TotalMonthlyChargesOtcAndMrc) ?? 0
});
Or if you prefer query syntax:
var result =
(from r in context.Reports
group r by new { r.BillingPeriod, r.HistoricalBillingProductGroup } into g
orderby g.Key.BillingPeriod
select new StatisticsReportLine
{
HistoricalBillingProductGroup = g.Key.HistoricalBillingProductGroup,
BillingPeriod = g.Key.BillingPeriod,
CountOfRows = g.Count(),
SumOfAmount = g.Sum(p => p.TotalMonthlyChargesOtcAndMrc) ?? 0
});
You could try this one:
var result = context.Reports
.GroupBy(x => new {x.BillingPeriod, x.HistoricalBillingProductGroup})
.Select(x => new StatisticsReportLine
{
HistoricalBillingGroup = x.Key.HistoricalBillingProductGroup,
BillingPeriod = x.Key.BillingPeriod,
CountOfRows = x.Count(),
SumOfAmount = x.Sum(p => p.TotalMonthlyChargesOtcAndMrc) ?? 0
}).ToString();
The above query groups by two properties, BillingPeriod and HistoricalBillingProductGroup. Each resulting group therefore has a key composed of these two properties, and you can read them from the key instead of from the first element of the group.
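To illustrate how the composite key behaves, here is a small in-memory sketch. The sample rows are invented, and an anonymous type stands in for StatisticsReportLine. Because it runs as LINQ to Objects, it demonstrates the shape of the query only, not the SQL that EF would emit.

```csharp
using System;
using System.Linq;

// Hypothetical rows standing in for tblReport; the amount is nullable as in the question.
var reports = new[]
{
    new { BillingPeriod = "2015-02", HistoricalBillingProductGroup = "Voice", TotalMonthlyChargesOtcAndMrc = (decimal?)10m },
    new { BillingPeriod = "2015-01", HistoricalBillingProductGroup = "Data",  TotalMonthlyChargesOtcAndMrc = (decimal?)5m },
    new { BillingPeriod = "2015-02", HistoricalBillingProductGroup = "Voice", TotalMonthlyChargesOtcAndMrc = (decimal?)null },
    new { BillingPeriod = "2015-01", HistoricalBillingProductGroup = "Data",  TotalMonthlyChargesOtcAndMrc = (decimal?)7.5m },
};

var lines = reports
    .GroupBy(r => new { r.BillingPeriod, r.HistoricalBillingProductGroup })
    .OrderBy(g => g.Key.BillingPeriod)
    .Select(g => new
    {
        // Both grouping values come straight from the key; no FirstOrDefault() needed.
        g.Key.BillingPeriod,
        g.Key.HistoricalBillingProductGroup,
        CountOfRows = g.Count(),
        SumOfAmount = g.Sum(r => r.TotalMonthlyChargesOtcAndMrc) ?? 0m,
    })
    .ToList();

foreach (var l in lines)
    Console.WriteLine($"{l.BillingPeriod} {l.HistoricalBillingProductGroup}: {l.CountOfRows} rows, {l.SumOfAmount}");
```

Anonymous types compare by value, which is why two rows with the same period and product group fall into the same group.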

Translating SQL to lambda with groupby

I'm trying to translate this sql statement
SELECT row, SUM(value) as VarSum, AVG(value) as VarAve, COUNT(value) as TotalCount
FROM MDNumeric
WHERE collectionid = 6 and varname in ('C3INEV1', 'C3INEVA2', 'C3INEVA3', 'C3INVA11', 'C3INVA17', 'C3INVA19')
GROUP BY row
into an EF 4 query using lambda expressions and am missing something.
I have:
sumvars = sv.staticvararraylist.Split(',');
var aavresult = _myIFR.MDNumerics
.Where(r => r.collectionid == _collid)
.Where(r => sumvars.Contains(r.varname))
.GroupBy(r1 =>r1.row)
.Select(rg =>
new
{
Row = rg.Key,
VarSum = rg.Sum(p => p.value),
VarAve = rg.Average(p => p.value),
TotalCount = rg.Count()
});
where the staticvararraylist has the string 'C3INEV1', 'C3INEVA2', 'C3INEVA3', 'C3INVA11', 'C3INVA17', 'C3INVA19' (without single quotes) and the _collid variable = 6.
While I'm getting the correct grouping, my sum, average, & count values aren't correct.
You didn't post your error message, but I suspect it's related to Contains. I've found that Any works just as well.
This should get you quite close:
var result =
from i in _myIFR.MDNumerics
where i.collectionid == _collid && sumvars.Any(v => i.varname == v)
group i by i.row into g
select new {
row = g.Key,
VarSum = g.Sum(p => p.value),
VarAve = g.Average(p => p.value),
TotalCount = g.Count()
};
Try this:
var aavresult = _myIFR.MDNumerics
.Where(r => r.collectionid == _collid && sumvars.Contains(r.varname))
.GroupBy(r1 =>r1.row,
(key,res) => new
{
Row = key,
VarSum = res.Sum(r1 => r1.value),
VarAve = res.Average(r1 => r1.value),
TotalCount = res.Count()
});
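Here is a small in-memory sketch of that GroupBy overload with a result selector. It also trims the entries produced by Split(','). That trimming is purely my guess at one thing that could skew the totals: the question does not show whether the stored string has spaces after its commas, but if it does, the untrimmed names would not match and rows would be silently dropped from the aggregates. The sample rows and values are made up.

```csharp
using System;
using System.Linq;

// Hypothetical list as it might be stored, with a space after the comma.
var staticvararraylist = "C3INEV1, C3INEVA2";
var sumvars = staticvararraylist
    .Split(',')
    .Select(s => s.Trim())   // without Trim(), " C3INEVA2" would never match
    .ToArray();

// Hypothetical rows standing in for MDNumerics.
var rows = new[]
{
    new { row = 1, varname = "C3INEV1",  value = 2.0 },
    new { row = 1, varname = "C3INEVA2", value = 4.0 },
    new { row = 2, varname = "C3INEV1",  value = 1.0 },
    new { row = 2, varname = "OTHER",    value = 100.0 },
};

var aavresult = rows
    .Where(r => sumvars.Contains(r.varname))
    // GroupBy overload that takes a key selector and a result selector.
    .GroupBy(r => r.row, (key, res) => new
    {
        Row = key,
        VarSum = res.Sum(r => r.value),
        VarAve = res.Average(r => r.value),
        TotalCount = res.Count(),
    })
    .ToList();

foreach (var a in aavresult)
    Console.WriteLine($"row {a.Row}: sum={a.VarSum}, avg={a.VarAve}, count={a.TotalCount}");
```

If the aggregates still differ from the SQL result after a check like this, comparing the generated SQL with the hand-written statement is probably the next step.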
