I have a linq statement that averages the rows in a DataTable and displays them on a chart, grouped by date and time of day.
There is one big problem: many 0 values are returned, because particular times of day simply have nothing going on, and those zeros are skewing my averages badly.
Different times of day may have 0s in different columns, so I can't simply delete every row that has a 0 in one of the columns; I would end up with no rows left in the DataTable, or at least I can't think of a clean way to do it.
This is what I have:
var results = from row2 in fiveDayDataTable.AsEnumerable()
group row2 by ((DateTime)row2["TheDate"]).TimeOfDay
into g
select new
{
Time = g.Key,
AvgItem1 = g.Average(x => (int)x["Item1"]),
AvgItem2 = g.Average(x => (int)x["Item2"]),
AvgItem3 = g.Average(x => (int)x["Item3"]),
AvgItem4 = g.Average(x => (int)x["Item4"]),
AvgItem5 = g.Average(x => (int)x["Item5"]),
};
I don't know if this is possible, so I figured I would ask: is there a way to do the average without the 0s?
Thank you!
Sure, you can filter out the zeros:
AvgItem1 = g.Select(x => (int)x["Item1"]).Where(x => x != 0).Average(),
AvgItem2 = g.Select(x => (int)x["Item2"]).Where(x => x != 0).Average(),
AvgItem3 = g.Select(x => (int)x["Item3"]).Where(x => x != 0).Average(),
AvgItem4 = g.Select(x => (int)x["Item4"]).Where(x => x != 0).Average(),
AvgItem5 = g.Select(x => (int)x["Item5"]).Where(x => x != 0).Average(),
If your result set (after the Where) might be empty, you might need to call DefaultIfEmpty.
AvgItem1 = g.Select(x => (int)x["Item1"]).Where(x => x != 0).DefaultIfEmpty(0).Average(),
This will return a non-empty result set so your Average will be able to work with it.
Since you have a lot of repetition, you could consider refactoring your average logic into a separate method or anonymous function:
Func<IEnumerable<DataRow>, string, double> avg =
    (g, name) => g.Select(x => (int)x[name]).Where(x => x != 0).Average();
var results = from row2 in fiveDayDataTable.AsEnumerable()
group row2 by ((DateTime)row2["TheDate"]).TimeOfDay
into g
select new
{
Time = g.Key,
AvgItem1 = avg(g, "Item1"),
AvgItem2 = avg(g, "Item2"),
AvgItem3 = avg(g, "Item3"),
AvgItem4 = avg(g, "Item4"),
AvgItem5 = avg(g, "Item5"),
};
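If a group could contain only zeros for some column, the same DefaultIfEmpty guard can be folded into the helper. A minimal sketch (DataRow assumed, as above):
Func<IEnumerable<DataRow>, string, double> avg =
    (g, name) => g.Select(x => (int)x[name])
                  .Where(x => x != 0)
                  .DefaultIfEmpty(0)   // avoids the empty-sequence exception for all-zero columns
                  .Average();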
Add a Where to each query just before the Average, ensuring the item is not equal to zero.
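For example, using the column names from the question:
AvgItem1 = g.Where(x => (int)x["Item1"] != 0).Average(x => (int)x["Item1"]),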
I'm trying to group my list using linq by an interval of 30 minutes.
Let’s say we have this list:
X called at 10:00 AM
Y called at 10:10 AM
Y called at 10:20 AM
Y called at 10:35 AM
X called at 10:40 AM
Y called at 10:45 AM
What I need is to group these items into 30-minute frames per user, like so:
X called at 10:00 AM
Y called 3 times between 10:10 AM and 10:35 AM
X called at 10:40 AM
Y called at 10:45 AM
Here's what I'm using with LINQ:
myList
.GroupBy(i => i.caller, (k, g) => g
.GroupBy(i => (long)new TimeSpan(Convert.ToDateTime(i.date).Ticks - g.Min(e => Convert.ToDateTime(e.date)).Ticks).TotalMinutes / 30)
.Select(g => new
{
count = g.Count(),
obj = g
}));
I need the result in one flat list, but instead I'm getting nested lists, which need multiple foreach loops to extract.
Any help is much appreciated!
I think you are looking for SelectMany which will unwind one level of grouping:
var ans = myList
.GroupBy(c => c.caller, (caller, cg) => new { Key = caller, MinDateTime = cg.Min(c => c.date), Calls = cg })
.SelectMany(cg => cg.Calls.GroupBy(c => (int)(c.date - cg.MinDateTime).TotalMinutes / 30))
.OrderBy(cg => cg.Min(c => c.date))
.ToList();
Note: The GroupBy result selector captures the Min up front as a minor efficiency improvement, so the minimum DateTime of each group isn't re-computed on every call.
Note 2: The (int) cast is what creates the buckets; otherwise .TotalMinutes returns a double, the division by 30 yields a practically unique fractional value, and nothing groups together.
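A quick illustration of how the cast produces the buckets (minute offsets measured from the group's earliest call):
// (int)TotalMinutes / 30 maps minute offsets to bucket indexes:
//   0..29 minutes -> bucket 0, 30..59 -> bucket 1, 60..89 -> bucket 2, ...
var offsets = new[] { 0.0, 10.0, 25.0, 35.0, 59.9, 61.0 };
var buckets = offsets.Select(m => (int)m / 30).ToList();   // 0, 0, 0, 1, 1, 2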
By modifying the initial code (again for minor efficiency), you can reformat the answer to match your textual result:
var ans = myList
.GroupBy(c => c.caller, (caller, cg) => new { Key = caller, MinDateTime = cg.Min(c => c.date), Calls = cg })
.SelectMany(cg => cg.Calls.GroupBy(c => (int)(c.date - cg.MinDateTime).TotalMinutes / 30), (bucket, cg) => new { FirstCall = cg.MinBy(c => c.date), Calls = cg })
.OrderBy(fcc => fcc.FirstCall.date)
.ToList();
var ans2 = ans.Select(fcc => new { Caller = fcc.FirstCall.caller, FirstCallDateTime = fcc.FirstCall.date, LastCallDateTime = fcc.Calls.Max(c => c.date), Count = fcc.Calls.Count() })
.ToList();
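Note 3: MinBy is only available from .NET 6 onward; on earlier targets, a rough equivalent is:
FirstCall = cg.OrderBy(c => c.date).First()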
Instead of grouping by a DateTime, try grouping by a key derived from the date.
string GetTimeBucketId(DateTime time) {
    // half-hour buckets on the wall clock: minutes 0-29 -> 0, minutes 30-59 -> 1
    return $"{time.Year}-{time.Month}-{time.Day}T{time.Hour}-{time.Minute / 30}";
}
myList
    .GroupBy(i => GetTimeBucketId(Convert.ToDateTime(i.date)))
    .Select(g => new { Count = g.Count(), Key = g.Key });
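Note that this buckets calls by fixed half-hours of the clock rather than relative to each caller's first call, and it doesn't separate callers yet. A hedged tweak, assuming the items expose caller and date as in the question, would group on both:
myList
    .GroupBy(i => new { i.caller, Bucket = GetTimeBucketId(Convert.ToDateTime(i.date)) })
    .Select(g => new { g.Key.caller, g.Key.Bucket, Count = g.Count() });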
I am able to produce a set of results that is desirable, but I need to group and sum some of these fields and am struggling to understand how to approach this.
In my scenario, what would be the best way to get results that will:
Have a distinct [KeyCode] (right now I get many records with the same KeyCode but different occupation details)
SUM wage and projection fields (in the same query)
Here is my LINQ code:
private IQueryable<MyAbstractCustomOccupationInfoClass> GetMyAbstractCustomOccupationInfoClass(string[] regionNumbers)
{
//Get a list of wage data
var wages = _db.ProjectionAndWages
.Join(
_db.HWOLInformation,
wages => wages.KeyCode,
hwol => hwol.KeyCode,
(wages, hwol) => new { wages, hwol }
)
.Where(o => regionNumbers.Contains(o.hwol.LocationID))
.Where(o => o.wages.areaID.Equals("48"))
.Where(o => regionNumbers.Contains(o.wages.RegionNumber.Substring(4))); //regions filter, remove first 4 characters (0000)
//Join OccupationInfo table to wage data, for "full" output results
var occupations = wages.Join(
_db.OccupationInfo,
o => o.wages.KeyCode,
p => p.KeyCode,
(p, o) => new MyAbstractCustomOccupationInfoClass
{
KeyCode = o.KeyCode,
KeyTitle = o.KeyTitle,
CareerField = o.CareerField,
AverageAnnualOpeningsGrowth = p.wages.AverageAnnualOpeningsGrowth,
AverageAnnualOpeningsReplacement = p.wages.AverageAnnualOpeningsReplacement,
AverageAnnualOpeningsTotal = p.wages.AverageAnnualOpeningsTotal,
});
//TO-DO: How to Aggregate and Sum "occupations" list here & make the [KeyCode] Distinct ?
return occupations;
}
I am unsure whether I should perform the grouping on the second join, use a .GroupJoin(), or add a third query.
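One option is a GroupJoin from OccupationInfo onto the wages query, summing the wage fields inside the result selector: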
var occupations = _db.OccupationInfo.GroupJoin(
wages,
o => o.KeyCode,
p => p.wages.KeyCode,
(o, pg) => new MyAbstractCustomOccupationInfoClass {
KeyCode = o.KeyCode,
KeyTitle = o.KeyTitle,
CareerField = o.CareerField,
AverageAnnualOpeningsGrowth = pg.Sum(p => p.wages.AverageAnnualOpeningsGrowth),
AverageAnnualOpeningsReplacement = pg.Sum(p => p.wages.AverageAnnualOpeningsReplacement),
AverageAnnualOpeningsTotal = pg.Sum(p => p.wages.AverageAnnualOpeningsTotal),
});
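Alternatively, if you keep the original second join, a third query over occupations can do the grouping and summing. A rough sketch using the field names from the question (untested against your EF model):
var grouped = occupations
    .GroupBy(o => new { o.KeyCode, o.KeyTitle, o.CareerField })
    .Select(g => new MyAbstractCustomOccupationInfoClass
    {
        KeyCode = g.Key.KeyCode,
        KeyTitle = g.Key.KeyTitle,
        CareerField = g.Key.CareerField,
        AverageAnnualOpeningsGrowth = g.Sum(x => x.AverageAnnualOpeningsGrowth),
        AverageAnnualOpeningsReplacement = g.Sum(x => x.AverageAnnualOpeningsReplacement),
        AverageAnnualOpeningsTotal = g.Sum(x => x.AverageAnnualOpeningsTotal),
    });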
My data structure:
BrowserName(Name)    Count(Y)
MSIE9                 7
MSIE10                8
Chrome               10
Safari               11
-- and so on --
What I'm trying to do is get the top 10 and then get the sum of rest and call it 'others'.
I'm trying to get the "Others" entry as below but I'm getting an error.
Data.OrderBy(o => o.count).Skip(10)
.Select(r => new downModel { modelname = "Others", count = r.Sum(w => w.count) }).ToList();
The error is at 'r.Sum(w => w.count)' and it says
'downModel' does not contain a definition for 'Sum'
The downModel class just has a string 'modelname' and an int 'count'.
Any help is sincerely appreciated.
Thanks
It should be possible to get the whole result - the top ten and the accumulated "others" - in a single database query like so:
var downModelList = context.Data
.OrderByDescending(d => d.Count)
.Take(10)
.Select(d => new
{
Name = d.Name,
Count = d.Count
})
.Concat(context.Data
.OrderByDescending(d => d.Count)
.Skip(10)
.Select(d => new
{
Name = "Others",
Count = d.Count
}))
.GroupBy(x => x.Name)
.Select(g => new downModel
{
modelname = g.Key,
count = g.Sum(x => x.Count)
})
.ToList();
If you want to create just one model, then get the sum first and create your object:
var count = Data.OrderBy(o => o.count).Skip(10).Sum(x => x.count);
var model = new downModel { modelname = "Others", count = count };
Btw, OrderBy sorts in ascending order. If you want to take (or Skip past) the top results, you need to use OrderByDescending.
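Putting both ideas together in memory (a sketch; property names follow the snippets above and may need adjusting to your actual model):
var ordered = Data.OrderByDescending(o => o.count).ToList();

var result = ordered.Take(10)
    .Select(o => new downModel { modelname = o.Name, count = o.count })
    .ToList();

result.Add(new downModel
{
    modelname = "Others",
    count = ordered.Skip(10).Sum(o => o.count)   // 0 when there are 10 or fewer rows
});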
I have a question about the IGrouping object that results from a LINQ query where I use a "group by" clause.
I have two tables in the database, Products and Responses, with a 1-to-many relationship. The Responses table has a column called FinalRate, which is the rate given to the product. A product can have n responses or rates.
I want to get the Products ordered by the sum of FinalRate divided by the number of rates given, that is to say, ordered by the average rate descending, from higher to lower marks.
As can be seen in the code (at the end of the question), I try to get the responses first. To sum all the FinalRates and divide them by the count, I use a group.
There are 2 problems with the code, even if the current code works:
1.-I tried to get the Products in a single query, but it is impossible because I cannot group on the Products table and then use the Response table in the "orderby". One more thing: LINQ only gives you the possibility to group one source; it is impossible to have "group prod, response".
I couldn't get this sql sentence in LINQ:
select prod.ProductID,prod.Commercial_Product_Name,prod.Manufacturer_Name,
prod.ProductImageUrl
from rev_product prod
inner join rev_response res on res.AtProductid=prod.ProductID
group by prod.ProductID,prod.Commercial_Product_Name,prod.Manufacturer_Name
,prod.ProductImageUrl
order by (sum(res.FinalRate)/count(res.AtProductid))
I tried this:
var gruposproductos = (from prod in ctx.Products
join res in ctx.Responses on prod.ProductID equals res.AtProductId
group prod by prod.ProductID into g
orderby (g.Sum(ra =>ra.FinalRate)/g.Count())
descending select g).Take(2);
But as I said, the "orderby (g.Sum..." part gives an error, because "into g" groups the Product table, not the Response table.
So that is why, in my final code, I don't get the products in the same LINQ statement.
2.-Having accepted that fact, the problem is that I get an IGrouping, not a list of Responses that I can iterate without the two foreach loops in the code. I wanted only one loop, as one would have with a plain "List" object.
It is not really a pretty method, but it works. Moreover, I have to make sure that in the second loop each product is only added once.
Any better code?
var groupproducts = (from res in ctx.Responses
group res by res.AtProductId into g
orderby (g.Sum(ra =>ra.FinalRate)/g.Count())
descending select g).Take(2).ToList();
List<Product> theproducts = new List<Product>();
foreach (var groupresponse in groupproducts)
{
foreach (var response in groupresponse)
{
var producttemp= (from prod in ctx.Products
where prod.ProductID == response.AtProductId
select prod).First();
theproducts.Add(producttemp);
}
}
}
FINAL SOLUTION (thanks a lot, @Daniel)
var productsanonymtype = ctx.Products.Select(x => new
{
Product = x,
AverageRating = x.Responses.Count() == 0 ? 0 : x.Responses.Select(r => (double)r.FinalRate).Sum() / x.Responses.Count()
}).OrderByDescending(x => x.AverageRating);
List<Product> products = new List<Product>();
foreach (var prod in productsanonymtype)
{
products.Add(prod.Product);
}
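The trailing loop can also be folded into the query itself; a small sketch assuming the same anonymous type as productsanonymtype above:
List<Product> products = productsanonymtype
    .Select(x => x.Product)
    .ToList();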
Try this:
products.Select(x => new
{
Product = x,
AverageRating = x.Responses.Sum(r => r.FinalRate) /
                x.Responses.Count()
});
The Sum overload I am using is not implemented in all providers. If that's a problem for you, you can use this alternate version:
products.Select(x => new
{
Product = x,
AverageRating = x.Responses.Select(r => r.FinalRate)
                           .Sum() /
                x.Responses.Count()
});
If there is no navigation property from product to its responses you should first try to fix that. If you can't you can use this version:
products.Join(responses, x => x.Id, x => x.ProductId,
(p, r) => new { Product = p, Response = r })
.GroupBy(x => x.Product)
.Select(g => new { Product = g.Key,
AverageRating = g.Select(x => x.Response.FinalRate)
.Sum() /
g.Count()
});
Assuming FinalRate is an int, both methods calculate the average rating with integer arithmetic, i.e. there will never be a 4.5 rating, and the result truncates rather than rounds, so an actual average of 4.9 comes out as 4. You can fix that by casting one of the operands of the division to double.
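For example, applied to the grouped version above:
AverageRating = (double)g.Select(x => x.Response.FinalRate).Sum() / g.Count()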
Another problem is the case with no ratings so far. The code above will result in an exception in this case. If that's a problem for you, you can change the calculation to this:
AverageRating = g.Count() == 0
? 0
: g.Select(x => (double)x.Response.FinalRate).Sum() / g.Count()
ctx.Products.GroupBy(x => new {
ProductId = x.ProductID,
FinalRate = x.Responses.Sum(y => y.FinalRate),
CountProductId = x.Responses.Count
})
.OrderBy(x => x.Key.FinalRate / x.Key.CountProductId);
And here it is with the projection:
ctx.Products.Select(x => new {
ProductID = x.ProductID,
Commercial_Product_Name = x.Commercial_Product_Name,
Manufacturer_Name = x.Manufacturer_Name,
ProductImageUrl = x.ProductImageUrl,
FinalRate = x.Responses.Sum(y => y.FinalRate),
CountProductId = x.Responses.Count
})
.GroupBy(x => new {
ProductId = x.ProductID,
FinalRate = x.FinalRate,
CountProductId = x.CountProductId
})
.OrderBy(x => x.Key.FinalRate / x.Key.CountProductId);
The LINQ query below is working fine but I need to tweak it a bit.
I want all the records in the file grouped by recordId (a customer number) and then ordered by date in descending order. I'm getting the grouping, and the dates are in descending order. Now, here comes the tweaking.
I want the groups to be sorted, in ascending order, by recordId. Currently, the groups are sorted by the date, or so it seems. I tried adding a .OrderBy after the .GroupBy and couldn't get that to work at all.
Last, I want to .take(x) records where x is dependent on some other factors. Basically, the .take(x) will return the most-recent x records. I tried placing a .take(x) in various places and I wasn't getting the correct results.
var recipients = File.ReadAllLines(path)
.Select (record => record.Split('|'))
.Select (tokens => new
{
FirstName = tokens[2],
LastName = tokens[4],
recordId = tokens[13],
date = Convert.ToDateTime(tokens[17])
}
)
.OrderByDescending (m => m.date)
.GroupBy (m => m.recordId)
.Dump();
Edit #1 -
recordId is not unique. There may / will likely be multiple records with the same recordId. recordId is actually a customer number.
The output will be a result set with first name, last name, date, and recordId. Depending on several factors, there may be 1 to 5 records returned for each recordId.
Edit #2 -
The .Take(x) is for the recordId. Each recordId may have multiple rows. For now, let's assume I want the most recent date for each recordId. (select top(1) when sorted by date descending)
Edit #3 -
The following query generates the results below. Note each recordId only produces 1 row in the output (this is okay), and it appears to be the most recent date; I haven't thoroughly checked this yet.
Now, how do I sort, in ascending order, by recordId?
var recipients = File.ReadAllLines(path)
.Select (record => record.Split('|'))
.Select (tokens => new
{
FirstName = tokens[2],
LastName = tokens[4],
recordId = Convert.ToInt32(tokens[13]),
date = Convert.ToDateTime(tokens[17])
}
)
.GroupBy (m => m.recordId)
.OrderByDescending (m => m.Max (x => x.date ) )
.Select (m => m.First () )
.Dump();
FirstName LastName recordId date
X X 2531334 3/11/2011 12:00:00 AM
X X 1443809 10/18/2001 12:00:00 AM
X X 2570897 3/10/2011 12:00:00 AM
X X 1960526 3/10/2011 12:00:00 AM
X X 2475293 3/10/2011 12:00:00 AM
X X 2601783 3/10/2011 12:00:00 AM
X X 2581844 3/6/2011 12:00:00 AM
X X 1773430 3/3/2011 12:00:00 AM
X X 1723271 2/4/2003 12:00:00 AM
X X 1341886 2/28/2011 12:00:00 AM
X X 1427818 11/15/1986 12:00:00 AM
You can't easily order by a field that is not part of the group-by key. You get a list for each group; that is, a list of dates for each recordId.
You could order by Max(date) or Min(date).
Or you could group by recordId and date, and order by date.
order by most recent date:
.GroupBy (m => m.recordId)
// take the most recent date in the group
.OrderByDescending (m => m.Max(x => x.date))
The Take part is another question. You could just add Take(x) to the expression, then you get this number of groups.
Edit:
For a kind of select top(1):
.GroupBy (m => m.recordId)
// take the most recent date in the group
.OrderByDescending (m => m.Max(x => x.date))
// take the first of each group, which is the most recent
.Select(x => x.First())
// you got the most recent record of each recordId
// and you can take a certain number of it.
.Take(x);
A snippet I had before in my answer; you won't need it according to your question as it now stands:
// create a separate group for each unique date and recordId
.GroupBy (m => m.date, m => m.recordId)
.OrderByDescending (m => m.Key)
This seems very similar to your other question - Reading a delimited file using LINQ
I don't believe you want to use Group here at all - I believe instead that you want to use OrderBy and ThenBy - something like:
var recipients = File.ReadAllLines(path)
.Select (record => record.Split('|'))
.Select (tokens => new
{
FirstName = tokens[2],
LastName = tokens[4],
recordId = tokens[13],
date = Convert.ToDateTime(tokens[17])
}
)
.OrderBy (m => m.recordId)
.ThenByDescending (m => m.date)
.Dump();
For a simple Take... you can just add this .Take(N) just before the Dump()
However, I'm not sure this is what you are looking for? Can you clarify your question?
Just add
.OrderBy(g => g.Key)
after your grouping. This will order your groupings by recordId ascending.
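In context, that would look something like this (a sketch based on the query in the question, names unchanged):
.GroupBy(m => m.recordId)
.OrderBy(g => g.Key)                                    // groups sorted by recordId ascending
.Select(g => g.OrderByDescending(x => x.date).First())  // most recent record per recordId
.Dump();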
Last, I want to .take(x) records where
x is dependent on some other factors.
Basically, the .take(x) will return
the most-recent x records.
If you mean by "the most recent" by date, why would you want to group by RecordId in the first place - just order by date descending:
..
.OrderByDescending (m => m.date)
.Take(x)
.Dump();
If you just want to get the top x records in the order established by the grouping though you could do the following:
...
.GroupBy (m => m.recordId)
.SelectMany(s => s)
.Take(x)
.Dump();
If you want something like the first 3 for each group, then I think you need to use a nested query like:
var recipients = File.ReadAllLines(path)
.Select(record => record.Split('|'))
.Select(tokens => new
{
FirstName = tokens[2],
LastName = tokens[4],
RecordId = tokens[13],
Date = Convert.ToDateTime(tokens[17])
}
)
.GroupBy(m => m.RecordId)
.Select(grouped => new
{
Id = grouped.Key,
First3 = grouped.OrderByDescending(x => x.Date).Take(3)
})
.Dump();
and if you want this flattened into a record list then you can use SelectMany:
var recipients = File.ReadAllLines(path)
.Select(record => record.Split('|'))
.Select(tokens => new
{
FirstName = tokens[2],
LastName = tokens[4],
RecordId = tokens[13],
Date = Convert.ToDateTime(tokens[17])
}
)
.GroupBy(m => m.RecordId)
.Select(grouped => grouped.OrderByDescending(x => x.Date).Take(3))
.SelectMany(item => item)
.Dump();