Group By 'MM-dd-yyyy hh-mm' - c#

I have score records that I group by date, pid, and bidc, using SUM to aggregate the scores. The original field type is datetime. Grouping by date alone gives incorrect aggregates because I have multiple sets within a given hour and minute on a given day. What is the correct way to handle this and aggregate by 'MM-dd-yyyy hh-mm'?
var scores = from ts in _context.Scores
select ts;
List<ScoreAgg> aggScores = scores.GroupBy(p => new { p.create_dt.Date, p.pid, p.bidc }).Select(g => new ScoreAgg()
{
pid = g.Key.pid,
bidc = g.Key.bidc,
score = g.Sum(s => s.weight),
create_dt = g.Key.Date
}).ToList<ScoreAgg>();

You can use DbFunctions.CreateDateTime to build a date without seconds:
scores.GroupBy(p => new {
Date = DbFunctions.CreateDateTime(p.create_dt.Year, p.create_dt.Month, p.create_dt.Day, p.create_dt.Hour, p.create_dt.Minute, second: 0),
p.pid,
p.bidc
})
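The same idea can be tried out in memory with LINQ to Objects. This is only a sketch: the tuple layout stands in for the Scores entity, the int types for pid and bidc are assumptions, and MinuteGrouping, TruncateToMinute and SumPerMinute are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MinuteGrouping
{
    // Drops seconds and sub-second ticks so that all values within
    // the same minute compare equal.
    public static DateTime TruncateToMinute(DateTime value) =>
        new DateTime(value.Year, value.Month, value.Day,
                     value.Hour, value.Minute, 0, value.Kind);

    // Sums weights per (minute, pid, bidc). The tuple shape is a stand-in
    // for the Scores entity; the int key types are assumptions.
    public static List<(DateTime Minute, int Pid, int Bidc, decimal Score)> SumPerMinute(
        IEnumerable<(DateTime CreateDt, int Pid, int Bidc, decimal Weight)> rows) =>
        rows.GroupBy(r => (Minute: TruncateToMinute(r.CreateDt), r.Pid, r.Bidc))
            .Select(g => (g.Key.Minute, g.Key.Pid, g.Key.Bidc, Score: g.Sum(r => r.Weight)))
            .OrderBy(x => x.Minute)
            .ToList();
}
```

Two rows stamped 10:15:12 and 10:15:48 end up in the same group, while a row at 10:16:03 starts a new one.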

You should consider adding a computed column to the table, such as
DateOnly AS CONVERT(date, [create_dt])
and grouping on it:
GroupBy(p => new { p.DateOnly, p.pid, p.bidc })
Index the DateOnly column to make the aggregate faster. If you just do a convert inside your GroupBy clause, SQL Server will not use an index defined on the datetime column at all.

Related

Group in Linq with Max: How do I get the full data set of the max value?

what I have got so far is this:
var a = from e in tcdb.timeclockevent
group e by e.workerId into r
select new { workerId = r.Key, Date = r.Max(d => d.timestamp) };
This query gives me the latest "timestamp" for every workerId (note: workerId is not the primary key of tcdb.timeclockevent). So it only returns pairs of two values, but I need the whole data sets.
Does anybody know how I can get the whole datasets of tcdb.timeclock with the maximal timestamp for every workerId?
OR
Does anybody know how I can get the Id of the data sets of the maximal date for each worker?
Thank you in advance :)
You can order your r grouping by timestamp and select the first one
var a = from e in tcdb.timeclockevent
group e by e.workerId into r
select r.OrderByDescending(d => d.timestamp).FirstOrDefault();
Does anybody know how I can get the whole datasets of tcdb.timeclock with the maximal timestamp for every workerId?
Well, the straightforward query would be like this:
var queryA =
from e in tcdb.timeclockevent
group e by e.workerId into g
let maxDate = g.Max(e => e.timestamp)
select new { workerId = g.Key, events = g.Where(e => e.timestamp == maxDate) };
If you don't need IQueryable<T> result and since there is no SQL construct that returns directly the grouped result set, you could try the following query, which uses a different way of filtering the records with maximal timestamp for every workerId inside the database, and then does the grouping in memory:
var queryB = tcdb.timeclockevent
.Where(e => !tcdb.timeclockevent.Any(e2 =>
e2.workerId == e.workerId && e2.timestamp > e.timestamp))
.AsEnumerable()
.GroupBy(e => e.workerId);
You can try and see which one performs better with your data.
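For experimenting without a database, the "latest row per worker" logic can be sketched with LINQ to Objects. ClockEvent, LatestPerWorker and Latest are hypothetical names, and the record syntax assumes C# 9 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a timeclockevent row.
public sealed record ClockEvent(int Id, int WorkerId, DateTime Timestamp);

public static class LatestPerWorker
{
    // Returns, for each worker, the full event with the greatest timestamp.
    public static List<ClockEvent> Latest(IEnumerable<ClockEvent> events) =>
        events.GroupBy(e => e.WorkerId)
              .Select(g => g.OrderByDescending(e => e.Timestamp).First())
              .OrderBy(e => e.WorkerId)
              .ToList();
}
```

Each group is guaranteed to be non-empty, so First() is safe here. If two events of one worker share the same maximal timestamp, this sketch keeps only one of them.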

How can I write this query in linq to sql?

I'm new in linq and want to write this query:
var query = from p in behzad.rptost
where p.date.substring(0, 4) == "1395"
-->in this calc sum=p.price_moavaq+price_year
select p;
How can I write that query?
I assume you are trying to sum the price_moavaq field per year.
Furthermore, given the use of Substring, I guess your date field isn't of type DateTime but just a string.
So you need to use a group by:
var query = from p in behzad.rptost
group p by p.date.Substring(0, 4) into grouping
select new { Year = grouping.Key, Sum = grouping.Sum(x => x.price_moavaq) };
If your date field is of type DateTime, just use .Year:
var query = from p in behzad.rptost
group p by p.date.Year into grouping
select new { Year = grouping.Key, Sum = grouping.Sum(x => x.price_moavaq) };
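If the goal is instead a single total of price_moavaq plus price_year for one year, as the pseudo-code in the question hints, no grouping is needed. The following in-memory sketch assumes that reading; the tuple shape, the decimal types, and the names YearTotals and TotalForYear are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class YearTotals
{
    // Assumes Date is a string whose first four characters are the year,
    // and that both price columns are decimal (both are assumptions).
    public static decimal TotalForYear(
        IEnumerable<(string Date, decimal PriceMoavaq, decimal PriceYear)> rows,
        string year) =>
        rows.Where(r => r.Date.StartsWith(year, StringComparison.Ordinal))
            .Sum(r => r.PriceMoavaq + r.PriceYear);
}
```

Calling YearTotals.TotalForYear(rows, "1395") adds both price fields over all rows whose date starts with 1395.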

Sum and Group by in linq using Datarows

Full disclosure: I'm pretty much a total noob when it comes to LINQ. I could be way off base on how I should be approaching this.
I have a DataTable with 3 columns
oid,idate,amount
each id has multiple dates, and each date has multiple amounts. What I need to do is sum the amount for each day for each id, so instead of:
id,date,amount
00045,02/13/2011,11.50
00045,02/14/2011,11.00
00045,02/14/2011,12.00
00045,02/15/2011,10.00
00045,02/15/2011,5.00
00045,02/15/2011,12.00
00054,02/13/2011,8.00
00054,02/13/2011,9.00
I would have:
id,date,SumOfAmounts
00045,02/13/2011,11.50
00045,02/14/2011,23.00
00045,02/15/2011,27.00
00054,02/13/2011,17.00
private void excelDaily_Copy_Into(DataTable copyFrom, DataTable copyTo)
{
var results = from row in copyFrom.AsEnumerable()
group row by new
{
oid = row["oid"],
idate = row["idate"]
} into n
select new
{
///unsure what to do
}
};
I've tried a dozen or so different ways of doing this and I always hit a wall where I can't figure out how to progress. I've been all over Stack Overflow and MSDN, and nothing so far has really helped me.
Thank you in advance!
You could try this:
var results = from row in copyFrom.AsEnumerable()
group row by new
{
oid = row.Field<int>("oid"),// Or string, depending what is the real type of your column
idate = row.Field<DateTime>("idate")
} into g
select new
{
g.Key.oid,
g.Key.idate,
SumOfAmounts = g.Sum(e => e.Field<decimal>("amount"))
};
I suggest using the Field extension method, which provides strongly typed access to each of the column values in the specified row.
Although you don't specify it, copyFrom is apparently an object of a class DataTable that implements IEnumerable.
According to the MSDN documentation for System.Data.DataTable, that class does not implement it. If you use that class, you need the Rows property, which returns a collection of rows that implements IEnumerable:
IEnumerable<DataRow> rows = copyFrom.Rows.Cast<DataRow>();
but if you use a different DataTable class, you'll probably do something similar to cast it to a sequence of DataRow.
An object of class System.Data.DataRow has an Item property (indexer) to access the columns in the row. In your case the column names are oid, idate and amount.
To convert your copyFrom to the sequence of items you want to process:
var itemsToProcess = copyFrom.Rows.Cast<DataRow>()
.Select(row => new
{
Id = row["oid"],
Date = (DateTime)row["idate"],
Amount = (decimal)row["amount"],
});
I'm not sure, but I assume that column idate contains dates and column amount contains some value. Feel free to use other types if your columns contain other types.
If your columns contain strings, convert them to the proper items using Parse:
var itemsToProcess = copyFrom.Rows.Cast<DataRow>()
.Select(row => new
{
Id = (string)row["oid"],
Date = DateTime.Parse( (string) row["idate"]),
Amount = Decimal.Parse((string)row["amount"]),
});
If you are unfamiliar with lambda expressions, it helped me a lot to read them as follows:
itemsToProcess is a collection of items, taken from the collection of
DataRows, where from each row in this collection we created a new
object with three properties: Id = ...; Date = ...; Amount = ...
See
Explanation of Standard LINQ operations for Cast and Select
Anonymous Types
Now we have a sequence where we can compare dates and sum the amounts.
What you want is to group all items in this sequence into groups with the same Id and Date. So you want a group with Id = 00045 and Date = 02/13/2011, and a group with Id = 00045 and Date = 02/14/2011.
For this you use Enumerable.GroupBy. As the key selector (= what all items in one group have in common) you use the combination of Id and Date:
var groups = itemsToProcess.GroupBy(item => new
{Id = item.Id, Date = item.Date} );
Now you have groups.
Each group has a property Key, of a type with two properties: Id and Date.
Each group is a sequence of items from your itemsToProcess collection (so each element is an "itemToProcess" with Id / Date / Amount properties).
All items in one group have the same Id and the same Date.
So all you have to do is sum the amounts of all elements in each group:
var resultSequence = groups.Select(groupItem => new
{
Id = groupItem.Key.Id,
Date = groupItem.Key.Date,
Sum = groupItem.Sum(itemToProcess => itemToProcess.Amount),
});
So putting it all together into one statement:
var resultSequence = copyFrom.Rows.Cast<DataRow>()
.Select(row => new
{
Id = (string)row["oid"],
Date = DateTime.Parse((string)row["idate"]),
Amount = Decimal.Parse((string)row["amount"]),
})
.GroupBy(item => new
{
Id = item.Id,
Date = item.Date
})
.Select(groupItem => new
{
Id = groupItem.Key.Id,
Date = groupItem.Key.Date,
Sum = groupItem.Sum(itemToProcess => itemToProcess.Amount),
});
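As a self-contained check, here is a sketch that rebuilds the sample data from the question and sums per id and day. It assumes oid is stored as a string (the leading zeros suggest so), idate as DateTime and amount as decimal; DailyTotals, BuildSample and SumPerIdAndDay are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

public static class DailyTotals
{
    // Builds the sample table from the question (column types are assumptions).
    public static DataTable BuildSample()
    {
        var table = new DataTable();
        table.Columns.Add("oid", typeof(string));
        table.Columns.Add("idate", typeof(DateTime));
        table.Columns.Add("amount", typeof(decimal));
        table.Rows.Add("00045", new DateTime(2011, 2, 13), 11.50m);
        table.Rows.Add("00045", new DateTime(2011, 2, 14), 11.00m);
        table.Rows.Add("00045", new DateTime(2011, 2, 14), 12.00m);
        table.Rows.Add("00045", new DateTime(2011, 2, 15), 10.00m);
        table.Rows.Add("00045", new DateTime(2011, 2, 15), 5.00m);
        table.Rows.Add("00045", new DateTime(2011, 2, 15), 12.00m);
        table.Rows.Add("00054", new DateTime(2011, 2, 13), 8.00m);
        table.Rows.Add("00054", new DateTime(2011, 2, 13), 9.00m);
        return table;
    }

    // Groups rows by (oid, calendar day) and sums the amount column.
    public static List<(string Id, DateTime Date, decimal Sum)> SumPerIdAndDay(DataTable table) =>
        table.AsEnumerable()
             .GroupBy(row => (Id: row.Field<string>("oid"),
                              Date: row.Field<DateTime>("idate").Date))
             .Select(g => (g.Key.Id, g.Key.Date, Sum: g.Sum(row => row.Field<decimal>("amount"))))
             .OrderBy(x => x.Id).ThenBy(x => x.Date)
             .ToList();
}
```

For the sample table this yields four rows, matching the desired output listed in the question: 11.50, 23.00 and 27.00 for 00045, and 17.00 for 00054.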

Is it possible to assign single linq query result to more than one variable?

I can do this in TSQL
SELECT
@TotalDays = COUNT(Days),
@TotalHours = SUM(Hours)
FROM
Schedule
WHERE
GroupID = 1
How can I achieve this in LINQ in a single query? My current code:
var totalDays = 0;
var totalHours = 0;
totalDays = _schedule.Count(c => c.GroupID == 1);
totalHours = _schedule.Where(w => w.GroupID == 1).Sum(s => s.Hours);
This is not efficient because it issues two separate queries against the database.
You could try something like this:
var result = _schedule.Where(s => s.GroupID == 1)
.GroupBy(x => x.GroupID)
.Select(gr => new
{
TotalDays = gr.Count(),
TotalHours = gr.Sum(s => s.Hours)
});
Initially, you filter your data based on the GroupID, picking those rows with GroupID equal to 1. Then you GroupBy them by their GroupID. This might seem a bit silly, but this way you create a single group of your data, so you can count the items in the group and calculate the sum you want in one query. Finally, you select an anonymous type with two properties, one for TotalDays and one for TotalHours.
Then you can consume the above result as below:
var totalDays = 0;
var totalHours = 0;
var first = result.FirstOrDefault();
if(first!=null)
{
totalDays = first.TotalDays;
totalHours = first.TotalHours;
}
The problem with trying to make a single LINQ query is that it sometimes gets translated into multiple database calls anyway. Sometimes it is better to pull all of your raw data into memory in a single database call and then perform the calculations.
This will ensure only one database call:
var data = _schedule.Where(w => w.GroupID == 1).Select(w => w.Hours).ToArray();
var totalDays = data.Count();
var totalHours = data.Sum();
The key to making this work is the .ToArray(), which forces the evaluation of the database query. If there are a lot of items this call can become inefficient, but in a lot of cases it is still very fast.
You can use the following code:
//one request to the database
var lstResult=_schedule.Where(w => w.GroupID == 1).ToArray();
//one loop in array for Count method
totalDays = lstResult.Count();
//One loop in array for Sum method
totalHours = lstResult.Sum(s => s.Hours);
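Once the rows are in memory, the two loops can be folded into one pass with Enumerable.Aggregate. This is a sketch over a hypothetical tuple shape; int Hours is an assumption, and ScheduleTotals and Totals are made-up names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ScheduleTotals
{
    // Counts the rows and sums the hours of one group in a single pass.
    // The (GroupID, Hours) tuple stands in for the schedule row type.
    public static (int TotalDays, int TotalHours) Totals(
        IEnumerable<(int GroupID, int Hours)> schedule, int groupId) =>
        schedule.Where(s => s.GroupID == groupId)
                .Aggregate((TotalDays: 0, TotalHours: 0),
                           (acc, s) => (acc.TotalDays + 1, acc.TotalHours + s.Hours));
}
```

For small arrays the difference from calling Count() and Sum() separately is negligible; the single pass mainly matters when the source is expensive to enumerate.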

Entity framework distinct but not on all columns

I'd like to make a query through Entity Framework that unions contacts from two different tables, removes duplicates and orders by date. The issue I'm having is that the different dates make the same contact appear as unique. I don't want to include the date in the distinct, but I do need it afterwards for the ordering. I can't do the ordering first, remove the date and then perform the distinct, because the distinct changes the ordering. Neither can I order before the union, because that doesn't ensure ordering after the union.
I would like to distinct all fields except the date, which is only required for the ordering.
Ideally I would pass a comparer to the distinct but EF can't translate this to SQL.
db.Packages.Select(p => new Recent()
{
Name = p.Attention, Address1 = p.Address1, ... , Date = p.ShippingDate
})
.Concat(db.Letters.Select(l => new Recent()
{
Name = l.AddressedTo, Address1 = l.Address1, ..., Date = l.MarkedDate
}))
.Distinct()
.OrderByDescending(r => r.Date);
OR the problem in SQL
SELECT DISTINCT Attention, Address1, ShippingDate
FROM Packages
UNION ALL
SELECT AddressedTo, Address1, MarkedDate
FROM Letters
ORDER BY ShippingDate DESC
You should be able to use a GroupBy to do what you want, like so (not to mention that GroupBy is more performant than Distinct in EF):
db.Packages.Select(p => new Recent()
{
Name = p.Attention, Address1 = p.Address1, ... , Date = p.ShippingDate})
.Concat(db.Letters.Select(l => new Recent()
{
Name = l.AddressedTo, Address1 = l.Address1, ..., Date = l.MarkedDate}))
.GroupBy(p => <parameters to group by - which make the record distinct>)
.Select(g => new {Contact = g.Key, LastShippingDate = g.Max(p => p.Date)});
I'd be concerned with the Distinct approach: even if it were possible, Distinct would then remove one of the items and leave you with an arbitrary date out of the two, and then your sort would be totally unpredictable.
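Grouping on the contact fields and taking the latest date per group makes the surviving date deterministic. The following in-memory sketch shows the idea; Recent is reduced to three hypothetical properties, ContactDedup and DistinctByContact are made-up names, and the record syntax assumes C# 9 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Reduced stand-in for the Recent class; the real one has more fields.
public sealed record Recent(string Name, string Address1, DateTime Date);

public static class ContactDedup
{
    // Unions both sources, keeps one row per (Name, Address1) carrying the
    // latest date of that contact, then orders by that date, newest first.
    public static List<Recent> DistinctByContact(
        IEnumerable<Recent> packages, IEnumerable<Recent> letters) =>
        packages.Concat(letters)
                .GroupBy(r => (r.Name, r.Address1))
                .Select(g => new Recent(g.Key.Name, g.Key.Address1, g.Max(r => r.Date)))
                .OrderByDescending(r => r.Date)
                .ToList();
}
```

Because every duplicate collapses to its maximal date, the final ordering no longer depends on which duplicate happened to survive.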
