Use LINQ let and Sum to calculate an average - C#

How can I calculate the sum and the average of the time differences using LINQ let? I tried the code below, but it returns only the list of time differences.
var query1 = from c in DBCollection.Find(Query_Collection).ToList()
let DtCreateDate = Convert.ToDateTime(c["CreatedDate"])
let DtModifiedDate = Convert.ToDateTime(c["LastModifiedDate"])
let difference = (DtModifiedDate - DtCreateDate).TotalSeconds
select new { difference };

You can do it like this:
var query1 = from c in DBCollection.Find(Query_Collection).ToList()
let DtCreateDate = Convert.ToDateTime(c["CreatedDate"])
let DtModifiedDate = Convert.ToDateTime(c["LastModifiedDate"])
let difference = (DtModifiedDate - DtCreateDate).TotalSeconds
let averageSum = DtCreateDate.AddTicks((DtModifiedDate - DtCreateDate).Ticks / 2).AddSeconds(difference) // midpoint of the two dates, plus the difference
select new { difference, averageSum };
Above, the difference between the two dates is stored in the variable difference.
I have added another variable, averageSum, which stores the average (the midpoint) of the two dates and then adds the difference to it. Because C# does not allow adding two DateTime values together, the midpoint is computed as the create date plus half of the time span between the two dates.
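A note on the date arithmetic: C# defines DateTime - DateTime (which yields a TimeSpan) but not DateTime + DateTime, so the "average" of two dates is usually computed as the start plus half of the gap. Here is a minimal sketch. DateMath and Midpoint are hypothetical names used only for illustration:

```csharp
using System;

public static class DateMath
{
    // Returns the instant halfway between start and end.
    // DateTime + DateTime does not compile, so add half of the TimeSpan instead.
    public static DateTime Midpoint(DateTime start, DateTime end)
    {
        long halfTicks = (end - start).Ticks / 2;
        return start.AddTicks(halfTicks);
    }
}
```

Working in ticks keeps full precision. Adding TotalSeconds / 2 via AddSeconds would also work, but it goes through a double.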

I don't understand how you want to calculate an average for every row, but to accumulate custom values you can use the Aggregate method, as shown below:
var query1 = DBCollection.Find(Query_Collection).ToList()
.Select(e => new { DtCreateDate = Convert.ToDateTime(e["CreatedDate"]), DtModifiedDate = Convert.ToDateTime(e["LastModifiedDate"]) })
.Select(e => new { Diff = (e.DtModifiedDate - e.DtCreateDate).TotalSeconds, Av = 0.0 /* calculate the average for the row */ })
.Aggregate(new { Diff = 0.0, Av = 0.0 }, (group, cur) => new { Diff = group.Diff + cur.Diff, Av = group.Av + cur.Av });
The result of the statement is a single object that contains the sums of the differences and of the averages over the whole list.
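If all you need is the total and the mean of the differences over all rows, LINQ's Sum and Average are enough, and Aggregate can compute both in one pass. Here is a minimal sketch. It assumes the documents have already been mapped to a hypothetical Entry class (the MongoDB access is left out), and DiffStats is also a hypothetical name:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-in for the MongoDB documents.
public sealed class Entry
{
    public DateTime CreatedDate { get; }
    public DateTime LastModifiedDate { get; }

    public Entry(DateTime createdDate, DateTime lastModifiedDate)
    {
        CreatedDate = createdDate;
        LastModifiedDate = lastModifiedDate;
    }
}

public static class DiffStats
{
    // Total and mean of (LastModifiedDate - CreatedDate), in seconds.
    public static (double Total, double Average) Summarize(IEnumerable<Entry> entries)
    {
        // Materialize once so the source is not enumerated twice.
        var diffs = entries
            .Select(e => (e.LastModifiedDate - e.CreatedDate).TotalSeconds)
            .ToList();

        double total = diffs.Sum();
        double average = diffs.Count > 0 ? diffs.Average() : 0; // Average() throws on an empty sequence
        return (total, average);
    }

    // Same numbers in a single pass: Aggregate carries (sum, count) as the accumulator.
    public static (double Total, double Average) SummarizeOnePass(IEnumerable<Entry> entries)
    {
        var acc = entries.Aggregate(
            (Sum: 0.0, Count: 0),
            (a, e) => (a.Sum + (e.LastModifiedDate - e.CreatedDate).TotalSeconds, a.Count + 1));
        return (acc.Sum, acc.Count > 0 ? acc.Sum / acc.Count : 0);
    }
}
```

The one-pass version carries a count next to the sum because an average cannot be accumulated directly. It has to be derived from both values at the end.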

Related

Accumulate values of a list

I have a list in which each object has two fields:
Date as DateTime
Estimated as double.
I have some values like this:
01/01/2019 2
01/02/2019 3
01/03/2019 4
... and so on.
I need to generate another list in the same format, but with the Estimated field accumulated date by date. So the result must be:
01/01/2019 2
01/02/2019 5 (2+3)
01/03/2019 9 (5+4) ... and so on.
Right now, I'm calculating it in a for loop:
for (int iI = 0; iI < SData.TotalDays; iI++)
{
DateTime oCurrent = SData.ProjectStart.AddDays(iI);
oRet.Add(new GraphData(oCurrent, GetProperEstimation(oCurrent)));
}
For each date, GetProperEstimation runs a LINQ Sum over all the entries dated on or before that date:
private static double GetProperEstimation(DateTime pDate)
{
return Data.Where(x => x.Date.Date <= pDate.Date).Sum(x => x.Estimated);
}
It works, but the problem is that it is ABSOLUTELY slow: it takes more than 1 minute for a 271-element list.
Is there a better way to do this?
Thanks in advance.
You can write a simple LINQ-like extension method that accumulates values. This version is generalized to allow different input and output types:
static class ExtensionMethods
{
public static IEnumerable<TOut> Accumulate<TIn, TOut>(this IEnumerable<TIn> source, Func<TIn,double> getFunction, Func<TIn,double,TOut> createFunction)
{
double accumulator = 0;
foreach (var item in source)
{
accumulator += getFunction(item);
yield return createFunction(item, accumulator);
}
}
}
Example usage:
public static void Main()
{
var list = new List<Foo>
{
new Foo { Date = new DateTime(2018,1,1), Estimated = 1 },
new Foo { Date = new DateTime(2018,1,2), Estimated = 2 },
new Foo { Date = new DateTime(2018,1,3), Estimated = 3 },
new Foo { Date = new DateTime(2018,1,4), Estimated = 4 },
new Foo { Date = new DateTime(2018,1,5), Estimated = 5 }
};
var accumulatedList = list.Accumulate
(
(item) => item.Estimated, //Given an item, get the value to be summed
(item, sum) => new { Item = item, Sum = sum } //Given an item and the sum, create an output element
);
foreach (var item in accumulatedList)
{
Console.WriteLine("{0:yyyy-MM-dd} {1}", item.Item.Date, item.Sum);
}
}
Output:
2018-01-01 1
2018-01-02 3
2018-01-03 6
2018-01-04 10
2018-01-05 15
This approach requires only one iteration over the set, so it should perform much better than a series of sums.
This is exactly the job of MoreLinq.Scan:
var newModels = list.Scan((x, y) => new MyModel(y.Date, x.Estimated + y.Estimated));
New models will have the values you want.
In (x, y), x is the previous (already accumulated) item and y is the current item in the enumeration.
Why is your query slow?
Because Where iterates over your collection from the beginning every time you call it, so the number of operations grows quadratically, not linearly: 1 + 2 + 3 + ... + n = (n^2)/2 + n/2.
You can try this. Simple yet effective.
double i = 0; // Estimated is a double, so the running total must be a double too
var result = myList.Select(x => new MyObject
{
Date = x.Date,
Estimated = i = i + x.Estimated
}).ToList();
Edit: or try it this way:
.Select(x => new GraphData(x.Date, i = i + x.Estimated))
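I will assume that what you described is really what you need.
Algorithm
Create a list or array from the original values, ordered by date ascending. Then accumulate:
var ordered = collection.OrderBy(e => e.Date).ToList();
double sumValues = 0;
foreach (var x in ordered)
{
sumValues += x.Estimated; // accumulates all the past values plus the present value
oRet.Add(new GraphData(x.Date, sumValues));
}
The first step (ordering the values) is the most important. After that, the foreach loop is very fast, because it visits each element only once.
See the documentation on sorting (for example, Enumerable.OrderBy).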
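The sort-then-accumulate algorithm above can be sketched as a self-contained method. DatedValue and RunningTotal are hypothetical names standing in for the question's element type and the caller. The point is that the sort happens once and the loop touches each element once:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical element type matching the question: a date plus an estimate.
public sealed class DatedValue
{
    public DateTime Date { get; }
    public double Estimated { get; }

    public DatedValue(DateTime date, double estimated)
    {
        Date = date;
        Estimated = estimated;
    }
}

public static class RunningTotal
{
    // Sort once (O(n log n)), then walk the sorted sequence once (O(n)),
    // carrying the sum forward instead of re-summing every earlier entry.
    public static List<DatedValue> Accumulate(IEnumerable<DatedValue> source)
    {
        var result = new List<DatedValue>();
        double sum = 0;
        foreach (var item in source.OrderBy(v => v.Date))
        {
            sum += item.Estimated; // all earlier values plus the current one
            result.Add(new DatedValue(item.Date, sum));
        }
        return result;
    }
}
```

With the question's sample values 2, 3 and 4 (even if they arrive out of order), this yields 2, 5 and 9.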

Linq query to find average time difference per group

I have a DataTable that I want to query to get the average time difference per group of Case ID. My data looks as follows:
Name Case ID Incept Time Edit Time
---------------------------------------------------------------------------
Blue 1 2017-02-26T02:35:49-04:00 2017-03-26T02:35:49-04:00
Blue 1 2017-02-26T02:34:49-04:00 2017-04-26T02:35:49-04:00
Blue 1 2017-02-26T02:33:49-04:00 2017-05-26T02:35:49-04:00
Blue 2 2017-02-26T02:32:49-04:00 2017-06-26T02:35:49-04:00
Blue 2 2017-02-26T02:31:49-04:00 2017-07-26T01:35:49-04:00
Blue 2 2017-02-26T02:30:49-04:00 2017-08-26T03:35:49-04:00
Red 5 2017-02-26T02:25:49-04:00 2017-09-26T04:35:49-04:00
Red 5 2017-02-26T02:15:49-04:00 2017-10-26T05:35:49-04:00
Red 1 2017-02-26T02:05:49-04:00 2017-11-26T02635:49-04:00
Red 1 2017-02-26T01:35:49-04:00 2017-12-26T02:35:49-04:00
Red 5 2017-02-26T05:35:49-04:00 2017-12-27T02:35:49-04:00
So far I have the following query, which groups the rows by Case ID and gets the min and max values of each group.
private IEnumerable<DataRow> _data;
var query =
from data in this._data
group data by data.Field<string>("Name") into groups
select new
{
formName = groups.Key,
caseDiffs =
from d in groups
group d by d.Field<string>("Case ID") into grps
select new
{
min = grps.Min(t =>
DateTimeOffset.ParseExact(t.Field<string>("Incept Time"), "yyyy-MM-ddTHH:mm:sszzzz", CultureInfo.InvariantCulture)
),
max = grps.Max(t =>
DateTimeOffset.ParseExact(t.Field<string>("Edit Time"), "yyyy-MM-ddTHH:mm:sszzzz", CultureInfo.InvariantCulture)
)
}
};
My questions are:
1) Is it possible to include the difference between the min and max values (per Case ID group) in the query?
2) At the end, how can I get the averages of those differences calculated?
UPDATED to reflect your changed question...
I've split this into three separate queries so that you can read it more easily (you can combine if you want):
//convert the data using a projection query
var query1 = from data in _data
let inceptTime = DateTimeOffset.ParseExact(data.Field<string>("Incept Time"), "yyyy-MM-ddTHH:mm:sszzzz", CultureInfo.InvariantCulture)
let editTime = DateTimeOffset.ParseExact(data.Field<string>("Edit Time"), "yyyy-MM-ddTHH:mm:sszzzz", CultureInfo.InvariantCulture)
let difference = editTime - inceptTime
select new
{
name = data.Field<string>("Name"),
caseId = data.Field<string>("Case ID"),
inceptTime,
editTime,
difference
};
//group by caseID (also by NAME, but that won't matter for this grouping and is needed in query3)
var query2 = from data in query1
group data by new { data.caseId, data.name } into groups
let min = groups.Min(x => x.inceptTime)
let max = groups.Max(x => x.editTime)
select new
{
name = groups.Key.name,
caseId = groups.Key.caseId,
min,
max,
diff = max - min
};
//now group by name
var query3 = from data in query2
group data by new { data.name } into groups
select new
{
name = groups.Key.name,
minDiff = groups.Min(x => x.diff),
maxDiff = groups.Max(x => x.diff),
avgDiff = new TimeSpan((long)groups.Average(x => x.diff.Ticks)),
};
NOTE: The "Edit Time" for the 9th record is in an invalid format, so ParseExact will throw a FormatException for it.
You just need to define a few let variables in your LINQ query: about 3 more lines, in fact. Your grouping query should look like this:
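For comparison, the same two-level calculation can be sketched over plain objects instead of DataRows. It computes the span per Name and Case ID, then the average span per Name. CaseRow and CaseStats are hypothetical names. The format string quotes the T and uses zzz for offsets such as -04:00, which is an assumption about how your timestamps look:

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical flat row type standing in for one DataRow.
public sealed class CaseRow
{
    // 'T' is quoted as a literal; zzz matches offsets such as -04:00.
    const string Format = "yyyy-MM-dd'T'HH:mm:sszzz";

    public string Name { get; }
    public string CaseId { get; }
    public DateTimeOffset Incept { get; }
    public DateTimeOffset Edit { get; }

    public CaseRow(string name, string caseId, string incept, string edit)
    {
        Name = name;
        CaseId = caseId;
        Incept = DateTimeOffset.ParseExact(incept, Format, CultureInfo.InvariantCulture);
        Edit = DateTimeOffset.ParseExact(edit, Format, CultureInfo.InvariantCulture);
    }
}

public static class CaseStats
{
    // Step 1: for each (Name, Case ID), span = latest Edit - earliest Incept.
    // Step 2: for each Name, average those spans (average the Ticks, then convert back).
    public static Dictionary<string, TimeSpan> AverageSpanByName(IEnumerable<CaseRow> rows)
    {
        return rows
            .GroupBy(r => new { r.Name, r.CaseId })
            .Select(g => new { g.Key.Name, Span = g.Max(r => r.Edit) - g.Min(r => r.Incept) })
            .GroupBy(x => x.Name)
            .ToDictionary(
                g => g.Key,
                g => TimeSpan.FromTicks((long)g.Average(x => x.Span.Ticks)));
    }
}
```

TimeSpan has no built-in Average, so averaging the Ticks as a long and converting back is the usual workaround.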
var query =
from data in this._data
group data by data.Field<string>("Name") into groups
select new
{
formName = groups.Key,
caseDiffs = from d in groups
group d by d.Field<string>("Case ID") into grps
// three variables here, so that you can do the
// date math that you require!
let minDt = grps.Min(t =>
DateTimeOffset.ParseExact(t.Field<string>("Incept Time"),
"yyyy-MM-ddTHH:mm:sszzzz", CultureInfo.InvariantCulture))
let maxDt = grps.Max(t =>
DateTimeOffset.ParseExact(t.Field<string>("Edit Time"),
"yyyy-MM-ddTHH:mm:sszzzz", CultureInfo.InvariantCulture))
let diffInSecs = (maxDt - minDt).TotalSeconds
select new
{
min = minDt,
max = maxDt,
diff = diffInSecs
}
};
Hope that helps!

Get n strings from Linq

I am trying to take the "NewNumber" ints five at a time and pass them to var q. Let's say there are 20 records returned by UniqueNumbers: I would like to get 1-5, 6-10, 11-15 and 16-20, and then pass Number1 = 1, Number2 = 2, Number3 = 3, Number4 = 4, Number5 = 5 to var q the first time, followed by Number1 = 6, Number2 = 7, Number3 = 8, Number4 = 9, Number5 = 10, and so on.
var UniqueNumbers =
from t in Numbers
group t by new { t.Id } into g
select new
{
NewNumber = g.Key.Id,
};
UniqueNumbers.Skip(0).Take(5);
var q = new SolrQueryInList("NewNumber1", "NewNumber2","NewNumber3","NewNumber4","NewNumber5");
If you have a list of items (called remaining below), you can easily separate them into groups of five like this:
int count = 0;
var groupsOfFive =
from t in remaining
group t by count++ / 5 into g
select new { Key=g.Key, Numbers = g };
And then:
foreach (var g in groupsOfFive)
{
var parms = g.Numbers.Select(n => n.ToString()).ToArray();
var q = new SolrQueryInList(parms[0], parms[1], parms[2], parms[3], parms[4]);
}
I think what you want is some variation on that.
Edit
Another way to do it, if for some reason you don't want to do the grouping, would be:
var items = remaining.Select(n => n.ToString()).ToArray();
for (int current = 0; current < items.Length; current += 5)
{
var q = new SolrQueryInList(
items[current],
items[current+1],
items[current+2],
items[current+3],
items[current+4]);
}
Both of these assume that the number of items is evenly divisible by 5. If it's not, you have to handle the possibility of not enough parameters.
Try something like this:
for (int i = 0; i < UniqueNumbers.Count() / 5; i++)
{
// Gets the next 5 numbers
var group = UniqueNumbers.Skip(i * 5).Take(5);
// Convert the numbers to strings
var stringNumbers = group.Select(n => n.ToString()).ToList();
// Pass the numbers into the method
var q = new SolrQueryInList(stringNumbers[0], stringNumbers[1], ...
}
You'll have to figure out how to manage boundary conditions, like if UniqueNumbers.Count is not divisible by 5. You might also be able to modify SolrQueryInList to take a list of numbers so that you don't have to index into the list 5 times for that call.
EDIT:
Jim Mischel pointed out that looping over a Skip operation gets expensive fast. Here's a variant that keeps your place, rather than starting at the beginning of the list every time:
var remaining = UniqueNumbers;
while(remaining.Any())
{
// Gets the next 5 numbers
var group = remaining.Take(5);
// Convert the numbers to strings
var stringNumbers = group.Select(n => n.ToString()).ToList();
// Pass the numbers into the method
var q = new SolrQueryInList(stringNumbers[0], stringNumbers[1], ...
// Update the starting spot
remaining = remaining.Skip(5);
}
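Both answers leave the short final batch to you. Here is a minimal sketch of a single-pass batching helper that keeps any leftover items as a last, shorter batch. InBatches and Batching are hypothetical names. On .NET 6 and later, Enumerable.Chunk does something similar:

```csharp
using System;
using System.Collections.Generic;

public static class Batching
{
    // Splits source into consecutive batches of `size` items in a single pass.
    // The last batch is shorter when the count is not a multiple of size.
    public static IEnumerable<List<T>> InBatches<T>(this IEnumerable<T> source, int size)
    {
        if (size <= 0) throw new ArgumentOutOfRangeException(nameof(size));
        var batch = new List<T>(size);
        foreach (var item in source)
        {
            batch.Add(item);
            if (batch.Count == size)
            {
                yield return batch;
                batch = new List<T>(size);
            }
        }
        if (batch.Count > 0) yield return batch; // leftover items
    }
}
```

How each batch maps onto the SolrQueryInList constructor depends on that constructor's signature, which the question does not show. Unlike repeated Skip calls, this enumerates the source only once.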

Aggregate data in DataTable in time intervals (5 minutes)

I have a DataTable
DataTable dt = new DataTable();
dt.Columns.Add("ts");
dt.Columns.Add("agent");
dt.Columns.Add("host");
dt.Columns.Add("metric");
dt.Columns.Add("val");
My data comes in at 15-second intervals, and I need to get the MAX "val" for each 5-minute period for each host/agent/metric (including the timestamp that marks the 5-minute period).
This is the closest thing that I have:
var q1 = from r in dt.Rows.Cast<DataRow>()
let ts = Convert.ToDateTime(r[0].ToString())
group r by new DateTime(ts.Year, ts.Month, ts.Day, ts.Hour, ts.Minute, ts.Second)
into g
select new
{
ts = g.Key,
agentName = g.Select(r => r[1].ToString()),
Sum = g.Sum(r => (int.Parse(r[4].ToString()))),
Average = g.Average(r => (int.Parse(r[4].ToString()))),
Max = g.Max(r => (int.Parse(r[4].ToString())))
};
Pretty lousy
To group the times into five-minute intervals, we can simply divide the Ticks of each time by the size of the interval, which we can precompute. In this case, it's the number of ticks in five minutes:
long ticksInFiveMinutes = TimeSpan.TicksPerMinute * 5;
The query then becomes:
var query = from r in dt.Rows.Cast<DataRow>()
let ts = Convert.ToDateTime(r[0].ToString())
let agent = r[1].ToString()
let host = r[2].ToString()
let metric = r[3].ToString()
group r by new { ticks = ts.Ticks / ticksInFiveMinutes, agent, host, metric }
into g
let key = new DateTime(g.Key.ticks * ticksInFiveMinutes)
select new
{
ts = key,
agentName = g.Key.agent,
host = g.Key.host,
metric = g.Key.metric,
Sum = g.Sum(r => (int.Parse(r[4].ToString()))),
Average = g.Average(r => (int.Parse(r[4].ToString()))),
Max = g.Max(r => (int.Parse(r[4].ToString())))
};
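The tick-division idea can be checked in isolation. Here is a minimal sketch over typed objects instead of DataRows. Sample and FiveMinute are hypothetical names. Like the question, it keeps the agent, host and metric in the group key:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical typed sample standing in for one DataTable row.
public sealed class Sample
{
    public DateTime Ts { get; }
    public string Agent { get; }
    public string Host { get; }
    public string Metric { get; }
    public int Val { get; }

    public Sample(DateTime ts, string agent, string host, string metric, int val)
    {
        Ts = ts; Agent = agent; Host = host; Metric = metric; Val = val;
    }
}

public static class FiveMinute
{
    const long IntervalTicks = TimeSpan.TicksPerMinute * 5;

    // Integer division drops the remainder, so this floors ts to the start of its 5-minute slot.
    public static DateTime FloorToFiveMinutes(DateTime ts) =>
        new DateTime(ts.Ticks / IntervalTicks * IntervalTicks, ts.Kind);

    // MAX(Val) per (slot, agent, host, metric), ordered by slot.
    public static List<(DateTime Slot, string Agent, string Host, string Metric, int Max)> MaxPerSlot(
        IEnumerable<Sample> samples) =>
        samples
            .GroupBy(s => (Slot: FloorToFiveMinutes(s.Ts), Agent: s.Agent, Host: s.Host, Metric: s.Metric))
            .Select(g => (Slot: g.Key.Slot, Agent: g.Key.Agent, Host: g.Key.Host, Metric: g.Key.Metric, Max: g.Max(s => s.Val)))
            .OrderBy(x => x.Slot)
            .ToList();
}
```

A value tuple works as the group key because it has structural equality, just like the anonymous type in the query above.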
How about the following approach...
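Define a helper that rounds a time down to its 5-minute slot, and a GetHashCode-style method that combines that slot with the agent, host and metric columns:
public DateTime Arrange5Min(DateTime value)
{
var stamp = value;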
stamp = stamp.AddMinutes(-(stamp.Minute % 5));
stamp = stamp.AddMilliseconds(-stamp.Millisecond - 1000 * stamp.Second);
return stamp;
}
public int MyGetHashCode(DataRow r)
{
unchecked // Overflow is fine, just wrap
{
int hash = 17;
// Suitable nullity checks etc, of course :)
hash = hash * 23 + r[1].ToString().GetHashCode();
hash = hash * 23 + r[2].ToString().GetHashCode();
hash = hash * 23 + r[3].ToString().GetHashCode();
var stamp = Arrange5Min(Convert.ToDateTime(r[0].ToString()));
hash = hash * 23 + stamp.GetHashCode();
return hash;
}
}
Borrowed from "What is the best algorithm for an overridden System.Object.GetHashCode?" and "LINQ aggregate and group by periods of time".
Then use the functions in LINQ:
var q1 = from r in dt.Rows.Cast<DataRow>()
group r by MyGetHashCode(r)
into g
let intermediate = new {
Row = g.First(),
Max = g.Max(v => int.Parse(v[4].ToString()))
}
select
new {
Time = Arrange5Min(Convert.ToDateTime(intermediate.Row[0].ToString())),
Host = intermediate.Row[2].ToString(),
Agent = intermediate.Row[1].ToString(),
Metric = intermediate.Row[3].ToString(),
Max = intermediate.Max
};

How to: sum all values and assign a percentage of the total in Linq to sql

I have a simple LINQ query that I'm trying to extend so that I can first sum all the values in the VoteCount field and then, for each nominee, assign the percentage of the votes that the nominee received.
Here's the code:
TheVoteDataContext db = new TheVoteDataContext();
var results = from n in db.Nominees
join v in db.Votes on n.VoteID equals v.VoteID
select new
{
Name = n.Name,
VoteCount = v.VoteCount,
NomineeID = n.NomineeID,
VoteID = v.VoteID
};
Since selecting the individual votes for each nominee and calculating the sum of all votes are two different tasks, I cannot think of a way of doing this efficiently in one single query. I would simply do it in two steps:
var results = from n in db.Nominees
join v in db.Votes on n.VoteID equals v.VoteID
select new
{
Name = n.Name,
VoteCount = v.VoteCount,
NomineeID = n.NomineeID,
VoteID = v.VoteID
};
var sum = (decimal)results.Select(r=>r.VoteCount).Sum();
var resultsWithPercentage = results.Select(r=>new {
Name = r.Name,
VoteCount = r.VoteCount,
NomineeID = r.NomineeID,
VoteID = r.VoteID,
Percentage = sum != 0 ? (r.VoteCount / sum) * 100 : 0
});
You could also calculate the sum before the results (using an aggregate query); this would leave the task of summing to the database engine. I believe this would be slower, but you can always find out by trying :)
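The same two-step logic can be sketched over in-memory data, so it runs without a database. VoteShare and the (Name, Votes) tuple shape are hypothetical. Converting the total to decimal keeps the division from truncating:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class VoteShare
{
    // Step 1: total of all votes. Step 2: each nominee's share of that total.
    public static List<(string Name, int Votes, decimal Percentage)> WithPercentages(
        IEnumerable<(string Name, int Votes)> nominees)
    {
        var list = nominees.ToList(); // enumerate the source only once
        decimal total = list.Sum(n => (decimal)n.Votes);

        return list
            .Select(n => (Name: n.Name, Votes: n.Votes,
                          Percentage: total != 0 ? n.Votes / total * 100 : 0m))
            .ToList();
    }
}
```

Guarding against a zero total mirrors the sum != 0 check in the answer's query.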
