Linq Aggregate on object and List - c#

I do this query with NHibernate:
var test = _session.CreateCriteria(typeof(Estimation))
.SetFetchMode("EstimationItems", FetchMode.Eager)
.List();
An "Estimation" can have several "EstimationItems" (Quantity, Price and ProductId)
I'd like a list of "Estimation" with these constraints:
One line per "Estimation" code shown in the picture (e.g. 2011/0001 and 2011/0003)
For each estimation (i.e. on each line), the number of "EstimationItems"
For each estimation (i.e. on each line), the total price (Quantity * Price) of its "EstimationItems"
I hope the structure will be clearer with the picture below.
Thanks,

Here's a proposition:
var stats =
from estimation in test
group estimation by estimation.Code into gestimation
let allItems = gestimation.SelectMany(x => x.EstimationItems)
select new
{
Code = gestimation.Key,
ItemNumber = allItems.Count(),
TotalPrice = allItems.Sum(item => item.Price * item.Quantity)
};
Now this creates an anonymous type with the three properties you wanted (code of the estimation, number of items for this estimation code, and total price of the items for this estimation code).
You can adapt it to your specific needs. Just bear in mind that allItems is an IEnumerable<EstimationItem> containing all the EstimationItems belonging to Estimations with the same code.
If you want to use this object outside the scope of the method creating it, which you can't do with anonymous types, then you should create a class to hold these values.
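Below is a minimal sketch of such a class. It assumes that Estimation exposes a Code string and an EstimationItems list, and that each item has an int Quantity and a decimal Price. These types are inferred from the question. EstimationSummary and EstimationStats are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Linq;

// Assumed shapes of the NHibernate entities from the question (hypothetical).
public class EstimationItem
{
    public int ProductId { get; set; }
    public int Quantity { get; set; }
    public decimal Price { get; set; }
}

public class Estimation
{
    public string Code { get; set; }
    public IList<EstimationItem> EstimationItems { get; set; } = new List<EstimationItem>();
}

// Named result type (hypothetical) that, unlike an anonymous type, can leave the method.
public class EstimationSummary
{
    public string Code { get; set; }
    public int ItemNumber { get; set; }
    public decimal TotalPrice { get; set; }
}

public static class EstimationStats
{
    public static List<EstimationSummary> Summarize(IEnumerable<Estimation> estimations)
    {
        return estimations
            .GroupBy(e => e.Code)
            .Select(g =>
            {
                // All items of every Estimation that shares this code.
                var allItems = g.SelectMany(e => e.EstimationItems).ToList();
                return new EstimationSummary
                {
                    Code = g.Key,
                    ItemNumber = allItems.Count,
                    TotalPrice = allItems.Sum(i => i.Price * i.Quantity)
                };
            })
            .ToList();
    }
}
```

Summarize returns a List<EstimationSummary>, so the result can be passed to other methods or bound to a UI. An anonymous type cannot be used that way.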
Corrected proposition:
var stats =
(from est in test.Cast<Estimation>()
group est by est.Code into gEst
let allItems = gEst.SelectMany(est => est.EstimationItems).Cast<EstimationItem>()
select new TestingUI
{
Code = gEst.Key,
Quantity = allItems.Count(), // number of items, not number of estimations
Total = allItems.Sum(item => item.Price * item.Quantity) // Price/Quantity live on the items, not on Estimation
}).ToList();

Dictionary<string, Tuple<int, decimal>> dico = new Dictionary<string, Tuple<int, decimal>>();
foreach (var itemEstimation in test)
{
Estimation estimation = (Estimation)itemEstimation;
// Only the first Estimation seen for each code is counted; later ones with the same code are skipped.
if (!dico.ContainsKey(estimation.Code))
{
decimal total = 0;
foreach (var item in estimation.EstimationItems)
{
EstimationItem estimationItem = (EstimationItem)item;
total += estimationItem.Price * estimationItem.Quantity;
}
dico.Add(estimation.Code, new Tuple<int, decimal>(estimation.EstimationItems.Sum(x => x.Quantity), total));
}
}
List<TestingUI> finalResult = new List<TestingUI>();
foreach (var item in dico)
{
// item.Value already holds the tuple, so no extra TryGetValue lookup is needed.
finalResult.Add(new TestingUI() { Code = item.Key, Quantity = item.Value.Item1, Total = item.Value.Item2 });
}

Related

Accumulate values of a list

I have a list in which each object has two fields:
Date as DateTime
Estimated as double.
I have some values like this:
01/01/2019 2
01/02/2019 3
01/03/2019 4
... and so on.
I need to generate another list, same format, but accumulating the Estimated field, date by date. So the result must be:
01/01/2019 2
01/02/2019 5 (2+3)
01/03/2019 9 (5+4) ... and so on.
Right now, I'm calculating it in a for loop:
for (int iI = 0; iI < SData.TotalDays; iI++)
{
DateTime oCurrent = SData.ProjectStart.AddDays(iI);
oRet.Add(new GraphData(oCurrent, GetProperEstimation(oCurrent)));
}
For each date, I execute a LINQ Sum over all the dates prior or equal to that date:
private static double GetProperEstimation(DateTime pDate)
{
return Data.Where(x => x.Date.Date <= pDate.Date).Sum(x => x.Estimated);
}
It works, but the problem is that it is ABSOLUTELY slow, taking more than 1 minute for a 271-element list.
Is there a better way to do this?
Thanks in advance.
You can write a simple LINQ-like extension method that accumulates values. This version is generalized to allow different input and output types:
static class ExtensionMethods
{
public static IEnumerable<TOut> Accumulate<TIn, TOut>(this IEnumerable<TIn> source, Func<TIn,double> getFunction, Func<TIn,double,TOut> createFunction)
{
double accumulator = 0;
foreach (var item in source)
{
accumulator += getFunction(item);
yield return createFunction(item, accumulator);
}
}
}
Example usage:
public static void Main()
{
var list = new List<Foo>
{
new Foo { Date = new DateTime(2018,1,1), Estimated = 1 },
new Foo { Date = new DateTime(2018,1,2), Estimated = 2 },
new Foo { Date = new DateTime(2018,1,3), Estimated = 3 },
new Foo { Date = new DateTime(2018,1,4), Estimated = 4 },
new Foo { Date = new DateTime(2018,1,5), Estimated = 5 }
};
var accumulatedList = list.Accumulate
(
(item) => item.Estimated, //Given an item, get the value to be summed
(item, sum) => new { Item = item, Sum = sum } //Given an item and the sum, create an output element
);
foreach (var item in accumulatedList)
{
Console.WriteLine("{0:yyyy-MM-dd} {1}", item.Item.Date, item.Sum);
}
}
Output:
2018-01-01 1
2018-01-02 3
2018-01-03 6
2018-01-04 10
2018-01-05 15
This approach requires only one iteration over the list, so it should perform much better than a series of sums.
Link to DotNetFiddle example
This is exactly the job of MoreLinq's Scan:
var newModels = list.Scan((x, y) => new MyModel(y.Date, x.Estimated + y.Estimated));
New models will have the values you want.
In (x, y), x is the previously accumulated result (the model produced for the previous element) and y is the current item in the enumeration.
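The following hand-written stand-in makes the semantics concrete. It is not MoreLinq's own source, and RunningScan and MyModel are hypothetical names. The first element is yielded unchanged. Every later element is combined with the last yielded value, so the list is walked exactly once:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ScanExtensions
{
    // Hypothetical stand-in for a Scan operator: yields the first element as-is,
    // then transformation(previousResult, currentElement) for each later element.
    public static IEnumerable<T> RunningScan<T>(this IEnumerable<T> source, Func<T, T, T> transformation)
    {
        using (var e = source.GetEnumerator())
        {
            if (!e.MoveNext()) yield break;
            T acc = e.Current;
            yield return acc;
            while (e.MoveNext())
            {
                acc = transformation(acc, e.Current);
                yield return acc;
            }
        }
    }
}

// Hypothetical model with the question's Date/Estimated fields.
public class MyModel
{
    public MyModel(DateTime date, double estimated) { Date = date; Estimated = estimated; }
    public DateTime Date { get; }
    public double Estimated { get; }
}
```

Called as list.RunningScan((x, y) => new MyModel(y.Date, x.Estimated + y.Estimated)), it produces the running totals 2, 5, 9 for the question's sample values.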
Why is your query slow?
Because Where iterates your collection from the beginning every time you call it, the number of operations grows quadratically, not linearly: 1 + 2 + 3 + ... + n = n²/2 + n/2.
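Counting only the elements that Sum adds, n calls cost the arithmetic series below. The Where predicate itself runs over every element on every call, so it is evaluated n · n = n² times. That count is also quadratic.

```latex
\sum_{k=1}^{n} k = \frac{n(n+1)}{2} = \frac{n^2}{2} + \frac{n}{2} \in O(n^2)
```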
You can try this. Simple yet effective.
double i = 0; // must be double, because Estimated is a double
var result = myList.Select(x => new MyObject
{
Date = x.Date,
Estimated = i = i + x.Estimated
}).ToList();
Edit: try it this way:
.Select(x => new GraphData(x.Date, i = i + x.Estimated))
I'll assume that what you described is really what you need.
Algorithm
Create a list or array from the original values, ordered by date ascending, then accumulate in a single pass:
var ordered = Data.OrderBy(x => x.Date).ToList();
double sumValues = 0;
foreach (var x in ordered)
{
sumValues += x.Estimated; // accumulates all the past values plus the present one
oRet.Add(new GraphData(x.Date, sumValues));
}
The first step (ordering the values) is the most important. After that, the foreach is a single fast pass.
see sort
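The question's loop walks every day of the project, including days that may have no entries. The sketch below keeps that behaviour with one sort and a single moving index. It uses the hypothetical names Entry and CumulativeByDay, and it returns (day, total) tuples instead of the question's GraphData:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical input type with the question's two fields.
public class Entry
{
    public DateTime Date { get; set; }
    public double Estimated { get; set; }
}

public static class CumulativeByDay
{
    // One (day, runningTotal) pair per day, starting at projectStart, for totalDays days.
    // The data is sorted once and then walked with a single index, so each entry is added once.
    public static List<(DateTime Day, double Total)> Build(IEnumerable<Entry> data, DateTime projectStart, int totalDays)
    {
        var sorted = data.OrderBy(x => x.Date).ToList();
        var result = new List<(DateTime Day, double Total)>(totalDays);
        double running = 0;
        int next = 0;
        for (int i = 0; i < totalDays; i++)
        {
            DateTime day = projectStart.Date.AddDays(i);
            // Add every not-yet-counted entry dated on or before this day.
            while (next < sorted.Count && sorted[next].Date.Date <= day)
            {
                running += sorted[next].Estimated;
                next++;
            }
            result.Add((day, running));
        }
        return result;
    }
}
```

Each day gets the total of every entry dated on or before it, which matches the question's x.Date.Date <= pDate.Date filter. The cost is one sort plus one pass over days and entries together.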

C# Join multiple collections into one

I have a problem joining multiple collections into one.
I need to combine the collections of data from many sensors into one, so that the output file has, for each time, the values from all sensors. For example, if one sensor has no data for a time, the file should contain 0 for it.
Please help me, I am desperate.
public class MeasuredData
{
public DateTime Time { get; }
public double Value { get; }
public MeasuredData(DateTime time, double value)
{
Time = time;
Value = value;
}
}
If you have multiple variables containing List<MeasuredData>, one for each sensor, you can group them in an array and then query them.
First, you need an extension method to round the DateTimes, as @jdweng suggested, if you aren't already canonicalizing them as you acquire them.
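public static class DateTimeExtensions // hypothetical name: extension methods must live in a static class
{
// Rounds dt to the nearest multiple of rnd, keeping its DateTimeKind.
public static DateTime Round(this DateTime dt, TimeSpan rnd) {
if (rnd == TimeSpan.Zero)
return dt;
else {
var ansTicks = dt.Ticks + Math.Sign(dt.Ticks) * rnd.Ticks / 2;
return new DateTime(ansTicks - ansTicks % rnd.Ticks, dt.Kind);
}
}
}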
Now you can create an array of the sensor reading Lists:
var sensorData = new[] { sensor0, sensor1, sensor2, sensor3 };
Then you can extract all the rounded times to create the left hand side of the table:
var roundTo = TimeSpan.FromSeconds(1);
var times = sensorData.SelectMany(sdl => sdl.Select(md => md.Time.Round(roundTo)))
.Distinct()
.Select(t => new { Time = t, Measurements = Enumerable.Empty<MeasuredData>() });
Then you can join each sensor to the table:
foreach (var oneSensorData in sensorData)
times = times.GroupJoin(oneSensorData, t => t.Time, md => md.Time.Round(roundTo),
(t, mdj) => new { t.Time, Measurements = t.Measurements.Concat(mdj) });
Finally, you can convert each row to the time and a List of measurements ordered by time:
var ans = times.Select(tm => new { tm.Time, Measurements = tm.Measurements.ToList() })
.OrderBy(tm => tm.Time);
If you wanted to flatten the List of measurements out to fields in the answer, you would need to do that manually with another Select.
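The question also asks for a 0 when a sensor has no reading at a given time. The sketch below does that flattening. It assumes the times are already canonicalized, for example with the Round extension above, and it uses the hypothetical names SensorTable and Build. The MeasuredData class is repeated from the question so the block compiles on its own. Each row holds one value per sensor, in the same order as the input lists:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class MeasuredData
{
    public DateTime Time { get; }
    public double Value { get; }
    public MeasuredData(DateTime time, double value) { Time = time; Value = value; }
}

public static class SensorTable
{
    // One row per distinct time, one column per sensor; a missing reading becomes 0.
    // If a sensor has several readings at the same time, the first one is used (an arbitrary choice).
    public static List<(DateTime Time, double[] Values)> Build(IReadOnlyList<List<MeasuredData>> sensors)
    {
        // Per-sensor lookup: time -> value.
        var lookups = sensors
            .Select(s => s.GroupBy(m => m.Time).ToDictionary(g => g.Key, g => g.First().Value))
            .ToList();

        return sensors
            .SelectMany(s => s.Select(m => m.Time))
            .Distinct()
            .OrderBy(t => t)
            .Select(t => (t, lookups.Select(l => l.TryGetValue(t, out var v) ? v : 0.0).ToArray()))
            .ToList();
    }
}
```

Writing each row to the output file is then a matter of joining row.Values with your separator of choice.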
Assuming you have something to join on, you can use Enumerable.Join:
var result = collection1.Join(collection2,
/* whatever your join is */ x => x.id,
y => y.id,
(a, b) => new { x = a, y = b });
foreach (var obj in result)
{
Console.WriteLine($"{obj.x.id}, {obj.y.id}");
}
This prints the ids of the two objects, but you could access any of their members. The link is probably more helpful, but you didn't give us much info.

How do I get total Qty using one linq query?

I have two linq queries, one to get confirmedQty and another one is to get unconfirmedQty.
There is a condition for getting unconfirmedQty. It should be average instead of sum.
result = Sum(confirmedQty) + Avg(unconfirmedQty)
Is there any way to just write one query and get the desired result instead of writing two separate queries?
My code:
class Program
{
static void Main(string[] args)
{
List<Item> items = new List<Item>(new Item[]
{
new Item{ Qty = 100, IsConfirmed=true },
new Item{ Qty = 40, IsConfirmed=false },
new Item{ Qty = 40, IsConfirmed=false },
new Item{ Qty = 40, IsConfirmed=false },
});
int confirmedQty = Convert.ToInt32(items.Where(o => o.IsConfirmed == true).Sum(u => u.Qty));
int unconfirmedQty = Convert.ToInt32(items.Where(o => o.IsConfirmed != true).Average(u => u.Qty));
//Output => Total : 140
Console.WriteLine("Total : " + (confirmedQty + unconfirmedQty));
Console.Read();
}
public class Item
{
public int Qty { get; set; }
public bool IsConfirmed { get; set; }
}
}
Actually, the accepted answer enumerates your items collection 2N + 1 times and adds unnecessary complexity to your original solution. If I came across this piece of code
(from t in items
let confirmedQty = items.Where(o => o.IsConfirmed == true).Sum(u => u.Qty)
let unconfirmedQty = items.Where(o => o.IsConfirmed != true).Average(u => u.Qty)
let total = confirmedQty + unconfirmedQty
select new { tl = total }).FirstOrDefault();
it would take some time to understand what type of data you are projecting items to. Yes, this query is a strange projection. It creates a SelectIterator to project each item of the sequence, then it creates some range variables, which involves iterating items twice, and finally it selects the first projected item. Basically, you have wrapped your original queries in an additional, useless query:
items.Select(i => {
var confirmedQty = items.Where(o => o.IsConfirmed).Sum(u => u.Qty);
var unconfirmedQty = items.Where(o => !o.IsConfirmed).Average(u => u.Qty);
var total = confirmedQty + unconfirmedQty;
return new { tl = total };
}).FirstOrDefault();
The intent is hidden deep in the code, and you still have the same two nested queries. What can you do here? You can simplify your two queries, make them more readable and show your intent clearly:
int confirmedTotal = items.Where(i => i.IsConfirmed).Sum(i => i.Qty);
// NOTE: Average throws an InvalidOperationException if there are no unconfirmed items!
double unconfirmedAverage = items.Where(i => !i.IsConfirmed).Average(i => i.Qty);
int total = confirmedTotal + (int)unconfirmedAverage;
If performance is more important than readability, you can calculate the total in a single pass (moved to an extension method for readability):
public static int Total(this IEnumerable<Item> items)
{
int confirmedTotal = 0;
int unconfirmedTotal = 0;
int unconfirmedCount = 0;
foreach (var item in items)
{
if (item.IsConfirmed)
{
confirmedTotal += item.Qty;
}
else
{
unconfirmedCount++;
unconfirmedTotal += item.Qty;
}
}
if (unconfirmedCount == 0)
return confirmedTotal;
// NOTE: Will not throw if there are no unconfirmed items
return confirmedTotal + unconfirmedTotal / unconfirmedCount;
}
Usage is simple:
items.Total();
BTW, the second solution from the accepted answer is not correct. It returns the correct value only by coincidence, because all your unconfirmed items have the same Qty. That solution calculates a sum instead of an average. A solution with grouping would look like:
var total =
items.GroupBy(i => i.IsConfirmed)
.Select(g => g.Key ? g.Sum(i => i.Qty) : (int)g.Average(i => i.Qty))
.Sum();
Here the items are split into two groups, confirmed and unconfirmed. Then you calculate either the sum or the average based on the group key, and add up the two group values. This is neither a readable nor an efficient solution, but it is correct.
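Because the question asks for a single query, Aggregate can also do it in one enumeration. The sketch below assumes the Item class from the question, and ItemTotals and TotalWithAggregate are hypothetical names. Like the Total() extension above, it returns only the confirmed sum when there are no unconfirmed items. It also truncates the average with integer division, whereas Convert.ToInt32 in the question rounds instead:

```csharp
using System.Collections.Generic;
using System.Linq;

public class Item
{
    public int Qty { get; set; }
    public bool IsConfirmed { get; set; }
}

public static class ItemTotals
{
    // Seed tuple: (sum of confirmed Qty, sum of unconfirmed Qty, count of unconfirmed items).
    // The result selector turns the accumulator into confirmed sum + truncated unconfirmed average.
    public static int TotalWithAggregate(IEnumerable<Item> items)
    {
        return items.Aggregate(
            (Confirmed: 0, Unconfirmed: 0, Count: 0),
            (acc, i) => i.IsConfirmed
                ? (acc.Confirmed + i.Qty, acc.Unconfirmed, acc.Count)
                : (acc.Confirmed, acc.Unconfirmed + i.Qty, acc.Count + 1),
            acc => acc.Confirmed + (acc.Count == 0 ? 0 : acc.Unconfirmed / acc.Count));
    }
}
```

This version is a single expression, but it is no easier to read than the foreach in Total(). It mainly shows that one pass is possible with LINQ alone.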

Complicated Linq Query

I have a table in a database with 2 fields: index (int), email (varchar(100))
I need to do the following:
Group all emails by domains names (all emails already lowercase).
Select the emails from all groups, taking for each domain no more than 20% of the total number of emails counted before step 1.
Code example:
DataContext db = new DataContext();
//Domains to group by
List<string> domains = new List<string>() { "gmail.com", "yahoo.com", "hotmail.com" };
Dictionary<string, List<string>> emailGroups = new Dictionary<string, List<string>>();
//Init dictionary
foreach (string thisDomain in domains)
{
emailGroups.Add(thisDomain, new List<string>());
}
//Get distinct emails
var emails = db.Clients.Select(x => x.Email).Distinct();
//Total emails
int totalEmails = emails.Count();
//One percent of total emails
int onePercent = totalEmails / 100;
//Run on each email
foreach (var thisEmail in emails)
{
//Run on each domain
foreach (string thisDomain in emailGroups.Keys)
{
//If email from this domain
if (thisEmail.Contains(thisDomain))
{
//Add to dictionary
emailGroups[thisDomain].Add(thisEmail);
}
}
}
//Will store the final result
List<string> finalEmails = new List<string>();
//Run on each domain
foreach (string thisDomain in emailGroups.Keys)
{
//Get percent of emails in group
int thisDomainPercents = emailGroups[thisDomain].Count / onePercent;
//More than 20%
if (thisDomainPercents > 20)
{
//Take only 20% and join to the final result
finalEmails = finalEmails.Union(emailGroups[thisDomain].Take(20 * onePercent)).ToList();
}
else
{
//Join all to the final result
finalEmails = finalEmails.Union(emailGroups[thisDomain]).ToList();
}
}
Does anyone know a better way to make it?
I can't think of a way of doing this without hitting the DB at least twice (once for the grouping and once for the overall count), but you could try something like:
var query = from u in db.Users
group u by u.Email.Split('@')[1] into g
select new
{
Domain = g.Key,
Users = g.ToList()
};
query = query.Where(x => x.Users.Count <= (db.Users.Count() * 0.2));
Suppose you want to get the last items in the ascending order in each group:
int m = (int) (input.Count() * 0.2);
var result = input.GroupBy(x=>x.email.Split('@')[1],
(key,g)=>g.OrderByDescending(x=>x.index).Take(m)
.OrderBy(x=>x.index))
.SelectMany(g=>g);//If you want to get the last result without grouping
Or this:
var result = input.GroupBy(x=>x.email.Split('@')[1],
(key,g)=>g.OrderBy(x=>x.index)
.Skip(g.Count()-m))
.SelectMany(g=>g);//If you want to get the last result without grouping
var maxCount = (int)(db.Users.Count() * 0.2); // Take needs an int
var query = (from u in db.Users
group u by u.Email.Split('@')[1] into g
select new
{
Domain = g.Key,
Users = g.Take(maxCount).ToList()
})
.SelectMany(x => x.Users);
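For emails that are already in memory, the whole requirement fits in one query. The sketch below uses the hypothetical names DomainCap and Cap. It assumes lowercase emails and takes the domain as the text after the last '@'. Inside a domain that exceeds the quota, the emails kept are simply the first ones in input order:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DomainCap
{
    // Keeps at most `percent` percent of the total distinct emails for each domain.
    public static List<string> Cap(IEnumerable<string> emails, int percent = 20)
    {
        var distinct = emails.Distinct().ToList();
        int cap = distinct.Count * percent / 100;   // integer quota per domain
        return distinct
            .GroupBy(e => e.Substring(e.LastIndexOf('@') + 1))
            .SelectMany(g => g.Take(cap))
            .ToList();
    }
}
```

The quota is computed as total * 20 / 100. It can therefore differ from the question's 20 * (total / 100), which rounds the one-percent unit down first.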

Tricky LINQ query for text file format transformation

I have a shopping list in a text file, like this:
BuyerId Item;
1; Item1;
1; Item2;
1; ItemN;
2; Item1;
2; ItemN;
3; ItemN;
I need to transform this list to a format like this:
Item1; Item2; Item3; ...; ItemN <--- For buyer 1
Item1; ...; ItemN <--- For buyer 2
Item1; ...; ItemN <--- For buyer 3
First I parse the CSV file like this:
IList<string[]> parsedcsv = (from line in lines.Skip(1)
let parsedLine = line.TrimEnd(';').Split(';')
select parsedLine).ToList();
Then I group the items with LINQ and aggregate them to the final format:
IEnumerable<string> buyers = from entry in parsedcsv
group entry by entry[0] into cart
select cart.SelectMany(c => c.Skip(1))
.Aggregate((item1, item2) =>
item1 + ";" + item2).Trim();
HOWEVER, as it happens, the BuyerId is not unique, but repeats after a number of lines (for example, it can repeat like this: 1,2,3,4,5,1,2,3,4,5,1,2,3 or like this 1,2,3,1,2,3,1,2).
No big deal, I could quite easily fix this by grouping the items in a loop that checks that I only deal with one buyer at a time:
int lastBatchId = 0;
string currentId = parsedcsv[0][0];
for (int i = 0; i < parsedcsv.Count; i++)
{
bool last = parsedcsv.Count - 1 == i;
if (parsedcsv[i][0] != currentId || last)
{
IEnumerable<string> buyers = from entry in parsedcsv.Skip(lastBatchId)
.Take(i - lastBatchId + (last ? 1 : 0))
...
lastBatchId = i;
currentId = parsedcsv[i][0];
...
... however, this is not the most elegant solution. I'm almost certain this can be done only with LINQ.
Can anyone help me out here, please?
Thanks!
You should have a look at GroupAdjacent.
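GroupAdjacent groups consecutive elements that share a key, which is what repeating BuyerIds need. Below is a hand-written sketch of that idea. GroupRuns and ShoppingList are hypothetical names, not a library API. The sketch assumes lines in the "BuyerId; Item;" format shown in the question, with a header line first:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AdjacentExtensions
{
    // Consecutive elements with equal keys form one group; a key that appears
    // again later starts a new group instead of being merged with the earlier one.
    public static IEnumerable<(TKey Key, List<T> Items)> GroupRuns<T, TKey>(
        this IEnumerable<T> source, Func<T, TKey> keySelector)
    {
        var comparer = EqualityComparer<TKey>.Default;
        List<T> current = null;
        TKey currentKey = default;
        foreach (var item in source)
        {
            var key = keySelector(item);
            if (current != null && comparer.Equals(key, currentKey))
            {
                current.Add(item);
                continue;
            }
            if (current != null) yield return (currentKey, current);
            current = new List<T> { item };
            currentKey = key;
        }
        if (current != null) yield return (currentKey, current);
    }
}

public static class ShoppingList
{
    // Skips the header, parses "BuyerId; Item;" and emits one "Item1; Item2; ..." line per run of one buyer.
    public static List<string> Transform(IEnumerable<string> lines)
    {
        return lines.Skip(1)
            .Select(l => l.TrimEnd(';').Split(';'))
            .GroupRuns(parts => parts[0].Trim())
            .Select(run => string.Join("; ", run.Items.Select(p => p[1].Trim())))
            .ToList();
    }
}
```

Because a new group starts whenever the BuyerId changes, the sequence 1,1,2,1 produces three output lines rather than two.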
I'm not sure this is the best solution, but you said you want a pure Linq answer, so here you have it:
var result = from r in (
from l in lines.Skip(1)
let data = l.Split(new string[]{";"," "},
StringSplitOptions.RemoveEmptyEntries)
select new { Id = data.First(), Item = data.Skip(1).First() })
.Aggregate(new
{
Rows = Enumerable.Repeat(new
{
Id = string.Empty,
Items = new List<string>()
}, 1).ToList(),
LastID = new List<string>() { "" }
},
(acc, x) =>
{
if (acc.Rows[0].Id == string.Empty)
acc.Rows.Clear();
if (acc.LastID[0] != x.Id)
acc.Rows.Add(new
{
Id = x.Id,
Items = new List<string>()
});
acc.Rows.Last().Items.Add(x.Item);
acc.LastID[0] = x.Id;
return acc;
}
).Rows
select new
{
r.Id,
Items = string.Join(";", from x in r.Items
select x)
};
I wrote it pretty fast and it could be improved. I don't particularly like it because it resorts to a couple of tricks, but it's pure LINQ and could be a starting point.
