LINQ and multiple aggregates

I have code similar to this:
decimal total1 = 0;
decimal poTotal = 0;
foreach( var record in listOfRecords)
{
total1 += record.Price;
if ( record.HasPo){
poTotal += record.PoTotal;
}
}
This works fine, but I'd like to know how to perform multiple aggregates using LINQ without excessive coding for groups etc. Is there a simple way that doesn't require scanning the list of objects each time?
I know I could do this:
var poTotal = listOfRecords.Where(r => r.HasPo).Sum(r => r.PoTotal);
But that requires scanning the entire list, and if I'm to aggregate multiple values I only want to loop/scan one time.

You can use the Aggregate method, but I don't think it will be any clearer than the simple foreach loop you already have:
var totals = listOfRecords.Aggregate(
new { Total = 0m, PoTotal = 0m },
(a, r) => new {
Total = a.Total + r.Price,
PoTotal = a.PoTotal + (r.HasPo ? r.PoTotal : 0m)
});
Console.WriteLine(totals.Total);
Console.WriteLine(totals.PoTotal);
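As a variation on the Aggregate call above, here is a minimal sketch that uses a value tuple as the accumulator instead of an anonymous type, so no heap object is created per item. The sample records are made up for illustration (the tuple fields only mirror the question's Price, HasPo and PoTotal), and it assumes C# 9 or later for top-level statements.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data; the fields mirror the question's Price, HasPo and PoTotal.
var listOfRecords = new List<(decimal Price, bool HasPo, decimal PoTotal)>
{
    (10m, true, 4m),
    (20m, false, 7m),
    (5m, true, 1m),
};

// One pass over the list: the accumulator carries both running totals.
var totals = listOfRecords.Aggregate(
    (Total: 0m, PoTotal: 0m),
    (acc, r) => (Total: acc.Total + r.Price,
                 PoTotal: acc.PoTotal + (r.HasPo ? r.PoTotal : 0m)));

Console.WriteLine(totals.Total);   // 35
Console.WriteLine(totals.PoTotal); // 5
```

Whether this reads better than the foreach loop is a matter of taste; it mainly shows that a single scan is possible without mutable outer variables.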

You could also do something like this:
decimal poTotal = 0;
decimal total = listOfRecords.Sum(record => {
if (record.HasPo) {poTotal += record.PoTotal;}
return record.Price;
});
But I'm not saying you should. As MarcinJurasek says, the simple foreach is clearest in this case.

Related

C# Splitting a List<string> Value

I have a List with values {"1 120 12", "1 130 22", "2 110 21", "2 100 18"}, etc.
List<string> myList = new List<string>();
myList.Add("1 120 12");
myList.Add("1 130 22");
myList.Add("2 110 21");
myList.Add("2 100 18");
I need to group by the first number (the ID) and sum the subsequent values for each ID,
i.e. for ID = 1 -> 120+130=250 and 12+22=34, and so on. I have to return an array with these values.
I know I can get these individual values, add them to an array and split it by the empty space between them with something like:
string[] arr2 = arr[i].Split(' ');
and loop thru them to do the sum of each value, but... is there an easy way to do it straight using Lists or Linq Lambda expression?

You can do it in LINQ like this:
var result = myList.Select(x => x.Split(' ').Select(int.Parse))
.GroupBy(x => x.First())
.Select(x => x.Select(y => y.Skip(1).ToArray())
.Aggregate(new [] {0,0}, (y,z) => new int[] {y[0] + z[0], y[1] + z[1]}));
First, the strings are split and converted to int, then they are grouped by ID, then the ID is dropped, and in the end, they are summed together.
But I strongly recommend not doing it in LINQ, because this expression is not easy to understand. If you do it the classic way with a loop, it is quite clear what is going on at first sight. But put this code containing the loop into a separate method, because that way it won't distract you and you still only call a one-liner as in the LINQ solution.
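The "classic loop in a separate method" suggestion could look roughly like the sketch below. SumById is a hypothetical helper name, and the code assumes every string holds exactly three space-separated integers, with no validation. It uses a local function so it can run as C# 9 top-level statements.

```csharp
using System;
using System.Collections.Generic;

// SumById is a hypothetical helper; it assumes three space-separated integers per line.
static Dictionary<int, int[]> SumById(IEnumerable<string> lines)
{
    var sums = new Dictionary<int, int[]>();
    foreach (var line in lines)
    {
        var parts = line.Split(' ');
        int id = int.Parse(parts[0]);

        // Create the running totals for an ID the first time it is seen.
        if (!sums.TryGetValue(id, out var sum))
        {
            sum = new int[2];
            sums[id] = sum;
        }
        sum[0] += int.Parse(parts[1]);
        sum[1] += int.Parse(parts[2]);
    }
    return sums;
}

var myList = new List<string> { "1 120 12", "1 130 22", "2 110 21", "2 100 18" };
var result = SumById(myList); // the call site stays a one-liner

foreach (var pair in result)
{
    Console.WriteLine($"ID {pair.Key}: {pair.Value[0]} {pair.Value[1]}");
}
```

For the sample data this yields 250 and 34 for ID 1, and 210 and 39 for ID 2.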

To do it straight, no LINQ, perhaps:
var d = new Dictionary<string, (int A, int B)>();
foreach(var s in myList){
var bits = s.Split();
if(!d.ContainsKey(bits[0]))
d[bits[0]] = (int.Parse(bits[1]), int.Parse(bits[2]));
else {
(int A, int B) x = d[bits[0]];
d[bits[0]] = (x.A + int.Parse(bits[1]), x.B + int.Parse(bits[2]));
}
}
Using LINQ to parse the int, and switching to using TryGetValue, will tidy it up a bit:
var d = new Dictionary<int, (int A, int B)>();
foreach(var s in myList){
var bits = s.Split().Select(int.Parse).ToArray();
if(d.TryGetValue(bits[0], out (int A, int B) x))
d[bits[0]] = (x.A + bits[1], x.B + bits[2]);
else
d[bits[0]] = (bits[1], bits[2]);
}
Introducing a local function to safely get either the existing nums in the dictionary or a (0,0) pair might reduce it a bit too:
var d = new Dictionary<int, (int A, int B)>();
(int A, int B) safeGet(int i) => d.ContainsKey(i) ? d[i]: (0,0);
foreach(var s in myList){
var bits = s.Split().Select(int.Parse).ToArray();
var nums = safeGet(bits[0]);
d[bits[0]] = (bits[1] + nums.A, bits[2] + nums.B);
}
Is it any more readable than a linq version? Hmm... Depends on your experience with Linq, and tuples, I suppose..

I know this question already has a lot of answers, but I have not seen one yet that focuses on readability.
If you split your code into a parsing phase and a calculation phase, we can use LINQ without sacrificing readability or maintainability, because each phase only does one thing:
List<string> myList = new List<string>();
myList.Add("1 120 12");
myList.Add("1 130 22");
myList.Add("2 110 21");
myList.Add("2 100 18");
var parsed = (from item in myList
let split = item.Split(' ')
select new
{
ID = int.Parse(split[0]),
Foo = int.Parse(split[1]),
Bar = int.Parse(split[2])
});
var summed = (from item in parsed
group item by item.ID into groupedByID
select new
{
ID = groupedByID.Key,
SumOfFoo = groupedByID.Sum(g => g.Foo),
SumOfBar = groupedByID.Sum(g => g.Bar)
}).ToList();
foreach (var s in summed)
{
Console.WriteLine($"ID: {s.ID}, SumOfFoo: {s.SumOfFoo}, SumOfBar: {s.SumOfBar}");
}

You can, but I think it will be much easier to edit and optimize using the usual loop. In my experience, this kind of logic does not stay this simple inside LINQ for long: usually we need to add more values, more parsing, etc., which makes it not really suitable for everyday use.
var query = myList.Select(a => a.Split(' ').Select(int.Parse).ToArray())
.GroupBy(
index => index[0],
amount => new
{
First = amount[1],
Second = amount[2]
},
(index, amount) => new
{
Index = index,
SumFirst = amount.Sum(a => a.First),
SumSecond = amount.Sum(a => a.Second)
}
);

is there an easy way to do it straight using Lists or Linq Lambda expression?
Maybe. Is it wise to do this? Probably not. Your code will be hard to understand, impossible to unit test, probably not reusable, and small changes will be difficult.
But let's first answer your question as one LINQ statement:
const char separatorChar = ' ';
IEnumerable<string> inputText = ...
var result = inputText
.Select(line => line.Split(separatorChar).Select(text => Int32.Parse(text)))
.Select(numbers => new
{
Id = numbers.First(),
Sum = numbers.Skip(1).Sum(),
});
Not reusable, hard to unit test, difficult to change, not efficient, do you need more arguments?
It would be better to have a procedure that converts one input string into a proper object that contains what your input string really represents.
Alas, you didn't tell us whether every input string contains three integer numbers, or whether some might contain invalid text, and some might contain more or fewer than three integer numbers.
You forgot to tell us what your input string represents.
So I'll just make up an identifier:
class ProductSize
{
public int ProductId {get; set;} // The first number in the string
public int Width {get; set;} // The 2nd number
public int Height {get; set;} // The 3rd number
}
You need a static method in ProductSize that takes a string as input and returns one ProductSize:
public static ProductSize FromText(string productSizeText)
{
// Todo: check input
const char separatorChar = ' ';
var splitNumbers = productSizeText.Split(separatorChar)
.Select(splitText => Int32.Parse(splitText))
.ToList();
return new ProductSize
{
ProductId = splitNumbers[0],
Width = splitNumbers[1],
Height = splitNumbers[2],
};
}
I need to count based on the first number (ID) is and sum the consequent values for this IDs
After creating the method FromText this is easy:
IEnumerable<string> textProductSizes = ...
var result = textProductSizes.Select(text => ProductSize.FromText(text))
.Select(productSize => new
{
Id = productSize.ProductId,
Sum = productSize.Width + productSize.Height,
});
If your strings do not always have three numbers
If you don't always have three numbers, then you won't have Width and Height, but a property:
public IEnumerable<int> Numbers {get; set;} // TODO: invent proper name
And in FromText:
var splitText = productSizeText.Split(separatorChar);
return new ProductSize
{
ProductId = Int32.Parse(splitText[0]),
Numbers = splitText.Skip(1)
.Select(text => Int32.Parse(text)),
};
I deliberately keep it an IEnumerable, so if you don't use all Numbers, you won't have parsed numbers for nothing.
The LINQ:
var result = textProductSizes.Select(text => ProductSize.FromText(text))
.Select(productSize => new
{
Id = productSize.ProductId,
Sum = productSize.Numbers.Sum(),
});

Query values from a C# Generic Dictionary using LINQ

Here's the code I have:
Dictionary<double, long> dictionary = new Dictionary<double, long>();
dictionary.Add(99, 500);
dictionary.Add(98, 500);
dictionary.Add(101, 8000);
dictionary.Add(103, 6000);
dictionary.Add(104, 5);
dictionary.Add(105, 2000);
double price = 100;
The query I want is:
the key that is nearest to price AND has the lowest value.
So in the above example it should return 99.
How do I code this in LINQ?
I have seen a lot of LINQ examples, but I can't adapt any of them to my needs because my query has two conditions.
Thanks for any help.
Edit:
Based on comments from @nintendojunkie and @DmitryMartovoi I have had to rethink my approach.
If I prioritize the key closest to price, then the resulting value may not be the lowest, and if I prioritize the value first, then the key may be too far from price. So the query will have to give the key and the value equal priority and return the lowest value with the key closest to price. Both key and value are equally important.
Can anyone help with this?
Thanks

Don't forget that you are using a dictionary, and a dictionary has only unique keys. I think you are treating this structure as a List<KeyValuePair<double, long>>. If so, please look at this example:
var minimumKeyDifference = dictionary.Min(y => Math.Abs(y.Key - price));
var minimumItems = dictionary.Where(x => Math.Abs(x.Key - price).Equals(minimumKeyDifference));
var desiredKey = minimumItems.OrderBy(x => x.Value).First().Key;

You say that you need to find the closest price and the lowest value, but you don't define the rules for attributing precedence between two. In the below, I'm attributing them equal precedence: a price distance of 1 is equivalent to a value of 1.
var closest =
dictionary.OrderBy(kvp => Math.Abs(kvp.Key - price) + kvp.Value).First();
The OrderBy(…).First() should be replaced by a MinBy(…) operator, if available, for performance.
Edit: If the value is only meant to serve as a tiebreaker, then use this (also posted by Giorgi Nakeuri):
var closest =
dictionary.OrderBy(kvp => Math.Abs(kvp.Key - price))
.ThenBy(kvp => kvp.Value)
.First();
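The MinBy suggestion can be sketched as follows for the tiebreaker variant. This assumes .NET 6 or later, where Enumerable.MinBy is built in; the tuple key is my own way of expressing "distance first, then value" and is not part of the answer above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var dictionary = new Dictionary<double, long>
{
    [99] = 500, [98] = 500, [101] = 8000, [103] = 6000, [104] = 5, [105] = 2000,
};
double price = 100;

// MinBy scans the sequence once instead of sorting all of it.
// The tuple key compares by distance first and uses the value only to break ties.
var closest = dictionary.MinBy(kvp => (Math.Abs(kvp.Key - price), kvp.Value));

Console.WriteLine(closest.Key); // 99
```

Keys 99 and 101 are both at distance 1 from the price, and 99 wins because its value (500) is lower than 8000.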

You can do it this way:
var result = dictionary.Select(c => new { c.Key, Diff = Math.Abs(price - c.Key) + Math.Abs(price - c.Value), c.Value }).OrderBy(c => c.Diff).FirstOrDefault();

The following works if you change your dictionary key's data type to decimal instead of double.
decimal price = 100;
decimal smallestDiff = dictionary.Keys.Min(n => Math.Abs(n - price));
var nearest = dictionary.Where(n => Math.Abs(n.Key - price) == smallestDiff)
.OrderBy(n => n.Value).First();
If you use double this may fail due to rounding issues, but decimal is preferred for anything having to do with money to avoid those issues.
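As a small illustration of the rounding issue mentioned above (a generic example, not taken from the question's data), exact equality on computed double values can be surprising, while the same arithmetic in decimal behaves as expected:

```csharp
using System;

double d = 0.1 + 0.2;
decimal m = 0.1m + 0.2m;

// 0.1 and 0.2 have no exact binary representation, so the double sum is slightly off.
Console.WriteLine(d == 0.3);  // False
Console.WriteLine(m == 0.3m); // True
```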

var price = 100.0;
var nearestKey = (from pair in dictionary
let diff = Math.Abs(pair.Key - price)
orderby diff
select pair.Key).First();
var minValue = dictionary[nearestKey];

Maybe you want a magic LINQ query, but I suggest trying the extension method below.
public static class MyExtensions
{
public static double? GetNearestValue (this IDictionary<double, long> dictionary, double value)
{
if (dictionary == null || dictionary.Count == 0)
return null;
double? nearestDiffValue = null;
double? nearestValue = null;
foreach (var item in dictionary) {
double currentDiff = Math.Abs (item.Key - value);
if (nearestDiffValue == null || currentDiff < nearestDiffValue.Value) {
nearestDiffValue = currentDiff;
nearestValue = item.Value;
}
}
return nearestValue;
}
}
And call like this
Console.WriteLine (dictionary.GetNearestValue (100d));

var min = dictionary
.OrderBy(pair => pair.Value)
.Select(pair =>
new
{
k = pair.Key,
d = Math.Abs(pair.Key - price)
})
.OrderBy(t => t.d)
.Select(t => t.k)
.FirstOrDefault();

How to select multiple properties properly?

MyObject has two properties named p1 and p2, both of type int. Now, for each MyObject, I want to take p1 and p2 and add them up. I tried this:
int p1Sum = 0, p2Sum = 0;
foreach (int[] ps in new MyEntity().MyObject.Select(o => new { o.p1, o.p2 }))
{
p1Sum += ps[0];
p2Sum += ps[1];
}
but says:
cannot convert AnonymousType#1 to int[]
on foreach.
How can I fix this?

foreach (var ps in new MyEntity().MyObject.Select(o => new { o.p1, o.p2 }))
{
p1Sum += ps.p1;
p2Sum += ps.p2;
}

jyparask's answer will definitely work, but it's worth considering using Sum twice instead - it will involve two database calls, but it may (check!) avoid fetching all the individual values locally:
var entities = new MyEntity().MyObject;
var p1Sum = entities.Sum(x => x.p1);
var p2Sum = entities.Sum(x => x.p2);
Now there's at least logically the possibility of inconsistency here: some entities may be removed or added between the two Sum calls. However, it's possible that EF will ensure that doesn't happen (e.g. via caching) or it may not be relevant in your situation. It's definitely something you should consider.

In addition to Jon Skeet's and jyparask's answers, you can also try:
var result = new MyEntity().MyObject
.GroupBy(_ => 0)
.Select(r => new
{
p1Sum = r.Sum(x => x.p1),
p2Sum = r.Sum(x => x.p2)
})
.FirstOrDefault();
The above would result in a single query fetching only the sums of both columns. You may look at the generated query and its execution plan if you are concerned about performance.
if(result != null)
{
Console.WriteLine("p1Sum = " + result.p1Sum);
Console.WriteLine("p2Sum = " + result.p2Sum);
}

Sliding time window for record analysis

I have a data structure of phone calls. For this question there are two fields, CallTime and NumberDialled.
The analysis I want to perform is "Are there more than two calls to the same number in a 10 second window" The collection is sorted by CallTime already and is a List<Cdr>.
My solution is
List<Cdr> records = GetRecordsSortedByCallTime();
for (int i = 0; i < records.Count; i++)
{
var baseRecord = records[i];
for (int j = i + 1; j < records.Count; j++)
{
var comparisonRec = records[j];
if (comparisonRec.CallTime.Subtract(baseRecord.CallTime).TotalSeconds < 10)
{
if (comparisonRec.NumberDialled == baseRecord.NumberDialled)
ReportProblem(baseRecord, comparisonRec);
}
else
{
// We're 10 seconds or more away from the base record. Break out of the inner loop
break;
}
}
}
Which is ugly, to say the least. Is there a better, cleaner and faster way of doing this?
Although I haven't tested this on a large data set, I will be running it on about 100,000 records per hour, so there will be a large number of comparisons for each record.
Update: The data is sorted by time, not by number as in an earlier version of the question.

If the phone calls are already sorted by call time, you can do the following:
1. Initialize a hash table that holds a counter for every phone number (the hash table can start empty, and you add elements to it as you go).
2. Keep two pointers into your list; let's call them 'left' and 'right'.
3. Whenever the time difference between the 'left' and 'right' calls is less than 10 seconds, move 'right' forward by one and increment the count of the newly encountered phone number by one.
4. Whenever the difference is above 10 seconds, move 'left' forward by one and decrement the count for the phone number that the 'left' pointer just left.
5. At any point, if there is a phone number whose counter in the hash table is 3 or more, you have found a phone number that has more than 2 calls within a 10-second window.
This is a linear-time algorithm and processes all the numbers in parallel.
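The steps above can be sketched roughly as follows. The sample calls and the tuple layout are invented for illustration (the question's Cdr class is not shown), and the window is treated as "less than 10 seconds apart". The code assumes C# 9 or later for top-level statements.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sample data, already sorted by call time.
var start = new DateTime(2011, 1, 1, 10, 10, 0);
var calls = new List<(DateTime CallTime, string NumberDialled)>
{
    (start, "555-0100"),
    (start.AddSeconds(3), "555-0199"),
    (start.AddSeconds(4), "555-0100"),
    (start.AddSeconds(9), "555-0100"),
    (start.AddSeconds(30), "555-0199"),
};

var window = TimeSpan.FromSeconds(10);
var counts = new Dictionary<string, int>(); // calls per number inside the current window
var flagged = new HashSet<string>();        // numbers with more than two calls in a window
int left = 0;

for (int right = 0; right < calls.Count; right++)
{
    // Advance 'left' past calls that are too old for the window ending at 'right'.
    while (calls[right].CallTime - calls[left].CallTime >= window)
    {
        counts[calls[left].NumberDialled]--;
        left++;
    }

    // Count the newly included call and check the threshold.
    string number = calls[right].NumberDialled;
    counts.TryGetValue(number, out int seen);
    counts[number] = seen + 1;
    if (counts[number] > 2)
    {
        flagged.Add(number);
    }
}

foreach (var number in flagged)
{
    Console.WriteLine(number); // 555-0100
}
```

Each index only ever moves forward, so every call is added once and removed at most once, which is where the linear running time comes from.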

I didn't know your exact structures, so I created my own for this demonstration:
class CallRecord
{
public long NumberDialled { get; set; }
public DateTime Stamp { get; set; }
}
class Program
{
static void Main(string[] args)
{
var calls = new List<CallRecord>()
{
new CallRecord { NumberDialled=123, Stamp=new DateTime(2011,01,01,10,10,0) },
new CallRecord { NumberDialled=123, Stamp=new DateTime(2011,01,01,10,10,9) },
new CallRecord { NumberDialled=123, Stamp=new DateTime(2011,01,01,10,10,18) },
};
var dupCalls = calls.Where(x => calls.Any(y => y.NumberDialled == x.NumberDialled && (x.Stamp - y.Stamp).TotalSeconds > 0 && (x.Stamp - y.Stamp).TotalSeconds <= 10)).Select(x => x.NumberDialled).Distinct();
foreach (var dupCall in dupCalls)
{
Console.WriteLine(dupCall);
}
Console.ReadKey();
}
}
The LINQ expression loops through all records and finds records which are ahead of the current record (.TotalSeconds > 0), and within the time limit (.TotalSeconds <= 10). TotalSeconds is used rather than Seconds because Seconds is only the seconds component of the difference, so a gap of one minute and five seconds would report 5. This might be a bit of a performance hog due to the Any method constantly going over your whole list, but at least the code is cleaner :)

I recommend using the Reactive Extensions (Rx) and the Interval method.
The Reactive Extensions (Rx) is a library for composing asynchronous and event-based programs using observable sequences and LINQ-style query operators. Using Rx, developers represent asynchronous data streams with Observables, query asynchronous data streams using LINQ operators, and parameterize the concurrency in the asynchronous data streams using Schedulers.
The Interval method returns an observable sequence that produces a value after each period.
Here is a quick example:
var callsPer10Seconds = Observable.Interval(TimeSpan.FromSeconds(10));
from x in callsPer10Seconds
group x by x into g
let count = g.Count()
orderby count descending
select new {Value = g.Key, Count = count};
foreach (var x in q)
{
Console.WriteLine("Value: " + x.Value + " Count: " + x.Count);
}

records.OrderBy(p => p.CallTime)
.GroupBy(p => p.NumberDialled)
.Select(p => new { number = p.Key, cdr = p.ToList() })
.Select(p => new
{
number = p.number,
cdr =
p.cdr.Select((value, index) => index == 0 ? null : (TimeSpan?)(value.CallTime - p.cdr[index - 1].CallTime))
.FirstOrDefault(q => q.HasValue && q.Value.TotalSeconds < 10)
}).Where(p => p.cdr != null);

In two steps:
1. Generate an enumeration with the call itself and all calls in the interesting span.
2. Filter this list to find consecutive calls.
The computation is done in parallel on each record using the AsParallel extension method.
It is also possible to not call the ToArray at the end and let the computation be done while other code could execute on the thread instead of forcing it to wait for the parallel computation to finish.
var records = new [] {
new { CallTime= DateTime.Now, NumberDialled = 1 },
new { CallTime= DateTime.Now.AddSeconds(1), NumberDialled = 1 }
};
var span = TimeSpan.FromSeconds(10);
// Select for each call itself and all other calls in the next 'span' seconds
var callInfos = records.AsParallel()
.Select((r, i) =>
new
{
Record = r,
Following = records.Skip(i+1)
.TakeWhile(r2 => r2.CallTime - r.CallTime < span)
}
);
// Filter the calls that interest us
var problematic = (from callinfo in callInfos
where callinfo.Following.Any(r => callinfo.Record.NumberDialled == r.NumberDialled)
select callinfo.Record)
.ToArray();

If performance is acceptable (which I think it should be, since 100k records is not particularly many), this approach is (I think) nice and clean:
First we group up the records by number:
var byNumber =
from cdr in calls
group cdr by cdr.NumberDialled into g
select new
{
NumberDialled = g.Key,
Calls = g.OrderBy(cdr => cdr.CallTime)
};
What we do now is Zip (.NET 4) each calls collection with itself-shifted-by-one, to transform the list of call times into a list of gaps between calls. We then look for numbers where there's a gap of at most 10 seconds:
var interestingNumbers =
from g in byNumber
let callGaps = g.Calls.Zip(g.Calls.Skip(1),
(cdr1, cdr2) => cdr2.CallTime - cdr1.CallTime)
where callGaps.Any(ts => ts.TotalSeconds <= 10)
select g.NumberDialled;
Now interestingNumbers is a sequence of the numbers of interest.

Multiple SUM using LINQ

I have a loop like the following, can I do the same using multiple SUM?
foreach (var detail in ArticleLedgerEntries.Where(pd => pd.LedgerEntryType == LedgerEntryTypeTypes.Unload &&
pd.InventoryType == InventoryTypes.Finished))
{
weight += detail.GrossWeight;
length += detail.Length;
items += detail.NrDistaff;
}

Technically speaking, what you have is probably the most efficient way to do what you are asking. However, you could create an extension method on IEnumerable<T> called Each that might make it simpler:
public static class EnumerableExtensions
{
public static void Each<T>(this IEnumerable<T> col, Action<T> itemWorker)
{
foreach (var item in col)
{
itemWorker(item);
}
}
}
And call it like so:
// Declare variables in parent scope
double weight = 0;
double length = 0;
int items = 0;
ArticleLedgerEntries
.Where(
pd =>
pd.LedgerEntryType == LedgerEntryTypeTypes.Unload &&
pd.InventoryType == InventoryTypes.Finished
)
.Each(
pd =>
{
// Close around variables defined in parent scope
weight += pd.GrossWeight;
length += pd.Length;
items += pd.NrDistaff;
}
);
UPDATE:
Just one additional note. The above example relies on a closure. The variables weight, length, and items should be declared in a parent scope, allowing them to persist beyond each call to the itemWorker action. I've updated the example to reflect this for clarity's sake.

You can call Sum three times, but it will be slower because it will make three loops.
For example:
var list = ArticleLedgerEntries.Where(pd => pd.LedgerEntryType == LedgerEntryTypeTypes.Unload
&& pd.InventoryType == InventoryTypes.Finished);
var totalWeight = list.Sum(pd => pd.GrossWeight);
var totalLength = list.Sum(pd => pd.Length);
var items = list.Sum(pd => pd.NrDistaff);
Because of delayed execution, it will also re-evaluate the Where call every time, although that's not such an issue in your case. This could be avoided by calling ToArray, but that will cause an array allocation. (And it would still run three loops)
However, unless you have a very large number of entries or are running this code in a tight loop, you don't need to worry about performance.
EDIT: If you really want to use LINQ, you could misuse Aggregate, like this:
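int totalWeight = 0, totalLength = 0, items = 0;
// The seed (0) is a dummy value; without a seed, Aggregate would skip the first item.
list.Aggregate(0, (a, pd) => {
totalWeight += pd.GrossWeight;
totalLength += pd.Length;
items += pd.NrDistaff;
return a;
});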
This is phenomenally ugly code, but should perform almost as well as a straight loop.
You could also sum in the accumulator, (see example below), but this would allocate a temporary object for every item in your list, which is a dumb idea. (Anonymous types are immutable)
var totals = list.Aggregate(
new { Weight = 0, Length = 0, Items = 0},
(t, pd) => new {
Weight = t.Weight + pd.GrossWeight,
Length = t.Length + pd.Length,
Items = t.Items + pd.NrDistaff
}
);

You could also group by a constant such as 1 (which puts all of the items into a single group, so that they can then be counted or summed):
var results = from x in ArticleLedgerEntries
group x by 1
into aggregatedTable
select new
{
SumOfWeight = aggregatedTable.Sum(y => y.GrossWeight),
SumOfLength = aggregatedTable.Sum(y => y.Length),
SumOfNrDistaff = aggregatedTable.Sum(y => y.NrDistaff)
};
As far as running time goes, it is almost as good as the loop (with a constant overhead).

You'd be able to do this pivot-style, using the answer in this topic: Is it possible to Pivot data using LINQ?

OK. I realize that there isn't an easy way to do this using LINQ. I'll keep my foreach loop, because I now understand that it isn't so bad. Thanks to all of you.
