Single LINQ expression to tally up several columns in a DataSet - c#

I have a DataSet with several rows and columns (as DataSets tend to have). I need to create a tally row on the bottom with sums of each column. I'd like to do this with a single LINQ expression, as it will simplify a bunch of my code. I can get the total of a single column like so:
var a = (from m in month
where <some long expression>
select m["BGCO_MINUTES"] as Decimal?).Sum();
However, I want totals for other columns as well. I don't want to use multiple LINQ expressions because there's also a complicated where clause in there, and I'm doing several tally rows with various expressions and only want to loop through this set once. I also don't want to manually loop through the dataset myself and add up the totals since I'm creating many of these tally rows and think it would be messier.
What I want is an anonymous type that contains a total of BGCO_MINUTES, 800IB_MINUTES and TSDATA_MINUTES.
Is there any way to do this?

You could do this:
// run the filters once and get List<DataRow> with the matching rows
var list = (from m in month
where <some long expression>
select m).ToList();
// build the summary object
var result = new {
BGCO_MINUTES = list.Sum(m => m["BGCO_MINUTES"] as Decimal?),
_800IB_MINUTES= list.Sum(m => m["800IB_MINUTES"] as Decimal?),
};
That assumes your where clause is not just long to type but also computationally expensive to evaluate. This version evaluates the filter once, then iterates through the resulting list once per column.
If you really want to only iterate the list once, you can probably do it with Enumerable.Aggregate, but the code is less elegant (in my opinion):
// filter and accumulate all totals in a single pass
// 'as decimal?' turns DBNull into null, which then counts as 0
var a = (from m in month
where <some long expression>
select m)
.Aggregate( new { BGCO_MINUTES = (decimal?)0m,
_800IB_MINUTES = (decimal?)0m },
(ac,v) => new { BGCO_MINUTES = ac.BGCO_MINUTES + ((v["BGCO_MINUTES"] as decimal?) ?? 0m),
_800IB_MINUTES = ac._800IB_MINUTES + ((v["800IB_MINUTES"] as decimal?) ?? 0m) });
Like I said, I think it's less elegant than the first version, but it should work. Even though the first one requires a temporary copy of the values that match the where clause (memory cost) and 1 pass through the list for each field (CPU cost), I think it's a lot more readable than the latter version - make sure the performance difference is worth it before using the less-understandable version.
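As a rough, self-contained sketch of the single-pass idea, the following sums all three columns from the question with one Aggregate call. The class name TallySketch, the helper Value, the sample table, and the choice to count DBNull as zero are all assumptions made for illustration; the filter from the question would be applied to the rows before they are passed in.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

public static class TallySketch
{
    // Reads a decimal column; DBNull (or any non-decimal value) counts as 0.
    // Treating missing values as zero is an assumption of this sketch.
    private static decimal Value(DataRow row, string column) =>
        (row[column] as decimal?) ?? 0m;

    // Sums all three columns in one pass over the (already filtered) rows.
    public static (decimal Bgco, decimal Ib800, decimal TsData) Tally(IEnumerable<DataRow> rows) =>
        rows.Aggregate(
            (Bgco: 0m, Ib800: 0m, TsData: 0m),
            (acc, row) => (Bgco: acc.Bgco + Value(row, "BGCO_MINUTES"),
                           Ib800: acc.Ib800 + Value(row, "800IB_MINUTES"),
                           TsData: acc.TsData + Value(row, "TSDATA_MINUTES")));

    // Builds a tiny in-memory table so the sketch can be tried without a database.
    public static DataTable SampleTable()
    {
        var table = new DataTable("month");
        table.Columns.Add("BGCO_MINUTES", typeof(decimal));
        table.Columns.Add("800IB_MINUTES", typeof(decimal));
        table.Columns.Add("TSDATA_MINUTES", typeof(decimal));
        table.Rows.Add(10m, 1.5m, DBNull.Value);
        table.Rows.Add(20m, 2.5m, 4m);
        return table;
    }
}
```

It could be called as TallySketch.Tally(TallySketch.SampleTable().Rows.Cast<DataRow>()), with a Where(...) inserted before the call to apply the filter. A named tuple is used instead of an anonymous type only so that the result can be returned from a method.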

Use Aggregate instead of Sum, as it is more flexible: you can have an object (or simply a dictionary) hold the sums for the individual columns while iterating through each row.
(non-compiled code ahead)
class SumObject {
public float First;
public float Second;
}
var filtered = (from m in month
where <some long expression>
select m);
var totals = filtered.Aggregate(new SumObject(), (currentSum, item)=> {
currentSum.First += item.First;
currentSum.Second += item.Second;
return currentSum;
});
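The dictionary variant mentioned above might look roughly like the sketch below. The class name ColumnTotals and the decision to treat DBNull as zero are assumptions; the column names are passed in by the caller, so nothing here depends on a particular schema.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

public static class ColumnTotals
{
    // Returns one running total per requested column, computed in a single pass.
    public static Dictionary<string, decimal> Sum(IEnumerable<DataRow> rows, params string[] columns)
    {
        // Start every requested column at zero.
        var seed = columns.ToDictionary(c => c, c => 0m);

        return rows.Aggregate(seed, (totals, row) =>
        {
            foreach (var column in columns)
            {
                // DBNull (or any non-decimal value) is treated as zero here.
                totals[column] += (row[column] as decimal?) ?? 0m;
            }
            return totals; // the same dictionary is mutated and passed along
        });
    }
}
```

Because the accumulator is mutated in place, no new object is allocated per row, at the cost of the lambda no longer being side-effect free.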

Related

C# and LINQ - arbitrary statement instead of let

Let's say I'm doing a LINQ query like this (this is LINQ to Objects, BTW):
var rows =
from t in totals
let name = Utilities.GetName(t)
orderby name
select t;
So the GetName method just calculates a display name from a Total object and is a decent use of the let keyword. But let's say I have another method, Utilities.Sum() that applies some math on a Total object and sets some properties on it. I can use let to achieve this, like so:
var rows =
from t in totals
let unused = Utilities.Sum(t)
select t;
The thing that is weird here, is that Utilities.Sum() has to return a value, even if I don't use it. Is there a way to use it inside a LINQ statement if it returns void? I obviously can't do something like this:
var rows =
from t in totals
Utilities.Sum(t)
select t;
PS - I know this is probably not good practice to call a method with side effects in a LINQ expression. Just trying to understand LINQ syntax completely.
No, there is no LINQ method that performs an Action on all of the items in the IEnumerable<T>. It was very specifically left out because the designers actively didn't want it to be in there.
Answering the question
No, but you could cheat by creating a Func which just calls the intended method and returns an arbitrary value, a bool for example:
Func<Total, bool> dummy = (total) =>
{
Utilities.Sum(total);
return true;
};
var rows = from t in totals
let unused = dummy(t)
select t;
But this is not a good idea - it's not particularly readable.
The let statement behind the scenes
What the above query will translate to is something similar to this:
var rows = totals.Select(t => new { t, unused = dummy(t) })
.Select(x => x.t);
So another option, if you prefer method syntax over query syntax, is:
var rows = totals.Select(t =>
{
Utilities.Sum(t);
return t;
});
A little better, but still abusing LINQ.
... but what you should do
But I really see no reason not to simply loop over totals separately:
foreach (var t in totals)
Utilities.Sum(t);
You should download the "Interactive Extensions" (NuGet Ix-Main) from Microsoft's Reactive Extensions team. It has a load of useful extensions. It'll let you do this:
var rows =
from t in totals.Do(x => Utilities.Sum(x))
select t;
It's there to allow side-effects on a traversed enumerable.
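To make the behaviour of such a pass-through operator concrete, here is a minimal sketch. Tap is a hypothetical extension method written for this example (it is not the Interactive Extensions source), and TapDemo exists only to show that, like any deferred LINQ operator, the side effect does not run until the sequence is enumerated.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SideEffectExtensions
{
    // Hypothetical stand-in for a Do-style operator: runs the action on each
    // element as it is yielded, then passes the element through unchanged.
    public static IEnumerable<T> Tap<T>(this IEnumerable<T> source, Action<T> action)
    {
        foreach (var item in source)
        {
            action(item);
            yield return item;
        }
    }
}

public static class TapDemo
{
    // Returns how many times the action had run before and after enumeration.
    public static (int Before, int After) Run()
    {
        var calls = 0;
        var query = new[] { 1, 2, 3 }.Tap(_ => calls++);

        var before = calls; // nothing has been enumerated yet
        query.ToList();     // enumerating the query triggers the action
        return (before, calls);
    }
}
```

The same deferral applies to the let and Select tricks shown earlier: if rows is never enumerated, Utilities.Sum is never called, and if it is enumerated twice, the method runs twice per element.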
Please read my comment on the question. The simplest way to achieve this kind of functionality is to use a query like this:
var rows = from t in totals
group t by t.name into grp
select new
{
Name = grp.Key,
Sum = grp.Sum(x => x.Value) // x.Value is a placeholder for the numeric property to total
};
The query above returns an IEnumerable of anonymous objects.
For further information, please see: 101 LINQ Samples

Lambda ForEach with Index

Here is a list of column names:
var colNames = new List<string> { "colE", "colL", "colO", "colN" };
Based on the position of the column names in the list, I want to make that column's visible index equal to the position of the column name, but without returning a list. In other words, the following lambda expression without "ToList()" at the end:
colNames.Select((x, index) => { grid_ctrl.Columns[x].VisibleIndex = index; return x; }).ToList();
Can this be coded in a one-line lambda expression?
Use a loop to make side-effects. Use queries to compute new data from existing data:
var updates =
colNames.Select((x, index) => new { col = grid_ctrl.Columns[x], index })
.ToList();
foreach (var u in updates)
u.col.VisibleIndex = u.index;
Hiding side-effects in queries can make for nasty surprises. We can still use a query to do the bulk of the work.
You could also use List.ForEach to make those side-effects. That approach is not very extensible, however. It is not as general as a query.
Yes, here you are:
colNames.ForEach((x) => grid_ctrl.Columns[x].VisibleIndex = colNames.IndexOf(x));
Note that you need unique strings in your list, otherwise .IndexOf will behave badly.
Unfortunately List<T>.ForEach, like its relative foreach, doesn't provide an enumeration index.
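One way to get an index without IndexOf is to let the indexed Select overload pair each name with its position and then apply the side effect in a plain foreach. The sketch below is an assumption-laden illustration: ColumnOrdering is a made-up class, and a Dictionary<string, int> stands in for grid_ctrl.Columns[name].VisibleIndex so the code can run without a grid control.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ColumnOrdering
{
    // Pairs each name with its position; the query itself has no side effects.
    public static IEnumerable<(string Name, int Index)> WithIndex(IEnumerable<string> names) =>
        names.Select((name, index) => (Name: name, Index: index));

    // Applies the positions to a hypothetical name -> visible index map,
    // standing in for grid_ctrl.Columns[name].VisibleIndex.
    public static Dictionary<string, int> Apply(IEnumerable<string> names)
    {
        var visibleIndex = new Dictionary<string, int>();
        foreach (var (name, index) in WithIndex(names))
        {
            visibleIndex[name] = index;
        }
        return visibleIndex;
    }
}
```

Unlike the IndexOf version, this walks the list only once and still behaves sensibly when a name appears more than once (the last position wins).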

ConcurrentDictionary.Where very slow for filtering based int array (Key field)

I have the following
var links = new ConcurrentDictionary<int, Link>();
which is populated with around 20k records. I have another list of strings (List) that I turn into a sequence of ints using the following:
var intPossible = NonExistingListingIDs.Select(int.Parse); //this is very fast but need to be done
This is pretty fast, but I would like to create a new list, or filter "links" down to only the entries whose Key appears in intPossible.
I have the following using a where clause but it takes about 50 seconds to do the actual filtering which is very slow for what I want to do.
var filtered = links.Where(x => intPossible.Any(y => y == x.Key)).ToList();
I know Intersect is pretty fast, but I have an array of ints, and Intersect does not work with this against a ConcurrentDictionary.
How can I filter the links so that it takes much less than 50 seconds?
You need to replace your O(n) inner lookup with something more speedy like a hashset which offers O(1) complexity for lookups.
So
var intPossible = new HashSet<int>(NonExistingListingIDs.Select(int.Parse));
and
var filtered = links.Where(x => intPossible.Contains(x.Key)).ToList();
This will avoid iterating most of intPossible for every item in links.
Alternatively, Linq is your friend:
var intPossible = NonExistingListingIDs.Select(int.Parse);
var filtered =
links.Join(intPossible, link => link.Key, intP => intP, (link, intP) => link);
The implementation of Join does much the same thing as I do above.
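The HashSet approach can be wrapped up as in the sketch below. LinkFilter and FilterByIds are names invented for this example, and the value type is left generic because the Link class is not shown in the question. Note also that NonExistingListingIDs.Select(int.Parse) on its own is lazy, so in the original query every call to Any re-enumerates and re-parses the strings; materializing the ids once avoids that as well.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;

public static class LinkFilter
{
    // Keeps only the entries whose key appears in ids.
    // The ids are parsed once into a HashSet, so each lookup is O(1) on average.
    public static List<KeyValuePair<int, TValue>> FilterByIds<TValue>(
        ConcurrentDictionary<int, TValue> links, IEnumerable<string> ids)
    {
        var wanted = new HashSet<int>(ids.Select(s => int.Parse(s)));
        return links.Where(pair => wanted.Contains(pair.Key)).ToList();
    }
}
```

With 20k links this does roughly 20k hash lookups in total, instead of up to 20k scans of the whole id sequence.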
An alternative method would be to enumerate your list and use the indexer of the dictionary...might be a little cleaner...
var intPossible = NonExistingListingIDs.Select(int.Parse);
var filtered = from id in intPossible
where links.ContainsKey(id)
select links[id];
You might want to chuck in a .ToList() in there for good measure too...
This should actually be slightly faster than @spender's solution, since .Join has to build a new hash table, whilst this method uses the hash table already inside the ConcurrentDictionary.

C# LINQ - What can I do here to improve performance?

I'm doing some heavy filtering on a collection (which is nothing more than an encapsulated list of entries of "datalines").
I need to 'consolidate' these lines on 3 fields (Date (string), Route (string) and ConsolidationCode (string)).
Extracting the 3 Distinct Lists works fast. I'm more worried about the triple foreach...
I'd say that a normal "complete" _DealerCaseSetComplete contains 5000 entries.
The Dates would be around 5, the Routes would be around 100 and the Consolidations 350-500.
I have written the following method. It does exactly what I want it to do, but it is very slow in calculation time.
Perhaps you people could guide me towards faster code execution.
If you require any other code (which is really plain, actually), please ask.
private void FillDataGridView()
{
//
_LocalGridControl.Invoke(CreateDataGrid);
//Filter by Date
List<string> Dates = _DealerCaseSetComplete.Data.Select(rec => rec.DateAdded).Distinct().ToList();
//Filter by Route
List<string> Routes = _DealerCaseSetComplete.Data.Select(rec => rec.Route).Distinct().ToList();
//Filter by Consolidation
List<string> Consolidations = _DealerCaseSetComplete.Data.Select(rec => rec.DealerConsolidationCode).Distinct().ToList();
foreach(string d in Dates)
{
foreach(string r in Routes)
{
foreach(string c in Consolidations)
{
List<DealerCaseLine> Filter = _DealerCaseSetComplete.Data.Where(rec => rec.DateAdded == d &&
rec.Route == r &&
rec.DealerConsolidationCode == c).ToList();
if(Filter.Count > 0)
_LocalGridControl.Invoke(AddLineToDataGrid, Filter);
}
}
}
_LocalGridControl.Invoke(SortDataGrid);
}
Looks like you need grouping by three fields:
var filters = from r in _DealerCaseSetComplete.Data
group r by new {
r.DateAdded,
r.Route,
r.DealerConsolidationCode
} into g
select g.ToList();
foreach(List<DealerCaseLine> filter in filters)
_LocalGridControl.Invoke(AddLineToDataGrid, filter);
Your code iterates all the data three times to get the distinct fields. Then it iterates all the data again for every combination of distinct fields (when you filter with the where clause). By grouping on these three fields you iterate the data only once. Each resulting group has at least one item, so you don't need to check whether a group contains any items before invoking AddLineToDataGrid.
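A small runnable sketch of grouping on a composite key follows. Since DealerCaseLine is not shown in the question, each line is modelled here as a (Date, Route, Code, Amount) tuple; that shape, the Amount field, and the Consolidation class are assumptions made only so the example compiles on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Consolidation
{
    // Groups lines by the three key fields in one pass over the data.
    // Anonymous types compare by value, so they work as composite keys.
    public static List<List<(string Date, string Route, string Code, int Amount)>> Consolidate(
        IEnumerable<(string Date, string Route, string Code, int Amount)> lines) =>
        lines.GroupBy(l => new { l.Date, l.Route, l.Code })
             .Select(g => g.ToList())
             .ToList();
}
```

Only combinations that actually occur in the data produce a group, which is why the Count > 0 check from the nested loops is no longer needed.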
It looks like you're trying to get every distinct combination of Dates, Routes and Consolidations.
Your current code is slow because it is, I think, O(n^4). You have three nested loops, the body of which is a linear search.
You can get much better performance by using the overload of Distinct that takes an IEqualityComparer<T>:
http://msdn.microsoft.com/en-us/library/bb338049.aspx
var Consolidated =
_DealerCaseSetComplete.Data.Select(rec => rec).
Distinct(new DealerCaseComparer());
The class DealerCaseComparer would be implemented much as in the above MSDN link.
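For illustration, a comparer along those lines might look like the sketch below. The DealerCaseLine class shown here is a hypothetical stand-in containing only the three key fields from the question; the real class presumably has more members.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical shape of the line type; only the three key fields are shown.
public class DealerCaseLine
{
    public string DateAdded { get; set; }
    public string Route { get; set; }
    public string DealerConsolidationCode { get; set; }
}

// Treats two lines as equal when all three consolidation fields match.
public class DealerCaseComparer : IEqualityComparer<DealerCaseLine>
{
    public bool Equals(DealerCaseLine x, DealerCaseLine y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        return x.DateAdded == y.DateAdded
            && x.Route == y.Route
            && x.DealerConsolidationCode == y.DealerConsolidationCode;
    }

    // Must agree with Equals: equal lines produce equal hash codes.
    public int GetHashCode(DealerCaseLine line) =>
        (line.DateAdded, line.Route, line.DealerConsolidationCode).GetHashCode();
}
```

Keep in mind that Distinct returns one representative line per combination rather than all lines in that combination, so it fits when only the distinct combinations are needed.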

How can I set properties on all items from a linq query with values from another object that is also pulled from a query?

I have a query pulling from a database:
List<myClass> items = new List<myClass>(from i in context
select new myClass
{
A = i.A,
B = "", // i doesn't know this, this comes from elsewhere
C = i.C
});
I also have another query doing a similar thing:
List<myClass2> otherItems = new List<myClass2>(from j in context
select new myClass2
{
A = j.A, // A is the intersection, there will only be 1 A here but many A's in items
B = j.B
});
In reality these classes are much larger and query data that is separated not only by database but by server as well. Is it possible to use a LINQ query to populate the property B for all items where items.A intersect? All of the built in LINQ predicates appear only to do aggregates, selections or bool expressions.
In my brain I had something like this, but this is all off:
items.Where(x => x.B = (otherItems.Where(z => z.A == x.A).Single().B));
Or am I being ridiculous with trying to make this work in LINQ and should just abandon it in favor of a for loop where the actual setting becomes trivial? Because of deadlines I will be resorting to the for loop (and it's probably going to end up being a lot more readable in the long run anyway), but is it possible to do this? Would an extension method be necessary to add a special predicate to allow this?
LINQ is designed for querying. If you're trying to set things, you should definitely use a loop (probably foreach). That doesn't mean you won't be able to use LINQ as part of that loop, but you shouldn't be trying to apply a side-effect within LINQ itself.
Query otherItems first and call ToDictionary() on the result, keyed by A with B as the value. Then, when querying the database, do this:
var items = from i in context
select new myClass
{ A = i.A,
B = otherItems[i.A],
C = i.C
};
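Combining the two answers, a loop backed by a dictionary lookup could be sketched as follows. MyClass and MyClass2 are hypothetical minimal versions of the classes in the question, and the key type of A is assumed to be int; TryGetValue is used so that items without a match simply keep their empty B.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical minimal versions of the two classes from the question.
public class MyClass
{
    public int A { get; set; }
    public string B { get; set; } = "";
    public string C { get; set; } = "";
}

public class MyClass2
{
    public int A { get; set; }
    public string B { get; set; } = "";
}

public static class FillB
{
    // Copies B from otherItems onto every item that shares the same A.
    public static void Apply(IEnumerable<MyClass> items, IEnumerable<MyClass2> otherItems)
    {
        // One dictionary build, then O(1) lookups instead of a scan per item.
        // ToDictionary throws if the same A occurs twice in otherItems.
        var bByA = otherItems.ToDictionary(o => o.A, o => o.B);

        foreach (var item in items)
        {
            if (bByA.TryGetValue(item.A, out var b))
            {
                item.B = b;
            }
        }
    }
}
```

LINQ builds the lookup and the foreach does the assignment, which keeps the side effect out of the query itself.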
