Converting foreach to Linq - c#

Current Code:
For each element in MapEntryTable, check the properties IsDisplayedColumn and IsReturnColumn; if either is true, add the element's OutputColumnId to the corresponding list. The running time is O(n). Many elements have both properties false, so they are not added to either list in the loop.
foreach (var mapEntry in MapEntryTable)
{
if (mapEntry.IsDisplayedColumn)
Type1.DisplayColumnId.Add(mapEntry.OutputColumnId);
if (mapEntry.IsReturnColumn)
Type1.ReturnColumnId.Add(mapEntry.OutputColumnId);
}
Following is the Linq version of doing the same:
MapEntryTable.Where(x => x.IsDisplayedColumn == true).ToList().ForEach(mapEntry => Type1.DisplayColumnId.Add(mapEntry.OutputColumnId));
MapEntryTable.Where(x => x.IsReturnColumn == true).ToList().ForEach(mapEntry => Type1.ReturnColumnId.Add(mapEntry.OutputColumnId));
I am converting all such foreach code to LINQ as I am learning it, but my questions are:
Do I get any advantage from the LINQ conversion in this case, or is it a disadvantage?
Is there a better way to do the same using LINQ?
UPDATE:
Consider the case where, out of 1000 elements in the list, 80% have both properties false. Does Where give me the benefit of finding the elements that match a given condition more quickly?
Type1 is a custom type with set of List<int> structures, DisplayColumnId and ReturnColumnId

ForEach isn't a LINQ method. It's a method of List<T>. Not only is it not part of LINQ, it runs very much against the values and patterns of LINQ. Eric Lippert explains this in a blog post written when he was a principal developer on the C# compiler team.
Your "LINQ" approach also:
Completely unnecessarily copies all of the items to be added into a list, which is both wasteful in time and memory and also conflicts with LINQ's goals of deferred execution when executing queries.
Isn't actually a query with the exception of the Where operator. You're acting on the items in the query, rather than performing a query. LINQ is a querying tool, not a tool for manipulating data sets.
Iterates the source sequence twice. This may or may not be a problem, depending on what the source sequence actually is and what the costs of iterating it are.
A solution that uses LINQ only for what it is designed for would look like this:
// The lambda must test its own parameter (entry), not the loop variable.
foreach (var mapEntry in MapEntryTable.Where(entry => entry.IsDisplayedColumn))
    list1.DisplayColumnId.Add(mapEntry.OutputColumnId);
foreach (var mapEntry in MapEntryTable.Where(entry => entry.IsReturnColumn))
    list2.ReturnColumnId.Add(mapEntry.OutputColumnId);

I would say stick with the original foreach loop, since it iterates through the list only once.
Also, your LINQ should look more like this:
list1.DisplayColumnId.AddRange(MapEntryTable.Where(x => x.IsDisplayedColumn).Select(mapEntry => mapEntry.OutputColumnId));
list2.ReturnColumnId.AddRange(MapEntryTable.Where(x => x.IsReturnColumn).Select(mapEntry => mapEntry.OutputColumnId));
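To make the comparison concrete, here is a minimal, self-contained sketch of both approaches side by side. The MapEntry and ColumnIds classes are hypothetical stand-ins for the question's types (which are not shown in full), so treat the names as assumptions. Both methods should fill the lists with the same ids; the foreach version reads the source once, the AddRange version reads it twice.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for an element of MapEntryTable.
public class MapEntry
{
    public int OutputColumnId { get; set; }
    public bool IsDisplayedColumn { get; set; }
    public bool IsReturnColumn { get; set; }
}

// Hypothetical stand-in for Type1: a holder of two List<int> members.
public class ColumnIds
{
    public List<int> DisplayColumnId { get; } = new List<int>();
    public List<int> ReturnColumnId { get; } = new List<int>();
}

public static class ColumnCollector
{
    // One pass over the source, as in the original foreach.
    public static ColumnIds WithForeach(IEnumerable<MapEntry> table)
    {
        var result = new ColumnIds();
        foreach (var entry in table)
        {
            if (entry.IsDisplayedColumn) result.DisplayColumnId.Add(entry.OutputColumnId);
            if (entry.IsReturnColumn) result.ReturnColumnId.Add(entry.OutputColumnId);
        }
        return result;
    }

    // Two passes (one per filter), but no intermediate ToList() copy.
    public static ColumnIds WithAddRange(IEnumerable<MapEntry> table)
    {
        var result = new ColumnIds();
        result.DisplayColumnId.AddRange(
            table.Where(e => e.IsDisplayedColumn).Select(e => e.OutputColumnId));
        result.ReturnColumnId.AddRange(
            table.Where(e => e.IsReturnColumn).Select(e => e.OutputColumnId));
        return result;
    }

    public static void Main()
    {
        var table = new List<MapEntry>
        {
            new MapEntry { OutputColumnId = 1, IsDisplayedColumn = true },
            new MapEntry { OutputColumnId = 2, IsReturnColumn = true },
            new MapEntry { OutputColumnId = 3, IsDisplayedColumn = true, IsReturnColumn = true },
            new MapEntry { OutputColumnId = 4 } // both flags false: skipped by both versions
        };

        var a = WithForeach(table);
        var b = WithAddRange(table);
        Console.WriteLine(a.DisplayColumnId.SequenceEqual(b.DisplayColumnId));
        Console.WriteLine(a.ReturnColumnId.SequenceEqual(b.ReturnColumnId));
    }
}
```

Note that Where does not find matches any faster than the if statement does: both test every element once per pass.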

The performance of foreach and the ForEach method is almost exactly the same, within nanoseconds of each other, assuming the logic inside the loop is the same in both versions when testing.
However, a for loop outperforms both by a large margin: for (int i = 0; i < count; ++i) is much faster, because a for loop doesn't rely on an IEnumerable implementation and its overhead. The for loop compiles to simple index-and-jump code. It maintains a counter, and it's up to you to retrieve the item by its index in the loop.
Using ForEach does have a big disadvantage, though: you cannot break out of the loop. If you need to do that, you have to maintain a boolean such as "breakLoop = false", set it to true, and have every remaining invocation return immediately when breakLoop is true, which performs badly. Secondly, you cannot use continue; you use "return" instead.
I never use the ForEach method.
If you are dealing with linq, e.g.
List<Thing> things = .....;
var oldThings = things.Where(p => p.DateTime.Year < DateTime.Now.Year);
Internally, that iterates the list and gives you back only the items with a year less than the current year. Cool.
But if I am doing this:
List<Thing> things = new List<Thing>();
foreach(XElement node in Results) {
things.Add(new Thing(node));
}
I don't need to use a ForEach call. Even if I did...
foreach (var node in thingNodes.Where(p => p.NodeType == "Thing")) {
    if (node.Ignore) {
        continue;
    }
    things.Add(node);
}
even though I could write that more cleanly as
foreach (var node in thingNodes.Where(p => p.NodeType == "Thing" && !p.Ignore)) {
    things.Add(node);
}
There is no real reason I can think of to do this:
things.ForEach(thing => {
//do something
//can't break
//can't continue
return; //<- continue
});
And if I want the fastest loop possible,
for (int i = 0; i < things.Count; ++i) {
var thing = things[i];
//do something
}
Will be faster.

Your LINQ isn't quite right as you're converting the results of Where to a List and then pseudo-iterating over those results with ForEach to add to another list. Use ToList or AddRange for converting or adding sequences to lists.
For example, to overwrite list1 (if it were actually a List<T>):
list1 = MapEntryTable.Where(x => x.IsDisplayedColumn == true)
.Select(mapEntry => mapEntry.OutputColumnId).ToList();
or to append:
list1.AddRange(MapEntryTable.Where(x => x.IsDisplayedColumn == true)
.Select(mapEntry => mapEntry.OutputColumnId));

In C#, to do what you want functionally in one call, you have to write your own partition method. If you are open to using F#, you can use List.partition:
https://msdn.microsoft.com/en-us/library/ee353782.aspx
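A hand-written partition method in C# could look roughly like the sketch below. The name Partition and the tuple return shape are my own choices for illustration, not a framework API. Keep in mind that a partition splits a sequence into two disjoint groups on one predicate, whereas the question tests two independent flags (an element can belong to both lists), so this fits only when a single condition decides the destination.

```csharp
using System;
using System.Collections.Generic;

public static class PartitionExtensions
{
    // Hypothetical helper: splits source into (matches, rest) in a single pass.
    public static (List<T> Matches, List<T> Rest) Partition<T>(
        this IEnumerable<T> source, Func<T, bool> predicate)
    {
        var matches = new List<T>();
        var rest = new List<T>();
        foreach (var item in source)
        {
            // Each element is tested once and lands in exactly one list.
            (predicate(item) ? matches : rest).Add(item);
        }
        return (matches, rest);
    }
}

public static class PartitionDemo
{
    public static void Main()
    {
        var numbers = new[] { 1, 2, 3, 4, 5, 6 };
        var (evens, odds) = numbers.Partition(n => n % 2 == 0);
        Console.WriteLine(string.Join(", ", evens)); // prints "2, 4, 6"
        Console.WriteLine(string.Join(", ", odds));  // prints "1, 3, 5"
    }
}
```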

Related

C# Modifying the list being iterated over using forEach

I would like to learn if it is possible to modify the list being iterated over using forEach so that there is no need to maintain an index.
var scanResults = await someFunction();
for (int i = 0; i < scanResults.Data.Count(); i++)
{
if ((scanResults.Data.ToList()[i].Filters.Count() == 0) != (scanResults.Data.ToList()[i].SubscribedFilters.Count() == 0))
{
scanResults.Data.ToList()[i] = await AddFilters(scanResults.Data.ToList()[i]);
}
}
return scanResults;
Note, as John and I mentioned in the comments, that your code calls ToList() (presumably from System.Linq) inside the condition and the assignment, which is likely a logical mistake: every call creates a new list, so the assignment changes a temporary copy that is thrown away. If you instead reference a single list throughout and replace its elements while iterating over it with foreach, you will run into an InvalidOperationException with the message "Collection was modified".
Scenario 1: Stick with for(;;) loop
The benefit of this is you don't need to use additional memory to create a temporary list or an alternative list.
Scenario 2: foreach with a temporary duplicate list to modify
If you really insist on using foreach loop then one simple option is to create an identical list with the same data and iterate through that. Depending on what you are doing within the list this might not work. The downside with this approach is you are using additional memory to store the duplicate list.
Lots of the code is not given in the problem so we can't guarantee which would make more sense in your situation. However in most cases you would try to stick with the for(;;) loop.
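As a rough illustration of Scenario 1, the sketch below materializes the data once and then replaces elements by index. ScanItem and the body of AddFilters are hypothetical stand-ins, since the question does not show the real types; only the loop shape is the point.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical stand-in for one element of scanResults.Data.
public class ScanItem
{
    public List<string> Filters { get; set; } = new List<string>();
    public List<string> SubscribedFilters { get; set; } = new List<string>();
}

public static class ScanUpdater
{
    // Hypothetical stand-in for the question's AddFilters:
    // returns a new item in which both lists hold the union of the two.
    public static Task<ScanItem> AddFilters(ScanItem item)
    {
        var all = item.Filters.Union(item.SubscribedFilters).ToList();
        var updated = new ScanItem { Filters = all, SubscribedFilters = new List<string>(all) };
        return Task.FromResult(updated);
    }

    public static async Task<List<ScanItem>> UpdateAsync(IEnumerable<ScanItem> data)
    {
        // Materialize once; every later access uses this same list.
        var items = data.ToList();
        for (int i = 0; i < items.Count; i++)
        {
            bool noFilters = items[i].Filters.Count == 0;
            bool noSubscribed = items[i].SubscribedFilters.Count == 0;
            if (noFilters != noSubscribed)
            {
                // Replacing an element by index is allowed inside a for loop.
                items[i] = await AddFilters(items[i]);
            }
        }
        return items;
    }

    public static async Task Main()
    {
        var data = new[]
        {
            new ScanItem { Filters = new List<string> { "a" } }, // only one side set: updated
            new ScanItem()                                        // both empty: left alone
        };
        var result = await UpdateAsync(data);
        Console.WriteLine(result[0].SubscribedFilters.Count);
        Console.WriteLine(result[1].Filters.Count);
    }
}
```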
I can't see the rest of your code so I can only guess at the data types you are using but it'd be something like this.
foreach (var scanResult in scanResults.Data)
{
    if ((scanResult.Filters.Count() == 0) != (scanResult.SubscribedFilters.Count() == 0))
    {
        // The foreach variable is read-only, so it cannot be reassigned here.
        // This only works if AddFilters updates the item in place.
        await AddFilters(scanResult);
    }
}

Simplify double foreach instruction

I need to browse a word document and to retrieve some Text Boxes in order to modify them.
But I need to count them before, and I do think that what I wrote is really inefficient.
I'd like to know if it's possible to simplify the following:
foreach (Microsoft.Office.Interop.Word.HeaderFooter OHeader in documentOld.Sections[1].Headers)
{
foreach (Microsoft.Office.Interop.Word.Shape shape in OHeader.Shapes)
{
if (shape.Name.Contains("Text Box"))
{
listTextBox.Add(new KeyValuePair<string, string>(shape.Name.ToString(), shape.TextFrame.TextRange.Text.ToString()));
}
}
}
int count = listTextBox.Count();
I want to know how many elements which contain "Text Box" are in the Shapes.
I see two ways you can do this.
Using LINQ syntax:
var count = (
    from Microsoft.Office.Interop.Word.HeaderFooter OHeader in documentOld.Sections[1].Headers
    from Microsoft.Office.Interop.Word.Shape shape in OHeader.Shapes
    where shape.Name.Contains("Text Box")
    select shape).Count();
Or, using IEnumerable extension methods:
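// The interop collections are non-generic, so Cast<T>() is needed before other LINQ operators.
var count = documentOld.Sections[1].Headers
    .Cast<Microsoft.Office.Interop.Word.HeaderFooter>()
    .SelectMany(h => h.Shapes.Cast<Microsoft.Office.Interop.Word.Shape>())
    .Count(s => s.Name.Contains("Text Box"));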
Note that your version is inefficient in that it creates a list and the KeyValuePairs needlessly, given that you only want to count the number of shapes that match some condition. Other than that, nested foreach blocks are fine for performance, but may be less readable than the LINQ equivalents.
Also, please note that I have not tested the code above.
If you keep your foreach loops as they are, all you need to do is declare the count variable before the loops and increment it each time you find a match.
int count = 0;
foreach (Microsoft.Office.Interop.Word.HeaderFooter OHeader in documentOld.Sections[1].Headers)
{
foreach (Microsoft.Office.Interop.Word.Shape shape in OHeader.Shapes)
{
if (shape.Name.Contains("Text Box"))
{
++count;
}
}
}

is there an easier way to reverse the order of a DataGridViewSelectedRowCollection?

For some strange reason, a DataGridViewSelectedRowCollection is populated in reverse order from what is displayed in the DataGridView. But what is more puzzling is why there isn't a straightforward way of reversing the order to use in a foreach loop.
I would like to be able to use syntax as simple as this:
foreach (DataGridViewRow r in dataGridView1.SelectedRows.Reverse())
...but of course, that is not supported.*
So, currently I am using this monstrosity:
//reverse the default selection order:
IEnumerable<DataGridViewRow> properlyOrderedSelectedRows
= dataGridView1.SelectedRows.Cast<DataGridViewRow>().ToArray().Reverse();
foreach (DataGridViewRow r in properlyOrderedSelectedRows )
{
MessageBox.Show( r.Cells["ID"].Value.ToString());
}
...which is terribly ugly and convoluted. (I realize I could use a reverse For loop, but I prefer the foreach for its readability.)
What am I missing here? Is there a simpler approach?
*Actually, I would have expected this version to work, according to the discussion here, since DataGridViewSelectedRowCollection implements IEnumerable; but it doesn't compile.
I think you just need to cast in your for each loop like:
foreach (DataGridViewRow row in dataGridView1.SelectedRows.Cast<DataGridViewRow>().Reverse()) {
}
However, even though it appears to be less code, this isn't as efficient: it basically has to go through the enumerator forwards, buffering every row, and then hand the rows back out in reverse order.
If you have a directly-indexable collection you should definitely use a for loop instead and enumerate over the collection in reverse order.
As mentioned here Possible to iterate backwards through a foreach?
You could add the rows to a stack...
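// SelectedRows is a non-generic collection, so Cast<DataGridViewRow>() is needed to get an IEnumerable<DataGridViewRow>.
Stack<DataGridViewRow> properlyOrderedSelectedRows = new Stack<DataGridViewRow>(dataGridView1.SelectedRows.Cast<DataGridViewRow>());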
foreach (DataGridViewRow r in properlyOrderedSelectedRows )
{
MessageBox.Show( r.Cells["ID"].Value.ToString());
}
Stack<T> has a constructor that accepts IEnumerable<T>
But what is more puzzling is why there isn't a straightforward way of reversing the order to use in a foreach loop.
I would like to be able to use syntax as simple as this:
...
...but of course, that is not supported.*
How about making it work yourself instead of all these pseudo witty statements, "monstrosities" and highly inefficient LINQ-es. All you need is to write a one liner function in some common place.
public static IEnumerable<DataGridViewRow> GetSelectedRows(this DataGridView source)
{
for (int i = source.SelectedRows.Count - 1; i >= 0; i--)
yield return source.SelectedRows[i];
}

Nested foreach - Need to operate on object at each level

I was trying to refactor some nested foreach loops but ran into an issue. This is the original code:
foreach(var doc in customTrackerDocuments)
{
foreach(var rule in doc.Rules)
{
foreach(var eval in rule.Evaluations)
{
// Do something with customTrackerDocuments, rules, and evaluations
// doc, rule, and eval are all available here
}
}
}
It's admittedly clean and straightforward, so maybe it should just stay like that. However, I've always tried to reduce complexity and increase readability, so I tried this:
foreach(var eval in customTrackerDocuments.SelectMany(doc => doc.Rules).SelectMany(rule => rule.Evaluations))
{
// Do something with customTrackerDocuments, rules, and evaluations
// doc and rule are NOT available here
}
The issue is that doc and rule are no longer available to use in the loop. Is there a way to have them be available with this approach? Or should I just use the first option that has the three nested loops?
I have a fiddle here: https://dotnetfiddle.net/oBMfQC
They are not available because you're only projecting rule.Evaluations in your final SelectMany. You could build up an anonymous type:
// The result selector of SelectMany takes two parameters: the source element and the flattened element.
foreach(var eval in customTrackerDocuments.SelectMany(doc => doc.Rules, (doc, rule) => new {doc, rule})
    .SelectMany(docrule => docrule.rule.Evaluations, (docrule, eval) => new {docrule.doc, docrule.rule, eval}))
{
    // eval.doc, eval.rule, eval.eval are available here
}
Whether or not that's more readable or less complex is debatable. It certainly won't be any faster or use less memory.
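For reference, here is a small self-contained sketch of that result-selector overload, next to the query-syntax form, which the compiler translates into the same kind of SelectMany chain and which keeps every range variable in scope. The Doc and Rule classes and their Name properties are hypothetical stand-ins for the question's types.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the question's document and rule types.
public class Rule
{
    public string Name { get; set; }
    public List<string> Evaluations { get; set; } = new List<string>();
}

public class Doc
{
    public string Name { get; set; }
    public List<Rule> Rules { get; set; } = new List<Rule>();
}

public static class Flattener
{
    // Method syntax: each SelectMany carries the outer items along in an anonymous type.
    public static List<string> WithResultSelectors(IEnumerable<Doc> docs) =>
        docs.SelectMany(doc => doc.Rules, (doc, rule) => new { doc, rule })
            .SelectMany(dr => dr.rule.Evaluations, (dr, eval) => new { dr.doc, dr.rule, eval })
            .Select(x => $"{x.doc.Name}/{x.rule.Name}/{x.eval}")
            .ToList();

    // Query syntax: doc, rule and eval are all visible in the select clause.
    public static List<string> WithQuerySyntax(IEnumerable<Doc> docs) =>
        (from doc in docs
         from rule in doc.Rules
         from eval in rule.Evaluations
         select $"{doc.Name}/{rule.Name}/{eval}").ToList();

    public static void Main()
    {
        var docs = new List<Doc>
        {
            new Doc
            {
                Name = "d1",
                Rules =
                {
                    new Rule { Name = "r1", Evaluations = { "e1", "e2" } },
                    new Rule { Name = "r2", Evaluations = { "e3" } }
                }
            }
        };

        foreach (var line in WithQuerySyntax(docs))
            Console.WriteLine(line);
    }
}
```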

A better way to loop through lists

So I have a couple of different lists that I'm trying to process and merge into 1 list.
Below is a snippet of code that I would like to improve, if there is a better way of doing it.
The reason why I'm asking is that some of these lists are rather large. I want to see if there is a more efficient way of doing this.
As you can see, I'm looping through a list, and the first thing I do is check whether the CompanyId exists in the list. If it does, then I find the item in the list that I'm going to process.
pList is my processing list. I'm adding the values from my different lists into this list.
I'm wondering if there is a "better way" of accomplishing the Exists and Find.
bool tstFind = false;
foreach (parseAC item in pACList)
{
    tstFind = pList.Exists(x => (x.CompanyId == item.key.ToString()));
    if (tstFind == true)
    {
        pItem = pList.Find(x => (x.CompanyId == item.key.ToString()));
        //Processing done here. pItem gets updated here
        ...
    }
}
Just as a side note, I'm going to be researching a way to use joins to see if that is faster. But I haven't gotten there yet. The above code is my first cut at solving this issue and it appears to work. However, since I have the time I want to see if there is a better way still.
Any input is greatly appreciated.
Time Findings:
My current Find and Exists code takes about 84 minutes to loop through the 5.5M items in the pACList.
Using pList.FirstOrDefault(x => x.CompanyId == item.key.ToString()); takes 54 minutes to loop through the 5.5M items in the pACList.
You can retrieve the item with FirstOrDefault instead of searching for it twice (once to determine whether the item exists, and a second time to get the existing item):
pItem = pList.FirstOrDefault(x => x.CompanyId == item.key.ToString());
if (pItem != null)
{
    //Processing done here. pItem gets updated here
}
Yes, use a hash table so that your algorithm is roughly O(n + m) instead of the O(n*m) it is right now.
var pListByCompanyId = pList.ToDictionary(x => x.CompanyId);
foreach (parseAC item in pACList)
{
    // TryGetValue does the existence check and the retrieval in one hash lookup
    if (pListByCompanyId.TryGetValue(item.key.ToString(), out pItem))
    {
        //Processing done here. pItem gets updated here
        ...
    }
}
You can iterate through a filtered list using LINQ:
foreach (parseAC item in pACList.Where(i=>pList.Any(x => (x.CompanyId == i.key.ToString()))))
{
    pItem = pList.Find(x => (x.CompanyId == item.key.ToString()));
    //Processing done here. pItem gets updated here
    ...
}
Using lists for this type of operation is O(MxN) (M is the count of pACList, N is the count of pList). Additionally, you are searching pList twice for each item. To avoid that issue, use pList.FirstOrDefault as recommended by @lazyberezovsky.
However, if possible I would avoid using lists. A Dictionary indexed by the key you're searching on would greatly improve the lookup time.
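A minimal sketch of the dictionary approach follows, assuming CompanyId values are unique within pList (ToDictionary throws an ArgumentException on duplicate keys). The Company and AcItem classes, and the "processing" of adding an amount to a total, are invented for illustration because the question does not show its types.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for an element of pList.
public class Company
{
    public string CompanyId { get; set; }
    public int Total { get; set; }
}

// Hypothetical stand-in for parseAC.
public class AcItem
{
    public int Key { get; set; }
    public int Amount { get; set; }
}

public static class ListMerger
{
    // Returns how many items of pACList found a matching company.
    public static int Merge(List<Company> pList, IEnumerable<AcItem> pACList)
    {
        // Build the index once: O(N). Assumes unique CompanyId values.
        Dictionary<string, Company> byId = pList.ToDictionary(c => c.CompanyId);

        int matched = 0;
        foreach (var item in pACList)
        {
            // One hash lookup replaces the Exists + Find pair of linear scans.
            if (byId.TryGetValue(item.Key.ToString(), out var company))
            {
                company.Total += item.Amount; // placeholder for the real processing
                matched++;
            }
        }
        return matched;
    }

    public static void Main()
    {
        var pList = new List<Company>
        {
            new Company { CompanyId = "1" },
            new Company { CompanyId = "2" }
        };
        var pACList = new[]
        {
            new AcItem { Key = 1, Amount = 10 },
            new AcItem { Key = 1, Amount = 5 },
            new AcItem { Key = 3, Amount = 7 } // no company with id "3": skipped
        };

        Console.WriteLine(Merge(pList, pACList));
        Console.WriteLine(pList[0].Total);
    }
}
```

If CompanyId can repeat, ToLookup groups the items per key instead of throwing.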
Doing a linear search on the list for each item in another list is not efficient for large data sets. What is preferable is to put the keys into a Table or Dictionary that can be much more efficiently searched to allow you to join the two tables. You don't even need to code this yourself, what you want is a Join operation. You want to get all of the pairs of items from each sequence that each map to the same key.
Either pull out the implementation of the method below, or change Foo and Bar to the appropriate types and use it as a method.
public static IEnumerable<Tuple<Bar, Foo>> Merge(IEnumerable<Bar> pACList
, IEnumerable<Foo> pList)
{
return pACList.Join(pList, item => item.Key.ToString()
, item => item.CompanyID.ToString()
, (a, b) => Tuple.Create(a, b));
}
You can use the results of this call to merge the two items together, as they will have the same key.
Internally the method will create a lookup table that allows for efficient searching before actually doing the searching.
Three options, where N is the size of pList and n is the size of pACList:
Convert pList to a HashSet, then query pHashSet.Contains(). Complexity: O(N) + O(n).
Sort pList on CompanyId and use Array.BinarySearch(). Complexity: O(N log N) + O(n log N).
If the maximum company id is not prohibitively large, simply create an array in which the item with company id i sits at the i-th position. Nothing can be faster.
