I need to go through a Word document and retrieve some text boxes in order to modify them.
But I need to count them first, and I think what I wrote is really inefficient.
I'd like to know if it's possible to simplify the following:
foreach (Microsoft.Office.Interop.Word.HeaderFooter OHeader in documentOld.Sections[1].Headers)
{
foreach (Microsoft.Office.Interop.Word.Shape shape in OHeader.Shapes)
{
if (shape.Name.Contains("Text Box"))
{
listTextBox.Add(new KeyValuePair<string, string>(shape.Name.ToString(), shape.TextFrame.TextRange.Text.ToString()));
}
}
}
int count = listTextBox.Count();
I want to know how many elements which contain "Text Box" are in the Shapes.
I see two ways you can do this.
Using LINQ syntax:
var count = (
from Microsoft.Office.Interop.Word.HeaderFooter OHeader in documentOld.Sections[1].Headers
from Microsoft.Office.Interop.Word.Shape shape in OHeader.Shapes
where shape.Name.Contains("Text Box")
select shape).Count();
Or, using IEnumerable extension methods:
var count = documentOld.Sections[1].Headers
.Cast<Microsoft.Office.Interop.Word.HeaderFooter>()
.SelectMany(h => h.Shapes.Cast<Microsoft.Office.Interop.Word.Shape>())
.Count(s => s.Name.Contains("Text Box"));
Note that your version is inefficient in that it needlessly creates a list and the KeyValuePairs, given that you only want to count the number of shapes that match some condition. Other than that, nested foreach blocks are fine for performance, but may be less readable than the LINQ equivalents.
Also, please note that I have not tested the code above.
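One thing to watch for with the LINQ versions: as far as I know, the Word interop collections only expose the non-generic IEnumerable, so the LINQ operators need Cast<T>() (or an explicitly typed range variable) before they can be used. Here is a minimal sketch of that idea, using an ArrayList of names as a stand-in for the interop collection; CastCountSketch and CountTextBoxes are hypothetical names, not part of any library.

```csharp
using System.Collections;
using System.Linq;

public static class CastCountSketch
{
    // Counts the entries containing "Text Box" in a non-generic collection.
    // The ArrayList used by callers is only a stand-in for an interop collection.
    public static int CountTextBoxes(IEnumerable names)
    {
        // Cast<string>() turns the non-generic IEnumerable into IEnumerable<string>,
        // which is what Count(predicate) and the other LINQ operators require.
        return names.Cast<string>().Count(n => n.Contains("Text Box"));
    }
}
```

Called with an ArrayList holding "Text Box 1", "Picture 2" and "Text Box 3", CountTextBoxes counts the two names that contain "Text Box", without building any intermediate list.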
If you want to keep the foreach loops, all you need to do is declare your count variable before the loops and increment it each time you find a match.
int count = 0;
foreach (Microsoft.Office.Interop.Word.HeaderFooter OHeader in documentOld.Sections[1].Headers)
{
foreach (Microsoft.Office.Interop.Word.Shape shape in OHeader.Shapes)
{
if (shape.Name.Contains("Text Box"))
{
++count;
}
}
}
I have a list of strings. Neither the number nor the order of these strings is guaranteed. The only certainty is that the list will contain at least my three strings of interest, and that these contain the keywords "string1", "string2", and "string3" respectively (the strings can contain more information, but those keywords will definitely be in there). I then want to pass each of these strings to its own function.
My current implementation to solve this is as such:
foreach(var item in myList)
{
if (item.Contains("string1"))
{
myFunction1(item);
}
else if (item.Contains("string2"))
{
myFunction2(item);
}
else if (item.Contains("string3"))
{
myFunction3(item);
}
}
Is there a better way to check string lists and apply functions to those items that match some criteria?
One approach is to use Regex for the fixed list of strings, and check which group is present, like this:
// Note the matching groups around each string
var regex = new Regex("(string1)|(string2)|(string3)");
foreach(var item in myList) {
var match = regex.Match(item);
if (!match.Success) {
continue;
}
if (match.Groups[1].Success) {
myFunction1(item);
}
else if (match.Groups[2].Success)
{
myFunction2(item);
}
else if (match.Groups[3].Success)
{
myFunction3(item);
}
}
This way all three matches would be done with a single pass through the target string.
You could reduce some of the duplicated code in the if statements by creating a Dictionary that maps the strings to their respective functions. (This snippet assumes that myList contains string values, but can easily be adapted to a list of any type.)
Dictionary<string, Action<string>> actions = new Dictionary<string, Action<string>>
{
["string1"] = myFunction1,
["string2"] = myFunction2,
["string3"] = myFunction3
};
foreach (var item in myList)
{
foreach (var action in actions)
{
if (item.Contains(action.Key))
{
action.Value(item);
break;
}
}
}
For a list of only three items, this might not be much of an improvement, but if you have a large list of strings/functions to search for it can make your code much shorter. It also means that adding a new string/function pair is a one-line change. The biggest downside is that the foreach loop is a bit more difficult to read.
Consider you have two lists in C#, first list contains elements of TypeOne and second list contains elements of TypeTwo:
class TypeOne
{
public int foo;
public int bar;
}
class TypeTwo
{
public int baz;
public int qux;
}
Now I need to find the elements (with some property value) in the first list that don't exist in the second list, and similarly I want to find the elements in the second list that don't exist in the first list. (There are only zero or one occurrences in either list.)
What I tried so far is to iterate both lists like this:
foreach (var item in firstList)
{
if (!secondList.Any(a => a.baz == item.foo))
{
// Item is in the first list but not in second list.
}
}
and again:
foreach (var item in secondList)
{
if (!firstList.Any(a => a.foo == item.baz))
{
// Item is in the second list but not in first list.
}
}
I hardly think this is a good way to do what I want. I'm iterating my lists two times and use Any in each of them which also iterates the list. So too many iterations.
What is the most efficient way to achieve this?
I am afraid there is no prebuilt solution for this, so the best we can do is optimize as much as possible. We only have to iterate the first list, because everything in the second list will already have been compared by then.
// First we need copies to operate on
var firstCopy = new List<TypeOne>(firstList);
var secondCopy = new List<TypeTwo>(secondList);
// Now we iterate the first list once complete
foreach (var typeOne in firstList)
{
var match = secondCopy.FirstOrDefault(s => s.baz == typeOne.foo);
if (match == null)
{
// Item in first but not in second
}
else
{
// Match is duplicate and shall be removed from both
firstCopy.Remove(typeOne);
secondCopy.Remove(match);
}
}
After running this both copies will only contain the values which are unique in this instance. This not only reduces it to half the number of iterations but also constantly improves because the second copy shrinks with each match.
Use this LINQ Query.
var result1 = secondList.Where(p2 => !firstList.Any(p1 => p1.foo == p2.baz));
var result2 = firstList.Where(p1 => !secondList.Any(p2 => p2.baz == p1.foo));
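If the lists are large, the nested Any calls above still compare every element against every other element. One possible alternative, not taken from the answers above, is to put the keys of each list into a HashSet<int> first, so that each membership check is a constant-time lookup on average. The sketch below assumes foo and baz are the values being compared and redeclares TypeOne and TypeTwo so it is self-contained; ListDiffSketch, OnlyInFirst and OnlyInSecond are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Linq;

public class TypeOne { public int foo; public int bar; }
public class TypeTwo { public int baz; public int qux; }

public static class ListDiffSketch
{
    // Items of firstList whose foo does not appear as a baz in secondList.
    public static List<TypeOne> OnlyInFirst(List<TypeOne> firstList, List<TypeTwo> secondList)
    {
        // Build the lookup set once: one pass over secondList.
        var secondKeys = new HashSet<int>(secondList.Select(t => t.baz));
        // One pass over firstList, with a hash lookup per item.
        return firstList.Where(t => !secondKeys.Contains(t.foo)).ToList();
    }

    // Items of secondList whose baz does not appear as a foo in firstList.
    public static List<TypeTwo> OnlyInSecond(List<TypeOne> firstList, List<TypeTwo> secondList)
    {
        var firstKeys = new HashSet<int>(firstList.Select(t => t.foo));
        return secondList.Where(t => !firstKeys.Contains(t.baz)).ToList();
    }
}
```

Each list is still walked twice in total (once to build a set, once to filter), but the work grows roughly with the sum of the list sizes rather than their product.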
For some strange reason, a DataGridViewSelectedRowCollection is populated in the reverse of the order displayed in the DataGridView. But what is more puzzling is why there isn't a straightforward way of reversing the order to use in a foreach loop.
I would like to be able to use syntax as simple as this:
foreach (DataGridViewRow r in dataGridView1.SelectedRows.Reverse())
...but of course, that is not supported.*
So, currently I am using this monstrosity:
//reverse the default selection order:
IEnumerable<DataGridViewRow> properlyOrderedSelectedRows
= dataGridView1.SelectedRows.Cast<DataGridViewRow>().ToArray().Reverse();
foreach (DataGridViewRow r in properlyOrderedSelectedRows )
{
MessageBox.Show( r.Cells["ID"].Value.ToString());
}
...which is terribly ugly and convoluted. (I realize I could use a reverse For loop, but I prefer the foreach for its readability.)
What am I missing here? Is there a simpler approach?
*Actually, I would have expected this version to work, according to the discussion here, since DataGridViewSelectedRowCollection implements IEnumerable; but it doesn't compile.
I think you just need to cast in your foreach loop, like this:
foreach (DataGridViewRow row in dataGridView1.SelectedRows.Cast<DataGridViewRow>().Reverse()) {
}
However, this isn't as efficient, even though it appears to be less code, because it basically has to go through the enumerator forwards, putting everything on a stack, and then pop everything back out in reverse order.
If you have a directly-indexable collection you should definitely use a for loop instead and enumerate over the collection in reverse order.
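As a rough illustration of that advice, here is a minimal sketch of a reverse for loop over any indexable collection. It takes a plain IList<T> rather than a DataGridViewSelectedRowCollection so that it stays self-contained; ReverseLoopSketch and ForEachReversed are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class ReverseLoopSketch
{
    // Visits the items of an indexable collection from last to first,
    // without copying them into an intermediate buffer.
    public static void ForEachReversed<T>(IList<T> items, Action<T> visit)
    {
        for (int i = items.Count - 1; i >= 0; i--)
        {
            visit(items[i]);
        }
    }
}
```

Because the loop reads each element directly by index, nothing has to be enumerated forwards and buffered before the first item is visited.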
As mentioned in "Possible to iterate backwards through a foreach?":
You could add the rows to a stack...
Stack<DataGridViewRow> properlyOrderedSelectedRows = new Stack<DataGridViewRow>(dataGridView1.SelectedRows.Cast<DataGridViewRow>());
foreach (DataGridViewRow r in properlyOrderedSelectedRows )
{
MessageBox.Show( r.Cells["ID"].Value.ToString());
}
Stack<T> has a constructor that accepts IEnumerable<T>, so the non-generic SelectedRows collection has to go through Cast<DataGridViewRow>() first.
But what is more puzzling is why there isn't a straightforward way of reversing the order to use in a foreach loop.
I would like to be able to use syntax as simple as this:
...
...but of course, that is not supported.*
How about making it work yourself instead of all these pseudo witty statements, "monstrosities" and highly inefficient LINQ-es. All you need is to write a one liner function in some common place.
public static IEnumerable<DataGridViewRow> GetSelectedRows(this DataGridView source)
{
for (int i = source.SelectedRows.Count - 1; i >= 0; i--)
yield return source.SelectedRows[i];
}
Current Code:
For each element in the MapEntryTable, I check the properties IsDisplayedColumn and IsReturnColumn, and if they are true I add the element to another set of lists. The running time is O(n). Many elements have both properties set to false, so they are not added to any of the lists in the loop.
foreach (var mapEntry in MapEntryTable)
{
if (mapEntry.IsDisplayedColumn)
Type1.DisplayColumnId.Add(mapEntry.OutputColumnId);
if (mapEntry.IsReturnColumn)
Type1.ReturnColumnId.Add(mapEntry.OutputColumnId);
}
Following is the Linq version of doing the same:
MapEntryTable.Where(x => x.IsDisplayedColumn == true).ToList().ForEach(mapEntry => Type1.DisplayColumnId.Add(mapEntry.OutputColumnId));
MapEntryTable.Where(x => x.IsReturnColumn == true).ToList().ForEach(mapEntry => Type1.ReturnColumnId.Add(mapEntry.OutputColumnId));
I am converting all such foreach code to linq, as I am learning it, but my question is:
Do I get any advantage from the Linq conversion in this case, or is it a disadvantage?
Is there a better way to do the same using Linq?
UPDATE:
Consider the case where, out of 1000 elements in the list, 80% have both properties false. Does Where then give me the benefit of quickly finding the elements that satisfy a given condition?
Type1 is a custom type with set of List<int> structures, DisplayColumnId and ReturnColumnId
ForEach isn't a LINQ method. It's a method of List. And not only is it not a part of LINQ, it's very much against the very values and patterns of LINQ. Eric Lippert explains this in a blog post that was written when he was a principal developer on the C# compiler team.
Your "LINQ" approach also:
Completely unnecessarily copies all of the items to be added into a list, which is both wasteful in time and memory and also conflicts with LINQ's goals of deferred execution when executing queries.
Isn't actually a query with the exception of the Where operator. You're acting on the items in the query, rather than performing a query. LINQ is a querying tool, not a tool for manipulating data sets.
Iterates the source sequence twice. This may or may not be a problem, depending on what the source sequence actually is and what the costs of iterating it are.
A solution that uses LINQ only for what it is designed for would look like this:
foreach (var mapEntry in MapEntryTable.Where(entry => entry.IsDisplayedColumn))
list1.DisplayColumnId.Add(mapEntry.OutputColumnId);
foreach (var mapEntry in MapEntryTable.Where(entry => entry.IsReturnColumn))
list2.ReturnColumnId.Add(mapEntry.OutputColumnId);
I would say stick with the original foreach loop, since it iterates through the list only once.
Also, your LINQ should look more like this:
list1.DisplayColumnId.AddRange(MapEntryTable.Where(x => x.IsDisplayedColumn).Select(mapEntry => mapEntry.OutputColumnId));
list2.ReturnColumnId.AddRange(MapEntryTable.Where(x => x.IsReturnColumn).Select(mapEntry => mapEntry.OutputColumnId));
The performance of foreach and Linq ForEach is almost exactly the same, within nanoseconds of each other, assuming you have the same internal logic in the loop in both versions when testing.
However, a for loop outperforms both by a LARGE margin. for (int i = 0; i < count; ++i) is much faster than both, because a for loop doesn't rely on an IEnumerable implementation (overhead). The for loop compiles to x86 register index/jump code. It maintains an incrementor, and then it's up to you to retrieve the item by its index in the loop.
Using a Linq ForEach loop does have a big disadvantage, though: you cannot break out of the loop. If you need to do that, you have to maintain a boolean like "breakLoop = false", set it to true, and have each remaining call return immediately if breakLoop is true, which performs badly. Secondly, you cannot use continue; instead you use "return".
I never use Linq's foreach loop.
If you are dealing with linq, e.g.
List<Thing> things = .....;
var oldThings = things.Where(p => p.DateTime.Year < DateTime.Now.Year);
That internally will foreach with linq and give you back only the items with a year less than the current year. Cool..
But if I am doing this:
List<Thing> things = new List<Thing>();
foreach(XElement node in Results) {
things.Add(new Thing(node));
}
I don't need to use a linq for each loop. Even if I did...
foreach(var node in thingNodes.Where(p => p.NodeType == "Thing")) {
if (node.Ignore) {
continue;
}
things.Add(node);
}
even though I could write that cleaner like
foreach(var node in thingNodes.Where(p => p.NodeType == "Thing" && !p.Ignore)) {
things.Add(node);
}
There is no real reason I can think of to do this:
things.ForEach(thing => {
//do something
//can't break
//can't continue
return; //<- continue
});
And if I want the fastest loop possible,
for (int i = 0; i < things.Count; ++i) {
var thing = things[i];
//do something
}
Will be faster.
Your LINQ isn't quite right as you're converting the results of Where to a List and then pseudo-iterating over those results with ForEach to add to another list. Use ToList or AddRange for converting or adding sequences to lists.
For example, to overwrite list1 (if it were actually a List<T>):
list1 = MapEntryTable.Where(x => x.IsDisplayedColumn == true)
.Select(mapEntry => mapEntry.OutputColumnId).ToList();
or to append:
list1.AddRange(MapEntryTable.Where(x => x.IsDisplayedColumn == true)
.Select(mapEntry => mapEntry.OutputColumnId));
In C#, to do what you want functionally in one call, you have to write your own partition method. If you are open to using F#, you can use List.partition:
https://msdn.microsoft.com/en-us/library/ee353782.aspx
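For the C# side, a hand-written partition method could look roughly like the sketch below. It is only one possible shape: PartitionSketch, Partition and the tuple element names are hypothetical, and the tuple return type assumes C# 7 or later. Note that a partition puts each item into exactly one of the two groups, so it only matches the original loop if the two conditions are mutually exclusive.

```csharp
using System;
using System.Collections.Generic;

public static class PartitionSketch
{
    // Splits a sequence into the items that satisfy the predicate and the rest,
    // walking the source exactly once.
    public static (List<T> Matches, List<T> Rest) Partition<T>(
        this IEnumerable<T> source, Func<T, bool> predicate)
    {
        var matches = new List<T>();
        var rest = new List<T>();
        foreach (var item in source)
        {
            if (predicate(item))
                matches.Add(item);
            else
                rest.Add(item);
        }
        return (matches, rest);
    }
}
```

A call such as numbers.Partition(n => n % 2 == 0) would then return the even numbers in Matches and the odd ones in Rest, both in their original order.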
First, I know this isn't possible out of the box because of obvious reasons.
foreach(string item in myListOfStrings) {
myListOfStrings.Remove(item);
}
The snippet above is one of the most horrible things I've ever seen. So, how do you achieve it then? You could iterate through the list backwards using for, but I don't like this solution either.
What I'm wondering is: Is there a method/extension that returns an IEnumerable from the current list, something like a floating copy? LINQ has numerous extension methods that do exactly this, but you always have to do something with it, such as filtering (where, take...).
I'm looking forward to something like this:
foreach(string item in myListOfStrings.Shadow()) {
myListOfStrings.Remove(item);
}
where .Shadow() is:
public static IEnumerable<T> Shadow<T>(this IEnumerable<T> source) {
return new IEnumerable<T>(source);
// or return source.Copy()
// or return source.TakeAll();
}
Example
foreach(ResponseFlags flag in responseFlagsList.Shadow()) {
switch(flag) {
case ResponseFlags.Case1:
...
case ResponseFlags.Case2:
...
}
...
this.InvokeSomeVoidEvent(flag);
responseFlagsList.Remove(flag);
}
Solution
This is how I solved it, and it works like a charm:
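public static IEnumerable<T> Shadow<T>(this IEnumerable<T> source) {
// Snapshot the source first (ToList needs System.Linq); iterating it lazily
// would still throw as soon as the underlying list is modified.
foreach(T item in source.ToList())
yield return item;
}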
It's not that super fast (obviously), but it's safe and exactly what I intended to do.
Removing multiple elements from a list 1 by 1 is a C# anti-pattern due to how lists are implemented.
Of course, it can be done with a for loop (instead of foreach). Or it can be done by making a copy of the list. But here is why it should not be done. On a list of 100000 random integers, this takes 2500 ms on my machine:
foreach (var x in listA.ToList())
if (x % 2 == 0)
listA.Remove(x);
and this takes 1250 ms:
for (int i = 0; i < listA.Count; i++)
if (listA[i] % 2 == 0)
listA.RemoveAt(i--);
while these two take 5 and 2 ms respectively:
listB = listB.Where(x => x % 2 != 0).ToList();
listB.RemoveAll(x => x % 2 == 0);
This is because when you remove an element from a list, you are actually deleting from an array, and this is O(N) time, as you need to shift each element after the deleted element one position to the left. On average, this will be N/2 elements.
Remove(element) also needs to find the element before removing it. So Remove(element) will actually always take N steps - elementindex steps to find the element, N - elementindex steps to remove it - in total, N steps.
RemoveAt(index) doesn't have to find the element, but it still has to shift the underlying array, so on average, a RemoveAt is N/2 steps.
The end result is O(N^2) complexity either way, as you're removing up to N elements.
Instead, you should use LINQ to build a filtered copy or RemoveAll to modify the list in place, both of which process the entire list in O(N) time, or roll your own; but you should not use Remove (or RemoveAt) in a loop.
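To make the "roll your own" option concrete, here is a minimal sketch of a single-pass, in-place removal. It keeps a write index behind the read index, copies each surviving element forward at most once, and trims the tail at the end. CompactSketch and RemoveWhere are hypothetical names; in practice List<T>.RemoveAll already covers this case.

```csharp
using System;
using System.Collections.Generic;

public static class CompactSketch
{
    // Removes every element matching shouldRemove in one O(N) pass
    // and returns how many elements were removed.
    public static int RemoveWhere<T>(List<T> list, Predicate<T> shouldRemove)
    {
        int write = 0;
        for (int read = 0; read < list.Count; read++)
        {
            // Keep the element by moving it to the next free slot.
            if (!shouldRemove(list[read]))
                list[write++] = list[read];
        }
        // Everything from 'write' onwards is leftover; drop it in one call.
        int removed = list.Count - write;
        list.RemoveRange(write, removed);
        return removed;
    }
}
```

Each element is moved at most once, which is what avoids the repeated shifting that makes Remove and RemoveAt in a loop quadratic.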
Why not just do:
foreach(string item in myListOfStrings.ToList())
{
myListOfStrings.Remove(item);
}
This creates a copy of the original list to iterate over, while the items are removed from the original.
If you really need your extension method you could perhaps create something more readable to the user such as:
public static IEnumerable<T> Shadow<T>(this IEnumerable<T> items)
{
if (items == null)
throw new ArgumentNullException(nameof(items));
List<T> list = new List<T>();
foreach (var item in items)
{
list.Add(item);
}
return list;
}
Which is essentially the same as .ToList().
Calling:
foreach(string item in myListOfStrings.Shadow())
You do not need LINQ extension methods for this - you can create a new list explicitly, like this:
foreach(string item in new List<string>(myListOfStrings)) {
myListOfStrings.Remove(item);
}
You have to iterate over a copy of the original list, as below:
var myListOfStrings = new List<string>();
myListOfStrings.Add("1");
myListOfStrings.Add("2");
myListOfStrings.Add("3");
myListOfStrings.Add("4");
myListOfStrings.Add("5");
foreach (string item in myListOfStrings.ToList())
{
myListOfStrings.Remove(item);
}
Your example removes all items from the list, so it's equivalent to:
myListOfStrings.Clear();
It is also equivalent to:
myListOfStrings.RemoveAll(x => true); // Empties myListOfStrings
But what I think you're looking for is a way to remove items for which a predicate is true - which is what RemoveAll() does.
So you could write, for example:
myListOfStrings.RemoveAll(x => x == "TEST"); // Modifies myListOfStrings
Or use any other predicate.
However, that changes the ORIGINAL list; if you just want a copy of the list with certain items removed, you can just use normal Linq:
// Note != instead of == as used in RemoveAll(),
// because the logic here is reversed.
var filteredList = myListOfStrings.Where(x => x != "TEST").ToList();
Picking up on svinja's answer, I believe the most efficient way of solving this problem with a loop is:
for (int i = 0; i < listA.Count;) {
if (listA[i] % 2 == 0)
listA.RemoveAt(i);
else
i++;
}
It improves on the answer by removing unnecessary sums and subtractions.