LINQ lazy evaluation causing issues with array iterator - c#

I have a class that contains four EnumerableRowCollections, which all point to the same DataTable. The main one will need different combinations of the other three filtered out in different class instances. Since three of them are related, I put them in an array.
EnumerableRowCollection<DataRow> valid;
EnumerableRowCollection<DataRow>[] pending;
All of these collections are defined in the class constructor, but evaluated later due to LINQ's lazy evaluation.
I also have an array of Booleans, which are used to determine which "pending" collections are filtered out of the "valid" collection. These are also assigned in the constructor, and are never changed.
Boolean[] pendingIsValid;
The "valid" collection is filtered like this:
for (var i = 0; i < pending.Length; i++)
    if (pendingIsValid[i] && pending[i].Count() > 0)
        valid = valid.Where(r => !pending[i].Contains(r));
This also occurs in the constructor, but the Where clause is evaluated lazily, as expected.
This works most of the time. In a few cases, however, I got a strange exception when the collection was finally evaluated later on.
I get an IndexOutOfRangeException because the local loop variable, i, in my for loop above is set to 3.
Questions:
Can I make "Where" evaluate the array indexer (or other sub-expressions) non-lazily?
How does the iterator get incremented to 3 at all? Does this lazy evaluation count as "re-entering" the loop?
!?!?

Change it to this:
for (var i = 0; i < pending.Length; i++)
    if (pendingIsValid[i] && pending[i].Count() > 0)
    {
        var j = i;
        valid = valid.Where(r => !pending[j].Contains(r));
    }
For question #1 - you can make it not lazy by adding .ToList() at the end. However, with the above fix, you can keep it lazy.
Have a read of this: Captured variable in a loop in C# for the explanation

Excellent, Rob. I also figured this out while I was waiting for a response, but yours looks a bit cleaner.
for (var i = 0; i < pending.Length; i++) {
    var p = pending[i];
    if (pendingIsValid[i] && p.Count() > 0)
        valid = valid.Where(r => !p.Contains(r));
}

Related

IEnumerable IndexOutOfRangeException

I don't know why I'm getting System.IndexOutOfRangeException: 'Index was outside the bounds of the array.' with this code:
IEnumerable<char> query = "Text result";
string illegals = "abcet";
for (int i = 0; i < illegals.Length; i++)
{
    query = query.Where(c => c != illegals[i]);
}
foreach (var item in query)
{
    Console.Write(item);
}
Please can someone explain what's wrong with my code.
The problem is that your lambda expression is capturing the variable i, but the delegate isn't being executed until after the loop. By the time the expression c != illegals[i] is executed, i is illegals.Length, because that's the final value of i. It's important to understand that lambda expressions capture variables, rather than "the values of those variables at the point of the lambda expression being converted into a delegate".
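To see the capture behaviour in isolation, here is a minimal sketch. The class and method names (CaptureDemo, Run) are made up for this illustration and are not part of the question's code. It builds three delegates in a for loop and only invokes them after the loop has finished:

```csharp
using System;
using System.Collections.Generic;

public static class CaptureDemo
{
    // Builds three delegates in a for loop and runs them after the loop ends.
    // With shared == true every lambda captures the single loop variable i;
    // otherwise each lambda captures its own per-iteration copy.
    public static List<int> Run(bool shared)
    {
        var funcs = new List<Func<int>>();
        for (int i = 0; i < 3; i++)
        {
            if (shared)
            {
                funcs.Add(() => i);      // captures the variable i itself
            }
            else
            {
                int copy = i;            // a fresh variable on every iteration
                funcs.Add(() => copy);
            }
        }

        // The delegates execute only now, when the loop is over and i == 3.
        var results = new List<int>();
        foreach (var f in funcs)
            results.Add(f());
        return results;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(",", Run(true)));   // 3,3,3
        Console.WriteLine(string.Join(",", Run(false)));  // 0,1,2
    }
}
```

In the question's code the captured value is used as an index, so the "3,3,3" case becomes an index equal to illegals.Length and the indexer throws.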
Here are five ways of fixing your code:
Option 1: local copy of i
Copy the value of i into a local variable within the loop, so that each iteration of the loop captures a new variable in the lambda expression. That new variable isn't changed by the rest of the execution of the loop.
for (int i = 0; i < illegals.Length; i++)
{
    int copy = i;
    query = query.Where(c => c != illegals[copy]);
}
Option 2: extract illegals[i] outside the lambda expression
Extract the value of illegals[i] in the loop (outside the lambda expression) and use that value in the lambda expression. Again, the changing value of i doesn't affect the variable.
for (int i = 0; i < illegals.Length; i++)
{
    char illegal = illegals[i];
    query = query.Where(c => c != illegal);
}
Option 3: use a foreach loop
This option only works properly with C# 5 and later compilers, as the meaning of foreach changed (for the better) in C# 5.
foreach (char illegal in illegals)
{
    query = query.Where(c => c != illegal);
}
Option 4: use Except once
LINQ provides a method to perform set exclusion: Except. This is not quite the same as the earlier options though, as you'll only get a single copy of any particular character in your output. So if e wasn't in illegals, you'd get a result of "Tex resul" with the above options, but "Tex rsul" using Except. Still, it's worth knowing about:
// Replace the loop entirely with this
query = query.Except(illegals);
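The difference between filtering and set exclusion can be checked with a small sketch. The helper names (ExceptDemo, WithWhere, WithExcept) are invented for this example, and the illegal set "abct" is the hypothetical variant without e mentioned above:

```csharp
using System;
using System.Linq;

public static class ExceptDemo
{
    // Filters characters one by one, so duplicates in the input survive.
    public static string WithWhere(string text, string illegals)
    {
        return new string(text.Where(c => !illegals.Contains(c)).ToArray());
    }

    // Set exclusion: the result also contains each remaining character only once.
    public static string WithExcept(string text, string illegals)
    {
        return new string(text.Except(illegals).ToArray());
    }

    public static void Main()
    {
        Console.WriteLine(WithWhere("Text result", "abct"));   // Tex resul
        Console.WriteLine(WithExcept("Text result", "abct"));  // Tex rsul
    }
}
```

The second e is dropped by Except even though e is not in the illegal set, which is why Except is only a drop-in replacement when duplicates do not matter.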
Option 5: Use Contains once
You can call Where once, with a lambda expression that calls Contains:
// Replace the loop entirely with this
query = query.Where(c => !illegals.Contains(c));
This happens because, although your for loop seems at first glance to be correctly bounded, each iteration captures the index in the closure that is passed to Where. One of the most useful properties of closures is that they capture by reference, enabling all sorts of powerful and sophisticated techniques. However, in this case it means that, by the time the query is executed in the ensuing foreach loop, the index has already been incremented to illegals.Length, one past the last valid position.
The most straightforward fix is to create a loop-scoped copy of the current value of the loop control variable and refer to that copy in your closure instead of referring directly to the loop control variable.
Ex:
for (int i = 0; i < illegals.Length; i++)
{
    var index = i;
    query = query.Where(c => c != illegals[index]);
}
However, as has been noted by others, there are better ways to write this that avoid the problem entirely, and they also have the virtue of raising the level of abstraction.
For example, you can use System.Linq.Enumerable.Except
var legals = query.Except(illegals);

Task Parallel.ForEach loop Error when removing items "Index was outside the bounds of the array. "

I am trying to remove items from a generic list of objects in a foreach loop. When I do the same thing with a Task Parallel Library loop, I get this error:
Index was outside the bounds of the array.
The following is my code:
List<string> lstSubscriberDidTransaction = ...; // Initialization
var lstSubscriber = JsonConvert.DeserializeObject<List<SubscriberInfoShortenObject>>(somestring);
foreach (string strId in lstSubscriberDidTransaction)
{
    lstSubscriber.RemoveAll(h => h != null && h.Msisdn == strId);
}
//Parallel.ForEach(lstSubscriberDidTransaction, msisdn => lstSubscriber.RemoveAll(h => h != null && h.Msisdn == msisdn));
Can somebody help me with this?
I am using .NET 3.5, with the Task Parallel Library from http://nuget.org/packages/TaskParallelLibrary
The List class is not designed for concurrent write (/remove) operations, as stated in the MSDN:
It is safe to perform multiple read operations on a List, but
issues can occur if the collection is modified while it’s being read.
To ensure thread safety, lock the collection during a read or write
operation. To enable a collection to be accessed by multiple threads
for reading and writing, you must implement your own synchronization.
For collections with built-in synchronization, see the classes in the
System.Collections.Concurrent namespace. For an inherently thread–safe
alternative, see the ImmutableList class.
For data structures supporting concurrent access, see this linked article.
To clarify why your problem arises from the List class:
The RemoveAll operation will iterate over the list instance and match the predicate against every contained instance. If the predicate evaluates to true, the index of the matched instance will be used to remove the entry. If the operation is performed in a concurrent manner, another thread may have already removed another entry, so the index is no longer valid or points to another instance that does not match the predicate. The operation is therefore not thread-safe and will not give the results you are expecting.
Just for your viewing pleasure, the given code is the decompiled method from the List class:
public int RemoveAll(Predicate<T> match)
{
    if (match == null)
        ThrowHelper.ThrowArgumentNullException(ExceptionArgument.match);
    int index1 = 0;
    while (index1 < this._size && !match(this._items[index1]))
        ++index1;
    if (index1 >= this._size)
        return 0;
    int index2 = index1 + 1;
    while (index2 < this._size)
    {
        while (index2 < this._size && match(this._items[index2]))
            ++index2;
        if (index2 < this._size)
            this._items[index1++] = this._items[index2++];
    }
    Array.Clear((Array) this._items, index1, this._size - index1);
    int num = this._size - index1;
    this._size = index1;
    ++this._version;
    return num;
}
To give you some more hints:
Do not use parallel code, as it will not help you without big changes. Optimize your look up data structure and simplify your statement.
HashSet<string> lstSubscriberDidTransaction = ...
...
lstSubscriber.RemoveAll(h => h != null && lstSubscriberDidTransaction.Contains(h.Msisdn));
This should improve performance. For any more help, we would need more insight into your code.
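As a rough, self-contained sketch of that single-pass idea: the Subscriber class below is a hypothetical stand-in for SubscriberInfoShortenObject, and RemoveTransacted is a name invented for this example. Everything runs on one thread, so no locking is needed:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for SubscriberInfoShortenObject from the question.
public class Subscriber
{
    public string Msisdn;
}

public static class RemoveDemo
{
    // One pass over the list; each membership test against the HashSet is
    // a hash lookup instead of a scan over all transaction ids.
    public static int RemoveTransacted(List<Subscriber> subscribers, IEnumerable<string> transactedIds)
    {
        var ids = new HashSet<string>(transactedIds);
        return subscribers.RemoveAll(s => s != null && ids.Contains(s.Msisdn));
    }

    public static void Main()
    {
        var subscribers = new List<Subscriber>
        {
            new Subscriber { Msisdn = "100" },
            new Subscriber { Msisdn = "200" },
            null,
            new Subscriber { Msisdn = "300" }
        };
        int removed = RemoveTransacted(subscribers, new[] { "100", "300" });
        Console.WriteLine(removed + " removed, " + subscribers.Count + " left");  // 2 removed, 2 left
    }
}
```

The null entry is kept, matching the h != null check in the original predicate.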

count objects that meet certain condition in List-collection

I want to count the occurrences of objects within a List<T> that match a certain condition.
For example like this
int List<T>.Count(Predicate<T> match)
So for example, if I have a list of chores, I can see how many are overdue.
int overdue = listOfChores.Count((element) => { return element.DueDate <= DateTime.Today; });
I know that does not exist and so far I solve problems like that in the following way:
int overdue = listOfChores.FindAll([...]).Count;
However that allocates and initializes a new List etc. only to get the count.
A way to do this with less allocation overhead etc.:
int good = 0;
foreach(chore element in listOfChores)
    if(element.DueDate <= DateTime.Today)
        good++;
The last approach can also be extended to count several conditions without iterating over the list more than once. (I already found that getting the Count property only takes O(1), but building the List to count from still eats a lot of time.)
int a = 0;
int b = 0;
foreach(chore element in listOfChores)
{
    if(element.CondA)
        a++;
    if(element.CondB)
        b++;
}
Given this I could even imagine something like
int[] List<T>.Count(Predicate<T>[] matches)
My question(s):
Is there such a thing that I just haven't found yet?
If not: What would be a way to implement such functionality?
EDIT :
Adding LINQ looks like it fixes it.
You just have your syntax slightly off. This is how to use Count :
int overdue = listOfChores.Count(element => element.DueDate <= DateTime.Today);
If you already have a Predicate<T> and want to pass it to Count just call it like a function:
Predicate<Chore> p = (element) => element.DueDate <= DateTime.Today;
int overdue = listOfChores.Count(element => p(element));
There is a Count method that takes a predicate: see Enumerable.Count Method (IEnumerable<TSource>, Func<TSource, Boolean>)
Note that this is an extension method, so you can use it only if you add a using directive for the System.Linq namespace.
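For the second part of the question (several counts in a single pass), I am not aware of a built-in method, so here is one possible sketch. CountEach is a made-up name, not a framework API, and it takes Func<T, bool> rather than Predicate<T> to match LINQ's conventions:

```csharp
using System;
using System.Collections.Generic;

public static class CountingExtensions
{
    // Hypothetical helper: evaluates every predicate against every element
    // in one pass and returns one count per predicate, in the same order.
    public static int[] CountEach<T>(this IEnumerable<T> source, params Func<T, bool>[] predicates)
    {
        var counts = new int[predicates.Length];
        foreach (T item in source)
        {
            for (int i = 0; i < predicates.Length; i++)
            {
                if (predicates[i](item))
                    counts[i]++;
            }
        }
        return counts;
    }
}

public static class CountEachDemo
{
    public static void Main()
    {
        var numbers = new List<int> { 1, 2, 3, 4, 5, 6 };
        // counts[0]: even numbers, counts[1]: numbers greater than 4
        int[] counts = numbers.CountEach(n => n % 2 == 0, n => n > 4);
        Console.WriteLine(counts[0] + " " + counts[1]);  // 3 2
    }
}
```

No intermediate list is allocated; the only allocation is the int array that holds the results.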

Simple List<string> vs IEnumerable<string> Performance issues

I've tested iterating over a List<string> versus an IEnumerable<string> with for and foreach loops. Is it possible that the List is much faster?
These are two of the few links I could find that publicly state that iterating an IEnumerable performs better than iterating a List:
Link1
Link2
My test loaded 10K lines from a text file that holds a list of URLs.
I first loaded it into a List, then assigned the List to an IEnumerable:
List<string> StrByLst = ...method to load records from the file .
IEnumerable<string> StrsByIE = StrByLst;
So each has 10K items of type string.
Looping over each collection 100 times, meaning 1M iterations in total, resulted in
List<string> being faster than IEnumerable<string> by an amazing factor of 50.
Is that to be expected?
update
this is the code that is doing the tests
string WorkDirtPath = HostingEnvironment.ApplicationPhysicalPath;
string fileName = "tst.txt";
string fileToLoad = Path.Combine(WorkDirtPath, fileName);
List<string> ListfromStream = new List<string>();
ListfromStream = PopulateListStrwithAnyFile(fileToLoad) ;
IEnumerable<string> IEnumFromStream = ListfromStream ;
string trslt = "";
Stopwatch SwFr = new Stopwatch();
Stopwatch SwFe = new Stopwatch();
string resultFrLst = "",resultFrIEnumrable, resultFe = "", Container = "";
SwFr.Start();
for (int itr = 0; itr < 100; itr++)
{
    for (int i = 0; i < ListfromStream.Count(); i++)
    {
        Container = ListfromStream.ElementAt(i);
    }
    //the stop() was here , i was doing changes , so my mistake.
}
SwFr.Stop();
resultFrLst = SwFr.Elapsed.ToString();
//forgot to do this reset though still it is faster (x56??)
SwFr.Reset();
SwFr.Start();
for(int itr = 0; itr<100; itr++)
{
    for (int i = 0; i < IEnumFromStream.Count(); i++)
    {
        Container = IEnumFromStream.ElementAt(i);
    }
}
SwFr.Stop();
resultFrIEnumrable = SwFr.Elapsed.ToString();
Update ... final
I moved the count outside of the for loops:
int counter = ..countfor both IEnumerable & List
I then used counter (an int) as the total item count, as suggested by @ScottChamberlain.
After rechecking that everything was in place, the IEnumerable now comes out about 5% faster.
So the conclusion is: choose by use case. There is effectively no performance difference.
You are doing something wrong.
The times that you get should be very close to each other, because you are running essentially the same code.
IEnumerable is just an interface, which List implements, so when you call some method on the IEnumerable reference it ends up calling the corresponding method of List.
There is no code implemented in the IEnumerable - this is what interfaces are - they only specify what functionality a class should have, but say nothing about how it's implemented.
You have a few problems with your test. One is the IEnumFromStream.Count() call inside the for loop: Count() here is the LINQ extension method, and it is called again on every pass because the result is not cached between iterations. When the underlying object implements ICollection<T>, as List<T> does, LINQ returns its Count property directly, but for any other sequence it has to enumerate every element just to count them. Move that call outside of the for loop, save the result in an int, and use that value in the loop condition.
IEnumFromStream.ElementAt(i) behaves similarly: when the source implements IList<T> it uses the indexer, but otherwise it must iterate from the start up to i (first time it visits 0, second time 0,1, third 0,1,2, and so on), whereas a List accessed through its indexer jumps directly to the index it needs. You should be working with the IEnumerator returned from GetEnumerator() instead.
IEnumerables and for loops don't mix well. Use the correct tool for the job: either call GetEnumerator() and work with that, or use a foreach loop.
Now, I know a lot of you may be saying "But it is an interface, it will just be mapping the calls and it should make no difference", but there is a key thing: IEnumerable<T> does not have a Count() or ElementAt() method! Those methods are extension methods added by LINQ. They only see an IEnumerable<T>, so beyond the runtime type checks for ICollection<T> and IList<T> described above, all they can do is iterate over the sequence every time they are called.
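To make the cost visible for a sequence that is not backed by a collection, here is a small sketch. EnumerationCostDemo, Lazy and the ItemsProduced counter are names invented for this illustration. It counts how many elements an iterator has to produce for a single Count() or ElementAt() call:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class EnumerationCostDemo
{
    public static int ItemsProduced;

    // Iterator method: the result is a plain IEnumerable<int>, not a collection,
    // so LINQ has no shortcut and must walk the sequence.
    public static IEnumerable<int> Lazy(int n)
    {
        for (int i = 0; i < n; i++)
        {
            ItemsProduced++;
            yield return i;
        }
    }

    // Number of elements produced by one Count() call.
    public static int CostOfCount(int n)
    {
        ItemsProduced = 0;
        Lazy(n).Count();
        return ItemsProduced;
    }

    // Number of elements produced by one ElementAt(index) call.
    public static int CostOfElementAt(int n, int index)
    {
        ItemsProduced = 0;
        Lazy(n).ElementAt(index);
        return ItemsProduced;
    }

    public static void Main()
    {
        Console.WriteLine(CostOfCount(5));         // 5
        Console.WriteLine(CostOfElementAt(5, 2));  // 3
    }
}
```

Calling either method inside a for loop over such a sequence therefore turns a linear scan into a quadratic one, which is the reason to prefer foreach.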
IEnumerable using IEnumerator
using(var enu = IEnumFromStream.GetEnumerator())
{
    //You have to call "MoveNext()" once before getting "Current" the first time,
    // this is done so you can have a nice clean while loop like this.
    while(enu.MoveNext())
    {
        Container = enu.Current;
    }
}
The above code is basically the same thing as
foreach(var enu in IEnumFromStream)
{
Container = enu;
}
The important thing to remember is that IEnumerables do not have a length; in fact, they can be infinitely long. Deciding in general whether an arbitrary sequence ever ends is an instance of the halting problem.
Based on the code you posted I think the problem is with your use of the Stopwatch class.
You declare two of these, SwFr and SwFe, but only use the former. Because of this, the last call to SwFr.Elapsed will get the total amount of time across both sets of for loops.
If you are wanting to reuse that object in this way, place a call to SwFr.Reset() right after resultFrLst = SwFr.Elapsed.ToString();.
Alternatively, you could use SwFe when running the second test.
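The accumulation effect is easy to reproduce. The following is only a sketch: StopwatchDemo is a made-up name and Thread.Sleep stands in for the two test loops. Stopwatch.Restart() (available from .NET 4.0) combines Reset() and Start():

```csharp
using System;
using System.Diagnostics;
using System.Threading;

public static class StopwatchDemo
{
    // Returns { first reading, second reading without a reset, reading after Restart }.
    public static TimeSpan[] Run()
    {
        var sw = new Stopwatch();

        sw.Start();
        Thread.Sleep(20);                  // stand-in for the first test
        sw.Stop();
        TimeSpan first = sw.Elapsed;

        sw.Start();                        // no Reset: timing continues from the old total
        Thread.Sleep(20);                  // stand-in for the second test
        sw.Stop();
        TimeSpan accumulated = sw.Elapsed; // covers both tests

        sw.Restart();                      // zeroes the elapsed time, then starts again
        Thread.Sleep(20);
        sw.Stop();
        TimeSpan secondOnly = sw.Elapsed;  // covers only the last block

        return new[] { first, accumulated, secondOnly };
    }

    public static void Main()
    {
        foreach (TimeSpan t in Run())
            Console.WriteLine(t.TotalMilliseconds);
    }
}
```

The exact numbers depend on the machine, but the second reading always includes the first one unless the stopwatch is reset in between.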

Is the condition in a for loop evaluated each iteration?

When you do stuff like:
for (int i = 0; i < collection.Count; ++i )
is collection.Count called on every iteration?
Would the result change if the Count property dynamically gets the count on call?
Yes, Count will be evaluated on every single pass. The reason is that it's possible for the collection to be modified during the execution of the loop. Given the loop structure, the variable i should represent a valid index into the collection during an iteration. If the check were not done on every iteration, this would not be provably true. Example case:
for ( int i = 0; i < collection.Count; i++ ) {
collection.Clear();
}
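One way to observe this directly is a property that counts how often it is read. CountingCollection and ConditionDemo are names invented for this sketch:

```csharp
using System;

// Hypothetical collection-like class whose Count getter records every call.
public class CountingCollection
{
    private readonly int _size;
    private int _countCalls;

    public CountingCollection(int size) { _size = size; }

    public int CountCalls { get { return _countCalls; } }

    public int Count
    {
        get { _countCalls++; return _size; }
    }
}

public static class ConditionDemo
{
    // Runs an empty for loop over a collection of the given size and
    // returns how many times the loop condition read Count.
    public static int Run(int size)
    {
        var collection = new CountingCollection(size);
        for (int i = 0; i < collection.Count; i++)
        {
            // empty body: only the condition is of interest here
        }
        return collection.CountCalls;
    }

    public static void Main()
    {
        Console.WriteLine(Run(3));  // 4
    }
}
```

For three elements the getter runs four times: three checks that succeed plus the final check that fails and ends the loop.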
The one exception to this rule is looping over an array where the constraint is the Length.
for ( int i = 0; i < someArray.Length; i++ ) {
// Code
}
The CLR JIT will special case this type of loop, in certain circumstances, since the length of an array can't change. In those cases, bounds checking will only occur once.
Reference: http://blogs.msdn.com/brada/archive/2005/04/23/411321.aspx
Count would be evaluated on every pass. If you continued to add to the collection and the iterator never caught up, you would have an endless loop.
class Program
{
    static void Main(string[] args)
    {
        List<int> intCollection = new List<int>();
        for(int i=-1;i < intCollection.Count;i++)
        {
            intCollection.Add(i + 1);
        }
    }
}
This will eventually throw an out of memory exception.
Yes, Count is checked on every pass, from the first iteration after the initialization of i to the last iteration, where the check fails and the for loop is exited. You can modify the collection's count if you want, but realize you could end up in an endless loop.
Like the other answers here: Yes, in principle.
There is (at least) one notable exception, array.Length. In
for (int i = 0; i < a.Length; i++) a[i] = ...;
the Length property will only be evaluated once. This is an optimization that is hardwired into the compiler. There might be others like that (in the future), but only if it is guaranteed not to make a difference in observable behavior.
Side note: this is NOT checked on every iteration in VB.
Unlike C#, VB caches the result of collection.Count.
EDIT:
The literal VB version of the C# for loop is:
Dim i = 0
Do While i < collection.Count
    'code goes here
    i+=1
Loop
