Strange Delegate Reference Behaviour - c#

I have a need to pass the results of a source function (which returns an IEnumerable) through a list of other processing functions (that each take and return an IEnumerable).
All is fine up to that point, but I also need to allow the processing functions to perform multiple loops over their input enumerables.
So rather than pass in IEnumerable<T>, I thought I would change the input parameter to Func<IEnumerable<T>> and allow each of the functions to restart the enumerable if required.
Unfortunately, I'm now getting a stack overflow where the final processing function is calling itself rather than passing the request back down the chain.
The example code is a bit contrived but hopefully gives you an idea of what I'm trying to achieve.
class Program
{
public static void Main(string[] args)
{
Func<IEnumerable<String>> getResults = () => GetInputValues("A", 5);
List<String> valuesToAppend = new List<String>();
valuesToAppend.Add("B");
valuesToAppend.Add("C");
foreach (var item in valuesToAppend)
{
getResults = () => ProcessValues(() => getResults(),item);
}
foreach (var item in getResults())
{
Console.WriteLine(item);
}
}
public static IEnumerable<String> GetInputValues(String value, Int32 numValues)
{
for (int i = 0; i < numValues; i++)
{
yield return value;
}
}
public static IEnumerable<String> ProcessValues(Func<IEnumerable<String>> getInputValues, String appendValue)
{
foreach (var item in getInputValues())
{
yield return item + " " + appendValue;
}
}
}

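getResults is captured as a variable, not a value. By the time the outer lambda runs, getResults refers to that same lambda, so the inner () => getResults() calls it again without end. I don't really like the overall approach you're using here (it seems convoluted), but you should be able to fix the stack overflow by changing the capture: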
foreach (var item in valuesToAppend)
{
var tmp1 = getResults;
var tmp2 = item;
getResults = () => ProcessValues(() => tmp1(),tmp2);
}
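The difference between capturing a variable and capturing a copy of its value can be seen in isolation. This is only a minimal sketch, not part of the original code; the names CaptureDemo, CaptureShared and CaptureCopy are made up for illustration.

```csharp
using System;
using System.Collections.Generic;

static class CaptureDemo
{
    // Every lambda captures the same variable, so all of them see its final value.
    public static List<int> CaptureShared()
    {
        var funcs = new List<Func<int>>();
        int shared = 0;
        for (int i = 0; i < 3; i++)
        {
            shared = i;
            funcs.Add(() => shared);
        }
        return funcs.ConvertAll(f => f());
    }

    // Each iteration declares a fresh local, so every lambda captures its own copy.
    public static List<int> CaptureCopy()
    {
        var funcs = new List<Func<int>>();
        for (int i = 0; i < 3; i++)
        {
            int copy = i;
            funcs.Add(() => copy);
        }
        return funcs.ConvertAll(f => f());
    }

    static void Main()
    {
        Console.WriteLine(string.Join(",", CaptureShared())); // 2,2,2
        Console.WriteLine(string.Join(",", CaptureCopy()));   // 0,1,2
    }
}
```

The tmp1 copy in the fix above plays the same role as copy here: each lambda keeps the delegate that existed when it was created, instead of whatever getResults points to later.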
On a side note: IEnumerable[<T>] is already kind of repeatable, you simply call foreach another time. It is IEnumerator[<T>] that (despite Reset()) isn't. Still, I think it is worth trying to do this without ever needing to repeat the enumeration, since in the general case that simply cannot be guaranteed to work.
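To illustrate the repeatability point, here is a small sketch (the RepeatDemo class and its Starts counter are assumptions added for the demonstration). It enumerates one IEnumerable from an iterator method twice and counts how often the iterator body starts running.

```csharp
using System;
using System.Collections.Generic;

static class RepeatDemo
{
    // Counts how often the iterator body has started running.
    public static int Starts;

    public static IEnumerable<string> GetInputValues(string value, int numValues)
    {
        Starts++;
        for (int i = 0; i < numValues; i++)
            yield return value;
    }

    // Enumerates the same IEnumerable twice and returns the number of items seen.
    public static int EnumerateTwice()
    {
        Starts = 0;
        IEnumerable<string> values = GetInputValues("A", 5);
        int seen = 0;
        foreach (var v in values) seen++;
        foreach (var v in values) seen++;
        return seen;
    }

    static void Main()
    {
        Console.WriteLine(EnumerateTwice()); // 10
        Console.WriteLine(Starts);           // 2
    }
}
```

Each foreach asks for a new enumerator, so the body runs from the top again. That works here, but a source that reads from a stream or a socket may not be able to start over, which is why repeating is not guaranteed in general.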
Here's a simpler (IMO) implementation with the same result:
using System;
using System.Collections.Generic;
using System.Linq;
class Program {
public static void Main() {
IEnumerable<String> getResults = Enumerable.Repeat("A", 5);
List<String> valuesToAppend = new List<String> { "B", "C" };
foreach (var item in valuesToAppend) {
string tmp = item;
getResults = getResults.Select(s => s + " " + tmp);
}
foreach (var item in getResults) {
Console.WriteLine(item);
}
}
}

Related

IEnumerable performs differently on Array vs List

This question is more of a "is my understanding accurate", and if not, please help me get my head around it. I have this bit of code to explain my question:
class Example
{
public string MyString { get; set; }
}
var wtf = new[] { "string1", "string2"};
IEnumerable<Example> transformed = wtf.Select(s => new Example { MyString = s });
IEnumerable<Example> transformedList = wtf.Select(s => new Example { MyString = s }).ToList();
foreach (var i in transformed)
i.MyString = "somethingDifferent";
foreach (var i in transformedList)
i.MyString = "somethingDifferent";
foreach(var i in transformed)
Console.WriteLine(i.MyString);
foreach (var i in transformedList)
Console.WriteLine(i.MyString);
It outputs:
string1
string2
somethingDifferent
somethingDifferent
Both Select() methods at first glance return IEnumerable<Example>. However, the underlying types are WhereSelectArrayIterator<string, Example> and List<Example>.
This is where my sanity started to come into question. From my understanding the difference in output above is because of the way both underlying types implement the GetEnumerator() method.
Using this handy website, I was able to (I think) track down the bit of code that was causing the difference.
class WhereSelectArrayIterator<TSource, TResult> : Iterator<TResult>
{ }
Looking at that on line 169 points me to Iterator<TResult>, since that's where it appears GetEnumerator() is called.
Starting on line 90 I see:
public IEnumerator<TSource> GetEnumerator() {
if (threadId == Thread.CurrentThread.ManagedThreadId && state == 0) {
state = 1;
return this;
}
Iterator<TSource> duplicate = Clone();
duplicate.state = 1;
return duplicate;
}
What I gather from that is when you enumerate over it, you're actually enumerating over a cloned source (as written in the WhereSelectArrayIterator class' Clone() method).
This will satisfy my need to understand for now, but as a bonus, could someone help me figure out why "return this" isn't taken the first time I enumerate over the data? From what I can tell, state should be 0 on the first pass, unless perhaps there is magic happening under the hood that calls the same method from different threads.
Update
At this point I'm thinking my 'findings' were a bit misleading (damn Clone method taking me down the wrong rabbit hole) and it was indeed due to deferred execution. I mistakenly thought that even though I deferred execution, once it was enumerated the first time it would store those values in my variable. I should have known better; after all I was using the new keyword in the Select. That said, it still did open my eyes to the idea that a particular class' GetEnumerator() implementation could still return a clone which would present a very similar problem. It just so happened that my problem was different.
Update2
This is an example of what I thought my problem was. Thanks everyone for the information.
IEnumerable<Example> friendly = new FriendlyExamples();
IEnumerable<Example> notFriendly = new MeanExamples();
foreach (var example in friendly)
example.MyString = "somethingDifferent";
foreach (var example in notFriendly)
example.MyString = "somethingDifferent";
foreach (var example in friendly)
Console.WriteLine(example.MyString);
foreach (var example in notFriendly)
Console.WriteLine(example.MyString);
// somethingDifferent
// somethingDifferent
// string1
// string2
Supporting classes:
class Example
{
public string MyString { get; set; }
public Example(Example example)
{
MyString = example.MyString;
}
public Example(string s)
{
MyString = s;
}
}
class FriendlyExamples : IEnumerable<Example>
{
Example[] wtf = new[] { new Example("string1"), new Example("string2") };
public IEnumerator<Example> GetEnumerator()
{
return wtf.Cast<Example>().GetEnumerator();
}
IEnumerator IEnumerable.GetEnumerator()
{
return wtf.GetEnumerator();
}
}
class MeanExamples : IEnumerable<Example>
{
Example[] wtf = new[] { new Example("string1"), new Example("string2") };
public IEnumerator<Example> GetEnumerator()
{
return wtf.Select(e => new Example(e)).Cast<Example>().GetEnumerator();
}
IEnumerator IEnumerable.GetEnumerator()
{
return wtf.Select(e => new Example(e)).GetEnumerator();
}
}
LINQ works by making each function return another IEnumerable that is typically a deferred processor. No actual execution occurs until the finally returned IEnumerable is enumerated. This allows for the creation of efficient pipelines.
When you do
var transformed = wtf.Select(s => new Example { MyString = s });
The Select code has not actually executed yet. Only when you finally enumerate transformed will the Select be done, i.e. here:
foreach (var i in transformed)
i.MyString = "somethingDifferent";
Note that if you then run the same loop again
foreach (var i in transformed)
i.MyString = "somethingDifferent";
the pipeline will be executed again. Here that is not a big deal, but it can be huge if I/O is involved.
This line
var transformedList = wtf.Select(s => new Example { MyString = s }).ToList();
is the same as
var transformedList = transformed.ToList();
The real eye-opener is to place debug statements or breakpoints inside a Where or Select to actually see the deferred pipeline execute.
Reading the implementation of LINQ is useful. Here is Select: https://referencesource.microsoft.com/#System.Core/System/Linq/Enumerable.cs,5c652c53e80df013,references
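As a rough way to see the deferred pipeline without a debugger, the sketch below counts selector calls. DeferredDemo and its method names are assumptions for illustration; Example mirrors the class from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Example
{
    public string MyString { get; set; }
}

static class DeferredDemo
{
    // Returns the selector call count after building the query,
    // after the first enumeration, and after the second enumeration.
    public static int[] CountSelectorCalls()
    {
        int calls = 0;
        var wtf = new[] { "string1", "string2" };
        IEnumerable<Example> transformed = wtf.Select(s =>
        {
            calls++;
            return new Example { MyString = s };
        });
        int afterBuild = calls;
        foreach (var e in transformed) e.MyString = "somethingDifferent";
        int afterFirst = calls;
        foreach (var e in transformed) { }
        int afterSecond = calls;
        return new[] { afterBuild, afterFirst, afterSecond };
    }

    // Mutates every element, then reads the first element back.
    public static string FirstAfterMutation(bool materialize)
    {
        var wtf = new[] { "string1", "string2" };
        IEnumerable<Example> seq = wtf.Select(s => new Example { MyString = s });
        if (materialize) seq = seq.ToList();
        foreach (var e in seq) e.MyString = "somethingDifferent";
        return seq.First().MyString;
    }

    static void Main()
    {
        Console.WriteLine(string.Join(",", CountSelectorCalls())); // 0,2,4
        Console.WriteLine(FirstAfterMutation(false)); // string1
        Console.WriteLine(FirstAfterMutation(true));  // somethingDifferent
    }
}
```

Without ToList(), the mutation is applied to objects that are thrown away, because the next enumeration runs the selector again and creates fresh ones.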

Is iterating over a LINQ expression result the same as assigning it to a variable first?

This is easier to explain with code than in words, so here are some examples.
Let's suppose I already have a list of clients that I want to filter.
Basically i want to know if this:
foreach(var client in list.Where(c=>c.Age > 20))
{
//Do something
}
is the same as this:
var filteredClients = list.Where(c=>c.Age > 20);
foreach(var client in filteredClients)
{
//Do something
}
I've been told that the first approach executes the .Where() in every iteration.
I'm sorry if this is a duplicate, I couldn't find any related question.
Thanks in advance.
Yes, both those examples are functionally identical. One just stores the result from Enumerable.Where in a variable before accessing it while the other just accesses it directly.
To really see why this will not make a difference, you have to understand what a foreach loop essentially does. The code in your examples (both of them) is basically equivalent to this (I’ve assumed a known type Client here):
IEnumerable<Client> x = list.Where(c=>c.Age > 20);
// foreach loop
IEnumerator<Client> enumerator = x.GetEnumerator();
while (enumerator.MoveNext())
{
Client client = enumerator.Current;
// Do something
}
So what actually happens here is that the IEnumerable result from the LINQ method is not consumed directly; an enumerator of it is requested first. The foreach loop then does nothing more than repeatedly ask the enumerator for a new object and process the current element in each loop body.
Looking at this, it doesn’t make a difference whether the x in the above code is really an x (i.e. a previously stored variable), or whether it’s the list.Where() call itself. Only the enumerator object—which is created just once—is used in the loop.
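A quick way to check that the collection expression of a foreach is evaluated only once is to count the calls. This is a sketch under assumptions: ForeachDemo, Adults and the WhereCalls counter are invented here, and plain int ages stand in for the Client type.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class ForeachDemo
{
    // Counts how often the query expression is evaluated.
    public static int WhereCalls;

    // Stand-in for list.Where(c => c.Age > 20) written inline in the foreach header.
    static IEnumerable<int> Adults(List<int> ages)
    {
        WhereCalls++;
        return ages.Where(age => age > 20);
    }

    public static int CountAdults(List<int> ages)
    {
        WhereCalls = 0;
        int count = 0;
        foreach (int age in Adults(ages))
            count++;
        return count;
    }

    static void Main()
    {
        var ages = new List<int> { 10, 25, 30, 40 };
        Console.WriteLine(CountAdults(ages)); // 3
        Console.WriteLine(WhereCalls);        // 1
    }
}
```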
Now to cover that SharePoint example which Colin posted. It looks like this:
SPList activeList = SPContext.Current.List;
for (int i=0; i < activeList.Items.Count; i++)
{
SPListItem listItem = activeList.Items[i];
// do stuff
}
This is a fundamentally different thing though. Since this is not using a foreach loop, we do not get that one enumerator object which we use to iterate through the list. Instead, we repeatedly access activeList.Items: Once in the loop body to get an item by index, and once in the continuation condition of the for loop where we get the collection’s Count property value.
Unfortunately, Microsoft does not follow its own guidelines all the time, so even though Items is a property on the SPList object, it actually creates a new SPListItemCollection object every time. That object is empty by default and will only lazily load the actual items when you first access an item from it. So the above code will eventually create a large number of SPListItemCollections, each of which will fetch the items from the database. This behavior is also mentioned in the remarks section of the property documentation.
This generally violates Microsoft’s own guidelines on choosing a property vs a method:
Do use a method, rather than a property, in the following situations.
The operation returns a different result each time it is called, even if the parameters do not change.
Note that if we used a foreach loop for that SharePoint example again, then everything would have been fine, since we would have again only requested a single SPListItemCollection and created a single enumerator for it:
foreach (SPListItem listItem in activeList.Items.Cast<SPListItem>())
{ … }
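The effect can be imitated without SharePoint. In the sketch below, FakeList is a hypothetical stand-in whose Items property builds a new collection on every read; it is not the real SPList API, it only mimics the behaviour described above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a type like SPList; not the real SharePoint API.
class FakeList
{
    public int CollectionsCreated;

    // Every read of this property builds a brand-new collection.
    public List<string> Items
    {
        get
        {
            CollectionsCreated++;
            return new List<string> { "a", "b", "c" };
        }
    }
}

static class PropertyDemo
{
    public static int CreatedByForLoop()
    {
        var list = new FakeList();
        // Items is read in the loop condition and again in the body.
        for (int i = 0; i < list.Items.Count; i++)
        {
            _ = list.Items[i];
        }
        return list.CollectionsCreated;
    }

    public static int CreatedByForeach()
    {
        var list = new FakeList();
        // Items is read once; the loop then works on a single enumerator.
        foreach (string item in list.Items) { }
        return list.CollectionsCreated;
    }

    static void Main()
    {
        Console.WriteLine(CreatedByForLoop()); // 7 (4 condition checks + 3 body reads)
        Console.WriteLine(CreatedByForeach()); // 1
    }
}
```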
They are not quite the same:
Here is the original C# code:
static void ForWithVariable(IEnumerable<Person> clients)
{
var adults = clients.Where(x => x.Age > 20);
foreach (var client in adults)
{
Console.WriteLine(client.Age.ToString());
}
}
static void ForWithoutVariable(IEnumerable<Person> clients)
{
foreach (var client in clients.Where(x => x.Age > 20))
{
Console.WriteLine(client.Age.ToString());
}
}
Here is the C# that ILSpy decompiles from the resulting Intermediate Language (IL):
private static void ForWithVariable(IEnumerable<Person> clients)
{
Func<Person, bool> arg_21_1;
if ((arg_21_1 = Program.<>c.<>9__1_0) == null)
{
arg_21_1 = (Program.<>c.<>9__1_0 = new Func<Person, bool>(Program.<>c.<>9.<ForWithVariable>b__1_0));
}
IEnumerable<Person> enumerable = clients.Where(arg_21_1);
foreach (Person current in enumerable)
{
Console.WriteLine(current.Age.ToString());
}
}
private static void ForWithoutVariable(IEnumerable<Person> clients)
{
Func<Person, bool> arg_22_1;
if ((arg_22_1 = Program.<>c.<>9__2_0) == null)
{
arg_22_1 = (Program.<>c.<>9__2_0 = new Func<Person, bool>(Program.<>c.<>9.<ForWithoutVariable>b__2_0));
}
foreach (Person current in clients.Where(arg_22_1))
{
Console.WriteLine(current.Age.ToString());
}
}
As you can see, there is a key difference:
IEnumerable<Person> enumerable = clients.Where(arg_21_1);
A more practical question, however, is whether the differences hurt performance. I concocted a test to measure that.
class Program
{
public static void Main()
{
Measure(ForEachWithVariable);
Measure(ForEachWithoutVariable);
Console.ReadKey();
}
static void Measure(Action<List<Person>, List<Person>> action)
{
var clients = new[]
{
new Person { Age = 10 },
new Person { Age = 20 },
new Person { Age = 30 },
}.ToList();
var adultClients = new List<Person>();
var sw = new Stopwatch();
sw.Start();
for (var i = 0; i < 1E6; i++)
action(clients, adultClients);
sw.Stop();
Console.WriteLine(sw.ElapsedMilliseconds.ToString());
Console.WriteLine($"{adultClients.Count} adult clients found");
}
static void ForEachWithVariable(List<Person> clients, List<Person> adultClients)
{
var adults = clients.Where(x => x.Age > 20);
foreach (var client in adults)
adultClients.Add(client);
}
static void ForEachWithoutVariable(List<Person> clients, List<Person> adultClients)
{
foreach (var client in clients.Where(x => x.Age > 20))
adultClients.Add(client);
}
}
class Person
{
public int Age { get; set; }
}
After several runs of the program, I was not able to find any significant difference between ForEachWithVariable and ForEachWithoutVariable. They were always close in time, and neither was consistently faster than the other. Interestingly, if I change 1E6 to just 1000, the ForEachWithVariable is actually consistently slower, by about 1 millisecond.
So, I conclude that for LINQ to Objects, there is no practical difference. The same type of test could be run if your particular use case involves LINQ to Entities (or SharePoint).

Scan nested dictionary with LINQ syntax

I have this working code:
using System;
using System.IO;
using System.Collections.Generic;
using System.Linq;
public class Example {
public static void Main(string[] args) {
var files = new Dictionary<string, Dictionary<string, int>>()
{ { "file1", new Dictionary<string, int>() { { "A", 1 } } } };
foreach(var file in files) {
File.WriteAllLines(file.Key + ".txt", file.Value.Select(
item => item.Key + item.Value.ToString("000")).ToArray());
}
}
}
But I want to change the foreach to LINQ syntax. Nothing I already tried worked.
Is this what you are after?
var files = new Dictionary<string, Dictionary<string, int>>()
{ { "file1", new Dictionary<string, int>() { { "A", 1 } } } };
files.ForEach(kvp =>
File.WriteAllLines(kvp.Key + ".txt", kvp.Value.Select(
item => item.Key + item.Value.ToString("000")).ToArray()));
As per Alexei's comment, IEnumerable.ForEach isn't a standard extension method as it implies mutation, which isn't the aim of functional programming. You can add it with a helper method like this one:
public static void ForEach<T>(
this IEnumerable<T> source,
Action<T> action)
{
foreach (T element in source)
action(element);
}
Also, your original title implied that the initializer syntax for Dictionaries is unwieldy. What you can do to reduce the amount of typing / code real estate for a large number of elements is to build up an array of anonymous objects and then ToDictionary(). Unfortunately there is a small performance impact:
var files = new [] { new { key = "file1",
value = new [] { new {key = "A", value = 1 } } } }
.ToDictionary(
_ => _.key,
_ => _.value.ToDictionary(x => x.key, x => x.value));
foreach is exactly what you should be using here. LINQ is all about querying data: projecting, filtering, sorting, grouping, etc. You're trying to execute an action for each element of a collection that is already in the form you need.
Just iterate using foreach.
There are reasons why there is no ForEach extension method on IEnumerable<T>:
Why is there not a ForEach extension method on the IEnumerable interface?
Why I Don’t Use the ForEach Extension Method
It's mostly about:
The reason to not use ForEach is that it blurs the boundary between pure functional code and state-full imperative code.
The only reason I can see not to use a foreach loop is when you want to make your actions run in parallel by using Parallel.ForEach instead:
Parallel.ForEach(
files,
kvp => File.WriteAllLines(kvp.Key + ".txt", kvp.Value.Select(
item => item.Key + item.Value.ToString("000")).ToArray()));
Having a ForEach extension method on IEnumerable<T> is bad design and I advise against it.

Iteration bound variable?

This is non-language-specific, but I'll use examples in C#. Often I face the problem in which I need to add a parameter to an object inside any given iteration of at least one of its parameters, and I have always to come up with a lame temporary list or array of some kind concomitant with the problem of keeping it properly correlated.
So, please bear with me on the examples below:
Is there an easier and better way to do this in C sharp?
List<String> storeStr;
void AssignStringListWithNewUniqueStr (List<String> aList) {
foreach (String str in aList) {
storeStr.add(str);
str = AProcedureToGenerateNewUniqueStr();
}
}
void PrintStringListWithNewUniqueStr (List<String> aList) {
int i = 0;
foreach (String str in aList) {
print(str + storeStr[i]);
i++;
}
}
Notice the correlation above is guaranteed only because I'm iterating through an unchanged aList. When asking about a "easier and better way" I mean it should also make sure the storeStr would always be correlated with its equivalent on aList while keeping it as short and simple as possible. The List could also have been any kind of array or object.
Is there any language in which something like this is possible? It must give the same results as above.
IterationBound<String> storeStr;
void AssignStringListWithNewUniqueStr (List<String> aList) {
foreach (String str in aList) {
storeStr = str;
str = AProcedureToGenerateNewUniqueStr();
}
}
void PrintStringListWithNewUniqueStr (List<String> aList) {
foreach (String str in aList) {
print(str + storeStr);
}
}
In this case, the fictitious "IterationBound" kind would guarantee the correlation between the list and the new parameter (in a way, just like garbage collectors guarantee allocs). It would somehow notice it was created inside an iteration and associate itself with that specific index (no matter if the syntax there would be uglier, of course). Then, when it's accessed again in another iteration and a value was already stored for that specific index, it would retrieve the value belonging to that iteration.
Why not simply project your enumerable into a new form?
// List<T>.ForEach returns void, so the chain is not assigned to a variable
aList
.Select(x => new { Initial = x, Addition = AProcedureToGenerateNewUniqueStr() })
.ToList()
.ForEach(x =>
{
print(x.Initial + x.Addition);
});
This way you keep each element associated with the new data.
aList.ForEach(x => print(x + AProcedureToGenerateNewUniqueStr()));
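If the association has to survive beyond a single statement, as in the question's two separate methods, one option is to store the projected pairs. This is a sketch under assumptions: PairDemo, Assign and Format are invented names, and GenerateUniqueStr is a hypothetical stand-in for AProcedureToGenerateNewUniqueStr.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class PairDemo
{
    static int nextId;

    // Hypothetical stand-in for AProcedureToGenerateNewUniqueStr.
    static string GenerateUniqueStr() => "#" + (++nextId);

    // Pairs every element with its generated string exactly once.
    // ToList() matters: without it the query would be deferred and
    // new strings would be generated on every enumeration.
    public static List<(string Initial, string Addition)> Assign(IEnumerable<string> aList)
        => aList.Select(x => (Initial: x, Addition: GenerateUniqueStr())).ToList();

    // Reads the stored pairs back; the correlation cannot drift.
    public static List<string> Format(List<(string Initial, string Addition)> pairs)
        => pairs.Select(p => p.Initial + p.Addition).ToList();

    static void Main()
    {
        var pairs = Assign(new[] { "a", "b" });
        foreach (string line in Format(pairs))
            Console.WriteLine(line);
    }
}
```

Because each element and its addition travel together in one tuple, there is no second list and no index to keep in step.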

Is there a way to know I am getting the last element in the foreach loop

I need to do special treatment for the last element in the collection. I am wondering if I can know I hit the last element when using foreach loop.
Only way I know of is to increment a counter and compare with length on exit, or when breaking out of loop set a boolean flag, loopExitedEarly.
There isn't a direct way. You'll have to keep buffering the next element.
IEnumerable<Foo> foos = ...
Foo prevFoo = default(Foo);
bool elementSeen = false;
foreach (Foo foo in foos)
{
if (elementSeen) // If prevFoo is not the last item...
ProcessNormalItem(prevFoo);
elementSeen = true;
prevFoo = foo;
}
if (elementSeen) // Required because foos might be empty.
ProcessLastItem(prevFoo);
Alternatively, you could use the underlying enumerator to do the same thing:
using (var erator = foos.GetEnumerator())
{
if (!erator.MoveNext())
return;
Foo current = erator.Current;
while (erator.MoveNext())
{
ProcessNormalItem(current);
current = erator.Current;
}
ProcessLastItem(current);
}
It's a lot easier when working with collections that reveal how many elements they have (typically the Count property from ICollection or ICollection<T>) - you can maintain a counter (alternatively, if the collection exposes an indexer, you could use a for-loop instead):
int numItemsSeen = 0;
foreach(Foo foo in foos)
{
if(++numItemsSeen == foos.Count)
ProcessLastItem(foo);
else ProcessNormalItem(foo);
}
If you can use MoreLinq, it's easy:
foreach (var entry in foos.AsSmartEnumerable())
{
if(entry.IsLast)
ProcessLastItem(entry.Value);
else ProcessNormalItem(entry.Value);
}
If efficiency isn't a concern, you could do:
Foo[] fooArray = foos.ToArray();
foreach(Foo foo in fooArray.Take(fooArray.Length - 1))
ProcessNormalItem(foo);
ProcessLastItem(fooArray.Last());
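The buffering idea from the first two snippets can also be wrapped once in an extension method, so callers keep an ordinary foreach. This is only a sketch; WithIsLast and LastItemExtensions are made-up names, not part of any library, and the code assumes C# 7 tuples.

```csharp
using System;
using System.Collections.Generic;

static class LastItemExtensions
{
    // Yields each element together with a flag that is true only for the final one.
    // Works on any IEnumerable<T> and enumerates the source exactly once.
    public static IEnumerable<(T Item, bool IsLast)> WithIsLast<T>(this IEnumerable<T> source)
    {
        using (var e = source.GetEnumerator())
        {
            if (!e.MoveNext())
                yield break; // empty source: nothing to report

            T current = e.Current;
            while (e.MoveNext())
            {
                yield return (current, false); // a successor exists
                current = e.Current;
            }
            yield return (current, true); // no successor: this is the last item
        }
    }

    static void Main()
    {
        var names = new[] { "John", "Mary", "Stephanie", "David" };
        foreach (var (name, isLast) in names.WithIsLast())
            Console.Write(name + (isLast ? Environment.NewLine : ", "));
    }
}
```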
Unfortunately not, I would write it with a for loop like:
string[] names = { "John", "Mary", "Stephanie", "David" };
int iLast = names.Length - 1;
for (int i = 0; i <= iLast; i++) {
Debug.Write(names[i]);
Debug.Write(i < iLast ? ", " : Environment.NewLine);
}
And yes, I know about String.Join :).
I see others already posted similar ideas while I was typing mine, but I'll post it anyway:
void Enumerate<T>(IEnumerable<T> items, Action<T, bool> action) {
IEnumerator<T> enumerator = items.GetEnumerator();
if (!enumerator.MoveNext()) return;
bool foundNext;
do {
T item = enumerator.Current;
foundNext = enumerator.MoveNext();
action(item, !foundNext);
}
while (foundNext);
}
...
string[] names = { "John", "Mary", "Stephanie", "David" };
Enumerate(names, (name, isLast) => {
Debug.Write(name);
Debug.Write(!isLast ? ", " : Environment.NewLine);
});
Not without jumping through flaming hoops (see above). But you can just use the enumerator directly (slightly awkward because of C#'s enumerator design):
IEnumerator<string> it = foo.GetEnumerator();
for (bool hasNext = it.MoveNext(); hasNext; ) {
string element = it.Current;
hasNext = it.MoveNext();
if (hasNext) { // normal processing
Console.Out.WriteLine(element);
} else { // special case processing for last element
Console.Out.WriteLine("Last but not least, " + element);
}
}
Notes on the other approaches I see here: Mitch's approach requires having access to a container which exposes its size. J.D.'s approach requires writing a method in advance, then doing your processing via a closure. Ani's approach spreads loop management all over the place. John K's approach involves creating numerous additional objects, or (second method) only allows additional post-processing of the last element, rather than special-case processing.
I don't understand why people don't use the Enumerator directly in a normal loop, as I've shown here. K.I.S.S.
This is cleaner with Java iterators, because their interface uses hasNext rather than MoveNext. You could easily write an extension method for IEnumerable that gave you Java-style iterators, but that's overkill unless you write this kind of loop a lot.
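For what it's worth, such a Java-style wrapper could look roughly like the sketch below. JavaStyleIterator, GetJavaIterator and the Describe helper are invented names for illustration; nothing like them ships with .NET.

```csharp
using System;
using System.Collections.Generic;

// Wraps an IEnumerator<T> and looks one element ahead, Java-style.
sealed class JavaStyleIterator<T> : IDisposable
{
    private readonly IEnumerator<T> inner;
    private bool hasNext;

    public JavaStyleIterator(IEnumerable<T> source)
    {
        inner = source.GetEnumerator();
        hasNext = inner.MoveNext(); // position on the first element, if any
    }

    public bool HasNext => hasNext;

    // Returns the current element and advances to the following one.
    public T Next()
    {
        if (!hasNext) throw new InvalidOperationException("No more elements.");
        T value = inner.Current;
        hasNext = inner.MoveNext();
        return value;
    }

    public void Dispose() => inner.Dispose();
}

static class JavaStyleExtensions
{
    public static JavaStyleIterator<T> GetJavaIterator<T>(this IEnumerable<T> source)
        => new JavaStyleIterator<T>(source);

    // Example use: prefix only the last element.
    public static List<string> Describe(IEnumerable<string> items)
    {
        var lines = new List<string>();
        using (var it = items.GetJavaIterator())
        {
            while (it.HasNext)
            {
                string element = it.Next();
                lines.Add(it.HasNext ? element : "Last but not least, " + element);
            }
        }
        return lines;
    }
}
```

After Next() returns, HasNext already tells you whether the element you are holding is the final one, which is what makes the loop body read naturally.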
Can the special treatment only be done while processing in the foreach loop, or could you do it while adding to the collection? If the latter applies, write your own custom collection:
public class ListCollection : List<string>
{
string _lastitem;
public new void Add(string item) // "new" because this hides List<string>.Add
{
//TODO: Do special treatment on the new Item, new item should be last one.
//Not applicable for filter/sort
base.Add(item);
}
}
List<int> numbers = new ....;
int last = numbers.Last();
Stack<int> stack = new ...;
stack.Peek();
update
var numbers = new int[] { 1, 2,3,4,5 };
var enumerator = numbers.GetEnumerator();
object last = null;
bool hasElement = true;
do
{
hasElement = enumerator.MoveNext();
if (hasElement)
{
last = enumerator.Current;
Console.WriteLine(enumerator.Current);
}
else
Console.WriteLine("Last = {0}", last);
} while (hasElement);
Console.ReadKey();
Deferred Execution trick
Build a class that encapsulates the values to be processed and the processing function for deferred execution purpose. We will end up using one instance of it for each element processed in the loop.
// functor class
class Runner {
string ArgString {get;set;}
object ArgContext {get;set;}
// CTOR: encapsulate args and a context to run them in
public Runner(string str, object context) {
ArgString = str;
ArgContext = context;
}
// This is the item processor logic.
public void Process() {
// process ArgString normally in ArgContext
}
}
Use your functor in the foreach loop to effect deferred execution by one element:
// intended to track previous item in the loop
var recent = default(Runner); // see Runner class above
// normal foreach iteration
foreach(var str in listStrings) {
// is deferred because this executes recent item instead of current item
if (recent != null)
recent.Process(); // run recent processing (from previous iteration)
// store the current item for next iteration
recent = new Runner(str, context);
}
// now the final item remains unprocessed - you have a choice
if (want_to_process_normally)
recent.Process(); // just like the others
else
do_something_else_with(recent.ArgString, recent.ArgContext);
This functor approach uses more memory but saves you from having to count the elements in advance. In some scenarios you might achieve a kind of efficiency.
OR
Shorter Workaround
If you want to apply special processing to the last element after processing them all in a regular way ....
// example using strings
var recentStr = default(string);
foreach(var str in listStrings) {
recentStr = str;
// process str normally
}
// now apply additional special processing to recentStr (last)
It's a potential workaround.
