C# foreach: Not reference to original objects, but copies? - c#

I have some weird behaviour with a foreach-loop:
IEnumerable<Compound> loadedCompounds;
...
// Loop through the peaks.
foreach (GCPeak p in peaks)
{
// Loop through the compounds.
foreach (Compound c in loadedCompounds)
{
if (c.IsInRange(p) && c.SignalType == p.SignalType)
{
c.AddPeak(p);
}
}
}
So what I'd like to do: Loop through all the GCPeaks (it is a class) and sort them to their corresponding compounds.
AddPeak just adds the GCPeak to a SortedList. Code compiles and runs without exceptions, but the problem is:
After c.AddPeak(p) the SortedList in c contains the GCPeak (checked with Debugger), while the SortedLists in loadedCompounds remains empty.
I am quite confused with this bug I produced:
What is the reason for this behavior? Both Compound and GCPeak are classes so I'd expect references and not copies of my objects and my code to work.
How to do what I'd like to do properly?
EDIT:
This is how I obtain the IEnumarables (The whole thing is coming from an XML file - LINQ to XML). Compounds are obtained basically the same way.
IEnumerable<GCPeak> peaksFromSignal = from p in signal.Descendants("IntegrationResults")
select new GCPeak()
{
SignalType = signaltype,
RunInformation = runInformation,
RetentionTime = XmlConvert.ToDouble(p.Element("RetTime").Value),
PeakArea = XmlConvert.ToDouble(p.Element("Area").Value),
};
Thanks!

An IEnumerable won't hold a hard reference to your list. This causes two potential problems for you.
1) What you are enumerating might not be there anymore (for example if you were enumerating a list of facebook posts using a lazy technique like IEnumerable etc, but your connection to facebook for is closed, then it may evaluate to an empty enumerable. The same would occur if you were doing an IEnumerable over a database collection but that DB connection was closed etc.
2) Using an enumerable like that could lead you later to or previously to that to do a multiple enumeration which can have issues. Resharper typically warns against this (to prevent unintended consequences). See here for more info: Handling warning for possible multiple enumeration of IEnumerable
What you can do to debug your situation would be to use the LINQ extension of .toList() to force early evaluation of your IEnumerable. This will let you see what is in the IEnumerable easier and will let you follow this through your code. Do note that doing toList() does have performance implications as compared to a lazy reference like you have currently but it will force a hard reference earlier and help you debug your scenario and will avoid scenarios mentioned above causing challenges for you.

Thanks for your comments.
Indeed converting my loadedCompounds to a List<> worked.
Lesson learned: Be careful with IEnumerable.
EDIT
As requested, I am adding the implementation of AddPeak:
public void AddPeak(GCPeak peak)
{
if (peak != null)
{
peaks.Add(peak.RunInformation.InjectionDateTime, peak);
}
}
RunInformation is a struct.

Related

C# create List<T> in initialization vs get

I was wondering what is the best method to create a list in a certain object.
1) DefA "always" occupies memory beforehand even if it is never called, right?
2) DefB will "always" have to check for the null condition or does the compiler optimizes this?
3) Is there a better way to implement this?
Thanks
private List<A> _defA = new List<A>();
public List<A> DefA
{
get { return _defA; }
}
private List<B> _defB;
public List<B> DefB
{
get
{
if (_defB == null)
_defB = new List<B>();
return _defB;
}
}
Because I think both options will not affect on performance of your application, my suggestion to choose one which keep code cleaner
Use Lazy type - Lazy on MSDN
From MSDN about Lazy initialization:
By default, Lazy objects are thread-safe. That is, if the
constructor does not specify the kind of thread safety, the Lazy
objects it creates are thread-safe. In multi-threaded scenarios, the
first thread to access the Value property of a thread-safe Lazy
object initializes it for all subsequent accesses on all threads, and
all threads share the same data. Therefore, it does not matter which
thread initializes the object, and race conditions are benign.
So in your case
private Lazy<List<A>> _defA = new Lazy<List<A>>(() => new List<A>());
public List<A> DefA
{
get
{
return _defA.Value;
}
}
In addition this approach will tell your intents to other developers who may work with your code.
In this specific example, the delayed (lazy) instantiation might save a few milliseconds on startup; but at the risk of issues in a multi-threaded scenario.
Say two threads call DefB (Get) almost simultaneously - they might end up setting _defB twice, instead of the once that you intend.
_defA will always take the memory of an empty list, as I understand it, yes - so you'll save some memory the second way if it's not called - but it does make the code MUCH harder to understand. Also, what if a local piece of code doesn't call the accessor method, but just does _defB.Add() or whatever? (which might not be deliberate now, but because it's more complex it's easy to forget/miss in the future)
First of all, don't optimize something that doesn't need optimizing.
If you're creating thousands or millions of the object that contains that property, and this property is seldom used and thus seldom needed, then yes, adding lazy on-demand initialization is probably a good idea. I say probably because there may be other performance-related issues as well.
However, to answer your specific questions, other than "what is the best way":
The initialization of _defA will construct a List<A> object even if the property is never used, that is correct.
The getter method of DefB will always do the null check, that is also correct. The compiler cannot optimize this away.
As for "better way"? That part of the question falls into the "primarily opinion-based" close option here on Stack Overflow. It depends largely on what you determine is better:
More expressive syntax (shorter code)
Less memory spent (option B)
Less code in the getter (option A)
I can give you an alternative to the syntax in option A:
public List<A> DefA
{
get;
} = new List<A>();
This syntax is available in Visual Studio 2015 with C# 6 (even if you compiler for older .NET runtime versions) and is called Auto-property initializer.
The compiler will automagically create the backing field for you (the _defA equivalent) and mark it read-only, so feature-wise this is 100% identical to option A, it's just a different syntax.

Select and ForEach on List<> [duplicate]

This question already has answers here:
LINQ equivalent of foreach for IEnumerable<T>
(22 answers)
Closed 9 years ago.
I am quite new to C# and was trying to use lambda expressions.
I am having a list of object. I would like to select item from the list and perform foreach operation on the selected items. I know i could do it without using lambda expression but wanted to if this was possible using lambda expression.
So i was trying to achieve a similar result
List<UserProfile> users = new List<UserProfile>();
..load users with list of users
List<UserProfile> selecteditem = users.Where(i => i.UserName=="").ToList();
foreach(UserProfile item in selecteditem)
{
item.UserName = "NA";
}
it was possible to do
users.Where(i => i.UserName=="").ToList().ForEach(i=>i.UserName="NA");
but not something like this
users.select(i => i.UserName=="").ForEach(i=>i.UserName="NA");
Can someone explain this behaviour..
Let's start here:
I am having a list of object.
It's important to understand that, while accurate, that statement leaves a c# programmer wanting more. What kind of object? In the .Net world, it pays to always keep in mind what specific type of object you are working with. In this case, that type is UserProfile. This may seem like a side issue, but it will become more relevant to the specific question very quickly. What you want to say instead is this:
I have a list of UserProfile objects.
Now let's look at your two expressions:
users.Where(i => i.UserName=="").ToList().ForEach(i=>i.UserName="NA");
and
users.Where(i => i.UserName=="").ForEach(i=>i.UserName="NA");
The difference (aside from that only the first compiles or works) is that you need to call .ToList() to convert the results of Where() function to a List type. Now we begin to see why it is that you want to always think in terms of types when working with .Net code, because it should now occur to you to wonder, "What type am I working with, then?" I'm glad you asked.
The .Where() function results in an IEnumerable<T> type, which is actually not a full type all by itself. It's an interface that describes certain things a type that implements it's contract will be able to do. The IEnumerable interface can be confusing at first, but the important thing to remember is that it defines something that you can use with a foreach loop. That is it's sole purpose. Anything in .Net that you can use with a foreach loop: arrays, lists, collections — they pretty much all implement the IEnumerable interface. There are other things you can loop over, as well. Strings, for example. Many methods you have today that require a List or Array as an argument can be made more powerful and flexible simply by changing that argument type to IEnumerable.
.Net also makes it easy to create state machine-based iterators that will work with this interface. This is especially useful for creating objects that don't themselves hold any items, but do know how to loop over items in a different collection in a specific way. For example, I might loop over just items 3 through 12 in an array of size 20. Or might loop over the items in alphabetical order. The important thing here is that I can do this without needing to copy or duplicate the originals. This makes it very efficient in terms of memory, and it's structure in such a way that you can easily compose different iterators together to get very powerful results.
The IEnumerable<T> type is especially important, because it is one of two types (the other being IQueryable) that form the core of the linq system. Most of the .Where(), .Select(), .Any() etc linq operators you can use are defined as extensions to IEnumerable.
But now we come to an exception: ForEach(). This method is not part of IEnumerable. It is defined directly as part of the List<T> type. So, we see again that it's important to understand what type you are working with at all times, including the results of each of the different expressions that make up a complete statement.
It's also instructional to go into why this particular method is not part of IEnumerable directly. I believe the answer lies in the fact that the linq system takes a lot of inspiration from a the Functional Programming world. In functional programming, you want to have operations (functions) that do exactly one thing, with no side effects. Ideally, these functions will not alter the original data, but rather they will return new data. The ForEach() method is implicitly all about creating bad side effects that alter data. It's just bad functional style. Additionally, ForEach() breaks method chaining, in that it doesn't return a new IEnumerable.
There is one more lesson to learn here. Let's take a look at your original snippet:
List<UserProfile> users = new List<UserProfile>();
// ..load users with list of users
List<UserProfile> selecteditem = users.Where(i => i.UserName=="").ToList();
foreach(UserProfile item in selecteditem)
{
item.UserName = "NA";
}
I mentioned something earlier that should help you significantly improve this code. Remember that bit about how you can have IEnumerable items that loop over a collection, without duplicating it? Think about what happens if you wrote that code this way, instead:
List<UserProfile> users = new List<UserProfile>();
// ..load users with list of users
var selecteditem = users.Where(i => i.UserName=="");
foreach(UserProfile item in selecteditem)
{
item.UserName = "NA";
}
All I did was remove the call to .ToList(), but everything will still work. The only thing that changed is we avoided needing to copy the entire list. That should make this code faster. In some circumstances, it can make the code a lot faster. Something to keep in mind: when working the with the linq operator methods, it's generally good to avoid calling .ToArray() or .ToList() whenever possible, and it's possible a lot more than you might think.
As for the foreach() {...} vs .Foreach( ... ): the former is still perfectly appropriate style.
Sure, it's quite simple. List has a ForEach method. There is no such method, or extension method, for IEnumerable.
As to why one has a method and another doesn't, that's an opinion. Eric Lippert blogged on the topic if you're interested in his.

Is .Select<T>(...) to be prefered before .Where<T>(...)?

I got in a discussion with two colleagues regarding a setup for an iteration over an IEnumerable (the contents of which will not be altered in any way during the operation). There are three conflicting theories on which is the optimal approach. Both the others (and me as well) are very certain and that got me unsure, so for the sake of clarity, I want to check with an external source.
The scenario is as follows. We had the code below as a starting point and discovered that some of the hazaas need not to be acted upon. So, starting with the code below, we started to add a blocker for the action.
foreach(Hazaa hazaa in hazaas) ;
My suggestion is as follows.
foreach(Hazaa hazaa in hazaas.Where(element => condition)) ;
One of the guys wants to resolve it by a more explicit form, claiming that LINQ is not appropriate in this case (not sure why it'd be so but he seems to be very convinced). He's solution is this.
foreach(Hazaa hazaa in hazaas) ;
if(condition) ;
The other contra-suggestion is supported by the claim that Where risks to repeat the filtering process needlessly and that it's more certain to minimize the computational workload by picking the appropriate elements once for all by Select.
foreach(Hazaa hazaa in hazaas.Select(element => condition)) ;
I argue that the first is obsolete, since LINQ can handle data objects quite well.
I also believe that Select-ing is in this case equivalently fast to Where-ing and no needless steps will be taken (e.g. the evaluation of the condition on the elements will only be performed once). If anything, it should be faster using Where because we won't be creating an extra instance of anything.
Who's right?
Select is inappropriate. It doesn't filter anything.
if is a possible solution, but Where is just as explicit.
Where executes the condition exactly once per item, just as the if. Additionally, it is important to note that the call to Where doesn't iterate the list. So, using Where you iterate the list exactly once, just like when using if.
I think you are discussing with one person that didn't understand LINQ - the guy that wants to use Select - and one that doesn't like the functional aspect of LINQ.
I would go with Where.
The .Where() and the if(condition) approach will be the same.
But since LinQ is nicely readable i'd prefer that.
The approach with .Select() is nonsense, since it will not return the Hazaa-Object, but an IEnumerable<Boolean>
To be clear about the functions:
myEnumerable.Where(a => isTrueFor(a)) //This is filtering
myEnumerable.Select(a => a.b) //This is projection
Where() will run a function, which returns a Boolean foreach item of the enumerable and return this item depending on the result of the Boolean function
Select() will run a function for every item in the list and return the result of the function without doing any filtering.

Resharper: Possible Multiple Enumeration of IEnumerable

I'm using the new Resharper version 6. In several places in my code it has underlined some text and warned me that there may be a Possible multiple enumeration of IEnumerable.
I understand what this means, and have taken the advice where appropriate, but in some cases I'm not sure it's actually a big deal.
Like in the following code:
var properties = Context.ObjectStateManager.GetObjectStateEntry(this).GetModifiedProperties();
if (properties.Contains("Property1") || properties.Contains("Property2") || properties.Contains("Property3")) {
...
}
It's underlining each mention of properties on the second line, warning that I am enumerating over this IEnumerable multiple times.
If I add .ToList() to the end of line 1 (turning properties from a IEnumerable<string> to a List<string>), the warnings go away.
But surely, if I convert it to a List, then it will enumerate over the entire IEnumerable to build the List in the first place, and then enumerate over the List as required to find the properties (i.e. 1 full enumeration, and 3 partial enumerations). Whereas in my original code, it is only doing the 3 partial enumerations.
Am I wrong? What is the best method here?
I don't know exactly what your properties really is here - but if it's essentially representing an unmaterialized database query, then your if statement will perform three queries.
I suspect it would be better to do:
string[] propertiesToFind = { "Property1", "Property2", "Property3" };
if (properties.Any(x => propertiesToFind.Contains(x))
{
...
}
That will logically only iterate over the sequence once - and if there's a database query involved, it may well be able to just use a SQL "IN" clause to do it all in the database in a single query.
If you invoke Contains() on a IEnumerable, it will invoke the extension method which will just iterate through the items in order to find it. IList has real implementation for Contains() that probably are more efficient than a regular iteration through the values (it might have a search tree with hashes?), hence it doesn't warn with IList.
Since the extension method will only be aware that it's an IEnumerable, it probably can not utilize any built-in methods for Contains() even though it would be possible in theory to identify known types and cast them accordingly in order to utilize them.

Understanding how the C# compiler deals with chaining linq methods

I'm trying to wrap my head around what the C# compiler does when I'm chaining linq methods, particularly when chaining the same method multiple times.
Simple example: Let's say I'm trying to filter a sequence of ints based on two conditions.
The most obvious thing to do is something like this:
IEnumerable<int> Method1(IEnumerable<int> input)
{
return input.Where(i => i % 3 == 0 && i % 5 == 0);
}
But we could also chain the where methods, with a single condition in each:
IEnumerable<int> Method2(IEnumerable<int> input)
{
return input.Where(i => i % 3 == 0).Where(i => i % 5 == 0);
}
I had a look at the IL in Reflector; it is obviously different for the two methods, but analysing it further is beyond my knowledge at the moment :)
I would like to find out:
a) what the compiler does differently in each instance, and why.
b) are there any performance implications (not trying to micro-optimize; just curious!)
The answer to (a) is short, but I'll go into more detail below:
The compiler doesn't actually do the chaining - it happens at runtime, through the normal organization of the objects! There's far less magic here than what might appear at first glance - Jon Skeet recently completed the "Where clause" step in his blog series, Re-implementing LINQ to Objects. I'd recommend reading through that.
In very short terms, what happens is this: each time you call the Where extension method, it returns a new WhereEnumerable object that has two things - a reference to the previous IEnumerable (the one you called Where on), and the lambda you provided.
When you start iterating over this WhereEnumerable (for example, in a foreach later down in your code), internally it simply begins iterating on the IEnumerable that it has referenced.
"This foreach just asked me for the next element in my sequence, so I'm turning around and asking you for the next element in your sequence".
That goes all the way down the chain until we hit the origin, which is actually some kind of array or storage of real elements. As each Enumerable then says "OK, here's my element" passing it back up the chain, it also applies its own custom logic. For a Where, it applies the lambda to see if the element passes the criteria. If so, it allows it to continue on to the next caller. If it fails, it stops at that point, turns back to its referenced Enumerable, and asks for the next element.
This keeps happening until everyone's MoveNext returns false, which means the enumeration is complete and there are no more elements.
To answer (b), there's always a difference, but here it's far too trivial to bother with. Don't worry about it :)
The first will use one iterator, the second will use two. That is, the first sets up a pipeline with one stage, the second will involve two stages.
Two iterators have a slight performance disadvantage to one.

Categories

Resources