C# LINQ: possible multiple enumeration best practices

I sometimes use LINQ constructs in my C# source. I use VS 2010 with ReSharper. Now I'm getting a "Possible multiple enumeration of IEnumerable" warning from ReSharper.
I would like to refactor it according to best practices. Here's briefly what it does:
IEnumerable<String> codesMatching = from c in codes where conditions select c;
String theCode = null;
if (codesMatching.Any())
{
    theCode = codesMatching.First();
}
if ((theCode == null) || (codesMatching.Count() != 1))
{
    throw new Exception("Matching code either not found or is not unique.");
}
// OK - do something with theCode.
A question:
Should I first store the result of the LINQ expression in a List?
(I'm pretty sure it won't return more than a couple of rows - say 10 at the most.)
Any hints appreciated.
Thanks
Pavel

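Since you want to verify that the match is unique, you can try this (and yes, you must store the result):
// Named "matches" so it does not clash with the "codes" source it is built from.
// Take(2) is enough to tell "none", "exactly one" and "more than one" apart.
var matches = (from c in codes where conditions select c).Take(2).ToArray();
if (matches.Length != 1)
{
    throw new Exception("Matching code either not found or is not unique.");
}
var code = matches[0];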
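The same Take(2) check can be packaged as a reusable extension method. This is only a sketch: the name SingleMatchOrThrow and the choice of InvalidOperationException are assumptions, not an existing LINQ API. Taking two elements is enough to distinguish "none", "exactly one" and "more than one", and ToArray() stores that small result so the query runs only once.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SequenceChecks
{
    // Hypothetical helper: returns the only element that matches the predicate,
    // or throws with the caller's message when there are zero or several matches.
    public static T SingleMatchOrThrow<T>(
        this IEnumerable<T> source, Func<T, bool> predicate, string message)
    {
        // Take(2) stops after the second match; ToArray() stores the result,
        // so the underlying query is enumerated only once.
        T[] matches = source.Where(predicate).Take(2).ToArray();
        if (matches.Length != 1)
        {
            throw new InvalidOperationException(message);
        }
        return matches[0];
    }
}
```

In the question's code, the call would be codes.SingleMatchOrThrow(c => ..., "Matching code either not found or is not unique."), with the question's conditions as the predicate.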

Yes, you need to store the result in a List or an array and then use that; that way the query won't be enumerated several times.
In your case, if you need to be sure that exactly one item satisfies the condition, you can use Single: it throws an exception if more than one item satisfies the condition, and it also throws if no item does.
And your code becomes simpler:
string theCode = (from c in codes where conditions select c).Single();
But in that case you can't change the exception text; otherwise you need to wrap the call in your own try/catch block and rethrow with a custom message or exception.
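A minimal sketch of the try/catch approach, assuming a hypothetical FindUniqueCode method and a caller-supplied condition delegate. Single() throws InvalidOperationException both when nothing matches and when several elements match, so one catch block covers both cases; the original exception is kept as InnerException.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CodeLookup
{
    // Hypothetical wrapper around Single() that replaces its message with our own.
    public static string FindUniqueCode(IEnumerable<string> codes, Func<string, bool> condition)
    {
        try
        {
            // Throws InvalidOperationException on zero or on several matches.
            return codes.Where(condition).Single();
        }
        catch (InvalidOperationException ex)
        {
            throw new InvalidOperationException(
                "Matching code either not found or is not unique.", ex);
        }
    }
}
```

One drawback of this design: an InvalidOperationException thrown from inside condition itself would also be reported as "not found or not unique", which the Take(2) approach does not do.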

Materializing the enumerable with .ToList()/.ToArray() would get rid of the warning, but whether that is better than enumerating several times depends on how codes and conditions are implemented. .Any() and .First() are lazy and stop as soon as they find the first matching element, and in the question's code .Count() is not evaluated at all when theCode is null (because || short-circuits), so converting to a list might be more wasteful than getting a new enumerator.
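To see what the warning is about, the sketch below counts how many times a lazy source is enumerated from the start. Source() and the Starts counter are hypothetical instrumentation standing in for codes; the two methods mirror the question's Any()/First()/Count() pattern, once on the lazy query and once after ToList().

```csharp
using System.Collections.Generic;
using System.Linq;

public static class EnumerationCounter
{
    // Incremented every time Source() is enumerated from the start.
    public static int Starts;

    // Hypothetical source standing in for "codes": records each new enumeration.
    public static IEnumerable<string> Source()
    {
        Starts++;
        yield return "alpha";
        yield return "beta";
        yield return "gamma";
    }

    // Same shape as the question, on the lazy query: each call re-runs Source().
    public static int RunWithLazyQuery()
    {
        Starts = 0;
        IEnumerable<string> matching = Source().Where(s => s.StartsWith("b"));
        if (matching.Any())
        {
            _ = matching.First();  // results ignored: we only count enumerations
            _ = matching.Count();
        }
        return Starts;
    }

    // Same logic after materializing once with ToList().
    public static int RunWithList()
    {
        Starts = 0;
        List<string> matching = Source().Where(s => s.StartsWith("b")).ToList();
        if (matching.Any())
        {
            _ = matching.First();
            _ = matching.Count;
        }
        return Starts;
    }
}
```

With three elements the extra passes are negligible; the difference matters when each enumeration is expensive, for example when it re-runs a database query or re-parses XML.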

Related

Which of these LINQ queries is more performant? [duplicate]

What is the difference between these two LINQ queries:
var result = ResultLists().Where(c => c.code == "abc").FirstOrDefault();
// vs.
var result = ResultLists().FirstOrDefault( c => c.code == "abc");
Are the semantics exactly the same?
If they are semantically equal, does the predicate form of FirstOrDefault offer any theoretical or practical performance benefit over Where() plus a plain FirstOrDefault()?
Either is fine.
They both run lazily - if the source list has a million items, but the tenth item matches then both will only iterate 10 items from the source.
Performance should be almost identical and any difference would be totally insignificant.
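The laziness claim above is easy to check. This sketch assumes a hypothetical Numbers() generator of a million integers that reports every item it yields; each method returns the match found and how many items were pulled, with the tenth item (value 9) as the only match.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LazinessDemo
{
    // Hypothetical source of a million items that calls onItem for each one it yields.
    public static IEnumerable<int> Numbers(Action onItem)
    {
        for (int i = 0; i < 1_000_000; i++)
        {
            onItem();
            yield return i;
        }
    }

    public static (int match, int pulled) WhereThenFirstOrDefault()
    {
        int pulled = 0;
        int match = Numbers(() => pulled++).Where(n => n == 9).FirstOrDefault();
        return (match, pulled);
    }

    public static (int match, int pulled) FirstOrDefaultWithPredicate()
    {
        int pulled = 0;
        int match = Numbers(() => pulled++).FirstOrDefault(n => n == 9);
        return (match, pulled);
    }
}
```

Both variants stop after the tenth item; Where() only hands items to FirstOrDefault() on demand.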
The second one. All other things being equal, the iterator in the second case can stop as soon as it finds a match, where the first one must find all that match, and then pick the first of those.
Nice discussion; all the above answers are correct.
I didn't run any performance tests, but in my experience FirstOrDefault(predicate) is sometimes faster and better optimized than Where().FirstOrDefault().
I recently fixed a memory overflow/performance issue (in a "neural-network algorithm"), and the fix was changing Where(x => ...).FirstOrDefault() to simply FirstOrDefault(x => ...).
I had been ignoring the editor's recommendation to make exactly that change.
So I believe the correct answer to the above question is:
The second option is the best approach in all cases.
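Where uses deferred execution: the evaluation of an expression is delayed until its realized value is actually required. This can improve performance by avoiding unnecessary work.
Internally, Where looks roughly like this (simplified) and returns a new IEnumerable:
// Simplified sketch; needs System and System.Collections.Generic, inside a static class
public static IEnumerable<T> Where<T>(this IEnumerable<T> source, Func<T, bool> predicate)
{
    foreach (var item in source)
    {
        if (predicate(item))
        {
            yield return item; // each item is produced only when the caller asks for the next one
        }
    }
}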
FirstOrDefault() returns a T and does not throw when there is no result; instead it returns default(T), which is null for reference types.
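A short illustration of those defaults (the method names are hypothetical): FirstOrDefault returns default(int), which is 0, or default(string), which is null, when nothing matches, while First with the same predicate throws InvalidOperationException.

```csharp
using System;
using System.Linq;

public static class FirstOrDefaultDemo
{
    // No even number -> default(int), which is 0 (no exception).
    public static int FirstEvenOrDefault(int[] values) =>
        values.FirstOrDefault(v => v % 2 == 0);

    // No long word -> default(string), which is null.
    public static string FirstLongWordOrDefault(string[] words) =>
        words.FirstOrDefault(w => w.Length > 5);

    // First() with the same predicate throws instead of returning a default.
    public static bool FirstThrowsWhenNoMatch(int[] values)
    {
        try
        {
            values.First(v => v % 2 == 0);
            return false;
        }
        catch (InvalidOperationException)
        {
            return true;
        }
    }
}
```

For value types this means a result of 0 cannot be told apart from a genuine 0 in the data.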

C# foreach: Not references to the original objects, but copies?

I have some weird behaviour with a foreach-loop:
IEnumerable<Compound> loadedCompounds;
...
// Loop through the peaks.
foreach (GCPeak p in peaks)
{
    // Loop through the compounds.
    foreach (Compound c in loadedCompounds)
    {
        if (c.IsInRange(p) && c.SignalType == p.SignalType)
        {
            c.AddPeak(p);
        }
    }
}
So what I'd like to do: loop through all the GCPeaks (GCPeak is a class) and assign each one to its corresponding compound.
AddPeak just adds the GCPeak to a SortedList. The code compiles and runs without exceptions, but the problem is:
After c.AddPeak(p), the SortedList in c contains the GCPeak (checked with the debugger), while the SortedLists in loadedCompounds remain empty.
I am quite confused by this bug:
What is the reason for this behavior? Both Compound and GCPeak are classes so I'd expect references and not copies of my objects and my code to work.
How to do what I'd like to do properly?
EDIT:
This is how I obtain the IEnumerables (the whole thing comes from an XML file via LINQ to XML). Compounds are obtained basically the same way.
IEnumerable<GCPeak> peaksFromSignal =
    from p in signal.Descendants("IntegrationResults")
    select new GCPeak()
    {
        SignalType = signaltype,
        RunInformation = runInformation,
        RetentionTime = XmlConvert.ToDouble(p.Element("RetTime").Value),
        PeakArea = XmlConvert.ToDouble(p.Element("Area").Value),
    };
Thanks!
An IEnumerable produced by a LINQ query won't hold a hard reference to a materialized list of results; the query runs again each time it is enumerated. This causes two potential problems for you.
1) What you are enumerating might not be there anymore. For example, if you were enumerating Facebook posts lazily through an IEnumerable but your connection to Facebook had been closed, it might evaluate to an empty sequence. The same would happen if you were enumerating a database collection through an IEnumerable after the DB connection was closed.
2) Using an enumerable like that can lead you, earlier or later in the code, into multiple enumeration, which can cause issues. ReSharper typically warns against this (to prevent unintended consequences). See here for more info: Handling warning for possible multiple enumeration of IEnumerable
To debug your situation, you can use the LINQ extension method .ToList() to force early evaluation of your IEnumerable. This lets you see what is in the sequence more easily and follow it through your code. Note that ToList() does have performance implications compared to the lazy query you have now, but it gives you a hard reference earlier, helps you debug your scenario, and avoids the problems mentioned above.
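A plausible cause here, assuming loadedCompounds is built with a deferred select new Compound ... query like the peaks query shown above: every enumeration re-runs the projection and creates brand-new Compound objects, so peaks added in the first loop are attached to objects that are then discarded. The sketch below uses a simplified, hypothetical Compound class (not the one from the question) to show this, and that calling ToList() once keeps the same objects around.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified stand-in for the question's Compound class.
public class Compound
{
    public string Name { get; set; }
    public List<double> Peaks { get; } = new List<double>();
}

public static class DeferredProjectionDemo
{
    // Mimics "from x in ... select new Compound { ... }": a deferred projection.
    public static IEnumerable<Compound> LoadCompounds(IEnumerable<string> names) =>
        names.Select(n => new Compound { Name = n });

    // Adds one peak to every compound, then enumerates again to count the peaks.
    public static int TotalPeaksAfterAdding(IEnumerable<Compound> compounds)
    {
        foreach (Compound c in compounds)
        {
            c.Peaks.Add(1.0);
        }
        // With a deferred query this second pass runs Select again,
        // producing fresh Compound objects with empty peak lists.
        return compounds.Sum(c => c.Peaks.Count);
    }
}
```

Passing LoadCompounds(names) directly yields a total of 0, while passing LoadCompounds(names).ToList() yields one peak per compound.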
Thanks for your comments.
Indeed converting my loadedCompounds to a List<> worked.
Lesson learned: Be careful with IEnumerable.
EDIT
As requested, I am adding the implementation of AddPeak:
public void AddPeak(GCPeak peak)
{
    if (peak != null)
    {
        peaks.Add(peak.RunInformation.InjectionDateTime, peak);
    }
}
RunInformation is a struct.

LINQ Where(): are these two LINQ statements the same? [duplicate]

This question already has an answer here:
And difference between FirstOrDefault(func) & Where(func).FirstOrDefault()?
Closed 9 years ago.
Can someone explain whether these two LINQ statements are the same, or whether they differ in terms of execution? I am guessing their results are the same, but please correct me if I am wrong.
var k = db.MySet.Where(a => a.Id == id).SingleOrDefault().Username;
var mo = db.MySet.SingleOrDefault(a => a.Id == id).Username;
Yes, both instructions are functionally equivalent and return the same result. The second is just a shortcut.
However, I wouldn't recommend writing it like this, because SingleOrDefault will return null if there is no item with the specified Id. This will result in a NullReferenceException when you access Username. If you don't expect it to return null, use Single, not SingleOrDefault, because it will give you a more useful error message if your expectation is not met. If you're not sure that a user with that Id exists, use SingleOrDefault, but check the result before accessing its members.
Yes, these two LINQ statements are the same.
But I suggest you write code like this:
var mo = db.MySet.SingleOrDefault(a => a.Id == id);
if (mo != null)
{
    string username = mo.Username;
}
var k = db.MySet.Where(a => a.Id == id).SingleOrDefault().Username;
var mo = db.MySet.SingleOrDefault(a => a.Id == id).Username;
You asked if they are equivalent...
Yes, they will return the same result, both in LINQ-to-Objects and in LINQ-to-SQL/Entity-Framework.
No, they aren't equally fast in LINQ-to-Objects. Someone benchmarked them and found that the first one is a little faster (because .Where() has special optimizations based on the type of db.MySet). Reference: https://stackoverflow.com/a/8664387/613130
They're different in terms of the actual code they execute, but I can't see a situation in which they would give different results. In fact, if you have ReSharper installed, it will recommend changing the former into the latter.
However, in general I'd question why you would ever call SingleOrDefault() without immediately following it with a null check.
Instead of checking for null, I always check for default(T), as the name of the LINQ method implies. In my opinion this is a bit more maintainable in case the type is changed into a struct or a class.
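For an unconstrained generic T, value == null never detects the default of a value type (for int the default is 0, not null), but EqualityComparer<T>.Default.Equals(value, default(T)) works for both classes and structs. This is a sketch; the IsDefault name is an assumption.

```csharp
using System.Collections.Generic;

public static class DefaultChecks
{
    // True when value equals default(T): null for classes, 0/false/etc. for structs.
    public static bool IsDefault<T>(T value) =>
        EqualityComparer<T>.Default.Equals(value, default(T));
}
```

As with any default check, a matching element that happens to equal default(T) (such as 0) is indistinguishable from "no match".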
They will both return the same result (or both will throw a NullReferenceException). However, the second one may be more efficient.
The first version will need to enumerate all values that meet the where condition, and then it will check whether this returned only one value. Thus this version might need to enumerate hundreds of values.
The second version checks for only one value that meets the condition from the start. As soon as it finds two values, it will throw an exception, so it does not have the overhead of enumerating (possibly) hundreds of values that will never be used.

If Linq Result Is Empty

If I have a linq query that looks like this, how can I check to see if there were no results found by the query?
var LinqResult =
from a in Db.Table
where a.Value0 == "ninja"
group a by a.Value1 into b
select new { Table = b};
if(LinqResult.Count() == 0) //?
{
}
You should try to avoid using the Count() method as a way to check whether a sequence is empty or not. Phil Haack has an excellent article on his blog where he discusses this antipattern.
For a general sequence, Count() must actually enumerate all of its elements, which may be expensive if the sequence is built from multiple LINQ operations (or comes from a database).
You should use the Any() extension method instead, which only checks whether there is at least one element and does not enumerate the entire sequence.
if( !LinqResult.Any() )
{
// your code
}
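The difference is easy to observe with LINQ to Objects. In this sketch, Generate() is a hypothetical lazy sequence that records each item it produces; Any() stops after the first item, while Count() produces all of them. For a database query the provider translates Any() and Count() into SQL instead, so there the cost depends on the database.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class EmptinessCheck
{
    // Hypothetical lazily generated sequence that records every item it produces.
    public static IEnumerable<int> Generate(int size, List<int> produced)
    {
        for (int i = 0; i < size; i++)
        {
            produced.Add(i);
            yield return i;
        }
    }

    // How many items had to be produced to answer "is it empty?" with Any().
    public static int ItemsProducedByAny(int size)
    {
        var produced = new List<int>();
        bool isEmpty = !Generate(size, produced).Any();
        return produced.Count;
    }

    // The same question answered with Count() == 0.
    public static int ItemsProducedByCount(int size)
    {
        var produced = new List<int>();
        bool isEmpty = Generate(size, produced).Count() == 0;
        return produced.Count;
    }
}
```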
Personally, I also think that the use of Any() rather than Count() better expresses your intent, and is easier to refactor or change reliably in the future.
By the way, if what you actually want is the first (or only) member of the sequence, you should use either the First() or Single() operators instead.
if(!LinqResult.Any()) //?
{
}

The || (or) Operator in Linq with C#

I'm using LINQ to filter a selection of MessageItems. The method I've written accepts a bunch of parameters that might be null. If a parameter is null, its criterion should be ignored; if it is not null, it should be used to filter the results.
It's my understanding that when doing an || operation in C#, if the first expression is true, the second expression is not evaluated.
e.g.
if (ExpressionOne() || ExpressionTwo())
{
    // only ExpressionOne was evaluated because it was true
}
Now, in LINQ, I'm trying this:
var messages = (from msg in dc.MessageItems
where String.IsNullOrEmpty(fromname) || (!String.IsNullOrEmpty(fromname) && msg.FromName.ToLower().Contains(fromname.ToLower()))
select msg);
I would have thought this would be sound, because String.IsNullOrEmpty(fromname) would equal true and the second part of the || wouldn't get run.
However it does get run, and the second part
msg.FromName.ToLower().Contains(fromname.ToLower()))
throws a null reference exception (because fromname is null)!! - I get a classic "Object reference not set to an instance of an object" exception.
Any help?
Have a read of this documentation, which explains how LINQ and C# can experience a disconnect.
Since LINQ expressions are expected to be reduced to something other than plain methods, you may find that this code breaks if it is later used in some non-LINQ-to-Objects context.
That said,
String.IsNullOrEmpty(fromname) ||
( !String.IsNullOrEmpty(fromname) &&
msg.FromName.ToLower().Contains(fromname.ToLower())
)
is needlessly complicated, since it should really be
String.IsNullOrEmpty(fromname) ||
msg.FromName.ToLower().Contains(fromname.ToLower())
which makes it nice and clear that you are relying on msg and msg.FromName to both be non null as well.
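To make your life easier in C#, you could add the following string extension methods:
using System;

public static class ExtensionMethods
{
    // Contains with an explicit StringComparison (newer .NET versions also
    // provide a built-in string.Contains(string, StringComparison) overload).
    public static bool Contains(
        this string self, string value, StringComparison comparison)
    {
        return self.IndexOf(value, comparison) >= 0;
    }

    // A null search value means "no filter", so everything matches;
    // a null string being searched never matches.
    public static bool ContainsOrNull(
        this string self, string value, StringComparison comparison)
    {
        if (value == null)
            return true;
        if (self == null)
            return false;
        return self.IndexOf(value, comparison) >= 0;
    }
}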
Then use:
var messages = (from msg in dc.MessageItems
where msg.FromName.ContainsOrNull(
fromname, StringComparison.InvariantCultureIgnoreCase)
select msg);
However this is not the problem. The problem is that the Linq to SQL aspects of the system are trying to use the fromname value to construct the query which is sent to the server.
Since fromname is a variable the translation mechanism goes off and does what is asked of it (producing a lower case representation of fromname even if it is null, which triggers the exception).
In this case, you can do what you have already discovered: keep the query as is, but make sure the fromname value used inside it is never null, while keeping the desired behaviour when fromname itself is null (for example with fromname ?? String.Empty).
Perhaps better would be:
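// IQueryable keeps any further operators translatable to SQL.
IQueryable<MessageItem> results;
if (string.IsNullOrEmpty(fromname))
{
    results = from msg in dc.MessageItems
              select msg;
}
else
{
    // Lowercase both sides so the comparison is case-insensitive.
    string lowerfromname = fromname.ToLower();
    results = from msg in dc.MessageItems
              where msg.FromName.ToLower().Contains(lowerfromname)
              select msg;
}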
This is not so great if the query contains other constraints and thus involves more duplication, but for a simple query it should actually result in more readable and maintainable code. It is a pain if you are relying on anonymous types, though, but hopefully that is not an issue for you.
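One common way to avoid that duplication is to compose the query step by step and add a Where clause only for the parameters that were supplied. This is a sketch with a hypothetical MessageItem class and Filter method; it runs against an in-memory list through AsQueryable(), and with LINQ to SQL you would pass dc.MessageItems instead. Because successive Where calls on an IQueryable<T> build one expression tree, the provider should still translate them into a single query.

```csharp
using System.Linq;

// Hypothetical stand-in for the LINQ to SQL entity in the question.
public class MessageItem
{
    public string FromName { get; set; }
    public string FromEmail { get; set; }
}

public static class MessageFilters
{
    // Each optional filter is applied only when its parameter is supplied,
    // so the null check happens in plain C#, before any query translation.
    public static IQueryable<MessageItem> Filter(
        IQueryable<MessageItem> messages, string fromname, string fromemail)
    {
        IQueryable<MessageItem> query = messages;
        if (!string.IsNullOrEmpty(fromname))
        {
            string lowerName = fromname.ToLower();
            query = query.Where(m => m.FromName.ToLower().Contains(lowerName));
        }
        if (!string.IsNullOrEmpty(fromemail))
        {
            string lowerEmail = fromemail.ToLower();
            query = query.Where(m => m.FromEmail.ToLower().Contains(lowerEmail));
        }
        return query;
    }
}
```

Like the original query, this assumes FromName and FromEmail are not null in the data.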
Okay. I found a solution.
I changed the offending line to:
where (String.IsNullOrEmpty(fromemail) || (msg.FromEmail.ToLower().Contains((fromemail ?? String.Empty).ToLower())))
It works, but it feels like a hack. I'm sure that if the first expression is true, the second should not be evaluated.
Would be great if anyone could confirm or deny this for me...
Or if anyone has a better solution, please let me know!!!
If you are using LINQ to SQL, you cannot expect the same C# short-circuit behavior in SQL Server. See this question about short-circuit WHERE clauses (or lack thereof) in SQL Server.
Also, as I mentioned in a comment, I don't believe you are getting this exception in LINQ to SQL because:
Method String.IsNullOrEmpty(String) has no supported translation to SQL, so you can't use it in LINQ to SQL.
You wouldn't be getting the NullReferenceException. This is a managed exception, it would only happen client-side, not in SQL Server.
Are you sure this is not going through LINQ to Objects somewhere? Are you calling ToList() or ToArray() on your source, or referencing it as an IEnumerable<T>, before running this query?
Update: After reading your comments I tested this again and realized some things. I was wrong about you not using LINQ to SQL. You were not getting the "String.IsNullOrEmpty(String) has no supported translation to SQL" exception because IsNullOrEmpty() is being called on a local variable, not an SQL column, so it is running client-side, even though you are using LINQ to SQL (not LINQ to Objects). Since it is running client-side, you can get a NullReferenceException on that method call, because it is not translated to SQL, where you cannot get a NullReferenceException.
One way to make your solution seem less hacky is by resolving fromname's "null-ness" outside the query:
string lowerfromname = String.IsNullOrEmpty(fromname) ? fromname : fromname.ToLower();
var messages = from msg in dc.MessageItems
where String.IsNullOrEmpty(lowerfromname) || msg.Name.ToLower().Contains(lowerfromname)
select msg.Name;
Note that this will not always be translated to something like (using your comments as example):
SELECT ... FROM ... WHERE #theValue IS NULL OR #theValue = theValue
Its translation will be decided at runtime depending on whether fromname is null or not. If it is null, it will translate without a WHERE clause. If it is not null, it will translate with a simple "WHERE #theValue = theValue", without null check in T-SQL.
So in the end, the question of whether it will short-circuit in SQL or not is irrelevant in this case because the LINQ to SQL runtime will emit different T-SQL queries if fromname is null or not. In a sense, it is short-circuited client-side before querying the database.
Are you sure it's 'fromname' that's null and not 'msg.FromName' that's null?
Like Brian said, I would check whether msg.FromName is null before calling ToLower().Contains(fromname.ToLower()).
You are correct that the second condition shouldn't be evaluated, since you are using the short-circuit operators (see What is the best practice concerning C# short-circuit evaluation?); however, I suspect that LINQ might try to optimise your query before executing it and in doing so might change the execution order.
Wrapping the whole thing in brackets also, for me, makes for a clearer statement, as the whole 'where' condition is contained within the parentheses.
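To check the short-circuit behaviour in isolation, this sketch runs the question's where clause against an in-memory sequence (LINQ to Objects) instead of dc.MessageItems; FilterNames is a hypothetical helper. With fromname == null it returns every name rather than throwing, because here the where clause compiles to an ordinary delegate and || short-circuits as usual. That is consistent with the answers above that locate the problem in the LINQ to SQL translation step.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class ShortCircuitDemo
{
    // LINQ to Objects: the where clause is a normal C# delegate, so when
    // fromname is null or empty the right-hand side is never evaluated.
    public static List<string> FilterNames(IEnumerable<string> names, string fromname)
    {
        return (from name in names
                where string.IsNullOrEmpty(fromname)
                      || name.ToLower().Contains(fromname.ToLower())
                select name).ToList();
    }
}
```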
