LINQ Any() and Single() vs. SingleOrDefault() with null check [closed] - c#

In what cases is each solution preferred over the other?
Example 1:
if (personList.Any(x => x.Name == "Fox Mulder"))
{
    this.Person = personList.Single(x => x.Name == "Fox Mulder");
}
Example 2:
var mulder = personList.SingleOrDefault(x => x.Name == "Fox Mulder");
if (mulder != null)
{
    this.Person = mulder;
}

Both Single and SingleOrDefault will enumerate the collection beyond the first matching result to verify that there is exactly one element matching the criteria, stopping at either the next match or the end of the collection. The first example will be slightly slower, since the Any call will enumerate enough of the collection (possibly all of it) to determine whether any elements meet the criteria, stopping at either the first match or the end of the collection.
There is one other critical difference: the first example could throw an exception. Single will return the matching element if there is exactly one, and throw an exception otherwise. Checking with Any does not verify this; it only verifies that there is at least one.
Based on these two reasons (especially the second), the SingleOrDefault approach is preferable here.
So, there are three cases here.
Case 1: No items match the condition
Option 1: .Any enumerates the entire set and returns false; .Single never executes.
Option 2: .SingleOrDefault enumerates the entire set and returns null.
Options essentially equivalent.
Case 2: Exactly one item matches the condition
Option 1: Any enumerates enough of the set to find the single match (could be the first item, could be the entire set). Next, Single enumerates the entire set to find that one item and confirm that no others match the condition.
Option 2: SingleOrDefault enumerates the entire set, returns the only match.
In this case, option 2 is better: exactly one pass over the set, compared with more than one and up to two passes, i.e. (1, 2] iterations, for option 1.
Case 3: More than one element matches the condition
Option 1: Any enumerates enough to find the first match. Single enumerates enough to find the second match, throws exception.
Option 2: SingleOrDefault enumerates enough to find the second match, throws exception.
Both throw exceptions, but option 2 gets there more quickly.
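To make the iteration counts above concrete, here is a minimal sketch that counts how often the predicate runs under each approach. The class and method names (EnumerationCost, Count) are made up for illustration, and the helper assumes at most one element matches, since otherwise Single would throw.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class EnumerationCost
{
    // Hypothetical helper: runs both approaches over the same names and
    // reports how many times the predicate was evaluated by each.
    // Assumes at most one element equals target.
    public static (int AnyThenSingle, int SingleOrDefaultOnly) Count(
        IReadOnlyList<string> names, string target)
    {
        int calls = 0;
        Func<string, bool> matches = n => { calls++; return n == target; };

        // Example 1: Any followed by Single.
        if (names.Any(matches))
        {
            names.Single(matches);
        }
        int anyThenSingle = calls;

        // Example 2: one SingleOrDefault call.
        calls = 0;
        names.SingleOrDefault(matches);

        return (anyThenSingle, calls);
    }
}
```

For a three-element list whose second element is the only match, Any evaluates the predicate twice and Single three more times, while SingleOrDefault alone evaluates it three times.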

Option 3:
this.Person = personList.FirstOrDefault(x => x.Name == "Fox Mulder");
If you are using Entity Framework and Name is the primary key, then:
this.Person = db.personList.Find("Fox Mulder");
Both of these work if this.Person is null coming in, or if you expect this.Person to be overwritten with null when the person isn't found. FirstOrDefault is the fastest since it stops at the first matching record instead of iterating through the entire collection, but it won't throw an exception if multiple matches exist. The Entity Framework solution is even better in that case because Find checks the EF cache first, possibly not having to hit the data source at all.
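The difference in how duplicates are handled can be sketched like this. The helper names (DuplicateBehaviour, SingleOrDefaultThrows, FirstMatch) are hypothetical and plain strings stand in for the person objects.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DuplicateBehaviour
{
    // True when SingleOrDefault rejects the input because more than one
    // element matches (it throws InvalidOperationException in that case).
    public static bool SingleOrDefaultThrows(IEnumerable<string> names, string target)
    {
        try
        {
            names.SingleOrDefault(n => n == target);
            return false;
        }
        catch (InvalidOperationException)
        {
            return true;
        }
    }

    // FirstOrDefault silently returns the first match, or null if there is none.
    public static string FirstMatch(IEnumerable<string> names, string target) =>
        names.FirstOrDefault(n => n == target);
}
```

So FirstOrDefault is the right choice only when duplicates are acceptable, or are known to be impossible.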

Extrapolating from the is v. as guidelines:
See below, from Casting vs using the 'as' keyword in the CLR:
// Bad code - checks type twice for no reason
if (randomObject is TargetType)
{
    TargetType foo = (TargetType) randomObject;
    // Do something with foo
}
By using Any and Single, you're also checking the list twice. And the same logic would seem to apply: Not only is this checking twice, but it may be checking different things, i.e., in a multi-threaded application the list could be different between the check and the assignment. In extreme cases the item found with Any might no longer exist when the call to Single is made.
Using this logic, I would favor example two in all cases until given proof otherwise.


FirstOrDefault function default value [closed]

I have an array of objects and I would like to find the index of a specific object inside this array:
int ix = Array.IndexOf(products, products.Where(item => item != null && item.Id == "xxx").FirstOrDefault());
An item with Id="xxx" doesn't exist, but the ix result is 1.
So, I guess that the default for int is 1. How can I know whether 1 belongs to the first item or is the default value? It would be nice if I could set the default value to -1.
In the end I did it with the FindIndex method, but I would like to know how to do it with the IndexOf method.
So you have two parts of one problem. First, you want to find a product:
var product = products.FirstOrDefault(item => item != null && item.Id == "xxx");
And when that product is found, you want to find its index in the products collection:
int index = Array.IndexOf(products, product);
You're halfway there using FirstOrDefault(). If a product with Id "xxx" does not exist, product will be null. So you can check for that and skip IndexOf() for null:
if (product == null)
{
    return -1;
}
else
{
    return Array.IndexOf(products, product);
}
The fact that your current code returns 1 means that products[1] is null: FirstOrDefault() returned null, and IndexOf() then found the first null element in the array.
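A minimal sketch of that two-step lookup, next to the Array.FindIndex alternative the question mentions. The Product class and the ProductLookup helper are assumptions made for illustration; only the Id property is taken from the question.

```csharp
using System;
using System.Linq;

// Hypothetical stand-in for the asker's product type.
public sealed class Product
{
    public string Id { get; set; } = "";
}

public static class ProductLookup
{
    // Two steps: find the product first, then ask for its index.
    // Returns -1 when no product has the given id, even if the array contains nulls.
    public static int IndexOfId(Product[] products, string id)
    {
        var product = products.FirstOrDefault(p => p != null && p.Id == id);
        return product == null ? -1 : Array.IndexOf(products, product);
    }

    // One step: Array.FindIndex returns -1 when no element satisfies the predicate.
    public static int FindIndexOfId(Product[] products, string id) =>
        Array.FindIndex(products, p => p != null && p.Id == id);
}
```

With an array such as { a, null, b }, passing null straight to Array.IndexOf returns 1, which is the behaviour described in the question; both helpers above return -1 for a missing id instead.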
Per Microsoft:
Sometimes the value of default(TSource) is not the default value that you want to use if the collection contains no elements. Instead of checking the result for the unwanted default value and then changing it if necessary, you can use the DefaultIfEmpty(IEnumerable, TSource) method to specify the default value that you want to use if the collection is empty. Then, call First(IEnumerable) to obtain the first element.
However, I'm not so sure that's your issue. Your code isn't syntactically correct (you're missing a ')' somewhere), but it appears you're calling FirstOrDefault() after your Where(). This will either return an element or null. Since you said an item with id "xxx" doesn't exist, it's going to check for the index of null in your array. The default value for indexOf is (again, per Microsoft) "the lower bound of the array minus 1." This should return -1 (I'd hope) instead of 1.
Conclusion: take a better look at your code and see what's really going on. Break up this linq statement into two different parts.
var match = products.Where(item => item != null && item.Id == "xxx").FirstOrDefault();
int ix = Array.IndexOf(products, match);
Then step through your code to check the values of everything. I'm sure you will find your issue, and it won't be what you expected.
If you want to call FirstOrDefault on a struct but the default value is the same as a valid one, here's one thing you can do:
(I won't use your code, as the missing parenthesis makes it hard to tell what your goal is.)
myCollection.Where(myCondition).Cast<int?>().FirstOrDefault();
This way, 0 would be a valid first value, and null would mean that there are no matching values.
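As a small sketch of that idea for a sequence of int (the NullableFirst class and its method name are made up for illustration):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class NullableFirst
{
    // Returns the first element that satisfies the condition, or null when
    // none does, so a legitimate 0 can be told apart from "nothing found".
    public static int? FirstMatchOrNull(IEnumerable<int> source, Func<int, bool> condition) =>
        source.Where(condition).Cast<int?>().FirstOrDefault();
}
```

The caller can then test the result with HasValue, or compare it with null, before using it.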
FirstOrDefault returns the first element (in this case, the first item that satisfies the Where condition) or the default value.
If the condition is met, it returns an object (the first object that meets the condition).
However, if the condition isn't met, it returns null (the default value of a reference type),
so your code ends up calling IndexOf(products, null). That does not throw; it searches the array for a null element.
FirstOrDefault is doing its job on the object type stored in the array.
After this, the result is passed as a parameter to the IndexOf method.
IndexOf returns the index of the first element equal to that argument, which is either the object found by the Where condition or the first null element.
If nothing is found, IndexOf returns -1.
By the way, you're missing a parenthesis.

Why does Resharper suggest that I simplify "not any equal" to "all not equal"? [duplicate]

I need to check whether an item doesn't exist in a list of items in C#, so I have this line:
if (!myList.Any(c => c.id == myID))
Resharper is suggesting that I should change that to:
if (myList.All(c => c.id != myID))
I can see that they are equivalent, but why is it suggesting the change? Is the first implementation slower for some reason?
The readability of the expression is to me a personal opinion.
I would read this
if (!myList.Any(c => c.id == myID))
as 'is my item not in the collection'. Whereas this
if (myList.All(c => c.id != myID))
reads as 'are all items in the collection different than my item'.
If the 'question' I want to ask -through my linq query- is 'is my item not in the list', then the first query better suits the question that I want to ask. The ! in front of the first query is not a problem.
It's all too easy to miss the ! at the start of the expression in your first example. You are therefore making the expression difficult to read. In addition, the first example reads as "not any equal to", whereas the second is "all not equal to". It's no coincidence that the easier to read code can be expressed as easier to read English.
Easier to read code is likely to be less buggy, as it's easier to understand what it does before changing it. It's because the second example is clearer that ReSharper recommends changing your code.
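On the performance part of the question: the two forms are logically equivalent, and both stop at the first element whose id equals myID, so the choice is about readability rather than speed. A minimal sketch, using plain int ids in place of the asker's objects (the Equivalence class is hypothetical):

```csharp
using System.Collections.Generic;
using System.Linq;

public static class Equivalence
{
    // "Not any equal": stops as soon as one element equals id.
    public static bool NotAny(IEnumerable<int> ids, int id) => !ids.Any(x => x == id);

    // "All not equal": also stops as soon as one element equals id.
    public static bool AllNot(IEnumerable<int> ids, int id) => ids.All(x => x != id);
}
```

Both return true for an empty list, since an empty list contains no element equal to id.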
In general asking a positive question is more intuitive. If you asked the user "Do you really not want to delete this record?", guess how often he will hit the wrong button.
I personally like to turn constructs like this around:
// Not optimal
if (!x) {
    A();
} else {
    B();
}
// Better
if (x) {
    B();
} else {
    A();
}
An exception might be the test for not null where a != null might be perceived as positive.

Is using LINQ against a single object considered a bad practice? [closed]

I don't mean this question to be too subjective.
I googled this for some time but found no specific answers that address this issue. The thing is, I think I'm getting somewhat addicted to LINQ. I have already used LINQ to query lists, among other things like LINQ to SQL, LINQ to XML, and so on. But then something struck me: "What if I used it to query a single object?" So I did. It may seem wrong, like trying to kill a fly with a grenade launcher. Though we all agree it would be artistically pleasant to see.
I consider it very readable, and I don't think there are any performance issues with this, but let me show you an example.
In a web application, I need to retrieve a setting from my configuration file (web.config). But this should have a default value if the key is not present. Also, the value I need is a decimal, not a string, which is the default return from ConfigurationManager.AppSettings["myKey"]. Also, my number should not be more than 10 and it should not be negative. I know I could write this:
string cfg = ConfigurationManager.AppSettings["myKey"];
decimal bla;
if (!decimal.TryParse(cfg, out bla))
{
    bla = 0; // 0 is the default value
}
else
{
    if (bla < 0 || bla > 10)
    {
        bla = 0;
    }
}
Which is not complicated, not convoluted, and easy to read. However, this is how I like it done:
// Initialize it so the compiler doesn't complain when it is selected later
decimal awesome = 0;
// Use Enumerable.Repeat to get a "singleton" IEnumerable<string>
// that is fed with the value read from app settings
awesome = Enumerable.Repeat(ConfigurationManager.AppSettings["myKey"], 1)
    // Is it parseable? Grab it
    .Where(value => decimal.TryParse(value, out awesome))
    // This is a little trick: select the variable itself, since it has been assigned by TryParse
    // Also, from now on I'm working with an IEnumerable<decimal>
    .Select(value => awesome)
    // Check the other constraints
    .Where(number => number >= 0 && number <= 10)
    // If the previous "Where"s weren't matched, the IEnumerable is empty, so get the default value
    .DefaultIfEmpty(0)
    // Return the value from the IEnumerable
    .Single();
Without the comments, it looks like this:
decimal awesome = 0;
awesome = Enumerable.Repeat(ConfigurationManager.AppSettings["myKey"], 1)
    .Where(value => decimal.TryParse(value, out awesome))
    .Select(value => awesome)
    .Where(number => number >= 0 && number <= 10)
    .DefaultIfEmpty(0)
    .Single();
I don't know if I'm the only one here, but I feel the second method is much more "organic" than the first one. It's not easily debuggable, because of LINQ, but it's pretty failproof I guess. At least this one I wrote. Anyway, if you needed to debug, you could just add curly braces and return statements inside the linq methods and be happy about it.
I've been doing this for a while now, and it feels much more natural than doing things "line per line, step by step". Plus, I just specified the default value once. And it's written in a line which says DefaultIfEmpty so it's pretty straightforward.
Another plus, I definitely don't do it if I notice the query will be much larger than the one I wrote up there. Instead, I break into smaller chunks of linq glory so it will be easier to understand and debug.
I find it easier to see a variable assignment and automatically think "this is what you had to do to set this value", rather than look at ifs, elses, switches, etc., and try to figure out whether they're part of the formula or not.
And it prevents developers from writing undesired side effects in wrong places, I think.
But in the end, some could say it looks very hackish, or too arcane.
So I come with the question at hand:
Is using LINQ against a single object considered a bad practice?
I say yes, but it's really up to preference. It definitely has disadvantages, but I will leave that up to you. Your original code can become much simpler though.
string cfg = ConfigurationManager.AppSettings["myKey"];
decimal bla;
if (!decimal.TryParse(cfg, out bla) || bla < 0 || bla > 10)
    bla = 0; // 0 is the default value
This works because of "short circuit" evaluation: the || operator stops checking the remaining conditions as soon as one of them is true, so bla is only compared against the bounds when parsing succeeded.
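Wrapped in a method, the same check can be sketched as follows. The Settings class and ParseBounded name are made up, the raw string stands in for the value read from ConfigurationManager.AppSettings, and parsing with the invariant culture is an assumption (the snippet above uses the current culture).

```csharp
using System.Globalization;

public static class Settings
{
    // Returns the parsed value when it lies in [0, 10], otherwise the default 0.
    public static decimal ParseBounded(string raw)
    {
        // Assumption: invariant culture, so "7.5" always means seven and a half.
        // The range checks only run when TryParse succeeded.
        if (!decimal.TryParse(raw, NumberStyles.Number, CultureInfo.InvariantCulture, out decimal value)
            || value < 0 || value > 10)
        {
            return 0m;
        }
        return value;
    }
}
```

A missing key (null), an unparseable string and an out-of-range number all fall through to the same default, which is written only once.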

String parsing and matching algorithm

I am solving the following problem:
Suppose I have a list of software packages whose names might look like this (the only known thing is that these names are formed like SOMETHING + VERSION, meaning that the version always comes after the name):
Efficient.Exclusive.Zip.Archiver-PROPER.v.122.24-EXTENDED
Efficient.Exclusive.Zip.Archiver.123.01
Efficient-Exclusive.Zip.Archiver(2011)-126.24-X
Zip.Archiver14.06
Zip-Archiver.v15.08-T
Custom.Zip.Archiver1.08
Custom.Zip.Archiver1
Now, I need to parse this list and select only latest versions of each package. For this example the expected result would be:
Efficient-Exclusive.Zip.Archiver(2011)-126.24-X
Zip-Archiver.v15.08-T
Custom.Zip.Archiver1.08
The approach I currently use can be described the following way:
Split the initial strings into groups by their starting letter,
ignoring spaces, case and special symbols.
(`E`, `Z`, `C` for the example list above)
Foreach element {
Apply the regular expression (or a set of regular expressions),
which tries to deduce the version from the string and perform
the following conversion `STRING -> (VERSION, STRING_BEFORE_VERSION)`
// Example for this step:
// 'Efficient.Exclusive.Zip.Archiver-PROPER.v.122.24-EXTENDED' ->
// (122.24, Efficient.Exclusive.Zip.Archiver-PROPER)
Search through the corresponding group (in this example - the 'E' group)
and find all other strings which start with the 'STRING_BEFORE_VERSION' or
with its significant part. This comparison is performed in ignore-case and
ignore-special-symbols mode.
// The matches for this step:
// Efficient.Exclusive.Zip.Archiver-PROPER, {122.24}
// Efficient.Exclusive.Zip.Archiver, {123.01}
// Efficient-Exclusive.Zip.Archiver, {126.24, 2011}
// The last one will get picked, because year is ignored.
Get the possible version from each match, ***pick the latest, yield that match.***
Remove every possible match (including the initial element) from the list.
}
This algorithm (as I assume) should run in something like O(N * V + N lg N * M), where M stands for the average string matching time and V stands for the running time of the version regexp.
However, I suspect there is a better solution (there always is!), maybe specific data structure or better matching approach.
If you can suggest something or make some notes on the current approach, please do not hesitate to do this.
How about this? (Pseudo-Code)
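Dictionary<string,string> latestPackages=new Dictionary<string,string>(packageNameComparer);
foreach element
{
    (package,version)=applyRegex(element);
    // isNewer compares the parsed version with the one stored so far
    if(!latestPackages.ContainsKey(package) || isNewer(version, latestPackages[package]))
    {
        latestPackages[package]=version;
    }
}
//print out latestPackages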
//print out latestPackages
Dictionary operations are O(1), so you have O(n) total runtime. No pre-grouping necessary and instead of storing all matches, you only store the one which is currently the newest.
Dictionary has a constructor which accepts an IEqualityComparer object. There you can implement your own semantics of equality between package names. Keep in mind, however, that you need to implement a GetHashCode method in this IEqualityComparer which returns the same value for objects that you consider equal. To reproduce the example above you could return a hash code for the first character in the string, which would reproduce the grouping you had inside your dictionary. However, you will get better performance with a smarter hash code that doesn't have so many collisions, maybe by using more characters if that still yields good results.
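A runnable sketch of the dictionary idea, under several assumptions: the regex step has already split each entry into a name and a System.Version, names count as equal when they match after dropping everything except letters and digits and ignoring case, and the class names (PackageNameComparer, LatestPackages) are made up. It does not reproduce the prefix matching from the question, so a "-PROPER" variant would stay a separate package.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical comparer: ignores case and every character that is not a letter or digit.
public sealed class PackageNameComparer : IEqualityComparer<string>
{
    private static string Normalize(string name) =>
        new string(name.Where(c => char.IsLetterOrDigit(c)).ToArray()).ToUpperInvariant();

    public bool Equals(string x, string y) => Normalize(x) == Normalize(y);

    // Equal names must produce equal hash codes, so hash the normalized form.
    public int GetHashCode(string name) => Normalize(name).GetHashCode();
}

public static class LatestPackages
{
    // Keeps only the highest version seen for each package name, in one pass.
    public static Dictionary<string, Version> Pick(IEnumerable<(string Name, Version Version)> parsed)
    {
        var latest = new Dictionary<string, Version>(new PackageNameComparer());
        foreach (var (name, version) in parsed)
        {
            if (!latest.TryGetValue(name, out var current) || version > current)
            {
                latest[name] = version;
            }
        }
        return latest;
    }
}
```

TryGetValue replaces the ContainsKey-plus-indexer pair from the pseudo-code, so each element costs one lookup and, when it is newer, one write.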
I think you could probably use a DAWG (http://en.wikipedia.org/wiki/Directed_acyclic_word_graph) here to good effect. I think you could simply cycle down each node till you hit one that has only 1 "child". On this node, you'll have common prefixes "up" the tree and version strings below. From there, parse the version strings by removing everything that isn't a digit or a period, splitting the string by the period and converting each element of the array to an integer. This should give you an int array for each version string. Identify the highest version, record it and travel to the next node with only 1 child.
EDIT: Populating a large DAWG is a pretty expensive operation but lookup is really fast.

Why does Enumerable.All return true for an empty sequence? [duplicate]

var strs = new Collection<string>();
bool b = strs.All(str => str == "ABC");
The code creates an empty collection of string, then tries to determine if all the elements in the collection are "ABC".
If you run it, b will be true.
But the collection does not even have any elements in it, let alone any elements that are equal to "ABC".
Is this a bug, or is there a reasonable explanation?
It's certainly not a bug. It's behaving exactly as documented:
true if every element of the source sequence passes the test in the specified predicate, or if the sequence is empty; otherwise, false.
Now you can argue about whether or not it should work that way (it seems fine to me; every element of the sequence conforms to the predicate) but the very first thing to check before you ask whether something is a bug, is the documentation. (It's the first thing to check as soon as a method behaves in a way other than what you expected.)
All requires the predicate to be true for all elements of the sequence. This is explicitly stated in the documentation. It's also the only thing that makes sense if you think of All as being like a logical "and" between the predicate's results for each element. The true you're getting out for the empty sequence is the identity element of the "and" operation. Likewise, the false you get from Any for the empty sequence is the identity for logical "or".
If you think of All as "there are no elements in the sequence that are not", this might make more sense.
It is true, as nothing (no condition) makes it false.
The docs probably explain it. (Jon Skeet also mentioned something a few years back)
Same goes for Any (the opposite of All) returning false for empty sets.
Edit:
You can imagine All to be implemented semantically the same as:
foreach (var e in elems)
{
    if (!cond(e))
        return false;
}
return true; // no escape from loop
Most answers here seem to go along the lines of "because that's how it is defined". But there is also a logical reason why it is defined this way.
When defining a function, you want your function to be as general as possible, such that it can be applied to the largest possible number of cases. Say, for instance, that I want to define the Sum function, which returns the sum of all the numbers in a list. What should it return when the list is empty? If you'd return an arbitrary number x, you'd define the function as the:
Function that returns the sum of all numbers in the given list, or x if the list is empty.
But if x is zero, you can also define it as the
Function that returns x plus the given numbers.
Note that definition 2 implies definition 1, but 1 does not imply 2 when x is not zero, which by itself is enough reason to pick 2 over 1. But also note that 2 is more elegant and, in its own right, more general than 1. It is like placing a spotlight farther away so that it lights a larger area. A lot larger, actually. I'm not a mathematician myself, but I'm sure they'll find a ton of connections between definition 2 and other mathematical concepts, and not so many related to definition 1 when x is not zero.
In general, you can, and most likely want to, return the identity element (the one that leaves the other operand unchanged) whenever you have a function that applies a binary operator over a set of elements and the set is empty. This is the same reason a Product function will return 1 when the list is empty (note that you could just replace "x plus" with "one times" in definition 2). And it is the same reason All (which can be thought of as the repeated application of the logical AND operator) will return true when the list is empty (p && true is equivalent to p), and the same reason Any (the OR operator) will return false.
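The identity-element argument can be written down directly with Enumerable.Aggregate, which takes the seed (the identity) as its first argument. This is only an illustration of the reasoning, not how All and Any are implemented; unlike the real operators, these folds do not stop early. The Folds class and its method names are made up.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Folds
{
    // AND folded over the predicate results, seeded with the identity of AND (true).
    public static bool AllViaFold<T>(IEnumerable<T> source, Func<T, bool> predicate) =>
        source.Aggregate(true, (acc, item) => acc && predicate(item));

    // OR folded over the predicate results, seeded with the identity of OR (false).
    public static bool AnyViaFold<T>(IEnumerable<T> source, Func<T, bool> predicate) =>
        source.Aggregate(false, (acc, item) => acc || predicate(item));
}
```

For an empty sequence Aggregate simply returns the seed, which is why the empty case comes out as true for the AND fold and false for the OR fold.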
The method cycles through all elements until it finds one that does not satisfy the condition, or finds none that fail. If none fail, true is returned.
So, if there are no elements, true is returned (since there were none that failed)
Here is an extension that can do what OP wanted to do:
static bool All<T>(this IEnumerable<T> source, Func<T, bool> predicate, bool mustExist)
{
    foreach (var e in source)
    {
        if (!predicate(e))
            return false;
        mustExist = false;
    }
    return !mustExist;
}
...and as others have pointed out already this is not a bug but well-documented intended behavior.
An alternative solution if one does not wish to write a new extension is:
strs.DefaultIfEmpty().All(str => str == "ABC");
PS: The above does not work if looking for the default value itself!
(Which for strings would be null.)
In such cases it becomes less elegant with something similar to:
strs.DefaultIfEmpty(string.Empty).All(str => str == null);
If you can enumerate more than once the easiest solution is:
strs.All(predicate) && strs.Any();
i.e. simply add a check afterwards that there actually were any elements.
Setting the implementation aside: does it really matter whether it is true? Suppose you have code that iterates over the enumerable and does some work. If All() is true, that code still won't run, since the enumerable has no elements in it.
var hungryDogs = Enumerable.Empty<Dog>();
bool allAreHungry = hungryDogs.All(d => d.Hungry);
if (allAreHungry)
    foreach (Dog dog in hungryDogs)
        dog.Feed(biscuits); // this line will not run anyway
