Why do we need Single() in LINQ? - c#

What is the main purpose of the extension method Single()?
I know it will throw an exception if more than one element in the sequence matches the predicate, but I still don't understand in which context it could be useful.
Edit:
I do understand what Single is doing, so you don't need to explain in your answer what this method does.

It's useful for declaratively stating
I want the single element in the list and if more than one item matches then something is very wrong
There are many times when programs need to reduce a set of elements to the one that is interesting based on a particular predicate. If more than one matches, it indicates an error in the program. Without the Single method, a program would need to traverse parts of the potentially expensive list more than once.
Compare
Item i = someCollection.Single(thePredicate);
To
Contract.Requires(someCollection.Where(thePredicate).Count() == 1);
Item i = someCollection.First(thePredicate);
The latter requires two statements and iterates a potentially expensive list twice. Not good.
Note: yes, First is potentially faster because it only has to iterate the enumeration up until the first element that matches; the rest of the elements are of no consequence. On the other hand, Single must consider the entire enumeration. If multiple matches are of no consequence to your program and indicate no programming error, then yes, use First.
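A minimal sketch of the difference (the list and predicate here are made up for illustration):
using System.Collections.Generic;
using System.Linq;

var numbers = new List<int> { 1, 2, 3, 4, 6 };
// First stops at the first match and returns 2.
int first = numbers.First(n => n % 2 == 0);
// Single scans the whole list and throws InvalidOperationException,
// because 2, 4 and 6 all match - which would signal a bug in our assumptions.
int single = numbers.Single(n => n % 2 == 0);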

Using Single allows you to document your expectations on the number of results, and to fail early, fail hard if they are wrong. Unless you enjoy long debugging sessions for their own sake, I'd say it's enormously useful for increasing the robustness of your code.

Every LINQ operator returns a sequence, i.e. an IEnumerable<T>. To get an actual element, you need one of the First, Last, or Single methods - you use the latter if you know for sure the sequence contains exactly one element. An example would be a 1:1 ID:Name mapping in a database.
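For example, with a hypothetical in-memory ID:Name mapping where IDs are assumed unique:
using System.Linq;

var people = new[]
{
    new { Id = 1, Name = "Ann" },
    new { Id = 2, Name = "Bob" }
};
// Exactly one match expected; throws if there are zero or several.
string name = people.Single(p => p.Id == 2).Name; // "Bob"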

Single will return a single instance of the class/object and not a collection. Very handy when you fetch a single record by Id and never expect more than one row.

Related

Is it possible to provide an ordering guarantee for a collection?

I'm trying to create a method which (other than in name) shows that the ordering of some collection will be preserved.
I have considered SortedList, but dismissed it due to the requirement of holding a key. I have also dismissed other Sorted types for similar reasons, and SortedSet because LINQ returns IEnumerable instead of another SortedSet when you operate on it.
I don't mind if a new type is required, or I need to write methods in a specific way. The goal here is to highlight methods which preserve the input order of a collection when operating upon it.
I had thought about adding a custom attribute and just trusting that it will be used correctly, but I would ideally like to find something in the language which is more explicit.
-- Edit
It's not so much the order of the elements in the collection (I could use an IEnumerable) as some operation on the input collection. Let's say I were returning the root of all the numbers in an array: instead of returning (root, number)[] or (root, index)[], I want to return root[] and make it clear to the user that the order of the elements in the returned array matches the order of the elements in the input parameter.
No, there is nothing in C# or .NET that lets you express and enforce "this method does not change the order of elements in a collection / while iterating through a collection".
The conventional expectation is that the order of elements stored in a collection is preserved while iterating, unless the method/class is explicitly named to indicate reordering.
Examples of "no reordering":
for / foreach
.Select, .First, .Take, .SelectMany, .Where
indexing into a collection that is not called "SortedXxxx", e.g. List or an array.
Examples of "does reordering":
List.Sort, List.Reverse
.OrderBy, .ThenBy
classes that don't preserve/guarantee ordering, like HashSet and Dictionary, or that sort on insertion, like SortedList
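A small demonstration of both behaviors (array contents chosen arbitrarily):
using System.Linq;

int[] input = { 3, 1, 2 };
// Where and Select preserve the input order.
var filtered = input.Where(n => n > 1);   // 3, 2
var squared = input.Select(n => n * n);   // 9, 1, 4
// OrderBy deliberately reorders.
var sorted = input.OrderBy(n => n);       // 1, 2, 3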
Sounds like you want a queue. The first object added will be the first removed, so order of insertion is preserved. Typically objects are processed out of the queue with the Dequeue() method, but there is a Peek() method if you don't want to remove from the collection.
Beyond that, you'll probably need to roll your own implementation. It would likely just be a wrapper around a List<T> where you prevent anything from being inserted, as sketched below.
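A minimal sketch of such a wrapper (the type name OrderPreservingList is made up; it deliberately exposes only order-preserving operations):
using System.Collections;
using System.Collections.Generic;

public sealed class OrderPreservingList<T> : IEnumerable<T>
{
    private readonly List<T> items = new List<T>();

    public void Add(T item) => items.Add(item);   // append only - no Insert, no Sort
    public T this[int index] => items[index];     // read-only indexed access
    public IEnumerator<T> GetEnumerator() => items.GetEnumerator();
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}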

Intersection of two sets in the most optimized way

Given two sets of values, I have to find whether there is any common element among them or not i.e. whether their intersection is null or not.
Which of the standard C# collections will suit best (in terms of performance) for this purpose? I know that LINQ has an Intersect extension method to find the intersection of two lists/arrays, but my focus is on performance in terms of Big-O notation.
And what if I have to find out the intersection of two sets as well?
Well, if you use LINQ's Intersect method it will build up a HashSet of the second sequence, and then check each element of the first sequence against it. So it's O(M+N)... and you can use foo.Intersect(bar).Any() to get an early-out.
Of course, if you store one (either) set in a HashSet<T> to start with, you can just iterate over the other one checking for containment on each step. You'd still need to build the set to start with though.
Fundamentally you've got an O(M+N) problem whatever you do - you're not going to get cheaper than that (there's always the possibility that you'll have to look at every element) and if your hash codes are reasonable, you should be able to achieve that complexity easily. Of course, some solutions may give better constant factors than others... but that's performance rather than complexity ;)
EDIT: As noted in the comments, there's also ISet<T>.Overlaps - if you've already got either set with a static type of ISet<T> or a concrete implementation, calling Overlaps makes it clearer what you're doing. If both of your sets are statically typed as ISet<T>, use larger.Overlaps(smaller) (where larger and smaller are in terms of the size of the set) as I'd expect an implementation of Overlaps to iterate over the argument and check each element against contents of the set you call it on.
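For example, with two hash sets of arbitrary contents:
using System.Collections.Generic;
using System.Linq;

var a = new HashSet<int> { 1, 2, 3 };
var b = new HashSet<int> { 3, 4, 5 };

// LINQ: builds a set from b, then streams a; Any() stops at the first common element.
bool overlapLinq = a.Intersect(b).Any();  // true

// ISet<T>: clearer intent, no intermediate sequence.
bool overlapSet = a.Overlaps(b);          // true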
As mentioned, applying Any() will give you some performance improvement.
I tested it on a pretty big dataset and it gave a 25% improvement.
Applying larger.Intersect(smaller) rather than the opposite is also very important; in my case, it gave a 35% improvement.
Ordering the list before applying Intersect gave another 7-8%.
Another thing to keep in mind is that, depending on the use case, you can completely avoid applying Intersect.
For example, for an integer list, if the [minimum, maximum] ranges of the two lists don't overlap, you don't need to apply Intersect at all, since no element can be common.
The same goes for a string list, with the same idea applied to the first letter.
Again, depending on your case, try as much as possible to find a cheap rule that proves intersection is impossible, so you can avoid calling it.
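A sketch of that early-out guard (list contents are arbitrary; both lists are assumed non-empty):
using System.Collections.Generic;
using System.Linq;

var xs = new List<int> { 1, 2, 3 };
var ys = new List<int> { 10, 11, 12 };

// If the value ranges don't overlap, no element can be common,
// so we can skip Intersect entirely.
bool rangesOverlap = xs.Min() <= ys.Max() && ys.Min() <= xs.Max();
bool anyCommon = rangesOverlap && xs.Intersect(ys).Any();  // false here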

Is .Select<T>(...) to be preferred over .Where<T>(...)?

I got into a discussion with two colleagues regarding a setup for an iteration over an IEnumerable (the contents of which will not be altered in any way during the operation). There are three conflicting theories on which is the optimal approach. Both of the others (and I as well) are very certain, and that got me unsure, so for the sake of clarity, I want to check with an external source.
The scenario is as follows: we had the code below as a starting point and discovered that some of the hazaas need not be acted upon. So, starting with the code below, we started to add a blocker for the action.
foreach(Hazaa hazaa in hazaas) ;
My suggestion is as follows.
foreach(Hazaa hazaa in hazaas.Where(element => condition)) ;
One of the guys wants to resolve it by a more explicit form, claiming that LINQ is not appropriate in this case (not sure why that'd be so, but he seems to be very convinced). His solution is this.
foreach(Hazaa hazaa in hazaas)
    if(condition) ;
The other contra-suggestion is supported by the claim that Where risks repeating the filtering process needlessly and that it's more certain to minimize the computational workload by picking the appropriate elements once and for all with Select.
foreach(Hazaa hazaa in hazaas.Select(element => condition)) ;
I argue that the first is obsolete, since LINQ can handle data objects quite well.
I also believe that Select-ing is in this case equivalently fast to Where-ing and no needless steps will be taken (e.g. the evaluation of the condition on the elements will only be performed once). If anything, it should be faster using Where because we won't be creating an extra instance of anything.
Who's right?
Select is inappropriate. It doesn't filter anything.
if is a possible solution, but Where is just as explicit.
Where executes the condition exactly once per item, just as the if. Additionally, it is important to note that the call to Where itself doesn't iterate the list - execution is deferred until you enumerate. So, using Where you iterate the list exactly once, just like when using if.
I think you are discussing with one person that didn't understand LINQ - the guy that wants to use Select - and one that doesn't like the functional aspect of LINQ.
I would go with Where.
The .Where() and the if(condition) approach will be the same.
But since LINQ is nicely readable, I'd prefer that.
The approach with .Select() is nonsense, since it will not return the Hazaa objects but an IEnumerable<Boolean>.
To be clear about the functions:
myEnumerable.Where(a => isTrueFor(a)) //This is filtering
myEnumerable.Select(a => a.b) //This is projection
Where() will run a function that returns a Boolean for each item of the enumerable, and yield or skip the item depending on the result of that Boolean function.
Select() will run a function for every item in the list and return the result of the function without doing any filtering.
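A quick demonstration of the difference (made-up numbers):
using System.Linq;

int[] numbers = { 1, 2, 3, 4 };
// Where filters: an IEnumerable<int> containing only the matching elements.
var evens = numbers.Where(n => n % 2 == 0);   // 2, 4
// Select projects: here an IEnumerable<bool>, one value per input element.
var flags = numbers.Select(n => n % 2 == 0);  // false, true, false, true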

Resharper: Possible Multiple Enumeration of IEnumerable

I'm using the new Resharper version 6. In several places in my code it has underlined some text and warned me that there may be a Possible multiple enumeration of IEnumerable.
I understand what this means, and have taken the advice where appropriate, but in some cases I'm not sure it's actually a big deal.
Like in the following code:
var properties = Context.ObjectStateManager.GetObjectStateEntry(this).GetModifiedProperties();
if (properties.Contains("Property1") || properties.Contains("Property2") || properties.Contains("Property3")) {
...
}
It's underlining each mention of properties on the second line, warning that I am enumerating over this IEnumerable multiple times.
If I add .ToList() to the end of line 1 (turning properties from an IEnumerable<string> into a List<string>), the warnings go away.
But surely, if I convert it to a List, then it will enumerate over the entire IEnumerable to build the List in the first place, and then enumerate over the List as required to find the properties (i.e. 1 full enumeration, and 3 partial enumerations). Whereas in my original code, it is only doing the 3 partial enumerations.
Am I wrong? What is the best method here?
I don't know exactly what your properties really is here - but if it's essentially representing an unmaterialized database query, then your if statement will perform three queries.
I suspect it would be better to do:
string[] propertiesToFind = { "Property1", "Property2", "Property3" };
if (properties.Any(x => propertiesToFind.Contains(x)))
{
...
}
That will logically only iterate over the sequence once - and if there's a database query involved, it may well be able to just use a SQL "IN" clause to do it all in the database in a single query.
If you invoke Contains() on an IEnumerable, it will invoke the extension method, which simply iterates through the items in order to find it. IList has a real implementation of Contains() that is probably more efficient than a regular iteration through the values (it might have a search tree with hashes?), hence ReSharper doesn't warn with IList.
Since the extension method is only aware that it's an IEnumerable, it probably cannot utilize any built-in methods for Contains(), even though it would be possible in theory to identify known types and cast them accordingly in order to utilize them.
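For example, materializing into a HashSet<string> addresses both concerns (a sketch; Context and the property names are taken from the question):
using System.Collections.Generic;

var modified = new HashSet<string>(
    Context.ObjectStateManager.GetObjectStateEntry(this).GetModifiedProperties());
// One enumeration to build the set; each Contains is then an O(1) hash lookup.
if (modified.Contains("Property1") || modified.Contains("Property2") || modified.Contains("Property3")) {
    ...
}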

Understanding how the C# compiler deals with chaining LINQ methods

I'm trying to wrap my head around what the C# compiler does when I'm chaining linq methods, particularly when chaining the same method multiple times.
Simple example: Let's say I'm trying to filter a sequence of ints based on two conditions.
The most obvious thing to do is something like this:
IEnumerable<int> Method1(IEnumerable<int> input)
{
return input.Where(i => i % 3 == 0 && i % 5 == 0);
}
But we could also chain the where methods, with a single condition in each:
IEnumerable<int> Method2(IEnumerable<int> input)
{
return input.Where(i => i % 3 == 0).Where(i => i % 5 == 0);
}
I had a look at the IL in Reflector; it is obviously different for the two methods, but analysing it further is beyond my knowledge at the moment :)
I would like to find out:
a) what the compiler does differently in each instance, and why.
b) are there any performance implications (not trying to micro-optimize; just curious!)
The answer to (a) is short, but I'll go into more detail below:
The compiler doesn't actually do the chaining - it happens at runtime, through the normal organization of the objects! There's far less magic here than what might appear at first glance - Jon Skeet recently completed the "Where clause" step in his blog series, Re-implementing LINQ to Objects. I'd recommend reading through that.
In very short terms, what happens is this: each time you call the Where extension method, it returns a new WhereEnumerable object that has two things - a reference to the previous IEnumerable (the one you called Where on), and the lambda you provided.
When you start iterating over this WhereEnumerable (for example, in a foreach later down in your code), internally it simply begins iterating on the IEnumerable that it has referenced.
"This foreach just asked me for the next element in my sequence, so I'm turning around and asking you for the next element in your sequence".
That goes all the way down the chain until we hit the origin, which is actually some kind of array or storage of real elements. As each Enumerable then says "OK, here's my element" passing it back up the chain, it also applies its own custom logic. For a Where, it applies the lambda to see if the element passes the criteria. If so, it allows it to continue on to the next caller. If it fails, it stops at that point, turns back to its referenced Enumerable, and asks for the next element.
This keeps happening until everyone's MoveNext returns false, which means the enumeration is complete and there are no more elements.
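A minimal sketch of what each link in the chain does (a simplified stand-in for the real Where, not the actual BCL implementation):
using System;
using System.Collections.Generic;

static class MyEnumerable
{
    // Wraps the source sequence and filters lazily while iterating;
    // nothing happens until the result is actually enumerated.
    public static IEnumerable<int> MyWhere(this IEnumerable<int> source, Func<int, bool> predicate)
    {
        foreach (int item in source)   // ask the previous link for its next element
            if (predicate(item))       // apply this link's own filter
                yield return item;     // pass the element up the chain
    }
}
// Method2's chain behaves like: input.MyWhere(i => i % 3 == 0).MyWhere(i => i % 5 == 0)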
To answer (b), there's always a difference, but here it's far too trivial to bother with. Don't worry about it :)
The first will use one iterator, the second will use two. That is, the first sets up a pipeline with one stage, the second will involve two stages.
Two iterators have a slight performance disadvantage compared to one.
