I'm struggling to get my head around LINQ and have come to the conclusion that searching through dozens of examples until I find one that is near to my own application in C# is not teaching me how to fish.
So back to the docs where I immediately hit a brick wall.
Can someone please help me decipher the Enumerable.Select method as presented here on msdn
http://msdn.microsoft.com/en-us/library/bb548891.aspx and given as a tip by Intellisense?
Enumerable.Select(TSource, TResult) Method (IEnumerable(TSource), Func(TSource, TResult))
Here is the same line broken down piece by piece, if it helps to refer:
Enumerable.Select
(TSource, TResult)
Method
(IEnumerable(TSource),
Func
(TSource, TResult))
It might help to look at the definition of this method in C#, from the MSDN article you refer to:
public static IEnumerable<TResult> Select<TSource, TResult>(
this IEnumerable<TSource> source,
Func<TSource, TResult> selector
)
The <angle brackets> denote the type parameters for this generic method, and we can start to explore the purpose of the method simply by looking at what the type parameters are doing.
We begin by looking at the name of the generic method:
Select<TSource, TResult>
This tells us that the method called Select deals with two different types:
The type TSource; and
The type TResult
Let's look at the parameters:
The first parameter is IEnumerable<TSource> source — a source, providing a TSource enumeration.
The second parameter is Func<TSource, TResult> selector — a selector function that takes a TSource and turns it into a TResult. (This can be verified by exploring the definition of Func)
Then we look at its return value:
IEnumerable<TResult>
We now know this method will return a TResult enumeration.
To summarise, we have a function that takes an enumeration of TSource, and a selector function that takes individual TSource items and returns TResult items, and then the whole select function returns an enumeration of TResult.
An example:
To put this into concrete terms, let's say that TSource is of type Person (a class representing a person, with a name, age, gender, etc.), and TResult is of type String (representing the person's name). We're going to give the Select function a list of Persons, and a function that, given a Person, will select just their name. As the output of calling this Select function, we will get a list of Strings containing just the names of the people.
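As a rough sketch of that example: the Person class, its properties, and the sample data below are hypothetical, made up purely for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical class used only for this illustration.
public class Person
{
    public string Name { get; set; }
    public int Age { get; set; }
}

public static class SelectNamesDemo
{
    // Here TSource is Person and TResult is string.
    public static List<string> NamesOf(IEnumerable<Person> people)
    {
        // The selector takes one Person and returns just the name.
        Func<Person, string> selector = person => person.Name;
        return people.Select(selector).ToList();
    }

    public static void Main()
    {
        var people = new List<Person>
        {
            new Person { Name = "Alice", Age = 30 },
            new Person { Name = "Bob", Age = 25 },
        };
        Console.WriteLine(string.Join(", ", NamesOf(people))); // Alice, Bob
    }
}
```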
Aside:
The last piece of the puzzle from the original method signature, at the top, is the this keyword before the first parameter. This is part of the syntax for defining Extension Methods, and all it essentially means is that instead of calling the static Select method (passing in your source enumeration, and selector function) you can just invoke the Select method directly on your enumeration, just as if it had a Select method (and pass in only one parameter — the selector function).
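The two calling styles can be compared side by side. This is only a sketch; the class and method names are invented for the illustration, while Enumerable.Select itself is the real framework method.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ExtensionCallDemo
{
    // Ordinary static call: the source is passed explicitly as the first argument.
    public static List<int> ViaStaticCall(IEnumerable<int> numbers)
    {
        return Enumerable.Select(numbers, n => n * 10).ToList();
    }

    // Extension-method call: same method, invoked as if it were defined on the sequence.
    public static List<int> ViaExtensionCall(IEnumerable<int> numbers)
    {
        return numbers.Select(n => n * 10).ToList();
    }

    public static void Main()
    {
        var numbers = new[] { 1, 2, 3 };
        Console.WriteLine(string.Join(", ", ViaStaticCall(numbers)));    // 10, 20, 30
        Console.WriteLine(string.Join(", ", ViaExtensionCall(numbers))); // 10, 20, 30
    }
}
```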
I hope this makes it clearer for you?
The way to think of Select is as mapping each element of a sequence. Hence:
Enumerable.Select<TSource, TResult>: the Select method is parameterised by its source and result types (the type of thing you are mapping and the type you are mapping it to)
IEnumerable<TSource>: the sequence of things to map
Func<TSource, TResult>: the mapping function, that will be applied to each element of the source sequence
The result being an IEnumerable<TResult>, a sequence of mapping results.
For example, you could imagine (as a trivial example) mapping a sequence of integers to the string representations:
IEnumerable<string> strings = ints.Select(i => i.ToString());
Here ints is the IEnumerable<TSource> (IEnumerable<int>) and i => i.ToString() is the Func<TSource, TResult> (Func<int, string>).
I'm of the opinion that the later chapters of C# in Depth do a good job of explaining LINQ, and what it all means. Plus the rest of the book teaches a lot of other very useful C# knowledge.
When you use most of the LINQ extension methods with EF Core, you are actually passing System.Linq.Expressions expression trees instead of the usual Func delegates.
So let's say you are using FirstOrDefault on a DbSet.
DbContext.Foos.FirstOrDefault(x=> x.Bar == true);
When you Ctrl + left-click on FirstOrDefault, it shows the following overload:
public static TSource FirstOrDefault<TSource>(this IQueryable<TSource> source, Expression<Func<TSource, bool>> predicate)
But there is also an overload for Func:
public static TSource FirstOrDefault<TSource>(this IEnumerable<TSource> source, Func<TSource, bool> predicate)
When you want to store an expression in a variable you can do something like the following:
Func<Entity, bool> predicate = x => x.Bar == true;
and
Expression<Func<Entity, bool>> expression = x => x.Bar == true;
So how does the compiler decide which overload should be used while using these extension methods?
The accepted answer is a reasonable explanation, but I thought I might provide a little more detail.
So let's say you are using FirstOrDefault on a DbSet. DbContext.Foos.FirstOrDefault(x=> x.Bar == true);
First off, I hope you would not write that. If you want to ask "is it raining?" do you ask "is it raining?" or do you ask "is the statement that it is raining a true statement?" Just say FirstOrDefault(x => x.Bar).
Next, given these overloads:
public static TSource FirstOrDefault<TSource>(
this IQueryable<TSource> source,
Expression<Func<TSource, bool>> predicate)
public static TSource FirstOrDefault<TSource>(
this IEnumerable<TSource> source,
Func<TSource, bool> predicate)
How does the compiler choose which overload is the best?
First we do type inference to determine what TSource is in each. The details of the type inference algorithm are complex; ask a more focussed question if you have a question about it.
If type inference fails to determine a type for TSource in either, the failed inference method is discarded from the set of candidates. In your example TSource can be determined to be Foo, presumably.
Next, of the candidates that remain, we check them for applicability of arguments to formals. That is, can we convert every supplied argument to its corresponding formal parameter type? (And of course, is the number of arguments provided correct, and so on.) In your example both methods are applicable.
Of the applicable candidates that remain, we now enter a round of betterness checking. How does betterness checking work? Again, we do it argument-by-argument. In this case we have two questions to answer:
DbContext.Foos can be converted to either IEnumerable<Foo> or IQueryable<Foo>. Which, if either, is the better conversion?
The lambda can be converted to either a delegate or an expression tree. Which, if either, is the better conversion?
The second question is easy to answer: neither is better. We learn nothing from this argument with respect to betterness.
To answer the first question, we apply the rule conversion to specific is better than conversion to general. If given a choice to convert to Giraffe or Mammal, converting to Giraffe is better. So now the question is which is more specific, IQueryable<Foo> or IEnumerable<Foo>?
The rule of specificity checking is straightforward: if X can be implicitly converted to Y but Y cannot be implicitly converted to X, then X is the more specific. A Giraffe can be used where an Animal is needed, but an Animal cannot be used where a Giraffe is needed, so Giraffe is more specific. Or: every giraffe is an animal, but not every animal is a giraffe, so giraffe is more specific.
By this measure, IQueryable<T> is more specific than IEnumerable<T> because every queryable is an enumerable but not every enumerable is a queryable.
So the queryable is more specific, and therefore that conversion is better.
Now we ask the question "is there a unique applicable candidate method where compared to every other candidate, at least one conversion was better and no conversion was worse?" There is; the queryable candidate has the property that it is better in one argument than every other candidate, and not worse in any other argument, and it is the unique method that has this property.
Therefore overload resolution chooses that method.
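The outcome of that betterness check can be observed with a small experiment. The Which extension methods below are hypothetical stand-ins that mirror the shape of the two FirstOrDefault overloads; they are not part of any library.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

public static class OverloadDemo
{
    // Mirrors the Queryable overload: IQueryable<T> source, expression-tree predicate.
    public static string Which<T>(this IQueryable<T> source, Expression<Func<T, bool>> predicate)
    {
        return "IQueryable / Expression";
    }

    // Mirrors the Enumerable overload: IEnumerable<T> source, delegate predicate.
    public static string Which<T>(this IEnumerable<T> source, Func<T, bool> predicate)
    {
        return "IEnumerable / Func";
    }

    public static void Main()
    {
        IQueryable<int> queryable = new[] { 1, 2, 3 }.AsQueryable();

        // Both overloads are applicable; the IQueryable<T> one wins on the first argument.
        Console.WriteLine(queryable.Which(x => x > 1));                // IQueryable / Expression

        // With a static type of IEnumerable<int>, only the second overload is applicable.
        Console.WriteLine(queryable.AsEnumerable().Which(x => x > 1)); // IEnumerable / Func
    }
}
```

Note that the lambda text is identical in both calls; only the static type of the source decides which overload is picked.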
I encourage you to read the specification if you have more questions.
Inherited class proximity matters more than exact method parameter types
Notice that the Expression<Func<T,bool>> variant applies to IQueryable<T>, whereas the Func<T, bool> variant applies to IEnumerable<T>.
When looking for a matching method, the compiler will always pick the one closest to the object's type. The inheritance hierarchy is as follows:
DbSet<T> : IQueryable<T> : IEnumerable<T>
Note: there may be other inheritances in between, but that doesn't matter. What matters is which is closest to DbSet<T>. IQueryable<T> is more closely related to DbSet<T> than IEnumerable<T>.
Therefore, the compiler will try to find a matching method in IQueryable<T>. It asks two questions:
Does this type have a method by that name?
Do the method parameter types match/map?
An extension method named FirstOrDefault is available for IQueryable<T>, so bullet point 1 is satisfied; and since x => x.MyBoolean can be implicitly converted to an Expression<Func<T, bool>>, bullet point 2 is also satisfied.
Therefore, you end up with the Expression<Func<T,bool>> variant defined on IQueryable<T>.
Suppose x => x.MyBoolean could not be implicitly converted to Expression<Func<T,bool>> but could be converted to Func<T,bool> (note: this isn't the case, but this could happen for other types/values), then bullet point 2 would not have been satisfied.
At this point, since the compiler did not find a match in IQueryable<T>, it will keep looking further, stumble on IEnumerable<T> and ask itself the same questions (bullet points). Both bullet points would have been satisfied.
Therefore, in this case, you would've ended up with the Func<T,bool> variant defined on IEnumerable<T>.
Update
Here's a dotnetfiddle example.
Notice that even though I pass int values (which the base method signature uses), the double signature of the Derived class fits (because int implicitly converts to double) and the compiler never looks in the Base class.
However, this isn't true in Derived2. Since int does not implicitly convert to string, there is no match found in Derived2, and the compiler looks further in Base and uses the int method from Base.
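The behaviour described for Base, Derived and Derived2 can be reproduced with a sketch along these lines. This is a reconstruction under assumptions, not the original fiddle: the Describe method name and the returned strings are invented for the illustration.

```csharp
using System;

public class Base
{
    public string Describe(int value) { return "Base(int)"; }
}

public class Derived : Base
{
    // Applicable to an int argument, because int converts implicitly to double.
    public string Describe(double value) { return "Derived(double)"; }
}

public class Derived2 : Base
{
    // Not applicable to an int argument: int does not convert implicitly to string.
    public string Describe(string value) { return "Derived2(string)"; }
}

public static class ProximityDemo
{
    public static void Main()
    {
        // An applicable method declared in Derived hides the exact int match in Base.
        Console.WriteLine(new Derived().Describe(1));  // Derived(double)

        // Nothing in Derived2 is applicable, so the method from Base is used.
        Console.WriteLine(new Derived2().Describe(1)); // Base(int)
    }
}
```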
I think the most useful place to look in the C# spec is Anonymous Function Expressions:
An anonymous function does not have a value or type in and of itself, but is convertible to a compatible delegate or expression tree type
...
In an implicitly typed parameter list, the types of the parameters are inferred from the context in which the anonymous function occurs—specifically, when the anonymous function is converted to a compatible delegate type or expression tree type, that type provides the parameter types.
Which then leads us to Anonymous Function Conversions:
A lambda expression F is compatible with an expression tree type Expression<D> if F is compatible with the delegate type D. Note that this does not apply to anonymous methods, only lambda expressions.
Those are the dry bits from the spec. However, it's also helpful to read Eric Lippert's How do we ensure that type inference terminates to pull bits together.
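To illustrate the quoted rule that the target type supplies the parameter types, here is a small sketch. The variable names are arbitrary; the point is that the same lambda text means different things depending on what it is converted to.

```csharp
using System;

public static class LambdaTypingDemo
{
    public static void Main()
    {
        // The lambda text is identical; the target delegate type decides what x is.
        Func<string, int> stringLength = x => x.Length; // x is a string
        Func<int[], int> arrayLength = x => x.Length;   // x is an int[]

        Console.WriteLine(stringLength("hello"));          // 5
        Console.WriteLine(arrayLength(new[] { 1, 2, 3 })); // 3

        // This does not compile: with no target type, nothing tells the compiler what x is.
        // var f = x => x.Length;
    }
}
```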
I was reading a solution on a coding exercise site and it was for determining if a sentence is a pangram and I came across this solution:
"abcdefghijklmnopqrstuvwxyz".All(input.ToLower().Contains);
For whatever reason, the parentheses after Contains are not needed and this compiles just fine. I'm fairly inexperienced with the intricacies of LINQ, so I was wondering if anyone can explain this or point me to an explanation.
The reason Contains does not need the parentheses is because you are passing the function as the parameter to the All function and not the result of the function. If you look at the definition of All you see:
public static bool All<TSource> (this System.Collections.Generic.IEnumerable<TSource> source,
Func<TSource,bool> predicate);
'All' is expecting a Func<TSource,bool>. In this case TSource is char so All is expecting the given parameter to be a reference to a function that receives a character and returns a boolean - which is exactly what Contains does.
You could also write it the following way and get the same result, although it looks a bit messier. (One difference: the method-group form evaluates input.ToLower() once, when the delegate is created, whereas the lambda below re-evaluates it for every character.)
"abcdefghijklmnopqrstuvwxyz".All(c => input.ToLower().Contains(c));
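Both spellings can be wrapped in small methods and compared. This is a minimal sketch; the class and method names are made up for the illustration.

```csharp
using System;
using System.Linq;

public static class PangramDemo
{
    private const string Alphabet = "abcdefghijklmnopqrstuvwxyz";

    // Method-group form: Contains is passed as a delegate, not called here.
    public static bool IsPangramMethodGroup(string input)
    {
        return Alphabet.All(input.ToLower().Contains);
    }

    // Lambda form: Contains is called explicitly for each character c.
    public static bool IsPangramLambda(string input)
    {
        return Alphabet.All(c => input.ToLower().Contains(c));
    }

    public static void Main()
    {
        string sentence = "The quick brown fox jumps over the lazy dog";
        Console.WriteLine(IsPangramMethodGroup(sentence)); // True
        Console.WriteLine(IsPangramLambda(sentence));      // True
        Console.WriteLine(IsPangramLambda("Hello world")); // False
    }
}
```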
First, look at the type of the argument that All expects.
The All method takes a Func<TSource, bool> predicate.
The All method is: public static bool All<TSource>(this IEnumerable<TSource> source, Func<TSource, bool> predicate)
And the Contains method is: public static bool Contains<TSource>(this IEnumerable<TSource> source, TSource value)
Here TSource is char, so Contains bound to input.ToLower() takes a char and returns a bool, which is exactly the shape of the predicate that All expects.
So we pass the Contains method itself as a delegate, not the result of calling Contains.
That is why we write:
"abcdefghijklmnopqrstuvwxyz".All(input.ToLower().Contains);
If we write: "abcdefghijklmnopqrstuvwxyz".All(input.ToLower().Contains());
then we are trying to pass the bool result of a Contains call (and a call with no argument at that) where All expects a delegate, so it does not compile.
If you do want to write the parentheses, wrap the call in a lambda:
"abcdefghijklmnopqrstuvwxyz".All(c => input.ToLower().Contains(c));
I have some questions / concerns about using LINQ in my projects. First question: is there a difference in performance between old (select Item from..) linq and new version (.Select(r => ..))?
Second question: how are LINQ expressions translated (and into what)? Will they be translated to the old syntax first and then to something else (intermediate language)?
There isn't any difference between the two ways we can write a linq query.
Specifically, this
var adults = from customer in customers
where customer.Age>18
select customer;
is equivalent to this:
var adults = customers.Where(customer=>customer.Age>18);
In fact, the compiler translates the first query into the second. The first form is essentially syntactic sugar. If you compile your code and inspect the IL with a disassembler, you will see that the query has been translated into the second form.
Queries written the first way are said to use query syntax, while queries written the second way are said to use fluent (method) syntax.
Is there a difference in performance between old (select Item from..) linq and new version (.Select(r => ..))?
Neither of these is older than the other, as both came into the language at the same time. If anything, .Select() could be argued to be older: while the method call will almost always be a call to an extension method (and hence only available since .NET 3.5 and only callable that way with C# 3.0), method calls in general have existed since 1.0.
There's no difference in performance, as they are different ways to say the same thing. (It's just about possible that you could find a case that resulted in a redundancy for one but not the other, but for the most part those redundancies are caught by the compiler and removed).
How are LINQ expressions translated (and into what)? Will they be translated to the old syntax first and then to something else (intermediate language)?
Consider that, as per the above, from item in someSource select item.ID and someSource.Select(item => item.ID) are the same thing. The compiler has to do two things:
Determine how the call should be made.
Determine how the lambda should be used in that.
These two go hand in hand. The first part is the same as with any other method call:
Look for a method defined on the type of someSource that is called Select() and takes one parameter of the appropriate type (I'll come to "appropriate type" in a minute).
If no method is found, look for a method defined on the immediate base of the type of someSource, and so on until you have no more base classes to examine (after reaching object).
If no method is found, look for an extension method defined on a static class that is available to use through a using which has its first (this) parameter the type of someSource, and its second parameter of the appropriate type that I said I'll come back to in a minute.
If no method is found, look for a generic extension method that can accept the types of someSource and the lambda as parameters.
If no method is found, do the above two steps for the base types of someSource and interfaces it implements, continuing to further base types or interfaces those interfaces extend.
If no method is found, raise a compiler error. Likewise, if any of the above steps finds two or more equally applicable methods in the same step, raise a compiler error.
So far this is the same as how "".IsNormalized() calls the IsNormalized() method defined on string, "".GetHashCode() calls the GetHashCode() method defined on object (though a later step means the override defined on string is what is actually executed) and "".GetType() calls the GetType() method defined on object.
Indeed we can see this in the following:
public class WeirdSelect
{
public int Select<T>(Func<WeirdSelect, T> ignored)
{
Console.WriteLine("Select Was Called");
return 2;
}
}
void Main()
{
int result = from whatever in new WeirdSelect() select whatever;
}
Here because WeirdSelect has its own applicable Select method, that is executed instead of one of the extension methods defined in Enumerable and Queryable.
Now, I hand-waved over "parameter of the appropriate type" above because the one complication that lambdas bring into this is that a lambda in C# code can be turned into either a delegate (in this case a Func<TSource, TResult> where TSource is the type of the lambda's parameter and TResult the type of the value it returns) or an expression (in this case an Expression<Func<TSource, TResult>>) in the produced CIL code.
As such, the method call resolution is looking for either a method that will accept a Func<TSource, TResult> (or a similar delegate) or one that will accept an Expression<Func<TSource, TResult>> (or a similar expression). If it finds both at the same stage in the search there will be a compiler error, hence the following will not work:
public class WeirdSelect
{
public int Select<T>(Func<WeirdSelect, T> ignored)
{
Console.WriteLine("Select Was Called");
return 2;
}
public int Select<T>(Expression<Func<WeirdSelect, T>> ignored)
{
Console.WriteLine("Select Was Called on expression");
return 1;
}
}
void Main()
{
int result = from whatever in new WeirdSelect() select whatever;
}
Now, 99.999% of the time we are either using select with something that implements IQueryable<T> or something that implements IEnumerable<T>. If it implements IQueryable<T> then the method call resolution will find public static IQueryable<TResult> Select<TSource, TResult>(this IQueryable<TSource> source, Expression<Func<TSource, TResult>> selector) defined in Queryable and if it implements IEnumerable<T> it will find public static IEnumerable<TResult> Select<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, TResult> selector) defined in Enumerable. It doesn't matter that IQueryable<T> derives from IEnumerable<T> because its method will be found in an earlier step in the process described above, before IEnumerable<T> is considered as a base interface.
Therefore 99.999% of the time there will be a call made to one of those two extension methods. In the IQueryable<T> case the lambda is turned into some code that produces an appropriate Expression which is then passed to the method (the query engine is then able to turn that into whatever code is appropriate, e.g. creating appropriate SQL queries if it's a database-backed query engine, or something else otherwise). In the IEnumerable<T> case the lambda is turned into an anonymous delegate which is passed to the method, which works a bit like:
public static IEnumerable<TResult> Select<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, TResult> selector)
{
//Simplifying a few things, this code is to show the idea only
foreach(var item in source)
yield return selector(item);
}
To come back to your question:
Will it be translated to old syntax first and then to something else (intermediate language)?
You could think of the newer from item in source select… syntax as being "turned into" the older source.Select(…) syntax (but not really older since it depends on extension methods over 99% of the time) because it makes the method call a bit clearer, but really they amount to the same thing. In the CIL produced, the differences depend on whether the call was an instance method or (as is almost always the case) an extension method, and even more so on whether the lambda is used to produce an expression or a delegate.
I am looking at code containing:
public virtual ICollection<T> GetPk(string pk)
{
Expression<Func<T, bool>> predicate = c => c.PartitionKey == pk;
return this.GetAll(predicate);
}
Can someone explain the syntax of <Func<T, bool>> ?
Simply put, Func<T, bool> is the signature of the anonymous method. The first type, T, is the input parameter type and the second type is the return type. Written out as a named method, your lambda would look something like this:
bool AnonMethod(T arg0)
{
return arg0.PartitionKey == pk;
}
One of the best explanations can be found on MSDN:
You can use this delegate to represent a method that can be passed as a parameter without explicitly declaring a custom delegate. The encapsulated method must correspond to the method signature that is defined by this delegate. This means that the encapsulated method must have one parameter that is passed to it by value, and that it must return a value.
In your example, T is the type of the input parameter and bool is the return type of the encapsulated method.
A Func<T, bool> represents a function that takes an object of type T and returns a bool. It's commonly referred to as a "predicate", and is used to verify a condition on an object.
An Expression<Func<T, bool>> represents the abstract syntax tree of the function, i.e. its syntactic structure. It can be used to analyse the code of the function for various purposes, such as transforming it to SQL to be executed against a database.
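The difference between the two can be seen in a short sketch. The predicate used here (a string longer than five characters) is an arbitrary example and not taken from the question.

```csharp
using System;
using System.Linq.Expressions;

public static class PredicateDemo
{
    public static void Main()
    {
        // As a delegate: compiled code that can only be invoked.
        Func<string, bool> isLong = s => s.Length > 5;
        Console.WriteLine(isLong("predicate")); // True

        // As an expression tree: a data structure describing the same code.
        Expression<Func<string, bool>> isLongTree = s => s.Length > 5;
        Console.WriteLine(isLongTree.Body.NodeType);      // GreaterThan
        Console.WriteLine(isLongTree.Parameters[0].Name); // s

        // A tree can be compiled into a delegate when it needs to be executed in memory.
        Func<string, bool> compiled = isLongTree.Compile();
        Console.WriteLine(compiled("short")); // False
    }
}
```

A query provider would walk the tree (node types, parameters, constants) instead of compiling it, which is how it can produce SQL or another query language.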
I always find MSDN to be worth checking on things like this first,
http://msdn.microsoft.com/en-us/library/bb549151.aspx
Basically, you're declaring a method that matches a signature, that can then be passed in to the call to get the data.
It is confusing at first but Func<T, bool> describes a function that returns a bool and accepts a parameter as type T.
In this case, T is an object that has a PartitionKey property and this GetPk method is using the Func<T, bool> to match all the T items in the instance object which have a PartitionKey that matches the string pk.
For some background; prior to Func<T, TResult> (and the rest of this family) being part of the framework, you either had to explicitly define delegates or use anonymous methods.
Func and Action were added as part of the addition of lambda expressions to the language. They are the framework-defined delegates which lambda expressions are typed as, but which you as a developer can also use in place of your own custom delegate definitions.
You can get a nice history here;
http://blogs.msdn.com/b/ericwhite/archive/2006/10/03/lambda-expressions.aspx
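It's additional syntax so you know what goes in and out of the function.
Func<T, bool> means:
the function has one input of type T and one output of type bool.
Other variations of Func take more inputs, for example Func<T1, T2, TResult>; the last type parameter is always the return type.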
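A few members of the Func family, plus Action for comparison, as a quick sketch. The variable names and lambdas are arbitrary examples.

```csharp
using System;

public static class FuncFamilyDemo
{
    public static void Main()
    {
        // No inputs, returns int.
        Func<int> answer = () => 42;

        // One input (string), returns bool.
        Func<string, bool> isEmpty = s => s.Length == 0;

        // Two inputs (int, int), returns int; the last type argument is the return type.
        Func<int, int, int> add = (a, b) => a + b;

        // Action is the counterpart for methods that return nothing.
        Action<string> print = s => Console.WriteLine(s);

        Console.WriteLine(answer());      // 42
        Console.WriteLine(isEmpty("abc")); // False
        Console.WriteLine(add(2, 3));      // 5
        print("done");                     // done
    }
}
```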
I have an issue: I am allowing a user to select the criteria for ordering a List.
Let's say my list is called
List<Car> allCars = new List<Car>();
allCars = //call the database and get all the cars
I now want to order this list
allCars.orderBy(registrationDate)
I understand the above doesn't work, but I have no idea what I should be putting in the brackets.
allCars.OrderBy(c => c.RegistrationDate);
I understand the above doesn't work, but I have no idea what I should be putting in the brackets.
The declaration of Enumerable.OrderBy is
public static IOrderedEnumerable<TSource> OrderBy<TSource, TKey>(
this IEnumerable<TSource> source,
Func<TSource, TKey> keySelector
)
and, as it's an extension method it can be invoked as
source.OrderBy(keySelector).
Your List<Car> is playing the role of source as List<T> : IEnumerable<T>. The second parameter is the more interesting one and the one that you are confused by. It's declared as being of type
Func<TSource, TKey>
This means that it is a delegate that eats instances of TSource (in your case Car) and returns an instance of TKey; it's up to you to decide what TKey is. You have stated that you want to order by Car.registrationDate so it sounds like TKey is DateTime. Now, how do we get one of these delegates?
In the old days we could say
DateTime GetRegistrationDate(Car car) {
return car.registrationDate;
}
and use OrderBy like so:
allCars.OrderBy(GetRegistrationDate).
In C# 2.0 we gained the ability to use anonymous delegates; these are delegates that don't have a name and are defined in-place.
allCars.OrderBy(delegate(Car car) { return car.registrationDate; });
Then, in C# 3.0 we gained the ability to use lambda expressions which are very special anonymous delegates with a compact notation
allCars.OrderBy(car => car.registrationDate);
Here, car => car.registrationDate is the lambda expression and it represents a Func<Car, DateTime> that can be used as the second parameter in Enumerable.OrderBy.
allCars.orderBy(registrationDate)
The reason this doesn't work is because registrationDate is not a delegate. In fact, without any context at all registrationDate is meaningless to the compiler. It doesn't know if you mean Car.registrationDate or maybe you mean ConferenceAttendee.registrationDate or who knows what. This is why you must give additional context to the compiler and tell it that you want the property Car.registrationDate. To do this, you use a delegate in one of the three ways mentioned above.
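The three ways of supplying the delegate can be put side by side in a runnable sketch. The Car class, its Plate and RegistrationDate properties, and the sample data are assumptions made for the illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical Car type for this illustration.
public class Car
{
    public string Plate { get; set; }
    public DateTime RegistrationDate { get; set; }
}

public static class OrderByDemo
{
    // 1. A named method whose signature matches Func<Car, DateTime>.
    private static DateTime GetRegistrationDate(Car car)
    {
        return car.RegistrationDate;
    }

    public static List<string> ByNamedMethod(List<Car> cars)
    {
        return cars.OrderBy(GetRegistrationDate).Select(c => c.Plate).ToList();
    }

    // 2. An anonymous delegate (C# 2.0 style).
    public static List<string> ByAnonymousDelegate(List<Car> cars)
    {
        return cars.OrderBy(delegate(Car car) { return car.RegistrationDate; })
                   .Select(c => c.Plate).ToList();
    }

    // 3. A lambda expression (C# 3.0 and later).
    public static List<string> ByLambda(List<Car> cars)
    {
        return cars.OrderBy(car => car.RegistrationDate).Select(c => c.Plate).ToList();
    }

    public static void Main()
    {
        var allCars = new List<Car>
        {
            new Car { Plate = "B", RegistrationDate = new DateTime(2015, 1, 1) },
            new Car { Plate = "A", RegistrationDate = new DateTime(2010, 1, 1) },
            new Car { Plate = "C", RegistrationDate = new DateTime(2020, 1, 1) },
        };
        Console.WriteLine(string.Join(", ", ByNamedMethod(allCars)));       // A, B, C
        Console.WriteLine(string.Join(", ", ByAnonymousDelegate(allCars))); // A, B, C
        Console.WriteLine(string.Join(", ", ByLambda(allCars)));            // A, B, C
    }
}
```

Note that OrderBy does not sort the list in place; it returns a new ordered sequence, so the result has to be assigned or consumed.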