I am trying to understand LINQ and become confident at using it. What I am struggling with are the parameters asked for. Example:
var sortedWords = words.OrderBy(a=>a.Length)
words is an array collection. OrderBy's intellisense says:
Func<string, TKey> keyselector
A func executes a method, and a string is the value, TKey a key.
In the example http://msdn.microsoft.com/en-us/vcsharp/aa336756.aspx#thenBySimple (ThenBy - Comparer), we compare length by saying a => a.Length. I understand that syntax, but how is that related to what the intellisense is asking for?
I tend to find the method signature and intellisense unreadable because of all the generics.
Thanks.
The type (as displayed by Intellisense) makes sense if you understand the nature of lambda expressions in .NET/C#. Otherwise, it can indeed seem a bit strange to the newcomer. Begin by considering that the type of keySelector, Func<TSource, TKey> is simply a delegate. Before C# 3.0, you would call such a method by passing a delegate as a parameter, for example:
IEnumerable<string> sortedWords = words.OrderBy(new Func<string, int>(mySelectorMethod));
where mySelectorMethod is the name of an ordinary method which takes a string as a parameter and returns an int. (As a side point, I suppose you could use anonymous delegates, but let's not go there for now.) Also, note that this example is purely illustrative, as LINQ is almost always used with .NET 3.5/C# 3.0 (though I believe it can be used with either/both .NET 2.0/C# 2.0 - someone correct me if I'm wrong). Since C# 3.0, methods can be defined inline as lambda expressions, which are intended to be used in precisely such circumstances as these. Read over the MSDN article on lambda expressions (linked above) if you want to get a proper introduction, but here I will simply describe the use in this specific context. As you state, your code (in C# 3.0) is something like the following:
var sortedWords = words.OrderBy(a => a.Length);
The part of the expression that is a => a.Length is the lambda expression, which is really just shorthand for declaring a function inline. The syntax of lambda expressions is quite simple for the most part; to the left of the => the arguments are specified, generally in the form (arg1, arg2, arg3), but since there's only one in this case you can omit the brackets. To the right of the => is the expression that is the return value of the function (lambda expression more accurately). Alternatively you can enclose actual code with a return statement within { and } though this is usually unnecessary. What I believe the C# compiler does is recognises the parameter passed to OrderBy as a lambda expression and then compiles it into a function and creates and passes the delegate for you. Note that lambda expressions can also be converted to System.Linq.Expressions.Expression objects (accessible expression trees) instead of delegates, but this is a much less common usage. Anyway, there's a lot going on behind the scenes here, but hopefully this should at least clarify why the type is Func<TSource, TKey> and how it relates to the lambda expression. As I said, read up on MSDN if you want a deeper understanding of LINQ/lambdas/delegates...
a => a.Length
I understand that syntax, but how is that related to what the intellisense is asking for?
This chunk of code is a lambda expression. A lambda expression is a handy way of generating an Anonymous method (in this case), or a System.Linq.Expressions.Expression . Let's break it down by parts.
The most noticeable feature is the =>, which seperates parameters from a method body.
On the left side of the =>, there is a symbol: a. This is the declaration of a parameter for our anonymous method. The compiler is aware that we are calling OrderBy(), and that OrderBy requires a Func<string, object>. The parameter for such a function is a string, so the compiler determines that a must be a string. The only thing the programmer needed to provide is a name.
On the right side of the =>, there is method body. Since this is a one-liner, the return keyword is implied. The IDE provides intellisense against a as a string, which allows you to use the Length property.
Now, consider this C# 2.0 ...
IEnumerable<string> sortedWords =
Enumerable.OrderBy(words, delegate(string a) {return a.Length;});
With the C# 3.0
IEnumerable<string> sortedWords = words
.OrderBy(a => a.Length);
I think the IntelliSense is actually quite useful, especially for generic methods that take Func<..> type as an argument, because you can see the types and types guide you to understand what the method might do.
For example, the arguments for OrderBy are IEnumerable<string> as a 'this' argument, which means that we have some input containing collection of strings. The first argument keySelector has a type Func<string, TKey>, which means that it is some lambda expression you provide that specifies how to get TKey from string.
This already suggests that the method will probably enumerate over all the items (strings) in the collection and it can use the keySelector to get value of type TKey from every element in the collection. The name TKey already suggests that it will use this value to compare the elements (strings) using this calculated key. However, if you look at the other overload that takes IComparer<TKey> then you can be sure about this - this argument specifies more details about how do you want to compare two values of type TKey, so the function must compare the elements using this key.
... this kind of thinking about types takes some time to get used to, but once you'll learn it, it can be extremely helpful. It is more useful in "functional" style of code, which often uses a lot of generics and lamdba expressions in C# 3.0 (and similar things in functional languages like F# or others)
I never really worry about the Intellisense, to be honest. It was messing me up early in my Linqage. As I spent more time with generics and expressions, it started to make sense, but up until then I just pumped in the syntax.
What it wants is a lambda expression that tells Linq what to look for in order to sort your collection.
I feel you, my brother, hang in there and it will make sense very soon.
OrderBy() takes a delegate for a function that accepts a single parameter (in your case, a string) and returns a value of the type that is substituted for TKey. It may be that the parameter type (string) was already determined since you called the method on an IEnumerable<string> but the delegate type will only be resolved as Func<string, int> after it infers it from the lambda expression when it is fully specified (i.e., a => a.Length). If you haven't given the parser any clues as to what you want as a sort key, it'll just show TKey in IntelliSense until it can determine the intended type.
Related
I was wondering how the the compiler/runtime determines a lambda expression's type?
For example, the following System.Linq Select extension method (not select query)
//var recordIds = new List<int>(records.Select(r => r?.RecordId ?? 0));
//var recordIds = new List<int>(records.Where(r => r != null).Select(r => r.RecordId));
var recordIds = new List<int>(records.Select(r => r.RecordId));
is defined as
Enumerable.Select<TSource, TResult> Method (IEnumerable<TSource>, Func<TSource, TResult>)
and so takes the lambda r => r.RecordId as a Func<TSource, TResult>.
How is the lambda's type determined, and once it is, is it simply cast to that type?
I was wondering how the the compiler/runtime determines a lambda expression's type?
It is rather complicated. Implementing this feature was the "long pole" for the version of Visual Studio that shipped C# 3 -- so every day that I was over schedule on this was a day that VS would slip! -- and the feature has only gotten more complicated with the introduction of covariance and contravariance in C# 4.
As others have noted, consult the specification for exact details. You might also read the various articles I've written about it over the years.
I can give you a quick overview though. Suppose we have
records.Select(r => r.RecordId)
where records is of type IEnumerable<Record> and RecordId is of type int.
The first question is "does IEnumerable<Record> have any applicable method called Select? No, it does not. We therefore go to the extension method round.
The second question then is: "is there any static type with an accessible extension method that is applicable to this call?"
SomeType.Select(records, r => r.RecordId)
Let's suppose that Enumerable is the only such type. It has two versions of Select, and one of them takes a lambda that takes two parameters. We can automatically discard that one. That leaves us with, as you note:
static IEnumerable<R> Select<A, R>(IEnumerable<A>, Func<A, R>)
Third question: Can we deduce the type arguments corresponding to type parameters A and R?
In the first round of type inference we consider all the non-lambda arguments. We only have one. We deduce that A could be Record. However, IEnumerable<T> is covariant, so we make a note that it could be a type more general than Record as well. It cannot be a type that is more specific than Record though.
Now we ask "are we done?" No. We still don't know R.
"Are there any inferences left to make?" Yes. We haven't checked the lambda yet.
"Are there any contradictions or additional facts to know about A?" Nope.
We therefore "fix" A to Record and move on. What do we know so far? We have:
static IEnumerable<R> Select<Record, R>(IEnumerable<Record>, Func<Record, R>)
We then say OK, the argument must be (Record r) => r.RecordId. Can we infer the return type of this lambda knowing that? Plainly yes, it is int. So we put a note on R saying that it could be int.
Are we done? Yes. Is there anything else we can make a deduction about? No. Did we infer all the type parameters? Yes. So we "fix" R to int, and we're done.
Now we do a final round that checks to make sure that Select<Record, int> does not produce any errors; for example, if Select had a generic constraint that was violated by <Record, int> we would reject it at this point.
We've deduced that records.Select(r=>r.RecordId) means the same thing as Enumerable.Select<Record, int>(records, (Record r) => { return (int)r.RecordId; } ) and that's a legal call to that method, so we're done.
Things get rather more complicated when you introduce multiple bounds. See if you can figure out how something like a Join works where there are four type parameters to infer.
I think the best place to look for such information is the language spec:
Section 7.15
An anonymous function is an expression that represents an “in-line”
method definition. An anonymous function does not have a value or type
in and of itself, but is convertible to a compatible delegate or
expression tree type. The evaluation of an anonymous function
conversion depends on the target type of the conversion: If it is a
delegate type, the conversion evaluates to a delegate value
referencing the method which the anonymous function defines. If it is
an expression tree type, the conversion evaluates to an expression
tree which represents the structure of the method as an object
structure.
The above explains the nature of lambda expressions, basically that they have no types by themselves, unlike say, an int literal 5. Another point to note here is that the lambda expression is not casted to the delegate type like you said. It is evaluated, just like how 3 + 5 is evaluated to be of int type.
Exactly how these types are figured out (type inference) is described in section 7.5.2.
...assume that the generic method has the following signature:
Tr M<X1…Xn>(T1
x1 … Tm xm)
With a method call of the form M(E1 …Em) the
task of type inference is to find unique type arguments
S1…Sn for each of the type parameters
X1…Xn so that the call
M1…Sn>(E1…Em)becomes
valid.
The whole algorithm is quite well documented but it's kind of long to post it here. So here's a link to download the language spec.
https://msdn.microsoft.com/en-us/library/ms228593(v=vs.120).aspx
When constructing an expression tree, I have to use nodes invoking external methods in order to obtain values the expression could then continue evaluation with.
These methods are supplied as Func<T> and my code has no knowledge of where they originate from.
What is the most correct way of performing the mentioned invocation? I've tried something like this:
private Dictionary<string, Delegate> _externalSymbols;
private Expression _forExternalSymbol(string identifier)
{
Delegate method = _externalSymbols[identifier];
return Expression.Call(method.Method);
}
which works as long as the method fetched from the dictionary was created in compile-time. However, in case of Func<T> being a dynamic method obtained, for instance, by compiling another expression in runtime, this won't work out throwing
ArgumentException: Incorrect number of arguments supplied for call to method 'Int32 lambda_method(System.Runtime.CompilerServices.ExecutionScope)'
The desired effect may be achieved by wrapping the given function into one extra expression, but that seems quite hideous comparing to what it used to look like:
private Expression _forExternalSymbol(string identifier)
{
Delegate method = _externalSymbols[identifier];
Expression mediator = method is Func<double> ?
(Expression)(Expression<Func<double>>)(() => ((Func<double>)method)()) :
(Expression<Func<string>>)(() => ((Func<string>)method)());
return Expression.Invoke(mediator);
}
Also, this is hardly an extensible approach should I need to add support for types other than double and string.
I would like to know if there are better options which would work with dynamically created methods (preferably applicable to .NET 3.5).
which works as long as the method fetched from the dictionary was created in compile-time
No, it works as long as the method is static. For example, it also won't work if the delegate is a lambda that references something from its parent score (i.e. it's a closure).
The correct way to invoke a delegate is to use Expression.Invoke(). To get an Expression that represents your delegate, use Expression.Constant():
Expression.Invoke(Expression.Constant(method)))
I'm wondering what exactly is the difference between wrapping a delegate inside Expression<> and not ?
I'm seeing Expression<Foo> being used a lot with LinQ, but so far I've not found any article that explains the difference between this, and just using a delegate.
E.g.
Func<int, bool> Is42 = (value) => value == 42;
vs.
Expression<Func<int, bool>> Is42 = (value) => value == 42;
tl;dr, To have an expression is like having the source code of an application, and a delegate is an executable to run the application. An expression can be thought of as the "source" (i.e., syntax tree) of the code that would run. A delegate is a specific compilation that you would run and does the thing.
By storing a lambda as a delegate, you are storing a specific instance of a delegate that does some action. It can't be modified, you just call it. Once you have your delegate, you have limited options in inspecting what it does and whatnot.
By storing a lambda as an expression, you are storing an expression tree that represents the delegate. It can be manipulated to do other things like changing its parameters, changing the body and make it do something radically different. It could even be compiled back to a delegate so you may call it if you wish. You can easily inspect the expression to see what its parameters are, what it does and how it does it. This is something that a query provider can use to understand and translate an expression to another language (such as write an SQL query for a corresponding expression tree).
It is also a whole lot easier to create a delegate dynamically using expressions than it is emitting the code. You can think of your code at a higher level as expressions that is very similar to how a compiler views code instead of going low-level and view your code as IL instructions.
So with an expression, you are capable to do much more than a simple anonymous delegate. Though it's not really free, performance will take a hit if you run compiled expressions compared to a regular method or an anonymous delegate. But that might not be an issue as the other benefits to using expressions may be important to you.
Func<> is just a delegate type. An Expression is a runtime representation of the complete tree of operations which, optionally, may be compiled at runtime into a delegate. It's this tree that is parsed by Expression parsers like Linq-to-SQL to generate SQL statements or do other clever things. When you assign a lambda to an Expression type, the compiler generates this expression tree as well as the usual IL code. More on expression trees.
To illustrate other answers, if you compile those 2 expressions and have look at the compiler generated code, this i what you will see:
Func<int, bool> Is42 = (value) => value == 42;
Func<int, bool> Is42 = new Func<int, bool>((#value) => value == 42);
Expression<Func<int, bool>> Is42 = (value) => value == 42;
ParameterExpression[] parameterExpressionArray;
ParameterExpression parameterExpression = Expression.Parameter(typeof(int), "value");
Expression<Func<int, bool>> Is42 = Expression.Lambda<Func<int, bool>>(Expression.Equal(parameterExpression, Expression.Constant(42, typeof(int))), new ParameterExpression[] { parameterExpression });
Expression Trees allow you to inspect the code inside the expression, in your code.
For example, if you passed this expression: o => o.Name, your code could find out that the Name property was being accessed inside the expression.
Provides the base class from which the classes that represent
expression tree nodes are derived.
System.Linq.Expressions.BinaryExpression
System.Linq.Expressions.BlockExpression
System.Linq.Expressions.ConditionalExpression
System.Linq.Expressions.ConstantExpression
System.Linq.Expressions.DebugInfoExpression
System.Linq.Expressions.DefaultExpression
System.Linq.Expressions.DynamicExpression
System.Linq.Expressions.GotoExpression
System.Linq.Expressions.IndexExpression
System.Linq.Expressions.InvocationExpression
System.Linq.Expressions.LabelExpression
System.Linq.Expressions.LambdaExpression
System.Linq.Expressions.ListInitExpression
System.Linq.Expressions.LoopExpression
System.Linq.Expressions.MemberExpression
System.Linq.Expressions.MemberInitExpression
System.Linq.Expressions.MethodCallExpression
System.Linq.Expressions.NewArrayExpression
System.Linq.Expressions.NewExpression
System.Linq.Expressions.ParameterExpression
System.Linq.Expressions.RuntimeVariablesExpression
System.Linq.Expressions.SwitchExpression
System.Linq.Expressions.TryExpression
System.Linq.Expressions.TypeBinaryExpression
System.Linq.Expressions.UnaryExpression
http://msdn.microsoft.com/en-us/library/system.linq.expressions.expression.aspx
Expression tree represents linq expression that can be analyzed and for example turned into SQL query.
To whatever the other wrote (that is completely correct) I'll add that through the Expression class you can create new methods at runtime. There are some limits. Not all the things you can do in C# can be done in an Expression tree (at least in .NET 3.5 . With .NET 4.0 they have added a great number of possible Expression "types"). The use of this could be (for example) to create a dynamic query and pass it to LINQ-to-SQL or do some filtering based on the input of the user... (you could always do this with CodeDom if all you wanted was a dynamic method incompatible with LINQ-to-SQL, but emitting directly IL code is quite difficult :-) )
I'm using the Dynamic.ParseLambda method from the Dynamic LINQ library to create expressions, compile each to a Func<>, and cache each in a dictionary:
// parse some dynamic expression using this ParseLambda sig:
Expression<Func<TArgument,TResult>> funcExpr =
System.Linq.Dynamic.ParseLambda<TArgument, TResult>(
expressionString, // string for dyn lambda expression
parameters); // object[] params
// then compile & cache the output of this as a delegate:
Func<TArgument,TResult> func = funcExpr.Compile(); //<-cache this
// then to execute, use:
return func(entityInstance);
The problem is, this forces me to cache a different delegate instance for every distinct set of parameter values. This seems kind of wasteful; all the overhead with Dynamic LINQ is in the parsing and compilation; once created, the delegates are near directly coded lambdas in performance. Is there any way to move the params outside of the expression, so I can pass different values to a common cached delegate in when I call it (instead of when I'm creating it)?
// e.g. with params...
return func(entityInstance,parameters);
// or if params are the issue, multiple signatures are ok:
return func(entityInstance, intValue, stringValue);
I don't see any parameter-free .ParseLambda or .Compile signatures in System.Linq.Dynamic, so my hopes aren't high. Anyone know of a quick way to achieve this?
Thanks!
There is a trick here, that I have used before; you do something like an Expression<Func<object[], object>>, and embed the fetch-by-index and casting inside the expression. Then you have as many parameters as you could want, a single signature, and reasonable performance. It does, however, make writing the lambda a bit tricker.
I don't have my old code for this "to hand", but if I needed to reverse engineer it, I would simply write something typical like the following, and then look in reflector to see what it used:
Expression<Func<object[], object>> func = arr => ((string)arr[0]) + (int)arr[1];
(in particular, paying attention to the indexer usage, the casting from inputs, and the casting to output)
Here the context for my question:
A common technique is to declare the parameter of a method as a Lambda expression rather than a delegate. This is so that the method can examine the expression to do interesting things like find out the names of method calls in the body of the delegate instance.
Problem is that you lose some of the intelli-sense features of Resharper. If the parameter of the method was declared as a delegate, Resharper would help out when writing the call to this method, prompting you with the x => x syntax to supply as the argument value to this method.
So... back to my question I would like to do the follow:
MethodThatTakesDelegate(s => s.Length);
}
private void MethodThatTakesDelegate(Func<string, object> func)
{
//convert func into expression
//Expression<Func<string, object>> expr = "code I need to write"
MethodThatTakesExpression(expr);
}
private void MethodThatTakesExpression(Expression<Func<string, object>> expr)
{
//code here to determine the name of the property called against string (ie the Length)
}
Everywhere that you're using the term "lambda expression" you actually mean "expression tree".
A lambda expression is the bit in source code which is
parameters => code
e.g.
x => x * 2
Expression trees are instances of the System.Linq.Expressions.Expression class (or rather, one of the derived classes) which represent code as data.
Lambda expressions are converted by the compiler into either expression trees (or rather, code which generates an expression tree at execution time) or delegate instances.
You can compile an instance of LambdaExpression (which is one of the subclasses of Expression) into a delegate, but you can't go the other way round.
In theory it might be possible to write such a "decompiler" based on the IL returned by MethodBase.GetMethodBody in some situations, but currently there are various delegates which can't be represented by expression trees. An expression tree represents an expression rather than a statement or statement block - so there's no looping, branching (except conditionals), assignment etc. I believe this may change in .NET 4.0, though I wouldn't expect a decompilation step from Microsoft unless there's a really good reason for one.
I don't believe it's possible to achieve what you'd like here. From the comments in your code it looks like you are attempting to capture the name of the property which did the assignment in MethodThatTakesExpression. This requires an expression tree lambda expression which captures the contexnt of the property access.
At the point you pass a delegate into MethodThatTakesDelegate this context is lost. Delegates only store a method address not any context about the method information. Once this conversion is made it's not possible to get it back.
An example of why this is not possible is that there might not even be a named method backing a delegate. It's possible to use ReflectionEmit to generate a method which has no name whatsoever and only exists in memory. It is possible though to assign this out to a Func object.
No, it is not possible.