Custom IQueryProvider that falls back on LinqToObjects - c#

I have written a custom IQueryProvider class that takes an expression and analyses it against a SQL database (I know I could use Linq2Sql but there are some modifications and tweaks that I need that unfortunately make Linq2Sql unsuitable). The class will identify and do something with the properties that are marked (using attributes) but any that aren't I would like to be able to pass the expression on to a LinqToObject provider and allow it to filter the results after.
For example, suppose I have the following linq expression:
var parents=Context.Parents
.Where(parent=>parent.Name.Contains("T") && parent.Age>18);
The Parents class is a custom class that implements IQueryProvider and IQueryable interfaces, but only the Age property is marked for retrieval, so the Age property will be processed, but the Name property is ignored because it is not marked. After I've finished processing the Age property, I'd like to pass the whole expression to LinqToObjects to process and filter, but I don't know how.
N.B. It doesn't need to remove the Age clause of the expression because the result will be the same even after I've processed it so I will always be able to send the whole expression on to LinqToObjects.
I've tried the following code but it doesn't seem to work:
IEnumerator IEnumerable.GetEnumerator() {
    if (this.expression != null && !this.isEnumerating) {
        this.isEnumerating = true;
        var queryable = this.ToList().AsQueryable();
        var query = queryable.Provider.CreateQuery(this.expression);
        return query.GetEnumerator();
    }
    return this;
}
this.isEnumerating is just a boolean flag set to prevent recursion.
this.expression contains the following:
{value(namespace.Parents`1[namespace.Child]).Where(parent => ((parent.Name.EndsWith("T") AndAlso parent.Name.StartsWith("M")) AndAlso (parent.Test > 0)))}
When I step through the code, despite converting the results to a list, it still uses my custom class for the query. So I figured that because the class Parent was at the beginning of the expression, it was still routing the query back to my provider, so I tried setting this.expression to Argument[1] of the method call so it was as such:
{parent => ((parent.Name.EndsWith("T") AndAlso parent.Name.StartsWith("M")) AndAlso (parent.Test > 0))}
Which to me looks more like it, however, whenever I pass this into the CreateQuery function, I get this error 'Argument expression is not valid'.
The node type of the expression is now 'Quote' though and not 'Call' and the method is null. I suspect that I just need to make this expression a call expression somehow and it will work, but I'm not sure how to.
Please bear in mind that this expression is a where clause, but it may be any kind of expression and I'd prefer not to be trying to analyse the expression to see what type it is before passing it in to the List query provider.
Perhaps there is a way of stripping off or replacing the Parent class of the original expression with the list provider class but still leaving it in a state that can just be passed in as expression into the List provider regardless of the type of expression?
Any help on this would be greatly appreciated!

You were so close!
My goal was to avoid having to "replicate" the full, mind-numbingly convoluted LINQ-to-Objects feature set. You put me on the right track (thanks!). Here's how to piggy-back on LINQ-to-Objects in a custom IQueryable:
public IEnumerator<T> GetEnumerator() {
    // For my case (a custom object-oriented database engine) I still
    // have an IQueryProvider which builds a "subset" of objects each populated
    // with only "required" fields, as extracted from the expression. IDs,
    // dates, particular strings, what have you. This is "cheap" because it
    // has an indexing system as well.
    var en = ((IEnumerable<T>)this.provider.Execute(this.expression));
    // Copy your internal objects into a list.
    var ar = new List<T>(en);
    var queryable = ar.AsQueryable<T>();
    // This is where we went wrong:
    //   queryable.Provider.CreateQuery(this.expression);
    // We can't re-reference the original expression because it will loop
    // right back on our custom IQueryable<>. Instead, swap out the first
    // argument with the List's queryable:
    var mc = (MethodCallExpression)this.expression;
    var exp = Expression.Call(mc.Method,
        Expression.Constant(queryable),
        mc.Arguments[1]);
    // Now the CLR can do all of the heavy lifting
    var query = queryable.Provider.CreateQuery<T>(exp);
    return query.GetEnumerator();
}
I can't believe it took me 3 days to figure out how to avoid reinventing the wheel on LINQ-to-Objects queries.
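The question also asked for a way to swap out the query root without caring what kind of expression it is. One possible sketch, not taken from the answer above, is a small ExpressionVisitor that replaces the constant holding the custom queryable wherever it appears, so chained calls and methods with more than two arguments are covered as well. RootReplacer and RootReplacerDemo are hypothetical names, and two in-memory queryables stand in for the custom source and the materialised list.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical helper: swaps the constant that holds the original query
// root for a constant holding an in-memory queryable.
public sealed class RootReplacer<T> : ExpressionVisitor
{
    private readonly object _oldRoot;
    private readonly Expression _newRoot;

    public RootReplacer(object oldRoot, IQueryable<T> newRoot)
    {
        _oldRoot = oldRoot;
        // Typed as IQueryable<T> so it fits the Queryable.* method signatures.
        _newRoot = Expression.Constant(newRoot, typeof(IQueryable<T>));
    }

    protected override Expression VisitConstant(ConstantExpression node)
    {
        return ReferenceEquals(node.Value, _oldRoot)
            ? _newRoot
            : base.VisitConstant(node);
    }
}

public static class RootReplacerDemo
{
    public static List<int> Run()
    {
        // Stand-in for the custom queryable (an assumption for this demo).
        IQueryable<int> oldRoot = new[] { 1, 2, 3 }.AsQueryable();
        IQueryable<int> query = oldRoot.Where(x => x > 1).OrderByDescending(x => x);

        // Stand-in for the list built from the custom provider's results.
        IQueryable<int> newRoot = new List<int> { 5, 0, 7 }.AsQueryable();

        // Re-root the whole expression, then let LINQ-to-Objects run it.
        Expression rewritten = new RootReplacer<int>(oldRoot, newRoot).Visit(query.Expression);
        return newRoot.Provider.CreateQuery<int>(rewritten).ToList();
    }
}
```

In a custom IQueryable<T>, the old root would be `this` and the new root the list's queryable, so the rewritten expression no longer routes back to the custom provider.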

Related

Why would I want to use an ExpressionVisitor?

I know from the MSDN's article about How to: Modify Expression Trees what an ExpressionVisitor is supposed to do. It should modify expressions.
Their example is however pretty unrealistic so I was wondering why would I need it? Could you name some real-world cases where it would make sense to modify an expression tree? Or, why does it have to be modified at all? From what to what?
It also has many overloads for visiting all kinds of expressions. How do I know when I should use any of them, and what should they return? I saw some people overriding VisitParameter and returning base.VisitParameter(node), while others were returning Expression.Parameter(..).
There was an issue where the database had fields which contained 0 or 1 (numeric), and we wanted to use bools in the application.
The solution was to create a "Flag" object, which contained the 0 or 1 and had a conversion to bool. We used it like a bool throughout the application, but when we used it in a .Where() clause, Entity Framework complained that it was unable to call the conversion method.
So we used an expression visitor to change all property accesses like .Where(x => x.Property) to .Where(x => x.Property.Value == 1) just before sending the tree to EF.
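A minimal sketch of such a visitor might look like the following. The Flag and Row types are hypothetical stand-ins (the answer does not show its own classes), and the rewrite assumes the C# compiler represents the implicit Flag-to-bool conversion as a Convert node, which is what it emits for user-defined conversions in expression lambdas.

```csharp
using System;
using System.Linq.Expressions;

// Hypothetical stand-in for the "Flag" type described above.
public sealed class Flag
{
    public int Value { get; set; }
    public static implicit operator bool(Flag f) { return f.Value == 1; }
}

// Hypothetical entity with a Flag property.
public sealed class Row
{
    public Flag Active { get; set; } = new Flag();
}

// Rewrites "convert Flag to bool" into "flag.Value == 1".
public sealed class FlagRewriter : ExpressionVisitor
{
    protected override Expression VisitUnary(UnaryExpression node)
    {
        if (node.NodeType == ExpressionType.Convert
            && node.Type == typeof(bool)
            && node.Operand.Type == typeof(Flag))
        {
            Expression operand = Visit(node.Operand);
            return Expression.Equal(
                Expression.Property(operand, nameof(Flag.Value)),
                Expression.Constant(1));
        }
        return base.VisitUnary(node);
    }
}

public static class FlagRewriterDemo
{
    public static Expression<Func<Row, bool>> Rewrite()
    {
        // The body is Convert(r.Active) using the implicit operator.
        Expression<Func<Row, bool>> original = r => r.Active;
        return (Expression<Func<Row, bool>>)new FlagRewriter().Visit(original);
    }
}
```

The rewritten lambda compares a plain integer property with a constant, which is the kind of expression a query provider can usually translate.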
Could you name some real-world cases where it would make sense to modify an expression tree?
Strictly speaking, we never modify an expression tree, as they are immutable (as seen from the outside, at least, there's no promise that it doesn't internally memoise values or otherwise have mutable private state). It's precisely because they are immutable and hence we can't just change a node that the visitor pattern makes a lot of sense if we want to create a new expression tree that is based on the one we have but different in some particular way (the closest thing we have to modifying an immutable object).
We can find a few within Linq itself.
In many ways the simplest Linq provider is the linq-to-objects provider that works on enumerable objects in memory.
When it receives enumerables directly as IEnumerable<T> objects it's pretty straight-forward in that most programmers could write an unoptimised version of most of the methods pretty quickly. E.g. Where is just:
foreach (T item in source)
if (pred(item))
yield return item;
And so on. But what about EnumerableQuery<T>, which implements the IQueryable<T> versions? Since EnumerableQuery<T> wraps an IEnumerable<T>, we could do the desired operation on the one or more enumerable objects involved, but we have an expression describing that operation in terms of IQueryable<T> and other expressions for selectors, predicates, etc., where what we need is a description of that operation in terms of IEnumerable<T> and delegates for selectors, predicates, etc.
System.Linq.EnumerableRewriter is an implementation of ExpressionVisitor that does exactly such a rewrite, and the result can then simply be compiled and executed.
Within System.Linq.Expressions itself there are a few implementations of ExpressionVisitor for different purposes. One example is that the interpreter form of compilation can't handle hoisted variables in quoted expressions directly, so it uses a visitor to rewrite them into working on indices into a dictionary.
As well as producing another expression, an ExpressionVisitor can produce another result. Again System.Linq.Expressions has internal examples itself, with debug strings and ToString() for many expression types working by visiting the expression in question.
This can be (though it doesn't have to be) the approach used by a database-querying LINQ provider to turn an expression into a SQL query.
How do I know when I should use any of them and what should they return?
The default implementation of these methods will:
If the expression can have no child expressions (e.g. the result of Expression.Constant()) then it will return the node back again.
Otherwise visit all the child expressions, and then call Update on the expression in question, passing the results back. Update in turn will either return a new node of the same type with the new children, or return the same node back again if the children weren't changed.
As such, if you don't know you need to explicitly operate on a node for whatever your purposes are, then you probably don't need to change it. It also means that Update is a convenient way to get a new version of a node for a partial change. But just what "whatever your purposes are" means of course depends on the use case. The most common cases probably go to one extreme or the other, with either just one or two expression types needing an override, or all or nearly all needing it.
(One caveat applies when you examine the children of nodes that hold them in a ReadOnlyCollection, such as BlockExpression for both its steps and variables, or TryExpression for its catch blocks, and you only sometimes change those children. If you haven't changed anything, you are best to check for this yourself, as a flaw [recently fixed, but not in any released version yet] means that if you pass the same children to Update in a different collection from the original ReadOnlyCollection, a new expression is created needlessly, which has effects further up the tree. This is normally harmless, but it wastes time and memory.)
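The default behaviour described above can be checked with a small sketch. The class names here are hypothetical. A visitor that overrides nothing hands back the very same tree instance, while a visitor that changes one leaf causes new parent nodes to be built on the way back up.

```csharp
using System;
using System.Linq.Expressions;

// Overrides nothing, so every node goes through the default handling.
public sealed class PassThroughVisitor : ExpressionVisitor { }

// Replaces the integer constant 1 with 2 and leaves everything else alone.
public sealed class OneToTwoVisitor : ExpressionVisitor
{
    protected override Expression VisitConstant(ConstantExpression node)
    {
        return node.Value is int i && i == 1
            ? Expression.Constant(2)
            : base.VisitConstant(node);
    }
}

public static class DefaultVisitDemo
{
    // No child changed, so the original lambda instance comes back.
    public static bool UnchangedTreeIsSameInstance()
    {
        Expression<Func<int, int>> e = x => x + 1;
        return ReferenceEquals(new PassThroughVisitor().Visit(e), e);
    }

    // The constant changed, so the binary node and the lambda are rebuilt.
    public static int ApplyRewritten(int input)
    {
        Expression<Func<int, int>> e = x => x + 1;
        var rewritten = (Expression<Func<int, int>>)new OneToTwoVisitor().Visit(e);
        return rewritten.Compile()(input);
    }
}
```

Only VisitConstant needed an override in the second visitor; the base implementation took care of rebuilding the addition and the lambda around the new constant.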
The ExpressionVisitor enables the visitor pattern for Expressions.
Conceptually, the problem is that when you navigate an Expression tree, all you know is that any given node is an Expression, but you don't know specifically what kind of Expression. This pattern allows you to know what kind of Expression you're working with and specify type-specific handling for different kinds.
When you have an Expression, you can just pass it to the visitor's Modify method. The Expression knows its own type, so it'll call back the appropriate override.
Looking at the MSDN example you linked:
public class AndAlsoModifier : ExpressionVisitor
{
    public Expression Modify(Expression expression)
    {
        return Visit(expression);
    }

    protected override Expression VisitBinary(BinaryExpression b)
    {
        if (b.NodeType == ExpressionType.AndAlso)
        {
            Expression left = this.Visit(b.Left);
            Expression right = this.Visit(b.Right);
            // Make this binary expression an OrElse operation instead of an AndAlso operation.
            return Expression.MakeBinary(ExpressionType.OrElse, left, right, b.IsLiftedToNull, b.Method);
        }
        return base.VisitBinary(b);
    }
}
In this example, if the Expression happens to be a BinaryExpression, it'll call back VisitBinary(BinaryExpression b) given in the example. Now, you can deal with that BinaryExpression knowing that it's a BinaryExpression. You could also specify other override methods that handle other kinds of Expressions.
It's worth noting that, since this is an overload-resolution trick, visited Expressions will call back the best-fitting method. So, if there are different kinds of BinaryExpressions, then you could write an override for one specific subtype; if another subtype calls back, it'll just use the default BinaryExpression handling.
In short, this pattern allows you to navigate an Expression tree knowing what kind of Expressions you're working with.
A specific real-world example I have just encountered occurred when shifting to EF Core and migrating from SQL Server (Microsoft-specific) to SQLite (platform-independent).
The existing business logic revolved around a middle-tier/service-layer interface that assumed Full Text Search (FTS) happened auto-magically in the background, which it does with SQL Server. Search-related queries were passed into this tier via Expressions, and FTS against a SQL Server store required no additional FTS-specific entities.
I didn't want to change any of this, but with SQLite you have to target a specific virtual table for a Full Text Search, which would in turn have meant changing all the middle-tier calls to re-target the FTS tables/entities and then joining them to the business entity tables to get a similar result set.
But by sub-classing ExpressionVisitor I was able to intercept the calls in the DAL and simply rewrite the incoming expression (or more precisely some of the BinaryExpressions within the overall search expression) to specifically handle SQLite's FTS requirements.
This meant that specialization of the data layer to the data store happened within a single class that was called from a single place within a repository base class. No other aspects of the application needed to be altered in order to support FTS via EF Core, and any SQLite FTS-related entities could be contained in a single pluggable assembly.
So ExpressionVisitor is really very useful, especially when combined with the whole notion of being able to pass around expression trees as data via various forms of IPC.

Where clause with multiple unknown conditions

I am currently developing a Staff management system for my company. The fields may vary and change time to time, so I have an interface for each field like this:
public interface IStaffInfoField
{
    // ...
    IQueryable<Staff> Filter(IQueryable<Staff> pList, string pAdditionalData);
    // ...
}
For each field, I implement the Filter method, for example with Name:
class NameStaffInfoField : BaseStaffInfoField
{
    // ...
    public override IQueryable<Staff> Filter(IQueryable<Staff> pList, string pAdditionalData)
    {
        return pList.Where(q => q.Name.Contains(pAdditionalData));
    }
    // ...
}
Now the users want to search with multiple conditions, it's easy, I just iterate through the list and call Filter. However, they also want a OR condition (say, staff which have name A, OR name B, AND Department Name C, OR Age 30). Note: Users are end-users and they input the search queries through comboboxes and textboxes.
Can I modify my pattern or the lambda expression somehow to achieve that? Throughout the process I don't keep the original list around to Union it for an OR condition, and I think it would be slow if I saved the expression and used Union for the OR condition.
The only solution I can think of now is to add a method to interface that require raw SQL WHERE statement. But my entire program hasn't used pure SQL query yet, is it bad to use it now?
Since your method returns IQueryable, clients already can use it for arbitrarily complicated queries.
IQueryable<Staff> result = xxx.Filter( .... );
result = result.Where( ... );
if ( ... )
result = result.Where( s => ( s.Age > 30 || s.Salary < 1 ) && s.Whatever == "something" );
The IQueryable is very flexible. The query tree is evaluated and translated to SQL when you start to enumerate results.
I only wonder why would you need the interface at all?! Since your Filter method expects the IQueryable, this means that client already has the IQueryable! Why would she call your Filter method then if she can already apply arbitrarily complicated query operators on her own?
Edit:
After your additional explanation, if I were you I would create a simple interface to let users create their own query trees containing OR and AND clauses and create a simple function that would translate the user query tree to linq expression tree.
In other words, do not let end users work at linq query tree level, this is too abstract and also too dangerous to let users touch such low level layer of your code. But abstract trees manually translated to linq trees sound safe and easy.
You can download Albahari's LINQKit. It contains a PredicateBuilder that allows you, among other useful things, to concatenate LINQ expressions with OR in a dynamic way.
var predicate = PredicateBuilder.False<Staff>();
predicate = predicate.Or(s => s.Name.Contains(data));
predicate = predicate.Or(s => s.Age > 30);
return dataContext.Staff.Where(predicate);
You can also download the source code and see how it is implemented.
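For readers who want to see the idea without the library, here is a rough sketch of OR-combining two predicates by hand. It is not LINQKit's implementation; PredicateCombiner, ParameterSwapper, and the Staff class shown here are hypothetical names for illustration. The two lambdas each have their own parameter, so the right-hand body is rewritten to use the left-hand parameter before the bodies are joined with OrElse.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical entity, standing in for the Staff class in the question.
public sealed class Staff
{
    public string Name { get; set; } = "";
    public int Age { get; set; }
}

public static class PredicateCombiner
{
    // Builds "x => left(x) || right(x)" as a single expression tree.
    public static Expression<Func<T, bool>> OrElse<T>(
        this Expression<Func<T, bool>> left,
        Expression<Func<T, bool>> right)
    {
        ParameterExpression param = left.Parameters[0];
        Expression rightBody =
            new ParameterSwapper(right.Parameters[0], param).Visit(right.Body);
        return Expression.Lambda<Func<T, bool>>(
            Expression.OrElse(left.Body, rightBody), param);
    }

    // Replaces one lambda parameter with another throughout a body.
    private sealed class ParameterSwapper : ExpressionVisitor
    {
        private readonly ParameterExpression _source;
        private readonly ParameterExpression _target;

        public ParameterSwapper(ParameterExpression source, ParameterExpression target)
        {
            _source = source;
            _target = target;
        }

        protected override Expression VisitParameter(ParameterExpression node)
        {
            return node == _source ? _target : base.VisitParameter(node);
        }
    }
}

public static class PredicateCombinerDemo
{
    public static int CountMatches()
    {
        var staff = new[]
        {
            new Staff { Name = "Ann", Age = 25 },
            new Staff { Name = "Bob", Age = 30 },
            new Staff { Name = "Cid", Age = 41 },
        }.AsQueryable();

        Expression<Func<Staff, bool>> predicate = s => s.Name.Contains("A");
        predicate = predicate.OrElse(s => s.Age == 30);
        return staff.Where(predicate).Count();
    }
}
```

Each field's Filter could return such a predicate instead of an already filtered IQueryable, so the caller decides whether to join conditions with AND or OR before applying a single Where.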
If your users are end users, and they enter criteria through a UI, you may want to look at a UI control that supports IQueryable. Telerik has a large number of pre-baked controls. In most cases end users interact with a grid and they apply filters to the columns. There are several other vendors that do the same thing.
A second option, if you want to make your life difficult: you could take the input text that the user supplies, parse it into an expression tree, and then apply that expression tree to an IQueryable. If you are not familiar with parsers, this task will be fairly difficult to implement.

Transforming a filter string into a C# delegate

I have a set of classes which I am using for a Data Access Layer for some clients. As part of the data access I am allowing a set of filters to be sent in this format:
"{Member[.Member....]}{Operator}{Value}"
I would like to turn these strings into delegates for use in a LINQ query like this:
.Where([delegate returned by a factory])
Here is a more concrete example:
IEnumerable<Parent> parents = GetSomeParents();
string filter = "Child.Id=5";
var expression = FilterFactory<Parent>.GetExpression(filter);
parents = parents.Where(expression);
expression would contain the delegate: parent => parent.Child.Id == 5
Is there a way using reflection to construct the FilterFactory in a Generic way to handle any member path I send in? Paths with indexing aren't required, but would be nice.
Yes absolutely! This is a really fun thing to do too.
One way you can do this is to use the LINQ Dynamic Query Library and get the expression compiler they have in there. I also have something very similar in my project MetaSharp.
But you could also do it yourself if the syntax or features don't quite match what you're wanting. The general idea is that you need to parse the string and build up an Expression tree that represents what you are parsing. In .NET the expression tree objects can be found in System.Linq.Expressions. Once you have your tree you can call Compile() on it and it will be dynamically compiled into a delegate right then. Try reading about the State Machine and Visitor patterns to figure out the best way to parse an arbitrary expression like you have above.
PS I would not recommend using regular expressions!
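As a rough illustration of the do-it-yourself route, here is a minimal sketch of a FilterFactory<T> that handles only the "=" operator and plain member paths (no indexing, no other operators). Splitting on the first "=" is a simplification for the demo, not a real parser, and the Parent and Child classes are hypothetical stand-ins for the ones in the question.

```csharp
using System;
using System.Globalization;
using System.Linq.Expressions;

// Hypothetical model classes for the demo.
public sealed class Child { public int Id { get; set; } }
public sealed class Parent { public Child Child { get; set; } = new Child(); }

public static class FilterFactory<T>
{
    // Turns "Member.Member=Value" into a compiled delegate.
    public static Func<T, bool> GetExpression(string filter)
    {
        string[] parts = filter.Split(new[] { '=' }, 2);
        if (parts.Length != 2)
            throw new FormatException("Expected a filter of the form Member=Value.");

        // Walk the member path: x.Child.Id
        ParameterExpression param = Expression.Parameter(typeof(T), "x");
        Expression member = param;
        foreach (string name in parts[0].Trim().Split('.'))
            member = Expression.PropertyOrField(member, name);

        // Convert the text to the member's type (works for simple types only).
        object value = Convert.ChangeType(
            parts[1].Trim(), member.Type, CultureInfo.InvariantCulture);

        Expression body = Expression.Equal(member, Expression.Constant(value, member.Type));
        return Expression.Lambda<Func<T, bool>>(body, param).Compile();
    }
}
```

With this sketch, FilterFactory<Parent>.GetExpression("Child.Id=5") yields the equivalent of parent => parent.Child.Id == 5. A null Child would still throw at run time, and supporting more operators would mean replacing the Split call with a proper tokenizer.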

Why does this code generate a NotSupportedException?

Why does this throw System.NotSupportedException?
string foo(string f) { return f; }
string bar = "";
var item = (from f in myEntities.Beer
where f.BeerName == foo(bar)
select f).FirstOrDefault();
Edit: Here's an MSDN reference that (kind of) explains things...
Any method calls in a LINQ to Entities query that are not explicitly mapped to a canonical function will result in a runtime NotSupportedException exception being thrown. For a list of CLR methods that are mapped to canonical functions, see CLR Method to Canonical Function Mapping.
See also http://mosesofegypt.net/post/LINQ-to-Entities-what-is-not-supported.aspx
EDIT: Okay, the code blows up because it doesn't know what to do with the call to foo(). The query is built up as an expression tree which is then converted to SQL.
The expression tree translator knows about various things - such as string equality, and various other methods (e.g. string.StartsWith) but it doesn't know what your foo method does - foo() is a black box as far as it's concerned. It therefore can't translate it into SQL.
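The difference is visible in the expression tree itself. The sketch below uses hypothetical names and needs no database: it counts method-call nodes in two versions of the filter. Calling foo inside the lambda leaves a call node for the provider to translate, whereas calling it first and capturing the result leaves only a captured value, which providers can typically pass along as a parameter.

```csharp
using System;
using System.Linq.Expressions;

// Counts MethodCallExpression nodes in a tree.
public sealed class MethodCallCounter : ExpressionVisitor
{
    public int Count { get; private set; }

    protected override Expression VisitMethodCall(MethodCallExpression node)
    {
        Count++;
        return base.VisitMethodCall(node);
    }
}

public static class LocalCallDemo
{
    static string Foo(string f) { return f; }

    static int CallsInTree(Expression e)
    {
        var counter = new MethodCallCounter();
        counter.Visit(e);
        return counter.Count;
    }

    // The call to Foo ends up inside the tree.
    public static int WithInlineCall()
    {
        string bar = "";
        Expression<Func<string, bool>> filter = name => name == Foo(bar);
        return CallsInTree(filter);
    }

    // Foo runs first; the tree only references the captured result.
    public static int WithPrecomputedValue()
    {
        string bar = "";
        string wanted = Foo(bar);
        Expression<Func<string, bool>> filter = name => name == wanted;
        return CallsInTree(filter);
    }
}
```

Applied to the query in the question, that would mean assigning foo(bar) to a local variable before the query and comparing f.BeerName against that variable.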
The second version will fail as soon as you try to iterate over it. You can't use a locally defined method in an IQueryable<> where clause (of course, you can, but it will fail when the LINQ provider tries to translate it into SQL).
Because in the 2nd query no actual query is executed. Try adding ToList() where SingleOrDefault() is.
It's probably because the SQL-generation functionality isn't able to determine what to do with your foo() function, so can't generate output for it.

Dynamic LINQ with direct user input, any dangers?

I have a table in a ASP.NET MVC application that I want to be sortable (serverside) and filterable using AJAX. I wanted it to be fairly easy to use in other places and didn't feel like hardcoding the sorting and filtering into query expressions so I looked for a way to build the expressions dynamically and the best way to do this I found was with Dynamic LINQ.
User input from a URL like below is directly inserted into a dynamic Where or OrderBy.
/Orders?sortby=OrderID&order=desc&CustomerName=Microsoft
This would result in two expressions:
OrderBy("OrderID descending")
Where(@"CustomerName.Contains(""Microsoft"")")
While I understand that it won't be thrown at the database directly and inserting straight SQL in here won't work because it can't be reflected to a property and it's type-safe and all, I wonder if someone more creative than me could find a way to exploit it regardless. One exploit that I can think of is that it's possible to sort/filter on properties that are not visible in the table, but this isn't that harmful since they still wouldn't be shown and it can be prevented by hashing.
The only way I allow direct user input is with OrderBy and Where.
Just making sure, thanks :)
Because LINQ to SQL uses type-safe data model classes, you are protected from SQL Injection attacks by default. LINQ to SQL will automatically encode the values based on the underlying data type.
(c) ScottGu
But you can still get "divide by zero" there, so it is recommended to handle all unexpected exceptions and also to limit the length of valid entries, just in case.
Hum... I've just found at least one possible issue with the Dynamic Linq. Just exec this snippet 1000 times and watch for the CPU and memory consumption going high up (creating an easy way for the denial of service attack):
var lambda = DynamicExpression
    .ParseLambda<Order, bool>("Customer=string.Format(\"{0,9999999}" +
        "{0,9999999}{0,9999999}{0,9999999}{0,9999999}\",Customer)")
    .Compile();
var arg = new Order
{
    Total = 11
};
Console.WriteLine(lambda(arg));
I wrote a blog post on that.
Just a thought, but have you looked at ADO.NET Data Services? This provides a REST-enabled API much like the above with a lot of standard LINQ functionality built in.
I can't think of an interesting dynamic LINQ exploit off the top of my head, but if this were me I'd at least be white-listing members (OrderID, CustomerName, etc.) - but I'd probably write the Expression logic directly; it isn't especially hard if you are only supporting direct properties.
For example, here is Where (using your Contains logic):
static IQueryable<T> Where<T>(this IQueryable<T> source,
    string member, string value)
{
    var param = Expression.Parameter(typeof(T), "x");
    var arg = Expression.Constant(value, typeof(string));
    var prop = Expression.PropertyOrField(param, member);
    MethodInfo method = typeof(string).GetMethod(
        "Contains", new[] { typeof(string) });
    var invoke = Expression.Call(prop, method, arg);
    var lambda = Expression.Lambda<Func<T, bool>>(invoke, param);
    return source.Where(lambda);
}
I've covered OrderBy previously, here.
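Since the linked OrderBy post is not reproduced here, the following is only a sketch of how sorting could be built along the same lines, with the white-list applied before any expression is created. OrderByMember, the allowedMembers parameter, and the Order class are hypothetical names for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical entity for the demo.
public sealed class Order
{
    public int OrderID { get; set; }
    public string CustomerName { get; set; } = "";
}

public static class DynamicOrdering
{
    public static IQueryable<T> OrderByMember<T>(
        this IQueryable<T> source, string member, bool descending,
        ISet<string> allowedMembers)
    {
        // Reject anything the caller has not explicitly allowed.
        if (!allowedMembers.Contains(member))
            throw new ArgumentException(
                "Sorting by '" + member + "' is not allowed.", nameof(member));

        // x => x.Member
        ParameterExpression param = Expression.Parameter(typeof(T), "x");
        MemberExpression prop = Expression.PropertyOrField(param, member);
        LambdaExpression keySelector = Expression.Lambda(prop, param);

        // Queryable.OrderBy<T, TKey>(source, keySelector), resolved by name
        // because the key type is only known at run time.
        MethodCallExpression call = Expression.Call(
            typeof(Queryable),
            descending ? "OrderByDescending" : "OrderBy",
            new[] { typeof(T), prop.Type },
            source.Expression,
            Expression.Quote(keySelector));

        return source.Provider.CreateQuery<T>(call);
    }
}
```

A call such as orders.OrderByMember("OrderID", true, allowed) would then correspond to the sortby=OrderID&order=desc URL in the question, with allowed holding only the column names the table is meant to expose.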
