Parsing ODataQueryOptions<type> without EF/NHibernate - c#

I have a project with a large codebase that uses an in-house data access layer to work with the database. However, we want to support OData access to the system. I'm quite comfortable with expression trees in C#. How do I get at something I can parse here in order to get the structure of their actual query?
Is there a way to get an AST out of this thing that I can turn into sql code?

Essentially, you need to implement your own query provider that knows how to translate the expression tree into an underlying query.
A simplified version of a controller method would be:
[ODataRoute("foo")]
public List<Foo> GetFoo(ODataQueryOptions<Foo> queryOptions)
{
var queryAllFoo = _myQueryProvider.QueryAll<Foo>();
var modifiedQuery = queryOptions.ApplyTo(queryAllFoo);
return modifiedQuery.ToList();
}
However!
This is not trivial; it took me about a month to implement custom OData query processing.
You need to build the EDM model so that Web API OData can process the request and build the right expression trees.
It might involve reflection, creating types at runtime in a dynamic assembly (for projections), and compiling lambda expressions for the best performance.
The Web API OData component has some limitations, so getting relations to work takes considerably more time. In our case we did some custom query string transformation (before processing) and injected joins into the expression trees when needed.
There are too many details to explain in one answer; it's a long road.
Good luck!

You can use ODataQueryOptions<T> to get abstract syntax trees for the $filter and $orderby query options. ($skip and $top are also available as parsed integers.) Since you don't need/want LINQ support, you could then simply pass the ASTs to a repository method, which would then visit the ASTs to build up the appropriate SQL stored proc invocation. You will not call ODataQueryOptions.ApplyTo. Here's a sketch:
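public IEnumerable<Thing> Get(ODataQueryOptions<Thing> opts)
{
    // Each option is null when the client did not send it, hence the null-conditional access.
    var filter = opts.Filter?.FilterClause.Expression;
    var ordering = opts.OrderBy?.OrderByClause.Expression;
    int? skip = opts.Skip?.Value;
    int? top = opts.Top?.Value;
    // GetThings is your own repository method; it visits the ASTs to build the query.
    return this.Repository.GetThings(filter, ordering, skip, top);
}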
Note that filter and ordering in the above are instances of Microsoft.OData.Core.UriParser.Semantic.SingleValueNode. That class has a convenient Accept<T> method, but you probably do not want your repository to depend on that class directly. That is, you should probably use a helper to produce an intermediate form that is independent of Microsoft's OData implementation.
If this is a common pattern, consider using parameter binding so you can get the various query options directly from the controller method's parameter list.
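As a minimal sketch of what such an intermediate form could look like: the FilterNode, PropertyRef, Literal, BinaryOp and SqlWhereBuilder names below are hypothetical and are not part of any OData library. A helper on the controller side would translate the SingleValueNode tree into these types, and the repository would turn them into a parameterised WHERE clause roughly like this:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical intermediate form: none of these types come from Microsoft's OData libraries.
public abstract record FilterNode;
public sealed record PropertyRef(string Name) : FilterNode;
public sealed record Literal(object Value) : FilterNode;
public sealed record BinaryOp(string Op, FilterNode Left, FilterNode Right) : FilterNode;

public static class SqlWhereBuilder
{
    // Operators supported by this sketch; anything else is rejected.
    private static readonly Dictionary<string, string> Operators = new()
    {
        ["eq"] = "=", ["ne"] = "<>", ["gt"] = ">", ["lt"] = "<",
        ["and"] = "AND", ["or"] = "OR"
    };

    // Returns the WHERE text and collects values in 'parameters',
    // so user-supplied values are never concatenated into the SQL.
    public static string Build(FilterNode node, IList<object> parameters)
    {
        switch (node)
        {
            case PropertyRef p:
                // Assumes the name was already validated against the EDM model.
                return "[" + p.Name + "]";
            case Literal l:
                parameters.Add(l.Value);
                return "@p" + (parameters.Count - 1);
            case BinaryOp b when Operators.TryGetValue(b.Op, out var sql):
                return "(" + Build(b.Left, parameters) + " " + sql + " "
                           + Build(b.Right, parameters) + ")";
            default:
                throw new NotSupportedException("Unsupported filter node: " + node);
        }
    }
}
```

With this shape the repository depends only on your own node types, and the OData-specific visitor stays in the web layer.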

Related

Why would I want to use an ExpressionVisitor?

I know from the MSDN's article about How to: Modify Expression Trees what an ExpressionVisitor is supposed to do. It should modify expressions.
Their example is however pretty unrealistic so I was wondering why would I need it? Could you name some real-world cases where it would make sense to modify an expression tree? Or, why does it have to be modified at all? From what to what?
It also has many overloads for visiting all kinds of expressions. How do I know when I should use any of them and what they should return? I saw some people overriding VisitParameter and returning base.VisitParameter(node), while others were returning Expression.Parameter(..).
There was an issue where the database had fields containing 0 or 1 (numeric), and we wanted to use bools in the application.
The solution was to create a "Flag" object, which contained the 0 or 1 and had a conversion to bool. We used it like a bool throughout the application, but when we used it in a .Where() clause, Entity Framework complained that it was unable to call the conversion method.
So we used an expression visitor to change all property accesses like .Where(x => x.Property) to .Where(x => x.Property.Value == 1) just before sending the tree to EF.
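A rough sketch of such a visitor follows. The Flag and Item classes are stand-ins invented for illustration, and the approach (matching the Convert node that the compiler emits for the implicit Flag-to-bool conversion) is an assumption about how the rewrite could be done, not the original author's code:

```csharp
using System;
using System.Linq.Expressions;

// Hypothetical stand-ins for the types described above.
public class Flag
{
    public int Value { get; set; }
    public static implicit operator bool(Flag f) => f.Value == 1;
}

public class Item
{
    public Flag Active { get; set; } = new Flag();
}

// Rewrites the implicit Flag-to-bool conversion into "flag.Value == 1",
// which a LINQ provider can translate without calling the conversion method.
public class FlagRewriter : ExpressionVisitor
{
    protected override Expression VisitUnary(UnaryExpression node)
    {
        if (node.NodeType == ExpressionType.Convert
            && node.Type == typeof(bool)
            && node.Operand.Type == typeof(Flag))
        {
            Expression operand = Visit(node.Operand);
            return Expression.Equal(
                Expression.Property(operand, nameof(Flag.Value)),
                Expression.Constant(1));
        }
        // Every other unary node keeps the default behaviour.
        return base.VisitUnary(node);
    }
}
```

Calling new FlagRewriter().Visit(predicate) on x => x.Active returns a new lambda whose body is the comparison x.Active.Value == 1; the original tree is left untouched.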
Could you name some real-world cases where it would make sense to modify an expression tree?
Strictly speaking, we never modify an expression tree, as they are immutable (as seen from the outside, at least, there's no promise that it doesn't internally memoise values or otherwise have mutable private state). It's precisely because they are immutable and hence we can't just change a node that the visitor pattern makes a lot of sense if we want to create a new expression tree that is based on the one we have but different in some particular way (the closest thing we have to modifying an immutable object).
We can find a few within Linq itself.
In many ways the simplest Linq provider is the linq-to-objects provider that works on enumerable objects in memory.
When it receives enumerables directly as IEnumerable<T> objects it's pretty straight-forward in that most programmers could write an unoptimised version of most of the methods pretty quickly. E.g. Where is just:
foreach (T item in source)
if (pred(item))
yield return item;
And so on. But what about EnumerableQuery<T>, which implements the IQueryable<T> versions? Since EnumerableQuery<T> wraps an IEnumerable<T>, we could do the desired operation on the one or more enumerable objects involved. But what we have is an expression describing that operation in terms of IQueryable<T> and other expressions for selectors, predicates, etc., whereas what we need is a description of that operation in terms of IEnumerable<T> and delegates for selectors, predicates, etc.
System.Linq.EnumerableRewriter is an implementation of ExpressionVisitor that does exactly such a rewrite, and the result can then simply be compiled and executed.
Within System.Linq.Expressions itself there are a few implementations of ExpressionVisitor for different purposes. One example is that the interpreter form of compilation can't handle hoisted variables in quoted expressions directly, so it uses a visitor to rewrite them into indices into a dictionary.
As well as producing another expression, an ExpressionVisitor can produce another result. Again System.Linq.Expressions has internal examples itself, with debug strings and ToString() for many expression types working by visiting the expression in question.
This can be (though it doesn't have to be) the approach used by a database-querying LINQ provider to turn an expression into a SQL query.
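To illustrate a visitor whose useful result is something other than an expression, here is a small sketch that writes SQL-like text for a predicate. PredicateTextWriter and OrderRow are hypothetical names, only a handful of node types are handled, and captured variables (which appear as member accesses on a closure object) are deliberately not covered:

```csharp
using System;
using System.Globalization;
using System.Linq.Expressions;
using System.Text;

// Hypothetical entity used only for the example.
public class OrderRow
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
}

// A visitor used for its side effect: it appends text while walking the tree.
public class PredicateTextWriter : ExpressionVisitor
{
    private readonly StringBuilder _sb = new StringBuilder();

    public static string Write(LambdaExpression lambda)
    {
        var writer = new PredicateTextWriter();
        writer.Visit(lambda.Body);
        return writer._sb.ToString();
    }

    protected override Expression VisitBinary(BinaryExpression node)
    {
        _sb.Append('(');
        Visit(node.Left);
        _sb.Append(node.NodeType switch
        {
            ExpressionType.Equal => " = ",
            ExpressionType.GreaterThan => " > ",
            ExpressionType.LessThan => " < ",
            ExpressionType.AndAlso => " AND ",
            ExpressionType.OrElse => " OR ",
            _ => throw new NotSupportedException(node.NodeType.ToString())
        });
        Visit(node.Right);
        _sb.Append(')');
        return node; // the tree itself is not changed
    }

    protected override Expression VisitMember(MemberExpression node)
    {
        // Treats every member access as a column name (a simplification).
        _sb.Append(node.Member.Name);
        return node;
    }

    protected override Expression VisitConstant(ConstantExpression node)
    {
        // A real provider would emit parameters instead of inline literals.
        _sb.Append(node.Value is string s
            ? "'" + s.Replace("'", "''") + "'"
            : Convert.ToString(node.Value, CultureInfo.InvariantCulture));
        return node;
    }
}
```

For the predicate o => o.Id > 5 && o.Name == "abc", PredicateTextWriter.Write returns ((Id > 5) AND (Name = 'abc')).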
How do I know when I should use any of them and what should they return?
The default implementation of these methods will:
If the expression can have no child expressions (e.g. the result of Expression.Constant()) then it will return the node back again.
Otherwise visit all the child expressions, and then call Update on the expression in question, passing the results back. Update in turn will either return a new node of the same type with the new children, or return the same node back again if the children weren't changed.
As such, if you don't know that you need to operate explicitly on a node for your purposes, then you probably don't need to override its method. It also means that Update is a convenient way to get a new version of a node for a partial change. Just what "your purposes" means of course depends on the use case. The most common cases probably go to one extreme or the other: either just one or two expression types need an override, or all or nearly all do.
(One caveat applies to nodes that hold their children in a ReadOnlyCollection, such as BlockExpression for both its steps and variables, or TryExpression for its catch blocks. If you examine those children and only sometimes change them, it is best to check for yourself whether anything changed. A flaw [recently fixed, but not in any released version yet] means that if you pass the same children to Update in a different collection from the original ReadOnlyCollection, a new expression is created needlessly, which has effects further up the tree. This is normally harmless, but it wastes time and memory.)
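The default behaviour described above can be seen with a visitor that overrides a single method. This is a sketch with a hypothetical ParameterSwapper class; it also shows the two styles mentioned in the question, returning base.VisitParameter(node) when there is nothing to change and returning a different parameter when there is:

```csharp
using System;
using System.Linq.Expressions;

// Replaces one parameter with another and leaves every other node to the base class.
public class ParameterSwapper : ExpressionVisitor
{
    private readonly ParameterExpression _from;
    private readonly ParameterExpression _to;

    public ParameterSwapper(ParameterExpression from, ParameterExpression to)
    {
        _from = from;
        _to = to;
    }

    protected override Expression VisitParameter(ParameterExpression node)
    {
        // Only the targeted parameter is replaced; anything else is returned as-is.
        return node == _from ? _to : base.VisitParameter(node);
    }
}
```

Visiting x + 1 with a swapper that targets x yields a new BinaryExpression (built through Update) whose right-hand constant is the very same node instance as before. Visiting it with a swapper that targets some unrelated parameter returns the original x + 1 instance, because no child changed.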
The ExpressionVisitor enables the visitor pattern for Expressions.
Conceptually, the problem is that when you navigate an Expression tree, all you know is that any given node is an Expression, but you don't know specifically what kind of Expression. This pattern allows you to know what kind of Expression you're working with and specify type-specific handling for different kinds.
When you have an Expression, you can just pass it to Modify, which calls Visit. The Expression knows its own type, so it'll call back the appropriate override.
Looking at the MSDN example you linked:
public class AndAlsoModifier : ExpressionVisitor
{
public Expression Modify(Expression expression)
{
return Visit(expression);
}
protected override Expression VisitBinary(BinaryExpression b)
{
if (b.NodeType == ExpressionType.AndAlso)
{
Expression left = this.Visit(b.Left);
Expression right = this.Visit(b.Right);
// Make this binary expression an OrElse operation instead of an AndAlso operation.
return Expression.MakeBinary(ExpressionType.OrElse, left, right, b.IsLiftedToNull, b.Method);
}
return base.VisitBinary(b);
}
}
In this example, if the Expression happens to be a BinaryExpression, it'll call back VisitBinary(BinaryExpression b) given in the example. Now you can deal with that BinaryExpression knowing that it's a BinaryExpression. You could also specify other override methods that handle other kinds of Expressions.
It's worth noting that this works through double dispatch: each Expression calls back the Visit method that fits its own class. Every BinaryExpression, whatever its NodeType, arrives at VisitBinary, so you handle the node types you care about there (as the example does for AndAlso) and pass the rest to base.VisitBinary for the default handling.
In short, this pattern allows you to navigate an Expression tree knowing what kind of Expressions you're working with.
A specific real-world example I have just encountered occurred when shifting to EF Core and migrating from SQL Server (Microsoft-specific) to SQLite (platform-independent).
The existing business logic revolved around a middle-tier/service-layer interface that assumed Full Text Search (FTS) happened automagically in the background, which it does with SQL Server. Search-related queries were passed into this tier via Expressions, and FTS against a SQL Server store required no additional FTS-specific entities.
I didn't want to change any of this, but with SQLite you have to target a specific virtual table for a full-text search, which would in turn have meant changing all the middle-tier calls to retarget the FTS tables/entities and then joining them to the business entity tables to get a similar result set.
But by subclassing ExpressionVisitor I was able to intercept the calls in the DAL and simply rewrite the incoming expression (or, more precisely, some of the BinaryExpressions within the overall search expression) to handle SQLite's FTS requirements specifically.
This meant that specialization of the data layer to the data store happened within a single class that was called from a single place within a repository base class. No other aspects of the application needed to be altered in order to support FTS via EF Core, and any SQLite FTS-related entities could be contained in a single pluggable assembly.
So ExpressionVisitor is really very useful, especially when combined with the whole notion of being able to pass around expression trees as data via various forms of IPC.

Using Full-Text Search in Linq / ODataController

I am working on an application which serves data via OData. I am using ASP.Net and ODataControllers querying via EF -- the data is backed by a SQLServer database.
On the front-end website which visualizes this data, the user can search results -- a $filter is dynamically created on the front end and an OData request is sent (allowing server-side filtering).
Full-text search is enabled on the database table backing the data that is eventually served via OData, but it appears that somewhere in the pipeline OData filter -> Linq query -> SQL query, a LIKE search is used instead of the full-text Contains() method.
Is there any way that anyone knows of to make this use the full-text capabilities in a reasonably elegant way?
Presumably I can do a lot of fumbling about with a custom IODataPathHandler and / or IODataPathTemplateHandler and / or some other things to intercept the points in the pipeline, but I'd rather try to avoid that if possible.
Any advice?
OData's contains function is meant to perform a simple substring match. The OData spec defines the $search query option for full-text search, but Web API does not currently support $search. (There is an open issue.)
Your best bet is probably a custom query option (e.g., /Customers?fulltextsearch=contains(Name, 'Arianne')), but you'll have to write all of the code to parse the option, etc.
If you are determined to map OData contains to T-SQL CONTAINS, then you will need to intercept the translation done by Linq to Entities. Look at the source code for the existing ContainsTranslator and work backwards.
Use an interceptor and a custom EnableQueryAttribute for that purpose:
Define a FtsInterceptor class as described in the article and add it to your context - DbInterception.Add(new FtsInterceptor()).
Define a subclass of the EnableQueryAttribute class and override the ApplyQuery method adding FullTextPrefix (-FTSPREFIX-) for all parameters of the OData contains function:
public class FullTextSearchAttribute : EnableQueryAttribute
{
public override IQueryable ApplyQuery(IQueryable queryable, ODataQueryOptions queryOptions)
{
if (queryOptions.Filter == null)
return queryOptions.ApplyTo(queryable);
const string pattern = "contains\\([%20]*[^%27]*[%20]*,[%20]*%27(?<Value>[^%27]*)";
var matchEvaluator = new MatchEvaluator(match =>
{
var value = match.Groups["Value"].Value;
return match.Value.Replace($"%27{value}", $"%27-FTSPREFIX-{value}");
});
var request = new HttpRequestMessage(HttpMethod.Get,
Regex.Replace(queryOptions.Request.RequestUri.AbsoluteUri,
pattern,
matchEvaluator,
RegexOptions.IgnoreCase));
return new ODataQueryOptions(queryOptions.Context, request).ApplyTo(queryable);
}
}
Use the attribute in your code:
[FullTextSearchAttribute]
public IQueryable<YourDomainClass> Get()
{
//Query
}

Transforming a filter string into a C# delegate

I have a set of classes which I am using for a Data Access Layer for some clients. As part of the data access I am allowing a set of filters to be sent in this format:
"{Member[.Member....]}{Operator}{Value}"
I would like to turn these strings into delegates for use in a LINQ query like this:
.Where([delegate returned by a factory])
Here is a more concrete example:
IEnumerable<Parent> parents = GetSomeParents();
string filter = "Child.Id=5";
var expression = FilterFactory<Parent>.GetExpression(filter);
parents = parents.Where(expression);
expression would contain the delegate: parent => parent.Child.Id == 5
Is there a way using reflection to construct the FilterFactory in a Generic way to handle any member path I send in? Paths with indexing aren't required, but would be nice.
Yes absolutely! This is a really fun thing to do too.
One way you can do this is to use the LINQ Dynamic Query Library and get the expression compiler they have in there. I also have something very similar in my project MetaSharp.
But you could also do it yourself if the syntax or features don't quite match what you want. The general idea is that you need to parse the string and build up an Expression tree that represents what you are parsing. In .NET the expression tree objects can be found in System.Linq.Expressions. Once you have your tree you can call Compile() on it and it will be dynamically compiled into a delegate right then. Try reading about the State Machine and Visitor patterns to figure out the best way to parse an arbitrary expression like the one you have above.
PS I would not recommend using regular expressions!
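As a minimal sketch of the do-it-yourself route for the "Child.Id=5" example: this version handles only the = operator and plain member paths (no indexing), and the Parent and Child classes are stand-ins for the asker's types. The use of TypeDescriptor to convert the value text is one possible choice, not something the answer prescribes:

```csharp
using System;
using System.ComponentModel;
using System.Linq.Expressions;

// Hypothetical model classes matching the example in the question.
public class Child { public int Id { get; set; } }
public class Parent { public Child Child { get; set; } = new Child(); }

public static class FilterFactory<T>
{
    // Parses "Member[.Member...]=Value" and compiles it into a delegate.
    // Only "=" is supported in this sketch.
    public static Func<T, bool> GetExpression(string filter)
    {
        int split = filter.IndexOf('=');
        if (split <= 0)
            throw new FormatException("Expected Member[.Member...]=Value");

        string path = filter.Substring(0, split);
        string rawValue = filter.Substring(split + 1);

        // Walk the member path: x => x.Child.Id
        ParameterExpression param = Expression.Parameter(typeof(T), "x");
        Expression member = param;
        foreach (string name in path.Split('.'))
            member = Expression.PropertyOrField(member, name.Trim());

        // Convert the text to the member's type (int, string, Guid, ...).
        object value = TypeDescriptor.GetConverter(member.Type)
                                     .ConvertFromInvariantString(rawValue.Trim());

        Expression body = Expression.Equal(member, Expression.Constant(value, member.Type));
        return Expression.Lambda<Func<T, bool>>(body, param).Compile();
    }
}
```

FilterFactory<Parent>.GetExpression("Child.Id=5") then returns a delegate equivalent to parent => parent.Child.Id == 5, which can be passed straight to Where. An unknown member name makes Expression.PropertyOrField throw an ArgumentException, which is a reasonable place to reject bad input.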

Dynamic LINQ with direct user input, any dangers?

I have a table in a ASP.NET MVC application that I want to be sortable (serverside) and filterable using AJAX. I wanted it to be fairly easy to use in other places and didn't feel like hardcoding the sorting and filtering into query expressions so I looked for a way to build the expressions dynamically and the best way to do this I found was with Dynamic LINQ.
User input from a URL like below is directly inserted into a dynamic Where or OrderBy.
/Orders?sortby=OrderID&order=desc&CustomerName=Microsoft
This would result in two expressions:
OrderBy("OrderID descending")
Where(@"CustomerName.Contains(""Microsoft"")")
While I understand that it won't be thrown at the database directly and inserting straight SQL in here won't work because it can't be reflected to a property and it's type-safe and all, I wonder if someone more creative than me could find a way to exploit it regardless. One exploit that I can think of is that it's possible to sort/filter on properties that are not visible in the table, but this isn't that harmful since they still wouldn't be shown and it can be prevented by hashing.
The only way I allow direct user input is with OrderBy and Where.
Just making sure, thanks :)
Because LINQ to SQL uses type-safe data model classes, you are protected from SQL Injection attacks by default. LINQ to SQL will automatically encode the values based on the underlying data type.
(c) ScottGu
But you can still get a "divide by zero" there, so it is recommended to handle all unexpected exceptions and also to limit the length of valid entries, just in case.
Hmm... I've just found at least one possible issue with Dynamic LINQ. Just execute this snippet 1000 times and watch the CPU and memory consumption climb (which creates an easy route to a denial-of-service attack):
var lambda = DynamicExpression
.ParseLambda<Order, bool>("Customer=string.Format(\"{0,9999999}"+
"{0,9999999}{0,9999999}{0,9999999}{0,9999999}\",Customer)")
.Compile();
var arg = new Order
{
Total = 11
};
Console.WriteLine(lambda(arg));
I wrote a blog post on that.
Just a thought, but have you looked at ADO.NET Data Services? This provides a REST-enabled API much like the above with a lot of standard LINQ functionality built in.
I can't think of an interesting dynamic LINQ exploit off the top of my head, but if this was me I'd at least be white-listing members (OrderID, CustomerName, etc.). I'd probably write the Expression logic directly, though; it isn't especially hard if you are only supporting direct properties.
For example, here is Where (using your Contains logic):
static IQueryable<T> Where<T>(this IQueryable<T> source,
string member, string value)
{
var param = Expression.Parameter(typeof(T), "x");
var arg = Expression.Constant(value, typeof(string));
var prop = Expression.PropertyOrField(param, member);
MethodInfo method = typeof(string).GetMethod(
"Contains", new[] { typeof(string) });
var invoke = Expression.Call(prop, method, arg);
var lambda = Expression.Lambda<Func<T, bool>>(invoke, param);
return source.Where(lambda);
}
I've covered OrderBy previously, here.
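For completeness, here is a sketch of how a white-listed OrderBy could look in the same style. The SafeOrdering class, the OrderByMember name and the allowed parameter are hypothetical and are not taken from the answer linked above:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

public static class SafeOrdering
{
    // 'allowed' is the whitelist; any other member name from the query string is rejected.
    public static IQueryable<T> OrderByMember<T>(this IQueryable<T> source,
        string member, bool descending, ISet<string> allowed)
    {
        if (!allowed.Contains(member))
            throw new ArgumentException(
                "Sorting by '" + member + "' is not allowed.", nameof(member));

        var param = Expression.Parameter(typeof(T), "x");
        var prop = Expression.Property(param, member);
        var keySelector = Expression.Lambda(prop, param);

        // Builds source.OrderBy(x => x.member) or OrderByDescending via Queryable.
        var call = Expression.Call(
            typeof(Queryable),
            descending ? "OrderByDescending" : "OrderBy",
            new[] { typeof(T), prop.Type },
            source.Expression,
            Expression.Quote(keySelector));

        return source.Provider.CreateQuery<T>(call);
    }
}
```

A call such as orders.OrderByMember("OrderID", true, new HashSet<string> { "OrderID", "CustomerName" }) sorts by a permitted column, while a request to sort by a column outside the set fails before any expression is built.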

Dynamic "WHERE" like queries on memory objects

What would be the best approach to allow users to define a WHERE-like constraints on objects which are defined like this:
Collection<object[]> data
Collection<string> columnNames
where object[] is a single row.
I was thinking about dynamically creating a strong-typed wrapper and just using Dynamic LINQ but maybe there is a simpler solution?
DataSet's are not really an option since the collections are rather huge (40,000+ records) and I don't want to create DataTable and populate it every time I run a query.
What kind of queries do you need to run? If it's just equality, that's relatively easy:
public static IEnumerable<object[]> WhereEqual(
this IEnumerable<object[]> source,
Collection<string> columnNames,
string column,
object value)
{
int columnIndex = columnNames.IndexOf(column);
if (columnIndex == -1)
{
throw new ArgumentException("Unknown column: " + column, nameof(column));
}
return source.Where(row => Object.Equals(row[columnIndex], value));
}
If you need something more complicated, please give us an example of what you'd like to be able to write.
If I get your point: you'd like to let users write the where clause externally. I mean the users are real users and not developers, so you're looking for a bridge between a UI control and the where condition in code. I thought this because you mentioned Dynamic LINQ.
So if I'm correct, what you really want to do is:
give the user the ability to use column names
give the ability to describe a bool function (which will serve as the where criteria)
compose the query dynamically and run it
For this task let me propose Rules from the System.Workflow.Activities.Rules namespace. Several designers are available for rules, not to mention the ones shipped with Visual Studio (for the web that's another question, but there are several for that too). I'd start with Rules without workflow, then examine the examples from MSDN. It's a very flexible and customizable engine.
One other thing: LINQ is connected to this problem because a function returning IQueryable can defer query execution. You can define a query up front, and another part of the code can extend the returned queryable based on the user's condition (which can then be attached with extension methods).
When just using object, LINQ isn't really going to help you very much... is it worth the pain? And Dynamic LINQ is certainly overkill. What is the expected way of using this? I can think of a few ways of adding basic Where operations.... but I'm not sure how helpful it would be.
How about embedding something like IronPython in your project? We use that to allow users to define their own expressions (filters and otherwise) inside a sandbox.
I'm thinking about something like this:
((col1 = "abc") or (col2 = "xyz")) and (col3 = "123")
Ultimately it would be nice to have support for LIKE operator with % wildcard.
Thank you all guys - I've finally found it. It's called NQuery and it's available from CodePlex. In its documentation there is even an example which contains a binding to my very structure - list of column names + list of object[]. Plus fully functional SQL query engine.
Just perfect.
