Detect parentheses in BinaryExpression - c#

I am building an expression analyser from which I would like to generate database query code. I've gotten quite far but am stuck parsing BinaryExpressions accurately. It's quite easy to break them up into Left and Right, but I need to detect parentheses and generate my code accordingly, and I cannot see how to do this.
An example [please ignore the flawed logic :)]:
a => a.Line2 != "1" && (a.Line2 == "a" || a.Line2 != "b") && !a.Line1.EndsWith("a")
I need to detect the 'set' in the middle and preserve its grouping, but during parsing I cannot see any difference between it and a normal BinaryExpression (I would hate to check the string representation for parentheses).
Any help would be appreciated.
(I should probably mention that I'm using C#)
--Edit--
I failed to mention that I'm using the standard .Net Expression classes to build the expressions (System.Linq.Expressions namespace)
--Edit2--
Ok I'm not parsing text into code, I'm parsing code into text. So my Parser class has a method like this:
void FilterWith<T>(Expression<Func<T, bool>> filterExpression);
which allows you to write code like this:
FilterWith<Customer>(c => c.Name =="asd" && c.Surname == "qwe");
which is quite easy to parse using the standard .Net classes. The harder case is this expression:
FilterWith<Customer>(c => c.Name == "asd" && (c.Surname == "qwe" && c.Status == 1) && !c.Disabled)
My challenge is to keep the expressions between parentheses together as a single set. The .Net classes correctly split the parenthesised part from the others, but give no indication that it is a set because of the parentheses.

I haven't used Expression myself, but if it works like any other AST, then the problem is easier to solve than you make it out to be. As another commenter pointed out, you can just put parentheses around all of your binary expressions, and then you won't have to worry about order-of-operations issues.
Alternatively, you could check whether the expression you are generating has a lower precedence than the containing expression and, if so, put parentheses around it. So if you have a tree like [* 4 [+ 5 6]] (where tree nodes are represented recursively as [node left-subtree right-subtree]), you would know when writing out the [+ 5 6] subtree that it is contained inside a * operation, which has higher precedence than + and thus requires that any lower-precedence immediate subtree be placed in parentheses. The pseudo-code could be something like this:
function parseBinary(node) {
    // parenthesise a child only when it binds less tightly than this node
    if (node.left.operator.precedence < node.operator.precedence) {
        write "("
        parseBinary(node.left)
        write ")"
    } else {
        parseBinary(node.left)
    }
    write node.operator
    // and now do the same thing for node.right as you did for node.left above
}
You'll need to have a table of precedence for the various operators, and a way to get at the operator itself to find out what it is and thence what its precedence is. However, I imagine you can figure that part out.
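For the System.Linq.Expressions case in the question, the precedence idea above might look roughly like the following C# sketch. The class name ParenthesizingPrinter, the precedence numbers and the operator symbols are made up for illustration, and only a handful of node types are handled; everything else falls back to Expression.ToString(). A right-hand child with the same precedence as its parent is also wrapped, on the assumption that in a left-associative chain such a shape can only come from explicit parentheses in the source.

```csharp
using System;
using System.Linq.Expressions;

public static class ParenthesizingPrinter
{
    // Hypothetical precedence table: a higher number binds more tightly.
    static int Precedence(ExpressionType type)
    {
        switch (type)
        {
            case ExpressionType.OrElse: return 1;
            case ExpressionType.AndAlso: return 2;
            case ExpressionType.Equal:
            case ExpressionType.NotEqual: return 3;
            default: return 4; // leaves: members, constants, calls, ...
        }
    }

    // Only the four operators above are supported in this sketch.
    static string Symbol(ExpressionType type)
    {
        switch (type)
        {
            case ExpressionType.OrElse: return "||";
            case ExpressionType.AndAlso: return "&&";
            case ExpressionType.Equal: return "==";
            case ExpressionType.NotEqual: return "!=";
            default: throw new NotSupportedException(type.ToString());
        }
    }

    public static string Print(Expression e)
    {
        var binary = e as BinaryExpression;
        if (binary == null) return e.ToString(); // non-binary nodes are printed as-is

        int own = Precedence(binary.NodeType);
        string left = Print(binary.Left);
        string right = Print(binary.Right);

        // Left child: wrap only when it binds less tightly than this node.
        if (Precedence(binary.Left.NodeType) < own) left = "(" + left + ")";
        // Right child: also wrap on equal precedence, which keeps a && (b && c) grouped.
        if (Precedence(binary.Right.NodeType) <= own) right = "(" + right + ")";

        return left + " " + Symbol(binary.NodeType) + " " + right;
    }
}
```

Calling ParenthesizingPrinter.Print(filterExpression.Body) on a lambda such as s => s == "a" && (s == "b" || s != "c") should give the text back with the middle group still in parentheses, without ever looking at the string representation of the whole expression.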

When building an expression analyzer, you first need a parser, and for that you need a tokenizer.
A tokenizer is a piece of code that reads an expression and generates tokens (which can be valid or invalid) for a given syntax.
Your parser, using the tokenizer, then reads the expression in the established order (left-to-right, right-to-left, top-to-bottom, whatever you choose) and creates a tree that maps the expression.
Finally, the analyzer interprets the tree, giving the expression its definitive meaning.

Related

VB.Net/C# - Split string on custom function

I'm working on a custom mathematical expression calculator, but I'm having problems parsing nested conditional expressions like this one:
IIF("M"="M",(IIF(100 < 50,(IIF(2 > 0.45,2,1)),(IIF(2 > 0.45,4,3)))),(IIF(100 < 46,(IIF(2 > 0.45,2,1)),(IIF(2 >0.45,4,3)))))
What I'd like to do is to split the IIF function by commas in order to get its parameters:
Dim condition = "M"="M"
Dim truePart = (IIF(100 < 50,(IIF(2 > 0.45,2,1)),(IIF(2 > 0.45,4,3))))
Dim falsePart = (IIF(100 < 46,(IIF(2 > 0.45,2,1)),(IIF(2 >0.45,4,3))))
At the moment I'm using Regex to parse a single IIF function by getting what is inside the parentheses and then splitting it by commas:
\((.*?)\)
Obviously that doesn't work with an expression like this, since it will stop at the first closing parenthesis, so I thought about using this to get all the other characters:
\((.*?)\).*
But now I'm not sure how to split it, since using commas is not an option anymore.
The answer from theory is that regular expressions are not capable of doing what you ask because they "cannot count", and here you need to count.
In practice, .NET regular expressions are not strictly regular expressions but stack machines. With a named group such as (?<Group>.*) you in fact push an entry onto that group's stack. With (?<-Group>) you pop an entry from that stack. You can also test whether the stack is empty.
Out of curiosity, I gave it a try and I believe that
[\(,]([^\(\)]|(?<Par>\()|(?<-Par>\)))*(?(Par)---|[,\)])
should work, where --- is used as an escape sequence. If you understand that "regular expression" right away, then I think you are good to go. Otherwise, I would recommend writing a parser manually, or you will no longer understand your code five minutes after you have tested it.
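A hand-written splitter along the lines recommended above could be sketched as follows. The class and method names are hypothetical, and the sketch assumes that string literals are delimited by plain double quotes with no escaping inside them. It keeps a depth counter for parentheses and only splits on commas seen at depth zero.

```csharp
using System;
using System.Collections.Generic;

public static class ArgumentSplitter
{
    // Splits the text between the outermost parentheses of a call
    // (for example an IIF(...) expression) at its top-level commas.
    public static List<string> SplitArguments(string call)
    {
        int open = call.IndexOf('(');
        int close = call.LastIndexOf(')');
        if (open < 0 || close < open)
            throw new FormatException("No argument list found.");

        var args = new List<string>();
        int depth = 0;           // nesting level inside the argument list
        int start = open + 1;    // start of the argument currently being read
        bool inString = false;   // true while between two double quotes

        for (int i = open + 1; i < close; i++)
        {
            char c = call[i];
            if (c == '"') inString = !inString;
            else if (inString) continue;   // ignore commas and parentheses in strings
            else if (c == '(') depth++;
            else if (c == ')') depth--;
            else if (c == ',' && depth == 0)
            {
                args.Add(call.Substring(start, i - start).Trim());
                start = i + 1;
            }
        }
        args.Add(call.Substring(start, close - start).Trim());
        return args;
    }
}
```

Applied to the outer IIF, this should return the condition, the true part and the false part as three strings, each of which can then be fed back into the same method to go one level deeper.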

Matching and replacing function expressions

I need to do some very light parsing of C# (actually transpiled Razor code) to replace a list of function calls with textual replacements.
If given a set containing {"Foo.myFunc" : "\"def\"" } it should replace this code:
var res = "abc" + Foo.myFunc(foo, Bar.otherFunc( Baz.funk()));
with this:
var res = "abc" + "def";
I don't care about the nested expressions.
This seems fairly trivial and I think I should be able to avoid building an entire C# parser using something like this for every member of the mapping set:
find expression start (e.g. Foo.myFunc)
Push()/Pop() parentheses on a Stack until Count == 0.
Mark this as expression stop
replace everything from expression start until expression stop
But maybe I don't need to ... Is there a (possibly built-in) .NET library that can do this for me? Counting is not possible in the family of languages that RE is in, but maybe the extended regex syntax in C# can handle this somehow using back references?
edit:
As the comments to this answer demonstrate, simply counting brackets will not be sufficient in general, as something like trollMe("(") will throw off those algorithms. Only true parsing would then suffice, I guess (?).
The trick for a normal string will be:
(?>"(\\"|[^"])*")
A verbatim string:
(?>@"(""|[^"])*")
Maybe this can help, but I'm not sure that this will work in all cases:
<func>(?=\()((?>/\*.*?\*/)|(?>@"(""|[^"])*")|(?>"(\\"|[^"])*")|\r?\n|[^()"]|(?<open>\()|(?<-open>\)))+?(?(open)(?!))
Replace <func> with your function name.
Needless to say, trollMe("\"(", "((", @"abc""de((f") works as expected.
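As a hand-written alternative to the pattern above, the find/count/replace steps from the question can be sketched in plain C#, with one extra rule that skips over regular string literals so that trollMe("(") does not upset the count. The names CallReplacer, ReplaceCalls and FindClosingParen are invented for this sketch, and it deliberately ignores verbatim strings, char literals and comments, so it is not a substitute for real parsing.

```csharp
using System;

public static class CallReplacer
{
    // Replaces every call to `function` (the name plus its balanced
    // argument list) with `replacement`.
    public static string ReplaceCalls(string source, string function, string replacement)
    {
        int start = source.IndexOf(function + "(", StringComparison.Ordinal);
        while (start >= 0)
        {
            int end = FindClosingParen(source, start + function.Length);
            if (end < 0) break; // unbalanced input: leave the rest untouched

            source = source.Substring(0, start) + replacement + source.Substring(end + 1);
            start = source.IndexOf(function + "(", start + replacement.Length,
                                   StringComparison.Ordinal);
        }
        return source;
    }

    // `open` is the index of the opening parenthesis.
    // Returns the index of the matching closing parenthesis, or -1.
    static int FindClosingParen(string s, int open)
    {
        int depth = 0;
        for (int i = open; i < s.Length; i++)
        {
            char c = s[i];
            if (c == '"')
            {
                // Skip a regular string literal, honouring backslash escapes.
                for (i++; i < s.Length && s[i] != '"'; i++)
                    if (s[i] == '\\') i++;
            }
            else if (c == '(') depth++;
            else if (c == ')' && --depth == 0) return i;
        }
        return -1;
    }
}
```

With the mapping from the question, ReplaceCalls(code, "Foo.myFunc", "\"def\"") should turn the sample line into var res = "abc" + "def"; regardless of what is nested inside the call.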

DynamicLINQ - Escaping double quotes inside strings

I'm trying to build a dynamic filtering system using the DynamicLINQ library. Everything works smoothly for something like "find people whose first name is Bob":
Context.Users.Where("FirstName == \"Bob\"");
But I run into problems when I want to find people whose first name is "Bob" (where Bob is stored with the double quotes in the data source).
I tried a few different things, including escaping an escaped double quote and a few other variants:
Context.Users.Where("FirstName == \"\\\"Bob\\\"\"");
// or as a literal for readability
Context.Users.Where(@"FirstName == ""\""Bob\""""");
// From comments below
Context.Users.Where("FirstName == \"\"Bob\"\"");
None of these work. Any help would be greatly appreciated.
Thanks.
EDIT - I'm just dealing with the resulting string right now. The actual string is generated from a model.
If you want to use a string containing special characters in the clause, the better way, I think, is to use the parameterized form, like this:
Context.Users.Where("FirstName == @0", "\"Bob\"");
My thought is you can not use .Where() to do dynamic linq evaluations as you have written. The reason is because Where() does not understand what FirstName is, and was never intended to do dynamic Linq expressions. You would use where like the following
.Where( x => x.FirstName == "\"Bob\""); and that will work for sure.
A good head start is to use an existing Library found on ScottGu's Blog as follows:
http://weblogs.asp.net/scottgu/archive/2008/01/07/dynamic-linq-part-1-using-the-linq-dynamic-query-library.aspx
He has a download with code that will do everything you are describing. It will take a little time to digest the library but I have used it in a project and it works great. You need to know a little bit about Lambdas and you will go far.
Hope this helps :) Good Question, I have been there and done that. It was tricky finding this solution.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
EDIT
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Look at Dynamic.cs:
Line 2123 the following code exists in the method ParseToken().
case '"':
case '\'':
    char quote = ch;
    do
    {
        NextChar();
        while (textPos < textLen && ch != quote) NextChar();
        if (textPos == textLen)
            throw ParseError(textPos, Res.UnterminatedStringLiteral);
        NextChar();
    } while (ch == quote);
    t = TokenId.StringLiteral;
    break;
What this parser appears to be doing is: when it reads the second " in [""Bob""] it returns a null string Literal, thinking it has found the end of the string literal, then it would parse an identifier [Bob] and then another null string literal. Somehow you will have to modify the parser to look for "" as a token.
Maybe in ParseComparison() on line 766 you can devise a way to look for null String Literal followed by an identifier followed by another null String Literal. ???
The easy solution is to replace " with nothing, since rewriting the parser looks like a major effort.
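To see what the quoted loop does with different inputs, it can be mimicked in isolation. The sketch below is not the library's code: ReadLiteral is a made-up helper that follows the same do/while shape as the fragment above and returns the raw text of the string-literal token. Reading the loop this way, a doubled quote inside an already open literal appears to continue the literal rather than end it, so tripled quotes around Bob would come out as a single token. Whether the library then turns the doubled quotes back into single ones is not visible in the quoted fragment and is an assumption to verify.

```csharp
using System;

public static class StringLiteralScan
{
    // Returns the raw text of the string-literal token that starts at `start`,
    // following the same loop shape as the quoted ParseToken() fragment.
    public static string ReadLiteral(string text, int start)
    {
        char quote = text[start];
        int pos = start;
        do
        {
            pos++; // step past the opening (or doubled) quote
            while (pos < text.Length && text[pos] != quote) pos++;
            if (pos == text.Length)
                throw new FormatException("Unterminated string literal");
            pos++; // step past the closing quote
        } while (pos < text.Length && text[pos] == quote); // a quote right after: keep going

        return text.Substring(start, pos - start);
    }
}
```

For the text ""Bob"" the helper returns just the two leading quotes (an empty literal), which matches the behaviour described above, while for """Bob""" it returns the whole text as one token.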

Sprache: left recursion in grammar

I am developing a parser for a language similar to SQL and I have the problem of creating some of the rules of language, such as: expression IS NULL and expression IN (expression1, expression2, ...) with priority between logical and mathematical operators.
I uploaded a GitHub test project https://github.com/anpv/SpracheTest/ but this variant is not good.
I tried to use the following rules:
private static readonly Parser<AstNode> InOperator =
    from expr in Parse.Ref(() => Expression)
    from inKeyword in Parse.IgnoreCase("in").Token()
    from values in Parse
        .Ref(() => Expression)
        .DelimitedBy(Comma)
        .Contained(OpenParenthesis, CloseParenthesis)
    select new InOperator(expr, values);

private static readonly Parser<AstNode> IsNullOperator =
    from expr in Parse.Ref(() => Expression)
    from isNullKeyword in Parse
        .IgnoreCase("is")
        .Then(_ => Parse.WhiteSpace.AtLeastOnce())
        .Then(_ => Parse.IgnoreCase("null"))
    select new IsNullOperator(expr);

private static readonly Parser<AstNode> Equality =
    Parse
        .ChainOperator(Eq, IsNullOperator.Or(InOperator).Or(Additive), MakeBinary);
which throws ParseException in code like ScriptParser.ParseExpression("1 is null") or ScriptParser.ParseExpression("1 in (1, 2, 3)"): "Parsing failure: Left recursion in the grammar.".
How can I look-ahead for Expression, or do other variants exist to solve this problem?
The answer is, unfortunately, that Sprache cannot parse a left-recursive grammar. While researching this question (which is also how I found yours), I stumbled on comments in the source code saying that buggy support for left-recursive grammars had been removed - see the source code.
To deal with this problem you need to reorganize how you do your parsing. If you are writing a simple expression parser, for example, this is a common problem you have to deal with. There is plenty of discussion on the web about how to remove left recursion from a grammar, in particular for expressions.
In your case, I expect you'll need to do something like:
term := everything simple in an expression (like "1", "2", "3", etc.)
expression := term [ IN ( expression*) | IS NULL | "+" expression | "-" expression | etc.]
or similar - basically - you have to unwind the recursion yourself. By doing that I was able to fix my issues with expressions. I suspect any basic compiler book probably has a section on how to "normalize" a grammar.
It makes building whatever object you are returning from the parser a bit more of a pain, but in the select statement instead of doing "select new Expression(arg1, arg2)" I changed it to be a function call, and the function decides on the specific object being returned depending on what the arguments were.
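The "term first, then optional suffixes" shape can be illustrated without Sprache at all. The following is a plain hand-written C# sketch, not Sprache code; SuffixParser and its string output format are invented for illustration, and terms are limited to simple words and numbers. Because ParseExpression always consumes a term before it looks for IS NULL or IN, it never calls itself at the same input position, which is what the left-recursive version did.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SuffixParser
{
    // Grammar: expression := term ( IS NULL | IN "(" expression ("," expression)* ")" )*
    public static string Parse(string input)
    {
        var tokens = new Queue<string>();
        foreach (Match m in Regex.Matches(input, @"\w+|[(),]"))
            tokens.Enqueue(m.Value);

        string result = ParseExpression(tokens);
        if (tokens.Count > 0)
            throw new FormatException("Unexpected token: " + tokens.Peek());
        return result;
    }

    static string ParseExpression(Queue<string> tokens)
    {
        string left = ParseTerm(tokens); // a term comes first, so no left recursion
        while (tokens.Count > 0)
        {
            string next = tokens.Peek().ToLowerInvariant();
            if (next == "is")
            {
                tokens.Dequeue();
                Expect(tokens, "null");
                left = "IsNull(" + left + ")";
            }
            else if (next == "in")
            {
                tokens.Dequeue();
                Expect(tokens, "(");
                var values = new List<string> { ParseExpression(tokens) };
                while (tokens.Count > 0 && tokens.Peek() == ",")
                {
                    tokens.Dequeue();
                    values.Add(ParseExpression(tokens));
                }
                Expect(tokens, ")");
                left = "In(" + left + ", [" + string.Join(", ", values) + "])";
            }
            else break; // not a suffix: let the caller deal with it
        }
        return left;
    }

    static string ParseTerm(Queue<string> tokens)
    {
        if (tokens.Count == 0 || !char.IsLetterOrDigit(tokens.Peek()[0]))
            throw new FormatException("Expected a term");
        return tokens.Dequeue();
    }

    static void Expect(Queue<string> tokens, string expected)
    {
        if (tokens.Count == 0 ||
            !string.Equals(tokens.Dequeue(), expected, StringComparison.OrdinalIgnoreCase))
            throw new FormatException("Expected " + expected);
    }
}
```

The two inputs from the question, "1 is null" and "1 in (1, 2, 3)", both parse with this shape. In Sprache the same structure would presumably be expressed as a term parser followed by an optional or repeated suffix parser, with a helper function choosing which node to build.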

C# Multi Replace Idea

I know the wish for a "multi replace" in C# is no new concept, and there are tons of solutions out there. However, I haven't come across any elegant solution that works well with Linq to Sql. My proposal is to create an extension method (called, you guessed it, MultiReplace) that returns a lambda expression which chains multiple calls of Replace together. So in effect you would have the following:
x => x.SomeMember.MultiReplace("ABC", "-")
// Which would return a compiled expression equivalent to:
x => x.SomeMember.Replace("A", "-").Replace("B", "-").Replace("C", "-")
I'd like your thoughts/input/suggestions on this. Even if it turns out to be a bad idea, it still seems like a wicked opportunity to dive into expression trees. Your feedback is most appreciated.
-Eric
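As a rough idea of what the expression-tree side of this could look like, the sketch below builds the chained Replace calls by hand. MultiReplaceSketch and Build are hypothetical names, the lambda takes the string itself rather than an entity with SomeMember, and nothing here says whether a given LINQ provider would translate the resulting tree to SQL.

```csharp
using System;
using System.Linq.Expressions;
using System.Reflection;

public static class MultiReplaceSketch
{
    // Builds x => x.Replace("A", r).Replace("B", r)... with one call per
    // character of `chars`, where r is `replacement`.
    public static Expression<Func<string, string>> Build(string chars, string replacement)
    {
        MethodInfo replace = typeof(string).GetMethod(
            "Replace", new[] { typeof(string), typeof(string) });

        ParameterExpression x = Expression.Parameter(typeof(string), "x");
        Expression body = x;
        foreach (char c in chars)
        {
            // Wrap the expression built so far in one more Replace call.
            body = Expression.Call(
                body, replace,
                Expression.Constant(c.ToString()),
                Expression.Constant(replacement));
        }
        return Expression.Lambda<Func<string, string>>(body, x);
    }
}
```

For in-memory use the tree can be compiled with Build("ABC", "-").Compile() and invoked on a string; each of A, B and C is replaced independently, not the sequence "ABC" as a whole.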
I'm not completely sure why you would want to do what you're describing. However if your motivation is to make more readable linq statements, by condensing some filtering logic, I suggest to look into the Specification Pattern.
If you only want to transform the result however, I would suggest to just do it in code, as there would only be a marginal benefit transforming on the server.
Some more examples on the Specification Pattern and Linq-to-SQL
I'm not sure what MultiReplace should be doing, or why you want to mix it with Linq to Sql. (Anything truly working with Linq to Sql would be translatable into SQL, which would be quite a lot of work, I think.)
The best solution I can think of is Regular Expressions. Why not use them? Linq to Sql may even translate them for you already, since MS SQL supports regular expressions.
x => Regex.Replace(x, "A|B|C", "-")
To me, this seems like a REALLY bad idea, it could be just because of your example, but the syntax you have listed to me is very confusing.
Reading this I would expect that
x => x.SomeMember.MultiReplace("ABC", "-")
If using the following text
ABC This Is A Test ABC Application
You would get something like
--- This Is A Test --- Application
But you are actually saying that it would be
--- This Is - Test --- -pplication
Which I see as problematic....
I'd also mention here that I don't really see an urgent need for something like this. I guess if there was a need to do multiple replacements I'd do one of the following.
Chain them myself "myInput".Replace("m", "-").Replace("t", "-")
Create a function/method that accepts an ARRAY or List of strings for the matches, then a replacement character. Keeping it easy to understand.
Maybe there's no method that will do this, but you could simply nest the calls. Or make a static method that takes in a string and an array containing all its replacements. Another problem is that you need to actually specify a pair (the original substring, and its replacement).
I've never needed/wanted a feature like this.
The best way to do this would be:
// Must be declared inside a static class to work as an extension method.
public static string MultiReplace(this string input, string csv, string replacement)
{
    string final = input;
    foreach (string s in csv.Split(','))
    {
        // Replace each comma-separated value in turn.
        final = final.Replace(s, replacement);
    }
    return final;
}
Then you could call it with no ambiguity:
x => x.SomeMember.MultiReplace("A,B,C", "-")
If you expect the CSV to hold more than 3-4 values, you might want to do the replacements on a StringBuilder rather than creating a new string on every pass.
