How to parse a boolean expression and load it into a class? - c#

I've got the following BoolExpr class:
class BoolExpr
{
public enum BOP { LEAF, AND, OR, NOT };
//
// inner state
//
private BOP _op;
private BoolExpr _left;
private BoolExpr _right;
private String _lit;
//
// private constructor
//
private BoolExpr(BOP op, BoolExpr left, BoolExpr right)
{
_op = op;
_left = left;
_right = right;
_lit = null;
}
private BoolExpr(String literal)
{
_op = BOP.LEAF;
_left = null;
_right = null;
_lit = literal;
}
//
// accessor
//
public BOP Op
{
get { return _op; }
set { _op = value; }
}
public BoolExpr Left
{
get { return _left; }
set { _left = value; }
}
public BoolExpr Right
{
get { return _right; }
set { _right = value; }
}
public String Lit
{
get { return _lit; }
set { _lit = value; }
}
//
// public factory
//
public static BoolExpr CreateAnd(BoolExpr left, BoolExpr right)
{
return new BoolExpr(BOP.AND, left, right);
}
public static BoolExpr CreateNot(BoolExpr child)
{
return new BoolExpr(BOP.NOT, child, null);
}
public static BoolExpr CreateOr(BoolExpr left, BoolExpr right)
{
return new BoolExpr(BOP.OR, left, right);
}
public static BoolExpr CreateBoolVar(String str)
{
return new BoolExpr(str);
}
public BoolExpr(BoolExpr other)
{
// No share any object on purpose
_op = other._op;
_left = other._left == null ? null : new BoolExpr(other._left);
_right = other._right == null ? null : new BoolExpr(other._right);
_lit = new StringBuilder(other._lit).ToString();
}
//
// state checker
//
Boolean IsLeaf()
{
return (_op == BOP.LEAF);
}
Boolean IsAtomic()
{
return (IsLeaf() || (_op == BOP.NOT && _left.IsLeaf()));
}
}
What algorithm should I use to parse an input boolean expression string like "¬((A ∧ B) ∨ C ∨ D)" and load it into the above class?

TL;DR: If you want to see the code, jump to the second portion of the answer.
I would build a tree from the expression to parse and then traverse it depth first. You can refer to the Wikipedia article about Binary Expression Trees to get a feel for what I'm suggesting.
Start by adding the omitted optional parentheses to make the next step easier
When you read anything that is not an operator or a parenthesis, create a LEAF type node
When you read any operator (in your case not, and, or), create the corresponding operator node
Binary operators get the previous and following nodes as children, unary operators only get the next one.
So, for your example ¬((A ∧ B) ∨ C ∨ D), the algorithm would go like this:
¬((A ∧ B) ∨ C ∨ D) becomes ¬(((A ∧ B) ∨ C) ∨ D)
Create a NOT node, it'll get the result of the following opening paren as a child.
Create A LEAF node, AND node and B LEAF node. AND has A and B as children.
Create OR node, it has the previously created AND as a child and a new LEAF node for C.
Create OR node, it has the previously created OR and a new node for D as children.
At that point, your tree looks like this:
NOT
|
OR
/\
OR D
/ \
AND C
/\
A B
You can then add a Node.Evaluate() method that evaluates recursively based on its type (polymorphism could be used here). For example, it could look something like this:
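// Common base type so that every node can be evaluated through the same call
abstract class Node {
public abstract bool Evaluate();
}
class LeafEx : Node {
public string Lit;
public override bool Evaluate() {
return Boolean.Parse(Lit);
}
}
class NotEx : Node {
public Node Left;
public override bool Evaluate() {
return !Left.Evaluate();
}
}
class OrEx : Node {
public Node Left, Right;
public override bool Evaluate() {
return Left.Evaluate() || Right.Evaluate();
}
}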
And so on and so forth. To get the result of your expression, you then only need to call
bool result = Root.Evaluate();
Alright, since it's not an assignment and it's actually a fun thing to implement, I went ahead. Some of the code I'll post here is not related to what I described earlier (and some parts are missing) but I'll leave the top part in my answer for reference (nothing in there is wrong (hopefully!)).
Keep in mind this is far from optimal and that I made an effort to not modify your provided BoolExpr class. Modifying it could allow you to reduce the amount of code. There's also no error checking at all.
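Here's the main method, along with the static dictionary it uses to store the value entered for each literal:
static Dictionary<string, bool> booleanValues = new Dictionary<string, bool>();
static void Main(string[] args)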
{
//We'll use ! for not, & for and, | for or and remove whitespace
string expr = @"!((A&B)|C|D)";
List<Token> tokens = new List<Token>();
StringReader reader = new StringReader(expr);
//Tokenize the expression
Token t = null;
do
{
t = new Token(reader);
tokens.Add(t);
} while (t.type != Token.TokenType.EXPR_END);
//Use a minimal version of the Shunting Yard algorithm to transform the token list to polish notation
List<Token> polishNotation = TransformToPolishNotation(tokens);
var enumerator = polishNotation.GetEnumerator();
enumerator.MoveNext();
BoolExpr root = Make(ref enumerator);
//Request boolean values for all literal operands
foreach (Token tok in polishNotation.Where(token => token.type == Token.TokenType.LITERAL))
{
Console.Write("Enter boolean value for {0}: ", tok.value);
string line = Console.ReadLine();
booleanValues[tok.value] = Boolean.Parse(line);
Console.WriteLine();
}
//Eval the expression tree
Console.WriteLine("Eval: {0}", Eval(root));
Console.ReadLine();
}
The tokenization phase creates a Token object for all tokens of the expression. It helps keep the parsing separated from the actual algorithm. Here's the Token class that performs this:
class Token
{
static Dictionary<char, KeyValuePair<TokenType, string>> dict = new Dictionary<char, KeyValuePair<TokenType, string>>()
{
{
'(', new KeyValuePair<TokenType, string>(TokenType.OPEN_PAREN, "(")
},
{
')', new KeyValuePair<TokenType, string>(TokenType.CLOSE_PAREN, ")")
},
{
'!', new KeyValuePair<TokenType, string>(TokenType.UNARY_OP, "NOT")
},
{
'&', new KeyValuePair<TokenType, string>(TokenType.BINARY_OP, "AND")
},
{
'|', new KeyValuePair<TokenType, string>(TokenType.BINARY_OP, "OR")
}
};
public enum TokenType
{
OPEN_PAREN,
CLOSE_PAREN,
UNARY_OP,
BINARY_OP,
LITERAL,
EXPR_END
}
public TokenType type;
public string value;
public Token(StringReader s)
{
int c = s.Read();
if (c == -1)
{
type = TokenType.EXPR_END;
value = "";
return;
}
char ch = (char)c;
if (dict.ContainsKey(ch))
{
type = dict[ch].Key;
value = dict[ch].Value;
}
else
{
string str = "";
str += ch;
while (s.Peek() != -1 && !dict.ContainsKey((char)s.Peek()))
{
str += (char)s.Read();
}
type = TokenType.LITERAL;
value = str;
}
}
}
At that point, in the main method, you can see I transform the list of tokens into Polish Notation order. It makes the creation of the tree much easier, and I use a modified implementation of the Shunting Yard Algorithm for this:
static List<Token> TransformToPolishNotation(List<Token> infixTokenList)
{
Queue<Token> outputQueue = new Queue<Token>();
Stack<Token> stack = new Stack<Token>();
int index = 0;
while (infixTokenList.Count > index)
{
Token t = infixTokenList[index];
switch (t.type)
{
case Token.TokenType.LITERAL:
outputQueue.Enqueue(t);
break;
case Token.TokenType.BINARY_OP:
case Token.TokenType.UNARY_OP:
case Token.TokenType.OPEN_PAREN:
stack.Push(t);
break;
case Token.TokenType.CLOSE_PAREN:
while (stack.Peek().type != Token.TokenType.OPEN_PAREN)
{
outputQueue.Enqueue(stack.Pop());
}
stack.Pop();
if (stack.Count > 0 && stack.Peek().type == Token.TokenType.UNARY_OP)
{
outputQueue.Enqueue(stack.Pop());
}
break;
default:
break;
}
++index;
}
while (stack.Count > 0)
{
outputQueue.Enqueue(stack.Pop());
}
return outputQueue.Reverse().ToList();
}
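After this transformation, our token list becomes NOT, OR, OR, D, C, AND, B, A. The operands come out mirrored because the postfix output is reversed as a whole; since AND and OR are commutative, this does not change the result.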
At this point, we're ready to create the expression tree. The properties of Polish Notation allow us to just walk the Token List and recursively create the tree nodes (we'll use your BoolExpr class) as we go:
static BoolExpr Make(ref List<Token>.Enumerator polishNotationTokensEnumerator)
{
if (polishNotationTokensEnumerator.Current.type == Token.TokenType.LITERAL)
{
BoolExpr lit = BoolExpr.CreateBoolVar(polishNotationTokensEnumerator.Current.value);
polishNotationTokensEnumerator.MoveNext();
return lit;
}
else
{
if (polishNotationTokensEnumerator.Current.value == "NOT")
{
polishNotationTokensEnumerator.MoveNext();
BoolExpr operand = Make(ref polishNotationTokensEnumerator);
return BoolExpr.CreateNot(operand);
}
else if (polishNotationTokensEnumerator.Current.value == "AND")
{
polishNotationTokensEnumerator.MoveNext();
BoolExpr left = Make(ref polishNotationTokensEnumerator);
BoolExpr right = Make(ref polishNotationTokensEnumerator);
return BoolExpr.CreateAnd(left, right);
}
else if (polishNotationTokensEnumerator.Current.value == "OR")
{
polishNotationTokensEnumerator.MoveNext();
BoolExpr left = Make(ref polishNotationTokensEnumerator);
BoolExpr right = Make(ref polishNotationTokensEnumerator);
return BoolExpr.CreateOr(left, right);
}
}
return null;
}
Now we're golden! We have the expression tree that represents the expression so we'll ask the user for the actual boolean values of each literal operand and evaluate the root node (which will recursively evaluate the rest of the tree as needed).
My Eval function follows. Keep in mind that I'd use some polymorphism to make this cleaner if I modified your BoolExpr class.
static bool Eval(BoolExpr expr)
{
// IsLeaf() is private in BoolExpr, so test the operator directly
if (expr.Op == BoolExpr.BOP.LEAF)
{
return booleanValues[expr.Lit];
}
if (expr.Op == BoolExpr.BOP.NOT)
{
return !Eval(expr.Left);
}
if (expr.Op == BoolExpr.BOP.OR)
{
return Eval(expr.Left) || Eval(expr.Right);
}
if (expr.Op == BoolExpr.BOP.AND)
{
return Eval(expr.Left) && Eval(expr.Right);
}
throw new ArgumentException();
}
As expected, feeding our test expression ¬((A ∧ B) ∨ C ∨ D) with values false, true, false, true for A, B, C, D respectively yields the result false.

From an algorithmic point of view, parsing an expression requires one stack.
The algorithm has two steps:
Lexing
The aim of lexing is to split the input into 'keywords', 'identifiers' and 'separators':
- A keyword is 'if', 'then', 'else', '(', ')', '/\', '\/', etc.
- An identifier in your case is 'A', 'B', 'C', etc.
- A separator is a blank space, tabulation, end of line, end of file, etc.
Lexing is done with an automaton that reads the input string character by character. When you encounter a character that can start a keyword or an identifier, you start a sequence of characters. When you encounter a separator, you stop the sequence and look it up in a dictionary to see whether it is a keyword (if not, it is an identifier); then you push the tuple [sequence, keyword or identifier/class] onto the stack.
I leave as an exercise the case of short keywords such as '(', which can also be seen as separators.
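The lexing step described above could be sketched in C# roughly as follows. This is only an illustration under a few assumptions: the type names LexKind, Lexeme and SimpleLexer are made up for this example, the keyword set is limited to the parentheses, /\, \/ and !, and the lexemes are collected in a list rather than pushed onto a stack.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical names: LexKind, Lexeme and SimpleLexer are illustrative only.
public enum LexKind { Keyword, Identifier }

public struct Lexeme
{
    public LexKind Kind;
    public string Text;
    public Lexeme(LexKind kind, string text) { Kind = kind; Text = text; }
}

public static class SimpleLexer
{
    // Assumed keyword set: parentheses plus the ASCII operators /\, \/ and !.
    static readonly string[] Keywords = { "(", ")", "/\\", "\\/", "!" };

    public static List<Lexeme> Lex(string input)
    {
        var result = new List<Lexeme>();
        int i = 0;
        while (i < input.Length)
        {
            // Separators (blank, tab, end of line) only end a sequence.
            if (char.IsWhiteSpace(input[i])) { i++; continue; }

            // Short keywords also act as separators, so they are tested first.
            string keyword = Array.Find(Keywords,
                k => string.CompareOrdinal(input, i, k, 0, k.Length) == 0);
            if (keyword != null)
            {
                result.Add(new Lexeme(LexKind.Keyword, keyword));
                i += keyword.Length;
                continue;
            }

            // Anything else starts an identifier made of letters and digits.
            int start = i;
            while (i < input.Length && char.IsLetterOrDigit(input[i])) i++;
            if (i == start)
                throw new FormatException("Unexpected character '" + input[i] + "' at " + i);
            result.Add(new Lexeme(LexKind.Identifier, input.Substring(start, i - start)));
        }
        return result;
    }
}
```

Testing the keywords before the identifier rule is one way to handle the exercise mentioned above: a '(' directly after an identifier ends that identifier without needing a blank in between.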
Parsing
Parsing means checking the token sequence against a grammar. In your case the only rules to check are parentheses, binary operations, and a simple identifier.
Formally:
expression::
'(' expression ')'
expression /\ expression
expression \/ expression
identifier
This can be written as a recursive function.
First reverse your stack, then:
myParseExpression(stack, myC#ResultObject)
{
if(stack.top = keyword.'(' )
then myParseOpenParen(all stack but top, myC#ResultObject)
if(stack.top = keyword.'/\')
then myParseBinaryAnd(stack, myC#ResultObject)
}
myParseOpenParen(stack, myC#ResultObject)
{
...
}
myParseBinaryAnd(stack, myC#ResultObject)
{
myNewRightPartOfExpr = new C#ResultObject
myParseExpression(stack.top, myNewRightPartOfExpr)
remove top of stack;
myNewLeftPartOfExpr = new C#ResultObject
myParseExpression(stack.top, myNewLeftPartOfExpr)
C#ResultObject.add("AND", myNewRightPartOfExpr, myNewLeftPartOfExpr)
}
...
There are multiple functions that call each other recursively.
As an exercise, try to add negation.
Lexing is traditionally done by a lexer (generated, for example, by the lex tool).
Parsing is traditionally done by a parser (generated, for example, by the bison tool).
These tools let you write such functions in a form closer to the formal grammar above.
These aspects are fundamental to program compilation.
Coding these things will improve your skills a lot because it is hard and fundamental.
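For reference, here is one possible shape of such mutually recursive functions in C#, with negation included. It is a sketch under stated assumptions, not the code of this answer: it uses the ASCII operators !, & and | from the other answer, gives NOT a higher precedence than AND and AND a higher precedence than OR, reads characters directly instead of a token stack, and builds a hypothetical Node class that stands in for BoolExpr.

```csharp
using System;

// Hypothetical minimal tree node; it stands in for the BoolExpr class of the question.
public class Node
{
    public string Op;      // "AND", "OR", "NOT" or "LEAF"
    public string Name;    // only set for leaves
    public Node Left, Right;

    public override string ToString()
    {
        if (Op == "LEAF") return Name;
        if (Op == "NOT") return "NOT(" + Left + ")";
        return Op + "(" + Left + ", " + Right + ")";
    }
}

public class RecursiveDescentParser
{
    readonly string _text;
    int _pos;

    RecursiveDescentParser(string text) { _text = text; }

    public static Node Parse(string text)
    {
        var p = new RecursiveDescentParser(text);
        Node result = p.ParseOr();
        if (p.Peek() != '\0')
            throw new FormatException("Unexpected '" + p.Peek() + "' at " + p._pos);
        return result;
    }

    // Skips blanks and returns the next character without consuming it ('\0' at the end).
    char Peek()
    {
        while (_pos < _text.Length && char.IsWhiteSpace(_text[_pos])) _pos++;
        return _pos < _text.Length ? _text[_pos] : '\0';
    }

    // or := and ('|' and)*
    Node ParseOr()
    {
        Node left = ParseAnd();
        while (Peek() == '|')
        {
            _pos++;
            left = new Node { Op = "OR", Left = left, Right = ParseAnd() };
        }
        return left;
    }

    // and := unary ('&' unary)*
    Node ParseAnd()
    {
        Node left = ParseUnary();
        while (Peek() == '&')
        {
            _pos++;
            left = new Node { Op = "AND", Left = left, Right = ParseUnary() };
        }
        return left;
    }

    // unary := '!' unary | '(' or ')' | identifier
    Node ParseUnary()
    {
        char c = Peek();
        if (c == '!') { _pos++; return new Node { Op = "NOT", Left = ParseUnary() }; }
        if (c == '(')
        {
            _pos++;
            Node inner = ParseOr();
            if (Peek() != ')') throw new FormatException("Expected ')' at " + _pos);
            _pos++;
            return inner;
        }
        int start = _pos;
        while (_pos < _text.Length && char.IsLetterOrDigit(_text[_pos])) _pos++;
        if (_pos == start) throw new FormatException("Expected identifier at " + _pos);
        return new Node { Op = "LEAF", Name = _text.Substring(start, _pos - start) };
    }
}
```

Calling RecursiveDescentParser.Parse("!((A & B) | C | D)").ToString() gives NOT(OR(OR(AND(A, B), C), D)). Splitting the grammar into one function per precedence level removes the left recursion of the formal rules above, which a naive recursive function could not handle.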

Related

Why does parser generated by ANTLR reuse context objects?

I'm trying to create an interpreter for a simple programming language using ANTLR.
I would like to add the feature of recursion.
So far I have implemented defining and calling functions, with the option of using several return statements, and also local variables. To support local variables I extended the parser's partial class FunctionCallContext with a dictionary for them. I can successfully use them once. Also, when I call the same function again from itself (recursively), the parser creates a new context object for the new function call, as I would expect.
However, if I create a "deeper" recursion, the third context of the function call will be the very same as the second (having the same hash code and the same local variables).
My (updated) grammar:
grammar BatshG;
/*
* Parser Rules
*/
compileUnit: ( (statement) | functionDef)+;
statement: print ';'
| println ';'
| assignment ';'
| loopWhile
| branch
| returnStatement ';'
| functionCall ';'
;
branch:
'if' '(' condition=booleanexpression ')'
trueBranch=block
('else' falseBranch=block)?;
loopWhile:
'while' '(' condition=booleanexpression ')'
whileBody=block
;
block:
statement
| '{' statement* '}';
numericexpression:
MINUS onepart=numericexpression #UnaryMinus
| left=numericexpression op=('*'|'/') right=numericexpression #MultOrDiv
| left=numericexpression op=('+'|'-') right=numericexpression #PlusOrMinus
| number=NUMERIC #Number
| variableD #NumVariable
;
stringexpression:
left=stringexpression PLUSPLUS right=stringexpression #Concat
| string=STRING #String
| variableD #StrVariable
| numericexpression #NumberToString
;
booleanexpression:
left=numericexpression relationalOperator=('<' | '>' | '>=' | '<=' | '==' | '!=' ) right=numericexpression #RelationalOperation
| booleanliteral #Boolean
| numericexpression #NumberToBoolean
;
booleanliteral: trueConst | falseConst ;
trueConst : 'true' ;
falseConst : 'false' ;
assignment : varName=IDENTIFIER EQUAL right=expression;
expression: numericexpression | stringexpression | functionCall | booleanexpression;
println: 'println' '(' argument=expression ')';
print: 'print' '(' argument=expression ')';
functionDef: 'function' funcName= IDENTIFIER
'('
(functionParameters=parameterList)?
')'
'{'
statements=statementPart?
'}'
;
statementPart: statement* ;
returnStatement: ('return' returnValue=expression );
parameterList : paramName=IDENTIFIER (',' paramName=IDENTIFIER)*;
functionCall: funcName=IDENTIFIER '('
(functionArguments=argumentList)?
')';
argumentList: expression (',' expression)*;
variableD: varName=IDENTIFIER;
///*
// * Lexer Rules
// */
NUMERIC: (FLOAT | INTEGER);
PLUSPLUS: '++';
MINUS: '-';
IDENTIFIER: [a-zA-Z_][a-zA-Z0-9_]* ;
EQUAL : '=' ;
STRING : '"' (~["\r\n] | '""')* '"' ;
INTEGER: [0-9] [0-9]*;
DIGIT : [0-9] ;
FRAC : '.' DIGIT+ ;
EXP : [eE] [-+]? DIGIT+ ;
FLOAT : DIGIT* FRAC EXP? ;
WS: [ \n\t\r]+ -> channel(HIDDEN);
My hand-written partial class of the parser (not the generated part):
public partial class BatshGParser
{
//"extensions" for contexts:
public partial class FunctionCallContext
{
private Dictionary<string, object> localVariables = new Dictionary<string, object>();
private bool isFunctionReturning;
public FunctionCallContext()
{
localVariables = new Dictionary<string, object>();
isFunctionReturning = false;
}
public Dictionary<string, object> LocalVariables { get => localVariables; set => localVariables = value; }
public bool IsFunctionReturning { get => isFunctionReturning; set => isFunctionReturning = value; }
}
public partial class FunctionDefContext
{
private List<string> parameterNames;
public FunctionDefContext()
{
parameterNames = new List<string>();
}
public List<string> ParameterNames { get => parameterNames; set => parameterNames = value; }
}
}
And relevant parts (and maybe a little more) of my visitor:
public class BatshGVisitor : BatshGBaseVisitor<ResultValue>
{
public ResultValue Result { get; set; }
public StringBuilder OutputForPrint { get; set; }
private Dictionary<string, object> globalVariables = new Dictionary<string, object>();
//string = function name
//object = parameter list
//object = return value
private Dictionary<string, Func<List<object>, object>> globalFunctions = new Dictionary<string, Func<List<object>, object>>();
private Stack<BatshGParser.FunctionCallContext> actualFunctions = new Stack<BatshGParser.FunctionCallContext>();
public override ResultValue VisitCompileUnit([NotNull] BatshGParser.CompileUnitContext context)
{
OutputForPrint = new StringBuilder("");
isSearchingForFunctionDefinitions = true;
var resultvalue = VisitChildren(context);
isSearchingForFunctionDefinitions = false;
resultvalue = VisitChildren(context);
Result = new ResultValue() { ExpType = "string", ExpValue = resultvalue.ExpValue ?? null };
return Result;
}
public override ResultValue VisitChildren([NotNull] IRuleNode node)
{
if (this.isSearchingForFunctionDefinitions)
{
for (int i = 0; i < node.ChildCount; i++)
{
if (node.GetChild(i) is BatshGParser.FunctionDefContext)
{
Visit(node.GetChild(i));
}
}
}
return base.VisitChildren(node);
}
protected override bool ShouldVisitNextChild([NotNull] IRuleNode node, ResultValue currentResult)
{
if (isSearchingForFunctionDefinitions)
{
if (node is BatshGParser.FunctionDefContext)
{
return true;
}
else
return false;
}
else
{
if (node is BatshGParser.FunctionDefContext)
{
return false;
}
else
return base.ShouldVisitNextChild(node, currentResult);
}
}
public override ResultValue VisitFunctionDef([NotNull] BatshGParser.FunctionDefContext context)
{
string functionName = null;
functionName = context.funcName.Text;
if (context.functionParameters != null)
{
List<string> plist = CollectParamNames(context.functionParameters);
context.ParameterNames = plist;
}
if (isSearchingForFunctionDefinitions)
globalFunctions.Add(functionName,
(
delegate(List<object> args)
{
var currentMethod = (args[0] as BatshGParser.FunctionCallContext);
this.actualFunctions.Push(currentMethod);
//args[0] is the context
for (int i = 1; i < args.Count; i++)
{
currentMethod.LocalVariables.Add(context.ParameterNames[i - 1],
(args[i] as ResultValue).ExpValue
);
}
ResultValue retval = null;
retval = this.VisitStatementPart(context.statements);
this.actualFunctions.Peek().IsFunctionReturning = false;
actualFunctions.Pop();
return retval;
}
)
);
return new ResultValue()
{
};
}
public override ResultValue VisitStatementPart([NotNull] BatshGParser.StatementPartContext context)
{
if (!this.actualFunctions.Peek().IsFunctionReturning)
{
return VisitChildren(context);
}
else
{
return null;
}
}
public override ResultValue VisitReturnStatement([NotNull] BatshGParser.ReturnStatementContext context)
{
this.actualFunctions.Peek().IsFunctionReturning = true;
ResultValue retval = null;
if (context.returnValue != null)
{
retval = Visit(context.returnValue);
}
return retval;
}
public override ResultValue VisitArgumentList([NotNull] BatshGParser.ArgumentListContext context)
{
List<ResultValue> argumentList = new List<ResultValue>();
foreach (var item in context.children)
{
var tt = item.GetText();
if (item.GetText() != ",")
{
ResultValue rv = Visit(item);
argumentList.Add(rv);
}
}
return
new ResultValue()
{
ExpType = "list",
ExpValue = argumentList ?? null
};
}
public override ResultValue VisitFunctionCall([NotNull] BatshGParser.FunctionCallContext context)
{
string functionName = context.funcName.Text;
int hashcodeOfContext = context.GetHashCode();
object functRetVal = null;
List<object> argumentList = new List<object>()
{
context
//here come the actual parameters later
};
ResultValue argObjects = null;
if (context.functionArguments != null)
{
argObjects = VisitArgumentList(context.functionArguments);
}
if (argObjects != null )
{
if (argObjects.ExpValue is List<ResultValue>)
{
var argresults = (argObjects.ExpValue as List<ResultValue>) ?? null;
foreach (var arg in argresults)
{
argumentList.Add(arg);
}
}
}
if (globalFunctions.ContainsKey(functionName))
{
{
functRetVal = globalFunctions[functionName]( argumentList );
}
}
return new ResultValue()
{
ExpType = ((ResultValue)functRetVal).ExpType,
ExpValue = ((ResultValue)functRetVal).ExpValue
};
}
public override ResultValue VisitVariableD([NotNull] BatshGParser.VariableDContext context)
{
object variable;
string variableName = context.GetChild(0).ToString();
string typename = "";
Dictionary<string, object> variables = null;
if (actualFunctions.Count > 0)
{
Dictionary<string, object> localVariables =
actualFunctions.Peek().LocalVariables;
if (localVariables.ContainsKey(variableName))
{
variables = localVariables;
}
}
else
{
variables = globalVariables;
}
if (variables.ContainsKey(variableName))
{
variable = variables[variableName];
typename = charpTypesToBatshTypes[variable.GetType()];
}
else
{
Type parentContextType = contextTypes[context.parent.GetType()];
typename = charpTypesToBatshTypes[parentContextType];
variable = new object();
if (typename.Equals("string"))
{
variable = string.Empty;
}
else
{
variable = 0d;
}
}
return new ResultValue()
{
ExpType = typename,
ExpValue = variable
};
}
public override ResultValue VisitAssignment([NotNull] BatshGParser.AssignmentContext context)
{
string varname = context.varName.Text;
ResultValue varAsResultValue = Visit(context.right);
Dictionary<string, object> localVariables = null;
if (this.actualFunctions.Count > 0)
{
localVariables =
actualFunctions.Peek().LocalVariables;
if (localVariables.ContainsKey(varname))
{
localVariables[varname] = varAsResultValue.ExpValue;
}
else
if (globalVariables.ContainsKey(varname))
{
globalVariables[varname] = varAsResultValue.ExpValue;
}
else
{
localVariables.Add(varname, varAsResultValue.ExpValue);
}
}
else
{
if (globalVariables.ContainsKey(varname))
{
globalVariables[varname] = varAsResultValue.ExpValue;
}
else
{
globalVariables.Add(varname, varAsResultValue.ExpValue);
}
}
return varAsResultValue;
}
}
What could cause the problem? Thank you!
Why does parser generated by ANTLR reuse context objects?
It doesn't. Each function call in your source code will correspond to exactly one FunctionCallContext object and those will be unique. They'd have to be, even for two entirely identical function calls, because they also contain meta data, such as where in the source the function call appears - and that's obviously going to differ between calls even if everything else is the same.
To illustrate this, consider the following source code:
function f(x) {
return f(x);
}
print(f(x));
This will create a tree containing exactly two FunctionCallContext objects - one for line 2 and one for line 4. They will both be distinct - they'll both have child nodes referring to the function name f and the argument x, but they'll have different location information and a different hash code - as will the child nodes. Nothing is being reused here.
What could cause the problem?
The fact that you're seeing the same node multiple times is simply due to the fact that you're visiting the same part of the tree multiple times. That's a perfectly normal thing to do for your use case, but in your case it causes a problem because you stored mutable data in the object, assuming that you'd get a fresh FunctionCall object for each time a function call happens at run time - rather than each time a function call appears in the source code.
That's not how parse trees work (they represent the structure of the source code, not the sequence of calls that might happen at run time), so you can't use FunctionCallContext objects to store information about a specific run-time function call. In general, I'd consider it a bad idea to put mutable state into context objects.
Instead you should put your mutable state into your visitor object. For your specific problem that means having a call stack containing the local variables of each run-time function call. Each time a function starts execution, you can push a frame onto the stack and each time a function exits, you can pop it. That way the top of the stack will always contain the local variables of the function currently being executed.
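A minimal sketch of such a call stack follows. It is independent of ANTLR so that it can be run on its own; the names CallStack, EnterFunction, ExitFunction and CallStackDemo are hypothetical, and in the visitor from the question the enter and exit calls would go where the function body starts and finishes executing.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper owned by the visitor; one frame per run-time call.
public class CallStack
{
    readonly Stack<Dictionary<string, object>> _frames =
        new Stack<Dictionary<string, object>>();

    public int Depth { get { return _frames.Count; } }

    // Call when a function starts executing: every run-time call gets a fresh frame.
    public void EnterFunction(IList<string> parameterNames, IList<object> arguments)
    {
        var frame = new Dictionary<string, object>();
        for (int i = 0; i < parameterNames.Count; i++)
            frame[parameterNames[i]] = arguments[i];
        _frames.Push(frame);
    }

    // Call when the function exits, including on an early return.
    public void ExitFunction() { _frames.Pop(); }

    // Local variables of the function currently being executed.
    public Dictionary<string, object> Locals { get { return _frames.Peek(); } }
}

public static class CallStackDemo
{
    // Simulates a recursive function f(n) that calls f(n - 1) until n reaches 0
    // and records the value of n each level sees after its inner call returned.
    public static List<double> Run(CallStack stack, double n)
    {
        var seen = new List<double>();
        Recurse(stack, n, seen);
        return seen;
    }

    static void Recurse(CallStack stack, double n, List<double> seen)
    {
        stack.EnterFunction(new[] { "n" }, new object[] { n });
        try
        {
            if (n > 0) Recurse(stack, n - 1, seen);
            // The inner frame has been popped, so the top frame is ours again.
            seen.Add((double)stack.Locals["n"]);
        }
        finally
        {
            stack.ExitFunction();
        }
    }
}
```

Because each level of recursion pushes its own dictionary, three nested calls of the same function see three different sets of local variables, even though they all execute the same FunctionCallContext node of the parse tree.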
PS: This is unrelated to your problem, but the usual rules of precedence in arithmetic expressions are such that + has the same precedence as - and * has the same precedence as /. In your grammar the precedence of / is greater than that of * and that of - is greater than that of +. This means that, for example, 9 * 5 / 3 is parsed as 9 * (5 / 3) and evaluates to 9, when it should be 15 (assuming the usual rules for integer arithmetic).
To fix this, + and -, as well as * and /, should be part of the same alternative, so they get the same precedence:
| left=numericexpression op=('*'|'/') right=numericexpression #MultOrDiv
| left=numericexpression op=('+'|'-') right=numericexpression #PlusOrMinus

How to use "using static" directive for dynamically generated code?

I want to let users enter mathematical expressions in terms of x and y as naturally as possible. For example, instead of typing Complex.Sin(x), I would prefer to type just Sin(x).
The following code fails when the user enters, for example, Sin(x).
using Microsoft.CodeAnalysis.CSharp.Scripting;
using System;
using System.Numerics;
using static System.Console;
using static System.Numerics.Complex;
namespace MathEvaluator
{
public class Globals
{
public Complex x;
public Complex y;
}
class Program
{
async static void JobAsync(Microsoft.CodeAnalysis.Scripting.Script<Complex> script)
{
Complex x = new Complex(1, 0);
Complex y = new Complex(0, 1);
try
{
var result = await script.RunAsync(new Globals { x = x, y = y });
WriteLine($"{x} * {y} = {result.ReturnValue}\n");
}
catch (Exception e)
{
WriteLine(e.Message);
}
}
static void Main(string[] args)
{
Console.Write("Define your expression in x and y: ");
string expression = Console.ReadLine(); //user input
var script = CSharpScript.Create<Complex>(expression, globalsType: typeof(Globals));
script.Compile();
JobAsync(script);
}
}
}
Question
How to use using static directive for dynamically generated code?
You can supply script options to the Create function that define the references and imports that should be set for your script:
// ScriptOptions lives in the Microsoft.CodeAnalysis.Scripting namespace
var scriptOptions = ScriptOptions.Default
.WithReferences("System.Numerics")
.WithImports("System.Numerics.Complex");
var script = CSharpScript.Create<Complex>(expression, options: scriptOptions, globalsType: typeof(Globals));
That way, you can use Sin(x) in the input:
Define your expression in x and y: Sin(x)
(1, 0) * (0, 1) = (0,841470984807897, 0)
However, when dealing with user input, you should consider writing your own parser. This allows you on one hand to define your own “aliases” for functions (e.g. a lower case sin) or even a more lenient syntax; on the other hand, it also adds more security because right now, nothing prevents me from doing this:
Define your expression in x and y: System.Console.WriteLine("I hacked this calculator!")
I hacked this calculator!
(1, 0) * (0, 1) = (0, 0)
I created a quick (and dirty) parser using Roslyn’s syntax tree parsing. Obviously this is rather limited (e.g. since it requires all return values of subexpressions to be Complex), but this could give you an idea of how this could work:
void Main()
{
string input = "y + 3 * Sin(x)";
var options = CSharpParseOptions.Default.WithKind(Microsoft.CodeAnalysis.SourceCodeKind.Script);
var expression = CSharpSyntaxTree.ParseText(input, options).GetRoot().DescendantNodes().OfType<ExpressionStatementSyntax>().FirstOrDefault()?.Expression;
Console.WriteLine(EvaluateExpression(expression));
}
Complex EvaluateExpression(ExpressionSyntax expr)
{
if (expr is BinaryExpressionSyntax)
{
var binExpr = (BinaryExpressionSyntax)expr;
var left = EvaluateExpression(binExpr.Left);
var right = EvaluateExpression(binExpr.Right);
switch (binExpr.OperatorToken.ValueText)
{
case "+":
return left + right;
case "-":
return left - right;
case "*":
return left * right;
case "/":
return left / right;
default:
throw new NotSupportedException(binExpr.OperatorToken.ValueText);
}
}
else if (expr is IdentifierNameSyntax)
{
return GetValue(((IdentifierNameSyntax)expr).Identifier.ValueText);
}
else if (expr is LiteralExpressionSyntax)
{
var value = ((LiteralExpressionSyntax)expr).Token.Value;
return float.Parse(value.ToString());
}
else if (expr is InvocationExpressionSyntax)
{
var invocExpr = (InvocationExpressionSyntax)expr;
var args = invocExpr.ArgumentList.Arguments.Select(arg => EvaluateExpression(arg.Expression)).ToArray();
return Call(((IdentifierNameSyntax)invocExpr.Expression).Identifier.ValueText, args);
}
else
throw new NotSupportedException(expr.GetType().Name);
}
Complex Call(string identifier, Complex[] args)
{
switch (identifier.ToLower())
{
case "sin":
return Complex.Sin(args[0]);
default:
throw new NotImplementedException(identifier);
}
}
Complex GetValue(string identifier)
{
switch (identifier)
{
case "x":
return new Complex(1, 0);
case "y":
return new Complex(0, 1);
default:
throw new ArgumentException("Identifier not found", nameof(identifier));
}
}

Code folding in RichTextBox

I am working on a Code Editor derived from the WinForms RichTextBox using C#. I have already implemented autocompletion and syntax highlighting, but code folding requires a somewhat different approach. What I want to achieve is:
The code below:
public static SomeFunction(EventArgs e)
{
//Some code
//Some code
//Some code
//Some code
//Some code
//Some code
}
Should become:
public static SomeFunction(EventArgs e)[...]
Where [...] stands for the folded code, which is displayed in a tooltip when you hover over [...]
Any ideas or suggestions how to do it, either using Regex or procedural code?
I have created a parser that will return the indices of code folding locations.
Folding delimiters are defined by regular expressions.
You can specify a start and ending index so that you don't have to check the entire code when one area is updated.
It will throw exceptions if the code is not properly formatted, feel free to change that behavior. One alternative could be that it keeps moving up the stack until an appropriate end token is found.
Fold Finder
public class FoldFinder
{
public static FoldFinder Instance { get; private set; }
static FoldFinder()
{
Instance = new FoldFinder();
}
public List<SectionPosition> Find(string code, List<SectionDelimiter> delimiters, int start = 0,
int end = -1)
{
List<SectionPosition> positions = new List<SectionPosition>();
Stack<SectionStackItem> stack = new Stack<SectionStackItem>();
int regexGroupIndex;
bool isStartToken;
SectionDelimiter matchedDelimiter;
SectionStackItem currentItem;
Regex scanner = RegexifyDelimiters(delimiters);
foreach (Match match in scanner.Matches(code, start))
{
// each delimiter contributes three groups: offset 0 is the whole SectionDelimiter match,
// offset 1 corresponds to Start and offset 2 corresponds to End.
regexGroupIndex =
match.Groups.Cast<Group>().Select((g, i) => new {
Success = g.Success,
Index = i
})
.Where(r => r.Success && r.Index > 0).First().Index;
matchedDelimiter = delimiters[(regexGroupIndex - 1) / 3];
isStartToken = match.Groups[regexGroupIndex + 1].Success;
if (isStartToken)
{
stack.Push(new SectionStackItem()
{
Delimter = matchedDelimiter,
Position = new SectionPosition() { Start = match.Index }
});
}
else
{
currentItem = stack.Pop();
if (currentItem.Delimter == matchedDelimiter)
{
currentItem.Position.End = match.Index + match.Length;
positions.Add(currentItem.Position);
// if searching for an end, and we've passed it, and the stack is empty then quit.
if (end > -1 && currentItem.Position.End >= end && stack.Count == 0) break;
}
else
{
throw new Exception(string.Format("Invalid Ending Token at {0}", match.Index));
}
}
}
if (stack.Count > 0) throw new Exception("Not enough closing symbols.");
return positions;
}
public Regex RegexifyDelimiters(List<SectionDelimiter> delimiters)
{
return new Regex(
string.Join("|", delimiters.Select(d =>
string.Format("(({0})|({1}))", d.Start, d.End))));
}
}
public class SectionStackItem
{
public SectionPosition Position;
public SectionDelimiter Delimter;
}
public class SectionPosition
{
public int Start;
public int End;
}
public class SectionDelimiter
{
public string Start;
public string End;
}
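The alternative mentioned before the listing (unwinding the stack instead of throwing) could look roughly like the sketch below. It is deliberately simplified and makes a few assumptions: it works on single-character delimiters rather than the regex-based SectionDelimiter, and TolerantFoldFinder and its Find signature are hypothetical names that are not part of the code above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified variant: single-character delimiters only.
public static class TolerantFoldFinder
{
    // `pairs` maps each opening character to its closing character.
    // Returns (start, end) index pairs, with end pointing just past the closer.
    public static List<Tuple<int, int>> Find(string code, Dictionary<char, char> pairs)
    {
        var positions = new List<Tuple<int, int>>();
        // Each stack item: (closing character we are waiting for, start index)
        var stack = new Stack<Tuple<char, int>>();
        var closers = new HashSet<char>(pairs.Values);

        for (int i = 0; i < code.Length; i++)
        {
            char c = code[i];
            if (pairs.ContainsKey(c))
            {
                stack.Push(Tuple.Create(pairs[c], i));
            }
            else if (closers.Contains(c))
            {
                // A closer nobody is waiting for is ignored instead of throwing
                if (!stack.Any(item => item.Item1 == c)) continue;

                // Discard unclosed inner sections until the matching opener is on top
                while (stack.Peek().Item1 != c) stack.Pop();

                Tuple<char, int> open = stack.Pop();
                positions.Add(Tuple.Create(open.Item2, i + 1));
            }
        }
        // Sections still on the stack were never closed and are silently dropped
        return positions;
    }
}
```

With this approach "a { b [ c } d" yields a single fold for the braces, and the unclosed "[" is dropped. Whether silently dropping sections is preferable to an exception depends on how the editor should treat half-typed code.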
Sample Find
The sample below matches folds delimited by {,}, [,], and from right after a symbol up to a ;. I don't see many IDEs that fold individual statements, but it might be handy for long pieces of code, like a LINQ query.
var sectionPositions =
FoldFinder.Instance.Find("abc { def { qrt; ghi [ abc ] } qrt }", new List<SectionDelimiter>(
new SectionDelimiter[3] {
new SectionDelimiter() { Start = "\\{", End = "\\}" },
new SectionDelimiter() { Start = "\\[", End = "\\]" },
new SectionDelimiter() { Start = "(?<=\\[|\\{|;|^)[^[{;]*(?=;)", End = ";" },
}));

Logical Inversion of Symbol Tree

I have a class, Symbol_Group, that represents an invertible expression of the nature AB(C+DE) + FG. Symbol_Group contains a List<List<iSymbol>>, where iSymbol is an interface implemented by both Symbol_Group and Symbol.
The above equation would be represented as A,B,Sym_Grp + F,G; Sym_Grp = C + D,E, where each + represents a new List<iSymbol>.
I need to be able to invert and expand this equation using an algorithm that can handle any depth of nesting and any number of symbols ANDed or ORed together, to produce a set of Symbol_Group, with each containing a unique expansion. For the above equation the answer set would be !A!F; !B!F; !C!D!F; !C!E!F; !A!G; !B!G; !C!D!G; !C!E!G;
I know that I will need to use recursion, but I have had very little experience with it. Any help figuring out this algorithm would be appreciated.
Unless you are somehow required to use a List<List<iSymbol>>, I recommend switching to a different class structure, with a base class (or interface) Expression and subclasses (or implementors) SymbolExpression, NotExpression, OrExpression, and AndExpression. A SymbolExpression contains a single symbol; a NotExpression contains one Expression, and OrExpression and AndExpression contain two expressions each. This is a much more standard structure for working with mathematical expressions, and it is probably simpler to perform the transformations on it.
With the above classes, you can model any expression as a binary tree. Negate the expression by replacing the root by a NotExpression whose child is the original root. Then, traverse the tree with a depth-first search, and whenever you hit a NotExpression whose child is an OrExpression or an AndExpression, you can replace that by an AndExpression or an OrExpression (respectively) whose children are NotExpressions with the original children below them. You might also want to eliminate double negations (look for NotExpressions whose child is a NotExpression, and remove both).
(Whether this answer is understandable probably depends on how comfortable you are with working with trees. Let me know if you need clarification.)
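To make the tree rewrite concrete, here is a minimal sketch of the class structure described above together with a recursive pass that pushes negations down with De Morgan's laws. The class names follow the answer; the members, the Negation.PushDown helper and the ToString formats are assumptions made for illustration. Instead of first wrapping the root and then searching for NotExpressions, the sketch carries a flag saying whether the current subtree is under a negation, which has the same effect.

```csharp
using System;

public abstract class Expression { }

public sealed class SymbolExpression : Expression
{
    public readonly string Name;
    public SymbolExpression(string name) { Name = name; }
    public override string ToString() { return Name; }
}

public sealed class NotExpression : Expression
{
    public readonly Expression Child;
    public NotExpression(Expression child) { Child = child; }
    public override string ToString() { return "!" + Child; }
}

public sealed class AndExpression : Expression
{
    public readonly Expression Left, Right;
    public AndExpression(Expression left, Expression right) { Left = left; Right = right; }
    public override string ToString() { return "(" + Left + " & " + Right + ")"; }
}

public sealed class OrExpression : Expression
{
    public readonly Expression Left, Right;
    public OrExpression(Expression left, Expression right) { Left = left; Right = right; }
    public override string ToString() { return "(" + Left + " | " + Right + ")"; }
}

public static class Negation
{
    // Returns an equivalent tree in which NOT only appears directly above symbols.
    // `negate` is true when an odd number of NOTs sits above `e`.
    public static Expression PushDown(Expression e, bool negate = false)
    {
        var not = e as NotExpression;
        if (not != null)
            return PushDown(not.Child, !negate); // flipping the flag also cancels double negations

        var and = e as AndExpression;
        if (and != null)
        {
            Expression l = PushDown(and.Left, negate), r = PushDown(and.Right, negate);
            // !(X & Y) = !X | !Y
            return negate ? (Expression)new OrExpression(l, r) : new AndExpression(l, r);
        }

        var or = e as OrExpression;
        if (or != null)
        {
            Expression l = PushDown(or.Left, negate), r = PushDown(or.Right, negate);
            // !(X | Y) = !X & !Y
            return negate ? (Expression)new AndExpression(l, r) : new OrExpression(l, r);
        }

        // A symbol: wrap it only if it ends up negated
        return negate ? new NotExpression(e) : e;
    }
}
```

For example, negating (A & B) | C by wrapping it in a NotExpression and calling Negation.PushDown gives ((!A | !B) & !C). Expanding that into a list of terms is a separate distribution step.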
After much work, this is the method I used to get the minimum terms of inversion.
public List<iSymbol> GetInvertedGroup()
{
TrimSymbolList();
List<List<iSymbol>> symbols = this.CopyListMembers(Symbols);
List<iSymbol> SymList;
while (symbols.Count > 1)
{
symbols.Add(MultiplyLists(symbols[0], symbols[1]));
symbols.RemoveRange(0, 2);
}
SymList = symbols[0];
for(int i=0;i<symbols[0].Count;i++)
{
if (SymList[i] is Symbol)
{
Symbol sym = SymList[i] as Symbol;
SymList.RemoveAt(i--);
Symbol_Group symgrp = new Symbol_Group(null);
symgrp.AddSymbol(sym);
SymList.Add(symgrp);
}
}
for (int i = 0; i < SymList.Count; i++)
{
if (SymList[i] is Symbol_Group)
{
Symbol_Group SymGrp = SymList[i] as Symbol_Group;
if (SymGrp.Symbols.Count > 1)
{
List<iSymbol> list = SymGrp.GetInvertedGroup();
SymList.RemoveAt(i--);
AddElementsOf(list, SymList);
}
}
}
return SymList;
}
public List<iSymbol> MultiplyLists(List<iSymbol> L1, List<iSymbol> L2)
{
List<iSymbol> Combined = new List<iSymbol>(L1.Count + L2.Count);
foreach (iSymbol S1 in L1)
{
foreach (iSymbol S2 in L2)
{
Symbol_Group newGrp = new Symbol_Group(null);
newGrp.AddSymbol(S1);
newGrp.AddSymbol(S2);
Combined.Add(newGrp);
}
}
return Combined;
}
This resulted in a List of Groups of Symbols, with each group representing one OR term in the final result (e.g. !A!F). Some further code was used to reduce this to a List<List<Symbol>>, as there was a reasonable amount of nesting in the answer. To reduce it, I used:
public List<List<Symbol>> ReduceList(List<iSymbol> List)
{
List<List<Symbol>> Output = new List<List<Symbol>>(List.Count);
foreach (iSymbol iSym in List)
{
if (iSym is Symbol_Group)
{
List<Symbol> L = new List<Symbol>();
(iSym as Symbol_Group).GetAllSymbols(L);
Output.Add(L);
}
else
{
throw (new Exception());
}
}
return Output;
}
public void GetAllSymbols(List<Symbol> List)
{
foreach (List<iSymbol> SubList in Symbols)
{
foreach (iSymbol iSym in SubList)
{
if (iSym is Symbol)
{
List.Add(iSym as Symbol);
}
else if (iSym is Symbol_Group)
{
(iSym as Symbol_Group).GetAllSymbols(List);
}
else
{
throw(new Exception());
}
}
}
}
Hope this helps someone else!
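For comparison, the same inversion can be written as a short recursion once the AND/OR relation is stored on each group. The sketch below is not the code above: Node, Sym, Group and Inverter are hypothetical names, and terms are plain lists of strings rather than Symbol objects. It relies on two facts: inverting an AND group collects the inverted terms of its children, and inverting an OR group multiplies the children's term lists out.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public abstract class Node { }

public sealed class Sym : Node
{
    public readonly string Name;
    public Sym(string name) { Name = name; }
}

public sealed class Group : Node
{
    public readonly bool IsAnd;          // true: children are ANDed, false: ORed
    public readonly List<Node> Children;
    public Group(bool isAnd, params Node[] children)
    {
        IsAnd = isAnd;
        Children = children.ToList();
    }
}

public static class Inverter
{
    // Each inner list is one product term of the inverted expression, e.g. ["!A", "!F"].
    public static List<List<string>> Invert(Node node)
    {
        var sym = node as Sym;
        if (sym != null)
            return new List<List<string>> { new List<string> { "!" + sym.Name } };

        var group = (Group)node;
        List<List<List<string>>> parts = group.Children.Select(c => Invert(c)).ToList();

        if (group.IsAnd)
            // !(X·Y) = !X + !Y: the children's terms are simply collected
            return parts.SelectMany(p => p).ToList();

        // !(X + Y) = !X · !Y: build the cross product of the children's terms
        var result = new List<List<string>> { new List<string>() };
        foreach (var part in parts)
            result = result
                .SelectMany(r => part.Select(t => r.Concat(t).ToList()))
                .ToList();
        return result;
    }
}
```

Applied to AB(C+DE) + FG this produces the eight terms from the question, although in a different order. The sketch does no simplification, so repeated symbols or redundant terms in other inputs would still need to be trimmed afterwards.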
I came to this simpler solution after a bit of rejigging. I hope it helps out somebody else with a similar problem! This is the class structure (plus a few other properties)
public class SymbolGroup : iSymbol
{
public SymbolGroup(SymbolGroup Parent, SymRelation Relation)
{
Symbols = new List<iSymbol>();
this.Parent = Parent;
SymbolRelation = Relation;
if (SymbolRelation == SymRelation.AND)
Name = "AND Group";
else
Name = "OR Group";
}
public int Depth
{
get
{
foreach (iSymbol s in Symbols)
{
if (s is SymbolGroup)
{
return (s as SymbolGroup).Depth + 1;
}
}
return 1;
}
}
}
The method of inversion is also contained within this class. It replaces an unexpanded group in the results list with all of the expanded results of that group, stripping away only one level of nesting per pass.
public List<SymbolGroup> InvertGroup()
{
List<SymbolGroup> Results = new List<SymbolGroup>();
foreach (iSymbol s in Symbols)
{
if (s is SymbolGroup)
{
SymbolGroup sg = s as SymbolGroup;
sg.Parent = null;
Results.Add(s as SymbolGroup);
}
else if (s is Symbol)
{
SymbolGroup sg = new SymbolGroup(null, SymRelation.AND);
sg.AddSymbol(s);
Results.Add(sg);
}
}
bool AllChecked = false;
while (!AllChecked)
{
AllChecked = true;
for(int i=0;i<Results.Count;i++)
{
SymbolGroup result = Results[i];
if (result.Depth > 1)
{
AllChecked = false;
Results.RemoveAt(i--);
}
else
continue;
if (result.SymbolRelation == SymRelation.OR)
{
Results.AddRange(result.MultiplyOut());
continue;
}
for(int j=0;j<result.nSymbols;j++)
{
iSymbol s = result.Symbols[j];
if (s is SymbolGroup)
{
result.Symbols.RemoveAt(j--); //removes the symbolgroup that is being replaced, so that the rest of the group can be added to the expansion.
AllChecked = false;
SymbolGroup subResult = s as SymbolGroup;
if(subResult.SymbolRelation == SymRelation.OR)
{
List<SymbolGroup> newResults;
newResults = subResult.MultiplyOut();
foreach(SymbolGroup newSg in newResults)
{
newSg.Symbols.AddRange(result.Symbols);
}
Results.AddRange(newResults);
}
break;
}
}
}
}
return Results;
}

Removing duplicates from string array

I'm new to C#, have looked at numerous posts but am still confused.
I have an array list:
List<Array> moves = new List<Array>();
I'm adding moves to it using the following:
string[] newmove = { piece, axis.ToString(), direction.ToString() };
moves.Add(newmove);
And now I wish to remove duplicates using the following:
moves = moves.Distinct();
However it's not letting me do it. I get this error:
Cannot implicitly convert type 'System.Collections.Generic.IEnumerable<System.Array>' to 'System.Collections.Generic.List<System.Array>'. An explicit conversion exists (are you missing a cast?)
Help please? I'd be so grateful.
Steve
You need to call .ToList() after the .Distinct method, as it returns IEnumerable<T>. I would also recommend using a strongly typed List<string[]> instead of List<Array>:
List<string[]> moves = new List<string[]>();
string[] newmove = { piece, axis.ToString(), direction.ToString() };
moves.Add(newmove);
moves.Add(newmove);
moves = moves.Distinct().ToList();
// At this stage moves.Count = 1
Your code has two errors. The first is the missing call to ToList, as already pointed out. The second is subtle: Distinct compares arrays by reference, but your duplicate list items are different array instances, so they are never treated as equal.
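A small sketch of the second problem, using placeholder values in place of piece, axis and direction (ReferenceEqualityDemo and its members are made-up names for illustration):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ReferenceEqualityDemo
{
    // Two separate arrays with identical contents (placeholder values)
    static readonly string[] First = { "K", "X", "1" };
    static readonly string[] Second = { "K", "X", "1" };

    public static int DistinctCount()
    {
        var moves = new List<string[]> { First, Second, First };
        // Only the repeated reference to First is dropped; Second survives
        // because the default comparer for arrays compares references.
        return moves.Distinct().Count();
    }

    // Reference comparison: different instances, so not equal
    public static bool SameReference() { return First.Equals(Second); }

    // Element-by-element comparison: equal
    public static bool SameContent() { return First.SequenceEqual(Second); }
}
```

Here DistinctCount() returns 2 rather than 1, SameReference() is false and SameContent() is true. That gap between reference equality and content equality is what a fix has to close.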
There are multiple solutions for that problem.
Use a custom equality comparer in moves.Distinct().ToList(). No further changes necessary.
Sample implementation:
class ArrayEqualityComparer<T> : EqualityComparer<T[]> {
// Two arrays are equal when they hold equal elements in the same order.
public override bool Equals(T[] x, T[] y) {
if ( x == null ) return y == null;
else if ( y == null ) return false;
return x.SequenceEqual(y);
}
public override int GetHashCode(T[] obj) {
if ( obj == null) return 0;
// XOR the element hashes; null elements contribute 0.
return obj.Aggregate(0, (hash, x) => hash ^ (x == null ? 0 : x.GetHashCode()));
}
}
Filtering for unique items:
moves = moves.Distinct(new ArrayEqualityComparer<string>()).ToList();
Use Tuple<string,string,string> instead of string[]. Tuple offers built-in structural equality and comparison. This variant might make your code cluttered because of the long type name.
Instantiation:
List<Tuple<string, string, string>> moves =
new List<Tuple<string, string, string>>();
Adding new moves:
Tuple<string, string, string> newmove =
Tuple.Create(piece, axis.ToString(), direction.ToString());
moves.Add(newmove);
Filtering for unique items:
moves = moves.Distinct().ToList();
Use a custom class to hold your three values. I'd actually recommend this variant, because it makes all your code dealing with moves much more readable.
Sample implementation:
class Move {
public Move(string piece, string axis, string direction) {
Piece = piece;
Axis = axis;
Direction = direction;
}
public string Piece { get; private set; }
public string Axis { get; private set; }
public string Direction { get; private set; }
// Two moves are equal when all three values match.
public override bool Equals(object obj) {
Move other = obj as Move;
if ( other != null )
return Piece == other.Piece &&
Axis == other.Axis &&
Direction == other.Direction;
return false;
}
// Must agree with Equals: equal moves produce equal hash codes.
public override int GetHashCode() {
return Piece.GetHashCode() ^
Axis.GetHashCode() ^
Direction.GetHashCode();
}
// TODO: override ToString() as well
}
Instantiation:
List<Move> moves = new List<Move>();
Adding new moves:
Move newmove = new Move(piece, axis.ToString(), direction.ToString());
moves.Add(newmove);
Filtering for unique items:
moves = moves.Distinct().ToList();
The compiler error is because you need to convert the result to a list:
moves = moves.Distinct().ToList();
However it probably won't work as you want, because arrays don't have Equals defined in the way that you are hoping (it compares the references of the array objects, not the values inside the array). Instead of using an array, create a class to hold your data and define Equals and GetHashCode to compare the values.
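Old question, but this is an O(n) solution using O(1) additional space. It only compares neighbouring elements, so the array has to be sorted (or at least have equal values grouped together) beforehand. The unique values end up at the front of the array, and the method returns how many there are:
public static int RemoveDuplicates(string[] array)
{
if (array.Length == 0) return 0;
int c = 0; // index of the last unique element kept so far
int i = -1; // next slot to overwrite; stays -1 until the first duplicate is seen
for (int n = 1; n < array.Length; n++)
{
if (array[c] == array[n])
{
if (i == -1)
{
i = n;
}
}
else
{
if (i == -1)
{
c++;
}
else
{
array[i] = array[n];
c++;
i++;
}
}
}
return c + 1; // number of unique elements now at the front of the array
}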
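The in-place routine above compares each element only with the most recently kept one, so it relies on equal values sitting next to each other. For unsorted input, a common alternative trades the O(1) space for a HashSet. The sketch below is one way to do that; Deduplicator is a hypothetical name, and it returns a new list instead of modifying the array.

```csharp
using System;
using System.Collections.Generic;

public static class Deduplicator
{
    // Keeps the first occurrence of each value and preserves the original order.
    // Expected O(n) time, O(n) additional space.
    public static List<string> RemoveDuplicates(IEnumerable<string> items)
    {
        var seen = new HashSet<string>();
        var result = new List<string>();
        foreach (string item in items)
        {
            // HashSet.Add returns false when the value is already present
            if (seen.Add(item)) result.Add(item);
        }
        return result;
    }
}
```

Called with { "b", "a", "b", "c", "a" } it returns b, a, c. This works on single strings; for the string[] moves discussed earlier, the set would need an equality comparer that compares array contents.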
