I'm trying to create an interpreter for a simple programming language using ANTLR.
I would like to add the feature of recursion.
So far I have implemented function definitions and function calls, with support for several return statements and for local variables. To get local variables, I extended the parser's partial class FunctionCallContext with a dictionary for them. This works for a single call. Also, when I call the same function again from itself (recursively), the parser creates a new context object for the new function call, as I would expect.
However, if I create a "deeper" recursion, the third context of the function call will be the very same object as the second (having the same hash code and the same local variables).
My (updated) grammar:
grammar BatshG;
/*
* Parser Rules
*/
compileUnit: ( (statement) | functionDef)+;
statement: print ';'
| println ';'
| assignment ';'
| loopWhile
| branch
| returnStatement ';'
| functionCall ';'
;
branch:
'if' '(' condition=booleanexpression ')'
trueBranch=block
('else' falseBranch=block)?;
loopWhile:
'while' '(' condition=booleanexpression ')'
whileBody=block
;
block:
statement
| '{' statement* '}';
numericexpression:
MINUS onepart=numericexpression #UnaryMinus
| left=numericexpression op=('*'|'/') right=numericexpression #MultOrDiv
| left=numericexpression op=('+'|'-') right=numericexpression #PlusOrMinus
| number=NUMERIC #Number
| variableD #NumVariable
;
stringexpression:
left=stringexpression PLUSPLUS right=stringexpression #Concat
| string=STRING #String
| variableD #StrVariable
| numericexpression #NumberToString
;
booleanexpression:
left=numericexpression relationalOperator=('<' | '>' | '>=' | '<=' | '==' | '!=' ) right=numericexpression #RelationalOperation
| booleanliteral #Boolean
| numericexpression #NumberToBoolean
;
booleanliteral: trueConst | falseConst ;
trueConst : 'true' ;
falseConst : 'false' ;
assignment : varName=IDENTIFIER EQUAL right=expression;
expression: numericexpression | stringexpression | functionCall | booleanexpression;
println: 'println' '(' argument=expression ')';
print: 'print' '(' argument=expression ')';
functionDef: 'function' funcName= IDENTIFIER
'('
(functionParameters=parameterList)?
')'
'{'
statements=statementPart?
'}'
;
statementPart: statement* ;
returnStatement: ('return' returnValue=expression );
parameterList : paramName=IDENTIFIER (',' paramName=IDENTIFIER)*;
functionCall: funcName=IDENTIFIER '('
(functionArguments=argumentList)?
')';
argumentList: expression (',' expression)*;
variableD: varName=IDENTIFIER;
///*
// * Lexer Rules
// */
NUMERIC: (FLOAT | INTEGER);
PLUSPLUS: '++';
MINUS: '-';
IDENTIFIER: [a-zA-Z_][a-zA-Z0-9_]* ;
EQUAL : '=' ;
STRING : '"' (~["\r\n] | '""')* '"' ;
INTEGER: [0-9] [0-9]*;
DIGIT : [0-9] ;
FRAC : '.' DIGIT+ ;
EXP : [eE] [-+]? DIGIT+ ;
FLOAT : DIGIT* FRAC EXP? ;
WS: [ \n\t\r]+ -> channel(HIDDEN);
My hand-written partial class of the parser (not the generated part):
public partial class BatshGParser
{
//"extensions" for contexts:
public partial class FunctionCallContext
{
private Dictionary<string, object> localVariables = new Dictionary<string, object>();
private bool isFunctionReturning;
public FunctionCallContext()
{
localVariables = new Dictionary<string, object>();
isFunctionReturning = false;
}
public Dictionary<string, object> LocalVariables { get => localVariables; set => localVariables = value; }
public bool IsFunctionReturning { get => isFunctionReturning; set => isFunctionReturning = value; }
}
public partial class FunctionDefContext
{
private List<string> parameterNames;
public FunctionDefContext()
{
parameterNames = new List<string>();
}
public List<string> ParameterNames { get => parameterNames; set => parameterNames = value; }
}
}
And relevant parts (and maybe a little more) of my visitor:
public class BatshGVisitor : BatshGBaseVisitor<ResultValue>
{
public ResultValue Result { get; set; }
public StringBuilder OutputForPrint { get; set; }
private Dictionary<string, object> globalVariables = new Dictionary<string, object>();
//string = function name
//object = parameter list
//object = return value
private Dictionary<string, Func<List<object>, object>> globalFunctions = new Dictionary<string, Func<List<object>, object>>();
private Stack<BatshGParser.FunctionCallContext> actualFunctions = new Stack<BatshGParser.FunctionCallContext>();
public override ResultValue VisitCompileUnit([NotNull] BatshGParser.CompileUnitContext context)
{
OutputForPrint = new StringBuilder("");
isSearchingForFunctionDefinitions = true;
var resultvalue = VisitChildren(context);
isSearchingForFunctionDefinitions = false;
resultvalue = VisitChildren(context);
Result = new ResultValue() { ExpType = "string", ExpValue = resultvalue.ExpValue ?? null };
return Result;
}
public override ResultValue VisitChildren([NotNull] IRuleNode node)
{
if (this.isSearchingForFunctionDefinitions)
{
for (int i = 0; i < node.ChildCount; i++)
{
if (node.GetChild(i) is BatshGParser.FunctionDefContext)
{
Visit(node.GetChild(i));
}
}
}
return base.VisitChildren(node);
}
protected override bool ShouldVisitNextChild([NotNull] IRuleNode node, ResultValue currentResult)
{
if (isSearchingForFunctionDefinitions)
{
if (node is BatshGParser.FunctionDefContext)
{
return true;
}
else
return false;
}
else
{
if (node is BatshGParser.FunctionDefContext)
{
return false;
}
else
return base.ShouldVisitNextChild(node, currentResult);
}
}
public override ResultValue VisitFunctionDef([NotNull] BatshGParser.FunctionDefContext context)
{
string functionName = null;
functionName = context.funcName.Text;
if (context.functionParameters != null)
{
List<string> plist = CollectParamNames(context.functionParameters);
context.ParameterNames = plist;
}
if (isSearchingForFunctionDefinitions)
globalFunctions.Add(functionName,
(
delegate(List<object> args)
{
var currentMethod = (args[0] as BatshGParser.FunctionCallContext);
this.actualFunctions.Push(currentMethod);
//args[0] is the context
for (int i = 1; i < args.Count; i++)
{
currentMethod.LocalVariables.Add(context.ParameterNames[i - 1],
(args[i] as ResultValue).ExpValue
);
}
ResultValue retval = null;
retval = this.VisitStatementPart(context.statements);
this.actualFunctions.Peek().IsFunctionReturning = false;
actualFunctions.Pop();
return retval;
}
)
);
return new ResultValue()
{
};
}
public override ResultValue VisitStatementPart([NotNull] BatshGParser.StatementPartContext context)
{
if (!this.actualFunctions.Peek().IsFunctionReturning)
{
return VisitChildren(context);
}
else
{
return null;
}
}
public override ResultValue VisitReturnStatement([NotNull] BatshGParser.ReturnStatementContext context)
{
this.actualFunctions.Peek().IsFunctionReturning = true;
ResultValue retval = null;
if (context.returnValue != null)
{
retval = Visit(context.returnValue);
}
return retval;
}
public override ResultValue VisitArgumentList([NotNull] BatshGParser.ArgumentListContext context)
{
List<ResultValue> argumentList = new List<ResultValue>();
foreach (var item in context.children)
{
var tt = item.GetText();
if (item.GetText() != ",")
{
ResultValue rv = Visit(item);
argumentList.Add(rv);
}
}
return
new ResultValue()
{
ExpType = "list",
ExpValue = argumentList ?? null
};
}
public override ResultValue VisitFunctionCall([NotNull] BatshGParser.FunctionCallContext context)
{
string functionName = context.funcName.Text;
int hashcodeOfContext = context.GetHashCode();
object functRetVal = null;
List<object> argumentList = new List<object>()
{
context
//here come the actual parameters later
};
ResultValue argObjects = null;
if (context.functionArguments != null)
{
argObjects = VisitArgumentList(context.functionArguments);
}
if (argObjects != null )
{
if (argObjects.ExpValue is List<ResultValue>)
{
var argresults = (argObjects.ExpValue as List<ResultValue>) ?? null;
foreach (var arg in argresults)
{
argumentList.Add(arg);
}
}
}
if (globalFunctions.ContainsKey(functionName))
{
{
functRetVal = globalFunctions[functionName]( argumentList );
}
}
return new ResultValue()
{
ExpType = ((ResultValue)functRetVal).ExpType,
ExpValue = ((ResultValue)functRetVal).ExpValue
};
}
public override ResultValue VisitVariableD([NotNull] BatshGParser.VariableDContext context)
{
object variable;
string variableName = context.GetChild(0).ToString();
string typename = "";
Dictionary<string, object> variables = null;
if (actualFunctions.Count > 0)
{
Dictionary<string, object> localVariables =
actualFunctions.Peek().LocalVariables;
if (localVariables.ContainsKey(variableName))
{
variables = localVariables;
}
}
else
{
variables = globalVariables;
}
if (variables.ContainsKey(variableName))
{
variable = variables[variableName];
typename = charpTypesToBatshTypes[variable.GetType()];
}
else
{
Type parentContextType = contextTypes[context.parent.GetType()];
typename = charpTypesToBatshTypes[parentContextType];
variable = new object();
if (typename.Equals("string"))
{
variable = string.Empty;
}
else
{
variable = 0d;
}
}
return new ResultValue()
{
ExpType = typename,
ExpValue = variable
};
}
public override ResultValue VisitAssignment([NotNull] BatshGParser.AssignmentContext context)
{
string varname = context.varName.Text;
ResultValue varAsResultValue = Visit(context.right);
Dictionary<string, object> localVariables = null;
if (this.actualFunctions.Count > 0)
{
localVariables =
actualFunctions.Peek().LocalVariables;
if (localVariables.ContainsKey(varname))
{
localVariables[varname] = varAsResultValue.ExpValue;
}
else
if (globalVariables.ContainsKey(varname))
{
globalVariables[varname] = varAsResultValue.ExpValue;
}
else
{
localVariables.Add(varname, varAsResultValue.ExpValue);
}
}
else
{
if (globalVariables.ContainsKey(varname))
{
globalVariables[varname] = varAsResultValue.ExpValue;
}
else
{
globalVariables.Add(varname, varAsResultValue.ExpValue);
}
}
return varAsResultValue;
}
}
What could cause the problem? Thank you!
Why does parser generated by ANTLR reuse context objects?
It doesn't. Each function call in your source code will correspond to exactly one FunctionCallContext object, and those will be unique. They'd have to be, even for two entirely identical function calls, because they also contain metadata, such as where in the source the function call appears, and that's obviously going to differ between calls even if everything else is the same.
To illustrate this, consider the following source code:
function f(x) {
return f(x);
}
print(f(x));
This will create a tree containing exactly two FunctionCallContext objects - one for line 2 and one for line 4. They will both be distinct - they'll both have child nodes referring to the function name f and the argument x, but they'll have different location information and a different hash code - as will the child nodes. Nothing is being reused here.
What could cause the problem?
The fact that you're seeing the same node multiple times is simply due to the fact that you're visiting the same part of the tree multiple times. That's a perfectly normal thing to do for your use case, but in your case it causes a problem because you stored mutable data in the object, assuming that you'd get a fresh FunctionCall object for each time a function call happens at run time - rather than each time a function call appears in the source code.
That's not how parse trees work (they represent the structure of the source code, not the sequence of calls that might happen at run time), so you can't use FunctionCallContext objects to store information about a specific run-time function call. In general, I'd consider it a bad idea to put mutable state into context objects.
Instead you should put your mutable state into your visitor object. For your specific problem that means having a call stack containing the local variables of each run-time function call. Each time a function starts execution, you can push a frame onto the stack and each time a function exits, you can pop it. That way the top of the stack will always contain the local variables of the function currently being executed.
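A minimal sketch of such a call stack, written independently of ANTLR. The `CallStack` class and its member names are hypothetical (they are not part of the ANTLR runtime or of the code in the question); the lookup order in `Set` is an assumption that mirrors the question's VisitAssignment. In the visitor, the function delegate would call `PushFrame` before visiting the body and `PopFrame` afterwards.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: one dictionary of locals per run-time call.
public class CallStack
{
    private readonly Stack<Dictionary<string, object>> frames =
        new Stack<Dictionary<string, object>>();
    private readonly Dictionary<string, object> globals =
        new Dictionary<string, object>();

    public int Depth { get { return frames.Count; } }

    // Call when a function starts executing.
    public void PushFrame() { frames.Push(new Dictionary<string, object>()); }

    // Call when a function returns.
    public void PopFrame() { frames.Pop(); }

    // Parameters always live in the current frame, even if a global has the same name.
    public void DefineLocal(string name, object value) { frames.Peek()[name] = value; }

    // Existing local first, then existing global, otherwise create in the current scope.
    public void Set(string name, object value)
    {
        if (frames.Count > 0 && (frames.Peek().ContainsKey(name) || !globals.ContainsKey(name)))
            frames.Peek()[name] = value;
        else
            globals[name] = value;
    }

    public bool TryGet(string name, out object value)
    {
        if (frames.Count > 0 && frames.Peek().TryGetValue(name, out value))
            return true;
        return globals.TryGetValue(name, out value);
    }
}

public static class CallStackDemo
{
    // Simulates a function calling itself; each call sees its own "n".
    public static int Countdown(CallStack stack, int n)
    {
        stack.PushFrame();
        try
        {
            stack.DefineLocal("n", n);
            int below = n > 0 ? Countdown(stack, n - 1) : 0;
            object mine;
            stack.TryGet("n", out mine); // still this call's value after the recursion
            return (int)mine + below;
        }
        finally
        {
            stack.PopFrame(); // pop even if the body throws
        }
    }
}
```

Because every run-time call gets a fresh dictionary, the same FunctionCallContext node can be visited any number of times without one call overwriting another's variables.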
PS: This is unrelated to your problem, but the usual rules of precedence in arithmetic expressions are such that + has the same precedence as - and * has the same precedence as /. In your grammar the precedence of / is greater than that of *, and that of - is higher than that of +. This means that, for example, 9 * 5 / 3 is parsed as 9 * (5 / 3) and evaluates to 9 when it should be 15 (assuming the usual rules for integer arithmetic).
To fix this, + and -, as well as * and /, should be part of the same alternative, so they get the same precedence:
| left=numericexpression op=('*'|'/') right=numericexpression #MulOrDiv
| left=numericexpression op=('+'|'-') right=numericexpression #PlusOrMinus
Related
In Java I can pass a Scanner a string and then I can do handy things like, scanner.hasNext() or scanner.nextInt(), scanner.nextDouble() etc.
This allows some pretty clean code for parsing a string that contains rows of numbers.
How is this done in C# land?
Say you had a string containing:
"0 0 1 22 39 0 0 1 2 33 33"
In Java I would pass that to a scanner and do a
while(scanner.hasNext())
myArray[i++] = scanner.nextInt();
Or something very similar. What is the C#-ish way to do this?
I'm going to add this as a separate answer because it's quite distinct from the answer I already gave. Here's how you could start creating your own Scanner class:
class Scanner : System.IO.StringReader
{
string currentWord;
public Scanner(string source) : base(source)
{
readNextWord();
}
private void readNextWord()
{
System.Text.StringBuilder sb = new System.Text.StringBuilder();
char nextChar;
int next;
do
{
next = this.Read();
if (next < 0)
break;
nextChar = (char)next;
if (char.IsWhiteSpace(nextChar))
break;
sb.Append(nextChar);
} while (true);
while((this.Peek() >= 0) && (char.IsWhiteSpace((char)this.Peek())))
this.Read();
if (sb.Length > 0)
currentWord = sb.ToString();
else
currentWord = null;
}
public bool hasNextInt()
{
if (currentWord == null)
return false;
int dummy;
return int.TryParse(currentWord, out dummy);
}
public int nextInt()
{
try
{
return int.Parse(currentWord);
}
finally
{
readNextWord();
}
}
public bool hasNextDouble()
{
if (currentWord == null)
return false;
double dummy;
return double.TryParse(currentWord, out dummy);
}
public double nextDouble()
{
try
{
return double.Parse(currentWord);
}
finally
{
readNextWord();
}
}
public bool hasNext()
{
return currentWord != null;
}
}
Using part of the answers already given, I've created a StringReader that can extract Enum and any data type that implements IConvertible.
Usage
using(var reader = new PacketReader("1 23 ErrorOk StringValue 15.22"))
{
var index = reader.ReadNext<int>();
var count = reader.ReadNext<int>();
var result = reader.ReadNext<ErrorEnum>();
var data = reader.ReadNext<string>();
var responseTime = reader.ReadNext<double>();
}
Implementation
public class PacketReader : StringReader
{
public PacketReader(string s)
: base(s)
{
}
public T ReadNext<T>() where T : IConvertible
{
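var sb = new StringBuilder();
// Skip whitespace left over from the previous token, so that
// string values do not come back with a leading space.
while (Peek() >= 0 && char.IsWhiteSpace((char)Peek()))
Read();
// Accumulate characters until the next whitespace or end of input.
while (Peek() >= 0 && !char.IsWhiteSpace((char)Peek()))
sb.Append((char)Read());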
var value = sb.ToString();
var type = typeof(T);
if (type.IsEnum)
return (T)Enum.Parse(type, value);
return (T)((IConvertible)value).ToType(typeof(T), System.Globalization.CultureInfo.CurrentCulture);
}
}
While this isn't the exact same fundamental concept, what you're looking for can be done with this lambda expression:
string foo = "0 0 1 22 39 0 0 1 2 33 33";
int[] data = foo.Split(' ').Select(p => int.Parse(p)).ToArray();
What this does is first Split the string, using a space as a delimiter. The Select function then allows you to specify an alias for a given member in the array (which I referred to as 'p' in this example), then perform an operation on that member to give a final result. The ToArray() call then turns this abstract enumerable class into a concrete array.
So in the end, this splits the string, then converts each element into an int and populates an int[] with the resulting values.
To my knowledge, there are no built in classes in the framework for doing this. You would have to roll your own.
That would not be too hard. A nice C# version might implement IEnumerable so you could say:
var scanner = new Scanner<int>(yourString);
foreach(int n in scanner)
; // your code
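One way such a class could look is sketched below. `Scanner<T>` is hypothetical (nothing with this name ships in the framework); the sketch assumes tokens are separated by whitespace and that `T` is a type `Convert.ChangeType` can produce from a string, such as int or double.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical Scanner<T>: splits on whitespace and converts each token to T.
public class Scanner<T> : IEnumerable<T> where T : IConvertible
{
    private readonly string source;

    public Scanner(string source) { this.source = source; }

    public IEnumerator<T> GetEnumerator()
    {
        // A null separator array makes Split treat any whitespace as a delimiter.
        string[] words = source.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
        foreach (string word in words)
        {
            // Throws FormatException if a token cannot be converted to T.
            yield return (T)Convert.ChangeType(word, typeof(T), CultureInfo.InvariantCulture);
        }
    }

    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
}
```

Unlike Java's Scanner, this version does no validation: an invalid token surfaces as an exception during enumeration rather than as a false hasNext-style check.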
To get as close as possible to your syntax, this'll work if you're only interested in one type ("int" in the example):
static void Main(string[] args)
{
if (args.Length == 0) { args = new string[] { "3", "43", "6" }; }
IEnumerator<int> scanner = (from arg in args select int.Parse(arg)).GetEnumerator();
while (scanner.MoveNext())
{
Console.Write("{0} ", scanner.Current);
}
}
Here's an even more whiz-bang version that allows you to access any type that is supported by string's IConvertible implementation:
static void Main(string[] args)
{
if (args.Length == 0) { args = new string[] { "3", "43", "6" }; }
var scanner = args.Select<string, Func<Type, Object>>((string s) => {
return (Type t) =>
((IConvertible)s).ToType(t, System.Globalization.CultureInfo.InvariantCulture);
}).GetEnumerator();
while (scanner.MoveNext())
{
Console.Write("{0} ", scanner.Current(typeof(int)));
}
}
Just pass a different type to the "typeof" operator in the while loop to choose the type.
Both of these require C# 3.0 and .NET Framework 3.5 or later (for LINQ and lambda expressions).
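You could use LINQ to accomplish this like so:
string text = "0 0 1 22 39 0 0 1 2 33 33";
// Note: this yields individual digit characters, so "22" becomes '2', '2'.
// Write() is a placeholder, not a LINQ method; do something useful here...
text.Where(i => char.IsNumber(i)).Write();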
I would do this in one of a couple ways depending on whether 1) you are using the latest .NET framework with LINQ support and 2) you know the values are valid integers. Here's a function to demonstrate both:
int[] ParseIntArray(string input, bool validateRequired)
{
if (validateRequired)
{
string[] split = input.Split();
List<int> result = new List<int>(split.Length);
int parsed;
for (int inputIdx = 0; inputIdx < split.Length; inputIdx++)
{
if (int.TryParse(split[inputIdx], out parsed))
result.Add(parsed);
}
return result.ToArray();
}
else
return (from i in input.Split()
select int.Parse(i)).ToArray();
}
Based on comments in other answer(s), I assume you need the validation. After reading those comments, I think the closest thing you'll get is int.TryParse and double.TryParse, which is kind of a combination of hasNextInt and nextInt (or a combination of hasNextDouble and nextDouble).
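As a rough illustration of TryParse standing in for the hasNextX/nextX pair, the sketch below sorts whitespace-separated tokens into ints and doubles and skips everything else. `TryParseDemo` and `Scan` are hypothetical names, and the use of the invariant culture for doubles is an assumption made so that "2.5" parses the same on every machine.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class TryParseDemo
{
    // Each TryParse call both tests the token (hasNextInt) and yields the value (nextInt).
    public static void Scan(string input, List<int> ints, List<double> doubles)
    {
        string[] words = input.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
        foreach (string word in words)
        {
            int i;
            double d;
            if (int.TryParse(word, NumberStyles.Integer, CultureInfo.InvariantCulture, out i))
                ints.Add(i);
            else if (double.TryParse(word, NumberStyles.Float, CultureInfo.InvariantCulture, out d))
                doubles.Add(d);
            // Anything else is silently skipped.
        }
    }
}
```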
I am working on a code editor derived from the WinForms RichTextBox using C#. I have already implemented autocompletion and syntax highlighting, but code folding needs a somewhat different approach. What I want to achieve is:
The code below:
public static SomeFunction(EventArgs e)
{
//Some code
//Some code
//Some code
//Some code
//Some code
//Some code
}
Should become:
public static SomeFunction(EventArgs e)[...]
Where [...] stands for the collapsed code, which is displayed in a tooltip when you hover over [...]
Any ideas or suggestions how to do it, either using Regex or procedural code?
I have created a parser that will return the indices of code folding locations.
Folding delimiters are defined by regular expressions.
You can specify a start and ending index so that you don't have to check the entire code when one area is updated.
It will throw exceptions if the code is not properly formatted; feel free to change that behavior. One alternative would be to keep moving up the stack until an appropriate end token is found.
Fold Finder
public class FoldFinder
{
public static FoldFinder Instance { get; private set; }
static FoldFinder()
{
Instance = new FoldFinder();
}
public List<SectionPosition> Find(string code, List<SectionDelimiter> delimiters, int start = 0,
int end = -1)
{
List<SectionPosition> positions = new List<SectionPosition>();
Stack<SectionStackItem> stack = new Stack<SectionStackItem>();
int regexGroupIndex;
bool isStartToken;
SectionDelimiter matchedDelimiter;
SectionStackItem currentItem;
Regex scanner = RegexifyDelimiters(delimiters);
foreach (Match match in scanner.Matches(code, start))
{
// each delimiter contributes three groups: the first wraps the whole delimiter,
// the second matches its Start pattern and the third matches its End pattern.
regexGroupIndex =
match.Groups.Cast<Group>().Select((g, i) => new {
Success = g.Success,
Index = i
})
.Where(r => r.Success && r.Index > 0).First().Index;
matchedDelimiter = delimiters[(regexGroupIndex - 1) / 3];
isStartToken = match.Groups[regexGroupIndex + 1].Success;
if (isStartToken)
{
stack.Push(new SectionStackItem()
{
Delimter = matchedDelimiter,
Position = new SectionPosition() { Start = match.Index }
});
}
else
{
currentItem = stack.Pop();
if (currentItem.Delimter == matchedDelimiter)
{
currentItem.Position.End = match.Index + match.Length;
positions.Add(currentItem.Position);
// if searching for an end, and we've passed it, and the stack is empty then quit.
if (end > -1 && currentItem.Position.End >= end && stack.Count == 0) break;
}
else
{
throw new Exception(string.Format("Invalid Ending Token at {0}", match.Index));
}
}
}
if (stack.Count > 0) throw new Exception("Not enough closing symbols.");
return positions;
}
public Regex RegexifyDelimiters(List<SectionDelimiter> delimiters)
{
return new Regex(
string.Join("|", delimiters.Select(d =>
string.Format("(({0})|({1}))", d.Start, d.End))));
}
}
public class SectionStackItem
{
public SectionPosition Position;
public SectionDelimiter Delimter;
}
public class SectionPosition
{
public int Start;
public int End;
}
public class SectionDelimiter
{
public string Start;
public string End;
}
Sample Find
The sample below matches folds delimited by {,}, [,], and from right after a symbol up to a ;. I don't see too many IDEs that fold individual statements, but it might be handy for long pieces of code, like a LINQ query.
var sectionPositions =
FoldFinder.Instance.Find("abc { def { qrt; ghi [ abc ] } qrt }", new List<SectionDelimiter>(
new SectionDelimiter[3] {
new SectionDelimiter() { Start = "\\{", End = "\\}" },
new SectionDelimiter() { Start = "\\[", End = "\\]" },
new SectionDelimiter() { Start = "(?<=\\[|\\{|;|^)[^[{;]*(?=;)", End = ";" },
}));
I've got the following BoolExpr class:
class BoolExpr
{
public enum BOP { LEAF, AND, OR, NOT };
//
// inner state
//
private BOP _op;
private BoolExpr _left;
private BoolExpr _right;
private String _lit;
//
// private constructor
//
private BoolExpr(BOP op, BoolExpr left, BoolExpr right)
{
_op = op;
_left = left;
_right = right;
_lit = null;
}
private BoolExpr(String literal)
{
_op = BOP.LEAF;
_left = null;
_right = null;
_lit = literal;
}
//
// accessor
//
public BOP Op
{
get { return _op; }
set { _op = value; }
}
public BoolExpr Left
{
get { return _left; }
set { _left = value; }
}
public BoolExpr Right
{
get { return _right; }
set { _right = value; }
}
public String Lit
{
get { return _lit; }
set { _lit = value; }
}
//
// public factory
//
public static BoolExpr CreateAnd(BoolExpr left, BoolExpr right)
{
return new BoolExpr(BOP.AND, left, right);
}
public static BoolExpr CreateNot(BoolExpr child)
{
return new BoolExpr(BOP.NOT, child, null);
}
public static BoolExpr CreateOr(BoolExpr left, BoolExpr right)
{
return new BoolExpr(BOP.OR, left, right);
}
public static BoolExpr CreateBoolVar(String str)
{
return new BoolExpr(str);
}
public BoolExpr(BoolExpr other)
{
// Deep copy: deliberately share no objects with the original
_op = other._op;
_left = other._left == null ? null : new BoolExpr(other._left);
_right = other._right == null ? null : new BoolExpr(other._right);
_lit = new StringBuilder(other._lit).ToString();
}
//
// state checker
//
Boolean IsLeaf()
{
return (_op == BOP.LEAF);
}
Boolean IsAtomic()
{
return (IsLeaf() || (_op == BOP.NOT && _left.IsLeaf()));
}
}
What algorithm should I use to parse an input boolean expression string like "¬((A ∧ B) ∨ C ∨ D)" and load it into the above class?
TL;DR: If you want to see the code, jump to the second portion of the answer.
I would build a tree from the expression to parse and then traverse it depth first. You can refer to the wikipedia article about Binary Expression Trees to get a feel for what I'm suggesting.
Start by adding the omitted optional parentheses to make the next step easier
When you read anything that is not an operator or a parenthesis, create a LEAF type node
When you read any operator (in your case not, and, or), create the corresponding operator node
Binary operators get the previous and following nodes as children, unary operators only get the next one.
So, for your example ¬((A ∧ B) ∨ C ∨ D), the algorithm would go like this:
¬((A ∧ B) ∨ C ∨ D) becomes ¬(((A ∧ B) ∨ C) ∨ D)
Create a NOT node, it'll get the result of the following opening paren as a child.
Create a LEAF node for A, an AND node and a LEAF node for B. AND has A and B as children.
Create OR node, it has the previously created AND as a child and a new LEAF node for C.
Create OR node, it has the previously created OR and a new node for D as children.
At that point, your tree looks like this:
        NOT
         |
         OR
        /  \
      OR    D
     /  \
   AND   C
   / \
  A   B
You can then add a Node.Evaluate() method that evaluates recursively based on its type (polymorphism could be used here). For example, it could look something like this:
class LeafEx {
bool Evaluate() {
return Boolean.Parse(this.Lit);
}
}
class NotEx {
bool Evaluate() {
return !Left.Evaluate();
}
}
class OrEx {
bool Evaluate() {
return Left.Evaluate() || Right.Evaluate();
}
}
And so on and so forth. To get the result of your expression, you then only need to call
bool result = Root.Evaluate();
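A runnable version of that polymorphic idea might look as follows. The `Node` hierarchy is hypothetical and separate from the BoolExpr class in the question, and looking leaf values up in a dictionary (rather than parsing the literal text) is an assumption made for the sketch.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical node hierarchy: each node type knows how to evaluate itself.
public abstract class Node
{
    public abstract bool Evaluate(IDictionary<string, bool> values);
}

public class LeafNode : Node
{
    private readonly string name;
    public LeafNode(string name) { this.name = name; }
    // A leaf is a variable; its value comes from the supplied assignment.
    public override bool Evaluate(IDictionary<string, bool> values) { return values[name]; }
}

public class NotNode : Node
{
    private readonly Node child;
    public NotNode(Node child) { this.child = child; }
    public override bool Evaluate(IDictionary<string, bool> values) { return !child.Evaluate(values); }
}

public class AndNode : Node
{
    private readonly Node left, right;
    public AndNode(Node left, Node right) { this.left = left; this.right = right; }
    public override bool Evaluate(IDictionary<string, bool> values)
    {
        return left.Evaluate(values) && right.Evaluate(values);
    }
}

public class OrNode : Node
{
    private readonly Node left, right;
    public OrNode(Node left, Node right) { this.left = left; this.right = right; }
    public override bool Evaluate(IDictionary<string, bool> values)
    {
        return left.Evaluate(values) || right.Evaluate(values);
    }
}

public static class NodeDemo
{
    // Builds the tree drawn above: NOT(OR(OR(AND(A, B), C), D)).
    public static Node BuildExample()
    {
        return new NotNode(
            new OrNode(
                new OrNode(new AndNode(new LeafNode("A"), new LeafNode("B")), new LeafNode("C")),
                new LeafNode("D")));
    }
}
```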
Alright, since it's not an assignment and it's actually a fun thing to implement, I went ahead. Some of the code I'll post here is not related to what I described earlier (and some parts are missing) but I'll leave the top part in my answer for reference (nothing in there is wrong (hopefully!)).
Keep in mind this is far from optimal and that I made an effort to not modify your provided BoolExpr class. Modifying it could allow you to reduce the amount of code. There's also no error checking at all.
Here's the main method
static void Main(string[] args)
{
//We'll use ! for not, & for and, | for or and remove whitespace
string expr = @"!((A&B)|C|D)";
List<Token> tokens = new List<Token>();
StringReader reader = new StringReader(expr);
//Tokenize the expression
Token t = null;
do
{
t = new Token(reader);
tokens.Add(t);
} while (t.type != Token.TokenType.EXPR_END);
//Use a minimal version of the Shunting Yard algorithm to transform the token list to polish notation
List<Token> polishNotation = TransformToPolishNotation(tokens);
var enumerator = polishNotation.GetEnumerator();
enumerator.MoveNext();
BoolExpr root = Make(ref enumerator);
//Request boolean values for all literal operands
foreach (Token tok in polishNotation.Where(token => token.type == Token.TokenType.LITERAL))
{
Console.Write("Enter boolean value for {0}: ", tok.value);
string line = Console.ReadLine();
booleanValues[tok.value] = Boolean.Parse(line);
Console.WriteLine();
}
//Eval the expression tree
Console.WriteLine("Eval: {0}", Eval(root));
Console.ReadLine();
}
The tokenization phase creates a Token object for all tokens of the expression. It helps keep the parsing separated from the actual algorithm. Here's the Token class that performs this:
class Token
{
static Dictionary<char, KeyValuePair<TokenType, string>> dict = new Dictionary<char, KeyValuePair<TokenType, string>>()
{
{
'(', new KeyValuePair<TokenType, string>(TokenType.OPEN_PAREN, "(")
},
{
')', new KeyValuePair<TokenType, string>(TokenType.CLOSE_PAREN, ")")
},
{
'!', new KeyValuePair<TokenType, string>(TokenType.UNARY_OP, "NOT")
},
{
'&', new KeyValuePair<TokenType, string>(TokenType.BINARY_OP, "AND")
},
{
'|', new KeyValuePair<TokenType, string>(TokenType.BINARY_OP, "OR")
}
};
public enum TokenType
{
OPEN_PAREN,
CLOSE_PAREN,
UNARY_OP,
BINARY_OP,
LITERAL,
EXPR_END
}
public TokenType type;
public string value;
public Token(StringReader s)
{
int c = s.Read();
if (c == -1)
{
type = TokenType.EXPR_END;
value = "";
return;
}
char ch = (char)c;
if (dict.ContainsKey(ch))
{
type = dict[ch].Key;
value = dict[ch].Value;
}
else
{
string str = "";
str += ch;
while (s.Peek() != -1 && !dict.ContainsKey((char)s.Peek()))
{
str += (char)s.Read();
}
type = TokenType.LITERAL;
value = str;
}
}
}
At that point, in the main method, you can see I transform the list of tokens in Polish Notation order. It makes the creation of the tree much easier and I use a modified implementation of the Shunting Yard Algorithm for this:
static List<Token> TransformToPolishNotation(List<Token> infixTokenList)
{
Queue<Token> outputQueue = new Queue<Token>();
Stack<Token> stack = new Stack<Token>();
int index = 0;
while (infixTokenList.Count > index)
{
Token t = infixTokenList[index];
switch (t.type)
{
case Token.TokenType.LITERAL:
outputQueue.Enqueue(t);
break;
case Token.TokenType.BINARY_OP:
case Token.TokenType.UNARY_OP:
case Token.TokenType.OPEN_PAREN:
stack.Push(t);
break;
case Token.TokenType.CLOSE_PAREN:
while (stack.Peek().type != Token.TokenType.OPEN_PAREN)
{
outputQueue.Enqueue(stack.Pop());
}
stack.Pop();
if (stack.Count > 0 && stack.Peek().type == Token.TokenType.UNARY_OP)
{
outputQueue.Enqueue(stack.Pop());
}
break;
default:
break;
}
++index;
}
while (stack.Count > 0)
{
outputQueue.Enqueue(stack.Pop());
}
return outputQueue.Reverse().ToList();
}
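After this transformation, our token list becomes NOT, OR, OR, D, C, AND, B, A. The operands come out in reverse order because the output queue is reversed, which is harmless here since AND and OR are commutative.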
At this point, we're ready to create the expression tree. The properties of Polish Notation allow us to just walk the Token List and recursively create the tree nodes (we'll use your BoolExpr class) as we go:
static BoolExpr Make(ref List<Token>.Enumerator polishNotationTokensEnumerator)
{
if (polishNotationTokensEnumerator.Current.type == Token.TokenType.LITERAL)
{
BoolExpr lit = BoolExpr.CreateBoolVar(polishNotationTokensEnumerator.Current.value);
polishNotationTokensEnumerator.MoveNext();
return lit;
}
else
{
if (polishNotationTokensEnumerator.Current.value == "NOT")
{
polishNotationTokensEnumerator.MoveNext();
BoolExpr operand = Make(ref polishNotationTokensEnumerator);
return BoolExpr.CreateNot(operand);
}
else if (polishNotationTokensEnumerator.Current.value == "AND")
{
polishNotationTokensEnumerator.MoveNext();
BoolExpr left = Make(ref polishNotationTokensEnumerator);
BoolExpr right = Make(ref polishNotationTokensEnumerator);
return BoolExpr.CreateAnd(left, right);
}
else if (polishNotationTokensEnumerator.Current.value == "OR")
{
polishNotationTokensEnumerator.MoveNext();
BoolExpr left = Make(ref polishNotationTokensEnumerator);
BoolExpr right = Make(ref polishNotationTokensEnumerator);
return BoolExpr.CreateOr(left, right);
}
}
return null;
}
Now we're golden! We have the expression tree that represents the expression so we'll ask the user for the actual boolean values of each literal operand and evaluate the root node (which will recursively evaluate the rest of the tree as needed).
My Eval function follows (booleanValues is a static Dictionary<string, bool> holding the values entered in Main). Keep in mind I'd use some polymorphism to make this cleaner if I modified your BoolExpr class.
static bool Eval(BoolExpr expr)
{
if (expr.IsLeaf())
{
return booleanValues[expr.Lit];
}
if (expr.Op == BoolExpr.BOP.NOT)
{
return !Eval(expr.Left);
}
if (expr.Op == BoolExpr.BOP.OR)
{
return Eval(expr.Left) || Eval(expr.Right);
}
if (expr.Op == BoolExpr.BOP.AND)
{
return Eval(expr.Left) && Eval(expr.Right);
}
throw new ArgumentException();
}
As expected, feeding our test expression ¬((A ∧ B) ∨ C ∨ D) with values false, true, false, true for A, B, C, D respectively yields the result false.
From an algorithmic point of view, parsing an expression requires one stack.
We use a two-step algorithm:
Lexing
The aim of lexing is to recognize 'keywords', 'identifiers' and 'separators':
- A keyword is 'if', 'then', 'else', '(', ')', '/\', '\/', etc.
- An identifier, in your case, is 'A', 'B', 'C', etc.
- A separator is a blank space, a tab, an end of line, an end of file, etc.
Lexing is done with an automaton. You read the input string character by character. When you encounter a character that is compatible with one of your keywords or identifiers, you start a sequence of characters. When you encounter a separator, you stop the sequence and look it up in a dictionary to see whether it is a keyword (if not, it is an identifier); then you push the tuple [sequence, keyword or identifier/class] on the stack.
I leave as an exercise the case of short keywords such as '(' that can also act as separators.
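A minimal sketch of such a lexer, under some assumptions: the only keywords are '(', ')', '/\' and '\/', identifiers consist of letters and digits, and whitespace is the only pure separator. The Token, TokenKind and Lexer names are made up for illustration, and the tokens are collected in a list rather than pushed on a stack. It also shows one possible way to treat short keywords as separators.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

enum TokenKind { Keyword, Identifier }

sealed class Token
{
    public readonly TokenKind Kind;
    public readonly string Text;
    public Token(TokenKind kind, string text) { Kind = kind; Text = text; }
    public override string ToString() { return Kind + ":" + Text; }
}

static class Lexer
{
    // Assumed keyword set; every keyword here also terminates an identifier.
    static readonly string[] Keywords = { "(", ")", "/\\", "\\/" };

    public static List<Token> Tokenize(string input)
    {
        var tokens = new List<Token>();
        var current = new StringBuilder(); // identifier being accumulated
        int i = 0;
        while (i < input.Length)
        {
            string keyword = MatchKeyword(input, i);
            if (keyword != null)
            {
                Flush(current, tokens);
                tokens.Add(new Token(TokenKind.Keyword, keyword));
                i += keyword.Length;
            }
            else if (char.IsWhiteSpace(input[i]))
            {
                Flush(current, tokens);
                i++;
            }
            else if (char.IsLetterOrDigit(input[i]))
            {
                current.Append(input[i]);
                i++;
            }
            else
            {
                throw new FormatException("Unexpected character at position " + i);
            }
        }
        Flush(current, tokens);
        return tokens;
    }

    // Returns the keyword that starts at the given position, or null if there is none.
    static string MatchKeyword(string input, int start)
    {
        foreach (string keyword in Keywords)
        {
            if (string.CompareOrdinal(input, start, keyword, 0, keyword.Length) == 0)
                return keyword;
        }
        return null;
    }

    // Emits the pending identifier, if any, and resets the buffer.
    static void Flush(StringBuilder current, List<Token> tokens)
    {
        if (current.Length == 0) return;
        tokens.Add(new Token(TokenKind.Identifier, current.ToString()));
        current.Length = 0;
    }
}

static class LexerDemo
{
    static void Main()
    {
        foreach (Token token in Lexer.Tokenize("(A /\\ B) \\/ C"))
            Console.WriteLine(token);
    }
}
```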
Parsing
Parsing follows a grammar. In your case the only rules to check are parentheses (called "comma" in the function names below), binary operations, and a simple identifier.
Formally:
expression::
'(' expression ')'
expression /\ expression
expression \/ expression
identifier
This can be written as a set of recursive functions.
First reverse your stack, then:
myParseExpression(stack, myC#ResultObject)
{
if (stack.top == keyword.'(')
then myParseOpenComma(all stack but top, myC#ResultObject)
if (stack.top == keyword.'/\')
then myParseBinaryAnd(stack, myC#ResultObject)
}
myParseOpenComma(stack, myC#ResultObject)
{
...
}
myParseBinaryAnd(stack, myC#ResultObject)
{
myNewRightPartOfExpr = new C#ResultObject
myParseExpression(stack.top, myNewRightPartOfExpr)
remove top of stack;
myNewLeftPartOfExpr = new C#ResultObject
myParseExpression(stack.top, myNewLeftPartOfExpr)
myC#ResultObject.add("AND", myNewRightPartOfExpr, myNewLeftPartOfExpr)
}
...
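The mutually recursive functions outlined above could be sketched in C# as follows. Several things here are assumptions rather than part of the answer: the grammar as written is left-recursive, so this sketch uses loops for the binary operators; '/\' is assumed to bind tighter than '\/'; the input is a plain list of token strings; and the result is a prefix-notation string instead of a result object. The ExprParser name is hypothetical.

```csharp
using System;
using System.Collections.Generic;

sealed class ExprParser
{
    private readonly IList<string> tokens;
    private int position;

    private ExprParser(IList<string> tokens) { this.tokens = tokens; }

    // Parses the whole token list and returns the tree in prefix form.
    public static string Parse(IList<string> tokens)
    {
        var parser = new ExprParser(tokens);
        string tree = parser.ParseOr();
        if (parser.position != tokens.Count)
            throw new FormatException("Unexpected token: " + tokens[parser.position]);
        return tree;
    }

    // or := and ( OR-operator and )*
    private string ParseOr()
    {
        string left = ParseAnd();
        while (Peek() == "\\/")
        {
            position++;
            left = "OR(" + left + "," + ParseAnd() + ")";
        }
        return left;
    }

    // and := primary ( AND-operator primary )*
    private string ParseAnd()
    {
        string left = ParsePrimary();
        while (Peek() == "/\\")
        {
            position++;
            left = "AND(" + left + "," + ParsePrimary() + ")";
        }
        return left;
    }

    // primary := '(' or ')' | identifier
    private string ParsePrimary()
    {
        string token = Peek();
        if (token == null)
            throw new FormatException("Unexpected end of input");
        position++;
        if (token == "(")
        {
            string inner = ParseOr();
            if (Peek() != ")")
                throw new FormatException("Expected ')'");
            position++;
            return inner;
        }
        if (token == ")" || token == "/\\" || token == "\\/")
            throw new FormatException("Unexpected token: " + token);
        return token; // a plain identifier
    }

    // Returns the current token without consuming it, or null at the end of input.
    private string Peek()
    {
        return position < tokens.Count ? tokens[position] : null;
    }
}

static class ParserDemo
{
    static void Main()
    {
        string[] tokens = { "(", "A", "/\\", "B", ")", "\\/", "C" };
        Console.WriteLine(ExprParser.Parse(tokens)); // prints OR(AND(A,B),C)
    }
}
```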
There are multiple functions that call each other recursively.
As an exercise, try to add negation.
Lexing is traditionally done by a lexer (generated by a tool such as lex).
Parsing is traditionally done by a parser (generated by a tool such as bison).
These tools let you write such functions in a form much closer to the formal grammar I gave above.
These topics are fundamental to program compilation.
Coding these things yourself will improve your skills a lot, because it is both hard and fundamental.
I have recently started learning programming and chose .NET with Visual Studio Express. I am trying to write a CSV parser as a learning experience, and it's giving me a lot more trouble than I expected. I am starting with the reader. One thing I am doing differently in my parser is that I am not using quotes. I am escaping commas with a backslash, backslashes with a backslash, and line breaks with a backslash. For example, if a comma is preceded by an even number of backslashes, it ends the field, and I halve any blocks of backslashes. If the number is odd, it is not the end of the field, and I still halve the blocks of backslashes. I'm not sure how robust this will be if I ever get it working, but I'm only learning at this point and I'm looking at it mostly as an exercise in manipulating data structures.
I have a question in reference to the code snippet at the bottom of this post and how to make it not so static and limiting and still compile and run for me.
The line of code that reads:
var contents = (String)fileContents;
I keep trying to make it more dynamic to increase flexibility and make it something like this:
var contents = (otherVariableThatCouldChangeTypeAtRuntime.GetType())fileContents;
Is there something I can do to get it to do this and still compile? Maybe something like Option Infer from VB.NET might help, except I can't find that.
Also, I have written this in VB.NET as well. It seems to me that VB.NET allows me a considerably more dynamic style than what I've posted below, such as not having to type var over and over again and not having to keep casting my index counting variable into an integer over and over again if I shut off Option Strict and Option Explicit as well as turn on Option Infer. For example, C# won't let me type something analogous to the following VB.NET code even though I know the methods and properties I will be calling at run-time will be there at run-time.
Dim contents As Object = returnObjectICantDetermineAtComplieTime()
contents.MethodIKnowWillBeThereAtRunTime()
Can I do these things in C#? Anyways, here's the code and thanks in advance for any responses.
public class Widget
{
public object ID { get; set; }
public object PartNumber { get; set; }
public object VendorID { get; set; }
public object TypeID { get; set; }
public object KeyMarkLoc { get; set; }
public Widget() { }
}
public object ReadFromFile(object source)
{
var fileContents = new FileService().GetFileContents(source);
object records = null;
if (fileContents == null)
return null;
var stringBuffer = "";
var contents = (String)fileContents;
while (contents.Length > 0 && contents != "\r\n")
{
for (object i = 0; (int)i < contents.Length; i=(int)i+1 )
{
object character = contents[(int)i];
if (!stringBuffer.EndsWith("\r\n"))
{
stringBuffer += character.ToString();
}
if (stringBuffer.EndsWith("\r\n"))
{
var bSlashes = getBackSlashes(stringBuffer.Substring(0, stringBuffer.Length - 4));
stringBuffer = stringBuffer.Substring(0, stringBuffer.Length - 4);
if ((int)bSlashes % 2 == 0)
{
break;
}
}
}
contents = contents.Substring(stringBuffer.Length+2);
records = records == null ? getIncrementedList(new List<object>(), getNextObject(getFields(stringBuffer))) : getIncrementedList((List<object>)records, getNextObject(getFields(stringBuffer)));
}
return records;
}
private Widget getNextRecord(object[] fields)
{
var personStudent = new Widget();
personStudent.ID = fields[0];
personStudent.PartNumber = fields[1];
personStudent.VendorID = fields[2];
personStudent.TypeID = fields[3];
personStudent.GridPath = fields[4];
return personStudent;
}
private object[] getFields(object buffer)
{
var fields = new object[5];
var intFieldCount = 0;
var fieldVal = "";
var blocks = buffer.ToString().Split(',');
foreach (var block in blocks)
{
var bSlashes = getBackSlashes(block);
var intRemoveCount = (int)bSlashes / 2;
if ((int)bSlashes % 2 == 0) // Delimiter
{
fieldVal += block.Substring(0, block.Length - intRemoveCount);
fields[intFieldCount] += fieldVal;
intFieldCount++;
fieldVal = "";
}
else // Part of Field
{
fieldVal += block.Substring(0, block.Length - intRemoveCount - 1) + ",";
}
}
return fields;
}
private object getBackSlashes(object block)
{
object bSlashes = block.ToString().Length == 0 ? new int?(0) : null;
for (object i = block.ToString().Length - 1; (int)i>-1; i=(int)i-1)
{
if (block.ToString()[(int)i] != '\\') return bSlashes = bSlashes == null ? 0 : bSlashes;
bSlashes = bSlashes == null ? 1 : (int)bSlashes + 1;
}
return bSlashes;
}
}
Here is the web service code.
[WebMethod]
public object GetFileContents(object source)
{
return File.ReadAllText(source.ToString());
}
Dim contents As Object = returnObjectICantDetermineAtComplieTime()
contents.MethodIKnowWillBeThereAtRunTime()
You can do this in C# with the dynamic type (available since C# 4.0).
For more information, see: http://msdn.microsoft.com/en-us/library/dd264736.aspx
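As a small sketch of how the VB.NET snippet might translate, assuming C# 4.0 or later: the Duck class and its Quack method are placeholders, not part of the question's code. Member lookup on a dynamic value is deferred to run time, so a missing member only surfaces as a RuntimeBinderException when the call executes.

```csharp
using System;
using Microsoft.CSharp.RuntimeBinder;

// Placeholder type standing in for whatever object arrives at run time.
public class Duck
{
    public string Quack() { return "Quack"; }
}

public static class DynamicDemo
{
    // Stand-in for a method whose concrete return type is unknown at compile time.
    public static object ReturnObjectICantDetermineAtCompileTime()
    {
        return new Duck();
    }

    public static string CallLateBound()
    {
        dynamic contents = ReturnObjectICantDetermineAtCompileTime();
        return contents.Quack(); // the call is resolved at run time
    }

    public static bool MissingMemberThrows()
    {
        dynamic contents = ReturnObjectICantDetermineAtCompileTime();
        try
        {
            contents.MethodThatDoesNotExist(); // compiles, but fails when executed
            return false;
        }
        catch (RuntimeBinderException)
        {
            return true;
        }
    }

    static void Main()
    {
        Console.WriteLine(CallLateBound());
        Console.WriteLine(MissingMemberThrows());
    }
}
```

The trade-off is the same as with Option Strict Off in VB.NET: the compiler no longer checks these calls, so typos in member names are only discovered at run time.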
In Java I can pass a Scanner a string and then I can do handy things like, scanner.hasNext() or scanner.nextInt(), scanner.nextDouble() etc.
This allows some pretty clean code for parsing a string that contains rows of numbers.
How is this done in C# land?
If you had a string that say had:
"0 0 1 22 39 0 0 1 2 33 33"
In Java I would pass that to a scanner and do a
while(scanner.hasNext())
myArray[i++] = scanner.nextInt();
Or something very similar. What is the C#-ish way to do this?
I'm going to add this as a separate answer because it's quite distinct from the answer I already gave. Here's how you could start creating your own Scanner class:
class Scanner : System.IO.StringReader
{
string currentWord;
public Scanner(string source) : base(source)
{
readNextWord();
}
private void readNextWord()
{
System.Text.StringBuilder sb = new System.Text.StringBuilder();
char nextChar;
int next;
do
{
next = this.Read();
if (next < 0)
break;
nextChar = (char)next;
if (char.IsWhiteSpace(nextChar))
break;
sb.Append(nextChar);
} while (true);
while((this.Peek() >= 0) && (char.IsWhiteSpace((char)this.Peek())))
this.Read();
if (sb.Length > 0)
currentWord = sb.ToString();
else
currentWord = null;
}
public bool hasNextInt()
{
if (currentWord == null)
return false;
int dummy;
return int.TryParse(currentWord, out dummy);
}
public int nextInt()
{
try
{
return int.Parse(currentWord);
}
finally
{
readNextWord();
}
}
public bool hasNextDouble()
{
if (currentWord == null)
return false;
double dummy;
return double.TryParse(currentWord, out dummy);
}
public double nextDouble()
{
try
{
return double.Parse(currentWord);
}
finally
{
readNextWord();
}
}
public bool hasNext()
{
return currentWord != null;
}
}
Using part of the answers already given, I've created a StringReader that can extract Enum and any data type that implements IConvertible.
Usage
using(var reader = new PacketReader("1 23 ErrorOk StringValue 15.22"))
{
var index = reader.ReadNext<int>();
var count = reader.ReadNext<int>();
var result = reader.ReadNext<ErrorEnum>();
var data = reader.ReadNext<string>();
var responseTime = reader.ReadNext<double>();
}
Implementation
public class PacketReader : StringReader
{
public PacketReader(string s)
: base(s)
{
}
public T ReadNext<T>() where T : IConvertible
{
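var sb = new StringBuilder();
// Skip whitespace left over from the previous token, so string values are not returned with a leading space.
while (Peek() >= 0 && char.IsWhiteSpace((char)Peek()))
Read();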
do
{
var current = Read();
if (current < 0)
break;
sb.Append((char)current);
var next = (char)Peek();
if (char.IsWhiteSpace(next))
break;
} while (true);
var value = sb.ToString();
var type = typeof(T);
if (type.IsEnum)
return (T)Enum.Parse(type, value);
return (T)((IConvertible)value).ToType(typeof(T), System.Globalization.CultureInfo.CurrentCulture);
}
}
While this isn't the exact same fundamental concept, what you're looking for can be done with this lambda expression:
string foo = "0 0 1 22 39 0 0 1 2 33 33";
int[] data = foo.Split(' ').Select(p => int.Parse(p)).ToArray();
What this does is first Split the string, using a space as a delimiter. The Select function then allows you to specify an alias for a given member in the array (which I referred to as 'p' in this example), then perform an operation on that member to give a final result. The ToArray() call then turns this abstract enumerable class into a concrete array.
So in the end, this splits the string, then converts each element into an int and populates an int[] with the resulting values.
To my knowledge, there are no built-in classes in the framework for doing this. You would have to roll your own.
That would not be too hard. A nice C# version might implement IEnumerable so you could say:
var scanner = new Scanner<int>(yourString);
foreach(int n in scanner)
; // your code
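One possible shape for such a class, purely as a sketch: Scanner<T> here is a hypothetical type, not something from the framework, and it assumes whitespace-separated tokens that Convert.ChangeType can turn into T using the invariant culture.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical generic scanner that exposes the tokens of a string as a sequence of T.
class Scanner<T> : IEnumerable<T> where T : IConvertible
{
    private readonly string source;

    public Scanner(string source) { this.source = source; }

    public IEnumerator<T> GetEnumerator()
    {
        // A null separator array splits on any whitespace; empty entries
        // caused by repeated separators are dropped.
        string[] words = source.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
        foreach (string word in words)
        {
            // Throws FormatException if a token cannot be converted to T.
            yield return (T)Convert.ChangeType(word, typeof(T), CultureInfo.InvariantCulture);
        }
    }

    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
}

static class ScannerDemo
{
    static void Main()
    {
        var scanner = new Scanner<int>("0 0 1 22 39 0 0 1 2 33 33");
        foreach (int n in scanner)
            Console.Write("{0} ", n);
    }
}
```

Because the conversion happens lazily inside the iterator, an invalid token only raises an exception when the enumeration reaches it.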
To get as close as possible to your syntax, this'll work if you're only interested in one type ("int" in the example):
static void Main(string[] args)
{
if (args.Length == 0) { args = new string[] { "3", "43", "6" }; }
IEnumerator<int> scanner = (from arg in args select int.Parse(arg)).GetEnumerator();
while (scanner.MoveNext())
{
Console.Write("{0} ", scanner.Current);
}
}
Here's an even more whiz-bang version that allows you to access any type that is supported by string's IConvertible implementation:
static void Main(string[] args)
{
if (args.Length == 0) { args = new string[] { "3", "43", "6" }; }
var scanner = args.Select<string, Func<Type, Object>>((string s) => {
return (Type t) =>
((IConvertible)s).ToType(t, System.Globalization.CultureInfo.InvariantCulture);
}).GetEnumerator();
while (scanner.MoveNext())
{
Console.Write("{0} ", scanner.Current(typeof(int)));
}
}
Just pass a different type to the "typeof" operator in the while loop to choose the type.
Both versions require C# 3.0 and .NET Framework 3.5 or later, since they rely on LINQ and lambda expressions.
You could use LINQ to accomplish this like so:
string text = "0 0 1 22 39 0 0 1 2 33 33";
text.Where(i => char.IsNumber(i)).Write(); // do something useful here...
I would do this in one of a couple ways depending on whether 1) you are using the latest .NET framework with LINQ support and 2) you know the values are valid integers. Here's a function to demonstrate both:
int[] ParseIntArray(string input, bool validateRequired)
{
if (validateRequired)
{
string[] split = input.Split();
List<int> result = new List<int>(split.Length);
int parsed;
for (int inputIdx = 0; inputIdx < split.Length; inputIdx++)
{
if (int.TryParse(split[inputIdx], out parsed))
result.Add(parsed);
}
return result.ToArray();
}
else
return (from i in input.Split()
select int.Parse(i)).ToArray();
}
Based on comments in other answer(s), I assume you need the validation. After reading those comments, I think the closest thing you'll get is int.TryParse and double.TryParse, which is kind of a combination of hasNextInt and nextInt (or a combination of hasNextDouble and nextDouble).