Translating EBNF into Irony - C#

I am using Irony to create a parser for a scripting language, but I've come across a little problem: how do I translate an EBNF expression like this into Irony?
'(' [ Ident { ',' Ident } ] ')'
I already tried some tricks like
Chunk.Rule = (Ident | Ident + "," + Chunk);
CallArgs.Rule = '(' + Chunk + ')' | '(' + ')';
But it's ugly and I'm not even sure it works the way it should (I haven't tried it yet...). Does anyone have any suggestions?
EDIT:
I found these helper methods (MakeStarList, MakePlusList) but couldn't figure out how to use them, owing to Irony's complete lack of documentation... Has anyone any clue?

// Declare the non-terminals
var Ident = new NonTerminal("Ident");
var IdentList = new NonTerminal("IdentList");
var CallArgs = new NonTerminal("CallArgs");
// Rules
IdentList.Rule = MakePlusRule(IdentList, ToTerm(","), Ident);
CallArgs.Rule = ToTerm("(") + IdentList + ")";
Ident.Rule = ...; // specify whatever Ident is (I assume you mean an identifier of some kind).
You can use the MakePlusRule helper method to define a one-or-more occurrence of some term. Under the hood, MakePlusRule just expresses your term as the standard recursive list idiom:
Ident | IdentList + "," + Ident
It also marks the non-terminal as representing a list, which tells the parser to unfold the nested list-tree into a convenient flat list of child nodes.
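Note that your EBNF wraps the list in [ ... ], so it may be empty. If that's the intent, the zero-or-more variant MakeStarRule (the star helper you mention in your edit) is the closer match. A minimal sketch, reusing the declarations above:
IdentList.Rule = MakeStarRule(IdentList, ToTerm(","), Ident);
CallArgs.Rule = ToTerm("(") + IdentList + ")";
With the star rule the empty case is handled for you, so the explicit '(' + ')' alternative from your attempt becomes unnecessary.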

Weird regex behavior in tokenization

I am using the following regex to tokenize:
reg = new Regex("([ \\t{}%$^&*():;_–`,\\-\\d!\"?\n])");
The regex is supposed to filter everything out later; however, the input string format that I am having a problem with is of the following form:
; "string1"; "string2"; "string...n";
As far as I know, the result for the string ; "social life"; "city life"; "real life" should be like the following:
; White " social White life " ; White " city White life " ; White " real White life "
However there is a problem such that, I get the output in the following form
; empty White empty " social White life " empty ; empty White empty " city White life " empty ; empty White empty " real White life " empty
White: means whitespace,
empty: means an empty entry in the split array.
My code for the split is as follows:
string[] ret = reg.Split(input);
for (int i = 0; i < ret.Length; i++)
{
if (ret[i] == "")
Response.Write("empty<br>");
else
if (ret[i] == " ")
Response.Write("White<br>");
else
Response.Write(ret[i] + "<br>");
}
Why do I get these empty entries? In particular, when there is ; followed by a space followed by ", the result looks like the following:
; empty White empty "
Can I get an explanation of why the call adds these empty entries, and how to remove them without an additional O(n) pass or another data structure besides ret?
In my experience, splitting at regex matches is almost always not the best idea; you'll get much better results through plain matching.
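As for where the empty entries come from: Regex.Split returns the substrings between matches (plus any captured delimiters), and two adjacent delimiters have an empty string between them, which Split dutifully returns. A minimal illustration:
string[] parts = Regex.Split("a;;b", "(;)");
// parts is: "a", ";", "", ";", "b" -- note the empty entry between the adjacent delimiters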
Regexes are, however, very well suited for tokenization, because they let you implement a state machine really easily. Just take a look at this:
\G(?:
(?<string> "(?>[^"\\]+|\\.)*" )
| (?<separator> ; )
| (?<whitespace> \s+ )
| (?<invalid> . )
)
Use this with RegexOptions.IgnorePatternWhitespace, of course.
Here, each match will have the following properties:
It will start at the end of the previous match, so there will be no unmatched text
It will contain exactly one matching group
The name of the group tells you the token type
You can ignore the whitespace group, and you should raise an error if you ever encounter a matching invalid group.
The string group will match an entire quoted string, and it can handle escapes such as \" inside the string.
The invalid group should always be last in the pattern. You may add rules for other token types.
Some example code:
var regex = new Regex(#"
\G(?:
(?<string> ""(?>[^""\\]+|\\.)*"" )
| (?<separator> ; )
| (?<whitespace> \s+ )
| (?<invalid> . )
)
", RegexOptions.IgnorePatternWhitespace);
var input = "; \"social life\"; \"city life\"; \"real life\"";
var groupNames = regex.GetGroupNames().Skip(1).ToList();
foreach (Match match in regex.Matches(input))
{
var groupName = groupNames.Single(name => match.Groups[name].Success);
var group = match.Groups[groupName];
Console.WriteLine("{0}: {1}", groupName, group.Value);
}
This produces the following:
separator: ;
whitespace:
string: "social life"
separator: ;
whitespace:
string: "city life"
separator: ;
whitespace:
string: "real life"
See how much easier it is to deal with these results rather than using split?
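If you only want the meaningful tokens, you can act on the group name right in the loop, following the rules above (a sketch based on the example code):
foreach (Match match in regex.Matches(input))
{
    var groupName = groupNames.Single(name => match.Groups[name].Success);
    if (groupName == "whitespace")
        continue; // ignorable, as noted above
    if (groupName == "invalid")
        throw new FormatException("Unexpected input: " + match.Value);
    Console.WriteLine("{0}: {1}", groupName, match.Groups[groupName].Value);
}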

Mismatched input when lexing and parsing with modes

I'm having an ANTLR4 problem with mismatched input but can't solve it. I've found a lot of questions dealing with that, and they usually revolve around the lexer matching the input to some other token than expected, but I don't see that happening in my case.
I've got this lexer grammar:
FieldStart : '[' Definition ']' -> pushMode(INFIELD) ;
Definition : 'Element';
mode INFIELD;
FieldEnd : '[end]' -> popMode ;
ContentValue : ~[[]* ;
Which then runs on the following parser:
field : FieldStart ContentValue FieldEnd #Field_Found;
I simplified it to zoom in to the problem, but here's the point where I can't get any further.
I'm running that on the following input:
[Element]Va-lu*e[end]
and I get this output:
Type : 001 | FieldStart | [Element]
Type : 004 | ContentValue | Va-lu*e
Type : 003 | FieldEnd | [end]
Type : -001 | EOF | <EOF>
([] [Element] Va-lu*e [end])
I generated the output with C#, doing the following (shortened):
string tokens = "";
foreach (IToken CurrToken in TokenStream.GetTokens())
{
if (CurrToken.Type == -1)
{
tokens += "Type : " + CurrToken.Type.ToString("000") + " | " + "EOF" + " | " + CurrToken.Text + "\n";
}
else
{
tokens += "Type : " + CurrToken.Type.ToString("000") + " | " + Lexer.RuleNames[CurrToken.Type - 1] + " | " + CurrToken.Text + "\n";
}
}
tokens += "\n\n" + ParseTree.ToStringTree();
Upon parsing this via
IParseTree ParseTree = Parser.field();
I am presented with this error:
"mismatched input 'Va-lu*e' expecting ContentValue"
I just can't find the error; can you help me here?
I assume it has something to do with the lexer mode, but from what I've read, the parser doesn't care (or know) about the modes.
Thanks!
Modes are not available in a combined grammar. Split your grammar and it should work.
Also, always check the error messages:
error(120): ../Field.g4:14:5: lexical modes are only allowed in lexer grammars
I think I have now figured out how to solve my problem: there seems to be a required configuration when working with a split lexer/parser grammar structure AND using lexer modes in Visual Studio (tested 2012 and 2013) with the ANTLR4 NuGet release:
I had to include
options { tokenVocab = GRAMMAR_NAME_Lexer; }
in my parser grammar at the beginning.
Otherwise, the lexer created the tokens and the modes as expected, but the parser would not recognize lexer tokens that are in any mode other than the default mode.
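For reference, a minimal split version of the grammar from the question might look like this (file names assumed):
// FieldLexer.g4
lexer grammar FieldLexer;
FieldStart : '[' Definition ']' -> pushMode(INFIELD) ;
Definition : 'Element' ;
mode INFIELD;
FieldEnd : '[end]' -> popMode ;
ContentValue : ~[[]* ;
// FieldParser.g4
parser grammar FieldParser;
options { tokenVocab = FieldLexer; }
field : FieldStart ContentValue FieldEnd #Field_Found ;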
I have also experienced that the popMode lexer command sometimes causes my TokenStream to throw an invalid-state exception; I could solve that by using mode(DEFAULT_MODE) instead of popMode.
I hope this helps somebody, but I'd still like it if someone who understands ANTLR could offer some additional clarification, since I just "solved" it by toying around until it worked.

Reformat SQLGeography polygons to JSON

I am building a web service that serves geographic boundary data in JSON format.
The geographic data is stored in a SQL Server 2008 R2 database using the geography type in a table. I use the [ColumnName].ToString() method to return the polygon data as text.
Example output:
POLYGON ((-6.1646509904325884 56.435153006374627, ... -6.1606079906751 56.4338050060666))
MULTIPOLYGON (((-6.1646509904325884 56.435153006374627 0 0, ... -6.1606079906751 56.4338050060666 0 0)))
Geographic definitions can take the form of either an array of lat/long pairs defining a polygon or, in the case of multiple definitions, an array of polygons (multipolygon).
I have the following regex that converts the output to JSON objects contained in multi-dimensional arrays depending on the output.
Regex latlngMatch = new Regex(#"(-?[0-9]{1}\.\d*)\s(\d{2}.\d*)(?:\s0\s0,?)?", RegexOptions.Compiled);
private string ConvertPolysToJson(string polysIn)
{
return this.latlngMatch.Replace(polysIn.Remove(0, polysIn.IndexOf("(")) // remove POLYGON or MULTIPOLYGON
.Replace("(", "[") // convert to JSON array syntax
.Replace(")", "]"), // same as above
"{lng:$1,lat:$2},"); // reformat lat/lng pairs to JSON objects
}
This is actually working pretty well and converts the DB output to JSON on the fly in response to an operation call.
However, I am no regex master, and the calls to String.Replace() also seem inefficient to me.
Does anyone have any suggestions/comments about performance of this?
Just to close this off, I will answer my own question with the solution I'm using.
This method takes the output from a ToString() call on an MS SQL geography type.
If the returned string contains polygon data constructed from GPS points, this method will parse and reformat it into a JSON string.
public static class PolyConverter
{
static Regex latlngMatch = new Regex(@"(-?\d{1,2}\.\dE-\d+|-?\d{1,2}\.?\d*)\s(-?\d{1,2}\.\dE-\d+|-?\d{1,2}\.?\d*)\s?0?\s?0?,?", RegexOptions.Compiled);
static Regex reformat = new Regex(@"\[,", RegexOptions.Compiled);
public static string ConvertPolysToJson(string polysIn)
{
var formatted = reformat.Replace(
latlngMatch.Replace(
polysIn.Remove(0, polysIn.IndexOf("(")), ",{lng:$1,lat:$2}")
.Replace("(", "[")
.Replace(")", "]"), "[");
if (polysIn.Contains("MULTIPOLYGON"))
{
formatted = formatted.Replace("[[", "[")
.Replace("]]", "]")
.Replace("[[[", "[[")
.Replace("]]]", "]]");
}
return formatted;
}
}
This is specific to my application, but it may be useful to somebody, and maybe someone can even create a better implementation.
To convert from WKT to GeoJSON you can use NetTopologySuite from NuGet. Add NetTopologySuite and NetTopologySuite.IO.GeoJSON.
var wkt = "POLYGON ((10 20, 30 40, 50 60, 10 20))";
var wktReader = new NetTopologySuite.IO.WKTReader();
var geom = wktReader.Read(wkt);
var feature = new NetTopologySuite.Features.Feature(geom, new NetTopologySuite.Features.AttributesTable());
var featureCollection = new NetTopologySuite.Features.FeatureCollection();
featureCollection.Add(feature);
var sb = new StringBuilder();
var serializer = new NetTopologySuite.IO.GeoJsonSerializer();
serializer.Formatting = Newtonsoft.Json.Formatting.Indented;
using (var sw = new StringWriter(sb))
{
serializer.Serialize(sw, featureCollection);
}
var result = sb.ToString();
Output:
{
"features": [
{
"type": "Feature",
"geometry": {
"type": "Polygon",
"coordinates": [
[
[
10.0,
20.0
],
[
30.0,
40.0
],
[
50.0,
60.0
],
[
10.0,
20.0
]
]
]
},
"properties": {}
}
],
"type": "FeatureCollection"
}
To answer your question about efficiency: for this particular case, I don't think Replace vs. regex is going to make a big difference. All we are really changing is some parentheses and commas.

Personally, I prefer to do this kind of thing in TSQL for web applications, because I can offload the computational work onto the SQL Server instead of the web server. In my case I generate a lot of data for a map, and I don't want to bog down the web server with lots of data conversions. Additionally, I usually put more horsepower on the SQL Server than on a web server, so even if there is some difference between the two approaches, the less efficient one is at least being handled by a server with far more resources.

In general, I want my web server handling connections to clients and my SQL Server handling data computations. This also keeps my web server scripts clean and efficient. So my suggestion is as follows:
Write a scalar TSQL function in your database. This one uses the SQL REPLACE function and is somewhat brute force, but it performs really well. The function can be used directly in a SELECT statement, or to create computed columns in a table if you really want to simplify your web server code. Currently this example only supports POINT, POLYGON and MULTIPOLYGON, and it produces the "geometry" JSON element of the GeoJSON format.
GetGeoJSON Scalar Function
CREATE FUNCTION GetGeoJSON (@geo geography) /*this is your geography shape*/
RETURNS varchar(max)
WITH SCHEMABINDING /*this tells SQL SERVER that it is deterministic (helpful if you use it in a calculated column)*/
AS
BEGIN
/* Declare the return variable here*/
DECLARE @Result varchar(max)
/*Build JSON "geometry" element for geoJSON*/
SELECT @Result = '"geometry":{' +
CASE @geo.STGeometryType()
WHEN 'POINT' THEN
'"type": "Point","coordinates":' +
REPLACE(REPLACE(REPLACE(REPLACE(@geo.ToString(),'POINT ',''),'(','['),')',']'),' ',',')
WHEN 'POLYGON' THEN
'"type": "Polygon","coordinates":' +
'[' + REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(@geo.ToString(),'POLYGON ',''),'(','['),')',']'),'], ',']],['),', ','],['),' ',',') + ']'
WHEN 'MULTIPOLYGON' THEN
'"type": "MultiPolygon","coordinates":' +
'[' + REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(@geo.ToString(),'MULTIPOLYGON ',''),'(','['),')',']'),'], ',']],['),', ','],['),' ',',') + ']'
ELSE NULL
END
+'}'
/* Return the result of the function*/
RETURN @Result
END
Next, use your GetGeoJSON function in your SELECT statement, for example:
SELECT dbo.GetGeoJSON([COLUMN]) as Geometry From [TABLE]
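For a point column, for example, the function produces output of this shape (coordinates illustrative):
"geometry":{"type": "Point","coordinates":[-6.16,56.43]}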
I hope this provides some insight and helps others looking for a methodology, good luck!
The method outlined in James's answer works great, but I recently found an error when converting WKT where the longitude had a value over 99.
I changed the regular expression:
#"(-?\d{1,2}\.\dE-\d+|-?\d{1,3}\.?\d*)\s(-?\d{1,2}\.\dE-\d+|-?\d{1,2}\.?\d*)\s?0?\s?0?,?"
Notice the second "2" has been changed to a "3" to allow longitude to go up to 180.
Strings are immutable in .NET, so when you replace something, you create an edited copy of the previous string. This is not so critical for performance as it is for memory usage.
Look at JSON.NET, or use a StringBuilder to generate the JSON properly. For example (lng/lat standing in for your parsed values):
StringBuilder sb = new StringBuilder();
sb.AppendFormat("{{lng:{0},lat:{1}}},", lng, lat); // doubled braces emit literal { } in the output
A utility function that can be used for formatting spatial cells as GeoJSON is shown below.
DROP FUNCTION IF EXISTS dbo.geometry2json
GO
CREATE FUNCTION dbo.geometry2json( @geo geometry)
RETURNS nvarchar(MAX) AS
BEGIN
RETURN (
'{' +
(CASE @geo.STGeometryType()
WHEN 'POINT' THEN
'"type": "Point","coordinates":' +
REPLACE(REPLACE(REPLACE(REPLACE(@geo.ToString(),'POINT ',''),'(','['),')',']'),' ',',')
WHEN 'POLYGON' THEN
'"type": "Polygon","coordinates":' +
'[' + REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(@geo.ToString(),'POLYGON ',''),'(','['),')',']'),'], ',']],['),', ','],['),' ',',') + ']'
WHEN 'MULTIPOLYGON' THEN
'"type": "MultiPolygon","coordinates":' +
'[' + REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(@geo.ToString(),'MULTIPOLYGON ',''),'(','['),')',']'),'], ',']],['),', ','],['),' ',',') + ']'
WHEN 'MULTIPOINT' THEN
'"type": "MultiPoint","coordinates":' +
'[' + REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(@geo.ToString(),'MULTIPOINT ',''),'(','['),')',']'),'], ',']],['),', ','],['),' ',',') + ']'
WHEN 'LINESTRING' THEN
'"type": "LineString","coordinates":' +
'[' + REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(@geo.ToString(),'LINESTRING ',''),'(','['),')',']'),'], ',']],['),', ','],['),' ',',') + ']'
ELSE NULL
END)
+'}')
END

C# Email Address validation

I just want to clarify one thing. Per client request, we have to create a regular expression that allows an apostrophe in the email address.
My question: according to the RFC standard, can an email address contain an apostrophe? If so, how do we recreate the regular expression to allow apostrophes?
The regular expression below implements the official RFC 2822 standard for email addresses. Using this regular expression in actual applications is NOT recommended; it is shown to illustrate that with regular expressions there's always a trade-off between what's exact and what's practical.
(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])
You could use the simplified one:
[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?
And yes, the apostrophe is allowed in an email address, as long as it is not in the domain name.
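A quick sanity check with the simplified pattern (the ^/$ anchors and IgnoreCase are additions here, for a whole-string test):
var emailRx = new Regex(
    @"^[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?$",
    RegexOptions.IgnoreCase);
Console.WriteLine(emailRx.IsMatch("o'brien@example.com")); // True: apostrophe in the local part
Console.WriteLine(emailRx.IsMatch("foo@o'brien.com"));     // False: apostrophe in the domain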
Here's the validation attribute I wrote. It validates pretty much every "raw" email address, that is, those of the form local-part@domain. It doesn't support any of the other, more... creative constructs that the RFCs allow (this list is not comprehensive by any means):
comments (e.g., jsmith@whizbang.com (work))
quoted strings (escaped text, to allow characters not allowed in an atom)
domain literals (e.g. foo@[123.45.67.012])
bang-paths (aka source routing)
angle addresses (e.g. John Smith <jsmith@whizbang.com>)
folding whitespace
double-byte characters in either local-part or domain (7-bit ASCII only).
etc.
It should accept almost any email address that can be expressed thusly
foo.bar@bazbat.com
without requiring the use of quotes ("), angle brackets ('<>') or square brackets ([]).
No attempt is made to validate that the rightmost DNS label in the domain is a valid TLD (top-level domain). That is because the list of TLDs is far larger now than the "big 6" (.com, .edu, .gov, .mil, .net, .org) plus 2-letter ISO country codes. ICANN actually updates the TLD list daily, though I suspect that the list doesn't actually change daily. Further, ICANN just approved a big expansion of the generic TLD namespace. And some email addresses don't have what you'd recognize as a TLD (did you know that postmaster@. is theoretically valid and mailable? Mail to that address should get delivered to the postmaster of the DNS root zone.)
Extending the regular expression to support domain literals shouldn't be too difficult.
Here you go. Use it in good health:
using System;
using System.ComponentModel.DataAnnotations;
using System.Text.RegularExpressions;
namespace ValidationHelpers
{
[AttributeUsage( AttributeTargets.Property | AttributeTargets.Field , AllowMultiple = false )]
sealed public class EmailAddressValidationAttribute : ValidationAttribute
{
static EmailAddressValidationAttribute()
{
RxEmailAddress = CreateEmailAddressRegex();
return;
}
private static Regex CreateEmailAddressRegex()
{
// references: RFC 5321, RFC 5322, RFC 1035, plus errata.
string atom = #"([A-Z0-9!#$%&'*+\-/=?^_`{|}~]+)" ;
string dot = #"(\.)" ;
string dotAtom = "(" + atom + "(" + dot + atom + ")*" + ")" ;
string dnsLabel = "([A-Z]([A-Z0-9-]{0,61}[A-Z0-9])?)" ;
string fqdn = "(" + dnsLabel + "(" + dot + dnsLabel + ")*" + ")" ;
string localPart = "(?<localpart>" + dotAtom + ")" ;
string domain = "(?<domain>" + fqdn + ")" ;
string emailAddrPattern = "^" + localPart + "@" + domain + "$" ;
Regex instance = new Regex( emailAddrPattern , RegexOptions.Singleline | RegexOptions.IgnoreCase );
return instance;
}
private static Regex RxEmailAddress;
public override bool IsValid( object value )
{
string s = Convert.ToString( value ) ;
bool fValid = string.IsNullOrEmpty( s ) ;
// we'll take an empty field as valid and leave it to the [Required] attribute to enforce that it's been supplied.
if ( !fValid )
{
Match m = RxEmailAddress.Match( s ) ;
if ( m.Success )
{
string emailAddr = m.Value ;
string localPart = m.Groups[ "localpart" ].Value ;
string domain = m.Groups[ "domain" ].Value ;
bool fLocalPartLengthValid = localPart.Length >= 1 && localPart.Length <= 64 ;
bool fDomainLengthValid = domain.Length >= 1 && domain.Length <= 255 ;
bool fEmailAddrLengthValid = emailAddr.Length >= 1 && emailAddr.Length <= 256 ; // might be 254 in practice -- the RFCs are a little fuzzy here.
fValid = fLocalPartLengthValid && fDomainLengthValid && fEmailAddrLengthValid ;
}
}
return fValid ;
}
}
}
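For illustration, here's how the attribute might be applied to a model class (the class and property are hypothetical):
public class Contact
{
    [Required] // enforces presence; the validator above deliberately treats empty as valid
    [EmailAddressValidation]
    public string Email { get; set; }
}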
Cheers!

Using ANTLR 3.3?

I'm trying to get started with ANTLR and C# but I'm finding it extraordinarily difficult due to the lack of documentation/tutorials. I've found a couple half-hearted tutorials for older versions, but it seems there have been some major changes to the API since.
Can anyone give me a simple example of how to create a grammar and use it in a short program?
I've finally managed to get my grammar file compiling into a lexer and parser, and I can get those compiled and running in Visual Studio (after having to recompile the ANTLR source because the C# binaries seem to be out of date too! -- not to mention the source doesn't compile without some fixes), but I still have no idea what to do with my parser/lexer classes. Supposedly it can produce an AST given some input...and then I should be able to do something fancy with that.
Let's say you want to parse simple expressions consisting of the following tokens:
- subtraction (also unary);
+ addition;
* multiplication;
/ division;
(...) grouping (sub) expressions;
integer and decimal numbers.
An ANTLR grammar could look like this:
grammar Expression;
options {
language=CSharp2;
}
parse
: exp EOF
;
exp
: addExp
;
addExp
: mulExp (('+' | '-') mulExp)*
;
mulExp
: unaryExp (('*' | '/') unaryExp)*
;
unaryExp
: '-' atom
| atom
;
atom
: Number
| '(' exp ')'
;
Number
: ('0'..'9')+ ('.' ('0'..'9')+)?
;
Now, to create a proper AST, you add output=AST; to your options { ... } section, and you mix some "tree operators" into your grammar, defining which tokens should become the root of a (sub)tree. There are two ways to do this:
add ^ and ! after your tokens. The ^ causes the token to become a root and the ! excludes the token from the AST;
by using "rewrite rules": ... -> ^(Root Child Child ...).
Take the rule foo for example:
foo
: TokenA TokenB TokenC TokenD
;
and let's say you want TokenB to become the root and TokenA and TokenC to become its children, and you want to exclude TokenD from the tree. Here's how to do that using option 1:
foo
: TokenA TokenB^ TokenC TokenD!
;
and here's how to do that using option 2:
foo
: TokenA TokenB TokenC TokenD -> ^(TokenB TokenA TokenC)
;
So, here's the grammar with the tree operators in it:
grammar Expression;
options {
language=CSharp2;
output=AST;
}
tokens {
ROOT;
UNARY_MIN;
}
@parser::namespace { Demo.Antlr }
@lexer::namespace { Demo.Antlr }
parse
: exp EOF -> ^(ROOT exp)
;
exp
: addExp
;
addExp
: mulExp (('+' | '-')^ mulExp)*
;
mulExp
: unaryExp (('*' | '/')^ unaryExp)*
;
unaryExp
: '-' atom -> ^(UNARY_MIN atom)
| atom
;
atom
: Number
| '(' exp ')' -> exp
;
Number
: ('0'..'9')+ ('.' ('0'..'9')+)?
;
Space
: (' ' | '\t' | '\r' | '\n'){Skip();}
;
I also added a Space rule to ignore any whitespace in the source file, and added some extra tokens and namespaces for the lexer and parser. Note that the order is important (options { ... } first, then tokens { ... }, and finally the @...{}-namespace declarations).
That's it.
Now generate a lexer and parser from your grammar file:
java -cp antlr-3.2.jar org.antlr.Tool Expression.g
and put the .cs files in your project together with the C# runtime DLLs.
You can test it using the following class:
using System;
using Antlr.Runtime;
using Antlr.Runtime.Tree;
using Antlr.StringTemplate;
namespace Demo.Antlr
{
class MainClass
{
public static void Preorder(ITree Tree, int Depth)
{
if(Tree == null)
{
return;
}
for (int i = 0; i < Depth; i++)
{
Console.Write(" ");
}
Console.WriteLine(Tree);
Preorder(Tree.GetChild(0), Depth + 1);
Preorder(Tree.GetChild(1), Depth + 1);
}
public static void Main (string[] args)
{
ANTLRStringStream Input = new ANTLRStringStream("(12.5 + 56 / -7) * 0.5");
ExpressionLexer Lexer = new ExpressionLexer(Input);
CommonTokenStream Tokens = new CommonTokenStream(Lexer);
ExpressionParser Parser = new ExpressionParser(Tokens);
ExpressionParser.parse_return ParseReturn = Parser.parse();
CommonTree Tree = (CommonTree)ParseReturn.Tree;
Preorder(Tree, 0);
}
}
}
which produces the following output:
ROOT
*
+
12.5
/
56
UNARY_MIN
7
0.5
which corresponds to the following AST:
(diagram created using graph.gafol.net)
Note that ANTLR 3.3 has just been released and the CSharp target is "in beta". That's why I used ANTLR 3.2 in my example.
In the case of rather simple languages (like my example above), you could also evaluate the result on the fly, without creating an AST. You can do that by embedding plain C# code inside your grammar file and letting your parser rules return a specific value.
Here's an example:
grammar Expression;
options {
language=CSharp2;
}
@parser::namespace { Demo.Antlr }
@lexer::namespace { Demo.Antlr }
parse returns [double value]
: exp EOF {$value = $exp.value;}
;
exp returns [double value]
: addExp {$value = $addExp.value;}
;
addExp returns [double value]
: a=mulExp {$value = $a.value;}
( '+' b=mulExp {$value += $b.value;}
| '-' b=mulExp {$value -= $b.value;}
)*
;
mulExp returns [double value]
: a=unaryExp {$value = $a.value;}
( '*' b=unaryExp {$value *= $b.value;}
| '/' b=unaryExp {$value /= $b.value;}
)*
;
unaryExp returns [double value]
: '-' atom {$value = -1.0 * $atom.value;}
| atom {$value = $atom.value;}
;
atom returns [double value]
: Number {$value = Double.Parse($Number.Text, CultureInfo.InvariantCulture);}
| '(' exp ')' {$value = $exp.value;}
;
Number
: ('0'..'9')+ ('.' ('0'..'9')+)?
;
Space
: (' ' | '\t' | '\r' | '\n'){Skip();}
;
which can be tested with the class:
using System;
using Antlr.Runtime;
using Antlr.Runtime.Tree;
using Antlr.StringTemplate;
namespace Demo.Antlr
{
class MainClass
{
public static void Main (string[] args)
{
string expression = "(12.5 + 56 / -7) * 0.5";
ANTLRStringStream Input = new ANTLRStringStream(expression);
ExpressionLexer Lexer = new ExpressionLexer(Input);
CommonTokenStream Tokens = new CommonTokenStream(Lexer);
ExpressionParser Parser = new ExpressionParser(Tokens);
Console.WriteLine(expression + " = " + Parser.parse());
}
}
}
and produces the following output:
(12.5 + 56 / -7) * 0.5 = 2.25
EDIT
In the comments, Ralph wrote:
Tip for those using Visual Studio: you can put something like java -cp "$(ProjectDir)antlr-3.2.jar" org.antlr.Tool "$(ProjectDir)Expression.g" in the pre-build events, then you can just modify your grammar and run the project without having to worry about rebuilding the lexer/parser.
Have you looked at Irony? It's aimed at .NET and therefore works really well, has proper tooling, proper examples, and just works. The only problem is that it is still a bit 'alpha-ish', so documentation and versions seem to change a bit, but if you just stick with a version, you can do nifty things.
p.s. sorry for the bad answer where you ask a problem about X and someone suggests something different using Y ;^)
My personal experience is that before learning ANTLR on C#/.NET, you should spend enough time learning ANTLR on Java. That gives you knowledge of all the building blocks, which you can later apply to C#/.NET.
I wrote a few blog posts recently,
http://www.lextm.com/index.php/2012/07/how-to-use-antlr-on-net-part-i/
http://www.lextm.com/index.php/2012/07/how-to-use-antlr-on-net-part-ii/
http://www.lextm.com/index.php/2012/07/how-to-use-antlr-on-net-part-iii/
http://www.lextm.com/index.php/2012/07/how-to-use-antlr-on-net-part-iv/
http://www.lextm.com/index.php/2012/07/how-to-use-antlr-on-net-part-v/
The assumption is that you are familiar with ANTLR on Java and are ready to migrate your grammar file to C#/.NET.
There is a great article on how to use ANTLR and C# together here:
http://www.codeproject.com/KB/recipes/sota_expression_evaluator.aspx
it's a "how it was done" article by the creator of NCalc which is a mathematical expression evaluator for C# - http://ncalc.codeplex.com
You can also download the grammar for NCalc here:
http://ncalc.codeplex.com/SourceControl/changeset/view/914d819f2865#Grammar%2fNCalc.g
An example of how NCalc works:
Expression e = new Expression("Round(Pow(Pi, 2) + Pow([Pi2], 2) + X, 2)");
e.Parameters["Pi2"] = new Expression("Pi * Pi");
e.Parameters["X"] = 10;
e.EvaluateParameter += delegate(string name, ParameterArgs args)
{
if (name == "Pi")
args.Result = 3.14;
};
Debug.Assert(117.07 == e.Evaluate());
Hope it's helpful.
