parse C# preprocessor directives in ANTLR4 - c#

I'm trying to parse C# preprocessor directives using ANTLR4 instead of ignoring them. I'm using the grammar mentioned here: https://github.com/antlr/grammars-v4/tree/master/csharp
This is my addition (for now I'm focusing only on pp_conditional):
pp_directive
: Pp_declaration
| pp_conditional
| Pp_line
| Pp_diagnostic
| Pp_region
| Pp_pragma
;
pp_conditional
: pp_if_section (pp_elif_section | pp_else_section | pp_conditional)* pp_endif
;
pp_if_section:
SHARP 'if' conditional_or_expression statement_list
;
pp_elif_section:
SHARP 'elif' conditional_or_expression statement_list
;
pp_else_section:
SHARP 'else' (statement_list | pp_if_section)
;
pp_endif:
SHARP 'endif'
;
I added its entry here:
block
: OPEN_BRACE statement_list? CLOSE_BRACE
| pp_directive
;
I'm getting this error:
line 19:0 mismatched input '#if TEST\n' expecting '}'
when I use the following test case:
if (!IsPostBack){
#if TEST
ltrBuild.Text = "**TEST**";
#else
ltrBuild.Text = "**LIVE**";
#endif
}

The problem is that a block is composed of either '{' statement_list? '}' or a pp_directive. In this specific case, it chooses the first, because the first token it sees is a { (after the if condition). Now, it is expecting to maybe see a statement_list? and then a }, but what it finds is #if TEST, a pp_directive.
What do we have to do? Make your pp_directive a statement. Since we know statement_list: statement+;, we search for statement and add pp_directive to it:
statement
: labeled_statement
| declaration_statement
| embedded_statement
| pp_directive
;
And it should be working fine. However, we must also consider whether your block: ... | pp_directive alternative should be removed or not, and it should be. I'll leave it for you to find out why, but here's a test case that's ambiguous:
if (!IsPostBack)
#pragma X
else {
}

Related

How to compare dynamically in .NET Core

These are the conditional rules configured in the database. The rules configured for each user are as follows:
| key | expression | rule |
|:----|:--------------:|---:|
| 001 | >=500 and <=600 | 1.2 |
| 001 | >600 | 2.0 |
| 002 | ==400 | 4.0 |
| 002 | !=700 | 5.0 |
| 003 | ==100 \|\| ==200 | 0.5 |
I need to evaluate the conditional rules for key 001 dynamically.
Below is my current code. I want the generated C# code to look like this:
if (item.TotalDaySam >= 500 && item.TotalDaySam <= 600)
{
// return Amount * 001 rule(1.2)
}
else if (item.TotalDaySam > 600)
{
// return Amount * 001 rule(2.0)
}
else
{
// return Amount
}
How do I use the database configuration to dynamically generate the conditional code that performs the different calculations? I found a similar project, RulesEngine, but I don't know how to implement it.
If you can store your data like this:
x>=500 && x<=600
x>600
x==600
x!=600
Then iterate over each line, replacing x with "item.TotalDaySam" each time.
Finally, you can find help in this post on parsing the string into an if condition: C# Convert string to if condition
(sorry for posting an answer instead of a comment; I don't have enough reputation to comment ^^)
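The replace-and-parse idea can also be implemented without compiling code at runtime, by interpreting the stored clauses directly. Below is a rough sketch of that approach, written in Java for brevity (the same shape translates almost line-for-line to C#). All names (RuleEval, evalClause, firstMatch) are made up for illustration, only the comparison operators from the question's table are handled, and the x placeholder is dropped because the value is passed in directly:

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch: interpret stored rule expressions such as
// ">=500 && <=600" against a value instead of generating code.
public class RuleEval {

    // Evaluate a single clause like ">=500" against x.
    static boolean evalClause(String clause, double x) {
        clause = clause.trim();
        if (clause.startsWith(">=")) return x >= Double.parseDouble(clause.substring(2));
        if (clause.startsWith("<=")) return x <= Double.parseDouble(clause.substring(2));
        if (clause.startsWith("==")) return x == Double.parseDouble(clause.substring(2));
        if (clause.startsWith("!=")) return x != Double.parseDouble(clause.substring(2));
        if (clause.startsWith(">"))  return x >  Double.parseDouble(clause.substring(1));
        if (clause.startsWith("<"))  return x <  Double.parseDouble(clause.substring(1));
        throw new IllegalArgumentException("unsupported clause: " + clause);
    }

    // "||" joins alternatives (any may hold); "&&" joins conjuncts (all must hold).
    static boolean evaluate(String expression, double x) {
        if (expression.contains("||")) {
            for (String part : expression.split("\\|\\|"))
                if (evaluate(part, x)) return true;
            return false;
        }
        for (String part : expression.split("&&"))
            if (!evalClause(part, x)) return false;
        return true;
    }

    // Return the rule factor of the first matching row, mirroring the
    // generated if / else-if / else chain; null means "no rule applies".
    static Double firstMatch(List<String[]> rows, double x) {
        for (String[] row : rows)              // row = { expression, rule }
            if (evaluate(row[0], x)) return Double.parseDouble(row[1]);
        return null;
    }

    public static void main(String[] args) {
        List<String[]> key001 = Arrays.asList(
                new String[]{">=500 && <=600", "1.2"},
                new String[]{">600", "2.0"});
        System.out.println(firstMatch(key001, 550)); // 1.2
        System.out.println(firstMatch(key001, 700)); // 2.0
    }
}
```

Note that the table in the question spells the operators as "and"/"or"; those would need to be normalized to &&/|| before storing, as suggested above.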

ANTLR4 grammar integration complexities for selection with removals

I'm attempting to create a grammar for a lighting control system, and I make good progress when testing with the tree GUI tool, but it all seems to fall apart when I attempt to integrate it into my app.
The basic structure of the language is [Source] [Mask] [Command] [Destination]. Mask is optional, so a super simple sample input might look like this: Fixture 1 # 50, which bypasses Mask. Fixture 1 is the source, # is the command, and 50 is the destination, which in this case is an intensity value.
I’ve no issues with this type of input but things get complicated as I try and build out more complex source selection. Let’s say I want to select a range of fixtures and remove a few from the selection and then add more fixtures after.
Fixture 1 Thru 50 – 25 – 30 – 35 + 40 > 45 # 50
This is a very common syntax on existing control systems but I’m stumped at how to design the grammar for this in a way that makes integration into my app not too painful.
The user could just as easily type the following:
1 Thru 50 – 25 – 30 – 35 + 40 > 45 # 50
Because sourceType (fixture) is not provided, it's inferred.
To try and deal with the above situations, I've written the following:
grammar LiteMic;
/*
* Parser Rules
*/
start : expression;
expression : source command destination
| source mask command destination
| command destination
| source command;
destination : sourceType number
| sourceType number sourceType number
| number;
command : COMMAND;
mask : SOURCETYPE;
operator : ADD #Add
| SUB #Subtract
;
plus : ADD;
minus : SUB;
source : singleSource (plus source)*
| rangeSource (plus source)*
;
singleSource : sourceType number #SourceWithType
| number #InferedSource
;
rangeSource : sourceRange (removeSource)*
;
sourceRange : singleSource '>' singleSource;
removeSource : '-' source;
sourceType : SOURCETYPE;
number : NUMBER;
compileUnit
: EOF
;
/*
* Lexer Rules
*/
SOURCETYPE : 'Cue'
| 'Playback'
| 'List'
| 'Intensity'
| 'Position'
| 'Colour'
| 'Beam'
| 'Effect'
| 'Group'
| 'Fixture'
;
COMMAND : '#'
| 'Record'
| 'Update'
| 'Copy'
| 'Move'
| 'Delete'
| 'Highlight'
| 'Full'
;
ADD : '+' ;
SUB : '-' ;
THRU : '>' ;
/* A number: can be an integer value, or a decimal value */
NUMBER : [0-9]+ ;
/* We're going to ignore all white space characters */
WS : [ \t\r\n]+ -> skip
;
Running the command against the grun GUI produces a parse tree (image omitted).
I've had some measure of success being able to override the Listener for AddRangeSource as I can loop through and add the correct types but it all falls apart when I try and remove a range.
1 > 50 - 30 > 35 # 50
This produces a problem, as the removal of a range matches the 'addRangeSource' rule.
I'm pretty sure I'm missing something obvious. I've been working my way through the book I bought on Amazon, but it's still not clear in my head how to achieve what I'm after, and I've been looking at this for a week.
For good measure, below is a tree for a more advanced query that seems OK apart from the selection (image omitted).
Does anyone have any pointers / suggestions on where I'm going wrong?
Cheers,
Mike
You can solve the problem by reorganizing the grammar a little:
Merge rangeSource with sourceRange:
rangeSource : singleSource '>' singleSource;
Note: this rule also matches input like Beam 1 > Group 16, which might be unintended; in that case you could use this:
rangeSource : sourceType? number '>' number;
Rename source to sourceList (and don't forget to change it in the expression rule):
expression : sourceList command destination
| sourceList mask command destination
| command destination
| sourceList command;
Add a source rule that matches either singleSource or rangeSource:
source : singleSource | rangeSource;
Put + and - at the same level (as addSource and removeSource):
addSource : plus source;
removeSource : minus source;
Change sourceList to accept a list of addSource/removeSource:
sourceList : source (addSource|removeSource)*;
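Collected in one place, the reorganized parser rules would look roughly like this (a sketch only; the remaining rules — command, mask, sourceType, number, plus, minus, destination — and all lexer rules stay as in the question):

```antlr
start : expression;

expression : sourceList command destination
           | sourceList mask command destination
           | command destination
           | sourceList command
           ;

// A list of sources: one source, then any number of additions/removals.
sourceList   : source (addSource | removeSource)*;
source       : singleSource | rangeSource;
rangeSource  : singleSource '>' singleSource;
addSource    : plus source;
removeSource : minus source;

singleSource : sourceType number #SourceWithType
             | number            #InferedSource
             ;
```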
I tried this and it doesn't have any problems with parsing even the more advanced query.

How can I report on the name of an object with fluent assertions

I have a test that checks several objects in a table on our website. The test is written in SpecFlow and C#.
It looks something like this:
When I click proceed
Then I should see the following values
| key | value |
| tax | 5.00 |
| delivery | 5.00 |
| subtotal | 20.00 |
My code behind for the "Then" step is something similar to:
[StepDefinition("I should see the following values")]
public void IShouldSeeTheFollowingValues(Table table)
{
var basketSummary = new BasketModel();
foreach (var row in table.Rows)
{
switch (row["key"])
{
case "tax":
basketSummary.Tax.Should().Be(row["value"]);
break;
case "delivery":
basketSummary.Delivery.Should().Be(row["value"]);
break;
case "subtotal":
basketSummary.Subtotal.Should().Be(row["value"]);
break;
}
}
}
The problem with this is that in our build logs, if the test fails, it looks something like this:
When I click proceed
-> done: OrderConfirmationPageSteps.ClickProceed() (1.0s)
Then I should see the following values
--- table step argument ---
| key | value |
| tax | 5.00 |
| delivery | 5.00 |
| subtotal | 20.00 |
-> error: Expected value to be 5.00, but found 1.00.
As you can see above, it's hard to distinguish which value it means when it says it expected 5.00.
Is there a way I can modify the output to say something along the lines of:
-> error: Expected value of Tax to be 5.00, but found 1.00.
You can do two things:
Pass a reason phrase to the Be() method, e.g.
basketSummary.Tax.Should().Be(row["value"], "because that's the tax value");
Wrap the call in an AssertionScope and pass the description (the context) into its constructor.
In the latest version, Fluent Assertions can identify the subject on its own: https://fluentassertions.com/introduction#subject-identification
string username = "dennis";
username.Should().Be("jonas");
// will throw a test framework-specific exception with the following message:
Expected username to be "jonas" with a length of 5,
but "dennis" has a length of 6, differs near "den" (index 0).
Fluent Assertions can use the C# code of the unit test to extract the
name of the subject and use that in the assertion failure.
Since it needs the debug symbols for that, this will require you to
compile the unit tests in debug mode, even on your build servers.

ANTLR rule to skip method body

My task is to create an ANTLR grammar to analyse C# source code files and generate a class hierarchy. Then I will use it to generate a class diagram.
I wrote rules to parse namespaces, class declarations and method declarations. Now I have a problem with skipping method bodies. I don't need to parse them, because the bodies are useless for my task.
I wrote simple rule:
body:
'{' .* '}'
;
but it does not work properly when a method looks like:
void foo()
{
...
{
...
}
...
}
The rule matches the first brace, which is OK; then it matches
...
{
...
as 'any' (.*), and then the third brace as the final brace, which is not OK, and the rule ends.
Could anybody help me write a proper rule for method bodies? As I said before, I don't want to parse them, only skip them.
UPDATE:
Here is the solution to my problem, strongly based on Adam12's answer:
body:
'{' ( ~('{' | '}') | body)* '}'
;
You have to use recursive rules that match parenthesis pairs.
rule1 : '('
(
nestedParan
| ~')'
)*
')';
nestedParan : '('
(
nestedParan
| ~')'
)*
')';
This code assumes you are using the parser here so strings and comments are already excluded. ANTLR doesn't allow negation of multiple alternatives in parser rules so the code above relies on the fact that alternatives are tried in order. It should give a warning that alternatives 1 and 2 both match '(' and thus choose the first alternative, which is what we want.
You can handle the recursion of (nested) blocks in your lexer. The trick is to let your class definition also include the opening { so that not the entire contents of the class is gobbled up by this recursive lexer rule.
A quick demo that is without a doubt not complete, but is a decent start to "fuzzy parse/lex" a Java (or C# with some slight modifications) source file:
grammar T;
parse
: (t=. {System.out.printf("%-15s '%s'\n", tokenNames[$t.type], $t.text.replace("\n", "\\n"));})* EOF
;
Skip
: (StringLiteral | CharLiteral | Comment) {skip();}
;
PackageDecl
: 'package' Spaces Ids {setText($Ids.text);}
;
ClassDecl
: 'class' Spaces Id Spaces? '{' {setText($Id.text);}
;
Method
: Id Spaces? ('(' {setText($Id.text);}
| /* no method after all! */ {skip();}
)
;
MethodOrStaticBlock
: Block {skip();}
;
Any
: . {skip();}
;
// fragments
fragment Spaces
: (' ' | '\t' | '\r' | '\n')+
;
fragment Ids
: Id ('.' Id)*
;
fragment Id
: ('a'..'z' | 'A'..'Z' | '_') ('a'..'z' | 'A'..'Z' | '_' | '0'..'9')*
;
fragment Block
: '{' ( ~('{' | '}' | '"' | '\'' | '/')
| {input.LA(2) != '/'}?=> '/'
| StringLiteral
| CharLiteral
| Comment
| Block
)*
'}'
;
fragment Comment
: '/*' .* '*/'
| '//' ~('\r' | '\n')*
;
fragment CharLiteral
: '\'' ('\\\'' | ~('\\' | '\'' | '\r' | '\n'))+ '\''
;
fragment StringLiteral
: '"' ('\\"' | ~('\\' | '"' | '\r' | '\n'))* '"'
;
I ran the generated parser against the following Java source file:
/*
... package NO.PACKAGE; ...
*/
package foo.bar;
public final class Mu {
static String x;
static {
x = "class NotAClass!";
}
void m1() {
// {
while(true) {
double a = 2.0 / 2;
if(a == 1.0) { break; } // }
/* } */
}
}
static class Inner {
int m2 () {return 42; /*comment}*/ }
}
}
which produced the following output:
PackageDecl 'foo.bar'
ClassDecl 'Mu'
Method 'm1'
ClassDecl 'Inner'
Method 'm2'

ANTLR3 common values in 2 different domain values

I need to define a language parser for the following search criteria:
CRITERIA_1=<values-set-#1> AND/OR CRITERIA_2=<values-set-#2>;
Where <values-set-#1> can have values from 1-50 and <values-set-#2> can have values from the following set: (5, A, B, C); case is not important here.
I have decided to use ANTLR3 (v3.4) with output in C# (CSharp3), and it worked pretty smoothly until now. The problem is that it fails to parse the string when I provide values that occur in both data sets (i.e. in this case '5'). For example, if I provide the following string:
CRITERIA_1=5;
It returns the following error where the value node was supposed to be:
<unexpected: [#1,11:11='5',<27>,1:11], resync=5>
The grammar definition file is the following:
grammar ZeGrammar;
options {
language=CSharp3;
TokenLabelType=CommonToken;
output=AST;
ASTLabelType=CommonTree;
k=3;
}
tokens
{
ROOT;
CRITERIA_1;
CRITERIA_2;
OR = 'OR';
AND = 'AND';
EOF = ';';
LPAREN = '(';
RPAREN = ')';
}
public
start
: expr EOF -> ^(ROOT expr)
;
expr
: subexpr ((AND|OR)^ subexpr)*
;
subexpr
: grouppedsubexpr
| 'CRITERIA_1=' rangeval1_expr -> ^(CRITERIA_1 rangeval1_expr)
| 'CRITERIA_2=' rangeval2_expr -> ^(CRITERIA_2 rangeval2_expr)
;
grouppedsubexpr
: LPAREN! expr RPAREN!
;
rangeval1_expr
: rangeval1_subexpr
| RANGE1_VALUES
;
rangeval1_subexpr
: LPAREN! rangeval1_expr (OR^ rangeval1_expr)* RPAREN!
;
RANGE1_VALUES
: (('0'..'4')? ('0'..'9') | '5''0')
;
rangeval2_expr
: rangeval2_subexpr
| RANGE2_VALUES
;
rangeval2_subexpr
: LPAREN! rangeval2_expr (OR^ rangeval2_expr)* RPAREN!
;
RANGE2_VALUES
: '5' | ('a'|'A') | ('b'|'B') | ('c'|'C')
;
And if I remove the value '5' from RANGE2_VALUES, it works fine. Can anyone give me a hint about what I'm doing wrong?
You must realize that the lexer does not produce tokens based on what the parser tries to match. So, in your case, the input "5" will always be tokenized as a RANGE1_VALUES and never as a RANGE2_VALUES because both RANGE1_VALUES and RANGE2_VALUES can match this input but RANGE1_VALUES comes first (so RANGE1_VALUES takes precedence over RANGE2_VALUES).
A possible fix would be to remove both RANGE1_VALUES and RANGE2_VALUES rules and replace them with the following lexer rules:
D0_4
: '0'..'4'
;
D5
: '5'
;
D6_50
: '6'..'9' // 6-9
| '1'..'4' '0'..'9' // 10-49
| '50' // 50
;
A_B_C
: ('a'|'A')
| ('b'|'B')
| ('c'|'C')
;
and then introduce these new parser rules:
range1_values
: D0_4
| D5
| D6_50
;
range2_values
: A_B_C
| D5
;
and replace all RANGE1_VALUES and RANGE2_VALUES references in your parser rules with range1_values and range2_values, respectively.
EDIT
Instead of trying to solve this at the lexer-level, you might simply match any integer value and check inside the parser rule if the value is the correct one (or correct range) using a semantic predicate:
range1_values
: INT {Integer.valueOf($INT.text) <= 50}?
;
range2_values
: A_B_C
| INT {Integer.valueOf($INT.text) == 5}?
;
INT
: '0'..'9'+
;
A_B_C
: 'a'..'c'
| 'A'..'C'
;
