I'm migrating a C#-based programming language compiler from a manual lexer/parser to Antlr.
Antlr has been giving me severe headaches: it mostly works, but the small parts that do not are incredibly painful to solve.
I discovered that most of my headaches are caused by the lexer parts of Antlr rather than the parser. Then I noticed the parser grammar X; declaration and realized that perhaps I could keep my manually written lexer and pair it with an Antlr-generated parser.
So I'm looking for more documentation on this topic. I guess a custom ITokenStream could work, but there appears to be virtually no online documentation on this topic...
I found out how. It might not be the best approach but it certainly seems to be working.
Antlr parsers receive an ITokenStream parameter
Antlr lexers are themselves ITokenSources
ITokenSource is a significantly simpler interface than ITokenStream
The simplest way to convert an ITokenSource to an ITokenStream is to use a CommonTokenStream, which receives an ITokenSource parameter
So now we only need to do 2 things:
Adjust the grammar to be parser-only
Implement ITokenSource
Adjusting the grammar is very simple. Simply remove all lexer declarations and make sure you declare the grammar as a parser grammar. A simple example is posted here for convenience:
parser grammar mygrammar;
options
{
language=CSharp2;
}
@parser::namespace { MyNamespace }
document: (WORD {Console.WriteLine($WORD.text);} |
NUMBER {Console.WriteLine($NUMBER.text);})*;
Note that this grammar file will output class mygrammar instead of class mygrammarParser.
So now we want to implement a "fake" lexer.
I personally used the following pseudo-code:
TokenQueue q = new TokenQueue();
//Do normal lexer stuff and output to q
CommonTokenStream cts = new CommonTokenStream(q);
mygrammar g = new mygrammar(cts);
g.document();
Finally, we need to define TokenQueue. TokenQueue is not strictly necessary but I used it for convenience.
It should have methods to receive the lexer tokens and methods to output Antlr tokens. So if you are not using Antlr's native tokens, you have to implement a convert-to-Antlr-token method.
Also, TokenQueue must implement ITokenSource.
Be aware that it is very important to correctly set the token variables. Initially, I had some problems because I was miscalculating CharPositionInLine. If these variables are incorrectly set, then the parser may fail.
Also, the default (non-hidden) channel is 0.
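For illustration, a minimal TokenQueue might look roughly like the sketch below. This is not the original code: it assumes the ANTLR 3 C# runtime, where ITokenSource exposes NextToken() and SourceName, and the Enqueue method is just a hypothetical hook for the hand-written lexer to push its converted tokens.

using System.Collections.Generic;
using Antlr.Runtime;

public class TokenQueue : ITokenSource
{
    private readonly Queue<IToken> _tokens = new Queue<IToken>();

    // Hypothetical hook: the hand-written lexer converts its own tokens
    // into ANTLR CommonTokens and pushes them here.
    public void Enqueue(int type, string text, int line, int charPositionInLine)
    {
        var token = new CommonToken(type, text);
        token.Line = line;
        token.CharPositionInLine = charPositionInLine; // must be accurate (see note above)
        token.Channel = 0;                             // 0 = default, non-hidden channel
        _tokens.Enqueue(token);
    }

    // ITokenSource: hand tokens to the CommonTokenStream, then EOF forever.
    public IToken NextToken()
    {
        if (_tokens.Count > 0)
            return _tokens.Dequeue();
        return new CommonToken(-1); // -1 is ANTLR's EOF token type
    }

    public string SourceName
    {
        get { return "hand-written lexer"; }
    }
}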
This seems to be working for me so far. I hope others find it useful as well.
I'm open to feedback. In particular, if you find a better way to solve this problem, feel free to post a separate reply.
I have a use case in Q# where I have qubit register qs and need to apply the CNOT gate on every qubit except the first one, using the first one as control. Using a for loop I can do it as follows:
for (i in 1..Length(qs)-1) {
CNOT(qs[0], qs[i]);
}
Now, I wanted to give it a more functional flavor and tried instead to do something like:
ApplyToEach(q => CNOT(qs[0], q), qs[1..Length(qs)-1]);
The Q# compiler does not accept an expression like this, informing me that it encountered an unexpected code fragment. That's not too informative for my taste. Some documents claim that Q# supports anonymous functions à la C#, hence the attempt above. Can anybody point me to a correct usage of lambdas in Q# or dispel my false belief?
At the moment, Q# doesn't support lambda functions and operations (though that would be a great feature request to file at https://github.com/microsoft/qsharp-compiler/issues/new/choose). That said, you can get a lot of the functional flavor that you get from lambdas by using partial application. In your example, for instance, I could also write that for loop as:
ApplyToEach(CNOT(Head(qs), _), Rest(qs));
Here, since CNOT has type (Qubit, Qubit) => Unit is Adj + Ctl, filling in one of the two inputs as CNOT(Head(qs), _) results in an operation of type Qubit => Unit is Adj + Ctl.
Partial application is a very powerful feature, and is used all throughout the Q# standard libraries to provide a functional way to build up quantum programs. If you're interested in learning more, I recommend checking out the docs at https://learn.microsoft.com/quantum/language/expressions#callable-invocation-expressions.
I am looking for an algorithm or approach to evaluate mathematical expressions that are stated as string. The expression contains mathematical components but also custom functions. I look to implement said algorithm in C#/.Net.
I am aware that Roslyn allows me to evaluate an expression of the kind
"var value = 3+5*11-Math.Sqrt(9);"
I am also familiar with how to use "node rewriting" to avoid requiring variable declarations or fully qualified function names, and to omit the trailing semicolon, in order to evaluate
"value = 3+5*11-Sqrt(9)"
However, what I want to implement on top of this is to offer custom script functions such as
"value = Ratio(A,B)", where Ratio is a custom function that divides each element in vector A by each element in vector B and returns a same length vector.
or
"value = Sma(A, 10)", where Sma is a custom function that calculates the simple moving average of vector/timeseries A with a lookback window of 10.
Ideally I want to get to the ability to provide more complexity such as
"value = Ratio(A,B) * Pi + 0.5 * Spread(C,D) + Sma(E, lookback)", whereby the parsing engine would respect operator precedence and build a parsing tree in order to fetch values, required to evaluate the expression.
I can't wrap my head around how I could solve such kind of problem with Roslyn.
What other approaches are out there to get me started or am I missing features that Roslyn offers that may assist in solving this problem?
Assuming that all your expressions are valid C# expressions you can make use of Roslyn in multiple ways.
You could use Roslyn only for parsing. SyntaxFactory.ParseExpression would give you the syntax tree of an expression. Note that your first (var v = expr;) example is not an expression, but a variable declaration. However v = expr is an expression, namely an AssignmentExpressionSyntax. Then you could traverse this AST, and do with each node what you want to do, basically you'd write an interpreter. The benefit of this approach is that you don't have to write your own parser, walking an AST is very simple, and this approach would be flexible, as defining what you do with "unknown" methods would be perfectly up to you.
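For illustration, a minimal scalar-only interpreter along these lines might look like the sketch below. The MiniInterpreter name is made up, vector semantics (Ratio, Sma, ...) are left out, and "unknown" functions are simply dispatched by name in the invocation case:

using System;
using System.Linq;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

static class MiniInterpreter
{
    public static double Eval(string text)
    {
        // Roslyn does the parsing (and operator precedence) for us.
        return Eval(SyntaxFactory.ParseExpression(text));
    }

    static double Eval(ExpressionSyntax node)
    {
        switch (node)
        {
            case LiteralExpressionSyntax lit:
                return Convert.ToDouble(lit.Token.Value);
            case ParenthesizedExpressionSyntax paren:
                return Eval(paren.Expression);
            case BinaryExpressionSyntax bin:
                double l = Eval(bin.Left), r = Eval(bin.Right);
                switch (bin.OperatorToken.Text)
                {
                    case "+": return l + r;
                    case "-": return l - r;
                    case "*": return l * r;
                    case "/": return l / r;
                }
                throw new NotSupportedException(bin.OperatorToken.Text);
            case InvocationExpressionSyntax call:
                // What to do with "unknown" methods is entirely up to you here.
                string name = call.Expression.ToString();
                double[] args = call.ArgumentList.Arguments
                                    .Select(a => Eval(a.Expression)).ToArray();
                if (name == "Sqrt") return Math.Sqrt(args[0]);
                throw new NotSupportedException("Unknown function: " + name);
            default:
                throw new NotSupportedException(node.Kind().ToString());
        }
    }
}

// Example: MiniInterpreter.Eval("3+5*11-Sqrt(9)") returns 55.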
Use Roslyn for evaluation too. This can be done in multiple flavors: either putting together a valid C# file, and compiling that into an assembly, or you could go through the Scripting API. This approach would basically require a class library that contains the implementation of all your extra methods, like Sma, Spread, ... But these would also be needed in some form in the first approach, so it's not really an extra effort.
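As a sketch of the scripting flavor (assuming the Microsoft.CodeAnalysis.CSharp.Scripting package; the Evaluator wrapper is made up), importing System.Math is what lets the script say Sqrt(9) instead of Math.Sqrt(9), and a class library containing Sma, Ratio, Spread, ... would be referenced and imported in the same way:

using System.Threading.Tasks;
using Microsoft.CodeAnalysis.CSharp.Scripting;
using Microsoft.CodeAnalysis.Scripting;

public static class Evaluator
{
    public static Task<object> EvaluateAsync(string expression)
    {
        // WithImports("System.Math") exposes Sqrt etc. without qualification.
        var options = ScriptOptions.Default.WithImports("System.Math");
        return CSharpScript.EvaluateAsync(expression, options);
    }
}

// Example: await Evaluator.EvaluateAsync("3+5*11-Sqrt(9)") returns 55 (as a double).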
If the only goal is to evaluate the expression, then I would go with the 2nd approach. If there are extra requirements (which you haven't mentioned) like being able to let's say produce a simplified form of an expression, then I'd consider the first solution.
If you find a library that does exactly what you need (and the perf is good, and you don't mind the dependency on 3rd party tools, ...), I'd go with that. MathParser.org-mXparser suggested in the comment seems pretty much what you're looking for.
I've never done Bison or Wisent before.
How can I get started?
My real goal is to produce a working Wisent/Semantic grammar for C#, to allow C# to be edited in emacs with code-completion and all the other CEDET goodies. (For those who don't know, Wisent is an emacs-lisp port of GNU Bison that is included in CEDET. A wisent, apparently, is a European bison. And Bison, I take it, is a play on words derived from YACC. And CEDET is a Collection of Emacs Development Tools. All caught up? I'm not going to try to define emacs.)
Microsoft provides the BNF grammar for C#, including all the LINQ extensions, in the language reference document. I was able to translate that into a .wy file that compiles successfully with semantic-grammar-create-package.
But the compiled grammar doesn't "work". In some cases the grammar "finds" enum declarations, but not class declarations. Why? I don't know. I haven't been able to get it to recognize attributes.
I'm not finding the "debugging" of the grammar to be very easy.
I thought I'd take a step back and try to produce a wisent grammar for a vastly simpler language, a toy language with only a few keywords. Just to sort of gain some experience. Even that is proving a challenge.
I've seen the .info documents on the grammar framework and on Wisent, but... those still don't really clarify for me how the stuff actually works.
So
Q1: any tips on debugging a wisent grammar in emacs? Is there a way to run a "lint-like" thing on the grammar to find unused rules, dead ends, stuff like that? What about being able to watch the parser in action? Anything like that?
Q2: Any tips on coming up to speed on bison/wisent in general? What I'm thinking of is a tool that will allow me to gain some insight into how the rules work. Something that provides some transparency, instead of the "it didn't work" experience I'm getting now with Wisent.
Q3: Rather than continue to fight this, should I give up and become an organic farmer?
ps: I know about the existing C# grammar in the contrib directory of CEDET/semantic. That thing works, but ... It doesn't support the latest C# spec, including LINQ, partial classes and methods, yield, anonymous methods, object initializers, and so on. Also it mostly punts on parsing a bunch of the C# code. It sniffs out the classes and methods, and then bails out. Even foreach loops aren't done quite right. It's good as far as it goes, but I'd like to see it be better. What I'm trying to do is make it current, and also extend it to parse more of the C# code.
You may want to look at the calc example in the semantic/wisent directory. It is quite simple, and also shows how to use the %left and %right features. It will "execute" the code instead of convert it into tags. Some other simple grammars include the 'dot' parser in cogre, and the srecode parser in srecode.
For wisent debugging, there is a verbosity flag in the menu, though to be honest I haven't tried it. There is also wisent-debug-on-entry, which lets you select an action that will cause the Emacs debugger to stop in that action so you can see what the values are.
The older "bovine" parser has a debug mode that allows you to step through the rules, but it was never ported to wisent. That is a feature I have sorely missed as I write wisent parsers.
Regarding Q1:
First, make sure that the wisent parser is actually used:
(fetch-overload 'semantic-parse-stream)
should return wisent-parse-stream.
Run the following elisp snippet:
(easy-menu-add-item semantic-mode-map '(menu-bar cedet-menu) ["Wisent-Debug" wisent-debug-toggle :style toggle :selected (wisent-debug-active)])
(defun wisent-debug-active ()
"Return non-nil if wisent debugging is active."
(assoc 'wisent-parse-action-debug (ad-get-advice-info-field 'wisent-parse-action 'after)))
(defun wisent-debug-toggle ()
"Install debugging of wisent-parser"
(interactive)
(if (wisent-debug-active)
(ad-unadvise 'wisent-parse-action)
(defadvice wisent-parse-action (after wisent-parse-action-debug activate)
(princ (format "\ntoken:%S;\nactionList:%S;\nreturn:%S\n"
(eval i)
(eval al)
(eval ad-return-value)) (get-buffer-create "*wisent-debug*"))))
(let ((fileName (locate-file "semantic/wisent/wisent" load-path '(".el" ".el.gz")))
fct found)
(if fileName
(with-current-buffer (find-file-noselect fileName)
(goto-char (point-max))
(while (progn
(backward-list)
(setq fct (sexp-at-point))
(null
(or
(bobp)
(and
(listp fct)
(eq 'defun (car fct))
(setq found (eq 'wisent-parse (cadr fct))))))))
(if found
(eval fct)
(error "Did not find wisent-parse.")))
(error "Source file for semantic/wisent/wisent not found.")
)))
It creates a new entry Wisent-Debug in the Development menu. Clicking this entry toggles debugging of the wisent parser. The next time you reparse a buffer with the wisent parser, it outputs debug information to the buffer *wisent-debug*. The buffer *wisent-debug* is not shown automatically, but you can find it via the buffer menu.
To avoid flooding *wisent-debug*, you should disable "Reparse when idle".
From time to time you should clear the buffer *wisent-debug* with erase-buffer.
I would like to be able to decorate any method with a custom Trace attribute and some piece of code should be injected into that method at compilation.
For example:
[Trace]
public void TracedMethod(string param1)
{
//method body
}
should become:
public void TracedMethod(string param1)
{
Log.Trace("TracedMethod", "param1", param1);
//method body
}
In this case, the injected code depends on the method name and method parameters, so it should be possible to infer this information.
Does anyone know how to accomplish this?
To do Aspect Oriented Programming in C#, you can use PostSharp.
(The homepage even shows a Trace example, just like you're asking for!)
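For a rough idea of what that looks like (a sketch only: API details vary between PostSharp versions, and the Log class below is just a stand-in for the logger in the question), the aspect could be an OnMethodBoundaryAspect:

using System;
using System.Collections.Generic;
using PostSharp.Aspects;

// Stand-in for the asker's own logger.
public static class Log
{
    public static void Trace(params object[] args)
    {
        Console.WriteLine(string.Join(" ", args));
    }
}

[Serializable]
public class TraceAttribute : OnMethodBoundaryAspect
{
    // PostSharp weaves this call in before the decorated method body at build time.
    public override void OnEntry(MethodExecutionArgs args)
    {
        // Build "MethodName", "param1", value1, ... to mirror the question's example.
        var parameters = args.Method.GetParameters();
        var logArgs = new List<object> { args.Method.Name };
        for (int i = 0; i < parameters.Length; i++)
        {
            logArgs.Add(parameters[i].Name);
            logArgs.Add(args.Arguments[i]);
        }
        Log.Trace(logArgs.ToArray());
    }
}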
This can be easily done with a program transformation system.
The DMS Software Reengineering Toolkit is a general purpose program transformation system, and can be used with many languages (C++, COBOL, Java, EcmaScript, Fortran, ..) as well as specifically with C#.
DMS parses source code, builds Abstract Syntax Trees, and allows you to apply source-to-source patterns to transform your code from one C# program into another with whatever properties you wish. The transformation rule to accomplish exactly the task you specified would be:
domain CSharp.
insert_trace():method->method
"[Trace]
\visibility \returntype \methodname(string \parametername)
{ \body } "
->
"\visibility \returntype \methodname(string \parametername)
{ Log.Trace(\CSharpString\(\methodname\),
\CSharpString\(\parametername\),
\parametername);
\body } "
The quote marks (") are not CSharp quote marks; rather, they are "domain quotes", and indicate that the content inside the quote marks is CSharp syntax (because we said, "domain CSharp"). The \foo notations are meta syntax.
This rule matches the AST representing the method you specified with the [Trace] annotation, and rewrites that AST into the traced form. The resulting AST is then prettyprinted back into source form, which you can compile. You probably need other rules to handle other combinations of arguments; in fact, you'd probably generalize the argument processing to produce (where practical) a string value for each scalar argument.
It should be clear you can do a lot more than just logging with this, and a lot more than just aspect-oriented programming, since you can express arbitrary transformations and not just before-after actions.
Some time ago I had to address a certain C# design problem when I was implementing a JavaScript code-generation framework. One of the solutions I came up with was using the “using” keyword in a totally different (hackish, if you please) way. I used it as syntax sugar (well, originally it is that anyway) for building a hierarchical code structure. Something that looked like this:
CodeBuilder cb = new CodeBuilder();
using(cb.Function("foo"))
{
// Generate some function code
cb.Add(someStatement);
cb.Add(someOtherStatement);
using(cb.While(someCondition))
{
cb.Add(someLoopStatement);
// Generate some more code
}
}
It works because the Function and While methods return an IDisposable object that, upon disposal, tells the builder to close the current scope. Such a thing can be helpful for any tree-like structure that needs to be hard-coded.
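A minimal sketch of the idea (not my actual CodeBuilder, just an illustration of the scope-closing mechanism):

using System;
using System.Collections.Generic;

public class CodeBuilder
{
    private readonly Stack<string> _scopes = new Stack<string>();

    public IDisposable Function(string name)   { return OpenScope("function " + name); }
    public IDisposable While(string condition) { return OpenScope("while (" + condition + ")"); }

    public void Add(string statement)
    {
        // Indentation reflects the current nesting depth.
        Console.WriteLine(new string(' ', _scopes.Count * 2) + statement);
    }

    private IDisposable OpenScope(string header)
    {
        Add(header + " {");
        _scopes.Push(header);
        return new ScopeCloser(this);
    }

    private void CloseScope()
    {
        _scopes.Pop();
        Add("}");
    }

    // The object handed to the using statement: disposing it closes the scope.
    private sealed class ScopeCloser : IDisposable
    {
        private readonly CodeBuilder _owner;
        public ScopeCloser(CodeBuilder owner) { _owner = owner; }
        public void Dispose() { _owner.CloseScope(); }
    }
}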
Do you think such “hacks” are justified? Because you can say that in C++, for example, many of the features such as templates and operator overloading get over-abused, and this behavior is encouraged by many (look at boost, for example). On the other hand, you can say that many modern languages discourage such abuse and give you specific, much more restricted features.
My example is, of course, somewhat esoteric, but real. So what do you think about the specific hack and of the whole issue? Have you encountered similar dilemmas? How much abuse can you tolerate?
I think this is something that has blown over from languages like Ruby that have much more extensive mechanisms to let you create languages within your language (google for "dsl" or "domain specific languages" if you want to know more). C# is less flexible in this respect.
I think creating DSLs in this way is a good thing. It makes for more readable code. Using blocks can be a useful part of a DSL in C#. In this case, though, I think there are better alternatives. The use of using in this case strays a bit too far from its original purpose. This can confuse the reader. I like Anton Gogolev's solution better, for example.
Off-topic, but just take a look at how pretty this becomes with lambdas:
var codeBuilder = new CodeBuilder();
codeBuilder.DefineFunction("Foo", x =>
{
    codeBuilder.While(condition, y =>
    {
        // Generate loop code here
    });
});
It would be better if the disposable object returned from cb.Function(name) were the object on which the statements are added. It is fine if, internally, this function builder passes the calls through to private/internal methods on the CodeBuilder; the point is that the sequence is clear to public consumers.
So long as the Dispose implementation makes the following code cause a runtime error:
CodeBuilder cb = new CodeBuilder();
var f = cb.Function("foo");
using (f)
{
// Generate some function code
f.Add(someStatement);
}
f.Add(something); // this should throw
Then the behaviour is intuitive and relatively reasonable, and the correct usage (below) is encouraged while the mistake above is prevented.
CodeBuilder cb = new CodeBuilder();
using(var function = cb.Function("foo"))
{
// Generate some function code
function.Add(someStatement);
}
I have to ask why you are using your own classes rather than the provided CodeDomProvider implementations, though. There are good reasons for this (notably that the current implementation lacks many of the C# 3.0 features), but since you don't mention it yourself...
Edit: I would second Anton's suggestion to use lambdas. The readability is much improved (and you have the option of allowing expression trees).
If you go by the strictest definitions of IDisposable then this is an abuse. It's meant to be used as a method for releasing native resources in a deterministic fashion by a managed object.
The use of IDisposable has evolved to essentially mean "any object which should have a deterministic lifetime". I'm not saying this is right or wrong, but that's how many APIs and users are choosing to use IDisposable. Given that definition, it's not an abuse.
I wouldn't consider it terribly bad abuse, but I also wouldn't consider it good form because of the cognitive wall you're building for your maintenance developers. The using statement implies a certain class of lifetime management. This is fine in its usual uses and in slightly customized ones (like #heeen's reference to an RAII analogue), but those situations still keep the spirit of the using statement intact.
In your particular case, I might argue that a more functional approach like #Anton Gogolev's would be more in the spirit of the language as well as maintainable.
As to your primary question, I think each such hack must ultimately stand on its own merits as the "best" solution for a particular language in a particular situation. The definition of best is subjective, of course, but there are definitely times (especially when the external constraints of budgets and schedules are thrown into the mix) where a slightly more hackish approach is the only reasonable answer.
I often "abuse" using blocks. I think they provide a great way of defining scope. I have a whole series of objects that I use for capture and restoring state (e.g. of Combo boxes or the mouse pointer) during operations that may change the state. I also use them for creating and dropping database connections.
E.g.:
using(_cursorStack.ChangeCursor(System.Windows.Forms.Cursors.WaitCursor))
{
...
}
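For reference, a minimal sketch of what a ChangeCursor helper of that kind could look like (the _cursorStack class above is my own; this is just one plausible shape for it):

using System;
using System.Windows.Forms;

public class CursorStack
{
    // Swap the cursor and return an IDisposable that restores the previous one.
    public IDisposable ChangeCursor(Cursor newCursor)
    {
        var previous = Cursor.Current;
        Cursor.Current = newCursor;
        return new Restorer(previous);
    }

    private sealed class Restorer : IDisposable
    {
        private readonly Cursor _previous;
        public Restorer(Cursor previous) { _previous = previous; }
        public void Dispose() { Cursor.Current = _previous; }
    }
}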
I wouldn't call it abuse. Looks more like a fancied up RAII technique to me. People have been using these for things like monitors.