How do I make it so that when someone types x*y^z into a textbox on my page, the code-behind calculates that expression and shows the result?
.NET does not have a built-in function for evaluating arbitrary expression strings. However, an open-source .NET library named NCalc does.
NCalc is a mathematical expressions evaluator in .NET. NCalc can parse any expression and evaluate the result, including static or dynamic parameters and custom functions.
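For the original x*y^z case, a minimal sketch (assuming current NCalc, where exponentiation goes through the built-in Pow function rather than a ^ operator) might look like:

// Sketch only: Pow is one of NCalc's built-in functions.
var expression = new NCalc.Expression("x * Pow(y, z)");
expression.Parameters["x"] = 3.0;
expression.Parameters["y"] = 10.0;
expression.Parameters["z"] = 2.0;
Console.WriteLine(expression.Evaluate()); // 300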
Answer from "operators as strings" by user Matt Crouch (https://stackoverflow.com/users/1670022/matt-crouch), using built-in .NET functionality:
"If all you need is simple arithmetic, do this.
DataTable temp = new DataTable();
Console.WriteLine(temp.Compute("15 / 3",string.Empty));
EDIT: a little more information. Check out the MSDN documentation for the Expression property of the System.Data.DataColumn class. The section on "Expression Syntax" outlines a list of commands you can use in addition to the arithmetic operators (e.g. IIF, LEN, etc.).
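For example, a quick sketch of those expression-syntax functions with Compute (the values here are invented for illustration):

var temp = new System.Data.DataTable();
// IIF and LEN come from the DataColumn expression syntax
Console.WriteLine(temp.Compute("IIF(LEN('abcd') > 3, 'long', 'short')", string.Empty)); // prints: long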
EDIT 2: For convenience, you can put this into a little function like:
public string Eval(string expr)
{
    var temp = new System.Data.DataTable();
    string result = null;
    try
    {
        result = $"{temp.Compute(expr, string.Empty)}";
    }
    catch (System.Data.EvaluateException ex)
    {
        if (ex.Message.ToLower().Contains("cannot find column"))
            throw new System.Data.SyntaxErrorException($"Syntax error: Invalid expression: '{expr}'."
                + " Variables as operands are not supported.");
        else
            throw;
    }
    return result;
}
So you can use it like:
Console.WriteLine(Eval("15 * (3 + 5) / (7 - 2)"));
giving the expected output:
24
Note that the error handler deals with exceptions caused by using variables, which are not allowed here. Example: for Eval("a"), instead of surfacing "Cannot find column [a]", which doesn't make much sense in this context (we're not in a database context), it throws "Syntax error: Invalid expression: 'a'. Variables as operands are not supported."
Run it on DotNetFiddle
There are two main approaches to this problem, each with some variations, as illustrated in the variety of answers.
Option A: Find an existing mathematical expression evaluator
Option B: Write your own parser and the logic to compute the result
Before going into details, it is appropriate to stress that interpreting arbitrary mathematical expressions is not a trivial task, for any expression grammar other than "toy" grammars such as those that only accept one or two arithmetic operations and do not allow parentheses, etc.
Understanding that such a task is deceptively non-trivial, and acknowledging that interpreting arithmetic expressions of average complexity is a relatively recurrent need for various applications [hence one for which mature solutions should be available], it is probably wise to try and make do with "Option A".
I'd therefore second Jed's recommendation of a ready-made expression evaluator such as NCalc.
It may be useful, however, to take the time to understand the various concepts and methods associated with parsing and interpreting arithmetic expressions, as if one were going to write one's own implementation.
The key concept is that of a formal grammar. The arithmetic expressions which the evaluator will accept must follow a set of rules, such as the list of arithmetic operations allowed. For example, will the evaluator support, say, trigonometric functions, and if it does, will that also include, say, atan2()? The rules also indicate what constitutes an operand; for example, will it be allowed to input numerical values as big as, say, 45 digits? The point is that all these rules are formalized in a grammar.
Typically a grammar works on tokens which have previously been extracted from the raw input text. Essentially, at some point in the process, some logic needs to analyze the input string, character by character, and determine which sequences of characters go together. For example, in the expression 123 + 45 / 9.3, the tokens are the integer value 123, the plus operator, the integer value 45, the division operator and finally the real value 9.3. Identifying the tokens and associating them with a token type is the job of a lexer. Lexers can themselves be built on a grammar (one whose "tokens" are single characters, as opposed to the grammar for the arithmetic expression parser, whose tokens are the short strings produced by the lexer).
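To make the idea concrete, here is a minimal hand-rolled lexer sketch for expressions like 123 + 45 / 9.3 (all names here are invented for the illustration):

using System;
using System.Collections.Generic;

// Illustration only: a tiny hand-rolled lexer for expressions like "123 + 45 / 9.3".
static class TinyLexer
{
    public static IEnumerable<(string Type, string Text)> Tokenize(string input)
    {
        int i = 0;
        while (i < input.Length)
        {
            char c = input[i];
            if (char.IsWhiteSpace(c)) { i++; continue; }   // skip blanks
            if (char.IsDigit(c))                           // number: 123 or 9.3
            {
                int start = i;
                while (i < input.Length && (char.IsDigit(input[i]) || input[i] == '.')) i++;
                string text = input.Substring(start, i - start);
                yield return (text.IndexOf('.') >= 0 ? "real" : "integer", text);
            }
            else if ("+-*/".IndexOf(c) >= 0)               // operator token
            {
                yield return ("operator", c.ToString());
                i++;
            }
            else throw new FormatException("unexpected character: " + c);
        }
    }
}

Tokenizing "123 + 45 / 9.3" with this yields integer 123, operator +, integer 45, operator /, real 9.3, matching the walk-through above.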
BTW, grammars are used to define many other things beyond arithmetic expressions. Computer languages follow [rather sophisticated] grammars, and it is relatively common to introduce Domain Specific Languages (aka DSLs) in support of various features of computer applications.
For very simple grammars, one may be able to write the corresponding lexer and parser from scratch. But sooner or later the grammar may get complicated to the point that hand-writing these modules becomes tedious, bug-prone and, maybe more importantly, hard to read. Hence the existence of lexer and parser generators, which are stand-alone programs that produce the code of lexers and parsers (in a particular programming language such as C, Java or C#) from a list of rules (expressed in a syntax particular to the generator, though many generators tend to use similar syntaxes, loosely based on BNF).
When using such a lexer/parser generator, work is done in multiple steps:
- first, one writes a definition of the grammar (in the generator-specific language/syntax)
- one runs this grammar through the generator.
- one often repeats the above two steps multiple times, because writing a grammar is an exacting exercise: the generator will flag the many possible ambiguities one may write into the grammar.
- eventually the generator produces a source file (in the desired target language such as C#)
- this source is included in the overall project
- other source files in the project may invoke the functions exposed in the source files produced by the generator, and/or some logic corresponding to various patterns identified during parsing may be embedded directly in the generator-produced code.
- the project can then be built as usual, i.e. as if the parser and lexer had been hand-written.
And that's about it for a 20,000-foot view of the process of working with formal grammars and code generators.
A list of parser generators (aka compiler-compilers) can be found at this link. For simple work in C# I also want to mention Irony. It may be very insightful to peruse these sites to get a better feel for these concepts, even without intending to become a practitioner at this time.
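As a taste of Irony, a grammar for simple arithmetic might look roughly like this (a sketch assuming Irony's Grammar/NonTerminal/NumberLiteral API; check the project's own expression samples for the authoritative version):

using Irony.Parsing;

// Sketch of an Irony grammar for +, -, *, / with precedence and parentheses.
public class ArithGrammar : Grammar
{
    public ArithGrammar()
    {
        var number = new NumberLiteral("number");
        var expr = new NonTerminal("expr");
        var binOp = new NonTerminal("binOp");

        expr.Rule = number | expr + binOp + expr | "(" + expr + ")";
        binOp.Rule = ToTerm("+") | "-" | "*" | "/";

        RegisterOperators(1, "+", "-");   // lower precedence
        RegisterOperators(2, "*", "/");   // higher precedence
        MarkTransient(binOp);             // let operator precedence resolve conflicts
        MarkPunctuation("(", ")");
        Root = expr;
    }
}

With that in place, something like new Parser(new ArithGrammar()).Parse("2 + 5 * 3") should give back a parse tree (or a list of errors) without any hand-written lexer.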
As said, I wish to stress that for this particular application, a ready-made arithmetic evaluator is likely the better approach. The main downsides of these would be:
- some limitations on the allowed expression syntax (the grammar may be too restrictive: you also need, say, stddev(); or too broad: you don't want your users to use trig functions). With the more mature evaluators, there will be some form of configuration/extension feature which allows dealing with this problem.
- the learning curve of such a 3rd-party module. Hopefully many of them are relatively "plug-and-play".
Solved with this library: http://www.codeproject.com/Articles/21137/Inside-the-Mathematical-Expressions-Evaluator
My final code:
Calculator Cal = new Calculator();
txt_LambdaNoot.Text = (Cal.Evaluate(txt_C.Text) / fo).ToString();
Now when someone types 3*10^11, they will get 300000000000.
You will need to implement (or find a third-party source) an expression parser. This is not a trivial thing to do.
What you need - if you want to do it yourself - is a scanner (also known as a lexer) plus a parser in the code-behind which interprets the expression. Alternatively, you can find a 3rd-party library which does the job and works similarly to JavaScript's eval(string) function.
Please take a look here; it describes a recursive descent parser. The example is written in C, but you should be able to adapt it to C# easily once you have grasped the idea described in the article.
It is less complicated than it sounds, especially if you have a limited amount of operators to support.
The advantage is that you keep full control on what expressions will be executed (to prevent malicious code injections by the end-user of your website).
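To give an idea of the scale involved, a hand-written recursive-descent evaluator for +, -, *, / and ^ with parentheses fits in well under a hundred lines. The sketch below uses invented names and supports only numeric literals (no variables):

using System;
using System.Globalization;

// Sketch: a tiny recursive-descent evaluator; one method per grammar rule.
public class TinyEvaluator
{
    private readonly string _s;
    private int _pos;

    private TinyEvaluator(string s) { _s = s; }

    public static double Evaluate(string expression) => new TinyEvaluator(expression).ParseExpression();

    // expr := term (('+' | '-') term)*
    private double ParseExpression()
    {
        double value = ParseTerm();
        while (true)
        {
            if (Match('+')) value += ParseTerm();
            else if (Match('-')) value -= ParseTerm();
            else return value;
        }
    }

    // term := factor (('*' | '/') factor)*
    private double ParseTerm()
    {
        double value = ParseFactor();
        while (true)
        {
            if (Match('*')) value *= ParseFactor();
            else if (Match('/')) value /= ParseFactor();
            else return value;
        }
    }

    // factor := primary ('^' factor)?   -- right-associative exponentiation
    private double ParseFactor()
    {
        double value = ParsePrimary();
        return Match('^') ? Math.Pow(value, ParseFactor()) : value;
    }

    // primary := number | '(' expr ')'
    private double ParsePrimary()
    {
        if (Match('('))
        {
            double value = ParseExpression();
            if (!Match(')')) throw new FormatException("')' expected");
            return value;
        }
        SkipSpaces();
        int start = _pos;
        while (_pos < _s.Length && (char.IsDigit(_s[_pos]) || _s[_pos] == '.')) _pos++;
        if (start == _pos) throw new FormatException("number expected at position " + _pos);
        return double.Parse(_s.Substring(start, _pos - start), CultureInfo.InvariantCulture);
    }

    private bool Match(char c)
    {
        SkipSpaces();
        if (_pos < _s.Length && _s[_pos] == c) { _pos++; return true; }
        return false;
    }

    private void SkipSpaces()
    {
        while (_pos < _s.Length && char.IsWhiteSpace(_s[_pos])) _pos++;
    }
}

TinyEvaluator.Evaluate("3 * 10 ^ 2") returns 300; each grammar rule (expression, term, factor) gets its own method, which is what makes this approach easy to audit and to restrict.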
How do I get powers of a trig function in Math.net?
Expr x = Expr.Variable("x");
Expr g = (2 * x).Sinh().Pow(2);
g.ToString() gives the output: (sinh(2*x))^2
What I want is sinh^2(2*x)
How do I do that?
Edit:
As per Christoph's comment below, this can be done in v0.21.0, which implements Expr.ToCustomString(true).
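A hedged usage sketch based on that note (assuming Math.NET Symbolics v0.21.0 or later):

var x = Expr.Variable("x");
var g = (2 * x).Sinh().Pow(2);
Console.WriteLine(g.ToCustomString(true)); // expected: sinh^2(2*x), per the note above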
It does not currently support this notation. However, I see three options:
1. We define powers of trigonometric functions as new functions, or make them parametric. This is something I do not wish to do.
2. We introduce un-applied functions as a first-class concept, which can be manipulated in such ways. This is something we likely want to explore at some point, but it is a larger topic and likely overkill for what you need.
3. We extend our visual expressions to support positive integer powers of functions. This is something we could implement.
Option 3 would save one set of parentheses in such expressions, leading to a more compact rendering, especially for the LaTeX formatter, where the result would be more readable. When building a visual expression from an expression, it would automatically pull positive integer powers up to the applied function.
To my understanding your concern is only about how it is printed, so it seems to me this would solve your problem as well?
I need to locate a fast, lightweight expression parser.
Ideally I want to pass it a list of name/value pairs (e.g. variables) and a string containing the expression to evaluate. All I need back from it is a true/false value.
The types of expressions should be along the lines of:
varA == "xyz" and varB==123
Basically, just a simple logic engine whose expression is provided at runtime.
UPDATE
At minimum it needs to support ==, !=, >, >=, <, <=
Regarding speed, I expect roughly 5 expressions to be executed per request. We'll see somewhere in the vicinity of 100 requests a second. Our current pages tend to execute in under 50 ms. Usually there will only be 2 or 3 variables involved in any expression. However, I'll need to load approximately 30 into the parser prior to execution.
UPDATE 2012/11/5
Update about performance. We implemented nCalc nearly 2 years ago. Since then we've expanded its use such that we average 40+ expressions covering 300+ variables on postbacks. There are now thousands of postbacks occurring per second with absolutely zero performance degradation.
We've also extended it to include a handful of additional functions, again with no performance loss. In short, nCalc met all of our needs and exceeded our expectations.
Have you seen https://ncalc.codeplex.com/ and https://github.com/sheetsync/NCalc ?
It's extensible and fast (e.g. it has its own cache), and it enables you to provide custom functions and variables at run time by handling the EvaluateFunction/EvaluateParameter events. Example expressions it can parse:
Expression e = new Expression("Round(Pow(Pi, 2) + Pow([Pi2], 2) + X, 2)");
e.Parameters["Pi2"] = new Expression("Pi * Pi");
e.Parameters["X"] = 10;
e.EvaluateParameter += delegate(string name, ParameterArgs args)
{
    if (name == "Pi")
        args.Result = 3.14;
};
Debug.Assert(117.07 == e.Evaluate());
It also handles Unicode and many data types natively. It comes with an ANTLR grammar file if you want to change the grammar. There is also a fork which supports MEF to load new functions.
It also supports logical operators, date/time strings and if statements.
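So the kind of check from the question could, as a sketch, look like this (assuming NCalc's single-quoted string literals and its "and" operator):

var e = new NCalc.Expression("varA == 'xyz' and varB == 123");
e.Parameters["varA"] = "xyz";
e.Parameters["varB"] = 123;
Console.WriteLine((bool)e.Evaluate()); // True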
How about the Fast Lightweight Expression Evaluator? It lets you set variables and supports logical operators.
If you need something beefier and have the time, you could also design your own expression language with Irony.
Hisystems' Interpreter supports custom functions, operators and literals, and is lightweight, portable, pure-C# code. It currently runs on iOS via MonoTouch and should run on any other Mono environment as well as Windows. Free for commercial use. Available on GitHub at https://github.com/hisystems/Interpreter.
I fully appreciate how late this answer is, but I would like to throw in my solution because I believe it can add something beyond the accepted answer of using NCalc, should someone wish to use the expressions across multiple platforms.
-- Update --
I have created a parser for C# with plans to also implement it for Java and Swift over the next few months. This would mean that you could evaluate the expressions on multiple platforms without per-platform tweaking.
While Java and Swift were planned, they never made it into a fully fledged release. Instead there is now support for .NET Standard, enabling support for Xamarin apps.
-- End update --
Expressive is the tool and it is available at:
GitHub or Nuget.
The site has a fair amount of documentation on it, but to prevent link rot here is an example of how to use it:
Variable support
var expression = new Expression("1 * [variable]");
var result = expression.Evaluate(new Dictionary<string, object> { ["variable"] = 2 });
Functions
var expression = new Expression("sum(1,2,3,4)");
var result = expression.Evaluate();
It was designed to match NCalc as best as possible however it has added support for things like a 'null' keyword.
Self-promotion here: I wrote a generic parser generator for C#: https://github.com/b3b00/csly
You can find an expression parser as an example on my GitHub; you may need to customize it to fit your needs.
How do I go about writing a parser (recursive descent?) in C#? For now I just want a simple parser that parses arithmetic expressions (and reads variables?), though later I intend to write an XML and HTML parser (for learning purposes). I am doing this because of the wide range of stuff in which parsers are useful: web development, programming-language interpreters, in-house tools, gaming engines, map and tile editors, etc. So what is the basic theory of writing parsers, and how do I implement one in C#? Is C# the right language for parsers? (I once wrote a simple arithmetic parser in C++ and it was efficient; will JIT-compiled code prove equally good?) Any helpful resources and articles? And best of all, code examples (or links to code examples)?
Note: Out of curiosity, has anyone answering this question ever implemented a parser in C#?
I have implemented several parsers in C# - hand-written and tool generated.
A very good introductory tutorial on parsing in general is Let's Build a Compiler - it demonstrates how to build a recursive descent parser; and the concepts are easily translated from his language (I think it was Pascal) to C# for any competent developer. This will teach you how a recursive descent parser works, but it is completely impractical to write a full programming language parser by hand.
If you are determined to write a classical recursive descent parser, you should look into tools that generate the code for you (TinyPG, Coco/R, Irony). Keep in mind that there are other ways to write parsers now that usually perform better and have easier definitions (e.g. TDOP parsing or monadic parsing).
On the topic of whether C# is up for the task: C# has some of the best text libraries out there. A lot of today's parsers (in other languages) contain an obscene amount of code to deal with Unicode etc. I won't comment too much on JITted code because it can get quite religious; however, you should be just fine. IronJS is a good example of a parser/runtime on the CLR (even though it's written in F#), and its performance is just shy of Google V8.
Side Note: Markup parsers are completely different beasts when compared to language parsers - they are, in the majority of the cases, written by hand - and at the scanner/parser level very simple; they are not usually recursive descent - and especially in the case of XML it is better if you don't write a recursive descent parser (to avoid stack overflows, and because a 'flat' parser can be used in SAX/push mode).
Sprache is a powerful yet lightweight framework for writing parsers in .NET. There is also a Sprache NuGet package. To give you an idea of the framework, here is one of the samples that parses a simple arithmetic expression into a .NET expression tree. Pretty amazing, I would say.
using System;
using System.Linq.Expressions;
using Sprache;

namespace LinqyCalculator
{
    static class ExpressionParser
    {
        public static Expression<Func<decimal>> ParseExpression(string text)
        {
            return Lambda.Parse(text);
        }

        static Parser<ExpressionType> Operator(string op, ExpressionType opType)
        {
            return Parse.String(op).Token().Return(opType);
        }

        static readonly Parser<ExpressionType> Add = Operator("+", ExpressionType.AddChecked);
        static readonly Parser<ExpressionType> Subtract = Operator("-", ExpressionType.SubtractChecked);
        static readonly Parser<ExpressionType> Multiply = Operator("*", ExpressionType.MultiplyChecked);
        static readonly Parser<ExpressionType> Divide = Operator("/", ExpressionType.Divide);

        static readonly Parser<Expression> Constant =
            (from d in Parse.Decimal.Token()
             select (Expression)Expression.Constant(decimal.Parse(d))).Named("number");

        static readonly Parser<Expression> Factor =
            ((from lparen in Parse.Char('(')
              from expr in Parse.Ref(() => Expr)
              from rparen in Parse.Char(')')
              select expr).Named("expression")
             .XOr(Constant)).Token();

        static readonly Parser<Expression> Term = Parse.ChainOperator(Multiply.Or(Divide), Factor, Expression.MakeBinary);

        static readonly Parser<Expression> Expr = Parse.ChainOperator(Add.Or(Subtract), Term, Expression.MakeBinary);

        static readonly Parser<Expression<Func<decimal>>> Lambda =
            Expr.End().Select(body => Expression.Lambda<Func<decimal>>(body));
    }
}
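A hypothetical usage of the class above: the parsed lambda is an ordinary expression tree, so it can be compiled and invoked directly.

var expr = ExpressionParser.ParseExpression("1 + 2 * 3");
Func<decimal> compiled = expr.Compile();
Console.WriteLine(compiled()); // 7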
C# is almost a decent functional language, so it is not such a big deal to implement something like Parsec in it. Here is one of the examples of how to do it: http://jparsec.codehaus.org/NParsec+Tutorial
It is also possible to implement a combinator-based Packrat parser in a very similar way, but this time keeping a global parsing state somewhere instead of staying purely functional. In my (very basic and ad hoc) implementation it was reasonably fast, but of course a code generator like this must perform better.
I know that I am a little late, but I just published a parser/grammar/AST generator library named Ve Parser. You can find it at http://veparser.codeplex.com or add it to your project by typing 'Install-Package veparser' in the Package Manager Console. This library is a kind of recursive descent parser that is intended to be easy to use and flexible. As its source is available to you, you can learn from it. I hope it helps.
In my opinion, there is a better way to implement parsers than the traditional methods, one that results in simpler, easier-to-understand code and that especially makes it easier to extend whatever language you are parsing by just plugging in a new class, in a very object-oriented way. One article of a larger series that I wrote focuses on this parsing method, and full source code is included for a C# 2.0 parser:
http://www.codeproject.com/Articles/492466/Object-Oriented-Parsing-Breaking-With-Tradition-Pa
Well... where to start with this one....
First off, writing a parser, well, that's a very broad statement, especially with the question you're asking.
Your opening statement was that you wanted a simple arithmetic "parser"; well, technically that's not a parser, it's a lexical analyzer, similar to what you may use for creating a new language ( http://en.wikipedia.org/wiki/Lexical_analysis ). I understand, however, exactly where the confusion of them being the same thing may come from. It's important to note that lexical analysis is ALSO what you'll want to understand if you're going to write language/script parsers too; this is strictly not parsing, because you are interpreting the instructions as opposed to making use of them.
Back to the parsing question....
This is what you'll be doing if you're taking a rigidly defined file structure and extracting information from it.
In general you really don't have to write a parser for XML/HTML, because there are already a ton of them around; even more so, if you're parsing XML produced by the .NET runtime, then you don't even need to parse, you just need to "serialise" and "de-serialise".
In the interests of learning, however, parsing XML (or anything similar, like HTML) is very straightforward in most cases.
If we start with the following XML:
<movies>
  <movie id="1">
    <name>Tron</name>
  </movie>
  <movie id="2">
    <name>Tron Legacy</name>
  </movie>
</movies>
we can load the data into an XDocument as follows:
XDocument myXML = XDocument.Load("mymovies.xml");
you can then get at the 'movies' root element using 'myXML.Root'
More interestingly, however, you can easily use LINQ to get at the nested tags:
var myElements = from p in myXML.Root.Elements("movie")
                 select p;
This will give you a collection of XElements, each containing one '<movie>' element, which you can get at using something like:
foreach (var v in myElements)
{
    Console.WriteLine(string.Format("ID {0} = {1}", (int)v.Attribute("id"), (string)v.Element("name")));
}
For anything other than XML-like data structures, I'm afraid you're going to have to start learning the art of regular expressions. A tool like the "Regular Expression Coach" will help you immensely ( http://weitz.de/regex-coach/ ), as will one of the more up-to-date similar tools.
You'll also need to become familiar with the .NET regular expression objects, ( http://www.codeproject.com/KB/dotnet/regextutorial.aspx ) should give you a good head start.
Once you know how your regex works, it's in most cases a simple matter of reading in the files one line at a time and making sense of them using whichever method you feel comfortable with.
A good free source of file formats for almost anything you can imagine can be found at ( http://www.wotsit.org/ )
For the record, I implemented a parser generator in C# because I couldn't find any that worked properly or that was similar to YACC (see: http://sourceforge.net/projects/naivelangtools/).
However, after some experience with ANTLR, I decided to go with LALR instead of LL. I know that theoretically LL is easier to implement (whether the generator or the parser), but I simply cannot live with a stack of expressions just to express operator priorities (like * binding before + in "2+5*3"). In LL you say that mult_expr is embedded inside add_expr, which does not seem natural to me.
I need to enable the user to write his own formula in a DataGridView, something like a function in Excel.
Example of formula definition:
So, the user writes his own formula in the formula cell, and then the result for each row is shown in another table. How can I do this?
I would try NCalc
NCalc is a mathematical expressions evaluator in .NET. NCalc can parse any expression and evaluate the result, including static or dynamic parameters and custom functions.
Dictionary<string, int> dict = new Dictionary<string, int>() { { "Income", 1000 }, { "Tax", 5 } };
string expressionString = "Income * Tax";
NCalc.Expression expr = new NCalc.Expression(expressionString);
expr.EvaluateParameter += (name, args) =>
{
    args.Result = dict[name];
};
int result = (int)expr.Evaluate();
Your formula could be translated to C# and dynamically compiled using System.CodeDom.Compiler, and you could run it on the fly, feeding in your variable values.
Otherwise you are going to have to implement some kind of mini parser/compiler, which is a rather particular skill and which could quickly get complicated, especially if your formulas become more complicated (which is likely).
There are CodeProject articles on dynamic compilation here and here. But there are plenty of other examples around on the web.
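As a rough sketch of that dynamic-compilation route (the Formula class and Eval method here are invented for illustration; the CodeDom types are the classic .NET Framework ones):

using System;
using System.CodeDom.Compiler;
using Microsoft.CSharp;

class FormulaRunner
{
    static void Main()
    {
        // Hypothetical generated source wrapping the user's formula.
        string source = @"
            public static class Formula
            {
                public static double Eval(double income, double tax) { return income * tax; }
            }";

        using (var provider = new CSharpCodeProvider())
        {
            var parameters = new CompilerParameters { GenerateInMemory = true };
            CompilerResults results = provider.CompileAssemblyFromSource(parameters, source);
            if (results.Errors.HasErrors)
                throw new InvalidOperationException("formula failed to compile");

            var eval = results.CompiledAssembly.GetType("Formula").GetMethod("Eval");
            Console.WriteLine(eval.Invoke(null, new object[] { 1000.0, 5.0 })); // 5000
        }
    }
}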
There are a number of ways you can do that, they all revolve around translating the formula into executable code.
Do you want to write your own parser, or do you want to use an existing one? C# itself, IronPython, IronRuby, some off-the-shelf component. If you use a full parser, you might want to look at how to restrict what the user can do with it, inadvertently or otherwise...
If they are as simple as they look, some sort of expression builder (pick two named values and an operator) might be the way to go; but modularise both building the expression and evaluating it, so you can beef things up at some later point.
However, given how simple they appear to be, I'd be tempted to predefine expressions (loaded as metadata from some sort of backing store) and make the user select one of these as opposed to entering it. You could easily spend months on this aspect of the design; is it worth it?
I had a similar requirement (dynamic parsing of expressions) recently in a project I am working on and ended up using VB expressions from WF (Windows Workflow Foundation). It certainly depends on how important this functionality is to you and how much effort you are willing to put into it. In my case it turned out better than NCalc for several reasons:
it supports more complex expressions than NCalc
the resulting expression trees can be analyzed to determine dependencies for individual expressions
it's the same language for expressions as in WF (which I already use elsewhere in the project)
Anyway, here is a short blogpost I've written on the subject.
I created the Albatross expression parser a few years back. It has been open source for a while, but I finally got around to publishing v2.02 and adding documentation recently. It is being actively maintained. It has a couple of nice features:
Circular reference detection
Source variable from external object directly
Reversed generation of expressions
Properly documented
For all you compiler gurus, I wanna write a recursive descent parser and I wanna do it with just code. No generating lexers and parsers from some other grammar, and don't tell me to read the dragon book; I'll come around to that eventually.
I wanna get into the gritty details about implementing a lexer and parser for a reasonable simple language, say CSS. And I wanna do this right.
This will probably end up being a series of questions but right now I'm starting with a lexer. Tokenization rules for CSS can be found here.
I find myself writing code like this (hopefully you can infer the rest from this snippet):
public CssToken ReadNext()
{
    int val;
    while ((val = _reader.Read()) != -1)
    {
        var c = (char)val;
        switch (_stack.Top)
        {
            case ParserState.Init:
                if (c == ' ')
                {
                    continue; // ignore
                }
                else if (c == '.')
                {
                    _stack.Transition(ParserState.SubIdent, ParserState.Init);
                }
                break;
            case ParserState.SubIdent:
                if (c == '-')
                {
                    _token.Append(c);
                }
                _stack.Transition(ParserState.SubNMBegin);
                break;
What is this called? And how far off am I from something reasonably well understood? I'm trying to balance something which is fair in terms of efficiency and easy to work with; using a stack to implement some kind of state machine is working quite well, but I'm unsure how to continue like this.
What I have is an input stream, from which I can read one character at a time. I don't do any lookahead right now; I just read a character and then, depending on the current state, try to do something with it.
I'd really like to get into the mindset of writing reusable snippets of code. This Transition method is currently meant to do that: it will pop the current state off the stack and then push the arguments in reverse order. That way, when I write Transition(ParserState.SubIdent, ParserState.Init), it will "call" a subroutine SubIdent which will, when complete, return to the Init state.
The parser will be implemented in much the same way. Currently, having everything in a single big method like this allows me to easily return a token when I find one, but it also forces me to keep everything in one single big method. Is there a nice way to split these tokenization rules into separate methods?
What you're writing is called a pushdown automaton. This is usually more power than you need to write a lexer, it's certainly excessive if you're writing a lexer for a modern language like CSS. A recursive descent parser is close in power to a pushdown automaton, but recursive descent parsers are much easier to write and to understand. Most parser generators generate pushdown automatons.
Lexers are almost always written as finite state machines, i.e., like your code except get rid of the "stack" object. Finite state machines are closely related to regular expressions (actually, they're provably equivalent to one another). When designing such a parser, one usually starts with the regular expressions and uses them to create a deterministic finite automaton, with some extra code in the transitions to record the beginning and end of each token.
There are tools to do this. The lex tool and its descendants are well known and have been translated into many languages. The ANTLR toolchain also has a lexer component. My preferred tool is ragel on platforms that support it. There is little benefit to writing a lexer by hand most of the time, and the code generated by these tools will probably be faster and more reliable.
If you do want to write your own lexer by hand, good ones often look something like this:
function readToken() // note: returns only one token each time
while !eof
c = peekChar()
if c in A-Za-z
return readIdentifier()
else if c in 0-9
return readInteger()
else if c in ' \n\r\t\v\f'
nextChar()
...
return EOF
function readIdentifier()
ident = ""
while !eof
c = nextChar()
if c in A-Za-z0-9
ident.append(c)
else
return Token(Identifier, ident)
// or maybe...
return Identifier(ident)
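In C#, that sketch might translate roughly like this (Token and TokenType are hypothetical types, and ReadInteger would be analogous to ReadIdentifier):

// Hypothetical Token/TokenType types; only the shape of the loop matters here.
public Token ReadToken(System.IO.TextReader reader)
{
    while (reader.Peek() != -1)
    {
        char c = (char)reader.Peek();
        if (char.IsLetter(c)) return ReadIdentifier(reader);
        if (char.IsDigit(c)) return ReadInteger(reader);
        if (char.IsWhiteSpace(c)) { reader.Read(); continue; } // skip whitespace
        throw new System.FormatException("unexpected character: " + c);
    }
    return new Token(TokenType.Eof, "");
}

private Token ReadIdentifier(System.IO.TextReader reader)
{
    var ident = new System.Text.StringBuilder();
    while (reader.Peek() != -1 && char.IsLetterOrDigit((char)reader.Peek()))
        ident.Append((char)reader.Read());
    return new Token(TokenType.Identifier, ident.ToString());
}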
Then you can write your parser as a recursive descent parser. Don't try to combine lexer / parser stages into one, it leads to a total mess of code. (According to the Parsec author, it's slower, too).
You need to write your own Recursive Descent Parser from your BNF/EBNF. I had to write my own recently and this page was a lot of help. I'm not sure what you mean by "with just code". Do you mean you want to know how to write your own recursive parser?
If you want to do that, you need to have your grammar in place first. Once you have your EBNF/BNF in place, the parser can be written quite easily from it.
The first thing I did when I wrote my parser was to read everything in and then tokenize the text. So I essentially ended up with an array of tokens that I treated as a stack. To reduce the verbosity/overhead of pulling a value off the stack and then pushing it back on if you don't need it, you can have a peek method that simply returns the top value on the stack without popping it.
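That token-stack-with-peek idea is small enough to sketch in C# (names here are invented for the example):

// Hypothetical token stream: Peek inspects the next token, Next consumes it.
public class TokenStream
{
    private readonly System.Collections.Generic.List<string> _tokens;
    private int _pos;

    public TokenStream(System.Collections.Generic.IEnumerable<string> tokens)
    {
        _tokens = new System.Collections.Generic.List<string>(tokens);
    }

    public string Peek() { return _pos < _tokens.Count ? _tokens[_pos] : null; }
    public string Next() { return _pos < _tokens.Count ? _tokens[_pos++] : null; }
}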
UPDATE
Based on your comment, I had to write a recursive-descent parser in Javascript from scratch. You can take a look at the parser here. Just search for the constraints function. I wrote my own tokenize function to tokenize the input as well. I also wrote another convenience function (peek, that I mentioned before). The parser parses according to the EBNF here.
This took me a little while to figure out because it had been years since I wrote a parser (the last time was in school!), but trust me, once you get it, you get it. I hope my example gets you further along on your way.
ANOTHER UPDATE
I also realized that my example may not be what you want because you might be going towards using a shift-reduce parser. You mentioned that right now you are trying to write a tokenizer. In my case, I did write my own tokenizer in Javascript. It's probably not robust, but it was sufficient for my needs.
function tokenize(options) {
    var str = options.str;
    var delimiters = options.delimiters.split("");
    var returnDelimiters = options.returnDelimiters || false;
    var returnEmptyTokens = options.returnEmptyTokens || false;
    var tokens = new Array();
    var lastTokenIndex = 0;

    for(var i = 0; i < str.length; i++) {
        if(exists(delimiters, str[i])) {
            var token = str.substring(lastTokenIndex, i);

            if(token.length == 0) {
                if(returnEmptyTokens) {
                    tokens.push(token);
                }
            }
            else {
                tokens.push(token);
            }

            if(returnDelimiters) {
                tokens.push(str[i]);
            }

            lastTokenIndex = i + 1;
        }
    }

    if(lastTokenIndex < str.length) {
        var token = str.substring(lastTokenIndex, str.length);
        token = token.replace(/^\s+/, "").replace(/\s+$/, "");

        if(token.length == 0) {
            if(returnEmptyTokens) {
                tokens.push(token);
            }
        }
        else {
            tokens.push(token);
        }
    }

    return tokens;
}
Based on your code, it looks like you are reading, tokenizing, and parsing at the same time - I'm assuming that's what a shift-reduce parser does? The flow for what I have is tokenize first to build the stack of tokens, and then send the tokens through the recursive-descent parser.
If you are going to hand-code everything from scratch, I would definitely consider going with a recursive descent parser. In your post you are not really saying what you will be doing with the token stream once you have parsed the source.
Some things I would recommend getting a handle on
1. Good design for your scanner/lexer, this is what will be tokenizing your source code for your parser.
2. The next thing is the parser. If you have a good EBNF for the source language, the parser can usually be translated quite nicely into a recursive descent parser.
3. Another data structure you will really need to get your head around is the symbol table. This can be as simple as a hashtable or as complex as a tree structure that can represent complex record structures etc. I think for CSS you might be somewhere between the two.
4. And finally you want to deal with code generation. You have many options here. For an interpreter, you might simply interpret on the fly as you parse the code. A better approach might be to generate a form of i-code that you can then write an interpreter for, and later even a compiler. Of course for the .NET platform you could directly generate IL (probably not applicable for CSS :))
For references: I gather you are not heavy into the deep theory, and I do not blame you. A really good starting point for getting the basics without complex code (if you do not mind the Pascal, that is) is Jack Crenshaw's 'Let's Build a Compiler':
http://compilers.iecc.com/crenshaw/
Good luck I am sure you are going to enjoy this project.
It looks like you want to implement a "shift-reduce" parser, where you explicitly build a token stack. The usual alternative is a "recursive descent" parser, in which the depth of procedure calls builds the same token stack, with their own local variables, on the actual hardware stack.
In shift-reduce, the term "reduce" refers to the operation performed on the explicitly-maintained token stack. For example, if the top of the stack has become Term, Operator, Term then a reduction rule can be applied resulting in Expression as a replacement for the pattern. The reduction rules are explicitly encoded in a data structure used by the shift-reduce parser; as a result, all reduction rules can be found in the same spot of the source code.
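As a toy illustration of one such reduction step (the symbol names are invented for the sketch):

// Sketch: reduce the top-of-stack pattern [Term, Operator, Term] to Expression.
var stack = new System.Collections.Generic.List<string> { "Term", "Operator", "Term" };
int n = stack.Count;
if (n >= 3 && stack[n - 3] == "Term" && stack[n - 2] == "Operator" && stack[n - 1] == "Term")
{
    stack.RemoveRange(n - 3, 3);   // pop the matched pattern
    stack.Add("Expression");       // push the reduction result
}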
The shift-reduce approach brings a few benefits compared to recursive-descent. On a subjective level, my opinion is that shift-reduce is easier to read and maintain than recursive-descent. More objectively, shift-reduce allows for more informative error messages from the parser when an unexpected token occurs.
Specifically, because the shift-reduce parser has an explicit encoding of rules for making "reductions," the parser is easily extended to articulate what sorts of tokens could legally have followed. (e.g., "; expected"). A recursive descent implementation cannot easily be extended to do the same thing.
A great book on both kinds of parser, and the trade-offs in implementing different kinds of shift-reduce is "Introduction to Compiler Construction", by Thomas W. Parsons.
Shift-reduce is sometimes called "bottom-up" parsing and recursive-descent is sometimes called "top-down" parsing. In the analogy used, nodes composed with highest precedence (e.g., "factors" in multiplication expression) are considered to be "at the bottom" of the parsing. This is in accord with the same analogy used in "descent" of "recursive descent".
If you want to use the parser to also handle not-well-formed expressions, you really want a recursive descent parser. Much easier to get the error handling and reporting usable.
For literature, I'd recommend some of the old work of Niklaus Wirth. He knows how to write. Algorithms + Data Structures = Programs is what I used, but you can find his Compiler Construction online.