Fun things to do with generators and sequences - C#

Does anyone else feel that iterators come up short when you want to take apart a sequence piece by piece?
Maybe I should just start writing my code in F# (by the way, does anybody know if F# uses lazy evaluation?), but I've found myself wanting a way to pull at a sequence in a very distinct way.
e.g.
// string.Split implemented as a generator, with lazy evaluation
var it = "a,b,c".GetSplit(',').GetEnumerator();
it.MoveNext();
var a = it.Current;
it.MoveNext();
it.MoveNext();
var c = it.Current;
That works, but I don't like it; it's ugly. So can I do this?
var it = "a,b,c".GetSplit(',');
string a;
var c = it.Yield(out a).Skip(1).First();
That's better. But I'm wondering if there's another way of generalizing the same semantics; maybe this is good enough. Usually I'm doing some embedded string parsing; that's when this pops up.
There's also the case where I wish to consume a sequence up to a specific point, then basically fork it (or clone it, that's better). Like so:
var s = "abc";
IEnumerable<string> a;
var b = s.Skip(1).Fork(out a);
var s2 = new string(a.ToArray()); // "bc"
var s3 = new string(b.ToArray()); // "bc"
This last one might not seem that useful at first, but I find that it solves backtracking issues rather conveniently.
My question is: do we need this? Or does it already exist in some manner and I've just missed it?

Sequences basically work OK at what they do, which is to provide a simple interface that yields a stream of values on demand. If you have more complicated demands then you're welcome to use a more powerful interface.
For instance, your string examples look like they could benefit from being written as a parser: that is, a function that consumes a sequence of characters from a stream and uses internal state to keep track of where it is in the stream.
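For what it's worth, the Yield-style helper from the question can be written as a small extension method. Here is a minimal sketch; the name Yield and its out-parameter shape follow the question's hypothetical API (this is not a BCL method), and the sketch assumes a non-empty source:
using System;
using System.Collections.Generic;

static class SequenceExtensions
{
    // Pulls the first element into 'first' and returns the rest of the sequence lazily.
    public static IEnumerable<T> Yield<T>(this IEnumerable<T> source, out T first)
    {
        var e = source.GetEnumerator();
        if (!e.MoveNext())
            throw new InvalidOperationException("Sequence contains no elements.");
        first = e.Current;
        return Remainder(e); // the out parameter forces the lazy part into a separate iterator method
    }

    private static IEnumerable<T> Remainder<T>(IEnumerator<T> e)
    {
        using (e)
        {
            while (e.MoveNext())
                yield return e.Current;
        }
    }
}
With that in place (plus System.Linq for Skip and First), the question's var c = it.Yield(out a).Skip(1).First(); compiles as written.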

Related

Is it advisable to use tokens for the purpose of syntax highlighting?

I'm trying to implement syntax highlighting in C# on Android, using Xamarin. I'm using the ANTLR v4 library for C# to achieve this. My code, which is currently syntax highlighting Java with this grammar, does not attempt to build a parse tree and use the visitor pattern. Instead, I simply convert the input into a list of tokens:
private static IList<IToken> Tokenize(string text)
{
    var inputStream = new AntlrInputStream(text);
    var lexer = new JavaLexer(inputStream);
    var tokenStream = new CommonTokenStream(lexer);
    tokenStream.Fill();
    return tokenStream.GetTokens();
}
Then I loop through all of the tokens in the highlighter and assign a color to them based on their kind.
public void HighlightAll(IList<IToken> tokens)
{
    int tokenCount = tokens.Count;
    for (int i = 0; i < tokenCount; i++)
    {
        var token = tokens[i];
        var kind = GetSyntaxKind(token);
        HighlightNext(token, kind);
        if (kind == SyntaxKind.Annotation)
        {
            var nextToken = tokens[++i];
            Debug.Assert(token.Text == "#" && nextToken.Type == Identifier);
            HighlightNext(nextToken, SyntaxKind.Annotation);
        }
    }
}

public void HighlightNext(IToken token, SyntaxKind tokenKind)
{
    int count = token.Text.Length;
    if (token.Type != -1)
    {
        _text.SetSpan(_styler.GetSpan(tokenKind), _index, _index + count, SpanTypes.InclusiveExclusive);
        _index += count;
    }
}
Initially, I figured this was wise because syntax highlighting is largely context-independent. However, I have already found myself needing to special-case identifiers in front of #, since I want those to get highlighted as annotations just as on GitHub (example). GitHub has further examples of coloring identifiers in certain contexts: here, List and ArrayList are colored, while mItems is not. I will likely have to add further code to highlight identifiers in those scenarios.
My question is, is it a good idea to examine tokens rather than a parse tree here? On one hand, I'm worried that I might have to end up doing a lot of special-casing for when a token's neighbors alter how it should be highlighted. On the other, parsing will add additional overhead for memory-constrained mobile devices, and make it more complicated to implement efficient syntax highlighting (e.g. not re-tokenizing/parsing everything) when the user edits text in the code editor. I also found it significantly less complicated to handle all of the token types rather than the parser rule types, because you just switch on token.Type rather than overriding a bunch of Visit* methods.
For reference, the full code of the syntax highlighter is available here.
It depends on what you are syntax highlighting.
If you use a naive parser, then any syntax error in the text will cause highlighting to fail. That makes it quite a fragile solution, since a lot of the texts you might want to syntax highlight are not guaranteed to be correct (particularly user input, which at best will not be correct until it is fully typed). Since syntax highlighting can help make syntax errors visible and is often used for that purpose, failing completely on syntax errors is counter-productive.
Text with errors does not readily fit into a syntax tree. But it does have more structure than a stream of tokens. Probably the most accurate representation would be a forest of subtree fragments, but that is an even more awkward data structure to work with than a tree.
Whatever the solution you choose, you will end up negotiating between conflicting goals: complexity vs. accuracy vs. speed vs. usability. A parser may be part of the solution, but so may ad hoc pattern matching.
Your approach is totally fine and pretty much what everybody's using. It's totally normal to fine-tune type matching by looking around (and it's cheap, since the token types are cached). So you can always just look back or ahead in the token stream if you need to adjust the SyntaxKind actually used, as sketched below. Don't start parsing your input; it won't help you.
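To make that concrete, here is a minimal sketch of that kind of look-around. It reuses the question's "#" annotation marker and its GetSyntaxKind, SyntaxKind, and Identifier names; the method name itself is illustrative:
// Adjusts a token's SyntaxKind by peeking at its neighbour in the token list.
private SyntaxKind GetAdjustedSyntaxKind(IList<IToken> tokens, int i)
{
    var token = tokens[i];
    // An identifier directly preceded by the annotation marker is highlighted as an annotation.
    if (token.Type == Identifier && i > 0 && tokens[i - 1].Text == "#")
        return SyntaxKind.Annotation;
    // Otherwise fall back to the plain per-token mapping.
    return GetSyntaxKind(token);
}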
I ended up choosing to use a parser because there were too many ad hoc rules. For example, although I wanted to color regular identifiers white, I wanted types in type declarations (e.g. C in class C) to be green. There ended up being about 20 of these special rules in total. Also, the added overhead of parsing turned out to be minuscule compared to other bottlenecks in my app.
For those interested, you can view my code here: https://github.com/jamesqo/Repository/blob/e5d5653093861bc35f4c0ac71ad6e27265e656f3/Repository.EditorServices/Internal/Java/Highlighting/JavaSyntaxHighlighter.VisitMethods.cs#L19-L76. I've highlighted all of the ~20 special rules I've had to make.

why is this code bad?

Below is some C# code for an RSS reader. Why is this code bad? This class generates a list of the 5 most recent posts, sorted by title. What do you use to analyze code in C#?
static Story[] Parse(string content)
{
    var items = new List<string>();
    int start = 0;
    while (true)
    {
        var nextItemStart = content.IndexOf("<item>", start);
        var nextItemEnd = content.IndexOf("</item>", nextItemStart);
        if (nextItemStart < 0 || nextItemEnd < 0) break;
        String nextItem = content.Substring(nextItemStart, nextItemEnd + 7 - nextItemStart);
        items.Add(nextItem);
        start = nextItemEnd;
    }
    var stories = new List<Story>();
    for (byte i = 0; i < items.Count; i++)
    {
        stories.Add(new Story()
        {
            title = Regex.Match(items[i], "(?<=<title>).*(?=</title>)").Value,
            link = Regex.Match(items[i], "(?<=<link>).*(?=</link>)").Value,
            date = Regex.Match(items[i], "(?<=<pubDate>).*(?=</pubdate>)").Value
        });
    }
    return stories.ToArray();
}
Why not use XmlReader, XmlDocument, or LINQ to XML?
It is bad because it's using string parsing when there are excellent classes in the framework for parsing XML. Even better, there are classes to deal with RSS feeds.
ETA:
Sorry to not have answered your second question earlier. There are a great number of tools to analyze correctness and quality of C# code. There's probably a huge list compiled somewhere, but here's a few I use on a daily basis to help ensure quality code:
StyleCop (code formatting standards)
Resharper (idiomatic programming, gotcha catching)
FxCop (code correctness, standards adherence, idiomatic programming)
Pex (white box testing)
Nitriq (code quality metrics)
NUnit (unit testing)
You shouldn't parse XML with string functions and regular expressions. XML can get very complicated and be formatted many ways that a real XML parser like XmlReader can handle, but will break your simple string parsing code.
Basically: don't try and reinvent the wheel (an xml parser), especially when you don't realize how complicated that wheel actually is.
I think the worst thing about the code is the performance issue. You should parse the XML string into an XDocument (or a similar structure) instead of parsing it again and again using regex.
Simply put, you are reinventing an XML parser. Use LINQ to XML instead; it is very simple and clean. I am sure you can do all of the above in three lines of code with LINQ to XML. Your code also uses a lot of magic numbers (e.g. the 7 in nextItemEnd + 7 - nextItemStart), which makes it fragile and non-generic.
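For illustration, a rough sketch of the LINQ to XML version might look like this (the Story field names are taken from the question's code; error handling is omitted):
using System.Linq;
using System.Xml.Linq;

static Story[] Parse(string content)
{
    // Let XDocument handle the XML; pull the fields straight out of each <item>.
    return XDocument.Parse(content)
        .Descendants("item")
        .Select(item => new Story
        {
            title = (string)item.Element("title"),
            link = (string)item.Element("link"),
            date = (string)item.Element("pubDate")
        })
        .ToArray();
}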
For starters, it uses byte as an indexer instead of int (what if items has more items in it than a byte can represent?). It doesn't use idiomatic C# (see user1645569's response). It also unnecessarily uses var instead of specific data types (that's more stylistic, though; I prefer not to, hence by my metric it's not ideal (and you've given no other metric)).
Let me clarify what I am saying about "unnecessarily using var": var in and of itself is not bad, and I am not suggesting that. I am (mostly) suggesting the usage here isn't very consistent. For example, explicitly declaring start as an int, but then declaring nextItemEnd as var (which will deduce to int) and assigning nextItemEnd to start seems (to me) like a weird mixture between wanting to automatically deduce a variable's type and explicitly declaring it. I think it's good that var was not used in start's declaration (because then it's not exactly clear if the intent is an integer or a floating point number), but I (personally) don't think it helps any to declare nextItemStart and nextItemEnd as var. I tend to prefer to use var for more complex/longer data types (similar to how I use auto in C++ for iterators, but not for "simpler" data types).

Clean and natural scripting functionality without parsing

I'm experimenting with creating a semi-natural scripting language, mostly for my own learning purposes, and for fun. The catch is that it needs to be in native C#, no parsing or lexical analysis on my part, so whatever I do needs to be able to be done through normal syntactical sugar.
I want it to read somewhat like a sentence would, so that it is easy to read and learn, especially for those that aren't especially fluent with programming, but I also want the full functionality of native code available to the user.
For example, in the perfect world it would look like a natural language (English in this case):
When an enemy is within 10 units of player, the enemy attacks the player
In C#, allowing a sentence like this to actually do what the scripter intends would almost certainly require that it be a string run through a parser and lexical analyzer. My goal isn't to have something this natural, and I don't want the scripter to be using strings to script. I want the scripter to have full access to C#, and to have things like syntax highlighting, IntelliSense, debugging in the IDE, etc. So what I'm trying to get is something that reads easily but is in native C#. A couple of the major hurdles that I don't see a way to overcome are getting rid of periods (.), commas (,), and parentheses for empty methods (()). For example, something like this is feasible but doesn't read very cleanly:
// C#
When(Enemy.Condition(Conditions.isWithinDistance(Enemy, Player, 10))), Event(Attack(Enemy, Player))
Using a language like Scala you can actually get much closer, because periods and parentheses can be replaced by a single whitespace in many cases. For example, you could take the above statement and make it look something like this in Scala:
// Scala
When(Enemy is WithinDistance(Player, 10)) => Then(Attack From(Enemy, Player))
The above code would actually compile, assuming you set up your engine to handle it; in fact, you might be able to coax further parentheses and commas out of it. Without the syntactical sugar in the above example it would be more like this, in Scala:
// Scala (without syntactical sugar)
When(Enemy.is(WithinDistance(Player, 10)) => Then(Attack().From(Enemy, Player))
The bottom line is I want to get as close as possible to something like the first scala example using native C#. It may be that there is really nothing I can do, but I'm willing to try any tricks that may be possible to make it read more natural, and get the periods, parentheses, and commas out of there (except when they make sense even in natural language).
I'm not as experienced with C# as with other languages, so I might not know about some syntax tricks that are available, like macros in C++. Not that macros would actually be a good solution; they would probably cause more problems than they would solve, and would be a debugging nightmare, but you get where I'm going with this: at least in C++ it would be feasible. Is what I'm wanting even possible in C#?
Here's an example: using LINQ and lambda expressions you can sometimes get the same amount of work done with fewer lines, fewer symbols, and code that reads closer to English. For example, here are three collisions that happen between pairs of objects with IDs; we want to gather all collisions with the object that has ID 5, then sort those collisions by the "first" ID in the pair, and then output the pairs. Here is how you would do this without LINQ and/or lambda expressions:
struct CollisionPair : IComparable, IComparer
{
    public int first;
    public int second;

    // Since we're sorting we'll need to write our own Comparer
    int IComparer.Compare(object one, object two)
    {
        CollisionPair pairOne = (CollisionPair)one;
        CollisionPair pairTwo = (CollisionPair)two;
        if (pairOne.first < pairTwo.first)
            return -1;
        else if (pairTwo.first < pairOne.first)
            return 1;
        else
            return 0;
    }

    // ...and our own IComparable
    int IComparable.CompareTo(object two)
    {
        CollisionPair pairTwo = (CollisionPair)two;
        if (this.first < pairTwo.first)
            return -1;
        else if (pairTwo.first < this.first)
            return 1;
        else
            return 0;
    }
}
static void Main(string[] args)
{
    List<CollisionPair> collisions = new List<CollisionPair>
    {
        new CollisionPair { first = 1, second = 5 },
        new CollisionPair { first = 2, second = 3 },
        new CollisionPair { first = 5, second = 4 }
    };

    // In a script this would be all the code you needed; everything above
    // would be part of the game engine
    List<CollisionPair> sortedCollisionsWithFive = new List<CollisionPair>();
    foreach (CollisionPair c in collisions)
    {
        if (c.first == 5 || c.second == 5)
        {
            sortedCollisionsWithFive.Add(c);
        }
    }
    sortedCollisionsWithFive.Sort();
    foreach (CollisionPair c in sortedCollisionsWithFive)
    {
        Console.WriteLine("Collision between " + c.first +
            " and " + c.second);
    }
}
And now the same example with LINQ and lambdas. Notice that in this example we don't have to bother with making CollisionPair both IComparable and IComparer, and don't have to implement the Compare and CompareTo methods:
struct CollisionPair
{
    public int first;
    public int second;
}

static void Main(string[] args)
{
    List<CollisionPair> collisions = new List<CollisionPair>
    {
        new CollisionPair { first = 1, second = 5 },
        new CollisionPair { first = 2, second = 3 },
        new CollisionPair { first = 5, second = 4 }
    };

    // In a script this would be all the code you needed; everything above
    // would be part of the game engine
    (from c in collisions
     where (c.first == 5 || c.second == 5)
     orderby c.first
     select c).ToList().ForEach(c =>    // ToList() is needed: ForEach is defined on List<T>, not IEnumerable<T>
        Console.WriteLine("Collision between " + c.first +
            " and " + c.second));
}
In the end we're left with LINQ and lambda expressions that read closer to natural language, and are much less code for both the game engine and the script. These kinds of changes are really what I'm looking for, but obviously LINQ and lambdas are limited to specific syntax, not something as generic as I would like in the end.
Another approach would be to use the "fluent interface" pattern and implement something like:
When(enemy).IsWithin(10.units()).Of(player).Then(enemy).Attacks(player);
If you make functions like When, IsWithin, Of, and Then return interfaces, then you will easily be able to add new extension methods to expand your rules language.
For example, let's take a look at the function Then:
public static IActiveActor Then(this ICondition condition, Actor actor) {
    /* keep the actor, etc */
}

public static void Attacks(this IActiveActor who, Actor whom) {
    /* your business logic */
}
In the future it would be easy to implement another function, say RunAway(), without changing anything in your code:
public static void RunAway(this IActiveActor who) {
    /* perform runaway logic */
}
So with this little addition you will be able to write:
When(player).IsWithin(10.units()).Of(enemy).Then(player).RunAway();
Same for conditions: assuming When returns something like ICheckActor, you can introduce new conditions by simply defining new functions:
public static ICondition IsStrongerThan(this ICheckActor me, Actor anotherGuy) {
    if (CompareStrength(me, anotherGuy) > 0)
        return TrueActorCondition(me);
    else
        return FalseActorCondition(me);
}
so now you can do:
When(player)
    .IsWithin(10.units()).Of(enemy)
    .And(player).IsStrongerThan(enemy)
    .Then(player)
    .Attacks(enemy);
or
When(player)
    .IsWithin(10.units()).Of(enemy)
    .And(enemy).IsStrongerThan(player)
    .Then(player)
    .RunAway();
The point is that you can improve your language without experiencing heavy impact on the code you already have.
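To show one possible shape for the scaffolding behind such a chain, here is a small self-contained sketch covering only the When/IsWithin/Of/Then/Attacks path. For brevity it collapses the separate interfaces suggested above into a single Rule state object; every type and member name below is an illustrative assumption, not code from this answer:
using System;

public class Actor
{
    public string Name;
    public double X, Y;
}

public class Distance
{
    public double Value;
}

// Carries the state built up by the fluent chain.
public class Rule
{
    public Actor Subject;
    public Distance Range;
    public bool ConditionHolds;
    public Actor Actor;
}

public static class RuleDsl
{
    // Entry point: When(enemy)...
    public static Rule When(Actor subject) => new Rule { Subject = subject };

    // Enables writing 10.units()
    public static Distance units(this int value) => new Distance { Value = value };

    public static Rule IsWithin(this Rule rule, Distance range)
    {
        rule.Range = range;
        return rule;
    }

    public static Rule Of(this Rule rule, Actor other)
    {
        // Evaluate the distance condition once both actors are known.
        double dx = rule.Subject.X - other.X, dy = rule.Subject.Y - other.Y;
        rule.ConditionHolds = Math.Sqrt(dx * dx + dy * dy) <= rule.Range.Value;
        return rule;
    }

    public static Rule Then(this Rule rule, Actor actor)
    {
        rule.Actor = actor;
        return rule;
    }

    public static void Attacks(this Rule rule, Actor target)
    {
        if (rule.ConditionHolds)
            Console.WriteLine(rule.Actor.Name + " attacks " + target.Name);
    }
}
With using static RuleDsl; in scope, When(enemy).IsWithin(10.units()).Of(player).Then(enemy).Attacks(player); compiles as written; the And(...)/IsStrongerThan(...) steps would need analogous methods added.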
Honestly I don't think this is a good direction for a language. Take a look at AppleScript sometime. They went to great pains to mimic natural language, and in trivial examples you can write AppleScript that reads like English. In real usage, it's a nightmare. It's awkward and cumbersome to use. And it's hard to learn, because people have a very hard time with "just write this incredibly limited subset of English with no deviations from the set pattern." It's easier to learn real C# syntax, which is regular and predictable.
I don't quite understand your requirement of "written in native C#". Why? Perhaps what you really want is for it to be written in native .NET? I can understand that, since you can compile these rules written in "plain English" into .NET with no parsing, etc. Then your engine (probably written in C#) will be able to use these rules, evaluate them, and so on. Since it is all .NET, it doesn't really matter which language the developer used.
Now, if C# is not really a requirement, then we can stop figuring out how to make "ugly-ugly" syntax look "just ugly" :)
We can look at, for example, F#. It compiles into .NET in the same way C# or VB.NET do, but it is more suitable for solving problems like yours.
You gave us three (ugly-looking) examples in C# and Scala; here is one in F# I managed to write off the top of my head in 5 minutes:
When enemy (within 10<unit> player) (Then enemy attacks player)
I only spent 5 minutes, so probably it can be even prettier.
No parsing is involved; When, within, Then, and attacks are just normal .NET functions (written in F#).
Here is all the code I had to write to make it possible:
[<Measure>] type unit
type Position = int<unit>

type Actor =
    | Enemy of Position
    | Player of Position

let getPosition actor =
    match actor with
    | Enemy x -> x
    | Player x -> x

let When actor condition positiveAction =
    if condition actor
    then positiveAction
    else ()

let Then actor action = action actor

let within distance actor1 actor2 =
    let pos1 = getPosition actor1
    let pos2 = getPosition actor2
    abs (pos1 - pos2) <= distance

let attacks victim agressor =
    printfn "%s attacks %s" (agressor.GetType().Name) (victim.GetType().Name)
This is really it, not hundreds and hundreds of lines of code you would probably write in C# :)
This is the beauty of .NET: you can use appropriate languages for appropriate tasks. And F# is a good language for DSLs (which is just what you need here).
P.S. You can even define functions like "an", "the", "in", etc. to make it look more like English (these functions will do nothing but return their first argument):
let an something = something
let the = an
let is = an
Good luck!

hand coding a parser

For all you compiler gurus, I wanna write a recursive descent parser, and I wanna do it with just code. No generating lexers and parsers from some other grammar, and don't tell me to read the dragon book; I'll come around to that eventually.
I wanna get into the gritty details of implementing a lexer and parser for a reasonably simple language, say CSS. And I wanna do this right.
This will probably end up being a series of questions but right now I'm starting with a lexer. Tokenization rules for CSS can be found here.
I find myself writing code like this (hopefully you can infer the rest from this snippet):
public CssToken ReadNext()
{
    int val;
    while ((val = _reader.Read()) != -1)
    {
        var c = (char)val;
        switch (_stack.Top)
        {
            case ParserState.Init:
                if (c == ' ')
                {
                    continue; // ignore
                }
                else if (c == '.')
                {
                    _stack.Transition(ParserState.SubIdent, ParserState.Init);
                }
                break;
            case ParserState.SubIdent:
                if (c == '-')
                {
                    _token.Append(c);
                }
                _stack.Transition(ParserState.SubNMBegin);
                break;
What is this called? And how far off am I from something reasonably well understood? I'm trying to balance something which is fair in terms of efficiency and easy to work with; using a stack to implement some kind of state machine is working quite well, but I'm unsure how to continue like this.
What I have is an input stream, from which I can read one character at a time. I don't do any lookahead right now; I just read a character and then, depending on the current state, try to do something with it.
I'd really like to get into the mindset of writing reusable snippets of code. This Transition method is currently meant to do that: it will pop the current state off the stack and then push the arguments in reverse order. That way, when I write Transition(ParserState.SubIdent, ParserState.Init), it will "call" a subroutine SubIdent which will, when complete, return to the Init state.
The parser will be implemented in much the same way. Currently, having everything in a single big method like this allows me to easily return a token when I find one, but it also forces me to keep everything in one single big method. Is there a nice way to split these tokenization rules into separate methods?
What you're writing is called a pushdown automaton. This is usually more power than you need to write a lexer, and it's certainly excessive if you're writing a lexer for a modern language like CSS. A recursive descent parser is close in power to a pushdown automaton, but recursive descent parsers are much easier to write and to understand. Most parser generators generate pushdown automata.
Lexers are almost always written as finite state machines, i.e., like your code except get rid of the "stack" object. Finite state machines are closely related to regular expressions (actually, they're provably equivalent to one another). When designing such a parser, one usually starts with the regular expressions and uses them to create a deterministic finite automaton, with some extra code in the transitions to record the beginning and end of each token.
There are tools to do this. The lex tool and its descendants are well known and have been translated into many languages. The ANTLR toolchain also has a lexer component. My preferred tool is ragel on platforms that support it. There is little benefit to writing a lexer by hand most of the time, and the code generated by these tools will probably be faster and more reliable.
If you do want to write your own lexer by hand, good ones often look something like this:
function readToken() // note: returns only one token each time
    while !eof
        c = peekChar()
        if c in A-Za-z
            return readIdentifier()
        else if c in 0-9
            return readInteger()
        else if c in ' \n\r\t\v\f'
            nextChar()
        ...
    return EOF

function readIdentifier()
    ident = ""
    while !eof
        c = nextChar()
        if c in A-Za-z0-9
            ident.append(c)
        else
            return Token(Identifier, ident)
            // or maybe...
            return Identifier(ident)
Then you can write your parser as a recursive descent parser. Don't try to combine the lexer and parser stages into one; it leads to a total mess of code. (According to the Parsec author, it's slower, too.)
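To give a feel for the shape of that second stage, here is a tiny recursive-descent sketch in C# over a token list. The Token and TokenKind types and the toy grammar (an identifier list separated by commas) are illustrative stand-ins, not taken from the question's CSS rules:
using System;
using System.Collections.Generic;

enum TokenKind { Ident, Comma }

class Token
{
    public TokenKind Kind;
    public string Text;
}

class SelectorListParser
{
    private readonly IReadOnlyList<Token> _tokens;
    private int _pos;

    public SelectorListParser(IReadOnlyList<Token> tokens) { _tokens = tokens; }

    // Grammar: selectorList := IDENT ( ',' IDENT )*
    // One method per grammar rule is what makes the parser "recursive descent".
    public List<string> ParseSelectorList()
    {
        var names = new List<string> { Expect(TokenKind.Ident).Text };
        while (_pos < _tokens.Count && _tokens[_pos].Kind == TokenKind.Comma)
        {
            _pos++; // consume ','
            names.Add(Expect(TokenKind.Ident).Text);
        }
        return names;
    }

    private Token Expect(TokenKind kind)
    {
        if (_pos >= _tokens.Count || _tokens[_pos].Kind != kind)
            throw new FormatException("Expected " + kind + " at token " + _pos + ".");
        return _tokens[_pos++];
    }
}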
You need to write your own recursive descent parser from your BNF/EBNF. I had to write my own recently, and this page was a lot of help. I'm not sure what you mean by "with just code". Do you mean you want to know how to write your own recursive descent parser?
If you want to do that, you need to have your grammar in place first. Once you have your EBNF/BNF in place, the parser can be written quite easily from it.
The first thing I did when I wrote my parser was to read everything in and then tokenize the text. So I essentially ended up with an array of tokens that I treated as a stack. To reduce the verbosity/overhead of pulling a value off a stack and then pushing it back on if you don't require it, you can have a peek method that simply returns the top value on the stack without popping it.
UPDATE
Based on your comment, I had to write a recursive-descent parser in JavaScript from scratch. You can take a look at the parser here; just search for the constraints function. I wrote my own tokenize function to tokenize the input as well. I also wrote another convenience function (peek, which I mentioned before). The parser parses according to the EBNF here.
This took me a little while to figure out because it's been years since I wrote a parser (the last time I wrote one was in school!), but trust me, once you get it, you get it. I hope my example gets you further along on your way.
ANOTHER UPDATE
I also realized that my example may not be what you want, because you might be going towards using a shift-reduce parser. You mentioned that right now you are trying to write a tokenizer. In my case, I did write my own tokenizer in JavaScript. It's probably not robust, but it was sufficient for my needs.
function tokenize(options) {
    var str = options.str;
    var delimiters = options.delimiters.split("");
    var returnDelimiters = options.returnDelimiters || false;
    var returnEmptyTokens = options.returnEmptyTokens || false;
    var tokens = new Array();
    var lastTokenIndex = 0;

    for(var i = 0; i < str.length; i++) {
        if(exists(delimiters, str[i])) {
            var token = str.substring(lastTokenIndex, i);
            if(token.length == 0) {
                if(returnEmptyTokens) {
                    tokens.push(token);
                }
            }
            else {
                tokens.push(token);
            }
            if(returnDelimiters) {
                tokens.push(str[i]);
            }
            lastTokenIndex = i + 1;
        }
    }

    if(lastTokenIndex < str.length) {
        var token = str.substring(lastTokenIndex, str.length);
        token = token.replace(/^\s+/, "").replace(/\s+$/, "");
        if(token.length == 0) {
            if(returnEmptyTokens) {
                tokens.push(token);
            }
        }
        else {
            tokens.push(token);
        }
    }

    return tokens;
}
Based on your code, it looks like you are reading, tokenizing, and parsing at the same time - I'm assuming that's what a shift-reduce parser does? The flow for what I have is tokenize first to build the stack of tokens, and then send the tokens through the recursive-descent parser.
If you are going to hand-code everything from scratch, I would definitely consider going with a recursive descent parser. In your post you are not really saying what you will be doing with the token stream once you have parsed the source.
Some things I would recommend getting a handle on
1. Good design for your scanner/lexer; this is what will be tokenizing your source code for your parser.
2. The next thing is the parser. If you have a good EBNF for the source language, the parser can usually translate quite nicely into a recursive descent parser.
3. Another data structure you will really need to get your head around is the symbol table. This can be as simple as a hashtable or as complex as a tree structure that can represent complex record structures, etc. I think for CSS you might be somewhere between the two.
4. And finally you want to deal with code generation. You have many options here. For an interpreter, you might simply interpret on the fly as you parse the code. A better approach might be to generate a form of i-code that you can then write an interpreter for, and later even a compiler. Of course, for the .NET platform you could directly generate IL (probably not applicable for CSS :))
For references, I gather you are not heavy into the deep theory, and I do not blame you. A really good starting point for getting the basics without complex code, if you do not mind the Pascal, is Jack Crenshaw's 'Let's Build a Compiler':
http://compilers.iecc.com/crenshaw/
Good luck, I am sure you are going to enjoy this project.
It looks like you want to implement a "shift-reduce" parser, where you explicitly build a token stack. The usual alternative is a "recursive descent" parser, in which depth of procedure calls build the same token stack with their own local variables, on the actual hardware stack.
In shift-reduce, the term "reduce" refers to the operation performed on the explicitly-maintained token stack. For example, if the top of the stack has become Term, Operator, Term then a reduction rule can be applied resulting in Expression as a replacement for the pattern. The reduction rules are explicitly encoded in a data structure used by the shift-reduce parser; as a result, all reduction rules can be found in the same spot of the source code.
The shift-reduce approach brings a few benefits compared to recursive-descent. On a subjective level, my opinion is that shift-reduce is easier to read and maintain than recursive-descent. More objectively, shift-reduce allows for more informative error messages from the parser when an unexpected token occurs.
Specifically, because the shift-reduce parser has an explicit encoding of rules for making "reductions," the parser is easily extended to articulate what sorts of tokens could legally have followed. (e.g., "; expected"). A recursive descent implementation cannot easily be extended to do the same thing.
A great book on both kinds of parser, and the trade-offs in implementing different kinds of shift-reduce is "Introduction to Compiler Construction", by Thomas W. Parsons.
Shift-reduce is sometimes called "bottom-up" parsing and recursive-descent is sometimes called "top-down" parsing. In the analogy used, nodes composed with highest precedence (e.g., "factors" in multiplication expression) are considered to be "at the bottom" of the parsing. This is in accord with the same analogy used in "descent" of "recursive descent".
If you want to use the parser to also handle not-well-formed expressions, you really want a recursive descent parser. Much easier to get the error handling and reporting usable.
For literature, I'd recommend some of the old work of Niklaus Wirth. He knows how to write. Algorithms + Data Structures = Programs is what I used, but you can find his Compiler Construction online.

C++ ">>" and "<<" IO in C#?

Is there a C# library that provides the functionality of ">>" and "<<" for IO in C++? It was really convenient for console apps. Granted not a lot of console apps are in C#, but some of us use it for them.
I know about Console.Read[Line]|Write[Line] and Streams|FileStream|StreamReader|StreamWriter; that's not part of the question.
I don't think I'm being specific enough:
int a,b;
cin >> a >> b;
IS AMAZING!!
string input = Console.ReadLine();
string[] data = input.Split(' ');
a = Convert.ToInt32(data[0]);
b = Convert.ToInt32(data[1]);
... long-winded enough? Plus there are other reasons why the C# solution is worse. I must get the entire line or make my own buffer for it. If the line I'm working on is, IDK, say the 1000th line of Bell's triangle, I waste so much time reading everything at one time.
EDIT:
GAR!!!
OK THE PROBLEM!!!
Using IntX to do HUGE numbers, like the .NET 4.0 BigInteger, to produce the Bell triangle. If you know the Bell triangle, it gets freaking huge very, very quickly. The whole point of this question is that I need to deal with each number individually. If you read an entire line, you could easily hit gigs of data. This is kinda the same as digits of Pi. For example, 42pow1048576 is 1.6 MB! I don't have time nor memory to read all the numbers as one string and then pick the one I want.
No, and I wouldn't. C# != C++
You should try your best to stick with the language convention of whatever language you are working in.
I think I get what you are after: simple, default-formatted input. I think the reason there is no TextReader.ReadXXX() is that this is parsing, and parsing is hard. For example, should ReadFloat():
ignore leading whitespace
require decimal point
require trailing whitespace (123abc)
handle exponentials (12.3a3 parses differently to 12.4e5?)
Not to mention what the heck does ReadString() do? From C++, you would expect "read to the next whitespace", but the name doesn't say that.
Now all of these have good sensible answers, and I agree C# (or rather, the BCL) should provide them, but I can certainly understand why they would choose to not provide fragile, nearly impossible to use correctly, functions right there on a central class.
EDIT:
For the buffering problem, an ugly solution is:
static class TextReaderEx {
    public static string ReadWord(this TextReader reader) {
        int c;
        // Skip leading whitespace
        while (-1 != (c = reader.Peek()) && char.IsWhiteSpace((char)c)) reader.Read();
        // Read to next whitespace
        var result = new StringBuilder();
        while (-1 != (c = reader.Peek()) && !char.IsWhiteSpace((char)c)) {
            reader.Read();
            result.Append((char)c);
        }
        return result.ToString();
    }
}
...
int.Parse(Console.In.ReadWord())
Nope. You're stuck with Console.WriteLine. You could create a wrapper that offered this functionality, though.
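For example, a minimal sketch of such a wrapper could chain calls by returning itself (ChainedWriter is an invented name, purely for illustration):
using System;

class ChainedWriter
{
    // Returning 'this' is what allows calls to be chained.
    public ChainedWriter Write(object value)
    {
        Console.Write(value);
        return this;
    }
}
Usage, roughly in the spirit of cout << "a = " << a << '\n':
new ChainedWriter().Write("a = ").Write(a).Write('\n');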
You can use Console.WriteLine and Console.ReadLine for that purpose. Both are in the System namespace.
You have System.IO.Stream(Reader|Writer)
And for console: Console.Write, Console.Read
Not that I know of. If you are interested in chaining outputs, you can use System.Text.StringBuilder.
http://msdn.microsoft.com/en-us/library/system.text.stringbuilder(VS.71).aspx
StringBuilder builder = new StringBuilder();
builder.Append("hello").Append(" world!");
Console.WriteLine(builder.ToString());
Perhaps not as pretty as C++, but as another poster states, C# != C++.
This is not even possible in C#, no matter how hard you try:
The left-hand side and right-hand side of operators are always passed by value; this rules out the possibility of cin.
The right hand side of << and >> must be an integer; this rules out cout.
The first point is to make sure operator overloading is a little less messy than in C++ (debatable, but it surely makes things a lot simpler), and the second point was specifically chosen to rule out C++'s cin and cout way of dealing with IO, IIRC.
