C# External Data (similar to iOS Plist) - c#

I am looking for best practices on externalizing simple data structures into human readable files.
I have some experience with iOS's plist functionality (which I believe is XML-like underneath) and I'd like to find something similar.
On the .NET side .resx seems to be the way to go, but as I do research everyone brings up localization and this data is not meant to be localized. Is .resx still the answer?
If so, is there a way to get a dictionary structure of all the .resx data instead of reading a single entry? I'd like to know things like number of entries, an array of all the keys, an array of all the values, etc.

Given my druthers, I'd avoid XML. It's designed to be easy to parse, but it's verbose and not designed for human readability. Avoid the angle-bracket tax if you can.
There's JSON. That's a useful alternative. Simple, easy to read, easy to parse. No angle-bracket tax. That's one option. YAML is another (it's a superset of JSON).
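As a rough sketch of the JSON route: assuming .NET Core 3.0 or later (where System.Text.Json ships with the framework) and assuming the file is a flat object whose values are all strings, the data can be loaded straight into a dictionary. The class name JsonSettings is made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

// Hypothetical helper, not part of any library.
public static class JsonSettings
{
    // Parses a flat JSON object such as {"title": "Demo", "author": "Me"}
    // into a dictionary. Non-string values (numbers, nested objects) would
    // make the deserializer throw a JsonException with this target type.
    public static Dictionary<string, string> Load(string json)
    {
        return JsonSerializer.Deserialize<Dictionary<string, string>>(json)
               ?? new Dictionary<string, string>();
    }
}
```

The returned dictionary answers the questions asked above directly: Count for the number of entries, Keys.ToArray() for the keys and Values.ToArray() for the values.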
There's LISP-style S-expressions (see also wikipedia). You could also use Prolog-style terms to construct the data structures of your choice (also quite easy to parse).
And there are old-school DOS/Windows INI files. There are multiple tools out there for wrangling them, including .NET/CLR implementations.
You could just co-opt Apple's plist format from OS X. You can use its old-school "ASCII" (text) representation or its XML representation.
You can also (preferred, IMHO) write a custom/"little" language to suit your needs specifically. The buzzword du jour for which, these days, is "domain-specific language". I'd avoid using the Visual Studio/C#/.Net domain-specific language facilities because what you get is going to be XML-based.
Terence Parr's excellent ANTLR is arguably the tool of choice for language building. It's written in Java, comes with an IDE for working with grammars and parse trees, and can target multiple languages: Java, C#, Python, Objective-C and C/C++ are all up to date, and there's some support for Scala as well. A few other target languages exist for older versions, in varying levels of completeness.
Terence Parr's books are equally excellent:
The Definitive ANTLR Reference: Building Domain-Specific Languages
Language Implementation Patterns: Create Your Own Domain-Specific and General Programming Languages

XML (any format, not just resx) or CSV are common choices. XML is easier to use in the code as long as it is valid XML.
Look into LINQ to XML ( http://msdn.microsoft.com/en-us/library/bb387098.aspx ) to get started on reading XML.
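For what it's worth, a minimal LINQ to XML sketch might look like the following. The element name "entry", the attribute name "key" and the class name XmlSettings are assumptions made for the example, not a standard layout.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper, not part of any library.
public static class XmlSettings
{
    // Reads every <entry key="...">value</entry> child of the root element
    // into a dictionary keyed by the "key" attribute.
    public static Dictionary<string, string> Load(string xml)
    {
        XElement root = XDocument.Parse(xml).Root
                        ?? throw new FormatException("Document has no root element");

        return root.Elements("entry").ToDictionary(
            e => e.Attribute("key")?.Value
                 ?? throw new FormatException("<entry> without a key attribute"),
            e => e.Value);
    }
}
```

XDocument.Parse throws an XmlException on malformed input, which is the "as long as it is valid XML" caveat in practice. For a file on disk, XDocument.Load(path) can be used in place of XDocument.Parse.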

Related

Do ISO grammars follow a specific format

Here is an extract from the grammar section of the C# Language Specification:
Is this written in a specific format? I looked at the grammar section in an old C++ ISO I found and it seemed to follow the same format, so is there some standard being used here for writing this grammar? I ask because I would like to be able to create a tool where I can paste the grammar directly and have a working C# parser immediately.
Microsoft seem to release their C# spec for free, but I can't find the C++11 format anywhere. Am I going to have to buy this to view it?
It's a variant of BNF that is used by Yacc. Yacc normally has ; as part of its syntax, but changing that makes things simpler with languages like C# and C++ in which ; is very significant in itself. Unlike most BNF variants, it has a : where BNF often uses = (see also Van Wijngaarden grammar and you'll soon know much more than the little bit of knowledge that this answer is coming from).
ISO don't have a rule on which grammar must be used in their standards, so others use BNF, ABNF, EBNF, Wirth syntax and perhaps others.
ISO standards often originate as national or other standards that are then adopted by ISO. Since different standards bodies use different grammars (The IETF use ABNF in RFCs [itself defined in RFC 5234], BSI and the W3C use different variants of EBNF, and so on) the grammar in an ISO often reflects its origins.
This is the case here. Kernighan and Ritchie used this format in their book, The C Programming Language. While the ANSI standard and later ISO standards differed in the grammar itself, they used the same format, and it's been used since for other C-like languages.
Each standard does its own thing. But among compiler writers there's a fairly standard way of describing grammars, and that's what you're seeing here and in the C++ standard.
What you are seeing here is a variation of Backus-Naur Form (BNF). While not exactly the standard format, it is pretty similar. This is generally the standard way of showing how the language is supposed to be parsed, and it is a common input to parser generators.
The C++ standard is not available for free. You can buy a copy for 30 USD at webstore.ansi.org. Search for document number 14882, and then look for the C++ standard.
The common way to describe a grammar is using either Backus-Naur Form (BNF) or Extended Backus-Naur Form (EBNF). If you are looking to parse a language easily in C#, take a look at Irony which is a language toolkit for C# and it allows you to use something very similar to EBNF to describe the grammar.
On top of those grammars, there is also Parsing Expression Grammar (PEG) but I don't believe it is as common as BNF or EBNF.

Interpreting custom language

I need to develop an application that will read and understand a text file containing a custom language that describes a list of operations (i.e. something like a cooking recipe). This language has not been defined yet, but it will probably take one of the following shapes:
C++ like code
(This code is randomly generated, just for example purpose) :
begin
    repeat(10)
    {
        bar(toto, 10, 1999, xxx);
    }
    result = foo(xxxx, 10);
    if(foo == ok)
    {
        ...
    }
    else
    {
        ...
    }
end
XML code
(This code is randomly generated, just for example purpose) :
<recipe>
  <action name="foo" argument="bar, toto, xxx" repeat="10"/>
  <action name="bar" argument="xxxxx;10" condition="foo == ok">
    <true>...</true>
    <false>...</false>
  </action>
</recipe>
No matter which language is chosen, it will have to handle simple conditions and loops.
I have never done such a thing, but at first sight it seems to me that describing those operations in XML would be simpler yet less powerful.
After browsing Stack Overflow, I found some discussions of a tool called "ANTLR"... I started reading "The Definitive ANTLR Reference", but since I have never done that kind of thing, I find it hard to know whether it's really the kind of tool I need...
In other words, what do I need in order to read a text file, interpret it properly and perform actions in my C# code? Those operations will interact with each other through simple conditions like:
If operation1 failed, I do operation2 else operation3.
Repeat the operation4 10 times.
What would be the best language to describe those text files (XML, my own)? What are the key points during such a development?
I hope I'm being clear :)
Thanks a lot for your help and advice!
XML is great for storing relational data in a verbose way. I think it is a terrible candidate for writing logic such as a program, however.
Have you considered using an existing grammar/scripting language that you can embed, rather than writing your own? E.g:
Lua
Python
In one of my projects I actually started with an XML like language as I already had an XML parser and parsed the XML structure into an expression tree in memory to be interpreted/run.
This works out very nicely to get past the problem of tokenizing and parsing text files, so you can concentrate instead on your 'language' and the logic of the operations in it. The downside is that writing the text files is a little strange and very wordy. It's also very unnatural for a programmer used to C/C++ syntax.
Eventually you could easily replace your XML with a full-blown scanner (lexer) and parser that turns a more 'natural', C++-like text format into your expression tree.
As for writing a scanner and parser, I found it easier to write these by hand, using simple logic flow and loops for the scanner and a recursive descent parser on top of it.
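To give a feel for the hand-written approach, here is a minimal sketch for arithmetic expressions only: a loop-based scanner that produces tokens, and a recursive descent parser with one method per grammar rule. It evaluates as it parses rather than building an expression tree, and the class name TinyCalc and the grammar itself are assumptions chosen for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical example class, not taken from any of the answers.
public static class TinyCalc
{
    // Scanner: splits the input into number tokens and one-character operators.
    public static List<string> Scan(string text)
    {
        var tokens = new List<string>();
        int i = 0;
        while (i < text.Length)
        {
            char c = text[i];
            if (char.IsWhiteSpace(c))
            {
                i++;
            }
            else if (char.IsDigit(c) || c == '.')
            {
                int start = i;
                while (i < text.Length && (char.IsDigit(text[i]) || text[i] == '.')) i++;
                tokens.Add(text.Substring(start, i - start));
            }
            else if ("+-*/()".IndexOf(c) >= 0)
            {
                tokens.Add(c.ToString());
                i++;
            }
            else
            {
                throw new FormatException($"Unexpected character '{c}' at position {i}");
            }
        }
        return tokens;
    }

    // Parser: one method per rule of this small grammar.
    //   expr   := term (('+' | '-') term)*
    //   term   := factor (('*' | '/') factor)*
    //   factor := NUMBER | '(' expr ')' | '-' factor
    public static double Evaluate(string text)
    {
        List<string> tokens = Scan(text);
        int pos = 0;
        double value = Expr(tokens, ref pos);
        if (pos != tokens.Count) throw new FormatException("Unexpected token " + tokens[pos]);
        return value;
    }

    private static double Expr(List<string> t, ref int pos)
    {
        double value = Term(t, ref pos);
        while (pos < t.Count && (t[pos] == "+" || t[pos] == "-"))
        {
            string op = t[pos++];
            double rhs = Term(t, ref pos);
            value = op == "+" ? value + rhs : value - rhs;
        }
        return value;
    }

    private static double Term(List<string> t, ref int pos)
    {
        double value = Factor(t, ref pos);
        while (pos < t.Count && (t[pos] == "*" || t[pos] == "/"))
        {
            string op = t[pos++];
            double rhs = Factor(t, ref pos);
            value = op == "*" ? value * rhs : value / rhs;
        }
        return value;
    }

    private static double Factor(List<string> t, ref int pos)
    {
        if (pos >= t.Count) throw new FormatException("Unexpected end of input");
        string tok = t[pos++];
        if (tok == "-") return -Factor(t, ref pos);
        if (tok == "(")
        {
            double inner = Expr(t, ref pos);
            if (pos >= t.Count || t[pos] != ")") throw new FormatException("Expected ')'");
            pos++;
            return inner;
        }
        return double.Parse(tok, CultureInfo.InvariantCulture);
    }
}
```

Operator precedence falls out of the rule nesting: Term is called from Expr, so multiplication binds tighter than addition without any precedence table. Adding statements, conditions or loops means adding more rule methods in the same style.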
That said, ANTLR is great at letting you write out the rules for your language and generating the lexer and parser for you. This allows for a much more dynamic language, which can change easily without having to refactor everything when new things are added. So it might be worth learning, as it would save you a lot of the rewriting you would face with hand-written code as things change.
I'd recommend writing the app in F#. It has many useful features for parsing strings and XML, like Pattern Matching and Active Patterns.
For parsing C-like code I would recommend F# (I just did one interpreter with F#, and it works like a charm).
For parsing XML I would recommend C#/F# + the XmlDocument class.
You basically need to work on two files:
Operator dictionary
Code file in YourLanguage
Load and interpret the operators and then apply them recursively to your code file.
The best prefab answer: S-expressions
C and XML are good first steps. They have sort of opposite disadvantages. The C-like syntax won't add a ton of extra characters, but it's going to be hard to parse due to ambiguity, the variety of tokens, and probably a bunch more issues I can't think of. XML is relatively easy to parse and there's tons of example code, but it will also contain tons of extra text. It might also give you too many options for where to stick language features - for example, is the number of times to repeat a loop an attribute, element or text?
S-expressions are more terse than XML for sure, maybe even C. At the same time, they're specific to the task of applying operations to data. They don't admit ambiguity. Parsers are simple and easy to find example code for.
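To back up the claim that parsers are simple, here is a minimal sketch of an S-expression reader. It assumes atoms contain no whitespace or parentheses (so no quoted strings), and the class name SExpr is made up for the example.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical example class, not a library API.
public static class SExpr
{
    // Parses a single S-expression.
    // Atoms come back as strings, lists as List<object> (possibly nested).
    public static object Parse(string text)
    {
        // Pad parentheses with spaces so a plain whitespace split tokenizes the input.
        var tokens = new Queue<string>(
            text.Replace("(", " ( ").Replace(")", " ) ")
                .Split(new[] { ' ', '\t', '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries));

        object result = Read(tokens);
        if (tokens.Count > 0) throw new FormatException("Trailing input after expression");
        return result;
    }

    private static object Read(Queue<string> tokens)
    {
        if (tokens.Count == 0) throw new FormatException("Unexpected end of input");

        string tok = tokens.Dequeue();
        if (tok == ")") throw new FormatException("Unexpected ')'");
        if (tok != "(") return tok;              // an atom

        var list = new List<object>();
        while (tokens.Count > 0 && tokens.Peek() != ")")
        {
            list.Add(Read(tokens));              // recurse for nested lists
        }
        if (tokens.Count == 0) throw new FormatException("Missing ')'");
        tokens.Dequeue();                        // consume the closing ")"
        return list;
    }
}
```

The loop from the question could then be written as (repeat 10 (bar toto 10 1999 xxx)), which parses into a three-element list whose last element is itself a list. An interpreter walks that structure, treating the first element of each list as the operation name.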
This might save you from having to learn too much theory before you start experimenting. I'll emphasize MerickOWA's point that ANTLR and other parser generators are probably a bigger battle than you want to fight right now. See this discussion on programmers.stackexchange for some background on when the full generality of this type of tool could help.

Best/fastest way to write a parser in c#

What is the best way to build a parser in c# to parse my own language?
Ideally I'd like to provide a grammar, and get Abstract Syntax Trees as an output.
Many thanks,
Nestor
I've had good experience with ANTLR v3. By far the biggest benefit is that it lets you write LL(*) parsers with infinite lookahead - these can be quite suboptimal, but the grammar can be written in the most straightforward and natural way with no need to refactor to work around parser limitations, and parser performance is often not a big deal (I hope you aren't writing a C++ compiler), especially in learning projects.
It also provides pretty good means of constructing meaningful ASTs without need to write any code - for every grammar production, you indicate the "crucial" token or sub-production, and that becomes a tree node. Or you can write a tree production.
Have a look at the following ANTLR grammars (listed here in order of increasing complexity) to get a gist of how it looks and feels
JSON grammar - with tree productions
Lua grammar
C grammar
I've played with Irony. It looks simple and useful.
You could study the source code for the Mono C# compiler.
While it is still in early beta the Oslo Modeling language and MGrammar tools from Microsoft are showing some promise.
I would also take a look at SableCC. It's very easy to create the EBNF grammar. Here is a simple C# calculator example.
There's a short paper here on constructing an LL(1) parser; of course you could use a generator too.
Lex and yacc are still my favorites. Obscure if you're just starting out, but extremely simple, fast, and easy once you've got the lingo down.
You can make it do whatever you want; generate C# code, build other grammars, emulate instructions, whatever.
It's not pretty: it's a text-based format and LALR(1), so your syntax has to accommodate that.
On the plus side, it's everywhere. There are great O'Reilly books about it, lots of sample code, lots of premade grammars, and lots of native language libraries.

SAX vs XmlTextReader - SAX in C#

I am attempting to read a large XML document and I wanted to do it in chunks vs XmlDocument's way of reading the entire file into memory. I know I can use XmlTextReader to do this but I was wondering if anyone has used SAX for .NET? I know Java developers swear by it and I was wondering if it is worth giving it a try and if so what are the benefits in using it. I am looking for specifics.
If you just want to get the job done quickly, the XmlTextReader exists for that purpose (in .NET).
If you want to learn a de facto standard (one that is available in many other programming languages) that is stable and which will force you to code very efficiently and elegantly, but which is also extremely flexible, then look into SAX. However, don't waste your time unless you're going to be creating highly esoteric XML parsers. Instead, look for next-generation parsers (like XmlTextReader) for your particular platform.
SAX Resources
SAX was originally written for Java, and you can find the original open source project, which has been stable for several years, here:
http://sax.sourceforge.net/
There is a C# port of the same project here (with HTML docs as part of the source download); it is also stable:
http://saxdotnet.sourceforge.net/
If you do not like the C# implementation, you could always resort to referencing COM DLLs via COMInterop using MSXML3 or later: http://msdn.microsoft.com/en-us/library/ms994343.aspx
Articles that come from the Java world but which probably illustrate the concepts you need to be successful with this approach (there may also be downloadable Java source code that could prove useful and may be easy enough to convert to C#):
Output large XML documents, Part 1 (http://www.ibm.com/developerworks/xml/library/x-tipbigdoc.html)
Output large XML documents, Part 2 (http://www.ibm.com/developerworks/xml/library/x-tipbigdoc2.html)
Use a SAX filter to manipulate data (http://www.ibm.com/developerworks/xml/library/x-tipsaxfilter/)
It will be a cumbersome implementation. I have only used SAX back in my pre-.NET days, but it requires some pretty advanced coding techniques. At this point, it's just not worth the trouble.
Interesting Concept for a Hybrid Parser
This thread describes a hybrid parser that uses the .NET XmlTextReader to implement a parser that provides a combination of DOM and SAX benefits...
http://bytes.com/groups/net-xml/178403-xmltextreader-versus-dom
If you're talking about SAX for .NET, the project doesn't appear to be maintained. The last release was more than 2 years ago. Maybe they got it perfect on the last release, but I wouldn't bet on it. The author, Karl Waclawek, seems to have disappeared off the net.
As for SAX under Java? You bet, it's great. Unfortunately, SAX was never developed as a standard, so all of the non-Java ports have been adapting a Java API for their own needs. While DOM is a pretty lousy API, it has the advantage of having been designed for multiple languages and environments, so it's easy to implement in Java, C#, JavaScript, C, et al.
I believe there are no benefits to using SAX, for at least two reasons:
SAX is a "push" model while XmlReader is a pull parser that has a number of benefits.
Being dependent on a 3rd-party library rather than using a standard .NET API.
Personally, I much prefer the SAX model as the XmlReader has some really annoying traps that can cause bugs in your code that might cause your code to skip elements. Most code would be structured around a while(rdr.Read()) model, but if you have any "ReadString" or "ReadInnerXml()" within that loop you will find yourself skipping elements on the next iteration.
As SAX is event-based, this will never happen, because you cannot perform any operation that would cause your parser to seek ahead.
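The skipping trap is easier to see in code. The sketch below uses ReadElementContentAsString, which, like ReadInnerXml, leaves the reader positioned on the node after the end tag. The element name "item" and the class name ItemReader are assumptions for the example.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical example class illustrating the XmlReader pitfall.
public static class ItemReader
{
    // Naive version: the content read already advanced the reader past </item>,
    // and the loop's Read() then advances once more, so an <item> that directly
    // follows another one is skipped.
    public static List<string> ReadItemsNaive(string xml)
    {
        var items = new List<string>();
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "item")
                {
                    items.Add(reader.ReadElementContentAsString());
                }
            }
        }
        return items;
    }

    // Safer version: call Read() only on passes where nothing else moved the reader.
    public static List<string> ReadItems(string xml)
    {
        var items = new List<string>();
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "item")
                {
                    // The reader is now on the node after </item>; do not Read() again.
                    items.Add(reader.ReadElementContentAsString());
                }
                else
                {
                    reader.Read();
                }
            }
        }
        return items;
    }
}
```

With the input <items><item>a</item><item>b</item></items> (no whitespace between elements), the naive loop returns only the first item while the second version returns both. When the elements are separated by whitespace the naive loop happens to work, which is part of why the bug tends to go unnoticed.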
My personal feeling is that Microsoft have invented the notion that the XmlReader is better with the explanation of the push/pull model, but I don't really buy it. So Microsoft think that you don't need to create a state-machine with XmlReader, that doesn't make sense to me, but anyway, it's just my opinion.

Constructing a simple interpreter

I’m starting a project where I need to implement a light-weight interpreter.
The interpreter is used to execute simple scientific algorithms.
The programming language that this interpreter will use should be simple, since it is targeting non-software developers (for example, mathematicians).
The interpreter should support basic programming languages features:
Real numbers, variables, multi-dimensional arrays
Binary (+, -, *, /, %) and Boolean (==, !=, <, >, <=, >=) operations
Loops (for, while), Conditional expressions (if)
Functions
MathWorks MatLab is a good example of where I’m heading, just much simpler.
The interpreter will be used as an environment to demonstrate algorithms; simple algorithms such as finding the average of a dataset/array, or slightly more complicated algorithms such as Gaussian elimination or RSA.
Best/Most practical resource I found on the subject is Ron Ayoub’s entry on Code Project (Parsing Algebraic Expressions Using the Interpreter Pattern) - a perfect example of a minified version of my problem.
The Purple Dragon Book seems to be too much, anything more practical?
The interpreter will be implemented as a .NET library, using C#. However, resources for any platform are welcome, since the design-architecture part of this problem is the most challenging.
Any practical resources?
(please avoid “this is not trivial” or “why re-invent the wheel” responses)
I would write it in ANTLR. Write the grammar, and let ANTLR generate a C# parser. You can ask ANTLR for a parse tree, and possibly the interpreter can already operate on the parse tree. Perhaps you'll have to convert the parse tree to some more abstract internal representation (although ANTLR already allows you to leave out irrelevant punctuation when generating the tree).
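Whatever produces the tree, operating on it can be quite small. The following is a minimal tree-walking sketch in the style of the Interpreter pattern, covering only numbers, variables and binary arithmetic. The node classes are hypothetical and hand-written here; they are not what ANTLR generates.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical AST node types for illustration only.
public abstract class Node
{
    // Every node knows how to evaluate itself against the current variables.
    public abstract double Evaluate(IDictionary<string, double> variables);
}

public sealed class Number : Node
{
    private readonly double _value;
    public Number(double value) { _value = value; }
    public override double Evaluate(IDictionary<string, double> variables) => _value;
}

public sealed class Variable : Node
{
    private readonly string _name;
    public Variable(string name) { _name = name; }

    public override double Evaluate(IDictionary<string, double> variables)
    {
        if (!variables.TryGetValue(_name, out double value))
            throw new KeyNotFoundException("Undefined variable: " + _name);
        return value;
    }
}

public sealed class Binary : Node
{
    private readonly char _op;
    private readonly Node _left;
    private readonly Node _right;

    public Binary(char op, Node left, Node right)
    {
        _op = op;
        _left = left;
        _right = right;
    }

    public override double Evaluate(IDictionary<string, double> variables)
    {
        // Evaluate both children first, then combine the results.
        double l = _left.Evaluate(variables);
        double r = _right.Evaluate(variables);
        switch (_op)
        {
            case '+': return l + r;
            case '-': return l - r;
            case '*': return l * r;
            case '/': return l / r;
            case '%': return l % r;
            default: throw new InvalidOperationException("Unknown operator " + _op);
        }
    }
}
```

For example, the tree new Binary('*', new Binary('+', new Variable("x"), new Number(2)), new Number(3)) represents (x + 2) * 3. Loops, conditionals and function calls would be further node types with their own Evaluate methods, with the parser's job reduced to building the tree.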
It might sound odd, but Game Scripting Mastery is a great resource for learning about parsing, compiling and interpreting code.
You should really check it out:
http://www.amazon.com/Scripting-Mastery-Premier-Press-Development/dp/1931841578
One way to do it is to examine the source code for an existing interpreter. I've written a JavaScript interpreter in the D programming language; you can download the source code from http://ftp.digitalmars.com/dmdscript.zip
Walter Bright, Digital Mars
I'd recommend leveraging the DLR to do this, as this is exactly what it is designed for.
Create Your Own Language on top of the DLR
Lua was designed as an extensible interpreter for use by non-programmers. (The first users were Brazilian petroleum geologists although the user base has broadened considerably since then.) You can take Lua and easily add your scientific algorithms, visualizations, what have you. It's superbly well engineered and you can get on with the task at hand.
Of course, if what you really want is the fun of building your own, then the other advice is reasonable.
Have you considered using IronPython? It's easy to use from .NET and it seems to meet all your requirements. I understand that python is fairly popular for scientific programming, so it's possible your users will already be familiar with it.
The Silk library has just been published to GitHub. It seems to do most of what you are asking. It is very easy to use. Just register the functions you want to make available to the script, compile the script to bytecode and execute it.
The programming language that this interpreter will use should be simple, since it is targeting non-software developers.
I'm going to chime in on this part of your question. A simple language is not what you really want to hand to non-software developers. Stripped-down languages require more effort from the programmer. What you really want is a well-designed and well-implemented Domain Specific Language (DSL).
In this sense I will second what Norman Ramsey recommends with Lua. It has an excellent reputation as a base for high quality DSLs. A well documented and useful DSL takes time and effort, but will save everyone time in the long run when domain experts can be brought up to speed quickly and require minimal support.
I am surprised no one has mentioned xtext yet. It is available as an Eclipse plugin and an IntelliJ plugin. It provides not just the parser, like ANTLR, but the whole pipeline (including parser, linker, typechecker, compiler) needed for a DSL. You can check its source code on GitHub to understand how an interpreter/compiler works.
