C# and ANTLR4: Handling "include" directives when parsing a file - c#

Using ANTLR, I’m trying to parse input files that contain references to other files inside them, much like the C language's #include "[file name]" directive.
One suggested approach is:
Parse the root file, saving said references as nodes (i.e. specific grammar rules)
Visit the tree searching for "reference" nodes
For each reference node, parse the referenced file and substitute the node with the newly generated tree
Repeat this process recursively to handle multiple levels of inclusion
The problem with this solution is that the referenced files may contain only fragments of valid syntax (think of an #include inside the body of a C function). To parse such files, I would have to implement a different parser to handle the fragmented grammar.
Is there any valid/suggested approach to (literally) inject the new file inside the ongoing parsing process?

One solution to this problem can be achieved by overriding the scanner's behavior, specifically the NextToken() method.
This is necessary because the EOF token cannot be handled by the ANTLR lexer grammar (to the best of my knowledge): any actions
attached to the lexer rule recognizing EOF are simply ignored (as shown in the code below). Thus, it is necessary to
implement this behavior directly in the scanner method.
So assume we have a parser grammar
parser grammar INCParserGrammar;
@parser::members {
public static Stack<ICharStream> m_nestedfiles = new Stack<ICharStream>();
}
options { tokenVocab = INCLexerGrammar; }
/*
* Parser Rules
*/
compileUnit
: (include_directives | ANY )+ ENDOFFILE
;
include_directives : INCLUDEPREFIX FILE DQUOTE
;
A public static Stack<ICharStream> (i.e. mySpecialFileStack) should be introduced in the grammar's members. This stack will be used to store the character streams associated with the files that take part in the parsing. The character streams are pushed onto this stack as new files are encountered via include statements.
and a lexer grammar
lexer grammar INCLexerGrammar;
@lexer::header {
using System;
using System.IO;
}
@lexer::members {
string file;
ICharStream current;
}
/*
* Lexer Rules
*/
INCLUDEPREFIX : '#include'[ \t]+'"' {
Mode(INCLexerGrammar.FILEMODE);
};
// The following rule always matches a shorter string than the rule above
ANY : ~[#]+ ;
ENDOFFILE : EOF { /* Any actions in this rule are ignored by the ANTLR lexer */ };
////////////////////////////////////////////////////////////////////////////////////////////////////////
mode FILEMODE;
FILE : [a-zA-Z][a-zA-Z0-9_]*'.'[a-zA-Z0-9_]+ {
    file = Text;
    StreamReader s = new StreamReader(file);
    INCParserGrammar.m_nestedfiles.Push(_input);
    current = new AntlrInputStream(s);
};
DQUOTE: '"' {
this._input = current;
Mode(INCLexerGrammar.DefaultMode); };
The overridden body of the NextToken() method is placed in a .g4.cs file, whose purpose is to extend
the generated scanner class, given that the generated scanner class is decorated with the "partial" keyword.
After the partial scanner class associated with the given grammar is generated, navigate to the source code of the
ANTLR4 Lexer class as given in the ANTLR runtime and copy ALL of the original method body into this new method; then,
in the middle do-while block (right after the try-catch block), add the following code:
if (this._input.La(1) == -1)
{
if ( mySpecialFileStack.Count == 0 )
this._hitEOF = true;
else
this._input = mySpecialFileStack.Pop();
}
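For orientation, the surrounding extension file might look like the following sketch (assuming the lexer grammar is named INCLexerGrammar, so the generated class is INCLexerGrammar and the extension file is INCLexerGrammar.g4.cs):
// INCLexerGrammar.g4.cs -- extends the generated partial lexer class
public partial class INCLexerGrammar
{
    public override IToken NextToken()
    {
        // ... copied body of Antlr4.Runtime.Lexer.NextToken(), with the
        // stream-switching check above inserted in the middle do-while
        // block (the full body is shown below) ...
    }
}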
The full body of the NextToken() method override is
public override IToken NextToken() {
int marker = this._input != null ? this._input.Mark() : throw new InvalidOperationException("nextToken requires a non-null input stream.");
label_3:
try {
while (!this._hitEOF) {
this._token = (IToken)null;
this._channel = 0;
this._tokenStartCharIndex = this._input.Index;
this._tokenStartCharPositionInLine = this.Interpreter.Column;
this._tokenStartLine = this.Interpreter.Line;
this._text = (string)null;
do {
this._type = 0;
int num;
try {
num = this.Interpreter.Match(this._input, this._mode);
} catch (LexerNoViableAltException ex) {
this.NotifyListeners(ex);
this.Recover(ex);
num = -3;
}
if (this._input.La(1) == -1) {
if (INCParserGrammar.m_nestedfiles.Count == 0 ) {
this._hitEOF = true;
}
else
{
this._input = INCParserGrammar.m_nestedfiles.Pop();
}
}
if (this._type == 0)
this._type = num;
if (this._type == -3)
goto label_3;
}
while (this._type == -2);
if (this._token == null)
this.Emit();
return this._token;
}
this.EmitEOF();
return this._token;
} finally {
this._input.Release(marker);
}
}
Now, when you recognize a file in your input that should be parsed, simply add the following action:
FILE
: [a-zA-Z][a-zA-Z0-9_]*'.'[a-zA-Z0-9_]+ {
StreamReader s = new StreamReader(Text);
mySpecialFileStack.Push(_input);
_input = new AntlrInputStream(s);
};
DQUOTE: '"' { this._input = current;
Mode(INCLexerGrammar.DefaultMode); };
//***Warning:***
// Be careful when your file inclusion is enclosed in quotes or other symbols, or if
// the filename-to-be-included is not the last token of the inclusion directive: `_input`
// should only be switched AFTER the inclusion directive has been completely recognized
// (i.e. after the closing quote has been matched).
Finally, the main program is given below; note that the root file's character stream is pushed onto the ICharStream stack first:
static void Main(string[] args) {
var a = new StreamReader("./root.txt");
var antlrInput = new AntlrInputStream(a);
INCParserGrammar.m_nestedfiles.Push(antlrInput);
var lexer = new INCLexerGrammar(antlrInput);
var tokens = new BufferedTokenStream(lexer);
var parser = new INCParserGrammar(tokens);
parser.compileUnit();
}
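For a quick smoke test, one might generate a hypothetical root file and an included fragment like this (file names and contents are purely illustrative; requires System.IO):
static void CreateTestFiles() {
    // part.txt is a partial fragment that only makes sense once injected
    File.WriteAllText("part.txt", "fragment from the included file ");
    // root.txt pulls that fragment in via the include directive
    File.WriteAllText("./root.txt", "text before #include \"part.txt\" text after");
}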

Reading Mr. Grigoris's answer helped me to discover another possible solution for my problem:
While trying to figure out how the suggested solution works, I stumbled upon the public virtual IToken EmitEOF() method. If the code that Mr. Grigoris provided is placed inside this function (with minor changes), everything seems to work as intended.
That gave me the opportunity to override the functionality of EmitEOF() directly from the @members block of the lexer, without having to create a whole new file or understand how the generated lexer's NextToken() method works.
Lexer Grammar:
lexer grammar INCLexerGrammar;
@lexer::header {
using System;
using System.IO;
using System.Collections.Generic;
}
@lexer::members {
private Stack<ICharStream> _nestedFiles = new Stack<ICharStream>();
public override IToken EmitEOF(){
if (_nestedFiles.Count == 0 ) {
return base.EmitEOF();
};
this._hitEOF = false;
this._input = _nestedFiles.Pop();
return this.NextToken();
}
}
/////////////////////////////////////////////////////////////////////////////////////
// Default Mode /////////////////////////////////////////////////////////////////////
/////////////////////////////////////////////////////////////////////////////////////
// Skipped: the parser doesn't need to see the INCLUDEPREFIX token itself
INCLUDEPREFIX : '#include'[ \t]+'"' { Mode(INCLexerGrammar.FILEMODE); } -> skip;
// This is the only valid token our Grammar accepts
ANY : ~[#]+ ;
/////////////////////////////////////////////////////////////////////////////////////
mode FILEMODE; //////////////////////////////////////////////////////////////////////
/////////////////////////////////////////////////////////////////////////////////////
// Skipped: the parser doesn't need to see the FILE token itself
FILE : [a-zA-Z][a-zA-Z0-9_]*'.'[a-zA-Z0-9_]+ {
// Create new StreamReader from the file mentioned
StreamReader s = new StreamReader(Text);
// Push the old stream to stack
_nestedFiles.Push(_input);
// This new stream will be popped and used right after, on DQUOTE.
_nestedFiles.Push(new AntlrInputStream(s));
} -> skip;
// Skipped: the parser doesn't need to see the DQUOTE token itself
DQUOTE: '"' {
// Injecting the newly generated Stream.
this._input = _nestedFiles.Pop();
Mode(INCLexerGrammar.DefaultMode);
} -> skip;
Parser Grammar:
parser grammar INCParserGrammar;
options { tokenVocab = INCLexerGrammar; }
// Our Grammar contains only ANY tokens. Include directives
// and other Tokens exists only for helping lexer to
// inject the contents of other files inside the current
// scanning process.
compileUnit
: ANY+ EOF
;
Execution Calls:
// [...]
var myRootFile = new StreamReader("./root.txt");
var myAntlrInputStream = new AntlrInputStream(myRootFile);
var lexer = new INCLexerGrammar(myAntlrInputStream);
var tokens = new BufferedTokenStream(lexer);
var parser = new INCParserGrammar(tokens);
parser.compileUnit();
// [...]

Related

Last line ignored using ANTLR for C#

We use version 4.6.4 of the C# target of ANTLR to parse code snippets used in our tools. The grammar is similar to IEC 61131, the Pascal-like PLC language. When someone enters a snippet and forgets the semicolon ending the last line, that line is just ignored by the parser. What can I do to get some feedback on this? I need to at least give an error message to the user.
I already have an error handler:
class ErrorListener : IAntlrErrorListener<IToken>
{
public void SyntaxError(IRecognizer recognizer, IToken offendingSymbol, int line, int charPositionInLine, string msg, RecognitionException e)
{
_errorLine = offendingSymbol.Line;
_errorColumn = offendingSymbol.Column + 1;
_errorText = "Error on line " + _errorLine + ", column " + _errorColumn;
}
}
My lexer and parser functions are:
public CommonTokenStream Lex(string stLike)
{
AntlrInputStream input = new AntlrInputStream(stLike);
IEC61131Lexer lexer = new IEC61131Lexer(input);
CommonTokenStream tokenStream = new CommonTokenStream(lexer);
return tokenStream;
}
public IParseTree Parse(CommonTokenStream tokenStream)
{
IEC61131Parser parser = new IEC61131Parser(tokenStream);
ErrorListener listener = new ErrorListener();
parser.AddErrorListener((IAntlrErrorListener<IToken>)listener);
return parser.iec_source();
}
They are called like this:
CommonTokenStream tokenStream = Lex(stLike);
IParseTree tree = Parse(tokenStream);
// If parsing went OK, _errorText will be empty
if (_errorText == "")
{
// Walk the tree to create code
IEC61131PlcVisitor visitor = new IEC61131PlcVisitor(theClass, tokenStream, target);
visitor.Indent = indent;
result = visitor.Visit(tree);
}
else
{
result = "<" + _errorText + ">";
}
When I parse a code snippet with a missing semicolon at the end, the lexer generates tokens for it, but the parser stops at the last semicolon.
Add an EOF token at the end of the iec_source rule.
That way the parser knows it has to reach the end of the input, and it will emit an error if it does not find the end of file after the last recognized statement.
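For example, a minimal sketch (the actual sub-rules of iec_source depend on your grammar):
iec_source
    : statement* EOF
    ;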

Remove text from PDF document using Aspose.PDF library?

I need to delete text from a PDF document. I am using Aspose for this purpose, currently with TextFragmentAbsorber.
FYI, I cannot use any other 3rd party library.
Below is the code I am using :
private string DeleteMachineReadableCode(string inputFilePath)
{
var outputFilePath = Path.Combine(Path.GetTempPath(), string.Format(@"{0}.pdf", Guid.NewGuid()));
try
{
// Open document
Document pdfDocument = new Document(inputFilePath);
// Create TextAbsorber object to find all the phrases matching the regular expression
TextFragmentAbsorber textFragmentAbsorber = new TextFragmentAbsorber("#START#((.|\r\n)*?)#END#");
// Set text search option to specify regular expression usage
TextSearchOptions textSearchOptions = new TextSearchOptions(true);
textFragmentAbsorber.TextSearchOptions = textSearchOptions;
// Accept the absorber for all pages
pdfDocument.Pages.Accept(textFragmentAbsorber);
// Get the extracted text fragments
TextFragmentCollection textFragmentCollection = textFragmentAbsorber.TextFragments;
// Loop through the fragments
foreach (TextFragment textFragment in textFragmentCollection)
{
// Update text and other properties
textFragment.Text = string.Empty;
// Set to an instance of an object.
textFragment.TextState.Font = FontRepository.FindFont("Verdana");
textFragment.TextState.FontSize = 1;
textFragment.TextState.ForegroundColor = Aspose.Pdf.Color.FromRgb(System.Drawing.Color.White);
textFragment.TextState.BackgroundColor = Aspose.Pdf.Color.FromRgb(System.Drawing.Color.White);
}
pdfDocument.Save(outputFilePath);
}
finally
{
if (File.Exists(inputFilePath))
File.Delete(inputFilePath);
}
return outputFilePath;
}
I am able to replace the content if the content to be deleted is on a single page.
My problem is that if the text spans over multiple pages the TextFragmentAbsorber does not recognize the text with the mentioned regex pattern ("#START#((.|\r\n)*?)#END#").
Please suggest whether anything can be done with the regex, or whether some setting in Aspose can fix my issue.
As shared earlier, we cannot promise an early resolution of the issue you reported, because of an architecture limitation. However, we have modified the code snippet to meet your requirements.
The idea is to find text starting with '#START#' on one of the document pages, then to find text ending with '#END#' on one of the subsequent pages, and also to process all text fragments on the pages between those two (if any exist).
private string DeleteMachineReadableCodeUpdated(string inputFilePath)
{
string outputFilePath = Path.Combine(Path.GetTempPath(), string.Format(@"{0}.pdf", Guid.NewGuid()));
try
{
// Open document
Document pdfDocument = new Document(inputFilePath);
// Create TextAbsorber object to find all the phrases matching the regular expression
TextFragmentAbsorber absorber = new TextFragmentAbsorber("#START#((.|\r\n)*?)#END#");
// Set text search option to specify regular expression usage
TextSearchOptions textSearchOptions = new TextSearchOptions(true);
absorber.TextSearchOptions = textSearchOptions;
// Accept the absorber for all pages
pdfDocument.Pages.Accept(absorber);
// Get the extracted text fragments
TextFragmentCollection textFragmentCollection = absorber.TextFragments;
// If pattern found on one of the pages
if (textFragmentCollection.Count > 0)
{
RemoveTextFromFragmentCollection(textFragmentCollection);
}
else
{
// In case nothing was found tries to find by parts
string startingPattern = "#START#((.|\r\n)*?)\\z";
string endingPattern = "\\A((.|\r\n)*?)#END#";
bool isStartingPatternFound = false;
bool isEndingPatternFound = false;
ArrayList fragmentsToRemove = new ArrayList();
foreach (Page page in pdfDocument.Pages)
{
// If ending pattern was already found - do nothing
if (isEndingPatternFound)
continue;
// If starting pattern was already found - activate textFragmentAbsorber with ending pattern
absorber.Phrase = !isStartingPatternFound ? startingPattern : endingPattern;
page.Accept(absorber);
if (absorber.TextFragments.Count > 0)
{
// In case something is found - add it to list
fragmentsToRemove.AddRange(absorber.TextFragments);
if (isStartingPatternFound)
{
// Both starting and ending patterns found - finish the document processing
isEndingPatternFound = true;
RemoveTextFromFragmentCollection(fragmentsToRemove);
}
else
{
// Only starting pattern found yet - continue
isStartingPatternFound = true;
}
}
else
{
// In case neither starting nor ending pattern are found on current page
// If starting pattern was found previously - get all fragments from the page
if (isStartingPatternFound)
{
absorber.Phrase = String.Empty;
page.Accept(absorber);
fragmentsToRemove.AddRange(absorber.TextFragments);
}
// Otherwise do nothing (continue)
}
}
}
pdfDocument.Save(outputFilePath);
}
finally
{
if (File.Exists(inputFilePath))
File.Delete(inputFilePath);
}
return outputFilePath;
}
private void RemoveTextFromFragmentCollection(ICollection fragmentCollection)
{
// Loop through the fragments
foreach (TextFragment textFragment in fragmentCollection)
{
textFragment.Text = string.Empty;
}
}
Note:
This code assumes that only one text block starting with '#START#' and ending with '#END#' is in the document. However, the above code can be easily modified to process several such blocks.
Instead of processing text on the intermediate page(s), you may store the page number(s) and then delete those pages using pdfDocument.Pages.Delete(pageNumber) before saving the document. This avoids 'blank' pages if they are undesirable.
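A possible sketch of that page-deletion variation (it assumes Aspose's Page.Number property, and collects intermediate page numbers instead of absorbing their fragments):
// Collected instead of absorbing fragments from intermediate pages:
var pagesToDelete = new List<int>();
// ... inside the page loop, when the starting pattern was found earlier
// but nothing matches on the current page:
//     pagesToDelete.Add(page.Number);
// Before pdfDocument.Save(...), delete from the highest number down
// so the remaining page numbers stay valid:
pagesToDelete.Sort();
pagesToDelete.Reverse();
foreach (int pageNumber in pagesToDelete)
{
    pdfDocument.Pages.Delete(pageNumber);
}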

Parsing key value pairs c#

I've been having issues attempting to parse key-value pairs from a text file. I've been scouring for libraries that can do what I'd like, as I don't have the ability to write a class that can do this myself.
Here is the beginning of my file along with a portion of commented out text and key value pairs:
#!version:1.0.0.1
##File header "#!version:1.0.0.1" can not be edited or deleted, and must be placed in the first line.##
#######################################################################################
## Account1 Basic Settings ##
#######################################################################################
account.1.enable = 1
account.1.label = Front
account.1.display_name = Front
What I'm looking to do is grab these values and be able to update them in place, in the same location in the file, because these files need to remain human-readable.
I've looked into Nini, as this library seems to be able to do what I'd like; however, I keep getting an error caused by line 1 of my file, since it is not a key-value pair:
Expected assignment operator (=) - Line: 1, Position: 19.
I read through the source of Nini, and it seems there is a way to configure the reader to use MySQL style, which treats "#" as a comment marker, but I'm unsure how to adjust it or whether it is done automatically, as it is completely over my head.
I understand that my files aren't legitimate INI files, and there is probably a limitation in the Nini library, as it searches for the section that the key-value pairs are in.
The code I've attempted to use to parse and display this text to edit with Nini is as follows:
public void EditCFG(string file)
{
if (!string.IsNullOrWhiteSpace(file))
{
IniConfigSource inifile = new IniConfigSource(file);
account_1_display_name.Text = inifile.Configs[""].Get("account.1.display.name");
}
}
Could someone please point me in the right direction?
EDIT
Thanks to @rowland-shaw, I have found the solution:
private IConfigSource source = null;
public void EditCFG(string file)
{
if (!string.IsNullOrWhiteSpace(file))
{
IniDocument inifile = new IniDocument(file, IniFileType.MysqlStyle);
source = new IniConfigSource(inifile);
account_1_display_name.Text = source.Configs["account"].Get("account.1.display_name");
}
}
However, this wasn't completely the answer. I had to also implement sections within the file. After testing my equipment that grabs these files with the updated text, everything was a success.
You need to specify the IniFileType, i.e.:
IniConfigSource inifile = new IniConfigSource(file, IniFileType.MysqlStyle);
Long example:
IniDocument inifile = new IniDocument(file, IniFileType.MysqlStyle);
IniConfigSource source = new IniConfigSource(inifile);
If that is how the format is going to be in the file (key = value and # for comments), you could do the following (C# pseudocode-ish; the trivial parts are left to you):
var dictionary = new Dictionary<string, string>();
foreach (string line in File.ReadLines(file))
{
    if (string.IsNullOrWhiteSpace(line)) continue;
    // Remove extra spaces (the loop variable itself is read-only)
    string trimmed = line.Trim();
    if (trimmed[0] == '#') continue;
    // Split on the first '=' only, in case values contain '='
    string[] kvp = trimmed.Split(new[] { '=' }, 2);
    if (kvp.Length < 2) continue;
    dictionary[kvp[0].Trim()] = kvp[1].Trim(); // kvp[0] = key, kvp[1] = value
}
Then you can use the created dictionary like account_1_display_name.Text = dictionary["account.1.display_name"];
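Since the goal is also to write values back in place while keeping the file human-readable, a minimal sketch (assuming each setting sits on its own line and keys never contain '=') could rewrite only the matching line:
public static void UpdateValue(string path, string key, string newValue)
{
    string[] lines = File.ReadAllLines(path);
    for (int i = 0; i < lines.Length; i++)
    {
        string trimmed = lines[i].Trim();
        // Leave comments and blank lines untouched so the file stays readable
        if (trimmed.Length == 0 || trimmed.StartsWith("#")) continue;
        int eq = trimmed.IndexOf('=');
        if (eq < 0) continue;
        if (trimmed.Substring(0, eq).Trim() == key)
        {
            lines[i] = key + " = " + newValue; // same position, same format
            break;
        }
    }
    File.WriteAllLines(path, lines);
}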
I can recommend my library Nager.ConfigParser; you can easily obtain it via NuGet.
Here is an example for your configuration:
var config = "#comment1\r\naccount.1.enable = 1\r\naccount.1.label = Front";
var configConvert = new ConfigConvert();
var item = configConvert.DeserializeObject<AccountCollection>(config);
public class AccountCollection
{
[ConfigKey("account.")]
[ConfigArray]
public Account[] Accounts { get; set; }
}
public class Account : ConfigArrayElement
{
public int Enable { get; set; }
public string Label { get; set; }
[ConfigKey("display_name")]
public string DisplayName { get; set; }
}
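Reading the values back from the deserialized object might then look like this (a sketch based on the classes above):
foreach (var account in item.Accounts)
{
    Console.WriteLine("{0}: label={1}, enabled={2}",
        account.DisplayName, account.Label, account.Enable);
}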

How to implement a Lua container (virtual file system) module loader in C#

Sounds a little bit scary, doesn't it?
Some background information: I want to load a tar archive that contains some Lua modules into my C# application using LuaInterface. The easiest way would be to extract these files to a temp folder, modify the Lua module search path and read them with require as usual. But I do not want to put these scripts anywhere on the file system.
So I thought it should be possible to load the tar archive with #ziplib. I know there are a lot of Lua implementations for tar and the like, but #ziplib is already part of the project.
After successfully loading the files as strings (streams) out of the archive, I should be able to pass them into lua.DoString(...) in C# via LuaInterface.
But simply loading modules via dostring or dofile does not work if a module has a line like "module(..., package.seeall)": an error is reported that argument 1 is nil where a string was expected.
The other problem is a module may depend on other modules which are also located in the tar archive.
One possible solution should be to define a custom loader as described here.
My idea is to implement such a loader in C# with the #ziplib and map this loader into the lua stack of my C# application.
Has anyone of you had a similar task to this?
Are there any ready-to-use solutions that already address problems like this?
The tar file is not a must-have, but a nice-to-have package format.
Is this idea feasible or totally unfeasible?
I've written an example class to extract the Lua files from the archive. Its load method works as a loader and returns a Lua function.
namespace LuaInterfaceTest
{
class LuaTarModuleLoader
{
private LuaTarModuleLoader() { }
~LuaTarModuleLoader()
{
in_stream_.Close();
}
public LuaTarModuleLoader(Stream in_stream,Lua lua )
{
in_stream_ = in_stream;
lua_ = lua;
}
public LuaFunction load(string modulename, out string error_message)
{
string lua_chunk = "test=hello";
string filename = modulename + ".lua";
error_message = "Unable to locate the file";
in_stream_.Position = 0; // rewind
Stream bzip2Stream = new BZip2InputStream(in_stream_);
TarInputStream tar = new TarInputStream(bzip2Stream);
TarEntry tarEntry;
LuaFunction func = null;
while ((tarEntry = tar.GetNextEntry()) != null)
{
if (tarEntry.IsDirectory)
{
continue;
}
if (filename == tarEntry.Name)
{
MemoryStream out_stream = new MemoryStream();
tar.CopyEntryContents(out_stream);
out_stream.Position = 0; // rewind
StreamReader stream_reader = new StreamReader(out_stream);
lua_chunk = stream_reader.ReadToEnd();
func = lua_.LoadString(lua_chunk, filename);
string dum = func.ToString();
error_message = "No Error!";
break;
}
}
return func;
}
private Stream in_stream_;
private Lua lua_;
}
}
I try to register the load method in LuaInterface like this:
Lua lua = new Lua();
GC.Collect();
Stream inStream = File.OpenRead("c:\\tmp\\lua_scripts.tar.bz2");
LuaTarModuleLoader tar_loader = new LuaTarModuleLoader(inStream, lua);
lua.DoString("require 'CLRPackage'");
lua.DoString("import \"ICSharpCode.SharpZipLib.dll\"");
lua.DoString("import \"System\"");
lua["container_module_loader"] = tar_loader;
lua.DoString("table.insert(package.loaders, 2, container_module_loader.load)");
lua.DoString("require 'def_sensor'");
If I try it this way, I get an exception during the call to require:
"instance method 'load' requires a non null target object"
I tried to call the load method directly; here I have to use the ":" notation.
lua.DoString("container_module_loader:load('def_sensor')");
If I call the method like that, I hit a breakpoint placed at the top of the method, so everything works as expected.
But if I try to register the method with the ":" notation, I get an exception while registering it:
lua.DoString("table.insert(package.loaders, 2, container_module_loader:load)");
"[string "chunk"]:1: function arguments expected near ')'"
In LÖVE they have that working. All Lua files are inside one zip file, and they work, even if ... is used. The library they use is PhysicsFS.
Have a look at the source. Probably /modules/filesystem will get you started.
I finally got the trick ;-)
One problem I still don't really understand is why my loader must not return a string.
Here is my solution:
The loader Class itself:
namespace LuaInterfaceTest
{
class LuaTarModuleLoader
{
private LuaTarModuleLoader() { }
~LuaTarModuleLoader()
{
in_stream_.Close();
}
public LuaTarModuleLoader(Stream in_stream,Lua lua )
{
in_stream_ = in_stream;
lua_ = lua;
}
public LuaFunction load(string modulename)
{
string lua_chunk = "";
string filename = modulename + ".lua";
in_stream_.Position = 0; // rewind
Stream bzip2Stream = new BZip2InputStream(in_stream_);
TarInputStream tar = new TarInputStream(bzip2Stream);
TarEntry tarEntry;
LuaFunction func = null;
while ((tarEntry = tar.GetNextEntry()) != null)
{
if (tarEntry.IsDirectory)
{
continue;
}
if (filename == tarEntry.Name)
{
MemoryStream out_stream = new MemoryStream();
tar.CopyEntryContents(out_stream);
out_stream.Position = 0; // rewind
StreamReader stream_reader = new StreamReader(out_stream);
lua_chunk = stream_reader.ReadToEnd();
func = lua_.LoadString(lua_chunk, modulename);
string dum = func.ToString();
break;
}
}
return func;
}
private Stream in_stream_;
private Lua lua_;
}
}
And here is how to use the loader. I am not sure whether all the package stuff is really needed, but I had to wrap the ":"-notation call and hide it behind my "load_wrapper" function.
string load_wrapper = "local function load_wrapper(modname)\n return container_module_loader:load(modname)\n end";
Lua lua = new Lua();
GC.Collect();
Stream inStream = File.OpenRead("c:\\tmp\\lua_scripts.tar.bz2");
LuaTarModuleLoader tar_loader = new LuaTarModuleLoader(inStream, lua);
lua.DoString("require 'CLRPackage'");
lua.DoString("import \"System\"");
lua["container_module_loader"] = tar_loader;
lua.DoString(load_wrapper);
string loader_package = "module('my_loader', package.seeall) \n";
loader_package += load_wrapper + "\n";
loader_package += "table.insert(package.loaders, 2, load_wrapper)";
lua.DoString(loader_package);
lua.DoFile("./load_modules.lua");
I hope this also helps somebody else.

Import CSV file to strongly typed data structure in .Net [closed]

What's the best way to import a CSV file into a strongly-typed data structure?
Microsoft's TextFieldParser is stable and follows RFC 4180 for CSV files. Don't be put off by the Microsoft.VisualBasic namespace; it's a standard component in the .NET Framework, just add a reference to the global Microsoft.VisualBasic assembly.
If you're compiling for Windows (as opposed to Mono) and don't anticipate having to parse "broken" (non-RFC-compliant) CSV files, then this would be the obvious choice, as it's free, unrestricted, stable, and actively supported, most of which cannot be said for FileHelpers.
See also: How to: Read From Comma-Delimited Text Files in Visual Basic for a VB code example.
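A minimal C# sketch of TextFieldParser in use (the file name and the field mapping are placeholders):
using Microsoft.VisualBasic.FileIO; // add a reference to Microsoft.VisualBasic

using (var parser = new TextFieldParser("data.csv"))
{
    parser.TextFieldType = FieldType.Delimited;
    parser.SetDelimiters(",");
    parser.HasFieldsEnclosedInQuotes = true;
    while (!parser.EndOfData)
    {
        string[] fields = parser.ReadFields();
        // map fields[0], fields[1], ... onto your strongly-typed object here
    }
}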
Use an OleDB connection.
String sConnectionString = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\\InputDirectory\\;Extended Properties='text;HDR=Yes;FMT=Delimited'";
OleDbConnection objConn = new OleDbConnection(sConnectionString);
objConn.Open();
DataTable dt = new DataTable();
OleDbCommand objCmdSelect = new OleDbCommand("SELECT * FROM file.csv", objConn);
OleDbDataAdapter objAdapter1 = new OleDbDataAdapter();
objAdapter1.SelectCommand = objCmdSelect;
objAdapter1.Fill(dt);
objConn.Close();
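To get from the DataTable to a strongly-typed structure, one possibility is a simple projection (a sketch; Person is a hypothetical type with Name and Age properties, and the column names are made up):
var people = new List<Person>();
foreach (DataRow row in dt.Rows)
{
    people.Add(new Person
    {
        Name = row["Name"].ToString(),
        Age = Convert.ToInt32(row["Age"])
    });
}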
If you're expecting fairly complex scenarios for CSV parsing, don't even think of rolling your own parser. There are a lot of excellent tools out there, like FileHelpers, or even ones from CodeProject.
The point is this is a fairly common problem and you could bet that a lot of software developers have already thought about and solved this problem.
I agree with @NotMyself. FileHelpers is well tested and handles all kinds of edge cases that you'll eventually have to deal with if you do it yourself. Take a look at what FileHelpers does and only write your own if you're absolutely sure that either (1) you will never need to handle the edge cases FileHelpers does, or (2) you love writing this kind of stuff and are going to be overjoyed when you have to parse stuff like this:
1,"Bill","Smith","Supervisor", "No Comment"
2 , 'Drake,' , 'O'Malley',"Janitor,
Oops, I'm not quoted and I'm on a new line!
Brian gives a nice solution for converting it to a strongly typed collection.
Most of the CSV parsing methods given don't take into account escaping fields or some of the other subtleties of CSV files (like trimming fields). Here is the code I personally use. It's a bit rough around the edges and has pretty much no error reporting.
public static IList<IList<string>> Parse(string content)
{
IList<IList<string>> records = new List<IList<string>>();
StringReader stringReader = new StringReader(content);
bool inQoutedString = false;
IList<string> record = new List<string>();
StringBuilder fieldBuilder = new StringBuilder();
while (stringReader.Peek() != -1)
{
char readChar = (char)stringReader.Read();
if (readChar == '\n' || (readChar == '\r' && stringReader.Peek() == '\n'))
{
// If it's a \r\n combo consume the \n part and throw it away.
if (readChar == '\r')
{
stringReader.Read();
}
if (inQoutedString)
{
if (readChar == '\r')
{
fieldBuilder.Append('\r');
}
fieldBuilder.Append('\n');
}
else
{
record.Add(fieldBuilder.ToString().TrimEnd());
fieldBuilder = new StringBuilder();
records.Add(record);
record = new List<string>();
inQoutedString = false;
}
}
else if (fieldBuilder.Length == 0 && !inQoutedString)
{
if (char.IsWhiteSpace(readChar))
{
// Ignore leading whitespace
}
else if (readChar == '"')
{
inQoutedString = true;
}
else if (readChar == ',')
{
record.Add(fieldBuilder.ToString().TrimEnd());
fieldBuilder = new StringBuilder();
}
else
{
fieldBuilder.Append(readChar);
}
}
else if (readChar == ',')
{
if (inQoutedString)
{
fieldBuilder.Append(',');
}
else
{
record.Add(fieldBuilder.ToString().TrimEnd());
fieldBuilder = new StringBuilder();
}
}
else if (readChar == '"')
{
if (inQoutedString)
{
if (stringReader.Peek() == '"')
{
stringReader.Read();
fieldBuilder.Append('"');
}
else
{
inQoutedString = false;
}
}
else
{
fieldBuilder.Append(readChar);
}
}
else
{
fieldBuilder.Append(readChar);
}
}
record.Add(fieldBuilder.ToString().TrimEnd());
records.Add(record);
return records;
}
Note that this doesn't handle the edge case of fields not being delimited by double quotes, but merely having a quoted string inside them. See this post for a bit of a better explanation, as well as some links to some proper libraries.
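A quick usage sketch of the method above:
// Quoted fields may contain commas; doubled quotes become literal quotes.
IList<IList<string>> rows = Parse("a,b,\"c,d\"\r\n1,2,3");
// rows[0] -> { "a", "b", "c,d" }
// rows[1] -> { "1", "2", "3" }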
I was bored, so I modified some stuff I wrote. It tries to encapsulate the parsing in an OO manner while cutting down on the number of iterations through the file; it only iterates once at the top foreach.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;
namespace ConsoleApplication1
{
class Program
{
static void Main(string[] args)
{
// usage:
// note this won't run as-is since GetStream() is not implemented,
// but it will get you started
CSVFileParser fileParser = new CSVFileParser();
// TO Do: configure fileparser
PersonParser personParser = new PersonParser(fileParser);
List<Person> persons = new List<Person>();
// if the file is large and there is a good way to limit
// without having to reparse the whole file you can use a
// linq query if you desire
foreach (Person person in personParser.GetPersons())
{
persons.Add(person);
}
// now we have a list of Person objects
}
}
public abstract class CSVParser
{
protected String[] delimiters = { "," };
protected internal IEnumerable<String[]> GetRecords()
{
Stream stream = GetStream();
StreamReader reader = new StreamReader(stream);
String[] aRecord;
while (!reader.EndOfStream)
{
aRecord = reader.ReadLine().Split(delimiters,
StringSplitOptions.None);
yield return aRecord;
}
}
protected abstract Stream GetStream();
}
public class CSVFileParser : CSVParser
{
// to do: add logic to get a stream from a file
protected override Stream GetStream()
{
throw new NotImplementedException();
}
}
public class CSVWebParser : CSVParser
{
// to do: add logic to get a stream from a web request
protected override Stream GetStream()
{
throw new NotImplementedException();
}
}
public class Person
{
public String Name { get; set; }
public String Address { get; set; }
public DateTime DOB { get; set; }
}
public class PersonParser
{
public PersonParser(CSVParser parser)
{
this.Parser = parser;
}
public CSVParser Parser { get; set; }
public IEnumerable<Person> GetPersons()
{
foreach (String[] record in this.Parser.GetRecords())
{
yield return new Person()
{
Name = record[0],
Address = record[1],
DOB = DateTime.Parse(record[2]),
};
}
}
}
}
There are two articles on CodeProject that provide code for a solution, one that uses StreamReader and one that imports CSV data using the Microsoft Text Driver.
A good simple way to do it is to open the file, and read each line into an array, linked list, data-structure-of-your-choice. Be careful about handling the first line though.
This may be over your head, but there seems to be a direct way to access them as well using a connection string.
Why not try using Python instead of C# or VB? It has a nice CSV module to import that does all the heavy lifting for you.
I had to use a CSV parser in .NET for a project this summer and settled on the Microsoft Jet Text Driver. You specify a folder using a connection string, then query a file using a SQL Select statement. You can specify strong types using a schema.ini file. I didn't do this at first, but then I was getting bad results where the type of the data wasn't immediately apparent, such as IP numbers or an entry like "XYQ 3.9 SP1".
One limitation I ran into is that it cannot handle column names above 64 characters; it truncates. This shouldn't be a problem, except I was dealing with very poorly designed input data. It returns an ADO.NET DataSet.
This was the best solution I found. I would be wary of rolling my own CSV parser, since I would probably miss some of the edge cases, and I didn't find any other free CSV parsing packages for .NET out there.
EDIT: Also, there can only be one schema.ini file per directory, so I dynamically appended to it to strongly type the needed columns. It will only strongly-type the columns specified, and infer for any unspecified field. I really appreciated this, as I was dealing with importing a fluid 70+ column CSV and didn't want to specify each column, only the misbehaving ones.
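For reference, such a schema.ini entry might look like this (the file and column names are made up):
[file.csv]
Format=CSVDelimited
ColNameHeader=True
Col1=IPAddress Text
Col2=ProductCode Text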
I typed in some code. The result in the DataGridView looked good. It parses a single line of text into an ArrayList of strings.
enum quotestatus
{
none,
firstquote,
secondquote
}
public static System.Collections.ArrayList Parse(string line,string delimiter)
{
System.Collections.ArrayList ar = new System.Collections.ArrayList();
StringBuilder field = new StringBuilder();
quotestatus status = quotestatus.none;
foreach (char ch in line.ToCharArray())
{
string chOmsch = "char";
if (ch == Convert.ToChar(delimiter))
{
if (status== quotestatus.firstquote)
{
chOmsch = "char";
}
else
{
chOmsch = "delimiter";
}
}
if (ch == Convert.ToChar(34))
{
chOmsch = "quotes";
if (status == quotestatus.firstquote)
{
status = quotestatus.secondquote;
}
if (status == quotestatus.none )
{
status = quotestatus.firstquote;
}
}
switch (chOmsch)
{
case "char":
field.Append(ch);
break;
case "delimiter":
ar.Add(field.ToString());
field.Clear();
break;
case "quotes":
if (status==quotestatus.firstquote)
{
field.Clear();
}
if (status== quotestatus.secondquote)
{
status =quotestatus.none;
}
break;
}
}
if (field.Length != 0)
{
ar.Add(field.ToString());
}
return ar;
}
If you can guarantee that there are no commas in the data, then the simplest way would probably be to use String.split.
For example:
String[] values = myString.Split(',');
myObject.StringField = values[0];
myObject.IntField = Int32.Parse(values[1]);
There may be libraries you could use to help, but that's probably as simple as you can get. Just make sure you can't have commas in the data, otherwise you will need to parse it better.

Categories

Resources