How to extract specific data from XML data - c#

I am using the following code snippet to parse and convert some XML data to CSV. I can convert the entire XML data and dump it into a file, however my requirements have changed and now I'm confused.
public void xmlToCSVfiltered(string p, int e)
{
string all_lines1 = File.ReadAllText(p);
all_lines1 = "<Root>" + all_lines1 + "</Root>";
XmlDocument doc_all = new XmlDocument();
doc_all.LoadXml(all_lines1);
StreamWriter write_all = new StreamWriter(FILENAME2);
XmlNodeList rows_all = doc_all.GetElementsByTagName("XML");
List<string[]> filtered = new List<string[]>();
foreach (XmlNode rowtemp in rows_all)
{
List<string> children_all = new List<string>();
foreach (XmlNode childtemp in rowtemp.ChildNodes)
{
children_all.Add(Regex.Replace(childtemp.InnerText, "\\s+", " ")); // <------- Fixed the Bug , Advisories dont span
}
string.Join(",", children_all.ToArray());
//write_all.WriteLine(string.Join(",", children_all.ToArray()));
if (children_all.Contains(e.ToString()))
{
filtered.Add(children_all.ToArray());
write_all.WriteLine(children_all);
}
}
write_all.Flush();
write_all.Close();
foreach (var res in filtered)
{
Console.WriteLine(string.Join(",", res));
}
}
My input looks something like the following. My objective now is to convert only those "events" that contain a certain number and compile them into a CSV. Let's say, for example, I only want to convert those events whose 2nd data value under the <EVENT> element is 4627. For the input below, that would be both events.
<XML><HEADER>1.0,770162,20121009133435,3,</HEADER>20121009133435,721,5,1,0,0,0,00:00,00:00,<EVENT>00032134826064957,4627,</EVENT><DRUG>1,1872161156,7,0,10000</DRUG><DOSE>1,0,5000000,0,10000000,0</DOSE><CAREAREA>1 </CAREAREA><ENCOUNTER></ENCOUNTER><ADVISORY>Keep it simple or spell
tham ALL out. For some reason
that is not the case
please press the on button
when trying to activate
device codes also available on
list</ADVISORY><CAREGIVER></CAREGIVER><PATIENT></PATIENT><LOCATION>20121009133435,00-1d-71-0a-71-80,-66</LOCATION><ROUTE></ROUTE><SITE></SITE><POWER>0,50</POWER></XML>
<XML><HEADER>2.0,773162,20121009133435,3,</HEADER>20121004133435,761,5,1,0,0,0,00:00,00:00,<EVENT>00032134826064957,4627,</EVENT><DRUG>1,18735166156,7,0,10000</DRUG><DOSE>1,0,5000000,0,10000000,0</DOSE><CAREAREA>1 </CAREAREA><ENCOUNTER></ENCOUNTER><ADVISORY>Keep it simple or spell
tham ALL out. For some reason
that is not the case
please press the on button
when trying to activate
device codes also available on
list</ADVISORY><CAREGIVER></CAREGIVER><PATIENT></PATIENT><LOCATION>20121009133435,00-1d-71-0a-71-80,-66</LOCATION><ROUTE></ROUTE><SITE></SITE><POWER>0,50</POWER></XML>
.. goes on
What my approach has been so far is to convert everything to CSV and store it in some sort of data structure and then query that data structure line by line and look if that number exists and if yes, write it to a file line by line. My function takes the path of the XML file and the number we are looking for in the XML data as parameters. I'm new to C# and I cannot understand how I would go about changing my function above. Any help will be appreciated!
EDIT:
Sample Input:
<XML><HEADER>1.0,770162,20121009133435,3,</HEADER>20121009133435,721,5,1,0,0,0,00:00,00:00,<EVENT>00032134826064957,4627,</EVENT><DRUG>1,1872161156,7,0,10000</DRUG><DOSE>1,0,5000000,0,10000000,0</DOSE><CAREAREA>1 </CAREAREA><ENCOUNTER></ENCOUNTER><ADVISORY>Keep it simple or spell
tham ALL out. For some reason
that is not the case
please press the on button
when trying to activate
device codes also available on
list</ADVISORY><CAREGIVER></CAREGIVER><PATIENT></PATIENT><LOCATION>20121009133435,00-1d-71-0a-
<XML><HEADER>1.0,770162,20121009133435,3,</HEADER>20121009133435,721,5,1,0,0,0,00:00,00:00,<EVENT>00032134826064957,4623,</EVENT><DRUG>1,1872161156,7,0,10000</DRUG><DOSE>1,0,5000000,0,10000000,0</DOSE><CAREAREA>1 </CAREAREA><ENCOUNTER></ENCOUNTER><ADVISORY>Keep it simple or spell
tham ALL out. For some reason
that is not the case
please press the on button
when trying to activate
device codes also available on
list</ADVISORY><CAREGIVER></CAREGIVER><PATIENT></PATIENT><LOCATION>20121009133435,00-1d-71-0a-
Required Output:
1.0,770162,20121009133435,3,,20121009133435,721,5,1,0,0,0,00:00,00:00,,00032134 26064957,4627,1,,1872161156,7,0,10000,1,0,5000000,0,10000000,0,1 ,,Keep it simple or spell
tham ALL out. For some reason
that is not the case
please press the on button
when trying to activate
device codes also available on
list,,,20121009133435,00-1d-71-0a-71-80,-66,,,0,50
The above will be the case if I call xmlToCSVfiltered(file, 4627);
Also note that, the output will be a single horizontal line as in CSV files but I can't really format it here for it to look like that.

I changed XmlDocument to XDocument so I can use LINQ to XML. For testing, I also used a StringReader to read the string instead of reading from a file. You can convert the code back to your original File.ReadAllText.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Xml;
using System.Xml.Linq;
using System.IO;
using System.Text.RegularExpressions;
namespace ConsoleApplication1
{
class Program
{
const string FILENAME2 = @"c:\temp\test.txt";
static void Main(string[] args)
{
string input =
"<XML><HEADER>1.0,770162,20121009133435,3,</HEADER>20121009133435,721,5,1,0,0,0,00:00,00:00,<EVENT>00032134826064957,4627,</EVENT><DRUG>1,1872161156,7,0,10000</DRUG><DOSE>1,0,5000000,0,10000000,0</DOSE><CAREAREA>1 </CAREAREA><ENCOUNTER></ENCOUNTER><ADVISORY>Keep it simple or spell\n" +
"tham ALL out. For some reason \n" +
"that is not the case\n" +
"please press the on button\n" +
"when trying to activate\n" +
"device codes also available on\n" +
"list</ADVISORY><CAREGIVER></CAREGIVER><PATIENT></PATIENT><LOCATION>20121009133435,00-1d-71-0a-71-80,-66</LOCATION><ROUTE></ROUTE><SITE></SITE><POWER>0,50</POWER></XML>\n" +
"<XML><HEADER>2.0,773162,20121009133435,3,</HEADER>20121004133435,761,5,1,0,0,0,00:00,00:00,<EVENT>00032134826064957,4627,</EVENT><DRUG>1,18735166156,7,0,10000</DRUG><DOSE>1,0,5000000,0,10000000,0</DOSE><CAREAREA>1 </CAREAREA><ENCOUNTER></ENCOUNTER><ADVISORY>Keep it simple or spell\n" +
"tham ALL out. For some reason\n" +
"that is not the case\n" +
"please press the on button\n" +
"when trying to activate\n" +
"device codes also available on\n" +
"list</ADVISORY><CAREGIVER></CAREGIVER><PATIENT></PATIENT><LOCATION>20121009133435,00-1d-71-0a-71-80,-66</LOCATION><ROUTE></ROUTE><SITE></SITE><POWER>0,50</POWER></XML>\n";
xmlToCSVfiltered(input, 4627);
}
static public void xmlToCSVfiltered(string p, int e)
{
//string all_lines1 = File.ReadAllText(p);
StringReader reader = new StringReader(p);
string all_lines1 = reader.ReadToEnd();
all_lines1 = "<Root>" + all_lines1 + "</Root>";
XDocument doc_all = XDocument.Parse(all_lines1);
StreamWriter write_all = new StreamWriter(FILENAME2);
List<XElement> rows_all = doc_all.Descendants("XML").Where(x => x.Element("EVENT").Value.Split(new char[] {','}).Skip(1).Take(1).FirstOrDefault() == e.ToString()).ToList();
List<string[]> filtered = new List<string[]>();
foreach (XElement rowtemp in rows_all)
{
List<string> children_all = new List<string>();
foreach (XElement childtemp in rowtemp.Elements())
{
children_all.Add(Regex.Replace(childtemp.Value, "\\s+", " ")); // <------- Fixed the Bug , Advisories dont span
}
// rows_all was already filtered by the LINQ query above, so every row here matches.
filtered.Add(children_all.ToArray());
// Join the values; passing the list itself would only print its type name.
write_all.WriteLine(string.Join(",", children_all));
}
write_all.Flush();
write_all.Close();
foreach (var res in filtered)
{
Console.WriteLine(string.Join(",", res));
}
}
}
}
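The filtering step can also be isolated into a small function, which makes it easier to test. The following is only a sketch: the class and method names (EventFilter, FilterToCsv) are made up for illustration, and it assumes every record is an <XML> element whose <EVENT> value is comma-separated. Unlike Elements(), it walks Nodes(), so the loose text between </HEADER> and <EVENT> is kept, as in the required output of the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;

public static class EventFilter
{
    // Hypothetical helper: returns one CSV line per <XML> record whose
    // second EVENT value equals eventCode.
    public static List<string> FilterToCsv(string rawXml, int eventCode)
    {
        // The input has several top-level <XML> elements, so wrap them in one root.
        XDocument doc = XDocument.Parse("<Root>" + rawXml + "</Root>");
        string wanted = eventCode.ToString();

        return doc.Root.Elements("XML")
            .Where(x =>
            {
                // The cast yields null when <EVENT> is missing, instead of throwing.
                string[] parts = ((string)x.Element("EVENT") ?? "").Split(',');
                return parts.Length > 1 && parts[1] == wanted;
            })
            .Select(x => string.Join(",", x.Nodes().Select(NodeText)))
            .ToList();
    }

    // Text of an element or a loose text node, with whitespace runs collapsed
    // so that multi-line advisories stay on one CSV line.
    private static string NodeText(XNode node)
    {
        string text = node is XElement ? ((XElement)node).Value
                    : node is XText ? ((XText)node).Value
                    : "";
        return Regex.Replace(text, "\\s+", " ");
    }
}
```

Each returned string can then be written with StreamWriter.WriteLine or File.WriteAllLines.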

I have made some assumptions, since some details were not clear to me from the question.
Assumptions
1. You know which node to check (EVENT here) and that the value you need is in the second position within it.
2. You know the delimiter between the values in the node, e.g. ',' in EVENT.
public void xmlToCSVfiltered(string p, int e, string nodeName, char delimiter)
{
//get the xml node
XDocument xml = XDocument.Load(p);
//get the required node. I am assuming you would know. For eg. Event Node
var requiredNode = xml.Descendants(nodeName);
foreach (var node in requiredNode)
{
if (node == null)
continue;
//Also here, I am assuming you have the delimiter knowledge.
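var valueSplit = node.Value.Split(delimiter);
// Compare only the second value, as stated in assumption 1.
if (valueSplit.Length > 1 && valueSplit[1] == e.ToString())
{
// Placeholder: write the values of this record to the CSV here.
AddToCSV();
}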
}
}

Related

Checking multiple XML files

I am re-wording this from my original post:
I have two XML files, each relating to a given year, for example 18/19 and 17/18. They conform to the same structure, and below is a small sample from one of these files. What I want, in C#, is to compare all records in these files where the Given Name, the Family Name, the NI Number and the Date of Birth are the same, BUT the Learner Ref Number is different. I need to compare them, then push only these records into a data table so that I can push them into a spreadsheet (the spreadsheet bit I can do). I currently have the below as a starting point, but am still very much stuck.
Firstly, I have the click handler for my Import button:
private void Btn_Import_Click(object sender, RoutedEventArgs e)
{
ILRChecks.ILRReport.CrossYear();
}
This then calls into the class that eventually writes the file to my output location:
using System.Data;
using System.IO;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using ILRValidation;
using InfExcelExtension;
namespace ILRChecks
{
internal static partial class ILRReport
{
internal static void CrossYear()
{
DataSet ds_CrossYearChecks =
ILRValidation.Validation.CrossYearChecks(Global.fileNames);
string output = Path.Combine(Global.foldername, "ULIN_Issues" +
".xlsx");
ds_CrossYearChecks.ToWorkBook(output);
}
}
}
And this is the bit I'm stuck on, which is finding the differences:
using System;
using System.Collections.Generic;
using System.Data;
using System.IO;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
namespace ILRValidation
{
public static partial class Validation
{
public static DataSet CrossYearChecks(DataSet ds_CrossYearChecks)
{
return CrossYearChecks(ds_CrossYearChecks);
}
public static DataSet CrossYearChecks(string[] xmlPath)
{
DataSet ds_xmlCrossYear = new DataSet();
return CrossYearChecks(ds_xmlCrossYear);
}
}
}
XML:
<Learner>
<LearnRefNumber></LearnRefNumber>
<ULN></ULN>
<FamilyName></FamilyName>
<GivenNames></GivenNames>
<DateOfBirth></DateOfBirth>
<Ethnicity></Ethnicity>
<Sex></Sex>
<LLDDHealthProb></LLDDHealthProb>
<NINumber></NINumber>
<PriorAttain></PriorAttain>
<MathGrade></MathGrade>
<EngGrade></EngGrade>
<PostcodePrior></PostcodePrior>
<Postcode></Postcode>
<AddLine1></AddLine1>
<AddLine3></AddLine3>
<Email></Email>
Well, you can traverse both XML files recursively and write down all the encountered changes. Something like this should be helpful:
static string AppendPrefix(string oldPrefix, string addition) =>
oldPrefix == "" ? addition : $"{oldPrefix}.{addition}";
static void CompareElements(string prefix, XElement d1, XElement d2)
{
// 1. compare names
var newPrefix = AppendPrefix(prefix, d1.Name.ToString());
if (d1.Name != d2.Name)
{
Console.WriteLine(
$"Name mismatch: {newPrefix} != {AppendPrefix(prefix, d2.Name.ToString())}");
return;
}
// 2. compare attributes
var attrs = d1.Attributes().OrderBy(a => a.Name);
var unpairedAttributes = new HashSet<XAttribute>(d2.Attributes());
foreach (var attr in attrs)
{
var otherAttr = d2.Attributes(attr.Name).SingleOrDefault();
if (otherAttr == null)
{
Console.WriteLine($"No new attr: {newPrefix}/{attr.Name}");
continue;
}
unpairedAttributes.Remove(otherAttr);
if (attr.Value != otherAttr.Value)
Console.WriteLine(
$"Attr value mismatch: {newPrefix}/{attr.Name}: {attr.Value} != {otherAttr.Value}");
}
foreach (var attr in unpairedAttributes)
Console.WriteLine($"No old attr: {newPrefix}/{attr.Name}");
// 3. compare subelements
var leftNodes = d1.Nodes().ToList();
var rightNodes = d2.Nodes().ToList();
var smallerCount = Math.Min(leftNodes.Count, rightNodes.Count);
for (int i = 0; i < smallerCount; i++)
CompareNodes(newPrefix, i, leftNodes[i], rightNodes[i]);
if (leftNodes.Count > smallerCount)
Console.WriteLine($"Extra {leftNodes.Count - smallerCount} nodes at old file");
if (rightNodes.Count > smallerCount)
Console.WriteLine($"Extra {rightNodes.Count - smallerCount} nodes at new file");
}
static void CompareNodes(string prefix, int index, XNode n1, XNode n2)
{
if (n1.NodeType != n2.NodeType)
{
Console.WriteLine($"Node type mismatch: {prefix}/[{index}]");
return;
}
switch (n1.NodeType)
{
case XmlNodeType.Element:
CompareElements(prefix, (XElement)n1, (XElement)n2);
break;
case XmlNodeType.Text:
CompareText(prefix, index, (XText)n1, (XText)n2);
break;
}
}
static void CompareText(string prefix, int index, XText t1, XText t2)
{
if (t1.Value != t2.Value)
Console.WriteLine($"Text mismatch at {prefix}[{index}]");
}
Usage:
XDocument d1 = <get document #1 from somewhere>,
d2 = <get document #2 from somewhere>;
CompareNodes("", 0, d1.Root, d2.Root);
Obviously, instead of writing to console you should write to the appropriate spreadsheet.
Note that I'm ignoring the attribute reorder but not subnode reorder (which seems to be right).
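For the narrower requirement in the question (same given names, family name, NI number and date of birth, but a different learner ref number), a keyed join may be simpler than a full recursive diff. The following is a sketch only: the class and method names are invented, the element names are taken from the XML sample in the question, and elements are matched by local name on the assumption that the real files may or may not declare an XML namespace. Note that learners whose four key fields are all empty would match each other, so you may want to skip those.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;
using System.Xml.Linq;

public static class CrossYearSketch
{
    // Reads a direct child by local name; a missing child becomes "".
    static string Child(XElement learner, string name)
    {
        XElement child = learner.Elements().FirstOrDefault(c => c.Name.LocalName == name);
        return child == null ? "" : child.Value.Trim();
    }

    static IEnumerable<XElement> Learners(XDocument doc)
    {
        return doc.Descendants().Where(e => e.Name.LocalName == "Learner");
    }

    // Identity key: the four fields that must be equal across the two years.
    static string Key(XElement l)
    {
        return string.Join("|", Child(l, "GivenNames"), Child(l, "FamilyName"),
                           Child(l, "NINumber"), Child(l, "DateOfBirth"));
    }

    // Returns one row per learner found in both documents with a changed LearnRefNumber.
    public static DataTable ChangedLearnRefs(XDocument older, XDocument newer)
    {
        var table = new DataTable("CrossYearChecks");
        foreach (string column in new[] { "GivenNames", "FamilyName", "NINumber",
                                          "DateOfBirth", "OldLearnRefNumber", "NewLearnRefNumber" })
            table.Columns.Add(column);

        var pairs = from o in Learners(older)
                    join n in Learners(newer) on Key(o) equals Key(n)
                    where Child(o, "LearnRefNumber") != Child(n, "LearnRefNumber")
                    select new { Old = o, New = n };

        foreach (var p in pairs)
            table.Rows.Add(Child(p.Old, "GivenNames"), Child(p.Old, "FamilyName"),
                           Child(p.Old, "NINumber"), Child(p.Old, "DateOfBirth"),
                           Child(p.Old, "LearnRefNumber"), Child(p.New, "LearnRefNumber"));
        return table;
    }
}
```

The resulting DataTable can be added to a DataSet and passed on to whatever produces the spreadsheet.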
Seems to me you're having trouble extracting the values you want from the xml, correct?
As the others have mentioned in the comments, without knowing the layout of your XML it's impossible to give a specific example for your case. If you edit your question to include an example of your XML, we can help more.
Here are some general examples of how to extract values from xml:
private static bool CheckXmlDocument(string xmlPathCheck)
{
// if you have multiple files from which you need to extract values, pass in an array or List<string> and loop over it, fetching the values
// XmlDocument will allow you to edit the document as well as read it
// there's another option to use XPathDocument and XPathNavigator but it's read-only
var doc = new XmlDocument();
// this can throw various exceptions so might want to add some handling
doc.Load(xmlPathCheck);
// getting the elements, you have some options depending on the layout of the document
// if the nodes you want are identified by 'id' use this:
var nameElement = doc.GetElementById("name");
// if the nodes you want are identified by 'tag', use this:
var nameNodeList = doc.GetElementsByTagName("name");
// if you know the xpath to the specific node you want, use this:
var selectNameNode = doc.SelectSingleNode("the/xpath/to/the/node");
// if there are several nodes that have the same xpaths, use this:
var selectNameList = doc.SelectNodes("the/xpath/that/may/match/many/nodes");
// getting the value depends on whether you have an XmlNode, XmlElement or XmlNodeList
// if you have a single XmlElement or XmlNode you can get the value one of these ways depending on the layout of your document:
var name = nameElement.InnerText;
name = nameElement.InnerXml;
// if you have an XmlNodeList, you'll have to iterate through the nodes to find the one you want, like this:
foreach (var node in nameNodeList)
{
// here use some condition that will determine if its the element/node you want or not (depends on your xml layout)
if (node is XmlNode n)
{
name = n.InnerText;
}
}
// do that for all the values you want to compare, then compare them
return CheckValues(/*the values to compare*/);
}
XmlDocument
XmlNode
XmlElement

Parsing RDF - dotnetrdf c#

This is my first time ever looking at RDF, and after trying, I've no idea how to parse it. I'm looking at some RDF (in Turtle format) used in the AFF4 file system, here's a portion of it:
<aff4://0295fab8-94b7-4435-bdb3-932cf48e40bd>
a aff4:ImageStream ;
aff4:chunkSize "32768"^^xsd:int ;
aff4:chunksInSegment "2048"^^xsd:int ;
aff4:compressionMethod <http://code.google.com/p/snappy/> ;
aff4:imageStreamHash "82798a275176aa141a2993ca8931535b1303545a0954473f5c5e55b4d8d5a8e3ebdb9e9323e5ecfaf65f8d379a8e2b9150750f5cf44851cf4edb6a2e05372f42"^^aff4:SHA512 ;
aff4:imageStreamIndexHash "039eb2da046cfb8c8d40e6f9b42aae501fb36f9b09b5f29d660d3637f87c37c98c3ee3b995265adff1d2b971fa795317333bf50200e72fdfe9fa96acdb88b6d0"^^aff4:SHA512 ;
aff4:size "185335808"^^xsd:long ;
aff4:stored : ;
aff4:target <aff4://92015053-5f7b-4e5a-a1e7-901d8943cf1f> ;
aff4:version "1"^^xsd:int .
There's a lot of this stuff in the file, but I've no idea how to access any of it, thus far I've cobbled together:
private static void ParseInformationStream(Stream informationStream)
{
Console.WriteLine("Parsing information.turtle file: ");
informationStream.Position = 0;
TurtleParser turtleParser = new TurtleParser();
Graph graph = new Graph();
turtleParser.Load(graph, new StreamReader(informationStream));
foreach (var triple in graph.Triples)
{
Console.WriteLine(triple.Subject);
}
}
This prints out some of the data, but if for example, I wanted to access the aff4:compressionMethod (node?) specifically, how would I go about doing that? I've been reading about Sparql, but it all seems a bit overkill for what I need.
Any input or advice would be appreciated.
You can use the methods of the IGraph interface to access the contents of the parsed graph. For example the following will retrieve all image streams (in Turtle "a" is a shortcut for the rdf:type predicate) and print out the compression method for each stream:
// Get the node for rdf:type
var rdfType = graph.CreateUriNode(new Uri(RdfSpecsHelper.RdfType));
// Get the node for the aff4:ImageStream type
var imageStream = graph.GetUriNode("aff4:ImageStream");
// Get the node for the aff4:compressionMethod predicate
var compressionMethod = graph.CreateUriNode("aff4:compressionMethod");
// Get the streams (the subject of x a aff4:ImageStream in the Turtle)
var imageStreams = graph.GetTriplesWithPredicateObject(rdfType, imageStream).Select(t => t.Subject);
foreach (var streamInstance in imageStreams)
{
// Get the first compressionMethod value for the stream instance
var compression = graph.GetTriplesWithSubjectPredicate(streamInstance, compressionMethod)
.Select(t => t.Object).FirstOrDefault();
Console.WriteLine("Stream " + streamInstance + " uses compression method " + compression);
}
For more on accessing nodes and triples in a graph in dotNetRDF, please see https://github.com/dotnetrdf/dotnetrdf/wiki/UserGuide-Working-With-Graphs

How to write and read list to text file that contains user input using c#

I am working on a console application and I am trying to save a list to a txt file and read that data later.
In the program, the user inputs the name of a category, and I am not sure how to save that in a list in the txt file.
Struct Category that holds name.
struct Category
{
public string name;
}
This is my code so far.
Category k;
Console.WriteLine("Enter name of category: ");
k.name = Console.ReadLine();
List<String> category = new List<string>();
TextWriter tw = new StreamWriter("../../dat.txt");
foreach (string k in category)
{
string[] en = s.Split(',');
category.Add(k.name); // Here I am not sure how to save name
}
tw.Close();
StreamReader sr = new StreamReader("../../dat.txt");
string data = sr.ReadLine();
while (data != null)
{
Console.WriteLine(data);
data = sr.ReadLine();
}
sr.Close();
It doesn't give me any error but it's not writing name to txt file.
SOLUTION
string filePath = @"../../datoteka.txt";
List<String> kategorije = File.ReadAllLines(filePath).ToList();
foreach (string s in kategorije)
{
Console.WriteLine(s);
}
kategorije.Add(k.naziv);
File.WriteAllLines(filePath,kategorije);
You can use the static methods of the System.IO.File class. Their advantage is that they open and close files automatically, thus reducing the task of writing and reading files to a single statement:
File.WriteAllLines(yourFilePath, category);
You can read the lines back into a list with
category = new List<string>(File.ReadLines(yourFilePath));
File.ReadLines returns an IEnumerable<string> that is accepted as data source in the constructor of the list.
Or read them into an array with
string[] array = File.ReadAllLines(yourFilePath);
Your solution does not write anything to the output stream. You are initializing a TextWriter but not using it. You would use it like
tw.WriteLine(someString);
Your code has some problems: You are declaring a category variable k, but you never assign it a category. Your list is not of type category.
A better solution would work like this
var categories = new List<Category>(); // Create a categories list.
while (true) { // Loop as long as the user enters some category name.
Console.WriteLine("Enter name of category: ");
string s = Console.ReadLine(); // Read user entry.
if (String.IsNullOrWhiteSpace(s)) {
// No more entries - exit loop
break;
}
// Create a new category and assign it the entered name.
var category = new Category { name = s };
//TODO: Prompt the user for more properties of the category and assign them to the
// category.
// Add the category to the list.
categories.Add(category);
}
File.WriteAllLines(yourFilePath, categories.Select(c => c.name));
The Category type should be a class. See: When should I use a struct instead of a class?
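Putting the write and the read from this answer together, a round trip could look like the sketch below. The CategoryStore class and its Save and Load methods are hypothetical names used only for illustration, and it assumes one category name per line with no line breaks inside a name.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public class Category
{
    public string name;
}

public static class CategoryStore
{
    // Writes one category name per line, replacing any existing file.
    public static void Save(string path, IEnumerable<Category> categories)
    {
        File.WriteAllLines(path, categories.Select(c => c.name));
    }

    // Reads the names back; a missing file simply yields an empty list.
    public static List<Category> Load(string path)
    {
        if (!File.Exists(path))
            return new List<Category>();
        return File.ReadLines(path)
                   .Select(line => new Category { name = line })
                   .ToList();
    }
}
```

Calling Load at program start, adding the newly entered categories to the returned list and calling Save at the end keeps the earlier entries, which is what the asker's own solution does with ReadAllLines and WriteAllLines.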
You are not using WriteLine to write any content. In your code, after the line
category.Add(k.name);
add
tw.WriteLine(someString);
https://learn.microsoft.com/en-us/dotnet/csharp/programming-guide/file-system/how-to-write-to-a-text-file
Find below sample code for read and write.
class WriteTextFile
{
static void Main()
{
// Sample data; each array element becomes one line in the file.
string[] lines = { "First line", "Second line", "Third line" };
System.IO.File.WriteAllLines(@"C:\Users\Public\TestFolder\WriteLines.txt", lines);
string text = "A class is the most powerful data type in C#. Like a structure, " +
"a class defines the data and behavior of the data type. ";
System.IO.File.WriteAllText(@"C:\Users\Public\TestFolder\WriteText.txt", text);
// Write only some of the lines, using a StreamWriter.
using (System.IO.StreamWriter file =
new System.IO.StreamWriter(@"C:\Users\Public\TestFolder\WriteLines2.txt"))
{
foreach (string line in lines)
{
if (!line.Contains("Second"))
{
file.WriteLine(line);
}
}
}
// Passing true as the second argument appends to the existing file.
using (System.IO.StreamWriter file =
new System.IO.StreamWriter(@"C:\Users\Public\TestFolder\WriteLines2.txt", true))
{
file.WriteLine("Fourth line");
}
}
}

C# - Problem removing items in List1 & List2 from List3

I have a file that I am reading in, splitting up into different lists and outputting them into RichTextBox to then be read into 3 different Listboxes. I am currently doing all of this, however I have come across something I do not know how to fix/work around.
My code is below and I seem to be having trouble understanding why it fails to properly work when it gets to the Match twoRegex = Regex.Match(...) section of the code.
CODE:
private void SortDataLines()
{
try
{
// Reads the lines in the file to format.
var fileReader = File.OpenText(openGCFile.FileName);
// Creates a list for the lines to be stored in.
var placementUserDefinedList = new List<string>();
// Reads the first line and does nothing with it.
fileReader.ReadLine();
// Adds each line in the file to the list.
while (true)
{
var line = fileReader.ReadLine();
if (line == null)
break;
placementUserDefinedList.Add(line);
}
// Creates new lists to hold certain matches for each list.
var oneResult = new List<string>();
var twoResult = new List<string>();
var mainResult = new List<string>();
foreach (var userLine in placementUserDefinedList)
mainResult.Add(string.Join(" ", userLine));
foreach (var oneLine in mainResult)
{
// PLACEMENT ONE Regex
Match oneRegex = Regex.Match(oneLine, @"^.+(RES|0402|0201|0603|0805|1206|1306|1608|3216|2551"
+ @"|1913|1313|2513|5125|2525|5619|3813|1508|6431|2512|1505|2208|1005|1010|2010|0505|0705"
+ @"|1020|1812|2225|5764|4532|1210|0816|0363|SOT)");
if (oneRegex.Success)
oneResult.Add(string.Join(" ", oneLine));
}
//
// THIS IS THE SECTION THAT FAILS..
//
foreach(var twoLine in mainResult)
{
//PLACEMENT TWO Regex
Match twoRegex = Regex.Match(twoLine, @"^.+(BGA|SOP8|QSOP|TQSOP|SOIC16|SOIC12|SOIC8|SO8|SO08"
+ @"CQFP|LCC|LGA|OSCCC|PLCC|QFN|QFP|SOJ|SON");
if (twoRegex.Success)
twoResult.Add(string.Join(" ", twoLine));
}
// Removes the matched values from both of the Regex used above.
List<string> userResult = mainResult.Except(oneResult).ToList();
userResult = userResult.Except(twoResult).ToList();
// Prints the proper values into the assigned RichTextBoxes.
foreach (var line in userResult)
userDefinedRichTextBox.AppendText(line + "\n");
foreach (var line in oneResult)
placementOneRichTextBox.AppendText(line + "\n");
foreach (var line in twoResult)
placementTwoRichTextBox.AppendText(line + "\n");
}
// Catches an exception if the file was not opened.
catch (Exception)
{
MessageBox.Show("Could not match any regex values.", "Regular Expression Match Error",
MessageBoxButtons.OK, MessageBoxIcon.Warning);
}
}
QUESTIONS:
Does anyone understand why the second set of REGEX fails to find anything?
With that, is there a way to fix it?
Suggestions please! :)
Haven't you missed the pipe character between the two lines of your second regex? The closing parenthesis of the group is missing as well:
Match twoRegex = Regex.Match(twoLine, @"^.+(BGA|SOP8|QSOP|TQSOP|SOIC16|SOIC12|SOIC8|SO8|SO08"
+ @"|CQFP|LCC|LGA|OSCCC|PLCC|QFN|QFP|SOJ|SON)");
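One way to make this kind of slip harder is to keep the alternatives in an array and build the pattern with string.Join, so that separators and parentheses are added in exactly one place. This is only a sketch: the PackageMatcher class and its Build method are hypothetical names, and the token list is copied from the second regex in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class PackageMatcher
{
    // Tokens from the "PLACEMENT TWO" regex in the question.
    public static readonly string[] PlacementTwo =
    {
        "BGA", "SOP8", "QSOP", "TQSOP", "SOIC16", "SOIC12", "SOIC8", "SO8", "SO08",
        "CQFP", "LCC", "LGA", "OSCCC", "PLCC", "QFN", "QFP", "SOJ", "SON"
    };

    // Builds ^.+(A|B|C) from the tokens; each token is escaped so that it is
    // matched literally even if it contains regex metacharacters.
    public static Regex Build(IEnumerable<string> tokens)
    {
        string alternatives = string.Join("|", tokens.Select(t => Regex.Escape(t)));
        return new Regex("^.+(" + alternatives + ")");
    }
}
```

With this, the loop body becomes a call such as PackageMatcher.Build(PackageMatcher.PlacementTwo).IsMatch(twoLine), ideally with the Regex built once outside the loop.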

Reading a CSV file in .NET?

How do I read a CSV file using C#?
One option that avoids third-party components is the class Microsoft.VisualBasic.FileIO.TextFieldParser (http://msdn.microsoft.com/en-us/library/microsoft.visualbasic.fileio.textfieldparser.aspx). It provides all the functions for parsing CSV. It is sufficient to add a reference to the Microsoft.VisualBasic assembly.
var parser = new Microsoft.VisualBasic.FileIO.TextFieldParser(file);
parser.TextFieldType = Microsoft.VisualBasic.FileIO.FieldType.Delimited;
parser.SetDelimiters(new string[] { ";" });
while (!parser.EndOfData)
{
string[] row = parser.ReadFields();
/* do something */
}
You can use the Microsoft.VisualBasic.FileIO.TextFieldParser class in C#:
using System;
using System.Data;
using Microsoft.VisualBasic.FileIO;
static void Main()
{
string csv_file_path = @"C:\Users\Administrator\Desktop\test.csv";
DataTable csvData = GetDataTableFromCSVFile(csv_file_path);
Console.WriteLine("Rows count:" + csvData.Rows.Count);
Console.ReadLine();
}
private static DataTable GetDataTableFromCSVFile(string csv_file_path)
{
DataTable csvData = new DataTable();
try
{
using(TextFieldParser csvReader = new TextFieldParser(csv_file_path))
{
csvReader.SetDelimiters(new string[] { "," });
csvReader.HasFieldsEnclosedInQuotes = true;
string[] colFields = csvReader.ReadFields();
foreach (string column in colFields)
{
DataColumn datacolumn = new DataColumn(column);
datacolumn.AllowDBNull = true;
csvData.Columns.Add(datacolumn);
}
while (!csvReader.EndOfData)
{
string[] fieldData = csvReader.ReadFields();
//Making empty value as null
for (int i = 0; i < fieldData.Length; i++)
{
if (fieldData[i] == "")
{
fieldData[i] = null;
}
}
csvData.Rows.Add(fieldData);
}
}
}
catch (Exception ex)
{
}
return csvData;
}
You could try CsvHelper, which is a project I work on. Its goal is to make reading and writing CSV files as easy as possible, while being very fast.
Here are a few ways you can read from a CSV file.
// By type
var records = csv.GetRecords<MyClass>();
var records = csv.GetRecords( typeof( MyClass ) );
// Dynamic
var records = csv.GetRecords<dynamic>();
// Using anonymous type for the class definition
var anonymousTypeDefinition = new
{
Id = default( int ),
Name = string.Empty,
MyClass = new MyClass()
};
var records = csv.GetRecords( anonymousTypeDefinition );
I usually use a simplistic approach like this one:
var path = Server.MapPath("~/App_Data/Data.csv");
var csvRows = System.IO.File.ReadAllLines(path, Encoding.Default).ToList();
foreach (var row in csvRows.Skip(1))
{
var columns = row.Split(';');
var field1 = columns[0];
var field2 = columns[1];
var field3 = columns[2];
}
I just used this library in my application. http://www.codeproject.com/KB/database/CsvReader.aspx. Everything went smoothly using this library, so I'm recommending it. It is free under the MIT License, so just include the notice with your source files.
I didn't display the CSV in a browser, but the author has some samples for Repeaters or DataGrids. I did run one of his test projects to test a Sort operation I have added and it looked pretty good.
You can try Cinchoo ETL - an open source lib for reading and writing CSV files.
Couple of ways you can read CSV files
Id, Name
1, Tom
2, Mark
This is how you can use this library to read it
using (var reader = new ChoCSVReader("emp.csv").WithFirstLineHeader())
{
foreach (dynamic item in reader)
{
Console.WriteLine(item.Id);
Console.WriteLine(item.Name);
}
}
If you have POCO object defined to match up with CSV file like below
public class Employee
{
public int Id { get; set; }
public string Name { get; set; }
}
You can parse the same file using this POCO class as below
using (var reader = new ChoCSVReader<Employee>("emp.csv").WithFirstLineHeader())
{
foreach (var item in reader)
{
Console.WriteLine(item.Id);
Console.WriteLine(item.Name);
}
}
Please check out articles at CodeProject on how to use it.
Disclaimer: I'm the author of this library
I recommend Angara.Table, about save/load: http://predictionmachines.github.io/Angara.Table/saveload.html.
It infers column types, can save CSV files, and is much faster than TextFieldParser. It follows RFC4180 for CSV format and supports multiline strings, NaNs, and escaped strings containing the delimiter character.
The library is under MIT license. Source code is https://github.com/Microsoft/Angara.Table.
Though its API is focused on F#, it can be used from any .NET language, just not as succinctly as in F#.
Example:
using Angara.Data;
using System.Collections.Immutable;
...
var table = Table.Load("data.csv");
// Print schema:
foreach(Column c in table)
{
string colType;
if (c.Rows.IsRealColumn) colType = "double";
else if (c.Rows.IsStringColumn) colType = "string";
else if (c.Rows.IsDateColumn) colType = "date";
else if (c.Rows.IsIntColumn) colType = "int";
else colType = "bool";
Console.WriteLine("{0} of type {1}", c.Name, colType);
}
// Get column data:
ImmutableArray<double> a = table["a"].Rows.AsReal;
ImmutableArray<string> b = table["b"].Rows.AsString;
Table.Save(table, "data2.csv");
You might be interested in the Linq2Csv library at CodeProject. One thing you would need to check is whether it reads the data only when needed, so that it does not use a lot of memory when working with bigger files.
As for displaying the data in the browser, there are many ways to do it. If you are more specific about your requirements, the answer can be more specific, but here are some things you could do:
1. Use HttpListener class to write simple web server (you can find many samples on net to host mini-http server).
2. Use Asp.Net or Asp.Net Mvc, create a page, host it using IIS.
Seems like there are quite a few projects on CodeProject or CodePlex for CSV Parsing.
Here is another CSV Parser on CodePlex
http://commonlibrarynet.codeplex.com/
This library has components for CSV parsing, INI file parsing, Command-Line parsing as well. It's working well for me so far. Only thing is it doesn't have a CSV Writer.
This is just for parsing the CSV. For displaying it in a web page, it is simply a matter of taking the list and rendering it however you want.
Note: This code example does not handle the situation where the input string line contains newlines.
public List<string> SplitCSV(string line)
{
if (string.IsNullOrEmpty(line))
throw new ArgumentException();
List<string> result = new List<string>();
int index = 0;
int start = 0;
bool inQuote = false;
StringBuilder val = new StringBuilder();
// parse line
foreach (char c in line)
{
switch (c)
{
case '"':
inQuote = !inQuote;
break;
case ',':
if (!inQuote)
{
result.Add(line.Substring(start, index - start)
.Replace("\"",""));
start = index + 1;
}
break;
}
index++;
}
if (start < index)
{
result.Add(line.Substring(start, index - start).Replace("\"",""));
}
return result;
}
}
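The splitter above strips every quote character and drops a trailing empty field. A variant that treats a doubled quote ("") inside a quoted field as a literal quote and keeps trailing empty fields could look like the sketch below. The CsvLine class name is made up for illustration, and like the code above it assumes the input is a single line with no embedded newlines.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class CsvLine
{
    public static List<string> Split(string line)
    {
        if (line == null)
            throw new ArgumentNullException("line");

        var result = new List<string>();
        var field = new StringBuilder();
        bool inQuote = false;

        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (inQuote)
            {
                if (c != '"')
                {
                    field.Append(c);            // ordinary character inside quotes
                }
                else if (i + 1 < line.Length && line[i + 1] == '"')
                {
                    field.Append('"');          // "" inside quotes is a literal quote
                    i++;
                }
                else
                {
                    inQuote = false;            // closing quote
                }
            }
            else if (c == '"')
            {
                inQuote = true;                 // opening quote
            }
            else if (c == ',')
            {
                result.Add(field.ToString());   // field separator outside quotes
                field.Clear();
            }
            else
            {
                field.Append(c);
            }
        }

        // The last field has no trailing comma, so add it after the loop.
        result.Add(field.ToString());
        return result;
    }
}
```

For anything beyond simple files, one of the libraries mentioned in the other answers is likely the safer choice.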
I have been maintaining an open source project called FlatFiles for several years now. It's available for .NET Core and .NET 4.5.1.
Unlike most of the alternatives, it allows you to define a schema (similar to the way EF code-first works) with an extreme level of precision, so you aren't fighting conversion issues all the time. You can map directly to your data classes, and there is also support for interfacing with older ADO.NET classes.
Performance-wise, it's been tuned to be one of the fastest parsers for .NET, with a plethora of options for quirky format differences. There's also support for fixed-length files, if you need it.
You can use this library: Sky.Data.Csv
https://www.nuget.org/packages/Sky.Data.Csv/
It is a really fast CSV reader library and it's really easy to use:
using Sky.Data.Csv;
var readerSettings = new CsvReaderSettings{Encoding = Encoding.UTF8};
using(var reader = CsvReader.Create("path-to-file", readerSettings)){
foreach(var row in reader){
//do something with the data
}
}
It also supports reading typed objects with the CsvReader<T> class, which has the same interface.
