Parse CSV File section wise - c#

I am new to C# and need help writing a parser for the CSV file below:
[INFO]
LINE_NAME,MACHINE_SN,MACHINE_NAME,OPERATOR_ID
LineName,ParmiMachineSN,PARMI_AOI_1,engineer
[INFO_END]
[PANEL_INSP_RESULT]
MODEL_NAME,MODEL_CODE,PANEL_SIDE,INDEX,BARCODE,DATE,START_TIME,END_TIME,DEFECT_NAME,DEFECT_CODE,RESULT
E11-03356-0388-A-TOP CNG,,BOTTOM,47,MLT0388A03358CSNSOF1232210200052-0001,20201023,12:46:57,12:47:04,,,OK
[PANEL_INSP_RESULT_END]
[BOARD_INSP_RESULT]
BOARD_NO,BARCODE,DEFECT_NAME,DEFECT_CODE,BADMARK,RESULT
1,MLT0388A03358CSNSOF1232210200052-0001,,,NO,OK
2,MLT0388A03358CSNSOF1232210200052-0004,,,NO,OK
3,MLT0388A03358CSNSOF1232210200052-0003,,,NO,OK
4,MLT0388A03358CSNSOF1232210200052-0002,,,NO,OK
[BOARD_INSP_RESULT_END]
[COMPONENT_INSP_RESULT]
BOARD_NO,LOCATION_NAME,PIN_NUMBER,POS_X,POS_Y,DEFECT_NAME,DEFECT_CODE,RESULT
[COMPONENT_INSP_RESULT_END]
I need to parse the above file

To parse the above CSV file in C#, you can use the following steps:
Read the entire file into a string using the File.ReadAllText method.
string fileText = File.ReadAllText("file.csv");
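Split the file into individual sections on the section tags ("[INFO]", "[INFO_END]", and so on), and then use a loop to process each section. Note that the text between an end tag and the next start tag is only a line break, so some entries will contain nothing but whitespace and can be skipped.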
string[] sections = fileText.Split(new string[] { "[INFO]", "[INFO_END]", "[PANEL_INSP_RESULT]", "[PANEL_INSP_RESULT_END]", "[BOARD_INSP_RESULT]", "[BOARD_INSP_RESULT_END]", "[COMPONENT_INSP_RESULT]", "[COMPONENT_INSP_RESULT_END]" }, StringSplitOptions.RemoveEmptyEntries);
foreach (string section in sections)
{
//Process each section
}
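Within the loop, use the String.Split method to split each section into rows on line breaks. Splitting on both "\r\n" and "\n" avoids leaving a trailing carriage return on each row when the file uses Windows line endings.
string[] rows = section.Split(new[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries);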
Use the String.Split method again to split each row into cells by looking for the comma.
foreach (string row in rows)
{
string[] cells = row.Split(',');
//Process each cell
}
Now you can process each cell as you need. Because splitting on the tags discards the tag names, the section a row belongs to has to be inferred, for example from the header row (the first row of each section); the cells can then be processed according to their type and position in the row.
You can use a switch statement on the current section and apply the appropriate parsing logic for each one.
Please be aware that this is a simplified example. You will likely need to add error handling and validation for edge cases such as empty rows and empty cells, depending on your use case.
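One way to keep track of which section each row belongs to is to walk the file line by line instead of splitting on the tags. The following is only a minimal sketch under stated assumptions: the names SectionedCsv and Parse are hypothetical, every section is assumed to be delimited by [NAME] and [NAME_END] lines as in the sample, and rows are split on commas without any handling of quoted fields.

```csharp
using System;
using System.Collections.Generic;

public static class SectionedCsv
{
    // Hypothetical helper: maps each section name to its rows (header row first).
    public static Dictionary<string, List<string[]>> Parse(IEnumerable<string> lines)
    {
        var result = new Dictionary<string, List<string[]>>();
        string current = null; // section being read, or null when outside any section

        foreach (var raw in lines)
        {
            var line = raw.Trim();
            if (line.Length == 0) continue; // skip blank lines

            if (line.StartsWith("[", StringComparison.Ordinal) &&
                line.EndsWith("]", StringComparison.Ordinal))
            {
                var tag = line.Substring(1, line.Length - 2);
                if (tag.EndsWith("_END", StringComparison.Ordinal))
                {
                    current = null; // closing tag: leave the section
                }
                else
                {
                    current = tag; // opening tag: start collecting rows
                    result[current] = new List<string[]>();
                }
                continue;
            }

            // Data or header row: keep it only if we are inside a section
            if (current != null)
            {
                result[current].Add(line.Split(','));
            }
        }
        return result;
    }
}
```

It could be called as SectionedCsv.Parse(File.ReadLines("file.csv")), after which result["BOARD_INSP_RESULT"][0] would hold that section's header cells. A section with only a header, such as COMPONENT_INSP_RESULT in the sample, ends up with a single row.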

The following reads all lines, projects each one into an anonymous object holding the line and its index, and then loops through a list of section names. For each section it looks up the start and end tags and, in this case, writes the lines between them to the console window.
internal partial class Program
{
static void Main(string[] args)
{
var items = File.ReadAllLines("YourFileNameGoesHere")
.Select((line, index) => new { Line = line, Index = index })
.ToList();
List<string> sections = new List<string>()
{
"INFO",
"PANEL_INSP_RESULT",
"BOARD_INSP_RESULT",
"COMPONENT_INSP_RESULT"
};
foreach (var section in sections)
{
Console.WriteLine($"{section}");
var startItem = items.FirstOrDefault(x => x.Line == $"[{section}]");
var endItem = items.FirstOrDefault(x => x.Line == $"[{section}_END]");
if (startItem is not null && endItem is not null)
{
bool header = false;
for (int index = startItem.Index + 1; index < endItem.Index; index++)
{
if (header == false)
{
Console.WriteLine($"\t{items[index].Line}");
header = true;
}
else
{
Console.WriteLine($"\t\t{items[index].Line}");
}
}
}
else
{
Console.WriteLine("\tFailed to read this section");
}
}
}
}

Related

Is it possible to find the common part in a list of strings?

I am trying to find the common part of the strings in a list. Take this sample data set:
private readonly List<string> Xpath = new List<string>()
{
"BODY>MAIN:nth-of-type(1)>DIV>SECTION>DIV>SECTION>DIV>DIV:nth-of-type(1)>DIV>DIV:nth-of-type(3)>DIV>ARTICLE>DIV>DIV>DIV>SECTION:nth-of-type(1)>H2:nth-of-type(1)",
"BODY>MAIN:nth-of-type(1)>DIV>SECTION>DIV>SECTION>DIV>DIV:nth-of-type(1)>DIV>DIV:nth-of-type(3)>DIV>ARTICLE>DIV>DIV>DIV>SECTION:nth-of-type(2)>H2:nth-of-type(1)",
"BODY>MAIN:nth-of-type(1)>DIV>SECTION>DIV>SECTION>DIV>DIV:nth-of-type(1)>DIV>DIV:nth-of-type(3)>DIV>ARTICLE>DIV>DIV>DIV>SECTION:nth-of-type(3)>H2:nth-of-type(1)",
"BODY>MAIN:nth-of-type(1)>DIV>SECTION>DIV>SECTION>DIV>DIV:nth-of-type(1)>DIV>DIV:nth-of-type(3)>DIV>ARTICLE>DIV>DIV>DIV>SECTION:nth-of-type(4)>H2:nth-of-type(1)",
"BODY>MAIN:nth-of-type(1)>DIV>SECTION>DIV>SECTION>DIV>DIV:nth-of-type(1)>DIV>DIV:nth-of-type(3)>DIV>ARTICLE>DIV>DIV>DIV>SECTION:nth-of-type(5)>H2:nth-of-type(1)",
"BODY>MAIN:nth-of-type(1)>DIV>SECTION>DIV>SECTION>DIV>DIV:nth-of-type(1)>DIV>DIV:nth-of-type(3)>DIV>ARTICLE>DIV>DIV>DIV>SECTION:nth-of-type(6)>H2:nth-of-type(1)",
"BODY>MAIN:nth-of-type(1)>DIV>SECTION>DIV>SECTION>DIV>DIV:nth-of-type(1)>DIV>DIV:nth-of-type(3)>DIV>ARTICLE>DIV>DIV>DIV>SECTION:nth-of-type(7)>H2:nth-of-type(1)",
"BODY>MAIN:nth-of-type(1)>DIV>SECTION>DIV>SECTION>DIV>DIV:nth-of-type(1)>DIV>DIV:nth-of-type(3)>DIV>ARTICLE>DIV>DIV>DIV>SECTION:nth-of-type(8)>H2:nth-of-type(1)",
"BODY>MAIN:nth-of-type(1)>DIV>SECTION>DIV>SECTION>DIV>DIV:nth-of-type(1)>DIV>DIV:nth-of-type(3)>DIV>ARTICLE>DIV>DIV>DIV>SECTION:nth-of-type(9)>H2:nth-of-type(1)"
};
From this, I want to find out up to which child element these entries are identical. The data is a list of XPaths.
Programmatically I should be able to tell
Expected output:
BODY>MAIN:nth-of-type(1)>DIV>SECTION>DIV>SECTION>DIV>DIV:nth-of-type(1)>DIV>DIV:nth-of-type(3)>DIV>ARTICLE>DIV>DIV>DIV
To get this, I split each item on > and created a list of segments for each entry.
Then I used the following to find the items common to all of them:
private IEnumerable<T> GetCommonItems<T>(IEnumerable<T>[] lists)
{
HashSet<T> hs = new HashSet<T>(lists.First());
for (int i = 1; i < lists.Length; i++)
{
hs.IntersectWith(lists[i]);
}
return hs;
}
This finds the common values, from which I rebuild the path. The problem is that if a segment such as DIV occurs in several places, and in every original entry, this method still keeps only one DIV.
As a result I get something like this:
BODY>MAIN:nth-of-type(1)>DIV>SECTION
But I need this
BODY>MAIN:nth-of-type(1)>DIV>SECTION>DIV>SECTION>DIV>DIV:nth-of-type(1)>DIV>DIV:nth-of-type(3)>DIV>ARTICLE>DIV>DIV>DIV
Disclaimer: This is not the most performant solution but it works :)
Let's start with splitting the first path by > character
Do the same with all the paths
char separator = '>';
IEnumerable<string> firstPathChunks = Xpath[0].Split(separator);
var chunks = Xpath.Select(path => path.Split(separator).ToList()).ToArray();
Iterate through the firstPathChunks
For each of them, iterate through the chunks
If a list's first element matches, remove that first element; otherwise stop
If every list's first element matched and was removed, append the matching chunk to sb
void Process(StringBuilder sb)
{
foreach (var pathChunk in firstPathChunks)
{
foreach (var chunk in chunks)
{
if (chunk.Count == 0 || chunk[0] != pathChunk) // also stop when a shorter path runs out
{
return;
}
chunk.RemoveAt(0);
}
sb.Append(pathChunk);
sb.Append(separator);
}
}
Sample usage
var sb = new StringBuilder();
Process(sb);
Console.WriteLine(sb.ToString());
Output
BODY>MAIN:nth-of-type(1)>DIV>SECTION>DIV>SECTION>DIV>DIV:nth-of-type(1)>DIV>DIV:nth-of-type(3)>DIV>ARTICLE>DIV>DIV>DIV>
Parsing the string by the separator > is a good idea. Instead of then creating a list of unique items, you should create a list of all items contained in the string, which would result in
{
"BODY",
"MAIN:nth-of-type(1)",
"DIV",
"SECTION",
"DIV",
...
}
for the first entry of your XPath list.
This way you create a List<List<string>> containing every element of each entry of your XPath list. You can then compare the first elements of all inner lists. If they are equal, save that element's value to your output and proceed with the second elements, and so on, until you find a position where the elements are not equal across all inner lists.
Edit:
After splitting your list entries on the > separator, this could look something like this:
List<List<string>> XPathElementsLists;
List<string> resultElements = new List<string>();
string result;
// ParseElementsFormXPath is a helper that is not shown here: it splits each entry on '>'
XPathElementsLists = ParseElementsFormXPath(XPath);
for (int i = 0; i < XPathElementsLists[0].Count; i++)
{
    bool isEqual = true;
    string compareElement = XPathElementsLists[0][i];
    foreach (List<string> elements in XPathElementsLists)
    {
        // Compare the element at position i, not the whole list;
        // a shorter path also ends the common prefix
        if (elements.Count <= i || !String.Equals(compareElement, elements[i]))
        {
            isEqual = false;
            break;
        }
    }
    if (!isEqual)
    {
        break;
    }
    resultElements.Add(compareElement);
}
result = String.Join(">", resultElements.ToArray());
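The same position-by-position comparison can also be written more compactly with LINQ. This is only a sketch of that idea, not a replacement for either answer: PathPrefix and CommonPrefix are hypothetical names, and the separator is assumed to never occur inside a segment.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PathPrefix
{
    // Returns the leading segments shared by every path, joined by the separator.
    public static string CommonPrefix(IEnumerable<string> paths, char separator = '>')
    {
        var split = paths.Select(p => p.Split(separator)).ToList();
        if (split.Count == 0) return string.Empty;

        // The common prefix cannot be longer than the shortest path
        int shortest = split.Min(s => s.Length);
        int common = 0;
        while (common < shortest && split.All(s => s[common] == split[0][common]))
        {
            common++;
        }
        return string.Join(separator.ToString(), split[0].Take(common));
    }
}
```

Unlike the HashSet intersection in the question, this keeps repeated segments such as DIV because it compares by position, and it produces no trailing separator.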

How to get all rows for specific column from .csv file

In my project, I have a .csv file with many columns.
I need to extract all rows of only the first column. I've managed to read all lines, but got stuck on how to write the values of the first column to another .csv file.
string filePath = @"C:\Users\BP185150\Desktop\OTC.csv";
string[] OTC_Output = File.ReadAllLines(@"C:\Users\BP185150\Desktop\OTC.csv");
foreach (string line in OTC_Output)
{
Console.WriteLine(line);
Console.Read();
}
Console.ReadLine();
Depending on which separator your CSV uses, you can use the string.Split() method.
e.g.
string firstItem = line.Split(',')[0];
Console.WriteLine(firstItem);
Adding them to a collection:
ICollection<string> firstItems = new List<string>();
string[] OTC_Output = File.ReadAllLines(@"C:\Users\BP185150\Desktop\OTC.csv");
foreach (string line in OTC_Output)
{
firstItems.Add(line.Split(',')[0]);
}
If you want to use File.ReadAllLines, the best way to get the first column is to split each line on the delimiter your CSV uses and add the first item of every line to a collection.
var column = OTC_Output.Select(line => line.Split(';').First()).ToList();
In lineItems, you'll have all the columns of a line after the split:
var lineItems = line.Split(';');
Then take only the first of them:
string firstItem = lineItems[0];
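None of the snippets above writes the result to another .csv file, which is what the question asks for. A minimal sketch of that last step might look like the following; ColumnExtractor and its method names are hypothetical, the delimiter is assumed to be a single character, and quoted fields that contain the delimiter are not handled.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class ColumnExtractor
{
    // Returns the first field of every non-blank line.
    public static IEnumerable<string> FirstColumn(IEnumerable<string> lines, char delimiter = ',')
    {
        return lines
            .Where(line => !string.IsNullOrWhiteSpace(line))
            .Select(line => line.Split(delimiter)[0]);
    }

    // Reads the source lazily and writes one first-column value per line to the target.
    public static void WriteFirstColumn(string sourcePath, string targetPath, char delimiter = ',')
    {
        File.WriteAllLines(targetPath, FirstColumn(File.ReadLines(sourcePath), delimiter));
    }
}
```

The source and target paths should differ, because File.ReadLines keeps the source open while File.WriteAllLines is writing.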

How to collect a list of specific substrings following on a different substring trigger?

I have a large text dataset (~2 GB) of engineering information that was written in Cobol. I am attempting to extract certain substrings within it and make a CSV list with the extracted data.
The substrings of interest occur at known locations within each record. However, there are no unique identifiers (primary keys) within the data itself. It is simply a list of data where each "record" begins with a line starting in "01". Every subsequent line belongs to that same record, until the next "01". The presence of a given line might vary, but if present, data occurs at specific intervals.
The data looks like this:
Line1: 01253820RELEVANTSUBSTRING39ALSORELEVANT0990
Line2: 02999IRRELEVANT
Line3: 0420180101RELEVANTMONTHLYDATA000MORERELEVANTDATA8980
Line4: 0420190101FURTHERRELEVANTMONTHLYDATA
Line5: 12000003848982IRRELEVANT
Line6: 0100NEWRECORD8932000
Line7: 0420100101MORE
I have been able to successfully extract relevant substrings occurring after each "01" using the following code (partially included below):
static void PopulateList()
{
    using (StreamReader sr = new StreamReader(sourcePath))
    {
        string ctrl; // control key - indicates a new record if "01"
        List<TurbineModel> turbines = new List<TurbineModel>();
        List<string> lines = File.ReadAllLines(sourcePath).ToList();
        foreach (string line in lines)
        {
            if (line.Substring(0, 2) == "01")
            {
                ctrl = line.Substring(0, 2);
                TurbineModel newTurbine = new TurbineModel();
                newTurbine.Ctrl = ctrl;
                turbines.Add(newTurbine);
            }
        }
    }
}
This code is working fine. However, there are lines further down that begin with "04" which have other information that I have not been able to extract and group with the current "01" list. I can extract substrings from every line that begins with an "04", but I don't have any way to link each record's data to the "01" record that preceded it.
What I need the code to do is the following:
1) Arrive at an "01" in the data and set up a new record
2) Extract relevant info from "01" line (per code above)
3) Skip subsequent lines unless it reaches an "04"
4) If it reaches an "04", extract substrings from that line and group those extracted substrings with the "01" substrings
5) Continue scanning lines until it reaches a new "01", at which point it sets up a new record and starts again
6) Output everything to CSV
I have been unable to group the information together so that I know which "04" relates to which "01".
Any help you can provide is greatly appreciated. Let me know if I can clarify.
It seems to me that all you have to do is create a class that can store the data from the 01 line, and which can hold the relevant parts of the following lines.
Here's an example, where we loop through each line in the file, and if the line starts with "01", we create a new Item and add the line as its Data (you could do some processing of the line contents instead to populate other properties). If the line doesn't start with "01" and we've already created an Item, then we add the line to the item's AssociatedData property if it starts with "04" (you could also process the line in some way and add the relevant parts to the Item instead).
At the end, we have a list of Item objects that were each created from a line that begins with "01" and which contain the "04" lines that follow it, up to the next line that starts with "01".
First, the Item class:
public class Item
{
public string Data { get; set; }
public List<string> AssociatedData { get; set; } = new List<string>();
// This returns a comma-separated line representing this item
public string GetCsvString()
{
return $"{Data},{string.Join(",", AssociatedData)}";
}
}
And then the code that creates a list of these based on the file data:
public static List<Item> GetItems(string filePath)
{
var items = new List<Item>();
Item current = null;
foreach (var line in File.ReadAllLines(filePath))
{
if (line.StartsWith("01"))
{
// If there's already a current item, add it to our list
if (current != null) items.Add(current);
// Here we would parse the '01' line and set properties of the current item
current = new Item {Data = line};
}
else if (line.StartsWith("04"))
{
// Here we would parse the '04' line and set properties of the current item
current?.AssociatedData.Add(line);
}
}
// Add the final item to our list
if (current != null) items.Add(current);
return items;
}
And then the code that calls the method above would simply look like:
var items = GetItems(@"f:\public\temp\temp.txt");
Extracting an item to a CSV file would probably best be done by either overriding the ToString() method on the Item class or providing a GetCsvString() method that spits out the relevant data in the correct format. After which, you could write the items to a csv file like:
File.WriteAllLines(@"f:\public\temp\temp.csv", items.Select(item => item.GetCsvString()));
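Since the question mentions a file of roughly 2 GB, it may be worth avoiding both File.ReadAllLines and an in-memory list of every record. The following sketch shows one way to stream the same grouping; RecordStreamer and ToCsvLines are hypothetical names, and the whole "01" and "04" lines are emitted as placeholders for the substrings that would really be extracted.

```csharp
using System;
using System.Collections.Generic;

public static class RecordStreamer
{
    // Yields one CSV line per record: the "01" line followed by its "04" lines.
    public static IEnumerable<string> ToCsvLines(IEnumerable<string> lines)
    {
        List<string> current = null; // fields of the record being built

        foreach (var line in lines)
        {
            if (line.StartsWith("01", StringComparison.Ordinal))
            {
                // A new record starts: emit the previous one, if any
                if (current != null) yield return string.Join(",", current);
                current = new List<string> { line };
            }
            else if (current != null && line.StartsWith("04", StringComparison.Ordinal))
            {
                current.Add(line);
            }
            // All other lines are skipped
        }

        // Emit the final record
        if (current != null) yield return string.Join(",", current);
    }
}
```

Used as File.WriteAllLines(targetPath, RecordStreamer.ToCsvLines(File.ReadLines(sourcePath))), both the reading and the writing are lazy, so only one record is held in memory at a time.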
Give this a go, it's a "chunk reader" :) I have used something similar in the past. It may need some work, but it parses your sample into 2 "chunks".
namespace Solution
{
class Solution
{
static void Main(string[] args)
{
var reader = new ChunkReader();
foreach (Chunk c in reader.Read(@"D:\test.txt"))
{
Console.WriteLine(c.Header);
}
Console.ReadKey();
}
}
internal class ChunkReader
{
public IEnumerable<Chunk> Read(string filePath)
{
Chunk currentChunk = null;
using (StreamReader reader = new StreamReader(File.OpenRead(filePath)))
{
string currentLine;
while ((currentLine = reader.ReadLine()) != null)
{
if (currentLine.StartsWith("01"))
{
if (currentChunk != null)
{
yield return currentChunk;
}
currentChunk = new Chunk();
currentChunk.Contents.Add(currentLine);
}
else
{
currentChunk?.Contents.Add(currentLine);
}
}
}
// Return the last chunk, unless the file contained no "01" line at all
if (currentChunk != null) yield return currentChunk;
}
}
internal class Chunk
{
public Chunk()
{
// A List keeps the lines in file order and keeps duplicates; a SortedSet would do neither
Contents = new List<string>();
}
public List<string> Contents { get; }
public string Header
{
get
{
return Contents.FirstOrDefault(s => s.StartsWith("01"));
}
}
}
}
First of all, as some others have suggested, if your file is really large you should consider an alternative to File.ReadAllLines() as it can get costly. But since the question is not about that, I'm moving past that.
First, two dummy functions to mimic extracting your necessary data once you know if a line begins with either 01 or 04.
static string Extract01Data(string line)
{
return line;
}
static string Extract04Data(string line)
{
return line;
}
EDIT
Edited the answer to accommodate multiple lines that begin with 04 that come after the first 01 line:
And a simple class to hold your resulting data:
public class Record
{
public string OneInfo { get; set; }
public List<string> FourInfo { get; set; } = new List<string>();
}
Then, here's my code, with explanations in comments:
static void Main()
{
var file = @"C:\Users\gurudeniyas\Desktop\CobolData.txt";
var lines = File.ReadAllLines(file).ToList();
var records = new List<Record>();
for (var count = 0; count < lines.Count; count++)
{
var line = lines[count];
var firstTwo = line.Substring(0, 2);
// Iterate till we find a line that starts with 01
if (firstTwo == "01")
{
// Create a Record and add 01 line related data
var rec = new Record
{
OneInfo = Extract01Data(line)
};
// Here we iterate over the following lines to find those that start with 04
// If we find them, extract 04 data and add as a record
// Break out of the loop if we find the next 01 line or EOF
do
{
count++;
if (count == lines.Count)
break;
line = lines[count];
firstTwo = line.Substring(0, 2);
if (firstTwo == "04")
{
rec.FourInfo.Add(Extract04Data(line));
}
} while (firstTwo != "01");
// If we found next 01, backtrack count by 1 so in the outer loop we can process that record again
if (firstTwo == "01")
{
count--;
}
records.Add(rec);
}
}
Console.ReadLine();
}
If the "04" lines always follow an "01" line, you can just add an else if as below and then access the last item in your list (this works because adding an item to a list appends it to the end).
foreach (string line in lines)
{
if (line.Substring(0, 2) == "01")
{
ctrl = line.Substring(0, 2);
TurbineModel newTurbine = new TurbineModel();
newTurbine.Ctrl = ctrl;
turbines.Add(newTurbine);
}
else if (line.Substring(0, 2) == "04")
{
var lastTurbine = turbines[turbines.Count - 1];
//do what you need to do with the "04" record monthly data here
}
}
Have you looked at using a finite state machine algorithm? Seems ideal for this.
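As a rough illustration of the state machine idea, the sketch below uses two states and treats the first two characters of each line as the input symbol. All names here (TurbineRecord, RecordStateMachine, the state names) are hypothetical, and the whole lines are stored where real code would extract substrings.

```csharp
using System;
using System.Collections.Generic;

public sealed class TurbineRecord // hypothetical record type
{
    public string Header { get; set; }
    public List<string> Monthly { get; } = new List<string>();
}

public static class RecordStateMachine
{
    private enum State { SeekingFirstRecord, InRecord }

    public static List<TurbineRecord> Run(IEnumerable<string> lines)
    {
        var records = new List<TurbineRecord>();
        var state = State.SeekingFirstRecord;
        TurbineRecord current = null;

        foreach (var line in lines)
        {
            // The two-character prefix is the input symbol; short lines map to ""
            var code = line.Length >= 2 ? line.Substring(0, 2) : string.Empty;

            switch (state)
            {
                case State.SeekingFirstRecord:
                    if (code == "01")
                    {
                        current = new TurbineRecord { Header = line };
                        records.Add(current);
                        state = State.InRecord;
                    }
                    break; // anything before the first "01" is ignored

                case State.InRecord:
                    if (code == "01")
                    {
                        // Start the next record; the state stays InRecord
                        current = new TurbineRecord { Header = line };
                        records.Add(current);
                    }
                    else if (code == "04")
                    {
                        current.Monthly.Add(line);
                    }
                    break; // other codes leave the state unchanged
            }
        }
        return records;
    }
}
```

With only two states this is barely more than the if/else versions above; the explicit form may pay off if more line types later need their own handling.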

Is it possible to get specific column data from a large pipe delimited file without creating a class for every column?

I am writing a C# program that will grab some data from a pipe delimited file with 400 columns in it. I'm only required to work with 6 of the columns in each row. The file does not have headers, and the first line is a 5 column row with general description of file (file name, batch date, number of records, total, report id). Before I create a class with 400 fields in it, I was curious if anyone here had a better idea of how to approach this. Thanks for your time.
Well, you don't mention much about how you're loading the file, but I imagine it is using System.IO and then doing a string split on each line. If so, you need not extract every field from the resulting array.
Imagine you only needed two columns, the second and fourth, and had a class to accept each row as follows:
public class row {
public string field2;
public string field4;
}
Then you would extract your data like this:
IEnumerable<row> parsed =
File.ReadLines(@"path to file")
.Skip(1)
.Select(line => {
var splitted = line.Split('|');
return new row {
field2 = splitted[1],
field4 = splitted[3]
};
});
You could add a reference to Microsoft.VisualBasic, use TextFieldParser from the Microsoft.VisualBasic.FileIO namespace, and then do something like this:
using(var parser = new TextFieldParser(file))
{
Int32 skipHeader = 0;
parser.SetDelimiters("|");
while (!parser.EndOfData)
{
//Processing row
string[] fields = parser.ReadFields();
Int32 x = 0;
if (skipHeader > 0)
{
foreach (var field in fields)
{
if (x == 0)
{
//SAVE STUFF TO VARIABLE
}
else if (x==4)
{
//SAVE MORE STUFF
}
else if (x == 20)
{
//SAVE LAST STUFF
break;//THIS IS THE LAST COLUMN OF DATA NEEDED SO YOU BREAK
}
x++;
}
//DO SOMETHING WITH ALL THE SAVED STUFF AND CLEAR IT OUT
}
else
{
skipHeader++;
}
}
}
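If even a small class feels like too much, the wanted columns can be described by their zero-based indexes and each row returned as a plain array. This is a sketch under assumptions: PipeColumns and Pick are hypothetical names, the first line is the 5-column description mentioned in the question and is skipped, and fields are assumed not to contain the pipe character.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class PipeColumns
{
    // For every data line, returns only the fields at the requested zero-based indexes.
    public static IEnumerable<string[]> Pick(IEnumerable<string> lines, params int[] wanted)
    {
        return lines
            .Skip(1) // skip the file-description line
            .Select(line => line.Split('|'))
            .Select(fields => wanted
                // A row that is too short yields an empty string instead of throwing
                .Select(i => i < fields.Length ? fields[i] : string.Empty)
                .ToArray());
    }
}
```

For example, PipeColumns.Pick(File.ReadLines(path), 0, 4, 20) would stream arrays of three values per row without loading the whole file.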

Is there a way to dynamically create an object at run time in .NET 3.5?

I'm working on an importer that takes tab delimited text files. The first line of each file contains 'columns' like ItemCode, Language, ImportMode etc and there can be varying numbers of columns.
I'm able to get the names of each column, whether there's one or 10 and so on. I use a method to achieve this that returns List<string>:
private List<string> GetColumnNames(string saveLocation, int numColumns)
{
var data = (File.ReadAllLines(saveLocation));
var columnNames = new List<string>();
for (int i = 0; i < numColumns; i++)
{
var cols = from lines in data
.Take(1)
.Where(l => !string.IsNullOrEmpty(l))
.Select(l => l.Split(delimiter.ToCharArray(), StringSplitOptions.None))
.Select(value => string.Join(" ", value))
let split = lines.Split(' ')
select new
{
Temp = split[i].Trim()
};
foreach (var x in cols)
{
columnNames.Add(x.Temp);
}
}
return columnNames;
}
If I always knew what columns to be expecting, I could just create a new object, but since I don't, I'm wondering is there a way I can dynamically create an object with properties that correspond to whatever GetColumnNames() returns?
Any suggestions?
For what it's worth, here's how I used DataTables to achieve what I wanted.
// saveLocation is file location
// numColumns comes from another method that gets number of columns in file
var columnNames = GetColumnNames(saveLocation, numColumns);
var table = new DataTable();
foreach (var header in columnNames)
{
table.Columns.Add(header);
}
// itemAttributeData is the file split into lines
foreach (var row in itemAttributeData)
{
table.Rows.Add(row.Split('\t')); // assumes each row is one tab-delimited line, so every field gets its own column
}
Although there was a bit more work involved to be able to manipulate the data in the way I wanted, Karthik's suggestion got me on the right track.
You could create a dictionary of strings, where the key is the "property" name (the column header) and the value is that column's value for the row.
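A sketch of the dictionary approach, one dictionary per data row keyed by column name, could look like this. TabImporter and Read are hypothetical names; the code assumes the first non-empty line holds the column names and deliberately sticks to features available in .NET 3.5 (dynamic and ExpandoObject only arrived with .NET 4).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TabImporter
{
    // Returns one dictionary per data row, mapping column name to field value.
    public static List<Dictionary<string, string>> Read(IEnumerable<string> lines)
    {
        var rows = new List<Dictionary<string, string>>();
        string[] headers = null;

        foreach (var line in lines)
        {
            if (string.IsNullOrEmpty(line)) continue;

            var fields = line.Split('\t');
            if (headers == null)
            {
                // First non-empty line: remember the column names
                headers = fields.Select(h => h.Trim()).ToArray();
                continue;
            }

            var row = new Dictionary<string, string>();
            for (int i = 0; i < headers.Length; i++)
            {
                // Missing trailing fields become empty strings
                row[headers[i]] = i < fields.Length ? fields[i] : string.Empty;
            }
            rows.Add(row);
        }
        return rows;
    }
}
```

A value is then read as rows[0]["ItemCode"]. Duplicate column names would overwrite each other here, so a real importer may want to check for them.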
