CSV (or Excel) parsing; eliminate empty column


I am using the TextFieldParser class to parse the file. I want to eliminate (ignore) a complete column if the entire column is empty; a single empty cell in a particular row should not cause the column to be dropped. Is this possible?
Note: the feature works on data copied to the clipboard, so I cannot pass a file path directly to the parser.
TextFieldParser parser = new TextFieldParser(new StringReader(row));
string[] delimiters = { ",", "\t" };
parser.SetDelimiters(delimiters);
string[] columns = null;
while (!parser.EndOfData)
{
    columns = parser.ReadFields();
}
Appreciate your help.

After reading through the TextFieldParser Class page on MSDN, I see nothing there suggesting that this class can ignore a whole column; that is something you would have to do manually. Furthermore, your code does not seem right, because every iteration stores the fields in the same variable, so each row overwrites the previous one and only the last row is left when the loop ends:
while (!parser.EndOfData)
{
    columns = parser.ReadFields();
}
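Since TextFieldParser has no built-in option for this, one manual approach is to collect every row first and then keep only the column indexes that hold a non-blank value in at least one row. A minimal sketch, assuming the clipboard text is passed in as a string; the names EmptyColumnFilter and ParseWithoutEmptyColumns are hypothetical:

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using Microsoft.VisualBasic.FileIO;

public static class EmptyColumnFilter
{
    // Parses comma- or tab-delimited text (such as clipboard contents) and
    // drops every column that is empty or whitespace in all rows.
    public static List<string[]> ParseWithoutEmptyColumns(string text)
    {
        var rows = new List<string[]>();
        using (var parser = new TextFieldParser(new StringReader(text)))
        {
            parser.SetDelimiters(",", "\t");
            while (!parser.EndOfData)
            {
                rows.Add(parser.ReadFields()); // keep every row instead of overwriting one variable
            }
        }
        if (rows.Count == 0)
        {
            return rows;
        }

        int width = rows.Max(r => r.Length);
        // A column is kept when at least one row has a non-blank value in it,
        // so a single empty cell never removes a column.
        int[] keep = Enumerable.Range(0, width)
            .Where(c => rows.Any(r => c < r.Length && !string.IsNullOrWhiteSpace(r[c])))
            .ToArray();

        // Rebuild each row from the kept indexes; short rows are padded with "".
        return rows
            .Select(r => keep.Select(c => c < r.Length ? r[c] : "").ToArray())
            .ToList();
    }
}
```

For example, the rows `a,,c` and `1,,3` come back as `a,c` and `1,3`, while a column that is empty in only some rows is kept. In a WinForms app the input string could come from Clipboard.GetText().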

Related

Can someone please confirm why this foreach loop gives the error "invalid token" and reports that "splittedText" does not exist in the current context?

string[] splittedText = File.ReadAllLines(@"file.txt");//.Split(',');
foreach (string data in splittedText)
{
}
I want to read a file in C# into an array of strings, then iterate over the array to fetch the data I need.

If you want to read a CSV file, you should use a CSV parser. Values in a CSV file are separated by commas, and in some cases a value can itself contain a comma; in that case the value is wrapped in double quotes. The simple Split below does not handle that scenario.
var splittedText = File.ReadAllText("E:\\Test.txt").Split(',');
foreach (string data in splittedText)
{
    Console.WriteLine(data.Trim());
}

Hint: whether you read the file line by line or read the whole content at once depends on your use case. The code snippet below may give some idea of how to split the content.
Please try.
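var inputText = File.ReadAllText(@"inputfile.txt");
// Splitting on line breaks as well as commas keeps the last field of one line
// from being glued to the first field of the next line.
inputText
    .Split(new[] { ',', '\r', '\n' },
        StringSplitOptions.RemoveEmptyEntries | StringSplitOptions.TrimEntries)
    .ToList()
    .ForEach(t =>
    {
        Console.WriteLine(t);
        // Other manipulations
    });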
If you want to split on multiple characters, pass a character array to Split(), for example:
new char[] { ',', ':' };
Thank you.

You need to change File.ReadAllLines to File.ReadAllText(path); then you can call the Split method on the resulting string.
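File.ReadAllLines already returns a string[], and an array has no Split method, so to get fields you split each line instead. As for the "invalid token" error, one likely cause (an assumption, because the question does not show the surrounding code) is that the foreach statement sits directly in a class body: statements must live inside a method. A minimal sketch with the loop inside a method; LineSplitter, ReadFields and PrintFields are hypothetical names, and the plain Split still breaks on quoted fields that contain commas:

```csharp
using System;
using System.IO;

public static class LineSplitter
{
    // Statements such as for/foreach must appear inside a method body like this one.
    public static string[][] ReadFields(string path)
    {
        string[] lines = File.ReadAllLines(path); // one array element per line
        var result = new string[lines.Length][];
        for (int i = 0; i < lines.Length; i++)
        {
            // Naive split: does not handle quoted fields containing commas.
            result[i] = lines[i].Split(',');
        }
        return result;
    }

    // Iterates the parsed lines with foreach, as in the question.
    public static void PrintFields(string path)
    {
        foreach (string[] fields in ReadFields(path))
        {
            Console.WriteLine(string.Join(" | ", fields));
        }
    }
}
```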

How Can I Read a Multiline Field from a CSV Without Altering It?

I have a CSV that looks like this. My goal is to extract each entry (notice I said entry, not line), where an entry starts at the first column, stretches to the last column, and may span multiple lines. I'd like to extract an entry without ruining the formatting. For example, I do not want the following to be treated as four separate lines:
Eg. 1, One Column Multiple Lines
...,"1. copy ctor
2. copy ctor
3. declares function
4. default ctor",... // Where ... represents the columns before and after
but rather a column in one entry that can be represented as such
Eg. 2, One Column Single Line
"1. copy ctor\n2. copy ctor\n3. declares function\n4. default ctor"
When I iterate over the CSV, as such, I get Eg. 1. I'm not sure why splitting on a comma is treating a new line as a comma.
using (var streamReader = new StreamReader("results-survey111101.csv"))
{
    string line;
    while ((line = streamReader.ReadLine()) != null)
    {
        string[] splitLine = line.Split(',');
        foreach (var column in splitLine)
            Console.WriteLine(column);
    }
}
If someone can show me what I need to do to get these multi line CSV columns into one line that maintains the formatting (e.g. adds \t or \n where necessary) that would be great. Thanks!

Assuming your source file is valid CSV, variability in the data is really hard to account for. That's all I'll say; if you need convincing that writing your own CSV parser is a horrible task, see the Stack Overflow answer "Reading CSV files using C#".
Let's assume you are going to take advantage of an existing CSV reader library. I'll use TextFieldParser from the Microsoft.VisualBasic library as is used in the example answer I linked.
Your task is to read your source file line by line and check whether each line is a complete CSV entry on its own or part of a broken entry.
If it forms part of a broken line, we need to remember the line and add the next line to it before attempting validation again.
For this we need to know one thing:
What is the expected number of fields each data entry row should have?
using System;
using System.IO;
using Microsoft.VisualBasic.FileIO; // TextFieldParser, MalformedLineException

int expectedFieldCount = 7;
string brokenLine = "";
using (var streamReader = new StreamReader("results-survey111101.csv"))
{
    string line;
    while ((line = streamReader.ReadLine()) != null) // read the next line
    {
        // if the previous line was incomplete, add it to the current line,
        // otherwise use the current line
        string csvLineData = (brokenLine.Length > 0) ? brokenLine + line : line;
        try
        {
            using (StringReader stringReader = new StringReader(csvLineData))
            using (TextFieldParser parser = new TextFieldParser(stringReader))
            {
                parser.SetDelimiters(",");
                while (!parser.EndOfData)
                {
                    string[] fields = parser.ReadFields(); // tests if the line is valid csv
                    if (expectedFieldCount == fields.Length)
                    {
                        // do whatever you want with the fields now.
                        foreach (var field in fields)
                        {
                            Console.WriteLine(field);
                        }
                        brokenLine = ""; // reset the brokenLine
                    }
                    else // it was valid csv, but we don't have the required number of fields yet
                    {
                        brokenLine += line + @"\r\n";
                        break;
                    }
                }
            }
        }
        catch (MalformedLineException) // the current line is NOT valid csv (e.g. an unclosed quote), update brokenLine
        {
            brokenLine += line + @"\r\n";
        }
    }
}
I am replacing the line breaks that broken lines contain with \r\n literals. You can display these in your resulting one-liner field however you want. But you shouldn't expect to be able to copy paste the result into notepad and see line breaks.

One assumes you have the same number of columns in each record. Therefore in your code where you do your Split you can merely sum the length of splitLine into a running columnsReadCount until they equal the desired columnsPerRecordCount. At that point you have read all the record and can reset the running columnsReadCount back to zero ready for the next record to read.
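A minimal sketch of that running count, assuming no field contains a comma and no blank lines separate records. One adjustment to summing splitLine.Length directly: a field that breaks across lines contributes a piece to both lines, so the first piece of each continuation line is not a new column and is subtracted. RecordJoiner and ReadRecords are hypothetical names; the line break is kept as '\n' inside the joined record:

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class RecordJoiner
{
    // Joins physical lines until columnsPerRecord columns have been counted.
    public static IEnumerable<string> ReadRecords(TextReader reader, int columnsPerRecord)
    {
        var record = new StringBuilder();
        int columnsReadCount = 0;
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            int pieces = line.Split(',').Length;
            if (columnsReadCount > 0)
            {
                // Continuation line: its first piece belongs to the field
                // that started on the previous line, so it is not a new column.
                record.Append('\n');
                pieces -= 1;
            }
            record.Append(line);
            columnsReadCount += pieces;
            if (columnsReadCount >= columnsPerRecord)
            {
                yield return record.ToString();
                record.Clear();
                columnsReadCount = 0; // ready for the next record
            }
        }
        if (record.Length > 0)
        {
            yield return record.ToString(); // incomplete record at end of input
        }
    }
}
```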

Issue renaming two columns in a CSV file instead of one

I need to be able to rename the column in a spreadsheet from 'idn_prod' to 'idn_prod1', but there are two columns with this name.
I have tried implementing code from similar posts, but I've only been able to update both columns. Below you'll find the code I have that just renames both columns.
//locate and edit column in csv
string file1 = @"C:\Users\username\Documents\AppDevProjects\import.csv";
string[] lines = System.IO.File.ReadAllLines(file1);
using (System.IO.StreamWriter sw = new System.IO.StreamWriter(file1))
{
    foreach (string s in lines)
    {
        sw.WriteLine(s.Replace("idn_prod", "idn_prod1"));
    }
}
I expect only the 2nd column to be renamed, but the actual output is that both are renamed.
Here are the first couple rows of the CSV:

I'm assuming that you only need to update the column header; the data rows do not need to change.
var file1 = @"test.csv";
var lines = System.IO.File.ReadAllLines(file1);
var columnHeaders = lines[0];
var textToReplace = "idn_prod";
var newText = "idn_prod1";
var indexToReplace = columnHeaders
    .LastIndexOf(textToReplace); // LastIndexOf ensures that you pick the second idn_prod
columnHeaders = columnHeaders
    .Remove(indexToReplace, textToReplace.Length)
    .Insert(indexToReplace, newText); // remove the second idn_prod and insert the updated value
using (System.IO.StreamWriter sw = new System.IO.StreamWriter(file1))
{
    sw.WriteLine(columnHeaders);
    foreach (var str in lines.Skip(1)) // Skip requires System.Linq
    {
        sw.WriteLine(str);
    }
    sw.Flush();
}

Replace the foreach (string s in lines) loop with a for loop, use the loop index to recognize the header line (index 0), and rename only the second idn_prod column there.
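A minimal sketch of that idea: treat index 0 as the header, compare whole cells rather than substrings (so a column such as idn_product would not be touched), and rename only the requested occurrence. It assumes a comma-delimited file with no quoted header names; HeaderRenamer and RenameOccurrence are hypothetical names:

```csharp
using System.IO;

public static class HeaderRenamer
{
    // Renames the given occurrence (1-based) of oldName in the header line only.
    public static void RenameOccurrence(string path, string oldName, string newName, int occurrence)
    {
        string[] lines = File.ReadAllLines(path);
        if (lines.Length == 0)
        {
            return;
        }

        string[] header = lines[0].Split(',');
        int seen = 0;
        for (int i = 0; i < header.Length; i++)
        {
            // Exact cell comparison instead of string.Replace on the whole line.
            if (header[i] == oldName && ++seen == occurrence)
            {
                header[i] = newName;
                break;
            }
        }
        lines[0] = string.Join(",", header);

        // Data rows (index 1 and up) are written back unchanged.
        File.WriteAllLines(path, lines);
    }
}
```

For the question's file this would be called as `HeaderRenamer.RenameOccurrence(file1, "idn_prod", "idn_prod1", 2);`.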

I believe the only way to handle this properly is to crack the header line (first string that has column names) into individual parts, separated by commas or tabs or whatever, and run through the columns one at a time yourself.
Your loop would consider the first line from the file, use the Split function on the delimiter, and look for the column you're interested in:
bool headerSeen = false;
foreach (string s in lines)
{
    if (!headerSeen)
    {
        // special: this is the header
        string[] parts = s.Split(","); // use "\t" if the file is tab-delimited
        int matches = 0;
        for (int i = 0; i < parts.Length; i++)
        {
            // only fix the *second* idn_prod seen
            if (parts[i] == "idn_prod" && ++matches == 2)
            {
                parts[i] = "idn_prod1";
                break;
            }
        }
        sw.WriteLine(string.Join(",", parts));
        headerSeen = true;
    }
    else
    {
        sw.WriteLine(s);
    }
}
The only reason this is even remotely possible is that it's the header and not the individual lines; headers tend to be more predictable in format, and you worry less about quoting and fields that contain the delimiter, etc.
Trying this on the individual data lines will rarely work reliably: if your delimiter is a comma, what happens if an individual field contains a comma? Then you have to worry about quoting, and this enters all kinds of fun.
For doing any real CSV work in C#, it's really worth looking into a package that specializes in this, and I've been thrilled with CsvHelper from Josh Close. Highly recommended.

csv upload to datatable having comma in column cell as string using C#

I am using the Split function to load a CSV into a DataTable, but when a cell contains a comma as part of a string value, Split treats it as a separate column.
foreach (var RowItem in GLExtract)
{
    string[] Acctid = RowItem.ToString().Split(',');
    string glacct = Acctid[70].ToString();
    decimal remitAmt = decimal.Parse(Acctid[47].ToString());
    if (acctno == glacct)
    {
        sum = sum + remitAmt;
        dtflatfile.Rows[x]["field10"] = sum;
    }
}
Can you please help me with this??

Presumably the field containing a comma is quoted. You can't parse CSV using String.Split on the delimiter in this case.
Either use one of the many third-party CSV parsers that google knows about, or use the TextFieldParser class in the Microsoft.VisualBasic namespace of the .NET framework.
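A minimal sketch of the TextFieldParser route for the loop in the question, assuming the rows can be read from a TextReader and that amounts use '.' as the decimal separator. The column indexes (70 and 47 in the question) are passed in, and GlSummer and SumForAccount are hypothetical names:

```csharp
using System;
using System.Globalization;
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class GlSummer
{
    // Sums the amount column over rows whose account column equals acctNo.
    // Quoted fields such as "Smith, J" stay one field, so later indexes do not shift.
    public static decimal SumForAccount(TextReader csv, string acctNo, int acctColumn, int amountColumn)
    {
        decimal sum = 0m;
        using (var parser = new TextFieldParser(csv))
        {
            parser.SetDelimiters(",");
            parser.HasFieldsEnclosedInQuotes = true;
            while (!parser.EndOfData)
            {
                string[] fields = parser.ReadFields();
                // Skip rows that are too short to contain both columns.
                if (fields.Length <= Math.Max(acctColumn, amountColumn))
                {
                    continue;
                }
                if (fields[acctColumn] == acctNo)
                {
                    sum += decimal.Parse(fields[amountColumn], CultureInfo.InvariantCulture);
                }
            }
        }
        return sum;
    }
}
```

With String.Split, a row such as `"Smith, J",100,12.50` yields four pieces instead of three, shifting every later index by one; the parser keeps it at three. For the question's layout the call would be `GlSummer.SumForAccount(reader, acctno, 70, 47)`.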

How to handle quotation marks within CSV files?

To read a CSV file, I use the following statement:
var query = from line in rawLines
            let data = line.Split(';')
            select new
            {
                col01 = data[0],
                col02 = data[1],
                col03 = data[2]
            };
The CSV file I want to read is malformed in the sense that an entry can contain the separator ; as data when it is surrounded with quotation marks.
Example:
col01;col02;col03
data01;"data02;";data03
My read statement above does not work here, since it interprets the second row as four columns.
Question: Is there an easy way to handle this malformed CSV correctly? Perhaps with another LINQ query?

Just use a CSV parser and STOP ROLLING YOUR OWN:
using Microsoft.VisualBasic.FileIO; // TextFieldParser

using (var parser = new TextFieldParser("test.csv"))
{
    parser.CommentTokens = new string[] { "#" };
    parser.SetDelimiters(new string[] { ";" });
    parser.HasFieldsEnclosedInQuotes = true;
    // Skip over header line.
    parser.ReadLine();
    while (!parser.EndOfData)
    {
        string[] fields = parser.ReadFields();
        Console.WriteLine("{0} {1} {2}", fields[0], fields[1], fields[2]);
    }
}
TextFieldParser is built into .NET; just add a reference to the Microsoft.VisualBasic assembly and you are good to go. A real CSV parser will happily handle this situation.

Parsing CSV files manually can always lead to issues like this. I would advise that you use a third party tool like CsvHelper to handle the parsing.
Furthermore, it's not a good idea to hard-code the comma as the separator, because the separator can be overridden in your computer's environment options.
Let me know if I can help further,
Matt

Not very elegant, but after using your method you can check whether any colxx contains an unfinished (unpaired) quotation mark and, if so, join it with the next colxx.
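A minimal sketch of that join-the-pieces idea, assuming quotes only appear as balanced pairs of double quotes (an escaped "" counts as two, so it keeps the balance). QuoteMerger and SplitRespectingQuotes are hypothetical names, and the surrounding quotes are left on the merged value:

```csharp
using System.Collections.Generic;

public static class QuoteMerger
{
    // Splits on the separator, then re-joins pieces while a double quote is still open,
    // so a quoted value containing the separator becomes one field again.
    public static List<string> SplitRespectingQuotes(string line, char separator)
    {
        var fields = new List<string>();
        string pending = null;
        foreach (string piece in line.Split(separator))
        {
            pending = pending == null ? piece : pending + separator + piece;
            if (CountQuotes(pending) % 2 == 0) // quotes balanced: the field is complete
            {
                fields.Add(pending);
                pending = null;
            }
        }
        if (pending != null)
        {
            fields.Add(pending); // unbalanced quote at the end of the line
        }
        return fields;
    }

    private static int CountQuotes(string s)
    {
        int n = 0;
        foreach (char c in s)
        {
            if (c == '"') n++;
        }
        return n;
    }
}
```

In the question's query this would replace the split with `let data = QuoteMerger.SplitRespectingQuotes(line, ';')`; the result is a List<string>, so `data[0]`, `data[1]` and `data[2]` still work.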
