Reading CSV Files using Fast CsvReader without quotes around fields

I'm having some issues using Lumenworks Fast CsvReader. Using the code:
using (CsvReader csv = new CsvReader(new StreamReader(Server.MapPath(fileName)), true))
{
    csv.ParseError += csv_ParseError;
    while (csv.ReadNextRecord())
    {
        var importItem = new ProductImportItem(csv);
        if (!ProductsDALC.SearchByPartProductCode(importItem.ProductCode).Any())
        {
            if (!SaveProduct(importItem))
            {
                this.ParseErrors.Add(string.Format("Failed to add product-{0}", importItem.ProductCode));
            }
        }
    }
}
The code works fine when the CSV file is formatted using double quotes either side of the fields/column values e.g:
"product_code", "product_name", "item_description", "sku", "postage_level_required", "cost_price", "retail_price_inc_vat"
However, if the columns look like this:
product_code,product_name,item_description,sku,postage_level_required,cost_price,retail_price_inc_vat
Then the code behaves as if there is no data, that is to say, it won't enter into the while loop and enumerating the result set in the debugger will show that it yields no results.
This would be fine if I had absolute control over the data in/out. However, all I can do is provide the user with a template which contains the fields and hope that they wrap the data in quotes. This isn't an acceptable approach.
Is there a way to get the reader to parse data even if it isn't wrapped in quotes?
I'm aware of the TextFieldParser class built into .NET, which handles this fine, but since we're using CsvReader elsewhere in the project it would be good to remain consistent.

You have to provide the information that the fields aren't quoted in the constructor by using the unicode "null" character:
Char quotingCharacter = '\0'; // means none
Char escapeCharacter = '\0';
Char commentCharacter = '\0';
Char delimiter = ',';
bool hasHeader = true;
using (var csv = new CsvReader(reader, hasHeader, delimiter, quotingCharacter, escapeCharacter, commentCharacter, ValueTrimmingOptions.All))
{
    // ...
}

Related

Convert a String, which is already malformed

I have a class that uses another class to read a text file.
The text file is written in ASCII or, to be precise, CP1252.
Background info: The text file is generated in Axapta and uses the ASCIIio class, which writes the text by using the writeRaw method.
The class I am using was written by a colleague, and it uses a C# StreamReader to read files. Normally this works okay because the files are written in UTF-8, but in this particular case they aren't.
So the StreamReader reads the file as UTF-8 and passes the resulting string to me.
I now have some letters, for example the Latin small letter o with diaeresis (ö), which aren't formatted as I would need them to be.
A simple conversion of the string doesn't help in this case and I can't figure out how to get the right letters.
So this is basically how he reads it:
char quotationChar = '"';
String line = "";
using (StreamReader reader = new StreamReader(fileName))
{
    if ((line = reader.ReadLine()) != null)
    {
        line = line.Replace(quotationChar.ToString(), "");
    }
}
return line;
What now happens is, in the Textfile I have the german word "Röhre" which, after reading it with the streamreader, transforms to R�hre (which looks stupid in a database).
I could try to convert every letter
Encoding enc = Encoding.GetEncoding(1252);
byte[] utf8_Bytes = new byte[line.Length];
for (int i = 0; i < line.Length; ++i)
{
    utf8_Bytes[i] = (byte)line[i];
}
String propEncodeString = enc.GetString(utf8_Bytes, 0, utf8_Bytes.Length);
That doesn't give me the right character!
byte[] myarr = Encoding.UTF8.GetBytes(line);
String propEncodeString = enc.GetString(myarr);
That also returns the wrong character.
I am aware that I could just solve the problem by using this:
using (StreamReader reader = new StreamReader(fileName, Encoding.Default, true))
But just for fun:
How can I get the right string back from a string that has already been decoded wrongly?

Once the file's bytes have been decoded with the wrong encoding (here, CP1252 data read as UTF-8), every byte sequence that isn't valid in that encoding is replaced with the same replacement character (U+FFFD, shown as �). That means the original data is simply lost, and you can't 'convert' back to the right character downstream. See this example: https://dotnetfiddle.net/XWysml
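To illustrate the point, here is a minimal sketch (not the code behind the fiddle link). The class and method names are made up for this example, and the byte values assume the single-byte encoding of ö (0xF6) and ü (0xFC), which is the same in CP1252 and ISO-8859-1. ISO-8859-1 is used for the "correct" decode only because it is available on every .NET runtime without registering extra code pages.

```csharp
using System;
using System.Text;

// Hypothetical demo class; the names are illustrative only.
public static class LossyDecodeDemo
{
    // Decodes raw bytes as UTF-8, which is what StreamReader does by default.
    public static string DecodeAsUtf8(byte[] raw)
    {
        return Encoding.UTF8.GetString(raw);
    }

    // Decodes the same bytes with a single-byte encoding
    // (ISO-8859-1 agrees with CP1252 for the bytes used below).
    public static string DecodeAsSingleByte(byte[] raw)
    {
        return Encoding.GetEncoding("iso-8859-1").GetString(raw);
    }

    // "Röhre" and "Rühre" as single-byte data: only the second byte differs.
    public static readonly byte[] Roehre = { 0x52, 0xF6, 0x68, 0x72, 0x65 };
    public static readonly byte[] Ruehre = { 0x52, 0xFC, 0x68, 0x72, 0x65 };
}
```

Decoding Roehre and Ruehre with DecodeAsUtf8 gives the same string for both, with U+FFFD in place of the second letter, so nothing in that string says which letter was there originally. The only reliable fix is to decode the original bytes again with the right encoding, as DecodeAsSingleByte does.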

How to bypass comma , double quote using Lumenworks

I am fetching data from a database in CSV format, i.e. http://iapp250.dev.sx.com:5011/a.csv?select[>date]from employee
But some columns in the table contain commas and double quotes, and after reading the CSV each row becomes a comma-separated string. As a result I get an index-out-of-bounds exception when I deserialize it.
I decided to use the LumenWorks CSV reader after reading some posts. I proceeded as shown below but still could not fix it. Please see the code snippet below.
List<List<string>> LineFields = new List<List<string>>();
using (var reader = new StreamReader(siteminderresponse.getResponseStream()))
{
    Char quotingCharacter = '\0';
    Char escapeCharacter = quotingCharacter;
    Char delimiter = '|';
    using (var csv = new CsvReader(reader, true, delimiter, quotingCharacter, escapeCharacter, '\0', ValueTrimmingOptions.All))
    {
        csv.DefaultParseErrorAction = ParseErrorAction.ThrowException;
        //csv.ParseError += csv_ParseError;
        csv.SkipEmptyLines = true;
        while (csv.ReadNextRecord())
        {
            List<string> fields = new List<string>(csv.FieldCount);
            for (int i = 0; i < csv.FieldCount; i++)
            {
                try
                {
                    string field = csv[i];
                    fields.Add(field.Trim('"'));
                }
                catch (MalformedCsvException ex)
                {
                    throw;
                }
            }
            LineFields.Add(fields);
        }
    }
}
End result is I get comma separated fields like
1,16:00:01,BUY,-2,***ROBERTS, K***.
Please see the ROBERTS, K which was a single column value and now is comma separated due to which my serialization fails.

This appears to be a problem with the way you're formatting the results of your query. If you want "ROBERTS, K" to be read as a single field by a CSV reader, you need to put quotes around "ROBERTS, K" in your input to the CSV reader. LumenWorks is only doing what it's supposed to.
Similarly, if you want literal double-quotes in your parsed fields, you need to escape them with another double-quote. So, to properly express this as a single field:
Roberts, K is 6" taller than I am
...you'd need to pass this into the CSV parser:
"Roberts, K is 6"" taller than I am"

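The quoting rule described in the answer above (wrap the field in double quotes and double any embedded double quote) could be applied on the producing side with a small helper like the one below. This is only a sketch: the class and method names are invented for illustration, and it assumes a comma delimiter and that you control the code that writes the CSV.

```csharp
using System;

// Hypothetical helper; not part of LumenWorks or any other library.
public static class CsvQuoting
{
    // Returns the field unchanged if it is safe as-is; otherwise wraps it
    // in double quotes and doubles every embedded double quote.
    public static string QuoteField(string field)
    {
        bool needsQuotes = field.IndexOfAny(new[] { ',', '"', '\r', '\n' }) >= 0;
        if (!needsQuotes)
        {
            return field;
        }
        return "\"" + field.Replace("\"", "\"\"") + "\"";
    }
}
```

Passing the example sentence from the answer through QuoteField yields the quoted form shown there, while a plain value such as BUY comes back untouched.
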
Read csv using comma separated ignore double quotes and space using streamreader

I have a CSV file whose data, when opened in Notepad, looks like this:
TEST DATA1,,,,,,
TEST DATA,,,,,,
",12:10,,10:30",,,,",,11:30",,",,12:30"
location,Value1,,Value2,,Value3
mumbai-20,1.2,,,2.3,,3.4
pune-33,1.8,,,2.34,,4.5
I want to read this using StreamReader, ignoring the commas, double quotes and spaces.
The code which I am using is:
var reader = new StreamReader(File.OpenRead(@"D:\Test.csv"));
while (!reader.EndOfStream)
{
    var line = reader.ReadLine();
    var array = line.Split(',');
    foreach (var element in array)
        Console.WriteLine(element);
}

String.Split() is a horrible, awful way to handle CSV data. There are all kinds of edge cases that it can't handle. You want a dedicated parser instead.
This replicates your sample using my CSV parser:
foreach (IList<string> row in CSV.FromFile(@"D:\Test.csv"))
{
    foreach (string element in row)
    {
        Console.WriteLine(element);
    }
}
It's just one source file to include with your code.

Remove a specific column from a delimited file

I've been working with some big (~1GB) delimited text files these days. They look something like this:
COlumn1 #COlumn2#COlumn3#COlumn4
COlumn1#COlumn2#COlumn3 #COlumn4
where # is the delimiter.
In case a column is invalid I might have to remove it from the whole text file. The output file when Column 3 is invalid should look like this.
COlumn1 #COlumn2#COlumn4
COlumn1#COlumn2#COlumn4
string line = "COlumn1# COlumn2 #COlumn3# COlumn4";
int junk =3;
int columncount = line.Split(new char[] { '#' }, StringSplitOptions.None).Count();
//remove the [junk-1]th '#' and the value till [junk]th '#'
//"COlumn1# COlumn2 # COlumn4"
I'm not able to find a C# version of this on SO. Is there a way I can do that? Please help.
EDIT:
The solution I found myself is below, and it does the job. Is there a way I could improve it to reduce the performance impact it might have on large text files?
int junk = 3;
string line = "COlumn1#COlumn2#COlumn3#COlumn4";
int counter = 0;
int colcount = line.Split(new char[] { '#' }, StringSplitOptions.None).Length - 1;
string[] linearray = line.Split(new char[] { '#' }, StringSplitOptions.None);
List<string> linelist = linearray.ToList();
linelist.RemoveAt(junk - 1);
string finalline = string.Empty;
foreach (string s in linelist)
{
    counter++;
    finalline += s;
    if (counter < colcount)
        finalline += "#";
}
Console.WriteLine(finalline);

EDITED
This method can be very expensive in memory. As you can read in this post, the suggestion is:
If you need to run complex queries against the data in the file, the right thing to do is to load the data to database and let DBMS to take care of data retrieval and memory management.
To avoid memory consumption you should use a StreamReader to read the file line by line.
This could be a start for your task; it is missing your invalid-match logic:
using System;
using System.Collections.Generic;
using System.IO;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            const string fileName = "temp.txt";
            var results = FindInvalidColumns(fileName);
            // Open the output once (creating or overwriting new.txt)
            // instead of reopening it for every line.
            using (var reader = File.OpenText(fileName))
            using (var writer = new StreamWriter("new.txt"))
            {
                string line;
                while ((line = reader.ReadLine()) != null)
                {
                    var split = line.Split(new[] { "#" }, StringSplitOptions.None);
                    var kept = new List<string>();
                    for (var i = 0; i < split.Length; i++)
                        if (!results.Contains(i))
                            kept.Add(split[i]);
                    // Re-join with the delimiter so the remaining columns stay separated.
                    writer.WriteLine(string.Join("#", kept));
                }
            }
        }

        // First pass: collect the indexes of all columns that contain an invalid value.
        private static List<int> FindInvalidColumns(string fileName)
        {
            var invalidColumnIndexes = new List<int>();
            using (var reader = File.OpenText(fileName))
            {
                string line;
                while ((line = reader.ReadLine()) != null)
                {
                    var split = line.Split(new[] { "#" }, StringSplitOptions.None);
                    for (var i = 0; i < split.Length; i++)
                    {
                        if (IsInvalid(split[i]) && !invalidColumnIndexes.Contains(i))
                            invalidColumnIndexes.Add(i);
                    }
                }
            }
            return invalidColumnIndexes;
        }

        // Placeholder: put your own validity check here.
        private static bool IsInvalid(string s)
        {
            return false;
        }
    }
}
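If the column to drop is already known, the per-line work reduces to "split, skip one index, join again". The helper below is a minimal sketch of that step only; the class and method names are made up for this example, and it assumes the delimiter never appears inside a column value.

```csharp
using System;
using System.Linq;

// Hypothetical helper; names are illustrative only.
public static class ColumnRemover
{
    // Removes the column at the given zero-based index and re-joins
    // the remaining columns with the same delimiter.
    public static string RemoveColumn(string line, int columnIndex, char delimiter)
    {
        string[] parts = line.Split(delimiter);
        return string.Join(
            delimiter.ToString(),
            parts.Where((value, index) => index != columnIndex));
    }
}
```

For the sample line in the question, RemoveColumn("COlumn1#COlumn2#COlumn3#COlumn4", 2, '#') drops the third column. Calling it once per line from a StreamReader loop keeps memory use independent of the file size.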

First, what you will do is re-write the line to a text file using a 0-length string for COlumn3. Therefore the line after being written correctly would look like this:
COlumn1#COlumn2##COlumn4
As you can see, there are two delimiters between COlumn2 and COlumn4. This is a cell with no data in it. (By "cell" I mean one column of a certain, single row.) Later, when some other process reads this using the Split function, it will still create a new value for Column 3, but in the array generated by Split, the 3rd position would be an empty string:
String[] columns = stream_reader.ReadLine().Split('#');
int lengthOfThirdItem = columns[2].Length; // for proof
// lengthOfThirdItem = 0
This reduces invalid values to empty strings and persists them back in the text file.
For more on String.Split see C# StreamReader save to Array with separator.
It is not possible to write to lines internal to a text file while it is also open for read. This article discusses it some (simultaneous read-write a file in C#), but it looks like that question-asker just wants to be able to write lines to the end. You want to be able to write lines at any point in the interior. I think this is not possible without buffering the data in some way.
The simplest way to buffer the data is to rename the file to a temp file first (using File.Move(), http://msdn.microsoft.com/en-us/library/system.io.file.move(v=vs.110).aspx). Then use the temp file as the data source: open the temp file to read in the data, which may have corrupt entries, and write the data afresh to the original file name, using the approach I describe above to represent empty columns. After this is complete, you can delete the temp file.
Important
Deleting the temp file may leave you vulnerable to power and data transients (or software 'transients'). (I.e., a power drop that interrupts part of the process could leave the data in an unusable state.) So you may also want to leave the temp file on the drive as an emergency backup in case of some problem.
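A rough sketch of the rename-then-rewrite approach follows. The class name, the method name and the ".bak" suffix are all assumptions made for this example; it keeps the renamed original as a backup rather than deleting it, and it assumes the delimiter never appears inside a value.

```csharp
using System;
using System.IO;

// Hypothetical helper; names and the ".bak" suffix are illustrative only.
public static class ColumnBlanker
{
    // Rewrites the file at 'path', replacing the column at the given
    // zero-based index with an empty string on every line.
    // Returns the path of the backup that holds the original data.
    public static string BlankColumn(string path, int columnIndex, char delimiter)
    {
        string backupPath = path + ".bak";
        // File.Move throws if backupPath already exists.
        File.Move(path, backupPath);

        using (var reader = new StreamReader(backupPath))
        using (var writer = new StreamWriter(path))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                string[] columns = line.Split(delimiter);
                if (columnIndex < columns.Length)
                {
                    columns[columnIndex] = string.Empty;
                }
                writer.WriteLine(string.Join(delimiter.ToString(), columns));
            }
        }
        return backupPath;
    }
}
```

Only one line is held in memory at a time, and the original data stays on disk under the backup name until you decide to remove it.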

C# CSV file to array/list

I want to read 4-5 CSV files into some array in C#.
I know that this question has been asked before and I have gone through those answers...
But my use of CSVs is much simpler than that...
I have CSV files with columns of the following data types:
string , string
These strings don't contain ',' so no worries there...
That's it. And they aren't very big, only about 20 records in each.
I just want to read them into a C# array...
Is there any very simple and direct way to do that?

To read the file, use
TextReader reader = File.OpenText(filename);
To read a line:
string line = reader.ReadLine();
then
string[] tokens = line.Split(',');
to separate them.
By using a loop around the two last example lines, you could add each array of tokens into a list, if that's what you need.
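Put together, the loop described above might look like the sketch below. The class and method names are invented for this example, blank lines are skipped as an assumption, and (as the question states) the values are assumed to contain no commas or quotes.

```csharp
using System.Collections.Generic;
using System.IO;

// Hypothetical helper; names are illustrative only.
public static class SimpleCsv
{
    // Reads every line from the reader and splits it on commas.
    public static List<string[]> ReadRows(TextReader reader)
    {
        var rows = new List<string[]>();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (line.Length == 0)
            {
                continue; // skip blank lines
            }
            rows.Add(line.Split(','));
        }
        return rows;
    }
}
```

With a file, this would be called as SimpleCsv.ReadRows(File.OpenText(filename)), ideally inside a using block so the reader is closed afterwards.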

This one includes the quotes & commas in fields. (assumes you're doing a line at a time)
using Microsoft.VisualBasic.FileIO; //For TextFieldParser
// blah blah blah
StringReader csv_reader = new StringReader(csv_line);
TextFieldParser csv_parser = new TextFieldParser(csv_reader);
csv_parser.SetDelimiters(",");
csv_parser.HasFieldsEnclosedInQuotes = true;
string[] csv_array = csv_parser.ReadFields();

Here is a simple way to get CSV content into an array of strings. The CSV file can contain double quotes and line breaks inside quoted fields, and the delimiter is a comma.
Here are the namespaces that you need:
System;
System.IO;
System.Collections.Generic;
System is for Environment.NewLine. System.IO is for the FileStream and StreamReader classes that access your file. Both classes implement the IDisposable interface, so you can use using statements to close your streams (example below).
The System.Collections.Generic namespace is for the generic collections, such as IList and List. In this example, we'll use the List class, because Lists are better than arrays in my honest opinion. However, before I return our outbound variable, I'll call the .ToArray() method to return an array.
There are many ways to get content from your file; I personally prefer to use a while (condition) loop to iterate over the contents. In the condition clause, use !lReader.EndOfStream: while not at the end of the stream, continue iterating over the file.
public string[] GetCsvContent(string iFileName)
{
    List<string> oCsvContent = new List<string>();
    using (FileStream lFileStream =
        new FileStream(iFileName, FileMode.Open, FileAccess.Read))
    {
        using (StreamReader lReader = new StreamReader(lFileStream))
        {
            // flag: are we currently inside a double-quoted field?
            bool lContainsDoubleQuotes = false;
            // a string for the csv value
            string lCsvValue = "";
            // loop through the file until you reach the end
            while (!lReader.EndOfStream)
            {
                // stores each line in a variable
                string lCsvLine = lReader.ReadLine();
                // for each character in the line...
                foreach (char lLetter in lCsvLine)
                {
                    // a double quote toggles the "inside quotes" flag
                    if (lLetter == '"')
                    {
                        lContainsDoubleQuotes = !lContainsDoubleQuotes;
                    }
                    // if we come across a comma
                    // AND it's not within double quotes...
                    if (lLetter == ',' && !lContainsDoubleQuotes)
                    {
                        // add our string to the list
                        oCsvContent.Add(lCsvValue);
                        // reset our string
                        lCsvValue = "";
                    }
                    else
                    {
                        // add the character to our string
                        // (quote characters are kept as part of the value)
                        lCsvValue += lLetter;
                    }
                }
                if (lContainsDoubleQuotes)
                {
                    // the quoted field continues on the next line
                    lCsvValue += Environment.NewLine;
                }
                else
                {
                    // end of the record: store the last value of the line
                    oCsvContent.Add(lCsvValue);
                    lCsvValue = "";
                }
            }
        }
    }
    return oCsvContent.ToArray();
}
Hope this helps! Very easy and very quick.
Cheers!
