How to bypass comma , double quote using Lumenworks - c#

I am fetching data from a database in CSV format, e.g. http://iapp250.dev.sx.com:5011/a.csv?select[>date]from employee
But some columns in the table contain commas and double quotes, and after reading the CSV each such value is broken up into a comma-separated string. As a result I get an index out of bounds exception when I deserialize it.
I decided to use the LumenWorks CSV reader after reading some of the posts. I proceeded as below but still could not fix it. Please see the code snippet below.
List<List<string>> LineFields = new List<List<string>>();
using (var reader = new StreamReader(siteminderresponse.GetResponseStream()))
{
Char quotingCharacter = '\0';
Char escapeCharacter = quotingCharacter;
Char delimiter = '|';
using (var csv = new CsvReader(reader, true, delimiter, quotingCharacter, escapeCharacter, '\0', ValueTrimmingOptions.All))
{
csv.DefaultParseErrorAction = ParseErrorAction.ThrowException;
//csv.ParseError += csv_ParseError;
csv.SkipEmptyLines = true;
while (csv.ReadNextRecord())
{
List<string> fields = new List<string>(csv.FieldCount);
for (int i = 0; i < csv.FieldCount; i++)
{
try
{
string field = csv[i];
fields.Add(field.Trim('"'));
} catch (MalformedCsvException ex)
{
throw;
}
}
LineFields.Add(fields);
}
}
}
End result is I get comma separated fields like
1,16:00:01,BUY,-2,***ROBERTS, K***.
Please see the ROBERTS, K which was a single column value and now is comma separated due to which my serialization fails.

This appears to be a problem with the way you're formatting the results of your query. If you want "ROBERTS, K" to be read as a single field by a CSV reader, you need to put quotes around "ROBERTS, K" in your input to the CSV reader. LumenWorks is only doing what it's supposed to.
Similarly, if you want literal double-quotes in your parsed fields, you need to escape them with another double-quote. So, to properly express this as a single field:
Roberts, K is 6" taller than I am
...you'd need to pass this into the CSV parser:
"Roberts, K is 6"" taller than I am"

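As a minimal sketch of the quoting rule described above, assuming you control the code that produces the CSV text: the class and method names here (CsvQuoting, QuoteField, ToCsvLine) are hypothetical helpers, not part of LumenWorks.

```csharp
using System;

public static class CsvQuoting
{
    // Hypothetical helper: wraps a field in double quotes when it contains
    // a comma, a double quote, or a line break, and doubles embedded quotes.
    public static string QuoteField(string field)
    {
        if (field == null) return "";
        bool needsQuotes = field.IndexOfAny(new[] { ',', '"', '\r', '\n' }) >= 0;
        if (!needsQuotes) return field;
        return "\"" + field.Replace("\"", "\"\"") + "\"";
    }

    // Joins already-separated field values into one comma-delimited CSV line.
    public static string ToCsvLine(params string[] fields)
    {
        return string.Join(",", Array.ConvertAll<string, string>(fields, QuoteField));
    }
}
```

With this, CsvQuoting.ToCsvLine("1", "16:00:01", "BUY", "-2", "ROBERTS, K") yields a line whose last field is quoted, which a CSV reader left on its default quote character should then read back as a single field.
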
Related

Escape comma (,) in a CSV cell while exporting its data to a database table

I have a CSV file in which one field contains a comma, e.g. under the office location column I have the value xyz, building. When I checked the value in the debugger it only shows "\"xyz". I have tried escaping the comma and backslash by using Replace(",","") and Replace("\"","") but it failed. I am also getting an extra \ in the result, as marked by the red circle.
I have attached the image while debugging showing the structure of the csv row. The problem is in the red circle area.
I have also tried following function:
public static string RemoveColumnDelimitersInsideValues(string input)
{
const char valueDelimiter = '"';
const char columnDelimiter = ',';
StringBuilder output = new StringBuilder();
bool isInsideValue = false;
for (var i = 0; i < input.Length; i++)
{
var currentChar = input[i];
if (currentChar == valueDelimiter)
{
isInsideValue = !isInsideValue;
output.Append(currentChar);
continue;
}
if (currentChar != columnDelimiter || !isInsideValue)
{
output.Append(currentChar);
}
}
return output.ToString();
}
Kindly help in resolving the issues. Thanks
The \ character you see is not in the actual string, it's just an escaping character added in the debugger view.
Click on the magnifier to get the actual value of the string.
Hope it helps.
Try using TextFieldParser. In CSV, if a column value contains a comma, the value is enclosed in quotes, so setting HasFieldsEnclosedInQuotes to true will automatically read it as a single column.
using Microsoft.VisualBasic.FileIO;
using (TextFieldParser reader = new TextFieldParser(csvpath))
{
reader.Delimiters = new string[] { "," };
reader.HasFieldsEnclosedInQuotes = true;
string[] col = reader.ReadFields();
}
String.Replace doesn't modify the existing string, it returns a new one. Because of that, you still have the same old row string outside the IsNullOrEmpty check.
Also, you say you are trying to escape commas and quotes, but in your code you are actually removing them.
If you want to remove commas and quotes, your code may look like
if (!string.IsNullOrEmpty(row))
{
row = row.Replace(",", "").Replace("\"", "");
}
If you want to escape quotes and commas, your code may look like
if (row != null && row.Contains(","))
{
row = "\"" + row.Replace("\"", "\"\"") + "\"";
}
There are 3 issues with your code that are worth pointing out.
1. Parsing a CSV can be tricky
Would your code handle a multiline string correctly? Would your code handle a " inside one of the columns (so an escaped ")?
I recommend using a CSV reading library (aka NuGet package).
2. There is no backslash
Here is a file.
1,"The string in the first row has a comma, and an f, in it"
2,The string in the 2nd row does not have a comma in it
Here is what Visual Studio shows (I'm using VS Code here).
Here is what Console.WriteLine prints.
1,"The string in the first row has a comma, and an f, in it"
2,The string in the 2nd row does not have a comma in it
3. Replacing commas
Even if you deal with the quotes, wouldn't replacing commas get rid of the field delimiter?

Reading CSV Files using Fast CsvReader without quotes around fields

I'm having some issues using Lumenworks Fast CsvReader. Using the code:
using (CsvReader csv = new CsvReader(new StreamReader(Server.MapPath(fileName)), true))
{
csv.ParseError += csv_ParseError;
while (csv.ReadNextRecord())
{
var importItem = new ProductImportItem(csv);
if (!ProductsDALC.SearchByPartProductCode(importItem.ProductCode).Any())
{
if (!SaveProduct(importItem))
{
this.ParseErrors.Add(string.Format("Failed to add product-{0}", importItem.ProductCode));
}
}
}
}
The code works fine when the CSV file is formatted using double quotes either side of the fields/column values e.g:
"product_code", "product_name", "item_description", "sku", "postage_level_required", "cost_price", "retail_price_inc_vat"
However, if the columns look like this:
product_code,product_name,item_description,sku,postage_level_required,cost_price,retail_price_inc_vat
Then the code behaves as if there is no data, that is to say, it won't enter into the while loop and enumerating the result set in the debugger will show that it yields no results.
This would be fine if I had absolute control over the data in/out. However, all I can do is provide the user with a template which contains the fields and hope that they wrap the data in quotes. This isn't an acceptable approach.
Is there a way to get the reader to parse data even if it isn't wrapped in quotes?
I'm aware of the TextFieldParser class built into .NET which handles this fine, but since we're using CsvReader elsewhere in the project it would be good to remain consistent.
You have to tell the constructor that the fields aren't quoted by passing the Unicode "null" character:
Char quotingCharacter = '\0'; // means none
Char escapeCharacter = '\0';
Char commentCharacter = '\0';
Char delimiter = ',';
bool hasHeader = true;
using (var csv = new CsvReader(reader, hasHeader, delimiter, quotingCharacter, escapeCharacter, commentCharacter, ValueTrimmingOptions.All))
{
// ...
}

Remove a specific column from a delimited file

I've been working with some big delimited text files (~1 GB) these days. They look something like this:
COlumn1 #COlumn2#COlumn3#COlumn4
COlumn1#COlumn2#COlumn3 #COlumn4
where # is the delimiter.
In case a column is invalid I might have to remove it from the whole text file. The output file when Column 3 is invalid should look like this.
COlumn1 #COlumn2#COlumn4
COlumn1#COlumn2#COlumn4
string line = "COlumn1# COlumn2 #COlumn3# COlumn4";
int junk =3;
int columncount = line.Split(new char[] { '#' }, StringSplitOptions.None).Count();
//remove the [junk-1]th '#' and the value till [junk]th '#'
//"COlumn1# COlumn2 # COlumn4"
I wasn't able to find a C# version of this on SO. Is there a way I can do that? Please help.
EDIT:
The solution which I found myself is like below which does the job. Is there a way I could modify this to a better way so that it narrows down the performance impact it might have in case of large text files?
int junk = 3;
string line = "COlumn1#COlumn2#COlumn3#COlumn4";
int counter = 0;
int colcount = line.Split(new char[] { '#' }, StringSplitOptions.None).Length - 1;
string[] linearray = line.Split(new char[] { '#' }, StringSplitOptions.None);
List<string> linelist = linearray.ToList();
linelist.RemoveAt(junk - 1);
string finalline = string.Empty;
foreach (string s in linelist)
{
counter++;
finalline += s;
if (counter < colcount)
finalline += "#";
}
Console.WriteLine(finalline);
EDITED
This method can be very memory-expensive, as you can read in this post. The suggestion there is:
If you need to run complex queries against the data in the file, the right thing to do is to load the data to database and let DBMS to take care of data retrieval and memory management.
To avoid memory consumption you should use a StreamReader to read the file line by line.
This could be a start for your task; it is missing your invalid-match logic:
using System.Collections.Generic;
using System.IO;
using System.Text;
namespace ConsoleApplication1
{
class Program
{
static void Main(string[] args)
{
const string fileName = "temp.txt";
var results = FindInvalidColumns(fileName);
using (var reader = File.OpenText(fileName))
{
while (!reader.EndOfStream)
{
var line = reader.ReadLine();
if (line == null) continue;
var split = line.Split(new[] { "#" }, 0);
// collect the columns to keep, then re-join them so the '#' delimiters survive
var kept = new List<string>();
for (var i = 0; i < split.Length; i++)
if (!results.Contains(i))
kept.Add(split[i]);
using (var fs = new FileStream("new.txt", FileMode.Append, FileAccess.Write))
using (var sw = new StreamWriter(fs))
{
sw.WriteLine(string.Join("#", kept));
}
}
}
}
private static List<int> FindInvalidColumns(string fileName)
{
var invalidColumnIndexes = new List<int>();
using (var reader = File.OpenText(fileName))
{
while (!reader.EndOfStream)
{
var line = reader.ReadLine();
if (line == null) continue;
var split = line.Split(new[] { "#" }, 0);
for (var i = 0; i < split.Length; i++)
{
if (IsInvalid(split[i]) && !invalidColumnIndexes.Contains(i))
invalidColumnIndexes.Add(i);
}
}
}
return invalidColumnIndexes;
}
private static bool IsInvalid(string s)
{
return false;
}
}
}
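For the per-line step itself, a more compact variant is possible. This is only a sketch under assumptions: ColumnFilter, DropColumns and RewriteFile are made-up names, the column indexes are zero-based, and the output file is opened once rather than once per line. It uses File.ReadLines so the input is streamed instead of loaded whole.

```csharp
using System.Collections.Generic;
using System.IO;

public static class ColumnFilter
{
    // Splits one line, drops the zero-based column indexes in dropIndexes,
    // and re-joins the remaining fields with the same delimiter.
    public static string DropColumns(string line, ISet<int> dropIndexes, char delimiter = '#')
    {
        string[] fields = line.Split(delimiter);
        var kept = new List<string>(fields.Length);
        for (int i = 0; i < fields.Length; i++)
        {
            if (!dropIndexes.Contains(i)) kept.Add(fields[i]);
        }
        return string.Join(delimiter.ToString(), kept);
    }

    // Streams the input file line by line and writes the filtered lines
    // to a separate output file, keeping a single writer open throughout.
    public static void RewriteFile(string inputPath, string outputPath, ISet<int> dropIndexes)
    {
        using (var writer = new StreamWriter(outputPath, false))
        {
            foreach (string line in File.ReadLines(inputPath))
            {
                writer.WriteLine(DropColumns(line, dropIndexes));
            }
        }
    }
}
```

Passing new HashSet<int> { 2 } as dropIndexes removes the third column, which matches the junk = 3 case in the question.
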
First, what you will do is re-write the line to a text file using a 0-length string for COlumn3. Therefore the line after being written correctly would look like this:
COlumn1#COlumn2##COlumn4
As you can see, there are two delimiters between COlumn2 and COlumn4. This is a cell with no data in it. (By "cell" I mean one column of a certain, single row.) Later, when some other process reads this using the Split function, it will still create a new value for Column 3, but in the array generated by Split, the 3rd position would be an empty string:
String[] columns = stream_reader.ReadLine().Split('#');
int lengthOfThirdItem = columns[2].Length; // for proof
// lengthOfThirdItem = 0
This reduces invalid values to empty strings and persists them back to the text file.
For more on String.Split see C# StreamReader save to Array with separator.
It is not possible to write to lines internal to a text file while it is also open for read. This article discusses it some (simultaneous read-write a file in C#), but it looks like that question-asker just wants to be able to write lines to the end. You want to be able to write lines at any point in the interior. I think this is not possible without buffering the data in some way.
The simplest way to buffer the data is to rename the file to a temp file first (using File.Move() // http://msdn.microsoft.com/en-us/library/system.io.file.move(v=vs.110).aspx). Then use the temp file as the data source: open the temp file to read in the data, which may have corrupt entries, and write the data afresh to the original file name, using the approach I describe above to represent empty columns. After this is complete, you can delete the temp file.
Important
Deleting the temp file may leave you vulnerable to power and data transients (or software 'transients'). (I.e., a power drop that interrupts part of the process could leave the data in an unusable state.) So you may also want to leave the temp file on the drive as an emergency backup in case of some problem.
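A rough sketch of that rename-then-rewrite flow, under a few assumptions: the names InPlaceRewrite, BlankColumn and BlankColumnInFile are invented for illustration, the backup is simply the original path with ".bak" appended, and the column index is zero-based. The backup is deliberately left on disk, as suggested above.

```csharp
using System.IO;

public static class InPlaceRewrite
{
    // Replaces one zero-based column with an empty string, keeping both delimiters.
    public static string BlankColumn(string line, int columnIndex, char delimiter = '#')
    {
        string[] fields = line.Split(delimiter);
        if (columnIndex >= 0 && columnIndex < fields.Length)
        {
            fields[columnIndex] = "";
        }
        return string.Join(delimiter.ToString(), fields);
    }

    // Renames the original file to a backup, then streams the backup
    // back into a fresh file at the original path.
    public static void BlankColumnInFile(string path, int columnIndex)
    {
        string backupPath = path + ".bak"; // assumed naming scheme
        File.Move(path, backupPath);        // throws if the backup already exists
        using (var writer = new StreamWriter(path, false))
        {
            foreach (string line in File.ReadLines(backupPath))
            {
                writer.WriteLine(BlankColumn(line, columnIndex));
            }
        }
        // The backup file is intentionally not deleted here.
    }
}
```
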

C# CSV file to array/list

I want to read 4-5 CSV files into an array in C#.
I know that this question has been asked and I have gone through the answers...
But my use of CSVs is much simpler than that...
I have CSV files with columns of the following data types....
string , string
These strings contain no ',' so no trouble there...
That's it. And they aren't very big. Only about 20 records in each.
I just want to read them into an array in C#....
Is there any very simple and direct way to do that?
To read the file, use
TextReader reader = File.OpenText(filename);
To read a line:
string line = reader.ReadLine();
then
string[] tokens = line.Split(',');
to separate them.
By using a loop around the two last example lines, you could add each array of tokens into a list, if that's what you need.
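The loop mentioned above could look roughly like this. It is a sketch for the simple case in the question only (no quoted fields, no commas inside values); SimpleCsv and ReadRows are illustrative names, and skipping empty lines is an assumption on my part.

```csharp
using System.Collections.Generic;
using System.IO;

public static class SimpleCsv
{
    // Reads every line from the reader and splits it on commas.
    // Only suitable when no field contains a comma or a quote.
    public static List<string[]> ReadRows(TextReader reader)
    {
        var rows = new List<string[]>();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (line.Length == 0) continue; // skip blank lines
            rows.Add(line.Split(','));
        }
        return rows;
    }
}
```

For a file, wrap the call in a using block, e.g. using (TextReader reader = File.OpenText(filename)) { var rows = SimpleCsv.ReadRows(reader); }, and call rows.ToArray() if you really need an array.
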
This one includes the quotes & commas in fields. (assumes you're doing a line at a time)
using Microsoft.VisualBasic.FileIO; //For TextFieldParser
// blah blah blah
StringReader csv_reader = new StringReader(csv_line);
TextFieldParser csv_parser = new TextFieldParser(csv_reader);
csv_parser.SetDelimiters(",");
csv_parser.HasFieldsEnclosedInQuotes = true;
string[] csv_array = csv_parser.ReadFields();
Here is a simple way to get a CSV content to an array of strings. The CSV file can have double quotes, carriage return line feeds and the delimiter is a comma.
Here are the namespaces that you need:
System.IO;
System.Collections.Generic;
System.Text;
System.IO is for FileStream and StreamReader class to access your file. Both classes implement the IDisposable interface, so you can use the using statements to close your streams. (example below)
System.Collections.Generic namespace is for generic collections, such as IList<T> and List<T>. In this example, we'll use the List class, because Lists are better than Arrays in my honest opinion. However, before I return our outbound variable, I'll call the .ToArray() member method to return the array. System.Text is only needed for StringBuilder.
There are many ways to get content from your file, I personally prefer to use a while(condition) loop to iterate over the contents. In the condition clause, use !lReader.EndOfStream. While not end of stream, continue iterating over the file.
public string[] GetCsvContent(string iFileName)
{
List<string> oCsvContent = new List<string>();
using (FileStream lFileStream =
new FileStream(iFileName, FileMode.Open, FileAccess.Read))
{
StringBuilder lFileContent = new StringBuilder();
using (StreamReader lReader = new StreamReader(lFileStream))
{
// flag if a double quote is found
bool lContainsDoubleQuotes = false;
// a string for the csv value
string lCsvValue = "";
// loop through the file until you read the end
while (!lReader.EndOfStream)
{
// stores each line in a variable
string lCsvLine = lReader.ReadLine();
// for each character in the line...
foreach (char lLetter in lCsvLine)
{
// check if the character is a double quote
if (lLetter == '"')
{
if (!lContainsDoubleQuotes)
{
lContainsDoubleQuotes = true;
}
else
{
lContainsDoubleQuotes = false;
}
}
// if we come across a comma
// AND it's not within a double quote..
if (lLetter == ',' && !lContainsDoubleQuotes)
{
// add our string to the array
oCsvContent.Add(lCsvValue);
// null out our string
lCsvValue = "";
}
else
{
// add the character to our string
lCsvValue += lLetter;
}
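}
// end of line: inside quotes the value continues on the next line,
// otherwise the last value of the line is complete
if (lContainsDoubleQuotes)
{
lCsvValue += "\r\n";
}
else
{
oCsvContent.Add(lCsvValue);
lCsvValue = "";
}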
}
}
}
return oCsvContent.ToArray();
}
Hope this helps! Very easy and very quick.
Cheers!

C# Regex Split - commas outside quotes

I got quite a lot of strings (segments of SQL code, actually) with the following format:
('ABCDEFG', 123542, 'XYZ 99,9')
and i need to split this string, using C#, in order to get:
'ABCDEFG'
123542
'XYZ 99,9'
I was originally using a simple Split(','), but since that comma inside the last parameter is causing havoc in the output i need to use Regex to get it. The problem is that i'm still quite noobish in regular expressions and i can't seem to crack the pattern mainly because inside that string both numerical and alpha-numerical parameters may exist at any time...
What could i use to split that string according to every comma outside the quotes?
Cheers
You could split on all commas that have an even number of quotes following them, using the following Regex to find them:
",(?=(?:[^']*'[^']*')*[^']*$)"
You'd use it like
var result = Regex.Split(samplestring, ",(?=(?:[^']*'[^']*')*[^']*$)");
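To get exactly the three values from the question, the surrounding parentheses and the spaces after each comma still have to be stripped. A small sketch of that, using the same pattern; SqlTupleSplitter is a hypothetical wrapper name, and it assumes every input is a single parenthesised tuple with balanced single quotes.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class SqlTupleSplitter
{
    // Matches a comma only when an even number of single quotes follows it,
    // i.e. when the comma is outside a quoted value.
    private static readonly Regex CommaOutsideQuotes =
        new Regex(",(?=(?:[^']*'[^']*')*[^']*$)");

    public static string[] Split(string tuple)
    {
        // Drop the outer parentheses before splitting.
        string inner = tuple.Trim().TrimStart('(').TrimEnd(')');
        return CommaOutsideQuotes.Split(inner)
                                 .Select(part => part.Trim())
                                 .ToArray();
    }
}
```

Note that the lookahead rescans the rest of the string for every comma, so on very long inputs a proper parser is likely to be faster.
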
//this regular expression splits string on the separator character NOT inside double quotes.
//separatorChar can be any character like comma or semicolon etc.
//it also allows single quotes inside the string value: e.g. "Mike's Kitchen","Jane's Room"
Regex regx = new Regex(separatorChar + "(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))");
string[] line = regx.Split(stringToSplit);
I had a problem where it wasn't capturing empty columns. I modified it as such to get empty string results
var results = Regex.Split(source, "[,]{1}(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))");
Although I too like a challenge some of the time, this actually isn't one.
Please read this article: http://secretgeek.net/csv_trouble.asp
and then go on and use http://www.filehelpers.com/
[Edit1, 3]:
or maybe this article can help too (the link only shows some VB.Net sample code but still, you can use it with C# too!): http://msdn.microsoft.com/en-us/library/cakac7e6.aspx
I've tried to do the sample for C# (add reference to Microsoft.VisualBasic to your project)
using System;
using System.IO;
using Microsoft.VisualBasic.FileIO;
namespace ConsoleApplication1
{
class Program
{
static void Main(string[] args)
{
TextReader reader = new StringReader("('ABCDEFG', 123542, 'XYZ 99,9')");
TextFieldParser fieldParser = new TextFieldParser(reader);
fieldParser.TextFieldType = Microsoft.VisualBasic.FileIO.FieldType.Delimited;
fieldParser.SetDelimiters(",");
String[] currentRow;
while (!fieldParser.EndOfData)
{
try
{
currentRow = fieldParser.ReadFields();
foreach(String currentField in currentRow)
{
Console.WriteLine(currentField);
}
}
catch (MalformedLineException e)
{
Console.WriteLine("Line {0} is not valid and will be skipped.", e.Message);
}
}
}
}
}
[Edit2]:
found another one which could be of help here: http://www.codeproject.com/KB/database/CsvReader.aspx
-- reinhard
Try (hacked from Jens') in the split method:
",(?:.*?'[^']*?')"
or just add question marks after Jens' *'s, that makes it lazy rather than greedy.
... or you could have installed NuGet package LumenWorks CsvReader and done something like below where I read a csv file which has content like for example
"hello","how","hello, how are you"
"hi","hello","greetings"
...
and process it like this
public static void ProcessCsv()
{
var filename = @"your_file_path\filename.csv";
DataTable dt = new DataTable("MyTable");
List<string> product_codes = new List<string>();
using (CsvReader csv = new CsvReader(new StreamReader(filename), true))
{
int fieldCount = csv.FieldCount;
string[] headers = csv.GetFieldHeaders();
for (int i = 0; i < headers.Length; i++)
{
dt.Columns.Add(headers[i], typeof(string));
}
while (csv.ReadNextRecord())
{
DataRow dr = dt.NewRow();
for (int i = 0; i < fieldCount; i++)
{
product_codes.Add(csv[i]);
dr[i] = csv[i];
}
dt.Rows.Add(dr);
}
}
}
The accepted answer does not work for me (can put in and test at Regexr-dot-com and see that does not work). So I had to read the lines into an array of lines. Use (C#) Regex.Matches to get an array of any strings found between escaped quotes (your in-field commas should be in fields wrapped in quotes), and replace commas with || before splitting each line into columns/fields. After splitting each line, I looped each line and column to replace || with commas.
private static IEnumerable<string[]> ReadCsv(string fileName, char delimiter = ';')
{
string[] lines = File.ReadAllLines(fileName, Encoding.ASCII);
// Before splitting on the delimiter for a field array, we have to replace commas within the fields
for(int l = 1; l < lines.Length; l++)
{
//(\\")(.*?)(\\")
MatchCollection regexGroup2 = Regex.Matches(lines[l], "(\\\")(.*?)(\\\")");
if (regexGroup2.Count > 0)
{
for (int g = 0; g < regexGroup2.Count; g++)
{
lines[l] = lines[l].Replace(regexGroup2[g].Value, regexGroup2[g].Value.Replace(",", "||"));
}
}
}
// Split
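// ToArray() materializes the rows; with a lazy Select the edits below would be lost on re-enumeration
string[][] lines_split = lines.Select(a => a.Split(delimiter)).ToArray();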
// Reapply commas
foreach(string[] row in lines_split)
{
for(int c = 0; c < row.Length; c++)
{
row[c] = row[c].Replace("||", ",");
}
}
return (lines_split);
}
