CSV appears to be corrupt on double quotes in headers - C#

I am trying to read a CSV file in C#.
I first tried File.ReadAllLines(path).Select(a => a.Split(';')), but that breaks when a cell contains embedded newlines (a multi-line value).
So I tried the LumenWorks CsvReader instead:
using LumenWorks.Framework.IO.Csv;

var csvTable = new DataTable();
using (TextReader fileReader = File.OpenText(path))
using (var csvReader = new CsvReader(fileReader, false))
{
    csvTable.Load(csvReader);
}

for (int i = 0; i < csvTable.Rows.Count; i++)
{
    if (!(csvTable.Rows[i][0] is DBNull))
    {
        var row1 = csvTable.Rows[i][0];
    }
    if (!(csvTable.Rows[i][1] is DBNull))
    {
        var row2 = csvTable.Rows[i][1];
    }
}
The issue is that the above code throws this exception:

The CSV appears to be corrupt near record '0' field '5' at position '63'

This is because the CSV's header row contains doubled double quotes, as below:
"Header1",""Header2""
Is there a way to ignore the extra double quotes and still process these CSVs?
Update:
I have also tried TextFieldParser, as below:
public static void GetCSVData()
{
    using (var parser = new TextFieldParser(path))
    {
        parser.HasFieldsEnclosedInQuotes = false;
        parser.Delimiters = new[] { "," };
        while (parser.PeekChars(1) != null)
        {
            string[] fields = parser.ReadFields();
            foreach (var field in fields)
            {
                Console.Write(field + " ");
            }
            Console.WriteLine(Environment.NewLine);
        }
    }
}
The output and the sample CSV data were shown as screenshots (not reproduced here).
Any help is appreciated.

Hope this works!
First, replace the doubled double quotes in the CSV:
using (FileStream fs = new FileStream(path, FileMode.Open, FileAccess.ReadWrite, FileShare.None))
{
    StreamReader sr = new StreamReader(fs);
    string contents = sr.ReadToEnd();
    // replace "" with "
    contents = contents.Replace("\"\"", "\"");
    // go back to the beginning of the stream
    fs.Seek(0, SeekOrigin.Begin);
    StreamWriter sw = new StreamWriter(fs);
    sw.Write(contents);
    sw.Flush();
    // truncate at the current byte position so none of the original,
    // longer contents survives past the rewritten text
    fs.SetLength(fs.Position);
    sw.Close();
}
Then load it with the same LumenWorks CsvReader:

using LumenWorks.Framework.IO.Csv;

var csvTable = new DataTable();
using (TextReader fileReader = File.OpenText(path))
using (var csvReader = new CsvReader(fileReader, false))
{
    csvTable.Load(csvReader);
}
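If the file is large, reading it all into memory just to rewrite it in place is wasteful. A streaming variant of the same fix (a sketch, not the original answer's code; the temp-file name is arbitrary) cleans the file line by line and then swaps it in:

string tempPath = path + ".tmp"; // hypothetical temp file next to the original
using (StreamReader sr = File.OpenText(path))
using (StreamWriter sw = File.CreateText(tempPath))
{
    string line;
    while ((line = sr.ReadLine()) != null)
    {
        // same replacement as above, one line at a time
        sw.WriteLine(line.Replace("\"\"", "\""));
    }
}
File.Delete(path);
File.Move(tempPath, path);

Note that, like the in-place version, this rewrites every doubled quote, including legitimate RFC 4180 escapes ("" inside a quoted field), so it only suits files where doubled quotes never appear inside the data.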
Thanks.

Related

StreamWriter only writes one line

I am trying to write from a .csv file to a new file.
Every time StreamWriter writes, it writes to the first line of the new file. It then overwrites that line with the next string, and continues to do so until StreamReader reaches EndOfStream.
Has anybody ever experienced this? How did you overcome it?
This is my first solution outside of those required by my school work. There is an unknown number of rows in the original file. Each row of the .csv file has only 17 columns; I need to write only three of them, in the order found in the code snippet below.
Before coding the StreamWriter I used Console.WriteLine() to make sure that each line was in the correct order.
Here is the code snippet:
{
    string path = @"c:\directory\file.csv";
    string newPath = @"c:\directory\newFile.csv";
    using (FileStream fs = new FileStream(path, FileMode.Open))
    {
        using (StreamReader sr = new StreamReader(fs))
        {
            string line;
            string[] columns;
            while ((line = sr.ReadLine()) != null)
            {
                columns = line.Split(',');
                using (FileStream aFStream = new FileStream(
                    newPath,
                    FileMode.OpenOrCreate,
                    FileAccess.ReadWrite))
                using (StreamWriter sw = new StreamWriter(aFStream))
                {
                    sw.WriteLine(columns[13] + ',' + columns[10] + ',' + columns[16]);
                    sw.Flush();
                    sw.WriteLine(sw.NewLine);
                }
            }
        }
    }
}
You should open the target in the same scope as the source, instead of inside the loop; opening it in the loop with FileMode.OpenOrCreate makes you overwrite the file from the start on every iteration.
var path = @"c:\directory\file.csv";
var newPath = @"c:\directory\newFile.csv";
using (var sr = new StreamReader(new FileStream(path, FileMode.Open)))
using (var sw = new StreamWriter(new FileStream(newPath, FileMode.OpenOrCreate, FileAccess.ReadWrite)))
{
    while (!sr.EndOfStream)
    {
        string line = sr.ReadLine();
        var columns = line.Split(',');
        sw.WriteLine(columns[13] + ',' + columns[10] + ',' + columns[16]);
        sw.WriteLine(sw.NewLine);
    }
    sw.Flush();
}
I also hope you are sure about your CSV layout, since you are hard-coding the column positions.
To fix your code properly, you'll want more structure:
public void CopyFileContentToLog()
{
    var document = ReadByLine();
    WriteToFile(document);
}

public IEnumerable<string> ReadByLine()
{
    string line;
    using (StreamReader reader = File.OpenText(...))
        while ((line = reader.ReadLine()) != null)
            yield return line;
}

public void WriteToFile(IEnumerable<string> contents)
{
    using (StreamWriter writer = new StreamWriter(...))
    {
        foreach (var line in contents)
            writer.WriteLine(line);
        writer.Flush();
    }
}
You could obviously tailor this and make it a bit more flexible, but it should demonstrate and resolve some of the issues with your loop and streams.
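For this specific three-column task, the whole job can also be done with a couple of streaming BCL calls (a sketch, assuming every row really has at least 17 comma-separated columns, no quoted commas, and the same path/newPath as in the question; requires System.Linq):

// File.ReadLines streams lazily, so the file is never held in memory at once
var selected = File.ReadLines(path)
    .Select(line => line.Split(','))
    .Select(c => c[13] + "," + c[10] + "," + c[16]);
File.WriteAllLines(newPath, selected);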
First off, you are creating and closing a write stream to the same file for every single line, so the file gets overwritten each time. You want to take your using block outside of the while loop; if you insist on opening and closing the write stream for every single line, then you need to use FileMode.Append instead:
{
    string path = @"c:\directory\file.csv";
    string newPath = @"c:\directory\newFile.csv";
    using (StreamReader sr = new StreamReader(new FileStream(path, FileMode.Open))) // no need for 2 usings
    using (FileStream aFStream = new FileStream(newPath, FileMode.OpenOrCreate, FileAccess.ReadWrite))
    using (StreamWriter sw = new StreamWriter(aFStream)) // one writer for the whole loop
    {
        string line;
        string[] columns;
        while ((line = sr.ReadLine()) != null)
        {
            columns = line.Split(',');
            sw.WriteLine(columns[13] + ',' + columns[10] + ',' + columns[16]);
            sw.Flush();
            sw.WriteLine(sw.NewLine);
        }
    }
}
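If you do insist on reopening the writer every line, the append route mentioned above looks like this (a sketch; the bool argument of this StreamWriter constructor selects append mode):

while ((line = sr.ReadLine()) != null)
{
    columns = line.Split(',');
    // true = append: each iteration adds to the end instead of overwriting
    using (StreamWriter sw = new StreamWriter(newPath, true))
    {
        sw.WriteLine(columns[13] + ',' + columns[10] + ',' + columns[16]);
    }
}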

C# ZipArchive losing data

I'm trying to copy the contents of one Excel file to another Excel file while replacing a string inside the copy. It's working for the most part, but the file is losing 27 KB of data. Any suggestions?
public void ReplaceString(string what, string with, string path) {
    List<string> doneContents = new List<string>();
    List<string> doneNames = new List<string>();
    using (ZipArchive archive = ZipFile.Open(_path, ZipArchiveMode.Read)) {
        int count = archive.Entries.Count;
        for (int i = 0; i < count; i++) {
            ZipArchiveEntry entry = archive.Entries[i];
            using (var entryStream = entry.Open())
            using (StreamReader reader = new StreamReader(entryStream)) {
                string txt = reader.ReadToEnd();
                if (txt.Contains(what)) {
                    txt = txt.Replace(what, with);
                }
                doneContents.Add(txt);
                string name = entry.FullName;
                doneNames.Add(name);
            }
        }
    }
    using (MemoryStream zipStream = new MemoryStream()) {
        using (ZipArchive newArchive = new ZipArchive(zipStream, ZipArchiveMode.Create, true, Encoding.UTF8)) {
            for (int i = 0; i < doneContents.Count; i++) {
                int spot = i;
                ZipArchiveEntry entry = newArchive.CreateEntry(doneNames[spot]);
                using (var entryStream = entry.Open())
                using (var sw = new StreamWriter(entryStream)) {
                    sw.Write(doneContents[spot]);
                }
            }
        }
        using (var fileStream = new FileStream(path, FileMode.Create)) {
            zipStream.Seek(0, SeekOrigin.Begin);
            zipStream.CopyTo(fileStream);
        }
    }
}
I've used Microsoft's DocumentFormat.OpenXml and Excel Interop; however, they both lack a few main components that I need.
Update:
using (var fileStream = new FileStream(path, FileMode.Create)) {
    var wrapper = new StreamWriter(fileStream);
    wrapper.AutoFlush = true;
    zipStream.Seek(0, SeekOrigin.Begin);
    zipStream.CopyTo(wrapper.BaseStream);
    wrapper.Flush();
    wrapper.Close();
}
Try the process without changing the string and see if the file size is the same. If so, your copy is working correctly; as Marc B suggested, with compression even a small change in content can produce a larger change in the overall file size.
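A minimal version of that test (a sketch; the file names are hypothetical and ReplaceString is the method above) replaces a string that cannot occur, then compares sizes:

// run the copy with a replacement that can never match
ReplaceString("string-that-does-not-occur", "string-that-does-not-occur", @"C:\temp\copy.xlsx");

long original = new FileInfo(@"C:\temp\original.xlsx").Length;
long copy = new FileInfo(@"C:\temp\copy.xlsx").Length;
Console.WriteLine("original: " + original + " bytes, copy: " + copy + " bytes");

If the sizes already differ here, the loss is happening in the read/rewrite round trip (for example, binary entries being decoded as text by StreamReader) rather than in the replacement itself.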

SqlBulkCopy and File Archiving

I have a process that loads data into a SQL table from a flat file and then needs to immediately move the file to an archive folder.
However, when running the code, it imports the data but throws an IOException:
{"The process cannot access the file because it is being used by another process."}
There appears to be some contention in the process. Where and how should I avoid this?
internal class Program
{
    private static void Main(string[] args)
    {
        string sourceFolder = @"c:\ImportFiles\";
        string destinationFolder = @"c:\ImportFiles\Archive\";
        foreach (string fileName in Directory.GetFiles(sourceFolder, "*.*"))
        {
            string sourceFileName = Path.GetFileName(fileName);
            string destinationFileName = Path.GetFileName(fileName) + ".arc";
            ProcessFile(fileName);
            string source = String.Concat(sourceFolder, sourceFileName);
            string destination = String.Concat(destinationFolder, destinationFileName);
            File.Move(source, destination);
        }
    }

    static void ProcessFile(string fileName)
    {
        Encoding enc = new UTF8Encoding(true, true);
        DataTable dt = LoadRecordsFromFile(fileName, enc, ',');
        SqlBulkCopy bulkCopy = new SqlBulkCopy("Server=(local);Database=test;Trusted_Connection=True;",
            SqlBulkCopyOptions.TableLock);
        bulkCopy.DestinationTableName = "dbo.tblManualDataLoad";
        bulkCopy.WriteToServer(dt);
        bulkCopy.Close();
    }

    public static DataTable LoadRecordsFromFile(string fileName, Encoding encoding, char delimiter)
    {
        DataTable table = null;
        if (fileName != null &&
            !fileName.Equals(string.Empty) &&
            File.Exists(fileName))
        {
            try
            {
                string tableName = "DataImport";
                FileStream fs = new FileStream(fileName, FileMode.Open, FileAccess.Read, FileShare.ReadWrite);
                List<string> rows = new List<string>();
                StreamReader reader = new StreamReader(fs, encoding);
                string record = reader.ReadLine();
                while (record != null)
                {
                    rows.Add(record);
                    record = reader.ReadLine();
                }
                List<string[]> rowObjects = new List<string[]>();
                int maxColsCount = 0;
                foreach (string s in rows)
                {
                    string[] convertedRow = s.Split(new char[] { delimiter });
                    if (convertedRow.Length > maxColsCount)
                        maxColsCount = convertedRow.Length;
                    rowObjects.Add(convertedRow);
                }
                table = new DataTable(tableName);
                for (int i = 0; i < maxColsCount; i++)
                {
                    table.Columns.Add(new DataColumn());
                }
                foreach (string[] rowArray in rowObjects)
                {
                    table.Rows.Add(rowArray);
                }
                // Remove header row from import file
                DataRow row = table.Rows[0];
                row.Delete();
                table.AcceptChanges();
            }
            catch
            {
                // TODO: SEND EMAIL ALERT ON ERROR
                throw new Exception("Error in ReadFromFile: IO error.");
            }
        }
        else
        {
            // TODO: SEND EMAIL ALERT ON ERROR
            throw new FileNotFoundException("Error in ReadFromFile: the file path could not be found.");
        }
        return table;
    }
}
Your program is likely holding the file open. You should wrap FileStream and StreamReader objects in using statements. This closes those objects when the using block finishes.
The part of your LoadRecordsFromFile function that reads the file should look something like:
...
string tableName = "DataImport";
List<string> rows = new List<string>();
using (FileStream fs = new FileStream(fileName, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
{
    using (StreamReader reader = new StreamReader(fs, encoding))
    {
        string record = reader.ReadLine();
        while (record != null)
        {
            rows.Add(record);
            record = reader.ReadLine();
        }
    }
}
...
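The same pattern is worth applying in ProcessFile: SqlBulkCopy also implements IDisposable, so it can be wrapped in a using block as well (a sketch, connection string as in the question):

static void ProcessFile(string fileName)
{
    Encoding enc = new UTF8Encoding(true, true);
    DataTable dt = LoadRecordsFromFile(fileName, enc, ',');
    using (SqlBulkCopy bulkCopy = new SqlBulkCopy(
        "Server=(local);Database=test;Trusted_Connection=True;",
        SqlBulkCopyOptions.TableLock))
    {
        bulkCopy.DestinationTableName = "dbo.tblManualDataLoad";
        bulkCopy.WriteToServer(dt);
    } // disposing also closes the underlying connection
}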

Need to speed up C# application that converts timestamps

I'm having a tough time getting a small application to work faster. I'm not a developer and it took me some time to get this working as is. Can anyone offer any suggestions or alternate code to speed this process up? It's taking about 1 hour to process 10m of the input file.
The code is listed below, and here is an example line of the input file:
4401,imei:0000000000,2012-09-01 12:12:12.9999
using System;
using System.Globalization;
using System.IO;

class Sample
{
    public static void Main(string[] args)
    {
        if (args.Length == 0)
        {
            return;
        }
        using (FileStream stream = File.Open(args[0], FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
        {
            using (StreamReader streamReader = new StreamReader(stream))
            {
                System.Text.StringBuilder builder = new System.Text.StringBuilder();
                while (!streamReader.EndOfStream)
                {
                    var line = streamReader.ReadLine();
                    var values = line.Split(',');
                    DateTime dt = new DateTime();
                    DateTime.TryParse(values[2], out dt);
                    values[2] = Convert.ToString(dt.Ticks);
                    string[] output = new string[values.Length];
                    bool firstColumn = true;
                    for (int index = 0; index < values.Length; index++)
                    {
                        if (!firstColumn)
                            builder.Append(',');
                        builder.Append(values[index]);
                        firstColumn = false;
                    }
                    File.WriteAllText(args[1], builder.AppendLine().ToString());
                }
            }
        }
    }
}
The biggest performance hit is that every time a line is read, the entire file (as processed so far) is written back to disk. For a quick win, move your StringBuilder out of the loop and write once at the end:
System.Text.StringBuilder builder = new System.Text.StringBuilder();
using (FileStream stream = File.Open(args[0], FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
{
    using (StreamReader streamReader = new StreamReader(stream))
    {
        while (!streamReader.EndOfStream)
        {
            var line = streamReader.ReadLine();
            var values = line.Split(',');
            DateTime dt = new DateTime();
            DateTime.TryParse(values[2], out dt);
            values[2] = Convert.ToString(dt.Ticks);
            string[] output = new string[values.Length];
            bool firstColumn = true;
            for (int index = 0; index < values.Length; index++)
            {
                if (!firstColumn)
                    builder.Append(',');
                builder.Append(values[index]);
                firstColumn = false;
            }
            builder.AppendLine();
        }
    }
}
File.WriteAllText(args[1], builder.ToString());
If you want to refactor further, replace the comma-separating loop with string.Join:
System.Text.StringBuilder builder = new System.Text.StringBuilder();
using (FileStream stream = File.Open(args[0], FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
{
    using (StreamReader streamReader = new StreamReader(stream))
    {
        while (!streamReader.EndOfStream)
        {
            var line = streamReader.ReadLine();
            var values = line.Split(',');
            DateTime dt = new DateTime();
            DateTime.TryParse(values[2], out dt);
            values[2] = Convert.ToString(dt.Ticks);
            builder.AppendLine(string.Join(",", values));
        }
    }
}
File.WriteAllText(args[1], builder.ToString());
Edit: To avoid the memory usage entirely, remove the StringBuilder and write to disk through a second FileStream. Your proposed solution (using a List) will still use a substantial amount of memory and will likely break on larger files:
using (FileStream input = File.Open(args[0], FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
using (FileStream output = File.Create(args[1]))
{
    using (StreamReader streamReader = new StreamReader(input))
    using (StreamWriter streamWriter = new StreamWriter(output))
    {
        while (!streamReader.EndOfStream)
        {
            var line = streamReader.ReadLine();
            var values = line.Split(',');
            DateTime dt = new DateTime();
            DateTime.TryParse(values[2], out dt);
            values[2] = Convert.ToString(dt.Ticks);
            streamWriter.WriteLine(string.Join(",", values));
        }
    }
}
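One caveat that applies to every variant here: DateTime.TryParse leaves dt at its default (DateTime.MinValue) when parsing fails, so a malformed timestamp silently becomes 0 ticks. Checking the return value makes failures visible (a sketch):

DateTime dt;
if (DateTime.TryParse(values[2], out dt))
{
    values[2] = Convert.ToString(dt.Ticks);
}
else
{
    // surface bad rows instead of silently writing 0
    Console.Error.WriteLine("Unparsable timestamp: " + values[2]);
}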
Here is what I found fixes this and handles the large files.
Thanks to @Muzz and @Vache for the assistance.
string line = "";
System.IO.StreamReader file = new System.IO.StreamReader("c:/test.txt");
List<string> convertedLines = new List<string>();
while ((line = file.ReadLine()) != null)
{
    string[] lineSplit = line.Split(',');
    DateTime dt = new DateTime();
    DateTime.TryParse(lineSplit[2], out dt);
    lineSplit[2] = Convert.ToString(dt.Ticks);

    string convertedline = lineSplit[0] + "," + lineSplit[1] + "," + lineSplit[2];
    convertedLines.Add(convertedline);
}
file.Close();
File.WriteAllLines("c:/newTest.txt", convertedLines);

C# Comparing two files and exporting matching lines based on delimiter

Here's the scenario.
I have a text file (alpha) with a single column of items.
My second file is a CSV (delta) with 4 columns.
I need to compare alpha against delta and create a new file (omega): wherever a value in alpha matches one in delta, export only the first two columns from delta into a new .txt file.
Example:
(Alpha)
BeginID
(delta):
BeginID,Muchmore,Info,Exists
(Omega):
BeginID,Muchmore
The document will probably have 10k lines in it. Thanks for the help!
Here's a rough cut way of doing the task you need:
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            string alphaFilePath = @"C:\Documents and Settings\Jason\My Documents\Visual Studio 2008\Projects\Compte Two Files\Compte Two Files\ExternalFiles\Alpha.txt";
            List<string> alphaFileContent = new List<string>();
            using (FileStream fs = new FileStream(alphaFilePath, FileMode.Open))
            using (StreamReader rdr = new StreamReader(fs))
            {
                while (!rdr.EndOfStream)
                {
                    alphaFileContent.Add(rdr.ReadLine());
                }
            }

            string betaFilePath = @"C:\Beta.csv";
            StringBuilder sb = new StringBuilder();
            using (FileStream fs = new FileStream(betaFilePath, FileMode.Open))
            using (StreamReader rdr = new StreamReader(fs))
            {
                while (!rdr.EndOfStream)
                {
                    string[] betaFileLine = rdr.ReadLine().Split(Convert.ToChar(","));
                    if (alphaFileContent.Contains(betaFileLine[0]))
                    {
                        sb.AppendLine(String.Format("{0}, {1}", betaFileLine[0], betaFileLine[1]));
                    }
                }
            }

            using (FileStream fs = new FileStream(@"C:\Omega.txt", FileMode.Create))
            using (StreamWriter writer = new StreamWriter(fs))
            {
                writer.Write(sb.ToString());
            }
            Console.WriteLine(sb.ToString());
        }
    }
}
Basically it reads a txt file and puts the contents in a list. Then it reads the CSV file line by line (splitting naively on commas, so quoted fields are not handled) and collects the matching values in a StringBuilder, which is then written out to the new txt file.
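One small performance note, not part of the original answer: List&lt;string&gt;.Contains is a linear scan, so the matching loop is O(alpha x delta). Swapping the list for a HashSet&lt;string&gt; makes each lookup O(1) (a sketch, same paths assumed):

// replaces the List<string> plus the manual read loop above
var alphaFileContent = new HashSet<string>(File.ReadLines(alphaFilePath));

// the rest of the code is unchanged; Contains is now a hash lookup
if (alphaFileContent.Contains(betaFileLine[0])) { /* ... */ }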
EDIT: If you wish to have the code run on a button click, put it in the button click handler (or a new routine and call that); the body is exactly the same code as in Main above:

public void ButtonClick(object sender, EventArgs e)
{
    // ... same code as in Main above ...
}
I'd probably load alpha into a collection, then open delta for reading: while not EOF, read a line into a string, split it, and if the collection contains column 0, write to omega.
Done...
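That outline maps almost line-for-line onto a few BCL calls (a sketch with hypothetical paths; a HashSet is used for the lookup, as noted above):

using System.Collections.Generic;
using System.IO;
using System.Linq;

var alpha = new HashSet<string>(File.ReadLines(@"C:\Alpha.txt"));
var omega = File.ReadLines(@"C:\Delta.csv")
    .Select(line => line.Split(','))           // naive split, no quoted fields
    .Where(cols => alpha.Contains(cols[0]))    // "collection contains column 0"
    .Select(cols => cols[0] + "," + cols[1]);  // first two columns only
File.WriteAllLines(@"C:\Omega.txt", omega);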
