Enforce LF line endings with CsvHelper - c#

If I have some LF-converted (using N++) CSV files, every time I write data to them using JoshClose's CsvHelper the line endings are back to CRLF.
Since I'm having problems with CRLF row terminators in SQL Server, I wish to keep the line endings as they were in the initial state of the file.
I couldn't find this in the culture settings, so I'm compiling my own version of the library.
How should I proceed?

Missing or incorrect Newline characters when using CsvHelper is a common problem with a simple but poorly documented solution. The other answers to this SO question are correct but are missing one important detail.
Configuration allows you to choose from one of four available alternatives:
// Pick one of these alternatives
CsvWriter.Configuration.NewLine = NewLine.CR;
CsvWriter.Configuration.NewLine = NewLine.LF;
CsvWriter.Configuration.NewLine = NewLine.CRLF;
CsvWriter.Configuration.NewLine = NewLine.Environment;
However, many people are tripped up by the fact that (by design) CsvWriter does not emit any newline character when you write the header using CsvWriter.WriteHeader() or when you write a single record using CsvWriter.WriteRecord(). The reason is so that you can write additional header elements or additional record elements, as you might do when your header and row data come from two or more classes rather than from a single class.
CsvWriter does emit the configured type of newline when you call CsvWriter.NextRecord(), and the author, JoshClose, states that you are supposed to call NextRecord() after you are done with the header and after you are done with each individual row added using WriteRecord(). See GitHub issue 929.
When you are writing multiple records using WriteRecords(), CsvWriter automatically emits the configured type of newline at the end of each record.
In my opinion this ought to be much better documented, but there it is.
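For illustration, here is a minimal sketch of that pattern, assuming a CsvHelper version (13.x-19.x style, matching the example further down) where NewLine is set via csv.Configuration; the Foo class and the foos collection are placeholders, not from the question:
using (var writer = new StreamWriter("out.csv"))
using (var csv = new CsvWriter(writer, CultureInfo.InvariantCulture))
{
    // Sketch only: Foo and foos are placeholder names.
    csv.Configuration.NewLine = NewLine.LF;

    csv.WriteHeader<Foo>();
    csv.NextRecord();      // the header line is not terminated until NextRecord() is called

    foreach (var foo in foos)
    {
        csv.WriteRecord(foo);
        csv.NextRecord();  // likewise, each record written with WriteRecord() needs its own NextRecord()
    }
}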

From what I can tell, the line terminator isn't controlled by CsvHelper. I've gotten it to work by adjusting the file writer I pass to CsvWriter.
TextWriter tw = File.CreateText(filepathname);
tw.NewLine = "\n";  // the underlying TextWriter supplies the line terminator
CsvWriter csvw = new CsvWriter(tw);
csvw.WriteRecords(records);
csvw.Dispose();

Might be useful for somebody:
public static void AppendToCsv(ShopDataModel shopRecord)
{
    using (var writer = new StreamWriter(DestinationFile, true))
    {
        using (var csv = new CsvWriter(writer))
        {
            csv.WriteRecord(shopRecord);
            writer.Write("\n");
        }
    }
}

As of CsvHelper 13.0.0, line-endings are now configurable via the NewLine configuration property.
E.g.:
using CsvHelper;
using CsvHelper.Configuration;
using System.Collections.Generic;
using System.Globalization;
using System.IO;

void Main()
{
    using (var writer = new StreamWriter(@"my-file.csv"))
    {
        using (var csv = new CsvWriter(writer, CultureInfo.InvariantCulture))
        {
            csv.Configuration.HasHeaderRecord = false;
            csv.Configuration.NewLine = NewLine.LF; // <<####################

            var records = new List<Foo>
            {
                new Foo { Id = 1, Name = "one" },
                new Foo { Id = 2, Name = "two" },
            };

            csv.WriteRecords(records);
        }
    }
}

private class Foo
{
    public int Id { get; set; }
    public string Name { get; set; }
}

Related

How to detect if a row has extra columns (more than the header)

While reading a CSV file, how can I configure CsvHelper to enforce that each row has no extra columns that are not found in the header? I cannot find any obvious property under CsvConfiguration nor under CsvHelper.Configuration.Attributes.
Context: In our CSV file format, the last column is a string description, which our users (using plain-text editors) sometimes forget to quote when the description contains commas. Such "raw" commas cause that row to have extra columns, and the intended description read into the software omits the description after the first raw comma. I want to detect this and throw an exception that suggests to the user they may have forgotten to quote the description cell.
It looks like CsvConfiguration.DetectColumnCountChanges might be related, but presently the 29.0.0 library lacks any Intellisense description of CsvConfiguration properties, so I have no idea how to use this.
Similar information for other CSV libraries:
With LINQtoCSV this was done by setting IgnoreUnknownColumns = false in CsvFileDescription.
Can Lumenworks CSV parser error when there are too many columns in a row?
You were on the right track with CsvConfiguration.DetectColumnCountChanges.
void Main()
{
    var config = new CsvConfiguration(CultureInfo.InvariantCulture)
    {
        DetectColumnCountChanges = true
    };

    using (var reader = new StringReader("Id,Name\n1,MyName\n2,YourName,ExtraColumn"))
    using (var csv = new CsvReader(reader, config))
    {
        try
        {
            var records = csv.GetRecords<Foo>().ToList();
        }
        catch (BadDataException ex)
        {
            if (ex.Message.StartsWith("An inconsistent number of columns has been detected."))
            {
                Console.WriteLine("There is an issue with an inconsistent number of columns on row {0}", ex.Context.Parser.RawRow);
                Console.WriteLine("Row data: \"{0}\"", ex.Context.Parser.RawRecord);
                Console.WriteLine("Please check for commas in a field that were not properly quoted.");
            }
        }
    }
}

public class Foo
{
    public int Id { get; set; }
    public string Name { get; set; }
}

C# Linq doesn't recognize Czech characters while reading from .csv file

Basically, when trying to get a .csv file into a list using LINQ, all characters with diacritics turn into the <?> character. What should I do to make the code keep them as they are in the .csv file?
using (StreamReader ctec = new StreamReader(souborovejmeno))
{
    var lines = File.ReadAllLines(souborovejmeno).Select(a => a.Split('\t'));
    var csv = from line in lines
              select (from piece in line
                      select piece).ToList();

    foreach (var c in csv)
    {
        hraci.Add(new Hrac(c[0], c[1]));
        listBox1.Items.Add(c[0]);
    }
}
Thanks in advance for answers. Sorry if this is quite dumb; I am not too experienced in coding.
I think your problem is a missing encoding. I see you already have an answer above that works.
var lines = File.ReadAllLines(path, Encoding.UTF8).Select(a => a.Split('\t'));
But I strongly recommend using CsvHelper:
dotnet add package CsvHelper
And use something like this
public class Record
{
    [Index(0)]
    public int Key { get; set; }

    [Index(1)]
    public string Value { get; set; }
}

....

using (var reader = new StreamReader(souborovejmeno, Encoding.UTF8))
using (var csv = new CsvReader(reader, CultureInfo.InvariantCulture))
{
    var records = csv.GetRecords<Record>();
    foreach (var record in records)
    {
        hraci.Add(new Hrac(record.Key, record.Value));
        listBox1.Items.Add(record.Key);
    }
}
...
Try including the encoding, like this:
var lines = File.ReadAllLines(path, Encoding.UTF8).Select(a => a.Split('\t'));
Make sure to import System.Text

Basic Read CSV File Questions

Thanks in advance, C# newb here having a few issues.
I have this CSV file provided daily; it's large and has no header. I only need certain items out of this file.
Here is the code I have so far.
var config = new CsvConfiguration(CultureInfo.InvariantCulture)
{
    HasHeaderRecord = false,
};

using (var reader = new StreamReader(iFile.FileName))
using (var csv = new CsvReader(reader, config))
{
    var records = new List<BQFile>();
    csv.Read();
    csv.ReadHeader();
    while (csv.Read())
    {
        var record = new BQFile()
        {
            SNumber = csv.GetField<string>("SNumber"),
            FOBPoint = csv.GetField<string>("FOBPoint")
        };
    }
}
What I am not understanding, since this CSV file has 150+ fields, is how to grab the correct data. For example, SNumber is column 46 and FOBPoint is column 123. I am finding the CsvHelper documentation a little limited for me.
Any help is appreciated.
"What I am not understanding, since this CSV file has 150+ fields, is how to grab the correct data"
By index, because there is no header.
In your BQFile, decorate the properties with an attribute of [Index(NNN)], where NNN is the column number (0-based). The IndexAttribute is found in the CsvHelper.Configuration.Attributes namespace - I mention this because Entity Framework also has an Index attribute; be sure you use the correct one.
public class BQFile
{
    [Index(46)]
    public string SNumber { get; set; }
    ...
}
Then do:
var config = new CsvConfiguration(CultureInfo.InvariantCulture)
{
    HasHeaderRecord = false,
};

using (var reader = new StreamReader(iFile.FileName))
using (var csv = new CsvReader(reader, config))
{
    var records = csv.GetRecords<BQFile>();
    ...
records is an enumeration on top of the file stream (via CsvHelper, which reads records as it goes and creates instances of BQFile). You can only enumerate it once, and after you're done enumerating it the file stream will be at the end - if you wanted to re-read the file you'd have to Seek the stream or create a new reader. Also, the file is only read (in chunks, progressively) as you enumerate. If you return records somewhere, dropping out of the using and thus disposing the reader, you'll get an error when you try to start reading from records (because it's disposed).
To work with records, you either foreach it, processing the objects you get as you go:
foreach (BQFile bqf in records)
{
    // do stuff with each BQFile here
}
Or, if you want to load it all into memory, you can do something like ToList() it, so you end up with a bunch of BQFiles in a List, which you can then access randomly, read over and over, etc.:
var bqfs = records.ToList();
P.S. I don't know, when you said "it's column 46", whether that's counting from 1 or 0; you might have to adjust your 46 accordingly.

Bulk data insertion in SQL Server table from delimited text file using c#

I have a tab-delimited text file. The file is around 100 MB. I want to store the data from this file in a SQL Server table. The file contains 1 million records when stored in SQL Server. What is the best way to achieve this?
I can create an in-memory DataTable in C# and then upload it to SQL Server, but in this case it will load the entire 100 MB file into memory. What if the file size gets bigger?
No problem; CsvReader will handle most delimited text formats, and implements IDataReader, so it can be used to feed a SqlBulkCopy. For example:
using (var file = new StreamReader(path))
using (var csv = new CsvReader(file, true)) // true = first row is headers
using (var bcp = new SqlBulkCopy(connectionString))
{
    bcp.DestinationTableName = "Foo";
    bcp.WriteToServer(csv);
}
Note that CsvReader has lots of options for more subtle file handling (specifying the delimiter rules, etc.). SqlBulkCopy is the high-performance bulk-load API - very efficient. This is a streaming reader/writer API; it does not load all the data into memory at once.
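Since the question is about a tab-delimited file, here is a hedged variant of the sketch above, assuming the CsvReader constructor overload that takes a delimiter character (which the LumenWorks reader provides); path, connectionString, and the table name are the same placeholders as before:
using (var file = new StreamReader(path))
using (var csv = new CsvReader(file, true, '\t')) // true = first row is headers, '\t' = tab delimiter
using (var bcp = new SqlBulkCopy(connectionString))
{
    bcp.DestinationTableName = "Foo";
    bcp.WriteToServer(csv);
}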
You should read the file line by line, so you don't have to load the whole file into memory:
using (var file = System.IO.File.OpenText(filename))
{
    while (!file.EndOfStream)
    {
        string line = file.ReadLine();
        // TODO: Do your INSERT here
    }
}
Update:
"This will make 1 million separate insert commands to SQL Server. Is there any way to do it in bulk?"
You could use parameterised queries, which would still issue 1M inserts, but would still be quite fast.
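A rough sketch of that approach with System.Data.SqlClient (the table, column, and file names are placeholders, and the tab-split into two fields is an assumption about your layout); it reuses one parameterised command and only swaps the values per row:
using (var conn = new SqlConnection("MyConnectionString"))
using (var cmd = new SqlCommand("INSERT INTO MyTableName (A, B) VALUES (@a, @b)", conn))
{
    // Placeholder sizes/types; match these to your actual table definition.
    cmd.Parameters.Add("@a", SqlDbType.NVarChar, 100);
    cmd.Parameters.Add("@b", SqlDbType.NVarChar, 100);
    conn.Open();

    using (var file = System.IO.File.OpenText("MyTextFile"))
    {
        while (!file.EndOfStream)
        {
            var splitLine = file.ReadLine().Split('\t');
            cmd.Parameters["@a"].Value = splitLine[0];
            cmd.Parameters["@b"].Value = splitLine[1];
            cmd.ExecuteNonQuery();
        }
    }
}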
Alternatively, you can use SqlBulkCopy, but that's going to be rather difficult if you don't want to use third-party libraries. If you are more amenable to the MS license, you could use the LINQ Entity Data Reader (distributed under the Ms-PL license), which provides the AsDataReader extension method:
void MyInsertMethod()
{
    using (var bulk = new SqlBulkCopy("MyConnectionString"))
    {
        bulk.DestinationTableName = "MyTableName";
        bulk.WriteToServer(GetRows().AsDataReader());
    }
}

class MyType
{
    public string A { get; set; }
    public string B { get; set; }
}

IEnumerable<MyType> GetRows()
{
    using (var file = System.IO.File.OpenText("MyTextFile"))
    {
        while (!file.EndOfStream)
        {
            var splitLine = file.ReadLine().Split(',');
            yield return new MyType() { A = splitLine[0], B = splitLine[1] };
        }
    }
}
If you didn't want to use the MS-licensed code either, you could implement IDataReader yourself, but that is going to be a PITA. Note that the CSV handling above (Split(',')) is not at all robust, and also that column names in the table must be the same as property names on MyType. TBH, I'd recommend you go with Marc's answer on this one.

Reading a CSV file in .NET?

How do I read a CSV file using C#?
One choice, without using third-party components, is to use the Microsoft.VisualBasic.FileIO.TextFieldParser class (http://msdn.microsoft.com/en-us/library/microsoft.visualbasic.fileio.textfieldparser.aspx). It provides all the functions for parsing CSV; it is sufficient to reference the Microsoft.VisualBasic assembly.
var parser = new Microsoft.VisualBasic.FileIO.TextFieldParser(file);
parser.TextFieldType = Microsoft.VisualBasic.FileIO.FieldType.Delimited;
parser.SetDelimiters(new string[] { ";" });

while (!parser.EndOfData)
{
    string[] row = parser.ReadFields();
    /* do something */
}
You can use the Microsoft.VisualBasic.FileIO.TextFieldParser class in C#:
using System;
using System.Data;
using Microsoft.VisualBasic.FileIO;
static void Main()
{
    string csv_file_path = @"C:\Users\Administrator\Desktop\test.csv";

    DataTable csvData = GetDataTableFromCSVFile(csv_file_path);

    Console.WriteLine("Rows count: " + csvData.Rows.Count);
    Console.ReadLine();
}

private static DataTable GetDataTableFromCSVFile(string csv_file_path)
{
    DataTable csvData = new DataTable();
    try
    {
        using (TextFieldParser csvReader = new TextFieldParser(csv_file_path))
        {
            csvReader.SetDelimiters(new string[] { "," });
            csvReader.HasFieldsEnclosedInQuotes = true;

            string[] colFields = csvReader.ReadFields();
            foreach (string column in colFields)
            {
                DataColumn datacolumn = new DataColumn(column);
                datacolumn.AllowDBNull = true;
                csvData.Columns.Add(datacolumn);
            }

            while (!csvReader.EndOfData)
            {
                string[] fieldData = csvReader.ReadFields();
                // Treat empty values as null
                for (int i = 0; i < fieldData.Length; i++)
                {
                    if (fieldData[i] == "")
                    {
                        fieldData[i] = null;
                    }
                }
                csvData.Rows.Add(fieldData);
            }
        }
    }
    catch (Exception ex)
    {
        // Swallowing exceptions here hides parse errors; consider logging or rethrowing.
    }
    return csvData;
}
You could try CsvHelper, which is a project I work on. Its goal is to make reading and writing CSV files as easy as possible, while being very fast.
Here are a few ways you can read from a CSV file.
// By type
var records = csv.GetRecords<MyClass>();
var records = csv.GetRecords(typeof(MyClass));

// Dynamic
var records = csv.GetRecords<dynamic>();

// Using anonymous type for the class definition
var anonymousTypeDefinition = new
{
    Id = default(int),
    Name = string.Empty,
    MyClass = new MyClass()
};
var records = csv.GetRecords(anonymousTypeDefinition);
I usually use a simplistic approach like this one:
var path = Server.MapPath("~/App_Data/Data.csv");
var csvRows = System.IO.File.ReadAllLines(path, Encoding.Default).ToList();

foreach (var row in csvRows.Skip(1))
{
    var columns = row.Split(';');
    var field1 = columns[0];
    var field2 = columns[1];
    var field3 = columns[2];
}
I just used this library in my application. http://www.codeproject.com/KB/database/CsvReader.aspx. Everything went smoothly using this library, so I'm recommending it. It is free under the MIT License, so just include the notice with your source files.
I didn't display the CSV in a browser, but the author has some samples for Repeaters or DataGrids. I did run one of his test projects to test a Sort operation I have added and it looked pretty good.
You can try Cinchoo ETL - an open source lib for reading and writing CSV files.
Here are a couple of ways you can read CSV files. Given a sample file emp.csv:
Id, Name
1, Tom
2, Mark
This is how you can use this library to read it
using (var reader = new ChoCSVReader("emp.csv").WithFirstLineHeader())
{
foreach (dynamic item in reader)
{
Console.WriteLine(item.Id);
Console.WriteLine(item.Name);
}
}
If you have a POCO class defined to match up with the CSV file, like below:
public class Employee
{
    public int Id { get; set; }
    public string Name { get; set; }
}
You can parse the same file using this POCO class as below
using (var reader = new ChoCSVReader<Employee>("emp.csv").WithFirstLineHeader())
{
foreach (var item in reader)
{
Console.WriteLine(item.Id);
Console.WriteLine(item.Name);
}
}
Please check out articles at CodeProject on how to use it.
Disclaimer: I'm the author of this library
I recommend Angara.Table; for save/load, see http://predictionmachines.github.io/Angara.Table/saveload.html.
It does column type inference, can save CSV files, and is much faster than TextFieldParser. It follows RFC 4180 for the CSV format and supports multiline strings, NaNs, and escaped strings containing the delimiter character.
The library is under the MIT license. The source code is at https://github.com/Microsoft/Angara.Table.
Though its API is focused on F#, it can be used in any .NET language, though not as succinctly as in F#.
Example:
using Angara.Data;
using System.Collections.Immutable;

...

var table = Table.Load("data.csv");

// Print schema:
foreach (Column c in table)
{
    string colType;
    if (c.Rows.IsRealColumn) colType = "double";
    else if (c.Rows.IsStringColumn) colType = "string";
    else if (c.Rows.IsDateColumn) colType = "date";
    else if (c.Rows.IsIntColumn) colType = "int";
    else colType = "bool";
    Console.WriteLine("{0} of type {1}", c.Name, colType);
}

// Get column data:
ImmutableArray<double> a = table["a"].Rows.AsReal;
ImmutableArray<string> b = table["b"].Rows.AsString;

Table.Save(table, "data2.csv");
You might be interested in the Linq2Csv library at CodeProject. One thing you would need to check is whether it reads the data lazily, only when needed, so you won't need a lot of memory when working with bigger files.
As for displaying the data in the browser, you could do many things to accomplish it. If you were more specific about your requirements, the answer could be more specific, but here are things you could do:
1. Use the HttpListener class to write a simple web server (you can find many samples on the net for hosting a mini HTTP server); see the sketch below.
2. Use ASP.NET or ASP.NET MVC, create a page, and host it using IIS.
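For option 1, here is a minimal HttpListener sketch; the prefix URL is a placeholder, rows is assumed to be your already-parsed CSV (e.g. a List<string[]>), and error handling is omitted:
var listener = new HttpListener();
listener.Prefixes.Add("http://localhost:8080/");   // placeholder prefix
listener.Start();

var context = listener.GetContext();               // blocks until a single request arrives

var html = new StringBuilder("<table>");
foreach (var row in rows)                          // rows: your parsed CSV, e.g. List<string[]>
{
    html.Append("<tr>");
    foreach (var cell in row)
        html.Append("<td>").Append(WebUtility.HtmlEncode(cell)).Append("</td>");
    html.Append("</tr>");
}
html.Append("</table>");

var bytes = Encoding.UTF8.GetBytes(html.ToString());
context.Response.ContentType = "text/html";
context.Response.OutputStream.Write(bytes, 0, bytes.Length);
context.Response.Close();
listener.Stop();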
Seems like there are quite a few projects on CodeProject or CodePlex for CSV Parsing.
Here is another CSV Parser on CodePlex
http://commonlibrarynet.codeplex.com/
This library has components for CSV parsing, INI file parsing, and command-line parsing as well. It's working well for me so far. The only thing is it doesn't have a CSV writer.
This is just for parsing the CSV. For displaying it in a web page, it is simply a matter of taking the list and rendering it however you want.
Note: This code example does not handle the situation where the input string line contains newlines.
public List<string> SplitCSV(string line)
{
    if (string.IsNullOrEmpty(line))
        throw new ArgumentException();

    List<string> result = new List<string>();
    int index = 0;
    int start = 0;
    bool inQuote = false;

    // parse line
    foreach (char c in line)
    {
        switch (c)
        {
            case '"':
                inQuote = !inQuote;
                break;
            case ',':
                if (!inQuote)
                {
                    result.Add(line.Substring(start, index - start)
                                   .Replace("\"", ""));
                    start = index + 1;
                }
                break;
        }
        index++;
    }

    // add the final field (which may be empty if the line ends with a comma)
    result.Add(line.Substring(start, index - start).Replace("\"", ""));

    return result;
}
I have been maintaining an open source project called FlatFiles for several years now. It's available for .NET Core and .NET 4.5.1.
Unlike most of the alternatives, it allows you to define a schema (similar to the way EF code-first works) with an extreme level of precision, so you aren't fighting conversion issues all the time. You can map directly to your data classes, and there is also support for interfacing with older ADO.NET classes.
Performance-wise, it's been tuned to be one of the fastest parsers for .NET, with a plethora of options for quirky format differences. There's also support for fixed-length files, if you need it.
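By way of illustration only, here is a rough sketch of the kind of type-mapper usage FlatFiles supports; the exact API names (SeparatedValueTypeMapper, SeparatedValueOptions, etc.) are recalled from the project's README and should be verified there, and Person, the column names, and the file name are placeholders:
// Rough sketch - verify names against the FlatFiles README; Person and the
// column names are placeholders.
var mapper = SeparatedValueTypeMapper.Define(() => new Person());
mapper.Property(p => p.Id).ColumnName("id");
mapper.Property(p => p.Name).ColumnName("name");

var options = new SeparatedValueOptions { IsFirstRecordSchema = true };
using (var reader = new StreamReader("people.csv"))
{
    foreach (var person in mapper.Read(reader, options))
    {
        Console.WriteLine($"{person.Id}: {person.Name}");
    }
}

public class Person
{
    public int Id { get; set; }
    public string Name { get; set; }
}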
You can use this library: Sky.Data.Csv
https://www.nuget.org/packages/Sky.Data.Csv/
It is a really fast CSV reader library and it's really easy to use:
using Sky.Data.Csv;

var readerSettings = new CsvReaderSettings { Encoding = Encoding.UTF8 };
using (var reader = CsvReader.Create("path-to-file", readerSettings))
{
    foreach (var row in reader)
    {
        // do something with the data
    }
}
It also supports reading typed objects with the CsvReader<T> class, which has the same interface.
