Basic Read CSV File Questions - C#

Thanks in advance, C# newb here having a few issues.
I have this CSV file provided daily; it's large and has no header. I only need certain items out of this file.
Here is the code I have so far.
var config = new CsvConfiguration(CultureInfo.InvariantCulture)
{
    HasHeaderRecord = false,
};

using (var reader = new StreamReader(iFile.FileName))
using (var csv = new CsvReader(reader, config))
{
    var records = new List<BQFile>();
    csv.Read();
    csv.ReadHeader();
    while (csv.Read())
    {
        var record = new BQFile()
        {
            SNumber = csv.GetField<string>("SNumber"),
            FOBPoint = csv.GetField<string>("FOBPoint")
        };
    }
}
What I am not understanding, since this CSV file has 150+ fields, is how to grab the correct data. For example, SNumber is column 46 and FOBPoint is column 123. I am finding the CsvHelper documentation a little limited on this.
Any help is appreciated.

What I am not understanding, since this CSV file has 150+ fields, is how to grab the correct data
By index, because there is no header
In your BQFile, decorate the properties with an attribute of [Index(NNN)], where NNN is the column number (0-based). The IndexAttribute is found in the CsvHelper.Configuration.Attributes namespace - I mention this because Entity Framework also has an Index attribute; be sure you use the correct one.
public class BQFile
{
    [Index(46)]
    public string SNumber { get; set; }
    ...
}
Then do:
var config = new CsvConfiguration(CultureInfo.InvariantCulture)
{
    HasHeaderRecord = false,
};

using (var reader = new StreamReader(iFile.FileName))
using (var csv = new CsvReader(reader, config))
{
    var records = csv.GetRecords<BQFile>();
    ...
records is an enumeration on top of the file stream (via CsvHelper, which reads records as it goes and creates instances of BQFile). You can only enumerate it once, and once you're done enumerating it the file stream will be at the end - if you wanted to re-read the file you'd have to Seek the stream or renew the reader. Also, the file is only read (in chunks, progressively) as you enumerate. If you return records somewhere, such that you drop out of the using and thus dispose the reader, you'll get an error when you try to start reading from records (because it's disposed).
To work with records, you either foreach it, processing the objects you get as you go:
foreach (BQFile bqf in records)
{
    //do stuff with each BQFile here
}
Or if you want to load it all into memory, you can do something like ToList() it so you end up with a bunch of BQFile in a List, and then you can e.g. access them randomly, read them over and over etc..
var bqfs = records.ToList();
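If that materialised-list approach is what you want, here is a minimal sketch (the method name and path parameter are made up for illustration) that ties the pieces together and avoids the disposal problem described above:

// Minimal sketch: materialise inside the using blocks so the returned list
// does not depend on the (by then disposed) StreamReader.
public static List<BQFile> LoadBQFiles(string path)
{
    var config = new CsvConfiguration(CultureInfo.InvariantCulture)
    {
        HasHeaderRecord = false,
    };

    using (var reader = new StreamReader(path))
    using (var csv = new CsvReader(reader, config))
    {
        // ToList() forces CsvHelper to read the whole file now; returning
        // csv.GetRecords<BQFile>() directly would fail once the reader is disposed.
        return csv.GetRecords<BQFile>().ToList();
    }
}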
PS: I don't know, when you said "it's column 46", whether that's counting from 1 or 0 - you might have to adjust your 46.


Enforce LF line endings with CsvHelper

If I have some LF-converted (using N++) CSV files, every time I write data to them using JoshClose's CsvHelper the line endings are back to CRLF.
Since I'm having problems with CRLF row terminators in SQL Server, I wish to keep the line endings as they were in the initial state of the file.
I couldn't find it in the culture settings. Do I have to compile my own version of the library?
How to proceed?
Missing or incorrect Newline characters when using CsvHelper is a common problem with a simple but poorly documented solution. The other answers to this SO question are correct but are missing one important detail.
Configuration allows you to choose from one of four available alternatives:
// Pick one of these alternatives
CsvWriter.Configuration.NewLine = NewLine.CR;
CsvWriter.Configuration.NewLine = NewLine.LF;
CsvWriter.Configuration.NewLine = NewLine.CRLF;
CsvWriter.Configuration.NewLine = NewLine.Environment;
However, many people are tripped up by the fact that (by design) CsvWriter does not emit any newline character when you write the header using CsvWriter.WriteHeader() nor when you write a single record using CsvWriter.WriteRecord(). The reason is so that you can write additional header elements or additional record elements, as you might do when your header and row data comes from two or more classes rather than from a single class.
CsvWriter does emit the defined type of newline when you call CsvWriter.NextRecord(), and the author, JoshClose, states that you are supposed to call NextRecord() after you are done with the header and after you are done with each individual row added using WriteRecord. See GitHub Issues List 929
When you are writing multiple records using WriteRecords() CsvWriter automatically emits the defined type of newline at the end of each record.
In my opinion this ought to be much better documented, but there it is.
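As a rough sketch of that pattern (the Foo type, the foos collection and the file name are made up here; depending on your CsvHelper version the CsvWriter constructor may also want a CsvConfiguration instead of just a culture):

// WriteHeader/WriteRecord do not emit a newline themselves;
// NextRecord() is what writes the configured NewLine.
using (var writer = new StreamWriter("out.csv"))
using (var csv = new CsvWriter(writer, CultureInfo.InvariantCulture))
{
    csv.WriteHeader<Foo>();
    csv.NextRecord();              // finish the header row

    foreach (var foo in foos)
    {
        csv.WriteRecord(foo);
        csv.NextRecord();          // finish each data row
    }
}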
From what I can tell, the line terminator isn't controlled by CsvHelper. I've gotten it to work by adjusting the file writer I pass to CsvWriter.
TextWriter tw = File.CreateText(filepathname);
tw.NewLine = "\n";
CsvWriter csvw = new CsvWriter(tw);
csvw.WriteRecords(records);
csvw.Dispose();
Might be useful for somebody:
public static void AppendToCsv(ShopDataModel shopRecord)
{
    using (var writer = new StreamWriter(DestinationFile, true))
    {
        using (var csv = new CsvWriter(writer))
        {
            csv.WriteRecord(shopRecord);
            writer.Write("\n");
        }
    }
}
As of CsvHelper 13.0.0, line-endings are now configurable via the NewLine configuration property.
E.g.:
using CsvHelper;
using CsvHelper.Configuration;
using System.Globalization;
void Main()
{
    using (var writer = new StreamWriter(@"my-file.csv"))
    {
        using (var csv = new CsvWriter(writer, CultureInfo.InvariantCulture))
        {
            csv.Configuration.HasHeaderRecord = false;
            csv.Configuration.NewLine = NewLine.LF; // <<####################

            var records = new List<Foo>
            {
                new Foo { Id = 1, Name = "one" },
                new Foo { Id = 2, Name = "two" },
            };

            csv.WriteRecords(records);
        }
    }
}

private class Foo
{
    public int Id { get; set; }
    public string Name { get; set; }
}

How to remove contents of one csv file from another in C#

I have 2 csv files, file1.csv and file2.csv. Some lines in each file will be identical. I wish to create a 3rd csv file, based upon file2.csv but with any lines that are present in file1.csv removed from it. Effectively I wish to subtract file1.csv from file2.csv ignoring any lines present in file1 that are not in file2.
I know that I could use a StreamReader to read each line in file2.csv and search for it in file1.csv. If it does not exist in file1.csv I can write it to file3.csv. However, the files are very large (over 30,000 lines) and I believe this would take a lot of processing time.
I suspect there may be a better method of loading each CSV into an array and then performing a simple subtraction function on them to obtain the desired result. I would appreciate either some help with the code or advice on the method I should use to approach this problem.
Example content of files:
file1.csv
dt97861.jpg,149954,c1714ee1,\folder1\folderA\,
dt97862.jpg,149955,c1714ee0,\folder1\folderA\,
dt97863.jpg,59368,cd23f223,\folder2\folderA\,
dt97864.jpg,57881,0835be4a,\folder2\folderB\,
dt97865.jpg,57882,0835be4b,\folder2\folderB\,
file2.csv
dt97862.jpg,149955,c1714ee0,\folder1\folderA\,
dt97863.jpg,59368,cd23f223,\folder2\folderA\,
dt97864.jpg,57881,0835be4a,\folder2\folderB\,
dt97865.jpg,57882,0835be4b,\folder2\folderB\,
dt97866.jpg,57883,0835be4c,\folder2\folderB\,
dt97867.jpg,57884,0835be4d,\folder3\folderA\,
dt97868.jpg,57885,0835be4e,\folder3\folderA\,
The results I require is:
file3.csv
dt97866.jpg,57883,0835be4c,\folder2\folderB\,
dt97867.jpg,57884,0835be4d,\folder3\folderA\,
dt97868.jpg,57885,0835be4e,\folder3\folderA\,
EDIT:
With the help below I came to the following solution which I believe to be nice and elegant:
public static IEnumerable<string> ReadFile(string path)
{
    string line;
    using (var reader = File.OpenText(path))
        while ((line = reader.ReadLine()) != null)
            yield return line;
}
then:
var file2 = ReadFile(file2FilePath);
var file1 = ReadFile(file1FilePath);
var file3 = file2.Except(file1);
File.WriteAllLines(file3FilePath, file3);
Assuming the lines are perfectly identical, you can read both files into two IEnumerable<string> and subtract with IEnumerable.Except<T>. This will produce the same result regardless of the ordering.
Example :
var file1 = new List<string>
{
    @"dt97861.jpg,149954,c1714ee1,\folder1\folderA\,",
    @"dt97862.jpg,149955,c1714ee0,\folder1\folderA\,",
    @"dt97863.jpg,59368,cd23f223,\folder2\folderA\,",
    @"dt97864.jpg,57881,0835be4a,\folder2\folderB\,",
    @"dt97865.jpg,57882,0835be4b,\folder2\folderB\,",
};

var file2 = new List<string>
{
    @"dt97862.jpg,149955,c1714ee0,\folder1\folderA\,",
    @"dt97863.jpg,59368,cd23f223,\folder2\folderA\,",
    @"dt97864.jpg,57881,0835be4a,\folder2\folderB\,",
    @"dt97865.jpg,57882,0835be4b,\folder2\folderB\,",
    @"dt97866.jpg,57883,0835be4c,\folder2\folderB\,",
    @"dt97867.jpg,57884,0835be4d,\folder3\folderA\,",
    @"dt97868.jpg,57885,0835be4e,\folder3\folderA\,",
};
file2.Except(file1).Dump(); // Dump() is a LINQPad extension; use Console.WriteLine in a normal project
Output :
dt97866.jpg,57883,0835be4c,\folder2\folderB\,
dt97867.jpg,57884,0835be4d,\folder3\folderA\,
dt97868.jpg,57885,0835be4e,\folder3\folderA\,
Here is the function to load any file into an IEnumerable<string>. Just don't forget using System.IO;.
public static IEnumerable<string> ReadFile(string path)
{
    string line;
    using (var reader = File.OpenText(path))
        while ((line = reader.ReadLine()) != null)
            yield return line;
}
To write the result to a file :
//using System.IO; is required
File.WriteAllLines("file3.csv", file2.Except(file1))
Remarks : File.WriteAllLines will create or overwrite the file.
While this may not be the best approach, it's the one I've used in the past. It's a bit of a dirty hack, but...
Import both CSV files into a DataTable (so you will have two DataTables - I personally prefer ClosedXML if you plan to use an Excel-type format, otherwise just use a normal file read/write - my example uses regular read/write)
Move data from datatable into a list (my example assumes comma separated values, one per line.)
Find unique values between lists and merge
Export the merged lists to a csv file
*[Edited steps after actually working on the code]
Per request from Bit, I've added an example using sample data from Some Random Website. This was written in VS2008 against .NET 3.5, but it should work on 3.5+. I copied us-500 into two versions: the original, and one with 1 row modified to create a unique value to test. This project targets the x86 platform. I've used a new Windows Form for testing.
using System.Data;
using System.Data.OleDb;
using System.IO;
using System.Linq;
using System.Windows.Forms;

namespace TestSandbox
{
    public partial class Form1 : Form
    {
        public Form1()
        {
            var file1 = new DataTable();
            var file2 = new DataTable();

            InitializeComponent();

            //Gets data from csv file, select allows for filtering
            using (var conn = new OleDbConnection(@"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=c:\;Extended Properties=""text;HDR=Yes;FMT=Delimited"";"))
            {
                conn.Open();
                using (var adapter = new OleDbDataAdapter(@"select * from [us-500.csv]", conn))
                {
                    adapter.Fill(file1);
                }
            }

            using (var conn = new OleDbConnection(@"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=c:\;Extended Properties=""text;HDR=Yes;FMT=Delimited"";"))
            {
                conn.Open();
                using (var adapter = new OleDbDataAdapter(@"select * from [us-500-2.csv]", conn))
                {
                    adapter.Fill(file2);
                }
            }

            //Moves datatable information to lists for comparison
            var file1List = (from DataRow row in file1.Rows select row.ItemArray.Select(field => field.ToString()).ToArray() into fields select string.Join(",", fields)).ToList();
            var file2List = (from DataRow row in file2.Rows select row.ItemArray.Select(field => field.ToString()).ToArray() into fields select string.Join(",", fields)).ToList();

            //Adds all data from file2 into file1 list, except for data that already exists in file1
            file1List.AddRange(file2List.Except(file1List));

            //Exports all results to c:\results.csv
            File.WriteAllLines(@"C:\Results.csv", file1List.ToArray());
        }
    }
}
*Note: After looking at the code, importing straight to a list looks like it would be more efficient, but I'll leave this as is for now since it's not overly complicated.
Step 1. Using System.IO, we'll read two files using FileStream and create a third file using StreamWriter.
Step 2. Use FileStream to read file #1. e.g.
using (var FS = new System.IO.FileStream(file1, System.IO.FileMode.Open, System.IO.FileAccess.Read)) { ...<insert next steps in here>...}
Step 3. Nest another FileStream to read file #2. This stream will be read multiple times, so it's best if you can put the smaller file in this part of the nest. You can do this by checking the size of the file prior to jumping into these loops.
Step 4. Read a single line from the bigger file, File #1, then compare it against ALL lines from File #2 sequentially. If a match is found, set a boolean to TRUE indicating that a matching line was found in File #2.
Step 5. Once we're at the end of File #2, check the true/false condition of the boolean. If it's false, SAVE the string we read from File #1 into File #3. This is your output file.
Step 6. Reset the stream pointer for File #2 to the beginning of the file e.g. FS.Seek(0, System.IO.SeekOrigin.Begin)
Step 7. Repeat from Step 4 until we've reached the end of File #1. File #3's contents should represent only unique entries from File #1 that are not members of File #2
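A minimal sketch of those steps, following them as written (paths are placeholders; note that the StreamReader over File #2 needs DiscardBufferedData() after the Seek):

// Writes to file3 every line of file1 that does not appear anywhere in file2.
static void SubtractFiles(string file1Path, string file2Path, string file3Path)
{
    using (var fs1 = new FileStream(file1Path, FileMode.Open, FileAccess.Read))
    using (var reader1 = new StreamReader(fs1))
    using (var fs2 = new FileStream(file2Path, FileMode.Open, FileAccess.Read))
    using (var reader2 = new StreamReader(fs2))
    using (var writer3 = new StreamWriter(file3Path))
    {
        string line1;
        while ((line1 = reader1.ReadLine()) != null)        // Steps 4 and 7: each line of File #1
        {
            bool foundInFile2 = false;

            string line2;
            while ((line2 = reader2.ReadLine()) != null)    // compare against ALL of File #2
            {
                if (line1 == line2)
                {
                    foundInFile2 = true;
                    break;
                }
            }

            if (!foundInFile2)                              // Step 5: keep only unmatched lines
                writer3.WriteLine(line1);

            fs2.Seek(0, SeekOrigin.Begin);                  // Step 6: rewind File #2
            reader2.DiscardBufferedData();                  // required after seeking the underlying stream
        }
    }
}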

How do I pass a collection of strings as a TextReader?

I am using the CSVHelper library, which can extract a list of objects from a CSV file with just three lines of code:
var streamReader = // Create a reader to your CSV file.
var csvReader = new CsvReader( streamReader );
List<MyCustomType> myData = csvReader.GetRecords<MyCustomType>().ToList();
However, my file has nonsense lines and I need to skip the first ten lines of the file. I thought it would be nice to use LINQ to ensure 'clean' data, and then pass that data to CsvReader, like so:
public TextReader GetTextReader(IEnumerable<string> lines)
{
    // Some magic here. Don't want to return null;
    return TextReader.Null;
}

public IEnumerable<T> ExtractObjectList<T>(string filePath) where T : class
{
    var csvLines = File.ReadLines(filePath)
                       .Skip(10)
                       .Where(l => !l.StartsWith(",,,"));

    var textReader = GetTextReader(csvLines);
    var csvReader = new CsvReader(textReader);
    csvReader.Configuration.ClassMapping<EventMap, Event>();

    return csvReader.GetRecords<T>();
}
But I'm really stuck on pushing a 'static' collection of strings through a stream like a TextReader.
My alternative here is to process the CSV file line by line through CsvReader and examine each line before extracting an object, but I find that somewhat clumsy.
The StringReader Class provides a TextReader that wraps a String. You could simply join the lines and wrap them in a StringReader:
public TextReader GetTextReader(IEnumerable<string> lines)
{
    return new StringReader(string.Join("\r\n", lines));
}
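For example, a quick usage sketch (the Event type comes from the question, the path and 10-line skip are placeholders, and newer CsvHelper versions also expect a CultureInfo in the CsvReader constructor):

// Filter the raw lines, then feed them to CsvHelper through a StringReader.
var csvLines = File.ReadLines("data.csv")
                   .Skip(10)
                   .Where(l => !l.StartsWith(",,,"));

using (var textReader = new StringReader(string.Join(Environment.NewLine, csvLines)))
using (var csvReader = new CsvReader(textReader, CultureInfo.InvariantCulture))
{
    var events = csvReader.GetRecords<Event>().ToList();   // materialise before the readers are disposed
}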
An easier way would be to use CsvHelper to skip the lines.
// Skip rows.
csvReader.Configuration.IgnoreBlankLines = false;
csvReader.Configuration.IgnoreQuotes = true;

for (var i = 0; i < 10; i++)
{
    csvReader.Read();
}

csvReader.Configuration.IgnoreBlankLines = true;
csvReader.Configuration.IgnoreQuotes = false;

// Carry on as normal.
var myData = csvReader.GetRecords<MyCustomType>();
IgnoreBlankLines is set to false in case any of those first 10 rows are blank, and IgnoreQuotes is set to true so you don't get any BadDataExceptions if those rows contain a ". You can set them back afterwards for normal functionality again.
If you don't know the number of rows and need to test based on row data, you can just test csvReader.Context.Record and see if you need to stop. In this case, you would probably need to manually call csvReader.ReadHeader() before calling csvReader.GetRecords<MyCustomType>().
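A rough sketch of that idea, assuming the real header row can be recognised by its first cell (the "SNumber" marker here is purely hypothetical):

// Advance until the row that looks like the real header, then read it
// as the header and hand the rest of the file to GetRecords.
while (csvReader.Read())
{
    // GetField(0) returns the first cell of the current raw row.
    if (csvReader.GetField<string>(0) == "SNumber")   // hypothetical header marker
        break;
}

csvReader.ReadHeader();
var myData = csvReader.GetRecords<MyCustomType>().ToList();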

Bulk data insertion in SQL Server table from delimited text file using c#

I have a tab delimited text file. The file is around 100MB. I want to store the data from this file in a SQL Server table. The file contains 1 million records when stored in SQL Server. What is the best way to achieve this?
I can create an in-memory DataTable in C# and then upload it to SQL Server, but in this case it will load the entire 100 MB file into memory. What if the file size gets bigger?
No problem; CsvReader will handle most delimited text formats, and implements IDataReader, so can be used to feed a SqlBulkCopy. For example:
using (var file = new StreamReader(path))
using (var csv = new CsvReader(file, true)) // true = first row is headers
using (var bcp = new SqlBulkCopy(connectionString))
{
    bcp.DestinationTableName = "Foo";
    bcp.WriteToServer(csv);
}
Note that CsvReader has lots of options for more subtle file handling (specifying the delimiter rules, etc). SqlBulkCopy is the high-performance bulk-load API - very efficient. This is a streaming reader/writer API; it does not load all the data into memory at once.
You should read the file line-by-line, so you don't have to load the whole file into memory:
using (var file = System.IO.File.OpenText(filename))
{
    while (!file.EndOfStream)
    {
        string line = file.ReadLine();

        // TODO: Do your INSERT here
    }
}
* Update *
"This will make 1 million separate insert commands to sql server. Is there any way to make it in bulk"
You could use parameterised queries, which would still issue 1M inserts, but would still be quite fast.
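A minimal sketch of the parameterised approach (the table and column names are invented for illustration; it needs using System.Data; and using System.Data.SqlClient;). The point is that the command and its parameters are created once and reused for every line:

// Hypothetical two-column table; reuse one command and its parameters.
using (var conn = new SqlConnection(connectionString))
using (var cmd = new SqlCommand("INSERT INTO MyTable (A, B) VALUES (@a, @b)", conn))
{
    cmd.Parameters.Add("@a", SqlDbType.NVarChar, 100);
    cmd.Parameters.Add("@b", SqlDbType.NVarChar, 100);
    conn.Open();

    using (var file = System.IO.File.OpenText(filename))
    {
        while (!file.EndOfStream)
        {
            var splitLine = file.ReadLine().Split('\t');
            cmd.Parameters["@a"].Value = splitLine[0];
            cmd.Parameters["@b"].Value = splitLine[1];
            cmd.ExecuteNonQuery();   // one round-trip per row, but the SQL is only parsed once
        }
    }
}

Wrapping batches of rows in a SqlTransaction will usually speed this up further.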
Alternatively, you can use SqlBulkCopy, but that's going to be rather difficult if you don't want to use 3rd party libraries. If you are more amenable to the MS license, you could use the LINQ Entity Data Reader (distributed under Ms-PL license), which provides the AsDataReader extension method:
void MyInsertMethod()
{
    using (var bulk = new SqlBulkCopy("MyConnectionString"))
    {
        bulk.DestinationTableName = "MyTableName";
        bulk.WriteToServer(GetRows().AsDataReader());
    }
}

class MyType
{
    public string A { get; set; }
    public string B { get; set; }
}

IEnumerable<MyType> GetRows()
{
    using (var file = System.IO.File.OpenText("MyTextFile"))
    {
        while (!file.EndOfStream)
        {
            var splitLine = file.ReadLine().Split(',');

            yield return new MyType() { A = splitLine[0], B = splitLine[1] };
        }
    }
}
If you didn't want to use the MS licensed code either, you could implement IDataReader yourself, but that is going to be a PITA. Note that the CSV handling above (Split(',')) is not at all robust, and also that column names in the table must be the same as property names on MyType. TBH, I'd recommend you go with Marc's answer on this one

Modify CSV Parser to work with TSV files C#

I have this code for parsing a CSV file.
var query = from line in File.ReadAllLines("E:/test/sales/" + filename)
            let customerRecord = line.Split(',')
            select new FTPSalesDetails
            {
                retailerName = "Example",
            };

foreach (var item in query)
{
    //sales details table
    ItemSale ts = new ItemSale
    {
        RetailerID = GetRetailerID(item.retailerName)
    };
}
Obviously there will be more data in the above code; I am just awaiting the test information file details/structure.
In the meantime I thought I'd ask if this could be modified to parse TSV files?
All help is appreciated,
thanks :)
Assuming TSV means tab-separated values, you can use
line.Split('\t')
If you are using .NET 4.0, I would recommend that you use File.ReadLines for large files, in order to use LINQ and not load all the lines into memory at once.
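Put together, a minimal sketch of the tab-separated version (reusing the placeholder types from the question):

// File.ReadLines streams the file lazily instead of loading it all at once.
var query = from line in File.ReadLines("E:/test/sales/" + filename)
            let customerRecord = line.Split('\t')   // tab instead of comma
            select new FTPSalesDetails
            {
                retailerName = "Example",
            };

foreach (var item in query)
{
    ItemSale ts = new ItemSale
    {
        RetailerID = GetRetailerID(item.retailerName)
    };
}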
