Manipulate an existing CSV file while keeping column order (CsvReader/CsvWriter) - C#

I need to manipulate an existing CSV file via the following actions:
Read from an existing CSV file -> then append a new row to it.
I have the following code, which chokes on the third using line because the file is already opened for reading by the first line. I'm not sure how else to read the file properly and then append a new row to it.
public bool Save(Customer customer)
{
    using (StreamReader input = File.OpenText("DataStoreOut.csv"))
    using (CsvReader csvReader = new CsvReader(input))
    using (StreamWriter output = File.CreateText("DataStoreOut.csv"))
    using (var csvWriter = new CsvWriter(output))
    {
        IEnumerable<Customer> records = csvReader.GetRecords<Customer>();
        List<Customer> customerList = new List<Customer>();
        customerList.Add(customer);

        csvWriter.WriteHeader<Customer>();
        csvWriter.NextRecord();

        foreach (var array in customerList)
        {
            csvWriter.WriteRecord(records.Append(array));
        }
    }
}
Each row in the CSV file contains a customer.CustomerId (which is unique and read-only). How can I read only the row with a specific CustomerId and then update any of its values?

If you want to append a record to a file, the best way to do it is to read the items, add the new one to the collection, and write everything back.
public static void Append(Customer customer, string file)
{
    List<Customer> records = null;
    using (var reader = new StreamReader(file))
    {
        using (var csv = new CsvReader(reader))
        {
            records = csv.GetRecords<Customer>().ToList();
        }
    }

    records.Add(customer);

    using (var writer = new StreamWriter(file))
    {
        using (var csv = new CsvWriter(writer))
        {
            csv.WriteRecords(records);
        }
    }
}
As @Dour High Arch mentioned, to be perfectly safe you might want to take the extra step of writing to a temp file first, in case something goes wrong.
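For illustration, a minimal sketch of that temp-file variant, assuming the same Customer type and the older CsvHelper constructors used above (the .tmp/.bak file names and the File.Replace call are my own choices, not part of the original answer):

public static void AppendSafely(Customer customer, string file)
{
    List<Customer> records;
    using (var reader = new StreamReader(file))
    using (var csv = new CsvReader(reader))
    {
        records = csv.GetRecords<Customer>().ToList();
    }

    records.Add(customer);

    // Write everything to a temp file first, so the original stays intact if anything throws.
    string tempFile = file + ".tmp";
    using (var writer = new StreamWriter(tempFile))
    using (var csv = new CsvWriter(writer))
    {
        csv.WriteRecords(records);
    }

    // Swap the temp file in, keeping a backup of the original.
    File.Replace(tempFile, file, file + ".bak");
}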
If you want to update instead of append, you'd have to look up the specified record, and update it if it exists.
public static void Update(Customer customer, string file)
{
    List<Customer> records = null;
    using (var reader = new StreamReader(file))
    {
        using (var csv = new CsvReader(reader))
        {
            records = csv.GetRecords<Customer>().ToList();
        }
    }

    // CustomerId is the unique, read-only key described in the question.
    var index = records.FindIndex(x => x.CustomerId == customer.CustomerId);
    if (index >= 0)
    {
        records[index] = customer;
        using (var writer = new StreamWriter(file))
        {
            using (var csv = new CsvWriter(writer))
            {
                csv.WriteRecords(records);
            }
        }
    }
}
Again, writing to a temp file is advisable.
UPDATE
Actually there's a slightly better way to append if you don't want to rewrite the whole file. When instantiating a StreamWriter you can pass append: true, in which case it will append to the end of the file.
The small caveat is that if the file does not end with a newline (the EOF is right after the last field of the last record), this will append the new record to the end of the last field, messing up your columns. As a workaround I've added a writer.WriteLine(); before handing the writer to CsvHelper.
public static void Append2(Customer customer, string file)
{
    using (var writer = new StreamWriter(file, true))
    {
        writer.WriteLine();
        using (var csv = new CsvWriter(writer))
        {
            csv.WriteRecord(customer);
        }
    }
}
If the file already ends with a newline, though, this will add an empty line. That can be countered by ignoring empty lines when you read the file.
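If you are on a CsvHelper version whose CsvReader takes a configuration object, a minimal sketch of that read-side workaround could look like this (IgnoreBlankLines is a CsvConfiguration property in recent CsvHelper versions; on older versions the same flag sits on csv.Configuration):

var config = new CsvConfiguration(CultureInfo.InvariantCulture)
{
    IgnoreBlankLines = true // skip the blank line Append2 may have introduced
};

using (var reader = new StreamReader(file))
using (var csv = new CsvReader(reader, config))
{
    var records = csv.GetRecords<Customer>().ToList();
}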


Reading a CSV file using CsvHelper

I'm a newbie. I want to get data from the CSV file (the Id and Name fields), but when I run the reading method I get only 100 lines of an incomprehensible type: "CsvHelper.CsvReader+<GetRecords>d__87`1[Program+Product]". I do not know how to get the data from the CSV, and I cannot work out where the error is.
The documentation says that when the property names match the CSV headers, you do not need any additional configuration. However, I get the result described above, even though the CSV headers match the class. Link to the documentation: https://joshclose.github.io/CsvHelper/getting-started/
reading method:
{
    using (var reader = new StreamReader("C:\\Users\\Saint\\Desktop\\TaskRetail\\file.csv", Encoding.UTF8))
    using (var csv = new CsvReader(reader, CultureInfo.InvariantCulture))
    {
        var records = csv.GetRecords<Product>();
        Console.WriteLine($"{records}");
    }
}
The CSV is created without problems: there are two columns, Id and Name, with filled rows, 100 rows in total.
Method for creating the CSV with the Id and Name fields:
using (var writer = new StreamWriter("C:\\Users\\Saint\\Desktop\\TaskRetail\\file.csv", false, Encoding.UTF8))
using (var csv = new CsvWriter(writer, CultureInfo.InvariantCulture))
{
    csv.WriteRecords(products);
}
the entire code:
using CsvHelper;
using CsvHelper.Configuration;
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Net;
using System.Text;
using System.Xml;

public class Program
{
    public class Product
    {
        public int Id { get; set; }
        public string Name { get; set; }

        public Product(int id, string name)
        {
            Id = id;
            Name = name;
        }
    }

    public const string PathToDoc = "C:/Users/Saint/Desktop/TaskRetail/yml.xml";

    public static void Main(string[] args)
    {
        string url = "https://www.googleapis.com/drive/v3/files/1sSR9kWifwjIP5qFWcyxGCxN0-MoEd_oo?alt=media&key=AIzaSyBsW_sj1GCItGBK0vl8hr9zu1I1vTI1Meo";
        string savePath = @"C:\Users\Saint\Desktop\TaskRetail\yml.xml";
        WebClient client = new WebClient();
        client.DownloadFile(url, savePath);
        Research();
    }

    public static void Research()
    {
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        var document = new XmlDocument();
        document.Load(PathToDoc);
        var xmlDoc = document.SelectNodes("/yml_catalog/shop/offers/offer");
        var count = xmlDoc.Count;
        var products = new List<Product>();
        Console.WriteLine($"Offers count: {count}");
        for (var i = 0; i < count; i++)
        {
            var element = xmlDoc.Item(i);
            var id = int.Parse(element.Attributes.GetNamedItem("id").Value);
            var name = element.SelectSingleNode("name").InnerText;
            var product = new Product(id, name);
            //Console.WriteLine($"Id: {id}, name: {name}");
            products.Add(product);

            using (var writer = new StreamWriter("C:\\Users\\Saint\\Desktop\\TaskRetail\\file.csv", false, Encoding.UTF8))
            using (var csv = new CsvWriter(writer, CultureInfo.InvariantCulture))
            {
                csv.WriteRecords(products);
            }

            var config = new CsvConfiguration(CultureInfo.InvariantCulture) { Delimiter = ",", PrepareHeaderForMatch = header => header.Header.ToLower() };
            using (var reader = new StreamReader("C:\\Users\\Saint\\Desktop\\TaskRetail\\file.csv", Encoding.UTF8))
            using (var csv = new CsvReader(reader, config))
            {
                var records = csv.GetRecords<Product>();
                foreach (var record in records)
                {
                    Console.WriteLine($"{record.Id} {record.Name}");
                }
            }
        }
    }
}
Because GetRecords() returns an IEnumerable, you have to iterate over your records to print each one of them:
foreach (var record in records)
{
    Console.WriteLine($"{record.Id} {record.Name}");
}
Furthermore you have to access each property you want to print individually.
Another option would be to override the ToString() method in your Product class, for example:
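A minimal sketch of that option (the format string is just an example):

public class Product
{
    public int Id { get; set; }
    public string Name { get; set; }

    // Decide once how a Product renders as text.
    public override string ToString() => $"{Id} {Name}";
}

// Printing then no longer needs to name the properties:
foreach (var record in records)
{
    Console.WriteLine(record);
}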
EDIT
The initial problem wasn't the printing of the values but the parsing of the file, as I learned from this comment:
CsvHelper.HeaderValidationException: 'Header with name 'id'[0] was not found. Header with name 'name'[0] was not found.
To tackle this problem, one has to make sure that the delimiter character is set correctly. This can be enforced in the CsvHelper config object. Furthermore, to avoid casing errors, the configuration can be set to ignore the casing of the headers:
var config = new CsvConfiguration(CultureInfo.InvariantCulture)
{
    Delimiter = ",", // enforce ',' as the delimiter
    PrepareHeaderForMatch = header => header.Header.ToLower() // ignore casing
};

using (var csv = new CsvReader(reader, config))
{
    ...
}

I have a *.tsv file which contains 32 million records and I need to load them and do search operations

When I load the file it throws an 'OutOfMemoryException'. How can I load it and search it efficiently?
I am using
// to load the file
var passEngine = new FileHelperEngine<MyClass>();
var passList = passEngine.ReadFile("Files/plain_32m.tsv").ToList();
Or is there any other way to do it?
The code below adds the data to a DataTable. It assumes the first row contains the column names.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Data;
using System.IO;

namespace ConsoleApplication1
{
    class Program
    {
        const string FILENAME = @"Files/plain_32m.tsv";

        static void Main(string[] args)
        {
            int rowCount = 0;
            DataTable dt = new DataTable();
            using (StreamReader reader = new StreamReader(FILENAME))
            {
                string line = "";
                while ((line = reader.ReadLine()) != null)
                {
                    // split on tabs and remove any leading/trailing spaces from the data
                    string[] tsv = line.Split('\t').Select(x => x.Trim()).ToArray();
                    if (++rowCount == 1)
                    {
                        foreach (string colName in tsv)
                        {
                            dt.Columns.Add(colName, typeof(string));
                        }
                    }
                    else
                    {
                        dt.Rows.Add(tsv);
                    }
                }
            }
        }
    }
}
You may consider approaching it in a couple of ways.
Method 1:
If it is a one-time search operation that picks only a small set of records from the large file, you can do so with a streaming approach along with LINQ to Objects. There are a number of open source libraries available to take care of it.
I'm going to show you one such library, Cinchoo ETL:
using (var p = new ChoCSVReader<MyClass>("*** Your CSV File ***")
    .WithFirstLineHeader()
    )
{
    var subset = p.Where(rec => rec.ID == 100).ToArray(); // you can apply any filter
}
Method 2:
Load the file into a database. This approach is useful if your search criteria are complex and you want to speed up the search with indices etc. You can load the file with EF, BulkCopy, or ADO.NET; BulkCopy is preferable for such a large file. The sample code below shows how to load the file using bulk copy:
string connectionString = "*** DB Connection String ***";
using (var p = new ChoCSVReader<MyClass>("*** Your CSV File ***")
    .WithFirstLineHeader()
    )
{
    using (SqlBulkCopy bcp = new SqlBulkCopy(connectionString))
    {
        bcp.DestinationTableName = "** DB Table Name **";
        bcp.EnableStreaming = true;
        bcp.BatchSize = 10000;
        bcp.BulkCopyTimeout = 0;
        bcp.NotifyAfter = 10;
        bcp.SqlRowsCopied += delegate (object sender, SqlRowsCopiedEventArgs e)
        {
            Console.WriteLine(e.RowsCopied.ToString("#,##0") + " rows copied.");
        };
        bcp.WriteToServer(p.AsDataReader());
    }
}
Once you have loaded the file into the database, you can do the rest (creating indices, querying and filtering the data) via EF/ADO.NET etc., for example:
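A minimal ADO.NET lookup sketch, reusing the placeholder table name from the bulk copy above (the ID column and the value 100 are hypothetical, not from the original code):

string connectionString = "*** DB Connection String ***";
using (var conn = new SqlConnection(connectionString))
using (var cmd = new SqlCommand("SELECT * FROM [** DB Table Name **] WHERE ID = @id", conn))
{
    cmd.Parameters.AddWithValue("@id", 100);
    conn.Open();
    using (var dataReader = cmd.ExecuteReader())
    {
        while (dataReader.Read())
        {
            Console.WriteLine(dataReader["ID"]);
        }
    }
}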
Hope it helps.
FileHelpers has a FileHelperAsyncEngine which allows you to work record by record and avoids reading or writing all the records at once. The documentation is here.
var engine = new FileHelperAsyncEngine<Customer>();

// Read
using (engine.BeginReadFile("Input.txt"))
{
    // The engine is IEnumerable
    foreach (Customer cust in engine)
    {
        // your code here
        Console.WriteLine(cust.Name);
    }
}

// Write
using (engine.BeginWriteFile("TestOut.txt"))
{
    var arrayCustomers = GetSomeMoreCustomers(); // a batch at a time
    if (arrayCustomers.Count() > 0)
    {
        foreach (Customer cust in arrayCustomers)
        {
            engine.WriteNext(cust);
        }
    }
}

How to split CSV file

"0.0.0.0,""0.255.255.255"",""ZZ"""
"1.0.0.0,""1.0.0.255"",""AU"""
"1.0.1.0,""1.0.3.255"",""CN"""
"1.0.4.0,""1.0.7.255"",""AU"""
"1.0.8.0,""1.0.15.255"",""CN"""
"1.0.16.0,""1.0.31.255"",""JP"""
"1.0.32.0,""1.0.63.255"",""CN"""
"1.0.64.0,""1.0.127.255"",""JP"""
"1.0.128.0,""1.0.255.255"",""TH"""
"1.1.0.0,""1.1.0.255"",""CN"""
"1.1.1.0,""1.1.1.255"",""AU"""
"1.1.2.0,""1.1.63.255"",""CN"""
"1.1.64.0,""1.1.127.255"",""JP"""
"1.1.128.0,""1.1.255.255"",""TH"""
In Excel:
0.0.0.0,"0.255.255.255","ZZ"
1.0.0.0,"1.0.0.255","AU"
1.0.1.0,"1.0.3.255","CN"
1.0.4.0,"1.0.7.255","AU"
1.0.8.0,"1.0.15.255","CN"
1.0.16.0,"1.0.31.255","JP"
1.0.32.0,"1.0.63.255","CN"
1.0.64.0,"1.0.127.255","JP"
1.0.128.0,"1.0.255.255","TH"
1.1.0.0,"1.1.0.255","CN"
1.1.1.0,"1.1.1.255","AU"
1.1.2.0,"1.1.63.255","CN"
1.1.64.0,"1.1.127.255","JP"
1.1.128.0,"1.1.255.255","TH"
1.2.0.0,"1.2.2.255","CN"
1.2.3.0,"1.2.3.255","AU"
1.2.4.0,"1.2.127.255","CN"
1.2.128.0,"1.2.255.255","TH"
1.3.0.0,"1.3.255.255","CN"
1.4.0.0,"1.4.0.255","AU"
1.4.1.0,"1.4.127.255","CN"
1.4.128.0,"1.4.255.255","TH"
How can I split this CSV file?
For example, the first row should become 0.0.0.0, 0.255.255.255, ZZ. And how can I add the data to a DataGridView with 3 columns?
You can do it the following way:
using System.IO;

static void Main(string[] args)
{
    using (var reader = new StreamReader(@"C:\test.csv"))
    {
        List<string> listA = new List<string>();
        List<string> listB = new List<string>();
        List<string> listC = new List<string>(); // third column for the country code
        while (!reader.EndOfStream)
        {
            var line = reader.ReadLine();
            var values = line.Split(','); // or whatever delimiter you get by reading that file
            listA.Add(values[0]);
            listB.Add(values[1]);
            listC.Add(values[2]);
        }
    }
}
A CSV file is either a tab-delimited or a comma-delimited file. That said, you have to read the file line by line and then separate the values in each line based on the delimiter character. The first line of a CSV file is usually the headers, which you can use to produce key/value pairs and make your collection easier to work with. For example:
Dictionary<int, Dictionary<String, String>> values = new Dictionary<int, Dictionary<String, String>>();
using (FileStream fileStream = new FileStream(@"D:\MyCSV.csv", FileMode.Open, FileAccess.Read, FileShare.Read))
{
    using (StreamReader streamReader = new StreamReader(fileStream))
    {
        // You can skip this line if there is no header.
        // Then, instead of Dictionary<String, String>, you use List<String>.
        var headers = streamReader.ReadLine().Split(',');
        int lineNumber = 1;
        while (!streamReader.EndOfStream)
        {
            var line = streamReader.ReadLine().Split(',');
            if (line.Length == headers.Length)
            {
                var temp = new Dictionary<String, String>();
                for (int i = 0; i < headers.Length; i++)
                {
                    // You can remove the '"' characters with line[i].Replace("\"", "") or via the Substring method.
                    temp.Add(headers[i], line[i]);
                }
                values.Add(lineNumber, temp);
            }
            lineNumber++;
        }
    }
}
If the data structure of your CSV is constant and will not change in the future, you can develop a strongly typed data model and get rid of the Dictionary type. This approach will be more elegant and more efficient; a sketch follows below.
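As a rough sketch of what that strongly typed model could look like for the IP-range file in this question (the class and property names are my own, and the same D:\MyCSV.csv path is assumed):

public class IpRange
{
    public string Start { get; set; }
    public string End { get; set; }
    public string CountryCode { get; set; }
}

// ...
var ranges = new List<IpRange>();
using (var streamReader = new StreamReader(@"D:\MyCSV.csv"))
{
    while (!streamReader.EndOfStream)
    {
        var parts = streamReader.ReadLine().Split(',');
        if (parts.Length < 3)
            continue;

        ranges.Add(new IpRange
        {
            Start = parts[0].Trim('"'),
            End = parts[1].Trim('"'),
            CountryCode = parts[2].Trim('"')
        });
    }
}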
First of all, your CSV lines are surrounded by quotes. Is that a copy/paste mistake? If not, you will need to sanitize the file into valid CSV.
You can try Cinchoo ETL, an open source library, to load the CSV file into a DataTable, which you can then assign as your DataGridView source.
I'll show you both approaches.
Valid CSV: (test.csv)
0.0.0.0,"0.255.255.255","ZZ"
1.0.0.0,"1.0.0.255","AU"
1.0.1.0,"1.0.3.255","CN"
1.0.4.0,"1.0.7.255","AU"
1.0.8.0,"1.0.15.255","CN"
1.0.16.0,"1.0.31.255","JP"
1.0.32.0,"1.0.63.255","CN"
1.0.64.0,"1.0.127.255","JP"
1.0.128.0,"1.0.255.255","TH"
1.1.0.0,"1.1.0.255","CN"
1.1.1.0,"1.1.1.255","AU"
1.1.2.0,"1.1.63.255","CN"
1.1.64.0,"1.1.127.255","JP"
1.1.128.0,"1.1.255.255","TH"
Read CSV:
using (var p = new ChoCSVReader("test.csv"))
{
var dt = p.AsDataTable();
//Assign dt to DataGridView
}
Next approach
Invalid CSV: (test.csv)
"0.0.0.0,""0.255.255.255"",""ZZ"""
"1.0.0.0,""1.0.0.255"",""AU"""
"1.0.1.0,""1.0.3.255"",""CN"""
"1.0.4.0,""1.0.7.255"",""AU"""
"1.0.8.0,""1.0.15.255"",""CN"""
"1.0.16.0,""1.0.31.255"",""JP"""
"1.0.32.0,""1.0.63.255"",""CN"""
"1.0.64.0,""1.0.127.255"",""JP"""
"1.0.128.0,""1.0.255.255"",""TH"""
"1.1.0.0,""1.1.0.255"",""CN"""
"1.1.1.0,""1.1.1.255"",""AU"""
"1.1.2.0,""1.1.63.255"",""CN"""
"1.1.64.0,""1.1.127.255"",""JP"""
"1.1.128.0,""1.1.255.255"",""TH"""
Read CSV:
using (var p = new ChoCSVReader("Sample6.csv"))
{
p.SanitizeLine += (o, e) =>
{
string line = e.Line as string;
if (line != null)
{
line = line.Substring(1, line.Length - 2);
line = line.Replace(#"""""", #"""");
}
e.Line - line;
};
var dt = p.AsDataTable();
//Assign dt to DataGridView
}
Hope it helps.

Write to CSV file using CsvHelper in C#

I tried to write to a CSV file using CsvHelper in C#.
This is the link to the library: http://joshclose.github.io/CsvHelper/
Nothing is written to the CSV file. I tried doing exportCsv.WriteField("Hello"); but still nothing happened.
List<string> ColumnOne = new List<string>();
List<string> ColumnTwo = new List<string>();

var csvTextWriter = new StreamWriter(@"C:\Users\Public\Documents\ExportTest.csv");
var exportCsv = new CsvWriter(csvTextWriter);

// creating lists to store workflows, then adding name and description to them
if (myWorkflows.WorkFlowCollection.Any())
{
    foreach (var Workflow in myWorkflows.WorkFlowCollection)
    {
        ColumnOne.Add(Workflow.WorkflowName);
        ColumnTwo.Add(Workflow.WorkflowDescription);
    }
    exportCsv.WriteField(ColumnOne);
    //exportCsv.WriteField(ColumnTwo);
    exportCsv.NextRecord();
    exportCsv.Flush();

    Console.WriteLine("File is saved: C:\\Users\\Public\\Documents\\ExportTest.csv");
    Console.ReadLine();
}
Your code doesn't add any records. It doesn't have any calls to WriteRecords or WriteRecord. It looks like it's trying to write an entire list of strings into a single field instead.
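If you do want to keep the WriteField approach, each row has to be written field by field and closed with NextRecord; a minimal sketch along those lines, reusing the exportCsv writer from your code:

foreach (var workflow in myWorkflows.WorkFlowCollection)
{
    exportCsv.WriteField(workflow.WorkflowName);
    exportCsv.WriteField(workflow.WorkflowDescription);
    exportCsv.NextRecord();
}
exportCsv.Flush();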
To write two columns out to a file you can use WriteRecords, e.g.:
var data = from flow in myWorkflows.WorkFlowCollection
           select new { flow.WorkflowName, flow.WorkflowDescription };

using (var writer = new StreamWriter("test.csv"))
using (var csv = new CsvWriter(writer))
{
    csv.WriteRecords(data);
}
This will write a file with the field names WorkflowName and WorkflowDescription.
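With the anonymous type above, the output would look something like this (assuming, purely hypothetically, two workflows named Approve and Archive):

WorkflowName,WorkflowDescription
Approve,Approves incoming requests
Archive,Moves completed items to the archive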
You can change how the fields are written by creating a small class that accepts only the fields you want and sets names etc. through attributes:
class Flow
{
    [Name("Workflow Name")]
    public string WorkflowName { get; set; }

    [Name("Workflow Description")]
    public string WorkflowDescription { get; set; }

    public Flow(string workflowName, string workflowDescription)
    {
        WorkflowName = workflowName;
        WorkflowDescription = workflowDescription;
    }
}

//...

var data = from flow in myWorkflows.WorkFlowCollection
           select new Flow(flow.WorkflowName, flow.WorkflowDescription);

using (var writer = new StreamWriter("test.csv"))
using (var csv = new CsvWriter(writer))
{
    csv.WriteRecords(data);
}

Good way to read a delimited file into a DataTable

I was looking for a good way to read a delimited file into a DataTable and found this piece of code.
private void txtRead_Click(object sender, EventArgs e)
{
    var filename = @"d:\shiptest.txt";
    var reader = ReadAsLines(filename);

    var data = new DataTable();

    // this assumes the first record is filled with the column names
    var headers = reader.First().Split('\t');
    foreach (var header in headers)
    {
        data.Columns.Add(header);
    }

    var records = reader.Skip(1);
    foreach (var record in records)
    {
        data.Rows.Add(record.Split('\t'));
    }

    dgList.DataSource = data;
}

static IEnumerable<string> ReadAsLines(string filename)
{
    using (var reader = new StreamReader(filename))
        while (!reader.EndOfStream)
            yield return reader.ReadLine();
}
This code works fine and fast, but I am curious what the efficiency of the above code would be when there is a huge amount of data in the text file. Looking for suggestions, thanks.
