Good way to read a delimited file into a DataTable - C#

I was looking for a good way to read a delimited file into a DataTable and found this piece of code.
private void txtRead_Click(object sender, EventArgs e)
{
    var filename = @"d:\shiptest.txt";
    var reader = ReadAsLines(filename);
    var data = new DataTable();
    //this assumes the first record is filled with the column names
    var headers = reader.First().Split('\t');
    foreach (var header in headers)
    {
        data.Columns.Add(header);
    }
    var records = reader.Skip(1);
    foreach (var record in records)
    {
        data.Rows.Add(record.Split('\t'));
    }
    dgList.DataSource = data;
}
static IEnumerable<string> ReadAsLines(string filename)
{
    using (var reader = new StreamReader(filename))
        while (!reader.EndOfStream)
            yield return reader.ReadLine();
}
This code works fine and fast, but I am curious what the efficiency of the above code would be when there is huge data in the text file. Looking for suggestions, thanks.

Related

Manipulate existing CSV file, while keeping columns order. (CsvReader/CsvWriter)

I need to manipulate an existing CSV file with the following actions:
Read from an existing CSV file -> then append a new row to it.
I have the following code, which chokes on the third using statement, since the file is already opened by the first one. I'm not sure how to read it properly otherwise and then append a new row to it.
public bool Save(Customer customer)
{
    using (StreamReader input = File.OpenText("DataStoreOut.csv"))
    using (CsvReader csvReader = new CsvReader(input))
    using (StreamWriter output = File.CreateText("DataStoreOut.csv"))
    using (var csvWriter = new CsvWriter(output))
    {
        IEnumerable<Customer> records = csvReader.GetRecords<Customer>();
        List<Customer> customerList = new List<Customer>();
        customerList.Add(customer);
        csvWriter.WriteHeader<Customer>();
        csvWriter.NextRecord();
        foreach (var array in customerList)
        {
            csvWriter.WriteRecord(records.Append(array));
        }
    }
}
Each row in the CSV file contains a customer.CustomerId (which is unique and read-only). How can I read only the row which has a specific customerId and then update any values in it?
If you want to append a record to a file, the best way to do it is to read the items, add the new one to the collection, and write everything back.
public static void Append(Customer customer, string file)
{
    List<Customer> records = null;
    using (var reader = new StreamReader(file))
    {
        using (var csv = new CsvReader(reader))
        {
            records = csv.GetRecords<Customer>().ToList();
        }
    }
    records.Add(customer);
    using (var writer = new StreamWriter(file))
    {
        using (var csv = new CsvWriter(writer))
        {
            csv.WriteRecords(records);
        }
    }
}
As @Dour High Arch mentioned, to be perfectly safe you might want to take the extra step of writing to a temp file in case something goes wrong.
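For illustration, a minimal sketch of that temp-file approach, reusing the Customer type and CsvHelper calls from the example above (the helper name and the use of File.Replace are my own choices, not from the original answer):
public static void AppendSafely(Customer customer, string file)
{
    List<Customer> records;
    using (var reader = new StreamReader(file))
    using (var csv = new CsvReader(reader))
    {
        records = csv.GetRecords<Customer>().ToList();
    }
    records.Add(customer);
    // Write everything to a temp file first, then swap it in, so a failure
    // mid-write cannot leave the original file half-written.
    var tempFile = file + ".tmp";
    using (var writer = new StreamWriter(tempFile))
    using (var csv = new CsvWriter(writer))
    {
        csv.WriteRecords(records);
    }
    File.Replace(tempFile, file, null);
}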
If you want to update instead of append, you'd have to look up the specified record, and update it if it exists.
public static void Update(Customer customer, string file)
{
    List<Customer> records = null;
    using (var reader = new StreamReader(file))
    {
        using (var csv = new CsvReader(reader))
        {
            records = csv.GetRecords<Customer>().ToList();
        }
    }
    var index = records.FindIndex(x => x.ID == customer.ID);
    if (index >= 0)
    {
        records[index] = customer;
        using (var writer = new StreamWriter(file))
        {
            using (var csv = new CsvWriter(writer))
            {
                csv.WriteRecords(records);
            }
        }
    }
}
Again, writing to a temp file is advisable.
UPDATE
Actually there's a slightly better way to append if you don't want to rewrite the file. When instantiating a StreamWriter you can pass append: true, in which case it will append to the end of the file.
The small caveat is that if the file does not end with a newline but with the last field of the last record, this will append the new record to the end of that field, messing up your columns. As a workaround I've added a writer.WriteLine(); before using the CsvHelper writer.
public static void Append2(Customer customer, string file)
{
    using (var writer = new StreamWriter(file, true))
    {
        writer.WriteLine();
        using (var csv = new CsvWriter(writer))
        {
            csv.WriteRecord(customer);
        }
    }
}
If the file already ends with a newline, though, this will add an empty line. That can be countered by ignoring empty lines when you read the file.
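As a rough sketch of that (assuming an older CsvHelper version where the reader exposes a Configuration.IgnoreBlankLines flag; check the configuration API of the version you actually use):
using (var reader = new StreamReader(file))
using (var csv = new CsvReader(reader))
{
    // Skip the empty line Append2 may have introduced at the old end of file.
    csv.Configuration.IgnoreBlankLines = true;
    var records = csv.GetRecords<Customer>().ToList();
}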

I have a *.tsv file which contains 32 million records, and I need to load them and do a search operation

When I load the file it throws an 'OutOfMemoryException'. How can I load it and search efficiently?
I am using FileHelpers:
//to load the file (this reads everything into memory at once)
var passEngine = new FileHelperEngine<MyClass>();
var passList = passEngine.ReadFile("Files/plain_32m.tsv").ToList();
Or is there any other way to do it?
The code below adds data to a DataTable. It assumes the first row contains the names of the columns.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Data;
using System.IO;
namespace ConsoleApplication1
{
    class Program
    {
        const string FILENAME = @"Files/plain_32m.tsv";
        static void Main(string[] args)
        {
            int rowCount = 0;
            StreamReader reader = new StreamReader(FILENAME);
            string line = "";
            DataTable dt = new DataTable();
            while ((line = reader.ReadLine()) != null)
            {
                string[] tsv = line.Split(new char[] { '\t' }).ToArray();
                //remove any end spaces from data
                tsv = tsv.Select(x => x.Trim()).ToArray();
                if (++rowCount == 1)
                {
                    foreach (string colName in tsv)
                    {
                        dt.Columns.Add(colName, typeof(string));
                    }
                }
                else
                {
                    dt.Rows.Add(tsv);
                }
            }
        }
    }
}
You may consider approaching it in a couple of ways.
Method 1:
If it is a one-time search operation that picks only a small set of records from the large file, you can do so with a streaming approach along with LINQ to Objects. There are a number of open source libraries available to take care of it.
I'm going to show you one such library, Cinchoo ETL:
using (var p = new ChoCSVReader<MyClass>("*** Your CSV File ***")
    .WithFirstLineHeader()
    )
{
    var subset = p.Where(rec => rec.ID == 100).ToArray(); //You can apply any filter
}
Method 2:
Load the file into a database. This approach is useful if your search criteria are complex, and you can improve the search with indices etc. You can load the file with EF / BulkCopy / ADO.NET. BulkCopy is preferable for such a large file. The sample code shows how to load the file using bulk copy:
string connectionString = "*** DB Connection String ***";
using (var p = new ChoCSVReader<MyClass>("*** Your CSV File ***")
    .WithFirstLineHeader()
    )
{
    using (SqlBulkCopy bcp = new SqlBulkCopy(connectionString))
    {
        bcp.DestinationTableName = "** DB Table Name **";
        bcp.EnableStreaming = true;
        bcp.BatchSize = 10000;
        bcp.BulkCopyTimeout = 0;
        bcp.NotifyAfter = 10;
        bcp.SqlRowsCopied += delegate (object sender, SqlRowsCopiedEventArgs e)
        {
            Console.WriteLine(e.RowsCopied.ToString("#,##0") + " rows copied.");
        };
        bcp.WriteToServer(p.AsDataReader());
    }
}
Once you have loaded the file into the database, you can do the rest (creating indices, querying and filtering the data) via EF/ADO.NET etc.
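As a rough sketch of that follow-up step (the table, column and index names below are placeholders, not from the original post):
string connectionString = "*** DB Connection String ***";
using (var conn = new SqlConnection(connectionString))
{
    conn.Open();
    // One-time index on the column you search by.
    using (var cmd = new SqlCommand("CREATE INDEX IX_MyTable_ID ON dbo.MyTable(ID)", conn))
    {
        cmd.ExecuteNonQuery();
    }
    // Parameterized lookup instead of scanning 32 million rows in memory.
    using (var cmd = new SqlCommand("SELECT * FROM dbo.MyTable WHERE ID = @id", conn))
    {
        cmd.Parameters.AddWithValue("@id", 100);
        using (var dr = cmd.ExecuteReader())
        {
            while (dr.Read())
                Console.WriteLine(dr["ID"]);
        }
    }
}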
Hope it helps.
FileHelpers has a FileHelperAsyncEngine which allows you to work record by record and avoid reading or writing all the records at once. The documentation is here.
var engine = new FileHelperAsyncEngine<Customer>();
// Read
using (engine.BeginReadFile("Input.txt"))
{
    // The engine is IEnumerable
    foreach (Customer cust in engine)
    {
        // your code here
        Console.WriteLine(cust.Name);
    }
}
// Write
using (engine.BeginWriteFile("TestOut.txt"))
{
    var arrayCustomers = GetSomeMoreCustomers(); // a batch at a time
    if (arrayCustomers.Count() > 0)
    {
        foreach (Customer cust in arrayCustomers)
        {
            engine.WriteNext(cust);
        }
    }
}

How to split CSV file

"0.0.0.0,""0.255.255.255"",""ZZ"""
"1.0.0.0,""1.0.0.255"",""AU"""
"1.0.1.0,""1.0.3.255"",""CN"""
"1.0.4.0,""1.0.7.255"",""AU"""
"1.0.8.0,""1.0.15.255"",""CN"""
"1.0.16.0,""1.0.31.255"",""JP"""
"1.0.32.0,""1.0.63.255"",""CN"""
"1.0.64.0,""1.0.127.255"",""JP"""
"1.0.128.0,""1.0.255.255"",""TH"""
"1.1.0.0,""1.1.0.255"",""CN"""
"1.1.1.0,""1.1.1.255"",""AU"""
"1.1.2.0,""1.1.63.255"",""CN"""
"1.1.64.0,""1.1.127.255"",""JP"""
"1.1.128.0,""1.1.255.255"",""TH"""
In Excel it looks like this:
0.0.0.0,"0.255.255.255","ZZ"
1.0.0.0,"1.0.0.255","AU"
1.0.1.0,"1.0.3.255","CN"
1.0.4.0,"1.0.7.255","AU"
1.0.8.0,"1.0.15.255","CN"
1.0.16.0,"1.0.31.255","JP"
1.0.32.0,"1.0.63.255","CN"
1.0.64.0,"1.0.127.255","JP"
1.0.128.0,"1.0.255.255","TH"
1.1.0.0,"1.1.0.255","CN"
1.1.1.0,"1.1.1.255","AU"
1.1.2.0,"1.1.63.255","CN"
1.1.64.0,"1.1.127.255","JP"
1.1.128.0,"1.1.255.255","TH"
1.2.0.0,"1.2.2.255","CN"
1.2.3.0,"1.2.3.255","AU"
1.2.4.0,"1.2.127.255","CN"
1.2.128.0,"1.2.255.255","TH"
1.3.0.0,"1.3.255.255","CN"
1.4.0.0,"1.4.0.255","AU"
1.4.1.0,"1.4.127.255","CN"
1.4.128.0,"1.4.255.255","TH"
How can I split this CSV file?
For example, 0.0.0.0 0.255.255.255 ZZ for the first row, and how can I add it to a DataGridView with 3 columns?
You can do it the following way:
using System.IO;
static void Main(string[] args)
{
    using (var reader = new StreamReader(@"C:\test.csv"))
    {
        List<string> listA = new List<string>();
        List<string> listB = new List<string>();
        while (!reader.EndOfStream)
        {
            var line = reader.ReadLine();
            var values = line.Split(','); // or whatever delimiter you get by reading that file
            listA.Add(values[0]);
            listB.Add(values[1]);
        }
    }
}
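Since the question also asks how to show the three columns in a DataGridView, here is a small sketch that builds a DataTable instead of separate lists and binds it to the grid (the column names and the dataGridView1 control name are made up; stripping the quotes with Trim assumes the simple quoting shown above):
var table = new DataTable();
table.Columns.Add("IpFrom");
table.Columns.Add("IpTo");
table.Columns.Add("Country");
using (var reader = new StreamReader(@"C:\test.csv"))
{
    while (!reader.EndOfStream)
    {
        var line = reader.ReadLine();
        // split on commas and strip the surrounding quotes from each value
        var values = line.Split(',').Select(v => v.Trim('"')).ToArray();
        if (values.Length >= 3)
            table.Rows.Add(values[0], values[1], values[2]);
    }
}
dataGridView1.DataSource = table;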
A CSV file is either a tab-delimited or a comma-delimited file. That said, you have to read the file line by line and then separate the values in each line based on the delimiter character. The first line in a CSV file is usually the header row, which you can use to produce key-value pairs and make your collection easier to work with. For example:
Dictionary<int, Dictionary<String, String>> values = new Dictionary<int, Dictionary<String, String>>();
using (FileStream fileStream = new FileStream(@"D:\MyCSV.csv", FileMode.Open, FileAccess.Read, FileShare.Read))
{
    using (StreamReader streamReader = new StreamReader(fileStream))
    {
        //You can skip this line if there is no header
        // Then instead of Dictionary<String, String> you use List<String>
        var headers = streamReader.ReadLine().Split(',');
        String[] line = null;
        int lineNumber = 1;
        while (!streamReader.EndOfStream)
        {
            line = streamReader.ReadLine().Split(',');
            if (line.Length == headers.Length)
            {
                var temp = new Dictionary<String, String>();
                for (int i = 0; i < headers.Length; i++)
                {
                    // You can remove the '"' character with line[i].Replace("\"", "") or by using the Substring method
                    temp.Add(headers[i], line[i]);
                }
                values.Add(lineNumber, temp);
            }
            lineNumber++;
        }
    }
}
If the data structure of your CSV is constant and will not change in the future, you can develop a strongly typed data model and get rid of the Dictionary type. This approach will be more elegant and more efficient.
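For instance, a minimal strongly typed sketch for the IP-range data in this question (the class and property names are made up for illustration):
public class IpRange
{
    public string IpFrom { get; set; }
    public string IpTo { get; set; }
    public string Country { get; set; }
}

var ranges = new List<IpRange>();
foreach (var rawLine in File.ReadLines(@"D:\MyCSV.csv"))
{
    // split on commas and strip the surrounding quotes
    var parts = rawLine.Split(',').Select(p => p.Trim('"')).ToArray();
    if (parts.Length == 3)
        ranges.Add(new IpRange { IpFrom = parts[0], IpTo = parts[1], Country = parts[2] });
}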
First of all, your CSV lines are surrounded by quotes. Is that a copy/paste mistake? If not, you will need to sanitize the file into valid CSV.
You can try Cinchoo ETL, an open source library, to load the CSV file into a DataTable; then you can assign it to your DataGridView source.
I'll show you how to handle both cases.
Valid CSV: (test.csv)
0.0.0.0,"0.255.255.255","ZZ"
1.0.0.0,"1.0.0.255","AU"
1.0.1.0,"1.0.3.255","CN"
1.0.4.0,"1.0.7.255","AU"
1.0.8.0,"1.0.15.255","CN"
1.0.16.0,"1.0.31.255","JP"
1.0.32.0,"1.0.63.255","CN"
1.0.64.0,"1.0.127.255","JP"
1.0.128.0,"1.0.255.255","TH"
1.1.0.0,"1.1.0.255","CN"
1.1.1.0,"1.1.1.255","AU"
1.1.2.0,"1.1.63.255","CN"
1.1.64.0,"1.1.127.255","JP"
1.1.128.0,"1.1.255.255","TH"
Read CSV:
using (var p = new ChoCSVReader("test.csv"))
{
var dt = p.AsDataTable();
//Assign dt to DataGridView
}
Next approach
Invalid CSV: (test.csv)
"0.0.0.0,""0.255.255.255"",""ZZ"""
"1.0.0.0,""1.0.0.255"",""AU"""
"1.0.1.0,""1.0.3.255"",""CN"""
"1.0.4.0,""1.0.7.255"",""AU"""
"1.0.8.0,""1.0.15.255"",""CN"""
"1.0.16.0,""1.0.31.255"",""JP"""
"1.0.32.0,""1.0.63.255"",""CN"""
"1.0.64.0,""1.0.127.255"",""JP"""
"1.0.128.0,""1.0.255.255"",""TH"""
"1.1.0.0,""1.1.0.255"",""CN"""
"1.1.1.0,""1.1.1.255"",""AU"""
"1.1.2.0,""1.1.63.255"",""CN"""
"1.1.64.0,""1.1.127.255"",""JP"""
"1.1.128.0,""1.1.255.255"",""TH"""
Read CSV:
using (var p = new ChoCSVReader("Sample6.csv"))
{
p.SanitizeLine += (o, e) =>
{
string line = e.Line as string;
if (line != null)
{
line = line.Substring(1, line.Length - 2);
line = line.Replace(#"""""", #"""");
}
e.Line - line;
};
var dt = p.AsDataTable();
//Assign dt to DataGridView
}
Hope it helps.

C# Reading CSV to DataTable and Invoke Rows/Columns

I am currently working on a small project and got stuck on a problem I cannot manage to solve...
I have multiple .CSV files I want to read; they all have the same structure, just with different values.
Header1;Value1;Info1
Header2;Value2;Info2
Header3;Value3;Info3
While reading the first file I need to create the headers. The problem is they are not split into columns but into rows (as you can see above, Header1-Header3).
Then it needs to read Value1 - Value3 (they are listed in the 2nd column), and on top of that I need to create another header -> Header4 with the data of "Info2", which is always placed in column 3 and row 2 (the other values of column 3 I can ignore).
So the outcome after the first file should look like this:
Header1;Header2;Header3;Header4;
Value1;Value2;Value3;Info2;
And after multiple files it should look like this:
Header1;Header2;Header3;Header4;
Value1;Value2;Value3;Value4;
Value1b;Value2b;Value3b;Value4b;
Value1c;Value2c;Value3c;Value4c;
I tried it with OleDb but I get the error "missing ISAM", which I can't manage to fix. The code I used is the following:
public DataTable ReadCsv(string fileName)
{
    DataTable dt = new DataTable("Data");
    /* using (OleDbConnection cn = new OleDbConnection("Provider=Microsoft.Jet.OLEDB.4.0;Data Source=\"" +
        Path.GetDirectoryName(fileName) + "\";Extendet Properties ='text;HDR=yes;FMT=Delimited(,)';"))
    */
    using (OleDbConnection cn = new OleDbConnection("Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" +
        Path.GetDirectoryName(fileName) + ";Extendet Properties ='text;HDR=yes;FMT=Delimited(,)';"))
    {
        using (OleDbCommand cmd = new OleDbCommand(string.Format("select *from [{0}]", new FileInfo(fileName).Name, cn)))
        {
            cn.Open();
            using (OleDbDataAdapter adapter = new OleDbDataAdapter(cmd))
            {
                adapter.Fill(dt);
            }
        }
    }
    return dt;
}
Another attempt I made was using StreamReader. But the headers end up in the wrong place, and I don't know how to change this and do it for every file. The code I tried is the following:
public static DataTable ReadCsvFilee(string path)
{
    DataTable oDataTable = new DataTable();
    var fileNames = Directory.GetFiles(path);
    foreach (var fileName in fileNames)
    {
        //initialising a StreamReader type variable and passing the file location
        StreamReader oStreamReader = new StreamReader(fileName);
        // CONTROLS WHETHER WE SKIP A ROW OR NOT
        int RowCount = 0;
        // CONTROLS WHETHER WE CREATE COLUMNS OR NOT
        bool hasColumns = false;
        string[] ColumnNames = null;
        string[] oStreamDataValues = null;
        //using a while loop, read the stream data till the end
        while (!oStreamReader.EndOfStream)
        {
            String oStreamRowData = oStreamReader.ReadLine().Trim();
            if (oStreamRowData.Length > 0)
            {
                oStreamDataValues = oStreamRowData.Split(';');
                //Because the first row contains column names, we populate the column
                //names by reading the first row; this will be true only once
                // CHANGE TO CHECK FOR COLUMNS CREATED
                if (!hasColumns)
                {
                    ColumnNames = oStreamRowData.Split(';');
                    //using foreach, loop through all the column names
                    foreach (string csvcolumn in ColumnNames)
                    {
                        DataColumn oDataColumn = new DataColumn(csvcolumn.ToUpper(), typeof(string));
                        //setting the default value of string.Empty on the newly created column
                        oDataColumn.DefaultValue = string.Empty;
                        //adding the newly created column to the table
                        oDataTable.Columns.Add(oDataColumn);
                    }
                    // SET COLUMNS CREATED
                    hasColumns = true;
                    // SET RowCount TO 0 SO WE KNOW TO SKIP THE COLUMNS LINE
                    RowCount = 0;
                }
                else
                {
                    // IF RowCount IS 0 THEN SKIP COLUMN LINE
                    if (RowCount++ == 0) continue;
                    //creates a new DataRow with the same schema as the oDataTable
                    DataRow oDataRow = oDataTable.NewRow();
                    //using a for loop, go through all the column names
                    for (int i = 0; i < ColumnNames.Length; i++)
                    {
                        oDataRow[ColumnNames[i]] = oStreamDataValues[i] == null ? string.Empty : oStreamDataValues[i].ToString();
                    }
                    //adding the newly created row with data to the oDataTable
                    oDataTable.Rows.Add(oDataRow);
                }
            }
        }
        //close the oStreamReader object
        oStreamReader.Close();
        //release all the resources used by the oStreamReader object
        oStreamReader.Dispose();
    }
    return oDataTable;
}
I am thankful for everyone who is willing to help. And Thanks for reading this far!
Sincerely yours
If I understood you right, a strict parser for this layout would look like this:
string OpenAndParse(string filename, bool firstFile = false)
{
    var lines = File.ReadAllLines(filename);
    var parsed = lines.Select(l => l.Split(';')).ToArray();
    var header = $"{parsed[0][0]};{parsed[1][0]};{parsed[2][0]};{parsed[1][0]}\n";
    var data = $"{parsed[0][1]};{parsed[1][1]};{parsed[2][1]};{parsed[1][2]}\n";
    return firstFile
        ? $"{header}{data}"
        : $"{data}";
}
It would return this for the first file:
Header1;Header2;Header3;Header2
Value1;Value2;Value3;Value4
and this if it is not the first file:
Value1;Value2;Value3;Value4
If I am correct, the rest is about running this against a list of files and joining the results into an output file.
EDIT: Against a directory:
void ProcessFiles(string folderName, string outputFileName)
{
    bool firstFile = true;
    foreach (var f in Directory.GetFiles(folderName))
    {
        File.AppendAllText(outputFileName, OpenAndParse(f, firstFile));
        firstFile = false;
    }
}
Note: I missed that you want a DataTable and not an output file. In that case you could simply create a list, put the results into it, and make the list the data source for your grid (then why would you use semicolons in there? Probably all you need is to attach the array values to a list).
(Adding as another answer just to make it uncluttered)
void ProcessMyFiles(string folderName)
{
    List<MyData> d = new List<MyData>();
    var files = Directory.GetFiles(folderName);
    foreach (var file in files)
    {
        OpenAndParse(file, d);
    }
    string[] headers = GetHeaders(files[0]);
    DataGridView dgv = new DataGridView { Dock = DockStyle.Fill };
    dgv.DataSource = d;
    dgv.ColumnAdded += (sender, e) => { e.Column.HeaderText = headers[e.Column.Index]; };
    Form f = new Form();
    f.Controls.Add(dgv);
    f.Show();
}
string[] GetHeaders(string filename)
{
    var lines = File.ReadAllLines(filename);
    var parsed = lines.Select(l => l.Split(';')).ToArray();
    return new string[] { parsed[0][0], parsed[1][0], parsed[2][0], parsed[1][0] };
}
void OpenAndParse(string filename, List<MyData> d)
{
    var lines = File.ReadAllLines(filename);
    var parsed = lines.Select(l => l.Split(';')).ToArray();
    var data = new MyData
    {
        Col1 = parsed[0][1],
        Col2 = parsed[1][1],
        Col3 = parsed[2][1],
        Col4 = parsed[1][2]
    };
    d.Add(data);
}
public class MyData
{
    public string Col1 { get; set; }
    public string Col2 { get; set; }
    public string Col3 { get; set; }
    public string Col4 { get; set; }
}
I don't know if this is the best way to do this, but what I would have done in your case is rewrite the CSVs the conventional way while reading all the files, then create a stream containing the new CSV.
It would look something like this:
var csv = new StringBuilder();
csv.AppendLine("Header1;Header2;Header3;Header4");
foreach (var item in file)
{
    // use the same separator as the header line
    var newLine = string.Format("{0};{1};{2};{3}", item.value1, item.value2, item.value3, item.value4);
    csv.AppendLine(newLine);
}
//Create a stream over the rebuilt CSV
MemoryStream stream = new MemoryStream(Encoding.UTF8.GetBytes(csv.ToString()));
StreamReader reader = new StreamReader(stream);
//Fill your data table here with your values
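One way to fill in that last step, reading the rebuilt CSV back out of the stream into a DataTable (a sketch; it assumes the semicolon-separated layout built above):
var table = new DataTable();
bool isHeader = true;
string line;
while ((line = reader.ReadLine()) != null)
{
    var parts = line.Split(';');
    if (isHeader)
    {
        // the first line carries Header1..Header4
        foreach (var name in parts)
            table.Columns.Add(name);
        isHeader = false;
    }
    else
    {
        table.Rows.Add(parts);
    }
}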
Hope this will help.

table adapter returning incorrect results

I have an application which takes in a CSV file and returns certain rows.
The code that takes the CSV and decides which data to send to the DB is shown here:
List<string[]> Results = sm.parseCSV2(ofd.FileName, de).Where(x => x.Length > 5).ToList();
foreach (string[] item2 in Results)
{
    objSqlCommands.sqlCommandInsertorUpdate2("INSERT", Results);//laClient[0]);
}
with my parsing code here
public List<string[]> parseCSV2(string path, char[] delim)
{
    // Initialise return value
    List<string[]> parsedData = new List<string[]>();
    try
    {
        // With StreamReader, read the file located at the given path
        using (StreamReader readFile = new StreamReader(path))
        {
            string line; // current line
            string[] row; // array row
            // Go through the file until we reach the end
            while ((line = readFile.ReadLine()) != null)
            {
                row = line.Split(delim); // array row equals values split by the delimiter
                parsedData.Add(row); // add this to return value <List>
            }
        }
    }
    catch (Exception e)
    {
        MessageBox.Show(e.Message);
    }
    return parsedData; // return list
}
alongside my SQL code:
objConnections.executeSQL(connection,
    "INSERT INTO generic.Client(ClientName) VALUES('" + text + "')");
I am then calling the table adapter:
//Refreshes the Client table on display from the
this.clientTableAdapter.Fill(this.CalcDataSet.Client);
//update the view
dgvClientlst.Update();
However, the data being returned is shown below:
System.Collections.Generic.List`1[System.String[]]
It has been suggested that my query is actually printing the list's ToString(), but as my code isn't doing that I'm unsure what the problem is. Any help much appreciated.
foreach (string[] item2 in Results)
{
    objSqlCommands.sqlCommandInsertorUpdate2("INSERT", item2);//You were mixed up with Results here
}
I think your code might need to be this (I'm not sure whether your objSqlCommands.sqlCommandInsertorUpdate2 can handle a string[] being passed in):
foreach (string[] item2 in Results)
{
    foreach (string item in item2)
        objSqlCommands.sqlCommandInsertorUpdate2("INSERT", item);
}
