Reading CSV File Using Jet - TabDelimited Does Not Work! - c#

This has been killing me - I have a massive file that I need to read in as a DataTable.
After a lot of messing about I am using this:
using (OleDbConnection connection = new OleDbConnection(connString))
{
using (OleDbCommand command = new OleDbCommand(sql, connection))
{
using (OleDbDataAdapter adapter = new OleDbDataAdapter(command))
{
dataTable = new DataTable();
dataTable.Locale = CultureInfo.CurrentCulture;
adapter.Fill(dataTable);
}
}
}
which works if the text file is comma-separated but does not work if it is tab-delimited. Can anyone please help?
My connection string looks like this:
string connString = @"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + pathOnly + @";Extended Properties='text;HDR=YES'";
I've tried setting the FMT property, with no luck.
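For context on the FMT attempt: the Jet text driver generally takes the delimiter from a schema.ini file in the same folder as the data file (or from the registry), and FMT=TabDelimited in the connection string is commonly reported to be ignored. The sketch below writes such a file. JetTextSchema and WriteTabSchema are hypothetical names, and it assumes you are allowed to create (or overwrite) schema.ini in that folder.

```csharp
using System.IO;

// Hypothetical helper: writes schema.ini next to the data file so the Jet
// text driver reads that file as tab-delimited with a header row.
// Note: this overwrites any existing schema.ini in that folder.
public static class JetTextSchema
{
    public static string WriteTabSchema(string dataFilePath)
    {
        string folder = Path.GetDirectoryName(Path.GetFullPath(dataFilePath));
        string contents =
            "[" + Path.GetFileName(dataFilePath) + "]\r\n" + // section name = data file name
            "Format=TabDelimited\r\n" +                       // delimiter for this file
            "ColNameHeader=True\r\n";                         // first row holds column names
        string schemaPath = Path.Combine(folder, "schema.ini");
        File.WriteAllText(schemaPath, contents);
        return schemaPath;
    }
}
```

With the file in place, the original connString (Data Source = the folder) and a query such as "SELECT * FROM [" + fileName + "]" should return the tab-separated columns; fileName here is a placeholder for your data file's name.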

Here is a good library; use it:
http://www.codeproject.com/KB/database/CsvReader.aspx
Here is the code that uses the library:
string data;
using (TextReader tr = new StreamReader(HttpContext.Current.Server.MapPath(Filename)))
{
data = tr.ReadToEnd();
}
For comma-delimited data:
CachedCsvReader cr = new CachedCsvReader(new StringReader(data), true);
For tab-delimited data:
CachedCsvReader cr = new CachedCsvReader(new StringReader(data), true, '\t');
And here is how you can load it into a DataTable:
DataTable dt = new DataTable();
dt.Load(cr);
Hope you find it helpful. Thanks

Manually: you can use the String.Split() method to split each line of your file. Here is an example of what I use in my code. In this example, I read the data line by line and split it, then place the values directly into columns (the first line, the header, is skipped).
//Open and read the file; the using block closes it even if an exception is thrown
using (System.IO.StreamReader sr = new System.IO.StreamReader("myfilename"))
{
//read (and skip) the header line
string line = sr.ReadLine();
string[] colVal;
//clear of previous data
//myDataTable.Clear();
//for each record insert it into a row
while (!sr.EndOfStream)
{
line = sr.ReadLine();
colVal = line.Split('\t');
DataRow dataRow = myDataTable.NewRow();
//associate values with the columns
dataRow["col1"] = colVal[0];
dataRow["col2"] = colVal[1];
dataRow["col3"] = colVal[2];
//add the row to the table
myDataTable.Rows.Add(dataRow);
}
}
//bind the data set to the grid view
BindingSource bs = new BindingSource();
bs.DataSource = myDataSet;
bs.DataMember = myDataTable.TableName;
myGridView.DataSource = bs;
You could modify it to loop over the columns if you have many and they are numbered. Also, I recommend checking the integrity first, by checking that the number of columns read is correct.
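Following that suggestion, here is a sketch that loops over the columns instead of naming each one, and checks every line's field count against the header. TabFileLoader is a hypothetical name; it assumes the first line holds unique column names and that no field contains a quoted tab.

```csharp
using System.Data;
using System.IO;

public static class TabFileLoader
{
    // Reads tab-delimited text whose first line holds the column names.
    // Throws if any data line has a different number of fields than the header.
    public static DataTable Load(TextReader reader)
    {
        var table = new DataTable();
        string header = reader.ReadLine();
        if (header == null) return table;                 // empty input

        foreach (string name in header.Split('\t'))
            table.Columns.Add(name, typeof(string));

        string line;
        int lineNumber = 1;
        while ((line = reader.ReadLine()) != null)
        {
            lineNumber++;
            if (line.Length == 0) continue;               // ignore blank lines
            string[] values = line.Split('\t');
            if (values.Length != table.Columns.Count)     // integrity check
                throw new InvalidDataException(
                    $"Line {lineNumber}: expected {table.Columns.Count} fields, found {values.Length}.");

            DataRow row = table.NewRow();
            for (int i = 0; i < values.Length; i++)       // loop instead of naming each column
                row[i] = values[i];
            table.Rows.Add(row);
        }
        return table;
    }
}
```

Usage: using (var sr = new StreamReader("myfilename")) { myGridView.DataSource = TabFileLoader.Load(sr); }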

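This should work: (from http://www.hotblue.com/article0000.aspx?a=0006)
To make it tab delimited, just replace the comma part with:
if ((postdata || !quoted) && (c == ',' || c == '\t'))
Note that this condition does not appear in the regex-based code below; there, the equivalent change is to use \t in place of the comma both in the field character class [^\",\\r\\n] and in the separator group (,|...) of the pattern.
using System;
using System.Collections;
using System.Data;
using System.IO;
using System.Text.RegularExpressions;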
public DataTable ParseCSV(string inputString) {
DataTable dt=new DataTable();
// declare the Regular Expression that will match versus the input string
Regex re=new Regex("((?<field>[^\",\\r\\n]+)|\"(?<field>([^\"]|\"\")+)\")(,|(?<rowbreak>\\r\\n|\\n|$))");
ArrayList colArray=new ArrayList();
ArrayList rowArray=new ArrayList();
int colCount=0;
int maxColCount=0;
string rowbreak="";
string field="";
MatchCollection mc=re.Matches(inputString);
foreach(Match m in mc) {
// retrieve the field and replace two double-quotes with a single double-quote
field=m.Result("${field}").Replace("\"\"","\"");
rowbreak=m.Result("${rowbreak}");
if (field.Length > 0) {
colArray.Add(field);
colCount++;
}
if (rowbreak.Length > 0) {
// add the column array to the row Array List
rowArray.Add(colArray.ToArray());
// create a new Array List to hold the field values
colArray=new ArrayList();
if (colCount > maxColCount)
maxColCount=colCount;
colCount=0;
}
}
if (rowbreak.Length == 0) {
// this is executed when the last line doesn't
// end with a line break
rowArray.Add(colArray.ToArray());
if (colCount > maxColCount)
maxColCount=colCount;
}
// create the columns for the table
for(int i=0; i < maxColCount; i++)
dt.Columns.Add(String.Format("col{0:000}",i));
// convert the row Array List into an Array object for easier access
Array ra=rowArray.ToArray();
for(int i=0; i < ra.Length; i++) {
// create a new DataRow
DataRow dr=dt.NewRow();
// convert the column Array List into an Array object for easier access
Array ca=(Array)(ra.GetValue(i));
// add each field into the new DataRow
for(int j=0; j < ca.Length; j++)
dr[j]=ca.GetValue(j);
// add the new DataRow to the DataTable
dt.Rows.Add(dr);
}
// in case no data was parsed, create a single column
if (dt.Columns.Count == 0)
dt.Columns.Add("NoData");
return dt;
}
Now that we have a parser for converting a string into a DataTable, all we need now is a function that will read the content from a CSV file and pass it to our ParseCSV function:
public DataTable ParseCSVFile(string path) {
string inputString="";
// check that the file exists before opening it
if (File.Exists(path)) {
StreamReader sr = new StreamReader(path);
inputString = sr.ReadToEnd();
sr.Close();
}
return ParseCSV(inputString);
}
And now you can easily fill a DataGrid with data coming off the CSV file:
protected System.Web.UI.WebControls.DataGrid DataGrid1;
private void Page_Load(object sender, System.EventArgs e) {
// call the parser
DataTable dt=ParseCSVFile(Server.MapPath("./demo.csv"));
// bind the resulting DataTable to a DataGrid Web Control
DataGrid1.DataSource=dt;
DataGrid1.DataBind();
}
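Instead of hard-coding the comma, the pattern itself can take the delimiter as a parameter. This sketch keeps the same regex as ParseCSV but substitutes the delimiter for both commas. DelimitedRegex is a hypothetical name, it assumes a simple delimiter such as ',', ';' or a tab, and, like the original pattern, it silently drops empty fields (two delimiters in a row) because the field group requires at least one character.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class DelimitedRegex
{
    // Same pattern as ParseCSV, with the delimiter in place of both commas.
    public static Regex Build(char delimiter)
    {
        string d = Regex.Escape(delimiter.ToString());   // a tab becomes \t
        return new Regex("((?<field>[^\"" + d + "\\r\\n]+)|\"(?<field>([^\"]|\"\")+)\")(" +
                         d + "|(?<rowbreak>\\r\\n|\\n|$))");
    }

    // Returns one string[] per row, following the same logic as ParseCSV.
    public static List<string[]> Parse(string input, char delimiter)
    {
        var rows = new List<string[]>();
        var fields = new List<string>();
        string rowbreak = "";
        foreach (Match m in Build(delimiter).Matches(input))
        {
            // unescape doubled quotes inside quoted fields
            string field = m.Groups["field"].Value.Replace("\"\"", "\"");
            rowbreak = m.Groups["rowbreak"].Value;
            if (field.Length > 0) fields.Add(field);
            if (rowbreak.Length > 0)
            {
                rows.Add(fields.ToArray());
                fields = new List<string>();
            }
        }
        // last line without a trailing line break
        if (rowbreak.Length == 0 && fields.Count > 0) rows.Add(fields.ToArray());
        return rows;
    }
}
```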

Related

Import Part of CSV to datagridview

What I have is a CSV file that I have imported into a DataGridView.
I am now looking for a way to import only the columns with the headers # and Delay, not all the data in the CSV, so any help on this would be appreciated.
Here is the Code I have thus far:
private void button1_Click(object sender, EventArgs e)
{
DataTable dt = new DataTable();
DialogResult result = openFileDialog1.ShowDialog();
if (result == DialogResult.OK) // Test result.
{
String Fname = openFileDialog1.FileName;
//String Sname = "export";
string[] raw_text = System.IO.File.ReadAllLines(Fname);
string[] data_col = null;
int x = 0;
foreach (string text_line in raw_text)
{
data_col = text_line.Split(';');
if (x == 0)
{
for (int i = 0; i < data_col.Count(); i++)
{
dt.Columns.Add(data_col[i]);
}
x++;
}
else
{
dt.Rows.Add(data_col);
}
}
dataGridView1.DataSource = dt;
}
}
When I read from CSV files, I create a list of values that I want for each row and use that list as the basis for my INSERT statement to the database.
I know where to find the data I want in the CSV file so I specifically target those items while I'm building my list of parameters for the query.
See the code below:
// Read the file content from the function parameter.
string content = System.Text.Encoding.ASCII.GetString(bytes);
// Split the content into an array where each array item is a line for
// each row of data.
// The Replace simply removes the CarriageReturn LineFeed characters from
// the source text and replaces them with a Pipe character (`|`)
// and then does the split from that character.
// This is just personal preference to do it this way
string[] data = content.Replace("\r\n", "|").Split('|');
// Loop through each row and extract the data you want.
// Note that each value is in a fixed position in the row.
foreach (string row in data)
{
if (!String.IsNullOrEmpty(row))
{
string[] cols = row.Split(';');
List<MySqlParameter> args = new List<MySqlParameter>();
args.Add(new MySqlParameter("@sid", Session["storeid"]));
args.Add(new MySqlParameter("@name", cols[0]));
args.Add(new MySqlParameter("@con", cols[3]));
try
{
// Insert the data to the database.
}
catch (Exception ex)
{
// Report an error.
}
}
}
In the same way, you could build your list/dataset/whatever as a data source for your datagridview. I would build a table.
Here's a mockup (I haven't got time to test it right now but it should get you on the right track).
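DataTable table = new DataTable();
table.Columns.Add("#");
table.Columns.Add("Delay");
// skip the header line, then split each line into its fields
foreach (var line in raw_text.Skip(1))
{
string[] fields = line.Split(';');
DataRow row = table.NewRow();
row[0] = fields[0]; // The # value you want (adjust the index to its position in the file).
row[1] = fields[1]; // The Delay value you want (adjust the index to its position in the file).
table.Rows.Add(row);
}
dataGridView1.DataSource = table;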
Using TextFieldParser can make handling CSV input less brittle:
// add this using statement for TextFieldParser - needs reference to Microsoft.VisualBasic assembly
using Microsoft.VisualBasic.FileIO;
...
// TextFieldParser implements IDisposable so you can let a using block take care of opening and closing
using (TextFieldParser parser = new TextFieldParser(Fname))
{
// configure your parser to your needs
parser.TextFieldType = FieldType.Delimited;
parser.Delimiters = new string[] { ";" };
parser.HasFieldsEnclosedInQuotes = false; // set to true if your data comes with quotes (...;"text value";"another";...) and the parser strips them, no messy code needed
// read the first line with your headers
string[] fields = parser.ReadFields();
// add the desired headers with the desired data type
dt.Columns.Add(fields[2], typeof(string));
dt.Columns.Add(fields[4], typeof(string));
// read the rest of the lines from your file
while (!parser.EndOfData)
{
// all fields from one line
string[] line = parser.ReadFields();
// create a new row <-- this is missing in your code
DataRow row = dt.NewRow();
// put data values; cast if needed - this example uses string type columns
row[0] = line[2];
row[1] = line[4];
// add the newly created and filled row
dt.Rows.Add(row);
}
}
// assign to DGV
this.dataGridView1.DataSource = dt;
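Combining the ideas above (target only the values you need, but look them up from the header), this sketch finds the positions of the # and Delay headers instead of hard-coding indexes. ColumnPicker is a hypothetical helper; it splits naively on the separator, so it assumes no quoted ';' inside values.

```csharp
using System;
using System.Data;
using System.IO;

public static class ColumnPicker
{
    // Loads only the named columns from separator-delimited text
    // whose first line is the header.
    public static DataTable Load(TextReader reader, char separator, params string[] wanted)
    {
        var table = new DataTable();
        string headerLine = reader.ReadLine();
        if (headerLine == null) return table;
        string[] headers = Array.ConvertAll(headerLine.Split(separator), h => h.Trim());

        // Find each wanted header's position once, instead of using magic indexes.
        int[] positions = new int[wanted.Length];
        for (int i = 0; i < wanted.Length; i++)
        {
            positions[i] = Array.IndexOf(headers, wanted[i]);
            if (positions[i] < 0)
                throw new ArgumentException("Header not found: " + wanted[i]);
            table.Columns.Add(wanted[i], typeof(string));
        }

        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (line.Length == 0) continue;               // skip blank lines
            string[] fields = line.Split(separator);
            DataRow row = table.NewRow();
            for (int i = 0; i < positions.Length; i++)    // short lines get DBNull
                row[i] = positions[i] < fields.Length ? (object)fields[positions[i]] : DBNull.Value;
            table.Rows.Add(row);
        }
        return table;
    }
}
```

Usage: using (var sr = new System.IO.StreamReader(Fname)) { dataGridView1.DataSource = ColumnPicker.Load(sr, ';', "#", "Delay"); }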

How to copy 1 column from a CSV file to a SQL database?

I have a CSV file, but the file I'm using does not have comma separators in it.
How can I copy this data into my database? The CSV file, and thus the database table, should have 1 column.
This is what I have tried so far:
System.Data.DataTable csvData = new System.Data.DataTable();
try
{
using (TextFieldParser csvReader = new TextFieldParser(csv_file_path, Encoding.GetEncoding("windows-1250"))) //windows-1250 is the correct character encoding for European characters
{
csvReader.SetDelimiters(new string[] { "," }); //change this maybe to something???
csvReader.HasFieldsEnclosedInQuotes = true;
string[] colFields = csvReader.ReadFields();
foreach (string column in colFields)
{
DataColumn datecolumn = new DataColumn(column);
datecolumn.AllowDBNull = true;
csvData.Columns.Add(datecolumn);
}
while (!csvReader.EndOfData)
{
string[] fieldData = csvReader.ReadFields();
//Making empty value as null
for (int i = 0; i < fieldData.Length; i++)
{
if (fieldData[i] == "")
{
fieldData[i] = null;
}
}
csvData.Rows.Add(fieldData); // it return an error here saying that: `System.ArgumentException: Input array is longer than the number of columns in this table`
}
}
}
catch (Exception ex)
{
MessageBox.Show(ex.ToString());
}
InsertDataIntoSQLServerUsingSQLBulkCopy(csvData, tablenaam);
return csvData;
So an example for the CSV would be:
test123
test45,6
test789
And in my database would be the exact same values.
EDIT: I read the comment about the actual delimiters, so I've updated the code below. It is not pretty, but it should give you a starting point.
Why not read the file as a simple text file, one line at a time, and parse the expected syntax?
Something like this (not tested, may not compile):
string line;
System.Data.DataTable csvData = new System.Data.DataTable();
csvData.Columns.Add("OnlyColumn", typeof(String));
System.IO.StreamReader file = new System.IO.StreamReader("c:\\test.txt");
while((line = file.ReadLine()) != null)
{
if(line.StartsWith("-"))
continue;
DataRow newRow = csvData.NewRow();
newRow["OnlyColumn"] = line.Split('|')[1].Trim();
csvData.Rows.Add(newRow);
}
file.Close();
InsertDataIntoSQLServerUsingSQLBulkCopy(csvData, tablenaam);
Maybe you need to use a new line as the delimiter, not a comma. I think each new value is on a new line?
I think you can just use StreamReader.ReadLine() for this.
You should check out this link for an explanation from Microsoft.
https://msdn.microsoft.com/en-us/library/system.io.streamreader.readline(v=vs.110).aspx
So something like this should do it:
using (StreamReader sr = new StreamReader(csv_file_path))
{
while (sr.Peek() >= 0)
{
csvData.Rows.Add(new string[] {sr.ReadLine()});
}
}
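A self-contained version of the line-per-value idea: each non-empty line becomes one row of a single-column DataTable, so a value like test45,6 stays intact instead of being split at the comma. SingleColumnLoader is a hypothetical name; the resulting table could then be passed to the asker's InsertDataIntoSQLServerUsingSQLBulkCopy.

```csharp
using System.Data;
using System.IO;

public static class SingleColumnLoader
{
    // Treats every non-empty line as one value; commas are kept as data.
    public static DataTable Load(TextReader reader, string columnName)
    {
        var table = new DataTable();
        table.Columns.Add(columnName, typeof(string));
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (line.Length == 0) continue;   // skip blank lines
            table.Rows.Add(line);             // one field per row
        }
        return table;
    }
}
```

To keep the windows-1250 encoding, pass new StreamReader(csv_file_path, Encoding.GetEncoding("windows-1250")) as the reader; on .NET Core and .NET 5+ you may first need Encoding.RegisterProvider(CodePagesEncodingProvider.Instance) for that code page to be available.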

How to read excel file in asp.net

I am using the EPPlus library to upload data from Excel files. The code I am using works perfectly for Excel files with a standard layout, i.e. the first row holds the column names and all the remaining rows hold the data for those columns. But nowadays I regularly receive Excel files with a different structure that I am not able to read,
like the one described below.
From the 3rd row I want only Region and Location Id and their values. Then the 7th row holds column names and rows 8 to 15 are their values. Finally, the 17th row holds column names for rows 18 to 20. How can I load each of these blocks into a separate DataTable?
The code I used is shown below.
I created an extension method
public static DataSet Exceltotable(this string path)
{
DataSet ds = null;
using (var pck = new OfficeOpenXml.ExcelPackage())
{
try
{
using (var stream = File.Open(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
{
pck.Load(stream);
}
ds = new DataSet();
var wss = pck.Workbook.Worksheets;
////////////////////////////////////
//Application app = new Application();
//app.Visible = true;
//app.Workbooks.Add("");
//app.Workbooks.Add(@"c:\MyWork\WorkBook1.xls");
//app.Workbooks.Add(@"c:\MyWork\WorkBook2.xls");
//for (int i = 2; i <= app.Workbooks.Count; i++)
//{
// for (int j = 1; j <= app.Workbooks[i].Worksheets.Count; j++)
// {
// Worksheet ws = app.Workbooks[i].Worksheets[j];
// ws.Copy(app.Workbooks[1].Worksheets[1]);
// }
//}
///////////////////////////////////////////////////
//for(int s=0;s<5;s++)
//{
foreach (var ws in wss)
{
System.Data.DataTable tbl = new System.Data.DataTable();
bool hasHeader = true; // adjust it accordingly( i've mentioned that this is a simple approach)
string ErrorMessage = string.Empty;
foreach (var firstRowCell in ws.Cells[1, 1, 1, ws.Dimension.End.Column])
{
tbl.Columns.Add(hasHeader ? firstRowCell.Text : string.Format("Column {0}", firstRowCell.Start.Column));
}
var startRow = hasHeader ? 2 : 1;
for (var rowNum = startRow; rowNum <= ws.Dimension.End.Row; rowNum++)
{
var wsRow = ws.Cells[rowNum, 1, rowNum, ws.Dimension.End.Column];
var row = tbl.NewRow();
foreach (var cell in wsRow)
{
//modifed by faras
if (cell.Text != null)
{
row[cell.Start.Column - 1] = cell.Text;
}
}
tbl.Rows.Add(row);
tbl.TableName = ws.Name;
}
DataTable dt = RemoveEmptyRows(tbl);
ds.Tables.Add(dt);
}
}
catch (Exception exp)
{
}
return ds;
}
}
If you're providing the template for users to upload, you can mitigate this some by using named ranges in your spreadsheet. That's a good idea anyway when programmatically working with Excel because it helps when you modify your own spreadsheet, not just when the user does.
You probably know how to name a range already (in Excel: Formulas > Define Name).
When you're working with the spreadsheet in code you can get a reference to the range using [yourworkbook].Names["yourNamedRange"]. If it's just a single cell and you need to reference the row or column index you can use .Start.Row or .Start.Column.
I add named ranges for anything - cells containing particular values, columns, header rows, rows where sets of data begin. If I need row or column indexes I assign useful variable names. That protects you from having all sorts of "magic numbers" in your spreadsheet. You (or your users) can move quite a bit around without breaking anything.
If they modify the structure too much then it won't work. You can also use protection on the workbook and worksheet to ensure that they can't accidentally modify the structure - tabs, rows, columns.
This is loosely taken from a test I was working with last weekend when I was learning this. It was just a "hello world" so I wasn't trying to make it all streamlined and perfect. (I was working on populating a spreadsheet, not reading one, so I'm just learning the properties as I go.)
// Open the workbook (needs: using System.IO; using OfficeOpenXml;)
using (var package = new ExcelPackage(new FileInfo("PriceQuoteTemplate.xlsx")))
{
// Get the worksheet I'm looking for
var quoteSheet = package.Workbook.Worksheets["Quote"];
//If I wanted to get the text from one named range
var cellText = quoteSheet.Workbook.Names["myNamedRange"].Text;
//If I wanted to get the cell's value as some other type
var cellValue = quoteSheet.Workbook.Names["myNamedRange"].GetValue<int>();
//If I had a named range and I wanted to loop through the rows and get
//values from certain columns
var myRange = quoteSheet.Workbook.Names["rangeContainingRows"];
//This is a named range used to mark a column. So instead of using a
//magic number, I'll read from whatever column has this named range.
var someColumn = quoteSheet.Workbook.Names["columnLabel"].Start.Column;
for (var rowNumber = myRange.Start.Row; rowNumber < myRange.Start.Row + myRange.Rows; rowNumber++)
{
var getTheTextForTheRowAndColumn = quoteSheet.Cells[rowNumber, someColumn].Text;
}
}
There might be a more elegant way to go about it. I just started using this myself. But the idea is you tell it to find a certain named range on the spreadsheet, and then you use the row or column number of that range instead of a magic row or column number.
Even though a range might be one cell, one row, or one column, it can potentially be a larger area. That's why I use .Start.Row. In other words, give me the row for the first cell in the range. If a range has more than one row, the .Rows property indicates the number of rows so I know how many there are. That means someone could even insert rows without breaking the code.
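Along the same lines, the layout in the question (column names on rows 7 and 17, with data below them) can be handled by a helper that copies one block of rows into its own DataTable, given the header row and the last row, which could come from a named range's .Start.Row and .Rows. SheetBlocks.ReadBlock is hypothetical; to stay independent of EPPlus it takes a function that returns a cell's text, which with EPPlus could be (r, c) => ws.Cells[r, c].Text. It assumes the non-empty header cells are unique.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public static class SheetBlocks
{
    // Copies rows headerRow+1..lastRow into a DataTable whose columns are the
    // non-empty header cells of headerRow. cellText uses 1-based row/column.
    public static DataTable ReadBlock(Func<int, int, string> cellText,
                                      int headerRow, int lastRow, int lastColumn)
    {
        var table = new DataTable();
        var sourceColumns = new List<int>();
        for (int c = 1; c <= lastColumn; c++)
        {
            string name = cellText(headerRow, c);
            if (string.IsNullOrWhiteSpace(name)) continue;   // skip unlabeled columns
            table.Columns.Add(name.Trim(), typeof(string));
            sourceColumns.Add(c);
        }

        for (int r = headerRow + 1; r <= lastRow; r++)
        {
            DataRow row = table.NewRow();
            bool hasData = false;
            for (int i = 0; i < sourceColumns.Count; i++)
            {
                string value = cellText(r, sourceColumns[i]);
                if (!string.IsNullOrEmpty(value)) hasData = true;
                row[i] = value ?? "";
            }
            if (hasData) table.Rows.Add(row);                 // drop fully empty rows
        }
        return table;
    }
}
```

Usage with EPPlus for the block on rows 7 to 15: var items = SheetBlocks.ReadBlock((r, c) => ws.Cells[r, c].Text, 7, 15, ws.Dimension.End.Column);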
using System;
using System.Collections.Generic;
using System.Data;
using System.Data.OleDb;
using System.Linq;
using System.Web;
using System.Web.UI;
using System.Web.UI.WebControls;
using System.IO;
namespace ReadData
{
public partial class ImportExelDataInGridView : System.Web.UI.Page
{
protected void Page_Load(object sender, EventArgs e)
{
}
protected void btnUpload_Click(object sender, EventArgs e)
{
//Connection string, empty by default
string ConStr = "";
//Save the extension of the uploaded file into ext, because
//there are two types of Excel extension: .xls and .xlsx
string ext = Path.GetExtension(FileUpload1.FileName).ToLower();
//getting the path of the file
string path = Server.MapPath("~/MyFolder/"+FileUpload1.FileName);
//saving the file inside the MyFolder of the server
FileUpload1.SaveAs(path);
Label1.Text = FileUpload1.FileName + "\'s Data showing into the GridView";
//checking whether the extension is .xls or .xlsx
if (ext.Trim() == ".xls")
{
//connection string for a file whose extension is .xls
ConStr = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + path + ";Extended Properties=\"Excel 8.0;HDR=Yes;IMEX=2\"";
}
else if (ext.Trim() == ".xlsx")
{
//connection string for a file whose extension is .xlsx
ConStr = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" + path + ";Extended Properties=\"Excel 12.0;HDR=Yes;IMEX=2\"";
}
//making query
string query = "SELECT * FROM [Sheet1$]";
//Providing connection
OleDbConnection conn = new OleDbConnection(ConStr);
//check whether the connection is closed and, if so,
//open it
if (conn.State == ConnectionState.Closed)
{
conn.Open();
}
//create command object
OleDbCommand cmd = new OleDbCommand(query, conn);
// create a data adapter and get the data into dataadapter
OleDbDataAdapter da = new OleDbDataAdapter(cmd);
DataSet ds = new DataSet();
//fill the excel data to data set
da.Fill(ds);
if (ds.Tables != null && ds.Tables.Count > 0)
{
for (int i = 0; i < ds.Tables[0].Columns.Count; i++)
{
if (ds.Tables[0].Columns[0].ToString() == "ID" && ds.Tables[0].Columns[1].ToString() == "name")
{
}
//else if (ds.Tables[0].Rows[0][i].ToString().ToUpper() == "NAME")
//{
//}
//else if (ds.Tables[0].Rows[0][i].ToString().ToUpper() == "EMAIL")
//{
//}
}
}
//set data source of the grid view
gvExcelFile.DataSource = ds.Tables[0];
//binding the gridview
gvExcelFile.DataBind();
//close the connection
conn.Close();
}
}
}
try
{
System.Diagnostics.Process[] process = System.Diagnostics.Process.GetProcessesByName("Excel");
foreach (System.Diagnostics.Process p in process)
{
if (!string.IsNullOrEmpty(p.ProcessName))
{
try
{
p.Kill();
}
catch { }
}
}
REF_User oREF_User = new REF_User();
oREF_User = (REF_User)Session["LoggedUser"];
string pdfFilePath = Server.MapPath("~/FileUpload/" + oREF_User.USER_ID + "");
if (Directory.Exists(pdfFilePath))
{
System.IO.DirectoryInfo di = new DirectoryInfo(pdfFilePath);
foreach (FileInfo file in di.GetFiles())
{
file.Delete();
}
Directory.Delete(pdfFilePath);
}
Directory.CreateDirectory(pdfFilePath);
string path = Server.MapPath("~/FileUpload/" + oREF_User.USER_ID + "/");
if (Path.GetExtension(FileUpload1.FileName) == ".xlsx")
{
string fullpath1 = path + Path.GetFileName(FileUpload1.FileName);
if (FileUpload1.FileName != "")
{
FileUpload1.SaveAs(fullpath1);
}
FileStream Stream = new FileStream(fullpath1, FileMode.Open);
IExcelDataReader ExcelReader = ExcelReaderFactory.CreateOpenXmlReader(Stream);
DataSet oDataSet = ExcelReader.AsDataSet();
Stream.Close();
bool result = false;
foreach (System.Data.DataTable oDataTable in oDataSet.Tables)
{
//ToDO code
}
oBL_PlantTransactions.InsertList(oListREF_PlantTransactions, null);
ShowMessage("Successfully saved!", REF_ENUM.MessageType.Success);
}
else
{
ShowMessage("File Format Incorrect", REF_ENUM.MessageType.Error);
}
}
catch (Exception ex)
{
ShowMessage("Please check the details and submit again!", REF_ENUM.MessageType.Error);
System.Diagnostics.Process[] process = System.Diagnostics.Process.GetProcessesByName("Excel");
foreach (System.Diagnostics.Process p in process)
{
if (!string.IsNullOrEmpty(p.ProcessName))
{
try
{
p.Kill();
}
catch { }
}
}
}
I found this article to be very helpful.
It lists various libraries you can choose from. One of the libraries I used is EPPlus as shown below.
Nuget: EPPlus Library
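The sample workbook (Book.xlsx) has a sheet named "First Sheet", from which the code reads cell A2 (value and font color) and cell B2 (formula, value and top border style), and a sheet named "Second Sheet", from which it reads cell A2 (formula and value).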
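using System;
using System.IO;
using OfficeOpenXml;
static void Main(string[] args)
{
// EPPlus 5 and later also need a license context set before use, e.g.
// ExcelPackage.LicenseContext = LicenseContext.NonCommercial;
using(var package = new ExcelPackage(new FileInfo("Book.xlsx")))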
{
var firstSheet = package.Workbook.Worksheets["First Sheet"];
Console.WriteLine("Sheet 1 Data");
Console.WriteLine($"Cell A2 Value : {firstSheet.Cells["A2"].Text}");
Console.WriteLine($"Cell A2 Color : {firstSheet.Cells["A2"].Style.Font.Color.LookupColor()}");
Console.WriteLine($"Cell B2 Formula : {firstSheet.Cells["B2"].Formula}");
Console.WriteLine($"Cell B2 Value : {firstSheet.Cells["B2"].Text}");
Console.WriteLine($"Cell B2 Border : {firstSheet.Cells["B2"].Style.Border.Top.Style}");
Console.WriteLine("");
var secondSheet = package.Workbook.Worksheets["Second Sheet"];
Console.WriteLine($"Sheet 2 Data");
Console.WriteLine($"Cell A2 Formula : {secondSheet.Cells["A2"].Formula}");
Console.WriteLine($"Cell A2 Value : {secondSheet.Cells["A2"].Text}");
}
}

export data from CSV to datatable in c#

I am using the code below to export data from a CSV file to a DataTable.
As the values are mixed text, i.e. both numbers and letters, some of the columns are not getting exported to the DataTable.
I have done some research here and found that we need to set ImportMixedTypes = Text and TypeGuessRows = 0 in the registry, which did not solve the problem either.
The code below works for some files, even with mixed text.
Could someone tell me what is wrong with the code below? Am I missing something here?
if (isFirstRowHeader)
{
header = "Yes";
}
using (OleDbConnection connection = new OleDbConnection(@"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + pathOnly +
";Extended Properties=\"text;HDR=" + header + ";FMT=Delimited\";"))
{
using (OleDbCommand command = new OleDbCommand(sql, connection))
{
using (OleDbDataAdapter adapter = new OleDbDataAdapter(command))
{
adapter.Fill(table);
}
connection.Close();
}
}
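One commonly suggested alternative to the registry is a schema.ini next to the CSV that declares the column types, so the text driver does not guess them from the first rows. ImportMixedTypes and TypeGuessRows are settings of the Jet Excel engine, while the text driver has its own settings (such as MaxScanRows), which may be why the registry change had no effect. JetCsvSchema is a hypothetical helper; it assumes you know the column names in order, and it overwrites any existing schema.ini in the folder.

```csharp
using System.Collections.Generic;
using System.IO;

public static class JetCsvSchema
{
    // Writes a schema.ini that declares every column as Text, so mixed
    // numbers and letters are not coerced to a guessed numeric type.
    public static string WriteTextColumns(string csvPath, IList<string> columnNames, bool header)
    {
        var lines = new List<string>
        {
            "[" + Path.GetFileName(csvPath) + "]",          // section = CSV file name
            "Format=CSVDelimited",
            "ColNameHeader=" + (header ? "True" : "False")
        };
        for (int i = 0; i < columnNames.Count; i++)
            lines.Add("Col" + (i + 1) + "=\"" + columnNames[i] + "\" Text");

        string schemaPath = Path.Combine(Path.GetDirectoryName(Path.GetFullPath(csvPath)), "schema.ini");
        File.WriteAllLines(schemaPath, lines);
        return schemaPath;
    }
}
```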
For a comma-delimited file this worked for me:
public DataTable CSVtoDataTable(string inputpath)
{
DataTable csvdt = new DataTable();
if (!File.Exists(inputpath))
return csvdt;
string fulltext = File.ReadAllText(inputpath);//read full content
string[] rows = fulltext.Split('\n');//split file content to get the rows
var regex = new Regex("\\\"(.*?)\\\"");
for (int i = 0; i < rows.Length; i++)
{
string rowText = rows[i].TrimEnd('\r');//handle Windows line endings
if (rowText.Length == 0)
continue;//skip blank lines, e.g. after the final newline
var output = regex.Replace(rowText, m => m.Value.Replace(",", "\\c"));//replace commas inside quotes
string[] rowValues = output.Split(',');//split rows with comma',' to get the column values
if (csvdt.Columns.Count == 0)
{
for (int j = 0; j < rowValues.Length; j++)
{
csvdt.Columns.Add(rowValues[j].Replace("\\c", ","));//headers
}
continue;
}
//more columns may exist in a data row than in the header;
//add them before creating the row so no values are lost
while (csvdt.Columns.Count < rowValues.Length)
{
csvdt.Columns.Add("clmn" + csvdt.Columns.Count);
}
DataRow dr = csvdt.NewRow();
for (int k = 0; k < rowValues.Length; k++)
{
dr[k] = rowValues[k].Replace("\\c", ",");
}
csvdt.Rows.Add(dr);//add other rows
}
return csvdt;
}
The main thing that would probably help is to first stop using OleDB objects for reading a delimited file. I suggest using the 'TextFieldParser' which is what I have successfully used for over 2 years now for a client.
http://www.dotnetperls.com/textfieldparser
There may be other issues, but without seeing your .CSV file, I can't tell you where your problem may lie.
The TextFieldParser is specifically designed to parse delimited (and fixed-width) text files. The OleDb objects are not. So, start there, and then we can determine what the problem may be if it persists.
If you look at an example on the link I provided, they are merely writing lines to the console. You can alter this code portion to add rows to a DataTable object, as I do, for sorting purposes.
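Adapting that to a DataTable might look like the sketch below. CsvToTable is a hypothetical name; every column is typed as string so mixed numbers and letters are kept as text, quoted fields may contain commas, and extra columns (named Column1, Column2, ...) are added if a row has more fields than the header, which assumes the header does not already use those names.

```csharp
using System.Data;
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class CsvToTable
{
    public static DataTable Load(TextReader reader, bool firstRowIsHeader)
    {
        var table = new DataTable();
        using (var parser = new TextFieldParser(reader))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            parser.HasFieldsEnclosedInQuotes = true;      // "Smith, J" stays one field
            bool first = true;
            while (!parser.EndOfData)
            {
                string[] fields = parser.ReadFields();
                if (first)
                {
                    first = false;
                    for (int i = 0; i < fields.Length; i++)
                        table.Columns.Add(firstRowIsHeader ? fields[i] : "Column" + (i + 1), typeof(string));
                    if (firstRowIsHeader) continue;        // header row is not data
                }
                // grow the table if a row has more fields than the header
                while (table.Columns.Count < fields.Length)
                    table.Columns.Add("Column" + (table.Columns.Count + 1), typeof(string));
                table.Rows.Add(fields);
            }
        }
        return table;
    }
}
```

Usage: using (var sr = new StreamReader(path)) { table = CsvToTable.Load(sr, isFirstRowHeader); }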

CSV parser to parse double quotes via OLEDB

How can I use OLEDB to parse and import a CSV file in which each cell is enclosed in double quotes, because some values contain commas? I am unable to change the format, as it is coming from a vendor.
I am trying the following, and it fails with an IO error:
public DataTable ConvertToDataTable(string fileToImport, string fileDestination)
{
string fullImportPath = fileDestination + @"\" + fileToImport;
OleDbDataAdapter dAdapter = null;
DataTable dTable = null;
try
{
if (!File.Exists(fullImportPath))
return null;
string full = Path.GetFullPath(fullImportPath);
string file = Path.GetFileName(full);
string dir = Path.GetDirectoryName(full);
//create the "database" connection string
string connString = "Provider=Microsoft.Jet.OLEDB.4.0;"
+ "Data Source=\"" + dir + "\\\";"
+ "Extended Properties=\"text;HDR=No;FMT=Delimited\"";
//create the database query
string query = "SELECT * FROM " + file;
//create a DataTable to hold the query results
dTable = new DataTable();
//create an OleDbDataAdapter to execute the query
dAdapter = new OleDbDataAdapter(query, connString);
//fill the DataTable
dAdapter.Fill(dTable);
}
catch (Exception ex)
{
throw new Exception(CLASS_NAME + ".ConvertToDataTable: Caught Exception: " + ex);
}
finally
{
if (dAdapter != null)
dAdapter.Dispose();
}
return dTable;
}
When I use a normal CSV it works fine. Do I need to change something in the connString??
Use a dedicated CSV parser.
There are many out there. A popular one is FileHelpers, though there is one hidden in the Microsoft.VisualBasic.FileIO namespace - TextFieldParser.
Have a look at FileHelpers.
You can use this code (MS Office required):
private void ConvertCSVtoExcel(string filePath = @"E:\nucc_taxonomy_140.csv", string tableName = "TempTaxonomyCodes")
{
string tempPath = System.IO.Path.GetDirectoryName(filePath);
string strConn = @"Driver={Microsoft Text Driver (*.txt; *.csv)};Dbq=" + tempPath + @"\;Extensions=asc,csv,tab,txt";
OdbcConnection conn = new OdbcConnection(strConn);
OdbcDataAdapter da = new OdbcDataAdapter("Select * from " + System.IO.Path.GetFileName(filePath), conn);
DataTable dt = new DataTable();
da.Fill(dt);
//ConfigurationManager (System.Configuration) replaces the obsolete ConfigurationSettings
using (SqlBulkCopy bulkCopy = new SqlBulkCopy(ConfigurationManager.AppSettings["dbConnectionString"]))
{
bulkCopy.DestinationTableName = tableName;
bulkCopy.BatchSize = 50;
bulkCopy.WriteToServer(dt);
}
}
There is a lot to consider when handling CSV files. However you extract them from the file, you should know how you are handling the parsing. There are classes out there that can get you part way, but most don't handle the nuances that Excel does with embedded commas, quotes and line breaks. However, loading Excel or the MS classes seems like a lot of freaking overhead if you just want to parse a text file as a CSV.
One thing you can consider is doing the parsing with your own regex, which will also make your code a little more platform-independent, in case you need to port it to another server or application at some point. Regex also has the benefit of being available in virtually every language. That said, there are some good regex patterns out there that handle the CSV puzzle. Here is my shot at it, which does cover embedded commas, quotes and line breaks. Regex code/pattern and explanation:
http://www.kimgentes.com/worshiptech-web-tools-page/2008/10/14/regex-pattern-for-parsing-csv-files-with-embedded-commas-dou.html
Hope that is of some help..
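To illustrate the nuances mentioned above (embedded separators, doubled quotes and line breaks inside quoted fields), here is a character-by-character sketch. It is a swapped-in technique, not the regex from the linked page, QuotedCsv is a hypothetical name, and it is lenient: a stray quote inside an unquoted field simply starts quoting.

```csharp
using System.Collections.Generic;
using System.Text;

public static class QuotedCsv
{
    // Splits CSV text into records. Inside double quotes, separators and line
    // breaks are kept as data and "" stands for one literal quote character.
    public static List<List<string>> Parse(string text, char separator = ',')
    {
        var records = new List<List<string>>();
        var record = new List<string>();
        var field = new StringBuilder();
        bool inQuotes = false;
        bool pending = false;   // current record has content not yet flushed

        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            pending = true;
            if (inQuotes)
            {
                if (c != '"') field.Append(c);                       // separators/line breaks are data here
                else if (i + 1 < text.Length && text[i + 1] == '"') { field.Append('"'); i++; }
                else inQuotes = false;                               // closing quote
            }
            else if (c == '"') inQuotes = true;
            else if (c == separator) { record.Add(field.ToString()); field.Clear(); }
            else if (c == '\r' || c == '\n')
            {
                if (c == '\r' && i + 1 < text.Length && text[i + 1] == '\n') i++; // \r\n is one break
                record.Add(field.ToString());
                field.Clear();
                records.Add(record);
                record = new List<string>();
                pending = false;
            }
            else field.Append(c);
        }
        if (pending)   // last record had no trailing line break
        {
            record.Add(field.ToString());
            records.Add(record);
        }
        return records;
    }
}
```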
Try the code from my answer here:
Reading CSV files in C#
It handles quoted csv just fine.
private static string[] Mubashir_CSVParser(string s)
{
// extract the fields: split only on commas followed by an even number of
// quotes, i.e. commas that are not inside a quoted field
Regex RegexCSVParser = new Regex(",(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))");
String[] Fields = RegexCSVParser.Split(s);
// clean up the fields (remove " and leading spaces)
for (int i = 0; i < Fields.Length; i++)
{
Fields[i] = Fields[i].TrimStart(' ', '"');
Fields[i] = Fields[i].TrimEnd('"');// this line removes the quotes
//Fields[i] = Fields[i].Trim();
}
return Fields;
}
Just in case anyone has a similar issue, I wanted to post the code I used. I did end up using TextFieldParser to get the file and parse out the columns, but I am using recursion and substrings to get the rest done.
/// <summary>
/// Parses each string passed as a "row".
/// This routine accounts for both double quotes
/// as well as commas currently, but can be added to
/// </summary>
/// <param name="row"> string or row to be parsed</param>
/// <returns></returns>
private List<String> ParseRowToList(String row)
{
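List<String> returnValue = new List<String>();
// an empty string (e.g. after a trailing comma) is a single empty column
if (row.Length == 0)
{
returnValue.Add(String.Empty);
return returnValue;
}
if (row[0] == '\"')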
{// Quoted String
if (row.IndexOf("\",") > -1)
{// There are more columns
returnValue = ParseRowToList(row.Substring(row.IndexOf("\",") + 2));
returnValue.Insert(0, row.Substring(1, row.IndexOf("\",") - 1));
}
else
{// This is the last column
returnValue.Add(row.Substring(1, row.Length - 2));
}
}
else
{// Unquoted String
if (row.IndexOf(",") > -1)
{// There are more columns
returnValue = ParseRowToList(row.Substring(row.IndexOf(",") + 1));
returnValue.Insert(0, row.Substring(0, row.IndexOf(",")));
}
else
{// This is the last column
returnValue.Add(row.Substring(0, row.Length));
}
}
return returnValue;
}
Then the TextFieldParser code is:
// string pathFile = @"C:\TestFTP\TestCatalog.txt";
string pathFile = @"C:\TestFTP\SomeFile.csv";
List<String> stringList = new List<String>();
TextFieldParser fieldParser = null;
DataTable dtable = new DataTable();
/* Set up TextFieldParser
* use the correct delimiter provided
* and path */
fieldParser = new TextFieldParser(pathFile);
/* Set that there are quotes in the file for fields and or column names */
fieldParser.HasFieldsEnclosedInQuotes = true;
/* delimiter by default to be used first */
fieldParser.SetDelimiters(new string[] { "," });
// Build Full table to be imported
dtable = BuildDataTable(fieldParser, dtable);
This is what I used in a project; it parses a single line of data.
private string[] csvParser(string csv, char separator = ',')
{
List<string> parsed = new List<string>();
string[] temp = csv.Split(separator);
int counter = 0;
while (counter < temp.Length)
{
string data = temp[counter].Trim();
if (data.StartsWith("\""))
{
// a field such as "abc" opens and closes its quotes, so nothing needs joining
bool isLast = data.Length > 1 && data.EndsWith("\"");
// otherwise re-join the following pieces until the closing quote
// (stopping at the end of the line if the quote is never closed)
while (!isLast && counter + 1 < temp.Length)
{
counter++;
data += separator.ToString() + temp[counter];
isLast = temp[counter].Trim().EndsWith("\"");
}
}
parsed.Add(data);
counter++;
}
return parsed.ToArray();
}
http://zamirsblog.blogspot.com/2013/09/c-csv-parser-csvparser.html
