SSIS and The Excel Files Saga


I have an Excel file (.xls) with a column called Money. Most cells in that column are formatted as numbers, but some carry the marker saying they are stored as text. I convert the Excel file to CSV using a C# script that opens it with IMEX=1 in the connection string. The cells marked as stored as text do not come through to the CSV file. The file is large, about 20 MB, so this means 100 values such as 33344 are missing from the CSV file.
I tried adding a delay where I open the Excel file. This worked on my PC but not on the development machine.
Any idea how to get around this without manual intervention, such as formatting all columns with mixed data types as number? I am looking for an automated solution that works every time. This is on SSIS 2008.
static void ConvertExcelToCsv(string excelFilePath, string csvOutputFile, int worksheetNumber = 1) {
if (!File.Exists(excelFilePath)) throw new FileNotFoundException(excelFilePath);
if (File.Exists(csvOutputFile)) throw new ArgumentException("File exists: " + csvOutputFile);
// connection string
var cnnStr = String.Format("Provider=Microsoft.Jet.OLEDB.4.0;Data Source={0};Extended Properties=\"Excel 8.0;IMEX=1;HDR=NO\"", excelFilePath);
var cnn = new OleDbConnection(cnnStr);
// get schema, then data
var dt = new DataTable();
try {
cnn.Open();
var schemaTable = cnn.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, null);
if (schemaTable.Rows.Count < worksheetNumber) throw new ArgumentException("The worksheet number provided cannot be found in the spreadsheet");
string worksheet = schemaTable.Rows[worksheetNumber - 1]["table_name"].ToString().Replace("'", "");
string sql = String.Format("select * from [{0}]", worksheet);
var da = new OleDbDataAdapter(sql, cnn);
da.Fill(dt);
}
catch (Exception) {
// rethrow as-is; "throw e;" would reset the stack trace
throw;
}
finally {
// free resources
cnn.Close();
}
// write out CSV data
using (var wtr = new StreamWriter(csvOutputFile)) {
foreach (DataRow row in dt.Rows) {
bool firstLine = true;
foreach (DataColumn col in dt.Columns) {
if (!firstLine) { wtr.Write(","); } else { firstLine = false; }
var data = row[col.ColumnName].ToString().Replace("\"", "\"\"");
wtr.Write(String.Format("\"{0}\"", data));
}
wtr.WriteLine();
}
}
}

My solution was to specify a format for incoming files that forbids columns with mixed data types. The fix came from the business side rather than from technology.

Related

When I import a CSV file into SQL Server with C#, some fields are written incorrectly

Hi, I have a problem importing a CSV file into SQL Server. The CSV file contains articles that need to be saved in the database. Once the import (done with the C# code below) has finished, some imported fields (Descrizione and CodArt) are not written correctly in the database and contain strange characters. The CSV file is available for download (link in the original post).
[Screenshot: incorrectly imported rows in SQL Server, marked with a blue line]
Import C# code:
using (var rd = new StreamReader(labelPercorso.Text))
{
Articolo a = new Articolo();
a.db = this.db;
while (!rd.EndOfStream)
{
// reset CodEAN and Immagine on every iteration
CodEAN = "";
Immagine = "";
try
{
var splits = rd.ReadLine().Split(';');
CodArt = splits[0];
Descrizione = splits[1];
String Price = splits[2];
Prezzo = decimal.Parse(Price);
}
catch (Exception ex)
{
Console.WriteLine("Neither image nor EAN code is present");
}
a.Prezzo = Prezzo;
a.CodiceArticolo = CodArt;
a.Descrizione = Descrizione;
a.Fornitore = fornitore;
// TODO: check whether the article already exists and, if so, update it
a.InserisciArticoloCSV();
}
}
Code of function: InserisciArticoloCSV
try
{
SqlConnection conn = db.apriconnessione();
String query = "INSERT INTO Articolo(CodArt,Descrizione,Prezzo,PrezzoListino,Fornitore,Importato,TipoArticolo) VALUES(@CodArt,@Descrizione,@Prezzo,@PrezzoListino,@Fornitore,@Importato,@TipoArticolo)";
String Importato = "CSV";
String TipoArticolo = "A";
SqlCommand cmd = new SqlCommand(query, conn);
// MessageBox.Show("CodArt: " + CodiceArticolo + "\n Descrizione :" + Descrizione + "\n Prezzo: " + Prezzo);
cmd.Parameters.AddWithValue("@CodArt", CodiceArticolo.ToString());
cmd.Parameters.AddWithValue("@Descrizione", Descrizione.ToString());
cmd.Parameters.AddWithValue("@Prezzo", Prezzo);
cmd.Parameters.AddWithValue("@PrezzoListino", Prezzo);
cmd.Parameters.AddWithValue("@Fornitore", Fornitore.ToString());
cmd.Parameters.AddWithValue("@Importato", Importato.ToString());
cmd.Parameters.AddWithValue("@TipoArticolo", TipoArticolo.ToString());
cmd.ExecuteNonQuery();
db.chiudiconnessione();
conn.Close();
return true;
}
catch (Exception ex)
{
Console.WriteLine("Error inserting the article " + ex);
//MessageBox.Show("Error inserting the article: " + ex);
return false;
}

Your CSV file is not well formatted: there are stray carriage returns in the middle of records, which break the parsing. Open the file in Notepad++ and turn on the display of line breaks to see them.
So the import works for the lines that are in the expected format; for the others the logic fails.

As others have pointed out, you have numerous problems: encoding, carriage returns, and a lot of white space. In addition, you are inserting rows into your database one at a time, which is very slow. The sample code below illustrates how to deal with all of these points.
IFormatProvider fP = new CultureInfo("it");
DataTable tmp = new DataTable();
tmp.Columns.Add("CodArt", typeof(string));
tmp.Columns.Add("Descrizione", typeof(string));
tmp.Columns.Add("Prezzo", typeof(decimal));
using (var rd = new StreamReader("yourFileName", Encoding.GetEncoding("iso-8859-1")))
{
while (!rd.EndOfStream)
{
try
{
var nextLine = Regex.Replace(rd.ReadLine(), @"\s+", " ");
while (nextLine.Split(';').Length < 3)
{
nextLine = nextLine.Replace("\r\n", "") + Regex.Replace(rd.ReadLine(), @"\s+", " ");
}
var splits = nextLine.Split(';');
DataRow dR = tmp.NewRow();
dR[0] = splits[0];
dR[1] = splits[1];
string Price = splits[2];
dR[2] = decimal.Parse(Price, fP);
tmp.Rows.Add(dR);
}
catch (Exception ex)
{
Console.WriteLine(ex.Message);
}
}
}
using (var conn = db.apriconnessione())
{
var sBC = new SqlBulkCopy(conn);
conn.Open();
sBC.DestinationTableName = "yourTableName";
sBC.WriteToServer(tmp);
conn.Close();
}
Now for some explanation:
Firstly I am storing the parsed values in a DataTable. Please note that I have only included the three fields that are in the CSV. In practice you must supply the other columns and fill the extra columns with the correct values for each row. I was simply being lazy, but I am sure you will get the idea.
I do not know what encoding your csv file is, but iso-8859-1 worked for me!
I use Regex to replace multiple white space with a single space.
If any line does not have the required number of splits, I keep adding further lines (having deleted the carriage return) until I hit success!
Once I have a complete line, I can now split it, and assign it to the new DataRow (please see my comments above for extra columns).
Finally once the file has been read, the DataTable will have all the rows and can be uploaded to your database using BulkCopy. This is very fast!
HTH
PS Some of your lines have double quotes. You probably want to get rid of these as well!
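The line-joining step described above can also be isolated into a small helper that is easy to test without a database. This is only a sketch: the class name `CsvRecordJoiner`, the method `ReadLogicalRecords`, and the `expectedFields` parameter are made up for illustration, and it assumes `;` as the separator as in the question's file. Unlike the loop above, it stops at the end of the input instead of throwing when the last record is incomplete.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

public static class CsvRecordJoiner
{
    // Reads ';'-separated records, gluing physical lines together until a
    // record has at least expectedFields fields (hypothetical helper).
    public static List<string[]> ReadLogicalRecords(TextReader reader, int expectedFields)
    {
        var records = new List<string[]>();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (string.IsNullOrWhiteSpace(line)) continue; // skip blank lines

            // Collapse runs of white space into a single space.
            var current = Regex.Replace(line, @"\s+", " ");

            // A stray line break split the record: append the following lines.
            while (current.Split(';').Length < expectedFields)
            {
                var next = reader.ReadLine();
                if (next == null) break; // end of input: stop instead of throwing
                current += Regex.Replace(next, @"\s+", " ");
            }

            var fields = current.Split(';');
            if (fields.Length >= expectedFields) records.Add(fields); // drop incomplete tails
        }
        return records;
    }
}
```

Each returned array can then be copied into a DataRow as in the code above. A field that itself contains a `;` would still be split incorrectly; handling that needs a real CSV parser.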

You should specify the correct encoding when you read your file. Is it UTF-8? Is it ASCII with a specific code page? You should also specify the SqlDbType of your SQL parameters, especially the string parameters, which will be either varchar or nvarchar; there is a big difference between them.
// what is the encoding of your file? This is an example using code page windows-1252
var encoding = Encoding.GetEncoding("windows-1252");
using (var file = File.Open(labelPercorso.Text, FileMode.Open))
using (var reader = new StreamReader(file, encoding))
{
// rest of code unchanged
}
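To see why the encoding matters, it can help to decode the same bytes twice. The sketch below is illustrative only (the `EncodingDemo` class and `Decode` method are made-up names): it decodes a byte array with whatever encoding name it is given, so the effect of a wrong guess becomes visible.

```csharp
using System.Text;

public static class EncodingDemo
{
    // Decodes raw bytes with the named encoding (hypothetical helper).
    public static string Decode(byte[] bytes, string encodingName)
    {
        return Encoding.GetEncoding(encodingName).GetString(bytes);
    }
}
```

For example, the bytes `0x70 0x69 0xF9` are the Italian word "più" in iso-8859-1. Decoded as UTF-8, the last byte is invalid and comes out as the replacement character U+FFFD, which is the kind of "strange character" that then ends up in the database.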
Sql Code. Note that I added using blocks for the types that implement IDisposable like Connection and Command.
try
{
String query = "INSERT INTO Articolo(CodArt,Descrizione,Prezzo,PrezzoListino,Fornitore,Importato,TipoArticolo) VALUES(@CodArt,@Descrizione,@Prezzo,@PrezzoListino,@Fornitore,@Importato,@TipoArticolo)";
String Importato = "CSV";
String TipoArticolo = "A";
using(SqlConnection conn = db.apriconnessione())
using(SqlCommand cmd = new SqlCommand(query, conn))
{
// -1 indicates you used MAX like nvarchar(max), otherwise use the maximum number of characters in the schema
cmd.Parameters.Add(new SqlParameter("@CodArt", SqlDbType.NVarChar, -1)).Value = CodiceArticolo.ToString();
cmd.Parameters.Add(new SqlParameter("@Descrizione", SqlDbType.NVarChar, -1)).Value = Descrizione.ToString();
/*
Rest of your parameters created in the same manner
*/
cmd.ExecuteNonQuery();
db.chiudiconnessione();
}
return true;
}
catch (Exception ex)
{
Console.WriteLine("Error inserting the article " + ex);
//MessageBox.Show("Error inserting the article: " + ex);
return false;
}

In case you are interested in a library that handles all the parsing in a few lines of code, you can check out Cinchoo ETL, an open source library. Here is a sample that parses the CSV file and shows how to get either a DataTable or a list of records to load into the database later.
System.Threading.Thread.CurrentThread.CurrentCulture = new CultureInfo("it");
using (var p = new ChoCSVReader("Bosch Luglio 2017.csv")
.Configure((c) => c.MayContainEOLInData = true) //Handle newline chars in data
.Configure(c => c.Encoding = Encoding.GetEncoding("iso-8859-1")) //Specify the encoding for reading
.WithField("CodArt", 1) //first column
.WithField("Descrizione", 2) //second column
.WithField("Prezzo", 3, fieldType: typeof(decimal)) //third column
.Setup(c => c.BeforeRecordLoad += (o, e) =>
{
e.Source = e.Source.CastTo<string>().Replace(@"""", String.Empty); //Remove the quotes
}) //Scrub the data
)
{
var dt = p.AsDataTable();
//foreach (var rec in p)
// Console.WriteLine(rec.Prezzo);
}
Disclaimer: I'm the author of this library.

How to read excel file in asp.net

I am using the EPPlus library to upload data from an Excel file. The code I am using works perfectly for Excel files in a standard form, i.e. where the first row holds the column names and all remaining rows hold the corresponding data. But nowadays I regularly receive Excel files with a different structure that I am not able to read.
[Screenshot: example Excel file with the layout described below]
What I want is this: from the third row I want only Region and Location Id and their values. The 7th row holds the column names, and rows 8 to 15 hold their values. Finally, the 17th row holds the column names for rows 18 to 20. How can I load all of this data into separate DataTables?
The code I used is shown below.
I created an extension method:
public static DataSet Exceltotable(this string path)
{
DataSet ds = null;
using (var pck = new OfficeOpenXml.ExcelPackage())
{
try
{
using (var stream = File.Open(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
{
pck.Load(stream);
}
ds = new DataSet();
var wss = pck.Workbook.Worksheets;
////////////////////////////////////
//Application app = new Application();
//app.Visible = true;
//app.Workbooks.Add("");
//app.Workbooks.Add(@"c:\MyWork\WorkBook1.xls");
//app.Workbooks.Add(@"c:\MyWork\WorkBook2.xls");
//for (int i = 2; i <= app.Workbooks.Count; i++)
//{
// for (int j = 1; j <= app.Workbooks[i].Worksheets.Count; j++)
// {
// Worksheet ws = app.Workbooks[i].Worksheets[j];
// ws.Copy(app.Workbooks[1].Worksheets[1]);
// }
//}
///////////////////////////////////////////////////
//for(int s=0;s<5;s++)
//{
foreach (var ws in wss)
{
System.Data.DataTable tbl = new System.Data.DataTable();
bool hasHeader = true; // adjust it accordingly( i've mentioned that this is a simple approach)
string ErrorMessage = string.Empty;
foreach (var firstRowCell in ws.Cells[1, 1, 1, ws.Dimension.End.Column])
{
tbl.Columns.Add(hasHeader ? firstRowCell.Text : string.Format("Column {0}", firstRowCell.Start.Column));
}
var startRow = hasHeader ? 2 : 1;
for (var rowNum = startRow; rowNum <= ws.Dimension.End.Row; rowNum++)
{
var wsRow = ws.Cells[rowNum, 1, rowNum, ws.Dimension.End.Column];
var row = tbl.NewRow();
foreach (var cell in wsRow)
{
//modifed by faras
if (cell.Text != null)
{
row[cell.Start.Column - 1] = cell.Text;
}
}
tbl.Rows.Add(row);
tbl.TableName = ws.Name;
}
DataTable dt = RemoveEmptyRows(tbl);
ds.Tables.Add(dt);
}
}
catch (Exception exp)
{
}
return ds;
}
}

If you're providing the template for users to upload, you can mitigate this some by using named ranges in your spreadsheet. That's a good idea anyway when programmatically working with Excel because it helps when you modify your own spreadsheet, not just when the user does.
You probably know how to name a range, but for the sake of completeness, here's how to name a range.
When you're working with the spreadsheet in code you can get a reference to the range using [yourworkbook].Names["yourNamedRange"]. If it's just a single cell and you need to reference the row or column index you can use .Start.Row or .Start.Column.
I add named ranges for anything - cells containing particular values, columns, header rows, rows where sets of data begin. If I need row or column indexes I assign useful variable names. That protects you from having all sorts of "magic numbers" in your spreadsheet. You (or your users) can move quite a bit around without breaking anything.
If they modify the structure too much then it won't work. You can also use protection on the workbook and worksheet to ensure that they can't accidentally modify the structure - tabs, rows, columns.
This is loosely taken from a test I was working with last weekend when I was learning this. It was just a "hello world" so I wasn't trying to make it all streamlined and perfect. (I was working on populating a spreadsheet, not reading one, so I'm just learning the properties as I go.)
// Open the workbook
using (var package = new ExcelPackage(new FileInfo("PriceQuoteTemplate.xlsx")))
{
// Get the worksheet I'm looking for
var quoteSheet = package.Workbook.Worksheets["Quote"];
//If I wanted to get the text from one named range
var cellText = quoteSheet.Workbook.Names["myNamedRange"].Text;
//If I wanted to get the cell's value as some other type
var cellValue = quoteSheet.Workbook.Names["myNamedRange"].GetValue<int>();
//If I had a named range and I wanted to loop through the rows and get
//values from certain columns
var myRange = quoteSheet.Workbook.Names["rangeContainingRows"];
//This is a named range used to mark a column. So instead of using a
//magic number, I'll read from whatever column has this named range.
var someColumn = quoteSheet.Workbook.Names["columnLabel"].Start.Column;
for(var rowNumber = myRange.Start.Row; rowNumber < myRange.Start.Row + myRange.Rows; rowNumber++)
{
var getTheTextForTheRowAndColumn = quoteSheet.Cells[rowNumber, someColumn].Text;
}
}
There might be a more elegant way to go about it. I just started using this myself. But the idea is you tell it to find a certain named range on the spreadsheet, and then you use the row or column number of that range instead of a magic row or column number.
Even though a range might be one cell, one row, or one column, it can potentially be a larger area. That's why I use .Start.Row. In other words, give me the row for the first cell in the range. If a range has more than one row, the .Rows property indicates the number of rows so I know how many there are. That means someone could even insert rows without breaking the code.

using System;
using System.Collections.Generic;
using System.Data;
using System.Data.OleDb;
using System.Linq;
using System.Web;
using System.Web.UI;
using System.Web.UI.WebControls;
using System.IO;
namespace ReadData
{
public partial class ImportExelDataInGridView : System.Web.UI.Page
{
protected void Page_Load(object sender, EventArgs e)
{
}
protected void btnUpload_Click(object sender, EventArgs e)
{
//Connection string, empty by default
string ConStr = "";
//Extension of the uploaded file, saved into ext because
//Excel files come with two extensions, .xls and .xlsx
string ext = Path.GetExtension(FileUpload1.FileName).ToLower();
//getting the path of the file
string path = Server.MapPath("~/MyFolder/"+FileUpload1.FileName);
//saving the file inside the MyFolder of the server
FileUpload1.SaveAs(path);
Label1.Text = FileUpload1.FileName + "\'s Data showing into the GridView";
//checking whether the extension is .xls or .xlsx
if (ext.Trim() == ".xls")
{
//connection string for files with the .xls extension
ConStr = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + path + ";Extended Properties=\"Excel 8.0;HDR=Yes;IMEX=2\"";
}
else if (ext.Trim() == ".xlsx")
{
//connection string for files with the .xlsx extension
ConStr = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" + path + ";Extended Properties=\"Excel 12.0;HDR=Yes;IMEX=2\"";
}
//making query
string query = "SELECT * FROM [Sheet1$]";
//Providing connection
OleDbConnection conn = new OleDbConnection(ConStr);
//checking whether the connection is closed; if it is,
//open the connection
if (conn.State == ConnectionState.Closed)
{
conn.Open();
}
//create command object
OleDbCommand cmd = new OleDbCommand(query, conn);
// create a data adapter and get the data into dataadapter
OleDbDataAdapter da = new OleDbDataAdapter(cmd);
DataSet ds = new DataSet();
//fill the excel data to data set
da.Fill(ds);
if (ds.Tables != null && ds.Tables.Count > 0)
{
for (int i = 0; i < ds.Tables[0].Columns.Count; i++)
{
if (ds.Tables[0].Columns[0].ToString() == "ID" && ds.Tables[0].Columns[1].ToString() == "name")
{
}
//else if (ds.Tables[0].Rows[0][i].ToString().ToUpper() == "NAME")
//{
//}
//else if (ds.Tables[0].Rows[0][i].ToString().ToUpper() == "EMAIL")
//{
//}
}
}
//set data source of the grid view
gvExcelFile.DataSource = ds.Tables[0];
//binding the gridview
gvExcelFile.DataBind();
//close the connection
conn.Close();
}
}
}

try
{
System.Diagnostics.Process[] process = System.Diagnostics.Process.GetProcessesByName("Excel");
foreach (System.Diagnostics.Process p in process)
{
if (!string.IsNullOrEmpty(p.ProcessName))
{
try
{
p.Kill();
}
catch { }
}
}
REF_User oREF_User = new REF_User();
oREF_User = (REF_User)Session["LoggedUser"];
string pdfFilePath = Server.MapPath("~/FileUpload/" + oREF_User.USER_ID + "");
if (Directory.Exists(pdfFilePath))
{
System.IO.DirectoryInfo di = new DirectoryInfo(pdfFilePath);
foreach (FileInfo file in di.GetFiles())
{
file.Delete();
}
Directory.Delete(pdfFilePath);
}
Directory.CreateDirectory(pdfFilePath);
string path = Server.MapPath("~/FileUpload/" + oREF_User.USER_ID + "/");
if (Path.GetExtension(FileUpload1.FileName) == ".xlsx")
{
string fullpath1 = path + Path.GetFileName(FileUpload1.FileName);
if (FileUpload1.FileName != "")
{
FileUpload1.SaveAs(fullpath1);
}
FileStream Stream = new FileStream(fullpath1, FileMode.Open);
IExcelDataReader ExcelReader = ExcelReaderFactory.CreateOpenXmlReader(Stream);
DataSet oDataSet = ExcelReader.AsDataSet();
Stream.Close();
bool result = false;
foreach (System.Data.DataTable oDataTable in oDataSet.Tables)
{
//ToDO code
}
oBL_PlantTransactions.InsertList(oListREF_PlantTransactions, null);
ShowMessage("Successfully saved!", REF_ENUM.MessageType.Success);
}
else
{
ShowMessage("File Format Incorrect", REF_ENUM.MessageType.Error);
}
}
catch (Exception ex)
{
ShowMessage("Please check the details and submit again!", REF_ENUM.MessageType.Error);
System.Diagnostics.Process[] process = System.Diagnostics.Process.GetProcessesByName("Excel");
foreach (System.Diagnostics.Process p in process)
{
if (!string.IsNullOrEmpty(p.ProcessName))
{
try
{
p.Kill();
}
catch { }
}
}
}

I found this article to be very helpful.
It lists various libraries you can choose from. One of the libraries I used is EPPlus as shown below.
Nuget: EPPlus Library
Excel Sheet 1 Data
Cell A2 Value :
Cell A2 Color :
Cell B2 Formula :
Cell B2 Value :
Cell B2 Border :
Excel Sheet 2 Data
Cell A2 Formula :
Cell A2 Value :
static void Main(string[] args)
{
using(var package = new ExcelPackage(new FileInfo("Book.xlsx")))
{
var firstSheet = package.Workbook.Worksheets["First Sheet"];
Console.WriteLine("Sheet 1 Data");
Console.WriteLine($"Cell A2 Value : {firstSheet.Cells["A2"].Text}");
Console.WriteLine($"Cell A2 Color : {firstSheet.Cells["A2"].Style.Font.Color.LookupColor()}");
Console.WriteLine($"Cell B2 Formula : {firstSheet.Cells["B2"].Formula}");
Console.WriteLine($"Cell B2 Value : {firstSheet.Cells["B2"].Text}");
Console.WriteLine($"Cell B2 Border : {firstSheet.Cells["B2"].Style.Border.Top.Style}");
Console.WriteLine("");
var secondSheet = package.Workbook.Worksheets["Second Sheet"];
Console.WriteLine($"Sheet 2 Data");
Console.WriteLine($"Cell A2 Formula : {secondSheet.Cells["A2"].Formula}");
Console.WriteLine($"Cell A2 Value : {secondSheet.Cells["A2"].Text}");
}
}
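For the layout described in the question (a header on row 7 with data on rows 8 to 15, and another header on row 17 with data on rows 18 to 20), one option is to treat each block as its own table. The sketch below is not tied to any Excel library: it assumes the cell text has already been read into a jagged array, and `SheetBlocks`, `BlockToTable`, and their parameters are hypothetical names. Row numbers are 1-based to match how rows are counted in Excel.

```csharp
using System.Data;

public static class SheetBlocks
{
    // Builds a DataTable from one block of a sheet: headerRow holds the column
    // names, and the rows after it up to lastDataRow hold the values.
    // Row numbers are 1-based (hypothetical helper).
    public static DataTable BlockToTable(string[][] grid, int headerRow, int lastDataRow, string name)
    {
        var table = new DataTable(name);

        // Note: DataTable rejects duplicate column names, so headers must be unique.
        foreach (var header in grid[headerRow - 1])
        {
            table.Columns.Add(header);
        }

        for (int r = headerRow + 1; r <= lastDataRow; r++)
        {
            var source = grid[r - 1];
            var row = table.NewRow();
            for (int c = 0; c < table.Columns.Count && c < source.Length; c++)
            {
                row[c] = source[c];
            }
            table.Rows.Add(row);
        }
        return table;
    }
}
```

With the question's file this would be called once per block, for example `BlockToTable(grid, 7, 15, "First")` and `BlockToTable(grid, 17, 20, "Second")`, and each result added to the DataSet. If the row positions vary between files, the named ranges suggested in an earlier answer are a more robust way to find the header rows.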

Input string was not in a correct format in c#, int value is not in correct format

Following is the code for it:
protected void Upload(object sender, EventArgs e)
{
if (FileUpload1.HasFile)
{
//Upload and save the file
string csvPath = Server.MapPath("~/App_Data/") + Path.GetFileName(FileUpload1.PostedFile.FileName);
FileUpload1.SaveAs(csvPath);
DataTable dt = new DataTable();
dt.Columns.AddRange(new DataColumn[7]
{
new DataColumn("pataintno", typeof(int)),
new DataColumn("Firstname", typeof(string)),
new DataColumn("Lastname",typeof(string)),
new DataColumn("Age", typeof(int)),
new DataColumn("Address", typeof(string)),
new DataColumn("Email", typeof(string)),
new DataColumn("Phno", typeof(int)),});
string csvData = File.ReadAllText(csvPath);
foreach (string row in csvData.Split('\n'))
{
if (!string.IsNullOrEmpty(row))
{
dt.Rows.Add();
int i = 0;
foreach (string cell in row.Split(','))
{
dt.Rows[dt.Rows.Count - 1][i] = cell;
i++;
}
}
}
string consString = ConfigurationManager.ConnectionStrings["cnstr"].ConnectionString;
using (SqlConnection con = new SqlConnection(consString))
{
using (SqlBulkCopy sqlBulkCopy = new SqlBulkCopy(con))
{
//Set the database table name
sqlBulkCopy.DestinationTableName = "Pataint";
con.Open();
sqlBulkCopy.WriteToServer(dt);
con.Close();
Array.ForEach(Directory.GetFiles((Server.MapPath("~/App_Data/"))), File.Delete);
}
}
}
else
{
Label1.Text = "PlZ TRY AGAIN";
}
}

You have a DataTable with 3 fields of type integer; the error says that one or more of the values extracted from your file are not valid integers.
So you need to check for bad input (as always in these cases).
// Read all lines and get back an array of the lines
string[] lines = File.ReadAllLines(csvPath);
// Loop over the lines and try to add them to the table
foreach (string row in lines)
{
// Discard if the line is just null, empty or all whitespaces
if (!string.IsNullOrWhiteSpace(row))
{
string[] rowParts = row.Split(',');
// We expect the line to be split into 7 parts.
// If this is not the case then log the error and continue
if(rowParts.Length != 7)
{
// Log here the info on the incorrect line with some logging tool
continue;
}
// Check if the 3 values expected to be integers are really integers
int pataintno;
int age;
int phno;
if(!Int32.TryParse(rowParts[0], out pataintno))
{
// It is not an integer, so log the error
// on this line and continue
continue;
}
if(!Int32.TryParse(rowParts[3], out age))
{
// It is not an integer, so log the error
// on this line and continue
continue;
}
if(!Int32.TryParse(rowParts[6], out phno))
{
// It is not an integer, so log the error
// on this line and continue
continue;
}
// OK, all is good now, try to create a new row, fill it and add to the
// Rows collection of the DataTable
DataRow dr = dt.NewRow();
dr[0] = pataintno;
dr[1] = rowParts[1].ToString();
dr[2] = rowParts[2].ToString();
dr[3] = age;
dr[4] = rowParts[4].ToString();
dr[5] = rowParts[5].ToString();
dr[6] = phno;
dt.Rows.Add(dr);
}
}
The check on your input is done using Int32.TryParse, which returns false if the string cannot be converted to an integer. In this case you should write some kind of error log that you can look at when the loop is completed to discover which lines are incorrect and fix them.
Notice also that I have changed your code in a few places: it uses File.ReadAllLines so the input is already split at each new line (regardless of whether the newline is just \n or \r\n), and the code that adds a new row to your DataTable follows the pattern: create a new row, fill it with values, add the new row to the existing collection.
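The three TryParse checks can be folded into one helper that returns a message suitable for the error log. This is a sketch with made-up names (`RowValidator`, `Validate`), not part of the code above.

```csharp
using System;

public static class RowValidator
{
    // Returns null when the row is valid, otherwise a message describing
    // the first problem found (hypothetical helper).
    public static string Validate(string[] parts, int expectedCount, params int[] intColumns)
    {
        if (parts.Length != expectedCount)
        {
            return "expected " + expectedCount + " fields but found " + parts.Length;
        }
        foreach (int index in intColumns)
        {
            int ignored;
            if (!Int32.TryParse(parts[index], out ignored))
            {
                return "field " + index + " is not a valid Int32: '" + parts[index] + "'";
            }
        }
        return null;
    }
}
```

For the table in the question it would be called as `Validate(rowParts, 7, 0, 3, 6)`. One thing worth checking with that schema: a ten-digit phone number such as 9876543210 is larger than Int32.MaxValue (2,147,483,647), so TryParse rejects it even though it consists only of digits. If the Phno values look like that, a string or long column may be a better fit.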

I checked the code and it seems fine. I suggest you check the CSV file and make sure there are no headers for any columns.

I had this problem today while parsing a CSV into a SQL table. My parser had been working well for a year but suddenly threw an int conversion error today. SqlBulkCopy is not very informative, and reviewing the CSV file showed nothing wrong with the data. All the numeric columns in the CSV had valid numeric values.
So to find the error, I wrote the custom method below. The error popped up immediately on the very first record. The actual problem was that the vendor had changed the CSV format of numeric values and started rendering decimal values in place of integers. For example, in place of the value 1, the CSV file had 1.0. When I opened the CSV file it displayed only 1, but in Notepad it showed 1.0. My SQL table had all integer columns, and SqlBulkCopy could not handle this conversion. I spent around 3 hours tracking down this error.
Solution inspired from - https://sqlbulkcopy-tutorial.net/type-a-cannot-be-converted-to-type-b
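private void TestData(CsvDataReader dataReader)
{
int a = 0;
int rowNumber = 0;
while(dataReader.Read())
{
rowNumber++;
try
{
a = int.Parse(dataReader[<<Column name>>].ToString());
}
catch (Exception ex)
{
// report the offending row instead of silently swallowing the error
Console.WriteLine("Row " + rowNumber + ": " + ex.Message);
}
}
}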
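If the vendor keeps sending values such as 1.0 for integer columns, one way to cope is to parse the text as a decimal first and accept it only when it has no fractional part. The following is a sketch under that assumption; `IntegralParser` and `TryParseIntegral` are illustrative names, and it assumes a `.` decimal separator (invariant culture).

```csharp
using System.Globalization;

public static class IntegralParser
{
    // Accepts "1" and "1.0" as the integer 1, rejects "1.5" and non-numeric
    // text (hypothetical helper, invariant culture).
    public static bool TryParseIntegral(string text, out int value)
    {
        value = 0;
        decimal parsed;
        if (!decimal.TryParse(text, NumberStyles.Number, CultureInfo.InvariantCulture, out parsed))
        {
            return false;
        }
        if (parsed != decimal.Truncate(parsed)) return false;              // has a fractional part
        if (parsed < int.MinValue || parsed > int.MaxValue) return false;  // does not fit in an int
        value = (int)parsed;
        return true;
    }
}
```

The converted values can then be written into an int-typed DataTable column before the table is handed to SqlBulkCopy, so the bulk copy never sees the text "1.0".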

Reading a range of data from Excel

I am quite new to using C# for reading Excel data. I am using Microsoft.ACE.OLEDB.12.0 to read data from an Excel sheet. My problem is that the sheet starts at cell B4 (instead of the usual A1), and hence I am having difficulty reading the data. The following is my method:
public static DataSet GetExcelFileData(String fileNameWPath, String sheetName, String rangeName, String fieldList, String whereClause)
{
DataSet xlsDS = new DataSet();
String xlsFields = String.Empty;
String xlsWhereClause = String.Empty;
String xlsSqlString = String.Empty;
String xlsTempPath = @"C:\temp\";
//Copy File to temp folder locations....
String xlsTempName = Path.GetFileNameWithoutExtension(fileNameWPath);
xlsTempName = xlsTempName.Replace(".", String.Empty).Replace(" ", "_").Replace("-", "_").Replace("&", String.Empty).Replace("~", String.Empty) + ".xls";
//Check if sqlFields and Where Clause is Empty....
if (String.IsNullOrEmpty(fieldList))
xlsFields = "*";
else
xlsFields = fieldList;
if (!String.IsNullOrEmpty(whereClause))
xlsWhereClause = whereClause;
//String oleDBConnString = String.Format("Provider=Microsoft.Jet.OLEDB.4.0;Data Source ={0};Extended Properties=\"Excel 8.0; IMEX=1\"", xlsTempPath + Path.GetFileName(xlsTempName));
String oleDBConnString = String.Format("Provider=Microsoft.ACE.OLEDB.12.0;Data Source={0};Extended Properties=\"Excel 12.0;HDR=NO;IMEX=0\"", xlsTempPath + Path.GetFileName(xlsTempName));
OleDbConnection xlsConnect = null;
try
{
File.Copy(fileNameWPath, xlsTempPath + Path.GetFileName(xlsTempName), true);
xlsConnect = new OleDbConnection(oleDBConnString);
OpenConnection(xlsConnect);
//Get Worksheet information
DataTable dbSchema = xlsConnect.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, null);
if (dbSchema == null || dbSchema.Rows.Count < 1)
{
throw new Exception(String.Format("Failed to get worksheet information for {0}", fileNameWPath));
}
DataRow[] sheets = dbSchema.Select(String.Format("TABLE_NAME LIKE '*{0}*'", sheetName.Replace("*", String.Empty)));
if (sheets.Length < 1)
{
throw new Exception(String.Format("Could not find worksheet {0} in {1}", sheetName, fileNameWPath));
}
else
{
string realSheetName = sheets[0]["TABLE_NAME"].ToString();
//Build Sql String
xlsSqlString = String.Format("Select {0} FROM [{1}${2}] {3}", xlsFields, sheetName, rangeName, xlsWhereClause);
//xlsSqlString = String.Format("Select {0} FROM [{1}${2}] {3}", xlsFields, sheetName, "", xlsWhereClause);
OleDbCommand cmd = new OleDbCommand(xlsSqlString, xlsConnect);
OleDbDataAdapter adapter = new OleDbDataAdapter(xlsSqlString, xlsConnect);
adapter.SelectCommand = cmd;
adapter.Fill(xlsDS);
return xlsDS;
}
}
catch (FormatException)
{
// rethrow as-is; "throw ex;" would reset the stack trace
throw;
}
catch (Exception ex2)
{
if (ex2.Message.ToLower().Equals("no value given for one or more required parameters."))
{
throw new Exception(String.Format("Error in Reading File: {0}. \n Please Check if file contains fields you request Field List: {1}", fileNameWPath, xlsFields));
}
throw new Exception(String.Format("Error in Reading File: {0}\n Error Message: {1}", fileNameWPath, ex2.Message + ex2.StackTrace));
}
finally
{
CloseConnection(xlsConnect);
File.Delete(xlsTempPath + Path.GetFileName(xlsTempName));
}
}
Also, I have tried using the older version of the Jet engine, Microsoft.Jet.OLEDB.4.0, and it works fine. But since we have migrated to a 64-bit server, we must use the newer OleDb 12.0 engine. Every time I specify a range ("B4:IV65536") and try to read data, I get the following exception:
"The Microsoft Office Access database engine could not find the object 'Report1$B4:IV65536'. Make sure the object exists and that you spell its name and the path name correctly."
Also, please note that I have tried many combinations of HDR and IMEX (setting them to Yes/No and 0/1 respectively), but that hasn't helped.
Please suggest a workaround.
Thanks,
Abhinav

CSV parser to parse double quotes via OLEDB

How can I use OLEDB to parse and import a CSV file in which each cell is enclosed in double quotes because some rows contain commas? I am unable to change the format as it comes from a vendor.
I am trying the following and it is failing with an IO error:
public DataTable ConvertToDataTable(string fileToImport, string fileDestination)
{
string fullImportPath = fileDestination + @"\" + fileToImport;
OleDbDataAdapter dAdapter = null;
DataTable dTable = null;
try
{
if (!File.Exists(fullImportPath))
return null;
string full = Path.GetFullPath(fullImportPath);
string file = Path.GetFileName(full);
string dir = Path.GetDirectoryName(full);
//create the "database" connection string
string connString = "Provider=Microsoft.Jet.OLEDB.4.0;"
+ "Data Source=\"" + dir + "\\\";"
+ "Extended Properties=\"text;HDR=No;FMT=Delimited\"";
//create the database query
string query = "SELECT * FROM " + file;
//create a DataTable to hold the query results
dTable = new DataTable();
//create an OleDbDataAdapter to execute the query
dAdapter = new OleDbDataAdapter(query, connString);
//fill the DataTable
dAdapter.Fill(dTable);
}
catch (Exception ex)
{
throw new Exception(CLASS_NAME + ".ConvertToDataTable: Caught Exception: " + ex);
}
finally
{
if (dAdapter != null)
dAdapter.Dispose();
}
return dTable;
}
When I use a normal CSV it works fine. Do I need to change something in the connString?

Use a dedicated CSV parser.
There are many out there. A popular one is FileHelpers, though there is one hidden in the Microsoft.VisualBasic.FileIO namespace - TextFieldParser.
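As a rough sketch of the TextFieldParser route (the `QuotedCsv` class and `Parse` method are illustrative names; on .NET Framework the project needs a reference to the Microsoft.VisualBasic assembly):

```csharp
using System.Collections.Generic;
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class QuotedCsv
{
    // Reads comma-delimited rows whose fields may be wrapped in double quotes
    // (hypothetical helper built on TextFieldParser).
    public static List<string[]> Parse(TextReader reader)
    {
        var rows = new List<string[]>();
        using (var parser = new TextFieldParser(reader))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            parser.HasFieldsEnclosedInQuotes = true; // commas inside quotes stay in the field
            while (!parser.EndOfData)
            {
                rows.Add(parser.ReadFields());
            }
        }
        return rows;
    }
}
```

Passing `new StreamReader(fullImportPath)` as the reader would parse the vendor file; each string array can then be added to a DataTable row by row.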

Have a look at FileHelpers.

You can use this code (MS Office required):
private void ConvertCSVtoExcel(string filePath = @"E:\nucc_taxonomy_140.csv", string tableName = "TempTaxonomyCodes")
{
string tempPath = System.IO.Path.GetDirectoryName(filePath);
string strConn = @"Driver={Microsoft Text Driver (*.txt; *.csv)};Dbq=" + tempPath + @"\;Extensions=asc,csv,tab,txt";
OdbcConnection conn = new OdbcConnection(strConn);
OdbcDataAdapter da = new OdbcDataAdapter("Select * from " + System.IO.Path.GetFileName(filePath), conn);
DataTable dt = new DataTable();
da.Fill(dt);
using (SqlBulkCopy bulkCopy = new SqlBulkCopy(ConfigurationSettings.AppSettings["dbConnectionString"]))
{
bulkCopy.DestinationTableName = tableName;
bulkCopy.BatchSize = 50;
bulkCopy.WriteToServer(dt);
}
}

There is a lot to consider when handling CSV files. However you extract the data from the file, you should know how the parsing is handled. There are classes out there that can get you part of the way, but most don't handle the nuances that Excel does with embedded commas, quotes, and line breaks. However, loading Excel or the MS classes seems like a lot of overhead if you just want to parse a text file as CSV.
One thing you can consider is doing the parsing with your own regex, which also makes your code a little more platform independent in case you need to port it to another server or application at some point. Regex also has the benefit of being available in virtually every language. That said, there are some good regex patterns out there that handle the CSV puzzle. Here is my shot at it, which covers embedded commas, quotes, and line breaks. Regex code/pattern and explanation:
http://www.kimgentes.com/worshiptech-web-tools-page/2008/10/14/regex-pattern-for-parsing-csv-files-with-embedded-commas-dou.html
Hope that is of some help..
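To make the three nuances concrete (embedded commas, quotes and line breaks), here is a minimal sketch of a hand-written parser. Note that it is a character-by-character state machine, not the regex pattern from the link above, and the names `CsvStateMachineSketch` and `Parse` are hypothetical. It assumes commas as delimiters, double quotes as the quote character, and a doubled quote as an escaped quote; blank lines are skipped.

```csharp
using System.Collections.Generic;
using System.Text;

public static class CsvStateMachineSketch
{
    // Hypothetical helper: parses a whole CSV text into rows of fields.
    public static List<List<string>> Parse(string text)
    {
        var rows = new List<List<string>>();
        var row = new List<string>();
        var field = new StringBuilder();
        bool inQuotes = false;
        bool pending = false; // true once the current row has any content

        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (inQuotes)
            {
                if (c == '"' && i + 1 < text.Length && text[i + 1] == '"')
                {
                    field.Append('"'); // doubled quote is a literal quote
                    i++;
                }
                else if (c == '"')
                {
                    inQuotes = false;  // closing quote
                }
                else
                {
                    field.Append(c);   // commas and line breaks are data here
                }
            }
            else if (c == '"')
            {
                inQuotes = true;
                pending = true;
            }
            else if (c == ',')
            {
                row.Add(field.ToString());
                field.Clear();
                pending = true;
            }
            else if (c == '\r' || c == '\n')
            {
                // Treat \r\n as a single line break.
                if (c == '\r' && i + 1 < text.Length && text[i + 1] == '\n') i++;
                if (pending)
                {
                    row.Add(field.ToString());
                    field.Clear();
                    rows.Add(row);
                    row = new List<string>();
                    pending = false;
                }
            }
            else
            {
                field.Append(c);
                pending = true;
            }
        }

        // Flush the last row when the text does not end with a line break.
        if (pending)
        {
            row.Add(field.ToString());
            rows.Add(row);
        }
        return rows;
    }
}
```

The parser has to see the whole text rather than one line at a time, because a quoted field may span several physical lines.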

Try the code from my answer here:
Reading CSV files in C#
It handles quoted csv just fine.

private static string[] Mubashir_CSVParser(string s)
{
// split on commas that are followed by an even number of quotes,
// i.e. commas that are not inside a quoted field
Regex RegexCSVParser = new Regex(",(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))");
String[] Fields = RegexCSVParser.Split(s);
// clean up the fields (remove " and leading spaces)
for (int i = 0; i < Fields.Length; i++)
{
Fields[i] = Fields[i].TrimStart(' ', '"');
Fields[i] = Fields[i].TrimEnd('"');// this line removes the quotes
//Fields[i] = Fields[i].Trim();
}
// return the parsed fields so the caller can use them
return Fields;
}

Just in case anyone has a similar issue, I wanted to post the code I used. I did end up using TextFieldParser to read the file and parse out the columns, but I am using recursion and substrings to get the rest done.
/// <summary>
/// Parses each string passed as a "row".
/// This routine accounts for both double quotes
/// as well as commas currently, but can be added to
/// </summary>
/// <param name="row"> string or row to be parsed</param>
/// <returns></returns>
private List<String> ParseRowToList(String row)
{
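List<String> returnValue = new List<String>();
if (String.IsNullOrEmpty(row))
{// Empty column (e.g. the row ended with a comma); row[0] would throw here
returnValue.Add(String.Empty);
return returnValue;
}
if (row[0] == '\"')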
{// Quoted String
if (row.IndexOf("\",") > -1)
{// There are more columns
returnValue = ParseRowToList(row.Substring(row.IndexOf("\",") + 2));
returnValue.Insert(0, row.Substring(1, row.IndexOf("\",") - 1));
}
else
{// This is the last column
returnValue.Add(row.Substring(1, row.Length - 2));
}
}
else
{// Unquoted String
if (row.IndexOf(",") > -1)
{// There are more columns
returnValue = ParseRowToList(row.Substring(row.IndexOf(",") + 1));
returnValue.Insert(0, row.Substring(0, row.IndexOf(",")));
}
else
{// This is the last column
returnValue.Add(row.Substring(0, row.Length));
}
}
return returnValue;
}
Then the code for TextFieldParser is:
// string pathFile = @"C:\TestFTP\TestCatalog.txt";
string pathFile = @"C:\TestFTP\SomeFile.csv";
List<String> stringList = new List<String>();
TextFieldParser fieldParser = null;
DataTable dtable = new DataTable();
/* Set up TextFieldParser
* use the correct delimiter provided
* and path */
fieldParser = new TextFieldParser(pathFile);
/* Set that there are quotes in the file for fields and or column names */
fieldParser.HasFieldsEnclosedInQuotes = true;
/* delimiter by default to be used first */
fieldParser.SetDelimiters(new string[] { "," });
// Build Full table to be imported
dtable = BuildDataTable(fieldParser, dtable);

This is what I used in a project, parses a single line of data.
private string[] csvParser(string csv, char separator = ',')
{
List <string> parsed = new List<string>();
string[] temp = csv.Split(separator);
int counter = 0;
string data = string.Empty;
while (counter < temp.Length)
{
data = temp[counter].Trim();
if (data.StartsWith("\""))
{
// a quoted field was split on an embedded separator: re-join the pieces
// until one ends with the closing quote, without running past the array
while (!(data.Length > 1 && data.TrimEnd().EndsWith("\"")) && counter + 1 < temp.Length)
{
counter++;
data += separator.ToString() + temp[counter];
}
}
parsed.Add(data);
counter++;
}
return parsed.ToArray();
}
http://zamirsblog.blogspot.com/2013/09/c-csv-parser-csvparser.html

SSIS and The Excel Files Saga - c#

My solution was to specify a format for the incoming files that does not allow columns with mixed data types. The fix came from the business side rather than from technology.
