I am quite new to using C# for reading Excel data. I am using Microsoft.ACE.OLEDB.12.0 to read data from an Excel sheet. My problem is that the sheet's data starts at cell B4 (instead of the usual A1), and hence I am facing difficulties while reading it. The following is my method:
public static DataSet GetExcelFileData(String fileNameWPath, String sheetName, String rangeName, String fieldList, String whereClause)
{
DataSet xlsDS = new DataSet();
String xlsFields = String.Empty;
String xlsWhereClause = String.Empty;
String xlsSqlString = String.Empty;
String xlsTempPath = @"C:\temp\";
//Copy File to temp folder locations....
String xlsTempName = Path.GetFileNameWithoutExtension(fileNameWPath);
xlsTempName = xlsTempName.Replace(".", String.Empty).Replace(" ", "_").Replace("-", "_").Replace("&", String.Empty).Replace("~", String.Empty) + ".xls";
//Check if sqlFields and Where Clause is Empty....
if (String.IsNullOrEmpty(fieldList))
xlsFields = "*";
else
xlsFields = fieldList;
if (!String.IsNullOrEmpty(whereClause))
xlsWhereClause = whereClause;
//String oleDBConnString = String.Format("Provider=Microsoft.Jet.OLEDB.4.0;Data Source ={0};Extended Properties=\"Excel 8.0; IMEX=1\"", xlsTempPath + Path.GetFileName(xlsTempName));
String oleDBConnString = String.Format("Provider=Microsoft.ACE.OLEDB.12.0;Data Source={0};Extended Properties=\"Excel 12.0;HDR=NO;IMEX=0\"", xlsTempPath + Path.GetFileName(xlsTempName));
OleDbConnection xlsConnect = null;
try
{
File.Copy(fileNameWPath, xlsTempPath + Path.GetFileName(xlsTempName), true);
xlsConnect = new OleDbConnection(oleDBConnString);
OpenConnection(xlsConnect);
//Get Worksheet information
DataTable dbSchema = xlsConnect.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, null);
if (dbSchema == null || dbSchema.Rows.Count < 1)
{
throw new Exception(String.Format("Failed to get worksheet information for {0}", fileNameWPath));
}
DataRow[] sheets = dbSchema.Select(String.Format("TABLE_NAME LIKE '*{0}*'", sheetName.Replace("*", String.Empty)));
if (sheets.Length < 1)
{
throw new Exception(String.Format("Could not find worksheet {0} in {1}", sheetName, fileNameWPath));
}
else
{
string realSheetName = sheets[0]["TABLE_NAME"].ToString();
//Build Sql String
xlsSqlString = String.Format("Select {0} FROM [{1}${2}] {3}", xlsFields, sheetName, rangeName, xlsWhereClause);
//xlsSqlString = String.Format("Select {0} FROM [{1}${2}] {3}", xlsFields, sheetName, "", xlsWhereClause);
OleDbCommand cmd = new OleDbCommand(xlsSqlString, xlsConnect);
OleDbDataAdapter adapter = new OleDbDataAdapter(xlsSqlString, xlsConnect);
adapter.SelectCommand = cmd;
adapter.Fill(xlsDS);
return xlsDS;
}
}
catch (FormatException)
{
throw; //rethrow without resetting the stack trace
}
catch (Exception ex2)
{
if (ex2.Message.ToLower().Equals("no value given for one or more required parameters."))
{
throw new Exception(String.Format("Error in Reading File: {0}. \n Please Check if file contains fields you request Field List: {1}", fileNameWPath, xlsFields));
}
throw new Exception(String.Format("Error in Reading File: {0}\n Error Message: {1}", fileNameWPath, ex2.Message + ex2.StackTrace));
}
finally
{
CloseConnection(xlsConnect);
File.Delete(xlsTempPath + Path.GetFileName(xlsTempName));
}
}
Also, I have tried using the older version of the Jet engine, Microsoft.Jet.OLEDB.4.0, and it works fine. But since we have migrated to a 64-bit server, we must use the newer ACE 12.0 engine. Every time I specify a range ("B4:IV65536") and try to read data, I get the following exception:
"The Microsoft Office Access database engine could not find the object 'Report1$B4:IV65536'. Make sure the object exists and that you spell its name and the path name correctly."
Also, please note that I have tried many combinations of HDR and IMEX (setting them to YES/NO and 0/1 respectively), but that hasn't helped.
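For reference, here is a stripped-down sketch of the failing query pattern, isolated from the method above (the sheet name, range, and file path are the same placeholders as in my code; the defined-name form is an alternative the provider also accepts):

string connString = @"Provider=Microsoft.ACE.OLEDB.12.0;Data Source=C:\temp\MyFile.xls;Extended Properties=""Excel 12.0;HDR=NO;IMEX=0""";
using (var conn = new OleDbConnection(connString))
using (var adapter = new OleDbDataAdapter("SELECT * FROM [Report1$B4:IV65536]", conn))
{
    var ds = new DataSet();
    adapter.Fill(ds); // throws: could not find the object 'Report1$B4:IV65536'
    // Alternative that avoids the range syntax entirely:
    // SELECT * FROM [MyDefinedName], where MyDefinedName is a named range defined in the workbook.
}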
Please suggest a workaround.
Hi, I have a problem importing a CSV file into SQL Server. The CSV file contains articles that need to be saved in the SQL Server database. Once the import (done with the C# code written below) is finished, some imported fields (such as Descrizione and CodArt) are not written correctly in the database and contain strange characters.
[Screenshot omitted: the SQL Server table, showing the improperly imported rows below the highlighted line.]
Import C# Code:
using (var rd = new StreamReader(labelPercorso.Text))
{
Articolo a = new Articolo();
a.db = this.db;
while (!rd.EndOfStream)
{
//reset CodEAN and Immagine to empty strings on each iteration
CodEAN = "";
Immagine = "";
try
{
var splits = rd.ReadLine().Split(';');
CodArt = splits[0];
Descrizione = splits[1];
String Price = splits[2];
Prezzo = decimal.Parse(Price);
}
catch (Exception ex)
{
Console.WriteLine("Non è presente nè immagine nè codean");
}
a.Prezzo = Prezzo;
a.CodiceArticolo = CodArt;
a.Descrizione = Descrizione;
a.Fornitore = fornitore;
//TODO: check whether the article already exists and, if so, update it
a.InserisciArticoloCSV();
}
}
Code of function: InserisciArticoloCSV
try
{
SqlConnection conn = db.apriconnessione();
String query = "INSERT INTO Articolo(CodArt,Descrizione,Prezzo,PrezzoListino,Fornitore,Importato,TipoArticolo) VALUES(#CodArt,#Descrizione,#Prezzo,#PrezzoListino,#Fornitore,#Importato,#TipoArticolo)";
String Importato = "CSV";
String TipoArticolo = "A";
SqlCommand cmd = new SqlCommand(query, conn);
// MessageBox.Show("CodArt: " + CodiceArticolo + "\n Descrizione :" + Descrizione + "\n Prezzo: " + Prezzo);
cmd.Parameters.AddWithValue("#CodArt", CodiceArticolo.ToString());
cmd.Parameters.AddWithValue("#Descrizione", Descrizione.ToString());
cmd.Parameters.AddWithValue("#Prezzo", Prezzo);
cmd.Parameters.AddWithValue("#PrezzoListino", Prezzo);
cmd.Parameters.AddWithValue("#Fornitore", Fornitore.ToString());
cmd.Parameters.AddWithValue("#Importato", Importato.ToString());
cmd.Parameters.AddWithValue("#TipoArticolo", TipoArticolo.ToString());
cmd.ExecuteNonQuery();
db.chiudiconnessione();
conn.Close();
return true;
}
catch (Exception ex)
{
Console.WriteLine("Errore nell'inserimento dell'articolo " + ex);
//MessageBox.Show("Errore nel inserimento dell'articolo: " + ex);
return false;
}
Your CSV file is not well formatted; there are intermediate carriage returns in between, which break the parsing. Open the file in Notepad++ and turn on the display of line breaks; this is what you will find.
So for the lines which are in the correct format the data import works fine; for the others, the logic does not work.
As others have pointed out, you have numerous problems: encoding, carriage returns and a lot of white space. In addition, you are using single inserts into your database, which is very slow. I show below some sample code which illustrates how to deal with all of these points.
IFormatProvider fP = new CultureInfo("it");
DataTable tmp = new DataTable();
tmp.Columns.Add("CodArt", typeof(string));
tmp.Columns.Add("Descrizione", typeof(string));
tmp.Columns.Add("Prezzo", typeof(decimal));
using (var rd = new StreamReader("yourFileName", Encoding.GetEncoding("iso-8859-1")))
{
while (!rd.EndOfStream)
{
try
{
var nextLine = Regex.Replace(rd.ReadLine(), @"\s+", " ");
while (nextLine.Split(';').Length < 3)
{
nextLine = nextLine.Replace("\r\n", "") + Regex.Replace(rd.ReadLine(), @"\s+", " ");
}
var splits = nextLine.Split(';');
DataRow dR = tmp.NewRow();
dR[0] = splits[0];
dR[1] = splits[1];
string Price = splits[2];
dR[2] = decimal.Parse(Price, fP);
tmp.Rows.Add(dR);
}
catch (Exception ex)
{
Console.WriteLine(ex.Message);
}
}
}
using (var conn = db.apriconnessione())
{
var sBC = new SqlBulkCopy(conn);
conn.Open();
sBC.DestinationTableName = "yourTableName";
sBC.WriteToServer(tmp);
conn.Close();
}
Now for some explanation:
Firstly I am storing the parsed values in a DataTable. Please note that I have only included the three fields that are in the CSV. In practice you must supply the other columns and fill the extra columns with the correct values for each row. I was simply being lazy, but I am sure you will get the idea.
I do not know what encoding your csv file is, but iso-8859-1 worked for me!
I use Regex to replace multiple white space with a single space.
If any line does not have the required number of splits, I keep adding further lines (having deleted the carriage return) until I hit success!
Once I have a complete line, I can now split it, and assign it to the new DataRow (please see my comments above for extra columns).
Finally once the file has been read, the DataTable will have all the rows and can be uploaded to your database using BulkCopy. This is very fast!
HTH
PS Some of your lines have double quotes. You probably want to get rid of these as well!
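For example, a one-line way to strip them as each line is read (a sketch; it assumes no field legitimately contains a double quote):

var nextLine = Regex.Replace(rd.ReadLine(), @"\s+", " ").Replace("\"", "");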
You should specify the correct encoding when you read your file. Is it UTF? Is it ASCII with a specific code page? You should also specify the SqlDbType of your SQL parameters, especially the string parameters, which will be either varchar or nvarchar, and there is a big difference between them.
// what is the encoding of your file? This is an example using code page windows-1252
var encoding = Encoding.GetEncoding("windows-1252");
using (var file = File.Open(labelPercorso.Text, FileMode.Open))
using (var reader = new StreamReader(file, encoding))
{
// rest of code unchanged
}
SQL code. Note that I added using blocks for the types that implement IDisposable, like the connection and command.
try
{
String query = "INSERT INTO Articolo(CodArt,Descrizione,Prezzo,PrezzoListino,Fornitore,Importato,TipoArticolo) VALUES(#CodArt,#Descrizione,#Prezzo,#PrezzoListino,#Fornitore,#Importato,#TipoArticolo)";
String Importato = "CSV";
String TipoArticolo = "A";
using(SqlConnection conn = db.apriconnessione())
using(SqlCommand cmd = new SqlCommand(query, conn))
{
// -1 indicates you used MAX like nvarchar(max), otherwise use the maximum number of characters in the schema
cmd.Parameters.Add(new SqlDbParameter("#CodArt", SqlDbType.NVarChar, -1)).Value = CodiceArticolo.ToString();
cmd.Parameters.Add(new SqlDbParameter("#Descrizione", SqlDbType.NVarChar, -1)).Value = Descrizione.ToString();
/*
Rest of your parameters created in the same manner
*/
cmd.ExecuteNonQuery();
db.chiudiconnessione();
}
return true;
}
catch (Exception ex)
{
Console.WriteLine("Errore nell'inserimento dell'articolo " + ex);
//MessageBox.Show("Errore nel inserimento dell'articolo: " + ex);
return false;
}
Just in case you are interested in exploring a library to handle all your parsing needs with a few lines of code, you can check out Cinchoo ETL - an open source library. Here is a sample that parses the CSV file and shows how to get either a DataTable or a list of records, to load them into a database later.
System.Threading.Thread.CurrentThread.CurrentCulture = new CultureInfo("it");
using (var p = new ChoCSVReader("Bosch Luglio 2017.csv")
.Configure((c) => c.MayContainEOLInData = true) //Handle newline chars in data
.Configure(c => c.Encoding = Encoding.GetEncoding("iso-8859-1")) //Specify the encoding for reading
.WithField("CodArt", 1) //first column
.WithField("Descrizione", 2) //second column
.WithField("Prezzo", 3, fieldType: typeof(decimal)) //third column
.Setup(c => c.BeforeRecordLoad += (o, e) =>
{
e.Source = e.Source.CastTo<string>().Replace(@"""", String.Empty); //Remove the quotes
}) //Scrub the data
)
{
var dt = p.AsDataTable();
//foreach (var rec in p)
// Console.WriteLine(rec.Prezzo);
}
Disclaimer: I'm the author of this library.
I have an Excel file (xls) that has a column called Money. In the Money column all the cells are formatted as Number, except for some that have the marker against them saying they are stored as text. I convert the Excel file to CSV using a C# script that uses IMEX=1 in the connection string to open it. The fields that are marked as stored-as-text do not come through to the CSV file. The file is large, about 20 MB. So this means 100 values like 33344 etc. do not come through to the CSV file.
I tried putting a delay in where I open the Excel file. This worked on my PC but not on the development machine.
Does anyone have an idea how to get around this without manual intervention, like formatting all columns with mixed data types as Number? I am looking for an automated solution that works every time. This is on SSIS 2008.
static void ConvertExcelToCsv(string excelFilePath, string csvOutputFile, int worksheetNumber = 1) {
if (!File.Exists(excelFilePath)) throw new FileNotFoundException(excelFilePath);
if (File.Exists(csvOutputFile)) throw new ArgumentException("File exists: " + csvOutputFile);
// connection string
var cnnStr = String.Format("Provider=Microsoft.Jet.OLEDB.4.0;Data Source={0};Extended Properties=\"Excel 8.0;IMEX=1;HDR=NO\"", excelFilePath);
var cnn = new OleDbConnection(cnnStr);
// get schema, then data
var dt = new DataTable();
try {
cnn.Open();
var schemaTable = cnn.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, null);
if (schemaTable.Rows.Count < worksheetNumber) throw new ArgumentException("The worksheet number provided cannot be found in the spreadsheet");
string worksheet = schemaTable.Rows[worksheetNumber - 1]["table_name"].ToString().Replace("'", "");
string sql = String.Format("select * from [{0}]", worksheet);
var da = new OleDbDataAdapter(sql, cnn);
da.Fill(dt);
}
catch {
// rethrow without resetting the stack trace
throw;
}
finally {
// free resources
cnn.Close();
}
// write out CSV data
using (var wtr = new StreamWriter(csvOutputFile)) {
foreach (DataRow row in dt.Rows) {
bool firstLine = true;
foreach (DataColumn col in dt.Columns) {
if (!firstLine) { wtr.Write(","); } else { firstLine = false; }
var data = row[col.ColumnName].ToString().Replace("\"", "\"\"");
wtr.Write(String.Format("\"{0}\"", data));
}
wtr.WriteLine();
}
}
}
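For reference, the commonly cited workaround for mixed-type columns combines IMEX=1 in the connection string with the engine's registry settings, because TypeGuessRows and ImportMixedTypes are read from the registry, not from the connection string. A hedged sketch (the key path shown is the 32-bit Jet 4.0 location; it varies with the installed engine and OS bitness):

// Registry values the Jet engine consults when guessing column types:
//   HKLM\SOFTWARE\Microsoft\Jet\4.0\Engines\Excel\TypeGuessRows    = 0     (scan all rows, not just 8)
//   HKLM\SOFTWARE\Microsoft\Jet\4.0\Engines\Excel\ImportMixedTypes = Text
// With ImportMixedTypes=Text and IMEX=1, a mixed column is imported as text
// instead of silently dropping the minority-typed cells.
var cnnStr = String.Format(
    "Provider=Microsoft.Jet.OLEDB.4.0;Data Source={0};Extended Properties=\"Excel 8.0;IMEX=1;HDR=NO\"",
    excelFilePath);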
My solution was to specify a format for the incoming files which said no columns with mixed data types. The solution came from the business side, not from technology.
We are reading an xls file which is updated regularly from external links. We have a loop which reads the same file after an interval of 200 ms. After reading the file 1000+ times, we get the error:
"The Microsoft Jet database engine cannot open the file ''. It is already opened exclusively by another user, or you need permission to view its data."
Connection string is as follows:
Provider=Microsoft.Jet.OLEDB.4.0;Data Source=D:\FeedFiles\TESTING1.xls;Extended Properties="Excel 8.0;HDR=YES;IMEX=1;Importmixedtypes=text;typeguessrows=0;"
And after some time, it starts giving "Could not find Installable ISAM".
Code as follows:
String xlsConnString = String.Format(#"Provider=Microsoft.Jet.OLEDB.4.0;Data Source={0};Extended Properties=""Excel 8.0;HDR=YES;IMEX=1;Importmixedtypes=text;typeguessrows=0;""", feedFiles.FullName);
OleDbDataAdapter dataAdapter = new OleDbDataAdapter(xlsQuery, xlsConnString);
while (true)
{
try
{
//Exception handling if not able to read xls file.
DataSet dataSet = new DataSet();
dataAdapter.Fill(dataSet);
String fileName = dirstr + "Temp-";
System.IO.StreamWriter file = new System.IO.StreamWriter(fileName + ".tmp");
file.WriteLine(dataSet.GetXml());
file.Close();
try
{
File.Replace(fileName + ".tmp", dirstr + "Temp-" + filecount.ToString() + ".xml", null);
}
catch (Exception ex)
{
try
{
File.Move(fileName + ".tmp", dirstr + "Temp-" + filecount.ToString() + ".xml");
}
catch
{
Thread.Sleep(xlsThreadSleep);
}
}
filecount++;
if (filecount > maxFileCnt)
{
filecount = 0;
}
dataSet.Clear();
dataSet = null;
Thread.Sleep(xlsThreadSleep);
}
catch (Exception ex)
{
txtlog.BeginInvoke(new DelegateForTxtLog(functionFortxtLog), "Exception occured > " + ex.Message);
feedFileIndex++;
if (feedFileIndex == feedFiles.Length)
{
feedFileIndex = 0;
}
dataAdapter.Dispose();
dataAdapter = null;
Thread.Sleep(xlsThreadSleep * 20);
xlsConnString = String.Format(#"Provider=Microsoft.Jet.OLEDB.4.0;Data Source={0};Extended Properties=""Excel 8.0;HDR=YES;IMEX=1;Importmixedtypes=text;typeguessrows=0;""", feedFiles[feedFileIndex].FullName);
txtlog.BeginInvoke(new DelegateForTxtLog(functionFortxtLog), "Trying connecting with connection string > " + xlsConnString);
dataAdapter = new OleDbDataAdapter(xlsQuery, xlsConnString);
txtlog.BeginInvoke(new DelegateForTxtLog(functionFortxtLog), "Now reading file > " + feedFiles[feedFileIndex].FullName);
}
}
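For reference, one commonly suggested mitigation: the adapter built from a connection string opens a pooled OLE DB session on every Fill, and pooled sessions can keep the file handle alive between reads. A hedged sketch of creating and disposing the connection and adapter per read, and releasing the session pool so Jet lets go of the file, using the names from the code above:

using (var conn = new OleDbConnection(xlsConnString))
using (var dataAdapter = new OleDbDataAdapter(xlsQuery, conn))
{
    DataSet dataSet = new DataSet();
    dataAdapter.Fill(dataSet); // Fill opens and closes conn itself when it was closed
    // ... write dataSet.GetXml() out to the temp file as above ...
}
OleDbConnection.ReleaseObjectPool(); // ask OLE DB to drop pooled sessions holding the file
GC.Collect(); // often suggested alongside, so released COM wrappers are finalized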
The connection string is not formatted properly. Try this:
String xlsConnString = String.Format("Provider=Microsoft.Jet.OLEDB.4.0;Data Source=
{0};Extended Properties=\"Excel 8.0;HDR=YES;
IMEX=1;Importmixedtypes=text;typeguessrows=0;\"", feedFiles.FullName);
How can I use OLEDB to parse and import a CSV file in which each cell is encased in double quotes, because some rows contain commas? I am unable to change the format, as it is coming from a vendor.
I am trying the following, and it is failing with an IO error:
public DataTable ConvertToDataTable(string fileToImport, string fileDestination)
{
string fullImportPath = fileDestination + @"\" + fileToImport;
OleDbDataAdapter dAdapter = null;
DataTable dTable = null;
try
{
if (!File.Exists(fullImportPath))
return null;
string full = Path.GetFullPath(fullImportPath);
string file = Path.GetFileName(full);
string dir = Path.GetDirectoryName(full);
//create the "database" connection string
string connString = "Provider=Microsoft.Jet.OLEDB.4.0;"
+ "Data Source=\"" + dir + "\\\";"
+ "Extended Properties=\"text;HDR=No;FMT=Delimited\"";
//create the database query
string query = "SELECT * FROM " + file;
//create a DataTable to hold the query results
dTable = new DataTable();
//create an OleDbDataAdapter to execute the query
dAdapter = new OleDbDataAdapter(query, connString);
//fill the DataTable
dAdapter.Fill(dTable);
}
catch (Exception ex)
{
throw new Exception(CLASS_NAME + ".ConvertToDataTable: Caught Exception: " + ex);
}
finally
{
if (dAdapter != null)
dAdapter.Dispose();
}
return dTable;
}
When I use a normal CSV it works fine. Do I need to change something in the connString?
Use a dedicated CSV parser.
There are many out there. A popular one is FileHelpers, though there is one hidden in the Microsoft.VisualBasic.FileIO namespace - TextFieldParser.
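For example, a minimal sketch using TextFieldParser (the file path is a placeholder; add a project reference to Microsoft.VisualBasic):

using Microsoft.VisualBasic.FileIO;

using (var parser = new TextFieldParser(@"C:\data\vendor.csv"))
{
    parser.TextFieldType = FieldType.Delimited;
    parser.SetDelimiters(",");
    parser.HasFieldsEnclosedInQuotes = true; // quoted fields with embedded commas parse correctly
    while (!parser.EndOfData)
    {
        string[] fields = parser.ReadFields(); // surrounding quotes are already stripped
        // use fields ...
    }
}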
Have a look at FileHelpers.
You can use this code (MS Office required):
private void ConvertCSVtoExcel(string filePath = @"E:\nucc_taxonomy_140.csv", string tableName = "TempTaxonomyCodes")
{
string tempPath = System.IO.Path.GetDirectoryName(filePath);
string strConn = @"Driver={Microsoft Text Driver (*.txt; *.csv)};Dbq=" + tempPath + @"\;Extensions=asc,csv,tab,txt";
OdbcConnection conn = new OdbcConnection(strConn);
OdbcDataAdapter da = new OdbcDataAdapter("Select * from " + System.IO.Path.GetFileName(filePath), conn);
DataTable dt = new DataTable();
da.Fill(dt);
using (SqlBulkCopy bulkCopy = new SqlBulkCopy(ConfigurationManager.AppSettings["dbConnectionString"]))
{
bulkCopy.DestinationTableName = tableName;
bulkCopy.BatchSize = 50;
bulkCopy.WriteToServer(dt);
}
}
There is a lot to consider when handling CSV files. However you extract them from the file, you should know how you are handling the parsing. There are classes out there that can get you part of the way, but most don't handle the nuances that Excel does with embedded commas, quotes and line breaks. Loading Excel or the MS classes seems like a lot of overhead if you just want to parse a text file like a CSV.
One thing you can consider is doing the parsing with your own regex, which will also make your code a little more platform independent, in case you need to port it to another server or application at some point. Using regex has the benefit of being accessible in virtually every language. That said, there are some good regex patterns out there that handle the CSV puzzle. Here is my shot at it, which covers embedded commas, quotes and line breaks. Regex code/pattern and explanation:
http://www.kimgentes.com/worshiptech-web-tools-page/2008/10/14/regex-pattern-for-parsing-csv-files-with-embedded-commas-dou.html
Hope that is of some help.
Try the code from my answer here:
Reading CSV files in C#
It handles quoted csv just fine.
private static void Mubashir_CSVParser(string s)
{
// extract the fields
Regex RegexCSVParser = new Regex(",(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))");
String[] Fields = RegexCSVParser.Split(s);
// clean up the fields (remove " and leading spaces)
for (int i = 0; i < Fields.Length; i++)
{
Fields[i] = Fields[i].TrimStart(' ', '"');
Fields[i] = Fields[i].TrimEnd('"');// this line remove the quotes
//Fields[i] = Fields[i].Trim();
}
}
Just in case anyone has a similar issue, I wanted to post the code I used. I did end up using TextFieldParser to read the file and parse out the columns, but I am using recursion and substrings to do the rest.
/// <summary>
/// Parses each string passed as a "row".
/// This routine accounts for both double quotes
/// as well as commas currently, but can be added to
/// </summary>
/// <param name="row"> string or row to be parsed</param>
/// <returns></returns>
private List<String> ParseRowToList(String row)
{
List<String> returnValue = new List<String>();
if (row[0] == '\"')
{// Quoted String
if (row.IndexOf("\",") > -1)
{// There are more columns
returnValue = ParseRowToList(row.Substring(row.IndexOf("\",") + 2));
returnValue.Insert(0, row.Substring(1, row.IndexOf("\",") - 1));
}
else
{// This is the last column
returnValue.Add(row.Substring(1, row.Length - 2));
}
}
else
{// Unquoted String
if (row.IndexOf(",") > -1)
{// There are more columns
returnValue = ParseRowToList(row.Substring(row.IndexOf(",") + 1));
returnValue.Insert(0, row.Substring(0, row.IndexOf(",")));
}
else
{// This is the last column
returnValue.Add(row.Substring(0, row.Length));
}
}
return returnValue;
}
Then the code for Textparser is:
// string pathFile = @"C:\TestFTP\TestCatalog.txt";
string pathFile = @"C:\TestFTP\SomeFile.csv";
List<String> stringList = new List<String>();
TextFieldParser fieldParser = null;
DataTable dtable = new DataTable();
/* Set up TextFieldParser
* use the correct delimiter provided
* and path */
fieldParser = new TextFieldParser(pathFile);
/* Set that there are quotes in the file for fields and or column names */
fieldParser.HasFieldsEnclosedInQuotes = true;
/* delimiter by default to be used first */
fieldParser.SetDelimiters(new string[] { "," });
// Build Full table to be imported
dtable = BuildDataTable(fieldParser, dtable);
This is what I used in a project; it parses a single line of data.
private string[] csvParser(string csv, char separator = ',')
{
List <string> parsed = new List<string>();
string[] temp = csv.Split(separator);
int counter = 0;
string data = string.Empty;
while (counter < temp.Length)
{
data = temp[counter].Trim();
if (data.Trim().StartsWith("\""))
{
bool isLast = false;
while (!isLast && counter + 1 < temp.Length) // guard the counter + 1 lookahead below
{
data += separator.ToString() + temp[counter + 1];
counter++;
isLast = (temp[counter].Trim().EndsWith("\""));
}
}
parsed.Add(data);
counter++;
}
return parsed.ToArray();
}
http://zamirsblog.blogspot.com/2013/09/c-csv-parser-csvparser.html
My program is still running, importing data from a log file into a remote SQL Server database. The log file is about 80 MB in size and contains about 470000 lines, of which about 25000 lines hold data. My program can import only 300 rows/second, which is really bad. :(
public static int ImportData(string strPath)
{
//NameValueCollection collection = ConfigurationManager.AppSettings;
using (TextReader sr = new StreamReader(strPath))
{
sr.ReadLine(); //ignore the first three lines of the log file
sr.ReadLine();
sr.ReadLine();
string strLine;
var cn = new SqlConnection(ConnectionString);
cn.Open();
while ((strLine = sr.ReadLine()) != null)
{
{
if (strLine.Trim() != "") //if not a blank line, then import into database
{
InsertData(strLine, cn);
_count++;
}
}
}
cn.Close();
sr.Close();
return _count;
}
}
InsertData is just a normal insert method using ADO.NET. It uses a parsing method:
public Data(string strLine)
{
string[] list = strLine.Split(new[] {'\t'});
try
{
Senttime = DateTime.Parse(list[0] + " " + list[1]);
}
catch (Exception)
{
// ignore unparsable timestamps; Senttime keeps its default value
}
Clientip = list[2];
Clienthostname = list[3];
Partnername = list[4];
Serverhostname = list[5];
Serverip = list[6];
Recipientaddress = list[7];
Eventid = Convert.ToInt16(list[8]);
Msgid = list[9];
Priority = Convert.ToInt16(list[10]);
Recipientreportstatus = Convert.ToByte(list[11]);
Totalbytes = Convert.ToInt32(list[12]);
Numberrecipient = Convert.ToInt16(list[13]);
DateTime temp;
if (DateTime.TryParse(list[14], out temp))
{
OriginationTime = temp;
}
else
{
OriginationTime = null;
}
Encryption = list[15];
ServiceVersion = list[16];
LinkedMsgid = list[17];
MessageSubject = list[18];
SenderAddress = list[19];
}
InsertData method:
private static void InsertData(string strLine, SqlConnection cn)
{
var dt = new Data(strLine); //parse the log line into proper fields
const string cnnStr =
"INSERT INTO LOGDATA ([SentTime]," + "[client-ip]," +
"[Client-hostname]," + "[Partner-Name]," + "[Server-hostname]," +
"[server-IP]," + "[Recipient-Address]," + "[Event-ID]," + "[MSGID]," +
"[Priority]," + "[Recipient-Report-Status]," + "[total-bytes]," +
"[Number-Recipients]," + "[Origination-Time]," + "[Encryption]," +
"[service-Version]," + "[Linked-MSGID]," + "[Message-Subject]," +
"[Sender-Address]) " + " VALUES ( " + "#Senttime," + "#Clientip," +
"#Clienthostname," + "#Partnername," + "#Serverhostname," + "#Serverip," +
"#Recipientaddress," + "#Eventid," + "#Msgid," + "#Priority," +
"#Recipientreportstatus," + "#Totalbytes," + "#Numberrecipient," +
"#OriginationTime," + "#Encryption," + "#ServiceVersion," +
"#LinkedMsgid," + "#MessageSubject," + "#SenderAddress)";
var cmd = new SqlCommand(cnnStr, cn) {CommandType = CommandType.Text};
cmd.Parameters.AddWithValue("#Senttime", dt.Senttime);
cmd.Parameters.AddWithValue("#Clientip", dt.Clientip);
cmd.Parameters.AddWithValue("#Clienthostname", dt.Clienthostname);
cmd.Parameters.AddWithValue("#Partnername", dt.Partnername);
cmd.Parameters.AddWithValue("#Serverhostname", dt.Serverhostname);
cmd.Parameters.AddWithValue("#Serverip", dt.Serverip);
cmd.Parameters.AddWithValue("#Recipientaddress", dt.Recipientaddress);
cmd.Parameters.AddWithValue("#Eventid", dt.Eventid);
cmd.Parameters.AddWithValue("#Msgid", dt.Msgid);
cmd.Parameters.AddWithValue("#Priority", dt.Priority);
cmd.Parameters.AddWithValue("#Recipientreportstatus", dt.Recipientreportstatus);
cmd.Parameters.AddWithValue("#Totalbytes", dt.Totalbytes);
cmd.Parameters.AddWithValue("#Numberrecipient", dt.Numberrecipient);
if (dt.OriginationTime != null)
cmd.Parameters.AddWithValue("#OriginationTime", dt.OriginationTime);
else
cmd.Parameters.AddWithValue("#OriginationTime", DBNull.Value);
//if OriginationTime was null, then insert with null value to this column
cmd.Parameters.AddWithValue("#Encryption", dt.Encryption);
cmd.Parameters.AddWithValue("#ServiceVersion", dt.ServiceVersion);
cmd.Parameters.AddWithValue("#LinkedMsgid", dt.LinkedMsgid);
cmd.Parameters.AddWithValue("#MessageSubject", dt.MessageSubject);
cmd.Parameters.AddWithValue("#SenderAddress", dt.SenderAddress);
cmd.ExecuteNonQuery();
}
How can my program run faster?
Thank you so much!
Use SqlBulkCopy.
Edit: I created a minimal implementation of IDataReader and created a Batch type so that I could insert arbitrary in-memory data using SqlBulkCopy. Here is the important bit:
IDataReader dr = batch.GetDataReader();
using (SqlTransaction tx = _connection.BeginTransaction())
{
try
{
using (SqlBulkCopy sqlBulkCopy =
new SqlBulkCopy(_connection, SqlBulkCopyOptions.Default, tx))
{
sqlBulkCopy.DestinationTableName = TableName;
SetColumnMappings(sqlBulkCopy.ColumnMappings);
sqlBulkCopy.WriteToServer(dr);
tx.Commit();
}
}
catch
{
tx.Rollback();
throw;
}
}
The rest of the implementation is left as an exercise for the reader :)
Hint: the only bits of IDataReader you need to implement are Read, GetValue and FieldCount.
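For the curious, here is a minimal sketch of such a reader over in-memory rows. This is my own illustration, not the Batch type from the answer above; the three members named in the hint do the real work and everything else throws or returns a trivial value:

using System;
using System.Collections.Generic;
using System.Data;

sealed class BatchDataReader : IDataReader
{
    private readonly IList<object[]> _rows;
    private int _current = -1;
    public BatchDataReader(IList<object[]> rows) { _rows = rows; }

    // The three members that matter for SqlBulkCopy:
    public int FieldCount { get { return _rows.Count > 0 ? _rows[0].Length : 0; } }
    public bool Read() { return ++_current < _rows.Count; }
    public object GetValue(int i) { return _rows[_current][i]; }

    // The rest is boilerplate that the bulk copy streaming path does not need:
    public void Dispose() { }
    public void Close() { }
    public bool IsClosed { get { return false; } }
    public int Depth { get { return 0; } }
    public int RecordsAffected { get { return -1; } }
    public bool NextResult() { return false; }
    public DataTable GetSchemaTable() { throw new NotSupportedException(); }
    public object this[int i] { get { return GetValue(i); } }
    public object this[string name] { get { throw new NotSupportedException(); } }
    public bool IsDBNull(int i) { return GetValue(i) == null || GetValue(i) is DBNull; }
    public string GetName(int i) { throw new NotSupportedException(); }
    public int GetOrdinal(string name) { throw new NotSupportedException(); }
    public string GetDataTypeName(int i) { throw new NotSupportedException(); }
    public Type GetFieldType(int i) { throw new NotSupportedException(); }
    public int GetValues(object[] values) { throw new NotSupportedException(); }
    public bool GetBoolean(int i) { throw new NotSupportedException(); }
    public byte GetByte(int i) { throw new NotSupportedException(); }
    public long GetBytes(int i, long fieldOffset, byte[] buffer, int bufferOffset, int length) { throw new NotSupportedException(); }
    public char GetChar(int i) { throw new NotSupportedException(); }
    public long GetChars(int i, long fieldOffset, char[] buffer, int bufferOffset, int length) { throw new NotSupportedException(); }
    public Guid GetGuid(int i) { throw new NotSupportedException(); }
    public short GetInt16(int i) { throw new NotSupportedException(); }
    public int GetInt32(int i) { throw new NotSupportedException(); }
    public long GetInt64(int i) { throw new NotSupportedException(); }
    public float GetFloat(int i) { throw new NotSupportedException(); }
    public double GetDouble(int i) { throw new NotSupportedException(); }
    public string GetString(int i) { throw new NotSupportedException(); }
    public decimal GetDecimal(int i) { throw new NotSupportedException(); }
    public DateTime GetDateTime(int i) { throw new NotSupportedException(); }
    public IDataReader GetData(int i) { throw new NotSupportedException(); }
}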
Hmmm, let's break this down a little bit.
In pseudocode, what you did is the following:
Open the file
Open a connection
For every line that has data:
Parse the string
Save the data in SQL Server
Close the connection
Close the file
Now the fundamental problems in doing it this way are:
You are keeping a SQL connection open while waiting for your line parsing (pretty susceptible to timeouts and stuff)
You might be saving the data line by line, each in its own transaction. We won't know until you show us what the InsertData method is doing
Consequently you are keeping the file open while waiting for SQL to finish inserting
The optimal way of doing this is to parse the file as a whole, and then insert the rows in bulk. You can do this with SqlBulkCopy (as suggested by Matt Howells), or with SQL Server Integration Services.
If you want to stick with ADO.NET, you can pool together your INSERT statements and then pass them off in one large SqlCommand, instead of setting up one SqlCommand object per insert statement, as shown in the sketch below.
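A hedged sketch of that pooling idea, reduced to two columns for brevity (parsedRows is an assumed list of the question's Data objects, parsed up front; StringBuilder needs using System.Text; the real statement would list all nineteen columns):

var sb = new StringBuilder();
var cmd = new SqlCommand { Connection = cn };
int i = 0;
foreach (Data row in parsedRows)
{
    sb.AppendFormat("INSERT INTO LOGDATA ([SentTime],[client-ip]) VALUES (@t{0},@ip{0});", i);
    cmd.Parameters.AddWithValue("@t" + i, row.Senttime);
    cmd.Parameters.AddWithValue("@ip" + i, row.Clientip);
    i++;
}
// SQL Server allows at most 2100 parameters per command, so chunk large batches accordingly.
cmd.CommandText = sb.ToString();
cmd.ExecuteNonQuery(); // one round trip instead of one per row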
You create a SqlCommand object for every row of data. The simplest improvement would therefore be to create a
private static SqlCommand cmdInsert
and declare the parameters with the Parameters.Add() method. Then for each data row, set the parameter values using
cmdInsert.Parameters["#paramXXX"].Value = valueXXX;
A second performance improvement might be to skip the creation of Data objects for each row, and assign the parameter values directly from the list[] array, as sketched below.
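Combining both ideas might look like this sketch, reusing strLine and sr from the question's read loop (two columns shown for brevity; the nvarchar size is an assumption to be matched to the schema):

var cmdInsert = new SqlCommand(
    "INSERT INTO LOGDATA ([SentTime],[client-ip]) VALUES (@Senttime,@Clientip)", cn);
cmdInsert.Parameters.Add("@Senttime", SqlDbType.DateTime);
cmdInsert.Parameters.Add("@Clientip", SqlDbType.NVarChar, 50);
cmdInsert.Prepare(); // compiled once, reused for every row
while ((strLine = sr.ReadLine()) != null)
{
    string[] list = strLine.Split('\t'); // assign straight from the split array
    cmdInsert.Parameters["@Senttime"].Value = DateTime.Parse(list[0] + " " + list[1]);
    cmdInsert.Parameters["@Clientip"].Value = list[2];
    cmdInsert.ExecuteNonQuery();
}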