Upload CSV file, parse, then insert data into SQL table - C#

I have a webform with an upload button for a CSV file. My code needs to parse the file and use the parsed data to insert into a SQL table. Is what I'm doing correct for parsing the data into a List? It's not picking up the filename for the StreamReader. Is this the most effective way to parse the data, or should I parse into a DataTable instead?
protected void UploadBtn_Click(object sender, EventArgs e)
{
    if (FileUpload.HasFile)
    {
        string filename = Path.GetFileName(FileUpload.FileName);
        List<string[]> ValuesToUpload = parseData(filename);

        //if (!Directory.Exists(ConfigurationManager.AppSettings["temp_dir"].ToString().Trim()))
        //{
        //    Directory.CreateDirectory(ConfigurationManager.AppSettings["temp_dir"].ToString().Trim());
        //}
        //FileUpload.SaveAs(ConfigurationManager.AppSettings["temp_dir"].ToString().Trim() + filename);
        //using (FileStream stream = new FileStream(ConfigurationManager.AppSettings["temp_dir"].ToString().Trim() + filename, FileMode.Open, FileAccess.Read, FileShare.Read))
    }
}
public List<string[]> parseData(string filename)
{
    int j = 0;
    List<string[]> members = new List<string[]>();
    try
    {
        using (StreamReader read = new StreamReader(filename))
        {
            while (!read.EndOfStream)
            {
                string line = read.ReadLine();
                string[] values = line.Split(',');
                if (j == 0)
                {
                    j++;
                    continue; // skip the header row
                }
                long memnbr = Convert.ToInt64(values[0]);
                int loannbr = Convert.ToInt32(values[1]);
                int propval = Convert.ToInt32(values[2]);
                members.Add(values);
            }
        }
    }
    catch (Exception)
    {
        throw;
    }
    return members;
}
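For context on the filename problem: FileUpload.FileName only returns the client-side file name, so a StreamReader opened with it cannot find the file on the server. A minimal sketch that reads the uploaded content straight from the posted stream instead of a path (assuming the same parsing logic as parseData) would look like this:

// Sketch: open the posted file's stream directly instead of a server path.
if (FileUpload.HasFile)
{
    List<string[]> valuesToUpload = new List<string[]>();
    using (StreamReader read = new StreamReader(FileUpload.FileContent))
    {
        read.ReadLine(); // skip the header row
        while (!read.EndOfStream)
        {
            valuesToUpload.Add(read.ReadLine().Split(','));
        }
    }
}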

Use KBCsv. We are getting 40K rows parsed per second and 70K+ rows skipped per second; it is the fastest parser I have seen, and also pretty stable. Then generate the SQL manually, as suggested above. If you are doing a data reload and aiming for performance, run it multi-threaded with no transaction (MS SQL only). You can get up to 10K rows per second of import speed, depending on your network bandwidth to the database server.
Do not parse into a DataTable - it is very slow.

Since you're going to insert the data into a SQL table, I'd first create a class that represents the table and create a new object for each record (this is mainly for readability).
Alternatively, you could use one of the following approaches (assuming you're using MS SQL Server):
1. The Dynamic Insert Query
StringBuilder strInsertValues = new StringBuilder("VALUES");
// ... your parsing code here ...
string[] values = line.Split(',');
strInsertValues.AppendFormat("({0},{1},{2}),", values[0], values[1], values[2]);
// ... end of parsing ...
using (SqlConnection cn = new SqlConnection(YOUR_CONNECTION_STRING))
{
    SqlCommand cmd = cn.CreateCommand();
    cmd.CommandType = CommandType.Text;
    // trim the trailing comma from the VALUES list
    cmd.CommandText = "INSERT INTO TABLE(Column1, Column2, Column3) " + strInsertValues.ToString().Substring(0, strInsertValues.Length - 1);
    cn.Open();
    cmd.ExecuteNonQuery();
}
2. Use BulkCopy (Recommended)
Create a DataTable that represents your CSV values.
Add a new row for each line parsed.
Create column mappings between your DataTable and the SQL table.
Use the SqlBulkCopy object to insert your data (see the sketch below).
Ref to BulkCopy: http://msdn.microsoft.com/en-us/library/system.data.sqlclient.sqlbulkcopy.aspx
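A minimal sketch of option 2, reusing the placeholder names from the snippet above (TABLE, Column1-Column3, YOUR_CONNECTION_STRING) and the ValuesToUpload list from the question; the column types are guessed from the question's Convert calls:

// Sketch: build a DataTable from the parsed lines, then bulk copy it.
DataTable csvTable = new DataTable();
csvTable.Columns.Add("Column1", typeof(long));
csvTable.Columns.Add("Column2", typeof(int));
csvTable.Columns.Add("Column3", typeof(int));
foreach (string[] values in ValuesToUpload)
{
    csvTable.Rows.Add(Convert.ToInt64(values[0]), Convert.ToInt32(values[1]), Convert.ToInt32(values[2]));
}
using (SqlConnection cn = new SqlConnection(YOUR_CONNECTION_STRING))
using (SqlBulkCopy bulk = new SqlBulkCopy(cn))
{
    bulk.DestinationTableName = "TABLE";
    bulk.ColumnMappings.Add("Column1", "Column1");
    bulk.ColumnMappings.Add("Column2", "Column2");
    bulk.ColumnMappings.Add("Column3", "Column3");
    cn.Open();
    bulk.WriteToServer(csvTable);
}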

Not really an answer, but too long to post as a comment...
Since it looks like you're throwing away your parsed values (memnbr, etc.), you could significantly reduce your CSV parsing code to:
return
    File
        .ReadLines(filename)
        .Skip(1)
        .Select(line => line.Split(','))
        .ToList();

The below code sample will bulk insert CSV data into a staging table that has matching columns and then will execute a Stored Procedure to normalize the data on the server.
This is significantly more efficient than manually parsing the data and inserting it line by line. A few months ago I used similar code to submit 1,500,000+ records to our database and normalize the data in a matter of seconds.
var sqlConnection = new SqlConnection(DbConnectionStringInternal);

// Bulk-import our unnormalized data from the .csv file into a staging table
var inputFileConnectionString = String.Format("Driver={{Microsoft Text Driver (*.txt; *.csv)}};Extensions=csv;Readonly=True;Dbq={0}", Path.GetDirectoryName(csvFilePath));
using (var inputFileConnection = new OdbcConnection(inputFileConnectionString))
{
    inputFileConnection.Open();

    var selectCommandText = String.Format("SELECT * FROM {0}", Path.GetFileName(csvFilePath));
    var selectCommand = new OdbcCommand(selectCommandText, inputFileConnection);
    var inputDataReader = selectCommand.ExecuteReader(CommandBehavior.CloseConnection);

    var sqlBulkCopy = new SqlBulkCopy(sqlConnection) { DestinationTableName = "Data_Staging" };
    if (sqlConnection.State != ConnectionState.Open)
        sqlConnection.Open();
    sqlBulkCopy.WriteToServer(inputDataReader);
}

// Run a stored procedure to normalize the data in the staging table, then efficiently move it across to the "real" tables.
var addDataFromStagingTable = "EXEC SP_AddDataFromStagingTable";
if (sqlConnection.State != ConnectionState.Open)
    sqlConnection.Open();
using (var addToStagingTableCommand = new SqlCommand(addDataFromStagingTable, sqlConnection) { CommandTimeout = 60 * 20 })
    addToStagingTableCommand.ExecuteNonQuery();
sqlConnection.Close();

Related

How to import csv file to mssql express server in C# and changing datatypes

I've got a console application that uses an API to get data which is then saved into a csv file in the following format:
Headers:
TicketID,TicketTitle,TicketStatus,CustomerName,TechnicianFullName,TicketResolvedDate
Body:
String values, where TicketResolvedDate is written as: YYYY-MM-DDTHH:mm:ssZ
Now I want to import this CSV file into my MS SQL Express database using the same console application, and make sure TicketID is imported as an integer datatype and TicketResolvedDate as a SQL datetime datatype.
I've made the following code:
List<TicketCSV> tickets = new List<TicketCSV>();
using var reader1 = new StreamReader(OutputClosedTickets);
using var reader2 = new StreamReader(OutputWorkhours);
using var csv1 = new CsvReader((IParser)reader1);
{
    csv1.Configuration.Delimiter = ",";
    csv1.Configuration.MissingFieldFound = null;
    csv1.Configuration.PrepareHeaderForMatch = (string header, int index) => header.ToLower();
    csv1.ReadHeader();
    while (csv1.Read())
    {
        var record = new TicketCSV
        {
            TicketID = csv1.GetField<int>("TicketID"),
            TicketTitle = csv1.GetField("TicketTitle"),
            TicketStatus = csv1.GetField("TicketStatus"),
            CustomerName = csv1.GetField("CustomerName"),
            TechnicianFullName = csv1.GetField("TechnicianFullName"),
            TicketResolvedDate = SqlDateTime.Parse(csv1.GetField("TicketResolvedDate"))
        };
        tickets.Add(record);
    }
}
using (var bulkCopy = new SqlBulkCopy(connectionString))
{
    bulkCopy.DestinationTableName = "GeslotenTickets";
    bulkCopy.WriteToServer((IDataReader)csv1);

    bulkCopy.DestinationTableName = "WerkUren";
    bulkCopy.WriteToServer((IDataReader)reader2);
}
But I'm not sure if this is remotely close to the approach I should take to accomplish this.
You're on the right track, but there are a couple issues with your code. You're reading the CSV data into objects, but then passing the CsvReader to the bulk copy operation. At that point all the CSV data in the reader has already been consumed, because you read it all when you were creating objects. Thus the SqlBulkCopy won't see any data in the reader.
The next issue that I think you're going to have is that the "schema" of the data reader needs to match the schema of the target SQL table. If the schemas don't match, you'll typically get some cryptic error message out of the SqlBulkCopy operation, that some type can't be converted.
I maintain a library that I've specifically designed to work well in this scenario: Sylvan.Data.Csv. It allows you to apply a schema to the "untyped" CSV data.
Here is an example of how you could write CSV data to a table in SQL Server:
using Sylvan.Data.Csv;
using System.Data.SqlClient;

static void LoadTableCsv(SqlConnection conn, string tableName, string csvFile)
{
    // read the column schema of the target table
    var cmd = conn.CreateCommand();
    cmd.CommandText = $"select top 0 * from {tableName}"; // beware of sql injection
    var reader = cmd.ExecuteReader();
    var colSchema = reader.GetColumnSchema();
    reader.Close();

    // apply the column schema to the csv reader.
    var csvSchema = new CsvSchema(colSchema);
    var csvOpts = new CsvDataReaderOptions { Schema = csvSchema };
    using var csv = CsvDataReader.Create(csvFile, csvOpts);

    using var bulkCopy = new SqlBulkCopy(conn);
    bulkCopy.DestinationTableName = tableName;
    bulkCopy.EnableStreaming = true;
    bulkCopy.WriteToServer(csv);
}
You still might encounter errors if the CSV data doesn't correctly match the schema, or has invalid or broken records, but this should work if your csv files are clean and valid.
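For example, a possible call site reusing the table and file names from the question (the connection string is whatever your app already uses):

using var conn = new SqlConnection(connectionString);
conn.Open();
LoadTableCsv(conn, "GeslotenTickets", OutputClosedTickets);
LoadTableCsv(conn, "WerkUren", OutputWorkhours);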

How to import Data in csv file into a SQL Server database using C#

I'm trying to import data into a SQL Server database from a .csv file. I have just one problem: for the money column, a FormatException is thrown because of the format of the money value.
I tried converting to double, swapping the period and the comma, and changing the Split() separator from a comma to a semicolon, but the exception didn't go away. Does anyone know what to do about this?
It is just an experiment.
My .csv file looks like this:
Database table's columns are:
name, second_Name, nickname, money
Code:
public void Import()
{
    SqlCommand command = null;
    var lineNumber = 0;

    using (SqlConnection conn = DatabaseSingleton.GetInstance())
    {
        // conn.Open();
        using (StreamReader reader = new StreamReader(@"C:\Users\petrb\Downloads\E-Shop\E-Shop\dataImport.csv"))
        {
            while (!reader.EndOfStream)
            {
                var line = reader.ReadLine();
                if (lineNumber != 0)
                {
                    var values = line.Split(',');
                    using (command = new SqlCommand("INSERT INTO User_Shop VALUES (@name, @second_Name, @nickname, @money)", conn))
                    {
                        command.Parameters.Add(new SqlParameter("@name", values[0].ToString()));
                        command.Parameters.Add(new SqlParameter("@second_Name", values[1].ToString()));
                        command.Parameters.Add(new SqlParameter("@nickname", values[2].ToString()));
                        command.Parameters.Add(new SqlParameter("@money", Convert.ToDecimal(values[3].ToString())));
                        command.Connection = conn;
                        command.ExecuteNonQuery();
                    }
                }
                lineNumber++;
            }
        }
        conn.Close();
    }
    Console.WriteLine("Products import completed");
    Console.ReadLine();
}
I maintain a package Sylvan.Data.Csv that makes it very easy to bulk import CSV data into SQL Server, assuming the shape of your CSV file matches the target table.
Here is some code that demonstrates how to do it:
SqlConnection conn = ...;

// Get the schema for the target table
var cmd = conn.CreateCommand();
cmd.CommandText = "select top 0 * from User_Shop";
var reader = cmd.ExecuteReader();
var tableSchema = reader.GetColumnSchema();

// apply the schema of the target SQL table to the CSV data.
var options =
    new CsvDataReaderOptions
    {
        Schema = new CsvSchema(tableSchema)
    };
using var csv = CsvDataReader.Create("dataImport.csv", options);

// use sql bulk copy to bulk insert the data
var bcp = new SqlBulkCopy(conn);
bcp.BulkCopyTimeout = 0;
bcp.DestinationTableName = "User_Shop";
bcp.WriteToServer(csv);
On certain .NET framework versions GetColumnSchema might not exist, or might throw NotSupportedException. The Sylvan.Data v0.2.0 library can be used to work around this. You can call the older GetSchemaTable API, then use the Sylvan.Data.Schema type to convert it to the new-style schema IReadOnlyCollection<DbColumn>:
DataTable schemaDT = reader.GetSchemaTable();
var tableSchema = Schema.FromSchemaTable(schemaDT);
Try this:
SqlParameter moneyParam = new SqlParameter("@money", SqlDbType.Money);
moneyParam.Value = new SqlMoney(Convert.ToDecimal(values[3].ToString()));
command.Parameters.Add(moneyParam);
Not sure if it'll work, but it seems to make sense to me.
The problem is that when you use the SqlParameter constructor with just a value, I think it infers the type from your variable; in this case you're passing a decimal, so the equivalent DB type is 'decimal', whereas your DB schema is using the DB type 'money'. Explicitly setting the DB type in the parameter constructor should help.
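Separately, if the FormatException comes from the decimal separator (the question mentions swapping commas and periods), a culture-aware parse is worth a try. This is only a sketch: it needs using System.Globalization;, and "cs-CZ" is just an arbitrary example of a comma-decimal culture.

// Sketch: try invariant culture first ("1234.56"), then a comma-decimal culture ("1234,56").
decimal money;
string raw = values[3].Trim();
if (!decimal.TryParse(raw, NumberStyles.Number, CultureInfo.InvariantCulture, out money) &&
    !decimal.TryParse(raw, NumberStyles.Number, CultureInfo.GetCultureInfo("cs-CZ"), out money))
{
    throw new FormatException("Unrecognized money format: " + raw);
}
command.Parameters.Add(new SqlParameter("@money", money));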

Uploading Data to SQL from Excel (.CSV) file using C# windows form app

I am using this method to upload data to SQL.
private void button5_Click(object sender, EventArgs e)
{
    string filepath = textBox2.Text;
    string connectionString_i = string.Format(@"Provider=Microsoft.Jet.OleDb.4.0; Data Source={0};Extended Properties=""Text;HDR=YES;FMT=Delimited""", Path.GetDirectoryName(filepath));
    using (OleDbConnection connection_i = new OleDbConnection(connectionString_i))
    {
        connection_i.Open();
        OleDbCommand command = new OleDbCommand("Select * FROM [" + Path.GetFileName(filepath) + "]", connection_i);
        command.CommandTimeout = 180;
        using (OleDbDataReader dr = command.ExecuteReader())
        {
            string sqlConnectionString = MyConString;
            using (SqlBulkCopy bulkInsert = new SqlBulkCopy(sqlConnectionString))
            {
                bulkInsert.BulkCopyTimeout = 180;
                bulkInsert.DestinationTableName = "Table_Name";
                bulkInsert.WriteToServer(dr);
                MessageBox.Show("Upload Successful!");
            }
        }
        connection_i.Close();
    }
}
I have an Excel sheet in .CSV format with about 1,048,313 entries. The bulk copy method only works for about 36,000 to 60,000 entries. I want to ask if there is any way I can select the first 30,000 entries from the file and upload them to a SQL Server table, then select the next chunk of 30,000 rows and upload those, and so on until the last entry has been stored.
1. Create a DataTable to store the values from your CSV file that need to be inserted into your target table. Each column in the DataTable would correspond to a data column in the CSV file.
2. Create a custom table-valued data type on SQL Server to match your DataTable, including data types and lengths. (This assumes SQL Server, since the post is tagged sql-server and not access, even though your sample connection string suggests otherwise.)
3. Using a text reader and a counter variable, populate your DataTable with 30,000 records.
4. Pass the DataTable to your insert query or stored procedure; the parameter type is SqlDbType.Structured (see the sketch below).
In the event that the job fails and you need to restart, the first step could be to determine the last inserted value from a predefined key in your data. You could also use a left outer join as part of your insert query to only insert records that do not already exist in the table. These are just a few of the more common techniques for restarting a failed ETL job.
This technique has some tactical advantages over bulk copy: it adds flexibility and is less coupled to the target table, so changes to the table can be less disruptive, depending on the nature of the change.
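A rough sketch of step 4, assuming a table type dbo.CsvRowType and a stored procedure dbo.InsertCsvChunk already exist on the server (both names are placeholders, as is chunkTable, the 30,000-row DataTable populated in step 3; MyConString is the connection string from the question):

// Sketch: send each 30,000-row DataTable chunk as a table-valued parameter.
using (var conn = new SqlConnection(MyConString))
using (var cmd = new SqlCommand("dbo.InsertCsvChunk", conn))
{
    cmd.CommandType = CommandType.StoredProcedure;
    SqlParameter tvp = cmd.Parameters.AddWithValue("@Rows", chunkTable);
    tvp.SqlDbType = SqlDbType.Structured;
    tvp.TypeName = "dbo.CsvRowType";
    conn.Open();
    cmd.ExecuteNonQuery();
}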

SqlBulkCopy ColumnMapping Error

My goal is to copy generic tables from one database to another. I would like it to copy the data as-is, and it would be fine either to delete whatever is in the table or to add to it, with new columns if there are new columns. The only thing I may want to change is to add something for versioning, which can be done in a separate part of the query.
Opening the data is no problem, but when I try a bulk copy it fails. I have gone through several posts and the closest thing is this one:
SqlBulkCopy Insert with Identity Column
I removed SqlBulkCopyOptions.KeepIdentity from my code but it is still throwing the
"The given ColumnMapping does not match up with any column in the source or destination" error
I have tried playing with the SqlBulkCopyOptions but so far no luck.
Ideas?
public void BatchBulkCopy(string connectionString, DataTable dataTable, string DestinationTbl, int batchSize)
{
    // Get the DataTable
    DataTable dtInsertRows = dataTable;
    using (SqlBulkCopy sbc = new SqlBulkCopy(connectionString))
    {
        sbc.DestinationTableName = DestinationTbl;

        // Number of records to be processed in one go
        sbc.BatchSize = batchSize;

        // Finally write to server
        sbc.WriteToServer(dtInsertRows);
    }
}
If I could suggest another approach, I would have a look at the SMO (SQL Server Management Objects) library to perform such tasks.
You can find an interesting article here.
Using SMO, you can perform tasks in SQL Server, such as bulk copy, treating tables, columns and databases as objects.
Some time ago, I used SMO in a small open source application I developed, named SQLServerDatabaseCopy.
To copy the data from table to table, I created this code (the complete code is here):
foreach (Table table in Tables)
{
    string columnsTable = GetListOfColumnsOfTable(table);
    string bulkCopyStatement = "SELECT {3} FROM [{0}].[{1}].[{2}]";
    bulkCopyStatement = String.Format(bulkCopyStatement, SourceDatabase.Name, table.Schema, table.Name, columnsTable);
    using (SqlCommand selectCommand = new SqlCommand(bulkCopyStatement, connection))
    {
        LogFileManager.WriteToLogFile(bulkCopyStatement);
        SqlDataReader dataReader = selectCommand.ExecuteReader();
        using (SqlConnection destinationDatabaseConnection = new SqlConnection(destDatabaseConnString))
        {
            if (destinationDatabaseConnection.State == System.Data.ConnectionState.Closed)
            {
                destinationDatabaseConnection.Open();
            }
            using (SqlBulkCopy bulkCopy = new SqlBulkCopy(destinationDatabaseConnection))
            {
                bulkCopy.DestinationTableName = String.Format("[{0}].[{1}]", table.Schema, table.Name);
                foreach (Column column in table.Columns)
                {
                    // it's not needed to perform a mapping for computed columns!
                    if (!column.Computed)
                    {
                        bulkCopy.ColumnMappings.Add(column.Name, column.Name);
                    }
                }
                try
                {
                    bulkCopy.WriteToServer(dataReader);
                    LogFileManager.WriteToLogFile(String.Format("Bulk copy successful for table [{0}].[{1}]", table.Schema, table.Name));
                }
                catch (Exception ex)
                {
                    Console.WriteLine(ex.Message);
                    Console.WriteLine(ex.StackTrace);
                }
                finally
                {
                    // closing reader
                    dataReader.Close();
                }
            }
        }
    }
}
As you can see, you have to add a ColumnMapping to the SqlBulkCopy object for each column, because you have to define which column of the source table maps to which column of the destination table. This is the reason for your error: The given ColumnMapping does not match up with any column in the source or destination.
I would add some validation to this to check what columns your source and destination tables have in common.
This essentially queries the system views (I have assumed SQL Server, but this is easily adaptable to other DBMSs) to get the column names in the destination table (excluding identity columns), iterates over these, and, if there is a match in the source table, adds the column mapping.
public void BatchBulkCopy(string connectionString, DataTable dataTable, string DestinationTbl, int batchSize)
{
    using (SqlBulkCopy sbc = new SqlBulkCopy(connectionString))
    {
        sbc.DestinationTableName = DestinationTbl;

        string sql = "SELECT name FROM sys.columns WHERE is_identity = 0 AND object_id = OBJECT_ID(@table)";
        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand(sql, connection))
        {
            command.Parameters.AddWithValue("@table", DestinationTbl);
            connection.Open();
            using (var reader = command.ExecuteReader())
            {
                while (reader.Read())
                {
                    var column = reader.GetString(0);
                    if (dataTable.Columns.Contains(column))
                    {
                        sbc.ColumnMappings.Add(column, column);
                    }
                }
            }
        }

        // Number of records to be processed in one go
        sbc.BatchSize = batchSize;

        // Finally write to server
        sbc.WriteToServer(dataTable);
    }
}
This could still get invalid cast errors as there is no data type check, but should get you started for a generic method.
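As a usage sketch of the method above (the connection string, DataTable variable, and table name are placeholders):

// Sketch: copy the rows of sourceDataTable into dbo.TargetTable in batches of 5000.
BatchBulkCopy(connectionString, sourceDataTable, "dbo.TargetTable", 5000);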
You can add
sbc.ColumnMappings.Add(0, 0);
sbc.ColumnMappings.Add(1, 1);
sbc.ColumnMappings.Add(2, 2);
sbc.ColumnMappings.Add(3, 3);
sbc.ColumnMappings.Add(4, 4);
before executing
sbc.WriteToServer(dataTable);
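If the column order differs between the DataTable and the destination table, name-based mappings can be used instead of ordinals (the column names here are just placeholders):

sbc.ColumnMappings.Add("SourceColumnName", "DestinationColumnName");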
Thank you !!

Importing Excel files with a large number of columns header into mysql with c#

I was just wondering, how do I import large Excel files into MySQL with C#? My coding experience isn't great, and I was hoping someone out there could give me a rough idea of how to start on it. So far, I was able to load Excel files into a DataGridView with the following code:
string PathConn = " Provider=Microsoft.JET.OLEDB.4.0;Data Source=" + pathTextBox.Text + ";Extended Properties =\"Excel 8.0;HDR=Yes;\";";
OleDbConnection conn = new OleDbConnection(PathConn);
conn.Open();
OleDbDataAdapter myDataAdapter = new OleDbDataAdapter("Select * from [" + loadTextBox.Text + "$]", conn);
table = new DataTable();
myDataAdapter.Fill(table);
But after that, I don't know how I could extract the information and save it into the MySQL database. Assuming I have an empty schema created beforehand, how do I go about uploading the Excel data into MySQL? Thanks.
I think you would then need to loop over the items in the DataTable and do something with them (maybe an INSERT statement to your DB), like so:
foreach (DataRow dr in table.Rows)
{
    string s = dr[0].ToString(); // this is the first column in the DataTable, as columns are zero-indexed
}
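To make the "do something" concrete, here is a minimal sketch of a parameterized insert per row. It assumes the MySql.Data package (using MySql.Data.MySqlClient;) and a hypothetical table my_table with two columns; adjust the connection string, names, and column indexes to your schema.

// Sketch: insert each DataTable row into MySQL with a parameterized command.
using (var mysqlConn = new MySqlConnection("server=localhost;database=mydb;uid=user;pwd=pass"))
{
    mysqlConn.Open();
    foreach (DataRow dr in table.Rows)
    {
        using (var cmd = new MySqlCommand("INSERT INTO my_table (col1, col2) VALUES (@c1, @c2)", mysqlConn))
        {
            cmd.Parameters.AddWithValue("@c1", dr[0]);
            cmd.Parameters.AddWithValue("@c2", dr[1]);
            cmd.ExecuteNonQuery();
        }
    }
}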
This is what I do in data migration scenarios from one SQL Server to another, or from data files to SQL:
1. Create the new table on the destination SQL Server (column names, primary key, etc.).
2. Load the existing data into a DataTable (that's what you did already).
3. Now query the new table with a DataAdapter into another DataTable (same as you did with the Excel file, except you now query the SQL table).
4. Load the old data from 'table' into 'newTable' using the DataTable method Load().
string PathConn = (MYSQL Connection String goes here)
OleDbConnection conn = new OleDbConnection(PathConn);
conn.Open();
OleDbDataAdapter myDataAdapter = new OleDbDataAdapter("Select * from [" + loadTextBox.Text + "$]", conn);
newTable = new DataTable();
myDataAdapter.Fill(newTable);
Now use the Load() Method on the new table:
newTable.Load(table.CreateDataReader(), <Specify LoadOption here>)
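For illustration, the LoadOption could be filled in like this (Upsert is just one of the three options; PreserveChanges and OverwriteChanges are the others):

newTable.Load(table.CreateDataReader(), LoadOption.Upsert);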
Matching columns will be imported into the new DataTable. (You can ensure the mapping by using aliases in the select statements.)
After loading the existing data into the new table, you will be able to use a DataAdapter to write the changes back to the database.
Example for writing data back: ConnString is the connection string for the DB,
selectStmt can be the same one you used on the empty table before, and newTable is provided as dtToWrite.
public static void writeDataTableToServer(string ConnString, string selectStmt, DataTable dtToWrite)
{
    using (OdbcConnection odbcConn = new OdbcConnection(ConnString))
    {
        odbcConn.Open();
        using (OdbcTransaction trans = odbcConn.BeginTransaction())
        {
            using (OdbcDataAdapter daTmp = new OdbcDataAdapter(selectStmt, ConnString))
            {
                using (OdbcCommandBuilder cb = new OdbcCommandBuilder(daTmp))
                {
                    try
                    {
                        cb.ConflictOption = ConflictOption.OverwriteChanges;
                        daTmp.UpdateBatchSize = 5000;
                        daTmp.SelectCommand.Transaction = trans;
                        daTmp.SelectCommand.CommandTimeout = 120;
                        daTmp.InsertCommand = cb.GetInsertCommand();
                        daTmp.InsertCommand.Transaction = trans;
                        daTmp.InsertCommand.CommandTimeout = 120;
                        daTmp.UpdateCommand = cb.GetUpdateCommand();
                        daTmp.UpdateCommand.Transaction = trans;
                        daTmp.UpdateCommand.CommandTimeout = 120;
                        daTmp.DeleteCommand = cb.GetDeleteCommand();
                        daTmp.DeleteCommand.Transaction = trans;
                        daTmp.DeleteCommand.CommandTimeout = 120;
                        daTmp.Update(dtToWrite);
                        trans.Commit();
                    }
                    catch (OdbcException)
                    {
                        trans.Rollback();
                        throw; // rethrow, preserving the stack trace
                    }
                }
            }
        }
        odbcConn.Close();
    }
}
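A possible call site; note that writeDataTableToServer expects an ODBC connection string, and the DSN and table name below are placeholders:

string odbcConnString = "DSN=MyMySqlDsn;"; // placeholder ODBC connection string
writeDataTableToServer(odbcConnString, "SELECT * FROM destination_table", newTable);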
Hope this helps.
A primary key on the newTable is necessary; otherwise you might get a CommandBuilder exception.
You're halfway there: you have obtained the information from the Excel spreadsheet and have it stored in a DataTable.
The first thing you need to do, before you look to import a significant amount of data into SQL, is validate what you have read in from the spreadsheets.
You have a few options. One is to do something very similar to how you read in your data: use a data adapter to perform an INSERT into the SQL database. All you really need to do in this case is create a new connection and write the INSERT command.
There are many examples of doing this on here.
Another option, which I would use, is LINQ to CSV (http://linqtocsv.codeplex.com/).
With this you can load all of your data into class objects, which makes it easier to validate each object before you perform your INSERT into SQL.
If you have limited experience, then use the data adapter approach to connect to your DB.
Good Luck
