What's the best way to import a small csv file into SQL Server using an ASP.NET form with C#? I know there are many ways to do this, but I'm wondering which classes would be best to read the file and how to insert into the database. Do I read the file into a DataTable and then use the SqlBulkCopy class, or just insert the data using ADO.NET? Not sure which way is best. I'm after the simplest solution and am not concerned about scalability or performance as the csv files are tiny.
Using ASP.NET 4.0, C# 4.0 and SQL Server 2008 R2.
The DataTable and SqlBulkCopy classes will do just fine, and that is the way I would prefer to do it: if someday these tiny CSV files become larger, your program will be ready for it, whereas row-by-row inserts through ADO.NET add some overhead by treating a single row at a time.
EDIT #1
What's the best way to get from csv file to datatable?
The CSV file format is nothing more than a text file. As such, you might want to read it using the File.ReadAllLines(String) method, which returns a string[]. You can then add rows to your DataTable using the DataRow class or your preferred way.
Consider adding your columns when defining your DataTable so that it knows its structure when adding rows.
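For illustration, here is a minimal sketch of that approach, assuming a header row, no quoted fields (a plain comma Split), and placeholder column names, table name, and connection string:

```csharp
using System.Data;
using System.Data.SqlClient;
using System.IO;
using System.Linq;

string connectionString = "Data Source=.;Initial Catalog=MyDb;Integrated Security=True"; // placeholder

// Build a DataTable whose columns mirror the destination table.
var table = new DataTable();
table.Columns.Add("Name", typeof(string));
table.Columns.Add("Age", typeof(int));

// Skip the header row, split each remaining line on commas (no quoted fields assumed).
foreach (string line in File.ReadAllLines(@"C:\Temp\people.csv").Skip(1))
{
    string[] fields = line.Split(',');
    table.Rows.Add(fields[0], int.Parse(fields[1]));
}

// Push the whole DataTable to SQL Server in one shot.
using (var connection = new SqlConnection(connectionString))
using (var bulkCopy = new SqlBulkCopy(connection))
{
    connection.Open();
    bulkCopy.DestinationTableName = "dbo.People";
    bulkCopy.WriteToServer(table);
}
```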
Related
I have several Access databases with more than 40,000,000 rows. I'm reading each row using a DataReader and inserting every row one by one into a SQL Server database. But it looks like it will take weeks or even longer!
Is there any way to do this migration faster?
I would recommend exporting your Access database to a CSV file (or a number of CSV files); a guide is here: https://support.spatialkey.com/export-data-from-database-to-csv-file/
You can then use BULK INSERT or SSIS to import the rows into SQL Server. A reference for this operation: http://blog.sqlauthority.com/2008/02/06/sql-server-import-csv-file-into-sql-server-using-bulk-insert-load-comma-delimited-file-into-sql-server/
This way should be substantially faster.
A programmatic alternative would be to use the SqlBulkCopy class: https://msdn.microsoft.com/en-us/library/system.data.sqlclient.sqlbulkcopy(v=vs.110).aspx
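If you would rather stay in code than export to CSV first, here is a hedged sketch of the SqlBulkCopy route: it streams rows straight from an Access OleDbDataReader into SQL Server without materializing them in memory. The connection strings, table names, and query are placeholders:

```csharp
using System.Data.OleDb;
using System.Data.SqlClient;

string accessConnStr = @"Provider=Microsoft.ACE.OLEDB.12.0;Data Source=C:\Data\Source.accdb"; // placeholder
string sqlConnStr = "Data Source=.;Initial Catalog=Target;Integrated Security=True";           // placeholder

using (var source = new OleDbConnection(accessConnStr))
using (var command = new OleDbCommand("SELECT * FROM SourceTable", source))
{
    source.Open();
    using (OleDbDataReader reader = command.ExecuteReader())
    using (var bulkCopy = new SqlBulkCopy(sqlConnStr))
    {
        bulkCopy.DestinationTableName = "dbo.TargetTable";
        bulkCopy.BatchSize = 10000;       // commit in batches rather than one huge transaction
        bulkCopy.BulkCopyTimeout = 0;     // no timeout for a long-running copy
        bulkCopy.WriteToServer(reader);   // streams rows; never loads them all into memory
    }
}
```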
I have about 20 .csv files which are around 100-200 MB each.
They each have about 100 columns.
90% of the columns of each file are the same; however, some files have more columns and some files have fewer.
I need to import all of these files into one table in a SQL Server 2008 database.
If the field does not exist, I need it to be created.
Question: what should the process be for this import? How do I most efficiently and quickly import all of these files into one table in a database, and make sure that if a field does not exist, it is created? Please also keep in mind that the same field might be in a different location: for example, CAR can be in column AB in one CSV, whereas the same field name (CAR) can be in column AC in another CSV file. The solution can be SQL or C# or both.
You may choose from a number of options:
1. Use the DTS package
2. Try to produce one uniform CSV file, get the db table in sync with its columns and bulk insert it
3. Bulk insert every file to its own table, and after that merge the tables into the target table.
I would recommend looking at the BCP program which comes with SQL Server and is intended to help with jobs just like this:
http://msdn.microsoft.com/en-us/library/aa337544.aspx
There are "format files" which allow you to specify which CSV columns go to which SQL columns.
If you are more inclined to use C#, have a look at the SqlBulkCopy class:
http://msdn.microsoft.com/en-us/library/system.data.sqlclient.sqlbulkcopy.aspx
Also take a look at this SO thread about importing from CSV files into SQL Server:
SQL Bulk import from CSV
I recommend writing a small C# application that reads each of the CSV file headers and stores a dictionary of the columns needed, and either outputs a CREATE TABLE statement or runs the create directly against the database. Then you can use SQL Server Management Studio to load the 20 files individually using the import routine.
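A rough sketch of that header-scanning step, assuming the first line of each file is the header and that every column can start life as NVARCHAR (the folder path and table name are placeholders):

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;

// Collect the union of all column names across the CSV headers.
var columns = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
foreach (string path in Directory.GetFiles(@"C:\Imports", "*.csv"))
{
    string header = File.ReadLines(path).First();
    foreach (string name in header.Split(','))
        columns.Add(name.Trim());
}

// Emit a CREATE TABLE statement; everything is NVARCHAR(255) here and can be
// tightened up by hand afterwards.
var sql = new StringBuilder("CREATE TABLE dbo.Imported (");
sql.Append(string.Join(", ", columns.Select(c => "[" + c + "] NVARCHAR(255) NULL").ToArray()));
sql.Append(");");
Console.WriteLine(sql.ToString());
```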
Use the SqlBulkCopy class in System.Data.SqlClient.
It facilitates bulk data transfer. The only catch is that it won't work with a DateTime DB column.
Less of an answer and more of a direction, but here I go. The way I would do it is first enumerate the column names from both the CSV files and the DB, then make sure the ones from your CSV all exist in the destination.
Once you have validated and/or created all the columns, then you can do your bulk insert. Assuming you don't have multiple imports happening at the same time, you could cache the column names from the DB when you start the import, as they shouldn't be changing.
If you will have multiple imports running at the same time, then you will need to make sure you have a full table lock during the import, as race conditions could show up.
I do a lot of automated imports for SQL DBs, and I haven't ever seen what you're asking, as it's usually an assumed requirement that one knows the data that is coming into the DB. Not knowing columns ahead of time is typically a very bad thing, but it sounds like you have an exception to the rule.
Roll your own.
Keep (or create) a runtime representation of the target table's columns in the database. Before importing each file, check to see if the column exists already. If it doesn't, run the appropriate ALTER statement. Then import the file.
The actual import process can and probably should be done by BCP or whatever Bulk protocol you have available. You will have to do some fancy kajiggering since the source data and destination align only logically, not physically. So you will need BCP format files.
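A hedged sketch of the check-then-ALTER step, using INFORMATION_SCHEMA to test for the column; the NVARCHAR(255) type and the helper names are assumptions, and BCP would still do the actual import:

```csharp
using System.Data.SqlClient;

static class SchemaHelper
{
    // Adds a column to the target table if it is not already there.
    public static void EnsureColumn(SqlConnection connection, string table, string column)
    {
        const string existsSql =
            @"SELECT COUNT(*) FROM INFORMATION_SCHEMA.COLUMNS
              WHERE TABLE_NAME = @table AND COLUMN_NAME = @column";

        using (var check = new SqlCommand(existsSql, connection))
        {
            check.Parameters.AddWithValue("@table", table);
            check.Parameters.AddWithValue("@column", column);
            if ((int)check.ExecuteScalar() > 0) return;   // column already present
        }

        // Column names come straight from CSV headers, so bracket-quote them;
        // NVARCHAR(255) is only a placeholder type.
        string alterSql = "ALTER TABLE [" + table + "] ADD [" + column + "] NVARCHAR(255) NULL";
        using (var alter = new SqlCommand(alterSql, connection))
        {
            alter.ExecuteNonQuery();
        }
    }
}
```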
There are several possibilities that you have here.
You can use SSIS if it is available to you.
In SQL Server you can use SqlBulkCopy to bulk insert into a staging table where you load the whole .csv file, and then use a stored procedure, possibly with a MERGE statement in it, to place each row where it belongs or create a new one if it doesn't exist (sketched below).
You can use C# code to read the files and write them with SqlBulkCopy or an EntityDataReader.
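A hedged sketch combining the last two options: SqlBulkCopy loads the parsed rows into a staging table, and a MERGE then upserts them into the target. The table names, key column, and column list are assumptions about your schema:

```csharp
using System.Data;
using System.Data.SqlClient;

static class CsvLoader
{
    // Bulk load parsed CSV rows into a staging table, then MERGE them into the
    // target table: update matches, insert new rows, and clear the staging table.
    public static void LoadThroughStaging(DataTable csvRows, string connectionString)
    {
        using (var connection = new SqlConnection(connectionString))
        {
            connection.Open();

            using (var bulkCopy = new SqlBulkCopy(connection))
            {
                bulkCopy.DestinationTableName = "dbo.Staging";
                bulkCopy.WriteToServer(csvRows);
            }

            const string mergeSql = @"
                MERGE dbo.Target AS t
                USING dbo.Staging AS s ON t.Id = s.Id
                WHEN MATCHED THEN
                    UPDATE SET t.Name = s.Name, t.Price = s.Price
                WHEN NOT MATCHED THEN
                    INSERT (Id, Name, Price) VALUES (s.Id, s.Name, s.Price);
                TRUNCATE TABLE dbo.Staging;";

            using (var merge = new SqlCommand(mergeSql, connection))
            {
                merge.ExecuteNonQuery();
            }
        }
    }
}
```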
For those data volumes, you should use an ETL tool. See this tutorial.
ETL tools are designed for manipulating large amounts of data.
I have been given the task of developing a small library (using C# 3.0 and .NET 3.5) to provide data import functionality for an application.
The spec is:
Data can be imported from CSV file (potentially other file formats in the future).
The CSV files can contain any schema and number of rows, with a maximum file size of 10 MB.
It must be possible to change the datatype and column name of each column in the CSV file.
It must be possible to exclude columns in the CSV file from the import.
Importing the data will result in a table matching the schema being created in a SQL Server database, and then being populated using rows in the CSV.
I've been playing around with ideas for a while now, and my current code feels like it has been hacked together a bit.
My current implementation approach is:
1. Open the CSV and estimate the schema, store it in an ImportSchema class.
2. Allow the schema to be modified.
3. Use SMO to create the table in SQL according to the schema.
4. Create a System.Data.DataTable instance using the schema for datatypes.
5. Use CsvReader to read the CSV data into the DataTable.
6. Apply column name changes and remove unwanted columns from the DataTable.
7. Use System.Data.SqlClient.SqlBulkCopy() to add the rows from the DataTable into the created database table.
This sounds overly complex to me and I am facing a mental block trying to wrap it up neatly in a handful of testable/extensible objects.
Any suggestions/thoughts on ways to approach this problem, both from an implementation and a design perspective?
Many thanks for any suggestions.
As suggested in some previous SO answers, take a look at the FileHelpers library. It might at least be helpful in your task of importing and analyzing the CSV files.
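For the parts of the CSV whose layout is known up front, FileHelpers maps each line onto a record class; a minimal sketch follows (the record fields, attributes, and file path are assumptions, and the dynamic-schema requirement would still need your DataTable route):

```csharp
using FileHelpers;

// One public field per CSV column, in file order. The field names and types
// here are assumptions; adjust them to the real layout.
[DelimitedRecord(",")]
[IgnoreFirst(1)]                 // skip the header row
public class ProductRecord
{
    public int Id;
    public string Name;
    public decimal Price;
}

public static class CsvImport
{
    // Reading the file gives strongly typed records to validate or reshape
    // before handing them to SqlBulkCopy or your own insert logic.
    public static ProductRecord[] Load(string path)
    {
        var engine = new FileHelperEngine<ProductRecord>();
        return engine.ReadFile(path);
    }
}
```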
I have a csv file with 350,000 rows, each row has about 150 columns.
What would be the best way to insert these rows into SQL Server using ADO.Net?
The way I've usually done it is to create the SQL statement manually. I was wondering if there is any way I can code it to simply insert the entire DataTable into SQL Server, or some shortcut like this.
By the way, I already tried doing this with SSIS, but there are a few data clean-up issues which I can handle with C# but not so easily with SSIS. The data started as XML, but I changed it to CSV for simplicity.
Make a class "CsvDataReader" that implements IDataReader. Just implement Read(), GetValue(int i), Dispose() and the constructor : you can leave the rest throwing NotImplementedException if you want, because SqlBulkCopy won't call them. Use read to handle the read of each line and GetValue to read the i'th value in the line.
Then pass it to the SqlBulkCopy with the appropriate column mappings you want.
I get about 30000 records/per sec insert speed with that method.
If you have control of the source file format, make it tab delimited as it's easier to parse than CSV.
Edit: http://www.codeproject.com/KB/database/CsvReader.aspx - thanks, Marc Gravell.
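A hedged skeleton of that CsvDataReader idea; it assumes a header row, a naive comma Split (no quoted fields), and that SqlBulkCopy only touches FieldCount, Read(), GetValue() and IsDBNull() when mapping by ordinal. The linked CsvReader handles quoting properly:

```csharp
using System;
using System.Data;
using System.IO;

// Minimal IDataReader over a comma-delimited file, as described above.
public sealed class CsvDataReader : IDataReader
{
    private readonly StreamReader _reader;
    private string[] _current;

    public CsvDataReader(string path)
    {
        _reader = new StreamReader(path);
        FieldCount = _reader.ReadLine().Split(',').Length;   // header row fixes the column count
    }

    public int FieldCount { get; private set; }

    public bool Read()
    {
        string line = _reader.ReadLine();
        if (line == null) return false;
        _current = line.Split(',');   // naive split: no quoted fields
        return true;
    }

    public object GetValue(int i) { return _current[i]; }
    public bool IsDBNull(int i) { return string.IsNullOrEmpty(_current[i]); }
    public void Dispose() { _reader.Dispose(); }

    // Everything below is either trivial or not needed for this scenario.
    public void Close() { Dispose(); }
    public bool NextResult() { return false; }
    public int Depth { get { return 0; } }
    public bool IsClosed { get { return false; } }
    public int RecordsAffected { get { return -1; } }
    public DataTable GetSchemaTable() { throw new NotImplementedException(); }
    public string GetName(int i) { throw new NotImplementedException(); }
    public int GetOrdinal(string name) { throw new NotImplementedException(); }
    public object this[int i] { get { return GetValue(i); } }
    public object this[string name] { get { throw new NotImplementedException(); } }
    public string GetString(int i) { return _current[i]; }
    public Type GetFieldType(int i) { return typeof(string); }
    public string GetDataTypeName(int i) { throw new NotImplementedException(); }
    public bool GetBoolean(int i) { throw new NotImplementedException(); }
    public byte GetByte(int i) { throw new NotImplementedException(); }
    public long GetBytes(int i, long fieldOffset, byte[] buffer, int bufferOffset, int length) { throw new NotImplementedException(); }
    public char GetChar(int i) { throw new NotImplementedException(); }
    public long GetChars(int i, long fieldOffset, char[] buffer, int bufferOffset, int length) { throw new NotImplementedException(); }
    public IDataReader GetData(int i) { throw new NotImplementedException(); }
    public DateTime GetDateTime(int i) { throw new NotImplementedException(); }
    public decimal GetDecimal(int i) { throw new NotImplementedException(); }
    public double GetDouble(int i) { throw new NotImplementedException(); }
    public float GetFloat(int i) { throw new NotImplementedException(); }
    public Guid GetGuid(int i) { throw new NotImplementedException(); }
    public short GetInt16(int i) { throw new NotImplementedException(); }
    public int GetInt32(int i) { throw new NotImplementedException(); }
    public long GetInt64(int i) { throw new NotImplementedException(); }
    public int GetValues(object[] values) { throw new NotImplementedException(); }
}
```

Usage is then roughly: create a SqlBulkCopy for the destination table and call bulkCopy.WriteToServer(new CsvDataReader(path)).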
SqlBulkCopy if it's available. Here is a very helpful explanation of using SqlBulkCopy in ADO.NET 2.0 with C#.
I think you can load your XML directly into a DataSet and then map your SqlBulkCopy to the database and the DataSet.
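A rough sketch of that idea, assuming the XML deserializes into a single table and using placeholder file, table, and connection names:

```csharp
using System.Data;
using System.Data.SqlClient;

var dataSet = new DataSet();
dataSet.ReadXml(@"C:\Imports\orders.xml");   // infers a schema from the XML if none is supplied

using (var bulkCopy = new SqlBulkCopy("Data Source=.;Initial Catalog=MyDb;Integrated Security=True"))
{
    bulkCopy.DestinationTableName = "dbo.Orders";
    // Map DataSet columns to table columns explicitly if the names differ.
    bulkCopy.WriteToServer(dataSet.Tables[0]);
}
```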
You should revert to XML instead of CSV, then load that XML file into a temp table using OPENXML, clean up your data in the temp table, and then finally process this data.
I have been following this approach for huge data imports, where my XML files happen to be > 500 MB in size, and OPENXML works like a charm.
You would be surprised at how much faster this works compared to manual ADO.NET statements.
One of my current requirements is to take in an Excel spreadsheet that the user updates about once a week and be able to query that document for certain fields.
As of right now, I run through and push all the Excel (2007) data into an XML file (just once, when they upload the file; after that I just use the XML), which then holds all of the needed data (not all of the columns in the spreadsheet) for querying via LINQ to XML; note that the XML file is smaller than the Excel file.
Now my question is: is there any performance difference between querying an XML file with LINQ and an Excel file with an OleDbConnection? Am I just adding another unnecessary step?
I suppose the follow-up question would be: is it worth it, for ease of use, to keep pushing it to XML?
The file has about 1000 rows.
For something that is done only once per week I don't see the need to perform any optimizations. Instead you should focus on what is maintainable and understandable both for you and whoever will maintain the solution in the future.
Use whatever solution you find most natural :-)
As I understand it, the performance side of things stands like this for accessing Excel data.
Fastest to Slowest
1. Custom 3rd party vendor software using C++ directly on the Excel file type.
2. The OleDbConnection method, using a schema file if necessary for data types; treats Excel as a flat-file DB.
3. The LINQ to XML method, a superior method for reading/writing data, but with Excel 2007 file formats only.
4. Straight XML data manipulation using the OOXML SDK and optionally third-party XML libraries. Again, limited to Excel 2007 file formats only.
5. Using an Object[,] array to read a region of cells (via the .Value2 property), and passing an Object[,] array back to a region of cells (again via .Value2) to write data.
6. Updating and reading from cells individually using the .Cells(x,y) and .Offset(x,y) property accessors.
You can't use a SqlConnection to access an Excel spreadsheet. More than likely, you are using an OleDbConnection or an OdbcConnection.
That being said, I would guess that using the OleDbConnection to access the Excel sheet would be faster, as you are processing the data natively, but the only way to know for the data you are using is to test it yourself, using the Stopwatch class in the System.Diagnostics namespace, or using a profiling tool.
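For reference, querying the sheet through an OleDbConnection looks roughly like this; the ACE provider version, sheet name, and column names are assumptions about your workbook, and the Stopwatch wrapper shows the kind of timing comparison meant above:

```csharp
using System;
using System.Data;
using System.Data.OleDb;
using System.Diagnostics;

var timer = Stopwatch.StartNew();

string excelConnStr =
    @"Provider=Microsoft.ACE.OLEDB.12.0;Data Source=C:\Data\weekly.xlsx;" +
    "Extended Properties='Excel 12.0 Xml;HDR=YES'";   // HDR=YES treats row 1 as column names

var results = new DataTable();
using (var connection = new OleDbConnection(excelConnStr))
using (var adapter = new OleDbDataAdapter("SELECT [Field1], [Field2] FROM [Sheet1$]", connection))
{
    adapter.Fill(results);   // the sheet is queried like a flat-file table
}

timer.Stop();
Console.WriteLine("Excel query took {0} ms for {1} rows", timer.ElapsedMilliseconds, results.Rows.Count);
```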
If you have a great deal of data to process, you might also want to consider putting it in SQL Server and then querying that (depending on the ratio of queries to the time it takes to save the data, of course).
I think it's important to discuss what type of querying you are doing with the file. I have to believe it will be a great deal easier to query using LINQ than the OleDbConnection, although I am talking more from experience than anything else.