I have a very large DataSet containing about 160,000 records. If I loop through the DataSet and import each record individually, it takes about 20 minutes for the complete DataSet to be imported into SQL Server.
Isn't there a faster way to import the whole DataSet into the database at once?
The DataSet is created from a file I process which the user provides, and I have one table called, let's say, "ImportTable", containing about 14 columns. The columns correspond to the columns in the DataSet.
I use Visual Studio 2010 Professional with C#.
Thanks in advance!
You should take a close look at the SqlBulkCopy class.
It's a C# component (no external app); it takes a DataTable or a data reader as input and copies the rows into SQL Server in bulk. It should be significantly faster than a row-by-agonizing-row (RBAR) insert operation...
Better yet: you don't even need to load your entire data set into memory. You can define a SqlDataReader over your source data and pass that to SqlBulkCopy, which will read from the SqlDataReader and bulk insert into SQL Server.
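For the question's setup, a minimal sketch (the connection string is a placeholder, and the column names in the DataSet are assumed to match those in ImportTable):

// requires: using System.Data; and using System.Data.SqlClient;
// dataSet.Tables[0] holds the ~160,000 parsed records from the user's file.
using (var conn = new SqlConnection("Data Source=.;Initial Catalog=MyDb;Integrated Security=true"))
{
    conn.Open();
    using (var bulkCopy = new SqlBulkCopy(conn))
    {
        bulkCopy.DestinationTableName = "dbo.ImportTable";
        bulkCopy.BatchSize = 5000; // send the rows in batches rather than all at once
        foreach (DataColumn col in dataSet.Tables[0].Columns)
        {
            // map source columns to destination columns by name
            bulkCopy.ColumnMappings.Add(col.ColumnName, col.ColumnName);
        }
        bulkCopy.WriteToServer(dataSet.Tables[0]);
    }
}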
You may want to take a look at the bcp command-line utility. It lets you load data directly from a file into a table in the database. Depending on what the user-generated file looks like, you may need to re-format it, but if it has a simple delimited format you can probably use it as-is with bcp.
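For example, loading a comma-delimited file over a trusted connection might look like this (the server, database, table, and file names are placeholders):

bcp MyDb.dbo.ImportTable in C:\temp\import.csv -c -t, -S myServer -T

Here -c uses character format, -t sets the field terminator, and -T uses Windows authentication.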
You can convert the DataSet to XML using the DataSet.GetXml() method and pass the XML as an input parameter to a stored procedure.
For example:
Create PROCEDURE dbo.MyInsertSP
(
@strXML varchar(max)
)
AS
Begin
Declare @intPointer int
Exec sp_xml_preparedocument @intPointer OUTPUT, @strXML
Insert into publishers
Select * from OpenXml(@intPointer,'/root/publisher',2)
With (pub_id char(4), pub_name varchar(40), city varchar(20),
state char(2), country varchar(20))
Exec sp_xml_removedocument @intPointer
RETURN
End
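On the C# side, a minimal sketch of the call (the connection string is a placeholder, and the '/root/publisher' path in the proc must match the element names your DataSet actually produces):

// requires: using System.Data; and using System.Data.SqlClient;
using (var conn = new SqlConnection("...."))
using (var cmd = new SqlCommand("dbo.MyInsertSP", conn))
{
    cmd.CommandType = CommandType.StoredProcedure;
    // GetXml() serializes every table row in the DataSet as XML elements
    cmd.Parameters.AddWithValue("@strXML", dataSet.GetXml());
    conn.Open();
    cmd.ExecuteNonQuery();
}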
Hope this makes sense.
Related
I want to perform a bulk insert from CSV into a MySQL database using C#; I'm using MySql.Data.MySqlClient for the connection. The CSV columns map into multiple tables and depend on primary key values. For example,
CSV (columns & values):
emp_name, address,country
-------------------------------
jhon,new york,usa
amanda,san diago,usa
Brad,london,uk
DB Schema(CountryTbl) & value
country_Id,Country_Name
1,usa
2,UK
3,Germany
DB Schema(EmployeeTbl)
Emp_Id(AutoIncrement),Emp_Name
DB Schema(AddressTbl)
Address_Id(AutoIncrement), Emp_Id,Address,countryid
Problem statement:
1> Read data from the CSV and get the CountryId from "CountryTbl" for the respective employee.
2> Insert data into EmployeeTbl and AddressTbl with the CountryId.
Approach 1
Follow the steps in the problem statement above, but that will be a performance hit (row-by-row read and insert).
Approach 2
Use "Bulk Insert" option "MySqlBulkLoader", but that needs csv files to read, and looks that this option is not going to work for me.
Approach 3
Use a stored procedure for the upload. But I don't want to use stored procs.
Please suggest if there is any other option by which I can do bulk upload or suggest any other approach.
Unless you have hundreds of thousands of rows to upload, bulk loading (your approach 2) probably isn't worth the extra programming and debugging time it will cost. That's my opinion, for what it's worth (2x what you paid for it :)
Approaches 1 and 3 are more or less the same. The difference lies in whether you issue the queries from C# or from your stored procedure. You still have to work out the queries, so let's deal with approach 1.
The solutions to these sorts of problems depend on make and model of RDBMS. If you decide you want to migrate to SQL Server, you'll have to change this stuff.
Here's what you do. For each row of your employee CSV ...
... put a row into the employee table:
INSERT INTO EmployeeTbl (Emp_Name) VALUES (@emp_name);
Notice this query uses the INSERT ... VALUES form of the INSERT statement. When this query (or any insert query) runs, it leaves the auto-incremented Emp_Id value where a subsequent call to LAST_INSERT_ID() can retrieve it.
... put a row into the address table:
INSERT INTO AddressTbl (Emp_Id,Address,countryid)
SELECT LAST_INSERT_ID() AS Emp_Id,
       @address AS Address,
       country_Id AS countryid
FROM CountryTbl
WHERE Country_Name = @country;
Notice this second INSERT uses the INSERT ... SELECT form of the insert query. The SELECT part of all this generates one row of data with the column values to insert.
It uses LAST_INSERT_ID() to get Emp_Id,
it uses a constant provided by your C# program for @address, and
it looks up the countryid value from your pre-existing CountryTbl.
Notice, of course, that you must use the C# Parameters.AddWithValue() method to set the values of the @ parameters in these queries. Those values come from your CSV file.
Finally, wrap each thousand rows or so of your CSV in a transaction by preceding their INSERT statements with a START TRANSACTION; statement and ending them with a COMMIT; statement. That will get you a performance improvement, and if something goes wrong the entire transaction will be rolled back so you can start over. A sketch of the whole loop follows.
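A minimal C# sketch of that loop (the connection string and the csvRows collection are placeholders; the tables and columns come from the question):

// requires: using MySql.Data.MySqlClient;
using (var conn = new MySqlConnection("Server=...;Database=...;Uid=...;Pwd=..."))
{
    conn.Open();
    MySqlTransaction tx = conn.BeginTransaction();
    int rowCount = 0;
    foreach (var row in csvRows) // assume each parsed row exposes EmpName, Address, Country
    {
        var insertEmp = new MySqlCommand(
            "INSERT INTO EmployeeTbl (Emp_Name) VALUES (@emp_name);", conn, tx);
        insertEmp.Parameters.AddWithValue("@emp_name", row.EmpName);
        insertEmp.ExecuteNonQuery();

        var insertAddr = new MySqlCommand(
            "INSERT INTO AddressTbl (Emp_Id, Address, countryid) " +
            "SELECT LAST_INSERT_ID(), @address, country_Id " +
            "FROM CountryTbl WHERE Country_Name = @country;", conn, tx);
        insertAddr.Parameters.AddWithValue("@address", row.Address);
        insertAddr.Parameters.AddWithValue("@country", row.Country);
        insertAddr.ExecuteNonQuery();

        if (++rowCount % 1000 == 0) // commit every thousand rows
        {
            tx.Commit();
            tx = conn.BeginTransaction();
        }
    }
    tx.Commit(); // commit the final partial batch
}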
I want to retrieve a table from a SQL Server database that is located on another server,
and I want to store the retrieved data in my own SQL Server database.
How can I do that?
Thanks so much
One of the easiest and best methods I found is in this article, which might resolve your issue easily:
SQL SERVER – 2008 – Copy Database With Data – Generate T-SQL For Inserting Data From One Table to Another Table
One of the fastest methods, if you have a lot of data, is a tool called bcp.
It allows you to export and import data via a file, so you can export from the source database and then import into the target. It is very fast.
If your destination is a SQL Server 2008 database and you're set on using C# to connect to the source and get the data, you could use a table-valued parameter. A DataTable in .NET maps directly to a user-defined table type in SQL Server.
Here is a SO thread about it:
How to pass User Defined Table Type as Stored Procedured parameter in C#
Define your custom table type in your destination database:
create type MyCustomTable as Table
(
Field1 int,
Field2 varchar(50),
Field3 decimal(18,0)
)
The concept would be to read all of the data from the source into a DataTable. Then you would use a SqlParameter to execute a stored procedure (or possibly a text query) on your destination server. By using a stored procedure that accepts a table parameter, you could do the following:
CREATE PROCEDURE dbo.BulkCopyData
(
@SourceData MyCustomTable readonly -- readonly has to be there; table-valued parameters must be readonly
) AS
BEGIN
INSERT INTO dbo.DestinationTable
(
Field1,
Field2,
Field3
--more fields
)
SELECT Field1, Field2, Field3 FROM @SourceData
END
And in C# when you go to execute the command:
DataTable dt = new DataTable(); // go get the data from your source here
using (SqlConnection conn = new SqlConnection("...."))
{
    conn.Open();
    SqlCommand cmd = new SqlCommand("dbo.BulkCopyData", conn);
    cmd.CommandType = CommandType.StoredProcedure; // required when calling a proc by name
    cmd.Parameters.Add(new SqlParameter("@SourceData", SqlDbType.Structured)
    {
        TypeName = "dbo.MyCustomTable",
        Value = dt
    });
    cmd.ExecuteNonQuery();
}
You can also use the OPENROWSET function in SQL and call/query the remote server from your SQL code. This feature is not enabled by default (you must use the sp_configure stored procedure to enable ad hoc remote queries before using this functionality). Here is a link with some examples.
http://msdn.microsoft.com/en-us/library/ms190312.aspx
When you need to know how to set up the configuration just let me know ;)
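For reference, a minimal sketch (the provider, server, database, and table names are placeholders):

-- enable ad hoc remote queries (one-time setup)
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'Ad Hoc Distributed Queries', 1;
RECONFIGURE;

-- pull the remote table into a local one
INSERT INTO dbo.MyLocalTable
SELECT *
FROM OPENROWSET('SQLNCLI',
                'Server=RemoteServer;Trusted_Connection=yes;',
                'SELECT * FROM RemoteDb.dbo.RemoteTable');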
Connect to your DB using SQL Server Management Studio.
Go to Server Objects -> Linked Servers and add a new linked server.
Then you can query the remote table as select * from LinkedServerName.DBName.dbo.TableName
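To store that data in your own database (the local table name is a placeholder), you can then run:

INSERT INTO dbo.MyLocalTable
SELECT * FROM LinkedServerName.DBName.dbo.TableName;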
I know SQL Server 2000 has a bulk insert. Does it support bulk inserting from a C# collection, such as a DataSet?
I need to insert 30 rows at a time, fairly regularly. I don't want to create 30 DB connections for this if I don't have to.
Have a look at SqlBulkCopy (which, according to forum posts, supports SQL Server 2000). It's easy to use: basically just provide it with a data table (or data reader) and it will copy the rows from that source to your destination table.
You can insert using a DataSet in SQL 2000; I've never tried it because I never use DataSets.
http://www.dotnet247.com/247reference/msgs/3/16570.aspx has a good post on it:
(From the article)
Steps involved (see the sketch after this list):
1. Create a SqlDataAdapter with the proper SELECT statement.
2. Create a DataSet and fill it with the SqlDataAdapter.
3. Add rows to the table in the DataSet (for all the actions you mentioned, like a radio button being selected or a check box being enabled).
4. Use the SqlCommandBuilder helper object to generate the update statements. The command builder is very easy to use: just one call to the SqlCommandBuilder constructor.
5. Once you are done adding rows to the DataTable in the DataSet, call SqlDataAdapter.Update and pass the modified DataSet as a parameter. This automatically adds the rows from the DataSet to the database (if no database error occurs).
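A minimal sketch of those steps (the table, column, and connection names are placeholders):

// requires: using System.Data; and using System.Data.SqlClient;
using (var conn = new SqlConnection("...."))
{
    var adapter = new SqlDataAdapter("SELECT Id, Name, Value FROM dbo.MyTable", conn);
    var builder = new SqlCommandBuilder(adapter); // generates the INSERT/UPDATE/DELETE commands

    var ds = new DataSet();
    adapter.Fill(ds, "MyTable");

    for (int i = 0; i < 30; i++) // the ~30 rows to insert per batch
    {
        DataRow row = ds.Tables["MyTable"].NewRow();
        row["Id"] = i;
        row["Name"] = "name" + i;
        row["Value"] = "value" + i;
        ds.Tables["MyTable"].Rows.Add(row);
    }

    adapter.Update(ds, "MyTable"); // one call sends all the pending inserts
}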
Have you considered XML?
Working with XML in SQL 2000 isn't as nice as in 2008, but it is still doable:
http://www.codeproject.com/KB/database/insxmldatasqlsvr.aspx
http://www.codeproject.com/KB/database/generic_OpenXml.aspx
http://support.microsoft.com/default.aspx?scid=kb;en-us;315968
Another option you could look at would be to:
Open Connection.
Iterate through the inserts
Close Connection.
I'm looking for any advice on what's the optimum way of inserting a large number of records into a database (SQL2000 onwards) based upon a collection of objects.
Currently my code looks something like the snippet below, and each record is inserted using a single simple SQL INSERT command (opening and closing the database connection each time the function is called! - I'm sure this must slow things down?).
The routine needs to be able to cope with routinely inserting up to 100,000 records, and I was wondering if there is a faster way (I'm sure there must be???). I've seen a few posts mentioning XML-based data and bulk copy routines - is this something I should consider, or can anyone provide any simple examples I could build upon?
foreach (DictionaryEntry de in objectList)
{
eRecord record = (eRecord)de.Value;
if (!record.Deleted)
{
createDBRecord(record.Id,
record.Index,
record.Name,
record.Value);
}
}
Thanks for any advice,
Paul.
Doing it that way will be relatively slow. You need to consider a bulk INSERT technique, either using BCP or BULK INSERT, or, if you are using .NET 2.0, the SqlBulkCopy class. Here's an example of using SqlBulkCopy: SqlBulkCopy - Copy Table Data Between SQL Servers at High Speeds. A sketch against your snippet follows.
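A minimal sketch of using SqlBulkCopy with the objectList collection from your snippet (the destination table name, column types, and connection string are assumptions):

// requires: using System.Collections; using System.Data; and using System.Data.SqlClient;
// Build a DataTable whose columns match the destination table.
var table = new DataTable();
table.Columns.Add("Id", typeof(int));
table.Columns.Add("Index", typeof(int));
table.Columns.Add("Name", typeof(string));
table.Columns.Add("Value", typeof(string));

foreach (DictionaryEntry de in objectList)
{
    eRecord record = (eRecord)de.Value;
    if (!record.Deleted)
    {
        table.Rows.Add(record.Id, record.Index, record.Name, record.Value);
    }
}

using (var conn = new SqlConnection("...."))
{
    conn.Open();
    using (var bulkCopy = new SqlBulkCopy(conn))
    {
        bulkCopy.DestinationTableName = "dbo.eRecords"; // placeholder table name
        bulkCopy.WriteToServer(table); // one bulk operation instead of 100,000 INSERTs
    }
}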
Here's a simple example for SQL Server 2000:
If you have a CSV file, csvtest.txt, in the following format:
1,John,Smith
2,Bob,Hope
3,Kate,Curie
4,Peter,Green
This SQL script will load the contents of the CSV file into a database table (if any row contains errors it will not be inserted, but the other rows will be):
USE myDB
GO
CREATE TABLE CSVTest
(
ID INT,
FirstName VARCHAR(60),
LastName VARCHAR(60)
)
GO
BULK INSERT
CSVTest
FROM 'c:\temp\csvtest.txt'
WITH
(
FIELDTERMINATOR = ',',
ROWTERMINATOR = '\n'
)
GO
SELECT *
FROM CSVTest
GO
You could write out the dictionary contents into a CSV file and then BULK INSERT that file.
See also: Using bcp and BULK INSERT
If you implement your own IDataReader, you can avoid writing to an intermediate file. See ADO.NET 2.0 Tutorial : SqlBulkCopy Revisited for Transferring Data at High Speeds
Related SO question: how to pass variables like arrays / datatable to SQL server?
I've created a stored procedure similar to the one below (I'm using this cut-down version to try and figure out the problem).
CREATE PROCEDURE bsp_testStoredProc
AS
BEGIN
CREATE TABLE #tmpFiles
(
AuthorName NVARCHAR(50),
PercentageHigh INT
)
-- (data would be inserted into the temp table here)
SELECT AuthorName, PercentageHigh FROM #tmpFiles
ORDER BY PercentageHigh DESC
DROP TABLE #tmpFiles
RETURN 0
END
From my C# code in VS2008, I'm trying to use the Query component with the Use Existing Stored Procedure option to connect this up to a DataTable / DataGridView to display the results.
However, because I'm selecting from a temporary table, in the Query component properties Visual Studio does not display any columns being returned from the stored procedure. I assume that it has trouble determining the data types being used since the SP is not based on a real schema.
Connecting to different stored procedures that select from real tables does show the columns correctly.
Does anyone know a way around this? Is there some sort of hint I can add somewhere to explicitly state what sort of data will be returned?
Thanks in advance.
For info, you might consider using a "table variable" rather than a temporary table (i.e. @FOO rather than #FOO) - this might help a little, and it certainly helps with a few tempdb issues. A sketch follows.
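For example, your cut-down proc rewritten with a table variable (same columns as in your question):

CREATE PROCEDURE bsp_testStoredProc
AS
BEGIN
    DECLARE @tmpFiles TABLE
    (
        AuthorName NVARCHAR(50),
        PercentageHigh INT
    )
    -- (data would be inserted into @tmpFiles here)
    SELECT AuthorName, PercentageHigh FROM @tmpFiles
    ORDER BY PercentageHigh DESC
END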
With temporary tables - no, there is no way of explicitly declaring the SP's schema. I would perhaps suggest using a simplified version of the SP while you generate your wrapper classes - i.e. have it do a trivial SELECT of the correct shape.
Alternatively, I would use LINQ to consume a UDF, which does have an explicit schema.