I'm writing a program in C# to export SQL Server data from one database and import it into another. Since these two servers are not connected, I need to choose a method such as bcp.
What are the differences between these two? Is one more efficient than the other? And in what scenarios?
What are the known limitations/compatibility issues of each?
What other methods exist to export data from SQL Server into files and import from them?
Can I enable compression in these files at the same time as creating them via a command line switch instead of zipping them afterwards?
Please include any other aspects you think are important when making this decision.
Thanks in advance.
It doesn't cover BCP, but I did write a blog post comparing a couple of approaches for bulk loading data into SQL Server; it compared SqlBulkCopy against batched inserts via SqlDataAdapter.
SqlBulkCopy is worth checking out. The kind of process you'd use is: query database 1 and retrieve a SqlDataReader, then pass that SqlDataReader to SqlBulkCopy to persist the data to database 2.
I have a very large number of rows (10 million) which I need to select out of a SQL Server table. I will go through each record and parse out each one (they are xml), and then write each one back to a database via a stored procedure.
The question I have is, what's the most efficient way to do this?
The way I am doing it currently is I open two SqlConnections (one for reading, one for writing). The read connection uses a SqlDataReader that basically does a select * from the table, and I loop through the results. After I parse each record I do an ExecuteNonQuery (using parameters) on the second connection.
Is there any suggestions to make this more efficient, or is this just the way to do it?
Thanks
It seems that you are writing rows one-by-one. That is the slowest possible model. Write bigger batches.
There is no need for two connections when you use MARS. Unfortunately, MARS forces a 14-byte row versioning tag onto each written row. That might be totally acceptable, or not.
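A minimal sketch of what bigger batches could look like, assuming a hypothetical destination table dbo.ParsedRecords and a Parse() method of your own: it stages parsed rows in a DataTable and flushes them with SqlBulkCopy every 10,000 rows instead of writing one row at a time.
using System.Data;
using System.Data.SqlClient;

var batch = new DataTable();
batch.Columns.Add("Id", typeof(int));
batch.Columns.Add("ParsedValue", typeof(string));

using (var source = new SqlConnection(readConnectionString))
using (var cmd = new SqlCommand("SELECT Id, XmlPayload FROM dbo.SourceTable", source))
using (var bcp = new SqlBulkCopy(writeConnectionString))
{
    bcp.DestinationTableName = "dbo.ParsedRecords";
    source.Open();
    using (var reader = cmd.ExecuteReader())
    {
        while (reader.Read())
        {
            // Parse() is whatever your per-record XML parsing does
            batch.Rows.Add(reader.GetInt32(0), Parse(reader.GetString(1)));
            if (batch.Rows.Count == 10000)      // flush a full batch
            {
                bcp.WriteToServer(batch);
                batch.Clear();
            }
        }
    }
    if (batch.Rows.Count > 0)                   // flush the remainder
        bcp.WriteToServer(batch);
}
If the rows must go through your stored procedure, the same principle applies: send them in batches (for example via a table-valued parameter on SQL Server 2008 or later) rather than one ExecuteNonQuery per row.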
I had a very similar situation, and here is what I did:
I made two copies of the same database.
One is optimized for reading and the other is optimized for writing.
In config, I kept two connection strings, ConnectionRead and ConnectionWrite.
Now in the data layer, when I have a read statement (select...) I switch to the ConnectionRead connection string, and when writing I use the other one.
Since I have to keep both databases in sync, I am using SQL replication for this job.
I understand the implementation depends on many aspects, but the approach may help you; a small sketch of the connection switching follows below.
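For illustration only, a rough sketch of how the data layer might pick between the two connection strings, assuming entries named ConnectionRead and ConnectionWrite in the config file:
using System.Configuration;
using System.Data.SqlClient;

static SqlConnection OpenConnection(bool forWrite)
{
    // pick the write-optimized or read-optimized database
    string name = forWrite ? "ConnectionWrite" : "ConnectionRead";
    var conn = new SqlConnection(
        ConfigurationManager.ConnectionStrings[name].ConnectionString);
    conn.Open();
    return conn;
}

// usage: reads go to the read-optimized copy, writes to the write-optimized one
using (var readConn = OpenConnection(false))  { /* SELECT ... */ }
using (var writeConn = OpenConnection(true))  { /* INSERT / UPDATE ... */ }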
I agree with Tim Schmelter's post; I did something very similar. I actually used a SQLCLR procedure which read the data from an XML column in a SQL table into an in-memory table using .NET (System.Data), then used the .NET System.Xml namespace to deserialize the XML, populated another in-memory table (in the shape of the destination table), and used SqlBulkCopy to populate that destination SQL table with the parsed attributes I needed.
SQL Server is engineered for set-based operations. If ever I'm shredding/iterating (row by row) I tend to use SQLCLR, as .NET is generally better at iterative/data-manipulative processing. An exception to my rule is when working with a little metadata for data-driven processes or cleanup routines, where I may use a cursor.
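For reference, a rough client-side sketch of the same shape (shred the XML, stage it in a DataTable matching the destination, then SqlBulkCopy it in). It uses LINQ to XML for brevity rather than full deserialization, and the element, column, table, and connection names are made up for illustration:
using System.Data;
using System.Data.SqlClient;
using System.Xml.Linq;

var staging = new DataTable();
staging.Columns.Add("OrderId", typeof(int));
staging.Columns.Add("Amount", typeof(decimal));

// sourceXmlDocuments: however you read the XML column (one string per document)
foreach (string xml in sourceXmlDocuments)
{
    var doc = XDocument.Parse(xml);
    foreach (var item in doc.Descendants("Item"))
    {
        staging.Rows.Add((int)item.Element("OrderId"),
                         (decimal)item.Element("Amount"));
    }
}

using (var bcp = new SqlBulkCopy(destinationConnectionString))
{
    bcp.DestinationTableName = "dbo.OrderItems";
    bcp.WriteToServer(staging);
}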
I have a Windows Service application that receives a stream of data with the following format
IDX|20120512|075659|00000002|3|AALI |Astra Agro Lestari Tbk. |0|ORDI_PREOPEN|12 |00000001550.00|00000001291.67|00001574745000|00001574745000|00500|XDS1BXO1| |00001574745000|ݤ
IDX|20120512|075659|00000022|3|ALMI |Alumindo Light Metal Industry Tbk. |0|ORDI |33 |00000001300.00|00000001300.00|00000308000000|00000308000000|00500|--U3---2| |00000308000000|õÄ
This data comes in millions of rows and in sequence 00000002....00198562 and I have to parse and insert them according to the sequence into a database table.
My question is, what is the best (most efficient) way to insert this data into my database? I have tried a simple method: open a SqlConnection object, generate a string of SQL insert statements, and then execute the script using a SqlCommand object; however, this method is taking too long.
I read that I can use SQL BULK INSERT, but it has to read from a text file. Is it possible to use BULK INSERT for this scenario? (I have never used it before.)
Thank you
Update: I'm aware of SqlBulkCopy, but it requires me to have a DataTable first; is that good for performance? If possible I want to insert directly from my data source into SQL Server without having to use an in-memory DataTable.
If you are writing this in C# you might want to look at the SqlBulkCopy class.
Lets you efficiently bulk load a SQL Server table with data from another source.
First, download the free LumenWorks.Framework.IO.Csv library.
Second, use code like this (the destination table name here is a placeholder; the second CsvReader argument says there is no header row, and the third sets the field delimiter, '|' for your pipe-delimited data):
var sr = new StreamReader(yourStream);
var sbc = new SqlBulkCopy(connectionString) { DestinationTableName = "dbo.YourTable" };
sbc.WriteToServer(new LumenWorks.Framework.IO.Csv.CsvReader(sr, false, '|'));
Yeah, it is really that easy.
You can use SSIS (SQL Server Integration Services) to move data from a source data flow to a destination data flow.
The source can be a text file and the destination can be a SQL Server table, and the load can execute in bulk insert mode.
I'm importing data from a remote MySQL server. I'm connecting to the MySQL database through a SSH connection and then pulling the data into MS SQL Server. There are a couple of type checks that need to be performed, especially the MySQL DateTime to the MS SQL DateTime. Initially I thought about using the MySqlDataReader to read the data into a List<T> to ensure correct types and then pushing the data into a DataSet and then into MS SQL Server.
Is this a good approach or should I be looking into doing this a different way? I can certainly do a bulk insert into SQL Server but then I'll have to deal with the data types later.
Thoughts?
I personally wouldn't use a DataSet in the process; moving each row into a .NET type and then using a parameterized SQL statement will work just fine.
If you have a very large set, you might look at a bulk insert instead, but whether that's worth it depends on the size of the set.
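A rough sketch of that approach, assuming the MySql.Data connector and made-up table, column, and connection-string names; the main point is converting the MySQL DateTime safely before the parameterized insert:
using System;
using System.Data;
using System.Data.SqlClient;
using MySql.Data.MySqlClient;

using (var mySql = new MySqlConnection(mySqlConnectionString))
using (var sqlServer = new SqlConnection(sqlServerConnectionString))
{
    mySql.Open();
    sqlServer.Open();

    var read = new MySqlCommand("SELECT id, name, created_at FROM source_table", mySql);
    var write = new SqlCommand(
        "INSERT INTO dbo.TargetTable (Id, Name, CreatedAt) VALUES (@id, @name, @createdAt)",
        sqlServer);
    write.Parameters.Add("@id", SqlDbType.Int);
    write.Parameters.Add("@name", SqlDbType.NVarChar, 100);
    write.Parameters.Add("@createdAt", SqlDbType.DateTime);

    using (var reader = read.ExecuteReader())
    {
        while (reader.Read())
        {
            write.Parameters["@id"].Value = reader.GetInt32(0);
            write.Parameters["@name"].Value = reader.GetString(1);

            // SQL Server's datetime can't hold values before 1753-01-01,
            // so map anything earlier (or NULL) to DBNull
            object created = DBNull.Value;
            if (!reader.IsDBNull(2))
            {
                DateTime dt = reader.GetDateTime(2);
                if (dt >= new DateTime(1753, 1, 1))
                    created = dt;
            }
            write.Parameters["@createdAt"].Value = created;

            write.ExecuteNonQuery();
        }
    }
}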
Here's a Microsoft guideline for going from MySQL to SQL Server 2000:
http://technet.microsoft.com/en-us/library/cc966396.aspx
SQL Server has a rich set of tools and utilities to ease the migration from MySQL. SQL Server 2000 Data Transformation Services (DTS) is a set of graphical tools and programmable objects for extraction, transformation, and consolidation of data from disparate sources into single or multiple destinations.
From reading this article you can import your MySQL data without writing a line of C#.
You know, if it's a process you need to perform but you aren't limited to writing your own code, you might want to look at Talend. It's an open source tool for ETL (essentially data transforms between data sources).
It's open source and has a nice GUI for designing the transform - where things come from and where they go to, plus what happens in the middle.
http://www.talend.com/index.php
Just a thought, but if you're just trying to reach a goal as opposed to write the tool, it may be quicker and more flexible in the long run for you.
The easiest way is to convert it to a Timestamp.
Timestamp SetupStart_ts = rs.getTimestamp("SetupStart");
String SetupStart = SetupStart_ts.toString();
Push it to the MS SQL server straight away and it will be saved as a datetime automatically, but verify it. Thank you.
Using C# (vs2005) I need to copy a table from one database to another. Both database engines are SQL Server 2005. For the remote database, the source, I only have execute access to a stored procedure to get the data I need to bring locally.
The local database I have more control over as it's used by the [asp.net] application which needs a local copy of this remote table. We would like it local for easier lookup and joins with other tables, etc.
Could you please explain to me an efficient method of copying this data to our local database.
The local table can be created with the same schema as the remote one, if it makes things simpler. The remote table has 9 columns, none of which are identity columns. There are approximately 5400 rows in the remote table, and this number grows by about 200 a year. So not a quickly changing table.
Perhaps SqlBulkCopy; use SqlCommand.ExecuteReader to get the reader that you use in the call to SqlBulkCopy.WriteToServer. This is the same as a bulk insert, so very quick. It should look something like this (untested):
using (SqlConnection connSource = new SqlConnection(csSource))
using (SqlCommand cmd = connSource.CreateCommand())
using (SqlBulkCopy bcp = new SqlBulkCopy(csDest))
{
bcp.DestinationTableName = "SomeTable";
cmd.CommandText = "myproc";
cmd.CommandType = CommandType.StoredProcedure;
connSource.Open();
using(SqlDataReader reader = cmd.ExecuteReader())
{
bcp.WriteToServer(reader);
}
}
The Bulk Copy feature of ADO.NET might help you; take a look at:
MSDN - Multiple Bulk Copy Operations (ADO.NET)
An example article
I would first look at using SQL Server Integration Services (SSIS, née Data Transformation Services (DTS)).
It is designed for moving/comparing/processing/transforming data between databases, and IIRC it allows an arbitrary expression for the source. You would need it installed on your database server (that shouldn't be a problem; it is part of a default install).
Otherwise, for a code solution, given the small data size, pull all the data from the remote system into an internal structure, and then look for rows which don't exist locally and insert them. A rough sketch of that follows below.
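For illustration, a rough sketch of that code solution with made-up table, procedure, key-column, and connection-string names (and explicit types, given the VS2005 constraint): pull the remote rows into a DataTable, drop the ones already present locally, and bulk copy the rest.
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;

DataTable remoteRows = new DataTable();
using (SqlConnection remote = new SqlConnection(remoteConnectionString))
using (SqlDataAdapter da = new SqlDataAdapter("dbo.GetRemoteRows", remote))
{
    da.SelectCommand.CommandType = CommandType.StoredProcedure;
    da.Fill(remoteRows);                          // ~5400 rows, fine to hold in memory
}

List<int> localKeys = new List<int>();
using (SqlConnection local = new SqlConnection(localConnectionString))
using (SqlCommand cmd = new SqlCommand("SELECT Id FROM dbo.LocalCopy", local))
{
    local.Open();
    using (SqlDataReader r = cmd.ExecuteReader())
        while (r.Read())
            localKeys.Add(r.GetInt32(0));
}

// keep only the rows we don't already have
foreach (DataRow row in remoteRows.Select())
    if (localKeys.Contains((int)row["Id"]))
        remoteRows.Rows.Remove(row);

using (SqlBulkCopy bcp = new SqlBulkCopy(localConnectionString))
{
    bcp.DestinationTableName = "dbo.LocalCopy";
    bcp.WriteToServer(remoteRows);
}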
You may not be able to do this, but if you possibly can, DON'T do it with a program. If you have any way of talking to someone who controls the source server, see if they will set up some sort of export of the data. If the data is as small as you say, then XML or CSV output would be 100x better than writing something in C# (or any language).
So let's assume they can't export; still, avoid writing a program. You say you have more control over the destination. Can you set up an SSIS package, or set up a linked server? If so, you'll have a much easier time migrating the data.
If you set up, at a bare minimum, the source as a linked server, you could write a small T-SQL batch like
TRUNCATE TABLE DestTable
INSERT INTO DestTable
SELECT * FROM [SourceServer].[SourceDatabase].[Schema].[SourceTable]
It wouldn't be as nice as SSIS (which gives you a better visual of what's happening), but the T-SQL above is pretty clear.
Since I would not take the programming route, the best solution I could give you, if you absolutely had to, would be:
Use the SqlClient namespace.
Create 2 SqlConnections and 2 SqlCommands, and get an instance of 1 SqlDataReader from the source.
Iterate through the source reader, and execute the destination SqlCommand insert for each iteration with the values read from the source.
It'll be ugly, but it'll work.
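A bare-bones, untested sketch of that pattern with made-up table, procedure, and connection-string names; the remote side only exposes a stored procedure, so that is what the source command calls:
using System.Data;
using System.Data.SqlClient;

using (SqlConnection source = new SqlConnection(remoteConnectionString))
using (SqlConnection dest = new SqlConnection(localConnectionString))
{
    source.Open();
    dest.Open();

    SqlCommand read = new SqlCommand("dbo.GetRemoteTable", source);
    read.CommandType = CommandType.StoredProcedure;

    SqlCommand write = new SqlCommand(
        "INSERT INTO dbo.LocalCopy (Col1, Col2) VALUES (@c1, @c2)", dest);
    write.Parameters.Add("@c1", SqlDbType.Int);
    write.Parameters.Add("@c2", SqlDbType.NVarChar, 200);

    using (SqlDataReader reader = read.ExecuteReader())
    {
        while (reader.Read())
        {
            write.Parameters["@c1"].Value = reader.GetInt32(0);
            write.Parameters["@c2"].Value = reader.GetString(1);
            write.ExecuteNonQuery();     // one insert per source row: ugly, but it works
        }
    }
}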
It doesn't seem to be a huge quantity of data that you have to synchronize. Under the conditions you described (only a stored procedure to access the remote DB and no way to get anything else), you can go for Marc Gravell's solution.
In the case where the data can only grow and existing data cannot be changed, you can compare the record counts on the remote and internal DBs to optimize the operation; if there is no change in the remote DB, there is no need to copy.
I am about to start on a journey writing a Windows Forms application that will open a txt file that is pipe delimited and about 230 MB in size. This app will then insert this data into a SQL Server 2005 database (obviously this needs to happen swiftly). I am using C# 3.0 and .NET 3.5 for this project.
I am not asking for the app, just some communal advice here and advice on potential pitfalls. From the site I have gathered that SQL bulk copy is a prerequisite; is there anything I should think about? (I think that just opening the txt file with a forms app will be a large endeavor; maybe break it into blob data?)
Thank you, and I will edit the question for clarity if anyone needs it.
Do you have to write a WinForms app? It might be much easier and faster to use SSIS. There are some built-in tasks available, especially the Bulk Insert task.
Also, worth checking Flat File Bulk Import methods speed comparison in SQL Server 2005.
Update: If you are new to SSIS, check out some of these sites to get you on fast track. 1) SSIS Control Flow Basics 2) Getting Started with SQL Server Integration Services
This is another How to: on importing Excel file into SQL 2005.
This is going to be a streaming endeavor.
If you can, do not use transactions here. The transactional cost will simply be too great.
So what you're going to do is read the file a line at a time and insert it a line at a time. You should dump failed inserts into another file that you can diagnose later to see where they failed.
At first I would go ahead and try a bulk insert of a couple of hundred rows just to see that the streaming is working properly and then you can open up all you want.
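A rough sketch of that streaming shape; the file paths are placeholders and InsertRow stands in for whatever per-line parsing and insert (or SqlBulkCopy feed) you end up with:
using System;
using System.IO;

using (StreamReader input = new StreamReader(@"C:\data\import.txt"))
using (StreamWriter failed = new StreamWriter(@"C:\data\failed_rows.txt"))
{
    string line;
    while ((line = input.ReadLine()) != null)    // stream; never load the whole 230 MB
    {
        try
        {
            string[] fields = line.Split('|');
            InsertRow(fields);                   // hypothetical: your parameterized insert
        }
        catch (Exception ex)
        {
            failed.WriteLine(line + "\t" + ex.Message);   // diagnose these later
        }
    }
}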
You could try using SqlBulkCopy. It lets you pull from "any data source".
Just as a side note, it's sometimes faster to drop the indices of your table and recreate them after the bulk insert operation.
You might consider switching from full recovery to bulk-logged. This will help to keep your backups a reasonable size.
I totally recommend SSIS, you can read in millions of records and clean them up along the way in relatively little time.
You will need to set aside some time to get to grips with SSIS, but it should pay off. There are a few other threads here on SO which will probably be useful:
What's the fastest way to bulk insert a lot of data in SQL Server (C# client)
What are the recommended learning material for SSIS?
You can also create a package from C#. I have a C# program which reads a 3GL "master file" from a legacy system (parses into an object model using an API I have for a related project), takes a package template and modifies it to generate a package for the ETL.
The size of data you're talking about actually isn't that gigantic. I don't know what your efficiency concerns are, but if you can wait a few hours for it to insert, you might be surprised at how easy this would be to accomplish with a really naive technique of just INSERTing each row one at a time. Batching together a thousand or so rows at a time and submitting them to SQL Server may make it quite a bit faster as well.
Just a suggestion that could save you some serious programming time, if you don't need it to be as fast as conceivably possible. Depending on how often this import has to run, saving a few days of programming time could easily be worth it in exchange for waiting a few hours while it runs.
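For illustration, one way to batch the naive approach is to wrap every thousand single-row inserts in one transaction; the table, column, and file names below are placeholders:
using System.Data;
using System.Data.SqlClient;
using System.IO;

using (var conn = new SqlConnection(connectionString))
using (var input = new StreamReader(@"C:\data\import.txt"))
{
    conn.Open();
    SqlTransaction tx = conn.BeginTransaction();
    var cmd = new SqlCommand(
        "INSERT INTO dbo.Import (Field1, Field2) VALUES (@f1, @f2)", conn, tx);
    cmd.Parameters.Add("@f1", SqlDbType.NVarChar, 50);
    cmd.Parameters.Add("@f2", SqlDbType.NVarChar, 50);

    string line;
    int count = 0;
    while ((line = input.ReadLine()) != null)
    {
        string[] f = line.Split('|');
        cmd.Parameters["@f1"].Value = f[0];
        cmd.Parameters["@f2"].Value = f[1];
        cmd.ExecuteNonQuery();

        if (++count % 1000 == 0)                 // commit every thousand rows
        {
            tx.Commit();
            tx = conn.BeginTransaction();
            cmd.Transaction = tx;
        }
    }
    tx.Commit();                                 // commit the final partial batch
}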
You could use SSIS for the read and insert, but call it as a package from your WinForms app. Then you could pass in things like the source, destination, connection strings, etc. as parameters/configurations.
HowTo: http://msdn.microsoft.com/en-us/library/aa337077.aspx
You can set up transforms and error handling inside SSIS and even create logical branching based on input parameters.
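A small sketch of kicking a package off from the app; the package path and variable name are placeholders, and the MSDN link above covers the full pattern:
using System;
using Microsoft.SqlServer.Dts.Runtime;

var app = new Microsoft.SqlServer.Dts.Runtime.Application();
Package package = app.LoadPackage(@"C:\packages\ImportPipeFile.dtsx", null);

// pass the input file in via a package variable
package.Variables["User::SourceFile"].Value = @"C:\data\import.txt";

DTSExecResult result = package.Execute();
if (result != DTSExecResult.Success)
{
    foreach (DtsError error in package.Errors)
        Console.WriteLine(error.Description);
}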
If the column format of the file matches the target table where the data needs to end up, I prefer using the command line utility bcp to load the data file. It's blazingly fast and you can specify an error file for any "odd" records that fail to be inserted.
Your app could kick off the command; you would just need to store the command line parameters for it (server, database, username/password or trusted connection, table, error file, etc.).
I like this method better than running a BULK INSERT SQL command because the data file isn't required to be on a system accessible by the database server. To use bulk insert you have to specify the path to the data file to load, so it must be a path visible and readable by the system user on the database server that is running the load. Too much hassle for me usually. :-)
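A rough sketch of kicking bcp off from the app; the server, database, table, and file names are placeholders (-c character data, -t "|" pipe field terminator, -T trusted connection, -e error file for rejected rows, -b batch size):
using System;
using System.Diagnostics;

var psi = new ProcessStartInfo
{
    FileName = "bcp",
    Arguments = "MyDatabase.dbo.TargetTable in \"C:\\data\\import.txt\" " +
                "-S MYSERVER -T -c -t \"|\" -e \"C:\\data\\bcp_errors.txt\" -b 10000",
    UseShellExecute = false,
    RedirectStandardOutput = true
};

using (Process bcp = Process.Start(psi))
{
    string output = bcp.StandardOutput.ReadToEnd();   // rows copied, timing, errors
    bcp.WaitForExit();
    Console.WriteLine(output);
}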