When copying the results of a view from a database on one server to another database on a different server (both running SQL Server 2008), which of the following methods is likely to be the most efficient?
1. SSIS Dataflow task with OLE DB Source/Destination
2. Custom scripts
e.g.

using (SqlConnection connection = new SqlConnection(sourceConnectionString))
using (SqlCommand command = new SqlCommand(sourceQuery, connection))
{
    connection.Open();
    using (SqlDataReader reader = command.ExecuteReader())
    using (SqlBulkCopy bulkCopy = new SqlBulkCopy(destinationConnectionString))
    {
        bulkCopy.DestinationTableName = destinationTable;
        // Any column mapping required goes here
        bulkCopy.WriteToServer(reader);
    }
}
3. SSIS Bulk Insert
I don't see how this would be any different from using custom scripts, except for the added inflexibility that it only works with tables/views and not stored procedures.
I realise this is not quite a like-for-like comparison, because there are additional levels of logging, error handling, etc. available with the various options, but let's assume I require no logging at all, to make the playing field as even as possible.
According to this SO post, BULK INSERT performs better than SqlBulkCopy. However, according to this post, the Data Flow task is preferred over the Bulk Insert SSIS task.
This isn't really answering your question directly, but...
The short answer to your question is probably to try different approaches using a representative data set and see for yourself. That will give you an answer that is more meaningful for your environment than anyone on SO can provide, as well as being a good way to understand the effort and issues involved in each option.
The longer answer is that the SSIS Bulk Insert task is most likely the slowest, since it can only load flat file data. In order to use it, you would have to export the data to a file on the source server before reloading it on the target server. That could be useful if you have a very slow WAN connection between the servers, because you can compress the file before copying it across to minimize the data volume, otherwise it just adds more work to do.
As for the difference between SSIS and SqlBulkCopy I have no personal experience with SqlBulkCopy so I can only suggest that if there is no clear 'winner' you go with the simplest, easiest to maintain implementation first and don't worry about finding the fastest possible solution until you actually need to. Code only needs to run fast enough, not as fast as possible, and it isn't clear from your question that you actually have a performance problem.
When using the Query Design feature in Visual Studio, any queries that I run on a SQL Server or Microsoft Access database while testing are persistent, meaning they actually change the data in the table(s). Is there a way to make the queries non-persistent while testing them, until a program is run? I'm using C# as the programming language and .NET as the framework, if it matters. I also need to know the process for doing this with either an MS Access or SQL Server database.
You can do transactions in C# similar to how you use them in SQL. Here is an example:
connection.Open();
SqlCommand command = connection.CreateCommand();

// Start a local transaction and attach it to the command.
SqlTransaction transaction = connection.BeginTransaction("SampleTransaction");
command.Transaction = transaction;

// Execute your query here, e.g.:
// command.CommandText = "UPDATE ...";
// command.ExecuteNonQuery();

// Check if we are running in the test environment.
bool testEnvironment = SomeConfigFile.property("testEnvironment");
if (!testEnvironment) {
    transaction.Commit();
} else {
    transaction.Rollback();
}
Here is the documentation on transactions in C#: https://msdn.microsoft.com/en-us/library/86773566%28v=vs.110%29.aspx
It should be possible for VS to create a local copy of the SQL data you're working on while you're testing. This is held in the bin folder. Have a look at this:
https://msdn.microsoft.com/en-us/library/ms246989.aspx
Once you're finished testing you could simply change it to be pointing to the database you want to alter with your application.
I'm not aware of a way to get exactly what you're asking for, but I think there is an approach to get close to the behaviour you want:
When using Microsoft SQL Server, creating a table with a leading hash in the name (#tableName) will cause the table to be disposed of when your session ends.
One way you could take advantage of this to get your desired behaviour is to copy your working table into a temporary table, and work on the temporary table instead of the live table.
To do so, use something like the following:
SELECT * INTO #tempTable FROM liveTable
This will create a complete copy of your liveTable, with all of the same columns and rows. Once you are finished, the table will be automatically dropped and no permanent changes will have been made.
This can also be useful for a series of queries which you execute on the same subset of a large data set. Selecting the subset into a smaller temporary table can make subsequent queries much faster than selecting from the full data set repeatedly.
Just keep in mind that as soon as your connection closes, all the data goes with it.
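As a sketch of how this looks from C# (the connection string and table names here are hypothetical), the snapshot and any experimental queries just need to share one connection, since the temp table is scoped to that session:

```csharp
using System;
using System.Data.SqlClient;

static class TempTableDemo
{
    // Build the SELECT ... INTO statement that snapshots a live table
    // into a session-scoped temp table.
    public static string BuildSnapshotSql(string liveTable, string tempTable)
    {
        return string.Format("SELECT * INTO {0} FROM {1}", tempTable, liveTable);
    }

    public static void Run(string connectionString)
    {
        using (SqlConnection conn = new SqlConnection(connectionString))
        {
            conn.Open();
            using (SqlCommand cmd = new SqlCommand(
                BuildSnapshotSql("liveTable", "#tempTable"), conn))
            {
                cmd.ExecuteNonQuery();
            }
            // Run any experimental queries against #tempTable here,
            // on this same open connection.
        } // Connection closes: #tempTable is dropped automatically.
    }
}
```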
I have a very large number of rows (10 million) which I need to select out of a SQL Server table. I will go through each record and parse out each one (they are xml), and then write each one back to a database via a stored procedure.
The question I have is, what's the most efficient way to do this?
The way I am doing it currently is to open two SqlConnections (one for reading, one for writing). The read connection uses a SqlDataReader that basically does a SELECT * FROM table, and I loop through the results. After I parse each record I do an ExecuteNonQuery (using parameters) on the second connection.
Are there any suggestions to make this more efficient, or is this just the way to do it?
Thanks
It seems that you are writing rows one-by-one. That is the slowest possible model. Write bigger batches.
There is no need for two connections when you use MARS. Unfortunately, MARS forces a 14 byte row versioning tag in each written row. Might be totally acceptable, or not.
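A minimal sketch of the batching idea (the batch size and the flush mechanism are illustrative, not prescriptive): accumulate parsed rows and flush a group at a time instead of issuing one ExecuteNonQuery per row. The flush itself could be a SqlBulkCopy into a staging table or a single multi-row INSERT.

```csharp
using System;
using System.Collections.Generic;

static class Batching
{
    // Split a sequence into batches of at most 'size' items.
    public static IEnumerable<List<T>> Batch<T>(IEnumerable<T> source, int size)
    {
        List<T> batch = new List<T>(size);
        foreach (T item in source)
        {
            batch.Add(item);
            if (batch.Count == size)
            {
                yield return batch;
                batch = new List<T>(size);
            }
        }
        if (batch.Count > 0)
            yield return batch;
    }
}

// Usage sketch (ReadAndParseRows and WriteBatch are hypothetical; WriteBatch
// does one round trip per batch, e.g. via SqlBulkCopy into a staging table):
//
// foreach (List<ParsedRow> rows in Batching.Batch(ReadAndParseRows(), 1000))
//     WriteBatch(rows);
```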
I had a very similar situation and here is what I did:
I made two copies of the same database.
One is optimized for reading and the other is optimized for writing.
In the config, I kept two connection strings: ConnectionRead and ConnectionWrite.
Now in the data layer, when I have a read statement (SELECT ...) I switch to the ConnectionRead connection string, and when writing I use the other one.
Since I have to keep both databases in sync, I use SQL replication for this job.
I understand the implementation depends on many aspects, but the approach may help you.
I agree with Tim Schmelter's post - I did something very similar. I actually used a SQLCLR procedure which read the data from an XML column in a SQL table into an in-memory table using .NET (System.Data), then used the .NET System.Xml namespace to deserialize the XML, populated another in-memory table (in the shape of the destination table), and used SqlBulkCopy to populate that destination SQL table with the parsed attributes I needed.
SQL Server is engineered for set-based operations. If ever I'm shredding/iterating (row by row) I tend to use SQLCLR, as .NET is generally better at iterative/data-manipulative processing. An exception to my rule is when working with a little metadata for data-driven processes or cleanup routines, where I may use a cursor.
I'm importing data from a remote MySQL server. I'm connecting to the MySQL database through an SSH connection and then pulling the data into MS SQL Server. There are a couple of type checks that need to be performed, especially converting the MySQL DateTime to the MS SQL DateTime. Initially I thought about using the MySqlDataReader to read the data into a List&lt;T&gt; to ensure correct types, and then pushing the data into a DataSet and then into MS SQL Server.
Is this a good approach or should I be looking into doing this a different way? I can certainly do a bulk insert into SQL Server but then I'll have to deal with the data types later.
Thoughts?
I personally wouldn't use a dataset in the process, but moving it into a .NET type, then using a parameterized SQL statement will work just fine.
If you have a very large set, you might think of looking at a bulk insert, but that will depend on the size of the set.
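One concrete wrinkle in the DateTime conversion, sketched below (the helper name is made up): MySQL permits "zero" dates like 0000-00-00 00:00:00, which have no .NET DateTime equivalent, so it's worth mapping those to null before the parameterized insert.

```csharp
using System;
using System.Globalization;

static class MySqlDates
{
    // Map a MySQL DATETIME string to a nullable .NET DateTime,
    // treating MySQL "zero" dates as null.
    public static DateTime? ToDotNetDateTime(string mysqlValue)
    {
        if (string.IsNullOrEmpty(mysqlValue) || mysqlValue.StartsWith("0000-00-00"))
            return null;
        return DateTime.ParseExact(mysqlValue, "yyyy-MM-dd HH:mm:ss",
            CultureInfo.InvariantCulture);
    }
}

// In the parameterized insert, pass DBNull.Value for nulls:
//
// DateTime? dt = MySqlDates.ToDotNetDateTime(raw);
// cmd.Parameters.AddWithValue("@CreatedAt", (object)dt ?? DBNull.Value);
```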
Here's a Microsoft guideline for going from MySQL to SQL Server 2000:
http://technet.microsoft.com/en-us/library/cc966396.aspx
SQL Server has a rich set of tools and utilities to ease the migration from MySQL. SQL Server 2000 Data Transformation Services (DTS) is a set of graphical tools and programmable objects for extraction, transformation, and consolidation of data from disparate sources into single or multiple destinations.
From reading this article, you can import your MySQL data without writing a line of C#.
You know, if it's a process you need to perform but you aren't limited to writing your own code, you might want to look at Talend. It's an open source tool for ETL (essentially data transforms between data sources).
It has a nice GUI for designing the transform - where things come from, where they go, plus what happens in the middle.
http://www.talend.com/index.php
Just a thought, but if you're just trying to reach a goal as opposed to write the tool, it may be quicker and more flexible in the long run for you.
The easiest way to do it is to convert it to a Timestamp.

Timestamp SetupStart_ts = rs.getTimestamp("SetupStart");
String SetupStart = SetupStart_ts.toString();

Push it to the MS SQL server straight away and it will be saved automatically as a datetime, but verify it. Thank you.
Using C# (vs2005) I need to copy a table from one database to another. Both database engines are SQL Server 2005. For the remote database, the source, I only have execute access to a stored procedure to get the data I need to bring locally.
The local database I have more control over as it's used by the [asp.net] application which needs a local copy of this remote table. We would like it local for easier lookup and joins with other tables, etc.
Could you please explain to me an efficient method of copying this data to our local database.
The local table can be created with the same schema as the remote one, if it makes things simpler. The remote table has 9 columns, none of which are identity columns. There are approximately 5400 rows in the remote table, and this number grows by about 200 a year. So not a quickly changing table.
Perhaps SqlBulkCopy; use SqlCommand.ExecuteReader to get the reader that you use in the call to SqlBulkCopy.WriteToServer. This is the same as a bulk insert, so very quick. It should look something like this (untested):
using (SqlConnection connSource = new SqlConnection(csSource))
using (SqlCommand cmd = connSource.CreateCommand())
using (SqlBulkCopy bcp = new SqlBulkCopy(csDest))
{
    bcp.DestinationTableName = "SomeTable";
    cmd.CommandText = "myproc";
    cmd.CommandType = CommandType.StoredProcedure;
    connSource.Open();
    using (SqlDataReader reader = cmd.ExecuteReader())
    {
        bcp.WriteToServer(reader);
    }
}
The Bulk Copy feature of ADO.NET might help you; take a look at:
MSDN - Multiple Bulk Copy Operations (ADO.NET)
An example article
I would first look at using SQL Server Integration Services (SSIS, née Data Transformation Services (DTS)).
It is designed for moving/comparing/processing/transforming data between databases, and IIRC allows an arbitrary expression for the source. You would need it installed on your database server (shouldn't be a problem, it is part of a default install).
Otherwise, for a code solution, given the data size (small): pull all the data from the remote system into an internal structure, and then look for rows which don't exist locally to insert.
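For the "look for rows which don't exist locally" step, here is a sketch (the row type and single integer key column are assumptions, written in vs2005-era C# 2.0 style): load the local keys into a lookup, then keep only the remote rows whose key is absent.

```csharp
using System.Collections.Generic;

static class RowDiff
{
    // Return the remote rows whose key is not present locally.
    // Rows are modelled as (key, payload) pairs for illustration.
    public static List<KeyValuePair<int, string>> MissingLocally(
        IEnumerable<KeyValuePair<int, string>> remoteRows,
        ICollection<int> localKeys)
    {
        // Use a Dictionary as a set (.NET 2.0 has no HashSet<T>).
        Dictionary<int, bool> seen = new Dictionary<int, bool>();
        foreach (int key in localKeys)
            seen[key] = true;

        List<KeyValuePair<int, string>> missing = new List<KeyValuePair<int, string>>();
        foreach (KeyValuePair<int, string> row in remoteRows)
            if (!seen.ContainsKey(row.Key))
                missing.Add(row);
        return missing;
    }
}
```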
If you can possibly avoid it, DON'T do this with a program. If you have any way of talking to someone who controls the source server, see if they will set up some sort of export of the data. If the data is as small as you say, then XML or CSV output would be 100x better than writing something in C# (or any language).
So let's assume they can't export, still, avoid writing a program. You say you have more control over the destination. Can you set up an SSIS package, or setup a linked server? If so, you'll have a much easier time migrating the data.
If you set up, at bare minimum, the source as a linked server, you could write a small T-SQL batch:

TRUNCATE TABLE DestTable

INSERT INTO DestTable
SELECT SourceTable.*
FROM [SourceServer].[Database].[Schema].[Table] AS SourceTable

It wouldn't be as nice as SSIS (where you have more visibility into what's happening), but the T-SQL above is pretty clear.
Since I would not take the programming route, the best solution I could give you would be, if you absolutely had to:
Use SqlClient namespace.
So, create two SqlConnections and two SqlCommands, and get an instance of one SqlDataReader.
Iterate through the source reader, and execute the destination SqlCommand insert for each iteration with the values read.
It'll be ugly, but it'll work.
It doesn't seem to be a huge quantity of data you have to synchronize. Under the conditions you described (only an SP to access the remote DB and no way to get anything else), you can go with Marc Gravell's solution.
In the case that the data can only grow and existing data cannot be changed, you can compare the record counts on the remote and internal DBs in order to optimize the operation; if there is no change in the remote DB, there is no need to copy.
I am about to start on a journey writing a Windows Forms application that will open a txt file that is pipe delimited and about 230 MB in size. This app will then insert this data into a SQL Server 2005 database (obviously this needs to happen swiftly). I am using C# 3.0 and .NET 3.5 for this project.
I am not asking for the app, just some general advice and warnings about potential pitfalls. From the site I have gathered that SQL bulk copy is a prerequisite. Is there anything else I should think about? (I think that just opening the txt file with a forms app will be a large endeavor; maybe break it into blob data?)
Thank you, and I will edit the question for clarity if anyone needs it.
Do you have to write a WinForms app? It might be much easier and faster to use SSIS. There are some built-in tasks available, especially the Bulk Insert task.
Also, it's worth checking Flat File Bulk Import methods speed comparison in SQL Server 2005.
Update: If you are new to SSIS, check out some of these sites to get you on fast track. 1) SSIS Control Flow Basics 2) Getting Started with SQL Server Integration Services
This is another How to: on importing Excel file into SQL 2005.
This is going to be a streaming endeavor.
If you can, do not use transactions here. The transactional cost will simply be too great.
So what you're going to do is read the file a line at a time and insert it a line at a time. You should dump failed inserts into another file that you can diagnose later to see where they failed.
At first, I would try a bulk insert of a couple of hundred rows just to see that the streaming is working properly, and then you can open it up all you want.
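A sketch of the streaming loop described above (the file names and the insertRow callback are placeholders): read a line, try the insert, and divert failures to a second file for later diagnosis.

```csharp
using System;
using System.IO;

static class StreamingLoad
{
    // Split one pipe-delimited line into fields.
    public static string[] ParseLine(string line)
    {
        return line.Split('|');
    }

    public static void Load(string inputPath, string failedPath,
        Action<string[]> insertRow)
    {
        using (StreamReader reader = new StreamReader(inputPath))
        using (StreamWriter failed = new StreamWriter(failedPath))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                try
                {
                    insertRow(ParseLine(line));
                }
                catch (Exception)
                {
                    // Keep the raw line so it can be diagnosed later.
                    failed.WriteLine(line);
                }
            }
        }
    }
}
```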
You could try using SqlBulkCopy. It lets you pull from "any data source".
Just as a side note, it's sometimes faster to drop the indices of your table and recreate them after the bulk insert operation.
You might consider switching from full recovery to bulk-logged. This will help to keep your backups a reasonable size.
I totally recommend SSIS, you can read in millions of records and clean them up along the way in relatively little time.
You will need to set aside some time to get to grips with SSIS, but it should pay off. There are a few other threads here on SO which will probably be useful:
What's the fastest way to bulk insert a lot of data in SQL Server (C# client)
What are the recommended learning material for SSIS?
You can also create a package from C#. I have a C# program which reads a 3GL "master file" from a legacy system (parses into an object model using an API I have for a related project), takes a package template and modifies it to generate a package for the ETL.
The size of data you're talking about actually isn't that gigantic. I don't know what your efficiency concerns are, but if you can wait a few hours for it to insert, you might be surprised at how easy this would be to accomplish with a really naive technique of just INSERTing each row one at a time. Batching together a thousand or so rows at a time and submitting them to SQL server may make it quite a bit faster as well.
Just a suggestion that could save you some serious programming time, if you don't need it to be as fast as conceivable. Depending on how often this import has to run, saving a few days of programming time could easily be worth it in exchange for waiting a few hours while it runs.
You could use SSIS for the read and insert, but call it as a package from your WinForms app. Then you could pass in things like source, destination, connection strings, etc. as parameters/configurations.
HowTo: http://msdn.microsoft.com/en-us/library/aa337077.aspx
You can set up transforms and error handling inside SSIS and even create logical branching based on input parameters.
If the column format of the file matches the target table where the data needs to end up, I prefer using the command line utility bcp to load the data file. It's blazingly fast and you can specify an error file for any "odd" records that fail to be inserted.
Your app could kick off the command if you need to; just store the command line parameters for it (server, database, username/password or trusted connection, table, error file, etc.).
I like this method better than running a BULK INSERT SQL command because the data file isn't required to be on a system accessible by the database server. To use bulk insert you have to specify the path to the data file to load, so it must be a path visible and readable by the system user on the database server that is running the load. Too much hassle for me usually. :-)
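A sketch of kicking off bcp from the app (the paths, names, and switch choices are assumptions; check the switches against your bcp version): build the argument string, then launch the process and wait for it.

```csharp
using System;
using System.Diagnostics;

static class BcpRunner
{
    // Build bcp arguments for a character-mode (-c), pipe-delimited (-t "|")
    // import using a trusted connection (-T); -e names the error file that
    // collects rejected rows.
    public static string BuildArgs(string table, string dataFile,
        string server, string errorFile)
    {
        return string.Format(
            "{0} in \"{1}\" -S {2} -T -c -t \"|\" -e \"{3}\"",
            table, dataFile, server, errorFile);
    }

    public static int Run(string table, string dataFile,
        string server, string errorFile)
    {
        ProcessStartInfo psi = new ProcessStartInfo("bcp",
            BuildArgs(table, dataFile, server, errorFile));
        psi.UseShellExecute = false;
        using (Process p = Process.Start(psi))
        {
            p.WaitForExit();
            return p.ExitCode; // non-zero indicates bcp reported a failure
        }
    }
}
```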