Using C# (vs2005) I need to copy a table from one database to another. Both database engines are SQL Server 2005. For the remote database, the source, I only have execute access to a stored procedure to get the data I need to bring locally.
I have more control over the local database, as it's used by the ASP.NET application which needs a local copy of this remote table. We would like it local for easier lookups and joins with other tables, etc.
Could you please explain to me an efficient method of copying this data to our local database.
The local table can be created with the same schema as the remote one, if it makes things simpler. The remote table has 9 columns, none of which are identity columns. There are approximately 5400 rows in the remote table, and this number grows by about 200 a year. So not a quickly changing table.
Perhaps SqlBulkCopy; use SqlCommand.ExecuteReader to get the reader that you use in the call to SqlBulkCopy.WriteToServer. This is the same as bulk-insert, so very quick. It should look something like this (untested):
using (SqlConnection connSource = new SqlConnection(csSource))
using (SqlCommand cmd = connSource.CreateCommand())
using (SqlBulkCopy bcp = new SqlBulkCopy(csDest))
{
    // Stream the stored procedure's results straight into the local table
    bcp.DestinationTableName = "SomeTable";
    cmd.CommandText = "myproc";
    cmd.CommandType = CommandType.StoredProcedure;
    connSource.Open();
    using (SqlDataReader reader = cmd.ExecuteReader())
    {
        bcp.WriteToServer(reader);
    }
}
The Bulk Copy feature of ADO.NET might help you; take a look at:
MSDN - Multiple Bulk Copy Operations (ADO.NET)
An example article
I would first look at using SQL Server Integration Services (SSIS, née Data Transformation Services (DTS)).
It is designed for moving/comparing/processing/transforming data between databases, and IIRC allows an arbitrary expression for the source. You would need it installed on your database server (shouldn't be a problem; it is part of a default install).
Otherwise, for a code solution, given the data size (small): pull all the data from the remote system into an internal structure, and then look for rows which don't exist locally to insert. A rough sketch of that is below.
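If you go that route, a minimal sketch might look like the following. It is untested and every name in it (csSource, csDest, myproc, LocalCopy, the Id/Name columns) is a placeholder rather than anything from the question; it also assumes the usual System.Data, System.Data.SqlClient, and System.Collections.Generic namespaces.
// Pull everything the remote proc returns, then insert only rows whose key
// is not already present locally. All object names here are placeholders.
DataTable remote = new DataTable();
using (SqlConnection connSource = new SqlConnection(csSource))
using (SqlCommand cmd = new SqlCommand("myproc", connSource))
{
    cmd.CommandType = CommandType.StoredProcedure;
    connSource.Open();
    remote.Load(cmd.ExecuteReader());
}

HashSet<int> localKeys = new HashSet<int>();
using (SqlConnection connDest = new SqlConnection(csDest))
{
    connDest.Open();

    using (SqlCommand cmd = new SqlCommand("SELECT Id FROM LocalCopy", connDest))
    using (SqlDataReader reader = cmd.ExecuteReader())
    {
        while (reader.Read())
            localKeys.Add(reader.GetInt32(0));
    }

    using (SqlCommand insert = new SqlCommand(
        "INSERT INTO LocalCopy (Id, Name) VALUES (@Id, @Name)", connDest))
    {
        insert.Parameters.Add("@Id", SqlDbType.Int);
        insert.Parameters.Add("@Name", SqlDbType.NVarChar, 100);

        foreach (DataRow row in remote.Rows)
        {
            if (localKeys.Contains((int)row["Id"])) continue; // already there

            insert.Parameters["@Id"].Value = row["Id"];
            insert.Parameters["@Name"].Value = row["Name"];
            insert.ExecuteNonQuery();
        }
    }
}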
You probably can't do this, but if you can, DON'T do it with a program. If you have any way of talking to someone who controls the source server, see if they will set up some sort of export of the data. If the data is as small as you say, then xml or csv output would be 100x better than writing something in c# (or any language).
So let's assume they can't export. Still, avoid writing a program. You say you have more control over the destination. Can you set up an SSIS package, or set up a linked server? If so, you'll have a much easier time migrating the data.
If you set up, at bare minimum, the source as a linked server, you could write a small T-SQL batch to:
TRUNCATE TABLE DestTable
INSERT INTO DestTable
SELECT * FROM [SourceServer].[SourceDatabase].[Schema].[SourceTable]
It wouldn't be as nice as SSIS (which gives you more visibility into what's happening), but the T-SQL above is pretty clear.
Since I would not take the programming route, the best solution I could give you, if you absolutely had to, would be:
Use the SqlClient namespace.
So, create 2 SqlConnections, 2 SqlCommands, and get the instance of the 1 SqlDataReader.
Iterate through the source reader, and execute the destination SqlCommand insert for each iteration with the values from the reader (see the sketch below).
It'll be ugly, but it'll work.
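A rough sketch of that, untested, with placeholder names (csSource, csDest, myproc, LocalCopy, Col1/Col2) standing in for the real objects:
// Stream rows from the source proc and insert them one by one on the destination.
using (SqlConnection source = new SqlConnection(csSource))
using (SqlConnection dest = new SqlConnection(csDest))
{
    source.Open();
    dest.Open();

    SqlCommand read = new SqlCommand("myproc", source);
    read.CommandType = CommandType.StoredProcedure;

    SqlCommand write = new SqlCommand(
        "INSERT INTO LocalCopy (Col1, Col2) VALUES (@Col1, @Col2)", dest);
    write.Parameters.Add("@Col1", SqlDbType.Int);
    write.Parameters.Add("@Col2", SqlDbType.NVarChar, 100);

    using (SqlDataReader reader = read.ExecuteReader())
    {
        while (reader.Read())
        {
            write.Parameters["@Col1"].Value = reader["Col1"];
            write.Parameters["@Col2"].Value = reader["Col2"];
            write.ExecuteNonQuery(); // one round trip per row, hence "ugly"
        }
    }
}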
It doesn't seem to be a huge quantity of data that you have to synchronize. Under the conditions you described (only an SP to access the remote DB and no way to get anything else), you can go for Marc Gravell's solution.
In the case where the data can only grow and existing data cannot be changed, you can compare the record count on the remote and local DBs in order to optimize the operation: if there is no change in the remote DB, there is no need to copy (a sketch of that check is below).
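For example (just a sketch; GetRemoteRowCount and CopyRemoteTableToLocal are hypothetical helpers around the stored procedure and the copy routine, and LocalCopy/csDest are placeholders):
// Skip the copy entirely if the row counts match.
int remoteCount = GetRemoteRowCount();   // hypothetical: e.g. count the rows the proc returns

int localCount;
using (SqlConnection dest = new SqlConnection(csDest))
using (SqlCommand cmd = new SqlCommand("SELECT COUNT(*) FROM LocalCopy", dest))
{
    dest.Open();
    localCount = (int)cmd.ExecuteScalar();
}

if (remoteCount != localCount)
{
    CopyRemoteTableToLocal();            // hypothetical: the SqlBulkCopy routine above
}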
I am working on a C# console application which runs two SQL queries that are passed as strings, as follows:
private DataTable GetData(string resultsQuery) {
    // sqlcon is an SqlConnection field initialized elsewhere in the class
    SqlCommand sqlCmd = new SqlCommand(resultsQuery, sqlcon);
    DataTable dtlist = new DataTable();
    SqlDataAdapter dalist = new SqlDataAdapter();
    dalist.SelectCommand = sqlCmd;
    dalist.Fill(dtlist); // Fill opens the connection if it is not already open
    sqlcon.Close();
    return dtlist;
}
but the thing is, these queries keep changing very frequently, and every time they change I have to re-build, re-publish, and uninstall the older application before installing the updated one, which I think is bad practice. The reason I cannot use a stored procedure is that I have only read access to the database, so I cannot create a stored procedure.
Can anyone suggest a better way, and a best practice, to deal with this?
Essentially, your query is part of your program's configuration, which is currently hard-coded. Therefore, the solution you need has little to do with accessing databases: you need a way to update the configuration settings of an installed application.
Although having a stored procedure would be a fine choice, there are other ways of achieving the effect that you are looking for:
One approach would be to configure a separate database to which you have a write access, and use it as your source of query strings. Make a table that "maps" query names to query content:
QueryKey Query
-------- --------------------------------------
query1 SELECT A, B, C FROM MyTable1 WHERE ...
queryX SELECT X, Y, Z FROM MyTable2 WHERE ...
Your program can read this other DB at startup, and store queries for use at runtime. When end-users request their data, your program would execute the query it got from the configuration database against your read-only database.
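As a rough sketch (configConnectionString, QueryConfig, and the GetData call are illustrative assumptions, not real objects from the question):
// Load the query map from the configuration database into memory at startup.
var queries = new Dictionary<string, string>();
using (SqlConnection conn = new SqlConnection(configConnectionString))
using (SqlCommand cmd = new SqlCommand("SELECT QueryKey, Query FROM QueryConfig", conn))
{
    conn.Open();
    using (SqlDataReader reader = cmd.ExecuteReader())
    {
        while (reader.Read())
            queries[reader.GetString(0)] = reader.GetString(1);
    }
}

// Later, run one of them against the read-only database:
DataTable results = GetData(queries["query1"]);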
You can take other approaches to distributing this piece of configuration. Alternatives include storing the strings in a shared folder on a file server to which your application has visibility, setting up a network service of your own to feed your application its queries at start-up, or using built-in means of configuration available in .NET. The last approach requires you to change the settings on individual machines one-by-one, which may not be an ideal roll-out scenario.
The idea of using an external XML file was mentioned, but because this is .NET, that is exactly what the App.config file is designed for. There is a whole class library designed to let you store info in the App.config file so it can be easily modified without recompiling the program. The big caveat is that if the query returns an entirely differently shaped result set, this will not help much. But if all you are changing is the WHERE condition or something like that, and not the SELECT list, then this approach will work fine.
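A minimal sketch of that approach, assuming an appSettings key named resultsQuery and a project reference to System.Configuration:
// App.config would contain something like:
//   <appSettings>
//     <add key="resultsQuery" value="SELECT A, B, C FROM MyTable1 WHERE ..." />
//   </appSettings>
// Requires "using System.Configuration;"
string resultsQuery = ConfigurationManager.AppSettings["resultsQuery"];
DataTable results = GetData(resultsQuery);
Editing the value in the deployed .config file then changes the query without a rebuild.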
When using the Query Design feature in Visual Studio, any queries that I run on a SQL Server or Microsoft Access database while testing are persistent, meaning they actually change the data in the table(s). Is there a way to make the queries non-persistent while testing them, until a program is run? I'm using C# as the programming language and .NET as the framework, if it matters. I also need to know the process for doing this with either an MS Access or SQL Server database.
You can do transactions in C# similar to how you use them in SQL. Here is an example:
connection.Open();
SqlCommand command = connection.CreateCommand();
SqlTransaction transaction;

// Start a local transaction.
transaction = connection.BeginTransaction("SampleTransaction");

// The command must be enlisted in the transaction explicitly.
command.Transaction = transaction;

// Execute your query here, e.g.:
// command.CommandText = "UPDATE ...";
// command.ExecuteNonQuery();

// Check if this is the test environment (SomeConfigFile is a placeholder for your own config helper)
bool testEnvironment = SomeConfigFile.property("testEnvironment");
if (!testEnvironment) {
    transaction.Commit();
} else {
    transaction.Rollback();
}
Here is the documentation on transactions in C#: https://msdn.microsoft.com/en-us/library/86773566%28v=vs.110%29.aspx
It should be possible for VS to create you a local copy of the SQL data you're working on while you're testing. This is held in the bin folder. Have a look at this:
https://msdn.microsoft.com/en-us/library/ms246989.aspx
Once you're finished testing you could simply change it to be pointing to the database you want to alter with your application.
I'm not aware of a way to get exactly what you're asking for, but I think there is an approach to get close to the behaviour you want:
When using Microsoft SQL Server, creating a table with a leading hash in the name (#tableName) will cause the table to be disposed of when your session ends.
One way you could take advantage of this to get your desired behaviour is to copy your working table into a temporary table, and work on the temporary table instead of the live table.
To do so, use something like the following:
SELECT * INTO #tempTable FROM liveTable
This will create a complete copy of your liveTable, with all of the same columns and rows. Once you are finished, the table will be automatically dropped and no permanent changes will have been made.
This can also be useful for a series of queries which you execute on the same subset of a large data set. Selecting the subset of data into a smaller temporary table can make subsequent queries much faster than if you had to select from the full data set repeatedly.
Just keep in mind that as soon as your connection closes, all the data goes with it.
I have a very large number of rows (10 million) which I need to select out of a SQL Server table. I will go through each record and parse out each one (they are xml), and then write each one back to a database via a stored procedure.
The question I have is, what's the most efficient way to do this?
The way I am doing it currently is that I open 2 SqlConnections (one for reading, one for writing). The read one uses a SqlDataReader which basically does a SELECT * FROM the table, and I loop through the results. After I parse each record I do an ExecuteNonQuery (using parameters) on the second connection.
Are there any suggestions to make this more efficient, or is this just the way to do it?
Thanks
It seems that you are writing rows one-by-one. That is the slowest possible model. Write bigger batches.
There is no need for two connections when you use MARS. Unfortunately, MARS forces a 14 byte row versioning tag in each written row. Might be totally acceptable, or not.
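One way to write bigger batches, if you can bulk load into a staging table rather than calling the stored procedure once per row, is to accumulate parsed rows in a DataTable and flush it with SqlBulkCopy every few thousand rows. A sketch (the table, columns, and the ReadSourceRecords/ParseXml helpers are all placeholders, not the poster's code):
// Accumulate parsed rows and flush them in batches instead of one call per row.
DataTable batch = new DataTable();
batch.Columns.Add("Id", typeof(int));
batch.Columns.Add("ParsedValue", typeof(string));     // placeholder columns

using (SqlBulkCopy bcp = new SqlBulkCopy(destinationConnectionString))
{
    bcp.DestinationTableName = "dbo.ParsedResults";   // placeholder staging table
    bcp.BatchSize = 5000;

    foreach (var record in ReadSourceRecords())        // hypothetical: your reader loop
    {
        var parsed = ParseXml(record);                  // hypothetical: your XML parsing
        batch.Rows.Add(parsed.Id, parsed.Value);

        if (batch.Rows.Count >= 5000)
        {
            bcp.WriteToServer(batch);
            batch.Clear();
        }
    }

    if (batch.Rows.Count > 0)
        bcp.WriteToServer(batch);                       // flush the remainder
}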
I had a very similar situation and here is what I did:
I made two copies of the same database.
One is optimized for reading and the other is optimized for writing.
In config, I kept two connection strings, ConnectionRead and ConnectionWrite.
Now in the data layer, when I have a read statement (SELECT...) I switch to the ConnectionRead connection string, and when writing I use the other one.
Now, since I have to keep both databases in sync, I am using SQL replication for this job.
I understand the implementation depends on many aspects, but the approach may help you; a sketch of the connection switch is below.
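A sketch of that switch (the OpenConnection helper is hypothetical; ConnectionRead/ConnectionWrite are the config names from the answer):
// Pick the connection string based on whether the operation reads or writes.
// Requires "using System.Configuration;" and both names defined under <connectionStrings>.
private static SqlConnection OpenConnection(bool forWrite)
{
    string name = forWrite ? "ConnectionWrite" : "ConnectionRead";
    var conn = new SqlConnection(
        ConfigurationManager.ConnectionStrings[name].ConnectionString);
    conn.Open();
    return conn;
}

// Usage:
// using (SqlConnection conn = OpenConnection(forWrite: false)) { /* SELECT ... */ }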
I agree with Tim Schmelter's post - I did something very similar... I actually used a SQLCLR procedure which read the data from an XML column in a SQL table into an in-memory table using .NET (System.Data), then used the .NET System.Xml namespace to deserialize the XML, populated another in-memory table (in the shape of the destination table), and used SqlBulkCopy to populate that destination SQL table with the parsed attributes I needed.
SQL Server is engineered for set-based operations... If ever I'm shredding/iterating (row-by-row) I tend to use SQLCLR, as .NET is generally better at iterative/data-manipulative processing. An exception to my rule is when working with a little metadata for data-driven processes or cleanup routines, where I may use a cursor.
When copying the results of a view from a database on one server to another database on a new server (both running SQL Server 2008), which of the following methods is likely to be the most efficient?
1. SSIS Dataflow task with OLE DB Source/Destination
2. Custom scripts
e.g.
using (SqlConnection connection = new SqlConnection(sourceConnectionString))
using (SqlCommand command = new SqlCommand(sourceQuery, connection))
{
    connection.Open();
    using (SqlDataReader reader = command.ExecuteReader())
    {
        using (SqlBulkCopy bulkCopy = new SqlBulkCopy(destinationConnectionString))
        {
            bulkCopy.DestinationTableName = destinationTable;
            // Any column mapping required
            bulkCopy.WriteToServer(reader);
        }
    }
}
3. SSIS Bulk Insert
I don't see how this would be any different from using custom scripts, other than the added inflexibility that it only works with tables/views and not stored procedures.
I realise this is not quite a like-for-like comparison, because there are additional levels of logging, error handling, etc. available with the various options, but let's assume I require no logging at all, to try and make the playing field as even as possible.
According to this SO post, BULK INSERT performs better than SqlBulkCopy. However, according to this post, the Data Flow task is preferred over the Bulk Insert SSIS task.
This isn't really answering your question directly, but...
The short answer to your question is probably to try different approaches using a representative data set and see for yourself. That will give you an answer that is more meaningful for your environment than anyone on SO can provide, as well as being a good way to understand the effort and issues involved in each option.
The longer answer is that the SSIS Bulk Insert task is most likely the slowest, since it can only load flat file data. In order to use it, you would have to export the data to a file on the source server before reloading it on the target server. That could be useful if you have a very slow WAN connection between the servers, because you can compress the file before copying it across to minimize the data volume, otherwise it just adds more work to do.
As for the difference between SSIS and SqlBulkCopy: I have no personal experience with SqlBulkCopy, so I can only suggest that if there is no clear 'winner' you go with the simplest, easiest-to-maintain implementation first and don't worry about finding the fastest possible solution until you actually need to. Code only needs to run fast enough, not as fast as possible, and it isn't clear from your question that you actually have a performance problem.
I have a Windows Service application that receives a stream of data in the following format:
IDX|20120512|075659|00000002|3|AALI |Astra Agro Lestari Tbk. |0|ORDI_PREOPEN|12 |00000001550.00|00000001291.67|00001574745000|00001574745000|00500|XDS1BXO1| |00001574745000|ݤ
IDX|20120512|075659|00000022|3|ALMI |Alumindo Light Metal Industry Tbk. |0|ORDI |33 |00000001300.00|00000001300.00|00000308000000|00000308000000|00500|--U3---2| |00000308000000|õÄ
This data comes in millions of rows, in sequence 00000002....00198562, and I have to parse it and insert it into a database table in that order.
My question is, what is the best (most efficient) way to insert this data into my database? I have tried a simple method: open a SqlConnection, generate a string of SQL INSERT statements, and then execute the script using a SqlCommand object; however, this method is taking too long.
I read that I can use SQL BULK INSERT, but it has to read from a text file. Is it possible to use BULK INSERT for this scenario? (I have never used it before.)
Thank you
Update: I'm aware of SqlBulkCopy, but it requires me to have a DataTable first; is this good for performance? If possible, I want to insert directly from my data source into SQL Server without having to use an in-memory DataTable.
If you are writing this in C# you might want to look at the SqlBulkCopy class.
Lets you efficiently bulk load a SQL Server table with data from another source.
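Note that SqlBulkCopy.WriteToServer also accepts an IDataReader, so a full in-memory DataTable isn't strictly required; a small DataTable flushed in batches is one simple middle ground. A sketch (the destination table, the column picks, and yourStream are assumptions based on the sample rows):
// Parse the pipe-delimited stream in batches and bulk copy each batch.
DataTable batch = new DataTable();
batch.Columns.Add("Symbol", typeof(string));
batch.Columns.Add("CompanyName", typeof(string));          // example columns only

using (var reader = new StreamReader(yourStream))
using (var bcp = new SqlBulkCopy(connectionString))
{
    bcp.DestinationTableName = "dbo.MarketData";            // placeholder table
    string line;
    while ((line = reader.ReadLine()) != null)
    {
        string[] fields = line.Split('|');
        batch.Rows.Add(fields[5].Trim(), fields[6].Trim()); // ticker and company name
        if (batch.Rows.Count >= 10000)
        {
            bcp.WriteToServer(batch);
            batch.Clear();
        }
    }
    if (batch.Rows.Count > 0)
        bcp.WriteToServer(batch);
}
The answer below shows a fully streaming alternative using a CSV reader that implements IDataReader.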
First, download the free LumenWorks.Framework.IO.Csv library.
Second, use code like this:
StreamReader sr = new StreamReader(yourStream);
var sbc = new SqlBulkCopy(connectionString);
sbc.DestinationTableName = "YourTable"; // set the target table before writing
// The sample data has no header row and is pipe-delimited
sbc.WriteToServer(new LumenWorks.Framework.IO.Csv.CsvReader(sr, false, '|'));
Yeah, it is really that easy.
You can use SSIS ("SQL Server Integration Services") to move data from a source data flow to a destination data flow.
The source can be a text file and the destination can be a SQL Server table, and the transfer executes in bulk-insert mode.