Move data from one table to another in the same Database - c#

I realize this question has been asked before, but nothing I've read really answers my question.
I have a table with millions of rows of data that is used in multiple queries a day. I want to move the majority of the data to another table with the same schema. The second table will be an "archive" table.
I would like a list of options to archive data, so I can present them to my boss. So far I'm considering an INSERT INTO ... SELECT statement, SqlBulkCopy in a C# console application, and I'm starting to dig into SSIS to see what it can do. I plan on doing this over a weekend or multiple weekends.
The table has an ID as the primary key
The table also has a few foreign key constraints
Thanks for any help.

I assume that this is for SQL Server. In that case, partitioned tables might be an additional option. Otherwise I'd always go for an INSERT ... SELECT run by a job in SQL Server, or - if you can't run it directly in SQL Server - create a stored procedure and run it through a little C# tool that you schedule.
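A minimal sketch of such a scheduled C# tool, assuming hypothetical names (dbo.Orders, dbo.OrdersArchive, an OrderDate cutoff) and an archive table without triggers or foreign keys, which OUTPUT ... INTO requires:

using System;
using System.Data.SqlClient;

class ArchiveJob
{
    static void Main()
    {
        // Move rows older than the cutoff in small batches so the
        // transaction log and locking stay manageable.
        const string sql =
            "DELETE TOP (5000) FROM dbo.Orders " +
            "OUTPUT DELETED.* INTO dbo.OrdersArchive " +
            "WHERE OrderDate < @cutoff;";

        using (SqlConnection conn = new SqlConnection("Server=.;Database=MyDb;Integrated Security=true"))
        {
            conn.Open();
            int moved;
            do
            {
                using (SqlCommand cmd = new SqlCommand(sql, conn))
                {
                    cmd.Parameters.AddWithValue("@cutoff", new DateTime(2012, 1, 1));
                    moved = cmd.ExecuteNonQuery(); // rows moved in this batch
                }
            }
            while (moved > 0); // repeat until nothing older than the cutoff is left
        }
    }
}

Batching keeps each transaction small; the loop simply runs until a batch moves zero rows.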

Try executing something like:
CREATE TABLE mynewtable AS SELECT * FROM myoldtable WHERE any_filter..;
You can create a new table containing a copy of the data with a single statement on most database engines.
In the case of SQL Server 2008, use:
SELECT * INTO new_table FROM old_table

If you have a set data archive interval, you may be able to leverage the partition-to-archive solution described in the following article.
http://blogs.msdn.com/b/felixmar/archive/2011/02/14/partitioning-amp-archiving-tables-in-sql-server-part-1-the-basics.aspx
Our team has leveraged a similar partition / archive solution in the past with good success.
Regards,

Related

How to synchronize two tables stored in different databases

I have two tables in different databases. The tables are exactly alike (same name, same columns, etc.). My question is: how can I retrieve new rows from the parent table and store them in the child table? I need to do this in the Click event of a button.
Thanks in advance.
There are several technologies specifically for this type of scenario:
SQL Replication
Supports unidirectional or bidirectional synchronization
SSIS
Lets you define the mappings of the data, as well as transformations, and attach other code to the process easily
Linked-servers
Allows you to query databases and tables on remote servers as though they are part of the local database. Very easy to set up (just call exec sp_addlinkedserver), and once defined it uses nothing but plain old SQL
Since you mention this needs to occur on a button click, I'd suggest you use linked servers within a stored procedure, as they're the simplest option. SSIS would also be suitable, but you'd need to execute the package on the button click.
Resolved it myself using Linked Server. Here is a simple tutorial about how to create a linked server.
After creating linked server, we can query it as follows:
select * from LinkedServerName.DatabaseName.dbo.TableName
Works just perfectly!
Accepting STW's answer as he explains different approaches.
(A long and non-optimal solution:)
Get all IDs from the first table.
Get all IDs from the second table.
Loop through the first list and remove every item that also appears in the second; whatever remains are the new rows that need to be copied.
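A rough sketch of that approach in C#, assuming an integer ID column and placeholder table/connection names (HashSet<int> needs .NET 3.5; a Dictionary works on 2.0). As noted, a set-based query will beat this for large tables:

using System.Collections.Generic;
using System.Data.SqlClient;

class SyncById
{
    static void Main()
    {
        HashSet<int> childIds = new HashSet<int>();
        // collect the IDs already present in the child table
        using (SqlConnection child = new SqlConnection("child connection string"))
        using (SqlCommand cmd = new SqlCommand("SELECT Id FROM dbo.MyTable", child))
        {
            child.Open();
            using (SqlDataReader r = cmd.ExecuteReader())
                while (r.Read()) childIds.Add(r.GetInt32(0));
        }

        List<int> missing = new List<int>();
        // every parent ID not found in the child set still needs copying
        using (SqlConnection parent = new SqlConnection("parent connection string"))
        using (SqlCommand cmd = new SqlCommand("SELECT Id FROM dbo.MyTable", parent))
        {
            parent.Open();
            using (SqlDataReader r = cmd.ExecuteReader())
                while (r.Read())
                {
                    int id = r.GetInt32(0);
                    if (!childIds.Contains(id)) missing.Add(id);
                }
        }
        // "missing" now holds the new rows' IDs; fetch and insert those rows next
    }
}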

Efficient Update of Table from One SQL Server to Another, Same Table Structure

I have one database server, acting as the main SQL Server, containing a table to hold all data. Other database servers come in and out (different instances of SQL Server). When they come online, they need to download data from the main table (for a given time period), then they generate their own additional data in the same local SQL Server database table, and then they want to update the main server with only the new data, using a C# program run by a scheduled service every so often. Multiple additional servers could be generating data at the same time, although it's not going to be that many.
The main table will always be online. The additional non-main database table is not always online, and it should not be an identical copy of the main one: first it will contain a subset of the main data, then it generates its own additional data in the local table and updates the main table every so often with its changes. There could be a decent number of rows generated and/or downloaded, so an efficient algorithm is needed to copy from the extra database to the main table.
What is the most efficient way to transfer this in C#? SqlBulkCopy doesn't look like it will work because I can't have duplicate entries in the main server, and it would fail the constraint checks since some entries already exist.
You could do it in the DB or in C#. In either case you must do something like "Using FULL JOINs to Compare Datasets". You know that already.
The most important thing is to do it in a transaction. If you have 100k rows, split them into batches of 1000 rows per transaction, or experiment to determine what number of rows per transaction works best for you.
Use Dapper. It's really fast.
If you have all your data in C#, use a TVP (table-valued parameter) to pass it to a DB stored procedure. In the stored procedure, use MERGE to UPDATE/DELETE/INSERT the data.
And lastly, in C# use Dictionary<TKey, TValue> or something else with O(1) access time.
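A minimal sketch of the TVP route (SQL Server 2008+), assuming a table type dbo.ItemType and a stored procedure dbo.UpsertItems already exist on the server; all of these names are placeholders:

using System.Data;
using System.Data.SqlClient;

class TvpUpsert
{
    // Assumed to exist on the server:
    //   CREATE TYPE dbo.ItemType AS TABLE (Id INT PRIMARY KEY, Name NVARCHAR(100));
    //   CREATE PROCEDURE dbo.UpsertItems @rows dbo.ItemType READONLY AS
    //     MERGE dbo.Items AS t
    //     USING @rows AS s ON t.Id = s.Id
    //     WHEN MATCHED THEN UPDATE SET t.Name = s.Name
    //     WHEN NOT MATCHED THEN INSERT (Id, Name) VALUES (s.Id, s.Name);
    static void Upsert(DataTable rows, string connectionString)
    {
        using (SqlConnection conn = new SqlConnection(connectionString))
        using (SqlCommand cmd = new SqlCommand("dbo.UpsertItems", conn))
        {
            cmd.CommandType = CommandType.StoredProcedure;
            SqlParameter p = cmd.Parameters.AddWithValue("@rows", rows);
            p.SqlDbType = SqlDbType.Structured; // mark it as a TVP
            p.TypeName = "dbo.ItemType";        // the table type defined on the server
            conn.Open();
            cmd.ExecuteNonQuery();
        }
    }
}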
SqlBulkCopy is the fastest way to insert data into a table from a C# program. I have used it to copy data between databases, and so far nothing beats it speed-wise. Here is a nice generic example: Generic bulk copy.
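For reference, a bare-bones SqlBulkCopy call that streams rows from one database to another; table and column names are placeholders:

using System.Data;
using System.Data.SqlClient;

class BulkCopyExample
{
    static void CopyTable(string sourceConn, string destConn)
    {
        using (SqlConnection src = new SqlConnection(sourceConn))
        using (SqlCommand cmd = new SqlCommand("SELECT Id, Name FROM dbo.SourceTable", src))
        {
            src.Open();
            using (SqlDataReader reader = cmd.ExecuteReader())
            using (SqlBulkCopy bulk = new SqlBulkCopy(destConn))
            {
                bulk.DestinationTableName = "dbo.TargetTable";
                bulk.BatchSize = 1000;      // commit in batches
                bulk.WriteToServer(reader); // stream rows straight across
            }
        }
    }
}

WriteToServer accepts an IDataReader, so the rows never have to be materialized in memory.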
I would use an IsProcessed flag in the table of the main server and keep track of the main table's primary keys when you download data to the local db server. Then you should be able to do a delete and an update against the main server again.
Here's how I would do it:
Create a stored procedure on the main database which receives a user-defined table variable with the same structure as the main table.
It should do something like:
INSERT INTO yourtable SELECT * FROM tablevar
OR you could use the MERGE statement for insert-or-update functionality.
In code (a Windows service), load all (or part of) the data from the secondary table and send it to the stored procedure as a table variable.
You could do it in batches of 1000, and each time a batch is processed you should mark it in the source table / source-updater code.
Can you use linked servers for this? If yes, it will make copying data to and from the main server much easier.
When copying data back to the main server I'd use IF EXISTS before each INSERT statement to additionally make sure there are no duplicates, and I'd wrap all the insert statements in a transaction so that if an error occurs the transaction is rolled back.
I also agree with the others on doing this in batches of 1000 or so records, so that if something goes wrong you can limit the damage.
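A sketch of that guard-plus-transaction pattern, with placeholder table and column names:

using System.Collections.Generic;
using System.Data.SqlClient;

class GuardedPush
{
    // IF EXISTS guard per row, everything wrapped in one transaction.
    const string Sql =
        "IF NOT EXISTS (SELECT 1 FROM dbo.MainTable WHERE Id = @id) " +
        "INSERT INTO dbo.MainTable (Id, Payload) VALUES (@id, @payload);";

    static void PushBatch(string connStr, IDictionary<int, string> rows)
    {
        using (SqlConnection conn = new SqlConnection(connStr))
        {
            conn.Open();
            using (SqlTransaction tx = conn.BeginTransaction())
            {
                try
                {
                    foreach (KeyValuePair<int, string> row in rows)
                    {
                        using (SqlCommand cmd = new SqlCommand(Sql, conn, tx))
                        {
                            cmd.Parameters.AddWithValue("@id", row.Key);
                            cmd.Parameters.AddWithValue("@payload", row.Value);
                            cmd.ExecuteNonQuery();
                        }
                    }
                    tx.Commit();   // all or nothing
                }
                catch
                {
                    tx.Rollback(); // an error undoes the whole batch
                    throw;
                }
            }
        }
    }
}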

Copy data from a table and update it

I have a table tblSource in SourceDB (a SQL Server DB) and tblTarget in TargetDB (a SQL Server DB). Data from tblSource has to be moved to tblTarget. tblSource has a bit field to indicate which data has been moved to tblTarget, so when a row is copied to tblTarget this flag has to be set. I need to do it in C#, though suggestions in T-SQL are welcome too. My question is: what are the possible solutions, and which would be the best approach?
MERGE will work for you if you are on SQL Server 2008.
OUTPUT will work for you on SQL Server 2005+.
You need to UPDATE the records to set your bit flag and OUTPUT INSERTED.* into your destination table.
You can consider outputting only selected records if you are planning to insert only selected records into your destination table.
This is good in terms of performance, as this technique requires SQL Server to traverse the records only once.
Check these links for how OUTPUT is used:
http://msdn.microsoft.com/en-us/library/ms177564.aspx
http://blog.sqlauthority.com/2007/10/01/sql-server-2005-output-clause-example-and-explanation-with-insert-update-delete/
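A sketch of that single-pass flag-and-copy, with assumed column names; note that OUTPUT ... INTO requires the destination table to have no triggers or foreign key relationships, and both tables must be reachable from the same connection:

using System.Data.SqlClient;

class FlagAndCopy
{
    // Sets the bit flag and copies the affected rows in one statement,
    // so SQL Server only traverses the source rows once.
    const string Sql =
        "UPDATE tblSource SET Moved = 1 " +
        "OUTPUT INSERTED.Col1, INSERTED.Col2 INTO tblTarget (Col1, Col2) " +
        "WHERE Moved = 0;";

    static int Run(string connectionString)
    {
        using (SqlConnection conn = new SqlConnection(connectionString))
        using (SqlCommand cmd = new SqlCommand(Sql, conn))
        {
            conn.Open();
            return cmd.ExecuteNonQuery(); // number of rows moved
        }
    }
}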
You could use the T-SQL MERGE statement, which would remove the need to keep a flag on each row.
This could be executed from C# if need be, or wrapped in a stored procedure. If the databases are on separate server instances, you can create a linked server.
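A sketch of such a MERGE (SQL Server 2008+) run from C#, pulling the source rows across a linked server; every name here is a placeholder:

using System.Data.SqlClient;

class MergeCopy
{
    // MERGE compares the two tables directly, so no per-row flag is needed.
    // The target of a MERGE must be local, but the source may sit behind
    // a linked server.
    const string Sql =
        "MERGE dbo.tblTarget AS t " +
        "USING LinkedServerName.SourceDB.dbo.tblSource AS s ON t.Id = s.Id " +
        "WHEN MATCHED THEN UPDATE SET t.Col1 = s.Col1 " +
        "WHEN NOT MATCHED THEN INSERT (Id, Col1) VALUES (s.Id, s.Col1);";

    static void Run(string targetConnStr)
    {
        using (SqlConnection conn = new SqlConnection(targetConnStr))
        using (SqlCommand cmd = new SqlCommand(Sql, conn))
        {
            conn.Open();
            cmd.ExecuteNonQuery();
        }
    }
}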
I would use the SqlBulkCopy class for this. I've used it in the past and had good luck with it. It's plenty fast and easy to use. There's plenty of sample code at that link to get you started.
Is there any reason why a simple INSERT is not an option?
INSERT INTO tblTarget (destcol1, destcol2)
SELECT sourcecol1, sourcecol2 FROM tblSource

Merging databases - Identity column drop

I need to create a tool that is able to merge clients' production databases.
Usually these databases will have the same schema (I'll do some checks later on, but for now we'll assume they do). Filtering of duplicate data is something for later on too.
This needs to be done automatically (so no script generation via SSMS, etc.).
I've already had to start over a couple of times because I kept running into problems I hadn't thought of, so this time I wanted to ask you guys for advice before I begin all over again.
My current plan of action is:
Copy the schema from database 1 (later on I'll add some checks here for when the schema is different).
Loop over all tables and set all foreign key updates to cascade, and set the order in which the table data needs to be inserted (so the tables containing the PKs first, then the tables holding the FKs).
Loop over every table in the correct order:
Check the database 2 table for an identity column; if there is one, retrieve the current seed value from the corresponding table in database 1, drop the identity property on the database 2 table, and update each ID to newID = currentID + seed (to avoid duplicate primary keys later on).
Generate an insert script (SMO's Table.EnumScript) for the database 1 table.
Generate an insert script (SMO's Table.EnumScript) for the database 2 table.
Execute every line of the database 1 insert script on the new database.
Execute every line of the database 2 insert script (which now has primary key/identity field data that follows the one from database 1) on the new database.
Go to the next table.
Everything was working when I tested it manually (disabling the identity property in SSMS, creating a T-SQL script to update every row with the given seed, ...).
But the problem now is automating this in C#, more specifically the disabling of the identity property. There doesn't seem to be a clean solution for this. Creating a new table and rebuilding every constraint etc. seems like the wrong way, because the only reason I need it is to cascade every FK so everything still points to the correct place.
Another way would be to delay the updating of the identity-column data, and change it after script generation and before insertion into the new database. But then I'd need to know which data points to which other data, while everything is still in strings (the insert script)?
Any suggestions,thoughts or techniques on how to handle this?
I know about Red Gate's SQL Compare, and it is indeed wonderful, but I need to program it myself.
Using: SMO, SQL Server 2005 - 2008 R2 (no Developer or Enterprise edition on client servers), ADO.NET, C#, .NET Framework 2.0, Visual Studio 2008
I am not sure exactly what you are trying to accomplish with your process here, but managing Database versions is something that I have a keen interest in.
Have a look at DBSourceTools ( http://dbsourcetools.codeplex.com ).
It is a utility to script an entire database to disk, including all foreign key constraints and data.
Using Deployment Targets, you will then be able to re-create these databases on another database server (usually a local machine).
The tool will handle dependencies and large database tables using SQL bulk insert; trying to generate a script with 50,000 insert statements would be a nightmare.
Have fun.
Disclaimer: I am involved in the http://dbsourcetools.codeplex.com project.

Maintain a local copy of a table from an external database table, ADO.NET

We have built an application which needs a local copy of a table from another database. I would like to write an ADO.NET routine which will keep the local table in sync with the master. Using .NET 2.0, C# and ADO.NET.
Please note I really have no control over the master table, which is in a third-party, mission-critical app I don't wish to mess with.
For example Here is the master data table:
ProjectCodeId Varchar(20) [PK]
ProjectCode Varchar(20)
ProjectDescrip Varchar(50)
OtherUneededField int
OtherUneededField2 int
The local table we need to keep in sync...
ProjectCodeId Varchar(20) [PK]
ProjectCode Varchar(20)
ProjectDescrip Varchar(50)
Perhaps a better way to frame this question: what have you done in the past for this type of problem? What has worked best for you, and what should be avoided at all costs?
My goal with this question is to determine a good way to handle this. So often I am combining data from two or more disjointed data sources. I haven't included database platforms for this reason; it really shouldn't matter. In this current situation both databases are MSSQL, but I'd prefer the solution not use linked databases or DTS, etc.
Sure, truncating the local table and refilling it each time from the master is an option, but with thousands of rows I don't think this is very efficient. Do you?
EDIT: First, recognize that what you are doing is hand-rolled replication and replication is never simple.
You need to track and apply all of the CRUD state changes. That said, ADO.NET can do this.
To track changes to the source you can use Query Notification with your source database. This requires special permission against the database so the owner of the source database will need to take action to enable this solution. I haven't used this technique myself, but here is a description of it.
See "Query Notifications in SQL Server (ADO.NET)"
Query notifications were introduced in Microsoft SQL Server 2005 and the System.Data.SqlClient namespace in ADO.NET 2.0. Built upon the Service Broker infrastructure, query notifications allow applications to be notified when data has changed. This feature is particularly useful for applications that provide a cache of information from a database, such as a Web application, and need to be notified when the source data is changed.
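A minimal sketch of wiring up a notification with SqlDependency; the watched query must follow the notification rules (explicit column list, two-part table names), and dbo.ProjectCodes is an assumed name for the master table:

using System.Data;
using System.Data.SqlClient;

class CacheWatcher
{
    static void Watch(string connStr)
    {
        SqlDependency.Start(connStr); // opens the Service Broker listener once per AppDomain
        using (SqlConnection conn = new SqlConnection(connStr))
        using (SqlCommand cmd = new SqlCommand(
            "SELECT ProjectCodeId, ProjectCode, ProjectDescrip FROM dbo.ProjectCodes", conn))
        {
            SqlDependency dep = new SqlDependency(cmd);
            dep.OnChange += delegate(object sender, SqlNotificationEventArgs e)
            {
                // fires once when the result set changes; re-run the query
                // and re-subscribe here to keep watching
            };
            conn.Open();
            using (SqlDataReader r = cmd.ExecuteReader())
            {
                while (r.Read()) { /* load rows into the local cache */ }
            }
        }
    }
}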
To apply changes from the source db table you need to retrieve the data from the target db table, apply the changes to the target rows and post the changes back to the target db.
To apply the changes you can either
1) Delete and reinsert all of the rows (simple), or
2) Merge row-by-row changes (hard).
Delete and reinsert is self explanatory, so I won't go into detail on that.
For row-by-row change tracking here is an approach. (I am assuming here that Query Notification doesn't give you row-by-row change information, so you have to calculate it.)
You need to determine which rows were modified and identify inserted and deleted rows. Create a DataView with a sort for each table to get a Find method you can use to look up matching rows by ID.
Identify modified rows by using a datetime/timestamp column, or by comparing all field values. Copy modified values to the target row.
Identify added and deleted rows by looping over the respective table DataViews and using the Find method of the other DataView to identify rows that do not appear in the first table. Insert or delete rows from the target table as required. (The Delete method doesn't remove the row but marks it for deletion by the TableAdapter Update.)
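A sketch of that DataView-based diff, assuming both tables are loaded into DataTables and keyed by ProjectCodeId; a real implementation would compare a timestamp or the field values before overwriting:

using System.Collections.Generic;
using System.Data;

class RowDiff
{
    static void Sync(DataTable source, DataTable target)
    {
        // Sorted views give us an indexed Find() on the key column.
        DataView sourceView = new DataView(source, "", "ProjectCodeId", DataViewRowState.CurrentRows);
        DataView targetView = new DataView(target, "", "ProjectCodeId", DataViewRowState.CurrentRows);

        foreach (DataRowView s in sourceView)
        {
            int i = targetView.Find(s["ProjectCodeId"]);
            if (i < 0)
                target.Rows.Add(s.Row.ItemArray);              // new row: added to the target
            else
                targetView[i].Row.ItemArray = s.Row.ItemArray; // existing row: copy values over
        }

        // Collect deletions first; deleting while enumerating the view would break it.
        List<DataRow> doomed = new List<DataRow>();
        foreach (DataRowView t in targetView)
            if (sourceView.Find(t["ProjectCodeId"]) < 0)
                doomed.Add(t.Row);
        foreach (DataRow row in doomed)
            row.Delete(); // marked for deletion; the TableAdapter Update removes it
    }
}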
Good luck!
+tom
I would push in the direction of having the application that inserts the data insert into one db/table and then the other in the same function. Make the application do the work; the data will already have been pushed to both.
Some questions: what DB platform? How are you using the data?
I'm going to assume you're just using this data as a lookup... and as you have no timestamp and no ability to modify the existing table, I'd just blow away the local copy periodically and pull it down from the master table again.
Unless you've got a hell of a lot of data, the overhead for this should be pretty small.
If you need to synch back to the master table, you'll need to do something a bit more exotic.
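That periodic refresh can be as simple as a truncate followed by a bulk copy; the table names here are placeholders:

using System.Data.SqlClient;

class RefreshLookup
{
    static void Refresh(string masterConn, string localConn)
    {
        // pull only the columns the local copy needs
        using (SqlConnection src = new SqlConnection(masterConn))
        using (SqlCommand cmd = new SqlCommand(
            "SELECT ProjectCodeId, ProjectCode, ProjectDescrip FROM dbo.MasterProjects", src))
        {
            src.Open();
            using (SqlDataReader reader = cmd.ExecuteReader())
            using (SqlConnection dest = new SqlConnection(localConn))
            {
                dest.Open();
                // empty the local copy, then stream the master rows across
                using (SqlCommand truncate = new SqlCommand("TRUNCATE TABLE dbo.LocalProjects", dest))
                    truncate.ExecuteNonQuery();
                using (SqlBulkCopy bulk = new SqlBulkCopy(dest))
                {
                    bulk.DestinationTableName = "dbo.LocalProjects";
                    bulk.WriteToServer(reader);
                }
            }
        }
    }
}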
Can you use SQL replication? That would be preferable to writing code to do it, no?
