I need to create a tool that can merge clients' production databases.
Usually these databases will have the same schema (I'll add some checks later on, but for now we'll assume they do). Filtering out duplicate data is also something for later.
This needs to happen automatically (so no script generation via SSMS, etc.).
I've already had to start over a couple of times because I kept running into problems I hadn't thought of, so this time I wanted to ask you guys for advice before I begin all over again.
My current plan of action is:

1. Copy the schema from database 1 (later on I'll add some checks here for when the schemas differ).
2. Loop over all tables, set all foreign key updates to cascade, and work out the order in which the table data needs to be inserted (the tables containing the PKs first, then the tables holding the FKs).
3. Loop over every table in that order:
   - Check the database 2 table for an identity column; if there is one, retrieve the current seed value from the corresponding table in database 1, drop the identity property on the database 2 table, and update each ID to newID = currentID + seed (to avoid duplicate primary keys later on).
   - Generate an insert script (SMO's Table.EnumScript) for the database 1 table.
   - Generate an insert script (SMO's Table.EnumScript) for the database 2 table.
   - Execute every line of the database 1 insert script against the new database.
   - Execute every line of the database 2 insert script (whose primary key/identity values now follow on from those in database 1) against the new database.
   - Go to the next table.
Everything worked when I tested it by hand (disabling the identity property in SSMS, writing a T-SQL script to update every row with the given seed, ...).
But the problem now is automating this in C#, more specifically disabling the identity property. There doesn't seem to be a clean solution for this. Creating a new table and rebuilding every constraint etc. feels like the wrong way to go, because the only reason I need it is to cascade every FK so that everything still points to the correct place.
Another way would be to delay updating the identity column data and change it after script generation but before insertion into the new database. But then I'd need to know which data points to which other data while everything is still in strings (the insert script)?
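One direction I've been sketching that might avoid touching the identity property at all (a minimal sketch only; the table and column names are made up, and it assumes all three databases sit on the same server so three-part names work): keep the identity property in place and apply the offset at insert time with SET IDENTITY_INSERT.

    using System.Data.SqlClient;

    // Minimal sketch: copy rows from database 2 into the merged database,
    // shifting identity values by a fixed offset instead of dropping IDENTITY.
    // Assumes all databases are on the same server; dbo.Customer, Id and Name
    // are placeholders for the real schema.
    static void CopyWithOffset(string connectionString, int offset)
    {
        const string sql = @"
            SET IDENTITY_INSERT MergedDb.dbo.Customer ON;

            INSERT INTO MergedDb.dbo.Customer (Id, Name)
            SELECT Id + @offset, Name
            FROM   SourceDb2.dbo.Customer;

            SET IDENTITY_INSERT MergedDb.dbo.Customer OFF;";

        using (SqlConnection conn = new SqlConnection(connectionString))
        using (SqlCommand cmd = new SqlCommand(sql, conn))
        {
            cmd.Parameters.AddWithValue("@offset", offset);
            conn.Open();
            cmd.ExecuteNonQuery();
        }
    }

Foreign key columns that reference the shifted table would need the same + @offset in their SELECT lists, which would keep references consistent without any cascading updates.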
Any suggestions, thoughts or techniques on how to handle this?
I know about Red Gate's SQL Compare, and it is indeed wonderful, but I need to program this myself.
Using: SMO, SQL Server 2005 - 2008 R2 (no Developer or Enterprise editions on client servers), ADO.NET, C#, .NET Framework 2.0, Visual Studio 2008.
I am not sure exactly what you are trying to accomplish with your process here, but managing database versions is something I have a keen interest in.
Have a look at DBSourceTools ( http://dbsourcetools.codeplex.com ).
It is a utility to script an entire database to disk, including all foreign key constraints and data.
Using Deployment Targets, you will then be able to re-create these databases on another database server (usually a local machine).
The tool will handle dependencies and large database tables using SQL bulk insert - trying to generate a script with 50,000 insert statements would be a nightmare.
Have fun.
Disclaimer: I am involved in the http://dbsourcetools.codeplex.com project.
I realize this question has been asked before, but nothing I've read really answers my question.
I have a table with millions of rows of data that is used in multiple queries a day. I want to move the majority of the data to another table with the same schema. The second table will be an "archive" table.
I would like a list of options for archiving data, so I can present them to my boss. So far I'm considering an INSERT INTO ... SELECT statement, SqlBulkCopy in a C# console application (a rough sketch of what I have in mind is below), and I'm starting to dig into SSIS to see what it can do. I plan on doing this over a weekend or multiple weekends.
The table has an ID as the primary key
The table also has a few foreign key constraints
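Here is the rough shape of the SqlBulkCopy option (a sketch only; the table and column names are placeholders, and in practice I'd batch by ID range):

    using System;
    using System.Data.SqlClient;

    // Sketch: stream rows older than a cutoff from the live table into the
    // archive table. dbo.BigTable / dbo.BigTableArchive are placeholders.
    static void ArchiveBatch(string connectionString, DateTime cutoff)
    {
        using (SqlConnection source = new SqlConnection(connectionString))
        {
            source.Open();
            SqlCommand select = new SqlCommand(
                "SELECT Id, CreatedOn, Payload FROM dbo.BigTable WHERE CreatedOn < @cutoff",
                source);
            select.Parameters.AddWithValue("@cutoff", cutoff);

            using (SqlDataReader reader = select.ExecuteReader())
            using (SqlBulkCopy bulk = new SqlBulkCopy(connectionString))
            {
                bulk.DestinationTableName = "dbo.BigTableArchive";
                bulk.BatchSize = 10000;    // commit in chunks
                bulk.BulkCopyTimeout = 0;  // no timeout for long runs
                bulk.WriteToServer(reader);
            }
        }
        // Deleting the archived rows from dbo.BigTable would be a separate,
        // keyed-batch step so the transaction log stays manageable.
    }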
Thanks for any help.
I assume that this is for SQL Server. In that case, partitioned tables might be an additional option. Otherwise I'd always go for an INSERT ... SELECT run by a job in SQL Server, or - if you can't run it directly in SQL Server - a stored procedure run through a little C# tool that you schedule.
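The scheduled tool can stay tiny if the logic lives in the stored procedure (a sketch, assuming a hypothetical procedure dbo.usp_ArchiveOldRows that wraps the INSERT ... SELECT and the matching DELETE in one transaction):

    using System.Data;
    using System.Data.SqlClient;

    // Sketch of the "little C# tool": run the archiving procedure on a
    // schedule (Windows Task Scheduler, for instance). The name is invented.
    static void RunArchiveJob(string connectionString)
    {
        using (SqlConnection conn = new SqlConnection(connectionString))
        using (SqlCommand cmd = new SqlCommand("dbo.usp_ArchiveOldRows", conn))
        {
            cmd.CommandType = CommandType.StoredProcedure;
            cmd.CommandTimeout = 0;  // moving millions of rows can take a while
            conn.Open();
            cmd.ExecuteNonQuery();
        }
    }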
Try executing something like

    CREATE TABLE mynewtable AS SELECT * FROM myoldtable WHERE any_filter..;

On most database engines you can create a new table containing a copy of the data with a single instruction. (SQL Server doesn't support CREATE TABLE ... AS SELECT; there the equivalent is the SELECT ... INTO shown in the next answer.)
Use this in the case of SQL Server 2008:

    SELECT * INTO new_table FROM old_table
In the event that you have a set data archive interval, you may be able to leverage the partition-to-archive solution described in the following article.
http://blogs.msdn.com/b/felixmar/archive/2011/02/14/partitioning-amp-archiving-tables-in-sql-server-part-1-the-basics.aspx
Our team has leveraged a similar partition / archive solution in the past with good success.
Regards,
We have a text processing application developed in C# using .NET FW 4.0 where the administrator can define various settings. All this 'settings' data resides in about 50 tables with foreign key relations and identity primary keys (this is what will make it tricky, I think). The entire database is no more than 100K records, with the average table having about six short columns. The system is based on an MS SQL 2008 R2 Express database.

We face a requirement to create a snapshot of all this data so that the administrator of the system can roll back to one of the snapshots anytime he screws something up. We need to keep only the last 5 snapshots. Creation of the snapshot must be initiated from the application GUI, and so must the rollback to any of the snapshots if needed (using SSMS will not be allowed, as direct access to the DB is denied).

The system is still in development (are we ever really finished?), which means that new tables and columns are added all the time. Thus we need a robust method that can cope with changes automatically (digging through code after inserting/changing columns is something we want to avoid unless there's no other way). The best approach would be to be able to say "I want to create a snapshot of all tables whose names begin with 'Admin'".

Obviously this is quite a DB-intensive task, but since it will be used in emergency situations only, that is something I do not mind. I also do not mind if table locks happen, as nothing will try to use these tables while the creation of or rollback to a snapshot is in progress.
The problem can be divided into 2 parts:
creating the snapshot
rolling back to the snapshot
Regarding problem #1, we may have two options:
export the data into XML (file or database column)
duplicate the data inside SQL Server into the same or different tables (e.g. recreating the same table structure with the original table names prefixed with "Backup").
Regarding problem #2, the biggest issue I see is how to re-import all the data into foreign-key-related tables that use IDENTITY columns for PK generation. I need to delete all data from all affected tables, then re-import everything while temporarily relaxing the FK constraints and switching off identity generation. Once the data is loaded I should check that the FK constraints are still OK.
Or perhaps I should find a logical order in which to load the tables so that constraint checking can remain in place while loading (as we do not have an unmanageable number of tables, this could be a viable solution). Of course, I need to do all the deletion and re-loading in a single transaction, for obvious reasons.
I suspect there may be no pure SQL-based solution for this, although SQL CLR might be of help to avoid moving data out of SQL Server.
Is there anyone out there who has faced the same problem? Maybe someone who has successfully solved it?
I do not expect a step by step instruction. Any help on where to start, which routes to take (export to RAW XML or keep snapshot inside the DB or both), pros/cons would be really helpful.
Thank you for your help and your time.
Daniel
We don't have this exact problem, but we have a very similar problem in which we provide our customers with a baseline set of configuration data (fairly complex, mostly identity PKs) that needs to be updated when we provide a new release.
Our mechanism is probably overkill for your situation, but I am sure there is a subset of it that is applicable.
The basic approach is this:
First, we execute a script that drops all of the FK constraints and changes the nullability of those FK columns that are currently NOT NULL to NULL. This script also drops all triggers to ensure that any logical constraints implemented in them will not be executed.
Next, we perform the data import, setting IDENTITY_INSERT ON for a table before loading it, then setting it back OFF once the table's data has been updated.
Next, we execute a script that checks the data integrity of the newly added items with respect to the foreign keys. In our case, we know that items that do not have a corresponding parent record can safely be deleted, but you may choose to take a different approach (report the error and let someone manually handle the issue).
Finally, once we have verified the data, we execute another script that restores the nullability, adds the FKs back, and reinstalls the triggers.
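A lighter-weight variant of those wrap-around scripts is to disable rather than drop the constraints and triggers (a sketch only; sp_MSforeachtable is an undocumented SQL Server procedure, so a cursor over sys.foreign_keys is the supported alternative):

    using System.Data.SqlClient;

    // Sketch: disable every FK constraint and trigger before the import,
    // re-enable and re-validate them afterwards.
    static void SetConstraintsEnabled(string connectionString, bool enabled)
    {
        string sql = enabled
            ? @"EXEC sp_MSforeachtable 'ALTER TABLE ? WITH CHECK CHECK CONSTRAINT ALL';
                EXEC sp_MSforeachtable 'ALTER TABLE ? ENABLE TRIGGER ALL';"
            : @"EXEC sp_MSforeachtable 'ALTER TABLE ? NOCHECK CONSTRAINT ALL';
                EXEC sp_MSforeachtable 'ALTER TABLE ? DISABLE TRIGGER ALL';";

        using (SqlConnection conn = new SqlConnection(connectionString))
        using (SqlCommand cmd = new SqlCommand(sql, conn))
        {
            conn.Open();
            cmd.ExecuteNonQuery();
        }
    }

The WITH CHECK on re-enable makes SQL Server re-validate the existing rows, which gives you much of the integrity check from the third step for free.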
If you have the budget for it, I would strongly recommend that you take a look at the tools Red Gate provides, specifically SQL Packager and SQL Data Compare (I suspect there are other tools out there as well; we just don't have any experience with them). These tools have been critical to the successful implementation of our strategy.
Update
We provide the baseline configuration through a SQL script that is generated by Red Gate's SQL Packager.
Because our end-users can modify the database between updates, the identity values in their database will differ from those in ours, so we actually store the baseline primary and foreign keys in separate fields within each record.
When we update the customer database and need to link new records to known configuration information, we can use the baseline fields to find out what the database-specific FKs should be.
In other words, there is always a known set of field IDs for well-known configuration records, regardless of what other data is modified in the database, and we can use this to link records together.
For example, if I have Table1 linked to Table2, Table1 will have a baseline PK, and Table2 will have a baseline PK and a baseline FK containing Table1's baseline PK. When we add a new Table2 record during an update, all we have to do is find the Table1 record with the specified baseline PK, then update the actual FK in Table2 with the actual PK from Table1.
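That fix-up step can be expressed as a single set-based UPDATE (a sketch; all table and column names are invented to match the example above):

    using System.Data.SqlClient;

    // Sketch of the baseline-key fix-up: resolve each baseline FK in Table2
    // to the customer's actual Table1 PK after the new rows are loaded.
    static void FixUpForeignKeys(string connectionString)
    {
        const string sql = @"
            UPDATE t2
            SET    t2.Table1Id = t1.Id
            FROM   dbo.Table2 AS t2
            JOIN   dbo.Table1 AS t1
                   ON t1.BaselinePK = t2.BaselineFK;";

        using (SqlConnection conn = new SqlConnection(connectionString))
        using (SqlCommand cmd = new SqlCommand(sql, conn))
        {
            conn.Open();
            cmd.ExecuteNonQuery();
        }
    }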
A kind of versioning by date ranges is a common method for records in enterprise applications. As an example, we have a table for business entities (US) or companies (UK), and we keep the current official name in another table as follows:

    CompanyID  Name          ValidFrom    ValidTo
    12         Business Ltd  2000-01-01   2008-09-23
    12         Business Inc  2008-09-23   NULL

The NULL in the last record means that it is the current one. You can use the above logic, and possibly add more columns to gain more control. This way there are no duplicates, you can keep as much history as you like, and you can synchronize the current values across tables easily. Finally, the performance will be great.
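A rename under this scheme is then a close-and-insert pair (a sketch; the table name dbo.CompanyName is invented, and the columns mirror the example above):

    using System.Data.SqlClient;

    // Sketch: end-date the current name row and insert the new current row
    // in one batch, so there is never more than one NULL ValidTo per company.
    static void RenameCompany(string connectionString, int companyId, string newName)
    {
        const string sql = @"
            BEGIN TRANSACTION;

            UPDATE dbo.CompanyName
            SET    ValidTo = GETDATE()
            WHERE  CompanyID = @id AND ValidTo IS NULL;

            INSERT INTO dbo.CompanyName (CompanyID, Name, ValidFrom, ValidTo)
            VALUES (@id, @name, GETDATE(), NULL);

            COMMIT TRANSACTION;";

        using (SqlConnection conn = new SqlConnection(connectionString))
        using (SqlCommand cmd = new SqlCommand(sql, conn))
        {
            cmd.Parameters.AddWithValue("@id", companyId);
            cmd.Parameters.AddWithValue("@name", newName);
            conn.Open();
            cmd.ExecuteNonQuery();
        }
    }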
I'm writing a standalone application, and I'm thinking of using Entity Framework to store my data.
At the moment the application is small, so I can use a local database file to get started.
The thing is that the local database file doesn't have the ability to auto-generate integer primary keys the way SQL Server does.
It's not a problem to define the ID column as "identity" when creating the table, but when I try to call the SaveChanges method it throws the following exception:
{"Server-generated keys and server-generated values are not supported by SQL Server Compact."}
Any suggestions how to manage primary keys for entities in a local database file that will be compatible with SQL Server in the future?
Thanks,
Ronny
There are three general techniques I can think of for when the database has no auto-number support.
1) Do a MAX(ID_column) + 1 first to get the next ID value. However, you need to be aware of multi-user issues here. Also, the numbers are not use-once-only: if you delete a row and add a new row, you will get the same ID again. This may or may not be a problem for you.
2) Use a GUID. Pretty much guaranteed to be unique, but does have a large footprint for an ID column.
3) Use a separate key table that holds the last ID that was assigned. This ensures numbers are never reused, but adds an extra table into your database.
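Option 2 is the easiest to make portable, since the key never depends on the server at all (a sketch; the entity type is a stand-in for whatever your model actually contains):

    using System;

    // Sketch of option 2: assign the key on the client, so it behaves the same
    // on SQL Server Compact today and on full SQL Server later.
    public class Article
    {
        public Guid Id { get; set; }
        public string Title { get; set; }
    }

    public static class ArticleFactory
    {
        public static Article Create(string title)
        {
            Article a = new Article();
            a.Id = Guid.NewGuid();  // generated here, not by the server
            a.Title = title;
            return a;
        }
    }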
Every change to a row in the database should save the previous row data in some kind of history, so the user can roll back to a previous state of the row. Is there any good practice for this approach? I tried serializing and deserializing data objects with DataContract, but it gets a little messy with complex objects.
So to be more clear:
I am using NHibernate for data access and want to stay free of database dependencies (for testing I'm using SQL Server 2005).
My intention is to provide a data history, so that the user can roll back to a previous version at any time.
An example of usage would be the following:
I have a news article
Somebody makes some changes to that article
The main editor sees that the article has some typos
He decides to roll back to the previous valid version (until the newest version is corrected)
I hope I gave you valid info.
Tables that store changes when the main table changes are called audit tables. You can do this multiple ways:
In the database, using triggers: I would recommend this approach, because then there is no way data can change without a record being made. You have to account for three types of changes when you do this: add, delete, and update. Therefore you need trigger logic that handles all three.
Also remember that a transaction can modify multiple records at the same time, so you should work with the full set of modified records, not just the last one (a mistake most people only notice after the fact).
Control will not be returned to the calling program until the trigger execution is complete, so you should keep the trigger code as light and as fast as possible.
In the middle layer, using code: This approach lets you save changes to a different database and possibly take some load off the main one. However, a SQL programmer running an UPDATE statement directly will completely bypass your middle layer, and you will have no audit trail.
Structure of the Audit Table
You will have the following columns:
Autonumber PK, TimeStamp, ActionType + All columns from your original table
I have done this in the following ways in the past:
Table Structure:
Autonumber PK, TimeStamp, ActionType, TableName, OriginalTableStructureColumns
This structure means that you create one audit table per data table you save. Data capture and reconstruction are fairly easy to do. I would recommend this approach.
Name Value Pair:
Autonumber PK, TimeStamp, ActionType, TableName, PKColumns, ColumnName, OldValue, NewValue
This structure lets you save any table, but you have to create name/value pairs for each column in your trigger. This is very generic but expensive, and you will also need to write some views to recreate the actual rows by unpivoting the data. This gets tedious and is not generally the method followed.
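For the first (one-audit-table-per-table) structure, the trigger is a straightforward pair of set-based inserts. A minimal sketch, deployed once per table from a code-gen or migration step (table and column names are invented):

    // Run once against the database; dbo.Book / dbo.Book_Audit are placeholders.
    // The trigger is set-based, so multi-row statements are captured correctly.
    const string CreateBookAuditTrigger = @"
    CREATE TRIGGER dbo.trg_Book_Audit
    ON dbo.Book
    AFTER INSERT, UPDATE, DELETE
    AS
    BEGIN
        SET NOCOUNT ON;

        -- Inserts and updates: record the new values.
        INSERT INTO dbo.Book_Audit (AuditDate, ActionType, Id, Name, Author)
        SELECT GETUTCDATE(),
               CASE WHEN EXISTS (SELECT 1 FROM deleted) THEN 'U' ELSE 'I' END,
               i.Id, i.Name, i.Author
        FROM inserted AS i;

        -- Deletes: record the removed values.
        INSERT INTO dbo.Book_Audit (AuditDate, ActionType, Id, Name, Author)
        SELECT GETUTCDATE(), 'D', d.Id, d.Name, d.Author
        FROM deleted AS d
        WHERE NOT EXISTS (SELECT 1 FROM inserted);
    END";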
Microsoft has introduced new auditing capabilities in SQL Server 2008. Here's an article describing some of the capabilities and design goals, which might help with whichever approach you choose.
MSDN - Auditing in SQL Server 2008
You can use triggers for that.
Here is one example.
AutoAudit is a SQL Server (2005, 2008) code-gen utility that creates audit trail triggers with:
* Created, Modified, and RowVersion (incrementing INT) columns added to the table
* a view to reconstruct deleted rows
* a UDF to reconstruct row history
* a schema audit trigger to track schema changes
* re-code-gens triggers when ALTER TABLE changes the table
http://autoaudit.codeplex.com/
Saving serialized data always gets messy in the end, you're right to stay away from that. The best thing to do is to create a parallel "version" table with the same columns as your main table.
For instance, if you have a table named "book" with columns "id", "name", "author", you could add a table named "book_version" with columns "id", "name", "author", "version_date" and "version_user".
Each time you insert or update a record on table "book", your application will also insert into "book_version".
Depending on your database system and the way you access the database from your application, you may be able to automate this completely (cf. the Versionable plugin in Doctrine).
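If you end up doing it by hand, the double write is small (a sketch in plain ADO.NET using the example columns above; an NHibernate event listener or interceptor could do the same job):

    using System.Data.SqlClient;

    // Sketch: update the book and copy the new state into book_version
    // inside one transaction, using the columns from the example above.
    static void UpdateBook(string connectionString, int id, string name,
                           string author, string user)
    {
        const string sql = @"
            UPDATE dbo.book SET name = @name, author = @author WHERE id = @id;

            INSERT INTO dbo.book_version (id, name, author, version_date, version_user)
            VALUES (@id, @name, @author, GETDATE(), @user);";

        using (SqlConnection conn = new SqlConnection(connectionString))
        {
            conn.Open();
            using (SqlTransaction tx = conn.BeginTransaction())
            using (SqlCommand cmd = new SqlCommand(sql, conn, tx))
            {
                cmd.Parameters.AddWithValue("@id", id);
                cmd.Parameters.AddWithValue("@name", name);
                cmd.Parameters.AddWithValue("@author", author);
                cmd.Parameters.AddWithValue("@user", user);
                cmd.ExecuteNonQuery();
                tx.Commit();
            }
        }
    }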
One way is to use a DB which supports this natively, like HBase. I wouldn't normally suggest "Change your DB server to get this one feature," but since you don't specify a DB server in your question I'm presuming you mean this as open-ended, and native support in the server is one of the best implementations of this feature.
What database system are you using? If you're using an ACID (atomicity, consistency, isolation, durability) compliant database, can't you just use the inbuilt rollback facility to go back to a previous transaction?
I solved this problem very nicely by using NHibernate Envers.
For those interested, read this:
http://nhforge.org/blogs/nhibernate/archive/2010/07/05/nhibernate-auditing-v3-poor-man-s-envers.aspx
We have built an application that needs a local copy of a table from another database. I would like to write an ADO.NET routine that will keep the local table in sync with the master. Using .NET 2.0, C# and ADO.NET.
Please note that I really have no control over the master table, which belongs to a third-party, mission-critical app I don't wish to mess with.
For example Here is the master data table:
ProjectCodeId Varchar(20) [PK]
ProjectCode Varchar(20)
ProjectDescrip Varchar(50)
OtherUneededField int
OtherUneededField2 int
The local table we need to keep in sync...
ProjectCodeId Varchar(20) [PK]
ProjectCode Varchar(20)
ProjectDescrip Varchar(50)
Perhaps a better way to approach this question is: what have you done in the past with this type of problem? What has worked best for you, and what should be avoided at all costs?
My goal with this question is to determine a good way to handle this; so often I am combining data from two or more disjointed data sources. I haven't specified the database platforms for this reason - it really shouldn't matter. In this particular situation both databases are MSSQL, but I'd prefer the solution not to use linked databases or DTS, etc.
Sure, truncating the local table and refilling it each time from the master is an option, but with thousands of rows I don't think that's very efficient. Do you?
EDIT: First, recognize that what you are doing is hand-rolled replication and replication is never simple.
You need to track and apply all of the CRUD state changes. That said, ADO.NET can do this.
To track changes to the source you can use Query Notifications with your source database. This requires special permissions on the database, so the owner of the source database will need to take action to enable this solution. I haven't used this technique myself, but here is a description of it.
See "Query Notifications in SQL Server (ADO.NET)"
Query notifications were introduced in Microsoft SQL Server 2005 and the System.Data.SqlClient namespace in ADO.NET 2.0. Built upon the Service Broker infrastructure, query notifications allow applications to be notified when data has changed. This feature is particularly useful for applications that provide a cache of information from a database, such as a Web application, and need to be notified when the source data is changed.
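In ADO.NET this surfaces as the SqlDependency class. A minimal sketch (assumptions: Service Broker is enabled on the source database, and the query uses two-part table names and no SELECT *, as the feature requires; the table and columns come from the question):

    using System.Data.SqlClient;

    // Sketch: subscribe to change notifications for the master table.
    static void WatchProjectCodes(string connectionString)
    {
        SqlDependency.Start(connectionString);  // once per AppDomain

        using (SqlConnection conn = new SqlConnection(connectionString))
        {
            conn.Open();
            SqlCommand cmd = new SqlCommand(
                "SELECT ProjectCodeId, ProjectCode, ProjectDescrip FROM dbo.ProjectCodes",
                conn);

            SqlDependency dependency = new SqlDependency(cmd);
            dependency.OnChange += delegate(object sender, SqlNotificationEventArgs e)
            {
                // A notification fires only once: re-subscribe by running this
                // method again, then re-sync the local copy.
            };

            using (SqlDataReader reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                {
                    // (Re)load the local table from the reader here.
                }
            }
        }
    }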
To apply changes from the source db table you need to retrieve the data from the target db table, apply the changes to the target rows and post the changes back to the target db.
To apply the changes you can either
1) Delete and reinsert all of the rows (simple), or
2) Merge row-by-row changes (hard).
Delete and reinsert is self-explanatory, so I won't go into detail on that.
For row-by-row change tracking here is an approach. (I am assuming here that Query Notification doesn't give you row-by-row change information, so you have to calculate it.)
You need to determine which rows were modified and identify inserted and deleted rows. Create a DataView with a sort for each table to get a Find method you can use to lookup matching rows by ID.
Identify modified rows by using a datetime/timestamp column, or by comparing all field values. Copy modified values to the target row.
Identify added and deleted rows by looping over the respective table DataViews and using the Find method of the other DataView to identify rows that do not appear in the first table. Insert or delete rows from the target table as required. (The Delete method doesn't remove the row but marks it for deletion by the TableAdapter Update.)
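A sketch of that merge using DataView.Find (assuming both DataTables share the schema from the question, with ProjectCodeId as the key):

    using System.Data;

    // Sketch: reconcile the local DataTable against a freshly loaded master.
    static void Merge(DataTable master, DataTable local)
    {
        DataView localView = new DataView(local, null, "ProjectCodeId",
                                          DataViewRowState.CurrentRows);
        DataView masterView = new DataView(master, null, "ProjectCodeId",
                                           DataViewRowState.CurrentRows);

        // Upsert: copy master rows into the local table.
        foreach (DataRow m in master.Rows)
        {
            int i = localView.Find(m["ProjectCodeId"]);
            if (i < 0)
                local.Rows.Add(m.ItemArray);               // new on master
            else
                localView[i].Row.ItemArray = m.ItemArray;  // copy field values
        }

        // Remove local rows that no longer exist on the master.
        foreach (DataRow l in local.Select())
        {
            if (masterView.Find(l["ProjectCodeId"]) < 0)
                l.Delete();  // marked deleted; TableAdapter.Update applies it
        }
    }

(This sketch copies values unconditionally, which marks every row as modified; comparing a timestamp column first, as described above, avoids that.)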
Good luck!
+tom
I would push in the direction of having the application that inserts the data write to one db/table and then the other within the same function. Make the application do the work; the data will already be pushed to both places.
Some questions: what DB platform? How are you using the data?
I'm going to assume you're just using this data as a lookup... and since you have no timestamp and no ability to modify the existing table, I'd just blow away the local copy periodically and pull it down from the master table again.
Unless you've got a hell of a lot of data, the overhead for this should be pretty small.
If you need to sync back to the master table, you'll need to do something a bit more exotic.
Can you use SQL replication? That would be preferable to writing code to do it, no?