In this question I want to figure out what the best practice is for controlling versions of data in SQL. We are using a relational database (Sybase SAP SQL Anywhere). The problem is, we don't know in which layer of our software we should implement a version control system. We want to write a generic system, so that version control is available for all types of data with a small amount of work for every type (types: contacts, appointments, ...).
Here are the options we figured out:
1. Using an entity framework and calculating the difference between two models, then saving the difference to the database
2. Using triggers to compare old and new data and save them in a separate table
3. Using procedures that check for changes and also save them in a separate table
I know it's a very general question, but maybe someone has a good idea and a solution for our problem.
Edit
Important: I want to create versions of the data itself, not of the SQL schema or SQL code.
EDIT2
Let's use the following simple example. I have a tiny contact table (not our real contact table):
CREATE TABLE Contact
(
"GUID" Uniqueidentifier NOT NULL UNIQUE,
"ContactId" BIGINT NOT NULL Identity(1,1),
"Version" INTEGER NOT NULL,
"FirstName" VARCHAR(100),
"LastName" VARCHAR(200),
"Address" VARCHAR(400),
PRIMARY KEY (ContactId, Version)
);
Now, every time someone makes changes to a contact, I want to save a new version of it. But I am looking for a general solution that can be implemented for every type.
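To make option 2 concrete, a trigger-based sketch for this Contact table might look something like the following (the Contact_History table and trigger name are made up, and the SQL Anywhere syntax may need adjusting):
CREATE TABLE Contact_History
(
    "GUID"     UNIQUEIDENTIFIER NOT NULL,
    ContactId  BIGINT NOT NULL,
    Version    INTEGER NOT NULL,
    FirstName  VARCHAR(100),
    LastName   VARCHAR(200),
    Address    VARCHAR(400),
    ChangedAt  TIMESTAMP NOT NULL DEFAULT CURRENT TIMESTAMP,
    PRIMARY KEY (ContactId, Version)
);

CREATE TRIGGER trg_Contact_History
AFTER UPDATE ON Contact
REFERENCING OLD AS old_row
FOR EACH ROW
BEGIN
    -- archive the row as it looked before the change
    INSERT INTO Contact_History ("GUID", ContactId, Version, FirstName, LastName, Address)
    VALUES (old_row."GUID", old_row.ContactId, old_row.Version,
            old_row.FirstName, old_row.LastName, old_row.Address);
END;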
Thank you!
As someone who lives and breathes database source control (part of an amazing team at DBmaestro), I can recommend a combination of two methods, depending on how you run the delta.
Using triggers, you save all the information you need for the deployment, whether as a slowly changing dimension or as the entire table content.
Using a procedure that analyzes the differences and knows how to generate the relevant delta script.
We have the same issue and try to solve it by storing a version and a branch id with every entity we want to follow.
In a different table we store the versions with their predecessor version id, so we can trace where branches meet each other.
Separately, we have an audit trace with the version number.
I wonder if this has the same elements you need(ed) and whether you have advanced since your question and your last edit.
Thanks for the suggestion to combine the unique id and the version number
I am trying to add a column to an existing SQL table of uniqueidentifier type. That column must not be null and of course unique. I have attempted this code:
ALTER TABLE ScheduleJobs ADD CurrentInstanceID uniqueidentifier not null
followed by:
ALTER TABLE ScheduleJobs ADD CONSTRAINT DF_CurrentInstanceID DEFAULT newsequentialid() FOR CurrentInstanceID
However, when I create a new record (from C#), the uniqueidentifier is always all zeros (presumably null). I can create the GUID in C# and pass it to SQL when creating a new record, which works fine, but I am concerned that a duplicate GUID could be created. Based on my reading, it appears that would be an extremely rare case, but it always seems bad practice to leave any sort of potential error floating around. Note that the field will not be a PK for the table. Suggestions and opinions are welcome for the sake of education.
I am using the C# 4.0 framework with MS SQL Server 2008.
Sorry for the delay, but I am glad to say that I have this issue resolved. Thanks everyone for your overwhelming support. While no one quite hit the nail on the head (and there were some really good suggestions btw), Eldog brought up Entity Framework not playing nice in his comment. Thanks to him, I simply Googled Entity Framework + GUID and found the solution.
This article steps through the issue and gives a great explanation on the problem, solution, and steps to resolve it. I will note that I decided to step through and test one step at a time and that I didn't have to do the last step. That leads me to believe that part of the issue may have been resolved in later versions of the Entity Framework.
I simply pulled up the edmx file in design view (not xml) and set the StoreGeneratedPattern property to "Identity."
Thanks again for the help and suggestions. You're an awesome bunch.
Does your C# code attempt to pass in a CurrentInstanceID when creating the record? If so, can you drop that column from the INSERT statement?
We do this with numeric primary keys. Our C# code calls a stored procedure for CRUD operations on our records. The C# code generates a negative key on the client side for its own use. When it is ready to create the record, it passes this key to the stored procedure.
The proc ignores this key and inserts the rest of the data. The output of the proc is the actual key that SQL assigned to the record. Finally, the C# code merges the new key into the existing data.
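A minimal sketch of that pattern (the procedure, table, and column names here are invented for illustration; our real code obviously differs):
-- The client passes its temporary (negative) key; the proc ignores it and
-- returns the key that SQL Server actually assigned.
CREATE PROCEDURE dbo.ScheduleJobs_Insert
    @ClientKey INT,             -- negative placeholder generated client-side; not used
    @JobName   VARCHAR(100),
    @NewKey    INT OUTPUT       -- the real key assigned by the database
AS
BEGIN
    INSERT INTO dbo.ScheduleJobs (JobName)
    VALUES (@JobName);

    SET @NewKey = SCOPE_IDENTITY();   -- the C# code merges this back into its data
END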
I wouldn't use a GUID for this. GUIDs are used in quite a lot of operations in Windows, so this won't be an identifier that is unique only within your application; it will be unique across your operating system. Unless that makes sense in your case, I wouldn't use it.
You could use an incremental value, like a simple uint. If your table already has some data, you could write a script that fills the existing rows with incremental values for your new column, and add the unique constraint to the column after executing that script, as sketched below.
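A rough sketch of that backfill (this assumes an int column named CurrentInstanceID as in the question; adjust names and the ORDER BY before running):
-- Add the column as nullable first so existing rows don't violate NOT NULL
ALTER TABLE ScheduleJobs ADD CurrentInstanceID INT NULL;
GO
-- Fill existing rows with incremental values
WITH numbered AS
(
    SELECT CurrentInstanceID,
           ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS rn
    FROM ScheduleJobs
)
UPDATE numbered SET CurrentInstanceID = rn;
GO
-- Tighten the column and enforce uniqueness
ALTER TABLE ScheduleJobs ALTER COLUMN CurrentInstanceID INT NOT NULL;
ALTER TABLE ScheduleJobs ADD CONSTRAINT UQ_ScheduleJobs_CurrentInstanceID UNIQUE (CurrentInstanceID);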
In your original CREATE TABLE or ALTER TABLE, use something like the following:
create table ScheduleJobs (keyval int, id2 uniqueidentifier not null default newsequentialid())
Then don't reference the column in your INSERT and a new GUID value will be added.
I am creating a website that will be used by an accounting dept. to track budget expenditures by different projects.
I am using SQL Server 2008 R2 for the database and ASP.net C# MVC 3 for the website.
What my boss has asked me to do is every time any user updates or creates a project, we need to log that change into a new table called Mapping_log. It should record the whole Mapping row being saved or created, and additionally the user and the datestamp. The notes field will now be mandatory, and the note should be saved to the Mapping_log.
Now when editing the PA, the Notes field will always be empty, and below it there should be a list of the older notes organized by date. I have been looking into maybe using NLog and log4net, but I have not been able to find any good tutorials for a situation like mine. It seems those modules are mostly used for error logging, which, although important, is not exactly what I am trying to do at the moment.
I need some direction... Does anyone have advice or tutorials I could use to learn how to implement a process that will keep track of changes made to the data by users of the site?
Thanks for your help/advice!
You can consider two new features that SQL Server 2008 introduced: Change Tracking and Change Data Capture.
You could use that and avoid your custom Mapping_log table.
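If you go the Change Data Capture route, enabling it looks roughly like this (Enterprise/Developer edition only, requires SQL Server Agent, and the dbo.Mapping table name is taken from the question; the database name is made up):
USE BudgetTracking;   -- hypothetical database name
GO
EXEC sys.sp_cdc_enable_db;
GO
-- SQL Server now maintains a change table with before/after images of every dbo.Mapping row
EXEC sys.sp_cdc_enable_table
     @source_schema = N'dbo',
     @source_name   = N'Mapping',
     @role_name     = NULL;       -- NULL = no gating role; consider restricting access in production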
But if you need to apply a more complex business rule, perhaps it would be better to do that in the application layer rather than purely in the database.
Regards.
I would just create two triggers - one for the update, one for the insert.
These triggers would look something like this - assuming you also want to log the operation (insert vs. update) in your Mapping_Log table:
CREATE TRIGGER trg_Mapping_Insert
ON dbo.Mapping
AFTER INSERT
AS
INSERT INTO dbo.Mapping_Log(col1, col2, ..., colN, [User], DateStamp, Operation)
SELECT
col1, col2, ..., colN, SUSER_NAME(), GETDATE(), 'INSERT'
FROM
Inserted
(your UPDATE trigger would be very similar - use AFTER UPDATE and log 'UPDATE' as the operation; note that the new column values still come from the Inserted pseudo-table)
This is done "behind the scenes" for you - once in place, you don't have to do anything anymore to have these operations "logged" to your Mapping_Log table.
We have a text processing application developed in C# using .NET FW 4.0 where the administrator can define various settings. All this 'settings' data resides in about 50 tables with foreign key relations and identity primary keys (this one will make it tricky, I think). The entire database is no more than 100K records, with the average table having about 6 short columns. The system is based on an MS SQL 2008 R2 Express database.
We face a requirement to create a snapshot of all this data so that the administrator of the system can roll back to one of the snapshots anytime he screws something up. We need to keep the last 5 snapshots only. Creation of a snapshot must be started from the application GUI, and so must the rollback to any of the snapshots if needed (using SSMS will not be allowed, as direct access to the DB is denied). The system is still in development (are we ever really finished?), which means that new tables and columns are added all the time. Thus we need a robust method that can take care of changes automatically (digging through code after inserting/changing columns is something we want to avoid unless there's no other way). The best way would be to be able to say "I want to create a snapshot of all tables whose names begin with 'Admin'". Obviously this is quite a DB-intensive task, but since it will be used in emergency situations only, that is something I do not mind. I also do not mind if table locks happen, as nothing will try to use these tables while the creation of or rollback to a snapshot is in progress.
The problem can be divided into 2 parts:
creating the snapshot
rolling back to the snapshot
Regarding problem #1, we may have two options:
export the data into XML (file or database column)
duplicate the data inside SQL Server into the same or different tables (e.g., creating the same table structure again, with the original table names prefixed with "Backup").
Regarding problem #2, the biggest issue I see is how to re-import all data into foreign-key-related tables which use IDENTITY columns for PK generation. I need to delete all data from all affected tables, then re-import everything while temporarily relaxing FK constraints and switching off identity generation. Once the data is loaded I should check whether the FK constraints are still OK.
Or perhaps I should find a logical way to load tables so that constraint checking can remain in place while loading (as we do not have an unmanageable number of tables this could be a viable solution). Of course I need to do all deletion and re-loading in a single transaction, for obvious reasons.
I suspect there may be no pure SQL-based solution for this, although SQL CLR might be of help to avoid moving data out of SQL Server.
Is there anyone out there with the same problem we face? Maybe someone who successfully solved such problem?
I do not expect a step by step instruction. Any help on where to start, which routes to take (export to RAW XML or keep snapshot inside the DB or both), pros/cons would be really helpful.
Thank you for your help and your time.
Daniel
We don't have this exact problem, but we have a very similar problem in which we provide our customers with a baseline set of configuration data (fairly complex, mostly identity PKs) that needs to be updated when we provide a new release.
Our mechanism is probably overkill for your situation, but I am sure there is a subset of it that is applicable.
The basic approach is this:
First, we execute a script that drops all of the FK constraints and changes the nullability of those FK columns that are currently NOT NULL to NULL. This script also drops all triggers to ensure that any logical constraints implemented in them will not be executed.
Next, we perform the data import, setting IDENTITY_INSERT ON before loading a table (so the explicit key values can be inserted), then setting it back OFF after the data in the table is loaded.
Next, we execute a script that checks the data integrity of the newly added items with respect to the foreign keys. In our case, we know that items that do not have a corresponding parent record can safely be deleted, but you may choose to take a different approach (report the error and let someone manually handle the issue).
Finally, once we have verified the data, we execute another script that restores the nullability, adds the FKs back, and reinstalls the triggers.
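The core of the import step for a single table looks roughly like this (the table names are invented for illustration, and the constraints are disabled here rather than dropped, which is a slight simplification of the scripts described above):
ALTER TABLE dbo.AdminSetting NOCHECK CONSTRAINT ALL;            -- stop FK checking during the load
DELETE FROM dbo.AdminSetting;

SET IDENTITY_INSERT dbo.AdminSetting ON;                        -- allow explicit identity values
INSERT INTO dbo.AdminSetting (SettingId, Name, Value)
SELECT SettingId, Name, Value
FROM   dbo.Backup_AdminSetting;                                 -- hypothetical snapshot copy of the table
SET IDENTITY_INSERT dbo.AdminSetting OFF;

ALTER TABLE dbo.AdminSetting WITH CHECK CHECK CONSTRAINT ALL;   -- re-validate the FKs afterwards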
If you have the budget for it, I would strongly recommend that you take a look at the tools that Red Gate provides, specifically SQL Packager and SQL Data Compare (I suspect there may be other tools out there as well, we just don't have any experience with them). These tools have been critical in the successful implementation of our strategy.
Update
We provide the baseline configuration through an SQL Script that is generated by RedGate's SQL Packager.
Because our end users can modify the database between updates, which causes the identity values in their database to differ from those in ours, we actually store the baseline primary and foreign keys in separate fields within each record.
When we update the customer database and we need to link new records to known configuration information, we can use the baseline fields to find out what the database-specific FKs should be.
In other words, there is always a known set of field ids for well-known configuration records, regardless of what other data is modified in the database, and we can use this to link records together.
For example, if I have Table1 linked to Table2, Table1 will have a baseline PK and Table2 will have a baseline PK and a baseline FKey containing Table1's baseline PK. When we update records, if we add a new Table2 record, all we have to do is find the Table1 record with the specified baseline PK, then update the actual FKey in Table2 with the actual PK in Table1.
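In SQL terms the relinking step is roughly the following (BaselinePK/BaselineFKey and the Table1Id column are illustrative names, not our real schema):
-- After inserting new Table2 rows with only their baseline keys set,
-- resolve the real FK by joining on the stable baseline identifiers.
UPDATE t2
SET    t2.Table1Id = t1.Table1Id                  -- the actual, database-specific identity value
FROM   dbo.Table2 AS t2
JOIN   dbo.Table1 AS t1
       ON t1.BaselinePK = t2.BaselineFKey         -- baseline keys match in every customer database
WHERE  t2.Table1Id IS NULL;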
A kind of versioning by date ranges is a common method for records in enterprise applications. As an example, we have a table for business entities (US) or companies (UK), and we keep the current official name in another table as follows:
CompanyID  Name          ValidFrom    ValidTo
12         Business Ltd  2000-01-01   2008-09-23
12         Business Inc  2008-09-23   NULL
The NULL in the last record means that this is the current one. You may use the above logic and possibly add more columns to gain more control. This way there are no duplicates, you can keep the history to any depth, and you can synchronize the current values across tables easily. Finally, the performance will be great.
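A sketch of that structure and the typical queries (the table name CompanyName is assumed here; only the data was shown above):
CREATE TABLE CompanyName
(
    CompanyID  INT          NOT NULL,
    Name       VARCHAR(200) NOT NULL,
    ValidFrom  DATETIME     NOT NULL,
    ValidTo    DATETIME     NULL,              -- NULL marks the currently valid row
    PRIMARY KEY (CompanyID, ValidFrom)
);

-- Current name of a company
SELECT Name FROM CompanyName WHERE CompanyID = 12 AND ValidTo IS NULL;

-- Name that was valid on a given date
SELECT Name FROM CompanyName
WHERE CompanyID = 12
  AND ValidFrom <= '2005-06-01'
  AND (ValidTo > '2005-06-01' OR ValidTo IS NULL);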
Every change to a row's data in the database should save the previous row data in some kind of history so the user can roll back to a previous state. Is there any good practice for that approach? I tried DataContract serialization and deserialization of data objects, but it gets a little messy with complex objects.
So to be more clear:
I am using NHibernate for data access and want to stay free of database dependencies (for testing I'm using SQL Server 2005).
My intention is to provide a data history so that the user can always roll back to some previous version.
An example of usage would be the following:
I have a news article
Somebody makes some changes to that article
The main editor sees that the article has some typos
He decides to roll back to the previous valid version (until the newest version is corrected)
I hope I gave you valid info.
Tables that store changes when the main table changes are called audit tables. You can do this multiple ways:
In the database using triggers: I would recommend this approach because then there is no way that data can change without a record being made. You have to account for 3 types of changes when you do this: Add, Delete, Update. Therefore you need trigger functionality that will work on all three.
Also remember that a transaction can modify multiple records at the same time, so you should work with the full set of modified records, not just the last record (as most people belatedly realize they did).
Control will not be returned to the calling program until the trigger execution is completed. So you should keep the code as light and as fast as possible.
In the middle layer using code: This approach will let you save changes to a different database and possibly take some load off the database. However, a SQL programmer running an UPDATE statement will completely bypass your middle layer and you will not have an audit trail.
Structure of the Audit Table
You will have the following columns:
Autonumber PK, TimeStamp, ActionType + All columns from your original table
and I have done this in the following ways in the past:
Table Structure:
Autonumber PK, TimeStamp, ActionType, TableName, OriginalTableStructureColumns
This structure will mean that you create one audit table per data table saved. The data save and reconstruction is fairly easy to do. I would recommend this approach.
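A sketch of that per-table layout for a hypothetical Article(ArticleId, Title, Body) table, with a single set-based trigger covering all three action types (all names here are invented for illustration):
CREATE TABLE dbo.Article_Audit
(
    AuditId    INT IDENTITY(1,1) PRIMARY KEY,    -- autonumber PK
    AuditTime  DATETIME NOT NULL DEFAULT GETDATE(),
    ActionType CHAR(6)  NOT NULL,                -- 'INSERT', 'UPDATE' or 'DELETE'
    ArticleId  INT,                              -- original table columns follow
    Title      VARCHAR(200),
    Body       VARCHAR(MAX)
);
GO
CREATE TRIGGER trg_Article_Audit
ON dbo.Article
AFTER INSERT, UPDATE, DELETE
AS
BEGIN
    -- New or changed rows; set-based, so multi-row statements are fully captured
    INSERT INTO dbo.Article_Audit (ActionType, ArticleId, Title, Body)
    SELECT CASE WHEN EXISTS (SELECT 1 FROM Deleted) THEN 'UPDATE' ELSE 'INSERT' END,
           i.ArticleId, i.Title, i.Body
    FROM   Inserted i;

    -- Deleted rows (Inserted is empty only for a pure DELETE)
    INSERT INTO dbo.Article_Audit (ActionType, ArticleId, Title, Body)
    SELECT 'DELETE', d.ArticleId, d.Title, d.Body
    FROM   Deleted d
    WHERE  NOT EXISTS (SELECT 1 FROM Inserted);
END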
Name Value Pair:
Autonumber PK, TimeStamp, ActionType, TableName, PKColumns, ColumnName, OldValue, NewValue
This structure will let you save any table, but you will have to create name value pairs for each column in your trigger. This is very generic, but expensive. You will also need to write some views to recreate the actual rows by unpivoting the data. This gets to be tedious and is not generally the method followed.
Microsoft has introduced new auditing capabilities in SQL Server 2008. Here's an article describing some of the capabilities and design goals, which might help with whichever approach you choose.
MSDN - Auditing in SQL Server 2008
You can use triggers for that.
Here is one example.
AutoAudit is a SQL Server (2005, 2008) code-gen utility that creates audit trail triggers with:
* Created, Modified, and RowVersion (incrementing INT) columns added to the table
* a view to reconstruct deleted rows
* a UDF to reconstruct row history
* a schema audit trigger to track schema changes
* re-code-gen of the triggers when ALTER TABLE changes the table
http://autoaudit.codeplex.com/
Saving serialized data always gets messy in the end, you're right to stay away from that. The best thing to do is to create a parallel "version" table with the same columns as your main table.
For instance, if you have a table named "book", with columns "id", "name", "author", you could add a table named "book_version" with columns "id", "name", "author", "version_date", "version_user"
Each time you insert or update a record on table "book", your application will also insert into "book_version".
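For instance, a minimal sketch in SQL Server syntax (column sizes and the literal values in the INSERT are made up; in practice the application supplies the current user and the id of the row it just saved):
CREATE TABLE book_version
(
    id           INT          NOT NULL,
    name         VARCHAR(200),
    author       VARCHAR(200),
    version_date DATETIME     NOT NULL,
    version_user VARCHAR(100) NOT NULL,
    PRIMARY KEY (id, version_date)
);

-- On every insert/update of "book", the application also copies the new state here:
INSERT INTO book_version (id, name, author, version_date, version_user)
SELECT id, name, author, GETDATE(), 'editor42'    -- 'editor42' stands in for the current user
FROM   book
WHERE  id = 17;                                   -- the row that was just saved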
Depending on your database system and the way you access the database from your application, you may be able to completely automate this (cf. the Versionable plugin in Doctrine).
One way is to use a DB which supports this natively, like HBase. I wouldn't normally suggest "Change your DB server to get this one feature," but since you don't specify a DB server in your question I'm presuming you mean this as open-ended, and native support in the server is one of the best implementations of this feature.
What database system are you using? If you're using an ACID (atomicity, consistency, isolation, durability) compliant database, can't you just use the inbuilt rollback facility to go back to a previous transaction?
I solved this problem very nicely by using NHibernate Envers.
For those interested, read this:
http://nhforge.org/blogs/nhibernate/archive/2010/07/05/nhibernate-auditing-v3-poor-man-s-envers.aspx
We have built an application which needs a local copy of a table from another database. I would like to write an ado.net routine which will keep the local table in sync with the master. Using .net 2.0, C# and ADO.NET.
Please note I really have no control over the master table which is in a third party, mission critical app I don't wish to mess with.
For example Here is the master data table:
ProjectCodeId Varchar(20) [PK]
ProjectCode Varchar(20)
ProjectDescrip Varchar(50)
OtherUneededField int
OtherUneededField2 int
The local table we need to keep in sync...
ProjectCodeId Varchar(20) [PK]
ProjectCode Varchar(20)
ProjectDescrip Varchar(50)
Perhaps a better approach to this question is: what have you done in the past for this type of problem? What has worked best for you, or what should be avoided at all costs?
My goal with this question is to determine a good way to handle this. So often I am combining data from two or more disjointed data sources. I haven't included database platforms for this reason; it really shouldn't matter. In this current situation both databases are MSSQL, but I'd prefer a solution that doesn't use linked databases or DTS, etc.
Sure, truncating the local table and refilling it each time from the master is an option, but with thousands of rows I don't think this is very efficient. Do you?
EDIT: First, recognize that what you are doing is hand-rolled replication and replication is never simple.
You need to track and apply all of the CRUD state changes. That said, ADO.NET can do this.
To track changes to the source you can use Query Notification with your source database. This requires special permission against the database so the owner of the source database will need to take action to enable this solution. I haven't used this technique myself, but here is a description of it.
See "Query Notifications in SQL Server (ADO.NET)"
Query notifications were introduced in Microsoft SQL Server 2005 and the System.Data.SqlClient namespace in ADO.NET 2.0. Built upon the Service Broker infrastructure, query notifications allow applications to be notified when data has changed. This feature is particularly useful for applications that provide a cache of information from a database, such as a Web application, and need to be notified when the source data is changed.
To apply changes from the source db table you need to retrieve the data from the target db table, apply the changes to the target rows and post the changes back to the target db.
To apply the changes you can either
1) Delete and reinsert all of the rows (simple), or
2) Merge row-by-row changes (hard).
Delete and reinsert is self explanatory, so I won't go into detail on that.
For row-by-row change tracking here is an approach. (I am assuming here that Query Notification doesn't give you row-by-row change information, so you have to calculate it.)
You need to determine which rows were modified and identify inserted and deleted rows. Create a DataView with a sort for each table to get a Find method you can use to lookup matching rows by ID.
Identify modified rows by using a datetime/timestamp column, or by comparing all field values. Copy modified values to the target row.
Identify added and deleted rows by looping over the respective table DataViews and using the Find method of the other DataView to identify rows that do not appear in the first table. Insert or delete rows from the target table as required. (The Delete method doesn't remove the row but marks it for deletion by the TableAdapter Update.)
Good luck!
+tom
I would push in the direction where the application that inserts the data inserts into one db/table and then into the other in the same function. Make the application do the work; the second db will already have the data.
Some questions: what db platform? How are you using the data?
I'm going to assume you're just using this data as a lookup... and since you have no timestamp and no ability to modify the existing table, I'd just blow away the local copy periodically and pull it down from the master table again.
Unless you've got a hell of a lot of data the overhead for this should be pretty small.
If you need to synch back to the master table, you'll need to do something a bit more exotic.
Can you use SQL replication? That would be preferable to writing code to do it, no?