Keeping a history of data changes in a database - C#

Every change to a row in the database should save the previous row data in some kind of history so the user can roll back to a previous state. Is there any good practice for this approach? I tried DataContract with serializing and deserializing the data objects, but it becomes a little messy with complex objects.
So to be more clear:
I am using NHibernate for data access and want to avoid depending on a specific database (for testing I am using SQL Server 2005).
My intention is to provide a data history so that the user can roll back to a previous version at any time.
An example of usage would be the following:
I have a news article
Somebody makes some changes to that article
The main editor sees that the article has some typos
He decides to roll back to the previous valid version (until the newest version is corrected)
I hope I gave you valid info.

Tables that store changes when the main table changes are called audit tables. You can do this multiple ways:
In the database using triggers: I would recommend this approach because then there is no way that data can change without a record being made. You have to account for 3 types of changes when you do this: Add, Delete, Update. Therefore you need trigger functionality that will work on all three.
Also remember that a transaction can modify multiple records at the same time, so you should work with the full set of modified records, not just the last record (a mistake many people only notice after the fact); the sketch after the audit-table structures below shows a set-based trigger.
Control will not be returned to the calling program until the trigger execution is completed, so keep the trigger code as light and as fast as possible.
In the middle layer using code: This approach will let you save changes to a different database and possibly take some load off the database. However, a SQL programmer running an UPDATE statement will completely bypass your middle layer and you will not have an audit trail.
Structure of the Audit Table
You will have the following columns:
Autonumber PK, TimeStamp, ActionType + All columns from your original table
and I have done this in the following ways in the past:
Table Structure:
Autonumber PK, TimeStamp, ActionType, TableName, OriginalTableStructureColumns
This structure means that you create one audit table per data table saved. The data save and reconstruction are fairly easy to do. I would recommend this approach (a sketch follows the two options below).
Name Value Pair:
Autonumber PK, TimeStamp, ActionType, TableName, PKColumns, ColumnName, OldValue, NewValue
This structure will let you save any table, but you will have to create name value pairs for each column in your trigger. This is very generic, but expensive. You will also need to write some views to recreate the actual rows by unpivoting the data. This gets to be tedious and is not generally the method followed.
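To make the recommended first option concrete, here is a minimal sketch under assumed names: a hypothetical dbo.Article(ArticleId, Title, Body) data table, its per-table audit table, and a set-based UPDATE trigger (you would create similar triggers for INSERT and DELETE):

CREATE TABLE dbo.Article_Audit (
    AuditId        INT IDENTITY(1,1) PRIMARY KEY,   -- autonumber PK
    AuditTimeStamp DATETIME NOT NULL DEFAULT GETDATE(),
    ActionType     CHAR(1)  NOT NULL,               -- 'I', 'U' or 'D'
    ArticleId      INT      NOT NULL,               -- columns copied from the original table
    Title          NVARCHAR(200) NULL,
    Body           NVARCHAR(MAX) NULL
);
GO

CREATE TRIGGER trg_Article_Update
ON dbo.Article
AFTER UPDATE
AS
BEGIN
    SET NOCOUNT ON;
    -- "inserted" holds every row touched by the UPDATE statement,
    -- so a multi-row update is audited with one set-based insert.
    INSERT INTO dbo.Article_Audit (ActionType, ArticleId, Title, Body)
    SELECT 'U', ArticleId, Title, Body
    FROM inserted;
END
GO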

Microsoft has introduced new auditing capabilities in SQL Server 2008. Here's an article describing some of the capabilities and design goals, which might help whichever approach you choose.
MSDN - Auditing in SQL Server 2008

You can use triggers for that.
Here is one example.
AutoAudit is a SQL Server (2005, 2008) code-gen utility that creates audit trail triggers with:
* Created, Modified, and RowVersion (incrementing INT) columns added to the table
* a view to reconstruct deleted rows
* a UDF to reconstruct row history
* a schema audit trigger to track schema changes
* re-generation of the triggers when ALTER TABLE changes the table
http://autoaudit.codeplex.com/

Saving serialized data always gets messy in the end; you're right to stay away from that. The best thing to do is to create a parallel "version" table with the same columns as your main table.
For instance, if you have a table named "book" with columns "id", "name", "author", you could add a table named "book_version" with columns "id", "name", "author", "version_date", "version_user".
Each time you insert or update a record on table "book", your application will also insert into "book_version".
Depending on your database system and the way you access the database from your application, you may be able to automate this completely (cf. the Versionable plugin in Doctrine).
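For illustration, a minimal T-SQL sketch of the book/book_version idea, assuming SQL Server and illustrative column sizes; @BookId stands for a parameter the application would supply:

CREATE TABLE book_version (
    id           INT          NOT NULL,
    name         VARCHAR(200) NOT NULL,
    author       VARCHAR(200) NOT NULL,
    version_date DATETIME     NOT NULL,
    version_user VARCHAR(128) NOT NULL
);
GO

-- Run by the application (or a trigger) just before it updates the "book" row,
-- so the previous state is preserved; @BookId is passed in by the application.
INSERT INTO book_version (id, name, author, version_date, version_user)
SELECT id, name, author, GETDATE(), SUSER_SNAME()
FROM book
WHERE id = @BookId;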

One way is to use a DB which supports this natively, like HBase. I wouldn't normally suggest "Change your DB server to get this one feature," but since you don't specify a DB server in your question I'm presuming you mean this as open-ended, and native support in the server is one of the best implementations of this feature.

What database system are you using? If you're using an ACID (atomicity, consistency, isolation, durability) compliant database, can't you just use the inbuilt rollback facility to go back to a previous transaction?

I solved this problem very nicely by using NHibernate.Envers.
For those interested, read this:
http://nhforge.org/blogs/nhibernate/archive/2010/07/05/nhibernate-auditing-v3-poor-man-s-envers.aspx

Related

Adding Entity Objects to LINQ-to-SQL Data Context at Runtime - SQL, C#(WPF)

I've hit a wall when it comes to adding a new entity object (a regular SQL table) to the Data Context using LINQ-to-SQL. This isn't about the drag-and-drop method that is cited regularly across many other threads; that method has worked repeatedly without issue.
The end goal is relatively simple. I need to find a way to add a table that gets created during runtime via stored procedure to the current Data Context of the LINQ-to-SQL dbml file. I'll then need to be able to use the regular LINQ query methods/extension methods (InsertOnSubmit(), DeleteOnSubmit(), Where(), Contains(), FirstOrDefault(), etc...) on this new table object through the existing Data Context. Essentially, I need to find a way to procedurally create the code that would otherwise be automatically generated when you do use the drag-and-drop method during development (when the application isn't running), but have it generate this same code while the application is running via command and/or event trigger.
More Detail
There's one table that gets used a lot and, over the course of an entire year, collects many thousands of rows. Each row contains a timestamp and this table needs to be divided into multiple tables based on the year that the row was added.
Current Solution (using one table)
Single table with tens of thousands of rows which are constantly queried against.
Table is added to Data Context during development using drag-and-drop, so there are no additional coding issues
Significant performance decrease over time
Goals (using multiple tables)
(Complete) While the application is running, use C# code to check if a table for the current year already exists. If it does, no action is taken. If not, a new table gets created using a stored procedure with the current year as a prefix on the table name (2017_TableName, 2018_TableName, 2019_TableName, and so on...).
(Incomplete) While the application is still running, add the newly created table to the active LINQ-to-SQL Data Context (the same code that would otherwise be added using drag-and-drop during development).
(Incomplete) Run regular LINQ queries against the newly added table.
Final Thoughts
Other than the above, my only other concern is how to write the C# code that references a table that may or may not already exist. Is it possible to use a variable in place of the standard 'DB_DataContext.2019_TableName' methodology in order to actually get the table's data into a UI control? Is there a way to simply create an Enumerable of all the tables where the name is prefixed with a year and then select the most current table?
From what I've read so far, the most likely solution seems to involve the use of a SQL add-on like SQLMetal or Huagati which (based solely from what I've read) will generate the code I need during runtime and update the corresponding dbml file. I have no experience using these types of add-ons, so any additional insight into these would be appreciated.
Lastly, I've seen some references to LINQ-to-Entities and/or LINQ-to-Objects. Would these be the components I'm looking for?
Thanks for reading through a rather lengthy first post. Any comments/criticisms are welcome.
The simplest way to achieve what you want is to redirect in SQL Server and leave your client code alone. At design time, create your L2S Data Context or EF DbContext referencing a database with only a single table. Then at run time, substitute a view or synonym for that table that points to the "current year" table.
HOWEVER, this should not be necessary in the first place. SQL Server supports partitioning, so you can store all the data in physically separate data structures but have a single logical table. And SQL Server supports columnstore indexes, which can compress and store many millions of rows with excellent performance.
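A hedged sketch of the synonym redirection, assuming hypothetical yearly tables such as dbo.[2019_TableName] and a model that maps to a fixed name dbo.TableData; this would run once a year (for example from the table-creation stored procedure):

-- Repoint the fixed name used by the L2S/EF model to the current year's table.
IF OBJECT_ID('dbo.TableData', 'SN') IS NOT NULL
    DROP SYNONYM dbo.TableData;
GO

CREATE SYNONYM dbo.TableData FOR dbo.[2019_TableName];
GO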

How can I keep track of changes to a Database?

I am creating a website that will be used by an accounting dept. to track budget expenditures by different projects.
I am using SQL Server 2008 R2 for the database and ASP.net C# MVC 3 for the website.
What my boss has asked me to do is every time any user updates or creates a project, we need to log that change into a new table called Mapping_log. It should record the whole Mapping row being saved or created, and additionally the user and the datestamp. The notes field will now be mandatory, and the note should be saved to the Mapping_log.
Now when editing the PA, the Notes field will always be empty, and below it there should be a list of the older notes organized by date. I have been looking into maybe using NLog and log4net, but I have not been able to find any good tutorials for a situation like mine. It seems that those libraries are mostly used for error logging, which, although important, is not exactly what I am trying to do at the moment.
I need some direction... does anyone have any advice or tutorials that I could use to learn how to implement a process that keeps track of changes made to the data by users of the site?
Thanks for your help/advice!
You can consider two new features that SQL Server 2008 introduced: Change Tracking and Change Data Capture.
You could use one of those and avoid your custom Mapping_log table.
But if you need to apply more complex business rules, it may be better to do that in the application layer rather than purely in the database.
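For example, a minimal sketch of enabling Change Data Capture for the table described above (assuming it is dbo.Mapping; CDC requires SQL Server 2008 Enterprise or Developer edition):

-- Enable CDC at the database level, then for the table to be tracked.
EXEC sys.sp_cdc_enable_db;
GO

EXEC sys.sp_cdc_enable_table
    @source_schema = N'dbo',
    @source_name   = N'Mapping',
    @role_name     = NULL;    -- NULL means no gating role is required to read the change data
GO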
Regards.
I would just create two triggers - one for the update, one for the insert.
These triggers would look something like this - assuming you also want to log the operation (insert vs. update) in your Mapping_Log table:
CREATE TRIGGER trg_Mapping_Insert
ON dbo.Mapping
AFTER INSERT
AS
    -- "Inserted" contains every row affected by the INSERT statement,
    -- so multi-row inserts are logged correctly in one pass.
    -- Note: "User" is a reserved word, so it is bracketed here.
    INSERT INTO dbo.Mapping_Log (col1, col2, ..., colN, [User], DateStamp, Operation)
    SELECT
        col1, col2, ..., colN, SUSER_NAME(), GETDATE(), 'INSERT'
    FROM
        Inserted
(your UPDATE trigger would be very similar - just replace "insert" by "update" wherever it appears)
This is done "behind the scenes" for you - once in place, you don't have to do anything anymore to have these operations "logged" to your Mapping_Log table.

Creating snapshot of application data - best practice

We have a text processing application developed in C# using .NET FW 4.0 where the Administrator can define various settings. All this 'settings' data reside in about 50 tables with foreign key relations and Identity primary keys (this one will make it tricky, I think). The entire database is no more than 100K records, with the average table having about 6 short columns. The system is based on MS SQL 2008 R2 Express database.
We face a requirement to create a snapshot of all this data so that the administrator of the system can roll back to one of the snapshots any time he screws something up. We need to keep the last 5 snapshots only. Creation of a snapshot must be initiated from the application GUI, and so must the rollback to any of the snapshots if needed (using SSMS will not be allowed, as direct access to the DB is denied). The system is still in development (are we ever really finished?), which means that new tables and columns are added all the time. Thus we need a robust method that can take care of changes automatically (digging through code after inserting/changing columns is something we want to avoid unless there's no other way). The ideal would be to be able to say "I want to create a snapshot of all tables whose names begin with 'Admin'". Obviously this is quite a DB-intensive task, but since it will be used in emergency situations only, that is something I do not mind. I also do not mind if table locks happen, as nothing will try to use these tables while the creation or rollback of a snapshot is in progress.
The problem can be divided into 2 parts:
creating the snapshot
rolling back to the snapshot
Regarding problem #1, we may have two options:
export the data into XML (file or database column)
duplicate the data inside SQL into the same or different tables (like creating the same table structure again with the same names as the original tables prefixed with "Backup").
Regarding problem #2, the biggest issue I see is how to re-import all data into foreign-key-related tables which use IDENTITY columns for PK generation. I need to delete all data from all affected tables, then re-import everything while temporarily relaxing FK constraints and allowing explicit identity inserts. Once the data is loaded I should check whether the FK constraints are still OK.
Or perhaps I should find a logical way to load tables so that constraint checking can remain in place while loading (as we do not have an unmanageable number of tables this could be a viable solution). Of course I need to do all deletion and re-loading in a single transaction, for obvious reasons.
I suspect there may be no pure SQL-based solution for this, although SQL CLR might be of help to avoid moving data out of SQL Server.
Is there anyone out there with the same problem we face? Maybe someone who successfully solved such problem?
I do not expect a step by step instruction. Any help on where to start, which routes to take (export to RAW XML or keep snapshot inside the DB or both), pros/cons would be really helpful.
Thank you for your help and your time.
Daniel
We don't have this exact problem, but we have a very similar problem in which we provide our customers with a baseline set of configuration data (fairly complex, mostly identity PKs) that needs to be updated when we provide a new release.
Our mechanism is probably overkill for your situation, but I am sure there is a subset of it that is applicable.
The basic approach is this:
First, we execute a script that drops all of the FK constraints and changes the nullability of those FK columns that are currently NOT NULL to NULL. This script also drops all triggers to ensure that any logical constraints implemented in them will not be executed.
Next, we perform the data import, setting IDENTITY_INSERT ON before loading a table, then setting it back OFF after the data in that table has been loaded (a sketch of this step follows the list).
Next, we execute a script that checks the data integrity of the newly added items with respect to the foreign keys. In our case, we know that items that do not have a corresponding parent record can safely be deleted, but you may choose to take a different approach (report the error and let someone manually handle the issue).
Finally, once we have verified the data, we execute another script that restores the nullability, adds the FKs back, and reinstalls the triggers.
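A minimal sketch of the load step for a single table, assuming an illustrative dbo.AdminSettings table and a saved copy named dbo.Backup_AdminSettings; the column list is for illustration only:

BEGIN TRANSACTION;

DELETE FROM dbo.AdminSettings;              -- FKs and triggers have already been dropped at this point

SET IDENTITY_INSERT dbo.AdminSettings ON;   -- allow explicit values in the IDENTITY column

INSERT INTO dbo.AdminSettings (Id, Name, Value)
SELECT Id, Name, Value
FROM dbo.Backup_AdminSettings;

SET IDENTITY_INSERT dbo.AdminSettings OFF;

COMMIT TRANSACTION;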
If you have the budget for it, I would strongly recommend that you take a look at the tools that Red Gate provides, specifically SQL Packager and SQL Data Compare (I suspect there may be other tools out there as well, we just don't have any experience with them). These tools have been critical in the successful implementation of our strategy.
Update
We provide the baseline configuration through an SQL Script that is generated by RedGate's SQL Packager.
Because our end users can modify the database between updates, which causes the identity values in their database to differ from ours, we actually store the baseline primary and foreign keys in separate fields within each record.
When we update the customer database and we need to link new records to known configuration information, we can use the baseline fields to find out what the database-specific FKs should be.
In other words, there is always a known set of field IDs for well-known configuration records regardless of what other data is modified in the database, and we can use this to link records together.
For example, if I have Table1 linked to Table2, Table1 will have a baseline PK and Table2 will have a baseline PK and a baseline FKey containing Table1's baseline PK. When we update records, if we add a new Table2 record, all we have to do is find the Table1 record with the specified baseline PK, then update the actual FKey in Table2 with the actual PK in Table1.
A kind of versioning by date ranges is a common method for records in enterprise applications. As an example, we have a table for business entities (US) or companies (UK), and we keep the current official name in another table as follows:
CompanyID  Name          ValidFrom    ValidTo
12         Business Ltd  2000-01-01   2008-09-23
12         Business Inc  2008-09-23   NULL
The NULL in the last record means that it is the current one. You may use the above logic and possibly add more columns to gain more control. This way there are no duplicates, you can keep the history to any depth, and you can synchronize the current values across tables easily. Finally, the performance will be great.
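A short sketch of how the date-range table is typically used, assuming it is called dbo.CompanyName; the dates and the new name are illustrative:

-- Current name of a company: the open-ended row.
SELECT Name
FROM dbo.CompanyName
WHERE CompanyID = 12
  AND ValidTo IS NULL;

-- Renaming: close the current row and add the new version in one transaction.
BEGIN TRANSACTION;

UPDATE dbo.CompanyName
SET ValidTo = '2019-05-01'
WHERE CompanyID = 12 AND ValidTo IS NULL;

INSERT INTO dbo.CompanyName (CompanyID, Name, ValidFrom, ValidTo)
VALUES (12, 'Business Corp', '2019-05-01', NULL);

COMMIT TRANSACTION;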

Why is creating tables at run-time (code-behind) bad?

People suggest that creating database tables dynamically (that is, at run-time) should be avoided, saying it is bad practice and will be hard to maintain.
I don't see the reason why, and I don't see the difference between creating a table and any other SQL query/statement such as SELECT or INSERT. I have written apps that create, delete and modify databases and tables at run time, and so far I have not seen any performance issues.
Can anyone explain the cons of creating databases and tables at run-time?
Tables are much more complex entities than rows, and managing table creation is much more complex than an insert, which only has to abide by an existing model: the table. True, a table create statement is a standard SQL operation, but depending on creating them dynamically smacks of a bad design decision.
Now, if you just create one or two and that's it, or an entire database dynamically, or from a script once, that might be OK. But if you depend on creating more and more tables to handle your data, you will also need to join more and more and query more and more. One very serious issue I encountered with an app that made use of dynamic table creation is that a single SQL Server query can only involve 255 tables. It's a built-in constraint. (And that's SQL Server, not CE.) It only took a few weeks in production for this limit to be reached, resulting in a nonfunctioning application.
And if you get into editing the tables, e.g. adding/dropping columns, then your maintenance headache gets even worse. There's also the matter of binding your db data to your app's logic. Another issue is upgrading production databases. This would really be a challenge if a db had been growing with objects dynamically and you suddenly needed to update the model.
When you need to store data in such a dynamic manner the standard practice is to make use of EAV models. You have fixed tables and your data is added dynamically as rows so your schema does not have to change. There are drawbacks of course but it's generally thought of as better practice.
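For reference, a minimal sketch of the EAV layout mentioned above; table and column names are illustrative only:

CREATE TABLE dbo.Entity (
    EntityId   INT IDENTITY(1,1) PRIMARY KEY,
    EntityType VARCHAR(50) NOT NULL            -- what kind of thing this row represents
);
GO

CREATE TABLE dbo.EntityAttribute (
    EntityId       INT           NOT NULL REFERENCES dbo.Entity (EntityId),
    AttributeName  VARCHAR(100)  NOT NULL,     -- the "column name" stored as data
    AttributeValue NVARCHAR(MAX) NULL,         -- the value stored as text
    PRIMARY KEY (EntityId, AttributeName)
);
GO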
KMC,
Remember the following points:
If you want to add or remove a column, you may need to change the code and compile it again.
What if the database location changes?
Developers who are not very good at databases can end up making schema changes; if you create the schema on the back end, the DBAs can take care of it.
If you get any performance issues, it may be tough to debug.
You will need to be a little clearer about what you mean by "creating tables".
One reason to not allow the application to control table creation and deletion is that this is a task that should be handled only by an administrator. You don't want normal users to have the ability to delete whole tables.
Temporary tables are a different story, and you may need to create temporary tables as part of your queries, but your basic database structure should be managed only by someone with the rights to do so.
Sometimes creating tables dynamically is not the best option security-wise (Google "SQL injection"); it can be better to use stored procedures and have your insert or update operations occur at the database level by executing the stored procedures from code. A minimal sketch follows.
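A sketch of that suggestion, assuming an illustrative dbo.Project table; the application calls the procedure with parameters instead of building SQL strings or DDL at run time:

CREATE PROCEDURE dbo.InsertProject
    @ProjectCode    VARCHAR(20),
    @ProjectDescrip VARCHAR(50)
AS
BEGIN
    SET NOCOUNT ON;
    -- Parameterized insert: no dynamic SQL, so no injection surface here.
    INSERT INTO dbo.Project (ProjectCode, ProjectDescrip)
    VALUES (@ProjectCode, @ProjectDescrip);
END
GO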

Maintain a local copy of a table from an external database table, ADO.NET

We have built an application which needs a local copy of a table from another database. I would like to write an ADO.NET routine which will keep the local table in sync with the master. Using .NET 2.0, C# and ADO.NET.
Please note I really have no control over the master table which is in a third party, mission critical app I don't wish to mess with.
For example, here is the master data table:
ProjectCodeId Varchar(20) [PK]
ProjectCode Varchar(20)
ProjectDescrip Varchar(50)
OtherUneededField int
OtherUneededField2 int
The local table we need to keep in sync...
ProjectCodeId Varchar(20) [PK]
ProjectCode Varchar(20)
ProjectDescrip Varchar(50)
Perhaps a better approach to this question is: what have you done in the past for this type of problem? What has worked best for you, or what should be avoided at all costs?
My goal with this question is to determine a good way to handle this; I am often combining data from two or more disjointed data sources. I haven't included database platforms for this reason: it really shouldn't matter. In the current situation both databases are MS SQL, but I would prefer a solution that does not use linked databases or DTS, etc.
Sure, truncating the local table and refilling it each time from the master is an option, but with thousands of rows I don't think this is very efficient. Do you?
EDIT: First, recognize that what you are doing is hand-rolled replication and replication is never simple.
You need to track and apply all of the CRUD state changes. That said, ADO.NET can do this.
To track changes to the source you can use Query Notification with your source database. This requires special permission against the database so the owner of the source database will need to take action to enable this solution. I haven't used this technique myself, but here is a description of it.
See "Query Notifications in SQL Server (ADO.NET)"
Query notifications were introduced in Microsoft SQL Server 2005 and the System.Data.SqlClient namespace in ADO.NET 2.0. Built upon the Service Broker infrastructure, query notifications allow applications to be notified when data has changed. This feature is particularly useful for applications that provide a cache of information from a database, such as a Web application, and need to be notified when the source data is changed.
To apply changes from the source db table you need to retrieve the data from the target db table, apply the changes to the target rows and post the changes back to the target db.
To apply the changes you can either
1) Delete and reinsert all of the rows (simple), or
2) Merge row-by-row changes (hard).
Delete and reinsert is self explanatory, so I won't go into detail on that.
For row-by-row change tracking here is an approach. (I am assuming here that Query Notification doesn't give you row-by-row change information, so you have to calculate it.)
You need to determine which rows were modified and identify inserted and deleted rows. Create a DataView with a sort for each table to get a Find method you can use to lookup matching rows by ID.
Identify modified rows by using a datetime/timestamp column, or by comparing all field values. Copy modified values to the target row.
Identify added and deleted rows by looping over the respective table DataViews and using the Find method of the other DataView to identify rows that do not appear in the first table. Insert or delete rows from the target table as required. (The Delete method doesn't remove the row but marks it for deletion by the TableAdapter Update.)
Good luck!
+tom
I would push in the direction of having the application that inserts the data write to one db/table and then the other within the same function. Make the application do the work; that way the data has already been pushed to both places.
Some questions: what DB platform? How are you using the data?
I'm going to assume you're just using this data as a lookup... and as you have no timestamp and no ability to modify the existing table, I'd just blow away the local copy periodically and pull it down from the master table again (a sketch of that refresh appears below).
Unless you've got a hell of a lot of data the overhead for this should be pretty small.
If you need to synch back to the master table, you'll need to do something a bit more exotic.
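For what it's worth, a hedged T-SQL sketch of that periodic refresh, assuming both databases sit on the same instance and using illustrative names LocalDb/MasterDb and a ProjectCodes table; if the master lives on another server you would pull the rows across in ADO.NET instead:

BEGIN TRANSACTION;

-- Blow away the local copy and repopulate it from the master in one shot.
TRUNCATE TABLE LocalDb.dbo.ProjectCodes;

INSERT INTO LocalDb.dbo.ProjectCodes (ProjectCodeId, ProjectCode, ProjectDescrip)
SELECT ProjectCodeId, ProjectCode, ProjectDescrip
FROM MasterDb.dbo.ProjectCodes;

COMMIT TRANSACTION;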
Can you use SQL replication? That would be preferable to writing code to do it, no?
