Creating snapshot of application data - best practice

Creating snapshot of application data - best practice - c#

We have a text processing application developed in C# using .NET FW 4.0 where the Administrator can define various settings. All this 'settings' data reside in about 50 tables with foreign key relations and Identity primary keys (this one will make it tricky, I think). The entire database is no more than 100K records, with the average table having about 6 short columns. The system is based on MS SQL 2008 R2 Express database.
We face a requirement to create a snapshot of all this data so that the administrator of the system could roll back to one of the snapshots anytime he screws up something. We need to keep the last 5 snapshots only. Creation of the snapshot must be commenced from the application GUI and so must be the rollback to any of the snapshots if needed (use SSMS will not be allowed as direct access to the DB is denied). The system is still in development (are we ever really finished?) which means that new tables and columns are added many times. Thus we need a robust method that can take care of changes automatically (digging code after inserting/changing columns is something we want to avoid unless there's no other way). The best way would be to tell that "I want to create a snapshot of all tables where the name begins with 'Admin'". Obviously, this is quite a DB-intensive task, but due to the fact that it will be used in emergency situations only, this is something that I do not mind. I also do not mind if table locks happen as nothing will try to use these tables while the creation or rollback of the snapshot is in progress.
The problem can be divided into 2 parts:
creating the snapshot
rolling back to the snapshot
Regarding problem #1. we may have two options:
export the data into XML (file or database column)
duplicate the data inside SQL into the same or different tables (like creating the same table structure again with the same names as the original tables prefixed with "Backup").
Regarding problem #2. the biggest issue I see is how to re-import all data into foreign key related tables which use IDENTITY columns for PK generation. I need to delete all data from all affected tables then re-import everything while temporarily relaxing FK constraints and switching off Identity generation. Once data is loaded I should check if FK constraints are still OK.
Or perhaps I should find a logical way to load tables so that constraint checking can remain in place while loading (as we do not have an unmanageable number of tables this could be a viable solution). Of course I need to do all deletion and re-loading in a single transaction, for obvious reasons.
I suspect there may be no pure SQL-based solution for this, although SQL CLR might be of help to avoid moving data out of SQL Server.
Is there anyone out there with the same problem we face? Maybe someone who successfully solved such problem?
I do not expect a step by step instruction. Any help on where to start, which routes to take (export to RAW XML or keep snapshot inside the DB or both), pros/cons would be really helpful.
Thank you for your help and your time.
Daniel

We don't have this exact problem, but we have a very similar problem in which we provide our customers with a baseline set of configuration data (fairly complex, mostly identity PKs) that needs to be updated when we provide a new release.
Our mechanism is probably overkill for your situation, but I am sure there is a subset of it that is applicable.
The basic approach is this:
First, we execute a script that drops all of the FK constraints and changes the nullability of those FK columns that are currently NOT NULL to NULL. This script also drops all triggers to ensure that any logical constraints implemented in them will not be executed.
Next, we perform the data import, setting identity_insert off before updating a table, then setting it back on after the data in the table is updated.
Next, we execute a script that checks the data integrity of the newly added items with respect to the foreign keys. In our case, we know that items that do not have a corresponding parent record can safely be deleted, but you may choose to take a different approach (report the error and let someone manually handle the issue).
Finally, once we have verified the data, we execute another script that restores the nullability, adds the FKs back, and reinstalls the triggers.
If you have the budget for it, I would strongly recommend that you take a look at the tools that Red Gate provides, specifically SQL Packager and SQL Data Compare (I suspect there may be other tools out there as well, we just don't have any experience with them). These tools have been critical in the successful implementation of our strategy.
Update
We provide the baseline configuration through an SQL Script that is generated by RedGate's SQL Packager.
Because our end-users can modify the database between updates which will cause the identity values in their database to be different in ours, we actually store the baseline primary and foreign keys in separate fields within each record.
When we update the customer database and we need to link new records to known configuration information, we can use the baseline fields to find out what the database-specific FKs should be.
In otherwords, there is always a known set of field ids for well-known configuration records regardless what other data is modified in the database and we can use this to link records together.
For example, if I have Table1 linked to Table2, Table1 will have a baseline PK and Table2 will have a baseline PK and a baseline FKey containing Table1's baseline PK. When we update records, if we add a new Table2 record, all we have to do is find the Table1 record with the specified baseline PK, then update the actual FKey in Table2 with the actual PK in Table1.

A kind of versioning by date ranges is a common method for records in Enterprise applications. As an example we have a table for business entities (us) or companies (uk) and we keep the current official name in another table as follows:
CompanyID Name ValidFrom ValidTo
12 Business Lld 2000-01-01 2008-09-23
12 Business Inc 2008-09-23 NULL
The null in the last record means that this is current one. You may use the above logic and possibly add more columns to gain more control. This way there are no duplicates, you can keep the history up to any level and synchronize the current values across tables easily. Finally the performance will be great.

Related

Adding Entity Objects to LINQ-to-SQL Data Context at Runtime - SQL, C#(WPF)

I've hit a wall when it comes to adding a new entity object (a regular SQL table) to the Data Context using LINQ-to-SQL. This isn't regarding the drag-and-drop method that is cited regularly across many other threads. This method has worked repeatedly without issue.
The end goal is relatively simple. I need to find a way to add a table that gets created during runtime via stored procedure to the current Data Context of the LINQ-to-SQL dbml file. I'll then need to be able to use the regular LINQ query methods/extension methods (InsertOnSubmit(), DeleteOnSubmit(), Where(), Contains(), FirstOrDefault(), etc...) on this new table object through the existing Data Context. Essentially, I need to find a way to procedurally create the code that would otherwise be automatically generated when you do use the drag-and-drop method during development (when the application isn't running), but have it generate this same code while the application is running via command and/or event trigger.
More Detail
There's one table that gets used a lot and, over the course of an entire year, collects many thousands of rows. Each row contains a timestamp and this table needs to be divided into multiple tables based on the year that the row was added.
Current Solution (using one table)
Single table with tens of thousands of rows which are constantly queried against.
Table is added to Data Context during development using drag-and-drop, so there are no additional coding issues
Significant performance decrease over time
Goals (using multiple tables)
(Complete) While the application is running, use C# code to check if a table for the current year already exists. If it does, no action is taken. If not, a new table gets created using a stored procedure with the current year as a prefix on the table name (2017_TableName, 2018_TableName, 2019_TableName, and so on...).
(Incomplete) While the application is still running, add the newly created table to the active LINQ-to-SQL Data Context (the same code that would otherwise be added using drag-and-drop during development).
(Incomplete) Run regular LINQ queries against the newly added table.
Final Thoughts
Other than the above, my only other concern is how to write the C# code that references a table that may or may not already exist. Is it possible to use a variable in place of the standard 'DB_DataContext.2019_TableName' methodology in order to actually get the table's data into a UI control? Is there a way to simply create an Enumerable of all the tables where the name is prefixed with a year and then select the most current table?
From what I've read so far, the most likely solution seems to involve the use of a SQL add-on like SQLMetal or Huagati which (based solely from what I've read) will generate the code I need during runtime and update the corresponding dbml file. I have no experience using these types of add-ons, so any additional insight into these would be appreciated.
Lastly, I've seen some references to LINQ-to-Entities and/or LINQ-to-Objects. Would these be the components I'm looking for?
Thanks for reading through a rather lengthy first post. Any comments/criticisms are welcome.

The simplest way to achieve what you want is to redirect in SQL Server, and leave your client code alone. At design-time create your L2S Data Context, or EF DbContex referencing a database with only a single table. Then at run-time substitue a view or synonym for that table that points to the "current year" table.
HOWEVER this should not be necessary in the first place. SQL Server supports partitioning, so you can store all the data in a physically separate data structures, but have a single logical table. And SQL Server supports columnstore tables, which can compress and store many millions of rows with excellent performance.

Is it OK to generate IDs of entities outside of CRM?

I need to one-way-synchronize external Data with CRM on a regular basis (i.e. nightly).
This includes creating new Records as well as updating existing ones.
That means, I have to keep track of IDs of CRM-Entities created by my synchronization process.
emphasized textI already managed to create and update records in CRM from lines of database-tables so this is not a problem.
Currently, my mapped tables have the following columns
id: The tables primary key set when inserting a new row
new_myentityid: Primary Attribute of the mapped entity, set after the record was created by the synchronization process
new_name etc.: the values of the records attributes
However, I see a way to drastically simplify the whole process:
Instead of having a PrimaryKey (id) in the database and keeping track of the CRM ID (new_myentityid) in a seperate column, I could as well get rid of the id-columns and make the CRM-ID-Column (new_myentityid) primary key of the table and set it when inserting new records (newid()), so basically substitute id with new_myentityid from a database perspective. I could then bulk-upsert via ExecuteMultipleRequest in combination with UpsertRequest.
This way, I would save a column in each mapped table as well as logic to store the CRM IDs after creating them.
Question
Would this be acceptable or is there anything that should make me avoid this?

Disclaimer: I'm not aware of a best practice for this so this is just my personal opinion on the matter having developed for Dynamics several times.
I think that using the CRM Entity GUID for your primary key is a good idea. It's less complicated and is handled well in SQL. I assume the column in your database is uniqueidentifier.
My only comment is to not generate the GUIDs yourself. Let CRM generate them for you as it does a better job at keeping everything sequential and indexed.
See this blog entry on MSDN for further detail

I'm probably a little late to this discussion but just wanted to add my tuppence worth.
There is nothing inherently wrong with specifying the GUID when creating a new record in CRM, and this behaviour is explicitly supported by the SDK.
A common real life scenario is when creating records by script; it is useful to have the same GUID for an entity in Dev, Test and Production environments (Admittedly we normally use the GUID auto generated in Dev).
The reason that it is considered best practice to allow CRM generate its own GUID (https://msdn.microsoft.com/en-us/library/gg509027.aspx) is that CRM will generate the GUID sequentially. Using newid() generates a statistically random GUID. This has a performance impact on SQL server around index maintenance. This thread provides some insight: What are the performance improvement of Sequential Guid over standard Guid?
But basically specifying your own GUID can cause the underlying SQL INSERT statement to become more expensive. Read and Update operations should remain the same.
If you are generating you own GUIDs is SQL you can always use NEWSEQUENTIALID (https://msdn.microsoft.com/en-us/library/ms189786.aspx) for a sequentially generated GUIDs.

Hi previous posts cover this well. Just to note that if you did go with generating GUIDs outside of CRM you could mitigate against the potential performance impact (INSERTS) simply by running a weekly Maintenance plan to refresh the clustered indices directly on the SQL database(s) this would I believe ensure that GUIDs were ordered sequentially. In any case, CRM/API will always be the bottleneck, so best to do things in the way that the platform expects to avoid issues later on.

Why not save in new table?
likes origin exist a table named "customer",and your new data save in "customer_update",the field as same as the origin table.
it's will help you in future.maybe you want have a look the data's orgin.

SQL recursively copy rows from multiple tables following PK FK relationships

I was given the task of creating a stored procedure to copy every piece of data associated with a given ID in our database. This data spans dozens of tables. each table may have dozens of matching rows.
example:
table Account
pk = AccountID
Table AccountSettings
FK = AccountID
Table Users
PK = UserID
FK = AccountID
Table UserContent
PK = UserContentID
FK = UserID
I want to create a copy of everything that is associated with an AccountID(which will traverse nearly every table) The copy will have a new AccountID and UserContentID but will have the same UserID. the new data needs to be in its respective table.
:) fun right?
The above is just a sample but I will be doing this for something like 50 or 60 tables.
I have researched using CTEs but am still a bit foggy on them. that may prove to be the best method. MY SQL skills are...... well I have worked with it for about 40 logged hours so far :)
Any advice or direction on where to look would be greatly appreciated. In addition, I am not opposed to doing this via C# if that would be possible or better.
Thanks in advance for any help of info.

The simplest way to solve this is the brute force way: write a very long proc that processes each table individually. This will be error-prone and very hard to maintain. But it will have the advantage of not relying on the database or database metadata to be in any particularly consistent state.
If you want something that works based on metadata, things are more interesting. You have three challenges there:
You need to programmatically identify all the related tables.
You need to generate insert statements for all 50 or 60.
You need to capture generated ids for those tables that are more than one or two steps away from the Account table, so that they can in turn be used as foreign keys in yet more copied records.
I've looked at this problem in the past, and while I can't offer you a watertight algorithm, I can give you a general heuristic. In other words: this is how I'd approach it.
Using a later version of MS Entity Framework (you said you'd be open to using C#), build a model of the Account table and all the related tables.
Review the heck out of it. If your database is like many, some of the relationships your application(s) assume will, for whatever reason, not have an actual foreign key relationship set up in the database. Create them in your model anyway.
Write a little recursive routine in C# that can take an Account object and traverse all the related tables. Pick a couple of Account instances and have it dump table name and key information to a file. Review that for completeness and plausibility.
Once you are satisfied you have a good model and a good algorithm that picks up everything, it's time to get cracking on the code. You need to write a more complicated algorithm that can read an Account and recursively clone all the records that reference it. You will probably need reflection in order to do this, but it's not that hard: all the metadata that you need will be in there, somewhere.
Test your code. Allow plenty of time for debugging.
Use your first algorithm, in step 3, to compare results for completeness and accuracy.
The advantage of the EF approach: as the database changes, so can your model, and if your code is metadata-based, it ought to be able to adapt.
The disadvantage: if you have such phenomena as fields that are "really" the same but are different types, or complex three-way relationships that aren't modeled properly, or embedded CSV lists that you'd need to parse out, this won't work. It only works if your database is in good shape and is well-modeled. Otherwise you'll need to resort to brute force.

Keeping a history of data changes in database

Every change of data in some row in database should save the previous row data in some kind of history so user can rollback to previous row data state. Is there any good practice for that approach? Tried with DataContract and serializing and deserializing data objects but it becomes little messy with complex objects.
So to be more clear:
I am using NHibernate for data access and want to stay out off database dependency (For testing using SQL server 2005)
What is my intention is to provide data history so every time user can rollback to some previous versions.
An example of usage would be the following:
I have a news article
Somebody make some changes to that article
Main editor see that this news has some typos
It decides to rollback to previous valid version (until the newest version is corrected)
I hope I gave you valid info.

Tables that store changes when the main table changes are called audit tables. You can do this multiple ways:
In the database using triggers: I would recommend this approach because then there is no way that data can change without a record being made. You have to account for 3 types of changes when you do this: Add, Delete, Update. Therefore you need trigger functionality that will work on all three.
Also remember that a transaction can modify multiple records at the same time, so you should work with the full set of modified records, not just the last record (as most people belatedly realize they did).
Control will not be returned to the calling program until the trigger execution is completed. So you should keep the code as light and as fast as possible.
In the middle layer using code: This approach will let you save changes to a different database and possibly take some load off the database. However, a SQL programmer running an UPDATE statement will completely bypass your middle layer and you will not have an audit trail.
Structure of the Audit Table
You will have the following columns:
Autonumber PK, TimeStamp, ActionType + All columns from your original table
and I have done this in the following ways in the past:
Table Structure:
Autonumber PK, TimeStamp, ActionType, TableName, OriginalTableStructureColumns
This structure will mean that you create one audit table per data table saved. The data save and reconstruction is fairly easy to do. I would recommend this approach.
Name Value Pair:
Autonumber PK, TimeStamp, ActionType, TableName, PKColumns, ColumnName, OldValue, NewValue
This structure will let you save any table, but you will have to create name value pairs for each column in your trigger. This is very generic, but expensive. You will also need to write some views to recreate the actual rows by unpivoting the data. This gets to be tedious and is not generally the method followed.

Microsoft have introduced new auditing capabilities into SQL Server 2008. Here's an article describing some of the capabilities and design goals which might help in whichever approach you choose.
MSDN - Auditing in SQL Server 2008

You can use triggers for that.
Here is one example.
AutoAudit is a SQL Server (2005, 2008)
Code-Gen utility that creates Audit
Trail Triggers with:
* Created, Modified, and RowVerwsion (incrementing INT) columns to table
* view to reconstruct deleted rows
* UDF to reconstruct Row History
* Schema Audit Trigger to track schema changes
* Re-code-gens triggers when Alter Table changes the table
http://autoaudit.codeplex.com/

Saving serialized data always gets messy in the end, you're right to stay away from that. The best thing to do is to create a parallel "version" table with the same columns as your main table.
For instance, if you have a table named "book", with columns "id", "name", "author", you could add a table named "book_version" with columns "id", "name", "author", "version_date", "version_user"
Each time you insert or update a record on table "book", your application will also insert into "book_version".
Depending on your database system and the way you database access from your application, you may be able to completely automate this (cfr the Versionable plugin in Doctrine)

One way is to use a DB which supports this natively, like HBase. I wouldn't normally suggest "Change your DB server to get this one feature," but since you don't specify a DB server in your question I'm presuming you mean this as open-ended, and native support in the server is one of the best implementations of this feature.

What database system are you using? If you're using an ACID (atomicity, consistency, isolation, durability) compliant database, can't you just use the inbuilt rollback facility to go back to a previous transaction?

I solved this problem very nice by using NHibernate.Enverse
For those intersted read this:
http://nhforge.org/blogs/nhibernate/archive/2010/07/05/nhibernate-auditing-v3-poor-man-s-envers.aspx

Maintain a local copy of a table from an external database table, ADO.NET

We have built an application which needs a local copy of a table from another database. I would like to write an ado.net routine which will keep the local table in sync with the master. Using .net 2.0, C# and ADO.NET.
Please note I really have no control over the master table which is in a third party, mission critical app I don't wish to mess with.
For example Here is the master data table:
ProjectCodeId Varchar(20) [PK]
ProjectCode Varchar(20)
ProjectDescrip Varchar(50)
OtherUneededField int
OtherUneededField2 int
The local table we need to keep in sync...
ProjectCodeId Varchar(20) [PK]
ProjectCode Varchar(20)
ProjectDescrip Varchar(50)
Perhaps a better approach to this question is what have you done in the past to this type of problem? What has worked best for you or should be avoided at all costs?
My goal with this question is to determine a good way to handle this. So often I am combining data from two or more disjointed data sources. I haven't included database platforms for this reason, it really shouldn't matter. In this current situation both databases are MSSQL, but I prefer the solution not use linked databases or DTS, etc.
Sure, truncating the local table and refilling it each time from the master is an option, but with thousands of rows I don't think this is very efficient. Do you?

EDIT: First, recognize that what you are doing is hand-rolled replication and replication is never simple.
You need to track and apply all of the CRUD state changes. That said, ADO.NET can do this.
To track changes to the source you can use Query Notification with your source database. This requires special permission against the database so the owner of the source database will need to take action to enable this solution. I haven't used this technique myself, but here is a description of it.
See "Query Notifications in SQL Server (ADO.NET)"
Query notifications were introduced in
Microsoft SQL Server 2005 and the
System.Data.SqlClient namespace in
ADO.NET 2.0. Built upon the Service
Broker infrastructure, query
notifications allow applications to be
notified when data has changed. This
feature is particularly useful for
applications that provide a cache of
information from a database, such as a
Web application, and need to be
notified when the source data is
changed.
To apply changes from the source db table you need to retrieve the data from the target db table, apply the changes to the target rows and post the changes back to the target db.
To apply the changes you can either
1) Delete and reinsert all of the rows (simple), or
2) Merge row-by-row changes (hard).
Delete and reinsert is self explanatory, so I won't go into detail on that.
For row-by-row change tracking here is an approach. (I am assuming here that Query Notification doesn't give you row-by-row change information, so you have to calculate it.)
You need to determine which rows were modified and identify inserted and deleted rows. Create a DataView with a sort for each table to get a Find method you can use to lookup matching rows by ID.
Identify modified rows by using a datetime/timestamp column, or by comparing all field values. Copy modified values to the target row.
Identify added and deleted rows by looping over the respective table DataViews and using the Find method of the other DataView to identify rows that do not appear in the first table. Insert or delete rows from the target table as required. (The Delete method doesn't remove the row but marks it for deletion by the TableAdapter Update.)
Good luck!
+tom

I would push in the direction where the application that is inserting the data would insert into one db/table then the other in the same function. Make the application do the work, the db will be pushed already.

Some questions - what db platform? how are you using the data?
I'm going to assume you're just using this data as a lookup... and as you have no timestamp and no ability modify the existing table, i'd just blow away the local copy periodically and pull it down from the master table again.
Unless you've got a hell of a lot of data the overhead for this should be pretty small.
If you need to synch back to the master table, you'll need to do something a bit more exotic.

Can you use SQL replication? This would be preferable to writing code to do it no?

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.