I was given the task of creating a stored procedure to copy every piece of data associated with a given ID in our database. This data spans dozens of tables, and each table may have dozens of matching rows.
Example:

Table Account: PK = AccountID
Table AccountSettings: FK = AccountID
Table Users: PK = UserID, FK = AccountID
Table UserContent: PK = UserContentID, FK = UserID
I want to create a copy of everything that is associated with an AccountID (which will traverse nearly every table). The copy will have a new AccountID and UserContentID but will keep the same UserID. The new data needs to go into its respective table.
:) fun right?
The above is just a sample but I will be doing this for something like 50 or 60 tables.
I have researched using CTEs but am still a bit foggy on them. That may prove to be the best method. My SQL skills are... well, I have worked with it for about 40 logged hours so far :)
Any advice or direction on where to look would be greatly appreciated. In addition, I am not opposed to doing this via C# if that would be possible or better.
Thanks in advance for any help or info.
The simplest way to solve this is the brute force way: write a very long proc that processes each table individually. This will be error-prone and very hard to maintain. But it will have the advantage of not relying on the database or database metadata to be in any particularly consistent state.
If you want something that works based on metadata, things are more interesting. You have three challenges there:
You need to programmatically identify all the related tables.
You need to generate insert statements for all 50 or 60 tables.
You need to capture generated ids for those tables that are more than one or two steps away from the Account table, so that they can in turn be used as foreign keys in yet more copied records.
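For that third challenge, one common T-SQL pattern is worth sketching: a plain INSERT...SELECT cannot OUTPUT the source row's old key alongside the new identity value, but MERGE can. Here is a hedged illustration (table and column names are assumptions based on the example schema; the OP keeps UserID, but the same pattern applies anywhere a newly generated key must be propagated to child rows):

```sql
-- Illustrative sketch: clone rows for one account while capturing the
-- old-ID -> new-ID mapping. Names and values are hypothetical.
DECLARE @OldAccountID int = 1, @NewAccountID int = 2;
DECLARE @IdMap TABLE (OldUserID int, NewUserID int);

MERGE INTO Users AS target
USING (SELECT UserID, UserName
       FROM Users
       WHERE AccountID = @OldAccountID) AS source
ON 1 = 0                               -- never matches, so every source row is inserted
WHEN NOT MATCHED THEN
    INSERT (AccountID, UserName)
    VALUES (@NewAccountID, source.UserName)
OUTPUT source.UserID, inserted.UserID  -- MERGE may reference source columns in OUTPUT
INTO @IdMap (OldUserID, NewUserID);

-- @IdMap can now be joined against child tables (e.g. UserContent)
-- to clone their rows with the correct new foreign keys.
```

The `ON 1 = 0` predicate forces every source row down the `WHEN NOT MATCHED` branch, turning MERGE into an INSERT that can see both sides in its OUTPUT clause.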
I've looked at this problem in the past, and while I can't offer you a watertight algorithm, I can give you a general heuristic. In other words: this is how I'd approach it.
1. Using a later version of MS Entity Framework (you said you'd be open to using C#), build a model of the Account table and all the related tables.
2. Review the heck out of it. If your database is like many, some of the relationships your application(s) assume will, for whatever reason, not have an actual foreign key relationship set up in the database. Create them in your model anyway.
3. Write a little recursive routine in C# that can take an Account object and traverse all the related tables. Pick a couple of Account instances and have it dump table name and key information to a file. Review that for completeness and plausibility.
4. Once you are satisfied you have a good model and a good algorithm that picks up everything, it's time to get cracking on the code. You need to write a more complicated algorithm that can read an Account and recursively clone all the records that reference it. You will probably need reflection in order to do this, but it's not that hard: all the metadata that you need will be in there, somewhere.
5. Test your code. Allow plenty of time for debugging.
6. Use your first algorithm, from step 3, to compare results for completeness and accuracy.
The advantage of the EF approach: as the database changes, so can your model, and if your code is metadata-based, it ought to be able to adapt.
The disadvantage: if you have such phenomena as fields that are "really" the same but are different types, or complex three-way relationships that aren't modeled properly, or embedded CSV lists that you'd need to parse out, this won't work. It only works if your database is in good shape and is well-modeled. Otherwise you'll need to resort to brute force.
I need to one-way-synchronize external data with CRM on a regular basis (i.e. nightly).
This includes creating new records as well as updating existing ones.
That means I have to keep track of the IDs of CRM entities created by my synchronization process.
I already managed to create and update records in CRM from rows of database tables, so this is not a problem.
Currently, my mapped tables have the following columns
id: The table's primary key, set when inserting a new row
new_myentityid: Primary Attribute of the mapped entity, set after the record was created by the synchronization process
new_name etc.: the values of the records attributes
However, I see a way to drastically simplify the whole process:
Instead of having a primary key (id) in the database and keeping track of the CRM ID (new_myentityid) in a separate column, I could as well get rid of the id columns and make the CRM ID column (new_myentityid) the primary key of the table and set it when inserting new records (newid()), so basically substitute id with new_myentityid from a database perspective. I could then bulk-upsert via ExecuteMultipleRequest in combination with UpsertRequest.
This way, I would save a column in each mapped table as well as logic to store the CRM IDs after creating them.
Question
Would this be acceptable or is there anything that should make me avoid this?
Disclaimer: I'm not aware of a best practice for this so this is just my personal opinion on the matter having developed for Dynamics several times.
I think that using the CRM Entity GUID for your primary key is a good idea. It's less complicated and is handled well in SQL. I assume the column in your database is uniqueidentifier.
My only comment is to not generate the GUIDs yourself. Let CRM generate them for you as it does a better job at keeping everything sequential and indexed.
See this blog entry on MSDN for further detail
I'm probably a little late to this discussion but just wanted to add my tuppence worth.
There is nothing inherently wrong with specifying the GUID when creating a new record in CRM, and this behaviour is explicitly supported by the SDK.
A common real life scenario is when creating records by script; it is useful to have the same GUID for an entity in Dev, Test and Production environments (Admittedly we normally use the GUID auto generated in Dev).
The reason that it is considered best practice to allow CRM generate its own GUID (https://msdn.microsoft.com/en-us/library/gg509027.aspx) is that CRM will generate the GUID sequentially. Using newid() generates a statistically random GUID. This has a performance impact on SQL server around index maintenance. This thread provides some insight: What are the performance improvement of Sequential Guid over standard Guid?
But basically specifying your own GUID can cause the underlying SQL INSERT statement to become more expensive. Read and Update operations should remain the same.
If you are generating your own GUIDs in SQL, you can always use NEWSEQUENTIALID (https://msdn.microsoft.com/en-us/library/ms189786.aspx) for sequentially generated GUIDs.
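A small sketch of that option, with illustrative table and column names: NEWSEQUENTIALID() can only be used in a column default, not as a free-standing function call.

```sql
-- Sequential GUIDs via a default constraint (names are hypothetical).
CREATE TABLE dbo.MyEntity
(
    MyEntityId uniqueidentifier NOT NULL
        CONSTRAINT DF_MyEntity_Id DEFAULT NEWSEQUENTIALID()
        CONSTRAINT PK_MyEntity PRIMARY KEY CLUSTERED,
    Name nvarchar(100) NOT NULL
);

-- Omitting MyEntityId lets SQL Server assign the next sequential GUID:
INSERT INTO dbo.MyEntity (Name) VALUES (N'example');
```

Because the generated values are roughly ascending, inserts land at the end of the clustered index instead of splitting pages at random positions, which is the performance point made above.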
Hi, the previous posts cover this well. Just to note that if you did go with generating GUIDs outside of CRM, you could mitigate the potential performance impact on INSERTs simply by running a weekly maintenance plan to rebuild the clustered indexes directly on the SQL database(s); I believe this would keep the fragmentation caused by random GUIDs under control. In any case, CRM/API will always be the bottleneck, so it is best to do things the way the platform expects to avoid issues later on.
Why not save into a new table?
For example, if the original data lives in a table named "customer", save your new data in "customer_update", with the same fields as the original table.
This will help you in the future if you ever want to look back at the data's origin.
I'm having a hard time just defining my situation so please be patient. Either I have a situation that no one blogs about, or I've created a problem in my mind by lack of understanding the concepts.
I have a database which is something of a mess and the DB owner wants to keep it that way. By mess I mean it is not normalized and no relationships defined although they do exist...
I want to use EF, and I want to optimize my code by reducing database calls.
As a simplified example I have two tables with no relationships set like so:
Table: Human
HumanId, HumanName, FavoriteFoodId, LeastFavoriteFoodId, LastFoodEatenId
Table: Food
FoodId, FoodName, FoodProperty1, FoodProperty2
I want to write a single EF database call that will return a human and a full object for each related food item.
First, is it possible to do this?
Second, how?
Boring background information: A super sql developer has written a query that returns 21 tables in 20 milliseconds which contain a total of 1401 columns. This is being turned into an xml document for our front end developer to bind to. I want to change our technique to use objects and thus reduce the amount of hand coding and mapping from fields to xml (not to mention the handling of nulls vs empty strings etc) and create a type safe compile time environment. Unfortunately we are not allowed to change the database or add relationships...
If I understand you correctly, it's better for you to use the Entity Framework Code First approach:
You can define your objects (entities) Human and Food
Make relations between them in code even if they don't have foreign keys in DB
Query them using LINQ
And yes, you can select all related information in one call.
You can define the relationships in the code with Entity Framework using Fluent API. In your case you might be able to define your entities manually, or use a tool to reverse engineer your EF model from an existing database. There is some support for this built in to Visual Studio, and there are VS extensions like EF Power Tools that offer this capability.
As for making a single call to the database with EF, you would probably need to create a stored procedure or a view that returns all of the information you need. Using the standard setup with lazy-loading enabled, EF will make calls to the database and populate the data as needed.
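One shape that view could take, as a hedged sketch against the sample schema above: join the Food table once per FK column under different aliases, so a single round trip returns the human plus all three related foods.

```sql
-- Illustrative view: Human plus its three related Food rows in one call.
-- LEFT JOINs keep humans whose food IDs are NULL or dangling
-- (remember, there are no enforced FKs in this database).
CREATE VIEW dbo.HumanWithFoods AS
SELECT h.HumanId, h.HumanName,
       f1.FoodName AS FavoriteFoodName,
       f2.FoodName AS LeastFavoriteFoodName,
       f3.FoodName AS LastFoodEatenName
FROM dbo.Human AS h
LEFT JOIN dbo.Food AS f1 ON f1.FoodId = h.FavoriteFoodId
LEFT JOIN dbo.Food AS f2 ON f2.FoodId = h.LeastFavoriteFoodId
LEFT JOIN dbo.Food AS f3 ON f3.FoodId = h.LastFoodEatenId;
```

The view can then be mapped to an entity (or queried via SqlQuery) so the front end gets one type-safe object per human.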
I'm working on a little project that's designed to record a lot of data, I've estimated that I need to store about 100-150 million rows of data in my database. These rows don't contain much data but are going to have frequent inserts and I want relatively quick data retrieval (this is going to be infrequent but will require rapid aggregation of the data).
From the information I've read, at these sorts of sizes I need to know what I'm doing and ensure indexes etc. are set up properly. What I could do, however, is actually split my table of data up (into roughly 250 tables of 500k rows each).
I guess the first question is, could someone validate that this would be a good idea? From the things I've read I believe reads/inserts should be much quicker so it seems like a logical step to take.
I was also planning on using Entity Framework for this (despite the tables being quite simple) but I'm not sure if it's possible to map the same entity to lots of different tables. I've found a number of articles on mapping two tables to the same entity. So the second question is does Entity Framework allow you to map two tables to different entities of the same type?
Splitting the data into multiple separate tables is not a good idea. Databases in general and SQL Server in particular can handle large tables, even tables with hundreds of millions of rows. And, the implications of working with thousands of tables are daunting. It prevents you from setting up triggers and foreign key references. It makes security more difficult. It is daunting just to list the tables in the database.
One capability that might help you is table (horizontal) partitioning, described here. Partitioning lets you store one table across separate filegroups. This can speed queries, because only one partition may need to be read. This can speed deletes, because some deletes can be handled by dropping a partition.
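A minimal sketch of what that looks like in T-SQL, with hypothetical names and boundary dates (this assumes an edition that supports partitioning):

```sql
-- Range-partition a readings table by month; boundaries are illustrative.
CREATE PARTITION FUNCTION pfByMonth (date)
    AS RANGE RIGHT FOR VALUES ('2024-01-01', '2024-02-01', '2024-03-01');

CREATE PARTITION SCHEME psByMonth
    AS PARTITION pfByMonth ALL TO ([PRIMARY]);

CREATE TABLE dbo.Readings
(
    ReadingId   bigint IDENTITY NOT NULL,
    ReadingDate date NOT NULL,
    Value       decimal(18, 4) NOT NULL,
    -- the partitioning column must be part of the clustered key
    CONSTRAINT PK_Readings PRIMARY KEY (ReadingId, ReadingDate)
) ON psByMonth (ReadingDate);
```

To the application (and to Entity Framework) this is still one table with one entity, so you get the insert/query benefits without mapping hundreds of tables.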
Given a table, is there any way to identify all tables which are taking foreign key reference on that table?
The actual scenario goes like this. Given a database, I have a set of C# schema classes which I have to populate from data in the database and store in a cache. All these schemas should always be in sync with the database.
Now, I have two ways to solve the above problem. One is that whenever a database change happens, go and update all the stored schemas, which will be very costly. The other is to use some heuristic-based algorithm to correctly identify the schemas which will be impacted by a DB change and update those only.
In order to implement this, I was thinking of building a dependency tree/graph structure where a table T1 is considered dependent on a table T2 if T1 has a foreign key constraint on T2, so that whenever a change happens in one or more tables, I can quickly iterate over the graph and say that all these schemas need to be updated.
I know that using data dictionaries you can find this kind of dependency, but since I am using Entity Framework, I'm looking for a way of doing it through Entity Framework.
Also, if someone has a better approach of doing the same, share that as well.
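For comparison, here is what the catalog-view route looks like (not through EF's metadata workspace, but a query like this can be issued from an EF context via `Database.SqlQuery`); the target table name is a placeholder:

```sql
-- List every table holding a foreign key that references dbo.T2.
SELECT
    fk.name                              AS ForeignKeyName,
    OBJECT_NAME(fk.parent_object_id)     AS DependentTable,
    OBJECT_NAME(fk.referenced_object_id) AS ReferencedTable
FROM sys.foreign_keys AS fk
WHERE fk.referenced_object_id = OBJECT_ID(N'dbo.T2');
```

Running it once per table (or without the WHERE clause, for the whole database) gives the edge list needed to build the dependency graph described above.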
We have a text processing application developed in C# using .NET FW 4.0 where the Administrator can define various settings. All this 'settings' data reside in about 50 tables with foreign key relations and Identity primary keys (this one will make it tricky, I think). The entire database is no more than 100K records, with the average table having about 6 short columns. The system is based on MS SQL 2008 R2 Express database.
We face a requirement to create a snapshot of all this data so that the administrator of the system could roll back to one of the snapshots anytime he screws up something. We need to keep the last 5 snapshots only. Creation of the snapshot must be commenced from the application GUI and so must be the rollback to any of the snapshots if needed (use SSMS will not be allowed as direct access to the DB is denied). The system is still in development (are we ever really finished?) which means that new tables and columns are added many times. Thus we need a robust method that can take care of changes automatically (digging code after inserting/changing columns is something we want to avoid unless there's no other way). The best way would be to tell that "I want to create a snapshot of all tables where the name begins with 'Admin'". Obviously, this is quite a DB-intensive task, but due to the fact that it will be used in emergency situations only, this is something that I do not mind. I also do not mind if table locks happen as nothing will try to use these tables while the creation or rollback of the snapshot is in progress.
The problem can be divided into 2 parts:
creating the snapshot
rolling back to the snapshot
Regarding problem #1, we may have two options:
export the data into XML (file or database column)
duplicate the data inside SQL into the same or different tables (like creating the same table structure again with the same names as the original tables prefixed with "Backup").
Regarding problem #2, the biggest issue I see is how to re-import all data into foreign-key-related tables which use IDENTITY columns for PK generation. I need to delete all data from all affected tables, then re-import everything while temporarily relaxing FK constraints and switching off identity generation. Once the data is loaded I should check that the FK constraints are still OK.
Or perhaps I should find a logical way to load tables so that constraint checking can remain in place while loading (as we do not have an unmanageable number of tables this could be a viable solution). Of course I need to do all deletion and re-loading in a single transaction, for obvious reasons.
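A hedged sketch of the per-table reload, assuming the "Backup"-prefixed copy approach from option #2 (table and column names are illustrative, and any tables referencing this one need the same NOCHECK treatment before the DELETE can succeed):

```sql
-- Relax this table's FK checks; referencing tables need the same.
ALTER TABLE dbo.AdminSetting NOCHECK CONSTRAINT ALL;

DELETE FROM dbo.AdminSetting;

-- Keep the original identity values from the snapshot copy.
SET IDENTITY_INSERT dbo.AdminSetting ON;
INSERT INTO dbo.AdminSetting (AdminSettingId, Name, Value)
SELECT AdminSettingId, Name, Value
FROM dbo.BackupAdminSetting;
SET IDENTITY_INSERT dbo.AdminSetting OFF;

-- Re-enable and re-validate the constraints afterwards;
-- WITH CHECK makes SQL Server verify the existing rows.
ALTER TABLE dbo.AdminSetting WITH CHECK CHECK CONSTRAINT ALL;
```

Wrapped in one transaction across all ~50 tables, this gives the atomic delete-and-reload described above, and the final WITH CHECK step doubles as the post-load FK verification.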
I suspect there may be no pure SQL-based solution for this, although SQL CLR might be of help to avoid moving data out of SQL Server.
Is there anyone out there with the same problem we face? Maybe someone who successfully solved such problem?
I do not expect a step by step instruction. Any help on where to start, which routes to take (export to RAW XML or keep snapshot inside the DB or both), pros/cons would be really helpful.
Thank you for your help and your time.
Daniel
We don't have this exact problem, but we have a very similar problem in which we provide our customers with a baseline set of configuration data (fairly complex, mostly identity PKs) that needs to be updated when we provide a new release.
Our mechanism is probably overkill for your situation, but I am sure there is a subset of it that is applicable.
The basic approach is this:
First, we execute a script that drops all of the FK constraints and changes those FK columns that are currently NOT NULL to nullable. This script also drops all triggers, to ensure that any logical constraints implemented in them will not be executed.
Next, we perform the data import, setting IDENTITY_INSERT ON before loading each table, then setting it back OFF after the data in the table is loaded.
Next, we execute a script that checks the data integrity of the newly added items with respect to the foreign keys. In our case, we know that items that do not have a corresponding parent record can safely be deleted, but you may choose to take a different approach (report the error and let someone manually handle the issue).
Finally, once we have verified the data, we execute another script that restores the nullability, adds the FKs back, and reinstalls the triggers.
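The drop-and-restore scripts in the first and last steps can themselves be generated from metadata rather than maintained by hand. As a hedged sketch, this emits the DROP statements for every FK in the database (the matching re-create script can be captured from the catalog views the same way before anything is dropped):

```sql
-- Generate DROP statements for all foreign keys from the catalog views.
SELECT 'ALTER TABLE '
     + QUOTENAME(OBJECT_SCHEMA_NAME(fk.parent_object_id)) + '.'
     + QUOTENAME(OBJECT_NAME(fk.parent_object_id))
     + ' DROP CONSTRAINT ' + QUOTENAME(fk.name) + ';'
FROM sys.foreign_keys AS fk;
```

Generating the scripts this way keeps the mechanism robust as new tables and constraints are added, which matches the "still in development" requirement in the question.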
If you have the budget for it, I would strongly recommend that you take a look at the tools that Red Gate provides, specifically SQL Packager and SQL Data Compare (I suspect there may be other tools out there as well, we just don't have any experience with them). These tools have been critical in the successful implementation of our strategy.
Update
We provide the baseline configuration through an SQL Script that is generated by RedGate's SQL Packager.
Because our end users can modify the database between updates, which will cause the identity values in their database to be different from those in ours, we actually store the baseline primary and foreign keys in separate fields within each record.
When we update the customer database and we need to link new records to known configuration information, we can use the baseline fields to find out what the database-specific FKs should be.
In other words, there is always a known set of field IDs for well-known configuration records, regardless of what other data is modified in the database, and we can use this to link records together.
For example, if I have Table1 linked to Table2, Table1 will have a baseline PK and Table2 will have a baseline PK and a baseline FKey containing Table1's baseline PK. When we update records, if we add a new Table2 record, all we have to do is find the Table1 record with the specified baseline PK, then update the actual FKey in Table2 with the actual PK in Table1.
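That lookup step can be sketched in T-SQL as follows (column names like BaselinePK and BaselineFK are assumptions for illustration):

```sql
-- Resolve the actual FK in this customer's database from the
-- release-invariant baseline keys (hypothetical column names).
UPDATE t2
SET t2.Table1Id = t1.Table1Id            -- actual, database-specific FK
FROM dbo.Table2 AS t2
JOIN dbo.Table1 AS t1
    ON t1.BaselinePK = t2.BaselineFK     -- baseline keys match across databases
WHERE t2.Table1Id IS NULL;               -- only newly imported rows
```

Because the join runs on the baseline columns, it works no matter what identity values the customer's database happened to assign.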
A kind of versioning by date ranges is a common method for records in enterprise applications. As an example, we have a table for business entities (US) or companies (UK), and we keep the current official name in another table as follows:
CompanyID  Name          ValidFrom   ValidTo
12         Business Ltd  2000-01-01  2008-09-23
12         Business Inc  2008-09-23  NULL
The NULL in the last record means that this is the current one. You may use the above logic and possibly add more columns to gain more control. This way there are no duplicates, you can keep history to any depth, and you can synchronize the current values across tables easily. Finally, the performance will be great.
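The two core operations under this scheme, as a sketch (assuming the table above is named dbo.CompanyName): renaming closes the current row and inserts a new open-ended one, and the current value is whichever row has a NULL ValidTo.

```sql
-- Rename: close the current row, then open the new one.
UPDATE dbo.CompanyName
SET ValidTo = '2008-09-23'
WHERE CompanyID = 12 AND ValidTo IS NULL;

INSERT INTO dbo.CompanyName (CompanyID, Name, ValidFrom, ValidTo)
VALUES (12, 'Business Inc', '2008-09-23', NULL);

-- Lookup: the current name is the row that is still open.
SELECT Name
FROM dbo.CompanyName
WHERE CompanyID = 12 AND ValidTo IS NULL;
```

Both statements should run in one transaction so there is never a moment with zero or two "current" rows for a company.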