Logical migration from one DB structure to another - C#

I am currently in the process of creating a SQL Server CE database application in C# and I am having some logic issues that I thought maybe someone could help with.
Objective: to be able to supply an XML file to the end user, which tells the program to create a new set of tables using the supplied structure (new tables with tmp_ prefix). Existing data then needs to be moved from the old tables to the new ones (with new structure), then the old tables need to be dropped.
I've written too much code to be able to paste it here, so I'm going to break it down into logical steps (as it is a logical issue, not a compiler issue).
1. Get the new database structure from the supplied XML file, read into a DataTable [DONE]
2. Dynamically concatenate a SQL query to create a new table with the tmp_ prefix [DONE]
3. Compare the new structure with the old structure, move relevant data across [NOT DONE]
I am having problems with the logical approach to step 3. Basically I need to move data from the old structure to the new structure, ignoring old columns which do not appear in the new set of columns, and entering blank data for new columns which do not appear in the list of old columns. I also need to adhere to the new column schema (data type, max length, etc.). This is seriously making my head hurt as I'm very new to C#. Does anyone have ideas on the best way to approach this?
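One possible shape for step 3, as a minimal sketch only: it assumes the tmp_ table has already been created from the XML, that SQL Server CE's INFORMATION_SCHEMA.COLUMNS view is used to read both schemas, and it leaves any data type/length conversion to the INSERT itself (a stricter version would CAST and validate per column). Table names are placeholders.
using System;
using System.Collections.Generic;
using System.Data.SqlServerCe;
using System.Linq;

static class SchemaMigration
{
    // Copy rows from the old table into the new tmp_ table using only the
    // columns the two schemas share. Old-only columns are ignored; new-only
    // columns are left to their defaults (or NULL).
    public static void CopyMatchingColumns(SqlCeConnection conn, string oldTable, string newTable)
    {
        var shared = GetColumnNames(conn, oldTable)
            .Intersect(GetColumnNames(conn, newTable), StringComparer.OrdinalIgnoreCase)
            .ToList();

        string columnList = string.Join(", ", shared.Select(c => "[" + c + "]"));
        string sql = "INSERT INTO [" + newTable + "] (" + columnList + ") " +
                     "SELECT " + columnList + " FROM [" + oldTable + "]";

        using (var cmd = new SqlCeCommand(sql, conn))
            cmd.ExecuteNonQuery();
    }

    // Read a table's column names from the INFORMATION_SCHEMA views.
    private static List<string> GetColumnNames(SqlCeConnection conn, string table)
    {
        var columns = new List<string>();
        using (var cmd = new SqlCeCommand(
            "SELECT COLUMN_NAME FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = @t", conn))
        {
            cmd.Parameters.AddWithValue("@t", table);
            using (var reader = cmd.ExecuteReader())
                while (reader.Read())
                    columns.Add(reader.GetString(0));
        }
        return columns;
    }
}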
Thanks in advance!

Related

Adding Entity Objects to LINQ-to-SQL Data Context at Runtime - SQL, C#(WPF)

I've hit a wall when it comes to adding a new entity object (a regular SQL table) to the Data Context using LINQ-to-SQL. This isn't about the drag-and-drop method that is cited regularly in other threads; that method has worked repeatedly without issue.
The end goal is relatively simple. I need to find a way to add a table that gets created at runtime via a stored procedure to the current Data Context of the LINQ-to-SQL dbml file. I'll then need to be able to use the regular LINQ query methods/extension methods (InsertOnSubmit(), DeleteOnSubmit(), Where(), Contains(), FirstOrDefault(), etc...) on this new table object through the existing Data Context. Essentially, I need to find a way to procedurally create the code that would otherwise be generated automatically when you use the drag-and-drop method during development (when the application isn't running), but have it generate that same code while the application is running, via a command and/or event trigger.
More Detail
There's one table that gets used a lot and, over the course of an entire year, collects many thousands of rows. Each row contains a timestamp and this table needs to be divided into multiple tables based on the year that the row was added.
Current Solution (using one table)
Single table with tens of thousands of rows which are constantly queried against.
Table is added to Data Context during development using drag-and-drop, so there are no additional coding issues
Significant performance decrease over time
Goals (using multiple tables)
(Complete) While the application is running, use C# code to check if a table for the current year already exists. If it does, no action is taken. If not, a new table gets created using a stored procedure with the current year as a prefix on the table name (2017_TableName, 2018_TableName, 2019_TableName, and so on...).
(Incomplete) While the application is still running, add the newly created table to the active LINQ-to-SQL Data Context (the same code that would otherwise be added using drag-and-drop during development).
(Incomplete) Run regular LINQ queries against the newly added table.
Final Thoughts
My only other concern is how to write the C# code that references a table that may or may not already exist. Is it possible to use a variable in place of the standard 'DB_DataContext.2019_TableName' syntax in order to actually get the table's data into a UI control? Is there a way to simply create an Enumerable of all the tables whose names are prefixed with a year and then select the most current one?
From what I've read so far, the most likely solution seems to involve the use of a SQL add-on like SQLMetal or Huagati which (based solely on what I've read) will generate the code I need at runtime and update the corresponding dbml file. I have no experience using these types of add-ons, so any additional insight into them would be appreciated.
Lastly, I've seen some references to LINQ-to-Entities and/or LINQ-to-Objects. Would these be the components I'm looking for?
Thanks for reading through a rather lengthy first post. Any comments/criticisms are welcome.
The simplest way to achieve what you want is to redirect in SQL Server and leave your client code alone. At design time, create your L2S Data Context or EF DbContext referencing a database with only a single table. Then at run time, substitute a view or synonym for that table that points to the "current year" table.
HOWEVER, this should not be necessary in the first place. SQL Server supports partitioning, so you can store the data in physically separate structures but have a single logical table. And SQL Server supports columnstore tables, which can compress and store many millions of rows with excellent performance.
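As a rough sketch of the synonym variant (all object names below are placeholders, not your actual schema): the dbml keeps mapping to a fixed name, and a small piece of startup or scheduled code repoints that name at the current year's physical table.
using System;
using System.Data.SqlClient;

static class YearlyTableRedirect
{
    // Repoint the synonym the Data Context maps to (dbo.TableName here) at the
    // physical table for the current year, e.g. dbo.[2019_TableName].
    public static void PointAtCurrentYear(string connectionString)
    {
        string physicalTable = DateTime.Now.Year + "_TableName";

        string sql =
            "IF OBJECT_ID('dbo.TableName', 'SN') IS NOT NULL DROP SYNONYM dbo.TableName; " +
            "EXEC('CREATE SYNONYM dbo.TableName FOR dbo.[" + physicalTable + "]');";

        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand(sql, conn))
        {
            conn.Open();
            cmd.ExecuteNonQuery();
        }
    }
}
A single-table view can play the same role (ALTER it each year instead of recreating the synonym); either way the client-side L2S code never needs to change.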

Is there any way to add a new column to a NetTiers model without using CodeSmith?

I have to change some legacy code that was generated with CodeSmith using NetTiers templates. I need to add some new columns, and I have neither the original templates nor a CodeSmith licence. Is there any way to add them without using CodeSmith?
Yes, you definitely can. NetTiers is nothing more than automatically generated C# code; there is nothing special about the resulting code, and you can modify it to your heart's content.
That said, manual modification of the type you are talking about is going to be a bit time-intensive. If this is not a one-off, I would highly suggest getting CodeSmith and trying to regenerate.
In order to accomplish your goal manually, you will need to modify the entity class itself, all of the get and save methods that you want to use the new columns, and finally the procedure XML and the stored procedures themselves. It's the same process as if the entire DAL had been written by hand in C#.
Another option is to add the new columns to the end of the tables and then use some other DAL to manage the data in them. As long as the new columns are only added to the end of the table, NetTiers will completely ignore them.
If your NetTiers code uses stored procedures to access the data, then as long as the column positions inside the stored procedures don't change, the positions of the new columns themselves shouldn't matter either. I haven't tested whether that holds true for the parameterized queries NetTiers builds, though.

Object approach to a database with a variable schema

I am developing a C# app where I have to read/write an existing MS SQL database. I decided to use object classes for the database, but the table columns can change at runtime, and that causes an exception when attempting to write a new row (in the case of a new NOT NULL column).
Is there any recommendation on how to preserve the object approach to the database and deal with variable database tables? It is not necessary to have the objects updated at runtime, just to handle the new columns by filling them with a valid default value.
More details to my solution:
I used the Data Source Configuration Wizard in VS2015, which generates objects for the database, and everything is fine. When a table gets a new column I have to run the wizard again to update the objects and define an appropriate new value.
I can't modify anything in the database structure (existing ERP system). The database is huge (hundreds of tables, each with around 60+ columns), so I am looking for automated ways to generate the database objects.
I hope I just overlooked (as a newbie) some obvious solution.
Thanks for all suggestions in advance.
Petr
I would recommend doing the following (a rough sketch of the application side follows the list):
Create a set of import tables with the needed columns and leave those tables fixed
Let your application copy data to the import tables
Update the production tables on the database from the import tables with a stored procedure
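For illustration only, and with invented names throughout (the imp_Order table, its columns and the usp_MoveImportedOrders procedure are assumptions), the application side of that pattern could be as small as this:
using System.Data;
using System.Data.SqlClient;

static class ImportTableWriter
{
    // Write through a fixed-schema import table, then let a stored procedure
    // map the row onto the real production table and supply valid defaults
    // for any new NOT NULL columns the application does not know about.
    public static void Save(string connectionString, string name, decimal amount)
    {
        using (var conn = new SqlConnection(connectionString))
        {
            conn.Open();

            // 1) copy the data into the fixed import table
            using (var insert = new SqlCommand(
                "INSERT INTO imp_Order (Name, Amount) VALUES (@name, @amount)", conn))
            {
                insert.Parameters.AddWithValue("@name", name);
                insert.Parameters.AddWithValue("@amount", amount);
                insert.ExecuteNonQuery();
            }

            // 2) let the stored procedure move it into the production table
            using (var move = new SqlCommand("usp_MoveImportedOrders", conn))
            {
                move.CommandType = CommandType.StoredProcedure;
                move.ExecuteNonQuery();
            }
        }
    }
}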

Transferring data from .csv to DB: which is the best way?

At the moment I'm working on a quite tricky transfer from a .csv file to a DB. I have to develop a package/solution/xxxyyy that handles a flow of data from this .csv file to my SQL Server DB (the .csv is updated with new data every day).
The approach that my boss "suggested" I should use is SSIS (normally I would have written some kind of "parser" to move the data from the .csv). The fact is that I have quite a bit of transformation to do.
For example, an employee has these fields:
name;surname;id;roles
The field "roles" is formatted like this:
role1,role2,role3
This relationship in my db is mapped in 3 different tables:
tblEmployee
PK_Emp | name | surname
tblRoles
PK_Role | roleName
tblEmployeeRole
PK_Emp | PK_Role
So, from the .csv I have to extract the roles of a single employee and insert them into tblRoles (checking that there are no duplicates). Then I have to manage the relationship in tblEmployeeRole.
Considering that this is just an example of one of the different transformations I have to manage, I was wondering if SSIS is the best tool to achieve my goal (loads of script components). When I explained my doubts to my boss he came up with this "idea":
Use SSIS to transfer the data as-is into a temporary table, then handle the different transformations through stored procedures.
From the very little I know about stored procedures, I'm not sure that I should follow this idea.
Now, considering that my superior isn't the most enlightened project manager (he usually messes up our work with bizarre ideas) and that I'm not much of an expert in either SSIS or stored procedures, I've decided to write here and see if anyone can explain whether one of the previous approaches is the right one or if I should consider some other (better) solution.
Sorry for my poor English, ty for any help =)
I would insert the data from the CSV file as-is.
Then do any parsing on the database end. If this is something that has to be done often, I would take any scripts you have made to do this and create procedures/functions from them. This question is a bit grand-scheme, so this is only a general solution. If you need help parsing the roles into the lookup tables, that would be more specific and of better use.
In general when I work with massive flat-file data sets that need to be parsed into a SQL structure:
Import the data as-is
Find the commonalities among the lookup codes
Create the base lookup tables (in your case that would be tblRoles)
Create a script to insert into both tblEmployee and tblEmployeeRole
Once my test scenarios work, then I worry about combining each component step into one monolithic SSIS package or stored procedure.
I suggest something similar here. Break this import task into small pieces and worry about the grand design later. SSIS, procs, compiled code...any of these might work for you. You just need to know what you need it to do.
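If the "import as-is" step ends up in C# rather than SSIS, a minimal sketch with SqlBulkCopy into a staging table could look like this. The stg_Employee table, the absence of a header row and the connection handling are all assumptions; the raw roles column can then be split into tblRoles and tblEmployeeRole with set-based SQL or a procedure.
using System.Data;
using System.Data.SqlClient;
using System.IO;

static class CsvStagingLoader
{
    // Load the raw .csv rows, unparsed, into a staging table; the role splitting
    // into tblRoles / tblEmployeeRole is then done on the database side.
    public static void Load(string csvPath, string connectionString)
    {
        // staging table mirrors the raw file layout: name;surname;id;roles
        var staging = new DataTable();
        staging.Columns.Add("name");
        staging.Columns.Add("surname");
        staging.Columns.Add("id");
        staging.Columns.Add("roles");   // still the raw "role1,role2,role3" string

        foreach (string line in File.ReadLines(csvPath))
            staging.Rows.Add(line.Split(';'));

        using (var conn = new SqlConnection(connectionString))
        using (var bulk = new SqlBulkCopy(conn) { DestinationTableName = "dbo.stg_Employee" })
        {
            conn.Open();
            bulk.WriteToServer(staging);
        }
    }
}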
Depending upon your transformations they can all be done within SSIS. If you don't need to store the raw .csv data, I would stay away from stored procedures and temporary tables as you are bypassing a large portion of SSIS's strengths.
As an example, you can do look-ups on your incoming data to determine proper relationships and insert those results into multiple tables (your 3 in the example).
Looks like the task is very well suited to the bcp utility or the BULK INSERT command.

SaaS application needs to export/backup data to individual customer sites

We have a cloud based SaaS application and many of our customers (school systems) require that a backup of their data be stored on-site for them.
All of our application data is stored in a single MS SQL database. At the very top of the "hierarchy" we have an "Organization". This organization represents a single customer in our system. Each organization has many child tables/objects/data, each with FK relationships that ultimately lead back to "Organization".
We need a way to extract a SINGLE customer's data from the database and bundle it in some way so that it can be downloaded to the customer's site, preferably as a SQL Express, SQLite or Access database.
For example: Organization -> Skill Area -> Program -> Target -> Target Data are all tables in the system, each one linking back to its parent by an FK. I need to get all the target data, targets, programs and skill areas per organization and export that data.
Does anyone have any suggestions about how to do this within SQL Server, a C# service, or a third-party tool?
I need this solution to be easy to replicate for each customer who wants this feature "turned on"
Ideas?
I'm a big fan of using messaging to propagate data at the moment, so here's a message-based solution that will allow external customers to keep a local, in-sync copy of the data you provide on the web.
The basic architecture would be an online, password-secured and user-specific list of changes which have occurred in the system.
On the server side, this list would be appended to any time there was a change to an entity which is relevant to the specific customer.
On the client side, an application would run which checks the list of changes for any it hasn't yet received and then applies them to its local database (in the order they occurred).
There are a bunch of different ways of doing the list-based component of the system, but my gut feeling is that you would be best off using something like RSS.
Below is a practical scenario of how this could work:
A new skill area is created for organisation "my org"
The skill area is added to the central database and associated with the "my org" record
A SkillAreaExists event is also added at the same time to the "my org" RSS with JSON or XML data specifying the properties of the new skill area
A new program is added to the skill area that was just created
The program is added to the central database and associated with the skill area
A ProgramExists event is also added at the same time to the "my org" RSS with JSON or XML data specifying the properties of the new program
A SkillAreaHasProgram event is also added at the same time to the "my org" RSS with JSON or XML data specifying an identifier for the skill area and program
The client agent checks the RSS feed and sees the new messages and processes them in order
When the SkillAreaExists event is processed a new Skill area is added to the local DB
When the ProgramExists event is processed a new Program is added to the local DB
When the SkillAreaHasProgram event is processed the program is linked to the skill area
This approach has a whole bunch of benefits over traditional point-in-time replication.
It's online: a consumer can get realtime updates if required.
Consistency is maintained by ordering: at any point in the event stream, if you stop receiving events you have a local DB which accurately reflects the central DB as at some point in time.
It's diff-based: you only need to receive changes.
It's auditable: you can see what actually happened, not just the current state.
It's easily recoverable: if there's a data consistency issue you can rebuild the entire DB by replaying the event stream.
It allows for multiple consumers: lots of individual copies of the clients' info can exist and function autonomously.
We have had a great deal of success with these techniques for replicating data between sites especially when they are only sometimes online.
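To make the client side a bit more concrete, here is a very rough sketch of the "apply changes in order" loop. The event shape, the type names and how the feed is actually read are all assumptions, and the local-DB writes are left as comments.
using System.Collections.Generic;
using System.Linq;

class ChangeEvent
{
    public long Sequence { get; set; }      // position in the feed, defines ordering
    public string EventType { get; set; }   // e.g. "SkillAreaExists"
    public string Payload { get; set; }     // JSON or XML body describing the entity
}

static class ChangeFeedClient
{
    // Apply any events the local copy has not seen yet, strictly in order.
    public static long ApplyPending(IEnumerable<ChangeEvent> feed, long lastAppliedSequence)
    {
        foreach (var e in feed.Where(x => x.Sequence > lastAppliedSequence)
                              .OrderBy(x => x.Sequence))
        {
            switch (e.EventType)
            {
                case "SkillAreaExists":
                    // insert a SkillArea row in the local DB from e.Payload (not shown)
                    break;
                case "ProgramExists":
                    // insert a Program row in the local DB (not shown)
                    break;
                case "SkillAreaHasProgram":
                    // link the program to the skill area (not shown)
                    break;
            }
            lastAppliedSequence = e.Sequence;   // persist this so a restart resumes correctly
        }
        return lastAppliedSequence;
    }
}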
While there are some very interesting enterprise solutions that have been suggested, I think my approach would be to develop a plain old scheduled backup solution that simply exports the data for each organisation with a stored procedure or just a number of SELECT statements.
Admittedly you'll have to keep this up to date as your database schema changes, but if this is a production application I can't imagine that happens very drastically.
There are any number of technologies available to do this, be it SSIS, a custom windows service, or even something as rudimentary as a scheduled task that kicks off a stored procedure from the command line.
The format you choose to export to is entirely up to you and should probably be driven by how the backup is intended to be used. I might consider writing data to a number of CSV files and zipping the result such that it could be imported into other platforms should the need arise.
Other options might be to copy data across to a scratch database and then simply create a SQL backup of that database.
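A hedged sketch of the CSV-and-zip flavour of this: it assumes each child table carries (or can be filtered by) an OrganizationId column, and the table names, CSV quoting and error handling are all simplified for illustration.
using System;
using System.Data.SqlClient;
using System.IO;
using System.IO.Compression;

static class OrganizationExporter
{
    // Dump each per-organization table to a CSV file and zip the folder.
    public static void Export(string connectionString, int organizationId, string workDir)
    {
        string[] tables = { "SkillArea", "Program", "Target", "TargetData" };

        Directory.CreateDirectory(workDir);
        using (var conn = new SqlConnection(connectionString))
        {
            conn.Open();
            foreach (string table in tables)
            {
                using (var cmd = new SqlCommand(
                    "SELECT * FROM [" + table + "] WHERE OrganizationId = @org", conn))
                using (var writer = new StreamWriter(Path.Combine(workDir, table + ".csv")))
                {
                    cmd.Parameters.AddWithValue("@org", organizationId);
                    using (var reader = cmd.ExecuteReader())
                    {
                        while (reader.Read())
                        {
                            // naive CSV row: no quoting/escaping, good enough for a sketch
                            var values = new string[reader.FieldCount];
                            for (int i = 0; i < reader.FieldCount; i++)
                                values[i] = Convert.ToString(reader[i]);
                            writer.WriteLine(string.Join(",", values));
                        }
                    }
                }
            }
        }

        // bundle everything the customer needs into a single downloadable file
        ZipFile.CreateFromDirectory(workDir, workDir + ".zip");
    }
}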
However you choose to go about it, I would encourage you to ensure that the process is well documented and has as much automated installation and setup as possible. Systems with loosely coupled dependencies such as common file locations or scheduled tasks are prone to getting tweaked and changed over time. Without those tweaks and changes being recorded you can create a system that works but can't be replicated. Soon no one wants to touch it and no one remembers exactly how it works. When it eventually needs changing, or worse breaks, you have to start reverse engineering before you can fix it.
In a cloud based environment this is especially important because you want to be able to deploy as quickly as possible. If there is a lot of configuration that needs to be done you're likely to make mistakes or just be inconsistent. By creating a nuke-and-repave deployment you have a single point that you can change installation and configuration, safe in the knowledge that the change will be consistent across any deployment.
From what I understand, you have one large database for all the clients, you use relations which lead back to the Organization table to know which data belongs to which client, and you want to back up the data per client => organization.
To back up the data you can use one of the following methods:
As per the comments from @Phil and @Kris, you can use SSIS for automated backups; check this link for structure backup, and this link for how to export a query result to a file using SSIS, writing to an Access or SQL Server database instead of a file.
Build an application/service using C# to select the data and export it manually; it needs time, but customization has no limits.
Have you looked at StreamInsight?
http://www.microsoft.com/sqlserver/en/us/solutions-technologies/business-intelligence/complex-event-processing.aspx
The way I've dealt with backups of relational data in the past (in MySQL, which isn't super different in terms of capability from the MSSQL you're running) is to create a backup "package" file which is essentially a zip file with a different file extension so that Windows won't let users open it.
If you really want to get fancy, encrypt the file after zipping it and change the extension. I presume you're using ASP for your SaaS, and since I'm a PHP geek I can't help too much with the code side of things, but the way I've handled this before was with a script that would package an entire Joomla site and database for migration to a new server.
//open the MySQL connection
$dbc = mysql_connect($cfg->host, $cfg->user, $cfg->password);
//select the database
mysql_select_db($cfg->db, $dbc);

output("Getting database tables\n");

//get all the tables in the database
$tables = array();
$result = mysql_query('SHOW TABLES', $dbc);
while ($row = mysql_fetch_row($result)) {
    $tables[] = $row[0];
}

output('Found ' . count($tables) . " tables to be migrated.\nExporting tables:\n");

$return = "";
//cycle through the tables and get their create statements and data
foreach ($tables as $table) {
    $result = mysql_query('SELECT * FROM ' . $table);
    $num_fields = mysql_num_fields($result);

    $return .= 'DROP TABLE IF EXISTS ' . $table . ";\n";
    $row2 = mysql_fetch_row(mysql_query('SHOW CREATE TABLE ' . $table));
    $return .= $row2[1] . ";\n";

    while ($row = mysql_fetch_row($result)) {
        $return .= 'INSERT INTO ' . $table . ' VALUES(';
        for ($j = 0; $j < $num_fields; $j++) {
            $row[$j] = mysql_escape_string($row[$j]);
            $row[$j] = ereg_replace("\n", "\\n", $row[$j]);
            if (!empty($row[$j])) {
                $return .= "'" . $row[$j] . "'";
            } else {
                $return .= "NULL";
            }
            if ($j < ($num_fields - 1)) {
                $return .= ',';
            }
        }
        $return .= ");\n";
    }
}
That's the relevant portion of the PHP code that loops over the database structure and stores the recreation script in $return, which can then be output to a file.
In your case, you don't want to recreate the databases, but rather the data itself. You've compounded the issue slightly since you have a SaaS that is prone to possible data structure changes which you'll need to be able to account for. My suggestion would be this then:
Use a similar system to the above to dump the relevant data from the individual tables. I'm simply pulling all the data, but you could pull only the parts that pertain to the individual user by using JOIN statements and whatnot. Dump the contents of each table's insert/replace statements into a file named after the table. Create a file called manifest.xml or something of that sort and populate it with the current version of your SaaS application, name/information, unique ID, etc of the client exporting the data.
Package all those files into a ZIP file, change the extension to whatever you want, encrypt it if you desire, etc. Let them download that backup file and you're set.
In your import script, you will need to read the version number of the exported data and run it through whatever remapping logic handles the revisions you make later on. This way, if you need to re-import one of their backups later, you can correctly transition the data from the structure it had when they pulled the backup to the structure that table has now.
Hopefully that helps ;)
Because you keep all the data in just one database, it will always be difficult to export/back up data on a per-customer basis.
Even if you implement such scenario now, you will end up with two different places you need to maintain/change/test every time you change the database schema (fixing bugs, adding new features, optimization, etc).
I would recommend partitioning the data, say, by using a database per organization. Then you change your application just once (mainly around building a connection string for the specified organization), and you can safely export/back up each database separately in whatever way you want.
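For example, the connection-string part could be as small as this (the "AppDb_<OrganizationId>" naming convention is just an assumption for illustration):
using System.Data.SqlClient;

static class OrganizationConnections
{
    // One database per organization; the naming convention is an example only.
    public static string Build(string server, int organizationId)
    {
        var builder = new SqlConnectionStringBuilder
        {
            DataSource = server,
            InitialCatalog = "AppDb_" + organizationId,
            IntegratedSecurity = true
        };
        return builder.ConnectionString;
    }
}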
It also gives you a lot of extra benefits "for free", such as scalability and the ability to dedicate resources on a per-organization basis (should that be needed in the future).
Say you have a set of small, low-priority (from a business point of view) organizations and a big, high-priority one. You would then be able to keep the small, low-priority databases on one server but dedicate another server to that important big one.
Or if your current DB server is overloaded (perhaps you have A LOT of data and A LOT of requests to the database), you can simply get another cheap server and move half of the load without any changes in your system...
You still need to write something in order to split the existing big database into several small ones, but you do it just once, and after it is done this "migration tool" can be thrown away so you don't need to support it anymore.
Have you tried SyncFramework?
Have a look at this article!
It explains how to sync filtered data between databases using Sync Framework.
You can sync to the customer's database or sync to your own empty db and then export it as a file.
Have you thought about using an ORM (Object-Relational Mapper)?
I know, and use, LLBLGen Pro (so I can only talk about the features of this specific ORM).
Anyway, with LLBLGen you can reverse-engineer the DB and create a hierarchy of classes that maps the tables and relations of your DB.
Now, if all the data of a customer is reachable via relations, I can tell my ORM framework to load a single customer (one row of a specific table) and then load all the related data in the related tables.
If the data is not too complex, it should be possible.
If you have hundreds of self-referencing tables or strange relations, it may be undoable; it depends upon your data.
If all the data of a single customer is, say, 10'000 rows in 100 tables, it will probably work.
If all the data is 100'000 rows in 1'000 tables, it "may" work if you have some time and a lot of memory.
If all the data is 10'000'000 rows, you probably can't load it all at once, and you'll need a more efficient way.
Anyway, if you can load all the data at once, then you'll have a nice "in memory" graph with all the data of a single customer, and then you can serialize this data, or project it onto a DataSet (obtaining a set of DataTables/relations) and then serialize the DataSet.
Using an ORM to load and export all the data of a single customer as explained, probably, is not the most efficient way of doing things, but when doable it's a simple and cheap way.
Naturally, with or without an ORM, you can find hundreds of different ways to export this data :-)
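For the "project it onto a DataSet and serialize it" step mentioned above, the built-in DataSet XML support is usually enough; a tiny sketch (how the DataSet gets filled and where the file goes are up to you):
using System.Data;

static class CustomerGraphExport
{
    // Serialize the customer's data, projected onto a DataSet, together with
    // its schema so it can be reloaded or shipped elsewhere.
    public static void Save(DataSet customerData, string path)
    {
        customerData.WriteXml(path, XmlWriteMode.WriteSchema);
    }

    public static DataSet Load(string path)
    {
        var ds = new DataSet();
        ds.ReadXml(path, XmlReadMode.ReadSchema);
        return ds;
    }
}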
For your design, you should have sharded your database per customer.
However, as you have already developed the database design, I suggest you create a temp database and create the new tables in this temp database using the FK relationships.
For this, you need to sort the tables based on the FK relationships and create them in the temp database.
Then select the table data from the source database and insert it into the temp database.
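In rough terms, each copy step could be a cross-database INSERT...SELECT run in FK order; everything below (database and table names, the OrganizationId filter, and ignoring identity columns and explicit column lists) is an assumption for illustration only.
using System.Data.SqlClient;

static class TempDatabaseCopier
{
    // Copy one customer's rows into the temp database, parents before children,
    // so the FK constraints in the temp database are satisfied.
    public static void Copy(string connectionString, int organizationId)
    {
        string[] tablesInFkOrder = { "Organization", "SkillArea", "Program", "Target", "TargetData" };

        using (var conn = new SqlConnection(connectionString))
        {
            conn.Open();
            foreach (string table in tablesInFkOrder)
            {
                using (var cmd = new SqlCommand(
                    "INSERT INTO TempExportDb.dbo.[" + table + "] " +
                    "SELECT * FROM SourceDb.dbo.[" + table + "] WHERE OrganizationId = @org", conn))
                {
                    cmd.Parameters.AddWithValue("@org", organizationId);
                    cmd.ExecuteNonQuery();
                }
            }
        }
    }
}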
You can also use this technique to shard your database and revamp your database design.
Aravind
