Migrating DB and Entity Framework - C#

I have to take data from an existing database and move it into a new database with a new design, so the new database has different columns and tables than the old one.
Basically I need to read the tables from the old database and put that data into the new structure; some data won't be used anymore and other data will end up in different columns or tables.
My plan was to just read the data from the old database with basic queries like
SELECT * FROM mytable
and use Entity Framework to map the new database structure. Then I can do something similar to this:
while (result.Read())
{
    context.Customer.Add(new Customer
    {
        Description = (string) result["CustomerDescription"],
        Address = (string) result["CuAdress"],
        // and so on for all properties
    });
}
context.SaveChanges();
I think it is more convenient to do it like this to avoid writing massive INSERT statements, but are there any problems with this approach? Is it considered bad for some reason I don't understand, such as poor performance or other pitfalls? Any input would be appreciated, so I don't start down this path only to find out it's a big no-no for some reason.

Something you could also try is to write a new DbContext class for the new target database.
Then write a console application with a static method which copies entities and properties from one context to the other (a rough sketch follows below).
This will ensure that your referential integrity remains intact and will save you a lot of hassle in terms of having to write SQL, since EF does all the heavy lifting for you in this regard.
If the DbContext contains a lot of entity DbSets, I recommend that you use some sort of auto-mapper.
But this depends on the amount of data that you are trying to move. If we are talking terabytes, I would suggest you do not take this approach.
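A minimal sketch of that console-application approach, assuming hypothetical OldDbContext/NewDbContext classes and a Customer entity in each model (all names here are placeholders, not from the original post):

static void CopyCustomers()
{
    using (var oldDb = new OldDbContext())
    using (var newDb = new NewDbContext())
    {
        // AsNoTracking keeps EF from tracking the source entities we only read.
        foreach (var oldCustomer in oldDb.Customers.AsNoTracking())
        {
            newDb.Customers.Add(new Customer
            {
                Description = oldCustomer.CustomerDescription,
                Address = oldCustomer.Address
                // map the remaining properties here
            });
        }
        newDb.SaveChanges(); // one SaveChanges per batch keeps the insert reasonably fast
    }
}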

Related

Manipulating large quantities of data in ASP.NET MVC 5

I am currently working towards implementing a charting library with a database that contains a large amount of data. For the table I am using, the raw data is spread out across 148 columns of data, with over 1000 rows. As I have only created models for tables that contain a few columns, I am unsure how to go about implementing a model for this particular table. My usual method of creating a model and using the Entity Framework to connect it to a database doesn't seem practical, as implementing 148 properties for each column does not seem like an efficient method.
My questions are:
What would be a good method to implement this table into an MVC project so that there are read actions that allow one to pull the data from the table?
How would one structure a model so that one could read 148 columns of data from it without having to declare 148 properties?
Is the Entity Framework an efficient way of achieving this goal?
Entity Framework Database First sounds like the perfect solution for your problem.
Database First models are what they sound like: the database exists before the code does. Entity Framework will create the models as partial classes for you based on the tables you point it at.
Additionally, exceptions won't be thrown if the table changes (as long as nothing is accessing a field that doesn't exist), which can be extremely beneficial in a lot of cases. Migrations are not necessary. Instead, all you have to do is right click on the generated model and click "Update Model from Database" and it works like magic. The whole process can be significantly faster than Code First.
Here is another tutorial to help you.
Yes, with Database First you can create the entities very quickly. Also remember that it is good practice to return only the fields you really need: your entity has 148 columns, but if your app only needs 10 fields, convert the original entity to a model or view model and use that (a rough sketch follows below).
One excellent tool that can help you with this is AutoMapper.
Regards,
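A minimal sketch of that entity-to-view-model conversion with AutoMapper, assuming a ChartingData entity and a hypothetical ChartingDataViewModel holding only the fields the app needs (names are placeholders; the MapperConfiguration API shown is for recent AutoMapper versions, adjust to yours):

// Hypothetical view model with just the handful of columns the charts use.
public class ChartingDataViewModel
{
    public int Id { get; set; }
    public string Title { get; set; }
    public decimal Value { get; set; }
}

// Configure the mapping once at startup; source columns without a matching
// view-model property are simply not copied.
var config = new MapperConfiguration(cfg =>
    cfg.CreateMap<ChartingData, ChartingDataViewModel>());
var mapper = config.CreateMapper();

// Convert a loaded entity to the slim view model before handing it to the view.
ChartingDataViewModel vm = mapper.Map<ChartingDataViewModel>(entity);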
Wow, that's a lot of columns!
Given your circumstances a few thoughts come to mind:
1: If your problem is the legwork of creating that many properties, you could look at Entity Framework Power Tools, which can reverse engineer a database and create the necessary models/entity relation mappings for you, saving you a lot of the grunt work.
To save you pulling all of that data out in one go you can then use projections like so:
var result = DbContext.ChartingData.Select(x => new PartialDto {
    Property1 = x.Column1,
    Property50 = x.Column50,
    Property109 = x.Column109
});
A tool like AutoMapper will allow you to do this with ease via simple, configurable mapping profiles:
var result = DbContext.ChartingData.Project().To<PartialDto>().ToList();
2: If you have concerns with the performance of manipulating such large entities through Entity Framework then you could also look at using something like Dapper (which will happily work alongside Entity Framework).
This would save you the hassle of modelling the entities for the larger tables but allow you to easily query/update specific columns:
public class ModelledDataColumns
{
    public string Property1 { get; set; }
    public string Property50 { get; set; }
    public string Property109 { get; set; }
}

const string sqlCommand = "SELECT Property1, Property50, Property109 FROM YourTable WHERE Id = @Id";
IEnumerable<ModelledDataColumns> collection = connection.Query<ModelledDataColumns>(sqlCommand, new { Id = 5 }).ToList();
Ultimately if you're keen to go the Entity Framework route then as far as I'm aware there's no way to pull that data from the database without having to create all of the properties one way or another.

Each row of a DataTable as a Class?

I have a "location" class. This class basically holds addresses, hence the term "location". I have a datatable that returns multiple records which I want to be "locations".
Right now, I have a "load" method in the "location" class, that gets a single "location" by ID. But what do I do when I want to have a collection of "location" objects from multiple datatable rows? Each row would be a "location".
I don't want to go to the database for each record, for obvious reasons. Do I simply create a new instance of the location class, assigning values to the properties while looping through the rows in the DataTable, bypassing the "load" method? It seems logical to me, but I am not sure if it is the correct/most efficient approach.
That (your description) is pretty much how a row (or a collection of rows) of data gets mapped to C# business objects. But to save yourself a lot of work you should consider one of a number of existing ORM (object-relational mapper) frameworks such as NHibernate, Entity Framework, Castle ActiveRecord etc.
Most ORMs will actually generate all the boilerplate code where rows and fields are mapped to your .NET object properties and vice-versa. (Yes, ORMs allow you to add, update and delete db data just as easily as retrieving and mapping it.) Do give the ORMs a look. The small amount of learning (there is some learning curve with each) will pay off very shortly. ORMs are also becoming quite standard and indeed expected in any application that touches an RDBMS.
Additionally these links may be of interest (ORM-related):
Wikipedia article on ORMs
SO Discussion on different ORMs
Many different .NET ORMs listed
You're on the right track, getting all the locations you need with one trip to the database would be best in terms of performance.
To make your code cleaner/shorter, make a constructor of your Location class that takes a DataRow, which will then set your properties accordingly. By doing this, you'll centralize your mapping from columns to properties in one place in your code base, which will be easy to maintain.
Then, it's totally acceptable to loop through the rows in your data table and call your constructor.
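A minimal sketch of that constructor-based mapping, assuming a Location class with Id and Address properties and a DataTable whose columns are named "Id" and "Address" (column and property names are placeholders):

public class Location
{
    public int Id { get; set; }
    public string Address { get; set; }

    // Centralizes the column-to-property mapping in one place.
    public Location(DataRow row)
    {
        Id = (int)row["Id"];
        Address = (string)row["Address"];
    }
}

// Loop over the rows returned by a single database call.
var locations = new List<Location>();
foreach (DataRow row in table.Rows)
{
    locations.Add(new Location(row));
}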
You could also use an object to relational mapper, like Entity Framework to do your database interaction.
Create a method that returns an IEnumerable<Location>. In this method do your database work; I often pass the SqlDataReader into the Location constructor. So I would have something like this:
public static IEnumerable<Location> GetLocations()
{
    var retval = new List<Location>();
    using (var conn = new SqlConnection(connectionString))
    {
        conn.Open();
        var command = new SqlCommand("spLoadData", conn);
        command.CommandType = CommandType.StoredProcedure;
        using (SqlDataReader reader = command.ExecuteReader())
        {
            while (reader.Read())
            {
                retval.Add(new Location(reader));
            }
        }
    }
    return retval;
}
That's just to give you an idea; fill in your own connection string and stored procedure name.
An ORM mapper could save you loads of time if you have lots to do however!
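For comparison, a minimal EF sketch of the ORM alternative mentioned above, assuming a hypothetical LocationsContext with a DbSet<Location> (names are illustrative):

using (var context = new LocationsContext())
{
    // EF maps the rows to Location objects for you; no DataReader plumbing needed.
    List<Location> locations = context.Locations.ToList();
}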

NHibernate Mapping Sanity Checks

Currently our new database design is changing rapidly and I don't always have time to keep up to date with the latest changes being made. Therefore I would like to create some basic integration tests that are basically sanity checks on my mappings against the database.
Here are a few of the things I'd like to accomplish in these tests:
Detect columns I have not defined in my mapping but exist in the database
Detect columns I have mapped but do NOT exist in the database
Detect columns that I have mapped where the data types between the database and my business objects no longer jive with each other
Detect column name changes between database and my mapping
I found the following article by Ayende, but I just want to see what other people out there are doing to handle these sorts of things. Basically I'm looking for simplified tests that cover a lot of my mappings but do not require me to write separate queries for every business object in my mappings.
I'm happy with this test, which is based on the one Ayende proposed:
[Test]
public void PerformSanityCheck()
{
    foreach (var s in NHHelper.Instance.GetConfig().ClassMappings)
    {
        Console.WriteLine(" *************** " + s.MappedClass.Name);
        NHHelper.Instance.CurrentSession
            .CreateQuery(string.Format("from {0} e", s.MappedClass.Name))
            .SetFirstResult(0).SetMaxResults(50).List();
    }
}
I'm using a plain old HQL query since this version comes from a very old project and I'm too lazy to update it to QueryOver or Linq2NH or something else...
It basically pings every mapped entity in the configuration and grabs some data too, in order to check that all is ok. It does not catch the case where a field exists in the table but not in the mapping, which can cause persistence problems if the column is not nullable.
I'm aware that Fabio Maulo has something possibly more accurate.
As a personal consideration, if you are thinking about improving on this, I would try the following strategy: since the mappings are browsable through the API, look for any explicit/implicit table declaration in the map and check it against the database using the standard schema helper classes inside NHibernate (they ultimately use the ADO.NET schema classes, but they insulate you from all the configuration work already done in NHibernate itself). By playing a little with the naming strategy you can achieve a one-by-one table/field checklist (a rough sketch follows below). A further improvement could be, when a field does not match, to look for a candidate by applying the Levenshtein distance to all the available names and picking one if some threshold requirements are satisfied. This is of course useless in class-first scenarios where the DB schema is generated by NHibernate itself.
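A rough sketch of that mapping-vs-schema check, assuming SQL Server, an available connection string, and the NHHelper from the test above exposing the NHibernate Configuration (everything else uses standard NHibernate.Mapping and ADO.NET schema APIs):

// For each mapped table, compare the mapped column names against the
// columns reported by the database's ADO.NET schema.
var cfg = NHHelper.Instance.GetConfig(); // NHibernate.Cfg.Configuration
using (var connection = new SqlConnection(connectionString))
{
    connection.Open();
    foreach (var persistentClass in cfg.ClassMappings)
    {
        var table = persistentClass.Table;
        var mappedColumns = table.ColumnIterator.Select(c => c.Name).ToList();

        // GetSchema("Columns") restrictions: catalog, schema, table, column.
        var schema = connection.GetSchema("Columns", new[] { null, null, table.Name, null });
        var dbColumns = schema.Rows.Cast<DataRow>()
                              .Select(r => (string)r["COLUMN_NAME"])
                              .ToList();

        var notInDb = mappedColumns.Except(dbColumns, StringComparer.OrdinalIgnoreCase);
        var notMapped = dbColumns.Except(mappedColumns, StringComparer.OrdinalIgnoreCase);

        Console.WriteLine("{0} -> mapped but missing in DB: [{1}]; in DB but unmapped: [{2}]",
            table.Name, string.Join(", ", notInDb), string.Join(", ", notMapped));
    }
}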
I use this one too:
Verifying NHibernate Entities Contain Only Virtual Members

Database Best-Practices for Beginners

So, I am a fairly new programmer working towards an undergraduate Comp Sci degree with a very small amount of work experience. In looking for internship-type jobs for my program, I have noticed that what I've heard from several profs -- "working with databases makes up 90% of all modern computer science jobs" -- looks like it is actually true. However, my program doesn't really have any courses with databases until 3rd year, so I'm trying to at least learn some things myself in the mean time.
I've seen very little on SO and the internet in general for somebody like myself. There seem to be tons of tutorials on the mechanics of how to read and write data in a database, but little on the associated best practices. To demonstrate what I am talking about, and to help get across my actual question, here is what can easily be found on the internet:
public static void Main ()
{
    using (var conn = new OdbcConnection())
    {
        var command = new OdbcCommand();
        command.Connection = conn;
        command.CommandText = "SELECT * FROM Customer WHERE id = 1";
        var dbAdapter = new OdbcDataAdapter();
        dbAdapter.SelectCommand = command;
        var results = new DataTable();
        dbAdapter.Fill(results);

        // then you would do something like
        string customerName = (string) results.Rows[0]["name"];
    }
}
And so forth. This is pretty simple to understand but obviously full of problems. I started out with code like this and quickly started saying things like "well it seems dumb to just have SQL all over the place, I should put all that in a constants file." And then I realized that it was silly to have those same lines of code all over the place and just put all that stuff with connection objects etc inside a method:
public DataTable GetTableFromDB (string sql)
{
    // code similar to first sample
}

string getCustomerSql = String.Format(Constants.SelectAllFromCustomer, customerId);
DataTable customer = GetTableFromDB(getCustomerSql);
string customerName = (string) customer.Rows[0]["name"];
This seemed to be a big improvement. Now it's super-easy to, say, change from an OdbcConnection to an SQLiteConnection. But that last line, accessing the data, still seemed awkward; and it is still a pain to change a field name (like going from "name" to "CustName" or something). I started reading about using typed Data sets or custom business objects. I'm still kind of confused by all the terminology, but decided to look into it anyway. I figure that it is stupid to rely on a shiny Database Wizard to do all this stuff for me (like in the linked articles) before I actually learn what is going on, and why. So I took a stab at it myself and started getting things like:
public class Customer
{
    public string Name { get; set; }
    public int Id { get; set; }

    public void Populate ()
    {
        string getCustomerSql = String.Format(Constants.SelectAllFromCustomer, this.Id);
        DataTable customer = GetTableFromDB(getCustomerSql);
        this.Name = (string) customer.Rows[0]["name"];
    }

    public static IEnumerable<Customer> GetAll()
    {
        foreach ( ... )
        {
            // blah blah
            yield return customer;
        }
    }
}
to hide the ugly table stuff and provide some strong typing, allowing outside code to just do things like
var customer = new Customer(custId);
customer.Populate();
string customerName = customer.Name;
which is really nice. And if the Customer table changes, changes in the code only need to happen in one place: inside the Customer class.
So, at the end of all this rambling, my question is this. Has my slow evolution of database code been going in the right direction? And where do I go next? This style is all well and good for small-ish databases, but when there are tons of different tables, writing out all those classes for each one would be a pain. I have heard about software that can generate that type of code for you, but am kind of still confused by the DAL/ORM/LINQ2SQL/etc jargon and those huge pieces of software are kind of overwhelming. I'm looking for some good not-overwhelmingly-complex resources that can point me in the right direction. All I can find on this topic are complex articles that go way over my head, or articles that just show you how to use the point-and-click wizards in Visual Studio and such. Also note that I'm looking for information on working with Databases in code, not information on Database design/normalization...there's lots of good material on that out there.
Thanks for reading this giant wall of text.
Very good question indeed and you are certainly on the right track!
Being a computer engineer myself, databases and how to write code to interact with databases was also never a big part of my university degree and sure enough I'm responsible for all the database code at work.
Here's my experience, using legacy technology from the early 90s on one project and modern technology with C# and WPF on another.
I'll do my best to explain terminology as I go but I'm certainly not an expert myself yet.
Tables, Objects, and Mappings Oh My!
A database contains tables, but what really is a table? It's just flat data related to other flat data, and if you dive in and start grabbing things it's going to get messy quickly! Strings will be all over the place, SQL statements repeated, records loaded twice, etc. It's therefore generally good practice to represent each table record (or collection of table records, depending on their relationships) as a single object, generally referred to as a Model. This helps to encapsulate the data and provides functionality for maintaining and updating its state.
In your posting your Customer class would act as the Model! So you've already realized that benefit.
Now there are a variety of tools/frameworks (LINQ2SQL, dotConnect, Mindscape LightSpeed) that will write all your Model code for you. In the end they are mapping objects to relational tables, or O/R mapping as it's usually referred to.
As expected, when your database changes so do your O/R mappings. Like you touched on, if your Customer changes you only have to fix it in one place, which again is why we put things in classes. In the case of my legacy project, updating models consumed a lot of time because there were so many of them, while in my newer project it's a few clicks, but ultimately the result is the same.
Who should know what?
In my two projects there has been two different ways of how objects interact with their tables.
In some camps, Models should know everything about their tables: how to save themselves, with direct shared access to the connection/session, so they can perform actions like Customer.Delete() and Customer.Save() all by themselves.
Other camps put the reading, writing and deleting logic in a managing class, for example MySessionManager.Save(myCustomer). This methodology has the advantage of making it easy to implement change tracking on objects and to ensure all objects reference the same underlying table record. Implementing it, however, is more complex than the previously mentioned method of localized class/table logic. A small sketch of both styles follows below.
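A minimal sketch contrasting the two camps, using a hypothetical Customer model and SessionManager class (names and method bodies are illustrative, not from any specific framework):

// Camp 1: the model knows how to persist itself (active-record style).
public class Customer
{
    public int Id { get; set; }
    public string Name { get; set; }

    public void Save()   { /* open a connection, INSERT/UPDATE this record */ }
    public void Delete() { /* DELETE this record */ }
}

// Camp 2: a managing class owns the session/connection and the persistence logic.
public class SessionManager
{
    public void Save(Customer customer)   { /* track changes, INSERT/UPDATE */ }
    public void Delete(Customer customer) { /* DELETE */ }
}

// Camp 2 usage:
// var manager = new SessionManager();
// manager.Save(myCustomer);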
Conclusion
You're on the right track and in my opinion interacting with databases is extremely rewarding. I can remember my head spinning when I first started doing research myself.
I would recommend experimenting a bit, start a small project maybe a simple invoicing system, and try writing the models yourself. After that try another small project and try leveraging a database O/R mapping tool and see the difference.
Your evolution is definitely in the right direction. A few more things to consider:
Use prepared statements with bound parameters instead of String.Format. This will protect you from SQL injection attacks (see the sketch after this list).
Use DbProviderFactory and the System.Data.Common interfaces to further decouple your implementation from a specific database.
After that, look at methods to generate your SQL commands and map data into objects automatically. If you don't want to jump into a big complex ORM, look for simple examples: ADO.NET ORM in 10 minutes, Light ORM library, or Creating an ORM in .NET. If you decide to go this route, you'll ultimately be better served by a mature library like the Entity Framework, Hibernate, or SubSonic.
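A small sketch combining both suggestions, using the provider-agnostic System.Data.Common types and a bound parameter (the provider invariant name, connection string and customerId are placeholders for your environment):

// Resolve a provider-agnostic factory; swap the invariant name to change databases.
DbProviderFactory factory = DbProviderFactories.GetFactory("System.Data.Odbc");

using (DbConnection conn = factory.CreateConnection())
{
    conn.ConnectionString = connectionString;
    conn.Open();

    using (DbCommand command = conn.CreateCommand())
    {
        // The value is bound as a parameter, never concatenated into the SQL string.
        command.CommandText = "SELECT name FROM Customer WHERE id = ?";
        DbParameter idParam = command.CreateParameter();
        idParam.Value = customerId;
        command.Parameters.Add(idParam);

        object name = command.ExecuteScalar();
    }
}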
My advice, if you want to learn about databases: first forget about the programming language, then forget about which database you are using, and learn SQL. Sure, there are many differences between MySQL, MS SQL Server and Oracle, but there is so much that is the same.
Learn about joins, SELECT ... AS, date formats, normalization. Learn what happens when you have millions and millions of records and things start to slow down, then learn how to fix it.
Create a test project related to something that interests you, for example a bike store. See what happens when you add a few million products, and a few million customers and think of all the ways the data needs to relate.
Use a desktop app for running queries on a local database (Sequel Pro, MySQL Workbench, etc.) as it's much quicker than uploading source code to a server. And have fun with it!
IMHO, you're definitely going in the right direction for maintainable code that is nice to work with! However, I'm not convinced the approach will scale to a real app. A few thoughts that may be helpful:
While the code you're writing will be really nice to work with and really maintainable, it involves a lot of up-front work; this is part of the reason the wizards are so popular. They aren't the nicest thing to work with, but they save a lot of time.
Querying from the database is just the beginning; another reason for the use of typed DataSets and wizards in general is that in most applications, users are at some stage going to edit your information and send it back for updating. Single records are fine, but what if your data is best represented in a normalised way with a hierarchy of tables four deep? Writing the update/insert/delete statements for all of that by hand can be hellish, so tools are the only way forward. Typed DataSets will generate all the code to perform these updates for you and have some very powerful functionality for handling disconnected (e.g. client-side) updates and rollbacks of recent modifications (a small untyped sketch of that round trip follows after this list).
Heed what the other answers said about SQL injection (which is a SERIOUSLY big deal in industry): protect yourself by using a DbCommand object and adding DbParameters.
In general there's a really big problem in going from code to databases, referred to as the object-relational impedance mismatch. Bridging the gap is very tricky, which is why the majority of industry relies on tools to do the heavy lifting. My advice would be to try the wizards out, because while stepping through a wizard is no test of skill, learning all their drawbacks/bugs and their various workarounds is a really useful skill in industry, and will let you get to more advanced data-management scenarios more quickly (e.g. the disconnected update of a 4-deep table hierarchy I mentioned).
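A small untyped-DataSet sketch of the disconnected round trip that typed DataSets automate for you, assuming an open OdbcConnection named conn (the SELECT must include the primary key for the command builder to generate update commands):

// Fill a disconnected copy, edit it client-side, then push the changes back.
var adapter = new OdbcDataAdapter("SELECT id, name FROM Customer", conn);
var builder = new OdbcCommandBuilder(adapter); // generates INSERT/UPDATE/DELETE for us

var table = new DataTable();
adapter.Fill(table);                 // disconnected copy of the data

table.Rows[0]["name"] = "New name";  // edit while disconnected
// table.RejectChanges();            // or roll pending edits back instead

adapter.Update(table);               // writes the changed rows back to the database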
If you're a bit scared of things like Linq to SQL and the Entity Framework, you could step half way in between and explore something like iBATIS.NET. It is simply a data mapper tool that takes some of the pain of the database connection management and mapping your result sets to custom domain objects.
You still have to write all of your object classes and SQL, but it maps all of your data to the classes for you using reflection, and you don't have to worry about all of the underlying connectivity (you could easily write a tool to generate your classes). When you're up and running with iBATIS (assuming you might be interested), your code will start to look like this:
var customer = Helpers.Customers.SelectByCustomerID(1);
That SelectByCustomerID function exists inside the Customers mapper, whose definition might look like:
public Customer SelectByCustomerID(int id)
{
    return Mapper.QueryForObject<Customer>("Customers.SelectByID", id);
}
The "Customers.SelectByID" maps to an XML statement definition where "Customers" is the namespace and "SelectByID" is the ID of the map containing your SQL:
<statements>
  <select id="SelectByID" parameterClass="int" resultClass="Customer">
    SELECT * FROM Customers WHERE ID = #value#
  </select>
</statements>
Or when you want to change a customer you can do things like:
customer.FirstName = "George"
customer.LastName = "Costanza"
Helpers.Customers.Update(customer);
LINQ to SQL and the Entity Framework get fancier by producing the SQL for you automatically. I like iBATIS because I still have full control of the SQL and what my domain objects look like.
Check out iBATIS (now migrated to Google under the name MyBatis.NET). Another great package is NHibernate, which is a few steps ahead of iBATIS and closer to a full ORM.
Visual page of database just with combobox and datagrid
namespace TestDatabase.Model
{
    class Database
    {
        private MySqlConnection connecting;
        private MySqlDataAdapter adapter;

        public Database()
        {
            connecting = new MySqlConnection("server=;uid=;pwd=;database=;");
            connecting.Open();
        }

        public DataTable GetTable(string tableName)
        {
            adapter = new MySqlDataAdapter("SELECT * FROM " + tableName, connecting);
            DataSet ds = new DataSet();
            adapter.Fill(ds);
            adapter.UpdateCommand = new MySqlCommandBuilder(adapter).GetUpdateCommand();
            adapter.DeleteCommand = new MySqlCommandBuilder(adapter).GetDeleteCommand();
            ds.Tables[0].RowChanged += new DataRowChangeEventHandler(Rowchanged);
            ds.Tables[0].RowDeleted += new DataRowChangeEventHandler(Rowchanged);
            return ds.Tables[0];
        }

        public void Rowchanged(object sender, DataRowChangeEventArgs args)
        {
            adapter.Update(sender as DataTable);
        }
    }
}
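A short usage sketch for this helper, assuming a WinForms page with a ComboBox of table names and a DataGridView (control names are placeholders):

// Pick a table in the combo box and show it in the grid; edits made in the grid
// fire RowChanged/RowDeleted, which push updates back through the adapter.
var db = new TestDatabase.Model.Database();

tableComboBox.SelectedIndexChanged += (s, e) =>
{
    string tableName = (string)tableComboBox.SelectedItem;
    dataGridView1.DataSource = db.GetTable(tableName);
};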

NHibernate and Modular Code

We're developing an application using NHibernate as the data access layer.
One of the things I'm struggling with is finding a way to map 2 objects to the same table.
We have an object which is suited to data entry, and another which is used in more of a batch process.
The table contains all the columns for the data entry and some additional info for the batch processes.
When it's in a batch process I don't want to load all the data just a subset, but I want to be able to update the values in the table.
Does NHibernate support multiple objects pointing at the same table? And what is the feature that allows this?
I tried it a while ago and I remember that if you do a query for one of the objects it loads double the amount, but I'm not so sure I didn't miss something.
e.g.
10 data entry objects
+
10 batch objects
So 20 objects instead of 10.
Can anyone shed any light on this?
I should clarify that these are two different objects which in my mind should not be polymorphic in behaviour. However, they do point at the same database record; it's more that the record has a dual purpose within the application and, for the sake of logical partitioning, the objects should be kept separate. (A change to one domain object should not blow up numerous screens in other modules, etc.)
Thanks
Pete
An easy way to map multiple objects to the same table is by using a discriminator column. Add an extra column to the table and have it contain a value declaring it as type "Data Entry" or "Batch Process".
You'd create two classes, one for Data Entry and one for Batch Process. I'm not entirely sure how you declare that in regular NHibernate XML mapping; I use Castle ActiveRecord attributes, so you'd mark up your objects like so:
[ActiveRecord("[Big Honking Table]",
    DiscriminatorColumn = "Type",
    DiscriminatorType = "String",
    DiscriminatorValue = "Data Entry")]
public class DataEntry : ActiveRecordBase
{
    // Your stuff here!
}

[ActiveRecord("[Big Honking Table]",
    DiscriminatorColumn = "Type",
    DiscriminatorType = "String",
    DiscriminatorValue = "Batch Process")]
public class BatchProcess : ActiveRecordBase
{
    // Also your stuff!
}
Here's the way to do it with NHibernate + Castle ActiveRecord: http://www.castleproject.org/activerecord/documentation/trunk/usersguide/typehierarchy.html
Note that they use a parent object - I don't think that's necessary but I haven't implemented a discriminator column exactly the way you're describing, so it might be.
And here's the mapping in XML: https://www.hibernate.org/hib_docs/nhibernate/html/inheritance.html
You can also, through the mapping, let NHibernate know which columns to load / update - if you end up just making one big object.
I suspect you might be overengineering this just a little bit:
If you are worried about performance, that's premature optimization (besides, retrieving fewer columns is not much faster, and for saving you can enable dynamic updates so that only the columns that changed are written).
If you are trying to protect the programmer from himself by locking down his choices, you are complicating your design for a not-so-noble cause.
In short, based on my 10+ years of experience and my somewhat limited understanding of your problem, I recommend you think again before doing what you want to do.
