I'm learning about EF Code First. I've read the Database Initialization Strategies in Code-First and Understanding Database Initializers in Entity Framework Code First articles.
But I'm still confused: where should the data live before initialization? Those articles didn't mention that, and I think it's pretty important.
I'm building a football application, so while initializing the app I'd like to insert every team and player name into the database (every team in Europe from the best leagues, so quite a lot of data). They will not change. Should they be hard-coded in the source code? Attached as XML? A simple file?
Right now, before running, there is an initialization prompt ("Please wait, initializing...") while I read the file line by line and insert the rows into the database. Is this a good way?
It depends on where you are in the development process. You can use a Seed() method on the initializer, which runs when the database is created. So if you have a ton of data and will be frequently changing your models with a DropCreate_____ type of initializer, I would recommend just seeding a small amount of data to test with.
When you are happy with your initial design, you can do a full Seed() and switch to database migrations to handle your model changes. This will keep your existing data, and migrations has its own Seed() method for new data.
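For example, a minimal sketch of a Seed() override on an initializer (the FootballContext, Team, and sample rows are made-up names for your scenario, not anything prescribed by EF):

using System.Data.Entity;

// Hypothetical model and context, just for illustration.
public class Team
{
    public int Id { get; set; }
    public string Name { get; set; }
    public string Country { get; set; }
}

public class FootballContext : DbContext
{
    public DbSet<Team> Teams { get; set; }
}

// Seeds a handful of rows whenever the database is (re)created.
public class FootballInitializer : DropCreateDatabaseIfModelChanges<FootballContext>
{
    protected override void Seed(FootballContext context)
    {
        // Keep this list small while the model is still changing often.
        context.Teams.Add(new Team { Name = "FC Barcelona", Country = "Spain" });
        context.Teams.Add(new Team { Name = "Manchester United", Country = "England" });
        context.SaveChanges();
    }
}

// Registered once at application start-up:
// Database.SetInitializer(new FootballInitializer());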
As to where to get the data, you can look at something like the resources below, where you can fetch from a web service or download the data as CSV, XML, etc. http://www.jokecamp.com/blog/guide-to-football-and-soccer-data-and-apis/#footballdata or http://openfootball.github.io/
I want to use the Code First technique of EF6, but when I make changes to a table, it drops the database and recreates it, wiping out all data.
Is there any way to stop this from happening?
My code:
Database.SetInitializer<EmployeeDb>(new DropCreateDatabaseIfModelChanges<EmployeeDb>());
These are the strategies for database initialization in the code-first approach:
CreateDatabaseIfNotExists: This is the default initializer. As the name suggests, it will create the database if none exists as per the configuration. However, if you change the model classes and then run the application with this initializer, it will throw an exception.
DropCreateDatabaseIfModelChanges: This initializer drops an existing database and creates a new database, if your model classes (entity classes) have been changed. So you don't have to worry about maintaining your database schema, when your model classes change.
DropCreateDatabaseAlways: As the name suggests, this initializer drops an existing database every time you run the application, irrespective of whether your model classes have changed or not. This will be useful, when you want fresh database, every time you run the application, like while you are developing the application.
Custom DB Initializer: You can also create your own custom initializer if none of the above satisfies your requirements, or if you want to run some additional process of your own while initializing the database using one of the initializers above.
This should give you a general idea of how to use each of these approaches.
Based on your comments, CreateDatabaseIfNotExists should help you. With this approach the database is not dropped when you add or remove model classes, so your data stays stable.
Here you can find examples of setting the initializer both in the context constructor and in the config file.
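For example, a rough sketch of the code-based approach, reusing the EmployeeDb context from your snippet (the Employee entity here is just a placeholder):

using System.Data.Entity;

public class Employee
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public class EmployeeDb : DbContext
{
    static EmployeeDb()
    {
        // Runs once per AppDomain; never drops the database, only creates it when missing.
        Database.SetInitializer(new CreateDatabaseIfNotExists<EmployeeDb>());
    }

    public DbSet<Employee> Employees { get; set; }
}

The config-file variant registers the same initializer under the <entityFramework><contexts> element instead, which is handy when you want to switch initializers per environment without recompiling.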
Another topic about this on stackoverflow.
I have tried lots of variations of EF migrations v6.0.1 (from no database, to empty databases, to existing databases), and I have a particular problem with Azure DB instances not being created correctly on first deploy using Octopus Deploy.
There are a number of places where this could be going wrong, so I thought I would check some basics of EF Code First migration with you fine people if I may:
Suppose I create a code-first model and know that the database does not exist on the intended target database server in Azure, using the default 'CreateDatabaseIfNotExists' approach and with AutomaticMigrations disabled.
If I then call 'migrate.exe' with the assembly containing my DbContext and migration Configuration, will I get a new database created with the current state of the model? Or will I get a new database with nothing in it? i.e. do I need to explicitly 'add-migration' for the initial state of the model?
I have read in the documentation that the database instance should be created automatically by the migration process, but no one states clearly (at least to me) whether this newly created database will be generated from the current model without a formal 'initial state' migration having been created.
So the question is this: do I need an explicit migration model generated for migrate.exe to work from?
Through whatever means I try, I get a database, but the application launches with the unfriendly message "Model compatibility cannot be checked because the database does not contain model metadata. Model compatibility can only be checked for databases created using Code First or Code First Migrations." Remembering that this is the same application library that just created the database in the first place (from scratch), I fail to understand how this has happened!
I did manually delete the target database a few times via SQL Server management studio, is this bad? Have I removed some vital user account that I need to recover?
Migrations and the Database Initializer CreateDatabaseIfNotExists are not the same.
Migrations uses the Database Initializer MigrateDatabaseToLatestVersion, which relies upon a special table in the database, __MigrationHistory.
By contrast, CreateDatabaseIfNotExists is one of the Database Initializers which relies upon the special database table EdmMetadata. It does exactly as it implies: Creates a database with tables matching the current state of the model, i.e. a table for each DbSet<T>, only when the database does not exist.
The specific error you have quoted here, Model compatibility cannot be checked because the database does not contain model metadata., occurs due to the existence of DbSet<T> objects which were added to the code base after the initial database creation, and do not exist in EdmMetadata.
There are 4 basic Database Initializers available, 3 of which are for use when migrations are not being used:
CreateDatabaseIfNotExists
DropCreateDatabaseIfModelChanges
DropCreateDatabaseAlways
Also note, the 4th Initializer, MigrateDatabaseToLatestVersion, will allow you to use Migrations even if AutomaticMigrations is disabled; AutomaticMigrations serves a different purpose, and does not interact with the Database Initializers directly.
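A rough sketch of wiring that up (the context, entity, and configuration names are placeholders; the Configuration class is normally generated for you by Enable-Migrations):

using System.Data.Entity;
using System.Data.Entity.Migrations;

public class Employee
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public class MyContext : DbContext
{
    public DbSet<Employee> Employees { get; set; }
}

public class Configuration : DbMigrationsConfiguration<MyContext>
{
    public Configuration()
    {
        // Explicit migrations only; each model change still needs Add-Migration.
        AutomaticMigrationsEnabled = false;
    }
}

// At application start-up:
// Database.SetInitializer(new MigrateDatabaseToLatestVersion<MyContext, Configuration>());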
If you intend to use Migrations, you should change the Database Initializer to MigrateDatabaseToLatestVersion and forget about the other 3. If, instead, you intend to not use Migrations, then the choice of Initializer is situational.
CreateDatabaseIfNotExists will be more appropriate when you are certain that your data model is not undergoing active change, and you only intend to be concerned with database creation on a new deployment. This Initializer will help ensure that you do not have any issues with accidental deletion of a database or live data.
DropCreateDatabaseIfModelChanges is most appropriate in development, when you are changing the model fairly often, and want to be able to verify these changes to the model. It would not be appropriate for a production server, as changes to the model could inadvertently cause the database to be recreated.
DropCreateDatabaseAlways is only appropriate in testing, where your database is created from scratch every time you run your tests.
Migrations differs from these 3 Database Initializers, in that it never drops the database, it instead uses Data Motion to execute a series of Create Table and Drop Table SQL calls.
You can also use Update-Database -Script -SourceMigration:0 in the Package Manager Console at any time, no matter which Database Initializer you are using, to generate a full SQL script that can be run against a server to recreate the database.
Firstly, many thanks to Claies who helped me get to the bottom of this problem. I have accepted his answer as correct as ultimately it was a combination of his response and a few additional bits of reading that got me to my solution.
In answer to the actual post's question, 'Do I need a migration for EF code first when the database does not exist in SQL Azure?', the answer is yes, you do, if you have disabled automatic migrations. But there is a little more to be aware of:
The Azure aspects of this particular problem are actually irrelevant in my situation. My problem was two-fold:
The migration being generated was out of sync with respect to the target model. What do I mean? I mean that I was generating the migration script from my local database, which itself was not in sync with the local codebase, so the generated migration was incorrect. This can be seen by comparing the first few lines of the Model text in __MigrationHistory. This awareness was helped by referring to this helpful post which explains how it works.
And, more embarrassingly (I'm sure we've all done it), my Octopus deployment of the web site itself (using OctoPack) somehow neglected to include the Web.config file. From what I can tell, this may have occurred after I installed a transform extension to Visual Studio. Within my NuGet package I can see that there is a web.config.transform file but not a web.config. Basically this meant that when the application started up, it had no configuration file to turn to and no connection string at all. But this resulted in the slightly misleading error
Model compatibility cannot be checked because the database does not contain model metadata.
Whereas what it should have said was: there isn't a connection string, you idiot.
Hopefully this helps people understand the process a little better after reading Claies' answer and also that blog post. First though, check that you have a web.config file and that it has a connection string in it...
This might seem like an odd question, but it's been bugging me for a while now. Given that I'm not a hugely experienced programmer and I'm the sole application/C# developer in the company, I felt the need to sanity-check this with you guys.
We have created an application that handles shipping information internally within our company, this application works with a central DB at our IT office.
We've recently switched the DB from MySQL to MSSQL, and during the transition we decided to forgo the web services previously used and connect directly to the DB using an Application Role. For added security we only allow access to stored procedures, and all CRUD operations are handled via these.
However we currently have stored procedures for updating every field in one of our objects, which is quite a few stored procedures, and as such quite a bit of work on the client for the DataRepository (needing separate code to call the procedure and pass the right params for each procedure).
So I'm thinking: would it be better to simply update the entire object (in this case, an object represents a table, for example shipments), given that a lot of that data will be changed one field at a time after the initial insert, and that we are trying to keep network usage down, as some of the clients will run with limited internet?
What's the standard practice for this kind of thing? Or is there a method that I've overlooked?
I would say that updating all the columns for the entire row is a much more common practice.
If you have a proc for each field and you change multiple fields in one update, you will have to wrap all the stored procedure calls in a single transaction to avoid the database getting into an inconsistent state. You also have to detect which fields changed (which means you need to compare the old row to the new row).
Look into using an Object Relational Mapper (ORM) like Entity Framework for these kinds of operations. You will find that there is no general consensus on whether ORMs are a great solution for all data access needs, but it's hard to deny that they solve the problem of CRUD pretty comprehensively.
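For example, with change tracking, EF only includes the modified columns in the UPDATE statement it generates, so per-field granularity comes for free. A minimal sketch, assuming a hypothetical Shipment entity and ShippingContext based on your description:

using System.Data.Entity;

public class Shipment
{
    public int Id { get; set; }
    public string Status { get; set; }
}

public class ShippingContext : DbContext
{
    public DbSet<Shipment> Shipments { get; set; }
}

public static class ShipmentUpdates
{
    public static void MarkDispatched(int shipmentId)
    {
        using (var db = new ShippingContext())
        {
            var shipment = db.Shipments.Find(shipmentId);
            shipment.Status = "Dispatched"; // only this column appears in the generated UPDATE
            db.SaveChanges();
        }
    }
}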
Connecting directly to the DB over the internet isn't something I'd switch to in a hurry.
"we decided to forgo the webservices previously used and connect directly to the DB"
What made you decide this?
If you are intent on this model, then a single SPROC to update an entire row would be advantageous over one per column. I have a similar application which uses SPROCs in this way, however the data from the client comes in via XML, then a middleware application on our server end deals with updating the DB.
The standard practice is not to connect to DB over the internet.
Even for small app, this should be the overall model:
Client app -> over internet -> server-side app (WCF web service) -> LAN/localhost -> SQL DB
Benefits:
your client app would not even know that you have switched DB implementations.
It would not know anything about DB security, etc.
you, as a programmer, would not be thinking in terms of "rows" and "columns" on client side. Those would be objects and fields.
you would be able to use different protocols: send only single field updates between client app and server app, but update entire rows between server app and DB.
Now, given your situation, updating the entire row (the entire object) is definitely more of a standard practice than updating a single column.
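As a rough sketch of that service boundary (all names here are illustrative, not an existing API):

using System.Runtime.Serialization;
using System.ServiceModel;

[DataContract]
public class ShipmentDto
{
    [DataMember] public int Id { get; set; }
    [DataMember] public string Status { get; set; }
}

[ServiceContract]
public interface IShipmentService
{
    [OperationContract]
    ShipmentDto GetShipment(int id);

    // Only the single changed field crosses the internet; the server-side
    // implementation can still update the whole row against SQL over the LAN.
    [OperationContract]
    void UpdateShipmentStatus(int id, string status);
}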
It's better to only update what you change if you know what changed (if using an ORM like Entity Framework, for example), but if you're going down the stored proc route then yes, definitely update everything in a row at once; that's granular enough.
However, since you're already in the middle of a big change, you should take the switch as an opportunity to move over to LINQ to Entities and ditch stored procedures in the process wherever possible.
We want to progress towards being able to do continuous delivery of our application into production. We currently deploy to Azure and use table/blob storage, and have an Azure SQL database which we access with Entity Framework.
As the database schema changes, we want to be able to automatically apply the schema changes to the production database. But as this will happen whilst the application is live, and the code changes are being deployed to many nodes at the same time, we are not sure what the correct approach is.
After some reading it seems (and this makes sense) that the application needs to be tolerant of the 2 different database schema versions, so that it doesn't matter whether it's an old or a new version of the code that sees the database. However, I'm not sure of the best way to handle this in the application using Entity Framework.
Should we have versioned instances of the EF generated classes in the code which know how to access a specific version of the schema? What happens when the schema is updated and an old version of the code is running against the database?
Our Entity Framework classes are mapped to views in specific schemas in the db, and nothing is mapped to the underlying tables, so potentially this could allow us to create v1 views which the old code uses and v2 views which the new code uses. But maintaining this feels like it would be a bit of a nightmare (it's already enough of a pain simply maintaining the EF mappings to views rather than tables).
So what are best practices in this area? What do others do to solve this problem?
Whether you use EF or not, maintaining the code's ability to work with 2 consecutive versions of the database is a good (and perhaps the only viable) approach here.
Here are some ways we handle specific types of migrations:
When adding a column, we can typically just add the column (with a default constraint if non-nullable) and not worry about the code. EF will never issue a "SELECT *", so it will be able to continue to function properly while ignoring the new column. Similarly, adding a table is easy.
When removing a column or table, simply keep that column around 1 version longer than you would have otherwise.
For more complex migrations (e.g. completely changing the structure of a table or a segment of the data model), deploy the new model alongside backwards-compatibility views (or tables with triggers to keep them in sync), which will live as long as the code that references them does (see the sketch after this list). As you say, this can be a lot of work depending on the complexity of the migration, but it sounds like you are already well-positioned to do this because your EF entities point to views anyway. On the other hand, the benefit of this work is that you have more time to do the code migration. If you have a large codebase, this could be really beneficial in allowing you to migrate the data model to fit the needs of new features while still supporting old features without major code changes.
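Since your entities already point at views, the per-version mapping can be as simple as retargeting the model. A sketch with made-up names (the versioned views themselves still have to be created by the migration):

using System.Data.Entity;

public class Shipment
{
    public int Id { get; set; }
    public string Status { get; set; }
}

public class ShippingContext : DbContext
{
    public DbSet<Shipment> Shipments { get; set; }

    protected override void OnModelCreating(DbModelBuilder modelBuilder)
    {
        // Old deployments keep reading Shipments_v1; new code binds to the v2 view,
        // which the migration keeps in sync with the restructured tables.
        modelBuilder.Entity<Shipment>().ToTable("Shipments_v2", "dbo");
    }
}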
As a side-note, the difficulty of data migration often makes us push developing a finalized data model as far back as possible in the development schedule. With EF, you can write and test a lot of code before the data model is finalized (we use code-first to generate a sample SQL Express database in unit tests, even though our production database is not maintained by code-first). That way, we make fewer incremental changes to the production data model once a new feature is released.
The title is not so accurate, but I couldn't come up with a better one.
I'm trying to write a MySQL connector for Microsoft's Forefront Identity Manager (FIM is basically a sync engine that synchronizes identities between various data sources using a meta directory). But I'm having difficulty coming up with an appropriate design.
Let's say I want to import user data from a db into FIM's metaverse. A user object has various attributes like firstname, lastname, address etc. In the database these attributes can be distributed across multiple tables. FIM ultimately needs these attributes to be merged into one object. So the user needs to configure the connector to tell it how the data is stored in the DB.
I was wondering what would be the “best” way to represent this configuration. Two alternatives come to (my) mind:
I could just save a select query that merges/joins the data, so that the result is a single "table" with all the desired attributes. The problem with this is that I think I would have to do some kind of parsing on this query string to create a FIM-compatible schema out of it (which is basically the name of the object type (e.g. "person") and a list of attributes). This schema needs to be creatable from the query string alone, without actually executing the query (I could execute some fake queries if that would simplify the process).
I could create some classes to represent the database schema, i.e. the tables and relationships. Since I’m not that experienced with MySQL (or databases at all for that matter) I’m running the risk of missing some special cases. Also it might be some kind of overkill, since the schema can be assumed as fixed once it's configured.
Does anyone have some advice on which alternative to choose and how to tackle the problems that would come with it? Or is there another, better, alternative I didn't think of? Any advice would be greatly appreciated!
If something is not clear, please let me know.
Edit: Since there have been some questions on the use case, I'm going to elaborate a bit:
As I've said, I'm developing a Management Agent for FIM. FIM provides a so called Extensible Connectivity Management Agent, which is basically one single class implementing a few interfaces. (See this technet guide for a sample implementation).
Since I want to develop a generic agent for managing identities in a MySQL database, I don't know the database layout at compile time. When the end user wants to use the management agent, they need to decide which attributes of the identities they'd like to manage. So I need to give the user some way to configure the management agent. My main question is how to design the classes that save this configuration.
Lets look at a simple example:
Say you want to manage employee identities. To keep it simple, we have three attributes:
firstName
lastName
department
In this example it could be, for instance, just one single table with 4 columns (the attributes plus an id). But it could also be a much better design which uses two tables, one user table and one department table, using a 1:1 relation to define the user's department.
FIM requires me to consolidate these attributes into one object. It provides a class CSEntryChange which has an AttributeChanges collection member. I would then create some instances of AttributeChange (which basically contains the attribute name and its value) and add them to the collection. So the user-editable configuration must tell the management agent how it can get the users with all defined attributes from the db, and how to create and modify users in that database.
So ideally I'd have an instance of some "MySQLSchema" class (which is configured by the user up front) that could return a List<CSEntryChange> (I wouldn't actually use the CSEntryChange class, for the sake of decoupling, but you should get the point) containing all users in the db (pagination might be a requirement, but I can figure that out later). In addition, I'd like to be able to pass it a CSEntryChange, which would result in the corresponding database entries being updated (or created if not yet present).
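To make that a bit more concrete, something along these lines is what I have in mind. This is only a rough sketch; all the type names are placeholders, and EntryChange stands in for CSEntryChange to keep the design decoupled:

using System.Collections.Generic;

public class EntryChange
{
    public string ObjectType { get; set; }                        // e.g. "person"
    public Dictionary<string, object> AttributeValues { get; set; }
}

public class AttributeMapping
{
    public string AttributeName { get; set; }   // e.g. "firstName"
    public string Table { get; set; }           // e.g. "users"
    public string Column { get; set; }          // e.g. "first_name"
}

public class MySqlSchemaConfig
{
    public string ObjectType { get; set; }
    public string AnchorColumn { get; set; }    // key used to correlate rows across tables
    public List<AttributeMapping> Attributes { get; set; }
}

public interface IMySqlSchema
{
    IEnumerable<EntryChange> GetAllEntries(MySqlSchemaConfig config);
    void ApplyChange(MySqlSchemaConfig config, EntryChange change);
}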
I hope this clears it up a bit more :)
I think that your real question is, "How to access MySQL entities over C#?"
To begin with, I hope you are building this as an MVC application.
I would suggest sticking to a full Microsoft stack for purposes of learning and ease of implementation.
With this in mind, you will want to create an EntityFramework MySQL data provider in the following steps:
Create a new project and add EntityFramework either through the NuGet package manager UI or the package manager console by typing Install-Package EntityFramework -Version 6.0.2 (and add a reference to this project from your web project). Look halfway down the linked page for "Configure EntityFramework to work with a MySQL database".
Install the MySQL provider for Entity Framework through the NuGet package manager UI or by typing Install-Package MySql.Data.Entity in the package manager console.
The next step requires understanding of db configuration changes, that are nicely detailed here - Configure EntityFramework to work with a MySQL database.
You should end up with a nice class structure which will allow you to traverse your entities' navigation properties through EF.
Depending on the level of security your application requires, you may also want to create data transfer objects (DTOs) that contain only the data required for your remote calls, keeping your data calls efficient.
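Putting those steps together, a minimal sketch of what the resulting context might look like (MySqlEFConfiguration comes with the MySql.Data.Entity package; the entity and connection string name are placeholders):

using System.Data.Entity;
using MySql.Data.Entity;

public class Person
{
    public int Id { get; set; }
    public string FirstName { get; set; }
}

// Tells EF6 to use the MySQL provider configuration shipped with Connector/NET.
[DbConfigurationType(typeof(MySqlEFConfiguration))]
public class IdentityContext : DbContext
{
    // "IdentityDb" refers to a connection string in web.config / app.config.
    public IdentityContext() : base("IdentityDb") { }

    public DbSet<Person> People { get; set; }
}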
This is by no means a definitive guide on how to do this, but hopefully gives you a start in the right direction.
With regards to your step #1 above:
I could just save a select query that merges/joins the data, so that the result is a single "table" with all the desired attributes. The problem with this is that I think I would have to do some kind of parsing on this query string to create a FIM-compatible schema out of it (which is basically the name of the object type (e.g. "person") and a list of attributes). This schema needs to be creatable from the query string alone, without actually executing the query (I could execute some fake queries if that would simplify the process).
I am slightly confused by this. Are you saying that you want to dynamically update your database schema based on application requests?
You can use NHibernate with MySQL. NHibernate is a full-featured ORM, where C# classes map to your MySQL tables, and the rest will be a breeze once you get the hang of NHibernate.
A sample is here for your reference.
http://www.codeproject.com/Articles/26123/NHibernate-and-MySQL-A-simple-example
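As a rough sketch of the kind of class-to-table mapping involved (this uses Fluent NHibernate rather than XML mapping files; the class and column names are placeholders):

using FluentNHibernate.Mapping;

public class Person
{
    // NHibernate proxies require virtual members.
    public virtual int Id { get; set; }
    public virtual string FirstName { get; set; }
}

public class PersonMap : ClassMap<Person>
{
    public PersonMap()
    {
        Table("person");
        Id(x => x.Id);
        Map(x => x.FirstName);
    }
}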
When you use the MySQL Connector/NET you can also use Entity Framework, as in this example from MSDN:
using (var db = new BloggingContext())
{
    // Create and save a new Blog
    Console.Write("Enter a name for a new Blog: ");
    var name = Console.ReadLine();
    var blog = new Blog { Name = name };
    db.Blogs.Add(blog);
    db.SaveChanges();
}
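For completeness, the context and entity behind that snippet look roughly like this in the MSDN Code First walkthrough (a sketch; the MySQL connection string setup is omitted):

using System.Data.Entity;

public class Blog
{
    public int BlogId { get; set; }
    public string Name { get; set; }
}

public class BloggingContext : DbContext
{
    public DbSet<Blog> Blogs { get; set; }
}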
I have some experience with .NET <-> MySQL communication and I've used Entity Framework in the past for the communication. I had a lot of problems with it and performance issues, and soon came to regret using it (this was 1-2 years ago, so maybe they have fixed it up since). Of course, using an ORM framework adds a layer on top of your db communication, which in my case proved to be undesirable in terms of performance and flexibility.
Finally, I chose to take the following approach:
1) Create models with POCO classes as you would do with Entity Framework. Those models may or may not include relationships - it is up to your preference. I prefer to only add the relationships when I actually need them (so some objects may have their db relationships in the POCOs and some may not). I chose this because it reduces the complexity of deciding when to pre-load the relationships and when not to. Basically, if you don't need it - don't add it.
2) Create DAL layer (for example, using the repository pattern) that accepts and works with those objects and fires direct queries to MySQL. No EF required for this - you just need to install the Connector/NET for MySQL and you are ready to go.
A quick example of this would be the following (note: this example is off the top of my head and is just to illustrate the classes; it uses command parameters to prevent injection):
using MySql.Data.MySqlClient;

public class Person
{
    public string Name { get; set; }
}

public interface IPersonRepository
{
    void AddPerson(Person p);
}

public class PersonRepository : IPersonRepository
{
    public void AddPerson(Person p)
    {
        using (var connection = new MySqlConnection("some connection string"))
        {
            connection.Open();

            // Parameterised to avoid SQL injection.
            var command = new MySqlCommand("insert into Person (Name) values (@name)", connection);
            command.Parameters.AddWithValue("@name", p.Name);
            command.ExecuteNonQuery();
        }
    }
}
The benefits of this approach for me are:
Performance - my application needs to insert large amounts of data into MySQL. Entity Framework could not cope with this. If your application doesn't handle a lot of data you might be all right with EF.
Flexibility - writing my own queries allows me to have better control over the communication. You can choose, for example, to use bulk inserts in MySQL (from a file - really powerful and fast when you need to handle large amounts of data), for which you will need to bypass Entity Framework (see the sketch below). I also found out that EF generates some funky queries.
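For reference, a sketch of that file-based bulk insert using Connector/NET's MySqlBulkLoader (the table name, file path, and connection string are placeholders):

using System;
using MySql.Data.MySqlClient;

public static class BulkImport
{
    public static void LoadPeople()
    {
        using (var connection = new MySqlConnection("server=localhost;database=mydb;uid=user;pwd=pass"))
        {
            connection.Open();

            var loader = new MySqlBulkLoader(connection)
            {
                TableName = "Person",
                FileName = @"C:\data\people.csv",
                FieldTerminator = ",",
                NumberOfLinesToSkip = 1   // skip the header row
            };

            int rows = loader.Load();
            Console.WriteLine("Inserted {0} rows", rows);
        }
    }
}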
The main drawback is, of course, more work - you will get some things for "free" with the Entity Framework.
So, I can recommend the following:
Consider the amounts of data that you need to handle and make a small exercise application with those amounts. How does EF (or any other ORM) handle it? What about direct queries to the database? That will give you a somewhat accurate idea of how the communication will perform.
Consider how much time you have for building this application - if you are looking for a quick solution and are willing to sacrifice a bit of performance - go for EF or another ORM framework. If you have more time on your hands and would like to make a flexible solution - go for direct queries to the database.
Good luck!
Use Entity Framework Code First.
http://msdn.microsoft.com/en-us/data/jj193542.aspx
It is still a lot of work, but I think this is the quickest approach.
Create C# classes according to the user's configuration and create the DB schema from those classes.