How do you (Unit) Test the database schema?

When a number of people are working on a project, all of whom could alter the database schema, what's the simplest way to unit test / test / verify it? The main suggestion we've had so far is to write tests for each table to verify column names, constraints, etc.
Has anyone else done anything similar / simpler? We're using C# with SQL Server, if that makes any real difference.
Updates:
The segment of the project we're working on uses SSIS packages to do the bulk of the work, so there is very little C# code to write unit tests against.
The code for creating tables / stored procedures is spread across SQL files. Because of the build system, we could maintain a separate VS DB project file as well, but I'm not sure how that would help us verify the schema either.
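
The per-table check suggested in the question could be sketched roughly as below. This is only an illustration: `ColumnSpec` and `SchemaVerifier` are hypothetical names, and it assumes the "actual" column list has already been read from SQL Server's `INFORMATION_SCHEMA.COLUMNS` view, while the "expected" list is kept in source control next to the SQL files.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical type: one row of the expected or actual schema.
public sealed record ColumnSpec(string Table, string Column, string DataType, bool IsNullable);

public static class SchemaVerifier
{
    // Assumed source of the "actual" list on SQL Server:
    //   SELECT TABLE_NAME, COLUMN_NAME, DATA_TYPE, IS_NULLABLE
    //   FROM INFORMATION_SCHEMA.COLUMNS
    public static IReadOnlyList<string> FindDifferences(
        IEnumerable<ColumnSpec> expected,
        IEnumerable<ColumnSpec> actual)
    {
        // Key on table + column, ignoring case (an assumption that suits
        // case-insensitive collations).
        static string Key(ColumnSpec c) => (c.Table + "." + c.Column).ToLowerInvariant();

        var expectedByKey = expected.ToDictionary(c => Key(c));
        var actualByKey = actual.ToDictionary(c => Key(c));
        var problems = new List<string>();

        foreach (var pair in expectedByKey)
        {
            var exp = pair.Value;
            if (!actualByKey.TryGetValue(pair.Key, out var act))
                problems.Add($"Missing column: {exp.Table}.{exp.Column}");
            else if (!string.Equals(exp.DataType, act.DataType, StringComparison.OrdinalIgnoreCase)
                     || exp.IsNullable != act.IsNullable)
                problems.Add($"Changed column: {exp.Table}.{exp.Column}");
        }

        foreach (var pair in actualByKey)
        {
            if (!expectedByKey.ContainsKey(pair.Key))
                problems.Add($"Unexpected column: {pair.Value.Table}.{pair.Value.Column}");
        }

        return problems;
    }
}
```

A test would then assert that `FindDifferences` returns an empty list, so any unannounced schema change fails the build with a readable message. Constraints and indexes are not covered by this sketch.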

One possible answer is to use Visual Studio for Database developers and keep your schema in source control with the rest of your code. This lets you see differences, and you get a history of who changed what.
Alternatively you could use a tool like SQLCompare to see what has been modified in one database compared to another.

Your (relational) database does two things as far as I'm concerned: 1) Hold data and 2) Hold relations between data.
Holding data is not a behavior, so you would not test it.
And for ensuring relations just use constraints. Lots of constraints. All over the place.

That is an interesting question! There are lots of tools out there for testing stored procedures but not for testing the database schema.
Don't you find that the unit tests written for code generally find any problems with the database schema?
One approach I have used is to write stored procedures to copy test data from the developer's schema to a test schema. This is pretty rough and ready as the stored procedures generally crash when they come across any differences between the schemas but it does alert you to any changes you haven't been told about.
And nominate someone to be the DBA who monitors changes to the schema?

I've had to do this type of thing before, although not in C#. To begin with, I built a schema migration tool, based on the discussion at Ode to Code (page 1 of 5) (there are also existing tools to do similar things). Importantly, the migration tool I built allowed you to specify the database you were applying the changes to and what version you wanted to apply. Then, following a test first methodology, whenever I needed to make a schema change I would write a test script which would create a test database, apply version changes to the one before my target change script, add some data, apply the change script under test, and confirm that the data was in an expected state.
My main goal with this was to confirm that no data was lost or corrupted during schema migrations, not to check specifically that the schema was in a particular state. A good awareness of your production data set is required, so you can write representative sample data for the tests.
It's debatable if this should be considered unit testing or integration testing. I would tend to consider it integration testing, based on the fact that I don't want to run old tests every time I iterate my code. Whatever you want to call it, I found it to be a useful tool for that situation.

This is an old question but it appears that people are still landing here. The best tool I have found so far is "SQL Test" by Red Gate. It allows you to create scripts that run as transactions, so you can run "sandboxed" queries to check the state of the database.

This does not really fit the unit test paradigm. I would suggest version controlling the schema and limiting write access to a single qualified team member such as the DBA or team lead, who can validate any requested changes against the entire application. Schema changes should not be done haphazardly.

Don't you find that the unit tests written for code generally find any problems with the database schema?
This assumes, of course, that your tests test everything.

Related

How to seed database data for integration testing in C#?

I've been writing some integration tests recently against ASP.Net MVC controller actions and have been frustrated by the difficulty in setting up the test data that needs to be present in order to run the test.
For example, I want to test the "add", "edit" and "delete" actions of a controller. I can write the "add" test fine, but then find that to write the "edit" test I am either going to have to call the code of the "add" test to create a record so that I can edit it, or do a lot of setup in the test class, neither of which is particularly appealing.
Ideally I want to use or develop an integration test framework to make it easier to add seed data in a reusable way for integration tests so that the arrange aspect of an arrange/act/assert test can focus on arranging what I specifically need to arrange for my test rather than concerning itself with arranging a load of reference data only indirectly related to the code under the test.
I happen to be using NHibernate but I believe any data seeding functionality should be oblivious to that and be able to manipulate the database directly; the ORM may change, but I will always be using a SQL database.
I'm using NUnit so envisage hooking into the test/testfixture setup/teardown (but I think a good solution would potentially be transferable to other test frameworks).
I'm using FluentMigrator in my main project to manage schema and seeding of reference data so it would be nice, but not essential to be able to use the FluentMigrator framework for a consistent approach across the solution.
So my question is, "How do you seed your database data for integration testing in C#?" Do you execute the SQL directly? Do you use a framework?

You can run your integration tests against Sql Server Compact. You will have a .sdf file, and you can connect to it by giving the file's path as the connection string. That would be faster and easier to set up and work with.
Your integration tests probably would not need millions of rows of data. You can insert your test data into your database and save it as TestDbOriginal.sdf.
When you are running your tests, just make a copy of this 'TestDbOriginal.sdf' and work on that copy, which is already seeded with data. If you want to test a specific scenario, you will need to prepare your data by calling some methods like add, remove, edit.
When you move on to production or performance testing, switch back to your original server version, be it Sql Server 2008 or whatever.
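
The copy-per-test idea might look something like the sketch below. `SeededDatabaseCopy` is a hypothetical helper, not part of any framework, and the `Data Source=<path>` connection string format for Sql Server Compact is an assumption to check against the provider you use.

```csharp
using System;
using System.IO;

// Hypothetical helper: gives each test its own throwaway copy of a
// pre-seeded database file such as TestDbOriginal.sdf.
public sealed class SeededDatabaseCopy : IDisposable
{
    public string FilePath { get; }

    public SeededDatabaseCopy(string seededOriginalPath)
    {
        // Unique file name so tests running side by side never share a copy.
        FilePath = Path.Combine(
            Path.GetTempPath(),
            Guid.NewGuid().ToString("N") + Path.GetExtension(seededOriginalPath));
        File.Copy(seededOriginalPath, FilePath);
    }

    // Assumed connection string format for a file-based database.
    public string ConnectionString => "Data Source=" + FilePath;

    // Deletes the working copy; the seeded original is never modified.
    public void Dispose()
    {
        if (File.Exists(FilePath))
        {
            File.Delete(FilePath);
        }
    }
}
```

With NUnit, one option is to create the copy in a `[SetUp]` method and dispose it in `[TearDown]`, so every test starts from the same seeded state.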

I don't know if it's necessarily the 'right' thing to do, but I've always seeded using my add/create method(s).

How to transfer data (Entities) from one database to another

I have a system (using Entity Framework) that is deployed in various production systems and also on a quality control system. My problem is that data entry is often done only on one of those occurrences of my system (different databases).
I want to find the best way to transfer my data from one database to another database. Ids can change, as long as the relations between my objects are maintained. 98% of my data is in the DB; some of it is in external files, which I can manage separately, manually.
Currently we use an XML structure as a transition file. The file is then imported in the destination system, and code manually imports the entities and re-creates the data.
I'm looking for a more generic way to do this, with less code. Since all my data is stored in Entities, couldn't I simply create a big List and throw all my objects in there, then serialize that in some manner into an external file and finally generically import all the entities in there in my destination system? (I'll probably have to be careful in maintaining relation ids, but should be ok...)
Anyway, I'm wondering if anyone has a smart approach. I'm pretty sure I'm not the first with a similar problem.
Thanks!

You need to get some process around this. If all environments contain the same data (unlikely) you can replicate. It is the most automatic. But a QA environ should not update production, so you have to really think this through.
If semi-automated is okay, there are tools out there you can use from a variety of vendors. I use Red Gate tools, personally, but others are also fine.
Can you set up a more automated push with EF? Sure, but the amount of time you spend is really not worth it.

In my opinion you can check some of the following approaches:
1) Use Sql Compare or Sql Data Compare. Those tools are from Red Gate and can be found here
2) Regular backups and restores of the databases. You could, if it is an option, regularly back up your most up-to-date database and restore it on the destination systems. I have no experience in automating this but here is a link to do that through .net.
3) You could always give it a go creating a version control system of your own. I would picture one such system selecting all records from a certain table (or all of them), deleting all records in the target database and inserting them. This seems pretty complex though, as you have to worry about relationships, data dependencies, etc.
Hope this helps in some way.
Regards

If for some reason you are not satisfied with the existing tools, you may want to take a look at the Sync Framework and implement this functionality yourself for your particular databases.

Given what you described, pushing data from one SQL Server to another for demo purposes, you should consider SQL Server Integration Services.
If you've got a simple scenario where you just move the data and objects from one DB to the next, you can use the built-in wizards. If you need to do custom stuff you can build complex workflows using C# and SQL (tools you already know). Note: most of what you're going to want comes with the standard edition, so if you're using Express this is less interesting.
The story for Red Gate products is more compelling when you don't have SQL Server (so you have to go out and buy something) and if you are interested in finding out what the changes are between DBs (like viewing code changes in a .cs file in a source control product).

create a blank database from a production database programmatically

For software testing purposes I would like to create a sterile clone (with all data blanked out) of the production database. This way I can run my unit tests on a known set of records every time. I am looking to try and do this programmatically within the unit tests themselves so I can ensure that the tables contain exactly the test data that I need for the functional tests.
I have found the following information relating to creating an Access database within C#. Note: I know Access probably isn't the best solution, but it's good enough!
What I would like to know, is there a way of using TableAdapters (perhaps) to replicate the production database schema (without any data) within a blank Access database file?

Do this:
1) Create a copy of the Access file; production -> test
2) Connect to the test database
3) Enumerate all tables in the database
4) Run DELETE * FROM [table] for all tables. If you have FK dependencies, run it several times until there is no error (or use TRUNCATE [table], as commented)
5) Compact the database
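
The "run it several times until there is no error" step can be sketched as a retry loop. This is an illustration only: `TableClearer` is a hypothetical name, and `deleteAllRows` stands in for whatever code actually issues the DELETE against the test copy and throws when a foreign key still blocks it.

```csharp
using System;
using System.Collections.Generic;

public static class TableClearer
{
    // Clears every table, retrying the ones that failed (typically because
    // child rows still referenced them). Returns the number of passes made.
    public static int ClearAll(IEnumerable<string> tables, Action<string> deleteAllRows)
    {
        var remaining = new List<string>(tables);
        int passes = 0;

        while (remaining.Count > 0)
        {
            passes++;
            var failed = new List<string>();
            Exception lastError = null;

            foreach (var table in remaining)
            {
                try
                {
                    deleteAllRows(table);
                }
                catch (Exception ex)
                {
                    // Most likely an FK violation; try this table again next pass.
                    failed.Add(table);
                    lastError = ex;
                }
            }

            // A full pass without progress means retrying would loop forever.
            if (failed.Count == remaining.Count)
            {
                throw new InvalidOperationException(
                    "Could not clear: " + string.Join(", ", failed), lastError);
            }

            remaining = failed;
        }

        return passes;
    }
}
```

Stopping when a pass makes no progress avoids an endless loop on circular foreign keys or on errors unrelated to dependencies.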

I do not have much experience with Access, but generally you would make a CREATE script for this purpose. Most database tools have a function for creating such a script. Such a script basically is a set of SQL statements that create all the objects (e.g. databases, views).
Searching for CREATE script and Access will give you some starting points.

I have had bad experiences with Access as a production database, and I wouldn't recommend it. Go with either SQLite or Firebird.
Secondly, yes you can use TableAdapters. You need to create two connections for each db. But I think there might be tools available to do this.
Edit:
How big is the database? For up to 4GB, Oracle Express Edition might help. Also, it will be easy to clone from Oracle to Oracle.

Unit testing using NDBUnit framework

I am writing unit tests for an app that has matured a lot with time. We are using NDBUnit so that the test cases are independent of each other. When we started the development of this app the DB schema was pretty manageable, and hence dragging and dropping the tables on the VS designer to create an XSD was never an issue. With my current DB schema, however, the XSD that is generated is more than 3MB in size. On a slow dev machine VS goes off to sleep when one tries to open the XSD.
Hence keeping the DB schema and the XSD in sync has become very challenging.
Is there a way I can get rid of the manual step of modifying the XSD?
Do you suggest that I should consider other unit testing frameworks?
Spring.Net will definitely give me what I need, but we don't have interfaces and hence integrating it will be a tedious task.

If you're manually using the Visual Studio designer to create your XSD files in a production environment, you're probably doing it wrong :) As you're discovering, the Visual Studio XSD designer is probably the least manageable way to maintain your XSD file(s).
I'd recommend doing as I have done and switch to the MyGeneration code-gen tool to create your XSD files from your database.
Info: http://www.mygenerationsoftware.com
Download: http://sourceforge.net/projects/mygeneration/
XSD Template: http://www.mygenerationsoftware.com/TemplateLibrary/Archive/?guid=59a03408-c96f-4baf-8171-b6bfe8725dab
Also, you should take careful note that while I demonstrated the use of the NDbUnit tool in those screencasts by brute-forcing the entire database into one single XSD file, the intended real-world usage pattern of the NDbUnit tool is to use multiple XSD file(s) as needed to support your different database-dependent tests.
Since the contents of the XSD file(s) control the 'scope' of your database that NDbUnit will operate on at any one time, the intent of the tool is that you would have many different XSD files for your different data-dependent tests and that you would carefully scope the tests, the XSD, and the test data (XML) together so that they all closely correlate with each other for different collections of tests.
Having your entire database represented in single XSD file (especially when that XSD file is approaching 3MB!) is almost certainly an anti-pattern and should be cause for you to consider that you're probably not thinking about your tests or your test data in a granular-enough way to be effective.
If you cannot effectively load data into just a portion of your database tables without violating referential integrity rules then you probably have a database design problem to address. Just as one single god-object is an anti-pattern in OO design, so too is one single monolithic database with referential-integrity constraints in so many places that only by loading EVERY table with data can records be inserted (forcing you to test the whole thing at once as it sounds like you may be doing here).
Lastly, just as a quick point-of-order in the interests of clarity, the NDbUnit project is now and has always been open-source and completely independent of Microdesk. While I was employed at Microdesk, the company did make use of the tool and I did spend my off-hours contributing to the project, but Microdesk was just one company that adopted the tool and so it's completely erroneous to refer to NDbUnit as 'Microdesk's NDbUnit Framework'.
Hope this helps~!

Database Deployment Strategies (SQL Server)

I am looking for a way to do daily deployments and keep the database scripts in line with releases.
Currently, we have a fairly decent way of deploying our source, we have unit code coverage, continuous integration and rollback procedures.
The problem is keeping the database scripts in line with a release. Everyone seems to try the script out on the test database and then run it on live; when the ORM mappings are updated (that is, the change goes live), the ORM picks up the new column.
The first problem is that none of the scripts HAVE to be written anywhere, generally everyone "attempts" to put them into a Subversion folder but some of the lazier people just run the script on live and most of the time no one knows who has done what to the database.
The second issue is that we have 4 test databases and they are ALWAYS out of line and the only way to truly line them back up is to do a restore from the live database.
I am a big believer that a process like this needs to be simple, straightforward and easy to use in order to help a developer, not hinder them.
What I am looking for are techniques/ideas that make it EASY for the developer to want to record their database scripts so they can be run as part of a release procedure. A process that the developer would want to follow.
Any stories, use cases or even a link would be helpful.

For this very problem I chose to use a migration tool: Migratordotnet.
With migrations (in any tool) you have a simple class used to perform your changes and undo them. Here's an example:
using System.Data;
using Migrator.Framework;

[Migration(62)]
public class _62_add_date_created_column : Migration
{
    public override void Up()
    {
        // Add the column as nullable first, so existing rows remain valid.
        Database.AddColumn("Customers", new Column("DateCreated", DbType.DateTime));
        // Seed the existing rows with data.
        Database.Execute("update Customers set DateCreated = getdate()");
        // Every row now has a value, so the not-null constraint can be added.
        Database.AddNotNullConstraint("Customers", "DateCreated");
    }

    public override void Down()
    {
        Database.RemoveColumn("Customers", "DateCreated");
    }
}
This example shows how you can handle volatile updates, like adding a new not-null column to a table that has existing data. This can be automated easily, and you can easily go up and down between versions.
This has been a really valuable addition to our build, and has streamlined the process immensely.
I posted a comparison of the various migration frameworks in .NET here: http://benscheirman.com/2008/06/net-database-migration-tool-roundup

Read K.Scott Allen's series of posts on database versioning.
We built a tool for applying database scripts in a controlled manner based on the techniques he describes and it works well.
This could then be used as part of the continuous integration process with each test database having changes deployed to it when a commit is made to the URL you keep the database upgrade scripts in. I'd suggest having a baseline script and upgrade scripts so that you can always run a sequence of scripts to get a database from its current version to the new state that is needed.
This does still require some process and discipline from the developers though (all changes need to be rolled into a new version of the base install script and a patch script).

We've been using SQL Compare from RedGate for a few years now:
http://www.red-gate.com/products/index.htm
The pro version has a command line interface that you could probably use to setup your deployment procedures.

We use a modified version of the database versioning described by K. Scott Allen. We use the Database Publishing Wizard to create the original baseline script. Then a custom C# tool based on SQL SMO to dump the stored procedures, views and user functions. Change scripts which contain schema and data changes are generated by Red Gate tools. So we end up with a structure like
Database\
    ObjectScripts\ - contains stored procs, views and user funcs 1-per file
    baseline.sql - database snapshot which includes tables and data
    sc.01.00.0001.sql - incremental change scripts
    sc.01.00.0002.sql
    sc.01.00.0003.sql
The custom tool creates the database if necessary, applies the baseline.sql if necessary, adds a SchemaChanges table if necessary and applies the change scripts as necessary based on what's in the SchemaChanges table. That process occurs as part of a nant build script each time we do a deployment build via cc.net.
If anyone wants the source code to the schemachanger app I can throw it up on codeplex/google or wherever.
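
The "applies the change scripts as necessary" step could be sketched along these lines. This is not the schemachanger source; `ChangeScriptPlanner` is a hypothetical name, and it assumes the SchemaChanges table stores the file name of each script that has already been run.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ChangeScriptPlanner
{
    // Returns the scripts that have not been recorded as applied yet,
    // in the order they should be run.
    public static IReadOnlyList<string> GetPendingScripts(
        IEnumerable<string> scriptFileNames,
        IEnumerable<string> appliedScriptNames)
    {
        var applied = new HashSet<string>(appliedScriptNames, StringComparer.OrdinalIgnoreCase);

        return scriptFileNames
            .Where(name => !applied.Contains(name))
            // Zero-padded names such as sc.01.00.0002.sql sort correctly
            // as plain strings, so no version parsing is needed here.
            .OrderBy(name => name, StringComparer.OrdinalIgnoreCase)
            .ToList();
    }
}
```

The caller would run each returned script and then insert its name into the tracking table, ideally in the same transaction, so a failed script is not recorded as applied.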

If you are talking about trying to keep database schemas in sync, try using Red Gate SQL Comparison SDK. Build a temp database based on a create script (newDb) - this is what you want your database to look like. Compare newDb against your old database (oldDb). Get a change set from that comparison and apply it using Red Gate. You could build this upgrade process into your tests, and you can try to get all the devs to agree that there is one place where the create script for the database is kept. This same practice works well for upgrading your database across several versions and running data migration scripts and processes between each step (using an XML doc to map the create and data migration scripts).
Edit: With Red Gate technique, you only are concerned with create scripts, not upgrade scripts since Red Gate comes up with the upgrade script. It will also let you drop and create indexes, stored procedures, functions, etc.

Go here:
https://blog.codinghorror.com/get-your-database-under-version-control/
Scroll down a bit to the list of 5 links to the odetocode.com website. Fantastic five-part series. I would use that as a starting point to get ideas and figure out a process that will work for your team.

You should consider using a build tool like MSBuild or NAnt. We use a combination of CruiseControl.NET, NAnt, and SourceGear Fortress to handle our deployments, including SQL objects. The NAnt db build task calls sqlcmd.exe to update scripts in our dev and staging environments after they're checked into Fortress.

We use Visual Studio for Database Professionals and TFS to version and manage our database deployments. This allows us to treat our databases just like code (check out, check in, lock, view version history, branch, build, deploy, test, etc.) and even include them in the same solution files if we wish.
Our developers can work on local databases to avoid stepping on each other's changes in a shared environment. When they check database changes into TFS, we have continuous integration to build, test and deploy to our integrated dev environment. We have separate builds on release branches to create differential deployment scripts for each subsequent environment.
Later, if a bug is discovered in a release, we can go to a release branch and hotfix the code and database at the same time.
This is a great product, but its adoption was hindered early on due to a Microsoft marketing blunder. It was originally a separate product under Team System. This meant in order to use features of the developer edition and database edition at the same time, you were required to step up to the much more expensive Team Suite edition. We (and many other customers) gave Microsoft grief about this, and we were very happy they announced this year that DB Pro has been folded into the developer edition, and that immediately anyone licensed with developer edition can install the database edition.

Gus off-handedly mentioned DB Ghost (above) – I second it as a potential solution.
A brief overview of how my company is using DB Ghost:
After the schema for a new DB has been reasonably settled during initial development, we use the DB Ghost 'Data and Schema Scripter' to create script (.sql) files for all the DB objects (and any static data) and we check these script files into source control (the tool separates the objects into folders such as 'Stored Procedures', 'Tables', etc.). At this point, we can use either of the DB Ghost 'Packager' or 'Packager Plus' tools to create a stand-alone executable to create a new DB from these scripts.
All changes to the DB schema are checked-in to source by check-ins to the specific script files.
At anytime we can use the packager to create an executable to either (a) create a new DB or (b) update an existing DB. Some customization is required for certain path-dependent changes (e.g. changes that require data to be updated), but we have pre-update and post-update scripts that are run.
The 'update' process involves the creation of a clean 'source' DB and then (after pre-update custom scripts), a comparison between the schemas of the source DB and the target DB. DB Ghost updates the target DB to match.
We routinely make changes to production DBs (we have 14 customers in 7 different production environments) but inevitably deploy a large-enough set of changes with a DB Ghost update executable (created during our build process). Any production changes that were not checked-in to source (or that were not checked-in to the appropriate branch being released) are LOST. This has forced everyone to check-in changes consistently.
To summarize:
If you enforce a policy that all DB updates be deployed using a DB Ghost update executable, you can 'force' developers to consistently check-in their changes, regardless of whether they are deployed manually in the interim.
Adding a step (or steps) to your build process to create a DB Ghost update executable will in-effect perform a test to verify that a DB can be created from scripts (i.e. because DB Ghost creates a 'source' DB, even when creating the update executable package) and if you add a step (or steps) to execute the update package [on any of the four test DBs you mentioned], you can keep your test DBs in line with source.
There are some caveats and some limitations in what changes are 'easily' deployed with this tool (really a suite of related tools), but they are all fairly minor (at least for my company):
Renaming objects must be done in one of the custom scripts
The entire DB is always updated (e.g. objects in a single schema can't be updated alone) making it difficult to support customer-specific objects in the main application DB

The book Refactoring Databases addresses many of these issues at a conceptual level.
As far as tools go, I know that DB Ghost works well for SQL Server. I have heard that the Data Dude edition of Visual Studio has really been improved upon in the latest release but I don't have any experience with it.
As far as really pulling off continuous integration style database development, it gets really resource intensive really fast because of the number of database copies you need. It is very doable when the database can fit on a developer workstation but impractical when the database is so large that it needs to be deployed across a grid. To do it you basically need 1 copy of the database per developer [developers who make DDL changes, not just changes to procs] + 6 common copies. The common copies are as follows:
INT DEV --> Developers "check in" their refactoring to INT DEV for integration testing. When integration testing passes, this database is copied over to DEV.
DEV --> This is the "official" development copy of the database. INT DEV is refreshed regularly with a copy of DEV. Developers working on new refactorings get a fresh copy of the database from DEV.
INT QA --> Same idea as INT DEV except for the QA team. When integration tests pass here, this database is copied over to QA and to DEV*.
QA
INT PROD --> Same idea as INT QA except for production. When integration tests pass here, this database is copied over to PROD, QA*, and DEV*
PROD
*When copying databases across DEV/QA/PROD lines, you will also need to run scripts to update test data relevant to the particular environment (e.g. setting up users in QA that the QA team uses to test but that don't exist in production).

One possible solution is to look into implementing DML auditing on your test databases, then just rolling those audit logs into a script for final testing and live deployment. SQL Server 2008 significantly improves on DML auditing, but even SQL Server 2005 supports it via triggers.

There are a bunch of links in these posts that I'll want to follow up on (I "rolled my own" system years ago, have to see if there are similarities). One thing you will need, and that I hope is mentioned in these links, is discipline. I don't quite see how any automated system can work if anyone can change anything at any time. (Your question implies that this can happen on your production systems, but obviously that can't be true.)
Having one person (the fabled "database administrator") dedicated to the task of managing changes to databases, particularly production databases, is a very common solution. As for maintaining consistency across X development and testing databases: if it/they are used by many users, once again you are best served by having an individual act as a "clearing house" for changes; if everyone has their own database instance, then they're responsible for keeping it in order, and having a central consistent database "source" will be critical when they need a refreshed baseline database.
Here's a recent Stack Overflow post that may be of interest: how-to-refresh-a-test-instance-of-sql-server-with-production-data-without-using

Red Gate has a paper describing how to achieve build automation: http://downloads.red-gate.com/HelpPDF/ContinuousIntegrationForDatabasesUsingRedGateSQLTools.pdf
This is built around SQL Source Control, which integrates with SSMS and your existing source control system.

I've written a .NET based tool to handle database versioning in an automated fashion. We have been using this tool in production to handle rolling out database updates (including patches) to multiple environments, keep a log in each database of which scripts have been run, and do it all in an automated fashion. It has a command-line console so you can create batch scripts which use this tool. Check it out: https://github.com/bmontgomery/DatabaseVersioning

For what it's worth, this is a real example of a simple, low cost approach used by my former employer (and which I am trying to impress on my current employer as a basic first step).
Add a table called 'DB_VERSION' or similar. In EVERY upgrade script, add a row to that table which can include as little or as many columns as you see fit to describe the upgrade but at a minimum I would suggest { VERSION, EXECUTION_DATE, DESCRIPTION, EXECUTION_USER }. Now you have a concrete record of what has been going on. If someone runs their own unauthorised script you'd still need to follow the advice of the answers above, but this is just a simple way of dramatically improving on your existing versioning control (i.e. none).
Now let's say you have an upgrade script from v2.1 to v2.2 of the database and you want to verify whether the lone maverick guy has actually run it on his database. You can just search for rows where VERSION = 'v2.2', and if you get a result, don't run this upgrade script. This can be built into a console utility app if necessary.
