I am creating a series of web automation tests which require test data to be in a database (SQL Server 2008). To generate the required data for each test I have to call some C# code which inserts the correct data into the DB (i.e. I can't just write SQL scripts to insert the data). My problem is that I don't want to pollute my test DB with lots of test data from these automated tests, so I would like to roll back all of the changes made to the DB during the test.
Can anyone suggest a sensible way of achieving this?
A simple way would be to create a backup of the database before running the tests, and then restore it at the end.
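As a rough illustration, the backup and restore can be driven from the test harness itself. This is a minimal sketch under assumptions: the database name, backup path, and connection string are placeholders, and you must connect to master rather than the database being restored:

using System;
using System.Data.SqlClient;

public static class DatabaseReset
{
    // Placeholder: connect to master so TestDb itself isn't in use during RESTORE.
    const string MasterConnectionString =
        "Server=localhost;Database=master;Integrated Security=true";

    public static void Backup()
    {
        Execute(@"BACKUP DATABASE [TestDb] TO DISK = 'C:\Backups\TestDb.bak' WITH INIT");
    }

    public static void Restore()
    {
        // Kick out other connections, restore the baseline, then reopen the DB.
        Execute(@"ALTER DATABASE [TestDb] SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
                  RESTORE DATABASE [TestDb] FROM DISK = 'C:\Backups\TestDb.bak' WITH REPLACE;
                  ALTER DATABASE [TestDb] SET MULTI_USER;");
    }

    static void Execute(string sql)
    {
        using (var connection = new SqlConnection(MasterConnectionString))
        {
            connection.Open();
            var command = connection.CreateCommand();
            command.CommandText = sql;
            command.ExecuteNonQuery();
        }
    }
}

Call Backup() once before the test run and Restore() at the end (or after each test, at the cost of speed).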
Two ways of doing this.
One is to enclose your test within a transaction and roll it back. Another is to use a cleanup script as part of your test completion code (we do this for some of our integration tests where a transaction does not work).
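For the transaction approach, here is a minimal sketch (assuming the data setup and the code under test share the same ambient transaction; the helper methods are hypothetical):

using System.Transactions;
using NUnit.Framework;

[Test]
public void EditCustomer_UpdatesName()
{
    // Disposing the scope without calling Complete() rolls back
    // everything done inside it, leaving the DB untouched.
    using (new TransactionScope())
    {
        InsertTestCustomer();   // hypothetical data setup
        RunCodeUnderTest();     // hypothetical code under test
    }                           // no scope.Complete() => rollback
}

Note this won't help if the code under test runs in a different process (e.g. a web server handling the automated browser's requests), since that process won't be enlisted in the test's transaction.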
When I don't have control over the transaction scopes of my tests, I usually drop and recreate the database from scratch each time.
Obviously, this is only feasible if the tests can run against the bare schema (or with hardcoded lookup values inserted in the create scripts).
When I test against a snapshot database pre-populated with lots of data, I have previously used cleanup scripts, e.g. deleting all records from each table above the max ID of my baseline snapshot.
I haven't tried automating the backup/restore as AdaTheDev suggests, but it sounds like probably your best option if you don't want to maintain potentially complicated (and buggy) cleanup scripts. It depends on the complexity of your snapshot data and how often you change your snapshot, since the cleanup scripts have to be modified accordingly.
Have you considered mocking out the data access, so that your web tests run against an in-memory data store? You could then test the data access procedures separately, where you can still roll back the transaction scopes.
The wizards over at Red Gate have just released SQL Virtual Restore, which will actually mount a backup file as a live, readable, writable database - so you could have a backup file representing the baseline state of your system before tests, take a copy of this backup, mount the copy as your test database, run the tests, and then unmount and wipe the copy.
Virtual Restore is at http://www.red-gate.com/products/sql_virtual_restore/index.htm and there's a 14-day trial if you want to try it out.
I have no affiliation with Red Gate, btw - I'm just an enthusiastic user of their tools.
It sounds like it would be hard to use transactions, since you're going to be making multiple web requests in a single test - but that would be my first preference because it's faster than restoring a database from backup.
If you've got the right edition of SQL Server, you could use database snapshots instead of backups, simply because they are faster :) See http://msdn.microsoft.com/en-us/library/ms175876.aspx
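A rough sketch of the snapshot cycle, issued from C# (the database name, logical file name, and paths are placeholders; on SQL Server 2008, snapshots require Enterprise or Developer edition):

using System.Data.SqlClient;

public static class SnapshotReset
{
    // Create the baseline snapshot once, before the test run.
    // NAME must be TestDb's logical data file name (placeholder here).
    public static void CreateSnapshot(SqlConnection master)
    {
        var command = master.CreateCommand();
        command.CommandText =
            @"CREATE DATABASE TestDb_Snapshot
              ON (NAME = TestDb, FILENAME = 'C:\Snapshots\TestDb.ss')
              AS SNAPSHOT OF TestDb";
        command.ExecuteNonQuery();
    }

    // Revert after each test; this is much faster than a full RESTORE,
    // but requires exclusive access and no other snapshots on the DB.
    public static void RevertToSnapshot(SqlConnection master)
    {
        var command = master.CreateCommand();
        command.CommandText =
            "RESTORE DATABASE TestDb FROM DATABASE_SNAPSHOT = 'TestDb_Snapshot'";
        command.ExecuteNonQuery();
    }
}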
Obviously it all depends on how you call the tests, but would the [Rollback] attribute in MbUnit work?
I've been writing some integration tests recently against ASP.Net MVC controller actions and have been frustrated by the difficulty in setting up the test data that needs to be present in order to run the test.
For example, I want to test the "add", "edit" and "delete" actions of a controller. I can write the "add" test fine, but then find that to write the "edit" test I am either going to have to call the code of the "add" test to create a record so that I can edit it, or do a lot of setup in the test class, neither of which is particularly appealing.
Ideally I want to use or develop an integration test framework to make it easier to add seed data in a reusable way for integration tests so that the arrange aspect of an arrange/act/assert test can focus on arranging what I specifically need to arrange for my test rather than concerning itself with arranging a load of reference data only indirectly related to the code under the test.
I happen to be using NHibernate, but I believe any data-seeding functionality should be oblivious to that and able to manipulate the database directly; the ORM may change, but I will always be using a SQL database.
I'm using NUnit, so I envisage hooking into the test/testfixture setup/teardown (but I think a good solution would potentially be transferable to other test frameworks).
I'm using FluentMigrator in my main project to manage schema and seeding of reference data, so it would be nice, but not essential, to be able to use the FluentMigrator framework for a consistent approach across the solution.
So my question is, "How do you seed your database data for integration testing in C#?" Do you execute the SQL directly? Do you use a framework?
You can run your integration tests against SQL Server Compact: you get a .sdf file and can connect to it by giving the file's path as the connection string. That would be faster and easier to set up and work with.
Your integration tests probably won't need millions of rows of data. You can insert your test data into the database and save it as TestDbOriginal.sdf.
When you run your tests, just make a copy of this TestDbOriginal.sdf and work on that copy, which is already seeded with data. If you want to test a specific scenario, you will need to prepare your data by calling methods like add, remove and edit.
When you go to production or performance testing, switch back to your original server version, be it SQL Server 2008 or whatever.
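A minimal sketch of the copy step in an NUnit fixture (the file names are placeholders):

using System.IO;
using NUnit.Framework;

[SetUp]
public void CopySeedDatabase()
{
    // Start every test from the pristine, pre-seeded database file.
    File.Copy("TestDbOriginal.sdf", "TestDb.sdf", true);
}

The tests then connect with a connection string such as "Data Source=TestDb.sdf".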
I don't know if it's necessarily the 'right' thing to do, but I've always seeded using my add/create method(s).
For software testing purposes I would like to create a sterile clone (with all data blanked out) of the production database. This way I can run my unit tests on a known set of records every time. I am looking to do this programmatically within the unit tests themselves, so I can ensure that the tables contain exactly the test data that I need for the functional tests.
I have found the following information relating to creating an Access database within C#. Note: I know Access probably isn't the best solution, but it's good enough!
What I would like to know, is there a way of using TableAdapters (perhaps) to replicate the production database schema (without any data) within a blank Access database file?
Do this:
create a copy of the access file; production -> test
connect to test database
enumerate all tables in the database
run DELETE * FROM [table] for all tables; run it several times if you have FK dependencies, until there are no errors - or use TRUNCATE [table] as commented
compact the database
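A rough sketch of steps 2-4 using OleDb (the connection string and file path are placeholders, and the snippet assumes the Jet/ACE OLE DB provider is installed):

using System.Data;
using System.Data.OleDb;

static void ClearAccessDatabase()
{
    // Placeholder connection string pointing at the copied test file.
    const string connectionString =
        @"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\Data\test.mdb";

    using (var connection = new OleDbConnection(connectionString))
    {
        connection.Open();

        // Enumerate user tables only (the last restriction filters by type).
        DataTable tables = connection.GetSchema(
            "Tables", new string[] { null, null, null, "TABLE" });

        // Repeat the delete pass a few times so FK dependencies resolve.
        for (int pass = 0; pass < 5; pass++)
        {
            foreach (DataRow row in tables.Rows)
            {
                var command = connection.CreateCommand();
                command.CommandText = "DELETE * FROM [" + row["TABLE_NAME"] + "]";
                try { command.ExecuteNonQuery(); }
                catch (OleDbException) { /* FK violation; retry next pass */ }
            }
        }
    }
}

Compacting the database afterwards has to be done separately, e.g. via the JRO.JetEngine COM component.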
I do not have much experience with Access, but generally you would make a CREATE script for this purpose. Most database tools have a function for creating such a script, which is basically a set of SQL statements that creates all the objects (tables, views, etc.).
Searching for CREATE script and Access will give you some starting points.
I have had bad experiences with Access as a production database; I wouldn't recommend it. Either go with SQLite or Firebird.
Secondly, yes you can use TableAdapters. You need to create two connections for each db. But I think there might be tools available to do this.
How big is the database? For up to 4GB, Oracle Express Edition might help. Also, it will be easy to clone from Oracle to Oracle.
I'm writing unit-tests for an app that uses a database, and I'd like to be able to run the app against some sample/test data - but I'm not sure of the best way to setup the initial test data for the tests.
What I'm looking for is a means to run the code under test against the same database (or a schema-identical copy) that I currently use while debugging - and before each test, I'd like to ensure that the database is reset to a clean slate prior to inserting the test data.
I realize that using an IRepository pattern would allow me to remove the complexity of testing against an actual database, but I'm not sure that will be possible in my case.
Any suggestions or articles that could point me in the right direction?
Thanks!
--EDIT--
Thanks everyone, those are some great suggestions! I'll probably go the route of mocking my data access layer, combined with some simple set-up classes to generate exactly the data I need per test.
Here's the general approach I try to use. I conceive of tests at three or four levels: unit tests, interaction tests, integration tests, acceptance tests.
At the unit test level, it's just code. Any database interaction is mocked out, either manually or using one of the popular frameworks, so loading data is not an issue. They run quickly, and make sure the objects work as expected. This allows for very quick write-test/write-code/run-test cycles. The mock objects serve up the data that is needed by each test.
Interaction tests test the interactions of non-trivial class interactions. Again, no database required, it's mocked out.
Now at the integration level, I'm testing integration of components, and that's where real databases, queues, services, yada yada, get thrown in. If I can, I'll use one of the popular in-memory databases, so initialization is not an issue. It always starts off empty, and I use utility classes to scrub the database and load exactly the data I want before each test, so that there's no coupling between the tests.
The problem I've hit using in-memory databases is that they often don't support all the features I need. For example, perhaps I require an outer join, and the in-memory DB doesn't support that. In that case, I'll typically test against a local conventional database such as MySQL, again, scrubbing it before each test. Since the app is deployed to production in a separate environment, that data is untouched by the testing cycle.
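As an illustration of the in-memory option, here is a minimal sketch using SQLite via the Microsoft.Data.Sqlite package (the table and rows are made-up examples; note the database lives only as long as the connection stays open):

using Microsoft.Data.Sqlite;

using (var connection = new SqliteConnection("Data Source=:memory:"))
{
    connection.Open();

    // The schema is created fresh for every test run, so it always starts empty.
    var create = connection.CreateCommand();
    create.CommandText =
        "CREATE TABLE Customers (Id INTEGER PRIMARY KEY, Name TEXT)";
    create.ExecuteNonQuery();

    // Load exactly the rows this test needs, nothing more.
    var seed = connection.CreateCommand();
    seed.CommandText = "INSERT INTO Customers (Name) VALUES ('Alice')";
    seed.ExecuteNonQuery();

    // ... run the test against this connection ...
}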
The best way I've found to handle this is to use a static test database with known data, and use transactions to ensure that your tests don't change anything.
In your test setup you would start a transaction, and in your test cleanup, you would roll the transaction back. This lets you modify data in your tests but also makes sure everything gets restored to its original state when the test completes.
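In NUnit terms, a minimal sketch might look like this (the connection string is a placeholder, and every command issued during the test must use this connection and transaction):

using System.Data.SqlClient;
using NUnit.Framework;

private SqlConnection _connection;
private SqlTransaction _transaction;

[SetUp]
public void BeginTransaction()
{
    _connection = new SqlConnection("Server=localhost;Database=TestDb;Integrated Security=true");
    _connection.Open();
    _transaction = _connection.BeginTransaction();
}

[TearDown]
public void RollBackTransaction()
{
    // Undo whatever the test changed, then release the connection.
    _transaction.Rollback();
    _connection.Dispose();
}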
I know you're using C#, but in the Java world there's the Spring Framework. It allows you to run database manipulations in a transaction and, after the test, roll that transaction back. This means that you operate against a real database without touching its state once the test finishes. Perhaps this could be a hint for further investigation in C#.
Mocking is of course the best way to unit test your code.
As far as integration tests go, I have had some issues using in-memory databases like SQLite, mainly because of small differences in behaviour and/or syntax.
I have been using a local instance of MySQL for integration tests in several projects. A recurring problem is the server setup and the creation of test data.
I have created a small NuGet package called Mysql.Server (see more at https://github.com/stumpdk/MySql.Server) that simply sets up a local instance of MySQL every time you run your tests.
With this instance running you can easily set up table structures and sample data for your tests without being concerned of either your production environment or local server setup.
I don't think there is an easy way to do this. You just have to create those pre-test SQL setup scripts and post-test tear-down scripts, and then trigger those scripts for each run. A lot of people suggest SQLite for unit test setup.
I found it best to have my tests go to a different db so I could wipe it clean and put in the data I wanted for the test.
You may want to make the database configurable within the program, so that your tests can tell the classes to switch to the test database.
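A minimal sketch of that kind of switch (all names here are hypothetical):

public static class Db
{
    // The application reads its connection string from here.
    // Production code leaves the default; tests point it elsewhere.
    public static string ConnectionString =
        "Server=localhost;Database=AppDb;Integrated Security=true";
}

// In the test setup:
// Db.ConnectionString = "Server=localhost;Database=AppDb_Test;Integrated Security=true";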
This code clears all data from all user tables in MS SQL Server:
private DateTime _timeout;

public void ClearDatabase(SqlConnection connection)
{
    _timeout = DateTime.Now + TimeSpan.FromSeconds(30);
    do
    {
        // sp_MSforeachtable runs the DELETE against every user table.
        var command = connection.CreateCommand();
        command.CommandText = "exec sp_MSforeachtable 'DELETE FROM ?'";
        try
        {
            command.ExecuteNonQuery();
            return; // every table emptied successfully
        }
        catch (SqlException)
        {
            // A delete failed (typically a foreign-key violation);
            // keep retrying until the dependencies unwind or we time out.
        }
    } while (!TimeOut());

    if (TimeOut())
        Assert.Fail("Failed to clear DB");
}

private bool TimeOut()
{
    return DateTime.Now > _timeout;
}
If you are thinking about real database usage, then most likely we're talking about integration tests here, i.e. tests which check app behavior as a composition of different components, in contrast to unit tests, where components are supposed to be tested in isolation.
With the testing scope defined, I wouldn't recommend using things like in-memory databases or mocking libraries as the other authors suggested. The problem is that in-memory databases usually behave slightly differently or offer a reduced feature set, and with mocking there is no database at all, so you'd be testing a different application in the general sense, not the one you'll be delivering to your customers.
I'd rather suggest minimizing the number of integration tests by covering just the crucial parts of your logic with them and leaving the rest to unit testing, while using a real database with a setup as close to the production one as possible. Test runs can be too slow and a real pain if there are a lot of integration tests.
Also, you might use some tricks to optimize the speed of your test execution:
Split tests into Read and Write with regard to the data mutations they introduce, and run the Read ones in parallel and without any cleanup (e.g. HTTP GET requests are safe to run in parallel if the system under test is a webapp and the tests are more like end-to-end);
Use a single insert/delete script for all the data and optimize it as much as possible. You might find the Reseed library I'm currently developing helpful; it's able to generate both insert and delete scripts for you, so basically what you asked for. Or check out Respawn, which can be used for database cleanup (see the sketch after this list);
Use database snapshots for the restore, which might be faster than a full insert/delete cycle;
Wrap each test in a transaction and revert it afterwards (this one is also not a 100% faithful simulation and is somewhat fragile);
Parallelize your tests by using a pool of databases instead of a single one. Docker and TestContainers could be suitable here.
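As an illustration of the Respawn option, a minimal sketch (this uses Respawn's classic Checkpoint API, which may differ between versions; the ignored table name is a placeholder):

using Respawn;

private static readonly Checkpoint DbCheckpoint = new Checkpoint
{
    // Keep reference/seed tables intact; only wipe test-generated data.
    TablesToIgnore = new[] { "SchemaVersions" }
};

// In the per-test cleanup (Reset is async):
// await DbCheckpoint.Reset(connectionString);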
I am looking for a way to do daily deployments and keep the database scripts in line with releases.
Currently, we have a fairly decent way of deploying our source, we have unit code coverage, continuous integration and rollback procedures.
The problem is keeping the database scripts in line with a release. Everyone seems to try a script out on the test database and then run it on live; when the ORM mappings are updated (that is, the changes go live), the app picks up the new column.
The first problem is that none of the scripts HAVE to be recorded anywhere. Generally everyone "attempts" to put them into a Subversion folder, but some of the lazier people just run the script on live, and most of the time no one knows who has done what to the database.
The second issue is that we have 4 test databases and they are ALWAYS out of sync; the only way to truly line them back up is to do a restore from the live database.
I am a big believer that a process like this needs to be simple, straightforward and easy to use in order to help a developer, not hinder them.
What I am looking for are techniques/ideas that make it EASY for the developer to want to record their database scripts so they can be run as part of a release procedure. A process that the developer would want to follow.
Any stories, use cases or even a link would be helpful.
For this very problem I chose to use a migration tool: Migratordotnet.
With migrations (in any tool) you have a simple class used to perform your changes and undo them. Here's an example:
[Migration(62)]
public class _62_add_date_created_column : Migration
{
    public override void Up()
    {
        // Add the column as nullable first, so existing rows don't violate it
        Database.AddColumn("Customers", new Column("DateCreated", DbType.DateTime));
        // Seed it with data
        Database.Execute("update Customers set DateCreated = getdate()");
        // Then add the not-null constraint
        Database.AddNotNullConstraint("Customers", "DateCreated");
    }

    public override void Down()
    {
        Database.RemoveColumn("Customers", "DateCreated");
    }
}
This example shows how you can handle volatile updates, like adding a new not-null column to a table that has existing data. This can be automated easily, and you can easily go up and down between versions.
This has been a really valuable addition to our build, and has streamlined the process immensely.
I posted a comparison of the various migration frameworks in .NET here: http://benscheirman.com/2008/06/net-database-migration-tool-roundup
Read K. Scott Allen's series of posts on database versioning.
We built a tool for applying database scripts in a controlled manner based on the techniques he describes and it works well.
This could then be used as part of the continuous integration process, with each test database having changes deployed to it when a commit is made to the source control location where you keep the database upgrade scripts. I'd suggest having a baseline script and upgrade scripts so that you can always run a sequence of scripts to get a database from its current version to the new state that is needed.
This does still require some process and discipline from the developers though (all changes need to be rolled into a new version of the base install script and a patch script).
We've been using SQL Compare from RedGate for a few years now:
http://www.red-gate.com/products/index.htm
The Pro version has a command line interface that you could probably use to set up your deployment procedures.
We use a modified version of the database versioning described by K. Scott Allen. We use the Database Publishing Wizard to create the original baseline script, then a custom C# tool based on SQL SMO to dump the stored procedures, views and user functions. Change scripts which contain schema and data changes are generated by Red Gate tools. So we end up with a structure like
Database\
    ObjectScripts\          - stored procs, views and user funcs, 1 per file
    baseline.sql            - database snapshot which includes tables and data
    sc.01.00.0001.sql       - incremental change scripts
    sc.01.00.0002.sql
    sc.01.00.0003.sql
The custom tool creates the database if necessary, applies the baseline.sql if necessary, adds a SchemaChanges table if necessary and applies the change scripts as necessary based on what's in the SchemaChanges table. That process occurs as part of a nant build script each time we do a deployment build via cc.net.
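A rough sketch of the core of such a tool (the SchemaChanges table and the sc.*.sql naming convention are as described above; the column names and helper shape are hypothetical, and scripts containing GO batch separators would need to be split into batches first):

using System.Data.SqlClient;
using System.IO;
using System.Linq;

// Apply, in order, each change script not yet recorded in SchemaChanges.
static void ApplyChangeScripts(SqlConnection connection, string scriptDirectory)
{
    foreach (var path in Directory.GetFiles(scriptDirectory, "sc.*.sql").OrderBy(p => p))
    {
        var name = Path.GetFileName(path);

        var check = connection.CreateCommand();
        check.CommandText = "SELECT COUNT(*) FROM SchemaChanges WHERE ScriptName = @name";
        check.Parameters.AddWithValue("@name", name);
        if ((int)check.ExecuteScalar() > 0)
            continue; // already applied to this database

        var apply = connection.CreateCommand();
        apply.CommandText = File.ReadAllText(path);
        apply.ExecuteNonQuery();

        // Record the script so it is never applied twice.
        var record = connection.CreateCommand();
        record.CommandText =
            "INSERT INTO SchemaChanges (ScriptName, AppliedOn) VALUES (@name, GETDATE())";
        record.Parameters.AddWithValue("@name", name);
        record.ExecuteNonQuery();
    }
}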
If anyone wants the source code to the schemachanger app I can throw it up on codeplex/google or wherever.
If you are talking about trying to keep database schemas in sync, try using the Red Gate SQL Comparison SDK. Build a temp database based on a create script (newDb) - this is what you want your database to look like. Compare newDb against your old database (oldDb), get a change set from that comparison, and apply it using Red Gate. You could build this upgrade process into your tests, and you can try to get all the devs to agree that there is one place where the create script for the database is kept. This same practice works well for upgrading your database across several versions and running data migration scripts and processes between each step (using an XML doc to map the create and data migration scripts).
Edit: With the Red Gate technique, you are only concerned with create scripts, not upgrade scripts, since Red Gate comes up with the upgrade script. It will also let you drop and create indexes, stored procedures, functions, etc.
Go here:
https://blog.codinghorror.com/get-your-database-under-version-control/
Scroll down a bit to the list of 5 links to the odetocode.com website. Fantastic five-part series. I would use that as a starting point to get ideas and figure out a process that will work for your team.
You should consider using a build tool like MSBuild or NAnt. We use a combination of CruiseControl.NET, NAnt, and SourceGear Fortress to handle our deployments, including SQL objects. The NAnt db build task calls sqlcmd.exe to run update scripts in our dev and staging environments after they're checked into Fortress.
We use Visual Studio for Database Professionals and TFS to version and manage our database deployments. This allows us to treat our databases just like code (check out, check in, lock, view version history, branch, build, deploy, test, etc.) and even include them in the same solution files if we wish.
Our developers can work on local databases to avoid stepping on each other's changes in a shared environment. When they check database changes into TFS, we have continuous integration to build, test and deploy to our integrated dev environment. We have separate builds on release branches to create differential deployment scripts for each subsequent environment.
Later, if a bug is discovered in a release, we can go to a release branch and hotfix the code and database at the same time.
This is a great product, but its adoption was hindered early on due to a Microsoft marketing blunder. It was originally a separate product under Team System. This meant in order to use features of the developer edition and database edition at the same time, you were required to step up to the much more expensive Team Suite edition. We (and many other customers) gave Microsoft grief about this, and we were very happy they announced this year that DB Pro has been folded into the developer edition, and that immediately anyone licensed with developer edition can install the database edition.
Gus off-handedly mentioned DB Ghost (above); I second it as a potential solution.
A brief overview of how my company is using DB Ghost:
After the schema for a new DB has been reasonably settled during initial development, we use the DB Ghost 'Data and Schema Scripter' to create script (.sql) files for all the DB objects (and any static data), and we check these script files into source control (the tool separates the objects into folders such as 'Stored Procedures', 'Tables', etc.). At this point, we can use either the DB Ghost 'Packager' or 'Packager Plus' tool to create a stand-alone executable to create a new DB from these scripts.
All changes to the DB schema are checked into source control via check-ins to the specific script files.
At anytime we can use the packager to create an executable to either (a) create a new DB or (b) update an existing DB. Some customization is required for certain path-dependent changes (e.g. changes that require data to be updated), but we have pre-update and post-update scripts that are run.
The 'update' process involves the creation of a clean 'source' DB and then (after pre-update custom scripts) a comparison between the schemas of the source DB and the target DB. DB Ghost updates the target DB to match.
We routinely make changes to production DBs (we have 14 customers in 7 different production environments), but we inevitably deploy a large enough set of changes with a DB Ghost update executable (created during our build process). Any production changes that were not checked into source (or that were not checked into the appropriate branch being released) are LOST. This has forced everyone to check in changes consistently.
To summarize:
If you enforce a policy that all DB updates be deployed using a DB Ghost update executable, you can 'force' developers to consistently check in their changes, regardless of whether they are deployed manually in the interim.
Adding a step (or steps) to your build process to create a DB Ghost update executable will in effect verify that a DB can be created from scripts (because DB Ghost creates a 'source' DB even when building the update executable package), and if you add a step (or steps) to execute the update package on any of the four test DBs you mentioned, you can keep your test DBs in line with source.
There are some caveats and some limitations in what changes are 'easily' deployed with this tool (really a suite of related tools), but they are all fairly minor (at least for my company):
Renaming objects must be done in one of the custom scripts
The entire DB is always updated (e.g. objects in a single schema can't be updated alone), making it difficult to support customer-specific objects in the main application DB
The book Refactoring Databases addresses many of these issues at a conceptual level.
As far as tools go, I know that DB Ghost works well for SQL Server. I have heard that the Data Dude edition of Visual Studio has really been improved in the latest release, but I don't have any experience with it.
As far as really pulling off continuous-integration-style database development goes, it gets really resource intensive really fast because of the number of database copies you need. It is very doable when the database can fit on a developer workstation, but impractical when the database is so large that it needs to be deployed across a grid. To do it you basically need 1 copy of the database per developer [developers who make DDL changes, not just changes to procs] + 6 common copies. The common copies are as follows:
INT DEV --> Developers "check in" their refactoring to INT DEV for integration testing. When integration testing passes, this database is copied over to DEV.
DEV --> This is the "official" development copy of the database. INT DEV is refreshed regularly with a copy of DEV. Developers working on new refactorings get a fresh copy of the database from DEV.
INT QA --> Same idea as INT DEV except for the QA team. When integration tests pass here, this database is copied over to QA and to DEV*.
QA
INT PROD --> Same idea as INT QA except for production. When integration tests pass here, this database is copied over to PROD, QA*, and DEV*
PROD
*When copying databases across DEV/QA/PROD lines, you will also need to run scripts to update test data relevant to the particular environment (e.g. setting up users in QA that the QA team uses to test but that don't exist in production).
One possible solution is to look into implementing DML auditing on your test databases, then just rolling those audit logs into a script for final testing and live deployment. SQL Server 2008 significantly improves on DML auditing, but even SQL Server 2005 supports it via triggers.
There are a bunch of links in these posts that I'll want to follow up on (I "rolled my own" system years ago, have to see if there are similarities). One thing you will need, and that I hope is mentioned in these links, is discipline. I don't quite see how any automated system can work if anyone can change anything at any time. (Your question implies that this can happen on your production systems, but obviously that can't be true.)
Having one person (the fabled "database administrator") dedicated to the task of managing changes to databases, particularly production databases, is a very common solution. As for maintaining consistency across X development and testing databases: if it/they are used by many users, once again you are best served by having an individual act as a "clearing house" for changes; if everyone has their own database instance, then they're responsible for keeping it in order, and having a central consistent database "source" will be critical when they need a refreshed baseline database.
Here's a recent Stack Overflow post that may be of interest: how-to-refresh-a-test-instance-of-sql-server-with-production-data-without-using
Red Gate has a paper describing how to achieve build automation: http://downloads.red-gate.com/HelpPDF/ContinuousIntegrationForDatabasesUsingRedGateSQLTools.pdf
This is built around SQL Source Control, which integrates with SSMS and your existing source control system.
I've written a .NET based tool to handle database versioning in an automated fashion. We have been using this tool in production to handle rolling out database updates (including patches) to multiple environments, keep a log in each database of which scripts have been run, and do it all in an automated fashion. It has a command-line console so you can create batch scripts which use this tool. Check it out: https://github.com/bmontgomery/DatabaseVersioning
For what it's worth, this is a real example of a simple, low cost approach used by my former employer (and which I am trying to impress on my current employer as a basic first step).
Add a table called 'DB_VERSION' or similar. In EVERY upgrade script, add a row to that table, which can include as few or as many columns as you see fit to describe the upgrade, but at a minimum I would suggest { VERSION, EXECUTION_DATE, DESCRIPTION, EXECUTION_USER }. Now you have a concrete record of what has been going on. If someone runs their own unauthorised script you'd still need to follow the advice of the answers above, but this is just a simple way of dramatically improving on your existing version control (i.e. none).
Now let's say you have an upgrade script from v2.1 to v2.2 of the database and you want to verify that the lone maverick guy has actually run it on his database. You can just search for rows where VERSION = 'v2.2' and, if you get a result, not run this upgrade script. This can be built into a console utility app if necessary.
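A minimal sketch of that check in C# (the table and VERSION column follow the suggestion above; the connection string is a placeholder):

using System.Data.SqlClient;

// Returns true if the given upgrade version has already been recorded.
static bool UpgradeAlreadyApplied(string connectionString, string version)
{
    using (var connection = new SqlConnection(connectionString))
    {
        connection.Open();
        var command = connection.CreateCommand();
        command.CommandText = "SELECT COUNT(*) FROM DB_VERSION WHERE VERSION = @version";
        command.Parameters.AddWithValue("@version", version);
        return (int)command.ExecuteScalar() > 0;
    }
}

// Usage: if (!UpgradeAlreadyApplied(cs, "v2.2")) { /* run the v2.2 script */ }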
When there are a number of people working on a project, all of who could alter the database schema, what's the simplest way to unit test / test / verify it? The main suggestion we've had so far is to write tests for each table to verify column names, constraints, etc.
Has anyone else done anything similar / simpler? We're using C# with SQL Server, if that makes any real difference.
Updates:
The segment of the project we're working on uses SSIS packages to do the bulk of the work, so there is very little C# code to write unit tests against.
The code for creating tables / stored procedures is spread across SQL files. Because of the build system, we could maintain a separate VS DB project file as well, but I'm not sure how that would help us verify the schema either.
One possible answer is to use Visual Studio for Database Professionals and keep your schema in source control with the rest of your code. This allows you to see differences, and you get a history of who changed what.
Alternatively you could use a tool like SQLCompare to see what has been modified in one database compared to another.
Your (relational) database does two things as far as I'm concerned: 1) Hold data and 2) Hold relations between data.
Holding data is not a behavior, so you would not test it.
And for ensuring relations, just use constraints. Lots of constraints. All over the place.
That is an interesting question! There are lots of tools out there for testing stored procedures but not for testing the database schema.
Don't you find that the unit tests written for code generally find any problems with the database schema?
One approach I have used is to write stored procedures to copy test data from the developer's schema to a test schema. This is pretty rough and ready, as the stored procedures generally crash when they come across any differences between the schemas, but it does alert you to any changes you haven't been told about.
And nominate someone to be the DBA who monitors changes to the schema?
I've had to do this type of thing before, although not in C#. To begin with, I built a schema migration tool, based on the discussion at Ode to Code (page 1 of 5) (there are also existing tools that do similar things). Importantly, the migration tool I built allowed you to specify the database you were applying the changes to and the version you wanted to apply. Then, following a test-first methodology, whenever I needed to make a schema change I would write a test script which would create a test database, apply version changes up to the one before my target change script, add some data, apply the change script under test, and confirm that the data was in the expected state.
My main goal with this was to confirm that no data was lost or corrupted during schema migrations, not to check specifically that the schema was in a particular state. A good awareness of your production data set is required, so you can write representative sample data for the tests.
It's debatable if this should be considered unit testing or integration testing. I would tend to consider it integration testing, based on the fact that I don't want to run old tests every time I iterate my code. Whatever you want to call it, I found it to be a useful tool for that situation.
This is an old question, but it appears that people are still landing here. The best tool I have found so far is "SQL Test" by Red Gate. It allows you to create scripts that run as transactions, letting you run "sandboxed" queries to check the state of the database.
This does not really fit the unit test paradigm. I would suggest version controlling the schema and limiting write access to a single qualified team member such as the DBA or team lead, who can validate any requested changes against the entire application. Schema changes should not be done haphazardly.
"Don't you find that the unit tests written for code generally find any problems with the database schema?"
This assumes, of course, that your tests test everything.