Unit testing using the NDbUnit framework - C#

I am writing unit tests for an app that has matured a lot over time. We are using NDbUnit because it keeps the test cases independent of each other. When we started developing this app the DB schema was fairly manageable, so dragging and dropping the tables onto the VS designer to create an XSD was never an issue. With the current DB schema, however, the generated XSD is more than 3MB in size, and on a slow dev machine VS grinds to a halt when you try to open it.
Keeping the DB schema and the XSD in sync has therefore become very challenging.
Is there a way I can get rid of the manual step of modifying the XSD?
Do you suggest that i should consider other unit testing frameworks?
Spring.Net would definitely give me what I need, but we don't have interfaces, so integrating it would be a tedious task.

If you're manually using the Visual Studio designer to create your XSD files in a production environment, you're probably doing it wrong :) As you're discovering, the Visual Studio XSD designer is probably the least-manageable way to maintain your XSD file(s).
I'd recommend doing as I have done and switching to the MyGeneration code-gen tool to create your XSD files from your database.
Info: http://www.mygenerationsoftware.com
Download: http://sourceforge.net/projects/mygeneration/
XSD Template: http://www.mygenerationsoftware.com/TemplateLibrary/Archive/?guid=59a03408-c96f-4baf-8171-b6bfe8725dab
Also, you should take careful note that while I demonstrated the use of the NDbUnit tool in those screencasts by brute-forcing the entire database into one single XSD file, the intended real-world usage pattern of the NDbUnit tool is to use multiple XSD file(s) as needed to support your different database-dependent tests.
Since the contents of the XSD file(s) control the 'scope' of the database that NDbUnit will operate on at any one time, the intent of the tool is that you would have many different XSD files for your different data-dependent tests, and that you would carefully scope the tests, the XSD, and the test data (XML) together so that they all closely correlate for each collection of tests.
Having your entire database represented in a single XSD file (especially when that XSD file is approaching 3MB!) is almost certainly an anti-pattern and should be cause for you to consider that you're probably not thinking about your tests or your test data in a granular-enough way to be effective.
If you cannot effectively load data into just a portion of your database tables without violating referential integrity rules, then you probably have a database design problem to address. Just as one single god-object is an anti-pattern in OO design, so too is one single monolithic database with referential-integrity constraints in so many places that records can only be inserted by loading EVERY table with data (forcing you to test the whole thing at once, as it sounds like you may be doing here).
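For anyone who hasn't seen the API, here is a minimal sketch of what a fixture scoped to one small XSD typically looks like; the connection string, file names, and table set are placeholders, not taken from the question:

```csharp
using NDbUnit.Core;
using NDbUnit.Core.SqlClient;
using NUnit.Framework;

[TestFixture]
public class OrderProcessingTests
{
    private const string ConnectionString =
        @"Server=(local);Database=AppDb_Test;Integrated Security=true;"; // placeholder

    private INDbUnitTest _database;

    [SetUp]
    public void SetUp()
    {
        // A small XSD scoped to just the tables this fixture needs,
        // plus an XML file containing the matching test data.
        _database = new SqlDbUnitTest(ConnectionString);
        _database.ReadXmlSchema(@"TestData\OrderProcessing.xsd");
        _database.ReadXml(@"TestData\OrderProcessing.xml");

        // Wipe the scoped tables and insert the known test data before every test.
        _database.PerformDbOperation(DbOperationFlag.CleanInsertIdentity);
    }

    [Test]
    public void PendingOrders_AreReturnedByRepository()
    {
        // ...exercise the code under test against the known, scoped data set...
    }
}
```

Each group of data-dependent tests gets its own small XSD/XML pair, so no single schema file ever grows large enough to choke the designer.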
Lastly, just as a quick point-of-order in the interests of clarity, the NDbUnit project is now and has always been open-source and completely independent of Microdesk. While I was employed at Microdesk, the company did make use of the tool and I did spend my off-hours contributing to the project, but Microdesk was just one company that adopted the tool, so it's completely erroneous to refer to NDbUnit as 'Microdesk's NDbUnit Framework'.
Hope this helps~!

Related

Schema Migration Scripts in NoSQL Databases

I have an active project that has always used C#, Entity Framework, and SQL Server. However, with the feasibility of NoSQL alternatives increasing daily, I am researching all the implications of switching the project to use MongoDB.
It is obvious that the major transition hurdles would be due to being "schema-less". A good summary of what that implies for languages like C# is found here in the official MongoDB documentation. Here are the most helpful relevant paragraphs (bold added):
Just because MongoDB is schema-less does not mean that your code can handle a schema-less document. Most likely, if you are using a statically typed language like C# or VB.NET, then your code is not flexible and needs to be mapped to a known schema.
There are a number of different ways that a schema can change from one version of your application to the next.
How you handle these is up to you. There are two different strategies:
Write an upgrade script.
Incrementally update your documents as they are used.
The easiest strategy is to write an upgrade script. There is effectively no difference to this method between a relational database (SQL Server, Oracle) and MongoDB. Identify the documents that need to be changed and update them.
Alternatively, and not supportable in most relational databases, is the incremental upgrade. The idea is that your documents get updated as they are used. Documents that are never used never get updated. Because of this, there are some definite pitfalls you will need to be aware of.
First, queries against a schema where half the documents are version 1 and half the documents are version 2 could go awry. For instance, if you rename an element, then your query will need to test both the old element name and the new element name to get all the results.
Second, any incremental upgrade code must stay in the code-base until all the documents have been upgraded. For instance, if there have been 3 versions of a document [1, 2, and 3] and we remove the upgrade code from version 1 to version 2, any documents that still exist as version 1 are un-upgradeable.
The tooling for managing/creating such initialization or upgrade scripts in the SQL ecosystem is very mature (e.g. Entity Framework Migrations).
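For comparison, this is roughly what that mature SQL-side tooling produces; a typical Entity Framework 6 code-first migration class (the class, table, and column names are illustrative only):

```csharp
using System.Data.Entity.Migrations;

// A typical EF6 code-first migration: the tooling scaffolds this class from model
// changes, and Update-Database applies it to whichever target schema you point it at.
public partial class AddCustomerEmail : DbMigration
{
    public override void Up()
    {
        AddColumn("dbo.Customers", "Email", c => c.String(maxLength: 256, nullable: true));
    }

    public override void Down()
    {
        DropColumn("dbo.Customers", "Email");
    }
}
```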
While there are similar tools and homemade scripts available for such upgrades in the NoSQL world (though some believe there should not be), there seems to be less consensus on "when" and "how" to run these upgrade scripts. Some suggest after deployment. Unfortunately this approach (when not used in conjunction with incremental updating) can leave the application in an unusable state when attempting to read existing data for which the C# model has changed.
If
"The easiest strategy is to write an upgrade script."
is truly the easiest/recommended approach for static .NET languages like C#, are there existing tools for code-first schema migration in NoSql Databases for those languages? Or is the NoSQL ecosystem not yet at that point of maturity?
If you disagree with MongoDB's suggestion, what is a better implementation, and can you give some reference/examples of where I can see that implementation in use?
Short version
Is "The easiest strategy is to write an upgrade script." is truly the easiest/recommended approach for static .NET languages like C#?
No. You could do that, but that's not the strength of NoSQL. Using C# does not change that.
are there existing tools for code-first schema migration in NoSql Databases for those languages?
Not that I'm aware of.
Or is the NoSQL ecosystem not yet at that point of maturity?
It's schemaless. I don't think that's the goal or measurement of maturity.
Warnings
First off, I'm rather skeptical that just pushing an existing relational model to NoSql would in a general case solve more problems than it would create.
SQL is for working with relations and on sets of data; NoSQL is targeted at working with non-relational data: "islands" with few and/or soft relations. Both are good at what they are targeting, but they are good at different things. They are not interchangeable. Not without serious effort in data redesign, team mindset, and application logic change, possibly invalidating most previous technical design decisions, with impact reaching up to architectural system properties and possibly the user experience.
Obviously, it may make sense in your case, but definitely do the ROI math before committing.
Dealing with schema change
Assuming you really have good reasons to switch, and schema change management is a key factor in that, I would suggest not fighting the schemaless nature of NoSQL and embracing it instead. Accept that your data will have different schemas.
Don't do upgrade scripts
.. unless you know your application data set will never-ever grow or change notably. The other SO post you referenced explains it really well. You just can't rely on being able to do this in the long term, and hence you need a plan B anyway. You might as well start with plan B and only use schema update scripts when they really are the simpler thing to do for a specific case.
I would add that a good NoSQL-optimized data model is usually tuned for single-item seeks and writes, so mass updates can be significantly heavier than in SQL: to update a single field you may have to rewrite a larger portion of the document, plus handle any denormalizations introduced to reduce the need for lookups (and it may not even be transactional). So what counts as "large" in NoSQL may be significantly smaller, and reached sooner, than you would expect when measured in upgrade down-time.
Support multiple schemas concurrently
Having different concurrently "active" schema versions is in practice expected since there is no enforcement anyway and that's the core feature you are buying into by switching to NoSQL in the first place.
Ideally, in the NoSQL mindset, your logic should be able to work with any input data that meets the requirements a specific process has. It should depend on its required input, not your storage model (which also makes universal sense for dependency management, to reduce complexity). Maybe the logic just depends on a few properties in a single type of document. It should not break if some other fields have changed or some extra data was added, as long as they are not relevant to the specific work to be done. It definitely should not care whether some other model type has changed. This approach usually implies working on soft value bags (JSON/dynamic/dictionary/etc).
Even if the storage model is schema-less, each business logic process has expectations about its input model (a schema subset), and it should validate that it can work with what it's given. A persisted schema version number alongside the model also helps in trickier cases.
As a C# guy, I personally avoid working with dynamic models directly and prefer creating strongly typed objects to wrap each dynamic storage type. To avoid having to manage N concurrent schema version models (with minimal differences) and constantly upgrading the logic layer to support new schema versions, I would implement the wrapper as a superset of all currently supported schema versions for a given entity and implement any interfaces you need. Of course you could add N more abstraction layers ;) Once some old schema versions have eventually been phased out of the data, you can simplify your model and let the strongly typed support reach all dependents.
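As a minimal sketch of that superset wrapper idea, assuming documents are read into a plain dictionary and carry a SchemaVersion field (the entity, field names, and versions below are made up for illustration):

```csharp
using System;
using System.Collections.Generic;

// Strongly typed wrapper over a schemaless customer document.
// It is a superset of all schema versions currently present in the data:
// v1 documents have "Name", v2 documents split it into "FirstName"/"LastName".
public class CustomerDocument
{
    private readonly IDictionary<string, object> _raw;

    public CustomerDocument(IDictionary<string, object> raw)
    {
        _raw = raw ?? throw new ArgumentNullException(nameof(raw));
    }

    public int SchemaVersion =>
        _raw.TryGetValue("SchemaVersion", out var v) ? Convert.ToInt32(v) : 1;

    // Superset property: resolves the name regardless of which schema version was stored.
    public string DisplayName =>
        _raw.TryGetValue("Name", out var legacyName)
            ? (string)legacyName
            : $"{_raw["FirstName"]} {_raw["LastName"]}";
}
```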
Also, it's important for the logic layer to have a fallback or reaction plan should the input model NOT match the requirements for carrying out the intended logic. It's up to the app when and where it can auto-upgrade, accept a discard or partial reset, direct the item to some trickier repair queue (up to a manual fix if no automation can cut it), or just outright reject the request due to incompatibility.
Yes, there's the problem of querying across sets of models with different versions, so you should always consider those cases as well. You may have to adjust querying logic to query different versions separately and merge results (or accept partial results if acceptable).
There definitely are tradeoffs to consider, sure.
So, migrations?
A downside (if you consider migration tool-set availability) is that you don't have one true schema from which to auto-generate the model or its changes, as the C# model IS the source-of-truth schema you're currently supporting. It's actually quite similar to the code-first mindset, but without migrations.
You could implement an incoming model pipe which auto-upgrades documents as they are read and hence reduces the number of schema versions you need to support upstream. I would say this is as close to migrations as you get. I don't know of any tools to do this for you automatically, and I'm not sure I would want one. There are trade-offs to consider; for example, different clients consuming the data may be upgraded on different timelines, so upgrading to the latest version may not always be what you want.
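A rough sketch of such an upgrade-on-read pipe, chaining one step per schema version so that older documents are brought up to date as they are loaded (the versions and field names are invented and continue the customer example above):

```csharp
using System;
using System.Collections.Generic;

public static class CustomerUpgradePipe
{
    private const int CurrentVersion = 3;

    // Applies every missing upgrade step, in order, before the document is handed upstream.
    public static IDictionary<string, object> Upgrade(IDictionary<string, object> doc)
    {
        int version = doc.TryGetValue("SchemaVersion", out var v) ? Convert.ToInt32(v) : 1;

        if (version < 2) UpgradeV1ToV2(doc);
        if (version < 3) UpgradeV2ToV3(doc);

        doc["SchemaVersion"] = CurrentVersion;
        return doc;
    }

    private static void UpgradeV1ToV2(IDictionary<string, object> doc)
    {
        // v2 split "Name" into "FirstName"/"LastName".
        if (doc.TryGetValue("Name", out var name))
        {
            var parts = ((string)name).Split(new[] { ' ' }, 2);
            doc["FirstName"] = parts[0];
            doc["LastName"] = parts.Length > 1 ? parts[1] : "";
            doc.Remove("Name");
        }
    }

    private static void UpgradeV2ToV3(IDictionary<string, object> doc)
    {
        // v3 added a "Status" field with a default value.
        if (!doc.ContainsKey("Status")) doc["Status"] = "Active";
    }
}
```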
Conclusion
NoSQL is by definition not SQL. Both are cool, but expecting equivalency or interchangeability is bound for trouble.
You still have to consider and manage schema in NoSQL, but if you want one true enforced & guaranteed schema, then consider SQL instead.
While Imre's answer is really great and I agree with it in every detail, I would like to add to it while trying not to duplicate information.
Short version
If you plan to migrate your existing C#/EF/SQL project to MongoDB, there is a high chance that you shouldn't. It has probably worked quite well for some time, the team knows it, hundreds or more bugs have already been fixed, and users are more or less happy with it. This is the real value that you already have. And I mean it. For reasons why you should not replace old code with new code, see here:
https://www.joelonsoftware.com/2000/04/06/things-you-should-never-do-part-i/.
Also, more important than the existence of tools for any technology is that it brings value and works as promised (tooling is secondary).
Disclaimers
I do not like the explanation from MongoDB you cited, which claims that statically typed languages are the issue here. It is true, but only on a basic, superficial level. More on this later.
I do not agree that EF Code First Migrations are very mature. They are really great for development and test environments and much, much better than previous .NET database-first approaches, but you still have to have your own careful approach for production deployments.
Investing in your own tooling should not be a blocker for you. In fact, if the engine you choose is really great, it is worthwhile to write some specific tooling around it. I believe that great teams rarely use tooling "off the shelf". Rather, they choose technologies wisely and then customize tools to their needs or build new tools around them (perhaps selling the tool a year or two later).
Where the front line lays
It is not between statically and dynamically typed languages. This difference is highly overrated.
It is more about the problem at hand and the nature of your schema.
Part of the schema is quite static and will play nicely in both the static and dynamic "worlds", while another part naturally changes over time and fits better with dynamically typed languages, though not in any essential way.
You can easily write code in C# that holds a list of (key, value) pairs and thus keep the dynamism under control. What dynamically typed languages give you is the impression that you call properties directly, while in C# you access them by "key". While easier and prettier for the developer to use, this does not save you from the bigger problems like deploying schema changes, accessing different versions of schemas, etc.
So static/dynamic languages case is not an issue here at all.
It is rather about drawing a line between the data that you want to control from your code (the part involved in any logic) and the part that you do not have to control strictly. The second part does not have to be explicitly and minutely expressed in the schema in your code (it can be a list or dictionary rather than named fields/properties, because maintaining such fields costs you but does not bring any value).
My Use Case
Once upon a time my team built a project that uses three different databases:
SQL for "usual" configuration and evidence stuff
Graph database to make it natural to build wide network of arbitrarily connected objects
Document database tuned for searching (Elastic Search in fact) to make searching instant and really modern (like dealing with typos or the like)
Of course it is a challenge to deploy such wide technology stack but each part of it brings its best to the whole solution.
The aim of the project is to search through a knowledge base of literally anything (projects, peoples, books, products, documents, simply anything).
That's why SQL is here only to record a list of available "knowledge databases" and users assigned to them. The schema here is obvious, stable and trivial. There is low probability of changes in the future.
Next, the graph database allows you to literally "throw" anything into the database from various sources and connect things with each other. The idea, to put it simply, is to have objects accessible by ID.
Next, Elastic Search is there to accumulate IDs and a selected subset of properties to make them instantly searchable. Here the schema contains only an ID and a list of (key, value) pairs.
As the final step, to put it simply, the solution calls Elastic Search, gets IDs, and displays details (the schema is irrelevant as we treat it as a list of key/value pairs, so the GUI is prepared to build screens dynamically).
Though the way to the solution was really painful.
We tested a few graph databases by running proofs of concept, only to find that most of them simply do not work for operations like updating data! (ugh!!!) Finally we found one DB that was good enough.
On the other hand, finding and using Elastic Search was a great pleasure! Though it is great, you have to be aware that under the pressure of uploading massive amounts of data it can break, so you have to adjust your tooling to adapt to it.
(so no silver bullet here).
Going into more widely used direction
Apart from my use case, which is kind of extreme, you usually have something "in between".
For example a database for documents.
It can have an almost static "header" of fields like ID, name, author, and so on, which your code can manage "traditionally", while all other fields are managed in a way that allows them to exist or not and to have different contents or structure.
"The header" is the part you decided to make it relevant for the project and controllable by the project. The rest is rather accompanying than crucial (from the project logic point of view).
Different approaches
I would rather recommend learning about the strengths of particular NoSQL database types: find out why they were created and why they are popular and useful. Then answer in which way they could bring benefits to your project.
BTW, it would be interesting to know why you have singled out MongoDB.
The other way around would be to identify your project's current greatest weaknesses or challenges from a technological point of view - be it performance, the cost of supporting changes, the need to scale significantly, or something else. Then try to answer whether some NoSQL DB would be great at resolving that issue.
Conclusion
I'm sure you can find benefits of NoSQL databases for your project, either by replacing part of it or by bringing new value to users (searching, for example?). Either way, I would prefer a really good technology that delivers what it promises over one chosen for how fully it is supported by surrounding tools.
Also, a proof of concept is a really good tool for checking technologies in a scenario that is very simple but at the same time meaningful for you. The approach should not be to play with technologies, but to aggressively and quickly prove or disprove their quality.
There are so many promises and advertisements around that we should protect ourselves by focusing on the real things that work.

How may I generate a DDL, DML and C# ORM from an XSD file?

I have been searching for a .Net implementation of the Active Record Pattern, using either the Castle Project or SubSonic.
I want to generate ORM classes from an XSD file. Can anyone tell me how to go about doing this?
[Edit]
Although the technologies I mentioned seem disparate, here is how I envisage using them together:
XSD specifies the entities, data types and permitted values
This information maps "naturally" not only to DDL, but also to fixture data. In short, the XSD can be used to generate DDL and DML (specifically INSERT) statements, so that the newly created database is suitably initialised when those statements are generated from the XSD and run (a rough sketch of this transformation follows below).
Ok, so now we have database entities, generated and initialised from the XSD. A natural step is to create a DB access layer to the underlying database, to provide CRUD functionality - this is where the ORM comes in.
For reasons I won't go into here, I have settled on the Active Record Pattern for DB access, and this is where the Castle Project (or SubSonic) comes into play.
So, what I am trying to find out is how to automate this process (generating DDL, DML and ORM) so that when the XSD changes, I can synchronize the db and ORM.
Like I said before, I know what I want to do, but I don't know if this is the way to do it (that's why I asked the question). I notice that T4 keeps getting mentioned - maybe this is the way to go - could anyone with some T4 knowledge tell me if this is possible, and also possibly provide a rough guideline as to how to do what I described above?
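Independent of T4, here is a rough sketch of the kind of XSD-to-DDL transformation described above, using System.Xml.Schema; it assumes each global element describes one table, and the file name and type mapping are placeholders:

```csharp
using System;
using System.Text;
using System.Xml.Schema;

class XsdToDdlSketch
{
    static void Main()
    {
        var schemas = new XmlSchemaSet();
        schemas.Add(null, "entities.xsd");   // hypothetical schema file
        schemas.Compile();

        var ddl = new StringBuilder();
        foreach (XmlSchemaElement table in schemas.GlobalElements.Values)
        {
            var rowType = table.ElementSchemaType as XmlSchemaComplexType;
            if (rowType == null) continue;

            ddl.AppendLine("CREATE TABLE [" + table.Name + "] (");
            var columns = rowType.ContentTypeParticle as XmlSchemaSequence;
            if (columns != null)
            {
                foreach (XmlSchemaObject item in columns.Items)
                {
                    var column = item as XmlSchemaElement;
                    if (column == null) continue;

                    // Extremely naive type mapping; a real generator needs a full lookup table.
                    string sqlType;
                    var typeCode = column.ElementSchemaType != null
                        ? column.ElementSchemaType.TypeCode : XmlTypeCode.None;
                    switch (typeCode)
                    {
                        case XmlTypeCode.Int:      sqlType = "INT"; break;
                        case XmlTypeCode.DateTime: sqlType = "DATETIME"; break;
                        default:                   sqlType = "NVARCHAR(255)"; break;
                    }
                    string nullability = column.MinOccurs == 0 ? "NULL" : "NOT NULL";
                    ddl.AppendLine("  [" + column.Name + "] " + sqlType + " " + nullability + ",");
                }
            }
            ddl.AppendLine(");");
        }
        Console.WriteLine(ddl.ToString());
    }
}
```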
As an aside, the ORM classes are to be in C# - and also, I am running on Linux (mono) not Windows.

How to transfer data (Entities) from one database to another

I have a system (using Entity Framework) that is deployed in various production systems and also on a quality control system. My problem is that data entry is often done only on one of those occurrences of my system (different databases).
I want to find the best way to transfer my data from one database to another database. IDs can change, as long as the relations between my objects are maintained. 98% of my data is in the DB; some of it is external files, which I can manage separately, manually.
Currently we use an XML structure as a transition file. The file is then imported into the destination system, and code manually imports the entities and re-creates the data.
I'm looking for a more generic way to do this, with less code. Since all my data is stored in entities, couldn't I simply create a big List, throw all my objects in there, serialize that in some manner into an external file, and finally generically import all those entities into my destination system? (I'll probably have to be careful about maintaining relation IDs, but that should be OK...)
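For what it's worth, a rough sketch of that "one big list" idea using Json.NET with reference tracking turned on, so shared references in the entity graph survive the round trip; the helper names and file handling are assumptions, and in practice you would still need to detach EF proxies and re-key entities on import:

```csharp
using System.Collections.Generic;
using System.IO;
using Newtonsoft.Json;

public static class DataTransfer
{
    private static readonly JsonSerializerSettings Settings = new JsonSerializerSettings
    {
        // Write $id/$ref markers so shared references and cycles in the entity graph are preserved.
        PreserveReferencesHandling = PreserveReferencesHandling.Objects,
        ReferenceLoopHandling = ReferenceLoopHandling.Ignore,
        TypeNameHandling = TypeNameHandling.Auto,   // needed to round-trip a heterogeneous List<object>
        Formatting = Formatting.Indented
    };

    public static void Export(List<object> entities, string path)
    {
        File.WriteAllText(path, JsonConvert.SerializeObject(entities, Settings));
    }

    public static List<object> Import(string path)
    {
        return JsonConvert.DeserializeObject<List<object>>(File.ReadAllText(path), Settings);
    }
}
```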
Anyway, I'm wondering if anyone has smarter approaches; I'm pretty sure I'm not the first with a similar problem.
Thanks!
You need to get some process around this. If all environments contain the same data (unlikely) you can use replication; it is the most automatic option. But a QA environment should not update production, so you have to really think this through.
If semi-automated is okay, there are tools out there you can use from a variety of vendors. I use Red Gate tools, personally, but others are also fine.
Can you set up a more automated push with EF? Sure, but the amount of time you spend is really not worth it.
In my opinion you can check some of the following approaches:
1) Use Sql Compare or Sql Data Compare. Those tools are from Red Gate and can be found here
2) Regular backups and restores of the databases. You could, if it is an option, regularly back up your most up-to-date database and restore it on the destination systems. I have no experience in automating this, but here is a link to do that through .NET (see the sketch after this list).
3) You could always have a go at creating a version control system of your own. I would picture one such system selecting all records from a certain table (or all of them), deleting all records in the target database, and inserting the new ones. This seems pretty complex though, as you have to worry about relationships, data dependencies, etc.
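Regarding option 2, a minimal sketch of scripting such a backup through SQL Server Management Objects (SMO); the instance name, database name, and file path are placeholders:

```csharp
using Microsoft.SqlServer.Management.Smo;

class BackupSketch
{
    static void Main()
    {
        var server = new Server("localhost");          // placeholder instance name

        var backup = new Backup
        {
            Action = BackupActionType.Database,
            Database = "MyAppDb",                      // placeholder database name
            Initialize = true
        };
        backup.Devices.AddDevice(@"C:\Backups\MyAppDb.bak", DeviceType.File);
        backup.SqlBackup(server);

        // Restoring on the destination system is the mirror image, using the Restore class.
    }
}
```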
Hope this helps in some way.
Regards
If for some reason you are not satisfied with the existing tools, maybe you'll want to take a look at the Sync Framework and implement this functionality yourself for your particular databases.
Given what you described (pushing data from one SQL Server to another for demo purposes), you should consider SQL Server Integration Services.
If you've got a simple scenario where you just move the data and objects from one DB to the next, you can use the built-in wizards. If you need to do custom stuff, you can build complex workflows using C# and SQL (tools you already know). Note: most of what you're going to want comes with the Standard edition, so if you're using Express this is less interesting.
The story for Red Gate products is more compelling when you don't have SQL Server (so you have to go out and buy something) and when you are interested in finding out what the changes are between DBs (like viewing code changes in a .cs file in a source control product).

How do you (Unit) Test the database schema?

When there are a number of people working on a project, all of whom could alter the database schema, what's the simplest way to unit test / test / verify it? The main suggestion we've had so far is to write tests for each table to verify column names, constraints, etc.
Has anyone else done anything similar / simpler? We're using C# with SQL Server, if that makes any real difference.
Updates:
The segment of the project we're working on is using SSIS packages to do the bulk of the work, so there is very little C# code to write unit tests against.
The code for creating tables / stored procedures is spread across SQL files. Because of the build system, we could maintain a separate VS DB project file as well, but I'm not sure how that would help us verify the schema either.
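To make the per-table suggestion from the question concrete, here is a rough sketch of an NUnit test that checks expected column names via INFORMATION_SCHEMA; the table, columns, and connection string are placeholders:

```csharp
using System.Collections.Generic;
using System.Data.SqlClient;
using NUnit.Framework;

[TestFixture]
public class CustomersSchemaTests
{
    private const string ConnectionString =
        @"Server=(local);Database=AppDb;Integrated Security=true;"; // placeholder

    [Test]
    public void Customers_Table_Has_Expected_Columns()
    {
        var expected = new[] { "Id", "Name", "Email", "CreatedAt" }; // placeholder columns
        var actual = new List<string>();

        using (var connection = new SqlConnection(ConnectionString))
        using (var command = new SqlCommand(
            "SELECT COLUMN_NAME FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = 'Customers'",
            connection))
        {
            connection.Open();
            using (var reader = command.ExecuteReader())
            {
                while (reader.Read())
                    actual.Add(reader.GetString(0));
            }
        }

        CollectionAssert.AreEquivalent(expected, actual);
    }
}
```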
One possible answer is to use Visual Studio for Database Developers and keep your schema in source control with the rest of your code. This allows you to see differences, and you get a history of who changed what.
Alternatively you could use a tool like SQLCompare to see what has been modified in one database compared to another.
Your (relational) database does two things as far as I'm concerned: 1) Hold data and 2) Hold relations between data.
Holding data is not a behavior, so you would not test it.
And for ensuring relations just use constraints. Lots of constraints. All over the place.
That is an interesting question! There are lots of tools out there for testing stored procedures but not for testing the database schema.
Don't you find that the unit tests written for code generally find any problems with the database schema?
One approach I have used is to write stored procedures to copy test data from the developer's schema to a test schema. This is pretty rough and ready as the stored procedures generally crash when they come across any differences between the schemas but it does alert you to any changes you haven't been told about.
And nominate someone to be the DBA who monitors changes to the schema?
I've had to do this type of thing before, although not in C#. To begin with, I built a schema migration tool, based on the discussion at Ode to Code (page 1 of 5) (there are also existing tools to do similar things). Importantly, the migration tool I built allowed you to specify the database you were applying the changes to and what version you wanted to apply. Then, following a test first methodology, whenever I needed to make a schema change I would write a test script which would create a test database, apply version changes to the one before my target change script, add some data, apply the change script under test, and confirm that the data was in an expected state.
My main goal with this was to confirm that no data was lost or corrupted during schema migrations, not to check specifically that the schema was in a particular state. A good awareness of your production data set is required, so you can write representative sample data for the tests.
It's debatable if this should be considered unit testing or integration testing. I would tend to consider it integration testing, based on the fact that I don't want to run old tests every time I iterate my code. Whatever you want to call it, I found it to be a useful tool for that situation.
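In rough outline, one of those migration tests looked something like the sketch below; SchemaMigrator, TestDatabase, and the helper methods are hypothetical stand-ins for whatever migration tooling you build or adopt:

```csharp
using NUnit.Framework;

[TestFixture]
public class Migration_012_SplitCustomerNameTests
{
    [Test]
    public void ExistingCustomers_SurviveTheMigration()
    {
        // TestDatabase and SchemaMigrator are hypothetical wrappers around your own migration tool.
        var db = TestDatabase.CreateEmpty("MigrationTest_012");
        var migrator = new SchemaMigrator(db.ConnectionString);

        migrator.MigrateTo(version: 11);                          // schema just before the change under test
        db.Execute("INSERT INTO Customers (Id, Name) VALUES (1, 'Ada Lovelace')");

        migrator.MigrateTo(version: 12);                          // apply the change script under test

        Assert.AreEqual("Ada", db.Scalar<string>("SELECT FirstName FROM Customers WHERE Id = 1"));
        Assert.AreEqual("Lovelace", db.Scalar<string>("SELECT LastName FROM Customers WHERE Id = 1"));
    }
}
```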
This is an old question, but it appears that people are still landing here. The best tool I have found so far is "SQL Test" by Red Gate. It allows you to create scripts that run as transactions, letting you run "sandboxed" queries to check the state of the database.
This does not really fit the unit test paradigm. I would suggest version controlling the schema and limiting write access to a single qualified team member such as the DBA or team lead, who can validate any requested changes against the entire application. Schema changes should not be done haphazardly.
Don't you find that the unit tests written for code generally find any problems with the database schema?
This assumes, of course, that your tests test everything.

How do I verify that my LINQ-to-SQL model matches the database schema?

I am absolutely new to the .NET world, and started with C# on Friday. I have some experience with database apps, though.
We will go with LINQ-to-SQL for a medium-scale project. I am used to generating my schema from classes and keeping track of changes with Subversion and equivalents of Ruby's Migrations. There is obviously no easy way to do this with LINQ itself.
So I thought of generating the schema (and doing some data access) with Castle Project's ActiveRecord and using Migrator.NET, Tarantino, or dbdeploy.net for the schema updates. (Any suggestions for this?)
My main question is: How do I verify that my LINQ classes still match the database schema? Does LINQ throw exceptions if the schema does not match? Can I iterate over all the LINQ classes and invoke some verify method?
I already found that sqlmetal is the way to regenerate the classes.
PS: We will use SQL Server (2008 or 2005).
This tool will help sync them, but I'm unsure if it'll show differences...may be of some use though. (:
EDIT: As KristoferA said in his comment, it does support comparisons (: - thanks KristoferA.
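If you'd rather hand-roll a quick verification than rely on a tool, a rough sketch of comparing the DataContext mapping metadata against INFORMATION_SCHEMA might look like this (MyDataContext and the connection string are placeholders, and it only checks that mapped columns exist):

```csharp
using System;
using System.Collections.Generic;
using System.Data.SqlClient;
using System.Linq;

class SchemaCheckSketch
{
    static void Main()
    {
        const string connectionString =
            @"Server=(local);Database=AppDb;Integrated Security=true;"; // placeholder

        using (var context = new MyDataContext(connectionString))       // placeholder DataContext
        using (var connection = new SqlConnection(connectionString))
        {
            connection.Open();

            foreach (var table in context.Mapping.GetTables())
            {
                // Columns the LINQ model expects for this table.
                var mappedColumns = table.RowType.PersistentDataMembers
                    .Select(m => m.MappedName)
                    .ToList();

                // Columns the database actually has.
                var dbColumns = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
                using (var command = new SqlCommand(
                    "SELECT COLUMN_NAME FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = @t",
                    connection))
                {
                    command.Parameters.AddWithValue("@t", table.TableName.Replace("dbo.", ""));
                    using (var reader = command.ExecuteReader())
                        while (reader.Read())
                            dbColumns.Add(reader.GetString(0));
                }

                foreach (var missing in mappedColumns.Where(c => !dbColumns.Contains(c)))
                    Console.WriteLine("{0}: column '{1}' is in the model but not in the database.",
                        table.TableName, missing);
            }
        }
    }
}
```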
As for the:
"I am used to generating my schema from classes and keep track of changes"
I just added a DDL generation feature to Huagati DBML/EDMX Tools (ver 1.47, released today). It can generate DDL diff scripts in case you have added things to your Linq-to-SQL designer but not yet added them to the database.
