How to Test Functions w/ Complex Data Interactions

Currently, I am working on a system that does quite a bit of reporting-style work: functions that consume many different data points and transform them into larger, sometimes flattened outputs. Most of my app is built upon a variation of the repository pattern, so I have a suite of mock repositories that I use for testing scenarios. The problem I am running into is that the interaction between these data points is so complex that maintaining the "mock data" is quickly becoming a nightmare. Here is a mock example:
public class SomeReportingEntity
{
    private IProductRepo ProductRepo;
    private IManagerRepo ManagerRepo;
    private ILocationRepo LocationRepo;
    private IOrdersService OrdersService;
    private IEmployeeRepo EmployeeRepo;

    public SomeReportingEntity(IProductRepo ipr, IManagerRepo imr, ILocationRepo ilr, IOrdersService ios,
        IEmployeeRepo ier)
    {
        //Load these to private vars...
    }

    //This is the function that I want to test...
    public List<SomeReportingEntity> GetManagerSalesByRegionReport()
    {
        //Make a complex join on all sub collections. These
        //sub collections are all under test individually.
        var managerSalesByRegionItems = from x in ProductRepo.CurrentProducts()
                                        join y in OrdersService.FutureOrders() on ...
                                        join z in EmployeeRepo.ActiveEmployees() on ...
                                        join a in LocationRepo.GetAllRegions() on ...
                                        join b in ManagerRepo.GetActiveManagers() on ...
                                        select new SomeReportingEntity { ... };
        return managerSalesByRegionItems.ToList();
    }
}
Admittedly, this is a very contrived example, but the basic idea I want to emphasize is that I have several repositories that I am joining, and I need to create many tests to ensure that this complex query behaves as expected. Because the joining operations are so complex, the mock data is VERY difficult to keep in line, especially as I have to add more associations and test additional points. In addition, I need to be able to enter specific record states into the mocks (such as an employee lacking an assigned manager) to verify that the query handles those situations appropriately.
So here are my questions:
What is the best way to "mock" this data so that it is not such a maintenance nightmare? I have had many people suggest building an in-memory database to support this.
Am I really suffering from an architecture issue here? In reporting scenarios, I find myself in this pattern quite a bit where I take many disassociated data points and merge them into a new, hybrid entity. With the onset of Linq, it is very easy to do and has high clarity of intent, but sometimes it feels like I am cheating a little.

The first thing you want to do is make a centralized object that knows how to retrieve the data for the different repositories. Since this is reporting only, it's easier because you don't have to worry about change tracking.
From a logistical standpoint, one thing I would consider is making a local database to hold the remote data (update periodically using agents). This would remove some of the issues of calling remote services and aggregating their data on the fly. You would also be able to pre-process some of the data at the start.
When I use the repository pattern, I couple it with the Unit Of Work pattern. The Unit of Work is the guy that does all the legwork for you. Theoretically, your UoW could bring in the data from the multiple services and present it to the repositories based on configuration.
For testing, you can use the InMemoryUnitOfWork to provide all the data in one single place.

I've been working on a data-heavy project myself. What has worked for us is to use the Repository itself to hydrate objects and then serialize them to XML. We pull the XML file into our test project and use that as the starting point for our automated tests. It's nice because it ensures that your mock data looks like real data.
Our tests tend to look like this...
var object1 = XmlUtil.LoadObject1("filename1");
var object2 = XmlUtil.LoadObject2("filename2");
var result = SomeConverter.Convert(object1, object2);
Assert.AreEqual("somevalue", result.Property1);
If you need to do inline lookups, you can add a mock repository that would provide the same level of dependency injection.
The downside of this approach shows up when the data schema changes: a test can become obsolete if the schema it was captured against has changed. If your schema is still in a lot of flux, I would keep your automated tests small until the schema settles down. Focus on unit tests until you know that the schema is relatively stable.
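The `XmlUtil` helper above is not shown in the answer, so the following is only one possible shape for it, sketched with the framework's `XmlSerializer`. The `Employee` type and the `XmlFixture` name are assumptions made up for illustration; strings are used instead of files to keep the sketch self-contained.

```csharp
using System.IO;
using System.Xml.Serialization;

// Hypothetical entity; XmlSerializer needs a public type with a parameterless constructor.
public class Employee
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public static class XmlFixture
{
    // Serialize an object hydrated by the real repository into XML text.
    public static string Save<T>(T value)
    {
        var serializer = new XmlSerializer(typeof(T));
        using (var writer = new StringWriter())
        {
            serializer.Serialize(writer, value);
            return writer.ToString();
        }
    }

    // Rebuild the object inside a test from previously stored XML text.
    public static T Load<T>(string xml)
    {
        var serializer = new XmlSerializer(typeof(T));
        using (var reader = new StringReader(xml))
        {
            return (T)serializer.Deserialize(reader);
        }
    }
}
```

In a real test project the XML text would presumably be written to and read from a fixture file checked in next to the tests.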

You have to decide exactly what you want to test.
One way to do this might be to pretend you're using TDD. Pretend that your GetManagerSalesByRegionReport method does not exist (or actually delete it). You'll have to:
Write a failing unit test. What's the simplest thing for it to test: that you can call the method and that it doesn't throw an exception when there's nothing wrong with the data.
You'll need to create the method, empty. It should return void since your test doesn't need it to return anything.
Your test should now pass.
Add a test to ensure that a List of the appropriate type is returned, even if none of the sub-repositories have data.
You'll have to change the method to return your list type, and you'll have to change it to return null. Your test will still fail, so change it to return an empty List and it will pass.
What's left? Those are INNER joins, so you won't get any data back unless all the repositories contain at least one row. So, test for that: create a test where each repo contains one row and ensure the returned list contains the appropriate number of rows. Then, test for the appropriate properties per returned row. Then test that no data is returned if any of the repos contain no rows.
Then, maybe test what happens if some of the repos contain more than one row.
Then, I don't know what would be left to test.
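As a rough sketch of the "one row per repository" and "empty repository" tests described above: the entity shapes, the join keys (`RegionId`, `ManagerId`) and the `ReportQuery.Build` helper are assumptions for illustration, because the real join conditions are elided in the question. Plain arrays stand in for the repositories, and the checks throw instead of using a test framework so the sketch has no dependencies.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified shapes: the real entities and join keys are not shown in the question.
public class Manager { public int Id; public string Name; public int RegionId; }
public class Region { public int Id; public string Name; }
public class Order { public int ManagerId; public decimal Amount; }
public class ReportRow { public string Manager; public string Region; public decimal Amount; }

public static class ReportQuery
{
    // Inner joins only: a manager without a matching region or order yields no row.
    public static List<ReportRow> Build(
        IEnumerable<Manager> managers, IEnumerable<Region> regions, IEnumerable<Order> orders)
    {
        return (from m in managers
                join r in regions on m.RegionId equals r.Id
                join o in orders on m.Id equals o.ManagerId
                select new ReportRow { Manager = m.Name, Region = r.Name, Amount = o.Amount })
               .ToList();
    }
}

public static class ReportQueryChecks
{
    // One row in every source should produce exactly one report row.
    public static void OneRowPerSource_ReturnsOneRow()
    {
        var rows = ReportQuery.Build(
            new[] { new Manager { Id = 1, Name = "Kim", RegionId = 10 } },
            new[] { new Region { Id = 10, Name = "North" } },
            new[] { new Order { ManagerId = 1, Amount = 25m } });

        Require(rows.Count == 1, "expected one row");
        Require(rows[0].Manager == "Kim" && rows[0].Region == "North", "wrong properties");
    }

    // An empty source should produce an empty (not null) list.
    public static void EmptySource_ReturnsEmptyList()
    {
        var rows = ReportQuery.Build(
            new[] { new Manager { Id = 1, Name = "Kim", RegionId = 10 } },
            new Region[0],
            new[] { new Order { ManagerId = 1, Amount = 25m } });

        Require(rows != null && rows.Count == 0, "expected an empty list");
    }

    private static void Require(bool condition, string message)
    {
        if (!condition) throw new InvalidOperationException(message);
    }
}
```

Each check builds only the rows it needs, which keeps the data for a given scenario next to the assertion that depends on it.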

Related

Using Entity Framework to return a table of data to iterate against

I am currently using EF 6 to do the following. Execute a stored procedure, then bring in the data I need to use. The data is usually 30-40 rows per application run.
I then iterate over the var, object, table (whatever you would like to call it), performing similar (sometimes different) tasks on each row. It works great. I am able to create an Entity object, expose the different complex functions of it, and then create a var to iterate over.
Like:
foreach (var result in StoredProcedureResult)
{
    string strFirstname = result.FirstName;
    string strLastName = result.LastName;
    //more logic goes here using those variables and interacting with another app
}
I recently thought it would be cool to have a class solely for accessing the data. That way, I could just reference that class, toss the corresponding connection string into my app.config, and keep the two sets of logic separate. When I attempt the above in that structure, I get stuck: a method can't return var, and I can't find a return type that matches. The return type I end up with for the stored procedure call is object, which I can't iterate over.
So my question is: how do I get to the above example, except with the result returned from this data access class?
If I am missing something, or it's not possible because I am doing this incorrectly, do let me know. It seemed right in my head.

I'm not going to describe the architecture in full. But based on your comments you can do the following (this is not the definitive nor the only way how to do it):
in your data access project you keep the DBContext class, all the code for the stored procedure call and also the class that defines the result of the SP call, let's call it class A;
in your shared layer project - I would suggest calling it Service layer - you can create a XYService class, that has a method e.g. GetListOfX that connects to the DB and calls the procedure, if needed this method can also perform some logic, but more importantly: it doesn't return class A, but returns a new class B (this one is defined in the service layer, or can be defined in yet another project - that might be the true shared/common project; as it would be just a definition of common structures it isn't really a layer);
in your application layer you work only with the method GetListOfX of the XYService and the class B, that way you don't need a reference to the data access project
In a trivial case the class B has the same properties as the class A. But depending on your needs, the class B can have additional properties or functionality; it can also ignore some properties of A or even combine multiple properties into one, e.g. combining FirstName and LastName into one property called simply Name.
Basically, what you are looking for is a multi-tier application architecture (usually 3-4 tiers). The full extent of such an approach (which includes heavy usage of concepts like interfaces and dependency injection) might not be suitable or needed for your goals. If you are building just a small application for yourself with a couple of functions, or you know there won't be any reuse of the components of the final solution, then this approach is too wasteful and you can work faster with everything in one project. You should still apply principles like SOLID, DRY and separation of concerns.
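The class A / class B split described above could be sketched roughly as follows. All names here (`PersonRow`, `PersonDto`, `IPersonDataAccess`, `InMemoryPersonDataAccess`) are hypothetical, and the in-memory data access class merely stands in for the real DbContext and stored procedure call.

```csharp
using System.Collections.Generic;
using System.Linq;

// "Class A": the stored procedure result shape, defined in the data access project.
public class PersonRow
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
}

// "Class B": the shape the application layer works with.
public class PersonDto
{
    public string Name { get; set; }
}

// Hypothetical seam over the stored procedure call.
public interface IPersonDataAccess
{
    IEnumerable<PersonRow> GetPeopleFromProcedure();
}

// Service layer: the only thing the application references.
public class XYService
{
    private readonly IPersonDataAccess _dataAccess;

    public XYService(IPersonDataAccess dataAccess)
    {
        _dataAccess = dataAccess;
    }

    // Returns class B, never class A, so callers need no reference to the data access project.
    public List<PersonDto> GetListOfX()
    {
        return _dataAccess.GetPeopleFromProcedure()
            .Select(row => new PersonDto { Name = row.FirstName + " " + row.LastName })
            .ToList();
    }
}

// Stand-in for the real EF-backed implementation, used here only to make the sketch runnable.
public class InMemoryPersonDataAccess : IPersonDataAccess
{
    public IEnumerable<PersonRow> GetPeopleFromProcedure()
    {
        return new[] { new PersonRow { FirstName = "Ada", LastName = "Lovelace" } };
    }
}
```

The application can then write `foreach (var person in service.GetListOfX())` and iterate over a concrete, named type instead of `object`.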

Any point unit testing methods that use EF or Linq?

I am writing unit tests for my service layer, and I completely see the point of creating unit tests to validate logic. For example, if I create a function that adds two numbers, sure, I'll write a unit test for it.
But what if my service layer method contains something like this:
public ICollection<MyEntity> GetAll()
{
    return _context.MyEntities
        .Where(e => !e.IsDeleted)
        .ToList();
}
What is the point in unit testing this? Since I am getting this from a database, it seems stupid to mock the database, because I am then just assuming that Linq is working as it should be?
Would it not be better to test this against an actual "test" database with sample data in it? That way I can see whether the number of records retrieved from the database matches what I would expect.
I know that testing against a database makes this more of an integration test, but is it really valid for unit testing?
What if I take another example, say this:
public int Delete(long id)
{
    _context.Database.ExecuteCommand("DELETE FROM myTable WHERE Id = ?", id);
    return _context.SaveChanges();
}
How can this function be unit tested? If I mock _context.Database and create a unit test that checks whether _context.SaveChanges is being called (which I see no point in whatsoever), there is no guarantee that it will actually delete my data. What if I have a foreign key constraint? The mock would pass, but the actual method would fail.
I am starting to think that unless a method actually calculates some sort of logic, I don't see the point of creating a unit test for it, especially when using Entity Framework.

I think it makes sense to unit test nearly all types of functions:
"What is the point in unit testing this? Since I am getting this from a database, it seems stupid to mock the database, because I am then just assuming that Linq is working as it should be?"
You are not testing Linq; you are testing the function. The function is named GetAll, so I would assume that it returns all of the MyEntity instances stored in the database. But it actually returns only the items that are not deleted. Unit testing is not just verifying that the function works properly; it is also a way to check whether the function is named properly.
Also, this function has a problem: what if
_context.MyEntities.Where(e => !e.IsDeleted) returns null? ToList will throw an exception. Unit testing will help to identify potential problems like this if you test for extreme values.
Also, unit testing forces you to employ abstraction. If you can not unit test a method, the method may have problems, you need to investigate that method and re-factor.
_context.Database.ExecuteCommand("DELETE FROM myTable WHERE Id = ?", id); In my opinion this line of code belongs somewhere else, not in the service layer (in a repository, maybe?). What if id is "-1"? How will you handle the exception?
I think it is really hard to state a generic rule about not unit testing methods that include Linq.
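As a sketch of what such a test can pin down, the filter can be exercised against an in-memory list. `MyEntity` is reduced here to two assumed properties, and `EntityQueries.GetAll` is a hypothetical stand-in for the service method that takes its source as a parameter. Note that an in-memory list only checks the behaviour of the filter, not the SQL that EF would generate.

```csharp
using System.Collections.Generic;
using System.Linq;

// Assumed minimal shape of the entity from the question.
public class MyEntity
{
    public int Id { get; set; }
    public bool IsDeleted { get; set; }
}

public static class EntityQueries
{
    // Same filter as the service method, but over any IQueryable so a test can supply a list.
    public static ICollection<MyEntity> GetAll(IQueryable<MyEntity> source)
    {
        return source.Where(e => !e.IsDeleted).ToList();
    }
}
```

A test would pass one deleted and one live entity and assert that only the live one comes back, which documents that "all" really means "all that are not deleted".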

Why would you unit test a function that adds two numbers? You're not testing the + operator any more than you're trying to test LINQ or EF. You are testing behaviour so it's perfectly valid to test things that you might assume "just work". If, for example, I banned the use of EF in your application, you'd still need a test to ensure correctness in whatever you replaced the function with.
Where do you want to draw the line?

For database related code, I would actually test against a real database, not a mock. It's the only way you can be sure that the SQL is valid (whether you write it by hand or let an ORM generate it for you). There are tools to make that easier, like Respawn.

How to write an integration test in NUnit?

We are two students writing our bachelor thesis, and we have developed a Windows application that should be able to aid a restaurant in various communication processes. Fundamentally, it should be able to present information about an order from the moment a guest sends it until it is served.
We did not write tests during development but have decided to write unit tests now. However, we have found that the most suitable tests we can write for our system at this point are integration tests, because all the methods in our classes are bound to SQL stored procedures via LINQ to SQL. We are aware that stubs can be used to fake out a dependency on a database, but since our database is already implemented together with all the functions, we figured it would give us more value to test several methods together as an integration test.
As seen in the code below, we have tried to follow the guidelines for a unit test, but is this the right way to write an integration test?
[Test]
public void SendTotalOrder_SendAllItemsToProducer_OneSentOrder()
{
    //Arrange
    Order order = new Order();
    Guest guest = new Guest(1, order);
    Producer producer = new Producer("Thomas", "Guldborg", "Beverage producer");
    DataGridView dataGridView = new DataGridView { BindingContext = new BindingContext() };
    order.MenuItemId = 1;
    order.Quantity = 1;

    //Act
    guest.AddItem();
    dataGridView.DataSource = guest.SendOrderOverview();
    guest.SendOrder(dataGridView);
    dataGridView.DataSource = producer.OrderOverview();
    var guestTableOrder = producer.OrderOverview()
        .Where(orders => orders.gtid == guest.GuestTableId)
        .Select(producerOrder => producerOrder.gtid)
        .Single();

    //Assert
    Assert.That(guestTableOrder, Is.EqualTo(guest.GuestTableId));
}

Yes, generally speaking, this is how to write a unit test/integration test. You observe some important guidelines:
Distinct Arrange-Act-Assert steps
The test name describes these steps (maybe it should have something like "ShouldSendOneOrder" at the end, "Should" is commonly used to describe the Assert).
One Assert per test.
I assume you also obey other guidelines:
Tests are independent: they don't change persistent state, so they don't influence other tests.
Test realistic use cases: don't arrange constellations that violate business logic, don't do impossible acts. Or: mimic the real application.
However, I also see things that raise eyebrows.
It's not clear which act you test. I think some "acts" belong to the arrange step.
A method like producer.OrderOverview() makes me suspect that domain objects execute database interaction. If so, this would violate persistence ignorance. I think there should be a service that presents this method (but see below).
It's not clear why dataGridView.DataSource = producer.OrderOverview(); is necessary for the test. If it is, this only aggravates the most serious point:
Business logic and UI are entangled!!
Methods like guest.SendOrderOverview() and producer.OrderOverview() are smelly: why should a domain object know how to present its content? That's something a presenter (MVP) or a controller (MVC) or a view model (MVVM) should be responsible for.
A method like guest.SendOrder(dataGridView) is evil. It ties the domain layer to the UI framework. This fixed dependency is evil enough, but of course you also need values from the grid view inside this method. So the business logic needs intimate knowledge of some UI component. This violates the tell - don't ask principle. guest.SendOrder should have simple parameters that tell it how to do its task and the domain shouldn't have any reference to any UI framework.
You really should address the latter point. Make it your goal to run this test without any interaction with DGV.
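One way to read the advice above is sketched below: the order is sent through a service that takes plain values, so no DataGridView is involved. The names `OrderLine`, `IOrderStore`, `OrderService` and `RecordingOrderStore` are all hypothetical, and the store interface is only an assumed seam in front of the stored procedures, not part of the original code.

```csharp
using System.Collections.Generic;
using System.Linq;

// Plain data describing one ordered item; no UI types involved.
public class OrderLine
{
    public int MenuItemId { get; set; }
    public int Quantity { get; set; }
}

// Hypothetical seam in front of the stored procedures.
public interface IOrderStore
{
    void SaveOrder(int guestTableId, IReadOnlyList<OrderLine> lines);
}

public class OrderService
{
    private readonly IOrderStore _store;

    public OrderService(IOrderStore store)
    {
        _store = store;
    }

    // Simple parameters tell the method what to do; the UI layer reads its own grid
    // and passes the resulting lines in.
    public void SendOrder(int guestTableId, IEnumerable<OrderLine> lines)
    {
        _store.SaveOrder(guestTableId, lines.ToList());
    }
}

// In-memory store that records the last call, for use in tests.
public class RecordingOrderStore : IOrderStore
{
    public int LastGuestTableId;
    public IReadOnlyList<OrderLine> LastLines;

    public void SaveOrder(int guestTableId, IReadOnlyList<OrderLine> lines)
    {
        LastGuestTableId = guestTableId;
        LastLines = lines;
    }
}
```

With a shape like this, the integration test can use the real store while a unit test uses the recording one, and neither needs a reference to Windows Forms.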

If you continue to bind SQL inside your classes, your test is not a big problem.
You can use this approach while the program logic is very simple, but I suggest you study the Repository pattern as the logic becomes more complex.

Is it ok to use C# Property like this

One of my fellow developers has code similar to the following snippet:
class Data
{
    public string Prop1
    {
        get
        {
            // return the value stored in the database via a query
        }
        set
        {
            // Save the data to local variable
        }
    }

    public void SaveData()
    {
        // Write all the properties to a file
    }
}

class Program
{
    public void SaveData()
    {
        Data d = new Data();
        // Fetch the information from database and fill the local variable
        d.Prop1 = d.Prop1;
        d.SaveData();
    }
}
Here the Data class properties fetch the information from the DB dynamically. When there is a need to save the Data to a file, the developer creates an instance and fills each property using self-assignment, then finally calls a save. I tried arguing that this usage of properties is not correct, but he is not convinced.
These are his points:
There are nearly 20 such properties.
Fetching all the information is not required except for saving.
Writing a utility method to fetch everything, instead of using self-assignment, would duplicate the code that is already in the properties.
Is this usage correct?

I don't think that another developer who will work with the same code will be happy to see :
d.Prop1 = d.Prop1;
Personally I would never do that.
Also it is not the best idea to use property to load data from DB.
I would have method which will load data from DB to local variable and then you can get that data using property. Also get/set logically must work with the same data. It is strange to use get for getting data from DB but to use set to work with local variable.

Properties should really be as lightweight as possible.
When other developers are using properties, they expect them to be intrinsic parts of the object (that is, already loaded and in memory).
The real issue here is that of symmetry - the property get and set should mirror each other, and they don't. This is against what most developers would normally expect.
Having the property load up from database is not recommended - normally one would populate the class via a specific method.

This is pretty terrible, imo.
Properties are supposed to be quick / easy to access; if there's really heavy stuff going on behind a property it should probably be a method instead.
Having two utterly different things going on behind the same property's getter and setter is very confusing. d.Prop1 = d.Prop1 looks like a meaningless self-assignment, not a "Load data from DB" call.
Even if you do have to load twenty different things from a database, doing it this way forces it to be twenty different DB trips; are you sure multiple properties can't be fetched in a single call? That would likely be much better, performance-wise.

"Correct" is often in the eye of the beholder. It also depends how far or how brilliant you want your design to be. I'd never go for the design you describe, it'll become a maintenance nightmare to have the CRUD actions on the POCOs.
Your main issue is the absence of separation of concerns, i.e., the data object is also responsible for storing and retrieving (actions that need to be defined only once in the whole system). As a result, you end up with duplicated, bloated and unmaintainable code that may quickly become really slow (try a LINQ query with a join on the getter).
A common scenario with databases is to use small entity classes that contain only the properties, nothing more. A DAO layer takes care of retrieving and filling these POCOs with data from the database and defines the CRUD actions only once (through some generics). I'd suggest NHibernate for the ORM mapping. The basic principle explained here works with other ORM mappers too.
Your co-worker's reasons, especially the first, make this a prime candidate for refactoring into something more maintainable. Duplicated code and logic, when encountered, should be reconsidered strongly. If the getter above is really fetching the data from the database (I hope I misunderstand that), get rid of it as quickly as you can.
Overly simplified example of separations of concerns:
class Data
{
    public string Prop1 { get; set; }
    public string Prop2 { get; set; }
}

class Dao<T>
{
    public void SaveEntity(T data)
    {
        // use reflection for saving your properties (this is what any ORM does for you)
    }

    public IList<T> GetAll()
    {
        // use reflection to retrieve all data of this type (again, ORM does this for you)
    }
}

// usage:
Dao<Data> myDao = new Dao<Data>();
IList<Data> allData = myDao.GetAll();
// modify, query etc using Dao, lazy evaluation and caching is done by the ORM for performance
// but more importantly, this design keeps your code clean, readable and maintainable.
EDIT:
One question you should ask your co-worker: what happens if you have many Data (rows in database), or when a property is a result of a joined query (foreign key table). Have a look at Fluent NHibernate if you want a smooth transition from one situation (unmaintainable) to another (maintainable) that's easy enough to understand by anybody.

If I were you I would write a serialize / deserialize function, then provide properties as lightweight wrappers around the in-memory results.
Take a look at the ISerializable interface: http://msdn.microsoft.com/en-us/library/system.runtime.serialization.iserializable.aspx

This would be very hard to work with.
If you set Prop1 and then get Prop1, you could end up with different results, e.g.:
//set Prop1 to "abc"
d.Prop1 = "abc";
//if the data source holds "xyz" for Prop1
string myString = d.Prop1;
//myString will equal "xyz"
Reading the code without the comments, you would expect myString to equal "abc", not "xyz". This could be confusing.
It would make working with the properties very difficult and would require a save every time you change a property for it to work.

As well as agreeing with what everyone else has said on this example, what happens if there are other fields in the Data class? i.e. Prop2, Prop3 etc, do they all go back to the database, each time they are accessed in order to "return the value stored in the database via a query". 10 properties would equal 10 database hits. Setting 10 properties, 10 writes to the database. That's not going to scale.

In my opinion, that's an awful design. Using a property getter to do some "magic" stuff makes the system awkward to maintain. If I joined your team, how would I know about the magic behind those properties?
Create a separate method whose name says what it does.

How do you add sample (dummy) data to your unit tests?

In bigger projects my unit tests usually require some "dummy" (sample) data to run with: some default customers, users, etc. I was wondering what your setup looks like.
How do you organize/maintain this data?
How do you apply it to your unit tests (any automation tool)?
Do you actually require test data or do you think it's useless?
My current solution:
I differentiate between Master data and Sample data where the former will be available when the system goes into production (installed for the first time) and the latter are typical use cases I require for my tests to run (and to play during development).
I store all this in an Excel file (because it's so damn easy to maintain) where each worksheet contains a specific entity (e.g. users, customers, etc.) and is flagged either master or sample.
I have 2 test cases which I (miss)use to import the necessary data:
InitForDevelopment (Create Schema, Import Master data, Import Sample data)
InitForProduction (Create Schema, Import Master data)

I use the repository pattern and have a dummy repository that's instantiated by the unit tests in question. It provides a known set of data that encompasses examples that are both within and out of range for various fields.
This means that I can test my code unchanged by supplying the instantiated repository from the test unit for testing or the production repository at runtime (via a dependency injection (Castle)).
I don't know of a good web reference for this but I learnt much from Steven Sanderson's Professional ASP.NET MVC 1.0 book published by Apress. The MVC approach naturally provides the separation of concern that's necessary to allow your testing to operate with fewer dependencies.
The basic elements are that your repository implements an interface for data access, and that same interface is then implemented by a fake repository that you construct in your test project.
In my current project I have an interface thus:
using System.Linq;

namespace myProject.Abstract
{
    public interface ISeriesRepository
    {
        IQueryable<Series> Series { get; }
    }
}
This is implemented as both my live data repository (using Linq to SQL) and also a fake repository thus:
using System;
using System.Collections.Generic;
using System.Linq;
using myProject.Abstract;

namespace myProject.Tests.Repository
{
    class FakeRepository : ISeriesRepository
    {
        // Known, fixed data set shared by all tests
        private static IQueryable<Series> fakeSeries = new List<Series> {
            new Series { id = 1, name = "Series1", openingDate = new DateTime(2001,1,1) },
            new Series { id = 2, name = "Series2", openingDate = new DateTime(2002,1,30) },
            ...
            new Series { id = 10, name = "Series10", openingDate = new DateTime(2001,5,5) }
        }.AsQueryable();

        public IQueryable<Series> Series
        {
            get { return fakeSeries; }
        }
    }
}
Then the class that's consuming the data is instantiated passing the repository reference to the constructor:
using System;
using System.Linq;
using myProject.Abstract;

namespace myProject
{
    public class SeriesProcessor
    {
        private ISeriesRepository seriesRepository;

        // Constructor: the repository (live or fake) is injected here
        public SeriesProcessor(ISeriesRepository seriesRepository)
        {
            this.seriesRepository = seriesRepository;
        }

        public IQueryable<Series> GetCurrentSeries()
        {
            return from s in seriesRepository.Series
                   where s.openingDate.Date <= DateTime.Now.Date
                   select s;
        }
    }
}
Then in my tests I can approach it thus:
using System.Linq;
using Microsoft.VisualStudio.TestTools.UnitTesting;
using myProject.Tests.Repository;

namespace myProject.Tests
{
    [TestClass]
    public class SeriesTests
    {
        [TestMethod]
        public void Meaningful_Test_Name()
        {
            // Arrange
            SeriesProcessor processor = new SeriesProcessor(new FakeRepository());

            // Act
            IQueryable<Series> currentSeries = processor.GetCurrentSeries();

            // Assert (expected value first, then actual)
            Assert.AreEqual(10, currentSeries.Count());
        }
    }
}
Then look at CastleWindsor for the inversion of control approach for your live project to allow your production code to automatically instantiate your live repository through dependency injection. That should get you closer to where you need to be.

In our company we have been discussing exactly this problem for weeks and months.
To follow the guidelines of unit testing:
Each test must be atomic and must not depend on any other test (no data sharing). That means each test must set up its own data at the beginning and clear that data at the end.
Our product is so complex (5 years of development, over 100 tables in a database) that it is nearly impossible to maintain this in an acceptable way.
We tried database scripts that create and delete the data before and after the test (there are automatic methods which call them).
I would say you are on a good path with Excel files.
My ideas for improving it a little:
If you have a database behind your software, google for "NDbUnit". It's a framework to insert and delete data in databases for unit tests.
If you have no database, XML may be a little more flexible than Excel.

Not directly answering the question but one way to limit the amount of tests that need to use dummy data is to use a mocking framework to create mocked objects that you can use to fake the behavior of any dependencies you have in a class.
I find that by using mocked objects rather than a specific concrete implementation you can drastically reduce the amount of real data you need, as mocks don't process the data you pass into them. They just perform exactly as you want them to.
I'm still sure you probably need dummy data in a lot of instances so apologies if you're already using or are aware of mocking frameworks.
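To illustrate the idea without tying it to a particular mocking framework, here is a minimal sketch that uses a hand-written stub in place of a framework-generated mock. `ICustomerRepository`, `DiscountCalculator`, `StubCustomerRepository` and the discount rule itself are all made up for the example.

```csharp
// Dependency of the class under test (hypothetical).
public interface ICustomerRepository
{
    decimal GetLifetimeSpend(int customerId);
}

public class DiscountCalculator
{
    private readonly ICustomerRepository _customers;

    public DiscountCalculator(ICustomerRepository customers)
    {
        _customers = customers;
    }

    // Made-up rule for illustration: 10% discount once lifetime spend reaches 1000.
    public decimal DiscountFor(int customerId)
    {
        return _customers.GetLifetimeSpend(customerId) >= 1000m ? 0.10m : 0m;
    }
}

// Hand-written stub: returns a canned value, so no customer or order rows are needed.
public class StubCustomerRepository : ICustomerRepository
{
    private readonly decimal _spend;

    public StubCustomerRepository(decimal spend)
    {
        _spend = spend;
    }

    public decimal GetLifetimeSpend(int customerId)
    {
        return _spend;
    }
}
```

The test only states the one number that matters for the rule under test; a mocking framework would generate the equivalent of `StubCustomerRepository` for you.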

Just to be clear, you need to differentiate between UNIT testing (testing a module with no implied dependencies on other modules) and app testing (testing parts of the application).
For the former, you need a mocking framework (I'm only familiar with Perl ones, but i'm sure they exist in Java/C#). A sign of a good framework would be ability to take a running app, RECORD all the method calls/returns, and then mock the selected methods (e.g. the ones you are not testing in this specific unit test) using recorded data.
For good unit tests you MUST mock every external dependency - e.g., no calls to filesystem, no calls to DB or other data access layers unless that is what you are testing, etc...
For the latter, the same mocking framework is useful, plus ability to create test data sets (that can be reset for each test). The data to be loaded for the tests can reside in any offline storage that you can load from - BCP files for Sybase DB data, XML, whatever tickles your fancy. We use both BCP and XML.
Please note that this sort of "load test data into DB" testing is SIGNIFICANTLY easier if your overall company framework allows - or rather enforces - a "What is the real DB table name for this table alias" API. That way, you can cause your application to look at cloned "test" DB tables instead of real ones during testing - on top of such table aliasing API's main purpose of enabling one to move DB tables from one database to another.
