Serialization to file in the data layer - C#

I am trying to update a legacy application and need some advice about how to organize the data layer.
Today, all the data is stored in a binary file created using binary serialization. The stored data is a tree structure several levels deep.
The object hierarchy of the saved data:
ApplicationSettings
CommunicationSettings
ConfigurationSettings
HardwareSettings
and so on for some additional levels.
All these classes have a lot of logic to do different things. They also hold status information that should not be saved to the file.
The data is constantly updated while the program runs, and is saved to the binary file by the "business logic" whenever it is updated.
I am trying to update the program, but writing unit tests for it is a nightmare.
I still want the data to be saved to a file in some form, but otherwise I'm open to suggestions on how to improve this.
Edit:
The program is quite small, and I do not want to depend on large, complex frameworks.
The reason I need help is that I am trying to clean up code in which virtually the entire application logic sits in one huge method.

What I would do:
First, turn the settings into contracts:
public interface IApplicationSettings
{
    ICommunicationSettings CommunicationSettings { get; }
    IConfigurationSettings ConfigurationSettings { get; }
}
Now, break up your logic into separate concerns and pass in your settings at the highest level possible, such that if MyLogicForSomething only concerns itself with the communication settings, then only pass in the communication settings:
public class MyLogicForSomething
{
    private readonly ICommunicationSettings commSettings;

    public MyLogicForSomething(ICommunicationSettings commSettings)
    {
        this.commSettings = commSettings;
    }

    public void PerformSomething() { /* ... */ }
}
ICommunicationSettings is easily mockable here with something like Rhino Mocks.
You can now easily test that something in your settings is called/set when you run your logic:
var commSettings = MockRepository.GenerateStub<ICommunicationSettings>();
var logic = new MyLogicForSomething(commSettings);

logic.PerformSomething();

commSettings.AssertWasCalled(x => x.SaveSetting(null), o => o.IgnoreArguments());
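To keep everything in a single file while making the save-on-update logic testable, the persistence itself can hide behind a small interface as well. A sketch, assuming you stay with BinaryFormatter for the legacy format (the store names are made up):

using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

public interface ISettingsStore
{
    void Save(IApplicationSettings settings);
    IApplicationSettings Load();
}

public class BinaryFileSettingsStore : ISettingsStore
{
    private readonly string path;

    public BinaryFileSettingsStore(string path) { this.path = path; }

    public void Save(IApplicationSettings settings)
    {
        using (var stream = File.Create(path))
        {
            new BinaryFormatter().Serialize(stream, settings);
        }
    }

    public IApplicationSettings Load()
    {
        using (var stream = File.OpenRead(path))
        {
            return (IApplicationSettings)new BinaryFormatter().Deserialize(stream);
        }
    }
}

The business logic then takes an ISettingsStore and can be tested with a stub instead of touching the real file.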

Related

Who should be responsible for inflating a Data Object?

In my current project, I have quite a few objects that I need to persist to XML and inflate at runtime. I've been managing this through .NET's DataContracts. What I am doing right now is creating a separate class that represents the objects I'm serializing, and reading/writing those to/from disk, to avoid having too much responsibility in a single class. Here's an example:
public class Range
{
    private float _min;
    private float _max;

    public float Min { get { return this._min; } }
    public float Max { get { return this._max; } }

    // Constructors & Methods...

    public SerializedRange GetAsSerializable()
    {
        return new SerializedRange { Min = _min, Max = _max };
    }
}
The Range class has the complementary class:
[DataContract]
public class SerializedRange
{
    [DataMember]
    public float Min;

    [DataMember]
    public float Max;

    // Constructor...
}
My question then is, who should be responsible for actually taking the Serialized version of the object and inflating it into the actual object? I see three options, but I'm not sure which (if any of them) would be the best:
Give the Serialized version of the object an instance method that spits out an inflated instance, using the available constructors/factories of the sister class.
Give the sister class a factory that takes an instance of the serialized version to construct itself.
Don't have either the class or its Serializable counterpart do anything - have the code that reads in the Serialized objects manually create the regular objects using whatever constructors/factories they'd regularly have.
I realize that in certain situations you'd have to do it one way or the other because of constraints outside of this somewhat contrived example. Since there's more than one way to do it, though, what I'm really looking for is a general rule of thumb that yields neat, readable, and expandable code.
If you 'break' your application into constituent parts, what logical components would you get? Here are a few, based on my understanding:
Domain Objects (Data that you are storing)
Data Layer - responsible for persisting the data (and retrieving it)
and many others (but I've just taken a subset as per your description)
Now, the job of the data layer is to write the content out to some storage - XML files to disk in your case.
Now, when you 'query' the file, who fetches it? The data layer. Who 'should' populate the corresponding domain object? Well, the data layer itself.
Should the data layer 'delegate' the responsibility of population to a separate class/factory? It depends on whether it's ever going to be reused by someone else. If not, concepts like inner classes can be of good help (they exist in the Java world; nested classes are the C# equivalent). That way you'll have it modularized into a specific class that is not publicly visible to other classes, unless you want it that way.
Should you go with a factory? Yes, you may. But make sure it's logically correct to do so. You could end up with many object inflators - so isolate the inflation functionality to one class, and the factory itself could be a part of the data layer (if you want it that way).
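For illustration, option 2 from the question could be sketched as a static factory on the domain class, assuming Range gains a two-argument constructor:

public class Range
{
    private readonly float _min;
    private readonly float _max;

    public Range(float min, float max)
    {
        _min = min;
        _max = max;
    }

    public float Min { get { return _min; } }
    public float Max { get { return _max; } }

    // Option 2: the domain class inflates itself from its serialized form.
    public static Range FromSerialized(SerializedRange s)
    {
        return new Range(s.Min, s.Max);
    }

    public SerializedRange GetAsSerializable()
    {
        return new SerializedRange { Min = _min, Max = _max };
    }
}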
Once you delineate the concerns you'll be in a better position to decide where to put that piece of code. I've provided some pointers that should come in handy.
Hope it helps...

How to Test Functions w/ Complex Data Interactions

Currently, I am working on a system that does quite a bit of reporting-style work, consuming many different data points and transforming them into larger, sometimes flattened outputs. Most of my app is built upon a variation of the repository pattern. Because of this, I have a suite of mock repositories that I use for testing scenarios. The problem I am running into is that the interaction between these data points is so complex that it is quickly becoming a maintenance nightmare to maintain the "mock data". Here is a mock example:
public class SomeReportingEntity
{
    private IProductRepo ProductRepo;
    private IManagerRepo ManagerRepo;
    private ILocationRepo LocationRepo;
    private IOrdersService OrdersService;
    private IEmployeeRepo EmployeeRepo;

    public SomeReportingEntity(IProductRepo ipr, IManagerRepo imr, ILocationRepo ilr,
                               IOrdersService ios, IEmployeeRepo ier)
    {
        // Load these into the private fields...
    }

    // This is the function that I want to test...
    public List<SomeReportingEntity> GetManagerSalesByRegionReport()
    {
        // Make a complex join on all sub-collections. These
        // sub-collections are all under test individually.
        var managerSalesByRegionItems = from x in ProductRepo.CurrentProducts()
                                        join y in OrdersService.FutureOrders() on ...
                                        join z in EmployeeRepo.ActiveEmployees() on ...
                                        join a in LocationRepo.GetAllRegions() on ...
                                        join b in ManagerRepo.GetActiveManagers() on ...
                                        select new SomeReportingEntity { ... };
        return managerSalesByRegionItems.ToList();
    }
}
Admittedly, this is a very contrived example, but the basic idea I want to emphasize is that I have several repositories that I am joining, and I need to create many tests to ensure that this complex query does as expected. Because the joining operations are so complex, the mock data is VERY difficult to keep in line - especially as I add more associations and test additional points. In addition, I need to be able to put specific record states into the mocks (such as an employee lacking an assigned manager) to verify that the query handles those situations appropriately.
So here are my questions:
What is the best way to "mock" this data so that it is not such a maintenance nightmare? I have had many people suggest building an in-memory database to support this.
Am I really suffering from an architecture issue here? In reporting scenarios, I find myself in this pattern quite a bit: I take many disassociated data points and merge them into a new, hybrid entity. With LINQ it is very easy to do and the intent stays clear, but sometimes it feels like I am cheating a little.
The first thing you want to do is make a centralized object that knows how to retrieve the data from the different repositories. Since this is reporting only, it's easier because you don't have to worry about change tracking.
From a logistical standpoint, one thing I would consider is making a local database to hold the remote data (updated periodically using agents). This would remove some of the issues of calling remote services and aggregating their data on the fly. You would also be able to pre-process some of the data at the start.
When I use the repository pattern, I couple it with the Unit Of Work pattern. The Unit of Work is the guy that does all the legwork for you. Theoretically, your UoW could bring in the data from the multiple services and present it to the repositories based on configuration.
For testing, you can use the InMemoryUnitOfWork to provide all the data in one single place.
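A minimal sketch of that shape - the members are assumptions based on the repositories in the question, not an established API:

public interface IUnitOfWork
{
    IProductRepo Products { get; }
    IOrdersService Orders { get; }
    IEmployeeRepo Employees { get; }
    // ... the remaining repositories
}

// Test double: serves every repository from one in-memory place.
public class InMemoryUnitOfWork : IUnitOfWork
{
    public IProductRepo Products { get; set; }
    public IOrdersService Orders { get; set; }
    public IEmployeeRepo Employees { get; set; }
}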
I've been working on a data-heavy project myself. What has worked for us is to use the repository itself to hydrate objects and then serialize them to XML. We pull the XML file into our test project and use that as the starting point for our automated tests. It's nice because it ensures that your mock data looks like real data.
Our tests tend to look like this...
var object1 = XmlUtil.LoadObject1("filename1");
var object2 = XmlUtil.LoadObject2("filename2");
var result = SomeConverter.Convert(object1, object2);
Assert.AreEqual("somevalue", result.Property1);
If you need to do inline lookups, you can add a mock repository that would provide the same level of dependency injection.
The downside of this approach is schema changes: a test can become obsolete when the data schema changes underneath it. If your schema is still in a lot of flux, I would keep your automated tests small until the schema settles down. Focus on unit tests until you know that the schema is relatively stable.
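A sketch of what such an XmlUtil loader could look like, assuming the snapshots are DataContract-serialized (the generic helper is made up; ours uses one method per type):

using System.IO;
using System.Runtime.Serialization;

public static class XmlUtil
{
    // Deserializes a repository-produced XML snapshot from the test project.
    public static T Load<T>(string filename)
    {
        var serializer = new DataContractSerializer(typeof(T));
        using (var stream = File.OpenRead(filename))
        {
            return (T)serializer.ReadObject(stream);
        }
    }
}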
You have to decide exactly what you want to test.
One way to do this might be to pretend you're doing TDD. Pretend that your GetManagerSalesByRegionReport method does not exist (or actually delete it). You'll have to:
Write a failing unit test. What's the simplest thing for it to test? That you can call the method and that it doesn't throw an exception when there's nothing wrong with the data.
You'll need to create the method, empty. It should return void since your test doesn't need it to return anything.
Your test should now pass.
Add a test to ensure that a List of the appropriate type is returned, even if none of the sub-repositories have data.
You'll have to change the method to return your list type; returning null will still fail the test, so change it to return an empty List and it will pass.
What's left? Those are INNER joins, so you won't get any data back unless all the repositories contain at least one row. So, test for that: create a test where each repo contains one row and ensure the returned list contains the appropriate number of rows. Then, test for the appropriate properties per returned row. Then test that no data is returned if any of the repos contain no rows.
Then, maybe test what happens if some of the repos contain more than one row.
Then, I don't know what would be left to test.
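For illustration, the very first test in this sequence might look like the sketch below; the Fake* types are hypothetical hand-rolled stubs of the repository interfaces:

[Test]
public void GetManagerSalesByRegionReport_Runs_Without_Throwing()
{
    var entity = new SomeReportingEntity(
        new FakeProductRepo(), new FakeManagerRepo(), new FakeLocationRepo(),
        new FakeOrdersService(), new FakeEmployeeRepo());

    entity.GetManagerSalesByRegionReport();
}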

Learning TDD with a simple example

I'm attempting to learn TDD but having a hard time getting my head around what and how to test with a little app I need to write.
The (simplified somewhat) spec for the app is as follows:
It needs to take from the user the location of a CSV file, the location of a Word mail-merge template, and an output location.
The app will then read the CSV file and, for each row, merge the data with the Word template and write the output to the folder specified.
Just to be clear, I'm not asking how I would go about coding such an app, as I'm confident I know how to do it if I just went ahead and started. But if I wanted to do it using TDD, some guidance on the tests to write would be appreciated, as I'm guessing I don't want to be testing reading a real CSV file, or testing the third-party component that does the merge or converts to PDF.
I think just some general TDD guidance would be a great help!
I'd start out by thinking of scenarios for each step of your program, starting with failure cases and their expected behavior:
User provides a null CSV file location (throws an ArgumentNullException).
User provides an empty CSV file location (throws an ArgumentException).
The CSV file specified by the user doesn't exist (whatever you think is appropriate).
Next, write a test for each of those scenarios and make sure it fails. Next, write just enough code to make the test pass. That's pretty easy for some of these conditions, because the code that makes your test pass is often the final code:
public class Merger {
    public void Merge(string csvPath, string templatePath, string outputPath) {
        if (csvPath == null) { throw new ArgumentNullException("csvPath"); }
    }
}
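For instance, the failing test for the first scenario might look like this (NUnit style, matching the test further down; the test name is made up):

[Test]
[ExpectedException(typeof(ArgumentNullException))]
public void Fails_When_Csv_Path_Is_Null() {
    new Merger().Merge(null, "templatePath", "outputPath");
}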
After that, move into standard scenarios:
The specified csv file has one line (merge should be called once, output written to the expected location).
The specified csv file has two lines (merge should be called twice, output written to the expected location).
The output file's name conforms to your expectations (whatever those are).
And so on. Once you get to this second phase, you'll start to identify behavior you want to stub and mock. For example, checking whether a file exists or not - .NET doesn't make it easy to stub this, so you'll probably need to create an adapter interface and class that will let you isolate your program from the actual file system (to say nothing of actual CSV files and mail-merge templates). There are other techniques available, but this method is fairly standard:
public interface IFileFinder { bool FileExists(string path); }

// Concrete implementation to use in production
public class FileFinder : IFileFinder {
    public bool FileExists(string path) { return File.Exists(path); }
}

public class Merger {
    private readonly IFileFinder finder;

    public Merger(IFileFinder finder) { this.finder = finder; }
}
In tests, you'll pass in a stub implementation:
[Test]
[ExpectedException(typeof(FileNotFoundException))]
public void Fails_When_Csv_File_Does_Not_Exist() {
    // 'mockery' is an NMock2 Mockery instance created by the test fixture.
    IFileFinder finder = mockery.NewMock<IFileFinder>();
    Merger merger = new Merger(finder);
    Stub.On(finder).Method("FileExists").Will(Return.Value(false));

    merger.Merge("csvPath", "templatePath", "outputPath");
}
Simple general guidance:
You write the unit tests first. At the beginning they all fail.
Then you go into the class under test and write code until the tests related to each method pass.
Do this for every public method of your types.
By writing unit tests you actually specify the requirements, just in another form: easy-to-read code.
Looking at it from another angle: when you receive a new black-boxed class and unit tests for it, you should read the unit tests to see what the class does and how it behaves.
To read more about unit testing I recommend a very good book: The Art of Unit Testing.
To be able to unit test, you need to decouple the class from any dependencies so you can effectively test just the class itself.
To do this you'll need to inject any dependencies into the class. You would typically do this by passing an object that implements the dependency interface into your class through the constructor.
Mocking frameworks are used to create a mock instance of your dependency that your class can call during the test. You define the mock to behave the same way your dependency would, and then verify its state at the end of the test.
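For example, with Rhino Mocks' AAA syntax - the interface here is invented purely for illustration:

using NUnit.Framework;
using Rhino.Mocks;

public interface IPriceService { decimal GetPrice(string sku); }

[TestFixture]
public class MockingExampleTests
{
    [Test]
    public void Stubbed_Dependency_Returns_Canned_Value()
    {
        // Arrange: a stub with canned behaviour instead of a real dependency.
        var prices = MockRepository.GenerateStub<IPriceService>();
        prices.Stub(p => p.GetPrice("ABC")).Return(9.99m);

        // Act / Assert: the stub performs exactly as configured.
        Assert.AreEqual(9.99m, prices.GetPrice("ABC"));
        prices.AssertWasCalled(p => p.GetPrice("ABC"));
    }
}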
I would recommend having a play with Rhino Mocks and going through the examples in the documentation to get a feel for how this works.
http://ayende.com/projects/rhino-mocks.aspx

Importing data from a third-party datasource (open architecture design)

How would you design an application (classes, interfaces in a class library) in .NET when we have a fixed database design on our side and need to support imports of data from third-party data sources, which will most likely be in XML?
For instance, let us say we have a Products table in our DB which has columns
Id
Title
Description
TaxLevel
Price
and on the other side we have, for instance, Products:
ProductId
ProdTitle
Text
BasicPrice
Quantity.
Currently I do it like this:
Convert the third-party XML to classes via XSDs, and then deserialize its contents into strongly typed objects (what we get as a result of this process is classes like ThirdPartyProduct, ThirdPartyClassification, etc.).
Then I have methods like this:
InsertProduct(ThirdPartyProduct newproduct)
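(For reference, the deserialization step mentioned above might look roughly like this sketch using XmlSerializer; the helper class and names are illustrative:)

using System.IO;
using System.Xml.Serialization;

public static class ThirdPartyImport
{
    // Deserializes a feed into the XSD-generated ThirdPartyProduct type.
    public static ThirdPartyProduct LoadProduct(string path)
    {
        var serializer = new XmlSerializer(typeof(ThirdPartyProduct));
        using (var stream = File.OpenRead(path))
        {
            return (ThirdPartyProduct)serializer.Deserialize(stream);
        }
    }
}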
I do not use interfaces at the moment, but I would like to. What I would like is to implement something like this:
public class Contoso_ProductSynchronization : ProductSynchronization
{
    public void InsertProduct(ContosoProduct p)
    {
        Product product = new Product(); // this is our Entity class
        // do the assignments from p to product here
        using (SyncEntities db = new SyncEntities())
        {
            // ....
            db.AddToProducts(product);
        }
    }

    // The problem is that Product and ContosoProduct have no architectural
    // connection right now, so I cannot do this:
    public void InsertProduct(ContosoProduct p)
    {
        Product product = (Product)p;
        using (SyncEntities db = new SyncEntities())
        {
            // ....
            db.AddToProducts(product);
        }
    }
}
where ProductSynchronization will be an interface or abstract class. There will most likely be many implementations of ProductSynchronization. I cannot hardcode the types - classes like ContosoProduct or NorthwindProduct might be created from the third-party XMLs (so preferably I would continue to use deserialization).
Hopefully someone will understand what I'm trying to explain here. Just imagine you are the seller: you have numerous providers, and each one uses its own proprietary XML format. I don't mind the development that will of course be needed every time a new format appears, because it will only require 10-20 methods to be implemented; I just want the architecture to be open and to support that.
In your replies, please focus on design and not so much on data access technologies, because most are pretty straightforward to use (if you need to know, EF will be used for interacting with our database).
[EDIT: Design note]
OK, from a design perspective I would apply XSLT to the incoming XML to transform it to a unified format. It is also very easy to validate the resulting XML against a schema.
Using XSLT, I would stay away from any interface or abstract class and just have one class implementation in my code, the internal class. It keeps the code base clean, and the XSLTs themselves should be pretty short if the data is as simple as you state.
Documenting the transformations can easily be done wherever you keep your project documentation.
If you decide you absolutely want one class per XML format (or if you perhaps get a .NET DLL instead of XML from one customer), then I would make the proxy class inherit an interface or abstract class (based on your internal class) and implement the mappings per property as needed in the proxy classes. This way you can cast any such class to your base/internal class.
But it seems to me that doing the conversion/mapping in code will make the design a bit messier.
[Original Answer]
If I understand you correctly, you want to map a ThirdPartyProduct class over to your own internal class.
Initially I am thinking class mapping. Use something like AutoMapper and configure the mappings as you create your XML-deserializing proxies. If your deserialization ends up with the same property names as your internal class, there's less configuration to do for the mapper. Convention over configuration.
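A sketch of that configuration with AutoMapper's classic static API, using the column names from the question:

using AutoMapper;

public static class MappingConfig
{
    // Configure once at application startup:
    public static void Configure()
    {
        Mapper.CreateMap<ThirdPartyProduct, Product>()
            .ForMember(d => d.Title, o => o.MapFrom(s => s.ProdTitle))
            .ForMember(d => d.Description, o => o.MapFrom(s => s.Text))
            .ForMember(d => d.Price, o => o.MapFrom(s => s.BasicPrice));
    }
}

// Then, per imported record:
// Product product = Mapper.Map<ThirdPartyProduct, Product>(externalProduct);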
I'd like to hear anyone's thoughts on going this route.
Another approach would be to add a .ToInternalProduct(ThirdPartyClass) method in a Converter class, and keep adding more as you add more external classes.
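That converter might be sketched like this, again using the field names from the question:

public static class ProductConverter
{
    // One method per external format; add overloads as new feeds appear.
    public static Product ToInternalProduct(ThirdPartyProduct p)
    {
        return new Product
        {
            Title = p.ProdTitle,
            Description = p.Text,
            Price = p.BasicPrice
            // TaxLevel and Quantity have no direct counterpart; map as needed.
        };
    }
}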
The third approach is for the XSLT guys. If you love XSLT, you could transform the XML into something that can be deserialized into your internal product class.
Which of these three I'd choose would depend on the skills of the programmer and on who will maintain the additions of new external classes. The XSLT approach requires no recompiling of code as new formats arrive. That might be an advantage.

How do you add sample (dummy) data to your unit tests?

In bigger projects my unit tests usually require some "dummy" (sample) data to run with: some default customers, users, etc. I was wondering what your setup looks like.
How do you organize/maintain this data?
How do you apply it to your unit tests (any automation tool)?
Do you actually require test data or do you think it's useless?
My current solution:
I differentiate between master data and sample data, where the former will be available when the system goes into production (installed for the first time) and the latter are the typical use cases I require for my tests to run (and to play with during development).
I store all this in an Excel file (because it's so damn easy to maintain) where each worksheet contains a specific entity (e.g. users, customers, etc.) and is flagged either master or sample.
I have 2 test cases which I (mis)use to import the necessary data:
InitForDevelopment (Create Schema, Import Master data, Import Sample data)
InitForProduction (Create Schema, Import Master data)
I use the repository pattern and have a dummy repository that's instantiated by the unit tests in question; it provides a known set of data that encompasses examples that are both within and out of range for various fields.
This means that I can test my code unchanged by supplying the instantiated repository from the test unit for testing, or the production repository at runtime (via dependency injection (Castle)).
I don't know of a good web reference for this but I learnt much from Steven Sanderson's Professional ASP.NET MVC 1.0 book published by Apress. The MVC approach naturally provides the separation of concern that's necessary to allow your testing to operate with fewer dependencies.
The basic elements are that your repository implements an interface for data access, and that same interface is then implemented by a fake repository that you construct in your test project.
In my current project I have an interface thus:
namespace myProject.Abstract
{
    public interface ISeriesRepository
    {
        IQueryable<Series> Series { get; }
    }
}
This is implemented as both my live data repository (using LINQ to SQL) and also as a fake repository, thus:
namespace myProject.Tests.Repository
{
    class FakeRepository : ISeriesRepository
    {
        private static IQueryable<Series> fakeSeries = new List<Series> {
            new Series { id = 1, name = "Series1", openingDate = new DateTime(2001,1,1) },
            new Series { id = 2, name = "Series2", openingDate = new DateTime(2002,1,30) },
            ...
            new Series { id = 10, name = "Series10", openingDate = new DateTime(2001,5,5) }
        }.AsQueryable();

        public IQueryable<Series> Series
        {
            get { return fakeSeries; }
        }
    }
}
Then the class that's consuming the data is instantiated by passing the repository reference to the constructor:
namespace myProject
{
    public class SeriesProcessor
    {
        private ISeriesRepository seriesRepository;

        public SeriesProcessor(ISeriesRepository seriesRepository)
        {
            this.seriesRepository = seriesRepository;
        }

        public IQueryable<Series> GetCurrentSeries()
        {
            return from s in seriesRepository.Series
                   where s.openingDate.Date <= DateTime.Now.Date
                   select s;
        }
    }
}
Then in my tests I can approach it thus:
namespace myProject.Tests
{
    [TestClass]
    public class SeriesTests
    {
        [TestMethod]
        public void Meaningful_Test_Name()
        {
            // Arrange
            SeriesProcessor processor = new SeriesProcessor(new FakeRepository());

            // Act
            IQueryable<Series> currentSeries = processor.GetCurrentSeries();

            // Assert
            Assert.AreEqual(10, currentSeries.Count());
        }
    }
}
Then look at Castle Windsor for the inversion-of-control approach in your live project, to allow your production code to automatically instantiate your live repository through dependency injection. That should get you closer to where you need to be.
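A minimal registration sketch - SqlSeriesRepository stands in for whatever your live LINQ to SQL implementation is called:

using Castle.MicroKernel.Registration;
using Castle.Windsor;

public static class ContainerSetup
{
    public static SeriesProcessor CreateProcessor()
    {
        var container = new WindsorContainer();
        container.Register(
            Component.For<ISeriesRepository>().ImplementedBy<SqlSeriesRepository>(),
            Component.For<SeriesProcessor>());

        // Windsor supplies the repository to SeriesProcessor's constructor.
        return container.Resolve<SeriesProcessor>();
    }
}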
In our company we have been discussing exactly this problem repeatedly for weeks and months.
To follow the guidelines of unit testing:
Each test must be atomic and must not relate to the others (no data sharing); that means each test must set up its own data at the beginning and clear that data at the end.
Our product is so complex (5 years of development, over 100 tables in the database) that it is nearly impossible to maintain this in an acceptable way.
We tried database scripts which create and delete the data before/after each test (there are automatic methods which call them).
I would say you are on a good path with Excel files.
A few ideas from me to make it a little better:
If you have a database behind your software, google for "NDbUnit". It's a framework to insert and delete data in databases for unit tests.
If you have no database, XML may be a little more flexible than a system like Excel.
Not directly answering the question, but one way to limit the number of tests that need dummy data is to use a mocking framework to create mocked objects that fake the behavior of any dependencies a class has.
I find that using mocked objects rather than a specific concrete implementation lets you drastically reduce the amount of real data you need, as mocks don't process the data you pass into them; they just perform exactly as you want them to.
I'm still sure you probably need dummy data in a lot of instances, so apologies if you're already using or are aware of mocking frameworks.
Just to be clear, you need to differentiate between UNIT testing (testing a module with no implied dependencies on other modules) and app testing (testing parts of the application).
For the former, you need a mocking framework (I'm only familiar with Perl ones, but I'm sure they exist in Java/C# as well). A sign of a good framework is the ability to take a running app, RECORD all the method calls/returns, and then mock the selected methods (e.g. the ones you are not testing in this specific unit test) using the recorded data.
For good unit tests you MUST mock every external dependency - e.g., no calls to the filesystem, no calls to the DB or other data access layers unless that is what you are testing, etc.
For the latter, the same mocking framework is useful, plus the ability to create test data sets (which can be reset for each test). The data to be loaded for the tests can reside in any offline storage that you can load from - BCP files for Sybase DB data, XML, whatever tickles your fancy. We use both BCP and XML.
Please note that this sort of "load test data into the DB" testing is SIGNIFICANTLY easier if your overall company framework allows - or rather enforces - a "what is the real DB table name for this table alias" API. That way, you can point your application at cloned "test" DB tables instead of the real ones during testing - on top of such a table-aliasing API's main purpose of enabling you to move DB tables from one database to another.
