QueryOver – ensure adding a join table only once - c#

I have a fairly complex use case that requires performing the same SQL tasks conditionally in various parts of the code. I wanted to duplicate as little code as possible, so I built a few static helper methods that allow me to add some JOIN statements when needed.
I know this probably could've been done a bit more cleanly with extension methods, but for now my code looks something like this:
static class Foo
{
    // Actually adds some filters which need additional JOINs
    public static IQueryOver<Transaction, Transaction> FromRetailer(Retailer retailer, IQueryOver<Transaction, Transaction> baseQuery = null)
    {
        RetailLocation retailLocation = null;
        return ForRetailerBase(baseQuery)
            .Where(t => retailLocation.Retailer == retailer);
    }

    // Auxiliary method which only adds some JOINs needed in various places
    public static IQueryOver<Transaction, Transaction> ForRetailerBase(IQueryOver<Transaction, Transaction> baseQuery = null)
    {
        if (baseQuery == null)
            baseQuery = QueryOver(); // Custom method that creates a vanilla IQueryOver instance

        // Add all sorts of JOINs needed to query the retailer
        return baseQuery
            .JoinAlias(...)
            .Left.JoinAlias(...)
            // and so on
            ;
    }
}
In the business logic, I either need to actually filter by retailer (in which case I call FromRetailer(), which calls ForRetailerBase() for me), or I don't need to filter by retailer – but I still need the JOINs added by ForRetailerBase() later on for grouping. Calling ForRetailerBase() unconditionally obviously breaks things when FromRetailer() is also called.
I'm currently solving this in a very clumsy fashion, by using a boolean in the business logic in order to execute ForRetailerBase() conditionally, only if FromRetailer() isn't executed.
I realize this could be fixed on two levels: either use a more adequate pattern altogether, or add those JOINs conditionally in ForRetailerBase(), by interrogating the baseQuery object to determine whether it already has the necessary JOINs. I'd rather go with the first approach, if one is available (this part of the code is still relatively young, and I can easily refactor it) – but I'll settle for the second approach as well. Problem is, I don't know how to advance in either direction.
I also realize the superficial solution is to remove the call to ForRetailerBase() from FromRetailer() and call it unconditionally from the business logic, but that's just as bad as my current solution, because it requires my business logic to know how those methods work internally.

ForRetailerBase and FromRetailer look to me like something the business logic should not know about at all. They look like query helpers, which should be handled by a query repository.
Such a repository would expose querying methods to the business logic, methods which internally call your ForRetailerBase or FromRetailer as required.
This way, your business logic needs no knowledge of how the queries are built, and your querying logic stays factored out in one place, inside the repository.
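A minimal sketch of what such a repository could look like, assuming the Foo helpers from the question (the TransactionRepository name and its method names are invented here for illustration):
// Hypothetical repository that hides the JOIN plumbing from the business logic.
class TransactionRepository
{
    // Business logic asks for "transactions of this retailer";
    // the repository decides which JOINs that requires.
    public IQueryOver<Transaction, Transaction> GetForRetailer(Retailer retailer)
    {
        return Foo.FromRetailer(retailer); // FromRetailer already calls ForRetailerBase
    }

    // Business logic asks for "transactions with the retailer joins in place"
    // (e.g. for grouping); again the repository adds the JOINs, the caller never sees them.
    public IQueryOver<Transaction, Transaction> GetAllWithRetailerJoins()
    {
        return Foo.ForRetailerBase();
    }
}
Whether the retailer filter applies is then an internal concern of the repository, so the boolean flag disappears from the business logic.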
Side note: your question does not really look bound to the specific technologies you are using. It looks to me more like a code design question. Maybe you should ask it on https://softwareengineering.stackexchange.com/ instead, which is meant for such questions (see its on-topic page).

Related

Using Entity Framework to return a table of data to iterate against

I am currently using EF 6 to do the following. Execute a stored procedure, then bring in the data I need to use. The data is usually 30-40 rows per application run.
I then iterate over the var, object, table (whatever you would like to call it), performing similar (sometimes different) tasks on each row. It works great. I am able to create an Entity object, expose the different complex functions of it, and then create a var to iterate over.
Like:
foreach (var result in StoredProcedureResult)
{
    string strFirstName = result.FirstName;
    string strLastName = result.LastName;
    // more logic goes here using those variables and interacting with another app
}
I recently thought it would be cool if I had a class solely for accessing the data. That way I could just reference that class, toss the corresponding connection string into my app.config, and keep the two sets of logic separate. When attempting to do the above in that structure, though, I hit the point where you can't return a var, and when I try to work out the actual return type, the result of executing the stored procedure is object (which I can't iterate over).
So my question is: how does one get to the above example, except with the var result being returned from this data access class?
If I am missing something, or it's not possible because I am doing this incorrectly, do let me know. It seemed right in my head.
I'm not going to describe the architecture in full, but based on your comments you can do the following (this is neither the definitive nor the only way to do it):
in your data access project you keep the DbContext class, all the code for the stored procedure call, and also the class that defines the result of the SP call; let's call it class A;
in your shared layer project - I would suggest calling it the Service layer - you create an XYService class that has a method, e.g. GetListOfX, which connects to the DB and calls the procedure; if needed, this method can also perform some logic, but more importantly it doesn't return class A, it returns a new class B (defined in the service layer, or in yet another project - that might be the true shared/common project; as it would just be a definition of common structures, it isn't really a layer);
in your application layer you work only with the GetListOfX method of the XYService and the class B; that way you don't need a reference to the data access project
In a trivial case, class B has the same properties as class A. But depending on your needs, class B can have additional properties/functionality; it can also ignore some properties of A, or even combine multiple properties into one, e.g. combining FirstName and LastName into a single property called simply Name.
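A hedged sketch of that split (XYService and GetListOfX are from the description above; PersonResult for class A and PersonDto for class B are names invented for the example):
using System;
using System.Collections.Generic;
using System.Linq;

// Data access project: class A, the raw shape returned by the stored procedure.
public class PersonResult
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
}

// Service layer: class B, the shape the rest of the application works with.
public class PersonDto
{
    public string Name { get; set; } // FirstName and LastName combined
}

// Service layer: the only type the application layer needs to reference.
public class XYService
{
    public List<PersonDto> GetListOfX()
    {
        // GetStoredProcedureResults() stands in for the real DbContext /
        // stored procedure call that lives in the data access project.
        List<PersonResult> raw = GetStoredProcedureResults();

        return raw
            .Select(a => new PersonDto { Name = a.FirstName + " " + a.LastName })
            .ToList();
    }

    private List<PersonResult> GetStoredProcedureResults()
    {
        // e.g. context.Database.SqlQuery<PersonResult>("EXEC MyProc").ToList() in EF6
        throw new NotImplementedException();
    }
}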
Basically, what you are looking for is a multi-tier application architecture (usually 3-4 tiers). The full extent of such an approach (which includes heavy usage of concepts like interfaces and dependency injection) might not be suitable or needed for your goals; e.g. if you are building just a small application for yourself with a couple of functions, or you know there won't be any reuse of the components of the final solution, then this approach is too wasteful and you can work faster with everything in one project. You should still apply principles like SOLID, DRY and separation of concerns.

EF - 3 tier not safe

Suppose a classic 3-tier application. In the DAL you have a GenericRepository<T>, where T represents your POCO class, and it includes methods like Insert(T entity), Delete(T entity), Update(T entity) and so on. Then your BLL (business logic layer) contains something like a CustomerRepository.
Well, all right.
Now, imagine your aspx page:
var customers = BLL.CustomerRepository.GetAll();
customers.First().Name = "some name";
Not good: you are supposed to go through the CustomerRepository.Update, Insert or Delete methods so that some validation can be executed for all CRUD operations. This way, the business logic does not work as I expected.
I note that no one seems to have thought about this, but I think it is an important question. It makes no sense to have business methods for CRUD operations if you can bypass them.
Am I missing something?
Well, let's start.
var customers = BLL.CustomerRepository.GetAll();
This was a nice line of code in the last millennium. Before generics and LINQ came along. Seriously.
These days, I would expect it at least to be like this:
var customers = BLL.Repository<Customer>.ToList (); //IF you have to materialize
There is no need for an "All" method at all ;)
Am I missing something?
To a large degree, an understanding that you are still within one application, so compromises are somewhat acceptable. It is not like there is a trust boundary between applications here. Second, the fact that you could have programmed a better abstraction.
Repository repository = BLL.GetRepository();
var customers = repository.Entity<Customer>().ToList();
customers[0].Name = null;
repository.Validate();
repository.Commit();
would be a much better abstraction. Creation is not done with "new" but with
var newCustomer = repository.Create<Customer>();
which is then committed along with the other changes.
All validation can be checked in the Validate method.
At the end, this is about HOW you design your interface for the repository - and if you insist on not keeping any state (which is a valid pattern for some operations) then this opens you to problems. And yes, you can have repositories that do not do full validation - totally valid. It really depends. You may be surprised, but I work on applications mostly where the repository is often not even updated in the same transaction as the object for performance reasons, and updates are queued and then batched, while the in memory version is the relevant one for all further operations.
It shows, at the end, that a little more thinking about how to design the DAL interface is in order, and please, please, please stop using an approach that is totally outdated and just leads to method creep (as you need tons of methods that otherwise just disappear behind generics + LINQ expression trees).
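To make that shape explicit, here is a rough sketch of such an interface (the member names echo the snippet above; everything else is guesswork about one possible design, not a definitive implementation):
using System.Linq;

// Sketch of a stateful repository abstraction in the spirit of the snippet above.
// Every change flows through Validate()/Commit(), so business rules cannot be
// bypassed the way they can with naked Insert/Update/Delete methods.
public interface IRepository
{
    IQueryable<T> Entity<T>() where T : class;   // query (and track) entities
    T Create<T>() where T : class, new();        // creation goes through the repository, not "new"
    void Validate();                             // run business validation over the tracked changes
    void Commit();                               // persist only what passed validation
}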

C# LINQ and calculations involving large datasets

This is more of a technical "how-to" or "best approach" question.
We have a current requirement to retrieve records from the database, place them into an 'in-memory' list and then perform a series of calculations on the data, i.e. maximum values, averages and some more specific custom statistics as well.
Getting the data into an 'in-memory' list is not a problem as we use NHibernate as our ORM and it does an excellent job of retrieving data from the database. The advice I am seeking is how should we best perform calculations on the resulting list of data.
Ideally I would like to create a method for each statistic: MaximumValue(), AverageValueUnder100(), MoreComplicatedStatistic(), etc. Of course, we would pass the required variables to each method and have it return the result. This approach would also make unit testing a breeze and provide us with excellent coverage.
Would there be a performance hit if we perform a LINQ query for each calculation, or should we consolidate as many calls to each statistic method into as few LINQ queries as possible? For example, it doesn't make much sense to pass the list of data to a method called AverageValueBelow100 and then pass the entire list of data to another method AverageValueBelow50 when they could effectively be performed with one LINQ query.
How can we achieve a high level of granularity and separation without sacrificing performance?
Any advice ... is the question clear enough?
Depending on the complexity of the calculation, it may be best to do it in the database. If it is sufficiently complex that you need to bring the data in as objects and incur that overhead, you may want to avoid multiple iterations over your result set. You may want to consider using Aggregate. See http://geekswithblogs.net/malisancube/archive/2009/12/09/demystifying-linq-aggregates.aspx for a discussion of it. You would be able to unit test each aggregate separately, but then (potentially) project multiple aggregates within a single iteration.
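As a rough illustration of that idea (the Reading class and the particular statistics below are invented for the example), several statistics can be folded into one pass with Aggregate:
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical data item; the real entity would come from NHibernate.
class Reading { public decimal Value { get; set; } }

class Stats
{
    public decimal Max;           // assumes non-negative values for this sketch
    public decimal SumBelow100;
    public int CountBelow100;
    public decimal AverageBelow100 => CountBelow100 == 0 ? 0 : SumBelow100 / CountBelow100;
}

static class StatsCalculator
{
    // One iteration over the list, several statistics accumulated at once.
    public static Stats Calculate(IEnumerable<Reading> readings)
    {
        return readings.Aggregate(new Stats(), (acc, r) =>
        {
            acc.Max = Math.Max(acc.Max, r.Value);
            if (r.Value < 100)
            {
                acc.SumBelow100 += r.Value;
                acc.CountBelow100++;
            }
            return acc;
        });
    }
}
Each accumulation step can still be covered by its own unit test, while the data is only walked once.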
I don't agree that it is best "to do it all in the database".
Well-written LINQ queries will result in good SQL queries being executed against the database, which should be good enough performance-wise (unless you are doing data-warehouse-style workloads). This assumes you are using the LINQ provider for NHibernate and not LINQ to Objects.
It also reads well, is easy to change, and keeps your business logic in one place.
If that is too slow for your needs, you can check the generated SQL and tweak your LINQ queries, or try to precompile them, and in the end you can still go back to writing the beloved stored procedures - and start spreading your business logic all over the place.
Will there be a performance hit? Yes, you might lose a few milliseconds, but isn't that a price worth paying for keeping your logic separated?
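For example (a sketch assuming the NHibernate LINQ provider via session.Query<T>() and a hypothetical Measurement entity), a statistic like this is translated into SQL and computed by the database:
using System.Linq;
using NHibernate;
using NHibernate.Linq;

// Hypothetical entity mapped by NHibernate.
class Measurement { public virtual decimal Value { get; set; } }

static class MeasurementStatistics
{
    // The Where/Average pair is translated into a single SQL query,
    // so only the final number travels back from the database.
    public static decimal AverageValueBelow100(ISession session)
    {
        return session.Query<Measurement>()
                      .Where(m => m.Value < 100)
                      .Average(m => m.Value);
    }
}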
To answer the "I would like to create a method for each statistic" concern, I would suggest you build a kind of statistician class. Here is some pseudo-code to express the idea:
class Statistician
{
    public bool MustCalculateFIRSTSTATISTIC { get; set; }  // Please rename me!
    public bool MustCalculateSECONDSTATISTIC { get; set; } // Please rename me!

    public void ProcessObject(object Object) // Replace object and Rename
    {
        if (MustCalculateFIRSTSTATISTIC)
            CalculateFIRSTSTATISTIC(Object);
        if (MustCalculateSECONDSTATISTIC)
            CalculateSECONDSTATISTIC(Object);
    }

    public object GetFIRSTSTATISTIC() // Replace object, Rename
    { /* ... */ }

    public object GetSECONDSTATISTIC() // Replace object, Rename
    { /* ... */ }

    private void CalculateFIRSTSTATISTIC(object Object) // Replace object
    { /* ... */ }

    private void CalculateSECONDSTATISTIC(object Object) // Replace object
    { /* ... */ }
}
If I had to do this, I would probably try to make it generic and use collections of delegates instead of methods, but since I don't know your context, I'll leave it at that. Also note that I only used members of the object class, but that's only because I'm not suggesting you use DataRows, Entities, or whatnot; I'll leave that to the other folks who know more than I do on the subject!
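Purely as a sketch of that generic, delegate-based variant (the Statistician<T> shape below is invented, not taken from the answer above):
using System;
using System.Collections.Generic;

// Each registered delegate folds one item into that statistic's running state.
class Statistician<T>
{
    private readonly List<Action<T>> _accumulators = new List<Action<T>>();

    // Register a statistic only when it is actually needed.
    public void Add(Action<T> accumulate) => _accumulators.Add(accumulate);

    // Single pass over the data, every registered statistic updated per item.
    public void Process(IEnumerable<T> items)
    {
        foreach (var item in items)
            foreach (var accumulate in _accumulators)
                accumulate(item);
    }
}
The running totals live in whatever closures the caller registers, so adding a new statistic is just another Add call.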

How to Test Functions w/ Complex Data Interactions

Currently, I am working on a system that performs quite a bit of reporting-style functionality, consuming many different data points and transforming them into larger, sometimes flattened outputs. Most of my app is built upon a variation of the repository pattern. Because of this, I have a suite of mock repositories that I use for testing scenarios. The problem I am running into is that the interaction between these data points is so complex that it is quickly becoming a maintenance nightmare to maintain the "mock data". Here is a mock example:
public class SomeReportingEntity
{
    private IProductRepo ProductRepo;
    private IManagerRepo ManagerRepo;
    private ILocationRepo LocationRepo;
    private IOrdersService OrdersService;
    private IEmployeeRepo EmployeeRepo;

    public SomeReportingEntity(IProductRepo ipr, IManagerRepo imr, ILocationRepo ilr,
                               IOrdersService ios, IEmployeeRepo ier)
    {
        // Load these into the private fields...
    }

    // This is the function that I want to test...
    public List<SomeReportingEntity> GetManagerSalesByRegionReport()
    {
        // Make a complex join on all sub-collections. These
        // sub-collections are all under test individually.
        var managerSalesByRegionItems =
            from x in ProductRepo.CurrentProducts()
            join y in OrdersService.FutureOrders() on ... equals ...
            join z in EmployeeRepo.ActiveEmployees() on ... equals ...
            join a in LocationRepo.GetAllRegions() on ... equals ...
            join b in ManagerRepo.GetActiveManagers() on ... equals ...
            select new SomeReportingEntity { ... };
        return managerSalesByRegionItems.ToList();
    }
}
Admittedly, this is a very contrived example but the basic idea that I want to emphasize is that I have several repositories that I am joining and I need to create many tests to ensure that this complex query does as expected. Due to the fact that the joining operations are so complex, it makes the mock data VERY difficult to keep in line - especially as I have to add more associations and test additional points. In addition, I need to be able to enter specific record states into the mocks (such as an employee lacking an assigned manager) to verify that query handles those situations appropriately.
So here are my questions:
What is the best way to "mock" this data so that it is not such a maintenance nightmare? I have had many people suggest building an in-memory database to support this.
Am I really suffering from an architecture issue here? In reporting scenarios, I find myself in this pattern quite a bit, where I take many disassociated data points and merge them into a new, hybrid entity. With the advent of LINQ, it is very easy to do and has high clarity of intent, but sometimes it feels like I am cheating a little.
The first thing you want to do is make a centralized object that knows how to retrieve the data from the different repositories. Since this is reporting only, it's easier because you don't have to worry about change tracking.
From a logistical standpoint, one thing I would consider is making a local database to hold the remote data (update periodically using agents). This would remove some of the issues of calling remote services and aggregating their data on the fly. You would also be able to pre-process some of the data at the start.
When I use the repository pattern, I couple it with the Unit Of Work pattern. The Unit of Work is the guy that does all the legwork for you. Theoretically, your UoW could bring in the data from the multiple services and present it to the repositories based on configuration.
For testing, you can use the InMemoryUnitOfWork to provide all the data in one single place.
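A hedged sketch of what such a test double could look like (IUnitOfWork and InMemoryUnitOfWork are invented names here, not from a particular library):
using System;
using System.Collections.Generic;
using System.Linq;

// The reporting code depends only on this abstraction.
public interface IUnitOfWork
{
    IQueryable<T> Query<T>();
}

// Test double: every "table" is just an in-memory list seeded by the test.
public class InMemoryUnitOfWork : IUnitOfWork
{
    private readonly Dictionary<Type, object> _sets = new Dictionary<Type, object>();

    public void Add<T>(params T[] items)
    {
        if (!_sets.ContainsKey(typeof(T)))
            _sets[typeof(T)] = new List<T>();
        ((List<T>)_sets[typeof(T)]).AddRange(items);
    }

    public IQueryable<T> Query<T>()
    {
        return _sets.ContainsKey(typeof(T))
            ? ((List<T>)_sets[typeof(T)]).AsQueryable()
            : Enumerable.Empty<T>().AsQueryable();
    }
}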
I've been working on a data-heavy project myself. What has worked for us is to use the repository itself to hydrate objects and then serialize them to XML. We pull the XML file into our test project and use that as the starting point for our automated tests. It's nice because it ensures that your mock data looks like real data.
Our tests tend to look like this...
var object1 = XmlUtil.LoadObject1("filename1");
var object2 = XmlUtil.LoadObject2("filename2");
var result = SomeConverter.Convert(object1, object2);
Assert("somevalue", result.Property1);
If you need to do inline lookups, you can add a mock repository that would provide the same level of dependency injection.
The downside of this approach is if the data schema changes. Sometimes a test can become obsolete because the data schema has changed. If your schema is still in a lot of flux, I would keep your automated test suite small until the schema settles down. Focus on unit tests until you know that the schema is relatively stable.
You have to decide exactly what you want to test.
One way to do this might be to pretend you're using TDD. Pretend that your GetManagerSalesByRegionReport method does not exist (or actually delete it). You'll have to:
Write a failing unit test. What's the simplest thing for it to test: that you can call the method and that it doesn't throw an exception when there's nothing wrong with the data.
You'll need to create the method, empty. It should return void since your test doesn't need it to return anything.
Your test should now pass.
Add a test to ensure that a List of the appropriate type is returned, even if none of the sub-repositories have data.
You'll have to change the method to return your list type, and you'll have to change it to return null. Your test will still fail, so change it to return an empty List and it will pass.
What's left? Those are INNER joins, so you won't get any data back unless all the repositories contain at least one row. So, test for that: create a test where each repo contains one row and ensure the returned list contains the appropriate number of rows. Then, test for the appropriate properties per returned row. Then test that no data is returned if any of the repos contain no rows.
Then, maybe test what happens if some of the repos contain more than one row.
Then, I don't know what would be left to test.
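For a concrete idea, the "one row in every repo" step might look roughly like this (the Mock*Repo classes and the one* fixture objects are assumed test doubles built elsewhere in the test class; the assertion style is NUnit-flavoured, and none of this is prescribed by the question):
[Test]
public void Report_WithOneMatchingRowInEveryRepo_ReturnsOneRow()
{
    // Arrange: each mocked repository contributes exactly one row that joins up.
    var entity = new SomeReportingEntity(
        new MockProductRepo(oneProduct),
        new MockManagerRepo(oneManager),
        new MockLocationRepo(oneRegion),
        new MockOrdersService(oneOrder),
        new MockEmployeeRepo(oneEmployee));

    // Act
    var report = entity.GetManagerSalesByRegionReport();

    // Assert: inner joins over single matching rows yield exactly one report row.
    Assert.AreEqual(1, report.Count);
}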

Is it ok to use C# Property like this

One of my fellow developers has code similar to the following snippet:
class Data
{
    public string Prop1
    {
        get
        {
            // return the value stored in the database via a query
        }
        set
        {
            // Save the data to local variable
        }
    }

    public void SaveData()
    {
        // Write all the properties to a file
    }
}

class Program
{
    public void SaveData()
    {
        Data d = new Data();
        // Fetch the information from database and fill the local variable
        d.Prop1 = d.Prop1;
        d.SaveData();
    }
}
Here the Data class properties fetch the information from the DB dynamically. When there is a need to save the Data to a file, the developer creates an instance and fills the properties using self-assignment, then finally calls save. I tried arguing that this usage of properties is not correct, but he is not convinced.
These are his points:
There are nearly 20 such properties.
Fetching all the information is not required except when saving.
Instead of self-assignment, writing a utility method to fetch everything would just duplicate the code already in the properties.
Is this usage correct?
I don't think that another developer who will work with the same code will be happy to see:
d.Prop1 = d.Prop1;
Personally I would never do that.
Also, it is not the best idea to use a property to load data from the DB.
I would have a method which loads the data from the DB into local variables, and then you can get that data using the properties. Also, get and set should logically work with the same data. It is strange to use get for fetching data from the DB but set for working with a local variable.
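A minimal sketch of that suggestion, using hypothetical method names (LoadFromDatabase and SaveToFile are invented; a single query filling all properties is assumed):
class Data
{
    // Plain, lightweight properties: always in memory, symmetric get/set.
    public string Prop1 { get; set; }
    public string Prop2 { get; set; }

    // One explicit call, one database round trip, fills every property.
    public void LoadFromDatabase()
    {
        // Run a single query returning all the columns and assign them to the
        // properties here (query details omitted; they belong in the data access code).
    }

    public void SaveToFile(string path)
    {
        // Write the in-memory property values to the file.
    }
}

// Usage: the intent is visible, no magic hidden behind a self-assignment.
// var d = new Data();
// d.LoadFromDatabase();
// d.SaveToFile("data.txt");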
Properties should really be as lightweight as possible.
When other developers are using properties, they expect them to be intrinsic parts of the object (that is, already loaded and in memory).
The real issue here is that of symmetry - the property get and set should mirror each other, and they don't. This is against what most developers would normally expect.
Having the property load up from database is not recommended - normally one would populate the class via a specific method.
This is pretty terrible, imo.
Properties are supposed to be quick / easy to access; if there's really heavy stuff going on behind a property it should probably be a method instead.
Having two utterly different things going on behind the same property's getter and setter is very confusing. d.Prop1 = d.Prop1 looks like a meaningless self-assignment, not a "Load data from DB" call.
Even if you do have to load twenty different things from a database, doing it this way forces it to be twenty different DB trips; are you sure multiple properties can't be fetched in a single call? That would likely be much better, performance-wise.
"Correct" is often in the eye of the beholder. It also depends how far or how brilliant you want your design to be. I'd never go for the design you describe, it'll become a maintenance nightmare to have the CRUD actions on the POCOs.
Your main issue is the absense of separations of concerns. I.e., The data-object is also responsible for storing and retrieving (actions that need to be defined only once in the whole system). As a result, you end up with duplicated, bloated and unmaintainable code that may quickly become real slow (try a LINQ query with a join on the gettor).
A common scenario with databases is to use small entity classes that only contain the properties, nothing more. A DAO layer takes care of retrieving and filling these POCOs with data from the database and defined the CRUD actions only ones (through some generics). I'd suggest NHibernate for the ORM mapping. The basic principle explained here works with other ORM mappers too and is explained here.
The reasons, esp. nr 1, should be a main candidate for refactoring this into something more maintainable. Duplicated code and logic, when encountered, should be reconsidered strongly. If the gettor above is really getting the database data (I hope I misunderstand that), get rid of it as quickly as you can.
Overly simplified example of separation of concerns:
class Data
{
    public string Prop1 { get; set; }
    public string Prop2 { get; set; }
}

class Dao<T>
{
    public void SaveEntity(T data)
    {
        // use reflection for saving your properties (this is what any ORM does for you)
    }

    public IList<T> GetAll()
    {
        // use reflection to retrieve all data of this type (again, ORM does this for you)
    }
}

// usage:
Dao<Data> myDao = new Dao<Data>();
IList<Data> allData = myDao.GetAll();
// modify, query etc. using the Dao; lazy evaluation and caching are done by the ORM for performance,
// but more importantly, this design keeps your code clean, readable and maintainable.
EDIT:
One question you should ask your co-worker: what happens when you have many Data objects (rows in the database), or when a property is the result of a joined query (foreign-key table)? Have a look at Fluent NHibernate if you want a smooth transition from one situation (unmaintainable) to the other (maintainable) that's easy enough for anybody to understand.
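For reference, a Fluent NHibernate mapping for the simplified Data class might look roughly like this (an Id property is assumed for the sketch; the mapping details are illustrative, not prescriptive):
using FluentNHibernate.Mapping;

public class Data
{
    public virtual int Id { get; set; }        // identifier assumed for the sketch
    public virtual string Prop1 { get; set; }
    public virtual string Prop2 { get; set; }
}

// The ORM, not the entity, knows how to load and save these properties.
public class DataMap : ClassMap<Data>
{
    public DataMap()
    {
        Id(x => x.Id);
        Map(x => x.Prop1);
        Map(x => x.Prop2);
    }
}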
If I were you I would write a serialize / deserialize function, then provide properties as lightweight wrappers around the in-memory results.
Take a look at the ISerializable interface: http://msdn.microsoft.com/en-us/library/system.runtime.serialization.iserializable.aspx
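A bare-bones sketch of what implementing ISerializable could look like for this class (property names reused from the question; how the serialized data is then written to a file is left open):
using System;
using System.Runtime.Serialization;

[Serializable]
class Data : ISerializable
{
    // Lightweight, in-memory properties.
    public string Prop1 { get; set; }
    public string Prop2 { get; set; }

    public Data() { }

    // Deserialization constructor required by ISerializable.
    protected Data(SerializationInfo info, StreamingContext context)
    {
        Prop1 = info.GetString("Prop1");
        Prop2 = info.GetString("Prop2");
    }

    // Serialization: write out each in-memory value; no database access here.
    public void GetObjectData(SerializationInfo info, StreamingContext context)
    {
        info.AddValue("Prop1", Prop1);
        info.AddValue("Prop2", Prop2);
    }
}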
This would be very hard to work with.
If you set Prop1 and then get Prop1, you could end up with different results, e.g.:
// set Prop1 to "abc"
d.Prop1 = "abc";
// if the data source holds "xyz" for Prop1
string myString = d.Prop1;
// myString will equal "xyz"
Reading the code without the comments, you would expect myString to equal "abc", not "xyz"; this could be confusing.
This would make working with the properties very difficult and require a save every time you change a property for it to work.
As well as agreeing with what everyone else has said about this example: what happens if there are other fields in the Data class, i.e. Prop2, Prop3, etc.? Do they all go back to the database each time they are accessed in order to "return the value stored in the database via a query"? 10 properties would equal 10 database hits; setting 10 properties, 10 writes to the database. That's not going to scale.
In my opinion, that's an awful design. Using a property getter to do some "magic" stuff makes the system awkward to maintain. If I joined your team, how would I know about the magic behind those properties?
Create a separate method whose name says what it does.
