I recently started working on a new project where we have thousands of lines of legacy code. We are facing several performance issues. I decided to take a look at the code and saw the following. There's a class:
public class BaseDataAccess
{
    private Database dB;

    public Database DB
    {
        get
        {
            if (dB == null)
            {
                dB = DatabaseFactory.CreateDatabase();
            }
            return dB;
        }
    }
}
And many descendant classes which inherit from the previous base class. Internally, those other classes make use of the DB property, like this:
DataSet ds = DB.ExecuteDataSet(spGetCustomersSortedByAge);
Finally, there's a huge class (5,000 lines of code) with dozens of methods like the following:
public void ProcessPayments()
{
    try
    {
        List<Employee> employees = new EmployeesDA().SelectAll(null);
        foreach (Employee employee in employees)
        {
            employee.Account = new MovementsDA().SelectAll(employee.Id, DateTime.Now);
            // ...
            City city = new CitiesDA().Select(zone.cityId);
            // ...
            Management m = new ManagmentDA().Select(city.id);
        }
    }
    catch (Exception ex)
    {
        // ...
    }
}
Note that in the previous method EmployeesDA, MovementsDA, CitiesDA and ManagmentDA are all inheritors of BaseDataAccess and internally use their respective DB properties. Also note that they are constantly being instantiated inside foreach loops (often within two levels of nesting).
I think the instantiation itself is suspicious, but I'm more concerned about what's going on with the database connections here. Does every DA that gets instantiated open a new underlying connection? How bad is this code?
As a side note about the solution I was considering in case this code should be fixed: I was thinking of making every constructor private so the compiler starts complaining about the instantiations, and then replacing those instantiations with calls to a GetInstance method (singleton pattern) to avoid recreating the objects and their underlying connections. But I'm not sure whether that could be dangerous in some other way, for example if the connections might get closed. The current code doesn't have that problem precisely because the instantiations happen all the time.
It's a common misconception that object construction is expensive. It is much more expensive than basic arithmetic or other machine-level operations, but it is unlikely to be the direct source of your performance issues.
Using boxed integers for a loop counter is wasteful, for example, but constructing a new Employee object on each iteration versus reusing a mutable Employee object isn't going to give a meaningful performance advantage.
Modern runtimes are also good at handling short-lived allocations in loops like this: in .NET, a gen 0 allocation is little more than a pointer bump, and objects that die young are collected very cheaply, so in effect the same memory is reused on each pass of the loop.
In this specific case there may be a cost if the DAs have significant initialization costs. If that is the case, I would refactor the code to create them once, outside the loop. I would not use actual static singletons; I would use dependency injection to manage singleton objects if you need them. Static singletons are effectively global variables and are an invitation to stateful coupling and the breakdown of modularity.
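As a rough illustration of that last point, here is a minimal sketch, not a drop-in fix: the PaymentsProcessor class and the Microsoft.Extensions.DependencyInjection registration are my own assumptions, while the DA names come from the question. The DAs are created once, registered as singletons, and injected, so nothing gets newed up inside the loops.

public class PaymentsProcessor
{
    private readonly EmployeesDA employeesDA;
    private readonly MovementsDA movementsDA;

    // The container creates these once and hands the same instances to every consumer.
    public PaymentsProcessor(EmployeesDA employeesDA, MovementsDA movementsDA)
    {
        this.employeesDA = employeesDA;
        this.movementsDA = movementsDA;
    }

    public void ProcessPayments()
    {
        List<Employee> employees = employeesDA.SelectAll(null);
        foreach (Employee employee in employees)
        {
            // No new MovementsDA() per iteration; the injected instance is reused.
            employee.Account = movementsDA.SelectAll(employee.Id, DateTime.Now);
        }
    }
}

// Composition root (assumes Microsoft.Extensions.DependencyInjection is available):
// var services = new ServiceCollection();
// services.AddSingleton<EmployeesDA>();
// services.AddSingleton<MovementsDA>();
// services.AddSingleton<PaymentsProcessor>();

Whether the DAs can safely be singletons depends on whether they hold per-call state; if they do, a per-request or transient lifetime with pooled connections is the safer choice.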
This may be a very dumb question, but I am wondering whether I can use a cached object as part of a using statement,
e.g.
using (Class1 sample = Cache.GetClass<Class1>())
The Cache class is a static class which uses MemoryCache to store a copy of Class1, and GetClass returns the stored object from the cache if it is already there.
In my real-life (almost, but simplified) example, I have got this:
using (dataDesignerClass dds = Cache.GetClass<dataDesignerClass>())
{
    // ...
    DataSet ds = new DataSet();
    dds.dataadapter1.Fill(ds); // dds is the data designer which contains all the SqlConnection, SQL commands, adapters, etc., which can get quite big
    // ...
}
...which seems OK to me, but I find that SOMETIMES the DataSet (ds) is not filled by dataadapter1, without any error being returned.
My static GetClass method:
public static T GetClass<T>() where T : class
{
    string keyName = "CACHE_" + typeof(T).Name.ToUpper();
    CacheItem cacheItem = null;
    cacheItem = GetCache(keyName); // a function to return the cache item
    if (cacheItem == null)
    {
        T daClass = Activator.CreateInstance(typeof(T)) as T; // the constructor will call the initialization routine
        AddCache(keyName, daClass);
        return daClass;
    }
    return (T)cacheItem.Value;
}
Can someone explain why it fails?
I think it is a bad idea to use using on something you cache.
The idea behind using is that it disposes of whatever unmanaged resources and handles an object holds when the block ends. You should not use an object after it has been disposed. The problem here is that it is not your intention to tear down and get rid of the object; that is exactly why you save it in a cache!
Also, a DataReader is essentially a cursor-style object. It does not take kindly to being reused, especially from more than one thread.
Disposing the object will most likely break your software and give unexpected and unwanted results. Don't use using in this scenario.
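A minimal sketch of what that means in practice, reusing the names from the question: take the instance from the cache without a using block, so nothing disposes the shared object.

// Note: no using block, so the cached instance is not disposed after use.
dataDesignerClass dds = Cache.GetClass<dataDesignerClass>();

DataSet ds = new DataSet();
dds.dataadapter1.Fill(ds);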
Reusing a shared object is sometimes good practice, but you need to make sure it can actually be reused. In your program you are storing a data adapter in the cache and trying to reuse it between different threads, and that sometimes causes strange results because the data adapter can't be shared. Imagine two threads getting the same instance of your adapter and modifying it at the same time! IMO the data adapter is quite lightweight and you can create a new instance for each database read; it's unnecessary to cache and reuse it, and doing so makes things complex.
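Here is a minimal sketch of the "new instance per read" alternative; the connection string and query are placeholders, and it assumes plain System.Data.SqlClient rather than the question's designer class.

// Create the connection and adapter per read; using disposes both afterwards,
// so nothing is shared between threads.
using (var connection = new SqlConnection(connectionString))
using (var adapter = new SqlDataAdapter("SELECT * FROM SomeTable", connection))
{
    var ds = new DataSet();
    adapter.Fill(ds); // Fill opens and closes the connection as needed
}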
I'm reading Vaughn Vernon's book Implementing Domain-Driven Design. I have also been going through the book's code, C# version, from his GitHub here.
The Java version of the book has @Transactional annotations, which I believe are from the Spring framework.
public class ProductBacklogItemService
{
    @Transactional
    public void assignTeamMemberToTask(
        String aTenantId,
        String aBacklogItemId,
        String aTaskId,
        String aTeamMemberId)
    {
        BacklogItem backlogItem =
            backlogItemRepository.backlogItemOfId(
                new TenantId(aTenantId),
                new BacklogItemId(aBacklogItemId));

        Team ofTeam =
            teamRepository.teamOfId(
                backlogItem.tenantId(),
                backlogItem.teamId());

        backlogItem.assignTeamMemberToTask(
            new TeamMemberId(aTeamMemberId),
            ofTeam,
            new TaskId(aTaskId));
    }
}
What would be the equivalent manual implementation in C#? I'm thinking something along the lines of:
public class ProductBacklogItemService
{
    private static object lockForAssignTeamMemberToTask = new object();
    private static object lockForOtherAppService = new object();

    public void AssignTeamMemberToTask(string aTenantId,
        string aBacklogItemId,
        string aTaskId,
        string aTeamMemberId)
    {
        lock (lockForAssignTeamMemberToTask)
        {
            // application code as before
        }
    }

    public void OtherAppsService(string aTenantId)
    {
        lock (lockForOtherAppService)
        {
            // some other code
        }
    }
}
This leaves me with the following questions:
Do we lock by application service, or by repository? i.e. Should we not be doing backlogItemRepository.lock()?
When we are reading multiple repositories as part of our application service, how do we protect dependencies between repositories during transactions (where aggregate roots reference other aggregate roots by identity) - do we need to have interconnected locks between repositories?
Are there any DDD infrastructure frameworks that handle any of this locking?
Edit
Two useful answers came in suggesting transactions. Since I haven't yet selected the persistence layer I'll be using, I'm working with in-memory repositories; these are pretty raw and I wrote them myself (they don't have transaction support because I don't know how to add it!).
I will design the system so that I do not need to commit atomic changes to more than one aggregate root at the same time. I will, however, need to read consistently across a number of repositories (i.e. if a BacklogItemId is referenced from multiple other aggregates, then we need to protect against race conditions should that BacklogItem be deleted).
So, can I get away with just using locks, or do I need to look at adding TransactionScope support on my in-memory repository?
TL;DR version
You need to wrap your code in a System.Transactions.TransactionScope. Be careful about multi-threading btw.
Full version
So the point of aggregates is that they define a consistency boundary. That means any changes should leave the state of the aggregate still honouring its invariants. That's not necessarily the same as a transaction. Real transactions are a cross-cutting implementation detail, so they should probably be implemented as such.
A warning about locking
Don't do locking. Try to forget any notion you have of implementing pessimistic locking. To build scalable systems you have no real choice. The very fact that data takes time to be requested and travel from disk to your screen means you have eventual consistency, so you should build for that. You can't really prevent race conditions as such; you just need to account for the fact that they could happen and be able to warn the "losing" user that their command failed. Often you can detect these issues later on (seconds, minutes, hours, days, whatever your domain experts tell you the SLA is) and tell users so they can do something about it.
For example, imagine if two payroll clerks paid an employee's expenses at the same time with the bank. They would find out later on when the books were being balanced and take some compensating action to rectify the situation. You wouldn't want to scale down your payroll department to a single person working at a time in order to avoid these (rare) issues.
My implementation
Personally I use the Command Processor style, so all my Application Services are implemented as ICommandHandler<TCommand>. The CommandProcessor itself is the thing that looks up the correct handler and asks it to handle the command. This means that the CommandProcessor.Process(command) method can have its entire contents processed in a System.Transactions.TransactionScope.
Example:
public class CommandProcessor : ICommandProcessor
{
    public void Process(Command command)
    {
        using (var transaction = new TransactionScope())
        {
            var handler = LookupHandler(command);
            handler.Handle(command);
            transaction.Complete();
        }
    }
}
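For context, here is a minimal sketch of what one of those handlers might look like; everything other than the ICommandHandler<TCommand> shape (the command, handler and repository names) is my own illustrative naming, not taken from the answer.

public interface ICommandHandler<TCommand>
{
    void Handle(TCommand command);
}

// One application service per command; it runs inside the TransactionScope
// opened by CommandProcessor.Process.
public class AssignTeamMemberToTaskHandler : ICommandHandler<AssignTeamMemberToTaskCommand>
{
    private readonly IBacklogItemRepository backlogItemRepository;
    private readonly ITeamRepository teamRepository;

    public AssignTeamMemberToTaskHandler(
        IBacklogItemRepository backlogItemRepository,
        ITeamRepository teamRepository)
    {
        this.backlogItemRepository = backlogItemRepository;
        this.teamRepository = teamRepository;
    }

    public void Handle(AssignTeamMemberToTaskCommand command)
    {
        // same application code as in the question's assignTeamMemberToTask
    }
}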
You've not gone for this approach, so to make your transactions a cross-cutting concern you're going to need to move them a level higher in the stack. This is highly dependent on the tech you're using (ASP.NET, WCF, etc.), so if you add a bit more detail there might be an obvious place to put this stuff.
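If no hosting-level hook fits, one option (my own sketch, not something from the answer) is a plain decorator around the application service; IProductBacklogItemService here is an assumed interface extracted from the question's class. The transaction stays a level above the service itself.

public class TransactionalProductBacklogItemService : IProductBacklogItemService
{
    private readonly IProductBacklogItemService inner;

    public TransactionalProductBacklogItemService(IProductBacklogItemService inner)
    {
        this.inner = inner;
    }

    public void AssignTeamMemberToTask(
        string aTenantId, string aBacklogItemId, string aTaskId, string aTeamMemberId)
    {
        using (var tx = new TransactionScope())
        {
            // The real service stays unaware of transactions.
            inner.AssignTeamMemberToTask(aTenantId, aBacklogItemId, aTaskId, aTeamMemberId);
            tx.Complete();
        }
    }
}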
Locking wouldn't allow any concurrency on those code paths.
I think you're looking for a transaction scope instead.
I don't know which persistence layer you are going to use, but the standard ones like ADO.NET, Entity Framework, etc. support the TransactionScope semantics:
using (var tr = new TransactionScope())
{
    doStuff();
    tr.Complete();
}
The transaction is committed if tr.Complete() is called. In any other case it is rolled back.
Typically, the aggregate is a unit of transactional consistency. If you need the transaction to spread across multiple aggregates, then you should probably reconsider your model.
lock (lockForAssignTeamMemberToTask)
{
    // application code as before
}
This takes care of synchronization. However, you also need to revert the changes in case of any exception. So, the pattern will be something like:
lock (lockForAssignTeamMemberToTask)
{
    try
    {
        // application code as before
    }
    catch (Exception e)
    {
        // rollback/restore previous values
    }
}
Now and again I end up with code along these lines, where I create some objects then loop through them to initialise some properties using another class...
ThingRepository thingRepos = new ThingRepository();
GizmoProcessor gizmoProcessor = new GizmoProcessor();
WidgetProcessor widgetProcessor = new WidgetProcessor();

public List<Thing> GetThings(DateTime date)
{
    List<Thing> allThings = thingRepos.FetchThings();

    // Loops through setting thing.Gizmo to a new Gizmo
    gizmoProcessor.AddGizmosToThings(allThings);

    // Loops through setting thing.Widget to a new Widget
    widgetProcessor.AddWidgetsToThings(allThings);

    return allThings;
}
...which just, well, feels wrong.
Is this a bad idea?
Is there a name of an anti-pattern that I'm using here?
What are the alternatives?
Edit: assume that both GizmoProcessor and WidgetProcessor have to go off and do some calculation, and get some extra data from other tables. They're not just data stored in a repository. They're creating new Gizmos and Widgets based on each Thing and assigning them to Thing's properties.
The reason this feels odd to me is that Thing isn't an autonomous object; it can't create itself and child objects. It's requiring higher-up code to create a fully finished object. I'm not sure if that's a bad thing or not!
ThingRepository is supposed to be the single access point for collections of Things, or at least that's where developers will intuitively look. For that reason, it feels strange that GetThings(DateTime date) should be provided by another object. I'd rather place that method in ThingRepository itself.
The fact that the Things returned by GetThings(DateTime date) are different, "fatter" animals than those returned by ThingRepository.FetchThings() also feels awkward and counter-intuitive. If Gizmo and Widget are really part of the Thing entity, you should be able to access them every time you have an instance of Thing, not just for instances returned by GetThings(DateTime date).
If the date parameter in GetThings() isn't important or could be gathered at another time, I would use calculated properties on Thing to implement on-demand access to Gizmo and Widget:
public class Thing
{
    //...

    public Gizmo Gizmo
    {
        get
        {
            // calculations here
        }
    }

    public Widget Widget
    {
        get
        {
            // calculations here
        }
    }
}
Note that this approach is valid as long as the calculations performed are not too costly. Calculated properties with expensive processing are not recommended - see http://msdn.microsoft.com/en-us/library/bzwdh01d%28VS.71%29.aspx#cpconpropertyusageguidelinesanchor1
However, these calculations don't have to be implemented inline in the getters - they can be delegated to third-party Gizmo/Widget processors, potentially with a caching strategy, etc.
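A hedged sketch of that delegation, assuming Thing is handed a GizmoProcessor with a BuildGizmo(Thing) method (both names borrowed from other parts of this thread, so treat them as illustrative): the work happens once, on first access, and the result is then cached in a backing field.

public class Thing
{
    private readonly GizmoProcessor gizmoProcessor;
    private Gizmo gizmo;

    public Thing(GizmoProcessor gizmoProcessor)
    {
        this.gizmoProcessor = gizmoProcessor;
    }

    public Gizmo Gizmo
    {
        // Computed on first access, then cached on the backing field.
        get { return gizmo ?? (gizmo = gizmoProcessor.BuildGizmo(this)); }
    }
}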
If you have complex initialization then you could use a Strategy pattern. Here is a quick overview adapted from this strategy pattern overview.
Create a strategy interface to abstract the initialization:
public interface IThingInitializationStrategy
{
    void Initialize(Thing thing);
}
The initialization implementations that can be used by the strategy:
public class GizmosInitialization : IThingInitializationStrategy
{
    public void Initialize(Thing thing)
    {
        // Add gizmos here and other initialization
    }
}

public class WidgetsInitialization : IThingInitializationStrategy
{
    public void Initialize(Thing thing)
    {
        // Add widgets here and other initialization
    }
}
And finally a service class that accepts the strategy implementation in an abstract way
internal class ThingInitializationService
{
    private readonly IThingInitializationStrategy _initStrategy;

    public ThingInitializationService(IThingInitializationStrategy initStrategy)
    {
        _initStrategy = initStrategy;
    }

    public void Initialize(Thing thing)
    {
        _initStrategy.Initialize(thing);
    }
}
You can then use the initialization strategies like so
var initializationStrategy = new GizmosInitialization();
var initializationService = new ThingInitializationService(initializationStrategy);

List<Thing> allThings = thingRepos.FetchThings();
allThings.ForEach(thing => initializationService.Initialize(thing));
The only real potential problem is that you're iterating over the same collection multiple times, but if you need to hit a database to get all the gizmos and widgets, it might be more efficient to request them in batches, so passing the full list to your Add... methods would make sense.
The other option would be to look into returning the gizmos and widgets with the thing in the first repository call (assuming they reside in the same repo). It might make the query more complex, but it would probably be more efficient. Unless of course you don't ALWAYS need to get gizmos and widgets when you fetch things.
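As a rough sketch of the batching idea, assuming the processors can expose bulk lookups such as FetchGizmosFor(ids) and that Thing has an Id property (both are hypothetical, not from the question): one round trip per child type instead of one per Thing.

public List<Thing> GetThings(DateTime date)
{
    List<Thing> allThings = thingRepos.FetchThings();
    var ids = allThings.Select(t => t.Id).ToList();

    // One round trip per child type; the dictionaries are keyed by Thing.Id.
    Dictionary<int, Gizmo> gizmosById = gizmoProcessor.FetchGizmosFor(ids);
    Dictionary<int, Widget> widgetsById = widgetProcessor.FetchWidgetsFor(ids);

    foreach (var thing in allThings)
    {
        // Assumes every Thing has a matching Gizmo and Widget.
        thing.Gizmo = gizmosById[thing.Id];
        thing.Widget = widgetsById[thing.Id];
    }

    return allThings;
}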
To answer your questions:
Is this a bad idea?
From my experience, you rarely know if it's a good/bad idea until you need to change it.
IMO, code is either: Over-engineered, under-engineered, or unreadable
In the meantime, you do your best and stick to the best practices (KISS, single responsibility, etc)
Personally, I don't think the processor classes should be modifying the state of any Thing.
I also don't think the processor classes should be given a collection of Things to modify.
Is there a name of an anti-pattern that I'm using here?
Sorry, unable to help.
What are the alternatives?
Personally, I would write the code as such:
public List<Thing> GetThings(DateTime date)
{
    List<Thing> allThings = thingRepos.FetchThings();

    // Build the gizmo and widget for each thing
    foreach (var thing in allThings)
    {
        thing.Gizmo = gizmoProcessor.BuildGizmo(thing);
        thing.Widget = widgetProcessor.BuildWidget(thing);
    }

    return allThings;
}
My reasons being:
The code is in a class that "Gets things". So logically, I think it's acceptable for it to traverse each Thing object and initialise them.
The intention is clear: I'm initialising the properties for each Thing before returning them.
I prefer initialising any properties of Thing in a central location.
I don't think that gizmoProcessor and widgetProcessor classes should have any business with a Collection of Things
I prefer the Processors to have a method to build and return a single widget/gizmo
However, if your processor classes are building several properties at once, only then would I refactor the property initialisation into each processor.
public List<Thing> GetThings(DateTime date)
{
    List<Thing> allThings = thingRepos.FetchThings();

    // Build the gizmo and widget for each thing
    foreach (var thing in allThings)
    {
        // [Edited]
        // Notice a trend here: the common Initialize(Thing) interface
        // could probably be refactored into some
        // super-mega-complex Composite Builder-esque class should you ever want to
        gizmoProcessor.Initialize(thing);
        widgetProcessor.Initialize(thing);
    }

    return allThings;
}
P.s.:
I personally do not care that much for (Anti)Pattern names.
While it helps to discuss a problem at a higher level of abstraction, I wouldn't commit every (anti)pattern names to memory.
Only when I come across a pattern that I believe is helpful do I remember it.
I'm quite lazy, and my rationale is that: Why bother remembering every pattern and anti pattern if I'm only going to use a handful?
[Edit]
Noticed an answer was already given regarding using a Strategy Service.
I'm currently having a problem where one of my .NET-based Windows services is hogging too much memory.
I'm almost positive it has to do with a caching implementation where I decided to use a "database" caching technique, and the problem occurs in how I initially load the cache values when the service starts up.
Here's the concept...
Class: Service
Operation: Start
Class: Cacheloader
Operation: LoadCache
Class: DataAccessLayer
Operation: Store_Cache_in_DB
...and don't ask me why but...
A) Cacheloader is "newed" up as a local variable in the Service "start" method.
B) DataAccessLayer is static to the service (Singleton via IOC)
So the code kinda looks like this
Service:

start()
{
    _cacheLoader = new Cacheloader(_dataAccessLayer);
    _cacheLoader.LoadCache();
}
Cacheloader:

LoadCache()
{
    var entities = _dataAccessLayer.FindEntitiesForCache();
    _dataAccessLayer.Store_Cache_in_DB(entities);
}
DataAccessLayer:

Store_Cache_in_DB(List<Entity> entities)
{
    using (someConnection)
    {
        // store entities in a DataTable
        // pass the DataTable to a sproc that stores each entity
    }
}
So, my concern here is with the entities that are passed to the static DataAccessLayer object via the Store_Cache_in_DB method. I'm wondering whether the garbage collector will somehow not clean these up because they have been referenced by a static class? If that is the case, would it help to assign the entities to a local variable first, like so...
DataAccessLayer:

Store_Cache_in_DB(List<Entity> entities)
{
    var cachedEntities = entities;
    using (someConnection)
    {
        // store cachedEntities in a DataTable
        // pass the DataTable to a sproc that stores each entity
    }
}
...hopefully this would solve my problem. If this isn't the reason why my memory consumption is so high, are there any other ideas? Again, I'm sure this caching technique is the perpetrator.
Thanks in advance!
If this is the case, would it help to assign the entities to a local variable first as so...
Having a local won't change anything - the same object instance will be reachable by user code.
The only part that might keep this from being garbage collected is what happens here:
using (someConnection)
{
    // store cachedEntities in a DataTable
    // pass the DataTable to a sproc that stores each entity
}
If the entities are kept in a variable that persists (for example a static field or a long-lived cache), that will prevent them from being collected later.
If this isn't the reason why my memory consumption is so high, are there any other ideas?
I would recommend running this under a memory profiler, as it will tell you exactly what's holding onto your memory.
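To illustrate what "holding onto your memory" usually looks like in a case like this, here is a minimal sketch; the _keepAlive field is hypothetical, not something we know your DataAccessLayer has, and Entity is the type from your question.

public static class DataAccessLayer
{
    // If something like this exists, every entity added to it stays reachable
    // for the lifetime of the process and will never be garbage collected.
    private static readonly List<Entity> _keepAlive = new List<Entity>();

    public static void Store_Cache_in_DB(List<Entity> entities)
    {
        // 'entities' is just a parameter: it does not extend the lifetime of the
        // list beyond this call...
        // _keepAlive.AddRange(entities); // ...but a line like this would.
    }
}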
I have a method that I want to be "transactional" in the abstract sense. It calls two methods that happen to do stuff with the database, but this method doesn't know that.
public void DoOperation()
{
    using (var tx = new TransactionScope())
    {
        Method1();
        Method2();
        tx.Complete();
    }
}

public void Method1()
{
    using (var connection = new DbConnectionScope())
    {
        // Write some data here
    }
}

public void Method2()
{
    using (var connection = new DbConnectionScope())
    {
        // Update some data here
    }
}
Because in real terms the TransactionScope means that a database transaction will be used, we have an issue where it could well be promoted to a Distributed Transaction, if we get two different connections from the pool.
I could fix this by wrapping the DoOperation() method in a ConnectionScope:
public void DoOperation()
{
    using (var tx = new TransactionScope())
    using (var connection = new DbConnectionScope())
    {
        Method1();
        Method2();
        tx.Complete();
    }
}
I made DbConnectionScope myself for just such a purpose, so that I don't have to pass connection objects to sub-methods (this is a more contrived example than my real issue). I got the idea from this article: http://msdn.microsoft.com/en-us/magazine/cc300805.aspx
However, I don't like this workaround as it means DoOperation now has knowledge that the methods it's calling may use a connection (and possibly a different connection each time). How could I refactor this to resolve the issue?
One idea I'm considering is creating a more general OperationScope, so that, teamed up with a custom Castle Windsor lifestyle I'll write, any component requested from the container with an OperationScopeLifestyle will always get the same instance of that component. This does solve the problem, because OperationScope is more ambiguous than DbConnectionScope.
I'm seeing conflicting requirements here.
On the one hand, you don't want DoOperation to have any awareness of the fact that a database connection is being used for its sub-operations.
On the other hand, it clearly is aware of this fact because it uses a TransactionScope.
I can sort of understand what you're getting at when you say you want it to be transactional in the abstract sense, but my take on this is that it's virtually impossible (no, scratch that - completely impossible) to describe a transaction in such abstract terms. Let's just say you have a class like this:
class ConvolutedBusinessLogic
{
    public void Splork(MyWidget widget)
    {
        if (widget.Validate())
        {
            widgetRepository.Save(widget);
            widget.LastSaved = DateTime.Now;
            OnSaved(new WidgetSavedEventArgs(widget));
        }
        else
        {
            Log.Error("Could not save MyWidget due to a validation error.");
            SendEmailAlert(new WidgetValidationAlert(widget));
        }
    }
}
This class is doing at least two things that probably can't be rolled back (setting the property of a class and executing an event handler, which might for example cascade-update some controls on a form), and at least two more things that definitely can't be rolled back (appending to a log file somewhere and sending out an e-mail alert).
Perhaps this seems like a contrived example, but that is actually my point; you can't treat a TransactionScope as a "black box". The scope is in fact a dependency like any other; TransactionScope just provides a convenient abstraction for a unit of work that may not always be appropriate because it doesn't actually wrap a database connection and can't predict the future. In particular, it's normally not appropriate when a single logical operation needs to span more than one database connection, whether those connections are to the same database or different ones. It tries to handle this case of course, but as you've already learned, the result is sub-optimal.
The way I see it, you have a few different options:
Make explicit the fact that Method1 and Method2 require a connection by having them take a connection parameter, or by refactoring them into a class that takes a connection dependency (constructor or property). This way, the connection becomes part of the contract, so Method1 no longer knows too much - it knows exactly what it's supposed to know according to the design. (A sketch of this option follows the list.)
Accept that your DoOperation method does have an awareness of what Method1 and Method2 do. In fact, there is nothing wrong with this! It's true that you don't want to be relying on implementation details of some future call, but forward dependencies in the abstraction are generally considered OK; it's reverse dependencies you need to be concerned about, like when some class deep in the domain model tries to update a UI control that it has no business knowing about in the first place.
Use a more robust Unit of Work pattern (also: here). This is getting to be more popular and it is, by and large, the direction Microsoft has gone in with Linq to SQL and EF (the DataContext/ObjectContext are basically UoW implementations). This slots in well with a DI framework and essentially relieves you of the need to worry about when transactions start and end and how the data access has to occur (the term is "persistence ignorance"). This would probably require significant rework of your design, but pound for pound it's going to be the easiest to maintain long-term.
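For the first option, here is a minimal sketch of what that contract might look like; the OperationMethods class and the OpenConnection() factory are my own names, while IDbConnection is plain ADO.NET.

public class OperationMethods
{
    private readonly IDbConnection connection;

    // The connection is now an explicit constructor dependency.
    public OperationMethods(IDbConnection connection)
    {
        this.connection = connection;
    }

    public void Method1()
    {
        using (var command = connection.CreateCommand())
        {
            // Write some data here using the shared connection.
        }
    }

    public void Method2()
    {
        using (var command = connection.CreateCommand())
        {
            // Update some data here using the same connection.
        }
    }
}

// DoOperation owns both the connection and the transaction:
// using (var tx = new TransactionScope())
// using (var connection = OpenConnection())   // hypothetical connection factory
// {
//     var methods = new OperationMethods(connection);
//     methods.Method1();
//     methods.Method2();
//     tx.Complete();
// }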
Hope one of those helps you.