I'm currently having a problem where one of my .NET-based Windows services is using far too much memory.
I'm almost positive it has to do with a caching implementation: I decided to use a "database" caching technique, and the problem occurs in how I initially load the cache values when the service starts up.
Here's the concept...
Class: Service
Operation: Start
Class: Cacheloader
Operation: LoadCache
Class: DataAccessLayer
Operation: Store_Cache_in_DB
...and don't ask me why but...
A) Cacheloader is "newed" up as a local variable in the Service "start" method.
B) DataAccessLayer is static to the service (Singleton via IOC)
So the code kinda looks like this
Service:
start()
{
_cacheLoader = new Cacheloader(_dataAccessLayer);
_cacheLoader.LoadCache();
}
Cacheloader:
LoadCache()
{
var entities = _dataAccessLayer.FindEntitiesForCache();
_dataAccessLayer.Store_Cache_in_DB(entities);
}
DataAccessLayer:
Store_Cache_in_DB(List<Entity> entities)
{
using(someConnection)
{
store entities in a DataTable
pass the DataTable to a sproc that stores each entity
}
}
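Concretely, Store_Cache_in_DB does something along these lines (simplified for this post; the column names, sproc name, and table-valued parameter are made up, not the actual code):

public void Store_Cache_in_DB(List<Entity> entities)
{
    // Copy the entities into a DataTable - note this is a second in-memory copy of all the data.
    // (Uses System.Data and System.Data.SqlClient.)
    var table = new DataTable();
    table.Columns.Add("Id", typeof(int));
    table.Columns.Add("Value", typeof(string));
    foreach (var entity in entities)
    {
        table.Rows.Add(entity.Id, entity.Value);
    }

    // Hand the whole table to a stored procedure as a table-valued parameter.
    using (var connection = new SqlConnection(_connectionString))
    using (var command = new SqlCommand("dbo.StoreCacheEntities", connection))
    {
        command.CommandType = CommandType.StoredProcedure;
        var parameter = command.Parameters.AddWithValue("@Entities", table);
        parameter.SqlDbType = SqlDbType.Structured; // requires a matching table type on the SQL Server side
        connection.Open();
        command.ExecuteNonQuery();
    }
}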
So, my concern here is with the "entities" that are passed to the static DataAccessLayer object via the Store_Cache_in_DB method. I'm wondering whether the garbage collector won't clean these up because they are somehow still referenced by a static class. If this is the case, would it help to assign the entities to a local variable first, like so...
DataAccessLayer:
Store_Cache_in_DB(List<Entity> entities)
{
var cachedEntities = entities;
using(someConnection)
{
store cachedEntities in a DataTable
pass the DataTable to a sproc that stores each entity
}
}
...hopefully this would solve my problem. If this isn't the reason why my memory consumption is so high, are there any other ideas? Again, I'm sure this caching technique is the perpetrator.
Thanks in advance!
If this is the case, would it help to assign the entities to a local variable first, like so...
Having a local won't change anything - the same object instance will be reachable by user code.
The only part that might keep this from being garbage collected is what happens here:
using(someConnection)
{
store cachedEntities in a DataTable
pass the DataTable to a sproc that stores each entity
}
If entities are kept in a variable that persists, it will prevent them from being collected later.
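To make that concrete, here's a minimal sketch - the Entity type and method signature are taken from the question, and the static field is hypothetical, just to show the kind of reference that would keep the list alive:

public class DataAccessLayer
{
    // Hypothetical: a field on the long-lived singleton that still points at the list.
    // Only something like this would keep the entities reachable after the call.
    private static List<Entity> _lastStoredEntities;

    public void Store_Cache_in_DB(List<Entity> entities)
    {
        var cachedEntities = entities; // same reference, not a copy - changes nothing for the GC

        // _lastStoredEntities = entities; // <-- THIS would prevent collection after the method returns
    }
}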
If this isn't the reason why my memory consumption is so high, are there any other ideas?
I would recommend running this under a memory profiler, as it will tell you exactly what's holding onto your memory.
Related
I started working recently in a new project where we have thousands of lines of legacy code. We are facing several performance issues. I decided to take a look at the code and saw the following. There's a class:
public class BaseDataAccess
{
private Database dB;
public Database DB
{
get
{
if (dB == null)
{
dB = DatabaseFactory.CreateDatabase();
}
return dB;
}
}
}
And many descendant classes which inherit from the previous base class. Internally, those other classes make use of the DB property, like this:
DataSet ds = DB.ExecuteDataSet(spGetCustomersSortedByAge);
Finally, there's a huge class (5000 lines of code) with tens of methods like the following:
public void ProcessPayments()
{
try
{
List<Employee> employees = new EmployeesDA().SelectAll(null);
foreach (Employee employee in employees)
{
employee.Account = new MovementsDA().SelectAll(employee.Id, DateTime.Now);
...
City city = new CitiesDA().Select(zone.cityId);
...
Management m = new ManagmentDA().Select(city.id);
}
}
catch (Exception ex)
{
...
}
}
Note in the previous method EmployeesDA, MovementsDA, CitiesDA and ManagmentDA all are inheritors of BaseDataAccess and internally use their respective DB properties. Also note they are constantly being instantiated inside foreach loops (many times within 2 levels of nesting).
I think the instantiation itself is suspicious, but I'm more concerned about what's going on with the database connections here. Is every DA that's instantiated opening a new underlying connection? How bad is this code?
As a side note about the solution I was considering in case this code should be fixed: I was considering making every constructor private so the compiler starts complaining about the instantiations, and refactoring the instantiations into calls to a GetInstance method (singleton pattern) to avoid recreating the objects and the underlying connections. But I'm not sure if this could also be dangerous in some way, for example, if the connections might get closed. The current code doesn't have that problem because of the instantiations happening all the time.
It's a common misconception that object construction is expensive. It's much more expensive than basic arithmetic or other machine-level operations, but it isn't likely to be the direct source of performance issues.
Using boxed integers in a loop is wasteful, for example, but constructing an Employee object in each iteration vs. reusing a mutable Employee object isn't going to give a meaningful performance advantage.
Many garbage collectors are capable of object memory frame reuse in loops like this. In effect a single object frame is allocated and overwritten on each pass of the loop.
In this specific case there may be a cost if the DAs have significant initialization costs. If that is the case I would refactor the code to create those once, outside the loop. I would not use actual static singletons. I would use dependency injection to manage singleton objects if you need them. Static singletons are effectively global variables and are an invitation to stateful coupling and the breakdown of modularity.
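As a rough sketch of that refactoring, with the DA types from the question created once and injected (the PaymentProcessor class and its constructor injection are my assumption, not the existing code):

public class PaymentProcessor
{
    private readonly EmployeesDA _employeesDA;
    private readonly MovementsDA _movementsDA;

    // The container (or composition root) supplies the instances once.
    public PaymentProcessor(EmployeesDA employeesDA, MovementsDA movementsDA)
    {
        _employeesDA = employeesDA;
        _movementsDA = movementsDA;
    }

    public void ProcessPayments()
    {
        List<Employee> employees = _employeesDA.SelectAll(null);
        foreach (Employee employee in employees)
        {
            // The DA instances are reused, so any expensive initialization happens only once.
            employee.Account = _movementsDA.SelectAll(employee.Id, DateTime.Now);
            // ...
        }
    }
}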
It must be a very dumb question, but I am wondering if I can use a cached object as part of a using statement,
e.g.
using(Class1 sample = Cache.GetClass<Class1>())
The Cache class is a static class which uses MemoryCache to store a copy of Class1, and GetClass is meant to get a copy of the stored object from the cache if it is already there.
In my real-life (almost, but simplified) example, I have got this:
using (dataDesignerClass dds = Cache.GetClass<dataDesignerClass>()){
...
DataSet ds = new DataSet();
dds.dataadapter1.Fill(ds); //dds is the data designer which contains all the SqlConnection, SQL commands, adapters, etc., which can get quite big
...
}
...which seems to be OK to me, but I find that SOMETIMES the DataSet (ds) is not filled by dataadapter1, without any error being returned.
My static GetClass method:
public static T GetClass<T> () where T: class
{
string keyName = "CACHE_" + typeof(T).Name.ToUpper();
CacheItem cacheItem = null;
cacheItem = GetCache(keyName); //a function to return the cache item
if (cacheItem == null)
{
T daClass = Activator.CreateInstance(typeof(T)) as T; //the constructor will call the initialization routine
AddCache(keyName, daClass);
return daClass;
}
return (T)cacheItem.Value;
}
Can someone explain why it fails?
I think it is a bad idea to use using on something you cache.
The idea behind using is that it disposes of whatever unmanaged resources and handles an object holds when the block ends. You should not use an object after it has been disposed. The problem here is that it is not your intention to tear the object down and get rid of it - that's exactly why you saved it in a cache!
Also, a DataReader is somewhat of a cursor-like object. It will not like being reused, especially from more than one thread.
Disposing the object will most likely break your software and give unexpected and unwanted results. Don't use using in this scenario.
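Here is a self-contained sketch of what goes wrong (the Expensive class and the cache key are just for illustration): the second caller gets back an instance that the first caller's using block has already disposed.

using System;
using System.Runtime.Caching;

class Expensive : IDisposable
{
    public bool IsDisposed { get; private set; }
    public void Dispose() { IsDisposed = true; }
}

class Program
{
    static Expensive GetFromCache()
    {
        var cache = MemoryCache.Default;
        var item = (Expensive)cache.Get("expensive");
        if (item == null)
        {
            item = new Expensive();
            cache.Add("expensive", item, new CacheItemPolicy());
        }
        return item;
    }

    static void Main()
    {
        using (var first = GetFromCache())
        {
            // first use works; Dispose() runs when this block exits
        }

        var second = GetFromCache();          // the SAME instance comes back from the cache
        Console.WriteLine(second.IsDisposed); // True - the "cached" object is already disposed
    }
}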
Reusing a shared object is sometimes good practice, but you need to make sure it can actually be reused. In your program you are storing a data adapter in the cache and trying to reuse it between different threads, and that sometimes causes strange results, because the data adapter can't be shared. Imagine two threads getting the same instance of your adapter and modifying it at the same time! IMO the data adapter is quite lightweight and you can create a new instance for each database read; it's unnecessary to cache and reuse it, and doing so makes things complex.
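A sketch of that create-per-read approach (the connection string and query are placeholders); ADO.NET connection pooling keeps this cheap:

using System.Data;
using System.Data.SqlClient;

public static class CustomerReader
{
    public static DataSet GetCustomers(string connectionString)
    {
        var ds = new DataSet();
        using (var connection = new SqlConnection(connectionString))
        using (var adapter = new SqlDataAdapter("SELECT Id, Name FROM Customers", connection))
        {
            // Fill opens and closes the connection itself; the pooled physical connection is reused.
            adapter.Fill(ds);
        }
        return ds;
    }
}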
So I've written a couple of wrapper methods around the System.Runtime.Caching MemoryCache, to get a general/user-bound cache context per viewmodel in my ASP.NET MVC application.
At some point I noticed that my delegate just keeps getting called every time, rather than the stored object being retrieved, for no apparent reason.
Oddly enough none of my unit tests (which use simple data to check it) failed or showed a pattern explaining that.
Here's one of the wrapper methods:
public T GetCustom<T>(CacheItemPolicy cacheSettings, Func<T> createCallback, params object[] parameters)
{
if (parameters.Length == 0)
throw new ArgumentException("GetCustom can't be called without any parameters.");
lock (_threadLock)
{
var mergedToken = GetCacheSignature(parameters);
var cache = GetMemoryCache();
if (cache.Contains(mergedToken))
{
var cacheResult = cache.Get(mergedToken);
if (cacheResult is T)
return (T)cacheResult;
throw new ArgumentException(string.Format("A caching signature was passed, which duplicates another signature of different return type. ({0})", mergedToken));
}
var result = createCallback(); // <-- keeps landing here
if (!EqualityComparer<T>.Default.Equals(result, default(T)))
{
cache.Add(mergedToken, result, cacheSettings);
}
return result;
}
}
I was wondering if anyone here knows about conditions which render an object invalid for storage within the MemoryCache.
Until then I'll just strip my complex classes' properties until storage works.
Experiences would be interesting nevertheless.
There are a couple of frequent reasons why this may be happening (assuming the logic to actually add objects to the cache and find the correct cache instance is right):
An x86 (32-bit) process has a "very small" amount of memory to deal with - it is relatively easy to consume too much memory outside the cache (or outside a particular instance of the cache), and as a result items will be immediately evicted from the cache.
ASP.NET app domain recycles, which happen for a variety of reasons, will clear out the cache too.
Notes
Generally you'd store "per user cached information" in session state so it is managed appropriately and can be persisted via SQL Server or other out-of-process state options.
Relying on caching per-user objects may not improve performance if you need to support a larger number of users. You need to carefully measure the impact at the load level you expect to have.
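One way to confirm the eviction theory: MemoryCache will report why an entry left the cache if you attach a removed callback to the policy. A sketch, reusing cache, mergedToken and result from the wrapper method in the question (the logging call is just a placeholder):

var policy = new CacheItemPolicy
{
    // Fires whenever the entry leaves the cache and reports why:
    // Evicted (memory pressure), Expired, Removed, CacheSpecificEviction, ...
    RemovedCallback = args => System.Diagnostics.Debug.WriteLine(
        string.Format("Cache entry '{0}' removed: {1}", args.CacheItem.Key, args.RemovedReason))
};
cache.Add(mergedToken, result, policy);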
Yet another How-to-free-memory question:
I'm copying data between two databases which are currently identical but will soon be getting out of sync. I have put together a skeleton app in C# using Reflection and ADO.Net Entities that does this:
For each table in the source database:
Clear the corresponding table in the destination database
For each object in the source table
For each property in the source object
If an identically-named property exists in the destination object, use Reflection to copy the source property to the destination property
This works great until I get to the big 900MB table that has user-uploaded files in it.
The process of copying the blobs (up to 7 MB each) to my machine and back to the destination database uses up local memory. However, that memory isn't getting freed, and the process dies once it's copied about 750 MB worth of data - with my program having 1500 MB of allocated space when the OutOfMemoryException is thrown, presumably two copies of everything that it's copied so far.
I tried a naive approach first, doing a simple copy. It worked on every table until I got to the big one. I have tried forcing a GC.Collect() with no obvious change to the results. I've also tried putting the actual copy into a separate function in hopes that the reference going out of scope would help it get GCed. I even put a Thread.Sleep in to try to give background processes more time to run. All of these have had no effect.
Here's the relevant code as it exists right now:
public static void CopyFrom<TSource, TDest>(this ObjectSet<TDest> Dest, ObjectSet<TSource> Source, bool SaveChanges, ObjectContext context)
where TSource : class
where TDest : class {
int total = Source.Count();
int count = 0;
foreach (var src in Source) {
count++;
CopyObject(src, Dest);
if (SaveChanges && context != null) {
context.SaveChanges();
GC.Collect();
if (count % 100 == 0) {
Thread.Sleep(2000);
}
}
}
}
I didn't include the CopyObject() function; it just uses reflection to read the properties of src and put them into identically-named properties of a new object that gets appended to Dest.
SaveChanges is a Boolean parameter indicating that this extra processing should be done; it's only true for the big table, false otherwise.
So, my question: How can I modify this code to not run me out of memory?
The problem is that your database context utilizes a lot of caching internally, and it's holding onto a lot of your information and preventing the garbage collector from freeing it (whether you call Collect or not).
This means that your context is defined at too high of a scope. (It appears, based on your edit, that you're using it across tables. That's...not good.) You haven't shown where it is defined, but wherever it is it should probably be at a lower level. Keep in mind that because of connection pooling creating new contexts is not expensive, and based on your use cases you shouldn't need to rely on a lot of the cached info (because you're not touching items more than once) so frequently creating new contexts shouldn't add performance costs, even though it's substantially decreasing your memory footprint.
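A sketch of what that tighter scope could look like - recreating the context every N rows so its change tracker gets thrown away. The batch size is arbitrary, and MyEntities, DestEntities and sourceRows are stand-ins for your actual context type, destination ObjectSet and source enumeration:

const int batchSize = 100;
int copied = 0;

var context = new MyEntities();
try
{
    foreach (var src in sourceRows)
    {
        CopyObject(src, context.DestEntities); // same reflection copy as before
        copied++;

        if (copied % batchSize == 0)
        {
            context.SaveChanges();
            context.Dispose();           // drops everything the context was tracking
            context = new MyEntities();  // cheap, thanks to connection pooling
        }
    }
    context.SaveChanges();               // flush the final partial batch
}
finally
{
    context.Dispose();
}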
Can we avoid casting T to Object when placing it in a Cache?
WeakReference necessitates the use of object. System.Runtime.Caching.MemoryCache is locked to type object.
Custom dictionaries / collections cause issues with the garbage collector, or you have to run a garbage collector of your own (a separate thread)?
Is it possible to have the best of both worlds?
I know I accepted an answer already, but using WeakReference is now possible! Looks like they snuck it into .Net 4.
http://msdn.microsoft.com/en-us/library/gg712911(v=VS.96).aspx
And an old feature request for the same:
http://connect.microsoft.com/VisualStudio/feedback/details/98270/make-a-generic-form-of-weakreference-weakreference-t-where-t-class
There's nothing to stop you writing a generic wrapper around MemoryCache - probably with a constraint to require reference types:
public class Cache<T> where T : class
{
private readonly MemoryCache cache = new MemoryCache(typeof(T).FullName); // MemoryCache requires a non-empty name
public T this[string key]
{
get { return (T) cache[key]; }
set { cache[key] = value; }
}
// etc
}
Obviously it's only worth delegating the parts of MemoryCache you're really interested in.
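Usage would then look something like this (Person is just an example type):

var people = new Cache<Person>();
people["alice"] = new Person { Name = "Alice" };
Person alice = people["alice"]; // no cast at the call site - the wrapper does it once, inside the indexer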
So you basically want to dependency-inject a cache provider that only returns certain types?
Isn't that kind of against everything OOP?
The idea of the "object" type is that anything and everything is an object so by using a cache that caches instances of "objects" of type object you are saying you can cache anything.
By building a cache that only caches objects of some predetermined type you are limiting the functionality of your cache however ...
There is nothing stopping you implementing a custom cache provider that has a generic constraint so it only allows you cache certain object types, and this in theory would save you about 2 "ticks" (not even a millisecond) per retrieval.
The way to look at this is ...
What's more important to me:
Good OOP based on best practice
about 20 milliseconds over the lifetime of my cache provider
The other thing is ... .NET is already geared to optimise boxing and unboxing to the extreme, and at the end of the day, when you "cache" something you are simply putting it somewhere it can be quickly retrieved and storing a reference to its location for that later retrieval.
I've seen solutions that involve streaming 4 GB XML files through a business process using objects that are destroyed and recreated on every call ... the point is that the process flow was what mattered, not so much the initialisation and prep work, if that makes sense.
How important is this casting time loss to you?
I would be interested to know more about the scenario that requires such speed.
As a side note:
Another thing I've noticed about newer technologies like LINQ and Entity Framework is that the result of a query is something that is important to cache when the query takes a long time, but not so much the side effects on the result.
What this means is that (for example):
If I were to cache a basic "default instance" of an object that takes a complex set of entity queries to create, I wouldn't cache the resulting object but the queries.
With Microsoft already doing the groundwork, I'd ask ... what am I caching and why?
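For example, with LINQ to Entities you could cache (pre-compile) the query itself rather than a hydrated "default instance". A sketch, with a made-up ObjectContext and entity:

using System;
using System.Data.Objects;
using System.Linq;

static class CustomerQueries
{
    // Compile the expensive query once and keep the delegate; each call reuses the
    // compiled query but runs against current data.
    public static readonly Func<MyObjectContext, int, IQueryable<Customer>> InCity =
        CompiledQuery.Compile((MyObjectContext ctx, int cityId) =>
            ctx.Customers.Where(c => c.CityId == cityId));
}

// Usage:
// using (var ctx = new MyObjectContext())
// {
//     var customers = CustomerQueries.InCity(ctx, 42).ToList();
// }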