How to clean a dictionary by deleting old elements from it - C#

Given a Dictionary<T, U> in an ASP.NET application. The application runs for a long time, and new elements are continuously added to the dictionary. My goal is to keep as many elements in the dictionary as possible without a "memory overflow" or other side effects. How should I clean the collection to free some memory for new elements by deleting old ones? And how do I check how many elements are old and when they should be deleted?

Have you checked out Application.Cache?
Sounds to me like it's just what you need...
http://msdn.microsoft.com/en-us/library/ms178597(v=vs.100).aspx
Quote:
The advantage of using the application cache is that ASP.NET manages the cache and removes items when they expire or become invalidated, or when memory runs low.
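For example, a minimal sketch of lazily populating the ASP.NET cache with a sliding expiration (the key, the 20-minute window, and ComputeValue are all illustrative):

    using System;
    using System.Web;
    using System.Web.Caching;

    public static class CachedLookup
    {
        // Hypothetical expensive computation standing in for a real data load.
        private static object ComputeValue(string key)
        {
            return "value for " + key;
        }

        public static object Get(string key)
        {
            object value = HttpRuntime.Cache[key];
            if (value == null)
            {
                value = ComputeValue(key);
                HttpRuntime.Cache.Insert(
                    key,
                    value,
                    null,                          // no cache dependency
                    Cache.NoAbsoluteExpiration,
                    TimeSpan.FromMinutes(20));     // sliding expiration: idle items age out
            }
            return value;
        }
    }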

This is not a simple task. It sounds as if you are using the dictionary as a cache, in which case you should look into the standard caching facilities for ASP.NET.
That said, you could choose different strategies. You could set an upper bound on the number of elements you want in the dictionary. This can be done by keeping a linked list of the elements and letting the dictionary store the LinkedListNode<U>. Whenever you retrieve something from the dictionary, you move its LinkedListNode<U> to the front of the linked list; this way you always have the newest elements at the top of the list. When you insert an element, you add it to both the linked list and the dictionary, and test whether the dictionary's size limit has been reached. If it has, you remove elements from the bottom of the linked list and from the dictionary (a sketch follows below).
Remember locking!
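A rough sketch of that structure, assuming each key is only added once (the capacity bound is illustrative, and the locking mentioned above still has to be wrapped around every call):

    using System.Collections.Generic;

    public class LruDictionary<TKey, TValue>
    {
        private readonly int capacity;
        private readonly Dictionary<TKey, LinkedListNode<KeyValuePair<TKey, TValue>>> map =
            new Dictionary<TKey, LinkedListNode<KeyValuePair<TKey, TValue>>>();
        private readonly LinkedList<KeyValuePair<TKey, TValue>> order =
            new LinkedList<KeyValuePair<TKey, TValue>>();

        public LruDictionary(int capacity) { this.capacity = capacity; }

        public bool TryGetValue(TKey key, out TValue value)
        {
            LinkedListNode<KeyValuePair<TKey, TValue>> node;
            if (map.TryGetValue(key, out node))
            {
                order.Remove(node);       // move the entry to the front:
                order.AddFirst(node);     // newest elements stay at the top
                value = node.Value.Value;
                return true;
            }
            value = default(TValue);
            return false;
        }

        public void Add(TKey key, TValue value)   // assumes key is not already present
        {
            map[key] = order.AddFirst(new KeyValuePair<TKey, TValue>(key, value));
            if (map.Count > capacity)             // size limit reached: evict from the bottom
            {
                var oldest = order.Last;
                order.RemoveLast();
                map.Remove(oldest.Value.Key);
            }
        }
    }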

Since you are using a managed language, you are limited in how you can manage your memory, and practices such as freeing memory manually or forcing the garbage collector to run are generally considered suboptimal.
So there is no direct way to say how many elements your generic dictionary is allowed to keep, or how much memory you can still claim. At the same time, deleting some old values from the dictionary will not automatically lead to an immediate increase in available memory.
That said, you can keep monitoring the memory currently allocated to your application using GC.GetTotalMemory (as long as your objects are fully managed, i.e. contain no unmanaged code or data blocks) and compare it to the total memory available to the application.
Or, if you know the size of the objects you put into your Dictionary<,>, you might be interested in the MemoryFailPoint class, which will tell you whether you can create a new object of a given size. That way you won't bump into an OutOfMemoryException and will have time to free some resources.
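For example, a rough sketch of both monitoring techniques (the 16 MB reservation is purely illustrative):

    using System;
    using System.Runtime;

    class MemoryCheck
    {
        static void Main()
        {
            // Approximate size of the managed heap right now.
            Console.WriteLine("Managed heap: {0:N0} bytes", GC.GetTotalMemory(false));

            try
            {
                // Reserve an estimate of the memory the next batch of dictionary
                // entries will need (16 MB here is purely illustrative).
                using (new MemoryFailPoint(16))   // size in megabytes
                {
                    // ... add the new entries to the dictionary here ...
                }
            }
            catch (InsufficientMemoryException)
            {
                // Not enough memory available: delete old entries first, then retry.
            }
        }
    }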
Finally, if you are using the dictionary as a cache, there are built-in cache providers in ASP.NET; the cache will automatically drop objects that are obsolete or old when there is memory pressure.

Related

Patterns to not lock large collections when doing maintenance work (e.g. deleting entries)?

One question I recurrently face is how to efficiently have writers use a collection (mostly adding to it) while at the same time having maintenance logic neither lock the collection nor slow the writers down. Is there any pattern that achieves this efficiently, or is using concurrent collections the only way to go?
An idea would be to use a segmented list: a list backed by two internal segments (List<T>s). To read a value from the collection you have to look at both segments, so there is a read overhead. To delete old entries from the collection you just replace the older segment with a new, empty segment (the old segment would be recycled).
This idea may be viable if you don't read from the collection too often.
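A minimal sketch of that segmented idea (names are illustrative; a simple lock is still used here, but the maintenance step only holds it for an O(1) swap, so writers are barely delayed):

    using System.Collections.Generic;

    public class SegmentedList<T>
    {
        private readonly object swapLock = new object();
        private List<T> current = new List<T>();
        private List<T> older = new List<T>();

        public void Add(T item)
        {
            lock (swapLock) { current.Add(item); }
        }

        public bool Contains(T item)
        {
            lock (swapLock)
            {
                // Read overhead: both segments must be consulted.
                return current.Contains(item) || older.Contains(item);
            }
        }

        // Maintenance: discard everything older than the previous swap in O(1),
        // without touching individual entries.
        public void DropOldSegment()
        {
            lock (swapLock)
            {
                older = current;
                current = new List<T>();
            }
        }
    }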

Application performance degradation due to several large in memory Dictionary objects in C#

I am working on a WinForms application where I have to load data from Web API calls. A few million rows of data will be returned and have to be stored in a Dictionary. The logic goes like this: the user clicks on an item and its data is loaded; if the user clicks on another item, another new dictionary is created. Over time, several such heavyweight Dictionary objects get created, and the user might not use the old ones any more. Is this a case for using WeakReference? Note that recreating any Dictionary object takes 10 to 20 seconds. If I opt to keep all the objects in memory, the application performance slowly degrades after some time.
The answer here is to use a more advanced technique.
Use a memory-mapped file to store the dictionaries on disk; then you don't have to worry about holding them all in memory at once, as they will be swapped in and out by the OS on demand.
You will want to write a Dictionary designed specifically to operate in the memory-mapped file region, and a heap to store the things pointed to by the key-value pairs in the dictionary. Since you aren't deleting anything, this is actually pretty straightforward.
Otherwise you should take Fildor 4's suggestion and Just Use A Database, as it will basically do everything I just mentioned for you and wrap it up in a nice syntax.
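A minimal sketch of the memory-mapped file part (the file name and 1 GB capacity are illustrative; the dictionary layout on top of it is left out):

    using System.IO;
    using System.IO.MemoryMappedFiles;

    class MmfSketch
    {
        static void Main()
        {
            // The OS pages the contents in and out on demand, so the data never
            // has to fit in the managed heap all at once.
            using (var mmf = MemoryMappedFile.CreateFromFile(
                "data.bin", FileMode.OpenOrCreate, "dataMap", 1024L * 1024 * 1024))
            using (var accessor = mmf.CreateViewAccessor())
            {
                // A real dictionary built on top of this would hash the key to an
                // offset; here we just write and read a value at a fixed position.
                long offset = 0;
                accessor.Write(offset, 12345L);
                long value = accessor.ReadInt64(offset);
            }
        }
    }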

What's the difference between LRU caching and memory caching in C#

Please excuse my noob question as I am still a junior coder, but what's the difference between LRU caching using a Dictionary and a linked list, and memory caching (MemoryCache) in C#? How would one implement an LRU policy on, say, MemoryCache?
Thanks in advance.
LRU is an algorithm for expiring a cache and adding new items to it: it expires the least recently used item in your cache when the cache is full.
MemoryCache is a class available in .NET 4 and later that implements caching inside the heap memory. Caching can be categorised in different ways. Based on the medium, you can cache on the hard drive or in memory; based on the location of that memory, you can categorise it as in-memory (inside the heap) or out-of-memory (memory outside the heap, for example on another server). MemoryCache in C# is in-memory caching, and you have to be careful because it can consume all the memory of your application, so it's better not to use it if you have more than one node.
Another thing to take into consideration is that when you cache an object in an out-of-memory cache, the object has to be serializable, while in-memory caching can cache any object without serialization.
Least-recently-used (LRU) evicts the key-value pair that was used least recently when the cache is full and needs to add a value, whereas a MemoryCache evicts the oldest key-value pairs, or those past their 'use-by date' if they happen to have one.
Say the first key-value pair you added is vital and you happen to read it all the time: in an LRU cache it would be kept, but in a MemoryCache it would eventually disappear and need to be replaced. Sometimes, though, having older key-value pairs disappear is exactly what you're after, so that up-to-date values get pulled through from your backend (e.g. a database).
Consider also whether adding an existing key-value pair should count as a 'use' (so recently updated items tend to stay around), or whether a 'use' only happens when you read a key-value pair, so you simply favour the things your readers like. As always, I would consider concurrency if you have more than one task or thread using the cache.
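To make the contrast concrete, here is a minimal MemoryCache sketch; a sliding expiration is the closest built-in knob to LRU-style aging (the key and the five-minute window are illustrative):

    using System;
    using System.Runtime.Caching;

    class CacheDemo
    {
        static void Main()
        {
            // MemoryCache from System.Runtime.Caching (.NET 4+). A sliding
            // expiration renews an entry's lifetime on every read, which gives a
            // rough per-item approximation of LRU aging.
            ObjectCache cache = MemoryCache.Default;
            var policy = new CacheItemPolicy
            {
                SlidingExpiration = TimeSpan.FromMinutes(5)   // illustrative window
            };
            cache.Set("user:42", "some value", policy);

            // Reading resets the sliding window, so frequently read items survive.
            var value = cache.Get("user:42") as string;
            Console.WriteLine(value);
        }
    }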

C# .NET Memory Management with Data Structures (Dictionary, List, etc.)

I am hoping that someone can shed some light on how .NET handles garbage collection in the following case.
I have a program where I need to do a very specific kind of "Find in Files" functionality like you would see in Visual Studio. I have to search potentially thousands of files, and I collect the results in a List<Pair> object, where Pair is a simple class I created for storing a pair of items (obviously).
When I am through using what I need, I call Clear() on the list in order to get rid of the old information. This does not seem to help free memory because I can see on my Task Manager that the memory consumed does not decrease.
For a really large search, I am potentially dealing with 5,000,000 lines of information (approx. 500 MB of memory usage on my machine) that need to be handled. When my search is through, the memory consumption stays at the same level. I made my Pair class implement IDisposable, and that didn't help.
Any idea what I might be missing? Thanks!
The garbage collector will reclaim memory when needed; that is, not when you "clear" the list, but when it finds that none of the items that were referenced in it are referenced any more and the process/computer is running low on memory.
There is no need to micromanage memory in C#.
The .NET garbage collector is surprisingly good. In general you shouldn't worry about the memory consumption you see in Task Manager, because, as you are observing, the garbage collector doesn't reclaim memory as soon as you might think. The reason is that reclaiming memory is an expensive operation: if the memory isn't needed at that moment, why go messing around in there? The inner workings of when it does reclaim space are pretty involved. The GC goes through different levels of collection (called generations) to reclaim memory, optimized for speed.
There are lots of articles which can explain this in more detail better than I can. Here is a starting point.
http://msdn.microsoft.com/en-us/library/ms973837.aspx
For now you should see at what point you end up getting out of memory exceptions, if at all, and go from there.
When you call Clear(), all references to the Pair objects are removed; this will cause those objects to be GC'ed eventually, unless another object holds references to them, but you cannot count on when that will happen - it also depends on memory pressure.
As a side note you can use Tuple in C# 4 instead of Pair.
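A quick diagnostic sketch of this, sized like the question (Tuple stands in for the Pair class; forcing a collection is for demonstration only, not production code):

    using System;
    using System.Collections.Generic;

    class ClearDemo
    {
        static void Main()
        {
            // Task Manager won't show the drop immediately, but the managed heap
            // does shrink once the cleared entries are actually collected.
            var results = new List<Tuple<string, string>>();
            for (int i = 0; i < 5000000; i++)
                results.Add(Tuple.Create("file" + i, "match " + i));

            Console.WriteLine("Before Clear: {0:N0}", GC.GetTotalMemory(false));

            results.Clear();
            results.TrimExcess();   // also release the list's internal array

            // Forcing a collection purely to demonstrate the point.
            GC.Collect();
            GC.WaitForPendingFinalizers();
            Console.WriteLine("After collect: {0:N0}", GC.GetTotalMemory(true));
        }
    }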

Efficiency of persistence methods for large asp.net cache store

Curious if anyone has opinions on which method would be better suited for ASP.NET caching: option one, have fewer items in the cache which are more complex, or option two, many items which are less complex.
For the sake of discussion, let's imagine my site has SalesPerson and Customer objects. These are pretty simple classes, but I don't want to be chatty with the database, so I want to lazy-load them into the cache and invalidate them out of the cache when I make a change - simple enough.
Option 1
Create a Dictionary keyed by each object's ID and cache the entire dictionary. When I need to load an instance of a SalesPerson from the cache, I get the Dictionary out and perform a normal key lookup against it.
Option 2
Prefix the key of each item and store it directly in the ASP.NET cache. For example, every SalesPerson instance in the cache would use a composite of the prefix plus the key for that object, so it may look like sp_[guid]; alongside them in the cache are the Customer objects, with keys like cust_[guid].
One of my fears with option two is that the number of entries will grow very large: between SalesPerson, Customer and a dozen or so other categories I might have 25K items in the cache, and highly repetitive lookups for something like a string resource that I use in several places might pay a penalty while the code looks through the cache's key collection to find it amongst the other 25K.
I am sure at some point there is a diminishing return on storing too many items in the cache, but I am curious about opinions on these matters.
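For reference, a sketch of what Option 2 looks like in code (GetSalesPersonFromDb is a hypothetical lazy load; SalesPerson is the question's own type):

    using System;
    using System.Web;

    public static class SalesPersonCache
    {
        public static SalesPerson Get(Guid id)
        {
            string key = "sp_" + id;                       // composite prefix + key
            var person = HttpRuntime.Cache[key] as SalesPerson;
            if (person == null)
            {
                person = GetSalesPersonFromDb(id);         // hypothetical lazy load
                HttpRuntime.Cache.Insert(key, person);     // expire this key on change
            }
            return person;
        }

        // Hypothetical database fetch.
        private static SalesPerson GetSalesPersonFromDb(Guid id)
        {
            return new SalesPerson();
        }
    }

    public class SalesPerson { /* simple entity from the question */ }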
You are better off creating many smaller items in the cache than fewer, larger items. Here is the reasoning:
1) If your data is small, then the number of items in the cache will be relatively small and it won't make any difference. Fetching single entities from the cache is easier than fetching a dictionary and then fetching an item from that dictionary, too.
2) Once your data grows large, the cache may be used to manage the data in an intelligent fashion. The HttpRuntime.Cache object makes use of a Least Recently Used (LRU) algorithm to determine which items in the cache to expire. If you have only a small number of highly used items in the cache, this algorithm will be useless. However, if you have many smaller items in the cache, but 90% of them are not in use at any given moment (very common usage heuristic), then the LRU algorithm can ensure that those items that are seeing active use remain in the cache while evicting less-used items to ensure sufficient room remains for the used ones.
As your application grows, the importance of being able to manage what is in the cache will be most important. Also, I've yet to see any performance degradation from having millions of keys in the cache -- hashtables are extremely fast and if you find issues there it's likely easily solved by altering your naming conventions for your cache keys to optimize them for use as hashtable keys.
The ASP.NET Cache uses its own dictionary internally, so using that dictionary to locate your dictionary, and then doing a lookup in yours to retrieve your objects, seems less than optimal. Dictionaries use hash tables, which give about the most efficient lookup you can get; adding your own dictionaries on top would just add more overhead, I think. I don't know about diminishing returns for hash tables, but I would expect them to show up in storage size, not lookup time.
I would concern myself with whatever makes the job easier. If having the cache more organized makes your app easier to understand, debug, extend and maintain, then do it. If it makes those things more complex, then don't.
And as nullvoid mentioned, this is all assuming you've already explored the larger implications of caching, which involve gauging the performance gains vs. the performance hit. You're talking about storing lots and lots of objects, and this implies lots of cache traffic. I would only store something in the cache that you can measure a performance gain from doing so.
We have built an application that uses caching to store all resources. The application is multi-language, so for each label in the application we have at least three translations. We load a (Label, Culture) combination when first needed and then expire it from the cache only if it was changed by an admin in the database. This scenario worked perfectly well even when the cache contained 100,000 items. We only took care to configure the cache and the expiry policies so that we really benefit from the cache. We use no expiration, so items are cached until the worker process is recycled or until an item is intentionally expired. We also took care to define the domain of key values in such a way that a label in a specific culture is uniquely identified with the least number of characters.
I'm going to assume that you've considered all the implications of data being changed by multiple users and how that will affect the cached data in terms of handling conflicts. Caching is really only meant for relatively static data.
From an efficiency perspective, I would assume that if you're using .NET serialization properly, you're going to benefit from storing the data in the cache as larger, typed, serialized collections rather than as individual base types.
From a maintenance perspective this would also be a better approach, as you can create a strongly typed object to represent the data and use serialization to move it between the cache and your SalesPerson/Customer objects.
