I'm currently dealing with a queue that has a couple thousand entries in it. To save on RAM usage I'm currently using the TrimExcess() method built into the queue datatype by Microsoft. As mentioned in the documentation, this method is inefficient when it comes to large lists and results in a significant time loss whenever it is called.
Is there a more efficient way to remove items from a queue that actually deletes them from RAM as well?
Edit: to clarify, there are still elements in the queue; I just want to remove the elements that have already been dequeued from RAM.
The answer to your question is "Don't worry about that, and it's very likely that you should not do that." This answer is an elaboration of the comments from @madreflection and myself.
The Queue<T> class, like other collection classes, uses an array to hold the items it collects. Like other collection classes, if the array runs out of slots, the Queue<T> class creates a new, larger array and copies the data from the old array.
I haven't looked at the Queue<T> source, but using the debugger, I can see that this array is the _array member of the class. It starts as a zero-length array. When you enqueue the first item, it gets replaced by an array of length 4. After that, the array doubles in size whenever it needs more space.
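If you want to watch that growth yourself, here is a rough sketch that peeks at the private field with reflection. The field name _array is an implementation detail (the one I saw in the debugger) and may differ or be absent on other runtimes:

using System;
using System.Collections.Generic;
using System.Reflection;

class CapacityWatcher
{
    static void Main()
    {
        var queue = new Queue<int>();

        // "_array" is the private backing field seen in the debugger; the name
        // is an implementation detail and may differ on other runtimes.
        FieldInfo backing = typeof(Queue<int>).GetField(
            "_array", BindingFlags.NonPublic | BindingFlags.Instance);
        if (backing == null)
        {
            Console.WriteLine("Backing field not found on this runtime.");
            return;
        }

        int lastLength = -1;
        for (int i = 0; i < 2000; i++)
        {
            queue.Enqueue(i);
            int length = ((int[])backing.GetValue(queue)).Length;
            if (length != lastLength)
            {
                Console.WriteLine($"Count {queue.Count,4}: backing array length {length}");
                lastLength = length;
            }
        }
    }
}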
You say your queue "has a couple thousand entries in it". I'm going to use 2000 in this analysis as a rough guess.
As you enqueue more and more new entries into the queue, that array doubling will happen several times:
At First    4 Entries
After 5     Double to 8
After 9     Double to 16
After 17    Double to 32
After 33    Double to 64
It will keep doing this until it's doubled (and copied the array contents) nine times, bringing the capacity to 2048. At that point, you will have allocated 10 arrays, nine of which are garbage, and done roughly 2000 queued element copies.
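A quick back-of-the-envelope sketch of that cost, assuming the start-at-4-then-double behaviour described above:

using System;

class GrowthSketch
{
    static void Main()
    {
        int capacity = 4;     // first non-empty array
        int allocations = 1;
        long copies = 0;

        while (capacity < 2000)
        {
            copies += capacity;   // every grow copies the old array's contents
            capacity *= 2;
            allocations++;
        }

        Console.WriteLine($"Final capacity {capacity}, arrays allocated {allocations}, elements copied {copies}");
        // prints: Final capacity 2048, arrays allocated 10, elements copied 2044
    }
}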
Now think about it. I'm guessing you are enqueuing reference type objects. A reference type object is represented by an object reference (in effect, a pointer). If you have 2000 instances in a queue, those references will take up 8 KB on a 32-bit machine (plus some one-time overhead for the members of the queue class). On a 64-bit machine, it's 16 KB. That's nothing for a modern computer. The .NET garbage collector has two strategies for managing memory, a normal one and one for large objects. The boundary is 85 KB; your queue will never be a large object.
If you are enqueuing large value types, then more memory is needed (since the value type objects will be copied into the array elements that make up the queue entries). You'd need to be using very large value type objects before your queue becomes a large object.
The other thing that will happen is that as your queue grows in size, it will settle into the Garbage Collector's Gen2 memory area. Gen2 collections are expensive, but once an object becomes stable in Gen2, it doesn't bother the garbage collector at all.
But think about what happens if you reduce your queue size way down to, say, 100 entries and call TrimExcess. At that point, yet another new array will be created (this time, much smaller) and the entries in your queue will be copied to that new array (that's what the Remarks section of the TrimExcess documentation is referring to when it talks about the cost of reallocating and copying a large Queue<T>). If your queue starts growing again, you will start doubling/copying that array over and over again, spinning off more garbage and spinning your wheels doing the copying.
A better strategy is to look at your estimated queue size, inflate it a bit, and pre-allocate the space for all of those entries at construction time. If you expect to have 2000 entries, allocate space for 2500 in the constructor:
var myQueue = new Queue<SomeType>(2500);
Now you do one allocation, there should be no reallocation or array copying, and your memory will quickly migrate to Gen2, where the GC will rarely need to touch it.
When the numbers are small, it's quick to grow an array list from 2 to 4 memory addresses, but as it gets closer to the maximum amount of space allowed in an array list (close to the 2 MB limit), would it be more efficient to grow the array by only a fraction of the size it needs at that point? Obviously growing from 1 MB to 2 MB isn't really a big deal nowadays. HOWEVER, if you had 50,000 people per hour running something that doubles the size of an array like this, I'm curious whether that would be a good enough reason to alter how it works, not to mention cutting down on unneeded memory space (in theory).
A small graphical representation of what I mean:
ArrayList a has 4 elements in it, and that is its current max size at the moment:
||||
Now let's add another item to the arraylist; the internal code will double the size of the array even though we're only adding one thing to it.
The arraylist now becomes 8 elements large:
||||||||
At these size levels, I doubt it makes any difference, but when you're allocating 1 MB up to 2 MB every time someone does something like adding a file of around 1.25 MB into an arraylist, there's 0.75 MB of unneeded space allocated.
To give you more of an idea of the code that currently runs in C# in the System.Collections.Generic class: the way it works now is that it doubles the size of an array list (read: array) every time a user tries to add something to an array that is too small. Doubling the size is a good solution and makes sense, until you're essentially growing it far bigger than you technically need it to be.
Here's the source for this particular part of the class:
private void EnsureCapacity(int min)
{
    // Already big enough; nothing to do.
    if (this._items.Length >= min)
        return;
    // This is what I'm referring to: start at 4, then double.
    int num = this._items.Length == 0 ? 4 : this._items.Length * 2;
    // Clamp to the runtime's maximum array length (0x7FEFFFFF).
    if ((uint) num > 2146435071U)
        num = 2146435071;
    // But never allocate less than what the caller asked for.
    if (num < min)
        num = min;
    this.Capacity = num;
}
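As an aside, when you do know roughly how big the data will be, the existing API already lets you sidestep EnsureCapacity's doubling entirely by pre-sizing the list. A minimal sketch, using a made-up payload of about 1.25 million bytes:

using System;
using System.Collections.Generic;

class PreSizedList
{
    static void Main()
    {
        // Hypothetical: we expect roughly 1.25 million bytes of payload.
        const int expectedSize = 1_250_000;

        // Pre-sizing means EnsureCapacity never runs its doubling logic,
        // so no 2 MB buffer is allocated for ~1.25 MB of data.
        var payload = new List<byte>(expectedSize);
        for (int i = 0; i < expectedSize; i++)
            payload.Add((byte)(i % 256));

        Console.WriteLine($"Count: {payload.Count:N0}, Capacity: {payload.Capacity:N0}");
        // Without the pre-size, Capacity would have grown to 2,097,152.
    }
}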
I'm going to guess that this is how memory management is handled in many programming languages, so this has probably been considered many times before; I'm just wondering if this is the kind of efficiency saver that could save system resources by a large amount on a massive scale.
As the size of the collection gets larger, so does the cost of creating a new buffer, because you need to copy over all of the existing elements. The fact that the number of these copies is inversely proportional to the expense of each copy is exactly why the amortized cost of adding items to a List is O(1). If the size of the buffer increases linearly, then the amortized cost of adding an item to a List actually becomes O(n).
You save on memory, allowing the "wasted" memory to go from being O(n) to being O(1). As with virtually all performance/algorithm decisions, we're once again faced with the quintessential trade-off of exchanging memory for speed. We can save on memory and have slower adds (because of more copying), or we can use more memory to get faster additions. Of course there is no one universally right answer. Some people really would prefer a slower addition speed in exchange for less wasted memory. The particular resource that is going to run out first will vary based on the program, the system it's running on, and so forth. People in the situation where memory is the scarcer resource may not be able to use List, which is designed to be as widely applicable as possible even though it can't be universally the best option.
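To put rough numbers on that trade-off, here is a small sketch (my own illustration; the fixed increment of 4 is an arbitrary choice) comparing the total element copies under doubling growth versus fixed-increment growth:

using System;

class GrowthCost
{
    static void Main()
    {
        const int n = 100_000;

        // Doubling growth: total copies stay proportional to n.
        long doublingCopies = 0;
        for (long cap = 4; cap < n; cap *= 2)
            doublingCopies += cap;

        // Fixed-increment growth: total copies grow roughly as n squared.
        const int step = 4;
        long linearCopies = 0;
        for (long cap = step; cap < n; cap += step)
            linearCopies += cap;

        Console.WriteLine($"Doubling growth:  {doublingCopies:N0} element copies");
        Console.WriteLine($"Grow-by-4 growth: {linearCopies:N0} element copies");
    }
}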
The idea behind the exponential growth factor for dynamic arrays such as List<T> is that:
The amount of wasted space is always merely proportional to the amount of data in the array, so you are never wasting resources on a larger scale than you are properly using.
Even with many, many reallocations, the total time spent copying while building up an array of size N is O(N), which amortizes to O(1) per element.
Access time is extremely fast at O(1) with a small coefficient.
This makes List<T> very appropriate for arrays of, say, in-memory tables of references to database objects, for which near-instant access is required but the array elements themselves are small.
Conversely, linear growth of dynamic arrays can result in n-squared memory wastage. This happens in the following situation:
You add something to the array, expanding it to size N for large N, freeing the previous memory block (possibly quite large) of size N-K for small K.
You allocate a few objects. The memory manager puts some in the large memory block just vacated, because why not?
You add something else to the array, expanding it to size N+K for some small K. Because the previously freed memory block now is sparsely occupied, the memory manager does not have a large enough contiguous free memory block and must request more virtual memory from the OS.
Thus virtual memory committed grows quadratically despite the measured size of objects created growing linearly.
This isn't a theoretical possibility. I actually had to fix an n-squared memory leak that arose because somebody had manually coded a linearly-growing dynamic array of integers. The fix was to throw away the manual code and use the library of geometrically-growing arrays that had been created for that purpose.
That being said, I also have seen problems with the exponential reallocation of List<T> (as well as the similarly-growing memory buffer in Dictionary<TKey,TValue>) in 32-bit processes when the total memory required needs to grow past 128 MB. In this case the List or Dictionary will frequently be unable to allocate a 256 MB contiguous range of memory even if there is more than sufficient virtual address space left. The application will then report an out-of-memory error to the user. In my case, customers complained about this since Task Manager was reporting that VM use never went over, say, 1.5GB. If I were Microsoft I would damp the growth of 'List' (and the similar memory buffer in Dictionary) to 1% of total virtual address space.
First of all, I've got this huge XML file that represents data collected by a piece of equipment. I convert this into an object. In fact, this object holds a list of objects, and each of those objects has three strings in it. The strings look like this:
0,12987;0,45678;...
It is some sort of list of doubles arranged this way for performance reasons. There are 1k doubles in each string, so 3k per object, and there are something like 3k objects, just to give you an idea of a typical case.
When I read the data, I must get all the doubles from the objects and add them to the same list. I made an "object that contains three doubles" (one for each string). In a foreach, I take every object and split its strings into arrays. After that, I loop to turn those arrays into a list of "objects that contain three doubles" and add it all to one list so I can use it for further operations.
It causes an out-of-memory exception before the end. Any ideas? Something with LINQ would be best.
Let's do some math. 1000 values per string * 8 characters per value (6 digits plus a comma and a semicolon) * 2 bytes per character * 3 strings per object = 48,000 bytes per object. That by itself isn't a lot, and even with 3000 objects we're still only talking about around 150 MB of RAM. That's still nothing to a modern system. Converting to double arrays takes even less, as there are only 8 bytes per value rather than 16. Strings are also reference types, so there would have been overhead for that in the string version as well. The important thing is that no matter how you dice it, you're still well short of the 85,000-byte threshold for these to get stuck on the Large Object Heap, the usual source of an OutOfMemoryException.
Without code it's hard to follow exactly what you're doing, but I have a couple different guesses:
Many of the values in your string are more than 5 digits, such that you cross the magic 85,000 byte threshold after all, and your objects end up on the Large Object Heap in the garbage collector. Thus, they are not collected and as you continue processing objects you soon run yourself out of Address Space (not real RAM).
You're extracting your doubles in such a way that you're rebuilding the string over and over. This creates a lot of memory pressure on the garbage collector, as it re-creates strings over and over and over.
If this is a long running program, where the size and number of items in each string can vary significantly, it could be that over time you're running into a few large lists of large values that will push your object just over the 85,000 byte mark.
Either way, what you want to do here is stop thinking in terms of lists and start thinking in terms of sequences. Rather than List<string> or List<double>, try for an IEnumerable<string> and IEnumerable<double>. Write a parser for your string that uses the yield keyword to create an iterator block that will extract doubles from your string one at a time, by iterating over the characters, without ever changing the string. This will perform much better, and will likely fix your memory issue as well.
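As a rough illustration of that iterator-block idea, here is a sketch of such a parser. The class and method names are mine, and I'm assuming the comma in your sample is the decimal separator:

using System;
using System.Collections.Generic;
using System.Globalization;

static class SampleParser
{
    // Lazily yields doubles from a semicolon-separated string such as
    // "0,12987;0,45678;..." without ever rebuilding the big source string.
    public static IEnumerable<double> ParseDoubles(string raw)
    {
        var culture = CultureInfo.GetCultureInfo("fr-FR"); // ',' as decimal separator
        int start = 0;
        for (int i = 0; i <= raw.Length; i++)
        {
            if (i == raw.Length || raw[i] == ';')
            {
                if (i > start)
                    yield return double.Parse(raw.Substring(start, i - start), culture);
                start = i + 1;
            }
        }
    }
}

You could then chain these sequences with LINQ (for example, SelectMany over the three strings of each object) instead of accumulating everything into one giant list up front.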
The capacity of a Queue is the number of elements the Queue can hold. As elements are added to a Queue, the capacity is automatically increased as required through reallocation. The capacity can be decreased by calling TrimToSize.
This is written in the MSDN Queue documentation.
Now the question is: if we add around 20 thousand items to a queue and then dequeue them one by one until the queue is empty, and we don't call the TrimToSize function, the queue's capacity will remain at 20 thousand. But the data is removed by the garbage collector, so technically there is no memory leak, and if we check the count or serialize the queue, it looks like an empty queue. So why should we call the TrimToSize function?
You are confusing the GC of the objects in the queue with the "slots" of memory for the queue itself.
The queue will have allocated space to store all 20K references... those slots will just be empty, and therefore not pointing to objects that are taking up yet more memory. But those "slots" will still be there, waiting to have references assigned to them.
Assume the queue stores items in an internal array; when the capacity is increased, a new array is allocated and the items are moved from the old, smaller array to the new one.
Assuming the initial capacity is 16, an array of length 16 is allocated in memory. Now suppose the array has grown to 20000, probably due to a spike in the algorithm, and once all jobs are processed the queue contains only 1 item. At this point you are still using an array of length 20000, so your queue is occupying far more memory than it needs.
Queues are mostly used for long-running, task-management kinds of algorithms in which memory usage is very dynamic. Reducing the capacity helps: if you have many instances and each one outgrows its initial size, you will have most of that memory sitting unused.
Looking at this scenario, I would prefer to use linked lists.
Think in terms of the two sets of objects:
  queue          other things
+------+
| slot | -> item
| slot | -> item
| slot | -> item
:      :
| slot | -> item
+------+
While the items themselves may be garbage-collected when no longer used, that doesn't affect the single object, the queue, which is still in use.
However, it may have been expanded to a gazillion slots at some point when your load was high and it will keep that size until told otherwise.
By calling TrimToSize on the queue, you reduce the number of slots in use, potentially releasing memory back to the free pool for other purposes.
The queue can get quite large even without adding a lot of elements, since you can configure its growth factor (the value that its capacity is multiplied by when you add to a full queue).
It's all just good memory management, often used for queues where you know they won't increase in size again.
A classic example of this is reading in configuration items from a file. Once you've read them in, it's unlikely they'll increase in size again (until you re-read the file, which would usually be infrequent).
If your queue is likely to change size frequently, up and down all over the place, you may be better off not using TrimToSize.
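To make the effect concrete, here is a rough sketch using the non-generic Queue; the exact numbers will vary by runtime and machine:

using System;
using System.Collections;

class TrimToSizeDemo
{
    static void Main()
    {
        var queue = new Queue();
        for (int i = 0; i < 20_000; i++)
            queue.Enqueue(new object());

        while (queue.Count > 0)
            queue.Dequeue();

        // The 20,000 items are now collectable, but the queue still holds
        // its oversized internal slot array.
        Console.WriteLine($"Before trim: ~{GC.GetTotalMemory(true):N0} bytes in use");

        queue.TrimToSize();
        Console.WriteLine($"After trim:  ~{GC.GetTotalMemory(true):N0} bytes in use");
    }
}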
The capacity is not increased one by one. If I remember correctly, the capacity is doubled each time the size reaches the threshold. TrimToSize sets the capacity to exactly the current size.
Usually you won't need to call that method, but there can be situations where you want to, for example before marshalling or serializing.
What Andrew says is also very true.
Queue uses an object[] to hold its elements, so even if you dequeue all the elements you will still have an array of length 20000 in memory.
I know that it takes 4 bytes to store a uint in memory, but how much space in memory does it take to store List<uint> for, say, x number of uints?
How does this compare to the space required by uint[]?
There is no per-item overhead in a List<T> because it uses a T[] to store its data. However, a List<T> containing N items may have up to 2N elements in its T[]. Also, the List data structure itself probably has 32 or more bytes of overhead.
You probably won't notice much of a difference between T[] and List<T>, but you can use
System.GC.GetTotalMemory(true);
before and after an object allocation to obtain an approximate memory usage.
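For example, a rough measurement sketch along those lines (the numbers are approximate and will differ between runtimes):

using System;
using System.Collections.Generic;

class MemoryProbe
{
    static void Main()
    {
        const int n = 1_000_000;

        long before = GC.GetTotalMemory(true);

        var array = new uint[n];
        long afterArray = GC.GetTotalMemory(true);

        var list = new List<uint>();
        for (int i = 0; i < n; i++)
            list.Add((uint)i);
        long afterList = GC.GetTotalMemory(true);

        Console.WriteLine($"uint[{n}]  : ~{afterArray - before:N0} bytes");
        Console.WriteLine($"List<uint> : ~{afterList - afterArray:N0} bytes");

        // Keep both alive so the later measurements aren't skewed by collection.
        GC.KeepAlive(array);
        GC.KeepAlive(list);
    }
}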
List<> uses an array internally, so a List<uint> should take O(4 bytes * n) space, just like a uint[]. There may be some additional constant overhead compared to an array, but you should normally not care about this.
Depending on the specific implementation (this may be different when using Mono as a runtime instead of the MS .NET runtime), the internal array will be bigger than the number of actual items in the list. E.g., a list of 5 elements might have an internal array that can store 10, while a list of 10000 elements might have an internal array of size 11000. So you can't generally say that the internal array will always be twice as big, or 5% bigger, than the number of list elements; it may also depend on the size.
Edit: I've just seen that Hans Passant has described the growing behaviour of List<T> here.
So, if you have a collection of items that you want to append to, and you can't know the size of this collection at the time the list is created, use a List<T>. It is specifically designed for this case. It provides fast O(1) random access to the elements, and has very little memory overhead (a single internal array). It is, on the other hand, very slow at removing or inserting in the middle of the list. If you need those operations often, use a LinkedList<T>, which then has more memory overhead (per item!), however. If you know the size of your collection from the beginning, and you know it won't change (or will change very few times), use arrays.
"Here is the implementation of the dictionary without any compaction support."
This quote is taken from here: http://blogs.msdn.com/jaredpar/archive/2009/03/03/building-a-weakreference-hashtable.aspx
I know jaredpar is a member on here and posts in the C# section. What exactly is "dictionary compaction support"? I am assuming it is some way to optimise the dictionary or make it smaller? But how (if that's what it is)?
Thanks
For that particular post I was referring to shrinking the dictionary in order to be a more appropriate size for the number of non-collected elements.
Under the hood, most hashtables are backed by a large array, where each slot usually points to another structure such as a linked list. The array starts out at an initial size. When the number of elements added to the hashtable exceeds a certain threshold (say, 70% of the number of elements in the array), the hashtable will expand. This usually involves creating a new array at twice the size and re-adding the values into the new array.
One of the problems / features of a weak reference hashtable is that, over time, elements get collected. This can lead to a fair amount of wasted space. Imagine that you added enough elements to go through the array-doubling process several times, and then some of those elements were collected; the remaining elements could now fit into one of the previous, smaller array sizes.
This is not necessarily a bad thing but it is wasted space. Compaction is the process where you essentially shrink the underlying data structure for the hashtable to be a more appropriate size for the data.
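A minimal sketch of what such a compaction step might look like for a hand-rolled table of weak references (my own illustration, not the code from the linked post):

using System;
using System.Collections.Generic;

static class WeakTableUtil
{
    // Rebuilds the table so its backing storage is sized for the entries
    // whose targets are still alive; the old, oversized table becomes garbage.
    public static Dictionary<TKey, WeakReference> Compact<TKey>(
        Dictionary<TKey, WeakReference> table)
    {
        var compacted = new Dictionary<TKey, WeakReference>();
        foreach (var pair in table)
        {
            if (pair.Value.IsAlive)   // skip entries whose targets were collected
                compacted.Add(pair.Key, pair.Value);
        }
        return compacted;
    }
}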