The capacity of a Queue is the number of elements the Queue can hold. As elements are added to a Queue, the capacity is automatically increased as required through reallocation. The capacity can be decreased by calling TrimToSize.
This is from the MSDN documentation for Queue.
Now the question is this: suppose we add around 20 thousand items to a queue, and then that queue is dequeued one by one until it is empty. If we don't call the TrimToSize function, the queue's capacity will remain at 20 thousand, but the dequeued items are reclaimed by the garbage collector, so technically there is no memory leak, and if we check the count or serialize the queue it looks like an empty queue. So why should we call the TrimToSize function?
You are confusing the GC of the objects in the queue with the "slots" of memory for the queue itself.
The queue will have allocated space to store all 20K references. Those slots will just be empty, and therefore not pointing to objects that take up yet more memory. But those "slots" will still be there, waiting to have references assigned to them.
Assume the queue stores its items in an internal array; when the capacity increases, a new array is allocated and the items are moved from the old, smaller array to the new one.
Assuming the initial capacity is 16, an array of length 16 is allocated in memory. Now suppose your array has grown to 20,000, probably due to a spike in your workload, and once all jobs are processed the queue contains only 1 item. At that point you are still using an array of length 20,000, so your queue is occupying far more memory than it needs.
Queues are mostly used in long-running, task-management kinds of algorithms, where memory usage is very dynamic. Reducing the capacity helps here: if you have many instances and each one has outgrown its initial size, most of that memory will sit unused.
For a scenario like this, I would prefer a linked list.
Think in terms of the two sets of objects:
  queue       other things
+------+
| slot | -> item
| slot | -> item
| slot | -> item
:      :
| slot | -> item
+------+
While the items themselves may be garbage-collected when no longer used, that doesn't affect the single object, the queue, which is still in use.
However, it may have been expanded to a gazillion slots at some point when your load was high and it will keep that size until told otherwise.
By calling TrimToSize on the queue, you reduce the number of slots in use, potentially releasing memory back to the free pool for other purposes.
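A minimal sketch of this with the non-generic Queue; the measurement is only illustrative, since the exact number of bytes freed depends on the runtime:

using System;
using System.Collections;

var queue = new Queue();
for (int i = 0; i < 20_000; i++)
    queue.Enqueue(new object());
while (queue.Count > 0)
    queue.Dequeue();                    // the items become eligible for GC

long before = GC.GetTotalMemory(forceFullCollection: true);
queue.TrimToSize();                     // release the oversized slot array
long after = GC.GetTotalMemory(forceFullCollection: true);
Console.WriteLine(before - after);      // roughly the size of ~20,000 empty slots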
The queue can get quite large even without adding a lot of elements, since you can configure its growth factor (the value the capacity is multiplied by when you add to a full queue).
It's all just good memory management, often used for queues where you know they won't increase in size again.
A classic example of this is reading in configuration items from a file. Once you've read them in, it's unlikely they'll increase in size again (until you re-read the file, which would usually be infrequent).
If your queue is likely to change size frequently, up and down all over the place, you may be better off not using TrimToSize.
The capacity is not increased one by one. If I remember correctly, the capacity is doubled each time the size reaches the current capacity. TrimToSize sets the capacity to exactly the current size.
Usually you won't need to call that method, but there can be situations where it helps, for example when you want to marshal or serialize the queue.
It is also very true what Andrew says.
Queue uses an object[] to hold its elements, so even after you dequeue all the elements you will still have an array of length 20,000 in memory.
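To see that backing array directly, you can peek at it with reflection. This is only a sketch: _array is a private implementation detail of the non-generic Queue, and the field name may vary across runtimes.

using System;
using System.Collections;
using System.Reflection;

var queue = new Queue();
for (int i = 0; i < 20_000; i++)
    queue.Enqueue(i);
while (queue.Count > 0)
    queue.Dequeue();

// Not a contract: the private field name can change between runtimes.
FieldInfo slots = typeof(Queue).GetField("_array",
    BindingFlags.NonPublic | BindingFlags.Instance);
Console.WriteLine(((object[])slots.GetValue(queue)).Length); // still >= 20,000
queue.TrimToSize();
Console.WriteLine(((object[])slots.GetValue(queue)).Length); // trimmed to fit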
I'm currently dealing with a queue that has a couple thousand entries in it. To save on RAM I'm currently using the TrimExcess() method built into the queue datatype by Microsoft. As mentioned in the documentation, this method is inefficient for large collections and costs a significant amount of time whenever it is called.
Is there a more efficient way to remove items from a queue that actually deletes them from RAM as well?
Edit: to clarify, there are still elements in the queue; I just want the elements that have already been dequeued to be released from RAM.
The answer to your question is "Don't worry about that, and it's very likely that you should not do that." This answer is an elaboration of the comments from @madreflection and myself.
The Queue<T> class, like other collection classes, uses an array to hold the items it collects. Like other collection classes, if the array runs out of slots, the Queue<T> class creates a new, larger array and copies the data over from the old array.
I haven't looked at the Queue<T> source, but using the debugger, I can see that this array is the _array member of the class. It starts with an array of 0 length. When you enqueue one item, it gets replaced by an array of length 4. After that, the array doubles in size whenever it needs more space.
You say your queue "has a couple thousand entries in it". I'm going to use 2000 in this analysis as a rough guess.
As you enqueue more and more new entries into the queue, that array doubling will happen several times:
At first:    4 entries
After 5:     double to 8
After 9:     double to 16
After 17:    double to 32
After 33:    double to 64
It will keep doing this, doubling (and copying the array contents) each time, until the capacity reaches 2048. At that point, you will have allocated 10 arrays, nine of which are garbage, and done roughly 2000 element copies.
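The arithmetic behind those numbers, as a quick sketch of the doubling schedule described above:

using System;

int capacity = 4;              // the first real allocation
int arrays = 1;
int copies = 0;
while (capacity < 2000)
{
    copies += capacity;        // each growth copies the current contents
    capacity *= 2;
    arrays++;
}
Console.WriteLine($"{arrays} arrays, final capacity {capacity}, {copies} copies");
// -> 10 arrays, final capacity 2048, 2044 copies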
Now think about it: I'm guessing you are enqueuing reference type objects. A reference type object is represented by an object reference (in effect, a pointer). If you have 2000 instances in a queue, the references will take 8 KB on a 32-bit machine (plus some one-time overhead for the members of the queue class). On a 64-bit machine, it's 16 KB. That's nothing for a modern computer. The .NET garbage collector has two strategies for managing memory, a normal one and one for large objects. The boundary is 85 KB; your queue will never be a large object.
If you are enqueuing large value types, then more memory is needed (since the value type objects will be copied into the array elements that make up the queue entries). You'd need to be using very large value type objects before your queue becomes a large object.
The other thing that will happen is that as your queue grows in size, it will settle into the Garbage Collector's Gen2 memory area. Gen2 collections are expensive, but once an object becomes stable in Gen2, it doesn't bother the garbage collector at all.
But, think about what happens if you reduce your queue size way down to, say, 100 entries and call TrimExcess. At that point, yet another new array is created (this time, much smaller) and the entries in your queue are copied to that new array (that's what the Remarks section of the TrimExcess documentation refers to when it talks about the cost of reallocating and copying a large Queue<T>). If your queue starts growing again, you will start doubling/copying that array over and over again, spinning off more garbage and spinning your wheels doing the copying.
A better strategy is to look at your estimated queue size, inflate it a bit, and pre-allocate the space for all of those entries at construction time. If you expect to have 2000 entries, allocate space for 2500 in the constructor:
var myQueue = new Queue<SomeType>(2500);
Now, you do one allocation, there should be no reallocation or array copying, and your memory will quickly migrate to Gen2, but will never be touched by the GC.
I was working with ArrayPool in C#.
I wanted to create my own pool with a maximum of 5 arrays and a maximum array size of 1,050,000.
I used the ArrayPool.Create() method for this.
One thing I am not able to understand: in the snippet below I rent from the pool 10 times, although I specified the maximum number of arrays to be 5. Why does it not raise any error?
Also, I specified the maximum length to be 1,050,000. How am I then able to rent a 4,200,000-byte array without any error?
using System.Buffers;

byte[] buffer;
ArrayPool<byte> pool = ArrayPool<byte>.Create(1050000, 5);
for (int i = 0; i < 10; i++)
{
    // Renting more times than maxArraysPerBucket, and more bytes than
    // maxArrayLength, still succeeds without any error.
    buffer = pool.Rent(4200000);
}
The options passed to ArrayPool.Create don't imply you cannot receive an array larger than those limits. Instead, they control the bucketing algorithm of the ConfigurableArrayPool. The second argument is the maximum number of slots in a bucket, and the first is the maximum size of any pooled array. That value is capped by an internal constant of 1,048,576, which is already smaller than your 1,050,000.
When you Rent from the array pool, the algorithm attempts to locate an array in one of the buckets/slots. The number of these buckets (and of their internal slots) is what is limited by the values you passed in. If the pool doesn't have an array of the minimum size requested, either because all slots are in use or because the requested size is greater than the maximum, it will instead allocate a new one (without pooling it) and return that.
In short, when you request an array larger than the (capped) size you passed in to the Create method, you will incur an allocation and receive an array that does not participate in the pool. Calling Return with this array will not place it back into the pool; instead it will be "dropped".
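A small sketch to make this observable; the expected outputs assume the drop-on-return behavior described above:

using System;
using System.Buffers;

ArrayPool<byte> pool = ArrayPool<byte>.Create(1050000, 5);

byte[] big = pool.Rent(4200000);    // over the max: freshly allocated, not pooled
pool.Return(big);                   // too large for any bucket: quietly dropped
byte[] again = pool.Rent(4200000);  // so this is another fresh allocation
Console.WriteLine(ReferenceEquals(big, again));             // False

byte[] small = pool.Rent(1000);     // within limits: served from a bucket
pool.Return(small);                 // goes back into its bucket
Console.WriteLine(ReferenceEquals(small, pool.Rent(1000))); // True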
Keep in mind however that these rules only apply to the built-in array pool. You (or someone else) could write an implementation that caps the size of the returned array or even throws -- though I'd argue that those might not be considered well-behaved (at least without supporting doc).
Update based on your comments:
While it's true that there is no parameter corresponding directly to the number of buckets, there is one indirectly: the number of buckets is calculated from the maximum array size you pass in, based on powers of 2 and some other logic.
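As an illustration of that indirect relationship, here is a sketch that mirrors the shape of the bucketing logic (not the exact implementation): bucket sizes are powers of two from 16 up to the configured maximum array length, so the bucket count falls out of the max size:

static int BucketCount(int maxArrayLength)
{
    int count = 0;
    for (long size = 16; size <= maxArrayLength; size *= 2)
        count++;
    return count;
}
// BucketCount(1_048_576) == 17   (sizes 16, 32, ..., 1,048,576)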
The documentation defines no exceptions for the Rent method (though there is at least one that can be thrown: ArgumentOutOfRangeException for negative array sizes).
Looking at the source code for the Create method: it returns a ConfigurableArrayPool. Its Rent method will try to find an array matching the request, and if there is no suitable one it will just allocate a new one:
// The request was for a size too large for the pool. Allocate an array of exactly the requested length.
// When it's returned to the pool, we'll simply throw it away.
buffer = new T[minimumLength];
So both parameters (maxArrayLength and maxArraysPerBucket) are used only to control what the ArrayPool will actually store and reuse, not how much can be allocated (which makes sense: usually you don't want your application to fail while allocating memory when there is enough memory available to it). Everything else is left to the GC, so the ArrayPool will not end up holding on to a lot of memory that can never be collected.
I am trying to think of a fast and efficient way to handle a ton of items, all of the same struct type, where the collection can grow over time and I can quickly and selectively remove items when the conditions are right.
The application will have a large amount of data streaming in at a relatively fast rate, and I need to quickly analyze it, update some UI info, and drop the older datapoints to make room for new ones. There are certain data points of interest that I need to hang onto for a longer amount of time than others.
The data payload contains 2 integer numbers that represent physical spectrum data: frequency, power, etc. The "age out" thing was just some meta-data I was going to use to determine when it was a good time to drop old data.
I thought that using a LinkedList would be a good choice as it can easily remove items from the middle of the collection, but I need to be able to perform the following pseudo-code:
for(int i = 0; i < myCollection.Length; i++)
{
myCollection[i].AgeOutVal--;
if(myCollection[i].AgeOutVal == 0)
{
myCollection.Remove(i);
i--;
}
}
But I'm getting compiler errors indicating that I cannot use a collection like this. What would be a good/fast way to do this?
I would recommend that first, you do some serious performance analysis of your program. Processing a million items per second only leaves you a few thousand cycles per item, which is certainly doable. But with that kind of performance goal your performance is going to be heavily influenced by things like data locality and the resulting cache misses.
Second, I would recommend that you separate the concern of "does this thing need to be removed from the queue" from whatever concern the object itself represents.
Third, you do not say how big the "age" field can get, just that it is counting down. It seems inefficient to mutate the entire collection every time through the loop just to find the ones to remove. Some ideas:
Suppose the "age" counts down from ten to zero. Instead of creating one collection and each item in the collection has an age, create ten collections, one for things that will time out in one, one for things that will time out in two, and so on. Each tick you throw away the "time out in one" collection, then the "time out in two" collection becomes the "time out in one" collection, and so on. Every time through the loop you just move around a tiny number of collection references, rather than mutating a huge number of items.
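A rough sketch of that rotating-bucket idea, assuming items live for at most ten ticks; Item, Add, and Tick are hypothetical names:

using System;
using System.Collections.Generic;

// buckets[i] holds the items that will expire in i + 1 ticks.
List<Item>[] buckets = new List<Item>[10];
for (int i = 0; i < buckets.Length; i++)
    buckets[i] = new List<Item>();

Add(new Item { Frequency = 101, Power = -30 }, ticksToLive: 10);
Tick();

void Add(Item item, int ticksToLive) => buckets[ticksToLive - 1].Add(item);

void Tick()
{
    List<Item> expired = buckets[0];    // everything here times out now
    expired.Clear();                    // throw the contents away
    Array.Copy(buckets, 1, buckets, 0, buckets.Length - 1);  // shift the rest down
    buckets[^1] = expired;              // reuse the emptied list at the far end
}

// Hypothetical stand-in for the struct described in the question.
struct Item { public int Frequency; public int Power; }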
Why is "age" counting down at all? Time is increasing. Mark each item according to when it was created, and never change that. Use a queue, so you can insert new items on one end and delete them from the other end. The queue will therefore be sorted by age. Each tick, dequeue items that are too old until you get to an item that is not too old. As mentioned elsewhere, a circular buffer implementation of a queue is likely to be efficient.
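And a sketch of the timestamp approach; Reading, maxAge, and DropExpired are hypothetical names standing in for the question's data points:

using System;
using System.Collections.Generic;

Queue<(DateTime Created, Reading Data)> readings = new();
TimeSpan maxAge = TimeSpan.FromSeconds(5);

Add(new Reading(101, -30));
DropExpired();

void Add(Reading r) => readings.Enqueue((DateTime.UtcNow, r));

void DropExpired()
{
    // The queue is naturally ordered by creation time, so only the front
    // needs checking; stop at the first entry that is still young enough.
    while (readings.Count > 0 &&
           DateTime.UtcNow - readings.Peek().Created > maxAge)
        readings.Dequeue();
}

// Hypothetical stand-in for the question's two-integer payload.
record struct Reading(int Frequency, int Power);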
When the numbers are small, it's quick to grow an array list from 2 to 4 slots, but the picture changes as it approaches the maximum amount of space allowed for an array (close to the 2 GB limit). Would it be more efficient, at some point, to grow the array by only a fraction of its current size rather than doubling it? Obviously growing from 1 MB to 2 MB isn't really a big deal nowadays. HOWEVER, if you had 50,000 people per hour running something that doubled the size of an array like that, I'm curious whether it would be a good enough reason to alter how this works, not to mention cut down on unneeded memory use (in theory).
A small graphical representation of what I mean:
ArrayList a has 4 elements in it, and that is its current max size at the moment:
||||
Now let's add another item to the arraylist. The internal code will double the size of the array even though we're only adding one thing, so the arraylist becomes 8 elements large:
||||||||
At these sizes, I doubt it makes any difference, but when you're allocating 2 MB for something that is around 1.25 MB, like a file someone adds into an arraylist, there's 0.75 MB of unneeded space allocated.
To give you more of an idea, here is the code that currently runs in C# for List<T> in System.Collections.Generic. The way it works now, it doubles the size of the list's backing array every time a user tries to add something to an array that is too small. Doubling the size is a good solution and makes sense, until you're growing it far bigger than you actually need.
Here's the source for this particular part of the class:
private void EnsureCapacity(int min)
{
    if (this._items.Length >= min)
        return;
    // This is the doubling I'm referring to:
    int num = this._items.Length == 0 ? 4 : this._items.Length * 2;
    // Capped at the maximum array length the runtime allows:
    if ((uint) num > 2146435071U)
        num = 2146435071;
    if (num < min)
        num = min;
    this.Capacity = num;
}
I'm going to guess this is how memory management is handled in many programming languages, so it has probably been considered many times before; I'm just wondering whether this kind of efficiency saver could save system resources by a large amount on a massive scale.
As the size of the collection gets larger, so does the cost of creating a new buffer, because you need to copy over all of the existing elements. The fact that the number of copies required is inversely proportional to the expense of each copy is exactly why the amortized cost of adding items to a List is O(1). If the size of the buffer increased linearly, the amortized cost of adding an item to a List would actually become O(n).
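A back-of-the-envelope sketch of that difference: count the total element copies needed to grow to 100,000 items under doubling versus a fixed four-slot increment:

using System;

const int N = 100_000;

long doubling = 0;
for (int cap = 4; cap < N; cap *= 2)
    doubling += cap;                 // each growth copies the current contents

long fixedStep = 0;
for (int cap = 4; cap < N; cap += 4)
    fixedStep += cap;                // grow by four slots at a time instead

Console.WriteLine($"doubling:   {doubling:N0} copies");  // ~131 thousand -> O(N)
Console.WriteLine($"fixed step: {fixedStep:N0} copies"); // ~1.25 billion -> O(N^2)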
You save on memory, allowing the "wasted" memory to go from being O(n) to being O(1). As with virtually all performance/algorithm decisions, we're once again faced with the quintessential decision of exchanging memory for speed. We can save on memory and have slower adding speeds (because of more copying) or we can use more memory to get faster additions. Of course there is no one universally right answer. Some people really would prefer to have a slower addition speed in exchange for less wasted memory. The particular resource that is going to run out first is going to vary based on the program, the system that it's running on, and so forth. Those people in the situation where the memory is the scarcer resource may not be able to use List, which is designed to be as wildly applicable as possible, even though it can't be universally the best option.
The idea behind the exponential growth factor for dynamic arrays such as List<T> is that:
The amount of wasted space is always merely proportional to the amount of data in the array. Thus you are never wasting resources on a more massive scale than you are properly using.
Even with many, many reallocations, the total potential time spent copying while creating an array of size N is O(N) -- or O(1) for a single element.
Access time is extremely fast at O(1) with a small coefficient.
This makes List<T> very appropriate for arrays of, say, in-memory tables of references to database objects, for which near-instant access is required but the array elements themselves are small.
Conversely, linear growth of dynamic arrays can result in n-squared memory wastage. This happens in the following situation:
You add something to the array, expanding it to size N for large N, freeing the previous memory block (possibly quite large) of size N-K for small K.
You allocate a few objects. The memory manager puts some in the large memory block just vacated, because why not?
You add something else to the array, expanding it to size N+K for some small K. Because the previously freed memory block now is sparsely occupied, the memory manager does not have a large enough contiguous free memory block and must request more virtual memory from the OS.
Thus virtual memory committed grows quadratically despite the measured size of objects created growing linearly.
This isn't a theoretical possibility. I actually had to fix an n-squared memory leak that arose because somebody had manually coded a linearly-growing dynamic array of integers. The fix was to throw away the manual code and use the library of geometrically-growing arrays that had been created for that purpose.
That being said, I also have seen problems with the exponential reallocation of List<T> (as well as the similarly-growing memory buffer in Dictionary<TKey,TValue>) in 32-bit processes when the total memory required needs to grow past 128 MB. In this case the List or Dictionary will frequently be unable to allocate a 256 MB contiguous range of memory even if there is more than sufficient virtual address space left. The application will then report an out-of-memory error to the user. In my case, customers complained about this since Task Manager was reporting that VM use never went over, say, 1.5GB. If I were Microsoft I would damp the growth of 'List' (and the similar memory buffer in Dictionary) to 1% of total virtual address space.
"Here is the implementation of the dictionary without any compaction support."
This quote is taken from here: http://blogs.msdn.com/jaredpar/archive/2009/03/03/building-a-weakreference-hashtable.aspx
I know jaredpar is a member on here and posts in the C# section. What exactly is "dictionary compaction support"? I am assuming it is some way to optimise the dictionary or make it smaller? But how (if that is what it is)?
Thanks
For that particular post I was referring to shrinking the dictionary in order to be a more appropriate size for the number of non-collected elements.
Under the hood, most hashtables are backed by a large array whose entries usually point to another structure such as a linked list. The array starts out at an initial size. When the number of elements added to the hashtable exceeds a certain threshold (say, 70% of the array length), the hashtable will expand. This usually involves creating a new array at twice the size and re-adding the values into the new array.
One of the problems / features of a weak reference hashtable is that, over time, its elements are collected, and this can lead to a fair amount of wasted space. Imagine that you added enough elements to go through the array-doubling process several times. Over time some of those elements were collected, and now the remaining ones could fit into a previous, smaller array size.
This is not necessarily a bad thing but it is wasted space. Compaction is the process where you essentially shrink the underlying data structure for the hashtable to be a more appropriate size for the data.
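As a hypothetical sketch, not the code from the post, of what compaction could look like for a weak-reference dictionary:

using System;
using System.Collections.Generic;

static Dictionary<TKey, WeakReference> Compact<TKey>(
    Dictionary<TKey, WeakReference> table) where TKey : notnull
{
    // Copy only the entries whose targets are still alive into a
    // right-sized dictionary; the old, oversized backing arrays become
    // garbage along with the dead entries.
    var live = new Dictionary<TKey, WeakReference>();
    foreach (KeyValuePair<TKey, WeakReference> pair in table)
        if (pair.Value.IsAlive)
            live.Add(pair.Key, pair.Value);
    return live;
}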