Does array resizing invoke the GC? - c#

I looked into the implementation of Array.Resize() and noticed that a new array is created and returned. I'm aiming for zero memory allocation during gameplay and so I need to avoid creating any new reference types. Does resizing an array trigger the Garbage Collector on the previous array? I'm creating my own 2D array resizer, but it essentially functions in the same way as the .NET Resize() method.
If the new array is smaller than the previous one, but excess objects have already been placed back into a generic object pool, will this invoke the GC?
Arrays will constantly be created in my game loop, so I need to make this as efficient as possible. I'm trying to create an array pool so that there's no need to keep creating arrays in-game. However, if the resize method allocates a new array anyway, there's little point in keeping a pool rather than simply instantiating a new array each time.
Thanks for the help

Array.Resize doesn't actually change the original array at all - anyone who still has a reference to it will be able to use it as before. Therefore there's no optimization possible. Frankly it's a badly named method, IMO :(
From the docs:
This method allocates a new array with
the specified size, copies elements
from the old array to the new one, and
then replaces the old array with the
new one.
So no, it's not going to reuse the original memory or anything like that. It's just creating a shallow copy with a different size.
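A quick sketch of that behaviour (variable names are illustrative):

```csharp
using System;

class Program
{
    static void Main()
    {
        int[] original = { 1, 2, 3 };
        int[] reference = original;   // second reference to the same array

        // Array.Resize takes the variable by ref and points it at a NEW array.
        Array.Resize(ref original, 5);

        Console.WriteLine(ReferenceEquals(original, reference)); // False
        Console.WriteLine(reference.Length); // 3 -- the old array is untouched
        Console.WriteLine(original.Length);  // 5 -- a freshly allocated copy
    }
}
```

Because Array.Resize reassigns the variable you pass by ref, any other reference to the old array keeps seeing the original, unresized data.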

Yes, using Array.Resize causes a new array to be allocated and the old one to eventually be collected (unless there are still references to it somewhere).
A more low-level array resizer could possibly do some minor optimization in some cases (for example when the array is being made smaller or there happens to be memory available right after the array), but .NET's implementation doesn't do that.

Implicitly yes.
Explicitly no.

Any allocation will eventually be cleaned up by the GC when no more references exist, so yes.
If you want to avoid resizing your arrays, the best thing you could do would be to preallocate with a large enough size to avoid having to reallocate at all. In that case, you might as well just use a collection class with an initial capacity specified in the constructor, such as List.
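A minimal sketch of the preallocation approach (the element count is illustrative):

```csharp
using System;
using System.Collections.Generic;

class Program
{
    static void Main()
    {
        // Preallocating means the internal array is never reallocated
        // until the element count exceeds the initial capacity.
        var items = new List<int>(capacity: 10_000);

        for (int i = 0; i < 10_000; i++)
        {
            items.Add(i); // no reallocation happens here
        }

        Console.WriteLine(items.Capacity); // 10000 -- unchanged
    }
}
```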

Related

Details of write barriers in the .Net Garbage Collector

I have a large array arr of type T[] in generation 2 on the Large Object Heap, where T is a reference type. I make the following assignment:
arr[0] = new T(..);
Which object(s) are marked as dirty for the next Gen0/Gen1 mark phases of GC? The entire array instance, or just the new instance of T? Will the next Gen0/Gen1 GC mark phase have to go through every item of the array? (That would seem unnecessary and very inefficient.)
Are arrays special in this regard? Would it change the answer if the collection were e.g. a SortedList<K, T> and I added a new, maximal item?
I've read through many questions and articles, including the ones below, but I still don't think I've found a clear answer.
I'm aware that an entire range of memory is marked as dirty, not individual objects, but is the new array entry or the array itself the basis of this?
card table and write barriers in .net GC
Garbage Collector Basics and Performance Hints
Which object(s) are marked as dirty for the next Gen0/Gen1 mark phases of GC? The entire array instance, or just the new instance of T?
The 128-byte block containing the start of the array will be marked as dirty. The newly created instance (new T()) is a new object, so it will first be examined by a Gen 0 collection, which does not consult the card table.
For simplicity, presuming that the start of the array is aligned on a 128-byte boundary, this means the first 128 bytes will be invalidated; so, presuming T is a reference type and you're on a 64-bit system, that's the first 16 items to check during the next collection.
Will the next Gen0/Gen1 GC mark phase have to go through every item of the array? (That would seem unnecessary and very inefficient.)
Just those 16 (or 32) items, depending on the pointer size of the architecture: 128 bytes holds 16 eight-byte references on a 64-bit system, or 32 four-byte references on a 32-bit one.
Are arrays special in this regard? Would it change the answer if the collection were e.g. a SortedList and I added a new, maximal item?
Arrays are not special. A SortedList<K,T> maintains two arrays internally, so more blocks will end up dirty in the average case.
Pretty sure it's tracking array slots, not the root that holds the reference to the array object itself.
By the way, if a particular card is marked dirty, it has to scan 4 KB of memory. I've read somewhere that it now uses Windows' own mechanism, which lets you get notifications when a memory range of interest is written to.

Is there a way to trim a Dictionary's capacity once it is known to be fixed size?

After reading the excellent accepted answer in this question:
How is the c#/.net 3.5 dictionary implemented?
I decided to set my initial capacity to a large guess and then trim it after I read in all values. How can I do this? That is, how can I trim a Dictionary so the gc will collect the unused space later?
My goal with this is optimization. I often have large datasets, and the time penalty for small datasets is acceptable. I want to avoid the overhead of reallocating and copying the data that is incurred when a small initial capacity meets a large dataset.
According to Reflector, the Dictionary class never shrinks. void Resize() is hard-coded to always double the size.
You can probably create a new dictionary and use the respective constructor to copy over the items. This will be quite inefficient.
Or, implement your own dictionary with the existing one as a blue-print. This is less work than you might think at first.
Be sure to benchmark both approaches.
In .NET 5 there is the method TrimExcess doing exactly what you're asking:
Sets the capacity of this dictionary to what it would be if it had
been originally initialized with all its entries.
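A minimal sketch of that approach (requires .NET 5 or later, where Dictionary<TKey, TValue>.TrimExcess() is available; the sizes are illustrative):

```csharp
using System;
using System.Collections.Generic;

class Program
{
    static void Main()
    {
        // Start with a generous guess so no resizing happens while loading.
        var lookup = new Dictionary<int, string>(capacity: 100_000);

        for (int i = 0; i < 1_000; i++)
        {
            lookup[i] = i.ToString();
        }

        // Shrink the backing storage to fit the actual entry count.
        lookup.TrimExcess();

        Console.WriteLine(lookup.Count); // 1000
    }
}
```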
You might consider putting your data in a list first. Then you know the list's size, and can create a dictionary with that capacity (now exactly right for the data you want) and populate it.
Allowing the list to dynamically resize (as you add the elements) should be cheaper than allowing a dictionary to resize. (But, as others have noted, test the performance yourself!) Resizing a dictionary involves a rehashing operation, which means every element's GetHashCode will get called again, as well as the reference being copied into the new data structure. Resizing a list just means copying the references, so should be cheaper.
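A sketch of the list-first approach (the key/value types and sizes are illustrative):

```csharp
using System;
using System.Collections.Generic;

class Program
{
    static void Main()
    {
        // Load into a list first; growing a list only copies references.
        var pairs = new List<KeyValuePair<int, string>>();
        for (int i = 0; i < 1_000; i++)
        {
            pairs.Add(new KeyValuePair<int, string>(i, i.ToString()));
        }

        // Now the exact size is known, so the dictionary never has to
        // resize (and therefore never has to rehash).
        var lookup = new Dictionary<int, string>(pairs.Count);
        foreach (var pair in pairs)
        {
            lookup.Add(pair.Key, pair.Value);
        }

        Console.WriteLine(lookup.Count); // 1000
    }
}
```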

Is it bad form to let C# garbage collect a list instead of reusing it? [duplicate]

This question already has answers here:
Using the "clear" method vs. New Object
(5 answers)
Closed 8 years ago.
I have a list of elements that steadily grows, until I dump all the data from that list into a file. I then want to reuse that list for the same purpose again. Is it bad practice to simply assign it to a new list, instead of removing all the elements from the list? It seems garbage collection should take care of the old list, and that way I don't have to worry about removing the elements.
For example:
var myList = new List<element>();
myList.Add(someElement);
myList.Add(anotherElement);
// dumps the elements into a file
myList = new List<element>();
Edit: Even if there are easy ways around this, I was wondering too about the philosophical side of it. Is it bad to let something be garbage collected if there is a way around it? What are the costs of allowing garbage collection vs deleting the elements and reusing the same memory?
It depends a bit on how many elements are in the list. If the array backing the list is large enough to be on the large object heap, then you might be better off clearing the list and reusing it. This will reduce the number of large memory allocations, and will help reduce the problem of large object heap fragmentation. (See http://msdn.microsoft.com/en-us/magazine/cc534993.aspx and http://www.simple-talk.com/dotnet/.net-framework/the-dangers-of-the-large-object-heap/ for more information; see http://blogs.msdn.com/b/dotnet/archive/2011/10/04/large-object-heap-improvements-in-net-4-5.aspx for improvements due with .NET 4.5)
If the lists are small, you might be better off just creating a new list, or you might get better performance calling Clear(). When in doubt, measure the performance.
Edit: In response to the philosophical question you pose in your edit, here are two reasons to create a new list:
In general, code is cleaner and easier to reason about if you do not reuse objects. The cost of garbage collection is low, the cost of confusing code is high.
Consider what happens if the code dumping the list's contents is in another function, as it most likely is. Once you've passed that list out of its local context, it's possible that there are non-local references to the same list. Other code might be modifying the list, or might be assuming (incorrectly) that you're not modifying it.
myList.Clear() is even easier to code than myList = new List<element>();
msdn: List.Clear Method
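A small sketch of the Clear() approach (the file-dumping step is elided):

```csharp
using System;
using System.Collections.Generic;

class Program
{
    static void Main()
    {
        var myList = new List<string> { "a", "b", "c" };

        // ... dump the elements to a file here ...

        // Clear removes the elements but keeps the backing array,
        // so the next fill reuses the already-allocated capacity.
        int capacityBefore = myList.Capacity;
        myList.Clear();

        Console.WriteLine(myList.Count);                      // 0
        Console.WriteLine(myList.Capacity == capacityBefore); // True
    }
}
```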
Each element in the list is a different object itself, and will need to be garbage collected whether you clear the list, or recreate a new list, or remove the items one at a time. What will NOT need to be garbage collected if you just clear the list and reuse it is the list itself. Unless your list is huge, containing hundreds of thousands of items, it will be difficult to measure a performance difference one way or the other. Fortunately, the garbage collector is highly optimized and it's a rare occurrence where developers need to consider what it is doing.
(As others have pointed out, there are various factors involved, such as...how many elements will you be adding to the new list? vs how many elements were in the old list? ...but the point is: the garbage collection of the list itself isn't relevant when it comes to collecting the elements of the list.)
I'm no expert, but:
Making a new list and expecting the GC to "take care" of the old one is probably a bad idea: it's bad practice and likely inefficient.
Although it's a micro-optimization, I'd say that overwriting the existing slots until you reach list.Count, and then continuing with list.Add, is the best way, because then you neither clear the list nor allocate unnecessary new memory (unless the lists are large and you want to release the space).
Anyway, I would recommend using List.Clear() - it saves both you and the GC some trouble.
It sounds like you're asking two different questions. One is whether it's okay to set it to a new object or just clear it, which I think Eric answered pretty well. The second is whether you should just ignore the GC and let it work without trying to "help" it - to that, I'd say absolutely YES. Let the framework do what the framework does and stay out of its way until you have to.
A lot of programmers want to dig in too deep, and most of the time it causes more problems than it helps. The GC is designed to collect these things and clean them up for you. Unless you are seeing a very specific problem, you should write the code that works and not worry about when something will be collected (with the exception of the using keyword where appropriate).
The important perspective is clean code.
When you create a new list, the old one will be removed by the GC (provided there are no other references to it).
I would rather use List.Clear() to remove all the elements for reuse. The Capacity remains unchanged, so there shouldn't be any additional overhead, and letting the GC handle the memory keeps the code clean.

arrays of structs need advice

I made an array of structs to represent map data that gets drawn; however, I didn't double-check it until it was too late: when I load in a new map I get either an "out of memory" exception (if I try to make a new struct array first) or a corrupted map that would require a lot of recoding to fix (if I just initialize a big map first)... maybe too much.
So now I'm wondering if there's a safe way to reallocate the array of structs, since the old data is thrown away anyway at that point (i.e. I don't need to copy the data, just resize the array and load new data from the file).
Is this possible safely?
Or should I just look to use something else, like an arraylist or list?
What I need here is basically indexing speed and reading speed more then anything.
A large, contiguous block of memory is sometimes difficult to allocate. Consider a jagged layout instead. Access time will be slightly degraded, but you will be able to allocate more memory.
Read more about jagged arrays
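A sketch of a jagged layout for 2D map data (the MapCell struct and dimensions are illustrative):

```csharp
using System;

struct MapCell
{
    public int TileId;
}

class Program
{
    static void Main()
    {
        const int width = 1024, height = 1024;

        // One small array of references plus many modest row arrays:
        // the runtime makes `height` small allocations instead of
        // having to find one huge contiguous block.
        MapCell[][] map = new MapCell[height][];
        for (int y = 0; y < height; y++)
        {
            map[y] = new MapCell[width];
        }

        // Indexing stays fast and natural:
        map[10][20].TileId = 7;
        Console.WriteLine(map[10][20].TileId); // 7
    }
}
```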

C# List<T>.ToArray performance is bad?

I'm using .Net 3.5 (C#) and I've heard the performance of C# List<T>.ToArray is "bad", since it memory copies for all elements to form a new array. Is that true?
No that's not true. Performance is good since all it does is memory copy all elements (*) to form a new array.
Of course it depends on what you define as "good" or "bad" performance.
(*) references for reference types, values for value types.
EDIT
In response to your comment, using Reflector is a good way to check the implementation (see below). Or just think for a couple of minutes about how you would implement it, and take it on trust that Microsoft's engineers won't come up with a worse solution.
public T[] ToArray()
{
    T[] destinationArray = new T[this._size];
    Array.Copy(this._items, 0, destinationArray, 0, this._size);
    return destinationArray;
}
Of course, "good" or "bad" performance only has a meaning relative to some alternative. If in your specific case, there is an alternative technique to achieve your goal that is measurably faster, then you can consider performance to be "bad". If there is no such alternative, then performance is "good" (or "good enough").
EDIT 2
In response to the comment: "No re-construction of objects?" :
No reconstruction for reference types. For value types the values are copied, which could loosely be described as reconstruction.
Reasons to call ToArray()
If the returned value is not meant to be modified, returning it as an array makes that fact a bit clearer.
If the caller is expected to perform many non-sequential accesses to the data, there can be a performance benefit to an array over a List<>.
If you know you will need to pass the returned value to a third-party function that expects an array.
Compatibility with calling functions that need to work with .NET version 1 or 1.1. These versions don't have the List<> type (or any generic types, for that matter).
Reasons not to call ToArray()
If the caller ever does need to add or remove elements, a List<> is absolutely required.
The performance benefits are not necessarily guaranteed, especially if the caller is accessing the data in a sequential fashion. There is also the additional step of converting from List<> to array, which takes processing time.
The caller can always convert the list to an array themselves.
taken from here
Yes, it's true that it does a memory copy of all elements. Is it a performance problem? That depends on your performance requirements.
A List contains an array internally to hold all the elements. The array grows if the capacity is no longer sufficient for the list. Any time that happens, the list will copy all elements into a new array. That happens all the time, and for most people that is no performance problem.
E.g. a list created with the default constructor starts with an empty backing array; the first .Add() allocates capacity 4, and the capacity then doubles (8, 16, 32, ...) whenever it runs out. So when you .Add() the 17th element, the list creates a new array of size 32, copies the 16 old values, and adds the 17th.
The size difference is also the reason why ToArray() returns a new array instance instead of passing the private reference.
This is what Microsoft's official documentation says about List.ToArray's time complexity
The elements are copied using Array.Copy, which is an O(n) operation, where n is Count.
Then, looking at Array.Copy, we see that it is usually not cloning the data but instead using references:
If sourceArray and destinationArray are both reference-type arrays or are both arrays of type Object, a shallow copy is performed. A shallow copy of an Array is a new Array containing references to the same elements as the original Array. The elements themselves or anything referenced by the elements are not copied. In contrast, a deep copy of an Array copies the elements and everything directly or indirectly referenced by the elements.
So in conclusion, this is a pretty efficient way of getting an array from a list.
It creates new references in an array, but that's the only thing that method could and should do...
Performance has to be understood in relative terms. Converting a List to an array involves copying the data, and the cost of that will depend on the size of the list. But you have to compare that cost to the other things your program is doing. How did you obtain the information to put into the list in the first place? If it was by reading from the disk, or a network connection, or a database, then an array copy in memory is very unlikely to make a detectable difference to the time taken.
For any kind of List/ICollection where it knows the length, it can allocate an array of exactly the right size from the start.
T[] destinationArray = new T[this._size];
Array.Copy(this._items, 0, destinationArray, 0, this._size);
return destinationArray;
If your source type is IEnumerable (not a List/Collection) then the source is:
items = new TElement[4];
..
if (/* no more space */)
{
    TElement[] newItems = new TElement[checked(count * 2)];
    Array.Copy(items, 0, newItems, 0, count);
    items = newItems;
}
It starts at size 4 and grows exponentially, doubling each time it runs out of space. Each time it doubles, it has to reallocate memory and copy the data over.
If we know the source-data size, we can avoid this slight overhead. However, in most cases (e.g. array size <= 1024) it will execute so quickly that we don't even need to think about this implementation detail.
References: Enumerable.cs, List.cs (F12ing into them), Joe's answer
