I have a large T[] in generation 2 on the Large Object Heap. T is a reference type. I make the following assignment:
T[0] = new T(..);
Which object(s) are marked as dirty for the next Gen0/Gen1 mark phases of GC? The entire array instance, or just the new instance of T? Will the next Gen0/Gen1 GC mark phase have to go through every item of the array? (That would seem unnecessary and very inefficient.)
Are arrays special in this regard? Would it change the answer if the collection were e.g. a SortedList<K, T> and I added a new, maximal item?
I've read through many questions and articles, including the ones below, but I still don't think I've found a clear answer.
I'm aware that an entire range of memory is marked as dirty, not individual objects, but is that range determined by the new array entry or by the array itself?
card table and write barriers in .net GC
Garbage Collector Basics and Performance Hints
Which object(s) are marked as dirty for the next Gen0/Gen1 mark phases of GC? The entire array instance, or just the new instance of T?
The 128 B block (card) containing the start of the array, i.e. the slot for element 0, will be marked as dirty. The newly created instance (new T(..)) is a brand-new gen 0 object, so the next Gen 0 collection reaches it directly, without needing the card table.
For simplicity, presuming that the start of the array is aligned on a 128 B boundary, this means the first 128 B are invalidated; since T is a reference type and you're on a 64-bit system, that's the first 16 items to check during the next collection.
Will the next Gen0/Gen1 GC mark phase have to go through every item of the array? (That would seem unnecessary and very inefficient.)
Just those 16 to 32 items, depending on the pointer size of the architecture (16 with 8-byte references, 32 with 4-byte references).
Are arrays special in this regard? Would it change the answer if the collection were e.g. a SortedList and I added a new, maximal item?
Arrays are not special. A SortedList<K,T> maintains two arrays internally, so more blocks will end up dirty in the average case.
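For illustration, here is a minimal sketch of the scenario from the question (Node, the array length and Main are placeholders; ~85,000 bytes is the usual LOH threshold and 128 B the card granularity discussed above, both runtime implementation details):

// Sketch only: illustrates the scenario, not a way to observe the card table directly.
class Node { public int Value; }

class CardTableExample
{
    static void Main()
    {
        // ~11,000 references * 8 bytes each exceeds the ~85,000 byte threshold,
        // so the array itself is allocated on the Large Object Heap (treated as gen 2).
        var nodes = new Node[11_000];

        // The new Node is a gen 0 object; the write barrier marks only the card
        // (the ~128 B block) containing the slot for nodes[0] as dirty.
        nodes[0] = new Node { Value = 42 };

        // The next gen 0/1 collection scans just that dirty card, roughly the first
        // 16 slots on a 64-bit system, rather than all 11,000 slots of the array.
    }
}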
Pretty sure it's tracking array slots, not the root which holds the reference to the array object itself.
By the way, if a particular card is set dirty, the GC has to scan about 4 KB of memory. I've read somewhere that it now uses Windows' own mechanism, which lets you get notifications when a memory range of interest is written to.
Related
I can see that CoreCLR and CoreFX implicitly use arrays for most of the generic collections. What is the main driving factor for going with arrays, and how do they handle side effects such as LOH fragmentation?
What, other than arrays, should collections be?
More importantly, what, other than arrays, could collections be?
In practice, collections boil down to "arrays - and stuff we wrap around arrays, for ease of use":
The pure thing (arrays), which do offer some conveniences like bounds checks in C#/.NET
Self-growing arrays (Lists)
Two synchronized arrays that allow mapping any input to any element (Dictionaries' key/value pairs)
Three synchronized arrays: key, value and a hash value to quickly identify non-matching keys (Hashtable).
Under the hood - regardless of how hard .NET makes it to use pointers - it all boils down to some code doing C/C++-style pointer arithmetic to get to the next element.
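To make the "self-growing array" point concrete, here is a minimal, hypothetical sketch of the idea behind List<T> (not the actual BCL implementation):

using System;

// Hypothetical growable array: a plain array plus a count, doubling when it runs out of space.
public class GrowableArray<T>
{
    private T[] _items = new T[4];   // small initial buffer
    private int _count;

    public int Count => _count;

    public T this[int index]
    {
        get => _items[index];        // plain array indexing underneath
        set => _items[index] = value;
    }

    public void Add(T item)
    {
        if (_count == _items.Length)
        {
            // Out of space: allocate a buffer twice the size and copy everything over.
            var bigger = new T[_items.Length * 2];
            Array.Copy(_items, bigger, _count);
            _items = bigger;
        }
        _items[_count++] = item;
    }
}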
Edit 1: As I learned elsewhere, .NET Dictionaries are actually implemented as hash tables; the pre-generics Hashtable class is the same idea without generics. Object has a GetHashCode function with sensible default behavior which can be used, but also fully overridden.
Fragmentation-wise, the "best" would be an array of references. Each element can be as small as the reference width (a pointer, or slightly bigger) and the GC can move the referenced instances around to defragment memory. Of course you then get the slight overhead of following a reference rather than just advancing a pointer, so as usual it is a memory vs. speed tradeoff. However, that level of detail is getting into speed-rant territory.
Edit 2: As Markus Appel pointed out in the comments, there is something even better for fragmentation avoidance: linked lists. Even that single array of references - if you just make it big enough - will take quite some memory in one indivisible chunk, so it might run into object size limits or array indexer limits. A linked list will do neither. But as a result, traversal performance is about that of a disk that was never defragmented.
Generics are just a convenience for type safety in collections and other places. They save you from having to use the dreaded Object as the element type, which ruins all compile-time type safety. As far as I know they add nothing else to this situation: List<string> works the same as a StringList would.
Array access is faster because the storage is linear. If arrays can solve a problem well enough, they are better for traversal than structures that have to work out where the next object is stored. For large data structures this performance benefit is amplified.
Using arrays can cause fragmentation if used carelessly. In the general case though, the performance gains outweigh the cost.
When the buffer runs out, the collection allocates a new one with double the size. If the code inserts a lot of items without specifying a capacity, this results in log2(N) reallocations. If the code does specify a capacity though, even a very rough approximation, there may be no fragmentation issues at all.
Removal is another expensive case as the collection will have to move the items after the deleted item(s) to the left.
In general, array-backed storage offers far better performance than other storage structures for reading, inserting and allocating memory. Deletions are rare in most cases.
For example, inserting N items in a linked list requires allocating N objects to hold that value and storing N pointers. That cost will be paid for every insertion, while the GC will have a lot more objects to track and collect. Inserting 100K items in a linked list would allocate 100K node objects that would need tracking.
With an array there won't be any allocations unless the buffer runs out. In the majority of cases insertion means simply writing to a buffer location and updating a count. When the buffer runs out there will be a single reallocation and an (expensive) copy operation. For 100K items, that's 17 allocations. In most cases, that's an acceptable cost.
To reduce or even get rid of allocations, the code can specify a capacity that's used as the initial buffer size. Specifying even a very rough estimate can reduce allocations a lot. Specifying 1024 as the initial capacity for 100K items would reduce reallocations to 7.
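For example (a sketch; the exact growth behaviour is an implementation detail):

using System.Collections.Generic;

// Without a capacity: the backing buffer is reallocated and copied each time it doubles.
var grown = new List<int>();
for (int i = 0; i < 100_000; i++) grown.Add(i);        // several reallocations along the way

// With even a rough capacity estimate: the buffer is allocated once, so no reallocations.
var presized = new List<int>(capacity: 100_000);
for (int i = 0; i < 100_000; i++) presized.Add(i);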
OK, I want to do the following. To me it seems like a good idea, so if there's no way to do what I'm asking, I'm sure there's a reasonable alternative.
Anyways, I have a sparse matrix. It's pretty big and mostly empty. I have a class called MatrixNode that's basically a wrapper around each of the cells in the matrix. Through it you can get and set the value of that cell. It also has Up, Down, Left and Right properties that return a new MatrixNode that points to the corresponding cell.
Now, since the matrix is mostly empty, having a live node for each cell, including the empty ones, is an unacceptable memory overhead. The other solution is to make new instances of MatrixNode every time a node is requested. This will make sure that only the needed nodes are kept in the memory and the rest will be collected. What I don't like about it is that a new object has to be created every time. I'm scared about it being too slow.
So here's what I've come up with. Have a dictionary of weak references to nodes. When a node is requested, if it doesn't exist, the dictionary creates it and stores it as a weak reference. If the node does already exist (probably referenced somewhere), it just returns it.
Then, if the node doesn't have any live references left, instead of it being collected, I want to store it in a pool. Later, when a new node is needed, I want to first check if the pool is empty and only make a new node if there isn't one already available that can just have its data swapped out.
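Here's a rough sketch of the caching half of what I mean (the MatrixNode shape, the (row, col) key and the NodeCache/GetNode names are just placeholders; the pooling half is left out):

using System;
using System.Collections.Generic;

public class MatrixNode
{
    public int Row, Col;
    public double Value;
    public MatrixNode(int row, int col) { Row = row; Col = col; }
}

public class NodeCache
{
    // Weak references: cached nodes can still be collected once nothing else holds them.
    private readonly Dictionary<(int Row, int Col), WeakReference<MatrixNode>> _cache
        = new Dictionary<(int Row, int Col), WeakReference<MatrixNode>>();

    public MatrixNode GetNode(int row, int col)
    {
        if (_cache.TryGetValue((row, col), out var weak) && weak.TryGetTarget(out var node))
            return node;                      // still alive somewhere: hand out the same instance

        node = new MatrixNode(row, col);      // otherwise create one (or take it from a pool)
        _cache[(row, col)] = new WeakReference<MatrixNode>(node);
        return node;
    }
}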
Can this be done?
A better question would be, does .NET already do this for me? Am I right in worrying about the performance of creating single use objects in large numbers?
Instead of guessing, you should make a performance test to see if there are any issues at all. You may be surprised to know that managed memory allocation can often outperform explicit allocation because your code doesn't have to pay for deallocation when your data goes out of scope.
Performance may become an issue only when you are allocating new objects so frequently that the garbage collector has no chance to collect them.
That said, there are sparse array implementations in C# already, like Math.NET and MetaNumerics. These libraries are already optimized for performance and will probably avoid the performance issues you would run into if you started your implementation from scratch.
An SO search for c# and sparse-matrix will return many related questions, including answers pointing to commercial libraries like ILNumerics (which has a community edition), NMath and Extreme Optimization's libraries.
Most sparse matrix implementations use one of a few well-known schemes for their data; I generally recommend CSR or CSC, as those are efficient for common operations.
If that seems too complex, you can start using COO. What this means in your code is that you will not store anything for empty members; however, you have an item for every non-empty one. A simple implementation might be:
public struct SparseMatrixItem
{
    public int Row;
    public int Col;
    public double Value;
}
And your matrix would generally be a simple container:
public interface SparseMatrix
{
    IList<SparseMatrixItem> Items { get; }
}
You should make sure that the Items list stays sorted according to the row and col indices, because then you can use binary search to quickly find out if an item exists for a specific (i,j).
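As an illustration, here is a hedged sketch of that lookup, assuming the struct above (with public fields) is kept in a List<SparseMatrixItem> sorted by (Row, Col):

using System.Collections.Generic;

// Binary search over a (Row, Col)-sorted list; returns the index, or -1 if the cell is empty.
static int FindIndex(List<SparseMatrixItem> items, int row, int col)
{
    int lo = 0, hi = items.Count - 1;
    while (lo <= hi)
    {
        int mid = lo + (hi - lo) / 2;
        var item = items[mid];
        int cmp = item.Row != row ? item.Row.CompareTo(row) : item.Col.CompareTo(col);
        if (cmp == 0) return mid;     // an item is stored for (row, col)
        if (cmp < 0) lo = mid + 1;    // items[mid] sorts before the target: search the upper half
        else hi = mid - 1;            // items[mid] sorts after the target: search the lower half
    }
    return -1;                        // no item stored: the cell is empty / zero
}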
The idea of having a pool of objects that people use and then return to the pool is used for really expensive objects. Objects representing a network connection, a new thread, etc. It sounds like your object is very small and easy to create. Given that, you're almost certainly going to harm performance pooling it; the overhead of managing the pool will be greater than the cost of just creating a new one each time.
Having lots of short lived very small objects is the exact case that the GC is designed to handle quickly. Creating a new object is dirt cheap; it's just moving a pointer up and clearing out the bits for that object. The real overhead for objects comes in when a new garbage collection happens; for that it needs to find all "alive" objects and move them around, leaving all "dead" objects in their place. If your small object doesn't live through a single collection it has added almost no overhead. Keeping the objects around for a long time (like, say, by pooling them so you can reuse them) means copying them through several collections, consuming a fair bit of resources.
I had some problems with a WCF web service (some dumps, memory leaks, etc.) and I ran a profiling tool (ANTS Memory Profiler).
Only to find out that even with the processing over (I ran a specific test and then stopped), Generation 2 holds 25% of the memory for the web service. I tracked down this memory and found that I had a dictionary object full of (null, null) items, with a hash code of -1.
The workflow of the web service implies that during specific processing items are added and then removed from the dictionary (just simple Add and Remove). Not a big deal. But it seems that after all items are removed, the dictionary is full of (null, null) KeyValuePairs. Thousands of them in fact, such that they occupy a big part of memory and eventually an overflow occurs, with the corresponding forced application pool recycle and DW20.exe getting all the CPU cycles it can get.
The dictionary is in fact Dictionary<SomeKeyType, IEnumerable<KeyValuePair<SomeOtherKeyType, SomeCustomType>>> (System.OutOfMemoryException because of Large Dictionary) so I already checked if there is some kind of reference holding things.
The dictionary is contained in a static object (to make it accessible to different processing threads during processing), so from this question and many more (Do static members ever get garbage collected?) I understand why that dictionary is in Generation 2. But is this also the cause of those (null, null) entries? Even if I remove items from the dictionary, will something always remain occupied in memory?
It's not a speed issue like in this question Deallocate memory from large data structures in C# . It seems that memory is never reclaimed.
Is there something I can do to actually remove items from dictionary, not just keep filling it with (null, null) pairs?
Is there anything else I need to check out?
Dictionaries store items in a hash table. An array is used internally for this. Because of the way hash tables work, this array must always be larger than the actual number of items stored (at least about 30% larger). Microsoft uses a load factor of 72%, i.e. at least 28% of the array will be empty (see An Extensive Examination of Data Structures Using C# 2.0, especially The System.Collections.Hashtable Class and The System.Collections.Generic.Dictionary Class). Therefore the null/null entries could just represent this free space.
If the array is too small, it will grow automatically; however, when items are removed, the array does not shrink, but the space that will be freed up should be reused when new items are inserted.
If you are in control of this dictionary, you could try to re-create it in order to shrink it:
theDict = new Dictionary<TKey, IEnumerable<KeyValuePair<TKey2, TVal>>>(theDict);
But the problem might arise from the actual (non-empty) entries. Your dictionary is static and will therefore never be reclaimed automatically by the garbage collector, unless you assign it another dictionary or null (theDict = new ... or theDict = null). This is only true for the dictionary itself, which is static, not for its entries. As long as references to removed entries exist somewhere else, they will persist. The GC will reclaim any object (sooner or later) which can no longer be reached through any reference. It makes no difference whether the object was declared static or not; the objects themselves are not static, only their references.
As @RobertTausig kindly pointed out, since .NET Core 2.1 there is the new Dictionary.TrimExcess(), which is what you actually wanted, but which didn't exist back then.
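A minimal usage sketch (TrimExcess on Dictionary<TKey, TValue> requires .NET Core 2.1 or later; theDict is the static dictionary from above):

// After the bulk removals, shrink the backing storage to the current entry count.
theDict.TrimExcess();

// Or, if the contents are no longer needed at all:
theDict.Clear();        // removes the entries but keeps the allocated buckets
theDict.TrimExcess();   // then releases the excess capacity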
Looks like you need to recycle space in that dict periodically. You can do that by creating a new one: new Dictionary<a,b>(oldDict). Be sure to do this in a thread-safe manner.
When to do this? Either on the tick of a timer (60sec?) or when a specific number of writes has occurred (100k?) (you'd need to keep a modification counter).
A solution could be to call Clear() method on the static dictionary.
In this way, the reference to the dictionary will remain available, but the objects contained will be released.
I looked into the implementation of Array.Resize() and noticed that a new array is created and returned. I'm aiming for zero memory allocation during gameplay and so I need to avoid creating any new reference types. Does resizing an array trigger the Garbage Collector on the previous array? I'm creating my own 2D array resizer, but it essentially functions in the same way as the .NET Resize() method.
If the new array is smaller than the previous one, but excess objects have already been placed back into a generic object pool, will this invoke the GC?
Arrays will constantly be created in my game loop, so I need to make this as efficient as possible. I'm trying to create an array pool so that there's no need to keep creating them in-game. However, if the resize method allocates a new array anyway, then there is little point in having the pool instead of just instantiating a new array.
Thanks for the help
Array.Resize doesn't actually change the original array at all - anyone who still has a reference to it will be able to use it as before. Therefore there's no optimization possible. Frankly it's a badly named method, IMO :(
From the docs:
This method allocates a new array with the specified size, copies elements from the old array to the new one, and then replaces the old array with the new one.
So no, it's not going to reuse the original memory or anything like that. It's just creating a shallow copy with a different size.
Yes, using Array.Resize causes a new array to be allocated and the old one to eventually be collected (unless there are still references to it somewhere).
A more low-level array resizer could possibly do some minor optimization in some cases (for example when the array is being made smaller or there happens to be memory available right after the array), but .NET's implementation doesn't do that.
Implicitly yes.
Explicitly no.
Any allocation will eventually be cleaned up by the GC when no more references exist, so yes.
If you want to avoid resizing your arrays, the best thing you could do would be to preallocate with a large enough size to avoid having to reallocate at all. In that case, you might as well just use a collection class with an initial capacity specified in the constructor, such as List.
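For instance (a sketch; Entity and the 4096 upper bound are placeholders for whatever your game actually needs):

using System.Collections.Generic;

class Entity { /* per-object game data */ }

static class Pools
{
    const int MaxEntities = 4096;                          // worst-case count, chosen up front

    // Allocated once at startup; Add/Clear inside the game loop never reallocate
    // as long as Count stays at or below MaxEntities.
    public static readonly List<Entity> Entities = new List<Entity>(MaxEntities);
}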
I'm using .Net 3.5 (C#) and I've heard the performance of C# List<T>.ToArray is "bad", since it memory copies for all elements to form a new array. Is that true?
No that's not true. Performance is good since all it does is memory copy all elements (*) to form a new array.
Of course it depends on what you define as "good" or "bad" performance.
(*) references for reference types, values for value types.
EDIT
In response to your comment, using Reflector is a good way to check the implementation (see below). Or just think for a couple of minutes about how you would implement it, and take it on trust that Microsoft's engineers won't come up with a worse solution.
public T[] ToArray()
{
    T[] destinationArray = new T[this._size];
    Array.Copy(this._items, 0, destinationArray, 0, this._size);
    return destinationArray;
}
Of course, "good" or "bad" performance only has a meaning relative to some alternative. If in your specific case, there is an alternative technique to achieve your goal that is measurably faster, then you can consider performance to be "bad". If there is no such alternative, then performance is "good" (or "good enough").
EDIT 2
In response to the comment: "No re-construction of objects?" :
No reconstruction for reference types. For value types the values are copied, which could loosely be described as reconstruction.
Reasons to call ToArray()
If the returned value is not meant to be modified, returning it as an array makes that fact a bit clearer.
If the caller is expected to perform many non-sequential accesses to the data, there can be a performance benefit to an array over a List<>.
If you know you will need to pass the returned value to a third-party function that expects an array.
Compatibility with calling functions that need to work with .NET version 1 or 1.1. These versions don't have the List<> type (or any generic types, for that matter).
Reasons not to call ToArray()
If the caller ever does need to add or remove elements, a List<> is absolutely required.
The performance benefits are not necessarily guaranteed, especially if the caller is accessing the data in a sequential fashion. There is also the additional step of converting from List<> to array, which takes processing time.
The caller can always convert the list to an array themselves.
taken from here
Yes, it's true that it does a memory copy of all elements. Is it a performance problem? That depends on your performance requirements.
A List contains an array internally to hold all the elements. The array grows if the capacity is no longer sufficient for the list. Any time that happens, the list will copy all elements into a new array. That happens all the time, and for most people that is no performance problem.
E.g. a list created with the default constructor starts with an empty backing array, grows to capacity 4 on the first Add and doubles from there; once the capacity is 16, adding the 17th element creates a new array of size 32, copies the 16 old values and adds the 17th.
The size difference is also the reason why ToArray() returns a new array instance instead of passing the private reference.
This is what Microsoft's official documentation says about List.ToArray's time complexity:
The elements are copied using Array.Copy, which is an O(n) operation, where n is Count.
Then, looking at Array.Copy, we see that it is usually not cloning the data but instead using references:
If sourceArray and destinationArray are both reference-type arrays or are both arrays of type Object, a shallow copy is performed. A shallow copy of an Array is a new Array containing references to the same elements as the original Array. The elements themselves or anything referenced by the elements are not copied. In contrast, a deep copy of an Array copies the elements and everything directly or indirectly referenced by the elements.
So in conclusion, this is a pretty efficient way of getting an array from a list.
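A small sketch of what that shallow copy means in practice for a list of reference-type elements:

using System;
using System.Collections.Generic;
using System.Text;

var list = new List<StringBuilder> { new StringBuilder("a") };
var array = list.ToArray();            // copies the references, not the StringBuilder objects

array[0].Append("bc");                 // mutating the shared object through the array...
Console.WriteLine(list[0]);            // ...is visible through the list: prints "abc"

array[0] = new StringBuilder("x");     // but replacing a slot only changes the array
Console.WriteLine(list[0]);            // still prints "abc"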
It copies the references into a new array, but that's the only thing that method could and should do...
Performance has to be understood in relative terms. Converting a list to an array involves copying the elements, and the cost of that will depend on the size of the list. But you have to compare that cost to the other things your program is doing. How did you obtain the information to put into the list in the first place? If it was by reading from the disk, or a network connection, or a database, then an array copy in memory is very unlikely to make a detectable difference to the time taken.
For any kind of List/ICollection where it knows the length, it can allocate an array of exactly the right size from the start.
T[] destinationArray = new T[this._size];
Array.Copy(this._items, 0, destinationArray, 0, this._size);
return destinationArray;
If your source type is IEnumerable (not a List/Collection) then the source is:
items = new TElement[4];
...
if (no more space) {
    TElement[] newItems = new TElement[checked(count * 2)];
    Array.Copy(items, 0, newItems, 0, count);
    items = newItems;
}
If we know the source data size, we can avoid this slight overhead. However, in most cases (e.g. array size <= 1024) it executes so quickly that we don't even need to think about this implementation detail.
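If the element count is known up front, the doubling can be sidestepped entirely; a sketch with a hypothetical helper:

using System.Collections.Generic;

// Sketch: copy straight into a right-sized array instead of letting an
// IEnumerable-based ToArray() grow its buffer by doubling.
static int[] ToArrayKnownSize(IReadOnlyCollection<int> source)
{
    var result = new int[source.Count];   // single, exactly-sized allocation
    int i = 0;
    foreach (var value in source)
        result[i++] = value;
    return result;
}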
References: Enumerable.cs, List.cs (F12ing into them), Joe's answer