Do array properties cause memory allocation on the heap? - c#

Consider the following:
public class FooBar
{
    public int[] SomeNumbers
    {
        get { return _someNumbers; }
    }

    private int[] _someNumbers;

    public FooBar()
    {
        _someNumbers = new int[2];
        _someNumbers[0] = 1;
        _someNumbers[1] = 2;
    }
}
// in some other method somewhere...
FooBar foobar = new FooBar();
Debug.Log(foobar.SomeNumbers[0]);
What I am wondering is: does calling the SomeNumbers property cause a heap allocation? Basically, does it cause a copy of the array to be created, or is it just a pointer?
I ask because I am trying to resolve some GC issues caused by functions that return arrays, and I want to make sure my idea of caching some values like this will actually make a difference.

Arrays are always reference types, so yes, the getter is "basically returning a pointer": it hands back a reference to the existing array. No copy is made and no allocation occurs.
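If you want to convince yourself of that, a quick check like the following (using the FooBar class from the question) shows that both variables refer to the same array:

// No array copy happens here - the property getter returns a reference.
FooBar foobar = new FooBar();
int[] numbers = foobar.SomeNumbers;

numbers[0] = 42;
Debug.Log(foobar.SomeNumbers[0]);                         // prints 42 - same underlying array
Debug.Log(ReferenceEquals(numbers, foobar.SomeNumbers));  // prints True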
If you are trying to debug memory issues I recommend using a memory profiler. There is one built into Visual Studio, or you can use a third-party one (I personally like dotMemory, which has a five-day free trial). Using a profiler will help you identify what is creating objects and what is keeping them alive.

Related

How to avoid garbage collection problems when dealing with large in-memory lists in C#

We had some problems with SQL performance around our article-tag relationships, so we decided to keep the article/tags in memory, which gave us a significant boost, but it is now causing headaches with garbage collection when the entire list is removed and replaced with a new one (3M+ records).
Here is a piece of code:
private readonly IContextCreator _contextCreator;
private volatile static List<TagEngineCacheResponse> _cachedList = new List<TagEngineCacheResponse>();
private readonly int KEYWORD_GROUP_NAME = 1;
private static BitmapIndex.BitmapIndex _bitMapIndex = new BitmapIndex.BitmapIndex();

public TagEngineService(IContextCreator contextCreator)
{
    _contextCreator = contextCreator;
}

public async Task RepopulateEntireCacheAsync()
{
    using (var ctx = _contextCreator.PortalContext())
    {
        var cmd = ctx.Database.Connection.CreateCommand();
        cmd.CommandText = BASE_SQL_QUERY;
        await ctx.Database.Connection.OpenAsync();
        var reader = await cmd.ExecuteReaderAsync();
        var articles = ((IObjectContextAdapter)ctx)
            .ObjectContext
            .Translate<TagEngineCacheResponse>(reader)
            .ToList();

        // recreate bitmap indexes
        BitmapIndex.BitmapIndex tempBitmapIndex = new BitmapIndex.BitmapIndex();
        int recordRow = 0;
        foreach (var record in articles)
        {
            tempBitmapIndex.Set(new BIKey(KEYWORD_GROUP_NAME, record.KeywordId), recordRow);
            recordRow++;
        }

        _cachedList = articles;
        _bitMapIndex = tempBitmapIndex;
    }
}
Class definition:
public class TagEngineCacheResponse
{
    public int ArticleId { get; set; }
    public int KeywordId { get; set; }
    public DateTime PublishDate { get; set; }
    public int ViewCountSum { get; set; }
}
As you can see, when the cache is recreated, _cachedList is replaced with a new list and the old one becomes eligible for garbage collection. At that point, CPU time spent in GC jumps to 60-90% for 2-3 seconds.
Are there any ideas how to improve this code to avoid the GC problems?
I would guess the list takes about 44 bytes per object, or roughly 130 MB for 3M objects. That is a bit on the large side, but not incredibly so.
Some suggestions:
The list's backing array is well over the 85,000-byte threshold for the large object heap (LOH), so it will be allocated on the LOH rather than the small object heap (SOH). The LOH is only collected in gen 2, and gen 2 collections can be expensive. To avoid this it is recommended to avoid de-allocating gen 2/LOH objects as much as possible, i.e. allocate them once and then reuse them.
You could fetch the data from the database in smaller chunks and update the list in place, making sure each chunk stays within the SOH limit. You might consider either locking the list so it is not read while it is being updated, or keeping two alternating lists: update the inactive one, then switch which list is 'active'.
You are using a class for TagEngineCacheResponse, which causes a great many objects to be allocated. While these are small enough to fit on the SOH, they may, if you are unlucky, survive long enough to be promoted to the gen 2 heap. Although GC time is not greatly affected by unreferenced objects, it might still be better to use a value type (a struct) so the GC only has to track the list's backing array. Profile to make sure it actually helps. A minimal sketch combining the last two suggestions follows below.
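For example, the struct-plus-alternating-buffers idea could look something like this. The type and member names here (TagEngineCacheEntry, TagEngineCache, loadChunk) are illustrative rather than taken from the question's code, and the chunked loading is only a placeholder:

using System;
using System.Collections.Generic;

// Sketch only: value-type cache entries plus two pre-allocated buffers that are
// reused on every refresh, so the big LOH allocations happen once instead of per reload.
public struct TagEngineCacheEntry            // struct: the GC only tracks the backing array
{
    public int ArticleId;
    public int KeywordId;
    public DateTime PublishDate;
    public int ViewCountSum;
}

public class TagEngineCache
{
    // Two alternating buffers; _activeIndex points at the one readers should use.
    private readonly List<TagEngineCacheEntry>[] _buffers =
    {
        new List<TagEngineCacheEntry>(3500000),
        new List<TagEngineCacheEntry>(3500000)
    };
    private volatile int _activeIndex;

    public IReadOnlyList<TagEngineCacheEntry> Active => _buffers[_activeIndex];

    // 'loadChunk' is a placeholder for however you page rows out of the database.
    public void Repopulate(Func<int, IEnumerable<TagEngineCacheEntry>> loadChunk, int chunkCount)
    {
        var target = _buffers[1 - _activeIndex];   // the buffer readers are NOT using
        target.Clear();                            // keeps the already-allocated capacity
        for (int chunk = 0; chunk < chunkCount; chunk++)
        {
            foreach (var entry in loadChunk(chunk))
                target.Add(entry);
        }
        _activeIndex = 1 - _activeIndex;           // atomic int write: switch readers over
    }
}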

Manage hundreds of classes without creating and destroying them?

I have a class A that works with hundreds or thousands of other classes; each of those classes has a method that does some calculations.
Class A has a method that chooses which of those hundreds or thousands of classes to run, and that method of class A runs many times in a short period.
The solution I thought of at first was to keep instances of all the classes inside class A, to avoid having to create and destroy them every time the event runs and to keep the garbage collector from consuming CPU. But class A, as I said, works with hundreds or thousands of classes, and keeping them all loaded seems too expensive in memory (I think).
My question is: can you think of an optimal way to work with hundreds or thousands of classes, some of which run every second, without having to create and destroy them on every execution of the method that works with them?
Edit:
First example: create and keep the instances, then reuse them. I think this wastes memory, but it keeps the garbage collector from working too much.
public class ClassA
{
    Class1 class1;
    Class2 class2;
    // ... more classes
    Class100 class100;

    public ClassA()
    {
        class1 = new Class1();
        // ... initializations
        class100 = new Class100();
    }

    public void ChooseClass(int numberClass)
    {
        switch (numberClass)
        {
            case 1:
                class1.calculate();
                break;
            case 2:
                class2.run();
                break;
            // ... more cases, one for each class
            case 100:
                class100.method();
                break;
            default:
                break;
        }
    }
}
Second example: create each class only when it is used. This saves memory, but makes the garbage collector consume a lot of CPU.
public class ClassA
{
    public void ChooseClass(int numberClass)
    {
        switch (numberClass)
        {
            case 1:
                Class1 class1 = new Class1();
                class1.calculate();
                break;
            case 2:
                Class2 class2 = new Class2();
                class2.run();
                break;
            // ... more cases, one for each class
            case 100:
                Class100 class100 = new Class100();
                class100.method();
                break;
            default:
                break;
        }
    }
}
The basic problem you face when you start increasing the number of class instances is that they all need to be accounted for and tracked during garbage collection. Even if you never free those instances, the garbage collector still needs to track them, and there comes a point where the program spends more time performing garbage collection than doing actual work. We experienced this kind of performance problem with a binary search tree that ended up containing several million nodes that were originally class instances.
We were able to circumvent this by using a List<T> of structs rather than classes. (The memory of a list is backed by an array, and for structs the garbage collector only needs to track a single reference: the one to that array.) Instead of references to class instances, we store indices into this list in order to access a desired instance of the struct.
In fact we also faced the problem (newer versions of the .NET Framework do away with this limitation) that the backing array couldn't grow beyond 2 GB, even under 64 bits, so we split storage across several lists (256) and used a 32-bit index in which 8 bits act as a list selector and the remaining 24 bits serve as an index into the list.
Of course it is convenient to build a class that abstracts away all these details, and you need to be aware that when modifying a struct you actually need to copy it to a local variable, modify it, and then write the modified copy back over the original; otherwise your changes happen in a temporary copy of the struct and are never reflected in your data collection. There is also a performance impact, which fortunately pays for itself once the collection is large enough, thanks to extremely fast garbage collection cycles.
Here is some (quite old) code showing these ideas in place; migrating our search tree to this approach took a server from spending nearly 100% of its CPU time to around 15%.
public class SplitList<T> where T : struct
{
    // A virtual list divided into several sublists, removing the 2GB capacity limit
    private List<T>[] _lists;
    private Queue<int> _free = new Queue<int>();
    private int _maxId = 0;

    private const int _hashingBits = 8;
    private const int _listSelector = 32 - _hashingBits;
    private const int _subIndexMask = (1 << _listSelector) - 1;

    public SplitList()
    {
        int listCount = 1 << _hashingBits;
        _lists = new List<T>[listCount];
        for (int i = 0; i < listCount; i++)
            _lists[i] = new List<T>();
    }

    // Access a struct by index.
    // Remember that the getter returns a local copy of the struct, so if changes are to be
    // made, the copy must be modified locally and then assigned back through the setter.
    public T this[int idx]
    {
        get
        {
            return _lists[idx >> _listSelector][idx & _subIndexMask];
        }
        set
        {
            _lists[idx >> _listSelector][idx & _subIndexMask] = value;
        }
    }

    // Returns an index to a "new" struct inside the collection.
    public int New()
    {
        int result;
        T newElement = new T();

        // are there any free indexes available?
        if (_free.Count > 0)
        {
            // yes, return a free index and initialize the reused struct to default values
            result = _free.Dequeue();
            this[result] = newElement;
        }
        else
        {
            // no, grow the capacity; indices are handed out sequentially starting at 0
            // so that they line up with the position produced by List<T>.Add below
            result = _maxId++;
            List<T> list = _lists[result >> _listSelector];
            list.Add(newElement);
        }
        return result;
    }

    // Free an index and allow the struct slot to be reused.
    public void Free(int idx)
    {
        _free.Enqueue(idx);
    }
}
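As a quick illustration of the copy-modify-write pattern described above (Sample and SplitListDemo are made-up names, any struct works as the element type):

public struct Sample
{
    public int Value;
}

public static class SplitListDemo
{
    public static void Run()
    {
        var items = new SplitList<Sample>();

        int idx = items.New();     // get an index instead of an object reference

        Sample s = items[idx];     // this is a local COPY of the struct
        s.Value = 42;
        items[idx] = s;            // write the copy back, otherwise the change is lost

        items.Free(idx);           // the slot can now be handed out again by New()
    }
}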
Here is a snippet of how our binary tree implementation ended up looking using this SplitList backing container class:
public class CLookupTree
{
    public struct TreeNode
    {
        public int HashValue;
        public int LeftIdx;
        public int RightIdx;
        public int firstSpotIdx;
    }

    SplitList<TreeNode> _nodes;
    …

    private int RotateLeft(int idx)
    {
        // Performs a tree rotation to the left. Here you can see how we need
        // to retrieve the struct into a local copy (thisNode), modify it, and
        // push the modifications back to the node storage list.
        // Also note that we are working with indexes rather than references to
        // the nodes.
        TreeNode thisNode = _nodes[idx];
        int result = thisNode.RightIdx;
        TreeNode rightNode = _nodes[result];

        thisNode.RightIdx = rightNode.LeftIdx;
        rightNode.LeftIdx = idx;

        _nodes[idx] = thisNode;
        _nodes[result] = rightNode;
        return result;
    }
}

How to truncate an array in place in C#

I mean, is it really possible? MSDN says that arrays are fixed-size and the only way to resize one is to copy it to a new location. But maybe it is possible with unsafe code or some magic with internal CLR structures; they are all written in C++, where we have full control over memory and can call realloc and so on.
I have no code for this question, because I don't even know whether such a thing can exist.
I'm not talking about the Array.Resize method and so on, because it obviously does not have the needed behaviour.
Assume that we have a standard x86 process with 2 GB of RAM, and I have 1.9 GB filled by a single array. Then I want to release half of it. So I want to write something like:
MagicClass.ResizeArray(ref arr, n)
and not get an OutOfMemoryException. Array.Resize will try to allocate another gigabyte of RAM and fail, because 1.9 + 1 > 2 GB.
You can try Array.Resize():
int[] myArray = new int[] { 1, 2, 3, 4 };
int myNewSize = 1;
Array.Resize(ref myArray, myNewSize);
// Test: 1
Console.Write(myArray.Length);
realloc will attempt to resize in place - but it reserves the right to copy the whole thing elsewhere and return a pointer that's completely different.
Pretty much the same outward behaviour is exposed by .NET's List<T> class - which you should be using anyway if you find yourself changing array sizes often. It hides the actual array reference from you, so the change is propagated through all references to the same list. As you remove items from the end, only the length of the list changes while the inner array stays the same - avoiding the copying.
It doesn't release the memory (you can always do that explicitly with Capacity = XXX, but that makes a new copy of the array), but then again, unless you're working with large arrays, neither does realloc - and if you're working with large arrays, yada, yada - we've been there :)
realloc doesn't really make sense in the kind of memory model .NET has anyway - the heap is continuously collected and compacted over time. So if you're trying to use it to avoid copies when just trimming an array, while also keeping memory usage low... don't bother. At the next heap compaction, all the memory above your array is going to be moved down to fill in the blanks. Even if realloc were possible, the only benefit over simply copying the array is that your array would stay where it is in the older part of the heap - and that isn't necessarily what you want anyway.
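To make the List<T> behaviour concrete: trimming from the end is cheap because only the count changes, while actually handing the memory back is a separate, explicit (and copying) step. The sizes here are just for illustration:

using System;
using System.Collections.Generic;
using System.Linq;

var list = new List<int>(Enumerable.Range(0, 1000000));

// "Truncate" the second half: only the list's internal count changes, the backing array
// is left untouched, so this is fast and allocation-free.
list.RemoveRange(list.Count / 2, list.Count / 2);

// Actually give the memory back: this allocates a new, smaller backing array and copies
// the remaining elements over - the same cost as copying to a new array yourself.
list.TrimExcess();

Console.WriteLine(list.Count);      // 500000
Console.WriteLine(list.Capacity);   // roughly 500000 after TrimExcess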
No array type in the BCL supports what you want. That being said, you can implement your own type that supports what you need. It can be backed by a standard array, but implement its own Length property and indexer that 'hide' a portion of the array from you.
public class MyTruncatableArray<T>
{
    private T[] _array;
    private int _length;

    public MyTruncatableArray(int size)
    {
        _array = new T[size];
        _length = size;
    }

    public T this[int index]
    {
        get
        {
            CheckIndex(index, _length - 1);   // valid element indices are 0 .. _length - 1
            return _array[index];
        }
        set
        {
            CheckIndex(index, _length - 1);
            _array[index] = value;
        }
    }

    public int Length
    {
        get { return _length; }
        set
        {
            CheckIndex(value);                // the new length may be anything up to the real size
            _length = value;
        }
    }

    private void CheckIndex(int index)
    {
        this.CheckIndex(index, _array.Length);
    }

    private void CheckIndex(int index, int maxValue)
    {
        if (index < 0 || index > maxValue)
        {
            throw new ArgumentException("Value must be non-negative and within the original array size");
        }
    }
}
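Usage would then look something like this:

var arr = new MyTruncatableArray<int>(10);
arr[0] = 42;

arr.Length = 5;              // "truncate": no copy, no allocation, elements 5..9 are just hidden
Console.WriteLine(arr[0]);   // 42
// arr[7] would now throw, even though the backing array still holds 10 slots

Note that this only hides the tail: the full backing array (and anything it references) stays alive, so it helps with convenience rather than with the OutOfMemoryException scenario from the question.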
It really depends on what exactly you need. (E.g., do you need to truncate just so the array is easier to use from your code, or is perf/GC/memory consumption the concern? If the latter, did you perform any measurements that prove the standard Array.Resize method unusable for your case?)

Loop of intensive processing and serializing - how to completely clean memory after each serialization in C#?

In my C# console application I instantiate an object, MyObject, of type MyType. This object contains 6 very large arrays, some containing elements of primitive types, others containing elements of other reference types; the latter can in turn contain big arrays. Instantiating all these arrays takes some intensive processing that lasts about 2 minutes.
The machine I'm working on has 4 GB of RAM and runs 32-bit Windows. Before running my console app the available memory is about 2413 MB, and right before it finishes the available memory is down to about 300-400 MB.
After I assign values to all the arrays in MyObject, I serialize it. My objective is to instantiate and serialize 50 objects like this one. So after serializing, I set all the arrays to null. [This is not reflected immediately in Task Manager, where the available memory stays at 300-400 MB, so I assume the GC does not collect immediately.] Right after this, I re-execute the method that instantiates the arrays in MyObject and get an OutOfMemoryException almost immediately. I'm thinking this is not the right approach to managing memory in .NET.
So my question is this: knowing that processing one object of type MyType, like MyObject, "fits" in the available memory, how can I instantiate one object of type MyType, serialize it, and then completely release ALL the memory that was used for this purpose? And then either re-instantiate the same object or a new object of the same type, so that in the end I get 50 different serialized objects of type MyType?
Thanks.
Updating the question with code. Here's a simplified version of my code:
class MyType
{
    int[] intArray; int[] intArray2;
    double[] doubleArray;
    RefType1[] refType1Array;
    RefType2[] refType2Array;
    RefType3[] refType3Array;

    public MyType(params)
    {
        for (int i = 0; i < 50; i++)
        {
            instantiateArrays();
            serializeObject();
            releaseMemory();
        }
    }

    private void instantiateArrays()
    {
        // instantiate all the primitive arrays with 50,000 elements per array
        // instantiate refType1Array with 300 elements, refType2Array with 3000 elements and
        // refType3Array with 150 elements
        // lasts about 2 minutes
    }

    private void serializeObject()
    {
        Stream fileStream = File.Create(filePath);
        BinaryFormatter serializer = new BinaryFormatter();
        serializer.Serialize(fileStream, this);
        fileStream.Close();
    }

    private void releaseMemory()
    {
        intArray = null;
        intArray2 = null;
        doubleArray = null;
        refType1Array = null;
        refType2Array = null;
        refType3Array = null;
    }
}
RefType1 contains integer and double fields, and another array of integers with, on average, 50 elements. RefType2 and RefType3 contain integer and double fields, and another array of reference type RefType4. On average, this array contains 500 objects. Each RefType4 object contains an array of 15 integers, on average.
You can clean up the memory with:
GC.Collect();
GC.WaitForPendingFinalizers();
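Applied to the loop in the question, that would mean forcing a full collection after each object has been serialized and its arrays nulled out, along these lines (a sketch; the method names are the ones from the question's code):

for (int i = 0; i < 50; i++)
{
    instantiateArrays();
    serializeObject();
    releaseMemory();                // null out the array fields first so the GC can reclaim them

    GC.Collect();                   // full, blocking collection of all generations
    GC.WaitForPendingFinalizers();  // let any pending finalizers (e.g. for streams) run
    GC.Collect();                   // collect whatever those finalizers just released
}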

C#.net decorator pattern converting/setting collection

I have a question about how to better optimize or speed things up; currently my code seems to be running a bit slow...
I have the following classes
public class DataFoo : IFoo { }

public class Foo
{
    internal IFoo UnderlyingDataObject { get; set; }

    public Foo(IFoo f)
    {
        UnderlyingDataObject = f;
    }
}
Now, in many cases I end up needing or calling a method that returns a List<Foo>. This method initially gets an array of DataFoo objects and iterates over all of the returned objects, instantiating a new Foo object and passing in the DataFoo... Here's an example...
public List<Foo> GetListOfFoo(Guid id)
{
    DataFoo[] q = GetArrayOfDataFoo(id);
    List<Foo> rv = new List<Foo>();
    for (var i = 0; i < q.Length; i++)
    {
        rv.Add(new Foo(q[i]));
    }
    return rv;
}
The issue is that having to iterate and instantiate like this seems pretty slow. I was curious whether anyone might have suggestions on how to speed this up...
Firstly, you should profile this carefully. Whenever you're looking at performance, don't try to guess what's going on -- measure everything. You will, more often than not, be surprised by the truth.
Your definition of GetListOfFoo could be improved slightly to avoid needless resizing of the List<Foo> by specifying the initial capacity:
DataFoo[] q = GetArrayOfDataFoo(id);
List<Foo> rv = new List<Foo>(q.Length);
But unless you're dealing with very large arrays and are concerning yourself with very small periods of time, then this won't make much difference.
There's nothing about the decorator pattern you're using that should affect your performance noticeably unless you're talking about millions of updates a second or microsecond latencies.
I would like to see what GetArrayOfDataFoo is doing. My guess is that your issue is occurring outside what you have shown us.
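If profiling does show that the per-element Foo allocations are what hurts, one option is to wrap the array lazily instead of materialising the whole List<Foo> up front. This is only a sketch (LazyFooList is a made-up name), it assumes callers only need read access, and note the trade-off: every access creates a fresh Foo, so it mainly pays off when most elements are never touched or when the big up-front list itself is the problem.

using System.Collections;
using System.Collections.Generic;

// Wraps the DataFoo array and creates each Foo only when it is actually accessed,
// so no List<Foo> and no up-front Foo instances are allocated.
public class LazyFooList : IReadOnlyList<Foo>
{
    private readonly DataFoo[] _source;

    public LazyFooList(DataFoo[] source)
    {
        _source = source;
    }

    public int Count => _source.Length;

    public Foo this[int index] => new Foo(_source[index]);

    public IEnumerator<Foo> GetEnumerator()
    {
        for (int i = 0; i < _source.Length; i++)
            yield return new Foo(_source[i]);
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}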
