We have a problem which seems to be caused by the constant allocation and deallocation of memory:
We have a rather complex system here, where a USB device measures arbitrary points and sends the measurement data to the PC at a rate of 50k samples per second. These samples are then collected as MeasurementTasks in the software for each point and afterwards processed, which requires even more memory because of the calculations involved.
Simplified, each MeasurementTask looks like the following:
public class MeasurementTask
{
    public LinkedList<Sample> Samples { get; set; }
    public ComplexSample[] ComplexSamples { get; set; }
    public Complex Result { get; set; }
}
Where Sample looks like:
public class Sample
{
    public ushort CommandIndex;
    public double ValueChannel1;
    public double ValueChannel2;
}
and ComplexSample like:
public class ComplexSample
{
    public double Channel1Real;
    public double Channel1Imag;
    public double Channel2Real;
    public double Channel2Imag;
}
In the calculation process each Sample is first turned into a ComplexSample and then further processed until we get our Complex Result. After these calculations are done we release all the Sample and ComplexSample instances, and the GC cleans them up soon after, but this results in a constant "up and down" of the memory usage.
This is how it looks at the moment with each MeasurementTask containing ~300k samples.
Now we sometimes have the problem that the sample buffer in our HW device overflows, as it can only store ~5000 samples (~100 ms) and it seems the application is not always reading the device fast enough (we use BULK transfer with LibUSB/LibUSBDotNet). We tracked this problem down to the memory "up and down" via the following facts:
the reading from the USB device happens in its own thread which runs at ThreadPriority.Highest, so the calculations should not interfere
CPU usage is between 1-5% on my 8-core CPU => <50% of one core
if we have (much) faster MeasurementTasks with only a few hundred samples each, the memory only goes up and down very little and the buffer never overflows (but the number of instances/second is the same, as the device still sends 50k samples/second)
we had a bug before which did not release the Sample and ComplexSample instances after the calculations, so the memory only went up at ~2-3 MB/s and the buffer overflowed all the time
At the moment (after fixing the bug mentioned above) we have a direct correlation between the samples count per point and the overflows. More samples/point = higher memory delta = more overflows.
Now to the actual question:
Can this behaviour be improved (easily)?
Maybe there is a way to tell the GC/runtime not to release the memory so there is no need to re-allocate?
We also thought of an alternative approach by "re-using" the LinkedList<Sample> and ComplexSample[]: Keep a pool of such lists/arrays and instead of releasing them put them back in the pool and "change" these instances instead of creating new ones, but we are not sure this is a good idea as it adds complexity to the whole system...
But we are open to other suggestions!
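One concrete possibility for the "tell the GC/runtime" idea is the runtime's latency-mode hint. A minimal sketch, assuming .NET 4.5+; RunMeasurementTasks is a hypothetical stand-in for the processing loop, and note this is a hint to defer blocking collections, not a way to disable the GC:
using System.Runtime;

// Hint (not a guarantee) that the runtime should avoid blocking full
// collections while the device is streaming; revert afterwards.
GCSettings.LatencyMode = GCLatencyMode.SustainedLowLatency;
try
{
    RunMeasurementTasks(); // hypothetical: whatever drains the device and processes tasks
}
finally
{
    GCSettings.LatencyMode = GCLatencyMode.Interactive;
}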
UPDATE:
I now optimized the code base with the following improvements and did various test runs:
converted Sample to a struct
got rid of the LinkedList<Sample> and replaced it with straight arrays (I actually had another one somewhere else, which I also removed)
several minor optimizations I found during analysis and optimization
(optional - see below) converted ComplexSample to a struct
In any case it seems that the problem is gone now on my machine (long-term tests and tests on low-spec hardware will follow), but I first ran a test with both types as structs and got the following memory usage graph:
There it was still going up to ~300 MB on a regular basis (but there were no overflow errors anymore). As this still seemed odd to me, I did some additional tests:
Side note: Each value of each ComplexSample is altered at least once during the calculations.
1) Add a GC.Collect after a task is processed and the samples are not referenced any more:
Now it was alternating between 140 MB and 150 MB (no noticeable performance hit).
2) ComplexSample as a class (no GC.Collect):
Using a class it is much more "stable" at ~140-200 MB.
3) ComplexSample as a class and GC.Collect:
Now it is going "up and down" a little in the range of 135-150 MB.
Current solution:
As we are not sure this is a valid case for manually calling GC.Collect, we are using "solution 2)" for now, and I will start running the long-term (= several hours) and low-spec hardware tests...
Can this behaviour be improved (easily)?
Yes (depends on how much you need to improve it).
The first thing I would do is to change Sample and ComplexSample to be value types. This will reduce the complexity of the object graph the GC has to deal with: while the arrays and linked lists are still collected, they contain those values directly rather than references to them, and that simplifies the rest of the collection work.
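Concretely, that is just a keyword change on the two types from the question; a sketch with their sizes noted, since both land above the usual 16-byte guideline discussed next:
public struct Sample
{
    public ushort CommandIndex;   // 2 bytes + 6 bytes padding (doubles need 8-byte alignment)
    public double ValueChannel1;  // 8 bytes
    public double ValueChannel2;  // 8 bytes: 24 bytes per value in total
}

public struct ComplexSample
{
    public double Channel1Real;   // 4 doubles: 32 bytes per value in total
    public double Channel1Imag;
    public double Channel2Real;
    public double Channel2Imag;
}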
Then I'd measure performance at this point. The impact of working with relatively large structs is mixed. The guideline that value types should be less than 16 bytes comes from it being around that point where the performance benefits of using a reference type tend to overwhelm the performance benefits of using a value type, but that guideline is only a guideline because "tend to overwhelm" is not the same as "will overwhelm in your application".
After that, if it had either not improved things or not improved things enough, I would consider using a pool of objects; whether for those smaller objects, only the larger objects, or both. This will most certainly increase the complexity of your application, but if it's time-critical, then it might well help. (See, for example, How do these people avoid creating any garbage?, which discusses avoiding normal GC in a time-critical case.)
If you know you'll need a fixed maximum of a given type, this isn't too hard; create and fill an array of them, dole them out from that array, and put them back once they are no longer used. It's still hard enough in that you no longer have automatic GC and have to manually "delete" the objects by putting them back in the pool.
If you don't have such knowledge, it gets harder but is still possible.
If it is really vital that you avoid GC, be careful of hidden allocations. Adding to most collection types can, for example, result in them moving up to a larger internal store and leaving the earlier store to be collected. Maybe this is fine, in that you've still reduced GC use enough that it is no longer causing the problem you have, but maybe not.
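For illustration, a minimal sketch of such a fixed-maximum pool for the ComplexSample[] buffers from the question (the type and policy names are illustrative, not prescribed):
using System.Collections.Concurrent;

// Pre-allocates a fixed number of buffers and doles them out; callers must
// Return() each buffer, since "deletion" is now manual rather than GC-driven.
public sealed class ComplexSampleBufferPool
{
    private readonly ConcurrentBag<ComplexSample[]> pool = new ConcurrentBag<ComplexSample[]>();
    private readonly int bufferSize;

    public ComplexSampleBufferPool(int count, int bufferSize)
    {
        this.bufferSize = bufferSize;
        for (int i = 0; i < count; i++)
            pool.Add(new ComplexSample[bufferSize]);
    }

    public ComplexSample[] Rent()
    {
        ComplexSample[] buffer;
        // Allocating on exhaustion is one policy; blocking or throwing are
        // equally valid depending on the latency requirements.
        return pool.TryTake(out buffer) ? buffer : new ComplexSample[bufferSize];
    }

    public void Return(ComplexSample[] buffer)
    {
        pool.Add(buffer);
    }
}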
I've rarely seen a LinkedList<> used in .NET... Have you tried using a List<>? Consider that the basic "element" of a LinkedList<> is a LinkedListNode<>, which is a class... so for each Sample there is the additional overhead of a whole extra object.
Note that if you want to use "big" value types (as suggested by others), the List<> could again become slower (because the List<> grows by generating a new internal array of double the current size and copying from old to new), so the bigger the elements, the more memory the List<> has to copy around when it doubles itself.
If you go to List<> you could try splitting the Sample into two parallel lists:
List<ushort> CommandIndexes;
List<ValueChannels> Values;
(where ValueChannels would be a small struct holding just ValueChannel1 and ValueChannel2). This is because the doubles of Sample require 8-byte alignment, so as written the Sample is 24 bytes, with only 18 bytes used; splitting the ushorts out removes that padding.
This wouldn't be a good idea for LinkedList<>, because the LL has a big per-item overhead.
Change Sample and ComplexSample to struct.
Related
This might sound like a very strange question, but I work on a project which needs to have circular references within it. Actually, they are even unavoidable, because users can create their own circular references within the GUI, and this is absolutely intended... Please don't ask why; this would take ages to explain.
All questions, answers, and resources I have found which discuss circular references provide solutions and approaches on how to avoid one. But none I have read contained a solution on how to make one without killing the underlying computational resources.
Issues I see
Such a circular reference seems to me to always have the potential to completely overwhelm the underlying system, be it a simple home computer or a research supercomputer where this program is meant to be run.
This is due to my understanding that the resources provided are always finite, but circular references are infinite by nature.
The resources I see which might be an issue here are:
computational power (CPU)
working memory (RAM)
Data storage
Network bandwidth
How could it be possible to mitigate those issues
Mitigation could take place by making sure that the program itself is only ever able to increase its need for computational resources in a very minor and incremental fashion. If measures are then implemented which, based on data gathered about the whole system as a unit, allow us to decide whether further evolutions are even necessary to improve the perceived quality of the system, that would help us cap the need for computational resources.
One of the ways I could imagine this capping taking place is by introducing time as a limiting factor: the program could be designed in such a way that it only considers re-evaluating "itself" after a given amount of time. If this time and the quality limit are carefully chosen to match the underlying computational resources, I feel the resource issues with circular references could be mitigated.
Code Snippet
Find below a very simplified code snippet. Point1 and Point2 are completely independent in nature; they could even be on different threads (actually that's an idea for how it could be done, but I don't understand multithreading well enough to decide whether it would be a good approach). The action first begins when they are attached to one another. I do not care whether the "first this, then that" behaviour happens in a specific order. The only thing I do care about is that all interactions between those two Points have taken place at some point in the future (after their attachment).
namespace Circularity
{
    class Program
    {
        static void Main(string[] args)
        {
            Point Point1 = new Point();
            Point Point2 = new Point();
            Point1.attach(Point2);
        }
    }

    class Point
    {
        private ulong Value;

        public Point()
        {
            Value = ulong.MaxValue / 2;
        }

        public void attach(Point otherPoint)
        {
            if (Value < ulong.MaxValue) Value++;
            // Each call immediately recurses into the other Point, so the
            // mutual recursion never unwinds and the stack keeps growing.
            otherPoint.attach(this);
        }
    }
}
This code instantly leads to a stack overflow, but I do not understand the underlying concepts of the stack well enough to implement a countermeasure. I tried to apply the time concept here already, but it just takes longer until the stack overflows.
The reason you're getting a stack overflow is that you're calling attach recursively, so you just keep adding stack frames; the CLR can't handle that many and, as you've witnessed, it quickly maxes out. One strategy here would be to use Continuation Passing Style, so you avoid building up a stack of method calls.
When and how to use continuation passing style
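A hedged sketch of the idea in C#, reworked from the question's snippet: a trampoline rather than full CPS, where each step returns the next step as data instead of calling it, so only the loop (not the stack) advances, and the loop is also where the question's time or iteration cap naturally fits:
namespace Circularity
{
    // A unit of work that, when run, yields the next unit of work.
    delegate Step Step();

    class Point
    {
        private ulong Value = ulong.MaxValue / 2;

        public Step Attach(Point otherPoint)
        {
            if (Value < ulong.MaxValue) Value++;
            // Return the continuation instead of recursing into it.
            return () => otherPoint.Attach(this);
        }
    }

    class Program
    {
        static void Main(string[] args)
        {
            Point point1 = new Point();
            Point point2 = new Point();

            // Trampoline: one stack frame deep, arbitrarily many interactions,
            // capped here at a million steps instead of overflowing.
            Step next = point1.Attach(point2);
            for (int i = 0; i < 1000000 && next != null; i++)
                next = next();
        }
    }
}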
My question might sound a little vague, but what I want to know is where the List<> buffer is maintained.
I have a List<MyClass> to which I am adding items from an infinite loop, but the RAM consumption of the Windows Service (inside which I am creating the List) never goes beyond 17 MB. In fact it hovers between 15-16 MB even as I continue adding items to the List.
I was trying to do some load testing of my service and came across this.
Can anyone tell me whether it dumps the data to some temporary location on the machine and picks it up from there, as I don't see an increase in RAM consumption?
The method which I am calling infinitely is AddMessageToList().
class MainClass
{
    List<MessageDetails> messageList = new List<MessageDetails>();

    private void AddMessageToList()
    {
        // ApplicationName, Address and Message are fields populated elsewhere in the service.
        SendMessage(ApplicationName, Address, Message);

        MessageDetails obj = new MessageDetails();
        obj.ApplicationName = ApplicationName;
        obj.Address = Address;
        obj.Message = Message;

        lock (messageList)
        {
            messageList.Add(obj);
        }
    }
}
class MessageDetails
{
    public string Message { get; set; }
    public string ApplicationName { get; set; }
    public string Address { get; set; }
}
The answer to your question is: "In Memory".
That can mean RAM, and it can also mean the hard drive (virtual memory). The OS memory manager decides when to page memory out to disk, which mostly has to do with how often the memory is accessed (though I don't pretend to know Microsoft's specific algorithm).
You also asked why your memory usage isn't going up. First off, a megabyte is a HUGE amount of memory. Unless your class is quite large, you will need a LOT of instances for another MB to appear. Eventually your memory usage should go up, though.
In general C# objects are created from the Heap, which resides in memory. If you want to store things on disk there are ways to go about it, but a standard List<T> will live in memory.
When you create an object it will occupy a certain number of bytes in memory, plus the size of the pointers used to reference it. Adding it to a list only adds a pointer to the object you've already created, so if you're adding lots of copies of the same instance to the list, it won't grow as fast as you expect.
If you really want to test the impact of large data structures on your memory, you're going to have to ramp up the numbers. A few thousand average objects aren't going to occupy much memory, but a few million might.
You might also be interested in the GC.GetTotalMemory() method and its friends.
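For example, a rough way to see what the list actually holds; GC.GetTotalMemory is a real API but approximate by design, the numbers will vary by runtime, and MessageDetails is the class from the question:
long before = GC.GetTotalMemory(true);

var list = new List<MessageDetails>();
for (int i = 0; i < 1000000; i++)
    list.Add(new MessageDetails { Message = "msg", ApplicationName = "app", Address = "addr" });

long after = GC.GetTotalMemory(true);
// A million small objects shows up as tens of MB; a few thousand would barely register.
Console.WriteLine("Approximate bytes held: " + (after - before));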
Note that pretty much all memory on Windows (and in .NET) is virtual memory; its "real, physical" location is arbitrary, and Windows memory management handles that. However, regardless of whether it's currently backed by physical RAM or by a page file on the HDD, it will show up as committed private memory.
So it's up to how you're actually creating the items and adding them to the List<T>. How many objects are there? Are you adding the same object over and over again, or creating a new one every time? Are you using the same List instance, or are you creating others? Do you keep references to the created objects (and List instances), or are you throwing them away? Do you actually do anything with the object / list? If not, the optimizer might have removed the code altogether (it's very conservative, though, so I wouldn't count on that when adding items to a list - that's a very complex scenario with possible side effects).
In the lowest-memory ideal case, you could be using about four bytes per list item; that's not much - you'd need 262,144 items to consume a single MiB of memory!
Show us your code, the whole loop and its surroundings; then we can tell you what you're actually doing.
EDIT: This is in a WCF service? You should have said so before. Where do you store the MainClass class? If it's inside the WCF service class, it might not last longer than a single request. And even if you fix that, and store it in something a bit more persistent, like a static class, you get into the complexities of when everything is collected, how the service is being restarted etc. If you need the data to be safely held for longer than a single request, storing it in process memory isn't good enough. If you don't care that the data can get thrown away once in a while, you can make the List instance static (it's not going to be shared nor persisted otherwise). Otherwise, use a database.
Speculating from your sparse description, there are two sorts of things you might not realize:
The 15-16 MB usage you see might have nothing to do with the size of your list: it could be the memory requirements for the rest of the program, and your list only consumes a negligible amount of memory in comparison. Even if you don't explicitly create objects, your program still has to load libraries and stuff, which takes memory.
I don't know C#, so I don't know if this applies to List, but one standard way to implement a container class is to dynamically allocate an array to hold the objects... and if the array ever fills up, you allocate a new array of twice the size, copy everything over to the new array, and continue along (the actual growth ratio may be something other than 2). This can have the effect that your memory usage remains constant for a long time, until you finally fill up the array and it suddenly jumps in size, only to remain constant again for a long time.
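List<T> in C# behaves exactly this way (the Microsoft implementation starts at capacity 4 and doubles); you can watch the jumps through the Capacity property:
var list = new List<int>();
int lastCapacity = -1;
for (int i = 0; i < 100; i++)
{
    list.Add(i);
    if (list.Capacity != lastCapacity)
    {
        Console.WriteLine(list.Capacity); // prints 4, 8, 16, 32, 64, 128
        lastCapacity = list.Capacity;
    }
}
// Each doubling abandons the old backing array to the garbage collector.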
OK, I want to do the following; to me it seems like a good idea, so if there's no way to do what I'm asking, I'm sure there's a reasonable alternative.
Anyways, I have a sparse matrix. It's pretty big and mostly empty. I have a class called MatrixNode that's basically a wrapper around each of the cells in the matrix. Through it you can get and set the value of that cell. It also has Up, Down, Left and Right properties that return a new MatrixNode that points to the corresponding cell.
Now, since the matrix is mostly empty, having a live node for each cell, including the empty ones, is an unacceptable memory overhead. The other solution is to make a new instance of MatrixNode every time a node is requested. This will make sure that only the needed nodes are kept in memory and the rest will be collected. What I don't like about it is that a new object has to be created every time; I'm scared of that being too slow.
So here's what I've come up with. Have a dictionary of weak references to nodes. When a node is requested, if it doesn't exist, the dictionary creates it and stores it as a weak reference. If the node does already exist (probably referenced somewhere), it just returns it.
Then, if the node doesn't have any live references left, instead of letting it be collected, I want to store it in a pool. Later, when a new node is needed, I want to first check the pool and only make a new node if there isn't one already available that can just have its data swapped out.
Can this be done?
A better question would be, does .NET already do this for me? Am I right in worrying about the performance of creating single use objects in large numbers?
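For reference, a minimal sketch of the weak-reference part of the idea (the MatrixNode constructor and the packed key are assumptions, and pruning of dead dictionary entries is omitted); whether the extra pooling on top of this pays off is exactly what the answers below address:
class NodeCache
{
    private readonly Dictionary<long, WeakReference> nodes = new Dictionary<long, WeakReference>();

    public MatrixNode Get(int row, int col)
    {
        long key = ((long)row << 32) | (uint)col; // pack (row, col) into one key
        WeakReference weak;
        if (nodes.TryGetValue(key, out weak))
        {
            var alive = weak.Target as MatrixNode;
            if (alive != null)
                return alive; // still referenced somewhere: hand out the same node
        }
        var node = new MatrixNode(row, col); // hypothetical constructor
        nodes[key] = new WeakReference(node);
        return node;
    }
}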
Instead of guessing, you should make a performance test to see if there are any issues at all. You may be surprised to know that managed memory allocation can often outperform explicit allocation because your code doesn't have to pay for deallocation when your data goes out of scope.
Performance may become an issue only when you are allocating new objects so frequently that the garbage collector has no chance to collect them.
That said, there are sparse array implementations for C# already, like Math.NET and MetaNumerics. These libraries are already optimized for performance and will probably let you avoid the performance issues you would run into if you started your implementation from scratch.
An SO search for c# and sparse-matrix returns many related questions, including answers pointing to commercial libraries like ILNumerics (which has a community edition), NMath, and Extreme Optimization's libraries.
Most sparse matrix implementations use one of a few well-known schemes for their data; I generally recommend CSR or CSC, as those are efficient for common operations.
If that seems too complex, you can start with COO. What this means in your code is that you will not store anything for empty members; however, you have an item for every non-empty one. A simple implementation might be:
public struct SparseMatrixItem
{
    public int Row;
    public int Col;
    public double Value;
}
And your matrix would generally be a simple container:
public interface SparseMatrix
{
    IList<SparseMatrixItem> Items { get; }
}
You should make sure that the Items list stays sorted by the row and col indices, because then you can use binary search to quickly find out whether an item exists for a specific (i, j).
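A hedged sketch of that lookup, assuming the fields above are public and the list is kept sorted by (Row, Col); Comparer<T>.Create needs .NET 4.5+:
static int FindItem(List<SparseMatrixItem> items, int i, int j)
{
    var probe = new SparseMatrixItem { Row = i, Col = j };
    // Non-negative result: index of the stored item; negative: not present,
    // i.e. the cell is empty/zero.
    return items.BinarySearch(probe, Comparer<SparseMatrixItem>.Create(
        (a, b) => a.Row != b.Row ? a.Row.CompareTo(b.Row) : a.Col.CompareTo(b.Col)));
}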
The idea of having a pool of objects that people use and then return to the pool is used for really expensive objects. Objects representing a network connection, a new thread, etc. It sounds like your object is very small and easy to create. Given that, you're almost certainly going to harm performance pooling it; the overhead of managing the pool will be greater than the cost of just creating a new one each time.
Having lots of short lived very small objects is the exact case that the GC is designed to handle quickly. Creating a new object is dirt cheap; it's just moving a pointer up and clearing out the bits for that object. The real overhead for objects comes in when a new garbage collection happens; for that it needs to find all "alive" objects and move them around, leaving all "dead" objects in their place. If your small object doesn't live through a single collection it has added almost no overhead. Keeping the objects around for a long time (like, say, by pooling them so you can reuse them) means copying them through several collections, consuming a fair bit of resources.
Normally I'd never have to ask myself whether a given scenario is better suited to a struct or a class, and frankly I did not ask that question before going the class way in this case. Now that I'm optimizing, things are getting a little confusing.
I'm writing a number crunching application that deals with extremely large numbers containing millions of Base10 digits. The numbers are (x,y) coordinates in 2D space. The main algorithm is pretty sequential and has no more than 200 instances of the class Cell (listed below) in memory at any given time. Each instance of the class takes up approximately 5MB of memory resulting in no more than 1GB in total peak memory for the application. The finished product will run on a 16 core machine with 20GB of RAM and no other applications hogging up the resources.
Here is the class:
// Inheritance is convenient but not absolutely necessary here.
public sealed class Cell : CellBase
{
    // Will contain numbers with millions of digits (512KB on average).
    public System.Numerics.BigInteger X = 0;

    // Will contain numbers with millions of digits (512KB on average).
    public System.Numerics.BigInteger Y = 0;

    public double XLogD = 0D;

    // Size of the array is roughly Base2Log(this.X).
    public byte[] XBytes = null;

    public double YLogD = 0D;

    // Size of the array is roughly Base2Log(this.Y).
    public byte[] YBytes = null;

    // Tons of other properties for scientific calculations on X and Y.
    // NOTE: 90% of the other fields and properties are structs (similar to BigInteger).

    public Cell(System.Numerics.BigInteger x, System.Numerics.BigInteger y)
    {
        this.X = x;
        this.XLogD = System.Numerics.BigInteger.Log(x, 2);
        this.XBytes = x.ToByteArray();

        this.Y = y;
        this.YLogD = System.Numerics.BigInteger.Log(y, 2);
        this.YBytes = y.ToByteArray();
    }
}
I chose to use a class instead of a struct simply because it 'felt' more natural. The number of fields, methods and memory all instinctively pointed to classes as opposed to structs. I further justified that by considering how much overhead temporary assignment calls would have since the underlying primary objects are instances of BigInteger, which itself is a struct.
The question is, have I chosen wisely here considering speed efficiency is the ultimate goal in this case?
Here's a bit about the algorithm in case it helps. In each iteration:
Sorting performed once on all 200 instances. 20% of execution time.
Calculating neighboring (x,y) coordinates of interest. 60% of execution time.
Parallel/Threading overhead for point 2 above. 10% of execution time.
Branching overhead. 10% of execution time.
The most expensive function: BigInteger.ToByteArray() (implementation).
This would be a better fit as a class, for many reasons, including:
It doesn't logically represent a single value
It's larger than 16 bytes
It's mutable
For details, see Choosing Between Classes and Structures.
In addition, I'd also suggest that it's better suited to a class given:
It contains reference types (arrays); structures containing reference types are rarely a good design idea.
This is especially true, though, given what you're doing. If you were to use a struct, sorting would require copies of the entire struct, instead of just copies of the references. Method calls (unless passed by ref) would incur a huge overhead, as well, since you'd be copying all of the data.
Parallelizing work on items in a collection could also incur huge overhead, as the bounds checking on any array of the struct (i.e. if it's kept in a List<Cell> or similar) would cause bad false sharing, since all access into the list touches the memory at the start of the list.
I would recommend leaving this as a class, and, in addition, I would suggest trying to move the fields into properties, and making the class as immutable as possible. This will help keep your design clean, and less likely to be problematic when multithreading.
It's hard to tell based on what you've written (we don't know how often you end up copying a value of type Cell for example) but I would strongly expect a class to be the correct approach here.
The number of methods in the class is irrelevant, but if it has lots of fields you need to consider the impact of copying all those fields any time you pass a value to another method (etc).
Fundamentally it doesn't feel like a value type to start with - but I understand that if performance is particularly important, the philosophical aspects may not be as interesting to you.
So yes, I think you've made the right decision, and I see no reason to believe anything else at the moment - but of course if you can easily change the decision and test it as a struct, that would be better than guesswork. Performance is remarkably difficult to predict accurately.
Since your class contains arrays, which consume most of your memory, and you have only 200 Cell instances around, the memory consumption of the class itself is not an issue. You were right that a class felt more natural; it is indeed the right choice. My guess would be that the comparison of the XBytes and YBytes arrays limits your sorting time. It all depends on how big your arrays are and how you perform the comparison.
Let's start ignoring the performance matters, and work up to them.
Structs are ValueTypes, and ValueTypes are value-types. Integers and DateTimes are value-types and a good comparison. There's no sense in talking about how one 1 is or isn't the same as another 1, or how one 2010-02-03T12:45:23.321Z is or isn't the same as another 2010-02-03T12:45:23.321Z. They may have different significance in different uses, but that 1 == 1 and 1 != 2, and that 2010-02-03T12:45:23.321Z == 2010-02-03T12:45:23.321Z and 2010-02-03T12:45:23.321Z != 2931-03-05T09:21:29.43Z, is inherent to the nature of integers and date-times, and that's what makes them value-types.
That's the purest way of thinking about this. If it matches the above it's a value-type, if it doesn't, it's a reference type. Nothing else comes into it.
Extension 1: If an X can have an X then it has to be a reference type. Whether this logically follows from what was said above is debatable, but whatever you think on the matter you can't have a struct that has an instance of another one of itself as a member (directly or indirectly) in practice, so that's that.
Extension 2: Some say that the difficulties that come from mutable structs come from the above, and some do not. Again though, whatever you think on the matter, there are practical difficulties. A mutable struct can be useful in a few cases, but they cause enough confusion that they should be restricted to private cases as an optimisation rather than public cases as a matter of course.
Here comes the performance bit...
Value types and reference types have different characteristics that affect speed, memory use, and the way that memory use affects garbage collection, giving each different pros and cons as far as performance goes. How much attention we pay to that depends on how far down to that level we need to get. It's worth saying right now that the ways in which they differ tend to balance out to a win if you follow the above rule on deciding between struct and class, so if we start thinking beyond that, we're at least bordering on optimisation territory.
Optimisation level 1.
If a value type will be more than 16 bytes per instance, it should probably be made a reference type. This is sometimes even stated as a "natural" difference rather than one of optimisation. Strictly, there's nothing in "value type" that entails "16 or fewer bytes", but it does tend to balance out that way.
Moving away from the simplistic "16 bytes" rule, the smaller it is the faster it is to copy, and contrary-wise, so bending it for a 20-byte instance is of less impact than bending it for a 200-byte instance.
Will you need to box and unbox a lot? Since the introduction of generics we've been able to avoid a lot of the cases where we would box and unbox with 1.0 and 1.1, so this isn't as big a deal as it once was, but if you do, it will hurt performance.
Optimisation level 2.
The fact that value types can be placed on the stack, placed directly in an array (rather than references to them), and be direct fields of a struct or class (again, rather than references to them) can make access to them and to their fields faster.
If you're going to create an array of them and all-zero values are a useful starting point for you, you get that immediately, whereas with reference types you get an array of nulls. This can make structs faster.
Edit: Something that extends from the above: if you are going to be iterating through arrays rapidly, then as well as the direct access giving a boost over following the reference, you'll be loading several instances into the CPU cache at a time (64 bytes' worth on current x86-32 or x86-64/amd64, 128 bytes' worth on ia-64). It has to be a pretty tight loop to matter, but there are cases where it does.
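A small illustration of both points, with hypothetical types (not from the question):
struct PointValue { public double X, Y; }  // value type
class PointRef { public double X, Y; }     // reference type

var values = new PointValue[1000]; // one contiguous block, every element already (0, 0)
var refs = new PointRef[1000];     // 1000 nulls; each element still needs its own allocation
for (int i = 0; i < refs.Length; i++)
    refs[i] = new PointRef();      // 1000 separate objects scattered across the heap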
Pretty much most "I went for struct rather than class for performance" comes down to either the first point, or the first in combination with the second.
Optimisation level 3.
If you will have cases where some of the values you are concerned with are duplicates of each other, and they are large in size, then with immutable instances (or mutable instances you simply never mutate once you start doing what follows), you can deliberately alias different references so that you save a lot of memory, because your e.g. 20 duplicate objects of 2 KiB in size are actually one object, hence saving 38 KiB in that case. It can also make comparisons faster, because the cases where you can short-cut on identity are more frequent. This can only be done with reference types.
Optimisation level 4.
Structs that have arrays do, though, alias the contained array and could internally use the above technique, balancing out that point, though it's somewhat more involved.
Optimisation level X.
It doesn't matter how much thinking about these pros and cons leads to a particular answer if actually measuring the results leads to a different one. Since there are both pros and cons, it's always possible to get this wrong.
In thinking about 1 through 4, along with the differences between value and reference types aside from such optimisation concerns, I think you should go for a class.
In thinking about level X, I wouldn't be amazed if actually testing it proved me wrong. The best bit is, if it is arduous to change from class to struct (e.g. you make heavy use of aliasing or the possibility of a null value), then you can be pretty confident that doing so is a loss. If it isn't arduous, then you can just do so and measure! I'd strongly suggest measuring with a test that involves a real run, rather than doing some operation 10,000 times - who cares if you can do a given operation 10,000 times in a few seconds less, if you do a different operation 20 times more often in the real thing?
A struct can only contain an array-type field safely if either (1) the state of the struct depends upon the identity of the array rather than its contents (as is the case with ArraySegment), or (2) no reference to the array will ever be held by anything that might try to mutate it (typically, this means that the array field will be private, and the struct itself will create the array and perform all modifications that will ever be done to it, before storing a reference in the field).
I advocate using structs much more commonly than other people here, but the fact that your data storage thingie would have two array-type fields would seem a strong argument against using a struct.
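A minimal sketch of pattern (2) from the answer above, with a hypothetical type:
// The struct creates and owns its array; no outside reference can mutate it.
public struct DigitBuffer
{
    private readonly byte[] digits;

    public DigitBuffer(byte[] source)
    {
        digits = (byte[])source.Clone(); // defensive copy on the way in
    }

    // Read-only access on the way out. Note that default(DigitBuffer)
    // leaves the array null, a classic struct caveat.
    public byte this[int i]
    {
        get { return digits[i]; }
    }
}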
There is an established guideline that getting a hashcode should not allocate memory because this will negatively impact hash table lookups by invoking the garbage collector.
Yet this exact failing is what I see when I profile my application, which uses a System.Collections.Generic.Dictionary.
Way deep down in a very tight loop I find the following in my profiler results:
[3.47%] TryGetValue(TKey, TValue&) (...Dictionary)
    [3.47%] FindEntry(TKey) (...Dictionary)
        [3.47%] GetHashCode(string) (System.CultureAwareComparer)
            [3.46%] GetHashCodeOfString(String, CompareOptions) (System.Globalization.CompareInfo)
                [3.39%] [Garbage Collection]
                [0.01%] [Thread Suspended]
That's the whole sub-tree accounting from the profiler.
I'm not a seasoned expert in this specific sort of work, so I could be reading these tea leaves incorrectly, but it looks to me like GetHashCodeOfString "must be" allocating memory and inviting the garbage collector to interrupt my program in the middle of this loop I want REALLY TUNED AND TIGHT, and that this accounts for the staggering majority of the cost of this loop.
As an aside, here is an additional piece of evidence suggesting this code allocates memory
My next step will be to initialize the Dictionary with the ordinal comparer and re-run my tests.
But I want to know if there is existing wisdom out there around this issue. It seems like dictionaries with string keys are common, and the costs of such a common thing may be well explored. I found the following analysis, but it focuses on the actual comparison as the cause for woe, and not the hash code method allocating memory.
Can anyone suggest the proper way to use a dictionary with string keys that avoids this problem?
Specific questions I have include:
If I use the ordinal comparer, will the allocation go away?
If not, do I need to write my own comparer, and will THAT make the allocation go away?
If I do make the allocation go away, can I really expect a real improvement, as per the MSFT recommendation link I started with?
EDIT: Crud, my bad, but this is not with the default comparer properties; we have it set to ignoreCase. Not sure if this impacts the results, but since ignoreCase impacts equality, it must therefore have some impact on the hash.
UPDATE: I ran another test using the ordinal comparer (still with IgnoreCase) and recast the original results output to 100% cost = TryGetValue, so it would be more apples-to-apples:
Original:
100% TryGetValue
    100% FindEntry
        99.5% CultureAwareComparer.GetHashCode
            99.5% CompareInfo.GetHashCodeOfString
                95.86% [Garbage Collection]
                3.31% [Thread Suspended]
        0.5% CultureAwareComparer.Equals
            0.5% Compare
            0.5% [Garbage Collection]
Ordinal:
100% TryGetValue
    100% FindEntry
        47.22% CultureAwareComparer.Equals
        47.22% [Garbage Collection]
There also appeared to be a dramatic decrease in the overall time spent in TryGetValue. I was not careful to make sure all else was equal, but this accounted for 46 seconds out of a 10-minute stress test in the first run, whereas in the ordinal run it accounted for 252 milliseconds. Consider that anecdotal, not an expected relative cost.
It seems like the entire cost of the hash, which used to be 99+% of the cost, is now so "free" that it fails to even appear in the profiler, which I think is running in sampling mode.
I guess this seconds the word on the street that you should use ordinal comparison.
I still can't PROVE to myself why the GC cost contributes so heavily to the first profile result, but from the comments below I suppose I have to believe it does NOT allocate managed heap memory, but that, because it's slow, it tends to be the function that is "randomly" interrupted by collections triggered by other activity on other threads, as this process is indeed using server-mode GC.
Maybe this indicates that this tight loop tends to run concurrently with allocation-happy code someplace else.
By default, when you use string keys, string.GetHashCode() is used. This method doesn't allocate any memory on the heap, and should be pretty fast.
But since you're using ignore case, CultureAwareComparer.GetHashCode() is used instead. That method calls (as can be seen from your profile results) CompareInfo.GetHashCodeOfString(), which in turn calls the unmanaged function InternalGetGlobalizedHashCode(). Neither of the two managed methods makes any heap allocations (as you can see if you look at them in a decompiler). I can't say what InternalGetGlobalizedHashCode() does, but since it is unmanaged, I doubt it makes any allocations on the managed heap. In any case, it has to be quite a lot more complex than the default hash code computation, especially since it is culture-aware and has to keep in mind issues like the Turkish İ.
What this means is that you probably have some other code that allocates memory on the heap, which causes the garbage collection.
And if you are going for maximum performance, you should avoid “ignore case”, and especially its culture-aware variants.
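In code, the change amounts to one constructor argument; StringComparer.OrdinalIgnoreCase keeps the case-insensitive behaviour while avoiding the culture-aware hashing path in the profile, and StringComparer.Ordinal is faster still if case sensitivity is acceptable:
var map = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
map["SomeKey"] = 1;

int value;
map.TryGetValue("SOMEKEY", out value); // hit: ordinal, case-insensitive hash and compare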