I have a performance-counter reporter class that holds several lists in different members. Is there a neat way to tell the class the memory limit of its consumption (to be bulletproof in case the lists get pushed an enormous amount of data), or should I go to each member and change it to a blocked (bounded) list? (That is less dynamic, in a way.)
What you're asking doesn't make sense. How could a class limit its memory consumption?
Consider: you have a public property that is a list of data. You set the value of that property to be a 2GB set of data but the class is limited to 100MB. How does the class decide what data to throw away? What happens to the data that's thrown away? How does the rest of your program deal with the fact that half its data has disappeared?
None of these questions has a sensible general answer, because each program will answer them differently. For that reason, you'd have to implement such logic yourself.
More importantly, though, consider this: if I create a List<int> that contains 2GB of data and assign this list to a property of your "reporter class," the memory consumption of your reporter class doesn't change. This is because the property's type, List<int>, is a reference type: the property stores the memory address of a List<int> that lives somewhere else on the heap. This memory address (a pointer to what we consider the "value" of the property) has a size fixed by the architecture of your machine/application, and it never changes: it's the same size when the pointer is null as when it points to a 2GB list. So in that sense, the memory consumption of your class itself won't be as big as you think.
You can redefine the question to say "when calculating consumption, include all objects pointed to by my properties" but this has its own problems. What happens if you assign the List<int> to a property on two different objects, each with its own memory limit?
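A small C# sketch of both points (the `Holder` and `SharedListDemo` types are hypothetical): the field holds only a reference whose size is `IntPtr.Size` (4 or 8 bytes, depending on the process), and two objects can share one list, so counting "the memory of each object" would count the same data twice.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical holder standing in for the "reporter class".
public sealed class Holder
{
    // Stores only a reference (4 or 8 bytes), never the list's contents.
    public List<int> Data;
}

public static class SharedListDemo
{
    // Returns two holders that point at one and the same list instance.
    public static (Holder First, Holder Second) CreateSharing(List<int> list)
    {
        var first = new Holder { Data = list };
        var second = new Holder { Data = list };
        return (first, second);
    }

    // Size in bytes of a reference in the current process.
    public static int ReferenceSize => IntPtr.Size;
}
```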
Also, if your reporting class has two properties that could hold large data, and I assign large values to each, how do you decide what to throw away? If I have a 100MB limit for the class, and assign 200MB of data to one property and 1GB of data to the other, which data do I truncate? What would happen if I then cleared one of the properties - I now have "spare" memory consumption but data is irretrievably lost.
In short: this is a very complex requirement. You'd have to write your own logic to implement it, and you're unlikely to find anything "standard" to handle it, because no two implementations would be the same.
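For illustration only, here is one such home-grown policy, assuming a count-based limit is acceptable (measuring bytes would mean walking every object each element references). The hypothetical `BoundedBuffer<T>` evicts the oldest item once it is full and counts what it drops, so the rest of the program can at least find out that data was lost.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical count-bounded buffer: once Capacity items are stored,
// each new item evicts the oldest one.
public sealed class BoundedBuffer<T>
{
    private readonly Queue<T> _items = new Queue<T>();
    private readonly object _sync = new object();
    private long _dropped;

    public BoundedBuffer(int capacity)
    {
        if (capacity <= 0) throw new ArgumentOutOfRangeException(nameof(capacity));
        Capacity = capacity;
    }

    public int Capacity { get; }

    public int Count { get { lock (_sync) return _items.Count; } }

    // Number of items evicted so far.
    public long Dropped { get { lock (_sync) return _dropped; } }

    public void Add(T item)
    {
        lock (_sync)
        {
            if (_items.Count == Capacity)
            {
                _items.Dequeue(); // discard the oldest entry
                _dropped++;
            }
            _items.Enqueue(item);
        }
    }

    // Copy of the current contents, oldest first.
    public T[] Snapshot()
    {
        lock (_sync) return _items.ToArray();
    }
}
```

Each list member of the reporter could be switched to such a buffer with its own limit; another application might prefer to reject new items or block the producer instead.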
Related
I want to create a collection class that can collect any type of data (string, int, float). So I decided to use a List<object> structure to store any kind of data.
Since a List is safer (managed) than an unmanaged array, I want to use a List that can hold any kind of data. However, I'm concerned that if I create a List<object> and store some strings in it, there could be memory leaks because of the string type.
So, after using (emptying) the List, do I have to deallocate the strings individually, or does .NET already handle that?
Is there a nicer way to create a general collection class?
You won't need to garbage-collect any objects yourself unless you really have to, and according to your post that's not the case here; this is only necessary in a few cases (you may look here for what these cases might be).
In general, .NET frees any memory to which no references exist (regardless of whether it holds an int, a string or any custom object). Thus, once no references to your array, list or whatever you use remain (e.g. you leave its scope), the contained elements will be eliminated by the GC at some non-deterministic point in time, and you don't have to take care of this.
What you mean by managed and unmanaged is probably the fact that a List is a bit more dynamic, as it can change its size depending on the current number of elements: if this number exceeds the current capacity, the list automatically grows by a given factor. An array, however, is fixed in size. The term "unmanaged" refers to code such as native C++; C# code is always managed code (which means, for example, that its memory is handled by a garbage collector). See Wikipedia on Managed Code.
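A quick sketch of the fixed-size versus growing difference (type and method names are made up for the example). Both the array and the list here are managed: once nothing references them, the GC reclaims them along with the strings they hold, with no manual deallocation.

```csharp
using System;
using System.Collections.Generic;

public static class ArrayVersusList
{
    // An array's length is fixed when it is created; writing past the end throws.
    public static bool ArrayRejectsExtraElement()
    {
        var array = new object[2];
        array[0] = "text";
        array[1] = 42;
        try
        {
            array[2] = 3.14f;
            return false;
        }
        catch (IndexOutOfRangeException)
        {
            return true;
        }
    }

    // A List<object> grows its internal array automatically as items are added.
    public static List<object> ListAcceptsAnyNumber()
    {
        var list = new List<object> { "text", 42, 3.14f };
        list.Add("more text");
        return list;
    }
}
```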
My question might sound a little vague. But what I want to know is where the List<> buffer is maintained.
I have a list List<MyClass> to which I am adding items from an infinite loop. But the RAM consumption of the Windows Service (inside which I am creating the List) never goes beyond 17 MB. In fact, it hovers between 15-16 MB even if I continue adding items to the List.
I was trying to do some load testing of my service and came across this.
Can anyone tell me whether it dumps the data to some temporary location on the machine and reads it back from there, since I don't see any increase in RAM consumption?
The method which I am calling infinitely is AddMessageToList().
class MainClass
{
    // ApplicationName, Address, Message and SendMessage are defined elsewhere in the service.
    List<MessageDetails> messageList = new List<MessageDetails>();

    private void AddMessageToList()
    {
        SendMessage(ApplicationName, Address, Message);
        MessageDetails obj = new MessageDetails();
        obj.ApplicationName = ApplicationName;
        obj.Address = Address;
        obj.Message = Message;
        lock (messageList)
        {
            messageList.Add(obj);
        }
    }
}

class MessageDetails
{
    public string Message { get; set; }
    public string ApplicationName { get; set; }
    public string Address { get; set; }
}
The answer to your question is: "In Memory".
That can mean RAM, and it can also mean the hard drive (the page file that backs virtual memory). The OS memory manager decides when to page memory out to disk, which mostly has to do with how often the memory is accessed (though I don't pretend to know Microsoft's specific algorithm).
You also asked why your memory usage isn't going up. First off, a megabyte is a HUGE amount of memory. Unless your class is quite large, you will need a LOT of instances to make a MB appear. Eventually your memory usage should go up, though.
In general, C# objects are allocated on the heap, which resides in memory. If you want to store things on disk there are ways to go about it, but a standard List<T> will live in memory.
When you create an object it will occupy a certain number of bytes in memory plus the size of the pointers used to reference it. Adding it to a list only adds a pointer to the object you've already created, so if you're adding lots of copies of the same instance into the list, it won't grow as fast as you expect.
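To see the difference between adding one instance many times and adding new instances, here's a sketch with a hypothetical `Payload` type: in the first case every list slot refers to the same object, so only the list's internal reference array grows.

```csharp
using System.Collections.Generic;

// Hypothetical element type carrying roughly 2 KB of character data.
public sealed class Payload
{
    public string Text = new string('x', 1000);
}

public static class SameInstanceDemo
{
    // One Payload, many references to it.
    public static List<Payload> AddSameInstance(int times)
    {
        var shared = new Payload();
        var list = new List<Payload>(times);
        for (int i = 0; i < times; i++) list.Add(shared);
        return list;
    }

    // A separate Payload object for every entry.
    public static List<Payload> AddNewInstances(int times)
    {
        var list = new List<Payload>(times);
        for (int i = 0; i < times; i++) list.Add(new Payload());
        return list;
    }
}
```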
If you really want to test the impact of large data structures on your memory, you're going to have to ramp up the numbers. A few thousand average objects aren't going to occupy much memory, but a few million might.
You might also be interested in the GC.GetTotalMemory() method and its friends.
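A rough sketch of using `GC.GetTotalMemory` for this kind of test, assuming a hypothetical `ProbeMessage` type shaped like `MessageDetails`. Passing `true` forces a full collection before reading the value, so the result approximates live managed memory rather than what Task Manager shows.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical type with the same shape as MessageDetails.
public sealed class ProbeMessage
{
    public string ApplicationName;
    public string Address;
    public string Text;
}

public static class MemoryProbe
{
    // Managed-heap growth (bytes) caused by keeping `count` distinct objects in a list.
    public static long MeasureGrowth(int count)
    {
        long before = GC.GetTotalMemory(true);

        var list = new List<ProbeMessage>();
        for (int i = 0; i < count; i++)
        {
            list.Add(new ProbeMessage { ApplicationName = "app", Address = "addr", Text = "msg" + i });
        }

        long after = GC.GetTotalMemory(true);
        GC.KeepAlive(list); // keep the list reachable until after the second measurement
        return after - before;
    }
}
```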
Note that pretty much all memory on Windows (and .NET) is Virtual Memory - its "real, physical" location is arbitrary, Windows memory management handles that. However, regardless of whether it's currently using physical RAM or a page file on the HDD, it will show up as committed private memory.
So it depends on how you're actually creating the items and adding them to the List<T>. How many objects are there? Are you adding the same object over and over again, or creating a new one every time? Are you using the same List instance, or are you creating others? Do you keep references to the created objects (and List instances), or are you throwing them away? Do you actually do anything with the objects / list? If not, the optimizer might have removed the code altogether (it's very conservative, though, so I wouldn't count on that when adding items to a list, which is a very complex scenario with possible side effects).
In the ideal lowest-memory case, you could be using about four bytes per list item (the size of a reference in a 32-bit process; it's eight bytes in a 64-bit process). That's not much: at four bytes per item you'd need 262 144 items to consume a single MiB of memory!
Show us your code, the whole loop and its surroundings. Then we can tell you what you're actually doing.
EDIT: This is in a WCF service? You should have said so before. Where do you store the MainClass class? If it's inside the WCF service class, it might not last longer than a single request. And even if you fix that, and store it in something a bit more persistent, like a static class, you get into the complexities of when everything is collected, how the service is being restarted etc. If you need the data to be safely held for longer than a single request, storing it in process memory isn't good enough. If you don't care that the data can get thrown away once in a while, you can make the List instance static (it's not going to be shared nor persisted otherwise). Otherwise, use a database.
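A minimal sketch of the static-list option, assuming a hypothetical `MessageService` class that may be instantiated once per call: the static field outlives the instances but is still lost whenever the process is recycled, which is why a database is the safer choice for data that must survive.

```csharp
using System.Collections.Generic;

// Hypothetical service class; instances may be short-lived (e.g. one per request).
public sealed class MessageService
{
    // Shared by all instances for the lifetime of the process/AppDomain.
    private static readonly List<string> Messages = new List<string>();
    private static readonly object Sync = new object();

    public void Record(string message)
    {
        lock (Sync)
        {
            Messages.Add(message);
        }
    }

    public static int StoredCount
    {
        get { lock (Sync) return Messages.Count; }
    }
}
```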
Speculating from your sparse description, there are two sorts of things you might not realize:
The 15-16 MB usage you see might have nothing to do with the size of your list: it could be the memory requirements for the rest of the program, and your list only consumes a negligible amount of memory in comparison. Even if you don't explicitly create objects, your program still has to load libraries and stuff, which takes memory.
I don't know C# so I don't know if this applies to List, but one of the standard container implementations is to dynamically allocate an array to hold the objects; if the array is ever filled, you allocate a new array twice the size, copy everything over to the new array and continue along (the actual ratio may be something other than 2). This can have the effect that your memory usage remains constant for a long time, until you finally fill up the array and it suddenly jumps up in size, only to remain constant again for a long time.
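This does apply to List<T>. The sketch below (method name made up) records each distinct `Capacity` value as items are added; in current .NET implementations the capacity starts at 4 and doubles each time the array is full, but that's an implementation detail, so the code only relies on it growing in steps.

```csharp
using System.Collections.Generic;

public static class CapacityGrowth
{
    // Adds `count` items and records every distinct Capacity value observed.
    public static List<int> ObservedCapacities(int count)
    {
        var list = new List<object>();
        var capacities = new List<int> { list.Capacity };
        for (int i = 0; i < count; i++)
        {
            list.Add(new object());
            // Capacity only changes when the internal array is reallocated.
            if (list.Capacity != capacities[capacities.Count - 1])
                capacities.Add(list.Capacity);
        }
        return capacities;
    }
}
```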
I had some problems with a WCF web service (some dumps, memory leaks, etc.), so I ran a profiling tool (ANTS Memory Profiler).
I found that even after the processing was over (I ran a specific test and then stopped), Generation 2 held 25% of the memory of the web service. I tracked this memory down to a dictionary object full of (null, null) items with a hash code of -1.
The workflow of the web service implies that during specific processing, items are added to and then removed from the dictionary (just simple Add and Remove). Not a big deal. But it seems that after all items are removed, the dictionary is full of (null, null) KeyValuePairs. Thousands of them, in fact, occupying a big part of memory, until eventually the memory runs out, with the corresponding forced application pool recycle and DW20.exe grabbing all the CPU cycles it can get.
The dictionary is in fact a Dictionary<SomeKeyType, IEnumerable<KeyValuePair<SomeOtherKeyType, SomeCustomType>>> (see System.OutOfMemoryException because of Large Dictionary), so I have already checked whether some reference is holding on to things.
The dictionary is contained in a static object (to make it accessible to different processing threads during processing), so from this question and many others (Do static members ever get garbage collected?) I understand why that dictionary is in Generation 2. But is this also the cause of those (null, null) entries? Even if I remove items from the dictionary, will something always occupy memory?
It's not a speed issue like in this question: Deallocate memory from large data structures in C#. It seems that the memory is never reclaimed.
Is there something I can do to actually remove items from the dictionary, rather than just keep filling it with (null, null) pairs?
Is there anything else I need to check out?
Dictionaries store items in a hash table, which uses an array internally. Because of the way hash tables work, this array must always be larger than the actual number of items stored (at least about 30% larger). Microsoft's Hashtable uses a load factor of 72%, i.e. at least 28% of the array will be empty (see An Extensive Examination of Data Structures Using C# 2.0, and especially The System.Collections.Hashtable Class
and The System.Collections.Generic.Dictionary Class). Therefore the null/null entries could just represent this free space. In the .NET Framework's Dictionary<TKey, TValue>, removing an entry resets its key and value to their defaults, sets its hash code to -1 and puts the slot on a free list, which matches the (null, null) entries with a -1 hash code that you found.
If the array is too small, it grows automatically; however, when items are removed, the array does not shrink. The freed slots are reused when new items are inserted.
If you are in control of this dictionary, you could try to re-create it in order to shrink it:
theDict = new Dictionary<TKey, IEnumerable<KeyValuePair<TKey2, TVal>>>(theDict);
But the problem might arise from the actual (non-empty) entries. Your dictionary is static and will therefore never be reclaimed automatically by the garbage collector, unless you assign another dictionary or null to the field (theDict = new ... or theDict = null). This is only true for the dictionary itself, which is referenced by a static field, not for its entries. As long as references to removed entries exist somewhere else, they will persist. The GC will reclaim (sooner or later) any object that can no longer be reached through some reference. It makes no difference whether the variable referring to it was declared static or not: the objects themselves are not static, only their references.
As @RobertTausig kindly pointed out, since .NET Core 2.1 there is the new Dictionary.TrimExcess(), which is what you actually wanted, but which didn't exist back then.
Looks like you need to recycle space in that dict periodically. You can do that by creating a new one: new Dictionary<a,b>(oldDict). Be sure to do this in a thread-safe manner.
When to do this? Either on the tick of a timer (60sec?) or when a specific number of writes has occurred (100k?) (you'd need to keep a modification counter).
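A sketch of the counter-based variant, assuming every access to the dictionary can be routed through one wrapper and one lock (`RecyclingDictionary` is a hypothetical name). On .NET Core 2.1 or later, calling TrimExcess() on the dictionary at the same point would be an alternative to copying it.

```csharp
using System.Collections.Generic;

// Hypothetical wrapper that rebuilds its storage after a number of removals,
// so the internal arrays shrink to fit the remaining entries.
public sealed class RecyclingDictionary<TKey, TValue>
{
    private readonly object _sync = new object();
    private readonly int _recycleAfterRemovals;
    private Dictionary<TKey, TValue> _inner = new Dictionary<TKey, TValue>();
    private int _removalsSinceRecycle;

    public RecyclingDictionary(int recycleAfterRemovals)
    {
        _recycleAfterRemovals = recycleAfterRemovals;
    }

    public int RecycleCount { get; private set; }

    public void Add(TKey key, TValue value)
    {
        lock (_sync) _inner.Add(key, value);
    }

    public bool Remove(TKey key)
    {
        lock (_sync)
        {
            if (!_inner.Remove(key)) return false;
            if (++_removalsSinceRecycle >= _recycleAfterRemovals)
            {
                // The copy is sized for the current number of entries.
                _inner = new Dictionary<TKey, TValue>(_inner);
                _removalsSinceRecycle = 0;
                RecycleCount++;
            }
            return true;
        }
    }

    public bool TryGetValue(TKey key, out TValue value)
    {
        lock (_sync) return _inner.TryGetValue(key, out value);
    }

    public int Count
    {
        get { lock (_sync) return _inner.Count; }
    }
}
```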
A solution could be to call the Clear() method on the static dictionary.
In this way, the reference to the dictionary remains available, but the objects it contains are released (note that Clear() does not shrink the dictionary's internal arrays).
One of the reasons Lists are generally good for adding and removing items is that the internal data representation is allocated larger than needed, to reduce the number of reallocations.
Is there a way to make an instance of this class (or another similar class) to grow as needed by a decent chunk size, but to prevent reducing the size of the internal array?
I'm not aware that the internal array ever reduces in size automatically (you can use TrimExcess to reduce it manually). List<T> always increases its capacity by doubling the size of the internal array whenever it runs out of space. You could write a wrapper class that increases the Capacity however you want if you don't like the built-in policy.
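A minimal sketch of such a wrapper, assuming a fixed chunk size is the policy you want (`ChunkedList<T>` is a hypothetical name). It sets `Capacity` itself just before the list would otherwise double, and since `Remove` and `Clear` on List<T> leave the capacity unchanged, the internal array never shrinks unless someone calls TrimExcess.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical wrapper: grows by a fixed chunk instead of doubling, never shrinks.
public sealed class ChunkedList<T>
{
    private readonly List<T> _items;
    private readonly int _chunkSize;

    public ChunkedList(int chunkSize)
    {
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize));
        _chunkSize = chunkSize;
        _items = new List<T>(chunkSize);
    }

    public int Count => _items.Count;
    public int Capacity => _items.Capacity;
    public T this[int index] => _items[index];

    public void Add(T item)
    {
        // Grow explicitly before List<T> would apply its own doubling policy.
        if (_items.Count == _items.Capacity)
            _items.Capacity += _chunkSize;
        _items.Add(item);
    }

    // Removing items leaves Capacity untouched, so the internal array is kept.
    public bool Remove(T item) => _items.Remove(item);
    public void RemoveAt(int index) => _items.RemoveAt(index);
    public void Clear() => _items.Clear();
}
```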
Normally, I'd never have to ask myself whether a given scenario is better suited to a struct or class and frankly I did not ask that question before going the class way in this case. Now that I'm optimizing, things are getting a little confusing.
I'm writing a number crunching application that deals with extremely large numbers containing millions of Base10 digits. The numbers are (x,y) coordinates in 2D space. The main algorithm is pretty sequential and has no more than 200 instances of the class Cell (listed below) in memory at any given time. Each instance of the class takes up approximately 5MB of memory resulting in no more than 1GB in total peak memory for the application. The finished product will run on a 16 core machine with 20GB of RAM and no other applications hogging up the resources.
Here is the class:
// Inheritance is convenient but not absolutely necessary here.
public sealed class Cell : CellBase
{
    // Will contain numbers with millions of digits (512KB on average).
    public System.Numerics.BigInteger X = 0;
    // Will contain numbers with millions of digits (512KB on average).
    public System.Numerics.BigInteger Y = 0;

    public double XLogD = 0D;
    // Size of the array is roughly Base2Log(this.X) bits, i.e. about Base2Log(this.X) / 8 bytes.
    public byte[] XBytes = null;

    public double YLogD = 0D;
    // Size of the array is roughly Base2Log(this.Y) bits, i.e. about Base2Log(this.Y) / 8 bytes.
    public byte[] YBytes = null;

    // Tons of other properties for scientific calculations on X and Y.
    // NOTE: 90% of the other fields and properties are structs (similar to BigInteger).

    public Cell(System.Numerics.BigInteger x, System.Numerics.BigInteger y)
    {
        this.X = x;
        this.XLogD = System.Numerics.BigInteger.Log(x, 2);
        this.XBytes = x.ToByteArray();
        this.Y = y;
        this.YLogD = System.Numerics.BigInteger.Log(y, 2);
        this.YBytes = y.ToByteArray();
    }
}
I chose to use a class instead of a struct simply because it 'felt' more natural. The number of fields, methods and memory all instinctively pointed to classes as opposed to structs. I further justified that by considering how much overhead temporary assignment calls would have since the underlying primary objects are instances of BigInteger, which itself is a struct.
The question is, have I chosen wisely here considering speed efficiency is the ultimate goal in this case?
Here's a bit about the algorithm in case it helps. In each iteration:
1. Sorting performed once on all 200 instances. 20% of execution time.
2. Calculating neighboring (x,y) coordinates of interest. 60% of execution time.
3. Parallel/Threading overhead for point 2 above. 10% of execution time.
4. Branching overhead. 10% of execution time.
The most expensive function: BigInteger.ToByteArray() (implementation).
This would be a better fit as a class, for many reasons, including:
- It doesn't logically represent a single value
- It's larger than 16 bytes
- It's mutable
For details, see Choosing Between Classes and Structures.
In addition, I'd suggest that it's better suited to a class given that:
- It contains reference types (arrays). Structures containing classes are rarely a good design idea.
This is especially true, though, given what you're doing. If you were to use a struct, sorting would require copies of the entire struct, instead of just copies of the references. Method calls (unless the struct is passed by ref) would incur a huge overhead as well, since you'd be copying all of its fields.
Parallelization over items in a collection could also incur huge overhead, as the bounds checking on any array of the struct (i.e., if it's kept in a List<Cell> or similar) would cause bad false sharing, since all access into the list would touch the memory at the start of the list.
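To illustrate the copying cost for method calls, here's a sketch with a hypothetical 64-byte struct: passing it by value copies all eight fields into the parameter, while an `in` parameter (C# 7.2 and later) passes a read-only reference instead. Both return the same result; only the amount of data copied differs.

```csharp
// Hypothetical 64-byte struct standing in for a value type with many fields.
public struct LargeValue
{
    public long A, B, C, D, E, F, G, H;

    public LargeValue(long seed)
    {
        A = seed; B = seed + 1; C = seed + 2; D = seed + 3;
        E = seed + 4; F = seed + 5; G = seed + 6; H = seed + 7;
    }
}

public static class PassingDemo
{
    // By value: all 64 bytes are copied into the parameter.
    public static long SumByValue(LargeValue v) =>
        v.A + v.B + v.C + v.D + v.E + v.F + v.G + v.H;

    // By read-only reference: only a reference is passed.
    public static long SumByIn(in LargeValue v) =>
        v.A + v.B + v.C + v.D + v.E + v.F + v.G + v.H;
}
```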
I would recommend leaving this as a class, and, in addition, I would suggest trying to move the fields into properties, and making the class as immutable as possible. This will help keep your design clean, and less likely to be problematic when multithreading.
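A sketch of what "as immutable as possible" could look like, assuming only the BigInteger and logarithm members (`ImmutableCell` is a hypothetical name). The byte arrays are left out on purpose: a public array stays writable through its elements even behind a get-only property.

```csharp
using System.Numerics;

// Hypothetical immutable variant of Cell: set once in the constructor,
// exposed through get-only properties, safe to share across threads.
public sealed class ImmutableCell
{
    public ImmutableCell(BigInteger x, BigInteger y)
    {
        X = x;
        Y = y;
        XLogD = BigInteger.Log(x, 2);
        YLogD = BigInteger.Log(y, 2);
    }

    public BigInteger X { get; }
    public BigInteger Y { get; }
    public double XLogD { get; }
    public double YLogD { get; }
}
```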
It's hard to tell based on what you've written (we don't know how often you end up copying a value of type Cell for example) but I would strongly expect a class to be the correct approach here.
The number of methods in the class is irrelevant, but if it has lots of fields you need to consider the impact of copying all those fields any time you pass a value to another method (etc).
Fundamentally it doesn't feel like a value type to start with - but I understand that if performance is particularly important, the philosophical aspects may not be as interesting to you.
So yes, I think you've made the right decision, and I see no reason to believe anything else at the moment - but of course if you can easily change the decision and test it as a struct, that would be better than guesswork. Performance is remarkably difficult to predict accurately.
Since your class contains arrays, which consume most of your memory, and you have only 200 Cell instances around, the memory consumption of the class itself is not an issue. You were right that a class felt more natural; it is indeed the right choice. My guess would be that comparing XBytes[] and YBytes[] limits your sorting time. It all depends on how big your arrays are and how you perform the comparison.
Let's start by ignoring the performance matters, and work up to them.
Structs are ValueTypes and ValueTypes are value-types. Integers and DateTimes are value-types and a good comparison. There's no sense in talking about how one 1 is or isn't the same as another 1, or how one 2010-02-03T12:45:23.321Z is or isn't the same as another 2010-02-03T12:45:23.321Z. They may have different significance in different uses, but that 1 == 1 and 1 != 2, and that 2010-02-03T12:45:23.321Z == 2010-02-03T12:45:23.321Z and 2010-02-03T12:45:23.321Z != 2931-03-05T09:21:29.43Z, is inherent to the nature of integers and date-times, and that's what makes them value-types.
That's the purest way of thinking about this. If it matches the above it's a value-type, if it doesn't, it's a reference type. Nothing else comes into it.
Extension 1: If an X can have an X then it has to be a reference type. Whether this logically follows from what was said above is debatable, but whatever you think on the matter you can't have a struct that has an instance of another one of itself as a member (directly or indirectly) in practice, so that's that.
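A short illustration (type names are hypothetical): a class can have a field of its own type because that field is only a reference, whereas the struct version would have to contain a complete copy of itself, and the compiler rejects it.

```csharp
// A class may hold a field of its own type: the field is just a reference.
public sealed class Node
{
    public int Value;
    public Node Next; // a reference to another Node, or null
}

// The struct equivalent does not compile (error CS0523, cycle in the struct layout):
//
// public struct NodeValue
// {
//     public int Value;
//     public NodeValue Next;
// }

public static class NodeDemo
{
    // Builds the chain 1 -> 2 -> 3 and returns its head.
    public static Node BuildChain()
    {
        var third = new Node { Value = 3 };
        var second = new Node { Value = 2, Next = third };
        return new Node { Value = 1, Next = second };
    }
}
```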
Extension 2: Some say that the difficulties that come from mutable structs come from the above, and some do not. Again though, whatever you think on the matter, there are practical difficulties. A mutable struct can be useful in a few cases, but they cause enough confusion that they should be restricted to private cases as an optimisation rather than public cases as a matter of course.
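The classic confusion, as a sketch with a hypothetical `MutablePoint` struct: the List<T> indexer returns a copy, so mutating what you read leaves the stored element unchanged.

```csharp
using System.Collections.Generic;

// Hypothetical mutable struct.
public struct MutablePoint
{
    public int X;
}

public static class MutableStructDemo
{
    // Writing list[0].X = 5 directly is rejected by the compiler (CS1612);
    // going through a local compiles but only changes the copy.
    public static int MutateCopy()
    {
        var list = new List<MutablePoint> { new MutablePoint { X = 1 } };
        MutablePoint copy = list[0];
        copy.X = 5;
        return list[0].X; // still 1
    }
}
```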
Here comes the performance bit...
Value types and reference types have different characteristics in different cases that affect the speed, the memory use, and the way that memory use affects garbage collection, giving each different pros and cons as far as performance goes. Just how much attention we pay to that depends on how far we need to get down to that level. It's worth saying right now that the ways in which they differ tend to balance to a win if you follow the above rule on deciding between struct and class, so if we start thinking about this beyond that, we're at least bordering on optimisation territory.
Optimisation level 1.
If a value type instance will contain more than 16 bytes, it should probably be made a reference type. This is sometimes even stated as a "natural" difference rather than one of optimisation. Strictly, there's nothing in "value type" that entails "16 or fewer bytes", but it does tend to balance out that way.
Moving away from the simplistic "16 bytes" rule: the smaller it is, the faster it is to copy, and conversely, so bending it for a 20-byte instance has less impact than bending it for a 200-byte instance.
Will you need to box and unbox a lot? Since the introduction of generics we've been able to avoid a lot of cases where we would have boxed and unboxed in .NET 1.0 and 1.1, so this isn't as big a deal as it once was, but if you do box a lot, it will hurt performance.
Optimisation level 2.
The fact that value types can be placed on the stack, placed directly in an array (rather than references to them) and be direct fields of a struct or class (again, rather than references to them) can make access to them and to their fields faster.
If you're going to create an array of them and all-zero values are a useful starting point for you, you get that immediately, whereas with reference types you get an array of nulls. This can make structs faster.
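For instance, with a pair of hypothetical types of the same shape, a new struct array is immediately usable, while a new class array contains only nulls:

```csharp
// Hypothetical value and reference types with the same fields.
public struct PointStruct { public int X, Y; }
public sealed class PointClass { public int X, Y; }

public static class DefaultArrayDemo
{
    // Every element is already an all-zero PointStruct.
    public static PointStruct[] StructArray(int n) => new PointStruct[n];

    // Every element is null until an object is created for it.
    public static PointClass[] ClassArray(int n) => new PointClass[n];
}
```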
Edit: Something that extends from the above, if you are going to be iterating through arrays rapidly, then as well as the direct-access giving a boost over following the reference, you'll be loading a couple of instances into CPU cache at a time (64 bytes worth on current x86-32 or x86-64/amd, 128 bytes worth on ia-64). It has to be a pretty tight loop to matter, but there are cases where it does.
Most "I went for struct rather than class for performance" decisions come down to either the first point, or the first in combination with the second.
Optimisation level 3.
If you will have cases where some of the values you are concerned with are duplicates of each other, and they are large in size, then with immutable instances (or mutable instances you simply never mutate once you start doing what follows) you can deliberately alias different references to save a lot of memory: e.g. your 20 duplicate objects of 2KiB each are actually the same object, so 40KiB shrinks to 2KiB, saving 38KiB in that case. It can also make comparisons faster, because the cases where you can short-cut on identity are more frequent. This can only be done with reference types.
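One way to do this deliberate aliasing is an interning pool, sketched below as a hypothetical `Interner<T>`: equal values are replaced by one shared instance, after which `ReferenceEquals` can short-cut comparisons. It's only safe for objects that are never mutated afterwards.

```csharp
using System.Collections.Generic;

// Hypothetical interner: hands out one shared instance per distinct value.
public sealed class Interner<T> where T : class
{
    private readonly Dictionary<T, T> _pool;

    public Interner(IEqualityComparer<T> comparer = null)
    {
        _pool = new Dictionary<T, T>(comparer ?? EqualityComparer<T>.Default);
    }

    public int DistinctCount => _pool.Count;

    // Returns the pooled instance equal to `value`, adding `value` if it is new.
    public T Intern(T value)
    {
        if (_pool.TryGetValue(value, out var existing)) return existing;
        _pool.Add(value, value);
        return value;
    }
}
```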
Optimisation level 4.
Structs that contain arrays do, however, alias the contained array and could use the above technique internally, balancing out that point, though it's somewhat more involved.
Optimisation level X.
It doesn't matter what answer thinking about these pros and cons comes to if actually measuring the results comes to a different one. Since there are both pros and cons, it's always possible to get this wrong.
In thinking about 1 through 4, along with the differences between value and reference types aside from such optimisation concerns, I think you should go for a class.
In thinking about level X, I wouldn't be amazed if actually testing it proved me wrong. The best bit is: if it is arduous to change from class to struct (you make heavy use of aliasing or of the possibility of a null value), then you can be pretty confident that doing so is a loss. If it isn't arduous, then you can just do it and measure! I'd strongly suggest measuring a test that involves a real run over doing something 10,000 times: who cares if you can do a given operation 10,000 times a few seconds faster if you do a different operation 20 times more often in the real thing?
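As a starting point for such measurements, a rough Stopwatch-based helper (the `Timing.Measure` name is made up): pass it a delegate that performs a realistic unit of work, such as one full iteration of your algorithm, rather than a single micro-operation. A dedicated benchmarking library such as BenchmarkDotNet handles warm-up and statistics far more rigorously.

```csharp
using System;
using System.Diagnostics;

public static class Timing
{
    // Runs `action` once to warm up (JIT compilation), then times `iterations` runs.
    // Rough harness only: it does not control for GC pauses or other noise.
    public static TimeSpan Measure(Action action, int iterations)
    {
        if (action == null) throw new ArgumentNullException(nameof(action));
        action(); // warm-up run, excluded from the measurement
        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++) action();
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }
}
```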
A struct can only contain an array-type field safely if either (1) the state of the struct depends upon the identity of the array rather than its contents (as is the case with ArraySegment), or (2) no reference to the array will ever be held by anything that might try to mutate it (typically, this means that the array field will be private, and the struct itself will create the array and perform all modifications that will ever be done to it, before storing a reference in the field).
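A sketch of case (2), assuming a hypothetical `ByteBlock` struct: the array field is private, the constructor stores its own copy of the caller's array, and the struct only hands out copies, so nothing outside can mutate the state it depends on. Note that `default(ByteBlock)` still has a null array, which `Length` and `ToArray` account for.

```csharp
using System;

// Hypothetical struct following rule (2): private array, created and owned by the struct.
public readonly struct ByteBlock
{
    private readonly byte[] _bytes;

    public ByteBlock(byte[] source)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        _bytes = (byte[])source.Clone(); // own copy, so the caller's array cannot alias it
    }

    public int Length => _bytes?.Length ?? 0; // default(ByteBlock) has a null array

    public byte this[int index] => _bytes[index];

    // Returns a fresh copy rather than the internal array.
    public byte[] ToArray() => _bytes == null ? Array.Empty<byte>() : (byte[])_bytes.Clone();
}
```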
I advocate using structs much more commonly than other people here, but the fact that your data storage thingie would have two array-type fields would seem a strong argument against using a struct.