I have a library which returns a hierarchical list composed of IDictionary, IList and primitive types (strings and ints). At present I cannot change how this data is returned.
I have another strongly typed class which is consuming this data and converting it into business objects. There is a list of "properties" in the data returned, which I want to import into my strongly typed class. I then can dispose of the hierarchy.
My question is this: If I do this:
MyCustomClass.Properties = HierarchicalData["some_name"]
Where MyCustomClass is my strongly typed class, and HierarchicalData is the IDictionary data what happens when I later call:
HierarchicalData = null
Can Hierarchical data be disposed and released? "some_name" in this case is another Dictionary and so technically that is all that needs to be kept. Do I need to do an explicit copy instead of importing, such as:
MyCustomClass.Properties = HierarchicalData["some_name"].ToDictionary<string, string>( /* selector */)
Clarification: I am not worried about the dictionary containing the properties being garbage collected. I want to make sure that HierarchicalData can be deleted, as it is quite large and I need to work with several of them.
Yes. Once there are no references to HierarchicalData, it will be a candidate for collection.
Since you have a reference to the data stored for the "some_name" key, that specific element (the other dictionary) will not be collected. However, the other, unreferenced portions will become unrooted as far as the GC is concerned, and will be collected at some point.
This will work as you expect. Because you will have created a reference to the dictionary referenced by HierarchicalData["some_name"] in another place, the garbage collector will keep it around for you.
You definitely do not need to copy the dictionary.
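As a rough illustration of the answers above, the sketch below keeps only the inner dictionary and watches the outer one through a WeakReference, which does not keep its target alive. The names KeepInnerOnly and BuildAndExtract and the "bulk" entry are made up for this example, and it assumes the library hands back standard Dictionary instances.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Runtime.CompilerServices;

public static class KeepInnerOnly
{
    // Builds a small stand-in for HierarchicalData, returns only the inner
    // dictionary, and exposes the outer one through a weak reference.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static IDictionary BuildAndExtract(out WeakReference outerRef)
    {
        IDictionary outer = new Dictionary<string, object>();
        outer["some_name"] = new Dictionary<string, string> { ["colour"] = "red" };
        outer["bulk"] = new byte[1000000]; // hypothetical large payload

        outerRef = new WeakReference(outer);

        // This copies a reference to the inner dictionary, not its contents.
        return (IDictionary)outer["some_name"];
    }
}
```

After calling BuildAndExtract and then GC.Collect(), outerRef.IsAlive will typically report false while the returned dictionary still holds "red" under "colour". The exact moment of collection is up to the runtime, so treat that check as a diagnostic rather than a guarantee.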
Assuming that the class returns a standard Dictionary<TKey, TValue>, you probably don't need to do anything else.
The objects probably don't hold references to the dictionary that contains them, so they probably won't prevent the dictionary from being collected.
However, there's no way to be sure without checking; you should inspect the objects in the Visual Studio debugger's Watch window (or look at the source) and see whether they reference the dictionary.
You do not need to perform a copy.
The line:
MyCustomClass.Properties = HierarchicalData["some_name"]
assigns a reference, and while a reference to the object is alive, it will not be garbage collected.
Can Hierarchical data be disposed and released?
You mean by the GC? The dictionary stored under "some_name" will not be, as it's referenced by your object(s). GC is not going to mess it up.
Related
I'm new to object oriented code, and I have a question if this code is safe.
I have a local List "TempCand" that is assigned to a static member List "Candidates" of the class. When I leave the search method, I fear that the memory of my local variable is subject to garbage collection, which would then affect my static variable. Or is this ok?
using System.Collections.Generic;

public class search
{
    class candidate
    {
        // ...
    }

    static List<candidate> Candidates = new List<candidate>();

    static public void clean_Candidates()
    {
        List<candidate> TempCand = new List<candidate>();
        // ...
        // copy some elements of Candidates into TempCand
        // ...
        Candidates = TempCand;
    }
}
I fear that the memory of my local variable is subject to garbage collection
One of the things that means something can't be garbage collected, is that there is an in-use local variable that references it.
which would then affect my static variable
Another of the things that means something can't be garbage collected, is that there is a static field that references it.
Local variables themselves aren't something that garbage collection affects at all; at some point when the method isn't using it (which may be when the scope is left, or may be before then) the local memory can just be re-used for something else. You pretty much will never care, because if you ever go to use it, then by definition you aren't at a point where it will never be used (there's an exception around timers and weak references, but you're using neither here).
Now, if that local variable is a reference type, then it could have been something that was keeping the object referenced alive. However, that again won't generally be visible, because this is only if it's the only reference to this object.
When the garbage collector kicks in, the first thing it does is to find all the things it can't collect:
1. Anything in a local variable that is in use.
2. Anything in a static field.
3. Anything in a field of an object the above two rules have said can't be collected.
4. Anything in a field that rule 3 said can't be collected, applying this rule recursively.
If you can "see" it, the GC can't collect it.
GC can't affect your static field, because by definition, being in a static field makes it off-limits.
TempCand and Candidates are references toward some data (a list of Candidate objects, in this case). Of course, you can create and destroy additional references without affecting the objects at all.
So, one possible scenario is this:
You create the Candidates list and put in 5 Candidate objects inside.
As the objects are referenced by the list, and the list is referenced by the static property, the objects will not be garbage collected.
You call the CleanCandidates static method. Within it, you create another reference (tempCand) to a list of candidates. You add references to three of the Candidate objects to the new list.
If a garbage run occurs at this moment, none of the Candidate objects will be collected, as they are still referenced from the static list.
You point the static Candidates field at the new list and exit the method. If a garbage run occurs now, it can collect:
- The old list that Candidates used to reference. It's not referenced by anything now.
- The two Candidate objects that were in the old list, but not in the new list, as there is nothing that references them now.
- The tempCand reference, as it is out of scope: the method is finished, and a new reference will be created if it is called again. Note that the list that tempCand references will not be collected, as it's now reachable through the Candidates static field. (Strictly speaking, local variables are not collected by the GC; their stack space is simply reclaimed when the method returns.)
So, the candidates you want alive will be (and stay alive) and the candidates that you threw out will eventually be garbage collected (if not referenced by some other, not shown code).
That said, the code shown is not thread safe. You could potentially execute CleanCandidates twice, and wreak havoc on the collection. You have to be very, very careful as your static list is effectively shared application state.
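The scenario and the thread-safety caveat above could be sketched roughly as follows. This is one possible arrangement, not the asker's actual code: the Score property, the minScore rule and the Gate lock object are assumptions introduced for the example.

```csharp
using System.Collections.Generic;

public static class Search
{
    public sealed class Candidate
    {
        public int Score { get; }
        public Candidate(int score) { Score = score; }
    }

    // Hypothetical lock object guarding the shared static list.
    private static readonly object Gate = new object();
    private static List<Candidate> candidates = new List<Candidate>();

    public static int Count
    {
        get { lock (Gate) { return candidates.Count; } }
    }

    public static void Add(int score)
    {
        lock (Gate) { candidates.Add(new Candidate(score)); }
    }

    // Keeps only candidates scoring at least minScore (a made-up rule).
    public static void CleanCandidates(int minScore)
    {
        lock (Gate)
        {
            var temp = new List<Candidate>();
            foreach (var c in candidates)
            {
                if (c.Score >= minScore) temp.Add(c);
            }

            // The static field now roots the new list. The old list and the
            // dropped candidates become unreachable and may be collected later.
            candidates = temp;
        }
    }
}
```

The local variable temp disappears when the method returns, but the list it referred to stays alive because the static field refers to it too.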
I want to create a collection class that can collect any type of data (string, int, float). So I decided to use a List<object> structure to store any kind of data.
Since using a List structure is safer (managed) than creating an unmanaged array, I want to create a List structure so it can hold any kind of data, but I have some concerns that if I create a List<object> structure and try to hold some strings, there could be memory leaks because of the string type.
So do I have to do something after using (emptying) the List and deallocate the strings individually, or does .NET already handle that?
Is there a nicer method for creating general collection class?
You won't need to garbage-collect any objects on your own unless you really need to, and according to your post that's not the case here. Doing so is only necessary in a few cases (you may look here for what these cases might be).
However, .NET frees any memory to which no references exist (regardless of whether it holds an int, a string or any custom object). Thus, once your array, list or whatever you use goes out of scope and is no longer referenced, the contained elements will be reclaimed by the GC at some non-deterministic point in time, and you don't need to take care of this yourself.
What you mean by managed and unmanaged is probably the fact that a List is a bit more dynamic, as it can change its size depending on the current number of elements. If this number exceeds the current capacity, the list automatically grows by a given factor. An array, however, is fixed in size. The term "unmanaged", however, refers to e.g. C++; C# code is always managed code (which means, among other things, that there is a garbage collector). See Wikipedia on Managed Code
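A minimal sketch of the idea, assuming a plain List<object> is all that is needed; the MixedBag class and its method names are invented for illustration. Note that ints and floats stored as object are boxed, which costs an allocation per item but needs no manual cleanup.

```csharp
using System.Collections.Generic;

public static class MixedBag
{
    // A list that holds strings, ints and floats side by side.
    public static List<object> Build()
    {
        return new List<object> { "text", 42, 3.14f };
    }

    // Recovers the concrete type of an item with pattern matching.
    public static string Describe(object item)
    {
        if (item is string s) return "string of length " + s.Length;
        if (item is int i) return "int " + i;
        if (item is float) return "float";
        return "other";
    }

    // Emptying the list is enough; the GC reclaims unreferenced items later.
    public static void Release(List<object> items)
    {
        items.Clear();
    }
}
```

No per-string deallocation is involved: once the list no longer references an item and nothing else does, the item is eligible for collection.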
I had some problems with a WCF web service (some dumps, memory leaks, etc.) and I ran a profiling tool (ANTS Memory Profiler).
Just to find out that even with the processing over (I run a specific test and then stopped), Generation 2 is 25% of the memory for the web service. I tracked down this memory to find that I had a dictionary object full of (null, null) items, with -1 hash code.
The workflow of the web service implies that during specific processing items are added and then removed from the dictionary (just simple Add and Remove). Not a big deal. But it seems that after all items are removed, the dictionary is full of (null, null) KeyValuePairs. Thousands of them in fact, such that they occupy a big part of memory and eventually an overflow occurs, with the corresponding forced application pool recycle and DW20.exe getting all the CPU cycles it can get.
The dictionary is in fact Dictionary<SomeKeyType, IEnumerable<KeyValuePair<SomeOtherKeyType, SomeCustomType>>> (System.OutOfMemoryException because of Large Dictionary) so I already checked if there is some kind of reference holding things.
The dictionary is contained in a static object (to make it accessible to different processing threads through processing) so from this question and many more (Do static members ever get garbage collected?) I understand why that dictionary is in Generation 2. But is this also the cause of those (null, null)? Even if I remove items from the dictionary, will something always remain occupied in memory?
It's not a speed issue like in this question Deallocate memory from large data structures in C# . It seems that memory is never reclaimed.
Is there something I can do to actually remove items from dictionary, not just keep filling it with (null, null) pairs?
Is there anything else I need to check out?
Dictionaries store items in a hash table. An array is used internally for this. Because of the way hash tables work, this array must always be larger than the actual number of items stored (at least about 30% larger). Microsoft's Hashtable class, for instance, uses a load factor of 72%, i.e. at least 28% of the array will be empty (see An Extensive Examination of Data Structures Using C# 2.0, and especially The System.Collections.Hashtable Class and The System.Collections.Generic.Dictionary Class). Therefore the null/null entries could just represent this free space.
If the array is too small, it will grow automatically; however, when items are removed, the array does not shrink, but the space that will be freed up should be reused when new items are inserted.
If you are in control of this dictionary, you could try to re-create it in order to shrink it:
theDict = new Dictionary<TKey, IEnumerable<KeyValuePair<TKey2, TVal>>>(theDict);
But the problem might arise from the actual (non-empty) entries. Your dictionary is static and will therefore never be reclaimed automatically by the garbage collector, unless you assign it another dictionary or null (theDict = new ... or theDict = null). This is only true for the dictionary itself, which is static, not for its entries. As long as references to removed entries exist somewhere else, they will persist. The GC will reclaim (sooner or later) any object which can no longer be reached through some reference. It makes no difference whether this object was declared static or not. The objects themselves are not static, only their references.
As @RobertTausig kindly pointed out, since .NET Core 2.1 there is the new Dictionary.TrimExcess(), which is what you actually wanted, but didn't exist back then.
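The re-creation approach can be wrapped in a small helper. This is only a sketch: the DictionaryCompaction class and the Compact method are hypothetical names, and passing the comparer along is an added precaution so that custom key comparison survives the copy.

```csharp
using System.Collections.Generic;

public static class DictionaryCompaction
{
    // Copies the live entries into a new dictionary sized for them,
    // leaving the old, oversized internal arrays to the garbage collector.
    public static Dictionary<TKey, TValue> Compact<TKey, TValue>(
        Dictionary<TKey, TValue> source)
    {
        return new Dictionary<TKey, TValue>(source, source.Comparer);
    }
}
```

A caller would then write theDict = DictionaryCompaction.Compact(theDict); under whatever lock protects theDict. On .NET Core 2.1 or later, calling theDict.TrimExcess() should achieve a similar effect in place.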
Looks like you need to recycle space in that dict periodically. You can do that by creating a new one: new Dictionary<a,b>(oldDict). Be sure to do this in a thread-safe manner.
When to do this? Either on the tick of a timer (60sec?) or when a specific number of writes has occurred (100k?) (you'd need to keep a modification counter).
A solution could be to call Clear() method on the static dictionary.
In this way, the reference to the dictionary will remain available, but the objects contained will be released.
I have seen code with the following logic in a few places:
public void func()
{
    _myDictionary["foo"] = null;
    _myDictionary.Remove("foo");
}
What is the point of setting foo to null in the dictionary before removing it?
I thought garbage collection cares about the number of things pointing to whatever foo originally was. If that's the case, wouldn't setting _myDictionary["foo"] to null simply decrease the count by one? Wouldn't the same thing happen once _myDictionary.Remove("foo") is called?
What is the point of _myDictionary["foo"] = null;
edit: To clarify - when I said "decrease the count by one" I meant the following:
- _myDictionary["foo"] originally points to an object. That means the object has one or more things referencing it.
- Once _myDictionary["foo"] is set to null it is no longer referencing said object. This means that object has one less thing referencing it.
There is no point at all.
If you look at what the Remove method does (using .NET Reflector), you will find this:
this.entries[i].value = default(TValue);
That line sets the value of the dictionary item to null, as the default value for a reference type is null. So, as the Remove method sets the reference to null, there is no point to do it before calling the method.
Setting a dictionary entry to null does not decrease the ref count, as null is a perfectly suitable value to point to in a dictionary.
The two statements do very different things. Setting the value to null indicates that that is what the value should be for that key, whereas removing that key from the dictionary indicates that it should no longer be there.
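The difference between the two statements can be observed with ContainsKey. The sketch below uses a made-up RemoveVersusNull class with a Dictionary<string, object>; the dictionary type in the original code is not shown, so that choice is an assumption.

```csharp
using System.Collections.Generic;

public static class RemoveVersusNull
{
    // Assigning null replaces the value but leaves the key in place.
    public static bool ContainsKeyAfterNull()
    {
        var d = new Dictionary<string, object> { ["foo"] = new object() };
        d["foo"] = null;
        return d.ContainsKey("foo");
    }

    // Remove takes the whole entry (key and value) out of the dictionary.
    public static bool ContainsKeyAfterRemove()
    {
        var d = new Dictionary<string, object> { ["foo"] = new object() };
        d.Remove("foo");
        return d.ContainsKey("foo");
    }
}
```

In both cases the dictionary stops referencing the original object, so as far as garbage collection is concerned the extra null assignment before Remove adds nothing.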
There isn't much point to it.
However, if the Remove method causes heap allocations, and if the stored value is large, a garbage collection can happen when you call Remove, and it can also collect the value in the process (potentially freeing up memory). Practically, though, people don't usually worry about small things like this, unless it's been shown to be useful.
Edit:
Forgot to mention: Ideally, the dictionary itself should worry about its own implementation like this, not the caller.
It doesn't make much sense there, but there are times when it does make sense.
One example is in a Dispose() method. Consider this type:
public class Owner
{
    // snip
    private BigAllocation _bigAllocation;
    // snip
    protected virtual void Dispose(bool disposing)
    {
        if (disposing)
        {
            // free managed resources
            if (_bigAllocation != null)
            {
                _bigAllocation.Dispose();
                _bigAllocation = null;
            }
        }
    }
}
Now, you could argue that this is unnecessary, and you'd be mostly right. Usually Dispose() is only called before Owner is dereferenced, and when Owner is collected, _bigAllocation will be, as well... eventually.
However:
Setting _bigAllocation to null makes it eligible for collection right away, if nobody else has a reference to it. This can be advantageous if Owner is in a higher-numbered GC generation, or has a finalizer. Otherwise, Owner must be released before _bigAllocation is eligible for collection.
This is sort of a corner case, though. Most types shouldn't have finalizers, and in most cases _bigAllocation and Owner would be in the same generation.
I guess I could maybe see this being useful in a multi-threaded application where you null the object so no other thread can operate on it. Though, this seems heavy handed and poor design.
If that's the case, wouldn't setting _myDictionary["foo"] to null simply decrease the count by one?
Nope, the count doesn't change: the reference is still in the dictionary, and it points to null.
I see no reason for the code being the way it is.
I don't know about the internals of Dictionary in particular, but some types of collection may hold references to objects which are effectively 'dead'. For example, a collection may hold an array and a count of how many valid items are in the array; zeroing the count would make any items in the collection inaccessible, but would not destroy any references to them. It may be that deleting an item from a Dictionary ends up making the area that holds the object available for reuse without actually deleting it. If another item with the same hash code gets added to the dictionary, then the first item would actually get deleted, but that might not happen for awhile, if ever.
This looks like an old C++ habit.
I suspect that the author is worried about older collections and/or other languages. If memory serves, some collections in C++ would hold pointers to the collected objects and when 'removed' would only remove the pointer but would not automatically call the destructor of the newly removed object. This causes a very subtle memory leak. The habit became to set the object to null before removing it to make sure the destructor was called.
I have an object in a multi-threaded environment that maintains a collection of information, e.g.:
public IList<string> Data
{
    get
    {
        return data;
    }
}
I currently have return data; wrapped by a ReaderWriterLockSlim to protect the collection from sharing violations. However, to be doubly sure, I'd like to return the collection as read-only, so that the calling code is unable to make changes to the collection, only view what's already there. Is this at all possible?
If your underlying data is stored as list you can use List(T).AsReadOnly method.
If your data can be enumerated, you can use the Enumerable.ToList method to copy your collection into a List and call AsReadOnly on it.
I voted for your accepted answer and agree with it--however might I give you something to consider?
Don't return a collection directly. Make an accurately named business logic class that reflects the purpose of the collection.
The main advantage of this comes in the fact that you can't add code to collections so whenever you have a native "collection" in your object model, you ALWAYS have non-OO support code spread throughout your project to access it.
For instance, if your collection was invoices, you'd probably have 3 or 4 places in your code where you iterated over unpaid invoices. You could have a getUnpaidInvoices method. However, the real power comes in when you start to think of methods like "payUnpaidInvoices(payer, account);".
When you pass around collections instead of writing an object model, entire classes of refactorings will never occur to you.
Note also that this makes your problem particularly nice. If you don't want people changing the collections, your container need contain no mutators. If you decide later that in just one case you actually HAVE to modify it, you can create a safe mechanism to do so.
How do you solve that problem when you are passing around a native collection?
Also, native collections can't be enhanced with extra data. You'll recognize this next time you find that you pass in (Collection, Extra) to more than one or two methods. It indicates that "Extra" belongs with the object containing your collection.
If your only intent is to keep calling code from making a mistake and modifying the collection when it should only be reading, all that is necessary is to return an interface which doesn't support Add, Remove, etc. Why not return IEnumerable<string>? Calling code would have to cast, which callers are unlikely to do without knowing the internals of the property they are accessing.
If however your intent is to prevent the calling code from observing updates from other threads you'll have to fall back to solutions already mentioned, to perform a deep or shallow copy depending on your need.
I think you're confusing concepts here.
The ReadOnlyCollection provides a read-only wrapper for an existing collection, allowing you (Class A) to pass out a reference to the collection safe in the knowledge that the caller (Class B) cannot modify the collection (i.e. cannot add or remove any elements from the collection.)
There are absolutely no thread-safety guarantees.
If you (Class A) continue to modify the underlying collection after you hand it out as a ReadOnlyCollection then class B will see these changes, have any iterators invalidated, etc. and generally be open to any of the usual concurrency issues with collections.
Additionally, if the elements within the collection are mutable, both you (Class A) and the caller (Class B) will be able to change any mutable state of the objects within the collection.
Your implementation depends on your needs:
- If you don't need the caller (Class B) to see any further changes to the collection, then you can just clone the collection, hand it out, and stop caring.
- If you definitely need the caller (Class B) to see changes that are made to the collection, and you want this to be thread-safe, then you have more of a problem on your hands. One possibility is to implement your own thread-safe variant of the ReadOnlyCollection to allow locked access, though this will be non-trivial and non-performant if you want to support IEnumerable, and it still won't protect you against mutable elements in the collection.
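The wrapper-versus-clone distinction above can be seen in a few lines. This is a single-threaded sketch with an invented ReadOnlyViews class; it shows only what each kind of view observes, not how to make access thread-safe.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;

public static class ReadOnlyViews
{
    // Returns { viewCount, snapshotCount } after the owner adds one more item.
    public static int[] CountsAfterOwnerAdds()
    {
        var data = new List<string> { "a" };

        // A read-only wrapper over the live list: later changes show through.
        ReadOnlyCollection<string> view = data.AsReadOnly();

        // A read-only wrapper over a copy: later changes are not visible.
        ReadOnlyCollection<string> snapshot = new List<string>(data).AsReadOnly();

        data.Add("b");
        return new[] { view.Count, snapshot.Count };
    }

    // Even through the IList<string> interface, the wrapper rejects mutation.
    public static bool CallerCannotAdd()
    {
        IList<string> view = new List<string> { "a" }.AsReadOnly();
        try
        {
            view.Add("b");
            return false;
        }
        catch (NotSupportedException)
        {
            return true;
        }
    }
}
```

Here the live view reports two items and the snapshot one, which is the trade-off described above: the wrapper is cheap but exposes the caller to concurrent changes, while the clone is isolated but stale.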
One should note that aku's answer will only protect the list as being read only. Elements in the list are still very writable. I don't know if there is any way of protecting non-atomic elements without cloning them before placing them in the read only list.
You can use a copy of the collection instead.
public IList<string> Data
{
    get
    {
        // Hand out a copy so callers cannot affect the original list.
        return new List<string>(data);
    }
}
That way it doesn't matter if it gets updated.
You want to use the yield keyword. You loop through the list and return the results with yield return. This allows the consumer to use foreach without modifying the collection.
It would look something like this:
List<string> _Data;

public IEnumerable<string> Data
{
    get
    {
        // yield return hands out one item at a time without exposing the list itself
        foreach (string item in _Data)
        {
            yield return item;
        }
    }
}