Could someone explain to a C++ programmer most important differences between Java (and C# as well) references and shared_ptr (from Boost or from C++0x).
I more or less aware how shared_ptr is implemented. I am curious about differences in the following ares:
1) Performance.
2) Cycling. shared_ptr can be cycled (A and B hold pointers to each other). Is cycling possible in Java?
3) Anything else?
Thank you.
Performance: shared_ptr performs pretty well, but in my experience is slightly less efficient than explicit memory management, mostly because it is reference counted and the reference count has to allocated as well. How well it performs depends on a lot of factors and how well it compares to Java/C# garbage collectors can only be determined on a per use case basis (depends on language implementation among other factors).
Cycling is only possible with weak_ptr, not with two shared_ptrs. Java allows cycling without further ado; its garbage collector will break the cycles. My guess is that C# does the same.
Anything else: the object pointed to by a shared_ptr is destroyed as soon as the last reference to it goes out of scope. The destructor is called immediately. In Java, the finalizer may not be called immediately. I don't know how C# behaves on this point.
The key difference is that when the shared pointer's use count goes to zero, the object it points to is destroyed (destructor is called and object is deallocated), immediately. In Java and C# the deallocation of the object is postponed until the Garbage Collector chooses to deallocate the object (i.e., it is non-deterministic).
With regard to cycles, I am not sure I understand what you mean. It is quite common in Java and C# to have two objects that contain member fields that refer to each other, thus creating a cycle. For example a car and an engine - the car refers to the engine via an engine field and the engine can refer to its car via a car field.
Nobody pointed the possibility of moving the object by the memory manager in managed memory. So in C# there are no simple references/pointers, they work like IDs describing object which is returned by the manager.
In C++ you can't achieve this with shared_ptr, because the object stays in the same location after it has been created.
First of all, Java/C# have only pointers, not references, though they call them that way. Reference is a unique C++ feature. Garbage collection in Java/C# basically means infinite life-time. shared_ptr on the other hand provides sharing and deterministic destruction, when the count goes to zero. Therefore, shared_ptr can be used to automatically manage any resources, not just memory allocation. In a sense (just like any RAII design) it turns pointer semantics into more powerful value semantics.
Cyclical references with C++ reference-counted pointers will not be disposed. You can use weak pointers to work around this. Cyclical references in Java or C# may be disposed, when the garbage collector feels like it.
When the count in a C++ reference-counted pointer drops to zero, the destructor is called. When a Java object is no longer reachable, its finalizer may not be called promptly or ever. Therefore, for objects which require explicit disposal of external resources, some form of explicit call is required.
Related
I need to dispose of an object so it can release everything it owns, but it doesn't implement the IDisposable so I can't use it in a using block. How can I make the garbage collector collect it?
You can force a collection with GC.Collect(). Be very careful using this, since a full collection can take some time. The best-practice is to just let the GC determine when the best time to collect is.
Does the object contain unmanaged resources but does not implement IDisposable? If so, it's a bug.
If it doesn't, it shouldn't matter if it gets released right away, the garbage collector should do the right thing.
If it "owns" anything other than memory, you need to fix the object to use IDisposable. If it's not an object you control this is something worth picking a different vendor over, because it speaks to the core of how well your vendor really understands .Net.
If it does just own memory, even a lot of it, all you have to do is make sure the object goes out of scope. Don't call GC.Collect() — it's one of those things that if you have to ask, you shouldn't do it.
You can't perform garbage collection on a single object. You could request a garbage collection by calling GC.Collect() but this will effect all objects subject to cleanup. It is also highly discouraged as it can have a negative effect on the performance of later collections.
Also, calling Dispose on an object does not clean up it's memory. It only allows the object to remove references to unmanaged resources. For example, calling Dispose on a StreamWriter closes the stream and releases the Windows file handle. The memory for the object on the managed heap does not get reclaimed until a subsequent garbage collection.
Chris Sells also discussed this on .NET Rocks. I think it was during his first appearance but the subject might have been revisited in later interviews.
http://www.dotnetrocks.com/default.aspx?showNum=10
This article by Francesco Balena is also a good reference:
When and How to Use Dispose and Finalize in C#
http://www.devx.com/dotnet/Article/33167/0/page/1
Garbage collection in .NET is non deterministic, meaning you can't really control when it happens. You can suggest, but that doesn't mean it will listen.
Tells us a little bit more about the object and why you want to do this. We can make some suggestions based off of that. Code always helps. And depending on the object, there might be a Close method or something similar. Maybe the useage is to call that. If there is no Close or Dispose type of method, you probably don't want to rely on that object, as you will probably get memory leaks if in fact it does contain resourses which will need to be released.
If the object goes out of scope and it have no external references it will be collected rather fast (likely on the next collection).
BEWARE: of f ra gm enta tion in many cases, GC.Collect() or some IDisposal is not very helpful, especially for large objects (LOH is for objects ~80kb+, performs no compaction and is subject to high levels of fragmentation for many common use cases) which will then lead to out of memory (OOM) issues even with potentially hundreds of MB free. As time marches on, things get bigger, though perhaps not this size (80 something kb) for LOH relegated objects, high degrees of parallelism exasperates this issue due simply due to more objects in less time (and likely varying in size) being instantiated/released.
Array’s are the usual suspects for this problem (it’s also often hard to identify due to non-specific exceptions and assertions from the runtime, something like “high % of large object heap fragmentation” would be swell), the prognosis for code suffering from this problem is to implement an aggressive re-use strategy.
A class in Systems.Collections.Concurrent.ObjectPool from the parallel extensions beta1 samples helps (unfortunately there is not a simple ubiquitous pattern which I have seen, like maybe some attached property/extension methods?), it is simple enough to drop in or re-implement for most projects, you assign a generator Func<> and use Get/Put helper methods to re-use your previous object’s and forgo usual garbage collection. It is usually sufficient to focus on array’s and not the individual array elements.
It would be nice if .NET 4 updated all of the .ToArray() methods everywhere to include .ToArray(T target).
Getting the hang of using SOS/windbg (.loadby sos mscoreei for CLRv4) to analyze this class of issue can help. Thinking about it, the current garbage collection system is more like garbage re-cycling (using the same physical memory again), ObjectPool is analogous to garbage re-using. If anybody remembers the 3 R’s, reducing your memory use is a good idea too, for performance sakes ;)
I'm basically a C++ guy trying to venture into C#. From the basic tutorial of C#, I happen to find that all objects are created and stored dynamically (also true for Java) and are accessed by references and hence there's no need for copy constructors. There is also no need of bitwise copy when passing objects to a function or returning objects from a function. This makes C# much simpler than C++.
However, I read somewhere that operating on objects exclusively through references imposes limitations on the type of operations that one can perform thus restricting the programmer of complete control. One limitation is that the programmer cannot precisely specify when an object can be destroyed.
Can someone please elaborate on other limitations? (with a sample code if required)
Most of the "limitations" are by design rather than considered a deficiency (you may not agree of course)
You cannot determine/you don't have to worry about
when an object is destroyed
where the object is in memory
how big it is (unless you are tuning the application)
using pointer arithmetic
accessing out side an object
accessing an object with the wrong type
sharing objects between threads is simpler
whether the object is on the stack or the heap. (The stack is being used more and more in Java)
fragmentation of memory (This is not true of all collectors)
Because of Garbage collection done in java we cannot predict when the object will get destroyed but it performs the work of destructor.
If you want to free up some resources then you can use finally block.
try {
} finally{
// dispose resources.
}
Having made a similar transition, the more you look into it, the more you do have to think about C#'s GC behaviour in all but the most straighforward cases. This is especially true when trying to handle unmanaged resources from managed code.
This article highlights a lot of the issues you may be interested in.
Personally I miss a reference counted alternative to IDisposable (more like shared_ptr), but that's probably a hangover from a C++ background.
The more I have to write my own plumbing to support C++ like programming, the more likely it is there is another C# mechanism I've overlooked, or I end up getting frustrated with C#. For example, swap and move are not common idioms in C# as far as I've seen and I miss them: other programmers with a C# background may well disagree about how useful those idioms are.
Sorry for being confused, at C++ I know to return local variable's reference or pointers can cause bad_reference exception. I am not sure how it is in C# ?
e.g
List<StringBuilder> logs = new List<StringBuilder>();
void function(string log)
{
StringBuilder sb = new StringBuilder();
logs.Add(sb);
}
at this function a local object is created and stored in a list, is that bad or must be done in another way. I am really sorry for asking this, but I am confused after coding C++ for 2 months.
Thanks.
Your C# code doesn't return an object reference so it doesn't match your concern. It is however a problem that doesn't exist in C#. The CLR doesn't let you create objects on the stack, only the heap. And the garbage collector makes sure that object references stay valid.
In C#, the garbage collector manages all the (managed) objects you create. It will not delete one unless there are no longer any references to it.
So that code is perfectly valid. logs keeps a reference to the StringBuilder. The garbage collector knows this, so it will not clean it up even after the context in which it was originally created goes out of scope.
In C# object lifecycle is managed for you by the CLR; compared to C++ where you have to match each new with a delete.
However in C# you can't do
void fun()
{
SomeObject sb(10);
logs.Add(sb);
}
i.e. allocating on the stack you have to use new - so in this respect both C# and C++ work similarly - except when it comes to releasing / freeing the object reference.
It is still possible to leak memory in C# - but it's harder than in C++.
There's nothing wrong with the code you've written. This is mostly because C#, like any .NET language, is a "managed" language that does a lot of memory management for you. To get the same effect in C++ you would need to explicitly use some third-party library.
To clear up some of the basics for you:
In C#, you rarely deal with "pointers" or "references" directly. You can deal with pointers, if you need to, but that is "unsafe" code and you really avoid that kind of thing unless you know what you're doing. In the few cases where you do deal with references (e.g. ref or out parameters) the language hides all the details from you and lets you treat them as normal variables.
Instead, objects in C# are defined as instances of reference types; whenever you use an instance of a reference type, it is similar to using a pointer except that you don't have to worry about the details. You create new instances of references types in C# in the same way that you create new instances of objects in C++, using the new operator, which allocates memory, runs constructors, etc. In your code sample, both StringBuilder and List<StringBuilder> are reference types.
The key aspect of managed languages that is important here is the automatic garbage collection. At runtime, the .NET Framework "knows" which objects you have created, because you're always creating them from it's own internally-managed heap (no direct malloc or anything like that in C#). It also "knows" when an object has gone completely out of scope -- when there are no more references to it anywhere in your program. Once that happens, the runtime is able to free the memory whenever it wants to, typically when it starts to run low on free memory, and you never have to do it. In fact, there is no way in C# to explicitly destroy a managed object (though you do have to clean up unmanaged resources if you use them).
In your example, the runtime knows that you've created a StringBuilder and put it into a List<>; it will keep track of that object, and as long as it's in the List<> it will stick around. Once you either remove it from the List<>, or the List<> itself goes away, the runtime will automatically clean up the StringBuilder for you.
SO I already know about memory management in objective C, and I never had to know about it while programming in .net (C#). But i still have some questions about how everything is done.
-Why does the code leak in objective c if we allocate an object and not release it?
-Why doesn't this leak in C#?
-What are some advantages and disadvantages of automatic-garbage-collecting?
-Why not use autorelease on every allocated object (Objective C)?
-Is it possible to take care of the memory manually (C#)? so let's say i instantiate an object, and when I'm done i want to release it, and i don't want to wait for the garbage collector to do it?
It leaks in Objective-C because Objective-C doesn’t take any action on it. It relies on you doing all the work. It doesn’t leak in C# (more precisely, in .NET) because it employs a garbage collector which cleans up objects that are no longer used.
The main advantage of garbage collection is the above: you have far fewer memory leaks. (It’s still possible to have a memory leak, e.g. by filling a list indefinitely, but that’s harder to do accidentally.) It used to be thought that garbage collection has a disadvantage in that it could slow down the program because it keeps doing the garbage collection in the background and you have little control over it. In reality, however, the difference is negligible: there are other background tasks on your computer (e.g. device drivers) running all the time, the garbage collector doesn’t break the camel’s back either.
Auto-deallocation (as it is employed in C++ when a non-pointer variable goes out of scope) is dangerous because it opens the possibility to have a reference to it still in existence even after the object has been disposed. If your code then tries to access the object, the process goes kaboom big time.
Yes, it is possible to tell C# to release memory by invoking the garbage collector directly (GC.Collect()). However, I have yet to see a case where this is at all necessary. If you actually run out of memory, the garbage collector will already kick in automatically and free as much as it can.
Objective-C isn't a garbage-collected language, so it has no way of knowing that an object won't be used anymore unless you tell it. That's the purpose of the .NET garbage collector: it checks to see which objects can no longer be used by the program, and- at some point- gets rid of them to free up memory. There are no guarantees as to when, or if, it will ever free any given abandoned object; it's just trying to keep memory usage from going out of control.
C# can't release an object without the garbage collector. If you release an object that's still being referenced, your program is going to crash when you try to use that. Which is always the risk of manual memory management, but like all "memory-managed languages", it is trying to prevent you from making exactly that mistake. If you want to explicitly shut down an object's operation, implement the interface IDisposable for that object's type, and use the Dispose() method on that object- essentially a destructor. Be sure you're done with it, of course, and that the object will behave correctly (by throwing exceptions) if something tries to use it after it's been Dispose()d of.
Objective-C is reference-counted. When an object is out of references, it deletes itself. It's not a bad solution to the "is someone still using this object?" problem, except for data structures that refer to themselves; circular data structures will hang around forever unless carefully handled. .NET isn't a reference counter, so it will get rid of circular data structures that can't be reached from running code.
Autorelease is just a "release later", for returning a value that should self-destruct if the code that grabs it doesn't immediately want to hold onto it, as far as I understand. (I'm not an Objective-C programmer, though.) It gets around the "who releases this object?" problem for calls that return an object, without destroying it before the function is finished. It's a special use case, though, and it doesn't make sense in most other cases.
The advantage of automatic garbage collection is that you do not have to explicitly free/release your objects as you said. The disadvantage is you cannot be sure when (or even if) any given object instance will be released.
C# has other mechanisms to release other resources like files, or db connections that have to be released detirminitely. For example, using allows you to make sure that IDispose is called on an object for sure.
Garbage collected systems tend to have more memory in use at any given time than a well tuned manual implementation. Of course, you do not have memory leaks.
The quality and performance of garbage collectors can vary quite a bit which is something you may not have a lot of control over. For example, there may be a noticable lag when the GC runs. In .NET, you can call GC.Collect() to tell the GC that now would be a good time but it may or may not listen.
These days Objective-C is also garbage collected on the Mac. On the iPhone it is reference counted though so I assume that is where you are running into the issue.
On garbage collected systems, you can still run into the issue where a an object hangs onto a reference to an object that you would expect to be garbage collected. The garbage collector will not clean-up this up and so the memory is essentially leaked.
In reference counted systems, you do not have to keep track of every object that points at your object instances but you do have to specify that you want them released.
EDIT: I guess I did not say this explicitly - you cannot manually control memory allocation in C#.
I understand that in a managed language like Java or C# there is this thing called a garbage collector that every once in a while checks to see if there are any object instances that are no longer referenced, and thus are completely orphaned, and clears then out of memory. But if two objects are not referenced by any variable in a program, but reference each other (like an event subscription), the garbage collector will see this reference and not clear the objects from memory.
Why is this? Why can't the garbage collector figure out that neither of the objects can be reference by any active part of the running program and dispose them.
Your presumption is incorrect. If the GC 'sees' (see below) a cyclic reference of 2 or more objects (during the 'Mark' phase) but are not referenced by any other objects or permanent GC handles (strong references), those objects will be collected (during the 'Sweep' phase).
An in-depth look at the CLR garbage collector can be found in this MSDN article and in this blog post.
Note: In fact, the GC doesn't even 'see' these types of objects during the mark phase since they are unreachable, and hence get collected during a sweep.
Most GCs don't work with reference counting anymore. They usually (and this is the case both in Java and .NET) work with reach-ability from the root set of objects. The root set of objects are globals and stack referenced instances. Anything reachable from that set directly or indirectly is alive. The rest of the memory is unreachable and thus prone to be collected.
This is a major drawback of traditional reference counting garbage collection. The property of a garbage collector describing this behavior is an incomplete collector. Other collectors largely fall into a category called tracing garbage collectors, which include traditional mark-sweep, semi-space/compacting, and generational hybrids, and don't suffer from these drawbacks (but face several others).
All of the JVM and CLI implementations I'm aware of use complete collectors, which means they don't suffer from the specific problem you are asking about here. To my knowledge, of those Jikes RVM is the only one supplying a reference counting collector (one of its many).
Another interesting thing to note is there are solutions to the completeness problem in reference counting garbage collection, and the resulting collectors demonstrate some interesting performance properties that are tough to get out of tracing collectors. Unfortunately, the highest performing reference-counting garbage collection algorithms and most completeness modifications rely on compiler assistance, so bringing them to C++'s shared_ptr<T> are difficult/not happening. Instead, we have weak_ptr<T> and documented rules (sorry about the sub-optimal link - apparently the documentation eludes me) about simply avoiding the problems. This isn't the first time (another mediocre link) we've seen this approach, and the hope is the extra work to prevent memory problems is less than the amount of work required to maintain code that doesn't use shared_ptr<T>, etc.
The mediocre links are because much of my reference material is scattered in notes from last semester's memory management class.
I would like to add that issues surrounding event subscription usually revolve around the fact that the subscriber & publisher have very different lifecycles.
Attach yourself e.g. to the App.Idle event in Windows Forms and your object will be kept alive for the remaining application lifetime. Why? That static App will hold a reference (albeit indirectly through a delegate) to your registered observer. Even though you may have disposed of your observer, it is still attached to App.Idle. You can construct many of those examples.
The other answers here are certainly correct; .NET does it's garbage collection based on reachability of an object.
What I wanted to add: I can recommend reading Understanding Garbage Collection in .NET (simple-talk article by Andrew Hunter) if you want a little more in-depth info.