Garbage collection performance comparison, Java vs .NET 4.5

Does anyone know the major differences between the Java and .Net garbage collectors? A web search has not revealed much, and it was a question that came up in a test.

The difference is between the CLR (.Net) GC and the JVM GC rather than the languages themselves.
Both are subject to change, and the specification of their behaviour is deliberately loose so that the implementation can change without affecting the correctness of programs.
There are some historical differences, largely due to .Net being designed with lessons from the evolution of Java (and other GC-based platforms). In the following, do not assume that the .Net GC was in some way superior because it included certain functionality from the beginning; that is simply the result of it coming later.
A notable publicly visible difference is that the MS GC exposes its generational nature (via the GC API). This is likely to remain true for some time, since it is an obvious approach given the behaviour that most programs exhibit: most allocations are extremely short-lived.
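For illustration, the generational structure is directly visible through the System.GC class; a minimal sketch using only documented APIs:

using System;

class GenerationDemo
{
    static void Main()
    {
        var obj = new object();
        Console.WriteLine(GC.GetGeneration(obj)); // freshly allocated: typically generation 0
        GC.Collect(0);                            // collect only generation 0
        Console.WriteLine(GC.GetGeneration(obj)); // survivor is typically promoted to generation 1
        Console.WriteLine(GC.MaxGeneration);      // 2 for the desktop CLR
    }
}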
Early JVMs did not have generational garbage collectors, though this feature was swiftly added.
The first generational collectors implemented by Sun (now Oracle) and others tended to be mark-and-sweep. It was realized that a mark-sweep-compact approach would lead to much better memory locality, justifying the additional copying overhead. The CLR runtime debuted with this behaviour.
A difference between Sun's/Oracle's and Microsoft's GC implementation 'ethos' is one of configurability.
Sun's JVM provides a vast number of command-line options to tweak aspects of the GC or switch it between different modes. Many options are of the -X or -XX form, indicating that they are not guaranteed to be supported across versions or vendors. The CLR, by contrast, provides next to no configurability; your only real option is the choice between the server and workstation collectors, which optimise for throughput versus latency respectively.
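A few examples of such switches (real flags from JVMs of that era; availability varies across versions and vendors):

java -Xms256m -Xmx1024m MyApp        # fix the initial and maximum heap size
java -XX:+UseParallelGC MyApp        # throughput-oriented parallel collector
java -XX:+UseConcMarkSweepGC MyApp   # low-pause concurrent collector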
Active research in GC strategies is ongoing in both companies (and in open-source implementations). Current approaches used in the most recent GC implementations include per-thread eden areas (improving locality and potentially allowing an eden collection to avoid a full pause) as well as pre-tenuring approaches, which try to avoid placing certain allocations into the eden generation at all.

This is just to add to ShuggyCoUk's excellent answer. The .NET GC also uses what is known as the large object heap (LOH). The CLR preallocates a few objects on the LOH, and all user-allocated objects of at least 85000 bytes are allocated on the LOH as well. Furthermore, double[] arrays of 1000 elements or more are allocated on the LOH too, due to an internal optimization.
The LOH is handled differently than the generational heaps in various ways:
It is only collected during a full (generation 2) collect, and it is never compacted the way the generational heaps are.
Allocation from the LOH is done via a free list, much like malloc in the C runtime, whereas allocation from the generational heap is essentially done by just moving a pointer in generation 0.
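A small sketch of the size threshold (the exact cutoff is an implementation detail; LOH objects report the maximum generation because the LOH is collected together with generation 2):

using System;

class LohDemo
{
    static void Main()
    {
        var small = new byte[1000];
        Console.WriteLine(GC.GetGeneration(small)); // typically 0: ordinary generational heap
        var large = new byte[85000];
        Console.WriteLine(GC.GetGeneration(large)); // typically 2: allocated on the LOH
    }
}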
I don't know if the JVM has something similar, but it is essential information on how memory is handled in .NET, so hopefully you find it useful.

If I recall correctly, the JVM doesn't release deallocated memory back to the operating system as the CLR does.

Java 5 introduced a lot of changes into its GC algorithms.
I'm not a C# maven, but these two articles suggest to me that both have evolved away from simple mark-and-sweep towards newer generational models:
http://java.sun.com/docs/hotspot/gc5.0/gc_tuning_5.html
http://www.csharphelp.com/archives2/archive297.html

I found this:
In the J2SE platform version 1.4.2 there were four garbage collectors from which to choose, but without an explicit choice by the user the serial garbage collector was always chosen. In version 5.0 the choice of the collector is based on the class of the machine on which the application is started.
And this:
Also, just as the JVM manages the destruction of objects, so does the CLR, via a mark-and-compact garbage collection algorithm.
I hope this helps...

Related

C# Garbage Collector behavior

We have an application in C# that controls one of our devices and reacts to the signals this device gives us.
Basically, the application creates threads, performs operations (database access, etc.) and communicates with this device.
Over the life of the application, it creates objects and releases them, and so far we've been letting the garbage collector take care of our memory. I've read that it is highly recommended to let the GC do its job without interfering.
The problem we're now facing is that the memory footprint of our application's process grows indefinitely, in steps.
It seems to come in "waves": the application grows, then all of a sudden releases some memory, but appears to leak at the same time.
We're investigating the application with a memory profiler, but we would also like to understand in depth how the garbage collector works.
I've found an excellent article here: The Danger of Large Objects.
I've also found the official documentation here: MSDN.
Do you know of any other really deep documentation on the GC?
Edit:
Here is a screenshot that illustrates the behavior of the application:
You can clearly see the "wave" effect we're having, on a very regular pattern.
Subsidiary question:
I've seen that my Gen 2 heap is quite big and follows the same pattern as the total bytes used by my application. I guess that's perfectly normal, because most of our objects will survive at least two garbage collections (Singleton instances, for example)... What do you think?
The behavior you describe is typical of problems with objects created on the Large Object Heap (LOH). However, your memory consumption seems to return to some lower value later on, so check twice whether it is really a LOH issue.
You are obviously aware of that, but what is not quite so obvious is that there is an exception to the size rule for objects on the LOH.
As described in the documentation, objects above 85000 bytes in size end up on the LOH. However, for some reason (an 'optimization', probably) arrays of doubles with 1000 elements or more also end up there:
double[] smallArray = new double[999];  // ends up in the 'normal', Gen-0 heap
double[] bigArray = new double[1000];   // ends up on the LOH
Such arrays can fragment the LOH, which requires ever more memory, until you eventually get an OutOfMemoryException.
I was bitten by this: we had an app which received sensor readings as arrays of doubles, which resulted in LOH fragmentation, since every array differed slightly in length (these were readings of real-time data at various frequencies, sampled by a non-real-time process). We solved the issue by implementing our own buffer pool.
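For illustration, a minimal sketch of such a pool (hypothetical names; the actual implementation is not shown in the answer). Rounding every buffer up to one fixed size means the LOH only ever sees a single allocation size, which avoids fragmentation:

using System.Collections.Concurrent;

sealed class DoubleBufferPool
{
    private readonly int _size;
    private readonly ConcurrentBag<double[]> _pool = new ConcurrentBag<double[]>();

    public DoubleBufferPool(int size) { _size = size; }

    // Hand out a pooled buffer, or allocate one if the pool is empty.
    public double[] Rent()
    {
        double[] buffer;
        return _pool.TryTake(out buffer) ? buffer : new double[_size];
    }

    // Return a buffer for reuse instead of letting it become LOH garbage.
    public void Return(double[] buffer)
    {
        if (buffer != null && buffer.Length == _size)
            _pool.Add(buffer);
    }
}

(Modern .NET ships System.Buffers.ArrayPool&lt;T&gt;, which packages up the same idea.)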
I did some research for a class I was teaching a couple of years back. I don't think the references contain any information regarding the LOH, but I thought they were worthwhile to share either way (see below). Furthermore, I suggest searching again for unreleased object references before blaming the garbage collector: simply implement a counter in the class finalizer to check that these large objects are being collected as you believe.
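A minimal sketch of that finalizer-counter idea (class and field names are illustrative; adapt them to the suspect type):

using System.Threading;

class LargeBuffer
{
    // Number of instances not yet collected; log or inspect this in a debugger.
    public static int LiveCount;

    private readonly double[] _data = new double[100000]; // the "large object" payload

    public LargeBuffer() { Interlocked.Increment(ref LiveCount); }

    // The finalizer runs once the GC has decided the instance is garbage.
    ~LargeBuffer() { Interlocked.Decrement(ref LiveCount); }
}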
A different solution to this problem is simply to never deallocate your large objects, but instead to reuse them with a pooling strategy. In my hubris I have many times ended up blaming the GC prematurely for the memory requirements of my application growing over time; more often than not this is a symptom of a faulty implementation.
GC References:
http://blogs.msdn.com/b/clyon/archive/2007/03/12/new-in-orcas-part-3-gc-latency-modes.aspx
http://msdn.microsoft.com/en-us/library/ee851764.aspx
http://blogs.msdn.com/b/ericlippert/archive/2010/09/30/the-truth-about-value-types.aspx
http://blogs.msdn.com/b/ericlippert/archive/2009/04/27/the-stack-is-an-implementation-detail.aspx
Eric Lippert's blog is especially interesting, when it comes to understanding anything C# in detail!
Here is an update with some of my investigations:
In our application, we're using a lot of threads for different tasks. Some of these threads have higher priority.
1) We're using a GC that is concurrent, and we tried switching it back to non-concurrent.
We've seen a dramatic improvement:
The garbage collector is called much more often, and when it is called more often it seems to release our memory much more effectively.
I'll post a screenshot as soon as I have a good one to illustrate this.
We've found a really good article on MSDN. We also found an interesting question on SO.
With the upcoming Framework 4.5, four possibilities will be available for GC configuration:
Workstation - non-concurrent
Workstation - concurrent
Server - non-concurrent
Server - concurrent
We'll try switching to the "server - non-concurrent" and "server - concurrent" modes to check whether they give us better performance.
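These modes are selected through the application's app.config (these are the real runtime configuration elements; the combination shown picks "server - non-concurrent"):

&lt;configuration&gt;
  &lt;runtime&gt;
    &lt;gcServer enabled="true"/&gt;       &lt;!-- workstation vs. server GC --&gt;
    &lt;gcConcurrent enabled="false"/&gt;  &lt;!-- concurrent vs. non-concurrent --&gt;
  &lt;/runtime&gt;
&lt;/configuration&gt;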
I'll keep this thread updated with our findings.

Quantifying the Performance of Garbage Collection vs. Explicit Memory Management

I found this article here:
Quantifying the Performance of Garbage Collection vs. Explicit Memory Management
http://www.cs.umass.edu/~emery/pubs/gcvsmalloc.pdf
In the conclusion section, it reads:
Comparing runtime, space consumption, and virtual memory footprints over a range of benchmarks, we show that the runtime performance of the best-performing garbage collector is competitive with explicit memory management when given enough memory. In particular, when garbage collection has five times as much memory as required, its runtime performance matches or slightly exceeds that of explicit memory management. However, garbage collection’s performance degrades substantially when it must use smaller heaps. With three times as much memory, it runs 17% slower on average, and with twice as much memory, it runs 70% slower. Garbage collection also is more susceptible to paging when physical memory is scarce. In such conditions, all of the garbage collectors we examine here suffer order-of-magnitude performance penalties relative to explicit memory management.
So, if my understanding is correct: if I have an app written in native C++ requiring 100 MB of memory, to achieve the same performance with a "managed" (i.e. garbage collector based) language (e.g. Java, C#), the app should require 5*100 MB = 500 MB?
(And with 2*100 MB = 200 MB, the managed app would run 70% slower than the native app?)
Do you know if current (i.e. latest Java VM's and .NET 4.0's) garbage collectors suffer the same problems described in the aforementioned article? Has the performance of modern garbage collectors improved?
Thanks.
if I have an app written in native C++ requiring 100 MB of memory, to achieve the same performance with a "managed" (i.e. garbage collector based) language (e.g. Java, C#), the app should require 5*100 MB = 500 MB? (And with 2*100 MB = 200 MB, the managed app would run 70% slower than the native app?)
Only if the app is bottlenecked on allocating and deallocating memory. Note that the paper talks exclusively about the performance of the garbage collector itself.
You seem to be asking two things:
have GCs improved since that research was performed, and
can I use the conclusions of the paper as a formula to predict required memory.
The answer to the first is that there have been no major breakthroughs in GC algorithms that would invalidate the general conclusions:
GC'ed memory management still requires significantly more virtual memory.
If you try to constrain the heap size the GC performance drops significantly.
If real memory is restricted, the GC'ed memory management approach results in substantially worse performance due to paging overheads.
However, the conclusions cannot really be used as a formula:
The original study was done with JikesRVM rather than a Sun JVM.
The Sun JVM's garbage collectors have improved in the ~5 years since the study.
The study does not seem to take into account that Java data structures take more space than equivalent C++ data structures for reasons that are not GC related.
On the last point, I have seen a presentation by someone who talked about Java memory overheads. For instance, it found that the minimum representation size of a Java String is something like 48 bytes. (A String consists of two primitive objects: one an object with 4 word-sized fields, and the other an array with a minimum of 1 word of content. Each primitive object also has 3 or 4 words of overhead.) Java collection data structures similarly use far more memory than people realize.
These overheads are not GC-related per se. Rather they are direct and indirect consequences of design decisions in the Java language, JVM and class libraries. For example:
Each Java primitive object header [1] reserves one word for the object's "identity hashcode" value, and one or more words for representing the object lock.
The representation of a String has to use a separate "array of characters" because of JVM limitations. Two of the three other fields are an attempt to make the substring operation less memory intensive.
The Java collection types use a lot of memory because collection elements cannot be directly chained. So, for example, the overhead of a (hypothetical) singly linked list collection class in Java would be 6 words per list element. By contrast, an optimal C/C++ linked list (i.e. with each element having a "next" pointer) has an overhead of one word per list element. (See the sketch after the footnote below.)
[1] In fact, the overheads are less than this on average. The JVM only "inflates" a lock following use and contention, and similar tricks are used for the identity hashcode. The fixed overhead is only a few bits. However, these bits add up to a measurably larger object header ... which is the real point here.
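To make the linked-list point concrete, here is a minimal sketch (written in C# for consistency with the rest of this page; the layout argument is the same in Java):

// Non-intrusive list: every element costs an extra node object
// (object header + value reference + next reference).
class Node&lt;T&gt;
{
    public T Value;
    public Node&lt;T&gt; Next;
}

// Intrusive list: the element carries its own link, so the only
// per-element overhead is the single Next field.
class Reading
{
    public double Value;
    public Reading Next;
}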
Michael Borgwardt is right that this only applies if the application is bottlenecked on allocating memory; this is Amdahl's law.
However, I have used C++, Java, and VB.NET. In C++ there are powerful techniques available that allocate memory on the stack instead of the heap. Stack allocation is easily hundreds of times faster than heap allocation. I would say that use of these techniques could remove maybe one allocation in eight, and the use of writable strings one allocation in four.
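(As an aside, C# later gained a limited analogue in stackalloc. A minimal sketch, purely to illustrate the stack-versus-heap distinction drawn above:)

using System;

class StackAllocDemo
{
    static void Main()
    {
        // Allocated on the stack: no GC involvement, and the storage is
        // reclaimed automatically when the method returns.
        Span&lt;byte&gt; buffer = stackalloc byte[256];
        buffer[0] = 42;
        Console.WriteLine(buffer[0]);
    }
}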
It's no joke when people claim highly optimized C++ code can trounce the best possible Java code. It's the flat-out truth.
Microsoft claims the overhead in using any of the .NET family of languages over C++ is about two to one. I believe that number is just about right for most things.
HOWEVER, managed environments carry a particular benefit: when dealing with inferior programmers, you don't have to worry about one module trashing another module's memory, the resulting crash being blamed on the wrong developer, and the bug being hard to find.
At least as I read it, your real question is whether there have been significant developments in garbage collection or manual memory management since that paper was published that would invalidate its results. The answer is somewhat mixed. On one hand, the vendors who provide garbage collectors do tune them, so their performance tends to improve over time. On the other hand, there hasn't been anything like a major breakthrough, such as fundamentally new garbage collection algorithms.
Manual heap managers generally improve over time as well. I doubt most are tuned with quite the regularity of garbage collectors, but in the course of 5 years, probably most have had at least a bit of work done.
In short, both have undoubtedly improved at least a little, but in neither case have there been major new algorithms that change the fundamental landscape. It's doubtful that current implementations will give a difference of exactly 17% as quoted in the article, but there's a pretty good chance that if you repeated the tests today, you'd still get a difference somewhere around 15-20% or so. The differences between then and now are probably smaller than the differences between some of the different algorithms they tested at that time.
I am not sure how relevant your question still is today. A performance-critical application shouldn't spend a significant portion of its time doing object creation (as the micro-benchmark is very likely to do), and performance on modern systems is more likely to be determined by how well the application fits into the CPU's cache than by how much main memory it uses.
BTW: There are lots of tricks you can do in C++ to support this which are not available in Java.
If you are worried about the cost of GC or object creation, you can take steps to minimise how many objects you create. This is generally a good idea where performance is critical in any language.
The cost of main memory isn't as much of an issue as it used to be. A machine with 48 GB is relatively cheap these days. An 8-core server with 48 GB of main memory can be leased for £9/day. Try hiring a developer for £9/day. ;) However, what is still relatively expensive is CPU cache memory. It is fairly hard to find a system with more than 16 MB of CPU cache, versus 48,000 MB of main memory. A system performs much better when an application is using its CPU cache, and this is the amount of memory to consider if performance is critical.
First, note that it's now 2019 and a lot of things have improved.
As long as you don't trigger a GC, allocation in a managed runtime is as simple as incrementing a pointer. In C++ it is much more work, unless you implement your own mechanism to allocate in chunks.
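A hedged sketch of that "bump the pointer" idea (real GCs do this in native code inside the runtime; this is only an illustration of the principle):

class BumpAllocator
{
    private readonly byte[] _segment = new byte[1 &lt;&lt; 20]; // one pre-zeroed chunk
    private int _next;                                     // the bump pointer

    // Returns the offset of a fresh block, or -1 when the segment is full
    // (at which point a real GC would collect or request a new segment).
    public int Allocate(int size)
    {
        if (_next + size > _segment.Length)
            return -1;
        int offset = _next;
        _next += size; // allocation is literally a pointer increment
        return offset;
    }
}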
And if you use shared smart pointers, each change to the reference count requires a locked increment (the xaddl instruction), which is slow in itself and requires the processors to communicate to invalidate and resynchronise their cache lines.
What's more, with a GC you get better locality in at least three ways. First, when it allocates a new segment, it zeroes the memory and warms the cache lines. Second, it compacts the heap, causing data to stay closer together. Lastly, every thread allocates from its own area.
In conclusion, although it's hard to test and compare every scenario and GC implementation, I've read on SO that it has been shown that a GC can perform better than manual memory management.

Deterministic GC in the CLR?

Are there any CLR implementations that have deterministic garbage collection?
Nondeterministic pauses in the MS CLR GC inhibit .Net from being a suitable environment for real-time development.
Metronome GC and BEA JRockit in Java are two deterministic GC implementations that I'm aware of.
But are there any .Net equivalents?
Thanks
There is no way to make the GC deterministic, except of course by calling GC.Collect() exactly every second using a timer ;-).
The GC does, however, contain a notification mechanism (since .NET 3.5 SP1) that allows you to be notified when a gen 2 collect is about to happen. You can read about it here.
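A minimal sketch of that mechanism, using the documented GC notification APIs (note that on the .NET Framework of that era these only work when concurrent GC is disabled):

using System;

class GcNotificationDemo
{
    static void Main()
    {
        // Thresholds between 1 and 99: how early the approach notice fires.
        GC.RegisterForFullGCNotification(10, 10);

        // In a real application this wait would run on a dedicated thread.
        GCNotificationStatus status = GC.WaitForFullGCApproach(1000);
        if (status == GCNotificationStatus.Succeeded)
        {
            Console.WriteLine("A full GC is approaching; redirect work now.");
        }

        GC.CancelFullGCNotification();
    }
}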
The GC now also contains multiple latency modes that make it possible to temporarily prevent GC collects from occurring. Of course you should be very careful with this, but it is especially useful for real-time systems. You can read more about it here.
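A sketch of using the latency modes via System.Runtime.GCSettings (a real API; restoring the previous mode in a finally block is the commonly recommended pattern):

using System.Runtime;

class LowLatencySection
{
    static void DoTimeCriticalWork()
    {
        GCLatencyMode oldMode = GCSettings.LatencyMode;
        GCSettings.LatencyMode = GCLatencyMode.LowLatency; // suppress blocking gen-2 collects
        try
        {
            // ... time-critical work that must not be interrupted ...
        }
        finally
        {
            GCSettings.LatencyMode = oldMode; // always restore the previous mode
        }
    }
}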
No, there are none. From my experience, .Net can't be used to create real-time systems, for many reasons, not only garbage collection. C or C++ is a better choice. Also, modern OSes do not provide deterministic scheduling, and that applies to all applications, regardless of language.
You would have to control the GC yourself in order to get predictable real-time behaviour, but if you are doing this then you may as well not use a managed language.
For real-time systems you need control over everything that is running. There are third-party modifications to Windows XP that make it real-time (can't remember if it's soft or hard real-time though).
A completely unfeasible option, perhaps, but look into Cosmos OS - written in C# and compiled to assembler, I think - you might be able to do something with that :)

.NET: What is typical garbage collector overhead?

5% of execution time spent on GC? 10%? 25%?
Thanks.
This blog post has an interesting investigation into this area.
The poster's conclusion? That the overhead was negligible for his example.
So the GC heap is so fast that in a real program, even in tight loops, you can use closures and delegates without even giving it a second’s thought (or even a few nanosecond’s thought). As always, work on a clean, safe design, then profile to find out where the overhead is.
It depends entirely on the application. Garbage collection is done as required, so the more often you allocate large amounts of memory which later become garbage, the more often it must run.
It could even go as low as 0% if you allocate everything up front and then never allocate any new objects.
In typical applications I would think the answer is very close to 0% of the time is spent in the garbage collector.
The overhead varies widely. It's not really practical to reduce the problem domain into "typical scenarios" because the overhead of GC (and related functions, like finalization) depend on several factors:
The GC flavor your application uses (impacts how your threads may be blocked during a GC).
Your allocation profile, including how often you allocate (the GC triggers automatically when an allocation request needs more memory) and the lifetime profile of objects (gen 0 collections are fastest, gen 2 collections are slower; if you induce a lot of gen 2 collections your overhead will increase - see the sketch after this list).
The lifetime profile of finalizable objects, because they must have their finalizers complete before they will be eligible for collection.
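As a small illustration of that generational cost profile, the documented GC.CollectionCount API reports how many collections each generation has had since process start:

using System;

class CollectionCountDemo
{
    static void Main()
    {
        long sum = 0;
        for (int i = 0; i < 1000000; i++)
        {
            var tmp = new byte[1024]; // short-lived garbage: dies in gen 0
            sum += tmp.Length;
        }
        Console.WriteLine(sum);
        // Gen 0 runs far more often than gen 2 for allocation patterns
        // like the loop above.
        Console.WriteLine("gen0 collects: " + GC.CollectionCount(0));
        Console.WriteLine("gen1 collects: " + GC.CollectionCount(1));
        Console.WriteLine("gen2 collects: " + GC.CollectionCount(2));
    }
}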
The impact of various points on each of those axes of relevancy can be analyzed (and there are probably more relevant areas I'm not recalling off the top of my head) -- so the problem is really "how can you reduce those axes of relevancy to a 'common scenario?'"
Basically, as others said, it depends. Or, "low enough that you shouldn't worry about it until it shows up on a profiler report."
In native C/C++ there is sometimes a large cost to allocating memory, due to finding a block of free memory of the right size; there is also a non-zero cost to freeing memory, due to having to link the freed memory into the correct list of blocks and combine small blocks into larger ones.
In .NET it is very quick to allocate a new object, but you pay the cost when the garbage collector runs. However, the cost of garbage-collecting short-lived objects is as close to free as you can get.
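A rough sketch of how one might eyeball that first claim (a single unwarmed run; a real measurement would use a proper benchmarking harness):

using System;
using System.Diagnostics;

class AllocTiming
{
    static void Main()
    {
        var sw = Stopwatch.StartNew();
        long sum = 0;
        for (int i = 0; i < 10000000; i++)
        {
            var o = new byte[32]; // short-lived: collected cheaply in gen 0
            sum += o.Length;
        }
        sw.Stop();
        Console.WriteLine(sum); // keep the loop from being optimized away
        Console.WriteLine("ns per allocation: " +
            sw.Elapsed.TotalMilliseconds * 1000000 / 10000000);
    }
}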
I have always found that if the cost of garbage collection is a problem for you, then you are likely to have other, bigger problems with the design of your software. Paging can be a big issue with any GC if you don't have enough physical RAM, since you may not be able to just put all your data in RAM and depend on the OS to provide virtual memory as needed.
It really can vary. Look at this short-but-complete demonstration program that I wrote:
http://nomorehacks.wordpress.com/2008/11/27/forcing-the-garbage-collector/
that shows the effect of large gen2 garbage collections.
Yes, the garbage collector will spend some X% of time collecting when averaged over all applications everywhere. But that doesn't necessarily mean that time is overhead. For overhead, you can really only count the time that would be left over after releasing an equivalent amount of memory on an unmanaged platform.
With that in mind, the actual overhead can even be negative, because the garbage collector saves time by releasing several chunks of memory in batches. That means fewer context switches and an overall improvement in efficiency.
Additionally, starting with .Net 4 the garbage collector does a lot of its work on a different thread that doesn't interrupt your currently running code as much. As we work more and more with multi-core machines, where a core might even be sitting idle now and then, this is a big deal.

