I know some of the details of how it works from Richter's book, but I want to "feel" it in practice. I see a few options:
Write my own GC implementation following the .NET standard (just kidding, it is too hardcore to do on my own :)
Study Mono's GC implementation - it has some pluses (for example, I could step through some cases with a debugger), but on the other hand it is not so different from reading a book. And by the way, as far as I know, Mono's implementation really differs from Microsoft's (correct me if I am wrong).
So, any suggestions?
Well, you could write a resource-intensive server application; that will give you the gist of what the GC does under significant load. You can pick anything you want, ideally something like a web server, an MMO server, etc.
You will get to see how the GC manages the managed heap under complicated circumstances (asynchronous sockets in particular make the GC very unhappy, thanks to their use of pinned handles), and you can experiment with how best to allocate different kinds of memory resources to make the GC's job as easy as possible.
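One common mitigation - shown here as a minimal sketch, with a class and member names of my own invention, following the pooling pattern from the SocketAsyncEventArgs documentation - is to allocate one big byte array up front and hand each socket operation a fixed-size slice of it. Pinning during I/O then always lands inside the same long-lived block instead of scattering pinned handles across the young heap:

    using System.Collections.Concurrent;
    using System.Net.Sockets;

    // Sketch only: one large buffer (big enough to land on the LOH),
    // carved into fixed-size slices that are lent to socket operations.
    class SliceBufferPool
    {
        private readonly byte[] _buffer;
        private readonly int _sliceSize;
        private readonly ConcurrentStack<int> _freeOffsets = new ConcurrentStack<int>();

        public SliceBufferPool(int sliceSize, int sliceCount)
        {
            _sliceSize = sliceSize;
            _buffer = new byte[sliceSize * sliceCount];
            for (int i = 0; i < sliceCount; i++)
                _freeOffsets.Push(i * sliceSize);
        }

        public bool TryAssign(SocketAsyncEventArgs args)
        {
            if (!_freeOffsets.TryPop(out int offset))
                return false;                        // pool exhausted
            args.SetBuffer(_buffer, offset, _sliceSize);
            return true;
        }

        public void Release(SocketAsyncEventArgs args)
        {
            _freeOffsets.Push(args.Offset);          // recycle the slice
            args.SetBuffer(null, 0, 0);
        }
    }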
And you could write your own GC - a naive GC is not that hard.
Disclaimer: there might be some misconceptions in the phrasing below; please correct me if I misinterpreted the way my code is handled in C# between the moment I write it and the point where it looks like zeroes and ones.
The questions are the following (they are linked):
Is there any case in C# where my data structures and/or my data-manipulation implementation will have a performance impact depending on whether or not I use optimization techniques?
What does the compiler do when outputting IL? Is it reliable?
Meaning: if I make my data SOA, will it be SOA in the IL? Always?
What happens to my data structures when the JIT reads the IL? Are they changed? Are they optimized automatically to fit my processor?
cf: that talk about C/C++
I know that this talk is targeted at native code and covers the specifics of the processor's layout versus your data layout in native code.
I also know that the C# compiler and the JIT compiler will optimize some things for me with regard to those issues.
Basically, I wonder whether these kinds of optimizations will have an impact on my performance:
SOA instead of AOS (see the sketch after this list)
Vector access patterns (arranging data to be accessed contiguously in memory)
etc... you name it...
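For concreteness, here is a minimal sketch of the two layouts (the names are illustrative, not from any of the linked material). With the structure-of-arrays layout, a loop that only needs positions streams through contiguous floats instead of striding over velocity fields it never reads:

    class Particles
    {
        // AOS: fields interleaved per entity; a position-only pass drags
        // the velocities through the cache along with it.
        struct Particle { public float X, Y, VX, VY; }
        Particle[] aos = new Particle[3000];

        // SOA: each field packed contiguously in its own array.
        float[] x = new float[3000], y = new float[3000];
        float[] vx = new float[3000], vy = new float[3000];

        public void Integrate(float dt)
        {
            // Every cache line fetched here is fully used.
            for (int i = 0; i < x.Length; i++)
            {
                x[i] += vx[i] * dt;
                y[i] += vy[i] * dt;
            }
        }
    }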
I work in game development, where performance is critical; we manipulate large amounts of data and we need to do it at least 24 times per second. I cannot have the GC doing its work for 300 ms, or have memory accessed/allocated all over the place, while I'm trying to detect collisions between 3000 different objects.
For reference, stuff I read that did not really answer the question:
An excellent Eric Lippert article about structs and value types in C# (please read it: if you think value types are always on the stack in C#, you're in for a treat)
An excellent video about PerfView for tracking your GC's behaviour and its impact on your performance
The SO question about best practices to optimize memory in C# (and, more importantly, its answer)
But those do not address the performance cost tied to the processor and the data-layout implementation.
To go further after what Hans answered:
When you say : "You can pursue SOA but that doesn't help. Yes, your program will slow down because of all that structure copying and does so in a deterministic way. But it doesn't stop the rain. You get the worst of both, a slow program and the exact same pauses."
It does not mean that my program gains nothing from SOA; it WILL (potentially) be faster, because SOA will help when processing my data. It just has no impact on the GC itself.
The other thing is: if I do not do SOA or other improvements to my data layout, the compiler won't do that for me, right? I cannot rely on the compiler to deal with that kind of thing?
Worrying about the GC is like worrying about whether it is going to rain today. It is going to rain sooner or later, nothing you can do to stop it. And it is required, you can't keep that lawn look pretty green if it doesn't. What you never want to do is intentionally stop it from raining. Because if you do it will come down in a deluge, spilling away that pretty lawn. A steady drizzle is what you want. And preferably at night when you're not looking.
The .NET GC strongly supports this. Only the small gen #0 and #1 collections will pause your program. The expensive gen #2 collection happens in the background while your code continues executing. Worst-case pause hovers somewhere near a hundred microseconds. Which is pretty indistinguishable from other reasons your program will pause on a modern OS. Like your game loop getting suspended temporarily because another higher priority kernel thread needs to run. Just a drizzle, unobservable to the human eye.
You can pursue SOA but that doesn't help. Yes, your program will slow down because of all that structure copying and does so in a deterministic way. But it doesn't stop the rain. You get the worst of both, a slow program and the exact same pauses.
Don't worry about the rain, just make sure it comes down at the right time. To take advantage of background GCs, you want to structure your data so it is either very short-lived, so it disappears easily with a gen #0/1 collection, or very long-lived, so it finds a comfortable home in gen #2 and stays there for a while. That is in general a very common pattern in programs, especially so in games. It's pretty unlikely you need to do anything at all.
I am originally a native C++ programmer. In C++, every process in your program is bound to your code, i.e., nothing happens unless you want it to happen. And every bit of memory is allocated (and deallocated) according to what you wrote. So performance is entirely your responsibility: if you do well, you get great performance.
(Note: please don't object to code one hasn't written oneself, such as the STL; it is unmanaged C++ code after all, and that is the significant part.)
But in managed code, such as Java and C#, you don't control every process, and memory is "hidden", or not under your control, to some extent. That makes performance relatively unknown, and mostly you fear bad performance.
So my question is: what issues and key guidelines should I look out for and keep in mind to achieve good performance in managed code?
I could think only of some practices such as:
Being aware of boxing and unboxing (see the sketch after this list).
Choosing the collection type that best suits your needs and has the lowest operation cost.
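To make the boxing point concrete, here is a small illustrative snippet (hypothetical code, not from the question); it also touches the second point, since the generic List<int> is the better collection choice here. Storing an int through an object-typed API allocates a heap box for every element, while the generic collection stores the values inline:

    using System.Collections;
    using System.Collections.Generic;

    class BoxingDemo
    {
        static void Main()
        {
            var boxed = new ArrayList();
            for (int i = 0; i < 1000000; i++)
                boxed.Add(i);            // each Add boxes the int: one heap object

            var unboxed = new List<int>();
            for (int i = 0; i < 1000000; i++)
                unboxed.Add(i);          // stored inline, no per-element allocation

            int first = (int)boxed[0];   // unboxing: a cast plus a runtime type check
        }
    }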
But these never seem to be enough, or even convincing! In fact, perhaps I shouldn't have mentioned them.
Please note I am not asking for a C++ vs. C# (or Java) code comparison; I just mentioned C++ to explain the problem.
There is no single answer here. The only way to answer this is: profile. Measure early and often. The bottlenecks are usually not where you expect them. Optimize the things that actually hurt. We use mvc-mini-profiler for this, but any similar tool will work.
You seem to be focusing on GC; now, that can sometimes be an issue, but usually only in specific cases; for the majority of systems the generational GC works great.
Obviously external resources will be slow, so caching may be critical. In odd scenarios with very-long-lived data there are tricks you can do with structs to avoid long gen-2 collects. Serialization (files, network, etc.), materialization (ORM), or just a bad collection/algorithm choice may be the biggest issue - you cannot know until you measure.
Two things though:
make sure you understand what IDisposable and "using" mean (see the sketch after this list)
don't concatenate strings in loops; mass concatenation is the job of StringBuilder
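Minimal sketches of both points (illustrative code; the method names are mine). The using statement guarantees Dispose runs even when an exception is thrown, and StringBuilder appends into one growable buffer instead of allocating a new string per concatenation:

    using System.IO;
    using System.Text;

    class Examples
    {
        static string ReadAll(string path)
        {
            // Dispose (and thus the file-handle release) is guaranteed,
            // instead of waiting for a finalizer to run eventually.
            using (var reader = new StreamReader(path))
            {
                return reader.ReadToEnd();
            }
        }

        static string Join(string[] parts)
        {
            var sb = new StringBuilder();
            foreach (var part in parts)
                sb.Append(part);        // 'result += part' would copy everything each time
            return sb.ToString();
        }
    }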
Reusing large objects is very important in my experience.
Objects on the large object heap are implicitly generation 2, and thus require a full GC to clean up. And that's expensive.
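A minimal sketch of what that reuse can look like (the class is my own invention; on modern .NET, System.Buffers.ArrayPool<T>.Shared plays this role). Arrays of roughly 85,000 bytes and up land on the large object heap, so renting them from a pool avoids repeated LOH allocations and the full collections they eventually force:

    using System.Collections.Concurrent;

    class LargeBufferPool
    {
        private const int BufferSize = 1 << 20;   // 1 MB: comfortably on the LOH
        private readonly ConcurrentBag<byte[]> _pool = new ConcurrentBag<byte[]>();

        public byte[] Rent() =>
            _pool.TryTake(out byte[] buffer) ? buffer : new byte[BufferSize];

        public void Return(byte[] buffer) => _pool.Add(buffer);
    }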
The main thing to keep in mind about performance in managed languages is that your code can be restructured at runtime to be better optimized.
For example, the default JVM most people use is Sun's HotSpot VM, which will actually optimize your code as it runs by converting parts of the program to native code, inlining on the fly, and performing other optimizations (as the CLR and other managed runtimes also do) that you will never get using C++.
Additionally, HotSpot will detect which parts of your code are used the most and optimize them accordingly.
So as you can see, optimizing performance on a managed system is slightly harder than on an unmanaged system, because you have an intermediate layer that can make code faster without your intervention.
I am going to invoke the rule against premature optimization here and say that you should first create a correct solution; then, if performance becomes an issue, go back and measure what is actually slow before attempting to optimize.
I would suggest understanding better garbage collection algorithms. You can find good books on that matter, e.g. The Garbage Collection Handbook (by Richard Jones, Antony Hosking, Eliot Moss).
Then, your question is really tied to a particular implementation, and perhaps even to a specific version of it. For instance, Mono used to use Boehm's garbage collector (e.g. in version 2.4), but now uses a copying generational one.
And don't forget that some GC techniques can be remarkably efficient. Remember A. Appel's old paper Garbage Collection Can Be Faster than Stack Allocation (though today cache performance matters much, much more, so the details are different).
I think that being aware of boxing (and unboxing) and of allocation is enough. Some compilers are able to optimize these (by avoiding some of them).
Don't forget that GC performance can vary widely. There are good GCs (for your application) and bad ones.
And some GC implementations are quite fast - for example, the one inside OCaml.
I would not bother that much: premature optimization is evil.
(And C++ memory management, even with smart pointers or reference counting, can often be viewed as a poor man's garbage collection technique; and you don't have full control over what C++ is doing - unless you re-implement your ::operator new using operating-system-specific system calls - so you don't really know its performance a priori.)
.NET Generics don't specialize on reference types, which severely limits how much inlining can be done. It may (in certain performance hotspots) make sense to forgo a generic container type in favor of a specific implementation that will be better optimized. (Note: this doesn't mean to use .NET 1.x containers with element type object).
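A sketch of what that can look like (illustrative code; the actual benefit depends on the JIT version). The CLR emits a specialized method body per value type, so CompareTo can be inlined there, but all reference-type instantiations share one body in which the call stays an interface dispatch:

    using System;

    class Compare
    {
        // Shared code for all reference-type T: CompareTo is an interface call.
        static T Max<T>(T a, T b) where T : IComparable<T>
            => a.CompareTo(b) >= 0 ? a : b;

        // Hand-specialized for strings: a direct, inlinable call.
        static string MaxString(string a, string b)
            => string.CompareOrdinal(a, b) >= 0 ? a : b;
    }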
I've been interested in different chess engines lately. There are many open- and closed-source projects in this field, and they are all (most of them, anyway) written in C/C++. This is kind of an obvious thing - you have a computationally intensive task, so you use C/C++ and get both portability and speed. It seems like a no-brainer.
However, I would like to question that idea. When .NET first appeared, there were many people saying that the .NET idea would not work because .NET programs were doomed to be super-slow. In reality this did not happen. Somebody did a good job with the VM, the JIT, etc., and we have decent performance for most tasks now. But not for all of them. Microsoft never promised that .NET would be suitable for every task, and admitted that for some tasks you will still need C/C++.
Going back to the question of computationally heavy tasks - is there a way to write a .NET program so that it does not perform computations considerably worse than unmanaged code using the same algorithms? I'd be happy with a "constant-factor" speed loss, but anything worse than that is going to be a problem.
What do you think? Can managed code come close in computational speed to unmanaged code, or is unmanaged code the only viable answer? If we can get close, how? If we can't, why not?
Update: a lot of good feedback here. I'm going to accept the most up-voted answer.
"Can" it? Yes, of course. Even without unsafe/unverifiable code, well-optimized .NET code can outperform native code. Case in point, Dr. Jon Harrop's answer on this thread: F# performance in scientific computing
Will it? Usually, no, unless you go way out of your way to avoid allocations.
.NET is not super-slow - but nor is it in the same realm as a native language. The speed differential is something you could easily absorb for a business app that prefers safety and shorter development cycles. If you're not using every cycle on the CPU, then it doesn't matter how many you use, and the fact is that many or even most apps simply don't need that kind of performance. However, when you do need that kind of performance, .NET won't offer it.
More importantly, it's not controllable enough. In C++ you destroy every resource and manage every allocation. This is a big burden when you really don't want to have to do it - but when you need the added performance of fine-tuning every allocation, it's impossible to beat.
Another thing to consider is the compiler. I mean, the JIT has access to more information about both the program and the target CPU. However, it does have to re-compile from scratch every time, and under far, far greater time constraints than the C++ compiler, which inherently limits what it's capable of. The CLR semantics, like heap allocation for every object every time, also fundamentally limit its performance. Managed GC allocation is plenty fast, but it's no stack allocation - and, more importantly, neither is de-allocation.
Edit: Of course, the fact that .NET ships with a different memory control paradigm to (most) native languages means that for an application for which garbage collection is particularly suited, then .NET code may run faster than native code. This isn't, however, anything to do with managed code versus native code, just picking the right algorithm for the right job, and it doesn't mean that an equivalent GC algorithm used from native code wouldn't be faster.
Simple answer is no. To some commenters below: it will be slower most of the time, not always.
Computationally intensive applications where each millisecond counts will still be written in unmanaged languages such as C and C++. The GC slows things down a lot when collecting.
For example, nobody writes 3D engines in C# or XNA. There are some, but nothing comes close to CryEngine or Unreal.
Short answer: yes. Long answer: with enough work.
There are high-frequency trading applications written in managed C#/.NET. Very few other applications ever approach the time-criticality that a trading engine requires. The overall concept is that you develop software so efficient that your application never needs the garbage collector to run for anything beyond generation 0 objects. If at any point the garbage collector kicks in, you get a massive (in computing terms of time) lag lasting dozens or hundreds of milliseconds, which would be unacceptable.
You can use unsafe and pointers to get "raw" memory access, which can give you a significant speed boost at the cost of more responsibility for your stuff (remember to pin your objects). At that point, you're just shifting bytes.
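A small sketch of that route (illustrative only; requires compiling with /unsafe). The fixed statement pins the array so the GC cannot move it while the raw pointer is live, which is exactly the extra responsibility mentioned above:

    class UnsafeSum
    {
        static unsafe long Sum(int[] data)
        {
            long total = 0;
            fixed (int* p = data)           // pinned for the duration of the block
            {
                for (int i = 0; i < data.Length; i++)
                    total += p[i];          // raw pointer access, no bounds checks
            }
            return total;                   // the array is unpinned on block exit
        }
    }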
Garbage Collection pressure could be an interesting thing, but there are tactics around that as well (object pooling).
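A minimal object-pooling sketch (hypothetical names): instances are recycled instead of being abandoned to the collector, so steady-state operation allocates nothing and gives the GC no reason to run:

    using System.Collections.Concurrent;

    class ObjectPool<T> where T : new()
    {
        private readonly ConcurrentBag<T> _items = new ConcurrentBag<T>();

        public T Rent() => _items.TryTake(out T item) ? item : new T();

        public void Return(T item) => _items.Add(item);   // caller resets state first
    }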
This seems like an overly broad question.
There are some gratuitous hints to throw around: yes, you can use unsafe, unchecked, arrays of structs, and most importantly C++/CLI.
There is never going to be a match for C++'s inlining, compile-time template expansion (and the optimizations that go with it), etc.
But the bottom line is: it depends on the problem. What counts as "computation", anyway? Mono has nifty extensions for using SIMD instructions; on Win32 you'd have to go native to get those. Interop is cheating.
In my experience, though, porting toy projects (such as parsers and a chess engine) is going to result in at least an order-of-magnitude speed difference, no matter how much you optimize the .NET side of things. I reckon this has to do mainly with heap management and the service routines (System.String, System.IO).
There can be big pitfalls in .NET (overusing LINQ, lambdas, accidentally relying on Enum.HasFlag to perform like a bitwise operation, etc.). YMMV; choose your weapons carefully.
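To illustrate the Enum.HasFlag pitfall (a sketch; the enum is made up): on older runtimes HasFlag boxes its operands on every call, while the bitwise test compiles down to a single AND:

    using System;

    [Flags]
    enum Modes { None = 0, Read = 1, Write = 2 }

    class FlagCheck
    {
        // Boxes on every call on runtimes before the JIT intrinsic (~.NET Core 2.1).
        static bool SlowCheck(Modes m) => m.HasFlag(Modes.Write);

        // A plain bitwise AND and a compare; no allocation.
        static bool FastCheck(Modes m) => (m & Modes.Write) != 0;
    }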
In general, managed code will have at least some speed loss compared to compiled code, proportional to the size of your code. This loss comes in when the VM first JIT-compiles your code. Assuming the JIT compiler is just as good as a normal compiler, after that the code will perform the same.
However, depending on how it's written, it's even possible that the JIT compiler will perform better than a normal compiler. The JIT compiler knows many more things about the target platform than an ahead-of-time compiler would - it knows what code is "hot", it can cache results for proven pure functions, it knows which instruction-set extensions the target platform supports, etc. - whereas, depending on the compiler and how specialized you (and your intended application) allow it to be, an ahead-of-time compiler may not be able to optimize nearly as well.
Honestly, it completely depends on your application. Especially with something so algorithmic as a chess engine, it's possible that code in a JIT compiled language, following expected semantics and regular patterns, may run faster than the equivalent in C/C++.
Write something and test it out!
I program in Java, doing a lot of web-related stuff, but I've been toying with the idea of creating a very simple DAW in some language. I considered C#, but it doesn't seem to support DirectX anymore (though there are some libraries that work with differing degrees of success). I was curious whether anyone out there has an opinion on playing a lot of multi-channel sounds through Java. I would also at some point need to hack in some VST support (which would probably not be trivial). I'm really afraid that my only option will be C++, and that would be unpleasant enough to make me not actually work on it (I know some C++, but not really enough to write something this intense).
Anyone have some ideas? Thanks
VST support in Java may be reasonably easy after all; I've heard of positive experiences with http://github.com/mhroth/jvsthost (that is to say, someone I had a conversation with on a forum seemed to be up and running with it pretty quickly, running a number of different synths successfully).
An aside: Personally, I'm developing some software in Java that uses SuperCollider as an audio backend (disclaimer: my actual experience of Java sound is limited). While it would probably be just about possible to build a DAW around SuperCollider, I wouldn't really recommend it as the tool for that job. However, I also don't quite understand why you want to build a DAW in the first place... should you decide you want to explore alternative means of making music with computers, you might give SC a look (also ChucK I found very easy to get started with and quite a lot of fun) :-)
Anyway, back to the question... while I tend to refer to Java specifically, much of this will go for C# as well:
Traditionally, garbage collection has been a source of concern doing anything where time is of the essence in Java; in a DAW, for example, this may manifest itself as inaccurate timing or clicks in the output where the GC interrupts the program long enough that it is not able to process a complete buffer. This will be particularly true if you want to use small buffers for low latency, and/or are not careful about the amount of garbage generated. However, I don't want to spread FUD about Java sound: as I mentioned, I haven't really used it heavily myself, and in any case I believe these issues are improving. It is certainly an issue you will need to be aware of, but probably not a show-stopper.
I imagine that a big bottleneck in any DAW will be file IO, which shouldn't suffer through Java as long as proper care is taken.
If you start doing intense DSP on many channels simultaneously, then it may be that Java's computation performance isn't totally optimal (although probably not bad, really); however, if you mostly do basic mixing in your DAW code and any DSP with VSTs, then this should be a non-issue anyway.
In terms of actual audio IO, I see that there are also ASIO implementations for Java, should you be interested. I don't even have indirect experience of those, so I really won't vouch for them. Java 1.7 is supposed to have improved low-latency audio support, FWIW (although from what I've read, the applications they have in mind are not things like DAWs). DirectX support I don't think should be a major factor for a DAW. In that sense, you might not want to dismiss C#, as it is a very nice language.
There are already some DAWs built on the Java platform (Frinika or javaDAW, for example), so I think it's a reasonable option.
I'm working on something similar, so I would have to say it is possible. My laptop was stolen and I've had to start over, but I've rebuilt most of it. So far the track threads have been lining up pretty well, but I'm considering implementing something like LWJGL's timer for better precision. Tritonus is a very helpful library, and you can find it at jsresources.org along with some very helpful examples; I've learned a lot there. If you send me an e-mail, I'd be happy to share my code with you.
Are there any CLR implementations that have deterministic garbage collection?
Nondeterministic pauses in the MS CLR GC prevent .NET from being a suitable environment for real-time development.
Metronome GC and BEA JRockit in Java are two deterministic GC implementations that I'm aware of.
But are there any .NET equivalents?
Thanks
There is no way to make the GC deterministic, except of course by calling GC.Collect() exactly every second using a timer ;-).
The GC does, however, have a notification mechanism (since .NET 3.5 SP1) that allows you to be notified when a gen 2 collection is about to happen. You can read about it here.
The GC also has multiple latency modes that make it possible to prevent GC collections from occurring for a while. Of course you should be very careful with this, but it is especially useful for (near) real-time systems. You can read more about it here.
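A small sketch of both mechanisms (the thresholds are arbitrary; note that the full-GC notification API requires concurrent GC to be disabled in the application's config):

    using System;
    using System.Runtime;

    class GcControl
    {
        static void Main()
        {
            // 1. Be warned when a full (gen 2) collection is approaching;
            //    WaitForFullGCApproach blocks, so real code uses a dedicated thread.
            GC.RegisterForFullGCNotification(maxGenerationThreshold: 10,
                                             largeObjectHeapThreshold: 10);
            if (GC.WaitForFullGCApproach() == GCNotificationStatus.Succeeded)
            {
                // e.g. divert work elsewhere, then allow the collection to proceed.
            }

            // 2. Suppress blocking collections during a time-critical region.
            GCLatencyMode old = GCSettings.LatencyMode;
            try
            {
                GCSettings.LatencyMode = GCLatencyMode.LowLatency;
                // ... latency-sensitive work ...
            }
            finally
            {
                GCSettings.LatencyMode = old;   // always restore
            }
        }
    }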
No, there are none. From my experience, .NET can't be used to create real-time systems, for many reasons, not only garbage collection. C or C++ are a better choice. Also, modern OSes do not provide deterministic scheduling, and that affects all applications, regardless of language.
You would have to control the GC yourself in order to get predictable real-time behaviour, but if you are doing this then you may as well not use a managed language.
For real-time systems you need control over everything that is running. There are third-party modifications to Windows XP that make it real-time (can't remember if it's soft or hard real-time though).
A completely unfeasible option. Look into Cosmos OS - written in C# and compiled to assembler, I think - you might be able to do something with that :)