Possible to control the JIT-Compiler [duplicate]


I'm writing a DSP application in C# (basically a multitrack editor). I've been profiling it for quite some time on different machines and I've noticed some 'curious' things.
On my home machine, the first run of the playback loop takes up about 50%-60% of the available time (I assume it's due to the JIT doing its job), then for the subsequent loops it drops to a steady 5% consumption. The problem is that if I run the application on a slower computer, the first run takes up more than the available time, causing the playback to get interrupted and garbling the output audio, which is unacceptable. After that, it goes down to 8%-10% consumption.
Even after the first run, the application keeps calling some time-consuming routines from time to time (roughly every 2 seconds), which causes the steady 5% consumption to experience very short peaks of 20%-25%. I've noticed that if I let the application run for a while, these peaks also drop to 7%-10%. (I'm not sure whether that's due to the JIT recompiling these portions of code.)
So, I have a serious problem with the JIT. While the application will behave nicely even on very slow machines, these 'compiling storms' are going to be a big problem. I'm trying to figure out how to resolve this issue, and I've come up with an idea: mark all the 'sensitive' routines with an attribute that tells the application to 'squeeze' them beforehand during start-up, so they'll be fully optimized when they're really needed. But this is only an idea (and I don't like it much either), and I wonder if there's a better solution to the whole problem.
I'd like to hear what you guys think.
(NGEN-ing the application is not an option; I like and want all the JIT optimizations I can get.)
EDIT:
Memory consumption and garbage collections kicking in are not an issue: I'm using object pools, and the maximum memory peak during playback is 304 KB.

You can trigger the JIT compiler to compile your entire set of assemblies during your application's initialization routine using the PrepareMethod ... method (without having to use NGen).
This solution is described in more detail here: Forcing JIT Compilation During Runtime.
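A minimal sketch of that approach, assuming the hot DSP routines live in ordinary (non-generic) types. It walks an assembly with reflection and calls `RuntimeHelpers.PrepareMethod` on each method that has an IL body. `WarmUpAttribute` is a hypothetical marker along the lines of the attribute idea in the question, used to restrict the sweep to the sensitive routines. Constructors and methods of open generic types are skipped, and on runtimes with tiered compilation the code produced this way may still be recompiled later, so treat this as a starting point rather than a guarantee.

```csharp
using System;
using System.Reflection;
using System.Runtime.CompilerServices;

// Hypothetical marker for routines that must be compiled before playback starts.
[AttributeUsage(AttributeTargets.Method)]
public sealed class WarmUpAttribute : Attribute { }

public static class JitWarmup
{
    private const BindingFlags AllDeclared =
        BindingFlags.DeclaredOnly | BindingFlags.Public | BindingFlags.NonPublic |
        BindingFlags.Instance | BindingFlags.Static;

    // Forces JIT compilation of methods in `assembly` and returns how many were prepared.
    // When onlyMarked is true, only methods carrying [WarmUp] are compiled.
    public static int Prepare(Assembly assembly, bool onlyMarked)
    {
        int prepared = 0;
        foreach (Type type in assembly.GetTypes())
        {
            // Open generic types need concrete type arguments before they can be compiled.
            if (type.ContainsGenericParameters) continue;

            foreach (MethodInfo method in type.GetMethods(AllDeclared))
            {
                // Skip abstract, generic, and body-less (extern or runtime-implemented) methods.
                if (method.IsAbstract || method.ContainsGenericParameters) continue;
                if (method.GetMethodBody() == null) continue;
                if (onlyMarked && !method.IsDefined(typeof(WarmUpAttribute), false)) continue;

                RuntimeHelpers.PrepareMethod(method.MethodHandle);
                prepared++;
            }
        }
        return prepared;
    }
}

// Example of a marked routine (hypothetical).
public static class SampleDsp
{
    [WarmUp]
    public static float ApplyGain(float sample, float gain) => sample * gain;
}
```

Calling `JitWarmup.Prepare(typeof(SampleDsp).Assembly, onlyMarked: true)` from the start-up routine compiles only the marked methods; passing `false` sweeps the whole assembly as the answer describes.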

The initial speed indeed sounds like Fusion+JIT, which would be helped by ILMerge (for Fusion) and NGEN (for JIT); you could always play a silent track through the system at startup so that this does all the hard work without the user noticing any distortion?
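The silent-track suggestion can be sketched as a warm-up pass that pushes a few buffers of silence through the processing chain before real playback starts. `IAudioProcessor`, `GainProcessor`, and `AudioWarmup` are hypothetical stand-ins for the application's own mixer and effects; this sketch only exercises the processing code and does not send anything to the audio device.

```csharp
using System;

// Hypothetical stage in the playback chain (mixer, effect, resampler...).
public interface IAudioProcessor
{
    void Process(float[] buffer);
}

// Hypothetical example stage: scales every sample by a fixed gain.
public sealed class GainProcessor : IAudioProcessor
{
    private readonly float _gain;
    public GainProcessor(float gain) { _gain = gain; }

    public void Process(float[] buffer)
    {
        for (int i = 0; i < buffer.Length; i++)
            buffer[i] *= _gain;
    }
}

public static class AudioWarmup
{
    // Runs `passes` buffers of silence through every stage so that the JIT compiles
    // the playback path at start-up instead of during the first real playback loop.
    public static float[] Run(IAudioProcessor[] chain, int bufferSize, int passes)
    {
        var buffer = new float[bufferSize];
        for (int pass = 0; pass < passes; pass++)
        {
            Array.Clear(buffer, 0, buffer.Length);
            foreach (IAudioProcessor stage in chain)
                stage.Process(buffer);
        }
        return buffer;
    }
}
```

Because the input is silence, the output stays silent, so routing these buffers to the real output as well (as the answer suggests) should not produce audible artifacts.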
NGEN is a good option; is there a reason you can't use it?
The issues you mention after the initial load do not sound like they are related to JIT. Perhaps garbage collection.
Have you tried profiling? Both CPU and memory (collections)?

As Marc mentioned, the ongoing spikes do not sound like JIT issues. Other things to look for:
Garbage collection: are you allocating memory during your audio processing? If you're creating a lot of garbage, or even objects that survive a Gen 0 collection, this might cause noticeable spikes. It sounds like you are doing some kind of pre-allocation, but watch out for hidden allocations in library code (even a foreach loop can allocate!).
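A small sketch of the hidden `foreach` allocation, assuming .NET Core 3.0 or later for `GC.GetAllocatedBytesForCurrentThread`. Iterating a `List<float>` directly binds to its struct enumerator and allocates nothing, while iterating the same list through `IEnumerable<float>` typically boxes the enumerator on every loop. Newer JITs can sometimes optimize that allocation away, so measure on the runtime you actually ship.

```csharp
using System;
using System.Collections.Generic;

public static class ForeachAllocation
{
    // Binds to List<float>.Enumerator, a struct: no heap allocation per loop.
    public static float SumList(List<float> samples)
    {
        float sum = 0f;
        foreach (float s in samples) sum += s;
        return sum;
    }

    // Goes through IEnumerable<float>.GetEnumerator(), which boxes the struct enumerator.
    public static float SumEnumerable(IEnumerable<float> samples)
    {
        float sum = 0f;
        foreach (float s in samples) sum += s;
        return sum;
    }

    // Bytes allocated on this thread by one SumList call (after a warm-up call,
    // so JIT work is not counted).
    public static long BytesForList(List<float> samples)
    {
        SumList(samples);
        long before = GC.GetAllocatedBytesForCurrentThread();
        SumList(samples);
        return GC.GetAllocatedBytesForCurrentThread() - before;
    }

    // Bytes allocated on this thread by one SumEnumerable call (after a warm-up call).
    public static long BytesForEnumerable(List<float> samples)
    {
        SumEnumerable(samples);
        long before = GC.GetAllocatedBytesForCurrentThread();
        SumEnumerable(samples);
        return GC.GetAllocatedBytesForCurrentThread() - before;
    }
}
```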
Denormals: certain processors slow down dramatically when dealing with very small (denormal) floating-point numbers, which can cause CPU spikes. See http://www.musicdsp.org/files/denormal.pdf for details.
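One common workaround in DSP code is to flush tiny values to zero inside feedback loops, so a decaying tail never reaches the denormal range. A minimal sketch, assuming a hypothetical one-pole low-pass filter and a threshold of 1e-15 chosen for illustration:

```csharp
using System;

// One-pole low-pass filter: y[n] = (1 - a) * x[n] + a * y[n-1].
public sealed class OnePoleLowPass
{
    // Far above the smallest normal float (about 1.18e-38), far below audible levels.
    private const float DenormalThreshold = 1e-15f;

    private readonly float _a;
    private float _state;

    public OnePoleLowPass(float a) { _a = a; }

    public float State => _state;

    public float Process(float input)
    {
        float y = (1f - _a) * input + _a * _state;
        // Without this, the feedback tail decays geometrically and eventually drifts
        // into the denormal range, where some CPUs process floats far more slowly.
        if (Math.Abs(y) < DenormalThreshold) y = 0f;
        _state = y;
        return y;
    }
}
```

After an impulse followed by silence, the state snaps to exactly zero once it falls below the threshold instead of lingering as a denormal.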
Edit:
Even if you don't want to use NGen, at least compare against an NGen'd version so you can see what difference JIT compilation makes.

If you believe you are being impacted by JIT, then precompile your app with NGEN and run the tests again. There is no JIT overhead in code that has been compiled by NGEN. If you still see spikes in the NGEN'd app, then you know they are not caused by JIT.

Related

Is .NET code post JIT being executed the same as native code

.NET languages all compile to an intermediate language (MSIL).
As far as I know, during execution (and sometimes at other stages, such as NGEN, which I'm not fully knowledgeable about), code is JIT-compiled (compiled from MSIL into actual machine code).
I am wondering if after JITting the code there are performance "penalties" coming from the fact that the code is executing on the CLR, or whether the code behaves "the same" as any other native code?

There are a number of performance differences:
The free store for managed objects is allocated like a stack (a pointer is simply bumped forward), not like a free-list heap (except for the Large Object Heap), so allocation has lower overhead than with most native allocators. But then you pay for garbage collection and compaction later.
The JIT can inline some calls that an AOT compiler would have to leave as out-of-line calls (e.g. calls into other assemblies). But the AOT compiler can spend more time looking for optimization opportunities.
Theoretically, the JIT can use advanced instructions present on the particular CPU running the code (e.g. AVX). Still waiting for a JIT that actually makes good use of them, though.
AOT compilers can use profiling data to control layout of code memory. JIT compilers almost always emit functions into memory in the order they were compiled.
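The first point refers to bump-pointer allocation: a new object is placed at the current end of the allocation region and the pointer moves forward, so allocating costs little more than an addition, and the price is paid later when the GC has to find live objects and compact them. A toy model of just the allocation side, using a hypothetical `BumpArena` over a byte array (not the CLR's actual allocator):

```csharp
using System;

// Toy model of bump-pointer allocation; only the offsets matter here.
public sealed class BumpArena
{
    private readonly byte[] _memory;
    private int _next; // offset of the first free byte

    public BumpArena(int capacity) { _memory = new byte[capacity]; }

    public int Used => _next;

    // Returns the 8-byte-aligned offset of a new block, or -1 when the arena is full
    // (the point at which a real GC would collect and compact).
    public int Allocate(int size)
    {
        if (size < 0) throw new ArgumentOutOfRangeException(nameof(size));
        int start = (_next + 7) & ~7;
        if (size > _memory.Length - start) return -1;
        _next = start + size;
        return start;
    }
}
```

There is no search for a free block of the right size, which is the step a free-list allocator has to perform.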

The main performance penalty to JITed code is the time taken to compile the code when it's first run. That usually only exhibits itself as a (slightly, perhaps imperceptibly) longer startup time, though it can be a real hit if you're using it in a scenario like CGI where a new process is spawned to handle every request. Not that a CGI script written in .NET is a common use case, but it's the first example that popped into my head so I'm going to run with it.
NGen can improve your startup time by skipping the JIT step. The benefit is going to be biggest in a short-running program that gets run frequently, like a CGI script. (Or perhaps a Windows service that's set to start automatically is a better example, now that I think of it.) For programs that run infrequently, the executable is unlikely to be cached in memory so it's probably going to have to be loaded from the disk each time. The time it takes to read from disk is likely to dominate startup time and overwhelm NGen's benefits. And for programs that run for a long time, startup time probably isn't a significant performance characteristic.

Typically, natively compiled code will be more performant, since you can use architecture-specific optimizations that require knowing the target at design time, for example SSE. There are also other things to consider for the performance of one language or another, for example garbage collection vs. manual memory management.
As far as NGEN goes, I don't think there will be much of a performance difference between JIT-compiled code and what NGEN produces, other than that JIT-compiled code is generated at runtime (which incurs a performance penalty when the actual compilation happens).

will C# compiler for big codebase run dramatically faster on machine with huge RAM?

I have seen some real slow build times in a big legacy codebase without proper assembly decomposition, running on a 2G RAM machine. So, if I wanted to speed it up without code overhaul, would a 16G (or some other such huge number) RAM machine be radically faster, if the fairy IT department were to provide one? In other words, is RAM the major bottleneck for sufficiently large dotnet projects or are there other dominant issues?
Any input about similar situation for building Java is also appreciated, just out of pure curiosity.

Performance does not improve with additional RAM once you have more RAM than the application uses. You are likely not to see any more improvement by using 128GB of RAM.
We cannot guess the amount needed. Measure it by looking at Task Manager.

It certainly won't do you any harm...
2G is pretty small for a dev machine, I use 16G as a matter of course.
However, build times are going to be gated by file access sooner or later, so whilst you might get a little improvement I suspect you won't be blown away by it. ([EDIT] as a commenter says, compilation is likely to be CPU bound too).
Have you looked into parallel builds? (See, for example, this SO question: Visual Studio 2010, how to build projects in parallel on multicore.)
Or, can you restructure your code base and move some less frequently updated assemblies into a separate .sln, then reference them as DLLs? (This isn't a great idea in all cases, but sometimes it can be expedient.) From your description of the problem, I'm guessing this is easier said than done, but this is how we've achieved good results in our code base.

The whole RAM issue is really one of ROI (Return on Investment). The more RAM you add to a system, the less likely the application is to have to search for a memory location large enough to store an object of a particular size, and the faster it'll go; however, beyond a certain point it becomes so unlikely that the system will fail to find a large enough location that going any higher is pointless. (Note that the read/write speed of the RAM sticks plays a role in this as well.)
In summary: with 2 GB of RAM, you should definitely upgrade to something more like 8 GB or the suggested 16 GB; going beyond that would be almost pointless, though, because the processor will then become the bottleneck.
Also, it's probably a good idea to check the speed of the RAM, because the RAM itself can become a bottleneck: it can only run at its rated clock speed (XXXX MHz) at most. Generally, though, 1600 MHz is fine.

C# Production Server, Do I collect the garbage?

I know there's tons of threads about this. And I read a few of them.
I'm wondering whether, in my case, it is correct to call GC.Collect().
I have a server for an MMORPG; in production it is online day and night. The server is restarted every other day to deploy changes to the production codebase. Every twenty minutes the server pauses all other threads and serializes the current game state. This usually takes 0.5 to 4 seconds.
Would it be a good idea to call GC.Collect() after serialization?
The server is, obviously, constantly creating and destroying game items.
Would I get a noticeable gain in performance or memory usage?
Should I not collect manually?
I've read about how collecting can be bad if done at the wrong moments or too frequently, but I think these saves are both a good moment to collect and not that frequent.
The server runs on .NET Framework 4.0.
Update in answer to a comment:
We are randomly experiencing server freezes: sometimes, unexpectedly, the server's memory usage rises steadily until it reaches a point where the server takes far too long to handle any network operation. So I'm considering a lot of different approaches to solve the issue, and this is one of them.

The garbage collector knows best when to run, and you shouldn't force it.
It will not improve performance or memory usage. The CLR triggers a collection of objects that are no longer used whenever there is a need to.
Answer to an updated part:
Forcing a collection is not a good solution to the problem. You should rather look deeper into your code to find out what is wrong. If memory usage grows unexpectedly, you might have an issue with unmanaged resources that are not properly handled, or even "leaky" code within managed code.
One more thing: I would be surprised if calling GC.Collect fixed the problem.

"Every twenty minutes the server pauses all other threads, and serializes the current game state. This usually takes 0.5 to 4 seconds"
If all your threads are suspended already anyway you might as well call the garbage collection, since it should be fairly fast at this point. I suspect doing this will only mask your real problem though, not actually solve it.
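If you do try it, a minimal sketch, assuming a hypothetical hook that runs while the game threads are already paused for the save. Measuring how much memory a full collection actually frees is arguably more useful than the collection itself: if the number stays small while the process keeps growing, the memory is still referenced, which points at a leak rather than at uncollected garbage.

```csharp
using System;

public static class SaveCycle
{
    // Hypothetical hook, called right after serialization while other threads are paused.
    // Returns the number of bytes a full blocking collection freed.
    public static long CollectAndMeasure()
    {
        long before = GC.GetTotalMemory(forceFullCollection: false);
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect(); // reclaim objects whose finalizers just ran
        long after = GC.GetTotalMemory(forceFullCollection: false);
        return before - after;
    }
}
```

Logging the returned value alongside the process's working set after each save gives a cheap trend line for telling the two cases apart.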
"We are randomly experiencing server freezes, sometimes, unexpectedly, the server memory usage will raise increasingly until it reaches a point when the server takes way too long to handle any network operation. Thus, I'm considering a lot of different approaches to solve the issue, this is one of them."
This sounds more like you actually are still referencing all these objects that use the memory - if you weren't the GC would run due to the memory pressure and try to release those objects. You might be looking at an actual bug in your production code (i.e. objects that are still subscribed to events or otherwise are being referenced when they shouldn't be) rather than something you can fix by manually taking out the garbage.
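A minimal sketch of the event-subscription leak described above, with hypothetical `GameWorld` and `GameItem` types: as long as the long-lived world holds a delegate pointing at an item, the item stays reachable and `GC.Collect` cannot free it. Unsubscribing (here in `Dispose`) removes that reference.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical long-lived publisher, e.g. the game world for the whole server session.
public sealed class GameWorld
{
    public event EventHandler Tick;
    public void RaiseTick() => Tick?.Invoke(this, EventArgs.Empty);
}

// Hypothetical short-lived subscriber, e.g. an item dropped on the ground.
public sealed class GameItem : IDisposable
{
    private readonly GameWorld _world;

    public GameItem(GameWorld world)
    {
        _world = world;
        _world.Tick += OnTick; // the world now holds a delegate that references this item
    }

    private void OnTick(object sender, EventArgs e) { /* per-tick update */ }

    // Removing the handler drops the world's reference, so the item can be collected.
    public void Dispose() => _world.Tick -= OnTick;
}

public static class LeakCheck
{
    // Creates the item in a separate, non-inlined frame so no local keeps it alive.
    [MethodImpl(MethodImplOptions.NoInlining)]
    private static WeakReference CreateItem(GameWorld world, bool dispose)
    {
        var item = new GameItem(world);
        if (dispose) item.Dispose();
        return new WeakReference(item);
    }

    // Returns true if the item survives a full collection while the world is still alive.
    public static bool ItemSurvivesCollection(bool dispose)
    {
        var world = new GameWorld();
        WeakReference weak = CreateItem(world, dispose);
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();
        bool alive = weak.IsAlive;
        GC.KeepAlive(world); // the world stays alive, as it would on a running server
        return alive;
    }
}
```

Without `Dispose`, the item survives no matter how often the GC runs; with it, a single full collection reclaims it.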
If possible in this scenario you should run a performance analysis to see where your bottlenecks are and what part of your code is causing the brunt of the memory allocations.

Could the memory increase be an "attack" by a player with a fake/modified game-client? Is a lot of memory allocated by the server when it accepts a new client connection? Does the server handle bogus incoming data well?
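One way to harden the server against bogus incoming data is to validate a message's declared length before allocating anything for it. A minimal sketch, assuming a hypothetical wire format with a 4-byte little-endian length prefix and a hypothetical 64 KB limit; it works on plain `byte[]`, so it also fits a .NET Framework 4.0 server:

```csharp
public static class PacketGuard
{
    // Hypothetical upper bound for any legitimate client message.
    public const int MaxMessageBytes = 64 * 1024;

    // Decodes a 4-byte little-endian length prefix at `offset` and rejects impossible
    // values before the server allocates a buffer of that size.
    public static bool TryReadLength(byte[] buffer, int offset, out int length)
    {
        length = 0;
        if (buffer == null || offset < 0 || buffer.Length - offset < 4) return false;

        int declared = buffer[offset]
                     | buffer[offset + 1] << 8
                     | buffer[offset + 2] << 16
                     | buffer[offset + 3] << 24;

        // Negative or oversized lengths come from a broken or hostile client.
        if (declared < 0 || declared > MaxMessageBytes) return false;
        length = declared;
        return true;
    }
}
```

A rejected length would typically lead to dropping the connection rather than trying to resynchronize the stream.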

Benefits of 'Optimize code' option in Visual Studio build

Much of our C# release code is built with the 'Optimize code' option turned off. I believe this is to allow code built in Release mode to be debugged more easily.
Given that we are creating fairly simple desktop software that connects to backend web services (i.e. not a particularly processor-intensive application), what performance hit, if any, might be expected?
And is any particular platform likely to be worse affected, e.g. multi-processor or 64-bit?

You are the only person who can answer the "performance hit" question. Try it both ways, measure the performance, and see what happens. The hit could be enormous or it could be nonexistent; no one reading this knows whether "enormous" to you means one microsecond or twenty minutes.
If you're interested in what optimizations are done by the C# compiler -- rather than the jitter -- when the optimize switch is on, see:
http://blogs.msdn.com/ericlippert/archive/2009/06/11/what-does-the-optimize-switch-do.aspx

The full details are available at http://blogs.msdn.com/jaybaz_ms/archive/2004/06/28/168314.aspx.
In brief...
In managed code, the JITter in the runtime does nearly all the optimization. The difference in generated IL from this flag is pretty small.

In fact, there is a difference, sometimes quite significant. What can really affect the performance (as it is something that JIT does not fully take care of):
Unnecessary local variables (i.e., bigger stack frames for each call)
Overly generic conditional instructions, which the JIT translates in quite a straightforward manner.
Unnecessary branching (also not served well by a JIT - after all, it does not have too much time to do all the smart optimisations)
So, if you're doing something numerical - turn on the optimisation. Otherwise you won't see any difference at all.

The optimizations done by the compiler are fairly low level and shouldn't affect your users' experience.
If you'd like to quantify the optimization on your application, simply profile a non-optimized and an optimized build and compare the results.
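Before comparing the two builds, it can help to confirm which one you are actually profiling. When the C# compiler builds with optimizations off, it also marks the assembly with a `DebuggableAttribute` that tells the JIT not to optimize, so the switch affects the generated machine code as well as the IL. A small sketch that reads that flag back at run time (the `BuildInfo` helper is hypothetical):

```csharp
using System.Diagnostics;
using System.Reflection;

public static class BuildInfo
{
    // True when the assembly asks the JIT to disable optimizations
    // (typical of builds made with 'Optimize code' turned off).
    public static bool IsJitOptimizerDisabled(Assembly assembly)
    {
        var attribute = assembly.GetCustomAttribute<DebuggableAttribute>();
        return attribute != null && attribute.IsJITOptimizerDisabled;
    }
}
```

For example, logging `BuildInfo.IsJitOptimizerDisabled(Assembly.GetExecutingAssembly())` at start-up makes it obvious in the profiling output which build was measured.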

I find that with complex, CPU-intensive code (the code I'm using is a Monte Carlo simulation that can spawn enough threads to utilize a computer 100%; this was tested in a 36-core environment), the performance hit can be up to 4 times! A simulation that takes 2 hours will take about 9 hours without the optimization flag. (There are about 500,000 paths, each with 500 steps, for around 2000 different objects with highly complex calculations on each object.)

Largest Heap used in a managed environment? (.net/java)

What is the largest heap you have personally used in a managed environment such as Java or .NET? What performance issues did you run into, and did you see diminishing returns as the heap grew?

I work on a 64-bit .Net system that typically uses 9-12 GB, and sometimes as much as 20GB. I have not seen any performance problems even while garbage collecting, and I have been looking hard as I was not expecting it to work so well.
An earlier version hung on to some objects for too long resulting in occasional GCs that freed up 3GB+. Even then, there was no noticeable impact on performance. The system is running on a 16-core server with 32GB RAM, which probably helps...

In .NET on 32-bit Windows, you can only really get to about 1.4 GB of memory usage before things start getting really screwy (out-of-memory exceptions). This is due to a limitation in 32-bit Windows that prevents a single process from using more than 2 GB of address space by default. There is a /3GB switch you can put in your boot.ini, but that will only get you a little bit further. If you want to use lots of memory, you should seriously consider running on a 64-bit version of Windows.

I currently have a production application with 6 GB of memory. You'll need a 64-bit box as well for the JVM to be able to address that much.
The garbage collector is really the only thing (that I've found so far) where performance degrades with size, and then only if you manually kick off a System.gc(), which forces the JVM to bring everything to a screeching halt as it traverses 6 GB worth of objects. It takes a good 20 seconds, too. The default GC behavior does not do this, BTW; you have to be dumb enough to make it do that. Also worth researching JVM tuning at this size.
You can also find things like distributed and clustered JVMs, sorry, don't have any good references as I didn't look into this option too closely, although I did find references to larger installations.

I am unsure what you mean by heap, but if you mean memory used, I have used quite a bit, 2GB+. I have a web app that does image processing and it requires loading 2 large scan files into memory to do analysis.
There were performance issues. Windows would swap out lots of RAM, and that would create a lot of page faults. There was never any need for more than 2 images at a time, as all requests were against those images (I only allowed 1 session per image set at a time).
For instance, to setup the files for initial viewing would take about 5 seconds. Doing simple analysis and zooming would be fairly fast once in memory, in the order of .1 to .5 seconds.
I still had to optimize, so I ended up pre-parsing the files, chopping them into smaller pieces, and working only with the pieces the user needed at the time.
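A minimal sketch of the "work only with the pieces the user needs" idea, assuming a hypothetical scheme in which each scan has been pre-split into square tiles of `tileSize` pixels. Given the viewport in image coordinates, it returns the inclusive range of tile columns and rows to load; clamping to the image bounds is left out for brevity.

```csharp
using System;

public static class TileMath
{
    // Inclusive range of tile columns/rows that intersect the viewport (x, y, width, height).
    public static (int FirstCol, int LastCol, int FirstRow, int LastRow) VisibleTiles(
        int x, int y, int width, int height, int tileSize)
    {
        if (x < 0 || y < 0 || width <= 0 || height <= 0 || tileSize <= 0)
            throw new ArgumentException("Viewport must be non-negative and non-empty; tile size must be positive.");

        return (x / tileSize,
                (x + width - 1) / tileSize,   // last pixel column inside the viewport
                y / tileSize,
                (y + height - 1) / tileSize); // last pixel row inside the viewport
    }
}
```

Only the tiles in that range need to be in memory, so the working set scales with the viewport rather than with the size of the scan.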

I have used from 2GB to 5GB of memory in java, but usually when I get to more than 2GB I really start thinking about memory optimization. Diminishing returns can vary from not optimizing when it's necessary because you have a lot of memory, to not having memory available for the OS/Disk caches (which can help your application overall).
For Java, I recommend watching your memory usage per generation over time. Do you create a lot of temporary objects or have long-lasting objects that consume a lot of memory? A lot of optimization of memory can be done when knowing those things.
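The same per-generation view is available in .NET through `GC.CollectionCount`. A minimal sketch (the `GcSampler` helper is hypothetical) that reports how many collections of each generation happened while a piece of work ran; a generation-2 count that keeps climbing under normal load is a hint that objects are living longer than intended.

```csharp
using System;

public static class GcSampler
{
    // Number of collections each generation has seen so far in this process.
    public static int[] CollectionCounts()
    {
        var counts = new int[GC.MaxGeneration + 1];
        for (int gen = 0; gen <= GC.MaxGeneration; gen++)
            counts[gen] = GC.CollectionCount(gen);
        return counts;
    }

    // Collections per generation that happened while `work` ran.
    public static int[] CollectionsDuring(Action work)
    {
        int[] before = CollectionCounts();
        work();
        int[] after = CollectionCounts();
        var delta = new int[before.Length];
        for (int i = 0; i < delta.Length; i++)
            delta[i] = after[i] - before[i];
        return delta;
    }
}
```

Sampling `CollectionCounts()` periodically and plotting the differences gives the "usage per generation over time" picture described above.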
