Is C# really slower than say C++?

Is C# really slower than say C++? - c#

I've been wondering about this issue for a while now.
Of course there are things in C# that aren't optimized for speed, so using those objects or language tweaks (like LinQ) may cause the code to be slower.
But if you don't use any of those tweaks, but just compare the same pieces of code in C# and C++ (It's easy to translate one to another). Will it really be that much slower ?
I've seen comparisons that show that C# might be even faster in some cases, because in theory the JIT compiler should optimize the code in real time and get better results:
Managed Or Unmanaged?
We should remember that the JIT compiler compiles the code at real time, but that's a 1-time overhead, the same code (once reached and compiled) doesn't need to be compiled again at run time.
The GC doesn't add a lot of overhead either, unless you create and destroy thousands of objects (like using String instead of StringBuilder). And doing that in C++ would also be costly.
Another point that I want to bring up is the better communication between DLLs introduced in .Net. The .Net platform communicates much better than Managed COM based DLLs.
I don't see any inherent reason why the language should be slower, and I don't really think that C# is slower than C++ (both from experience and lack of a good explanation)..
So, will a piece of the same code written in C# will be slower than the same code in C++ ?
In if so, then WHY ?
Some other reference (Which talk about that a bit, but with no explanation about WHY):
Why would you want to use C# if its slower than C++?

Warning: The question you've asked is really pretty complex -- probably much more so than you realize. As a result, this is a really long answer.
From a purely theoretical viewpoint, there's probably a simple answer to this: there's (probably) nothing about C# that truly prevents it from being as fast as C++. Despite the theory, however, there are some practical reasons that it is slower at some things under some circumstances.
I'll consider three basic areas of differences: language features, virtual machine execution, and garbage collection. The latter two often go together, but can be independent, so I'll look at them separately.
Language Features
C++ places a great deal of emphasis on templates, and features in the template system that are largely intended to allow as much as possible to be done at compile time, so from the viewpoint of the program, they're "static." Template meta-programming allows completely arbitrary computations to be carried out at compile time (I.e., the template system is Turing complete). As such, essentially anything that doesn't depend on input from the user can be computed at compile time, so at runtime it's simply a constant. Input to this can, however, include things like type information, so a great deal of what you'd do via reflection at runtime in C# is normally done at compile time via template metaprogramming in C++. There is definitely a trade-off between runtime speed and versatility though -- what templates can do, they do statically, but they simply can't do everything reflection can.
The differences in language features mean that almost any attempt at comparing the two languages simply by transliterating some C# into C++ (or vice versa) is likely to produce results somewhere between meaningless and misleading (and the same would be true for most other pairs of languages as well). The simple fact is that for anything larger than a couple lines of code or so, almost nobody is at all likely to use the languages the same way (or close enough to the same way) that such a comparison tells you anything about how those languages work in real life.
Virtual Machine
Like almost any reasonably modern VM, Microsoft's for .NET can and will do JIT (aka "dynamic") compilation. This represents a number of trade-offs though.
Primarily, optimizing code (like most other optimization problems) is largely an NP-complete problem. For anything but a truly trivial/toy program, you're pretty nearly guaranteed you won't truly "optimize" the result (i.e., you won't find the true optimum) -- the optimizer will simply make the code better than it was previously. Quite a few optimizations that are well known, however, take a substantial amount of time (and, often, memory) to execute. With a JIT compiler, the user is waiting while the compiler runs. Most of the more expensive optimization techniques are ruled out. Static compilation has two advantages: first of all, if it's slow (e.g., building a large system) it's typically carried out on a server, and nobody spends time waiting for it. Second, an executable can be generated once, and used many times by many people. The first minimizes the cost of optimization; the second amortizes the much smaller cost over a much larger number of executions.
As mentioned in the original question (and many other web sites) JIT compilation does have the possibility of greater awareness of the target environment, which should (at least theoretically) offset this advantage. There's no question that this factor can offset at least part of the disadvantage of static compilation. For a few rather specific types of code and target environments, it can even outweigh the advantages of static compilation, sometimes fairly dramatically. At least in my testing and experience, however, this is fairly unusual. Target dependent optimizations mostly seem to either make fairly small differences, or can only be applied (automatically, anyway) to fairly specific types of problems. Obvious times this would happen would be if you were running a relatively old program on a modern machine. An old program written in C++ would probably have been compiled to 32-bit code, and would continue to use 32-bit code even on a modern 64-bit processor. A program written in C# would have been compiled to byte code, which the VM would then compile to 64-bit machine code. If this program derived a substantial benefit from running as 64-bit code, that could give a substantial advantage. For a short time when 64-bit processors were fairly new, this happened a fair amount. Recent code that's likely to benefit from a 64-bit processor will usually be available compiled statically into 64-bit code though.
Using a VM also has a possibility of improving cache usage. Instructions for a VM are often more compact than native machine instructions. More of them can fit into a given amount of cache memory, so you stand a better chance of any given code being in cache when needed. This can help keep interpreted execution of VM code more competitive (in terms of speed) than most people would initially expect -- you can execute a lot of instructions on a modern CPU in the time taken by one cache miss.
It's also worth mentioning that this factor isn't necessarily different between the two at all. There's nothing preventing (for example) a C++ compiler from producing output intended to run on a virtual machine (with or without JIT). In fact, Microsoft's C++/CLI is nearly that -- an (almost) conforming C++ compiler (albeit, with a lot of extensions) that produces output intended to run on a virtual machine.
The reverse is also true: Microsoft now has .NET Native, which compiles C# (or VB.NET) code to a native executable. This gives performance that's generally much more like C++, but retains the features of C#/VB (e.g., C# compiled to native code still supports reflection). If you have performance intensive C# code, this may be helpful.
Garbage Collection
From what I've seen, I'd say garbage collection is the poorest-understood of these three factors. Just for an obvious example, the question here mentions: "GC doesn't add a lot of overhead either, unless you create and destroy thousands of objects [...]". In reality, if you create and destroy thousands of objects, the overhead from garbage collection will generally be fairly low. .NET uses a generational scavenger, which is a variety of copying collector. The garbage collector works by starting from "places" (e.g., registers and execution stack) that pointers/references are known to be accessible. It then "chases" those pointers to objects that have been allocated on the heap. It examines those objects for further pointers/references, until it has followed all of them to the ends of any chains, and found all the objects that are (at least potentially) accessible. In the next step, it takes all of the objects that are (or at least might be) in use, and compacts the heap by copying all of them into a contiguous chunk at one end of the memory being managed in the heap. The rest of the memory is then free (modulo finalizers having to be run, but at least in well-written code, they're rare enough that I'll ignore them for the moment).
What this means is that if you create and destroy lots of objects, garbage collection adds very little overhead. The time taken by a garbage collection cycle depends almost entirely on the number of objects that have been created but not destroyed. The primary consequence of creating and destroying objects in a hurry is simply that the GC has to run more often, but each cycle will still be fast. If you create objects and don't destroy them, the GC will run more often and each cycle will be substantially slower as it spends more time chasing pointers to potentially-live objects, and it spends more time copying objects that are still in use.
To combat this, generational scavenging works on the assumption that objects that have remained "alive" for quite a while are likely to continue remaining alive for quite a while longer. Based on this, it has a system where objects that survive some number of garbage collection cycles get "tenured", and the garbage collector starts to simply assume they're still in use, so instead of copying them at every cycle, it simply leaves them alone. This is a valid assumption often enough that generational scavenging typically has considerably lower overhead than most other forms of GC.
"Manual" memory management is often just as poorly understood. Just for one example, many attempts at comparison assume that all manual memory management follows one specific model as well (e.g., best-fit allocation). This is often little (if any) closer to reality than many peoples' beliefs about garbage collection (e.g., the widespread assumption that it's normally done using reference counting).
Given the variety of strategies for both garbage collection and manual memory management, it's quite difficult to compare the two in terms of overall speed. Attempting to compare the speed of allocating and/or freeing memory (by itself) is pretty nearly guaranteed to produce results that are meaningless at best, and outright misleading at worst.
Bonus Topic: Benchmarks
Since quite a few blogs, web sites, magazine articles, etc., claim to provide "objective" evidence in one direction or another, I'll put in my two-cents worth on that subject as well.
Most of these benchmarks are a bit like teenagers deciding to race their cars, and whoever wins gets to keep both cars. The web sites differ in one crucial way though: they guy who's publishing the benchmark gets to drive both cars. By some strange chance, his car always wins, and everybody else has to settle for "trust me, I was really driving your car as fast as it would go."
It's easy to write a poor benchmark that produces results that mean next to nothing. Almost anybody with anywhere close to the skill necessary to design a benchmark that produces anything meaningful, also has the skill to produce one that will give the results he's decided he wants. In fact it's probably easier to write code to produce a specific result than code that will really produce meaningful results.
As my friend James Kanze put it, "never trust a benchmark you didn't falsify yourself."
Conclusion
There is no simple answer. I'm reasonably certain that I could flip a coin to choose the winner, then pick a number between (say) 1 and 20 for the percentage it would win by, and write some code that would look like a reasonable and fair benchmark, and produced that foregone conclusion (at least on some target processor--a different processor might change the percentage a bit).
As others have pointed out, for most code, speed is almost irrelevant. The corollary to that (which is much more often ignored) is that in the little code where speed does matter, it usually matters a lot. At least in my experience, for the code where it really does matter, C++ is almost always the winner. There are definitely factors that favor C#, but in practice they seem to be outweighed by factors that favor C++. You can certainly find benchmarks that will indicate the outcome of your choice, but when you write real code, you can almost always make it faster in C++ than in C#. It might (or might not) take more skill and/or effort to write, but it's virtually always possible.

Because you don't always need to use the (and I use this loosely) "fastest" language? I don't drive to work in a Ferrari just because it's faster...

Circa 2005 two MS performance experts from both sides of the native/managed fence tried to answer the same question. Their method and process are still fascinating and the conclusions still hold today - and I'm not aware of any better attempt to give an informed answer. They noted that a discussion of potential reasons for differences in performance is hypothetical and futile, and a true discussion must have some empirical basis for the real world impact of such differences.
So, the Old New Raymond Chen, and Rico Mariani set rules for a friendly competition. A Chinese/English dictionary was chosen as a toy application context: simple enough to be coded as a hobby side-project, yet complex enough to demonstrate non trivial data usage patterns. The rules started simple - Raymond coded a straightforward C++ implementation, Rico migrated it to C# line by line, with no sophistication whatsoever, and both implementations ran a benchmark. Afterwards, several iterations of optimizations ensued.
The full details are here: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14.
This dialogue of titans is exceptionally educational and I whole heartily recommend to dive in - but if you lack the time or patience, Jeff Atwood compiled the bottom lines beautifully:
Eventually, C++ was 2x faster - but initially, it was 13x slower.
As Rico sums up:
So am I ashamed by my crushing defeat? Hardly. The managed code
achieved a very good result for hardly any effort. To defeat the
managed version, Raymond had to:
Write his own file/io stuff
Write his own string class
Write his own allocator
Write his own international mapping
Of course he used available lower level libraries to do this,
but that's still a lot of work. Can you call what's left an STL
program? I don't think so.
That is my experience still, 11 years and who knows how many C#/C++ versions later.
That is no coincidence, of course, as these two languages spectacularly achieve their vastly different design goals. C# wants to be used where development cost is the main consideration (still the majority of software), and C++ shines where you'd save no expenses to squeeze every last ounce of performance out of your machine: games, algo-trading, data-centers, etc.

C++ always have an edge for the performance. With C#, I don't get to handle memory and I have literally tons of resources available for me to do my job.
What you need to question yourself is more about which one saves you time. Machines are incredibly powerful now and most of your code should be done in a language that allows you to get the most value in the least amount of time.
If there is a core processing that takes way too long in C#, you could then build a C++ and interop your way to it with C#.
Stop thinking about your code performance. Start building value.

C# is faster than C++. Faster to write. For execution times, nothing beats a profiler.
But C# does not have as much libraries as C++ can interface easily.
And C# depends heavily on windows...

BTW, time critical applications are not coded in C# or Java, primarily due to uncertainty of when the Garbage Collection will be performed.
In modern times, application or execution speed is not as important as was previously. Development schedules, correctness and robustness are higher priorities. A high speed version of an application is no good if it has lots of bugs, crashes a lot or worse, misses an opportunity to get to market or be deployed.
Since development schedules are a priority, new languages are coming out that speed up development. C# is one of these. C# also assists in correctness and robustness by removing features from C++ that cause common problems: one example is pointers.
The differences in execution speed of an application developed with C# and one developed using C++ is negligible on most platforms. This is due to the fact that the execution bottlenecks are not language dependent but usually depend on the operating system or I/O. For example if C++ performs a function in 5 ms but C# uses 2ms, and waiting for data takes 2 seconds, the time spent in the function is insignificant compared to the time waiting for data.
Choose a language that is best suited for the developers, platform and projects. Work towards the goals of correctness, robustness and deployment. The speed of an application should be treated as a bug: prioritize it, compare to other bugs, and fix as necessary.

A better way to look at it everything is slower than C/C++ because it abstracts away rather than following the stick and mud paradigm. It's called systems programming for a reason, you program against the grain or bare metal. Doing so also grants you speed you cannot achieve with other languages like C# or Java. But alas C roots are all about doing things the hard way, so your mostly going to be writing more code and spending more time debugging it.
C is also case sensitive, also objects in C++ also follow strict rule sets. Example a purple ice cream cone may not be the same as a blue ice cream cone, though they might be cones they may not necessarily belong to the cone family and if you forget to define what cone is you bug out. Thus the properties of ice cream may or may not be clones. Now the speed argument, C/C++ uses the stack and heap approach this is where bare metal gets it's metal.
With the boost library you can achieve incredible speeds unfortunately most game studios stick to the standard library. The other reason for this might be because software written in C/C++ tends to be massive in file size, as it's a giant collection of files instead of a single file. Also take note all operating systems are written in C so generally why must we ask the question what could be faster?!
Also caching is not faster than pure memory management, sorry but this just doesn't fan out. Memory is something physical, caching is something software does in order to gain a kick in performance. One could also reason that without physical memory caching would simply not exist. It doesn't void the fact memory must be managed at some level whether its automated or manual.

Why would you write a small application that doesn't require much in the way of optimization in C++, if there is a faster route(C#)?

Getting an exact answer to your question is not really possible unless you perform benchmarks on specific systems. However, it is still interesting to think about some fundamental differences between programming languages like C# and C++.
Compilation
Executing C# code requires an additional step where the code is JIT'ed. With regard to performance that will be in favor of C++. Also, the JIT compiler is only able to optimize the generated code within the unit of code that is JIT'ed (e.g. a method) while a C++ compiler can optimize across method calls using more aggressive techniques.
However, The JIT compiler is able to optimize the generated machine code to closely match the underlying hardware enabling it to take advantage of additional hardware features if they exist. To my knowledge the .NET JIT compiler doesn't do that but it would conceiveably be able to generate different code for Atom as opposed to Pentium CPU's.
Memory access
The garbage collected architecture can in many cases create more optimal memory access patterns than standard C++ code. If the memory area used for the first generation is small enough in can stay within the CPU cache increasing performance. If you create and destroy a lot of small objects the overhead of maintaing the managed heap may be smaller than what is required by the C++ runtime. Again, this is highly dependent on the application. A study Python of performance demonstrates that a specific managed Python application is able to scale much better than the compiled version as a result of more optimal memory access patterns.

Don't let confusing!
If a C# application is written in the best case and a C++ application is written in the best case, the C++ is faster.
Many reason is here about why C++ is faster that C# inherently, such as C# uses virtual machine similar to JVM in Java. Basically higher level language has less performance (if uses in best case).
If you're an experienced professional C# programmer just like you're an experienced professional C++ programmer, developing an application using C# is much more easy and fast than C++.
Many other situations between these situations is possible. For example, you can write an C# application and C++ application so that C# app runs faster than C++ one.
For choosing a language you should note the circumstances of the project and its subject. For a general business project you should use C#. For a hight performance required project like a Video Converter or Image Processing project you should choose C++.
Update:
OK. Lets compare some practical reason about why most possible speed of C++ is more than C#. Consider a good written C# application and same C++ version:
C# uses a VM as a middle layer for executing the application. It has overhead.
AFAIK CLR could not optimises all C# codes in target machine. C++ application could be compiled on target machine with most optimisation.
In C# the most possible optimisation for runtime means most possible fast VM. VM has overhead anyway.
C# is a higher level language thus it generates more program code lines for the final process. (consider difference between an Assembly application and Ruby one! same condition is between C++ and a higher level language such as C#/Java)
If you prefer to get some more info in practice as an expert, see this. It is about Java but it also applies to C#.

The primary concern would not be speed, but stability across windows versions and upgrades. Win32 is mostly immune across windows versions making it highly stable.
When servers are decommissioned and software migrated, A lot of anxiety happens around anything using .Net and usually a lot of panic about .net versions but a Win32 application built 10 years ago just keeps running like nothing happened.

I have been specializing in optimization for about 15 years, and regularly re write C++ code, making heavy use of compiler intrinsics as much as possible because C++ performance is often nowhere near what the CPU is capable of. Cache performance often needs to be considered. Many vector maths instructions are required to replace the standard C++ floating point code.
A great deal of STL code is re written and often runs many times faster. Maths and code which makes heavy use of data can be re written with spectacular results, as the CPU approaches its optimum performance.
None of this is possible in C#. To compare their relative #real time# performance is really a staggeringly ignorant question. The fastest piece of code in C++ will be when each single assembler instruction is optimised for the task in hand, with no unnecessary instructions - at all. Where each piece of memory is used when it is required, and not copied n times because that’s what the language design requires. Where each required memory movement works in harmony with the cache.
Where the final algorithm cannot be improved, based on the exact real time requirements, considering accuracy and functionality.
Then you will be approaching an optimal solution.
To compare C# with this ideal situation is staggering. C# can’t compete. In fact, I am currently re writing a whole bunch of C# code (when I say re writing I mean removing and replacing it completely) because it is not even in the same city, let alone ball park when it comes to heavy lifting real time performance.
So please, stop fooling yourselves. C# is slow. Dead slow. All software is slowing down, and C# is making this speed decline worse. All software runs using the fetch execute cycle in assembler (you know – on the CPU). You use 10 times as many instructions; it’s going to go 10 times as slow. You cripple the cache; it’s going to go even slower. You add garbage collect to a real time piece of software then you are often fooled into thinking that the code runs ‘ok’ there are just those few moments every now and then when the code goes ‘a bit slow for a while’.
Try adding a garbage collection system to code where every cycle counts. I wonder if the stock market trading software has garbage collection (you know – on the system running on the new undersea cable which cost $300 million?). Can we spare 300 milliseconds every 2 seconds? What about flight control software on the space shuttle – is GC ok there? How about engine management software in performance vehicles? (Where victory in a season can be worth millions).
Garbage collection in real time is a complete failure.
So no, emphatically, C++ is much faster. C# is a leap backwards.

Related

Can managed code perform computations as fast as unmamanged?

I've been interested in different chess engines lately. There are many open and closed sources project in this field. They are all (most of them anyway) written in C/C++. This is kind of an obvious thing - you have a computationally intensive task, you use C/C++ so you get both portability and speed. This seems like a no-brainier.
However, I would like to question that idea. When .NET first appeared, there were many people saying that .NET idea would not work because .NET programs were doomed to be super-slow. In reality this did not happen. Somebody did a good job with the VM, JIT, etc and we have decent performance for most tasks now. But not all. Microsoft never promised that .NET will be suitable for all task and admitted for some tasks you will still need C/C++.
Going back to the question of computationally heavy task - is there a way to write a .NET program so that it does not perform computations considerably worse that an unmanaged code using the same algorithms? I'd be happy with "constant" speed loss, but anything worse than that is going to be a problem.
What do you think? Can we be close in speed to unmanaged code for computations in managed code, or unmanaged code is the only viable answer? If we can, how? If we can't why?
Update: a lot of good feedback here. I'm going to accept the most up-voted answer.

"Can" it? Yes, of course. Even without unsafe/unverifiable code, well-optimized .NET code can outperform native code. Case in point, Dr. Jon Harrop's answer on this thread: F# performance in scientific computing
Will it? Usually, no, unless you go way out of your way to avoid allocations.

.NET is not super-super slow- but nor is it in the same realm as a native language. The speed differential is something you could easily suck up for a business app that prefers safety and shorter development cycles. If you're not using every cycle on the CPU then it doesn't matter how many you use, and the fact is that many or even most apps simply don't need that kind of performance. However, when you do need that kind of performance, .NET won't offer it.
More importantly, it's not controllable enough. In C++ then you destroy every resource, manage every allocation. This is a big burden when you really don't want to have to do it- but when you need the added performance of fine-tuning every allocation, it's impossible to beat.
Another thing to consider is the compiler. I mean, the JIT has access to more information about both the program and the target CPU. However, it does have to re-compile from scratch every time, and do so under far, far greater time constraints than the C++ compiler, inherently limiting what it's capable of. The CLR semantics, like heap allocation for every object every time, are also fundamentally limiting of it's performance. Managed GC allocation is plenty fast, but it's no stack allocation, and more importantly, de-allocation.
Edit: Of course, the fact that .NET ships with a different memory control paradigm to (most) native languages means that for an application for which garbage collection is particularly suited, then .NET code may run faster than native code. This isn't, however, anything to do with managed code versus native code, just picking the right algorithm for the right job, and it doesn't mean that an equivalent GC algorithm used from native code wouldn't be faster.

Simple answer is no. To some commenters below: it will be slower most of the time, not always.
Computationally intensive applications where each millisecond counts will still be written in unmanaged languages such as C, C++. GC slows a lot when collecting.
For example nobody writes 3D engines in C# or XNA. There are some, but nothing is close to CryEngine or Unreal.

Short answer yes, long answer with enough work.
There are high frequency trading applications written in managed C# .NET. Very few other applications ever approach the time criticalness as a trading engine requires. The overall concept is you develop software that is extremely efficient that your application will not need the garbage collector to ever invoke itself for non generation 0 objects. If at any point the garbage collector kicks in you have a massive (in computing terms of time) lag lasting dozens or hundreds of milliseconds which would be unacceptable.

You can use unsafe and pointers to get "raw" memory access, which can give you a significant speed boost at the cost of more responsibility for your stuff (remember to pin your objects). At that point, you're just shifting bytes.
Garbage Collection pressure could be an interesting thing, but there are tactics around that as well (object pooling).

This seems like an overly broad question.
There are some gratuitous hints to throw around: Yes you can use unsafe, unchecked, arrays of Structs and most importantly C++/CLI.
There is never going to be a match for C++'s inlining, compiletime template expansion (and ditto optimizations) etc.
But the bottom line is: it depends on the problem. What is computations anyway. Mono has nifty extensions to use SIMD instructions, on Win32 you'd have to go native to get those. Interop is cheating.
In my experience, though, porting toy projects (such as parsers and a Chess engine) is going to result in at least an order of magnitude speed difference, no matter how much you do optimize the .NET side of things. I reckon this has to do, mainly, with the Heap management and the service routines (System.String, System.IO).
There can be big pitfalls in .NET (overusing Linq, lambdas, accidentally relying on Enum.HasFlag to perform like a bitwise operation...) etc. YMMV and choose your weapons carefully

In general, managed code will have at least some speed loss compared to compiled code, proportional to the size of your code.This loss comes in when the VM first JIT compiles your code. Assuming the JIT compiler is just as good as a normal compiler, after that, the code will perform the same.
However, depending on how it's written, it's even possible that the JIT compiler will perform better than a normal compiler. The JIT compiler knows many more things about the target platform than a normal compiler would--it knows what code is "hot", it can cache results for proven pure functions, it knows which instruction set extensions the target platform supports, etc, whereas depending on the compiler, and how specializing you (and your intended application) allow it to be, the compiler may not be able to optimize nearly as well.
Honestly, it completely depends on your application. Especially with something so algorithmic as a chess engine, it's possible that code in a JIT compiled language, following expected semantics and regular patterns, may run faster than the equivalent in C/C++.
Write something and test it out!

Performance gains in re-writing C# code in C/C++

I wrote part of a program that does some heavy work with strings in C#. I initially chose C# not only because it was easier to use .NET's data structures, but also because I need to use this program to analyse some 2-3 million text records in a database, and it is much easier to connect to databases using C#.
There was a part of the program that was slowing down the whole code, and I decided to rewrite it in C using pointers to access every character in the string, and now the part of the code that took some 119 seconds to analyse 10,000,000 strings in C# takes the C code only 5 seconds! Performance is a priority, so I am considering rewriting the whole program in C, compiling it into a dll (something which I didn't know how to do when I started writing the program) and using DllImport from C# to use its methods to work with the database strings.
Given that rewriting the whole program will take some time, and since using DllImport to work with C#'s strings requires marshalling and such things, my question is will the performance gains from the C dll's faster string handling outweigh the performance hit of having to repeatedly marshal strings to access the C dll from C#?

First, profile your code. You might find some real headsmacker that speeds the C# code up greatly.
Second, writing the code in C using pointers is not really a fair comparison. If you are going to use pointers why not write it in assembly language and get real performance? (Not really, just reductio ad absurdam.) A better comparison for native code would be to use std::string. That way you still get a lot of help from the string class and C++ exception-safety.
Given that you have to read 2-3 million records from the DB to do this work, I very much doubt that the time spent cracking the strings is going to outweigh the elapsed time taken to load the data from the DB. So, consider instead how to structure your code so that you can begin string processing while the DB load is in progress.
If you use a SqlDataReader (say) to load the rows sequentially, it should be possible to batch up N rows as fast as possible and hand off to a separate thread for the post-processing that is your current headache and reason for this question. If you are on .Net 4.0 this is simplest to do using Task Parallel Library, and System.Collections.Concurrent could also be useful for collation of results between the threads.
This approach should mean that neither the DB latency nor the string processing is a show-stopping bottleneck, because they happen in parallel. This applies even if you are on a single-processor machine because your app can process strings while it's waiting for the next batch of data to come back from the DB over the network. If you find string processing is the slowest, use more threads (ie. Tasks) for that. If the DB is the bottleneck, then you have to look at external means to improve its performance - DB hardware or schema, network infrastructure. If you need some results in hand before processing more data, TPL allows dependencies to be created between Tasks and the coordinating thread.
My point is that I doubt it's worth the pain of re-engineering the entire app in native C or whatever. There are lots of ways to skin this cat.

One option is to rewrite the C code as unsafe C#, which ought to have roughly the same performance and won't incur any interop penalties.

There's no reason to write in C over C++, and C/C++ does not exist.
The performance implications of marshalling are fairly simple. If you have to marshal every string individually, then your performance is gonna suck. If you can marshal all ten million strings in one call, then marshalling isn't gonna make any difference at all. P/Invoke is not the fastest operation in the world but if you only invoke it a few times, it's not really gonna matter.
It might be easier to re-write your core application in C++ and then use C++/CLI to merge it with the C# database end.

There are some pretty good answers here already, especially #Steve Townsend's.
However, I felt it worth underlining a key point: There is intrinisically no reason why C code "will be faster" than C# code. That idea is a myth. Under the bonnet they both produce machine code that runs on the same CPU. As long as you don't ask the C# to do more work than the C, then it can perform just as well.
By switching to C, you forced yourself to be more frugal (you avoided using high level features like managed strings, bounds-checking, garbage collection, exception handling, etc, and simply treated your strings as blocks of raw bytes). If you applied these low-level techniques to your C# code (i.e. treating your data as raw blocks of bytes as you did in C), you would find much less difference in the speed.
For example: Last week I re-wrote (in C#) a class that a junior had written (also in C#). I achieved a 25x speed improvement over the original code by applying the same approach I would use if I were writing it in C (i.e. thinking about performance). I achieved the same speedup you're claiming without having to change to a different language at all.
Finally, just because an isolated case can be made 24x faster, it does not mean you can make your whole program 24x faster across the board by porting it all to C. As Steve said, profile it to work out where it's slow, and expend your effort only where it'll provide significant benefits. If you blindly convert to C you'll probably find you've spent a lot of time making some already-working-code a lot less maintainable.
(P.S. My viewpoint comes from 29 years experience writing assembler, C, C++, and C# code, and understanding that the language is just a tool for generating machine-code - in the case of C# vs C++ vs C, it is primarily the programmer's skill, not the language used, that determines whether the code will run quickly or slowly. C/C++ programmers tend to be better than C# programmers because they have to be - C# allows you to be lazy and get the code written quickly, while C/C++ make you do more work and the code takes longer to write. But a good programmer can get great performance out of C#, and a poor programmer can wrest abysmal performance out of C/C++)

With strings being immutable in .NET, I have no doubt that an optimised C implementation will outperform an optimised C# implemented - no doubt!
P/Invoke does incur an overhead but if you implement the bulk of the logic in C and only expose very granular API for C#, I believe you are in a much better shape.
At the end of the day, writing an implementation in C means taking longer - but that will give you better performance if you preprepared for extra development cost.

Make yourself familiar with mixed assemblies - this is better than Interop. Interop is a fast track way to deal with native libs, but mixed assemblies perform better.
Mixed assemblies on MSDN
As usual the main thing is testing and measuring...

For concatenation of long strings or multiple strings always use StringBuilder. What not everybody knows, is that StringBuilder cannot only be used to make concatenating strings faster, but also Insertion, Removal and Replacement of characters.
If this isn't fast enough for you, you can use char- or byte-arrays instead of a strings and operate on these. If you are done with manipulation you can convert the array back to a string.
There is also the option in C# to use unsafe code to get a pointer to a string and modifiy the otherwise immutable string but I wouldn't really recommend this.
As others have alread said, you can use managed C++ (C++/CLI) to nicely interoperate between .NET and managed code.
Would you mind showing us the code, maybe there are other options for optimizing?

When you start to optimize a program at a late stage (the application was written without optimization in mind) then you have to identify the bottlenecks.
Profiling is the first step to see where all those CPU cycles are going.
Just keep in mind that C# profilers will only profile your .Net application -not the IIS server implemented in the kernel nor the network stack.
And this can be an invisible bottleneck that beats by several orders of magnitude what you are focussing on when trying to make progress.
There you think that you have no influence on IIS implemented as a kernel driver -and you are right.
But you can do without it - and save a lot of time and money.
Put your talent where it can make the difference - not where you are forced to run with your feet tied together.

The inherent differences are usually given as 2x less CPU, 5x memory. In practice, few people are good enough at or C++ to gain the benefits.
There's additional gain for skimping on Unicode support, but only you can know your application well enough to know if that's safe.
Use the profiler first, make sure you not I/O bound.

What is faster- Java or C# (or good old C)? [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 10 years ago.
I'm currently deciding on a platform to build a scientific computational product on, and am deciding on either C#, Java, or plain C with Intel compiler on Core2 Quad CPU's. It's mostly integer arithmetic.
My benchmarks so far show Java and C are about on par with each other, and .NET/C# trails by about 5%- however a number of my coworkers are claiming that .NET with the right optimizations will beat both of these given enough time for the JIT to do its work.
I always assume that the JIT would have done it's job within a few minutes of the app starting (Probably a few seconds in my case, as it's mostly tight loops), so I'm not sure whether to believe them
Can anyone shed any light on the situation? Would .NET beat Java? (Or am I best just sticking with C at this point?).
The code is highly multithreaded and data sets are several terabytes in size.
Haskell/Erlang etc are not options in this case as there is a significant quantity of existing legacy C code that will be ported to the new system, and porting C to Java/C# is a lot simpler than to Haskell or Erlang. (Unless of course these provide a significant speedup).
Edit: We are considering moving to C# or Java because they may, in theory, be faster. Every percent we can shave off our processing time saves us tens of thousands of dollars per year. At this point we are just trying to evaluate whether C, Java, or c# would be faster.

The key piece of information in the question is this:
Every percent we can shave off our
processing time saves us tens of
thousands of dollars per year
So you need to consider how much it will cost to shave each percent off. If that optimization effort costs tens of thousands of dollars per year, then it isn't worth doing. You could make a bigger saving by firing a programmer.
With the right skills (which today are rarer and therefore more expensive) you can hand-craft assembler to get the fastest possible code. With slightly less rare (and expensive) skills, you can do almost as well with some really ugly-looking C code. And so on. The more performance you squeeze out of it, the more it will cost you in development effort, and there will be diminishing returns for ever greater effort. If the profit from this stays at "tens of thousands of dollars per year" then there will come a point where it is no longer worth the effort. In fact I would hazard a guess you're already at that point because "tens of thousands of dollars per year" is in the range of one salary, and probably not enough to buy the skills required to hand-optimize a complex program.
I would guess that if you have code already written in C, the effort of rewriting it all as a direct translation in another language will be 90% wasted effort. It will very likely perform slower simply because you won't be taking advantage of the capabilities of the platform, but instead working against them, e.g. trying to use Java as if it was C.
Also within your existing code, there will be parts that make a crucial contribution to the running time (they run frequently), and other parts that are totally irrelevant (they run rarely). So if you have some idea for speeding up the program, there is no economic sense in wasting time applying it to the parts of the program that don't affect the running time.
So use a profiler to find the hot spots, and see where time is being wasted in the existing code.
Update when I noticed the reference to the code being "multithreaded"
In that case, if you focus your effort on removing bottlenecks so that your program can scale well over a large number of cores, then it will automatically get faster every year at a rate that will dwarf any other optimization you can make. This time next year, quad cores will be standard on desktops. The year after that, 8 cores will be getting cheaper (I bought one over a year ago for a few thousand dollars), and I would predict that a 32 core machine will cost less than a developer by that time.

I'm sorry, but that is not a simple question. It would depend a lot on what exactly was going on. C# is certainly no slouch, and you'd be hard-pressed to say "java is faster" or "C# is faster". C is a very different beast... it maybe has the potential to be faster - if you get it right; but in most cases it'll be about the same, but much harder to write.
It also depends how you do it - locking strategies, how you do the parallelization, the main code body, etc.
Re JIT - you could use NGEN to flatten this, but yes; if you are hitting the same code it should be JITted very early on.
One very useful feature of C#/Java (over C) is that they have the potential to make better use of the local CPU (optimizations etc), without you having to worry about it.
Also - with .NET, consider things like "Parallel Extensions" (to be bundled in 4.0), which gives you a much stronger threading story (compared to .NET without PFX).

Don't worry about language; parallelize!
If you have a highly multithreaded, data-intensive scientific code, then I don't think worrying about language is the biggest issue for you. I think you should concentrate on making your application parallel, especially making it scale past a single node. This will get you far more performance than just switching languages.
As long as you're confined to a single node, you're going to be starved for compute power and bandwidth for your app. On upcoming many-core machines, it's not clear that you'll have the bandwidth you need to do data-intensive computing on all the cores. You can do computationally intensive work (like a GPU does), but you may not be able to feed all the cores if you need to stream a lot of data to every one of them.
I think you should consider two options:
MapReduce
Your problem sounds like a good match for something like Hadoop, which is designed for very data-intensive jobs.
Hadoop has scaled to 10,000 nodes on Linux, and you can shunt your work off either to someone else's (e.g. Amazon's, Microsoft's) or your own compute cloud. It's written in Java, so as far as porting goes, you can either call your existing C code from within Java, or you can port the whole thing to Java.
MPI
If you don't want to bother porting to MapReduce, or if for some reason your parallel paradigm doesn't fit the MapReduce model, you could consider adapting your app to use MPI. This would also allow you to scale out to (potentially thousands) of cores. MPI is the de-facto standard for computationally intensive, distributed-memory applications, and I believe there are Java bindings, but mostly people use MPI with C, C++, and Fortran. So you could keep your code in C and focus on parallelizing the performance-intensive parts. Take a look at OpenMPI for starters if you are interested.

I'm honestly surprised at those benchmarks.
In a computationally intensive product I would place a large wager on C to perform faster. You might write code that leaks memory like a sieve, and has interesting threading related defects, but it should be faster.
The only reason I could think that Java or C# would be faster is due to a short run length on the test. If little or no GC happened, you'll avoid the overhead of actually deallocating memory. If the process is iterative or parallel, try sticking a GC.Collect wherever you think you're done a bunch of objects(after setting things to null or otherwise removing references).
Also, if you're dealing with terabytes of data, my opinion is you're going to be much better off with deterministic memory allocation that you get with C. If you deallocate roughly close to when you allocate your heap will stay largely unfragmented. With a GC environment you may very well end up with your program using far more memory after a decent run length than you would guess, just because of fragmentation.
To me this sounds like the sort of project where C would be the appropriate language, but would require a bit of extra attention to memory allocation/deallocation. My bet is that C# or Java will fail if run on a full data set.

Quite some time ago Raymond Chen and Rico Mariani had a series of blog posts incrementally optimising a file load into a dictionary tool. While .NET was quicker early on (i.e. easy to make quick) the C/Win32 approach eventually was significantly faster -- but at considerable complexity (e.g. using custom allocators).
In the end the answer to which is faster will heavily depend on how much time you are willing to expend on eking every microsecond out of each approach. That effort (assuming you do it properly, guided by real profiler data) will make a far greater difference than choice of language/platform.
The first and last performance blog entries:
Chen part 1
Mariani part 1
Check final part
Mariani final part
(The last link gives an overall summary of the results and some analysis.)

It is going to depend very much on what you are doing specifically. I have Java code that beats C code. I have Java code that is much slower than C++ code (I don't do C#/.NET so cannot speak to those).
So, it depends on what you are doing, I am sure you can find something that is faster in language X than language Y.
Have you tried running the C# code through a profiler to see where it is taking the most time (same with Java and C while you are at it). Perhaps you need to do something different.
The Java HotSpot VM is more mature (roots of it going back to at least 1994) than the .NET one, so it may come down to the code generation abilities of both for that.

You say "the code is multithreaded" which implies that the algorithms are parallelisable. Also, you save the "data sets are several terabytes in size".
Optimising is all about finding and eliminating bottlenecks.
The obvious bottleneck is the bandwidth to the data sets. Given the size of the data, I'm guessing that the data is held on a server rather than on a desktop machine. You haven't given any details of the algorithms you're using. Is the time taken by the algorithm greater than the time taken to read/write the data/results? Does the algorithm work on subsets of the total data?
I'm going to assume that the algorithm works on chunks of data rather than the whole dataset.
You have two scenarios to consider:
The algorithm takes more time to process the data than it does to get the data. In this case, you need to optimise the algorithm.
The algorithm takes less time to process the data than it does to get the data. In this case, you need to increase the bandwidth between the algorithm and the data.
In the first case, you need a developer that can write good assembler code to get the most out of the processors you're using, leveraging SIMD, GPUs and multicores if they're available. Whatever you do, don't just crank up the number of threads because as soon as the number of threads exceeds the number of cores, your code goes slower! This due to the added overhead of switching thread contexts. Another option is to use a SETI like distributed processing system (how many PCs in your organisation are used for admin purposes - think of all that spare processing power!). C#/Java, as bh213 mentioned, can be an order of magnitude slower than well written C/C++ using SIMD, etc. But that is a niche skillset these days.
In the latter case, where you're limited by bandwidth, then you need to improve the network connecting the data to the processor. Here, make sure you're using the latest ethernet equipment - 1Gbps everywhere (PC cards, switches, routers, etc). Don't use wireless as that's slower. If there's lots of other traffic, consider a dedicated network in parallel with the 'office' network. Consider storing the data closer to the clients - for every five or so clients use a dedicated server connected directly to each client which mirrors the data from the server.
If saving a few percent of processing time saves "tens of thousands of dollars" then seriously consider getting a consultant in, two actually - one software, one network. They should easily pay for themselves in the savings made. I'm sure there's many here that are suitably qualified to help.
But if reducing cost is the ultimate goal, then consider Google's approach - write code that keeps the CPU ticking over below 100%. This saves energy directly and indirectly through reduced cooling, thus costing less. You'll want more bang for your buck so it's C/C++ again - Java/C# have more overhead, overhead = more CPU work = more energy/heat = more cost.
So, in summary, when it comes to saving money there's a lot more to it than what language you're going to choose.

If there is already a significant quantity of legacy C code that will be added to the system then why move to C# and Java?
In response to your latest edit about wanting to take advantage of any improvements in processing speed....then your best bet would be to stick to C as it runs closer to the hardware than C# and Java which have the overhead of a runtime environment to deal with. The closer to the hardware you can get the faster you should be able to run. Higher Level languages such as C# and Java will result in quicker development times...but C...or better yet Assembly will result in quicker processing time...but longer development time.

I participated in a few TopCoder's Marathon matches where performance was they key to victory.
My choice was C#. I think C# solutions placed slightly above Java and were slighly slower than C++... Until somebody wrote a code in C++ that was a order of magnitude faster. You were alowed to use Intel compiler and the winning code was full of SIMD insturctions and you cannot replicate that in C# or Java. But if SIMD is not an option, C# and Java should be good enough as long as you take care to use memory correctly (e.g. watch for cache misses and try to limit memory access to the size of L2 cache)

You question is poorly phrased (or at least the title is) because it implies this difference is endemic and holds true for all instances of java/c#/c code.
Thankfully the body of the question is better phrased because it presents a reasonably detailed explanation of the sort of thing your code is doing. It doesn't state what versions (or providers) of c#/java runtimes you are using. Nor does it state the target architecture or machine the code will run on. These things make big differences.
You have done some benchmarking, this is good. Some suggestions as to why you see the results you do:
You aren't as good at writing performant c# code as you are at java/c (this is not a criticism, or even likely but it is a real possibility you should consider)
Later versions of the JVM have some serious optimizations to make uncontended locks extremely fast. This may skew things in your favour (And especially the comparison with the c implementation threading primitives you are using)
Since the java code seems to run well compared to the c code it is likely that you are not terribly dependent on the heap allocation strategy (profiling would tell you this).
Since the c# code runs less well than the java one (and assuming the code is comparable) then several possible reasons exist:
You are using (needlessly) virtual functions which the JVM will inline but the CLR will not
The latest JVM does Escape Analysis which may make some code paths considerably more efficient (notably those involving string manipulation whose lifetime is stack bound
Only the very latest 32 bit CLR will inline methods involving non primitive structs
Some JVM JIT compilers use hotspot style mechanisms which attempt to detect the 'hotspots' of the code and spend more effort re-jitting them.
Without an understanding of what your code spends most of its time doing it is impossible to make specific suggestions. I can quite easily write code which performs much better under the CLR due to use of structs over objects or by targeting runtime specific features of the CLR like non boxed generics, this is hardly instructive as a general statement.

Actually it is 'Assembly language'.

Depends on what kind of application you are writing.
Try The Computer Language Benchmarks Game
http://shootout.alioth.debian.org/u32q/benchmark.php?test=all&lang=csharp&lang2=java&box=1
http://shootout.alioth.debian.org/u64/benchmark.php?test=all&lang=csharp&lang2=java&box=1

To reiterate a comment, you should be using the GPU, not the CPU if you are doing arithmetic scientific computing. Matlab with CUDA plugins would be much more awesome than Java or c# if Matlab licensing is not an issue. The nVidia documentation shows how to compile any CUDA function into a mex file. If you need free software, I like pycuda.
If however, GPUs are not an option, I personally like C for a lot of routines because the optimizations the compiler makes are not as complicated as JIT: you don't have to worry about whether a "class" becomes like a "struct" or not. In my experience, problems can usually be broken down such that higher-level things can be written in a very expressive language like Python (rich primitives, dynamic types, incredibly flexible reflection), and transformations can be written in something like C. Additionally, there's neat compiler software, like PLUTO (automatic loop parallelization and OpenMP code generation), and libraries like Hoard, tcmalloc, BLAS (CUBLAS for gpu), etc. if you choose to go the C/C++ route.

One thing to notice is that IF your application(s) would benefit of lazy evaluation a functional programming language like Haskell may yield speedups of a totally different magnitude than the theretically optimal structured/OO code just by not evaluating unnecessary branches.
Also, if you are talking about the monetary benefit of better performance, don't forget to add the cost of maintaing your software into the equation.

Surely the answer is to go and buy the latest PC with the most cores/processors you can afford. If you buy one of the latest 2x4 core PCs you will find not only does it have twice as many cores as a quad core but also they run 25-40% faster than the previous generation of processors/machines.
This will give you approximately a 150% speed up. Far more than choosing Java/C# or C.
and whats more your get the same again every 18 months if you keep buying in new boxes!
You can sit there for months rewriting you code or I could go down to my local PC store this afternoon and be running faster than all your efforts same day.
Improving code quality/efficiency is good but sometimes implementation dollars are better spent elsewhere.

Writing in one language or another will only give you small speed ups for a large amount of work. To really speed things up you might want to look at the following:
Buying the latest fastest Hardware.
Moving from 32 bit operating system to 64 bit.
Grid computing.
CUDA / OpenCL.
Using compiler optimisation like vectorization.

I would go with C# (or Java) because your development time will probably be much faster than with C. If you end up needing extra speed then you can always rewrite a section in C and call it as a module.

My preference would be C or C++ because I'm not separated from the machine language by a JIT compiler.
You want to do intense performance tuning, and that means stepping through the hot spots one instruction at a time to see what it is doing, and then tweaking the source code so as to generate optimal assembler.
If you can't get the compiler to generate what you consider good enough assembler code, then by all means write your own assembler for the hot spot(s). You're describing a situation where the need for performance is paramount.
What I would NOT do if I were in your shoes (or ever) is rely on anecdotal generalizations about one language being faster or slower than another. What I WOULD do is multiple passes of intense performance tuning along the lines of THIS and THIS and THIS. I have done this sort of thing numerous times, and the key is to iterate the cycle of diagnosis-and-repair because every slug fixed makes the remaining ones more evident, until you literally can't squeeze another cycle out of that turnip.
Good luck.
Added: Is it the case that there is some seldom-changing configuration information that determines how the bulk of the data is processed? If so, it may be that the program is spending a lot of its time re-interpreting the configuration info to figure out what to do next. If so, it is usually a big win to write a code generator that will read the configuration info and generate an ad-hoc program that can whizz through the data without constantly having to figure out what to do.

Depends what you benchmark and on what hardware. I assume it's speed rather than memory or CPU usage.But....
If you have a dedicated machine for an app only with very large amounts of memory then java might be 5% faster.
If you go down in the real world with limited memory and more apps running on the same machine .net looks better at utilizing computing resources :see here
If the hardware is very constrained, C/C++ wins hands down.

If you are using a highly multithreaded code, I would recommend you to take a look at the upcoming Task Parallel Library (TPL) for .NET and the Parallel Pattern Library (PPL) for native C++ applications. That will save you a lot of issues with thread/dead lockíng and all other issues that you would spend a lot of time digging into and solving for yourself.
For my self, I truly believe that the memory management in the managed world will be more efficient and beat the native code in the long term.

If much of your code is in C why not keep it?
In principal and by design it's obvious that C is faster. They may close the gap over time but they always have more level os indirection and "safety". C is fast because it's "unsafe". Just think about bound checking. Interfacing to C is supported in every langauge. And so I can not see why one would not like to just wrap the C code up if it's still working and use it in whatever language you like

I would consider what everyone else uses - not the folks on this site, but the folks who write the same kind of massively parallel, or super high-performance applications.
I find they all write their code in C/C++. So, just for this fact alone (ie. regardless of any speed issues between the languages), I would go with C/C++. The tools they use and have developed will be of much more use to you if you're writing in the same language.
Aside from that, I've found C# apps to have somewhat less than optimal performance in some areas, multithreading is one. .NET will try to keep you safe from thread problems (probably a good thing in most cases), but this will cause your specific case problems (to test: try writing a simple loop that accesses a shared object using lots of threads. Run that on a single core PC and you get better performance than if you run it on a multiple core box - .net is adding its own locks to make sure you don't muck it up)(I used Jon Skeet's singleton benchmark. The static lock on took 1.5sec on my old laptop, 8.5s on my superfast desktop, the lock version is even worse, try it yourself)
The next point is that with C you tend to access memory and data directly - nothing gets in the way, with C#/Java you will use some of the many classes that are provided. These will be good in the general case, but you're after the best, most efficient way to access this (which, for your case is a big deal with multi-terabytes of data, those classes were not designed with those datasets in mind, they were designed for the common cases everyone else uses), so again, you would be safer using C for this - you'll never get the GC getting clogged up by a class that creates new strings internally when you read a couple of terabytes of data if you write it in C!
So it may appear that C#/Java can give you benefits over a native application, but I think you'll find those benefits are only realised for the kind of line-of-business applications that are commonly written.

Note that for heavy computations there is a great advantage in having tight loops which can fit in the CPU's first level cache as it avoids having to go to slower memory repeatedly to get the instructions.
Even for level two cache a large program like Quake IV gets a 10% performance increase with 4 Mb level 2 cache versus 1 Mb level 2 cache - http://www.tomshardware.com/reviews/cache-size-matter,1709-5.html
For these tight loops C is most likely the best as you have the most control of the generated machine code, but for everything else you should go for the platform with the best libraries for the particular task you need to do. For instance the netlib libraries are reputed to have very good performance for a very large set of problems, and many ports to other languages are available.

If every percentage will really save you tens of thousands of dollars, then you should bring in a domain expert to help with the project. Well designed and written code with performance considered at the initial stages may be an order of magnitude faster, saving you 90%, or $900,000. I recently found a subtle flaw in some code that sped up a process by over 100 times. A colleague of mine found an algorithm that was running in O(n^3) that he re-wrote to make it O(N log n). This tends to be where the huge performance saving are.
If the problem is so simple that you are certain that a better algorithm cannot be employed giving you significant savings, then C is most likely your best language.

The most important things are already said here. I would add:
The developer utilizes a language which the compiler(s) utilize(s) to generate machine instructions which the processor(s) utilize(s) to use system resources. A program will be "fast" when ALL parts of the chain perform optimally.
So for the "best" language choice:
take that language which you are best able to control and
which is able to instruct the compiler sufficiently to
generate nearly optimal machine code so that
the processor on the target machine is able to utilize processing resources optimally.
If you are not a performance expert you will have a hard time to archieve 'peak performance' within ANY language. Possibly C++ still provides the most options to control the machine instructions (especially SSE extensions a.s.o).
I suggest to orient on the well known 80:20 rule. This is fairly well true for all: the hardware, the languages/platforms and the developer efforts.
Developers have always relied on the hardware to fix all performance issues automatically due to an upgrade to a faster processor f.e.. What might have worked in the past will not work in the (nearest) future. The developer now has the responsibility to structure her programs accordingly for parallelized execution. Languages for virtual machines and virtual runtime environments will show some advantage here. And even without massive parallelization there is little to no reason why C# or Java shouldn't succeed similar well as C++.
#Edit: See this comparison of C#, Matlab and FORTRAN, where FORTRAN does not win alone!

Ref; "My benchmarks so far show Java and C are about on par with each other"
Then your benchmarks are severely flawed...
C will ALWAYS be orders of magnitudes faster then both C# and Java unless you do something seriously wrong...!
PS!
Notice that this is not an attempt to try to bully neither C# nor Java, I like both Java and C#, and there are other reasons why you would for many problems choose either Java or C# instead of C. But neither Java nor C# would in a correct written tests NEVER be able to perform with the same speed as C...
Edited because of the sheer number of comments arguing against my rhetoric
Compare these two buggers...
C#
public class MyClass
{
public int x;
public static void Main()
{
MyClass[] y = new MyClass[1000000];
for( int idx=0; idx < 1000000; idx++)
{
y[idx] = new MyClass();
y[idx].x = idx;
}
}
}
against this one (C)
struct MyClass
{
int x;
}
void Main()
{
MyClass y[1000000];
for( int idx = 0; idx < 1000000; idx++)
{
y[idx].x = idx;
}
}
The C# version first of all needs to store its array on the heap. The C version stores the array on the stack. To store stuff on the stack is merely changing the value of an integer value while to store stuff on the heap means finding a big enough chunk of memory and potentially means traversing the memory for a pretty long time.
Now mostly C# and Java allocates huge chunks of memory which they keep on spending till it's out which makes this logic execute faster. But even then to compare this against changing the value of an integer is like an F16 against an oil tanker speedwise...
Second of all in the C version since all those objects are already on the stack we don't need to explicitly create new objects within the loop. Yet again for C# this is a "look for available memory operation" while the C version is a ZIP (do nothing operation)
Third of all is the fact that the C version will automatically delete all these objects when they run out of scope. Yet again this is an operation which ONLY CHANGES THE VALUE OF AN INTEGER VALUE. Which would on most CPU architectures take between 1 and 3 CPU cycles. The C# version doesn't do that, but when the Garbage Collector kicks in and needs to collect those items my guess is that we're talking about MILLIONS of CPU cycles...
Also the C version will instantly become x86 code (on an x86 CPU) while the C# version would first become IL code. Then later when executed it would have to be JIT compiled, which probably alone takes orders of magnitudes longer time then only executing the C version.
Now some wise guy could probably execute the above code and measure CPU cycles. However that's basically no point at all in doing because mathematically it's proven that the Managed Version would probably take several million times the number of CPU cycles as the C version. So my guess is that we're now talking about 5-8 orders of magnitudes slower in this example. And sure, this is a "rigged test" in that I "looked for something to prove my point", however I challenge those that commented badly against me on this post to create a sample which does NOT execute faster in C and which also doesn't use constructs which you normally never would use in C due to "better alternatives" existing.
Note that C# and Java are GREAT languages. I prefer them over C ANY TIME OF THE DAY. But NOT because they're FASTER. Because they are NOT. They are ALWAYS slower then C and C++. Unless you've coded blindfolded in C or C++...
Edit;
C# of course have the struct keyword, which would seriously change the speed for the above C# version, if we changed the C# class to a value type by using the keyword struct instead of class. The struct keyword means that C# would store new objects of the given type on the stack - which for the above sample would increase the speed seriously. Still the above sample happens to also feature an array of these objects.
Even though if we went through and optimized the C# version like this, we would still end up with something several orders of magnitudes slower then the C version...
A good written piece of C code will ALWAYS be faster then C#, Java, Python and whatever-managed-language-you-choose...
As I said, I love C# and most of the work I do today is C# and not C. However I don't use C# because it's faster then C. I use C# because I don't need the speed gain C gives me for most of my problems.
Both C# and Java is though ridiculously slower then C, and C++ for that matter...

Which is faster - C# unsafe code or raw C++

I'm writing an image processing program to perform real time processing of video frames. It's in C# using the Emgu.CV library (C#) that wraps the OpenCV library dll (unmanaged C++). Now I have to write my own special algorithm and it needs to be as fast as possible.
Which will be a faster implementation of the algorithm?
Writing an 'unsafe' function in C#
Adding the function to the OpenCV library and calling it through Emgu.CV
I'm guessing C# unsafe is slower because it goes throught the JIT compiler, but would the difference be significant?
Edit:
Compiled for .NET 3.5 under VS2008

it needs to be as fast as possible
Then you're asking the wrong question.
Code it in assembler, with different versions for each significant architecture variant you support.
Use as a guide the output from a good C++ compiler with optimisation, because it probably knows some tricks that you don't. But you'll probably be able to think of some improvements, because C++ doesn't necessarily convey to the compiler all information that might be useful for optimisation. For example, C++ doesn't have the C99 keyword restrict. Although in that particular case many C++ compilers (including MSVC) do now support it, so use it where possible.
Of course if you mean, "I want it to be fast, but not to the extent of going outside C# or C++", then the answer's different ;-)
I would expect C# to at least approach the performance of similar-looking C++ in a lot of cases. I assume of course that the program will be running long enough that the time the JIT itself takes is irrelevant, but if you're processing much video then that seems likely. But I'd also expect there to be certain things which if you do them in unsafe C#, will be far slower than the equivalent thing in C++. I don't know what they are, because all my experience of JITs is in Java rather than CLR. There might also be things which are slower in C++, for instance if your algorithm makes any calls back into C# code.
Unfortunately the only way to be sure how close it is is to write both and test them, which kind of misses the point that writing the C++ version is a bunch of extra effort. However, you might be able to get a rough idea by hacking some quick code which approximates the processing you want to do, without necessarily doing all of it or getting it right. If you algorithm is going to loop over all the pixels and do a few FP ops per pixel, then hacking together a rough benchmark should take all of half an hour.
Usually I would advise against starting out thinking "this needs to be as fast as possible". Requirements should be achievable, and by definition "as X as possible" is only borderline achievable. Requirements should also be testable, and "as X as possible" isn't testable unless you somehow know a theoretical maximum. A more friendly requirement is "this needs to process video frames of such-and-such resolution in real time on such-and-such a speed CPU", or "this needs to be faster than our main competitor's product". If the C# version does that, with a bit to spare to account for unexpected minor issues in the user's setup, then job done.

It depends on the algorithm, the implementation, the C++ compiler and the JIT compiler. I guess in most cases the C++ implementation will be faster. But this may change.
The JIT compiler can optimize your code for the platform your code is running on instead of an average for all the platforms your code might run on as the C++ compiler does. This is something newer versions of the JIT compiler are increasingly good at and may in some cases give JITted code an advantage. So the answer is not as clear as you might expect. The new Java hotspot compiler does this very well for example.
Other situations where managed code may do better than C++ is where you need to allocate and deallocate lots of small objects. The .net runtime preallocates large chunks of memory that can be reused so it doesn't need to call into the os every time you need to allocate memory.
I'm not sure unsafe C# runs much faster than normal C#. You'll have to try this too.
If you want to know what's the best solution for your situation you'll have to try both and measure the difference. I dont think there will be more than

Languages don't have a "speed". It depends on the compiler and the code. It's possible to write inefficient code in any language, and a clever compiler will generate near-optimal code no matter the language of the source.
The only really unavoidable factor in performance between C# and C++ is that C# apps have to do more at startup (load the .NET framework and perhaps JIT some code), so all things being equal, they will launch a bit slower. After that, it depends, and there's no fundamental reason why one language must always be faster than another.
I'm also not aware of any reasons why unsafe C# should be faster than safe. In general, safe is good because it allows the compiler to make some much stronger assumptions, and so safe might be faster. But again, it depends on the code you're compiling, the compiler you're using and a dozen other factors.
In short, give up on the idea that you can measure the performance of a language. You can't. A language is never "fast" or slow". It doesn't have a speed.

C# is typically slower than C++. There are runtime checks in managed code. These are what make it managed, after all. C++ doesn't have to check whether the bounds of an array have been exceeded for example.
From my experience, using fixed memory helps a lot. There is a new System.IO.UnmanagedMemoryAccessor class in .NET 4.0 which may help in the future.

If you are going to implement your algorithm in a standard way I think it's irrelevant.
But some languages have bindings to apis or libraries that can give you a non standart boost.
Consider if you can use GPU processing - nvidia and ati provide the CUDA and CTM frameworks and there is an ongoing standarization effort from the khronos group (openGL). A hunch tells me also that amd will add at least one streaming processor core in their future chips. So I think there is quite a promise in that area.
Try to see if you can exploit SSE instructions, there are libraries around -most in C++ or C- that provide handy apis, check Intel's site for handy optimized libraries I do recall "Intel Performance Primitives" and a "Math Kernel".
But on the politics side, do incorporate your algorithm in OpenCV so others may benefit too.

It's a battle that will rage on forever. C versus C++ versus C# versus whatever.
In C#, the notion of unsafe is to unlock "dangerous" operations. ie, the use of pointers, and being able to cast to void pointers etc, as you can in C and C++.
Very dangerous, and very powerful! But defeating what C# was based upon.
You'll find that nowadays, Microsoft has made strides in the direction of performance, especially since the release of .NET, and the next version of .NET will actually support inline methods, as you can with C++. This will increase performance for very specific situations. I hate that it's not going to be a c# feature, but a nasty attribute the compiler picks up on - but you can't have it all.
Personally, I'm writing a game with C# and managed DirectX (why not XNA?? beyond the scope of this post). I'm using unsafe code in graphical situations, which brings about a nod in the direction of what others have said.
It's only because pixel access is rediculously slow with GDI++ that I was driven to look for alternatives. But on the whole, the c# compiler is pretty damned good, and for code comparisons (you can find articles) you'll find the performance is very comparable to c++.
That's not to say there isn't a better way to write the code.
At the end of the day, I personally see C, C++, and C# as about the same speed when executing. It's just that in some painful situations where you want to work really closely with the underlying hardware or very close to those pixels, that you'll find noticeable advantage to the C/C++ crowd.
But for business, and most things nowadays, C# is a real contender, and staying within the "safe" environment is definitely a bonus.
When stepping outside, you can get most things done with unsafe code, as I have - and boy, have I gone to some extremes! But was it worth it? Probably not. I personally wonder if I should have thought more along the lines of time-critical code in C++, and all the Object Oriented safe stuff in C#. But I have better performance than I thought I'd get!
So long as you're careful with the amount of interop calls you're making, you can get the best of both worlds. I've personally avoided that, but I don't know to what cost.
So an approach I've not tried, but would love to hear adventures in, in actually using C++.NET to develop a library in - would that be any faster than c#'s unsafe for these special graphical situations? How would that compare to native C++ compiled code? Now there's a question!
Hmm..

If you know your environment and you use a good compiler (for video processing on windows, Intel C++ Compiler is probably the best choice), C++ will beat C# hands-down for several reasons:
The C++ runtime environment has no intrinsic runtime checks (the downside being that you have free reign to blow yourself up). The C# runtime environment is going to have some sanity checking going on, at least initially.
C++ compilers are built for optimizing code. While it's theoretically possible to implement a C# JIT compiler using all of the optimizing voodo that ICC (or GCC) uses, it's doubtful that Microsoft's JIT will reliably do better. Even if the JIT compiler has runtime statistics, that's still not as good as profile-guided optimization in either ICC or GCC.
A C++ environment allows you to control your memory model much better. If your application gets to the point of thrashing the data cache or fragmenting the heap, you'll really appreciate the extra control over allocation. Heck, if you can avoid dynamic allocations, you're already much better off (hint: the running time of malloc() or any other dynamic allocator is nondeterministic, and almost all non-native languages force heavier heap usage, and thus heavier allocation).
If you use a poor compiler, or if you can't target a good chipset, all bets are off.

To be honest, what language you write it in is not nearly as important as what algorithms you use (IMO, anyway). Maybe by going to native code you might make your application faster, but it might also make it slower--it'd depend on the compiler, how the programs are written, what sort of interop costs you'd incur if you're using a mixed environment, etc. You can't really say without profiling it. (and, for that matter, have you profiled your application? Do you actually know where it's spending time?)
A better algorithm is completely independent of the language you choose.

I'm a little late in responding but I can give you some anecdotal experience. We had some matrix multiplication routines that were originally coded in C# using pointers and unsafe code. This proved to be a bottleneck in our application and we then used pinning+P/Invoke to call into a C++ version of the Matrix multiplication routine and got a factor of 2 improvement. This was a while ago with .NET 1.1, so things might be better now. As others point out, this proves nothing, but it was an interesting exercise.
I also agree with thAAAnos, if you algorithm really has to be "as fast as possible" leverage IPL or, if you must, consider a GPU implementation.

Running on the CPU is always going to be faster than running on a VM on the CPU. I can't believe people are trying to argue otherwise.
For example, we have some fairly heavy image processing work on our web server that's queued up. Initially to get it working, we used PHP's GD functions.
They were slow as hell. We rewrote the functionality we needed in C++.

C++ performance vs. Java/C#

Locked. This question and its answers are locked because the question is off-topic but has historical significance. It is not currently accepting new answers or interactions.
My understanding is that C/C++ produces native code to run on a particular machine architecture. Conversely, languages like Java and C# run on top of a virtual machine which abstracts away the native architecture. Logically it would seem impossible for Java or C# to match the speed of C++ because of this intermediate step, however I've been told that the latest compilers ("hot spot") can attain this speed or even exceed it.
Perhaps this is more of a compiler question than a language question, but can anyone explain in plain English how it is possible for one of these virtual machine languages to perform better than a native language?

JIT vs. Static Compiler
As already said in the previous posts, JIT can compile IL/bytecode into native code at runtime. The cost of that was mentionned, but not to its conclusion:
JIT has one massive problem is that it can't compile everything: JIT compiling takes time, so the JIT will compile only some parts of the code, whereas a static compiler will produce a full native binary: For some kind of programs, the static compiler will simply easily outperform the JIT.
Of course, C# (or Java, or VB) is usually faster to produce viable and robust solution than is C++ (if only because C++ has complex semantics, and C++ standard library, while interesting and powerful, is quite poor when compared with the full scope of the standard library from .NET or Java), so usually, the difference between C++ and .NET or Java JIT won't be visible to most users, and for those binaries that are critical, well, you can still call C++ processing from C# or Java (even if this kind of native calls can be quite costly in themselves)...
C++ metaprograming
Note that usually, you are comparing C++ runtime code with its equivalent in C# or Java. But C++ has one feature that can outperform Java/C# out of the box, that is template metaprograming: The code processing will be done at compilation time (thus, increasing vastly compilation time), resulting into zero (or almost zero) runtime.
I have yet so see a real life effect on this (I played only with concepts, but by then, the difference was seconds of execution for JIT, and zero for C++), but this is worth mentioning, alongside the fact template metaprograming is not trivial...
Edit 2011-06-10: In C++, playing with types is done at compile time, meaning producing generic code which calls non-generic code (e.g. a generic parser from string to type T, calling standard library API for types T it recognizes, and making the parser easily extensible by its user) is very easy and very efficient, whereas the equivalent in Java or C# is painful at best to write, and will always be slower and resolved at runtime even when the types are known at compile time, meaning your only hope is for the JIT to inline the whole thing.
...
Edit 2011-09-20: The team behind Blitz++ (Homepage, Wikipedia) went that way, and apparently, their goal is to reach FORTRAN's performance on scientific calculations by moving as much as possible from runtime execution to compilation time, via C++ template metaprogramming. So the "I have yet so see a real life effect on this" part I wrote above apparently does exist in real life.
Native C++ Memory Usage
C++ has a memory usage different from Java/C#, and thus, has different advantages/flaws.
No matter the JIT optimization, nothing will go has fast as direct pointer access to memory (let's ignore for a moment processor caches, etc.). So, if you have contiguous data in memory, accessing it through C++ pointers (i.e. C pointers... Let's give Caesar its due) will goes times faster than in Java/C#. And C++ has RAII, which makes a lot of processing a lot easier than in C# or even in Java. C++ does not need using to scope the existence of its objects. And C++ does not have a finally clause. This is not an error.
:-)
And despite C# primitive-like structs, C++ "on the stack" objects will cost nothing at allocation and destruction, and will need no GC to work in an independent thread to do the cleaning.
As for memory fragmentation, memory allocators in 2008 are not the old memory allocators from 1980 that are usually compared with a GC: C++ allocation can't be moved in memory, true, but then, like on a Linux filesystem: Who needs hard disk defragmenting when fragmentation does not happen? Using the right allocator for the right task should be part of the C++ developer toolkit. Now, writing allocators is not easy, and then, most of us have better things to do, and for the most of use, RAII or GC is more than good enough.
Edit 2011-10-04: For examples about efficient allocators: On Windows platforms, since Vista, the Low Fragmentation Heap is enabled by default. For previous versions, the LFH can be activated by calling the WinAPI function HeapSetInformation). On other OSes, alternative allocators are provided (see https://secure.wikimedia.org/wikipedia/en/wiki/Malloc for a list)
Now, the memory model is somewhat becoming more complicated with the rise of multicore and multithreading technology. In this field, I guess .NET has the advantage, and Java, I was told, held the upper ground. It's easy for some "on the bare metal" hacker to praise his "near the machine" code. But now, it is quite more difficult to produce better assembly by hand than letting the compiler to its job. For C++, the compiler became usually better than the hacker since a decade. For C# and Java, this is even easier.
Still, the new standard C++0x will impose a simple memory model to C++ compilers, which will standardize (and thus simplify) effective multiprocessing/parallel/threading code in C++, and make optimizations easier and safer for compilers. But then, we'll see in some couple of years if its promises are held true.
C++/CLI vs. C#/VB.NET
Note: In this section, I am talking about C++/CLI, that is, the C++ hosted by .NET, not the native C++.
Last week, I had a training on .NET optimization, and discovered that the static compiler is very important anyway. As important than JIT.
The very same code compiled in C++/CLI (or its ancestor, Managed C++) could be times faster than the same code produced in C# (or VB.NET, whose compiler produces the same IL than C#).
Because the C++ static compiler was a lot better to produce already optimized code than C#'s.
For example, function inlining in .NET is limited to functions whose bytecode is less or equal than 32 bytes in length. So, some code in C# will produce a 40 bytes accessor, which won't be ever inlined by the JIT. The same code in C++/CLI will produce a 20 bytes accessor, which will be inlined by the JIT.
Another example is temporary variables, that are simply compiled away by the C++ compiler while still being mentioned in the IL produced by the C# compiler. C++ static compilation optimization will result in less code, thus authorizes a more aggressive JIT optimization, again.
The reason for this was speculated to be the fact C++/CLI compiler profited from the vast optimization techniques from C++ native compiler.
Conclusion
I love C++.
But as far as I see it, C# or Java are all in all a better bet. Not because they are faster than C++, but because when you add up their qualities, they end up being more productive, needing less training, and having more complete standard libraries than C++. And as for most of programs, their speed differences (in one way or another) will be negligible...
Edit (2011-06-06)
My experience on C#/.NET
I have now 5 months of almost exclusive professional C# coding (which adds up to my CV already full of C++ and Java, and a touch of C++/CLI).
I played with WinForms (Ahem...) and WCF (cool!), and WPF (Cool!!!! Both through XAML and raw C#. WPF is so easy I believe Swing just cannot compare to it), and C# 4.0.
The conclusion is that while it's easier/faster to produce a code that works in C#/Java than in C++, it's a lot harder to produce a strong, safe and robust code in C# (and even harder in Java) than in C++. Reasons abound, but it can be summarized by:
Generics are not as powerful as templates (try to write an efficient generic Parse method (from string to T), or an efficient equivalent of boost::lexical_cast in C# to understand the problem)
RAII remains unmatched (GC still can leak (yes, I had to handle that problem) and will only handle memory. Even C#'s using is not as easy and powerful because writing a correct Dispose implementations is difficult)
C# readonly and Java final are nowhere as useful as C++'s const (There's no way you can expose readonly complex data (a Tree of Nodes, for example) in C# without tremendous work, while it's a built-in feature of C++. Immutable data is an interesting solution, but not everything can be made immutable, so it's not even enough, by far).
So, C# remains an pleasant language as long as you want something that works, but a frustrating language the moment you want something that always and safely works.
Java is even more frustrating, as it has the same problems than C#, and more: Lacking the equivalent of C#'s using keyword, a very skilled colleague of mine spent too much time making sure its resources where correctly freed, whereas the equivalent in C++ would have been easy (using destructors and smart pointers).
So I guess C#/Java's productivity gain is visible for most code... until the day you need the code to be as perfect as possible. That day, you'll know pain. (you won't believe what's asked from our server and GUI apps...).
About Server-side Java and C++
I kept contact with the server teams (I worked 2 years among them, before getting back to the GUI team), at the other side of the building, and I learned something interesting.
Last years, the trend was to have the Java server apps be destined to replace the old C++ server apps, as Java has a lot of frameworks/tools, and is easy to maintain, deploy, etc. etc..
...Until the problem of low-latency reared its ugly head the last months. Then, the Java server apps, no matter the optimization attempted by our skilled Java team, simply and cleanly lost the race against the old, not really optimized C++ server.
Currently, the decision is to keep the Java servers for common use where performance while still important, is not concerned by the low-latency target, and aggressively optimize the already faster C++ server applications for low-latency and ultra-low-latency needs.
Conclusion
Nothing is as simple as expected.
Java, and even more C#, are cool languages, with extensive standard libraries and frameworks, where you can code fast, and have result very soon.
But when you need raw power, powerful and systematic optimizations, strong compiler support, powerful language features and absolute safety, Java and C# make it difficult to win the last missing but critical percents of quality you need to remain above the competition.
It's as if you needed less time and less experienced developers in C#/Java than in C++ to produce average quality code, but in the other hand, the moment you needed excellent to perfect quality code, it was suddenly easier and faster to get the results right in C++.
Of course, this is my own perception, perhaps limited to our specific needs.
But still, it is what happens today, both in the GUI teams and the server-side teams.
Of course, I'll update this post if something new happens.
Edit (2011-06-22)
"We find that in regards to performance, C++ wins out by
a large margin. However, it also required the most extensive
tuning efforts, many of which were done at a level of sophistication
that would not be available to the average programmer.
[...] The Java version was probably the simplest to implement, but the hardest to analyze for performance. Specifically the effects around garbage collection were complicated and very hard to tune."
Sources:
https://days2011.scala-lang.org/sites/days2011/files/ws3-1-Hundt.pdf
http://www.computing.co.uk/ctg/news/2076322/-winner-google-language-tests
Edit (2011-09-20)
"The going word at Facebook is that 'reasonably written C++ code just runs fast,' which underscores the enormous effort spent at optimizing PHP and Java code. Paradoxically, C++ code is more difficult to write than in other languages, but efficient code is a lot easier [to write in C++ than in other languages]."
– Herb Sutter at //build/, quoting Andrei Alexandrescu
Sources:
http://channel9.msdn.com/Events/BUILD/BUILD2011/TOOL-835T
http://video.ch9.ms/build/2011/slides/TOOL-835T_Sutter.pptx

Generally, C# and Java can be just as fast or faster because the JIT compiler -- a compiler that compiles your IL the first time it's executed -- can make optimizations that a C++ compiled program cannot because it can query the machine. It can determine if the machine is Intel or AMD; Pentium 4, Core Solo, or Core Duo; or if supports SSE4, etc.
A C++ program has to be compiled beforehand usually with mixed optimizations so that it runs decently well on all machines, but is not optimized as much as it could be for a single configuration (i.e. processor, instruction set, other hardware).
Additionally certain language features allow the compiler in C# and Java to make assumptions about your code that allows it to optimize certain parts away that just aren't safe for the C/C++ compiler to do. When you have access to pointers there's a lot of optimizations that just aren't safe.
Also Java and C# can do heap allocations more efficiently than C++ because the layer of abstraction between the garbage collector and your code allows it to do all of its heap compression at once (a fairly expensive operation).
Now I can't speak for Java on this next point, but I know that C# for example will actually remove methods and method calls when it knows the body of the method is empty. And it will use this kind of logic throughout your code.
So as you can see, there are lots of reasons why certain C# or Java implementations will be faster.
Now this all said, specific optimizations can be made in C++ that will blow away anything that you could do with C#, especially in the graphics realm and anytime you're close to the hardware. Pointers do wonders here.
So depending on what you're writing I would go with one or the other. But if you're writing something that isn't hardware dependent (driver, video game, etc), I wouldn't worry about the performance of C# (again can't speak about Java). It'll do just fine.
One the Java side, #Swati points out a good article:
https://www.ibm.com/developerworks/library/j-jtp09275

Whenever I talk managed vs. unmanaged performance, I like to point to the series Rico (and Raymond) did comparing C++ and C# versions of a Chinese/English dictionary. This google search will let you read for yourself, but I like Rico's summary.
So am I ashamed by my crushing defeat?
Hardly. The managed code got a very
good result for hardly any effort. To
defeat the managed Raymond had to:
Write his own file I/O stuff
Write his own string class
Write his own allocator
Write his own international mapping
Of course he used available lower
level libraries to do this, but that's
still a lot of work. Can you call
what's left an STL program? I don't
think so, I think he kept the
std::vector class which ultimately was
never a problem and he kept the find
function. Pretty much everything else
is gone.
So, yup, you can definately beat the
CLR. Raymond can make his program go
even faster I think.
Interestingly, the time to parse the
file as reported by both programs
internal timers is about the same --
30ms for each. The difference is in
the overhead.
For me the bottom line is that it took 6 revisions for the unmanaged version to beat the managed version that was a simple port of the original unmanaged code. If you need every last bit of performance (and have the time and expertise to get it), you'll have to go unmanaged, but for me, I'll take the order of magnitude advantage I have on the first versions over the 33% I gain if I try 6 times.

The compile for specific CPU optimizations are usually overrated. Just take a program in C++ and compile with optimization for pentium PRO and run on a pentium 4. Then recompile with optimize for pentium 4. I passed long afternoons doing it with several programs. General results?? Usually less than 2-3% performance increase. So the theoretical JIT advantages are almost none. Most differences of performance can only be observed when using scalar data processing features, something that will eventually need manual fine tunning to achieve maximum performance anyway. Optimizations of that sort are slow and costly to perform making them sometimes unsuitable for JIT anyway.
On real world and real application C++ is still usually faster than java, mainly because of lighter memory footprint that result in better cache performance.
But to use all of C++ capability you, the developer must work hard. You can achieve superior results, but you must use your brain for that. C++ is a language that decided to present you with more tools, charging the price that you must learn them to be able to use the language well.

JIT (Just In Time Compiling) can be incredibly fast because it optimizes for the target platform.
This means that it can take advantage of any compiler trick your CPU can support, regardless of what CPU the developer wrote the code on.
The basic concept of the .NET JIT works like this (heavily simplified):
Calling a method for the first time:
Your program code calls a method Foo()
The CLR looks at the type that implements Foo() and gets the metadata associated with it
From the metadata, the CLR knows what memory address the IL (Intermediate byte code) is stored in.
The CLR allocates a block of memory, and calls the JIT.
The JIT compiles the IL into native code, places it into the allocated memory, and then changes the function pointer in Foo()'s type metadata to point to this native code.
The native code is ran.
Calling a method for the second time:
Your program code calls a method Foo()
The CLR looks at the type that implements Foo() and finds the function pointer in the metadata.
The native code at this memory location is ran.
As you can see, the 2nd time around, its virtually the same process as C++, except with the advantage of real time optimizations.
That said, there are still other overhead issues that slow down a managed language, but the JIT helps a lot.

I like Orion Adrian's answer, but there is another aspect to it.
The same question was posed decades ago about assembly language vs. "human" languages like FORTRAN. And part of the answer is similar.
Yes, a C++ program is capable of being faster than C# on any given (non-trivial?) algorithm, but the program in C# will often be as fast or faster than a "naive" implementation in C++, and an optimized version in C++ will take longer to develop, and might still beat the C# version by a very small margin. So, is it really worth it?
You'll have to answer that question on a one-by-one basis.
That said, I'm a long time fan of C++, and I think it's an incredibly expressive and powerful language -- sometimes underappreciated. But in many "real life" problems (to me personally, that means "the kind I get paid to solve"), C# will get the job done sooner and safer.
The biggest penalty you pay? Many .NET and Java programs are memory hogs. I have seen .NET and Java apps take "hundreds" of megabytes of memory, when C++ programs of similar complexity barely scratch the "tens" of MBs.

I'm not sure how often you'll find that Java code will run faster than C++, even with Hotspot, but I'll take a swing at explaining how it could happen.
Think of compiled Java code as interpreted machine language for the JVM. When the Hotspot processor notices that certain pieces of the compiled code are going to be used many times, it performs an optimization on the machine code. Since hand-tuning Assembly is almost always faster than C++ compiled code, it's ok to figure that programmatically-tuned machine code isn't going to be too bad.
So, for highly repetitious code, I could see where it'd be possible for Hotspot JVM to run the Java faster than C++... until garbage collection comes into play. :)

Generally, your program's algorithm will be much more important to the speed of your application than the language. You can implement a poor algorithm in any language, including C++. With that in mind, you'll generally be able to write code the runs faster in a language that helps you implement a more efficient algorithm.
Higher-level languages do very well at this by providing easier access to many efficient pre-built data structures and encouraging practices that will help you avoid inefficient code. Of course, they can at times also make it easy to write a bunch of really slow code, too, so you still have to know your platform.
Also, C++ is catching up with "new" (note the quotes) features like the STL containers, auto pointers, etc -- see the boost library, for example. And you might occasionally find that the fastest way to accomplish some task requires a technique like pointer arithmetic that's forbidden in a higher-level language -- though they typcially allow you to call out to a library written in a language that can implement it as desired.
The main thing is to know the language you're using, it's associated API, what it can do, and what it's limitations are.

I don't know either...my Java programs are always slow. :-) I've never really noticed C# programs being particularly slow, though.

Here is another intersting benchmark, which you can try yourself on your own computer.
It compares ASM, VC++, C#, Silverlight, Java applet, Javascript, Flash (AS3)
Roozz plugin speed demo
Please note that the speed of javascript varries a lot depending on what browser is executing it. The same is true for Flash and Silverlight because these plugins run in the same process as the hosting browser. But the Roozz plugin run standard .exe files, which run in their own process, thus the speed is not influenced by the hosting browser.

You should define "perform better than..". Well, I know, you asked about speed, but its not everything that counts.
Do virtual machines perform more runtime overhead? Yes!
Do they eat more working memory? Yes!
Do they have higher startup costs (runtime initialization and JIT compiler) ? Yes!
Do they require a huge library installed? Yes!
And so on, its biased, yes ;)
With C# and Java you pay a price for what you get (faster coding, automatic memory management, big library and so on). But you have not much room to haggle about the details: take the complete package or nothing.
Even if those languages can optimize some code to execute faster than compiled code, the whole approach is (IMHO) inefficient. Imagine driving every day 5 miles to your workplace, with a truck! Its comfortable, it feels good, you are safe (extreme crumple zone) and after you step on the gas for some time, it will even be as fast as a standard car! Why don't we all have a truck to drive to work? ;)
In C++ you get what you pay for, not more, not less.
Quoting Bjarne Stroustrup: "C++ is my favorite garbage collected language because it generates so little garbage"
link text

The executable code produced from a Java or C# compiler is not interpretted -- it is compiled to native code "just in time" (JIT). So, the first time code in a Java/C# program is encountered during execution, there is some overhead as the "runtime compiler" (aka JIT compiler) turns the byte code (Java) or IL code (C#) into native machine instructions. However, the next time that code is encountered while the application is still running, the native code is executed immediately. This explains how some Java/C# programs appear to be slow initially, but then perform better the longer they run. A good example is an ASP.Net web site. The very first time the web site is accessed, it may be a bit slower as the C# code is compiled to native code by the JIT compiler. Subsequent accesses result in a much faster web site -- server and client side caching aside.

Some good answers here about the specific question you asked. I'd like to step back and look at the bigger picture.
Keep in mind that your user's perception of the speed of the software you write is affected by many other factors than just how well the codegen optimizes. Here are some examples:
Manual memory management is hard to do correctly (no leaks), and even harder to do effeciently (free memory soon after you're done with it). Using a GC is, in general, more likely to produce a program that manages memory well. Are you willing to work very hard, and delay delivering your software, in an attempt to out-do the GC?
My C# is easier to read & understand than my C++. I also have more ways to convince myself that my C# code is working correctly. That means I can optimize my algorithms with less risk of introducing bugs (and users don't like software that crashes, even if it does it quickly!)
I can create my software faster in C# than in C++. That frees up time to work on performance, and still deliver my software on time.
It's easier to write good UI in C# than C++, so I'm more likely to be able to push work to the background while UI stays responsive, or to provide progress or hearbeat UI when the program has to block for a while. This doesn't make anything faster, but it makes users happier about waiting.
Everything I said about C# is probably true for Java, I just don't have the experience to say for sure.

If you're a Java/C# programmer learning C++, you'll be tempted to keep thinking in terms of Java/C# and translate verbatim to C++ syntax. In that case, you only get the earlier mentioned benefits of native code vs. interpreted/JIT. To get the biggest performance gain in C++ vs. Java/C#, you have to learn to think in C++ and design code specifically to exploit the strengths of C++.
To paraphrase Edsger Dijkstra: [your first language] mutilates the mind beyond recovery.
To paraphrase Jeff Atwood: you can write [your first language] in any new language.

One of the most significant JIT optimizations is method inlining. Java can even inline virtual methods if it can guarantee runtime correctness. This kind of optimization usually cannot be performed by standard static compilers because it needs whole-program analysis, which is hard because of separate compilation (in contrast, JIT has all the program available to it). Method inlining improves other optimizations, giving larger code blocks to optimize.
Standard memory allocation in Java/C# is also faster, and deallocation (GC) is not much slower, but only less deterministic.

The virtual machine languages are unlikely to outperform compiled languages but they can get close enough that it doesn't matter, for (at least) the following reasons (I'm speaking for Java here since I've never done C#).
1/ The Java Runtime Environment is usually able to detect pieces of code that are run frequently and perform just-in-time (JIT) compilation of those sections so that, in future, they run at the full compiled speed.
2/ Vast portions of the Java libraries are compiled so that, when you call a library function, you're executing compiled code, not interpreted. You can see the code (in C) by downloading the OpenJDK.
3/ Unless you're doing massive calculations, much of the time your program is running, it's waiting for input from a very slow (relatively speaking) human.
4/ Since a lot of the validation of Java bytecode is done at the time of loading the class, the normal overhead of runtime checks is greatly reduced.
5/ At the worst case, performance-intensive code can be extracted to a compiled module and called from Java (see JNI) so that it runs at full speed.
In summary, the Java bytecode will never outperform native machine language, but there are ways to mitigate this. The big advantage of Java (as I see it) is the HUGE standard library and the cross-platform nature.

Orion Adrian, let me invert your post to see how unfounded your remarks are, because a lot can be said about C++ as well. And telling that Java/C# compiler optimize away empty functions does really make you sound like you are not my expert in optimization, because a) why should a real program contain empty functions, except for really bad legacy code, b) that is really not black and bleeding edge optimization.
Apart from that phrase, you ranted blatantly about pointers, but don't objects in Java and C# basically work like C++ pointers? May they not overlap? May they not be null? C (and most C++ implementations) has the restrict keyword, both have value types, C++ has reference-to-value with non-null guarantee. What do Java and C# offer?
>>>>>>>>>>
Generally, C and C++ can be just as fast or faster because the AOT compiler -- a compiler that compiles your code before deployment, once and for all, on your high memory many core build server -- can make optimizations that a C# compiled program cannot because it has a ton of time to do so. The compiler can determine if the machine is Intel or AMD; Pentium 4, Core Solo, or Core Duo; or if supports SSE4, etc, and if your compiler does not support runtime dispatch, you can solve for that yourself by deploying a handful of specialized binaries.
A C# program is commonly compiled upon running it so that it runs decently well on all machines, but is not optimized as much as it could be for a single configuration (i.e. processor, instruction set, other hardware), and it must spend some time first. Features like loop fission, loop inversion, automatic vectorization, whole program optimization, template expansion, IPO, and many more, are very hard to be solved all and completely in a way that does not annoy the end user.
Additionally certain language features allow the compiler in C++ or C to make assumptions about your code that allows it to optimize certain parts away that just aren't safe for the Java/C# compiler to do. When you don't have access to the full type id of generics or a guaranteed program flow there's a lot of optimizations that just aren't safe.
Also C++ and C do many stack allocations at once with just one register incrementation, which surely is more efficient than Javas and C# allocations as for the layer of abstraction between the garbage collector and your code.
Now I can't speak for Java on this next point, but I know that C++ compilers for example will actually remove methods and method calls when it knows the body of the method is empty, it will eliminate common subexpressions, it may try and retry to find optimal register usage, it does not enforce bounds checking, it will autovectorize loops and inner loops and will invert inner to outer, it moves conditionals out of loops, it splits and unsplits loops. It will expand std::vector into native zero overhead arrays as you'd do the C way. It will do inter procedural optimmizations. It will construct return values directly at the caller site. It will fold and propagate expressions. It will reorder data into a cache friendly manner. It will do jump threading. It lets you write compile time ray tracers with zero runtime overhead. It will make very expensive graph based optimizations. It will do strength reduction, were it replaces certain codes with syntactically totally unequal but semantically equivalent code (the old "xor foo, foo" is just the simplest, though outdated optimization of such kind). If you kindly ask it, you may omit IEEE floating point standards and enable even more optimizations like floating point operand re-ordering. After it has massaged and massacred your code, it might repeat the whole process, because often, certain optimizations lay the foundation for even certainer optimizations. It might also just retry with shuffled parameters and see how the other variant scores in its internal ranking. And it will use this kind of logic throughout your code.
So as you can see, there are lots of reasons why certain C++ or C implementations will be faster.
Now this all said, many optimizations can be made in C++ that will blow away anything that you could do with C#, especially in the number crunching, realtime and close-to-metal realm, but not exclusively there. You don't even have to touch a single pointer to come a long way.
So depending on what you're writing I would go with one or the other. But if you're writing something that isn't hardware dependent (driver, video game, etc), I wouldn't worry about the performance of C# (again can't speak about Java). It'll do just fine.
<<<<<<<<<<
Generally, certain generalized arguments might sound cool in specific posts, but don't generally sound certainly credible.
Anyways, to make peace: AOT is great, as is JIT. The only correct answer can be: It depends. And the real smart people know that you can use the best of both worlds anyways.

It would only happen if the Java interpreter is producing machine code that is actually better optimized than the machine code your compiler is generating for the C++ code you are writing, to the point where the C++ code is slower than the Java and the interpretation cost.
However, the odds of that actually happening are pretty low - unless perhaps Java has a very well-written library, and you have your own poorly written C++ library.

Actually, C# does not really run in a virtual machine like Java does. IL is compiled into assembly language, which is entirely native code and runs at the same speed as native code. You can pre-JIT an .NET application which entirely removes the JIT cost and then you are running entirely native code.
The slowdown with .NET will come not because .NET code is slower, but because it does a lot more behind the scenes to do things like garbage collect, check references, store complete stack frames, etc. This can be quite powerful and helpful when building applications, but also comes at a cost. Note that you could do all these things in a C++ program as well (much of the core .NET functionality is actually .NET code which you can view in ROTOR). However, if you hand wrote the same functionality you would probably end up with a much slower program since the .NET runtime has been optimized and finely tuned.
That said, one of the strengths of managed code is that it can be fully verifiable, ie. you can verify that the code will never access another processes's memory or do unsage things before you execute it. Microsoft has a research prototype of a fully managed operating system that has suprisingly shown that a 100% managed environment can actually perform significantly faster than any modern operating system by taking advantage of this verification to turn off security features that are no longer needed by managed programs (we are talking like 10x in some cases). SE radio has a great episode talking about this project.

In some cases, managed code can actually be faster than native code. For instance, "mark-and-sweep" garbage collection algorithms allow environments like the JRE or CLR to free large numbers of short-lived (usually) objects in a single pass, where most C/C++ heap objects are freed one-at-a-time.
From wikipedia:
For many practical purposes, allocation/deallocation-intensive algorithms implemented in garbage collected languages can actually be faster than their equivalents using manual heap allocation. A major reason for this is that the garbage collector allows the runtime system to amortize allocation and deallocation operations in a potentially advantageous fashion.
That said, I've written a lot of C# and a lot of C++, and I've run a lot of benchmarks. In my experience, C++ is a lot faster than C#, in two ways: (1) if you take some code that you've written in C#, port it to C++ the native code tends to be faster. How much faster? Well, it varies a whole lot, but it's not uncommon to see a 100% speed improvement. (2) In some cases, garbage collection can massively slow down a managed application. The .NET CLR does a terrible job with large heaps (say, > 2GB), and can end up spending a lot of time in GC--even in applications that have few--or even no--objects of intermediate life spans.
Of course, in most cases that I've encounted, managed languages are fast enough, by a long shot, and the maintenance and coding tradeoff for the extra performance of C++ is simply not a good one.

Here's an interesting benchmark
http://zi.fi/shootout/

Actually Sun's HotSpot JVM uses "mixed-mode" execution. It interprets the method's bytecode until it determines (usually through a counter of some sort) that a particular block of code (method, loop, try-catch block, etc.) is going to be executed a lot, then it JIT compiles it. The time required to JIT compile a method often takes longer than if the method were to be interpreted if it is a seldom run method. Performance is usually higher for "mixed-mode" because the JVM does not waste time JITing code that is rarely, if ever, run.
C# and .NET do not do this. .NET JITs everything which, often times, wastes time.

Go read about HP Labs' Dynamo, an interpreter for PA-8000 that runs on PA-8000, and often runs programs faster than they do natively. Then it won't seem at all surprising!
Don't think of it as an "intermediate step" -- running a program involves lots of other steps already, in any language.
It often comes down to:
programs have hot-spots, so even if you're slower running 95% of the body of code you have to run, you can still be performance-competitive if you're faster at the hot 5%
a HLL knows more about your intent than a LLL like C/C++, and so can generate more optimized code (OCaml has even more, and in practice is often even faster)
a JIT compiler has a lot of information that a static compiler doesn't (like, the actual data you happen to have this time)
a JIT compiler can do optimizations at run-time that traditional linkers aren't really allowed to do (like reordering branches so the common case is flat, or inlining library calls)
All in all, C/C++ are pretty lousy languages for performance: there's relatively little information about your data types, no information about your data, and no dynamic runtime to allow much in the way of run-time optimization.

You might get short bursts when Java or CLR is faster than C++, but overall the performance is worse for the life of the application:
see www.codeproject.com/KB/dotnet/RuntimePerformance.aspx for some results for that.

Here is answer from Cliff Click: http://www.azulsystems.com/blog/cliff/2009-09-06-java-vs-c-performanceagain

My understanding is that C/C++ produces native code to run on a particular machine architecture. Conversely, languages like Java and C# run on top of a virtual machine which abstracts away the native architecture. Logically it would seem impossible for Java or C# to match the speed of C++ because of this intermediate step, however I've been told that the latest compilers ("hot spot") can attain this speed or even exceed it.
That is illogical. The use of an intermediate representation does not inherently degrade performance. For example, llvm-gcc compiles C and C++ via LLVM IR (which is a virtual infinite-register machine) to native code and it achieves excellent performance (often beating GCC).
Perhaps this is more of a compiler question than a language question, but can anyone explain in plain English how it is possible for one of these virtual machine languages to perform better than a native language?
Here are some examples:
Virtual machines with JIT compilation facilitate run-time code generation (e.g. System.Reflection.Emit on .NET) so you can compile generated code on-the-fly in languages like C# and F# but must resort to writing a comparatively-slow interpreter in C or C++. For example, to implement regular expressions.
Parts of the virtual machine (e.g. the write barrier and allocator) are often written in hand-coded assembler because C and C++ do not generate fast enough code. If a program stresses these parts of a system then it could conceivably outperform anything that can be written in C or C++.
Dynamic linking of native code requires conformance to an ABI that can impede performance and obviates whole-program optimization whereas linking is typically deferred on VMs and can benefit from whole-program optimizations (like .NET's reified generics).
I'd also like to address some issues with paercebal's highly-upvoted answer above (because someone keeps deleting my comments on his answer) that presents a counter-productively polarized view:
The code processing will be done at compilation time...
Hence template metaprogramming only works if the program is available at compile time which is often not the case, e.g. it is impossible to write a competitively performant regular expression library in vanilla C++ because it is incapable of run-time code generation (an important aspect of metaprogramming).
...playing with types is done at compile time...the equivalent in Java or C# is painful at best to write, and will always be slower and resolved at runtime even when the types are known at compile time.
In C#, that is only true of reference types and is not true for value types.
No matter the JIT optimization, nothing will go has fast as direct pointer access to memory...if you have contiguous data in memory, accessing it through C++ pointers (i.e. C pointers... Let's give Caesar its due) will goes times faster than in Java/C#.
People have observed Java beating C++ on the SOR test from the SciMark2 benchmark precisely because pointers impede aliasing-related optimizations.
Also worth noting that .NET does type specialization of generics across dynamically-linked libraries after linking whereas C++ cannot because templates must be resolved before linking. And obviously the big advantage generics have over templates is comprehensible error messages.

On top of what some others have said, from my understanding .NET and Java are better at memory allocation. E.g. they can compact memory as it gets fragmented while C++ cannot (natively, but it can if you're using a clever garbage collector).

For anything needing lots of speed, the JVM just calls a C++ implementation, so it's a question more of how good their libs are than how good the JVM is for most OS related things.
Garbage collection cuts your memory in half, but using some of the fancier STL and Boost features will have the same effect but with many times the bug potential.
If you are just using C++ libraries and lots of its high level features in a large project with many classes you will probably wind up slower than using a JVM. Except much more error prone.
However, the benefit of C++ is that it allows you to optimize yourself, otherwise you are stuck with what the compiler/jvm does. If you make your own containers, write your own memory management that's aligned, use SIMD, and drop to assembly here and there, you can speed up at least 2x-4x times over what most C++ compilers will do on their own. For some operations, 16x-32x. That's using the same algorithms, if you use better algorithms and parallelize, increases can be dramatic, sometimes thousands of times faster that commonly used methods.

I look at it from a few different points.
Given infinite time and resources, will managed or unmanaged code be faster? Clearly, the answer is that unmanaged code can always at least tie managed code in this aspect - as in the worst case, you'd just hard-code the managed code solution.
If you take a program in one language, and directly translate it to another, how much worse will it perform? Probably a lot, for any two languages. Most languages require different optimizations and have different gotchas. Micro-performance is often a lot about knowing these details.
Given finite time and resources, which of two languages will produce a better result? This is the most interesting question, as while a managed language may produce slightly slower code (given a program reasonably written for that language), that version will likely be done sooner, allowing for more time spent on optimization.

A very short answer: Given a fixed budget you will achieve better performing java application than a C++ application (ROI considerations) In addition Java platform has more decent profilers, that will help you pinpoint your hotspots more quickly

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.