Simulated Annealing in C#

I am using simulated annealing to solve a cryptanalysis problem and I've hit a brick wall. I cannot for the life of me get my probability function to operate correctly: it either takes a worse solution too often (so I bounce around between scores of 0.03 and 0.2) or it doesn't take one often enough (so I get stuck at 0.35). I've looked around the internet, but I only come across examples where the problem involves finding the MINIMUM value. My problem needs to find the MAXIMUM value: the worst score is 0, the best is 1.
I need advice on Temperature and what probability function I should use.

The Simulated Annealing article on Wikipedia provides some general guidance on how SA temperatures should be initialized and decreased. Efficient selection of these parameters is normally very problem specific and may need to be identified through tedious trial-and-error.
Normally, optimization algorithms search for the minimum of the objective function. If you want to use such an algorithm as-is on your maximization problem, ask the optimizer to minimize the negation of your objective function. For example, let's say that the objective function for which you want to find the maximum is f(x)=score. You should then request the optimizer to minimize -f(x), i.e. -score (or, as you indicate in the comment above, 1-score).
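As a rough illustration of how this looks for a maximization problem, here is a minimal sketch of a Metropolis-style acceptance rule written directly in terms of "higher score is better". The class and method names (Annealing, Accept, Cool) and the geometric cooling factor are assumptions made for this example; they do not come from the question or from any library.

```csharp
using System;

public static class Annealing
{
    // Metropolis-style acceptance rule for MAXIMIZATION.
    // A better or equal candidate is always accepted; a worse one is
    // accepted with probability exp((candidate - current) / temperature).
    public static bool Accept(double currentScore, double candidateScore,
                              double temperature, Random rng)
    {
        if (candidateScore >= currentScore) return true;
        if (temperature <= 0.0) return false;
        double p = Math.Exp((candidateScore - currentScore) / temperature);
        return rng.NextDouble() < p;
    }

    // Geometric cooling. The default alpha is only a hypothetical starting
    // value; a good schedule is problem specific.
    public static double Cool(double temperature, double alpha = 0.995)
    {
        return temperature * alpha;
    }
}
```

Because the scores lie between 0 and 1, score differences are small, so the temperature probably needs to be on the same scale. For example, a drop of 0.05 at temperature 0.05 is accepted with probability exp(-1), roughly 0.37, whereas at temperature 10 almost every worse move would be accepted.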
There are lots of simulated annealing and other global optimization algorithms available online, see for example this list on the Decision Tree for Optimization Software. Unfortunately these codes are normally not written in C#, but if the codes are written in Fortran or C it is normally fairly easy to interface with these codes via P/Invoke.
If you do not require that the optimizer necessarily find the global optimum, there are also some derivative-free optimizers listed here. At least one of these codes is available in a C# version, namely BOBYQA (in fact, this algorithm has been adapted to C# by me :-).

Related

Automatic parameter tuning

I've got an audio processing app that takes an input audio file, processes it, and spits out a modified output audio file. This audio processing app has 10-15 parameters that affect how it processes the audio, and thus affect the content of the output audio file (it might have, say, a different frequency response, be louder, quieter, etc.). All these parameters have constrained ranges (x0 must be < 1 and > -1 for example).
The output audio file is evaluated by a tool that gives it a score. This tool knows what the "ideal" output should sound like, and scores the output file accordingly. A score of 1.0 means the output is ideal, i.e. the input file was processed with the best possible parameter set. A score of 0 means the output is completely wrong.
So with 10-15 parameters with their valid ranges, the combinations are endless! I'd be sitting here manually tweaking these parameters forever until I got the best solution. I've checked out some LP/MIP solvers (CBC, MS Solver Foundation, GLPK) but these use a mathematical equation as an objective function... you don't "plug in" an external evaluation function as far as I can see.
Is an LP/MIP solver the right tool to aid in the parameter tuning? Any ideas?
Thanks,
akevan

You could use a general heuristic like simulated annealing or genetic algorithms. Your evaluation process would be the fitness/objective function.
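To make the idea of "plugging in" an external evaluation concrete, here is a minimal sketch of a stochastic hill climber over box-constrained parameters. It is a simpler relative of simulated annealing, not the algorithm itself. The evaluate delegate stands in for the external scoring tool, and ParameterTuner, Tune and the 10% step size are hypothetical choices made for this example.

```csharp
using System;

public static class ParameterTuner
{
    // Stochastic hill climbing over box-constrained parameters.
    // 'evaluate' stands in for the external scoring tool (higher is better).
    public static double[] Tune(Func<double[], double> evaluate,
                                double[] lower, double[] upper,
                                int iterations, Random rng)
    {
        int n = lower.Length;
        var best = new double[n];
        for (int i = 0; i < n; i++)
            best[i] = lower[i] + rng.NextDouble() * (upper[i] - lower[i]);
        double bestScore = evaluate(best);

        for (int it = 0; it < iterations; it++)
        {
            var candidate = (double[])best.Clone();
            int k = rng.Next(n);
            // Perturb one parameter by up to 10% of its range (arbitrary step size).
            double step = (rng.NextDouble() * 2.0 - 1.0) * 0.1 * (upper[k] - lower[k]);
            // Clamp so the parameter stays inside its valid range.
            candidate[k] = Math.Min(upper[k], Math.Max(lower[k], candidate[k] + step));

            double score = evaluate(candidate);
            if (score > bestScore)
            {
                best = candidate;
                bestScore = score;
            }
        }
        return best;
    }
}
```

In the audio scenario, evaluate would run the processing app with the candidate parameters and return the score reported by the evaluation tool. A plain hill climber like this can get stuck in a local optimum, which is where accepting occasional worse moves (simulated annealing) or keeping a population (genetic algorithms) comes in.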

You could use the SPOT package (R programming language). It allows you to find (near-)optimal parameter settings using significantly fewer runs than brute force. You can use any programming language for your fitness function code; SPOT has an adapter for that, and offers an automatic mode with a default setup (you don't have to worry about the design types and prediction models). It has a steep learning curve, but once you have understood the basics, it is a powerful tool. Here is a quick guide; chapter 2.6 offers a concrete example. The SPOT package comes with several examples.

If you had the objective function, then yes LP would be the ideal approach (and would give the ideal answer); the solution would be purely analytic. But in the absence of the function it seems you've correctly understood the problem becomes an integer programming problem. I have less knowledge of integer programming, but I believe that too assumes an objective function to solve. Even with the function, integer programs are NP-hard.
So it seems you would need to use brute force to detect a local maximum, and then tune it. I realize that is exactly what you didn't want to do, but that is what comes to mind.

Activation Function, Initializer function, etc, effects on neural networks for face detection

There are various activation functions: sigmoid, tanh, etc. And there are also a few initializer functions: Nguyen and Widrow, random, normalized, constant, zero, etc. So do these have much effect on the outcome of a neural network specialising in face detection? Right now I'm using the Tanh activation function and just randomising all the weights from -0.5 to 0.5. I have no idea if this is the best approach though, and with 4 hours to train the network each time, I'd rather ask on here than experiment!

Take a few hundred data cases and look at the mean and standard deviation of the activation values of your units. You want to be out of the saturation regime of the tanh sigmoid.
I doubt different reasonable initialization schemes will have much effect on the quality of your solutions. It is probably good enough to just initialize the weights to be uniform on the interval [-1/sqrt(N), +1/sqrt(N)], where N is the number of incoming connections.
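A minimal sketch of that initialization rule, assuming weights are stored as a plain double array per unit; WeightInit and UniformFanIn are names invented for this example.

```csharp
using System;

public static class WeightInit
{
    // Draws weights uniformly from [-1/sqrt(fanIn), +1/sqrt(fanIn)),
    // where fanIn is the number of incoming connections to the unit.
    public static double[] UniformFanIn(int fanIn, Random rng)
    {
        if (fanIn <= 0) throw new ArgumentOutOfRangeException(nameof(fanIn));
        double limit = 1.0 / Math.Sqrt(fanIn);
        var weights = new double[fanIn];
        for (int i = 0; i < fanIn; i++)
            weights[i] = (rng.NextDouble() * 2.0 - 1.0) * limit;
        return weights;
    }
}
```

For a unit with 100 incoming connections, this keeps every initial weight within 0.1 of zero, which is considerably narrower than the fixed [-0.5, 0.5] range mentioned in the question.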
That being said, what DOES tend to make a big difference is pretraining the network weights, either as an RBM or as an autoencoder. This can be helpful even for single hidden layer neural nets, although it is much more essential for deeper nets. You don't mention the architecture you are using; that information would allow a more helpful answer to your question.
There is even a new initialization rule that seems to work well described in this paper:
http://www.iro.umontreal.ca/~lisa/publications2/index.php/publications/show/447
The paper also mentions some of the symptoms of bad initialization that I was alluding to above that you can easily check for.
To summarize, Uniform on [-1/sqrt(N), +1/sqrt(N)] isn't too bad nor is the one mentioned in the paper I link to. Don't worry about it too much if you use one of those. What is very important is pretraining the weights as an autoencoder (or Restricted Boltzmann Machine), which you should look in to even if you only have a single hidden layer.
If you want to pre-train the weights as an RBM, you could switch to logistic sigmoids and even initialize the weights from a small standard deviation Gaussian without running in to trouble.

Why use flags+bitmasks rather than a series of booleans?

Given a case where I have an object that may be in one or more true/false states, I've always been a little fuzzy on why programmers frequently use flags+bitmasks instead of just using several boolean values.
It's all over the .NET framework. Not sure if this is the best example, but the .NET framework has the following:
public enum AnchorStyles
{
    None = 0,
    Top = 1,
    Bottom = 2,
    Left = 4,
    Right = 8
}
So given an anchor style, we can use bitmasks to figure out which of the states are selected. However, it seems like you could accomplish the same thing with an AnchorStyle class/struct with bool properties defined for each possible value, or an array of individual enum values.
Of course the main reason for my question is that I'm wondering if I should follow a similar practice with my own code.
So, why use this approach?
Less memory consumption? (it doesn't seem like it would consume less than an array/struct of bools)
Better stack/heap performance than a struct or array?
Faster compare operations? Faster value addition/removal?
More convenient for the developer who wrote it?

It was traditionally a way of reducing memory usage. So, yes, it's quite obsolete in C# :-)
As a programming technique, it may be obsolete in today's systems, and you'd be quite alright to use an array of bools, but...
It is fast to compare values stored as a bitmask. Use the AND and OR logic operators and compare the resulting 2 ints.
It uses considerably less memory. Putting all 4 of your example values in a bitmask would use half a byte. An array of bools would most likely use a few bytes for the array object plus one byte for each bool (a bool[] in .NET stores one byte per element). If you have to store a million values, you'll see exactly why a bitmask version is superior.
It is easier to manage, you only have to deal with a single integer value, whereas an array of bools would store quite differently in, say a database.
And, because of the memory layout, much faster in every aspect than an array. It's nearly as fast as using a single 32-bit integer. We all know that is as fast as you can get for operations on data.

It is easy to set multiple flags in any order.
It is easy to save a series of bits such as 0101011 to the database and read it back.

Among other things, it's easier to add new bit meanings to a bitfield than to add new boolean values to a class. It's also easier to copy a bitfield from one instance to another than a series of booleans.

It can also make Methods clearer. Imagine a Method with 10 bools vs. 1 Bitmask.

Actually, it can have better performance, mainly if your enum derives from a byte.
In that extreme case, each enum value would be represented by a single byte holding up to 8 flags, i.e. any of 256 combinations. Storing the same 8 flags as separate booleans would take 8 bytes.
But, even then, I don't think that is the real reason. The reason I prefer those is the power C# gives me to handle those enums. I can add several values with a single expression. I can remove them also. I can even compare several values at once with a single expression using the enum. With booleans, code can become, let's say, more verbose.
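The single-expression operations mentioned above can be sketched as follows. Anchor is a hypothetical byte-backed enum that mirrors the AnchorStyles values from the question, and AnchorOps is a helper class invented for this example.

```csharp
using System;

[Flags]
public enum Anchor : byte   // hypothetical enum mirroring the question's AnchorStyles
{
    None = 0,
    Top = 1,
    Bottom = 2,
    Left = 4,
    Right = 8
}

public static class AnchorOps
{
    // Add several flags at once with a bitwise OR.
    public static Anchor Add(Anchor value, Anchor flags) => value | flags;

    // Remove several flags at once by AND-ing with the inverted mask.
    public static Anchor Remove(Anchor value, Anchor flags) => value & ~flags;

    // True only if every flag in 'flags' is set.
    public static bool HasAll(Anchor value, Anchor flags) => (value & flags) == flags;

    // True if at least one flag in 'flags' is set.
    public static bool HasAny(Anchor value, Anchor flags) => (value & flags) != 0;
}
```

For instance, AnchorOps.Add(Anchor.None, Anchor.Top | Anchor.Left) sets two flags in one expression, and AnchorOps.HasAll(value, Anchor.Top | Anchor.Left) checks both at once. The equivalent code with separate bool fields needs one assignment or comparison per field.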

From a domain model perspective, it just models reality better in some situations. If you have three booleans like AccountIsInDefault and IsPreferredCustomer and RequiresSalesTaxState, then it doesn't make sense to add them to a single Flags-decorated enumeration, because they are not three distinct values for the same domain model element.
But if you have a set of booleans like:
[Flags] enum AccountStatus {AccountIsInDefault=1,
    AccountOverdue=2, AccountFrozen=4}
or
[Flags] enum CargoState {ExceedsWeightLimit=1,
    ContainsDangerousCargo=2, IsFlammableCargo=4,
    ContainsRadioactive=8}
Then it is useful to be able to store the total state of the Account, (or the cargo) in ONE variable... that represents ONE Domain Element whose value can represent any possible combination of states.

Raymond Chen has a blog post on this subject.
"Sure, bitfields save data memory, but you have to balance it against the cost in code size, debuggability, and reduced multithreading."
As others have said, its time is largely past. It's tempting to still do it, cause bit fiddling is fun and cool-looking, but it's no longer more efficient, it has serious drawbacks in terms of maintenance, it doesn't play nicely with databases, and unless you're working in an embedded world, you have enough memory.

I would suggest never using enum flags unless you are dealing with some pretty serious memory limitations (not likely). You should always write code optimized for maintenance.
Having several boolean properties makes it easier to read and understand the code, change the values, and provide Intellisense comments not to mention reduce the likelihood of bugs. If necessary, you can always use an enum flag field internally, just make sure you expose the setting/getting of the values with boolean properties.
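One way this suggestion might look in practice is sketched below: the flags enum stays private, and callers only ever see boolean properties. CargoInfo and its members are hypothetical names chosen for this example.

```csharp
using System;

public class CargoInfo
{
    // The flags enum is an internal storage detail and is never exposed.
    [Flags]
    private enum State { None = 0, Dangerous = 1, Flammable = 2 }

    private State _state = State.None;

    public bool IsDangerous
    {
        get { return (_state & State.Dangerous) != 0; }
        set { Set(State.Dangerous, value); }
    }

    public bool IsFlammable
    {
        get { return (_state & State.Flammable) != 0; }
        set { Set(State.Flammable, value); }
    }

    // Sets or clears a single flag without touching the others.
    private void Set(State flag, bool on)
    {
        _state = on ? (_state | flag) : (_state & ~flag);
    }
}
```

Callers write cargo.IsFlammable = true and never see a bitmask, while the object still stores all of its states in a single field.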

Space efficiency - 1 bit per flag
Time efficiency - bit comparisons are handled quickly by hardware.
Language independence - where the data may be handled by a number of different programs you don't need to worry about the implementation of booleans across different languages/platforms.
Most of the time, these are not worth the tradeoff in terms of maintenance. However, there are times when it is useful:
Network protocols - there will be a big saving in reduced size of messages
Legacy software - once I had to add some information for tracing into some legacy software.
Cost to modify the header: millions of dollars and years of effort.
Cost to shoehorn the information into 2 bytes in the header that weren't being used: 0.
Of course, there was the additional cost in the code that accessed and manipulated this information, but these were done by functions anyways so once you had the accessors defined it was no less maintainable than using Booleans.

I have seen answers mentioning time efficiency and compatibility. Those are the reasons, but I do not think anyone has explained why they are still sometimes necessary in times like ours. From all the answers, and from chatting with other engineers, I have seen bitfields pictured as a quirky, old-fashioned way of doing things that should just die because the new ways of doing things are better.
Yes, in very rare cases you may want to do it the "old way" for performance's sake, for example in the classic loop that runs a million times. But I say that is the wrong way of looking at things.
While it is true that you should NOT care at all and should use whatever the C# language throws at you as the new right-way™ to do things (enforced by some fancy AI code analysis slapping you whenever you do not meet its code style), you should understand deeply that low-level strategies are not there by accident. Even more, in many cases they are the only way to solve things when you have no help from a fancy framework. Your OS, your drivers, and even .NET itself (especially the garbage collector) are built using bitfields and transactional instructions. Your CPU instruction set is itself a very complex bitfield, so JIT compilers encode their output using complex bit processing and a few hardcoded bitfields so that the CPU can execute it correctly.
When we talk about performance, these things have a much larger impact than people imagine, today more than ever, especially once you start considering multicore systems.
When multicore systems started to become more common, all CPU manufacturers started to mitigate the issues of SMP by adding dedicated transactional memory access instructions. While these were made specifically to mitigate the near-impossible task of making multiple CPUs cooperate at kernel level without a huge drop in performance, they actually provide additional benefits, such as an OS-independent way to boost the low-level parts of most programs. Basically, your program can use CPU-assisted instructions to perform changes to integer-sized memory locations, that is, a read-modify-write where the "modify" part can be anything you want; the most common patterns are a combination of set/clear/increment.
Usually the CPU simply monitors whether any other CPU is accessing the same address, and if contention happens it usually stops the operation from being committed to memory and signals the event to the application within the same instruction. This seems a trivial task, but superscalar CPUs (each core has multiple ALUs allowing instruction parallelism), multi-level caches (some private to each core, some shared across a cluster of CPUs) and Non-Uniform Memory Access systems (check the Threadripper CPUs) make things difficult to keep coherent. Luckily, the smartest people in the world work to boost performance and keep all these things happening correctly. Today's CPUs have a large number of transistors dedicated to this task so that caches and our read-modify-write transactions work correctly.
C# allows you to use the most common transactional memory access patterns through the Interlocked class. It offers only a limited set (for example, a very useful clear-mask-and-increment operation is missing), but you can always use CompareExchange instead, which gets very close to the same performance.
To achieve the same result using an array of booleans you must use some sort of lock, and under contention a lock performs several orders of magnitude worse than the atomic instructions.
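As a sketch of the Interlocked + bitfield pattern in plain C#, the class below hands out up to 32 "channels" from a single int using a CompareExchange retry loop, with no lock. ChannelAllocator and its members are hypothetical names; this only illustrates the pattern and makes no claim about how any driver or the .NET runtime implements it.

```csharp
using System;
using System.Threading;

public class ChannelAllocator
{
    private int _inUse;               // bit i set => channel i is taken
    private readonly int _count;

    public ChannelAllocator(int count)
    {
        if (count < 1 || count > 32) throw new ArgumentOutOfRangeException(nameof(count));
        _count = count;
    }

    // Returns the index of a newly claimed channel, or -1 if all are taken.
    public int TryAllocate()
    {
        while (true)
        {
            int snapshot = Volatile.Read(ref _inUse);
            int free = -1;
            for (int i = 0; i < _count; i++)
            {
                if ((snapshot & (1 << i)) == 0) { free = i; break; }
            }
            if (free < 0) return -1;

            int updated = snapshot | (1 << free);
            // Publish only if nobody changed the mask in the meantime; otherwise retry.
            if (Interlocked.CompareExchange(ref _inUse, updated, snapshot) == snapshot)
                return free;
        }
    }

    // Clears the channel's bit, retrying if another thread modified the mask.
    public void Release(int channel)
    {
        int mask = 1 << channel;
        int snapshot;
        do
        {
            snapshot = Volatile.Read(ref _inUse);
        } while (Interlocked.CompareExchange(ref _inUse, snapshot & ~mask, snapshot) != snapshot);
    }
}
```

The whole allocation state lives in one integer, so a single successful CompareExchange both finds and claims a channel. With a bool[] the scan and the update would have to be wrapped in a lock to stay consistent.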
Here are some examples of highly appreciated HW-assisted transactional access using bitfields, which would require a completely different strategy without them. Of course, these are outside the scope of C#:
Assume a DMA peripheral that has a set of DMA channels, let's say 20 (but any number up to the number of bits in the interlocked integer will do). When any peripheral's interrupt, which might execute at any time (including inside your beloved OS, and on any core of your latest-gen 32-core CPU), wants a DMA channel, you want to allocate one (assign it to the peripheral) and use it. A bitfield covers all those requirements and needs just a dozen instructions to perform the allocation, which can be inlined in the requesting code. Basically you cannot go faster than this, and your code is just a few functions; we delegate the hard part of the problem to the HW. Constraint: bitfields only.
Assume a peripheral that requires some working space in normal RAM to perform its duty. For example, assume a high-speed I/O peripheral that uses scatter-gather DMA: in short, it uses fixed-size blocks of RAM, each populated with the description of the next transfer (by the way, the descriptor is itself made of bitfields) and chained to one another to create a FIFO queue of transfers in RAM. The application prepares the descriptors first and then chains them to the tail of the current transfers without ever pausing the controller (not even disabling interrupts). The allocation/deallocation of such descriptors can be done using a bitfield and transactional instructions, so everything still works without conflicts when it is shared between different CPUs and between the driver interrupt and the kernel. One use case: the kernel allocates descriptors atomically, without stopping or disabling interrupts and without additional locks (the bitfield itself is the lock), and the interrupt deallocates them when the transfer completes.
Most old strategies were to preallocate the resources and force the application to free them after use.
If you ever need multitasking on steroids, C# allows you to use Threads + Interlocked, but more recently C# introduced lightweight Tasks. Guess how they are made? Transactional memory access using the Interlocked class. So you likely do not need to reinvent the wheel: all of the low-level parts are already covered and well engineered.
So the idea is: let smart people (not me, I am a common developer like you) solve the hard part for you, and just enjoy a general-purpose computing platform like C#. If you still see some remnants of these parts, it is because someone may still need to interface with worlds outside .NET and access some driver or system call, which for example requires you to know how to build a descriptor and put each bit in the right place. Do not be mad at those people; they made our jobs possible.
In short: Interlocked + bitfields. Incredibly powerful, don't use it.

It is for speed and efficiency. Essentially all you are working with is a single int.
if ((flags & AnchorStyles.Top) == AnchorStyles.Top)
{
    // Do stuff
}

factory floor simulation

I would like to create a simulation of a factory floor, and I am looking for ideas on how to do this. My thoughts so far are:
• A factory is made up of a bunch of processes; some of these processes are in series and some are in parallel. Each process would communicate with its upstream, downstream and parallel neighbors to let them know of its throughput
• Each process would have its own basic attributes like maximum throughput and cost of maintenance as a result of throughput
Obviously I have not fully thought this out, but I was hoping somebody might be able to give me a few ideas or perhaps a link to an online resource
update:
This project is only for my own entertainment, and perhaps to learn a little bit along the way. I am not employed as a programmer; programming is just a hobby for me. I have decided to write it in C#.

Simulating an entire factory accurately is a big job.
Firstly you need to figure out: why are you making the simulation? Who is it for? What value will it give them? What parts of the simulation are interesting? How accurate does it need to be? What parts of the process don't need to be simulated accurately?
To figure out the answers to these questions, you will need to talk to whoever it is that wants the simulation written.
Once you have figured out what to simulate, then you need to figure out how to simulate it. You need some models and some parameters for those models. You can maybe get some actual figures from real production and try to derive models from the figures. The models could be a simple linear relationship between an input and an output, a more complex relationship, and perhaps even a stochastic (random) effect. If you don't have access to real data, then you'll have to make guesses in your model, but this will never be as good so try to get real data wherever possible.
You might also want to consider the probabilities of components breaking down, and what effect that might have. What about the workers going on strike? Unavailability of raw materials? Wear and tear on the machinery causing progressively lower output over time? Again, you might not want to consider these details; it depends on what the customer wants.
If your simulation involves random events, you might want to run it many times and get an average outcome, for example using a Monte Carlo simulation.
To give a better answer, we need to know more about what you need to simulate and what you want to achieve.

Since your customer is yourself, you'll need to decide the answer to all of the questions that Mark Byers asked. However, I'll give you some suggestions and hopefully they'll give you a start.
Let's assume your factory takes a few different parts and assembles them into just one finished product. A flowchart of the assembly process might look like this:
Factory Flowchart http://img62.imageshack.us/img62/863/factoryflowchart.jpg
For the first diamond, where widgets A and B are assembled, assume it takes on average 30 seconds to complete this step. We'll assume the actual time it takes the two widgets to be assembled is distributed normally, with mean 30 s and variance 5 s². For the second diamond, assume it also takes on average 30 seconds, but most of the time it doesn't take nearly that long, and other times it takes a lot longer. This is well approximated by an exponential distribution with a mean of 30 s, i.e. a rate parameter of 1/30 per second; the rate is often represented in equations by a lambda.
For the first process, compute the time to assemble widgets A and B as:
timeA = randn(mean, sqrt(variance)); // randn is assumed: a function returning a normally
                                     // distributed random number with mean and sigma as
                                     // inputs (not built into C#)
For the second process, compute the time to add widget C to the assembly as:
timeB = -log(1 - rand())/lambda; // Inverse-transform sampling, assuming rand() returns a
                                 // uniformly distributed random number in [0, 1)
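Since System.Random only provides uniform random numbers, the two process times above could be drawn as in the following sketch. Sampling, NextNormal and NextExponential are names invented for this example; the normal sampler uses the Box-Muller transform and the exponential sampler uses inverse-transform sampling, parameterized by the mean (the rate is 1 / mean).

```csharp
using System;

public static class Sampling
{
    // Normal(mean, sigma) via the Box-Muller transform.
    public static double NextNormal(Random rng, double mean, double sigma)
    {
        double u1 = 1.0 - rng.NextDouble();   // in (0, 1], avoids Log(0)
        double u2 = rng.NextDouble();
        double z = Math.Sqrt(-2.0 * Math.Log(u1)) * Math.Cos(2.0 * Math.PI * u2);
        return mean + sigma * z;
    }

    // Exponential with the given mean (rate = 1 / mean) via inverse transform.
    public static double NextExponential(Random rng, double mean)
    {
        double u = 1.0 - rng.NextDouble();    // in (0, 1]
        return -mean * Math.Log(u);
    }
}
```

With these helpers, the first step would be Sampling.NextNormal(rng, 30.0, Math.Sqrt(5.0)) and the second Sampling.NextExponential(rng, 30.0). A normal sample can in principle be negative, so a real simulation may want to clamp or redraw such values.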
Now your total assembly time for each iGadget will be timeA + timeB + waitingTime. At each assembly point, store a queue of widgets waiting to be assembled. If the second assembly point is a bottleneck, its queue will fill up. You can enforce a maximum size for its queue, and hold things further upstream when that max size is reached. If an item is in a queue, its assembly time is increased by the time needed for all of the iGadgets ahead of it in the assembly line. I'll leave it up to you to figure out how to code that up, and you can run lots of trials to see what the total assembly time will be, on average. What does the resultant distribution look like?
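One possible starting point for the waiting-time bookkeeping is sketched below, under simplifying assumptions that are not in the answer: items enter the first station back to back, each station handles one item at a time, and the queue between the stations is unbounded. AssemblyLine and CompletionTimes are hypothetical names.

```csharp
using System;

public static class AssemblyLine
{
    // Returns the time at which each item leaves the second station.
    // stage1Times[i] and stage2Times[i] are the processing times of item i.
    public static double[] CompletionTimes(double[] stage1Times, double[] stage2Times)
    {
        int n = stage1Times.Length;
        var done = new double[n];
        double stage1Free = 0.0;   // time at which station 1 finishes its current item
        double stage2Free = 0.0;   // time at which station 2 finishes its current item
        for (int i = 0; i < n; i++)
        {
            stage1Free += stage1Times[i];                     // station 1 never idles
            double start2 = Math.Max(stage1Free, stage2Free); // wait if station 2 is busy
            stage2Free = start2 + stage2Times[i];
            done[i] = stage2Free;
        }
        return done;
    }
}
```

Feeding in randomly drawn processing times and repeating the run many times gives the distribution of total assembly time. Enforcing a maximum queue size would additionally require blocking station 1 when the queue is full, which this sketch leaves out.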
Ways to "spice this up":
Require 3 B widgets for every A widget. Play around with inventory. Replenish inventory at random intervals.
Add a quality assurance check (exponential distribution is good to use here), and reject some of the finished iGadgets. I suggest using a low rejection rate.
Try using different probability distributions than those I've suggested. See how they affect your simulation. Always try to figure out how the input parameters to the probability distributions would map into real world values.
You can do a lot with this simple simulation. The next step would be to generalize your code so that you can have an arbitrary number of widgets and assembly steps. This is not quite so easy. There is an entire field of applied math called operations research that is dedicated to this type of simulation and analysis.

What you're describing is a classical problem addressed by discrete event simulation. A variety of both general purpose and special purpose simulation languages have been developed to model these kinds of problems. While I wouldn't recommend programming anything from scratch for a "real" problem, it may be a good exercise to write your own code for a small queueing problem so you can understand event scheduling, random number generation, keeping track of calendars, etc. Once you've done that, a general purpose simulation language will do all that stuff for you so you can concentrate on the big picture.
A good reference is Law & Kelton. ARENA is a standard package. It is widely used and, IMHO, is very comprehensive for these kinds of simulations. The ARENA book is also a decent book on simulation and it comes with the software that can be applied to small problems. To model bigger problems, you'll need to get a license. You should be able to download a trial version of ARENA here.

It may be more than what you are looking for, but Visual Components is a good industrial simulation tool.
To be clear I do not work for them nor does the company I work for currently use them, but we have looked at them.

Automod is the way to go.
http://www.appliedmaterials.com/products/automod_2.html
There is a lot to learn, and it won't be cheap.
ASI's Automod has been in the factory simulation business for about 30 years. It is now owned by Applied Materials. The big players who work with material handling in a warehouse use Automod because it is the proven leader.

What is faster- Java or C# (or good old C)? [closed]

I'm currently deciding on a platform to build a scientific computational product on, and am deciding on either C#, Java, or plain C with Intel compiler on Core2 Quad CPU's. It's mostly integer arithmetic.
My benchmarks so far show Java and C are about on par with each other, and .NET/C# trails by about 5%- however a number of my coworkers are claiming that .NET with the right optimizations will beat both of these given enough time for the JIT to do its work.
I always assumed that the JIT would have done its job within a few minutes of the app starting (probably a few seconds in my case, as it's mostly tight loops), so I'm not sure whether to believe them.
Can anyone shed any light on the situation? Would .NET beat Java? (Or am I best just sticking with C at this point?).
The code is highly multithreaded and data sets are several terabytes in size.
Haskell/Erlang etc are not options in this case as there is a significant quantity of existing legacy C code that will be ported to the new system, and porting C to Java/C# is a lot simpler than to Haskell or Erlang. (Unless of course these provide a significant speedup).
Edit: We are considering moving to C# or Java because they may, in theory, be faster. Every percent we can shave off our processing time saves us tens of thousands of dollars per year. At this point we are just trying to evaluate whether C, Java, or C# would be faster.

The key piece of information in the question is this:
"Every percent we can shave off our processing time saves us tens of thousands of dollars per year"
So you need to consider how much it will cost to shave each percent off. If that optimization effort costs tens of thousands of dollars per year, then it isn't worth doing. You could make a bigger saving by firing a programmer.
With the right skills (which today are rarer and therefore more expensive) you can hand-craft assembler to get the fastest possible code. With slightly less rare (and expensive) skills, you can do almost as well with some really ugly-looking C code. And so on. The more performance you squeeze out of it, the more it will cost you in development effort, and there will be diminishing returns for ever greater effort. If the profit from this stays at "tens of thousands of dollars per year" then there will come a point where it is no longer worth the effort. In fact I would hazard a guess you're already at that point because "tens of thousands of dollars per year" is in the range of one salary, and probably not enough to buy the skills required to hand-optimize a complex program.
I would guess that if you have code already written in C, the effort of rewriting it all as a direct translation in another language will be 90% wasted effort. It will very likely perform slower simply because you won't be taking advantage of the capabilities of the platform, but instead working against them, e.g. trying to use Java as if it was C.
Also within your existing code, there will be parts that make a crucial contribution to the running time (they run frequently), and other parts that are totally irrelevant (they run rarely). So if you have some idea for speeding up the program, there is no economic sense in wasting time applying it to the parts of the program that don't affect the running time.
So use a profiler to find the hot spots, and see where time is being wasted in the existing code.
Update when I noticed the reference to the code being "multithreaded"
In that case, if you focus your effort on removing bottlenecks so that your program can scale well over a large number of cores, then it will automatically get faster every year at a rate that will dwarf any other optimization you can make. This time next year, quad cores will be standard on desktops. The year after that, 8 cores will be getting cheaper (I bought one over a year ago for a few thousand dollars), and I would predict that a 32 core machine will cost less than a developer by that time.

I'm sorry, but that is not a simple question. It would depend a lot on what exactly was going on. C# is certainly no slouch, and you'd be hard-pressed to say "java is faster" or "C# is faster". C is a very different beast... it maybe has the potential to be faster - if you get it right; but in most cases it'll be about the same, but much harder to write.
It also depends how you do it - locking strategies, how you do the parallelization, the main code body, etc.
Re JIT - you could use NGEN to flatten this, but yes; if you are hitting the same code it should be JITted very early on.
One very useful feature of C#/Java (over C) is that they have the potential to make better use of the local CPU (optimizations etc), without you having to worry about it.
Also - with .NET, consider things like "Parallel Extensions" (to be bundled in 4.0), which gives you a much stronger threading story (compared to .NET without PFX).
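As a small illustration of what Parallel Extensions offer, the sketch below sums an integer array across cores with Parallel.For, using the overload that keeps a per-worker local total. ParallelSum and Sum are names invented for this example, and it says nothing about whether this beats a tuned C loop for the workload in the question.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class ParallelSum
{
    // Sums an int array on multiple cores. Each worker accumulates into its own
    // local total, and the locals are merged once per worker with Interlocked.Add.
    public static long Sum(int[] data)
    {
        long total = 0;
        Parallel.For(0, data.Length,
            () => 0L,                                     // per-worker initial value
            (i, loopState, local) => local + data[i],     // runs without any locking
            local => Interlocked.Add(ref total, local));  // one atomic merge per worker
        return total;
    }
}
```

Keeping a local total per worker means the shared variable is touched only once per worker rather than once per element, which matters in tight integer loops like the ones described in the question.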

Don't worry about language; parallelize!
If you have a highly multithreaded, data-intensive scientific code, then I don't think worrying about language is the biggest issue for you. I think you should concentrate on making your application parallel, especially making it scale past a single node. This will get you far more performance than just switching languages.
As long as you're confined to a single node, you're going to be starved for compute power and bandwidth for your app. On upcoming many-core machines, it's not clear that you'll have the bandwidth you need to do data-intensive computing on all the cores. You can do computationally intensive work (like a GPU does), but you may not be able to feed all the cores if you need to stream a lot of data to every one of them.
I think you should consider two options:
MapReduce
Your problem sounds like a good match for something like Hadoop, which is designed for very data-intensive jobs.
Hadoop has scaled to 10,000 nodes on Linux, and you can shunt your work off either to someone else's (e.g. Amazon's, Microsoft's) or your own compute cloud. It's written in Java, so as far as porting goes, you can either call your existing C code from within Java, or you can port the whole thing to Java.
MPI
If you don't want to bother porting to MapReduce, or if for some reason your parallel paradigm doesn't fit the MapReduce model, you could consider adapting your app to use MPI. This would also allow you to scale out to (potentially thousands) of cores. MPI is the de-facto standard for computationally intensive, distributed-memory applications, and I believe there are Java bindings, but mostly people use MPI with C, C++, and Fortran. So you could keep your code in C and focus on parallelizing the performance-intensive parts. Take a look at OpenMPI for starters if you are interested.

I'm honestly surprised at those benchmarks.
In a computationally intensive product I would place a large wager on C to perform faster. You might write code that leaks memory like a sieve, and has interesting threading related defects, but it should be faster.
The only reason I could think that Java or C# would be faster is due to a short run length on the test. If little or no GC happened, you'll avoid the overhead of actually deallocating memory. If the process is iterative or parallel, try sticking a GC.Collect wherever you think you're done with a bunch of objects (after setting things to null or otherwise removing references).
Also, if you're dealing with terabytes of data, my opinion is you're going to be much better off with deterministic memory allocation that you get with C. If you deallocate roughly close to when you allocate your heap will stay largely unfragmented. With a GC environment you may very well end up with your program using far more memory after a decent run length than you would guess, just because of fragmentation.
To me this sounds like the sort of project where C would be the appropriate language, but would require a bit of extra attention to memory allocation/deallocation. My bet is that C# or Java will fail if run on a full data set.

Quite some time ago Raymond Chen and Rico Mariani had a series of blog posts incrementally optimising a file load into a dictionary tool. While .NET was quicker early on (i.e. easy to make quick) the C/Win32 approach eventually was significantly faster -- but at considerable complexity (e.g. using custom allocators).
In the end the answer to which is faster will heavily depend on how much time you are willing to expend on eking every microsecond out of each approach. That effort (assuming you do it properly, guided by real profiler data) will make a far greater difference than choice of language/platform.
The first and last performance blog entries:
Chen part 1
Mariani part 1
Chen final part
Mariani final part
(The last link gives an overall summary of the results and some analysis.)

It is going to depend very much on what you are doing specifically. I have Java code that beats C code. I have Java code that is much slower than C++ code (I don't do C#/.NET so cannot speak to those).
So, it depends on what you are doing, I am sure you can find something that is faster in language X than language Y.
Have you tried running the C# code through a profiler to see where it is taking the most time (same with Java and C while you are at it)? Perhaps you need to do something different.
The Java HotSpot VM is more mature (roots of it going back to at least 1994) than the .NET one, so it may come down to the code generation abilities of both for that.

You say "the code is multithreaded", which implies that the algorithms are parallelisable. Also, you say the "data sets are several terabytes in size".
Optimising is all about finding and eliminating bottlenecks.
The obvious bottleneck is the bandwidth to the data sets. Given the size of the data, I'm guessing that the data is held on a server rather than on a desktop machine. You haven't given any details of the algorithms you're using. Is the time taken by the algorithm greater than the time taken to read/write the data/results? Does the algorithm work on subsets of the total data?
I'm going to assume that the algorithm works on chunks of data rather than the whole dataset.
You have two scenarios to consider:
The algorithm takes more time to process the data than it does to get the data. In this case, you need to optimise the algorithm.
The algorithm takes less time to process the data than it does to get the data. In this case, you need to increase the bandwidth between the algorithm and the data.
In the first case, you need a developer that can write good assembler code to get the most out of the processors you're using, leveraging SIMD, GPUs and multicores if they're available. Whatever you do, don't just crank up the number of threads, because as soon as the number of threads exceeds the number of cores, your code goes slower! This is due to the added overhead of switching thread contexts. Another option is to use a SETI-like distributed processing system (how many PCs in your organisation are used for admin purposes - think of all that spare processing power!). C#/Java, as bh213 mentioned, can be an order of magnitude slower than well-written C/C++ using SIMD, etc. But that is a niche skillset these days.
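The thread-count advice above can be sketched roughly as follows. This is only an illustration: `online_cores` and `worker_threads` are hypothetical helper names, and `sysconf(_SC_NPROCESSORS_ONLN)` is an assumption about the platform (it is available on Linux and most Unix-like systems, but is not part of standard C).

```c
#include <unistd.h>

/* Number of cores currently online, or 1 if the query fails.
   Assumes a Unix-like system that supports _SC_NPROCESSORS_ONLN. */
long online_cores(void)
{
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return n > 0 ? n : 1;
}

/* Clamp a requested worker count to the range [1, cores], so that a
   compute-bound job never runs more threads than there are cores. */
long worker_threads(long requested, long cores)
{
    if (cores < 1)
        cores = 1;
    if (requested < 1)
        return 1;
    return requested < cores ? requested : cores;
}
```

A caller would use something like `worker_threads(64, online_cores())` when sizing a thread pool. For threads that mostly wait on I/O the right number can be higher, so treat the clamp as a starting point for compute-bound work only.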
In the latter case, where you're limited by bandwidth, then you need to improve the network connecting the data to the processor. Here, make sure you're using the latest ethernet equipment - 1Gbps everywhere (PC cards, switches, routers, etc). Don't use wireless as that's slower. If there's lots of other traffic, consider a dedicated network in parallel with the 'office' network. Consider storing the data closer to the clients - for every five or so clients use a dedicated server connected directly to each client which mirrors the data from the server.
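To decide which of the two scenarios applies, a back-of-the-envelope estimate is often enough. The sketch below is an assumption-laden illustration (the function names are made up, and it ignores protocol overhead, latency and disk speed): it just converts a data size and a link speed into a best-case transfer time and compares that to the measured compute time.

```c
/* Best-case seconds needed to move `bytes` over a link rated at
   `link_bits_per_second`, ignoring protocol overhead and latency. */
double transfer_seconds(double bytes, double link_bits_per_second)
{
    return bytes * 8.0 / link_bits_per_second;
}

/* Returns 1 when moving the data takes longer than processing it,
   i.e. the job is limited by bandwidth rather than by the CPU. */
int is_bandwidth_bound(double compute_seconds, double transfer_secs)
{
    return transfer_secs > compute_seconds;
}
```

For example, one terabyte (1e12 bytes) over a 1 Gbps link needs at least 8000 seconds, a bit over two hours, before any processing is counted. If the algorithm gets through that terabyte faster than that, the network is the bottleneck.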
If saving a few percent of processing time saves "tens of thousands of dollars" then seriously consider getting a consultant in, two actually - one software, one network. They should easily pay for themselves in the savings made. I'm sure there's many here that are suitably qualified to help.
But if reducing cost is the ultimate goal, then consider Google's approach - write code that keeps the CPU ticking over below 100%. This saves energy directly and indirectly through reduced cooling, thus costing less. You'll want more bang for your buck so it's C/C++ again - Java/C# have more overhead, overhead = more CPU work = more energy/heat = more cost.
So, in summary, when it comes to saving money there's a lot more to it than what language you're going to choose.

If there is already a significant quantity of legacy C code that will be added to the system then why move to C# and Java?
In response to your latest edit about wanting to take advantage of any improvements in processing speed....then your best bet would be to stick to C as it runs closer to the hardware than C# and Java which have the overhead of a runtime environment to deal with. The closer to the hardware you can get the faster you should be able to run. Higher Level languages such as C# and Java will result in quicker development times...but C...or better yet Assembly will result in quicker processing time...but longer development time.

I participated in a few of TopCoder's Marathon Matches where performance was the key to victory.
My choice was C#. I think C# solutions placed slightly above Java and were slightly slower than C++... until somebody wrote code in C++ that was an order of magnitude faster. You were allowed to use the Intel compiler, and the winning code was full of SIMD instructions, which you cannot replicate in C# or Java. But if SIMD is not an option, C# and Java should be good enough as long as you take care to use memory correctly (e.g. watch for cache misses and try to limit memory access to the size of the L2 cache).
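One common way to "limit memory access to the size of the cache" is loop blocking (tiling). The following is a minimal sketch in C using a matrix transpose as the example; the function names and the block size parameter are illustrative assumptions, and the best block size depends on the actual cache and has to be measured.

```c
#include <assert.h>
#include <stddef.h>

/* Straightforward transpose of an n x n row-major matrix. Writes to dst
   jump n elements at a time, which is unfriendly to the cache for large n. */
void transpose_naive(const double *src, double *dst, size_t n)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            dst[j * n + i] = src[i * n + j];
}

/* Same result, but processed in block x block tiles so that the part of
   src and dst being touched at any moment is small enough to stay cached. */
void transpose_blocked(const double *src, double *dst, size_t n, size_t block)
{
    assert(block > 0);
    for (size_t ii = 0; ii < n; ii += block) {
        size_t i_end = ii + block < n ? ii + block : n;
        for (size_t jj = 0; jj < n; jj += block) {
            size_t j_end = jj + block < n ? jj + block : n;
            for (size_t i = ii; i < i_end; i++)
                for (size_t j = jj; j < j_end; j++)
                    dst[j * n + i] = src[i * n + j];
        }
    }
}
```

Both functions produce identical output; only the order of memory accesses differs. The same restructuring can be written in C# or Java as well, which is the point of the answer: memory access patterns matter regardless of language.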

Your question is poorly phrased (or at least the title is) because it implies this difference is endemic and holds true for all instances of java/c#/c code.
Thankfully the body of the question is better phrased because it presents a reasonably detailed explanation of the sort of thing your code is doing. It doesn't state what versions (or providers) of c#/java runtimes you are using. Nor does it state the target architecture or machine the code will run on. These things make big differences.
You have done some benchmarking, this is good. Some suggestions as to why you see the results you do:
You aren't as good at writing performant c# code as you are at java/c (this is not a criticism, or even likely but it is a real possibility you should consider)
Later versions of the JVM have some serious optimizations to make uncontended locks extremely fast. This may skew things in your favour (And especially the comparison with the c implementation threading primitives you are using)
Since the java code seems to run well compared to the c code it is likely that you are not terribly dependent on the heap allocation strategy (profiling would tell you this).
Since the c# code runs less well than the java one (and assuming the code is comparable) then several possible reasons exist:
You are using (needlessly) virtual functions which the JVM will inline but the CLR will not
The latest JVM does Escape Analysis which may make some code paths considerably more efficient (notably those involving string manipulation whose lifetime is stack bound)
Only the very latest 32 bit CLR will inline methods involving non primitive structs
Some JVM JIT compilers use hotspot style mechanisms which attempt to detect the 'hotspots' of the code and spend more effort re-jitting them.
Without an understanding of what your code spends most of its time doing it is impossible to make specific suggestions. I can quite easily write code which performs much better under the CLR due to use of structs over objects or by targeting runtime specific features of the CLR like non boxed generics, this is hardly instructive as a general statement.

Actually it is 'Assembly language'.

Depends on what kind of application you are writing.
Try The Computer Language Benchmarks Game
http://shootout.alioth.debian.org/u32q/benchmark.php?test=all&lang=csharp&lang2=java&box=1
http://shootout.alioth.debian.org/u64/benchmark.php?test=all&lang=csharp&lang2=java&box=1

To reiterate a comment, you should be using the GPU, not the CPU if you are doing arithmetic scientific computing. Matlab with CUDA plugins would be much more awesome than Java or c# if Matlab licensing is not an issue. The nVidia documentation shows how to compile any CUDA function into a mex file. If you need free software, I like pycuda.
If however, GPUs are not an option, I personally like C for a lot of routines because the optimizations the compiler makes are not as complicated as JIT: you don't have to worry about whether a "class" becomes like a "struct" or not. In my experience, problems can usually be broken down such that higher-level things can be written in a very expressive language like Python (rich primitives, dynamic types, incredibly flexible reflection), and transformations can be written in something like C. Additionally, there's neat compiler software, like PLUTO (automatic loop parallelization and OpenMP code generation), and libraries like Hoard, tcmalloc, BLAS (CUBLAS for gpu), etc. if you choose to go the C/C++ route.

One thing to notice is that IF your application(s) would benefit from lazy evaluation, a functional programming language like Haskell may yield speedups of a totally different magnitude than the theoretically optimal structured/OO code, just by not evaluating unnecessary branches.
Also, if you are talking about the monetary benefit of better performance, don't forget to add the cost of maintaining your software into the equation.

Surely the answer is to go and buy the latest PC with the most cores/processors you can afford. If you buy one of the latest 2x4 core PCs you will find not only does it have twice as many cores as a quad core but also they run 25-40% faster than the previous generation of processors/machines.
This will give you approximately a 150% speed up (2 x 1.25 = 2.5 times the throughput, assuming the work scales across all the cores). Far more than choosing Java/C# or C.
And what's more, you get the same again every 18 months if you keep buying in new boxes!
You can sit there for months rewriting your code, or I could go down to my local PC store this afternoon and be running faster than all your efforts the same day.
Improving code quality/efficiency is good but sometimes implementation dollars are better spent elsewhere.
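The arithmetic behind the "150%" figure can be sketched as below. Note that the parallel-fraction term (Amdahl's law) is an addition for illustration, not something the answer itself uses: the answer's number corresponds to the case where all of the work scales across the extra cores. The function and parameter names are hypothetical.

```c
#include <assert.h>

/* Rough speedup from moving to a machine with `core_ratio` times as many
   cores, each `per_core_gain` times faster. `parallel_fraction` is the
   share of run time (0..1) that actually scales with the core count. */
double estimated_speedup(double core_ratio, double per_core_gain,
                         double parallel_fraction)
{
    assert(core_ratio > 0.0 && per_core_gain > 0.0);
    assert(parallel_fraction >= 0.0 && parallel_fraction <= 1.0);
    double core_speedup =
        1.0 / ((1.0 - parallel_fraction) + parallel_fraction / core_ratio);
    return core_speedup * per_core_gain;
}
```

With perfect scaling, `estimated_speedup(2.0, 1.25, 1.0)` gives 2.5, which is the 150% improvement quoted above. If only half of the run time parallelizes, the same hardware gives roughly 1.67, so the new box helps most when the code already scales.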

Writing in one language or another will only give you small speed ups for a large amount of work. To really speed things up you might want to look at the following:
Buying the latest fastest Hardware.
Moving from 32 bit operating system to 64 bit.
Grid computing.
CUDA / OpenCL.
Using compiler optimisation like vectorization.

I would go with C# (or Java) because your development time will probably be much faster than with C. If you end up needing extra speed then you can always rewrite a section in C and call it as a module.

My preference would be C or C++ because I'm not separated from the machine language by a JIT compiler.
You want to do intense performance tuning, and that means stepping through the hot spots one instruction at a time to see what it is doing, and then tweaking the source code so as to generate optimal assembler.
If you can't get the compiler to generate what you consider good enough assembler code, then by all means write your own assembler for the hot spot(s). You're describing a situation where the need for performance is paramount.
What I would NOT do if I were in your shoes (or ever) is rely on anecdotal generalizations about one language being faster or slower than another. What I WOULD do is multiple passes of intense performance tuning along the lines of THIS and THIS and THIS. I have done this sort of thing numerous times, and the key is to iterate the cycle of diagnosis-and-repair because every slug fixed makes the remaining ones more evident, until you literally can't squeeze another cycle out of that turnip.
Good luck.
Added: Is it the case that there is some seldom-changing configuration information that determines how the bulk of the data is processed? If so, it may be that the program is spending a lot of its time re-interpreting the configuration info to figure out what to do next. If so, it is usually a big win to write a code generator that will read the configuration info and generate an ad-hoc program that can whizz through the data without constantly having to figure out what to do.
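A full code generator is more than fits here, but a lighter variant of the same idea (resolving the configuration once, up front, instead of re-interpreting it for every record) can be sketched in C with a function pointer. Everything in this sketch is hypothetical: the operation names, the `transform_fn` type and the two `process_*` functions are made up for illustration.

```c
#include <stddef.h>
#include <string.h>

typedef double (*transform_fn)(double);

static double op_double(double v)   { return v * 2.0; }
static double op_negate(double v)   { return -v; }
static double op_identity(double v) { return v; }

/* Slow path: the configuration string is re-examined for every element. */
void process_interpreting(const char *op, double *data, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(op, "double") == 0)
            data[i] = op_double(data[i]);
        else if (strcmp(op, "negate") == 0)
            data[i] = op_negate(data[i]);
        /* unknown operations leave the value unchanged */
    }
}

/* Translate the configuration into a function pointer exactly once. */
transform_fn resolve(const char *op)
{
    if (strcmp(op, "double") == 0) return op_double;
    if (strcmp(op, "negate") == 0) return op_negate;
    return op_identity;
}

/* Fast path: the inner loop no longer looks at the configuration at all. */
void process_resolved(const char *op, double *data, size_t n)
{
    transform_fn fn = resolve(op);
    for (size_t i = 0; i < n; i++)
        data[i] = fn(data[i]);
}
```

A real code generator goes one step further and emits specialized source (or machine code) for the configuration, so that even the indirect call disappears, but the principle of paying the interpretation cost once rather than per record is the same.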

Depends what you benchmark and on what hardware. I assume it's speed rather than memory or CPU usage. But...
If you have a dedicated machine for an app only with very large amounts of memory then java might be 5% faster.
If you go down into the real world, with limited memory and more apps running on the same machine, .NET looks better at utilizing computing resources: see here
If the hardware is very constrained, C/C++ wins hands down.

If you are writing highly multithreaded code, I would recommend you take a look at the upcoming Task Parallel Library (TPL) for .NET and the Parallel Patterns Library (PPL) for native C++ applications. That will save you a lot of issues with thread locking, deadlocks and all the other problems that you would otherwise spend a lot of time digging into and solving yourself.
Personally, I truly believe that memory management in the managed world will be more efficient and beat native code in the long term.

If much of your code is in C why not keep it?
In principle and by design it's obvious that C is faster. They may close the gap over time, but they always have more levels of indirection and "safety". C is fast because it's "unsafe". Just think about bounds checking. Interfacing to C is supported in every language, so I cannot see why one would not just wrap the C code up, if it's still working, and use it from whatever language you like.

I would consider what everyone else uses - not the folks on this site, but the folks who write the same kind of massively parallel, or super high-performance applications.
I find they all write their code in C/C++. So, just for this fact alone (ie. regardless of any speed issues between the languages), I would go with C/C++. The tools they use and have developed will be of much more use to you if you're writing in the same language.
Aside from that, I've found C# apps to have somewhat less than optimal performance in some areas, multithreading is one. .NET will try to keep you safe from thread problems (probably a good thing in most cases), but this will cause your specific case problems (to test: try writing a simple loop that accesses a shared object using lots of threads. Run that on a single core PC and you get better performance than if you run it on a multiple core box - .net is adding its own locks to make sure you don't muck it up)(I used Jon Skeet's singleton benchmark. The static lock on took 1.5sec on my old laptop, 8.5s on my superfast desktop, the lock version is even worse, try it yourself)
The next point is that with C you tend to access memory and data directly - nothing gets in the way, with C#/Java you will use some of the many classes that are provided. These will be good in the general case, but you're after the best, most efficient way to access this (which, for your case is a big deal with multi-terabytes of data, those classes were not designed with those datasets in mind, they were designed for the common cases everyone else uses), so again, you would be safer using C for this - you'll never get the GC getting clogged up by a class that creates new strings internally when you read a couple of terabytes of data if you write it in C!
So it may appear that C#/Java can give you benefits over a native application, but I think you'll find those benefits are only realised for the kind of line-of-business applications that are commonly written.

Note that for heavy computations there is a great advantage in having tight loops which can fit in the CPU's first level cache as it avoids having to go to slower memory repeatedly to get the instructions.
Even for level two cache a large program like Quake IV gets a 10% performance increase with a 4 MB level 2 cache versus a 1 MB level 2 cache - http://www.tomshardware.com/reviews/cache-size-matter,1709-5.html
For these tight loops C is most likely the best as you have the most control of the generated machine code, but for everything else you should go for the platform with the best libraries for the particular task you need to do. For instance the netlib libraries are reputed to have very good performance for a very large set of problems, and many ports to other languages are available.

If every percentage point will really save you tens of thousands of dollars, then you should bring in a domain expert to help with the project. Well designed and written code with performance considered at the initial stages may be an order of magnitude faster, saving you 90%, or $900,000. I recently found a subtle flaw in some code that sped up a process by over 100 times. A colleague of mine found an algorithm that was running in O(n^3) that he re-wrote to make it O(n log n). This tends to be where the huge performance savings are.
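As a small illustration of this kind of algorithmic win (a different and much simpler problem than the O(n^3) one mentioned above, chosen only because it fits in a few lines), here is duplicate detection written two ways in C. The function names are hypothetical, and the O(n log n) figure assumes that the library's `qsort` is implemented with a typical n log n sort, which the C standard does not strictly guarantee.

```c
#include <stdlib.h>
#include <string.h>

/* Comparison callback for qsort; avoids overflow from subtracting ints. */
static int compare_ints(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* O(n^2): compare every pair of elements. */
int has_duplicate_quadratic(const int *v, size_t n)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = i + 1; j < n; j++)
            if (v[i] == v[j])
                return 1;
    return 0;
}

/* Typically O(n log n): sort a copy, then equal values sit next to each
   other. Returns 1 or 0, or -1 if the temporary copy cannot be allocated. */
int has_duplicate_sorted(const int *v, size_t n)
{
    if (n < 2)
        return 0;
    int *copy = malloc(n * sizeof *copy);
    if (copy == NULL)
        return -1;
    memcpy(copy, v, n * sizeof *copy);
    qsort(copy, n, sizeof *copy, compare_ints);
    int found = 0;
    for (size_t i = 1; i < n && !found; i++)
        if (copy[i] == copy[i - 1])
            found = 1;
    free(copy);
    return found;
}
```

For a million elements the first version does on the order of 5 x 10^11 comparisons in the worst case and the second on the order of 2 x 10^7. No choice of language or compiler flag comes close to that kind of difference.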
If the problem is so simple that you are certain that a better algorithm cannot be employed giving you significant savings, then C is most likely your best language.

The most important things are already said here. I would add:
The developer utilizes a language which the compiler(s) utilize(s) to generate machine instructions which the processor(s) utilize(s) to use system resources. A program will be "fast" when ALL parts of the chain perform optimally.
So for the "best" language choice:
take that language which you are best able to control and
which is able to instruct the compiler sufficiently to
generate nearly optimal machine code so that
the processor on the target machine is able to utilize processing resources optimally.
If you are not a performance expert you will have a hard time achieving 'peak performance' in ANY language. Possibly C++ still provides the most options to control the machine instructions (especially SSE extensions and so on).
I suggest orienting yourself by the well-known 80:20 rule. It holds fairly well for all of them: the hardware, the languages/platforms and the developer effort.
Developers have always relied on the hardware to fix performance issues automatically, for example through an upgrade to a faster processor. What might have worked in the past will not work in the near future. The developer now has the responsibility to structure her programs for parallelized execution. Languages for virtual machines and virtual runtime environments will show some advantage here. And even without massive parallelization there is little to no reason why C# or Java shouldn't do about as well as C++.
Edit: See this comparison of C#, Matlab and FORTRAN, where FORTRAN does not win alone!

Ref; "My benchmarks so far show Java and C are about on par with each other"
Then your benchmarks are severely flawed...
C will ALWAYS be orders of magnitude faster than both C# and Java unless you do something seriously wrong...!
PS!
Notice that this is not an attempt to bully either C# or Java. I like both Java and C#, and there are other reasons why for many problems you would choose either Java or C# instead of C. But neither Java nor C# would, in a correctly written test, EVER be able to perform at the same speed as C...
Edited because of the sheer number of comments arguing against my rhetoric
Compare these two buggers...
C#
public class MyClass
{
    public int x;
    public static void Main()
    {
        MyClass[] y = new MyClass[1000000];
        for( int idx=0; idx < 1000000; idx++)
        {
            y[idx] = new MyClass();
            y[idx].x = idx;
        }
    }
}
against this one (C)
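struct MyClass
{
    int x;
};
int main(void)
{
    /* roughly 4 MB on the stack (with 4-byte ints); may exceed small default stack limits */
    struct MyClass y[1000000];
    for( int idx = 0; idx < 1000000; idx++)
    {
        y[idx].x = idx;
    }
    return 0;
}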
The C# version first of all needs to store its array on the heap. The C version stores the array on the stack. To store stuff on the stack is merely changing the value of an integer value while to store stuff on the heap means finding a big enough chunk of memory and potentially means traversing the memory for a pretty long time.
Now mostly C# and Java allocate huge chunks of memory which they keep on spending till they run out, which makes this logic execute faster. But even then, comparing this against changing the value of an integer is like an F16 against an oil tanker speedwise...
Second of all, in the C version, since all those objects are already on the stack, we don't need to explicitly create new objects within the loop. Yet again, for C# this is a "look for available memory" operation while the C version is a no-op (a do-nothing operation).
Third of all is the fact that the C version will automatically delete all these objects when they run out of scope. Yet again this is an operation which ONLY CHANGES THE VALUE OF AN INTEGER VALUE. Which would on most CPU architectures take between 1 and 3 CPU cycles. The C# version doesn't do that, but when the Garbage Collector kicks in and needs to collect those items my guess is that we're talking about MILLIONS of CPU cycles...
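The layout difference being described (one contiguous block of values versus a million separately allocated objects reached through references) can be reproduced within C itself, which makes it easier to measure in isolation. This is only a sketch: the names are hypothetical, and `malloc` is not a model of how the CLR or JVM allocators actually behave.

```c
#include <stdlib.h>

struct Item { int x; };

/* One allocation holding all n items side by side, like an array of
   structs. Returns the sum of the stored values, or -1 on failure. */
long long sum_contiguous(size_t n)
{
    if (n == 0)
        return 0;
    struct Item *items = malloc(n * sizeof *items);
    if (items == NULL)
        return -1;
    long long sum = 0;
    for (size_t i = 0; i < n; i++)
        items[i].x = (int)i;
    for (size_t i = 0; i < n; i++)
        sum += items[i].x;
    free(items);
    return sum;
}

/* n separate allocations reached through an array of pointers, like an
   array of references to class instances. Same result, more work. */
long long sum_boxed(size_t n)
{
    if (n == 0)
        return 0;
    struct Item **items = malloc(n * sizeof *items);
    if (items == NULL)
        return -1;
    size_t made = 0;
    long long sum = 0;
    for (; made < n; made++) {
        items[made] = malloc(sizeof **items);
        if (items[made] == NULL)
            break;
        items[made]->x = (int)made;
    }
    if (made == n)
        for (size_t i = 0; i < n; i++)
            sum += items[i]->x;
    else
        sum = -1;
    for (size_t i = 0; i < made; i++)
        free(items[i]);
    free(items);
    return sum;
}
```

Timing the two functions for a large `n` gives a feel for the cost of per-object allocation and pointer chasing, without mixing in JIT or garbage collector effects.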
Also the C version will instantly become x86 code (on an x86 CPU) while the C# version would first become IL code. Then later when executed it would have to be JIT compiled, which probably alone takes orders of magnitude longer than only executing the C version.
Now some wise guy could probably execute the above code and measure CPU cycles. However there's basically no point in doing that, because mathematically it's proven that the Managed Version would probably take several million times the number of CPU cycles as the C version. So my guess is that we're now talking about 5-8 orders of magnitude slower in this example. And sure, this is a "rigged test" in that I "looked for something to prove my point", however I challenge those that commented badly against me on this post to create a sample which does NOT execute faster in C and which also doesn't use constructs which you normally never would use in C due to "better alternatives" existing.
Note that C# and Java are GREAT languages. I prefer them over C ANY TIME OF THE DAY. But NOT because they're FASTER. Because they are NOT. They are ALWAYS slower than C and C++. Unless you've coded blindfolded in C or C++...
Edit;
C# of course has the struct keyword, which would seriously change the speed of the above C# version if we changed the C# class to a value type by using the keyword struct instead of class. The struct keyword means that C# would store new objects of the given type on the stack, which for the above sample would increase the speed seriously. Still, the above sample happens to also feature an array of these objects.
Even if we went through and optimized the C# version like this, we would still end up with something several orders of magnitude slower than the C version...
A well-written piece of C code will ALWAYS be faster than C#, Java, Python and whatever-managed-language-you-choose...
As I said, I love C# and most of the work I do today is C# and not C. However I don't use C# because it's faster than C. I use C# because I don't need the speed gain C gives me for most of my problems.
Both C# and Java are, though, ridiculously slower than C, and C++ for that matter...
