I have an unmanaged method that uses a lot of CPU when executed. Is it safe to say that unmanaged calls naturally use a lot of CPU?
Following is the code:
public void ReadAt(long ulOffset, IntPtr pv, int cb, out UIntPtr pcbRead)
{
    Marshal.Copy(buffer, 0, pv, bytesRead);
    pcbRead = new UIntPtr((uint)bytesRead);
    bytesRead = 0;
    if (streamClosed)
        buffer = null;
}
No it's not safe to generalize this. Both managed and unmanaged methods take whatever CPU they need to execute their code.
When someone says unmanaged calls may be expensive, they usually mean the overhead of switching between managed and unmanaged code. This cost only matters if you make unmanaged calls in a tight loop, such as per-pixel processing on a large image.
Some of the overhead of unmanaged calls can be removed with the proper attributes (e.g. SuppressUnmanagedCodeSecurity); in particular, it is possible to move the security checks from per-call to assembly-load time. This is of course already done for all unmanaged functions in the .NET Framework.
The best guess (without more context) about why you are spending so much time in that function is that you are either (a) copying a very large array or (b) you are calling the method very often in a loop.
In the first case, the overhead of switching between managed and unmanaged code for Marshal.Copy is negligible; copying a large memory block will always saturate the CPU (i.e. 100% usage of one core). There is nothing you can do about it except eliminate the copy operation completely (which may or may not be possible, depending on how you use the buffers).
If you are in the second case and the arrays are very small, it may be worth switching to a purely managed loop. But don't do this without measuring: it's easy to guess wrong, and the unmanaged implementation of Marshal.Copy can pull more tricks than the managed JIT to make up for the overhead.
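To make the measurement concrete, here is a minimal sketch (names and sizes are illustrative, not taken from the question) that round-trips a managed array through an unmanaged buffer with Marshal.Copy. Timing this with a Stopwatch for small and large arrays is how you would decide whether the interop overhead actually matters in your case:

```csharp
using System;
using System.Runtime.InteropServices;

static class CopyDemo
{
    // Round-trip a managed array through an unmanaged buffer.
    public static byte[] RoundTrip(byte[] source)
    {
        IntPtr unmanaged = Marshal.AllocHGlobal(source.Length);
        try
        {
            // Managed -> unmanaged.
            Marshal.Copy(source, 0, unmanaged, source.Length);

            // Unmanaged -> managed.
            var result = new byte[source.Length];
            Marshal.Copy(unmanaged, result, 0, source.Length);
            return result;
        }
        finally
        {
            Marshal.FreeHGlobal(unmanaged); // unmanaged memory must be freed manually
        }
    }
}
```

Run it once with a 16-byte array and once with a multi-megabyte array: for the large array, nearly all the time is the memory copy itself, not the managed/unmanaged transition.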
PS: You might want to read this - high CPU usage by itself is not a bad thing; the computer is just trying to get things done as fast as possible. Managed or unmanaged does not matter. If your usage is below 100% (per core), it just means your computer has nothing left to do.
Related
I have a service that's using ASP WebApi. Each http request translates to a thread that needs to do some data manipulation (possibly changing the data). The API layer is written in C# and the data manipulation is written in C++. The C# layer calls the native library and supplies a pointer to some managed buffer.
Couple of questions:
How can I make sure there are no races? Is std::mutex in the native library enough in this case? (Do managed threads map to native threads? Will they share the same std::mutex?)
How can I make sure that the GC doesn't release the pointer to the managed buffer while the native library is manipulating it?
Do you need a shared buffer? If the buffer is only ever used on one thread, you save yourself a lot of trouble. Managed threads do not map to native threads 1:1, but I'm not sure if that has any effect on your scenario.
You need to fix (pin) the buffer and keep it fixed the whole time the native code has a pointer to it - releasing is the least of your worries, since .NET memory is moved around all the time. This is done using a fixed block.
Fixing managed memory:
// Requires an unsafe context (compile with /unsafe).
byte[] theBuffer = new byte[256];
fixed (byte* ptr = &theBuffer[0])
{
    // The pointer is now fixed - the GC is prohibited from moving the memory.
    TheNativeFunction(ptr);
}
// Unfixed again
However, note that prohibiting the GC from moving memory around can cause you quite a bit of trouble - it can prevent heap compaction altogether in a high-throughput server, for example.
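A fixed block only pins for its lexical scope, so it doesn't fit the case where native code keeps the pointer across calls. A sketch of the alternative, GCHandle, which pins the array until you explicitly free the handle (the actual native call is omitted here; a Marshal.WriteByte stands in for it):

```csharp
using System;
using System.Runtime.InteropServices;

static class PinningDemo
{
    public static byte PinAndWrite()
    {
        byte[] theBuffer = new byte[256];

        // Pin the array; the GC may not move it until Free() is called.
        GCHandle handle = GCHandle.Alloc(theBuffer, GCHandleType.Pinned);
        try
        {
            IntPtr ptr = handle.AddrOfPinnedObject();
            // Hand ptr to native code here; it stays valid for as long
            // as the handle is allocated, even across multiple calls.
            Marshal.WriteByte(ptr, 0, 42);
        }
        finally
        {
            handle.Free(); // unpin - the GC may move the array again
        }

        // The write through the pinned pointer is visible in the managed array.
        return theBuffer[0];
    }
}
```

The same caveat applies: a long-lived pinned handle blocks heap compaction just as a long-lived fixed block would.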
If you don't need to work with the memory in the managed environment, you can simply allocate unmanaged memory for the task, such as by using Marshal.AllocHGlobal.
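A minimal sketch of that approach (the 1 KB size is arbitrary): because the block lives outside the managed heap, the GC never moves it, so no pinning is needed at all.

```csharp
using System;
using System.Runtime.InteropServices;

static class UnmanagedBufferDemo
{
    public static int ReadBack()
    {
        // Allocate 1 KB outside the managed heap - the GC neither moves nor frees it.
        IntPtr buffer = Marshal.AllocHGlobal(1024);
        try
        {
            Marshal.WriteInt32(buffer, 0, 1234); // write at offset 0
            return Marshal.ReadInt32(buffer, 0); // read it back
        }
        finally
        {
            Marshal.FreeHGlobal(buffer); // you must free it manually
        }
    }
}
```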
Here are two pieces of code, in C++ and C#, that do exactly the same thing:
C++
http://ideone.com/UfL5R
#include <stdio.h>

int main(int argc, char *argv[]) {
    char p[1000000];
    unsigned int i, j;
    unsigned long long s = 0;
    for (i = 2; i < 1000000; i++) p[i] = 1;
    for (i = 2; i < 500000;) {
        for (j = 2 * i; j < 1000000; j += i) p[j] = 0;
        for (i++; !p[i]; i++);
    }
    for (i = 3, s = 2; i < 1000000; i += 2) if (p[i]) s += i;
    printf("%llu\n", s);
    return 0;
}
time: 0.01s, memory: 2576 kB
C#
http://ideone.com/baXYm
using System;

namespace ConsoleApplication4
{
    internal class Program
    {
        private static void Main(string[] args)
        {
            var p = new byte[1000000];
            ulong i, j;
            double s = 0;
            for (i = 2; i < 1000000; i++)
                p[i] = 1;
            for (i = 2; i < 500000;)
            {
                for (j = 2 * i; j < 1000000; j += i)
                    p[j] = 0;
                for (i++; p[i] == 0; i++);
            }
            for (i = 3, s = 2; i < 1000000; i += 2)
                if (p[i] != 0) s += i;
            Console.WriteLine(s);
        }
    }
}
time: 0.05s, memory: 38288 kB
How can I improve the C# code to prove that C# can be as fast as C++ to my colleague?
As you can see, the C# execution time is 5 times longer, and the memory consumption is 15 times larger.
Compile and run in Release mode. I get exactly 0.01s from the C# version when built and run in Release mode. As far as memory consumption is concerned, you are comparing apples to oranges. A managed environment will consume more memory, as it is hosting the CLR and the garbage collector, which don't come without cost.
How to GREATLY Increase the Performance of your C# Code
Go "unsafe" (unmanaged) for that... every time you do someSortOfArray[i], the .NET Framework does all kinds of neat-o things (such as bounds checking) that take up time.
That's really the whole point of going unmanaged (and then using pointers and doing myPointer++).
Just to clarify, if you go unmanaged and then still do a for-loop and do someArray[i], you've saved nothing.
Another S.O. question that may help you: True Unsafe Code Performance
Disclaimer
By the way, I'm not saying to do this all the time, but rather as an answer for THIS specific question only.
How can I improve the C# code to prove that C# can be as fast as C++ to my colleague?
You can't. There are legitimate areas where C++ is fundamentally faster than C#. But there are also areas where C# code will perform better than the equivalent C++ code. They're different languages with different strengths and weaknesses.
But as a programmer, you really ought to base your decisions in logic.
Logic dictates that you should gather information first, and then decide based on that.
You, on the contrary, made the decision first, and then looked for information to support it.
That may work if you're a politician, but it's not a good way to write software.
Don't go hunting for proof that C# is faster than C++. Instead, examine which option is faster in your case.
In any case, if you want to prove that X can be as fast as Y, you have to do it the usual way: make X as fast as Y. And as always, when doing performance tuning, a profiler is your best friend. Find out exactly where the additional time is being spent, and then figure out how to eliminate it.
Memory usage is a lost cause though. .NET simply uses more memory, for several reasons:
it has a bigger runtime library which must be present in the process' address space
.NET objects have additional members not present in C++ classes, so they use more memory
the garbage collector means that you'll generally have some amount of "no-longer-used-but-not-yet-reclaimed" memory lying around. In C++, memory is typically released immediately. In .NET it isn't. .NET is based on the assumption that memory is cheap (which is typically true)
Just a note on your timing: it's not shown how you measured the execution times. One can expect a reasonable startup overhead for .NET applications. So if you care only about the execution time of the loops, you should run the inner loops several (many) times, skip the first one or two iterations, measure the remaining iterations, and compute the average.
I would expect the results to be more similar then. However, as always when targeting 'peak performance', precautions regarding memory management are important. Here it would probably be sufficient to avoid 'new' inside the measured functions: reuse p[] in each iteration.
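A sketch of the measurement loop described above, using Stopwatch and discarding warm-up runs (the warm-up and run counts are arbitrary defaults, not prescribed values):

```csharp
using System;
using System.Diagnostics;

static class TimingDemo
{
    public static double AverageMilliseconds(Action work, int warmup = 2, int runs = 10)
    {
        // Warm-up runs: JIT compilation and cold-cache effects land here
        // instead of polluting the measurement.
        for (int i = 0; i < warmup; i++) work();

        var sw = Stopwatch.StartNew();
        for (int i = 0; i < runs; i++) work();
        sw.Stop();

        return sw.Elapsed.TotalMilliseconds / runs;
    }
}
```

Passing the sieve loop as the `work` delegate (with p[] allocated once outside it) would give a number far closer to the C++ timing than a whole-process measurement does.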
The memory usage may be related to garbage collection. In Java, memory usage is intentionally high - garbage collection only happens when you need more memory. This is for speed reasons, so it would make sense that C# does the same thing. You shouldn't do this in release code, but to show how much memory you're actually using, you can call GC.Collect() before measuring memory usage. Do you really care how much memory it's using, though? It seems like speed is more important. And if you have memory limits, you can probably set the amount of memory that your program will use before garbage collecting.
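A sketch of that diagnostic measurement (for debugging only - forcing collections like this in release code defeats the GC's own scheduling):

```csharp
using System;

static class MemoryMeasureDemo
{
    public static long MeasureLiveBytes()
    {
        // Collect, wait for finalizers to run, then collect again so the
        // number reflects live objects rather than not-yet-collected garbage.
        GC.Collect();
        GC.WaitForPendingFinalizers();
        return GC.GetTotalMemory(forceFullCollection: true);
    }
}
```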
I have a situation where I call AllocHGlobal (always with the same size) in a function that I use often (about 30 times per second); at the end of the function I call FreeHGlobal.
Is it better to keep the memory allocated with AllocHGlobal and free it when the class is disposed, or should I alloc/free each time I call the function?
I don't know how this memory behaves in C#; it's a "new world" for me.
Thirty calls per second is nothing - to a CPU, a thirtieth of a second is an eternity. It's much easier to put the FreeHGlobal in a finally block to ensure it's released; that saves you from having to do the finalizer and IDisposable song and dance. Well, it saves the client's code from it.
Favoring caching over heap churn doesn't start to pay off until you get into the millisecond range.
As with any performance question - write clean code first, measure and optimize what is needed.
If you are pretty sure that your object will never be used by multiple threads (thus making simultaneous calls to the function in question), it seems fine to cache the allocated memory.
If you decide to cache the unmanaged block of memory, relying on garbage collection will likely not free the memory early enough: unmanaged memory (AllocHGlobal) is not counted against CLR-allocated memory, potentially delaying garbage collection of your objects. You should implement and properly use IDisposable on your objects.
You can do this as long as:
The amount of memory required is always less than or equal to the size of your allocation. (You have already indicated that this is the case.)
The method is not called from more than one thread at a time.
The method is not re-entrant.
You are careful to free the memory when it is no longer needed. (In other words, if the pointer is saved in a non-static class field, the class should implement IDisposable, and the Dispose() method should free the memory. And, of course, the class's consumers will have to call Dispose() or use a using(){} block.)
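Putting that checklist together, a sketch of a class that caches one unmanaged allocation and frees it deterministically (the buffer size is a placeholder, and the native call is elided):

```csharp
using System;
using System.Runtime.InteropServices;

sealed class CachedUnmanagedBuffer : IDisposable
{
    private IntPtr _buffer;
    public int Size { get; }

    public CachedUnmanagedBuffer(int size)
    {
        Size = size;
        _buffer = Marshal.AllocHGlobal(size); // allocated once, reused on every call
    }

    // Called ~30 times per second; reuses the cached allocation.
    public void DoWork()
    {
        if (_buffer == IntPtr.Zero)
            throw new ObjectDisposedException(nameof(CachedUnmanagedBuffer));
        // ... pass _buffer to the native function here ...
    }

    public void Dispose()
    {
        if (_buffer != IntPtr.Zero)
        {
            Marshal.FreeHGlobal(_buffer); // deterministic release
            _buffer = IntPtr.Zero;
        }
    }
}
```

Consumers would wrap it in `using (var b = new CachedUnmanagedBuffer(4096)) { b.DoWork(); }` so the unmanaged block is freed even on exceptions.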
However, it's much more likely that the actual marshaling of the P/Invoke call is going to be the bottleneck. Have you actually profiled the code yet, or are you micro-optimizing?
Monitoring my program's Virtual Bytes usage while it is running shows that certain kinds of operations make the virtual bytes usage go up by about 1GB in about 5 minutes.
The program deals with tcp sockets and high data transfer throughput between them (~800Mbps).
Loading a dump file of the program in windbg showed that the reason for the very high and fast memory usage is about 1GB of "free" objects.
Indeed, when I call the garbage collector (gen 0, 1 & 2) from the console screen of the program (after getting it to this state) it frees up about 1GB of memory usage.
I'm trying to understand what exactly these free objects are and why they aren't collected automatically by the garbage collector.
Edit: One suggestion was that I may be creating the objects in the Large Object Heap and it is becoming fragmented, but this is not the case, as I've seen that all of the "free" objects sit in the Gen 2 heap.
Another suggestion was that maybe the Gen 2 heap gets fragmented because of pinned objects, but if that were the case GC.Collect wouldn't fix the problem - and it actually does - so I believe this is not the case either.
What I suspect from the discussion with Paul is that the memory does get freed, but is for some reason returned to the OS only rarely, or only when I manually call GC.Collect.
They are not free 'objects'; they are free space. .NET does not immediately release memory it has used back to the operating system. Any free blocks can be used for subsequent object allocations, provided they fit inside the free block (otherwise the heap has to be extended by asking the operating system to allocate more memory).
The garbage collector makes efforts to combine free space into large usable blocks by compacting generation 2. This is not always possible: for example, an application may pin objects, which will potentially prevent the garbage collector from combining free space by moving the live objects to the front of the heap. If this happens a lot, the application's memory is broken up into useless small blocks, and this effect is known as 'heap fragmentation'.
In addition, there is a Large Object Heap (LOH) in which larger objects are allocated. The rationale is that there is a cost associated with heap compaction as data must be copied around and so the LOH is not compacted, avoiding these costs. However, the flipside is that the LOH can become easily fragmented, with small, less useful blocks of free memory interspersed between live objects.
I would suggest running !dumpheap -stat. This command reports any fragmented blocks at the end of its output. You can then dump those blocks to get an idea of what is happening.
By the way, it looks like you have a well-known problem (at least among socket gurus) that most socket servers get in .Net. Paul has already touched on what it means. To elaborate more, what is going wrong is that during a Read/Write on the socket the buffer gets pinned - which means the GC isn't allowed to move it around (as such your heap fragments). Coversant (who pioneered the solution) were seeing an OutOfMemoryException when their actual memory usage was only about 500MB (due to such heavy fragmentation). Fixing it is another story entirely.
What you want to do is at application start up allocate a very large amount of buffers (I am currently doing 50MB). You will find the new ArraySegment<T> (v2.0) and ConcurrentQueue<T> (v4.0) classes particularly useful when writing this. There are a few lock-free queues floating around on the tubes if you are not using v4.0 yet.
// Pseudo-code.
ArraySegment<byte> CheckOut()
{
    ArraySegment<byte> result;
    while (!_queue.TryDequeue(out result))
        GrowBufferQueue(); // Enqueue a bunch more buffers.
    return result;
}

void CheckIn(ArraySegment<byte> buffer)
{
    _queue.Enqueue(buffer);
}

void GrowBufferQueue()
{
    // Verify this, I did throw it together in 30s.
    // Allocates nearly 2MB. You might want to tweak that.
    for (var j = 0; j < 5; j++)
    {
        var buffer = new byte[409600]; // 100 pages of 4096 bytes (the page size on Windows).
        for (var i = 0; i < 409600; i += 4096)
            _queue.Enqueue(new ArraySegment<byte>(buffer, i, 4096));
    }
}
Following this you will need to subclass NetworkStream and swap out the incoming buffer with one from your buffer pool. Buffer.BlockCopy will help performance (don't use Array.Copy). This is complicated and hairy, especially if you make it async-capable.
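For reference, Buffer.BlockCopy counts in raw bytes rather than array elements and skips the type checks Array.Copy performs, which is why it tends to win for primitive arrays. A small sketch (the method name is illustrative):

```csharp
using System;

static class BlockCopyDemo
{
    public static byte[] CopySlice(byte[] source, int offset, int count)
    {
        var slice = new byte[count];
        // Buffer.BlockCopy offsets and counts are in BYTES, not elements
        // (for byte[] the two happen to coincide).
        Buffer.BlockCopy(source, offset, slice, 0, count);
        return slice;
    }
}
```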
If you are not layering streams (e.g. SslStream <--> DeflateStream <--> XmlWriter, etc.), you should use the new socket async pattern in .NET 4.0, which avoids the overhead of the IAsyncResult objects that otherwise get passed around. Because you are not layering streams, you have full control over the buffers that get used - so you don't need to go the NetworkStream subclass route.
What functionality does the stackalloc keyword provide? When and Why would I want to use it?
From MSDN:
Used in an unsafe code context to allocate a block of memory on the
stack.
One of the main features of C# is that you do not normally need to access memory directly, as you would do in C/C++ using malloc or new. However, if you really want to explicitly allocate some memory you can, but C# considers this "unsafe", so you can only do it if you compile with the unsafe setting. stackalloc allows you to allocate such memory.
You almost certainly don't need to use it for writing managed code. It is feasible that in some cases you could write faster code if you access memory directly - it basically allows you to use pointer manipulation which suits some problems. Unless you have a specific problem and unsafe code is the only solution then you will probably never need this.
Stackalloc will allocate data on the stack, which can be used to avoid the garbage that would be generated by repeatedly creating and destroying arrays of value types within a method.
public unsafe void DoSomeStuff()
{
    byte* unmanaged = stackalloc byte[100];
    byte[] managed = new byte[100];

    // Do stuff with the arrays.

    // When this method exits, the unmanaged array is destroyed immediately.
    // The managed array no longer has any handles to it, so it will get
    // cleaned up the next time the garbage collector runs.
    // In the meantime, it is still consuming memory and adding to the list of crap
    // the garbage collector needs to keep track of. If you're doing XNA dev on the
    // Xbox 360, this can be especially bad.
}
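As a side note, since C# 7.2 a stackalloc buffer can be assigned to a Span&lt;T&gt;, giving the same stack allocation without an unsafe context (the method and size below are illustrative):

```csharp
using System;

static class StackallocDemo
{
    public static int SumFirstN(int n)
    {
        // No 'unsafe' needed: the span is bounds-checked but still stack-allocated,
        // so nothing here adds work for the garbage collector.
        Span<int> values = stackalloc int[16];
        for (int i = 0; i < n; i++) values[i] = i + 1;

        int sum = 0;
        for (int i = 0; i < n; i++) sum += values[i];
        return sum;
    }
}
```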
Paul,
As everyone here has said, that keyword directs the runtime to allocate on the stack rather than the heap. If you're interested in exactly what this means, check out this article.
http://msdn.microsoft.com/en-us/library/cx9s2sy4.aspx
This keyword is used for unsafe memory manipulation. By using it, you gain the ability to use pointers (a powerful and painful feature from C/C++).
stackalloc directs the .net runtime to allocate memory on the stack.
Most other answers are focused on the "what functionality" part of OP's question.
I believe this will answer the when and why:
When do you need this?
For the best worst-case performance with cache locality of multiple small arrays.
Now in an average app you won't need this, but for realtime sensitive scenarios it gives more deterministic performance: No GC is involved and you are all but guaranteed a cache hit.
(Because worst-case performance is more important than average performance.)
Keep in mind that the default stack size in .net is small though!
(I think it's 1MB for normal apps and 256kb for ASP.net?)
Practical use could for example include realtime sound processing.
As Steve pointed out, it is only used in an unsafe code context (e.g., when you want to use pointers).
If you don't use unsafe code in your C# application, then you will never need this.