What exactly happens during a "managed-to-native transition"?

I understand that the CLR needs to do marshaling in some cases, but let's say I have:
using System.Runtime.InteropServices;
using System.Security;
[SuppressUnmanagedCodeSecurity]
static class Program
{
    [DllImport("kernel32.dll", SetLastError = false)]
    static extern int GetVersion();
    static void Main()
    {
        for (; ; )
            GetVersion();
    }
}
When I break into this program with a debugger, I always see a [Managed to Native Transition] entry in the call stack.
Given that there is no marshaling that needs to be done (right?), could someone please explain what's actually happening in this "managed-to-native transition", and why it is necessary?

First the call stack needs to be set up so that a STDCALL can happen. This is the calling convention for Win32.
Next the runtime will push a so-called execution frame. There are many different types of frames: security asserts, GC protected regions, native code calls, ...
The runtime uses such a frame to track that native code is currently running. This has implications for a potentially concurrent garbage collection and probably other things. It also helps the debugger.
So not a lot is happening here actually. It is a pretty slim code path.

Besides the marshaling layer, which is responsible for converting parameters for you and figuring out calling conventions, the runtime needs to do a few other things to keep internal state consistent.
The security context needs to be checked, to make sure the calling code is allowed to access native methods. The current managed stack frame needs to be saved, so that the runtime can do a stack walk back for things like debugging and exception handling (not to mention native code that calls into a managed callback). Internal bits of state need to be set to indicate that we're currently running native code.
Additionally, registers may need to be saved, depending on what needs to be tracked and which are guaranteed to be restored by the calling convention. GC roots that are in registers (locals) might need to be marked in some way so that they don't get garbage collected during the native method.
So mainly it's stack handling and type marshaling, with some security stuff thrown in. Though it's not a huge amount of stuff, it will represent a significant barrier against calling smaller native methods. For example, trying to P/Invoke into an optimized math library rarely results in a performance win, since the overhead is enough to negate any of the potential benefits. Some performance profiling results are discussed here.
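To get a feel for the size of that overhead on your own machine, here is a rough timing sketch. The names TransitionCost, ManagedStub and NanosecondsPerCall are made up for illustration, and both calls go through a delegate, which adds the same extra cost to each side. It is not a rigorous benchmark, and the P/Invoke part only runs on Windows.

```csharp
using System;
using System.Diagnostics;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

static class TransitionCost
{
    // Same import as in the question; the DLL is only resolved on first call.
    [DllImport("kernel32.dll")]
    static extern uint GetVersion();

    // Managed baseline; NoInlining keeps the JIT from removing the call.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static uint ManagedStub() => 42;

    // Average cost of one call in nanoseconds (includes delegate overhead).
    public static double NanosecondsPerCall(Func<uint> call, int iterations)
    {
        call(); // warm up: JIT compilation and, for P/Invoke, stub generation
        uint sink = 0;
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            sink += call();
        sw.Stop();
        GC.KeepAlive(sink); // make sure the loop result is observed
        return sw.Elapsed.TotalMilliseconds * 1_000_000 / iterations;
    }

    static void Main()
    {
        const int N = 10_000_000;
        Console.WriteLine($"managed call:  {NanosecondsPerCall(ManagedStub, N):F1} ns");
        if (RuntimeInformation.IsOSPlatform(OSPlatform.Windows))
            Console.WriteLine($"P/Invoke call: {NanosecondsPerCall(GetVersion, N):F1} ns");
    }
}
```

The absolute numbers depend heavily on runtime version and hardware; the difference between the two lines is the part that matters.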

I realise that this has been answered, but I'm surprised that no one has suggested that you show the external code in the debug window. If you right click on the [Native to Managed Transition] line and tick the Show External Code option, you will see exactly which .NET methods are being called in the transition. This may give you a better idea. Here is an example:

I can't really see much that'd be necessary to do. I suspect that it is mainly informative, to indicate to you that part of your call stack shows native functions, and also to indicate that the IDE and debugger may behave differently across that transition (since managed code is handled very differently in the debugger, and some features you expect may not work)
But I guess you should be able to find out simply by inspecting the disassembly around the transition. See if it does anything unusual.

Since you are calling into a DLL, execution has to leave the managed environment and enter the Windows core. You are crossing the .NET barrier into Windows code that doesn't run the same way .NET code does.

Related

How to let the variable be stored in a machine register using C#?

I checked MSDN and found the register keyword, but it's only in C++.
Syntax:
register int x = 0;
Can you tell me how to do that with C#?

There is no way to do that in C#. C# is compiled to MSIL, which is then compiled to native code by the JIT.
It's the JIT that will decide whether a variable will go into a register or not. You shouldn't worry about this.
As MSIL is meant to be run on different architectures, it wouldn't make much sense to include such a feature in the language. Different architectures have a different number of registers, which may be of different sizes. That's why it's the JIT's job to optimize this.

By using a keyword? No.
With unmanaged code, you certainly can though... I mean, you really don't want to... but you can : )
It is useful in extreme optimizations, where you know for sure that you can do better than the JIT Compiler. However, in those circumstances, you should probably be looking at straight unmanaged C anyway. So, I strongly urge you to do that if you can.
Let's assume you can't, and this absolutely positively must be done from C#
C# is compiled to MSIL, which takes those choices out of your hands. It actually does quite well too, so well in fact that there's rarely a need to optimize by hand. But, with C# being a managed language you have to step into an unmanaged section to do it.
There are several methods, both with and without reflection - and both using inline and external.
Firstly, you might compile that small fast section in C, ASM or some other unmanaged language as a DLL and call it unmanaged from C# in much the same way you'd call WinAPI functions... pay attention to calling conventions, there are several and each places a slightly different burden on caller/callee... for example, in terms of how parameters are passed and who clears up the stack afterwards.
Alternatively, you could use fasmNET or similar to include inline assembly for any routines which must be ultra-fast. fasmNET can compile strings of assembler in C# (at runtime) into a blob of memory which can then be called unmanaged from C#... many examples exist online.
Alternatively, you could externally compile just the instructions you need, provide them as a byte array yourself, and call the byte array as code in the same manner as above, but without a runtime compilation step.
There are also many tricks you can do with inline IL that can help you fine-tune your code without the JIT compiler's involvement; these may or may not be useful to you depending on your project. Custom IL sections can be accomplished both with inline IL and dynamic IL and can give you considerably more control over how your C# application runs.
Depending on how often you need to switch back and forth between managed and unmanaged, you can also create a separate application domain from your code, and load your unmanaged code into that... this can help you separate the managed/unmanaged concerns and thus avoid any costly switching back and forth.
But...
I will not give code, as to how you do it depends greatly upon what you're trying to accomplish. This is not the type of thing where you should just paste a code snippet into your project - you need to research the various methods, learn about their overheads and drawbacks, and then implement them with care, wisdom and due diligence.
Personally, I'd suggest learning C and offloading such computationally important tasks as an external service. This has the added advantage of allowing you to use processor affinity to best effect. It also allows you to write clean, normal, sensible C# for your head end.
But trust me, if your code is too slow and you think using registers for a few variables will speed things up... well... 95% of the time, it absolutely won't. C# does a tonne of work behind the scenes to wrangle those CPU resources as effectively as possible ... if you step in and snatch control of a few registers from it, it will usually end up producing less optimal code overall.
So, if pressed to guess at your best strategy, I'd suggest offloading that small task to a separate C program or service, and then use C# to throw it problems and gather output. Coupled with affinity, this can result in substantial speed gains. If you need to, it is also possible to set up shared memory between managed and unmanaged code - although this requires a lot of forward planning, may require experience using a good commercial debugger, and certainly isn't for the beginner.
Note that whichever way you go, portability WILL be adversely affected.
Re-evaluate whether you really need to do this at all. There are likely many more sensible and productive optimisations that can be done from within C#, in terms of the algorithm itself, which you should explore fully before going anywhere near the hardware.

You can't.
There aren't any real useful registers in IL and there is no guarantee that the target machine will have registers. The JIT or Ahead-of-time compiler will make those decisions for you.

Diagnose/Debug potential stack corruption .NET application

I think I have a curly one here... I have a WinForms application that crashes fairly regularly every hour or so when running as an x64 process. I suspect this is due to stack corruption and would like to know if anyone has seen a similar issue or has some advice for diagnosing and detecting the issue.
The program in question has no visible UI. It's just a message window that sits in the background and acts as a sort of 'middleware' between our other client programs and a server.
It dies in different ways on different machines. Sometimes it's an 'APPCRASH' dialog that reports a fault in ntdll.dll. Sometimes it's an 'APPCRASH' that reports our own dll as the culprit. Sometimes it's just a silent death. Sometimes our unhandled exception hook logs the error, sometimes it doesn't.
In the cases where Windows Error Reporting kicks in, I've examined memory dumps from several different crash scenarios and found the same managed exception in memory each time. This is the same exception I see reported as an unhandled exception in the cases where it logs before it dies.
I've also been lucky (?) enough to have the application crash while I was actively debugging with Visual Studio - and saw that same exception take down the program.
Now here's the kicker. This particular exception was thrown, caught and swallowed in the first few seconds of the program's life. I have verified this with additional trace logging and I have taken memory dumps of the application a couple of minutes after application startup and verified that exception is still sitting there in the heap somewhere. I've also run a memory profiler over the application and used that to verify that no other .NET object had a reference to it.
The code in question looks a bit like this (vastly simplified, but maintains the key points of flow control)
public class AClass
{
    public object FindAThing(string key)
    {
        object retVal = null;
        Collection<Place> places = GetPlaces();
        foreach (Place place in places)
        {
            try
            {
                retVal = place.FindThing(key);
                break;
            }
            catch {} // Guaranteed to only be a 'NotFound' exception
        }
        return retVal;
    }
}
public class Place
{
    public object FindThing(string key)
    {
        bool found = InternalContains(key); // <snip> some complex if/else logic
        if (code == success)
            return InternalFetch(key);
        throw new NotFoundException(/*UsefulInfo*/);
    }
}
The stack trace I see, both in the event log and when looking at the heap with windbg looks a bit like this.
Company.NotFoundException:
Place.FindThing()
AClass.FindAThing()
Now... to me that reeks of something like stack corruption. The exception is thrown and caught while the application is starting up. But the pointer to it survives on the stack for an hour or more, like a bullet in the brain, and then suddenly breaches a crucial artery, and the application dies in a puddle.
Extra clues:
The code within 'InternalFetch' uses some Marshal.[Alloc/Free]CoTask and pinvoke code. I have run FxCop over it looking for portability issues, and found nothing.
This particular manifestation of the issue only affects x64 code built in release mode (with code optimization on). The code I listed for the 'Place.FindThing' method reflects the optimized .NET code. The unoptimized code returns the found object as the last statement, not 'throw exception'.
We make some COM calls during startup before the above code is run... and in a scenario where the above problem is going to manifest, the very first COM call fails. (Exception is caught and swallowed). I have commented out that particular COM call, and it does not stop the exception sticking around on the heap.
The problem might also affect 32 bit systems, but if it does - then the problem does not manifest in the same spot. I was only sent (typical users!) a few pixels worth of a screen shot of an 'APP CRASH' dialog, but the one thing I could make out was 'StackHash_2264' in the faulting module field.
EDIT:
Breakthrough!
I have narrowed down the problem to a particular call to SetTimer.
The pInvoke looks like this:
[DllImport("user32")]
internal static extern IntPtr SetTimer(IntPtr hwnd, IntPtr nIDEvent, int uElapse, TimerProc CB);
internal delegate void TimerProc(IntPtr hWnd, uint nMsg, IntPtr nIDEvent, int dwTime);
There is a particular class that starts a timer in its constructor. Any timers set before that object is constructed work. Any timers set after that object is constructed work. Any timer set during that constructor causes the application to crash, more often than not. (I have a laptop that crashes maybe 95% of the time, but my desktop only crashes 10% of the time).
Whether the interval is set to 1 hour or 1 second seems to make no difference. The application dies when the timer is due - usually by throwing some previously handled exception as described above. The callback does not actually get executed. If I set the same timer on the very next line of managed code after the constructor returns - all is fine and happy.
I have had a debugger attached when the bad timer was about to fire, and it caused an access violation in 'DispatchMessage'. The timer callback was never called. I have enabled the MDAs that relate to managed callbacks being garbage collected, and it isn't triggering. I have examined the objects with sos and verified that the callback still existed in memory, and that the address it pointed to was the correct callback function.
If I run '!analyze -v' at this point, it usually (but not always) reports something along the lines of 'ERROR_SXS_CORRUPT_ACTIVATION_STACK'
Replacing the call to SetTimer with Microsoft's 'System.Windows.Forms.Timer' class also stops the crash. I've used Reflector on the class and can see that internally it still calls SetTimer - but does not register a procedure. Instead it has a native window that receives the callback. Its pInvoke definition actually looks wrong... it uses 'ints' for the eventId, where the MSDN documentation says it should be a UIntPtr.
Our own code originally also used 'int' for nIDEvent rather than IntPtr - I changed it during the course of this investigation - but the crash continued both before and after this declaration change. So the only real difference that I can see is that we are registering a callback, and the Windows class is not.
So... at this stage I can 'fix' the problem by shuffling one particular call to SetTimer to a slightly different spot. But I am still no closer to actually understanding what is so special about starting the timer within that constructor that causes this error. And I would dearly like to understand the root cause of this issue.

Just briefly thinking about it, it sounds like an x64 interop issue (i.e., calling 32-bit native functions from x64 managed code is fraught with danger). Does the problem go away if you force your application to compile for the x86 platform from within project properties?
You can read suggestions on forcing an x86 compile during x86/x64 development on Dotnetrocks. Richard Campbell's suggestion is that Visual Studio should default to the x86 platform and not AnyCPU.
http://www.dotnetrocks.com/default.aspx?showNum=341 (transcript).
With regard to advanced debugging, I have not had a chance to debug x64 interop code, but I hear that this book is a great resource: Advanced .NET Debugging.
Finally, one thing you might try is force Visual Studio to break when an exception is thrown.

Use something like DebugDiag for x64 or Windbg to write a dump on Kernel32!TerminateProcess and on second-chance .NET exceptions, which should give you the actual .ecxr context frame of the exception that occurred.
This should help you in identifying the call-stack for the process terminate.
IMO it could be mostly because of PInvoke calls. You could use Managed Debugging Assistants to debug these issues.
If MDA is used along with Windbg it would give out messages that would be helpful in debugging
Also I have found tools from the http://clrinterop.codeplex.com/ team are extremely handy when dealing with interop
EDIT
This should explain why it is not working in 64-bit: "Issue with callback method in SetTimer Windows API called from C# code".

This does sound like a corruption issue. I would go through all of your interop calls and ensure that all of the parameters to the DllImport'ed functions are the correct types. For example, using an int in place of an IntPtr will work in 32-bit code but can crash in 64-bit.
I would use a site like PInvoke.net to verify all of the signatures.
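To illustrate the int versus IntPtr point, here is a small sketch that needs no native DLL. It only simulates squeezing a pointer-sized value through a 32-bit int; PointerWidth and RoundTripThroughInt are made-up names, and the sample values are arbitrary.

```csharp
using System;

static class PointerWidth
{
    // Simulates a pointer-sized value being passed through an `int` parameter:
    // only the low 32 bits survive the trip.
    public static long RoundTripThroughInt(long handleValue) => unchecked((int)handleValue);

    static void Main()
    {
        Console.WriteLine($"64-bit process: {Environment.Is64BitProcess}");
        Console.WriteLine($"IntPtr.Size = {IntPtr.Size}, sizeof(int) = {sizeof(int)}");

        long small = 0x1234_5678;            // fits in 32 bits
        long large = 0x0000_7FF6_1234_5678;  // shaped like a 64-bit address

        Console.WriteLine(RoundTripThroughInt(small) == small); // True
        Console.WriteLine(RoundTripThroughInt(large) == large); // False
    }
}
```

This is why a wrong declaration can appear to work for a long time: as long as the values happen to fit in 32 bits nothing is lost, and the failure only shows up once a handle or pointer lands above that range.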

What are the disadvantages of using P/Invoke

I have been working with the .NET Framework for a good while now. During that time there were some situations where I used P/Invoke to do things that I couldn't do with managed code.
However, I never knew exactly what the real disadvantages of using it were, which is also why I tried to avoid it as far as possible.
If I google it, I find various posts; some paint a catastrophic picture and recommend never using it. What are the major drawbacks and problems of an app that uses one or more P/Invoke calls, and when do they apply? (Not from a performance perspective, more along the lines of "could not be executed from a network share".)

Marshalling between managed/unmanaged types has an additional overhead
Doesn't work in medium trust
Not quite intuitive and could lead to subtle bugs like leaking handles, corrupting memory, ...
It could take some time before getting it right when looking at an exported C function
Problems when migrating from x86 to x64
When something goes wrong you can't simply step into the unmanaged code to debug and understand why you are getting exceptions. You can't even open it in Reflector :-)
Restricts cross platform interoperability (ie: you can't run under Linux if you rely on a Windows-only library).
Conclusion: use only if there's no managed alternative or in performance critical applications where only unmanaged libraries exist to provide the required speed.

There are three different scenarios where you use P/Invoke, and different disadvantages arise (or don't) in each.
You need to use a Windows capability that is not provided in the .NET Framework. Your only choice is P/Invoke, and the disadvantage you incur is that now your app will only work on Windows.
You need to use a C-style DLL provided to you by someone else and there is no managed equivalent. Now you have to deploy the DLL with your app, and you have the problems of marshalling time and possible screwups in your declaration of the function (eg IntPtr/int), string marshalling, and other things people find difficult.
You have some old native code that you wrote or control and you feel like accessing it from managed code without porting it. Here you have all the problems of (2) but you do have the option of porting it instead. I'm going to assert that you will cause more bugs by porting it than you will by P/Invoking badly. You may also cause a bigger perf problem if the code you're porting makes a lot of calls to native code such as the CRT.
My bottom line is that while P/Invoke is non trivial, telling people to avoid it is bad advice. Once you have it right, the runtime marshalling costs are all that remain. These may be less than the runtime costs you would get with porting the code.

Transfer of control and data between C and C#

The C# main program needs to call a C program, GA.c. This C code executes many functions, and one function, initialize(), calls the objective() function. But this objective function needs to be written in C#. This call is in a loop in the C code, and the C code needs to continue execution after the return from objective() until its main is over, and then return control to the C# main program.
C# main()
{
//code
call to GA in C;
//remaining code;
}
GA in C:
Ga Main()
{
//code
call to initialize function();
//remaining code
}
initialize function() in GA
{
for(some condition)
{
//code
call to objective(parameter) function in C#;
//code
}
}
How do we do this?

Your unmanaged C code needs to be in a library, not an executable. When a program "calls another program", that means it executes another executable, and any communication between the two processes is either in the form of command-line arguments to the callee coupled with an integer return value to the caller, or via some sort of IPC*. Neither of which allows the passing of a callback function (although equivalent functionality can be built with IPC, it's a lot of trouble).
From this C library, you'll need to export the function(s) you wish to be entry points from the C# code. You can then call this/these exported function(s) with platform invoke in C#.
C library (example for MSVC):
#include <windows.h>
BOOL APIENTRY DllMain(HMODULE hModule, DWORD ul_reason_for_call, LPVOID lpReserved){
    switch(ul_reason_for_call){
    case DLL_PROCESS_ATTACH:
    case DLL_THREAD_ATTACH:
    case DLL_THREAD_DETACH:
    case DLL_PROCESS_DETACH:
        break;
    }
    return TRUE;
}
#ifdef __cplusplus
extern "C"
#endif
__declspec(dllexport)
void WINAPI Foo(int start, int end, void (CALLBACK *callback)(int i)){
    for(int i = start; i <= end; i++)
        callback(i);
}
C# program:
using System;
using System.Runtime.InteropServices;
static class Program{
    delegate void FooCallback(int i);
    [DllImport(@"C:\Path\To\Unmanaged\C.dll")]
    static extern void Foo(int start, int end, FooCallback callback);
    static void Main(){
        FooCallback callback = i=>Console.WriteLine(i);
        Foo(0, 10, callback);
        GC.KeepAlive(callback); // to keep the GC from collecting the delegate
    }
}
This is working example code. Expand it to your needs.
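For the objective() case in the question, the same idea can be tried out without compiling any C first. The sketch below is only an illustration: the Objective signature (double in, double out) is a guess at what the GA code needs, the names are made up, and the "C side" is faked by converting the function pointer straight back into a delegate inside the same process.

```csharp
using System;
using System.Runtime.InteropServices;

static class CallbackSketch
{
    // Hypothetical signature for objective(); Cdecl matches a plain C
    // function pointer such as `double (*objective)(double)`.
    [UnmanagedFunctionPointer(CallingConvention.Cdecl)]
    public delegate double Objective(double x);

    // Kept in a static field so the delegate cannot be collected
    // while native code may still hold the pointer.
    static readonly Objective s_objective = x => x * x;

    public static double CallThroughPointer(double x)
    {
        // This pointer is what the exported C function would receive.
        IntPtr fn = Marshal.GetFunctionPointerForDelegate(s_objective);

        // Stand-in for the C side: make the pointer callable again and invoke it.
        Objective viaPointer = Marshal.GetDelegateForFunctionPointer<Objective>(fn);
        return viaPointer(x);
    }

    static void Main()
    {
        Console.WriteLine(CallThroughPointer(3.0));
    }
}
```

Holding the delegate in a field rather than relying on GC.KeepAlive is the safer choice when the native code keeps the pointer around after the P/Invoke call returns.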
A note about P/Invoke
Not that you asked, but there are two typical cases where platform invoke is used:
To leverage "legacy" code. A couple of good uses here:
To make use of existing code from your own code base. For instance, your company might want a brand new GUI for their accounting software, but choose to P/Invoke to the old business layer so as to avoid the time and expense of rewriting and testing a new implementation.
To interface with third-party C code. For instance, a lot of .NET applications use P/Invoke to access native Windows API functionality not exposed through the BCL.
To optimize performance-critical sections of code. Finding a bottleneck in a certain routine, a developer might decide to drop down to native code for this routine in an attempt to get more speed.
It is in this second case that there is usually a misjudgment. A number of considerations usually prove this to be a bad idea:
There is rarely a significant speed benefit to be obtained by using unmanaged code. This is a hard one for a lot of developers to swallow, but well-written managed code usually (though not always) performs nearly as fast as well-written unmanaged code. In a few cases, it can perform faster. There are some good discussions on this topic here on SO and elsewhere on the Net, if you're interested in searching for them.
Some of the techniques that can make unmanaged code more performant can also be done in C#. Primarily, I'm referring here to unsafe code blocks in C#, which allow one to use pointers, bypassing array boundary checking. In addition, straight C code is usually written in a procedural fashion, eliminating the slight overhead that comes from object-oriented code. C# can also be written procedurally, using static methods and static fields. While unsafe code and gratuitous use of static members are generally best avoided, I'd say that they are preferable to mixing managed and unmanaged code.
Managed code is garbage-collected, while unmanaged code is usually not. While this is mostly a speed benefit while coding, it is sometimes a speed benefit at runtime, too. When one has to manage one's own memory, there is often a bit of overhead involved, such as passing an additional parameter to functions denoting the size of a block of memory. There is also eager destruction and deallocation, a necessity in most unmanaged code, whereas managed code can offload these tasks to the lazy collector, where they can be performed later, perhaps when the CPU isn't so busy doing real work. From what I've read, garbage collection also means that allocations can be faster than in unmanaged code. Lastly, some amount of manual memory management is possible in C#, using Marshal.AllocHGlobal and unsafe pointers, and this might allow one to make fewer larger allocations instead of many smaller ones. Another technique is to convert types used in large arrays to value types instead of reference types, so that the memory for the entire array is allocated in one block.
Often overlooked is the cost within the platform invoke layer. This can outweigh small native code performance gains, especially when many transitions from managed to unmanaged (or vice versa, such as with your callback function) must occur. And this cost can rise considerably when marshaling must take place.
There's a maintenance hassle when splitting your code between managed and unmanaged components. It means maintaining logic in two different projects, possibly using two different development environments, and possibly even requiring two different developers with different skill sets. The typical C# developer is not a good C developer, and vice versa. At minimum, having the code split this way will be a mental stumbling block for any new maintainers of the project.
Oftentimes, a better performance gain can be had by just rethinking the existing implementation, and rewriting it with a new approach. In fact, I'd say that most real performance gains that are achieved when bottleneck code is rewritten for a "faster" platform are probably directly due to the developer being forced to rethink the problem.
Sometimes, the code that is chosen to be dropped out into unmanaged is not the real bottleneck. People too often make assumptions about what is slowing them down without doing actual profiling to verify. Profiling can often reveal inefficiencies that can be corrected fairly easily without dropping down to a lower-level platform.
If you find yourself faced with a temptation to mix platforms to increase performance, keep these pitfalls in mind.
* There is one more thing, sort of. The parent process can redirect the stdin and stdout streams of the child process, implementing character-based message passing via stdio. This is really just an IPC mechanism, it's just one that's been around longer than the term "IPC" (AFAIK).

This is known as a callback. When you create an instance of GA, pass it your c# objective() method as a delegate (a delegate is reference to a class method). Look for the MSDN help topic on delegates in C#.
I don't know the proper syntax for the C side of this. And there are sure to be some special considerations for calling out to unmanaged code. Someone else is bound to provide the whole answer. :)

Strategies For Tracking Down Memory Leaks When You've Done Everything Wrong

My program, alas, has a memory leak somewhere, but I'll be damned if I know what it is.
Its job is to read in a bunch of ~2MB files, do some parsing and string replacement, then output them in various formats. Naturally, this means a lot of strings, and so doing memory tracing shows that I have a lot of strings, which is exactly what I'd expect. The structure of the program is a series of classes (each in their own thread, because I'm an idiot) that acts on an object that represents each file in memory. (Each object has an input queue that uses a lock on both ends. While this means I get to run this simple processing in parallel, it also means I have multiple 2MB objects sitting in memory.) Each object's structure is defined by a schema object.
My processing classes raise events when they've done their processing and pass a reference to the large object that holds all my strings to add it to the next processing object's queue. Replacing the event with a function call to add to the queue does not stop the leak. One of the output formats requires me to use an unmanaged object. Implementing Dispose() on the class does not stop the leak. I've replaced all the references to the schema object with an index name. No dice. I got no idea what's causing it, and no idea where to look. The memory trace doesn't help because all I see are a bunch of strings being created, and I don't see where the references are sticking in memory.
We're pretty much going to give up and roll back at this point, but I have a pathological need to know exactly how I messed this up. I know Stack Overflow can't exactly comb my code, but what strategies can you suggest for tracking this leak down? I'm probably going to do this in my own time, so any approach is viable.

One technique I would try is to systematically reduce the amount of code you need to demonstrate the problem without making the problem go away. This is informally known as "divide and conquer" and is a powerful debugging technique. Once you have a small example that demonstrates the same problem, it will be much easier for you to understand. Perhaps the memory problem will become clearer at that point.

There is only one person who can help you. That person's name is Tess Ferrandez. (hushed silence)
But seriously, read her blog (the first article is pretty pertinent). Seeing how she debugs this stuff will give you a lot of deep insight into what's going on with your problem.

I like the CLR Profiler from Microsoft. It provides some great tools for visualizing the managed heap and tracking down leaks.

I use the dotTrace profiler for tracking down memory leaks. It's a lot more deterministic than methodological trial and error and turns up results a lot faster.
For any actions that the system performs, I take a snapshot then run a few iterations of the function, then take another snapshot. Comparing the two will show you all the objects that were created in between but were not freed. You can then see the stack frame at the point of their creation, and therefore work out what instances are not being freed.
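If you don't have a profiler at hand, a very crude version of the same before/after comparison can be done in code. This sketch is not what dotTrace does internally; it only compares total managed heap size via GC.GetTotalMemory, and SnapshotDiff, LeakyOperation and HeapGrowth are made-up names with a deliberately planted leak.

```csharp
using System;
using System.Collections.Generic;

static class SnapshotDiff
{
    // Hypothetical leak: a static list that is never cleared.
    static readonly List<byte[]> s_retained = new List<byte[]>();

    public static void LeakyOperation() => s_retained.Add(new byte[100_000]);

    // Managed heap growth in bytes after running `action` `iterations` times.
    public static long HeapGrowth(Action action, int iterations)
    {
        long before = GC.GetTotalMemory(forceFullCollection: true);
        for (int i = 0; i < iterations; i++)
            action();
        long after = GC.GetTotalMemory(forceFullCollection: true);
        return after - before;
    }

    static void Main()
    {
        long leaky = HeapGrowth(LeakyOperation, 100);
        long benign = HeapGrowth(() => GC.KeepAlive(new byte[100_000]), 100);
        Console.WriteLine($"leaky operation:  {leaky:N0} bytes retained");
        Console.WriteLine($"benign operation: {benign:N0} bytes retained");
    }
}
```

Unlike a real profiler this tells you nothing about which objects were retained or where they were allocated, but it is enough to confirm whether a given operation keeps growing the heap.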

Get this: http://www.red-gate.com/Products/ants_profiler/index.htm
The memory and performance profiling are awesome. Being able to actually see proper numbers instead of guessing makes optimisation pretty fast. I've used it quite a bit at work for reducing the memory footprint of our main app.

Add code to the constructor of the unmanaged object to log when it's constructed, and store a unique ID. Use that unique ID when the object is destroyed again, and you can at least tell which ones are going astray.
Grep the code for every place you construct a new object; follow that code path to see if you have a matching destroy.
Add chaining pointers to the constructed objects, so you have a link to the object constructed before and after the current one. Then you can sweep through them later.
Add reference counters.
Is there a "debug malloc" available?

The managed debugging add-in SoS (Son of Strike) is immensely powerful for tracking down managed memory 'leaks', since they are, by definition, discoverable from the GC roots.
It will work in WinDbg or Visual studio (though it is in many respects easier to use in WinDbg)
It is not at all easy to get to grips with. Here is a tutorial
I would second the recommendation to check out Tess Ferrandez's blog too.

How do you know for a fact that you actually have a memory leak?
One other thing: you write that your processing classes are using events. A registered event handler is referenced by the object that raises the event, so as long as that object is alive, the object that owns the handler stays alive too - i.e. the GC cannot collect it. Make sure you de-register all event handlers if you want your objects to be garbage collected.
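Here is a small sketch of that effect, using a WeakReference to watch whether a subscriber can be collected. Publisher, Subscriber and EventLeakDemo are made-up names, and the exact GC behaviour can vary between runtimes and build configurations, so treat the second result as typical rather than guaranteed.

```csharp
using System;
using System.Runtime.CompilerServices;

class Publisher
{
    public event EventHandler Changed;
    public void Raise() => Changed?.Invoke(this, EventArgs.Empty);
}

class Subscriber
{
    public void OnChanged(object sender, EventArgs e) { }
}

static class EventLeakDemo
{
    // Creates the subscriber in its own stack frame so that no local
    // variable in Main keeps it alive.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static WeakReference Subscribe(Publisher publisher)
    {
        var subscriber = new Subscriber();
        publisher.Changed += subscriber.OnChanged;
        return new WeakReference(subscriber);
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    public static void Unsubscribe(Publisher publisher, WeakReference weak)
    {
        if (weak.Target is Subscriber s)
            publisher.Changed -= s.OnChanged;
    }

    public static void Collect()
    {
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();
    }

    static void Main()
    {
        var publisher = new Publisher();
        WeakReference weak = Subscribe(publisher);

        Collect();
        Console.WriteLine($"subscribed, alive after GC:   {weak.IsAlive}");

        Unsubscribe(publisher, weak);
        Collect();
        Console.WriteLine($"unsubscribed, alive after GC: {weak.IsAlive}");

        GC.KeepAlive(publisher); // the publisher outlives both checks
    }
}
```

While the handler is registered, the publisher's delegate list is the only thing holding the subscriber, which is exactly the kind of reference that is easy to miss when reading the code.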

Be careful how you define "leak". "Uses more memory" or even "uses too much memory" is not the same as "memory leak". This is especially true in a garbage-collected environment. It may simply be that GC hasn't needed to collect the extra memory you're seeing used. Also be careful about the difference between virtual memory use and physical memory use.
Finally not all "memory leaks" are caused by "memory" sorts of issues. I was once told (not asked) to fix an urgent memory leak that was causing IIS to restart frequently. In fact, I did profiling and found I was using a lot of strings through the StringBuilder class. I implemented an object pool (from an MSDN article) for the StringBuilders, and memory usage went down substantially.
IIS still restarted just as frequently. This was because there was no memory leak. Instead, there was unmanaged code that claimed to be thread-safe but was not. Using it in a web service (multiple threads) caused it to write all over the C Runtime Library heap. Since nobody was looking for unmanaged exceptions, nobody saw this until I happened to do some profiling with AQtime from Automated QA. It happens to have an events window, that happened to display the cries of pain from the C Runtime Library.
Placed locks around the calls to the unmanaged code, and the "memory leak" went away.

If your unmanaged object really is the cause of the leak, you may want to have it call AddMemoryPressure when it allocates unmanaged memory and RemoveMemoryPressure in Finalize/Dispose/where ever it deallocates the unmanaged memory. This will give the GC a better handle on the situation, because it may not realize there's a need to schedule collection otherwise.
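A minimal sketch of that pattern, assuming the unmanaged memory is allocated with Marshal.AllocHGlobal (your unmanaged object may allocate differently, in which case you would report whatever size it uses). NativeBuffer is a made-up wrapper class.

```csharp
using System;
using System.Runtime.InteropServices;

sealed class NativeBuffer : IDisposable
{
    readonly long _size;
    IntPtr _ptr;

    public NativeBuffer(int size)
    {
        _size = size;
        _ptr = Marshal.AllocHGlobal(size); // unmanaged allocation the GC cannot see
        GC.AddMemoryPressure(_size);       // tell the GC what this object really costs
    }

    public bool IsAllocated => _ptr != IntPtr.Zero;

    public void Dispose()
    {
        Release();
        GC.SuppressFinalize(this);
    }

    // Safety net in case Dispose is never called.
    ~NativeBuffer() => Release();

    void Release()
    {
        if (_ptr == IntPtr.Zero) return;   // already released, or never allocated
        Marshal.FreeHGlobal(_ptr);
        _ptr = IntPtr.Zero;
        GC.RemoveMemoryPressure(_size);    // must match the amount added earlier
    }
}

static class MemoryPressureDemo
{
    static void Main()
    {
        using (var buffer = new NativeBuffer(16 * 1024 * 1024))
        {
            Console.WriteLine($"allocated: {buffer.IsAllocated}");
        }
    }
}
```

The important detail is that every AddMemoryPressure is balanced by exactly one RemoveMemoryPressure of the same size, which is why the release logic is guarded against running twice.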

You mentioned that you're using events. Are you removing the handlers from those events when you're done with your object? I've found that 'loose' event handlers will cause a lot of memory leak problems if you add a bunch of handlers without removing them when you're done.

The best memory profiling tool for .Net is this:
http://memprofiler.com
Also, while I'm here, the best performance profiler for .Net is this:
http://www.yourkit.com/dotnet/download/index.jsp
They are also great value for money, have low overhead and are easy to use. Anyone serious about .Net development should consider both of these a personal investment and purchase immediately. Both of them have a free trial.
I work on a real time game engine with over 700k lines of code written in C# and have spent hundreds of hours using both these tools. I have used the Sci Tech product since 2002 and YourKit! for the last three years. Although I've tried quite a few of the others I have always returned to these.
IMHO, they are both absolutely brilliant.

Similar to Charlie Martin, you can do something like this:
static unsigned __int64 _foo_id = 0;
foo::foo()
{
    ++_foo_id;
    if (_foo_id == MAGIC_BAD_ALLOC_ID)
        DebugBreak();
    std::wcerr << L"foo::foo # " << _foo_id << std::endl;
}
foo::~foo()
{
    --_foo_id;
    std::wcerr << L"foo::~foo # " << _foo_id << std::endl;
}
If you can recreate it, even once or twice with the same allocation id, this will let you look at what is happening right then and there (obviously TLS/threading has to be handled as well, if needed, but I left it out for clarity).
