I use a dll compiled from C++ code (LPSolve, see http://lpsolve.sourceforge.net/5.5/), from my C# code. I use the API to build a linear programming model, and subsequently solve it. I call functions such as:
[DllImport("lpsolve55.dll", SetLastError = true)]
public static extern bool add_columnex
(int lp, int count, double[] column, int[] rowno);
I am wondering what happens, memorywise, when I call such a function and the ints and arrays that I created in managed code leave scope (in the c# code). Will they be eligible for garbage collection? What does this mean for the c++ code? Or are the ineligible, and in that case: why?
Because the function prototype is using plain old datatypes and arrays, the memory for these values is pinned in place and then the native code acts directly on the data. Then when the function returns, the memory is unpinned and can be garbage collected.
In other words, they never leave the scope.
In terms of the C++ code, if it needs to store any of the data then it will need to take a copy of the data passed into it and then manage that memory itself.
I think Nick has covered the basic part, this is just to add more information. Array of int/double are considered as blittable types (types that have same layout in managed/unmanaged worlds) - these are typically get pinned when Marshalled. So you don't have to about GC. Also, what you have done indicates passing array by value - in such case, marshaller treats this as In parameter - in case, your unmanaged dll is going to update values in array then I would suggest you to mark it as In/Out parameter (e.g. [In, Out]double[] column). For more info:
Blittable and Non-Blittable Types:
http://msdn.microsoft.com/en-us/library/75dwhxf7.aspx
Copying and Pinning: http://msdn.microsoft.com/en-us/library/23acw07k.aspx
Marshalling Arrays: http://learning.infocollections.com/ebook%202/Computer/Programming/General/Programming.With.Microsoft.Dot.NET/LiB0948.htm
If your app doesn't crash with an AccessViolationException after using it for a while (past a garbage collection) then it is pretty safe to assume that the unmanaged code made a copy of the array elements you passed it. This is the normal thing to do, the library would otherwise be very hard to use from native code as well. There also ought to be an API function that lets you clear or re-initialize the model, that one should be releasing the memory.
The C or C++ function has a prototype like this:
bool add_columnex(int lp, int count, double[] column, int[] rowno);
The parameters lp and count are passed by value. The parameters column and rowno are also passed by value but the actual data is passed by reference, the function add_columnex will have to dereference column and rowno. This dereferentiation is only allowed during the function call. When the function returns, these parameters are out of scope. The kind of derenferentation must be in the contract of the interface.
When the function returns all arguments get out of scope and there is no mean that the function could do anything after call. If the function stores copies of the arguments, i.e. the address of the double or rowno array, this must be allowed by the contract. In that case you would get in trouble. A better contract would be that the function must copy the dereferenced data.
Related
I am currently writing a thin C# binding for OpenGL. I've just recently implemented the OpenGL GenVertexArrays function, which has the following signature:
OpenGL Documentation on glGenVertexArrays.
Essentially, you pass it an array in which to store generated object values for the vertex arrays created by OpenGL.
In order to create the binding, I use delegates as glGenVertexArrays is an OpenGL extension function, so I have to load it dynamically using wglGetProcAddress. The delegate signature I have defined in C# looks like this:
[SuppressUnmanagedCodeSecurity]
[UnmanagedFunctionPointer(CallingConvention.StdCall)]
private delegate void glGenVertexArrays(uint amount, uint[] array);
The function pointer is retrieved and converted to this delegate using Marshal.GetDelegateForFunctionPointer, like this:
IntPtr proc = wglGetProcAddress(name);
del = Marshal.GetDelegateForFunctionPointer(proc, delegateType);
Anyways, here's what bothers me:
In any official documentation I can find on default marshalling behaviour for reference types (which includes arrays), is this:
By default, reference types (classes, arrays, strings, and interfaces)
passed by value are marshaled as In parameters for performance
reasons. You do not see changes to these types unless you apply
InAttribute and OutAttribute (or just OutAttribute) to the method
parameter.
This is taken from this MSDN page: MSDN page on directional attributes
However, as can be seen from my delegate signatures, the [In] and [Out] directional attributes have not been used on the array of unsigned integers, meaning when I call this function I should actually not be able to see the generated object values which OpenGL should have stored in them. Except, I am. Using this signature, I can the following result when running the debugger:
As can be seen, the call absolutely did affect the array, even though I did not explicitly use the [Out] attribute. This is not, from what I understand, a result I should expect.
Does anyone know the reason behind this? I know it might seem as a minor deal, but I am very curious to know why this seems to break the default marshalling behaviour described by Microsoft. Is there some behind-the-scenes stuff going on when invoking delegates compared to pure platform invoke prototypes? Or am I misinterpreting the documentation?
[EDIT]
For anyone curious, the public method that invokes the delegate is defined on a static "GL" class, and is as followed:
public static void GenVertexArrays(uint amount, uint[] array)
{
InvokeExtensionFunction<glGenVertexArrays>()(amount, array);
}
It is not mentioned on the documentation page you linked, but there is another topic dedicated to the marshaling of arrays, where it says:
With pinning optimization, a blittable array can appear to operate as an In/Out parameter when interacting with objects in the same apartment.
Both conditions are met in your case: array of uint is blittable, and there is no machine-to-machine marshaling. It is still a good idea to declare it [Out], so your intention is documented within the code.
The documentation is correct in the general case. But uint is a bit special, it is a blittable type. An expensive word that means that the pinvoke marshaller does not have to do anything special to convert the array element values. An uint in C# is exactly the same type as an unsigned int in C. Not a coincidence at all, it is the kind of type that a processor can handle natively.
So the marshaller can simply pin the array and pass a pointer to the first array element as the second argument. Very fast, always what you want. And the function scribbles directly into the managed array, so copying the values back is not necessary. A bit dangerous too, you never ever want to lie about the amount argument, GC heap corruption is an excessively ugly bug to diagnose.
Most simple value types and structs of simple values types are blittable. bool is a notable exception. You'll otherwise never have to be sorry for using [Out] even if it is not necessary. The marshaller simply ignores it here.
I have a object that is my in memory state of the program and also have some other worker functions that I pass the object to to modify the state. I have been passing it by ref to the worker functions. However I came across the following function.
byte[] received_s = new byte[2048];
IPEndPoint tmpIpEndPoint = new IPEndPoint(IPAddress.Any, UdpPort_msg);
EndPoint remoteEP = (tmpIpEndPoint);
int sz = soUdp_msg.ReceiveFrom(received_s, ref remoteEP);
It confuses me because both received_s and remoteEP are returning stuff from the function. Why does remoteEP need a ref and received_s does not?
I am also a c programmer so I am having a problem getting pointers out of my head.
Edit:
It looks like that objects in C# are pointers to the object under the hood. So when you pass an object to a function you can then modify the object contents through the pointer and the only thing passed to the function is the pointer to the object so the object itself is not being copied. You use ref or out if you want to be able to switch out or create a new object in the function which is like a double pointer.
Short answer: read my article on argument passing.
Long answer: when a reference type parameter is passed by value, only the reference is passed, not a copy of the object. This is like passing a pointer (by value) in C or C++. Changes to the value of the parameter itself won't be seen by the caller, but changes in the object which the reference points to will be seen.
When a parameter (of any kind) is passed by reference, that means that any changes to the parameter are seen by the caller - changes to the parameter are changes to the variable.
The article explains all of this in more detail, of course :)
Useful answer: you almost never need to use ref/out. It's basically a way of getting another return value, and should usually be avoided precisely because it means the method's probably trying to do too much. That's not always the case (TryParse etc are the canonical examples of reasonable use of out) but using ref/out should be a relative rarity.
Think of a non-ref parameter as being a pointer, and a ref parameter as a double pointer. This helped me the most.
You should almost never pass values by ref. I suspect that if it wasn't for interop concerns, the .Net team would never have included it in the original specification. The OO way of dealing with most problem that ref parameters solve is to:
For multiple return values
Create structs that represent the multiple return values
For primitives that change in a method as the result of the method call (method has side-effects on primitive parameters)
Implement the method in an object as an instance method and manipulate the object's state (not the parameters) as part of the method call
Use the multiple return value solution and merge the return values to your state
Create an object that contains state that can be manipulated by a method and pass that object as the parameter, and not the primitives themselves.
You could probably write an entire C# app and never pass any objects/structs by ref.
I had a professor who told me this:
The only place you'd use refs is where you either:
Want to pass a large object (ie, the objects/struct has
objects/structs inside it to multiple levels) and copying it would
be expensive and
You are calling a Framework, Windows API or other API that requires
it.
Don't do it just because you can. You can get bit in the ass by some
nasty bugs if you start changing the values in a param and aren't
paying attention.
I agree with his advice, and in my five plus years since school, I've never had a need for it outside of calling the Framework or Windows API.
Since received_s is an array, you're passing a pointer to that array. The function manipulates that existing data in place, not changing the underlying location or pointer. The ref keyword signifies that you're passing the actual pointer to the location and updating that pointer in the outside function, so the value in the outside function will change.
E.g. the byte array is a pointer to the same memory before and after, the memory has just been updated.
The Endpoint reference is actually updating the pointer to the Endpoint in the outside function to a new instance generated inside the function.
Think of a ref as meaning you are passing a pointer by reference. Not using a ref means you are passing a pointer by value.
Better yet, ignore what I just said (it's probably misleading, especially with value types) and read This MSDN page.
While I agree with Jon Skeet's answer overall and some of the other answers, there is a use case for using ref, and that is for tightening up performance optimizations. It has been observed during performance profiling that setting the return value of a method has slight performance implications, whereas using ref as an argument whereby the return value is populated into that parameter results in this slight bottleneck being removed.
This is really only useful where optimization efforts are taken to extreme levels, sacrificing readability and perhaps testability and maintainability for saving milliseconds or perhaps split-milliseconds.
my understanding is that all objects derived from Object class are passed as pointers whereas ordinary types (int, struct) are not passed as pointers and require ref. I am nor sure about string (is it ultimately derived from Object class ?)
Ground zero rule first, Primitives are passed by value(stack) and Non-Primitive by reference(Heap) in the context of TYPES involved.
Parameters involved are passed by Value by default.
Good post which explain things in details.
http://yoda.arachsys.com/csharp/parameters.html
Student myStudent = new Student {Name="A",RollNo=1};
ChangeName(myStudent);
static void ChangeName(Student s1)
{
s1.Name = "Z"; // myStudent.Name will also change from A to Z
// {AS s1 and myStudent both refers to same Heap(Memory)
//Student being the non-Primitive type
}
ChangeNameVersion2(ref myStudent);
static void ChangeNameVersion2(ref Student s1)
{
s1.Name = "Z"; // Not any difference {same as **ChangeName**}
}
static void ChangeNameVersion3(ref Student s1)
{
s1 = new Student{Name="Champ"};
// reference(myStudent) will also point toward this new Object having new memory
// previous mystudent memory will be released as it is not pointed by any object
}
We can say(with warning) Non-primitive types are nothing but Pointers
And when we pass them by ref we can say we are passing Double Pointer
I know this question is old, but I would to brief the answer for any new curious newbie developers.
Try as much as you can to avoid using ref/out, it is against writing a clean code and bad quality code.
Passing types by reference (using out or ref) requires experience with pointers, understanding how value types and reference types differ and handling methods that have multiple return values. Also, the difference between out and ref parameters is not widely understood.
please check
https://learn.microsoft.com/en-us/dotnet/fundamentals/code-analysis/quality-rules/ca1045
In C, there is a function that accepts a pointer to a function to perform comparison:
[DllImport("mylibrary.dll", CallingConvention = CallingConvention.Cdecl)]
private static extern int set_compare(IntPtr id, MarshalAs(UnmanagedType.FunctionPtr)]CompareFunction cmp);
In C#, a delegate is passed to the C function:
[UnmanagedFunctionPointer(CallingConvention.Cdecl)]
delegate int CompareFunction(ref IntPtr left, ref IntPtr right);
Currently, I accept Func<T,T,int> comparer in a constructor of a generic class and convert it to the delegate. "mylibrary.dll" owns data, managed C# library knows how to convert pointers to T and then compare Ts.
//.in ctor
CompareFunction cmpFunc = (ref IntPtr left, ref IntPtr right) => {
var l = GenericFromPointer<T>(left);
var r = GenericFromPointer<T>(right);
return comparer(l, r);
};
I also have an option to write a CompareFunction in C for most important data types that are used in 90%+ cases, but I hope to avoid modifications to the native library.
The question is, when setting the compare function with P/Invoke, does every subsequent call to that function from C code incurs marshaling overheads, or the delegate is called from C as if it was initially written in C?
I imagine that, when compiled, the delegate is a sequence of machine instructions in memory, but do not understand if/why C code would need to ask .NET to make the actual comparison, instead of just executing these instructions in place?
I am mostly interested in better understanding how interop works. However, this delegate is used for binary search on big data sets, and if every subsequent call has some overheads as a single P/Invoke, rewriting comparers in native C could be a good option.
I imagine that, when compiled, the delegate is a sequence of machine instructions in memory, but do not understand if/why C code would need to ask .NET to make the actual comparison, instead of just executing these instructions in place?
I guess you're a bit confused about how .NET works. C doesn't ask .NET to execute code.
First, your lambda is turned into a compiler-generated class instance (because you're closing over the comparer variable), and then a delegate to a method of this class is used. And it's an instance method since your lambda is a closure.
A delegate is similar to a function pointer. So, like you say, it points to executable code. Whether this code is generated from a C source or a .NET source is irrelevant at this point.
It's in the interop case when this starts to matter. P/Invoke won't pass your delegate as-is as a function pointer to C code. It will pass a function pointer to a thunk which calls the delegate. Visual Studio will display this as a [Native to Managed Transition] stack frame. This is needed for different reasons such as marshaling or passing additional parameters (like the instance of the class backing your lambda for instance).
As to the performance considerations of this, here's what MSDN says, quite obviously:
Thunking. Regardless of the interoperability technique used, special transition sequences, which are known as thunks, are required each time a managed function calls an native function, and vice-versa. Because thunking contributes to the overall time that it takes to interoperate between managed code and native code, the accumulation of these transitions can negatively affect performance.
So, if your code requires a lot of transitions between managed and native code, you should get better performance by doing your comparisons on the C side if possible, so that you avoid the transitions.
In another topic, a nice guy told me by quote Eric Lippert's words:The significance of static has to do with the knowledge and certainties the compiler has at compile time of a certain class/struct/field what have you. It has nothing to do with memory locations and them being fixed or not, etc.
But I'm still not so sure, because the compiler allows something shown below happen.
struct MyStruct
{
public static int[] Arr = {1,3,5};
}
static void Test<T>(ref T t) where T:struct
{
Console.WriteLine (t);
}
void Main()
{
Test(ref MyStruct.Arr[2]);//output: as expected 5
}
Are the ref arguments totally different things compare to the c++ references, or a behind the scene pin happens every time some args are passed by ref? If the static members are movable, how does the runtime guarantee the address of an array element won't change during the execution of the called function? I've learned from an experiment that the return values of objects' Item prop other than arrays' are not allowed to be passed byRef. I thought that's because array elements are allocated in a chunk of continuous memory, but if the whole array is movable, how can one take an address of its elements?
I'm kinda stuck by this uncertainty. I'd very much appreciate it if someone could give some certain answer. Thanks in advance!
~~~~~~~~~~~~~~~~~~
Trying to understand it:
So, any managed operation as long as the compiler allows it happen, We shouldn't sweat it, right? I have some C/C++ background, I think I understand the meaning of "static" pretty well for c++, only the movability thing of managed code make me dubious. Any managed object, no matter it's on a stack or managed heap, A ref arg can always point to it correctly, right?
C# ref arguments aren't totally different from C++ references, but they are different in this respect.
C# ref arguments are known to the garbage collector and it will adjust them if it promotes objects to a different generation.
C++ references are invisible to the .NET garbage collector, they will break if the target is not pinned and the garbage collector runs.
(C++/CLI supports both .NET references and native references)
If the static members are movable, how does the runtime guarantee the address of a array element won't change during the execution of the called function?
It doesn't. But the function will use the updated address, because it's .NET code also.
And none of this changes depending on whether there's a static field referring to the array. (In fact the array itself isn't static, only the field referring to it. This fact makes the whole question nonsensical.)
In delphi the "ZeroMemory" procedure, ask for two parameters.
CODE EXAMPLE
procedure ZeroMemory(Destination: Pointer; Length: DWORD);
begin
FillChar(Destination^, Length, 0);
end;
I want make this, or similar in C#... so, what's their equivalent?
thanks in advance!
.NET framework objects are always initialized to a known state
.NET framework value types are automatically 'zeroed' -- which means that the framework guarantees that it is initialized into its natural default value before it returns it to you for use. Things that are made up of value types (e.g. arrays, structs, objects) have their fields similarly initialized.
In general, in .NET all managed objects are initialized to default, and there is never a case when the contents of an object is unpredictable (because it contains data that just happens to be in that particular memory location) as in other unmanaged environments.
Answer: you don't need to do this, as .NET will automatically "zero" the object for you. However, you should know what the default value for each value type is. For example, the default of a bool is false, and the default of an int is zero.
Unmanaged objects
"Zero-ing" a region of memory is usually only necessary in interoping with external, non-managed libraries.
If you have a pinned pointer to a region of memory containing data that you intend to pass to an outside non-managed library (written in C, say), and you want to zero that section of memory, then your pointer most likely points to a byte array and you can use a simple for-loop to zero it.
Off-topic note
On the flip side, if a large object is allocated in .NET, try to reuse it instead of throwing it away and allocating a new one. That's because any new object is automatically "zeroed" by the .NET framework, and for large objects this clearing will cause a hidden performance hit.
You very rarely need unsafe code in C#. Usually only when interacting with native libraries.
The Marshal class as some low level helper functions, but I'm not aware of any that zeros out memory.
Firstly, in .Net (including C#) then value types are zero by default - so this takes away one of the common uses of ZeroMemory.
Secondly, if you want to zero a list of type T then try a method like:
void ZeroMemory<T>(IList<T> destination)
{
for (var i=0;i<destination.Count; i+))
{
destination[i] = default(T);
}
}
If a list isn't available... then I think I'd need to see more of the calling code.
Technically there is the Array.Clear, but it's only for managed arrays. What do you want to do?