Properly accessing a SafeArray of VT_UNKNOWN with SafeArrayGetElement

Properly accessing a SafeArray of VT_UNKNOWN with SafeArrayGetElement - c#

We have a COM component who’s implementation and interface definition exist in managed code but is driven by a native component. The managed component is returning a SafeArray back to the native code via the following method declaration.
interface IExample {
<return: MarshalAs(UnmanagedType.SafeArray, SafeArraySubType=VarEnum.VT_UNKNOWN)>
object[] DoSomeOperation()
}
The generated native signature properly passes this back as a SafeArray.
During a code review though we came up with some questions about calls to the resulting array with SafeArrayGetElement. The issue is whether or not SafeArrayGetElement returns a IUnknown instance which is AddRef'd or not. Essentially it boils down to which of the following is correct
Example 1:
CComPtr<IUnknown> spUnk;
hr = SafeArrayGetElement(pArray, &bounds, reinterpret_cast<void**>(&spUnk));
Example 2:
IUnknown* pUnk;
hr = SafeArrayGetElement(pArray, &bounds, reinterpret_cast<void**>(&pUnk));
The documentation is very thin on this subject. It only includes the following line.
If the data element is a string, object, or variant, the function copies the element in the correct way.
The definition of correct is a bit ambiguous.

The first method should be correct, and would be in line with handling of objects throughout COM, presumably the definition you found makes the assumption that the consumer knows the correct way.
The other items mentioned require it. Copying a VARIANT or a SAFEARRAY carries an implicit AddRef() when they contain objects. VARIANT doesn't require it when VT_BYREF is present, though.
VariantCopy # MSDN
SafeArrayCopy # MSDN
This behavior isn't intrinsic to SAFEARRAYs or VARIANTs as it is part of the rules of handling parameters in COM. There isn't anything stopping someone from trying to circumvent the rules, though.
For input parameters, it isn't the responsibility of the callee to AddRef() unless they intend to keep the interface pointer for further use. However, other cases of parameter use require it.
For example, interfaces placed in VARIANTs or other containers should have at least one AddRef() call applied, otherwise this would create issues when using VARIANTs as output parameters from COM methods, as transfer of data/references is one-way. The original object could expire by the time the call arrives at its destination. Similarly, marshaling an interface into a Stream requires an AddRef() as well.
Similarly, calling by reference requires at least one AddRef call as well. If this were not the case, then any suitable long-running call (say, through DCOM) may not arrive at its destination with the guarantee that the referenced object is still alive. Extra AddRef()/Release() calls are frequently skipped here, though, as the object should already be at 1+ due to creation in or before the calling scope.
If it is possible to modify the component, and your calls are in-process, then it may be desirable to use the GIT instead. This allows you to pass a token instead, and it will be easier to marshal the interface across COM apartments. The lifetime of the objects involved then becomes the responsibility of the caller through the duration of the call, and you'll be able to trap cases where an object could not be marshaled.
Creating the Global Interface Table # MSDN
Also interesting is the footnote for BSTRs.
If the implementation of a function that takes a BSTR reference parameter assigns a new BSTR to the parameter, it must free the previously referenced BSTR.
String Manipulation Functions (COM)

It should be AddRef:ed, but I don't have first-hand info that that's the case (e.g. I haven't read the source).
I think the documentation is pretty clear, though -- copying an interface pointer 'correctly' is AddRef:ing it.
If you want to be really sure, build a super-simple ATL COM object which implements IUnknown, stuff a number of them into a SAFEARRAY and put a breakpoint in CComObjectBase<>::InternalAddRef (if my memory serves). Then debug a call to SafeArrayGetElement and see if your breakpoint is hit.

Related

What does "late-bound access to the destination object" mean?

The docs for Interlocked.Exchange<T> contain the following remark:
This method overload is preferable to the Exchange(Object, Object) method overload, because the latter requires late-bound access to the destination object.
I am quite bewildered by this note. To me "late binding" refers to runtime method dispatch and doesn't seem to have anything to do with the technical specifics of atomically swapping two memory locations. What is the note talking about? What does "late-bound access" mean in this context?

canton7's answer is correct, and thanks for the shout-out. I'd like to add a few additional points.
This sentence, as is too often the case in the .NET documentation, both chooses to enstructure bizarre word usements, and thoroughly misses the point. For me, the poor word choice that stood out was not "late bound", which merely misses the point. The really awful word choice is using "destination object" to mean variable. A variable is not an object, any more than your sock drawer is a pair of socks. A variable contains a reference to an object, just as a sock drawer contains socks, and those two things should not be confused.
As you note, the reason to prefer the T version has nothing to do with late binding. The reason to prefer the T version is C# does not allow variant conversions on ref arguments. If you have a variable shelly of type Turtle, you cannot pass ref shelly to a method that takes ref object, because that method could write a Tiger into a ref object.
What then are the logical consequences of using the Object-taking overload on shelly? There are only two possibilities:
We copy the value of shelly to a second variable of type Object, do the exchange, and then copy the new value back, and now our operation is no longer atomic, which was the whole point of calling the interlocked exchange.
We change shelly to be of type Object, and now we are in a non-statically-typed and therefore bug-prone world, where we cannot ever be sure that shelly still contains a reference to Turtle.
Since both of those alternatives are terrible, you should use the generic version because it allows the aliased variable to be of the correct type throughout the operation.

The equivalent remark for Interlocked.Exchange(object, object) is:
Beginning with .NET Framework 2.0, the Exchange<T>(T, T) method overload provides a type-safe alternative for reference types. We recommend that you call it instead of this overload.
Although I haven't heard it used in this way before, I think by "late-bound" it simply means "non type-safe", as you need to cast the object to your concrete type (at runtime) before using it.
As well as virtual method dispatch, "Late Binding" also commonly refers to reflection, as the exact method to be called similarly isn't known until runtime.
To quote Eric Lippert:
Basically by "early binding" we mean "the binding analysis is performed by the compiler and baked in to the generated program"; if the binding fails then the program does not run because the compiler did not get to the code generation phase. By "late binding" we mean "some aspect of the binding will be performed by the runtime" and therefore a binding failure will manifest as a runtime failure
(emphasis mine). Under this rather loose definition, casting object to a concrete type and then calling a method on it could be seen as "late bound", as there's an element of the binding which is performed at, and could fail at, runtime.

Why structs cannot have destructors?

What is best answer on interview on such question you think?
I think I didn't find a copy of this here, if there is one please link it.

Another way of looking at this - rather than just quoting the spec which says that structs can't/don't have destructors - consider what would happen if the spec was changed so that they did - or rather, let's ask the question: can we guess why did the language designers decide to not allow structs to have 'destructors' in the first place?
(Don't get hung up on the word 'destructor' here; we're basically talking about a magic method on structs that gets called automatically when the variable goes out of scope. In other words, a language feature analogous to C++'s destructors.)
The first thing to realize is that we don't care about releasing memory. Whether the object is on the stack or on the heap (eg. a struct in a class), the memory will be taken care of one way or another sooner or later; either by being popped off the stack or by being collected. The real reason for having something that's destructor-like in the first place is for managing external resources - things like file handles, window handles, or other things that need special handling to get them cleaned up that the CLR itself doesn't know about.
Now supposed you allow a struct to have a destructor that can do this cleanup. Fine. Until you realize that when structs are passed as parameters, they get passed by value: they are copied. Now you've got two structs with the same internal fields, and they're both going to attempt to clean up the same object. One will happen first, and so code that is using the other one afterwards will start to fail mysteriously... and then its own cleanup will fail (hopefully! - worst case is it might succeed in cleaning up some other random resource - this can happen in situations where handle values are reused, for example.)
You could conceivably make a special case for structs that are parameters so that their 'destructors' don't run (but be careful - you now need to remember that when calling a function, it's always the outer one that 'owns' the actual resource - so now some structs are subtly different to others...) - but then you still have this problem with regular struct variables, where one can be assigned to another, making a copy.
You could perhaps work around this by adding a special mechanism to assignment operations that somehow allows the new struct to negotiate ownership of the underlying resource with its new copy - perhaps they share it or transfer ownership outright from the old to the new - but now you've essentially headed off into C++-land, where you need copy constructors, assignment operators, and have added a bunch of subtleties waiting to trap the unaware novice programmer. And keep in mind that the entire point of C# is to avoid that type of C++-style complexity as much as possible.
And, just to make things a bit more confusing, as one of the other answers pointed out, structs don't just exist as local objects. With locals, scope is nice and well defined; but structs can also be members of a class object. When should the 'destructor' get called in that case? Sure, you can do it when the container class is finalized; but now you have a mechanism that behaves very differently depending on where the struct lives: if the struct is a local, it gets triggered immediately at end of scope; if the struct is within a class, it gets triggered lazily... So if you really care about ensuring that some resource in one of your structs is cleaned up at a certain time, and if your struct could end up as a member of a class, you'd probably need something explicit like IDisposable/using() anyhow to ensure you've got your bases covered.
So while I can't claim to speak for the language designers, I can make a pretty good guess that one reason they decided not to include such a feature is because it would be a can of worms, and they wanted to keep C# reasonably simple.

From Jon Jagger:
"A struct cannot have a destructor. A destructor is just an override of object.Finalize in disguise, and structs, being value types, are not subject to garbage collection."

Every object other than arrays and strings is stored on the heap in the same way: a header which gives information about the "object-related" properties (its type, whether it's used by any active monitor locks, whether it has a non-suppressed Finalize method, etc.), and its data (meaning the contents of all the type's instance fields (public, private, and protected intermixed, with base-class fields appearing before derived-type fields). Because every heap object has a header, the system can take a reference to any object and know what it is, and what the garbage-collector is supposed to do with it. If the system has a list of all objects which have been created and have a Finalize method, it can examine every object in the list, see if its Finalize method is unsuppressed, and act on it appropriately.
Structs are stored without any header; a struct like Point with two integer fields is simply stored as two integers. While it is possible to have a ref to a struct (such a thing is created when a struct is passed as a ref parameter), the code that uses the ref has to know what type of struct the ref points to, since neither the ref nor the struct itself holds that information. Further, heap objects may only be created by the garbage-collector, which will guarantee that any object which is created will always exist until the next GC cycle. By contrast, user code can create and destroy structs by itself (often on the stack); if code creates a struct along with a ref to it, and passes that ref it to a called routine, there's no way that code can destroy the struct (or do anything at all, for that matter) until the called routine returns, so the struct is guaranteed to exist at least until the called routine exits. On the other hand, once the called routine exits, the ref it was given should be presumed invalid, since the caller would be free to destroy the struct at any time thereafter.

Becuase by definition destructors are used to destruct instances of classes, and structs are value types.
Ref: http://msdn.microsoft.com/en-us/library/66x5fx1b.aspx
By Microsoft's own words: "Destructors are used to destruct instances of classes." so it's a little silly to ask "Why can't you use a destructor on (something that is not a class)?" ^^

What's the deal with delegates?

I understand delegates encapsulate method calls. However I'm having a hard time understanding their need. Why use delegates at all, what situations are they designed for?

A delegate is basically a method pointer. A delegate let us create a reference variable, but instead of referring to an instance of a class, it refers to a method inside the class. It refers any method that has a return type and has same parameters as specified by that delegate. It's a very very useful aspect of event. For thorough reading I would suggest you to read the topic in Head First C# (by Andrew Stellman and Jennifer Greene). It beautifully explains the delegate topic as well as most concepts in .NET.

Well, some common uses:
Event handlers (very common in UI code - "When the button is clicked, I want this code to execute")
Callbacks from asynchronous calls
Providing a thread (or the threadpool) with a new task to execute
Specifying LINQ projections/conditions etc
Don't think of them as encapsulating method calls. Think of them as encapsulating some arbitrary bit of behaviour/logic with a particular signature. The "method" part is somewhat irrelevant.
Another way of thinking of a delegate type is as a single-method interface. A good example of this is the IComparer<T> interface and its dual, the Comparison<T> delegate type. They represent the same basic idea; sometimes it's easier to express this as a delegate, and other times an interface makes life easier. (You can easily write code to convert between the two, of course.)

They are designed, very broadly speaking, for when you have code that you know will need to call other code - but you do not know at compile-time what that other code might be.
As an example, think of the Windows Forms Button.Click event, which uses a delegate. The Windows Forms programmers know that you will want something to happen when that button is pressed, but they have no way of knowing exactly what you will want done... it could be anything!
So you create a method and assign it to a delegate and set it to that event, and there you are. That's the basic reasoning for delegates, though there are lots of other good uses for them that are related.

Delegates are often used for Events. According to MSDN, delegates in .NET are designed for the following:
An eventing design pattern is used.
It is desirable to encapsulate a static method.
The caller has no need access other properties, methods, or interfaces on
the object implementing the method.
Easy composition is desired.
A class may need more than one implementation of the methodimplementation of the method
Another well put explanation from MSDN,
One good example of using a
single-method interface instead of a
delegate is IComparable or
IComparable. IComparable declares the
CompareTo method, which returns an
integer specifying a less than, equal
to, or greater than relationship
between two objects of the same type.
IComparable can be used as the basis
of a sort algorithm, and while using a
delegate comparison method as the
basis of a sort algorithm would be
valid, it is not ideal. Because the
ability to compare belongs to the
class, and the comparison algorithm
doesn’t change at run-time, a
single-method interface is ideal.single-method interface is ideal.
Since .NET 2.0 it has also been used for anonymous functions.
Wikipedia has a nice explanation about the Delegation pattern,
In software engineering, the delegation pattern is a design pattern in object-oriented programming where an object, instead of performing one of its stated tasks, delegates that task to an associated helper object. It passes the buck, so to speak (technically, an Inversion of Responsibility). The helper object is called the delegate. The delegation pattern is one of the fundamental abstraction patterns that underlie other software patterns such as composition (also referred to as aggregation), mixins and aspects.

Oversimplified: I'd say that a delegate is a placeholder for a function until that time when something assigns a real function to the delegate. Calling un-assigned delegates throws an exception.
Confusion occurs because there is often little difference made between the definition, declaration, instantiation and the invocation of delegates.
Definition:
Put this in a namespace as you would any class-definition.
public delegate bool DoSomething(string withThis);
This is comparable to a class-definition in that you can now declare variables of this delegate.
Declaration:
Put this is one of function routines like you would declare any variable.
DoSomething doSth;
Instantiation and assignment:
Usually you'll do this together with the declaration.
doSth = new DoSomething(MyDoSomethingFunc);
The "new DoSomething(..)" is the instantiation. The doSth = ... is the assignment.
Note that you must have already defined a function called "MyDoSomething" that takes a string and returns a bool.
Then you can invoke the function.
Invocation:
bool result = doSth(myStringValue);
Events:
You can see where events come in:
Since a member of a class is usually a declaration based upon a definition.
Like
class MyClass {
private int MyMember;
}
An event is a declaration based upon a delegate:
public delegate bool DoSomething(string withWhat);
class MyClass {
private event DoSomething MyEvent;
}
The difference with the previous example is that events are "special":
You can call un-assigned events without throwing an exception.
You can assign multiple functions to an event. They will then all get called sequentially. If one of those calls throws an exception, the rest doesn't get to play.
They're really syntactic sugar for arrays of delegates.
The point is of course that something/someone else will do the assigning for you.

Delegates allow you to pass a reference to a method. A common example is to pass a compare method to a sort function.

If you need to decide at runtime, which method to call, then you use a delegate. The delegate will then respond to some action/event at runtime, and call the the appropriate method. It's like sending a "delegate" to a wedding you don't want to attend yourself :-)
The C people will recognize this as a function pointer, but don't get caught up in the terminology here. All the delegate does (and it is actually a type), is provide the signature of the method that will later be called to implement the appropriate logic.
The "Illustrated C#" book by Dan Solis provides the easiest entry point for learning this concept that I have come across:
http://www.amazon.com/Illustrated-2008-Windows-Net-Daniel-Solis/dp/1590599543

A delegate is typically a combination of an object reference and a pointer to one of the object's class methods (delegates may be created for static methods, in which case there is no object reference). Delegates may be invoked without regard for the type of the included object, since the included method pointer is guaranteed to be valid for the included object.
To understand some of the usefulness behind delegates, think back to the language C, and the printf "family" of functions in C. Suppose one wanted to have a general-purpose version of "printf" which could not only be used as printf, fprintf, sprintf, etc. but could send its output to a serial port, a text box, a TCP port, a cookie-frosting machine, or whatever, without having to preallocate a buffer. Clearly such a function would need to accept a function pointer for the character-output routine, but that by itself would generally be insufficient.
A typical implementation (unfortunately not standardized) will have a general-purpose gp_printf routine which accepts (in addition to the format string and output parameters) a void pointer, and a pointer to a function which accepts a character and a void pointer. The gp_printf routine will not use the passed-in void pointer for any purpose itself, but will pass it to the character-output function. That function may then cast the pointer to a FILE* (if gp_printf is being called by fprintf), or a char** (if it's being called by sprintf), or a SERIAL_PORT* (if it's being called by serial_printf), or whatever.
Note that because any type of information could be passed via the void*, there would be no limit as to what gp_printf could do. There would be a danger, however: if the information passed in the void* isn't what the function is expecting, Undefined Behavior (i.e. potentially very bad things) would likely result. It would be the responsibility of the caller to ensure that the function pointer and void* are properly paired; nothing in the system would protect against incorrect usage.
In .net, a delegate would provide the combined functionality of the function pointer and void* above, with the added bonus that the delegate's constructor would ensure that the data was of the proper type for the function. A handy feature.

Why the Reset() method on Enumerator class must throw a NotSupportedException()?

From what I saw on http://csharpindepth.com/Articles/Chapter6/IteratorBlockImplementation.aspx, and article by Jon Skeet, the c# specification itself says that. What would be the reason?

That's not how I read the C# spec [Word doc]. Section 10.14.4 "Enumerator objects", states:
...[E]numerator objects do not support the IEnumerator.Reset method. Invoking this method causes a System.NotSupportedException to be thrown.
However, this section (and statement) is specific to "enumerator objects", which is defined as:
When a function member returning an enumerator interface type is implemented using an iterator block, invoking the function member does not immediately execute the code in the iterator block. Instead, an enumerator object is created and returned.
In other words, an "enumerator object" is a compiler generated IEnumerator1. There's no restrictions on every IEnumerator, just the ones generated from iterator blocks (aka yield).
As for why? I'd suspect because it's somewhat impossible to do in the general case - without saving every value and the consequent memory limitations of that. Combine that with the fact that IEnumerator.Reset() is rarely used (when's the last time that you Reset an enumerator?) and that MSDN specifically calls out that it need not be implemented:
The Reset method is provided for COM interoperability. It does not necessarily need to be implemented; instead, the implementer can simply throw a NotSupportedException.
and you get to cut out a lot of complexity without anyone really noticing.
As for requiring that it throw2, I suppose it's just simpler than letting the implementor decide. IMO, it's a bit much to require the throw - there may be reasonable cases that a compiler (or other implementation1) could generate a Reset method for, but I don't see it as being a real problem either.
1 Technically, the spec leaves open the possibility of other implementations:
An enumerator object is typically an instance of a compiler-generated enumerator class that encapsulates the code in the iterator block and implements the enumerator interfaces, but other methods of implementation are possible.
but I'm not aware of any other concrete implementations. Regardless, to be compliant, other implementations of an "enumerator object" would have to throw NotSupportedException as well.
2 Nitpicker's corner: I think there may be some quibble even in the "requirement" to throw. The spec, in not using the preferred "MUST, SHOULD, MAY" verbiage, leaves it a bit open. I read "causes" more as a note of implementation - not a requirement. Then again, I haven't read the entire spec, so perhaps they define these terms a bit more or are more explicit somewhere else.

It is impossible to support properly in all sequences; many are once only (network streams, etc). And if you can't rely on it all the time, it is useless, as the abstraction is broken. Sure you could have an IResettableEnumerator, but Reset() on IEnumerator doesn't
work. Frankly, it was a mistake (IMO).
I suspect it would also have made iterator blocks even more complicated (they are currently one of the two most complex parts of the compiler; I can't remember which is "top"; them, or anonymous methods / captured variables).

Here's what MSDN says
The Reset method is provided for COM
interoperability. It does not
necessarily need to be implemented;
instead, the implementer can simply
throw a NotSupportedException.
http://msdn.microsoft.com/en-us/library/system.collections.ienumerator.reset.aspx>MSDN IEnumerator..::.Reset Method
It doesn't say it must, it just says it can.
EDIT:
However as Marc has pointed out, there is a difference in the C# 2.0 Spec
22.2 Enumerator objects
Note that enumerator objects do not
support the IEnumerator.Reset method.
Invoking this method causes a
System.NotSupportedException to be
thrown.

Using COM object from C++ that in C#.NET returns object []

I have a COM object that I'm trying to use from C++ (not .NET), and all of the example programs and manual are written assuming the use of C#.NET or VB.NET. COM is new to me so I'm a bit overwhelmed. I'm using #import on the TLB but am struggling to deal with the variants that are used as parameters. I have one particular method, that according to the docs and the example programs in C#.NET, is supposed to return an object[]. Then I'm supposed to cast the first entry in this array to a ControlEvent which then tells me what to do with the rest of the objects in the array. The C#.NET example looks like:
object [] objEvent = (object []) Ctl.GetEvent();
ControlEvent ev = (ControlEvent) objEvent[0];
In my case, GetEvent is returning me a _variant_t and I need to know how to convert this to an object[] so that I can further process. Its not clear to me even how I express 'object' in C++. I see _variant_t documentation showing me a million things I can convert the variant to, but none of them seem to be converting to anything I can use. I'm hoping for some assistance converting the above C#.NET code to Visual C++
Thanks.

Typically, you look at the vt member of the variant to see what type of thing it actually is. In this case I would expect it to be an array, so you would expect that the vartype would be some variation on VT_ARRAY (usually it is bitwise OR'ed with the type of the members). Then, you get the parray member which contains the SAFEARRAY instance that actually holds the array, and use the normal safe array functions to get the data out of the array.

I haven't done this, but from reading the documentation for the _variant_t class (and the comments below which corrected my original post), I think you should read the vt field of the _variant_t instance (actually the VARTYPE vt field of the VARIANT instance: the _variant_t instance directly derives from VARIANT) to see what type of thing it contains, as described in the reference documentation for the VARIANT struct. One you know what type of thing is contained in the variant, use the corresponding type-specific operator to read it.
You'll be in for some hurt if you try to use COM without understanding it (and you may want a book which describes that); you may well need to know about the IUnknown interface and the AddRef method, for example.

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.