XNA/MonoGame Effect throwing runtime cast exception - C#

As a foreword, the exact same code works just fine in XNA, but MonoGame throws an exception. This likely requires someone familiar with the MonoGame rendering pipeline.
During the Draw section of my game, there's a ShadowmapResolver that renders out a texture that is the final calculated light pattern for a given light. It receives an exception when rendering, from what is essentially EffectPass.Apply(), complaining that somewhere within MonoGame there's an attempted cast from Int32[] to Single[]. Here's my code that calls it:
private void ExecuteTechnique(Texture2D source, RenderTarget2D destination,
    string techniqueName, Texture2D shadowMap)
{
    graphicsDevice.SetRenderTarget(destination);
    graphicsDevice.Clear(Color.Transparent);
    resolveShadowsEffect.Parameters["renderTargetSizeX"].SetValue((float)baseSizeX);
    resolveShadowsEffect.Parameters["renderTargetSizeY"].SetValue((float)baseSizeY);
    if (source != null)
        resolveShadowsEffect.Parameters["InputTexture"].SetValue(source);
    if (shadowMap != null)
        resolveShadowsEffect.Parameters["ShadowMapTexture"].SetValue(shadowMap);
    resolveShadowsEffect.CurrentTechnique = resolveShadowsEffect
        .Techniques[techniqueName];
    try
    {
        foreach (EffectPass pass in resolveShadowsEffect.CurrentTechnique.Passes)
        {
            pass.Apply(); // <--- InvalidCastException re-enters my program here
            quadRender.Render(Vector2.One * -1, Vector2.One);
        }
    }
    catch (Exception ex)
    {
        Util.Log(LogManager.LogLevel.Critical, ex.Message);
    }
    graphicsDevice.SetRenderTarget(null);
}
And here is the stacktrace:
at Microsoft.Xna.Framework.Graphics.ConstantBuffer.SetData(Int32 offset, Int32 rows, Int32 columns, Object data)
at Microsoft.Xna.Framework.Graphics.ConstantBuffer.SetParameter(Int32 offset, EffectParameter param)
at Microsoft.Xna.Framework.Graphics.ConstantBuffer.Update(EffectParameterCollection parameters)
at Microsoft.Xna.Framework.Graphics.EffectPass.Apply()
at JASG.ShadowmapResolver.ExecuteTechnique(Texture2D source, RenderTarget2D destination, String techniqueName, Texture2D shadowMap) in C:\Users\[snip]\dropbox\Projects\JASG2\JASG\JASG\Rendering\ShadowmapResolver.cs:line 253
So it would appear that one of the shader parameters I am setting is confusing MonoGame somehow, but I don't see which one it could be. I'm pushing floats, not int arrays. I even tried changing the RenderTarget2D.SurfaceFormat from Color to Single for all my targets and textures; it still gives the exact same error.
Outside of the function shown above, in the broader scope, no other parameters are set between this and the previous EffectPass.Apply(). Several other effects render without error before this one.
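In case it helps with narrowing this down, here is a diagnostic sketch (not part of the original code) that logs what MonoGame has inferred for each effect parameter, to spot the one whose backing data ends up as an int array:
// Diagnostic sketch: dump the metadata MonoGame holds for each parameter.
foreach (EffectParameter p in resolveShadowsEffect.Parameters)
{
    Util.Log(LogManager.LogLevel.Critical, string.Format(
        "{0}: class={1}, type={2}, rows={3}, cols={4}",
        p.Name, p.ParameterClass, p.ParameterType, p.RowCount, p.ColumnCount));
}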
In case it helps, here's the source for the MonoGame Framework regarding ConstantBuffer.SetData()
private void SetData(int offset, int rows, int columns, object data)
{
    // Shader registers are always 4 bytes and all the
    // incoming data objects should be 4 bytes per element.
    const int elementSize = 4;
    const int rowSize = elementSize * 4;

    // Take care of a single element.
    if (rows == 1 && columns == 1)
    {
        // EffectParameter stores all values in arrays by default.
        if (data is Array)
            Buffer.BlockCopy(data as Array, 0, _buffer, offset, elementSize);
        else
        {
            // TODO: When we eventually expose the internal Shader
            // API then we will need to deal with non-array elements.
            throw new NotImplementedException();
        }
    }
    // Take care of the single copy case!
    else if (rows == 1 || (rows == 4 && columns == 4))
        Buffer.BlockCopy(data as Array, 0, _buffer, offset, rows * columns * elementSize);
    else
    {
        var source = data as Array;
        var stride = (columns * elementSize);
        for (var y = 0; y < rows; y++)
            Buffer.BlockCopy(source, stride * y, _buffer, offset + (rowSize * y),
                columns * elementSize);
    }
}
Is this some sort of marshaling problem? Thanks for your time!
Edit: P.S.: The exception is an InvalidCastException and not a NotImplementedException.

Not sure if this helps you or not, but the only casting I see being done is the data as Array. I would bet that it is crashing on the line:
Buffer.BlockCopy(data as Array, 0, _buffer, offset, rows*columns*elementSize);
or
var source = data as Array;
because they don't do any type checking before casting there. If that is where it is crashing, it is because they don't seem to support non-array data values. I don't know this framework well enough to give you a solid answer on how to work around it, but I would report this as a bug to the MonoGame developers.

Try the 2MGFX tool, which optimizes shaders for MonoGame. MGFX tool tips
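Roughly, assuming the effect file is called ResolveShadows.fx (the file names are only examples; check the tool's help for the exact switches):
// Compile the effect with 2MGFX first (command line, roughly):
//   2MGFX.exe ResolveShadows.fx ResolveShadows.mgfxo /Profile:OpenGL
// then load the compiled bytecode directly instead of going through the content pipeline:
byte[] bytecode = File.ReadAllBytes("Content/ResolveShadows.mgfxo");
Effect resolveShadowsEffect = new Effect(graphicsDevice, bytecode);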

Related

What is the best choice of a collection for multiplexed 2D data in modern C# given the constraint of using preallocated memory?

I am trying to improve the usability of an open source C# API that wraps a C library. The underlying library pulls multiplexed 2D data from a server over a network connection. In C, the samples come out as a pointer to the data (many types are supported), e.g. float*. The pull function returns the number of data points (frames * channels, but channels is known and never changes) so that the client knows how much new data is being passed. It is up to the client to allocate enough memory behind these pointers. For example, if one wants to pull floats the function signature is something like:
long pull_floats(float *floatbuf);
and floatbuf better have sizeof(float)*nChannels*nMoreFramesThanIWillEverGet bytes behind it.
In order to accommodate this, the C# wrapper currently uses 2D arrays, e.g. float[,]. The way it is meant to be used is a literal mirror to the C method---to allocate more memory than one ever expects to these arrays and return the number of data points so that the client knows how many frames of data have just come in. The underlying dll handler has a signature like:
[DllImport(libname, CallingConvention = CallingConvention.Cdecl, CharSet = CharSet.Ansi, ExactSpelling = true)]
public static extern uint pull_floats(IntPtr obj, float[,] data_buffer);
And the C# wrapper itself has a definition like:
int PullFloats(float[,] floatbuf)
{
    // DllHandler has the DllImport code
    // Obj is the class with the handle to the C library
    uint res = DllHandler.pull_floats(Obj, floatbuf);
    return (int)res / floatbuf.GetLength(1); // cast needed: pull_floats returns uint, this method returns int
}
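For context, this is roughly how a client has to use the current wrapper (a sketch; the sizes, wrapper and Process are placeholder names, not part of the API):
// Sketch of current client-side usage; sizes and names are examples only.
const int channels = 8;
const int maxFrames = 4096;                  // "more frames than I will ever get"
float[,] floatbuf = new float[maxFrames, channels];

int frames = wrapper.PullFloats(floatbuf);   // 'wrapper' is the object exposing PullFloats
for (int f = 0; f < frames; f++)
    for (int ch = 0; ch < channels; ch++)
        Process(floatbuf[f, ch]);            // 'Process' stands in for client code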
The C++ wrapper for this library is idiomatic. There, the client supplies a vector<vector<T>>& to the call and in a loop, each frame gets pushed into the multiplexed data container. Something like:
void pull_floats_cpp(std::vector<std::vector<float>>& floatbuf)
{
    std::vector<float> frame;
    floatbuf.clear();
    while (pull_float_cpp(frame)) // C++ function to pull only one frame at a time
    {
        floatbuf.push_back(frame); // (memory may be allocated here)
    }
}
This works because in C++ you can pun a reference to a std::vector to a primitive type like float*. That is, the vector frame from above goes into a wrapper like:
void pull_float_cpp(std::vector<float>& frame)
{
    frame.resize(channel_count); // memory may be allocated here as well...
    pull_float_c(&frame[0]);
}
where pull_float_c has a signature like:
void pull_float_c(float* frame);
I would like to do something similar in the C# API. Ideally the wrapper method would have a signature like:
void PullFloats(List<List<float>> floatbuf);
instead of
int PullFloats(float[,] floatbuf);
so that clients don't have to work with 2D arrays and (more importantly) don't have to keep track of the number of frames they get. That should be inherent to the dimensions of the containing object, so that clients can use enumeration patterns and foreach. But, unlike C++ std::vector, you can't pun a List to an array. As far as I know, ToArray allocates memory and does a copy, so that not only is memory being allocated, but the new data doesn't go into the List of Lists that the array was built from.
I hope that the pseudocode plus explanation of this problem is clear. Any suggestion for how to tackle it in an elegant C# way is much appreciated. Or, if someone can assure me that this is simply a rift between C and C# that cannot be bridged without imitating C-style memory management, at least I would know not to think about this any more.
Could a MemoryStream or a Span help here?
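For the copy-free part, a Span-based reinterpretation is one possibility; a minimal sketch, assuming .NET Core or the System.Memory package is available:
using System;
using System.Runtime.InteropServices;

// Reinterpret a pre-allocated byte buffer as floats without copying.
byte[] rawBuffer = new byte[4096 * sizeof(float)];
Span<float> floats = MemoryMarshal.Cast<byte, float>(rawBuffer.AsSpan());
floats[0] = 1.0f;   // writes straight through to rawBuffer, no copy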
I came up with a pretty satisfactory way to wrap pre-allocated arrays in Lists. Please let me know if there is a better way to do this, but I think this is about as good as it gets, if the answer is to make a List out of an array anyway. According to my debugger, 100,000 iterations of 5,000 or so floats at a time takes less than 12 seconds (far better than the underlying library demands in practice, but worse than I would like to see), the memory use stays flat at around 12 MB (no copies), and the GC isn't called until the program exits:
using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;

namespace ListArrayTest
{
    [StructLayout(LayoutKind.Explicit, Pack = 2)]
    public class GenericDataBuffer
    {
        [FieldOffset(0)]
        public int _numberOfBytes;
        [FieldOffset(8)]
        private readonly byte[] _byteBuffer;
        [FieldOffset(8)]
        private readonly float[] _floatBuffer;
        [FieldOffset(8)]
        private readonly int[] _intBuffer;

        public byte[] ByteBuffer => _byteBuffer;
        public float[] FloatBuffer => _floatBuffer;
        public int[] IntBuffer => _intBuffer;

        public GenericDataBuffer(int sizeToAllocateInBytes)
        {
            int aligned4Bytes = sizeToAllocateInBytes % 4;
            sizeToAllocateInBytes = (aligned4Bytes == 0) ? sizeToAllocateInBytes : sizeToAllocateInBytes + 4 - aligned4Bytes;
            // Allocating the byteBuffer is co-allocating the floatBuffer and the intBuffer
            _byteBuffer = new byte[sizeToAllocateInBytes];
            _numberOfBytes = _byteBuffer.Length;
        }

        public static implicit operator byte[](GenericDataBuffer genericDataBuffer)
        {
            return genericDataBuffer._byteBuffer;
        }
        public static implicit operator float[](GenericDataBuffer genericDataBuffer)
        {
            return genericDataBuffer._floatBuffer;
        }
        public static implicit operator int[](GenericDataBuffer genericDataBuffer)
        {
            return genericDataBuffer._intBuffer;
        }
    }

    public class ListArrayTest<T>
    {
        private readonly Random _random = new();
        const int _channels = 10;
        const int _maxFrames = 500;
        private readonly T[,] _array = new T[_maxFrames, _channels];
        private readonly GenericDataBuffer _genericDataBuffer;
        int _currentFrameCount;
        public int CurrentFrameCount => _currentFrameCount;

        // generate 'data' to pull
        public void PushValues()
        {
            int frames = _random.Next(_maxFrames);
            if (frames == 0) frames++;
            for (int ch = 0; ch < _array.GetLength(1); ch++)
            {
                for (int i = 0; i < frames; i++)
                {
                    switch (_array[0, 0]) // in real life this is done with type enumerators
                    {
                        case float: // only implementing float to be concise
                            _array[i, ch] = (T)(object)(float)i;
                            break;
                    }
                }
            }
            _currentFrameCount = frames;
        }

        private void CopyFrame(int frameIndex)
        {
            for (int ch = 0; ch < _channels; ch++)
                switch (_array[0, 0]) // in real life this is done with type enumerators
                {
                    case float: // only implementing float to be concise
                        _genericDataBuffer.FloatBuffer[ch] = (float)(object)_array[frameIndex, ch];
                        break;
                }
        }

        private void PullFrame(List<T> frame, int frameIndex)
        {
            frame.Clear();
            CopyFrame(frameIndex);
            for (int ch = 0; ch < _channels; ch++)
            {
                switch (frame)
                {
                    case List<float>: // only implementing float to be concise
                        frame.Add((T)(object)BitConverter.ToSingle(_genericDataBuffer, ch * 4));
                        break;
                }
            }
        }

        public void PullChunk(List<List<T>> list)
        {
            list.Clear();
            int frameIndex = 0;
            while (frameIndex != _currentFrameCount)
            {
                // Allocate a fresh frame each iteration; re-using a single List here
                // would leave every entry of 'list' aliasing the same object.
                List<T> frame = new();
                PullFrame(frame, frameIndex);
                list.Add(frame);
                frameIndex++;
            }
        }

        public ListArrayTest()
        {
            switch (_array[0, 0])
            {
                case float:
                    _genericDataBuffer = new(_channels * 4);
                    break;
            }
        }
    }

    internal class Program
    {
        static void Main(string[] args)
        {
            ListArrayTest<float> listArrayTest = new();
            List<List<float>> chunk = new();
            for (int i = 0; i < 100; i++)
            {
                listArrayTest.PushValues();
                listArrayTest.PullChunk(chunk);
                Console.WriteLine($"{i}: first value: {chunk[0][0]}");
            }
        }
    }
}
Update
...and, using a nifty trick I found from Mark Heath (https://github.com/markheath), I can effectively type pun List<List<T>> back to a T* the same way the C++ API does with std::vector<std::vector<T>> (see class GenericDataBuffer). It is a lot more complicated under the hood, since one must be so verbose with type casting in C#, but it compiles without complaint and works like a charm. Here is the blog post I stole the idea from: https://www.markheath.net/post/wavebuffer-casting-byte-arrays-to-float.
This also lets me ditch the need for clients to be responsible for pre-allocating, at the cost (as in the C++ wrapper) of having to do a bit of dynamic allocation internally. According to the debugger the GC doesn't get called and the memory stays flat, so I guess the List allocations are not putting much pressure on the heap.

Fastest way to copy a blittable struct to an unmanaged memory location (IntPtr)

I have a function similar to the following:
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public void SetVariable<T>(T newValue) where T : struct {
    // I know by this point that T is blittable (i.e. only unmanaged value types)
    // varPtr is a void*, and is where I want to copy newValue to
    *varPtr = newValue; // This won't work, but is basically what I want to do
}
I saw Marshal.StructureToPtr(), but it seems quite slow, and this is performance-sensitive code. If I knew the type T, I could just declare varPtr as a T*, but... well, I don't.
Either way, I'm after the fastest possible way to do this. 'Safety' is not a concern: by this point in the code, I know that the size of the struct T will fit exactly into the memory pointed to by varPtr.
One answer is to reimplement native memcpy in C#, using the same optimization tricks that native memcpy attempts. You can see Microsoft doing this in their own source; see the Buffer.cs file in the Microsoft Reference Source:
// This is tricky to get right AND fast, so lets make it useful for the whole Fx.
// E.g. System.Runtime.WindowsRuntime!WindowsRuntimeBufferExtensions.MemCopy uses it.
internal unsafe static void Memcpy(byte* dest, byte* src, int len) {
    // This is portable version of memcpy. It mirrors what the hand optimized assembly versions of memcpy typically do.
    // Ideally, we would just use the cpblk IL instruction here. Unfortunately, cpblk IL instruction is not as efficient as
    // possible yet and so we have this implementation here for now.
    switch (len)
    {
        case 0:
            return;
        case 1:
            *dest = *src;
            return;
        case 2:
            *(short*)dest = *(short*)src;
            return;
        case 3:
            *(short*)dest = *(short*)src;
            *(dest + 2) = *(src + 2);
            return;
        case 4:
            *(int*)dest = *(int*)src;
            return;
        ...
It's interesting to note that they implement the copy in managed code for all sizes up to 512; most of the sizes use pointer-aliasing tricks to get the VM to emit instructions that operate on different sizes. Only at 512 do they finally drop into invoking the native memcpy:
// P/Invoke into the native version for large lengths
if (len >= 512)
{
    _Memcpy(dest, src, len);
    return;
}
Presumably, native memcpy is even faster since it can be hand optimized to use SSE/MMX instructions to perform the copy.
As per BenVoigt's suggestion, I tried a few options. For all these tests I compiled for the Any CPU architecture on a standard VS2013 Release build, and ran the tests outside of the IDE. Before each test was measured, the methods DoTestA() and DoTestB() were run multiple times to allow for JIT warm-up.
First, I compared Marshal.StructureToPtr to a byte-by-byte loop with various struct sizes. I've shown the code below using a SixtyFourByteStruct:
private unsafe static void DoTestA() {
    fixed (SixtyFourByteStruct* fixedStruct = &structToCopy) {
        byte* structStart = (byte*) fixedStruct;
        byte* targetStart = (byte*) unmanagedTarget;
        for (byte* structPtr = structStart, targetPtr = targetStart; structPtr < structStart + sizeof(SixtyFourByteStruct); ++structPtr, ++targetPtr) {
            *targetPtr = *structPtr;
        }
    }
}

private static void DoTestB() {
    Marshal.StructureToPtr(structToCopy, unmanagedTarget, false);
}
And the results:
>>> 500000 repetitions >>> IN NANOSECONDS (1000ns = 0.001ms)
Method Avg. Min. Max. Jitter Total
A 82ns 0ns 22,000ns 21,917ns ! 41.017ms
B 137ns 0ns 38,700ns 38,562ns ! 68.834ms
As you can see, the manual loop is faster (as I suspected). The results are similar for a sixteen-byte and four-byte struct, with the difference being more pronounced the smaller the struct goes.
So now, to try the manual copy vs using P/Invoke and memcpy:
private unsafe static void DoTestA() {
    fixed (FourByteStruct* fixedStruct = &structToCopy) {
        byte* structStart = (byte*) fixedStruct;
        byte* targetStart = (byte*) unmanagedTarget;
        for (byte* structPtr = structStart, targetPtr = targetStart; structPtr < structStart + sizeof(FourByteStruct); ++structPtr, ++targetPtr) {
            *targetPtr = *structPtr;
        }
    }
}

private unsafe static void DoTestB() {
    fixed (FourByteStruct* fixedStruct = &structToCopy) {
        memcpy(unmanagedTarget, (IntPtr) fixedStruct, new UIntPtr((uint) sizeof(FourByteStruct)));
    }
}
>>> 500000 repetitions >>> IN NANOSECONDS (1000ns = 0.001ms)
Method Avg. Min. Max. Jitter Total
A 61ns 0ns 28,000ns 27,938ns ! 30.736ms
B 84ns 0ns 45,900ns 45,815ns ! 42.216ms
So, it seems that the manual copy is still better in my case. Like before, the results were pretty similar for 4/16/64 byte structs (though the gap was <10ns for 64-byte size).
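For completeness, the memcpy called in DoTestB is a plain P/Invoke whose declaration I didn't show; it would look roughly like this (a sketch, assuming a binding against msvcrt's memcpy):
// Sketch of the memcpy P/Invoke declaration used by DoTestB (assumed msvcrt binding;
// cdecl calling convention, returns the destination pointer).
[DllImport("msvcrt.dll", CallingConvention = CallingConvention.Cdecl, SetLastError = false)]
private static extern IntPtr memcpy(IntPtr dest, IntPtr src, UIntPtr count);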
It occurred to me that I was only testing structures that fit in a single cache line (I have a standard x86_64 CPU). So I tried a 128-byte structure, and it swung the balance in favour of memcpy:
>>> 500000 repetitions >>> IN NANOSECONDS (1000ns = 0.001ms)
Method Avg. Min. Max. Jitter Total
A 104ns 0ns 48,300ns 48,195ns ! 52.150ms
B 84ns 0ns 38,400ns 38,315ns ! 42.284ms
Anyway, the conclusion to all that is that the byte-by-byte copy seems the fastest for any struct of size <=64 bytes on an x86_64 CPU on my machine. Take it as you will (and maybe someone will spot an inefficiency in my code anyway).
FYI: I'm posting how I leveraged the accepted answer for others' benefit, as there's a twist when accessing the method via reflection because it's overloaded.
public static class Buffer
{
    public unsafe delegate void MemcpyDelegate(byte* dest, byte* src, int len);
    public static readonly MemcpyDelegate Memcpy;

    static Buffer()
    {
        var methods = typeof(System.Buffer).GetMethods(BindingFlags.Static | BindingFlags.NonPublic).Where(m => m.Name == "Memcpy");
        var memcpy = methods.First(mi => mi.GetParameters().Select(p => p.ParameterType).SequenceEqual(new[] { typeof(byte*), typeof(byte*), typeof(int) }));
        Memcpy = (MemcpyDelegate) memcpy.CreateDelegate(typeof(MemcpyDelegate));
    }
}
Usage:
public static unsafe void MemcpyExample()
{
    int src = 12345;
    int dst = 0;
    Buffer.Memcpy((byte*) &dst, (byte*) &src, sizeof(int));
    System.Diagnostics.Debug.Assert(dst == 12345);
}
public void SetVariable<T>(T newValue) where T : struct
You cannot use generics to accomplish this the fast way. The compiler doesn't take your pretty blue eyes as a guarantee that T is actually blittable; the constraint isn't good enough. You should use overloads:
public unsafe void SetVariable(int newValue) {
    *(int*)varPtr = newValue;
}
public unsafe void SetVariable(double newValue) {
    *(double*)varPtr = newValue;
}
public unsafe void SetVariable(Point newValue) {
    *(Point*)varPtr = newValue;
}
// etc...
Which might be inconvenient, but blindingly fast. It compiles to a single MOV instruction with no method call overhead in Release mode. The fastest it could be.
And as the backup case (the profiler will tell you when you need to add an overload):
public unsafe void SetVariable<T>(T newValue) {
    Marshal.StructureToPtr(newValue, (IntPtr)varPtr, false);
}
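On newer runtimes there is also a generic path via System.Runtime.CompilerServices.Unsafe; a sketch (this needs the Unsafe package and the C# 7.3 unmanaged constraint, both of which postdate this question):
using System.Runtime.CompilerServices;

// Sketch: write the struct directly to the target address with no marshaling.
public unsafe void SetVariable<T>(T newValue) where T : unmanaged
{
    Unsafe.Write(varPtr, newValue);   // varPtr is the existing void* field
}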

ArgumentOutOfRangeException on SerialPort.ReadTo()

My code indeterminately throws ArgumentOutOfRangeException: Non-negative number required. when invoking the ReadTo() method of the SerialPort class:
public static void RetrieveCOMReadings(List<SuperSerialPort> ports)
{
    Parallel.ForEach(ports,
        port => port.Write(port.ReadCommand));
    Parallel.ForEach(ports,
        port =>
        {
            try
            {
                // this is the offending line.
                string readto = port.ReadTo(port.TerminationCharacter);
                port.ResponseData = port.DataToMatch.Match(readto).Value;
            }
            catch (Exception ex)
            {
                Debug.WriteLine(ex.Message);
                port.ResponseData = null;
            }
        });
}
SuperSerialPort is an extension of the SerialPort class, primarily to hold information required for communications specific to each device on the port.
A port always has the TerminationCharacter defined; most of the time it's a newline character.
I don't understand why this is happening.
If ReadTo fails to find the character(s) specified in the input buffer, shouldn't it just time out and return nothing?
The stack trace points to an offending function in mscorlib, in the definition of the SerialPort class:
System.ArgumentOutOfRangeException occurred
HResult=-2146233086
Message=Non-negative number required.
Parameter name: byteCount
Source=mscorlib
ParamName=byteCount
StackTrace:
at System.Text.ASCIIEncoding.GetMaxCharCount(Int32 byteCount)
InnerException:
I followed it and here's what I found:
private int ReadBufferIntoChars(char[] buffer, int offset, int count, bool countMultiByteCharsAsOne)
{
    Debug.Assert(count != 0, "Count should never be zero. We will probably see bugs further down if count is 0.");
    int bytesToRead = Math.Min(count, CachedBytesToRead);

    // There are lots of checks to determine if this really is a single byte encoding with no
    // funky fallbacks that would make it not single byte
    DecoderReplacementFallback fallback = encoding.DecoderFallback as DecoderReplacementFallback;
    ----> THIS LINE
    if (encoding.IsSingleByte && encoding.GetMaxCharCount(bytesToRead) == bytesToRead &&
        fallback != null && fallback.MaxCharCount == 1)
    {
        // kill ASCII/ANSI encoding easily.
        // read at least one and at most *count* characters
        decoder.GetChars(inBuffer, readPos, bytesToRead, buffer, offset);
bytesToRead is getting assigned a negative number because CachedBytesToRead is negative. The inline comments imply that CachedBytesToRead can never be negative, yet here it clearly is:
private int readPos = 0;   // position of next byte to read in the read buffer. readPos <= readLen
private int readLen = 0;   // position of first unreadable byte => CachedBytesToRead is the number of readable bytes left.

private int CachedBytesToRead {
    get {
        return readLen - readPos;
    }
}
Anyone have any rational explanation for why this is happening?
I don't believe I'm doing anything illegal in terms of reading/writing/accessing the SerialPorts.
This gets thrown constantly, with no good way to reproduce it.
There are bytes available in the input buffer; here you can see the state of some of the key properties when it breaks (readLen, readPos, BytesToRead, CachedBytesToRead).
Am I doing something glaringly wrong?
EDIT: A picture showing that the same port isn't being asynchronously accessed from the loop.
This is technically possible; in general it is a common issue with .NET classes that are not thread-safe. The SerialPort class is not thread-safe, and there's no practical case where it needs to be.
The rough diagnosis is that two separate threads are calling ReadTo() on the same SerialPort object concurrently. A standard threading race condition occurs in the code that updates the readPos variable. Both threads copy the same data from the buffer and each increments readPos, in effect advancing readPos twice as far as it should go. Kaboom on the next call, when readPos is larger than readLen and the number of available bytes in the buffer comes out negative.
The simple explanation is that your List<SuperSerialPort> collection contains the same port more than once. The Parallel.ForEach() statement triggers the race. Works just fine for a while, until two threads execute the decoder.GetChars() method simultaneously and both arrive at the next statement:
readPos += bytesToRead;
The best way to test the hypothesis is to add code that ensures the list does not contain the same port more than once. Roughly:
#if DEBUG
    for (int ix = 0; ix < ports.Count - 1; ++ix)
        for (int jx = ix + 1; jx < ports.Count; ++jx)
            if (ports[ix].PortName == ports[jx].PortName)
                throw new InvalidOperationException("Port used more than once");
#endif
A second explanation is that your method is being called by more than one thread. That can't work; your method isn't thread-safe. Short of protecting it with a lock, making sure that only one thread ever calls it is the logical fix.
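If the lock route is taken, a minimal sketch would be a single gate around the whole method so only one caller runs it at a time (the _comGate field is hypothetical):
// Minimal sketch: serialize callers of RetrieveCOMReadings with one gate.
private static readonly object _comGate = new object();

public static void RetrieveCOMReadings(List<SuperSerialPort> ports)
{
    lock (_comGate)
    {
        // ... the existing Parallel.ForEach write/read logic goes here ...
    }
}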
It can be caused by setting a termination character and then using that same character for ReadTo. Instead, try using ReadLine or removing the termination character.

Workaround to add a default parameterless constructor to a struct

Let me describe my problem - I have a struct that wraps an unmanaged handle (let's call it Mem). I need this handle to call a particular method (say "retain" or alternatively, maintain a reference count) whenever it is copied.
In other words, I need a struct that maintains a reference count internally (I have a mechanism externally as well, but need a way to invoke that mechanism).
Unfortunately, C# doesn't let me do this in any way.
I also cannot make Mem a class because I will pass an array of these structs to unmanaged code and I do NOT want to convert them one by one before passing them in (just pin and pass).
Does anyone know of any workaround (IL Weaving, etc) that can be applied to add this behavior in? I believe IL doesn't prevent me from doing this, only C#, correct?
I am happy to answer any questions about the framework and restrictions I have, but I am not looking for - "please change your design" or "don't use C# for this" answers, thanks very much.
I believe IL doesn't prevent me from doing this, only C#, correct?
Yes, that's right where "this" is "a parameterless constructor for a struct". I blogged about that a while ago.
However, having a parameterless constructor does not do what you want in terms of notifying you every time a struct is copied. There's basically no way of doing that, as far as I'm aware. The constructor isn't even called in every case when you end up with a "default" value, and even if it were, it's certainly not called just for copy operations.
I know you don't want to hear "please change your design" but you're simply asking for something which does not exist in .NET.
I would suggest having some sort of method on the value type which returns a new copy, having taken appropriate action. You then need to make sure you always call that method at the right time. There will be nothing preventing you from getting this wrong, other than whatever testing you can build.
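Something along these lines, for example (a sketch; NativeMethods.Retain stands in for whatever the real retain call is):
// Sketch: an explicit copy method that performs the retain.
// NativeMethods.Retain is a placeholder for the real native call.
public struct Mem
{
    private IntPtr handle;

    public Mem Copy()
    {
        NativeMethods.Retain(handle);      // bump the unmanaged reference count
        return new Mem { handle = this.handle };
    }
}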
Does anyone know of any workaround (IL Weaving, etc) that can be applied to add this behavior in? I believe IL doesn't prevent me from doing this, only C#, correct?
This is correct, somewhat. The reason C# prevents this is that in many cases the constructor will not be used even if it's defined in IL. Your case is one of these: if you create an array of structs, the constructors will not be called, even if they're defined in the IL.
Unfortunately, there isn't really a workaround since the CLR won't call the constructors, even if they exist.
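To make that concrete, a small sketch: none of these forms invoke a parameterless struct constructor, whether or not one exists in the IL; the values are simply zero-initialized.
Mem one = default(Mem);      // no constructor call
Mem[] many = new Mem[1024];  // elements are zeroed, no constructor calls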
Edit: I hosted the work from this answer on GitHub: the NOpenCL library.
Based on your comment, I determined the following as an appropriate long-term course of action for the problems being discussed here. Apparently the problem centers around the use of OpenCL within managed code. What you need is a proper interop layer for this API.
As an experiment, I wrote a managed wrapper for a large portion of the OpenCL API to evaluate the viability of SafeHandle to wrap cl_mem, cl_event, and other objects which require a call to clRelease* for cleanup. The most challenging part was implementing methods like clEnqueueReadBuffer which can take an array of these handles as a parameter. The initial declaration of this method looked like the following.
[DllImport(ExternDll.OpenCL)]
private static extern ErrorCode clEnqueueReadBuffer(
    CommandQueueSafeHandle commandQueue,
    BufferSafeHandle buffer,
    [MarshalAs(UnmanagedType.Bool)] bool blockingRead,
    IntPtr offset,
    IntPtr size,
    IntPtr destination,
    uint numEventsInWaitList,
    [In, MarshalAs(UnmanagedType.LPArray)] EventSafeHandle[] eventWaitList,
    out EventSafeHandle @event);
Unfortunately, the P/Invoke layer does not support marshaling an array of SafeHandle objects, so I implemented an ICustomMarshaler called SafeHandleArrayMarshaler to handle this. Note that the current implementation does not use Constrained Execution Regions, so an asynchronous exception during marshaling can cause it to leak memory.
internal sealed class SafeHandleArrayMarshaler : ICustomMarshaler
{
    private static readonly SafeHandleArrayMarshaler Instance = new SafeHandleArrayMarshaler();

    private SafeHandleArrayMarshaler()
    {
    }

    public static ICustomMarshaler GetInstance(string cookie)
    {
        return Instance;
    }

    public void CleanUpManagedData(object ManagedObj)
    {
        throw new NotSupportedException();
    }

    public void CleanUpNativeData(IntPtr pNativeData)
    {
        if (pNativeData == IntPtr.Zero)
            return;

        GCHandle managedHandle = GCHandle.FromIntPtr(Marshal.ReadIntPtr(pNativeData, -IntPtr.Size));
        SafeHandle[] array = (SafeHandle[])managedHandle.Target;
        managedHandle.Free();

        for (int i = 0; i < array.Length; i++)
        {
            SafeHandle current = array[i];
            if (current == null)
                continue;

            if (Marshal.ReadIntPtr(pNativeData, i * IntPtr.Size) != IntPtr.Zero)
                array[i].DangerousRelease();
        }

        Marshal.FreeHGlobal(pNativeData - IntPtr.Size);
    }

    public int GetNativeDataSize()
    {
        return IntPtr.Size;
    }

    public IntPtr MarshalManagedToNative(object ManagedObj)
    {
        if (ManagedObj == null)
            return IntPtr.Zero;

        SafeHandle[] array = (SafeHandle[])ManagedObj;
        int i = 0;
        bool success = false;
        try
        {
            for (i = 0; i < array.Length; success = false, i++)
            {
                SafeHandle current = array[i];
                if (current != null && !current.IsClosed && !current.IsInvalid)
                    current.DangerousAddRef(ref success);
            }

            // one extra slot in front of the element pointers holds the GCHandle
            IntPtr result = Marshal.AllocHGlobal((array.Length + 1) * IntPtr.Size);
            Marshal.WriteIntPtr(result, 0, GCHandle.ToIntPtr(GCHandle.Alloc(array, GCHandleType.Normal)));
            for (int j = 0; j < array.Length; j++)
            {
                SafeHandle current = array[j];
                if (current == null || current.IsClosed || current.IsInvalid)
                {
                    // the memory for this element was initialized to null by AllocHGlobal
                    continue;
                }

                Marshal.WriteIntPtr(result, (j + 1) * IntPtr.Size, current.DangerousGetHandle());
            }

            return result + IntPtr.Size;
        }
        catch
        {
            int total = success ? i + 1 : i;
            for (int j = 0; j < total; j++)
            {
                SafeHandle current = array[j];
                if (current != null)
                    current.DangerousRelease();
            }

            throw;
        }
    }

    public object MarshalNativeToManaged(IntPtr pNativeData)
    {
        throw new NotSupportedException();
    }
}
This allowed me to successfully use the following interop declaration.
[DllImport(ExternDll.OpenCL)]
private static extern ErrorCode clEnqueueReadBuffer(
    CommandQueueSafeHandle commandQueue,
    BufferSafeHandle buffer,
    [MarshalAs(UnmanagedType.Bool)] bool blockingRead,
    IntPtr offset,
    IntPtr size,
    IntPtr destination,
    uint numEventsInWaitList,
    [In, MarshalAs(UnmanagedType.CustomMarshaler, MarshalTypeRef = typeof(SafeHandleArrayMarshaler))] EventSafeHandle[] eventWaitList,
    out EventSafeHandle @event);
This method is declared as private so I could expose it through a method that handles the numEventsInWaitList and eventWaitList arguments properly according to the OpenCL 1.2 API documentation.
internal static EventSafeHandle EnqueueReadBuffer(CommandQueueSafeHandle commandQueue, BufferSafeHandle buffer, bool blocking, IntPtr offset, IntPtr size, IntPtr destination, EventSafeHandle[] eventWaitList)
{
    if (commandQueue == null)
        throw new ArgumentNullException("commandQueue");
    if (buffer == null)
        throw new ArgumentNullException("buffer");
    if (destination == IntPtr.Zero)
        throw new ArgumentNullException("destination");

    EventSafeHandle result;
    ErrorHandler.ThrowOnFailure(clEnqueueReadBuffer(commandQueue, buffer, blocking, offset, size, destination, eventWaitList != null ? (uint)eventWaitList.Length : 0, eventWaitList != null && eventWaitList.Length > 0 ? eventWaitList : null, out result));
    return result;
}
The API is finally exposed to user code as the following instance method in my ContextQueue class.
public Event EnqueueReadBuffer(Buffer buffer, bool blocking, long offset, long size, IntPtr destination, params Event[] eventWaitList)
{
    EventSafeHandle[] eventHandles = null;
    if (eventWaitList != null)
        eventHandles = Array.ConvertAll(eventWaitList, @event => @event.Handle);

    EventSafeHandle handle = UnsafeNativeMethods.EnqueueReadBuffer(this.Handle, buffer.Handle, blocking, (IntPtr)offset, (IntPtr)size, destination, eventHandles);
    return new Event(handle);
}

Why is Marshal.WriteInt64 method's code so complex?

The code below has been reflected from the .NET Framework:
[SecurityCritical]
public static unsafe void WriteInt64(IntPtr ptr, int ofs, long val)
{
    try
    {
        byte* numPtr = (byte*) ptr + ofs;
        if ((((int) numPtr) & 7) == 0)
        {
            *((long*) numPtr) = val;
        }
        else
        {
            byte* numPtr2 = (byte*) &val;
            numPtr[0] = numPtr2[0];
            numPtr[1] = numPtr2[1];
            numPtr[2] = numPtr2[2];
            numPtr[3] = numPtr2[3];
            numPtr[4] = numPtr2[4];
            numPtr[5] = numPtr2[5];
            numPtr[6] = numPtr2[6];
            numPtr[7] = numPtr2[7];
        }
    }
    catch (NullReferenceException)
    {
        throw new AccessViolationException();
    }
}
In my opinion, *((long*) numPtr) = val would be enough, and very efficient.
Why is it so complex?
It seems rather straightforward, although optimized.
Notice the outer if: it checks whether the Int64 can be written in one operation, which is possible if the pointer you hand the method is aligned properly, i.e. it points to the start of an Int64 in memory (the address must be a multiple of 8).
If it can't be written in one operation, the code just writes one byte at a time, skipping the loop to save some time (this is called 'loop unrolling').
