SharpDX 2.5 in DirectX11 in WPF - c#

I'm trying to implement DirectX 11 using SharpDX 2.5 into WPF.
Sadly http://directx4wpf.codeplex.com/ and http://sharpdxwpf.codeplex.com/ don't work properly with SharpDX 2.5. I was also not able to port the WPFHost DX10 sample to DX11 and the full code package of this example is down: http://www.indiedev.de/wiki/DirectX_in_WPF_integrieren
Can someone suggest another way of implementing this?

SharpDX supports WPF via SharpDXElement.
Take a look in the Samples repository at the Toolkit.sln - all projects that have WPF in their name use SharpDXElement as rendering surface:
MiniCube.WPF - demonstrates basic SharpDX-WPF integration;
MiniCube.SwitchContext.WPF - demonstrates basic scenario when lifetime of the Game instance is different from the lifetime of SharpDXElement (in other words - when there is need to switch game rendering on another surface).
MiniCube.SwitchContext.WPF.MVVM - same as above, but more 'MVVM-way'.
Update: SharpDX.Toolkit has been deprecated and it is not maintained anymore. It is moved to a separate repository. The Toolkit samples were deleted, however I changed the link to a changeset where they are still present.

You can still use http://sharpdxwpf.codeplex.com/.
In order to work properly with SharpDX 2.5.0 you need to do a few modifications.
1) In project Sharp.WPF in class DXUtils.cs in method
Direct3D11.Buffer CreateBuffer<T>(this Direct3D11.Device device, T[] range)
add this line
stream.Position = 0;
just after
stream.WriteRange(range);
So fixed method looks like this:
public static Direct3D11.Buffer CreateBuffer<T>(this Direct3D11.Device device, T[] range)
where T : struct
{
int sizeInBytes = Marshal.SizeOf(typeof(T));
using (var stream = new DataStream(range.Length * sizeInBytes, true, true))
{
stream.WriteRange(range);
stream.Position = 0; // fix
return new Direct3D11.Buffer(device, stream, new Direct3D11.BufferDescription
{
BindFlags = Direct3D11.BindFlags.VertexBuffer,
SizeInBytes = (int)stream.Length,
CpuAccessFlags = Direct3D11.CpuAccessFlags.None,
OptionFlags = Direct3D11.ResourceOptionFlags.None,
StructureByteStride = 0,
Usage = Direct3D11.ResourceUsage.Default,
});
}
}
2) And in class D3D11 in file D3D11.cs
rename this
m_device.ImmediateContext.Rasterizer.SetViewports(new Viewport(0, 0, w, h, 0.0f, 1.0f));
into this
m_device.ImmediateContext.Rasterizer.SetViewport(new Viewport(0, 0, w, h, 0.0f, 1.0f));
i.e. SetViewports into SetViewport.
And it should work now.

WPF 3D graphic loop takes too long

I'm trying to create Goldberg polyhedra, but the code that should draw them on my screen is too slow (about 22 seconds to draw the 6th level of detail).
Stopwatch sw = new Stopwatch();
var hexes = sphere.hexes.ToArray();
sw.Start();
for (int j = 0; j < hexes.Length; j++)
{
MeshGeometry3D myMeshGeometry3D = new MeshGeometry3D();
Vector3DCollection myNormalCollection = new Vector3DCollection();
foreach (var verts in hexes[j].Normals)
{
myNormalCollection.Add(verts);
}
myMeshGeometry3D.Normals = myNormalCollection;
Point3DCollection myPositionCollection = new Point3DCollection();
foreach (var verts in hexes[j].Normals)
{
myPositionCollection.Add(new Point3D(verts.X, verts.Y, verts.Z));
}
myMeshGeometry3D.Positions = myPositionCollection;
Int32Collection myTriangleIndicesCollection = new Int32Collection();
foreach (var triangle in hexes[j].Tris)
{
myTriangleIndicesCollection.Add(triangle);
}
myMeshGeometry3D.TriangleIndices = myTriangleIndicesCollection;
Material material = new DiffuseMaterial(
new SolidColorBrush(Colors.Black));
if (switcher)
{
material = new DiffuseMaterial(
new SolidColorBrush(Colors.BlueViolet));
}
switcher = !switcher;
GeometryModel3D model = new GeometryModel3D(
myMeshGeometry3D, material);
myGeometryModel.Geometry = myMeshGeometry3D;
myModel3DGroup.Children.Add(model);
myModel3DGroup.Children.Add(myGeometryModel);
}
sw.Stop();
I've tried to make my loop parallel, but myGeometryModel and myModel3DGroup belong to the main thread, so I can't modify them from other threads.
Your code isn't the problem.
I tried it out (thanks for posting a link) and based on what I was seeing in the Visual Studio performance tools tried using the Model3DCollection constructor that takes an IEnumerable<Model3D>, instead of adding them one by one, which I see Andy also suggested. (During the build loop I added all the models to my own List<Model3D> - which took almost no time - and sourced the Model3DCollection with this list - which is where everything ground to a halt.)
While this approximately halved the time, it was still 20+ seconds on my souped up machine in debug mode. The Model3DCollection constructor with the IEnumerable - one line of code - took up nearly all the time.
For what it's worth - and I assume you know this, but for everyone else's benefit - detail level 5 rendered about 4x as fast for me, with 20,485 elements in the collection vs. 81,925 at level 6, while level 4 was essentially instantaneous at 5125 elements. So apparently every detail level quadruples your model count and in turn the time to construct the Model3DCollection. Point being, if you can get away with a lower detail level the render times improve dramatically.
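As a rough check of the "quadruples per level" observation, the three counts quoted above happen to fit 5 * 4^(level + 1) + 5. The helper below is hypothetical and only reproduces those three numbers; it is not taken from the sphere generator and may not hold for other levels.

```csharp
using System;

static class DetailLevels
{
    // Hypothetical fit to the counts quoted in the answer (levels 4, 5, 6).
    // 1L << (2 * k) equals 4^k.
    public static long ElementCount(int level)
    {
        return 5L * (1L << (2 * (level + 1))) + 5;
    }

    // Prints the estimated count for each requested level.
    public static void PrintRange(int fromLevel, int toLevel)
    {
        for (int level = fromLevel; level <= toLevel; level++)
            Console.WriteLine($"level {level}: {ElementCount(level)} elements");
    }
}
```

Calling DetailLevels.PrintRange(4, 6) lists the values for levels 4 to 6, which makes the cost of each extra level easy to see before committing to it.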
But if you require that detail level, you really need to be looking at another platform IMO. C# itself is not the issue, but the bottleneck is in WPF, so you're rather stuck. If you need to stick with WPF, you might try looking into D3DImage and SharpDX. No C++ is required. Of course it would be a substantial re-write, but you're virtually certain to get the performance you're looking for.
Edit:
Experiment with code.
I added the following code to the end of the generation:
myViewport3D.Children.Add(myModelVisual3D);
XmlWriterSettings settings = new XmlWriterSettings();
settings.Indent = true;
XmlWriter writer = XmlWriter.Create(@"e:\temp\3dg.xaml", settings);
XamlWriter.Save(myViewport3D, writer);
This results in a file of roughly 79 MB and of course does not speed the generation up at all.
I then load that file instead of generating it:
private void CreateHexaSphere(object sender, RoutedEventArgs e)
{
// Load the previously saved Viewport3D instead of generating it
using (FileStream fs = new FileStream(@"e:\temp\3dg.xaml", FileMode.Open))
{
myViewport3D = (Viewport3D)XamlReader.Load(fs);
}
this.Content = myViewport3D;
return;
}
That takes roughly 7 seconds to show the sphere. Yep. Bit of a difference there.
If you are happy to just load that file from disk then 7 seconds seems way better than 52 seconds.
If you need to generate this thing on the fly then I would find a way to work with strings and build a similar string to what you get in that file.
Unsuccessful theory:
I think all the things that matter there are freezables.
Calling .Freeze() on a freezable increases performance, and a frozen object can also be passed from one thread to another.
You could do most of this on multiple parallel background threads.
E.g. each hex could be built on one of a set of parallel worker threads, each building its own myMeshGeometry3D, which is frozen and returned to the calling thread.
Loosely, you'd have a Task<MeshGeometry3D> DoMeshGeom3D(int j) which returns one of your 3D geometries.
You could use PLINQ, Task.WhenAll, or Parallel.ForEach.
Roughly
var taskList = new List<Task<MeshGeometry3D>>();
for (int j = 0; j < hexes.Length; j++)
{
taskList.Add(DoMeshGeom3D(j));
}
var result = await Task.WhenAll(taskList.ToList());
Grab your list of numerous MeshGeometry3D and assemble them all together.
And you could be awaiting that on a background thread in another task.
You can also try constructing your 3dgroup using a list or ienumerable of these things constructed in the separate threads rather than adding one at a time.
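A minimal sketch of what such a DoMeshGeom3D could look like, assuming the per-hex data is already available as plain arrays. The class name and parameter list are hypothetical (the snippet above passes only the index j). It builds the mesh on a worker thread and freezes it so that the UI thread can use it. As the "Unsuccessful theory" label says, this did not remove the bottleneck, which is in constructing the Model3DCollection.

```csharp
using System.Threading.Tasks;
using System.Windows.Media;
using System.Windows.Media.Media3D;

static class MeshBuilder
{
    // Hypothetical signature: the caller supplies the data for one hex.
    public static Task<MeshGeometry3D> DoMeshGeom3D(
        Point3D[] positions, Vector3D[] normals, int[] triangleIndices)
    {
        return Task.Run(() =>
        {
            var mesh = new MeshGeometry3D
            {
                Positions = new Point3DCollection(positions),
                Normals = new Vector3DCollection(normals),
                TriangleIndices = new Int32Collection(triangleIndices)
            };
            // A frozen Freezable is read-only and may cross thread boundaries.
            mesh.Freeze();
            return mesh;
        });
    }
}
```

The GeometryModel3D and Model3DGroup would still have to be assembled on the UI thread (or frozen as well) once all tasks have completed.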
https://briancaos.wordpress.com/2020/09/11/run-tasks-in-parallel-using-net-core-c-and-async-coding/
https://learn.microsoft.com/en-us/dotnet/api/system.threading.tasks.task.whenall?view=netcore-3.1

Display point cloud continuous c# Intel Realsense

This may be a continuation of my previous question about displaying a PLY file with Helix Toolkit in C#. The problem I have with that solution is that it is not continuous, and creating a PLY file slows the program down a lot.
My code for making the point cloud looks like:
// CopyVertices is extensible, any of these will do:
var vertices = new float[points.Count * 3];
// var vertices = new Intel.RealSense.Math.Vertex[points.Count];
// var vertices = new UnityEngine.Vector3[points.Count];
// var vertices = new System.Numerics.Vector3[points.Count]; // SIMD
// var vertices = new GlmSharp.vec3[points.Count];
// var vertices = new byte[points.Count * 3 * sizeof(float)];
points.CopyVertices(vertices);
And the ply file is made with the line:
points.ExportToPLY("pointcloud.ply", colorFrame);
The helix toolkit is used like this:
Model3DGroup model1 = import.Load("pointcloud.ply");
model.Content = model1;
The rest of the code follows the C# wrapper of librealsense:
https://github.com/IntelRealSense/librealsense/tree/master/wrappers/csharp
Does anyone have an idea on how to make this pointcloud display continuous?
Are you using HelixToolkit.Wpf or HelixToolkit.SharpDX.Wpf?
Try the HelixToolkit.SharpDX version if your point cloud is big.
Also try to avoid exporting and importing during continuous updates. You can convert your point cloud directly into a point format that HelixToolkit supports and update the point model.
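A minimal sketch of the conversion step, assuming the vertices come from points.CopyVertices as a flat float array with three values per point, as in the question. The class and method names are hypothetical, and skipping all-zero entries assumes that pixels without depth data are reported as (0, 0, 0). How the result is assigned to a Helix point model depends on the toolkit version and is not shown here.

```csharp
using System;
using System.Collections.Generic;
using System.Numerics;

static class PointCloudConverter
{
    // Converts [x0, y0, z0, x1, y1, z1, ...] into a list of points,
    // dropping entries that are exactly (0, 0, 0).
    public static List<Vector3> ToPoints(float[] vertices)
    {
        if (vertices.Length % 3 != 0)
            throw new ArgumentException("Expected 3 floats per point.", nameof(vertices));

        var result = new List<Vector3>(vertices.Length / 3);
        for (int i = 0; i < vertices.Length; i += 3)
        {
            var p = new Vector3(vertices[i], vertices[i + 1], vertices[i + 2]);
            if (p != Vector3.Zero)
                result.Add(p);
        }
        return result;
    }
}
```

Running this once per frame on the array filled by points.CopyVertices(vertices) avoids the round trip through "pointcloud.ply".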

Exception when creating Swap Chain with CreateSwapChainForComposition

I am trying to render DirectX 12 in a SwapChainPanel using SharpDX, but creating the swap chain fails for an unknown reason. Here is a simplified version of what I have:
// select adapter based on some simple scoring mechanism
SelectedAdapter = SelectAdapter();
// create device
using (var defaultDevice = new Device(SelectedAdapter, FeatureLevel.Level_12_0))
Device = defaultDevice.QueryInterface<SharpDX.Direct3D12.Device2>();
// describe swap chain
SwapChainDescription1 swapChainDescription = new SwapChainDescription1
{
AlphaMode = AlphaMode.Ignore,
BufferCount = 2,
Format = Format.R8G8B8A8_UNorm,
Height = (int)(MainSwapChainPanel.RenderSize.Height),
Width = (int)(MainSwapChainPanel.RenderSize.Width),
SampleDescription = new SampleDescription(1, 0),
Scaling = Scaling.Stretch,
Stereo = false,
SwapEffect = SwapEffect.FlipSequential,
Usage = Usage.RenderTargetOutput
};
// create swap chain
using (var factory2 = SelectedAdapter.GetParent<Factory2>())
{
/*--> throws exception:*/
SwapChain1 swapChain1 = new SwapChain1(factory2, Device, ref swapChainDescription);
SwapChain = swapChain1.QueryInterface<SwapChain2>();
}
// tie created swap chain with swap chain panel
using (ISwapChainPanelNative nativeObject = ComObject.As<ISwapChainPanelNative>(MainSwapChainPanel))
nativeObject.SwapChain = SwapChain;
Selection of adapter works as expected (I have 2 adapters + software adapter). I can create a device and I can see that the app in task manager is using that selected adapter.
Creation of the swap chain is based mostly on this documentation here: https://learn.microsoft.com/en-us/windows/uwp/gaming/directx-and-xaml-interop#swapchainpanel-and-gaming
I get factory2 (with all the adapters and other things enumerated). Constructor of SwapChain1 internally is using factory2 to create a swap chain: https://github.com/sharpdx/SharpDX/blob/master/Source/SharpDX.DXGI/SwapChain1.cs#L64
I compared this method with several other examples and tutorials, and this is the way it should be done. However, regardless of the Format or adapter I choose, I keep getting this exception:
{SharpDX.SharpDXException: HRESULT: [0x887A0001], Module:
[SharpDX.DXGI], ApiCode: [DXGI_ERROR_INVALID_CALL/InvalidCall],
Message: The application made a call that is invalid. Either the
parameters of the call or the state of some object was incorrect.
Enable the D3D debug layer in order to see details via debug messages.
at SharpDX.Result.CheckError() at
SharpDX.DXGI.Factory2.CreateSwapChainForComposition(IUnknown
deviceRef, SwapChainDescription1& descRef, Output restrictToOutputRef,
SwapChain1 swapChainOut) at SharpDX.DXGI.SwapChain1..ctor(Factory2
factory, ComObject device, SwapChainDescription1& description, Output
restrictToOutput) at UI.MainPage.CreateSwapChain()}
DebugLayer doesn't show any additional info.
The app itself is a regular Windows Universal Blank App (min target version Creators Update 15063). I know I can run DirectX 12 on my current hardware (a C++ hello world works just fine).
Any ideas what is wrong?
This is how I got it working:
try
{
using (var factory4 = new Factory4())
{
SwapChain1 swapChain1 = new SwapChain1(factory4, CommandQueue, ref swapChainDescription);
SwapChain = swapChain1.QueryInterface<SwapChain2>();
}
}
catch (Exception e)
{
Debug.WriteLine(e.Message);
return;
}
using (ISwapChainPanelNative nativeObject = ComObject.As<ISwapChainPanelNative>(MainSwapChainPanel))
nativeObject.SwapChain = SwapChain;
So basically I need the Factory4 interface to create a temporary SwapChain1, from which I can query SwapChain2; this SwapChain2 can then be attached to the SwapChainPanel.
Also, a very important thing to notice here is that even though the SwapChain1 constructor signature (and documentation) https://github.com/sharpdx/SharpDX/blob/master/Source/SharpDX.DXGI/SwapChain1.cs#L51 says that the 2nd argument should be a device, it shouldn't. What you need to pass is a CommandQueue object. This matches the DXGI documentation for CreateSwapChainForComposition, which says that for Direct3D 12 this parameter is a pointer to a direct command queue rather than to the device.
Also, constructor of SwapChain1 says it needs Factory2, but no, you have to pass Factory4!

How to chain together multiple NAudio ISampleProvider Effects

I have some DSP effects coded in the ISampleProvider model. To apply one effect I do this and it works fine.
string filename = @"C:\myaudio.mp3";
MediaFoundationReader mediaFileReader = new MediaFoundationReader(filename);
ISampleProvider sampProvider = mediaFileReader.ToSampleProvider();
ReverbSampleProvider reverbSamplr = new ReverbSampleProvider(sampProvider);
IWavePlayer waveOutDevice = new WaveOutEvent(); // any IWavePlayer implementation will do
waveOutDevice.Init(reverbSamplr);
waveOutDevice.Play();
How can I apply multiple effects to the same input file simultaneously?
For example, if I have Reverb and Distortion effect providers, how can I chain them together to apply them at the same time to one input file?
Effects can be chained together by passing one as the "source" for the next. So if you wanted your audio to go first through a reverb, and then distortion, you might do something like this, passing the original audio into the Reverb effect, the output of the reverb into the distortion effect and then sending the distortion to the waveOut device.
var reverb = new ReverbSampleProvider(sampProvider);
var distortion = new DistortionSampleProvider(reverb);
waveOutDevice.Init(distortion);
(n.b. NAudio does not come with built in reverb/distortion effects - you must make these yourself or source them from elsewhere)
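Since the effects have to be written by hand, here is a minimal sketch of one that follows the same "source in the constructor" pattern. ClipDistortionSampleProvider and its threshold parameter are hypothetical names, not part of NAudio; only the ISampleProvider interface (WaveFormat and Read) comes from the library. It hard-clips each sample, which is about the simplest form of distortion.

```csharp
using System;
using NAudio.Wave;

// Hypothetical effect: clamps every sample to [-threshold, +threshold].
class ClipDistortionSampleProvider : ISampleProvider
{
    private readonly ISampleProvider _source;
    private readonly float _threshold;

    public ClipDistortionSampleProvider(ISampleProvider source, float threshold = 0.3f)
    {
        _source = source;
        _threshold = threshold;
    }

    // The effect does not change the format, so pass the source's through.
    public WaveFormat WaveFormat => _source.WaveFormat;

    public int Read(float[] buffer, int offset, int count)
    {
        // Pull samples from the previous link in the chain, then process in place.
        int read = _source.Read(buffer, offset, count);
        for (int i = 0; i < read; i++)
        {
            float sample = buffer[offset + i];
            buffer[offset + i] = Math.Max(-_threshold, Math.Min(_threshold, sample));
        }
        return read;
    }
}
```

Because it both consumes and implements ISampleProvider, an instance can be placed anywhere in a chain, e.g. new ClipDistortionSampleProvider(reverb).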
Mark's answer is correct, but that approach is a pain if you're copy and pasting things around in different orders, because you have to change the variables that you're passing through.
For example, if you start with:
var lpf = new LowPassEffectStream(input);
var reverb = new ReverbEffectStream(lpf);
var stereo = new StereoEffectStream(reverb);
var vol = new VolumeSampleProvider(stereo);
waveOutDevice.Init(vol);
And you want to swap reverb and stereo, a quick copy-paste leaves you with the input variables backwards:
var lpf = new LowPassEffectStream(input);
var stereo = new StereoEffectStream(reverb); // <--
var reverb = new ReverbEffectStream(lpf); // <--
var vol = new VolumeSampleProvider(stereo);
waveOutDevice.Init(vol);
It also makes it easy to fix a parameter but forget to fix another, e.g. fixing the stereo effect to have lpf as its input, but forgetting to fix the reverb effect. This often results in skipped effects in the chain leading to frustrated debugging when the effect appears not to work.
To make things easier and less error-prone when I'm stacking effects together and re-ordering them, I created the following helper class:
class EffectChain : ISampleProvider
{
public EffectChain(ISampleProvider source)
{
this._sourceStream = source;
}
private readonly ISampleProvider _sourceStream;
private readonly List<ISampleProvider> _chain = new List<ISampleProvider>();
public ISampleProvider Head
{
get
{
return _chain.LastOrDefault() ?? _sourceStream;
}
}
public WaveFormat WaveFormat
{
get
{
return Head.WaveFormat;
}
}
public void AddEffect(ISampleProvider effect)
{
_chain.Add(effect);
}
public int Read(float[] buffer, int offset, int count)
{
return Head.Read(buffer, offset, count);
}
}
You can use it like this:
var effectChain = new EffectChain(input);
var lpf = new LowPassEffectStream(effectChain.Head);
effectChain.AddEffect(lpf);
var stereo = new StereoEffectStream(effectChain.Head);
effectChain.AddEffect(stereo);
var reverb = new ReverbEffectStream(effectChain.Head);
effectChain.AddEffect(reverb);
var vol = new VolumeSampleProvider(effectChain.Head);
effectChain.AddEffect(vol);
waveOutDevice.Init(effectChain);
This allows you to quickly re-order effects in the chain, as each effect takes the effect chain's head as an input. If you don't add any effects it just acts as a pass-through. You could easily expand this class to have more methods for managing the contained effects if you wanted to, but as it stands it works quite cleanly.

Am I reusing OpenCL/Cloo(C#) objects correctly?

I'm experimenting with OpenCL (through Cloo's C# interface). To do so, I'm experimenting with the customary matrix-multiplication-on-the-GPU. The problem is, during my speed tests, the application crashes. I'm trying to be efficient regarding the re-allocation of various OpenCL objects, and I'm wondering if I'm botching something in doing so.
I'll put the code in this question, but for a bigger picture, you can get the code from github here: https://github.com/kwende/ClooMatrixMultiply
My main program does this:
Stopwatch gpuSw = new Stopwatch();
gpuSw.Start();
for (int c = 0; c < NumberOfIterations; c++)
{
float[] result = gpu.MultiplyMatrices(matrix1, matrix2, MatrixHeight, MatrixHeight, MatrixWidth);
}
gpuSw.Stop();
So I'm basically doing the call NumberOfIterations times, and timing the average execution time.
Within the MultiplyMatrices call, the first time through, I call Initialize to setup all the objects I'm going to reuse:
private void Initialize()
{
// get the intel integrated GPU
_integratedIntelGPUPlatform = ComputePlatform.Platforms.Where(n => n.Name.Contains("Intel")).First();
// create the compute context.
_context = new ComputeContext(
ComputeDeviceTypes.Gpu, // use the gpu
new ComputeContextPropertyList(_integratedIntelGPUPlatform), // use the intel openCL platform
null,
IntPtr.Zero);
// the command queue is the, well, queue of commands sent to the "device" (GPU)
_commandQueue = new ComputeCommandQueue(
_context, // the compute context
_context.Devices[0], // first device matching the context specifications
ComputeCommandQueueFlags.None); // no special flags
string kernelSource = null;
using (StreamReader sr = new StreamReader("kernel.cl"))
{
kernelSource = sr.ReadToEnd();
}
// create the "program"
_program = new ComputeProgram(_context, new string[] { kernelSource });
// compile.
_program.Build(null, null, null, IntPtr.Zero);
_kernel = _program.CreateKernel("ComputeMatrix");
}
I then enter the main body of my function (the part that will be executed NumberOfIterations times).
ComputeBuffer<float> matrix1Buffer = new ComputeBuffer<float>(_context,
ComputeMemoryFlags.ReadOnly| ComputeMemoryFlags.CopyHostPointer,
matrix1);
_kernel.SetMemoryArgument(0, matrix1Buffer);
ComputeBuffer<float> matrix2Buffer = new ComputeBuffer<float>(_context,
ComputeMemoryFlags.ReadOnly | ComputeMemoryFlags.CopyHostPointer,
matrix2);
_kernel.SetMemoryArgument(1, matrix2Buffer);
float[] ret = new float[matrix1Height * matrix2Width];
ComputeBuffer<float> retBuffer = new ComputeBuffer<float>(_context,
ComputeMemoryFlags.WriteOnly | ComputeMemoryFlags.CopyHostPointer,
ret);
_kernel.SetMemoryArgument(2, retBuffer);
_kernel.SetValueArgument<int>(3, matrix1WidthMatrix2Height);
_kernel.SetValueArgument<int>(4, matrix2Width);
_commandQueue.Execute(_kernel,
new long[] { 0 },
new long[] { matrix2Width ,matrix1Height },
null, null);
unsafe
{
fixed (float* retPtr = ret)
{
_commandQueue.Read(retBuffer,
false, 0,
ret.Length,
new IntPtr(retPtr),
null);
_commandQueue.Finish();
}
}
The third or fourth time through (it's somewhat random, which hints at memory access issues), the program crashes. Here is my kernel (I'm sure there are faster implementations, but right now my goal is just to get something working without blowing up):
kernel void ComputeMatrix(
global read_only float* matrix1,
global read_only float* matrix2,
global write_only float* output,
int matrix1WidthMatrix2Height,
int matrix2Width)
{
int x = get_global_id(0);
int y = get_global_id(1);
int i = y * matrix2Width + x;
float value = 0.0f;
// row y of matrix1 * column x of matrix2
for (int c = 0; c < matrix1WidthMatrix2Height; c++)
{
int m1Index = y * matrix1WidthMatrix2Height + c;
int m2Index = c * matrix2Width + x;
value += matrix1[m1Index] * matrix2[m2Index];
}
output[i] = value;
}
Ultimately the goal here is to better understand the zero-copy features of OpenCL (since I'm using Intel's integrated GPU). I have been having trouble getting it to work and so wanted to step back a bit to see if I understood even more basic things...apparently I don't as I can't get even this to work without blowing up.
The only other thing I can think of is it's how I'm pinning the pointer to send it to the .Read() function. But I don't know of an alternative.
Edit:
For what it's worth, I updated the last part of code (the read code) to this, and it still crashes:
_commandQueue.ReadFromBuffer(retBuffer, ref ret, false, null);
_commandQueue.Finish();
Edit #2
Solution found by huseyin tugrul buyukisik (see comment below).
After placing
matrix1Buffer.Dispose();
matrix2Buffer.Dispose();
retBuffer.Dispose();
at the end of the method, it all worked fine.
OpenCL resources such as buffers, kernels, and command queues should be released explicitly, and only after the other resources bound to them have been released. Re-creating them without releasing the old ones quickly depletes the available slots.
You have been re-creating the arrays inside a method of gpu, and that method was the only scope of the OpenCL buffers. When it finishes, the GC cannot track OpenCL's unmanaged memory areas, so the buffers leak, which eventually causes the crashes.
Many OpenCL implementations use C++ bindings, which need explicit release calls from C#, Java, and other environments.
Also, the set-argument calls do not need to be repeated when repeated kernel executions use exactly the same buffers in the same parameter order.
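One way to make the release hard to forget is to wrap the per-call buffers in using blocks. The sketch below assumes the same Cloo calls that appear in the question; the GpuMatrix class and the Multiply signature are hypothetical. The read is made blocking so that the buffers are not disposed while the copy is still in flight.

```csharp
using Cloo;

static class GpuMatrix
{
    // Hypothetical helper: context, queue and kernel are created once elsewhere
    // (as in Initialize) and reused across calls.
    public static float[] Multiply(
        ComputeContext context, ComputeCommandQueue queue, ComputeKernel kernel,
        float[] matrix1, float[] matrix2,
        int matrix1Height, int matrix1WidthMatrix2Height, int matrix2Width)
    {
        float[] ret = new float[matrix1Height * matrix2Width];

        using (var matrix1Buffer = new ComputeBuffer<float>(context,
            ComputeMemoryFlags.ReadOnly | ComputeMemoryFlags.CopyHostPointer, matrix1))
        using (var matrix2Buffer = new ComputeBuffer<float>(context,
            ComputeMemoryFlags.ReadOnly | ComputeMemoryFlags.CopyHostPointer, matrix2))
        using (var retBuffer = new ComputeBuffer<float>(context,
            ComputeMemoryFlags.WriteOnly | ComputeMemoryFlags.CopyHostPointer, ret))
        {
            // The buffers are new on every call, so the arguments are set again.
            kernel.SetMemoryArgument(0, matrix1Buffer);
            kernel.SetMemoryArgument(1, matrix2Buffer);
            kernel.SetMemoryArgument(2, retBuffer);
            kernel.SetValueArgument<int>(3, matrix1WidthMatrix2Height);
            kernel.SetValueArgument<int>(4, matrix2Width);

            // Two-dimensional launch: x = output column, y = output row.
            queue.Execute(kernel, null, new long[] { matrix2Width, matrix1Height }, null, null);

            // Blocking read: returns only after the results are in 'ret'.
            queue.ReadFromBuffer(retBuffer, ref ret, true, null);
        } // all three buffers are disposed here, even if an exception was thrown

        return ret;
    }
}
```

If the matrix sizes never change, the buffers could instead be created once and reused, in which case the SetMemoryArgument calls would also only be needed once.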
