I'm currently using the Kinect SDK with C# (a WPF application). I need to get the RGB stream and process the images with the EMGU library.
The problem is that when I try to process the image with EMGU (like converting the image's format and changing the colour of some pixels), the application slows down and takes too long to respond.
I'm using 8 GB of RAM / Intel HD Graphics 4000 / Intel Core i7.
Here's my simple code :
http://pastebin.com/5frLRwMN
Please help me :'(
I have run considerably heavier code (blob analysis) with the Kinect on a per-frame basis and got great performance on a machine of similar configuration to yours, so I believe we can rule out your machine as the problem. I don't see any EMGU code in your sample, however. In your example, you loop through 307k pixels with a pair of for loops. This is naturally a costly procedure, depending on the code in your loops. As you might expect, GetPixel and SetPixel are very slow methods to execute.
To speed up your code, first turn your image into an Emgu Image<Bgr, byte>. Then access its pixel data through the Data array. Note that Data is indexed [row, column, channel], i.e. [y, x, channel]:
Byte workImageBlue = image.Data[y, x, 0];
Byte workImageGreen = image.Data[y, x, 1];
...
The third index selects the channel, in BGR order (0 = blue, 1 = green, 2 = red). To set the pixel to another colour, try something like this:
byte[,,] workIm = image.Data;
workIm[y, x, 0] = 255; // blue
workIm[y, x, 1] = 20;  // green
...
Alternatively, you can set the pixel to a colour directly (again, row index first):
image[y, x] = new Bgr(Color.Blue);
This might be slower, however.
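Putting the above together, here is a minimal sketch, assuming Emgu CV 2.x and that the current Kinect colour frame has already been converted to a System.Drawing.Bitmap named bitmap (that conversion is not shown):
using Emgu.CV;
using Emgu.CV.Structure;

Image<Bgr, byte> image = new Image<Bgr, byte>(bitmap);
byte[,,] data = image.Data; // [row, column, channel], channels in BGR order

for (int y = 0; y < image.Height; y++)
{
    for (int x = 0; x < image.Width; x++)
    {
        // Example rule: turn strongly red pixels blue.
        if (data[y, x, 2] > 200 && data[y, x, 1] < 80 && data[y, x, 0] < 80)
        {
            data[y, x, 0] = 255; // B
            data[y, x, 1] = 20;  // G
            data[y, x, 2] = 20;  // R
        }
    }
}
Accessing Data this way avoids the per-call overhead of GetPixel/SetPixel entirely.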
Image processing is always slow, and if you do it at 30fps it's normal that your app hangs: real-time image processing is always a challenge. You may need to drop some frames in order to increase performance (...or perhaps switch to native C++ and look for a faster library).
I am trying to develop a basic screen sharing and collaboration app in C#. I am currently working on capturing the screen, finding areas of the screen that have changed and subsequently need to be transmitted to the end client.
I am having a problem in that the overall frame rate of the screen capture is too low. I have a fairly good algorithm for finding areas of the screen that have changed. Given a byte array of pixels on the screen it calculates areas that have changed in 2-4ms, however the overall frame rate I am getting is 15-18 fps (i.e. taking somewhere around 60ms per frame). The bottleneck is capturing the data on the screen as a byte array which is taking around 35-50ms. I have tried a couple of different techniques and can't push the fps past 20.
At first I tried something like this:
var _bmp = new Bitmap(_screenSectionToMonitor.Width, _screenSectionToMonitor.Height);
var _gfx = Graphics.FromImage(_bmp);
// Blit the monitored screen region into the bitmap.
_gfx.CopyFromScreen(_screenSectionToMonitor.X, _screenSectionToMonitor.Y, 0, 0, new Size(_screenSectionToMonitor.Width, _screenSectionToMonitor.Height), CopyPixelOperation.SourceCopy);
// Lock the bitmap and copy its raw bytes into the _screenshot array.
var data = _bmp.LockBits(new Rectangle(0, 0, _screenSectionToMonitor.Width, _screenSectionToMonitor.Height), ImageLockMode.ReadOnly, _bmp.PixelFormat);
var ptr = data.Scan0;
Marshal.Copy(ptr, _screenshot, 0, _screenSectionToMonitor.Height * _screenSectionToMonitor.Width * _bytesPerPixel);
_bmp.UnlockBits(data);
This is too slow taking around 45ms just to run the code above for a single 1080p screen. This makes the overall frame rate too slow to be smooth, so I then tried using DirectX as per the example here:
http://www.codeproject.com/Articles/274461/Very-fast-screen-capture-using-DirectX-in-Csharp
However, this didn't really net any results. It marginally increased the speed of the screen capture, but it was still much too slow (taking around 25-40ms), and the small increase wasn't worth the overhead of the extra DLLs, code, etc.
After googling around a bit I couldn't really find any better solutions, so my question is what is the best way to capture the pixels currently displaying on the screen? An ideal solution would:
Capture the screen as an array of bytes as RGBA
Work on older Windows platforms (e.g. Windows XP and above)
Work with multiple displays
Use existing system libraries rather than 3rd-party DLLs
All these points are negotiable for a solution that returns a decent overall frame rate, in the region of 5-10ms for the actual capture so the frame rate can be 40-60fps.
Alternatively, if there is no solution that matches the above, am I taking the wrong path to calculating screen changes? Is there a better way to find areas of the screen that have changed?
Perhaps you can access the screen buffers at a lower level of code and hook directly into the layers and regions Windows uses as part of its screen updates. It sounds like you are after the raw display changes, and Windows already has to keep track of this data. Just offering a direction to pursue until you find someone more knowledgeable.
First of all, I am aware that this question really sounds as if I didn't search, but I did, a lot.
I wrote a small Mandelbrot drawing program in C#; it's basically a Windows Form with a PictureBox on which I draw the Mandelbrot set.
My problem is that it's pretty slow. Without a deep zoom it does a pretty good job, and moving around and zooming are pretty smooth, taking less than a second per drawing, but once I start to zoom in a little and get to places which require more calculations it becomes really slow.
Other Mandelbrot applications run smoothly on my computer in places where my application slows to a crawl, so I'm guessing there is a lot I can do to improve the speed.
I did the following things to optimize it:
Instead of using the SetPixel/GetPixel methods on the Bitmap object, I used the LockBits method to write directly to memory, which made things a lot faster.
Instead of using complex-number objects (classes I made myself, not the built-in ones), I emulated complex numbers using two variables, re and im. This allowed me to cut down on multiplications: squaring the real part and the imaginary part is done a few times during the calculation, so I save the squares in variables and reuse the results without recalculating them.
I use 4 threads to draw the Mandelbrot, each thread does a different quarter of the image and they all work simultaneously. As I understood, that means my CPU will use 4 of its cores to draw the image.
I use the Escape Time Algorithm, which as I understood is the fastest?
Here is how I move between the pixels and calculate; it's commented, so I hope it's understandable:
//Pixel-by-pixel loop:
for (int r = rRes; r < wTo; r++)
{
    for (int i = iRes; i < hTo; i++)
    {
        // These calculations determine which complex number corresponds to the (r, i) pixel.
        double re = (r - (w / 2)) * step + zeroX;
        double im = (i - (h / 2)) * step - zeroY;
        // Create the Z complex number.
        double zRe = 0;
        double zIm = 0;
        // Variables to store the squares of the real and imaginary parts.
        double multZre = 0;
        double multZim = 0;
        // Start iterating with the complex number to determine its escape time (mandelValue).
        int mandelValue = 0;
        while (multZre + multZim < 4 && mandelValue < iters)
        {
            /* The new real part equals re(z)^2 - im(z)^2 + re(c); we store it in a temp variable
             * tempRe because we still need re(z) in the next calculation.
             */
            double tempRe = multZre - multZim + re;
            /* The new imaginary part is equal to 2*re(z)*im(z) + im(c).
             * Instead of multiplying by 2, I add re(z) to itself and then multiply by im(z),
             * which means I do 1 multiplication instead of 2.
             */
            zRe += zRe;
            zIm = zRe * zIm + im;
            zRe = tempRe; // We can now put the temp value in its place.
            // Do the squaring now; the results will be used in the next calculation.
            multZre = zRe * zRe;
            multZim = zIm * zIm;
            // Increase the mandelValue by one, because the iteration is now finished.
            mandelValue += 1;
        }
        // After the mandelValue is found, this colors its pixel accordingly (unsafe code, accesses memory directly).
        // (Unimportant for my question; I doubt the problem is here, because my code becomes really slow
        //  as the number of ITERATIONS grows, while this only executes more as the number of pixels grows.)
        Byte* pos = px + (i * str) + (pixelSize * r);
        byte col = (byte)((1 - ((double)mandelValue / iters)) * 255);
        pos[0] = col;
        pos[1] = col;
        pos[2] = col;
    }
}
What can I do to improve this? Do you find any obvious optimization problems in my code?
Right now there are 2 ways I know I can improve it:
I need to use a different type for the numbers: double has limited precision, and I'm sure there are better non-built-in alternative types that are faster (they multiply and add faster) and have more accuracy. I just need someone to point me to where I should look and tell me whether it's true.
I can move the processing to the GPU. I have no idea how to do this (OpenGL maybe? DirectX? Is it even that simple, or will I need to learn a lot of stuff?). If someone could send me links to proper tutorials on this subject, or tell me about it in general, that would be great.
Thanks a lot for reading that far and hope you can help me :)
If you decide to move the processing to the GPU, you can choose from a number of options. Since you are using C#, XNA will allow you to use HLSL; RB Whitaker has the easiest XNA tutorials if you choose this option. Another option is OpenCL: OpenTK comes with a demo program of a Julia set fractal, which would be very simple to modify to display the Mandelbrot set. See here
Just remember to find the GLSL shader that goes with the source code.
About the GPU, examples are no help for me because I have absolutely no idea about this topic: how does it even work, what kind of calculations can the GPU do (and how is it even accessed)?
Different GPU software works differently, however...
Typically a programmer will write a program for the GPU in a shader language such as HLSL, GLSL or OpenCL. The program written in C# will load and compile the shader code, and then use API functions to send a job to the GPU and get the result back afterwards.
Take a look at FX Composer or RenderMonkey if you want some practice with shaders without having to worry about APIs.
If you are using HLSL, the rendering pipeline looks like this.
The vertex shader is responsible for taking points in 3D space and calculating their position in your 2D viewing field. (Not a big concern for you since you are working in 2D)
The pixel shader is responsible for applying shader effects to the pixels after the vertex shader is done.
OpenCL is a different story: it's geared towards general-purpose GPU computing (i.e. not just graphics). It's more powerful and can be used for GPUs, DSPs, and for building supercomputers.
WRT coding for the GPU, you can look at Cudafy.Net (it does OpenCL too, which is not tied to NVidia) to start getting an understanding of what's going on, and perhaps even do everything you need there. I quickly found it - and my graphics card - unsuitable for my needs, but for the Mandelbrot at the stage you're at, it should be fine.
In brief: you code for the GPU in a flavour of C (CUDA C or OpenCL, normally), then push the "kernel" (your compiled C method) to the GPU, followed by any source data, and then invoke that "kernel", often with parameters to say what data to use, or perhaps a few parameters to tell it where to place the results in its memory.
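To make that flow concrete, here is a minimal sketch using Cudafy.Net; the kernel name, launch sizes and host variables (hostRe, hostIm, hostOut, count, iters) are my own, so treat it as an approximation rather than the library's canonical usage:
using Cudafy;
using Cudafy.Host;
using Cudafy.Translator;

public static class MandelGpu
{
    [Cudafy] // CudafyTranslator converts this C# method to CUDA C
    public static void Escape(GThread t, double[] cRe, double[] cIm, int[] outIters, int count, int maxIters)
    {
        int idx = t.blockIdx.x * t.blockDim.x + t.threadIdx.x;
        if (idx >= count) return;
        double zr = 0, zi = 0;
        int n = 0;
        while (zr * zr + zi * zi < 4 && n < maxIters)
        {
            double tmp = zr * zr - zi * zi + cRe[idx];
            zi = 2 * zr * zi + cIm[idx];
            zr = tmp;
            n++;
        }
        outIters[idx] = n;
    }
}

// Host side: compile and load the kernel, push data, launch, read back.
CudafyModule km = CudafyTranslator.Cudafy();
GPGPU gpu = CudafyHost.GetDevice(eGPUType.Cuda);
gpu.LoadModule(km);
double[] devRe = gpu.CopyToDevice(hostRe); // one entry per pixel
double[] devIm = gpu.CopyToDevice(hostIm);
int[] devOut = gpu.Allocate<int>(count);
gpu.Launch(count / 256, 256).Escape(devRe, devIm, devOut, count, iters);
gpu.CopyFromDevice(devOut, hostOut);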
When I've been doing fractal rendering myself, I've avoided drawing to a bitmap for the reasons already outlined, and deferred the render phase. Besides that, I tend to write massively multithreaded code, which is really bad for trying to access a bitmap. Instead, I write to a common store; most recently I've used a MemoryMappedFile (a built-in .NET class), since that gives me pretty decent random-access speed and a huge addressable area. I also tend to write my results to a queue and have another thread deal with committing the data to storage; the compute times of each Mandelbrot pixel will be "ragged", that is to say they will not always take the same length of time. As a result, your pixel commit could be the bottleneck for very low iteration counts. Farming it out to another thread means your compute threads are never waiting for storage to complete.
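The queue part might look like this sketch; PixelResult and the commit delegate are hypothetical, and in real code each compute thread would call Enqueue as its pixels finish:
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

struct PixelResult { public int X, Y, Value; }

class CommitQueue
{
    // Bounded so compute threads block briefly if the writer falls behind.
    readonly BlockingCollection<PixelResult> results = new BlockingCollection<PixelResult>(1 << 16);
    readonly Task writer;

    public CommitQueue(Action<int, int, int> commitPixel)
    {
        // One writer thread owns the storage; compute threads never touch it.
        writer = Task.Run(() =>
        {
            foreach (PixelResult p in results.GetConsumingEnumerable())
                commitPixel(p.X, p.Y, p.Value);
        });
    }

    public void Enqueue(int x, int y, int value)
    {
        results.Add(new PixelResult { X = x, Y = y, Value = value });
    }

    public void Finish()
    {
        results.CompleteAdding(); // call once all compute threads are done
        writer.Wait();
    }
}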
I'm currently playing with the Buddhabrot visualisation of the Mandelbrot set, looking at using a GPU to scale out the rendering (since it's taking a very long time with the CPU) and handling a huge result set. I was thinking of targeting an 8-gigapixel image, but I've come to the realisation that I need to diverge from the constraints of pixels, and possibly away from floating-point arithmetic due to precision issues. I'm also going to have to buy some new hardware so I can interact with the GPU differently; different compute jobs will finish at different times (as per my iteration-count comment earlier), so I can't just fire batches of threads and wait for them all to complete without potentially wasting a lot of time waiting for one particularly high iteration count out of the whole batch.
Another point that I hardly ever see being made about the Mandelbrot set is that it is symmetrical about the real axis. If your viewport straddles that axis, you might be doing twice as much calculating as you need to.
For moving the processing to the GPU, you have lots of excellent examples here:
https://www.shadertoy.com/results?query=mandelbrot
Note that you need a WebGL-capable browser to view that link. It works best in Chrome.
I'm no expert on fractals, but you seem to have come far already with the optimizations. Going beyond this may make the code much harder to read and maintain, so you should ask yourself whether it is worth it.
One technique I've often observed in other fractal programs is this: while zooming, calculate the fractal at a lower resolution and stretch it to full size during render. Then render at full resolution as soon as zooming stops.
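A rough sketch of that idea; RenderFractal is assumed to fill any bitmap with the fractal at that bitmap's resolution, and width, height and fullSizeBitmap come from your form:
// While the user is zooming: render at quarter resolution and stretch.
Bitmap preview = new Bitmap(width / 4, height / 4);
RenderFractal(preview); // only 1/16 of the pixels to compute
using (Graphics g = Graphics.FromImage(fullSizeBitmap))
{
    g.InterpolationMode = InterpolationMode.NearestNeighbor; // cheap stretch
    g.DrawImage(preview, 0, 0, width, height);
}
// When zooming stops: RenderFractal(fullSizeBitmap) at full resolution.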
Another suggestion: when you use multiple threads, take care that the threads don't read/write memory belonging to other threads, because this will cause cache collisions and hurt performance. One good algorithm is to split the work up into scanlines (instead of four quarters, as you do now): create a number of threads, then, as long as there are lines left to process, assign a scanline to a thread that is available. Let each thread write the pixel data to a local piece of memory and copy it back to the main bitmap after each line (to avoid cache collisions), as sketched below.
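A minimal sketch of that scheme; ComputeScanline and CommitScanline are hypothetical helpers for the per-row computation and for copying a finished row into the shared bitmap:
int width = 1024, height = 768, bytesPerPixel = 3; // example dimensions
int nextLine = -1; // shared cursor; each thread grabs the next free line
object bitmapLock = new object();
Thread[] workers = new Thread[Environment.ProcessorCount];
for (int t = 0; t < workers.Length; t++)
{
    workers[t] = new Thread(() =>
    {
        byte[] local = new byte[width * bytesPerPixel]; // thread-local row buffer
        int y;
        while ((y = Interlocked.Increment(ref nextLine)) < height)
        {
            ComputeScanline(y, local);                  // per-row escape-time loop
            lock (bitmapLock) CommitScanline(y, local); // short, serialized copy
        }
    });
    workers[t].Start();
}
foreach (Thread w in workers) w.Join();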
I'm searching for a way to read a pixel's color at the mouse point. In OpenGL it was done by calling the function "glReadPixels" after drawing the scene (or parts of it). I want to make a simple color-picking routine in the background, for identifying shapes/lines in 3D space.
So, is there any equivalent method/function/suggestion for doing the same in SharpDX (DirectX10 / DirectX11)?
This is perfectly possible with Direct3D 11; simply follow these steps:
Use DeviceContext.CopySubresourceRegion to copy part of the source texture into a staging texture (sized to the pixel area you want to read back, same format, but with ResourceUsage.Staging).
Retrieve the pixel from this staging texture using DeviceContext.MapSubresource/UnmapSubresource.
There is plenty of discussion about this topic around the net (for example: "Reading one pixel from texture to CPU in DX11")
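A hedged sketch of those two steps with SharpDX; "device", "context", "source" (the texture to read from), its format and the (x, y) pixel coordinates are assumed to come from your existing renderer:
using SharpDX.Direct3D11;

var desc = new Texture2DDescription
{
    Width = 1,
    Height = 1,
    MipLevels = 1,
    ArraySize = 1,
    Format = sourceFormat,              // must match the source texture
    SampleDescription = new SharpDX.DXGI.SampleDescription(1, 0),
    Usage = ResourceUsage.Staging,      // CPU-readable
    BindFlags = BindFlags.None,
    CpuAccessFlags = CpuAccessFlags.Read
};
using (var staging = new Texture2D(device, desc))
{
    // Copy the single pixel at (x, y) into the staging texture.
    var region = new ResourceRegion(x, y, 0, x + 1, y + 1, 1);
    context.CopySubresourceRegion(source, 0, region, staging, 0, 0, 0, 0);

    // Map and read the raw pixel bytes on the CPU.
    var box = context.MapSubresource(staging, 0, MapMode.Read, MapFlags.None);
    int raw = System.Runtime.InteropServices.Marshal.ReadInt32(box.DataPointer);
    context.UnmapSubresource(staging, 0);
}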
Another option is to use a small compute shader like this:
Texture2D<float4> TextureInput : register(t0);
StructuredBuffer<float2> UVBuffer : register(t1);
RWStructuredBuffer<float4> RWColorBuffer : register(u0);
SamplerState Sampler : register(s0);

[numthreads(1, 1, 1)]
void CSGetPixels(uint3 DTid : SV_DispatchThreadID)
{
    float4 c = TextureInput.SampleLevel(Sampler, UVBuffer[DTid.x].xy, 0);
    RWColorBuffer[DTid.x] = c;
}
It gives you the advantage of being a bit more format-agnostic.
The process then looks like this:
Create a small structured buffer of UVs (float2): pixel position / texture size (don't forget to flip the Y axis, of course). Copy the pixel positions you want to sample into this buffer.
Create a writable buffer and a staging buffer (float4), with the same element count as your UV buffer.
Bind everything and Dispatch.
Copy the writable buffer into the staging one.
Map and read the float4 data on the CPU.
Please note I omitted thread-group optimization/checks in the compute shader for simplicity. The C# dispatch side could look roughly like the sketch below.
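This is a hedged sketch with SharpDX; buffer and view creation are elided, and names like csGetPixels, textureSrv, uvSrv, colorUav, colorBuffer and stagingBuffer are mine:
// Bind inputs/outputs and run one thread per requested pixel.
context.ComputeShader.Set(csGetPixels);                    // compiled CSGetPixels
context.ComputeShader.SetShaderResource(0, textureSrv);    // t0: source texture
context.ComputeShader.SetShaderResource(1, uvSrv);         // t1: UV buffer
context.ComputeShader.SetUnorderedAccessView(0, colorUav); // u0: output colors
context.ComputeShader.SetSampler(0, sampler);              // s0
context.Dispatch(pixelCount, 1, 1);                        // matches [numthreads(1,1,1)]

// Copy the GPU-written buffer to the staging buffer and read it on the CPU.
context.CopyResource(colorBuffer, stagingBuffer);
var box = context.MapSubresource(stagingBuffer, 0, MapMode.Read, MapFlags.None);
var colors = new SharpDX.Vector4[pixelCount];
SharpDX.Utilities.Read(box.DataPointer, colors, 0, pixelCount);
context.UnmapSubresource(stagingBuffer, 0);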
Since you're using C#, my suggestion would be to use GDI+, as there is no direct equivalent of "glReadPixels" in DX. GDI+ offers very easy methods for reading the color of a pixel at your mouse pointer; refer to stackoverflow.com/questions/1483928.
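For reference, the GDI+ route is only a few lines; a sketch (WinForms, copying the single pixel under the cursor):
using System.Drawing;
using System.Drawing.Imaging;
using System.Windows.Forms;

Point p = Cursor.Position;
using (Bitmap bmp = new Bitmap(1, 1, PixelFormat.Format32bppArgb))
using (Graphics g = Graphics.FromImage(bmp))
{
    // Copy the one pixel under the mouse from the screen into the bitmap.
    g.CopyFromScreen(p, Point.Empty, new Size(1, 1));
    Color c = bmp.GetPixel(0, 0);
}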
If GDI+ is a no-go, as it isn't very fast, can't you stick to the usual object picking using a ray? You want to identify (I suppose three-dimensional) shapes/lines; this would be easy by casting a ray and checking for intersections.
Hey there!
Here is my setting:
I've got a C# application that extracts features from a series of images. Due to the size of the dataset (several thousand images) it is heavily parallelized; that's why we have a high-end machine with an SSD running Windows 7 x64 (.NET 4 runtime) to do the heavy lifting. I'm developing on a Windows XP SP3 x86 machine under Visual Studio 2008 (.NET 3.5) with Windows Forms - no chance to move to WPF, by the way.
Edit3:
It's weird, but I think I finally found out what's going on: it seems to be the codec for the image format that yields different results on the two machines! I don't know exactly what's happening there, but the decoder on the XP machine produces saner results than the Win7 one. Sadly, the better version is still on the x86 XP system :(. I guess the only solution is to change the input image format to something lossless like PNG or BMP (stupid me for not thinking about the file format in the first place :)).
Edit2:
Thank you for your efforts. I think I will stick with implementing a converter on my own; it's not exactly what I wanted, but I have to solve it somehow :). If anybody reading this has some ideas for me, please let me know.
Edit:
In the comments I was recommended to use a third-party lib for this. I think I didn't make myself clear enough: I don't really want to use the DrawImage approach anyway - it's just a flawed quick hack to get an actually working new Bitmap(tmp, ..., myPixelFormat) that would hopefully use some interpolation. What I want to achieve is solely to convert the incoming image to a common PixelFormat with some standard interpolation.
My problem is as follows: some of the source images are in indexed 8bpp JPG format, which doesn't get along very well with the WinForms imaging stuff. Therefore my image-loading logic checks for indexed images and converts them to my application's default format (e.g. Format16bpp), like this:
Image GetImageByPath(string path)
{
    Image result = null;
    using (FileStream fs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read))
    {
        Image tmp = Image.FromStream(fs); // Here goes the same image ...
        if (tmp.PixelFormat == PixelFormat.Format1bppIndexed ||
            tmp.PixelFormat == PixelFormat.Format4bppIndexed ||
            tmp.PixelFormat == PixelFormat.Format8bppIndexed ||
            tmp.PixelFormat == PixelFormat.Indexed)
        {
            // Creating a Bitmap container in the application's default format
            result = new Bitmap(tmp.Width, tmp.Height, DefConf.DefaultPixelFormat);
            Graphics g = Graphics.FromImage(result);
            g.InterpolationMode = InterpolationMode.HighQualityBicubic;
            // We don't need to scale anything here
            Rectangle drawRect = new Rectangle(0, 0, tmp.Width, tmp.Height);
            // (*) Here is where the strange thing happens - I know I could use
            // DrawImageUnscaled - that isn't working either
            g.DrawImage(tmp, drawRect, drawRect, GraphicsUnit.Pixel);
            g.Dispose();
        }
        else
        {
            result = new Bitmap(tmp); // Just copying the input stream
        }
        tmp.Dispose();
    }
    // (**) At this stage the x86 XP in-memory image differs from the
    // x64 Win7 image despite having the same settings
    // on the very same image o.O
    // result.GetPixel(0, 0).B; // x86: 102, x64: 102
    // result.GetPixel(1, 0).B; // x86: 104, x64: 102
    // result.GetPixel(2, 0).B; // x86: 83,  x64: 85
    // result.GetPixel(3, 0).B; // x86: 117, x64: 121
    // ...
    return result;
}
I tracked the problem down to (*). I think the InterpolationMode has something to do with it, but no matter which one I choose, the results still differ at (**) on the two systems. I've been investigating the test image data with some stupid copy&paste lines to be sure it's not an issue of accessing the data in the wrong way.
The images altogether look like this Electron Backscatter Diffraction Pattern. The actual color values differ subtly, but they carry a lot of information; the interpolation even enhances it. It looks like the composition algorithm on the x86 machine uses the InterpolationMode property, whereas the x64 one just spreads the palette values out without taking any interpolation into account.
I never noticed any difference between the output of the two machines until the day I implemented a histogram view on the data in my application. On the x86 machine it is balanced, as one would expect from watching the images. The x64 machine, on the other hand, gives some kind of sparse bar diagram, an indication of indexed image data. It even affects the overall output of the whole application: the output differs between the two machines for the same data, and that's not a good thing.
To me it looks like a bug in the x64 implementation, but that's just me :-). I just want the images on the x64 machine to have the same values as on the x86 one.
If anybody has an idea I'd be very pleased. I've been searching the net for similar behavior for ages, but resistance seems futile :)
Oh look out ... a whale!
If you want to make sure that this is always done the same way, you'll have to write your own code to handle it. Fortunately, it's not too difficult.
Your 8bpp image has a palette that contains the actual color values. You need to read that palette and convert the color values (which, if I remember correctly, are 24 bits) to 16-bit color values. You're going to lose information in the conversion, but you're already losing information in your current conversion; at least this way, you'll lose it in a predictable way.
Put the converted color values (there won't be more than 256 of them) into an array that you can use for lookup. Then ...
Create your destination bitmap and call LockBits to get a pointer to the actual bitmap data. Call LockBits to get a pointer to the bitmap data of the source bitmap. Then, for each pixel:
read the source bitmap pixel (one byte: the palette index)
get the color value (16 bits) from your converted color array
store the color value in the destination bitmap
You could do this with GetPixel and SetPixel, but it would be very very slow.
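As a sketch of those steps, assuming the destination is 16bpp 5-6-5 (adjust the bit packing to your actual target format); this needs compiling with /unsafe:
using System.Drawing;
using System.Drawing.Imaging;

static Bitmap ConvertIndexedTo16bpp(Bitmap source)
{
    // 1. Convert each 24-bit palette entry to a 16-bit 5-6-5 value, once.
    Color[] palette = source.Palette.Entries;
    ushort[] lut = new ushort[palette.Length];
    for (int i = 0; i < palette.Length; i++)
    {
        Color c = palette[i];
        lut[i] = (ushort)(((c.R >> 3) << 11) | ((c.G >> 2) << 5) | (c.B >> 3));
    }

    var dest = new Bitmap(source.Width, source.Height, PixelFormat.Format16bppRgb565);
    var rect = new Rectangle(0, 0, source.Width, source.Height);
    BitmapData src = source.LockBits(rect, ImageLockMode.ReadOnly, PixelFormat.Format8bppIndexed);
    BitmapData dst = dest.LockBits(rect, ImageLockMode.WriteOnly, PixelFormat.Format16bppRgb565);
    unsafe
    {
        for (int y = 0; y < source.Height; y++)
        {
            byte* sRow = (byte*)src.Scan0 + y * src.Stride;
            ushort* dRow = (ushort*)((byte*)dst.Scan0 + y * dst.Stride);
            for (int x = 0; x < source.Width; x++)
                dRow[x] = lut[sRow[x]]; // 2. + 3. look up and store
        }
    }
    source.UnlockBits(src);
    dest.UnlockBits(dst);
    return dest;
}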
I vaguely recall that the .NET graphics classes rely on GDI+. If that's still the case today, then there's no point in trying your app on different 64-bit systems with different video drivers. Your best bet would be either to do the interpolation using raw GDI operations (P/Invoke) or to write your own pixel-interpolation routine in software. Neither option is particularly attractive.
You really should use OpenCV for image handling like that, it's available in C# here: OpenCVSharp.
I use a standard helper method to create the Graphics object, and with these settings x64 outperforms x86. Measure performance on release runs, not debug, and check "Optimize code" on the Build tab of the project properties. Visual Studio 2017, Framework 4.7.1:
public static Graphics CreateGraphics(Image i)
{
Graphics g = Graphics.FromImage(i);
g.CompositingMode = CompositingMode.SourceOver;
g.CompositingQuality = CompositingQuality.HighSpeed;
g.InterpolationMode = InterpolationMode.NearestNeighbor;
g.SmoothingMode = SmoothingMode.HighSpeed;
return g;
}
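Hypothetical usage, drawing a source image through those speed-oriented settings ("sourceImage" is assumed):
using (Bitmap bmp = new Bitmap(640, 480))
using (Graphics g = CreateGraphics(bmp))
{
    g.DrawImage(sourceImage, new Rectangle(0, 0, bmp.Width, bmp.Height));
}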
We have an application that shows a large image file (a satellite image) from a local network resource.
To speed up the image rendering, we divide the image into smaller patches (e.g. 6x6 cm) and the app tiles them appropriately.
But each time the satellite image is updated, the dividing pre-process has to be done again, which is time-consuming.
I wonder how can we load the patches from the original file?
PS 1: I found the LeadTools library, but we need an open-source solution.
PS 2: The app is in .NET C#
Edit 1:
The format is not fixed for us; currently it's JPG.
Changing to another format could be considered, but BMP is hardly acceptable because of its large volume.
I wrote a beautiful attempt at an answer to your question, but my browser ate it... :(
Basically what I tried to say was:
1. Since JPEG (and most compression formats) uses sequential compression, you'll always need to decode all the bits that come before the ones you need.
2. The solution I propose needs to be done for each format you want to support.
3. There are a lot of open-source JPEG decoders that you could modify. JPEG decoders decode blocks of bits (of variable size) that convert into 8x8 pixel blocks. What you could do is modify the code to keep in memory only the blocks you need and discard all the others as soon as they aren't needed any more (basically as soon as they are decoded). With those saved blocks, create the image you need.
4. Since JPEG works with 8x8 blocks, your work will be easier if your patches have sizes that are multiples of 8 pixels.
5. This modification of the JPEG decoder could replace the pre-processing of the images you are doing now, if you save each patch and discard its blocks as soon as it is complete. It would be really fast and consume less memory.
I know it needs a lot of work and there are a lot of details to take into consideration (especially if you work with color images), but if you need performance, I believe you will always end up fighting or playing (however you want to see it) with the bytes.
Hope it helps.
I'm not 100% sure what you're after, but if you're looking for a way to go from string imagePath, Rectangle desiredPortion to a System.Drawing.Image object, then perhaps something like this:
public System.Drawing.Image LoadImagePiece(string imagePath, Rectangle desiredPortion)
{
    using (Image img = Image.FromFile(imagePath))
    {
        Bitmap result = new Bitmap(desiredPortion.Width, desiredPortion.Height, PixelFormat.Format24bppRgb);
        using (Graphics g = Graphics.FromImage((Image)result))
        {
            g.InterpolationMode = System.Drawing.Drawing2D.InterpolationMode.HighQualityBicubic;
            g.SmoothingMode = System.Drawing.Drawing2D.SmoothingMode.HighQuality;
            g.PixelOffsetMode = System.Drawing.Drawing2D.PixelOffsetMode.HighQuality;
            g.CompositingQuality = System.Drawing.Drawing2D.CompositingQuality.HighQuality;
            // Copy only the requested rectangle out of the full image.
            g.DrawImage(img, 0, 0, desiredPortion, GraphicsUnit.Pixel);
        }
        return result;
    }
}
Note that for performance reasons you may want to consider building multiple output images at once rather than calling this multiple times - perhaps passing an array of rectangles and getting back an array of images, or similar (see the sketch below).
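A sketch of that variant, decoding the source once and cutting all the patches in one pass (same assumptions as above):
public List<Image> LoadImagePieces(string imagePath, IEnumerable<Rectangle> portions)
{
    var results = new List<Image>();
    using (Image img = Image.FromFile(imagePath)) // decode the big image once
    {
        foreach (Rectangle r in portions)
        {
            var piece = new Bitmap(r.Width, r.Height, PixelFormat.Format24bppRgb);
            using (Graphics g = Graphics.FromImage(piece))
                g.DrawImage(img, new Rectangle(0, 0, r.Width, r.Height), r, GraphicsUnit.Pixel);
            results.Add(piece);
        }
    }
    return results;
}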
If that's not what you're after can you clarify what you're actually looking for?