GDI+ DrawImage notably slower in C++ (Win32) than in C# (WinForms)

GDI+ DrawImage notably slower in C++ (Win32) than in C# (WinForms) - c#

I am porting an application from C# (WinForms) to C++ and noticed that drawing an image using GDI+ is much slower in C++, even though it uses the same API.
The image is loaded at application startup into a System.Drawing.Image or Gdiplus::Image, respectively.
The C# drawing code is (directly in the main form):
public Form1()
{
this.SetStyle(ControlStyles.UserPaint | ControlStyles.AllPaintingInWmPaint | ControlStyles.OptimizedDoubleBuffer, true);
this.image = Image.FromFile(...);
}
private readonly Image image;
protected override void OnPaint(PaintEventArgs e)
{
base.OnPaint(e);
var sw = Stopwatch.StartNew();
e.Graphics.TranslateTransform(this.translation.X, this.translation.Y); /* NOTE0 */
e.Graphics.DrawImage(this.image, 0, 0, this.image.Width, this.image.Height);
Debug.WriteLine(sw.Elapsed.TotalMilliseconds.ToString()); // ~3ms
}
Regarding SetStyle: AFAIK, these flags (1) make WndProc ignore WM_ERASEBKGND, and (2) allocate a temporary HDC and Graphics for double buffered drawing.
The C++ drawing code is more bloated.
I have browsed the reference source of System.Windows.Forms.Control to see how it handles HDC and how it implements double buffering.
As far as I can tell, my implementation matches that closely (see NOTE1) (note that I implemented it in C++ first and then looked at how it's in the .NET source -- I may have overlooked things).
The rest of the program is more or less what you get when you create a fresh Win32 project in VS2019. All error handling omitted for readability.
// In wWinMain:
Gdiplus::GdiplusStartupInput gdiplusStartupInput;
Gdiplus::GdiplusStartup(&gdiplusToken, &gdiplusStartupInput, NULL);
gdip_bitmap = Gdiplus::Image::FromFile(...);
// In the WndProc callback:
case WM_PAINT:
// Need this for the back buffer bitmap
RECT client_rect;
GetClientRect(hWnd, &client_rect);
int client_width = client_rect.right - client_rect.left;
int client_height = client_rect.bottom - client_rect.top;
// Double buffering
HDC hdc0 = BeginPaint(hWnd, &ps);
HDC hdc = CreateCompatibleDC(hdc0);
HBITMAP back_buffer = CreateCompatibleBitmap(hdc0, client_width, client_height); /* NOTE1 */
HBITMAP dummy_buffer = (HBITMAP)SelectObject(hdc, back_buffer);
// Create GDI+ stuff on top of HDC
Gdiplus::Graphics *graphics = Gdiplus::Graphics::FromHDC(hdc);
QueryPerformanceCounter(...);
graphics->DrawImage(gdip_bitmap, 0, 0, bitmap_width, bitmap_height);
/* print performance counter diff */ // -> ~27 ms typically
delete graphics;
// Double buffering
BitBlt(hdc0, 0, 0, client_width, client_height, hdc, 0, 0, SRCCOPY);
SelectObject(hdc, dummy_buffer);
DeleteObject(back_buffer);
DeleteDC(hdc); // This is the temporary double buffer HDC
EndPaint(hWnd, &ps);
/* NOTE1 */: In the .NET source code they don't use CreateCompatibleBitmap, but CreateDIBSection instead.
That improves performance from 27 ms to 21 ms and is very cumbersome (see below).
In both cases I am calling Control.Invalidate or InvalidateRect, respectively, when the mouse moves (OnMouseMove, WM_MOUSEMOVE). The goal is to implement panning with the mouse using SetTransform - that's irrelevant for now as long as draw performance is bad.
NOTE2: https://stackoverflow.com/a/1617930/653473
This answer suggests that using Gdiplus::CachedBitmap is the trick. However, I can find no evidence in the C# WinForms source code that it makes use of cached bitmaps in any way - the C# code uses GdipDrawImageRectI which maps to GdipDrawImageRectI, which maps to Graphics::DrawImage(IN Image* image, IN INT x, IN INT y, IN INT width, IN INT height).
Regarding /* NOTE1 */, here is the replacement for CreateCompatibleBitmap (just substitute CreateVeryCompatibleBitmap):
bool bFillBitmapInfo(HDC hdc, BITMAPINFO *pbmi)
{
HBITMAP hbm = NULL;
bool bRet = false;
// Create a dummy bitmap from which we can query color format info about the device surface.
hbm = CreateCompatibleBitmap(hdc, 1, 1);
pbmi->bmiHeader.biSize = sizeof(BITMAPINFOHEADER);
// Call first time to fill in BITMAPINFO header.
GetDIBits(hdc, hbm, 0, 0, NULL, pbmi, DIB_RGB_COLORS);
if ( pbmi->bmiHeader.biBitCount <= 8 ) {
// UNSUPPORTED
} else {
if ( pbmi->bmiHeader.biCompression == BI_BITFIELDS ) {
// Call a second time to get the color masks.
// It's a GetDIBits Win32 "feature".
GetDIBits(hdc, hbm, 0, pbmi->bmiHeader.biHeight, NULL, pbmi, DIB_RGB_COLORS);
}
bRet = true;
}
if (hbm != NULL) {
DeleteObject(hbm);
hbm = NULL;
}
return bRet;
}
HBITMAP CreateVeryCompatibleBitmap(HDC hdc, int width, int height)
{
BITMAPINFO *pbmi = (BITMAPINFO *)LocalAlloc(LMEM_ZEROINIT, 4096); // Because otherwise I would have to figure out the actual size of the color table at the end; whatever...
bFillBitmapInfo(hdc, pbmi);
pbmi->bmiHeader.biWidth = width;
pbmi->bmiHeader.biHeight = height;
if (pbmi->bmiHeader.biCompression == BI_RGB) {
pbmi->bmiHeader.biSizeImage = 0;
} else {
if ( pbmi->bmiHeader.biBitCount == 16 )
pbmi->bmiHeader.biSizeImage = width * height * 2;
else if ( pbmi->bmiHeader.biBitCount == 32 )
pbmi->bmiHeader.biSizeImage = width * height * 4;
else
pbmi->bmiHeader.biSizeImage = 0;
}
pbmi->bmiHeader.biClrUsed = 0;
pbmi->bmiHeader.biClrImportant = 0;
void *dummy;
HBITMAP back_buffer = CreateDIBSection(hdc, pbmi, DIB_RGB_COLORS, &dummy, NULL, 0);
LocalFree(pbmi);
return back_buffer;
}
Using a very compatible bitmap as the back buffer improves performance from 27 ms to 21 ms.
Regarding /* NOTE0 */ in the C# code -- the code is only fast if the transformation matrix doesn't scale. C# performance drops slightly when upscaling (~9ms), and drops significantly (~22ms) when downsampling.
This hints to: DrawImage probably wants to BitBlt if possible. But it can't in my C++ case because the Bitmap format (that was loaded from disk) is different from the back buffer format or something.
If I create a new more compatible bitmap (this time no clear difference between CreateCompatibleBitmap and CreateVeryCompatibleBitmap), and then draw the original bitmap onto that, and then only use the more compatible bitmap in the DrawImage call, then performance increases to about 4.5 ms. It also has the same performance characteristics when scaling now as the C# code.
if (better_bitmap == NULL)
{
HBITMAP tmp_bitmap = CreateVeryCompatibleBitmap(hdc0, gdip_bitmap->GetWidth(), gdip_bitmap->GetHeight());
HDC copy_hdc = CreateCompatibleDC(hdc0);
HGDIOBJ old = SelectObject(copy_hdc, tmp_bitmap);
Gdiplus::Graphics *copy_graphics = Gdiplus::Graphics::FromHDC(copy_hdc);
copy_graphics->DrawImage(gdip_bitmap, 0, 0, gdip_bitmap->GetWidth(), gdip_bitmap->GetHeight());
// Now tmp_bitmap contains the image, hopefully in the device's preferred format
delete copy_graphics;
SelectObject(copy_hdc, old);
DeleteDC(copy_hdc);
better_bitmap = Gdiplus::Bitmap::FromHBITMAP(tmp_bitmap, NULL);
}
BUT it's still consistently slower, there must be something missing still. And it raises a new question: Why is this not necessary in C# (same image and same machine)? Image.FromFile does not convert the bitmap format on loading as far as I can tell.
Why is the DrawImage call in the C++ code still slower, and what do I need to do to make it as fast as in C#?

I ended up replicating more of the .NET code insanity.
The magic call that makes it go fast is GdipImageForceValidation in System.Drawing.Image.FromFile. This function is basically not documented at all, and it is not even [officially] callable from C++. It is merely mentioned here: https://learn.microsoft.com/en-us/windows/win32/gdiplus/-gdiplus-image-flat
Gdiplus::Image::FromFile and GdipLoadImageFromFile don't actually load the full image into memory. It effectively gets copied from the disk every time it is being drawn. GdipImageForceValidation forces the image to be loaded into memory, or so it seems...
My initial idea of copying the image into a more compatible bitmap was on the right track, but the way I did it does not yield the best performance for GDI+ (because I used a GDI bitmap from the original HDC). Loading the image directly into a new GDI+ bitmap, regardless of pixel format, yields the same performance characteristics as seen in the C# implementation:
better_bitmap = new Gdiplus::Bitmap(gdip_bitmap->GetWidth(), gdip_bitmap->GetHeight(), PixelFormat24bppRGB);
Gdiplus::Graphics *graphics = Gdiplus::Graphics::FromImage(better_bitmap);
graphics->DrawImage(gdip_bitmap, 0, 0, gdip_bitmap->GetWidth(), gdip_bitmap->GetHeight());
delete graphics;
Even better yet, using PixelFormat32bppPARGB further improves performance substantially - the premultiplied alpha pays off when the image is repeatedly drawn (regardless of whether the source image has an alpha channel).
It seems calling GdipImageForceValidation effectively does something similar internally, although I don't know what it really does. Because Microsoft made it as impossible as they could to call the GDI+ flat API from C++ user code, I just modified Gdiplus::Image in my Windows SDK headers to include an appropriate method. Copying the bitmap explicitly to PARGB seems cleaner to me (and yields better performance).
Of course, after one finds out which undocumented function to use, google would also give some additional information: https://photosauce.net/blog/post/image-scaling-with-gdi-part-5-push-vs-pull-and-image-validation
GDI+ is not my favorite API.

Related

Implementing streaming video for Windows 10 UAP

I need to display in XAML a video stream coming from some network source. Video frames can come at undefined intervals. They're already assembled, decoded and presented in BGRA8 form in memory mapped file. XAML frontend is in C#, backend is written in C using WinAPI.
In C# I have a handle of this file.
Previously in .NET 4.5 I was creating InteropBitmap from this handle with System.Windows.Interop.Imaging.CreateBitmapSourceFromMemorySection and called Invalidate on arriving of new frame. Than I used this InteropBitmap as Source for XAML Image.
Now I need to do the same but for Windows 10 UAP platform.
There are no memory mapped files in .NET Core so I created a CX Windows Runtime Component. Here's most important part of it.
static byte* GetPointerToPixelData(IBuffer^ pixelBuffer, unsigned int *length)
{
if (length != nullptr)
{
*length = pixelBuffer->Length;
}
// Query the IBufferByteAccess interface.
ComPtr<IBufferByteAccess> bufferByteAccess;
reinterpret_cast<IInspectable*>(pixelBuffer)->QueryInterface(IID_PPV_ARGS(&bufferByteAccess));
// Retrieve the buffer data.
byte* pixels = nullptr;
bufferByteAccess->Buffer(&pixels);
return pixels;
}
void Adapter::Invalidate()
{
memcpy(m_bitmap_ptr, m_image, m_sz);
m_bitmap->Invalidate();
}
Adapter::Adapter(int handle, int width, int height)
{
m_sz = width * height * 32 / 8;
// Read access to mapped file
m_image = MapViewOfFile((HANDLE)handle, FILE_MAP_READ, 0, 0, m_sz);
m_bitmap = ref new WriteableBitmap(width, height);
m_bitmap_ptr = GetPointerToPixelData(m_bitmap->PixelBuffer, 0);
}
Adapter::~Adapter()
{
if ( m_image != NULL )
UnmapViewOfFile(m_image);
}
Now I can use m_bitmap as Source for XAML Image ( and don't forget to raise property change on invalidate otherwise image won't update ).
Is there a better or more standard way? How can I create WriteableBitmap from m_image so I won't need additional memcpy on invalidate?
UPDATE: I wonder if I can use MediaElement to display sequence of uncompressed bitmaps and get any benefits from it? MediaElement supports filters which is a very nice feature.

c# screen transfer over socket efficient improve ways

thats how i wrote your beautiful code(some simple changes for me for easier understanding)
private void Form1_Load(object sender, EventArgs e)
{
prev = GetDesktopImage();//get a screenshot of the desktop;
cur = GetDesktopImage();//get a screenshot of the desktop;
var locked1 = cur.LockBits(new Rectangle(0, 0, cur.Width, cur.Height),
ImageLockMode.ReadWrite, PixelFormat.Format32bppArgb);
var locked2 = prev.LockBits(new Rectangle(0, 0, prev.Width, prev.Height),
ImageLockMode.ReadWrite, PixelFormat.Format32bppArgb);
ApplyXor(locked1, locked2);
compressionBuffer = new byte[1920* 1080 * 4];
// Compressed buffer -- where the data goes that we'll send.
int backbufSize = LZ4.LZ4Codec.MaximumOutputLength(this.compressionBuffer.Length) + 4;
backbuf = new CompressedCaptureScreen(backbufSize);
MessageBox.Show(compressionBuffer.Length.ToString());
int length = Compress();
MessageBox.Show(backbuf.Data.Length.ToString());//prints the new buffer size
}
the compression buffer length is for example 8294400
and the backbuff.Data.length is 8326947

I didn't like the compression suggestions, so here's what I would do.
You don't want to compress a video stream (so MPEG, AVI, etc are out of the question -- these don't have to be real-time) and you don't want to compress individual pictures (since that's just stupid).
Basically what you want to do is detect if things change and send the differences. You're on the right track with that; most video compressors do that. You also want a fast compression/decompression algorithm; especially if you go to more FPS that will become more relevant.
Differences. First off, eliminate all branches in your code, and make sure memory access is sequential (e.g. iterate x in the inner loop). The latter will give you cache locality. As for the differences, I'd probably use a 64-bit XOR; it's easy, branchless and fast.
If you want performance, it's probably better to do this in C++: The current C# implementation doesn't vectorize your code, and that will help you a great deal here.
Do something like this (I'm assuming 32bit pixel format):
for (int y=0; y<height; ++y) // change to PFor if you like
{
ulong* row1 = (ulong*)(image1BasePtr + image1Stride * y);
ulong* row2 = (ulong*)(image2BasePtr + image2Stride * y);
for (int x=0; x<width; x += 2)
row2[x] ^= row1[x];
}
Fast compression and decompression usually means simpler compression algorithms. https://code.google.com/p/lz4/ is such an algorithm, and there's a proper .NET port available for that as well. You might want to read on how it works too; there is a streaming feature in LZ4 and if you can make it handle 2 images instead of 1 that will probably give you a nice compression boost.
All in all, if you're trying to compress white noise, it simply won't work and your frame rate will drop. One way to solve this is to reduce the colors if you have too much 'randomness' in a frame. A measure for randomness is entropy, and there are several ways to get a measure of the entropy of a picture ( https://en.wikipedia.org/wiki/Entropy_(information_theory) ). I'd stick with a very simple one: check the size of the compressed picture -- if it's above a certain limit, reduce the number of bits; if below, increase the number of bits.
Note that increasing and decreasing bits is not done with shifting in this case; you don't need your bits to be removed, you simply need your compression to work better. It's probably just as good to use a simple 'AND' with a bitmask. For example, if you want to drop 2 bits, you can do it like this:
for (int y=0; y<height; ++y) // change to PFor if you like
{
ulong* row1 = (ulong*)(image1BasePtr + image1Stride * y);
ulong* row2 = (ulong*)(image2BasePtr + image2Stride * y);
ulong mask = 0xFFFCFCFCFFFCFCFC;
for (int x=0; x<width; x += 2)
row2[x] = (row2[x] ^ row1[x]) & mask;
}
PS: I'm not sure what I would do with the alpha component, I'll leave that up to your experimentation.
Good luck!
The long answer
I had some time to spare, so I just tested this approach. Here's some code to support it all.
This code normally run over 130 FPS with a nice constant memory pressure on my laptop, so the bottleneck shouldn't be here anymore. Note that you need LZ4 to get this working and that LZ4 is aimed at high speed, not high compression ratio's. A bit more on that later.
First we need something that we can use to hold all the data we're going to send. I'm not implementing the sockets stuff itself here (although that should be pretty simple using this as a start), I mainly focused on getting the data you need to send something over.
// The thing you send over a socket
public class CompressedCaptureScreen
{
public CompressedCaptureScreen(int size)
{
this.Data = new byte[size];
this.Size = 4;
}
public int Size;
public byte[] Data;
}
We also need a class that will hold all the magic:
public class CompressScreenCapture
{
Next, if I'm running high performance code, I make it a habit to preallocate all the buffers first. That'll save you time during the actual algorithmic stuff. 4 buffers of 1080p is about 33 MB, which is fine - so let's allocate that.
public CompressScreenCapture()
{
// Initialize with black screen; get bounds from screen.
this.screenBounds = Screen.PrimaryScreen.Bounds;
// Initialize 2 buffers - 1 for the current and 1 for the previous image
prev = new Bitmap(screenBounds.Width, screenBounds.Height, PixelFormat.Format32bppArgb);
cur = new Bitmap(screenBounds.Width, screenBounds.Height, PixelFormat.Format32bppArgb);
// Clear the 'prev' buffer - this is the initial state
using (Graphics g = Graphics.FromImage(prev))
{
g.Clear(Color.Black);
}
// Compression buffer -- we don't really need this but I'm lazy today.
compressionBuffer = new byte[screenBounds.Width * screenBounds.Height * 4];
// Compressed buffer -- where the data goes that we'll send.
int backbufSize = LZ4.LZ4Codec.MaximumOutputLength(this.compressionBuffer.Length) + 4;
backbuf = new CompressedCaptureScreen(backbufSize);
}
private Rectangle screenBounds;
private Bitmap prev;
private Bitmap cur;
private byte[] compressionBuffer;
private int backbufSize;
private CompressedCaptureScreen backbuf;
private int n = 0;
First thing to do is capture the screen. This is the easy part: simply fill the bitmap of the current screen:
private void Capture()
{
// Fill 'cur' with a screenshot
using (var gfxScreenshot = Graphics.FromImage(cur))
{
gfxScreenshot.CopyFromScreen(screenBounds.X, screenBounds.Y, 0, 0, screenBounds.Size, CopyPixelOperation.SourceCopy);
}
}
As I said, I don't want to compress 'raw' pixels. Instead, I'd much rather compress XOR masks of previous and the current image. Most of the times this will give you a whole lot of 0's, which is easy to compress:
private unsafe void ApplyXor(BitmapData previous, BitmapData current)
{
byte* prev0 = (byte*)previous.Scan0.ToPointer();
byte* cur0 = (byte*)current.Scan0.ToPointer();
int height = previous.Height;
int width = previous.Width;
int halfwidth = width / 2;
fixed (byte* target = this.compressionBuffer)
{
ulong* dst = (ulong*)target;
for (int y = 0; y < height; ++y)
{
ulong* prevRow = (ulong*)(prev0 + previous.Stride * y);
ulong* curRow = (ulong*)(cur0 + current.Stride * y);
for (int x = 0; x < halfwidth; ++x)
{
*(dst++) = curRow[x] ^ prevRow[x];
}
}
}
}
For the compression algorithm I simply pass the buffer to LZ4 and let it do its magic.
private int Compress()
{
// Grab the backbuf in an attempt to update it with new data
var backbuf = this.backbuf;
backbuf.Size = LZ4.LZ4Codec.Encode(
this.compressionBuffer, 0, this.compressionBuffer.Length,
backbuf.Data, 4, backbuf.Data.Length-4);
Buffer.BlockCopy(BitConverter.GetBytes(backbuf.Size), 0, backbuf.Data, 0, 4);
return backbuf.Size;
}
One thing to note here is that I make it a habit to put everything in my buffer that I need to send over the TCP/IP socket. I don't want to move data around if I can easily avoid it, so I'm simply putting everything that I need on the other side there.
As for the sockets itself, you can use a-sync TCP sockets here (I would), but if you do, you will need to add an extra buffer.
The only thing that remains is to glue everything together and put some statistics on the screen:
public void Iterate()
{
Stopwatch sw = Stopwatch.StartNew();
// Capture a screen:
Capture();
TimeSpan timeToCapture = sw.Elapsed;
// Lock both images:
var locked1 = cur.LockBits(new Rectangle(0, 0, cur.Width, cur.Height),
ImageLockMode.ReadWrite, PixelFormat.Format32bppArgb);
var locked2 = prev.LockBits(new Rectangle(0, 0, prev.Width, prev.Height),
ImageLockMode.ReadWrite, PixelFormat.Format32bppArgb);
try
{
// Xor screen:
ApplyXor(locked2, locked1);
TimeSpan timeToXor = sw.Elapsed;
// Compress screen:
int length = Compress();
TimeSpan timeToCompress = sw.Elapsed;
if ((++n) % 50 == 0)
{
Console.Write("Iteration: {0:0.00}s, {1:0.00}s, {2:0.00}s " +
"{3} Kb => {4:0.0} FPS \r",
timeToCapture.TotalSeconds, timeToXor.TotalSeconds,
timeToCompress.TotalSeconds, length / 1024,
1.0 / sw.Elapsed.TotalSeconds);
}
// Swap buffers:
var tmp = cur;
cur = prev;
prev = tmp;
}
finally
{
cur.UnlockBits(locked1);
prev.UnlockBits(locked2);
}
}
Note that I reduce Console output to ensure that's not the bottleneck. :-)
Simple improvements
It's a bit wasteful to compress all those 0's, right? It's pretty easy to track the min and max y position that has data using a simple boolean.
ulong tmp = curRow[x] ^ prevRow[x];
*(dst++) = tmp;
hasdata |= tmp != 0;
You also probably don't want to call Compress if you don't have to.
After adding this feature you'll get something like this on your screen:
Iteration: 0.00s, 0.01s, 0.01s 1 Kb => 152.0 FPS
Using another compression algorithm might also help. I stuck to LZ4 because it's simple to use, it's blazing fast and compresses pretty well -- still, there are other options that might work better. See http://fastcompression.blogspot.nl/ for a comparison.
If you have a bad connection or if you're streaming video over a remote connection, all this won't work. Best to reduce the pixel values here. That's quite simple: apply a simple 64-bit mask during the xor to both the previous and current picture... You can also try using indexed colors - anyhow, there's a ton of different things you can try here; I just kept it simple because that's probably good enough.
You can also use Parallel.For for the xor loop; personally I didn't really care about that.
A bit more challenging
If you have 1 server that is serving multiple clients, things will get a bit more challenging, as they will refresh at different rates. We want the fastest refreshing client to determine the server speed - not slowest. :-)
To implement this, the relation between the prev and cur has to change. If we simply 'xor' away like here, we'll end up with a completely garbled picture at the slower clients.
To solve that, we don't want to swap prev anymore, as it should hold key frames (that you'll refresh when the compressed data becomes too big) and cur will hold incremental data from the 'xor' results. This means you can basically grab an arbitrary 'xor'red frame and send it over the line - as long as the prev bitmap is recent.

H264 or Equaivalent Codec Streaming
There are various compressed streaming available which does almost everything that you can do to optimize screen sharing over network. There are many open source and commercial libraries to stream.
Screen transfer in Blocks
H264 already does this, but if you want to do it yourself, you have to divide your screens into smaller blocks of 100x100 pixels, and compare these blocks with previous version and send these blocks over network.
Window Render Information
Microsoft RDP does lot better, it does not send screen as a raster image, instead it analyzes screen and creates screen blocks based on the windows on the screen. It then analyzes contents of screen and sends image only if needed, if it is a text box with some text in it, RDP sends information to render text box with a text with font information and other information. So instead of sending image, it sends information on what to render.
You can combine all techniques and make a mixed protocol to send screen blocks with image and other rendering information.

Instead of handling data as an array of bytes, you can handle it as an array of integers.
int* p = (int*)((byte*)scan0.ToPointer() + y * stride);
int* p2 = (int*)((byte*)scan02.ToPointer() + y * stride2);
for (int x = 0; x < nWidth; x++)
{
//always get the complete pixel when differences are found
if (*p2 != 0)
*p = *p2
++p;
++p2;
}

Is there a way to resize an image using GPU?

Is there a way to resize an image using GPU (graphic card) that is consumable through a .NET application?
I am looking for an extremely performant way to resize images and have heard that the GPU could do it much quicker than CPU (GDI+ using C#). Are there known implementations or sample code using the GPU to resize images that I could consume in .NET?

Have you thought about using XNA to resize your images? Here you can find out how to use XNA to save image as a png/jpeg to a MemoryStream and later reuse it a Bitmap object:
EDIT: I will post an example here (taken from the link above) on how you can possibly use XNA.
public static Image Texture2Image(Texture2D texture)
{
Image img;
using (MemoryStream MS = new MemoryStream())
{
texture.SaveAsPng(MS, texture.Width, texture.Height);
//Go To the beginning of the stream.
MS.Seek(0, SeekOrigin.Begin);
//Create the image based on the stream.
img = Bitmap.FromStream(MS);
}
return img;
}
I also found out today that you can OpenCV to use GPU/multicore CPUs. You can for example choose to use a .NET wrapper such as Emgu and and use its Image class to manipulate with your picture and return a .NET Bitmap class:
public static Bitmap ResizeBitmap(Bitmap sourceBM, int width, int height)
{
// Initialize Emgu Image object
Image<Bgr, Byte> img = new Image<Bgr, Byte>(sourceBM);
// Resize using liniear interpolation
img.Resize(width, height, INTER.CV_INTER_LINEAR);
// Return .NET Bitmap object
return img.ToBitmap();
}

I wrote a quick spike to check performance using WPF, though I cannot for sure say that its using the GPU.
Still, see below. This scales an image to 33.5 (or whatever) times its original size.
public void Resize()
{
double scaleFactor = 33.5;
var originalFileStream = System.IO.File.OpenRead(#"D:\SkyDrive\Pictures\Random\Misc\DoIt.jpg");
var originalBitmapDecoder = JpegBitmapDecoder.Create(originalFileStream, BitmapCreateOptions.None, BitmapCacheOption.OnLoad);
BitmapFrame originalBitmapFrame = originalBitmapDecoder.Frames.First();
var originalPixelFormat = originalBitmapFrame.Format;
TransformedBitmap transformedBitmap =
new TransformedBitmap(originalBitmapFrame, new System.Windows.Media.ScaleTransform()
{
ScaleX = scaleFactor,
ScaleY = scaleFactor
});
int stride = ((transformedBitmap.PixelWidth * transformedBitmap.Format.BitsPerPixel) + 7) / 8;
int pixelCount = (stride * (transformedBitmap.PixelHeight - 1)) + stride;
byte[] buffer = new byte[pixelCount];
transformedBitmap.CopyPixels(buffer, stride, 0);
WriteableBitmap transformedWriteableBitmap = new WriteableBitmap(transformedBitmap.PixelWidth, transformedBitmap.PixelHeight, transformedBitmap.DpiX, transformedBitmap.DpiY, transformedBitmap.Format, transformedBitmap.Palette);
transformedWriteableBitmap.WritePixels(new Int32Rect(0, 0, transformedBitmap.PixelWidth, transformedBitmap.PixelHeight), buffer, stride, 0);
BitmapFrame transformedFrame = BitmapFrame.Create(transformedWriteableBitmap);
var jpegEncoder = new JpegBitmapEncoder();
jpegEncoder.Frames.Add(transformedFrame);
using (var outputFileStream = System.IO.File.OpenWrite(#"C:\DATA\Scrap\WPF.jpg"))
{
jpegEncoder.Save(outputFileStream);
}
}
The image I was testing was 495 x 360. It resized it to over 16k x 12k in a couple of seconds, including save out.
It resizes to 1.5x around 165 times a second in a single-core run. On an i7 and the GPU seemingly doing nothing, CPU at 20% I'd expect to get 5x more when multithreaded.
Performance profiling shows a hot path to wpfgfx_v0400.dll which is the native WPF graphics library and is close to DirectX (look-up 'milcore' in Google).
So it might be accelerated, I don't know.
Luke

Yes, it is possible to use GPU to resize your images. This can be done using DirectX Surfaces (for example using SlimDx in C#). You should create a surface and move your image to it, and then you can stretch this surface to another target surface of your desired size using only GPU, and finally get back the resized image from the target surface. In these scenario, pixel format of the surfaces can be different and the GPU automatically handles it. But here there are things that can affect the performance of this operation. Moving data between GPU and CPU is a time consuming process. You can apply some techniques to boost performance based on your situation, and avoiding extra data transfer between CPU and GPU memory.

Why GC is making fail the following code snippet

I'm a little bit confusing about how .NET manages images, I have the following code, to build a managed bitmap form an unmanaged HBitmap, perserving the alpha channel.
public static Bitmap GetBitmapFromHBitmap(IntPtr nativeHBitmap)
{
Bitmap bmp = Bitmap.FromHbitmap(nativeHBitmap);
if (Bitmap.GetPixelFormatSize(bmp.PixelFormat) < 32)
return bmp;
BitmapData bmpData;
if (IsAlphaBitmap(bmp, out bmpData))
{
// MY QUESTION IS RELATED TO THIS
// IF CALL SUPPRESS_FINALIZE THE OBJECT
// IT WILL WORK, OTHERWISE IT FAILS
GC.SuppressFinalize(bmp);
return new Bitmap(
bmpData.Width,
bmpData.Height,
bmpData.Stride,
PixelFormat.Format32bppArgb,
bmpData.Scan0);
}
return bmp;
}
private static bool IsAlphaBitmap(Bitmap bmp, out BitmapData bmpData)
{
Rectangle bmpBounds = new Rectangle(0, 0, bmp.Width, bmp.Height);
bmpData = bmp.LockBits(bmpBounds, ImageLockMode.ReadOnly, bmp.PixelFormat);
try
{
return IsAlphaBitmap(bmpData);
}
finally
{
bmp.UnlockBits(bmpData);
}
}
private static bool IsAlphaBitmap(BitmapData bmpData)
{
for (int y = 0; y <= bmpData.Height - 1; y++)
{
for (int x = 0; x <= bmpData.Width - 1; x++)
{
Color pixelColor = Color.FromArgb(
Marshal.ReadInt32(bmpData.Scan0, (bmpData.Stride * y) + (4 * x)));
if (pixelColor.A > 0 & pixelColor.A < 255)
{
return true;
}
}
}
return false;
}
Ok, I know that the line GC.SuppressFinalize(bmp); has no sense, but when I remove that line, sometimes (each 4 or 5 calls) I get the following exception:
Attempted to read or write protected memory. This is often an
indication that other memory is corrupt.
I suspect that the garbage is collecting the bmp object before the return bitmap is drawn, so it try to access to the bits that are disposed by the framework. If I never collect the bmp it works, but it causes a memory leak (the bmp reference is never collected).
Do you know how could I solve this issue?

Take a look at the remakrs for the Bitmap constructor you are using
The caller is responsible for allocating and freeing the block of memory specified by the scan0 parameter. However, the memory should not be released until the related Bitmap is released.
This means that you need to make sure that you keep hold of the underlying block of memory that bmpData is pointing to until after the Bitmap instance returned by GetBitmapFromHBitmap is released.
Your problem is caused because the garbage collector has detected that bmp is unreachable (isn't being used) and so collects / finalizes it which definitely releases the underlying block of memory, however even if you suppress the finalizer you have still called UnlockBits which means that bmpData is already invalid anyway - it might work at the moment but thats completely down to chance. In order to make the above code correct you need to find a mechanism of keeping bmpData (and by extension bmp) valid for as long as the returned Bitmap instance is around - i.e. possibly a significant change to your application.
Alternatively see Converting Bitmap PixelFormats in C# for a completely different way of doing (I think) what you want to achieve while avoiding all of these problems entirely.

Use native HBitmap in C# while preserving alpha channel/transparency

Let's say I get a HBITMAP object/handle from a native Windows function. I can convert it to a managed bitmap using Bitmap.FromHbitmap(nativeHBitmap), but if the native image has transparency information (alpha channel), it is lost by this conversion.
There are a few questions on Stack Overflow regarding this issue. Using information from the first answer of this question (How to draw ARGB bitmap using GDI+?), I wrote a piece of code that I've tried and it works.
It basically gets the native HBitmap width, height and the pointer to the location of the pixel data using GetObject and the BITMAP structure, and then calls the managed Bitmap constructor:
Bitmap managedBitmap = new Bitmap(bitmapStruct.bmWidth, bitmapStruct.bmHeight,
bitmapStruct.bmWidth * 4, PixelFormat.Format32bppArgb, bitmapStruct.bmBits);
As I understand (please correct me if I'm wrong), this does not copy the actual pixel data from the native HBitmap to the managed bitmap, it simply points the managed bitmap to the pixel data from the native HBitmap.
And I don't draw the bitmap here on another Graphics (DC) or on another bitmap, to avoid unnecessary memory copying, especially for large bitmaps.
I can simply assign this bitmap to a PictureBox control or the the Form BackgroundImage property. And it works, the bitmap is displayed correctly, using transparency.
When I no longer use the bitmap, I make sure the BackgroundImage property is no longer pointing to the bitmap, and I dispose both the managed bitmap and the native HBitmap.
The Question: Can you tell me if this reasoning and code seems correct. I hope I will not get some unexpected behaviors or errors. And I hope I'm freeing all the memory and objects correctly.
private void Example()
{
IntPtr nativeHBitmap = IntPtr.Zero;
/* Get the native HBitmap object from a Windows function here */
// Create the BITMAP structure and get info from our nativeHBitmap
NativeMethods.BITMAP bitmapStruct = new NativeMethods.BITMAP();
NativeMethods.GetObjectBitmap(nativeHBitmap, Marshal.SizeOf(bitmapStruct), ref bitmapStruct);
// Create the managed bitmap using the pointer to the pixel data of the native HBitmap
Bitmap managedBitmap = new Bitmap(
bitmapStruct.bmWidth, bitmapStruct.bmHeight, bitmapStruct.bmWidth * 4, PixelFormat.Format32bppArgb, bitmapStruct.bmBits);
// Show the bitmap
this.BackgroundImage = managedBitmap;
/* Run the program, use the image */
MessageBox.Show("running...");
// When the image is no longer needed, dispose both the managed Bitmap object and the native HBitmap
this.BackgroundImage = null;
managedBitmap.Dispose();
NativeMethods.DeleteObject(nativeHBitmap);
}
internal static class NativeMethods
{
[StructLayout(LayoutKind.Sequential)]
public struct BITMAP
{
public int bmType;
public int bmWidth;
public int bmHeight;
public int bmWidthBytes;
public ushort bmPlanes;
public ushort bmBitsPixel;
public IntPtr bmBits;
}
[DllImport("gdi32", CharSet = CharSet.Auto, EntryPoint = "GetObject")]
public static extern int GetObjectBitmap(IntPtr hObject, int nCount, ref BITMAP lpObject);
[DllImport("gdi32.dll")]
internal static extern bool DeleteObject(IntPtr hObject);
}

The following code worked for me even if the HBITMAP is an icon or bmp, it doesn't flip the image when it's an icon, and also works with bitmaps that don't contain Alpha channel:
private static Bitmap GetBitmapFromHBitmap(IntPtr nativeHBitmap)
{
Bitmap bmp = Bitmap.FromHbitmap(nativeHBitmap);
if (Bitmap.GetPixelFormatSize(bmp.PixelFormat) < 32)
return bmp;
BitmapData bmpData;
if (IsAlphaBitmap(bmp, out bmpData))
return GetlAlphaBitmapFromBitmapData(bmpData);
return bmp;
}
private static Bitmap GetlAlphaBitmapFromBitmapData(BitmapData bmpData)
{
return new Bitmap(
bmpData.Width,
bmpData.Height,
bmpData.Stride,
PixelFormat.Format32bppArgb,
bmpData.Scan0);
}
private static bool IsAlphaBitmap(Bitmap bmp, out BitmapData bmpData)
{
Rectangle bmBounds = new Rectangle(0, 0, bmp.Width, bmp.Height);
bmpData = bmp.LockBits(bmBounds, ImageLockMode.ReadOnly, bmp.PixelFormat);
try
{
for (int y = 0; y <= bmpData.Height - 1; y++)
{
for (int x = 0; x <= bmpData.Width - 1; x++)
{
Color pixelColor = Color.FromArgb(
Marshal.ReadInt32(bmpData.Scan0, (bmpData.Stride * y) + (4 * x)));
if (pixelColor.A > 0 & pixelColor.A < 255)
{
return true;
}
}
}
}
finally
{
bmp.UnlockBits(bmpData);
}
return false;
}

Right, no copy is made. Which is why the Remarks section of the MSDN Library says:
The caller is responsible for
allocating and freeing the block of
memory specified by the scan0
parameter, however, the memory should
not be released until the related
Bitmap is released.
This wouldn't be a problem if the pixel data was copied. Incidentally, this is normally a difficult problem to deal with. You can't tell when the client code called Dispose(), there's no way to intercept that call. Which makes it impossible to make such a bitmap behave like a replacement for Bitmap. The client code has to be aware that additional work is needed.

After reading the good points made by Hans Passant in his answer, I changed the method to immediately copy the pixel data into the managed bitmap, and free the native bitmap.
I'm creating two managed bitmap objects (but only one allocates memory for the actual pixel data), and use graphics.DrawImage to copy the image. Is there a better way to accomplish this? Or is this good/fast enough?
public static Bitmap CopyHBitmapToBitmap(IntPtr nativeHBitmap)
{
// Get width, height and the address of the pixel data for the native HBitmap
NativeMethods.BITMAP bitmapStruct = new NativeMethods.BITMAP();
NativeMethods.GetObjectBitmap(nativeHBitmap, Marshal.SizeOf(bitmapStruct), ref bitmapStruct);
// Create a managed bitmap that has its pixel data pointing to the pixel data of the native HBitmap
// No memory is allocated for its pixel data
Bitmap managedBitmapPointer = new Bitmap(
bitmapStruct.bmWidth, bitmapStruct.bmHeight, bitmapStruct.bmWidth * 4, PixelFormat.Format32bppArgb, bitmapStruct.bmBits);
// Create a managed bitmap and allocate memory for pixel data
Bitmap managedBitmapReal = new Bitmap(bitmapStruct.bmWidth, bitmapStruct.bmHeight, PixelFormat.Format32bppArgb);
// Copy the pixels of the native HBitmap into the canvas of the managed bitmap
Graphics graphics = Graphics.FromImage(managedBitmapReal);
graphics.DrawImage(managedBitmapPointer, 0, 0);
// Delete the native HBitmap object and free memory
NativeMethods.DeleteObject(nativeHBitmap);
// Return the managed bitmap, clone of the native HBitmap, with correct transparency
return managedBitmapReal;
}

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

GDI+ DrawImage notably slower in C++ (Win32) than in C# (WinForms) - c#

Related

Implementing streaming video for Windows 10 UAP

c# screen transfer over socket efficient improve ways

Is there a way to resize an image using GPU?

Why GC is making fail the following code snippet

Use native HBitmap in C# while preserving alpha channel/transparency

Categories

Resources