I am writing a video player to play back frames captured by our ASIC. They are in a custom format, and I have been provided with a function that decodes the ASIC states. The video can be any size from 640x480 to 2560x1200 (!!!). The output of each state cycle is a 16x16 block of pixels, which I have to get into a video on the screen.
Every time the screen needs to be updated, I have the following information:
Block width
Block height
X coordinate of block start
Y coordinate of block start
A 1-D array of RGB32 pixel color info
Major limitations:
.NET 3.5
No unsafe code
I spent this morning trying out a WriteableBitmap, and using it as a source for an Image, something like this:
private WriteableBitmap ImageSource;

public MainWindow()
{
    InitializeComponent();
    ImageSource = new WriteableBitmap(FrameWidth, FrameHeight, 96, 96, PixelFormats.Bgr32, null);
    ImagePanel.Source = ImageSource;
}

private void DrawBox(byte Red, byte Green, byte Blue, int X, int Y)
{
    int BoxWidth = 16;
    int BoxHeight = 16;
    int BytesPerPixel = ImageSource.Format.BitsPerPixel / 8;

    byte[] Pixels = new byte[BoxWidth * BoxHeight * BytesPerPixel];
    for (int i = 0; i < Pixels.Length; i += 4)
    {
        Pixels[i + 0] = Blue;
        Pixels[i + 1] = Green;
        Pixels[i + 2] = Red;
        Pixels[i + 3] = 0;
    }

    int Stride = BoxWidth * BytesPerPixel;
    Int32Rect DrawBox = new Int32Rect(X, Y, BoxWidth, BoxHeight);

    ImageSource.Lock();
    ImageSource.WritePixels(DrawBox, Pixels, Stride, 0);
    ImageSource.Unlock();
}
It works, my video plays on the screen, but it is sloooooooow. Nowhere near real-time play speed. Is there a better way, besides procedurally generating a series of bitmaps, to do this that I'm not seeing? I've read something about D3DImage, but it seems that's more for exporting a 3D scene to a bitmap. Suggestions here?
If I understand correctly you receive a 16*16 pixel block as a capture, and in the example above you are then updating the writeable bitmap on each of these updates.
This looks like a lot of churn as a render is being triggered on each block.
Would it not be better to:
Maintain a single byte buffer that represents an entire frame.
Update the buffer with your 16*16 blocks as you receive them.
When you have received all blocks for a frame, write the buffer to the bitmap.
Re-use the frame buffer for the next frame by over-writing it.
In this way you will have far less churn on rendering as you will not trigger a render for each block you receive, but only for each frame.
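A minimal sketch of that approach, assuming a Bgr32 frame buffer and that the decoder hands you each block as a byte array in Bgr32 order (InitBuffers, OnBlockReceived and PresentFrame are hypothetical names):

private byte[] frameBuffer;
private int frameStride; // bytes per row of the full frame

private void InitBuffers()
{
    frameStride = FrameWidth * 4; // Bgr32 = 4 bytes per pixel
    frameBuffer = new byte[frameStride * FrameHeight];
}

// Called for each 16x16 block the decoder produces.
private void OnBlockReceived(byte[] blockPixels, int blockWidth, int blockHeight, int x, int y)
{
    for (int row = 0; row < blockHeight; row++)
    {
        for (int col = 0; col < blockWidth; col++)
        {
            int src = (row * blockWidth + col) * 4;
            int dst = (y + row) * frameStride + (x + col) * 4;
            frameBuffer[dst + 0] = blockPixels[src + 0];
            frameBuffer[dst + 1] = blockPixels[src + 1];
            frameBuffer[dst + 2] = blockPixels[src + 2];
            frameBuffer[dst + 3] = blockPixels[src + 3];
        }
    }
}

// Called once per completed frame: a single WritePixels call per frame.
private void PresentFrame()
{
    ImageSource.WritePixels(new Int32Rect(0, 0, FrameWidth, FrameHeight),
        frameBuffer, frameStride, 0);
}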
Update 1
I would also consider using Bgr24 as your pixel format if you are not using an alpha channel. That way you will have 3 bytes per pixel, instead of 4, so quite a lot less overhead.
Update 2
If things are still too slow for larger frame sizes, you could also consider the following, though it adds complexity.
Consider maintaining and flipping two frame buffers. In this way you could composite frames on a background thread and then blit the finished frame on the UI thread. Whilst the UI thread is rendering the frame, you can continue compositing on the alternative buffer.
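A rough sketch of the flipping, assuming WPF and that composition runs on a worker thread (names are illustrative):

private byte[] frontBuffer; // currently being rendered by the UI thread
private byte[] backBuffer;  // currently being composited on the worker thread

// Called on the worker thread when a frame has been fully composited.
private void OnFrameComplete()
{
    // Flip: the finished frame goes to the UI; the old front buffer
    // is reused for composing the next frame.
    byte[] finished = backBuffer;
    backBuffer = frontBuffer;
    frontBuffer = finished;

    // Blit the finished frame on the UI thread.
    Dispatcher.BeginInvoke(DispatcherPriority.Render, (Action)(() =>
        ImageSource.WritePixels(new Int32Rect(0, 0, FrameWidth, FrameHeight),
            finished, FrameWidth * 4, 0)));
}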
Update 3
If your pixel colour info array from your capture is in the correct format, you could try using Buffer.BlockCopy to bulk copy the source pixel array into the frame buffer array. You would have to keep your bitmap format as Bgr32 though. This low-level array copying technique could yield some performance gains.
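For example, the per-row copy into the frame buffer could look like this (Buffer.BlockCopy counts in bytes, so it also works from an int[] source into a byte[] destination):

int blockStride = blockWidth * 4; // bytes per row of the 16x16 block

for (int row = 0; row < blockHeight; row++)
{
    // One bulk copy per row instead of a per-pixel loop.
    Buffer.BlockCopy(blockPixels, row * blockStride,
        frameBuffer, (y + row) * frameStride + x * 4,
        blockStride);
}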
Related
I'm using SharpDx to capture the screen (1 to 60fps). Some frames are all transparent and end up getting processed and saved by the code.
Is there any simple/fast way to detect these frame drops without having to open the generated bitmap and look for the alpha values?
Here's what I'm using (saves capture as image):
try
{
    //Try to get duplicated frame within given time.
    _duplicatedOutput.AcquireNextFrame(MinimumDelay, out var duplicateFrameInformation, out var screenResource);

    //Copy resource into memory that can be accessed by the CPU.
    using (var screenTexture2D = screenResource.QueryInterface<Texture2D>())
        _device.ImmediateContext.CopySubresourceRegion(screenTexture2D, 0, new ResourceRegion(Left, Top, 0, Left + Width, Top + Height, 1), _screenTexture, 0);

    //Get the desktop capture texture.
    var mapSource = _device.ImmediateContext.MapSubresource(_screenTexture, 0, MapMode.Read, MapFlags.None); //, out var stream);

    #region Get image data

    var bitmap = new System.Drawing.Bitmap(Width, Height, PixelFormat.Format32bppArgb);
    var boundsRect = new System.Drawing.Rectangle(0, 0, Width, Height);

    //Copy pixels from screen capture Texture to GDI bitmap.
    var mapDest = bitmap.LockBits(boundsRect, ImageLockMode.WriteOnly, bitmap.PixelFormat);
    var sourcePtr = mapSource.DataPointer;
    var destPtr = mapDest.Scan0;

    for (var y = 0; y < Height; y++)
    {
        //Copy a single line
        Utilities.CopyMemory(destPtr, sourcePtr, Width * 4);

        //Advance pointers
        sourcePtr = IntPtr.Add(sourcePtr, mapSource.RowPitch);
        destPtr = IntPtr.Add(destPtr, mapDest.Stride);
    }

    //Release source and dest locks
    bitmap.UnlockBits(mapDest);

    //Bitmap is saved in here!!!

    #endregion

    _device.ImmediateContext.UnmapSubresource(_screenTexture, 0);
    screenResource.Dispose();
    _duplicatedOutput.ReleaseFrame();
}
catch (SharpDXException e)
{
    if (e.ResultCode.Code != SharpDX.DXGI.ResultCode.WaitTimeout.Result.Code)
        throw;
}
It's a modified version from this one.
I also have this version (saves capture as pixel array):
//Get the desktop capture texture.
var data = _device.ImmediateContext.MapSubresource(_screenTexture, 0, MapMode.Read, MapFlags.None, out var stream);

var bytes = new byte[stream.Length];

//BGRA32 is 4 bytes.
for (var height = 0; height < Height; height++)
{
    stream.Position = height * data.RowPitch;
    Marshal.Copy(new IntPtr(stream.DataPointer.ToInt64() + height * data.RowPitch), bytes, height * Width * 4, Width * 4);
}
I'm not sure if it's the best way of saving the screen capture as image and/or pixel array, but it's somewhat working.
Anyway, the problem is that some frames captured are fully transparent and they are useless to me. I need to somehow avoid saving them at all.
When capturing as pixel array, I can simply check the bytes array, to know if the 4th item is 255 or 0. When saving as image, I could use the bitmap.GetPixel(0,0).A to know if the image has content or not.
But either way I need to finish the capture and get the full image content before I can tell whether the frame was dropped.
Is there any way to know if the frame was correctly captured?
Your problem boils down to you trying to do this on a timer. There is no way to guarantee a minimum execution time for every single tick/frame. And if the user picks too high a value, you get effects like this. At worst you might have ticks queue up in the event queue, until you run into an Exception because the queue is overfilled.
What you need to do is limit the rate to a maximum. If it does not work as fast as the user wants, that is the reality. I wrote some simple rate limiting code just for such a case:
int interval = 20;
DateTime dueTime = DateTime.Now.AddMilliseconds(interval);

while (true)
{
    if (DateTime.Now >= dueTime)
    {
        //insert code here

        //Update next dueTime
        dueTime = DateTime.Now.AddMilliseconds(interval);
    }
    else
    {
        //Just yield to not tax out the CPU
        Thread.Sleep(1);
    }
}
Just two notes:
This was designed to run in a separate thread. You have to run it as such, or adapt the Thread.Sleep() to your multitasking option of choice.
DateTime.Now is not suited for such small timeframes (2 digit ms). Usually the returned value will only update every 18 ms or so. It actually varies over time. 60 FPS puts you around 16ms. You should be using Stopwatch or something similar.
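The same loop using Stopwatch would look something like this:

int interval = 20; // milliseconds per tick/frame
var timer = System.Diagnostics.Stopwatch.StartNew();
long dueTime = interval;

while (true)
{
    if (timer.ElapsedMilliseconds >= dueTime)
    {
        //insert code here

        //Schedule relative to the previous due time so drift doesn't accumulate.
        dueTime += interval;
    }
    else
    {
        //Just yield to not tax out the CPU
        Thread.Sleep(1);
    }
}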
In order to ignore frame drops, I'm using this code (so far it's working as expected):
//Try to get the duplicated frame within given time.
_duplicatedOutput.AcquireNextFrame(1000, out var duplicateFrameInformation, out var screenResource);

//Somehow, it was not possible to retrieve the resource.
if (screenResource == null || duplicateFrameInformation.AccumulatedFrames == 0)
{
    //Mark the frame as dropped.
    frame.WasDropped = true;
    FrameList.Add(frame);

    screenResource?.Dispose();
    _duplicatedOutput.ReleaseFrame();

    return;
}
I'm simply checking if the screenResource is not null and if there are frames accumulated.
I have an image loaded into a Bitmap in C# with a gradient background, from a document I scanned in.
An example of it could be like the picture below:
My goal in C# is now to remove the background so that I have a solid white background. I can't seem to find a way to do this myself. Is there a way to achieve this?
Thanks in advance.
Here is a version using LockBits.
The premise is if it's not black then change it to white.
It will be magnitudes faster than GetPixel and SetPixel:
It works with the raw data in memory using pointers
It iterates through every pixel
It checks the color and changes it to white if needed
It saves the image
Note: obviously this will destroy any antialiasing and smoothing, it will fail for certain image types, and it has other assorted issues.
using (var bmp = new Bitmap(@"D:\Test.png"))
{
    var data = bmp.LockBits(new Rectangle(0, 0, bmp.Width, bmp.Height), ImageLockMode.ReadWrite, PixelFormat.Format32bppPArgb);
    var white = Color.White.ToArgb();
    var black = Color.Black.ToArgb();

    try
    {
        var length = (int*)data.Scan0 + bmp.Height * bmp.Width;
        for (var p = (int*)data.Scan0; p < length; p++)
            if (*p != black) *p = white;
    }
    finally
    {
        // unlock the bitmap
        bmp.UnlockBits(data);
        bmp.Save(@"D:\Output.Bmp", ImageFormat.Bmp);
    }
}
Output
If you know the gradient colors (e.g. only part of RGB color responsible for red changes) or at least the color of text (e.g. if it is always black) then you can iterate through all of image's pixels and then:
Use GetPixel() to get pixel color.
Check if it is text (black).
If it is, then move to the next pixel.
If it isn't, then change color to white with SetPixel().
For gradient it should be enough. For more complex backgrounds it would need a more complex algorithm.
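A direct (if slow) sketch of those steps; the near-black threshold below is an assumption you would tune:

using (var bmp = new Bitmap(@"D:\Test.png"))
{
    for (int y = 0; y < bmp.Height; y++)
    {
        for (int x = 0; x < bmp.Width; x++)
        {
            // Keep (near-)black text pixels; blank out everything else.
            Color c = bmp.GetPixel(x, y);
            bool isText = c.R < 64 && c.G < 64 && c.B < 64;
            if (!isText)
                bmp.SetPixel(x, y, Color.White);
        }
    }
    bmp.Save(@"D:\Output.png", ImageFormat.Png);
}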
That's how I wrote your beautiful code (with some simple changes for easier understanding):
private void Form1_Load(object sender, EventArgs e)
{
    prev = GetDesktopImage(); //get a screenshot of the desktop;
    cur = GetDesktopImage();  //get a screenshot of the desktop;

    var locked1 = cur.LockBits(new Rectangle(0, 0, cur.Width, cur.Height),
                               ImageLockMode.ReadWrite, PixelFormat.Format32bppArgb);
    var locked2 = prev.LockBits(new Rectangle(0, 0, prev.Width, prev.Height),
                                ImageLockMode.ReadWrite, PixelFormat.Format32bppArgb);

    ApplyXor(locked1, locked2);

    compressionBuffer = new byte[1920 * 1080 * 4];

    // Compressed buffer -- where the data goes that we'll send.
    int backbufSize = LZ4.LZ4Codec.MaximumOutputLength(this.compressionBuffer.Length) + 4;
    backbuf = new CompressedCaptureScreen(backbufSize);

    MessageBox.Show(compressionBuffer.Length.ToString());

    int length = Compress();

    MessageBox.Show(backbuf.Data.Length.ToString()); //prints the new buffer size
}
The compression buffer length is, for example, 8294400,
and backbuf.Data.Length is 8326947.
I didn't like the compression suggestions, so here's what I would do.
You don't want to compress a video stream (so MPEG, AVI, etc are out of the question -- these don't have to be real-time) and you don't want to compress individual pictures (since that's just stupid).
Basically what you want to do is detect if things change and send the differences. You're on the right track with that; most video compressors do that. You also want a fast compression/decompression algorithm; especially if you go to more FPS that will become more relevant.
Differences. First off, eliminate all branches in your code, and make sure memory access is sequential (e.g. iterate x in the inner loop). The latter will give you cache locality. As for the differences, I'd probably use a 64-bit XOR; it's easy, branchless and fast.
If you want performance, it's probably better to do this in C++: The current C# implementation doesn't vectorize your code, and that will help you a great deal here.
Do something like this (I'm assuming 32bit pixel format):
for (int y = 0; y < height; ++y) // change to PFor if you like
{
    ulong* row1 = (ulong*)(image1BasePtr + image1Stride * y);
    ulong* row2 = (ulong*)(image2BasePtr + image2Stride * y);
    for (int x = 0; x < width / 2; ++x) // one ulong covers two 32-bit pixels
        row2[x] ^= row1[x];
}
Fast compression and decompression usually means simpler compression algorithms. https://code.google.com/p/lz4/ is such an algorithm, and there's a proper .NET port available for that as well. You might want to read on how it works too; there is a streaming feature in LZ4 and if you can make it handle 2 images instead of 1 that will probably give you a nice compression boost.
All in all, if you're trying to compress white noise, it simply won't work and your frame rate will drop. One way to solve this is to reduce the colors if you have too much 'randomness' in a frame. A measure for randomness is entropy, and there are several ways to get a measure of the entropy of a picture ( https://en.wikipedia.org/wiki/Entropy_(information_theory) ). I'd stick with a very simple one: check the size of the compressed picture -- if it's above a certain limit, reduce the number of bits; if below, increase the number of bits.
Note that increasing and decreasing bits is not done with shifting in this case; you don't need your bits to be removed, you simply need your compression to work better. It's probably just as good to use a simple 'AND' with a bitmask. For example, if you want to drop 2 bits, you can do it like this:
for (int y = 0; y < height; ++y) // change to PFor if you like
{
    ulong* row1 = (ulong*)(image1BasePtr + image1Stride * y);
    ulong* row2 = (ulong*)(image2BasePtr + image2Stride * y);
    ulong mask = 0xFFFCFCFCFFFCFCFC;
    for (int x = 0; x < width / 2; ++x) // one ulong covers two 32-bit pixels
        row2[x] = (row2[x] ^ row1[x]) & mask;
}
PS: I'm not sure what I would do with the alpha component, I'll leave that up to your experimentation.
Good luck!
The long answer
I had some time to spare, so I just tested this approach. Here's some code to support it all.
This code normally runs at over 130 FPS with a nice constant memory pressure on my laptop, so the bottleneck shouldn't be here anymore. Note that you need LZ4 to get this working and that LZ4 is aimed at high speed, not high compression ratios. A bit more on that later.
First we need something that we can use to hold all the data we're going to send. I'm not implementing the sockets stuff itself here (although that should be pretty simple using this as a start), I mainly focused on getting the data you need to send something over.
// The thing you send over a socket
public class CompressedCaptureScreen
{
    public CompressedCaptureScreen(int size)
    {
        this.Data = new byte[size];
        this.Size = 4;
    }

    public int Size;
    public byte[] Data;
}
We also need a class that will hold all the magic:
public class CompressScreenCapture
{
Next, if I'm running high performance code, I make it a habit to preallocate all the buffers first. That'll save you time during the actual algorithmic stuff. 4 buffers of 1080p is about 33 MB, which is fine - so let's allocate that.
public CompressScreenCapture()
{
    // Initialize with black screen; get bounds from screen.
    this.screenBounds = Screen.PrimaryScreen.Bounds;

    // Initialize 2 buffers - 1 for the current and 1 for the previous image
    prev = new Bitmap(screenBounds.Width, screenBounds.Height, PixelFormat.Format32bppArgb);
    cur = new Bitmap(screenBounds.Width, screenBounds.Height, PixelFormat.Format32bppArgb);

    // Clear the 'prev' buffer - this is the initial state
    using (Graphics g = Graphics.FromImage(prev))
    {
        g.Clear(Color.Black);
    }

    // Compression buffer -- we don't really need this but I'm lazy today.
    compressionBuffer = new byte[screenBounds.Width * screenBounds.Height * 4];

    // Compressed buffer -- where the data goes that we'll send.
    int backbufSize = LZ4.LZ4Codec.MaximumOutputLength(this.compressionBuffer.Length) + 4;
    backbuf = new CompressedCaptureScreen(backbufSize);
}
private Rectangle screenBounds;
private Bitmap prev;
private Bitmap cur;
private byte[] compressionBuffer;
private int backbufSize;
private CompressedCaptureScreen backbuf;
private int n = 0;
First thing to do is capture the screen. This is the easy part: simply fill the bitmap of the current screen:
private void Capture()
{
    // Fill 'cur' with a screenshot
    using (var gfxScreenshot = Graphics.FromImage(cur))
    {
        gfxScreenshot.CopyFromScreen(screenBounds.X, screenBounds.Y, 0, 0, screenBounds.Size, CopyPixelOperation.SourceCopy);
    }
}
As I said, I don't want to compress 'raw' pixels. Instead, I'd much rather compress XOR masks of previous and the current image. Most of the times this will give you a whole lot of 0's, which is easy to compress:
private unsafe void ApplyXor(BitmapData previous, BitmapData current)
{
    byte* prev0 = (byte*)previous.Scan0.ToPointer();
    byte* cur0 = (byte*)current.Scan0.ToPointer();

    int height = previous.Height;
    int width = previous.Width;
    int halfwidth = width / 2;

    fixed (byte* target = this.compressionBuffer)
    {
        ulong* dst = (ulong*)target;

        for (int y = 0; y < height; ++y)
        {
            ulong* prevRow = (ulong*)(prev0 + previous.Stride * y);
            ulong* curRow = (ulong*)(cur0 + current.Stride * y);

            for (int x = 0; x < halfwidth; ++x)
            {
                *(dst++) = curRow[x] ^ prevRow[x];
            }
        }
    }
}
For the compression algorithm I simply pass the buffer to LZ4 and let it do its magic.
private int Compress()
{
    // Grab the backbuf in an attempt to update it with new data
    var backbuf = this.backbuf;

    backbuf.Size = LZ4.LZ4Codec.Encode(
        this.compressionBuffer, 0, this.compressionBuffer.Length,
        backbuf.Data, 4, backbuf.Data.Length - 4);

    Buffer.BlockCopy(BitConverter.GetBytes(backbuf.Size), 0, backbuf.Data, 0, 4);

    return backbuf.Size;
}
One thing to note here is that I make it a habit to put everything in my buffer that I need to send over the TCP/IP socket. I don't want to move data around if I can easily avoid it, so I'm simply putting everything that I need on the other side there.
As for the sockets themselves, you can use async TCP sockets here (I would), but if you do, you will need to add an extra buffer.
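For completeness, a synchronous sketch of the send; 'socket' is assumed to be an already-connected System.Net.Sockets.Socket:

private void Send(Socket socket)
{
    // backbuf.Data starts with the 4-byte length prefix written by Compress(),
    // followed by backbuf.Size bytes of compressed data.
    socket.Send(backbuf.Data, 0, backbuf.Size + 4, SocketFlags.None);
}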
The only thing that remains is to glue everything together and put some statistics on the screen:
public void Iterate()
{
    Stopwatch sw = Stopwatch.StartNew();

    // Capture a screen:
    Capture();

    TimeSpan timeToCapture = sw.Elapsed;

    // Lock both images:
    var locked1 = cur.LockBits(new Rectangle(0, 0, cur.Width, cur.Height),
                               ImageLockMode.ReadWrite, PixelFormat.Format32bppArgb);
    var locked2 = prev.LockBits(new Rectangle(0, 0, prev.Width, prev.Height),
                                ImageLockMode.ReadWrite, PixelFormat.Format32bppArgb);

    try
    {
        // Xor screen:
        ApplyXor(locked2, locked1);

        TimeSpan timeToXor = sw.Elapsed;

        // Compress screen:
        int length = Compress();

        TimeSpan timeToCompress = sw.Elapsed;

        if ((++n) % 50 == 0)
        {
            Console.Write("Iteration: {0:0.00}s, {1:0.00}s, {2:0.00}s " +
                          "{3} Kb => {4:0.0} FPS \r",
                          timeToCapture.TotalSeconds, timeToXor.TotalSeconds,
                          timeToCompress.TotalSeconds, length / 1024,
                          1.0 / sw.Elapsed.TotalSeconds);
        }

        // Swap buffers:
        var tmp = cur;
        cur = prev;
        prev = tmp;
    }
    finally
    {
        cur.UnlockBits(locked1);
        prev.UnlockBits(locked2);
    }
}
Note that I reduce Console output to ensure that's not the bottleneck. :-)
Simple improvements
It's a bit wasteful to compress all those 0's, right? It's pretty easy to track whether a row has any data at all (and from that, the min and max y positions that changed) with a simple flag.
ulong tmp = curRow[x] ^ prevRow[x];
*(dst++) = tmp;
hasdata |= tmp != 0;
You also probably don't want to call Compress if you don't have to.
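With that flag surfaced from ApplyXor (a sketch; the original returns void), the per-frame path becomes:

bool hasData = ApplyXor(locked2, locked1); // now returns the flag

int length = 0;
if (hasData)
{
    // Only compress (and send) frames that actually changed.
    length = Compress();
}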
After adding this feature you'll get something like this on your screen:
Iteration: 0.00s, 0.01s, 0.01s 1 Kb => 152.0 FPS
Using another compression algorithm might also help. I stuck to LZ4 because it's simple to use, it's blazing fast and compresses pretty well -- still, there are other options that might work better. See http://fastcompression.blogspot.nl/ for a comparison.
If you have a bad connection or if you're streaming video over a remote connection, all this won't work. Best to reduce the pixel values here. That's quite simple: apply a simple 64-bit mask during the xor to both the previous and current picture... You can also try using indexed colors - anyhow, there's a ton of different things you can try here; I just kept it simple because that's probably good enough.
You can also use Parallel.For for the xor loop; personally I didn't really care about that.
A bit more challenging
If you have 1 server that is serving multiple clients, things will get a bit more challenging, as they will refresh at different rates. We want the fastest-refreshing client to determine the server speed - not the slowest. :-)
To implement this, the relation between the prev and cur has to change. If we simply 'xor' away like here, we'll end up with a completely garbled picture at the slower clients.
To solve that, we don't want to swap prev anymore, as it should hold key frames (that you'll refresh when the compressed data becomes too big) and cur will hold incremental data from the 'xor' results. This means you can basically grab an arbitrary 'xor'red frame and send it over the line - as long as the prev bitmap is recent.
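Structurally that might look like the following sketch; keyFrameThreshold, SendKeyFrame and SendDelta are hypothetical names:

Capture();                        // fills 'cur'
ApplyXor(prevLocked, curLocked);  // delta of 'cur' against the key frame in 'prev'
int length = Compress();

if (length > keyFrameThreshold)
{
    // Delta got too big: promote 'cur' to the new key frame and send it
    // in full so all clients can resynchronize.
    var tmp = cur; cur = prev; prev = tmp;
    SendKeyFrame();
}
else
{
    SendDelta();
}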
H264 or Equivalent Codec Streaming
There are various compressed streaming formats available which do almost everything you could do yourself to optimize screen sharing over a network. There are many open source and commercial libraries for streaming.
Screen transfer in Blocks
H264 already does this, but if you want to do it yourself, you have to divide your screen into smaller blocks of, say, 100x100 pixels, compare these blocks with the previous version, and send only the changed blocks over the network.
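A rough outline of that do-it-yourself block comparison (the block size and the dirty test are up to you; 32-bit pixels assumed):

const int BlockSize = 100;

// Compare each block of the current frame against the previous frame
// and collect the rectangles that changed.
static List<Rectangle> FindDirtyBlocks(byte[] prev, byte[] cur, int width, int height, int stride)
{
    var dirty = new List<Rectangle>();
    for (int by = 0; by < height; by += BlockSize)
    {
        for (int bx = 0; bx < width; bx += BlockSize)
        {
            int w = Math.Min(BlockSize, width - bx);
            int h = Math.Min(BlockSize, height - by);
            if (BlockChanged(prev, cur, bx, by, w, h, stride))
                dirty.Add(new Rectangle(bx, by, w, h));
        }
    }
    return dirty;
}

static bool BlockChanged(byte[] prev, byte[] cur, int x, int y, int w, int h, int stride)
{
    for (int row = 0; row < h; row++)
    {
        int offset = (y + row) * stride + x * 4;
        for (int i = 0; i < w * 4; i++)
        {
            if (prev[offset + i] != cur[offset + i])
                return true;
        }
    }
    return false;
}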
Window Render Information
Microsoft RDP does a lot better: it does not send the screen as a raster image. Instead, it analyzes the screen and creates blocks based on the windows on it. It then analyzes the contents of the screen and sends an image only if needed; if a region is a text box with some text in it, RDP sends the information to render that text box, with the text, font information and so on. So instead of sending an image, it sends information on what to render.
You can combine all these techniques into a mixed protocol that sends screen blocks as images along with other rendering information.
Instead of handling data as an array of bytes, you can handle it as an array of integers.
int* p = (int*)((byte*)scan0.ToPointer() + y * stride);
int* p2 = (int*)((byte*)scan02.ToPointer() + y * stride2);

for (int x = 0; x < nWidth; x++)
{
    //always get the complete pixel when differences are found
    if (*p2 != 0)
        *p = *p2;
    ++p;
    ++p2;
}
I am modifying the ColorBasic Kinect example in order to display an image overlaid to the video stream. So what I've done is to load an image with transparent background (now a GIF but it may change), and write to the displayed bitmap.
The error I'm getting is that the buffer I'm writing to is too small.
I cannot see what the actual error is (I'm a complete newbie in XAML/C#/Kinect), but the WriteableBitmap is 1920x1080, and the bitmap I want to copy is 200x200, so why am I getting this error? I cannot see how a transparent background could be of any harm, but I am beginning to suspect that...
Note that without the last WritePixels, the code works and I see the webcam's output. My code follows.
The overlay image:
public BitmapImage overlay = new BitmapImage(new Uri("C:\\users\\user\\desktop\\something.gif"));
The callback function that displays the Kinect's webcam (see the default example ColorBasic) with my very small modifications:
private void Reader_ColorFrameArrived(object sender, ColorFrameArrivedEventArgs e)
{
    // ColorFrame is IDisposable
    using (ColorFrame colorFrame = e.FrameReference.AcquireFrame())
    {
        if (colorFrame != null)
        {
            FrameDescription colorFrameDescription = colorFrame.FrameDescription;

            using (KinectBuffer colorBuffer = colorFrame.LockRawImageBuffer())
            {
                this.colorBitmap.Lock();

                // verify data and write the new color frame data to the display bitmap
                if ((colorFrameDescription.Width == this.colorBitmap.PixelWidth) && (colorFrameDescription.Height == this.colorBitmap.PixelHeight))
                {
                    colorFrame.CopyConvertedFrameDataToIntPtr(
                        this.colorBitmap.BackBuffer,
                        (uint)(colorFrameDescription.Width * colorFrameDescription.Height * 4),
                        ColorImageFormat.Bgra);

                    this.colorBitmap.AddDirtyRect(new Int32Rect(0, 0, this.colorBitmap.PixelWidth, this.colorBitmap.PixelHeight));
                }

                if (this.overlay != null)
                {
                    // Calculate stride of source
                    int stride = overlay.PixelWidth * (overlay.Format.BitsPerPixel / 8);

                    // Create data array to hold source pixel data
                    byte[] data = new byte[stride * overlay.PixelHeight];

                    // Copy source image pixels to the data array
                    overlay.CopyPixels(data, stride, 0);

                    this.colorBitmap.WritePixels(new Int32Rect(0, 0, overlay.PixelWidth, overlay.PixelHeight), data, stride, 0);
                }

                this.colorBitmap.Unlock();
            }
        }
    }
}
Your overlay.Format.BitsPerPixel / 8 will be 1 (because it's a GIF), but you're trying to copy it into something that is not a GIF, probably BGRA (32 bit). Thus you get a huge difference in size (4x).
WritePixels should take the stride value of the destination buffer, but you passed it the stride value of the overlay (this can cause weird problems as well).
And finally, even if it went 100% smoothly, your overlay would not actually "overlay" anything, it would replace, since I don't see any alpha blending math in your code.
Switch your .gif to a .png (32bit) and see if that helps.
Also, if you're looking for AlphaBltMerge-type code: I wrote the entire thing here; it's very easy to understand.
Merge 2 - 32bit Images with Alpha Channels
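If you would rather do the blend inline, the per-pixel math for straight (non-premultiplied) alpha is short. This sketch assumes both buffers are BGRA and the same size, i.e. the overlay has already been matched against the corresponding region of the background:

// Blend a BGRA overlay buffer onto a BGRA background buffer in place.
static void AlphaBlend(byte[] background, byte[] overlay)
{
    for (int i = 0; i < overlay.Length; i += 4)
    {
        int a = overlay[i + 3];
        int inv = 255 - a;
        background[i + 0] = (byte)((overlay[i + 0] * a + background[i + 0] * inv) / 255); // B
        background[i + 1] = (byte)((overlay[i + 1] * a + background[i + 1] * inv) / 255); // G
        background[i + 2] = (byte)((overlay[i + 2] * a + background[i + 2] * inv) / 255); // R
        // leave the background alpha untouched
    }
}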
Is there a way to resize an image using GPU (graphic card) that is consumable through a .NET application?
I am looking for an extremely performant way to resize images and have heard that the GPU could do it much quicker than CPU (GDI+ using C#). Are there known implementations or sample code using the GPU to resize images that I could consume in .NET?
Have you thought about using XNA to resize your images? Here you can find out how to use XNA to save an image as a PNG/JPEG to a MemoryStream and later reuse it as a Bitmap object:
EDIT: I will post an example here (taken from the link above) on how you can possibly use XNA.
public static Image Texture2Image(Texture2D texture)
{
    Image img;
    using (MemoryStream MS = new MemoryStream())
    {
        texture.SaveAsPng(MS, texture.Width, texture.Height);
        //Go to the beginning of the stream.
        MS.Seek(0, SeekOrigin.Begin);
        //Create the image based on the stream.
        img = Bitmap.FromStream(MS);
    }
    return img;
}
I also found out today that you can use OpenCV to take advantage of the GPU/multicore CPUs. You can, for example, use a .NET wrapper such as Emgu and use its Image class to manipulate your picture and return a .NET Bitmap class:
public static Bitmap ResizeBitmap(Bitmap sourceBM, int width, int height)
{
    // Initialize Emgu Image object
    Image<Bgr, Byte> img = new Image<Bgr, Byte>(sourceBM);

    // Resize using linear interpolation (Resize returns a new image)
    Image<Bgr, Byte> resized = img.Resize(width, height, INTER.CV_INTER_LINEAR);

    // Return .NET Bitmap object
    return resized.ToBitmap();
}
I wrote a quick spike to check performance using WPF, though I cannot say for sure that it's using the GPU.
Still, see below. This scales an image to 33.5 (or whatever) times its original size.
public void Resize()
{
    double scaleFactor = 33.5;

    var originalFileStream = System.IO.File.OpenRead(@"D:\SkyDrive\Pictures\Random\Misc\DoIt.jpg");
    var originalBitmapDecoder = JpegBitmapDecoder.Create(originalFileStream, BitmapCreateOptions.None, BitmapCacheOption.OnLoad);
    BitmapFrame originalBitmapFrame = originalBitmapDecoder.Frames.First();
    var originalPixelFormat = originalBitmapFrame.Format;

    TransformedBitmap transformedBitmap =
        new TransformedBitmap(originalBitmapFrame, new System.Windows.Media.ScaleTransform()
        {
            ScaleX = scaleFactor,
            ScaleY = scaleFactor
        });

    int stride = ((transformedBitmap.PixelWidth * transformedBitmap.Format.BitsPerPixel) + 7) / 8;
    int pixelCount = (stride * (transformedBitmap.PixelHeight - 1)) + stride;

    byte[] buffer = new byte[pixelCount];
    transformedBitmap.CopyPixels(buffer, stride, 0);

    WriteableBitmap transformedWriteableBitmap = new WriteableBitmap(transformedBitmap.PixelWidth, transformedBitmap.PixelHeight, transformedBitmap.DpiX, transformedBitmap.DpiY, transformedBitmap.Format, transformedBitmap.Palette);
    transformedWriteableBitmap.WritePixels(new Int32Rect(0, 0, transformedBitmap.PixelWidth, transformedBitmap.PixelHeight), buffer, stride, 0);

    BitmapFrame transformedFrame = BitmapFrame.Create(transformedWriteableBitmap);

    var jpegEncoder = new JpegBitmapEncoder();
    jpegEncoder.Frames.Add(transformedFrame);

    using (var outputFileStream = System.IO.File.OpenWrite(@"C:\DATA\Scrap\WPF.jpg"))
    {
        jpegEncoder.Save(outputFileStream);
    }
}
The image I was testing was 495 x 360. It resized it to over 16k x 12k in a couple of seconds, including save out.
It resizes to 1.5x around 165 times a second in a single-core run. On an i7, with the GPU seemingly doing nothing and the CPU at 20%, I'd expect to get 5x that when multithreaded.
Performance profiling shows a hot path to wpfgfx_v0400.dll which is the native WPF graphics library and is close to DirectX (look-up 'milcore' in Google).
So it might be accelerated, I don't know.
Luke
Yes, it is possible to use the GPU to resize your images. This can be done using DirectX surfaces (for example, using SlimDX in C#). You create a surface and move your image onto it; then you can stretch this surface to another target surface of your desired size using only the GPU, and finally read the resized image back from the target surface. In this scenario, the pixel formats of the surfaces can differ, and the GPU handles the conversion automatically. But there are things that can affect the performance of this operation: moving data between GPU and CPU memory is a time-consuming process. You can apply techniques to boost performance based on your situation, avoiding extra data transfers between CPU and GPU memory.
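Very roughly, the flow in SlimDX's Direct3D9 wrapper looks like the sketch below; treat the exact method names and signatures as assumptions to verify against the SlimDX docs (StretchRectangle wraps IDirect3DDevice9::StretchRect, which is where the GPU does the actual resize):

// 1. Create a source surface and a render-target surface of the new size.
Surface source = Surface.CreateOffscreenPlain(device, srcWidth, srcHeight,
    Format.A8R8G8B8, Pool.Default);
Surface target = Surface.CreateRenderTarget(device, dstWidth, dstHeight,
    Format.A8R8G8B8, MultisampleType.None, 0, false);

// 2. Upload the source image's pixels into 'source'
//    (LockRectangle, copy the rows, UnlockRectangle).

// 3. Let the GPU stretch 'source' into 'target' with filtering.
device.StretchRectangle(source, target, TextureFilter.Linear);

// 4. Read 'target' back into system memory (GetRenderTargetData into an
//    offscreen surface in Pool.SystemMemory) and copy it into a Bitmap.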