Ram optimization SDK like RAMRush, for image processing - c#

I have a problem in my application: it needs heavy memory optimization, but for now I have to deliver a test version. I need something like RAMRush, but as an API that can be called from my application without any separate executables. RAMRush magically solves my problem for now, but I can't ship it with my app.

To solve the OP's immediate needs (even if it's a bit late today)
C# memory profile
Microsoft DebugDiag, see blog post
There are two levels of memory optimization in image processing code.
At the easier level, the programmer tries to delete image objects that are no longer needed, as frequently and early as possible (i.e. after every line of code).
The more difficult level requires implementing some image processing steps as memory-efficient pipelines.
One example:
class RgbToGray : ImageSource
{
private ImageSource m_src;
public RgbToGray(ImageSource src)
{
m_src = src;
}
public void GetPixels(int x0, int y0, int rectWidth, int rectHeight, out Pixel[,] result)
{
// omitted: validate parameters
// The source allocates and fills temp with just the requested rectangle.
Pixel[,] temp;
m_src.GetPixels(x0, y0, rectWidth, rectHeight, out temp);
// An out parameter must be assigned before it is used.
result = new Pixel[rectHeight, rectWidth];
// Both buffers cover only the rectangle, so index them relative to (x0, y0).
for (int y = 0; y < rectHeight; ++y)
{
for (int x = 0; x < rectWidth; ++x)
{
result[y,x] = SomeCalculation(temp[y,x]);
}
}
}
}
In this example, RgbToGray performs its image processing on demand. It keeps no permanent pixel storage of its own and relies on a temporary buffer instead. For the memory saving to materialize, the caller must call GetPixels with a "buffer granularity" (for example, a strip of a few rows) that is smaller than the full image size.
(The particular framework I have in mind is Windows Imaging Component, but similar ideas can be found in many other frameworks such as OpenCV's MatExpr template.)
Apparently, not all frameworks allow such optimization.
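To make the "buffer granularity" point concrete, here is a minimal sketch of a caller pulling an image through such a pipeline in horizontal strips. The names IPixelSource, GradientSource, GrayStage and StripDemo are made up for illustration and are not part of WIC or any other framework; the gray conversion is a plain channel average standing in for SomeCalculation.

```csharp
using System;

public struct Pixel
{
    public byte R, G, B;
    public Pixel(byte r, byte g, byte b) { R = r; G = g; B = b; }
}

// Hypothetical pull-style interface: a source fills only the requested rectangle.
public interface IPixelSource
{
    int Width { get; }
    int Height { get; }
    void GetPixels(int x0, int y0, int rectWidth, int rectHeight, out Pixel[,] result);
}

// Stand-in source that computes pixels on the fly instead of storing an image.
public sealed class GradientSource : IPixelSource
{
    public int Width { get; }
    public int Height { get; }
    public GradientSource(int width, int height) { Width = width; Height = height; }

    public void GetPixels(int x0, int y0, int rectWidth, int rectHeight, out Pixel[,] result)
    {
        result = new Pixel[rectHeight, rectWidth];
        for (int y = 0; y < rectHeight; ++y)
            for (int x = 0; x < rectWidth; ++x)
                result[y, x] = new Pixel((byte)(x0 + x), (byte)(y0 + y), 0);
    }
}

// Pipeline stage: converts whatever rectangle is requested, holding no image of its own.
public sealed class GrayStage : IPixelSource
{
    private readonly IPixelSource m_src;
    public GrayStage(IPixelSource src) { m_src = src; }
    public int Width => m_src.Width;
    public int Height => m_src.Height;

    public void GetPixels(int x0, int y0, int rectWidth, int rectHeight, out Pixel[,] result)
    {
        m_src.GetPixels(x0, y0, rectWidth, rectHeight, out Pixel[,] temp);
        result = new Pixel[rectHeight, rectWidth];
        for (int y = 0; y < rectHeight; ++y)
            for (int x = 0; x < rectWidth; ++x)
            {
                Pixel p = temp[y, x];
                byte gray = (byte)((p.R + p.G + p.B) / 3); // simple average, for illustration only
                result[y, x] = new Pixel(gray, gray, gray);
            }
    }
}

public static class StripDemo
{
    // Pulls the image strip by strip, so only stripHeight rows are alive at any time.
    public static long SumGray(IPixelSource source, int stripHeight)
    {
        long sum = 0;
        for (int y0 = 0; y0 < source.Height; y0 += stripHeight)
        {
            int h = Math.Min(stripHeight, source.Height - y0);
            source.GetPixels(0, y0, source.Width, h, out Pixel[,] strip);
            foreach (Pixel p in strip) sum += p.R;
        }
        return sum;
    }
}
```

Calling StripDemo.SumGray(new GrayStage(new GradientSource(4000, 3000)), 16) touches at most 16 rows per stage at a time, whereas a strip height equal to the image height degenerates into the usual "whole image in memory" behaviour.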

Related

Memory Allocation Time (The Fast Way)

For a really simple code snippet, I'm trying to see how much of the time is spent actually allocating objects on the small object heap (SOH).
static void Main(string[] args)
{
const int noNumbers = 10000000; // 10 mil
ArrayList numbers = new ArrayList();
Random random = new Random(1); // use the same seed as to make
// benchmarking consistent
for (int i = 0; i < noNumbers; i++)
{
int currentNumber = random.Next(10); // generate a non-negative
// random number less than 10
object o = currentNumber; // BOXING occurs here
numbers.Add(o);
}
}
In particular, I want to know how much time is spent allocating space for the all the boxed int instances on the heap (I know, this is an ArrayList and there's horrible boxing going on as well - but it's just for educational purposes).
The CLR has 2 ways of performing memory allocations on the SOH: either calling the JIT_TrialAllocSFastMP (for multi-processor systems, ...SFastSP for single processor ones) allocation helper - which is really fast since it consists of a few assembly instructions - or falling back to the slower JIT_New allocation helper.
PerfView sees just fine the JIT_New being invoked:
However, I can't figure out which native function, if any, is involved in the "quick way" of allocating. I certainly don't see any JIT_TrialAllocSFastMP. I've already tried raising the loop count (from 10 to 500 mil), in the hope of increasing my chances of getting a glimpse of a few stacks containing the elusive function, but to no avail.
Another approach was to use JetBrains dotTrace (line-by-line) performance viewer, but it falls short of what I want: I do get to see the approximate time it takes the boxing operation for each int, but 1) it's just a bar and 2) there's both the allocation itself and the copying of the value (of which the latter is not what I'm after).
Using the JetBrains dotTrace Timeline viewer won't work either, since they currently don't (quite) support native callstacks.
At this point it's unclear to me whether a method is dynamically generated and called when JIT_TrialAllocSFastMP is invoked (and by some miracle none of the PerfView-collected stack frames, one every 1 ms, ever captures it), or whether Main's method body somehow gets patched and those few assembly instructions mentioned above are injected directly into the code. It's also hard to believe that the fast way of allocating memory is never called.
You could ask "But you already have the .NET Core CLR code, why can't you figure it out yourself?". Since the .NET Framework CLR code is not publicly available, I've looked into its sibling, the .NET Core version of the CLR (as Matt Warren recommends in his step 6 here). The \src\vm\amd64\JitHelpers_InlineGetThread.asm file contains a JIT_TrialAllocSFastMP_InlineGetThread function. The issue is that parsing/understanding the C++ code there is above my grade, and I also can't really think of a way to "Step Into" and see how the JIT-ed code is generated, since this is way lower-level than your usual Press-F11-in-Visual-Studio.
Update 1: Let's simplify the code, and only consider individual boxed int values:
const int noNumbers = 10000000; // 10 mil
object o = null;
for (int i=0;i<noNumbers;i++)
{
o = i;
}
Since this is a Release build, and dead code elimination could kick in, WinDbg is used to check the final machine code.
The resulting JITed code simply does repeated boxing; its main loop is highlighted in blue below. It shows that the method that handles the memory allocation is not inlined (note the call to hex address 00af30f4):
This method in turn tries to allocate via the "fast" way, and if that fails, falls back to the "slow" way of calling JIT_New itself:
It's interesting how the call stack in PerfView obtained from the code above doesn't show any intermediary method between the level of Main and the JIT_New entry itself (given that Main doesn't directly call JIT_New):

Windows Forms Parallel Drawing

Is it possible to draw on a panel using a Parallel.For loop? I have a multidimensional array of doubles, double[][] plants, and I want to draw the columns in parallel, where each entry in the array is drawn as a rectangle in a grid on the panel.
On the line with graphics.FillRectangle(), I keep getting this error when I try:
An exception of type 'System.InvalidOperationException' occurred in System.Drawing.dll but was not handled in user code
Additional information: Object is currently in use elsewhere.
Here is the code I am using:
Parallel.For(0, simWidth, i =>
{
Color plantColor;
RectangleF plantRectangle= new Rectangle();
SolidBrush plantBrush = new SolidBrush(Color.Black);
for (int j = 0; j < simHeight; ++j)
{
int r, g = 255, b;
r = b = (int)(255 * (Math.Tanh(simulation.plants[i, j]) + 1) / 2.0);
plantColor = Color.FromArgb(100, r, g, b);
plantBrush.Color = plantColor;
plantRectangle.Location = new PointF(i * cellSize, j * cellSize);
graphics.FillRectangle(plantBrush, plantRectangle);
}
plantBrush.Dispose();
});
I think what is happening is that the graphics object cannot handle multiple calls at once. Is there any way around this? I tried creating a local reference to the graphics object in each parallel call but that did not work.
Is it possible to draw on a panel using a Parallel.For loop?
No, not in any way that would actually be useful.
UI objects, such as a Panel, have "thread affinity". That is, they are owned by a particular thread, and must only ever be used in that thread.
GDI+ objects, like your Graphics object (you don't say where you got that object, but one hopes it was passed to you in PaintEventArgs… if not, you have other design flaws in your code) can be more forgiving, but are not thread-safe. You could add synchronization around the actual uses of the object, but those uses are the slow part. Serializing them will negate most of the benefit of concurrency in the code.
Your question does not make clear whether your use of Parallel here was even an actual attempt to address some specific performance problem, never mind what that problem actually was. There are numerous questions with answers on Stack Overflow that discuss various techniques for improving rendering performance in Windows Forms code.
In general, most of these techniques involve reducing the total amount of work done by caching as much as possible. Based on the code you've shown, there are at least two things you might want to do:
Cache the computations for the rectangles and colors. You can even do that part of the computation with Parallel, whenever the underlying parameters change.
Draw everything into a Bitmap object. This will have to be done single-threaded, but a) it doesn't have to be done in the UI thread that owns your UI objects, and b) you (again) can do this just once, whenever the underlying parameters change. Having drawn into a Bitmap, then you can just draw the Bitmap object when the Paint event occurs, instead of having to re-render everything from scratch.
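As a rough sketch of those two suggestions, assuming the question's tanh color mapping and a rectangular double[,] grid: the class name PlantRenderer and its method names are made up for illustration, and Bitmap/Graphics need System.Drawing (System.Drawing.Common on newer .NET, Windows only). Only the color computation runs in parallel; all GDI+ calls stay on one thread.

```csharp
using System;
using System.Drawing;
using System.Threading.Tasks;

public static class PlantRenderer
{
    // Safe to parallelize: each iteration writes only to its own column of the array
    // and no GDI+ object is touched.
    public static Color[,] ComputeColors(double[,] plants)
    {
        int width = plants.GetLength(0), height = plants.GetLength(1);
        var colors = new Color[width, height];
        Parallel.For(0, width, i =>
        {
            for (int j = 0; j < height; ++j)
            {
                int rb = (int)(255 * (Math.Tanh(plants[i, j]) + 1) / 2.0);
                colors[i, j] = Color.FromArgb(100, rb, 255, rb);
            }
        });
        return colors;
    }

    // Single-threaded: one Graphics object, used by exactly one thread.
    // Call this only when the underlying data changes and keep the returned Bitmap.
    public static Bitmap RenderToBitmap(Color[,] colors, float cellSize)
    {
        int width = colors.GetLength(0), height = colors.GetLength(1);
        var bitmap = new Bitmap((int)Math.Ceiling(width * cellSize),
                                (int)Math.Ceiling(height * cellSize));
        using (Graphics g = Graphics.FromImage(bitmap))
        using (var brush = new SolidBrush(Color.Black))
        {
            for (int i = 0; i < width; ++i)
                for (int j = 0; j < height; ++j)
                {
                    brush.Color = colors[i, j];
                    g.FillRectangle(brush, i * cellSize, j * cellSize, cellSize, cellSize);
                }
        }
        return bitmap;
    }
}
```

The Paint handler then only needs e.Graphics.DrawImage(cachedBitmap, 0, 0), where cachedBitmap is whatever field you store the result in. Remember to dispose the previous Bitmap when you replace it.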

Image.RotateFlip leaking memory :/

Although I have been programming for about 11 years (mostly VB6, last 6 months C#), it's THE first time I actually ask a question :) I have found all my answers from teh interwebz, but this issue I can't solve myself. Your site is among the most helpful places I have got the best answers from!
I will show the code I'm using (an extract of what's relevant). The problem is that when using the RotateFlip method, memory usage increases rapidly to ~200M and then gets collected by the GC after some time. The main method calling it iterates about 30 times per second, so performance is of utmost importance here. I have tried using a graphics matrix transform, but this sometimes fails and shows a non-flipped image. The application itself is based on using a webcam, hiding the preview, taking the callback picture and showing it in a picturebox. It then overlays a rectangle on it from another class. That's the reason to use the callback and not the preview window.
Capture.cs class:
internal Bitmap LiveImage;
int ISampleGrabberCB.BufferCB(double bufferSize, IntPtr pBuffer, int bufferLen)
{
LiveImage = new Bitmap(_width, _height, _stride, PixelFormat.Format24bppRgb, pBuffer);
if (ExpImg) // local bool, used rarely when the picture saving is triggered
{
LiveImage.RotateFlip(RotateFlipType.RotateNoneFlipY);
var a = LiveImage.Clone(new Rectangle(Currect.Left, Currect.Top, Currect.Width, Currect.Height),
LiveImage.PixelFormat);
using (a)
a.Save("ocr.bmp", ImageFormat.Bmp);
}
else // dmnit, rotateflip leaks like h*ll but matrix transform doesn't sometimes flip :S
{
LiveImage.RotateFlip(RotateFlipType.RotateNoneFlipY);
/*using (var g = Graphics.FromImage(LiveImage))
{
g.Transform = _mtx;
g.DrawImage(LiveImage, 0, 0);
}*/
}
GC.Collect(); // gotta use it with rotateflip, otherwise it gets crazy big, like ~200M :O
return 0;
}
}
In main form i have an event that's updating the picture in the picturebox:
private void SetPic()
{
pctCamera.Image = _cam.LiveImage;
_cam.PicIsFree = false;
}
Because I need to get the image to the main form, which is in another class, I figured the most logical approach is the exposed Bitmap, which is updated on every callback frame.
The reason I don't want to use the matrix transform is that it's slower, and at this speed it sometimes fails to flip the image. How often that happens varies between PCs with different hardware capabilities and CPU speeds; at the fastest framerate (30fps) on a 1.2GHz CPU it shows up very frequently.
So, can you help me figure it out? I'm not actually using RotateFlip in the current version; I'm using the commented-out matrix transform because I feel bad about using GC.Collect :(
Thank You!!!
pctCamera.Image = _cam.LiveImage;
Heavy memory usage like you observe is a sure sign that you missed an opportunity to call Dispose() somewhere, letting the unmanaged resources (memory mostly) used by a bitmap get released early instead of letting the garbage collector do it. The quoted statement is one such case, you are not disposing the old image referenced by the picture box. Fix:
if (pctCamera.Image != null) pctCamera.Image.Dispose();
pctCamera.Image = _cam.LiveImage;
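If you do this in more than one place, the same "dispose the old one when you assign a new one" pattern can be wrapped in a small helper. This is only a sketch: DisposableSwap and FakeImage are made-up names, FakeImage merely stands in for Bitmap so the behaviour can be checked without GDI+, and the same-instance guard is my addition so that assigning the same frame twice does not dispose the image that is still displayed.

```csharp
using System;

public static class DisposableSwap
{
    // Disposes the previous value (unless it is the very same instance) and
    // returns the new one, so it can be used directly in a property assignment.
    public static T Swap<T>(T old, T next) where T : class, IDisposable
    {
        if (old != null && !ReferenceEquals(old, next))
            old.Dispose();
        return next;
    }
}

// Stand-in for Bitmap: records whether Dispose was called.
public sealed class FakeImage : IDisposable
{
    public bool Disposed { get; private set; }
    public void Dispose() { Disposed = true; }
}
```

With the code from the question, the assignment would become pctCamera.Image = DisposableSwap.Swap(pctCamera.Image, _cam.LiveImage);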
You can rewrite your code like this:
internal Bitmap LiveImage;
int ISampleGrabberCB.BufferCB(double bufferSize, IntPtr pBuffer, int bufferLen)
{
using (LiveImage = new Bitmap(_width, _height, _stride, PixelFormat.Format24bppRgb, pBuffer))
{
LiveImage.RotateFlip(RotateFlipType.RotateNoneFlipY);
if (ExpImg) // local bool, used rarely when the picture saving is triggered
{
var a = LiveImage.Clone(new Rectangle(Currect.Left, Currect.Top, Currect.Width, Currect.Height),
LiveImage.PixelFormat);
using (a)
a.Save("ocr.bmp", ImageFormat.Bmp);
}
}
return 0;
}
Bitmap derives from Image and implements IDisposable. As you create a Bitmap each time, I suggest wrapping it in a using statement so its resources are freed automatically.
GC.Collect is there for this situation. Collecting the data is the ONLY way to free it, and when creating HUGE bitmaps it's the way to go. Does a GC.Collect really slow things down?
Other than that, you should keep the number of bitmap copies as low as possible.

.NET, get memory used to hold struct instance

It's possible to determine memory usage (according to Jon Skeet's blog)
like this :
public class Program
{
private static void Main()
{
var before = GC.GetTotalMemory(true);
var point = new Point(1, 0);
var after = GC.GetTotalMemory(true);
Console.WriteLine("Memory used: {0} bytes", after - before);
}
#region Nested type: Point
private class Point
{
public int X;
public int Y;
public Point(int x, int y)
{
X = x;
Y = y;
}
}
#endregion
}
It prints Memory used: 16 bytes (I'm running x64 machine).
Suppose we change the Point declaration from class to struct. How can we then determine the memory used? Is it possible at all? I was unable to find anything about getting the stack size in .NET.
P.S
Yes, when changed to 'struct', Point instances will often (not always) be stored on the stack instead of the heap. Sorry for not posting this the first time, together with the question.
P.P.S
This situation has no practical usage at all(IMHO), It's just interesting for me whether it is possible to get Stack(short term storage) size. I was unable to find any info about it, so asked you, SO experts).
You won't see a change in GetTotalMemory if you create the struct the way you did, since it's going to be part of the thread's stack, and not allocated separately. The GetTotalMemory call will still work, and show you the total allocation size of your process, but the struct will not cause new memory to be allocated.
You can use sizeof(Type) or Marshal.SizeOf to return the size of a struct (in this case, 8 bytes).
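A minimal sketch of the Marshal.SizeOf route, with made-up type names (PointStruct, Padded, StructSize). Note that Marshal.SizeOf reports the marshalled (unmanaged) size, which for simple structs of primitive fields like these matches the managed layout; sizeof(PointStruct) gives the managed size but only compiles inside an unsafe context.

```csharp
using System;
using System.Runtime.InteropServices;

// The question's Point, declared as a struct: two 4-byte ints.
public struct PointStruct
{
    public int X;
    public int Y;
    public PointStruct(int x, int y) { X = x; Y = y; }
}

// One byte plus one int: the int is aligned to 4 bytes, so padding is added.
public struct Padded
{
    public byte Flag;
    public int Value;
}

public static class StructSize
{
    public static int Of<T>() where T : struct
    {
        return Marshal.SizeOf<T>();
    }
}
```

StructSize.Of<PointStruct>() returns 8, and StructSize.Of<Padded>() also returns 8 rather than 5, which shows that the size is not simply the sum of the field sizes.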
There is a special CPU register, ESP (RSP on x64), that contains a pointer to the top of the stack. You can probably find a way to read this register from .NET (using some unsafe or external code). Then just compare the value of this pointer at a given moment with its value at thread start; the difference between them will be a more or less accurate measure of the memory used for the thread's stack. Not sure if it really works, just an idea :)
In isolation, as you have done here, you might have a "reasonable" amount of success with this methodology, especially if you run it numerous times to make sure no other piece of code or GC action affected the outcome. I am not confident the information is useful, though. Using this methodology in a real-world application is less likely to give accurate results, as there are too many variables.
But realize that this is only "reasonable", not a certainty.
Why do you need to know the size of objects? Just curious, as knowing the business reason may lead to other alternatives.

C# Static class member and System.Windows.Controls.Image performance issue

I am currently investigating a performance issue in an application and have highlighted the following;
I have a class -
public static class CommonIcons
{
...
public static readonly System.Windows.Media.ImageSource Attributes = typeof(CommonIcons).Assembly.GetImageFromResourcePath("Resources/attributes.png");
...
}
As a test harness I then have the following code using this class to show the issue -
for (int loop = 0; loop < 20000; loop++)
{
// store time before call
System.Windows.Controls.Image image = new System.Windows.Controls.Image
{
Source = CommonIcons.Attributes,
Width = 16,
Height = 16,
VerticalAlignment = VerticalAlignment.Center,
SnapsToDevicePixels = true
};
// store time after call
// log time between before and after
}
At the start of the loop the time difference is less than 0.001 seconds, but after 20000 goes this has increased to 0.015 seconds.
If I don't use the static member and directly refer to my icon, then I do not have the performance hit, i.e.
for (int loop = 0; loop < 20000; loop++)
{
// store time before call
System.Windows.Controls.Image image = new System.Windows.Controls.Image
{
Source = typeof(CommonIcons).Assembly.GetImageFromResourcePath("Resources/attributes.png"),
Width = 16,
Height = 16,
VerticalAlignment = VerticalAlignment.Center,
SnapsToDevicePixels = true
};
// store time after call
// log time between before and after
}
But in my real world program I don't want to be creating the imagesource on every call (increased memory until a garbage collection), hence why a static member is used. However I also cannot live with the performance hit.
Can someone explain why the original code is creating this performance hit? And also a better solution for what I am trying to do?
Thanks
It smells like something to do with garbage collection. I wonder whether there's some kind of coupling between the ImageSource and the Image which is causing problems in your first case. Have you looked to see what the memory usage of your test harness looks like in each case?
Out of interest, what happens if you set the Source to null at the end of each iteration? I know this is a bit silly, but then that's a natural corollary of it being a test harness :) It might be a further indication that it's a link between the source and the image...
Can you store only constant strings like "Resources/attributes.png" in your CommonIcons class ?
The difference is not whether a static member is used or not; it is that in the first version you create 20000 images that all share the same imagesource. I don't know exactly what is going on, but under the hood there may be delegates created automatically to handle communication between imagesource and image. Every time an event occurs in the imagesource, 20000 clients need to be notified, which is a large performance hit.
In the second version, each of the 20000 created images have their own imagesource so you don't experience this overhead.
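Note that GDI+ objects such as System.Drawing.Image and Bitmap should always be disposed with their Dispose() method once you are done with them; this lowers your general memory usage and can speed up your application a bit. The WPF System.Windows.Controls.Image used here does not implement IDisposable, so there is nothing to dispose on it directly.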
