How to mitigate CPU to GPU bottleneck

How to mitigate CPU to GPU bottleneck - c#

I am using ComputeSharp library to run a compute shader on a very large set of data. The dataset is around 10GB, separated into smaller (about 3Gb) pieces for GPU to handle. The problem is that each piece takes about 1s to load, compute and return, even though the computation is almost instant.
I am looking for a way to speed this up, as now it gets outperformed by CPU in certain cases.
More details:
The dataset consists of custom points forming a point cloud. The shader is finding the points with highest values and using those to render an image. The max size of the point cloud will be about 500million points.
The points are already as small as they can be, saving all the metadata in a single int. Everything gets put in a buffer and passed to shader which spits out another buffer with result. I already tried and failed to use textures as they do not support custom types.
Edit (Minimal reproduction):
public struct DataPoint
{
public float3 Position;
public uint Value;
}
public void ComputeOneChunk(DataPoint[] dataPoints)
{
var stopWatch = new Stopwatch();
stopWatch.Start();
using var currentChunk = _gpu.AllocateReadOnlyBuffer(dataPoints);
stopWatch.Stop();
Debug.WriteLine($"Buffering took {stopWatch.ElapsedMilliseconds}ms");
stopWatch.Restart();
_gpu.For(dataPoints.Count, 1, new FindMax(
currentChunk,
resultBuffer));
stopWatch.Stop();
Debug.WriteLine($"Execution took {stopWatch.ElapsedMilliseconds}ms");
}
[AutoConstructor]
public readonly partial struct FindMaxForWell : IComputeShader
{
public readonly ReadOnlyBuffer<DataPoint> buffer;
public readonly ReadWriteBuffer<uint> resultBuffer;
public void Execute()
{
//DoStuff
}
}

Found a solution. It seems that converting dataPoints array to a buffer takes a long time and involves some intermediate steps. To speed this up you can use UploadBuffer which eliminates these extra steps.
_currentChunk = _gpu.AllocateReadOnlyBuffer<DataPoint>(uploadBuffer.Length);
_currentChunk.CopyFrom(uploadBuffer);
This changed 1s to around 300ms, which is probably as fast as it will ever go over PCIe.
Another problem appeared however, and that is that UploadBuffer uses Span and accessing it to save data is very slow, which I am still trying to solve.

Related

Store data inside long number or class instance for better performance

I'm writing an AI for my puzzle game and I'm facing the following situation:
Currently, I have a class, Move, which represents a move in my game, which has similiar logic to chess.
In the Move class, I'm storing the following data:
The move player color.
The moving piece.
The origin position on the board.
The destination position on the board.
The piece that has been killed by performing this move (if any).
The move score.
In addition, I got some methods which describes amove, such as IsResigned, Undo etc.
This move instance is being passed along in my AI, which is based on the Alpha Beta algorithm. Therfore, the move instance is being passed many times, and I'm constructing many many Move class instances along the AI implementation. Thus, I'm afriad that it may have significant inpact of the performance.
To reduce the performance inpact, I thought about the following solution:
Instead of using instances of the Move class, I'll store my entire move data inside a long number (using bitwise operations), and then will extract the information as needed.
For instance:
- Player color will be from bit 1 - 2 (1 bit).
- Oirign position will be from bit 2 - 12 (10 bits).
and so on.
See this example:
public long GenerateMove(PlayerColor color, int origin, int destination) {
return ((int)color) | (origin << 10) | (destination << 20);
}
public PlayerColor GetColor(long move) {
return move & 0x1;
}
public int GetOrigin(long move) {
return (int)((move >> 10) & 0x3f);
}
public int GetDestination(long move) {
return (int)((move >> 20) & 0x3f);
}
Using this method, I can pass along the AI just long numbers, instead of class instances.
However, I got some wonders: Put aside the added complexity to the program, class instances are being passed in C# by reference (i.e. by sending a pointer to that address). So does my alternative method even make sense? The case is even worse, since I'm using long numbers here (64bis), but the pointer address may be represented as an integer (32bits) - so it may even have worest performance than my current implementation.
What is your opinion about this alternative method?

There are a couple of things to say here:
Are you actually having performance problems (and are you sure memory usage is the reason)? Memory allocation for new instances is very cheap in .net and normally, you will not notice garbage collection. So you might be barking up the wrong tree here.
When you pass an instance of a reference type, you are just passing a reference; when you store a reference type (e.g. in an array), you will just store the reference. So unless you create a lot of distinct instances or copy the data into new instances, passing the reference does not increase heap size. So passing references might be the most efficient way to go.
If you create a lot of copies and discard them quickly and you are afraid of memory impact (again, do you face actual problems?), you can create value types (structinstead of class). But you have to be aware of the value type semantics (you are always working on copies).
You can not rely on a reference being 32 bit. On a 64 bit system, it will be 64 bit.
I would strongly advise against storing the data in an integer variable. It makes your code less maintainable and that is in most of the cases not worth the performance tradeoff. Unless you are in severe trouble, don't do it.
If you don't want to give up on the idea of using a numeric value, use at least a struct, that is composed of two System.Collections.Specialized.BitVector32 instances. This is a built in .NET type that will do the mask and shift operations for you. In that struct you can also encapsulate accessing the values in properties, so you can keep this rather unusual way of storing your values away from your other code.
UPDATE:
I would recommend you use a profiler to see where the performance problems are. It is almost impossible (and defenitely not a good use of your time) to use guesswork for performance optimization. Once you see the profiler results, you'll probably be surprised about the reason of your problems. I would bet that memory usage or memory allocation is not it.
In case you actually come to the conclusion that memory consumption of your Move instances is the reason and using small value types would solve the problem (I'd be surprised), don't use an Int64, use a custom struct (as described in 6.) like the following, that will be the same size as an Int64:
[System.Runtime.InteropServices.StructLayout( System.Runtime.InteropServices.LayoutKind.Sequential, Pack = 4 )]
public struct Move {
private static readonly BitVector32.Section SEC_COLOR = BitVector32.CreateSection( 1 );
private static readonly BitVector32.Section SEC_ORIGIN = BitVector32.CreateSection( 63, SEC_COLOR );
private static readonly BitVector32.Section SEC_DESTINATION = BitVector32.CreateSection( 63, SEC_ORIGIN );
private BitVector32 low;
private BitVector32 high;
public PlayerColor Color {
get {
return (PlayerColor)low[ SEC_COLOR ];
}
set {
low[ SEC_COLOR ] = (int)value;
}
}
public int Origin {
get {
return low[ SEC_ORIGIN ];
}
set {
low[ SEC_ORIGIN ] = value;
}
}
public int Destination {
get {
return low[ SEC_DESTINATION ];
}
set {
low[ SEC_DESTINATION ] = value;
}
}
}
But be aware that you are now using a value type, so you have to use it accordingly. That means assignments create copies of the original (i.e. changing the destination value will leave the source unchanged), using ref parameters if you want to persist changes made by subroutines and avoid boxing at any cost to prevent even worse performance (some operations can mean boxing even though you won't immediately notice, e.g. passing the struct that implements an interface as an argument of the interface type). Using structs (just as well as using Int64) will only be worth it when you create a lot of temporary values, which you quickly throw away. And then you'll still need to confirm with a profile that your performance is actually improved.

Reducing memory pressure with many small classes

I am struggling with memory problems in some legacy code.
The code performs various tasks with huge point clouds, all based around the following data structure:
public class Point
{
public double X { get; set; }
public double Y { get; set; }
public double Z { get; set; }
}
At any time, a couple millions of these points may hang around in memory in various lists. Many of these stay in memory long enough to go into generation 2 of the garbage collection. Since the program runs in 32bit mode, virtual address space is limited.
The programs using this legacy code sometimes crash with OutOfMemoryExceptions. Even if they do not crash, they consume far more memory than they should, and virtual address space is frequently fragmented to the point where no continuous chunk of memory larger than 50mb is available (e.g. MemoryFailPoint(50) fails). A couple of methods have explicit calls to GC.Collect(), and removing those increases the frequency of the crashes.
Now, I know of a two ways to solve this problem, both of which I cannot use:
Use a struct instead of a class for the points. Do not store those structs within a List, instead use arrays, to avoid copying the points with each access. Structs have far less overhead per instance than classes, and do not bother the garbage collector as much.
Unfortunately this would require huge breaking changes to the code; the existing methods all expect the point class to be mutable, and references to individual points are passed around everywhere. The copy-by-value semantics of structs will cause all sorts of problems.
Switch the whole app to 64bit. This would not reduce the memory, but would increase the virtual address space to a point where at least the app would not crash anymore.
Unfortunately there are a couple of legacy 32bit dlls that prevent this.
Is there any other way I could keep working with the existing Point class in 32bit, but reduce memory pressure and ease work for the garbage collector?
Can I somehow allocate and free all those points myself in unmanaged memory, while still passing references around in managed code?
Or is there another workaround I have missed?

In your comment, you already achieved significant improvement from 36 bytes down to 24 bytes by going from double to float:
I did some tests with float instead of double: I was able to save 33%
( the original point class is 24 byte + 12 byte overhead, with floats
it is 12 byte + 12 byte overhead.) With a struct I could save another
12 bytes for the class overhead. – HugoRune Sep 20 '16 at 16:19
You can get an additional 8 byte improvement by encapsulating a struct in your objects, and this achieves 16 bytes per Point (12 bytes for the floats with 4 bytes overhead), and saves an additional 22% for a total of 55%:
public struct PointFloat
{
public float X;
public float Y;
public float Z;
}
public class Point
{
private PointFloat dbls;
public float X
{
get { return dbls.X; }
set { dbls.X = value; }
}
public float Y
{
get { return dbls.Y; }
set { dbls.Y = value; }
}
public float Z
{
get { return dbls.Z; }
set { dbls.Z = value; }
}
}

Any way to workaround WPF's calling of GC.Collect(2) aside from reflection?

I recently had to check in this monstrosity into production code to manipulate private fields in a WPF class: (tl;dr how do I avoid having to do this?)
private static class MemoryPressurePatcher
{
private static Timer gcResetTimer;
private static Stopwatch collectionTimer;
private static Stopwatch allocationTimer;
private static object lockObject;
public static void Patch()
{
Type memoryPressureType = typeof(Duration).Assembly.GetType("MS.Internal.MemoryPressure");
if (memoryPressureType != null)
{
collectionTimer = memoryPressureType.GetField("_collectionTimer", BindingFlags.Static | BindingFlags.NonPublic)?.GetValue(null) as Stopwatch;
allocationTimer = memoryPressureType.GetField("_allocationTimer", BindingFlags.Static | BindingFlags.NonPublic)?.GetValue(null) as Stopwatch;
lockObject = memoryPressureType.GetField("lockObj", BindingFlags.Static | BindingFlags.NonPublic)?.GetValue(null);
if (collectionTimer != null && allocationTimer != null && lockObject != null)
{
gcResetTimer = new Timer(ResetTimer);
gcResetTimer.Change(TimeSpan.Zero, TimeSpan.FromMilliseconds(500));
}
}
}
private static void ResetTimer(object o)
{
lock (lockObject)
{
collectionTimer.Reset();
allocationTimer.Reset();
}
}
}
To understand why I would do something so crazy, you need to look at MS.Internal.MemoryPressure.ProcessAdd():
/// <summary>
/// Check the timers and decide if enough time has elapsed to
/// force a collection
/// </summary>
private static void ProcessAdd()
{
bool shouldCollect = false;
if (_totalMemory >= INITIAL_THRESHOLD)
{
// need to synchronize access to the timers, both for the integrity
// of the elapsed time and to ensure they are reset and started
// properly
lock (lockObj)
{
// if it's been long enough since the last allocation
// or too long since the last forced collection, collect
if (_allocationTimer.ElapsedMilliseconds >= INTER_ALLOCATION_THRESHOLD
|| (_collectionTimer.ElapsedMilliseconds > MAX_TIME_BETWEEN_COLLECTIONS))
{
_collectionTimer.Reset();
_collectionTimer.Start();
shouldCollect = true;
}
_allocationTimer.Reset();
_allocationTimer.Start();
}
// now that we're out of the lock do the collection
if (shouldCollect)
{
Collect();
}
}
return;
}
The important bit is near the end, where it calls the method Collect():
private static void Collect()
{
// for now only force Gen 2 GCs to ensure we clean up memory
// These will be forced infrequently and the memory we're tracking
// is very long lived so it's ok
GC.Collect(2);
}
Yes, that's WPF actually forcing a gen 2 garbage collection, which forces a full blocking GC. A naturally occurring GC happens without blocking on the gen 2 heap. What this means in practice is that whenever this method is called, our entire app locks up. The more memory your app is using, and the more fragmented your gen 2 heap is, the longer it will take. Our app presently caches quite a bit of data and can easily take up a gig of memory and the forced GC can lock up our app on a slow device for several seconds -- every 850 MS.
For despite the author's protestations to the contrary, it is easy to arrive at a scenario where this method is called with great frequency. This memory code of WPF's occurs when loading a BitmapSource from a file. We virtualize a listview with thousands of items where each item is represented by a thumbnail stored on disk. As we scroll down, we are dynamically loading in those thumbnails, and that GC is happening at maximum frequency. So scrolling becomes unbelievably slow and choppy with the app locking up constantly.
With that horrific reflection hack I mentioned up top, we force the timers to never be met, and thus WPF never forces the GC. Furthermore, there appear to be no adverse consequences -- memory grows as one scrolls and eventually a GC is triggered naturally without locking up the main thread.
Is there any other option to prevent those calls to GC.Collect(2) that is not so flagrantly hideous as my solution? Would love to get an explanation for what the concrete problems are that might arise from following through with this hack. By that I mean problems with avoiding the call to GC.Collect(2). (seems to me the GC occurring naturally ought to be sufficient)

Notice: Do this only if it causes a bottleneck in your app, and make sure you understand the consequences - See Hans's answer for a good explanation on why they put this in WPF in the first place.
You have some nasty code there trying to fix a nasty hack in the framework... As it's all static and called from multiple places in WPF, you can't really do better than use reflection to break it (other solutions would be much worse).
So don't expect a clean solution there. No such thing exists unless they change the WPF code.
But I think your hack could be simpler and avoid using a timer: just hack the _totalMemory value and you're done. It's a long, which means it can go to negative values. And very big negative values at that.
private static class MemoryPressurePatcher
{
public static void Patch()
{
var memoryPressureType = typeof(Duration).Assembly.GetType("MS.Internal.MemoryPressure");
var totalMemoryField = memoryPressureType?.GetField("_totalMemory", BindingFlags.Static | BindingFlags.NonPublic);
if (totalMemoryField?.FieldType != typeof(long))
return;
var currentValue = (long) totalMemoryField.GetValue(null);
if (currentValue >= 0)
totalMemoryField.SetValue(null, currentValue + long.MinValue);
}
}
Here, now your app would have to allocate about 8 exabytes before calling GC.Collect. Needless to say, if this happens you'll have bigger problems to solve. :)
If you're worried about the possibility of an underflow, just use long.MinValue / 2 as the offset. This still leaves you with 4 exabytes.
Note that AddToTotal actually performs bounds checking of _totalMemory, but it does this with a Debug.Assert here:
Debug.Assert(newValue >= 0);
As you'll be using a release version of the .NET Framework, these asserts will be disabled (with a ConditionalAttribute), so there's no need to worry about that.
You've asked what problems could arise with this approach. Let's take a look.
The most obvious one: MS changes the WPF code you're trying to hack.
Well, in that case, it pretty much depends on the nature of the change.
They change the type name/field name/field type: in that case, the hack will not be performed, and you'll be back to stock behavior. The reflection code is pretty defensive, it won't throw an exception, it just won't do anything.
They change the Debug.Assert call to a runtime check which is enabled in the release version. In that case, your app is doomed. Any attempt to load an image from disk will throw. Oops.
This risk is mitigated by the fact that their own code is pretty much a hack. They don't intend it to throw, it should go unnoticed. They want it to sit quiet and fail silently. Letting images load is a much more important feature which should not be impaired by some memory management code whose only purpose is to keep memory usage to a minimum.
In the case of your original patch in the OP, if they change the constant values, your hack may stop working.
They change the algorithm while keeping the class and field intact. Well... anything could happen, depending on the change.
Now, let's suppose the hack works and disables the GC.Collect call successfully.
The obvious risk in this case is increased memory usage. Since collections will be less frequent, more memory will be allocated at a given time. This should not be a big issue, since collections would still occur naturally when gen 0 fills up.
You'd also have more memory fragmentation, this is a direct consequence of fewer collections. This may or may not be a problem for you - so profile your app.
Fewer collections also means fewer objects are promoted to a higher generation. This is a good thing. Ideally, you should have short-lived objects in gen 0, and long-lived objects in gen 2. Frequent collections will actually cause short-lived objects to be promoted to gen 1 and then to gen 2, and you'll end up with many unreachable objects in gen 2. These will only be cleaned up with a gen 2 collection, will cause heap fragmentation, and will actually increase the GC time, since it'll have to spend more time compacting the heap. This is actually the primary reason why calling GC.Collect yourself is considered a bad practice - you're actively defeating the GC strategy, and this affects the whole application.
In any case, the correct approach would be to load the images, scale them down and display these thumbnails in your UI. All of this processing should be done in a background thread. In the case of JPEG images, load the embedded thumbnails - they may be good enough. And use an object pool so you don't need to instantiate new bitmaps each time, this completely bypasses the MemoryPressure class problem. And yes, that's exactly what the other answers suggest ;)

I think what you have is just fine. Well done, nice hack, Reflection is an awesome tool to fix wonky framework code. I've used it myself many times. Just limit its usage to the view that displays the ListView, it is too dangerous to have it active all the time.
Noodling a bit about the underlying problem, the horrid ProcessAdd() hack is of course very crude. It is a consequence of BitmapSource not implementing IDisposable. A questionable design decision, SO is filled with questions about it. However, about all of them are about the opposite problem, this timer not being quick enough to keep up. It just doesn't work very well.
There isn't anything you can do to change the way this code works. The values it works off are const declarations. Based on values that might have been appropriate 15 years ago, the probable age of this code. It starts at one megabyte and calls "10s of MB" a problem, life was simpler back then :) They forgot to write it so it scales properly, GC.AddMemoryPressure() would probably be fine today. Too little too late, they can't fix this anymore without dramatically altering program behavior.
You can certainly defeat the timer and avoid your hack. Surely the problem you have right now is that its Interval is about the same as the rate at which a user scrolls through the ListView when he doesn't read anything but just tries to find the record of interest. That's a UI design problem that's so common with list views with thousands of rows, an issue you probably don't want to address. What you need to do is cache the thumbnails, gathering the ones you know you'll likely need next. Best way to do that is to do so in a threadpool thread. Measure time while you do this, you can spend up to 850 msec. That code is however not going to be smaller than what you have now, not much prettier either.

.NET 4.6.2 will fix it by killing the MemoryPressure class alltogether. I just have checked the preview and my UI hangs are entirely gone.
.NET 4.6 implemements it
internal SafeMILHandleMemoryPressure(long gcPressure)
{
this._gcPressure = gcPressure;
this._refCount = 0;
GC.AddMemoryPressure(this._gcPressure);
}
whereas pre .NET 4.6.2 you had this crude MemoryPressure class which would force a GC.Collect every 850ms (if in between no WPF Bitmaps were allocated) or every 30s regardless how much WPF bitmaps you did allocate.
For reference the old handle was implemented like
internal SafeMILHandleMemoryPressure(long gcPressure)
{
    this._gcPressure = gcPressure;
    this._refCount = 0;
    if (this._gcPressure > 8192L)
    {
        MemoryPressure.Add(this._gcPressure); // Kills UI interactivity !!!!!
        return;
    }
    GC.AddMemoryPressure(this._gcPressure);
}
This makes a huge difference as you can see the GC suspension times drop dramatically in a simple test application I did write to repro the issue.
Here you see that the GC suspension times did drop from 2,71s down to 0,86s. This remains nearly constant even for multi GB managed heaps. This also increases overall application performance because now the background GC can do its work where it should: In the background. This prevents sudden halts of all managed threads which can continue to do work happily although the GC is cleaning things up. Not many people are aware what background GC gives them but this makes a real world difference of ca. 10-15% for common application workloads. If you have a multi GB managed application where a full GC can take seconds you will notice a dramatic improvement. In some tests were an application had a memory leak (5GB managed heap, full GC suspend time 7s) I did see 35s UI delays due to these forced GCs!

To the updated question regarding what the concrete problems are you may encounter by using the reflection approach, I think #HansPassant was thorough in his assessment of your specific approach. But more generally speaking, the risk you run with your current approach is the same risk you run with using any reflection against code you don't own; it can change underneath of you in the next update. So long as you're comfortable with that, the code you have should have negligible risk.
To hopefully answer the original question, there may be a way to workaround the GC.Collect(2) problem by minimizing the number of BitmapSource operations. Below is a sample app which illustrates my thought. Similar to what you described, it uses a virtualized ItemsControl to display thumbnails from disk.
While there may be others, the main point of interest is how the thumbnail images are constructed. The app creates a cache of WriteableBitmap objects upfront. As list items are requested by the UI, it reads the image from disk, using a BitmapFrame to retrieve the image information, primarily pixel data. A WriteableBitmap object is pulled from the cache, it's pixel data is overwritten, then it is assigned to the view-model. As existing list items fall out of view and are recycled, the WriteableBitmap object is returned to the cache for later reuse. The only BitmapSource-related activity incurred during that entire process is the actual loading of the image from disk.
It is worth noting that, that the image returned by the GetBitmapImageBytes() method must be exactly the same size as those in the WriteableBitmap cache for this pixel-overwrite approach to work; currently 256 x 256. For simplicity's sake, the bitmap images I used in my testing were already at this size, but it should be trivial to implement scaling, as needed.
MainWindow.xaml:
<Window x:Class="VirtualizedListView.MainWindow"
xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
Title="MainWindow" Height="500" Width="500">
<Grid>
<ItemsControl VirtualizingStackPanel.IsVirtualizing="True"
VirtualizingStackPanel.VirtualizationMode="Recycling"
VirtualizingStackPanel.CleanUpVirtualizedItem="VirtualizingStackPanel_CleanUpVirtualizedItem"
ScrollViewer.CanContentScroll="True"
ItemsSource="{Binding Path=Thumbnails}">
<ItemsControl.ItemTemplate>
<DataTemplate>
<Border BorderBrush="White" BorderThickness="1">
<Image Source="{Binding Image, Mode=OneTime}" Height="128" Width="128" />
</Border>
</DataTemplate>
</ItemsControl.ItemTemplate>
<ItemsControl.ItemsPanel>
<ItemsPanelTemplate>
<VirtualizingStackPanel />
</ItemsPanelTemplate>
</ItemsControl.ItemsPanel>
<ItemsControl.Template>
<ControlTemplate>
<Border BorderThickness="{TemplateBinding Border.BorderThickness}"
Padding="{TemplateBinding Control.Padding}"
BorderBrush="{TemplateBinding Border.BorderBrush}"
Background="{TemplateBinding Panel.Background}"
SnapsToDevicePixels="True">
<ScrollViewer Padding="{TemplateBinding Control.Padding}" Focusable="False">
<ItemsPresenter SnapsToDevicePixels="{TemplateBinding UIElement.SnapsToDevicePixels}" />
</ScrollViewer>
</Border>
</ControlTemplate>
</ItemsControl.Template>
</ItemsControl>
</Grid>
</Window>
MainWindow.xaml.cs:
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Collections.ObjectModel;
using System.Threading;
using System.Windows;
using System.Windows.Controls;
using System.Windows.Documents;
using System.Windows.Media;
using System.Windows.Media.Imaging;
using System.Windows.Threading;
namespace VirtualizedListView
{
public partial class MainWindow : Window
{
private const string ThumbnailDirectory = #"D:\temp\thumbnails";
private ConcurrentQueue<WriteableBitmap> _writeableBitmapCache = new ConcurrentQueue<WriteableBitmap>();
public MainWindow()
{
InitializeComponent();
DataContext = this;
// Load thumbnail file names
List<string> fileList = new List<string>(System.IO.Directory.GetFiles(ThumbnailDirectory));
// Load view-model
Thumbnails = new ObservableCollection<Thumbnail>();
foreach (string file in fileList)
Thumbnails.Add(new Thumbnail(GetImageForThumbnail) { FilePath = file });
// Create cache of pre-built WriteableBitmap objects; note that this assumes that all thumbnails
// will be the exact same size. This will need to be tuned for your needs
for (int i = 0; i <= 99; ++i)
_writeableBitmapCache.Enqueue(new WriteableBitmap(256, 256, 96d, 96d, PixelFormats.Bgr32, null));
}
public ObservableCollection<Thumbnail> Thumbnails
{
get { return (ObservableCollection<Thumbnail>)GetValue(ThumbnailsProperty); }
set { SetValue(ThumbnailsProperty, value); }
}
public static readonly DependencyProperty ThumbnailsProperty =
DependencyProperty.Register("Thumbnails", typeof(ObservableCollection<Thumbnail>), typeof(MainWindow));
private BitmapSource GetImageForThumbnail(Thumbnail thumbnail)
{
// Get the thumbnail data via the proxy in the other app domain
ImageLoaderProxyPixelData pixelData = GetBitmapImageBytes(thumbnail.FilePath);
WriteableBitmap writeableBitmap;
// Get a pre-built WriteableBitmap out of the cache then overwrite its pixels with the current thumbnail information.
// This avoids the memory pressure being set in this app domain, keeping that in the app domain of the proxy.
while (!_writeableBitmapCache.TryDequeue(out writeableBitmap)) { Thread.Sleep(1); }
writeableBitmap.WritePixels(pixelData.Rect, pixelData.Pixels, pixelData.Stride, 0);
return writeableBitmap;
}
private ImageLoaderProxyPixelData GetBitmapImageBytes(string fileName)
{
// All of the BitmapSource creation occurs in this method, keeping the calls to
// MemoryPressure.ProcessAdd() localized to this app domain
// Load the image from file
BitmapFrame bmpFrame = BitmapFrame.Create(new Uri(fileName));
int stride = bmpFrame.PixelWidth * bmpFrame.Format.BitsPerPixel;
byte[] pixels = new byte[bmpFrame.PixelHeight * stride];
// Construct and return the image information
bmpFrame.CopyPixels(pixels, stride, 0);
return new ImageLoaderProxyPixelData()
{
Pixels = pixels,
Stride = stride,
Rect = new Int32Rect(0, 0, bmpFrame.PixelWidth, bmpFrame.PixelHeight)
};
}
public void VirtualizingStackPanel_CleanUpVirtualizedItem(object sender, CleanUpVirtualizedItemEventArgs e)
{
// Get a reference to the WriteableBitmap before nullifying the property to release the reference
Thumbnail thumbnail = (Thumbnail)e.Value;
WriteableBitmap thumbnailImage = (WriteableBitmap)thumbnail.Image;
thumbnail.Image = null;
// Asynchronously add the WriteableBitmap back to the cache
Dispatcher.BeginInvoke((Action)(() =>
{
_writeableBitmapCache.Enqueue(thumbnailImage);
}), System.Windows.Threading.DispatcherPriority.Loaded);
}
}
// View-Model
public class Thumbnail : DependencyObject
{
private Func<Thumbnail, BitmapSource> _imageGetter;
private BitmapSource _image;
public Thumbnail(Func<Thumbnail, BitmapSource> imageGetter)
{
_imageGetter = imageGetter;
}
public string FilePath
{
get { return (string)GetValue(FilePathProperty); }
set { SetValue(FilePathProperty, value); }
}
public static readonly DependencyProperty FilePathProperty =
DependencyProperty.Register("FilePath", typeof(string), typeof(Thumbnail));
public BitmapSource Image
{
get
{
if (_image== null)
_image = _imageGetter(this);
return _image;
}
set { _image = value; }
}
}
public class ImageLoaderProxyPixelData
{
public byte[] Pixels { get; set; }
public Int32Rect Rect { get; set; }
public int Stride { get; set; }
}
}
As a benchmark, (for myself if no one else, I suppose) I've tested this approach on a 10-year old laptop with a Centrino processor and had almost no issue with fluidity in the UI.

I wish I could take credit for this, but I believe a better answer is already there: How can I prevent garbage collection from being called when calling ShowDialog on a xaml window?
Even from the code of the ProcessAdd method one can see that nothing gets executed if _totalMemory is small enough. So I think this code is much easier to use and with less side effects:
typeof(BitmapImage).Assembly
.GetType("MS.Internal.MemoryPressure")
.GetField("_totalMemory", BindingFlags.NonPublic | BindingFlags.Static)
.SetValue(null, Int64.MinValue / 2);
We need to understand, though, what the method is supposed to do, and the comment from the .NET source is pretty clear:
/// Avalon currently only tracks unmanaged memory pressure related to Images.
/// The implementation of this class exploits this by using a timer-based
/// tracking scheme. It assumes that the unmanaged memory it is tracking
/// is allocated in batches, held onto for a long time, and released all at once
/// We have profiled a variety of scenarios and found images do work this way
So my conclusion is that by disabling their code, you risk filling up your memory because of the way images are managed. However, since you know that the application you use is large and that it might need GC.Collect to be called, a very simple and safe fix would be for you to call it yourself, when you feel you can.
The code there tries to execute it every time the total memory used goes over a threshold, with a timer so it doesn't happen too often. That would be 30 seconds for them. So why don't you call GC.Collect(2) when you are closing forms or doing other things that would release the use of many images? Or when the computer is idle or the app is not in focus, etc?
I took the time to check where the _totalMemory value comes from and it seems that every time they create a WritableBitmap, they add the memory for it to _totalMemory, which is calculated here: http://referencesource.microsoft.com/PresentationCore/R/dca5f18570fed771.html as pixelWidth * pixelHeight * pixelFormat.InternalBitsPerPixel / 8 * 2; and further on in methods that work with Freezables. It is an internal mechanism to keep track of the memory allocated by the graphical representation of almost any WPF control.
It sounds to me as you could not only set _totalMemory to a very low value, but also hijack the mechanism. You could occasionally read the value, add to it the large value you subtracted initially, and get the actual value of memory used by drawn controls and decide whether you want to GC.Collect or not.

.NET, get memory used to hold struct instance

It's possible to determine memory usage (according to Jon Skeet's blog)
like this :
public class Program
{
private static void Main()
{
var before = GC.GetTotalMemory(true);
var point = new Point(1, 0);
var after = GC.GetTotalMemory(true);
Console.WriteLine("Memory used: {0} bytes", after - before);
}
#region Nested type: Point
private class Point
{
public int X;
public int Y;
public Point(int x, int y)
{
X = x;
Y = y;
}
}
#endregion
}
It prints Memory used: 16 bytes (I'm running x64 machine).
Consider we change Point declaration from class to struct. How then to determine memory used? Is is possible at all? I was unable to find anything about getting stack size in .NET
P.S
Yes, when changed to 'struct', Point instances will often be stored on Stack(not always), instead of Heap.Sorry for not posting it first time together with the question.
P.P.S
This situation has no practical usage at all(IMHO), It's just interesting for me whether it is possible to get Stack(short term storage) size. I was unable to find any info about it, so asked you, SO experts).

You won't see a change in GetTotalMemory if you create the struct the way you did, since it's going to be part of the thread's stack, and not allocated separately. The GetTotalMemory call will still work, and show you the total allocation size of your process, but the struct will not cause new memory to be allocated.
You can use sizeof(Type) or Marshal.SizeOf to return the size of a struct (in this case, 8 bytes).

There is special CPU register, ESP, that contains pointer to the top of the stack. Probably you can find a way to read this register from .Net (using some unsafe or external code). Then just compare value of this pointer at given moment with value at thread start - and difference between them will be more or less acurate amount of memory, used for thread's stack. Not sure if it really works, just an idea :)

In isolation, as you have done here, you might have a "reasonable" amount of success with this methodology. I am not confident the information is useful, but running this methodology, especially if you run it numerous times to ensure you did not have any other piece of code or GC action affecting the outcome. Utilizing this methodology in a real world application is less likely to give accurate results, however, as there are too many variables.
But realize, this only "reasonable" and not a surety.
Why do you need to know the size of objects? Just curious, as knowing the business reason may lead to other alternatives.

C# Static class member and System.Windows.Controls.Image performance issue

I am currently investigating a performance issue in an application and have highlighted the following;
I have a class -
public static class CommonIcons
{
...
public static readonly System.Windows.Media.ImageSource Attributes = typeof(CommonIcons).Assembly.GetImageFromResourcePath("Resources/attributes.png");
...
}
As a test harness I then have the following code using this class to show the issue -
for (int loop = 0; loop < 20000; loop++)
{
// store time before call
System.Windows.Controls.Image image = new System.Windows.Controls.Image
{
Source = CommonIcons.Attributes,
Width = 16,
Height = 16,
VerticalAlignment = VerticalAlignment.Center,
SnapsToDevicePixels = true
};
// store time after call
// log time between before and after
}
At the start of the loop the time difference is less than 0.001 seconds, but after 20000 goes this has increased to 0.015 seconds.
If I don't use the static member and directly refer to my icon, then I do not have the performance hit, i.e.
for (int loop = 0; loop < 20000; loop++)
{
// store time before call
System.Windows.Controls.Image image = new System.Windows.Controls.Image
{
Source = typeof(CommonIcons).Assembly.GetImageFromResourcePath("Resources/attributes.png"),
Width = 16,
Height = 16,
VerticalAlignment = VerticalAlignment.Center,
SnapsToDevicePixels = true
};
// store time after call
// log time between before and after
}
But in my real world program I don't want to be creating the imagesource on every call (increased memory until a garbage collection), hence why a static member is used. However I also cannot live with the performance hit.
Can someone explain why the original code is creating this performance hit? And also a better solution for what I am trying to do?
Thanks

It smells like something to do with garbage collection. I wonder whether there's some kind of coupling between the ImageSource and the Image which is causing problems in your first case. Have you looked to see what the memory usage of your test harness looks like in each case?
Out of interest, what happens if you set the Source to null at the end of each iteration? I know this is a bit silly, but then that's a natural corollary of it being a test harness :) It might be a further indication that it's a link between the source and the image...

Can you store only constant strings like "Resources/attributes.png" in your CommonIcons class ?

The difference is not between static member or not, but it is that in the first version you create 20000 images all having the same imagesource. I don't know exactly what is going on, but under the hood there may be delegates created automatically which handle communication between imagesource and image and everytime if an event in the imagesource occurs, 20000 clients needs to be notified, so this is a large performance hit.
In the second version, each of the 20000 created images have their own imagesource so you don't experience this overhead.
Note that you should always dispose graphical objects like Images with their Dispose()-method after you are done with them, this will speed up your application a bit and lowers your general memory usage.

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.