Reading from PackagePart stream does not release memory - C#

In our application, we are reading an XPS file using the System.IO.Packaging.Package class. When we read from a stream of a PackagePart, we can see from the Task Manager that the application's memory consumption rises. However, when the reading is done, the memory consumption doesn't fall back to what it was before reading from the stream.
To illustrate the problem, I wrote a simple code sample that you can use in a standalone WPF application.
public partial class Window1 : Window
{
    private Package _package;

    public Window1()
    {
        InitializeComponent();
        _package = Package.Open(@"c:\test\1000pages.xps", FileMode.Open, FileAccess.ReadWrite, FileShare.None);
    }

    private void ReadPackage()
    {
        foreach (PackagePart part in _package.GetParts())
        {
            using (Stream partStream = part.GetStream())
            {
                byte[] arr = new byte[partStream.Length];
                partStream.Read(arr, 0, (int)partStream.Length); // note: Read is not guaranteed to fill the buffer in one call
                partStream.Close(); // redundant: the using block already disposes the stream
            }
        }
    }

    private void Button_Click(object sender, RoutedEventArgs e)
    {
        ReadPackage();
    }
}
The ReadPackage() method reads every PackagePart's stream contents into a local array. In the sample I used a 1000-page XPS document as the package source, to make the change in the application's memory consumption easy to see. On my machine, the standalone app's memory consumption starts at 18 MB and rises to 100 MB after calling the method. Calling the method again can raise memory consumption further, but it falls back to 100 MB; it never falls back to 18 MB.
Has anyone experienced this while using PackagePart? Or am I using it wrong? I think the internal implementation of PackagePart is caching the data that was read.
Thank you!

You do not specify how you measure the "memory consumption" of your application, but perhaps you are using Task Manager? To get a better view of what is going on, I suggest that you examine some performance counters for your application. Both .NET heap and general process-memory performance counters are available.
If you really want to understand the details of how your application uses memory you can use the Microsoft CLR profiler.
What you see may be a result of the .NET heap expanding to accommodate a very large file. Big objects are placed on the Large Object Heap (LOH), and even when .NET memory is garbage collected, freed memory is not necessarily returned to the operating system. Furthermore, objects on the LOH are never moved around during garbage collection, and this may fragment the LOH, exhausting the available address space even though there is plenty of free memory.
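If LOH fragmentation is the culprit, later runtimes can compact the LOH on request. A minimal sketch, assuming .NET Framework 4.5.1 or newer (where GCSettings.LargeObjectHeapCompactionMode exists):

using System;
using System.Runtime;

// Arrays of roughly 85,000 bytes or more are allocated on the LOH.
// On .NET Framework 4.5.1+ you can request that the next blocking
// collection also compact the LOH (the setting resets afterwards):
GCSettings.LargeObjectHeapCompactionMode =
    GCLargeObjectHeapCompactionMode.CompactOnce;
GC.Collect();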
Has anyone experienced this while using PackagePart? Or am I using it wrong?
If you want to control the resources used by the package, you are not using it in the best way. Package is disposable, and in general you should use it like this:
using (var package = Package.Open(@"c:\test\1000pages.xps", FileMode.Open, FileAccess.ReadWrite, FileShare.None))
{
    // ... process the package
}
At the end of the using statement resources consumed by the package are either already released or can be garbage collected.
If you really want to keep the _package member of your form, you should at some point call Close() (or IDisposable.Dispose()) to release the resources. Calling GC.Collect() is not recommended and will not necessarily reclaim the resources used by the package: any managed memory (e.g. package buffers) that is still reachable from _package will not be garbage collected, no matter how often you force a collection.
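For example, a minimal sketch of tying the package's lifetime to the window, assuming the standard WPF Window.Closed event; the rest mirrors the question's code:

public Window1()
{
    InitializeComponent();
    _package = Package.Open(@"c:\test\1000pages.xps",
        FileMode.Open, FileAccess.ReadWrite, FileShare.None);

    // Release the package's file handle and internal buffers deterministically
    // when the window goes away, instead of waiting for the GC.
    Closed += (sender, args) => _package.Close();
}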

Related

Is this behavior due to unpredictable timings of garbage collection when it cleans up? [duplicate]

When I change the selection in a list, the dialog gets updated by reading from the file associated with the selected item. I fixed the problem with the using statement in the code below.
private void ViewModel_SelectedNoteChanged(object sender, EventArgs e)
{
    try
    {
        contentRichTextBox.Document.Blocks.Clear();
        if (VM.SelectedNote != null)
        {
            if (!string.IsNullOrEmpty(VM.SelectedNote.FileLocation))
            {
                using (FileStream fileStream = new FileStream(VM.SelectedNote.FileLocation, FileMode.Open))
                {
                    var contents = new TextRange(contentRichTextBox.Document.ContentStart, contentRichTextBox.Document.ContentEnd);
                    contents.Load(fileStream, DataFormats.Rtf);
                }
            }
        }
    }
    catch (Exception ex)
    {
        MessageBox.Show(ex.Message);
    }
}
If I don't use the using statement above, it's hit or miss: sometimes it reads the contents from the file and displays them; other times it throws an exception.
System.IO.IOException: 'The process cannot access the file
'D:\EClone\bin\Debug\netcoreapp3.1\3.rtf'
because it is being used by another process.'
My question is: why the unpredictable behavior without the using statement? Is it because of the known unpredictability of garbage collection (when it decides to clean up), since we can never be sure when garbage collection will release the file? Is this a good example of the non-deterministic nature of garbage collection? Or is it something else?
This is exactly due to incorrect usage of IDisposable and the GC.
Without using:

you open the file (and thus lock it) and store a reference in a variable;
the function exits, but the file is still open and locked;
if the GC has run before you try to open the file again, everything works fine, because the file wrapper's finalizer contains code to close (and unlock) the file;
if you try to access the same file before the GC has run, you get the exception.

With using, you close and unlock the file explicitly by calling Dispose (using calls it for you). Dispose closes and unlocks the file immediately, without waiting for the GC, so with using your code always works as expected.
Moreover, since you can't control when the GC runs, the code without using is unstable by design: even if it works fine today, it may stop working tomorrow because the system has more free memory (so the GC runs less often), or because of a .NET update (with GC changes/improvements), and so on.
So, in conclusion: if you have something that implements IDisposable, always call Dispose when you no longer need the object (with using or manually). You may omit calling Dispose only when the documentation or guidelines explicitly allow it (as with the Task class, for example).
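For reference, using is just compiler shorthand for try/finally; roughly, the compiler expands the FileStream block in the question to something like this:

// Equivalent of: using (FileStream fileStream = new FileStream(...)) { ... }
FileStream fileStream = new FileStream(VM.SelectedNote.FileLocation, FileMode.Open);
try
{
    var contents = new TextRange(contentRichTextBox.Document.ContentStart,
                                 contentRichTextBox.Document.ContentEnd);
    contents.Load(fileStream, DataFormats.Rtf);
}
finally
{
    // Runs even if Load throws, so the file handle is released immediately,
    // not at some unpredictable future garbage collection.
    if (fileStream != null)
        fileStream.Dispose();
}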

UWP AudioGraph : Garbage Collector causes clicks in the audio output

I have a C# UWP application that uses the AudioGraph API.
I use a custom effect on a MediaSourceAudioInputNode.
I followed the sample on this page:
https://learn.microsoft.com/en-us/windows/uwp/audio-video-camera/custom-audio-effects
It works but I can hear multiple clicks per second in the speakers when the custom effect is running.
Here is the code for my ProcessFrame method:
public unsafe void ProcessFrame(ProcessAudioFrameContext context)
{
    if (context == null)
    {
        throw new ArgumentNullException(nameof(context));
    }

    AudioFrame frame = context.InputFrame;

    using (AudioBuffer inputBuffer = frame.LockBuffer(AudioBufferAccessMode.Read))
    using (IMemoryBufferReference inputReference = inputBuffer.CreateReference())
    {
        ((IMemoryBufferByteAccess)inputReference).GetBuffer(out byte* inputDataInBytes, out uint inputCapacity);
        Span<float> samples = new Span<float>(inputDataInBytes, (int)inputCapacity / sizeof(float));

        for (int i = 0; i < samples.Length; i++)
        {
            float sample = samples[i];
            // sample processing...
            samples[i] = sample;
        }
    }
}
I used the Visual Studio profiler to identify the cause of the problem.
It is clear that there is a memory problem: garbage collection runs several times each second, and at each garbage collection I can hear a click.
The profiler shows that the garbage-collected objects are of type ProcessAudioFrameContext.
These objects are created by the AudioGraph API before entering the ProcessFrame method and are passed as a parameter to the method.
Is there something I can do to avoid these frequent garbage collections?
The problem is not specific to custom effects; it is a general problem with AudioGraph (the current SDK is 1809).
Garbage collections can pause the AudioGraph thread for too long (more than 10 ms, which is the default duration of the audio buffers). The result is that clicks can be heard in the audio output.
The use of custom effects puts a lot of pressure on the garbage collector.
I found a good workaround: the GC.TryStartNoGCRegion method.
After this method is called, the clicks completely disappear, but the app's memory use keeps growing until GC.EndNoGCRegion is called.
// at the beginning of playback...
// 240 MB is the amount of memory that can be allocated before a GC occurs
GC.TryStartNoGCRegion(240 * 1024 * 1024, true);

// ... at the end of playback
GC.EndNoGCRegion();
MSDN doc:
https://learn.microsoft.com/fr-fr/dotnet/api/system.gc.trystartnogcregion?view=netframework-4.7.2
And a good article:
https://mattwarren.org/2016/08/16/Preventing-dotNET-Garbage-Collections-with-the-TryStartNoGCRegion-API/
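One caveat worth noting: GC.EndNoGCRegion throws InvalidOperationException if the runtime has already left the no-GC region (for example, because the allocation budget was exceeded), so it is safer to guard the call. A sketch, assuming GCSettings.LatencyMode is available on your target framework, with Playback() as a hypothetical stand-in for your playback routine:

using System;
using System.Runtime;

if (GC.TryStartNoGCRegion(240L * 1024 * 1024, disallowFullBlockingGC: true))
{
    try
    {
        Playback(); // hypothetical playback routine
    }
    finally
    {
        // Only end the region if we are still in it; the runtime exits it
        // on its own when the allocation budget is exhausted.
        if (GCSettings.LatencyMode == GCLatencyMode.NoGCRegion)
            GC.EndNoGCRegion();
    }
}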
The garbage collector is probably reacting to you allocating temporary memory for the samples on every frame, which is then released after the frame. Try allocating the memory for holding the samples in your startup code and reusing it every frame.
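For example, a sketch of the reuse pattern (whether your effect really copies samples into a managed array is an assumption here; the Span-based code shown in the question does not itself allocate per frame):

// Inside your audio effect class (sketch):
private float[] _scratch = Array.Empty<float>();

private float[] GetScratch(int sampleCount)
{
    // Grow only when a larger frame arrives; steady-state playback then
    // produces no per-frame garbage for the GC to collect.
    if (_scratch.Length < sampleCount)
        _scratch = new float[sampleCount];
    return _scratch;
}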

Memory leak analysis and help requested

I've been using the methodology outlined by Shivprasad Koirala to check for memory leaks from code running inside a C# application (VoiceAttack). It basically involves using the Performance Monitor to track an application's private bytes as well as bytes in all heaps, and comparing these counters to assess whether there is a leak and what type it is (managed/unmanaged). Ideally I need to test outside of Visual Studio, which is why I'm using this method.
The following portion of code produces the memory profile described below (bear in mind the code is formatted a little differently from Visual Studio code because it is a function contained within the main C# application):
public void main()
{
    string FilePath = null;

    using (FileDialog myFileDialog = new OpenFileDialog())
    {
        myFileDialog.Title = "this is the title";
        myFileDialog.FileName = "testFile.txt";
        myFileDialog.Filter = "txt files (*.txt)|*.txt|All files (*.*)|*.*";
        myFileDialog.FilterIndex = 1;

        if (myFileDialog.ShowDialog() == DialogResult.OK)
        {
            FilePath = myFileDialog.FileName;
            var extension = Path.GetExtension(FilePath);
            var compareType = StringComparison.InvariantCultureIgnoreCase;

            if (extension.Equals(".txt", compareType) == false)
            {
                FilePath = null;
                VA.WriteToLog("Selected file is not a text file. Action canceled.");
            }
            else
            {
                VA.WriteToLog(FilePath);
            }
        }
        else
        {
            VA.WriteToLog("No file selected. Action canceled.");
        }
    }

    VA.WriteToLog("done");
}
You can see that after running this code the private bytes don't come back to the original count and the bytes in all heaps are roughly constant, which implies that there is a portion of unmanaged memory that was not released. Running this same inline function a few times consecutively doesn't cause further increases to the maximum observed private bytes or the unreleased memory. Once the main C# application (VoiceAttack) closes all the related memory (including the memory for the above code) is released. The bad news is that under normal circumstances the main application may be kept running indefinitely by the user, causing the allocated memory to remain unreleased.
For good measure I threw this same code into VS (with a pair of Thread.Sleep(5000) added before and after the using block for better graphical analysis) and built an executable to track with the Performance Monitor method, and the result is the same. There is an initial unmanaged memory jump for the OpenFileDialog and the allocated unmanaged memory never comes back down to the original value.
Does the memory and leak tracking methodology outlined above make sense? If YES, is there anything that can be done to properly release the unmanaged memory?
Does the memory and leak tracking methodology outlined above make sense?
No. You shouldn't expect unmanaged committed memory (private bytes) to always be released. For instance, processes have an unmanaged heap that is kept around to satisfy subsequent allocations. And since Windows can page out your committed memory, it isn't critical to minimize each process's committed memory.
If repeated calls don't increase memory use, you don't have a memory leak; you have delayed initialization. Some components aren't initialized until you use them, so their memory usage isn't taken into account when you establish your baseline.
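A sketch of how you might tell the two cases apart (ShowFileDialogOnce is a hypothetical wrapper around the question's dialog code; the counter is a real Process property):

using System;
using System.Diagnostics;

// Pay one-time initialization costs first, *then* take the baseline.
ShowFileDialogOnce(); // hypothetical wrapper around the OpenFileDialog code

long baseline = Process.GetCurrentProcess().PrivateMemorySize64;
for (int i = 0; i < 10; i++)
    ShowFileDialogOnce();
long after = Process.GetCurrentProcess().PrivateMemorySize64;

// A delta that keeps growing with the iteration count suggests a leak;
// a flat delta after the first call suggests delayed initialization.
Console.WriteLine($"Delta after 10 runs: {after - baseline:n0} bytes");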

C# Possible Memory Leak?

So, I have an app written in C# (VS2010) performing OCR using the Tesseract 3.02 DLL and Charles Weld's Tesseract .NET wrapper.
I think I have a memory leak, and it seems to be in the area of code where the Pix object is allocated. I am taking a PDF, converting it to a grayscale PNG, then loading that into a Pix object for OCR. When it works, it works really well. The image is large in dimensions (5100 or so pixels in each direction) but not so large in file size (only 500 KB or so).
My code:
Init engine at app startup:
private TesseractEngine engine = new TesseractEngine(@"./tessdata/", "eng+fra", EngineMode.Default);
A method converts the PDF to a PNG, then calls:
// Load the image file created earlier into a Pix object.
Pix pixImage = Pix.LoadFromFile(Path.Combine(textBoxSourceFolder.Text, sourceFile));
And then calls the following:
// Perform OCR on the image referenced in the Pix object.
private String PerformImageOCR(Pix pixImage)
{
    int safety = 0;

    do
    {
        try
        {
            // Deskew the image.
            pixImage = pixImage.Deskew();
            //pixImage.Save(@"c:\temp\img_deskewed.png", ImageFormat.Png); // Debugging - verify image deskewed properly to allow good OCR.

            string text = "";

            // Use the tesseract OCR engine to process the image
            using (var page = engine.Process(pixImage))
            {
                // and then extract the text.
                text = page.GetText();
            }

            return text;
        }
        catch (Exception e)
        {
            MessageBox.Show(string.Format("There was an error performing OCR on image, Retrying.\n\nError:\n{0}", e.Message), "Error", MessageBoxButtons.OK);
        }
    } while (++safety < 3);

    return string.Empty;
}
I have observed that memory usage jumps by about 31 MB when the Pix object is created, then jumps again while OCR is being performed, then finally settles about 33 MB higher than before it started. That is, if the app was consuming 50 MB after loading, loading the Pix object raises memory usage to about 81 MB; performing OCR spikes it to 114+ MB; and once the process is complete and the results are saved, usage settles at about 84 MB. Repeating this over many files in a folder eventually causes the app to barf at 1.5 GB or so consumed.
I think my code is okay, but there's something somewhere that's holding onto resources.
The Tesseract and Leptonica DLLs are written in C, and I have recompiled them with VS2010 along with the latest or recommended image library versions, as appropriate. What I'm unsure of is how to diagnose a memory leak in a C DLL from a C# app using Visual Studio. If I were using Linux, I'd use a tool such as Valgrind to help me spot the leak, but my leak-sniffing skills on the Windows side are sadly lacking. I'm looking for advice on how to proceed.
Reading your code, I do not see you disposing your Pix pixImage anywhere. That is what is taking up all the resources when you are processing many images.
Before you return your string result, you should call the Dispose method on your pixImage. That should reduce the amount of resources used by your program.
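A sketch of the fix, assuming (as in the wrapper's API) that Pix implements IDisposable and that Deskew() returns a new Pix rather than modifying the original:

private String PerformImageOCR(Pix pixImage)
{
    int safety = 0;
    do
    {
        try
        {
            // Deskew() allocates a *new* Pix; dispose it when done. The
            // caller remains responsible for disposing the original pixImage.
            using (Pix deskewed = pixImage.Deskew())
            using (var page = engine.Process(deskewed))
            {
                return page.GetText();
            }
        }
        catch (Exception e)
        {
            MessageBox.Show(string.Format("There was an error performing OCR on image, Retrying.\n\nError:\n{0}", e.Message), "Error", MessageBoxButtons.OK);
        }
    } while (++safety < 3);

    return string.Empty;
}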
I'm not familiar with Tesseract or the wrapper, but for memory-profiling issues, if you have Visual Studio 2012/2013, you can use the Performance Wizard. I know it's available in Ultimate, but I'm not sure about other versions.
http://blogs.msdn.com/b/dotnet/archive/2013/04/04/net-memory-allocation-profiling-with-visual-studio-2012.aspx
Either something in your code or something in the wrapper is not disposing an unmanaged object properly. My guess would be that it's in the wrapper. Running the Performance Wizard or another C# memory profiler (like JetBrains dotTrace) may help you track it down.

Process Memory Size - Different Counters

I'm trying to find out how much memory my own .Net server process is using (for monitoring and logging purposes).
I'm using:
Process.GetCurrentProcess().PrivateMemorySize64
However, the Process object has several different properties that let me read the memory space used:
Paged, NonPaged, PagedSystem, NonPagedSystem, Private, Virtual, WorkingSet
and then the "peaks": which i'm guessing just store the maximum values these last ones ever took.
Reading through the MSDN definition of each property hasn't proved too helpful for me. I have to admit my knowledge regarding how memory is managed (as far as paging and virtual goes) is very limited.
So my question is obviously "which one should I use?", and I know the answer is "it depends".
This process will basically hold a bunch of lists in memory of things that are going on, while other processes communicate with it and query it for stuff. I'm expecting the server it runs on to require lots of RAM, so I'm querying this data over time to be able to estimate RAM requirements relative to the sizes of the lists it keeps in memory.
So... Which one should I use and why?
If you want to know how much the GC uses try:
GC.GetTotalMemory(true)
If you want to know what your process uses from Windows (VM Size column in TaskManager) try:
Process.GetCurrentProcess().PrivateMemorySize64
If you want to know what your process has in RAM (as opposed to in the pagefile) (Mem Usage column in TaskManager) try:
Process.GetCurrentProcess().WorkingSet64
See here for more explanation on the different sorts of memory.
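Putting the three together, a small consolidated sketch (all standard System and System.Diagnostics APIs):

using System;
using System.Diagnostics;

Process p = Process.GetCurrentProcess();
p.Refresh(); // property values are cached, so refresh before reading

Console.WriteLine($"GC heap:       {GC.GetTotalMemory(false):n0} bytes");
Console.WriteLine($"Private bytes: {p.PrivateMemorySize64:n0} bytes"); // "VM Size" in Task Manager
Console.WriteLine($"Working set:   {p.WorkingSet64:n0} bytes");       // "Mem Usage" in Task Manager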
OK, I found through Google the same page that Lars mentioned, and I believe it's a great explanation for people that don't quite know how memory works (like me).
http://shsc.info/WindowsMemoryManagement
My short conclusion was:
Private Bytes = The Memory my process has requested to store data. Some of it may be paged to disk or not. This is the information I was looking for.
Virtual Bytes = The Private Bytes, plus the space shared with other processes for loaded DLLs, etc.
Working Set = The portion of ALL the memory of my process that has not been paged to disk. So the amount paged to disk should be (Virtual - Working Set).
Thanks all for your help!
If you want to use the "Memory (Private Working Set)" value shown in the Windows Vista Task Manager, which is the equivalent of Process Explorer's "WS Private Bytes", here is the code. It's probably best to put this infinite loop in a thread/background task for real-time stats.
using System.Threading;
using System.Diagnostics;

// namespace...class...method
Process thisProc = Process.GetCurrentProcess();
PerformanceCounter PC = new PerformanceCounter();
PC.CategoryName = "Process";
PC.CounterName = "Working Set - Private";
PC.InstanceName = thisProc.ProcessName;

while (true)
{
    String privMemory = (PC.NextValue() / 1000).ToString() + "KB (Private Bytes)";
    // Do something with string privMemory
    Thread.Sleep(1000);
}
To get the value that Task Manager gives, my hat's off to Mike Regan's solution above. However, one change: it is not perfCounter.NextValue()/1000 but perfCounter.NextValue()/1024 (i.e. a real kilobyte). This gives the exact value you see in Task Manager.
Following is a full solution for displaying the 'memory usage' (Task Manager's, as given) in a simple way in your WPF or WinForms app (in this case, simply in the title). Just call this method within the new window's constructor:
private void DisplayMemoryUsageInTitleAsync()
{
    origWindowTitle = this.Title; // set WinForms or WPF Window Title to field

    BackgroundWorker wrkr = new BackgroundWorker();
    wrkr.WorkerReportsProgress = true;

    wrkr.DoWork += (object sender, DoWorkEventArgs e) => {
        Process currProcess = Process.GetCurrentProcess();
        PerformanceCounter perfCntr = new PerformanceCounter();
        perfCntr.CategoryName = "Process";
        perfCntr.CounterName = "Working Set - Private";
        perfCntr.InstanceName = currProcess.ProcessName;

        while (true)
        {
            int value = (int)perfCntr.NextValue() / 1024;
            string privateMemoryStr = value.ToString("n0") + "KB [Private Bytes]";
            wrkr.ReportProgress(0, privateMemoryStr);
            Thread.Sleep(1000);
        }
    };

    wrkr.ProgressChanged += (object sender, ProgressChangedEventArgs e) => {
        string val = e.UserState as string;
        if (!string.IsNullOrEmpty(val))
            this.Title = string.Format(@"{0} ({1})", origWindowTitle, val);
    };

    wrkr.RunWorkerAsync();
}
Is this a fair description? I'd like to share this with my team so please let me know if it is incorrect (or incomplete):
There are several ways in C# to ask how much memory my process is using.
Allocated memory can be managed (by the CLR) or unmanaged.
Allocated memory can be virtual (stored on disk) or loaded (into RAM pages)
Allocated memory can be private (used only by the process) or shared (e.g. belonging to a DLL that other processes are referencing).
Given the above, here are some ways to measure memory usage in C#:
1) Process.VirtualMemorySize64: returns all the memory used by a process, whether managed or unmanaged, virtual or loaded, private or shared.
2) Process.PrivateMemorySize64: returns all the private memory used by a process, whether managed or unmanaged, virtual or loaded.
3) Process.WorkingSet64: returns all the private, loaded memory used by a process, whether managed or unmanaged.
4) GC.GetTotalMemory(): returns the amount of managed memory tracked by the garbage collector.
Working set isn't a good property to use. From what I gather, it includes everything the process can touch, even libraries shared by several processes, so you're seeing double-counted bytes in that counter. Private memory is a much better counter to look at.
I'd suggest also monitoring how often page faults happen. A page fault occurs when you try to access data that has been moved from physical memory to the swap file, and the system has to read the page from disk before you can access it.
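A sketch of sampling that counter for the current process, using the "Process" category and "Page Faults/sec" counter names as they appear in Performance Monitor:

using System;
using System.Diagnostics;
using System.Threading;

string instance = Process.GetCurrentProcess().ProcessName;
using (var faults = new PerformanceCounter("Process", "Page Faults/sec", instance))
{
    faults.NextValue();      // the first sample of a rate counter is always 0
    Thread.Sleep(1000);      // let the counter accumulate over one interval
    Console.WriteLine($"{faults.NextValue():n0} page faults/sec");
}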
