I am trying to figure out how MemoryCache should be used in order to avoid out-of-memory exceptions. I come from an ASP.NET background, where the cache manages its own memory usage, so I expected MemoryCache to do the same. This does not appear to be the case, as illustrated by the test program below:
class Program
{
static void Main(string[] args)
{
var cache = new MemoryCache("Cache");
for (int i = 0; i < 100000; i++)
{
AddToCache(cache, i);
}
Console.ReadLine();
}
private static void AddToCache(MemoryCache cache, int i)
{
var key = "File:" + i;
var contents = System.IO.File.ReadAllBytes("File.txt");
var policy = new CacheItemPolicy
{
SlidingExpiration = TimeSpan.FromHours(12)
};
policy.ChangeMonitors.Add(
new HostFileChangeMonitor(
new[] { Path.GetFullPath("File.txt") }
.ToList()));
cache.Add(key, contents, policy);
Console.Clear();
Console.Write(i);
}
}
The above throws an out-of-memory exception after reaching approximately 2 GB of memory usage (Any CPU), or after consuming all of my machine's physical memory (x64, 16 GB).
If I remove the cache.Add call, the program throws no exception. If I call cache.Trim(5) after every cache add, I see that it releases some memory and keeps approximately 150 objects in the cache at any given time (according to cache.GetCount()).
Is calling cache.Trim my program's responsibility? If so when should it be called (like how can my program know that the memory is getting full)? How do you calculate the percentage argument?
Note: I am planning to use the MemoryCache in a long running windows service so it is critical for it to have proper memory management.
MemoryCache has a background thread that periodically estimates how much memory the process is using and how many keys are in the cache. When it thinks you are getting close to the CacheMemoryLimit, it will trim the cache. Each time this background thread runs, it checks how close you are to the limits, and it increases the polling frequency under memory pressure.
If you add items very quickly, the background thread doesn't get a chance to run, and you can run out of memory before the cache can trim and the GC can run (in an x64 process this can result in a massive heap size and multi-minute GC pauses). The trim process/memory estimation is also known to have bugs under some conditions.
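For reference, the limits that the background thread checks can be set when the cache is constructed. This is a minimal sketch, assuming the configuration keys documented for the MemoryCache(string, NameValueCollection) constructor; the CacheFactory name and the values (100 MB, 50%, 10 seconds) are illustrative assumptions only. As described above, an explicit limit does not guarantee that trimming keeps up with a fast insert loop.

```csharp
using System;
using System.Collections.Specialized;
using System.Runtime.Caching;

public static class CacheFactory
{
    // Builds a MemoryCache with explicit limits and a shorter polling interval.
    // All three values are illustrative assumptions, not recommendations.
    public static MemoryCache Create()
    {
        var config = new NameValueCollection
        {
            { "cacheMemoryLimitMegabytes", "100" },     // absolute limit for this cache
            { "physicalMemoryLimitPercentage", "50" },  // share of physical memory
            { "pollingInterval", "00:00:10" }           // how often the limits are checked
        };
        return new MemoryCache("Cache", config);
    }
}
```

The configured values can be read back through the CacheMemoryLimit, PhysicalMemoryLimit and PollingInterval properties.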
If your program is prone to out of memory due to rapidly loading an excessive number of objects, something with a bounded size like an LRU cache is a much better strategy. LRU typically uses a policy based on item count to evict the least recently used items.
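To make the count-based eviction policy concrete, here is a minimal sketch of an LRU cache. It is not thread safe and is not the BitFaster implementation; the SimpleLruCache name and its API are hypothetical and exist only for illustration.

```csharp
using System;
using System.Collections.Generic;

// Minimal LRU cache bounded by item count. Illustrative only: NOT thread safe.
public sealed class SimpleLruCache<TKey, TValue>
{
    private readonly int _capacity;
    private readonly Dictionary<TKey, LinkedListNode<KeyValuePair<TKey, TValue>>> _map;
    // Most recently used entries sit at the front of the list.
    private readonly LinkedList<KeyValuePair<TKey, TValue>> _order =
        new LinkedList<KeyValuePair<TKey, TValue>>();

    public SimpleLruCache(int capacity)
    {
        if (capacity <= 0) throw new ArgumentOutOfRangeException(nameof(capacity));
        _capacity = capacity;
        _map = new Dictionary<TKey, LinkedListNode<KeyValuePair<TKey, TValue>>>(capacity);
    }

    public int Count => _map.Count;

    public TValue GetOrAdd(TKey key, Func<TKey, TValue> valueFactory)
    {
        if (_map.TryGetValue(key, out var node))
        {
            // Hit: move the entry to the front so it is evicted last.
            _order.Remove(node);
            _order.AddFirst(node);
            return node.Value.Value;
        }

        TValue value = valueFactory(key);
        if (_map.Count >= _capacity)
        {
            // Full: evict the least recently used entry (the tail of the list).
            var oldest = _order.Last;
            _order.RemoveLast();
            _map.Remove(oldest.Value.Key);
        }

        _map[key] = _order.AddFirst(new KeyValuePair<TKey, TValue>(key, value));
        return value;
    }
}
```

Because the item count never exceeds the capacity, memory use is bounded by capacity times the largest item size, regardless of how quickly items are added.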
I wrote a thread safe implementation of TLRU (a time aware least recently used policy), that you can easily use as a drop in replacement for ConcurrentDictionary.
It's available on Github here: https://github.com/bitfaster/BitFaster.Caching
Install-Package BitFaster.Caching
Using it would look something like this for your program, and it will not run out of memory (depending on how big your files are):
class Program
{
static void Main(string[] args)
{
int capacity = 80;
TimeSpan timeToLive = TimeSpan.FromMinutes(5);
var lru = new ConcurrentTLru<int, byte[]>(capacity, timeToLive);
for (int i = 0; i < 100000; i++)
{
var value = lru.GetOrAdd(i, (k) => System.IO.File.ReadAllBytes("File.txt"));
}
Console.ReadLine();
}
}
If you really want to avoid running out of memory, you should also consider reading the files into a RecyclableMemoryStream, and using the Scoped class in BitFaster to make the cached values thread safe and avoid races on dispose.
One of our programs suffered from a severe memory leak: its process memory rose by 1 GB per day at a customer site.
I could set up the scenario in our test center, and could get a memory leak of some 700 MB per day.
This application is a Windows service written in C# which communicates with devices over a CAN bus.
The memory leak does not depend on the rate of data the application writes to the CAN bus. But it clearly depends on the number of messages received.
The "unmanaged" side of reading the messages is:
[StructLayout(LayoutKind.Sequential, Pack = 1)]
public struct CAN_MSG
{
public uint time_stamp;
public uint id;
public byte len;
public byte rtr;
[MarshalAs(UnmanagedType.ByValArray, SizeConst = 8)]
public byte[] a_data;
}
[DllImport("IEICAN02.dll", EntryPoint = "#3")]
public static extern int CAN_CountMsgs(ushort card_idx, byte can_no, byte que_type);
//ICAN_API INT32 _stdcall CAN_CountMsgs(UINT16 card_idx, UINT8 can_no,UINT8 que_type);
[DllImport("IEICAN02.dll", EntryPoint = "#10")]
public static extern int CAN_ReadMsg(ushort card_idx, byte can_no, ushort count, [MarshalAs(UnmanagedType.LPArray), Out()] CAN_MSG[] msg);
//ICAN_API INT32 _stdcall CAN_ReadMsg(UINT16 card_idx, UINT8 can_no, UINT16 count, CAN_MSG* p_obj);
We use it essentially as follows:
private void ReadMessages()
{
while (keepRunning)
{
// get the number of messages in the queue
int messagesCounter = ICAN_API.CAN_CountMsgs(_CardIndex, _PortIndex, ICAN_API.CAN_RX_QUE);
if (messagesCounter > 0)
{
// create an array of appropriate size for those messages
CAN_MSG[] canMessages = new CAN_MSG[messagesCounter];
// read them
int actualReadMessages = ICAN_API.CAN_ReadMsg(_CardIndex, _PortIndex, (ushort)messagesCounter, canMessages);
// transform them into "our" objects
CanMessage[] messages = TransformMessages(canMessages);
Thread thread = new Thread(() => RaiseEventWithCanMessages(messages))
{
Priority = ThreadPriority.AboveNormal
};
thread.Start();
}
Thread.Sleep(20);
}
}
// transformation process:
new CanMessage
{
MessageData = (byte[])messages[i].a_data.Clone(),
MessageId = messages[i].id
};
The loop is executed once every ~30 milliseconds.
When I call RaiseEventWithCanMessages(messages) in the same thread, the memory leak disappears (well, not completely, some 10 MB per day - i.e. about 1% of the original leak - remain, but that other leak is likely unrelated).
I do not understand how this creation of threads can lead to a memory leak. Can you provide me with some information how the memory leak is caused?
Addendum 2018-08-16:
The application starts off with some 50 MB of memory, and crashes at some 2 GB. That means that gigabytes of memory are available for most of the time.
Also, CPU is at some 20% - 3 out of 4 cores are idle.
The number of threads used by the application remains rather constant around ~30 threads.
Overall, there are plenty of resources available for the Garbage Collection. Still, GC fails.
With some 30 threads per second, and a memory leak of 700 MB per day, on average ~300 bytes of memory leak per freshly created thread; with ~5 messages per new thread, some ~60bytes per message. The "unmanaged" struct does not make it into the new thread, its contents are copied into a newly instantiated class.
So: why does GC fail despite the enormous amount of resources available for it?
You're creating 2 arrays and a thread every ~30 milliseconds, without any coordination between them. The arrays could be a problem, but frankly I'm much more worried about the thread - creating threads is really, really expensive. You should not be creating them this frequently.
I'm also concerned about what happens if the read loop is out-pacing the thread - i.e. if RaiseEventWithCanMessages takes more time than the code that does the query/sleep. In that scenario, you'd have a constant growth of threads. And you'd probably also have all the various RaiseEventWithCanMessages fighting with each-other.
The fact that putting RaiseEventWithCanMessages inline "fixes" it suggests that the main problem here is either the sheer number of threads being created (bad), or the many overlapping and growing numbers of concurrent RaiseEventWithCanMessages.
The simplest fix would be: don't use the extra threads here.
If you actually want concurrent operations, I would have exactly two threads here - one that does the query, and one that does whatever RaiseEventWithCanMessages is, both in a loop. I would then coordinate between the threads such that the query thread waits for the previous RaiseEventWithCanMessages thing to be complete, such that it hands it over in a coordinated style - so there is always at most one outstanding RaiseEventWithCanMessages, and you stop running queries if it isn't keeping up.
Essentially:
CanMessage[] messages = TransformMessages(canMessages);
HandToConsumerBlockingUntilAvailable(messages); // TODO: implement
with the other thread basically doing:
var nextMessages = BlockUntilAvailableFromProducer(); // TODO: implement
A very basic implementation of this could be just:
void HandToConsumerBlockingUntilAvailable(CanMessage[] messages) {
    lock(_queue) {
        // re-check the condition after every wake-up
        while(_queue.Count != 0) Monitor.Wait(_queue); // block until space
        _queue.Enqueue(messages);
        if(_queue.Count == 1) Monitor.PulseAll(_queue); // wake consumer
    }
}
CanMessage[] BlockUntilAvailableFromProducer() {
    lock(_queue) {
        while(_queue.Count == 0) Monitor.Wait(_queue); // block until work
        var next = _queue.Dequeue();
        Monitor.Pulse(_queue); // wake producer
        return next;
    }
}
private readonly Queue<CanMessage[]> _queue = new Queue<CanMessage[]>();
This implementation enforces that there is no more than one outstanding unprocessed CanMessage[] in the queue.
This addresses the issues of creating lots of threads, and the issues of the query loop out-pacing the RaiseEventWithCanMessages code.
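The same hand-off can also be sketched with BlockingCollection<T> from the framework, which is a swapped-in alternative to the Monitor-based code above rather than the approach described there. With a bounded capacity of 1, Add blocks while an unprocessed batch is still waiting. The BoundedHandOffDemo name is hypothetical and int[] stands in for CanMessage[].

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public static class BoundedHandOffDemo
{
    // Pushes `batches` arrays through a queue of capacity 1 and returns
    // the total number of items the consumer processed.
    public static int Run(int batches)
    {
        using (var queue = new BlockingCollection<int[]>(boundedCapacity: 1))
        {
            int consumed = 0;

            // Single long-lived consumer instead of one thread per batch.
            Task consumer = Task.Run(() =>
            {
                foreach (int[] batch in queue.GetConsumingEnumerable())
                    consumed += batch.Length;
            });

            for (int i = 0; i < batches; i++)
                queue.Add(new[] { i, i + 1 }); // blocks while the queue is full

            queue.CompleteAdding(); // lets the consumer loop finish
            consumer.Wait();
            return consumed;
        }
    }
}
```

The producer is throttled automatically: if the consumer falls behind, Add waits instead of letting work pile up.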
I might also look into using the ArrayPool<T>.Shared for leasing oversized arrays (meaning: you need to be careful not to read more data than you've actually written, since you might have asked for an array of 500 but been given one of size 512), rather than constantly allocating arrays.
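A minimal sketch of the leasing pattern, assuming System.Buffers is available (it is built into .NET Core and available as a NuGet package for .NET Framework). The PooledBufferDemo name is hypothetical. Note how the code tracks its own count rather than relying on the array's Length.

```csharp
using System;
using System.Buffers;

public static class PooledBufferDemo
{
    // Fills the first `count` slots of a rented array and sums them.
    public static int SumFirst(int count)
    {
        // The returned array may be LARGER than requested.
        int[] buffer = ArrayPool<int>.Shared.Rent(count);
        try
        {
            for (int i = 0; i < count; i++)
                buffer[i] = i;

            int sum = 0;
            // Iterate up to `count`, never up to buffer.Length:
            // slots beyond `count` may hold stale data from a previous renter.
            for (int i = 0; i < count; i++)
                sum += buffer[i];
            return sum;
        }
        finally
        {
            // Always hand the array back, even if processing throws.
            ArrayPool<int>.Shared.Return(buffer);
        }
    }
}
```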
I'm developing an application (.NET 4.0, C#) that:
1. Scans file system.
2. Opens and reads some files.
The app will work in background and should have low impact on the disk usage. It shouldn't bother users if they are doing their usual tasks and the disk usage is high. And vice versa, the app can go faster if nobody is using the disk.
The main issue is I don't know real amount and size of I/O operations because of using API (mapi32.dll) to read files. If I ask API to do something I don't know how many bytes it reads to handle my response.
So the question is how to monitor and manage the disk usage? Including file system scanning and files reading...
Check performance counters that are used by standard Performance Monitor tool? Or any other ways?
Using the System.Diagnostics.PerformanceCounter class, attach to the PhysicalDisk counter related to the drive that you are indexing.
Below is some code to illustrate, although it's currently hard-coded to the "C:" drive. You will want to change "C:" to whichever drive your process is scanning. (This is rough sample code that only illustrates the existence of performance counters. Don't rely on it for accurate information; treat the readings as a guide and adapt the code for your own purpose.)
Observe the % Idle Time counter which indicates how often the drive is doing anything.
0% idle means the disk is busy, but does not necessarily mean that it is flat-out and cannot transfer more data.
Combine the % Idle Time with Current Disk Queue Length and this will tell you if the drive is getting so busy that it cannot service all the requests for data. As a general guideline, anything over 0 means the drive is probably flat-out busy and anything over 2 means the drive is completely saturated. These rules apply to both SSD and HDD fairly well.
Also, any value that you read is an instantaneous value at a point in time. You should do a running average over a few results, e.g. take a reading every 100ms and average 5 readings before using the information from the result to make a decision (i.e., waiting until the counters settle before making your next IO request).
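The running average suggested above could be sketched as follows. The RunningAverage class is a hypothetical helper, not part of the sample code in this answer; it keeps the last N samples and returns their mean each time a new reading is added.

```csharp
using System;
using System.Collections.Generic;

// Keeps the most recent `size` samples and reports their average.
public sealed class RunningAverage
{
    private readonly Queue<float> _window = new Queue<float>();
    private readonly int _size;
    private float _sum;

    public RunningAverage(int size)
    {
        if (size <= 0) throw new ArgumentOutOfRangeException(nameof(size));
        _size = size;
    }

    // Adds a sample and returns the average over the current window.
    public float Add(float sample)
    {
        _window.Enqueue(sample);
        _sum += sample;
        if (_window.Count > _size)
            _sum -= _window.Dequeue(); // drop the oldest sample
        return _sum / _window.Count;
    }
}
```

For example, feeding each value from the idle or queue-length counter into a RunningAverage of size 5 smooths out momentary spikes before a throttling decision is made.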
internal DiskUsageMonitor(string driveName)
{
// Get a list of the counters and look for "C:"
var perfCategory = new PerformanceCounterCategory("PhysicalDisk");
string[] instanceNames = perfCategory.GetInstanceNames();
foreach (string name in instanceNames)
{
if (name.IndexOf("C:") > 0)
{
if (string.IsNullOrEmpty(driveName))
driveName = name;
}
}
_readBytesCounter = new PerformanceCounter("PhysicalDisk",
"Disk Read Bytes/sec",
driveName);
_writeBytesCounter = new PerformanceCounter("PhysicalDisk",
"Disk Write Bytes/sec",
driveName);
_diskQueueCounter = new PerformanceCounter("PhysicalDisk",
"Current Disk Queue Length",
driveName);
_idleCounter = new PerformanceCounter("PhysicalDisk",
"% Idle Time",
driveName);
InitTimer();
}
internal event DiskUsageResultHander DiskUsageResult;
private void InitTimer()
{
StopTimer();
_perfTimer = new Timer(_updateResolutionMillisecs);
_perfTimer.Elapsed += PerfTimerElapsed;
_perfTimer.Start();
}
private void PerfTimerElapsed(object sender, ElapsedEventArgs e)
{
float diskReads = _readBytesCounter.NextValue();
float diskWrites = _writeBytesCounter.NextValue();
float diskQueue = _diskQueueCounter.NextValue();
float idlePercent = _idleCounter.NextValue();
if (idlePercent > 100)
{
idlePercent = 100;
}
if (DiskUsageResult != null)
{
var stats = new DiskUsageStats
{
DriveName = _readBytesCounter.InstanceName,
DiskQueueLength = (int)diskQueue,
ReadBytesPerSec = (int)diskReads,
WriteBytesPerSec = (int)diskWrites,
DiskUsagePercent = 100 - (int)idlePercent
};
DiskUsageResult(stats);
}
}
A long time ago Microsoft Research published a paper on this (sorry, I can't remember the URL).
From what I recall:
The program started off doing very few "work items".
They measured how long each "work item" took.
After running for a bit, they could work out how fast a "work item" was with no load on the system.
From then on, if the "work items" were fast (e.g. no other programs making requests), they made more requests; otherwise they backed off.
The basic idea is:
"If they are slowing me down, then I must be slowing them down, so do less work if I am being slowed down."
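A rough sketch of that back-off idea follows. It is not the algorithm from the paper, which I cannot reproduce from memory; the AdaptiveThrottle name, the "twice the baseline" threshold and the doubling/halving steps are all assumptions chosen for illustration.

```csharp
using System;

// Adjusts the pause between work items based on how long the last item took.
public sealed class AdaptiveThrottle
{
    private readonly TimeSpan _baseline; // duration of a work item on an idle system
    private readonly TimeSpan _minDelay;
    private readonly TimeSpan _maxDelay;

    public AdaptiveThrottle(TimeSpan baseline, TimeSpan minDelay, TimeSpan maxDelay)
    {
        _baseline = baseline;
        _minDelay = minDelay;
        _maxDelay = maxDelay;
        Delay = minDelay;
    }

    // How long to pause before starting the next work item.
    public TimeSpan Delay { get; private set; }

    // Call after each work item with its measured duration.
    public void Report(TimeSpan elapsed)
    {
        if (elapsed.Ticks > _baseline.Ticks * 2)
        {
            // We are being slowed down, so someone else needs the disk: back off.
            Delay = TimeSpan.FromTicks(Math.Min(Delay.Ticks * 2, _maxDelay.Ticks));
        }
        else
        {
            // The system looks idle: speed up again.
            Delay = TimeSpan.FromTicks(Math.Max(Delay.Ticks / 2, _minDelay.Ticks));
        }
    }
}
```

The worker loop would time each work item with a Stopwatch, call Report, and then sleep for Delay before starting the next item.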
Something to ponder: what if there are other processes which follow the same (or a similar) strategy? Which one would run during the "idle time"? Would the other processes get a chance to make use of the idle time at all?
Obviously this can't be done correctly unless there is some well-known OS mechanism for fairly dividing resources during idle time. In Windows, this is done by calling SetPriorityClass.
This document about I/O prioritization in Vista seems to imply that IDLE_PRIORITY_CLASS will not really lower the priority of I/O requests (though it will reduce the scheduling priority for the process). Vista added new PROCESS_MODE_BACKGROUND_BEGIN and PROCESS_MODE_BACKGROUND_END values for that.
In C#, you can normally set the process priority with the Process.PriorityClass property. The new values for Vista are not available though, so you'll have to call the Windows API function directly. You can do that like this:
[DllImport("kernel32.dll", CharSet=CharSet.Auto, SetLastError=true)]
public static extern bool SetPriorityClass(IntPtr handle, uint priorityClass);
const uint PROCESS_MODE_BACKGROUND_BEGIN = 0x00100000;
static void SetBackgroundMode()
{
if (!SetPriorityClass(new IntPtr(-1), PROCESS_MODE_BACKGROUND_BEGIN))
{
// handle error...
}
}
I did not test the code above. Don't forget that it can only work on Vista or better. You'll have to use Environment.OSVersion to check for earlier operating systems and implement a fall-back strategy.
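The version check and fall-back could look roughly like this. It is an untested sketch: the BackgroundMode class name is hypothetical, and falling back to ProcessPriorityClass.Idle is my assumption about a reasonable strategy for older systems (it lowers CPU scheduling priority only, not I/O priority).

```csharp
using System;
using System.Diagnostics;
using System.Runtime.InteropServices;

public static class BackgroundMode
{
    private const uint PROCESS_MODE_BACKGROUND_BEGIN = 0x00100000;

    [DllImport("kernel32.dll", SetLastError = true)]
    private static extern bool SetPriorityClass(IntPtr handle, uint priorityClass);

    [DllImport("kernel32.dll")]
    private static extern IntPtr GetCurrentProcess();

    // Vista is Windows NT version 6.0; background mode exists from there on.
    public static bool SupportsBackgroundMode(OperatingSystem os)
    {
        return os.Platform == PlatformID.Win32NT && os.Version.Major >= 6;
    }

    public static void Enter()
    {
        if (SupportsBackgroundMode(Environment.OSVersion) &&
            SetPriorityClass(GetCurrentProcess(), PROCESS_MODE_BACKGROUND_BEGIN))
        {
            return; // background mode is active (CPU and I/O priority lowered)
        }

        // Fall-back (assumption): lower only the scheduling priority.
        Process.GetCurrentProcess().PriorityClass = ProcessPriorityClass.Idle;
    }
}
```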
See this question and this also for related queries. For a simple solution, I would suggest just querying the current disk and CPU usage percentage every so often, and only continuing with the current task when they are under a defined threshold. Just make sure your work is easily broken into tasks, and that each task can be started and stopped easily and efficiently.
Check whether the screensaver is running; that is a good indication that the user is away from the keyboard.
I have a relatively large system (~25000 lines so far) for monitoring radio-related devices. It shows graphs and such using latest version of ZedGraph.
The program is coded using C# on VS2010 with Win7.
The problem is:
when I run the program from within VS, it runs slow
when I run the program from the built EXE, it runs slow
when I run the program through Performance Wizard / CPU Profiler, it runs Blazing Fast.
when I run the program from the built EXE, and then start VS and Attach a profiler to ANY OTHER PROCESS, my program speeds up!
I want the program to always run that fast!
Every project in the solution is set to RELEASE, Debug unmanaged code is DISABLED, Define DEBUG and TRACE constants is DISABLED, Optimize Code - I tried either, Warning Level - I tried either, Suppress JIT - I tried either,
in short I tried all the solutions already proposed on StackOverflow - none worked. Program is slow outside profiler, fast in profiler.
I don't think the problem is in my code, because it becomes fast if I attach the profiler to other, unrelated process as well!
Please help!
I really need it to be that fast everywhere, because it's a business critical application and performance issues are not tolerated...
UPDATES 1 - 8 follow
--------------------Update1:--------------------
The problem seems to Not be ZedGraph related, because it still manifests after I replaced ZedGraph with my own basic drawing.
--------------------Update2:--------------------
Running the program in a Virtual machine, the program still runs slow, and running profiler from the Host machine doesn't make it fast.
--------------------Update3:--------------------
Starting screen capture to video also speeds the program up!
--------------------Update4:--------------------
If I open the Intel graphics driver settings window (this thing: http://www.intel.com/support/graphics/sb/img/resolution_new.jpg)
and just constantly hover with the cursor over buttons, so they glow, etc, my program speeds up!.
It doesn't speed up if I run GPUz or Kombustor though, so no downclocking on the GPU - it stays steady 850Mhz.
--------------------Update5:--------------------
Tests on different machines:
-On my Core i5-2400S with Intel HD2000, UI runs slow and CPU usage is ~15%.
-On a colleague's Core 2 Duo with Intel G41 Express, UI runs fast, but CPU usage is ~90% (which isn't normal either)
-On Core i5-2400S with dedicated Radeon X1650, UI runs blazing fast, CPU usage is ~50%.
--------------------Update6:--------------------
A snip of code showing how I update a single graph (graphFFT is an encapsulation of ZedGraphControl for ease of use):
public void LoopDataRefresh() //executes in a new thread
{
while (true)
{
while (!d.Connected)
Thread.Sleep(1000);
if (IsDisposed)
return;
//... other graphs update here
if (signalNewFFT && PanelFFT.Visible)
{
signalNewFFT = false;
#region FFT
bool newRange = false;
if (graphFFT.MaxY != d.fftRangeYMax)
{
graphFFT.MaxY = d.fftRangeYMax;
newRange = true;
}
if (graphFFT.MinY != d.fftRangeYMin)
{
graphFFT.MinY = d.fftRangeYMin;
newRange = true;
}
List<PointF> points = new List<PointF>(2048);
int tempLength = 0;
short[] tempData = new short[2048];
int i = 0;
lock (d.fftDataLock)
{
tempLength = d.fftLength;
tempData = (short[])d.fftData.Clone();
}
foreach (short s in tempData)
points.Add(new PointF(i++, s));
graphFFT.SetLine("FFT", points);
if (newRange)
graphFFT.RefreshGraphComplete();
else if (PanelFFT.Visible)
graphFFT.RefreshGraph();
#endregion
}
//... other graphs update here
Thread.Sleep(5);
}
}
SetLine is:
public void SetLine(String lineTitle, List<PointF> values)
{
IPointListEdit ip = zgcGraph.GraphPane.CurveList[lineTitle].Points as IPointListEdit;
int tmp = Math.Min(ip.Count, values.Count);
int i = 0;
while(i < tmp)
{
if (values[i].X > peakX)
peakX = values[i].X;
if (values[i].Y > peakY)
peakY = values[i].Y;
ip[i].X = values[i].X;
ip[i].Y = values[i].Y;
i++;
}
while(ip.Count < values.Count)
{
if (values[i].X > peakX)
peakX = values[i].X;
if (values[i].Y > peakY)
peakY = values[i].Y;
ip.Add(values[i].X, values[i].Y);
i++;
}
while(values.Count > ip.Count)
{
ip.RemoveAt(ip.Count - 1);
}
}
RefreshGraph is:
public void RefreshGraph()
{
if (!explicidX && autoScrollFlag)
{
zgcGraph.GraphPane.XAxis.Scale.Max = Math.Max(peakX + grace.X, rangeX);
zgcGraph.GraphPane.XAxis.Scale.Min = zgcGraph.GraphPane.XAxis.Scale.Max - rangeX;
}
if (!explicidY)
{
zgcGraph.GraphPane.YAxis.Scale.Max = Math.Max(peakY + grace.Y, maxY);
zgcGraph.GraphPane.YAxis.Scale.Min = minY;
}
zgcGraph.Refresh();
}
--------------------Update7:--------------------
Just ran it through the ANTS profiler. It tells me that the ZedGraph refresh counts when the program is fast are precisely two times higher compared to when it's slow.
Here are the screenshots:
I find it VERY strange that, considering the small difference in the length of the sections, performance differs by a factor of two with mathematical precision.
Also, I updated the GPU driver, that didn't help.
--------------------Update8:--------------------
Unfortunately, for a few days now, I'm unable to reproduce the issue... I'm getting constant acceptable speed (which still appears a bit slower than what I had in the profiler two weeks ago) which isn't affected by any of the factors that used to affect it two weeks ago: profiler, video capturing or GPU driver window. I still have no explanation of what was causing it...
Luaan posted the solution in the comments above: it's the system-wide timer resolution. The default resolution is 15.6 ms; the profiler sets the resolution to 1 ms.
I had the exact same problem, very slow execution that would speed up when the profiler was opened. The problem went away on my PC but popped back up on other PCs seemingly at random. We also noticed the problem disappeared when running a Join Me window in Chrome.
My application transmits a file over a CAN bus. The app loads a CAN message with eight bytes of data, transmits it and waits for an acknowledgment. With the timer set to 15.6ms each round trip took exactly 15.6ms and the entire file transfer would take about 14 minutes. With the timer set to 1ms round trip time varied but would be as low as 4ms and the entire transfer time would drop to less than two minutes.
You can verify your system timer resolution as well as find out which program increased the resolution by opening a command prompt as administrator and entering:
powercfg -energy duration 5
The output file will have the following in it somewhere:
Platform Timer Resolution:Platform Timer Resolution
The default platform timer resolution is 15.6ms (15625000ns) and should be used whenever the system is idle. If the timer resolution is increased, processor power management technologies may not be effective. The timer resolution may be increased due to multimedia playback or graphical animations.
Current Timer Resolution (100ns units) 10000
Maximum Timer Period (100ns units) 156001
My current resolution is 1 ms (10,000 units of 100 ns), and this is followed by a list of the programs that requested the increased resolution.
This information as well as more detail can be found here: https://randomascii.wordpress.com/2013/07/08/windows-timer-resolution-megawatts-wasted/
Here is some code to increase the timer resolution (originally posted as the answer to this question: how to set timer resolution from C# to 1 ms?):
public static class WinApi
{
/// <summary>TimeBeginPeriod(). See the Windows API documentation for details.</summary>
[System.Diagnostics.CodeAnalysis.SuppressMessage("Microsoft.Interoperability", "CA1401:PInvokesShouldNotBeVisible"), System.Diagnostics.CodeAnalysis.SuppressMessage("Microsoft.Security", "CA2118:ReviewSuppressUnmanagedCodeSecurityUsage"), SuppressUnmanagedCodeSecurity]
[DllImport("winmm.dll", EntryPoint = "timeBeginPeriod", SetLastError = true)]
public static extern uint TimeBeginPeriod(uint uMilliseconds);
/// <summary>TimeEndPeriod(). See the Windows API documentation for details.</summary>
[System.Diagnostics.CodeAnalysis.SuppressMessage("Microsoft.Interoperability", "CA1401:PInvokesShouldNotBeVisible"), System.Diagnostics.CodeAnalysis.SuppressMessage("Microsoft.Security", "CA2118:ReviewSuppressUnmanagedCodeSecurityUsage"), SuppressUnmanagedCodeSecurity]
[DllImport("winmm.dll", EntryPoint = "timeEndPeriod", SetLastError = true)]
public static extern uint TimeEndPeriod(uint uMilliseconds);
}
Use it like this to increase the resolution:
WinApi.TimeBeginPeriod(1);
And like this to return to the default:
WinApi.TimeEndPeriod(1);
The parameter passed to TimeEndPeriod() must match the parameter that was passed to TimeBeginPeriod().
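One way to make sure the two calls stay paired is a try/finally block. The sketch below is Windows-only and untested; the TimerResolutionDemo class is hypothetical and repeats the P/Invoke declarations so that it stands alone. It also measures how long Thread.Sleep(1) really takes, which is a quick way to observe the effect of the timer resolution.

```csharp
using System;
using System.Diagnostics;
using System.Runtime.InteropServices;
using System.Threading;

public static class TimerResolutionDemo
{
    [DllImport("winmm.dll", EntryPoint = "timeBeginPeriod")]
    private static extern uint TimeBeginPeriod(uint uMilliseconds);

    [DllImport("winmm.dll", EntryPoint = "timeEndPeriod")]
    private static extern uint TimeEndPeriod(uint uMilliseconds);

    // Average wall-clock duration of Thread.Sleep(1), in milliseconds.
    public static double AverageSleepMs(int iterations)
    {
        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            Thread.Sleep(1);
        return stopwatch.Elapsed.TotalMilliseconds / iterations;
    }

    // Same measurement while a 1 ms timer resolution is requested (Windows only).
    public static double AverageSleepMsAtHighResolution(int iterations)
    {
        const uint periodMs = 1;
        TimeBeginPeriod(periodMs);
        try
        {
            return AverageSleepMs(iterations);
        }
        finally
        {
            // Must be called with the same value that was passed to TimeBeginPeriod.
            TimeEndPeriod(periodMs);
        }
    }
}
```

Comparing the results of the two methods on the affected machine should show whether the timer resolution is what limits your loop.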
There are situations when slowing down a thread can speed up other threads significantly, usually when one thread is polling or locking some common resource frequently.
For instance (this is a windows-forms example) when the main thread is checking overall progress in a tight loop instead of using a timer, for example:
private void SomeWork() {
// start the worker thread here
while(!PollDone()) {
progressBar1.Value = PollProgress();
Application.DoEvents(); // keep the GUI responsive
}
}
Slowing it down could improve performance:
private void SomeWork() {
// start the worker thread here
while(!PollDone()) {
progressBar1.Value = PollProgress();
System.Threading.Thread.Sleep(300); // give the polled thread some time to work instead of responding to your poll
Application.DoEvents(); // keep the GUI responsive
}
}
To do it correctly, one should avoid the DoEvents call altogether:
private Timer tim = new Timer(){ Interval=300 };
private void SomeWork() {
// start the worker thread here
tim.Tick += tim_Tick;
tim.Start();
}
private void tim_Tick(object sender, EventArgs e){
tim.Enabled = false; // prevent timer messages from piling up
if(PollDone()){
tim.Tick -= tim_Tick;
return;
}
progressBar1.Value = PollProgress();
tim.Enabled = true;
}
Calling Application.DoEvents() can potentially cause a lot of headaches when GUI controls have not been disabled and the user kicks off other events, or the same event a second time, while the first is still running. This causes re-entrant calls that grow the stack and, by nature, queue the first action behind the new one. But I'm going off topic.
Probably that example is too winforms specific, I'll try making a more general example. If you have a thread that is filling a buffer that is processed by other threads, be sure to leave some System.Threading.Thread.Sleep() slack in the loop to allow the other threads to do some processing before checking if the buffer needs to be filled again:
public class WorkItem {
// populate with something useful
}
public static object WorkItemsSyncRoot = new object();
public static Queue<WorkItem> workitems = new Queue<WorkItem>();
public void FillBuffer() {
while(!done) {
lock(WorkItemsSyncRoot) {
if(workitems.Count < 30) {
workitems.Enqueue(new WorkItem(/* load a file or something */ ));
}
}
}
}
The worker threads will have difficulty obtaining anything from the queue, since it is constantly being locked by the filling thread. Adding a Sleep() (outside the lock) could significantly speed up the other threads:
public void FillBuffer() {
while(!done) {
lock(WorkItemsSyncRoot) {
if(workitems.Count < 30) {
workitems.Enqueue(new WorkItem(/* load a file or something */ ));
}
}
System.Threading.Thread.Sleep(50);
}
}
Hooking up a profiler could in some cases have the same effect as the sleep function.
I'm not sure if I've given representative examples (it's quite hard to come up with something simple) but I guess the point is clear, putting sleep() in the correct place can help improve the flow of other threads.
---------- Edit after Update7 -------------
I'd remove that LoopDataRefresh() thread altogether. Rather put a timer in your window with an interval of at least 20 (which would be 50 frames a second if none were skipped):
private void tim_Tick(object sender, EventArgs e) {
tim.Enabled = false; // skip frames that come while we're still drawing
if(IsDisposed) {
tim.Tick -= tim_Tick;
return;
}
// Your code follows, I've tried to optimize it here and there, but no guarantee that it compiles or works, not tested at all
if(signalNewFFT && PanelFFT.Visible) {
signalNewFFT = false;
#region FFT
bool newRange = false;
if(graphFFT.MaxY != d.fftRangeYMax) {
graphFFT.MaxY = d.fftRangeYMax;
newRange = true;
}
if(graphFFT.MinY != d.fftRangeYMin) {
graphFFT.MinY = d.fftRangeYMin;
newRange = true;
}
int tempLength = 0;
short[] tempData;
int i = 0;
lock(d.fftDataLock) {
tempLength = d.fftLength;
tempData = (short[])d.fftData.Clone();
}
graphFFT.SetLine("FFT", tempData);
if(newRange) graphFFT.RefreshGraphComplete();
else if(PanelFFT.Visible) graphFFT.RefreshGraph();
#endregion
// End of your code
}
tim.Enabled = true; // Drawing is done (or was skipped), allow new frames to come in.
}
Here's the optimized SetLine() which no longer takes a list of points but the raw data:
public class GraphFFT {
public void SetLine(String lineTitle, short[] values) {
IPointListEdit ip = zgcGraph.GraphPane.CurveList[lineTitle].Points as IPointListEdit;
int tmp = Math.Min(ip.Count, values.Length);
int i = 0;
peakX = values.Length;
while(i < tmp) {
if(values[i] > peakY) peakY = values[i];
ip[i].X = i;
ip[i].Y = values[i];
i++;
}
while(ip.Count < values.Length) {
if(values[i] > peakY) peakY = values[i];
ip.Add(i, values[i]);
i++;
}
while(ip.Count > values.Length) {
ip.RemoveAt(ip.Count - 1);
}
}
}
I hope you get that working. As I commented before, I haven't had the chance to compile or check it, so there could be some bugs in there. There's more to be optimized, but those optimizations should be marginal compared to the boost from skipping frames and only collecting data when we have the time to actually draw the frame before the next one comes in.
If you closely study the graphs in the video at iZotope, you'll notice that they too are skipping frames, and sometimes are a bit jumpy. That's not bad at all, it's a trade-off you make between the processing power of the foreground thread and the background workers.
If you really want the drawing to be done in a separate thread, you'll have to draw the graph to a bitmap (calling Draw() and passing the bitmaps device context). Then pass the bitmap on to the main thread and have it update. That way you do lose the convenience of the designer and property grid in your IDE, but you can make use of otherwise vacant processor cores.
---------- edit answer to remarks --------
Yes there is a way to tell what calls what. Look at your first screen-shot, you have selected the "call tree" graph. Each next line jumps in a bit (it's a tree-view, not just a list!). In a call-graph, each tree-node represents a method that has been called by its parent tree-node (method).
In the first image, WndProc was called about 1800 times, it handled 872 messages of which 62 triggered ZedGraphControl.OnPaint() (which in turn accounts for 53% of the main threads total time).
The reason you don't see another root node is that the third dropdown box has "[604] Main Thread" selected, which I didn't notice before.
As for the more fluent graphs, I have 2nd thoughts on that now after looking more closely to the screen-shots. The main thread has clearly received more (double) update messages, and the CPU still has some headroom.
It looks like the threads are out-of-sync and in-sync at different times, where the update messages arrive just too late (when WndProc was done and went to sleep for a while), and then suddenly in time for a while. I'm not very familiar with ANTS, but does it have a side-by-side thread timeline including sleep time? You should be able to see what's going on in such a view. Microsoft's threads view tool would come in handy for this:
When I have never heard of or seen anything similar, I recommend the common-sense approach of commenting out sections of code, or injecting returns at the tops of functions, until you find the logic that's producing the side effect. You know your code and likely have an educated guess about where to start chopping. Otherwise, chop out almost everything as a sanity test and start adding blocks back. I'm often amazed how fast one can track down those seemingly impossible bugs. Once you find the related code, you will have more clues to solve your issue.
There is an array of potential causes. Without claiming completeness, here is how you could approach your search for the actual cause:
Environment variables: the timer issue in another answer is only one example. There might be modifications to the Path and to other variables, new variables could be set by the profiler. Write the current environment variables to a file and compare both configurations. Try to find suspicious entries, unset them one by one (or in combinations) until you get the same behavior in both cases.
Processor frequency. This can easily happen on laptops. Potentially, the energy-saving system sets the frequency of the processor(s) to a lower value to save energy. Some apps may 'wake' the system up, increasing the frequency. Check this via performance monitor (perfmon).
If the app runs slower than it could, some resource must be used inefficiently. Use the profiler to investigate this! You can attach the profiler to the (slow) running process to see which resources are under- or over-utilized. Mostly, there are two major categories of causes for slow execution: memory-bound and compute-bound execution. Both can give more insight into what is triggering the slow-down.
If, however, your app actually changes its efficiency when a profiler is attached, you can still use your favorite monitoring app to see which performance indicators actually change. Again, perfmon is your friend.
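The environment-variable comparison suggested above could be sketched roughly as follows. This is only a minimal sketch: the class name EnvironmentSnapshot and its methods are made up for illustration, and the output path is whatever the caller chooses. The idea is to write one sorted snapshot when running normally and one when running under the profiler, then compare the two files with any text diff tool.

```csharp
using System;
using System.Collections;
using System.IO;
using System.Linq;

// Hypothetical helper (not part of any library): dumps the environment
// variables of the current process so two runs can be compared.
static class EnvironmentSnapshot
{
    // Returns one "NAME=value" line per variable, sorted by name
    // so that two snapshots line up in a text diff.
    public static string[] Capture()
    {
        return Environment.GetEnvironmentVariables()
            .Cast<DictionaryEntry>()
            .Select(e => e.Key + "=" + e.Value)
            .OrderBy(line => line, StringComparer.OrdinalIgnoreCase)
            .ToArray();
    }

    // Writes the snapshot to a file chosen by the caller.
    public static void WriteTo(string path)
    {
        File.WriteAllLines(path, Capture());
    }
}
```

Call EnvironmentSnapshot.WriteTo(...) once at startup in each configuration, using a different file name for each run.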
If you have a method which throws a lot of exceptions, it can run slowly in debug mode and fast in CPU Profiling mode.
As detailed here, debug performance can be improved by using the DebuggerNonUserCode attribute. For example:
[DebuggerNonUserCode]
public static bool IsArchive(string filename)
{
bool result = false;
try
{
//this calls an external library, which throws an exception if the file is not an archive
result = ExternalLibrary.IsArchive(filename);
}
catch
{
}
return result;
}
See the following concurrent performance analysis representing the work done by a parallel foreach:
Inside the loop each thread reads data from the DB and processes it. There are no locks between threads, as each one processes different data.
It looks like there are periodic locks in all the threads of the foreach for unknown reasons (see the black vertical rectangles). If you look at the selected locked segment (the dark red one) you will see that the stack shows the thread locked in the StockModel.Quotation constructor. The code there just constructs two empty lists!
I've read somewhere that this could be caused by the GC so I've changed the garbage collection to run in server mode with:
<runtime>
<gcServer enabled="true"/>
</runtime>
I got a small improvement (about 10% - 15% faster) but I still have the vertical locks everywhere.
I've also added to all the DB queries the WITH(NOLOCK) as I'm only reading data without any difference.
Any hint on what's happening here?
The computer where the analysis has been done has 8 cores.
EDIT: After enabling the Microsoft Symbol servers, it turns out that all threads are blocked on calls like wait_for_gc_done or WaitUntilGCComplete. I thought that by enabling GCServer I would get one GC for each thread and so avoid the "vertical" lock, but it seems that's not the case. Am I wrong?
Second question: as the machine is not under memory pressure (5 of 8 gigs are used) is there a way to delay the GC execution or to pause it until the parallel foreach ends (or to configure it to fire less often)?
If your StockModel.Quotation class allows for it, you could create a pool to limit the number of new objects created. This is a technique they sometimes use in games to prevent the garbage collector stalling in the middle of renders.
Here's a basic pool implementation:
class StockQuotationPool
{
private List<StockQuotation> poolItems;
private volatile int itemsInPool;
public StockQuotationPool(int poolSize)
{
this.poolItems = new List<StockQuotation>(poolSize);
this.itemsInPool = poolSize;
}
public StockQuotation Create(string name, decimal value)
{
if (this.itemsInPool == 0)
{
// Block until new item ready - maybe use semaphore.
throw new NotImplementedException();
}
// Items are in the pool, but no items have been created.
if (this.poolItems.Count == 0)
{
this.itemsInPool--;
return new StockQuotation(name, value);
}
// else, return one in the pool
this.itemsInPool--;
var item = this.poolItems[0];
this.poolItems.Remove(item);
item.Name = name;
item.Value = value;
return item;
}
public void Release(StockQuotation quote)
{
if (!this.poolItems.Contains(quote))
{
this.poolItems.Add(quote);
this.itemsInPool++;
}
}
}
That's assuming that the StockQuotation looks something like this:
class StockQuotation
{
internal StockQuotation(string name, decimal value)
{
this.Name = name;
this.Value = value;
}
public string Name { get; set; }
public decimal Value { get; set; }
}
Then instead of calling the new StockQuotation() constructor, you ask the pool for a new instance. The pool returns an existing instance (you can precreate them if you want) and sets all the properties so that it looks like a new instance. You may need to play around until you find a pool size that is large enough to accommodate the threads at the same time.
Here's how you'd call it from the thread.
// Get the pool, maybe from a singleton.
var pool = new StockQuotationPool(100);
var quote = pool.Create("test", 1.00m);
try
{
// Work with quote
}
finally
{
pool.Release(quote);
}
Lastly, this class isn't thread safe at the moment. Let me know if you need any help with making it so.
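One possible way to make the pool usable from the parallel loop is sketched below. This is a hedged variant, not the original author's implementation: the class name ConcurrentStockQuotationPool is made up, it swaps the List for a ConcurrentBag, and it uses a SemaphoreSlim to block when all instances are checked out (the "maybe use semaphore" comment above). StockQuotation is repeated only so the sketch compiles on its own.

```csharp
using System.Collections.Concurrent;
using System.Threading;

// Same shape as the StockQuotation class shown above.
class StockQuotation
{
    internal StockQuotation(string name, decimal value)
    {
        this.Name = name;
        this.Value = value;
    }
    public string Name { get; set; }
    public decimal Value { get; set; }
}

// Hypothetical thread-safe variant of the pool.
class ConcurrentStockQuotationPool
{
    // Instances that have been released and can be reused.
    private readonly ConcurrentBag<StockQuotation> idle = new ConcurrentBag<StockQuotation>();
    // Counts how many more instances may be checked out.
    private readonly SemaphoreSlim available;

    public ConcurrentStockQuotationPool(int poolSize)
    {
        this.available = new SemaphoreSlim(poolSize, poolSize);
    }

    public StockQuotation Create(string name, decimal value)
    {
        // Blocks while poolSize instances are already checked out.
        this.available.Wait();
        StockQuotation item;
        if (this.idle.TryTake(out item))
        {
            // Reuse a released instance and make it look new.
            item.Name = name;
            item.Value = value;
            return item;
        }
        // Nothing to reuse yet, so allocate lazily.
        return new StockQuotation(name, value);
    }

    // Assumption: each instance is released exactly once per Create call.
    public void Release(StockQuotation quote)
    {
        this.idle.Add(quote);
        this.available.Release();
    }
}
```

Unlike the original Release, this version does not check for duplicates, so releasing the same instance twice would corrupt the pool; the try/finally calling pattern shown above avoids that.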
You could try using GCLatencyMode.LowLatency; See related question here: Prevent .NET Garbage collection for short period of time
I recently attempted this with no luck. Garbage collection was still being called when caching bitmap images of Icon sizes on a form I was displaying. What worked for me was using Ants performance profiler and Reflector to find the exact calls that were causing the GC.Collect and work around it.
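For reference, the latency-mode suggestion above might look something like the sketch below. Note the swap: it uses GCLatencyMode.SustainedLowLatency rather than LowLatency, because LowLatency is documented as unavailable with the server garbage collector that the question enabled. The class name LowLatencyRegion is made up for illustration. As the reply above found, a latency mode only makes blocking collections less likely; it does not guarantee that none will happen.

```csharp
using System;
using System.Runtime;

// Hypothetical helper: runs a piece of work with a less intrusive
// GC latency mode and always restores the previous mode afterwards.
static class LowLatencyRegion
{
    public static void Run(Action work)
    {
        GCLatencyMode previous = GCSettings.LatencyMode;
        try
        {
            // Ask the GC to avoid blocking collections where it can.
            GCSettings.LatencyMode = GCLatencyMode.SustainedLowLatency;
            work();
        }
        finally
        {
            // Restore the original mode even if the work throws.
            GCSettings.LatencyMode = previous;
        }
    }
}
```

The parallel foreach would go inside the delegate passed to LowLatencyRegion.Run.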
I'm trying to find out how much memory my own .Net server process is using (for monitoring and logging purposes).
I'm using:
Process.GetCurrentProcess().PrivateMemorySize64
However, the Process object has several different properties that let me read the memory space used:
Paged, NonPaged, PagedSystem, NonPagedSystem, Private, Virtual, WorkingSet
and then the "peaks", which I'm guessing just store the maximum values the others ever reached.
Reading through the MSDN definition of each property hasn't proved too helpful for me. I have to admit my knowledge regarding how memory is managed (as far as paging and virtual goes) is very limited.
So my question is obviously "which one should I use?", and I know the answer is "it depends".
This process will basically hold a bunch of lists in memory of things that are going on, while other processes communicate with it and query it for stuff. I'm expecting the server this will run on to require lots of RAM, so I'm querying this data over time to be able to estimate RAM requirements relative to the sizes of the lists it keeps inside.
So... Which one should I use and why?
If you want to know how much the GC uses try:
GC.GetTotalMemory(true)
If you want to know what your process uses from Windows (VM Size column in TaskManager) try:
Process.GetCurrentProcess().PrivateMemorySize64
If you want to know what your process has in RAM (as opposed to in the pagefile) (Mem Usage column in TaskManager) try:
Process.GetCurrentProcess().WorkingSet64
See here for more explanation on the different sorts of memory.
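For logging purposes, the three readings above can be collected in one place. The sketch below is just one way to do it; the class name MemorySnapshot and the message format are made up. It passes false to GC.GetTotalMemory so that taking a reading does not itself force a garbage collection, which makes the managed figure an approximation.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper that formats the three readings for a log line.
static class MemorySnapshot
{
    public static string Describe()
    {
        using (Process process = Process.GetCurrentProcess())
        {
            // false: do not wait for a collection before measuring.
            long managed = GC.GetTotalMemory(false);
            return string.Format(
                "Managed heap: {0:n0} bytes, private: {1:n0} bytes, working set: {2:n0} bytes",
                managed,
                process.PrivateMemorySize64,
                process.WorkingSet64);
        }
    }
}
```

Because a fresh Process object is obtained on every call, the values are current each time; a Process instance that is kept around needs Refresh() before its properties are read again.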
OK, I found through Google the same page that Lars mentioned, and I believe it's a great explanation for people that don't quite know how memory works (like me).
http://shsc.info/WindowsMemoryManagement
My short conclusion was:
Private Bytes = The Memory my process has requested to store data. Some of it may be paged to disk or not. This is the information I was looking for.
Virtual Bytes = The Private Bytes, plus the space shared with other processes for loaded DLLs, etc.
Working Set = The portion of ALL the memory of my process that has not been paged to disk. So the amount paged to disk should be (Virtual - Working Set).
Thanks all for your help!
If you want to use the "Memory (Private Working Set)" as shown in Windows Vista task manager, which is the equivalent of Process Explorer "WS Private Bytes", here is the code. Probably best to throw this infinite loop in a thread/background task for real-time stats.
using System.Threading;
using System.Diagnostics;
//namespace...class...method
Process thisProc = Process.GetCurrentProcess();
PerformanceCounter PC = new PerformanceCounter();
PC.CategoryName = "Process";
PC.CounterName = "Working Set - Private";
PC.InstanceName = thisProc.ProcessName;
while (true)
{
String privMemory = (PC.NextValue()/1000).ToString()+"KB (Private Bytes)";
//Do something with string privMemory
Thread.Sleep(1000);
}
To get the value that Task Manager gives, my hat's off to Mike Regan's solution above. However, one change: it is not: perfCounter.NextValue()/1000; but perfCounter.NextValue()/1024; (i.e. a real kilobyte). This gives the exact value you see in Task Manager.
Following is a full solution for displaying the 'memory usage' (Task manager's, as given) in a simple way in your WPF or WinForms app (in this case, simply in the title). Just call this method within the new Window constructor:
private void DisplayMemoryUsageInTitleAsync()
{
origWindowTitle = this.Title; // set WinForms or WPF Window Title to field
BackgroundWorker wrkr = new BackgroundWorker();
wrkr.WorkerReportsProgress = true;
wrkr.DoWork += (object sender, DoWorkEventArgs e) => {
Process currProcess = Process.GetCurrentProcess();
PerformanceCounter perfCntr = new PerformanceCounter();
perfCntr.CategoryName = "Process";
perfCntr.CounterName = "Working Set - Private";
perfCntr.InstanceName = currProcess.ProcessName;
while (true)
{
int value = (int)perfCntr.NextValue() / 1024;
string privateMemoryStr = value.ToString("n0") + "KB [Private Bytes]";
wrkr.ReportProgress(0, privateMemoryStr);
Thread.Sleep(1000);
}
};
wrkr.ProgressChanged += (object sender, ProgressChangedEventArgs e) => {
string val = e.UserState as string;
if (!string.IsNullOrEmpty(val))
this.Title = string.Format(@"{0} ({1})", origWindowTitle, val);
};
wrkr.RunWorkerAsync();
}
Is this a fair description? I'd like to share this with my team so please let me know if it is incorrect (or incomplete):
There are several ways in C# to ask how much memory my process is using.
Allocated memory can be managed (by the CLR) or unmanaged.
Allocated memory can be virtual (stored on disk) or loaded (into RAM pages)
Allocated memory can be private (used only by the process) or shared (e.g. belonging to a DLL that other processes are referencing).
Given the above, here are some ways to measure memory usage in C#:
1) Process.VirtualMemorySize64(): returns all the memory used by a process - managed or unmanaged, virtual or loaded, private or shared.
2) Process.PrivateMemorySize64(): returns all the private memory used by a process - managed or unmanaged, virtual or loaded.
3) Process.WorkingSet64(): returns all the private, loaded memory used by a process - managed or unmanaged
4) GC.GetTotalMemory(): returns the amount of managed memory being watched by the garbage collector.
Working set isn't a good property to use. From what I gather, it includes everything the process can touch, even libraries shared by several processes, so you're seeing double-counted bytes in that counter. Private memory is a much better counter to look at.
I'd suggest also monitoring how often page faults happen. A page fault happens when you try to access data that has been moved from physical memory to the swap file, so the system has to read the page from disk before you can access it.