One of our programs suffered from a severe memory leak: its process memory rose by 1 GB per day at a customer site.
I was able to set up the scenario in our test center, where I reproduced a memory leak of some 700 MB per day.
This application is a Windows service written in C# which communicates with devices over a CAN bus.
The memory leak does not depend on the rate of data the application writes to the CAN bus, but it clearly depends on the number of messages received.
The "unmanaged" side of reading the messages is:
[StructLayout(LayoutKind.Sequential, Pack = 1)]
public struct CAN_MSG
{
    public uint time_stamp;
    public uint id;
    public byte len;
    public byte rtr;
    [MarshalAs(UnmanagedType.ByValArray, SizeConst = 8)]
    public byte[] a_data;
}
[DllImport("IEICAN02.dll", EntryPoint = "#3")]
public static extern int CAN_CountMsgs(ushort card_idx, byte can_no, byte que_type);
//ICAN_API INT32 _stdcall CAN_CountMsgs(UINT16 card_idx, UINT8 can_no,UINT8 que_type);
[DllImport("IEICAN02.dll", EntryPoint = "#10")]
public static extern int CAN_ReadMsg(ushort card_idx, byte can_no, ushort count, [MarshalAs(UnmanagedType.LPArray), Out()] CAN_MSG[] msg);
//ICAN_API INT32 _stdcall CAN_ReadMsg(UINT16 card_idx, UINT8 can_no, UINT16 count, CAN_MSG* p_obj);
We use it essentially as follows:
private void ReadMessages()
{
    while (keepRunning)
    {
        // get the number of messages in the queue
        int messagesCounter = ICAN_API.CAN_CountMsgs(_CardIndex, _PortIndex, ICAN_API.CAN_RX_QUE);
        if (messagesCounter > 0)
        {
            // create an array of appropriate size for those messages
            CAN_MSG[] canMessages = new CAN_MSG[messagesCounter];
            // read them
            int actualReadMessages = ICAN_API.CAN_ReadMsg(_CardIndex, _PortIndex, (ushort)messagesCounter, canMessages);
            // transform them into "our" objects
            CanMessage[] messages = TransformMessages(canMessages);
            Thread thread = new Thread(() => RaiseEventWithCanMessages(messages))
            {
                Priority = ThreadPriority.AboveNormal
            };
            thread.Start();
        }
        Thread.Sleep(20);
    }
}
// transformation process:
new CanMessage
{
    MessageData = (byte[])messages[i].a_data.Clone(),
    MessageId = messages[i].id
};
The loop executes roughly once every 30 milliseconds.
When I call RaiseEventWithCanMessages(messages) in the same thread, the memory leak disappears (well, not completely, some 10 MB per day - i.e. about 1% of the original leak - remain, but that other leak is likely unrelated).
I do not understand how this creation of threads can lead to a memory leak. Can you provide me with some information how the memory leak is caused?
Addendum 2018-08-16:
The application starts off with some 50 MB of memory and crashes at around 2 GB. That means that gigabytes of memory are available most of the time.
Also, CPU is at some 20% - 3 out of 4 cores are idle.
The number of threads used by the application remains rather constant around ~30 threads.
Overall, there are plenty of resources available for the Garbage Collection. Still, GC fails.
With some 30 new threads per second, and a memory leak of 700 MB per day, on average ~300 bytes of memory leak per freshly created thread; with ~5 messages per new thread, that is some ~60 bytes per message. The "unmanaged" struct does not make it into the new thread; its contents are copied into a newly instantiated class.
So: why does GC fail despite the enormous amount of resources available for it?
You're creating 2 arrays and a thread every ~30 milliseconds, without any coordination between them. The arrays could be a problem, but frankly I'm much more worried about the thread - creating threads is really, really expensive. You should not be creating them this frequently.
I'm also concerned about what happens if the read loop is out-pacing the thread - i.e. if RaiseEventWithCanMessages takes more time than the code that does the query/sleep. In that scenario, you'd have a constant growth of threads. And you'd probably also have all the various RaiseEventWithCanMessages fighting with each-other.
The fact that putting RaiseEventWithCanMessages inline "fixes" it suggests that the main problem here is either the sheer number of threads being created (bad), or the many overlapping and growing numbers of concurrent RaiseEventWithCanMessages.
The simplest fix would be: don't use the extra threads here.
If you actually want concurrent operations, I would have exactly two threads here - one that does the query, and one that does whatever RaiseEventWithCanMessages is, both in a loop. I would then coordinate between the threads such that the query thread waits for the previous RaiseEventWithCanMessages thing to be complete, such that it hands it over in a coordinated style - so there is always at most one outstanding RaiseEventWithCanMessages, and you stop running queries if it isn't keeping up.
Essentially:
CanMessage[] messages = TransformMessages(canMessages);
HandToConsumerBlockingUntilAvailable(messages); // TODO: implement
with the other thread basically doing:
var nextMessages = BlockUntilAvailableFromProducer(); // TODO: implement
A very basic implementation of this could be just:
void HandToConsumerBlockingUntilAvailable(CanMessage[] messages) {
    lock(_queue) {
        while(_queue.Count != 0) Monitor.Wait(_queue); // block until space
        _queue.Enqueue(messages);
        if(_queue.Count == 1) Monitor.PulseAll(_queue); // wake consumer
    }
}
CanMessage[] BlockUntilAvailableFromProducer() {
    lock(_queue) {
        while(_queue.Count == 0) Monitor.Wait(_queue); // block until work
        var next = _queue.Dequeue();
        Monitor.Pulse(_queue); // wake producer
        return next;
    }
}
private readonly Queue<CanMessage[]> _queue = new Queue<CanMessage[]>();
This implementation enforces that there is no more than 1 outstanding unprocessed Message[] in the queue.
This addresses the issues of creating lots of threads, and the issues of the query loop out-pacing the RaiseEventWithCanMessages code.
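If you'd rather not hand-roll the synchronization, a bounded BlockingCollection<T> gives you the same "at most one outstanding batch" hand-over; a minimal sketch, reusing the CanMessage type from the question (untested):
// Minimal sketch using a BlockingCollection<T> bounded to one batch;
// Add() blocks the producer while the consumer is still busy.
private readonly BlockingCollection<CanMessage[]> _handover =
    new BlockingCollection<CanMessage[]>(boundedCapacity: 1);

void HandToConsumerBlockingUntilAvailable(CanMessage[] messages)
{
    _handover.Add(messages); // blocks until the previous batch was taken
}

CanMessage[] BlockUntilAvailableFromProducer()
{
    return _handover.Take(); // blocks until a batch is available
}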
I might also look into using the ArrayPool<T>.Shared for leasing oversized arrays (meaning: you need to be careful not to read more data than you've actually written, since you might have asked for an array of 500 but been given one of size 512), rather than constantly allocating arrays.
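As a rough sketch of that leasing idea (assuming the CAN_MSG struct from the question; on .NET Framework this needs the System.Buffers package):
// Sketch only: rent an oversized buffer, and be careful to process no more
// entries than the driver actually wrote before returning the array.
CAN_MSG[] canMessages = ArrayPool<CAN_MSG>.Shared.Rent(messagesCounter);
try
{
    int actualReadMessages = ICAN_API.CAN_ReadMsg(
        _CardIndex, _PortIndex, (ushort)messagesCounter, canMessages);
    // only canMessages[0 .. actualReadMessages - 1] holds valid data
}
finally
{
    ArrayPool<CAN_MSG>.Shared.Return(canMessages);
}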
Related
I am writing a program in C# under Windows 10 to capture SysEx messages from MIDI devices. I'm using a callback function as follows:
private delegate void MidiInProc(
    int handle,
    uint msg,
    int instance,
    int param1,
    int param2);

[DllImport("winmm.dll")]
private static extern int midiInOpen(
    out int handle,
    int deviceID,
    MidiInProc proc,
    int instance,
    int flags);
private int hHandle;
public MidiInput(int deviceID)
{
    MidiInProc midiInProc = MidiInProcess;
    int hResult = midiInOpen(
        out hHandle,
        deviceID,
        midiInProc,
        0,
        CALLBACK_FUNCTION | MIDI_IO_STATUS);
    for (int i = 0; i < NUM_MIDIHDRS; ++i)
    {
        InitMidiHdr(i);
    }
}
private void MidiInProcess(int hMidiIn, uint uMsg, int dwInstance, int dwParam1, int dwParam2)
{
    switch (uMsg)
    {
        case MIM_OPEN:
            break;
        case MIM_CLOSE:
            break;
        case MIM_DATA:
            QueueData(dwParam1);
            break;
        case MIM_MOREDATA:
            QueueData(dwParam1);
            break;
        case MIM_LONGDATA:
            QueueLongData((IntPtr)dwParam1);
            break;
        case MIM_ERROR:
            throw new ApplicationException(string.Format("Invalid MIDI message: {0}", dwParam1));
        case MIM_LONGERROR:
            throw new ApplicationException("Invalid SysEx message.");
        default:
            throw new ApplicationException(string.Format("unexpected message: {0}", uMsg));
    }
}
The InitMidiHdr method creates NUM_MIDIHDRS headers and buffers on the heap and passes their addresses to the driver with calls to MidiInPrepareHeader and MidiInAddBuffer. When SysEx data is received, the callback switches to the MIM_LONGDATA case and queues the buffer address to another thread which dequeues it, processes it, and then passes it back to the driver for further use via another call to MidiInAddBuffer.
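Roughly, that hand-off looks like this (a simplified sketch, not my exact code; the P/Invoke declaration is paraphrased from the winmm documentation, and MIDIHDR is my managed mirror of the native header):
// Simplified sketch of the buffer-return worker. A BlockingCollection hands
// buffer addresses from the callback thread to the worker thread.
[DllImport("winmm.dll")]
private static extern int midiInAddBuffer(int hMidiIn, IntPtr lpMidiInHdr, int cbMidiInHdr);

private readonly BlockingCollection<IntPtr> _sysExHeaders = new BlockingCollection<IntPtr>();

private void QueueLongData(IntPtr pHeader) => _sysExHeaders.Add(pHeader);

private void SysExWorker()
{
    foreach (IntPtr pHeader in _sysExHeaders.GetConsumingEnumerable())
    {
        // process the SysEx data in the buffer, then give it back to the driver
        midiInAddBuffer(hHandle, pHeader, Marshal.SizeOf(typeof(MIDIHDR)));
    }
}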
Now, some programs do not use a separate thread, instead processing SysEx data on the callback's thread. From what I have read, this works most of the time, but not always, and is against the MSDN's advice: "Applications should not call any multimedia functions from inside the callback function[.]" This may not be a problem with contemporary drivers, but some users have, in the past, reported deadlocks when calling MidiInAddBuffer directly from the callback.
But...
When I use a worker thread to call MidiInAddBuffer, I can't think of a way to guarantee that it will keep up with calls to the callback. Sure, if the only thing the worker does is return the buffer, it will probably stay ahead of the callback, but relying on one thread to stay ahead of another without explicit synchronization is a bad practice. (Having the callback wait on a signal from the worker that it has returned the buffer doesn't work, as MidiInAddBuffer blocks when called from another thread until the callback returns, which leads to a deadlock if you condition its return on a return from MidiInAddBuffer in the worker thread.) Thus, it's possible in that scenario for the driver to run out of buffers while the MIDI device is still sending SysEx data. (Indeed, even in the scenario where the callback does all the processing itself, it too might fall behind, with the driver using up all of its buffers before the device stops sending SysEx data.)
I have found a couple of open-source projects that use a worker to call MidiInAddBuffer, but both appear to rely on the worker to stay ahead of calls to the callback. In fact, in one case, I added a 50ms sleep call to the worker, with the result being that it fell far behind the callback thread, which ultimately meant that only about half of the SysEx messages made it to the callback. Apparently, the driver doesn't buffer SysEx data internally and, when it has no buffers left to use, throws SysEx data away until it gets another buffer.
Now, memory being cheap and all that, one "solution" is to have a lot of buffers. My Behringer BCF2000, however, sends almost 500 distinct SysEx messages in a single dump, each in its own buffer. That's not an impossible number to supply, but it requires guesswork as to how many is really enough (and, given some of the exotic things people use SysEx messages to do, like pass around audio samples, such an approach could get unwieldy).
Alas, the MIM_LONGERROR case never gets called when my tests show that my code isn't keeping up, so that's no help.
So, here's my question: If my code is unable to keep up with the rate at which SysEx buffers are consumed by a MIDI device's driver and I end up missing some SysEx data, is there any way to at least detect that I have missed that data?
There is no reporting mechanism for missed SysEx buffers.
At the MIDI speed of 3125 bytes/s (31,250 baud with 10 bits per byte on the wire), the 50 ms delay corresponds to a buffer of about 156 bytes; if the actual messages are shorter, you are guaranteed to fall behind.
But real code will not have such a consistent delay, so this is not a realistic test. Your threads might get random scheduling delays, but this is no problem as long as you have enough buffers queued. (And if some other code with higher priority prevents your code from being executed at all, there's nothing you can do anyway.)
I'm developing an application (.NET 4.0, C#) that:
1. Scans file system.
2. Opens and reads some files.
The app will work in background and should have low impact on the disk usage. It shouldn't bother users if they are doing their usual tasks and the disk usage is high. And vice versa, the app can go faster if nobody is using the disk.
The main issue is that I don't know the real number and size of the I/O operations, because I use an API (mapi32.dll) to read files. If I ask the API to do something, I don't know how many bytes it reads to handle my request.
So the question is: how do I monitor and manage disk usage, including the file-system scanning and file reading?
Should I check the performance counters used by the standard Performance Monitor tool, or are there other ways?
Using the System.Diagnostics.PerformanceCounter class, attach to the PhysicalDisk counter related to the drive that you are indexing.
Below is some code to illustrate, although it's currently hard-coded to the "C:" drive. You will want to change "C:" to whichever drive your process is scanning. (This is rough sample code, only there to illustrate the existence of the performance counters; don't take it as providing accurate information, and adapt it for your own purposes.)
Observe the % Idle Time counter which indicates how often the drive is doing anything.
0% idle means the disk is busy, but does not necessarily mean that it is flat-out and cannot transfer more data.
Combine the % Idle Time with Current Disk Queue Length and this will tell you if the drive is getting so busy that it cannot service all the requests for data. As a general guideline, anything over 0 means the drive is probably flat-out busy and anything over 2 means the drive is completely saturated. These rules apply to both SSD and HDD fairly well.
Also, any value that you read is an instantaneous value at a point in time. You should do a running average over a few results, e.g. take a reading every 100ms and average 5 readings before using the information from the result to make a decision (i.e., waiting until the counters settle before making your next IO request).
internal DiskUsageMonitor(string driveName)
{
    // Get a list of the counters and look for "C:"
    var perfCategory = new PerformanceCounterCategory("PhysicalDisk");
    string[] instanceNames = perfCategory.GetInstanceNames();
    foreach (string name in instanceNames)
    {
        if (name.IndexOf("C:") > 0)
        {
            if (string.IsNullOrEmpty(driveName))
                driveName = name;
        }
    }
    _readBytesCounter = new PerformanceCounter("PhysicalDisk",
                                               "Disk Read Bytes/sec",
                                               driveName);
    _writeBytesCounter = new PerformanceCounter("PhysicalDisk",
                                                "Disk Write Bytes/sec",
                                                driveName);
    _diskQueueCounter = new PerformanceCounter("PhysicalDisk",
                                               "Current Disk Queue Length",
                                               driveName);
    _idleCounter = new PerformanceCounter("PhysicalDisk",
                                          "% Idle Time",
                                          driveName);
    InitTimer();
}
internal event DiskUsageResultHander DiskUsageResult;
private void InitTimer()
{
    StopTimer();
    _perfTimer = new Timer(_updateResolutionMillisecs);
    _perfTimer.Elapsed += PerfTimerElapsed;
    _perfTimer.Start();
}
private void PerfTimerElapsed(object sender, ElapsedEventArgs e)
{
    float diskReads = _readBytesCounter.NextValue();
    float diskWrites = _writeBytesCounter.NextValue();
    float diskQueue = _diskQueueCounter.NextValue();
    float idlePercent = _idleCounter.NextValue();
    if (idlePercent > 100)
    {
        idlePercent = 100;
    }
    if (DiskUsageResult != null)
    {
        var stats = new DiskUsageStats
        {
            DriveName = _readBytesCounter.InstanceName,
            DiskQueueLength = (int)diskQueue,
            ReadBytesPerSec = (int)diskReads,
            WriteBytesPerSec = (int)diskWrites,
            DiskUsagePercent = 100 - (int)idlePercent
        };
        DiskUsageResult(stats);
    }
}
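To implement the running average suggested above, a small helper like this (hypothetical, not part of the original sample) would do:
// Hypothetical smoothing helper: average the last five readings of a
// counter before acting on it.
private readonly Queue<float> _samples = new Queue<float>();

private float SmoothedValue(float newSample)
{
    _samples.Enqueue(newSample);
    if (_samples.Count > 5)
        _samples.Dequeue();
    return _samples.Average(); // requires System.Linq
}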
A long time ago Microsoft Research published a paper on this (sorry, I can't remember the URL).
From what I recall:
The program started off doing very few "work items".
They measured how long each "work item" took.
After running for a bit, they could work out how fast a "work item" was with no load on the system.
From then on, if the "work items" were fast (e.g. no other programs making requests), they made more requests; otherwise they backed off.
The basic idea is:
“if they are slowing me down, then I must be slowing them down, so do less work if I am being slowed down”
Something to ponder: what if there are other processes which follow the same (or a similar) strategy? Which one would run during the "idle time"? Would the other processes get a chance to make use of the idle time at all?
Obviously this can't be done correctly unless there is some well-known OS mechanism for fairly dividing resources during idle time. In Windows, this is done by calling SetPriorityClass.
This document about I/O prioritization in Vista seems to imply that IDLE_PRIORITY_CLASS will not really lower the priority of I/O requests (though it will reduce the scheduling priority for the process). Vista added new PROCESS_MODE_BACKGROUND_BEGIN and PROCESS_MODE_BACKGROUND_END values for that.
In C#, you can normally set the process priority with the Process.PriorityClass property. The new values for Vista are not available though, so you'll have to call the Windows API function directly. You can do that like this:
[DllImport("kernel32.dll", CharSet=CharSet.Auto, SetLastError=true)]
public static extern bool SetPriorityClass(IntPtr handle, uint priorityClass);
const uint PROCESS_MODE_BACKGROUND_BEGIN = 0x00100000;
static void SetBackgroundMode()
{
if (!SetPriorityClass(new IntPtr(-1), PROCESS_MODE_BACKGROUND_BEGIN))
{
// handle error...
}
}
I did not test the code above. Don't forget that it can only work on Vista or later. You'll have to use Environment.OSVersion to check for earlier operating systems and implement a fall-back strategy, e.g. as sketched below.
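A sketch of such a fall-back might look like this (Vista is NT 6.0, so a major-version check suffices):
// Sketch: background mode on Vista and later, lowest regular priority otherwise.
static void EnterLowImpactMode()
{
    if (Environment.OSVersion.Version.Major >= 6)
    {
        SetBackgroundMode(); // uses PROCESS_MODE_BACKGROUND_BEGIN, see above
    }
    else
    {
        System.Diagnostics.Process.GetCurrentProcess().PriorityClass =
            System.Diagnostics.ProcessPriorityClass.Idle;
    }
}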
See this question and this one for related queries. For a simple solution, I would suggest just querying the current disk and CPU usage percentage every so often, and only continuing with the current task when they are under a defined threshold. Just make sure your work is easily broken into tasks, and that each task can be easily and efficiently started and stopped.
Check if the screensaver is running? That's a good indication that the user is away from the keyboard.
I am trying to figure out how MemoryCache should be used in order to avoid out-of-memory exceptions. I come from an ASP.NET background, where the cache manages its own memory usage, so I expected MemoryCache would do the same. This does not appear to be the case, as illustrated by the test program below:
using System;
using System.IO;
using System.Linq;
using System.Runtime.Caching;

class Program
{
    static void Main(string[] args)
    {
        var cache = new MemoryCache("Cache");
        for (int i = 0; i < 100000; i++)
        {
            AddToCache(cache, i);
        }
        Console.ReadLine();
    }
    private static void AddToCache(MemoryCache cache, int i)
    {
        var key = "File:" + i;
        var contents = File.ReadAllBytes("File.txt");
        var policy = new CacheItemPolicy
        {
            SlidingExpiration = TimeSpan.FromHours(12)
        };
        policy.ChangeMonitors.Add(
            new HostFileChangeMonitor(
                new[] { Path.GetFullPath("File.txt") }.ToList()));
        cache.Add(key, contents, policy);
        Console.Clear();
        Console.Write(i);
    }
}
The above throws an out-of-memory exception after reaching approximately 2 GB of memory usage (Any CPU), or after consuming all my machine's physical memory (x64, 16 GB).
If I remove the cache.Add call, the program throws no exception. If I include a call to cache.Trim(5) after every cache add, I see that it releases some memory, and it keeps approximately 150 objects in the cache at any given time (according to cache.GetCount()).
Is calling cache.Trim my program's responsibility? If so when should it be called (like how can my program know that the memory is getting full)? How do you calculate the percentage argument?
Note: I am planning to use the MemoryCache in a long-running Windows service, so proper memory management is critical.
MemoryCache has a background thread that periodically estimates how much memory the process is using and how many keys are in the cache. When it thinks you are getting close to the CacheMemoryLimit, it will Trim the cache. Each time this background thread runs, it checks how close you are to the limits, and it increases the polling frequency under memory pressure.
If you add items very quickly, the background thread doesn't get a chance to run, and you can run out of memory before the cache can trim and the GC can run (in an x64 process this can result in a massive heap and multi-minute GC pauses). The trim process/memory estimation is also known to have bugs under some conditions.
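For reference, both the limit and the polling interval can be made explicit when the cache is constructed; the values here are purely illustrative:
// Illustrative configuration: cap the cache at 100 MB and poll every 10 s.
// Under a rapid insert storm the trim can still lag behind, as described above.
var config = new System.Collections.Specialized.NameValueCollection
{
    { "cacheMemoryLimitMegabytes", "100" },
    { "pollingInterval", "00:00:10" }
};
var cache = new MemoryCache("Cache", config);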
If your program is prone to running out of memory due to rapidly loading an excessive number of objects, something with a bounded size, like an LRU cache, is a much better strategy. LRU typically uses a policy based on item count to evict the least recently used items.
I wrote a thread-safe implementation of TLRU (a time-aware least-recently-used policy) that you can easily use as a drop-in replacement for ConcurrentDictionary.
It's available on Github here: https://github.com/bitfaster/BitFaster.Caching
Install-Package BitFaster.Caching
Using it for your program would look something like this, and it will not run out of memory (depending on how big your files are):
class Program
{
    static void Main(string[] args)
    {
        int capacity = 80;
        TimeSpan timeToLive = TimeSpan.FromMinutes(5);
        var lru = new ConcurrentTLru<int, byte[]>(capacity, timeToLive);
        for (int i = 0; i < 100000; i++)
        {
            // key on i, so distinct entries flow through the bounded cache
            var value = lru.GetOrAdd(i, k => System.IO.File.ReadAllBytes("File.txt"));
        }
        Console.ReadLine();
    }
}
If you really want to avoid running out of memory, you should also consider reading the files into a RecyclableMemoryStream, and using the Scoped class in BitFaster to make the cached values thread safe and avoid races on dispose.
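For the stream part, a rough illustration (assuming the Microsoft.IO.RecyclableMemoryStream package):
// Rough illustration: the manager pools the underlying buffers, so repeated
// file reads reuse memory instead of growing the large object heap.
private static readonly Microsoft.IO.RecyclableMemoryStreamManager _streams =
    new Microsoft.IO.RecyclableMemoryStreamManager();

static System.IO.MemoryStream ReadFilePooled(string path)
{
    var stream = _streams.GetStream();
    using (var file = System.IO.File.OpenRead(path))
    {
        file.CopyTo(stream);
    }
    stream.Position = 0;
    return stream; // disposing it returns the buffers to the pool
}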
I have a relatively large system (~25000 lines so far) for monitoring radio-related devices. It shows graphs and such using latest version of ZedGraph.
The program is coded using C# on VS2010 with Win7.
The problem is:
when I run the program from within VS, it runs slow
when I run the program from the built EXE, it runs slow
when I run the program through the Performance Wizard / CPU Profiler, it runs blazing fast
when I run the program from the built EXE, and then start VS and attach a profiler to ANY OTHER PROCESS, my program speeds up!
I want the program to always run that fast!
Every project in the solution is set to RELEASE; Debug unmanaged code is DISABLED; Define DEBUG and TRACE constants is DISABLED; Optimize Code - I tried both; Warning Level - I tried both; Suppress JIT optimization - I tried both;
in short, I tried all the solutions already proposed on StackOverflow - none worked. The program is slow outside the profiler, fast inside the profiler.
I don't think the problem is in my code, because it becomes fast if I attach the profiler to other, unrelated process as well!
Please help!
I really need it to be that fast everywhere, because it's a business critical application and performance issues are not tolerated...
UPDATES 1 - 8 follow
--------------------Update1:--------------------
The problem seems not to be ZedGraph-related, because it still manifests after I replaced ZedGraph with my own basic drawing.
--------------------Update2:--------------------
Running the program in a Virtual machine, the program still runs slow, and running profiler from the Host machine doesn't make it fast.
--------------------Update3:--------------------
Starting screen capture to video also speeds the program up!
--------------------Update4:--------------------
If I open the Intel graphics driver settings window (this thing: http://www.intel.com/support/graphics/sb/img/resolution_new.jpg) and just constantly hover with the cursor over buttons so they glow, etc., my program speeds up!
It doesn't speed up if I run GPU-Z or Kombustor though, so there's no downclocking on the GPU - it stays at a steady 850 MHz.
--------------------Update5:--------------------
Tests on different machines:
-On my Core i5-2400S with Intel HD2000, UI runs slow and CPU usage is ~15%.
-On a colleague's Core 2 Duo with Intel G41 Express, UI runs fast, but CPU usage is ~90% (which isn't normal either)
-On Core i5-2400S with dedicated Radeon X1650, UI runs blazing fast, CPU usage is ~50%.
--------------------Update6:--------------------
A snip of code showing how I update a single graph (graphFFT is an encapsulation of ZedGraphControl for ease of use):
public void LoopDataRefresh() //executes in a new thread
{
    while (true)
    {
        while (!d.Connected)
            Thread.Sleep(1000);
        if (IsDisposed)
            return;
        //... other graphs update here
        if (signalNewFFT && PanelFFT.Visible)
        {
            signalNewFFT = false;
            #region FFT
            bool newRange = false;
            if (graphFFT.MaxY != d.fftRangeYMax)
            {
                graphFFT.MaxY = d.fftRangeYMax;
                newRange = true;
            }
            if (graphFFT.MinY != d.fftRangeYMin)
            {
                graphFFT.MinY = d.fftRangeYMin;
                newRange = true;
            }
            List<PointF> points = new List<PointF>(2048);
            int tempLength = 0;
            short[] tempData = new short[2048];
            int i = 0;
            lock (d.fftDataLock)
            {
                tempLength = d.fftLength;
                tempData = (short[])d.fftData.Clone();
            }
            foreach (short s in tempData)
                points.Add(new PointF(i++, s));
            graphFFT.SetLine("FFT", points);
            if (newRange)
                graphFFT.RefreshGraphComplete();
            else if (PanelFFT.Visible)
                graphFFT.RefreshGraph();
            #endregion
        }
        //... other graphs update here
        Thread.Sleep(5);
    }
}
SetLine is:
public void SetLine(String lineTitle, List<PointF> values)
{
    IPointListEdit ip = zgcGraph.GraphPane.CurveList[lineTitle].Points as IPointListEdit;
    int tmp = Math.Min(ip.Count, values.Count);
    int i = 0;
    while (i < tmp)
    {
        if (values[i].X > peakX)
            peakX = values[i].X;
        if (values[i].Y > peakY)
            peakY = values[i].Y;
        ip[i].X = values[i].X;
        ip[i].Y = values[i].Y;
        i++;
    }
    while (ip.Count < values.Count)
    {
        if (values[i].X > peakX)
            peakX = values[i].X;
        if (values[i].Y > peakY)
            peakY = values[i].Y;
        ip.Add(values[i].X, values[i].Y);
        i++;
    }
    while (values.Count > ip.Count)
    {
        ip.RemoveAt(ip.Count - 1);
    }
}
RefreshGraph is:
public void RefreshGraph()
{
    if (!explicidX && autoScrollFlag)
    {
        zgcGraph.GraphPane.XAxis.Scale.Max = Math.Max(peakX + grace.X, rangeX);
        zgcGraph.GraphPane.XAxis.Scale.Min = zgcGraph.GraphPane.XAxis.Scale.Max - rangeX;
    }
    if (!explicidY)
    {
        zgcGraph.GraphPane.YAxis.Scale.Max = Math.Max(peakY + grace.Y, maxY);
        zgcGraph.GraphPane.YAxis.Scale.Min = minY;
    }
    zgcGraph.Refresh();
}
--------------------Update7:--------------------
Just ran it through the ANTS profiler. It tells me that the ZedGraph refresh counts when the program is fast are precisely two times higher than when it's slow.
Here are the screenshots:
I find it VERY strange that, considering the small difference in the length of the sections, performance differs by exactly a factor of two.
Also, I updated the GPU driver, that didn't help.
--------------------Update8:--------------------
Unfortunately, for a few days now, I've been unable to reproduce the issue... I'm getting constant, acceptable speed (which still appears a bit slower than what I had in the profiler two weeks ago), which isn't affected by any of the factors that used to affect it two weeks ago: the profiler, video capturing, or the GPU driver window. I still have no explanation of what was causing it...
Luaan posted the solution in the comments above: it's the system-wide timer resolution. The default resolution is 15.6 ms; the profiler sets the resolution to 1 ms.
I had the exact same problem, very slow execution that would speed up when the profiler was opened. The problem went away on my PC but popped back up on other PCs seemingly at random. We also noticed the problem disappeared when running a Join Me window in Chrome.
My application transmits a file over a CAN bus. The app loads a CAN message with eight bytes of data, transmits it and waits for an acknowledgment. With the timer set to 15.6ms each round trip took exactly 15.6ms and the entire file transfer would take about 14 minutes. With the timer set to 1ms round trip time varied but would be as low as 4ms and the entire transfer time would drop to less than two minutes.
You can verify your system timer resolution as well as find out which program increased the resolution by opening a command prompt as administrator and entering:
powercfg -energy -duration 5
The output file will have the following in it somewhere:
Platform Timer Resolution:
The default platform timer resolution is 15.6ms (15625000ns) and should be used whenever the system is idle. If the timer resolution is increased, processor power management technologies may not be effective. The timer resolution may be increased due to multimedia playback or graphical animations.
Current Timer Resolution (100ns units) 10000
Maximum Timer Period (100ns units) 156001
My current resolution is 1 ms (10,000 units of 100 ns), and the entry is followed by a list of the programs that requested the increased resolution.
This information as well as more detail can be found here: https://randomascii.wordpress.com/2013/07/08/windows-timer-resolution-megawatts-wasted/
Here is some code to increase the timer resolution (originally posted as the answer to this question: how to set timer resolution from C# to 1 ms?):
public static class WinApi
{
/// <summary>TimeBeginPeriod(). See the Windows API documentation for details.</summary>
[System.Diagnostics.CodeAnalysis.SuppressMessage("Microsoft.Interoperability", "CA1401:PInvokesShouldNotBeVisible"), System.Diagnostics.CodeAnalysis.SuppressMessage("Microsoft.Security", "CA2118:ReviewSuppressUnmanagedCodeSecurityUsage"), SuppressUnmanagedCodeSecurity]
[DllImport("winmm.dll", EntryPoint = "timeBeginPeriod", SetLastError = true)]
public static extern uint TimeBeginPeriod(uint uMilliseconds);
/// <summary>TimeEndPeriod(). See the Windows API documentation for details.</summary>
[System.Diagnostics.CodeAnalysis.SuppressMessage("Microsoft.Interoperability", "CA1401:PInvokesShouldNotBeVisible"), System.Diagnostics.CodeAnalysis.SuppressMessage("Microsoft.Security", "CA2118:ReviewSuppressUnmanagedCodeSecurityUsage"), SuppressUnmanagedCodeSecurity]
[DllImport("winmm.dll", EntryPoint = "timeEndPeriod", SetLastError = true)]
public static extern uint TimeEndPeriod(uint uMilliseconds);
}
Use it like this to increase the resolution: WinApi.TimeBeginPeriod(1);
And like this to return to the default: WinApi.TimeEndPeriod(1);
The parameter passed to TimeEndPeriod() must match the parameter that was passed to TimeBeginPeriod().
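One way to guarantee the matching call is to wrap the pair in a small disposable scope (my own sketch, not part of the original answer):
// Sketch: tie TimeEndPeriod() to Dispose() so the resolution request is
// always released, even if an exception is thrown in between.
public sealed class TimerResolutionScope : IDisposable
{
    private readonly uint _ms;
    public TimerResolutionScope(uint ms) { _ms = ms; WinApi.TimeBeginPeriod(ms); }
    public void Dispose() { WinApi.TimeEndPeriod(_ms); }
}

// usage: using (new TimerResolutionScope(1)) { /* time-critical work */ }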
There are situations when slowing down a thread can speed up other threads significantly, usually when one thread is polling or locking some common resource frequently.
For instance (this is a Windows Forms example), when the main thread is checking overall progress in a tight loop instead of using a timer:
private void SomeWork() {
    // start the worker thread here
    while(!PollDone()) {
        progressBar1.Value = PollProgress();
        Application.DoEvents(); // keep the GUI responsive
    }
}
Slowing it down could improve performance:
private void SomeWork() {
    // start the worker thread here
    while(!PollDone()) {
        progressBar1.Value = PollProgress();
        System.Threading.Thread.Sleep(300); // give the polled thread some time to work instead of responding to your poll
        Application.DoEvents(); // keep the GUI responsive
    }
}
Done correctly, one should avoid using the DoEvents call altogether:
private Timer tim = new Timer(){ Interval = 300 };
private void SomeWork() {
    // start the worker thread here
    tim.Tick += tim_Tick;
    tim.Start();
}
private void tim_Tick(object sender, EventArgs e){
    tim.Enabled = false; // prevent timer messages from piling up
    if(PollDone()){
        tim.Tick -= tim_Tick;
        return;
    }
    progressBar1.Value = PollProgress();
    tim.Enabled = true;
}
Calling Application.DoEvents() can potentially cause a lot of headaches when GUI elements have not been disabled and the user kicks off other events, or the same event a second time, causing stack climbs which by nature queue the first action behind the new one. But I'm going off topic.
Probably that example is too WinForms-specific; I'll try to make a more general example. If you have a thread that is filling a buffer that is processed by other threads, be sure to leave some System.Threading.Thread.Sleep() slack in the loop to allow the other threads to do some processing before checking whether the buffer needs to be filled again:
public class WorkItem {
    // populate with something useful
}
public static object WorkItemsSyncRoot = new object();
public static Queue<WorkItem> workitems = new Queue<WorkItem>();

public void FillBuffer() {
    while(!done) {
        lock(WorkItemsSyncRoot) {
            if(workitems.Count < 30) {
                workitems.Enqueue(new WorkItem(/* load a file or something */ ));
            }
        }
    }
}
The worker threads will have difficulty obtaining anything from the queue, since it is constantly being locked by the filling thread. Adding a Sleep() (outside the lock) could significantly speed up the other threads:
public void FillBuffer() {
    while(!done) {
        lock(WorkItemsSyncRoot) {
            if(workitems.Count < 30) {
                workitems.Enqueue(new WorkItem(/* load a file or something */ ));
            }
        }
        System.Threading.Thread.Sleep(50);
    }
}
Hooking up a profiler could in some cases have the same effect as the sleep function.
I'm not sure if I've given representative examples (it's quite hard to come up with something simple), but I guess the point is clear: putting Sleep() in the correct place can help improve the flow of other threads.
---------- Edit after Update7 -------------
I'd remove that LoopDataRefresh() thread altogether. Rather, put a timer in your window with an interval of at least 20 ms (which would be 50 frames per second if none were skipped):
private void tim_Tick(object sender, EventArgs e) {
    tim.Enabled = false; // skip frames that come while we're still drawing
    if(IsDisposed) {
        tim.Tick -= tim_Tick;
        return;
    }
    // Your code follows, I've tried to optimize it here and there, but no guarantee that it compiles or works, not tested at all
    if(signalNewFFT && PanelFFT.Visible) {
        signalNewFFT = false;
        #region FFT
        bool newRange = false;
        if(graphFFT.MaxY != d.fftRangeYMax) {
            graphFFT.MaxY = d.fftRangeYMax;
            newRange = true;
        }
        if(graphFFT.MinY != d.fftRangeYMin) {
            graphFFT.MinY = d.fftRangeYMin;
            newRange = true;
        }
        int tempLength = 0;
        short[] tempData;
        lock(d.fftDataLock) {
            tempLength = d.fftLength;
            tempData = (short[])d.fftData.Clone();
        }
        graphFFT.SetLine("FFT", tempData);
        if(newRange) graphFFT.RefreshGraphComplete();
        else if(PanelFFT.Visible) graphFFT.RefreshGraph();
        #endregion
        // End of your code
    }
    tim.Enabled = true; // re-enable outside the if, so frames keep coming even when there was nothing to draw
}
Here's the optimized SetLine() which no longer takes a list of points but the raw data:
public class GraphFFT {
    public void SetLine(String lineTitle, short[] values) {
        IPointListEdit ip = zgcGraph.GraphPane.CurveList[lineTitle].Points as IPointListEdit;
        int tmp = Math.Min(ip.Count, values.Length);
        int i = 0;
        peakX = values.Length;
        while(i < tmp) {
            if(values[i] > peakY) peakY = values[i];
            ip[i].X = i;
            ip[i].Y = values[i];
            i++;
        }
        while(ip.Count < values.Length) {
            if(values[i] > peakY) peakY = values[i];
            ip.Add(i, values[i]);
            i++;
        }
        while(values.Length < ip.Count) {
            ip.RemoveAt(ip.Count - 1);
        }
    }
}
I hope you get that working; as I commented before, I haven't had the chance to compile or check it, so there could be some bugs there. There's more to be optimized, but the remaining optimizations should be marginal compared to the boost of skipping frames and only collecting data when we have the time to actually draw the frame before the next one comes in.
If you closely study the graphs in the video at iZotope, you'll notice that they too are skipping frames, and sometimes are a bit jumpy. That's not bad at all, it's a trade-off you make between the processing power of the foreground thread and the background workers.
If you really want the drawing to be done in a separate thread, you'll have to draw the graph to a bitmap (calling Draw() and passing the bitmaps device context). Then pass the bitmap on to the main thread and have it update. That way you do lose the convenience of the designer and property grid in your IDE, but you can make use of otherwise vacant processor cores.
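A rough sketch of that hand-off (ZedGraph's GraphPane can render to any Graphics object; the names here are illustrative and untested):
// Illustrative only: render the pane to a bitmap on the worker thread,
// then marshal just the finished image to the UI thread.
Bitmap RenderGraph(GraphPane pane, int width, int height)
{
    var bmp = new Bitmap(width, height);
    using (Graphics g = Graphics.FromImage(bmp))
    {
        pane.ReSize(g, new RectangleF(0, 0, width, height));
        pane.Draw(g);
    }
    return bmp;
}

// worker: var bmp = RenderGraph(zgcGraph.GraphPane, w, h);
// UI:     pictureBox.BeginInvoke((Action)(() => pictureBox.Image = bmp));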
---------- edit answer to remarks --------
Yes, there is a way to tell what calls what. Look at your first screenshot: you have selected the "call tree" graph. Each next line jumps in a bit (it's a tree view, not just a list!). In a call graph, each tree node represents a method that has been called by its parent tree node (method).
In the first image, WndProc was called about 1800 times; it handled 872 messages, of which 62 triggered ZedGraphControl.OnPaint() (which in turn accounts for 53% of the main thread's total time).
The reason you don't see another root node is that the 3rd dropdown box has "[604] Main Thread" selected, which I didn't notice before.
As for the more fluent graphs, I have second thoughts on that now after looking more closely at the screenshots. The main thread has clearly received more (double) update messages, and the CPU still has some headroom.
It looks like the threads are out of sync and in sync at different times, where the update messages arrive just too late (when WndProc was done and went to sleep for a while), and then suddenly in time for a while. I'm not very familiar with ANTS, but does it have a side-by-side thread timeline including sleep time? You should be able to see what's going on in such a view. Microsoft's threads view tool would come in handy for this.
While I have never heard of or seen something similar, I'd recommend the common-sense approach of commenting out sections of code / injecting returns at the tops of functions until you find the logic that's producing the side effect. You know your code and likely have an educated guess where to start chopping. Otherwise, chop out almost everything as a sanity test and start adding blocks back. I'm often amazed how fast one can find those seemingly impossible bugs this way. Once you find the related code, you will have more clues to solve your issue.
There is an array of potential causes. Without claiming completeness, here is how you could approach your search for the actual cause:
Environment variables: the timer issue in another answer is only one example. There might be modifications to the Path and to other variables, and new variables could be set by the profiler. Write the current environment variables to a file and compare both configurations. Try to find suspicious entries and unset them one by one (or in combinations) until you get the same behavior in both cases.
Processor frequency. This can easily happen on laptops. Potentially, the energy-saving system sets the frequency of the processor(s) to a lower value to save energy. Some apps may 'wake' the system up, increasing the frequency. Check this via Performance Monitor (perfmon).
If the app runs slower than it could, there must be some inefficient resource utilization. Use the profiler to investigate this! You can attach the profiler to the (slow) running process to see which resources are under- or over-utilized. Mostly, there are two major categories of causes for too-slow execution: memory-bound and compute-bound execution. Both can give more insight into what is triggering the slow-down.
If, however, your app actually changes its efficiency when a profiler is attached, you can still use your favorite monitoring app to see which performance indicators actually change. Again, perfmon is your friend.
If you have a method which throws a lot of exceptions, it can run slowly in debug mode and fast in CPU Profiling mode.
As detailed here, debug performance can be improved by using the DebuggerNonUserCode attribute. For example:
[DebuggerNonUserCode]
public static bool IsArchive(string filename)
{
    bool result = false;
    try
    {
        // this calls an external library, which throws an exception if the file is not an archive
        result = ExternalLibrary.IsArchive(filename);
    }
    catch
    {
    }
    return result;
}