Reducing the CPU use of NetNamedPipe when under load - C#

I have a Windows service that uses NetNamedPipe to communicate with other processes on the same machine. It works fine except for one problem: high CPU use. Is there anything I can do to reduce this usage?
To better understand this issue, I made a simple test program that talks to itself over a named pipe and tracks its own CPU use. When using the named pipe infrequently (1 operation per second), CPU use is very low. When using the named pipe frequently (thousands of operations per second), the CPU use increases.
Here is some sample output demonstrating the behaviour. (Note that the CPU figure comes from the Process > % Processor Time counter, which is not as simple as the CPU use you might see in Task Manager.)
NetNamedPipe Passed: 31309 Failed: 0 Elapsed: 10.4 s Rate: 3000 Hz Process CPU: 30.0 %
NetNamedPipe Passed: 13 Failed: 0 Elapsed: 11.0 s Rate: 1 Hz Process CPU: 0.9 %
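For anyone comparing these figures with Task Manager: as far as I understand it, the Process > % Processor Time counter is summed over all logical processors (so it can exceed 100 on a multi-core machine), while Task Manager reports a share of the whole machine. A minimal sketch of the conversion, assuming that interpretation holds; CpuMath and NormalizeToMachine are hypothetical names and are not part of the test program:

```csharp
using System;

public static class CpuMath
{
    // Converts a raw "Process > % Processor Time" reading into a share of
    // the whole machine by dividing by the number of logical processors.
    public static double NormalizeToMachine(double rawProcessPercent, int logicalProcessors)
    {
        if (logicalProcessors <= 0)
        {
            throw new ArgumentOutOfRangeException("logicalProcessors");
        }
        return rawProcessPercent / logicalProcessors;
    }
}
```

In the test program this would be called as CpuMath.NormalizeToMachine(avgCpu, Environment.ProcessorCount). On a 4-core machine, for instance, a raw reading of 30.0 corresponds to 7.5 % of the machine.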
Ideally, I'd like to continue using NetNamedPipe, but do something to reduce the CPU use. I have experimented with tweaking the optional settings of NetNamedPipeBinding using ideas on Stack Overflow and elsewhere, but was unable to reduce the CPU use. Maybe there is something I am missing?
I realise that, quite possibly, I might have to do something more drastic. I might need to send fewer, larger messages, in "bundles". Or I might need to use a different means of inter-process communication. Any advice on what to look into would be appreciated.
My test program code is below. It targets .NET Framework 4.7.2. I have been running on Windows 10.
Program.cs
using System;
using System.Diagnostics;
using System.Threading;
namespace IpcExperiments
{
class Program
{
private static readonly string MyName = "Alice";
private static readonly string ProcessName = "IpcExperiments";
private static readonly double DesiredRate = 3000; // In Hz
// private static readonly double DesiredRate = Double.MaxValue; // Go as fast as possible!
private static PerformanceCounter ProcessCpu = null;
static void Main(string[] args)
{
ProcessCpu = new PerformanceCounter("Process", "% Processor Time", ProcessName);
Test(new Experiments.NetNamedPipe(), MyName, DesiredRate);
// Optionally, add other tests here.
Console.Write("\r ");
Console.WriteLine();
Console.WriteLine("All tests complete! Press Enter to finish.");
Console.ReadLine();
}
private static void Test(Experiments.IIpcExperiment experiment, string myName, double desiredRate = Double.MaxValue)
{
int i = 0;
int successes = 0;
int fails = 0;
double elapsed = 0;
double rate = 0;
double thisCpu = 0;
double avgCpu = 0;
double cpuCount = 0;
string matchingName = String.Format("Hello {0}!", myName);
string experimentName = experiment.GetExperimentName();
Console.Write("\rCreating {0}...", experimentName);
experiment.Setup();
DateTime startTime = DateTime.Now;
DateTime nextCpuRead = DateTime.MinValue;
while (!Console.KeyAvailable)
{
if (experiment.SayHello(myName).Equals(matchingName))
{
successes++;
}
else
{
fails++;
}
if (nextCpuRead < DateTime.Now)
{
thisCpu = ProcessCpu.NextValue();
if (cpuCount == 0)
{
avgCpu = thisCpu;
}
else
{
avgCpu = ((avgCpu * cpuCount) + thisCpu) / (cpuCount + 1);
}
cpuCount++;
nextCpuRead = DateTime.Now.AddSeconds(1);
}
elapsed = (DateTime.Now - startTime).TotalSeconds;
rate = ((double)i) / elapsed;
Console.Write("\r{0}\tPassed: {1}\tFailed: {2}\tElapsed: {3:0.0} s\tRate: {4:0} Hz\t Process CPU: {5:0.0} %"
, experimentName
, successes
, fails
, elapsed
, rate
, avgCpu);
while (rate > desiredRate && !Console.KeyAvailable)
{
Thread.Sleep(1);
elapsed = (DateTime.Now - startTime).TotalSeconds;
rate = ((double)i) / elapsed;
}
i++;
}
Console.ReadKey(true);
Console.WriteLine();
Console.Write("\rDisposing {0}...", experimentName);
experiment.Shutdown();
}
}
}
IIpcExperiment.cs
namespace IpcExperiments.Experiments
{
interface IIpcExperiment
{
string GetExperimentName();
void Setup();
void Shutdown();
string SayHello(string myName);
}
}
NetNamedPipe.cs
using System;
using System.ServiceModel;
namespace IpcExperiments.Experiments
{
[ServiceContract]
public interface INetNamedPipe
{
[OperationContract]
string SayHello(string myName);
}
public class IpcInterface : INetNamedPipe
{
public string SayHello(string myName)
{
return String.Format("Hello {0}!", myName);
}
}
public class NetNamedPipe : IIpcExperiment
{
private ServiceHost Host;
private INetNamedPipe Client;
public void Setup()
{
SetupHost();
SetupClient();
}
public void Shutdown()
{
Host.Close();
}
public string GetExperimentName()
{
return "NetNamedPipe";
}
public string SayHello(string myName)
{
return Client.SayHello(myName);
}
private void SetupHost()
{
Host = new ServiceHost(typeof(IpcInterface),
new Uri[]{
new Uri(@"net.pipe://localhost")
});
NetNamedPipeBinding nnpb = new NetNamedPipeBinding();
Host.AddServiceEndpoint(typeof(INetNamedPipe)
, nnpb
, "NetNamedPipeExample");
Host.Open();
}
private void SetupClient()
{
NetNamedPipeBinding nnpb = new NetNamedPipeBinding();
ChannelFactory<INetNamedPipe> pipeFactory =
new ChannelFactory<INetNamedPipe>(
nnpb,
new EndpointAddress(@"net.pipe://localhost/NetNamedPipeExample"));
Client = pipeFactory.CreateChannel();
}
}
}

Here's how I solved this in the end.
Before the fix, in the sample code in the question above, I made repeated calls to SayHello and the overhead in doing so consumed a lot of CPU.
After the fix, I am getting the same data through a single Stream. I suspect that the CPU overhead of setting up a stream is approximately the same, but the stream only needs to be set up once. The overall CPU use is much lower.
Streams are supported by WCF named pipes so I didn't have to abandon using named pipes.
You can read about streaming here, or if that link dies, put TransferMode.Streamed into your favourite search engine.
My stream needed to be "infinite" so it could push data forever, so I needed to make a custom Stream. This answer on Stack Overflow helped guide me.
I still have some rough edges to work out but the CPU use problem (i.e. the crux of this question) seems to have been solved by this approach.
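To give an idea of what such an "infinite" stream can look like, here is a minimal sketch. It is not my production class: PushStream, Push and Complete are names made up for this illustration. The idea is that Read blocks until the producer pushes more bytes, so the reader never sees end-of-stream unless Complete is called. On the WCF side I am assuming an operation that returns a Stream and a NetNamedPipeBinding whose TransferMode is set to TransferMode.StreamedResponse; that wiring is not shown here.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading;

// Hypothetical sketch: a read-only stream fed by Push() that only ends
// when Complete() is called.
public sealed class PushStream : Stream
{
    private readonly BlockingCollection<byte[]> _chunks =
        new BlockingCollection<byte[]>();
    private byte[] _current; // chunk currently being drained by Read
    private int _offset;     // read position inside _current

    // Producer side: queue a chunk for the reader.
    public void Push(byte[] chunk)
    {
        if (chunk.Length > 0) _chunks.Add(chunk);
    }

    // Producer side: let the reader see end-of-stream once the queue drains.
    public void Complete()
    {
        _chunks.CompleteAdding();
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        if (count == 0) return 0;
        if (_current == null)
        {
            // Blocks without spinning until a chunk arrives; returns false
            // only after Complete() has been called and the queue is empty.
            if (!_chunks.TryTake(out _current, Timeout.Infinite)) return 0;
            _offset = 0;
        }
        int n = Math.Min(count, _current.Length - _offset);
        Buffer.BlockCopy(_current, _offset, buffer, offset, n);
        _offset += n;
        if (_offset == _current.Length) _current = null;
        return n;
    }

    public override bool CanRead { get { return true; } }
    public override bool CanSeek { get { return false; } }
    public override bool CanWrite { get { return false; } }
    public override long Length { get { throw new NotSupportedException(); } }
    public override long Position
    {
        get { throw new NotSupportedException(); }
        set { throw new NotSupportedException(); }
    }
    public override void Flush() { }
    public override long Seek(long offset, SeekOrigin origin) { throw new NotSupportedException(); }
    public override void SetLength(long value) { throw new NotSupportedException(); }
    public override void Write(byte[] buffer, int offset, int count) { throw new NotSupportedException(); }
}
```

The service would hand one PushStream to the client and keep calling Push on it, which replaces thousands of SayHello round trips with a single long-lived read.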

Related

Threading.Timer stops in Console application

I'm working with a dictionary containing an ID as the key and a queue as the value. I have one thread writing to the queues, and another thread reading from the queues, so I need to use the concurrent structures that were introduced in .NET 4.0. As part of this I tried to write a test application just to fill the queues, but I came across an issue with the timers stopping after around 10 seconds. I really don't understand why, as there is nothing to catch, no error message or anything to give me a hint about what might be wrong.
So can someone please explain to me why the timer stops after around 10 seconds? I've tried this on two different computers (both using Visual Studio 2012, but with .NET Framework 4.0).
class Program {
private readonly ConcurrentDictionary<int, ConcurrentQueue<TestObject>> _pipes =
new ConcurrentDictionary<int, ConcurrentQueue<TestObject>>();
static void Main() {
Program program = new Program();
program.Run();
Console.Read();
}
private void Run() {
_pipes[100] = new ConcurrentQueue<TestObject>();
_pipes[200] = new ConcurrentQueue<TestObject>();
_pipes[300] = new ConcurrentQueue<TestObject>();
Timer timer = new Timer(WriteStuff, null, 0, 100);
}
private void WriteStuff(object sender) {
for (int i = 0; i < 5; i++) {
foreach (KeyValuePair<int, ConcurrentQueue<TestObject>> pipe in _pipes) {
pipe.Value.Enqueue(
new TestObject { Name = DateTime.Now.ToString("o") + "-" + i });
}
i++;
}
Console.WriteLine(DateTime.Now + "added stuff");
}
}
internal class TestObject {
public string Name { get; set; }
public bool Sent { get; set; }
}
Most likely, the timer is being garbage collected: nothing references it once Run returns, so the collector is free to reclaim it. Declare the timer as a field of the class. That is:
private Timer timer;
private void Run()
{
...
timer = new Timer(WriteStuff, null, 0, 100);
}
Also, I think you'll find that BlockingCollection is easier to work with than ConcurrentQueue. BlockingCollection wraps a very nice API around concurrent collections, making it easier to do non-busy waits on the queue when removing things. In its default configuration, it uses a ConcurrentQueue as the backing store. All you have to do to use it is replace ConcurrentQueue in your code with BlockingCollection, and change from calling Enqueue to calling Add. As in:
for (int i = 0; i < 5; i++)
{
foreach (var pipe in _pipes)
{
pipe.Value.Add(
new TestObject { Name = DateTime.Now.ToString("o") + "-" + i });
}
}
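On the reading side, the non-busy wait mentioned above can be sketched roughly as follows. PipeConsumer, Drain and Demo are hypothetical names for this illustration, and the queue holds plain strings instead of TestObject to keep it self-contained. GetConsumingEnumerable blocks while the collection is empty and ends once the producer has called CompleteAdding and everything has been taken.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class PipeConsumer
{
    // Blocks (without busy-waiting) until items arrive, and returns once the
    // producer has called CompleteAdding() and the queue has been drained.
    public static List<string> Drain(BlockingCollection<string> pipe)
    {
        var received = new List<string>();
        foreach (string name in pipe.GetConsumingEnumerable())
        {
            received.Add(name);
        }
        return received;
    }

    // Small demonstration: one producer task, one consumer on this thread.
    public static List<string> Demo()
    {
        var pipe = new BlockingCollection<string>();
        Task producer = Task.Factory.StartNew(() =>
        {
            for (int i = 0; i < 3; i++)
            {
                pipe.Add("item-" + i);
            }
            pipe.CompleteAdding(); // tells the consumer no more items will come
        });
        List<string> received = Drain(pipe);
        producer.Wait();
        return received;
    }
}
```

With the default ConcurrentQueue backing store the items come out in the order they were added.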

What is an efficient method for in-order processing of events using CCR?

I was experimenting with CCR iterators as a solution to a task that requires parallel processing of tons of data feeds, where the data from each feed needs to be processed in order. None of the feeds are dependent on each other, so the in-order processing can be parallelised per feed.
Below is a quick and dirty mockup with one integer feed, which simply shoves integers into a Port at a rate of about 1.5K/second, and then pulls them out using a CCR iterator to keep the in-order processing guarantee.
class Program
{
static Dispatcher dispatcher = new Dispatcher();
static DispatcherQueue dispatcherQueue =
new DispatcherQueue("DefaultDispatcherQueue", dispatcher);
static Port<int> intPort = new Port<int>();
static void Main(string[] args)
{
Arbiter.Activate(
dispatcherQueue,
Arbiter.FromIteratorHandler(new IteratorHandler(ProcessInts)));
int counter = 0;
Timer t = new Timer( (x) =>
{ for(int i = 0; i < 1500; ++i) intPort.Post(counter++);}
, null, 0, 1000);
Console.ReadKey();
}
public static IEnumerator<ITask> ProcessInts()
{
while (true)
{
yield return intPort.Receive();
int currentValue;
if( (currentValue = intPort) % 1000 == 0)
{
Console.WriteLine("{0}, Current Items In Queue:{1}",
currentValue, intPort.ItemCount);
}
}
}
}
What surprised me greatly about this was that CCR could not keep up on a Core i7 box, with the queue size growing without bounds. In another test to measure the latency from the Post() to the Receive() under a load of ~100 Post/sec, the latency between the first Post() and Receive() in each batch was around 1 ms.
Is there something wrong with my mockup? If so, what is a better way of doing this using CCR?
Yes, I agree, this does indeed seem weird. Your code seems initially to perform smoothly, but after a few thousand items, processor usage rises to the point where performance is really lacklustre. This disturbs me and suggests a problem in the framework. After a play with your code, I can't really identify why this is the case. I'd suggest taking this problem to the Microsoft Robotics Forums and seeing if you can get George Chrysanthakopoulos (or one of the other CCR brains) to tell you what the problem is. I can however surmise that your code as it stands is terribly inefficient.
The way that you are dealing with "popping" items from the Port is very inefficient. Essentially the iterator is woken each time there is a message in the Port and it deals with only one message (despite the fact that there might be several hundred more in the Port), then hangs on the yield while control is passed back to the framework. At the point that the yielded receiver causes another "awakening" of the iterator, many many messages have filled the Port. Pulling a thread from the Dispatcher to deal with only a single item (when many have piled up in the meantime) is almost certainly not the best way to get good throughput.
I've modded your code such that after the yield, we check the Port to see if there are any further messages queued and deal with them too, thereby completely emptying the Port before we yield back to the framework. I've also refactored your code somewhat to use CcrServiceBase which simplifies the syntax of some of the tasks you are doing:
internal class Test:CcrServiceBase
{
private readonly Port<int> intPort = new Port<int>();
private Timer timer;
public Test() : base(new DispatcherQueue("DefaultDispatcherQueue",
new Dispatcher(0,
"dispatcher")))
{
}
public void StartTest() {
SpawnIterator(ProcessInts);
var counter = 0;
timer = new Timer(x =>
{
for (var i = 0; i < 1500; ++i)
intPort.Post(counter++);
}
,
null,
0,
1000);
}
public IEnumerator<ITask> ProcessInts()
{
while (true)
{
yield return intPort.Receive();
int currentValue = intPort;
ReportCurrent(currentValue);
while(intPort.Test(out currentValue))
{
ReportCurrent(currentValue);
}
}
}
private void ReportCurrent(int currentValue)
{
if (currentValue % 1000 == 0)
{
Console.WriteLine("{0}, Current Items In Queue:{1}",
currentValue,
intPort.ItemCount);
}
}
}
Alternatively, you could do away with the iterator completely, as it's not really well used in your example (although I'm not entirely sure what effect this has on the order of processing):
internal class Test : CcrServiceBase
{
private readonly Port<int> intPort = new Port<int>();
private Timer timer;
public Test() : base(new DispatcherQueue("DefaultDispatcherQueue",
new Dispatcher(0,
"dispatcher")))
{
}
public void StartTest()
{
Activate(
Arbiter.Receive(true,
intPort,
i =>
{
ReportCurrent(i);
int currentValue;
while (intPort.Test(out currentValue))
{
ReportCurrent(currentValue);
}
}));
var counter = 0;
timer = new Timer(x =>
{
for (var i = 0; i < 500000; ++i)
{
intPort.Post(counter++);
}
}
,
null,
0,
1000);
}
private void ReportCurrent(int currentValue)
{
if (currentValue % 1000000 == 0)
{
Console.WriteLine("{0}, Current Items In Queue:{1}",
currentValue,
intPort.ItemCount);
}
}
}
Both these examples increase throughput by orders of magnitude. Hope this helps.

MethodBase.GetCurrentMethod() Performance?

I have written a log class and a function as in the following code:
Log(System.Reflection.MethodBase methodBase, string message)
Every time I log something I also log the class name from the methodBase.Name and methodBase.DeclaringType.Name.
I read the following post Using Get CurrentMethod and I noticed that this method is slow.
Should I use this.GetType() instead of System.Reflection.MethodBase, or should I manually log the class/method name in my log, e.g. Log("ClassName.MethodName", "log message")? What is the best practice?
It really depends.
If you use the this.GetType() approach you will lose the method information, but you will have a big performance gain (apparently a factor of 1200, according to your link).
If you offer an interface that lets the caller supply strings (e.g. Log("ClassName.MethodName", "log message")), you will probably gain even better performance, but this makes your API less friendly (the calling developer has to supply the class name and method name).
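If you can target C# 5 / .NET 4.5 or later, a middle ground is to let the compiler supply the strings through the caller-info attributes in System.Runtime.CompilerServices. This is a different technique from the two above, so treat it as a sketch: CallerLog, Format and CallerLogDemo are hypothetical names, and using the source file name as a stand-in for the class name is an assumption that only holds if each file is named after its class.

```csharp
using System;
using System.IO;
using System.Runtime.CompilerServices;

public static class CallerLog
{
    // The compiler substitutes the caller's method name and source file path
    // for the optional parameters, so no reflection runs at call time.
    public static string Format(string message,
        [CallerMemberName] string member = "",
        [CallerFilePath] string file = "")
    {
        // Assumption: the source file is named after the class it contains.
        string className = Path.GetFileNameWithoutExtension(file);
        return className + "." + member + ": " + message;
    }
}

public static class CallerLogDemo
{
    public static string DoWork()
    {
        // The returned text ends with ".DoWork: log message".
        return CallerLog.Format("log message");
    }
}
```

The calling developer writes only the message, and the method name stays correct when the method is renamed.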
I know this is an old question, but I figured I'd throw out a simple solution that seems to perform well and maintains symbols:
static void Main(string[] args)
{
int loopCount = 1000000; // 1,000,000 (one million) iterations
var timer = new Timer();
timer.Restart();
for (int i = 0; i < loopCount; i++)
Log(MethodBase.GetCurrentMethod(), "whee");
TimeSpan reflectionRunTime = timer.CalculateTime();
timer.Restart();
for (int i = 0; i < loopCount; i++)
Log((Action<string[]>)Main, "whee");
TimeSpan lookupRunTime = timer.CalculateTime();
Console.WriteLine("Reflection Time: {0}ms", reflectionRunTime.TotalMilliseconds);
Console.WriteLine(" Lookup Time: {0}ms", lookupRunTime.TotalMilliseconds);
Console.WriteLine();
Console.WriteLine("Press Enter to exit");
Console.ReadLine();
}
public static void Log(Delegate info, string message)
{
// do stuff
}
public static void Log(MethodBase info, string message)
{
// do stuff
}
public class Timer
{
private DateTime _startTime;
public void Restart()
{
_startTime = DateTime.Now;
}
public TimeSpan CalculateTime()
{
return DateTime.Now.Subtract(_startTime);
}
}
Running this code gives me the following results:
Reflection Time: 1692.1692ms
Lookup Time: 19.0019ms
Press Enter to exit
For one million iterations, that's not bad at all, especially compared to straight-up reflection. Because the method group is cast to a delegate type, you maintain a symbolic link all the way into the logging. No goofy magic strings.
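The "do stuff" part of the Delegate overload is left empty above. One way it could recover the names is through the delegate's Method property, sketched below; DelegateLog, Describe and Sample are hypothetical names for this illustration. Note that this name lookup is not included in the timings shown above, and that lambdas or anonymous methods would report compiler-generated names.

```csharp
using System;
using System.Reflection;

public static class DelegateLog
{
    // Builds "ClassName.MethodName" from the method the delegate points at.
    public static string Describe(Delegate info)
    {
        MethodInfo method = info.Method;
        return method.DeclaringType.Name + "." + method.Name;
    }

    // Stand-in for a method that wants to log its own name.
    public static void Sample(string[] args)
    {
    }
}
```

For example, DelegateLog.Describe((Action<string[]>)DelegateLog.Sample) yields "DelegateLog.Sample".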

Simplest Possible Performance Counter Example

What is the smallest amount of C# code to get a performance counter up and running?
I simply want to measure the number of CPU cycles and/or time between two points in my code. I've skimmed through all the waffle on the web but it seems like WAY more code than is necessary for such a trivial task. I just want to get a quick measurement up and running and concentrate more on what I'm working on.
I don't think you need a performance counter for that. Do you need more than the timing you can get from Stopwatch? It is very accurate.
Stopwatch watch = Stopwatch.StartNew();
// Do work
watch.Stop();
// elapsed time is in watch.Elapsed
However, to answer the question you actually asked: If you just want to query existing counters, it is in fact quite simple. Here is a full example:
using System;
using System.Diagnostics;
using System.Linq;
static class Test
{
static void Main()
{
var processorCategory = PerformanceCounterCategory.GetCategories()
.FirstOrDefault(cat => cat.CategoryName == "Processor");
var countersInCategory = processorCategory.GetCounters("_Total");
DisplayCounter(countersInCategory.First(cnt => cnt.CounterName == "% Processor Time"));
}
private static void DisplayCounter(PerformanceCounter performanceCounter)
{
while (!Console.KeyAvailable)
{
Console.WriteLine("{0}\t{1} = {2}",
performanceCounter.CategoryName, performanceCounter.CounterName, performanceCounter.NextValue());
System.Threading.Thread.Sleep(1000);
}
}
}
Of course, the process will need appropriate permissions to access the performance counters you need.
I like something that can take any code block and wrap it with stopwatch profiling code to measure time spent executing it:
using System;
using System.Diagnostics;
public static T Profile<T>(Func<T> codeBlock, string description = "")
{
Stopwatch stopWatch = new Stopwatch();
stopWatch.Start();
T res = codeBlock();
stopWatch.Stop();
TimeSpan ts = stopWatch.Elapsed;
const double thresholdSec = 2;
double elapsed = ts.TotalSeconds;
if(elapsed > thresholdSec)
System.Diagnostics.Debug.Write(description + " code was too slow! It took " +
elapsed + " second(s).");
return res;
}
Then call it like this:
Profile(() => MyObj.MySlowMethod());
or:
Profile(() => MyObj.MySlowMethod(), "I can explain why");
There is no trivial way to get this up and running in .NET. However, the simplest way I've found is to build on top of the Enterprise Library which provides some out of the box capabilities for working with performance counters. For example: the Performance Counter Handler
The Enterprise Library also gives you some capabilities for much more easily managing the installation of performance counters.
Additionally, it lets you build on top of it, so you can create an AverageTimeMeter which allows you to just do this:
private static EnterpriseLibraryPerformanceCounter averageRequestTimeCounter = PerformanceCounterManager.GetEnterpriseLibraryCounter(MadPerformanceCountersListener.AverageRequestTime);
private static EnterpriseLibraryPerformanceCounter averageRequestTimeCounterBase = PerformanceCounterManager.GetEnterpriseLibraryCounter(MadPerformanceCountersListener.AverageRequestTimeBase);
public void DoSomethingWeWantToMonitor()
{
using (new AverageTimeMeter(averageRequestTimeCounter, averageRequestTimeCounterBase))
{
// code here that you want to perf mon
}
}
This allows you to simply encapsulate the code you want to monitor in a using block - and concentrate on the code you actually want to work on rather than worrying about all the performance counter infrastructure.
To do this, you'll create a re-usable AverageTimeMeter class like this:
public sealed class AverageTimeMeter : IDisposable
{
private EnterpriseLibraryPerformanceCounter averageCounter;
private EnterpriseLibraryPerformanceCounter baseCounter;
private Stopwatch stopWatch;
private string instanceName;
public AverageTimeMeter(EnterpriseLibraryPerformanceCounter averageCounter, EnterpriseLibraryPerformanceCounter baseCounter, string instanceName = null)
{
this.stopWatch = new Stopwatch();
this.averageCounter = averageCounter;
this.baseCounter = baseCounter;
this.instanceName = instanceName;
this.stopWatch.Start();
}
public void Dispose()
{
this.stopWatch.Stop();
if (this.baseCounter != null)
{
this.baseCounter.Increment();
}
if (this.averageCounter != null)
{
if (string.IsNullOrEmpty(this.instanceName))
{
this.averageCounter.IncrementBy(this.stopWatch.ElapsedTicks);
}
else
{
this.averageCounter.SetValueFor(this.instanceName, this.averageCounter.Value + this.stopWatch.ElapsedTicks);
}
}
}
}
You have to register your performance counters (shown in the EntLib examples) but this should get you started.

Time required for a process to complete

I am new to the C# world. I am attempting to calculate the time taken by an algorithm for the purpose of comparison. The following code measures the elapsed time from when a subroutine is called until the subroutine returns to the main program. This example is taken from "Data structures through C#" by Michael McMillan.
After running this program the output is Time=0, which is incorrect. The program appears to be logically correct. Can anybody help me? The code follows:
using System;
using System.Diagnostics;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace Chap1
{
class chap1
{
static void Main()
{
int[] nums = new int[100000];
BuildArray(nums);
Timing tObj = new Timing();
tObj.startTime();
DisplayNums(nums);
tObj.stopTime();
Console.WriteLine("Time: " + tObj.result().TotalSeconds);
Console.WriteLine("Start Time: " + tObj.startTime().TotalSeconds);
Console.WriteLine("Duration : " + tObj.result().TotalSeconds);
Console.ReadKey();
}
static void BuildArray(int[] arr)
{
for (int i = 0; i <= 99999; i++)
arr[i] = i;
}
static void DisplayNums(int[] arr)
{
for (int i = 0; i <= arr.GetUpperBound(0); i++)
Console.WriteLine(arr[i]);
}
}
class Timing
{
TimeSpan StartTiming;
TimeSpan duration;
public Timing()
{
StartTiming = new TimeSpan(0);
duration = new TimeSpan(0);
}
public TimeSpan startTime()
{
GC.Collect();
GC.WaitForPendingFinalizers();
StartTiming = Process.GetCurrentProcess().Threads[0].UserProcessorTime;
return StartTiming;
}
public void stopTime()
{
duration = Process.GetCurrentProcess().Threads[0].UserProcessorTime.Subtract(StartTiming);
}
public TimeSpan result()
{
return duration;
}
}
}
The Stopwatch class is designed for this.
UserProcessorTime doesn't begin to have the resolution necessary to measure counting to 100000 in a for loop. Your WriteLine calls won't be included in user time as they are I/O time. Your code might not be running on thread 0. User time isn't updated except at context switches. When you print startTime, you're changing the stored value. There are probably some other things that can go wrong I haven't thought of.
I strongly suggest you use the Stopwatch class which takes advantage of the CPU's performance counters.
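As a rough sketch of what that could look like, here is the Timing class from the question rebuilt on Stopwatch. StopwatchTiming and its member names are made up for this illustration, and it measures wall-clock time instead of per-thread user time, so the Console.WriteLine calls are included in the result.

```csharp
using System;
using System.Diagnostics;

public class StopwatchTiming
{
    private readonly Stopwatch _watch = new Stopwatch();

    // Settles the garbage collector first (as the original class did),
    // then starts measuring from zero.
    public void StartTime()
    {
        GC.Collect();
        GC.WaitForPendingFinalizers();
        _watch.Restart();
    }

    public void StopTime()
    {
        _watch.Stop();
    }

    // Reading the result does not change it, unlike calling startTime()
    // a second time in the original code.
    public TimeSpan Result()
    {
        return _watch.Elapsed;
    }
}
```

Usage mirrors the original: call StartTime(), run DisplayNums(nums), call StopTime(), then print Result().TotalSeconds.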
You don't use the Timing class anywhere in your main function and I don't see where you print the time either. Is this the EXACT code you're running?
Update per new code:
Don't run it in debug mode... build your release version and then run the executable manually: http://social.msdn.microsoft.com/forums/en-US/vbgeneral/thread/3f10a46a-ba03-4f5a-9d1f-272a348d660c/
I tested your code and it worked fine when running the release version, but when I was running it in the debugger it was not working properly.
