Measuring time across processes - C#

I need to measure the communication latency between two processes on the same machine. The best way I have come up with is serializing DateTime.UtcNow (DateTime.Now seems to be so slow that it badly distorts my measurements) into the message and comparing it with DateTime.UtcNow in the other process. Is this as good as it gets? Or is there a better way?

If your goal is to measure and compare exact times between processes, you should use the Windows API function QueryPerformanceCounter(). The values that it returns are synchronized between processes because it returns an internal processor value.
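Stopwatch also uses QueryPerformanceCounter() in its implementation, but a Stopwatch instance only exposes elapsed time, not the absolute counter values, so an instance can't be used for this. (The static Stopwatch.GetTimestamp() method does return the raw counter value when Stopwatch.IsHighResolution is true.)
Here I use P/Invoke to call QueryPerformanceCounter() directly, which is pretty easy.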
The overhead of using P/Invoke is small. From the MSDN documentation:
PInvoke has an overhead of between 10 and 30 x86 instructions per call. In addition to this fixed cost, marshaling creates additional overhead. There is no marshaling cost between blittable types that have the same representation in managed and unmanaged code. For example, there is no cost to translate between int and Int32.
Since the value returned from QueryPerformanceCounter() is a long, there will be no additional marshaling cost from it, so you're left with an overhead of 10-30 instructions.
Also see this MSDN blog where it is stated that the resolution of UtcNow is around 10ms - which is pretty huge compared to the resolution of the performance counter. (Although I actually don't believe this is true for Windows 8; my measurements seem to show that UtcNow has a millisecond resolution).
Anyway, it is easy to demonstrate that P/Invoking QueryPerformanceCounter() has a higher resolution than using DateTime.UtcNow.
If you run a release build of the following code (run from OUTSIDE a debugger), you'll see that almost all the DateTime.UtcNow elapsed times are 0, whereas all the QueryPerformanceCounter() ones are nonzero.
This is because the resolution of DateTime.UtcNow is not high enough to measure the elapsed time of calling Thread.Sleep(0), whereas QueryPerformanceCounter() is.
using System;
using System.Runtime.InteropServices;
using System.Threading;
namespace ConsoleApplication1
{
    internal class Program
    {
        private static void Main(string[] args)
        {
            for (int i = 0; i < 100; ++i)
            {
                var t1 = DateTime.UtcNow;
                Thread.Sleep(0);
                var t2 = DateTime.UtcNow;
                Console.WriteLine("UtcNow elapsed = " + (t2-t1).Ticks);
            }
            for (int i = 0; i < 100; ++i)
            {
                long q1, q2;
                QueryPerformanceCounter(out q1);
                Thread.Sleep(0);
                QueryPerformanceCounter(out q2);
                Console.WriteLine("QPC elapsed = " + (q2-q1));
            }
        }
        [DllImport("kernel32.dll", SetLastError=true)]
        static extern bool QueryPerformanceCounter(out long lpPerformanceCount);
    }
}
Now I realise that it could be that the overhead of calling QueryPerformanceCounter() is so high that it is measuring how long it is taking to call, rather than how long Thread.Sleep(0) takes. We can eliminate that in two ways:
Firstly, we can modify the first loop as follows:
for (int i = 0; i < 100; ++i)
{
    var t1 = DateTime.UtcNow;
    long dummy;
    QueryPerformanceCounter(out dummy);
    Thread.Sleep(0);
    QueryPerformanceCounter(out dummy);
    var t2 = DateTime.UtcNow;
    Console.WriteLine("UtcNow elapsed = " + (t2-t1).Ticks);
}
Now the UtcNow should be timing Thread.Sleep(0) and two calls to QueryPerformanceCounter(). But if you run it, you'll still see almost all the elapsed times being zero.
Secondly, we can time how long it takes to call QueryPerformanceCounter() a million times:
var t1 = DateTime.UtcNow;
for (int i = 0; i < 1000000; ++i)
{
    long dummy;
    QueryPerformanceCounter(out dummy);
}
var t2 = DateTime.UtcNow;
Console.WriteLine("Elapsed = " + (t2-t1).TotalMilliseconds);
On my system it takes around 32ms to call QueryPerformanceCounter() one million times.
Finally, we can time how long it takes to call DateTime.UtcNow one million times:
var t1 = DateTime.UtcNow;
for (int i = 0; i < 1000000; ++i)
{
    var dummy = DateTime.UtcNow;
}
var t2 = DateTime.UtcNow;
Console.WriteLine("Elapsed = " + (t2-t1).TotalMilliseconds);
On my system that takes around 10ms, which is around 3 times faster than calling QueryPerformanceCounter().
In Summary
So DateTime.UtcNow has lower overhead than P/Invoking QueryPerformanceCounter(), but it has much lower resolution.
So you pays your money and you takes your choice!
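If you would rather avoid P/Invoke, the static Stopwatch.GetTimestamp() method and Stopwatch.Frequency can be used to stamp a message in one process and compute the latency in the other. This is only a sketch: it assumes both processes run on the same machine and that Stopwatch.IsHighResolution is true there, so that the timestamp comes from the system-wide performance counter. LatencyStamp and its members are made-up names for illustration.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper; the names are illustrative and not part of any library.
public static class LatencyStamp
{
    // Sender side: read the raw counter and serialize it into the message.
    public static long Now() => Stopwatch.GetTimestamp();

    // Receiver side: convert the difference between two raw readings to microseconds.
    // 'frequency' is the number of counter ticks per second (Stopwatch.Frequency).
    public static double ElapsedMicroseconds(long sentTicks, long receivedTicks, long frequency)
    {
        if (frequency <= 0) throw new ArgumentOutOfRangeException(nameof(frequency));
        return (receivedTicks - sentTicks) * 1_000_000.0 / frequency;
    }
}
```

The receiver would call something like LatencyStamp.ElapsedMicroseconds(sentTicks, LatencyStamp.Now(), Stopwatch.Frequency), where sentTicks is the value read from the message.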

Related

C# Performance counter for CPU usage (processor time) how does the RawValue relate to the NextValue()?

I've been trying to grab the CPU usage of a process to verify it isn't hanging. The method NextValue() seems to be the recommended approach, but it seems to only get the CPU usage at the time of checking, and not the total CPU used in between checks (experimentation code and results below).
I ran the following code to see if I could figure out the relationship between the two values, but I can't tell whether they actually relate to each other directly. I can tell that 'foo.TotalProcessorTime' does seem to line up with counter.RawValue exactly.
var foo = Process.Start("MyProces.exe");
PerformanceCounter counter = new PerformanceCounter("Process", "% Processor Time", foo.ProcessName, true);
decimal totalValue = 0;
while (true)
{
    var newTotal = counter.RawValue;
    var increment = counter.NextValue();
    var theoryTotal = totalValue + (decimal)increment*10000;//tried 10,000 and 100,000
    logger.Info($"New Total: {newTotal} increment {increment} oldTotal + increment {theoryTotal} difference {newTotal-theoryTotal}");
    totalValue = newTotal;
    await Task.Delay(100);//different increments don't seem to affect the magnitude of NextValue
}
The following is my experimentation of NextValue(), the logs on the right are for the 1s interval, on the left is a 10s interval, with me doing the same things on the process in question.
counter.NextValue();//initialize
while (true)
{
    await Task.Delay(10000);//tried 1s and 10s intervals, greater values on the 1s interval
    logger.Info($"Percentage: {counter.NextValue()}");
}
TL;DR; How is NextValue() derived from RawValue? Or are they metrics that measure the same things but in a completely unconnected way? And if NextValue() really measures the value in between calls of NextValue() why am I getting '0's when I use a long interval between measurements and leave the process alone at the time of referencing NextValue()?
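One way to get the CPU used between two checks without relying on NextValue() is to difference Process.TotalProcessorTime yourself, since the question notes that it lines up with RawValue. The sketch below assumes the usual definition of CPU usage as CPU time divided by wall-clock time. CpuUsage and Percent are hypothetical names, and dividing by the core count is a choice that scales the result to 0-100 for the whole machine.

```csharp
using System;

// Hypothetical helper; not part of System.Diagnostics.
public static class CpuUsage
{
    // cpuBefore/cpuAfter: two samples of Process.TotalProcessorTime.
    // wallElapsed: wall-clock time between the two samples.
    // coreCount: number of logical processors (e.g. Environment.ProcessorCount).
    public static double Percent(TimeSpan cpuBefore, TimeSpan cpuAfter, TimeSpan wallElapsed, int coreCount)
    {
        if (wallElapsed <= TimeSpan.Zero) throw new ArgumentOutOfRangeException(nameof(wallElapsed));
        if (coreCount <= 0) throw new ArgumentOutOfRangeException(nameof(coreCount));

        // CPU time consumed in the interval, relative to the CPU time that was available.
        double cpuMs = (cpuAfter - cpuBefore).TotalMilliseconds;
        return cpuMs / (wallElapsed.TotalMilliseconds * coreCount) * 100.0;
    }
}
```

A process that was idle for the whole interval gives 0 regardless of the interval length, and a burst of work anywhere inside the interval still shows up in the result.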

Is parallel code supposed to run slower than sequential code, after a certain dataset size?

I'm fairly new to C# and programming in general and I was trying out parallel programming.
I have written this example code that computes the sum of an array first, using multiple threads, and then, using one thread (the main thread).
I've timed both cases.
static long Sum(int[] numbers, int start, int end)
{
    long sum = 0;
    for (int i = start; i < end; i++)
    {
        sum += numbers[i];
    }
    return sum;
}
static async Task Main()
{
    // Arrange data.
    const int COUNT = 100_000_000;
    int[] numbers = new int[COUNT];
    Random random = new();
    for (int i = 0; i < numbers.Length; i++)
    {
        numbers[i] = random.Next(100);
    }
    // Split task into multiple parts.
    int threadCount = Environment.ProcessorCount;
    int taskCount = threadCount - 1;
    int taskSize = numbers.Length / taskCount;
    var start = DateTime.Now;
    // Run individual parts in separate threads.
    List<Task<long>> tasks = new();
    for (int i = 0; i < taskCount; i++)
    {
        int begin = i * taskSize;
        int end = (i == taskCount - 1) ? numbers.Length : (i + 1) * taskSize;
        tasks.Add(Task.Run(() => Sum(numbers, begin, end)));
    }
    // Wait for all threads to finish, as we need the result.
    var partialSums = await Task.WhenAll(tasks);
    long sumAsync = partialSums.Sum();
    var durationAsync = (DateTime.Now - start).TotalMilliseconds;
    Console.WriteLine($"Async sum: {sumAsync}");
    Console.WriteLine($"Async duration: {durationAsync} milliseconds");
    // Sequential
    start = DateTime.Now;
    long sumSync = Sum(numbers, 0, numbers.Length);
    var durationSync = (DateTime.Now - start).TotalMilliseconds;
    Console.WriteLine($"Sync sum: {sumSync}");
    Console.WriteLine($"Sync duration: {durationSync} milliseconds");
    var factor = durationSync / durationAsync;
    Console.WriteLine($"Factor: {factor:0.00}x");
}
When the array size is 100 million, the parallel sum is computed 2x faster (on average).
But when the array size is 1 billion, it's significantly slower than the sequential sum.
Why is it running slower?
Hardware Information
Environment.ProcessorCount = 4
GC.GetGCMemoryInfo().TotalAvailableMemoryBytes = 8468377600
Timing: [results for array sizes 100,000,000 and 1,000,000,000 were shown as images]
New Test:
This time, instead of separate threads (3 in my case) working on different parts of a single array of 1,000,000,000 integers, I physically divided the dataset into 3 separate arrays of 333,333,333 (one third of the size) each. This time, although I'm still adding up a billion integers on the same machine, my parallel code runs faster (as expected).
private static void InitArray(int[] numbers)
{
    Random random = new();
    for (int i = 0; i < numbers.Length; i++)
    {
        numbers[i] = (int)random.Next(100);
    }
}
public static async Task Main()
{
    Stopwatch stopwatch = new();
    const int SIZE = 333_333_333; // one third of a billion
    List<int[]> listOfArrays = new();
    for (int i = 0; i < Environment.ProcessorCount - 1; i++)
    {
        int[] numbers = new int[SIZE];
        InitArray(numbers);
        listOfArrays.Add(numbers);
    }
    // Sequential.
    stopwatch.Start();
    long syncSum = 0;
    foreach (var array in listOfArrays)
    {
        syncSum += Sum(array);
    }
    stopwatch.Stop();
    var sequentialDuration = stopwatch.Elapsed.TotalMilliseconds;
    Console.WriteLine($"Sequential sum: {syncSum}");
    Console.WriteLine($"Sequential duration: {sequentialDuration} ms");
    // Parallel.
    stopwatch.Restart();
    List<Task<long>> tasks = new();
    foreach (var array in listOfArrays)
    {
        tasks.Add(Task.Run(() => Sum(array)));
    }
    var partialSums = await Task.WhenAll(tasks);
    long parallelSum = partialSums.Sum();
    stopwatch.Stop();
    var parallelDuration = stopwatch.Elapsed.TotalMilliseconds;
    Console.WriteLine($"Parallel sum: {parallelSum}");
    Console.WriteLine($"Parallel duration: {parallelDuration} ms");
    Console.WriteLine($"Factor: {sequentialDuration / parallelDuration:0.00}x");
}
Timing
I don't know if it helps figure out what went wrong in the first approach.

The asynchronous pattern is not the same as running code in parallel. The main reason for asynchronous code is better resource utilization while the computer is waiting for some kind of IO device. Your code would be better described as parallel computing or concurrent computing.
While your example should work fine, it may be neither the easiest nor the optimal way to do it. The easiest option would probably be to use Parallel LINQ: numbers.AsParallel().Sum();. There is also a Parallel.For method that should be better suited, including an overload that maintains thread-local state. Note that while Parallel.For will attempt to optimize its partitioning, you probably want to process a chunk of data in each iteration to reduce overhead. I would try around 1-10k values or so.
We can only guess at the reason your parallel method is slower. Summing numbers is a really fast operation, so it may be that the computation is limited by memory bandwidth or cache usage. And while you want your work partitions to be fairly large, using partitions that are too large may result in less overall parallelism if a thread gets suspended for any reason. You may also want partitions of certain sizes to work well with the caching system; see cache associativity. It is also possible you are including things you did not intend to measure, like compilation times or GCs. See BenchmarkDotNet, which takes care of many of the edge cases when measuring performance.
Also, never use DateTime for measuring performance, Stopwatch is both much easier to use and much more accurate.
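To make the chunking suggestion concrete, here is a minimal sketch that sums an array using Parallel.ForEach over index ranges with a thread-local running total. The class name ParallelSumSketch and the default chunk size of 10,000 are assumptions for illustration (the chunk size simply follows the "1-10k values" guess above); this has not been benchmarked against the code in the question.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper illustrating range partitioning with thread-local state.
public static class ParallelSumSketch
{
    public static long Sum(int[] numbers, int chunkSize = 10_000)
    {
        if (numbers == null) throw new ArgumentNullException(nameof(numbers));
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize));
        if (numbers.Length == 0) return 0; // Partitioner.Create rejects an empty range

        long total = 0;
        Parallel.ForEach<Tuple<int, int>, long>(
            Partitioner.Create(0, numbers.Length, chunkSize), // index ranges [Item1, Item2)
            () => 0L,                                         // per-thread running total
            (range, state, local) =>
            {
                for (int i = range.Item1; i < range.Item2; i++)
                {
                    local += numbers[i];
                }
                return local;
            },
            local => Interlocked.Add(ref total, local));      // merge once per thread
        return total;
    }
}
```

Each worker only touches shared state once, when it adds its local total at the end, so synchronization cost stays small compared with the summing itself.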

My machine has 4GB RAM, so initializing an int[1_000_000_000] results in memory paging. Going from int[100_000_000] to int[1_000_000_000] results in non-linear performance degradation (100x instead of 10x). Essentially a CPU-bound operation becomes I/O-bound. Instead of adding numbers, the program spends most of its time reading segments of the array from the disk. In these conditions using multiple threads can be detrimental for the overall performance, because the pattern of accessing the storage device becomes more erratic and less streamlined.
Maybe something similar happens on your 8GB RAM machine too, but I can't say for sure.

Parallel.ForEach search doesn't find the correct value

This is my first attempt at parallel programming.
I'm writing a test console app before using this in my real app and I can't seem to get it right. When I run this, the parallel search is always faster than the sequential one, but the parallel search never finds the correct value. What am I doing wrong?
I tried it without using a partitioner (just Parallel.For); it was slower than the sequential loop and gave the wrong number. I saw a Microsoft doc that said for simple computations, using Partitioner.Create can speed things up. So I tried that but still got the wrong values. Then I saw Interlocked, but I think I'm using it wrong.
Any help would be greatly appreciated
Random r = new Random();
Stopwatch timer = new Stopwatch();
do {
    // Make and populate a list
    List<short> test = new List<short>();
    for (int x = 0; x <= 10000000; x++)
    {
        test.Add((short)(r.Next(short.MaxValue) * r.NextDouble()));
    }
    // Initialize result variables
    short rMin = short.MaxValue;
    short rMax = 0;
    // Do min/max normal search
    timer.Start();
    foreach (var amp in test)
    {
        rMin = Math.Min(rMin, amp);
        rMax = Math.Max(rMax, amp);
    }
    timer.Stop();
    // Display results
    Console.WriteLine($"rMin: {rMin} rMax: {rMax} Time: {timer.ElapsedMilliseconds}");
    // Initialize parallel result variables
    short pMin = short.MaxValue;
    short pMax = 0;
    // Create list partitioner
    var rangePortioner = Partitioner.Create(0, test.Count);
    // Do min/max parallel search
    timer.Restart();
    Parallel.ForEach(rangePortioner, (range, loop) =>
    {
        short min = short.MaxValue;
        short max = 0;
        for (int i = range.Item1; i < range.Item2; i++)
        {
            min = Math.Min(min, test[i]);
            max = Math.Max(max, test[i]);
        }
        _ = Interlocked.Exchange(ref Unsafe.As<short, int>(ref pMin), Math.Min(pMin, min));
        _ = Interlocked.Exchange(ref Unsafe.As<short, int>(ref pMax), Math.Max(pMax, max));
    });
    timer.Stop();
    // Display results
    Console.WriteLine($"pMin: {pMin} pMax: {pMax} Time: {timer.ElapsedMilliseconds}");
    Console.WriteLine("Press enter to run again; any other key to quit");
} while (Console.ReadKey().Key == ConsoleKey.Enter);
Sample output:
rMin: 0 rMax: 32746 Time: 106
pMin: 0 pMax: 32679 Time: 66
Press enter to run again; any other key to quit

The correct way to do a parallel search like this is to compute local values for each thread used, and then merge the values at the end. This ensures that synchronization is only needed at the final phase:
var items = Enumerable.Range(0, 10000).ToList();
int globalMin = int.MaxValue;
int globalMax = int.MinValue;
Parallel.ForEach<int, (int Min, int Max)>(
    items,
    () => (int.MaxValue, int.MinValue), // Create new min/max values for each thread used
    (item, state, localMinMax) =>
    {
        var localMin = Math.Min(item, localMinMax.Min);
        var localMax = Math.Max(item, localMinMax.Max);
        return (localMin, localMax); // return the new min/max values for this thread
    },
    localMinMax => // called one last time for each thread used
    {
        lock(items) // Since this may run concurrently, synchronization is needed
        {
            globalMin = Math.Min(globalMin, localMinMax.Min);
            globalMax = Math.Max(globalMax, localMinMax.Max);
        }
    });
As you can see, this is quite a bit more complex than a regular loop, and it is not even doing anything fancy like partitioning. An optimized solution would work over larger blocks to reduce overhead, but this is omitted for simplicity, and it looks like the OP is already aware of such issues.
Be aware that multithreaded programming is difficult. While it is a great idea to try out such techniques in a playground rather than a real program, I would still suggest that you start by studying the potential dangers around thread safety; it is fairly easy to find good resources about this.
Not all problems will be as obviously wrong as this one, and it is quite easy to cause issues that only appear once in a million runs, or only when the CPU load is high, or only on single-CPU systems, or that are only detected long after the code is put into production. It is good practice to be paranoid whenever multiple threads may read and write the same memory concurrently.
I would also recommend learning about immutable data types, and pure functions, since these are much safer and easier to reason about once multiple threads are involved.
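For completeness, the "larger blocks" variant mentioned above could look roughly like the following. It is a sketch that combines the range partitioner from the question with the per-thread local values and final lock from this answer; MinMaxSketch and Find are hypothetical names, and the performance has not been measured.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical helper: min/max over index ranges with one merge per worker thread.
public static class MinMaxSketch
{
    public static (short Min, short Max) Find(IReadOnlyList<short> values)
    {
        if (values == null) throw new ArgumentNullException(nameof(values));
        if (values.Count == 0) throw new ArgumentException("At least one value is required.", nameof(values));

        short globalMin = short.MaxValue;
        short globalMax = short.MinValue;
        object gate = new object();

        Parallel.ForEach<Tuple<int, int>, (short Min, short Max)>(
            Partitioner.Create(0, values.Count),   // index ranges [Item1, Item2)
            () => (short.MaxValue, short.MinValue), // fresh local min/max per thread
            (range, state, local) =>
            {
                for (int i = range.Item1; i < range.Item2; i++)
                {
                    short v = values[i];
                    if (v < local.Min) local.Min = v;
                    if (v > local.Max) local.Max = v;
                }
                return local;
            },
            local =>
            {
                lock (gate) // the merge is the only place shared state is touched
                {
                    if (local.Min < globalMin) globalMin = local.Min;
                    if (local.Max > globalMax) globalMax = local.Max;
                }
            });

        return (globalMin, globalMax);
    }
}
```

Note that the local maximum starts at short.MinValue rather than 0, so the result is also correct for lists that contain only negative values.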

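Interlocked.Exchange makes only the exchange itself atomic. The Math.Min and Math.Max calls read pMin and pMax before the exchange happens, so the read-compare-write sequence as a whole is still subject to a race condition. You should compute the min/max for each batch separately and then combine the results.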

Using low-lock techniques like the Interlocked class is tricky and advanced. Taking into consideration that your experience in multithreading is not excessive, I would say go with a simple and trusty lock:
object locker = new object();
//...
lock (locker)
{
    pMin = Math.Min(pMin, min);
    pMax = Math.Max(pMax, max);
}

Why does comparing two identical byte arrays take different amounts of time to complete?

I am trying to compute hashes and then compare them to simulate a timing attack in C#.
This is the code I am using for this purpose:
private void btnHash_Click(object sender, EventArgs e)
{
    MD5 md5 = new MD5CryptoServiceProvider();
    var firstHashByte = md5.ComputeHash(ASCIIEncoding.ASCII.GetBytes(txtBoxText.Text));
    txtBoxHash.Text = Convert.ToBase64String(firstHashByte);
    var secondHashByte = md5.ComputeHash(ASCIIEncoding.ASCII.GetBytes(txtBoxSecondText.Text));
    txtBoxHashtwo.Text = Convert.ToBase64String(secondHashByte);
    Stopwatch stopwatch = new Stopwatch();
    stopwatch.Start();
    NormalEquality(firstHashByte, secondHashByte);
    //SlowEquals(firstHashByte, secondHashByte);
    stopwatch.Stop();
    lstBoxTimeSpan.Items.Add(stopwatch.ElapsedTicks.ToString());
}
private static void NormalEquality(byte[] hashByte, byte[] hashByte2)
{
    bool bEqual = false;
    if (hashByte.Length == hashByte2.Length)
    {
        int i = 0;
        while ((i < hashByte2.Length) && (hashByte2[i] == hashByte[i]))
        {
            i += 1;
        }
        if (i == hashByte2.Length)
        {
            bEqual = true;
        }
    }
}
Each time I run this, I get different times, even for identical hashes!
Why is this happening?
I also noticed that the following method, which is said to take constant time for both identical and different hashes, fails to do so. It behaves just like the previous method, producing different times for any input (identical hashes or different hashes)!
private static bool SlowEquals(byte[] a, byte[] b)
{
    uint diff = (uint)a.Length ^ (uint)b.Length;
    for (int i = 0; i < a.Length && i < b.Length; i++)
        diff |= (uint)(a[i] ^ b[i]);
    return diff == 0;
}
Why is it like this? Any ideas?
(By the way, as a side question:
Does the C# string == comparison internally do this kind of array comparison, or is that another story?
Whenever I tried to use == on the Base64 versions of the hashes, I got a time of 0, both for identical and for different hashes.
I did:
    stopwatch.Start();
    if ( firstHashString == secondHashString);
    stopwatch.Stop();
)

You get different times because the stopwatch resolution is too low to measure so little code. Even though the resolution of the stopwatch is very high, the CPU still has time to run thousands of instructions between every tick of the stopwatch.
During the execution of the method the stopwatch will only advance a few ticks, so the resulting times will vary a great deal.
If you run the method for example a thousand times or a million times, you get enough work to measure with small enough time variations.

Any number of things affect the timing of method calls. A computer is a vast system, and the tiniest change can become the proverbial butterfly. To be frank, tiny timing differences are nothing to worry about; what matters is whether the code always produces the correct result.
One thing to try might be repeating the method call many times, e.g. a million or ten million times, and timing all of the calls together, e.g.
stopwatch.Start();
for (int i=0; i<1000000; i++) {
    // call test here
}
stopwatch.Stop();
If you repeat the above a few times, the timings should be pretty close together.
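Applied to the comparison in the question, the batch approach could be sketched as follows. BatchTiming and AverageNanoseconds are made-up names, SlowEquals is copied from the question, and the warm-up call is an added assumption so that JIT compilation is not included in the measured batch. This is only meant to give steadier numbers, not to prove that a comparison is constant-time.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper for timing a very short operation as one large batch.
public static class BatchTiming
{
    // Comparison from the question: always walks the common length of both arrays.
    public static bool SlowEquals(byte[] a, byte[] b)
    {
        uint diff = (uint)a.Length ^ (uint)b.Length;
        for (int i = 0; i < a.Length && i < b.Length; i++)
            diff |= (uint)(a[i] ^ b[i]);
        return diff == 0;
    }

    // Times 'iterations' calls together and returns the average nanoseconds per call.
    public static double AverageNanoseconds(Func<bool> call, int iterations)
    {
        if (call == null) throw new ArgumentNullException(nameof(call));
        if (iterations <= 0) throw new ArgumentOutOfRangeException(nameof(iterations));

        call(); // warm-up: keeps first-call JIT cost out of the batch

        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            call();
        }
        stopwatch.Stop();

        // TotalMilliseconds is fractional, so sub-millisecond batches still give a value.
        return stopwatch.Elapsed.TotalMilliseconds * 1_000_000.0 / iterations;
    }
}
```

For example, BatchTiming.AverageNanoseconds(() => BatchTiming.SlowEquals(firstHashByte, secondHashByte), 1_000_000) could be run a few times for identical and for different hashes, and the averages compared.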

Performance profiling in .NET

I wrote a class which uses Stopwatch to profile methods and for/foreach loops. With for and foreach loops it tests a standard loop against a Parallel.For or Parallel.ForEach implementation.
You would write performance tests like so:
Method:
PerformanceResult result = Profiler.Execute(() => { FooBar(); });
For loop:
SerialParallelPerformanceResult result = Profiler.For(0, 100, x => { FooBar(x); });
ForEach loop:
SerialParallelPerformanceResult result = Profiler.ForEach(list, item => { FooBar(item); });
Whenever I run the tests (one of .Execute, .For or .ForEach) I put them in a loop so I can see how the performance changes over time.
Example of performance might be:
Method execution 1 = 200ms
Method execution 2 = 12ms
Method execution 3 = 0ms
For execution 1 = 300ms (Serial), 100ms (Parallel)
For execution 2 = 20ms (Serial), 75ms (Parallel)
For execution 3 = 2ms (Serial), 50ms (Parallel)
ForEach execution 1 = 350ms (Serial), 300ms (Parallel)
ForEach execution 2 = 24ms (Serial), 89ms (Parallel)
ForEach execution 3 = 1ms (Serial), 21ms (Parallel)
My questions are:
Why does performance change over time, what is .NET doing in the background to facilitate this?
How/why is a serial operation faster than a parallel one? I have made sure that I make the operations complex to see the difference properly...in most cases serial operations seem faster!?
NOTE: For parallel processing I am testing on an 8 core machine.

After some more exploration into performance profiling, I have discovered that using a Stopwatch is not an accurate way to measure the performance of a particular task.
(Thanks hatchet and Loren for your comments on this!)
Reasons a Stopwatch is not accurate:
Measurements are calculated in elapsed time in milliseconds, not CPU time.
Measurements can be influenced by background "noise" and thread intensive processes.
Measurements do not take into account JIT compilation and overhead.
That being said, using a stopwatch is OK for casual exploration of performance. With that in mind, I have improved my profiling algorithm somewhat.
Where before it simply executed the expression that was passed to it, it now has the facility to iterate over the expression several times, building an average execution time. The first run can be omitted since this is where JIT kicks in, and some major overhead may occur. Understandably, this will never be as sophisticated as using a professional profiling tool like Redgate's ANTS profiler, but it's OK for simpler tasks!
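The improved approach described above (run several times, drop the first run, average the rest) could be sketched like this. It is not the actual Profiler class from the question, which isn't shown; SimpleProfiler, Measure and AverageMilliseconds are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

// Hypothetical stand-in for the profiler described in the text.
public static class SimpleProfiler
{
    // Runs 'action' runs + 1 times and returns the timings in milliseconds,
    // leaving out the first run, where JIT compilation usually happens.
    public static IReadOnlyList<double> Measure(Action action, int runs)
    {
        if (action == null) throw new ArgumentNullException(nameof(action));
        if (runs <= 0) throw new ArgumentOutOfRangeException(nameof(runs));

        var timings = new List<double>(runs);
        for (int i = 0; i <= runs; i++)
        {
            var stopwatch = Stopwatch.StartNew();
            action();
            stopwatch.Stop();
            if (i > 0) // i == 0 is the discarded warm-up run
            {
                timings.Add(stopwatch.Elapsed.TotalMilliseconds);
            }
        }
        return timings;
    }

    // Average of the retained runs, in milliseconds.
    public static double AverageMilliseconds(Action action, int runs)
    {
        return Measure(action, runs).Average();
    }
}
```

Keeping the individual timings rather than only the average makes it easier to spot outliers caused by background noise.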

As per my comment above: I did some simple tests on my own and found no difference over time. Can you share your code? I'll put mine in an answer as it doesn't fit here.
This is my sample code.
(I also tried with both static and instance methods with no difference)
class Program
{
    static void Main(string[] args)
    {
        int to = 50000000;
        OtherStuff os = new OtherStuff();
        Console.WriteLine(Profile(() => os.CountTo(to)));
        Console.WriteLine(Profile(() => os.CountTo(to)));
        Console.WriteLine(Profile(() => os.CountTo(to)));
    }
    static long Profile(Action method)
    {
        Stopwatch st = Stopwatch.StartNew();
        method();
        st.Stop();
        return st.ElapsedMilliseconds;
    }
}
class OtherStuff
{
    public void CountTo(int to)
    {
        for (int i = 0; i < to; i++)
        {
            // some work...
            i++;
            i--;
        }
    }
}
A sample output would be:
331
331
334
Consider executing this method instead:
class OtherStuff
{
    public string CountTo(Guid id)
    {
        using(SHA256 sha = SHA256.Create())
        {
            int x = default(int);
            for (int index = 0; index < 16; index++)
            {
                x = id.ToByteArray()[index] >> 32 << 16;
            }
            RNGCryptoServiceProvider rng = new RNGCryptoServiceProvider();
            byte[] y = new byte[1024];
            rng.GetBytes(y);
            y = y.Concat(BitConverter.GetBytes(x)).ToArray();
            return BitConverter.ToString(sha.ComputeHash(BitConverter.GetBytes(x).Where(o => o >> 2 < 0).ToArray()));
        }
    }
}
Sample output:
11
0
0
