Parallel.Foreach - last item waits for other items

Parallel.Foreach - last item waits for other items - c#

I've written a parellel.foreach, but the last iteration allways waits for execution before all other items have been processed, in this case, number 10 fires only when 1-9 are ready, tried it a couple of times, why is that and can i fix that?
here is my code:
class Program
{
public static void Main(string[] args)
{
int Min = 100;
int Max = 200;
Random randNum = new Random();
int[] test2 = Enumerable.Repeat(0, 10).Select(i => randNum.Next(Min, Max)).ToArray();
string[] result = test2.Select(x => x.ToString()).ToArray();
int count = 0;
var options = new ParallelOptions { MaxDegreeOfParallelism = 4 };
Parallel.ForEach(result, options, rec =>
{
count++;
go(count);
});
Console.WriteLine("Ready!");
Console.ReadKey();
}
public static void go(int counter)
{
Console.WriteLine("START: " + counter.ToString());
Thread.Sleep(3500);
Console.WriteLine("END: " + counter.ToString());
}
}
EDIT
I ran the program for 20-30 times and it seems that a few times 10 is started before 9 ended. I think this happens in about 4 or 5 percent of the times. It's (working properly) happening some sort of random. because it works sometimes it feels a bit buggy, but not a big issue for the performance of my application which i'm building. You understand the above was a testcase, not a part of the actual project :)

Related

Show program runtime every 3 seconds

I'm learning async in C# and want to show a program runtime every three seconds. I have two solutions, but neither works completely properly.
Solution 1
In the first solution, I have a loop where I call two methods. The first performs the calculation and the second shows the startup time, which is divisible by 3.
namespace Async
{
class Program
{
static void Main(string[] args)
{
PerformLoop();
Console.ReadLine();
}
public static async void PerformLoop()
{
Stopwatch timer = new Stopwatch();
timer.Start();
List<Task> l = new List<Task>();
for (int i = 0; i < 50; i++)
{
l.Add(AsyncCalculation(i));
l.Add(ShowTime(Convert.ToInt32(timer.Elapsed.TotalMilliseconds)));
}
await Task.WhenAll(l);
timer.Stop();
Console.WriteLine("Total execution time: " +
timer.Elapsed.TotalMilliseconds);
}
public async static Task AsyncCalculation(int i)
{
var result = 10 * i;
Console.WriteLine("Calculation result: " + result);
}
public async static Task ShowTime(int execTime)
{
if (execTime % 3 == 0)
{
Console.WriteLine("Execution time: " + execTime);
}
}
}
}
Solution 2
In the second solution, I call two methods in a loop. The first performs calculations and the second displays the operation time after 3 seconds. Unfortunately, in this case, the second method blocks the execution of the first.
namespace Async
{
class Program
{
static void Main(string[] args)
{
CallMethod();
Console.ReadLine();
}
public static async void CallMethod()
{
for (int i = 0; i < 50; i++)
{
var results = Calculation(i);
var calcResult = results.Item1;
var time = results.Item2;
ShowResult(calcResult);
await ShowDelayTime(time);
}
}
public static void ShowResult(int calcResult)
{
Console.WriteLine("Calculation result: " + calcResult);
}
public async static Task ShowDelayTime(int execTime)
{
await Task.Delay(3000);
Console.WriteLine("Execution time: " + execTime);
}
public static Tuple<int, int> Calculation(int i)
{
Stopwatch stopwatch = new Stopwatch();
stopwatch.Start();
var result = 10 * i;
stopwatch.Stop();
return Tuple.Create(result,
Convert.ToInt32(stopwatch.Elapsed.TotalMilliseconds));
}
}
}
I have no idea how to continuously display the calculation results and show the running time of the program by three seconds :(((
Edit
Expected output (example):
Calculation result: 0
Calculation result: 10
Execution time: 3 seconds
Calculation result: 20
Calculation result: 30
Calculation result: 40
Execution time: 6 seconds
Calcultion result: 50
//Next iterations
The program now shows the result, waits three seconds, and then go to next iteration. I want iterations for calculations to show regardless (independently) of time. I want the time to show every three seconds of the program running.

You could use a System.Threading.Timer in order to invoke a callback every 3 seconds, and a Stopwatch in order to measure the elapsed seconds since the start of the program:
var stopwatch = Stopwatch.StartNew();
var timer = new System.Threading.Timer(_ =>
{
Console.WriteLine($"Execution time: {stopwatch.Elapsed.TotalSeconds:#,0} seconds");
}, null, 3000, 3000);

Multithreading is taking more time than sequential threading

I am new to C#
I am generating random numbers saving into an integer array of size 1 million, then I search user input number and its occurrences in an array using single thread then I search it using 5 threads. My processor has 4 cores.
THE PROBLEM is multithreading is taking way more time than sequential I just cannot figure out why any help would be much appreciated.
Here is the code.
namespace LAB_2
{
class Program
{
static int[] arr = new int[1000000];
static int counter = 0, c1 = 0, c2 = 0, c3 = 0, c4 = 0,c5=0;
static int x = 0;
#if DEBUG
static void Main(string[] args)
{
try
{
//Take input
generate();
Console.WriteLine("Enter number to search for its occurances");
x = Console.Read();
//Multithreaded search
Stopwatch stopwatch2 = Stopwatch.StartNew();
multithreaded_search();
stopwatch2.Stop();
Console.WriteLine("Multithreaded search");
Console.WriteLine("Total milliseconds with multiple threads = " + stopwatch2.ElapsedMilliseconds);
//search without multithreading
Stopwatch stopwatch = Stopwatch.StartNew();
search();
stopwatch.Stop();
Console.WriteLine("Total milliseconds without multiple threads = " + stopwatch.ElapsedMilliseconds);
}
finally
{
Console.WriteLine("Press enter to close...");
Console.ReadLine();
}
#endif
}
public static void generate() //Populate the array
{
Random rnd = new Random();
for (int i = 0; i < 1000000; i++)
{
arr[i] = rnd.Next(1, 500000);
}
}
public static void search() //single threaded/Normal searching
{
int counter = 0;
for (int i = 0; i < 1000000; i++)
{
if (x == arr[i])
{
counter++;
}
}
Console.WriteLine("Number of occurances " + counter);
}
public static void multithreaded_search()
{
Task thr1 = Task.Factory.StartNew(() => doStuff(0, 200000, "c1"));
Task thr2 = Task.Factory.StartNew(() => doStuff(200001, 400000, "c2"));
Task thr3 = Task.Factory.StartNew(() => doStuff(400001, 600000, "c3"));
Task thr4 = Task.Factory.StartNew(() => doStuff(600001, 800000, "c4"));
Task thr5 = Task.Factory.StartNew(() => doStuff(800001, 1000000, "c5"));
//IF I don't use WaitAll then the search is
//faster than sequential, but gets compromised
Task.WaitAll(thr1, thr2, thr3, thr4, thr5);
counter = c1 + c2 + c3 + c4 + c5;
Console.WriteLine("Multithreaded search");
Console.WriteLine("Number of occurances " + counter);
}
static void doStuff(int stime, int etime, String c)
{
for (int i = stime; i < etime; i++)
{
if (x == arr[i])
{
switch (c)
{
case "c1":
c1++;
break;
case "c2":
c2++;
break;
case "c3":
c3++;
break;
case "c4":
c4++;
break;
case "c5":
c5++;
break;
};
}
Thread.Yield();
}
}
}
}

First, in your doStuff you do more work than in search. While it is not likely to have a tangible effect, you never know.
Second, Thread.Yield is a killer with tasks. This methods is intended to be used in very marginal situations like spinning when you think a lock might be too expensive. Here, it is just a brake to your code, causing the OS scheduler to do more work, perhaps even do a context-switch on the current core, which in turn will invalidate the cache.
Finally, your data and computations are small. Moderns CPUs will enumerate such an array in no time, and it is likely a great part of it, or even all, fits in the cache. Concurrent processing has its overhead.
I recommend Benchmark.NET.

C# Multi threading using Task class

I've been trying to implement multi threading which looks something like this:
static void Main(string[] args)
{
List<Task> tskList = new List<Task>();
for (int i = 0; i < 100; i++)
{
Task taskTemp = new Task(() => { Display(i); });
taskTemp.Start();
tskList.Add(taskTemp);
//Thread.Sleep(10);
}
Task.WaitAll(tskList.ToArray());
}
public static void Display(int value)
{
Thread.Sleep(1000);
Console.WriteLine(value);
}
Without the Thread.Sleep(10) part, I get output printed as 100 times "100" instead of 0 to 99 which I'm getting with that sleep time of 10 ms.
My guess is that this could be happening because of the time required to schedule the thread by the system and by the time the thread is about to actually start, the value has reached 100.
If I put enough wait time (say 1000 ms instead of 10), will it be guaranteed to not have this problem? Or should I suspect that the system may take even more time to schedule the thread when CPU utilization is too much? What is the best way to solve this problem?
Thanks in advance for any inputs!

you should add a local variable to hold 'i', such as :
for (int i = 0; i < 100; i++)
{
var t = i;
Task taskTemp = new Task(() => { Display(t); });
taskTemp.Start();
tskList.Add(taskTemp);
//Thread.Sleep(10);
}

Just make a copy of "i" to "i1" and use it as local variable. "i" is always changed, thats why you get 100 100 100....:
private static void Main(string[] args)
{
var tskList = new List<Task>();
for (var i = 0; i < 100; i++)
{
var i1 = i;
var taskTemp = new Task(() => { Display(i1); });
taskTemp.Start();
tskList.Add(taskTemp);
}
Task.WaitAll(tskList.ToArray());
}
public static void Display(int value)
{
Thread.Sleep(1000);
Console.WriteLine(value);
}

Code sample that shows casting to uint is more efficient than range check

So I am looking at this question and the general consensus is that uint cast version is more efficient than range check with 0. Since the code is also in MS's implementation of List I assume it is a real optimization. However I have failed to produce a code sample that results in better performance for the uint version. I have tried different tests and there is something missing or some other part of my code is dwarfing the time for the checks. My last attempt looks like this:
class TestType
{
public TestType(int size)
{
MaxSize = size;
Random rand = new Random(100);
for (int i = 0; i < MaxIterations; i++)
{
indexes[i] = rand.Next(0, MaxSize);
}
}
public const int MaxIterations = 10000000;
private int MaxSize;
private int[] indexes = new int[MaxIterations];
public void Test()
{
var timer = new Stopwatch();
int inRange = 0;
int outOfRange = 0;
timer.Start();
for (int i = 0; i < MaxIterations; i++)
{
int x = indexes[i];
if (x < 0 || x > MaxSize)
{
throw new Exception();
}
inRange += indexes[x];
}
timer.Stop();
Console.WriteLine("Comparision 1: " + inRange + "/" + outOfRange + ", elapsed: " + timer.ElapsedMilliseconds + "ms");
inRange = 0;
outOfRange = 0;
timer.Reset();
timer.Start();
for (int i = 0; i < MaxIterations; i++)
{
int x = indexes[i];
if ((uint)x > (uint)MaxSize)
{
throw new Exception();
}
inRange += indexes[x];
}
timer.Stop();
Console.WriteLine("Comparision 2: " + inRange + "/" + outOfRange + ", elapsed: " + timer.ElapsedMilliseconds + "ms");
}
}
class Program
{
static void Main()
{
TestType t = new TestType(TestType.MaxIterations);
t.Test();
TestType t2 = new TestType(TestType.MaxIterations);
t2.Test();
TestType t3 = new TestType(TestType.MaxIterations);
t3.Test();
}
}
The code is a bit of a mess because I tried many things to make uint check perform faster like moving the compared variable into a field of a class, generating random index access and so on but in every case the result seems to be the same for both versions. So is this change applicable on modern x86 processors and can someone demonstrate it somehow?
Note that I am not asking for someone to fix my sample or explain what is wrong with it. I just want to see the case where the optimization does work.

if (x < 0 || x > MaxSize)
The comparison is performed by the CMP processor instruction (Compare). You'll want to take a look at Agner Fog's instruction tables document (PDF), it list the cost of instructions. Find your processor back in the list, then locate the CMP instruction.
For mine, Haswell, CMP takes 1 cycle of latency and 0.25 cycles of throughput.
A fractional cost like that could use an explanation, Haswell has 4 integer execution units that can execute instructions at the same time. When a program contains enough integer operations, like CMP, without an interdependency then they can all execute at the same time. In effect making the program 4 times faster. You don't always manage to keep all 4 of them busy at the same time with your code, it is actually pretty rare. But you do keep 2 of them busy in this case. Or in other words, two comparisons take just as long as single one, 1 cycle.
There are other factors at play that make the execution time identical. One thing helps is that the processor can predict the branch very well, it can speculatively execute x > MaxSize in spite of the short-circuit evaluation. And it will in fact end up using the result since the branch is never taken.
And the true bottleneck in this code is the array indexing, accessing memory is one of the slowest thing the processor can do. So the "fast" version of the code isn't faster even though it provides more opportunity to allow the processor to concurrently execute instructions. It isn't much of an opportunity today anyway, a processor has too many execution units to keep busy. Otherwise the feature that makes HyperThreading work. In both cases the processor bogs down at the same rate.
On my machine, I have to write code that occupies more than 4 engines to make it slower. Silly code like this:
if (x < 0 || x > MaxSize || x > 10000000 || x > 20000000 || x > 3000000) {
outOfRange++;
}
else {
inRange++;
}
Using 5 compares, now I can a difference, 61 vs 47 msec. Or in other words, this is a way to count the number of integer engines in the processor. Hehe :)
So this is a micro-optimization that probably used to pay off a decade ago. It doesn't anymore. Scratch it off your list of things to worry about :)

I would suggest attempting code which does not throw an exception when the index is out of range. Exceptions are incredibly expensive and can completely throw off your bench results.
The code below does a timed-average bench for 1,000 iterations of 1,000,000 results.
using System;
using System.Diagnostics;
namespace BenchTest
{
class Program
{
const int LoopCount = 1000000;
const int AverageCount = 1000;
static void Main(string[] args)
{
Console.WriteLine("Starting Benchmark");
RunTest();
Console.WriteLine("Finished Benchmark");
Console.Write("Press any key to exit...");
Console.ReadKey();
}
static void RunTest()
{
int cursorRow = Console.CursorTop; int cursorCol = Console.CursorLeft;
long totalTime1 = 0; long totalTime2 = 0;
long invalidOperationCount1 = 0; long invalidOperationCount2 = 0;
for (int i = 0; i < AverageCount; i++)
{
Console.SetCursorPosition(cursorCol, cursorRow);
Console.WriteLine("Running iteration: {0}/{1}", i + 1, AverageCount);
int[] indexArgs = RandomFill(LoopCount, int.MinValue, int.MaxValue);
int[] sizeArgs = RandomFill(LoopCount, 0, int.MaxValue);
totalTime1 += RunLoop(TestMethod1, indexArgs, sizeArgs, ref invalidOperationCount1);
totalTime2 += RunLoop(TestMethod2, indexArgs, sizeArgs, ref invalidOperationCount2);
}
PrintResult("Test 1", TimeSpan.FromTicks(totalTime1 / AverageCount), invalidOperationCount1);
PrintResult("Test 2", TimeSpan.FromTicks(totalTime2 / AverageCount), invalidOperationCount2);
}
static void PrintResult(string testName, TimeSpan averageTime, long invalidOperationCount)
{
Console.WriteLine(testName);
Console.WriteLine(" Average Time: {0}", averageTime);
Console.WriteLine(" Invalid Operations: {0} ({1})", invalidOperationCount, (invalidOperationCount / (double)(AverageCount * LoopCount)).ToString("P3"));
}
static long RunLoop(Func<int, int, int> testMethod, int[] indexArgs, int[] sizeArgs, ref long invalidOperationCount)
{
Stopwatch sw = new Stopwatch();
Console.Write("Running {0} sub-iterations", LoopCount);
sw.Start();
long startTickCount = sw.ElapsedTicks;
for (int i = 0; i < LoopCount; i++)
{
invalidOperationCount += testMethod(indexArgs[i], sizeArgs[i]);
}
sw.Stop();
long stopTickCount = sw.ElapsedTicks;
long elapsedTickCount = stopTickCount - startTickCount;
Console.WriteLine(" - Time Taken: {0}", new TimeSpan(elapsedTickCount));
return elapsedTickCount;
}
static int[] RandomFill(int size, int minValue, int maxValue)
{
int[] randomArray = new int[size];
Random rng = new Random();
for (int i = 0; i < size; i++)
{
randomArray[i] = rng.Next(minValue, maxValue);
}
return randomArray;
}
static int TestMethod1(int index, int size)
{
return (index < 0 || index >= size) ? 1 : 0;
}
static int TestMethod2(int index, int size)
{
return ((uint)(index) >= (uint)(size)) ? 1 : 0;
}
}
}

You aren't comparing like with like.
The code you were talking about not only saved one branch by using the optimisation, but also 4 bytes of CIL in a small method.
In a small method 4 bytes can be the difference in being inlined and not being inlined.
And if the method calling that method is also written to be small, then that can mean two (or more) method calls are jitted as one piece of inline code.
And maybe some of it is then, because it is inline and available for analysis by the jitter, optimised further again.
The real difference is not between index < 0 || index >= _size and (uint)index >= (uint)_size, but between code that has repeated efforts to minimise the method body size and code that does not. Look for example at how another method is used to throw the exception if necessary, further shaving off a couple of bytes of CIL.
(And no, that's not to say that I think all methods should be written like that, but there certainly can be performance differences when one does).

Multithreaded code to do work using configured number of thread

I want to create a multithreaded application code. I want to execute configured no of threads and each thread do the work. I want to know is this the write approach or do we have better approach. All the threads needs to be executed asynchronously.
public static bool keepThreadsAlive = false;
static void Main(string[] args)
{
Program pgm = new Program();
int noOfThreads = 4;
keepThreadsAlive = true;
for (int i = 1; i <= noOfThreads; i++)
{
ThreadPool.QueueUserWorkItem(new WaitCallback(DoWork), (object)i);
}
System.Console.ReadLine();
StopAllThreads();
System.Console.ReadLine();
}
private static void DoWork(object threadNumber)
{
int threadNum = (int)threadNumber;
int counter = 1;
while (keepThreadsAlive)
{
counter = ProcessACK(threadNum, counter);
}
}
private static int ProcessACK(int threadNum, int counter)
{
System.Console.WriteLine("Thread {0} count {1}", threadNum, counter++);
Random ran = new Random();
int randomNumber = ran.Next(5000, 100000);
for (int i = 0; i < randomNumber; i++) ;
Thread.Sleep(2000);
return counter;
}

As others have pointed out, the methods you are using are dated and not as elegant as the more modern C# approach to accomplishing the same tasks.
Have a look at System.Threading.Tasks for an overview of what is available to you these days. There is even a way to set the maximum threads used in a parallel operation. Here is a simple (pseudocode) example:
Parallel.ForEach(someListOfItems, new ParallelOptions { MaxDegreeOfParallelism = 8 }, item =>
{
//do stuff for each item in "someListOfItems" using a maximum of 8 threads.
});
Hope this helps.

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

Parallel.Foreach - last item waits for other items - c#

Related

Show program runtime every 3 seconds

Multithreading is taking more time than sequential threading

C# Multi threading using Task class

Code sample that shows casting to uint is more efficient than range check

Multithreaded code to do work using configured number of thread

Categories

Resources