Running one process after another using multithreading and C#

I use multithreading to process a list of data.
In the example below, for each element, how can I make sure "SecondProcess" always runs after "FirstProcess" finishes? The order in which the elements of the queue are processed doesn't really matter.
public class Processor
{
    public void Process()
    {
        IList<int> queue = QueueGenerator.GetRandomInt(50); // gets a list of 50 unique random integers
        foreach (int eachElement in queue)
        {
            ThreadPool.QueueUserWorkItem(new WaitCallback(FirstProcess), eachElement);
        }
        Console.ReadLine();
    }

    private void FirstProcess(object toProcess)
    {
        int i = 0;
        int.TryParse(toProcess.ToString(), out i);
        string odd = "odd";
        string even = "even";
        string toDisplay = (i % 2 == 0)
            ? string.Format("First step: Processing {0} ({1} number)", i, even)
            : string.Format("First step: Processing {0} ({1} number)", i, odd);
        Console.WriteLine(toDisplay);
    }

    private void SecondProcess(object toProcess)
    {
        int i = 0;
        int.TryParse(toProcess.ToString(), out i);
        Console.WriteLine("Second step: Processing -> {0}", i);
    }
}
Any ideas, please?
Thanks

If, instead of
foreach (int eachElement in queue)
{
    ThreadPool.QueueUserWorkItem(new WaitCallback(FirstProcess), eachElement);
}
you did
Parallel.ForEach(queue, eachElement => FirstProcess(eachElement));
this will call the delegate for each item in queue in parallel on ThreadPool threads, but block until all elements have been processed.
This means that by the time the next line of code executes on the calling thread, all the work is complete.
Now, you just do it again:
Parallel.ForEach(queue, eachElement => SecondProcess(eachElement));
Using the Parallel class also has the advantage that it can make use of a partitioner and so effectively queues batched operations to the ThreadPool rather than queuing individual items into the ThreadPool queue.
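Put together, the Process method from the question could then look like this (a sketch only; FirstProcess and SecondProcess keep their object parameters from the question, so each int element is simply boxed into them):

public void Process()
{
    IList<int> queue = QueueGenerator.GetRandomInt(50);

    // First pass: FirstProcess runs for every element in parallel; the call blocks until all are done.
    Parallel.ForEach(queue, eachElement => FirstProcess(eachElement));

    // Second pass: only starts once every FirstProcess call has completed.
    Parallel.ForEach(queue, eachElement => SecondProcess(eachElement));

    Console.ReadLine();
}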

Related

The most efficient way to go through a list and check all of them in a post request

I'm trying to come up with the best solution for going through a list of strings and performing a POST request with each one of them.
My previous attempt was to put the strings into a Queue<String>, start 200 or more threads, and have each thread Dequeue a string from the queue and perform the task. This performed worse than I expected.
What am I doing wrong here?
My code:
class Checker
{
    public Queue<string> pins;

    public Checker()
    {
        pins = GetPins();
        StartThreads(1000);
    }

    public void StartThreads(int threadsCount)
    {
        Console.WriteLine("Starting Threads");
        for (int n = 0; n < 200; n++)
        {
            var thread = new Thread(Printer);
            thread.Name = String.Format("Thread Number ({0})", n);
            thread.Start();
        }
    }

    public Queue<string> GetPins()
    {
        Queue<string> numbers = new Queue<string>();
        for (int n = 0; n < 100000; n++)
        {
            numbers.Enqueue(n.ToString().PadLeft(5, '0'));
        }
        Console.WriteLine("Got Pins");
        return numbers;
    }

    void Printer()
    {
        while (pins.Count > 0)
        {
            var num = pins.Dequeue();
            Console.WriteLine();
            Console.WriteLine(String.Format("{0} - {1}", num, Thread.CurrentThread.Name));
        }
    }
}
As you can see, I generate 100,000 five-digit pins and perform a task on each (output it to the console). Assuming I have 1,000 threads, it should be incredibly fast, but it is not.
Please tell me what I'm doing wrong and what I can improve. Thank you!
Queue is not a thread-safe collection; see ConcurrentQueue.
Your StartThreads() method does not use the threadsCount argument and is therefore only starting 200 threads.
You also need to be careful with raw threads. Consider using Tasks or the ThreadPool instead. IIRC, this lets the runtime decide how many threads it needs depending on the task count.
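A minimal sketch of those suggestions combined, assuming the work stays a simple console write (the class shape, the StartWorkers name, and the worker count are illustrative, not part of the answer above):

using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

class Checker
{
    private readonly ConcurrentQueue<string> pins;   // thread-safe replacement for Queue<string>

    public Checker()
    {
        pins = new ConcurrentQueue<string>();
        for (int n = 0; n < 100000; n++)
        {
            pins.Enqueue(n.ToString().PadLeft(5, '0'));
        }
    }

    public Task StartWorkers(int workerCount)
    {
        var workers = new Task[workerCount];
        for (int n = 0; n < workerCount; n++)   // actually honours the count argument
        {
            workers[n] = Task.Run(() =>
            {
                // TryDequeue is atomic, so two workers can never pop the same pin.
                while (pins.TryDequeue(out string pin))
                {
                    Console.WriteLine(pin);   // stand-in for the real POST request
                }
            });
        }
        return Task.WhenAll(workers);
    }
}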

How to correctly use BlockingCollection.GetConsumingEnumerable?

I'm trying to implement a producer/consumer pattern using BlockingCollection<T>, so I've written up a simple console application to test it.
public class Program
{
    public static void Main(string[] args)
    {
        var workQueue = new WorkQueue();
        workQueue.StartProducingItems();
        workQueue.StartProcessingItems();
        while (true)
        {
        }
    }
}

public class WorkQueue
{
    private BlockingCollection<int> _queue;
    private static Random _random = new Random();

    public WorkQueue()
    {
        _queue = new BlockingCollection<int>();
        // Prefill some items.
        for (int i = 0; i < 100; i++)
        {
            //_queue.Add(_random.Next());
        }
    }

    public void StartProducingItems()
    {
        Task.Run(() =>
        {
            _queue.Add(_random.Next()); // Should be adding items to the queue constantly, but instead adds one and then nothing else.
        });
    }

    public void StartProcessingItems()
    {
        Task.Run(() =>
        {
            foreach (var item in _queue.GetConsumingEnumerable())
            {
                Console.WriteLine("Worker 1: " + item);
            }
        });

        Task.Run(() =>
        {
            foreach (var item in _queue.GetConsumingEnumerable())
            {
                Console.WriteLine("Worker 2: " + item);
            }
        });
    }
}
However, there are three problems with my design:
1. I don't know the correct way of blocking/waiting in my Main method. A simple empty while loop seems terribly inefficient and wastes CPU simply for the sake of keeping the application from ending.
2. In this simple application I have a producer that produces items indefinitely and never stops. In a real-world setup I'd want it to end eventually (e.g. when it runs out of files to process). In that case, how should I wait for it to finish in the Main method? Make StartProducingItems async and then await it?
3. Either GetConsumingEnumerable or Add is not working as I expected. The producer should be adding items constantly, but it adds one item and then never adds any more. That one item is processed by one of the consumers, and both consumers then block waiting for items that never arrive. I know of the Take method, but spinning on Take in a while loop seems wasteful and inefficient. There is a CompleteAdding method, but it does not allow anything else to ever be added and throws an exception if you try, so that is not suitable.
I know for sure that both consumers are in fact blocking and waiting for new items, as I can switch between their threads while debugging.
EDIT:
I've made the changes suggested in one of the comments, but the Task.WhenAll still returns right away.
public Task StartProcessingItems()
{
    var consumers = new List<Task>();
    for (int i = 0; i < 2; i++)
    {
        consumers.Add(Task.Run(() =>
        {
            foreach (var item in _queue.GetConsumingEnumerable())
            {
                Console.WriteLine($"Worker {i}: " + item);
            }
        }));
    }
    return Task.WhenAll(consumers.ToList());
}
GetConsumingEnumerable() is blocking. If you want to add to the queue constantly, you should put the call to _queue.Add in a loop:
public void StartProducingItems()
{
    Task.Run(() =>
    {
        while (true)
            _queue.Add(_random.Next());
    });
}
Regarding the Main() method, you could call Console.ReadLine() to prevent the main thread from finishing before you have pressed a key:
public static void Main(string[] args)
{
    var workQueue = new WorkQueue();
    workQueue.StartProducingItems();
    workQueue.StartProcessingItems();
    Console.WriteLine("Press a key to terminate the application...");
    Console.ReadLine();
}
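If the producer eventually runs out of work (the second point in the question), one sketch is to make it finite, call CompleteAdding when it is done, and await the producer together with the consumers. CompleteAdding is what lets GetConsumingEnumerable end, so the Task.WhenAll from the edited StartProcessingItems actually completes. The finite loop of 1000 items and the async Main (C# 7.1+) are illustrative assumptions:

public Task StartProducingItems()
{
    return Task.Run(() =>
    {
        for (int i = 0; i < 1000; i++)   // illustrative finite workload
        {
            _queue.Add(_random.Next());
        }
        _queue.CompleteAdding();         // lets GetConsumingEnumerable finish on the consumers
    });
}

public static async Task Main(string[] args)
{
    var workQueue = new WorkQueue();
    var producer = workQueue.StartProducingItems();
    var consumers = workQueue.StartProcessingItems();  // the Task-returning version from the edit
    await Task.WhenAll(producer, consumers);           // returns once producer and consumers are done
}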

ConcurrentQueue that allows me to wait on one producer

I have a producer/consumer problem. Currently I have a simple Queue surrounded by a lock.
I'm trying to replace it with something more efficient.
My first choice was to use a ConcurrentQueue, but I don't see how to make my consumer wait on the next produced message (without doing Thread.Sleep).
Also, I would like to be able to clear the whole queue if its size reaches a specific number.
Can you suggest an existing class or implementation that would match my requirements?
Here is an example of how you can use the BlockingCollection class to do what you want:
BlockingCollection<int> blocking_collection = new BlockingCollection<int>();

// Create producer on a thread-pool thread
Task.Run(() =>
{
    int number = 0;
    while (true)
    {
        blocking_collection.Add(number++);
        Thread.Sleep(100); // simulating that the producer produces ~10 items every second
    }
});

int max_size = 10; // Maximum items to have
int items_to_skip = 0;

// Consumer
foreach (var item in blocking_collection.GetConsumingEnumerable())
{
    if (items_to_skip > 0)
    {
        items_to_skip--; // quickly skip items (to meet the clearing requirement)
        continue;
    }

    // process item
    Console.WriteLine(item);
    Thread.Sleep(200); // simulating that the consumer can only process ~5 items per second

    var collection_size = blocking_collection.Count;
    if (collection_size > max_size) // If we reach maximum size, we flag that we want to skip items
    {
        items_to_skip = collection_size;
    }
}
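As an alternative to the skip counter (this variant is my own sketch, not part of the answer above), the consumer could drain the collection directly with TryTake once it grows past max_size:

foreach (var item in blocking_collection.GetConsumingEnumerable())
{
    // process item
    Console.WriteLine(item);

    if (blocking_collection.Count > max_size)
    {
        // Throw away everything currently queued; TryTake returns false once the collection is empty.
        while (blocking_collection.TryTake(out int discarded))
        {
        }
    }
}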

Ensure a long-running task is only fired once and subsequent requests are queued, but with only one entry in the queue

I have a compute-intensive method Calculate that may run for a few seconds; requests come from multiple threads.
Only one Calculate should be executing at a time, and a subsequent request should be queued until the initial request completes. If there is already a request queued, then the subsequent request can be discarded (as the queued request will be sufficient).
There seem to be lots of potential solutions, but I just need the simplest.
UPDATE: Here's my rudimentary attempt:
private int _queueStatus;
private readonly object _queueStatusSync = new Object();

public void Calculate()
{
    lock (_queueStatusSync)
    {
        if (_queueStatus == 2) return;
        _queueStatus++;
        if (_queueStatus == 2) return;
    }
    for (;;)
    {
        CalculateImpl();
        lock (_queueStatusSync)
            if (--_queueStatus == 0) return;
    }
}

private void CalculateImpl()
{
    // long running process will take a few seconds...
}
The simplest, cleanest solution IMO is using TPL Dataflow (as always) with a BufferBlock acting as the queue. BufferBlock is thread-safe, supports async-await, and, more importantly, has TryReceiveAll to get all the items at once. It also has OutputAvailableAsync so you can wait asynchronously for items to be posted to the buffer. When multiple requests are posted, you simply take the last and forget about the rest:
var buffer = new BufferBlock<Request>();
var task = Task.Run(async () =>
{
    while (await buffer.OutputAvailableAsync())
    {
        IList<Request> requests;
        buffer.TryReceiveAll(out requests);
        Calculate(requests.Last());
    }
});
Usage:
buffer.Post(new Request());
buffer.Post(new Request());
Edit: If you don't have any input or output for the Calculate method, you can simply use a boolean to act as a switch. If it's true, you turn it off and calculate; if it becomes true again while Calculate is running, you calculate again:
public bool _shouldCalculate;

public void Producer()
{
    _shouldCalculate = true;
}

public async Task Consumer()
{
    while (true)
    {
        if (!_shouldCalculate)
        {
            await Task.Delay(1000);
        }
        else
        {
            _shouldCalculate = false;
            Calculate();
        }
    }
}
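One caveat with that switch (my own note, not part of the answer): a plain bool field read in a loop from another thread has no memory-visibility guarantee. A sketch of a safer variant replaces the test-and-clear with Interlocked.Exchange, which reads and resets the flag atomically:

private int _shouldCalculate;   // 0 = no pending request, 1 = request pending

public void Producer()
{
    Interlocked.Exchange(ref _shouldCalculate, 1);
}

public async Task Consumer()
{
    while (true)
    {
        // Atomically read the flag and clear it in one step.
        if (Interlocked.Exchange(ref _shouldCalculate, 0) == 1)
        {
            Calculate();
        }
        else
        {
            await Task.Delay(1000);
        }
    }
}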
A BlockingCollection that only takes one item at a time.
The trick is to skip the add if there are any items already in the collection.
I would go with the answer from I3aron (+1).
This is (maybe) a BlockingCollection solution:
public static void BC_AddTakeCompleteAdding()
{
    using (BlockingCollection<int> bc = new BlockingCollection<int>(1))
    {
        // Spin up a Task to populate the BlockingCollection
        using (Task t1 = Task.Factory.StartNew(() =>
        {
            for (int i = 0; i < 100; i++)
            {
                if (bc.TryAdd(i))
                {
                    Debug.WriteLine(" add " + i.ToString());
                }
                else
                {
                    Debug.WriteLine(" skip " + i.ToString());
                }
                Thread.Sleep(30);
            }
            bc.CompleteAdding();
        }))
        {
            // Spin up a Task to consume the BlockingCollection
            using (Task t2 = Task.Factory.StartNew(() =>
            {
                try
                {
                    // Consume the BlockingCollection
                    while (true)
                    {
                        Debug.WriteLine("take " + bc.Take());
                        Thread.Sleep(100);
                    }
                }
                catch (InvalidOperationException)
                {
                    // An InvalidOperationException means that Take() was called on a completed collection
                    Console.WriteLine("That's All!");
                }
            }))
                Task.WaitAll(t1, t2);
        }
    }
}
It sounds like a classic producer-consumer problem. I'd recommend looking into BlockingCollection<T>. It is part of the System.Collections.Concurrent namespace, and on top of it you can implement your queuing logic.
You may supply a BlockingCollection with any internal structure to hold its data, such as a ConcurrentBag<T>, ConcurrentQueue<T>, etc.; the latter is the default structure used.
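A short sketch of that constructor choice (the capacities and types here are illustrative only):

// Default: FIFO behaviour backed by a ConcurrentQueue<T>.
var fifo = new BlockingCollection<int>();

// Explicitly supply the backing store; a ConcurrentBag<T> gives no ordering guarantee.
var unordered = new BlockingCollection<int>(new ConcurrentBag<int>());

// Bounded capacity: Add blocks once 100 items are queued.
var bounded = new BlockingCollection<int>(new ConcurrentQueue<int>(), 100);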

How to work with the queue using the task factory

There is a queue. There is a function that processes messages from this queue. This function takes a message from the queue, starts a new task to process the next message, waits for data from other sources, and then carries out the calculation.
Here is an example:
using System;
using System.Diagnostics;
using System.Text;
using System.Threading;
using System.Threading.Tasks;

namespace TestTaskFactory
{
    class Program
    {
        static int Data = 50;
        static int ActiveTasksNumber = 0;
        static int MaxActiveTasksNumber = 0;
        static Stopwatch clock = new Stopwatch();
        static object locker = new object();
        static object locker2 = new object();

        static void Main(string[] args)
        {
            clock.Start();
            Task.Factory.StartNew(() => DoWork());
            while (true)
            {
                Thread.Sleep(10000);
            }
        }

        public static void DoWork()
        {
            //imitation of getting a message from some queue
            int message = GetMessageFromQueue();
            lock (locker2)
            {
                ActiveTasksNumber++;
                MaxActiveTasksNumber = Math.Max(MaxActiveTasksNumber, ActiveTasksNumber);
                Console.Write("\r" + message + " ");
            }

            //Run new task to work with next message
            Task.Factory.StartNew(() => DoWork());

            //imitation of waiting for some other data
            Thread.Sleep(3000);

            //imitation of calculations with message
            int tmp = 0;
            for (int i = 0; i < 30000000; i++)
            {
                tmp = Math.Max(message, i);
            }

            lock (locker2)
            {
                ActiveTasksNumber--;
            }
        }

        public static int GetMessageFromQueue()
        {
            lock (locker)
            {
                if (Data == 0)
                {
                    //Queue is empty. All tasks completed except one
                    //that is waiting for new data
                    clock.Stop();
                    Console.WriteLine("\rMax active tasks number = " + MaxActiveTasksNumber
                        + "\tTime = " + clock.ElapsedMilliseconds + "ms");
                    Console.Write("Press key to run next iteration");
                    clock.Reset();
                    Console.ReadKey();
                    Console.Write(" ");

                    //In queue received new data. Processing repeats
                    clock.Start();
                    ActiveTasksNumber = 0;
                    MaxActiveTasksNumber = 0;
                    Data = 50;
                }
                Data--;
                return Data;
            }
        }
    }
}
My guess is that when the queue is empty, all tasks are completed except the one task that awaits new data. When data arrives in the queue, the calculations are repeated.
But if you look at the results, the number of simultaneously running tasks increases with every iteration.
Why is this happening?
Test results
Your approach is wrong.
First of all, where is your Queue?
For any jobs you want to queue in a concurrent environment, use ConcurrentQueue.
The concurrent queue is used in the following fashion; it doesn't need to be locked at any time.
// To create your Queue
ConcurrentQueue<string> queue = new ConcurrentQueue<string>();

// To add objects to your Queue
queue.Enqueue("foo");

// To dequeue items from your Queue
String bar;
queue.TryDequeue(out bar);

// To loop a process until your Queue is empty
while (!queue.IsEmpty)
{
    String bar;
    queue.TryDequeue(out bar);
}
Next is how you are incrementing and decrementing your counters; there is a far better, thread-safe way of doing it. Again, the data doesn't need to be locked.
// Change your data type from int to long
static long ActiveTasksNumber = 0;
static long MaxActiveTasksNumber = 0;

// To increment the values in a thread-safe fashion:
Interlocked.Increment(ref ActiveTasksNumber);

// To decrement:
Interlocked.Decrement(ref ActiveTasksNumber);
Implement what I've shown you, and it should make your problems disappear.
Edit:
Namespaces
using System.Collections.Concurrent;
using System.Threading;
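Pulling those pieces together, the question's DoWork might look roughly like this (my own consolidation of the snippets above; the queue of int messages is an assumption):

static ConcurrentQueue<int> queue = new ConcurrentQueue<int>();
static long ActiveTasksNumber = 0;

public static void DoWork()
{
    // No lock needed: TryDequeue is atomic and fails cleanly when the queue is empty.
    if (!queue.TryDequeue(out int message))
        return;

    Interlocked.Increment(ref ActiveTasksNumber);   // thread-safe counter update

    Task.Factory.StartNew(() => DoWork());          // hand the next message to another task

    Thread.Sleep(3000);                             // imitation of waiting for other data
    // ... calculation with message ...

    Interlocked.Decrement(ref ActiveTasksNumber);
}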
To expand on my comment:
You have, in essence, this:
public static void DoWork()
{
    // imitation of getting a message from some queue
    int message = GetMessageFromQueue();

    // Run new task to work with next message
    Task.Factory.StartNew(() => DoWork());

    // do some work
}
Your code is going to get the first message, start a task to work with the next item, and then do its work. While the first task is working, the second gets an item and spawns yet another task to get an item from the queue. So now you have two threads supposedly doing work and a third that's going to spawn yet another, and so on.
Nothing in your code stops it from creating a new task for every item in the queue.
If your queue started with 38 things, it's highly likely that you'll end up with 38 concurrent tasks.
You need to limit the number of tasks you're running at the same time. There are many ways to do that. Perhaps the easiest is a simple producer-consumer model using BlockingCollection.
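For example, here is a sketch of that model (the int message type, the worker count of 4, and the 50-item workload are illustrative assumptions):

var messages = new BlockingCollection<int>();

// Fixed number of consumer tasks; only this many messages are ever processed at once.
const int workerCount = 4;
var workers = new Task[workerCount];
for (int w = 0; w < workerCount; w++)
{
    workers[w] = Task.Run(() =>
    {
        foreach (var message in messages.GetConsumingEnumerable())
        {
            // wait for other data and carry out the calculation here
        }
    });
}

// Producer: enqueue the work, then signal that no more is coming.
for (int i = 0; i < 50; i++)
{
    messages.Add(i);
}
messages.CompleteAdding();

Task.WaitAll(workers);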
