I have a producer/consumer problem. Currently I have a simple Queue guarded by a lock,
and I'm trying to replace it with something more efficient.
My first choice was ConcurrentQueue, but I don't see how to make my consumer wait for the next produced message (without resorting to Thread.Sleep).
I would also like to be able to clear the whole queue if its size reaches a specific number.
Can you suggest an existing class or implementation that matches these requirements?
Here is an example of how you can use the BlockingCollection class to do what you want:
BlockingCollection<int> blocking_collection = new BlockingCollection<int>();

// Create producer on a thread-pool thread
Task.Run(() =>
{
    int number = 0;
    while (true)
    {
        blocking_collection.Add(number++);
        Thread.Sleep(100); // simulating that the producer produces ~10 items per second
    }
});

int max_size = 10; // maximum items to keep
int items_to_skip = 0;

// Consumer
foreach (var item in blocking_collection.GetConsumingEnumerable())
{
    if (items_to_skip > 0)
    {
        items_to_skip--; // quickly skip items (to meet the clearing requirement)
        continue;
    }

    // process item
    Console.WriteLine(item);
    Thread.Sleep(200); // simulating that the consumer can only process ~5 items per second

    var collection_size = blocking_collection.Count;
    if (collection_size > max_size) // if we reach maximum size, flag that we want to skip items
    {
        items_to_skip = collection_size;
    }
}
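Note that BlockingCollection<T> has no Clear method, which is why the consumer above skips items instead. If throttling the producer is acceptable instead of discarding messages, a bounded collection is a simpler way to cap the queue size; a minimal sketch under that assumption:

// Bounded to 10 items: Add blocks the producer once the collection
// is full, instead of letting the queue grow unboundedly.
BlockingCollection<int> bounded = new BlockingCollection<int>(boundedCapacity: 10);

Task.Run(() =>
{
    int number = 0;
    while (true)
    {
        bounded.Add(number++); // blocks while 10 items are already queued
    }
});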
I'm trying to come up with the best solution for going through a list of strings and performing a POST request with each one of them.
My previous attempt was to put the strings in a Queue<string> and start 200 or more threads, each of which would Dequeue a string and perform the request; this performed worse than I expected.
What am I doing wrong here?
My code:
class Checker
{
    public Queue<string> pins;

    public Checker()
    {
        pins = GetPins();
        StartThreads(1000);
    }

    public void StartThreads(int threadsCount)
    {
        Console.WriteLine("Starting Threads");
        for (int n = 0; n < 200; n++)
        {
            var thread = new Thread(Printer);
            thread.Name = String.Format("Thread Number ({0})", n);
            thread.Start();
        }
    }

    public Queue<string> GetPins()
    {
        Queue<string> numbers = new Queue<string>();
        for (int n = 0; n < 100000; n++)
        {
            numbers.Enqueue(n.ToString().PadLeft(5, '0'));
        }
        Console.WriteLine("Got Pins");
        return numbers;
    }

    void Printer()
    {
        while (pins.Count > 0)
        {
            var num = pins.Dequeue();
            Console.WriteLine();
            Console.WriteLine(String.Format("{0} - {1}", num, Thread.CurrentThread.Name));
        }
    }
}
As you can see, I generate 100,000 five-digit pins and perform a task on each (outputting them to the console). Assuming I have 1000 threads, it should be incredibly fast, but it is not.
Please tell me what I'm doing wrong and what I can improve. Thank you!
Queue is not a thread-safe collection; see ConcurrentQueue.
Your StartThreads() method does not use the threadsCount argument and is therefore only starting 200 threads.
You also need to be careful with raw threads. Consider using Tasks or the ThreadPool instead. IIRC, this lets your app decide how many threads it needs depending on the task count.
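For illustration, a minimal sketch of the pin checker rewritten along those lines; GetPins and the pin format come from the question, everything else is an assumption:

// A thread-safe queue: TryDequeue atomically tests and removes, so no
// pin is handed to two workers and no worker crashes on an empty queue.
ConcurrentQueue<string> pins = new ConcurrentQueue<string>(GetPins());

var workers = Enumerable.Range(0, Environment.ProcessorCount)
    .Select(_ => Task.Run(() =>
    {
        string pin;
        while (pins.TryDequeue(out pin))
        {
            Console.WriteLine(pin); // or perform the POST request here
        }
    }))
    .ToArray();

Task.WaitAll(workers); // all pins processed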
I use multithreading to process a list of data.
In the example below, for each element, how do I make sure SecondProcess always runs after FirstProcess finishes? The order in which the elements of the queue are processed doesn't really matter.
public class Processor
{
    public void Process()
    {
        IList<int> queue = QueueGenerator.GetRandomInt(50); // gets a list of 50 unique random integers
        foreach (int eachElement in queue)
        {
            ThreadPool.QueueUserWorkItem(new WaitCallback(FirstProcess), eachElement);
        }
        Console.ReadLine();
    }

    private void FirstProcess(object toProcess)
    {
        int i = 0;
        int.TryParse(toProcess.ToString(), out i);
        string odd = "odd";
        string even = "even";
        string toDisplay = (i % 2 == 0)
            ? string.Format("First step: Processing {0} ({1} number)", i, even)
            : string.Format("First step: Processing {0} ({1} number)", i, odd);
        Console.WriteLine(toDisplay);
    }

    private void SecondProcess(object toProcess)
    {
        int i = 0;
        int.TryParse(toProcess.ToString(), out i);
        Console.WriteLine("Second step: Processing -> {0}", i);
    }
}
Any ideas, please?
Thanks.
If, instead of
foreach (int eachElement in queue)
{
    ThreadPool.QueueUserWorkItem(new WaitCallback(FirstProcess), eachElement);
}
you did
Parallel.ForEach(queue, eachElement => FirstProcess(eachElement));
it would call the delegate for each item in queue in parallel on the ThreadPool, but block until all elements have been processed.
This means that when the next line of code executes on the calling thread, all the work is complete.
Now, you just do it again:
Parallel.ForEach(queue, eachElement => SecondProcess(eachElement));
Using the Parallel class also has advantages because it can make use of a partitioner, and so can effectively queue batched operations to the ThreadPool rather than queuing individual items.
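Putting that together, the Process method from the question could be rewritten like this (a sketch; QueueGenerator and the two process methods are as defined in the question):

public void Process()
{
    IList<int> queue = QueueGenerator.GetRandomInt(50);

    // Phase 1: FirstProcess runs for every element; this call blocks
    // until all of them have completed.
    Parallel.ForEach(queue, eachElement => FirstProcess(eachElement));

    // Phase 2: only starts after every FirstProcess call has finished.
    Parallel.ForEach(queue, eachElement => SecondProcess(eachElement));
}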
I have an application where I have 1000+ small parts of one large file.
I have to upload a maximum of 16 parts at a time.
I used the Task Parallel Library of .NET: I used Parallel.For to divide the file into multiple parts, assigned one method to be executed for each part, and set MaxDegreeOfParallelism to 16.
I need to execute one method with the checksum values generated by the different part uploads, so I have to set up a mechanism where I wait for all part uploads (say 1000) to complete.
One issue I am facing with the TPL is that it picks any of the 1000 parts to run on the 16 threads, seemingly at random.
I want a mechanism by which I can run the first 16 parts initially, and as soon as any of the 16 threads completes its task, the 17th part is started.
How can I achieve this?
One possible candidate for this is TPL Dataflow. Here is a demonstration which takes in a stream of integers and prints them out to the console. You set MaxDegreeOfParallelism to however many threads you wish to run in parallel:
void Main()
{
    var actionBlock = new ActionBlock<int>(
        i => Console.WriteLine(i),
        new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 16 });

    foreach (var i in Enumerable.Range(0, 200))
    {
        actionBlock.Post(i);
    }
}
This can also scale well if you want to have multiple producer/consumers.
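For the requirement of running the checksum step only after all 1000 parts have uploaded, the block can be completed and awaited; a small sketch following the demonstration above:

// After posting all parts:
actionBlock.Complete();          // no more items will be posted
actionBlock.Completion.Wait();   // blocks until the in-flight uploads drain
                                 // (or: await actionBlock.Completion)
// safe to combine the per-part checksums here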
Here is the manual way of doing this.
You need a queue: a sequence of pending work items. You dequeue items and put them inside a list of working tasks; whenever a task completes, you remove it from the list and take another item from the queue. The main thread controls this process. Here is a sample of how to do this.
For the test I used a list of integers, but it should work for other types because it uses generics.
private static void Main()
{
    Random r = new Random();
    var items = Enumerable.Range(0, 100).Select(x => r.Next(100, 200)).ToList();
    ParallelQueue(items, DoWork);
}

private static void ParallelQueue<T>(List<T> items, Action<T> action)
{
    Queue<T> pending = new Queue<T>(items);
    List<Task> working = new List<Task>();

    while (pending.Count + working.Count != 0)
    {
        if (pending.Count != 0 && working.Count < 16) // maximum tasks
        {
            var item = pending.Dequeue();              // get item from queue
            working.Add(Task.Run(() => action(item))); // run task
        }
        else
        {
            Task.WaitAny(working.ToArray());
            working.RemoveAll(x => x.IsCompleted);     // remove finished tasks
        }
    }
}

private static void DoWork(int i) // do your work here
{
    // this is just an example
    Task.Delay(i).Wait();
    Console.WriteLine(i);
}
Please let me know if you run into problems implementing DoWork for yourself, because if you change the method signature you may need to make some changes elsewhere.
Update
You can also do this with async/await, without blocking the main thread.
private static void Main()
{
    Random r = new Random();
    var items = Enumerable.Range(0, 100).Select(x => r.Next(100, 200)).ToList();
    Task t = ParallelQueue(items, DoWork);
    // able to do other things here.
    t.Wait();
}

private static async Task ParallelQueue<T>(List<T> items, Func<T, Task> func)
{
    Queue<T> pending = new Queue<T>(items);
    List<Task> working = new List<Task>();

    while (pending.Count + working.Count != 0)
    {
        if (working.Count < 16 && pending.Count != 0)
        {
            var item = pending.Dequeue();
            working.Add(Task.Run(async () => await func(item)));
        }
        else
        {
            await Task.WhenAny(working);
            working.RemoveAll(x => x.IsCompleted);
        }
    }
}

private static async Task DoWork(int i)
{
    await Task.Delay(i);
}
var workitems = ... /* e.g. Enumerable.Range(0, 1000000) */;

SingleItemPartitioner.Create(workitems)
    .AsParallel()
    .AsOrdered()
    .WithDegreeOfParallelism(16)
    .WithMergeOptions(ParallelMergeOptions.NotBuffered)
    .ForAll(i => { Thread.Sleep(1000); Console.WriteLine(i); });

This should be all you need. I forget exactly how the methods are named, so look at the documentation.
Test this by printing to the console after sleeping for one second (which this sample code does).
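In case SingleItemPartitioner isn't at hand (I believe it comes from the ParallelExtensionsExtras samples), the built-in Partitioner.Create with NoBuffering should give the same one-item-at-a-time behavior; a sketch under that assumption:

// NoBuffering hands out one item at a time, so a worker that finishes
// early immediately picks up the next pending item.
Partitioner.Create(workitems, EnumerablePartitionerOptions.NoBuffering)
    .AsParallel()
    .WithDegreeOfParallelism(16)
    .ForAll(i => { Thread.Sleep(1000); Console.WriteLine(i); });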
Another option would be to use a BlockingCollection<T> as a queue between your file reader thread and your 16 uploader threads. Each uploader thread would just loop around consuming the blocking collection until it is complete.
And, if you want to limit memory consumption in the queue you can set an upper limit on the blocking collection such that the file-reader thread will pause when the buffer has reached capacity. This is particularly useful in a server environment where you may need to limit memory used per user/API call.
// Create a buffer of 4 chunks between the file reader and the senders
BlockingCollection<Chunk> queue = new BlockingCollection<Chunk>(4);

// Create a cancellation token source so you can stop this gracefully
CancellationTokenSource cts = ...

File reader thread:

...
queue.Add(chunk, cts.Token);
...
queue.CompleteAdding();

Sending threads:

for (int i = 0; i < 16; i++)
{
    Task.Run(() =>
    {
        foreach (var chunk in queue.GetConsumingEnumerable(cts.Token))
        {
            // .. do the upload
        }
    });
}
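To also satisfy the "wait until all parts are uploaded" requirement from the question, keep references to the 16 sender tasks and wait on them; once CompleteAdding() is called and the queue drains, GetConsumingEnumerable ends and the tasks complete. A sketch using the same queue and cts:

var senders = Enumerable.Range(0, 16)
    .Select(_ => Task.Run(() =>
    {
        foreach (var chunk in queue.GetConsumingEnumerable(cts.Token))
        {
            // .. do the upload, record the part's checksum
        }
    }))
    .ToArray();

Task.WaitAll(senders); // all parts uploaded; combine the checksums here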
I have an application that works with a queue of strings (which correspond to different tasks that the application needs to perform). At random moments the queue can be filled with strings (sometimes several times a minute, but it can also take a few hours).
Until now I have always had a timer that checked the queue every few seconds and removed any items it found.
I think there must be a nicer solution than this. Is there any way to get an event or similar when an item is added to the queue?
Yes. Take a look at TPL Dataflow, in particular BufferBlock<T>, which does more or less the same as BlockingCollection without the nasty side-effect of jamming up your threads, by leveraging async/await.
So you can:
void Main()
{
    var b = new BufferBlock<string>();
    AddToBlockAsync(b);
    ReadFromBlockAsync(b);
}

public async Task AddToBlockAsync(BufferBlock<string> b)
{
    while (true)
    {
        b.Post("hello");
        await Task.Delay(1000);
    }
}

public async Task ReadFromBlockAsync(BufferBlock<string> b)
{
    await Task.Delay(10000); // let some messages buffer up...
    while (true)
    {
        var msg = await b.ReceiveAsync();
        Console.WriteLine(msg);
    }
}
I'd take a look at BlockingCollection.GetConsumingEnumerable. The collection will be backed with a queue by default, and it is a nice way to automatically take values from the queue as they are added using a simple foreach loop.
There is also an overload that allows you to supply a CancellationToken, meaning you can cleanly break out.
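A minimal sketch of that cancellable overload; the collection and token source names are placeholders:

var queue = new BlockingCollection<string>();
var cts = new CancellationTokenSource();

try
{
    // Blocks waiting for items; throws OperationCanceledException
    // as soon as cts.Cancel() is called from another thread.
    foreach (var item in queue.GetConsumingEnumerable(cts.Token))
    {
        Console.WriteLine(item);
    }
}
catch (OperationCanceledException)
{
    // clean shutdown path
}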
Have you looked at BlockingCollection? The GetConsumingEnumerable() method allows an indefinite loop to be run on the consumer, to which new items will be yielded as they become available, with no need for timers or Thread.Sleep:
// Common:
BlockingCollection<string> _blockingCollection = new BlockingCollection<string>();

// Producer:
for (var i = 0; i < 100; i++)
{
    _blockingCollection.Add(i.ToString());
    Thread.Sleep(500); // So you can track the consumer synchronization. Remove.
}

// Consumer:
foreach (var item in _blockingCollection.GetConsumingEnumerable())
{
    Debug.WriteLine(item);
}
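One caveat: the consumer's foreach blocks forever waiting for more items. If the producer has a natural end, call CompleteAdding so the loop can exit once the remaining items are drained:

// Producer, when finished:
_blockingCollection.CompleteAdding();
// The foreach over GetConsumingEnumerable() then ends after
// the last queued item has been consumed.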
I'm using C#'s Parallel.ForEach to process more than a thousand subsets of data. One set takes 5-30 minutes to process, depending on the size of the set. On my computer, with the option
ParallelOptions po = new ParallelOptions();
po.MaxDegreeOfParallelism = Environment.ProcessorCount;
I get 8 parallel processes. As I understood it, the jobs are divided equally between the parallel tasks (e.g. the first task gets jobs number 1, 9, 17, etc., the second gets 2, 10, 18, etc.); therefore, one task can finish its own jobs sooner than the others, because its sets of data take less time.
The problem is that four parallel tasks finish their jobs within 24 hours, but the last one finishes in 48 hours. Is there some way to organize the parallelism so that all parallel tasks finish at about the same time, i.e. all parallel tasks continue working until all jobs are done?
Since the jobs are not equal, you can't split the number of jobs between the processors and have them finish at about the same time. I think what you need here is 8 worker threads that each retrieve the next job in line. You will have to use a lock on the function that gets the next job.
Somebody correct me if I'm wrong, but off the top of my head... a worker thread could be given a function like this:
public void ProcessJob()
{
    for (Job myJob = GetNextJob(); myJob != null; myJob = GetNextJob())
    {
        // process job
    }
}
And the function to get the next job would look like:
private List<Job> jobs;
private int currentJob = 0;

private Job GetNextJob()
{
    lock (jobs)
    {
        Job job = null;
        if (currentJob < jobs.Count)
        {
            job = jobs[currentJob];
            currentJob++;
        }
        return job;
    }
}
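The same hand-out-one-job-at-a-time behavior also comes for free with ConcurrentQueue<T>, avoiding the explicit lock; a sketch (an alternative, not the code above):

// ConcurrentQueue hands out each job exactly once without a lock.
private readonly ConcurrentQueue<Job> jobs = new ConcurrentQueue<Job>();

public void ProcessJob()
{
    Job myJob;
    while (jobs.TryDequeue(out myJob)) // atomically takes the next job
    {
        // process job
    }
}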
It seems that there is no ready-to-use solution and it has to be created.
My previous code was:
var ListOfSets = (from x in Database
                  group x by x.SetID into z
                  select new { ID = z.Key }).ToList();

ParallelOptions po = new ParallelOptions();
po.MaxDegreeOfParallelism = Environment.ProcessorCount;

Parallel.ForEach(ListOfSets, po, SingleSet =>
{
    AnalyzeSet(SingleSet.ID);
});
To share the work equally between all CPUs, I still use Parallel to do the work, but instead of ForEach I use For together with an idea from Matt. The new code is:
Parallel.For(0, Environment.ProcessorCount, i =>
{
    while (true)
    {
        double SetID = 0;
        lock (ListOfSets)
        {
            // Check the count inside the lock, so two threads can't both
            // pass a non-empty check and race for the last element.
            if (ListOfSets.Count == 0)
                break;
            SetID = ListOfSets[0].ID;
            ListOfSets.RemoveAt(0);
        }
        AnalyzeSet(SetID);
    }
});
So, thank you for your advice.
One option, as suggested by others, is to manage your own producer consumer queue. I'd like to note that using the BlockingCollection makes this very easy to do.
BlockingCollection<JobData> queue = new BlockingCollection<JobData>();

// Add data to the queue; if it can be done quickly, just do it inline.
// If it's expensive, start a new task/thread just to add items to the queue.
foreach (JobData job in data)
    queue.Add(job);
queue.CompleteAdding();

for (int i = 0; i < Environment.ProcessorCount; i++)
{
    Task.Factory.StartNew(() =>
    {
        foreach (var job in queue.GetConsumingEnumerable())
        {
            ProcessJob(job);
        }
    }, TaskCreationOptions.LongRunning);
}