How to share threads in C# properly?

I built an OCR application that reads PDF files and runs OCR on them. It is multi-threaded and uses the Parallel.ForEach function.
This works brilliantly, but the way the work is divided among the threads is different from what I expected.
Scenario: When I allow only 10 threads using MaxDegreeOfParallelism, the workload is divided and I can see 10 threads start immediately. However, there are 100 items that need to be processed. Once around 80 of the 100 items are done, it slows down and only 2 of the 10 threads are still running. I suspect this is because 8 of the 10 threads have completed their portion of the work, while some PDFs took longer on the other threads, which are still processing their portions.
So my question is: how can I write this better so that even at 80/100 there are ALWAYS 10 active threads? (Of course, past 90 or so the threads will die down, but at least it won't process items one by one while the last thread still has work left.)
I hope this makes sense. Here is a snippet of my code:
Parallel.ForEach(F.files, new ParallelOptions { MaxDegreeOfParallelism = iNumberOfThreads }, items =>
{
//do work here
});

Thanks to Panagiotis Kanavos, I've implemented ActionBlock<T>, which resolves my problem.
var getData = new ActionBlock<JsonPDFReader.File>(items =>
{
//Code Here
}, new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = iNumberOfThreads });
foreach (JsonPDFReader.File items in F.files)
{
getData.Post(items);
}
getData.Complete();
getData.Completion.Wait();
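If you would rather stay with Parallel.ForEach, one option that may help with uneven item durations is to hand items out one at a time through a partitioner, so that no worker ends up holding a batch of pending items. The following is only a sketch under that assumption: BalancedLoop and Run are made-up names, and whether it behaves as well as ActionBlock<T> for the OCR workload would need measuring.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class BalancedLoop
{
    // Runs processItem over every element with at most maxThreads workers
    // and returns the number of items that were processed.
    public static int Run<T>(IEnumerable<T> items, int maxThreads, Action<T> processItem)
    {
        // NoBuffering asks the partitioner to hand out one element at a time,
        // so a slow item does not keep other pending items waiting behind it.
        var partitioner = Partitioner.Create(items, EnumerablePartitionerOptions.NoBuffering);
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxThreads };

        int processed = 0;
        Parallel.ForEach(partitioner, options, item =>
        {
            processItem(item);
            Interlocked.Increment(ref processed);
        });
        return processed;
    }
}
```

With the names from the question, a call could look like BalancedLoop.Run(F.files, iNumberOfThreads, file => { /* OCR the file */ });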

Related

Best way to process X number of threads at a time in a loop?

I'm starting a remote process for hundreds of servers and want to run about 3 threads at a time. So first I'd like to queue up 3 threads and have each of them run the processData() function. The rest of the items in the foreach loop then have to wait until a slot opens up before they run that function, so that 3 threads are processing in parallel at any given time until completion. What is the best way to go about doing this?
foreach (ServerData serv in servers) {
processData(...)
}
You can try Parallel.ForEach. It also has options to control how many threads run at the same time. If you don't specify anything, the number of threads is chosen for you based on the available processing capacity.
Example:
Parallel.ForEach(servers, (serv) =>
{
//processData function...
});
If you want to run only 3 threads at the same time:
Parallel.ForEach(servers, new ParallelOptions { MaxDegreeOfParallelism = 3 }, (serv) =>
{
//processData function...
});
More information about Parallel.ForEach available here: https://msdn.microsoft.com/en-us/library/dd460720(v=vs.110).aspx
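To convince yourself that the limit is respected, you can count how many calls are in flight at once. This is a small sketch, not part of the original answer; ThrottledLoop and RunAndMeasurePeak are hypothetical names, and the work delegate stands in for processData.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class ThrottledLoop
{
    // Calls work for each item with at most maxParallel concurrent calls and
    // returns the highest number of calls observed running at the same time.
    public static int RunAndMeasurePeak<T>(IEnumerable<T> items, int maxParallel, Action<T> work)
    {
        int running = 0;
        int peak = 0;
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallel };

        Parallel.ForEach(items, options, item =>
        {
            int now = Interlocked.Increment(ref running);

            // Record a new peak if this call raised the concurrent count.
            int seen;
            while (now > (seen = Volatile.Read(ref peak)) &&
                   Interlocked.CompareExchange(ref peak, now, seen) != seen)
            {
                // Another thread updated peak first; re-read and try again.
            }

            try { work(item); }
            finally { Interlocked.Decrement(ref running); }
        });
        return peak;
    }
}
```

For example, ThrottledLoop.RunAndMeasurePeak(servers, 3, serv => processData(serv)) should never report a peak above 3.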

await Task.Delay takes longer than expected

I wrote a multithreaded app which uses async/await extensively. It is supposed to download some stuff at a scheduled time. To achieve that, it uses 'await Task.Delay'. Sometimes it sends thousands of requests every minute.
It works as expected, but sometimes my program needs to log something big. When it does, it serializes many objects and saves them to a file. During that time, I noticed that my scheduled tasks are executed too late. I've moved all the logging to a separate thread with the lowest priority and the problem doesn't occur that often anymore, but it still happens. The thing is, I want to know when it happens, and in order to know that I have to use something like this:
var delayTestDate = DateTime.Now;
await Task.Delay(5000);
if((DateTime.Now - delayTestDate).TotalMilliseconds > 6000/*delays up to 1 second are tolerated*/) Console.WriteLine("The task has been delayed!");
Moreover, I have found that 'Task.Run', which I also use, can also cause delays. To monitor that, I have to use even more ugly code:
var delayTestDate = DateTime.Now;
await Task.Run(() =>
{
if((DateTime.Now - delayTestDate).TotalMilliseconds > 1000/*delays up to 1 second are tolerated*/) Console.WriteLine("The task has been delayed!");
//do some stuff
delayTestDate = DateTime.Now;
});
if((DateTime.Now - delayTestDate).TotalMilliseconds > 1000/*delays up to 1 second are tolerated*/) Console.WriteLine("The task has been delayed!");
I have to use it before and after every await and Task.Run and inside every async function, which is ugly and inconvenient. I can't put it into a separate function, since it would have to be async and I would have to await it anyway. Does anybody have an idea of a more elegant solution?
EDIT:
Some information I provided in the comments:
As @YuvalItzchakov noticed, the problem may be caused by thread pool starvation. That's why I used System.Threading.Thread to take care of the logging outside of the thread pool, but as I said, the problem still sometimes occurs.
I have a processor with four cores, and by subtracting the results of ThreadPool.GetAvailableThreads from ThreadPool.GetMaxThreads I get 0 busy worker threads and 1-2 busy completion port threads. Process.GetCurrentProcess().Threads.Count usually returns about 30. It's a Windows Forms app and although it only has a tray icon with a menu, it starts with 11 threads. When it gets to sending thousands of requests per minute, it quickly gets up to 30.
As @Noseratio suggested, I tried to play with ThreadPool.SetMinThreads and ThreadPool.SetMaxThreads, but it didn't even change the numbers of busy threads mentioned above.
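The lateness check around Task.Delay can be folded into a single awaitable helper, so each call site stays one line. This is a sketch only: DelayMonitor, DelayAsync and the onLate callback are made-up names, and it uses Stopwatch rather than DateTime.Now so that system clock adjustments do not show up as false delays.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

public static class DelayMonitor
{
    // Awaits Task.Delay(delay) and returns how much later than requested the
    // code after the await resumed. Calls onLate when that exceeds tolerance.
    public static async Task<TimeSpan> DelayAsync(TimeSpan delay, TimeSpan tolerance, Action<TimeSpan> onLate)
    {
        var watch = Stopwatch.StartNew();
        await Task.Delay(delay);
        TimeSpan overshoot = watch.Elapsed - delay;

        if (overshoot > tolerance && onLate != null)
        {
            onLate(overshoot);
        }
        return overshoot;
    }
}
```

A call site would then read: await DelayMonitor.DelayAsync(TimeSpan.FromSeconds(5), TimeSpan.FromSeconds(1), late => Console.WriteLine("The task has been delayed by " + late));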
When you execute Task.Run, it uses thread pool threads to execute those tasks. When you have long-running tasks, you starve the thread pool, since its threads are occupied by those long-running tasks.
Two suggestions:
1. When running long-running tasks, use Task.Factory.StartNew with TaskCreationOptions.LongRunning, which will trigger the creation of a new thread. Be cautious here as well: spinning up too many new threads causes excessive context switching, which will slow your app down.
2. Use true async for I/O-bound work: use APIs that support the TAP, such as HttpClient and Stream, which won't need a new thread to execute blocking work.
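The first suggestion could look roughly like this. It is a minimal sketch: LongRunningWork and Start are hypothetical names, and LongRunning is documented as a hint to the scheduler, so the dedicated thread is what the default scheduler does in practice rather than a guarantee.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class LongRunningWork
{
    // Starts work with the LongRunning hint on the default scheduler.
    // The returned task yields true if the work ran on a thread pool thread.
    public static Task<bool> Start(Action work)
    {
        return Task.Factory.StartNew(() =>
        {
            bool onPoolThread = Thread.CurrentThread.IsThreadPoolThread;
            work();
            return onPoolThread;
        },
        CancellationToken.None,
        TaskCreationOptions.LongRunning,
        TaskScheduler.Default); // explicit, so an ambient scheduler is not picked up
    }
}
```

For the logging case in the question, something like LongRunningWork.Start(() => WriteBigLog()) would keep the serialization off the pool; WriteBigLog is a placeholder for the actual logging routine.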
There are overheads in async/await, as well as the tasks themselves being executed at a lower priority. If you need something to happen reliably at an accurate interval, async/await / TPL is not the interface to use.
Try creating an independent background thread that loops until it is scheduled to do work. This way you can control the priority and timing directly without going through TPL / async.
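Thread backgroundThread = new Thread(BackgroundWork);
DateTime nextInterval = DateTime.Now;
// Call backgroundThread.Start() once, e.g. during startup.
public void BackgroundWork()
{
    while (true) // keep polling; without a loop the check would run only once
    {
        if (DateTime.Now > nextInterval)
        {
            DoWork();
            nextInterval = nextInterval.Add(new TimeSpan(0, 0, 0, 10)); // 10 seconds
        }
        Thread.Sleep(100);
    }
}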
Adjust the Sleep(..) and interval values as needed.
I think you're experiencing the situation described by Joe Duffy in his "CLR thread pool injection, stuttering problems" blog post:
One silly thing our thread pool currently does has to do with how it
creates new threads. Namely, it severely throttles creation of new
threads once you surpass the “minimum” number of threads, which, by
default, is the number of CPUs on the machine. We limit ourselves to
at most one new thread per 500ms once we reach or surpass this number.
One solution might be to explicitly increase the minimum number of thread pool threads before making any use of TPL, e.g.:
ThreadPool.SetMaxThreads(workerThreads: 200, completionPortThreads: 200);
ThreadPool.SetMinThreads(workerThreads: 100, completionPortThreads: 100);
Try playing with these numbers and see if the problem goes away.
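ThreadPool.SetMinThreads returns false when it rejects the requested values, so it is worth checking the result before concluding that the setting has no effect. A small sketch, with PoolTuning and RaiseMinWorkers as made-up names:

```csharp
using System;
using System.Threading;

public static class PoolTuning
{
    // Raises the worker-thread minimum to at least minWorkers and leaves the
    // completion-port minimum unchanged. Returns false if the pool rejected it.
    public static bool RaiseMinWorkers(int minWorkers)
    {
        int worker, io;
        ThreadPool.GetMinThreads(out worker, out io);

        if (worker >= minWorkers)
        {
            return true; // already high enough, nothing to change
        }
        return ThreadPool.SetMinThreads(minWorkers, io);
    }
}
```

Call it once at startup, before the first Task.Run, and log the return value.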

Task.Factory.StartNew - confused about the pool

Hi, I'm getting myself tied up with Task.Factory.StartNew. Just as I thought I had got the idea of it, someone suggested I write the following code:
bool exitLoop = false;
while (!exitLoop)
{
exitLoop = true;
var messages = Queue.GetMessages(20);
foreach (var message in messages)
{
exitLoop = false;
Task.Factory.StartNew(() =>
{
DeliverMessage(message);
});
}
}
In theory this is going to drain a queue, 20 messages at a time, attempting to create a Task for every message in the queue. So if we had 1000 messages in the queue, then in an instant we'd have 25 tasks and it would eat its way through all the messages. I previously thought I understood this: I thought StartNew would block once the pool ran out of entries, which in the old days would have been at about 25. But this is .NET 4.5, where I'm under the impression that the upper limit for the pool is pretty high. What puzzles me is that I would have assumed this would flood the pool with new tasks and start blocking, i.e. in an instant I would have 1000 tasks running. So if the pool limit is now hardly a limit, why am I not seeing 1000 tasks?
[Edit]
OK, so what I'm seeing is that 1000 tasks are queued to run, rather than running. So how do I determine the number of running/runnable tasks?
I know this is quite a while after your post, but I hope this may help someone facing your specific challenge. Your last comment stated that the 'DeliverMessage' method was making HTTP requests.
If you are using the 'WebClient' object (for example) to make your requests, it will be bound by the ServicePointManager.DefaultConnectionLimit property. This means it will create at most two (by default) concurrent connections to the host. If you created 1,000 parallel tasks, all 1,000 of those would have to be serviced by those two connections.
You'll have to play around with different values for this setting to find the right balance between throughput in your application and load on the web server.
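Changing the limit is a one-line assignment that has to happen before the first request is made. The sketch below wraps it so the previous value can be logged; ConnectionLimits and SetDefaultLimit are hypothetical names, and the setting is the one used by WebClient and HttpWebRequest on .NET Framework.

```csharp
using System.Net;

public static class ConnectionLimits
{
    // Sets the default per-host connection limit for connections created
    // afterwards and returns the value that was in effect before.
    public static int SetDefaultLimit(int limit)
    {
        int previous = ServicePointManager.DefaultConnectionLimit;
        ServicePointManager.DefaultConnectionLimit = limit;
        return previous;
    }
}
```

For example, ConnectionLimits.SetDefaultLimit(20) at application startup; the value 20 is arbitrary and only a starting point for experimenting.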

How to stop BackgroundWorker from queuing?

I have the following code:
for (int i = 1; i <= 500; i++)
{
BackgroundWorker t = new BackgroundWorker();
t.DoWork += SOME DB METHOD THAT TAKES 5 SECONDS
t.RunWorkerAsync();
}
When I profile this in SQL I notice that the BackgroundWorker appears to be queuing the threads in such a way that only 4 or 5 active connections are open at the same time vs. all 500 connections opening at once. I get no timeouts or blocking from my DB. How can I prevent this queuing and hit the database with all 500 concurrent threads at once?
BackgroundWorker uses the ThreadPool. You can adjust the ThreadPool with ThreadPool.SetMinThreads and ThreadPool.SetMaxThreads. Whether it is actually possible to establish that many connections to your database server is another question (and may cause other problems).
However, starting 500 BackgroundWorker instances is not advisable! A better solution could be the Task class from the Task Parallel Library.
Something like this should help:
Task.Factory.StartNew(
() => { SOME DB METHOD THAT TAKES 5 SECONDS },
TaskCreationOptions.LongRunning
);
From the MSDN documentation:
LongRunning - Specifies that a task will be a long-running,
coarse-grained operation involving fewer, larger components than
fine-grained systems. It provides a hint to the TaskScheduler that
oversubscription may be warranted. Oversubscription lets you create
more threads than the available number of hardware threads.
Or, you could completely bypass the thread pool and use the Thread class directly:
var t = new Thread(() => { SOME DB METHOD THAT TAKES 5 SECONDS });
t.Start();
"Raw" threads will be harder to work with than tasks, though...
You don't, since your computer can't possibly run 500 threads at once. Most probably you have 8 to 16 logical cores, and 4 or 5 is what's left available when you run your code. Seems 100% legit.

Multi Thread c# application System.OutOfMemoryException after 1~5 minutes of runtime

Here is my Timer Elapsed event. I am receiving the System.OutOfMemoryException on the line Thread thread = new Thread(threadStart);
I am receiving the error fairly fast (1~5 minutes, randomly), and it does not cause unexpected results in my program. I am just wondering what is causing this error, and I am afraid it may cause unexpected results if it is left unchecked. I have searched on the internet and I am coming nowhere near the maximum number of threads.
readList contains about 46 entries.
Any help would be appreciated.
private void glob_loopTimer_Elapsed(object sender, ElapsedEventArgs e)
{
try
{
ParameterizedThreadStart threadStart = new ParameterizedThreadStart(readHoldingRegisters);
foreach (readwriteDataGridRow.Read row in readList)
{
Thread thread = new Thread(threadStart);
thread.IsBackground = true;
thread.Start(System.Convert.ToInt32(row.Address));
}
}
catch (Exception ex)
{
UpdateConsole(new object[] { ex.Message.ToString() + " " + ex.StackTrace.ToString(), Color.Red });
Thread.CurrentThread.Abort(); // maybe?
}
}
EDIT:
Here is a bit more information.
My program is reading registers from a Serial Device using the Modbus RTU protocol.
A single register takes less than a tenth of a second to retrieve from readHoldingRegisters
I am open to suggestions on what else to use rather than threads.
note: I need to call readHoldingRegisters 40 - 100 times in a single 'pass'. The passes start when the user hits connect and end when he hits disconnect. Timers are not needed, they just offered a simple way for me to maintain the loop with a start and stop button.
EDIT: Solved
private void glob_loopTimer_Elapsed(object sender, ElapsedEventArgs e)
{
try
{
foreach (readwriteDataGridRow.Read row in readList)
{
readHoldingRegisters(row.Address);
}
}
catch (Exception ex)
{
UpdateConsole(new object[] { ex.Message.ToString() + " " + ex.StackTrace.ToString(), Color.Red });
}
}
The additional Threads were the problem and were not needed.
Ughh, do not, ever (well, almost ever) abort threads. There are many preferable ways to make a System.Threading.Thread stop. Look around SO, you will find plenty of examples of why doing this is a bad idea, along with alternative approaches.
On with your question: The problem doesn't seem to be the number of rows in readList. It is more likely that your glob_loopTimer_Elapsed event handler is being executed many times and you are basically starting more and more threads.
What is the interval of your glob_loopTimer?
So how many times is glob_loopTimer_Elapsed called? The name implies that it is run on a periodic timer interval. If so, and if the 46 threads that get created on each invocation do not terminate about as quickly as the timer interval fires, then you could easily be spawning too many threads and running out of memory space as a result. Perhaps you could try logging when each thread starts and when each one finishes to get an idea about how many are in flight at once?
Keep in mind that every thread you allocate will have a certain amount of stack space allocated to it. Depending upon your runtime configuration, this amount of stack space may not be negligible (as in, it may be 1 MB per thread or more) and it may quickly consume your available memory even if you're not close to approaching the theoretical maximum number of threads supported by the OS.
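If overlapping timer ticks turn out to be the cause, one way to rule them out is to skip a tick while the previous pass is still running. This is a sketch that assumes dropping a tick is acceptable; PassGuard, OnTimerElapsed and the counters are made-up names.

```csharp
using System;
using System.Threading;

public static class PassGuard
{
    private static int _busy; // 0 = idle, 1 = a pass is in progress

    public static int Completed; // passes that ran to the end
    public static int Skipped;   // ticks dropped because a pass was running

    // Runs runOnePass unless a previous call is still executing it.
    public static void OnTimerElapsed(Action runOnePass)
    {
        // Atomically switch from idle to busy; bail out if already busy.
        if (Interlocked.CompareExchange(ref _busy, 1, 0) != 0)
        {
            Interlocked.Increment(ref Skipped);
            return;
        }

        try
        {
            runOnePass();
            Interlocked.Increment(ref Completed);
        }
        finally
        {
            Volatile.Write(ref _busy, 0); // allow the next tick to run
        }
    }
}
```

Inside glob_loopTimer_Elapsed this would wrap the foreach over readList, and logging Skipped shows how often ticks arrive faster than a pass completes.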
Apart from your immediate problem, I would consider using the ThreadPool or the TPL.
When using System.Threading.Thread there is no automatic mechanism to manage the threads...
Also, each thread allocates some memory, which could lead to your problem.
The ThreadPool and the TPL manage these resources by themselves.
see also: -> Thread vs ThreadPool
Reusing threads that have already been created instead of creating new ones (an expensive process)
...
If you queue 100 thread pool tasks, it will only use as many threads as have already been created to service these requests (say 10
for example). The thread pool will make frequent checks (I believe
every 500ms in 3.5 SP1) and if there are queued tasks, it will make
one new thread. If your tasks are quick, then the number of new
threads will be small and reusing the 10 or so threads for the short
tasks will be faster than creating 100 threads up front.
If your workload consistently has large numbers of thread pool requests coming in, then the thread pool will tune itself to your
workload by creating more threads in the pool by the above process so
that there are a larger number of thread available to process requests
check Here for more in depth info on how the thread pool functions under the hood
I just know
Each thread also consumes (by default) around 1 MB of memory.
