I have a method that processes words from two lists, a priority list and a standard list.
ConcurrentBag<Word> PriorityWords = ...;
ConcurrentBag<Word> UnprocessedWords = ...;
public void ProcessAllWords()
{
while (true)
{
Word word = SelectWordToProcess();
if (word == null) break;
ProcessWord(word);
}
}
private Word SelectWordToProcess()
{
Word word;
if (PriorityWords.TryTake(out word) || UnprocessedWords.TryTake(out word))
return word;
else
return null;
}
public void ProcessWord(Word word) { ... }
I want to run this method on multiple cores. Currently, I simply start one thread per processor:
for (int i = 0; i < Environment.ProcessorCount; i++)
{
new Thread(ProcessAllWords).Start();
}
Is there a more suitable way that lets the system decide how many threads to open based on current system performance, similar to Parallel.ForEach()?
EDIT: More detail on the application.
The word list is prepopulated with ~180,000 words, and every word is to be permuted with every other word, so ProcessAllWords is an O(n²) operation. The threads all run flat-out until every word is processed, then terminate. While the threads are running, I can asynchronously give priority to specific words by adding them to the PriorityWords list. Initial tests show my system processes about 5 words/sec, so that's 10 hours of 100% CPU processing.
Your method of starting Environment.ProcessorCount threads is good. The Task Parallel Library will do the automatic scheduling you're looking for, but at the cost of oversubscribing your CPU, which will decrease the responsiveness of your application to priority words.
As for the various ways of using the TPL: Parallel.For and Task.Factory.StartNew will both queue up a bunch of words in advance, making them very unresponsive to priority. You can maintain your priority ordering with a generator method and PLINQ, but then you get a fixed number of threads, as you have now. You can either set the thread count yourself or use the default of 2 x Environment.ProcessorCount. All in all, since your task is CPU-bound, I would keep your current implementation.
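A minimal sketch of the generator + PLINQ approach mentioned above, reusing your SelectWordToProcess and ProcessWord members. Note that PLINQ buffers chunks of its input by default, so this is an approximation: a priority word added mid-run may not jump ahead of words that are already buffered.
private IEnumerable<Word> WordsToProcess()
{
    while (true)
    {
        Word word = SelectWordToProcess(); // drains PriorityWords first
        if (word == null) yield break;     // both bags are empty; stop
        yield return word;
    }
}

public void ProcessAllWordsParallel()
{
    WordsToProcess()
        .AsParallel()
        .WithDegreeOfParallelism(Environment.ProcessorCount)
        .ForAll(ProcessWord); // unordered; no merge step needed
}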
Related
My application processes millions of pieces of data which vary in size. Small objects are processed quickly while others can take upwards of fifteen minutes.
My current code:
List<QueueRecords> queueRecords = Get500QueueRecords();
bool moreFiles = true;
while (moreFiles)
{
    Parallel.ForEach(queueRecords, parallelOptions, (record, loopState) =>
    {
        // do work
    });

    queueRecords = Get500QueueRecords();
    if (queueRecords.Count == 0)
    {
        moreFiles = false;
    }
}
The issue is that I often end up with one thread performing a long-running task while there are still massive amounts of data left to be processed.
Which pattern should I look into to resolve this issue?
Issues:
1) Get500QueueRecords itself takes some time to execute, during which you aren't doing any processing;
2) If the last record in a batch takes 15 minutes, you end up processing just that one record while everything else sits idle, because Parallel.ForEach waits for the entire batch to complete.
You really should look at TPL Dataflow (https://learn.microsoft.com/en-us/dotnet/standard/parallel-programming/dataflow-task-parallel-library) or at least create a reader Task that pumps data into a BlockingCollection<T>, and then launch multiple consumer Tasks that pull from the blocking collection until it's fully consumed.
Using a producer and a consumer with a finite size BlockingCollection<T> between them allows you to control (i) how many items are buffered from the reader Task and (ii) how many Tasks you have consuming it.
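A minimal sketch of that producer/consumer arrangement, reusing your Get500QueueRecords; the bounded capacity of 1000 and the consumer count are illustrative assumptions.
var queue = new BlockingCollection<QueueRecords>(boundedCapacity: 1000);

// Producer: keeps reading batches and pushing records into the bounded buffer.
var producer = Task.Factory.StartNew(() =>
{
    List<QueueRecords> batch;
    while ((batch = Get500QueueRecords()).Count > 0)
    {
        foreach (var record in batch)
            queue.Add(record); // blocks when the buffer is full
    }
    queue.CompleteAdding(); // tells consumers no more work is coming
});

// Consumers: each pulls the next record as soon as it finishes the last one,
// so one slow record no longer holds up the rest of a batch.
var consumers = new Task[Environment.ProcessorCount];
for (int i = 0; i < consumers.Length; i++)
{
    consumers[i] = Task.Factory.StartNew(() =>
    {
        foreach (var record in queue.GetConsumingEnumerable())
        {
            // do work
        }
    });
}

Task.WaitAll(consumers);
producer.Wait();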
Background
Historically, I've not had to write much thread-related code. I am familiar with the process and I can use System.Threading to do what I need when the need arises. However, the bulk of my experience has been stuck in the .NET 2.0 world (ugh!) and I need a bit of help executing a very simple task in the most recent version of .NET.
I need to write a simple, parallel job but I would prefer it to execute as a low-priority task that doesn't bring my PC to a sluggish halt.
Below is a simple parallel example that demonstrates the type of work I'm trying to accomplish. In this example, I take a random number and store it if the new value is larger than the largest random number discovered so far.
Please understand, the point of this example is strictly to show that I have a calculation that I wish to repeatedly execute, compare, and store. I understand the need for variable locking and I also know that this code isn't perfect. However, it is an extremely simple example of the nature of what I need to accomplish. If you run this code, your CPU will be pegged and your PC will grind to a halt.
int i = 0;
ThreadSafeRNG r = new ThreadSafeRNG();
var options = new ParallelOptions { MaxDegreeOfParallelism = 4 }; // assume I have 8 cores
Parallel.For(0, Int32.MaxValue, options, (j, loopState) =>
{
    int k = r.Next();
    if (k > i) i = k; // store the new largest value (locking omitted, as noted above)
});
How can I do this work in parallel and have it execute as a low-priority CPU job? The above mechanism has provided tremendous performance improvements over a standard for-loop, but the cost has been that I can't use my computer, even after setting the MaxDegreeOfParallelism option.
What can I do?
Because Parallel.For uses ThreadPool threads, you cannot set the priority of those threads to be low. However, you can change the priority of the entire application:
Process currentProcess = Process.GetCurrentProcess();
ProcessPriorityClass oldPriority = currentProcess.PriorityClass;
try
{
    currentProcess.PriorityClass = ProcessPriorityClass.BelowNormal;

    int i = 0;
    ThreadSafeRNG r = new ThreadSafeRNG();
    var options = new ParallelOptions { MaxDegreeOfParallelism = 4 }; // assume I have 8 cores
    Parallel.For(0, Int32.MaxValue, options, (j, loopState) =>
    {
        int k = r.Next();
        if (k > i) i = k;
    });
}
finally
{
    // Bring the priority back up to the original level.
    currentProcess.PriorityClass = oldPriority;
}
If you really don't want your whole application's priority to be lowered, combine this trick with some form of IPC like WCF and have your slow, long-running operation run in a second process that you start up and kill as needed.
Using the Parallel.For method, you cannot set thread priority since they are ThreadPool threads.
Your best bet is to set ParallelOptions.MaxDegreeOfParallelism to Environment.ProcessorCount - 1; that way you keep a free core for other things (such as handling the GUI).
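A minimal sketch of that suggestion; note that the ParallelOptions instance must actually be passed into the Parallel.For call:
var options = new ParallelOptions
{
    // Leave one core free for the rest of the system (and never go below 1).
    MaxDegreeOfParallelism = Math.Max(1, Environment.ProcessorCount - 1)
};
Parallel.For(0, Int32.MaxValue, options, (j, loopState) =>
{
    // CPU-bound work here
});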
I have a for loop with more than 20k iterations; each iteration takes around two or three seconds, for a total of around 20 minutes. How can I optimize this for loop? I am using .NET 3.5, so Parallel.ForEach is not possible. I split the 20,000 items into small chunks and implemented some threading, and now I am able to reduce the time by 50%. Is there any other way to optimize these kinds of for loops?
My sample code is given below
// AsyncResult requires: using System.Runtime.Remoting.Messaging;
public delegate double MyDelegate(int id);

static double sum = 0.0;
static readonly object sumLock = new object();

public double AsyncTest()
{
    List<Item> itemsList = GetItem(); // around 20k items
    int count = 0;
    var newItemsList = itemsList.Take(62).ToList();
    while (newItemsList.Count > 0)
    {
        int j = 0;
        WaitHandle[] waitHandles = new WaitHandle[newItemsList.Count];
        foreach (Item item in newItemsList)
        {
            var delegateInstance = new MyDelegate(MyMethod);
            IAsyncResult asyncResult = delegateInstance.BeginInvoke(item.id, new AsyncCallback(MyAsyncResults), null);
            waitHandles[j] = asyncResult.AsyncWaitHandle;
            j++;
        }
        WaitHandle.WaitAll(waitHandles); // blocks until the whole batch of 62 is done
        count = count + 62;
        newItemsList = itemsList.Skip(count).Take(62).ToList();
    }
    return sum;
}

public double MyMethod(int id)
{
    double result = 0.0;
    // Calculations producing this item's result
    return result;
}

static public void MyAsyncResults(IAsyncResult iResult)
{
    AsyncResult asyncResult = (AsyncResult)iResult;
    MyDelegate del = (MyDelegate)asyncResult.AsyncDelegate;
    double mySum = del.EndInvoke(iResult);
    lock (sumLock) // the callback runs on a pool thread, so guard the shared sum
    {
        sum = sum + mySum;
    }
}
It's possible to reduce the number of iterations by various techniques. However, this won't give you any noticeable improvement, since the heavy computation is performed inside your loop. If you've already parallelized it to use all your CPU cores, there is not much to be done. There is a certain amount of computation to be done and a certain amount of computing power available; you can't squeeze more out of your machine than it can provide.
You can try to:
Implement your algorithm more efficiently, if possible
Switch to a faster environment/language, such as unmanaged C/C++
Is there a rationale behind your batch size (62)?
Is MyMethod IO-bound or CPU-bound?
What you do in each cycle is wait until the whole batch completes, which wastes some cycles (you are actually waiting for all 62 calls to finish before taking the next batch). Why not change the approach a bit, so that you still keep N operations running simultaneously, but fire a new operation as soon as one of the executing operations completes? A sketch follows.
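A minimal sketch of that idea, assuming .NET 3.5 (no TPL): a semaphore caps the number of in-flight operations, and a new item is dispatched the moment a slot frees up. GetItem, Item, and MyMethod are the questioner's members; maxInFlight is an illustrative choice.
int maxInFlight = Environment.ProcessorCount;
Semaphore slots = new Semaphore(maxInFlight, maxInFlight);
object sumLock = new object();
double total = 0.0;
int pending = 1; // extra count held by the main thread until everything is queued
ManualResetEvent allDone = new ManualResetEvent(false);

foreach (Item item in GetItem())
{
    slots.WaitOne(); // blocks until one of the N slots is free
    Interlocked.Increment(ref pending);
    Item current = item; // avoid capturing the shared loop variable
    ThreadPool.QueueUserWorkItem(state =>
    {
        try
        {
            double partial = MyMethod(current.id);
            lock (sumLock) { total += partial; }
        }
        finally
        {
            slots.Release(); // free a slot so the next item can start immediately
            if (Interlocked.Decrement(ref pending) == 0) allDone.Set();
        }
    });
}
if (Interlocked.Decrement(ref pending) == 0) allDone.Set(); // release the main thread's count
allDone.WaitOne();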
According to this blog, for loops are faster than foreach over collections. Try looping with for; it may help.
It sounds like you have a CPU intensive MyMethod. For CPU intensive tasks, you can gain a significant improvement by parallelization, but only to the point of better utilizing all CPU cores. Beyond that point, too much parallelization can start to hurt performance -- which I think is what you're doing. (This is unlike I/O intensive tasks where you pretty much parallelize as much as possible.)
What you need to do, in my opinion, is write another method that takes a "chunk" of items (not a single item) and returns their "sum":
double SumChunk(IEnumerable<Item> items)
{
    return items.Sum(x => MyMethod(x.id)); // MyMethod takes the item's id
}
Then divide the number of items by n (n being the degree of parallelism; try n = the number of CPU cores, and compare that against 2n) and pass each chunk to a separate asynchronous task running SumChunk. Finally, sum up the sub-results.
Also, watch whether any of the chunks completes much earlier than the others. If so, your task distribution is not homogeneous; you'd need to create smaller chunks (say, chunks of 300 items) and pass those to SumChunk, as in the sketch below.
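A rough sketch of that chunking idea under the questioner's .NET 3.5 constraint (plain threads instead of tasks), reusing SumChunk from above; chunkSize is whatever granularity you settle on:
double SumAll(List<Item> items, int chunkSize)
{
    double total = 0.0;
    object totalLock = new object();
    List<Thread> threads = new List<Thread>();

    for (int i = 0; i < items.Count; i += chunkSize)
    {
        // Copy the range so each thread gets its own chunk.
        List<Item> chunk = items.GetRange(i, Math.Min(chunkSize, items.Count - i));
        Thread t = new Thread(() =>
        {
            double partial = SumChunk(chunk);
            lock (totalLock) { total += partial; } // combine the sub-results
        });
        threads.Add(t);
        t.Start();
    }

    foreach (Thread t in threads) t.Join(); // wait for all chunks
    return total;
}
With very small chunks you'd want to feed these to a pool rather than spawn one raw thread per chunk, but the partitioning logic stays the same.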
Correct me if I'm wrong, but it looks to me like your threading is at the individual item level - I wonder if this may be a little too granular.
You are already doing your work in blocks of 62 items. What if you were to take those items and process all of them within a single thread? I.e., you would have something like this:
void RunMyMethods(IEnumerable<Item> items)
{
foreach(Item item in items)
{
var result = MyMethod(item.id);
...
}
}
Keep in mind that WaitHandle objects can be slower than using Monitor objects: http://www.yoda.arachsys.com/csharp/threads/waithandles.shtml
Otherwise, the usual advice holds: profile the performance to find the true bottlenecks. In your question you state that it takes 2-3 seconds per iteration - with 20000 iterations, it would take a fair bit more than 20 minutes.
Edit:
If you are wanting to maximise your usage of CPU time, then it may be best to split your 20000 items into, say, four groups of 5000 and process each group in its own thread. I would imagine that this sort of "thick 'n chunky" concurrency would be more efficient than a very fine-grained approach.
To start with, the numbers just don't add up:
20k iterations,for each iteration it is taking around two or three seconds and total around 20minutes
That's a 40x 'parallelism factor' - you can never achieve that running on a normal machine.
Second, when 'optimizing' a CPU-intensive computation, there's no sense in parallelizing beyond the number of cores. Try dropping that magical 62 down to 16 and benchmark it - it will actually run faster.
I ran a mangled version of your code on my laptop and got a 10-20% improvement using Parallel.ForEach.
So maybe you can make it run in 17 minutes instead of 20 - does it really matter?
I have been writing "linear" WinForms for a couple of months and now I am trying to figure out threading.
This is my loop. It has around 40,000 rows, and it takes around 1 second to perform a task on each row:
foreach (String CASE in MAIN_CASES_LIST)
{
//bunch of code here
}
How do I
put each iteration into a separate thread
maintain no more than x threads at the same time?
If you're on .NET 4 you can utilize Parallel.ForEach
Parallel.ForEach(MAIN_CASES_LIST, CASE =>
{
//bunch of code here
});
There's a great library called SmartThreadPool which may be useful here. It does a lot of useful stuff with threading and queueing, abstracting most of this away from you.
Not sure if it will help you, but you can queue up a crapload of work items, limit the number of threads, etc.
http://www.codeproject.com/Articles/7933/Smart-Thread-Pool
Of course, if you want to get your hands dirty with multi-threading or use Parallel, go for it - it's just a suggestion :)
To mix the answers above and add a limit on the maximum number of threads created, you can use this overload. Just be sure to add "using System.Threading.Tasks;" at the top.
LinkedList<String> theList = new LinkedList<string>();
ParallelOptions parOptions = new ParallelOptions();
parOptions.MaxDegreeOfParallelism = 5; // only up to 5 threads allowed
Parallel.ForEach(theList, parOptions, (string CASE) =>
{
    // bunch of code here
});
OK, so I just started screwing around with threading. It's taking a bit of time to wrap my head around the concepts, so I wrote a pretty simple test to see how much faster, if faster at all, printing out 20,000 lines would be (and I figured it would be faster, since I have a quad-core processor?).
So first I wrote this (this is how I would normally do the following):
System.DateTime startdate = DateTime.Now;
for (int i = 0; i < 10000; ++i)
{
Console.WriteLine("Producing " + i);
Console.WriteLine("\t\t\t\tConsuming " + i);
}
System.DateTime endtime = DateTime.Now;
Console.WriteLine(startdate.Second + ":" + startdate.Millisecond + " to " + endtime.Second + ":" + endtime.Millisecond);
And then with threading:
public class Test
{
static ProducerConsumer queue;
public System.DateTime startdate = DateTime.Now;
static void Main()
{
queue = new ProducerConsumer();
new Thread(new ThreadStart(ConsumerJob)).Start();
for (int i = 0; i < 10000; i++)
{
Console.WriteLine("Producing {0}", i);
queue.Produce(i);
}
}
static void ConsumerJob()
{
Test a = new Test();
for (int i = 0; i < 10000; i++)
{
object o = queue.Consume();
Console.WriteLine("\t\t\t\tConsuming {0}", o);
}
System.DateTime endtime = DateTime.Now;
Console.WriteLine(a.startdate.Second + ":" + a.startdate.Millisecond + " to " + endtime.Second + ":" + endtime.Millisecond);
}
}
public class ProducerConsumer
{
readonly object listLock = new object();
Queue queue = new Queue();
public void Produce(object o)
{
lock (listLock)
{
queue.Enqueue(o);
Monitor.Pulse(listLock);
}
}
public object Consume()
{
lock (listLock)
{
while (queue.Count == 0)
{
Monitor.Wait(listLock);
}
return queue.Dequeue();
}
}
}
Now, for some reason I assumed this would be faster, but after testing it 15 times, the median of the results is... a few milliseconds different, in favor of non-threading.
Then I figured, hey... maybe I should try it on a million Console.WriteLines, but the results were similar.
Am I doing something wrong?
Writing to the console is internally synchronized. It is not parallel. It also causes cross-process communication.
In short: It is the worst possible benchmark I can think of ;-)
Try benchmarking something real, something that you actually would want to speed up. It needs to be CPU bound and not internally synchronized.
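A minimal sketch of such a benchmark, under the assumption that the work is pure computation; the Work body and the iteration counts are arbitrary placeholders:
using System;
using System.Diagnostics;
using System.Threading.Tasks;

class Benchmark
{
    // Pure CPU work: no I/O, no locks, nothing internally synchronized.
    static double Work(int i)
    {
        double x = i;
        for (int n = 1; n < 20000; n++) x = Math.Sqrt(x + n);
        return x;
    }

    static void Main()
    {
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < 100000; i++) Work(i);
        Console.WriteLine("Sequential: {0} ms", sw.ElapsedMilliseconds);

        sw = Stopwatch.StartNew();
        Parallel.For(0, 100000, i => Work(i));
        Console.WriteLine("Parallel:   {0} ms", sw.ElapsedMilliseconds);
    }
}
On a quad-core machine the parallel loop should finish in roughly a quarter of the sequential time, which is the kind of speedup the console test can never show.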
As far as I can see you have only got one thread servicing the queue, so why would this be any quicker?
I have an example for why your expectation of a big speedup through multi-threading is wrong:
Assume you want to upload 100 pictures. The single threaded variant loads the first, uploads it, loads the second, uploads it, etc.
The limiting part here is the bandwidth of your internet connection (assuming that every upload uses up all the upload bandwidth you have).
What happens if you create 100 threads to upload 1 picture only? Well, each thread reads its picture (this is the part that speeds things up a little, because reading the pictures is done in parallel instead of one after the other).
As the currently active thread uses 100% of the internet upload bandwidth to upload its picture, no other thread can upload a single byte while it is active. Since the amount of bytes that needs to be transmitted is the same, 100 threads uploading one picture each take the same time as one thread uploading 100 pictures one after the other.
You only get a speedup if uploading a picture were limited to, let's say, 50% of the available bandwidth. Then 100 threads would be done in 50% of the time it would take one thread to upload 100 pictures.
"For some reason i assumed this would be faster"
If you don't know why you assumed it would be faster, why are you surprised that it's not? Simply starting up new threads is never guaranteed to make any operation run faster. There has to be some inefficiency in the original algorithm that a new thread can reduce (and that is sufficient to overcome the extra overhead of creating the thread).
All the advice given by others is good advice, especially the mention of the fact that the console is serialized, as well as the fact that adding threads does not guarantee speedup.
What I want to point out and what it seems the others missed is that in your original scenario you are printing everything in the main thread, while in the second scenario you are merely delegating the entire printing task to the secondary worker. This cannot be any faster than your original scenario because you simply traded one worker for another.
A scenario where you might see speedup is this one:
for(int i = 0; i < largeNumber; i++)
{
// embarrassingly parallel task that takes some time to process
}
and then replacing that with:
Parallel.For(0, largeNumber, i =>
{
    // embarrassingly parallel task that takes some time to process
});
This will split the loop among the workers such that each worker processes a smaller chunk of the original data. If the task does not need synchronization you should see the expected speedup.
Cool test.
One thing to have in mind when dealing with threads is bottlenecks. Consider this:
You have a Restaurant. Your kitchen can make a new order every 10
minutes (your chef has a bladder problem so he's always in the
bathroom, but is your girlfriend's cousin), so he produces 6 orders an
hour.
You currently employ only one waiter, who can attend tables
immediately (he's probably on E, but you don't care as long as the
service is good).
During the first week of business everything is fine: you get
customers every ten minutes. Customers still wait for exactly ten
minutes for their meal, but that's fine.
However, after that week, you are getting as many as 2 customers every
ten minutes, and they have to wait as much as 20 minutes to get their
meal. They start complaining and making noises. And god, you have
noise. So what do you do?
Waiters are cheap, so you hire two more. Will the wait time change?
Not at all... waiters will get the order faster, sure (attend two
customers in parallel), but still some customers wait 20 minutes for
the chef to complete their orders. You need another chef, but as you
search, you discover they are lacking! Every one of them is on TV
doing some crazy reality show (except for your girlfriend's cousin who
actually, you discover, is a former drug dealer).
In your case, the waiters are the threads making calls to Console.WriteLine, but your chef is the Console itself. It can only service so many calls a second. Adding some threads might make things a bit faster, but the gains should be minimal.
You have multiple sources but only 1 output, and in that case multi-threading will not speed things up. It's like a road where 4 lanes merge into 1: having 4 lanes will move traffic faster, but at the merge everything slows back down to the speed of a single lane.