I am new to parallel programming; in fact, this is the first time I am trying it. I am currently working on a project in .NET 4 and would like to have 4 or 5 parallel executions.
I see some options: Task.Factory.StartNew, Parallel.For, Parallel.ForEach, etc.
What I am going to do is post to a website and fetch the responses for about 200 URLs.
When I used Parallel.ForEach, I didn't find a way to control the number of threads; the application ended up using 130+ threads and the website became unresponsive :)
I am interested in using Task.Factory.StartNew within a for loop and dividing the URLs into 4 or 5 tasks.
List<Task> tasks = new List<Task>();
for (int i = 0; i < 5; i++)
{
List<string> UrlForTask = GetUrlsForTask(i, 5); // Returns, say, the i-th of 5 slices of the URL list
int j = i;
var t = Task.Factory.StartNew(() =>
{
List<PageSummary> pageSummaries = GetSummary(UrlForTask);
Summary.AddRange(pageSummaries); // Summary is a public variable
});
tasks.Add(t);
}
I believe that these Tasks more or less boil down to threads. So if I make Summary a List<PageSummary>, will it be thread safe (I understand there are issues when multiple threads access a shared variable)?
Is this where we should use ConcurrentQueue<T>?
Do you know of a good resource for learning about accessing and updating a shared variable from multiple tasks?
What do you think is the best approach for this kind of task?
Parallel.ForEach has overloads that take a ParallelOptions instance. The MaxDegreeOfParallelism property of that class is what you need to use.
List<MyRequest> requests = ...;
BlockingCollection<MyResponse> responses = ...;
Task.Factory.StartNew(() =>
{
Parallel.ForEach(
requests,
new ParallelOptions { MaxDegreeOfParallelism = 4 },
request => responses.Add(MyDownload(request)));
responses.CompleteAdding();
});
foreach (var response in responses.GetConsumingEnumerable())
{
Console.WriteLine(response.MyMessage);
}
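On the shared-variable part of the question: List<T> is not safe for concurrent writes, so calling AddRange on it from several tasks can corrupt it. As a minimal sketch (not the code from the answer above), the results can go into a ConcurrentBag<T> while MaxDegreeOfParallelism caps the concurrency. The names RunBounded and BoundedFetchDemo are hypothetical, and the lambda in Main is only a stand-in for the real download call.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

static class BoundedFetch
{
    // Hypothetical helper: applies `work` to every item with at most
    // `maxParallel` concurrent calls and returns the collected results.
    public static List<TResult> RunBounded<TItem, TResult>(
        IEnumerable<TItem> items, int maxParallel, Func<TItem, TResult> work)
    {
        // ConcurrentBag<T> accepts Add calls from many threads at once.
        var results = new ConcurrentBag<TResult>();
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallel };

        Parallel.ForEach(items, options, item => results.Add(work(item)));

        // Note: the order of the results is not the order of the input.
        return results.ToList();
    }
}

static class BoundedFetchDemo
{
    static void Main()
    {
        // Placeholder URLs; a real program would use its own list.
        var urls = Enumerable.Range(1, 200)
                             .Select(i => "http://example.invalid/page" + i);

        // url.Length stands in for "download the page and summarise it".
        List<int> summaries = BoundedFetch.RunBounded(urls, 4, url => url.Length);

        Console.WriteLine(summaries.Count); // prints 200
    }
}
```

If the order of the results matters, one option is to return the input item together with its result and sort afterwards.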
Hi, is there any way to get the status of the threads from a Thread.Join, or can I break out of a Thread.Join after a specified period?
For example:
I have a loop that has n jobs and 3 free cores for 3 parallel threads. After joining the 3 threads, I wonder if there's a way to check whether a thread has finished its job so that I can start another job in its place.
I want to keep the 3 cores working all the time, rather than waiting for all threads to stop and then starting another 3.
The simplest, and most likely best, solution is to use the threadpool. The threadpool automatically scales based on available processors and cores.
ThreadPool.QueueUserWorkItem(state => TaskOne());
ThreadPool.QueueUserWorkItem(state => TaskTwo());
ThreadPool.QueueUserWorkItem(state => TaskThree());
ThreadPool.QueueUserWorkItem(state => TaskFour());
If you need to do this the hard way, you could keep a queue of pending tasks and a list of currently running tasks, and use a timeout for the Join() call so that it returns false if the thread is not ready.
I can't think of any reason to prefer the complex to the simple solution, but there might be one, of course.
var MAX_RUNNING = 3;
var JOIN_TIMEOUT_MS = 50;
var waiting = new Queue<ThreadStart>();
var running = new List<Thread>();
waiting.Enqueue(new ThreadStart(TaskOne));
waiting.Enqueue(new ThreadStart(TaskTwo));
waiting.Enqueue(new ThreadStart(TaskThree));
waiting.Enqueue(new ThreadStart(TaskFour));
while (waiting.Any() || running.Any())
{
while (running.Count < MAX_RUNNING && waiting.Any())
{
var next = new Thread(waiting.Dequeue());
next.Start();
running.Add(next);
}
for (var i = running.Count - 1; i >= 0; --i)
{
var t = running[i];
if(t.ThreadState == System.Threading.ThreadState.Stopped) {
running.RemoveAt(i);
break;
}
if (t.Join(JOIN_TIMEOUT_MS))
{
running.RemoveAt(i);
break;
}
}
}
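If tasks are available (.NET 4 or later), the same "refill a slot as soon as a job ends" idea can be sketched without polling, because Task.WaitAny blocks until one of the running tasks completes. This is only a sketch under that assumption; ThrottledRunner, RunAll and the demo jobs are hypothetical names, not part of the code above.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

static class ThrottledRunner
{
    // Hypothetical helper: runs every job while keeping at most `maxRunning`
    // in flight. When one job finishes, the next pending job takes its place.
    public static void RunAll(IEnumerable<Action> jobs, int maxRunning)
    {
        if (maxRunning < 1) throw new ArgumentOutOfRangeException("maxRunning");

        var pending = new Queue<Action>(jobs);
        var running = new List<Task>();

        while (pending.Count > 0 || running.Count > 0)
        {
            // Fill every free slot.
            while (running.Count < maxRunning && pending.Count > 0)
            {
                // LongRunning hints that each job should get its own thread.
                running.Add(Task.Factory.StartNew(
                    pending.Dequeue(), TaskCreationOptions.LongRunning));
            }

            // Block until any running job completes; returns its index.
            int finished = Task.WaitAny(running.ToArray());
            running[finished].Wait();   // rethrows if that job failed
            running.RemoveAt(finished);
        }
    }
}

static class ThrottledRunnerDemo
{
    static void Main()
    {
        var jobs = new List<Action>();
        for (int i = 1; i <= 6; i++)
        {
            int n = i; // copy so each lambda captures its own value
            jobs.Add(() =>
            {
                Thread.Sleep(100 * n); // simulated work
                Console.WriteLine("job " + n + " done");
            });
        }

        ThrottledRunner.RunAll(jobs, 3);
    }
}
```

Compared with the Join-with-timeout loop, this version does not wake up every 50 ms to check for finished threads.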
I have 1000 elements in a TPL Dataflow block;
each element will call an external web service.
The web service supports a maximum of 10 simultaneous calls,
which is easily achieved using:
new ExecutionDataflowBlockOptions
{
MaxDegreeOfParallelism = 10
...
}
The web service requires each call to pass a unique ID that distinguishes it from the other simultaneous calls.
In theory this should be a GUID, but in practice the 11th GUID will fail, because the throttling mechanism on the server is slow to recognise that the first call has finished.
The vendor suggests we recycle the GUIDs, keeping 10 in active use.
I intend to have an array of GUIDs; each task will use (Interlocked.Increment(ref COUNTER) % 10) as the array index.
EDIT:
I just realised this won't work!
It assumes tasks will complete in order, which they may not.
I could implement this as a queue of IDs where each task borrows and returns one, but the question still stands: is there an easier, pre-built, thread-safe way to do this?
(There will never be enough calls for COUNTER to overflow.)
But I've been surprised a number of times in C# (I'm new to .NET) to find that I am implementing something that already exists.
Is there a better thread-safe way for each task to recycle from a pool of IDs?
Creating resource pools is exactly the situation System.Collections.Concurrent.ConcurrentBag<T> is useful for. Wrap it in a BlockingCollection<T> to make the code easier.
class Example
{
private readonly BlockingCollection<Guid> _guidPool;
private readonly TransformBlock<Foo, Bar> _transform;
public Example(int concurrentLimit)
{
_guidPool = new BlockingCollection<Guid>(new ConcurrentBag<Guid>(), concurrentLimit);
for (int i = 0; i < concurrentLimit; i++)
{
_guidPool.Add(Guid.NewGuid());
}
_transform = new TransformBlock<Foo, Bar>(foo => SomeAction(foo),
new ExecutionDataflowBlockOptions
{
MaxDegreeOfParallelism = concurrentLimit
//...
});
//...
}
private async Task<Bar> SomeAction(Foo foo)
{
var id= _guidPool.Take();
try
{
//...
}
finally
{
_guidPool.Add(id);
}
}
}
I have been trying to make an asynchronous approach to my CPU-bound function, which computes some aggregate functions. The thing is that there is some deadlock (I suppose), because the calculation times vary too much. I am really a newbie in this Task Parallel world. I have also read Stephen Cleary's articles, but I am still unsure about all aspects of this asynchronous approach.
My Code:
private static void Main(string[] args)
{
PIServer server = ConnectToDefaultPIServer();
AFTimeRange timeRange = new AFTimeRange("1/1/2012", "6/30/2012");
Program p = new Program();
for (int i = 0; i < 10; i++)
{
p.TestAsynchronousCall(server, timeRange);
//p.TestAsynchronousCall(server, timeRange).Wait();-same results
}
Console.WriteLine("Main check-disconnected done");
Console.ReadKey();
}
private async Task TestAsynchronousCall(PIServer server, AFTimeRange timeRange)
{
AsyncClass asyn;
for (int i = 0; i < 1; i++)
{
asyn = new AsyncClass();
await asyn.DoAsyncTask(server, timeRange);
//asyn.DoAsyncTask(server, timeRange);-same results
}
}
public async Task DoAsyncTask(PIServer server, AFTimeRange timeRange)
{
var timeRanges = DivideTheTimeRange(timeRange);
Task<Dictionary<PIPoint, AFValues>>[] tasksArray = new Task<Dictionary<PIPoint, AFValues>>[2];
tasksArray[0] = (Task.Run(() => CalculationClass.AverageValueOfTagPerDay(server, timeRanges[0])));
// tasksArray[1] = tasksArray[0].ContinueWith((x) => CalculationClass.AverageValueOfTagPerDay(server, timeRanges[1]));
tasksArray[1] = (Task.Run(() => CalculationClass.AverageValueOfTagPerDay(server, timeRanges[1])));
Task.WaitAll(tasksArray);
//await Task.WhenAll(tasksArray); -same results
for (int i = 0; i < tasksArray.Length; i++)
{
Program.Show(tasksArray[i].Result);
}
}
I measure time with a Stopwatch inside the AverageValueOfTagPerDay function. This function is synchronous (is that a problem?). Each Task takes 12 seconds. But when I uncomment the line and use the ContinueWith() approach, these Tasks take 5-6 seconds each (which is desirable). How is that possible?
Stranger still, when I set the for loop in Main() to 10 iterations, it sometimes takes 5 seconds, just as when I use ContinueWith(). So I guess there is a deadlock somewhere, but I am unable to find it.
Sorry for my English; I still have trouble forming good sentences when I try to explain difficulties.
I have been trying to make an asynchronous approach to my CPU-bound function, which computes some aggregate functions.
"Asynchronous" and "CPU-bound" are not terms that go together. If you have a CPU-bound process, then you should use parallel technologies (Parallel, Parallel LINQ, TPL Dataflow).
I am really a newbie in this Task Parallel world. I have also read Stephen Cleary's articles, but I am still unsure about all aspects of this asynchronous approach.
Possibly because I do not cover parallel technologies in any of my articles or blog posts. :) I do cover them in my book, but not online. My online work focuses on asynchrony, which is ideal for I/O-based operations.
To solve your problem, you should use a parallel approach:
public Dictionary<PIPoint, AFValues>[] DoTask(PIServer server, AFTimeRange timeRange)
{
var timeRanges = DivideTheTimeRange(timeRange);
var result = timeRanges.AsParallel().AsOrdered().
Select(range => CalculationClass.AverageValueOfTagPerDay(server, range)).
ToArray();
return result;
}
Of course, this approach assumes that PIServer is threadsafe. It also assumes that there's no I/O being done by the "server" class; if there is, then TPL Dataflow may be a better choice than Parallel LINQ.
If you are planning to use this code in a UI application and don't want to block the UI thread, then you can call the code asynchronously like this:
var results = await Task.Run(() => DoTask(server, timeRange));
foreach (var result in results)
Program.Show(result);
I have a List<string> containing thousands of file paths. I want to process them in a function in parallel with 8 threads.
ParallelOptions opt= new ParallelOptions();
opt.TaskScheduler = null;
opt.MaxDegreeOfParallelism = 8;
Parallel.ForEach(fileList, opt, item => DoSomething(item));
This code works fine for me, but it only guarantees a maximum of 8 threads, and I want 8 threads to be running at all times. The CLR decides the number of threads to use based on CPU load.
Please suggest a threading approach that always uses 8 threads for the computation with minimum overhead.
Use a producer / consumer model. Create one producer and 8 consumers. For example:
BlockingCollection<string> _filesToProcess = new BlockingCollection<string>();
// start 8 tasks to do the processing
List<Task> _consumers = new List<Task>();
for (int i = 0; i < 8; ++i)
{
var t = Task.Factory.StartNew(ProcessFiles, TaskCreationOptions.LongRunning);
_consumers.Add(t);
}
// Populate the queue
foreach (var filename in fileList)
{
_filesToProcess.Add(filename);
}
// Mark the collection as complete for adding
_filesToProcess.CompleteAdding();
// wait for consumers to finish
Task.WaitAll(_consumers.ToArray(), Timeout.Infinite);
Your processing method removes things from the BlockingCollection and processes them:
void ProcessFiles()
{
foreach (var filename in _filesToProcess.GetConsumingEnumerable())
{
// do something with the file name
}
}
That will keep 8 threads running until the collection is empty. Assuming, of course, you have 8 cores on which to run the threads. If you have fewer available cores, then there will be a lot of context switching, which will cost you.
See BlockingCollection for more information.
With a static counter, you might be able to track the number of currently running threads.
Every time you start a task, you can use Task.ContinueWith (http://msdn.microsoft.com/en-us/library/dd270696.aspx) to be notified when it is over, so that you can start another one.
This way there will always be 8 tasks running.
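One way this idea might look in code is sketched below. It is an assumption-laden illustration rather than a tested recipe: ContinuationPump and ProcessAll are hypothetical names, the work items sit in a ConcurrentQueue<string>, and each of the 8 "chains" starts its next item from a ContinueWith callback.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

static class ContinuationPump
{
    // Hypothetical helper: starts `workers` chains. Each chain runs one task
    // at a time and uses ContinueWith to start its next item, so at most
    // `workers` tasks are in flight until the queue is drained.
    public static void ProcessAll(
        IEnumerable<string> files, int workers, Action<string> process)
    {
        if (workers < 1) throw new ArgumentOutOfRangeException("workers");

        var queue = new ConcurrentQueue<string>(files);
        var errors = new ConcurrentQueue<Exception>();
        var allDone = new TaskCompletionSource<bool>();
        int chainsLeft = workers; // counter of chains that still have work

        Action startNext = null;
        startNext = () =>
        {
            string file;
            if (!queue.TryDequeue(out file))
            {
                // This chain is out of work; the last chain to finish signals.
                if (Interlocked.Decrement(ref chainsLeft) == 0)
                    allDone.SetResult(true);
                return;
            }

            Task.Factory.StartNew(() => process(file)).ContinueWith(finished =>
            {
                // Record a failure, then move on to the next item.
                if (finished.Exception != null) errors.Enqueue(finished.Exception);
                startNext();
            });
        };

        for (int i = 0; i < workers; i++) startNext();

        allDone.Task.Wait();
        if (!errors.IsEmpty) throw new AggregateException(errors);
    }
}

static class ContinuationPumpDemo
{
    static void Main()
    {
        var files = new List<string>();
        for (int i = 0; i < 40; i++) files.Add("file" + i + ".txt"); // placeholder names

        int processed = 0;
        ContinuationPump.ProcessAll(files, 8, name =>
        {
            Thread.Sleep(10); // simulated work
            Interlocked.Increment(ref processed);
        });

        Console.WriteLine(processed); // prints 40
    }
}
```

These tasks run on the thread pool, so this caps the number of tasks in flight at 8; it does not by itself force the pool to dedicate 8 threads to them.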
OrderablePartitioner<Tuple<int, int>> chunkPart = Partitioner.Create(0, fileList.Count, 1);//Partition the list in chunk of 1 entry
ParallelOptions opt= new ParallelOptions();
opt.TaskScheduler = null;
opt.MaxDegreeOfParallelism = 8;
Parallel.ForEach(chunkPart, opt, chunkRange =>
{
for (int i = chunkRange.Item1; i < chunkRange.Item2; i++)
{
DoSomething(fileList[i].FullName);
}
});
I have a huge collection over which I have to perform a specific task (which involves calling a WCF service). I want to control the number of threads instead of using Parallel.ForEach directly. I have 2 options here.
I am using the code below to partition the data:
List<MyCollectionObject> MyCollection = new List<MyCollectionObject>();
public static IEnumerable<List<T>> PartitionMyData<T>(this IList<T> source, Int32 size)
{
for (int i = 0; i < Math.Ceiling(source.Count / (Double)size); i++)
{
yield return new List<T>(source.Skip(size * i).Take(size));
}
}
Option 1:
MyCollection.PartitionMyData(AutoEnrollRequests.Count()/threadValue).AsParallel().AsOrdered()
.Select(no => InvokeTask(no)).ToArray();
private void InvokeTask(List<MyCollectionObject> requests)
{
foreach(MyCollectionObject obj in requests)
{
//Do Something
}
}
Option2:
MyCollection.PartitionMyData(threadValue).AsOrdered()
.Select(no => InvokeTask(no)).ToArray();
private void InvokeTask(List<MyCollectionObject> requests)
{
Action<MyCollectionObject> dosomething =
{
}
Parallel.ForEach(requests,dosomething)
}
If I have 16 objects in my collection, then as far as I know Option 1 will launch 4 threads, each of which processes its 4 objects synchronously.
Option 2 will launch 4 threads with 1 object each, process them, and then launch 4 threads again.
Can anyone please suggest which option is better?
P.S.
I understand that the .NET Framework does thread pooling and that we normally need not control the number of threads, but due to a design decision we want to control it.
Thanks In Advance,
Rohit
I want to control the number of threads instead of using Parallel.ForEach directly
You can control the number of threads in Parallel.ForEach if you call it with a ParallelOptions object:
Parallel.ForEach(requests,
new ParallelOptions { MaxDegreeOfParallelism = 4 }, // change here
dosomething);
It's impossible to give an A or B answer here. It depends on too many unknowns.
I will assume you want the fastest approach. To see which is better, run both on the target environment (or closest approximation you can get) and see which one completes fastest.
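A rough way to run such a comparison is sketched below with Stopwatch. Everything here is illustrative: ApproachTimer and Measure are hypothetical names, the Thread.Sleep call is only a stand-in for the WCF call, and the two lambdas are simplified versions of the approaches being compared, not the exact code from the question.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

static class ApproachTimer
{
    // Hypothetical helper: returns the average wall-clock time of `approach`
    // over `repetitions` runs, after one untimed warm-up run.
    public static TimeSpan Measure(Action approach, int repetitions)
    {
        if (repetitions < 1) throw new ArgumentOutOfRangeException("repetitions");

        approach(); // warm-up so one-time JIT cost is not measured

        var watch = Stopwatch.StartNew();
        for (int i = 0; i < repetitions; i++) approach();
        watch.Stop();

        return TimeSpan.FromTicks(watch.Elapsed.Ticks / repetitions);
    }
}

static class ApproachTimerDemo
{
    static void Main()
    {
        int[] items = Enumerable.Range(0, 16).ToArray();
        Action<int> simulatedCall = item => Thread.Sleep(20); // stand-in for the service call

        // PLINQ with a fixed degree of parallelism.
        TimeSpan plinq = ApproachTimer.Measure(
            () => items.AsParallel().WithDegreeOfParallelism(4).ForAll(simulatedCall), 3);

        // Parallel.ForEach capped by ParallelOptions.
        TimeSpan forEach = ApproachTimer.Measure(
            () => Parallel.ForEach(
                items, new ParallelOptions { MaxDegreeOfParallelism = 4 }, simulatedCall), 3);

        Console.WriteLine("PLINQ:            {0:F1} ms", plinq.TotalMilliseconds);
        Console.WriteLine("Parallel.ForEach: {0:F1} ms", forEach.TotalMilliseconds);
    }
}
```

Numbers from a simulated call say little about the real service, so the measurement only becomes meaningful once the lambdas call the actual code in the target environment.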