Parallel Programming C# - TPL - Update Shared Variable & Suggestions - c#

I am new to Parallel Programming and infact this is the first time I am trying it. I am currently doing a project in .NET 4 and prefer to do have 4 or 5 parallel executions.
I see some options. There is Task.Factory.StartNew Parallel.For Parallel.ForEach etc.
What I am going to do is post to a web-site and fetch the responses for about 200 URLs.
When I use Parallel.ForEach I didn't find a way to control the number of threads and the application went using 130+ threads and the website went unresponsive :)
I am interested in using Task.Factory.StartNew within a for loop and divide the URLs in to 4 or 5 tasks.
List<Task> tasks = new List<Task>();
for (int i = 0; i < 5; i++)
{
List<string> UrlForTask = GetUrlsForTask(i,5); //Lets say will return some thing like 1 of 5 of the list of URLs
int j = i;
var t = Task.Factory.StartNew(() =>
{
List<PageSummary> t = GetSummary(UrlForTask);
Summary.AddRange(t); //Summary is a public variable
}
tasks.Add(t);
}
I believe that these Tasks kind of boil down to threads. So if I make Summary a List<PageSummary> will it be kind of thread safe (I understand there are issues accessing a shared variable by multiple threads)?
Is this where we should use ConcurrentQueue<T> ?
Do you know of a good resource that helps to learn about accessing and updating a shared variable by multiple tasks etc?
What is the best way I could use for this type of task as you may think ?

Parallel.ForEach has overloads that take a ParallelOptions instance. The MaxDegreeOfParallelism property of that class is what you need to use.
List<MyRequest> requests = ...;
BlockingCollection<MyResponse> responses = ...;
Task.Factory.StartNew(() =>
{
Parallel.ForEach(
requests,
new ParallelOptions { MaxDegreeOfParallelism = 4 },
request => responses.Add(MyDownload(request)));
responses.CompleteAdding();
});
foreach (var response in responses.GetConsumingEnumerable())
{
Console.WriteLine(response.MyMessage);
}

Related

Thread.Join status/breakout

Hi is there any possible way to get the status of the threads from a Thread.Join, or can i make a breakout from a Thread.Join at a specified period?
For eg:
I have a loop that have n-jobs, i've got 3 free cores for 3 parallel threads, and after Joining the 3 threads, i wonder if there's a way to check if a thread has done it's job to start another job in it's place.
I want to keep the 3 cores working all time, not to wait for all threads to stop and then start another 3 of them.
The simplest, and most likely best, solution is to use the threadpool. The threadpool automatically scales based on available processors and cores.
ThreadPool.QueueUserWorkItem(state => TaskOne());
ThreadPool.QueueUserWorkItem(state => TaskTwo());
ThreadPool.QueueUserWorkItem(state => TaskThree());
ThreadPool.QueueUserWorkItem(state => TaskFour());
If you need to do this the hard way, you could keep a queue of pending tasks and a list of currently running tasks, and use a timeout for the Join() call so that it returns false if the thread is not ready.
I can't think of any reason to prefer the complex to the simple solution, but there might be one, of course.
var MAX_RUNNING = 3;
var JOIN_TIMEOUT_MS = 50;
var waiting = new Queue<ThreadStart>();
var running = new List<Thread>();
waiting.Enqueue(new ThreadStart(TaskOne));
waiting.Enqueue(new ThreadStart(TaskTwo));
waiting.Enqueue(new ThreadStart(TaskThree));
waiting.Enqueue(new ThreadStart(TaskFour));
while (waiting.Any() || running.Any())
{
while (running.Count < MAX_RUNNING && waiting.Any())
{
var next = new Thread(waiting.Dequeue());
next.Start();
running.Add(next);
}
for (var i = running.Count - 1; i >= 0; --i)
{
var t = running[i];
if(t.ThreadState == System.Threading.ThreadState.Stopped) {
running.RemoveAt(i);
break;
}
if (t.Join(JOIN_TIMEOUT_MS))
{
running.RemoveAt(i);
break;
}
}
}

identifiing the simultaneous tasks in a TPL dataflow

I have 1000 elements in a TPL dataflow block,
each element will call external webservices.
the web service supports a maximum of 10 simultaneous calls,
which is easily achieved using:
new ExecutionDataflowBlockOptions
{
MaxDegreeOfParallelism = 10
...
}
The web service requires each call to have a unique id passed which distinguises it from the other simultaneous calls.
In theory this should be a guid, but in practise the 11th GUID will fail - because the throttling mechanism on the server is slow to recognise that the first call is finished.
The vendor suggests we recycle the guids, keeping 10 in active use.
I intend to have an array of GUIDS, each task will use (Interlocked.Increment(ref COUNTER) % 10 ) as the array index
EDIT :
I just realised this won't work!
It assumes tasks will complete in order which they may not
I could implement this as a queue of IDs where each task borrows and returns one, but the question still stands, is there a an easier, pre bulit thread-safe way to do this?
(there will never be enough calls for COUNTER to overflow)
But I've been surprised a number of times by C# (I'm new to .net) that I am implementing something that already exists.
Is there a better thread-safe way for each task to recycle from a pool of ids?
Creating resource pools is the exact situation System.Collections.ConcurrentBag<T> is useful for. Wrap it up in a BlockingCollection<T> to make the code easier.
class Example
{
private readonly BlockingCollection<Guid> _guidPool;
private readonly TransformBlock<Foo, Bar> _transform;
public Example(int concurrentLimit)
{
_guidPool = new BlockingCollection<Guid>(new ConcurrentBag<Guid>(), concurrentLimit)
for(int i = 0: i < concurrentLimit; i++)
{
_guidPool.Add(Guid.NewGuid());
}
_transform = new TransformBlock<Foo, Bar>(() => SomeAction,
new ExecutionDataflowBlockOptions
{
MaxDegreeOfParallelism = concurrentLimit
//...
});
//...
}
private async Task<Bar> SomeAction(Foo foo)
{
var id= _guidPool.Take();
try
{
//...
}
finally
{
_guidPool.Add(id);
}
}
}

Asynchronous Tasks take too much time

I have been trying make an asynchronous approach to my CPU-bound function which compute some aggregate functions. The thing is that there is some Deadlock (I suppose), because the time of calculation is too different. I am reallz newbie in this Task Parallel world, I also read Stephem Cleary articles but I am still unsure of all aspect this asynchronous approach.
My Code:
private static void Main(string[] args)
{
PIServer server = ConnectToDefaultPIServer();
AFTimeRange timeRange = new AFTimeRange("1/1/2012", "6/30/2012");
Program p = new Program();
for (int i = 0; i < 10; i++)
{
p.TestAsynchronousCall(server, timeRange);
//p.TestAsynchronousCall(server, timeRange).Wait();-same results
}
Console.WriteLine("Main check-disconnected done");
Console.ReadKey();
}
private async Task TestAsynchronousCall(PIServer server, AFTimeRange timeRange)
{
AsyncClass asyn;
for (int i = 0; i < 1; i++)
{
asyn = new AsyncClass();
await asyn.DoAsyncTask(server, timeRange);
//asyn.DoAsyncTask(server, timeRange);-same results
}
}
public async Task DoAsyncTask(PIServer server, AFTimeRange timeRange)
{
var timeRanges = DivideTheTimeRange(timeRange);
Task<Dictionary<PIPoint, AFValues>>[] tasksArray = new Task<Dictionary<PIPoint, AFValues>>[2];
tasksArray[0] = (Task.Run(() => CalculationClass.AverageValueOfTagPerDay(server, timeRanges[0])));
// tasksArray[1] = tasksArray[0].ContinueWith((x) => CalculationClass.AverageValueOfTagPerDay(server, timeRanges[1]));
tasksArray[1] = (Task.Run(() => CalculationClass.AverageValueOfTagPerDay(server, timeRanges[1])));
Task.WaitAll(tasksArray);
//await Task.WhenAll(tasksArray); -same results
for (int i = 0; i < tasksArray.Length; i++)
{
Program.Show(tasksArray[i].Result);
}
}
I measure time throught Stopwatch in AverageValueOfTagPerDay functions. This function is synchronous (Is that a problem?). Each Task take 12 seconds. But when I uncommented the line and use ContinueWith() approach, these Tasks take 5-6 seconds each(which is desirable). How is it possible?
More strange is that when I set the for loop in Main() on 10, sometimes it takes 5 seconds as well as when I use ContinueWith(). So I guess somewhere is deadlock but I am unable to find that.
Sorry for english, I got still problem make good senteces when I try explain some difficulties.
I have been trying make an asynchronous approach to my CPU-bound function which compute some aggregate functions.
"Asynchronous" and "CPU-bound" are not terms that go together. If you have a CPU-bound process, then you should use parallel technologies (Parallel, Parallel LINQ, TPL Dataflow).
I am reallz newbie in this Task Parallel world, I also read Stephem Cleary articles but I am still unsure of all aspect this asynchronous approach.
Possibly because I do not cover parallel technologies in any of my articles or blog posts. :) I do cover them in my book, but not online. My online work focuses on asynchrony, which is ideal for I/O-based operations.
To solve your problem, you should use a parallel approach:
public Dictionary<PIPoint, AFValues>[] DoTask(PIServer server, AFTimeRange timeRange)
{
var timeRanges = DivideTheTimeRange(timeRange);
var result = timeRanges.AsParallel().AsOrdered().
Select(range => CalculationClass.AverageValueOfTagPerDay(server, range)).
ToArray();
return result;
}
Of course, this approach assumes that PIServer is threadsafe. It also assumes that there's no I/O being done by the "server" class; if there is, then TPL Dataflow may be a better choice than Parallel LINQ.
If you are planning to use this code in a UI application and don't want to block the UI thread, then you can call the code asynchronously like this:
var results = await Task.Run(() => DoTask(server, timeRange));
foreach (var result in results)
Program.Show(result);

Suggest a way in threading that always fixed number of threads must be used in computing?

I have an array of filepath in List<string> with thousands of files. I want to process them in a function parallel with 8 threads.
ParallelOptions opt= new ParallelOptions();
opt.TaskScheduler = null;
opt.MaxDegreeOfParallelism = 8;
Parallel.ForEach(fileList, opt, item => DoSomething(item));
This code works fine for me but it guarantees to run max 8 threads and I want to run 8 threads always. CLR decides the number of threads to be use as per CPU load.
Please suggest a way in threading that always 8 threads are used in computing with minimum overhead.
Use a producer / consumer model. Create one producer and 8 consumers. For example:
BlockingCollection<string> _filesToProcess = new BlockingCollection<string>();
// start 8 tasks to do the processing
List<Task> _consumers = new List<Task>();
for (int i = 0; i < 8; ++i)
{
var t = Task.Factory.StartNew(ProcessFiles, TaskCreationOptions.LongRunning);
_consumers.Add(t);
}
// Populate the queue
foreach (var filename in filelist)
{
_filesToProcess.Add(filename);
}
// Mark the collection as complete for adding
_filesToProcess.CompleteAdding();
// wait for consumers to finish
Task.WaitAll(_consumers.ToArray(), Timeout.Infinite);
Your processing method removes things from the BlockingCollection and processes them:
void ProcessFiles()
{
foreach (var filename in _filesToProcess.GetConsumingEnumerable())
{
// do something with the file name
}
}
That will keep 8 threads running until the collection is empty. Assuming, of course, you have 8 cores on which to run the threads. If you have fewer available cores, then there will be a lot of context switching, which will cost you.
See BlockingCollection for more information.
Within a static counter, you might be able to get the number of current threads.
Every time you call start a task there is the possibility to use the Task.ContinueWith (http://msdn.microsoft.com/en-us/library/dd270696.aspx) to notify that it's over and you can start another one.
This way there is going to be always 8 tasks running.
OrderablePartitioner<Tuple<int, int>> chunkPart = Partitioner.Create(0, fileList.Count, 1);//Partition the list in chunk of 1 entry
ParallelOptions opt= new ParallelOptions();
opt.TaskScheduler = null;
opt.MaxDegreeOfParallelism = 8;
Parallel.ForEach(chunkPart, opt, chunkRange =>
{
for (int i = chunkRange.Item1; i < chunkRange.Item2; i++)
{
DoSomething(fileList[i].FullName);
}
});

Controlling number of threads using AsParallel or Parallel.ForEach

I have a huge collection, over which i have to perform a specific task(which involves calling a wcf service). I want to control the number of threads instead of using Parallel.ForEach directly. Here i have 2 options:
I am using below to partition the data:
List<MyCollectionObject> MyCollection = new List<MyCollectionObject>();
public static IEnumerable<List<T>> PartitionMyData<T>(this IList<T> source, Int32 size)
{
for (int i = 0; i < Math.Ceiling(source.Count / (Double)size); i++)
{
yield return new List<T>(source.Skip(size * i).Take(size));
}
}
Option 1:
MyCollection.PartitionMyData(AutoEnrollRequests.Count()/threadValue).AsParallel().AsOrdered()
.Select(no => InvokeTask(no)).ToArray();
private void InvokeTask(List<MyCollectionObject> requests)
{
foreach(MyCollectionObject obj in requests)
{
//Do Something
}
}
Option2:
MyCollection.PartitionMyData(threadValue).AsOrdered()
.Select(no => InvokeTask(no)).ToArray();
private void InvokeTask(List<MyCollectionObject> requests)
{
Action<MyCollectionObject> dosomething =
{
}
Parallel.ForEach(requests,dosomething)
}
If i have 16 objects in my collection, as per my knowledge Option1 will launch 4 threads, each thread having 4 objects will be processed synchronously.
Option 2 will launch 4 threads with 1 object each, process them and again will launch 4 threads.
Can anyone please suggest which option is better?
P.S.
I understand .Net framework does thread pooling and we need not control the number of threads but due to some design decision we want to use it.
Thanks In Advance,
Rohit
I want to control the number of threads instead of using Parallel.ForEach directly
You can control de number of threads in Parallel.ForEach if you use this call with a ParallelOptions object:
Parallel.ForEach(requests,
new ParallelOptions(){MaxDegreeOfParallelism = 4}, //change here
dosomething)
It's impossible to give an A or B answer here. It depends on too many unknowns.
I will assume you want the fastest approach. To see which is better, run both on the target environment (or closest approximation you can get) and see which one completes fastest.

Categories

Resources