Identifying the simultaneous tasks in a TPL Dataflow block - C#

I have 1000 elements in a TPL Dataflow block, and each element will call an external web service.
The web service supports a maximum of 10 simultaneous calls, which is easily achieved using:
new ExecutionDataflowBlockOptions
{
MaxDegreeOfParallelism = 10
...
}
The web service requires each call to pass a unique ID that distinguishes it from the other simultaneous calls.
In theory this should be a GUID, but in practice the 11th GUID will fail, because the throttling mechanism on the server is slow to recognise that the first call has finished.
The vendor suggests we recycle the GUIDs, keeping 10 in active use.
I intend to have an array of GUIDs; each task will use (Interlocked.Increment(ref COUNTER) % 10) as the array index (there will never be enough calls for COUNTER to overflow).
EDIT:
I just realised this won't work!
It assumes tasks will complete in order, which they may not.
I could implement this as a queue of IDs where each task borrows and returns one, but the question still stands: is there an easier, pre-built, thread-safe way to do this?
I've been surprised a number of times by C# (I'm new to .NET) to find that I am implementing something that already exists.
Is there a better thread-safe way for each task to recycle from a pool of IDs?

Creating resource pools is exactly the situation System.Collections.Concurrent.ConcurrentBag<T> is useful for. Wrap it in a BlockingCollection<T> to make the code simpler.
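using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;
class Example
{
private readonly BlockingCollection<Guid> _guidPool;
private readonly TransformBlock<Foo, Bar> _transform;
public Example(int concurrentLimit)
{
// a pool bounded to concurrentLimit IDs, backed by a ConcurrentBag
_guidPool = new BlockingCollection<Guid>(new ConcurrentBag<Guid>(), concurrentLimit);
for (int i = 0; i < concurrentLimit; i++)
{
_guidPool.Add(Guid.NewGuid());
}
_transform = new TransformBlock<Foo, Bar>(foo => SomeAction(foo),
new ExecutionDataflowBlockOptions
{
MaxDegreeOfParallelism = concurrentLimit
//...
});
//...
}
private async Task<Bar> SomeAction(Foo foo)
{
// borrow an ID; Take() blocks until one is available
var id = _guidPool.Take();
try
{
//... call the web service, passing id
return await CallServiceAsync(foo, id); // hypothetical helper for the actual call
}
finally
{
// always hand the ID back, even if the call throws
_guidPool.Add(id);
}
}
}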
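One caveat with the answer above: BlockingCollection<T>.Take() blocks the calling thread, and SomeAction is an async method running on the thread pool. A possible non-blocking variant, sketched below under the assumption that you are on .NET 4.5 or later (for SemaphoreSlim.WaitAsync), pairs a ConcurrentQueue<Guid> with a SemaphoreSlim. AsyncIdPool and UseAsync are hypothetical names, not part of any library.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: an async-friendly pool of reusable IDs.
public sealed class AsyncIdPool
{
    private readonly ConcurrentQueue<Guid> _ids = new ConcurrentQueue<Guid>();
    private readonly SemaphoreSlim _available;

    public AsyncIdPool(int size)
    {
        for (int i = 0; i < size; i++)
            _ids.Enqueue(Guid.NewGuid());
        // The semaphore count never exceeds the number of queued IDs, so a
        // caller that gets past WaitAsync always finds an ID to dequeue.
        _available = new SemaphoreSlim(size, size);
    }

    // Borrows an ID, runs the action with it, and always returns the ID to the pool.
    public async Task<T> UseAsync<T>(Func<Guid, Task<T>> action)
    {
        await _available.WaitAsync().ConfigureAwait(false); // waits without blocking a thread
        Guid id;
        if (!_ids.TryDequeue(out id))
            throw new InvalidOperationException("Pool invariant violated.");
        try
        {
            return await action(id).ConfigureAwait(false);
        }
        finally
        {
            _ids.Enqueue(id);      // put the ID back first...
            _available.Release();  // ...then let the next waiter in
        }
    }
}
```

Inside the TransformBlock delegate you would then call something like `pool.UseAsync(id => CallServiceAsync(foo, id))`, where CallServiceAsync is a hypothetical stand-in for the web-service call. Releasing the semaphore only after the ID is back in the queue is what keeps a waiter from ever finding the queue empty.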

Related

How to avoid running out of RAM during concurrent data processing?

I have an issue with concurrent data processing: my PC runs out of RAM quickly. Any advice on how to fix my concurrent implementation?
Common class:
public class CalculationResult
{
public int Count { get; set; }
public decimal[] RunningTotals { get; set; }
public CalculationResult(decimal[] profits)
{
this.Count = 1;
this.RunningTotals = new decimal[12];
profits.CopyTo(this.RunningTotals, 0);
}
public void Update(decimal[] newData)
{
this.Count++;
// sum arrays
for (int i = 0; i < 12; i++)
this.RunningTotals[i] = this.RunningTotals[i] + newData[i];
}
public void Update(CalculationResult otherResult)
{
this.Count += otherResult.Count;
// sum arrays
for (int i = 0; i < 12; i++)
this.RunningTotals[i] = this.RunningTotals[i] + otherResult.RunningTotals[i];
}
}
The single-core implementation is as follows:
Dictionary<string, CalculationResult> combinations = new Dictionary<string, CalculationResult>();
foreach (var i in itterations)
{
// do the processing
// ..
string combination = "1,2,3,4,42345,52,523"; // this is determined during the processing
if (combinations.ContainsKey(combination))
combinations[combination].Update(newData);
else
combinations.Add(combination, new CalculationResult(newData));
}
Multi-core implementation:
ConcurrentBag<Dictionary<string, CalculationResult>> results = new ConcurrentBag<Dictionary<string, CalculationResult>>();
Parallel.ForEach(itterations, (i, state) =>
{
Dictionary<string, CalculationResult> combinations = new Dictionary<string, CalculationResult>();
// do the processing
// ..
// add combination to combinations -> same logic as in single core implementation
results.Add(combinations);
});
Dictionary<string, CalculationResult> combinationsReal = new Dictionary<string, CalculationResult>();
foreach (var item in results)
{
foreach (var pair in item)
{
if (combinationsReal.ContainsKey(pair.Key))
combinationsReal[pair.Key].Update(pair.Value);
else
combinationsReal.Add(pair.Key, pair.Value);
}
}
The issue I am having is that almost every combinations dictionary ends up with about 930k records, which on average consume 400 MB of RAM.
In the single-core implementation there is only one such dictionary, and all checks are performed against it. But this approach is slow, and I want to take advantage of multiple cores.
In the multi-core implementation, a ConcurrentBag instance holds all the combinations dictionaries. As soon as the multi-threaded job is finished, all dictionaries are aggregated into one. This works well for a small number of concurrent iterations; for example, with 4 iterations my RAM usage was ~1.5 GB. The issue arises when I set the full number of parallel iterations, which is 200! No amount of PC RAM is enough to hold all those dictionaries, with a million records each!
I was thinking about using ConcurrentDictionary, until I found out that the TryAdd method does not guarantee the integrity of the added data in my situation, as I also need to update the running totals.
The only other multi-threaded option I see is to save the combinations to a database instead of a dictionary. Data aggregation would then be a matter of one SQL SELECT statement with a GROUP BY clause... but I don't like the idea of creating a temporary table and running a DB instance just for that.
Is there a workaround to process data concurrently without running out of RAM?
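A possible middle ground, sketched here under stated assumptions (it is not taken from the answers below): the Parallel.ForEach overload with localInit and localFinally keeps one dictionary per worker task instead of one per iteration, and merges each into the shared result exactly once, under a lock. At most MaxDegreeOfParallelism local dictionaries are alive at the same time, instead of 200. The produceCombinations delegate is a hypothetical stand-in for the "do the processing" step, and CalculationResult is a compact copy of the class above so the block compiles on its own. Each local dictionary can still grow as large as the set of distinct keys its worker sees, so this bounds the number of dictionaries, not their size.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Compact copy of the question's CalculationResult so this block compiles on its own.
public class CalculationResult
{
    public int Count { get; set; }
    public decimal[] RunningTotals { get; set; }

    public CalculationResult(decimal[] profits)
    {
        Count = 1;
        RunningTotals = new decimal[12];
        profits.CopyTo(RunningTotals, 0);
    }

    public void Update(decimal[] newData)
    {
        Count++;
        for (int i = 0; i < 12; i++) RunningTotals[i] += newData[i];
    }

    public void Update(CalculationResult other)
    {
        Count += other.Count;
        for (int i = 0; i < 12; i++) RunningTotals[i] += other.RunningTotals[i];
    }
}

public static class PerThreadAggregation
{
    // produceCombinations is a hypothetical stand-in for "do the processing":
    // it returns the (combination, newData) pairs found for one iteration.
    public static Dictionary<string, CalculationResult> Run<TItem>(
        IEnumerable<TItem> iterations,
        Func<TItem, IEnumerable<KeyValuePair<string, decimal[]>>> produceCombinations,
        int maxDegreeOfParallelism)
    {
        var combined = new Dictionary<string, CalculationResult>();
        var mergeLock = new object();

        Parallel.ForEach(
            iterations,
            new ParallelOptions { MaxDegreeOfParallelism = maxDegreeOfParallelism },
            // localInit: one private dictionary per worker task, not per iteration
            () => new Dictionary<string, CalculationResult>(),
            (item, loopState, local) =>
            {
                foreach (var pair in produceCombinations(item))
                {
                    CalculationResult existing;
                    if (local.TryGetValue(pair.Key, out existing))
                        existing.Update(pair.Value);
                    else
                        local.Add(pair.Key, new CalculationResult(pair.Value));
                }
                return local; // reuse the same dictionary for this worker's next item
            },
            // localFinally: merge each worker's dictionary once, under a lock
            local =>
            {
                lock (mergeLock)
                {
                    foreach (var pair in local)
                    {
                        CalculationResult existing;
                        if (combined.TryGetValue(pair.Key, out existing))
                            existing.Update(pair.Value);
                        else
                            combined.Add(pair.Key, pair.Value);
                    }
                }
            });

        return combined;
    }
}
```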
EDIT:
Maybe the real question should have been: how do I make updating RunningTotals thread-safe when using ConcurrentDictionary? I have just run across this thread, with a similar issue with ConcurrentDictionary, but my situation seems more complicated, as I have an array that needs to be updated. I am still investigating this matter.
EDIT2: Here is a working solution with ConcurrentDictionary. All I needed to do was add a lock on the value stored under each key.
ConcurrentDictionary<string, CalculationResult> combinations = new ConcurrentDictionary<string, CalculationResult>();
Parallel.ForEach(itterations, (i, state) =>
{
// do the processing
// ..
string combination = "1,2,3,4,42345,52,523"; // this is determined during the processing
// TryAdd fails if another thread added the key first (even right after a ContainsKey check),
// so in that case update the stored value instead of silently dropping newData
if (!combinations.TryAdd(combination, new CalculationResult(newData)))
{
CalculationResult existing = combinations[combination];
lock (existing)
existing.Update(newData);
}
});
Single-threaded execution time is 1m 48s, whereas this solution takes 1m 7s for 4 iterations (a 37% performance increase). I am still wondering whether the SQL approach would be any faster with millions of records. I will test it, possibly tomorrow, and update.
Edit 3: For those of you wondering what's wrong with updating a value in a ConcurrentDictionary, run this code with and without the lock.
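using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;
public class Result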
{
public int Count { get; set; }
}
class Program
{
static void Main(string[] args)
{
Console.WriteLine("Start");
List<int> keys = new List<int>();
for (int i = 0; i < 100; i++)
keys.Add(i);
ConcurrentDictionary<int, Result> dict = new ConcurrentDictionary<int, Result>();
Parallel.For(0, 8, i =>
{
foreach(var key in keys)
{
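// TryAdd succeeds for exactly one thread per key (Count starts at 0), so the
// other 7 iterations increment and the expected final Count is 7
if (!dict.TryAdd(key, new Result()))
{
//lock (dict[key]) // uncomment this
dict[key].Count++;
}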
}
});
// any output here is incorrect behavior. best result = no lines
foreach (var item in dict)
if (item.Value.Count != 7) { Console.WriteLine($"{item.Key}; {item.Value.Count}"); }
Console.WriteLine($"Finish");
Console.ReadKey();
}
}
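For a plain counter like Result.Count, a lock-free alternative is possible, sketched below as an assumption rather than the author's method: GetOrAdd hands every thread the same stored instance, and Interlocked.Increment updates it atomically. Interlocked needs a ref to a field, so the hypothetical Counter class exposes Count as a public field instead of a property. Unlike the demo above, every iteration increments (including the one that adds the key), so the expected count equals the number of loop iterations. This does not carry over to RunningTotals, because Interlocked has no overloads for decimal, so the array update still needs a lock.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical counter type: Count is a field so it can be passed by ref to Interlocked.
public class Counter
{
    public int Count;
}

public static class InterlockedDemo
{
    public static ConcurrentDictionary<int, Counter> Run(int keyCount, int loops)
    {
        var dict = new ConcurrentDictionary<int, Counter>();
        Parallel.For(0, loops, _ =>
        {
            for (int key = 0; key < keyCount; key++)
            {
                // Under contention the factory may run more than once, but only one
                // Counter is stored, and GetOrAdd returns that stored instance to every caller.
                Counter counter = dict.GetOrAdd(key, k => new Counter());
                Interlocked.Increment(ref counter.Count); // atomic read-modify-write, no lock
            }
        });
        return dict;
    }
}
```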
Edit 4: After some trial and error I couldn't optimize the SQL approach. This turned out to be the worst idea :) I used an SQLite database, both in-memory and in-file, with a transaction and reusable SQL command parameters. Due to the huge number of records that needed to be inserted, the performance is lacking. Data aggregation is the easiest part, but it takes a huge amount of time just to insert 4 million rows; I can't even begin to imagine how 240 million rows could be processed efficiently. So far (and, strangely), the ConcurrentBag approach seems to be the fastest on my PC, followed by the ConcurrentDictionary approach. ConcurrentBag is a bit heavier on memory, though. Thanks to the work of @Alisson, it is now perfectly fine to use it for a larger set of iterations!
So you just need to make sure you never have more than 4 concurrent iterations. That is the limit of your computer's resources, and as long as you use only this computer, there is no magic.
I created a class to control the concurrent execution and the number of concurrent tasks it will perform.
The class will hold these properties:
public class ConcurrentCalculationProcessor
{
private const int MAX_CONCURRENT_TASKS = 4;
private readonly IEnumerable<int> _codes;
private readonly List<Task<Dictionary<string, CalculationResult>>> _tasks;
private readonly Dictionary<string, CalculationResult> _combinationsReal;
public ConcurrentCalculationProcessor(IEnumerable<int> codes)
{
this._codes = codes;
this._tasks = new List<Task<Dictionary<string, CalculationResult>>>();
this._combinationsReal = new Dictionary<string, CalculationResult>();
}
}
I made the number of concurrent tasks a const, but it could be a parameter in the constructor.
I created a method to handle the processing. For test purposes, I simulated a loop through 900k items, adding them to a dictionary and finally returning them:
private async Task<Dictionary<string, CalculationResult>> ProcessCombinations()
{
Dictionary<string, CalculationResult> combinations = new Dictionary<string, CalculationResult>();
// do the processing
// here we should do something that worth using concurrency
// like querying databases, consuming APIs/WebServices, and other I/O stuff
for (int i = 0; i < 950000; i++)
combinations[i.ToString()] = new CalculationResult(new decimal[] { 1, 10, 15 });
return await Task.FromResult(combinations);
}
The main method starts tasks in parallel, adding them to a list of tasks so we can keep track of them later.
Every time the list reaches the maximum number of concurrent tasks, we await a method called ProcessCompletedTasks.
public async Task<Dictionary<string, CalculationResult>> Execute()
{
ConcurrentBag<Dictionary<string, CalculationResult>> results = new ConcurrentBag<Dictionary<string, CalculationResult>>();
for (int i = 0; i < this._codes.Count(); i++)
{
// start the task immediately
var task = ProcessCombinations();
this._tasks.Add(task);
if (this._tasks.Count >= MAX_CONCURRENT_TASKS)
{
// once MAX_CONCURRENT_TASKS are in progress, we start processing some of them
// this awaits any of the current tasks, then processes it (and any other task that may have completed as well)...
await ProcessCompletedTasks().ConfigureAwait(false);
}
}
// keep processing until all the pending tasks have completed... there should be no more than MAX_CONCURRENT_TASKS of them
while(this._tasks.Any())
await ProcessCompletedTasks().ConfigureAwait(false);
return this._combinationsReal;
}
The next method, ProcessCompletedTasks, waits for at least one of the existing tasks to complete. After that, it takes all the completed tasks from the list (the one that finished and any others that may have finished at the same time) and gets their results (the combinations).
Each processedCombinations dictionary is then merged into this._combinationsReal (using the same logic you provided in your question).
private async Task ProcessCompletedTasks()
{
await Task.WhenAny(this._tasks).ConfigureAwait(false);
var completedTasks = this._tasks.Where(t => t.IsCompleted).ToArray();
// completedTasks will have at least one task, but it may have more ;)
foreach (var completedTask in completedTasks)
{
var processedCombinations = await completedTask.ConfigureAwait(false);
foreach (var pair in processedCombinations)
{
if (this._combinationsReal.ContainsKey(pair.Key))
this._combinationsReal[pair.Key].Update(pair.Value);
else
this._combinationsReal.Add(pair.Key, pair.Value);
}
this._tasks.Remove(completedTask);
}
}
After merging each processedCombinations into _combinationsReal, it removes the respective task from the list and moves on (the main method starts adding more tasks again). This repeats until we have created tasks for all iterations.
Finally, we keep processing until there are no more tasks in the list.
If you monitor RAM consumption, you'll notice it increases to about 1.5 GB (when 4 tasks are being processed concurrently), then decreases to about 0.8 GB (when we remove tasks from the list). At least, this is what happened on my computer.
Here is a fiddle; however, I had to decrease the number of items from 900k to 100, because the fiddle limits memory usage to avoid abuse.
I hope this helps you somehow.
One thing to notice about all this is that you will benefit from concurrent tasks mostly if ProcessCombinations (the method executed concurrently to process those 900k items) calls external resources, like reading files from your disk, executing a database query, or calling an API/web service method. I suspect that code is reading the 900k items from an external resource; if so, concurrency will reduce the time needed to process them.
If the items were already loaded and ProcessCombinations just reads data that is already in memory, concurrency won't help at all (in fact, I believe it would make your code run slower). If that's the case, we are applying concurrency in the wrong place.
Using async calls in parallel is likely to help more when those calls access external resources (either to get or store data), and depending on how many concurrent calls those external resources can support, it may still not make much of a difference.
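To illustrate the point about in-memory work, here is a small sketch (the names are hypothetical): an async method whose only await is on Task.FromResult, like ProcessCombinations in this answer, runs its entire loop on the caller's thread before it returns, so the returned task is already complete and nothing overlaps. Wrapping the CPU-bound work in Task.Run is one way to actually move it onto the thread pool.

```csharp
using System.Threading.Tasks;

public static class SyncAsyncDemo
{
    // Shaped like ProcessCombinations: async, but the only await is on an already-completed task.
    public static async Task<int> BusyWorkAsync()
    {
        int total = 0;
        for (int i = 0; i < 1000000; i++)
            total += i % 3; // stand-in for in-memory, CPU-bound processing
        return await Task.FromResult(total);
    }

    public static bool CompletesSynchronously()
    {
        Task<int> task = BusyWorkAsync(); // the whole loop runs here, on the caller's thread
        return task.IsCompleted;          // true: the method never yielded, so nothing could overlap
    }

    // Task.Run queues the CPU-bound work to the thread pool, so several calls can actually overlap.
    public static Task<int> OffloadedAsync()
    {
        return Task.Run(() => BusyWorkAsync());
    }
}
```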

Count number of threads used by Parallel.ForEach

How can I determine the number of threads used during a specific call of Parallel.ForEach (or Parallel.Invoke, or Parallel.For)?
I know how to limit the maximum number of threads, e.g.
Parallel.ForEach(myList,
new ParallelOptions { MaxDegreeOfParallelism = 4 },
item => { doStuff(item); });
I know that the Task Parallel Library uses some heuristics to determine the optimal number of additional thread-pool threads to use at runtime, in addition to the current thread; some value between 0 and MaxDegreeOfParallelism.
I would like to know how many threads have actually been used, for logging purposes:
Stopwatch watch = Stopwatch.StartNew();
Parallel.ForEach(myList, item => { doStuff(item); });
trace.TraceInformation("Task finished in {0}ms using {1} threads",
watch.ElapsedMilliseconds, NUM_THREADS_USED);
I mainly want this data logged for curiosity's sake, and to improve my understanding. It does not have to be 100% reliable, since I do not intend to use it for anything else.
Is there a way to get this number, without major performance penalties?
You could use a (thread-safe) list to store the IDs of the used threads and count them:
ConcurrentBag<int> threadIDs = new ConcurrentBag<int>();
Parallel.ForEach(myList, item => {
threadIDs.Add(Thread.CurrentThread.ManagedThreadId);
doStuff(item);
});
int usedThreads = threadIDs.Distinct().Count();
This does have a performance impact (especially the thread-safety logic of ConcurrentBag), but I can't tell how big it is. The relative effect depends on how much work doStuff does itself. If that method contains only a few statements, this thread-counting code may even change the number of threads used.
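If the per-item ConcurrentBag.Add is a concern, one way to reduce that overhead (a sketch, assuming approximate counts are acceptable) is the Parallel.ForEach overload with localInit and localFinally: each worker task records its thread ID once when it starts and publishes it once when it finishes, instead of once per item. Because the loop body is synchronous, a worker task stays on one thread, so the ID captured in localInit is the thread that ran its items. CountThreads is a hypothetical helper name.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class ThreadCounter
{
    // Hypothetical helper: returns how many distinct threads ran the loop body.
    public static int CountThreads<T>(IEnumerable<T> items, Action<T> body)
    {
        var threadIds = new ConcurrentDictionary<int, byte>();
        Parallel.ForEach(
            items,
            // localInit runs once per worker task; a synchronous task stays on one thread
            () => Thread.CurrentThread.ManagedThreadId,
            (item, loopState, threadId) =>
            {
                body(item);
                return threadId; // carry the ID along; no shared state is touched per item
            },
            // localFinally runs once per worker task: publish its thread ID (duplicates are ignored)
            threadId => { threadIds.TryAdd(threadId, 0); });
        return threadIds.Count;
    }
}
```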
In your DoStuff method you can add code like this:
private void DoStuff<T>(T item)
{
Logger.Log($"Item {item} was handled by thread #{Thread.CurrentThread.ManagedThreadId}");
// your logic here
}
I know that the Task Parallel Library uses some heuristics to determine the optimal number of additional thread-pool threads to use at runtime, in addition to the current thread; some value between 0 and MaxDegreeOfParallelism.
I would like to know how many threads have actually been used, for logging purposes
Since you mention the thread pool and MaxDoP, I interpreted this question as asking how many threads were running concurrently at any one time. You can find this out by using a field and Interlocked.
class MyClass
{
private int _concurrentThreadCount;
private ILog _logger; //for example
public void DoWork()
{
var listOfSomething = GetListOfStuff();
Parallel.ForEach(listOfSomething, singleSomething =>
{
// log the value Increment returns; re-reading the field could include other threads' changes
int current = Interlocked.Increment(ref _concurrentThreadCount);
try
{
_logger.Info($"Doing some work. Concurrent thread count: {current}");
// do work
}
finally
{
// decrement even if the work throws
Interlocked.Decrement(ref _concurrentThreadCount);
}
});
}
}
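The counter above shows the current level of concurrency at each log call. If you only want a single number to log at the end, one option (a sketch; MeasurePeak is a hypothetical helper) is to keep the highest value seen, raising it with a compare-and-swap loop so that concurrent updates are never lost.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class ConcurrencyProbe
{
    // Runs body for every item and reports the highest number of bodies
    // that were executing at the same moment.
    public static int MeasurePeak<T>(IEnumerable<T> items, ParallelOptions options, Action<T> body)
    {
        int current = 0;
        int peak = 0;
        Parallel.ForEach(items, options, item =>
        {
            int now = Interlocked.Increment(ref current);
            // Raise peak to now if it is lower; retry if another thread changed it first.
            int seen;
            while (now > (seen = Volatile.Read(ref peak)))
            {
                if (Interlocked.CompareExchange(ref peak, now, seen) == seen)
                    break;
            }
            try
            {
                body(item);
            }
            finally
            {
                Interlocked.Decrement(ref current);
            }
        });
        return peak;
    }
}
```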
While I am aware this is an older question, I followed up on Evk's suggestion. I'm also not sure about the performance impact, but you could use a ConcurrentDictionary to keep track of the thread IDs:
var threadIDs = new ConcurrentDictionary<int, int>();
Parallel.ForEach(myList, item => {
threadIDs.TryAdd(Thread.CurrentThread.ManagedThreadId, 0);
doStuff(item);
});
int usedThreads = threadIDs.Count;

Controlling number of threads using AsParallel or Parallel.ForEach

I have a huge collection, and for each item I have to perform a specific task (which involves calling a WCF service). I want to control the number of threads instead of using Parallel.ForEach directly. I have 2 options.
I am using the following to partition the data:
List<MyCollectionObject> MyCollection = new List<MyCollectionObject>();
public static IEnumerable<List<T>> PartitionMyData<T>(this IList<T> source, Int32 size)
{
for (int i = 0; i < Math.Ceiling(source.Count / (Double)size); i++)
{
yield return new List<T>(source.Skip(size * i).Take(size));
}
}
Option 1:
MyCollection.PartitionMyData(AutoEnrollRequests.Count()/threadValue).AsParallel().AsOrdered()
.Select(no => InvokeTask(no)).ToArray();
private void InvokeTask(List<MyCollectionObject> requests)
{
foreach(MyCollectionObject obj in requests)
{
//Do Something
}
}
Option2:
MyCollection.PartitionMyData(threadValue).AsOrdered()
.Select(no => InvokeTask(no)).ToArray();
private void InvokeTask(List<MyCollectionObject> requests)
{
Action<MyCollectionObject> dosomething = obj =>
{
//Do Something
};
Parallel.ForEach(requests, dosomething);
}
If I have 16 objects in my collection, my understanding is that Option 1 will launch 4 threads, each processing its 4 objects sequentially.
Option 2 will launch 4 threads with 1 object each, process them, and then launch 4 more threads.
Can anyone suggest which option is better?
P.S.
I understand the .NET Framework does thread pooling and we need not control the number of threads, but due to a design decision we want to control it.
Thanks in advance,
Rohit
I want to control the number of threads instead of using Parallel.ForEach directly
You can control the number of threads in Parallel.ForEach if you call it with a ParallelOptions object:
Parallel.ForEach(requests,
new ParallelOptions { MaxDegreeOfParallelism = 4 }, //change here
dosomething);
It's impossible to give an A or B answer here. It depends on too many unknowns.
I will assume you want the fastest approach. To see which is better, run both on the target environment (or closest approximation you can get) and see which one completes fastest.
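Following the advice to measure, a minimal Stopwatch-based harness might look like the sketch below. Measure is a hypothetical helper; it runs the action once to warm up (JIT compilation, connection setup) and averages a few timed runs. For options that call a WCF service, keep in mind that every run hits the service, so the results also depend on the service's own load and throttling.

```csharp
using System;
using System.Diagnostics;

public static class Timing
{
    // Hypothetical helper: average wall-clock time of an action over several runs.
    public static TimeSpan Measure(Action action, int warmupRuns = 1, int measuredRuns = 3)
    {
        if (measuredRuns < 1)
            throw new ArgumentOutOfRangeException(nameof(measuredRuns));

        for (int i = 0; i < warmupRuns; i++)
            action(); // warm-up runs are not timed

        var watch = Stopwatch.StartNew();
        for (int i = 0; i < measuredRuns; i++)
            action();
        watch.Stop();

        return TimeSpan.FromTicks(watch.Elapsed.Ticks / measuredRuns); // average per run
    }
}
```

Usage would be along the lines of `Timing.Measure(() => RunOption1())` and `Timing.Measure(() => RunOption2())`, where the two Run methods are hypothetical wrappers around each option.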

Parallel Programming C# - TPL - Update Shared Variable & Suggestions

I am new to parallel programming; in fact, this is the first time I am trying it. I am currently doing a project in .NET 4 and would prefer to have 4 or 5 parallel executions.
I see some options: Task.Factory.StartNew, Parallel.For, Parallel.ForEach, etc.
What I am going to do is post to a website and fetch the responses for about 200 URLs.
When I used Parallel.ForEach, I didn't find a way to control the number of threads; the application ended up using 130+ threads and the website became unresponsive :)
I am interested in using Task.Factory.StartNew within a for loop and dividing the URLs into 4 or 5 tasks.
List<Task> tasks = new List<Task>();
for (int i = 0; i < 5; i++)
{
List<string> UrlForTask = GetUrlsForTask(i, 5); //Let's say this returns something like 1 of 5 of the list of URLs
int j = i;
var t = Task.Factory.StartNew(() =>
{
List<PageSummary> pageSummaries = GetSummary(UrlForTask);
Summary.AddRange(pageSummaries); //Summary is a public variable
});
tasks.Add(t);
}
I believe these tasks boil down to threads. So if I make Summary a List<PageSummary>, will it be thread-safe (I understand there are issues with multiple threads accessing a shared variable)?
Is this where we should use ConcurrentQueue<T>?
Do you know of a good resource for learning about accessing and updating a shared variable from multiple tasks?
What do you think is the best approach for this type of task?
Parallel.ForEach has overloads that take a ParallelOptions instance. The MaxDegreeOfParallelism property of that class is what you need to use.
List<MyRequest> requests = ...;
BlockingCollection<MyResponse> responses = ...;
Task.Factory.StartNew(() =>
{
Parallel.ForEach(
requests,
new ParallelOptions { MaxDegreeOfParallelism = 4 },
request => responses.Add(MyDownload(request)));
responses.CompleteAdding();
});
foreach (var response in responses.GetConsumingEnumerable())
{
Console.WriteLine(response.MyMessage);
}
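On the shared-variable part of the question: List<T>.AddRange is not thread-safe, so concurrent calls from several tasks can corrupt the list. Either wrap the call in a lock or collect into a concurrent collection. The sketch below takes the second route with ConcurrentQueue<T>, using only APIs available in .NET 4 (Task.Factory.StartNew, Task.WaitAll). GetSummary here is a hypothetical stand-in that just echoes the URLs back, and the modulo split of the URL list is one possible way to divide the work among the tasks.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public class PageSummary
{
    public string Url;
}

public static class SummaryCollector
{
    // Hypothetical stand-in for the question's GetSummary; real code would post to each URL.
    static List<PageSummary> GetSummary(List<string> urls)
    {
        return urls.Select(u => new PageSummary { Url = u }).ToList();
    }

    public static List<PageSummary> Collect(List<string> allUrls, int taskCount)
    {
        var summary = new ConcurrentQueue<PageSummary>(); // thread-safe replacement for a shared List<T>
        var tasks = new List<Task>();
        for (int i = 0; i < taskCount; i++)
        {
            int slice = i; // copy the loop variable so each task sees its own value
            // Every taskCount-th URL goes to this task.
            List<string> urlsForTask = allUrls.Where((u, index) => index % taskCount == slice).ToList();
            tasks.Add(Task.Factory.StartNew(() =>
            {
                foreach (var page in GetSummary(urlsForTask))
                    summary.Enqueue(page); // safe to call from many tasks at once
            }));
        }
        Task.WaitAll(tasks.ToArray()); // rethrows (as AggregateException) if any task failed
        return summary.ToList();
    }
}
```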

Why does Parallel.ForEach create endless threads?

The code below keeps creating threads, even when the queue is empty, until eventually an OutOfMemoryException occurs. If I replace Parallel.ForEach with a regular foreach, this does not happen. Does anyone know why this may happen?
public delegate void DataChangedDelegate(DataItem obj);
public class Consumer
{
public DataChangedDelegate OnCustomerChanged;
public DataChangedDelegate OnOrdersChanged;
private CancellationTokenSource cts;
private CancellationToken ct;
private BlockingCollection<DataItem> queue;
public Consumer(BlockingCollection<DataItem> queue) {
this.queue = queue;
Start();
}
private void Start() {
cts = new CancellationTokenSource();
ct = cts.Token;
Task.Factory.StartNew(() => DoWork(), ct);
}
private void DoWork() {
Parallel.ForEach(queue.GetConsumingPartitioner(), item => {
if (item.DataType == DataTypes.Customer) {
OnCustomerChanged(item);
} else if(item.DataType == DataTypes.Order) {
OnOrdersChanged(item);
}
});
}
}
I think Parallel.ForEach() was made primarily for processing bounded collections, and it doesn't expect collections like the one returned by GetConsumingPartitioner(), where MoveNext() blocks for a long time.
The problem is that Parallel.ForEach() tries to find the best degree of parallelism, so it starts as many tasks as the TaskScheduler lets it run. But the TaskScheduler sees many tasks that take a very long time to finish without doing anything (they are blocked), so it keeps starting new ones.
I think the best solution is to set the MaxDegreeOfParallelism.
As an alternative, you could use TPL Dataflow's ActionBlock. The main difference in this case is that ActionBlock doesn't block any threads when there are no items to process, so the number of threads wouldn't get anywhere near the limit.
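Combining the MaxDegreeOfParallelism suggestion with the built-in Partitioner.Create overload that takes EnumerablePartitionerOptions.NoBuffering (available since .NET 4.5) gives a consumer that hands out one item at a time and never runs more than a fixed number of bodies at once. This is a sketch of that combination, not the original code: it uses GetConsumingEnumerable() rather than GetConsumingPartitioner(), and Consume is a hypothetical helper. The loop returns once CompleteAdding has been called and the queue is drained.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public static class BoundedConsumer
{
    // Consumes a BlockingCollection with at most maxWorkers concurrent bodies.
    public static void Consume<T>(BlockingCollection<T> queue, int maxWorkers, Action<T> handle)
    {
        // NoBuffering hands items out one at a time instead of in growing chunks,
        // which suits a blocking, open-ended source like GetConsumingEnumerable().
        Partitioner<T> partitioner = Partitioner.Create(
            queue.GetConsumingEnumerable(), EnumerablePartitionerOptions.NoBuffering);

        // The cap also bounds how many threads can sit blocked waiting for new items.
        Parallel.ForEach(partitioner, new ParallelOptions { MaxDegreeOfParallelism = maxWorkers }, handle);
    }
}
```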
The Producer/Consumer pattern is mainly used when there is just one producer and one consumer.
However, what you are trying to achieve (multiple consumers) fits more neatly into the Worklist pattern. The following code was taken from the Unit 2 slide "2c - Shared Memory Patterns" of a parallel programming class taught at the University of Utah, which is available in the download at http://ppcp.codeplex.com/
BlockingCollection<Item> worklist;
CancellationTokenSource cts;
int itemcount;
public void Run()
{
int num_workers = 4;
//create worklist, filled with initial work
worklist = new BlockingCollection<Item>(
new ConcurrentQueue<Item>(GetInitialWork()));
cts = new CancellationTokenSource();
itemcount = worklist.Count;
for (int i = 0; i < num_workers; i++)
Task.Factory.StartNew(RunWorker);
}
IEnumerable<Item> GetInitialWork() { ... }
public void RunWorker() {
try {
do {
Item i = worklist.Take(cts.Token);
//blocks until item available or cancelled
Process(i);
//exit loop if no more items left
} while (Interlocked.Decrement(ref itemcount) > 0);
} finally {
// the first worker to finish cancels the others, which are blocked in Take
if (!cts.IsCancellationRequested)
cts.Cancel();
}
}
public void AddWork(Item item) {
Interlocked.Increment(ref itemcount);
worklist.Add(item);
}
public void Process(Item i)
{
//Do what you want to the work item here.
}
The preceding code lets you add work items to the queue and set an arbitrary number of workers (in this case, four) to pull items out of the queue and process them.
Another great resource for parallelism on .NET 4.0 is the book "Parallel Programming with Microsoft .NET", which is freely available at: http://msdn.microsoft.com/en-us/library/ff963553
Internally, Parallel.For and Parallel.ForEach in the Task Parallel Library follow a hill-climbing algorithm to determine how much parallelism should be used for the operation.
More or less, they start by running the body on one task, move to two, and so on, until a break-point is reached and they need to reduce the number of tasks.
This works quite well for method bodies that complete quickly, but if the body takes a long time to run, it may take a long time before it realizes it needs to decrease the amount of parallelism. Until that point, it keeps adding tasks, and possibly crashes the computer.
I learned the above during a lecture given by one of the developers of the Task Parallel Library.
Specifying the MaxDegreeOfParallelism is probably the easiest way to go.
