How to avoid running out of RAM during concurrent data processing? - C#

I have an issue with concurrent data processing. My PC runs out of RAM quickly. Any advice on how to fix my concurrent implementation?
Common class:
public class CalculationResult
{
    public int Count { get; set; }
    public decimal[] RunningTotals { get; set; }

    public CalculationResult(decimal[] profits)
    {
        this.Count = 1;
        this.RunningTotals = new decimal[12];
        profits.CopyTo(this.RunningTotals, 0);
    }

    public void Update(decimal[] newData)
    {
        this.Count++;
        // sum arrays
        for (int i = 0; i < 12; i++)
            this.RunningTotals[i] = this.RunningTotals[i] + newData[i];
    }

    public void Update(CalculationResult otherResult)
    {
        this.Count += otherResult.Count;
        // sum arrays
        for (int i = 0; i < 12; i++)
            this.RunningTotals[i] = this.RunningTotals[i] + otherResult.RunningTotals[i];
    }
}
The single-core implementation is as follows:
Dictionary<string, CalculationResult> combinations = new Dictionary<string, CalculationResult>();
foreach (var i in itterations)
{
    // do the processing
    // ..
    string combination = "1,2,3,4,42345,52,523"; // this is determined during the processing
    if (combinations.ContainsKey(combination))
        combinations[combination].Update(newData);
    else
        combinations.Add(combination, new CalculationResult(newData));
}
Multi-core implementation:
ConcurrentBag<Dictionary<string, CalculationResult>> results = new ConcurrentBag<Dictionary<string, CalculationResult>>();
Parallel.ForEach(itterations, (i, state) =>
{
    Dictionary<string, CalculationResult> combinations = new Dictionary<string, CalculationResult>();
    // do the processing
    // ..
    // add combination to combinations -> same logic as in the single-core implementation
    results.Add(combinations);
});
Dictionary<string, CalculationResult> combinationsReal = new Dictionary<string, CalculationResult>();
foreach (var item in results)
{
    foreach (var pair in item)
    {
        if (combinationsReal.ContainsKey(pair.Key))
            combinationsReal[pair.Key].Update(pair.Value);
        else
            combinationsReal.Add(pair.Key, pair.Value);
    }
}
The issue I am having is that almost every combinations dictionary ends up with ~930k records in it, which consumes about 400 MB of RAM on average.
Now, in the single-core implementation there is only one such dictionary, and all checks are performed against it. But this approach is slow, and I want to use multi-core optimizations.
In the multi-core implementation a ConcurrentBag instance is created, which holds all the combinations dictionaries. As soon as the multi-threaded job is finished, all dictionaries are aggregated into one. This approach works well for a small number of concurrent iterations; for example, for 4 iterations my RAM usage was ~1.5 GB. The issue arises when I set the full number of parallel iterations, which is 200! No amount of PC RAM is enough to hold all those dictionaries, with nearly a million records each!
I was thinking about using a ConcurrentDictionary, until I found out that the TryAdd method does not guarantee the integrity of the added data in my situation, since I also need to run updates on the running totals.
The only real multi-threaded option left is, instead of adding all combinations to a dictionary, to save them to some DB. Data aggregation would then be a matter of one SQL SELECT statement with a GROUP BY clause... but I don't like the idea of creating a temporary table and running a DB instance just for that.
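(Incidentally, the final aggregation is just a group-by, and it can be expressed in memory with LINQ over the per-thread dictionaries in the results bag - a sketch with the same semantics as the merge loop above, not a fix for the RAM problem itself:)

Dictionary<string, CalculationResult> combinationsReal = results
    .SelectMany(dict => dict)                 // flatten all per-thread dictionaries
    .GroupBy(pair => pair.Key)
    .ToDictionary(group => group.Key, group =>
    {
        CalculationResult acc = group.First().Value;
        foreach (var pair in group.Skip(1))
            acc.Update(pair.Value);           // fold counts and running totals together
        return acc;
    });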
Is there a workaround that lets me process the data concurrently without running out of RAM?
EDIT:
Maybe the real question should have been: how do I make updating RunningTotals thread-safe when using a ConcurrentDictionary? I have just run across this thread, with a similar issue around ConcurrentDictionary, but my situation seems more complicated as I have an array that needs to be updated. I am still investigating this matter.
EDIT 2: Here is a working solution with ConcurrentDictionary. All I needed to do was add a lock on the value stored under each dictionary key.
ConcurrentDictionary<string, CalculationResult> combinations = new ConcurrentDictionary<string, CalculationResult>();
Parallel.ForEach(itterations, (i, state) =>
{
    // do the processing
    // ..
    string combination = "1,2,3,4,42345,52,523"; // this is determined during the processing
    if (combinations.ContainsKey(combination))
    {
        lock (combinations[combination])
            combinations[combination].Update(newData);
    }
    else if (!combinations.TryAdd(combination, new CalculationResult(newData)))
    {
        // another thread added this key between the ContainsKey check and TryAdd;
        // fall back to updating the existing entry so no data is lost
        lock (combinations[combination])
            combinations[combination].Update(newData);
    }
});
Single-threaded execution time is 1m 48s, whereas this solution takes 1m 7s for 4 iterations (about a 38% reduction in execution time). I am still wondering whether the SQL approach will be any faster with millions of records. I will possibly test it out tomorrow and update.
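As a side note on the pattern above: the gap between ContainsKey and TryAdd can be avoided entirely with GetOrAdd, locking only when another thread's instance won the insert. A sketch of that variant (my rewording, not from the original post):

var fresh = new CalculationResult(newData);
var stored = combinations.GetOrAdd(combination, fresh);
if (!ReferenceEquals(stored, fresh))
{
    // another thread's instance is already stored; fold our data into it
    lock (stored)
        stored.Update(newData);
}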
Edit 3: For those of you wondering what's wrong with updating a mutable value in a ConcurrentDictionary - run this code with and without the lock.
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public class Result
{
    public int Count { get; set; }
}

class Program
{
    static void Main(string[] args)
    {
        Console.WriteLine("Start");
        List<int> keys = new List<int>();
        for (int i = 0; i < 100; i++)
            keys.Add(i);
        ConcurrentDictionary<int, Result> dict = new ConcurrentDictionary<int, Result>();
        Parallel.For(0, 8, i =>
        {
            foreach (var key in keys)
            {
                if (dict.ContainsKey(key))
                {
                    //lock (dict[key]) // uncomment this
                    dict[key].Count++;
                }
                else
                    dict.TryAdd(key, new Result());
                // note: a thread that loses the TryAdd race silently drops its
                // contribution, so rare stray lines can appear even with the lock
            }
        });
        // any output here is incorrect behavior. best result = no lines
        foreach (var item in dict)
            if (item.Value.Count != 7) { Console.WriteLine($"{item.Key}; {item.Value.Count}"); }
        Console.WriteLine("Finish");
        Console.ReadKey();
    }
}
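For completeness: when the value is kept immutable (a plain int instead of the mutable Result), ConcurrentDictionary.AddOrUpdate applies the update through an internal compare-and-swap retry loop, so no lock is needed at all. A sketch of the same demo in that style (a variation of mine, assuming the keys list from above):

ConcurrentDictionary<int, int> dict = new ConcurrentDictionary<int, int>();
Parallel.For(0, 8, i =>
{
    foreach (var key in keys)
        // the first caller adds 0 (mirroring TryAdd above); the other seven
        // each retry their +1 until it is applied atomically
        dict.AddOrUpdate(key, 0, (k, count) => count + 1);
});
// every value is now exactly 7, without any locking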
Edit 4: After trial and error I couldn't optimize the SQL approach. It turned out to be the worst idea :) I used an SQLite database, both in-memory and in-file, with a transaction and reusable SQL command parameters. Due to the huge number of records that needed to be inserted, the performance is lacking. Data aggregation is the easiest part, but just inserting 4 million rows takes a huge amount of time; I can't even begin to imagine how 240 million rows could be processed efficiently. So far (and also, strangely), the ConcurrentBag approach seems to be the fastest on my PC, followed by the ConcurrentDictionary approach. The ConcurrentBag is a bit heavier on memory, though. Thanks to the work of @Alisson, it is now perfectly fine to use it for a larger set of iterations!

So, you just need to make sure you have no more than 4 concurrent iterations - that's the limit of your computer's resources, and using only this computer, there is no magic.
I created a class to control the concurrent execution and the number of concurrent tasks it will perform.
The class will hold these properties:
public class ConcurrentCalculationProcessor
{
    private const int MAX_CONCURRENT_TASKS = 4;
    private readonly IEnumerable<int> _codes;
    private readonly List<Task<Dictionary<string, CalculationResult>>> _tasks;
    private readonly Dictionary<string, CalculationResult> _combinationsReal;

    public ConcurrentCalculationProcessor(IEnumerable<int> codes)
    {
        this._codes = codes;
        this._tasks = new List<Task<Dictionary<string, CalculationResult>>>();
        this._combinationsReal = new Dictionary<string, CalculationResult>();
    }
}
I made the number of concurrent tasks a const, but it could be a parameter in the constructor.
I created a method to handle the processing. For test purposes, I simulated a loop through 900k items, adding them to a dictionary, and finally returning them:
private async Task<Dictionary<string, CalculationResult>> ProcessCombinations()
{
    Dictionary<string, CalculationResult> combinations = new Dictionary<string, CalculationResult>();
    // do the processing
    // here we should do something that is worth using concurrency for,
    // like querying databases, consuming APIs/WebServices, and other I/O stuff
    for (int i = 0; i < 950000; i++)
        combinations[i.ToString()] = new CalculationResult(new decimal[] { 1, 10, 15 });
    return await Task.FromResult(combinations);
}
The main method will start tasks in parallel, adding them to a list of tasks, so we can keep track of them later.
Every time the list reaches the maximum number of concurrent tasks, we await a method called ProcessCompletedTasks.
public async Task<Dictionary<string, CalculationResult>> Execute()
{
    for (int i = 0; i < this._codes.Count(); i++)
    {
        // start the task immediately
        var task = ProcessCombinations();
        this._tasks.Add(task);
        if (this._tasks.Count() >= MAX_CONCURRENT_TASKS)
        {
            // if we have MAX_CONCURRENT_TASKS or more in progress, we start processing some of them;
            // this will await any of the current tasks to complete, then process it
            // (and any other task which may have completed as well)...
            await ProcessCompletedTasks().ConfigureAwait(false);
        }
    }
    // keep processing until all the pending tasks have been completed...
    // there should be no more than MAX_CONCURRENT_TASKS left
    while (this._tasks.Any())
        await ProcessCompletedTasks().ConfigureAwait(false);
    return this._combinationsReal;
}
The next method, ProcessCompletedTasks, will wait for at least one of the existing tasks to complete. It will then take all the completed tasks from the list (the one that finished and any others that may have finished along with it) and get their results (the combinations).
Each processedCombinations is merged into this._combinationsReal (using the same logic you provided in your question).
private async Task ProcessCompletedTasks()
{
    await Task.WhenAny(this._tasks).ConfigureAwait(false);
    var completedTasks = this._tasks.Where(t => t.IsCompleted).ToArray();
    // completedTasks will have at least one task, but it may have more ;)
    foreach (var completedTask in completedTasks)
    {
        var processedCombinations = await completedTask.ConfigureAwait(false);
        foreach (var pair in processedCombinations)
        {
            if (this._combinationsReal.ContainsKey(pair.Key))
                this._combinationsReal[pair.Key].Update(pair.Value);
            else
                this._combinationsReal.Add(pair.Key, pair.Value);
        }
        this._tasks.Remove(completedTask);
    }
}
For each processedCombinations merged into _combinationsReal, the corresponding task is removed from the list, and we move on (start adding more tasks again). This happens until we have created tasks for all iterations.
Finally, we keep processing until there are no more tasks in the list.
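A minimal usage sketch, assuming the snippets above are combined into the single ConcurrentCalculationProcessor class (the codes passed in only drive the number of iterations here):

public static async Task Main()
{
    var processor = new ConcurrentCalculationProcessor(Enumerable.Range(0, 200));
    Dictionary<string, CalculationResult> combined = await processor.Execute();
    Console.WriteLine($"Distinct combinations: {combined.Count}");
}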
If you monitor the RAM consumption, you'll notice it increases to about 1.5 GB (when we have 4 tasks being processed concurrently), then decreases to about 0.8 GB (when we remove tasks from the list). At least, this is what happened on my computer.
Here is a fiddle; however, I had to decrease the number of items from 900k to 100, because the fiddle limits memory usage to avoid abuse.
I hope this helps you somehow.
One thing to notice about all this is that you will benefit from using concurrent tasks mostly if your ProcessCombinations (the method executed concurrently when processing those 900k items) calls external resources, like reading files from your HD, executing a query in a database, or calling an API/WebService method. I guess that code is probably reading 900k items from an external resource, in which case this will reduce the time needed to process it.
If the items were previously loaded and ProcessCombinations just reads data that is already in memory, then the concurrency won't help at all (actually, I believe it would make your code run slower). If that's the case, then we are applying concurrency in the wrong place.
Using async calls in parallel is likely to help more when those calls access external resources (either to get or store data), and depending on how many concurrent calls the external resources can support, it may still not make much of a difference.

Related

Parallel.ForEach faster than Task.WaitAll for I/O bound tasks?

I have two versions of my program that submit ~3000 HTTP GET requests to a web server.
The first version is based off of what I read here. That solution makes sense to me because making web requests is I/O bound work, and the use of async/await along with Task.WhenAll or Task.WaitAll means that you can submit 100 requests all at once and then wait for them all to finish before submitting the next 100 requests so that you don't bog down the web server. I was surprised to see that this version completed all of the work in ~12 minutes - way slower than I expected.
The second version submits all 3000 HTTP GET requests inside a Parallel.ForEach loop. I use .Result to wait for each request to finish before the rest of the logic within that iteration of the loop can execute. I thought that this would be a far less efficient solution, since using threads to perform tasks in parallel is usually better suited to performing CPU-bound work, but I was surprised to see that this version completed all of the work within ~3 minutes!
My question is why is the Parallel.ForEach version faster? This came as an extra surprise because when I applied the same two techniques against a different API/web server, version 1 of my code was actually faster than version 2 by about 6 minutes - which is what I expected. Could performance of the two different versions have something to do with how the web server handles the traffic?
You can see a simplified version of my code below:
private async Task<ObjectDetails> TryDeserializeResponse(HttpResponseMessage response)
{
    try
    {
        using (Stream stream = await response.Content.ReadAsStreamAsync())
        using (StreamReader readStream = new StreamReader(stream, Encoding.UTF8))
        using (JsonTextReader jsonTextReader = new JsonTextReader(readStream))
        {
            JsonSerializer serializer = new JsonSerializer();
            ObjectDetails objectDetails = serializer.Deserialize<ObjectDetails>(jsonTextReader);
            return objectDetails;
        }
    }
    catch (Exception e)
    {
        // Log exception
        return null;
    }
}
private async Task<HttpResponseMessage> TryGetResponse(string urlStr)
{
    try
    {
        HttpResponseMessage response = await httpClient.GetAsync(urlStr)
            .ConfigureAwait(false);
        if (response.StatusCode != HttpStatusCode.OK)
        {
            throw new WebException("Response code is "
                + response.StatusCode.ToString() + "... not 200 OK.");
        }
        return response;
    }
    catch (Exception e)
    {
        // Log exception
        return null;
    }
}
private async Task<ObjectDetails> GetObjectDetailsAsync(string baseUrl, int id)
{
    string urlStr = baseUrl + @"objects/id/" + id + "/details";
    HttpResponseMessage response = await TryGetResponse(urlStr);
    ObjectDetails objectDetails = await TryDeserializeResponse(response);
    return objectDetails;
}
// With ~3000 objects to retrieve, this code will create 100 API calls
// in parallel, wait for all 100 to finish, and then repeat that process
// ~30 times. In other words, there will be ~30 batches of 100 parallel
// API calls.
private Dictionary<int, Task<ObjectDetails>> GetAllObjectDetailsInBatches(
    string baseUrl, Dictionary<int, MyObject> incompleteObjects)
{
    int batchSize = 100;
    int numberOfBatches = (int)Math.Ceiling(
        (double)incompleteObjects.Count / batchSize);
    Dictionary<int, Task<ObjectDetails>> objectTaskDict
        = new Dictionary<int, Task<ObjectDetails>>(incompleteObjects.Count);
    var orderedIncompleteObjects = incompleteObjects.OrderBy(pair => pair.Key);
    for (int i = 0; i < numberOfBatches; i++)
    {
        var batchOfObjects = orderedIncompleteObjects.Skip(i * batchSize)
            .Take(batchSize);
        // materialize the query once; re-enumerating the lazy Select
        // would start a second round of requests
        var batchObjectsTaskList = batchOfObjects.Select(
            pair => GetObjectDetailsAsync(baseUrl, pair.Key)).ToList();
        Task.WaitAll(batchObjectsTaskList.ToArray());
        foreach (var objTask in batchObjectsTaskList)
            objectTaskDict.Add(objTask.Result.id, objTask);
    }
    return objectTaskDict;
}
public void GetObjectsVersion1()
{
    string baseUrl = @"https://mywebserver.com:/api";
    // GetIncompleteObjects is not shown, but it is not relevant to
    // the question
    Dictionary<int, MyObject> incompleteObjects = GetIncompleteObjects();
    Dictionary<int, Task<ObjectDetails>> objectTaskDict
        = GetAllObjectDetailsInBatches(baseUrl, incompleteObjects);
    foreach (KeyValuePair<int, MyObject> pair in incompleteObjects)
    {
        ObjectDetails objectDetails = objectTaskDict[pair.Key].Result;
        // Code here that copies fields from objectDetails to pair.Value
        // (the incompleteObject)
        AllObjects.Add(pair.Value);
    }
}
public void GetObjectsVersion2()
{
    string baseUrl = @"https://mywebserver.com:/api";
    // GetIncompleteObjects is not shown, but it is not relevant to
    // the question
    Dictionary<int, MyObject> incompleteObjects = GetIncompleteObjects();
    Parallel.ForEach(incompleteObjects, pair =>
    {
        ObjectDetails objectDetails = GetObjectDetailsAsync(
            baseUrl, pair.Key).Result;
        // Code here that copies fields from objectDetails to pair.Value
        // (the incompleteObject)
        AllObjects.Add(pair.Value);
    });
}
A possible reason why Parallel.ForEach may run faster is that it creates the side effect of throttling. Initially x threads process the first x elements (where x is the number of available cores), and progressively more threads may be added depending on internal heuristics. Throttling IO operations is a good thing because it protects the network and the server that handles the requests from becoming overburdened. Your alternative improvised method of throttling, making requests in batches of 100, is far from ideal for many reasons, one of them being that 100 concurrent requests are a lot of requests! Another is that a single long-running operation may delay the completion of the batch until long after the completion of the other 99 operations.
Note that Parallel.ForEach is also not ideal for parallelizing IO operations. It just happened to perform better than the alternative, wasting memory all along. For better approaches look here: How to limit the amount of concurrent async I/O operations?
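For reference, the usual shape of that approach is a SemaphoreSlim gate around each call, which caps the number of requests in flight without rigid batches. A sketch built on the question's GetObjectDetailsAsync (the limit of 10 is an arbitrary choice):

private static readonly SemaphoreSlim throttler = new SemaphoreSlim(10);

private async Task<ObjectDetails> GetObjectDetailsThrottledAsync(string baseUrl, int id)
{
    await throttler.WaitAsync().ConfigureAwait(false);
    try
    {
        // at most 10 of these run concurrently; a slow call only holds its
        // own slot instead of stalling a whole batch of 100
        return await GetObjectDetailsAsync(baseUrl, id).ConfigureAwait(false);
    }
    finally
    {
        throttler.Release();
    }
}
// usage: await Task.WhenAll(ids.Select(id => GetObjectDetailsThrottledAsync(baseUrl, id)));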
https://learn.microsoft.com/en-us/dotnet/api/system.threading.tasks.parallel.foreach?view=netframework-4.8
Basically, Parallel.ForEach allows iterations to run in parallel, so you are not constraining the iteration to run serially. On a host that is not thread-constrained, this will tend to lead to improved throughput.
In short:
Parallel.ForEach() is most useful for CPU-bound tasks.
Task.WaitAll() is more useful for IO-bound tasks.
So in your case, you are getting information from web servers, which is IO. If the async methods are implemented correctly, they won't block any thread (they will use IO completion ports to wait on), and the threads can do other work in the meantime.
By running the async method GetObjectDetailsAsync(baseUrl, pair.Key).Result synchronously, you block a thread, so the thread pool gets flooded with waiting threads.
So I think the Task solution is a better fit.

Concurrency Delay When Performance testing on EF

I'm running a performance test on EF because we were having some issues running concurrent calls on our server.
Here are my tests, which I'm executing against the Northwind database with 20,000 extra employees in the database to slow down retrieval.
A single execution generally returns in about 451 ms; however, when calling in parallel I seem to be seeing some weird extra execution time.
When I run 10 concurrent calls I'm getting 4 times the overhead. It may seem trivial, but when I execute this against my real database you can see it get a whole lot worse.
When running sp_whoisactive you can see that SQL has already finished and is waiting to send back the result (246 ms of ASYNC_NETWORK_IO).
SQL is not the bottleneck.
class Program
{
    public static int MAX_IO_THREADS = 2000;
    public static int MIN_IO_THREADS = 1000;
    public static int CONCURRENT_CALLS = 10;

    static void Main(string[] args)
    {
        ConfigureThreadPool();
        WarmUpContexts();
        ProfileAction();
        Console.ReadKey();
    }

    protected static void ProfileAction()
    {
        var callRange = Enumerable.Range(0, CONCURRENT_CALLS);
        var stopwatch = new Stopwatch();
        stopwatch.Start();
        var results = new ConcurrentDictionary<int, string>();
        var parallelOptions = new ParallelOptions { MaxDegreeOfParallelism = 512 };
        Parallel.ForEach(callRange, parallelOptions, i =>
        {
            var callSw = new Stopwatch();
            callSw.Start();
            var result = RunTest();
            callSw.Stop();
            results.TryAdd(i, $"Call Index: {i} took {callSw.ElapsedMilliseconds} milliseconds to complete");
        });
        stopwatch.Stop();
        var milliseconds = stopwatch.ElapsedMilliseconds;
        foreach (var result in results.OrderBy(x => x.Key))
        {
            Console.WriteLine(result.Value);
        }
        Console.WriteLine("Total Time " + milliseconds);
    }

    public static void ConfigureThreadPool()
    {
        ThreadPool.GetMinThreads(out var defaultMinWorkerCount, out var _);
        ThreadPool.GetMaxThreads(out var defaultMaxWorkerCount, out var _);
        ThreadPool.SetMaxThreads(defaultMaxWorkerCount, MAX_IO_THREADS);
        var successMin = ThreadPool.SetMinThreads(defaultMinWorkerCount, MIN_IO_THREADS);
    }

    static void WarmUpContexts()
    {
        using (var stagingContext = new NORTHWNDEntities())
        {
            stagingContext.Employees.First();
        }
    }

    static int RunTest()
    {
        using (var context = new Entities())
        {
            var employees = context.Employees.ToArray();
            return employees.Length;
        }
    }
}
How can I optimize the performance of my threads so they actually return in a reasonable time?
NOTE: if I call this async it will hang forever. Could this be a bug in .NET?
A few reasons why my project code was delayed:
1. Configure the thread pool. Opening up your IO threads will help with requests that are not purely CPU bound, such as writing a file to disk or making a network-bound call.
Note: if you're setting higher values you need to be careful not to have your min exceed the max, because these functions are poorly named and should really be called TrySetMin/TrySetMax, since they return a bool indicating whether they succeeded.
Be careful setting these; I was only using extremely high numbers for testing and learning purposes.
ThreadPool.SetMaxThreads(defaultMaxWorkerCount, MAX_IO_THREADS);
ThreadPool.SetMinThreads(defaultMinWorkerCount, MIN_IO_THREADS);
2. GCServer. Apparently, in .NET the garbage collector is by default optimized for single-core performance instead of multi-core performance. You can read more on this issue on MSDN.
Note: in .NET 4.0, enabling GCServer caused issues for applications that had a UI. In .NET 4.5 these issues were fixed by enabling gcConcurrent=true by default.
<configuration>
  <runtime>
    <gcServer enabled="true"/>
  </runtime>
</configuration>
3. Additional delays caused by Entity Framework. It looks like mapping relationships is really expensive, even if you are not pulling out any related entities.
var employees = context.Employees.AsNoTracking().ToArray();
I was testing against an extremely large database. The moral is: if you are loading data for GET requests, load without tracking if possible.
Note: this may cause unexpected behavior, as different entities that share a common foreign key will get different CLR objects associated with them.

Identifying the simultaneous tasks in a TPL Dataflow

I have 1000 elements in a TPL Dataflow block;
each element will call an external web service.
The web service supports a maximum of 10 simultaneous calls,
which is easily achieved using:
new ExecutionDataflowBlockOptions
{
    MaxDegreeOfParallelism = 10
    ...
}
The web service requires each call to pass a unique id which distinguishes it from the other simultaneous calls.
In theory this should be a GUID, but in practice the 11th GUID will fail - because the throttling mechanism on the server is slow to recognize that the first call has finished.
The vendor suggests we recycle the GUIDs, keeping 10 in active use.
I intend to have an array of GUIDs; each task will use (Interlocked.Increment(ref COUNTER) % 10) as the array index.
EDIT:
I just realised this won't work!
It assumes tasks will complete in the order they started, which they may not.
I could implement this as a queue of IDs where each task borrows and returns one, but the question still stands: is there an easier, pre-built, thread-safe way to do this?
(There will never be enough calls for COUNTER to overflow.)
But I've been surprised a number of times by C# (I'm new to .NET) in finding that something I'm implementing already exists.
Is there a better thread-safe way for each task to recycle from a pool of ids?
Creating resource pools is the exact situation System.Collections.Concurrent.ConcurrentBag<T> is useful for. Wrap it in a BlockingCollection<T> to make the code easier.
class Example
{
    private readonly BlockingCollection<Guid> _guidPool;
    private readonly TransformBlock<Foo, Bar> _transform;

    public Example(int concurrentLimit)
    {
        _guidPool = new BlockingCollection<Guid>(new ConcurrentBag<Guid>(), concurrentLimit);
        for (int i = 0; i < concurrentLimit; i++)
        {
            _guidPool.Add(Guid.NewGuid());
        }
        _transform = new TransformBlock<Foo, Bar>(SomeAction,
            new ExecutionDataflowBlockOptions
            {
                MaxDegreeOfParallelism = concurrentLimit
                //...
            });
        //...
    }

    private async Task<Bar> SomeAction(Foo foo)
    {
        var id = _guidPool.Take();
        try
        {
            //...
        }
        finally
        {
            _guidPool.Add(id);
        }
    }
}

Make Parallel.ForEach wait to get work until a slot opens

I'm using Parallel.ForEach to work through a bunch of items. The problem is, I want to prioritize which items get worked depending on the number of workers (slots) that are open. E.g. if I am working 8 parallel things and a slot opens among tasks 1-4, I want to assign easy work to those slots. The bottom half of the slots will get the hard work. This way, I won't get all 8 slots tied up doing hard/long-running work; easy/quick items will be run first. I've implemented this as follows:
The Code
const int workers = 8;
List<Thing> thingsToDo = ...; //Get the things that need to be done.
Thing[] currentlyWorkingThings = new Thing[workers]; //One slot for each worker.

void Run() {
    Parallel.ForEach(PrioritizeThings(thingsToDo), o => {
        int index = 0;
        //"PrioritizeThings" added this thing to the list of currentlyWorkingThings.
        //Find my position in this list.
        lock (currentlyWorkingThings)
            index = Array.IndexOf(currentlyWorkingThings, o);
        //Do work on this thing...
        //Then remove it from the list of currently working things, thereby
        // opening a new slot when this worker returns/finishes.
        lock (currentlyWorkingThings)
            currentlyWorkingThings[index] = null;
    });
}
IEnumerable<Thing> PrioritizeThings(List<Thing> thingsToDo) {
    int slots = workers;
    int halfSlots = (int)Math.Ceiling(slots / 2f);
    //Sort thingsToDo by their difficulty, easiest first.
    //Loop until we've worked every Thing.
    while (thingsToDo.Count > 0) {
        int slotToFill = ...; //Find the first open slot.
        Thing nextThing = null;
        lock (currentlyWorkingThings) {
            //If the slot is in the "top half", get the next easy thing - otherwise
            // get the next hard thing.
            if (slotToFill < halfSlots)
                nextThing = thingsToDo.First();
            else
                nextThing = thingsToDo.Last();
            //Add the nextThing to the list of currentlyWorkingThings and remove it from
            // the list of thingsToDo.
            currentlyWorkingThings[slotToFill] = nextThing;
            thingsToDo.Remove(nextThing);
        }
        //Return the nextThing to work.
        yield return nextThing;
    }
}
The Problem
So the issue I'm seeing here is that Parallel is requesting the next thing to work on from PrioritizeThings before a slot has opened (before an existing thing has been completed). I assume that Parallel is looking ahead and getting things ready in advance. I'd like it not to do this, and only fill a worker/slot when one is completely done. The only way I've thought of to fix this is to add a sleep/wait loop in PrioritizeThings which won't return a thing to work until it sees a legitimately open slot. But I don't like that, and I was hoping there was some way to make Parallel wait longer before getting work. Any suggestions?
There is a way built in (kinda) to support exactly the situation you are describing.
When you create the ForEach you will need to pass in a ParallelOptions with a non-standard TaskScheduler. The hard part is creating a TaskScheduler that implements that priority system for you; fortunately, Microsoft released a pack of examples called "ParallelExtensionsExtras" that contains one such scheduler, QueuedTaskScheduler.
private static void Main(string[] args)
{
    int totalMaxConcurrency = Environment.ProcessorCount;
    int highPriorityMaxConcurrency = totalMaxConcurrency / 2;
    if (highPriorityMaxConcurrency == 0)
        highPriorityMaxConcurrency = 1;
    QueuedTaskScheduler qts = new QueuedTaskScheduler(TaskScheduler.Default, totalMaxConcurrency);
    var highPriorityScheduler = qts.ActivateNewQueue(0);
    var lowPriorityScheduler = qts.ActivateNewQueue(1);
    BlockingCollection<Foo> highPriorityWork = new BlockingCollection<Foo>();
    BlockingCollection<Foo> lowPriorityWork = new BlockingCollection<Foo>();
    List<Task> processors = new List<Task>(2);
    processors.Add(Task.Factory.StartNew(() =>
    {
        // .GetConsumingPartitioner() is also from ParallelExtensionsExtras; it gives better
        // performance than .GetConsumingEnumerable() with Parallel.ForEach
        Parallel.ForEach(highPriorityWork.GetConsumingPartitioner(),
            new ParallelOptions() { TaskScheduler = highPriorityScheduler, MaxDegreeOfParallelism = highPriorityMaxConcurrency },
            ProcessWork);
    }, TaskCreationOptions.LongRunning));
    processors.Add(Task.Factory.StartNew(() =>
    {
        Parallel.ForEach(lowPriorityWork.GetConsumingPartitioner(),
            new ParallelOptions() { TaskScheduler = lowPriorityScheduler },
            ProcessWork);
    }, TaskCreationOptions.LongRunning));

    //Add some work to do here to the highPriorityWork or lowPriorityWork collections

    //Lets the blocking collections know we are no longer going to be adding new items,
    // so they will break out of the ForEach once they have finished the pending work.
    highPriorityWork.CompleteAdding();
    lowPriorityWork.CompleteAdding();

    //Waits for the two collections to completely empty before continuing
    Task.WaitAll(processors.ToArray());
}

private static void ProcessWork(Foo work)
{
    //...
}
Even though you have two instances of Parallel.ForEach running, the combined total of both of them will not use more than the value you passed into the QueuedTaskScheduler constructor for the maximum concurrency, and it will give preference to emptying the highPriorityWork collection first if there is work in both (up to a limit of 1/2 of all the available slots so that you don't choke the low-priority queue; you could easily adjust this to a higher or lower ratio depending on your performance needs).
If you don't want the high priority to always win and would rather have a "round-robin" style scheduler that alternates between the two lists (so the quick items don't always win, but are shuffled in with the slow items), you can assign the same priority level to two or more queues (or just use the RoundRobinTaskSchedulerQueue, which does the same thing).

Speed up loop using multithreading in C# (Question)

Imagine I have a function which goes through a million/billion strings and checks something in them,
e.g.:
foreach (String item in ListOfStrings)
{
    result.Add(CalculateSmth(item));
}
It consumes lots of time, because CalculateSmth is a very time-consuming function.
I want to ask: how do I integrate multithreading into this kind of process?
E.g.: I want to fire up 5 threads, each of them returning some results, and that goes on until the list has no more items.
Maybe someone can show some examples or articles..
Forgot to mention, I need it in .NET 2.0.
You could try the Parallel Extensions (part of .NET 4.0).
These allow you to write something like:
Parallel.ForEach(ListOfStrings, item =>
    result.Add(CalculateSmth(item))
);
Of course result.Add would need to be thread safe.
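One way to make it thread safe is to run the heavy work in parallel and only serialize the Add (a sketch; result, ListOfStrings, Foo and CalculateSmth are the stand-ins used throughout this thread):

List<Foo> result = new List<Foo>();
object gate = new object();
Parallel.ForEach(ListOfStrings, item =>
{
    Foo value = CalculateSmth(item); // the expensive part runs in parallel
    lock (gate)                      // only the cheap Add is serialized
        result.Add(value);
});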
The Parallel Extensions are cool, but this can also be done just by using the thread pool, like this:
using System.Collections.Generic;
using System.Threading;

namespace noocyte.Threading
{
    class CalcState
    {
        public CalcState(ManualResetEvent reset, string input)
        {
            Reset = reset;
            Input = input;
        }
        public ManualResetEvent Reset { get; private set; }
        public string Input { get; set; }
    }

    class CalculateMT
    {
        List<string> result = new List<string>();
        List<ManualResetEvent> events = new List<ManualResetEvent>();

        private void Calc()
        {
            List<string> aList = new List<string>();
            aList.Add("test");
            foreach (var item in aList)
            {
                CalcState cs = new CalcState(new ManualResetEvent(false), item);
                events.Add(cs.Reset);
                ThreadPool.QueueUserWorkItem(new WaitCallback(Calculate), cs);
            }
            WaitHandle.WaitAll(events.ToArray());
        }

        private void Calculate(object s)
        {
            CalcState cs = s as CalcState;
            // List<T> is not thread-safe, so serialize access to it;
            // signal completion only after the result has been stored
            lock (result)
                result.Add(cs.Input);
            cs.Reset.Set();
        }
    }
}
Note that concurrency doesn't magically give you more resources. You need to establish what is slowing CalculateSmth down.
For example, if it's CPU-bound (and you're on a single core) then the same number of CPU ticks will go to the code, whether you execute them sequentially or in parallel. Plus you'd get some overhead from managing the threads. Same argument applies to other constraints (e.g. I/O)
You'll only get performance gains in this if CalculateSmth is leaving resource free during its execution, that could be used by another instance. That's not uncommon. For example, if the task involves IO followed by some CPU stuff, then process 1 could be doing the CPU stuff while process 2 is doing the IO. As mats points out, a chain of producer-consumer units can achieve this, if you have the infrastructure.
You need to split up the work you want to do in parallel. Here is an example of how you can split the work in two:
List<string> work = (some list with lots of strings)

// Split the work in two
List<string> odd = new List<string>();
List<string> even = new List<string>();
for (int i = 0; i < work.Count; i++)
{
    if (i % 2 == 0)
    {
        even.Add(work[i]);
    }
    else
    {
        odd.Add(work[i]);
    }
}

// Set up two worker delegates
List<Foo> oddResult = new List<Foo>();
Action oddWork = delegate { foreach (string item in odd) oddResult.Add(CalculateSmth(item)); };

List<Foo> evenResult = new List<Foo>();
Action evenWork = delegate { foreach (string item in even) evenResult.Add(CalculateSmth(item)); };

// Run the two delegates asynchronously
IAsyncResult evenHandle = evenWork.BeginInvoke(null, null);
IAsyncResult oddHandle = oddWork.BeginInvoke(null, null);

// Wait for both to finish
evenWork.EndInvoke(evenHandle);
oddWork.EndInvoke(oddHandle);

// Merge the results from the two jobs
List<Foo> allResults = new List<Foo>();
allResults.AddRange(oddResult);
allResults.AddRange(evenResult);
return allResults;
The first question you must answer is whether you should be using threading at all.
If your function CalculateSmth() is basically CPU-bound, i.e. heavy in CPU usage and with basically no I/O, then I have a hard time seeing the point of using threads, since the threads will be competing over the same resource, in this case the CPU.
If your CalculateSmth() uses both CPU and I/O, then there might be a point in using threading.
I totally agree with the comment on my answer. I made an erroneous assumption that we were talking about a single CPU with one core, but these days we have multi-core CPUs; my bad.
Not that I have any good articles at hand right now, but what you want to do is something along the lines of Producer-Consumer with a thread pool; a minimal sketch follows the list below.
The producer loops through and creates tasks (which in this case could just mean queuing up the items in a List or Stack). The consumers are, say, five threads that each read one item off the stack, consume it by calculating it, and then store it elsewhere.
This way the multithreading is limited to just those five threads, and they will all have work to do until the stack is empty.
Things to think about:
Put protection on the input and output list, such as a mutex.
If the order is important, make sure that the output order is maintained. One example could be to store them in a SortedList or something like that.
Make sure that CalculateSmth is thread-safe and doesn't use any global state.
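A minimal sketch of that producer-consumer shape. Note it uses BlockingCollection, which is .NET 4+, so it does not meet the .NET 2.0 constraint from the question; it only illustrates the pattern (CalculateSmth is a stand-in):

using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;

class ProducerConsumerSketch
{
    static string CalculateSmth(string s) { return s.ToUpperInvariant(); } // stand-in for the real work

    static void Main()
    {
        BlockingCollection<string> input = new BlockingCollection<string>();
        List<string> output = new List<string>();
        object gate = new object(); // protects the output list, as advised above

        // five consumer threads, as in the answer
        Thread[] consumers = new Thread[5];
        for (int i = 0; i < consumers.Length; i++)
        {
            consumers[i] = new Thread(() =>
            {
                // blocks until an item is available, exits once adding completes
                foreach (string item in input.GetConsumingEnumerable())
                {
                    string r = CalculateSmth(item);
                    lock (gate) output.Add(r);
                }
            });
            consumers[i].Start();
        }

        // the producer queues up the work
        foreach (string s in new[] { "one", "two", "three" })
            input.Add(s);
        input.CompleteAdding(); // lets the consumers drain the queue and exit

        foreach (Thread t in consumers) t.Join();
        Console.WriteLine(output.Count); // 3
    }
}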
