Speed up loop using multithreading in C# (Question)

Speed up loop using multithreading in C# (Question) - c#

Imagine I have an function which goes through one million/billion strings and checks smth in them.
f.ex:
foreach (String item in ListOfStrings)
{
result.add(CalculateSmth(item));
}
it consumes lot's of time, because CalculateSmth is very time consuming function.
I want to ask: how to integrate multithreading in this kinda process?
f.ex: I want to fire-up 5 threads and each of them returns some results, and thats goes-on till the list has items.
Maybe anyone can show some examples or articles..
Forgot to mention I need it in .NET 2.0

You could try the Parallel extensions (part of .NET 4.0)
These allow you to write something like:
Parallel.Foreach (ListOfStrings, (item) =>
result.add(CalculateSmth(item));
);
Of course result.add would need to be thread safe.

The Parallel extensions is cool, but this can also be done just by using the threadpool like this:
using System.Collections.Generic;
using System.Threading;
namespace noocyte.Threading
{
class CalcState
{
public CalcState(ManualResetEvent reset, string input) {
Reset = reset;
Input = input;
}
public ManualResetEvent Reset { get; private set; }
public string Input { get; set; }
}
class CalculateMT
{
List<string> result = new List<string>();
List<ManualResetEvent> events = new List<ManualResetEvent>();
private void Calc() {
List<string> aList = new List<string>();
aList.Add("test");
foreach (var item in aList)
{
CalcState cs = new CalcState(new ManualResetEvent(false), item);
events.Add(cs.Reset);
ThreadPool.QueueUserWorkItem(new WaitCallback(Calculate), cs);
}
WaitHandle.WaitAll(events.ToArray());
}
private void Calculate(object s)
{
CalcState cs = s as CalcState;
cs.Reset.Set();
result.Add(cs.Input);
}
}
}

Note that concurrency doesn't magically give you more resource. You need to establish what is slowing CalculateSmth down.
For example, if it's CPU-bound (and you're on a single core) then the same number of CPU ticks will go to the code, whether you execute them sequentially or in parallel. Plus you'd get some overhead from managing the threads. Same argument applies to other constraints (e.g. I/O)
You'll only get performance gains in this if CalculateSmth is leaving resource free during its execution, that could be used by another instance. That's not uncommon. For example, if the task involves IO followed by some CPU stuff, then process 1 could be doing the CPU stuff while process 2 is doing the IO. As mats points out, a chain of producer-consumer units can achieve this, if you have the infrastructure.

You need to split up the work you want to do in parallel. Here is an example of how you can split the work in two:
List<string> work = (some list with lots of strings)
// Split the work in two
List<string> odd = new List<string>();
List<string> even = new List<string>();
for (int i = 0; i < work.Count; i++)
{
if (i % 2 == 0)
{
even.Add(work[i]);
}
else
{
odd.Add(work[i]);
}
}
// Set up to worker delegates
List<Foo> oddResult = new List<Foo>();
Action oddWork = delegate { foreach (string item in odd) oddResult.Add(CalculateSmth(item)); };
List<Foo> evenResult = new List<Foo>();
Action evenWork = delegate { foreach (string item in even) evenResult.Add(CalculateSmth(item)); };
// Run two delegates asynchronously
IAsyncResult evenHandle = evenWork.BeginInvoke(null, null);
IAsyncResult oddHandle = oddWork.BeginInvoke(null, null);
// Wait for both to finish
evenWork.EndInvoke(evenHandle);
oddWork.EndInvoke(oddHandle);
// Merge the results from the two jobs
List<Foo> allResults = new List<Foo>();
allResults.AddRange(oddResult);
allResults.AddRange(evenResult);
return allResults;

The first question you must answer is whether you should be using threading
If your function CalculateSmth() is basically CPU-bound, i.e. heavy in CPU-usage and basically no I/O-usage, then I have a hard time seeing the point of using threads, since the threads will be competing over the same resource, in this case the CPU.
If your CalculateSmth() is using both CPU and I/O, then it might be a point in using threading.
I totally agree with the comment to my answer. I made a erroneous assumption that we were talking about a single CPU with one core, but these days we have multi-core CPUs, my bad.

Not that I have any good articles here right now, but what you want to do is something along Producer-Consumer with a Threadpool.
The Producers loops through and creates tasks (which in this case could be to just queue up the items in a List or Stack). The Consumers are, say, five threads that reads one item off the stack, consumes it by calculating it, and then stores it else where.
This way the multithreading is limited to just those five threads, and they will all have work to do up until the stack is empty.
Things to think about:
Put protection on the input and output list, such as a mutex.
If the order is important, make sure that the output order is maintained. One example could be to store them in a SortedList or something like that.
Make sure that the CalculateSmth is thread safe, that it doesn't use any global state.

Related

How to avoid running out of RAM, during a concurrent data proccessing?

I have an issue with data concurrent processing. My PC is running out of RAM quickly. Any advices on how to fix my concurrent implementation?
Common class:
public class CalculationResult
{
public int Count { get; set; }
public decimal[] RunningTotals { get; set; }
public CalculationResult(decimal[] profits)
{
this.Count = 1;
this.RunningTotals = new decimal[12];
profits.CopyTo(this.RunningTotals, 0);
}
public void Update(decimal[] newData)
{
this.Count++;
// summ arrays
for (int i = 0; i < 12; i++)
this.RunningTotals[i] = this.RunningTotals[i] + newData[i];
}
public void Update(CalculationResult otherResult)
{
this.Count += otherResult.Count;
// summ arrays
for (int i = 0; i < 12; i++)
this.RunningTotals[i] = this.RunningTotals[i] + otherResult.RunningTotals[i];
}
}
Single-core implementation of the code is following:
Dictionary<string, CalculationResult> combinations = new Dictionary<string, CalculationResult>();
foreach (var i in itterations)
{
// do the processing
// ..
string combination = "1,2,3,4,42345,52,523"; // this is determined during the processing
if (combinations.ContainsKey(combination))
combinations[combination].Update(newData);
else
combinations.Add(combination, new CalculationResult(newData));
}
Multi-core implementation:
ConcurrentBag<Dictionary<string, CalculationResult>> results = new ConcurrentBag<Dictionary<string, CalculationResult>>();
Parallel.ForEach(itterations, (i, state) =>
{
Dictionary<string, CalculationResult> combinations = new Dictionary<string, CalculationResult>();
// do the processing
// ..
// add combination to combinations -> same logic as in single core implementation
results.Add(combinations);
});
Dictionary<string, CalculationResult> combinationsReal = new Dictionary<string, CalculationResult>();
foreach (var item in results)
{
foreach (var pair in item)
{
if (combinationsReal.ContainsKey(pair.Key))
combinationsReal[pair.Key].Update(pair.Value);
else
combinationsReal.Add(pair.Key, pair.Value);
}
}
The issue I am having is that almost each combinations dictionary ends up with 930k records in it, which is on average consumes 400 [MB] RAM memory.
Now, in single core implementation there is only one such dictionary. All checks are performed against one dictionary. But this is slow approach and I want to use multi-core optimizations.
In multi-core implementation there is a ConcurrentBag instance created which holds all combinations dictionaries. As soon as the multi-thread job is finished - all dictionaries are aggregated into one. This approach works well for small amount of concurrent iterations. For example, for 4 iterations my RAM usage was ~ 1.5 [GB]. The issue arises, when I set the full amount of parallel iterations, which is 200! No amount of PC RAM is enough to hold all dictionaries, with million records each!
I was thinking about using ConcurrentDictioanary, until I found out that the "TryAdd" method does not guarantee integrity of added data in my situation, as I also need to run updates on running totals.
The only real multi-threaded option is, instead of adding all combinations to dictionary - is to save them to some DB. Data aggregation will then be a matter of 1 SQL select statement with a group by clause... but I don't like the idea of creating a temporary table and running DB instance just for that..
Is there a work around on how to processes data concurrently and not run out of RAM?
EDIT:
Maybe the real question should have been - how to make updating of RunningTotals thread-safe when using ConcurrentDictionary? I have just ran across this thread, with a similar issue with ConcurrentDictionary, but my situation seems to be more complicated as I have an array that needs to be updated. I am still investigating this matter.
EDIT2: Here is a working solution with ConcurrentDictionary. All I needed to do is to add a lock for the dictionary key.
ConcurrentDictionary<string, CalculationResult> combinations = new ConcurrentDictionary<string, CalculationResult>();
Parallel.ForEach(itterations, (i, state) =>
{
// do the processing
// ..
string combination = "1,2,3,4,42345,52,523"; // this is determined during the processing
if (combinations.ContainsKey(combination)) {
lock(combinations[combination])
combinations[combination].Update(newData);
}
else
combinations.TryAdd(combination, new CalculationResult(newData));
});
Single-thread code execution time is 1m 48s, whereas this solution execution time is 1m 7s for 4 iterations (37% performance increase). I am still wondering if SQL approach will be any faster, with millions of records? I will test it out possibly tomorrow and update.
Edit 3: For those of you wondering what's wrong with ConcurrentDictionary updates on a value - run this code with and without the lock.
public class Result
{
public int Count { get; set; }
}
class Program
{
static void Main(string[] args)
{
Console.WriteLine("Start");
List<int> keys = new List<int>();
for (int i = 0; i < 100; i++)
keys.Add(i);
ConcurrentDictionary<int, Result> dict = new ConcurrentDictionary<int, Result>();
Parallel.For(0, 8, i =>
{
foreach(var key in keys)
{
if (dict.ContainsKey(key))
{
//lock (dict[key]) // uncomment this
dict[key].Count++;
}
else
dict.TryAdd(key, new Result());
}
});
// any output here is incorrect behavior. best result = no lines
foreach (var item in dict)
if (item.Value.Count != 7) { Console.WriteLine($"{item.Key}; {item.Value.Count}"); }
Console.WriteLine($"Finish");
Console.ReadKey();
}
}
Edit 4: After trials and errors I couldn't optimize SQL approach. This turned out to be the worst idea :) I have used an SQL Lite database. In-memory and in-file. With transaction and reusable SQL command parameters. Due to the huge amount of records that needed to be inserted - the performance is lacking. Data aggregation is the easiest part, but it takes a huge amount of time just to insert 4 millions of rows, I can't even begin to imagine how the 240 million of data could be processed efficiently.. So far (and also strangely), ConcurrentBag approach seems to be the fastest on my PC. Followed by a ConcurrentDictionary approach. ConcurrentBag is a bit heavier on memory, though. Thanks to the work of #Alisson - it is now perfectly fine to use it for larger set of iterations!

So, you just need to be sure you'll have no more than 4 concurrent iterations, that's the limit of your computer resources and by using only this computer, there is no magic.
I created a class to control the concurrent execution and the number of concurrent tasks it will perform.
The class will hold these properties:
public class ConcurrentCalculationProcessor
{
private const int MAX_CONCURRENT_TASKS = 4;
private readonly IEnumerable<int> _codes;
private readonly List<Task<Dictionary<string, CalculationResult>>> _tasks;
private readonly Dictionary<string, CalculationResult> _combinationsReal;
public ConcurrentCalculationProcessor(IEnumerable<int> codes)
{
this._codes = codes;
this._tasks = new List<Task<Dictionary<string, CalculationResult>>>();
this._combinationsReal = new Dictionary<string, CalculationResult>();
}
}
I made the number of concurrent tasks a const, but it could be a parameter in the constructor.
I created a method to handle the processing. For test purposes, I simulated a loop through 900k itens, adding them to a dictionary, and finally returning them:
private async Task<Dictionary<string, CalculationResult>> ProcessCombinations()
{
Dictionary<string, CalculationResult> combinations = new Dictionary<string, CalculationResult>();
// do the processing
// here we should do something that worth using concurrency
// like querying databases, consuming APIs/WebServices, and other I/O stuff
for (int i = 0; i < 950000; i++)
combinations[i.ToString()] = new CalculationResult(new decimal[] { 1, 10, 15 });
return await Task.FromResult(combinations);
}
The main method will start tasks in parallel, adding them to a list of tasks, so we can keep track of them lately.
Everytime the list reaches the maximum concurrent tasks, we await a method called ProcessRealCombinations.
public async Task<Dictionary<string, CalculationResult>> Execute()
{
ConcurrentBag<Dictionary<string, CalculationResult>> results = new ConcurrentBag<Dictionary<string, CalculationResult>>();
for (int i = 0; i < this._codes.Count(); i++)
{
// start the task imediately
var task = ProcessCombinations();
this._tasks.Add(task);
if (this._tasks.Count() >= MAX_CONCURRENT_TASKS)
{
// if we have more than MAX_CONCURRENT_TASKS in progress, we start processing some of them
// this will await any of the current tasks to complete, them process it (and any other task which may have been completed as well)...
await ProcessCompletedTasks().ConfigureAwait(false);
}
}
// keep processing until all the pending tasks have been completed...it should be no more than MAX_CONCURRENT_TASKS
while(this._tasks.Any())
await ProcessCompletedTasks().ConfigureAwait(false);
return this._combinationsReal;
}
The next method ProcessCompletedTasks will wait for at least one of the existing tasks to complete. After that, it will take all the completed tasks from the list (that one which finished and any other which may have been finished together), and get the result of them (the combinations).
With each processedCombinations, it'll merge with this._combinationsReal (using the same logic you provided in your question).
private async Task ProcessCompletedTasks()
{
await Task.WhenAny(this._tasks).ConfigureAwait(false);
var completedTasks = this._tasks.Where(t => t.IsCompleted).ToArray();
// completedTasks will have at least one task, but it may have more ;)
foreach (var completedTask in completedTasks)
{
var processedCombinations = await completedTask.ConfigureAwait(false);
foreach (var pair in processedCombinations)
{
if (this._combinationsReal.ContainsKey(pair.Key))
this._combinationsReal[pair.Key].Update(pair.Value);
else
this._combinationsReal.Add(pair.Key, pair.Value);
}
this._tasks.Remove(completedTask);
}
}
For each processedCombinations merged in _combinationsReal, it will remove its respective task from the list, and move on (start adding more tasks again). This will happen until we have created all the tasks for all iterations.
Finally, we keep processing it, until there are no more tasks in the list.
If you monitor the RAM consumption, you'll notice it will increase to about 1.5 GB (when we have 4 tasks being processed concurrently), then decrease to about 0.8 GB (when we remove tasks from the list). At least this is what happened in my computer.
Here is a fiddle, however I had to decrease the number of itens from 900k to 100, because fiddle limits the memory usage to avoid abuse.
I hope this help you somehow.
One thing to notice about all this stuff, is that you will benefit from using concurrent tasks mostly if your ProcessCombinations (the method that is executed concurrently when processing those 900k items) calls external resources, like reading files from your HD, executing a query in a database, calling an API/WebService method. I guess that code is probably reading 900k items from an external resource, then this will reduce the time needed to process it.
If the items were previously loaded and ProcessCombinations is just reading data that was already in memory, then the concurrency won't help at all (actually I believe it would make your code ran slower). If that's the case, then we are applying concurrency in the wrong place.
Using async calls in parallel is likely to help more when said calls are going to access external resources (either to get or store data), and depending on how many concurrent calls that external resources can support, it may still not make such a difference.

Count number of threads used by Parallel.ForEach

How can I determine the number of threads used during a specific call of Parallel.ForEach (or Parallel.Invoke, or Parallel.For)
I know how to limit the maximum number of threads, e.g.
Parallel.ForEach(myList,
new ParallelOptions { MaxDegreeOfParallelism = 4 },
item => { doStuff(item); });
I know that the Task.Parallel library uses some heuristics to determine the optimal number of additional threadpool threads to use at runtime, in addition to the current thread; some value between 0 and MaxDegreeOfParallelism.
I would like to know how many threads have actually been used, for logging purposes:
Stopwatch watch = Stopwatch.StartNew();
Parallel.ForEach(myList, item => { doStuff(item); });
trace.TraceInformation("Task finished in {0}ms using {1} threads",
watch.ElapsedMilliseconds, NUM_THREADS_USED);
I mainly want this data logged for curiosity's sake, and to improve my understanding. It does not have to be 100% reliable, since I do not intend to use it for anything else.
Is there a way to get this number, without major performance penalties?

You could use a (thread-safe) list to store the IDs of the used threads and count them:
ConcurrentBag<int> threadIDs = new ConcurrentBag<int>();
Parallel.ForEach(myList, item => {
threadIDs.Add(Thread.CurrentThread.ManagedThreadId);
doStuff(item);
});
int usedThreads = threadIDs.Distinct().Count();
This does have a performance impact (especially the thread-safety logic of ConcurrentBag), but I can't tell how big that is. The relative effect depends on how much work doStuff does itself. If that method has only a few commands, this thread counting solution may even change the number of used threads.

In your DoStuff method you can add the code like this
private void DoStuff(T item)
{
Logger.Log($"Item {item.ToString()} was handled by thread # {Thread.CurrentThread.ManagedThreadId}");
// your logic here
}

I know that the Task.Parallel library uses some heuristics to determine the optimal number of additional threadpool threads to use at runtime, in addition to the current thread; some value between 0 and MaxDegreeOfParallelism.
I would like to know how many threads have actually been used, for logging purposes
Since you mention the thread pool and MaxDoP, I interpreted this question as you wanted to know how many concurrent threads were used at any one time. This you can find out by using a field and Interlocked.
class MyClass
{
private int _concurrentThreadCount;
private ILog _logger; //for example
public void DoWork()
{
var listOfSomething = GetListOfStuff();
Parallel.ForEach(listOfSomething, singleSomething =>
{
Interlocked.Increment(ref _concurrentThreadCount);
_logger.Info($"Doing some work. Concurrent thread count: {_concurrentThreadCount}");
// do work
Interlocked.Decrement(ref _concurrentThreadCount);
});
}
}

While I am aware this is an older question, I followed up on Evk's suggestion. Also not sure about the performance impact, but you could use a concurrentdictionary to keep track of the threadids:
var threadIDs = new ConcurrentDictionary<int, int>();
Parallel.ForEach(myList, item => {
threadIDs.TryAdd(Thread.CurrentThread.ManagedThreadId, 0);
doStuff(item);
});
int usedThreads = threadIDs.Keys.Count();

Make Parallel.ForEach wait to get work until a slot opens

I'm using Parallel.ForEach to work a bunch of items. The problem is, I want to prioritize which items get worked depending on the number of workers (slots) that are open. E.g. if I am working 8 parallel things and a slot opens between task 1-4, I want to assign easy work to those slots. The bottom half of the slots will get the hard work. This way, I won't get all 8 slots tied up doing hard/long-running work, easy/quick items will be run first. I've implemented this as follows:
The Code
const int workers = 8;
List<Thing> thingsToDo = ...; //Get the things that need to be done.
Thing[] currentlyWorkingThings = new Thing[workers]; //One slot for each worker.
void Run() {
Parallel.ForEach(PrioritizeThings(thingsToDo), o => {
int index = 0;
//"PrioritizeTasks" added this thing to the list of currentlyWorkingThings.
//Find my position in this list.
lock (currentlyWorkingThings)
index = currentlyWorkingThings.IndexOf(o);
//Do work on this thing...
//Then remove it from the list of currently working things, thereby
// opening a new slot when this worker returns/finishes.
lock (currentlyWorkingThings)
currentlyWorkingThings[index] = null;
});
}
IEnumerable<Thing> PrioritizeThings(List<Thing> thingsToDo) {
int slots = workers;
int halfSlots = (int)Math.Ceiling(slots / 2f);
//Sort thingsToDo by their difficulty, easiest first.
//Loop until we've worked every Thing.
while (thingsToDo.Count > 0) {
int slotToFill = ...; //Find the first open slot.
Thing nextThing = null;
lock (currentlyWorkingThings) {
//If the slot is in the "top half", get the next easy thing - otherwise
// get the next hard thing.
if (slotToFill < halfSlots)
nextThing = thingsToDo.First();
else
nextThing = thingsToDo.Last();
//Add the nextThing to the list of currentlyWorkingThings and remove it from
// the list of thingsToDo.
currentlyWorkingThings[slotToFill] = nextThing;
thingsToDo.Remove(nextThing);
}
//Return the nextThing to work.
yield return nextThing;
}
}
The Problem
So the issue I'm seeing here is that Parallel is requesting the next thing to work on from PrioritizeThings before a slot has opened (before an existing thing has been completed). I assume that Parallel is looking ahead and getting things to work ready in advance. I'd like it to not do this, and only fill a worker/slot when it is completely done. The only way I've thought of to fix this is to add a sleep/wait loop in PrioritizeThings which won't return a thing to work until it sees a legitimate open slot. But I don't like that and I was hoping that there was some way to make Parallel wait longer before getting work. Any suggestions?

There is a way built in (kinda) to support exactly the situation you are describing.
When you create the ForEach you will need to pass in a ParallelOptions with a non-standard TaskScheduler. The hard part is creating a TaskSchedueler to do that priority system for you, fortunately Microsoft released a pack of examples that contains one such scheduler called "ParallelExtensionsExtras" with its scheduler QueuedTaskScheduler
private static void Main(string[] args)
{
int totalMaxConcurrancy = Environment.ProcessorCount;
int highPriorityMaxConcurrancy = totalMaxConcurrancy / 2;
if (highPriorityMaxConcurrancy == 0)
highPriorityMaxConcurrancy = 1;
QueuedTaskScheduler qts = new QueuedTaskScheduler(TaskScheduler.Default, totalMaxConcurrancy);
var highPriortiyScheduler = qts.ActivateNewQueue(0);
var lowPriorityScheduler = qts.ActivateNewQueue(1);
BlockingCollection<Foo> highPriorityWork = new BlockingCollection<Foo>();
BlockingCollection<Foo> lowPriorityWork = new BlockingCollection<Foo>();
List<Task> processors = new List<Task>(2);
processors.Add(Task.Factory.StartNew(() =>
{
Parallel.ForEach(highPriorityWork.GetConsumingPartitioner(), //.GetConsumingPartitioner() is also from ParallelExtensionExtras, it gives better performance than .GetConsumingEnumerable() with Parallel.ForEeach(
new ParallelOptions() { TaskScheduler = highPriortiyScheduler, MaxDegreeOfParallelism = highPriorityMaxConcurrancy },
ProcessWork);
}, TaskCreationOptions.LongRunning));
processors.Add(Task.Factory.StartNew(() =>
{
Parallel.ForEach(lowPriorityWork.GetConsumingPartitioner(),
new ParallelOptions() { TaskScheduler = lowPriorityScheduler},
ProcessWork);
}, TaskCreationOptions.LongRunning));
//Add some work to do here to the highPriorityWork or lowPriorityWork collections
//Lets the blocking collections know we are no-longer going to be adding new items so it will break out of the `ForEach` once it has finished the pending work.
highPriorityWork.CompleteAdding();
lowPriorityWork.CompleteAdding();
//Waits for the two collections to compleatly empty before continueing
Task.WaitAll(processors.ToArray());
}
private static void ProcessWork(Foo work)
{
//...
}
Even though you have two instances of Parallel.ForEach running the combined total of both of them will not use more than the value you passed in for MaxConcurrency in to the QueuedTaskScheduler constructor and it will give preference to emptying the highPriorityWork collection first if there is work to do in both (up to a limit of 1/2 of all of the available slots so that you don't choke the low priority queue, you could easily adjust this to be a higher or lower ratio depending on your performance needs).
If you don't want the high priority to always win and you rather have a "round-robin" style scheduler that alternates between the two lists (so you don't want the quick items to always win, but just have them shuffled in with the slow items) you can set the same priority level to two or more queues (or just use the RoundRobinTaskSchedulerQueue which does the same thing)

Controlling number of threads using AsParallel or Parallel.ForEach

I have a huge collection, over which i have to perform a specific task(which involves calling a wcf service). I want to control the number of threads instead of using Parallel.ForEach directly. Here i have 2 options:
I am using below to partition the data:
List<MyCollectionObject> MyCollection = new List<MyCollectionObject>();
public static IEnumerable<List<T>> PartitionMyData<T>(this IList<T> source, Int32 size)
{
for (int i = 0; i < Math.Ceiling(source.Count / (Double)size); i++)
{
yield return new List<T>(source.Skip(size * i).Take(size));
}
}
Option 1:
MyCollection.PartitionMyData(AutoEnrollRequests.Count()/threadValue).AsParallel().AsOrdered()
.Select(no => InvokeTask(no)).ToArray();
private void InvokeTask(List<MyCollectionObject> requests)
{
foreach(MyCollectionObject obj in requests)
{
//Do Something
}
}
Option2:
MyCollection.PartitionMyData(threadValue).AsOrdered()
.Select(no => InvokeTask(no)).ToArray();
private void InvokeTask(List<MyCollectionObject> requests)
{
Action<MyCollectionObject> dosomething =
{
}
Parallel.ForEach(requests,dosomething)
}
If i have 16 objects in my collection, as per my knowledge Option1 will launch 4 threads, each thread having 4 objects will be processed synchronously.
Option 2 will launch 4 threads with 1 object each, process them and again will launch 4 threads.
Can anyone please suggest which option is better?
P.S.
I understand .Net framework does thread pooling and we need not control the number of threads but due to some design decision we want to use it.
Thanks In Advance,
Rohit

I want to control the number of threads instead of using Parallel.ForEach directly
You can control de number of threads in Parallel.ForEach if you use this call with a ParallelOptions object:
Parallel.ForEach(requests,
new ParallelOptions(){MaxDegreeOfParallelism = 4}, //change here
dosomething)

It's impossible to give an A or B answer here. It depends on too many unknowns.
I will assume you want the fastest approach. To see which is better, run both on the target environment (or closest approximation you can get) and see which one completes fastest.

Adding variable to a List that is in use by another thread, is this bad practise?

I have an infinite loop in a separate thread operating on a List of strings. I want to be able to add strings to this list while the thread is runnning. I have a feeling the code I am writing is 'wrong'. In the infinite loop I am iterating through each string in the list and performing operations on it, so it seems like I can't just add a string to this list from my main thread, as I will be interfering with a variable that is concurrently being accessed by another thread. Here's what my code looks like -
class StringTest
{
public List<string> ListOfStrings = new List<string>();
public Task MainLoopTask;
bool IsRunning = false;
public void AddToList(string myString)
{
ListOfStrings.Add(myString); // Adding a string to the list
if (!IsRunning)
{
IsRunning = true;
MainLoopTask = Task.Factory.StartNew(MainLoop);
}
}
public void MainLoop()
{
while (true)
{
foreach(string s in ListOfStrings) // Operating on the list in a separate thread
{
...
...
...
}
}
}
Is this bad code or is it ok? If it's is bad, what can I do to fix it?

That is not safe, and will eventually fail spectacularly in production.
Instead, you should use a thread-safe collection; probably a concurrent queue.

List<T> is safe for concurrent reading. That is, it's perfectly safe (from a stability standpoint) to have multiple threads reading the list, but writing to the list can only be done from one thread and you may not allow other threads to read from it while writing is taking place.
The simplest solution, especially if you only have two threads, is to use a simple lock statement to prevent two threads from interacting with it at the same time.
For instance:
Replace this:
ListOfStrings.Add(myString);
With this:
lock(ListOfStrings)
{
ListOfStrings.Add(myString);
}
And this:
foreach(string s in ListOfStrings) // Operating on the list in a separate thread
{
...
...
...
}
With this:
lock(ListOfStrings)
{
foreach(string s in ListOfStrings) // Operating on the list in a separate thread
{
...
...
...
}
}
This will make sure that your code blocks don't execute at the same time by creating an exclusive lock on the ListOfStrings object. If the list is small and the operations are trivial, this is likely sufficient. If either is not true (the list is large or the operations are non-trivial), then you'll probably want something more robust, such as creating a copy of the list and clearing the original within the body of the lock statement, then having your thread operate on that copy of the list.

This is not really an answer, but I cannot put code in a comment, so, I am writing as a reply to your comment to #SLaks which says "How so?"
If your AddToList method attempts to add a string while your MainLoop is in the middle of the foreach loop, your AddToList method will have to wait until the foreach loop is done. This can be remedied as follows:
public void MainLoop()
{
while (true)
{
string item;
lock( ListOfStrings )
{
if( ListOfStrings.Count == 0 )
continue;
item = ListOfStrings[0];
ListOfStrings.RemoveAt( 0 );
}
//do something with item
}
}
But using a thread-safe collection as SLaks proposed is still better because it is less hassle.

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

Speed up loop using multithreading in C# (Question) - c#

You could try the Parallel extensions (part of .NET 4.0) These allow you to write something like: Parallel.Foreach (ListOfStrings, (item) => result.add(CalculateSmth(item)); ); Of course result.add would need to be thread safe.

Related

How to avoid running out of RAM, during a concurrent data proccessing?

Count number of threads used by Parallel.ForEach

Make Parallel.ForEach wait to get work until a slot opens

Controlling number of threads using AsParallel or Parallel.ForEach

Adding variable to a List that is in use by another thread, is this bad practise?

Categories

Resources