Executing a list of objects in parallel - c#

I have a list of objects that I want to execute at the same time.
private List<Calculate_Data> ListOfDataToCalculate = new List<Calculate_Data>();
I have a method that loops through the list and calls StartCalculate() on each item, as follows:
public async void StartCalculating()
{
foreach (var alldata in ListOfDataToCalculate )
{
alldata.StartCalculate();
}
}
Is there a way to start them all at once, so that alldata.StartCalculate() runs in parallel instead of in sequence?
I know that we can use
var calculatetask = Task.Run(() =>
{
alldata.StartCalculate();
});
But is there a way to do them all at the same time? Some objects take longer to execute and some are faster. The order in which they finish does not matter. What's important is that they all run at the same time.
Thank you

Use Parallel.ForEach:
Parallel.ForEach(ListOfDataToCalculate, alldata =>
{
alldata.StartCalculate();
});
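A minimal, self-contained sketch of the Parallel.ForEach approach, written with top-level statements (C# 9 or later). The body of Calculate_Data is hypothetical, since the question does not show it, and Thread.Sleep only stands in for real work. Note that Parallel.ForEach blocks the caller until every item has finished, and the thread pool decides how many items actually run at once, so with a long list not every item necessarily starts at the same instant.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical data: the durations stand in for calculations of different length.
var listOfDataToCalculate = new List<Calculate_Data>
{
    new Calculate_Data("A", 300),
    new Calculate_Data("B", 100),
    new Calculate_Data("C", 200),
};

// Runs StartCalculate() for the items concurrently on thread-pool threads.
// The call returns only after every item has finished.
Parallel.ForEach(listOfDataToCalculate, data => data.StartCalculate());

foreach (var data in listOfDataToCalculate)
    Console.WriteLine($"{data.Name}: done = {data.Done}");

// Hypothetical stand-in for the asker's class, which is not shown in the question.
class Calculate_Data
{
    private readonly int _milliseconds;
    public string Name { get; }
    public bool Done { get; private set; }

    public Calculate_Data(string name, int milliseconds)
    {
        Name = name;
        _milliseconds = milliseconds;
    }

    public void StartCalculate()
    {
        Thread.Sleep(_milliseconds); // simulated CPU-bound work
        Done = true;
    }
}
```

If the caller must stay responsive (for example a UI thread), one option is to wrap the whole Parallel.ForEach call in Task.Run and await that task.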

Related

Parallel Task and Subtasks workflow

I'm new to C# threads and tasks. I'm trying to develop a workflow, so far without success, probably because I'm mixing tasks with foreach iterations.
The point is:
I've got a bunch of lists, and inside each one there are some things to do. They need to run as much in parallel, and with as little blocking, as possible. As soon as each subBunchOfThingsTodo is done (meaning everything inside it has been done in parallel), some follow-up business has to run (DoSomethingAfterEveryThingToDoOfThisSubBunchOfThingsAreDone()).
For example:
bunchOfSubBunchsOfThingsTodo
    subBunchOfThingsTodo
        ThingToDo1
        ThingToDo2
    subBunchOfThingsTodo
        ThingToDo1
        ThingToDo2
        ThingToDo3
    subBunchOfThingsTodo
        ThingToDo1
        ThingToDo2...
This is how I'm trying to do it, but unfortunately each iteration waits for the previous bunchOfThingsToDo, and I need them to work in parallel.
The same happens with the things to do: each one waits for the previous one to start.
List<X> bunchOfSubBunchsOfThingsTodo = getBunchOfSubBunchsOfThingsTodo();
foreach (var subBunchOfThingsToDo in bunchOfSubBunchsOfThingsTodo)
{
int idSubBunchOfThingsToDo = subBunchOfThingsToDo.ThingsToDo.FirstOrDefault().IdSubBunchOfThingsToDo;
var parent = Task.Factory.StartNew(() =>
{
foreach (var thingToDo in subBunchOfThingsToDo.ThingsToDo)
{
var child = Task.Factory.StartNew(() =>
{
//Do some stuff with thingToDo... Here I call several business methods
});
}
});
parent.Wait();
DoSomethingAfterEveryThingToDoOfThisSubBunchOfThingsAreDone(idSubBunchOfThingsToDo);
}
You may want to try using Task.WhenAll and playing with linq to generate a collection of hot tasks:
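// Returns Task rather than void so that callers can await it and observe exceptions.
static async Task ProcessThingsToDo(IEnumerable<ThingToDo> bunchOfThingsToDo)
{
    // Task.Run returns an already started ("hot") task for each sub-thing.
    IEnumerable<Task> GetSubTasks(ThingToDo thing)
        => thing.SubBunchOfThingsToDo.Select(async subThing => await Task.Run(subThing));
    // One combined task per thing, completing when all of its sub-tasks have completed.
    var tasks = bunchOfThingsToDo
        .Select(async thing => await Task.WhenAll(GetSubTasks(thing)));
    await Task.WhenAll(tasks);
}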
This way you are running each subThingToDo on a separate task, and for each thingToDo you get a single Task composed of all its subtasks.
EDIT
ThingToDo is a rather simple class in this sample:
class ThingToDo
{
public IEnumerable<Action> SubBunchOfThingsToDo { get; }
}
With minimum changes of your code you can try this way:
var toWait = new List<Task>();
List<X> bunchOfSubBunchsOfThingsTodo = getBunchOfSubBunchsOfThingsTodo();
foreach (var subBunchOfThingsToDo in bunchOfSubBunchsOfThingsTodo)
{
    int idSubBunchOfThingsToDo = subBunchOfThingsToDo.ThingsToDo.FirstOrDefault().IdSubBunchOfThingsToDo;
    var parent = Task.Factory.StartNew(() =>
    {
        Parallel.ForEach(subBunchOfThingsToDo.ThingsToDo,
            thingToDo =>
            {
                //Do some stuff with thingToDo... Here I call several business methods
            });
    });
    //parent.Wait();
    // ContinueWith returns a task that is scheduled automatically once parent
    // completes, so it must not be started manually with Start().
    var handle = parent.ContinueWith((x) =>
    {
        DoSomethingAfterEveryThingToDoOfThisSubBunchOfThingsAreDone(idSubBunchOfThingsToDo);
    });
    toWait.Add(handle);
}
// Await the combined task (or use Task.WaitAll(toWait.ToArray()) in synchronous code).
await Task.WhenAll(toWait);
Thanks to the downvoters, who advised a 'good' solution:
var bunchOfSubBunchsOfThingsTodo = getBunchOfSubBunchsOfThingsTodo();
var toWait = bunchOfSubBunchsOfThingsTodo
.Select(subBunchOfThingsToDo =>
{
return Task.Run(() =>
{
int idSubBunchOfThingsToDo = subBunchOfThingsToDo.ThingsToDo.FirstOrDefault().IdSubBunchOfThingsToDo;
Parallel.ForEach(subBunchOfThingsToDo.ThingsToDo,
thingToDo =>
{
//Do some stuff with thingToDo... Here I call several business methods
});
DoSomethingAfterEveryThingToDoOfThisSubBunchOfThingsAreDone(idSubBunchOfThingsToDo);
});
});
await Task.WhenAll(toWait); // without await, nothing waits for the work or observes its exceptions
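The nested workflow discussed above can also be sketched with Task.WhenAll alone. This is only an illustration under assumptions: the groups of Action delegates are hypothetical stand-ins for the asker's lists, and the "group finished" line stands in for DoSomethingAfterEveryThingToDoOfThisSubBunchOfThingsAreDone(). It uses top-level statements (C# 9 or later).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical data: three groups, each holding a few independent work items.
var groups = new List<List<Action>>();
for (int g = 0; g < 3; g++)
{
    int groupId = g; // copy so each lambda captures its own value
    groups.Add(Enumerable.Range(1, 3)
        .Select(i => (Action)(() => Console.WriteLine($"group {groupId}, thing {i}")))
        .ToList());
}

// One task per group. Each group task starts all of its items, waits for
// them without blocking a thread, then runs the per-group follow-up.
var groupTasks = groups.Select(async (group, index) =>
{
    await Task.WhenAll(group.Select(thing => Task.Run(thing)));
    Console.WriteLine($"group {index} finished"); // stand-in for the follow-up call
}).ToList(); // ToList() forces every group task to start now

await Task.WhenAll(groupTasks);
Console.WriteLine("all groups finished");
```

The order of the "thing" and "group finished" lines varies between runs; the only guarantees are that a group's follow-up runs after all of that group's items, and that the last line runs after every group.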

Multiple tasks using Parallel Foreach and WhenAny. Is that possible?

I want to scrape several websites at the same time, but add the information to the database one by one. At the moment my code looks similar to this:
List<SiteMetadata> sitesList = GetSites();
var tasks = new List<Task<SiteMetadata>>();
foreach (var item in sitesList)
tasks.Add(item.LoadMetaDataAsync());
int totalSites = sitesList.Count;
int finishedSites = 0;
int errors = 0;
while (totalSites != finishedSites)
{
var tempSite = await Task.WhenAny(tasks.ToArray());
//WRITE HERE TO DB!!!!!!!!!!!!!!!!!
tasks.Remove(tempSite);
var tempLog = apiHandler.WriteToDatabase(tempSite.Result);
if (tempLog.Type == LogType.Error)
{
errors++;
LogsHandler.AddToLog(tempLog);
}
finishedSites++;
}
What I want is to increase the efficiency here and replace this:
var tasks = new List<Task<SiteMetadata>>();
foreach (var item in sitesList)
tasks.Add(item.LoadMetaDataAsync());
with something like this:
var runAll = Task.Factory.StartNew(() => Parallel.ForEach(sitesList, item => item.LoadMetaDataAsync()));
But the problem is that I don't know how to get the first task that finishes and write to the database one by one. Is there any way to do this using Parallel or something similar, or even something more efficient than what I am doing right now?
Thanks in advance.
I want to scrape several websites at the same time, but just add the information to the database one by one.
Your code already does that.
What I want is to increase the efficiency here and replace
That won't increase efficiency; it will decrease it. Parallel.ForEach is a parallel operation, where "parallel" means "concurrent using multiple threads". Starting multiple tasks and then combining them with Task.WhenAll is how you do concurrency without using multiple threads. Not using unnecessary threads is more efficient.
However, it looks like what you're doing may benefit from TPL Dataflow, which allows you to define a "pipeline" to send data through. It won't increase your "efficiency", but it may clarify the code.
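As a rough illustration of the pipeline idea, here is a minimal TPL Dataflow sketch. It is not the asker's code: the site names, the Task.Delay that simulates loading, and the Console.WriteLine that stands in for the database write are all assumptions. Depending on the target framework, the System.Threading.Tasks.Dataflow NuGet package may need to be referenced.

```csharp
using System;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// Stage 1: load metadata for many sites concurrently.
var load = new TransformBlock<string, string>(
    async site =>
    {
        await Task.Delay(100);            // simulated network call
        return $"metadata for {site}";
    },
    new ExecutionDataflowBlockOptions
    {
        MaxDegreeOfParallelism = DataflowBlockOptions.Unbounded,
        EnsureOrdered = false             // forward results as they complete
    });

// Stage 2: write to the database one item at a time
// (an ActionBlock processes a single item at a time by default).
var save = new ActionBlock<string>(metadata => Console.WriteLine($"saved: {metadata}"));

load.LinkTo(save, new DataflowLinkOptions { PropagateCompletion = true });

foreach (var site in new[] { "site-a", "site-b", "site-c" })
    load.Post(site);

load.Complete();          // no more input
await save.Completion;    // wait until everything has been written
```

The loading stage uses asynchronous delegates, so it stays concurrent without dedicating a thread to each site, while the second block serializes the writes.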
I think you're facing the "multiple producers, one consumer" problem. I suggest you use the thread-safe collections.
In the following console sample, I use a ConcurrentBag to store task results; then, in the main thread, I use a while loop to take a result and print it (you can do this in your own worker thread). Note that there isn't any lock in the entire program:
private static readonly Random Random = new Random(DateTime.Now.Millisecond);
private static readonly ConcurrentBag<int> Bag = new ConcurrentBag<int>();
private static void Main(string[] args)
{
for (int i = 0; i < 10; i++)
{
Task.Run(async () => await SampleTask());
}
while (true)
{
if (Console.KeyAvailable && Console.ReadKey(true).Key == ConsoleKey.Escape) break;
int item;
if (Bag.TryTake(out item))
Console.WriteLine(item);
}
}
private static async Task SampleTask()
{
await Task.Delay(Random.Next(1000));
Bag.Add(Random.Next(10));
}

Working with Lists and Parallel.Invoke

I'm attempting to run methods using Parallel.Invoke, with each method appending the response to a list outside of Parallel.Invoke
I've been playing around using a lock, but the following code doesn't work
var allResults = new List<ResultRecord>();
var sync = new object();
Parallel.Invoke(
() => { var results = GetResultSet1(); lock (sync) { allResults.Concat(results); } },
() => { var results = GetResultSet2(); lock (sync) { allResults.Concat(results); } });
This code doesn't populate the list; allResults ends up empty.
As already explained by PetSerAl in a comment, Concat() does not modify the list; it returns a new sequence containing the elements of both. Using AddRange() instead of Concat() is a solution, but I think using Tasks is clearer than Parallel.Invoke() here:
var resultSet1Task = Task.Run(() => GetResultSet1());
var resultSet2Task = Task.Run(() => GetResultSet2());
List<ResultRecord> allResults = resultSet1Task.Result.Concat(resultSet2Task.Result).ToList();
This way, there is no need for explicit locking, which makes the code safer.
If you can, you could also use await instead of .Result.
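A small sketch of the await variant, assuming the code runs in an async context (here, top-level statements in C# 9 or later). GetResultSet1() and GetResultSet2() are hypothetical stand-ins that return lists of int instead of ResultRecord.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical stand-ins for GetResultSet1() and GetResultSet2().
static List<int> GetResultSet1() => new List<int> { 1, 2, 3 };
static List<int> GetResultSet2() => new List<int> { 4, 5 };

var resultSet1Task = Task.Run(() => GetResultSet1());
var resultSet2Task = Task.Run(() => GetResultSet2());

// Task.WhenAll returns the results in the order the tasks were passed in.
List<int>[] resultSets = await Task.WhenAll(resultSet1Task, resultSet2Task);

// Concat() returns a new sequence; ToList() materializes it.
List<int> allResults = resultSets[0].Concat(resultSets[1]).ToList();

Console.WriteLine(string.Join(", ", allResults)); // prints "1, 2, 3, 4, 5"
```

Because each task produces its own list and the lists are only combined after both tasks have completed, no shared state is written concurrently.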

Thread safe with Linq and Tasks on a Collection

Given some code like so
public class CustomCollectionClass : Collection<CustomData> {}
public class CustomData
{
string name;
bool finished;
string result;
}
public async Task DoWorkInParallel(CustomCollectionClass collection)
{
// collection can be retrieved from a DB, may not exist.
if (collection == null)
{
collection = new CustomCollectionClass();
foreach (var data in myData)
{
collection.Add(new CustomData()
{
name = data.Name
});
}
}
// This part doesn't feel safe. Not sure what to do here.
var processTasks = myData.Select(o =>
this.DoWorkOnItemInCollection(collection.Single(d => d.name == o.Name))).ToArray();
await Task.WhenAll(processTasks);
await SaveModifedCollection(collection);
}
public async Task DoWorkOnItemInCollection(CustomData data)
{
await DoABunchOfWorkElsewhere();
// This doesn't feel safe either. Lock here?
data.finished = true;
data.result = "Parallel";
}
As I noted in a couple of inline comments, it doesn't feel safe to me to do the above, but I'm not sure. I have a collection of elements, and I'd like to assign a unique element to each parallel task and let each task modify only that element based on the work it does. The end result is that I want to save the collection after the individual, different elements have been modified in parallel. If this isn't a safe way to do it, how best would I go about this?
Your code is the right way to do this, assuming starting DoABunchOfWorkElsewhere() multiple times is itself safe.
You don't need to worry about your LINQ query, because it doesn't actually run in parallel. All it does is to invoke DoWorkOnItemInCollection() multiple times. Those invocations may work in parallel (or not, depending on your synchronization context and the implementation of DoABunchOfWorkElsewhere()), but the code you showed is safe.
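To make the reasoning concrete, here is a reduced sketch of the same pattern. The names are simplified and partly hypothetical: CustomData uses public properties, DoWorkOnItemAsync replaces DoWorkOnItemInCollection, and Task.Delay stands in for DoABunchOfWorkElsewhere().

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

var collection = new List<CustomData>
{
    new CustomData { Name = "a" },
    new CustomData { Name = "b" },
    new CustomData { Name = "c" },
};

// Select() runs sequentially on the calling thread; it only *starts* each
// operation. Every task receives a different element, so no two tasks write
// to the same object and no lock is needed.
Task[] processTasks = collection.Select(d => DoWorkOnItemAsync(d)).ToArray();
await Task.WhenAll(processTasks);

// After WhenAll completes, the writes made by the tasks are visible here.
Console.WriteLine(collection.All(d => d.Finished)); // prints "True"

static async Task DoWorkOnItemAsync(CustomData data)
{
    await Task.Delay(50); // stand-in for DoABunchOfWorkElsewhere()
    data.Finished = true;
    data.Result = "Parallel";
}

class CustomData
{
    public string Name { get; set; } = "";
    public bool Finished { get; set; }
    public string Result { get; set; } = "";
}
```

The pattern would stop being safe if several tasks were handed the same element, or if the tasks added to or removed from the collection itself while others were reading it.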
Your code above should work without issue. You are passing one item to each worker. I'm not so sure about the async modifier. You might just return a Task, and then in your method do:
public Task DoWorkOnItemInCollection(CustomData data)
{
return Task.Run(() => {
DoABunchOfWorkElsewhere().Wait();
data.finished = true;
data.result = "Parallel";
});
}
You might want to be careful with a large number of items. Task.Run queues the work to the thread pool rather than creating a thread per item, so blocking each pool thread with Wait() can starve the pool and leave much of the work waiting in the queue, which can be difficult to debug later.
I have done this before. It might be easier if, instead of handing the whole collection to some magic LINQ, you use a classic producer/consumer approach:
class ParallelWorker<T>
{
    private Action<T> Action;
    private Queue<T> Queue = new Queue<T>();
    private object QueueLock = new object();
    private void DoWork()
    {
        while (true)
        {
            T item;
            lock (this.QueueLock)
            {
                if (this.Queue.Count == 0) return; //exit thread
                item = this.Queue.Dequeue();
            }
            try { this.Action(item); }
            catch { /*...*/ }
        }
    }
    public void DoParallelWork(IEnumerable<T> items, int maxDegreesOfParallelism, Action<T> action)
    {
        this.Action = action;
        this.Queue.Clear();
        // Queue<T> has no AddRange, so enqueue the items one by one.
        foreach (T item in items) this.Queue.Enqueue(item);
        List<Thread> threads = new List<Thread>();
        // Start one worker thread per allowed degree of parallelism.
        for (int i = 0; i < maxDegreesOfParallelism; i++)
        {
            // DoWork takes no argument, so ThreadStart (not ParameterizedThreadStart) fits.
            Thread thread = new Thread(new ThreadStart(DoWork));
            thread.Start();
            threads.Add(thread);
        }
        foreach (Thread thread in threads)
        {
            thread.Join();
        }
    }
}
This was done IDE free, so there may be typos.
I'm going to make the suggestion that you use Microsoft's Reactive Framework (NuGet "Rx-Main") to do this task.
Here's the code:
public void DoWorkInParallel(CustomCollectionClass collection)
{
var query =
from x in collection.ToObservable()
from r in Observable.FromAsync(() => DoWorkOnItemInCollection(x))
select x;
query.Subscribe(x => { }, ex => { }, async () =>
{
await SaveModifedCollection(collection);
});
}
Done. That's it. Nothing more.
I have to say though, that when I tried to get your code to run it was full of bugs and issues. I suspect that the code you posted isn't your production code, but an example you wrote specifically for this question. I suggest that you try to make a running compilable example before posting.
Nevertheless, my suggestion should work for you with a little tweaking.
It is multi-threaded and thread-safe, and it cleanly saves the modified collection when done.

Process Asynchronous Calls in Sequence

I am making a bunch of asynchronous calls to Azure Table Storage. For obvious reasons, these records are not inserted in the same order in which the calls were invoked.
I am planning to introduce a ConcurrentQueue to ensure the sequence. The following sample code, written as a proof of concept, seems to achieve the desired result.
Is this the best way to ensure that asynchronous calls are completed in sequence?
public class ProductService
{
ConcurrentQueue<string> ordersQueue = new ConcurrentQueue<string>();
//Place make calls here
public void PlaceOrder()
{
Task.Run(() =>
{
Parallel.For(0, 100, (i) =>
{
string item = "Product " + i;
ordersQueue.Enqueue(item);
Console.WriteLine("Placed Order: " + item);
Task.Delay(2000).Wait();
});
});
}
//Process calls in sequence, I am hoping concurrentQueue will be consistent.
public void Deliver()
{
Task.Run(() =>
{
while(true)
{
string productId;
ordersQueue.TryDequeue(out productId);
if (!string.IsNullOrEmpty(productId))
{
Console.WriteLine("Delivered: " + productId);
}
}
});
}
}
If you want to process records asynchronously and sequentially this sounds like a perfect fit for TPL Dataflow's ActionBlock. Simply create a block with the action to execute and post records to it. It supports async actions and keeps order:
var block = new ActionBlock<Product>(async product =>
{
await product.ExecuteAsync();
});
block.Post(new Product());
It also supports parallel processing and bounded capacity if you need them.
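A small runnable sketch of the ordering behaviour, under stated assumptions: plain strings replace the Product type, Task.Delay simulates the storage call, and all items are posted from a single thread. Depending on the target framework, the System.Threading.Tasks.Dataflow NuGet package may need to be referenced.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

var delivered = new List<string>();

// With the default MaxDegreeOfParallelism of 1, the block runs one async
// action at a time, in the order the items were posted.
var block = new ActionBlock<string>(async item =>
{
    await Task.Delay(10);   // simulated call to storage
    delivered.Add(item);    // safe: only one action runs at a time
});

for (int i = 0; i < 5; i++)
    block.Post("Product " + i);

block.Complete();           // signal that no more items will be posted
await block.Completion;     // wait for the queued items to finish

Console.WriteLine(string.Join(", ", delivered));
// prints "Product 0, Product 1, Product 2, Product 3, Product 4"
```

The block preserves the order in which items are posted. If items are posted from several threads at once, as in the Parallel.For of the question, the posting order itself is not deterministic, so any required sequence has to be established before posting.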
Try using Microsoft's Reactive Framework.
This worked for me:
IObservable<Task<string>> query =
from i in Observable.Range(0, 100, Scheduler.Default)
let item = "Product " + i
select AzureAsyncCall(item);
query
.Subscribe(async x =>
{
var result = await x;
/* do something with result */
});
The AzureAsyncCall call signature I used was public Task<string> AzureAsyncCall(string x).
I dropped in a bunch of Console.WriteLine(Thread.CurrentThread.ManagedThreadId); calls to ensure I was getting the right async behaviour in my test code. It worked well.
All the calls were asynchronous and serialized one after the other.
