Change normal loop to Parallel loop - c#

I have the following code:
static void Main(string[] args)
{
TaskExecuter.Execute();
}
class Task
{
int _delay;
private Task(int delay) { _delay = delay; }
public void Execute() { Thread.Sleep(_delay); }
public static IEnumerable GetAllTasks()
{
Random r = new Random(4711);
for (int i = 0; i < 10; i++)
yield return new Task(r.Next(100, 5000));
}
}
static class TaskExecuter
{
public static void Execute()
{
foreach (Task task in Task.GetAllTasks())
{
task.Execute();
}
}
}
I need to change the loop in Execute method to paralle with multiple threads, I tried the following, but it isn't working since GetAllTasks returns IEnumerable and not a list
Parallel.ForEach(Task.GetAllTasks(), task =>
{
//Execute();
});

Parallel.ForEach works with IEnumerable<T>, so adjust your GetAllTasks to return IEnumerable<Task>.
Also .net has widely used Task class, I would avoid naming own class like that to avoid confusion.

Parallel.ForEach takes an IEnumerable<TSource>, so your code should be fine. However, you need to perform the Execute call on the task instance that is passed as parameter to your lambda statement.
Parallel.ForEach(Task.GetAllTasks(), task =>
{
task.Execute();
});
This can also be expressed as a one-line lambda expression:
Parallel.ForEach(Task.GetAllTasks(), task => task.Execute());
There is also another subtle bug in your code that you should pay attention to. Per its internal implementation, Parallel.ForEach may enumerate the elements of your sequence in parallel. However, you are calling an instance method of the Random class in your enumerator, which is not thread-safe, possibly leading to race issues. The easiest way to work around this would be to pre-populate your sequence as a list:
Parallel.ForEach(Task.GetAllTasks().ToList(), task => task.Execute());

This worked on my linqpad. I just renamed your Task class to Work and also returned an IEnumerable<T> from GetAllTasks:
class Work
{
int _delay;
private Work(int delay) { _delay = delay; }
public void Execute() { Thread.Sleep(_delay); }
public static IEnumerable<Work> GetAllTasks()
{
Random r = new Random(4711);
for (int i = 0; i < 10; i++)
yield return new Work(r.Next(100, 5000));
}
}
static class TaskExecuter
{
public static void Execute()
{
foreach (Work task in Work.GetAllTasks())
{
task.Execute();
}
}
}
void Main()
{
System.Threading.Tasks.Parallel.ForEach(Work.GetAllTasks(), new Action<Work>(task =>
{
//Execute();
}));
}

Related

Waiting for all jobs to be finished with async await

I'm trying to understand the usage of async-await in C#5. If I have 2 jobs started in a method, is there a best way to wait for their completion in C#5+ ? I've done the example below but I fail to see what the async await keywork brings here besides free documentation with async keyword.
I made the following example, I want "FINISHED !" to be printed last. It is not the case however. What did I miss ? How can I make the async method wait until all jobs are finished ? Is there a point using async-await here ? I could just do Task.WaitAll with a non-async method here. I don't really understand what async brings in case you want to wait.
class Program
{
static void Main(string[] args)
{
var fooWorker = new FooWorker();
var barWorker = new BarWorker();
var test = new Class1(fooWorker, barWorker);
test.SomeWork();
Console.ReadLine();
}
}
public class Foo
{
public Foo(string bar) => Bar = bar;
public string Bar { get; }
}
public class Class1
{
private IEnumerable<Foo> _foos;
private readonly FooWorker _fooWorker;
private readonly BarWorker _barWorker;
public Class1(FooWorker fooWorker, BarWorker barWorker)
{
_fooWorker = fooWorker;
_barWorker = barWorker;
}
public void SomeWork()
{
_foos = ProduceManyFoo();
MoreWork();
Console.WriteLine("FINISHED !");
}
private async void MoreWork()
{
if (_foos == null || !_foos.Any()) return;
var fooList = _foos.ToList();
Task fooWorkingTask = _fooWorker.Work(fooList);
Task barWorkingTask = _barWorker.Work(fooList);
await Task.WhenAll(fooWorkingTask, barWorkingTask);
}
private IEnumerable<Foo> ProduceManyFoo()
{
int i = 0;
if (++i < 100) yield return new Foo(DateTime.Now.ToString(CultureInfo.InvariantCulture));
}
}
public abstract class AWorker
{
protected virtual void DoStuff(IEnumerable<Foo> foos)
{
foreach (var foo in foos)
{
Console.WriteLine(foo.Bar);
}
}
public Task Work(IEnumerable<Foo> foos) => Task.Run(() => DoStuff(foos));
}
public class FooWorker : AWorker { }
public class BarWorker : AWorker { }
You are firing off tasks and just forgetting them, while the thread continues running. This fixes it.
Main:
static async Task Main(string[] args)
{
var fooWorker = new FooWorker();
var barWorker = new BarWorker();
var test = new Class1(fooWorker, barWorker);
await test.SomeWork();
Console.ReadLine();
}
SomeWork:
public async Task SomeWork()
{
_foos = ProduceManyFoo();
await MoreWork();
Console.WriteLine("FINISHED !");
}
MoreWork signature change:
private async Task MoreWork()
The obvious code smell which should help make the problem clear is using async void. Unless required this should always be avoided.
When using async and await you'll usually want to chain the await calls to the top-level (in this case Main).
await is non-blocking, so anything that calls an async method should really care about the Task being returned.

How to safely write to the same List

I've got a public static List<MyDoggie> DoggieList;
DoggieList is appended to and written to by multiple processes throughout my application.
We run into this exception pretty frequently:
Collection was modified; enumeration operation may not execute
Assuming there are multiple classes writing to DoggieList how do we get around this exception?
Please note that this design is not great, but at this point we need to quickly fix it in production.
How can we perform mutations to this list safely from multiple threads?
I understand we can do something like:
lock(lockObject)
{
DoggieList.AddRange(...)
}
But can we do this from multiple classes against the same DoggieList?
you can also create you own class and encapsulate locking thing in that only, you can try like as below ,
you can add method you want like addRange, Remove etc.
class MyList {
private object objLock = new object();
private List<int> list = new List<int>();
public void Add(int value) {
lock (objLock) {
list.Add(value);
}
}
public int Get(int index) {
int val = -1;
lock (objLock) {
val = list[0];
}
return val;
}
public void GetAll() {
List<int> retList = new List<int>();
lock (objLock) {
retList = new List<T>(list);
}
return retList;
}
}
Good stuff : Concurrent Collections very much in detail :http://www.albahari.com/threading/part5.aspx#_Concurrent_Collections
making use of concurrent collection ConcurrentBag Class can also resolve issue related to multiple thread update
Example
using System.Collections.Concurrent;
using System.Threading.Tasks;
public static class Program
{
public static void Main()
{
var items = new[] { "item1", "item2", "item3" };
var bag = new ConcurrentBag<string>();
Parallel.ForEach(items, bag.Add);
}
}
Using lock a the disadvantage of preventing concurrent readings.
An efficient solution which does not require changing the collection type is to use a ReaderWriterLockSlim
private static readonly ReaderWriterLockSlim _lock = new ReaderWriterLockSlim();
With the following extension methods:
public static class ReaderWriterLockSlimExtensions
{
public static void ExecuteWrite(this ReaderWriterLockSlim aLock, Action action)
{
aLock.EnterWriteLock();
try
{
action();
}
finally
{
aLock.ExitWriteLock();
}
}
public static void ExecuteRead(this ReaderWriterLockSlim aLock, Action action)
{
aLock.EnterReadLock();
try
{
action();
}
finally
{
aLock.ExitReadLock();
}
}
}
which can be used the following way:
_lock.ExecuteWrite(() => DoggieList.Add(new Doggie()));
_lock.ExecuteRead(() =>
{
// safe iteration
foreach (MyDoggie item in DoggieList)
{
....
}
})
And finally if you want to build your own collection based on this:
public class SafeList<T>
{
private readonly ReaderWriterLockSlim _lock = new ReaderWriterLockSlim();
private readonly List<T> _list = new List<T>();
public T this[int index]
{
get
{
T result = default(T);
_lock.ExecuteRead(() => result = _list[index]);
return result;
}
}
public List<T> GetAll()
{
List<T> result = null;
_lock.ExecuteRead(() => result = _list.ToList());
return result;
}
public void ForEach(Action<T> action) =>
_lock.ExecuteRead(() => _list.ForEach(action));
public void Add(T item) => _lock.ExecuteWrite(() => _list.Add(item));
public void AddRange(IEnumerable<T> items) =>
_lock.ExecuteWrite(() => _list.AddRange(items));
}
This list is totally safe, multiple threads can add or get items in parallel without any concurrency issue. Additionally, multiple threads can get items in parallel without locking each other, it's only when writing than 1 single thread can work on the collection.
Note that this collection does not implement IEnumerable<T> because you could get an enumerator and forget to dispose it which would leave the list locked in read mode.
make DoggieList of type ConcurrentStack and then use pushRange method. It is thread safe.
using System.Collections.Concurrent;
var doggieList = new ConcurrentStack<MyDoggie>();
doggieList.PushRange(YourCode)

Threading With List Property

public static class People
{
List<string> names {get; set;}
}
public class Threading
{
public static async Task DoSomething()
{
var t1 = new Task1("bob");
var t2 = new Task1("erin");
await Task.WhenAll(t1,t2);
}
private static async Task Task1(string name)
{
await Task.Run(() =>
{
if(People.names == null) People.names = new List<string>();
Peoples.names.Add(name);
}
}
}
Is that dangerous to initialize a list within a thread? Is it possible that both threads could initialize the list and remove one of the names?
So I was thinking of three options:
Leave it like this since it is simple - only if it is safe though
Do same code but use a concurrentBag - I know thread safe but is initialize safe
Using [DataMember(EmitDefaultValue = new List())] and then just do .Add in Task1 and not worry about initializing. But the only con to this is sometimes the list wont need to be used at all and it seems like a waste to initialize it everytime.
Okay so what I figured worked best for my case was I used a lock statement.
public class Class1
{
private static Object thisLock = new Object();
private static async Task Task1(string name)
{
await Task.Run(() =>
{
AddToList(name);
}
}
private static AddToList(string name)
{
lock(thisLock)
{
if(People.names == null) People.names = new List<string>();
People.names.Add(name);
}
}
}
public static class People
{
public static List<string> names {get; set;}
}
for a simple case like this the easiest way to get thread-safety is using the lock statement:
public static class People
{
static List<string> _names = new List<string>();
public static void AddName(string name)
{
lock (_names)
{
_names.Add(name);
}
}
public static IEnumerable<string> GetNames()
{
lock(_names)
{
return _names.ToArray();
}
}
}
public class Threading
{
public static async Task DoSomething()
{
var t1 = new Task1("bob");
var t2 = new Task1("erin");
await Task.WhenAll(t1,t2);
}
private static async Task Task1(string name)
{
People.AddName(name);
}
}
of course it's not very usefull (why not just add without the threads) - but I hope you get the idea.
If you don't use some kind of lock and concurrently read and write to a List you will most likely get an InvalidOperationException saying the collection has changed during read.
Because you don't really know when a user will use the collection you might return the easiest way to get thread-saftey is copying the collection into an array and returning this.
If this is not practical (collection to large, ..) you have to use the classes in System.Collections.Concurrrent for example the BlockingCollection but those are a bit more involved.

Parallel.ForEach stalled when integrated with BlockingCollection

I adopted my implementation of parallel/consumer based on the code in this question
class ParallelConsumer<T> : IDisposable
{
private readonly int _maxParallel;
private readonly Action<T> _action;
private readonly TaskFactory _factory = new TaskFactory();
private CancellationTokenSource _tokenSource;
private readonly BlockingCollection<T> _entries = new BlockingCollection<T>();
private Task _task;
public ParallelConsumer(int maxParallel, Action<T> action)
{
_maxParallel = maxParallel;
_action = action;
}
public void Start()
{
try
{
_tokenSource = new CancellationTokenSource();
_task = _factory.StartNew(
() =>
{
Parallel.ForEach(
_entries.GetConsumingEnumerable(),
new ParallelOptions { MaxDegreeOfParallelism = _maxParallel, CancellationToken = _tokenSource.Token },
(item, loopState) =>
{
Log("Taking" + item);
if (!_tokenSource.IsCancellationRequested)
{
_action(item);
Log("Finished" + item);
}
else
{
Log("Not Taking" + item);
_entries.CompleteAdding();
loopState.Stop();
}
});
},
_tokenSource.Token);
}
catch (OperationCanceledException oce)
{
System.Diagnostics.Debug.WriteLine(oce);
}
}
private void Log(string message)
{
Console.WriteLine(message);
}
public void Stop()
{
Dispose();
}
public void Enqueue(T entry)
{
Log("Enqueuing" + entry);
_entries.Add(entry);
}
public void Dispose()
{
if (_task == null)
{
return;
}
_tokenSource.Cancel();
while (!_task.IsCanceled)
{
}
_task.Dispose();
_tokenSource.Dispose();
_task = null;
}
}
And here is a test code
class Program
{
static void Main(string[] args)
{
TestRepeatedEnqueue(100, 1);
}
private static void TestRepeatedEnqueue(int itemCount, int parallelCount)
{
bool[] flags = new bool[itemCount];
var consumer = new ParallelConsumer<int>(parallelCount,
(i) =>
{
flags[i] = true;
}
);
consumer.Start();
for (int i = 0; i < itemCount; i++)
{
consumer.Enqueue(i);
}
Thread.Sleep(1000);
Debug.Assert(flags.All(b => b == true));
}
}
The test always fails - it always stuck at around 93th-item from the 100 tested. Any idea which part of my code caused this issue, and how to fix it?
You cannot use Parallel.Foreach() with BlockingCollection.GetConsumingEnumerable(), as you have discovered.
For an explanation, see this blog post:
https://devblogs.microsoft.com/pfxteam/parallelextensionsextras-tour-4-blockingcollectionextensions/
Excerpt from the blog:
BlockingCollection’s GetConsumingEnumerable implementation is using BlockingCollection’s internal synchronization which already supports multiple consumers concurrently, but ForEach doesn’t know that, and its enumerable-partitioning logic also needs to take a lock while accessing the enumerable.
As such, there’s more synchronization here than is actually necessary, resulting in a potentially non-negligable performance hit.
[Also] the partitioning algorithm employed by default by both Parallel.ForEach and PLINQ use chunking in order to minimize synchronization costs: rather than taking the lock once per element, it'll take the lock, grab a group of elements (a chunk), and then release the lock.
While this design can help with overall throughput, for scenarios that are focused more on low latency, that chunking can be prohibitive.
That blog also provides the source code for a method called GetConsumingPartitioner() which you can use to solve the problem.
public static class BlockingCollectionExtensions
{
public static Partitioner<T> GetConsumingPartitioner<T>(this BlockingCollection<T> collection)
{
return new BlockingCollectionPartitioner<T>(collection);
}
public class BlockingCollectionPartitioner<T> : Partitioner<T>
{
private BlockingCollection<T> _collection;
internal BlockingCollectionPartitioner(BlockingCollection<T> collection)
{
if (collection == null)
throw new ArgumentNullException("collection");
_collection = collection;
}
public override bool SupportsDynamicPartitions
{
get { return true; }
}
public override IList<IEnumerator<T>> GetPartitions(int partitionCount)
{
if (partitionCount < 1)
throw new ArgumentOutOfRangeException("partitionCount");
var dynamicPartitioner = GetDynamicPartitions();
return Enumerable.Range(0, partitionCount).Select(_ => dynamicPartitioner.GetEnumerator()).ToArray();
}
public override IEnumerable<T> GetDynamicPartitions()
{
return _collection.GetConsumingEnumerable();
}
}
}
The reason for failure is because of the following reason as explained here
The partitioning algorithm employed by default by both
Parallel.ForEach and PLINQ use chunking in order to minimize
synchronization costs: rather than taking the lock once per element,
it'll take the lock, grab a group of elements (a chunk), and then
release the lock.
To get it to work, you can add a method on your ParallelConsumer<T> class to indicate that the adding is completed, as below
public void StopAdding()
{
_entries.CompleteAdding();
}
And now call this method after your for loop , as below
consumer.Start();
for (int i = 0; i < itemCount; i++)
{
consumer.Enqueue(i);
}
consumer.StopAdding();
Otherwise, Parallel.ForEach() would wait for the threshold to be reached so as to grab the chunk and start processing.

TPL with multiple Method calls Console.Write issue

If i use TPL i run into problems in Parse.. Methods i do use Console.Write to build some Line but somtimes one is to fast and writes in the other Methods row. How do i lock or is there some better way?
Parallel.Invoke(
() => insertedOne = Lib.ParseOne(list),
() => insertedTwo = Lib.ParseTwo(list),
() => insertedThree = Lib.ParseThree(list));
Example for Parse.. Methods.
public static int ParseOne(string[] _list) {
Console.Write("blabla");
Console.Write("blabla");
return 0;
}
public static int ParseTwo(string[] _list) {
Console.Write("hahahah");
Console.Write("hahahah");
return 0;
}
public static int ParseThree(string[] _list) {
Console.Write("egegege");
Console.Write("egegege");
return 0;
}
To be able to print your blablas, hahahahs and egegeges as a single entity(indivisible)
you can write your method as:
public static int ParseThree(string[] _list)
{
lock (Console.Out)
{
Console.Write("egegege");
Console.Write("egegege");
}
return 0;
}
Why don't you run all the tasks in one thread, one after the other?
System.Threading.Tasks.Task.Factory.StartNew(()=>
{
insertedOne = Lib.ParseOne(list);
insertedTwo = Lib.ParseTwo(list);
insertedThree = Lib.ParseThree(list);
});
This way you won't have that much of a race condition.

Categories

Resources