Concurrent vs. Normal Collection? - C#

So I have a question about System.Collections.Concurrent.
I saw that the concurrent collections are actually thread-safe collections, but in which cases can they be helpful?
I made 2 examples and the results are the same.
First, the ConcurrentQueue:
static ConcurrentQueue<int> queue = new ConcurrentQueue<int>();
private static readonly object obj = new object();
static int i = 0;
static int Num = 0;
static void Run(object loopNum)
{
    lock (obj)
    {
        for (int N = 0; N < 10; N++)
        {
            queue.Enqueue(i);
            Thread.Sleep(250);
            queue.TryDequeue(out Num);
            Console.WriteLine($"{Num} Added! in {loopNum} Loop, ThreadID: [{Thread.CurrentThread.ManagedThreadId}]");
            i++;
        }
    }
}
And now the normal Queue:
static Queue<int> queue = new Queue<int>();
private static readonly object obj = new object();
static int i = 0;
static void Run(object loopNum)
{
    lock (obj)
    {
        for (int N = 0; N < 10; N++)
        {
            queue.Enqueue(i);
            Thread.Sleep(250);
            Console.WriteLine($"{queue.Dequeue()} Added! in {loopNum} Loop, ThreadID: [{Thread.CurrentThread.ManagedThreadId}]");
            i++;
        }
    }
}
Main:
static void Main()
{
    Thread[] Th = new Thread[] { new Thread(Run), new Thread(Run) };
    Th[0].Start("First");
    Th[1].Start("Second");
    Console.ReadKey();
}
The results are the same.
Sure, it has some different methods like TryDequeue and a few more, but what is it really helpful for?
Any help will be very much appreciated :)

Don't use lock() in conjunction with ConcurrentQueue<> or similar types in that namespace. It's detrimental to performance.
You can use ConcurrentQueue<> safely from multiple threads and get great performance. The same cannot be said of lock() with regular collections.
Your lock makes the two threads take turns, so both versions execute exactly the same sequence of operations. That's why your results are the same.
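Here is a minimal sketch of lock-free use, written as a .NET 6+ top-level program. Several consumers drain one ConcurrentQueue<int> with TryDequeue. That method checks for an item and removes it in one atomic step, so no extra lock() is needed. The values 1..1000 and the four consumers are arbitrary choices for illustration.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

var queue = new ConcurrentQueue<int>();
for (int i = 1; i <= 1000; i++)
    queue.Enqueue(i);

long total = 0;

// Four consumers drain the same queue without any lock of our own.
Task[] consumers = Enumerable.Range(0, 4).Select(_ => Task.Run(() =>
{
    long localSum = 0;
    // TryDequeue checks and removes in one atomic step,
    // so no item is taken twice and none is skipped.
    while (queue.TryDequeue(out int item))
        localSum += item;
    Interlocked.Add(ref total, localSum);
})).ToArray();

Task.WaitAll(consumers);
Console.WriteLine(total); // 500500
```

With a plain Queue<T>, by contrast, checking Count and then calling Dequeue leaves a gap between the check and the removal. Another thread can take the item in that gap.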

The reason for using ConcurrentQueue<T> is to avoid writing your own locking code.
If you have multiple threads adding or removing items from a Queue<T>, you are likely to get an exception or silently corrupted contents. Using a ConcurrentQueue<T> avoids these problems.
Here's a sample program which will likely cause an exception when using multiple threads to write to a Queue<T> while it works with a ConcurrentQueue<T>:
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;
internal class Program
{
    private static void Main()
    {
        var queue1 = new ConcurrentQueue<int>();
        var queue2 = new Queue<int>();
        // This will work fine.
        var task1 = Task.Run(() => producer(item => queue1.Enqueue(item)));
        var task2 = Task.Run(() => producer(item => queue1.Enqueue(item)));
        Task.WaitAll(task1, task2);
        // This will cause an exception.
        var task3 = Task.Run(() => producer(item => queue2.Enqueue(item)));
        var task4 = Task.Run(() => producer(item => queue2.Enqueue(item)));
        Task.WaitAll(task3, task4);
    }
    private static void producer(Action<int> add)
    {
        for (int i = 0; i < 10000; ++i)
            add(i);
    }
}
Try running it and see what happens.
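For comparison, here is roughly what "your own locking code" looks like with a plain Queue<T>. Every access takes the same lock, including reading Count. This is a hedged sketch with arbitrary sizes, using .NET 6+ top-level statements. The gate object and the Producer function are illustrative names, not part of the answer above.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

var queue = new Queue<int>();
var gate = new object(); // every access to 'queue' must take this lock

void Producer(int count)
{
    for (int i = 0; i < count; i++)
    {
        lock (gate)
        {
            queue.Enqueue(i);
        }
    }
}

Task.WaitAll(Task.Run(() => Producer(10000)), Task.Run(() => Producer(10000)));

int total;
lock (gate)
{
    total = queue.Count;
}
Console.WriteLine(total); // 20000
```

Forgetting the lock in even one place brings back the original problem. That risk is what ConcurrentQueue<T> removes.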

When you use the lock construct, your code effectively executes in sequence, not in parallel. That is appropriate for the version with a plain Queue, since Queue is not thread-safe. With ConcurrentQueue, though, using lock defeats the purpose. Remove the lock for ConcurrentQueue, remove the Thread.Sleep, and use 20 threads instead of 2 just for kicks. You can use the Parallel.For() method to spawn your tasks:
Parallel.For(0, 20, i => Run(i));
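Here is a sketch of what the lock-free version might look like, with the body of Run inlined as a lambda. The nextValue and dequeued counters are hypothetical additions. The static i++ from the question is replaced by Interlocked.Increment, because a plain increment is not atomic once the lock is gone. It is written as a .NET 6+ top-level program.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

var queue = new ConcurrentQueue<int>();
int nextValue = -1;   // shared counter, advanced atomically (first value is 0)
int dequeued = 0;

// 20 iterations run on the thread pool in parallel: no lock, no Thread.Sleep.
Parallel.For(0, 20, loopNum =>
{
    for (int n = 0; n < 10; n++)
    {
        queue.Enqueue(Interlocked.Increment(ref nextValue));
        if (queue.TryDequeue(out int value))
        {
            Interlocked.Increment(ref dequeued);
            Console.WriteLine($"{value} dequeued in loop {loopNum}, thread {Environment.CurrentManagedThreadId}");
        }
    }
});

Console.WriteLine($"Dequeued {dequeued}, left in queue {queue.Count}");
```

The output order varies from run to run. No item is lost, though: every enqueued value is either dequeued or still in the queue.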

Thank you, everyone, for all your answers. They really helped me out, and I appreciate it a lot.
By the way, Matthew Watson, your example sometimes throws an exception and sometimes doesn't. I made an example that fails more reliably, but yes, I get the point.
private static void Main()
{
    // Requires using System.Linq; producer is the method from the previous answer.
    var queue1 = new ConcurrentQueue<int>();
    var queue2 = new Queue<int>();
    // This will work fine.
    var tasks1 = Enumerable.Range(0, 40)
        .Select(_ => Task.Run(() => producer(item => queue1.Enqueue(item))))
        .ToArray();
    Task.WaitAll(tasks1);
    // This will very likely cause an exception.
    var tasks2 = Enumerable.Range(0, 40)
        .Select(_ => Task.Run(() => producer(item => queue2.Enqueue(item))))
        .ToArray();
    Task.WaitAll(tasks2);
}
Thanks again :)

Related

How to handle threads that hang when using SemaphoreSlim

I have some code that runs thousands of URLs through a third party library. Occasionally the method in the library hangs which takes up a thread. After a while all threads are taken up by processes doing nothing and it grinds to a halt.
I am using a SemaphoreSlim to control adding new threads so I can have an optimal number of tasks running. I need a way to identify tasks that have been running too long and kill them. I also need to release their slot in the SemaphoreSlim so a new task can be created.
I am struggling with the approach here, so I made some test code that imitates what I am doing. It creates tasks that have a 10% chance of hanging, so very quickly all threads have hung.
How should I be checking for these and killing them off?
Here is the code:
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading;
using System.Threading.Tasks;
class Program
{
    public static SemaphoreSlim semaphore;
    public static List<Task> taskList;
    static void Main(string[] args)
    {
        List<string> urlList = new List<string>();
        Console.WriteLine("Generating list");
        for (int i = 0; i < 1000; i++)
        {
            //adding random strings to simulate a large list of URLs to process
            urlList.Add(Path.GetRandomFileName());
        }
        Console.WriteLine("Queueing tasks");
        semaphore = new SemaphoreSlim(10, 10);
        Task.Run(() => QueueTasks(urlList));
        Console.ReadLine();
    }
    static void QueueTasks(List<string> urlList)
    {
        taskList = new List<Task>();
        foreach (var url in urlList)
        {
            Console.WriteLine("{0} tasks can enter the semaphore.",
                semaphore.CurrentCount);
            semaphore.Wait();
            taskList.Add(DoTheThing(url));
        }
    }
    static async Task DoTheThing(string url)
    {
        Random rand = new Random();
        // simulate the IO process
        await Task.Delay(rand.Next(2000, 10000));
        // add a 10% chance that the thread will hang simulating what happens occasionally with http request
        int chance = rand.Next(1, 100);
        if (chance <= 10)
        {
            while (true)
            {
                await Task.Delay(1000000);
            }
        }
        semaphore.Release();
        Console.WriteLine(url);
    }
}
As people have already pointed out, aborting threads is bad in general, and there is no guaranteed way of doing it in C#. On .NET Core and .NET 5+, Thread.Abort is not supported at all and throws PlatformNotSupportedException. Using a separate process to do the work and then killing it is a slightly better idea than attempting Thread.Abort, but it is still not the best way to go. Ideally, you want cooperative threads/processes that use IPC to decide when to bail out themselves. This way the cleanup is done properly.
With all that said, you can use code like the one below to do what you intend. I have written it assuming your task will be done in a thread. With slight changes, you can use the same logic to run your task in a process.
The code is by no means bullet-proof and is meant to be illustrative. The concurrent code is not well tested. Locks are held for longer than needed, and in some places I am not locking at all (like the Log function).
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading;
using System.Threading.Tasks;
class TaskInfo {
    public Thread Task;
    public DateTime StartTime;
    public TaskInfo(ParameterizedThreadStart startInfo, object startArg) {
        Task = new Thread(startInfo);
        Task.Start(startArg);
        StartTime = DateTime.Now;
    }
}
class Program {
    const int MAX_THREADS = 1;
    const int TASK_TIMEOUT = 6; // in seconds
    const int CLEANUP_INTERVAL = TASK_TIMEOUT; // in seconds
    public static SemaphoreSlim semaphore;
    public static List<TaskInfo> TaskList;
    public static object TaskListLock = new object();
    public static Timer CleanupTimer;
    static void Main(string[] args) {
        List<string> urlList = new List<string>();
        Log("Generating list");
        for (int i = 0; i < 2; i++) {
            //adding random strings to simulate a large list of URLs to process
            urlList.Add(Path.GetRandomFileName());
        }
        Log("Queueing tasks");
        semaphore = new SemaphoreSlim(MAX_THREADS, MAX_THREADS);
        Task.Run(() => QueueTasks(urlList));
        CleanupTimer = new Timer(CleanupTasks, null, CLEANUP_INTERVAL * 1000, CLEANUP_INTERVAL * 1000);
        Console.ReadLine();
    }
    // TODO: Guard against re-entrancy
    static void CleanupTasks(object state) {
        Log("CleanupTasks started");
        lock (TaskListLock) {
            var now = DateTime.Now;
            int n = TaskList.Count;
            for (int i = n - 1; i >= 0; --i) {
                var task = TaskList[i];
                Log($"Checking task with ID {task.Task.ManagedThreadId}");
                // kill processes running for longer than anticipated
                if (task.Task.IsAlive && now.Subtract(task.StartTime).TotalSeconds >= TASK_TIMEOUT) {
                    Log("Cleaning up hung task");
                    // Thread.Abort only works on .NET Framework; on .NET Core/.NET 5+ it throws PlatformNotSupportedException
                    task.Task.Abort();
                }
                // remove task if it is not alive
                if (!task.Task.IsAlive) {
                    Log("Removing dead task from list");
                    TaskList.RemoveAt(i);
                    continue;
                }
            }
            if (TaskList.Count == 0) {
                Log("Disposing cleanup thread");
                CleanupTimer.Dispose();
            }
        }
        Log("CleanupTasks done");
    }
    static void QueueTasks(List<string> urlList) {
        TaskList = new List<TaskInfo>();
        foreach (var url in urlList) {
            Log($"Trying to schedule url = {url}");
            semaphore.Wait();
            Log("Semaphore acquired");
            ParameterizedThreadStart taskRoutine = obj => {
                try {
                    DoTheThing((string)obj);
                } finally {
                    Log("Releasing semaphore");
                    semaphore.Release();
                }
            };
            var task = new TaskInfo(taskRoutine, url);
            lock (TaskListLock)
                TaskList.Add(task);
        }
        Log("All tasks queued");
    }
    // simulate all processes get hung
    static void DoTheThing(string url) {
        while (true)
            Thread.Sleep(5000);
    }
    static void Log(string msg) {
        Console.WriteLine("{0:HH:mm:ss.fff} Thread {1,2} {2}", DateTime.Now, Thread.CurrentThread.ManagedThreadId.ToString(), msg);
    }
}
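The cooperative approach recommended at the start of this answer also avoids Thread.Abort, which is unavailable on .NET Core. It could look roughly like the sketch below. It assumes the third-party call can accept a CancellationToken, which may not be true for the library in the question. ProcessAsync is a hypothetical stand-in, and the 500 ms budget and the "every fifth item hangs" rule are invented for the demo. Each item gets its own timeout, and the semaphore slot is released in finally whether the item finished or timed out.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

var semaphore = new SemaphoreSlim(3, 3);            // at most 3 items in flight
var timeout = TimeSpan.FromMilliseconds(500);       // hypothetical per-item budget
int completed = 0, timedOut = 0;

// Hypothetical stand-in for the third-party call; it honours the token.
async Task ProcessAsync(int item, CancellationToken token)
{
    if (item % 5 == 0)
        await Task.Delay(Timeout.Infinite, token);  // simulate a hang
    else
        await Task.Delay(10, token);                // simulate normal I/O
}

async Task RunOneAsync(int item)
{
    await semaphore.WaitAsync();
    try
    {
        using var cts = new CancellationTokenSource(timeout); // cancels itself after the budget
        await ProcessAsync(item, cts.Token);
        Interlocked.Increment(ref completed);
    }
    catch (OperationCanceledException)
    {
        Interlocked.Increment(ref timedOut);
    }
    finally
    {
        semaphore.Release(); // the slot is freed whether the item finished or timed out
    }
}

await Task.WhenAll(Enumerable.Range(0, 10).Select(i => RunOneAsync(i)));
Console.WriteLine($"completed={completed}, timedOut={timedOut}"); // completed=8, timedOut=2
```

If the library cannot observe a token, a timeout can only stop waiting for the call, and the underlying operation keeps running. For truly uncooperative code, running the work in a separate process, as the answer mentions, is what actually frees the resources.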

Tasks are starting sequentially instead of in parallel

I need to start tasks in parallel. I chose Task.Run instead of Parallel.ForEach so that I get feedback when all tasks have finished and can then enable the UI controls.
private async void buttonStart_Click(object sender, EventArgs e)
{
    var cells = objectListView.CheckedObjects;
    if (cells != null)
    {
        List<Task> tasks = new List<Task>();
        foreach (Cell c in cells)
        {
            Cell cell = c;
            var progressHandler = new Progress<string>(value =>
            {
                cell.Status = value;
            });
            var progress = progressHandler as IProgress<string>;
            Task t = Task.Run(() =>
            {
                progress.Report("Starting...");
                int a = 123;
                for (int i = 0; i < 200000; i++)
                {
                    a = a + i;
                    Task.Delay(500).Wait();
                }
                progress.Report("Done");
            });
            tasks.Add(t);
        }
        await Task.WhenAll(tasks);
        Console.WriteLine("Done, enable UI controls");
    }
}
I expect to see "Starting..." in the UI almost instantly for all items. What I actually see is that the first 4 items show "Starting..." (I guess because there are 4 CPU cores). After that, a new item shows "Starting..." about once a second. I have 37 items in total, and it takes around 30 seconds for all tasks to start.
How can I make it as parallel as possible?
The inner for loop simulates a long-running CPU-bound job. I would like as many of these jobs as possible to start at the same time.
It's already as parallel as possible. Starting 37 threads that all have CPU-bound work to do will not make it go any faster, since you're apparently running it on a 4-core machine. There are 4 cores, so only 4 threads can actually run at a time. The other 33 threads would be waiting while 4 are running; they would only appear to run simultaneously.
The thread pool creates threads on demand up to its minimum (by default, the number of cores). Beyond that it adds threads only gradually, which is why the remaining items start roughly one per second. That said, if you really want to start up all those thread pool threads, you can do this by calling ThreadPool.SetMinThreads.
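Here is a small sketch of raising the pool's minimum worker thread count. It assumes 64 is enough headroom for the 37 items in the question; the number is an arbitrary choice. ThreadPool.SetMinThreads returns false and changes nothing if the requested value is out of range, so the return value is worth checking.

```csharp
using System;
using System.Threading;

ThreadPool.GetMinThreads(out int minWorker, out int minIo);
Console.WriteLine($"Default minimum worker threads: {minWorker}");

// Hypothetical target: enough workers for every checked item, plus headroom.
int desired = Math.Max(minWorker, 64);
bool ok = ThreadPool.SetMinThreads(desired, minIo); // keep the I/O minimum unchanged

ThreadPool.GetMinThreads(out int newMinWorker, out _);
Console.WriteLine($"SetMinThreads succeeded: {ok}, minimum worker threads now: {newMinWorker}");
```

Raising the minimum lets all the items start immediately, but it does not add CPU cores. CPU-bound work will still share the same 4 cores.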
As for using Task.Run instead of Parallel.ForEach just to get feedback when all tasks have finished and then enable the UI controls:
Since you have parallel work to do, you should use Parallel. If you want the nice resume-on-the-UI-thread behavior of await, then you can use a single await Task.Run, something like this:
private async void buttonStart_Click(object sender, EventArgs e)
{
    var cells = objectListView.CheckedObjects;
    if (cells == null)
        return;
    // Cast, since the question enumerates CheckedObjects as Cell.
    // ToList runs here on the UI thread, so each Progress<T> captures the UI context.
    var workItems = cells.Cast<Cell>().Select(c => new
    {
        Cell = c,
        Progress = new Progress<string>(value => { c.Status = value; }),
    }).ToList();
    await Task.Run(() => Parallel.ForEach(workItems, item =>
    {
        IProgress<string> progress = item.Progress;
        progress.Report("Starting...");
        int a = 123;
        for (int i = 0; i < 200000; i++)
        {
            a = a + i;
            Thread.Sleep(500);
        }
        progress.Report("Done");
    }));
    Console.WriteLine("Done, enable UI controls");
}
I'd say it is as parallel as possible. If you have 4 cores, you can run 4 threads in parallel.
If you can do other work while waiting for the "delay", have a look at asynchronous programming. There, one thread can run multiple tasks "at once", because most of them are waiting for something.
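To illustrate the asynchronous alternative, here is a sketch in which waiting is done with await Task.Delay instead of Task.Delay(...).Wait(). WorkAsync is hypothetical, there are 37 items as in the question, and it uses .NET 6+ top-level statements. An async method runs synchronously up to its first await and then gives its thread back. As a result, all 37 items start immediately without needing 37 threads. This only helps when the work really is waiting; for genuinely CPU-bound loops, the 4-core limit still applies.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

int started = 0, finished = 0;

// Hypothetical work item that waits asynchronously instead of blocking a thread.
async Task WorkAsync()
{
    Interlocked.Increment(ref started);   // runs synchronously, before the first await
    for (int i = 0; i < 3; i++)
        await Task.Delay(100);            // no thread is held while waiting
    Interlocked.Increment(ref finished);
}

Task[] tasks = Enumerable.Range(0, 37).Select(_ => WorkAsync()).ToArray();
int startedImmediately = Volatile.Read(ref started);
Console.WriteLine($"Started right away: {startedImmediately}"); // Started right away: 37

await Task.WhenAll(tasks);
Console.WriteLine($"Finished: {finished}");                     // Finished: 37
```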
EDIT: you can also run Parallel.ForEach in its own task and await that:
private async void buttonStart_Click(object sender, EventArgs e)
{
    var cells = objectListView.CheckedObjects;
    if (cells != null)
    {
        await Task.Run(() => Parallel.ForEach(cells.Cast<Cell>(), c => ...));
    }
}
I think it depends on your task creation options:
TaskCreationOptions.LongRunning
You can find further information here:
https://msdn.microsoft.com/en-us/library/system.threading.tasks.taskcreationoptions(v=vs.110).aspx
Keep in mind that tasks use a thread pool with a finite maximum number of threads. You can use LongRunning to signal that a task will take a long time and should not clog your pool. I think creating a long-running task is more costly, because the scheduler may create a new thread for it.
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
namespace TaskTest
{
    internal class Program
    {
        private static void Main(string[] args)
        {
            var demo = new Program();
            demo.SimulateClick();
            Console.ReadLine();
        }
        public void SimulateClick()
        {
            buttonStart_Click(null, null);
        }
        private async void buttonStart_Click(object sender, EventArgs e)
        {
            var tasks = new List<Task>();
            for (var i = 0; i < 36; i++)
            {
                var taskId = i;
                var t = Task.Factory.StartNew(() =>
                {
                    Console.WriteLine($"Starting Task ({taskId})");
                    for (var ii = 0; ii < 200000; ii++)
                    {
                        Task.Delay(TimeSpan.FromMilliseconds(500)).Wait();
                        var s1 = new string(' ', taskId);
                        var s2 = new string(' ', 36 - taskId);
                        Console.WriteLine($"Updating Task {s1}X{s2} ({taskId})");
                    }
                    Console.WriteLine($"Done ({taskId})");
                }, TaskCreationOptions.LongRunning);
                tasks.Add(t);
            }
            await Task.WhenAll(tasks);
            Console.WriteLine("Done, enable UI controls");
        }
    }
}

System.ArgumentException when locking

In my real application I need to iterate over a collection, but it can be changed from another thread. So I need to copy the collection and iterate over the copy. I reproduced this problem in a small example, but apparently my lack of understanding of locks and threads results in a System.ArgumentException. I tried different things with lock, but the result is the same.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
class Program
{
    static List<int> list;
    static void Main(string[] args)
    {
        list = new List<int>();
        for (int i = 0; i < 1000000; i++)
        {
            list.Add(i);
            if (i == 1000)
            {
                Thread t = new Thread(new ThreadStart(WorkThreadFunction));
                t.Start();
            }
        }
    }
    static void WorkThreadFunction()
    {
        lock (list)
        {
            List<int> tmp = list.ToList(); //Exception here!
            Console.WriteLine(list.Count);
        }
    }
}
Option 1:
Here's a modified version of your code:
class Program
{
    static List<int> list;
    static void Main(string[] args)
    {
        list = new List<int>();
        for (int i = 0; i < 1000000; i++)
        {
            lock (list) //Lock before modification
            {
                list.Add(i);
            }
            if (i == 1000)
            {
                Thread t = new Thread(new ThreadStart(WorkThreadFunction));
                t.Start();
            }
        }
        Console.ReadLine();
    }
    static void WorkThreadFunction()
    {
        lock (list)
        {
            List<int> tmp = list.ToList(); // no exception now: the writer takes the same lock
            Console.WriteLine(list.Count);
        }
    }
}
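What happens is that the main thread keeps modifying your list while ToList() is copying it into another list, and that is where the ArgumentException is thrown. Locking only in the reader does not help, because the writer never takes the lock. To avoid the exception, both the writer and the reader must lock on the same object, as shown above.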
Option 2 (no lock):
Use a concurrent collection to remove the lock:
using System.Collections.Concurrent;
//Change this line
static List<int> list;
//To this line
static ConcurrentBag<int> list;
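Also change the initialization to list = new ConcurrentBag<int>(); and remove all lock statements. Note that ConcurrentBag<T> does not preserve insertion order; ConcurrentQueue<T> does.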
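If order matters, here is a sketch of the same program with ConcurrentQueue<int> swapped in for ConcurrentBag<int>; that substitution is an assumption, not part of the answer. It shows that ToArray() returns a moment-in-time copy, which can be taken while the main thread keeps writing, without any lock. It is written as a .NET 6+ top-level program.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;

var queue = new ConcurrentQueue<int>();
int[] snapshot = Array.Empty<int>();

var reader = new Thread(() =>
{
    snapshot = queue.ToArray();   // moment-in-time copy; no lock needed
    Console.WriteLine($"Snapshot holds {snapshot.Length} items");
});

for (int i = 0; i < 1_000_000; i++)
{
    queue.Enqueue(i);
    if (i == 1000)
        reader.Start();           // read concurrently with the writes that follow
}

reader.Join();
Console.WriteLine($"Queue holds {queue.Count} items"); // Queue holds 1000000 items
```

With a single writer, the snapshot is always an in-order prefix (0, 1, 2, ...) of what was enqueued.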
I see some issues in your algorithm, and maybe you should refactor it. Whether you use locks or the ConcurrentBag class, copying the entire collection into a new one just to enumerate it is a very expensive, time-consuming operation. While it runs, you can't work with the collection efficiently.
lock (list)
{
    // VERY LONG OPERATION HERE
    List<int> tmp = list.ToList(); //Exception here!
    Console.WriteLine(list.Count);
}
You really shouldn't hold a lock on the collection for that long; otherwise, the writer loop and the reading threads end up blocking each other. You should use the TPL classes for this approach rather than using Threads directly.
Another option is an optimistic lock-free algorithm that double-checks a version number of the collection. You could even use a lock-free and wait-free algorithm that stores a snapshot of the collection and checks it inside your collection-access methods. Additional information can be found here.
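One way to get the "store a snapshot" idea without writing a lock-free algorithm by hand is to swap in an immutable collection. Writers publish a new version with a compare-and-swap, and readers enumerate whichever version they grabbed. This sketch assumes System.Collections.Immutable is available. It ships with .NET 5+; older targets may need the System.Collections.Immutable NuGet package. ImmutableInterlocked.Update performs the compare-and-swap retry loop. The two writers and 100,000 items are arbitrary.

```csharp
using System;
using System.Collections.Immutable;
using System.Threading;
using System.Threading.Tasks;

// Shared reference to the current immutable version of the list.
ImmutableList<int> shared = ImmutableList<int>.Empty;

void Writer(int start, int count)
{
    for (int i = start; i < start + count; i++)
    {
        int value = i;
        // Compare-and-swap loop: retries if another writer published first.
        ImmutableInterlocked.Update(ref shared, list => list.Add(value));
    }
}

Task w1 = Task.Run(() => Writer(0, 50_000));
Task w2 = Task.Run(() => Writer(50_000, 50_000));

// A reader grabs whatever version is current; that version never changes.
ImmutableList<int> snapshot = Volatile.Read(ref shared);
int countAtSnapshot = snapshot.Count;
long sum = 0;
foreach (int x in snapshot)   // safe to enumerate while the writers keep going
    sum += x;

Task.WaitAll(w1, w2);
Console.WriteLine($"Snapshot: {countAtSnapshot} items (sum {sum}); final list: {shared.Count} items");
```

Each Add allocates a new version that shares most of its structure with the old one, so writes cost more than with List<T>. The payoff is that readers never lock and never see the collection change under them.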
I think the information you gave isn't enough to suggest the right way to solve your issue.
I tried Joel's suggestions. ConcurrentBag was very slow, and locking on each of millions of iterations seems inefficient. An event wait handle looks like a good fit in this case: it takes about a third of the time of the lock-based version on my PC.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
class Program
{
    static List<int> list;
    static ManualResetEventSlim mres = new ManualResetEventSlim(false);
    static void Main(string[] args)
    {
        list = new List<int>();
        for (int i = 0; i < 10000000; i++)
        {
            list.Add(i);
            if (i == 1000)
            {
                Thread t = new Thread(new ThreadStart(WorkThreadFunction));
                t.Start();
                mres.Wait(); // the writer pauses here until the reader has finished copying
            }
        }
    }
    static void WorkThreadFunction()
    {
        List<int> tmp = list.ToList();
        Console.WriteLine(list.Count);
        mres.Set();
    }
}

Repeat a task (TPL) in a Windows service using ContinueWith

I have a Windows service (written in C#) that uses the Task Parallel Library to perform some tasks in parallel (5 tasks at a time).
After the tasks have executed once, I would like to repeat the same tasks on an ongoing basis (hourly) by calling the QueuePeek method.
Do I use a timer, or a counter like the one I have set up in the code snippet below?
I am using a counter to set up the tasks, and once I reach five I exit the loop. I also use .ContinueWith to decrement the counter, so my thought was that the counter would drop below 5 and the loop would continue. But my ContinueWith seems to be executing on the main thread, and the loop then exits.
The call to DecrementTaskCounter using ContinueWith does not seem to work.
FYI: the Importer class loads some libraries using MEF and does the work.
This is my code sample:
private void QueuePeek()
{
    var list = SetUpJobs();
    while (taskCounter < 5)
    {
        int j = taskCounter;
        Task task = null;
        task = new Task(() =>
        {
            DoLoad(j);
        });
        taskCounter += 1;
        tasks[j] = task;
        task.ContinueWith((t) => DecrementTaskCounter());
        task.Start();
        ds.SetJobStatus(1);
    }
    if (taskCounter == 0)
        Console.WriteLine("Completed all tasks.");
}
private void DoLoad(int i)
{
    ILoader loader;
    DataService.DataService ds = new DataService.DataService();
    Dictionary<int, dynamic> results = ds.AssignRequest(i);
    var data = results.Where(x => x.Key == 2).First();
    int loaderId = (int)data.Value;
    Importer imp = new Importer();
    loader = imp.Run(GetLoaderType(loaderId));
    LoaderProcessor lp = new LoaderProcessor(loader);
    lp.ExecuteLoader();
}
private void DecrementTaskCounter()
{
    Console.WriteLine(string.Format("Decrementing task counter with threadId: {0}", Thread.CurrentThread.ManagedThreadId));
    taskCounter--;
}
I see a few issues with your code that can lead to some hard-to-track-down bugs. First, if all of the tasks can read and write the counter at the same time, use Interlocked. For example:
Interlocked.Increment(ref _taskCounter); // or Interlocked.Decrement(ref _taskCounter);
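A quick sketch of why this matters, written as a .NET 6+ top-level program with arbitrary sizes. Eight tasks bump a shared counter 100,000 times each. With Interlocked.Increment the total is always exact. With a plain taskCounter++ the total can come out lower, because two threads can read the same old value and both write back old + 1.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

int taskCounter = 0;

// Eight tasks each increment the shared counter 100,000 times.
Task[] workers = Enumerable.Range(0, 8).Select(_ => Task.Run(() =>
{
    for (int i = 0; i < 100_000; i++)
        Interlocked.Increment(ref taskCounter);   // atomic read-modify-write
})).ToArray();

Task.WaitAll(workers);
Console.WriteLine(taskCounter); // 800000
```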
If I understand what you're trying to accomplish, I think what you want to do is to use a timer that you re-schedule after each group of tasks is finished.
public class Worker
{
    private System.Threading.Timer _timer;
    private int _timeUntilNextCall = 3600000;
    public void Start()
    {
        _timer = new Timer(new TimerCallback(QueuePeek), null, 0, Timeout.Infinite);
    }
    private void QueuePeek(object state)
    {
        int numberOfTasks = 5;
        Task[] tasks = new Task[numberOfTasks];
        for (int i = 0; i < numberOfTasks; i++)
        {
            tasks[i] = new Task(() =>
            {
                DoLoad();
            });
            tasks[i].Start();
        }
        // When all tasks are complete, set to run this method again in x milliseconds
        Task.Factory.ContinueWhenAll(tasks, (t) => { _timer.Change(_timeUntilNextCall, Timeout.Infinite); });
    }
    private void DoLoad() { }
}
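Here is a hedged alternative sketch of the same "run a batch, wait for all of it, then schedule the next one" idea, using an async loop instead of a timer. RunBatchesAsync and DoLoad are hypothetical stand-ins. The 10 ms interval and the 3-round limit exist only so the demo finishes. A service would instead pass TimeSpan.FromHours(1) and stop via the CancellationToken when the service stops.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

int loads = 0;

// Hypothetical stand-in for DoLoad(i) in the question.
void DoLoad(int i)
{
    Thread.Sleep(20);                      // simulate work
    Interlocked.Increment(ref loads);
}

// Runs 'rounds' batches; each batch must finish before the delay starts.
async Task RunBatchesAsync(int batchSize, TimeSpan interval, int rounds, CancellationToken token)
{
    for (int round = 0; round < rounds; round++)
    {
        Task[] batch = Enumerable.Range(0, batchSize)
            .Select(i => Task.Run(() => DoLoad(i), token))
            .ToArray();
        await Task.WhenAll(batch);          // wait for the whole batch
        await Task.Delay(interval, token);  // e.g. TimeSpan.FromHours(1) in a service
    }
}

using var cts = new CancellationTokenSource();
await RunBatchesAsync(5, TimeSpan.FromMilliseconds(10), 3, cts.Token);
Console.WriteLine($"DoLoad ran {loads} times"); // DoLoad ran 15 times
```

Each round awaits Task.WhenAll before the delay, so rounds can never overlap. That is the same guarantee the Timeout.Infinite/Change pattern above provides, without a shared counter.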

TPL Dataflow Blocks

Question: why does this code deadlock? It posts an Action to a BufferBlock<Action>, and that Action sends the answer back through a WriteOnceBlock (or BufferBlock), as a sort of callback.
I thought that methods in a class can be considered as messages that we send to the object (like the original view of OOP proposed, I think, by Alan Kay). So I wrote this generic Actor class that helps convert an ordinary object into an Actor. Of course, there are lots of unseen loopholes here because of mutability and such, but that's not the main concern here.
So we have these definitions:
public class Actor<T>
{
    private readonly T _processor;
    private readonly BufferBlock<Action<T>> _messageBox = new BufferBlock<Action<T>>();
    public Actor(T processor)
    {
        _processor = processor;
        Run();
    }
    public event Action<T> Send
    {
        add { _messageBox.Post(value); }
        remove { }
    }
    private async void Run()
    {
        while (true)
        {
            var action = await _messageBox.ReceiveAsync();
            action(_processor);
        }
    }
}
public interface IIdGenerator
{
    long Next();
}
Now, why does this code work:
static void Main(string[] args)
{
    var idGenerator1 = new IdInt64();
    var idServer1 = new Actor<IIdGenerator>(idGenerator1);
    const int n = 1000;
    for (var i = 0; i < n; i++)
    {
        var t = new Task(() =>
        {
            var answer = new WriteOnceBlock<long>(null);
            Action<IIdGenerator> action = x =>
            {
                var buffer = x.Next();
                answer.Post(buffer);
            };
            idServer1.Send += action;
            Trace.WriteLine(answer.Receive());
        }, TaskCreationOptions.LongRunning); // Runs on a separate new thread
        t.Start();
    }
    Console.WriteLine("press any key you like! :)");
    Console.ReadKey();
    Trace.Flush();
}
And this code does not work:
static void Main(string[] args)
{
    var idGenerator1 = new IdInt64();
    var idServer1 = new Actor<IIdGenerator>(idGenerator1);
    const int n = 1000;
    for (var i = 0; i < n; i++)
    {
        var t = new Task(() =>
        {
            var answer = new WriteOnceBlock<long>(null);
            Action<IIdGenerator> action = x =>
            {
                var buffer = x.Next();
                answer.Post(buffer);
            };
            idServer1.Send += action;
            Trace.WriteLine(answer.Receive());
        }, TaskCreationOptions.PreferFairness); // Runs and is managed by Task Scheduler
        t.Start();
    }
    Console.WriteLine("press any key you like! :)");
    Console.ReadKey();
    Trace.Flush();
}
The only difference is the TaskCreationOptions used to create the tasks. Maybe I'm misunderstanding TPL Dataflow concepts here, since I just started using it (is there a [ThreadStatic] hidden somewhere?).
The problem with your code is this part: answer.Receive().
When you move it inside the action the deadlock doesn't happen:
var t = new Task(() =>
{
    var answer = new WriteOnceBlock<long>(null);
    Action<IIdGenerator> action = x =>
    {
        var buffer = x.Next();
        answer.Post(buffer);
        Trace.WriteLine(answer.Receive());
    };
    idServer1.Send += action;
});
t.Start();
So why is that? answer.Receive(), as opposed to await answer.ReceiveAsync(), blocks the thread until an answer is returned. When you use TaskCreationOptions.LongRunning, each task gets its own thread, so there's no problem. Without it (TaskCreationOptions.PreferFairness is irrelevant here), the tasks run on thread pool threads that all sit blocked in Receive(). Meanwhile, the actor's message loop and the dataflow blocks also need thread pool threads to deliver the messages and post the answers. Progress then depends on the pool slowly adding threads, so everything is much slower. It doesn't actually deadlock, as you can see when you use 15 instead of 1000.
There are other solutions that help you understand the problem:
Increasing the thread pool's minimum size with ThreadPool.SetMinThreads(1000, 0); before the original code.
Using ReceiveAsync:
Task.Run(async () =>
{
    var answer = new WriteOnceBlock<long>(null);
    Action<IIdGenerator> action = x =>
    {
        var buffer = x.Next();
        answer.Post(buffer);
    };
    idServer1.Send += action;
    Trace.WriteLine(await answer.ReceiveAsync());
});
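The same request/reply shape can be sketched without TPL Dataflow, which is a separate NuGet package. This version swaps in the in-box System.Threading.Channels (.NET Core 3.0+) for the BufferBlock mailbox and TaskCompletionSource<long> for the one-shot WriteOnceBlock reply. It is an illustration of the "await the reply instead of blocking" fix, not the Actor class from the question. The ID counter and the 1000 requests are arbitrary.

```csharp
using System;
using System.Linq;
using System.Threading.Channels;
using System.Threading.Tasks;

// Mailbox of messages; each message carries its own one-shot reply slot.
var mailbox = Channel.CreateUnbounded<TaskCompletionSource<long>>();
long nextId = 0;   // state owned by the single actor loop; no lock needed

// The actor: processes one message at a time, in order.
var actor = Task.Run(async () =>
{
    await foreach (var reply in mailbox.Reader.ReadAllAsync())
        reply.SetResult(++nextId);
});

async Task<long> RequestIdAsync()
{
    // RunContinuationsAsynchronously keeps the requester's continuation off the actor's loop.
    var reply = new TaskCompletionSource<long>(TaskCreationOptions.RunContinuationsAsynchronously);
    await mailbox.Writer.WriteAsync(reply);
    return await reply.Task;   // awaits instead of blocking a pool thread
}

long[] ids = await Task.WhenAll(Enumerable.Range(0, 1000).Select(_ => Task.Run(() => RequestIdAsync())));
mailbox.Writer.Complete();
await actor;
Console.WriteLine($"{ids.Length} ids, {ids.Distinct().Count()} distinct"); // 1000 ids, 1000 distinct
```

No request blocks a thread while it waits for its reply, so all 1000 requests finish quickly.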
