Unit tests testing thread safety - Object not available randomly - c#

We have some legacy code that tests thread safety on a number of classes. A recent hardware upgrade (from 2 to 4 cores) is causing random failures, with an exception when accessing an item from List<>.
[Test]
public void CheckThreadSafeInThreadPool()
{
Console.WriteLine("Initialised ThreadLocalDataContextStore...");
var container = new ContextContainerTest();
Console.WriteLine("Starting...");
container.StartPool();
while (container.ThreadNumber < 5)
{
Thread.Sleep(1000);
}
foreach (var message in container.Messages)
{
Console.WriteLine(message);
if (message.Contains("A supposedly new thread is able to see the old value"))
{
Assert.Fail("Thread leaked values - not thread safe");
}
}
Console.WriteLine("Complete");
}
public class ContextContainerTest
{
private ThreadLocalDataContextStore store;
public int ThreadNumber;
public List<string> Messages;
public void StartPool()
{
Messages = new List<string>();
store = new ThreadLocalDataContextStore();
store.ClearContext();
var msoContext = new MsoContext();
msoContext.Principal = new GenericPrincipal(new GenericIdentity("0"), null);
store.StoreContext(msoContext);
for (var counter = 0; counter < 5; counter++)
{
Messages.Add(string.Format("Assigning work item {0}", counter));
ThreadPool.QueueUserWorkItem(ExecuteMe, counter);
}
}
public void ExecuteMe(object input)
{
string hashCode = Thread.CurrentThread.GetHashCode().ToString();
if (store.GetContext() == null || store.GetContext().Principal == null)
{
Messages.Add(string.Format("[{0}] A New Thread", hashCode));
var msoContext = new MsoContext();
msoContext.Principal = new GenericPrincipal(new GenericIdentity("2"), null);
store.StoreContext(msoContext);
}
else if (store.GetContext().Principal.Identity.Name == "1")
{
Messages.Add(string.Format("[{0}] Thread reused", hashCode));
}
else
{
Messages.Add(string.Format("[{0}] A supposedly new thread is able to see the old value {1}"
, hashCode, store.GetContext().GetDiagnosticInformation()));
}
Messages.Add(string.Format("[{0}] Context at starting: {1}", hashCode, store.GetContext().GetDiagnosticInformation()));
store.GetContext().SetAsCurrent(new GenericPrincipal(new GenericIdentity("99"), null));
Messages.Add(string.Format("[{0}] Context at End: {1}", hashCode, store.GetContext().GetDiagnosticInformation()));
store.GetContext().SetAsCurrent(new GenericPrincipal(new GenericIdentity("1"), null));
Thread.Sleep(80);
ThreadNumber++;
}
}
The failure is random, and occurs in the following section of code within the test itself:
foreach (var message in container.Messages)
{
Console.WriteLine(message);
if (message.Contains("A supposedly new thread is able to see the old value"))
{
Assert.Fail("Thread leaked values - not thread safe");
}
}
A subtle change resolves the issue, but a colleague objects that we should not need it. Why is message null when Messages itself is not, and why does the test pass most of the time but not always?
if (message != null && message.Contains("A supposedly new thread is able to see the old value"))
{
}
Another solution was to make the List thread safe, but that doesn't answer why the issue arose in the first place.

List<T> is not thread safe. If you are using .NET 4 or above, you can use ConcurrentBag<T> from System.Collections.Concurrent; on older versions you have to implement a thread-safe collection yourself.
Hope I was helpful.
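A likely explanation for the null entries, assuming the usual array-backed List<T> implementation: Add stores the item in an internal array slot and then increases the size, and neither step is atomic. Two threads racing on Add can write to the same slot while the size still grows by two, which leaves one slot at its default value, null. More cores make that overlap more likely, which would fit the failures appearing after the upgrade. The sketch below is one possible way to collect messages safely; the class name SafeMessagesDemo is made up for illustration, and it swaps the polling on ThreadNumber (whose ++ is also not atomic) for a CountdownEvent.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

public static class SafeMessagesDemo
{
    // Queues workItems callbacks on the thread pool and returns every message they produced.
    public static string[] Run(int workItems)
    {
        // ConcurrentQueue<T> tolerates concurrent Enqueue calls, unlike List<T>.Add.
        var messages = new ConcurrentQueue<string>();

        using (var done = new CountdownEvent(workItems))
        {
            for (var i = 0; i < workItems; i++)
            {
                ThreadPool.QueueUserWorkItem(state =>
                {
                    messages.Enqueue(string.Format("[{0}] work item {1}",
                        Thread.CurrentThread.ManagedThreadId, state));
                    done.Signal(); // replaces the non-atomic ThreadNumber++
                }, i);
            }

            done.Wait(); // blocks until every work item has signalled
        }

        return messages.ToArray();
    }
}
```

With this shape the test can enumerate the returned array without a null check, because no entry is ever skipped or overwritten.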

Related

How to avoid collection modification during JSON serialization in looped multithreaded task?

I have a problem during serialization to JSON file, when using Newtonsoft.Json.
In a loop I am firing tasks in various threads:
List<Task> jockeysTasks = new List<Task>();
for (int i = 1; i < 1100; i++)
{
int j = i;
Task task = Task.Run(async () =>
{
LoadedJockey jockey = new LoadedJockey();
jockey = await Task.Run(() => _scrapServices.ScrapSingleJockeyPL(j));
if (jockey.Name != null)
{
_allJockeys.Add(jockey);
}
UpdateStatusBar = j * 100 / 1100;
if (j % 100 == 0)
{
await Task.Run(() => _dataServices.SaveAllJockeys(_allJockeys)); //saves everything to JSON file
}
});
jockeysTasks.Add(task);
}
await Task.WhenAll(jockeysTasks);
When j % 100 == 0, it tries to save the collection _allJockeys to the file (I will add a counter to make this more reliable, but that is not the point):
public void SaveAllJockeys(List<LoadedJockey> allJockeys)
{
if (allJockeys.Count != 0)
{
if (File.Exists(_jockeysFileName)) File.Delete(_jockeysFileName);
try
{
using (StreamWriter file = File.CreateText(_jockeysFileName))
{
JsonSerializer serializer = new JsonSerializer();
serializer.Serialize(file, allJockeys);
}
}
catch (Exception e)
{
dialog.ShowDialog("Could not save the results, " + e.ToString(), "Error");
}
}
}
During that time, I believe, other tasks are adding new items to the collection, which throws this exception:
Collection was modified; enumeration operation may not execute.
As I read in THE ARTICLE, you can change the type of iteration to avoid the exception. As far as I know, I cannot modify how the Newtonsoft.Json package iterates the collection.
Thank you in advance for any tips on how to avoid the exception and save the collection without unexpected changes.
You should probably inherit from List and use a ReaderWriterLock (https://learn.microsoft.com/en-us/dotnet/api/system.threading.readerwriterlock?view=netframework-4.8)
i.e. (not tested pseudo C#)
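public class MyJockeys : List<LoadedJockey>
{
    private readonly System.Threading.ReaderWriterLock _rw_lock = new System.Threading.ReaderWriterLock();

    // Hides List<T>.Add; only calls made through a MyJockeys reference take the lock.
    public new void Add(LoadedJockey j)
    {
        // Acquire outside the try block: if the timeout expires the lock is not held,
        // and the Release call in finally would throw.
        _rw_lock.AcquireWriterLock(5000); // or whatever you deem an acceptable timeout
        try
        {
            base.Add(j);
        }
        finally
        {
            _rw_lock.ReleaseWriterLock();
        }
    }

    public string ToJSON()
    {
        _rw_lock.AcquireReaderLock(5000); // or whatever you deem an acceptable timeout
        try
        {
            // Serialize while holding the reader lock so no Add can run mid-enumeration.
            return Newtonsoft.Json.JsonConvert.SerializeObject(this);
        }
        finally
        {
            _rw_lock.ReleaseReaderLock();
        }
    }
    // And hide Remove and anything else you need in the same way
}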
Get the idea?
Hope this helps.
Regards,
Adam.
I tried calling ToList() on the collection, which creates a copy of the list, with positive effect.
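One caveat with the copy approach: ToList() itself enumerates the source, so a copy taken while another task is inside Add can still fail or miss an item. A minimal sketch of a safer variant follows, assuming a small wrapper (the name SnapshotList is hypothetical) that guards both Add and the copy with the same lock.

```csharp
using System.Collections.Generic;

public sealed class SnapshotList<T>
{
    private readonly object _gate = new object();
    private readonly List<T> _items = new List<T>();

    // Adds one item; safe to call from many tasks at once.
    public void Add(T item)
    {
        lock (_gate)
        {
            _items.Add(item);
        }
    }

    // Copies the current contents while holding the lock,
    // so the copy can never observe a half-finished Add.
    public List<T> Snapshot()
    {
        lock (_gate)
        {
            return new List<T>(_items);
        }
    }
}
```

If _allJockeys were declared as SnapshotList<LoadedJockey>, the save call could become _dataServices.SaveAllJockeys(_allJockeys.Snapshot()). The serializer then enumerates a private copy, and the lock is held only for the short copy rather than for the whole file write.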

How to cache slow resource initialisation from C# Web API REST Server?

Context
I am trying to implement a REST API web service that "wraps" an existing C program.
Problem / Goal
Given that the C program has slow initialisation time and high RAM usage when I tell it to open a specific folder (assume this cannot be improved), I am thinking of caching the C handle/object, so the next time a GET request hits the same folder, I can use the existing handle.
What I've tried
First declare a static dictionary mapping from folder path to handle:
static ConcurrentDictionary<string, IHandle> handles = new ConcurrentDictionary<string, IHandle>();
In my GET function:
IHandle theHandle = handles.GetOrAdd(dir.Name, x => {
return new Handle(x); //this is the slow and memory-intensive function
});
This way, whenever a specific folder has been GET'd before, it will already have a handle ready for me to use.
Why it's not good
So now I run the risk of running out of memory if too many folders are cached simultaneously. How might I add a GC-like background process to TryRemove() and call IHandle.Dispose() on old handles, perhaps in a Least Recently Used or Least Frequently Used policy? Ideally it should start triggering only upon low physical memory available.
I have tried adding the following statement in the GET function, but it seems too hacky and is very limited in function. This way works OK only if I always want handles to expire after 10 seconds, and it does not restart the timer if a subsequent request comes in within 10 seconds.
HostingEnvironment.QueueBackgroundWorkItem(ct =>
{
System.Threading.Thread.Sleep(10000);
if (handles.TryRemove(dir.Name, out var handle2))
handle2.Dispose();
});
What this question is not
I don't think caching the output is the solution here. After I return the result of this GET request (it's just the metadata of the folder contents), there might be another GET request for more in-depth data, which requires calling Handle's methods.
I hope my question is clear enough!
Handles closing on low memory.
ConcurrentQueue<(string, IHandle)> handles = new ConcurrentQueue<(string, IHandle)>();
void CheckMemory_OptionallyReleaseOldHandles()
{
var performance = new System.Diagnostics.PerformanceCounter("Memory", "Available MBytes");
while (performance.NextValue() <= YOUR_TRESHHOLD)
{
if (!handles.TryDequeue(out ValueTuple<string, IHandle> value))
{
break; // nothing left to release; avoid spinning while memory stays low
}
value.Item2.Dispose();
}
}
Your Get method.
IHandle GetHandle()
{
IHandle theHandle = handles.FirstOrDefault(v => v.Item1 == dir.Name).Item2;
if (theHandle == null)
{
theHandle = new Handle(dir.Name);
handles.Enqueue((dir.Name, theHandle));
}
return theHandle;
}
Your background task.
void SetupMemoryCheck()
{
Action<CancellationToken> BeCheckingTheMemory = ct =>
{
for(;;)
{
if (ct.IsCancellationRequested)
{
break;
}
CheckMemory_OptionallyReleaseOldHandles();
Thread.Sleep(500);
};
};
HostingEnvironment.QueueBackgroundWorkItem(ct =>
{
var tf = new TaskFactory(ct, TaskCreationOptions.LongRunning, TaskContinuationOptions.None, TaskScheduler.Current);
tf.StartNew(() => BeCheckingTheMemory(ct));
});
}
I assume the collection will hold only a few elements, so there is no need for a dictionary.
I didn't catch your LRU/LFU requirement the first time. Here is a hybrid LRU/LFU cache model you can look at.
Handles closing on low memory.
/*
* string – handle name,
* IHandle – the handle,
* int – hit count,
*/
ConcurrentDictionary<string, (IHandle, int)> handles = new ConcurrentDictionary<string, (IHandle, int)>();
void FreeResources()
{
if (handles.Count == 0)
{
return;
}
var performance = new System.Diagnostics.PerformanceCounter("Memory", "Available MBytes");
while (handles.Count > 0 && performance.NextValue() <= YOUR_TRESHHOLD) // First() below throws on an empty dictionary
{
int maxIndex = (int)Math.Ceiling(handles.Count / 2.0d);
KeyValuePair<string, (IHandle, int)> candidate = handles.First();
for (int index = 1; index < maxIndex; index++)
{
KeyValuePair<string, (IHandle, int)> item = handles.ElementAt(index);
if(item.Value.Item2 < candidate.Value.Item2)
{
candidate = item;
}
}
candidate.Value.Item1.Dispose();
handles.TryRemove(candidate.Key, out _);
}
}
Get method.
IHandle GetHandle(Dir dir, int handleOpenAttemps = 1)
{
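if(handles.TryGetValue(dir.Name, out (IHandle, int) handle))
{
handle.Item2++;
// The tuple is a value type, so store the updated copy back into the dictionary.
handles[dir.Name] = handle;
}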
else
{
if(new System.Diagnostics.PerformanceCounter("Memory", "Available MBytes").NextValue() < YOUR_TRESHHOLD)
{
FreeResources();
}
try
{
handle.Item1 = new Handle(dir.Name);
}
catch (OutOfMemoryException)
{
if (handleOpenAttemps == 2)
{
return null;
}
FreeResources();
return GetHandle(dir, handleOpenAttemps + 1); // post-increment would pass the old value again
}
catch (Exception)
{
// Your handling.
}
handle.Item2 = 1;
handles.TryAdd(dir.Name, handle);
}
return handle.Item1;
}
Background task.
void SetupMemoryCheck()
{
Action<CancellationToken> BeCheckingTheMemory = ct =>
{
for (;;)
{
if (ct.IsCancellationRequested) break;
FreeResources();
Thread.Sleep(500);
}
};
HostingEnvironment.QueueBackgroundWorkItem(ct =>
{
new Task(() => BeCheckingTheMemory(ct), TaskCreationOptions.LongRunning).Start();
});
}
If you expect a big collection, the for loop could be optimised.
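For comparison, here is a minimal sketch of a plain LRU policy. It is not the memory-triggered scheme the question asks for: it swaps in a fixed capacity as the eviction trigger, because that is easy to reason about. The class name LruHandleCache and the capacity parameter are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;

public sealed class LruHandleCache<TKey, TValue> where TValue : IDisposable
{
    private readonly int _capacity;
    private readonly Func<TKey, TValue> _factory;
    private readonly object _gate = new object();
    private readonly Dictionary<TKey, LinkedListNode<KeyValuePair<TKey, TValue>>> _map =
        new Dictionary<TKey, LinkedListNode<KeyValuePair<TKey, TValue>>>();
    // Most recently used entry first, least recently used entry last.
    private readonly LinkedList<KeyValuePair<TKey, TValue>> _order =
        new LinkedList<KeyValuePair<TKey, TValue>>();

    public LruHandleCache(int capacity, Func<TKey, TValue> factory)
    {
        if (capacity < 1) throw new ArgumentOutOfRangeException(nameof(capacity));
        _capacity = capacity;
        _factory = factory ?? throw new ArgumentNullException(nameof(factory));
    }

    public int Count
    {
        get { lock (_gate) { return _map.Count; } }
    }

    public TValue GetOrAdd(TKey key)
    {
        lock (_gate)
        {
            if (_map.TryGetValue(key, out var node))
            {
                // Cache hit: move the entry to the front to mark it as recently used.
                _order.Remove(node);
                _order.AddFirst(node);
                return node.Value.Value;
            }

            if (_map.Count >= _capacity)
            {
                // Cache full: drop and dispose the least recently used entry.
                var oldest = _order.Last;
                _order.RemoveLast();
                _map.Remove(oldest.Value.Key);
                oldest.Value.Value.Dispose();
            }

            var created = _factory(key); // runs under the lock in this sketch
            _map[key] = _order.AddFirst(new KeyValuePair<TKey, TValue>(key, created));
            return created;
        }
    }
}
```

Two design points to weigh before using something like this: the factory runs while the lock is held, so slow initialisation of one folder blocks requests for all others, and an evicted handle is disposed even if another request is still using it. A real implementation would probably need reference counting or a lease on each handle.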

How to create working thread that start on signal and also rest

I want to create a working thread that starts on a signal (a task was added to a shared task list) and rests when done.
Requirements:
Other system threads can add a task at any time
The working thread should rest if it has nothing to do
If more tasks are added while the working thread is active, it should complete them too.
The working thread can rest for hours -> work arrives like raindrops (sometimes we have a storm)
After a thread adds a new task to the workingList (shared list), it signals (AutoResetEvent.Set()) the working thread to start working.
I have a race condition between the Set() and the WaitOne() functions.
public static void AddWork(object obj)
{
Monitor.Enter(_syncO);
_workingList.Add(obj);
_signal.Set();
Monitor.Exit(_syncO);
}
static object _syncO = new object();
static AutoResetEvent _signal = new AutoResetEvent(false);
static List<object> _workingList = new List<object>();
static void DoWork()
{
Thread tradeThread1 = new Thread(() =>
{
Thread.CurrentThread.IsBackground = true;
string tradeMessage = string.Empty;
while (true)
{
Monitor.Enter(_syncO);
var arr = _workingList.ToArray();
Monitor.Exit(_syncO);
// race condition when the set happens just before the
// thread was locked
if (arr.Count() == 0)
_signal.WaitOne();
Monitor.Enter(_syncO);
arr = _workingList.ToArray();
Monitor.Exit(_syncO);
int count = 0;
var deleteList = new List<object>();
while (true)
{
foreach (var item in arr)
{
// the value is changing every iteration.
// this is why I need the Sleep at the end
bool b = Handle4(item);
if (b)
deleteList.Add(item);
}
if (count == 100)
break;
Thread.Sleep(100);
count++;
}
// remove done tasks from _workingList
RemoveItems(deleteList);
// we can't close, so we re-set footprints(notifications) on our price cable. -> execute on broker tick(the right tick)
// reHang trade on cable
foreach (var item in _workingList)
{
// re-use the undeleted tasks
}
}
});
tradeThread1.Start();
}
Based on the help of @John Wu I came up with the following solution:
The BlockingCollection acts as a gate for the working thread. Each iteration copies the new task into a local list.
private static void AddWork(object tap)
{
queue.Add(tap);
}
private static BlockingCollection<object> queue = new BlockingCollection<object>();
static void Work()
{
Thread tradeThread1 = new Thread(() =>
{
while (true)
{
var workingList = new List<object>();
var deleteList = new List<object>();
var reEvaluateList = new List<object>();
while (true)
{
if (workingList.Count() == 0)
{
// thread will wait until new work arrives -> it will start working again on the first task to come in.
workingList.Add(queue.Take());
}
foreach (var item in workingList)
{
bool b = Handle4(item);
if (b)
deleteList.Add(item);
else
item.ExitCounter++;
if (item.ExitCounter == 1000)
reEvaluateList.Add(item);
}
RemoveItems(deleteList, workingList);
// we can't close, so we re-set
// we reevaluate tasks that are working for X amount of time and didn't finish
foreach (var item in reEvaluateList)
ReEvaluate(item);
RemoveItems(reEvaluateList, workingList);
// wait.. the item change-over-time, so a wait is a type of calculation.
Thread.Sleep(100);
// we want to avoid locking if we still have task to process
if (queue.Count() == 0)
continue;
// add new work to local list
workingList.Add(queue.Take());
}
}
});
tradeThread1.Start();
}
It feels a little bit messy. Any ideas on how to make it better?
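One way to trim the waiting logic is to let BlockingCollection<T>.GetConsumingEnumerable() do the blocking. The sketch below covers only the "rest until work arrives" part; it does not reproduce the retry and re-evaluate passes from the code above. The class name SignalledWorker and the int payload are placeholders for illustration.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;

public sealed class SignalledWorker : IDisposable
{
    private readonly BlockingCollection<int> _queue = new BlockingCollection<int>();
    private readonly List<int> _processed = new List<int>(); // touched only by the worker thread
    private readonly Thread _thread;

    public SignalledWorker()
    {
        _thread = new Thread(Run) { IsBackground = true };
        _thread.Start();
    }

    // Any thread may call this at any time; no separate signal is needed.
    public void AddWork(int item)
    {
        _queue.Add(item);
    }

    private void Run()
    {
        // Blocks while the queue is empty and wakes as soon as an item arrives.
        // The loop ends once CompleteAdding has been called and the queue is drained.
        foreach (var item in _queue.GetConsumingEnumerable())
        {
            _processed.Add(item); // stand-in for the real handler
        }
    }

    // Stops accepting work, waits for the queue to drain, and returns what was handled.
    public IReadOnlyList<int> Finish()
    {
        _queue.CompleteAdding();
        _thread.Join();
        return _processed;
    }

    public void Dispose()
    {
        Finish();
        _queue.Dispose();
    }
}
```

Because the collection both stores the work and wakes the consumer, there is no window between "check the list" and "wait for the signal", which is where the original AutoResetEvent version raced.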

Using threads to parse multiple Html pages faster

Here's what I'm trying to do:
Get one html page from url which contains multiple links inside
Visit each link
Extract some data from visited link and create object using it
So far all I have is the simple and slow way:
public List<Link> searchLinks(string name)
{
List<Link> foundLinks = new List<Link>();
// getHtmlDocument() just returns HtmlDocument using input url.
HtmlDocument doc = getHtmlDocument(AU_SEARCH_URL + fixSpaces(name));
var link_list = doc.DocumentNode.SelectNodes(@"/html/body/div[@id='parent-container']/div[@id='main-content']/ol[@id='searchresult']/li/h2/a");
foreach (var link in link_list)
{
// TODO Threads
// getObject() creates object using data gathered
foundLinks.Add(getObject(link.InnerText, link.Attributes["href"].Value, getLatestEpisode(link.Attributes["href"].Value)));
}
return foundLinks;
}
To make it faster I need to use threads, but I'm not sure how to approach it. I can't just start threads at random; I need to wait for them to finish. thread.Join() solves the waiting problem, but I think it stops being fast, because each thread would only be launched after the earlier one has finished.
The simplest way to offload the work to multiple threads would be to use Parallel.ForEach() in place of your current loop. Something like this:
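Parallel.ForEach(link_list, link =>
{
    // Do the slow work outside the lock; only the List<T>.Add call needs guarding.
    var item = getObject(link.InnerText, link.Attributes["href"].Value, getLatestEpisode(link.Attributes["href"].Value));
    lock (foundLinks)
    {
        foundLinks.Add(item);
    }
});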
I'm not sure if there are other threading concerns in your overall code. (Note, for example, that this would no longer guarantee that the data would be added to foundLinks in the same order.) But as long as there's nothing explicitly preventing concurrent work from taking place then this would take advantage of threading over multiple CPU cores to process the work.
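Since the per-link work is mostly waiting on the network, an asynchronous variant is another option. This is a sketch under assumptions: FetchAsync is a hypothetical stand-in for the question's getObject/getLatestEpisode pair, and Task.Delay simulates the download. Task.WhenAll returns results in the same order as the input tasks, so ordering is preserved and no shared list is mutated.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class ParallelFetchDemo
{
    // Hypothetical stand-in for the slow per-link work.
    private static async Task<string> FetchAsync(string href)
    {
        await Task.Delay(50); // simulates network latency
        return "parsed:" + href;
    }

    public static async Task<List<string>> FetchAllAsync(IEnumerable<string> hrefs)
    {
        // ToArray forces every task to start now, so the downloads overlap.
        Task<string>[] tasks = hrefs.Select(h => FetchAsync(h)).ToArray();

        // Completes when all tasks are done; results keep the input order.
        string[] results = await Task.WhenAll(tasks);
        return results.ToList();
    }
}
```

A caller would write something like var links = await ParallelFetchDemo.FetchAllAsync(hrefs); the wait for completion that thread.Join() was meant to provide is handled by the single await.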
Maybe you should use the thread pool.
Example from MSDN:
using System;
using System.Threading;
public class Fibonacci
{
private int _n;
private int _fibOfN;
private ManualResetEvent _doneEvent;
public int N { get { return _n; } }
public int FibOfN { get { return _fibOfN; } }
// Constructor.
public Fibonacci(int n, ManualResetEvent doneEvent)
{
_n = n;
_doneEvent = doneEvent;
}
// Wrapper method for use with thread pool.
public void ThreadPoolCallback(Object threadContext)
{
int threadIndex = (int)threadContext;
Console.WriteLine("thread {0} started...", threadIndex);
_fibOfN = Calculate(_n);
Console.WriteLine("thread {0} result calculated...", threadIndex);
_doneEvent.Set();
}
// Recursive method that calculates the Nth Fibonacci number.
public int Calculate(int n)
{
if (n <= 1)
{
return n;
}
return Calculate(n - 1) + Calculate(n - 2);
}
}
public class ThreadPoolExample
{
static void Main()
{
const int FibonacciCalculations = 10;
// One event is used for each Fibonacci object.
ManualResetEvent[] doneEvents = new ManualResetEvent[FibonacciCalculations];
Fibonacci[] fibArray = new Fibonacci[FibonacciCalculations];
Random r = new Random();
// Configure and start threads using ThreadPool.
Console.WriteLine("launching {0} tasks...", FibonacciCalculations);
for (int i = 0; i < FibonacciCalculations; i++)
{
doneEvents[i] = new ManualResetEvent(false);
Fibonacci f = new Fibonacci(r.Next(20, 40), doneEvents[i]);
fibArray[i] = f;
ThreadPool.QueueUserWorkItem(f.ThreadPoolCallback, i);
}
// Wait for all threads in pool to calculate.
WaitHandle.WaitAll(doneEvents);
Console.WriteLine("All calculations are complete.");
// Display the results.
for (int i= 0; i<FibonacciCalculations; i++)
{
Fibonacci f = fibArray[i];
Console.WriteLine("Fibonacci({0}) = {1}", f.N, f.FibOfN);
}
}
}

NullReferenceException when creating a thread

I was looking at this thread on creating a simple thread pool. There, I came across @MilanGardian's response for .NET 3.5, which was elegant and served my purpose:
using System;
using System.Collections.Generic;
using System.Threading;
namespace SimpleThreadPool
{
public sealed class Pool : IDisposable
{
public Pool(int size)
{
this._workers = new LinkedList<Thread>();
for (var i = 0; i < size; ++i)
{
var worker = new Thread(this.Worker) { Name = string.Concat("Worker ", i) };
worker.Start();
this._workers.AddLast(worker);
}
}
public void Dispose()
{
var waitForThreads = false;
lock (this._tasks)
{
if (!this._disposed)
{
GC.SuppressFinalize(this);
this._disallowAdd = true; // wait for all tasks to finish processing while not allowing any more new tasks
while (this._tasks.Count > 0)
{
Monitor.Wait(this._tasks);
}
this._disposed = true;
Monitor.PulseAll(this._tasks); // wake all workers (none of them will be active at this point; disposed flag will cause then to finish so that we can join them)
waitForThreads = true;
}
}
if (waitForThreads)
{
foreach (var worker in this._workers)
{
worker.Join();
}
}
}
public void QueueTask(Action task)
{
lock (this._tasks)
{
if (this._disallowAdd) { throw new InvalidOperationException("This Pool instance is in the process of being disposed, can't add anymore"); }
if (this._disposed) { throw new ObjectDisposedException("This Pool instance has already been disposed"); }
this._tasks.AddLast(task);
Monitor.PulseAll(this._tasks); // pulse because tasks count changed
}
}
private void Worker()
{
Action task = null;
while (true) // loop until threadpool is disposed
{
lock (this._tasks) // finding a task needs to be atomic
{
while (true) // wait for our turn in _workers queue and an available task
{
if (this._disposed)
{
return;
}
if (null != this._workers.First && object.ReferenceEquals(Thread.CurrentThread, this._workers.First.Value) && this._tasks.Count > 0) // we can only claim a task if its our turn (this worker thread is the first entry in _worker queue) and there is a task available
{
task = this._tasks.First.Value;
this._tasks.RemoveFirst();
this._workers.RemoveFirst();
Monitor.PulseAll(this._tasks); // pulse because current (First) worker changed (so that next available sleeping worker will pick up its task)
break; // we found a task to process, break out from the above 'while (true)' loop
}
Monitor.Wait(this._tasks); // go to sleep, either not our turn or no task to process
}
}
task(); // process the found task
this._workers.AddLast(Thread.CurrentThread);
task = null;
}
}
private readonly LinkedList<Thread> _workers; // queue of worker threads ready to process actions
private readonly LinkedList<Action> _tasks = new LinkedList<Action>(); // actions to be processed by worker threads
private bool _disallowAdd; // set to true when disposing queue but there are still tasks pending
private bool _disposed; // set to true when disposing queue and no more tasks are pending
}
public static class Program
{
static void Main()
{
using (var pool = new Pool(5))
{
var random = new Random();
Action<int> randomizer = (index =>
{
Console.WriteLine("{0}: Working on index {1}", Thread.CurrentThread.Name, index);
Thread.Sleep(random.Next(20, 400));
Console.WriteLine("{0}: Ending {1}", Thread.CurrentThread.Name, index);
});
for (var i = 0; i < 40; ++i)
{
var i1 = i;
pool.QueueTask(() => randomizer(i1));
}
}
}
}
}
I am using this as follows:
static void Main(string[] args)
{
...
...
while(keepRunning)
{
...
pool.QueueTask(() => DoTask(eventObject));
}
...
}
private static void DoTask(EventObject e)
{
// Do some computations
pool.QueueTask(() => DoAnotherTask(eventObject)); // this is a relatively smaller computation
}
I am getting the following exception after running the code for about two days:
Unhandled Exception: System.NullReferenceException: Object reference not set to an instance of an object.
at System.Collections.Generic.LinkedList`1.InternalInsertNodeBefore(LinkedListNode`1 node, LinkedListNode`1 newNode)
at System.Collections.Generic.LinkedList`1.AddLast(T value)
at MyProg.Pool.Worker()
at System.Threading.ThreadHelper.ThreadStart_Context(Object state)
at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)
at System.Threading.ThreadHelper.ThreadStart()
I am unable to figure out what is causing this as I am unable to get this error again. Any suggestions on how to fix this?
It seems that access to the _workers linked list is not properly synchronized. Consider this scenario:
Let's assume that at some point the this._workers list contains one item.
The first thread calls this._workers.AddLast(Thread.CurrentThread); but gets interrupted at a very specific place, inside the AddLast() method:
public void AddLast(LinkedListNode<T> node)
{
this.ValidateNewNode(node);
if (this.head == null)
{
this.InternalInsertNodeToEmptyList(node);
}
else
{
// here we got interrupted - the list was not empty,
// but it would be pretty soon, and this.head becomes null
// InternalInsertNodeBefore() does not expect that
this.InternalInsertNodeBefore(this.head, node);
}
node.list = (LinkedList<T>) this;
}
Another thread calls this._workers.RemoveFirst();. The AddLast() call is not inside a lock(), so nothing stops RemoveFirst() from completing in the meantime, and now the list is empty. AddLast() should now call InternalInsertNodeToEmptyList(node); but it can't, as the condition was already evaluated.
Putting a simple lock(this._tasks) around the single this._workers.AddLast() line should prevent this scenario.
Other bad scenarios include two threads adding an item to the same list at the same time.
I think I found the issue. The code sample is missing a lock():
private void Worker()
{
Action task = null;
while (true) // loop until threadpool is disposed
{
lock (this._tasks) // finding a task needs to be atomic
{
while (true) // wait for our turn in _workers queue and an available task
{
....
}
}
task(); // process the found task
this._workers.AddLast(Thread.CurrentThread);
task = null;
}
}
The lock should be extended to cover this._workers.AddLast(Thread.CurrentThread);, or a second lock wrapped around it.
If you look at the other code that modifies LinkedList (Pool.QueueTask), it is wrapped in a lock.
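In the pool above, the change would amount to writing lock (this._tasks) { this._workers.AddLast(Thread.CurrentThread); } in Worker(). The standalone sketch below illustrates the underlying rule rather than the pool itself: every mutation of a shared LinkedList<T> goes through the same lock object. The names LockedListDemo, gate and ready are invented for the example.

```csharp
using System.Collections.Generic;
using System.Threading;

public static class LockedListDemo
{
    // Each thread repeatedly re-queues and de-queues an entry, then leaves one entry behind.
    // Returns the final list length, which should equal the number of threads.
    public static int Run(int threads, int iterations)
    {
        var gate = new object();            // plays the role of this._tasks in the pool
        var ready = new LinkedList<int>();  // plays the role of this._workers
        var workers = new List<Thread>();

        for (var t = 0; t < threads; t++)
        {
            var id = t;
            var worker = new Thread(() =>
            {
                for (var i = 0; i < iterations; i++)
                {
                    // Both the add and the remove take the same lock,
                    // so neither can observe the list mid-update.
                    lock (gate) { ready.AddLast(id); }
                    lock (gate) { ready.RemoveFirst(); }
                }
                lock (gate) { ready.AddLast(id); }
            });
            workers.Add(worker);
            worker.Start();
        }

        foreach (var w in workers)
        {
            w.Join();
        }
        return ready.Count;
    }
}
```

Removing either lock statement reintroduces the kind of interleaving described above, although, as with the original two-day failure, it may take many runs before it shows up.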
