I came across this puzzle in a programming contest where I am not allowed to use any of the built-in concurrent .NET 4.0 data structures.
I have to override the ToString() method, and this method should support concurrent readers.
This is the solution I came up with, but I strongly believe it does not support concurrent readers. How can I support concurrent readers without locking the list?
class Puzzle
{
private List<string> listOfStrings = new List<string>();
public void Add(string item)
{
lock (listOfStrings)
{
if (item != null)
{
listOfStrings.Add(item);
}
}
}
public override string ToString()
{
lock (listOfStrings)
{
return string.Join(",", listOfStrings);
}
}
}
Because List<T> is just a T[] under the hood, there's nothing unsafe about reading the list from multiple threads. Your issue is that reading while writing, and writing concurrently, are unsafe. Because of this, you should use ReaderWriterLockSlim.
class Puzzle
{
private List<string> listOfStrings = new List<string>();
private ReaderWriterLockSlim listLock = new ReaderWriterLockSlim();
public void Add(string item)
{
listLock.EnterWriteLock();
try
{
listOfStrings.Add(item);
}
finally
{
listLock.ExitWriteLock();
}
}
public override string ToString()
{
listLock.EnterReadLock();
try
{
return string.Join(",", listOfStrings);
}
finally
{
listLock.ExitReadLock();
}
}
}
ReaderWriterLockSlim will allow multiple threads to enter the lock in read mode, but they will all block if/while something is in the lock in write mode. Likewise, entering the lock in write mode will block until all threads have exited the lock in read mode. The practical outworking is that multiple reads can happen at the same time as long as nothing is writing, and one write operation can happen at a time as long as nothing is reading.
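A runnable sketch of those semantics (the names and counts here are illustrative, not from the question): several tasks snapshot the shared list while several others append to it, and the lock keeps every snapshot consistent.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

class Demo
{
    static readonly ReaderWriterLockSlim rw = new ReaderWriterLockSlim();
    static readonly List<string> items = new List<string>();

    static void Write(string s)
    {
        rw.EnterWriteLock();           // exclusive: blocks all readers
        try { items.Add(s); }
        finally { rw.ExitWriteLock(); }
    }

    static string ReadAll()
    {
        rw.EnterReadLock();            // shared: many readers at once
        try { return string.Join(",", items); }
        finally { rw.ExitReadLock(); }
    }

    static void Main()
    {
        // 4 writers adding 250 items each, 4 readers snapshotting concurrently.
        var writers = Enumerable.Range(0, 4).Select(w =>
            Task.Run(() => { for (int i = 0; i < 250; i++) Write($"{w}:{i}"); }));
        var readers = Enumerable.Range(0, 4).Select(_ =>
            Task.Run(() => { for (int i = 0; i < 100; i++) ReadAll(); }));
        Task.WaitAll(writers.Concat(readers).ToArray());
        Console.WriteLine(items.Count); // 1000
    }
}
```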
Since it doesn't look like Remove is a requirement, why can't you just return a string?
class Puzzle
{
private string Value { get; set; }
public Puzzle()
{
Value = String.Empty;
}
public void Add(String item)
{
// Not atomic: only safe when Add is called from a single thread.
// Readers are fine because strings are immutable.
Value = Value.Length == 0 ? item : Value + "," + item;
}
public override string ToString()
{
return Value;
}
}
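Note that appending to the property is a read-modify-write, so the sketch above is only safe with a single writer. If concurrent writers are also needed, one alternative (my sketch, not from the original answer) is a retry loop over Interlocked.CompareExchange; readers stay lock-free because strings are immutable.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

class Puzzle
{
    private string value = string.Empty;

    public void Add(string item)
    {
        if (item == null) return;
        // Retry until our new immutable string is published without a
        // competing writer sneaking in between the read and the swap.
        while (true)
        {
            string current = value;
            string next = current.Length == 0 ? item : current + "," + item;
            if (Interlocked.CompareExchange(ref value, next, current) == current)
                return;
        }
    }

    // Readers never lock: they always observe some complete string.
    public override string ToString() => value;
}

class Demo
{
    static void Main()
    {
        var p = new Puzzle();
        Parallel.For(0, 100, i => p.Add(i.ToString()));
        Console.WriteLine(p.ToString().Split(',').Length); // 100
    }
}
```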
public static class People
{
public static List<string> names { get; set; }
}
public class Threading
{
public static async Task DoSomething()
{
var t1 = Task1("bob");
var t2 = Task1("erin");
await Task.WhenAll(t1, t2);
}
private static async Task Task1(string name)
{
await Task.Run(() =>
{
if(People.names == null) People.names = new List<string>();
People.names.Add(name);
});
}
}
Is it dangerous to initialize a list within a thread? Is it possible that both threads could initialize the list and lose one of the names?
So I was thinking of three options:
Leave it like this since it is simple - only if it is safe though
Use the same code but with a ConcurrentBag - I know it's thread-safe, but is the initialization safe?
Use [DataMember(EmitDefaultValue = new List())] and then just call .Add in Task1 without worrying about initializing. The only con is that sometimes the list won't be needed at all, and it seems like a waste to initialize it every time.
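For the ConcurrentBag option, the initialization question can be sidestepped with Lazy&lt;T&gt;, which guarantees the factory runs exactly once even if both threads hit the first access simultaneously. A sketch (the People shape mirrors the question; the property name Names is mine):

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public static class People
{
    // Lazy<T> guarantees the factory runs exactly once, even under
    // concurrent first access; ConcurrentBag makes Add thread-safe.
    private static readonly Lazy<ConcurrentBag<string>> names =
        new Lazy<ConcurrentBag<string>>(() => new ConcurrentBag<string>());

    public static ConcurrentBag<string> Names => names.Value;
}

public class Demo
{
    public static async Task Main()
    {
        var t1 = Task.Run(() => People.Names.Add("bob"));
        var t2 = Task.Run(() => People.Names.Add("erin"));
        await Task.WhenAll(t1, t2);
        Console.WriteLine(People.Names.Count); // 2
    }
}
```

This also avoids paying for the list when it is never used: the bag is only allocated on first access.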
Okay, so what I figured worked best for my case was a lock statement.
public class Class1
{
private static Object thisLock = new Object();
private static async Task Task1(string name)
{
await Task.Run(() =>
{
AddToList(name);
});
}
private static void AddToList(string name)
{
lock(thisLock)
{
if(People.names == null) People.names = new List<string>();
People.names.Add(name);
}
}
}
public static class People
{
public static List<string> names {get; set;}
}
For a simple case like this, the easiest way to get thread-safety is the lock statement:
public static class People
{
static List<string> _names = new List<string>();
public static void AddName(string name)
{
lock (_names)
{
_names.Add(name);
}
}
public static IEnumerable<string> GetNames()
{
lock(_names)
{
return _names.ToArray();
}
}
}
public class Threading
{
public static async Task DoSomething()
{
var t1 = Task1("bob");
var t2 = Task1("erin");
await Task.WhenAll(t1, t2);
}
private static async Task Task1(string name)
{
await Task.Run(() => People.AddName(name));
}
}
Of course it's not very useful (why not just add without the threads) - but I hope you get the idea.
If you don't use some kind of lock and concurrently read and write to a List, you will most likely get an InvalidOperationException saying the collection was changed during enumeration.
Because you don't really know when a user will use the collection, the easiest way to get thread-safety is to copy the collection into an array and return that.
If this is not practical (collection too large, ...) you have to use the classes in System.Collections.Concurrent, for example BlockingCollection, but those are a bit more involved.
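A minimal producer/consumer sketch with BlockingCollection (the counts are illustrative):

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

class Demo
{
    static void Main()
    {
        var queue = new BlockingCollection<int>(new ConcurrentQueue<int>());

        var producer = Task.Run(() =>
        {
            for (int i = 0; i < 100; i++) queue.Add(i);
            queue.CompleteAdding(); // lets the consumer finish when drained
        });

        int sum = 0;
        var consumer = Task.Run(() =>
        {
            // GetConsumingEnumerable blocks until items arrive and
            // ends cleanly after CompleteAdding.
            foreach (int i in queue.GetConsumingEnumerable()) sum += i;
        });

        Task.WaitAll(producer, consumer);
        Console.WriteLine(sum); // 4950
    }
}
```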
I am writing a multi-threaded program to scrape a certain site and collect IDs. It stores these IDs in a shared static List<string> object.
When any item is added to the List<string>, it is first checked against a HashSet<string> that contains a blacklist of already-collected IDs.
I do this as follows:
private static HashSet<string> Blacklist = new HashSet<string>();
private static List<string> IDList = new List<string>();
public static void AddIDToIDList(string ID)
{
lock (IDList)
{
if (IsIDBlacklisted(ID))
return;
IDList.Add(ID);
}
}
public static bool IsIDBlacklisted(string ID)
{
lock (Blacklist)
{
if (Blacklist.Contains(ID))
return true;
}
return false;
}
The Blacklist is saved to a file after finishing and is loaded every time the program starts, therefore, it will get pretty large over time (up to 50k records). Is there a more efficient way to not only store this blacklist, but also to check each ID against it?
Thanks!
To improve performance, try using the ConcurrentBag<T> collection. Also there is no need to lock Blacklist because it's not being modified, e.g.:
private static HashSet<string> Blacklist = new HashSet<string>();
private static ConcurrentBag<string> IDList = new ConcurrentBag<string>();
public static void AddIDToIDList(string ID)
{
if (Blacklist.Contains(ID))
{
return;
}
IDList.Add(ID);
}
Read operations on a HashSet are thread-safe as long as Blacklist is not being modified, so you don't need to lock on it. You should also do the blacklist check outside the IDList lock so the lock is taken less often; this will also increase performance.
private static HashSet<string> Blacklist = new HashSet<string>();
private static List<string> IDList = new List<string>();
public static void AddIDToIDList(string ID)
{
if (IsIDBlacklisted(ID))
return;
lock (IDList)
{
IDList.Add(ID);
}
}
public static bool IsIDBlacklisted(string ID)
{
return Blacklist.Contains(ID);
}
If Blacklist is being modified the best way to lock around it is using a ReaderWriterLock (use the slim version if you are using a newer .NET)
private static HashSet<string> Blacklist = new HashSet<string>();
private static List<string> IDList = new List<string>();
private static ReaderWriterLockSlim BlacklistLock = new ReaderWriterLockSlim();
public static void AddIDToIDList(string ID)
{
if (IsIDBlacklisted(ID))
return;
lock (IDList)
{
IDList.Add(ID);
}
}
public static bool IsIDBlacklisted(string ID)
{
BlacklistLock.EnterReadLock();
try
{
return Blacklist.Contains(ID);
}
finally
{
BlacklistLock.ExitReadLock();
}
}
public static bool AddToIDBlacklist(string ID)
{
BlacklistLock.EnterWriteLock();
try
{
return Blacklist.Add(ID);
}
finally
{
BlacklistLock.ExitWriteLock();
}
}
Two considerations - First, if you use the indexer of a .NET dictionary (i.e., System.Collections.Generic.Dictionary) like this (rather than calling the Add() method):
idList[id] = id;
then it will add the item if it doesn't already exist - otherwise, it will replace the existing item at that key. Second, you can use the ConcurrentDictionary (in the System.Collections.Concurrent namespace) for thread-safety so you don't have to worry about the locking yourself. Same comment applies about using the indexer.
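A sketch of that indexer behavior with ConcurrentDictionary (names and counts are illustrative): many threads upserting the same keys never throw, and duplicates collapse onto one entry.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

class Demo
{
    static void Main()
    {
        var idList = new ConcurrentDictionary<string, string>();

        // Many threads "adding" the same IDs: the indexer upserts,
        // so duplicates simply overwrite instead of throwing.
        Parallel.ForEach(Enumerable.Range(0, 1000), i =>
        {
            string id = "id" + (i % 10);
            idList[id] = id;
        });

        Console.WriteLine(idList.Count); // 10 distinct keys
    }
}
```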
In your scenario, yes, HashSet is the best option, since it stores a single value to look up, unlike a Dictionary, which requires a key and a value.
And of course, as others have said, there is no need to lock the HashSet if it is not being modified; consider marking it readonly as well.
I have this code:
class Program
{
static void Main(string[] args)
{
TestClass instanceOfClass = new TestClass();
while (true)
{
Thread threadTest = new Thread(new ParameterizedThreadStart(AddNewToClass));
threadTest.Start(instanceOfClass);
}
}
static void AddNewToClass(object parameter)
{
var instance = (TestClass)parameter;
while (true)
{
if (instance.Contains(1))
{
continue;
}
else
{
instance.AddNew(1);
}
}
}
}
class TestClass
{
public Dictionary<int, string> dictionary;
public TestClass()
{
dictionary = new Dictionary<int, string>();
}
public void AddNew(int test)
{
lock (dictionary)
{
dictionary.Add(test, "Test string");
}
}
public bool Contains(int test)
{
lock (dictionary)
{
if (dictionary.ContainsKey(test))
{
return true;
}
else
{
return false;
}
}
}
}
What I want to do, is to have several different threads that add/remove objects from a Dictionary. I tried running this and I get this exception:
An item with the same key has already been added.
Which seems extremely weird. As far as I know, the lock statement should serialize access to the dictionary in question, so once the key is added, TestClass.Contains(1) should always return true; yet the exception shows that Contains must have returned false on more than one thread (hence the duplicate Add).
Does anyone know why this might happen? Thanks!
Your Contains() method is atomic. So is your Add() method. AddNewToClass(), however, is not. One thread may get a result from Contains()...but there's no guarantee regarding when it might or might not be suspended (or resumed).
That's your race condition.
Your lock only protects the block it surrounds - and it is this sequence that needs protection:
static void AddNewToClass(object parameter)
{
var instance = (TestClass)parameter;
while (true)
{
if (instance.Contains(1))
{
continue;
}
else
{
instance.AddNew(1);
}
}
}
Between the if (instance.Contains(1)) and the instance.AddNew(1); you can get preempted.
If you went with something like instance.AddItemIfMissing(1);
public void AddItemIfMissing(int test)
{
lock (dictionary)
{
if (!dictionary.ContainsKey(test))
{
dictionary.Add(test, "Test string");
}
}
}
This would do what you want.
You have a race condition. After you acquire the lock, you need to check again whether the dictionary already contains an item with the same key, since another thread might have added it before you acquired the lock. But why reinvent the wheel? There are numerous helper classes, like ConcurrentDictionary, in the Parallel Extensions library. Or use a well-thought-through singleton pattern.
static void AddNewToClass(object parameter)
{
var instance = (TestClass)parameter;
while (true)
{
if (instance.Contains(1))
{
continue;
} // **thread switch maybe happens here will cause your problem**
else
{
instance.AddNew(1);
}
}
}
So the following is better:
lock(instance)
{
if (instance.Contains(1))
{
continue;
}
else
{
instance.AddNew(1);
}
}
I have a List<string> collection called list.
I have two threads.
One thread enumerates all the list's elements and adds to the collection.
The second thread enumerates all the list's elements and removes from it.
How can I make this thread safe?
I tried creating a global object MyLock and using a lock(MyLock) block in each thread function, but it didn't work.
Can you help me?
If you have access to .NET 4.0 you can use the class ConcurrentQueue, or a BlockingCollection with a ConcurrentQueue backing it. It does exactly what you are trying to do and does not require any locking. The BlockingCollection will make your thread wait if there are no items available in the list.
An example of removing from the ConcurrentQueue: you do something like
ConcurrentQueue<MyClass> cq = new ConcurrentQueue<MyClass>();
void GetStuff()
{
MyClass item;
if(cq.TryDequeue(out item))
{
//Work with item
}
}
This will try to remove an item, but if there are none available it does nothing.
BlockingCollection<MyClass> bc = new BlockingCollection<MyClass>(new ConcurrentQueue<MyClass>());
void GetStuff()
{
if(!bc.IsCompleted) //check whether CompleteAdding() was called and the list is empty.
{
try
{
MyClass item = bc.Take();
//Work with item
}
catch (InvalidOperationException)
{
//Adding is marked as completed and the collection is empty, so there is nothing to take
}
}
}
This will block and wait on the Take until something is available to take from the list. Once you are done you can call CompleteAdding(), and Take will throw an exception when the list becomes empty instead of blocking.
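A compact sketch of that Take/CompleteAdding protocol (the items are illustrative):

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

class Demo
{
    static void Main()
    {
        var bc = new BlockingCollection<string>(new ConcurrentQueue<string>());

        var producer = Task.Run(() =>
        {
            bc.Add("a");
            bc.Add("b");
            bc.CompleteAdding(); // after this, Take throws once empty
        });

        var taken = new List<string>();
        try
        {
            while (true) taken.Add(bc.Take()); // blocks until an item arrives
        }
        catch (InvalidOperationException)
        {
            // CompleteAdding was called and the collection is empty.
        }

        producer.Wait();
        Console.WriteLine(taken.Count); // 2
    }
}
```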
Without knowing more about your program and requirements, I'm going to say that this is a "Bad Idea". Altering a List<> while iterating through its contents will most likely throw an exception.
You're better off using a queue: ConcurrentQueue<> was designed with this kind of producer/consumer synchronization in mind.
You should be able to lock directly on your list:
lock(list) {
//work with list here
}
However adding/removing from the list while enumerating it will likely cause an exception...
Lock on the SyncRoot of your List<T>. Note that List<T> implements ICollection.SyncRoot explicitly, so a cast is required:
lock(((ICollection)list).SyncRoot)
{
}
More information on how to use it properly can be found in the documentation for ICollection.SyncRoot
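A runnable sketch of this pattern; the cast is needed because List&lt;T&gt; exposes SyncRoot only through the non-generic ICollection interface:

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Threading.Tasks;

class Demo
{
    static void Main()
    {
        var list = new List<int>();
        // SyncRoot is an explicit interface implementation on List<T>.
        object gate = ((ICollection)list).SyncRoot;

        Parallel.For(0, 1000, i =>
        {
            lock (gate) { list.Add(i); }
        });

        Console.WriteLine(list.Count); // 1000
    }
}
```

Note this only serializes the operations you wrap in the lock; it does not protect enumeration against concurrent modification.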
You could implement your own version of IList<T> that wraps the underlying List<T> to provide locking on every method call.
public class LockingList<T> : IList<T>
{
public LockingList(IList<T> inner)
{
this.Inner = inner;
}
private readonly object gate = new object();
public IList<T> Inner { get; private set; }
public int IndexOf(T item)
{
lock (gate)
{
return this.Inner.IndexOf(item);
}
}
public void Insert(int index, T item)
{
lock (gate)
{
this.Inner.Insert(index, item);
}
}
public void RemoveAt(int index)
{
lock (gate)
{
this.Inner.RemoveAt(index);
}
}
public T this[int index]
{
get
{
lock (gate)
{
return this.Inner[index];
}
}
set
{
lock (gate)
{
this.Inner[index] = value;
}
}
}
public void Add(T item)
{
lock (gate)
{
this.Inner.Add(item);
}
}
public void Clear()
{
lock (gate)
{
this.Inner.Clear();
}
}
public bool Contains(T item)
{
lock (gate)
{
return this.Inner.Contains(item);
}
}
public void CopyTo(T[] array, int arrayIndex)
{
lock (gate)
{
this.Inner.CopyTo(array, arrayIndex);
}
}
public int Count
{
get
{
lock (gate)
{
return this.Inner.Count;
}
}
}
public bool IsReadOnly
{
get
{
lock (gate)
{
return this.Inner.IsReadOnly;
}
}
}
public bool Remove(T item)
{
lock (gate)
{
return this.Inner.Remove(item);
}
}
public IEnumerator<T> GetEnumerator()
{
lock (gate)
{
return this.Inner.ToArray().AsEnumerable().GetEnumerator();
}
}
IEnumerator IEnumerable.GetEnumerator()
{
lock (gate)
{
return this.Inner.ToArray().GetEnumerator();
}
}
}
You would use this code like this:
var list = new LockingList<int>(new List<int>());
If you're using large lists and/or performance is an issue then this kind of locking may not be terribly performant, but in most cases it should be fine.
It is very important to notice that the two GetEnumerator methods call .ToArray(). This forces the evaluation of the enumerator before the lock is released thus ensuring that any modifications to the list don't affect the actual enumeration.
Using code like lock (list) { ... } or lock (list.SyncRoot) { ... } do not cover you against list changes occurring during enumerations. These solutions only cover against concurrent modifications to the list - and that's only if all callers do so within a lock. Also these solutions can cause your code to die if some nasty bit of code takes a lock and doesn't release it.
In my solution you'll notice I have a object gate that is a private variable internal to the class that I lock on. Nothing outside the class can lock on this so it is safe.
I hope this helps.
As others already said, you can use concurrent collections from the System.Collections.Concurrent namespace. If you can use one of those, this is preferred.
But if you really want a list which is just synchronized, you could look at the SynchronizedCollection<T>-Class in System.Collections.Generic.
Note that you have to reference the System.ServiceModel assembly, which is also the reason I don't like it so much. But sometimes I use it.
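A minimal usage sketch. On .NET Framework this needs a reference to System.ServiceModel; I believe recent .NET versions ship SynchronizedCollection&lt;T&gt; in the core libraries, but check your target framework:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

class Demo
{
    static void Main()
    {
        // Every member of SynchronizedCollection<T> takes an internal lock.
        var names = new SynchronizedCollection<string>();
        Parallel.For(0, 100, i => names.Add("name" + i));
        Console.WriteLine(names.Count); // 100
    }
}
```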
I am looking for a class that defines a holding structure for an object. The value for this object could be set at a later time than when this container is created. It is useful to pass such a structure in lambdas or in callback functions etc.
Say:
class HoldObject<T> {
public T Value { get; set; }
public bool IsValueSet();
public void WaitUntilHasValue();
}
// and then we could use it like so ...
HoldObject<byte[]> downloadedBytes = new HoldObject<byte[]>();
DownloadBytes("http://www.stackoverflow.com", sender => downloadedBytes.Value = sender.GetBytes());
It is rather easy to define this structure, but I am trying to see if one is available in FCL. I also want this to be an efficient structure that has all needed features like thread safety, efficient waiting etc.
Any help is greatly appreciated.
I've never seen a class like that, but it should be pretty simple.
public class ObjectHolder<T>
{
private T value;
private ManualResetEvent waitEvent = new ManualResetEvent(false);
public T Value
{
get { return value; }
set
{
this.value = value;
ManualResetEvent evt = waitEvent;
if (evt != null)
{
// Clear the field (not a local copy) so IsValueSet becomes true,
// then release any waiting threads. We deliberately don't Dispose
// here: other threads may still be inside WaitOne on this event.
waitEvent = null;
evt.Set();
}
}
}
public bool IsValueSet
{
get { return waitEvent == null; }
}
public void WaitUntilHasValue()
{
ManualResetEvent evt = waitEvent;
if(evt != null) evt.WaitOne();
}
}
What you're trying to accomplish feels a lot like a future. Early CTPs of the .NET 4.0 TPL had a Future<T> class. With the RTM of .NET 4.0 it was renamed to Task<TResult>. If you squint, you can see the resemblance between:
class HoldObject<T>
{
public T Value { get; set; }
public bool IsValueSet();
public void WaitUntilHasValue();
}
and
class Task<T>
{
public T Result { get; }
public bool IsCompleted { get; }
public void Wait();
}
If you're not using .NET 4.0 yet, you can download the Reactive Extensions for .NET 3.5sp1. It contains a System.Threading.dll assembly that contains TPL for .NET 3.5.
While Result is read-only, setting it is of course done via the return value of the delegate you supply to the task. I'm not exactly sure this meets your requirements, but your example can be written as follows:
var downloadBytesTask = Task<byte[]>.Factory.StartNew(() =>
DownloadBytes("http://www.stackoverflow.com"));
if (!downloadBytesTask.IsCompleted)
{
downloadBytesTask.Wait();
}
var bytes = downloadBytesTask.Result;
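If the value is set explicitly by a callback rather than produced by a delegate, TaskCompletionSource&lt;T&gt; is arguably an even closer match to the HoldObject shape; a sketch (the byte values are illustrative):

```csharp
using System;
using System.Threading.Tasks;

class Demo
{
    static void Main()
    {
        var holder = new TaskCompletionSource<byte[]>();

        // Some callback sets the value when it becomes available,
        // like downloadedBytes.Value = ... in the question.
        Task.Run(() => holder.SetResult(new byte[] { 1, 2, 3 }));

        // Result blocks until SetResult has been called,
        // playing the role of WaitUntilHasValue.
        byte[] bytes = holder.Task.Result;
        Console.WriteLine(bytes.Length); // 3
    }
}
```

holder.Task.IsCompleted then corresponds to IsValueSet.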