C# Parallel.ForEach - randomly writes a null value to the DataRow

Why does the following code, at random times, write null values to the DataRow object?
ConcurrentDictionary<long, DataRow> dRowDict = new ConcurrentDictionary<long, DataRow>();
foreach (long klucz in data.Keys)
{
    DataRow row = table.NewRow();
    dRowDict.TryAdd(klucz, row);
}
Parallel.ForEach(data.Keys, klucz =>
{
    Dane prz = data[klucz] as Dane;
    DataRow dr = dRowDict[klucz];
    foreach (PropertyDescriptor prop in llProp)
    {
        dr[prop.Name] = prop.GetValue(prz) ?? DBNull.Value;
    }
});
foreach (DataRow dRow in dRowDict.Values)
{
    table.Rows.Add(dRow);
    if (dRow["FILED"] == DBNull.Value)
    {
        MessageBox.Show("ERROR...NULL VALUE"); // why does this happen?
    }
}
A plain for loop does not cause this problem - why, where is my bug?
My question is - can I modify the properties of any object inside a parallel loop?
Parallel.ForEach(myData.AsEnumerable()..., value =>
{
    Object x = new Object(); // or x = dict[key];
    x.A = ...;
    x.B = ...; // Can I do this, and is it safe?
});

Modifying the properties of an object inside a parallel loop is only allowed if the object's class is thread-safe. In other words, the class must have been specifically designed to allow access to its members from multiple threads concurrently. Most classes are not designed this way; instead they assume that they will be accessed by one thread at a time.
In my experience, thread-unsafety is the default assumption: if you search the documentation of a specific class for a Thread Safety section and can't find one, you should assume the class is not thread safe.
Accessing a non-thread-safe object from multiple threads concurrently results in "undefined behavior". This is a nice, formal synonym for "all kinds of nastiness". It includes, but is not limited to, random exceptions, corrupted state, lost updates, torn values, compromised security etc. These phenomena are non-deterministic in nature and often non-reproducible, so it is a strongly inadvisable practice.
The ADO.NET classes are thread-safe only for multithreaded read operations, meaning it is safe for multiple threads to read their properties concurrently. But while a thread is modifying an ADO.NET object, no other thread should access that object, either to read it or to modify it.
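One safe pattern, given that rule: do the expensive per-row work in parallel into plain arrays that no two threads share, then fill the DataTable from a single thread. A minimal sketch (the table schema, row count, and value computation here are made up for illustration; they stand in for the reflection work in the question):

```csharp
using System;
using System.Data;
using System.Threading.Tasks;

class Program
{
    static void Main()
    {
        var table = new DataTable();
        table.Columns.Add("Id", typeof(int));
        table.Columns.Add("Name", typeof(string));

        int count = 1000;

        // Expensive computation runs in parallel, but each iteration only
        // touches its own slot of a plain array - no shared ADO.NET object.
        var buffers = new object[count][];
        Parallel.For(0, count, i =>
        {
            buffers[i] = new object[] { i, "Name" + i };
        });

        // Only a single thread ever touches the DataTable.
        foreach (var values in buffers)
        {
            table.Rows.Add(values);
        }

        Console.WriteLine(table.Rows.Count); // 1000
    }
}
```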

Related

Is HashSet<T> thread safe as a value of ConcurrentDictionary<TKey, HashSet<T>>?

If I have the following code:
var dictionary = new ConcurrentDictionary<int, HashSet<string>>();
foreach (var user in users)
{
    if (!dictionary.ContainsKey(user.GroupId))
    {
        dictionary.TryAdd(user.GroupId, new HashSet<string>());
    }
    dictionary[user.GroupId].Add(user.Id.ToString());
}
Is the act of adding an item into the HashSet inherently thread safe because HashSet is a value property of the concurrent dictionary?
No. Putting a container in a thread-safe container does not make the inner container thread safe.
dictionary[user.GroupId].Add(user.Id.ToString());
is calling HashSet's Add after retrieving the set from the ConcurrentDictionary. If the same GroupId is looked up from two threads at once, this will break your code with strange failure modes. I saw the result of one of my teammates making the mistake of not locking his sets, and it wasn't pretty.
This is a plausible solution. I'd do something different myself but this is closer to your code.
if (!dictionary.ContainsKey(user.GroupId))
{
    dictionary.TryAdd(user.GroupId, new HashSet<string>());
}
var groups = dictionary[user.GroupId];
lock (groups)
{
    groups.Add(user.Id.ToString());
}
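As a side note, the ContainsKey/TryAdd pair above can be collapsed into GetOrAdd, which also removes the window in which two threads both observe the key as missing; the lock around the HashSet itself is still required. A sketch, with hypothetical group/user data (1000 users spread over 4 groups):

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

class Program
{
    static void Main()
    {
        var dictionary = new ConcurrentDictionary<int, HashSet<string>>();

        Parallel.For(0, 1000, i =>
        {
            int groupId = i % 4;
            // GetOrAdd hands every thread the same winning set per key,
            // even under contention.
            var set = dictionary.GetOrAdd(groupId, _ => new HashSet<string>());
            lock (set) // the HashSet itself is still not thread safe
            {
                set.Add(i.ToString());
            }
        });

        Console.WriteLine(dictionary.Count);    // 4
        Console.WriteLine(dictionary[0].Count); // 250
    }
}
```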
No, the collection (the dictionary itself) is thread-safe, not whatever you put in it. You have a couple of options:
Use AddOrUpdate as #TheGeneral mentioned (note that the update delegate runs outside any lock, so the set itself still needs locking):
dictionary.AddOrUpdate(
    user.GroupId,
    new HashSet<string> { user.Id.ToString() },
    (k, v) => { lock (v) { v.Add(user.Id.ToString()); } return v; });
Use a concurrent collection, like the ConcurrentBag<T>:
ConcurrentDictionary<int, ConcurrentBag<string>>
When you are building the dictionary, as in your code, you are better off accessing it as little as possible. Think of something like this:
var dictionary = new ConcurrentDictionary<int, ConcurrentBag<string>>();
var groupedUsers = users.GroupBy(u => u.GroupId);
foreach (var group in groupedUsers)
{
    // get the bag from the dictionary, or create it if it doesn't exist
    var currentBag = dictionary.GetOrAdd(group.Key, new ConcurrentBag<string>());
    // load it with the required users
    foreach (var user in group)
    {
        if (!currentBag.Contains(user.Id.ToString()))
        {
            currentBag.Add(user.Id.ToString());
        }
    }
}
If you actually want a built-in concurrent HashSet-like collection, you'd need to use ConcurrentDictionary<int, ConcurrentDictionary<string, string>> and care about either the key or the value of the inner one.
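A sketch of that set-like usage, where the inner dictionary's keys are the set members and the values (here byte) are ignored; the data is made up:

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

class Program
{
    static void Main()
    {
        // Inner ConcurrentDictionary used as a set: keys are the members,
        // the byte values carry no meaning.
        var groups = new ConcurrentDictionary<int, ConcurrentDictionary<string, byte>>();

        Parallel.For(0, 1000, i =>
        {
            var set = groups.GetOrAdd(i % 4, _ => new ConcurrentDictionary<string, byte>());
            set.TryAdd(i.ToString(), 0); // no explicit lock needed
        });

        Console.WriteLine(groups[1].Count); // 250
    }
}
```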

C# Update entries of a dictionary in parallel?

No idea if this is possible, but rather than iterate over a dictionary and modify entries based on some condition, sequentially, I was wondering if it is possible to do this in parallel?
For example, rather than:
Dictionary<int, byte> dict = new Dictionary<int, byte>();
for (int i = 0; i < dict.Count; i++)
{
    dict[i] = 255;
}
I'd like something like:
Dictionary<int, byte> dict = new Dictionary<int, byte>();
dict.Parallel(x=>x, <condition>, <function_to_apply>);
I realise that in order to build the indices for modifying the dict, we would need to iterate and build a list of ints... but I was wondering if there was some sneaky way to do this that would be both faster and more concise than the first example.
I could of course iterate through the dict and for each entry, spawn a new thread and run some code, return the value and build a new, updated dictionary, but that seems really overkill.
The reason I'm curious is that the <function_to_apply> might be expensive.
I could of course iterate through the dict and for each entry, spawn a new thread and run some code, return the value and build a new, updated dictionary, but that seems really overkill.
Assuming you don't need the dictionary while it's rebuilt it's not that much:
var newDictionary = dictionary.AsParallel()
    .Select(kvp =>
        // do whatever here, as long as it works with the local kvp
        // variable and not the original dict
        new
        {
            Key = kvp.Key,
            NewValue = function_to_apply(kvp.Key, kvp.Value)
        })
    .ToDictionary(x => x.Key,
                  x => x.NewValue);
Then lock whatever sync object you need and swap the new and old dictionaries.
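The lock-and-swap step might look like the following sketch, wrapped in a hypothetical Cache class; the transform delegate stands in for function_to_apply, and all names are made up:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Cache
{
    private readonly object _swapLock = new object();
    private Dictionary<int, byte> _dict = new Dictionary<int, byte>();

    public void Seed(IEnumerable<int> keys)
    {
        lock (_swapLock)
        {
            foreach (var k in keys) _dict[k] = 0;
        }
    }

    public void RebuildInParallel(Func<int, byte, byte> transform)
    {
        // Snapshot the reference, transform in parallel off to the side,
        // then swap under the lock. Readers never see a half-built dictionary.
        var snapshot = _dict;
        var rebuilt = snapshot.AsParallel()
            .ToDictionary(kvp => kvp.Key, kvp => transform(kvp.Key, kvp.Value));
        lock (_swapLock)
        {
            _dict = rebuilt;
        }
    }

    public byte Get(int key)
    {
        lock (_swapLock) { return _dict[key]; }
    }
}

class Program
{
    static void Main()
    {
        var cache = new Cache();
        cache.Seed(Enumerable.Range(0, 100));
        cache.RebuildInParallel((k, v) => 255);
        Console.WriteLine(cache.Get(42)); // 255
    }
}
```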
First of all, I mostly agree with the others recommending ConcurrentDictionary<>: it is designed to be thread-safe.
But if you are an adventurous coder ;) and performance is super-critical for you, you can sometimes try what (I suppose) you are trying to do, provided that no new keys are added to and no keys are removed from the dictionary during your parallel manipulations:
int keysNumber = 1000000;
Dictionary<int, string> d = Enumerable.Range(1, keysNumber)
    .ToDictionary(x => x, x => (string)null);
Parallel.For(1, keysNumber + 1, k => { d[k] = "Value" + k; /* Some complex logic might go here */ });
To verify data consistency after these operations you can add:
Debug.Assert(d.Count == keysNumber);
for (int i = 1; i <= keysNumber; i++)
{
    Debug.Assert(d[i] == "Value" + i);
}
Console.WriteLine("Successful");
Console.WriteLine("Successful");
WHY IT WORKS:
Basically, we created the dictionary in advance from a SINGLE main thread and then populated it in parallel. What allows us to do that is that the current Dictionary implementation (Microsoft does not guarantee this, but it is unlikely to ever change) determines its structure solely from the keys; values are just assigned to the corresponding slots. Since each key is assigned a new value from a single thread, we have no race condition, and since navigating the hashtable concurrently does not alter it, everything works fine.
But you should be really careful with such code and have very good reasons not to use ConcurrentDictionary.
PS: My main point is not the "hack" of using a Dictionary concurrently, but to draw attention to the fact that not every data structure needs to be concurrent. I have seen ConcurrentDictionary<int, ConcurrentStack<...>> where each stack object in the dictionary could only ever be accessed from a single thread; that is overkill and does not make your performance any better. Just keep in mind what you are affecting and what can go wrong in multithreading scenarios.

Using the Concurrent Dictionary - Thread Safe Collection Modification

Recently I was running into the following exception when using a generic dictionary
An InvalidOperationException has occurred. A collection was modified
I realized that this error was primarily because of thread safety issues on the static dictionary I was using.
A little background: I currently have an application which has 3 different methods that are related to this issue.
Method A iterates through the dictionary using foreach and returns a value.
Method B adds data to the dictionary.
Method C changes the value of the key in the dictionary.
Sometimes, while iterating through the dictionary, data is also being added, which is the cause of this issue. I keep getting this exception in the foreach part of my code where I iterate over the contents of the dictionary. To resolve this issue, I replaced the generic dictionary with ConcurrentDictionary; here are the details of what I did.
Aim : My main objective is to completely remove the exception
For method B (which adds a new key to the dictionary) I replaced .Add with TryAdd
For method C (which updates the value of the dictionary) I did not make any changes. A rough sketch of the code is as follows :
static public int ChangeContent(int para)
{
    foreach (KeyValuePair<string, CustObject> pair in static_container)
    {
        if (pair.Value.propA != para) // Pending cancel
        {
            pair.Value.data_id = prim_id; // I am updating the content
            return 0;
        }
    }
    return -2;
}
For method A, I am simply iterating over the dictionary, and this is where the running code stops (in debug mode) and Visual Studio informs me that this is where the error occurred. The code I am using is similar to the following:
static public CustObject RetrieveOrderDetails(int para)
{
    foreach (KeyValuePair<string, CustObject> pair in static_container)
    {
        if (pair.Value.cust_id.Equals(symbol))
        {
            if (pair.Value.OrderStatus != para)
            {
                return pair.Value; // Found
            }
        }
    }
    return null; // Not found
}
Are these changes going to resolve the exception that I am getting?
Edit:
It states on this page that the GetEnumerator method allows you to traverse the elements in parallel with writes (although the enumerator's contents may be outdated). Isn't that the same as using foreach?
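Regarding that edit: yes, foreach uses GetEnumerator under the hood, and ConcurrentDictionary's enumerator is documented to be safe to use concurrently with writes; it just is not a point-in-time snapshot, so concurrent additions may or may not be seen. A small demonstration sketch (key names and counts are made up):

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

class Program
{
    static void Main()
    {
        var map = new ConcurrentDictionary<string, int>();
        for (int i = 0; i < 100; i++) map["key" + i] = i;

        // A writer keeps adding entries while the main thread enumerates.
        bool stop = false;
        var writer = Task.Run(() =>
        {
            int n = 100;
            while (!Volatile.Read(ref stop)) map["key" + n++] = n;
        });

        // Unlike Dictionary<TKey,TValue>, this foreach does not throw
        // "Collection was modified" even though a writer is active.
        int seen = 0;
        foreach (var kvp in map) seen++;

        Volatile.Write(ref stop, true);
        writer.Wait();
        Console.WriteLine(seen > 0); // True
    }
}
```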
For modification of elements, one option is to manually iterate the dictionary using a for loop, e.g.:
Dictionary<string, string> test = new Dictionary<string, string>();
int dictionaryLength = test.Count();
for (int i = 0; i < dictionaryLength; i++)
{
    test[test.ElementAt(i).Key] = "Some new content";
}
Be wary, though, that if you are also adding to the Dictionary, you must increment dictionaryLength (or decrement it if you remove elements) appropriately.
Depending on what exactly you're doing, and if order matters, you may wish to use a SortedDictionary instead.
You could extend this by updating dictionaryLength explicitly, re-calling test.Count() at each iteration, and also keeping an additional list of the keys you have already modified, and so on, if there is a danger of missing any. It really depends on what you are doing and what your needs are.
You can also get a list of keys using test.Keys.ToList(); that option would work as follows:
Dictionary<string, string> test = new Dictionary<string, string>();
List<string> keys = test.Keys.ToList();
foreach (string key in keys)
{
    test[key] = "Some new content";
}
IEnumerable<string> newKeys = test.Keys.ToList().Except(keys);
if (newKeys.Count() > 0)
{
    // Do it again, or whatever.
}
Note that I have also shown an example of how to find out whether any new keys were added between getting the initial list of keys and completing the iteration, so that you can then loop around and handle the new keys.
Hopefully one of these options will suit (you may even want to mix and match: a for loop over the keys, for example, updating the key list as you go instead of the length). As I say, it is as much about what precisely you are trying to do as anything.
Before doing the foreach(), try copying the container to a new instance:
var unboundContainer = static_container.ToList();
foreach (KeyValuePair<string, CustObject> pair in unboundContainer)
Also, I think updating the Value this way is not right from a thread-safety perspective; refactor your code to replace whole values with TryUpdate() instead.
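A minimal sketch of what the TryUpdate() pattern looks like (the key name and values here are made up):

```csharp
using System;
using System.Collections.Concurrent;

class Program
{
    static void Main()
    {
        var container = new ConcurrentDictionary<string, int>();
        container.TryAdd("order1", 10);

        // TryUpdate only writes if the current value still equals the
        // comparison value, so a concurrent writer's change is never
        // silently overwritten.
        int current = container["order1"];
        bool updated = container.TryUpdate("order1", current + 1, current);
        Console.WriteLine(updated);             // True
        Console.WriteLine(container["order1"]); // 11

        // Typical retry loop when contention is possible:
        int observed;
        do { observed = container["order1"]; }
        while (!container.TryUpdate("order1", observed * 2, observed));
        Console.WriteLine(container["order1"]); // 22
    }
}
```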

synchronized collection thread safety

I have System.Collections.Generic.SynchronizedCollection shared collection. Our code uses .Net 4.0 Task library to span threads and pass the synchronized collection to the thread. So far threads has not been adding or removing items into the collection. But the new requirement which requires one of the thread has to remove items from the collection while the other thread just read the collection. Do I need to add lock before removing the items from the Collection? If so, would reader thread be thread safe? Or Suggest best way to get the thread safety?
No, it is not fully thread-safe. Try the following in a simple console application and see how it crashes with an exception:
var collection = new SynchronizedCollection<int>();
var n = 0;
Task.Run(() =>
{
    while (true)
    {
        collection.Add(n++);
        Thread.Sleep(5);
    }
});
Task.Run(() =>
{
    while (true)
    {
        Console.WriteLine("Elements in collection: " + collection.Count);
        var x = 0;
        if (collection.Count % 100 == 0)
        {
            foreach (var i in collection)
            {
                Console.WriteLine("They are: " + i);
                x++;
                if (x == 100)
                {
                    break;
                }
            }
        }
    }
});
Console.ReadKey();
Note, that if you replace the SynchronizedCollection with a ConcurrentBag, you will get thread-safety:
var collection = new ConcurrentBag<int>();
SynchronizedCollection is simply not thread-safe in this application. Use Concurrent Collections instead.
As Alexander already pointed out the SynchronizedCollection is not thread safe for this scenario.
SynchronizedCollection actually wraps a normal generic List and just delegates every call to the underlying list, with a lock surrounding the call. This is also done in GetEnumerator. So getting the enumerator is synchronized, but NOT the actual enumeration.
var collection = new SynchronizedCollection<string>();
collection.Add("Test1");
collection.Add("Test2");
collection.Add("Test3");
collection.Add("Test4");
var enumerator = collection.GetEnumerator();
enumerator.MoveNext();
collection.Add("Test5");
// The next call will throw an InvalidOperationException ("Collection was modified")
enumerator.MoveNext();
A foreach calls the enumerator in exactly this way. So adding a ToArray() before enumerating will not help either, as ToArray() itself first enumerates the collection into the array.
That enumeration into the array could be faster than whatever you are doing inside your foreach, so it could reduce the probability of hitting the concurrency issue, but it does not eliminate it.
As Richard pointed out: for true thread safety go for the System.Collections.Concurrent classes.
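Alternatively, since SynchronizedCollection takes its internal lock on the object exposed by its SyncRoot property, holding that lock across the whole enumeration closes the gap: writers on other threads block until the loop finishes. A sketch (this assumes the System.ServiceModel assembly, where SynchronizedCollection<T> lives, is referenced):

```csharp
using System;
using System.Collections.Generic;

class Program
{
    static void Main()
    {
        // SynchronizedCollection<T> is in the System.Collections.Generic
        // namespace but ships in the System.ServiceModel assembly.
        var collection = new SynchronizedCollection<string> { "Test1", "Test2" };

        var copied = new List<string>();

        // Hold the collection's own lock for the entire enumeration.
        // Add/Remove on other threads acquire the same lock internally,
        // so they wait until the foreach completes.
        lock (collection.SyncRoot)
        {
            foreach (var item in collection)
            {
                copied.Add(item);
            }
        }

        Console.WriteLine(copied.Count); // 2
    }
}
```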
Yes, SynchronizedCollection will do the locking for you.
If you have multiple readers and just one writer, you may want to look at using a ReaderWriterLock, instead of SynchronizedCollection.
Also, if you are .Net 4+ then take a look at System.Collections.Concurrent. These classes have much better performance than SynchronizedCollection.

C# Locking mechanism - write only locking

In continuation of my latest ponderings about locks in C# and .NET, consider the following scenario:
I have a class which contains a specific collection (for this example I have used a Dictionary<string, int>) which is updated from a data source every few minutes, using a specific method whose body you can see below:
DataTable dataTable = dbClient.ExecuteDataSet(i_Query).GetFirstTable();
lock (r_MappingLock)
{
    i_MapObj.Clear();
    foreach (DataRow currRow in dataTable.Rows)
    {
        i_MapObj.Add(Convert.ToString(currRow[i_Column1]), Convert.ToInt32(currRow[i_Column2]));
    }
}
r_MappingLock is an object dedicated to lock the critical section which refreshes the dictionary's contents.
i_MapObj is the dictionary object
i_Column1 and i_Column2 are the datatable's column names which contain the desired data for the mapping.
Now, I also have a class method which receives a string and returns the correct mapped int based on the mentioned dictionary.
I want this method to wait until the refresh method completes its execution, so at first glance one would consider the following implementation:
lock (r_MappingLock)
{
    int? retVal = null;
    if (i_MapObj.ContainsKey(i_Key))
    {
        retVal = i_MapObj[i_Key];
    }
    return retVal;
}
This prevents unexpected behaviour and return values while the dictionary is being updated, but another issue arises: since every thread which executes the above method tries to claim the lock, multiple threads calling it at the same time must each wait for the previous thread to finish and release the lock. This is obviously undesirable, since the above method is for reading purposes only.
I was thinking of adding a boolean member to the class which would be set to true or false depending on whether the dictionary is being updated, and checking it within the "read only" method, but this raises other race-condition-based issues...
Any ideas how to solve this gracefully?
Thanks again,
Mikey
Have a look at the built in ReaderWriterLock.
I would just switch to using a ConcurrentDictionary to avoid this situation altogether; manual locking is error-prone. Also, as I gather from "C#: The Curious ConcurrentDictionary", ConcurrentDictionary is already read-optimized.
Albin correctly pointed to ReaderWriterLock. I will add an even nicer one: the ReaderWriterGate by Jeffrey Richter. Enjoy!
You might consider creating a new dictionary when updating, instead of locking. This way, you will always have consistent results, but reads during updates would return previous data:
private volatile Dictionary<string, int> i_MapObj = new Dictionary<string, int>();

private void Update()
{
    DataTable dataTable = dbClient.ExecuteDataSet(i_Query).GetFirstTable();
    var newData = new Dictionary<string, int>();
    foreach (DataRow currRow in dataTable.Rows)
    {
        newData.Add(Convert.ToString(currRow[i_Column1]), Convert.ToInt32(currRow[i_Column2]));
    }
    // Start using the new data - reference assignments are atomic
    i_MapObj = newData;
}

private int? GetValue(string key)
{
    int value;
    if (i_MapObj.TryGetValue(key, out value))
        return value;
    return null;
}
.NET 3.5 introduced the ReaderWriterLockSlim class, which is a lot faster!
Almost as fast as a lock().
Keep the policy that disallows recursion (LockRecursionPolicy.NoRecursion) to keep performance this high.
Look at this page for more info.
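A sketch of that read/write split with ReaderWriterLockSlim, applied to a refresh/lookup pair like the one in the question (the class and member names are made up; the NoRecursion policy is the one recommended above):

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

class Mapping
{
    private readonly ReaderWriterLockSlim _lock =
        new ReaderWriterLockSlim(LockRecursionPolicy.NoRecursion);
    private readonly Dictionary<string, int> _map = new Dictionary<string, int>();

    public void Refresh(IDictionary<string, int> freshData)
    {
        _lock.EnterWriteLock(); // exclusive: blocks all readers and writers
        try
        {
            _map.Clear();
            foreach (var kvp in freshData) _map.Add(kvp.Key, kvp.Value);
        }
        finally { _lock.ExitWriteLock(); }
    }

    public int? Get(string key)
    {
        _lock.EnterReadLock(); // shared: readers do not block each other
        try
        {
            return _map.TryGetValue(key, out var value) ? value : (int?)null;
        }
        finally { _lock.ExitReadLock(); }
    }
}

class Program
{
    static void Main()
    {
        var m = new Mapping();
        m.Refresh(new Dictionary<string, int> { ["a"] = 1 });
        Console.WriteLine(m.Get("a"));         // 1
        Console.WriteLine(m.Get("b") == null); // True
    }
}
```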
