I have a System.Collections.Generic.SynchronizedCollection shared between threads. Our code uses the .NET 4.0 Task library to spawn threads and passes the synchronized collection to each thread. So far, the threads have not been adding or removing items. A new requirement means that one thread has to remove items from the collection while another thread only reads it. Do I need to add a lock before removing items from the collection? If so, would the reader thread be thread-safe? Or can you suggest the best way to get thread safety?
No, it is not fully thread-safe. Try the following in a simple console application and watch it crash with an exception:
var collection = new SynchronizedCollection<int>();
var n = 0;
Task.Run(
    () =>
    {
        while (true)
        {
            collection.Add(n++);
            Thread.Sleep(5);
        }
    });
Task.Run(
    () =>
    {
        while (true)
        {
            Console.WriteLine("Elements in collection: " + collection.Count);
            var x = 0;
            if (collection.Count % 100 == 0)
            {
                foreach (var i in collection)
                {
                    Console.WriteLine("They are: " + i);
                    x++;
                    if (x == 100)
                    {
                        break;
                    }
                }
            }
        }
    });
Console.ReadKey();
Note that if you replace the SynchronizedCollection with a ConcurrentBag, you will get thread safety:
var collection = new ConcurrentBag<int>();
SynchronizedCollection is simply not thread-safe in this application. Use Concurrent Collections instead.
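A minimal sketch of why the swap helps, relying on the documented behavior that ConcurrentBag<T>.GetEnumerator() enumerates a moment-in-time snapshot of the bag. The class and method names are made up for illustration.

```csharp
using System.Collections.Concurrent;

public static class BagSnapshotDemo
{
    // Adds to the bag while a foreach over it is in progress and
    // returns how many items the loop visited.
    public static int EnumerateWhileAdding()
    {
        var bag = new ConcurrentBag<int> { 1, 2, 3 };
        var visited = 0;
        foreach (var item in bag)
        {
            bag.Add(item + 100); // no InvalidOperationException here
            visited++;
        }
        return visited; // items added during the loop are not visited
    }
}
```

The loop only sees the three items that were in the bag when the enumeration started, so a reader never observes a half-modified collection, but it also does not see later additions.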
As Alexander already pointed out the SynchronizedCollection is not thread safe for this scenario.
The SynchronizedCollection actually wraps a normal generic list and just delegates every call to the underlying list with a lock surrounding the call. This is also done in GetEnumerator. So the getting of the enumerator is synchronized but NOT the actual enumeration.
var collection = new SynchronizedCollection<string>();
collection.Add("Test1");
collection.Add("Test2");
collection.Add("Test3");
collection.Add("Test4");
var enumerator = collection.GetEnumerator();
enumerator.MoveNext();
collection.Add("Test5");
// The next call will throw an InvalidOperationException ("Collection was modified")
enumerator.MoveNext();
A foreach obtains and advances an enumerator in exactly this way. Adding a ToArray() and then enumerating the resulting array does not help either, because ToArray() first has to enumerate the collection to fill the array.
That copy may well run faster than the work inside your foreach, so it can reduce the probability of hitting a concurrency issue, but it does not remove it.
As Richard pointed out: for true thread safety go for the System.Collections.Concurrent classes.
Yes, SynchronizedCollection will do the locking for you.
If you have multiple readers and just one writer, you may want to look at using a ReaderWriterLock, instead of SynchronizedCollection.
Also, if you are .Net 4+ then take a look at System.Collections.Concurrent. These classes have much better performance than SynchronizedCollection.
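A rough sketch of the reader/writer idea for the one-writer, many-readers case. It uses ReaderWriterLockSlim from System.Threading rather than the older ReaderWriterLock named above, and the GuardedList wrapper is a hypothetical name invented for this example.

```csharp
using System.Collections.Generic;
using System.Threading;

// Hypothetical wrapper: many readers may hold the lock at once,
// but a writer gets exclusive access.
public sealed class GuardedList<T>
{
    private readonly List<T> _items = new List<T>();
    private readonly ReaderWriterLockSlim _lock = new ReaderWriterLockSlim();

    public void Add(T item)
    {
        _lock.EnterWriteLock();
        try { _items.Add(item); }
        finally { _lock.ExitWriteLock(); }
    }

    public bool Remove(T item)
    {
        _lock.EnterWriteLock();
        try { return _items.Remove(item); }
        finally { _lock.ExitWriteLock(); }
    }

    // Readers get a private copy, so they can enumerate it
    // without holding the lock.
    public List<T> Snapshot()
    {
        _lock.EnterReadLock();
        try { return new List<T>(_items); }
        finally { _lock.ExitReadLock(); }
    }
}
```

The reader thread would loop over Snapshot() while the removing thread calls Remove(). Whether this beats a plain lock depends on how long reads take, so it is worth measuring.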
Related
I created a list of objects, and I want to fill the list from different tasks. It looks correct, but it doesn't work.
This is my code:
var splittedDataList = Extensions.ListExtensions.SplitList(source, 500);
// Create a list of tasks
var poolTasks = new List<Task>();
var objectList = new List<Car>();
for (int i = 0; i < splittedDataList.Count; i++)
{
var data = splittedDataList[i];
poolTasks.Add(Task.Factory.StartNew(() =>
{
// Collect list of car
objectList = CollectCarList(data);
}));
}
// Wait all tasks to finish
Task.WaitAll(poolTasks.ToArray());
public List<Car> CollectCarList(List<Car> list)
{
///
return list;
}
The code is using Tasks as if they were threads to flatten a nested list. Tasks aren't threads, they're a promise that something will produce a result in the future. In JavaScript they're actually called promises.
The question's exact code is flattening a nested list. This can easily be done with Enumerable.SelectMany(), e.g.:
var cars=source.SelectMany(data=>data).ToList();
Flattening isn't an expensive operation so there shouldn't be any need for parallelism. If there are really that many items, Parallel LINQ can be used with .AsParallel(). LINQ operators after that are executed using parallel algorithms and collected at the end :
var cars=source.AsParallel()
.SelectMany(data=>data)
.ToList();
Parallel LINQ is far more useful if it's used to parallelize the really time-consuming processing before flattening:
var cars=source.AsParallel()
.Select(data=>DoSomethingExpensive(data))
.SelectMany(data=>data)
.ToList();
Parallel LINQ is built for data parallelism: processing large amounts of in-memory data by partitioning the input and using worker tasks to process each partition with minimal synchronization between workers. It's definitely not meant for executing lots of asynchronous operations concurrently. There are other high-level classes for that.
First off, List<T> is not thread-safe. If you really want to fill a list from different async tasks, you would probably want to use some sort of concurrent collection.
https://learn.microsoft.com/en-us/dotnet/api/system.collections.concurrent?view=net-6.0
The second question is why you would want to do this. In your current example all of this work is CPU-bound anyway, so creating multiple tasks does not really get you anywhere. It's not going to speed anything up; quite the contrary, as scheduling the tasks adds overhead to the processing.
If your input lists were coming from various other async tasks, e.g. calls to a database, then this might make more sense. In any case, based on what I see above, this would do what you're asking.
object ListLock = new object();
void Main()
{
    var splittedDataList = new List<List<int>> { Enumerable.Range(0, 500).ToList(), Enumerable.Range(0, 500).ToList() };
    // Create a list of tasks
    var poolTasks = new List<Task>();
    var objectList = new List<int>();
    for (int i = 0; i < splittedDataList.Count; i++)
    {
        var data = splittedDataList[i];
        poolTasks.Add(Task.Factory.StartNew(() =>
        {
            // List<T> is not thread-safe, so only one task may append at a time
            lock (ListLock)
            {
                // Collect the list of cars
                objectList.AddRange(CollectCarList(data));
            }
        }));
    }
    // Wait for all tasks to finish
    Task.WaitAll(poolTasks.ToArray());
    objectList.Dump(); // Dump() is LINQPad's output helper
}
// You can define other methods, fields, classes and namespaces here
public List<int> CollectCarList(List<int> list)
{
    ///
    return list;
}
I changed the list to be a simple List of int, as I didn't know what the definition of Car was in your application. The lock is required to overcome the thread-safety issue with List. It could be removed if you used some kind of concurrent collection. I just want to reiterate that what this code is doing in its current state is pointless. You would be better off doing all of this on a single thread unless there is some actual async I/O going on somewhere else.
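Another way to avoid the lock, sketched here as an assumption rather than as part of the answer above, is to let each task return its own list and merge the results after Task.WaitAll. ChunkCollector and CollectChunk are hypothetical names; the doubling only stands in for real per-chunk work.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class ChunkCollector
{
    // Stand-in for the real per-chunk work (hypothetical).
    private static List<int> CollectChunk(List<int> chunk)
    {
        return chunk.Select(x => x * 2).ToList();
    }

    public static List<int> CollectAll(List<List<int>> chunks)
    {
        // Each task returns its own list; nothing is shared while they run.
        Task<List<int>>[] tasks = chunks
            .Select(chunk => Task.Factory.StartNew(() => CollectChunk(chunk)))
            .ToArray();

        Task.WaitAll(tasks);

        // Merge on the calling thread once every task has finished.
        return tasks.SelectMany(t => t.Result).ToList();
    }
}
```

Because the merge walks the task array in order, the combined list keeps the order of the input chunks, which a shared list filled under a lock would not guarantee.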
static List<int> numbers = new List<int>();
static void Main(string[] args)
{
foreach(var number in GetNumbers())
{
if (number == 1)
{
Thread t = new Thread(() =>
{
numbers.Add(234567);
});
t.Start();
}
Console.WriteLine(number);
}
}
public static IEnumerable<int> GetNumbers()
{
for(int i =0; i <=10;i++)
{
numbers.Add(i);
}
foreach (var number in numbers)
{
yield return number;
}
}
In the above example, I iterate over a collection using yield and add an item to the collection while iterating, expecting to get the updated number.
I understand that modifying a collection while iterating over it throws a "collection was modified" exception, but with IEnumerable I get deferred execution, so I should be able to add to the main collection as yield returns the data one by one.
I understand that removing an item could be problematic, but adding an item to the collection being iterated should not be a problem. However, if it is not allowed, as the above example shows (even with deferred execution), what if I have a situation like this:
"There is a large collection, and many consumers are iterating over it using IEnumerable and yield. They take each item and do some processing with it.
If any new item is added to the main list or collection, the clients should get the latest item too."
Even though you are using yield, you are using it inside a foreach.
The docs state:
It is safe to perform multiple read operations on a List, but
issues can occur if the collection is modified while it’s being read.
This is the fundamental issue here - you are reading the collection (as evidenced by the foreach) while writing to it. That just isn't allowed.
This may be worth reading - https://social.msdn.microsoft.com/Forums/vstudio/en-US/a90c87be-9553-4d48-9892-d482ee325f02/why-cant-change-value-in-foreach?forum=csharpgeneral
You likely want to consider using ConcurrentBag, ConcurrentQueue or ConcurrentStack as alternatives.
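For the "consumers should also see newly added items" scenario, one possible sketch uses BlockingCollection<T> from System.Collections.Concurrent, which by default wraps a ConcurrentQueue<T>. This is a swapped-in technique, not something the question's code does, and ProducerConsumerDemo is a made-up name.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class ProducerConsumerDemo
{
    public static List<int> Run()
    {
        using (var queue = new BlockingCollection<int>())
        {
            var seen = new List<int>();

            // Consumer: blocks until items arrive, ends after CompleteAdding().
            Task consumer = Task.Factory.StartNew(() =>
            {
                foreach (int number in queue.GetConsumingEnumerable())
                {
                    seen.Add(number); // only this task touches 'seen'
                }
            });

            // Producer: keeps adding while the consumer is iterating.
            for (int i = 0; i <= 10; i++)
            {
                queue.Add(i);
            }
            queue.Add(234567);
            queue.CompleteAdding();

            consumer.Wait();
            return seen;
        }
    }
}
```

Note that GetConsumingEnumerable() removes items as it yields them, so with several consumers each item goes to exactly one of them. If every consumer must see every item, a different design is needed.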
I have a Task factory that's kicking off many tasks, sometimes over 1000. I add every Task to a list and remove it when the Task has completed.
var scan = Task.Factory.StartNew(() =>
{
return operation.Run();
}, token.Token
);
operations.Add(scan);
When a task Completes:
var finishedTask = scan.ContinueWith(resultTask =>
OperationComplete(resultTask),
TaskContinuationOptions.OnlyOnRanToCompletion
);
public virtual void OperationComplete(Task task)
{
operations.Remove(task);
}
When all are complete:
Task.Factory.ContinueWhenAll(operations.ToArray(),
result =>
{
AllOperationsComplete();
}, TaskContinuationOptions.None);
Then, at certain points in my application I want to get the count of running tasks. (This is where I get the error: "Collection was modified; enumeration operation may not execute.")
public int Count()
{
int running = operations.Count<Task>((x) => x.Status == TaskStatus.Running);
return running;
}
A couple of questions:
1) Should I even worry about removing the tasks from the list? The list could easily be in the 1000s.
2) What's the best way to make Count() safe? Creating a new List and adding operations to it will still enumerate the collection, if I remember right.
Either you need to lock to make sure only one thread accesses the list at a time (whether that's during removal or counting) or you should use a concurrent collection. Don't forget that Count(Func<T, bool>) needs to iterate over the collection in order to perform the count - it's like using a foreach loop... and you can't modify a collection (in general) while you're iterating over it.
I suspect that ConcurrentBag is an appropriate choice here - and as you're using TPL, presumably you have the .NET 4 concurrent collections available...
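The locking option could look roughly like this. OperationTracker is a hypothetical wrapper, and it counts tasks that have not completed (IsCompleted) instead of filtering on TaskStatus.Running as the question does; both choices are assumptions made for the sketch.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public sealed class OperationTracker
{
    private readonly object _gate = new object();
    private readonly List<Task> _operations = new List<Task>();

    public void Add(Task task)
    {
        lock (_gate) { _operations.Add(task); }
    }

    public void Remove(Task task)
    {
        lock (_gate) { _operations.Remove(task); }
    }

    // Counts tasks that have not completed yet; the lock keeps Add/Remove
    // from changing the list while it is being enumerated.
    public int CountUnfinished()
    {
        lock (_gate) { return _operations.Count(t => !t.IsCompleted); }
    }
}
```

Every access to the list goes through the same lock object, which is what prevents the "Collection was modified" exception during the count.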
You need to make sure you don't modify a collection while you're iterating. Most collections don't support that. A lock would likely suffice.
But you'll likely want to revisit the design. Locking a collection for an extended period of time will likely kill any performance gains you were hoping to get from asynchronous Tasks.
Given the code is already checking status as part of the count call, and assuming you aren't doing the count until after all tasks are in the collection, just not removing them seems like the simplest answer. Make sure to actually measure perf differences if you decide to switch out List for something else, especially if the number of times that Count call is done is low relative to the size of the collection. :)
You can use a ConcurrentDictionary to keep track of your tasks (ConcurrentBag doesn't let you remove specific items).
ConcurrentDictionary<Task, string> runningTasks = new ConcurrentDictionary<Task, string>();
Task task = Task.Factory.StartNew(() =>
{
    // Do your stuff
});
// Register the task first, then attach the continuation that removes it.
// The continuation receives the original task, so the keys match.
runningTasks.TryAdd(task, "Hello I'm a task");
task.ContinueWith(processedTask =>
{
    string outString; // A string we don't care about
    runningTasks.TryRemove(processedTask, out outString);
});
// Add lots more tasks to runningTasks
while (runningTasks.Count > 0)
{
Console.WriteLine("I'm still waiting...");
Thread.Sleep(1000);
}
If you wanna do a proper "WaitAll" (requires LINQ):
try
{
    Task[] keys = runningTasks.Keys.ToArray();
    Task.WaitAll(keys);
}
catch (AggregateException) { } // WaitAll throws if any task faulted or was canceled.
Hope it helps.
I have a multithreaded application and I get this error:
************** Exception Text **************
System.InvalidOperationException: Collection was modified; enumeration operation may not execute.
at System.ThrowHelper.ThrowInvalidOperationException(ExceptionResource resource)
at System.Collections.Generic.List`1.Enumerator.MoveNextRare()
at System.Collections.Generic.List`1.Enumerator.MoveNext()
...
I probably have a problem with my collection, because one thread reads the collection while another thread modifies it.
public readonly ObservableCollectionThreadSafe<GMapMarker> Markers = new ObservableCollectionThreadSafe<GMapMarker>();
public void problem()
{
foreach (GMapMarker m in Markers)
{
...
}
}
I tried to lock the collection with this code, but it doesn't work.
public void problem()
{
lock(Markers)
{
foreach (GMapMarker m in Markers)
{
...
}
}
}
Any ideas to fix that problem?
This is a pretty common mistake: modifying a collection while iterating over it with foreach. Keep in mind that foreach uses a read-only IEnumerator instance.
Try looping through the collection using for() with an extra index check, so that if the index is out of bounds you can apply additional logic to handle it. If the underlying enumeration does not implement ICollection, you can also use LINQ's Count() as the loop exit condition, evaluating the count on each iteration.
If Markers implements ICollection, lock on SyncRoot:
lock (Markers.SyncRoot)
Use for():
for (int index = 0; index < Markers.Count(); index++)
{
    if (index >= Markers.Count())
    {
        // TODO: handle this case to avoid a run-time exception
    }
}
You might find this post useful: How do foreach loops work in C#?
You need to lock on both the reading and the writing side. Otherwise one of the threads will not know about the lock and will try to read or modify the collection while the other is modifying or reading it with the lock held.
Try to read a clone of your collection:
foreach (GMapMarker m in Markers.Copy())
{
...
}
This will create a new copy of your collection that will not be affected by another thread, but it may cause a performance issue for a huge collection.
So I think it would be better to lock the collection during both reading and writing.
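The two suggestions can be combined: take the copy while holding the same lock the writers use, then enumerate the copy outside the lock. A minimal sketch, with a hypothetical MarkerStore class and strings standing in for GMapMarker:

```csharp
using System.Collections.Generic;

public sealed class MarkerStore
{
    private readonly object _gate = new object();
    private readonly List<string> _markers = new List<string>();

    // Writers take the same lock as readers.
    public void Add(string marker)
    {
        lock (_gate) { _markers.Add(marker); }
    }

    // Copy under the lock, then let the caller enumerate the copy freely.
    public string[] Snapshot()
    {
        lock (_gate) { return _markers.ToArray(); }
    }
}
```

A loop such as foreach (var m in store.Snapshot()) can then even call store.Add(...) in its body, because it is iterating over the copy and not the live list.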
This worked for me. Perform a ToList() operation on Markers:
foreach (GMapMarker m in Markers.ToList())
You can use a foreach, but you have to convert the collection to a list and use the dot operator to access the behavior methods.
Example: Markers.ToList().ForEach(i => i.DeleteObject())
I'm not totally sure what you're doing with your collection. My example assumes you just want to delete all items from the collection, but it can be applied to any behavior you're trying to perform on your collection.
I'm trying to run multiple functions that connect to a remote site (by network) and return a generic list. But I want to run them simultaneously.
For example:
public static List<SearchResult> Search(string title)
{
//Initialize a new temp list to hold all search results
List<SearchResult> results = new List<SearchResult>();
//Loop all providers simultaneously
Parallel.ForEach(Providers, currentProvider =>
{
List<SearchResult> tmpResults = currentProvider.SearchTitle((title));
//Add results from current provider
results.AddRange(tmpResults);
});
//Return all combined results
return results;
}
As I see it, multiple insertions into 'results' may happen at the same time, which may crash my application.
How can I avoid this?
You can use a concurrent collection.
The System.Collections.Concurrent namespace provides several thread-safe collection classes that should be used in place of the corresponding types in the System.Collections and System.Collections.Generic namespaces whenever multiple threads are accessing the collection concurrently.
You could, for example, use ConcurrentBag, since you have no guarantee of the order in which the items will be added.
Represents a thread-safe, unordered collection of objects.
//In the class scope:
Object lockMe = new Object();
//In the function
lock (lockMe)
{
results.AddRange(tmpResults);
}
Basically, a lock means that only one thread at a time can have access to that critical section.
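Put together with the question's loop, the lock approach might look like the sketch below. The providers parameter is a hypothetical stand-in for the question's Providers collection, modeled as delegates that return a list of strings, and the lock object is local to the call rather than a class field.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class LockedSearch
{
    public static List<string> Search(
        IEnumerable<Func<string, List<string>>> providers, string title)
    {
        var results = new List<string>();
        var gate = new object();

        Parallel.ForEach(providers, provider =>
        {
            // The slow call runs outside the lock, so providers still overlap.
            List<string> tmpResults = provider(title);

            // Only the short append is serialized.
            lock (gate)
            {
                results.AddRange(tmpResults);
            }
        });

        return results;
    }
}
```

Keeping the network call outside the lock matters: locking around the whole body would make the providers run one after another.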
For those who prefer code:
public static ConcurrentBag<SearchResult> Search(string title)
{
    var results = new ConcurrentBag<SearchResult>();
    Parallel.ForEach(Providers, currentProvider =>
    {
        // SearchTitle returns a list, so add its items one by one.
        foreach (var result in currentProvider.SearchTitle(title))
        {
            results.Add(result);
        }
    });
    return results;
}
The Concurrent Collections are new for .Net 4; they are designed to work with the new parallel functionality.
See Concurrent Collections in the .NET Framework 4:
Before .NET 4, you had to provide your own synchronization mechanisms if multiple threads might be accessing a single shared collection. You had to lock the collection ...
... the [new] classes and interfaces in System.Collections.Concurrent [added in .NET 4] provide a consistent implementation for [...] multi-threaded programming problems involving shared data across threads.
This could be expressed concisely using PLINQ's AsParallel and SelectMany:
public static List<SearchResult> Search(string title)
{
return Providers.AsParallel()
.SelectMany(p => p.SearchTitle(title))
.ToList();
}