In my real application I need to iterate over a collection that can be modified from another thread, so I need to copy the collection before iterating over it. I reduced the problem to a small example, but apparently my lack of understanding of locks and threads results in a System.ArgumentException. I tried different things with lock, but the result is the same.
class Program
{
static List<int> list;
static void Main(string[] args)
{
list = new List<int>();
for (int i = 0; i < 1000000; i++)
{
list.Add(i);
if (i == 1000)
{
Thread t = new Thread(new ThreadStart(WorkThreadFunction));
t.Start();
}
}
}
static void WorkThreadFunction()
{
lock (list)
{
List<int> tmp = list.ToList(); //Exception here!
Console.WriteLine(list.Count);
}
}
}
Option 1:
Here's a modified version of your code:
class Program
{
static List<int> list;
static void Main(string[] args)
{
list = new List<int>();
for (int i = 0; i < 1000000; i++)
{
lock (list) //Lock before modification
{
list.Add(i);
}
if (i == 1000)
{
Thread t = new Thread(new ThreadStart(WorkThreadFunction));
t.Start();
}
}
Console.ReadLine();
}
static void WorkThreadFunction()
{
lock (list)
{
List<int> tmp = list.ToList(); // Safe now: Add takes the same lock
Console.WriteLine(list.Count);
}
}
}
What happens in your code is that the list is modified while ToList() is copying it, which is where the ArgumentException comes from. To avoid that, every access to the list, including the Add call, has to take the same lock, as shown above.
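If the goal is only to iterate over a stable copy, the lock can be held just for the copy and the iteration can happen outside it. This is a minimal sketch under that assumption; the names Gate, Shared, Add and TakeSnapshot are made up for the illustration and are not from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

static class SnapshotUnderLock
{
    // Hypothetical dedicated lock object and shared list for this sketch.
    static readonly object Gate = new object();
    static readonly List<int> Shared = new List<int>();

    // Writers take the lock for every modification.
    public static void Add(int value)
    {
        lock (Gate)
        {
            Shared.Add(value);
        }
    }

    // Readers copy the list under the same lock and then work on the copy.
    public static List<int> TakeSnapshot()
    {
        lock (Gate)
        {
            return new List<int>(Shared);
        }
    }

    static void Main()
    {
        var writer = new Thread(() =>
        {
            for (int i = 0; i < 100000; i++) Add(i);
        });
        writer.Start();

        // The snapshot is private to this thread, so no lock is needed here.
        List<int> snapshot = TakeSnapshot();
        long sum = 0;
        foreach (int item in snapshot) sum += item;

        writer.Join();
        Console.WriteLine($"Snapshot: {snapshot.Count} items (sum {sum}), final: {TakeSnapshot().Count} items.");
    }
}
```

The copy still costs O(n) while the lock is held, so this only helps when the work done per item is more expensive than the copy itself.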
Option 2: (No lock)
Using Concurrent collections to remove the lock:
using System.Collections.Concurrent;
//Change this line
static List<int> list;
//To this line
static ConcurrentBag<int> list;
And remove all lock statements.
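As a rough sketch of what that looks like: ConcurrentBag<T>.ToArray() returns a point-in-time copy and can be called while another thread is still adding. Keep in mind that a bag is unordered, so the copy will not be in insertion order. The helper name SnapshotWhileWriting is made up for this example.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

static class BagSnapshotDemo
{
    // Adds `count` items on a background task while the caller takes a snapshot.
    public static int[] SnapshotWhileWriting(ConcurrentBag<int> bag, int count)
    {
        Task writer = Task.Run(() =>
        {
            for (int i = 0; i < count; i++) bag.Add(i);
        });

        // Safe even though the writer may still be running; no lock required.
        int[] snapshot = bag.ToArray();

        writer.Wait();
        return snapshot;
    }

    static void Main()
    {
        var bag = new ConcurrentBag<int>();
        int[] snapshot = SnapshotWhileWriting(bag, 100000);
        Console.WriteLine($"Snapshot: {snapshot.Length} items, final: {bag.Count} items.");
    }
}
```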
I see some issues in your algorithm, and maybe you should refactor it. Whether you use locks or the ConcurrentBag class, you should realize that copying the entire collection into a new one just to enumerate it is an expensive, time-consuming operation, and while it runs no other thread can work with the collection efficiently.
lock (list)
{
// VERY LONG OPERATION HERE
List<int> tmp = list.ToList(); //Exception here!
Console.WriteLine(list.Count);
}
You really shouldn't hold a lock on the collection for that long: while the copy runs, every other thread that needs the list is blocked. Consider using the TPL classes for this instead of creating Threads directly.
Another option is to implement an optimistic lock-free algorithm that double-checks a version number of the collection, or even a lock-free and wait-free algorithm that stores a snapshot of the collection and checks it inside the methods that access the collection. Additional information can be found here.
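To make the snapshot idea a bit more concrete, here is one possible shape of it, a copy-on-write list built on Interlocked.CompareExchange. It is only a sketch of the general technique, not the algorithm from the link; the class name CopyOnWriteList is invented here. Readers never lock, but every Add copies the whole array, so it only pays off when reads greatly outnumber writes.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical copy-on-write list: the published array is never mutated.
sealed class CopyOnWriteList
{
    private int[] _items = Array.Empty<int>();

    // Readers get the current array and can enumerate it without any lock.
    public int[] Snapshot
    {
        get { return Volatile.Read(ref _items); }
    }

    public void Add(int value)
    {
        while (true)
        {
            int[] current = Volatile.Read(ref _items);
            var next = new int[current.Length + 1];
            Array.Copy(current, next, current.Length);
            next[current.Length] = value;

            // Publish only if no other writer replaced the array meanwhile;
            // otherwise retry against the newer array (optimistic update).
            if (Interlocked.CompareExchange(ref _items, next, current) == current)
                return;
        }
    }
}

static class CopyOnWriteDemo
{
    static void Main()
    {
        var list = new CopyOnWriteList();
        Parallel.For(0, 4, worker =>
        {
            for (int i = 0; i < 1000; i++) list.Add(i);
        });
        Console.WriteLine(list.Snapshot.Length); // prints 4000
    }
}
```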
I think the information you gave isn't enough to suggest the right way to solve your issue.
Tried Joel's suggestions. ConcurrentBag was very slow. Locking on each of millions of iterations seems inefficient. It looks like event wait handles work well in this case (about 3 times faster than with locks on my PC).
class Program
{
static List<int> list;
static ManualResetEventSlim mres = new ManualResetEventSlim(false);
static void Main(string[] args)
{
list = new List<int>();
for (int i = 0; i < 10000000; i++)
{
list.Add(i);
if (i == 1000)
{
Thread t = new Thread(new ThreadStart(WorkThreadFunction));
t.Start();
mres.Wait();
}
}
}
static void WorkThreadFunction()
{
List<int> tmp = list.ToList();
Console.WriteLine(list.Count);
mres.Set();
}
}
I have a multi-line textbox and I want to process each line with multi threads.
The textbox could have a lot of lines (1000+), but not as many threads. I want to use a configurable number of threads to read all those 1000+ lines without duplicates (each thread should read only lines that no other thread has read).
What I have right now:
private void button5_Click(object sender, EventArgs e)
{
for (int i = 0; i < threadCount; i++)
{
new Thread(new ThreadStart(threadJob)).Start();
}
}
private void threadJob()
{
for (int i = 0; i < txtSearchTerms.Lines.Length; i++)
{
lock (threadLock)
{
Console.WriteLine(txtSearchTerms.Lines[i]);
}
}
}
It starts the correct number of threads, but every thread reads all the lines, so each line is printed multiple times.
Separate collecting the data, processing it, and handling the results. You can safely collect results calculated in parallel in a ConcurrentBag<T>, which is a thread-safe collection.
Then you don't need to worry about locking, and each line will be processed only once.
1. Collect data
2. Execute collected data in parallel
3. Handle calculated result
private string Process(string line)
{
// Your logic for the given line; returning the input is only a placeholder
return line;
}
private void Button_Click(object sender, EventArgs e)
{
var results = new ConcurrentBag<string>();
Parallel.ForEach(txtSearchTerms.Lines,
line =>
{
var result = Process(line);
results.Add(result);
});
foreach (var result in results)
{
Console.WriteLine(result);
}
}
By default, Parallel.ForEach will use as many threads as the underlying scheduler provides.
You can limit the number of threads by passing a ParallelOptions instance to the Parallel.ForEach method.
var options = new ParallelOptions
{
MaxDegreeOfParallelism = Environment.ProcessorCount
};
var results = new ConcurrentBag<string>();
Parallel.ForEach(values,
options,
value =>
{
var result = Process(value);
results.Add(result);
});
Consider using Parallel.ForEach to iterate over the Lines array. It is just like a normal foreach loop (i.e. each value will be processed only once), but the work is done in parallel - with multiple Tasks (threads).
var data = txtSearchTerms.Lines;
var threadCount = 4; // or whatever you want
Parallel.ForEach(data,
new ParallelOptions() { MaxDegreeOfParallelism = threadCount },
(val) =>
{
//Your code here
Console.WriteLine(val);
});
The above code will need this line to be added at the top of your file:
using System.Threading.Tasks;
Alternatively, if you want to not just execute something but also return / project something, try PLINQ instead:
var results = data.AsParallel()
.AsOrdered() // keep the results in input order
.WithDegreeOfParallelism(threadCount)
.Select(val =>
{
// Your code here, I just return the value but you could return whatever you want
return val;
}).ToList();
which still executes the code in parallel, but also returns a List (in this case with the same values as in the original TextBox). Because of AsOrdered(), the List will be in the same order as your input. This needs using System.Linq; at the top of your file.
There are many ways to do what you want.
Take an extra class field:
private int _counter;
Use it instead of loop index. Increment it inside the lock:
private void threadJob()
{
while (true)
{
lock (threadLock)
{
if (_counter >= txtSearchTerms.Lines.Length)
return;
Console.WriteLine(txtSearchTerms.Lines[_counter]);
_counter++;
}
}
}
It works, but it is very inefficient.
Let's consider another way: each thread handles its own part of the data set independently of the others.
public void button5_Click(object sender, EventArgs e)
{
for (int i = 0; i < threadCount; i++)
{
new Thread(new ParameterizedThreadStart(threadJob)).Start(i);
}
}
private void threadJob(object o)
{
int threadNumber = (int)o;
int count = txtSearchTerms.Lines.Length / threadCount;
int start = threadNumber * count;
int end = threadNumber != threadCount - 1 ? start + count : txtSearchTerms.Lines.Length;
for (int i = start; i < end; i++)
{
Console.WriteLine(txtSearchTerms.Lines[i]);
}
}
This is more efficient because the threads do not wait on a lock. However, the lines are no longer processed in their original order overall.
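The code above gives the whole remainder to the last thread. A small variation, sketched below, spreads the remainder one element at a time over the first threads so the chunks differ by at most one line. The helper GetRange is a made-up name for this illustration.

```csharp
using System;

static class RangePartition
{
    // Computes the half-open range [start, end) handled by worker `index`
    // when `total` items are split across `parts` workers.
    public static void GetRange(int total, int parts, int index, out int start, out int end)
    {
        int baseSize = total / parts;
        int remainder = total % parts;

        // The first `remainder` workers each take one extra item.
        start = index * baseSize + Math.Min(index, remainder);
        end = start + baseSize + (index < remainder ? 1 : 0);
    }

    static void Main()
    {
        // 10 lines over 3 threads gives the ranges [0,4), [4,7) and [7,10).
        for (int i = 0; i < 3; i++)
        {
            int start, end;
            GetRange(10, 3, i, out start, out end);
            Console.WriteLine($"thread {i}: [{start}, {end})");
        }
    }
}
```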
So I have a question about System.Collections.Concurrent.
I saw that the Concurrent collections are actually thread-safe collections, but in which cases are they helpful?
I made 2 examples and the results are the same.
First the ConcurrentQueue:
static ConcurrentQueue<int> queue = new ConcurrentQueue<int>();
private static readonly object obj = new object();
static int i = 0;
static int Num = 0;
static void Run(object loopNum)
{
lock (obj)
{
for (int N = 0; N < 10; N++)
{
queue.Enqueue (i);
Thread.Sleep(250);
queue.TryDequeue(out Num);
Console.WriteLine($"{Num} Added! in {loopNum} Loop, ThreadID: [{Thread.CurrentThread.ManagedThreadId}]");
i++;
}
}
}
And now the normal Queue:
static Queue<int> queue = new Queue<int>();
private static readonly object obj = new object();
static int i = 0;
static void Run(object loopNum)
{
lock (obj)
{
for (int N = 0; N < 10; N++)
{
queue.Enqueue (i);
Thread.Sleep(250);
Console.WriteLine($"{queue.Dequeue()} Added! in {loopNum} Loop, ThreadID: [{Thread.CurrentThread.ManagedThreadId}]");
i++;
}
}
}
Main:
static void Main()
{
Thread[] Th = new Thread[] { new Thread(Run), new Thread(Run) };
Th[0].Start("First");
Th[1].Start("Second");
Console.ReadKey();
}
The results are the same.
Sure, it has some different methods like TryDequeue and a few more, but what is it really helpful for?
Any help will be very appreciated :)
Don't use lock() in conjunction with ConcurrentQueue<> or similar items in that namespace. It's detrimental to performance.
You can use ConcurrentQueue<> safely with multiple threads and have great performance. The same cannot be said of lock() and regular collections.
Your results are the same because the lock serializes both versions.
The reason for using ConcurrentQueue<T> is to avoid writing your own locking code.
If you have multiple threads adding or removing items from a Queue<T> without synchronization, you are likely to get an exception or a corrupted queue. Using a ConcurrentQueue<T> avoids that.
Here's a sample program which will likely cause an exception when using multiple threads to write to a Queue<T> while it works with a ConcurrentQueue<T>:
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;
internal class Program
{
private static void Main()
{
var queue1 = new ConcurrentQueue<int>();
var queue2 = new Queue<int>();
// This will work fine.
var task1 = Task.Run(() => producer(item => queue1.Enqueue(item)));
var task2 = Task.Run(() => producer(item => queue1.Enqueue(item)));
Task.WaitAll(task1, task2);
// This is likely to throw, although not on every run, because it is a race.
var task3 = Task.Run(() => producer(item => queue2.Enqueue(item)));
var task4 = Task.Run(() => producer(item => queue2.Enqueue(item)));
Task.WaitAll(task3, task4);
}
private static void producer(Action<int> add)
{
for (int i = 0; i < 10000; ++i)
add(i);
}
}
Try running it and see what happens.
When you are using the lock construct, your code effectively executes in sequence, not in parallel. This solution is suitable for the version with simple Queue as it's not thread-safe, but with ConcurrentQueue, using lock kinda defeats the purpose. Remove the lock for ConcurrentQueue, remove the Thread.Sleep, and use 20 threads instead of 2 just for kicks. You can use Parallel.For() method to spawn your tasks.
Parallel.For(0, 20, i => Run(i));
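Put together, a lock-free version might look roughly like the sketch below. It is not the asker's Run method; the class and method names are invented, and the counter is only there to show that nothing gets lost when 20 workers hit the same ConcurrentQueue without a lock.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

static class LockFreeQueueDemo
{
    // Returns how many items were dequeued plus how many are still queued.
    public static int Run(int workers, int itemsPerWorker)
    {
        var queue = new ConcurrentQueue<int>();
        int dequeued = 0;

        Parallel.For(0, workers, worker =>
        {
            for (int n = 0; n < itemsPerWorker; n++)
            {
                queue.Enqueue(n);

                // TryDequeue returns false instead of throwing when the queue is empty.
                int item;
                if (queue.TryDequeue(out item))
                    Interlocked.Increment(ref dequeued);
            }
        });

        return dequeued + queue.Count;
    }

    static void Main()
    {
        Console.WriteLine(Run(20, 10)); // prints 200: no item is lost or duplicated
    }
}
```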
Thank you everyone for all your answers, they really helped me out. I appreciate it a lot.
By the way, Matthew Watson, your example sometimes throws an exception and sometimes doesn't, so I made a better example, but yeah, I get the point.
private static void Main()
{
var queue1 = new ConcurrentQueue<int>();
var queue2 = new Queue<int>();
// This will work fine.
var task1 = Enumerable.Range(0, 40)
.Select(_ => Task.Run(() => producer(item => queue1.Enqueue(item))))
.ToArray();
Task.WaitAll(task1);
// This will cause an exception.
var task2 = Enumerable.Range(0, 40)
.Select(_ => Task.Run(() => producer(item => queue2.Enqueue(item))))
.ToArray();
Task.WaitAll(task2);
}
Thanks again :)
Here's what I'm trying to do:
Get one html page from url which contains multiple links inside
Visit each link
Extract some data from visited link and create object using it
So far, all I did is the simple and slow way:
public List<Link> searchLinks(string name)
{
List<Link> foundLinks = new List<Link>();
// getHtmlDocument() just returns HtmlDocument using input url.
HtmlDocument doc = getHtmlDocument(AU_SEARCH_URL + fixSpaces(name));
var link_list = doc.DocumentNode.SelectNodes(@"/html/body/div[@id='parent-container']/div[@id='main-content']/ol[@id='searchresult']/li/h2/a");
foreach (var link in link_list)
{
// TODO Threads
// getObject() creates object using data gathered
foundLinks.Add(getObject(link.InnerText, link.Attributes["href"].Value, getLatestEpisode(link.Attributes["href"].Value)));
}
return foundLinks;
}
To make it faster I need to use threads, but I'm not sure how to approach it. I can't just start threads at random, because I need to wait for them to finish. thread.Join() kind of solves the 'wait for threads to finish' problem, but I think it would no longer be fast, because each thread would only be launched after the previous one has finished.
The simplest way to offload the work to multiple threads would be to use Parallel.ForEach() in place of your current loop. Something like this:
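Parallel.ForEach(link_list, link =>
{
var item = getObject(link.InnerText, link.Attributes["href"].Value, getLatestEpisode(link.Attributes["href"].Value));
// List<T> is not thread-safe, so guard the Add with a lock.
lock (foundLinks) { foundLinks.Add(item); }
});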
I'm not sure if there are other threading concerns in your overall code. (Note, for example, that this would no longer guarantee that the data would be added to foundLinks in the same order.) But as long as there's nothing explicitly preventing concurrent work from taking place then this would take advantage of threading over multiple CPU cores to process the work.
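If the order does matter, one way to keep it is to write each result into the array slot that matches its input index, so no shared list (and no lock) is needed. This is a generic sketch, not tied to HtmlAgilityPack; MapInOrder is a hypothetical helper name.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

static class OrderedParallel
{
    // Runs `selector` in parallel but stores result i at index i,
    // so the output order always matches the input order.
    public static TResult[] MapInOrder<TSource, TResult>(
        IList<TSource> source, Func<TSource, TResult> selector)
    {
        var results = new TResult[source.Count];
        Parallel.For(0, source.Count, i =>
        {
            // Each index is written by exactly one iteration, so no lock is needed.
            results[i] = selector(source[i]);
        });
        return results;
    }

    static void Main()
    {
        var urls = new List<string> { "a", "b", "c" };
        string[] upper = MapInOrder(urls, u => u.ToUpperInvariant());
        Console.WriteLine(string.Join(",", upper)); // prints A,B,C
    }
}
```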
Maybe you should use the thread pool.
Example from MSDN:
using System;
using System.Threading;
public class Fibonacci
{
private int _n;
private int _fibOfN;
private ManualResetEvent _doneEvent;
public int N { get { return _n; } }
public int FibOfN { get { return _fibOfN; } }
// Constructor.
public Fibonacci(int n, ManualResetEvent doneEvent)
{
_n = n;
_doneEvent = doneEvent;
}
// Wrapper method for use with thread pool.
public void ThreadPoolCallback(Object threadContext)
{
int threadIndex = (int)threadContext;
Console.WriteLine("thread {0} started...", threadIndex);
_fibOfN = Calculate(_n);
Console.WriteLine("thread {0} result calculated...", threadIndex);
_doneEvent.Set();
}
// Recursive method that calculates the Nth Fibonacci number.
public int Calculate(int n)
{
if (n <= 1)
{
return n;
}
return Calculate(n - 1) + Calculate(n - 2);
}
}
public class ThreadPoolExample
{
static void Main()
{
const int FibonacciCalculations = 10;
// One event is used for each Fibonacci object.
ManualResetEvent[] doneEvents = new ManualResetEvent[FibonacciCalculations];
Fibonacci[] fibArray = new Fibonacci[FibonacciCalculations];
Random r = new Random();
// Configure and start threads using ThreadPool.
Console.WriteLine("launching {0} tasks...", FibonacciCalculations);
for (int i = 0; i < FibonacciCalculations; i++)
{
doneEvents[i] = new ManualResetEvent(false);
Fibonacci f = new Fibonacci(r.Next(20, 40), doneEvents[i]);
fibArray[i] = f;
ThreadPool.QueueUserWorkItem(f.ThreadPoolCallback, i);
}
// Wait for all threads in pool to calculate.
WaitHandle.WaitAll(doneEvents);
Console.WriteLine("All calculations are complete.");
// Display the results.
for (int i= 0; i<FibonacciCalculations; i++)
{
Fibonacci f = fibArray[i];
Console.WriteLine("Fibonacci({0}) = {1}", f.N, f.FibOfN);
}
}
}
I have the following code, which throws SemaphoreFullException, and I don't understand why.
If I change _semaphore = new SemaphoreSlim(0, 2) to
_semaphore = new SemaphoreSlim(0, int.MaxValue)
then all works fine.
Can anyone please find the fault in this code and explain it to me?
class BlockingQueue<T>
{
private Queue<T> _queue = new Queue<T>();
private SemaphoreSlim _semaphore = new SemaphoreSlim(0, 2);
public void Enqueue(T data)
{
if (data == null) throw new ArgumentNullException("data");
lock (_queue)
{
_queue.Enqueue(data);
}
_semaphore.Release();
}
public T Dequeue()
{
_semaphore.Wait();
lock (_queue)
{
return _queue.Dequeue();
}
}
}
public class Test
{
private static BlockingQueue<string> _bq = new BlockingQueue<string>();
public static void Main()
{
for (int i = 0; i < 100; i++)
{
_bq.Enqueue("item-" + i);
}
for (int i = 0; i < 5; i++)
{
Thread t = new Thread(Produce);
t.Start();
}
for (int i = 0; i < 100; i++)
{
Thread t = new Thread(Consume);
t.Start();
}
Console.ReadLine();
}
private static Random _random = new Random();
private static void Produce()
{
while (true)
{
_bq.Enqueue("item-" + _random.Next());
Thread.Sleep(2000);
}
}
private static void Consume()
{
while (true)
{
Console.WriteLine("Consumed-" + _bq.Dequeue());
Thread.Sleep(1000);
}
}
}
If you want to use the semaphore to control the number of concurrent threads, you're using it wrong. You should acquire the semaphore when you dequeue an item, and release the semaphore when the thread is done processing that item.
What you have right now is a system that allows only two items to be in the queue at any one time. Your semaphore starts with a count of 0 and a maximum of 2. Each time you enqueue an item, Release increments the count. After two items the count is at its maximum of 2, and if you try to release again you're going to get a SemaphoreFullException.
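You can see the counting behaviour in isolation with a few lines. This is just a demonstration sketch (the class and method names are made up), using the same SemaphoreSlim(0, 2) as in the question:

```csharp
using System;
using System.Threading;

static class SemaphoreCountDemo
{
    // Returns true if the third Release on a (0, 2) semaphore throws.
    public static bool ThirdReleaseThrows()
    {
        using (var semaphore = new SemaphoreSlim(0, 2))
        {
            semaphore.Release(); // count 0 -> 1
            semaphore.Release(); // count 1 -> 2, which is the maximum
            try
            {
                semaphore.Release(); // would exceed the maximum
                return false;
            }
            catch (SemaphoreFullException)
            {
                return true;
            }
        }
    }

    static void Main()
    {
        Console.WriteLine(ThirdReleaseThrows()); // prints True
    }
}
```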
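If you really want to do this with a semaphore, you need to remove the Release call from the Enqueue method, create the semaphore with an initial count of 2 (new SemaphoreSlim(2, 2)) so that two consumers can proceed, and add a Release method to the BlockingQueue class that releases the semaphore. You would then write: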
private static void Consume()
{
while (true)
{
Console.WriteLine("Consumed-" + _bq.Dequeue());
Thread.Sleep(1000);
_bq.Release();
}
}
That would make your code work, but it's not a very good solution. A much better solution would be to use BlockingCollection<T> and two persistent consumers. Something like:
private BlockingCollection<int> bq = new BlockingCollection<int>();
void Test()
{
// create two consumers
var c1 = new Thread(Consume);
var c2 = new Thread(Consume);
c1.Start();
c2.Start();
// produce
for (var i = 0; i < 100; ++i)
{
bq.Add(i);
}
bq.CompleteAdding();
c1.Join();
c2.Join();
}
void Consume()
{
foreach (var i in bq.GetConsumingEnumerable())
{
Console.WriteLine("Consumed-" + i);
Thread.Sleep(1000);
}
}
That gives you two persistent threads consuming the items. The benefit is that you avoid the cost of spinning up a new thread (or having the RTL assign a pool thread) for each item. Instead, the threads do non-busy waits on the queue. You also don't have to worry about explicit locking, etc. The code is simpler, more robust, and much less likely to contain a bug.
I'm running some experiments based on the .NET thread-safe and non-thread-safe dictionaries, as well as my custom one.
The results for writing 20,000,000 (20 million) ints to each are as follows:
1. Non-thread safe (Dictionary): 909 milliseconds (less than 1 second)
2. Thread safe (ConcurrentDictionary): 11914 milliseconds (more than 11 seconds)
3. Custom (2 dictionaries): 909 milliseconds (less than 1 second)
4. Thread safe (ConcurrentTryAdd): 12697 milliseconds (more than 12 seconds), no better than #2
These tests were conducted in a single-threaded environment. I'm trying to get the speed of the non-thread-safe dictionary with the safety of the thread-safe one.
The results are promising so far. I'm surprised how poorly the ConcurrentDictionary performed; maybe it's meant for certain scenarios only?
Anyway, below is the code I used to test the three dictionaries. Can you tell me if my custom one is thread safe? Do I have to add a lock to if (_list.ContainsKey(threadId))? I don't think so, since it's only a read, and when the dictionary has an element added to it (a write) it's protected by a lock, blocking other threads trying to read it.
There are no locks once the thread has the dictionary, because another thread cannot write to that same dictionary, since each thread gets its own dictionary (based on the ManagedThreadId), making it as safe as a single thread.
Main
using System;
using System.Diagnostics;
namespace LockFreeTests
{
class Program
{
static void Main(string[] args)
{
var sw = Stopwatch.StartNew();
int i = 20000000; // 20 million
IWork work = new Custom(); // Replace with: Control(), Concurrent(), or Custom()
work.Start(i);
sw.Stop();
Console.WriteLine("Total time: {0}\r\nPress anykey to continue...", sw.Elapsed.TotalMilliseconds);
Console.ReadKey(true);
}
}
}
Non-thread safe
using System.Collections.Generic;
namespace LockFreeTests
{
class Control : IWork
{
public void Start(int i)
{
var list = new Dictionary<int, int>();
for (int n = 0; n < i; n++)
{
list.Add(n, n);
}
}
}
}
Thread safe
using System.Collections.Concurrent;
namespace LockFreeTests
{
class Concurrent : IWork
{
public void Start(int i)
{
var list = new ConcurrentDictionary<int, int>();
for (int n = 0; n < i; n++)
{
list.AddOrUpdate(n, n, (a, b) => b);
}
}
}
}
Thread Safe (try add)
using System.Collections.Concurrent;
namespace LockFreeTests
{
class ConcurrentTryAdd : IWork
{
public void Start(int i)
{
var list = new ConcurrentDictionary<int, int>();
for (int n = 0; n < i; n++)
{
bool result = list.TryAdd(n, n);
if (!result)
{
n--;
}
}
}
}
}
Custom
using System.Collections.Generic;
using System.Threading;
namespace LockFreeTests
{
class Custom : IWork
{
private static Dictionary<int, Dictionary<int, int>> _list = null;
static Custom()
{
_list = new Dictionary<int, Dictionary<int, int>>();
}
public void Start(int i)
{
int threadId = Thread.CurrentThread.ManagedThreadId;
Dictionary<int, int> threadList = null;
bool firstTime = false;
lock (_list)
{
if (_list.ContainsKey(threadId))
{
threadList = _list[threadId];
}
else
{
threadList = new Dictionary<int, int>();
firstTime = true;
}
}
for (int n = 0; n < i; n++)
{
threadList.Add(n, n);
}
if (firstTime)
{
lock (_list)
{
_list.Add(threadId, threadList);
}
}
}
}
}
IWork
namespace LockFreeTests
{
public interface IWork
{
void Start(int i);
}
}
Multi-threaded Example
using System;
using System.Diagnostics;
using System.Threading.Tasks;
namespace LockFreeTests
{
class Program
{
static void Main(string[] args)
{
var sw = Stopwatch.StartNew();
int totalWork = 20000000; // 20 million
int cores = Environment.ProcessorCount;
int workPerCore = totalWork / cores;
IWork work = new Custom(); // Replace with: Control(), Concurrent(), ConcurrentTryAdd(), or Custom()
var tasks = new Task[cores];
for (int n = 0; n < cores; n++)
{
tasks[n] = Task.Factory.StartNew(() =>
{
work.Start(workPerCore);
});
}
Task.WaitAll(tasks);
sw.Stop();
Console.WriteLine("Total time: {0}\r\nPress anykey to continue...", sw.Elapsed.TotalMilliseconds);
Console.ReadKey(true);
}
}
}
The above code runs in 528 milliseconds, which is a 40% speed improvement over the single-threaded test.
It's not thread-safe.
Do I have to add a lock to if (_list.ContainsKey(threadId))? I don't think so, since it's only a read, and when the dictionary has an element added to it (a write) it's protected by a lock, blocking other threads trying to read it.
Yes, you do need a lock here to make it thread-safe.
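If what you are after is one dictionary per thread, a different route is to let ThreadLocal<T> manage that for you instead of a shared outer dictionary keyed by ManagedThreadId. Note that this swaps in another technique rather than fixing your class, and the names below are made up for the sketch. With trackAllValues set to true you can still collect every thread's dictionary afterwards.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

static class PerThreadDictionaries
{
    // Each worker thread fills its own dictionary; returns the total entry count.
    public static int FillPerThread(int workers, int itemsPerWorker)
    {
        using (var local = new ThreadLocal<Dictionary<int, int>>(
            () => new Dictionary<int, int>(), trackAllValues: true))
        {
            var threads = new Thread[workers];
            for (int t = 0; t < workers; t++)
            {
                threads[t] = new Thread(() =>
                {
                    // local.Value is created lazily, once per thread, with no shared lock.
                    Dictionary<int, int> mine = local.Value;
                    for (int n = 0; n < itemsPerWorker; n++) mine[n] = n;
                });
                threads[t].Start();
            }
            foreach (Thread thread in threads) thread.Join();

            // Values exposes the dictionaries created by all threads.
            int total = 0;
            foreach (Dictionary<int, int> dictionary in local.Values)
                total += dictionary.Count;
            return total;
        }
    }

    static void Main()
    {
        Console.WriteLine(FillPerThread(4, 1000)); // prints 4000
    }
}
```

The sketch uses dedicated threads on purpose: pool threads are reused between tasks, so with Task.Factory.StartNew two tasks can end up sharing one per-thread dictionary.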
I just wrote about my lock-free thread-safe copy-on-write dictionary implementation here:
http://www.singulink.com/CodeIndex/post/fastest-thread-safe-lock-free-dictionary
It is very fast for quick bursts of writes, and lookups usually run at 100% of standard Dictionary speed without locking. If you write occasionally and read often, this is the fastest option available.