How to prevent threads using the same variables - c#

I have a multi-line textbox and I want to process each line with multi threads.
The textbox could have a lot of lines (1000+), but not as many threads. I want to use custom amount of threads to read all those 1000+ lines without any duplicates (as in each thread reading UNIQUE lines only, if a line has been read by other thread, not to read it again).
What I have right now:
private void button5_Click(object sender, EventArgs e)
{
for (int i = 0; i < threadCount; i++)
{
new Thread(new ThreadStart(threadJob)).Start();
}
}
private void threadJob()
{
for (int i = 0; i < txtSearchTerms.Lines.Length; i++)
{
lock (threadLock)
{
Console.WriteLine(txtSearchTerms.Lines[i]);
}
}
}
It does start the correct amount of threads, but they all read the same variable multiple times.

Separate data collection and data processing and next possible steps after calculation. You can safely collect results calculated in parallel by using ConcurrentBag<T>, which is simply thread-safe collection.
Then you don't need to worry about "locking" objects and all lines will be "processed" only once.
1. Collect data
2. Execute collected data in parallel
3. Handle calculated result
private string Process(string line)
{
// Your logic for given line
}
private void Button_Click(object sender, EventArgs e)
{
var results = new ConcurrentBag<string>();
Parallel.ForEach(txtSearchTerms.Lines,
line =>
{
var result = Process(line);
results.Add(result);
});
foreach (var result in results)
{
Console.WriteLine(result);
}
}
By default Parallel.ForEach will use as much threads as underlying scheduler provides.
You can control amount of used threads by passing instance of ParallelOptions to the Parallel.ForEach method.
var options = new ParallelOptions
{
MaxDegreeOfParallelism = Environment.ProcessorCount
};
var results = new ConcurrentBag<string>();
Parallel.ForEach(values,
options,
value =>
{
var result = Process(value);
results.Add(result);
});

Consider using Parallel.ForEach to iterate over the Lines array. It is just like a normal foreach loop (i.e. each value will be processed only once), but the work is done in parallel - with multiple Tasks (threads).
var data = txtSearchTerms.Lines;
var threadCount = 4; // or whatever you want
Parallel.ForEach(data,
new ParallelOptions() { MaxDegreeOfParallelism = threadCount },
(val) =>
{
//Your code here
Console.WriteLine(val);
});
The above code will need this line to be added at the top of your file:
using System.Threading.Tasks;
Alternatively if you want to not just execute something, but also return / project something then instead try:
var results = data.AsParallel(new ParallelLinqOptions()
{
MaxDegreeOfParallelism = threadCount
}).Select(val =>
{
// Your code here, I just return the value but you could return whatever you want
return val;
}).ToList();
which still executes the code in parallel, but also returns a List (in this case with the same values in the original TextBox). And most importantly, the List will be in the same order as your input.

There many ways to do it what you want.
Take an extra class field:
private int _counter;
Use it instead of loop index. Increment it inside the lock:
private void threadJob()
{
while (true)
{
lock (threadLock)
{
if (_counter >= txtSearchTerms.Lines.Length)
return;
Console.WriteLine(txtSearchTerms.Lines[_counter]);
_counter++;
}
}
}
It works, but it very inefficient.
Lets consider another way. Each thread will handle its part of the dataset independently from the others.
public void button5_Click(object sender, EventArgs e)
{
for (int i = 0; i < threadCount; i++)
{
new Thread(new ParameterizedThreadStart(threadJob)).Start(i);
}
}
private void threadJob(object o)
{
int threadNumber = (int)o;
int count = txtSearchTerms.Lines.Length / threadCount;
int start = threadNumber * count;
int end = threadNumber != threadCount - 1 ? start + count : txtSearchTerms.Lines.Length;
for (int i = start; i < end; i++)
{
Console.WriteLine(txtSearchTerms.Lines[i]);
}
}
This is more efficient because threads do not wait on the lock. However, the array elements are processed not in a general manner.

Related

Tasks are starting sequencial instead of parallel

I need to start tasks in parallel, but I choose to use Task.Run instead of Parallel.Foreach, so I can get some feedback when all tasks finished and enable UI controls.
private async void buttonStart_Click(object sender, EventArgs e)
{
var cells = objectListView.CheckedObjects;
if(cells != null)
{
List<Task> tasks = new List<Task>();
foreach (Cell c in cells)
{
Cell cell = c;
var progressHandler = new Progress<string>(value =>
{
cell.Status = value;
});
var progress = progressHandler as IProgress<string>;
Task t = Task.Run(() =>
{
progress.Report("Starting...");
int a = 123;
for (int i = 0; i < 200000; i++)
{
a = a + i;
Task.Delay(500).Wait();
}
progress.Report("Done");
});
tasks.Add(t);
}
await Task.WhenAll(tasks);
Console.WriteLine("Done, enabld UI controls");
}
}
So what I expect is that I see in UI "Starting..." almost instantly for all items. What I actually see is first 4 items are "Starting..." (I guess because all 4 CPU cores are used per thread), then each second or less new item is "Starting". I have total 37 items and it takes around 30 seconds for all items to start all tasks.
How can I make it as parallel as possible?
How can I make it as parallel as possible?
The part of inner for loop is simulating long running CPU-bound job, which I would like to start at the same time as much as possible.
It's already as parallel as possible. Starting 37 threads that all have CPU-bound work to do will not make it go any faster, since you're apparently running it on a 4-core machine. There are 4 cores, so only 4 threads can actually run at a time. The other 33 threads are going to be waiting while 4 are running. They would only appear to run simultaneously.
That said, if you really want to start up all those thread pool threads, you can do this by calling ThreadPool.SetMinThreads.
I need to start tasks in parallel, but I choose to use Task.Run instead of Parallel.Foreach, so I can get some feedback when all tasks finished and enable UI controls.
Since you have parallel work to do, you should use Parallel. If you want the nice resume-on-the-UI-thread behavior of await, then you can use a single await Task.Run, something like this:
private async void buttonStart_Click(object sender, EventArgs e)
{
var cells = objectListView.CheckedObjects;
if (cells == null)
return;
var workItems = cells.Select(c => new
{
Cell = c,
Progress = new Progress<string>(value => { c.Status = value; }),
}).ToList();
await Task.Run(() => Parallel.ForEach(workItems, item =>
{
var progress = item.Progress as IProgress<string>();
progress.Report("Starting...");
int a = 123;
for (int i = 0; i < 200000; i++)
{
a = a + i;
Thread.Sleep(500);
}
progress.Report("Done");
}));
Console.WriteLine("Done, enabld UI controls");
}
I'd say, it is as parallel as possible. If you have 4 cores, you can run 4 threads in parallel.
If you can do stuff while waiting for the "delay", have a look into asynchronous programming (where one thread can run multiple tasks "at once", because most of them are waiting for something).
EDIT: you can also run Parallel.ForEach in its own task and await that:
private async void buttonStart_Click(object sender, EventArgs e)
{
var cells = objectListView.CheckedObjects;
if(cells != null)
{
await Task.Run( () => Parallel.ForEach( cells, c => ... ) );
}
}
I think it relies on your taskcreation-options.
TaskCreationOptions.LongRunning
Here you can find further informations:
https://msdn.microsoft.com/en-us/library/system.threading.tasks.taskcreationoptions(v=vs.110).aspx
But you have to know, that task uses a threadpool with a finite maximum amount of threads. You can use LongRunning to signal, that this task needs a long time and should not clog your pool. I thinks it's more complex to create a long-running task, because the scheduler may create a new thread.
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
namespace TaskTest
{
internal class Program
{
private static void Main(string[] args)
{
var demo = new Program();
demo.SimulateClick();
Console.ReadLine();
}
public void SimulateClick()
{
buttonStart_Click(null, null);
}
private async void buttonStart_Click(object sender, EventArgs e)
{
var tasks = new List<Task>();
for (var i = 0; i < 36; i++)
{
var taskId = i;
var t = Task.Factory.StartNew((() =>
{
Console.WriteLine($"Starting Task ({taskId})");
for (var ii = 0; ii < 200000; ii++)
{
Task.Delay(TimeSpan.FromMilliseconds(500)).Wait();
var s1 = new string(' ', taskId);
var s2 = new string(' ', 36-taskId);
Console.WriteLine($"Updating Task {s1}X{s2} ({taskId})");
}
Console.Write($"Done ({taskId})");
}),TaskCreationOptions.LongRunning);
tasks.Add(t);
}
await Task.WhenAll(tasks);
Console.WriteLine("Done, enabld UI controls");
}
}
}

C# - Lock not working but Volatile AND Lock does?

I am trying to poll an API as fast and as efficiently as possible to get market data. The API allows you to get market data from batchSize markets per request. The API allows you to have 3 concurrent requests but no more (or throws errors).
I may be requesting data from many more than batchSize different markets.
I continuously loop through all of the markets, requesting the data in batches, one batch per thread and 3 threads at any time.
The total number of markets (and hence batches) can change at any time.
I'm using the following code:
private static object lockObj = new object();
private void PollMarkets()
{
const int NumberOfConcurrentRequests = 3;
for (int i = 0; i < NumberOfConcurrentRequests; i++)
{
int batch = 0;
Task.Factory.StartNew(async () =>
{
while (true)
{
if (markets.Count > 0)
{
List<string> batchMarketIds;
lock (lockObj)
{
var numBatches = (int)Math.Ceiling((double)markets.Count / batchSize);
batchMarketIds = markets.Keys.Skip(batch*batchSize).Take(batchSize).ToList();
batch = (batch + 1) % numBatches;
}
var marketData = await GetMarketData(batchMarketIds);
// Do something with marketData
}
else
{
await Task.Delay(1000); // wait for some markets to be added.
}
}
}
});
}
}
Even though there is a lock in the critical section, each thread starts with batch = 0 (each thread is often polling for duplicate data).
If I change batch to a private volatile field the above code works as I want it to (volatile and lock).
So for some reason my lock doesn't work? I feel like it's something obvious but I'm missing it.
I believe that it is best here to use a lock instead of a volatile field, is this also correct?
Thanks
The issue was that you were defining the batch variable inside the for loop. That meant that the threads were using their own variable instead of sharing it.
In my mind you should use Queue<> to create a jobs pipeline.
Something like this
private int batchSize = 10;
private Queue<int> queue = new Queue<int>();
private void AddMarket(params int[] marketIDs)
{
lock (queue)
{
foreach (var marketID in marketIDs)
{
queue.Enqueue(marketID);
}
if (queue.Count >= batchSize)
{
Monitor.Pulse(queue);
}
}
}
private void Start()
{
for (var tid = 0; tid < 3; tid++)
{
Task.Run(async () =>
{
while (true)
{
List<int> toProcess;
lock (queue)
{
if (queue.Count < batchSize)
{
Monitor.Wait(queue);
continue;
}
toProcess = new List<int>(batchSize);
for (var count = 0; count < batchSize; count++)
{
toProcess.Add(queue.Dequeue());
}
if (queue.Count >= batchSize)
{
Monitor.Pulse(queue);
}
}
var marketData = await GetMarketData(toProcess);
}
});
}
}

System.ArgumentException when locking

In my real application I need to iterate collection, but it can be changed from other thread. So I need to copy collection to iterate on it. I reproduced this problem to small example, but apparently my lack of understanding of locks and threads results in System.ArgumentException. Tried different things with lock, but result is the same.
class Program
{
static List<int> list;
static void Main(string[] args)
{
list = new List<int>();
for (int i = 0; i < 1000000; i++)
{
list.Add(i);
if (i == 1000)
{
Thread t = new Thread(new ThreadStart(WorkThreadFunction));
t.Start();
}
}
}
static void WorkThreadFunction()
{
lock (list)
{
List<int> tmp = list.ToList(); //Exception here!
Console.WriteLine(list.Count);
}
}
}
Option 1:
Here's a modified version of your code:
class Program
{
static List<int> list;
static void Main(string[] args)
{
list = new List<int>();
for (int i = 0; i < 1000000; i++)
{
lock (list) //Lock before modification
{
list.Add(i);
}
if (i == 1000)
{
Thread t = new Thread(new ThreadStart(WorkThreadFunction));
t.Start();
}
}
Console.ReadLine();
}
static void WorkThreadFunction()
{
lock (list)
{
List<int> tmp = list.ToList(); //Exception here!
Console.WriteLine(list.Count);
}
}
}
What happens here is that your list is being modified while being converted to another list collection (where argument exception is happening). So to avoid that you will need to lock the list as shown above.
Option 2: (No lock)
Using Concurrent collections to remove the lock:
using System.Collections.Concurrent;
//Change this line
static List<int> list;
//To this line
static ConcurrentBag<int> list;
And remove all lock statements.
I see some issues in your algorithm, and may be you should refactor it. In case of using either locks or ConcurrentBag class you should realize that the copying entire collection into new one simply for enumeration is very huge and very time-consuming operation, and during it you can't operate with collection efficiently.
lock (list)
{
// VERY LONG OPERATION HERE
List<int> tmp = list.ToList(); //Exception here!
Console.WriteLine(list.Count);
}
You really shouldn't lock collection for such amount of time - at the end of the for loop you have a lot of Threads which are blocking each other. You have to use the TPL classes for this approach and shouldn't use Threads directly.
The other case you can choose is to implement some of optimistic lock-free algorithm with double check for the collection version, or even lock-free and wait-free algorithm with storing the snapshot of the collection and checking for it inside your methods for the collection access. Additional information can be found here.
I think that the information you gave isn't enough to suggest you the right way to solve your issue.
Tried Joel's suggestions. ConcurrentBag was very slow. Locking at each of millions iteration seems inefficient. Looks like Event Wait Handles are good in this case (takes 3 time less than with locks on my pc).
class Program
{
static List<int> list;
static ManualResetEventSlim mres = new ManualResetEventSlim(false);
static void Main(string[] args)
{
list = new List<int>();
for (int i = 0; i < 10000000; i++)
{
list.Add(i);
if (i == 1000)
{
Thread t = new Thread(new ThreadStart(WorkThreadFunction));
t.Start();
mres.Wait();
}
}
}
static void WorkThreadFunction()
{
List<int> tmp = list.ToList();
Console.WriteLine(list.Count);
mres.Set();
}
}

Using threads to parse multiple Html pages faster

Here's what I'm trying to do:
Get one html page from url which contains multiple links inside
Visit each link
Extract some data from visited link and create object using it
So far All i did is just simple and slow way:
public List<Link> searchLinks(string name)
{
List<Link> foundLinks = new List<Link>();
// getHtmlDocument() just returns HtmlDocument using input url.
HtmlDocument doc = getHtmlDocument(AU_SEARCH_URL + fixSpaces(name));
var link_list = doc.DocumentNode.SelectNodes(#"/html/body/div[#id='parent-container']/div[#id='main-content']/ol[#id='searchresult']/li/h2/a");
foreach (var link in link_list)
{
// TODO Threads
// getObject() creates object using data gathered
foundLinks.Add(getObject(link.InnerText, link.Attributes["href"].Value, getLatestEpisode(link.Attributes["href"].Value)));
}
return foundLinks;
}
To make it faster/efficient I need to implement threads, but I'm not sure how i should approach it, because I can't just randomly start threads, I need to wait for them to finish, thread.Join() kind of solves 'wait for threads to finish' problem, but it becomes not fast anymore i think, because threads will be launched after earlier one is finished.
The simplest way to offload the work to multiple threads would be to use Parallel.ForEach() in place of your current loop. Something like this:
Parallel.ForEach(link_list, link =>
{
foundLinks.Add(getObject(link.InnerText, link.Attributes["href"].Value, getLatestEpisode(link.Attributes["href"].Value)));
});
I'm not sure if there are other threading concerns in your overall code. (Note, for example, that this would no longer guarantee that the data would be added to foundLinks in the same order.) But as long as there's nothing explicitly preventing concurrent work from taking place then this would take advantage of threading over multiple CPU cores to process the work.
Maybe you should use Thread pool :
Example from MSDN :
using System;
using System.Threading;
public class Fibonacci
{
private int _n;
private int _fibOfN;
private ManualResetEvent _doneEvent;
public int N { get { return _n; } }
public int FibOfN { get { return _fibOfN; } }
// Constructor.
public Fibonacci(int n, ManualResetEvent doneEvent)
{
_n = n;
_doneEvent = doneEvent;
}
// Wrapper method for use with thread pool.
public void ThreadPoolCallback(Object threadContext)
{
int threadIndex = (int)threadContext;
Console.WriteLine("thread {0} started...", threadIndex);
_fibOfN = Calculate(_n);
Console.WriteLine("thread {0} result calculated...", threadIndex);
_doneEvent.Set();
}
// Recursive method that calculates the Nth Fibonacci number.
public int Calculate(int n)
{
if (n <= 1)
{
return n;
}
return Calculate(n - 1) + Calculate(n - 2);
}
}
public class ThreadPoolExample
{
static void Main()
{
const int FibonacciCalculations = 10;
// One event is used for each Fibonacci object.
ManualResetEvent[] doneEvents = new ManualResetEvent[FibonacciCalculations];
Fibonacci[] fibArray = new Fibonacci[FibonacciCalculations];
Random r = new Random();
// Configure and start threads using ThreadPool.
Console.WriteLine("launching {0} tasks...", FibonacciCalculations);
for (int i = 0; i < FibonacciCalculations; i++)
{
doneEvents[i] = new ManualResetEvent(false);
Fibonacci f = new Fibonacci(r.Next(20, 40), doneEvents[i]);
fibArray[i] = f;
ThreadPool.QueueUserWorkItem(f.ThreadPoolCallback, i);
}
// Wait for all threads in pool to calculate.
WaitHandle.WaitAll(doneEvents);
Console.WriteLine("All calculations are complete.");
// Display the results.
for (int i= 0; i<FibonacciCalculations; i++)
{
Fibonacci f = fibArray[i];
Console.WriteLine("Fibonacci({0}) = {1}", f.N, f.FibOfN);
}
}
}

How to use multi threading in a For loop

I want to achieve the below requirement; please suggest some solution.
string[] filenames = Directory.GetFiles("C:\Temp"); //10 files
for (int i = 0; i < filenames.count; i++)
{
ProcessFile(filenames[i]); //it takes time to execute
}
I wanted to implement multi-threading. e.g There are 10 files. I wanted to process 3 files at a time (configurable, say maxthreadcount). So 3 files will be processed in 3 threads from the for loop and if any thread completes the execution, it should pick the next item from the for loop. Also wanted to ensure all the files are processed before it exits the for loop.
Please suggest best approach.
Try
Parallel.For(0, filenames.Length, i => {
ProcessFile(filenames[i]);
});
MSDN
It's only available since .Net 4. Hope that acceptable.
This will do the job in .net 2.0:
class Program
{
static int workingCounter = 0;
static int workingLimit = 10;
static int processedCounter = 0;
static void Main(string[] args)
{
string[] files = Directory.GetFiles("C:\\Temp");
int checkCount = files.Length;
foreach (string file in files)
{
//wait for free limit...
while (workingCounter >= workingLimit)
{
Thread.Sleep(100);
}
workingCounter += 1;
ParameterizedThreadStart pts = new ParameterizedThreadStart(ProcessFile);
Thread th = new Thread(pts);
th.Start(file);
}
//wait for all threads to complete...
while (processedCounter< checkCount)
{
Thread.Sleep(100);
}
Console.WriteLine("Work completed!");
}
static void ProcessFile(object file)
{
try
{
Console.WriteLine(DateTime.Now.ToString() + " recieved: " + file + " thread count is: " + workingCounter.ToString());
//make some sleep for demo...
Thread.Sleep(2000);
}
catch (Exception ex)
{
//handle your exception...
string exMsg = ex.Message;
}
finally
{
Interlocked.Decrement(ref workingCounter);
Interlocked.Increment(ref processedCounter);
}
}
}
Take a look at the Producer/Consumer Queue example by Joe Albahari. It should provide a good starting point for what you're trying to accomplish.
You could use the ThreadPool.
Example:
ThreadPool.SetMaxThreads(3, 3);
for (int i = 0; i < filenames.count; i++)
{
ThreadPool.QueueUserWorkItem(new WaitCallback(ProcessFile), filenames[i]);
}
static void ProcessFile(object fileNameObj)
{
var fileName = (string)fileNameObj;
// do your processing here.
}
If you are using the ThreadPool elsewhere in your application then this would not be a good solution since it is shared across your app.
You could also grab a different thread pool implementation, for example SmartThreadPool
Rather than starting a thread for each file name, put the file names into a queue and then start up three threads to process them. Or, since the main thread is now free, start up two threads and let the main thread work on it, too:
Queue<string> MyQueue;
void MyProc()
{
string[] filenames = Directory.GetFiles(...);
MyQueue = new Queue(filenames);
// start two threads
Thread t1 = new Thread((ThreadStart)ProcessQueue);
Thread t2 = new Thread((ThreadStart)ProcessQueue);
t1.Start();
t2.Start();
// main thread processes the queue, too!
ProcessQueue();
// wait for threads to complete
t1.Join();
t2.Join();
}
private object queueLock = new object();
void ProcessQueue()
{
while (true)
{
string s;
lock (queueLock)
{
if (MyQueue.Count == 0)
{
// queue is empty
return;
}
s = MyQueue.Dequeue();
}
ProcessFile(s);
}
}
Another option is to use a semaphore to control how many threads are working:
Semaphore MySem = new Semaphore(3, 3);
void MyProc()
{
string[] filenames = Directory.GetFiles(...);
foreach (string s in filenames)
{
mySem.WaitOne();
ThreadPool.QueueUserWorkItem(ProcessFile, s);
}
// wait for all threads to finish
int count = 0;
while (count < 3)
{
mySem.WaitOne();
++count;
}
}
void ProcessFile(object state)
{
string fname = (string)state;
// do whatever
mySem.Release(); // release so another thread can start
}
The first will perform somewhat better because you don't have the overhead of starting and stopping a thread for each file name processed. The second is much shorter and cleaner, though, and takes full advantage of the thread pool. Likely you won't notice the performance difference.
Can set max threads unsing ParallelOptions
Parallel.For Method (Int32, Int32, ParallelOptions, Action)
ParallelOptions.MaxDegreeOfParallelism
var results = filenames.ToArray().AsParallel().Select(filename=>ProcessFile(filename)).ToArray();
bool ProcessFile(object fileNameObj)
{
var fileName = (string)fileNameObj;
// do your processing here.
return true;
}

Categories

Resources