I am writing a program that can be easily partitioned into several distinct parts. Simplified, it would look like this:
a Reader class would work with getting data from a certain device,
an Analyzer class would perform calculations on the data obtained from the device at regular intervals,
a Form1 class would output the UI (a graphical representation of the data gathered by Reader and the numbers output by Analyzer).
Naturally, I'd like those three classes to run in separate threads (on separate cores). Meaning - all methods of Reader run in its own thread, all methods of Analyzer run in its own thread, and Form1 runs in the default thread.
However, all that comes to mind is using the Thread or BackgroundWorker classes, and then, instead of calling some resource-heavy method on Reader or Analyzer, calling
BackgroundWorker.RunWorkerAsync()
I suppose this is not the best way to do it, is it? I'd rather somehow create the class in a separate thread and leave it there for its lifespan, but I just don't get how to do that... And I can't think of a suitable search query, it seems, because I haven't found an answer when I searched for one.
EDIT: Thank you for the comments; I think I understand now. The question assumed that you can create a class "on a thread" - with the implied meaning of "any method of this class will execute on its thread" - which makes no sense and cannot be done.
I think you are on the right track. You will need:
two threads, Reader and Analyzer, started by Form1; they basically consist of big loops that run until some flag stopReader or stopAnalyzer is set,
two concurrent queues, let's call them readQueue and analyzedQueue. Reader will put stuff into readQueue, Analyzer will read from readQueue and write to analyzedQueue, and Form1 will read from analyzedQueue.
void runReader()
{
    while (!stopReader)
    {
        var data = ...; // read data from device
        readQueue.Enqueue(data);
    }
}

void runAnalyzer()
{
    while (!stopAnalyzer)
    {
        Data data;
        if (readQueue.TryDequeue(out data))
        {
            var result = ...; // analyze data
            analyzedQueue.Enqueue(result);
        }
        else
        {
            Thread.Sleep(...); // wait a while
        }
    }
}
Instead of Thread.Sleep, you could use a BlockingCollection to make Analyzer wait until a new data item is available. In that case, you might want to use a CancellationToken instead of a Boolean for stopAnalyzer, so that you can interrupt BlockingCollection.Take when stopping your algorithm.
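For illustration, here is a minimal sketch of that variant, reusing the Data type from above; the Analyze call is a stand-in for the actual analysis code:

using System.Collections.Concurrent;
using System.Threading;

BlockingCollection<Data> readQueue = new BlockingCollection<Data>(new ConcurrentQueue<Data>());
CancellationTokenSource cts = new CancellationTokenSource();

void runAnalyzer()
{
    try
    {
        while (true)
        {
            // Take blocks until an item is available, and throws
            // OperationCanceledException when cts.Cancel() is called.
            Data data = readQueue.Take(cts.Token);
            var result = Analyze(data); // stand-in for the analysis code
            analyzedQueue.Enqueue(result);
        }
    }
    catch (OperationCanceledException)
    {
        // Stop was requested; let the thread end.
    }
}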
I am using a thread-safe third-party library to retrieve data from a historian.
The operating mode for a typical scenario is the following:
Library instance;

Result[] Process(string[] itemNames) {
    var itemIds = instance.ReserveItems(itemNames);
    Result[] results = instance.ProcessItems(itemIds);
    instance.ReleaseItems(itemIds);
    return results;
}
Library is a class that is expensive to instantiate, so it is used here as a singleton (instance), and it works perfectly against multiple threads.
However, I notice sometimes that a Result is marked as failed ("item not found"), when multiple threads attempt to execute Process with an itemNames array that shares some common items. Because the library is very badly documented, that was unexpected.
By intensively logging, I have deduced that a thread could release an item at the same time another one is about to process it.
After a couple of emails to the library's vendor, I learnt that instance shares a list of reserved items between threads, and that it is necessary to synchronize the calls...
Decompiling part of the library confirmed this: there is a class-level m_items list that is used by both ReserveItems and ReleaseItems.
So I envision the following workaround:
Result[] Process(string[] itemNames) {
    lock (instance) {
        var itemIds = instance.ReserveItems(itemNames);
        Result[] results = instance.ProcessItems(itemIds);
        instance.ReleaseItems(itemIds);
        return results;
    }
}
But it seems a bit too violent to me.
As this Library works perfectly when different items are processed by multiple threads, how can I perform a more fine-grained synchronization and avoid a performance penalty?
EDIT - 2018-11-09
I noticed that the whole ProcessItems method body of the Library is enclosed in a lock statement...
So any attempt at fine-grained synchronization around it is futile. I ended up enclosing my Process method body in a lock statement as well; the performance penalty is - as expected now - not perceptible at all.
You could implement a lock per item ID. That could take the form of a Dictionary<string, object> where the value is the lock object (new object()).
If you want to process the same item ID on multiple threads at the same time without blocking everything in case of conflict, you could track more state in the dictionary value to do that. As an example, you could use a Dictionary<string, Lazy<Result>>. The first thread to need an item ID would initialize and directly consume the lazy. Other threads can then detect that an operation is in progress on that item ID and also consume the lazy.
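A minimal sketch of the per-item lock, assuming a ConcurrentDictionary so lock objects are created atomically; the names and the one-item-at-a-time shape are illustrative, not from the original code:

using System.Collections.Concurrent;

static readonly ConcurrentDictionary<string, object> itemLocks =
    new ConcurrentDictionary<string, object>();

Result ProcessOne(string itemName) {
    // One lock object per item ID: threads working on different
    // items never contend with each other.
    object gate = itemLocks.GetOrAdd(itemName, _ => new object());
    lock (gate) {
        var itemIds = instance.ReserveItems(new[] { itemName });
        Result[] results = instance.ProcessItems(itemIds);
        instance.ReleaseItems(itemIds);
        return results[0];
    }
}

To process a whole itemNames array this way you would have to acquire the per-item locks in a consistent order (sorted, for example) to avoid deadlocks.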
I'm working through my first attempt to thread an application. The app works with a large data set that is split up into manageable chunks which are stored on disk, so the entire data set never has to reside in memory all at once. Instead, a subset of the data can be loaded piecemeal as needed. These chunks were previously being loaded one after the other in the main thread. Of course, this would effectively pause all GUI and other operations until the data was fully loaded.
So I decided to look into threading, and do my loading while the app continues to function normally. I was able to get the basic concept working with a ThreadPool by doing something along the lines of the pseudo-code below:
public class MyApp
{
    List<int> listOfIndicesToBeLoaded; // this list gets updated based on user input
    Dictionary<int, Stuff> loadedStuff = new Dictionary<int, Stuff>();

    // The main thread queues items to be loaded by the ThreadPool
    void QueueUpLoads()
    {
        foreach (int index in listOfIndicesToBeLoaded)
        {
            if (!loadedStuff.ContainsKey(index))
                loadedStuff.Add(index, new Stuff());
            LoadInfo loadInfo = new LoadInfo(index);
            ThreadPool.QueueUserWorkItem(LoadStuff, loadInfo);
        }
    }

    // LoadStuff is called from the worker threads
    public void LoadStuff(System.Object loadInfoObject)
    {
        LoadInfo loadInfo = loadInfoObject as LoadInfo;
        int index = loadInfo.index;
        int[] loadedValues = LoadValuesAtIndex(index); /* here I do my loading and ... */
        // then I put the loaded data in the corresponding entry in the dictionary
        loadedStuff[index].values = loadedValues;
        // now it is accessible from the main thread and it is flagged as loaded
        loadedStuff[index].loaded = true;
    }
}

public class Stuff
{
    // as an example, let's say the data being loaded is an array of ints;
    // the fields are public so MyApp can reach them
    public int[] values;
    public bool loaded = false;
}

// a class derived from System.Object to be passed via ThreadPool.QueueUserWorkItem
public class LoadInfo : System.Object
{
    public int index;

    public LoadInfo(int index)
    {
        this.index = index;
    }
}
This is very primitive compared to the quite involved examples I've come across while trying to learn this stuff in the past few days. Sure, it loads the data concurrently and stuffs it into a dictionary accessible from the main thread, but it also leaves me with a crucial problem. I need the main thread to be notified when an item is loaded and which item it is so that the new data can be processed and displayed. Ideally, I'd like to have each completed load call a function on the main thread and provide it the index and newly loaded data as parameters. I understand that I can't just call functions on the main thread from multiple other threads running concurrently. They have to be queued up in some way for the main thread to run them when it is not doing something else. But this is where my current understanding of thread communication falls off.
I've read over a few in-depth explanations of how events and delegates can be set up using Control.Invoke(delegate) when working with Windows Forms. But I'm not working with Windows Forms and haven't been able to apply these ideas. I suppose I need a more universal approach that doesn't depend on the Control class. If you do respond, please be detailed and maybe use some of the naming in my pseudo-code. That way it will be easier for me to follow. Threading appears to be a pretty deep topic, and I'm just coming to grips with the basics. Also please feel free to make suggestions on how I can refine my question to be more clear.
If you aren't using a GUI framework with some kind of dispatcher or GUI thread (like WPF or WinForms) then you'll have to do this manually.
One way to do this is to use a SynchronizationContext.
It's somewhat tricky to manage, but there are a few articles which go into how it works and how you can make your own:
http://www.codeproject.com/Articles/31971/Understanding-SynchronizationContext-Part-I
http://www.codeproject.com/Articles/32113/Understanding-SynchronizationContext-Part-II
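A minimal sketch of the SynchronizationContext approach, assuming the main thread has a context installed (WinForms and WPF install one automatically; a custom main loop has to provide its own) and a hypothetical OnItemLoaded handler:

// Capture the main thread's context once, on the main thread.
SynchronizationContext mainContext = SynchronizationContext.Current;

// In LoadStuff, after the data is stored, marshal the notification
// back to the main thread without blocking the worker.
mainContext.Post(state =>
{
    int index = (int)state;
    OnItemLoaded(index); // hypothetical handler that runs on the main thread
}, loadInfo.index);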
However, I would also consider using a single 'DictionaryChanged' boolean which is regularly checked by your 'main thread' (when it is idle) to indicate that the dictionary has changed. The flag could then be reset on the main thread to indicate that this has been handled. Keep in mind that you'll need to do some locking there.
You could also queue messages using a thread-safe queue which is written by the background thread and read from the main thread, if a simple variable is not sufficient. This is essentially what most dispatcher implementations are actually doing under the hood.
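For the queue variant, a sketch using ConcurrentQueue; ProcessAndDisplay is a stand-in for whatever the main thread does with a finished item:

using System.Collections.Concurrent;

// Worker threads enqueue completion notices; nothing ever blocks.
ConcurrentQueue<int> completedLoads = new ConcurrentQueue<int>();

// In LoadStuff, after loadedStuff[index].loaded = true:
completedLoads.Enqueue(index);

// On the main thread, drain the queue whenever it is idle:
int finishedIndex;
while (completedLoads.TryDequeue(out finishedIndex))
{
    ProcessAndDisplay(finishedIndex); // stand-in for main-thread handling
}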
TL;DR version of the main questions:
While working with threads, is it safe to read a list's contents with one thread while another writes to it, as long as you do not delete list contents (reorganize order) and only read a new object after it has been fully added?
While an int is being updated from "Old Value" to "New Value" by one thread, is there a risk, if another thread reads this int, that the value returned is neither "Old Value" nor "New Value"?
Is it possible for a thread to "skip" a critical region if it is busy, instead of just going to sleep and waiting for the region's release?
I have 2 pieces of code running in separate threads and I want to have one act as a producer for the other. I do not want either thread "sleeping" while waiting for access, but instead skipping forward in its internal code if the other thread is accessing this.
My original plan was to share the data via the approach below (and once the counter got high enough, switch to a secondary list to avoid overflows).
Pseudocode of the flow as I originally intended it:
Producer
{
    int counterProducer;
    buffered_Object newlyProducedObject;
    List<buffered_Object> objectsProducer;

    while (true)
    {
        <Do stuff until a new product is created and added to newlyProducedObject>;
        objectsProducer.Add(newlyProducedObject);
        counterProducer++;
    }
}

Consumer
{
    int counterConsumer;
    Producer objectProducer; // contains reference to Producer class
    List<buffered_Object> personalQueue;

    while (true)
    {
        <Do useful work, such as working on personal queue, and polish nails if no personal queue>
        // get all outstanding requests and move to personal queue
        while (counterConsumer < objectProducer.GetCounterProducer())
        {
            personalQueue.Add(objectProducer.GetItem(counterConsumer + 1));
            counterConsumer++;
        }
    }
}
Looking at this, everything looked fine at first glance. I knew I would not be retrieving a half-constructed product from the queue, so the status of the list, regardless of where it is, should not be a problem even if a thread switch occurs while the Producer is adding a new object. Is this assumption correct, or can there be problems here? (My guess is that since the consumer asks for a specific location in the list, new objects are added to the end, and objects are never deleted, this will not be a problem.)
But what caught my eye was: could a similar problem occur where "counterProducer" is at an unknown value while "counterProducer++" is executing? Could this result in the value temporarily being "null" or some unknown value? Will this be a potential issue?
My goal is to have neither of the two threads block while waiting for a mutex but instead continue their loops, which is why I wrote the above first, as there is no locking.
If the usage of the list will cause problems, my workaround will be to make a linked list implementation and share it between the two classes, still using the counters to see if new work has been added, and keeping the last location while new stuff is moved to the personal queue. So the producer adds new links, and the consumer reads them and deletes the previous ones. (No counter on the list, just external counters to know how much has been added and removed.)
Alternative pseudocode to avoid the counterProducer++ risk (I need help with this):
Producer
{
    int publicCounterProducer;
    int privateCounterProducer;
    buffered_Object newlyProducedObject;
    List<buffered_Object> objectsProducer;

    while (true)
    {
        <Do stuff until a new product is created and added to newlyProducedObject>;
        objectsProducer.Add(newlyProducedObject);
        privateCounterProducer++;
        <Need help: some code that updates publicCounterProducer to privateCounterProducer if that variable is not locked, else skips ahead; the counter will get updated on the next pass - at some point the consumer must be done reading stuff, and new stuff is prepared already>
    }
}

Consumer
{
    int counterConsumer;
    Producer objectProducer; // contains reference to Producer class
    List<buffered_Object> personalQueue;

    while (true)
    {
        <Do useful work, such as working on personal queue, and polish nails if no personal queue>
        // get all outstanding requests and move to personal queue
        <Need help: try to read publicCounterProducer and set readProducerCounter to it, else skip this code>
        while (counterConsumer < readProducerCounter)
        {
            personalQueue.Add(objectProducer.GetItem(counterConsumer + 1));
            counterConsumer++;
        }
    }
}
So the goal in the 2nd part of the code, and I have not been able to figure out how to write this, is to make both classes not wait for the other in case the other is in the "critical region" of updating publicCounterProducer. If I read the lock functionality correctly, the threads will go to sleep waiting for the release, which is not what I want. I might end up having to use it though, in which case the first pseudocode would do it, and I'd just set a "lock" on the getting of the value.
Hope you can help me out with my many questions.
No, it is not safe. A context switch can occur within .Add, after the List has added the object but before it has updated its internal data structure.
If it is int32, or if it is int64 and you are running in an x64 process, then there is no risk. But if you have any doubts, use the Interlocked class.
Yes, you can use a Semaphore, and when it is time to enter the critical region, use WaitOne overload that takes a timeout. Pass a timeout of 0. If WaitOne returns true, then you successfully acquired the lock and can enter. If it returns false, then you did not acquire the lock and should not enter.
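A minimal sketch of that, guarding the counter update from the question; the names are illustrative:

using System.Threading;

// One permit: behaves like a lock that can be tested without blocking.
Semaphore semaphore = new Semaphore(1, 1);

if (semaphore.WaitOne(0)) // timeout of 0: returns immediately
{
    try
    {
        publicCounterProducer = privateCounterProducer; // critical region
    }
    finally
    {
        semaphore.Release();
    }
}
else
{
    // The lock was busy; skip and try again on the next loop pass.
}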
You should really look at the System.Collections.Concurrent namespace. In particular, look at the BlockingCollection. It has a bunch of Try* operators you can use to add/remove items from the collection without blocking.
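For example, a sketch of the non-blocking calls, using the question's buffered_Object type:

using System.Collections.Concurrent;

BlockingCollection<buffered_Object> queue = new BlockingCollection<buffered_Object>();

// Producer side: TryAdd with a timeout of 0 returns false instead of
// blocking (only relevant if the collection was given a bounded capacity).
if (!queue.TryAdd(newlyProducedObject, 0))
{
    // Queue is full; keep the object and retry on the next pass.
}

// Consumer side: TryTake returns false immediately if nothing is available.
buffered_Object item;
while (queue.TryTake(out item))
{
    personalQueue.Add(item);
}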
While working with threads, is it safe to read a list's contents with one thread while another writes to it, as long as you do not delete list contents (reorganize order) and only read a new object after it has been fully added?
No, it is not. A side-effect of adding an item to a list may be to reallocate its underlying array. Current implementations of List<T> update the internal reference before copying the old data to it, so multiple threads may observe a list of the correct size but containing no data.
While an int is being updated from "Old Value" to "New Value" by one thread, is there a risk, if another thread reads this int, that the value returned is neither "Old Value" nor "New Value"?
Nope, int updates are atomic. But if two threads are both incrementing counterProducer at once, it will go wrong. You should use Interlocked.Increment() to increment it.
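For instance:

using System.Threading;

// Atomic increment: safe even when several threads do it at once.
Interlocked.Increment(ref counterProducer);

// Atomic read with a full fence on the consumer side (plain reads of an
// int are atomic too, but this also guarantees a fresh value).
int snapshot = Interlocked.CompareExchange(ref counterProducer, 0, 0);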
Is it possible for a thread to "skip" a critical region if it is busy, instead of just going to sleep and waiting for the region's release?
No, but you can use (for example) WaitHandle.WaitOne(int) to see if a wait succeeded, and branch accordingly. WaitHandle is implemented by several synchronization classes, such as ManualResetEvent.
Incidentally, is there a reason you are not using the built-in Producer/Consumer classes such as BlockingCollection<T>? BlockingCollection is easy to use (after you read the documentation!) and I'd recommend using it instead.
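A sketch of the producer/consumer shape it gives you, again with the question's types; Process is a stand-in for the consumer's work:

using System.Collections.Concurrent;

BlockingCollection<buffered_Object> queue = new BlockingCollection<buffered_Object>();

// Producer thread:
queue.Add(newlyProducedObject);
// ... and when production is finished for good:
queue.CompleteAdding();

// Consumer thread: blocks while the queue is empty and exits the loop
// once CompleteAdding has been called and the queue has drained.
foreach (buffered_Object item in queue.GetConsumingEnumerable())
{
    Process(item); // stand-in for per-item work
}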
Good morning,
At the startup of the application I am writing I need to read about 1,600,000 entries from a file to a Dictionary<Tuple<String, String>, Int32>. It is taking about 4-5 seconds to build the whole structure using a BinaryReader (using a FileReader takes about the same time). I profiled the code and found that the function doing the most work in this process is BinaryReader.ReadString(). Although this process needs to be run only once and at startup, I would like to make it as quick as possible. Is there any way I can avoid BinaryReader.ReadString() and make this process faster?
Thank you very much.
Are you sure that you absolutely have to do this before continuing?
I would examine the possibility of hiving off the task to a separate thread which sets a flag when finished. Then your startup code simply kicks off that thread and continues on its merry way, pausing only when both:
the flag is not yet set; and
no more work can be done without the data.
Often, the illusion of speed is good enough, as anyone who has coded up a splash screen will tell you.
Another possibility, if you control the data, is to store it in a more binary form so you can just blat it all in with one hit (i.e., no interpretation of the data, just read in the whole thing). That, of course, makes it harder to edit the data from outside your application but you haven't stated that as a requirement.
If it is a requirement or you don't control the data, I'd still look into my first suggestion above.
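A minimal sketch of that first suggestion; LoadDictionary stands in for the existing BinaryReader code, and the volatile flag makes the completed write visible to the main thread:

using System;
using System.Collections.Generic;
using System.Threading;

private volatile bool dataLoaded = false;
private Dictionary<Tuple<string, string>, int> data;

void StartLoading(string path)
{
    var loader = new Thread(() =>
    {
        data = LoadDictionary(path); // stand-in for the existing load code
        dataLoaded = true;           // set the flag last
    });
    loader.IsBackground = true;
    loader.Start();
}

// Later, only at the point where the data is actually required:
while (!dataLoaded)
    Thread.Sleep(10); // or keep the splash screen spinning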
If you think that reading the file line by line is the bottleneck, and depending on its size, you can try to read it all at once:
// read the entire file at once
string entireFile = System.IO.File.ReadAllText(path);
If this doesn't help, you can try to add a separate thread with a semaphore, which would start reading in the background immediately when the program is started, but block the requesting thread at the moment you try to access the data.
This is called a Future, and you have an implementation in Jon Skeet's miscutil library.
You call it like this at the app startup:
// following line invokes "DoTheActualWork" method on a background thread.
// DoTheActualWork returns an instance of MyData when it's done
Future<MyData> calculation = new Future<MyData>(() => DoTheActualWork(path));
And then, some time later, you can access the value in the main thread:
// following line blocks the calling thread until
// the background thread completes
MyData result = calculation.Value;
If you look at the Future's Value property, you can see that it blocks at the AsyncWaitHandle if the thread is still running:
public TResult Value
{
    get
    {
        if (!IsCompleted)
        {
            _asyncResult.AsyncWaitHandle.WaitOne();
            _lock.WaitOne();
        }
        return _value;
    }
}
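Incidentally, if you are on .NET 4, Task<TResult> gives you the same future semantics without an external library; a minimal sketch using the same names as above:

using System.Threading.Tasks;

// Start loading on a background thread at startup.
Task<MyData> calculation = Task.Factory.StartNew(() => DoTheActualWork(path));

// Later, on the main thread: blocks only if loading hasn't finished yet.
MyData result = calculation.Result;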
If strings are repeated inside tuples, you could reorganize your file to have all the distinct strings involved at the start, and have references to those strings (integers) in the body of the file. Your main Dictionary does not have to change, but you would need a temporary Dictionary during startup with all the distinct strings (values) and their references (keys).
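A sketch of reading such a file, assuming a BinaryReader named reader and a hypothetical layout of [string count][strings...] followed by [entry count][id1, id2, value] records:

// Temporary dictionary: string reference (key) -> string (value),
// read once from the front of the file.
var strings = new Dictionary<int, string>();
int stringCount = reader.ReadInt32();
for (int i = 0; i < stringCount; i++)
    strings[i] = reader.ReadString(); // each distinct string is read only once

// Body: entries reference strings by integer ID, so ReadString is
// never called again.
var table = new Dictionary<Tuple<string, string>, int>();
int entryCount = reader.ReadInt32();
for (int i = 0; i < entryCount; i++)
{
    int id1 = reader.ReadInt32();
    int id2 = reader.ReadInt32();
    int value = reader.ReadInt32();
    table[Tuple.Create(strings[id1], strings[id2])] = value;
}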
I have been using Asynchronous operations in my WinForms client since the beginning, but only for some operations. When the ExecuteReader or the ExecuteNonQuery completes, the callback delegate fires and everything works just fine.
I've basically got two issues:
1) What is the best structure for dealing with this in a real-life system? All the examples I have seen are toy examples where the form handles the operation completion and then opens a DataReader on the EndExecuteReader. Of course, this means that the form is more tightly coupled to the database than you would normally like. And, of course, the form can always easily call .Invoke on itself. I've set up all my async objects to inherit from an AsyncCtrlBlock<T> class and have the form and all the callback delegates provided to the constructor of the async objects in my DAL.
2) I am going to re-visit a portion of the program that is currently not async. It makes two calls in series. When the first is complete, part of the model can be populated. When the second is complete, the remaining part of the model can be completed - but only if the first part is already done. What is the best way to structure this? It would be great if the first read could complete and the processing of its data be underway while the second is launched, but I don't want the processing of the second read to start until I know that the processing of the first read's data has completed.
Regarding 2):
Make the first phase of your model population asynchronous.
You will have something like this:
FirstCall();
IAsyncResult arPh1 = BeginPhaseOne(); // use results from the first call
SecondCall();
EndPhaseOne(arPh1);                   // wait until phase one is finished
PhaseTwo();                           // proceed to phase two
If you are on .Net 4, this would be an ideal application of TPL! You could factor your code into tasks like this:
TaskScheduler uiScheduler = GetUIScheduler(); // e.g. TaskScheduler.FromCurrentSynchronizationContext(), captured on the UI thread

SqlCommand command1 = CreateCommand1();
Task<SqlDataReader> query1 = Task<SqlDataReader>.Factory.FromAsync(command1.BeginExecuteReader, command1.EndExecuteReader, null);
query1.ContinueWith(t => PopulateGrid1(t.Result), uiScheduler);

SqlCommand command2 = CreateCommand2();
query1.ContinueWith(t => Task<SqlDataReader>.Factory.FromAsync(command2.BeginExecuteReader, command2.EndExecuteReader, null)
    .ContinueWith(t2 => PopulateGrid2(t2.Result), uiScheduler));