Is `WebWorkers` concept in JavaScript similar to asynchronous functions? - c#

First I have developed much in C#, now I'm working on 3D web project and now the most usable language is JavaScript at the moment.
In C# until there becomes the new keywords async/await in new C# specification there was a way to make asynchronous calls by using:
delegates
Begin/End functions, like: BeginInvoke, EndInvoke
IAsync interface
As for JS... Right now I have a need to use some parallel computations and really need a stuff, which is similar to asynchronous work with some locking models like Semaphore, Mutex or other..
As for async/await... I have tried the promises concept, which is implemented in jQuery with its deferred promise:
http://api.jquery.com/deferred.promise/
http://api.jquery.com/category/deferred-object/
It remains me async/await concept in C#:
http://msdn.microsoft.com/en-us/library/hh191443.aspx
But I've also have found such concept as WebWorkers:
https://developer.mozilla.org/en-US/docs/Web/Guide/Performance/Using_web_workers
When I first read it, I think it could be also a solution except promises pattern, but if to look from the point of implementing I understand WebWorkers are launching from other threads than a main page execution thread and the functions aren't really asynchronous, they're just callbacks with one option, that they have been added to the Worker() instance, which can be used in main page thread, am I right?
So... I wonder, how can I also implement something similar to the Semaphore in JavaScript?
Thanks!
UPDATE #1 ( more a reply to Doge ):
Let me describe you a simple example, there is an application, which I'm developing.
I already using the jQuery deferred object for one thing to await all the texture images I've received, which I was awaiting for.
The link is the next: http://bit.ly/O2dZmQ
It's an webgl application with the three.js library, which is building houses on real data (ps: don't look into code, I know it's not good :) I only recently understand the prototype way of programming in js :) first I was too used to C# and its paradigms ).
So... I have such a task. I must wait when all the textures will be loaded from AJAX and only then I'm setting them as textures for meshes.
Right now... When I've created this question, I thought about redeveloping the source code and thought about WebWorkers use.
What have I think first, which I want to do and what I've done when developed WPF/Silverlight application in C#.
I've done another instance of Worker, which will check asynchronously the task I've described above.
And I have done a very tiny and simple example which I want to use, but have a fail.
As I saw WebWorkers don't accept objects if I want to send it to worker. Alright...
Passing objects to a web worker
There is a JSON.stringify() method... But I've see another thing... JSON.stringify() can't parse an object to string where the are circular references.
Chrome sendrequest error: TypeError: Converting circular structure to JSON
Truly... It's rather disappointing... Because if there is C# or even C++ it's NOT a problem to exchange between instances some objects... Some things could be done with some reinterpret casts or other stuff... And it's not a problem to exchange objects even between different threads even in asynchronous work...
So... For the my aim... What is the best solution? Keep the deffered/promises pattern and not to use WebWorkers?
The tiny source, not full application, but just for a small example, what I want to do:
http://pastebin.com/5ernFNme ( need HTML for JS, which is below )
http://pastebin.com/ELcw7SuE ( JS main logic )
http://pastebin.com/PuHrhW8n ( WebWorker, which I suppose to use as for separate checker )
Textures for a tiny sample:
http://s14.postimg.org/wqm0xb2ep/box.jpg
http://s27.postimg.org/nc77umytv/box2.jpg
Minified three.js could be found here:
https://github.com/mrdoob/three.js/blob/master/build/three.min.js

It's important to understand that JavaScript has no threads. It has an event loop that executes events, one-by-one, as they come in.
The consequence of this is that if you have a process that takes a while you're blocking all execution. If JavaScript is also needed to do UI stuff like responding to user events or animations, it's a bit of snag. You could try to split up your process into multiple events to keep the event loop running smoothly but that's not always easy to do.
This is where Workers come it. Workers run in their own thread. To avoid issues related to threads the workers don't share memory.
You can communicate with a worker by sending and receiving messages. The messages in turn come in the event loop, so you don't need semaphores or anything that synchronizes threads. If your worker controller is in JavaScript there are never any atomicity issues.
If your workers are simple input->output workers then you can totally slap a Promise layer on top of that. Just keep in mind that Promises themselves don't add threads or asynchronousness.
You can only send Workers messages, that is: strings. You can't send them objects, and certainly not objects that might have references to other objects because again: memory issues.
If I look at your use case I guess the only reason why you might want the Workers is to take advantage of multiple cores that most CPUs have nowadays.
I think what you're doing is loading images as textures into your canvas, and that's what takes up a lot of time? There is no good way to use Workers here since the Worker would need a reference to the canvas, and that's not happening.
Now if instead you needed to do processing on the textures to transform them in some way you could use Workers. Send the image data as a binary string, possibly base64_encoded, do the conversion and send it back. If your image is large the serialization will also take up a fair chunk of CPU time so your mileage may vary.
From what I can tell your resources load pretty quick and there doesn't seem to be a CPU bottleneck. So I don't know if you really need the Workers.

Related

in C# how to run method async in the same thread

Is it possible to define and call method asynchronously in the same thread as the caller?
Suppose I have just one core and I don't want threading management overhead with like 100 threads.
Edit
The reason I ask is nodejs model of doing things - everything on one thread never blocking anything, which proved to be very efficient, which made me wonder if same stuff possible in C# (and I couldn't achieve it myself).
Edit2 Well, as noted in comments node isn't single-threaded after all (however simple load test shows, that it uses just one core...), but I think what makes it so efficient is implicit requirement to write only non-blocking code. Which is possible in C#, except not required:) Anyway, thanks everyone...
More info in
this SO post and even more in this one
It's not really clear exactly what context you're talking about, but the async/await feature of C# 5 already helps to support this sort of thing. The difference is that whereas in node.js everything is forced to be single threaded by default (as I understand it; quite possibly incorrectly given the links in the comments), server-side applications in .NET using asynchrony will use very few threads without limiting themselves to that. If everything really can be served by a single thread, it may well be - if you never have more than one thing to do in terms of physical processing, then that's fine.
But what if one request comes in while another is doing a bit of work? Maybe it's doing just a small amount of encryption, or something like that. Do you really want to make the second request wait for the first one to complete? If you do, you could model that in .NET fairly easily with a TaskScheduler associated with a single-thread thread-pool... but more commonly, you'd use the thread-pool built into .NET, which will work efficiently while still allowing concurrency.
First off, you should make sure you're using .NET 4.5 - it has far more asynchronous APIs (e.g. for database and file access) than earlier versions of .NET. You want to use APIs which conform to the Task-based Asynchronous Pattern (TAP). Then, using async/await you can write your server-side code so that it reads a bit like synchronous code, but actually executes asynchronous. So for example, you might have:
Guid userId = await FetchUserIdAsync();
IEnumerable<Message> messages = await FetchMessagesAsync(userId);
Even though you need to "wait" while each of these operations talks place, you do so without blocking a thread. The C# compiler takes care of building a state machine for you. There's no need to explicitly write callbacks which frequently turn into spaghetti code - the compiler does it all.
You need to make sure that whatever web/web-service framework you use supports asynchrony, and it should just manage the rest, using as few threads as it can get away with.
Follow the two links above to find out more about the asynchrony support in C# 5 - it's a huge topic, and I'm sure you'll have more questions afterwards, but when you've read a bit more you should be able to ask very specific questions instead of the current broad one.

Design thoughts required about concurrent processing

I have a series of calculations that need to be processed - the calculations and the order they run are all defined by the user on the UI.
If they just ran one after each other, it wouldn't be too hard. However, some of the calculations need to be processed concurrently and all calculations must have the ability to be individually paused at any time. I also need to be able to re-arrange orders or add new calculations to be processed at any time. So whatever I do must be flexible enough to handle this.
On the UI, imagine a listbox (a queue, if you like) of usercontrols - with each usercontrol displaying the name of the calculation and a pause button. And I can add calculations to this list at any time during processing.
What is the best way to do this?
Should I be running each calculation in its own thread? If so, how should I store the list of running processes? How will I pass the queue to the calculation processor? How will I be able to ensure that every time the queue changes (new ordering or new calculation) the calculation processor will be made aware of this?
My initial thoughts were to have:
CalcProcessor class
CalcCalculation class
In CalcProcessor have 2 Lists of CalcCalculations. One being the "queue" as shown on the UI (perhaps a pointer to it? Or some other way to ensure it updates live), and the other being the list of currently running calculations.
Somehow I need to get the CalcCalculation to be running in its own thread to process the calculation, and be able to handle any pause events. So I need some way to transmit the info of the Pause button being pressed in the UI to the CalcProcessor object, and then into the correct CalcCalculation.
Edit in response to David Hope:
Thanks for your reply.
Yes, there are n calculations but this could change at any time due to being able to add more calculations to process on the UI.
They do not need to share data in anyway. There will be a setting in the application to specify how many should run concurrently (ie. 10 at any given time, the first 10 in the queue for example - and when 1 finishes the next calculation in the queue will start processing).
The calculation will involve taking data from some data source - it could be a database or a file, and analysing it and performing some calculations on that data. When I say the calculation needs to be paused, I don't mean pausing the thread... I just mean (for example, as I haven't written this part of the application yet) if it is reading row by row from a database and doing some live calculations pausing at the completion of processing the current row... and continuing on when the pause button is unclicked on the UI - which could be done with something as primitive as a while(notPaused) loop providing I can get the Pause information from the UI into the thread.
There are several questions here:
How to synchronize the UI and the model?
I think you got this one backwards. Your model shouldn't have a “pointer” to the queue you're showing in the UI. Instead, the queue should be in your model and you should use databinding together with INotifyPropertyChange and ObservableCollection to show the queue on the UI. (At least that's how it's done in WPF.)
This way, you can manipulate your queue directly from your model, and it will automatically show on the UI.
How to start and monitor calculations?
I think Tasks are ideal for this. You can start a Task using Task.Factory.StartNew(). Since it seems your Tasks will take long to execute, you might consider using TaskCreationOptions.LongRunning. You can also use the Task to find out when is the calculation complete (or if it failed with an exception).
How to pause running calculations?
You can use ManualReserEventSlim for that. Normally, it would be set, but if you wanted to pause a running Task, you would Reset() it. The calculation will need to periodically call Wait() on that event. It's not possible to reasonably pause a running thread without cooperation from the calculation on that thread.
If you were using C# 5.0, a better approach would be to use something like PauseToken.
In Framework 4.5, the answer here is the Async API, which removes the need to manage threads. For details, look at the async/await keywords.
From a broader perspective, a "CalcProcessor" class is a good idea, but I think the Task object will suffice to replace your "CalcCalculation" class. The Processor can simply have an Enumerable of Tasks. The Processor can expose methods for managing the queue, if needed, as well as returning information about its status. When your application finally reaches a state where it must have the results, you can use the AwaitAll method to block the CalcProcessor's thread until all of the tasks complete.
Without more information about the actual goal here, it's hard to give better advice.
You can use Observer Pattern to display results on UI and order changes back in to Processor. State and Command patterns will help you to start, pause, cancel the calculations. These patterns have great answers to your questions in design way. Concurrency is still a problem, they do not answer multi-threading problems but they open an easier road to manage threads.
I suggest that you haven't broken the problem down far enough, which is the reason you are frustrated.
You need to start small and build up from there. You mention, but don't define your actual requirements, but they seem to be...
Need to be able to run ?N? calculations
Some need to be run concurrently (does this imply that they share data, if so how are you going to share the data)
Must be able to pause the calculation (don't use Thread.Suspend, as it potentially leaves a thread in an unstable state, particularly bad if you are sharing data), so you will need to build pause points into each calculation. Also need to consider how you are going to communicate the pause/unpause to the calculation
As far as methods, there are several to consider...
Threads are an obvious choice, but require careful tending too (starting, pausing, stopping, etc...)
You could also use BackGroundWorker or possibly Parallel.ForEach
BackGroundWorker contains the framework for cancelling the worker and providing progress (which can be useful).
My recommendation to start would be to go with BackGroundWorker, potentially subclass it to add the Pause/Resume functionality you need. Determine how you are going to manage data sharing (at least use lock to protect against simultaneous access).
You may find BackGroundWorker too restrictive and need to go with Threads, but I'm usually able to avoid it.
If you post more clear requirements, or samples of what you've tried and didn't work, I'll be happy to help more.
For queue you can use heap data structure (priority queue). This will help prioritize yours tasks. Also you should use Thread Pool for effectively calculations. And try to split you tasks to little parts.

Is it possible to create an object on a background thread with no reference on the main thread?

Equities trading application uses a class library for getting callbacks on stock quote updates, another class library for getting callbacks on order executions or cancelations. I currently have the callbacks execute in the thread pool. I start one background thread for each callback. The threads are very short lived and the work involved includes fetching the data and notifying the observers. Once observers are notified the background thread dies. When I have strategies subscribing to over 1000 actively traded symbols I get OutOfMemory exceptions.
How can I improve this design? I was thinking of starting two threads at the start, one for quotes, the other for executions, and creating each object on its respective threads. Then just have a shared object which allows adding and removing observers to the threads. But 1) how would you keep the thread alive to receive the callbacks? 2) How can you even have a callback object which is initialized on a thread with no reference on the main thread? Is this even possible?
Any help would be appreciated.
Use a producer / consumer model with a simple queue. Then you have a set number of worker threads running and you won't have this problem.
As for how to call the callback function, you could possibly use a struct like this:
struct WorkerData
{
Data data;
Delegate someCallback;
}
when the worker is finished with the data it can invoke the callback itself.
What you've described is a general picture of your application. In order to redesign your application we concrete requirements and at least a simplified model of how the participants interact with each other. Your informal description is not precise enough to suggest a specific data structure and algorithm because without knowing all enough details we might omit something crucial and not meet your needs.
You are saying all the right words and you have a specific problem, out of memory, and you need to fix something. Go back to prototyping. Write a very small but brutally exercised program to demonstrate what you want to do. Then scale it back up to your application. It's much easier to design in the prototype size.
Edit:
Because you are running out of memory, the most likely reasons are that you have a memory leak or you simply have a near-real-time system with insufficient capacity to process the load you are experiencing. A leak might be due to the usual suspects, e.g. not detaching event handlers which you can diagnose with memory profilers, but we'll rule that out for now.
If you must keep up with quotes as they are updated, they have to go somewhere such as a queue or be dispatched to a thread, and unless you can keep up, this can grow unbounded.
The only way to solve this problem is to:
throw some quotes on the floor
get beefier hardware
process quotes more efficiently
I think you are hoping that there is a clear alternative to process quotes more efficiently with a new data structure or algorithm that could make a big difference. But even if you do make it more efficient, the problem could still come back and you may be forced to consider gracefully degrading under overload conditions rather than failing with out of memory.
But in general terms, for high performance simpler is often better and fewer threads swaps is better. For example, if the work done in update is small, making it synchronous could be a big win, even though it seems counter intuitive. You have to know what the update handler does and most of all for a near-real-time system you have to measure, measure, measure to empirically know which is fastest.
To me
I currently have the callbacks execute in the thread pool
and
Once observers are notified the background thread dies
are mildly contradictory. I suspect you might be intending to use threads from a pool, but accidentally using new 'free' (unpooled) threads each time.
You might want to look at the documentation for WeakReference.
However, I suggest you use a profiler/perfmon to find the resource leak first and foremost. Replacing the whole shebang with a queuing approach sounds reasonable, but it's pretty close to what you'd have anyway with a proper threadpool.

Why is the explicit management of threads a bad thing?

In a previous question, I made a bit of a faux pas. You see, I'd been reading about threads and had got the impression that they were the tastiest things since kiwi jello.
Imagine my confusion then, when I read stuff like this:
[T]hreads are A Very Bad Thing. Or, at least, explicit management of threads is a bad thing
and
Updating the UI across threads is usually a sign that you are abusing threads.
Since I kill a puppy every time something confuses me, consider this your chance get your karma back in the black...
How should I be using thread?
Enthusiam for learning about threading is great; don't get me wrong. Enthusiasm for using lots of threads, by contrast, is symptomatic of what I call Thread Happiness Disease.
Developers who have just learned about the power of threads start asking questions like "how many threads can I possible create in one program?" This is rather like an English major asking "how many words can I use in a sentence?" Typical advice for writers is to keep your sentences short and to the point, rather than trying to cram as many words and ideas into one sentence as possible. Threads are the same way; the right question is not "how many can I get away with creating?" but rather "how can I write this program so that the number of threads is the minimum necessary to get the job done?"
Threads solve a lot of problems, it's true, but they also introduce huge problems:
Performance analysis of multi-threaded programs is often extremely difficult and deeply counterintuitive. I've seen real-world examples in heavily multi-threaded programs in which making a function faster without slowing down any other function or using more memory makes the total throughput of the system smaller. Why? Because threads are often like streets downtown. Imagine taking every street and magically making it shorter without re-timing the traffic lights. Would traffic jams get better, or worse? Writing faster functions in multi-threaded programs drives the processors towards congestion faster.
What you want is for threads to be like interstate highways: no traffic lights, highly parallel, intersecting at a small number of very well-defined, carefully engineered points. That is very hard to do. Most heavily multi-threaded programs are more like dense urban cores with stoplights everywhere.
Writing your own custom management of threads is insanely difficult to get right. The reason is because when you are writing a regular single-threaded program in a well-designed program, the amount of "global state" you have to reason about is typically small. Ideally you write objects that have well-defined boundaries, and that do not care about the control flow that invokes their members. You want to invoke an object in a loop, or a switch, or whatever, you go right ahead.
Multi-threaded programs with custom thread management require global understanding of everything that a thread is going to do that could possibly affect data that is visible from another thread. You pretty much have to have the entire program in your head, and understand all the possible ways that two threads could be interacting in order to get it right and prevent deadlocks or data corruption. That is a large cost to pay, and highly prone to bugs.
Essentially, threads make your methods lie. Let me give you an example. Suppose you have:
if (!queue.IsEmpty) queue.RemoveWorkItem().Execute();
Is that code correct? If it is single threaded, probably. If it is multi-threaded, what is stopping another thread from removing the last remaining item after the call to IsEmpty is executed? Nothing, that's what. This code, which locally looks just fine, is a bomb waiting to go off in a multi-threaded program. Basically that code is actually:
if (queue.WasNotEmptyAtSomePointInThePast) ...
which obviously is pretty useless.
So suppose you decide to fix the problem by locking the queue. Is this right?
lock(queue) {if (!queue.IsEmpty) queue.RemoveWorkItem().Execute(); }
That's not right either, necessarily. Suppose the execution causes code to run which waits on a resource currently locked by another thread, but that thread is waiting on the lock for queue - what happens? Both threads wait forever. Putting a lock around a hunk of code requires you to know everything that code could possibly do with any shared resource, so that you can work out whether there will be any deadlocks. Again, that is an extremely heavy burden to put on someone writing what ought to be very simple code. (The right thing to do here is probably to extract the work item in the lock and then execute it outside the lock. But... what if the items are in a queue because they have to be executed in a particular order? Now that code is wrong too because other threads can then execute later jobs first.)
It gets worse. The C# language spec guarantees that a single-threaded program will have observable behaviour that is exactly as the program is specified. That is, if you have something like "if (M(ref x)) b = 10;" then you know that the code generated will behave as though x is accessed by M before b is written. Now, the compiler, jitter and CPU are all free to optimize that. If one of them can determine that M is going to be true and if we know that on this thread, the value of b is not read after the call to M, then b can be assigned before x is accessed. All that is guaranteed is that the single-threaded program seems to work like it was written.
Multi-threaded programs do not make that guarantee. If you are examining b and x on a different thread while this one is running then you can see b change before x is accessed, if that optimization is performed. Reads and writes can logically be moved forwards and backwards in time with respect to each other in single threaded programs, and those moves can be observed in multi-threaded programs.
This means that in order to write multi-threaded programs where there is a dependency in the logic on things being observed to happen in the same order as the code is actually written, you have to have a detailed understanding of the "memory model" of the language and the runtime. You have to know precisely what guarantees are made about how accesses can move around in time. And you cannot simply test on your x86 box and hope for the best; the x86 chips have pretty conservative optimizations compared to some other chips out there.
That's just a brief overview of just a few of the problems you run into when writing your own multithreaded logic. There are plenty more. So, some advice:
Do learn about threading.
Do not attempt to write your own thread management in production code.
Use higher-level libraries written by experts to solve problems with threads. If you have a bunch of work that needs to be done in the background and want to farm it out to worker threads, use a thread pool rather than writing your own thread creation logic. If you have a problem that is amenable to solution by multiple processors at once, use the Task Parallel Library. If you want to lazily initialize a resource, use the lazy initialization class rather than trying to write lock free code yourself.
Avoid shared state.
If you can't avoid shared state, share immutable state.
If you have to share mutable state, prefer using locks to lock-free techniques.
Explicit management of threads is not intrinsically a bad thing, but it's frought with dangers and shouldn't be done unless absolutely necessary.
Saying threads are absolutely a good thing would be like saying a propeller is absolutely a good thing: propellers work great on airplanes (when jet engines aren't a better alternative), but wouldn't be a good idea on a car.
You cannot appreciate what kind of problems threading can cause unless you've debugged a three-way deadlock. Or spent a month chasing a race condition that happens only once a day. So, go ahead and jump in with both feet and make all the kind of mistakes you need to make to learn to fear the Beast and what to do to stay out of trouble.
There's no way I could offer a better answer than what's already here. But I can offer a concrete example of some multithreaded code that we actually had at my work that was disastrous.
One of my coworkers, like you, was very enthusiastic about threads when he first learned about them. So there started to be code like this throughout the program:
Thread t = new Thread(LongRunningMethod);
t.Start(GetThreadParameters());
Basically, he was creating threads all over the place.
So eventually another coworker discovered this and told the developer responsible: don't do that! Creating threads is expensive, you should use the thread pool, etc. etc. So a lot of places in the code that originally looked like the above snippet started getting rewritten as:
ThreadPool.QueueUserWorkItem(LongRunningMethod, GetThreadParameters());
Big improvement, right? Everything's sane again?
Well, except that there was a particular call in that LongRunningMethod that could potentially block -- for a long time. Suddenly every now and then we started seeing it happen that something our software should have reacted to right away... it just didn't. In fact, it might not have reacted for several seconds (clarification: I work for a trading firm, so this was a complete catastrophe).
What had ended up happening was that the thread pool was actually filling up with long-blocking calls, leading to other code that was supposed to happen very quickly getting queued up and not running until significantly later than it should have.
The moral of this story is not, of course, that the first approach of creating your own threads is the right thing to do (it isn't). It's really just that using threads is tough, and error-prone, and that, as others have already said, you should be very careful when you use them.
In our particular situation, many mistakes were made:
Creating new threads in the first place was wrong because it was far more costly than the developer realized.
Queuing all background work on the thread pool was wrong because it treated all background tasks indiscriminately and did not account for the possibility of asynchronous calls actually being blocked.
Having a long-blocking method by itself was the result of some careless and very lazy use of the lock keyword.
Insufficient attention was given to ensuring that the code that was being run on background threads was thread-safe (it wasn't).
Insufficient thought was given to the question of whether making a lot of the affected code multithreaded was even worth doing to begin with. In plenty of cases, the answer was no: multithreading just introduced complexity and bugs, made the code less comprehensible, and (here's the kicker): hurt performance.
I'm happy to say that today, we're still alive and our code is in a much healthier state than it once was. And we do use multithreading in plenty of places where we've decided it's appropriate and have measured performance gains (such as reduced latency between receiving a market data tick and having an outgoing quote confirmed by the exchange). But we learned some pretty important lessons the hard way. Chances are, if you ever work on a large, highly multithreaded system, you will too.
Unless you are on the level of being able to write a fully-fledged kernel scheduler, you will get explicit thread management always wrong.
Threads can be the most awesome thing since hot chocolate, but parallel programming is incredibly complex. However, if you design your threads to be independent then you can't shoot yourself in the foot.
As fore rule of the thumb, if a problem is decomposed into threads, they should be as independent as possible, with as few but well defined shared resources as possible, with the most minimalistic management concept.
I think the first statement is best explained as such: with the many advanced APIs now available, manually writing your own thread code is almost never necessary. The new APIs are a lot easier to use, and a lot harder to mess up!. Whereas, with the old-style threading, you have to be quite good to not mess up. The old-style APIs (Thread et. al.) are still available, but the new APIs (Task Parallel Library, Parallel LINQ, and Reactive Extensions) are the way of the future.
The second statement is from more of a design perspective, IMO. In a design that has a clean separation of concerns, a background task should not really be reaching directly into the UI to report updates. There should be some separation there, using a pattern like MVVM or MVC.
I would start by questioning this perception:
I'd been reading about threads and had got the impression that they were the tastiest things since kiwi jello.
Don’t get me wrong – threads are a very versatile tool – but this degree of enthusiasm seems weird. In particular, it indicates that you might be using threads in a lot of situations where they simply don’t make sense (but then again, I might just mistake your enthusiasm).
As others have indicated, thread handling is additionally quite complex and complicated. Wrappers for threads exist and only in rare occasions do they have to be handled explicitly. For most applications, threads can be implied.
For example, if you just want to push a computation to the background while leaving the GUI responsive, a better solution is often to either use callback (that makes it seem as though the computation is done in the background while really being executed on the same thread), or by using a convenience wrapper such as the BackgroundWorker that takes and hides all the explicit thread handling.
A last thing, creating a thread is actually very expensive. Using a thread pool mitigates this cost because here, the runtime creates a number of threads that are subsequently reused. When people say that explicit management of threads is bad, this is all they might be referring to.
Many advanced GUI Applications usually consist of two threads, one for the UI, one (or sometimes more) for Processing of data (copying files, making heavy calculations, loading data from a database, etc).
The processing threads shouldn't update the UI directly, the UI should be a black box to them (check Wikipedia for Encapsulation).
They just say "I'm done processing" or "I completed task 7 of 9" and call an Event or other callback method. The UI subscribes to the event, checks what has changed and updates the UI accordingly.
If you update the UI from the Processing Thread you won't be able to reuse your code and you will have bigger problems if you want to change parts of your code.
I think you should experiement as much as possible with Threads and get to know the benefits and pitfalls of using them. Only by experimentation and usage will your understanding of them grow. Read as much as you can on the subject.
When it comes to C# and the userinterface (which is single threaded and you can only modify userinterface elements on code executed on the UI thread). I use the following utility to keep myself sane and sleep soundly at night.
public static class UIThreadSafe {
public static void Perform(Control c, MethodInvoker inv) {
if(c == null)
return;
if(c.InvokeRequired) {
c.Invoke(inv, null);
}
else {
inv();
}
}
}
You can use this in any thread that needs to change a UI element, like thus:
UIThreadSafe.Perform(myForm, delegate() {
myForm.Title = "I Love Threads!";
});
A huge reason to try to keep the UI thread and the processing thread as independent as possible is that if the UI thread freezes, the user will notice and be unhappy. Having the UI thread be blazing fast is important. If you start moving UI stuff out of the UI thread or moving processing stuff into the UI thread, you run a higher risk of having your application become unresponsive.
Also, a lot of the framework code is deliberately written with the expectation that you will separate the UI and processing; programs will just work better when you separate the two out, and will hit errors and problems when you don't. I don't recall any specifics issues that I encountered as a result of this, though I have vague recollections of in the past trying to set certain properties of stuff the UI was responsible for outside of the UI and having the code refuse to work; I don't recall whether it didn't compile or it threw an exception.
Threads are a very good thing, I think. But, working with them is very hard and needs a lot of knowledge and training. The main problem is when we want to access shared resources from two other threads which can cause undesirable effects.
Consider classic example: you have a two threads which get some items from a shared list and after doing something they remove the item from the list.
The thread method that is called periodically could look like this:
void Thread()
{
if (list.Count > 0)
{
/// Do stuff
list.RemoveAt(0);
}
}
Remember that the threads, in theory, can switch at any line of your code that is not synchronized. So if the list contains only one item, one thread could pass the list.Count condition, just before list.Remove the threads switch and another thread passes the list.Count (list still contains one item). Now the first thread continues to list.Remove and after that second thread continues to list.Remove, but the last item already has been removed by the first thread, so the second one crashes. That's why it would have to be synchronized using lock statement, so that there can't be a situation where two threads are inside the if statement.
So that is the reason why UI which is not synchronized must always run in a single thread and no other thread should interfere with UI.
In previous versions of .NET if you wanted to update UI in another thread, you would have to synchronize using Invoke methods, but as it was hard enough to implement, new versions of .NET come with BackgroundWorker class which simplifies a thing by wrapping all the stuff and letting you do the asynchronous stuff in a DoWork event and updating UI in ProgressChanged event.
A couple of things are important to note when updating the UI from a non-UI thread:
If you use "Invoke" frequently, the performance of your non-UI thread may be severely adversely affected if other stuff makes the UI thread run sluggishly. I prefer to avoid using "Invoke" unless the non-UI thread needs to wait for the UI-thread action to be performed before it continues.
If you use "BeginInvoke" recklessly for things like control updates, an excessive number of invocation delegates may get queued, some of which may well be pretty useless by the time they actually occur.
My preferred style in many cases is to have each control's state encapsulated in an immutable class, and then have a flag which indicates whether an update is not needed, pending, or needed but not pending (the latter situation may occur if a request is made to update a control before it is fully created). The control's update routine should, if an update is needed, start by clearing the update flag, grabbing the state, and drawing the control. If the update flag is set, it should re-loop. To request another thread, a routine should use Interlocked.Exchange to set the update flag to update pending and--if it wasn't pending--try to BeginInvoke the update routine; if the BeginInvoke fails, set the update flag to "needed but not pending".
If an attempt to control occurs just after the control's update routine checks and clears its update flag, it may well happen that the first update will reflect the new value but the update flag will have been set anyway, forcing an extra screen redraw. On the occasions when this happens, it will be relatively harmless. The important thing is that the control will end up being drawn in the correct state, without there ever having been more than one BeginInvoke pending.

Multithreading: how to update UI to indicate progress

I've been working on the same project now since Christmas 2008. I've been asked to take it from a Console Application (which just prints out trace statements), to a full Windows App. Sure, that's fine. The only thing is there are parts of the App that can take several minutes to almost an hour to run. I need to multithread it to show the user status, or errors. But I have no idea where to begin.
I've aready built a little UI in WPF. It's very basic, but I'd like to expand it as I need to. The app works by selecting a source, choosing a destination, and clicking start. I would like a listbox to update as the process goes along. Much in the same way SQL Server Installs, each step has a green check mark by its name as it completes.
How does a newbie start multithreading? What libraries should I check out? Any advice would be greatly appreciated.
p.s. I'm currently reading about this library, http://www.codeplex.com/smartthreadpool
#Martin: Here is how my app is constructed:
Engine: Runs all major components in pre-defined order
Excel: Library I wrote to wrap COM to open/read/close/save Workbooks
Library: Library which understands different types of workbook formats (5 total)
Business Classes: Classes I've written to translate Excel data and prep it for Access
Db Library: A Library I've written which uses ADO.NET to read in Access data
AppSettings: you get the idea
Serialier: Save data in-case of app crash
I use everything from LINQ to ADO.NET to get data, transform it, and then output it.
My main requirement is that I want to update my UI to indicate progress
#Frank: What happens if something in the Background Worker throws an Exception (handled or otherwise)? How does my application recieve notice?
#Eric Lippert: Yes, I'm investigating that right now. Before I complicate things.
Let me know if you need more info. Currently I've running this application from a Unit Test, so I guess callig it a Console Application isn't true. I use Resharper to do this. I'm the only person right now who uses the app, but I'd like a more attractive interface
I don't think you specify the version of the CLR you are using, but you might check out the "BackgroundWorker" control. It is a simple way to implemented multiple threads.
The best part, is that it is a part of the CLR 2.0 and up
Update in response to your update: If you want to be able to update the progress in the UI -- for example in a progress bar -- the background worker is perfect. It uses an event that I think is called: ProgressChanged to report the status. It is very elegant. Also, keep in mind that you can have as many instances that you need and can execute all the instances at the same time (if needed).
In response to your question: You could easily setup an example project and test for your question. I did find the following, here (under remarks, 2nd paragraph from the caution):
If the operation raises an exception
that your code does not handle, the
BackgroundWorker catches the exception
and passes it into the
RunWorkerCompleted event handler,
where it is exposed as the Error
property of
System.ComponentModel..::.RunWorkerCompletedEventArgs.
Threading in C# from Joseph Albahari is quite good.
This page is quite a good summary of threading.
By the sound of it you probably don't need anything very complex - if you just start the task and then want to know when it has finished, you only need a few lines of code to create a new thread and get it to run your task. Then your UI thread can bumble along and check periodically if the task has completed.
Concurrent Programming on Windows is THE best book in the existence on the subject. Written by Joe Duffy, famous Microsoft Guru of multithreading. Everything you ever need to know and more, from the way Windows thread scheduler works to .NET Parallels Extensions Library.
Remember to create your delegates to update the UI so you don't get cross-threading issues and the UI doesn't appear to freeze/lockup
Also if you need a lot of notes/power points/etc etc
Might I suggest all the lecture notes from my undergrad
http://ist.psu.edu/courses/SP04/ist411/lectures.html
The best way for a total newcomer to threading is probably the threadpool. We'll probably need to know a little more about these parts to make more in depth recommendations
EDIT::
Since we now have a little more info, I'm going to stick with my previous answer, it looks like you have a loads of tasks which need doing, the best way to do a load of tasks is to add them to the threadpool and then just keep checking if they're done, if tasks need to be done in a specific order then you can simply add the next one as the previous one finishes. The threadpool really is rather good for this kind of thing and I see no reason not to use it in this case
Jason's link is a good article. Things you need to be aware of are that the UI can only be updated by the main UI thread, you will get cross threading exceptions if you try to do it in the worker thread. The BackgroundWorker control can help you there with the events, but you should also know about Control.Invoke (or Control.Begin/EndInvoke). This can be used to execute delegates in the context of the UI thread.
Also you should read up on the gotchas of accessing the same code/variables from different threads, some of these issues can lead to bugs that are intermittent and tricky to track down.
One point to note is that the volatile keyword only guarantees 'freshness' of variable access, for example, it guarantees that each read and write of the variable will be from main memory, and not from a thread or processor cache or other 'feature' of the memory model. It doesnt stop issues like a thread being interrupted by another thread during its read-update-write process (e.g. changing the variables value). This causes errors where the 2 threads have different (or the same) values for the variable, and can lead to things like values being lost, 2 threads having the same value for the variable when they should have different values, etc. You should use a lock/monitor (or other thread sync method, wait handles, interlockedincrement/decrement etc) to prevent these types of problems, which guarantee only one thread can access the variable. (Monitor also has the advantage that it implicitly performs volatile read/write)
And as someone else has noted, you also should try to avoid blocking your UI thread whilst waiting for background threads to complete, otherwise your UI will become unresponsive. You can do this by having your worker threads raise events that your UI subscribes to that indicate progress or completion.
Matt
Typemock have a new tool called Racer for helping with Multithreading issues. It’s a bit advanced but you can get help on their forum and in other online forums (one that strangely comes to mind is stackoverflow :-) )
I'm a newbie to multithreading as well, but I agree with Frank that a background worker is probably your best options. It works through event subscriptions. Here's the basics of how you used it.
First Instantiate a new background worker
Subscribed methods in your code to the background workers major events:
DoWork: This should contain whatever code that takes a long time to process
ProgressChanged: This is envoked whenever you call ReportProgress() from inside the method subscribed to DoWork
RunWorkerCompleted: Envoked when the DoWork method has completed
When you are ready to run your time consuming process you call the RunAsync() method of the background worker. This starts DoWork method on a separate thread, which can then report it's progress back through the ProgressChanged event. Once it completed RunWorkerComplete will be evoked.
The DoWork event method can also check if the user somehow requested that the process be canceled (CanceLAsync() was called)) by checking the value of the CancelPending property.

Categories

Resources