Multithreaded Service Engineering Questions - C#

I am trying to leverage .NET 4.5's new threading capabilities to engineer a system that updates a list of objects in memory.
I worked with multithreading years ago in Java, and I fear that my skills have grown stale, especially on the .NET side.
Basically, I have written a Feed abstract class and I inherit from that for each of my threads. The thread classes themselves are simple and run fine.
Each of these classes runs endlessly: it blocks until an event occurs, then updates its List.
So the first question is: how might I keep the parent thread alive while these threads run? So far, in a dev console app, I've kept the process alive with a Console.Read().
Second, I would like to set up a repository of List objects that I can access from the parent thread. How would I update those Lists from the child threads and expose them to another system? I'm trying to avoid SQL. Shared memory?
I hope I've made this clear. I was hoping for something like this: Writing multithreaded methods using async/await in .NET 4.5
except we need the adaptability of external classes and, of course, we need to somehow expose those Lists.

You can run the "parent" thread in a while loop with some flag to stop it:
while (flag)
{
    // run the thread's work
}
You can expose a public List as a property of some class to hold your data. Remember to lock access in multithreaded code.
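A minimal sketch of that idea (class and member names here are illustrative, not from the question): a repository that locks around every read and write, and hands out snapshots so callers never enumerate the list while a feed thread is mutating it.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical repository; "FeedRepository" is a made-up name.
public class FeedRepository
{
    private readonly object _sync = new object();
    private readonly List<string> _items = new List<string>();

    // Child threads call this when an event occurs.
    public void Add(string item)
    {
        lock (_sync) { _items.Add(item); }
    }

    // Callers get a copy, so they never enumerate while a writer mutates the list.
    public List<string> Snapshot()
    {
        lock (_sync) { return new List<string>(_items); }
    }
}

public static class Program
{
    public static void Main()
    {
        var repo = new FeedRepository();
        var threads = new Thread[4];
        for (int i = 0; i < threads.Length; i++)
        {
            int id = i; // capture a stable copy of the loop variable
            threads[id] = new Thread(() => repo.Add("feed-" + id));
            threads[id].Start();
        }
        foreach (var t in threads) t.Join();
        Console.WriteLine(repo.Snapshot().Count); // 4
    }
}
```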

If the 'parent' thread is supposed to wait during the processing it could simply await the call(s) to the async method(s).
If it has to wait for specific events, you could use a signaling construct such as a Barrier.
If the thread has to 'do' things while waiting, you could check for the availability of the result or report progress: How to do progress reporting using Async/Await
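For the "do things while waiting" case, the standard building block is IProgress&lt;T&gt;/Progress&lt;T&gt;; a rough sketch (the delay loop is a stand-in for real work):

```csharp
using System;
using System.Threading.Tasks;

public static class Program
{
    // The worker reports progress through the supplied callback.
    static async Task DoWorkAsync(IProgress<int> progress)
    {
        for (int pct = 25; pct <= 100; pct += 25)
        {
            await Task.Delay(10);  // stand-in for real work
            progress.Report(pct);  // posted to the captured context, if one exists
        }
    }

    public static async Task Main()
    {
        // In a GUI app the callback would run on the UI thread;
        // in a console app it runs on the thread pool.
        var progress = new Progress<int>(pct => Console.WriteLine($"{pct}%"));
        await DoWorkAsync(progress);
        await Task.Delay(100);     // let any queued progress callbacks drain
        Console.WriteLine("done");
    }
}
```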

If you're using tasks, you can use Task.WaitAll to wait for the tasks to complete. By default, Tasks and async/await use your system's ThreadPool, so I'd avoid placing anything but relatively short-running work there.
If you're using System.Threading.Thread (I prefer using these for long running threads), check out the accepted answer here: C# Waiting for multiple threads to finish
If you can fetch batches of data, you can expose services allowing access to the shared objects using a self-hosted Web API or something like NancyFX. WCF and Remoting are also options if you prefer binary communication.
Shared memory, keep-alive TCP connections or UDP are options if you have many small transactions. Perhaps you could use ZeroMQ (it's not a traditional queue) with the C# binding they provide?
For concurrent access to the lists take a look at the classes in System.Collections.Concurrent before implementing your own locking.
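For instance, a ConcurrentQueue&lt;T&gt; lets several feed threads publish updates with no explicit locking (a sketch, not the asker's actual feed code):

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public static class Program
{
    public static void Main()
    {
        var updates = new ConcurrentQueue<int>();

        // Several "feed" workers enqueue concurrently; no lock needed.
        Parallel.For(0, 100, i => updates.Enqueue(i));

        // The parent thread drains the queue safely.
        int count = 0;
        while (updates.TryDequeue(out _)) count++;
        Console.WriteLine(count); // 100
    }
}
```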

Related

Is it really beneficial to use asynchronous calls when using embedded databases such as SQLite in the context of an ASP.NET app? [duplicate]

This question already has an answer here: How to deal with SQLite in an asynchronous project (1 answer)
Closed 7 months ago.
I'm developing an ASP.NET Core web app (Blazor Server-side, to be exact). I was checking out a .NET-based embedded database called LiteDB, and I noticed that it still lacks async calls, while most SQLite wrappers have them (for example, Dapper for SQLite exposes many async methods).
If I'm not mistaken (please correct me if I'm wrong), the whole point of using asynchronous calls (say, async/await in C#) is to free up threads from waiting for the completion of I/O operations (say, querying a database).
The above scenario makes sense when, as in the example, the database is on another machine, or at least in another process on the same machine, because we are effectively delegating the job to something else; the executing thread can do other work and come back to the result when it's ready.
But what about embedded databases such as SQLite (or the one mentioned above, LiteDB)? These databases run in the same process as the main application, so any database processing (say, querying) is still done by the application's own threads.
If the application is a classic GUI app (say, WinForms), using asynchronous calls to keep the main thread from being blocked and the app from becoming non-responsive is understandable. But what about an ASP.NET Core app, in which every request is processed on a separate thread*?
*My question is: why use asynchronous calls when the app itself has to do the database processing anyway, so a thread has to be kept busy regardless?
Context
Microsoft's Async limitations (from 09/15/2021) states:
SQLite doesn't support asynchronous I/O. Async ADO.NET methods will execute synchronously in Microsoft.Data.Sqlite. Avoid calling them.
Instead, use a shared cache and write-ahead logging to improve performance and concurrency.
More
what about an ASP.NET Core app, in which every request is processed on a separate thread*?
*My question is: why use asynchronous calls when the app itself has to do the database processing anyway, so a thread has to be kept busy regardless?
The first point is that it's not true that every request is processed on a separate thread. Using real async/await allows serving more requests than the number of available threads.
Please remember that async/await does not equal multi-threading, they are separate and different; with overlaps.
It's not just the overall volume of work that decides whether using multiple threads is worthwhile; who is doing what matters a great deal. Even when all the cooking and serving happen in the same restaurant, you wouldn't want to dine in a busy restaurant where the waiters do all the cooking.
You're right that async/await is not beneficial with SQLite, but the reason is that under the hood the calls are synchronous, so the original executing thread is never freed to do other work; the point is not that the work has to be done by the application itself (it could be done by a new or dedicated thread).
Async/await is not only about freeing up threads when work has to happen on another machine. For example, when you want to write text to a file, you can do it asynchronously:
var text = "mytext";
await File.WriteAllTextAsync(@"C:\Temp\csc.txt", text);
This frees up your thread; the work is completed by a thread from the thread pool.
The same logic applies to SQLite: you can do the work on another thread, so you can still use this to improve responsiveness, etc.
You can check the sqlite-net source code to see how it works:
https://github.com/praeclarum/sqlite-net/blob/master/src/SQLiteAsync.cs

Threading or Task

I'm currently picking up C# again and developing a simple application that sends broadcast messages and, when they're received, shows them on a Windows Form.
I have a discovery class with two threads, one that broadcasts every 30 seconds, the other thread listens on a socket. It is in a thread because of the blocking call:
if (listenSocket.Poll(-1, SelectMode.SelectRead))
The first thread works much like a timer in a class library, it broadcasts the packet and then sleeps for 30 seconds.
Now, in principle it works fine: when a packet is received, I raise an event and the WinForm places it in a list. The problems start with the form, though, because the main UI thread requires Invoke. Right now I only have two threads, but in the long run this doesn't seem the most effective approach; it will become a complex thing as the number of threads grows.
I have explored Tasks, but they seem to be more oriented toward one-off, long-running work (much like the BackgroundWorker for a form).
Most threading examples I find all report to the console and don't have the problems of Invoke and locking of variables.
As I'm using .NET 4.5, should I move to Tasks or stick with threads?
Async programming will still delegate some aspects of your application to a different (thread-pool) thread; if you try to update the GUI from such a thread, you'll have similar problems to the ones you have today with regular threads.
However, there are techniques in async/await that let you delegate work to a background thread and then add a kind of wait point that says "please continue here on the GUI thread when you're finished with that operation", which effectively allows you to update the GUI without Invoke and keep it responsive. I'm talking about ConfigureAwait, but there are other techniques as well.
If you don't know the async/await mechanism yet, it will take some investment of your time to learn all these new things, but you'll find it very rewarding.
It's up to you to decide whether you're willing to spend a few days learning and experimenting with a technology that is new to you.
Google around a bit on async/await; there are some excellent articles, from Stephen Cleary for instance: http://blog.stephencleary.com/2012/02/async-and-await.html
Firstly, if you're worried about scalability, you should probably start with an approach that scales easily. The ThreadPool would work nicely. Tasks are based on the ThreadPool as well, and they allow for somewhat more complex situations, like tasks firing in a sequence (possibly conditionally), synchronization, etc. In your case (server and client) this seems unneeded.
Secondly, it looks to me like you're worried about a bottleneck scenario where more than one thread tries to access a common resource like the UI or a DB. With DBs, don't worry: they handle concurrent access well. But for the UI or another not-thread-safe resource, you have to manage concurrent access yourself. I'd suggest something like BlockingCollection, which is a nice way to implement the "many producers, one consumer" pattern. That way you can have multiple threads adding items and just one thread reading them from the collection and passing them on to a single-threaded resource like the UI.
BTW, Tasks can also be long-running, i.e., run loops. Check the documentation.
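A sketch of that "many producers, one consumer" shape with BlockingCollection&lt;T&gt; (the payload strings are invented placeholders for the broadcast and listen threads):

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public static class Program
{
    public static void Main()
    {
        var queue = new BlockingCollection<string>();

        // Two producers (think: the broadcast thread and the listen thread) add packets.
        var producers = Task.WhenAll(
            Task.Run(() => { for (int i = 0; i < 3; i++) queue.Add("broadcast " + i); }),
            Task.Run(() => { for (int i = 0; i < 3; i++) queue.Add("packet " + i); }));

        // When all producers are done, mark the collection complete
        // so the consumer's enumeration ends.
        producers.ContinueWith(_ => queue.CompleteAdding());

        // Single consumer (think: the thread feeding the UI) reads until completion.
        int handled = 0;
        foreach (var item in queue.GetConsumingEnumerable()) handled++;
        Console.WriteLine(handled); // 6
    }
}
```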

Multiple Task.Factory's

I'm really loving the TPL. Simply calling Task.Factory.StartNew() and not worrying about anything is quite amazing.
But, is it possible to have multiple Factories running on the same thread?
Basically, I would like to have two different queues executing different types of tasks.
One queue handles tasks of type A while the second queue handles tasks of type B.
If queue A has nothing to do, it should ignore tasks in queue B and vice versa.
Is this possible to do, without making my own queues, or running multiple threads for the factories?
To clarify what I want to do.
I read data from a network device. I want to do two things with this data, totally independent from each other.
I want to log to a database.
I want to send to another device over network.
Sometimes the database log will take a while, and I don't want the network send to be delayed because of this.
If you use .NET 4.0:
LimitedConcurrencyLevelTaskScheduler (with concurrency level of 1; see here)
If you use .NET 4.5:
ConcurrentExclusiveSchedulerPair (take only the exclusive scheduler out of the pair; see here)
Create two schedulers and pass them to the appropriate StartNew overload. Or create two TaskFactories with these schedulers and use them to create and start the tasks.
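A sketch of the .NET 4.5 route: each ConcurrentExclusiveSchedulerPair's ExclusiveScheduler runs at most one task at a time, so each TaskFactory behaves as its own serial queue (the "log"/"send" bodies are placeholders for the real work):

```csharp
using System;
using System.Threading.Tasks;

public static class Program
{
    public static void Main()
    {
        // Each pair's ExclusiveScheduler runs at most one task at a time,
        // so each factory is effectively an independent serial queue.
        var dbQueue  = new TaskFactory(new ConcurrentExclusiveSchedulerPair().ExclusiveScheduler);
        var netQueue = new TaskFactory(new ConcurrentExclusiveSchedulerPair().ExclusiveScheduler);

        int logged = 0, sent = 0;
        var tasks = new Task[4];
        tasks[0] = dbQueue.StartNew(() => logged++);  // "log to database"
        tasks[1] = dbQueue.StartNew(() => logged++);
        tasks[2] = netQueue.StartNew(() => sent++);   // "send to device"
        tasks[3] = netQueue.StartNew(() => sent++);
        Task.WaitAll(tasks);

        // Within each queue the increments were serialized, so no updates were lost.
        Console.WriteLine(logged + " " + sent); // 2 2
    }
}
```

A slow database task never delays the network queue, since each queue drains independently.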
Alternatively, you can define your own thread pool using a queue of threads.

Multithreading in .NET

I have a configuration XML which is used by a batch module in my .NET 3.5 Windows application.
Each node in the XML is mapped to a .NET class. Each class does processing like mathematical calculations, making DB calls, etc.
The batch module loads the xml, identifies the class associated with each node and then processes it.
Now, we have the following requirements:
1. Let's say there are 3 classes [3 nodes in the XML]: A, B, and C.
Class A can be dependent on class B, i.e., we need to execute class B before processing class A. Class C's processing should be done on a separate thread.
2. If a thread is running, we should be able to cancel that thread in the middle of its processing.
We need to implement this whole module using .net multi-threading.
My questions are:
1. Is it possible to implement requirement #1 above? If yes, how?
2. Given these requirements, is .NET 3.5 a good idea, or would .NET 4.0 be a better choice? I'd like to know the advantages and disadvantages, please.
Thanks for reading.
You'd be better off using the Task Parallel Library (TPL) in .NET 4.0. It'll give you lots of nice features for abstracting the actual business of creating threads in the thread pool. You could use the parallel tasks pattern to create a Task for each of the jobs defined in the XML and the TPL will handle the scheduling of those tasks regardless of the hardware. In other words if you move to a machine with more cores the TPL will schedule more threads.
1) The TPL supports the notion of continuation tasks. You can use these to enforce task ordering and pass the result of one Task or future from the antecedent to the continuation. This is the futures pattern.
// The antecedent task. Can also be created with Task.Factory.StartNew.
Task<DayOfWeek> taskA = new Task<DayOfWeek>(() => DateTime.Today.DayOfWeek);

// The continuation. Its delegate takes the antecedent task
// as an argument and can return a different type.
Task<string> continuation = taskA.ContinueWith((antecedent) =>
{
    return String.Format("Today is {0}.", antecedent.Result);
});

// Start the antecedent.
taskA.Start();

// Use the continuation's result.
Console.WriteLine(continuation.Result);
2) Thread cancellation is supported by the TPL, but it is cooperative cancellation. In other words, the code running in the Task must periodically check whether it has been cancelled and shut down cleanly. The TPL has good support for cancellation. Note that if you were to use threads directly you'd run into the same limitations: Thread.Abort is not a viable solution in almost all cases.
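A minimal sketch of cooperative cancellation with CancellationTokenSource (the busy loop stands in for real per-node processing):

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class Program
{
    public static void Main()
    {
        var cts = new CancellationTokenSource();

        var worker = Task.Run(() =>
        {
            long iterations = 0;
            while (true)
            {
                // Cooperative check: the task decides when it is safe to stop.
                if (cts.Token.IsCancellationRequested)
                    return iterations;
                iterations++; // stand-in for a unit of real work
            }
        });

        cts.Cancel();                          // request cancellation
        Console.WriteLine(worker.Result >= 0); // True: the task exited cleanly
    }
}
```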
While you're at it you might want to look at a dependency injection container like Unity for generating configured objects from your XML configuration.
Answer to comment (below)
Jimmy: I'm not sure I understand holtavolt's comment. What is true is that using parallelism only pays off if the amount of work being done is significant; otherwise your program may spend more time managing parallelism than doing useful work. The actual datasets don't have to be large, but the work needs to be significant.
For example, if your inputs were large numbers and you were checking whether they were prime, the dataset would be very small, but parallelism would still pay off because the computation is costly for each number or block of numbers. Conversely, you might have a very large dataset of numbers that you were checking for evenness. That requires a very large set of data, but the calculation is still very cheap, and a parallel implementation might still not be more efficient.
The canonical example is using Parallel.For instead of for to iterate over a dataset (large or small) but only perform a simple numerical operation like addition. In this case the expected performance improvement of utilizing multiple cores is outweighed by the overhead of creating parallel tasks and scheduling and managing them.
Of course it can be done.
Assuming you're new to multithreading and you want one thread per class, I would look into the BackgroundWorker class and basically use it in the different classes to do the processing.
Which version of .NET you want to use also depends on whether this is going to run on client machines. But I would go for .NET 4 simply because it's the newest, and if you want to split a single task into multiple threads it has built-in classes for this.
Given your use case, the Thread and BackgroundWorker classes should be sufficient. As you'll discover in reading the MSDN information regarding these classes, you will want to support cancellation as your means of shutting down a running thread before it's complete. (Thread "killing" is something to be avoided if at all possible.)
.NET 4.0 has added some advanced items in the Task Parallel Library (TPL) - where Tasks are defined and managed with some smarter affinity for their most recently used core (to provide better cache behavior, etc.), however this seems like overkill for your use case, unless you expect to be running very large datasets. See these sites for more information:
http://msdn.microsoft.com/en-us/library/dd460717.aspx
http://archive.msdn.microsoft.com/ParExtSamples

Alternative to Threads

I've read that threads are very problematic. What alternatives are available? Something that handles blocking and stuff automatically?
A lot of people recommend the background worker, but I've no idea why.
Anyone care to explain "easy" alternatives? The user will be able to select the number of threads to use (depending on their speed needs and computer power).
Any ideas?
To summarize the problems with threads:
if threads share memory, you can get race conditions
if you avoid races by liberally using locks, you can get deadlocks (see the dining philosophers problem)
An example of a race: suppose two threads share access to some memory where a number is stored. Thread 1 reads from the memory address and stores the number in a CPU register. Thread 2 does the same. Now thread 1 increments the number and writes it back to memory. Thread 2 then does the same. End result: the number was only incremented by 1, while both threads tried to increment it. The outcome of such interactions depends on timing. Worse, your code may seem to work bug-free, but once in a blue moon the timing is wrong and bad things happen.
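The lost-update race described above is easy to reproduce, and an atomic read-modify-write (here via Interlocked.Increment) is the simplest fix; a sketch:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class Program
{
    static int _unsafeCounter;
    static int _safeCounter;

    public static void Main()
    {
        // Many threads increment both counters the same number of times.
        Parallel.For(0, 1_000_000, _ =>
        {
            _unsafeCounter++;                        // read-modify-write: updates can be lost
            Interlocked.Increment(ref _safeCounter); // atomic: no updates lost
        });

        Console.WriteLine(_safeCounter);   // always 1000000
        Console.WriteLine(_unsafeCounter); // often less, depending on timing
    }
}
```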
To avoid these problems, the answer is simple: avoid sharing writable memory. Instead, use message passing to communicate between threads. An extreme example is to put the threads in separate processes and communicate via TCP/IP connections or named pipes.
Another approach is to share only read-only data structures, which is why functional programming languages can work so well with multiple threads.
This is a bit of a higher-level answer, but it may be useful if you want to consider other alternatives to threads. Most of the answers discussed solutions based on threads (or thread pools), or maybe tasks from .NET 4.0, but there is one more alternative, called message passing. This has been used successfully in Erlang (a functional language used by Ericsson). Since functional programming is becoming more mainstream these days (e.g. F#), I thought I could mention it. In general:
Threads (or thread pools) can usually be used when you have some relatively long-running computation. When it needs to share state with other threads, it gets tricky (you have to use locks or other synchronization primitives correctly).
Tasks (available in the TPL in .NET 4.0) are very lightweight: you can split your program into thousands of tasks and then let the runtime run them (it will use an optimal number of threads). If you can write your algorithm using tasks instead of threads, it sounds like a good idea; you can avoid some synchronization when you run the computation in smaller steps.
Declarative approaches (PLINQ in .NET 4.0 is a great option): if you have a higher-level data-processing operation that can be encoded using LINQ primitives, you can use this technique. The runtime will automatically parallelize your code, because LINQ doesn't specify how exactly it should evaluate the results (you just say what results you want to get).
Message passing allows you to write the program as concurrently running processes that perform some (relatively simple) tasks and communicate by sending messages to each other. This is great because you can share some state (by sending messages) without the usual synchronization issues (you just send a message, then do other things or wait for messages). Here is a good introduction to message passing in F# from Robert Pickering.
Note that the last three techniques are quite related to functional programming: in functional programming you design programs differently, as computations that return results (which makes it easier to use Tasks), and you often write declarative and higher-level code (which makes it easier to use declarative approaches).
When it comes to actual implementation, F# has a wonderful message-passing library right in the core libraries. In C#, you can use Concurrency & Coordination Runtime, which feels a bit "hacky", but is probably quite powerful too (but may look too complicated).
Won't the parallel programming options in .Net 4 be an "easy" way to use threads? I'm not sure what I'd suggest for .Net 3.5 and earlier...
This MSDN link to the Parallel Computing Developer Center has links to lots of info on Parallel Programming, including links to videos, etc.
I can recommend this project. Smart Thread Pool
Project Description
Smart Thread Pool is a thread pool written in C#. It is far more advanced than the .NET built-in thread pool.
Here is a list of the thread pool features:
The number of threads dynamically changes according to the workload on the threads in the pool.
Work items can return a value.
A work item can be cancelled.
The caller thread's context is used when the work item is executed (limited).
Usage of minimum number of Win32 event handles, so the handle count of the application won't explode.
The caller can wait for multiple or all the work items to complete.
Work items can have a PostExecute callback, which is called as soon as the work item is completed.
The state object, that accompanies the work item, can be disposed automatically.
Work item exceptions are sent back to the caller.
Work items have priority.
Work items group.
The caller can suspend the start of a thread pool and work items group.
Threads have priority.
Can run COM objects that have a single-threaded apartment.
Support Action and Func delegates.
Support for WindowsCE (limited)
The MaxThreads and MinThreads can be changed at run time.
Cancel behavior is improved.
"Problematic" is not the word I would use to describe working with threads. "Tedious" is a more appropriate description.
If you are new to threaded programming, I would suggest reading this thread as a starting point. It is by no means exhaustive but has some good introductory information. From there, I would continue to scour this website and other programming sites for information related to specific threading questions you may have.
As for specific threading options in C#, here's some suggestions on when to use each one.
Use BackgroundWorker if you have a single task that runs in the background and needs to interact with the UI. The task of marshalling data and method calls to the UI thread is handled automatically through its event-based model. Avoid BackgroundWorker if (1) your assembly does not already reference the System.Windows.Forms assembly, (2) you need the thread to be a foreground thread, or (3) you need to manipulate the thread priority.
Use a ThreadPool thread when efficiency is desired. The ThreadPool helps avoid the overhead associated with creating, starting, and stopping threads. Avoid using the ThreadPool if (1) the task runs for the lifetime of your application, (2) you need the thread to be a foreground thread, (3) you need to manipulate the thread priority, or (4) you need the thread to have a fixed identity (aborting, suspending, discovering).
Use the Thread class for long-running tasks and when you require features offered by a formal threading model, e.g., choosing between foreground and background threads, tweaking the thread priority, fine-grained control over thread execution, etc.
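A minimal sketch of that formal model: a long-running Thread with an explicit priority that shuts down via a cooperative flag rather than being killed (names are illustrative):

```csharp
using System;
using System.Threading;

public static class Program
{
    // volatile so the worker always sees the latest value written by Main.
    static volatile bool _stop;

    public static void Main()
    {
        // A long-running foreground thread with a tweaked priority.
        var worker = new Thread(() =>
        {
            while (!_stop) Thread.Sleep(1); // poll the flag; exit cleanly when asked
        })
        { Priority = ThreadPriority.BelowNormal };

        worker.Start();
        _stop = true;                      // request shutdown
        worker.Join();                     // wait for the thread to finish
        Console.WriteLine(worker.IsAlive); // False
    }
}
```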
Any time you introduce multiple threads, each running at once, you open up the potential for race conditions. To avoid these, you tend to need to add synchronization, which adds complexity, as well as the potential for deadlocks.
Many tools make this easier. .NET has quite a few classes specifically meant to ease the pain of dealing with multiple threads, including the BackgroundWorker class, which makes running background work and interacting with a user interface much simpler.
.NET 4 is going to do a lot to ease this even more. The Task Parallel Library and PLINQ dramatically ease working with multiple threads.
As for your last comment:
The user will be able to select the number of threads to use (depending on their speed needs and computer power).
Most of the routines in .NET are built upon the ThreadPool. In .NET 4, when using the TPL, the workload will actually scale at runtime for you, eliminating the burden of having to specify the number of threads to use. However, there are ways to do this now.
Currently, you can use ThreadPool.SetMaxThreads to help limit the number of threads generated. In the TPL, you can specify ParallelOptions.MaxDegreeOfParallelism and pass an instance of ParallelOptions into your routine to control this. The default behavior scales up with more threads as you add more processing cores, which is usually the best behavior in any case.
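A sketch of capping parallelism with ParallelOptions; the loop records the highest number of loop bodies observed running simultaneously:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class Program
{
    public static void Main()
    {
        int maxObserved = 0, current = 0;
        // The user-selected thread count would go here; 2 is an example value.
        var options = new ParallelOptions { MaxDegreeOfParallelism = 2 };

        Parallel.For(0, 100, options, _ =>
        {
            int now = Interlocked.Increment(ref current);

            // Record the highest concurrency level seen so far (lock-free max).
            int seen;
            do { seen = maxObserved; }
            while (now > seen && Interlocked.CompareExchange(ref maxObserved, now, seen) != seen);

            Thread.Sleep(1); // stand-in for real work
            Interlocked.Decrement(ref current);
        });

        Console.WriteLine(maxObserved <= 2); // True: never more than 2 bodies at once
    }
}
```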
Threads are not problematic if you understand what causes problems with them.
For example, if you avoid statics and know which APIs to use (e.g. use synchronized streams), you will avoid many of the issues that come from bad utilization.
If threading is a problem (this can happen if you have unsafe/unmanaged third-party DLLs that cannot support multithreading), one option is to create a mechanism to queue the operations, i.e. store the parameters of the action in a database and just run through them one at a time. This can be done in a Windows service. Obviously this will take longer, but in some cases it is the only option.
Threads are indispensable tools for solving many problems, and it behooves the maturing developer to know how to effectively use them. But like many tools, they can cause some very difficult-to-find bugs.
Don't shy away from something so useful just because it can cause problems; instead, study and practice until you become the go-to guy for multithreaded apps.
A great place to start is Joe Albahari's article: http://www.albahari.com/threading/.
