I have a program that takes a long time to initialize, but its execution is rather fast.
It's becoming a bottleneck, so I want to start multiple instances of this program (like a pool) that are already initialized, and then just pass them the arguments needed for each execution, saving all the initialization time.
The problem is that I have only found how to start new processes by passing them arguments:
How to pass parameters to ThreadStart method in Thread?
but I would like to start the process normally and then be able to communicate with it, so I can send each thread the parameters required for its execution.
The best approach I found was to create multiple threads where I would initialize the program, and then use some communication mechanism (named pipes, for example, since it all runs on the same machine) to pass those arguments and trigger the execution of the program (one of the triggers could break an infinite loop, for example).
I'm asking whether anyone can advise a better solution than the one I came up with.
I suggest you don't mess with using Thread directly; use the TPL instead, something like this:
foreach (var data in YOUR_INITIALIZATION_LOGIC_METHOD_HERE)
{
Task.Run(() => yourDelegate(data) /*, other params here */);
}
More about Task.Run on MSDN and on Stephen Cleary's blog.
Process != Thread
A thread lives inside a process, while a process is an entire program or service in your OS.
If you want to speed up your app's initialization you can still use threads, but nowadays we use them through the Task Parallel Library (TPL), following the Task-based Asynchronous Pattern (TAP).
To coordinate tasks (which usually run on threads), you might need to implement some kind of state machine (a basic workflow) where you can detect when a task progresses and perform actions based on its state (running, failed, completed...).
Anyway, you don't need named pipes or anything like that for tasks/threads to communicate, since everything lives in the same parent process. You just need regular programming approaches: C#, thread synchronization mechanisms and some kind of in-app messaging.
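A minimal sketch of that in-process approach, assuming a hypothetical ExpensiveEngine class stands in for your slow-to-initialize program and that its Execute method is thread-safe: the initialization is started once and kept as a Task, and each request awaits that same task and then runs with its own arguments. It illustrates the idea rather than being the only way to structure it.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for the program that is slow to initialize but fast to execute.
public sealed class ExpensiveEngine
{
    public static int InitCount;   // how many times the expensive initialization actually ran

    public ExpensiveEngine()
    {
        Interlocked.Increment(ref InitCount);
        Thread.Sleep(500);         // simulate the long initialization
    }

    // The fast part. Assumed thread-safe, since all callers share one instance.
    public int Execute(int argument) => argument * 2;
}

public static class EngineHost
{
    // Initialization runs once, on a thread pool thread, the first time EngineHost is used;
    // every caller awaits this same task instead of initializing again.
    private static readonly Task<ExpensiveEngine> Engine = Task.Run(() => new ExpensiveEngine());

    // Passing arguments is just a method call: no pipes, since everything is in one process.
    public static async Task<int> RunAsync(int argument)
    {
        ExpensiveEngine engine = await Engine;
        return await Task.Run(() => engine.Execute(argument));
    }
}
```

Only the first callers wait for initialization; later callers find the task already completed. If Execute is not thread-safe, you would instead keep one engine per worker.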
Some very basic idea...
.NET has a List<T> collection class. You should design a coordinator class holding a list of messages, where the message class (designed by you) looks like this:
public enum OperationType { DataInitialization, Authentication, Caching }
public class Message
{
public OperationType Operation { get; set; }
public Task Task { get; set; }
}
When you start all the parallel initialization tasks, add each of them to a list in the coordinator class:
Coordinator.Messages.AddRange
(
new List<Message>
{
new Message
{
Operation = OperationType.DataInitialization,
Task = dataInitTask
},
..., // <--- more messages
}
);
Once you've added all messages with pending initialization tasks, somewhere in your code you can wait until initialization ends asynchronously this way:
// You do a projection of each message to get an IEnumerable<Task>
// to give it as argument of Task.WhenAll
await Task.WhenAll(Coordinator.Messages.Select(message => message.Task));
While this line awaits the completion of all initialization tasks, your UI (i.e. the main thread) can keep working and show some kind of loading animation or whatever you like.
Perhaps you can go a step further: instead of waiting for all of them, wait for the group of tasks that lets your users start using your app, while other non-critical tasks finish...
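Putting those pieces together, here is a minimal runnable sketch. The Startup class, the Task.Delay timings and the choice of Caching as the only non-critical operation are assumptions for illustration; it awaits the critical group first, calls onReady, and only then awaits the rest.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public enum OperationType { DataInitialization, Authentication, Caching }

public class Message
{
    public OperationType Operation { get; set; }
    public Task Task { get; set; }
}

public static class Coordinator
{
    // List<T> is not thread-safe: fill it from one thread before awaiting.
    public static List<Message> Messages { get; } = new List<Message>();
}

public static class Startup
{
    // onReady runs once the tasks users depend on have finished (e.g. hide a loading animation).
    public static async Task InitializeAsync(Action onReady)
    {
        // Task.Delay stands in for real initialization work (hypothetical timings).
        Coordinator.Messages.AddRange(new List<Message>
        {
            new Message { Operation = OperationType.DataInitialization, Task = Task.Delay(300) },
            new Message { Operation = OperationType.Authentication, Task = Task.Delay(100) },
            new Message { Operation = OperationType.Caching, Task = Task.Delay(1000) },
        });

        // First wait only for the critical group...
        await Task.WhenAll(Coordinator.Messages
            .Where(m => m.Operation != OperationType.Caching)
            .Select(m => m.Task));
        onReady();

        // ...then for everything else (here, the cache warm-up).
        await Task.WhenAll(Coordinator.Messages.Select(m => m.Task));
    }
}
```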
Related
I already have some experience in working with threads in Windows but most of that experience comes from using Win32 API functions in C/C++ applications. When it comes to .NET applications however, I am often not sure about how to properly deal with multithreading. There are threads, tasks, the TPL and all sorts of other things I can use for multithreading but I never know when to use which of those options.
I am currently working on a C# based Windows service which needs to periodically validate different groups of data from different data sources. Implementing the validation itself is not really an issue for me but I am unsure about how to handle all of the validations running simultaneously.
I need a solution for this which allows me to do all of the following things:
Run the validations at different (predefined) intervals.
Control all of the different validations from one place so I can pause and/or stop them if necessary, for example when a user stops or restarts the service.
Use the system resources as efficiently as possible to avoid performance issues.
So far I've only had one similar project before where I simply used Thread objects combined with a ManualResetEvent and a Thread.Join call with a timeout to notify the threads about when the service is stopped. The logic inside those threads to do something periodically then looked like this:
while (!shutdownEvent.WaitOne(0))
{
if (DateTime.Now > nextExecutionTime)
{
// Do something
nextExecutionTime = nextExecutionTime.AddMinutes(interval);
}
Thread.Sleep(1000);
}
While this did work as expected, I've often heard that using threads directly like this is considered "old-school" or even bad practice. I also think that this solution does not use threads very efficiently, as they are just sleeping most of the time. How can I achieve something like this in a more modern and efficient way?
If this question is too vague or opinion-based then please let me know and I will try my best to make it as specific as possible.
The question feels a bit broad, but we can use the provided code and try to improve it.
Indeed, the problem with the existing code is that most of the time it keeps a thread blocked while doing nothing useful (sleeping). Also, the thread wakes up every second only to check the interval, and in most cases goes back to sleep because it's not validation time yet. Why does it do that? Because if it slept for a longer period, you might block for a long time when you signal shutdownEvent and then join the thread: Thread.Sleep doesn't provide a way to be interrupted on request.
To solve both problems we can use:
Cooperative cancellation mechanism in form of CancellationTokenSource + CancellationToken.
Task.Delay instead of Thread.Sleep.
For example:
async Task ValidationLoop(CancellationToken ct) {
while (!ct.IsCancellationRequested) {
try {
var now = DateTime.Now;
if (now >= _nextExecutionTime) {
// do something
_nextExecutionTime = _nextExecutionTime.AddMinutes(1);
}
var waitFor = _nextExecutionTime - now;
if (waitFor.Ticks > 0) {
await Task.Delay(waitFor, ct);
}
}
catch (OperationCanceledException) {
// expected, just exit
// otherwise, let it go and handle cancelled task
// at the caller of this method (returned task will be cancelled).
return;
}
catch (Exception) {
// either have global exception handler here
// or expect the task returned by this method to fail
// and handle this condition at the caller
}
}
}
Now we no longer hold a thread, because await Task.Delay doesn't block one. Instead, after the specified time interval, the subsequent code executes on a free thread pool thread (it's more complicated than this, but we won't go into details here).
We also don't need to wake up every second for no reason, because Task.Delay accepts a cancellation token as a parameter. When that token is signalled, Task.Delay is immediately interrupted with an exception (a TaskCanceledException, which derives from OperationCanceledException); we expect it and break out of the validation loop.
To stop the provided loop you need to use CancellationTokenSource:
private readonly CancellationTokenSource _cts = new CancellationTokenSource();
Pass its token (_cts.Token) into the method above. Then, when you want to signal the token, just do:
_cts.Cancel();
To further improve resource management: IF your validation code performs any I/O (reading files from disk, network, database access, etc.), use the async versions of those operations. Then no threads are held blocked while the I/O is in progress either.
Now you no longer need to manage threads yourself; instead you operate in terms of the tasks you need to perform and let the framework/OS manage threads for you.
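To cover the original requirements (different intervals, one place to stop everything), one option is to run a loop like the one above per validation and keep all the resulting tasks together. ValidationScheduler and its Add/StopAsync methods are hypothetical names, and for brevity each loop waits a fixed interval after each run instead of tracking _nextExecutionTime. A sketch:

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: one async loop per validation, all stopped through one CancellationTokenSource.
public sealed class ValidationScheduler
{
    private readonly CancellationTokenSource _cts = new CancellationTokenSource();
    private readonly List<Task> _loops = new List<Task>();

    // Call from a single thread (e.g. the service's OnStart).
    public void Add(Action validation, TimeSpan interval)
    {
        // Task.Run so the first validation doesn't run synchronously on the caller's thread.
        _loops.Add(Task.Run(() => RunLoopAsync(validation, interval, _cts.Token)));
    }

    private static async Task RunLoopAsync(Action validation, TimeSpan interval, CancellationToken ct)
    {
        while (!ct.IsCancellationRequested)
        {
            try
            {
                validation();
            }
            catch (Exception ex)
            {
                // A failing validation is logged and retried at the next interval.
                Console.Error.WriteLine($"Validation failed: {ex.Message}");
            }

            try
            {
                await Task.Delay(interval, ct);   // no thread is blocked while waiting
            }
            catch (OperationCanceledException)
            {
                return;                           // stop requested: exit the loop normally
            }
        }
    }

    // E.g. from the service's OnStop: signal all loops, then wait for them to finish.
    public Task StopAsync()
    {
        _cts.Cancel();
        return Task.WhenAll(_loops);
    }
}
```

Because every loop returns normally on cancellation, the task from StopAsync completes without throwing once all validations have stopped.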
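You should use Microsoft's Reactive Framework (aka Rx): install the NuGet package System.Reactive and add using System.Reactive; using System.Reactive.Linq; and using System.Reactive.Subjects; (the first and last are needed for Unit and Subject<T>). Then you can do this: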
Subject<bool> starter = new Subject<bool>();
IObservable<Unit> query =
starter
.StartWith(true)
.Select(x => x
? Observable.Interval(TimeSpan.FromSeconds(5.0)).SelectMany(y => Observable.Start(() => Validation()))
: Observable.Never<Unit>())
.Switch();
IDisposable subscription = query.Subscribe();
That fires off the Validation() method every 5.0 seconds.
When you need to pause and resume, do this:
starter.OnNext(false);
// Now paused
starter.OnNext(true);
// Now restarted.
When you want to stop it all call subscription.Dispose().
I am writing a multiplayer game server and am looking at ways the new C# async/await features can help me. The core of the server is a loop which updates all the actors in the game as fast as it can:
while (!shutdown)
{
foreach (var actor in actors)
actor.Update();
// Send and receive pending network messages
// Various other system maintenance
}
This loop is required to handle thousands of actors and update multiple times per second to keep the game running smoothly. Some actors occasionally perform slow tasks in their update functions, such as fetching data from a database, which is where I'd like to use async. Once this data is retrieved the actor wants to update the game state, which must be done on the main thread.
As this is a console application, I plan to write a SynchronizationContext which can dispatch pending delegates to the main loop. This allows those tasks to update the game once they complete and lets unhandled exceptions be thrown into the main loop. My question is, how do I write the async update functions? This works very nicely, but breaks the recommendation not to use async void:
Thing foo;
public override void Update()
{
foo.DoThings();
if (someCondition) {
UpdateAsync();
}
}
async void UpdateAsync()
{
// Get data, but let the server continue in the mean time
var newFoo = await GetFooFromDatabase();
// Now back on the main thread, update game state
this.foo = newFoo;
}
I could make Update() async and propagate the tasks back to the main loop, but:
I don't want to add overhead to the thousands of updates that will never use it.
Even in the main loop I don't want to await the tasks and block the loop.
Awaiting the task would cause a deadlock anyway as it needs to complete on the awaiting thread.
What do I do with all these tasks I can't await? The only time I might want to know they've all finished is when I'm shutting the server down, but I don't want to collect every task generated by potentially weeks' worth of updates.
My understanding is that the crux of it is that you want:
while (!shutdown)
{
//This should happen immediately and completions occur on the main thread.
foreach (var actor in actors)
actor.Update(); //includes i/o bound database operations
// The subsequent code should not be delayed
...
}
Where the while loop is running in your main console thread. This is a tight single-threaded loop. You could run the foreach in parallel, but then you would still be waiting for the longest running instance (the i/o bound operation to get the data from the database).
async/await is not the best option within this loop; you need to run these I/O-bound database tasks on the thread pool. On the thread pool, async/await would be useful to free up pool threads.
So, the next question is how to get these completions back to your main thread. Well, it seems like you need something equivalent to a message pump on your main thread. See this post for information on how to do that, though that may be a bit heavy-handed. You could instead just have a completion queue of sorts that you check on the main thread on each pass through your while loop. You would use one of the concurrent data structures for this so that it is all thread-safe, then set foo if it needs to be set.
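A minimal sketch of that completion-queue idea, assuming a hypothetical GetFooFromDatabaseAsync simulated with Task.Delay: the slow work runs on the thread pool, and instead of touching game state directly it enqueues an Action that the main loop drains on every pass. Only the main thread runs those actions, so the game state itself needs no locks; failures are re-thrown on the main thread, in the spirit of the question's wish to have exceptions surface in the main loop.

```csharp
using System;
using System.Collections.Concurrent;
using System.Runtime.ExceptionServices;
using System.Threading;
using System.Threading.Tasks;

public sealed class GameLoop
{
    // Thread-safe hand-off from thread pool threads back to the main loop.
    private readonly ConcurrentQueue<Action> _completions = new ConcurrentQueue<Action>();

    // Game state: only ever read or written on the main thread.
    public string Foo { get; private set; } = "initial";

    // Called from an actor's Update(); returns immediately and never blocks the loop.
    public void StartFooRefresh()
    {
        Task.Run(async () =>
        {
            try
            {
                string newFoo = await GetFooFromDatabaseAsync();   // runs off the main thread
                _completions.Enqueue(() => Foo = newFoo);         // applied later, on the main thread
            }
            catch (Exception ex)
            {
                // Re-throw on the main thread, preserving the original stack trace.
                ExceptionDispatchInfo captured = ExceptionDispatchInfo.Capture(ex);
                _completions.Enqueue(() => captured.Throw());
            }
        });
    }

    // Hypothetical stand-in for a real async database query.
    private static async Task<string> GetFooFromDatabaseAsync()
    {
        await Task.Delay(100);
        return "from database";
    }

    public void Run(Func<bool> shutdown)
    {
        while (!shutdown())
        {
            // foreach (var actor in actors) actor.Update();
            // Send and receive pending network messages, other maintenance...

            // Apply every completion that arrived since the last pass.
            while (_completions.TryDequeue(out Action completion))
                completion();

            Thread.Sleep(10);   // hypothetical frame pacing
        }
    }
}
```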
It seems that there is some room to rationalise this polling of actors and threading, but without knowing the details of the app it is hard to say.
A couple of points:
If you do not have a Wait higher up on a task, your main console thread will exit and so will your application. See here for details.
As you have pointed out, async/await does not block the current thread, but it does mean that the code after the await only executes once the awaited operation completes.
The continuation may or may not run on the calling thread. You have already mentioned SynchronizationContext, so I won't go into the details.
SynchronizationContext.Current is null in a console app. See here for information.
Async isn't really for fire-and-forget type operations.
For fire and forget you can use one of these options depending on your scenario:
Use Task.Run or Task.Factory.StartNew. See here for the differences.
Use a producer/consumer pattern for long-running scenarios, running under your own thread pool.
Be aware of the following:
You will need to handle exceptions in your spawned tasks/threads. Even if you don't otherwise care about them, you may want to handle any exceptions that would go unobserved, if only to log their occurrence. See the information on unobserved exceptions.
If your process dies while these long running tasks are on the queue or starting they will not be run, so you may want some kind of persistence mechanism (database, external queue, file) that keeps track of the state of these operations.
If you want to know about the state of these tasks, then you will need to keep track of them in some way, whether it is an in memory list, or by querying the queues for your own thread pool or by querying the persistence mechanism. The nice thing about the persistence mechanism is that it is resilient to crashes and during shutdown you could just close down immediately, then pick up where you ended up when you restart (this of course depends on how critical it is that the tasks are run within a certain timeframe).
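As a sketch of the first two points (Task.Run for fire-and-forget plus observing exceptions), assuming you log failures through a callback you supply; FireAndForget.Run is a hypothetical helper, not a framework API.

```csharp
using System;
using System.Threading.Tasks;

public static class FireAndForget
{
    // Starts work on the thread pool without the caller awaiting it. The returned task
    // completes after the work finishes (and after any failure has been logged); it does
    // not fault when the work fails, so it is safe to ignore, or to keep if you track state.
    public static Task Run(Func<Task> work, Action<Exception> log)
    {
        return Task.Run(work).ContinueWith(t =>
        {
            if (t.IsFaulted)
                log(t.Exception.GetBaseException());   // reading t.Exception marks it as observed
        }, TaskScheduler.Default);
    }
}
```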
First, I recommend that you do not write your own SynchronizationContext; I have one available as part of my AsyncEx library that I commonly use for console apps.
As far as your update methods go, they should return Task. My AsyncEx library has a number of "task constants" that are useful when you have a method that might be asynchronous:
public override Task Update() // Note: not "async"
{
foo.DoThings();
if (someCondition) {
return UpdateAsync();
}
else {
return TaskConstants.Completed;
}
}
async Task UpdateAsync()
{
// Get data, but let the server continue in the mean time
var newFoo = await GetFooFromDatabase();
// Now back on the main thread, update game state
this.foo = newFoo;
}
Returning to your main loop, the solution there isn't quite as clear. If you want every actor to complete before continuing to the next actor, then you can do this:
AsyncContext.Run(async () =>
{
while (!shutdown)
{
foreach (var actor in actors)
await actor.Update();
...
}
});
Alternatively, if you want to start all actors simultaneously and wait for them all to complete before moving to the next "tick", you can do this:
AsyncContext.Run(async () =>
{
while (!shutdown)
{
await Task.WhenAll(actors.Select(actor => actor.Update()));
...
}
});
When I say "simultaneously" above, it is actually starting each actor in order, and since they all execute on the main thread (including the async continuations), there's no actual simultaneous behavior; each "chunk of code" will execute on the same thread.
I highly recommend watching this video or just taking a look at the slides:
Three Essential Tips for Using Async in Microsoft Visual C# and Visual Basic
From my understanding what you should probably be doing in this scenario is returning Task<Thing> in UpdateAsync and possibly even Update.
If you are performing some async operations with 'foo' outside the main loop what happens when the async part completes during a future sequential update? I believe you really want to wait on all your update tasks to complete and then swap your internal state over in one go.
Ideally you would start all the slow (database) updates first and then do the other faster ones so that the entire set is ready as soon as possible.
I have a fairly vanilla web service (old school asmx). One of the methods kicks off some async processing that has no bearing on the result returned to the client. Hopefully, the little snippet below makes sense:
[System.Web.Services.WebMethod]
public List<Foo> SampleWebMethod(string id)
{
// sample db query
var foo = db.Query<Foo>("WHERE id=#0",id);
// kick off async stuff here - for example firing off emails
// don't wait to send result
DoAsyncStuffHere();
return foo;
}
My initial implementation for the DoAsyncStuffHere method made use of the ThreadPool.QueueUserWorkItem. So, it looks something like:
public void DoAsyncStuffHere()
{
ThreadPool.QueueUserWorkItem(delegate
{
// DO WORK HERE
});
}
This approach works fine under low load conditions. However, I need something that can handle a fairly high load. So, the producer/consumer pattern would seem to be the best way to go.
Where I am confused is how to constrain all work being done by the queue to a single thread across all instances of the web service. How would I best go about setting up a single queue to be accessed by any instance of the web service?
You can use a System.Collections.Concurrent.BlockingCollection<T> with a System.Collections.Concurrent.ConcurrentQueue<T> as the underlying collection.
As the name of the namespace implies, the collections are thread safe.
Start a consumer thread (or a few) to pull items from the collection, using the Take() method. When no items are available, the thread will block.
Your DoAsyncStuffHere method adds items to the BlockingCollection. These items could be unstarted System.Threading.Tasks.Task objects; the consumer thread(s) would in that case Start the tasks after taking them from the collection.
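A minimal sketch of that setup, assuming the work items are plain Action delegates rather than unstarted Task objects (a simpler variation of the same idea); BackgroundWorkQueue is a hypothetical name. Because it is static, the queue and its single consumer thread are shared by every request the service handles within one process (one AppDomain); they do not span separate worker processes. GetConsumingEnumerable is used here in place of an explicit Take() loop; it blocks in the same way when the collection is empty.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

public static class BackgroundWorkQueue
{
    // BlockingCollection<T> over a ConcurrentQueue<T>: FIFO order plus blocking consumption.
    private static readonly BlockingCollection<Action> Items =
        new BlockingCollection<Action>(new ConcurrentQueue<Action>());

    // The static constructor runs once per AppDomain, so exactly one consumer thread is started.
    static BackgroundWorkQueue()
    {
        var consumer = new Thread(Consume) { IsBackground = true, Name = "BackgroundWorkQueue" };
        consumer.Start();
    }

    // Producer side: called from DoAsyncStuffHere(); returns immediately.
    public static void Enqueue(Action work) => Items.Add(work);

    private static void Consume()
    {
        // Blocks while the queue is empty instead of polling.
        foreach (Action work in Items.GetConsumingEnumerable())
        {
            try { work(); }
            catch (Exception ex) { Console.Error.WriteLine($"Background work failed: {ex}"); }
        }
    }
}
```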
One easy way to do it would be to implement your queue as a database table.
The producers would be the request threads handled by each instance of the web service.
The consumer could be any kind of continuously running process (Windows Forms app, Windows service, database job, etc.) that monitors the queue and processes items one at a time.
You can't do this with ThreadPool. Instead, you could have a static constructor which launches a worker Thread; DoAsyncStuffHere could insert its work item into a Queue of work that you want done, and the worker Thread checks whether there are any items in the Queue to work on. If so, it does the work; otherwise it sleeps for a few milliseconds.
The static constructor ensures that it's only called once, and only a single Thread should be launched (unless there's some bizarre .NET edge case that I'm unaware of).
Here's a layout for an example - you'd probably need to implement some locking on the queue and add a bit more sophistication to the worker thread, but I've used this pattern before with success. The WorkItem object can hold the state of the information that you want passed along to the worker thread.
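// Static constructors can't take access modifiers; this one runs once per AppDomain.
static WebService()
{
    // Create the queue before the worker starts, so the worker never sees null.
    WorkQueue = new Queue<WorkItem>();
    new Thread(WorkerThread) { IsBackground = true }.Start();
}
public static void WorkerThread()
{
    while (true)
    {
        WorkItem item = null;
        lock (WorkQueue) // Queue<T> is not thread-safe, so every access takes the lock
        {
            if (WorkQueue.Count > 0)
            {
                item = WorkQueue.Dequeue();
            }
        }
        if (item == null)
        {
            Thread.Sleep(100);
            continue;
        }
        try
        {
            item.DoWork();
        }
        catch (Exception)
        {
            // Log it: an unhandled exception on this thread would terminate the process.
        }
    }
}
public static Queue<WorkItem> WorkQueue { get; private set; }
[System.Web.Services.WebMethod]
public List<Foo> SampleWebMethod(string id)
{
    var foo = db.Query<Foo>("WHERE id=#0", id);
    lock (WorkQueue)
    {
        WorkQueue.Enqueue(new WorkItem());
    }
    return foo;
}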
I am writing a program to crawl websites. The crawl function is recursive and may take a long time to complete, so I used multithreading to crawl multiple websites.
What I need is that, after it finishes crawling one website, it calls the next one (which should be in a queue) instead of crawling multiple websites at a time.
I am using C# and ASP.NET.
The standard practice for doing this is to use a blocking queue. If you are using .NET 4.0 then you can take advantage of the BlockingCollection class otherwise you can use Stephen Toub's implementation.
What you will do is spin up as many worker threads as you feel necessary and have them go around in an infinite loop, dequeueing items as they appear in the queue. Your main thread will be enqueueing the items. A blocking queue is designed to wait (block) on the dequeue operation until an item becomes available.
public class Program
{
private static BlockingQueue<string> m_Queue = new BlockingQueue<string>();
public static void Main()
{
var thread1 = new Thread(Process);
var thread2 = new Thread(Process);
thread1.Start();
thread2.Start();
while (true)
{
string url = GetNextUrl();
m_Queue.Enqueue(url);
}
}
public static void Process()
{
while (true)
{
string url = m_Queue.Dequeue();
// Do whatever with the url here.
}
}
}
I don't usually think positive thoughts when it comes to web crawlers...
You want to use a threadpool.
ThreadPool.QueueUserWorkItem(new WaitCallback(CrawlSite), (object)s);
You simply 'push' your workload into the queue, and let the thread pool manage it.
I have to say - I'm not a Threading expert and my C# is quite rusty - but considering the requirements I would suggest something like this:
1. Define a Queue for the websites.
2. Define a pool of crawler threads.
3. The main process iterates over the website queue and retrieves each site address.
4. Retrieve an available thread from the pool, assign it the website address and allow it to start running. Set an indicator in the thread object that it should wait for all subsequent threads to finish (so you will not continue to the next site).
5. Once all the threads have ended, the main thread (started in step #4) will end and return to the main loop of the main process to continue with the next website.
The Crawler behavior should be something like this:
1. Investigate the content of the current address.
2. Retrieve the hierarchy below the current level.
3. For each child of the current node of the site tree, pull a new crawler thread from the pool and start it running in the background with the address of the child node.
   If the pool is empty, wait until a thread becomes available.
4. If the thread is marked to wait, wait for all the other threads to finish.
I think there are some challenges here, but as a general flow I believe it can do the job.
Put all your URLs in a queue, and pop one off the queue each time you are done with the previous one.
You could also put the recursive links in the queue, to better control how many downloads you are executing at a time.
You could set up X number of worker threads which all get a url off the queue in order to process more at a time. But this way you can throttle it yourself.
You can use ConcurrentQueue<T> in .NET to get a thread-safe queue to work with.
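A minimal sketch of that queue-based approach, assuming a hypothetical getLinks function (stubbed in the usage with an in-memory dictionary instead of real HTTP requests): links found on a page go back into a ConcurrentQueue<T> instead of triggering a recursive call, a ConcurrentDictionary records which URLs were already queued, and the number of worker tasks is the throttle. With workers set to 1, pages are crawled strictly one at a time.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class QueueCrawler
{
    // getLinks is a hypothetical stand-in for downloading a page and extracting its links.
    public static ICollection<string> Crawl(
        IEnumerable<string> startUrls, Func<string, IEnumerable<string>> getLinks, int workers)
    {
        var queue = new ConcurrentQueue<string>();
        var visited = new ConcurrentDictionary<string, bool>();
        int pending = 0;   // URLs queued or in progress; 0 means the crawl is finished

        void Enqueue(string url)
        {
            if (visited.TryAdd(url, true))   // each URL is queued at most once
            {
                Interlocked.Increment(ref pending);
                queue.Enqueue(url);
            }
        }

        foreach (string start in startUrls) Enqueue(start);

        Task[] tasks = Enumerable.Range(0, workers).Select(_ => Task.Run(() =>
        {
            while (Volatile.Read(ref pending) > 0)
            {
                if (queue.TryDequeue(out string next))
                {
                    try
                    {
                        // Instead of recursing, found links go back into the queue.
                        foreach (string link in getLinks(next)) Enqueue(link);
                    }
                    finally
                    {
                        Interlocked.Decrement(ref pending);
                    }
                }
                else
                {
                    Thread.Sleep(10);   // queue momentarily empty; another worker may still add links
                }
            }
        })).ToArray();

        Task.WaitAll(tasks);
        return visited.Keys;
    }
}
```

A page's counter is only decremented after its links have been queued, so pending cannot reach zero while work remains.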
I'm looking at implementing a "Heartbeat" process to do a lot of repeated cleanup tasks throughout the day.
This seemed like a good chance to use the Command pattern, so I have an interface that looks like:
public interface ICommand
{
void Execute();
bool IsReady();
}
I've then created several tasks that I want to be run. Here is a basic example:
public class ProcessFilesCommand : ICommand
{
private int secondsDelay;
private DateTime? lastRunTime;
public ProcessFilesCommand(int secondsDelay)
{
this.secondsDelay = secondsDelay;
}
public void Execute()
{
Console.WriteLine("Processing Pending Files...");
Thread.Sleep(5000); // Simulate long running task
lastRunTime = DateTime.Now;
}
public bool IsReady()
{
if (lastRunTime == null) return true;
TimeSpan timeSinceLastRun = DateTime.Now.Subtract(lastRunTime.Value);
return (timeSinceLastRun.TotalSeconds > secondsDelay);
}
}
Finally, my console application runs in this loop looking for waiting tasks to add to the ThreadPool:
class Program
{
static void Main(string[] args)
{
bool running = true;
Queue<ICommand> taskList = new Queue<ICommand>();
taskList.Enqueue(new ProcessFilesCommand(60)); // 1 minute interval
taskList.Enqueue(new DeleteOrphanedFilesCommand(300)); // 5 minute interval
while (running)
{
ICommand currentTask = taskList.Dequeue();
if (currentTask.IsReady())
{
ThreadPool.QueueUserWorkItem(t => currentTask.Execute());
}
taskList.Enqueue(currentTask);
Thread.Sleep(100);
}
}
}
I don't have much experience with multi-threading beyond some work I did in Operating Systems class. However, as far as I can tell none of my threads are accessing any shared state so they should be fine.
Does this seem like an "OK" design for what I want to do? Is there anything you would change?
This is a great start. We've done a bunch of things like this recently so I can offer a few suggestions.
Don't use the thread pool for long-running tasks. The thread pool is designed to run lots of tiny little tasks. If you're doing long-running tasks, use a separate thread. If you starve the thread pool (use up all its threads), everything that gets queued up just waits for a thread pool thread to become available, significantly impacting the effective performance of the thread pool.
Have the Main() routine keep track of when things ran and how long till each runs next. Instead of each command saying "yes I'm ready" or "no I'm not" which will be the same for each command, just have LastRun and Interval fields which Main() can then use to determine when each command needs to run.
Don't use a Queue. While it may seem like a queue-type operation, since each command has its own interval it's really not a normal queue. Instead, put all the commands in a List and sort the list by shortest time to next run. Sleep the thread until the first command needs to run. Run that command. Re-sort the list by next command to run. Sleep. Repeat.
Don't use multiple threads. If each command's interval is a minute or few minutes, you probably don't need to use threads at all. You can simplify by doing everything on the same thread.
Error handling. This kind of thing needs extensive error handling to make sure a problem in one command doesn't make the whole loop fail, and so that you can debug a problem when it occurs. You may also want to decide whether a command should be retried immediately on error, wait until its next scheduled run, or even be delayed more than normal. You may also want to avoid logging a command's error every time if it happens on every run (an error in a command that runs often can easily create huge log files).
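A minimal single-threaded sketch of the suggestions above, assuming a hypothetical ScheduledCommand wrapper that carries Interval and NextRun so the scheduler, not the command, decides when it runs. Rather than re-sorting a list after every run it picks the command with the earliest NextRun, which amounts to the same thing for a handful of commands. It sleeps on the cancellation token's WaitHandle, so a stop request wakes it immediately, and it catches errors per command so one failure doesn't end the loop.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;

public sealed class ScheduledCommand
{
    public ScheduledCommand(string name, TimeSpan interval, Action execute)
    {
        Name = name;
        Interval = interval;
        Execute = execute;
        NextRun = DateTime.UtcNow;   // run once right away
    }

    public string Name { get; }
    public TimeSpan Interval { get; }
    public Action Execute { get; }
    public DateTime NextRun { get; set; }
}

public static class Heartbeat
{
    // Runs every command on the calling thread until the token is cancelled.
    public static void Run(List<ScheduledCommand> commands, CancellationToken ct)
    {
        while (!ct.IsCancellationRequested)
        {
            // The command that is due soonest goes next.
            ScheduledCommand next = commands.OrderBy(c => c.NextRun).First();

            TimeSpan wait = next.NextRun - DateTime.UtcNow;
            if (wait > TimeSpan.Zero && ct.WaitHandle.WaitOne(wait))
                break;   // woken by cancellation, not by the timeout

            try
            {
                next.Execute();
            }
            catch (Exception ex)
            {
                // A failing command just waits for its next scheduled run.
                Console.Error.WriteLine($"{next.Name} failed: {ex.Message}");
            }
            next.NextRun = DateTime.UtcNow + next.Interval;
        }
    }
}
```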
Instead of writing everything from scratch, you could choose to build your application using a framework that handles all of the scheduling and threading for you. The open-source library NCron is designed for exactly this purpose, and it is very easy to use.
Define your job like this:
class MyFirstJob : CronJob
{
public override void Execute()
{
// Put your logic here.
}
}
And create a main entry point for your application including scheduling setup like this:
class Program
{
static void Main(string[] args)
{
Bootstrap.Init(args, ServiceSetup);
}
static void ServiceSetup(SchedulingService service)
{
service.Hourly().Run<MyFirstJob>();
service.Daily().Run<MySecondJob>();
}
}
This is all the code you will need to write if you choose to go down this path. You also get the option to do more complex schedules or dependency injection if needed, and logging is included out-of-the-box.
Disclaimer: I am the lead programmer on NCron, so I might just be a tad biased! ;-)
I would make all your Command classes immutable to ensure that you don't have to worry about changes to state.
Nowadays, Microsoft's Parallel Extensions should be the go-to option for writing concurrent code or doing any thread-related work. They provide a good abstraction on top of the thread pool and system threads, so you don't need to think in an imperative manner to get the task done.
In my opinion consider using it. By the way, your code is clean.
Thanks.
The running variable will need to be marked as volatile if its state is going to be changed by another thread (which also means making it a field, since local variables can't be volatile).
As to the suitability, why not just use a Timer?
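For comparison, a minimal System.Threading.Timer version of a single heartbeat command might look like this. TimerHeartbeat is a hypothetical wrapper; it uses a one-shot timer that is re-armed after each run, so a slow command never overlaps with itself (a plain periodic timer would fire again even if the previous callback were still running).

```csharp
using System;
using System.Threading;

public sealed class TimerHeartbeat : IDisposable
{
    private readonly Action _command;
    private readonly TimeSpan _interval;
    private readonly Timer _timer;

    public TimerHeartbeat(Action command, TimeSpan interval)
    {
        _command = command;
        _interval = interval;
        // Created disarmed, then armed, so the callback can never see _timer as null.
        _timer = new Timer(OnTick, null, Timeout.InfiniteTimeSpan, Timeout.InfiniteTimeSpan);
        _timer.Change(TimeSpan.Zero, Timeout.InfiniteTimeSpan);   // first run immediately, one shot
    }

    private void OnTick(object state)
    {
        try
        {
            _command();
        }
        catch (Exception ex)
        {
            Console.Error.WriteLine($"Heartbeat command failed: {ex.Message}");
        }
        finally
        {
            // Schedule the next run only after this one has finished.
            try { _timer.Change(_interval, Timeout.InfiniteTimeSpan); }
            catch (ObjectDisposedException) { /* disposed while the command was running */ }
        }
    }

    public void Dispose() => _timer.Dispose();
}
```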