How to spawn a new thread from RIA Domain Service - c#

My team and I are working on an application that accesses a "huge" database, roughly 32M rows in 8 months. The application is a RIA Domain Service application. We have optimized the application and the database design so that, even on a box with very limited resources, the response time is never more than a few seconds.
However, there are certain tasks that need to be performed on a large record set (at least 2-3M records per operation). An example is the generation of a monthly report. We definitely cannot keep the application waiting for the result, because the request would hit the 30-second timeout.
After reading this post, I thought I could create an [Invoke] method that spawns a new thread and then returns, freeing the client up. The thread would be in charge of extracting data from the DB and writing it nicely into a PDF. I've tried to implement this scenario, but I get an exception saying that the underlying connection has already been disposed...
Is this approach correct? Can I achieve what I am trying to do, or is there some issue I cannot overcome? And is there any better way to do it?
Cheers,
Gianluca.

OK, I've realized my question was a bit silly.
As far as I understand, the ObjectContext exists only as long as the client is connected; after that it gets disposed. Because I was writing an Invoke method that does not require any change tracking, I resolved the problem (see the sketch below) by:
- spawning a new thread from within the Invoke method
- instantiating a new EF context inside the worker thread
- disposing of the new EF context as soon as the separate thread's work is finished.
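A minimal sketch of that approach; the domain service, entity context, and method names here are hypothetical placeholders, not the actual project code:

public class ReportDomainService : DomainService
{
    [Invoke]
    public void GenerateMonthlyReport(int month, int year)
    {
        // Start the long-running work on a dedicated thread and return immediately,
        // so the client is not held past the call timeout.
        var worker = new Thread(() =>
        {
            // Create a context owned by this thread; the request-scoped context
            // is disposed as soon as the Invoke call returns.
            using (var context = new ReportingEntities()) // hypothetical EF context
            {
                // ...query the 2-3M rows and write the PDF here...
            }
        });
        worker.Start();
    }
}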
Cheers,
Gianluca.

Related

Persisted thread pool in c# web environment

I wrote a C# data-centric web application.
This application needs to perform some things asynchronously (for example, sending email or transmitting data to an external API), but I want them to be persisted in case of a crash or restart.
I also want to pass along some data that will be persisted, so that when the thread wakes up it has that data for the invocation. By data I mean a data context, some structured object, so that when the thread wakes up it has the data it needs for the operation; in the case of email, that would be the To, Subject and Body.
So just to visualize it, here is an API I can think of...
public interface IAsyncService
{
    void QueueWork<T>(object dataContext) where T : IAsyncOperation;
}

public interface IAsyncOperation
{
    void ExecuteQueuedWork(object dataContext);
}
Is this scenario possible in native .NET? If not, do you know of any other possible solution?
Yes, and no.
You can't "persist a thread". That's simply impossible; a thread is a low-level thing.
However, you can get the result you expect. Just persist the jobs, not the threads. A job (or task, or work item, or whatever you want to call it) is the set of input data that defines the work to be performed, plus, optionally, information about progress, temporary results, and similar things.
If you define the "job" just as a set of input data, you will be able to have a pool of workers that start processing the jobs. When a worker crashes, assuming the job is still persisted, you will be able to start a new worker and let it process the failed job again from the beginning.
If you include in the "job" some temporary (partial) results, then after a crash your new worker can resume from that saved point.
Now, the granularity of savepoints (if any), the tracking of which thread is doing which job, and the tracking of which jobs are completed and which are not are solely your responsibility. You have to design and write all of that yourself. That's doable, and not that hard, but it requires a bit of planning.
Or, with a bit of luck, you might find a worker-pool/message-queueing library that does this for you. I don't remember any right now.
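A minimal sketch of the "persist the job, not the thread" idea, reusing the interfaces from the question (the job table, serialization format and class names here are assumptions for illustration):

public class EmailJobData
{
    public string To { get; set; }
    public string Subject { get; set; }
    public string Body { get; set; }
}

public class SendEmailOperation : IAsyncOperation
{
    public void ExecuteQueuedWork(object dataContext)
    {
        var data = (EmailJobData)dataContext;
        // send the email here using data.To, data.Subject, data.Body
    }
}

public class PersistedAsyncService : IAsyncService
{
    public void QueueWork<T>(object dataContext) where T : IAsyncOperation
    {
        // Serialize typeof(T).AssemblyQualifiedName together with dataContext
        // (for example as JSON) into a Jobs table. A separate worker process
        // polls that table, deserializes the payload, instantiates T, calls
        // ExecuteQueuedWork, and then marks the row as done. Because the row
        // survives a crash or restart, an unfinished job is simply picked up again.
    }
}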

Could MSMQ resolve the performance bottleneck of our multithreaded services?

We wrote a service that uses ~200 threads.
Each of the 200 threads must:
1- Download from the internet
2- Parse the raw data (HTML, XML, JSON...)
3- Store the newly created data in the DB
With ~10 threads, the elapsed time for the second operation (parsing) is 50 ms per thread.
With ~50 threads, the elapsed time for the second operation (parsing) is 80-18000 ms per thread.
So we have an idea!
We can keep the downloads multithreaded, but use MSMQ to send the raw data to another process (a consumer). That other process implements the second part (parsing) single-threaded.
You might ask why we don't just use the C# Queue class in the same process: we could not protect our "precious parsing thread" from context switches. If there are 200 threads in the same process, the precious thread will be a context-switch victim.
Is using MSMQ for this requirement reasonable?
Yes, this is an excellent example of where MSMQ makes a lot of sense. You can offload your difficult work to a different process to handle without affecting the performance of your current process which clearly doesn't care about the results. Not only that, but if your new worker process goes down, the queue will preserve state and messages (other than maybe the one being worked on when it went down) will not be lost.
Depending on your needs and goals I'd consider offloading the download to the other process as well - passing URLs to work on to the queue for example. Then, scaling up your system is as easy as dialing up the queue receivers, since queue messages are received in a thread safe manner when implemented correctly.
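A minimal sketch of that producer/consumer split over MSMQ; the queue path, payload type and class name are assumptions, and in practice you would also want transactional queues and error handling:

using System;
using System.Messaging; // add a reference to System.Messaging.dll

static class RawDocumentQueue
{
    // Private queue used to hand raw documents to the consumer process (illustrative path).
    const string Path = @".\private$\RawDocuments";

    // Called from the ~200 download threads: hand the raw data off and return.
    public static void Enqueue(string rawDocument)
    {
        if (!MessageQueue.Exists(Path))
            MessageQueue.Create(Path);
        using (var queue = new MessageQueue(Path))
            queue.Send(rawDocument, "raw-document");
    }

    // Runs in the separate consumer process, on a single parsing thread.
    public static void ConsumeForever(Action<string> parseAndStore)
    {
        using (var queue = new MessageQueue(Path))
        {
            queue.Formatter = new XmlMessageFormatter(new[] { typeof(string) });
            while (true)
            {
                Message message = queue.Receive(); // blocks until a message arrives
                parseAndStore((string)message.Body);
            }
        }
    }
}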
Yes, it is normal. And there are frameworks/libraries that help you build this kind of solution, providing you with more than just the transport.
NServiceBus or MassTransit are examples (both can sit on top of MSMQ)

Best way to run automated task every minute when site is on multiple servers

I need to setup an automated task that runs every minute and sends emails in the queue. I'm using ASP.NET 4.5 and C#. Currently, I use a scheduler class that starts in the global.asax and makes use of caching and cache callback. I've read this leads to several problems.
The reason I did it that way is because this app runs on multiple load balanced servers and this allows me to have the execution in one place and the code will run even if one or more servers are offline.
I'm looking for some direction to make this better. I've read about Quartz.NET but never used it. Does Quartz.NET call methods from the application, from a Windows service, or from a web service?
I've also read about using a Windows service, but as far as I can tell, those are installed directly on each server. The thing is, I need the task to execute regardless of how many servers are online, and I don't want to duplicate it. For example, if I have a scheduled task set up on server 1 and server 2, they would both run together, duplicating the requests. However, if server 1 was offline, I need server 2 to run the task.
Any advice on how to move forward here or is the global.asax method the best way for the multi-server environment? BTW, the web servers are running Win Server 2012 with IIS 8.
EDIT
In a request for more information, the queue is stored in a database. I should also make mention that the database servers are separate from the web servers. There are two database servers, but only one runs at a time. There is a central storage they both read from so there is only one instance of the database. When one database server goes down, the other comes online.
That being said, would it make more sense to deploy a Windows Service to both database servers? That would ensure only one runs at a time.
Also, what are your thoughts about running Quartz.NET from the application? As millimoose mentions, I don't necessarily need it running on the web front end, however, doing so allows me to not deploy a windows service to multiple machines and I don't think there would be a performance difference going either way. Thoughts?
Thanks everyone for the input so far. If any additional info is needed, please let me know.
I have had to tackle the exact problem you're facing now.
First, you have to realize that you absolutely cannot reliably run a long-running process inside ASP.NET. If you instantiate your scheduler class from global.asax, you have no control over the lifetime of that class.
In other words, IIS may decide to recycle the worker process that hosts your class at any time. At best, this means your class will be destroyed (and there's nothing you can do about it). At worst, your class will be killed in the middle of doing work. Oops.
The appropriate way to run a long-lived process is by installing a Windows Service on the machine. I'd install the service on each web box, not on the database.
The Service instantiates the Quartz scheduler. This way, you know that your scheduler is guaranteed to keep running as long as the machine is up. When it's time for a job to run, Quartz simply calls a method on an IJob class that you specify.
class EmailSender : Quartz.IJob
{
    public void Execute(JobExecutionContext context)
    {
        // send your emails here
    }
}
Keep in mind that Quartz calls the Execute method on a separate thread, so you must be careful to be thread-safe.
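For reference, a rough sketch of wiring that job up from the service's startup code, assuming the Quartz.NET 1.x API (the version whose Execute signature matches the snippet above); the job/trigger names and the one-minute interval are illustrative:

using Quartz;
using Quartz.Impl;

// Typically called from the Windows Service's OnStart.
static void StartScheduler()
{
    ISchedulerFactory factory = new StdSchedulerFactory();
    IScheduler scheduler = factory.GetScheduler();
    scheduler.Start();

    // Run the email job once a minute.
    JobDetail job = new JobDetail("sendEmails", "emailGroup", typeof(EmailSender));
    Trigger trigger = TriggerUtils.MakeMinutelyTrigger();
    trigger.Name = "everyMinute";
    scheduler.ScheduleJob(job, trigger);
}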
Of course, you'll now have the same service running on multiple machines. While it sounds like you're concerned about this, you can actually leverage this into a positive thing!
What I did was add a "lock" column to my database. When a send job executes, it grabs a lock on specific emails in the queue by setting the lock column. For example, when the job executes, generate a guid and then:
UPDATE EmailQueue SET Lock=someGuid WHERE Lock IS NULL LIMIT 1;
SELECT * FROM EmailQueue WHERE Lock=someGuid;
In this way, you let the database server deal with the concurrency. The UPDATE query tells the DB to assign one email in the queue (one that is currently unassigned) to the current instance. You then SELECT the locked email and send it. Once it is sent, delete the email from the queue (or however you handle sent email), and repeat the process until the queue is empty.
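A rough ADO.NET sketch of that claim-then-send loop; the table and column names follow the snippet above and are otherwise assumptions, and note that LIMIT 1 is not T-SQL, so on SQL Server it becomes TOP (1):

using System;
using System.Data.SqlClient;

static void DrainEmailQueue(string connectionString)
{
    while (true)
    {
        var lockId = Guid.NewGuid();
        using (var conn = new SqlConnection(connectionString))
        {
            conn.Open();

            // Claim one unassigned email for this instance.
            var claim = new SqlCommand(
                "UPDATE TOP (1) EmailQueue SET [Lock] = @lockId WHERE [Lock] IS NULL", conn);
            claim.Parameters.AddWithValue("@lockId", lockId);
            if (claim.ExecuteNonQuery() == 0)
                return; // nothing left to send

            // Read back the row we just locked and send it.
            var fetch = new SqlCommand(
                "SELECT * FROM EmailQueue WHERE [Lock] = @lockId", conn);
            fetch.Parameters.AddWithValue("@lockId", lockId);
            using (var reader = fetch.ExecuteReader())
            {
                while (reader.Read())
                {
                    // send the email represented by this row, then delete it
                    // (or mark it as sent) with a follow-up command
                }
            }
        }
    }
}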
Now you can scale in two directions:
By running the same job on multiple threads concurrently.
By virtue of the fact this is running on multiple machines, you're effectively load balancing your send work across all your servers.
Because of the locking mechanism, you can guarantee that each email in the queue gets sent only once, even though multiple threads on multiple machines are all running the same code.
In response to comments: There's a few differences in the implementation I ended up with.
First, my ASP application can notify the service that there are new emails in the queue. This means that I don't even have to run on a schedule, I can simply tell the service when to start work. However, this kind of notification mechanism is very difficult to get right in a distributed environment, so simply checking the queue every minute or so should be fine.
The interval you go with really depends on the time sensitivity of your email delivery. If emails need to be delivered ASAP, you might need to trigger every 30 seconds or even less. If it's not so urgent, you can check every 5 minutes. Quartz limits the number of jobs executing at once (configurable), and you can configure what should happen if a trigger is missed, so you don't have to worry about having hundreds of jobs backing up.
Second, I actually grab a lock on 5 emails at a time to reduce the query load on the DB server. I deal with high volumes, so this helped efficiency (fewer network round trips between the service and the DB). The thing to watch out for here is what happens if a node goes down (for whatever reason, from an exception to the machine itself crashing) in the middle of sending a group of emails. You'll end up with "locked" rows in the DB and nothing servicing them. The larger the group, the bigger this risk. Also, an idle node obviously can't work on anything if all remaining emails are locked.
As far as thread safety, I mean it in the general sense. Quartz maintains a thread pool, so you don't have to worry about actually managing the threads themselves.
You do have to be careful about what the code in your job accesses. As a rule of thumb, local variables should be fine. However, if you access anything outside the scope of your function, thread safety is a real concern. For example:
class EmailSender : IJob
{
    static int counter = 0;

    public void Execute(JobExecutionContext context)
    {
        counter++; // BAD!
    }
}
This code is not thread-safe because multiple threads may try to access counter at the same time.
Thread A              Thread B
Execute()
                      Execute()
Get counter (0)
                      Get counter (0)
Increment (1)
                      Increment (1)
Store value
                      Store value
counter = 1
counter should be 2, but instead we have an extremely hard-to-debug race condition. The next time this code runs, it might happen this way:
Thread A              Thread B
Execute()
                      Execute()
Get counter (0)
Increment (1)
Store value
                      Get counter (1)
                      Increment (2)
                      Store value
counter = 2
...and you're left scratching your head why it worked this time.
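For contrast, a minimal thread-safe version of the same toy example, using Interlocked so the read-modify-write is atomic:

using System.Threading;

class EmailSender : Quartz.IJob
{
    static int counter = 0;

    public void Execute(JobExecutionContext context)
    {
        // Atomic increment: safe even when Quartz runs Execute on several
        // pool threads at the same time.
        Interlocked.Increment(ref counter);
    }
}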
In your particular case, as long as you create a new database connection in each invocation of Execute and don't access any global data structures, you should be fine.
You'll have to be more specific about your architecture. Where is the email queue; in memory or a database? If they exist on a database, you could have a flag column named "processing" and when a task grabs an email from the queue it only grabs emails that are not currently processing, and sets the processing flag to true for emails it grabs. You then leave concurrency woes to the database.

Can/Should I Use an Asynchronous Controller Here? (ASP.NET MVC 3)

I have this [HttpPost] action method:
[HttpPost]
public ActionResult AddReview(Review review)
{
    repository.Add(review);
    repository.Save();
    repository.UpdateSystemScoring(review.Id); // call SPROC with new Review ID.
    return View("Success", review);
}
So, basically a user clicks a button, I add the review to my database (via Entity Framework 4.0), save changes, and then call a stored procedure with the identity field, which is the second-to-last line of code.
This needs to be done after the review is saved (as the identity field is only created once Save is called, and EF persists the changes), and it is a system-wide calculation.
From the user point of view, he/she doesn't/shouldn't care that this calculation is happening.
This procedure can take anywhere from 0-20 seconds. It does not return anything.
Is this a candidate for an asynchronous controller?
Is there a way I can add the Review and let another asynchronous controller handle the long-running SPROC call, so the user can be taken to the Success page immediately?
I must admit (and I'm partially ashamed of this): this is a rewrite of an existing system, and in the original system (ASP.NET Web Forms) I fired off another thread in order to achieve the above, which is why I was wondering if the same principle can be applied to ASP.NET MVC 3.
I always try to avoid multithreading in ASP.NET, but user experience is the #1 priority, and I do not want the page timing out.
So, is this possible? I'm also happy to hear any other ideas. Also, I can't use triggers here; I don't really want to go into too much detail why, but I can't.
I would fire a new thread (not from the thread pool) to perform this task and return immediately, especially since you don't care about the results. Asynchronous controllers are useful in situations where most of the time is spent waiting for some other system to complete the task; once that system completes, your application is signaled to process the result, and during the execution of the task no threads are consumed by your application. So in your scenario this task could be performed by SQL Server using the asynchronous Begin/End methods in ADO.NET (for example BeginExecuteNonQuery). You could use that if you need the results back. If you don't, firing a new thread will work just fine, as it did before.
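A rough sketch of what that could look like in the action above (needs using System.Threading; the ReviewRepository type is a hypothetical disposable wrapper; whatever the background thread uses must be created inside it, not shared with the request):

[HttpPost]
public ActionResult AddReview(Review review)
{
    repository.Add(review);
    repository.Save();

    int reviewId = review.Id; // capture the identity value for the background work

    // Fire-and-forget on a dedicated thread (not the pool), then return immediately.
    new Thread(() =>
    {
        // Use a fresh context/connection here; request-scoped objects will be
        // disposed once this action returns. ReviewRepository is hypothetical.
        using (var backgroundRepository = new ReviewRepository())
        {
            backgroundRepository.UpdateSystemScoring(reviewId);
        }
    }).Start();

    return View("Success", review);
}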
I think asynchronous controllers are more for things where the request may take a long time to return a response, but the main thread would spend most of that time waiting for another thread/process. This is mostly useful for ajax calls rather than main page load, when it is acceptable to just show a progress indicator until the response is returned.
I use a separate queueing system for this type of task, which is more robust and easier to work with but does take a bit more work to set up. If you really need to do it within the ASP.net process, a separate request is probably the best option, though there is some potential for the task not to run - for example I'm not sure what happens if the connection drops or the app pool recycles while an async task is running.
Since the scoring system takes so long to run, I would recommend using a scheduled task in SQL Server or Windows to update the scores every x minutes. Since the user doesn't know about the request, it doesn't matter whether it runs immediately.
You could add the ID's to a queue and process the queue every 30 minutes.
Otherwise, if there is a reason this needs to run immediately, you could make an async call or see if you can trim some fat off the stored proc.
I have a very similar system that I wrote. Instead of doing things synchronously, we do everything asynchronously using queues.
Action -> causes JavaScript request to web server
  |
Web server puts notification on queue
  |
Worker picks up message from queue and does the point calculation
  |
At some point in the future the user sees the points adjusted
This allows us to handle large amounts of user load without worrying about it having an adverse effect on our calculation engine. It also means we can add more workers when the load is heavy and remove workers when it isn't.

Programming a long-running time-based process

I was wondering what the best way to write an application would be. Basically, I have a sports simulation project that is multi-threaded and can execute different game simulations concurrently.
I store my matches in a SQLite database that have a DateTime attached to it.
I want to write an application that checks every hour or so to see if any new matches need to be played and spawns those threads off.
I can't rely on the task scheduler to execute this every hour, because the different instances of that process would need to share objects (specifically a tournament object) that I suspect would be overwritten by a newer process when saved back to the DB. So ideally I need to write some sort of long-running process that sleeps between hourly checks.
I've written my object model so that each object is only loaded into memory once, so as long as all simulation threads are spawned from this one application, they shouldn't be overwriting each other's data.
EDIT: More detail on requirements
Basically, multiple matches need to be able to run concurrently. These matches can be of arbitrary length, so it's not necessary that one finishes before the other begins (in fact, in most cases there will be multiple matches executing at the same time).
What I'm envisioning is a program that runs in the background (a service, I guess?) that sleeps for 60 minutes and then checks the database to see if any games should be started. If there are any to be started, it fires off threads to simulate those games and then goes back to sleep. Hence, the simulation threads are running but the "scheduling" thread is sleeping for another 60 minutes.
The reason I can't (I think) use the default OS task-scheduling interface is that it requires the task to be spawned as a new process. I have designed my database object model so that objects are cached by each object class on first load (by memory reference), meaning each object is loaded from the database only once and that same reference is used on all saves. So when each simulation thread finishes and saves off its state, the same reference (with updated state) is used to save it. If a different executable were launched every time, presumably each process would hold its own copy in memory, and hence one process could save into the DB and overwrite the state written by another process.
A service looks like the way to go. Is there a way to make a service just sleep for 60 minutes, then wake up and execute a function? I feel like making this a standard console application would waste memory, but maybe there is a more efficient way to do it that I'm not aware of.
If you want to make it really reliable, make it a Service.
But I don't see any problems in making it a normal (Console, WinForms, WPF) application.
Maybe you could expand on the requirements a little.
The reason I can't (I think) use the default OS task-scheduling interface is that these require the task to be executed to be spurned as a new process. I have developed my database object model such that they are cached by each object class on first load (the memory reference) meaning that each object is only loaded from memory once and that reference is used on all saves
If you want everything to remain cached forever, then you do need to have an application that simply runs forever. You can make this a windows service, or a normal windows application.
A Windows service is just a normal exe that conforms to the service manager API. If you want to make one, Visual Studio has a wizard which auto-generates some skeleton code for you. Basically, instead of a Main method that does the work, you get a class derived from ServiceBase with OnStart/OnStop methods, and everything else is the same.
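A minimal sketch of such a service that wakes up every 60 minutes and checks the database; the class and method names are placeholders, and the interval mirrors the question:

using System;
using System.ServiceProcess;
using System.Threading;

public class MatchSchedulerService : ServiceBase
{
    private Timer _timer;

    protected override void OnStart(string[] args)
    {
        // Fire immediately, then every 60 minutes.
        _timer = new Timer(_ => CheckForDueMatches(), null,
                           TimeSpan.Zero, TimeSpan.FromMinutes(60));
    }

    protected override void OnStop()
    {
        _timer.Dispose();
    }

    private void CheckForDueMatches()
    {
        // Query the SQLite database for matches whose start time has passed,
        // then spawn simulation threads from this single process so the
        // in-memory object cache is shared, as described in the question.
    }

    public static void Main()
    {
        ServiceBase.Run(new MatchSchedulerService());
    }
}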
You could, if you wanted to, use the windows task scheduler to schedule your actions. The way you'd do this is to have your long-running windows service in the background that does nothing. Have it open a TCP socket or named pipe or something and just sit there. Then write a small "stub" exe which just connects to this socket or named pipe and tells the background app to wake up.
This is, of course, a lot harder than just doing a sleep in your background application, but it does let you have a lot more control - you can change the sleep time without restarting the background service, run it on-demand, etc.
I would, however, reconsider your design. The fact that you rely on a long-running service is a large point of failure. If your app needs to run for days and you have a single bug which crashes it, then you have to start again. A much better architecture is to follow the Unix model, where you have small processes which start, do one thing, then finish (in this case, run each game simulation as its own process, so if one dies it doesn't take the master process or the other simulations down with it).
It seems like the main reason you're trying to keep it long-running is to cache your database queries. Do you actually need to do that at all? A lot of the time databases are plenty fast enough (they have their own caches, which are quite smart). A common mistake I've seen programmers make is to assume that something like a database is slow and waste a pile of time optimizing, when in actual fact it would have been fine.
