Say I have a scatter gather setup like this:
1) Web app
2) RabbitMQ
3) Scatter gather API 1
4) Scatter gather API 2
5) Scatter gather API x
Say each scatter gather API (and any new ones added in future) needs to supply or update an image for the web app, so that when the web app displays the results on screen it also displays the image. What is the best way to do this?
1) RESTful call from each API to the web app, adding/updating an image where necessary
2) Use message queue to send the image
I believe option two is best because I am using a microservices architecture. However, this would mean that the image could be processed by the web app after requests are made (if competing consumers are used). Could the image therefore be missing from the webpage?
The problem with option 1 is that the scatter gather APIs are tightly coupled with the web app.
What is the appropriate way to approach this?
The short answer: There is no right way to do this.
The long answer: Because there's no right way to do this, there's a danger that any answer I give you will be an opinion. Rather than do that, I'm going to help clarify the ramifications of each option you've proposed.
First thing to note: Unless there is already an image available at the time of the HTTP request, your HTTP response will not be able to include an image. This means that your front-end will need to be updated after the HTTP request/response cycle has concluded. There are two ways to do this: polling via AJAX requests, or pushing via sockets.
The advantage of polling is that it is probably easier to integrate into an existing web app. The advantage of pushing the image to the client via sockets is that the client won't need to spam your server with polling requests.
Second thing to note: Reporting back the image from the scatter/gather workers could happen either via an HTTP endpoint, or via the message queue, as you suggest.
The advantage of the HTTP endpoint is that it would likely be simpler to set up. The advantage of the message queue is that the worker would not have to wait for the HTTP response (which could take a while if you're writing a large image file to disk) before moving on to the next job.
One more thing to note: If you choose to use an HTTP endpoint to create/update the images, it is possible that multiple scatter/gather workers will be trying to do this at the same time. You'll need to handle this to prevent multiple workers from trying to write to the same file at the same time. You could handle this by using a mutex to lock the file while one process is writing to it. If you choose to use a message queue, you'll have several options for dealing with this: you could use a mutex, or you could use a FIFO queue that guarantees the order of execution, or you could limit the number of workers on the queue to one, to prevent concurrency.
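As a rough illustration of the mutex option, here is a minimal C# sketch. It assumes all writers run inside a single web-app process and that the image key is already a safe file name; `ImageStore` and `Save` are hypothetical names, not part of any library. Writers in separate processes or on separate servers would need a named or distributed lock instead.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;

// Hypothetical helper: serializes writes to the same image file within one process.
public sealed class ImageStore
{
    private readonly string _root;

    // One lock object per image key, created on first use.
    private readonly ConcurrentDictionary<string, object> _locks =
        new ConcurrentDictionary<string, object>();

    public ImageStore(string root)
    {
        _root = root;
        Directory.CreateDirectory(root);
    }

    // Assumption: key is a plain file name (no path separators).
    public void Save(string key, byte[] bytes)
    {
        object gate = _locks.GetOrAdd(key, _ => new object());
        lock (gate)
        {
            // Only one thread at a time writes a given file;
            // writes to different keys still run in parallel.
            File.WriteAllBytes(Path.Combine(_root, key), bytes);
        }
    }
}
```

Locking per key rather than globally keeps unrelated images from blocking each other. The last writer still wins, so ordering matters if two workers update the same image.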
I do have experience with a similar system. My team and I chose to use a message queue. It worked well for us, given our constraints. But, ultimately, you'll need to decide which will work better for you given your constraints.
EDIT
The constraints we considered in choosing a message queue over HTTP included:
Not wanting to add private endpoints to a public facing web app
Not wanting to hold up a worker to wait on an HTTP request/response
Not wanting to make synchronous that which was asynchronous
There may have been other reasons. Those are the ones I remember off the top of my head.
This is a high level question that I am asking for something I am currently architecting and cannot seem to find the exact answer I am looking for.
Scenario:
I have a .Net Core REST API that will be receiving requests from an external application. These requests will be getting pushed into a RabbitMQ instance. These notifications will be thrown to an exchange, then fanned out to multiple queues for multiple consumers.
There is one consumer that I will be responsible for and I am looking for advice on best practices. Ultimately, there will be a REST API that will eventually need to react to these messages being pushed into the queue. This REST API in question is a containerized (Docker) app running on a Kubernetes cluster. It will be receiving a lot of request traffic outside of these notifications (queue messages), making SQL calls, etc.
My question is, should I have an external microservice (hosted service/background service) that subscribes to this queue with the intent of calling into said REST API. Kind of like a traffic cop; routing messages to the appropriate API method based on certain data points.
Or
Would it be OK to put this consumer directly into the high-traffic REST API in question?
Any advice around this? Thanks in advance!
There is no right or wrong. This is the whole dilemma around monolith-microservices and synchronous-asynchronous.
If you are looking at going with microservices and more asynchronous, you can start with these questions:
Do you want your system split into different codebases?
Do you want to divide responsibilities among different teams?
Do you want to use different languages/projects for the different components?
Do you want some components of the system to respond faster to the user?
Can your app be ok with the fact that one decoupled component may fail completely?
should I have an external microservice (hosted service/background service) that subscribes to this queue with the intent of calling into said REST API. Kind of like a traffic cop; routing messages to the appropriate API method based on certain data points.
Yes, if you are thinking more on the microservices route and the answer is 'yes' for most of the above questions (and even more microservices related questions not mentioned).
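To make the "traffic cop" idea concrete, here is a minimal C# sketch of the routing step only. The `Notification` shape and the `NotificationRouter` class are hypothetical; in the real service the message would be deserialized from the RabbitMQ delivery, and each handler would call the relevant REST API method.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical message shape; the real payload would come off the queue.
public sealed class Notification
{
    public string Type { get; set; }
    public string Body { get; set; }
}

// Routes each message to a handler chosen by a data point (here, Type).
public sealed class NotificationRouter
{
    private readonly Dictionary<string, Action<Notification>> _routes =
        new Dictionary<string, Action<Notification>>(StringComparer.OrdinalIgnoreCase);

    public void Map(string type, Action<Notification> handler)
    {
        _routes[type] = handler;
    }

    // Returns false when no route matches, so the caller can log or dead-letter the message.
    public bool Dispatch(Notification message)
    {
        if (message != null && message.Type != null &&
            _routes.TryGetValue(message.Type, out Action<Notification> handler))
        {
            handler(message);
            return true;
        }
        return false;
    }
}
```

Keeping the routing table in a separate consumer service means the high-traffic API does not need to know about the queue at all.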
If you are thinking more about the monolith route:
Are you ok with the same code base shared across the different teams?
Are you ok with a more unified programming language?
Do you want to have a monorepo? (although you can do micro-services with monorepos)
Is the codebase going to be worked on mainly by a few people who know it really well?
Is it easy to provide redundancy within the app? i.e., if one component fails, the application doesn't crash.
Would it be OK to put this consumer directly into the high-traffic REST API in question?
Yes, if your code can handle it and you are more in line with 'yes' on the answers above.
The title of the question may not be clear enough, so allow me to explain the background here:
I would like to design a web service that generates a PDF and submits it to a printer. Here is the workflow:
The user submits a request to the web service. The request will probably be one-way (fire-and-forget), so the user doesn't have to wait for the job to complete; the user may receive an HTTP 200 and continue working.
Once the web service receives the request, it generates the PDF and submits it to the designated printer. This process could take some time and CPU resources. As I don't want to drain all the resources on that server, I may use the producer-consumer pattern here: a queue holds client jobs, which are processed one by one.
My questions are:
I'm new to C#. What is the proper pattern to queue and process the jobs? Should I use ConcurrentQueue and ThreadPool to achieve it?
What is the proper way to notify the user whether the job succeeded or failed? Instead of using a callback service, is async an ideal way? My concern is that there may be lots of jobs in the queue and I don't want the client to wait for them to complete.
The web service is placed behind a load balancer. How can I maintain a 'process queue' among the instances? I've tried Hangfire and it seems okay, but I'm looking for alternatives.
How can I find out the number of jobs in the queue and how many threads are currently running? The web service will be deployed on IIS; is there a native way to achieve this, or should I implement a web service call to obtain them?
Any help will be appreciated, thanks!
WCF supports the idea of a fire-and-forget methods. You just mark your contract interface method as one way, and there will be no waiting for a return:
[OperationContract( IsOneWay = true )]
void PrintPDF( PrintRequest request );
The only downside, of course, is that you won't get any notification from the server that your request was successful or even valid. You'd have to do some kind of periodic polling to see what's going on. I guess you could put a Guid into the PrintRequest, so you could interrogate for that job later.
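A minimal sketch of the Guid idea, assuming the job id travels in the request and the server keeps job state in memory. `JobState` and `JobTracker` are hypothetical names; behind a load balancer the state would have to live in shared storage rather than in one process.

```csharp
using System;
using System.Collections.Concurrent;

public enum JobState { Queued, Running, Succeeded, Failed }

// Hypothetical in-memory tracker keyed by the Guid carried in the request.
public sealed class JobTracker
{
    private readonly ConcurrentDictionary<Guid, JobState> _jobs =
        new ConcurrentDictionary<Guid, JobState>();

    // Called when the one-way request is accepted.
    public void Register(Guid jobId)
    {
        _jobs[jobId] = JobState.Queued;
    }

    // Called by the worker as the job progresses.
    public void Update(Guid jobId, JobState state)
    {
        _jobs[jobId] = state;
    }

    // Called by the polling client; returns null for an unknown job id.
    public JobState? Query(Guid jobId)
    {
        return _jobs.TryGetValue(jobId, out JobState state) ? state : (JobState?)null;
    }
}
```

The client generates the Guid, sends it with the one-way call, and later asks a separate query operation for the state of that id.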
If you're not married to wcf, you might consider signalR...there's a comprehensive sample app of both a server and simple wpf client here. It has the advantage that either party can initiate an exchange once the connection has been established.
If you need to stick with wcf, there's the possibility of doing dualHttp. The client connects with an endpoint to callback to...and the server can then post notifications as work completes. You can get a feel for it from this sample.
Both signalR and wcf dualHttp are pretty straightforward. I guess my preference would be based on the experience of the folks doing the work. signalR has the advantage of playing nicely with browser-based clients...if that ever turns into a concern for you.
As for the queue itself...and keeping with the wcf model, you want to make sure your requests are serializable...so if need be, you can drain the queue and restart it later. In wcf, that typically means making data contracts for queue items. As an aside, I never like to send a boatload of arguments to a service, I prefer instead to make a data contract for method parameters and return types.
Data contracts are typically just simple types marked up with attributes to control serialization. The wcf methods do the magic of serializing/deserializing your types over the wire without you having to do much thinking. The client sends a whizzy and the server receives a whizzy as its parameter.
There are caveats...in particular, the deserialization doesn't call your constructor (the data contract serializer creates the object uninitialized, via FormatterServices.GetUninitializedObject, so neither constructors nor instance field initializers run) ...so you can't rely on the constructor to initialize properties. To that end, you have to remember that, for example, collection types that aren't required might need to be lazily initialized. For example:
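using System.Collections.Generic;
using System.Runtime.Serialization;

[DataContract]
public class ClientState
{
    // Static on purpose: an instance field initializer would not run during deserialization.
    private static readonly object sync = new object( );
    //--> and then somewhat later...

    // The backing field is the serialized member, so deserialization bypasses the property logic.
    [DataMember( Name = "UpdateProblems", IsRequired = false, EmitDefaultValue = false )]
    List<UpdateProblem> updateProblems;

    /// <summary>Problems encountered during previous Windows Update sessions</summary>
    public List<UpdateProblem> UpdateProblems
    {
        get
        {
            // Create the list lazily, because it may be null after deserialization.
            lock ( sync )
            {
                if ( updateProblems == null ) updateProblems = new List<UpdateProblem>( );
            }
            return updateProblems;
        }
    }
    //--> ...and so on...
}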
Something I always do is to mark the backing variable as the serializable member, so deserialization doesn't invoke the property logic. I've found this to be an important "trick".
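The constructor caveat is easy to check with a round trip through `DataContractSerializer`. This is a small sketch, not code from the original service: `Whizzy`, `ConstructorRan`, and `WhizzyDemo` are hypothetical names, and the lazy getter here skips the locking shown above for brevity.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Runtime.Serialization;

[DataContract]
public class Whizzy
{
    // The backing field is the data member, so deserialization never runs the getter.
    [DataMember(Name = "Tags", IsRequired = false, EmitDefaultValue = false)]
    private List<string> tags;

    // Not a data member: only the constructor ever sets it.
    public bool ConstructorRan { get; }

    public Whizzy()
    {
        ConstructorRan = true;
    }

    // Lazily creates the list, because the field may be null after deserialization.
    public List<string> Tags
    {
        get { return tags ?? (tags = new List<string>()); }
    }
}

public static class WhizzyDemo
{
    // Serializes to an in-memory stream and reads the object back.
    public static Whizzy RoundTrip(Whizzy input)
    {
        var serializer = new DataContractSerializer(typeof(Whizzy));
        using (var stream = new MemoryStream())
        {
            serializer.WriteObject(stream, input);
            stream.Position = 0;
            return (Whizzy)serializer.ReadObject(stream);
        }
    }
}
```

After `WhizzyDemo.RoundTrip(new Whizzy())`, `ConstructorRan` is false on the copy because the constructor was skipped, while `Tags` is still safe to use thanks to the lazy getter.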
Producer/consumer is easy to write...and easy to get wrong. Look around on StackOverflow...you'll find plenty of examples. One of the best is here. You can do it with ConcurrentQueue and avoid the locks, or just go at it with a good ol' simple Queue as in the example.
But really...you're so much better off using some kind of service bus architecture and not rolling your own queue.
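If you do roll your own for learning purposes, `BlockingCollection<T>` over a `ConcurrentQueue<T>` takes care of most of the locking. A minimal sketch, assuming a single consumer that processes jobs one at a time; `JobQueue<T>` and its members are hypothetical names, and the queue lives in memory, so it does not survive a process restart.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical single-consumer job queue built on BlockingCollection<T>.
public sealed class JobQueue<T> : IDisposable
{
    private readonly BlockingCollection<T> _jobs =
        new BlockingCollection<T>(new ConcurrentQueue<T>());
    private readonly Task _worker;

    public JobQueue(Action<T> process)
    {
        // One long-running consumer handles jobs one at a time, in FIFO order.
        _worker = Task.Factory.StartNew(() =>
        {
            foreach (T job in _jobs.GetConsumingEnumerable())
            {
                try
                {
                    process(job);
                }
                catch (Exception)
                {
                    // A real service would log this and record the job as failed.
                }
            }
        }, TaskCreationOptions.LongRunning);
    }

    // Number of jobs waiting to be processed.
    public int Pending
    {
        get { return _jobs.Count; }
    }

    public void Enqueue(T job)
    {
        _jobs.Add(job);
    }

    // Stops accepting work, then waits for the queued jobs to finish.
    public void Dispose()
    {
        _jobs.CompleteAdding();
        _worker.Wait();
        _jobs.Dispose();
    }
}
```

`Pending` gives the queue length asked about in the question; exposing it through a small status endpoint is one way to monitor it.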
Being behind a load balancer means you probably want them all calling to a service instance to manage a single queue. You could roll your own, or you could let each instance manage its own queue. That might be more processing than you want going on on your server instances...that's your call. With wcf dual http, you may need your load balancer to be configured to have client affinity...so you can have session-oriented two-way communications. signalR supports a message bus backed by Sql Server, Redis, or Azure Service Bus, so you don't have to worry about affinity with a particular server instance. It has performance implications that are discussed here.
I guess the most salient advice is...find out what's out there and try to avoid reinventing the wheel. By all means, go for it if you're in burning/learning mode and can afford the time. But, if you're getting paid, find and learn the tools that are already in the field.
Since you're using .Net on both sides, you might consider writing all your contracts (service contracts and data contracts) into a .DLL that you use on both the client and the service. The nice thing about that is it's easy to keep things in sync, and you don't have to use the (rather weak) generated data contract types that come through WSDL discovery or the service reference wizard, and you can spin up client instances using ChannelFactory<IYourServiceContract>.
I've read a good bit about threading with C#, but to be upfront I haven't done anything in production using it.
I have an application that has to process a bunch of documents and then send the documents via email. This may take 60 seconds to accomplish. I don't want the user of my web application to have to wait for these things to process to move on to other parts of the site.
On a button click the SendEmail function is called. What can I do to this code to make it so that my users can continue browsing the site without discontinuing the processing I need to do within the EmailPDFs function?
[Authorize]
public ActionResult SendEmail(decimal? id, decimal? id2)
{
    EmailPDFs(..., ..., ...);
}
Thanks so much!
This is really the kind of thing that message queues are designed to handle. Fire off a message, and a process on a potentially separate server picks it up and processes it. When it's done, it sends a message back to a queue on your server, where a process on your server picks it up and notifies you that it's complete. You then notify your user that the work is finished.
Modern message queue systems can be backed by databases (such as Mongo, MySql, or SQL Server), and are extremely robust. The great thing about them is that they allow you to move long-running or CPU-intensive processes off onto other servers so that your web site remains nice and snappy.
You could try to add multi-threading and parallelism to your web application, by using TaskFactory and all that other stuff (for many folks, this is the route they take), but it doesn't make it very easy to separate your application if you need to, and break those big, resource-hogging pieces off if it becomes necessary.
I urge you to consider a queue-based solution.
Update:
For samples and information on how to implement this type of solution, see the following:
Reliable Messaging with MSMQ and .NET on MSDN
C#: A Message Queuing Service Application on MSDN
Also, consider glancing at this StackOverflow question for a quick crash course on the bare minimum amount of code required.
A final note: MSMQ is built into certain flavors of Windows, and can be added to it through the Add/Remove Programs feature of the Control Panel. However, how you install it will depend on your specific flavor and version of Windows. A simple Google search will help you to find the appropriate instructions.
Good luck!
As part of my constant learning curve into what you can do to make apps scale better, I am currently trying to get a direction to go with queuing, i.e. job queuing or workload processing whichever phrase you like.
In the distant past I used IBM MQ/Series - it worked for a financial app but quite heavy if I remember.
I know of MSMQ, and I have also heard of quite a few others.
But first, here is my context
I have a C#/.NET back-end web app which serves data etc to a Javascript (mostly jQuery etc) front-end via AJAX calls etc. I have a situation where a certain action involves uploading some files, setting up a few record entries in the database, emailing some users etc. So of course I don't want to make this process "online"/"real-time" due to the possible time delay and I am sure the overheads on the webserver/database etc.
So given the type of "messages" that I need to queue and process, what would be (I shouldn't just say easy here I guess!) a good start point? Should I run with MSMQ and/or the SQL 2008 service broker stuff, or something like ZeroMQ - or should I simply create my own lightweight workload queue service?
I realise again without seeing the full picture it is hard to make full recommendations, however any start points gratefully received!
David
Don't try to make your own, please! There are so many things to take into account that you will spend more time on it than the rest of your project most probably.
I'd say go for MSMQ, it's very easy to use with WCF, the queues are transactional, have a retry mechanism, etc, and you benefit from the MSMQ UI to see the messages, move them and so on.
I'm currently in the process of building an ASP.NET MVC web application in c#.
I want to make sure that this application is built so that it can scale out in the future without the need for major re-factoring.
I'm quite keen on using some sort of queue to post any writes to my database to, and having a process which polls that queue asynchronously to perform the update. Once this data has been posted back to the database the client then needs to be updated with the new information. The implication here being that the process to write the data back to the database could take a short while based on business rules executing on the server.
My question is what would be the best way to handle the update from the client\browser perspective.
I'm thinking along the lines of posting the data back to the server and adding it to the queue and immediately sending a response to the client then polling at some frequency to get the updated data. Any best practices or patterns on this would be appreciated.
Also in terms of reading data from the database would you suggest using any particular techniques or would reading straight from db be sufficient given my scenario.
Update
Thought I'd post an update on this as it's been a while. We've actually ended up using Windows Azure but the solution is applicable to other platforms.
What we've ended up doing is using the Windows Azure Queue to post messages\commands to. This is a very quick process and returns immediately. We then have a worker role which processes these messages on another thread. This allows us to minimize any db writes\updates on the web role in theory allowing us to scale more easily.
We handle informing the user via emails or even silently depending on the type of data we are dealing with.
Not sure if this helps, but why don't you have an auto refresh on the page every 30 seconds, for example? This is sometimes how news feeds work on sports websites, saying the page will be updated every x minutes.
<meta http-equiv="refresh" content="30;url=index.aspx">
Why not let the user manually poll the status of the request? This is how your typical e-commerce app is implemented. When you purchase something online, the order is submitted to a queue for fulfillment. After it's submitted, the user is presented with a "Thank you for your order" page and a link where they can check the status of the order. The user can visit the link anytime to check the status, no need for an auto-poll mechanism.
Is your scenario so different from this?
Sorry, in my previous answer I might have misunderstood. I was talking of a "queue" as something stored in a SQL DB, but it seems on reading your post again you may be talking about a separate message queueing component like MSMQ or JMS?
I would never put a message queue in the front end, between a user and backend SQL DB. Queues are good for scaling across time, which is suitable between backend components, where variances in processing times are acceptable (e.g. order fulfillment)... when dealing with users, this variance is usually not acceptable.
While I don't know if I agree with the logic of why, I do know that something like jQuery is going to make your life a LOT easier. I would suggest making a RESTful web API that your client-side code consumes. For example, you want to post a new order to the system and have the client responsive? Make a post to www.mystore.com/order/create and have that return the URI of the new order, including its order number (www.mystore.com/order/1234). That response is then stored in the client code and a jQuery call is set up to poll that URI until the order is done, or to stop polling on an error.
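The answer describes polling from jQuery; the same loop from a .NET client could look roughly like this. `Poller.PollUntilAsync` is a hypothetical helper, and the `probe` delegate stands in for whatever call fetches the order status from the returned URI.

```csharp
using System;
using System.Threading.Tasks;

public static class Poller
{
    // Calls probe repeatedly until isDone accepts the result or the attempts run out.
    public static async Task<T> PollUntilAsync<T>(
        Func<Task<T>> probe,
        Func<T, bool> isDone,
        TimeSpan interval,
        int maxAttempts)
    {
        for (int attempt = 1; attempt <= maxAttempts; attempt++)
        {
            T result = await probe();
            if (isDone(result))
            {
                return result;
            }
            if (attempt < maxAttempts)
            {
                // Wait before asking again so the server is not flooded.
                await Task.Delay(interval);
            }
        }
        throw new TimeoutException("Not done after " + maxAttempts + " attempts.");
    }
}
```

An exception thrown by `probe` propagates to the caller, which matches the "stop polling on an error" behavior described above.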
For further reading check out this Wikipedia article on the concept of REST.
Additionally you might consider the Reactive Extensions for .NET and within that check out the RxJS sub-project, which has some pretty slick ways of handling the polling problem without causing you to write the polling code yourself. Fun things to play with!
Maybe you can add a "pending transactions" area to the UI. When you queue a transaction, add it to the user's "pending transactions" list.
When it completes, show that in the user's "pending transactions" list the next time they request a new page.
You can make a completed transaction stay listed until the user clicks on it, or for a predetermined length of time.
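A minimal sketch of such a list, assuming one instance per user and a time-based retention for completed items. `PendingTransactions` and its members are hypothetical names, and the class is not thread-safe as written.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical per-user list of queued transactions shown in the UI.
public sealed class PendingTransactions
{
    private sealed class Entry
    {
        public string Id;
        public DateTime? CompletedAt;
    }

    private readonly List<Entry> _entries = new List<Entry>();
    private readonly TimeSpan _retention;

    public PendingTransactions(TimeSpan retention)
    {
        _retention = retention;
    }

    // Called when a transaction is queued.
    public void Add(string id)
    {
        _entries.Add(new Entry { Id = id });
    }

    // Called when the background worker reports completion.
    public void Complete(string id, DateTime now)
    {
        Entry entry = _entries.FirstOrDefault(e => e.Id == id);
        if (entry != null)
        {
            entry.CompletedAt = now;
        }
    }

    // Called on each page request: drops completed items past the retention window,
    // then returns what should be shown.
    public IReadOnlyList<string> Visible(DateTime now)
    {
        _entries.RemoveAll(e => e.CompletedAt.HasValue && now - e.CompletedAt.Value > _retention);
        return _entries
            .Select(e => e.Id + (e.CompletedAt.HasValue ? " (completed)" : " (pending)"))
            .ToList();
    }
}
```

Passing `now` in explicitly keeps the retention rule easy to test; a "stay until clicked" variant would remove the entry in a click handler instead of by age.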