Could MSMQ resolve the performance bottleneck of our multithreaded services? - c#

We wrote a service that uses ~200 threads.
Each of the 200 threads must:
1- Download from the internet
2- Parse the raw data (HTML, XML, JSON...)
3- Store the newly created data in the DB
With ~10 threads, the elapsed time for the second operation (parsing) is 50 ms per thread.
With ~50 threads, the elapsed time for the second operation (parsing) is 80-18000 ms per thread.
So we have an idea!
We can keep downloading documents multithreaded, but use MSMQ to send the raw data to another process (a consumer), and that other process implements the second part (parsing) single-threaded.
You might ask: why don't we use the C# Queue class in the same process? Because we could not protect our "precious parsing thread" from context switches. With 200 threads in the same process, the precious thread becomes a context-switch victim.
Is using MSMQ for this requirement normal?

Yes, this is an excellent example of where MSMQ makes a lot of sense. You can offload your difficult work to a different process without affecting the performance of your current process, which clearly doesn't care about the results. Not only that, but if your new worker process goes down, the queue will preserve state, and messages (other than perhaps the one being worked on when it went down) will not be lost.
Depending on your needs and goals, I'd consider offloading the download to the other process as well - passing URLs to work on into the queue, for example. Then scaling up your system is as easy as dialing up the number of queue receivers, since queue messages are received in a thread-safe manner when implemented correctly.
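For illustration, here is a minimal sketch of that producer/consumer split using System.Messaging. The queue path and the class names are hypothetical, and error handling and transactional queues are omitted for brevity:

using System;
using System.Messaging;   // .NET Framework: add a reference to System.Messaging.dll

class RawDataProducer
{
    const string QueuePath = @".\Private$\RawDataQueue";   // hypothetical queue path

    public void Send(string rawData)
    {
        if (!MessageQueue.Exists(QueuePath))
            MessageQueue.Create(QueuePath);

        using (var queue = new MessageQueue(QueuePath))
        {
            queue.Formatter = new XmlMessageFormatter(new[] { typeof(string) });
            queue.Send(rawData);   // hand the downloaded document to the consumer process
        }
    }
}

class RawDataConsumer
{
    public void Run()
    {
        using (var queue = new MessageQueue(@".\Private$\RawDataQueue"))
        {
            queue.Formatter = new XmlMessageFormatter(new[] { typeof(string) });
            while (true)
            {
                Message message = queue.Receive();   // blocks until a message arrives
                Parse((string)message.Body);         // parsing stays single-threaded here
            }
        }
    }

    void Parse(string rawData) { /* parse HTML/XML/JSON and store to the DB */ }
}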

Yes, it is normal. And there are frameworks/libraries that help you build these kinds of solutions, providing you with more than just transport.
NServiceBus and MassTransit are examples (both can sit on top of MSMQ).

Related

C# processing received socket data via threads

I have a problem with scalability and processing, and I want to get the opinion of the Stack Overflow community.
I basically have XML data coming down a socket, and I want to process that data. For each XML line sent, processing can include writing to a text file, opening a socket to another server, and running various database queries - all of which take time.
At the minute my solution involves the following threads:
Thread 1
Accepts incoming sockets and generates child threads that handle each socket (there will only be a couple of incoming sockets from clients). When an XML line comes through (via the ReadLine() method on a StreamReader), I put the line into a Queue, which is accessible via a static method on a class. That static method contains locking logic to ensure the program is thread-safe (I could of course use a ConcurrentQueue instead of manual locking).
Threads 2-5
Constantly take XML lines from the queue and process them one at a time (database queries, file writes, etc.).
This method seems to be working, but I was curious whether there is a better way of doing things, because it feels very crude. If I move the processing that threads 2-5 do into thread 1, performance becomes extremely slow, which I expected, so I created the worker threads (2-5).
I appreciate that I could replace threads 2-5 with a thread pool, but the thread pool would still be reading from the same Queue of XML lines, so I wondered whether there is a more efficient way of processing these events than using the Queue?
A queue [1] is the right approach. But I would certainly move from manual thread control to the thread pool (so you don't need to do thread management) and let it manage the number of threads. [2]
But in the end there is only so much processing a single computer (however expensive) can do. At some point one of memory size, CPU-memory bandwidth, storage IO, network IO, … is going to be saturated. At that point, using an external queuing system (MSMQ, WebSphere MQ, RabbitMQ, …) with each task being a separate message allows many workers on many computers to process the data (the "competing consumers" pattern).
[1] I would move immediately to ConcurrentQueue: getting locking right is hard, and the less of it you need to do yourself the better.
[2] At some point you might find you need more control than the thread pool provides; that is the time to switch to a custom thread pool. But prototype and test: it is quite possible your implementation will actually be worse - see paragraph 2.
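As a minimal sketch of that move, BlockingCollection (which wraps a ConcurrentQueue by default) plus thread-pool tasks replaces both the manual locking and the hand-managed worker threads. The XmlLinePump class name and the Process body are hypothetical stand-ins:

using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

class XmlLinePump
{
    // BlockingCollection wraps a ConcurrentQueue by default, so no manual locking.
    readonly BlockingCollection<string> _lines = new BlockingCollection<string>();

    // Called by the socket-reading thread (thread 1) for each XML line.
    public void Enqueue(string xmlLine)
    {
        _lines.Add(xmlLine);
    }

    // Replaces the hand-managed worker threads 2-5 with thread-pool tasks.
    public Task[] StartWorkers(int workerCount)
    {
        var workers = new Task[workerCount];
        for (int i = 0; i < workerCount; i++)
        {
            workers[i] = Task.Run(() =>
            {
                // Blocks when the collection is empty; the loop completes
                // once CompleteAdding() has been called and it drains.
                foreach (string line in _lines.GetConsumingEnumerable())
                    Process(line);
            });
        }
        return workers;
    }

    public void Shutdown()
    {
        _lines.CompleteAdding();
    }

    void Process(string xmlLine) { /* database queries, file writes, etc. */ }
}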

Does ThreadPool.QueueUserWorkItem use ASP.NET worker-process threads?

In ASP.NET, for creating a huge PDF report, I am using ThreadPool.QueueUserWorkItem. My requirement is that the report has to be created asynchronously, and I do not want to wait for the response. I plan to achieve it with the code below:
protected void Button1_Click(object sender, EventArgs e)
{
    ThreadPool.QueueUserWorkItem(report => CreateReport());
}

public void CreateReport()
{
    // This method will take 30 seconds to finish its work
}
My question is: will ThreadPool.QueueUserWorkItem create a new thread from the ASP.NET worker process or some system thread? Is this a good approach? I may have hundreds of concurrent users accessing the web page.
The QueueUserWorkItem() method utilizes the process's ThreadPool, which automatically manages a number of worker threads. These threads are assigned a task, run it to completion, and are then returned to the ThreadPool for reuse.
Since this is hosted in ASP.NET, the ThreadPool will belong to the ASP.NET process.
The ThreadPool is a very good candidate for this type of work, as the alternative of spinning up a dedicated thread is relatively expensive. However, you should consider the following limitations of the ThreadPool as well:
The ThreadPool is used by other aspects of .NET and provides a limited number of threads. If you overuse it, there is the possibility that your tasks will be blocked waiting for others to complete. This is especially a concern in terms of scalability - however, it shouldn't dissuade you from using the ThreadPool unless you have reason to believe it will be a bottleneck.
ThreadPool tasks must be carefully managed to ensure they are returned for reuse. Unhandled exceptions or returns from a background thread will essentially "leak" that thread and prevent it from being reused. In these scenarios the ThreadPool may effectively lose its threads and cause a serious slowdown or halt of the process.
The tasks you assign to the ThreadPool should be short-lived. If your processing is intensive, then it's a better idea to provide it with a dedicated thread.
All these topics relate to the simple concept that the ThreadPool is intended for small tasks whose threads provide a cost saving to the consuming code by being reused. Your scenario sounds like a reasonable case for using the ThreadPool - however, you will want to code carefully around it and run realistic load tests to determine whether it is the best approach.
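One practical consequence of the point about unhandled exceptions: wrap the queued work in a try/catch so nothing escapes the pool thread. A sketch against the question's own code (Logger is a hypothetical logging class, not part of the original):

protected void Button1_Click(object sender, EventArgs e)
{
    ThreadPool.QueueUserWorkItem(state =>
    {
        try
        {
            CreateReport();
        }
        catch (Exception ex)
        {
            // An exception escaping a pool thread brings the process down,
            // so catch and log here. Logger is a hypothetical logging class.
            Logger.Error("Report generation failed", ex);
        }
    });
}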
The thread pool will manage the number of active threads as needed. Once a thread is done with a task it continues on the next queued task. Using the thread pool is normally a good way to handle background processing.
When running in an ASP.NET application there are a couple of things to be aware of:
ASP.NET applications can be recycled for various reasons. When this happens all queued work items are lost.
There is no simple way to signal back to the client web browser that the operation completed.
A better approach in your case might be to have a WCF service with a REST/JSON binding that is called by AJAX code on the client web page for doing the heavy work. This would give you the possibility of reporting progress and results back to the user.
In addition to what Anders Abel has already laid out, which I agree with entirely, you should consider that ASP.NET also uses the thread pool to respond to requests, so if you have long-running work like this using up a thread-pool thread, it is effectively stealing resources that ASP.NET could use to fulfill other requests.
If you were to ask me how best to architect it, I would say: dispatch the work to a WCF service using one-way messaging over the MSMQ transport. That way it is fast to dispatch, resilient to failure, and processing of the requests on the WCF side can be tightly controlled, because the messages just sit on the queue waiting to be processed. So if your server can only create 10 PDFs at a time, you would set maxConcurrentCalls for the WCF service to 10, and it will pull a maximum of 10 messages off the queue at once. Also, if your service shuts down, it will simply resume processing when it starts up again.
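As a rough sketch of that throttle (ReportService is a hypothetical service class; its netMsmqBinding endpoint and queue address would live in configuration), the limit can be applied in the host:

using System.ServiceModel;
using System.ServiceModel.Description;

// ReportService is a hypothetical WCF service with one-way operations.
var host = new ServiceHost(typeof(ReportService));

var throttle = host.Description.Behaviors.Find<ServiceThrottlingBehavior>();
if (throttle == null)
{
    throttle = new ServiceThrottlingBehavior();
    host.Description.Behaviors.Add(throttle);
}
throttle.MaxConcurrentCalls = 10;   // pull at most 10 messages off the queue at once

host.Open();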

Question regarding threading/background workers

I have a question around threading and background workers that I hope you can help with.
I plan on making an FTP application to upload a file to 50 servers. Rather than have the user wait for each upload to finish before the next one starts, I was looking at threading/background workers. Once an upload finishes, I want to report the status of the upload ("completed"/"failed") back to the UI. From my understanding, I will need to use background workers for this so I know when the task has completed. I know that with threading I can use a producer/consumer queue or a semaphore to run a given number of threads at once, but I am not quite sure how I can achieve this with background workers.
So my question is, what would be a sensible number of background workers controlling uploading to run at once and what would be the best way to queue the rest?
There is no limit on the size of the upload file so this could be quite small or up to a few MB.
Thanks in advance.
Edit - I tested out one BackgroundWorker for each server, running simultaneously. The results were faster than just a single BackgroundWorker, but I can't say I was fully comfortable running 50-plus background workers at once, and since the server count may increase in the future, I decided to stick with just one, which seems to be fast enough. I may in future look at increasing the count of workers to 2 or 3, but currently 1 seems adequate. Thanks for everyone's help.
I'd go in a completely different direction with it, tbh. Your app should take the file and store it once, responding to the client that it's got it. The file should then be propagated to the other servers. You can do this in many ways, but if you want it controlled by the same application (i.e. not done using a Windows service or the like), then a good way would be to use a message queue (either MSMQ or one of the open-source ones).
This is much easier than using a semaphore or producer-consumer queue.
Put all your tasks in a queue (it doesn't need to be a thread-safe queue, since it will only be used from the UI thread).
Loop from 1 to N, taking out a task and starting a BackgroundWorker for each (be sure to handle the empty queue, in case there were fewer than N tasks to begin with). In the RunWorkerCompleted event, update your UI, dequeue another task, and start another BackgroundWorker - see the sketch below.
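A minimal sketch of that pattern; UploadTask, Upload, and UpdateUi are hypothetical placeholders for your own types and methods:

using System.Collections.Generic;
using System.ComponentModel;

class UploadTask { /* server address, credentials, file path */ }

class UploadCoordinator
{
    readonly Queue<UploadTask> _pending;   // plain Queue: only touched on the UI thread
    const int MaxParallel = 3;             // N - tune by measuring

    public UploadCoordinator(IEnumerable<UploadTask> uploads)
    {
        _pending = new Queue<UploadTask>(uploads);
    }

    public void Start()
    {
        // Kick off the first N workers (handles the case of fewer than N tasks).
        for (int i = 0; i < MaxParallel; i++)
            StartNext();
    }

    void StartNext()
    {
        if (_pending.Count == 0) return;   // empty queue: nothing left to start
        UploadTask task = _pending.Dequeue();

        var worker = new BackgroundWorker();
        worker.DoWork += (s, e) => Upload(task);   // runs off the UI thread
        worker.RunWorkerCompleted += (s, e) =>
        {
            UpdateUi(task, e.Error == null);       // report completed/failed in the UI
            StartNext();                           // dequeue and start another worker
        };
        worker.RunWorkerAsync();
    }

    void Upload(UploadTask task) { /* FTP upload */ }
    void UpdateUi(UploadTask task, bool succeeded) { /* update status list */ }
}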
The bottleneck here is going to be your network bandwidth. If your local upstream connection is so fast that you can saturate the incoming connections on two or more remote hosts, then you'll benefit from running multiple uploads in parallel. If not, then it makes very little difference to the total upload time, since it'll be dictated by (file size * number of uploads) / (local bandwidth). In other words - if you do 20 uploads one at a time, it'll take an hour; if you do 20 uploads in parallel, it'll still take an hour. The advantage of the first approach is that if you lose connectivity you'll only need to resume/restart a single upload - whichever one was in progress when the connection was lost.
I'd therefore use a single background thread to sequentially upload the file to each server in turn. If you're using the .NET BackgroundWorker to do this, you can get it to ReportProgress at the end of each file (and you know in advance how many files are to be uploaded so you can calculate progress as a percentage), and attach some custom state to the progress update to inform the user whether the last upload succeeded or not.
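A sketch of that single-worker approach; servers, TryUpload, filePath, and statusLabel are hypothetical placeholders:

// servers, TryUpload, filePath, and statusLabel are hypothetical placeholders.
var worker = new BackgroundWorker { WorkerReportsProgress = true };
worker.DoWork += (s, e) =>
{
    for (int i = 0; i < servers.Count; i++)
    {
        bool ok = TryUpload(servers[i], filePath);   // one sequential upload at a time
        // The total count is known up front, so progress is a simple percentage;
        // the success flag rides along as custom user state.
        worker.ReportProgress((i + 1) * 100 / servers.Count, ok);
    }
};
worker.ProgressChanged += (s, e) =>
    statusLabel.Text = e.ProgressPercentage + "% done, last upload " +
                       ((bool)e.UserState ? "succeeded" : "failed");
worker.RunWorkerAsync();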
The only way to know for sure is to test and measure, but it can be different from machine to machine, mostly depending on uplink speed.
Starting 50 BackgroundWorkers at the same time is a bit on the high end, but not incredibly many. A simple approach would be to start all 50 at once and measure memory consumption and upload speed.
If the FTP servers are each much faster than the client uplink speed the most efficient would be to just upload one (or possibly two) at a time.

Run one method 1000 times in a short period of time

Let's say we are building some public service that grabs the setup of a user (which server, user, and password to perform the call with), logs into that server, and does some processing...
the process takes about 15 seconds to complete
each user has a different setup (server/user/pwd), so the process needs to run against each one
if 1000 users tell the system to run the method at 1:00 PM
How can I ensure that the method is processed within the next 15 minutes?
What should be the correct approach to this little problem?
I'm thinking that I need to do something asynchronously, and parallel processing could speed things up - maybe throttling the processes, maybe executing 100 calls every 30 seconds?
I have never done anything like this and would love your feedback on ideas and potential problems, rather than spending 100 hours of work and realizing that I took the wrong road :(
Thank you.
added
The only thing to have in consideration is that this should be a 100% web solution.
If one call to your method does not affect the result of another method call (which seems to be the case here), parallel programming seems to be the way to go.
Consider not processing this in the asp.net application directly, but rather placing such requests on a queue and having another process (windows service may be a good candidate here) pulling items off the queue for processing. The windows service can have multiple threads and can pull as many items off the queue at once as there are processing threads available. With an appropriate queuing mechanism, the windows service can run on separate hardware if needed to reach your performance goals.
You can have the original web page query the result using e.g. Ajax to provide the user feedback if that's a requirement.
UPDATE:
Microsoft has recommended a pattern for long running tasks that can be used in a hosted environment.
Well, 1000 * 15 seconds is more than 4 hours of sequential work, so you can only complete the entire batch within the 15-minute time frame if you parallelize it: 15,000 seconds of work divided by 900 seconds of wall-clock time means you need at least 17 workers running in parallel.
I would set up a queue and have a sufficient number of threads or processes pull from that queue.
You can define an in-process queue with Queue<T>, or an out-of-process one with either a database table or MSMQ.
If you don't want to write multithreaded code, you can just have a bunch of different processes running on different machines, all pulling from the same queue.
A console application can do this, but a Windows Service is definitely also an alternative.
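As a rough in-process sketch of the fan-out (UserSetup, allUserSetups, and RunJob are hypothetical placeholders; with MSMQ or a database table, the TryDequeue call would become a queue receive or a row claim):

using System.Collections.Concurrent;
using System.Threading.Tasks;

// 1000 tasks x 15 s = 15,000 s of work; finishing in 900 s needs at least
// 17 parallel workers, so 20 gives a little headroom.
var queue = new ConcurrentQueue<UserSetup>(allUserSetups);
const int workerCount = 20;

var workers = new Task[workerCount];
for (int i = 0; i < workerCount; i++)
{
    workers[i] = Task.Run(() =>
    {
        UserSetup setup;
        while (queue.TryDequeue(out setup))
            RunJob(setup);   // log into setup's server and do the ~15 s of processing
    });
}
Task.WaitAll(workers);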

Server multithreading overkill?

I'm creating a server-type application at the moment which will do the usual listening for connections from external clients and, when they connect, handle requests, etc.
At the moment, my implementation creates a pair of threads every time a client connects. One thread simply reads requests from the socket and adds them to a queue, and the second reads the requests from the queue and processes them.
I'm basically looking for opinions on whether or not you think having all of these threads is overkill, and more importantly, whether this approach is going to cause me problems.
It is important to note that most of the time these threads will be idle - I use wait handles (ManualResetEvent) in both threads. The Reader thread waits until a message is available and if so, reads it and dumps it in a queue for the Process thread. The Process thread waits until the reader signals that a message is in the queue (again, using a wait handle). Unless a particular client is really hammering the server, these threads will be sat waiting. Is this costly?
I've done a bit of testing - had 1,000 clients connected, continually nagging the server (so 2,000+ threads) - and it seemed to cope quite well.
I think your implementation is flawed. This kind of design doesn't scale because creating threads is expensive and there is a limit on how many threads can be created.
That is the reason that most implementations of this type use a thread pool. That makes it easy to put a cap on the maximum amount of threads while easily managing new connections and reusing the threads when the work is finished.
If all you are doing with your thread is putting items in a queue, then use the ThreadPool.QueueUserWorkItem method to use the default .NET thread pool.
You haven't given enough information in your question to say for definite, but perhaps you now only need one other thread, constantly running, clearing down the queue; you can use a wait handle to signal when something has been added.
Just make sure to synchronise access to your queue or things will go horribly wrong - the sketch below shows one way to wire that up.
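A minimal sketch of that arrangement; Request, RequestPump, and Process are hypothetical placeholders:

using System.Collections.Generic;
using System.Threading;

class Request { /* parsed client message */ }

class RequestPump
{
    readonly Queue<Request> _queue = new Queue<Request>();
    readonly AutoResetEvent _signal = new AutoResetEvent(false);

    // Called from the reader threads (or thread-pool work items).
    public void Enqueue(Request request)
    {
        lock (_queue) _queue.Enqueue(request);   // synchronise access to the shared queue
        _signal.Set();                           // wake the drain thread
    }

    // Body of the single long-running drain thread.
    public void DrainLoop()
    {
        while (true)
        {
            _signal.WaitOne();                   // sleep until something has been added
            while (true)
            {
                Request request;
                lock (_queue)
                {
                    if (_queue.Count == 0) break;
                    request = _queue.Dequeue();
                }
                Process(request);
            }
        }
    }

    void Process(Request request) { /* handle the client request */ }
}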
I advise using the following pattern. First you need a thread pool - built-in or custom. Have one thread that checks whether there is something available to read; if there is, it dispatches a reader thread. The reader thread puts the message into a queue, and a thread from the pool of processing threads picks it up. This minimizes the number of threads and the time spent in a waiting state.
