I am developing a TCP server that processes commands.
When a command arrives, I use Task.Run to handle the processing on a different thread so that the requesting side is not blocked. The problem is that a single requesting side can send a large number of requests, which makes the processing side create lots of threads. Based on my observation, this can cripple the server and leave it unable to process new requests.
Essentially, I need each requesting side to have a defined maximum number of requests being processed at once, with all other requests queued for later execution.
So my question is: should I be looking at a custom task scheduler?
There is a Microsoft example: https://msdn.microsoft.com/en-us/library/ee789351(v=vs.100).aspx
I haven't started digging deeper, as I wanted to confirm first that this is the road I need to take.
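For illustration only, one commonly used alternative to a custom task scheduler is a per-client SemaphoreSlim gate: each client gets a semaphore with a fixed number of slots, and any command beyond that limit waits asynchronously without holding a thread. This is a minimal sketch, not something taken from the question; the class name PerClientThrottle, the string client key, and the shape of the command delegate are all assumptions.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: caps the number of in-flight commands per client.
// Commands over the limit wait on the semaphore instead of getting a thread.
public sealed class PerClientThrottle
{
    private readonly int _maxPerClient;

    // One gate per client key. Entries are never removed in this sketch,
    // so a real server would need to clean up when a client disconnects.
    private readonly ConcurrentDictionary<string, SemaphoreSlim> _gates =
        new ConcurrentDictionary<string, SemaphoreSlim>();

    public PerClientThrottle(int maxPerClient)
    {
        _maxPerClient = maxPerClient;
    }

    public async Task<T> RunAsync<T>(string clientId, Func<Task<T>> command)
    {
        SemaphoreSlim gate = _gates.GetOrAdd(
            clientId, _ => new SemaphoreSlim(_maxPerClient, _maxPerClient));

        // Waits asynchronously until this client has a free slot.
        await gate.WaitAsync().ConfigureAwait(false);
        try
        {
            return await command().ConfigureAwait(false);
        }
        finally
        {
            // Always free the slot, even if the command throws.
            gate.Release();
        }
    }
}
```

A server would create one PerClientThrottle (for example with a limit of 4) and wrap each incoming command in RunAsync, keyed by whatever identifies the requesting side. Whether this is enough, or whether a custom scheduler is still needed, depends on the workload.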
Related
I need some guidance on a project we are developing. When triggered, my program needs to contact 1,000 devices by TCP and exchange about 200 bytes of information. All the clients are wireless on a private network. The majority of the time the program will be sitting idle, but then needs to send these messages as quickly as possible. I have come up with two possible methods:
Method 1
Use thread pooling to establish a number of worker threads and have these threads process their way through the 1,000 conversations. One thread handles one conversation until completion. The number of threads in the thread pool would then be tuned for best use of resources.
Method 2
A number of threads would be used to handle multiple conversations per thread. For example, a thread would open 10 socket connections, start the conversations, and then use asynchronous methods to wait for responses. As each conversation completes, a new device would be contacted.
Method 2 looks like it would be more effective, in that operations wouldn't have to wait while a device responds. It would also save on the overhead of starting and stopping all those threads.
Am I headed in the right direction here? What am I missing or not considering?
There is a well-established way to deal with this problem. Simply use async IO. There is no need to maintain any threads at all. Async IO uses no threads while the IO is in progress.
Thanks to await, doing this is quite easy.
The select/poll model is obsolete in .NET.
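As a rough illustration of the async IO approach, the sketch below contacts a list of devices with TcpClient and await, and uses a SemaphoreSlim to cap how many conversations are open at once. Everything specific here is an assumption rather than part of the question: the names DevicePoller, ExchangeAsync and ExchangeAllAsync, the 256-byte buffer, and a protocol in which one write is answered by a reply that fits in a single read.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Net.Sockets;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper for a one-request, one-reply exchange with many devices.
public static class DevicePoller
{
    // Connects, sends the request, and reads one reply.
    // Assumption: the reply arrives in a single read; a real protocol
    // would loop until a complete message has been received.
    public static async Task<byte[]> ExchangeAsync(IPEndPoint device, byte[] request)
    {
        using (var client = new TcpClient(device.AddressFamily))
        {
            await client.ConnectAsync(device.Address, device.Port).ConfigureAwait(false);
            NetworkStream stream = client.GetStream();
            await stream.WriteAsync(request, 0, request.Length).ConfigureAwait(false);

            var buffer = new byte[256];
            int read = await stream.ReadAsync(buffer, 0, buffer.Length).ConfigureAwait(false);
            return buffer.Take(read).ToArray();
        }
    }

    // Runs all exchanges, with at most maxConcurrent in flight at a time.
    // No thread is blocked while a connect, write, or read is pending.
    public static async Task<byte[][]> ExchangeAllAsync(
        IReadOnlyList<IPEndPoint> devices, byte[] request, int maxConcurrent)
    {
        using (var gate = new SemaphoreSlim(maxConcurrent))
        {
            IEnumerable<Task<byte[]>> exchanges = devices.Select(async device =>
            {
                await gate.WaitAsync().ConfigureAwait(false);
                try
                {
                    return await ExchangeAsync(device, request).ConfigureAwait(false);
                }
                finally
                {
                    gate.Release();
                }
            });
            return await Task.WhenAll(exchanges).ConfigureAwait(false);
        }
    }
}
```

Replies come back in the same order as the devices list. A failed exchange makes the combined task fault, so production code would probably catch exceptions per device and add timeouts; the right value for maxConcurrent on a wireless network is something to measure rather than guess.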
We wrote a service that uses ~200 threads.
Each of the 200 threads must:
1- Download from the internet
2- Parse the raw data (HTML, XML, JSON...)
3- Store the newly created data in the database
With ~10 threads, the elapsed time for the second operation (parsing) is 50 ms per thread.
With ~50 threads, the elapsed time for the second operation (parsing) is 80-18000 ms per thread.
So we have an idea!
We can keep downloading documents on multiple threads, but use MSMQ to send the raw data to another process (the consumer). That process would implement the second part (parsing) on a single thread.
You might ask why we don't use the C# Queue class in the same process. We could not protect our "precious parsing thread" from thread context switches: if there are 200 threads in the same process, the precious thread will be a context-switch victim.
Is using MSMQ for this requirement normal?
Yes, this is an excellent example of where MSMQ makes a lot of sense. You can offload your difficult work to a different process to handle without affecting the performance of your current process which clearly doesn't care about the results. Not only that, but if your new worker process goes down, the queue will preserve state and messages (other than maybe the one being worked on when it went down) will not be lost.
Depending on your needs and goals I'd consider offloading the download to the other process as well - passing URLs to work on to the queue for example. Then, scaling up your system is as easy as dialing up the queue receivers, since queue messages are received in a thread safe manner when implemented correctly.
Yes, it is normal. There are also frameworks and libraries that help you build this kind of solution and provide more than just the transport.
NServiceBus and MassTransit are examples (both can sit on top of MSMQ).
In my server application I want to process lots of tasks coming from clients. The client application submits tasks for processing, and processing each task requires calling a web service for pre-processing before the actual processing happens.
It was suggested that I have a queue into which I put all the tasks the server receives. A thread then picks up a task from the queue and calls the web service. There would be maybe 40 threads doing this, so that if one blocks on the web service, the others can still make calls, picking up items from the queue. After a thread receives the response from the web service, it puts the pre-processed item on a second queue, from which another thread takes tasks for processing. There would be 1 thread for this queue (to be scaled further per processor, so probably 4 or more threads on a 4-core machine).
I believe this can be accomplished more efficiently without 40 predefined threads doing web service calls, and maybe with just 1 queue. I think there are multiple options for doing this more efficiently in .NET. Any suggestions?
It's probably a broader question of how best to implement such a system rather than a .NET-specific one.
I think you should learn about the async/await construct available in .NET 4.5. It is hard to say whether it meets all your requirements, but you should check it out.
I recommend looking into TPL Dataflow, a library that allows you to define a "pipeline" or "mesh" for data processing and then you put the data through it. TPL Dataflow works very well with both asynchronous (e.g., web request) blocks and synchronous (e.g., processing) blocks and has lots of options for parallelism.
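To make the pipeline idea concrete, here is a minimal sketch using two TPL Dataflow blocks: a TransformBlock for the asynchronous pre-processing call and an ActionBlock for the synchronous processing, each with its own MaxDegreeOfParallelism. The class name TaskPipeline, the delegate parameters, and the two limits are assumptions made for the example; on .NET Framework the library comes from the System.Threading.Tasks.Dataflow NuGet package.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// Hypothetical two-stage pipeline: async pre-processing, then CPU-bound processing.
public static class TaskPipeline
{
    public static async Task<List<TOut>> RunAsync<TIn, TMid, TOut>(
        IEnumerable<TIn> items,
        Func<TIn, Task<TMid>> preProcessAsync, // stand-in for the web service call
        Func<TMid, TOut> process,              // stand-in for the actual processing
        int maxWebCalls,
        int maxProcessors)
    {
        // Stage 1: up to maxWebCalls awaits in flight; no thread is held while waiting.
        var preProcess = new TransformBlock<TIn, TMid>(
            preProcessAsync,
            new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = maxWebCalls });

        // Stage 2: up to maxProcessors items processed in parallel.
        var results = new ConcurrentBag<TOut>();
        var processBlock = new ActionBlock<TMid>(
            item => results.Add(process(item)),
            new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = maxProcessors });

        // Completion (and faults) flow from stage 1 to stage 2.
        preProcess.LinkTo(processBlock, new DataflowLinkOptions { PropagateCompletion = true });

        foreach (TIn item in items)
        {
            await preProcess.SendAsync(item).ConfigureAwait(false);
        }

        preProcess.Complete();
        await processBlock.Completion.ConfigureAwait(false);

        // ConcurrentBag does not preserve order, so results are unordered.
        return new List<TOut>(results);
    }
}
```

With this shape, the "40 threads" from the question become a limit of 40 concurrent asynchronous calls, and the second-stage limit can be set to the number of cores. A long-running server would keep the blocks alive and post tasks as they arrive instead of completing the pipeline after one batch.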
If for some reason you are not yet on version 4.5 of the framework, look into one-way WCF calls as a kind of "fire and forget" method.
I have a problem with scalability and processing, and I want to get the opinion of the Stack Overflow community.
I basically have XML data coming down a socket and I want to process that data. For each XML line sent processing can include writing to a text file, opening a socket to another server and using various database queries; all of which take time.
At the minute my solution involves the following threads:
Thread 1
Accepts incoming sockets and generates child threads that handle each socket (there will only be a couple of incoming sockets from clients). When an XML line comes through (ReadLine() method on StreamReader), I put this line into a Queue, which is accessible via a static method on a class. This static method contains locking logic to ensure that the program is thread-safe (I could of course use ConcurrentQueue for this instead of manual locking).
Threads 2-5
Constantly take XML lines from the queue and process them one at a time (database queries, file writes, etc.).
This method seems to be working, but I was curious whether there is a better way of doing things, because this seems very crude. If I move the processing that threads 2-5 do into thread 1, the result is extremely slow performance, which I expected, so I created my worker threads (2-5).
I appreciate I could replace threads 2-5 with a thread pool, but the thread pool would still be reading from the same Queue of XML lines, so I wondered whether there is a more efficient way of processing these events instead of using the Queue.
A queue [1] is the right approach. But I would certainly move from manual thread control to the thread pool (so that I don't need to do thread management) and let it manage the number of threads. [2]
But in the end there is only so much processing a single computer (however expensive) can do. At some point one of memory size, CPU-memory bandwidth, storage IO, network IO, … is going to be saturated. At that point using an external queuing system (MSMQ, WebSphere MQ, RabbitMQ, …) with each task being a separate message allows many workers on many computers to process the data ("competing consumers" pattern).
[1] I would move immediately to ConcurrentQueue: getting locking right is hard, and the less of it you need to do yourself the better.
[2] At some point you might find you need more control than the thread pool provides; that is the time to switch to a custom thread pool. But prototype and test: it is quite possible your implementation will actually be worse. See paragraph 2.
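A minimal sketch of the queue-plus-pool arrangement described above, assuming a BlockingCollection (which wraps a ConcurrentQueue by default) and a handful of Task.Run consumers. The names XmlLineWorkers and handleLine are hypothetical, and the handler stands in for the database queries and file writes from the question.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical consumer side: thread pool tasks drain a shared, thread-safe queue.
public static class XmlLineWorkers
{
    // Starts workerCount consumers. The returned task completes once
    // queue.CompleteAdding() has been called and the queue is empty.
    public static Task StartAsync(
        BlockingCollection<string> queue, int workerCount, Action<string> handleLine)
    {
        Task[] workers = Enumerable.Range(0, workerCount)
            .Select(_ => Task.Run(() =>
            {
                // GetConsumingEnumerable blocks while the queue is empty,
                // so each consumer occupies one pool thread while it waits.
                foreach (string line in queue.GetConsumingEnumerable())
                {
                    handleLine(line);
                }
            }))
            .ToArray();

        return Task.WhenAll(workers);
    }
}
```

The socket-reading threads would call queue.Add(line) for each XML line and queue.CompleteAdding() on shutdown. An exception thrown by handleLine ends that consumer and faults the returned task, so real code would likely catch and log inside the loop.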
In ASP.NET, I am using ThreadPool.QueueUserWorkItem to create a huge PDF report. My requirement is that the report be created asynchronously, and I do not want to wait for the response. I plan to achieve this with the code below:
protected void Button1_Click(object sender, EventArgs e)
{
    // Queue the report on a thread pool thread and return immediately.
    ThreadPool.QueueUserWorkItem(state => CreateReport());
}

public void CreateReport()
{
    // This method takes about 30 seconds to finish its work.
}
My question is: will ThreadPool.QueueUserWorkItem create a new thread from the ASP.NET worker process, or use some system thread? Is this a good approach? I may have 100s of concurrent users accessing the web page.
The QueueUserWorkItem() method uses the process's ThreadPool, which automatically manages a number of worker threads. Each thread is assigned a task, runs it to completion, and is then returned to the ThreadPool for reuse.
Since this is hosted in ASP.NET the ThreadPool will belong to the ASP.NET process.
The ThreadPool is a very good candidate for this type of work, as the alternative of spinning up a dedicated thread is relatively expensive. However, you should consider the following limitations of the ThreadPool as well:
The ThreadPool is used by other aspects of .NET, and provides a limited number of threads. If you overuse it there is the possibility your tasks will be blocked waiting for others to complete. This is especially a concern in terms of scalability--however it shouldn't dissuade you from using the ThreadPool unless you have reason to believe it will be a bottleneck.
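The ThreadPool tasks must be carefully managed to ensure their threads are returned for reuse. A work item that blocks indefinitely ties up its thread and prevents it from being reused, so the ThreadPool may effectively lose its threads and cause a serious slowdown or halt of the process. Work items must also handle their own exceptions: since .NET Framework 2.0, an unhandled exception on a thread pool thread terminates the process, which in ASP.NET takes down the worker process along with every request in flight.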
The tasks you assign to the ThreadPool should be short-lived. If your processing is intensive then it's a better idea to provide it with a dedicated thread.
All these topics relate to the simple concept that the ThreadPool is intended for small tasks, and for its threads to provide a cost saving to the consuming code by being reused. Your scenario sounds like a reasonable case for using the ThreadPool. However, you will want to code carefully around it and run realistic load tests to determine whether it is the best approach.
The thread pool will manage the number of active threads as needed. Once a thread is done with a task it continues on the next queued task. Using the thread pool is normally a good way to handle background processing.
When running in an ASP.NET application there are a couple of things to be aware of:
ASP.NET applications can be recycled for various reasons. When this happens all queued work items are lost.
There is no simple way to signal back to the client web browser that the operation completed.
A better approach in your case might be to have a WCF service with a REST/JSON binding that is called by AJAX code on the client web page to do the heavy work. This would give you the possibility to report progress and results back to the user.
In addition to what Anders Abel has already laid out, which I agree with entirely, you should consider that ASP.NET also uses the thread pool to respond to requests, so if you have long running work like this using up a thread pool thread, it is technically stealing from the resources which ASP.NET is able to use to fulfill other requests anyway.
If you were to ask me how best to architect it I would say you dispatch the work to a WCF service using one way messaging over the MSMQ transport. That way it is fast to dispatch, resilient to failure and processing of the requests on the WCF side can be more tightly controlled because the messages will just sit on the queue waiting to be processed. So if your server can only create 10 PDFs at a time you would just set the maxConcurrentCalls for the WCF service to 10 and it will only pull a maximum of 10 messages off the queue at once. Also, if your service shuts down, when it starts up it will just begin processing again.