I have a challenge that I am encountering when needing to pull down data from a service. I'm using the following call to Parallel.ForEach:
Parallel.ForEach(idList, id => GetDetails(id));
GetDetails(id) calls a web service that takes roughly half a second and adds the resulting details to a list.
static void GetDetails(string id)
{
    var details = WebService.GetDetails(Key, Secret, id);
    AllDetails.Add(id, details);
}
The problem is, I know the service can handle more calls, but I can't figure out how to get my process to ramp them up, unless I split my list and run the process multiple times. In other words, if I open GetDetails.exe four times and split the IDs among them, I cut the run time down to 25% of the original. This tells me the capacity is there, but I'm unsure how to use it without launching the console app multiple times.
Hopefully this is a pretty simple issue for folks that are more familiar with parallelism, but in my research I've yet to solve it without running multiple instances.
A few possibilities:
There's a chance that WebService.GetDetails(...) is using some kind of mechanism to ensure that only one web request actually happens at a time.
.NET itself may be limiting the number of connections, either to a given host or in general; see this question's answers for details about these kinds of limits.
If WebService.GetDetails(...) reuses some kind of identifier like a session key, the server may be limiting the number of concurrent requests that it accepts for that one session.
It's generally a bad idea to try to solve performance issues by hammering the server with more concurrent requests. If you control the server, then you're causing your own server to do way more work than it needs to. If not, you run the risk of getting IP-banned or something for abusing their service. It's worth checking to see if the service you're accessing has some options to batch your requests or something.
As Scott Chamberlain mentioned in comments, you need to be careful with parallel processes because accessing structures like Dictionary<> from multiple threads concurrently can cause sporadic, hard-to-track-down bugs. You'd probably be better off using async requests rather than parallel threads. If you're careful about your awaits, you can have multiple requests be active concurrently while still using just a single thread at a time.
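For example, a throttled async version might look like this (a sketch only: the semaphore limit of 8, the ConcurrentDictionary, and the stubbed-out service call are my assumptions; substitute your real WebService.GetDetails client):

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

static class DetailsFetcher
{
    // Thread-safe replacement for the plain Dictionary in the question.
    public static readonly ConcurrentDictionary<string, string> AllDetails =
        new ConcurrentDictionary<string, string>();

    // Cap on concurrent requests; tune to what the service tolerates.
    static readonly SemaphoreSlim Throttle = new SemaphoreSlim(8);

    // Stand-in for WebService.GetDetails(Key, Secret, id).
    static string CallService(string id)
    {
        Thread.Sleep(100); // simulate the ~0.5 s service call
        return "details-for-" + id;
    }

    static async Task GetDetailsAsync(string id)
    {
        await Throttle.WaitAsync();
        try
        {
            // Task.Run just wraps the synchronous client here; a真 async
            // client method would be awaited directly instead.
            var details = await Task.Run(() => CallService(id));
            AllDetails[id] = details; // indexer assignment is safe under concurrency
        }
        finally
        {
            Throttle.Release();
        }
    }

    public static Task FetchAllAsync(IEnumerable<string> ids) =>
        Task.WhenAll(ids.Select(GetDetailsAsync));
}
```

With the semaphore you control exactly how many requests are in flight at once, instead of letting Parallel.ForEach decide based on CPU cores.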
Related
I have a web service (asmx) running on two different servers that sit behind a load balancer. The service is called by multiple clients across our organization; as far as I know, none of the clients use multiple threads.
I'm investigating a production issue where the data in a few static variables is being cleared or coming back null or empty, causing DB exceptions and foreign-key constraint errors.
Upon investigation, I noticed that the singleton pattern is not implemented correctly, so it's definitely not thread-safe.
I checked with my team to see whether there is any scenario where the service might run under multiple threads, but they all say no.
I don't know why, but I'm still convinced it is running on multiple threads, as all the production issues I see align with multithreaded behavior. I can also force these errors when I use Parallel.Invoke in my unit tests, but I cannot find where this happens on a day-to-day basis.
I was wondering if there is any way to go through the IIS logs, or anything on the Windows servers themselves, that might clarify whether the service, or anything inside it, uses multiple threads while running.
Is it possible that on each IIS server the service runs on its own single thread, but when it calls other classes and methods within itself, they start their own threads?
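To illustrate the kind of singleton I mean (a generic sketch, not our actual code):

```csharp
using System;

// Naive singleton: NOT thread-safe.
class NaiveSingleton
{
    static NaiveSingleton _instance;
    public static NaiveSingleton Instance
    {
        get
        {
            if (_instance == null)                // two request threads can both pass this check...
                _instance = new NaiveSingleton(); // ...and each construct its own "singleton"
            return _instance;
        }
    }
}

// Thread-safe alternative: Lazy<T> guarantees exactly one initialization.
class SafeSingleton
{
    static readonly Lazy<SafeSingleton> _lazy =
        new Lazy<SafeSingleton>(() => new SafeSingleton());
    public static SafeSingleton Instance => _lazy.Value;
}
```

With the naive version, two concurrent requests can each observe `_instance` as null and construct separate instances. Note that IIS serves each incoming request on its own thread-pool thread even if every client is single-threaded, so this race is entirely possible even when no caller is multithreaded.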
I apologize for not sharing any code yet, just given the sheer amount of code, I didn't get a chance to extract part of it to post it here, I'll need to refactor quite a few things before I can post it here.
Many thanks in advance.
Here's a problem I'm currently facing:
A WCF service exposes a large number of methods, some of which can take a longer amount of time.
The client is a WinRT (Metro-style) application (so some .NET classes are unavailable).
The timeout on the client has already been increased to 1.5 minutes.
Despite the increased timeout, some operations can take longer still (but not always).
If a timeout happens, the service continues on its merry way. The result of the requested operation is lost. Even worse, if the operation succeeds, the client won't get the data it needs, and the server won't "roll back".
All operations are already implemented using the async pattern on the client. I could use an event-based implementation but, as far as I'm aware, the timeouts will still occur then.
Increasing the timeout value is definitely an option, but it feels like a very dirty solution - it feels like pushing the problem away rather than solving it.
Implementing a WS transaction flow on the server seems impossible - I don't have access to TransactionScope class when designing WinRT apps.
WS-AtomicTransaction seems like overkill as well (it also requires a lot more setup, and I'm willing to bet the limited capabilities of WinRT applications will prove a big hassle to overcome).
So far my only idea (albeit one with a lot more moving parts, which feels like reinventing the wheel) is to create two service methods: one which begins the long-running operation and immediately returns a "task ID", then runs the operation in the background and saves its result (error or success) into a DB / storage under that task ID. The client then polls for the operation's result via the second service method every once in a while until a result (success or error) is available.
This approach also has its drawbacks:
long operations become even longer, as the client needs to poll for the results
lots of new moving parts, potentially making the whole thing less stable
What else could I possibly try to solve this issue?
PS. The actual service side is also not without limitations - it's an MS DAX service, which likely comes with its own set of potential pitfalls and traps.
EDIT:
It appears my question has some similarity to this SO question... however, given the WinRT nature of the client and the MS DAX nature of the service I'm not sure anything in the answer is really useful to me.
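Sketched in-process, the two-method polling idea looks roughly like this (all names are hypothetical; in the real thing the two methods would be WCF operations and the result store would be a DB rather than an in-memory dictionary):

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

enum OperationStatus { Pending, Succeeded, Failed }

class LongRunningOperations
{
    readonly ConcurrentDictionary<Guid, OperationStatus> _results =
        new ConcurrentDictionary<Guid, OperationStatus>();

    // First service method: start the work, return a task ID immediately.
    public Guid BeginOperation(Func<bool> work)
    {
        var taskId = Guid.NewGuid();
        _results[taskId] = OperationStatus.Pending;
        Task.Run(() =>
        {
            try
            {
                _results[taskId] = work()
                    ? OperationStatus.Succeeded
                    : OperationStatus.Failed;
            }
            catch
            {
                _results[taskId] = OperationStatus.Failed;
            }
        });
        return taskId;
    }

    // Second service method: the client polls with the task ID.
    public OperationStatus Poll(Guid taskId) =>
        _results.TryGetValue(taskId, out var status)
            ? status
            : OperationStatus.Failed;
}
```

The key property is that BeginOperation always returns well within the client timeout, and the result survives on the server even if the client disconnects and polls again later.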
I am creating a Windows application (a Windows Forms application) which calls a web service to fetch data. I have to fetch information for 200+ clients, and for each client I have to fetch all of its users' information; a client can have 50 to 100 users. So after getting the list of all clients, I call the web service in a loop, once per client, to fetch that client's users. This is a long process, currently up to 40-50 minutes for one data fetch, and I want to reduce its execution time. Please suggest an approach - multithreading or anything else - that is best suited to my application.
Thanks in advance.
If you are in control of the web service, add a method that returns all the clients at once instead of one by one, to avoid round-trips, as Michael suggested.
If not, make sure to issue as many requests at the same time as possible (not in sequence) to avoid as much latency as you can. Each request costs at least one round-trip (so at least your ping's worth of delay): if you make 150 requests in sequence, you wait roughly 150 × your ping to the server just on the network. If you split those requests into 4 batches and run the batches in parallel, you only wait about 150/4 × ping. So the more requests you make concurrently, the less you wait.
I suggest you avoid calling the service in a loop for every user to get the details; instead, do that loop on the server and return all the data in one shot. Otherwise you will suffer a lot of needless latency from the thousands of calls, over and above the server time and the data-transfer time.
This is also a pattern, called Remote Facade by Martin Fowler (building on the Gang of Four's Facade pattern):
any object that's intended to be used as a remote object needs a coarse-grained interface that minimizes the number of calls needed to get something done [...] Rather than ask for an order and its order lines individually, you need to access and update the order and order lines in a single call.
In case you're not in control of the web service, you could try to use a Parallel.ForEach loop instead of a ForEach loop to query the web service.
The MSDN has a tutorial on how to use it: http://msdn.microsoft.com/en-us/library/dd460720(v=vs.110).aspx
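A sketch of that suggestion with an explicit cap on parallelism (the FetchUsers stub and the limit of 8 are assumptions; tune the cap to what the service tolerates):

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

static class UserFetcher
{
    // Stand-in for the real web-service call (name and shape assumed).
    static IEnumerable<string> FetchUsers(int clientId)
    {
        Thread.Sleep(50); // simulate network latency
        return Enumerable.Range(0, 3).Select(u => $"client{clientId}-user{u}");
    }

    public static ICollection<string> FetchAll(IEnumerable<int> clientIds)
    {
        var allUsers = new ConcurrentBag<string>();
        Parallel.ForEach(
            clientIds,
            new ParallelOptions { MaxDegreeOfParallelism = 8 }, // cap concurrent calls
            clientId =>
            {
                foreach (var user in FetchUsers(clientId))
                    allUsers.Add(user); // ConcurrentBag is safe for concurrent writers
            });
        return allUsers;
    }
}
```

Because the work is I/O-bound, MaxDegreeOfParallelism matters here: the default heuristics are tuned for CPU-bound work and may issue fewer concurrent requests than the service could handle.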
Question:
Is there a way to force the Task Parallel Library to run multiple tasks simultaneously? Even if it means making the whole process run slower with all the added context switching on each core?
Background:
I'm fairly new to multithreading, so I could use some assistance. My initial research hasn't turned up much, but I also doubt I know what exactly to search for. Perhaps someone more experienced with multithreading can help me better understand TPL and/or find a better solution.
Our company is planning on deploying a piece of software to all users' machines that will connect to a central server a few times a day, and synchronize some files and MS Access data back to the user's machine. We would like to load-test this concept first and see how the Access DB holds up to lots of simultaneous connections.
I've been tasked with writing a .NET application that behaves like the client app (connecting & syncing with a network location), but does this on multiple threads simultaneously.
I've been getting familiar with the Task Parallel Library (TPL), as this seems like the best (newest) way to handle multithreading and to get return values back from each thread easily. However, as I understand it, the TPL decides how to run each "task" for the fastest execution possible, splitting the work among the available cores. So let's say I want to run 30 sync jobs on a 2-core machine... the TPL would run 15 on each core, sequentially. This would mean my load test would only hit the Access DB with at most 2 connections at the same time. I want to hit the database with lots of simultaneous connections.
You can force the TPL to do this by specifying TaskCreationOptions.LongRunning. According to Reflector (not according to the docs, though) this always creates a new thread. I consider relying on this safe for production use.
Normal tasks will not do, because they don't guarantee concurrent execution. Setting ThreadPool.SetMinThreads is a horrible solution (for production) because you are changing a process-global setting to solve a local problem. And even then, you are not guaranteed success.
Of course, you can also start threads. Tasks are more convenient though because of error handling. Nothing wrong with using threads for this use case.
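A sketch of the LongRunning approach (RunSyncJob is a stand-in for the real sync work; the peak-concurrency counter is only there to demonstrate that the jobs genuinely overlap):

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

class LoadTest
{
    static int _concurrent, _peak;

    // Stand-in for one sync job against the Access DB.
    static void RunSyncJob(int jobId)
    {
        var now = Interlocked.Increment(ref _concurrent);
        InterlockedMax(ref _peak, now);
        Thread.Sleep(200); // simulate the sync work
        Interlocked.Decrement(ref _concurrent);
    }

    // Atomically record the maximum observed value.
    static void InterlockedMax(ref int target, int value)
    {
        int current;
        while (value > (current = Volatile.Read(ref target)) &&
               Interlocked.CompareExchange(ref target, value, current) != current)
        { }
    }

    public static int Run(int jobs)
    {
        var tasks = Enumerable.Range(0, jobs)
            .Select(i => Task.Factory.StartNew(
                () => RunSyncJob(i),
                CancellationToken.None,
                TaskCreationOptions.LongRunning, // hint: give this task its own thread
                TaskScheduler.Default))
            .ToArray();
        Task.WaitAll(tasks);
        return _peak; // how many jobs actually overlapped
    }
}
```

Run with 30 jobs, the peak concurrency should come out near 30 rather than near the core count, which is exactly what the load test needs.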
Based on your comment, I think you should reconsider using Access in the first place. It doesn't scale well and has problems once the database grows to a certain size. Especially if this is simply served off some file share on your network.
You can try and simulate load from your single machine but I don't think that would be very representative of what you are trying to accomplish.
Have you considered using SQL Server Express? It's basically a de-tuned version of the full-blown SQL Server which might suit your needs better.
I want a certain action request to trigger a set of e-mail notifications. The user does something, and it sends the emails. However, I do not want the user to wait for the page response while the system generates and sends the e-mails. Should I use multithreading for this? Will this even work in ASP.NET MVC? I want the user to get a page response back and the system to finish sending the e-mails at its own pace. I'm not even sure if this is possible or what the code would look like. (PS: Please don't offer me an alternative solution for sending e-mails; I don't have time for that kind of reconfiguration.)
SmtpClient.SendAsync is probably a better bet than manual threading, though multi-threading will work fine with the usual caveats.
http://msdn.microsoft.com/en-us/library/x5x13z6h.aspx
As other people have pointed out, success/failure cannot be indicated deterministically when the page returns before the send is actually complete.
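A minimal sketch of the SendAsync approach (host and addresses are placeholders, and the pickup-directory delivery is only for local testing; point DeliveryMethod at your real SMTP server in production):

```csharp
using System;
using System.IO;
using System.Net.Mail;

class Notifier
{
    public static void SendInBackground(string to, string subject, string body)
    {
        var client = new SmtpClient("localhost")
        {
            // Drops .eml files to disk instead of contacting a server;
            // remove these two lines to use the configured SMTP host.
            DeliveryMethod = SmtpDeliveryMethod.SpecifiedPickupDirectory,
            PickupDirectoryLocation = Path.GetTempPath()
        };
        var message = new MailMessage("noreply@example.com", to, subject, body);
        client.SendCompleted += (sender, e) =>
        {
            // The page has already returned by now, so log rather than throw.
            if (e.Error != null)
                Console.Error.WriteLine("Email failed: " + e.Error.Message);
            message.Dispose();
            client.Dispose();
        };
        client.SendAsync(message, null); // returns immediately
    }
}
```

Note the SendCompleted handler: since no exception reaches the request thread, this callback is your only chance to record a failure somewhere you will actually see it.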
A couple of observations when using asynchronous operations:
1) They will come back to bite you in one way or another. It's a risk-versus-benefit discussion. I like the SendAsync() method I proposed because it means forms can return instantly even if the email server takes a few seconds to respond. However, because it doesn't throw an exception, you can have a broken form and not even know it.
Of course unit testing should address this initially, but what if the production configuration file gets changed to point to a broken mail server? You won't know it, you won't see it in your logs; you only discover it when someone asks you why you never responded to the form they filled out. I speak from experience on this one. There are ways around this, but in practice, async is always more work to test, debug, and maintain.
2) Threading in ASP.Net works in some situations if you understand the ThreadPool, app domain refreshes, locking, etc. I find that it is most useful for executing several operations at once to increase performance where the end result is deterministic, i.e. the application waits for all threads to complete. This way, you gain the performance benefits while still having a clear indication of results.
3) Threading/Async operations do not increase performance, only perceived performance. There may be some edge cases where that is not true (such as processor optimizations), but it's a good rule of thumb. Improperly used, threading can hurt performance or introduce instability.
The better scenario is out of process execution. For enterprise applications, I often move things out of the ASP.Net thread pool and into an execution service.
See this SO thread: Designing an asynchronous task library for ASP.NET
I know you are not looking for alternatives, but using a MessageQueue (such as MSMQ) could be a good solution for this problem in the future. Using multithreading in asp.net is normally discouraged, but in your current situation I don't see why you shouldn't. It is definitely possible, but beware of the pitfalls related to multithreading (stolen here):
• There is a runtime overhead associated with creating and destroying threads. When your application creates and destroys threads frequently, this overhead affects the overall application performance.
• Having too many threads running at the same time decreases the performance of your entire system. This is because your system is attempting to give each thread a time slot to operate inside.
• You should design your application well when you are going to use multithreading, or otherwise your application will be difficult to maintain and extend.
• You should be careful when you implement a multithreading application, because threading bugs are difficult to debug and resolve.
At the risk of violating your no-alternative-solution prime directive, I suggest that you write the email requests to a SQL Server table and use SQL Server's Database Mail feature. You could also write a Windows service that monitors the table and sends emails, logging successes and failures in another table that you view through a separate ASP.Net page.
You can probably use ThreadPool.QueueUserWorkItem.
Yes this is an appropriate time to use multi-threading.
One thing to look out for, though, is how you will express to the user when the email sending ultimately fails. Not blocking the user is a good step toward improving your UI, but it still shouldn't provide a false sense of success when the send actually failed later.
I don't know if any of the above links mentioned it, but don't forget to keep an eye on request timeout values; the queued items will still need to complete within that time period.