I am creating a Windows application (a Windows Forms application) which calls a web service to fetch data. I have to fetch information for 200+ clients, and for each client I have to fetch all of its users' information. A client can have 50 to 100 users. So, after getting the full client list, I call the web service in a loop, once per client, to fetch the user listing. This is a long process, currently up to 40-50 minutes for one data fetch, and I want to reduce its execution time. Please suggest which approach would help reduce it: multithreading, or anything else best suited to my application.
Thanks in advance.
If you are in control of the web service, have a method that returns all the clients at once instead of one by one, to avoid round trips, as Michael suggested.
If not, make sure to issue as many requests at the same time (not in sequence) as possible, to avoid as much latency as you can. Each request costs at least one round trip (so at least your ping's worth of delay); if you make 150 requests in sequence, you'll spend your ping to the server × 150 just waiting on the network. If you split those requests into 4 bunches and run the bunches in parallel, you'll only wait about (150 / 4) × ping. So the more requests you make concurrently, the less you wait.
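A minimal sketch of that idea, assuming .NET 4.5's Task/async support is available; FetchUsersAsync, Client and User are placeholders for whatever your service proxy actually exposes (needs System.Linq, System.Threading, System.Threading.Tasks):

static async Task<User[][]> FetchAllUsersAsync(IEnumerable<Client> clients)
{
    var throttle = new SemaphoreSlim(4); // at most 4 requests in flight
    var tasks = clients.Select(async client =>
    {
        await throttle.WaitAsync();
        try
        {
            return await FetchUsersAsync(client.Id); // one round trip per client
        }
        finally
        {
            throttle.Release();
        }
    });
    return await Task.WhenAll(tasks); // total wait is roughly (calls / 4) x ping
}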
I suggest you avoid calling the service in a loop for every client/user to get the details; instead, do that loop on the server and return all the data in one shot. Otherwise you will suffer a lot of useless latency caused by the thousands of calls, not just the server time or data-transfer time.
This is also a known pattern, called Remote Facade, explained by Martin Fowler (and related to the Facade pattern from the Gang of Four):
any object that's intended to be used as a remote object needs a coarse-grained interface that minimizes the number of calls needed to get something done [...] Rather than ask for an order and its order lines individually, you need to access and update the order and order lines in a single call.
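For illustration only, a hedged sketch of what that coarse-grained shape could look like for the clients/users case above (all names are made up):

// Fine-grained: one call per client means hundreds of round trips.
public interface IClientServiceFineGrained
{
    IList<User> GetUsersForClient(int clientId);
}

// Coarse-grained (Remote Facade): one call, one round trip, all the data.
public interface IClientServiceFacade
{
    IList<ClientWithUsers> GetAllClientsWithUsers();
}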
In case you're not in control of the web service, you could try using a Parallel.ForEach loop instead of a plain foreach loop to query the web service.
The MSDN has a tutorial on how to use it: http://msdn.microsoft.com/en-us/library/dd460720(v=vs.110).aspx
I have a challenge I'm encountering when needing to pull down data from a service. I'm using the following call to Parallel.ForEach:
Parallel.ForEach(idList, id => GetDetails(id));
GetDetails(id) calls a web service that takes roughly half a second and adds the resulting details to a shared collection.
static void GetDetails(string id)
{
    var details = WebService.GetDetails(Key, Secret, id);
    AllDetails.Add(id, details);
}
The problem is, I know the service can handle more calls, but I can't seem to figure out how to get my process to issue more calls, UNLESS I split my list and run the process multiple times. In other words, if I open this GetDetails.exe app 4 times and split the IDs between the instances, I cut the run time down to 25% of the original. This tells me the capacity is there; I'm just unsure how to use it without launching the console app multiple times.
Hopefully this is a pretty simple issue for folks that are more familiar with parallelism, but in my research I've yet to solve it without running multiple instances.
A few possibilities:
There's a chance that WebService.GetDetails(...) is using some kind of mechanism to ensure that only one web request actually happens at a time.
.NET itself may be limiting the number of connections, either to a given host or in general; see this question's answers for details about these kinds of problems, and the snippet after this list for the most common fix.
If WebService.GetDetails(...) reuses some kind of identifier like a session key, the server may be limiting the number of concurrent requests that it accepts for that one session.
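On that second point, the usual suspect in client apps is .NET's default cap of two concurrent outgoing connections per host. Raising it is one line, applied once at startup (the value 20 is an arbitrary example):

using System.Net;

// Default is 2 concurrent connections per host for client apps.
ServicePointManager.DefaultConnectionLimit = 20;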
It's generally a bad idea to try to solve performance issues by hammering the server with more concurrent requests. If you control the server, then you're causing your own server to do way more work than it needs to. If not, you run the risk of getting IP-banned or something for abusing their service. It's worth checking to see if the service you're accessing has some options to batch your requests or something.
As Scott Chamberlain mentioned in comments, you need to be careful with parallel processes because accessing structures like Dictionary<> from multiple threads concurrently can cause sporadic, hard-to-track-down bugs. You'd probably be better off using async requests rather than parallel threads. If you're careful about your awaits, you can have multiple requests be active concurrently while still using just a single thread at a time.
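A hedged sketch of that async shape, assuming the proxy offers (or can be wrapped in) a Task-returning GetDetailsAsync; the dictionary is built only after all calls complete, so no shared mutable state is touched from multiple threads (needs System.Collections.Generic, System.Linq, System.Threading.Tasks):

static async Task<Dictionary<string, Details>> GetAllDetailsAsync(IEnumerable<string> idList)
{
    // Start all requests; they overlap on the network, not on threads.
    var tasks = idList.Select(async id =>
        new { Id = id, Details = await WebService.GetDetailsAsync(Key, Secret, id) });

    var results = await Task.WhenAll(tasks);

    // Collected on a single logical flow - no concurrent Add, no races.
    return results.ToDictionary(r => r.Id, r => r.Details);
}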
I have an NHibernate MVC application that is using ReadCommitted Isolation.
On the site there is a certain process that the user can initiate which, depending on the input, may take several minutes. Because the session is per request, it stays open that entire time.
But while that runs, no other user can access the site (they can try, but their requests won't go through until the long-running process is finished).
What's more, I also need a console app that performs this same long-running function while connecting to the same database. It is causing the same issue.
I'm not sure what part of my setup is wrong, any feedback would be appreciated.
NHibernate is set up with fluent configuration and StructureMap.
Isolation level is set as ReadCommitted.
The session factory lifecycle is HybridLifeCycle (which on the web should be session-per-request, but in the console app would be ThreadLocal).
It sounds like your requests are waiting on database locks. Your options are really:
Break the long running process into a series of smaller transactions.
Use ReadUncommitted isolation level most of the time (this is appropriate in a lot of use cases).
Judicious use of Snapshot isolation level (Assuming you're using MS-SQL 2005 or later).
(N.B. I'm assuming the long-running function does a lot of reads/writes and the requests being blocked are primarily doing reads.)
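For options 2 and 3, NHibernate accepts a System.Data.IsolationLevel per transaction. A minimal sketch (Order is a placeholder entity; Snapshot additionally requires ALLOW_SNAPSHOT_ISOLATION to be enabled on the SQL Server database):

using (var session = sessionFactory.OpenSession())
using (var tx = session.BeginTransaction(System.Data.IsolationLevel.Snapshot))
{
    // Reads here see a consistent snapshot and don't block behind
    // the long-running writer's locks.
    var orders = session.QueryOver<Order>().List();
    tx.Commit();
}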
As has been suggested, breaking your process down into multiple smaller transactions will probably be the solution.
I would suggest looking at something like Rhino Service Bus or NServiceBus (my preference is Rhino Service Bus - I find it much simpler to work with personally). What that allows you to do is separate the functionality down into small chunks, but maintain the transactional nature. Essentially with a service bus, you send a message to initiate a piece of work, the piece of work will be enlisted in a distributed transaction along with receiving the message, so if something goes wrong, the message will not just disappear, leaving your system in a potentially inconsistent state.
Depending on what you need to do, you could send an initial message to start the processing, and then after each step send a new message to initiate the next step. This can really help break the transactions down into much smaller pieces of work (and simplify the code). The service buses I mentioned (there is also MassTransit) also have things like retries and error handling built in, so if something goes wrong the message ends up in an error queue and you can investigate what went wrong, hopefully fix it, and reprocess the message, ensuring your system remains consistent.
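A schematic sketch of that message-per-step idea; IBus here is a stand-in interface, not any specific library's API, since Rhino Service Bus, NServiceBus and MassTransit all differ in detail:

// Stand-in for the bus; real libraries provide their own equivalent.
public interface IBus { void Send(object message); }

public class StartImport { public int BatchId; }
public class BatchValidated { public int BatchId; }

// Each handler is one small transactional step; finishing a step just
// means sending the message that triggers the next one. If a step throws,
// the message is retried and eventually lands in an error queue.
public class StartImportHandler
{
    private readonly IBus bus;
    public StartImportHandler(IBus bus) { this.bus = bus; }

    public void Handle(StartImport message)
    {
        Validate(message.BatchId);                                   // small unit of work
        bus.Send(new BatchValidated { BatchId = message.BatchId });  // kick off step 2
    }

    private void Validate(int batchId) { /* ... */ }
}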
Of course whether this is necessary depends on the requirements of your system :)
Another, more complex solution would be:
You build a background robot application which runs on one of your machines
this background worker robot can receive "worker jobs" (the ones initiated by users)
then, the robot processes the jobs step by step in the background
Pitfalls are:
- you have to program this robot to be very stable
- you need to monitor the robot somehow
Sure, this involves more work; on the flip side, you will have the option to integrate more job types, enabling your system to process different things in the background.
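A bare-bones sketch of such a robot's main loop; stopRequested and jobStore stand in for whatever shutdown signal and persistent job queue/table you use:

// Poll for jobs and process them step by step, persisting progress so a
// crash can resume where it left off. All names here are placeholders.
while (!stopRequested)
{
    var job = jobStore.DequeueNextPendingJob();
    if (job == null)
    {
        Thread.Sleep(TimeSpan.FromSeconds(5)); // nothing to do, back off
        continue;
    }
    try
    {
        job.ExecuteNextStep();        // one small step of the long process
        jobStore.SaveProgress(job);   // stability: progress survives a crash
    }
    catch (Exception ex)
    {
        jobStore.MarkFailed(job, ex); // "watch the robot": log and alert here
    }
}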
I think the design of your application / SQL statements has a problem. Unless you are Facebook, I don't think any process should take this much time; it is better to review your design and find where the bottleneck is, instead of trying to keep this long-running process going.
Also, an ORM is sometimes not a good fit for every scenario. Did you try using stored procedures?
I provide a web service for my clients which allows them to add a record to the production database.
I had an incident lately in which a client's programmer called the service in a loop, invoking my service thousands of times.
My question is what would be the best way to prevent such a thing.
I thought of some ways:
1. At the entrance to the service, I can update counters for each client that calls the service, but that looks too clumsy.
2. Check the IP of the client who called the service, raise a flag each time they call, and then reset the flag every hour.
I'm positive that there are better ways and would appreciate any suggestions.
Thanks, David
First you need to have a look at the legal aspects of your situation: Does the contract with your client allow you to restrict the client's access?
This question is out of the scope of SO, but you must find a way to answer it. Because if you are legally bound to process all requests, then there is no way around it. Also, the legal analysis of your situation may already include some limitations, in which way you may restrict the access. That in turn will have an impact on your solution.
All those issues aside, and just focusing on the technical aspects: do you use some sort of user authentication? (If not, why not?) If you do, you can implement whatever scheme you decide on a per-user basis, which I think would be the cleanest solution (you don't need to rely on IP addresses, which is a somewhat ugly workaround).
Once you have your way of identifying a single user, you can implement several restrictions. The first ones that come to my mind are these:
Synchronous processing
Only start processing a request after all previous requests have been processed. This may even be implemented with nothing more than a lock statement in your main processing method. If you go for this kind of approach, be aware that requests will queue up behind one another and callers may wait a long time for a response.
Time delay between processing requests
This requires that a specific amount of time pass after one processing call before the next call is allowed. The easiest solution is to store a LastProcessed timestamp in the user's session. If you go for this approach, you need to start thinking about how to respond when a new request comes in before it is allowed to be processed: do you send an error message to the caller? I think you should...
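A sketch of that check, assuming ASP.NET session state is available to the service; the one-minute window and the exception type are illustrative choices, not requirements:

var last = HttpContext.Current.Session["LastProcessed"] as DateTime?;
if (last.HasValue && DateTime.UtcNow - last.Value < TimeSpan.FromMinutes(1))
{
    // Tell the caller explicitly instead of silently dropping the request.
    throw new InvalidOperationException("Too many requests; please retry later.");
}
HttpContext.Current.Session["LastProcessed"] = DateTime.UtcNow;
// ... process the request ...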
EDIT
The lock statement, briefly explained:
It is intended to be used for thread-safe operations. The syntax is as follows:
lock (lockObject)
{
    // do stuff
}
The lockObject needs to be an object, usually a private member of the current class. The effect is that if you have two threads that both want to execute this code, the first to arrive at the lock statement locks the lockObject. While it does its stuff, the second thread cannot acquire a lock, since the object is already locked. So it just sits there and waits until the first thread releases the lock when it exits the block at the closing brace. Only then can the second thread lock the lockObject and do its stuff, blocking the lockObject for any third thread coming along, until it has exited the block as well.
Careful, the whole issue of thread safety is far from trivial. (One could say that the only thing trivial about it is the many trivial errors a programmer can make ;-)
See here for an introduction to threading in C#.
One way is to store a counter in the session and use it to prevent too many calls per unit of time.
But if your user tries to get around that by sending a different cookie each time*, then you need to make a custom table that acts like the session but connects the user with the IP, not with the cookie.
One more point: if you block based on the IP alone, you may block an entire company coming out of a single proxy. So the final, correct, but more complicated way is to have both the IP and the cookie connected with the user, and to know whether the browser accepts cookies or not; if not, you block by IP. The difficult part here is knowing about the cookie: on every call you can force the client to send a valid cookie that is connected with an existing session, and if it doesn't, the browser does not have cookies.
[ * ] The cookies are connected with the session.
[ * ] By keeping the counters in a new table, disconnected from the session, you can also avoid the session lock.
In the past I used code written to handle DoS attacks, but none of it worked well when you have many pools and a complex application, so now I use a custom table as described. These are the two pieces of code I have tested and used:
Dos attacks in your web app
Block Dos attacks easily on asp.net
How to track the clicks per second saved in a table: here is the part of my SQL that calculates the clicks per second. One of the tricks is that I keep adding clicks and only calculate the average once 6 or more seconds have passed since the last check. This is a snippet from the calculation, as an idea:
SET @cDos_TotalCalls = @cDos_TotalCalls + @NewCallsCounter
SET @cMilSecDif = ABS(DATEDIFF(millisecond, @FirstDate, @UtpNow))

-- leave at least a 6-second window before making the calculation
IF @cMilSecDif > 6000
    SET @cClickPerSeconds = (@cDos_TotalCalls * 1000 / @cMilSecDif)
ELSE
    SET @cClickPerSeconds = 0

IF @cMilSecDif > 30000
    UPDATE ATMP_LiveUserInfo
       SET cDos_TotalCalls = @NewCallsCounter,
           cDos_TotalCallsChecksOn = @UtpNow
     WHERE cLiveUsersID = @cLiveUsersID
ELSE IF @cMilSecDif > 16000
    UPDATE ATMP_LiveUserInfo
       SET cDos_TotalCalls = (cDos_TotalCalls / 2),
           cDos_TotalCallsChecksOn = DATEADD(millisecond, @cMilSecDif / 2, cDos_TotalCallsChecksOn)
     WHERE cLiveUsersID = @cLiveUsersID
Get the user's IP and insert it into the cache for an hour after they use the web service; this is cached on the server:
string ip = HttpContext.Current.Request.UserHostAddress;
HttpContext.Current.Cache.Insert(ip, true, null,
    DateTime.Now.AddHours(1), System.Web.Caching.Cache.NoSlidingExpiration);
When you need to check whether that user called within the last hour:
if (HttpContext.Current.Cache[ip] != null)
{
    // means the user called within the last hour
}
My team is writing a windows service which needs to poll data from a 3rd party system (the 3rd party system provides a web service which we use).
The process will look something like this:
1. Call the 3rd party WS
2. Save the received raw data to our DB
3. Process the raw data
4. Save the processed data to our DB
5. Repeat
The team agrees that we actually have 2 different logical operations:
1. Getting and saving the raw data
2. Processing the raw data and saving the results
We are trying to decide which of the following design options is better:
Option 1: Perform both operations in the same Windows service, each operation on its own thread
Option 2: Perform the first operation in the Windows service, and make an async/one-way call to a WCF service for the second operation
In your opinion, which option is better?
If you have another alternative you think is better, please share it.
Thanks.
It depends.
Given that you have an apparently sequential process, why use separate threads to read and then process the data? The simplest approach would be a single thread that loops around: read the data, process it, and presumably wait at some point so you aren't rate-limited by the third party.
If however the processing takes a long time you might want to split the work between a single polling thread and a set of workers that process the data.
The simplest option is usually the right one for your initial implementation. Adding threads and WCF service calls before you need them is rarely the right thing to do.
To give a better answer you really need to supply more information: does the third party service limit how many calls you can make at once or how quickly you can make them, how long does the processing take, how often do you need to poll, ...
Based on your comment, I would say you should have a thread that polls the 3rd party service once a second and starts two tasks.
Task 1 would store the raw data to database.
Task 2 would process the raw data and store the result in database.
If the polling thread retrieves 1000 entries it should poll again without delay.
You can either use System.Threading.ThreadPool or System.Threading.Tasks.Task.
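A hedged sketch of that loop; running, thirdPartyClient, repository and Process are placeholders for your own types:

while (running)
{
    var batch = thirdPartyClient.Poll();                       // step 1

    var saveRaw = Task.Run(() => repository.SaveRaw(batch));   // task 1: step 2
    var process = Task.Run(() =>
    {
        var result = Process(batch);                           // step 3
        repository.SaveProcessed(result);                      // step 4
    });                                                        // task 2
    Task.WaitAll(saveRaw, process); // waiting here keeps error handling simple

    if (batch.Count < 1000)
        Thread.Sleep(TimeSpan.FromSeconds(1)); // a full batch means poll again at once
}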
I have an ASP.NET (MVC) web page displaying 10,000 products.
For this I am using a method in which I have to call an external web service 20 times. This is because the web service returns 500 records at a time, so to get 10,000 records I need to call the service 20 times.
The 20 calls make the page load slowly. Now I need to improve the performance. Since the web service is external, I cannot make changes there.
Threading is an option I thought of. Since I can use page numbers (the service pages the data), each service call is almost independent.
Another option is using Parallel LINQ.
Should I use Parallel LINQ, or choose threading?
Someone please guide me here. Or let me know another way to achieve this.
Note: this web page can be used by many users at a time.
We have filters on the left side of the page, and we need all 10,000 records to construct the filters; otherwise page-wise info would have been enough. Caching is not possible because of the huge load on the server: 400-1000 users can hit the server at a time, and the web service response time is 10 seconds, so we end up hitting it many times.
We have to hit the service 20 times to get all the data. Now I need a solution to improve that. Is threading the only option?
If you can't cache the data from the service, then just get the data you need when you need to display it. I very much doubt that somebody wants to see all 10,000 products on a single web page, and if they do, there is probably something wrong!
Threads and Parallel LINQ will not help you here.
Parallel LINQ is meant for spreading lots of CPU work over CPU cores; what you want to do is make 20 web requests at the same time. You will need to use threading (or async I/O) to do that.
You'll probably want to use the built in async capability of HttpWebRequest (see BeginGetResponse).
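A sketch of that, wrapping the Begin/End pair in Tasks so all 20 page requests are in flight at once. The URL and paging parameters are made up; also remember ServicePointManager.DefaultConnectionLimit if requests appear to queue (needs System.Linq, System.Net, System.Threading.Tasks):

var tasks = Enumerable.Range(0, 20).Select(page =>
{
    var request = (HttpWebRequest)WebRequest.Create(
        "http://example.com/products?page=" + page + "&pageSize=500");
    // FromAsync adapts BeginGetResponse/EndGetResponse to a Task.
    return Task.Factory.FromAsync(request.BeginGetResponse,
                                  request.EndGetResponse, null);
}).ToArray();

Task.WaitAll(tasks); // all 20 responses arrive roughly in parallel
foreach (var task in tasks)
    using (var response = task.Result)
    {
        // ... read and deserialize this 500-record page ...
    }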
Consider calling that service asynchronously. Most of the delay in calling a web service is caused by I/O operations, which can be done simultaneously.
But getting 10,000 items per request is something very scary :)