How to fetch API data using WebClient and multiple threads?

How to fetch API data using WebClient and multiple threads? - c#

So I am trying to query an API that's accessible via HTTP ( no authorization ). To speed things up, I tried to use a Parallel.ForEach loop but it seems like the longer it runs, the more errors pop up.
It fails to retrieve more and more requests. I know the API provider isn't limiting me because I can request the very same blocked URLs in my Internet browser. Also, these are different failed URLs each time, so it doesn't seem to be the case of malformed requests.
The error doesn't seem to occur while I use single threaded foreach loop.
My malfunctioning loop is below:
Parallel.ForEach(this.urlArray, singleUrl => {
this.apiResponseBlob = new System.Net.WebClient ().DownloadString(singleUrl );
this.responsesDictionary.Add(singleUrl, apiResponseBlob);
}
Normal foreach loop works fine but is very slow:
foreach (string singleUrl in this.urlArray) {
this.apiResponseBlob = new System.Net.WebClient ().DownloadString(singleUrl);
this.responsesDictionary.Add(singleUrl, apiResponseBlob);
}
Also: I've had a solution in PHP - I spawned several "fetchers" simultaneously and it never hung up. It seems strange to me that PHP would handle multithreaded retrieval better than C# so I must obviously miss something.
How do I query the API fastest way? Without these strange failures?

Hi did you try to speed up your code with a sync downloads like in this question (see marked answer):
DownloadStringAsync wait for request completion
your could loop through your uris and get a callback for each successfull download.
EDIT : i have seen that you use
this.apiResponseBlob = DL
when you use multithreading every thread tries to write in that variable. This could be a reason vor your bug. Try using an instance of that object type or use
lock{}
so that only one thread can write this variable at time.
http://msdn.microsoft.com/de-de/library/c5kehkcz.aspx
like
Parallel.ForEach(this.urlArray, singleUrl => {
var apiResponseBlob = new System.Net.WebClient ().DownloadString(singleUrl );
lock(singleUrl.ToString()){
this.responsesDictionary.Add(singleUrl, apiResponseBlob);
}
}

Related

Multiple WebClient calls from worker threads get blocked in loopback requests back to self

I have a .NET 4.5.2 ASP.NET webapp in which a chunk of code makes async webclient calls back into web pages inside the same webapp. (Yes, my webapp makes async calls back into itself.) It does this in order to screen scrape, grab the html, and hand it to a PDF generator.
I had this all working...except that it was very slow because there are about 15 labor-intensive reports that take roughly 3 seconds each, or 45 seconds in total. Since that is so slow I attempted to generate all these concurrently in parallel, and that's when things hit the fan.
What is happening is that my aspx reports (that get hit by webclient) never make it past the class constructor until timeout. Page_Load doesn't get hit until timeout, or any other page events. The report generation (and webclient calls) are triggered when the user clicks Save in the webapp, and a bunch of stuff happens, including this async page generation activity. The webapp requires windows authentication which I'm handling fine.
So when the multithreaded stuff kicks off, a bunch of webclient requests are made, and they all get stuck in the reports' class contructor for a few minutes, and then time out. During/after timeout, session data is cleared, and when that happens, the reports cannot get their data.
Here is the multithreaded code:
Parallel.ForEach(folders, ( folderPath ) =>
{
...
string html = getReportHTML(fullReportURL, aspNetSessionID);
// hand html to the PDF generator here...
...
});
private string getReportHTML( string url, string aspNetSessionID ) {
using( WebClient webClient = new WebClient() ) {
webClient.UseDefaultCredentials = true;
webClient.Headers.Add(HttpRequestHeader.Cookie, "ASP.NET_SessionId=" + aspNetSessionID);
string fullReportURL = url;
byte[] reportBytes = webClient.DownloadData(fullReportURL);
if( reportBytes != null && reportBytes.Length > 0 ) {
string html = Encoding.ASCII.GetString(reportBytes);
return html;
}
}
return string.Empty;
}
Important points:
Notice I have to include the ASP.NET session cookie, or the web call doesn't work.
webClient.UseDefaultCredentials = true is required for the winauth.
The fragile session state and architecture is not changeable in the short term - it's an old and massive webapp and I am stuck with it. The reports are complex and rely heavily on session state (and prior to session state many db lookups and calcs are occurring.
Even though I'm calling reports from my webapp to my same webapp, I must use an absolute url - relative URL throws errors.
When I extract the code samples above into a separate .net console app, it works well, and doesn't get stuck in the constructor. Because of this, the issue must lie (at least in part) in the fact that my web app is making async calls back to itself. I don't know how to avoid doing this. I even flirted with Server.Execute() which really blows up inside worker threads.
The reports cannot be generated in a windows service or some other process - it must be linked to the webapp's save event.
There's a lot going on here, but I think the most fundamental question/problem is that these concurrent webclient calls hit the ASPX pages and get stuck in the constructor, going no further into page events. And after about 2 minutes, all those threads flood down into the page events, where failures occur because the main webapp's session state is no longer active.
Chicken or egg: I don't know whether the threads unblock and eventually hit page events because the session state was cleared, or the other way around. Or maybe there is no connection.
Any ideas?

Consuming a web service in multiple threads c#

I am consuming a web service provided to me by a vendor in c# application. This application calls a web method in a loop and that slows down the performance. To get the complete set of results, it takes more than an hour.
Can I apply multi threading on my side to consume this web service in multiple threads and combine the results together?
Is there any better approach to retrieve data in minutes instead of hours?

First of all you have to make sure your vendor does indeed support this or does not prohibit it (which is very probable too).
The code itself to do this is fairly straightforward, using a method such as Parallel.For
Simple Example (google.com):
Parallel.For(0, norequests,
i => {
//Code that does your request goes here
} );
Exaplanation:
In a Parallel.For loop, all the requests get executed in-parallel (as implied in the name), which could potentially provide a very significant increase in performance.
Further reading:
MSDN on Parallel.For loops

You should really ask your vendor. We can only speculate about why it takes that long or if firing multiple requests will actually yield the same results as the one that takes long.
Basically, sending one request getting one response should beat the multi-threaded variant because it should be easier to optimize on the servers side.
If you want to know why this is not the case with the current version of the service, ask the vendor.

This is only samples, if you call web services in parallel:
private void TestParallelForeach()
{
string[] uris = {"http://192.168.1.2", "http://192.168.1.3", "http://192.168.1.4"};
var results = new List<string>();
var syncObj = new object();
Parallel.ForEach(uris, uri =>
{
using (var webClient = new WebClient())
{
webClient.Encoding = Encoding.UTF8;
try
{
var result = webClient.DownloadString(uri);
lock (syncObj)
{
results.Add(result);
}
}
catch (Exception ex)
{
// Do error handling here...
}
}
});
// Do with "results" here....
}

HttpWebRequest possibly slowing website

Using Visual studio 2012, C#.net 4.5 , SQL Server 2008, Feefo, Nopcommerce
Hey guys I have Recently implemented a new review service into a current site we have.
When the change went live the first day all worked fine.
Since then though the sending of sales to Feefo hasnt been working, There are no logs either of anything going wrong.
In the OrderProcessingService.cs in Nop Commerce's Service, i call a HttpWebrequest when an order has been confirmed as completed. Here is the code.
var email = HttpUtility.UrlEncode(order.Customer.Email.ToString());
var name = HttpUtility.UrlEncode(order.Customer.GetFullName().ToString());
var description = HttpUtility.UrlEncode(productVariant.ProductVariant.Product.MetaDescription != null ? productVariant.ProductVariant.Product.MetaDescription.ToString() : "product");
var orderRef = HttpUtility.UrlEncode(order.Id.ToString());
var productLink = HttpUtility.UrlEncode(string.Format("myurl/p/{0}/{1}", productVariant.ProductVariant.ProductId, productVariant.ProductVariant.Name.Replace(" ", "-")));
string itemRef = "";
try
{
itemRef = HttpUtility.UrlEncode(productVariant.ProductVariant.ProductId.ToString());
}
catch
{
itemRef = "0";
}
var url = string.Format("feefo Url",
login, password,email,name,description,orderRef,productLink,itemRef);
var request = (HttpWebRequest)WebRequest.Create(url);
request.KeepAlive = false;
request.Timeout = 5000;
request.Proxy = null;
using (var response = (HttpWebResponse)request.GetResponse())
{
if (response.StatusDescription == "OK")
{
var stream = response.GetResponseStream();
if(stream != null)
{
using (var reader = new StreamReader(stream))
{
var content = reader.ReadToEnd();
}
}
}
}
So as you can see its a simple webrequest that is processed on an order, and all product variants are sent to feefo.
Now:
this hasnt been happening all week since the 15th (day of the
implementation)
the site has been grinding to a halt recently.
The stream and reader in the the var content is there for debugging.
Im wondering does the code redflag anything to you that could relate to the process of website?
Also note i have run some SQL statements to see if there is any deadlocks or large escalations, so far seems fine, Logs have also been fine just the usual logging of Bots.
Any help would be much appreciated!
EDIT: also note that this code is in a method that is called and wrapped in A try catch
UPDATE: well forget about the "not sending", thats because i was just told my code was rolled back last week

A call to another web site while processing the order can degrade performance, as you are calling to a site that you do not control. You don't know how much time it is going to take. Furthermore, the GetResponse method can throw an exception, if you don't log anything in your outer try/catch block then you won't be able to know what's happening.
The best way to perform such a task is to implement something like the "Send Emails" scheduled task, and send data when you can afford to wait for the remote service. It is easy if you try. It is more resilient and easier to maintain if you upgrade the nopCommerce code base.
This is how I do similar things:
Avoid modifying the OrderProcessingService: Create a custom service or plugin that consumes the OrderPlacedEvent or the OrderPaidEvent (just implement the IConsumer<OrderPaidEvent> or IConsumer<OrderPlacedEvent> interface).
Do not call to a third party service directly while processing the request if you don't need the response at that moment. It will only delay your process. At the service created in step 1, store data and send it to Feefo later. You can store data to database or use an static collection if you don't mind losing pending data when restarting the site (that could be ok for statistical data for instance).
Best way to implement point #2 is to add a new scheduled task implementing ITask (remember to add a record to the ScheduleTask table). Just recover the stored data do your processing.
Add some logging. It is easy, just get an ILogger instance and call Insert.

As far as I can see, you are making a blocking synchronous call to other websites, which will definitely slow down your site in between the request-response process. What Marco has suggested is valid, try to do it in an ITask. Or you can use an asynchronous web request to potentially remove the block, if you need things done immediately instead of scheduled. :)

C# How to process several web requests at once

I have been reading a lot about ThreadPools, Tasks, and Threads. After awhile I got pretty confused with the whole thing. Lots of people saying negative/positive things about each... Maybe someone can help me find a solution for my problem. I created a simple diagram here to get my point across better.
Basically on the left is a list of 5 strings (URL's) that need to be processed. In the center is just my idea of a handler that has 2 events to track progress. Inside that handler it takes all 5 URL's creates separate tasks for them, shown in blue. Once each one complete I want each one to return the webpage results to the handler. When they have all returned a value I want the OnComplete to be called and all this information passed back to the main thread.
Hopefully you can understand what I am trying to do. Thanks in advance for anyone who would like to help!
Update
I have taken your suggestions and put them to use. But I still have a few questions. Here is the code I have built, mind it is not build proof, just a concept to see if I'm going in the right direction. Please read the comments, I had included my questions on how to proceed in there. Thank you for all who took interest in my question so far.
public List<String> ProcessList (string[] URLs)
{
List<string> data = new List<string>();
for(int i = 0; i < URLs.Length - 1; i++)
{
//not sure how to do this now??
//I want only 10 HttpWebRequest running at once.
//Also I want this method to block until all the URL data has been returned.
}
return data;
}
private async Task<string> GetURLData(string URL)
{
//First setup out web client
HttpWebRequest Request = GetWebRequest(URL);
//
//Check if the client holds a value. (There were no errors)
if (Request != null)
{
//GetCouponsAsync will return to the calling function and resumes
//here when GetResponse is complete.
WebResponse Response = await Request.GetResponseAsync();
//
//Setup our Stream to read the reply
Stream ResponseStream = Response.GetResponseStream();
//return the reply string here...
}
}

As #fendorio and #ps2goat pointed out async await is perfect for your scenario. Here is another msdn article
http://msdn.microsoft.com/en-us/library/hh300224.aspx

It seems to me that you are trying to replicate a webserver within a webserver.
Each web request starts its own thread in a webserver. As these requests can originate from anywhere that has access to the server, nothing but the server itself has access or the ability to manage them (in a clean way).
If you would like to handle requests and keep track of them like I believe you are asking, AJAX requests would be the best way to do this. This way you can leave the server to manage the threads and requests as it does best, but you can manage their progress and monitor them via JSON return results.
Look into jQuery.ajax for some ideas on how to do this.

To achieve the above mentioned functionality in a simple way, I would prefer calling a BackgroundWorker for each of the tasks. You can keep track of the progress plus you get a notification upon task completion.
Another reason to choose this is that the mentioned tasks look like a back-end job and not tightly coupled with the UI.
Here's a MSDN link and this is the link for a cool tutorial.

Bloomberg API request timing out

Having set up a ReferenceDataRequest I send it along to an EventQueue
Service refdata = _session.GetService("//blp/refdata");
Request request = refdata.CreateRequest("ReferenceDataRequest");
// append the appropriate symbol and field data to the request
EventQueue eventQueue = new EventQueue();
Guid guid = Guid.NewGuid();
CorrelationID id = new CorrelationID(guid);
_session.SendRequest(request, eventQueue, id);
long _eventWaitTimeout = 60000;
myEvent = eventQueue.NextEvent(_eventWaitTimeout);
Normally I can grab the message from the queue, but I'm hitting the situation now that if I'm making a number of requests in the same run of the app (normally around the tenth), I see a TIMEOUT EventType
if (myEvent.Type == Event.EventType.TIMEOUT)
throw new Exception("Timed Out - need to rethink this strategy");
else
msg = myEvent.GetMessages().First();
These are being made on the same thread, but I'm assuming that there's something somewhere along the line that I'm consuming and not releasing.
Anyone have any clues or advice?
There aren't many references on SO to BLP's API, but hopefully we can start to rectify that situation.

I just wanted to share something, thanks to the code you included in your initial post.
If you make a request for historical intraday data for a long duration (which results in many events generated by Bloomberg API), do not use the pattern specified in the API documentation, as it may end up making your application very slow to retrieve all events.
Basically, do not call NextEvent() on a Session object! Use a dedicated EventQueue instead.
Instead of doing this:
var cID = new CorrelationID(1);
session.SendRequest(request, cID);
do {
Event eventObj = session.NextEvent();
...
}
Do this:
var cID = new CorrelationID(1);
var eventQueue = new EventQueue();
session.SendRequest(request, eventQueue, cID);
do {
Event eventObj = eventQueue.NextEvent();
...
}
This can result in some performance improvement, though the API is known to not be particularly deterministic...

I didn't really ever get around to solving this question, but we did find a workaround.
Based on a small, apparently throwaway, comment in the Server API documentation, we opted to create a second session. One session is responsible for static requests, the other for real-time. e.g.
_marketDataSession.OpenService("//blp/mktdata");
_staticSession.OpenService("//blp/refdata");
The means one session operates in subscription mode, the other more synchronously - I think it was this duality which was at the root of our problems.
Since making that change, we've not had any problems.

My reading of the docs agrees that you need separate sessions for the "//blp/mktdata" and "//blp/refdata" services.

A client appeared to have a similar problem. I solved it by making hundreds of sessions rather than passing in hundreds of requests in one session. Bloomberg may not be to happy with this BFI (brute force and ignorance) approach as we are sending the field requests for each session but it works.

Nice to see another person on stackoverflow enjoying the pain of bloomberg API :-)
I'm ashamed to say I use the following pattern (I suspect copied from the example code). It seems to work reasonably robustly, but probably ignores some important messages. But I don't get your time-out problem. It's Java, but all the languages work basically the same.
cid = session.sendRequest(request, null);
while (true) {
Event event = session.nextEvent();
MessageIterator msgIter = event.messageIterator();
while (msgIter.hasNext()) {
Message msg = msgIter.next();
if (msg.correlationID() == cid) {
processMessage(msg, fieldStrings, result);
}
}
if (event.eventType() == Event.EventType.RESPONSE) {
break;
}
}
This may work because it consumes all messages off each event.

It sounds like you are making too many requests at once. BB will only process a certain number of requests per connection at any given time. Note that opening more and more connections will not help because there are limits per subscription as well. If you make a large number of time consuming requests simultaneously, some may timeout. Also, you should process the request completely(until you receive RESPONSE message), or cancel them. A partial request that is outstanding is wasting a slot. Since splitting into two sessions, seems to have helped you, it sounds like you are also making a lot of subscription requests at the same time. Are you using subscriptions as a way to take snapshots? That is subscribe to an instrument, get initial values, and de-subscribe. If so, you should try to find a different design. This is not the way the subscriptions are intended to be used. An outstanding subscription request also uses a request slot. That is why it is best to batch as many subscriptions as possible in a single subscription list instead of making many individual requests. Hope this helps with your use of the api.

By the way, I can't tell from your sample code, but while you are blocked on messages from the event queue, are you also reading from the main event queue while(in a seperate event queue)? You must process all the messages out of the queue, especially if you have outstanding subscriptions. Responses can queue up really fast. If you are not processing messages, the session may hit some queue limits which may be why you are getting timeouts. Also, if you don't read messages, you may be marked a slow consumer and not receive more data until you start consuming the pending messages. The api is async. Event queues are just a way to block on specific requests without having to process all messages from the main queue in a context where blocking is ok, and it would otherwise be be difficult to interrupt the logic flow to process parts asynchronously.

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.