Multiple concurrent webservice requests in ASP.NET using threads - c#

I need to make multiple requests to a webservice at the same time. I thought of creating a thread for each request. Can this be done in ASP.NET v3.5?
Example:
for(int i = 0; i<=10; i++)
{
"Do each Request in a separate thread..."
}

The options available to you vary depending on what and where you want to apply parallelism in your code, but I would suggest that you start off with the new Task class from .NET 4.0.
Example would be:
Task backgroundProcess = new Task(() =>
{
service.CallMethod();
});
backgroundProcess.Start(); // the task does not run until Start() is called
This will get you started. After that I suggest that you do some reading because this is a very broad subject. Try this link:
http://www.albahari.com/threading/

The following pattern can be used to spin off multiple requests as work items in the ThreadPool. It will also wait for all of those work items to complete before proceeding.
int pending = requests.Count;
var finished = new ManualResetEvent(false);
foreach (Request request in requests)
{
Request capture = request; // Required to close over the loop variable correctly.
ThreadPool.QueueUserWorkItem(
(state) =>
{
try
{
ProcessRequest(capture);
}
finally
{
if (Interlocked.Decrement(ref pending) == 0)
{
finished.Set(); // Signal completion of all work items.
}
}
}, null);
}
finished.WaitOne(); // Wait for all work items to complete.
You could also download the Reactive Extensions backport for 3.5 and then use Parallel.For to do the same thing.
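With that backport referenced, a rough sketch of the same fan-out, using Parallel.ForEach over the same requests collection and ProcessRequest method as the snippet above, could look like this:
// Parallel.ForEach blocks until every request has been processed.
Parallel.ForEach(requests, request =>
{
    ProcessRequest(request);
});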

If the webservice calls are asynchronous, I don't see how having multiple threads will accomplish anything.

In .NET 3.5 you can use the ThreadPool.QueueUserWorkItem method. There are plenty of examples on the web; a minimal sketch is below.
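For example (service.CallMethod is just a placeholder for whatever webservice call you need to make):
// Queue each request as a ThreadPool work item.
for (int i = 0; i <= 10; i++)
{
    ThreadPool.QueueUserWorkItem(state =>
    {
        service.CallMethod(); // placeholder for your webservice request
    });
}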

Related

Task post-processing to start soon after 2 tasks are done

I have a core task retrieving some core data and multiple other sub-tasks fetching extra data. I would like to run an enricher process on the core data as soon as the core task and any of the sub-tasks are ready. Would you know how to do so?
I thought about something like this, but I'm not sure it's doing what I want:
// Starting the tasks
var coreDataTask = new Task(...);
var extraDataTask1 = new Task(...);
var extraDataTask2 = new Task(...);
coreDataTask.Start();
extraDataTask1.Start();
extraDataTask2.Start();
// Enriching the results
Task.WaitAll(coreDataTask, extraDataTask1);
EnrichCore(coreDataTask.Result, extraDataTask1.Result);
Task.WaitAll(coreDataTask, extraDataTask2);
EnrichCore(coreDataTask.Result, extraDataTask2.Result);
Also, given the enrichment is on the same core object, I guess I would need to lock it somewhere?
Thanks in advance!
Here is another idea taking advantage of Task.WhenAny() to detect when tasks are completing.
For this minimal example, I just assume that the core data and extra data are strings. But you can adjust for whatever your type is.
Also, I am not actually doing any processing. You would have to plug in your processing.
Also, an assumption I am making, that is not really clear, is that you are mostly trying to parallelize the gathering of your data because that's the expensive part, but that the enriching part is actually pretty fast. Based on that assumption, you'll notice that the tasks run in parallel to gather the core data and extra data. But as the data becomes available, the core data is enriched synchronously to avoid having to complicate the code with locking.
If you copy-paste the code below, you should be able to run it as is to see how it works.
public static void Main(string[] args)
{
StartWork().Wait();
}
private async static Task StartWork()
{
// start core and extra tasks
Task<string> coreDataTask = Task.Run(() => "core data" /* do something more complicated here */);
List<Task<string>> extraDataTaskList = new List<Task<string>>();
for (int i = 0; i < 10; i++)
{
int x = i;
extraDataTaskList.Add(Task.Run(() => "extra data " + x /* do something more complicated here */));
}
// wait for core data to be ready first.
StringBuilder coreData = new StringBuilder(await coreDataTask);
// enrich core as the extra data tasks complete.
while (extraDataTaskList.Count != 0)
{
Task<string> completedExtraDataTask = await Task.WhenAny(extraDataTaskList);
extraDataTaskList.Remove(completedExtraDataTask);
EnrichCore(coreData, await completedExtraDataTask);
}
Console.WriteLine(coreData.ToString());
}
private static void EnrichCore(StringBuilder coreData, string extraData)
{
coreData.Append(" enriched with ").Append(extraData);
}
EDIT: .NET 4.0 version
Here is how I would change it for .NET 4.0, while still retaining the same overall design:
Task.Run() becomes Task.Factory.StartNew()
Instead of doing await on tasks, I call Result, which is a blocking call that waits for the task to complete.
Use Task.WaitAny instead of Task.WhenAny, which is also a blocking call.
The design remains very similar. The one big difference between both versions of the code is that in the .NET 4.5 version, whenever there is an await, the current thread is free to do other work. In the .NET 4.0 version, whenever you call Task.Result or Task.WaitAny, the current thread blocks until the Task completes. It's possible that this difference is not really important to you. But if it is, just make sure to wrap and run the whole block of code in a background thread or task to free up your main thread.
The other difference is with the exception handling. With the .NET 4.5 version, if any of your tasks fails with an unhandled exception, the exception is automatically unwrapped and propagated in a very transparent manner. With the .NET 4.0 version, you'll be getting AggregateExceptions that you will have to unwrap and handle yourself. If this is a concern, make sure you test this beforehand so you know what to expect.
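For instance, a small illustrative sketch of what that unwrapping could look like in the .NET 4.0 version (the logging is only a placeholder for your own handling):
try
{
    coreDataTask.Wait(); // accessing .Result throws the same way on a faulted task
}
catch (AggregateException ex)
{
    foreach (Exception inner in ex.Flatten().InnerExceptions)
    {
        Console.WriteLine(inner.Message); // decide per failure how to recover
    }
}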
Personally, I try to avoid Task.ContinueWith whenever I can. It tends to make the code really ugly and hard to read.
public static void Main(string[] args)
{
// start core and extra tasks
Task<string> coreDataTask = Task.Factory.StartNew(() => "core data" /* do something more complicated here */);
List<Task<string>> extraDataTaskList = new List<Task<string>>();
for (int i = 0; i < 10; i++)
{
int x = i;
extraDataTaskList.Add(Task.Factory.StartNew(() => "extra data " + x /* do something more complicated here */));
}
// wait for core data to be ready first.
StringBuilder coreData = new StringBuilder(coreDataTask.Result);
// enrich core as the extra data tasks complete.
while (extraDataTaskList.Count != 0)
{
int indexOfCompletedTask = Task.WaitAny(extraDataTaskList.ToArray());
Task<string> completedExtraDataTask = extraDataTaskList[indexOfCompletedTask];
extraDataTaskList.Remove(completedExtraDataTask);
EnrichCore(coreData, completedExtraDataTask.Result);
}
Console.WriteLine(coreData.ToString());
}
private static void EnrichCore(StringBuilder coreData, string extraData)
{
coreData.Append(" enriched with ").Append(extraData);
}
I think what you probably want is "ContinueWith" (documentation here: https://msdn.microsoft.com/en-us/library/dd270696(v=vs.110).aspx), as long as your enriching doesn't need to be done in a specific order.
The code would look something like the following:
var coreTask = new Task<object>(() => { return null; });
var enrichTask1 = new Task<object>(() => { return null; });
var enrichTask2 = new Task<object>(() => { return null; });
coreTask.Start();
coreTask.Wait();
//Create your continuation tasks here with the data you want.
var continuation1 = enrichTask1.ContinueWith(task => {/*Do enriching here with task.Result*/});
var continuation2 = enrichTask2.ContinueWith(task => {/*Do enriching here with task.Result*/});
//Start all enricher tasks here.
enrichTask1.Start();
enrichTask2.Start();
//Wait for all the continuations to complete here.
Task.WaitAll(continuation1, continuation2);
You still need to run your CoreTask first, as it has to finish before any of the enriching tasks. But from there you can start all the tasks and tell them to "ContinueWith" something else when they are done.
You should also take a quick look at the "Enricher Pattern", which may help you in general with what you want to achieve (outside of threading). There are examples here: http://www.enterpriseintegrationpatterns.com/DataEnricher.html

Hangfire Background Job with Return Value

I'm switching from Task.Run to Hangfire. In .NET 4.5+, Task.Run can return a Task<TResult>, which lets me run tasks that return something other than void. I can then wait for and get the result of a task by accessing the MyReturnedTask.Result property.
Example of my old code:
public void MyMainCode()
{
List<string> listStr = new List<string>();
listStr.Add("Bob");
listStr.Add("Kate");
listStr.Add("Yaz");
List<Task<string>> listTasks = new List<Task<string>>();
foreach(string str in listStr)
{
Task<string> returnedTask = Task.Run(() => GetMyString(str));
listTasks.Add(returnedTask);
}
foreach(Task<string> task in listTasks)
{
// using task.Result will cause the code to wait for the task if not yet finished.
// Alternatively, you can use Task.WaitAll(listTasks.ToArray()) to wait for all tasks in the list to finish.
MyTextBox.Text += task.Result + Environment.NewLine;
}
}
private string GetMyString(string str)
{
// long execution in order to calculate the returned string
return str + "_finished";
}
As far as I can see from the Hangfire Quick Start page, its main entry point, BackgroundJob.Enqueue(() => Console.WriteLine("Fire-and-forget")), perfectly runs the code as a background job but apparently doesn't support jobs that have a return value (like the code I presented above). Is that right? If not, how can I tweak my code in order to use Hangfire?
P.S. I already looked at HostingEnvironment.QueueBackgroundWorkItem (here) but it apparently lacks the same functionality (background jobs have to be void)
EDIT
As #Dejan figured out, the main reason I want to switch to Hangfire is the same reason the .NET folks added QueueBackgroundWorkItem in .NET 4.5.2. And that reason is well described in Scott Hanselman's great article about Background Tasks in ASP.NET. So I'm gonna quote from the article:
QBWI (QueueBackgroundWorkItem) schedules a task which can run in the background, independent of any request. This differs from a normal ThreadPool work item in that ASP.NET automatically keeps track of how many work items registered through this API are currently running, and the ASP.NET runtime will try to delay AppDomain shutdown until these work items have finished executing.
One simple solution would be to poll the monitoring API until the job is finished like this:
public static Task Enqueue(Expression<Action> methodCall)
{
string jobId = BackgroundJob.Enqueue(methodCall);
Task checkJobState = Task.Factory.StartNew(() =>
{
while (true)
{
IMonitoringApi monitoringApi = JobStorage.Current.GetMonitoringApi();
JobDetailsDto jobDetails = monitoringApi.JobDetails(jobId);
string currentState = jobDetails.History[0].StateName;
if (currentState != "Enqueued" && currentState != "Processing")
{
break;
}
Thread.Sleep(100); // adjust to a coarse enough value for your scenario
}
});
return checkJobState;
}
Attention: Of course, in a Web-hosted scenario you cannot rely on continuation of the task (task.ContinueWith()) to do more things after the job has finished as the AppDomain might be shut down - for the same reasons you probably want to use Hangfire in the first place.
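As a rough usage sketch (assuming the Enqueue wrapper above is in scope; the job body is just a placeholder), a caller can block until the polled job completes:
Task jobFinished = Enqueue(() => Console.WriteLine("long running job"));
jobFinished.Wait(); // returns once Hangfire reports the job as no longer enqueued or processing
Console.WriteLine("Background job completed.");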

Task Scheduler with WCF Service Reference async function

I am trying to consume a service reference, making multiple requests at the same time using a task scheduler. The service includes a synchronous and an asynchronous function that return a result set. I am a bit confused, so I have a couple of initial questions, and then I will share how far I got with each. I am using some logging, the Concurrency Visualizer, and Fiddler to investigate. Ultimately I want to use a reactive scheduler to make as many requests as possible.
1) Should I use the async function to make all the requests?
2) If I were to use the synchronous function in multiple tasks, what are the limited resources that could potentially starve my thread count?
Here is what I have so far:
var myScheduler = new myScheduler();
var myFactory = new TaskFactory(myScheduler);
var myClientProxy = new ClientProxy();
var tasks = new List<Task<Response>>();
foreach( var request in Requests )
{
var localrequest = request;
tasks.Add( myFactory.StartNew( () =>
{
// log stuff
var responseTask = myClientProxy.GetResponsesAsync( localrequest );
// log some more stuff
return responseTask;
}).Unwrap() );
}
Task.WaitAll( tasks.ToArray() );
// process all the requests after they are done
This runs, but according to Fiddler it just tries to do all of the requests at once. It could be the scheduler, but I trust that more than I do the code above.
I have also tried to implement it without the Unwrap call, using an async/await delegate instead, and it does the same thing. I have also tried referencing .Result, and that seems to run sequentially. Using the non-synchronous service function with the scheduler/factory, it only gets up to about 20 simultaneous requests at a time per client.
1) Yes. It will allow your application to scale better by using fewer threads to accomplish more.
2) Threads. When you initiate a synchronous operation that is inherently asynchronous (e.g. I/O), you have a blocked thread waiting for the operation to complete. You could, however, be using that thread in the meantime to execute CPU-bound operations.
The simplest way to limit the number of concurrent requests is to use a SemaphoreSlim, which allows you to asynchronously wait to enter it:
async Task ConsumeService()
{
var client = new ClientProxy();
var semaphore = new SemaphoreSlim(100);
var tasks = Requests.Select(async request =>
{
await semaphore.WaitAsync();
try
{
return await client.GetResponsesAsync(request);
}
finally
{
semaphore.Release();
}
}).ToList();
await Task.WhenAll(tasks);
// TODO: Process responses...
}
Regardless of how you are calling the WCF service, whether it is an async call or a synchronous one, you will be bound by the WCF serviceThrottling limits. You should look at these settings and possibly adjust them higher (if you have them set to low values for some reason). In .NET 4 the defaults are pretty good; in older versions of the .NET Framework they were much more conservative.
.NET 4.0
MaxConcurrentSessions: default is 100 * ProcessorCount
MaxConcurrentCalls: default is 16 * ProcessorCount
MaxConcurrentInstances: default is MaxConcurrentCalls+MaxConcurrentSessions
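If you do need to raise them, here is a minimal sketch of adjusting the throttle programmatically on a self-hosted service (it requires System.ServiceModel and System.ServiceModel.Description; MyService is a placeholder for your service type, and the values shown simply restate the .NET 4 defaults):
var host = new ServiceHost(typeof(MyService)); // MyService is a placeholder
var throttle = host.Description.Behaviors.Find<ServiceThrottlingBehavior>();
if (throttle == null)
{
    // no throttling behavior configured yet, so add one
    throttle = new ServiceThrottlingBehavior();
    host.Description.Behaviors.Add(throttle);
}
throttle.MaxConcurrentCalls = 16 * Environment.ProcessorCount;
throttle.MaxConcurrentSessions = 100 * Environment.ProcessorCount;
throttle.MaxConcurrentInstances = throttle.MaxConcurrentCalls + throttle.MaxConcurrentSessions;
host.Open();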
1.) Yes.
2.) Yes.
If you want to control the number of simultaneous requests, you can try using Stephen Toub's ForEachAsync method. It allows you to control how many tasks are processed at the same time.
public static class Extensions
{
public static Task ForEachAsync<T>(this IEnumerable<T> source, int dop, Func<T, Task> body)
{
return Task.WhenAll(
from partition in Partitioner.Create(source).GetPartitions(dop)
select Task.Run(async delegate {
using (partition)
while (partition.MoveNext())
await body(partition.Current);
}));
}
}
void Main()
{
var myClientProxy = new ClientProxy();
var responses = new List<Response>();
// Max 10 concurrent requests
Requests.ForEachAsync<Request>(10, async (r) =>
{
var response = await myClientProxy.GetResponsesAsync( r );
lock (responses) { responses.Add(response); } // List<T> is not thread-safe
}).Wait();
}

Out of Memory Threading - Perf Test Tool

I'm creating a load-test tool (it sends HTTP GETs) and it runs fine, but eventually it dies because of an out-of-memory error.
ASK: How can I reset the threads so this loop can continually run and not err?
static void Main(string[] args)
{
System.Net.ServicePointManager.DefaultConnectionLimit = 200;
while (true)
{
for (int i = 0; i < 1000; i++)
{
new Thread(LoadTest).Start(); // <-- EXCEPTION! eventually runs out of memory
}
Thread.Sleep(2);
}
}
static void LoadTest()
{
string url = "http://myserv.com/api/dev/getstuff?whatstuff=thisstuff";
// Sends an HTTP GET to the above url ... and displays the response in the console....
}
You are instantiating Threads left, right and centre. This is likely your problem. You want to replace the
new Thread(LoadTest).Start();
with
Task.Run(LoadTest);
This will run your LoadTest on a thread from the ThreadPool, instead of using resources to create a new Thread each time. HOWEVER, this will then expose a different issue.
Threads on the ThreadPool are a limited resource and you want to return Threads to the ThreadPool as soon as possible. I assume you are using the synchronous download methods as opposed to the APM methods. This means that whilst the request is being sent out to the server, the thread spawning the request is sleeping as opposed to going off to do some other work.
Either use (assuming .net 4.5)
var client = new WebClient();
var response = await client.DownloadStringTaskAsync(url);
Console.WriteLine(response);
Or use a callback (if not .NET 4.5):
var client = new WebClient();
client.DownloadStringCompleted += (sender, e) => Console.WriteLine(e.Result);
client.DownloadStringAsync(new Uri(url));
Use the ThreadPool and QueueUserWorkItem instead of creating thousands of threads. Threads are expensive objects, so it is no surprise you are running out of memory, and besides, you won't get any performance (in your test tool) with so many threads.
Your code snippet creates lots of threads, so it is no wonder it eventually runs out of memory. It would be better to use a thread pool here.
Your code would look like this:
static void Main(string[] args)
{
System.Net.ServicePointManager.DefaultConnectionLimit = 200;
ThreadPool.SetMaxThreads(500, 300);
while (true)
{
ThreadPool.QueueUserWorkItem(LoadTest);
}
}
static void LoadTest(object state)
{
string url = "http://myserv.com/api/dev/getstuff?whatstuff=thisstuff";
// Sends an HTTP GET to the above url ... and displays the response in the console....
}

Threading an unknown amount of threads in C#

I'm currently writing a sitemap generator that scrapes a site for URLs and builds an XML sitemap. As most of the time is spent waiting on requests to URIs, I'm using threading, specifically the built-in ThreadPool.
In order to let the main thread wait for the unknown number of threads to complete, I have implemented the following setup. I don't feel this is a good solution though; can any threading gurus advise me of any problems this solution has, or suggest a better way to implement it?
The EventWaitHandle is set to EventResetMode.ManualReset
Here is the thread method
protected void CrawlUri(object o)
{
try
{
Interlocked.Increment(ref _threadCount);
Uri uri = (Uri)o;
foreach (Match match in _regex.Matches(GetWebResponse(uri)))
{
Uri newUri = new Uri(uri, match.Value);
if (!_uriCollection.Contains(newUri))
{
_uriCollection.Add(newUri);
ThreadPool.QueueUserWorkItem(_waitCallback, newUri);
}
}
}
catch
{
// Handle exceptions
}
finally
{
Interlocked.Decrement(ref _threadCount);
}
// If there are no more threads running then signal the waithandle
if (_threadCount == 0)
_eventWaitHandle.Set();
}
Here is the main thread method
// Request first page (based on host)
Uri root = new Uri(context.Request.Url.GetLeftPart(UriPartial.Authority));
// Begin threaded crawling of the Uri
ThreadPool.QueueUserWorkItem(_waitCallback, root);
Thread.Sleep(5000); // TEMP SOLUTION: Sleep for 5 seconds
_eventWaitHandle.WaitOne();
// Serve the XML sitemap
context.Response.ContentType = "text/xml";
context.Response.Write(GetXml().OuterXml);
Any ideas are much appreciated :)
Well, first off you can create a ManualResetEvent that starts unset, so you don't have to sleep before waiting on it. Secondly, you're going to need to put thread synchronization around your Uri collection. You could get a race condition where two threads pass the "this Uri does not exist yet" check and they add duplicates. Another race condition is that two threads could pass the if (_threadCount == 0) check and they could both set the event.
Last, you can make the whole thing much more efficient by using the asynchronous BeginGetResponse. Your solution right now keeps a thread around waiting for every request. If you use async methods and callbacks, your program will use less memory (1 MB of stack per thread) and won't need to do nearly as many thread context switches.
Here's an example that should illustrate what I'm talking about. Out of curiosity, I did test it out (with a depth limit) and it does work.
public class CrawlUriTool
{
private Regex regex; // assign your URL-matching pattern before calling CrawlUri
private int pendingRequests;
private List<Uri> uriCollection;
private object uriCollectionSync = new object();
private ManualResetEvent crawlCompletedEvent;
public List<Uri> CrawlUri(Uri uri)
{
this.pendingRequests = 0;
this.uriCollection = new List<Uri>();
this.crawlCompletedEvent = new ManualResetEvent(false);
this.StartUriCrawl(uri);
this.crawlCompletedEvent.WaitOne();
return this.uriCollection;
}
private void StartUriCrawl(Uri uri)
{
Interlocked.Increment(ref this.pendingRequests);
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(uri);
request.BeginGetResponse(this.UriCrawlCallback, request);
}
private void UriCrawlCallback(IAsyncResult asyncResult)
{
HttpWebRequest request = asyncResult.AsyncState as HttpWebRequest;
try
{
HttpWebResponse response = (HttpWebResponse)request.EndGetResponse(asyncResult);
string responseText = this.GetTextFromResponse(response); // not included
foreach (Match match in this.regex.Matches(responseText))
{
Uri newUri = new Uri(response.ResponseUri, match.Value);
lock (this.uriCollectionSync)
{
if (!this.uriCollection.Contains(newUri))
{
this.uriCollection.Add(newUri);
this.StartUriCrawl(newUri);
}
}
}
}
catch (WebException exception)
{
// handle exception
}
finally
{
if (Interlocked.Decrement(ref this.pendingRequests) == 0)
{
this.crawlCompletedEvent.Set();
}
}
}
}
When doing this kind of logic I generally try to make an object representing each asynchronous task and the data it needs to run. I typically add this object to the collection of tasks to be done. The thread pool gets these tasks scheduled, and I let the object remove itself from the "to be done" collection when the task finishes, possibly signalling on the collection itself.
So you're finished when the "to be done" collection is empty; the main thread is probably awoken once by each task that finishes.
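A minimal sketch of that idea, with illustrative names that are not from the original answer, might look like this:
// Rough sketch of a "to be done" collection; the crawl body is left as a placeholder.
public class PendingWork
{
    private readonly List<Uri> pending = new List<Uri>();
    private readonly ManualResetEvent allDone = new ManualResetEvent(false);

    public void Queue(Uri uri)
    {
        lock (pending)
        {
            pending.Add(uri);
            allDone.Reset(); // there is outstanding work again
        }
        ThreadPool.QueueUserWorkItem(_ =>
        {
            try
            {
                // Crawl the uri here; call Queue() for any new links found.
            }
            finally
            {
                lock (pending)
                {
                    pending.Remove(uri);
                    if (pending.Count == 0)
                        allDone.Set(); // wake whoever is waiting
                }
            }
        });
    }

    public void WaitForAll()
    {
        allDone.WaitOne(); // returns once the "to be done" collection is empty
    }
}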
You could look into the CTP of the Task Parallel Library which should make this simpler for you. What you're doing can be divided into "tasks", chunks or units of work, and the TPL can parallelize this for you if you supply the tasks. It uses a thread pool internally as well, but it's easier to use and comes with a lot of options like waiting for all tasks to finish. Check out this Channel9 video where the possibilities are explained and where a demo is shown of traversing a tree recursively in parallel, which seems very applicable to your problem.
However, it's still a preview and won't be released until .NET 4.0, so it comes with no warranties and you'll have to manually include the supplied System.Threading.dll (found in the install folder) into your project and I don't know if that's an option to you.
