I have REST web API service in IIS which takes a collection of request objects. The user can enter more than 100 request objects.
I want to run this 100 request concurrently and then aggregate the result and send it back. This involves both I/O operation (calling to backend services for each request) and CPU bound operations (to compute few response elements)
Code snippet -
using System.Threading.Tasks;
....
var taskArray = new Task<FlightInformation>[multiFlightStatusRequest.FlightRequests.Count];
for (int i = 0; i < multiFlightStatusRequest.FlightRequests.Count; i++)
{
var z = i;
taskArray[z] = Tasks.Task.Run(() =>
PerformLogic(multiFlightStatusRequest.FlightRequests[z],lite, fetchRouteByAnyLeg)
);
}
Task.WaitAll(taskArray);
for (int i = 0; i < taskArray.Length; i++)
{
flightInformations.Add(taskArray[i].Result);
}
public Object PerformLogic(Request,...)
{
//multiple IO operations each depends on the outcome of the previous result
//Computations after getting the result from all I/O operations
}
If i individually run the PerformLogic operation (for 1 object) it is taking 300 ms, now my requirement is when I run this PerformLogic() for 100 objects in a single request it should take around 2 secs.
PerformLogic() has the following steps - 1. Call a 3rd Party web service to get some details 2. Based on the details call another 3rd Party webservice 3. Collect the result from the webservice, apply few transformation
But with Task.run() it takes around 7 secs, I would like to know the best approach to handle concurrency and achieve the desired NFR of 2 secs.
I can see that at any point of time 7-8 threads are working concurrently
not sure if I can spawn 100 threads or tasks may be we can see some better performance. Please suggest an approach to handle this efficiently.
Judging by this
public Object PerformLogic(Request,...)
{
//multiple IO operations each depends on the outcome of the previous result
//Computations after getting the result from all I/O operations
}
I'd wager that PerformLogic spends most its time waiting on the IO operations. If so, there's hope with async. You'll have to rewrite PerformLogicand maybe even the IO operations - async needs to be present in all levels, from the top to the bottom. But if you can do it, the result should be a lot faster.
Other than that - get faster hardware. If 8 cores take 7 seconds, then get 32 cores. It's pricey, but could still be cheaper than rewriting the code.
First, don't reinvent the wheel. PLINQ is perfectly capable of doing stuff in parallel, there is no need for manual task handling or result merging.
If you want 100 tasks each taking 300ms done in 2 seconds, you need at least 15 parallel workers, ignoring the cost of parallelization itself.
var results = multiFlightStatusRequest.FlightRequests
.AsParallel()
.WithDegreeOfParallelism(15)
.Select(flightRequest => PerformLogic(flightRequest, lite, fetchRouteByAnyLeg)
.ToList();
Now you have told PLinq to use 15 concurrent workers to work on your queue of tasks. Are you sure your machine is up to the task? You could put any number you want in there, that doesn't mean that your computer magically gets the power to do that.
Another option is to look at your PerformLogic method and optimize that. You call it 100 times, maybe it's worth optimizing.
Related
I have an application that uses hangfire to do long-running jobs for me (I know the time the job takes and it is always roughly the same), and in my UI I want to give an estimate for when a certain job is done. For that I need to query hangfire for the position of the job in the queue and the number of servers working on it.
I know I can get the number of enqueued jobs (in the "DEFAULT" queue) by
public long JobsInQueue() {
var monitor = JobStorage.Current.GetMonitoringApi();
return monitor.EnqueuedCount("DEFAULT");
}
and the number of servers by
public int HealthyServers() {
var monitor = JobStorage.Current.GetMonitoringApi();
return monitor.Servers().Count(n => (n.Heartbeat != null) && (DateTime.Now - n.Heartbeat.Value).TotalMinutes < 5);
}
(BTW: I exclude older heartbeats, because if I turn off servers they sometimes linger in the hangfire database. Is there a better way?), but to give a proper estimate I need to know the position of the job in the queue. How do I get that?
The problem you have is that hangfire is asynchronous, queued, parallel, exhibits an at-least-once durability semantic, and basically non-deterministic.
To know with certainty the order in which an item will finish being processed in such a system is impossible. In fact, if the requirement was to enforce strict ordering, then many of the benefits of hangfire would go away.
There is a very good blog post by #odinserj (the author of hangfire) where he outlines this point: http://odinserj.net/2014/05/10/are-your-methods-ready-to-run-in-background/
However, that said, it's not impossible to come up with a sensible estimation algorithm, but it would have to be one where the order of execution is approximated in some way. As to how you can arrive at such an algorithm I don't know but something like this might work (but probably won't):
Approximate seconds remaining until completion =
(
(average duration of job in seconds * queue depth)
/ (the lower of: number of hangfire threads OR queue depth)
)
- number of seconds already spent in queue
+ average duration of job in seconds
I have the following issue :
I am using a parallel.foreach iteration for a pretty CPU intensive workload (applying a method on a number of items) & it works fine for about the first 80% of the items - using all cpu cores very nice.
As the iteration seems to come near to the end (around 80% i would say) i see that the number of threads begins to go down core by core, & at the end the last around 5% of the items are proceesed only by two cores. So insted to use all cores untill the end, it slows down pretty hard toward the end of the iteration.
Please note the the workload can be per item very different. One can last 1-2 seconds, the other item can take 2-3 minutes to finish.
Any ideea, suggestion is very welcome.
Code used:
var source = myList.ToArray();
var rangePartitioner = Partitioner.Create(0, source.Lenght);
using (SqlConnection connection =new SqlConnection(cnStr))
{
connection.Open();
try
(
Parallel.ForEach(rangePartitioner, (range, loopState) =>
{
for(int i = range.Item1; i<range.Item2; i++)
{
CPUIntensiveMethod(source[i]);
}
});
}
catch(AggretateException ae)
{ //Exception cachting}
}
This is an unavoidable consequence of the fact the parallelism is per computation. It is clear that the whole parallel batch cannot run any quicker than the time taken by the slowest single item in the work-set.
Imagine a batch of 100 items, 8 of which are slow (say 1000s to run) and the rest are quick (say 1s to run). You kick them off in a random order across 8 threads. Its clear that eventually each thread will be calculating one of your long running items, at this point you are seeing full utilisation. Eventually the one(s) that hit their long-op(s) first will finish up their long op(s) and quickly finish up any remaining short ops. At that time you ONLY have some of the long ops waiting to finish, so you will see the active utilisation drop off.. i.e. at some point there are only 3 ops left to finish, so only 3 cores are in use.
Mitigation Tactics
Your long running items might be amenable to 'internal parallelism' allowing them to have a faster minimum limit runtime.
Your long running items may be able to be identified and prioritised to start first (which will ensure you get full CPU utilisation for a long as possible)
(see update below) DONT use partitioning in cases where the body can be long running as this simply increases the 'hit' of this effect. (ie get rid of your rangePartitioner entirely). This will massively reduce the impact of this effect to your particular loop
either way your batch run-time is bound by the run-time of the slowest item in the batch.
Update I have also noticed you are using partitioning on your loop, which massively increases the scope of this effect, i.e. you are saying 'break this work-set down into N work-sets' and then parallelize the running of those N work-sets. In the example above this could mean that you get (say) 3 of the long ops into the same work-set and so those are going to process on that same thread. As such you should NOT be using partitioning if the inner body can be long running. For example the docs on partitioning here https://msdn.microsoft.com/en-us/library/dd560853(v=vs.110).aspx are saying this is aimed at short bodies
If you have multiple threads that process the same number of items each and each item takes varying amount of time, then of course you will have some threads that finish earlier.
If you use collection whose size is not known, then the items will be taken one by one:
var source = myList.AsEnumerable();
Another approach can be a Producer-Consumer pattern
https://msdn.microsoft.com/en-us/library/dd997371
In my application
int numberOfTimes = 1; //Or 100, or 100000
//Incorrect, please see update.
var tasks = Enumerable.Repeat(
(new HttpClient()).GetStringAsync("http://www.someurl.com")
, numberOfTimes);
var resultArray = await Task.WhenAll(tasks);
With numberOfTimes == 1, it takes 5 seconds.
With numberOfTimes == 100000, it still takes 5 seconds.
Thats amazing.
But does that mean I can run unlimited calls in parallel? There has to be some limit when this starts to queues?
What is that limit? Where is that set? What does it depend on?
In other words, How many IO completion ports are there? Who all are competing for them? Does IIS get its own set of IO completion port.
--This is in an ASP.Net MVC action, .Net 4.5.2, IIS
Update: Thanks to #Enigmativity, following is more relevant to the question
var tasks = Enumerable.Range(1, numberOfTimes ).Select(i =>
(new HttpClient()).GetStringAsync("http://deletewhenever.com/api/default"));
var resultArray = await Task.WhenAll(tasks);
With numberOfTimes == 1, it takes 5 seconds.
With numberOfTimes == 100, it still takes 5 seconds.
I am seeing more believable numbers for higher counts now though. The question remains, what governs the number?
What is that limit? Where is that set?
There's no explicit limit. However, you will eventually run out of resources. Mark Russinovich has an interesting blog series on probing the limits of common resources.
Asynchronous operations generally increase memory usage in exchange for responsiveness. So, each naturally-async op uses at least memory for its Task, an OVERLAPPED struct, and an IRP for the driver (each of these represents an in-progress asynchronous operation at different levels). At the lower levels, there are lots and lots of different limitations that can come into play to affect system resources (for an example, I have an old blog post where I had to calculate the maximum size of an I/O buffer - something you would think is simple but is really not).
Socket operations require a client port, which are (in theory) limited to 64k connections to the same remote IP. Sockets also have their own more significant memory overhead, with both input and output buffers at the device level and in user space.
The IOCP doesn't come into play until the operations complete. On .NET, there's only one IOCP for your AppDomain. The default maximum number of I/O threads servicing this IOCP is 1000 on the modern (4.5) .NET framework. Note that this is a limit on how many operations may complete at a time, not how many may be in progress at a time.
Here's a test to see what's going on.
Start with this code:
var i = 0;
Func<int> generate = () =>
{
Thread.Sleep(1000);
return i++;
};
Now call this:
Enumerable.Repeat(generate(), 5)
After one second you get { 0, 0, 0, 0, 0 }.
But make this call:
Enumerable.Range(0, 5).Select(n => generate())
After five seconds you get { 0, 1, 2, 3, 4 }.
It's only calling the async function once in your code.
I have a List of items, and I would like to go through each item, create a task and launch the task. But, I want it to do batches of 10 tasks at once.
For example, if I have 100 URL's in a list, I want it to group them into batches of 10, and loop through batches getting the web response from 10 URL's per batch iteration.
Is this possible?
I am using C# 5 and .NET 4.5.
You can use Parallel.For() or Parallel.ForEach(), they will execute the work on a number of Tasks.
When you need precise control over the batches you could use a custom Partitioner but given that the problem is about URLs it will probably make more sense to use the more common MaxDegreeOfParallelism option.
The Partitioner has a good algorithm for creating the batches depending also on the number of cores.
Parallel.ForEach(Partitioner.Create(from, to), range =>
{
for (int i = range.Item1; i < range.Item2; i++)
{
// ... process i
}
});
I am trying to understand how Parallelism is implemented in .Net. Following code is taken as an example from Reed Copsey Blog.
This code loops over the customers collection and sends them emails after 14 days since their last contact. My Question here is if the customer table is very BIG and sending an email takes few seconds, will NOT this code takes CPU in Denial of Services mode to other important processes?
Is there a way to run following lines of code in parallel but only using few cores so other processes can share CPU? Or Am i approaching the problem in wrong way?
Parallel.ForEach(customers, (customer, parallelLoopState) =>
{
// database operation
DateTime lastContact = theStore.GetLastContact(customer);
TimeSpan timeSinceContact = DateTime.Now - lastContact;
// If it's been more than two weeks, send an email, and update...
if (timeSinceContact.Days > 14)
{
// Exit gracefully if we fail to email, since this
// entire process can be repeated later without issue.
if (theStore.EmailCustomer(customer) == false)
parallelLoopState.Break();
else
customer.LastEmailContact = DateTime.Now;
}
});
Accepted Answer:
Thought Process was RIGHT! as Cole Campbell pointed out, One can control and configure how many cores should be used by specifying ParallelOption object in this specific example. Here is how.
var parallelOptions = new ParallelOptions();
parallelOptions.MaxDegreeOfParallelism =
Math.Max(Environment.ProcessorCount / 2, 1);
And Parallel.ForEach will be used as follow.
Parallel.ForEach(customers, parallelOptions,
(customer, parallelLoopState) =>
{
//do all same stuff
}
Same concept can be applied for PLINQ using .WithDegreeOfParallelism(int numberOfThreads).
For more information on how to configure Parallel Options, read this.
The Task Parallelism Library is designed to take the system workload into account when scheduling tasks to run, so this shouldn't be an issue. However, you can use the MaxDegreeOfParallelism property on the ParallelOptions class, which can be passed into one of the overloads of ForEach(), to restrict the number of concurrent operations that it can perform, if you really need to.