Processing a month of dates in parallel with a progress bar - C#

I am working on an ASP.NET MVC 3 C# project.
I have read about ThreadPool.QueueUserWorkItem and Tasks, and I think I need to use one of them.
public static void ProcessDay(DateTime day) { ... }
Let's say I want to process the days from 2012-01-01 to 2012-01-31, so I have to call the ProcessDay function 31 times.
How do I process 5 days at once, starting another as soon as any of the 5 running ones is done, until all 31 days are finished?
Samples are highly appreciated, and showing a progress bar would be a nice bonus :)

Have a look at the TPL - for example:
DateTime startDate = new DateTime(2012, 01, 01);
var options = new ParallelOptions() { MaxDegreeOfParallelism = 5 };
Parallel.For(0, 31, options, day => { ProcessDay(startDate.AddDays(day)); });

A simple parallelization of this could be:-
var dates = Enumerable.Range(1, DateTime.DaysInMonth(year, month)).Select(day => new DateTime(year, month, day));
Parallel.ForEach(dates, date => { ... });
(Note that, if you have shared state that each thread needs to modify, then you will have to use thread synchronization techniques appropriately.)
As for the number of threads, it's usually best to leave Parallel.ForEach() to work this out for itself. It will use the thread pool internally and will use all available processors efficiently. You can limit the number of threads it will use by using an overload of ForEach which takes a ParallelOptions object with MaxDegreeOfParallelism set to the required amount, but you should run some benchmarks to see whether you can really achieve better results by doing this.
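For instance, the capped variant might be sketched as follows (a minimal example; the Console call simply stands in for the real per-date work):

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

class CappedForEach
{
    static void Main()
    {
        int year = 2012, month = 1;
        var dates = Enumerable.Range(1, DateTime.DaysInMonth(year, month))
            .Select(day => new DateTime(year, month, day));

        // Cap the loop at 5 concurrent iterations, as the question asks.
        Parallel.ForEach(dates,
            new ParallelOptions { MaxDegreeOfParallelism = 5 },
            date => Console.WriteLine(date.ToString("yyyy-MM-dd")));
    }
}
```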
Moreover, if this is a 'long running task' for which you wish to show progress, then you won't be able to achieve this with a single web request. The notion of a 'progress bar' in a website is rather different to that in a desktop application. In a website this typically has to be done by polling. I.e. send a request which starts the process (response returned immediately), poll periodically for progress (which can be displayed in a progress bar or similar) and then display the result when finished.
There are also frameworks and techniques being developed for 'push' type notification e.g. SignalR, so this may also be an option if you want to investigate it.
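Put together, the poll-for-progress pattern might be sketched like this (the class and method names are illustrative, not part of any framework; in MVC, Start and PercentDone would be controller actions returning JSON for the client to poll):

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// The "start" request kicks off the work and returns a job id immediately;
// the "progress" request just reads a shared in-memory store.
static class DayProcessor
{
    static readonly ConcurrentDictionary<Guid, int> Progress =
        new ConcurrentDictionary<Guid, int>();

    static void ProcessDay(DateTime day) => Thread.Sleep(10); // placeholder work

    public static Guid Start(DateTime firstDay, int dayCount)
    {
        var jobId = Guid.NewGuid();
        Progress[jobId] = 0;
        Task.Run(() =>
        {
            Parallel.For(0, dayCount,
                new ParallelOptions { MaxDegreeOfParallelism = 5 },
                i =>
                {
                    ProcessDay(firstDay.AddDays(i));
                    // Count completed days; safe under concurrent updates.
                    Progress.AddOrUpdate(jobId, 1, (_, done) => done + 1);
                });
        });
        return jobId;
    }

    // Polled periodically by the client to drive the progress bar.
    public static int PercentDone(Guid jobId, int dayCount) =>
        Progress.TryGetValue(jobId, out var done) ? done * 100 / dayCount : 0;
}
```

Note that an in-memory store only works on a single server; a web farm would need shared storage for the progress values.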


Concurrently running multiple tasks C#

I have a REST Web API service in IIS which takes a collection of request objects. The user can submit more than 100 request objects.
I want to run these 100 requests concurrently, then aggregate the results and send them back. This involves both I/O-bound operations (calling backend services for each request) and CPU-bound operations (computing a few response elements).
Code snippet -
using System.Threading.Tasks;
....
var taskArray = new Task<FlightInformation>[multiFlightStatusRequest.FlightRequests.Count];
for (int i = 0; i < multiFlightStatusRequest.FlightRequests.Count; i++)
{
    var z = i;
    taskArray[z] = Task.Run(() =>
        PerformLogic(multiFlightStatusRequest.FlightRequests[z], lite, fetchRouteByAnyLeg)
    );
}
Task.WaitAll(taskArray);
for (int i = 0; i < taskArray.Length; i++)
{
    flightInformations.Add(taskArray[i].Result);
}
public Object PerformLogic(Request,...)
{
//multiple IO operations each depends on the outcome of the previous result
//Computations after getting the result from all I/O operations
}
If I run the PerformLogic operation individually (for 1 object) it takes 300 ms; my requirement is that when I run PerformLogic() for 100 objects in a single request, it should take around 2 seconds.
PerformLogic() has the following steps: 1. Call a 3rd-party web service to get some details. 2. Based on the details, call another 3rd-party web service. 3. Collect the result from the web service and apply a few transformations.
But with Task.Run() it takes around 7 seconds. I would like to know the best approach to handle concurrency and achieve the desired NFR of 2 seconds.
I can see that at any point in time only 7-8 threads are working concurrently.
I am not sure whether spawning 100 threads or tasks would give better performance. Please suggest an approach to handle this efficiently.
Judging by this
public Object PerformLogic(Request,...)
{
//multiple IO operations each depends on the outcome of the previous result
//Computations after getting the result from all I/O operations
}
I'd wager that PerformLogic spends most of its time waiting on the I/O operations. If so, there's hope with async. You'll have to rewrite PerformLogic, and maybe even the I/O operations - async needs to be present at all levels, from the top to the bottom. But if you can do it, the result should be a lot faster.
Other than that - get faster hardware. If 8 cores take 7 seconds, then get 32 cores. It's pricey, but could still be cheaper than rewriting the code.
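To sketch what the async shape might look like (the two dependent service calls are simulated with Task.Delay here; in the real code they would be awaited HttpClient calls, and the method names are assumptions since the original body is elided):

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

public class AsyncSketch
{
    // Stand-in for the first 3rd-party web service call.
    public static async Task<string> GetDetailsAsync(int request)
    {
        await Task.Delay(100);            // simulated I/O
        return "details-" + request;
    }

    // Stand-in for the second web service call, which depends on the first.
    public static async Task<string> GetRouteAsync(string details)
    {
        await Task.Delay(100);            // simulated I/O
        return details + "-route";
    }

    // Awaiting instead of blocking frees the thread while I/O is in flight.
    public static async Task<string> PerformLogicAsync(int request)
    {
        var details = await GetDetailsAsync(request);
        var route = await GetRouteAsync(details);
        return route.ToUpperInvariant();  // cheap CPU-bound transformation
    }

    public static async Task Main()
    {
        // All 100 requests overlap on a handful of threads instead of
        // queuing behind a small pool of blocked workers.
        var tasks = Enumerable.Range(1, 100).Select(PerformLogicAsync);
        var results = await Task.WhenAll(tasks);
        Console.WriteLine(results.Length);
    }
}
```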
First, don't reinvent the wheel. PLINQ is perfectly capable of doing stuff in parallel, there is no need for manual task handling or result merging.
If you want 100 tasks each taking 300ms done in 2 seconds, you need at least 15 parallel workers, ignoring the cost of parallelization itself.
var results = multiFlightStatusRequest.FlightRequests
    .AsParallel()
    .WithDegreeOfParallelism(15)
    .Select(flightRequest => PerformLogic(flightRequest, lite, fetchRouteByAnyLeg))
    .ToList();
Now you have told PLINQ to use 15 concurrent workers to work on your queue of tasks. Are you sure your machine is up to the task? You could put any number you want in there; that doesn't mean your computer magically gets the power to do it.
Another option is to look at your PerformLogic method and optimize that. You call it 100 times, maybe it's worth optimizing.

How to approximate job completion times in Hangfire

I have an application that uses Hangfire to run long-running jobs for me (I know how long a job takes, and it is always roughly the same), and in my UI I want to estimate when a certain job will be done. For that I need to query Hangfire for the position of the job in the queue and the number of servers working on it.
I know I can get the number of enqueued jobs (in the "DEFAULT" queue) by
public long JobsInQueue() {
    var monitor = JobStorage.Current.GetMonitoringApi();
    return monitor.EnqueuedCount("DEFAULT");
}
and the number of servers by
public int HealthyServers() {
    var monitor = JobStorage.Current.GetMonitoringApi();
    return monitor.Servers().Count(n => (n.Heartbeat != null)
        && (DateTime.Now - n.Heartbeat.Value).TotalMinutes < 5);
}
(BTW: I exclude older heartbeats because, if I turn servers off, they sometimes linger in the Hangfire database. Is there a better way?) But to give a proper estimate I need to know the position of the job in the queue. How do I get that?
The problem you have is that Hangfire is asynchronous, queued, parallel, exhibits at-least-once delivery semantics, and is basically non-deterministic.
To know with certainty the order in which an item will finish being processed in such a system is impossible. In fact, if the requirement was to enforce strict ordering, then many of the benefits of hangfire would go away.
There is a very good blog post by @odinserj (the author of Hangfire) where he outlines this point: http://odinserj.net/2014/05/10/are-your-methods-ready-to-run-in-background/
That said, it's not impossible to come up with a sensible estimation algorithm, but it would have to approximate the order of execution in some way. I don't know exactly how to arrive at such an algorithm, but something like this might work (though it probably won't):
Approximate seconds remaining until completion =
    (average job duration in seconds * queue depth)
        / min(number of Hangfire worker threads, queue depth)
    - number of seconds the job has already spent in the queue
    + average job duration in seconds
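As a rough translation of the estimate into code (the inputs are all assumptions you would have to measure yourself or pull from the monitoring API):

```csharp
using System;

static class HangfireEta
{
    // Direct translation of the estimation formula above. 'queueDepth' is the
    // job's position in the queue; the min() guards against dividing by more
    // workers than there are queued jobs.
    public static double EstimateSecondsRemaining(
        double avgJobSeconds, int queueDepth, int workerThreads, double secondsAlreadyQueued)
    {
        var effectiveWorkers = Math.Min(workerThreads, queueDepth);
        return (avgJobSeconds * queueDepth) / effectiveWorkers
               - secondsAlreadyQueued
               + avgJobSeconds;
    }
}
```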

Hangfire recurring tasks under minute

Is there a way to set Hangfire recurring jobs to run every few seconds?
I am not looking for a solution where a fire-and-forget task creates another fire-and-forget task. If that's not possible, what are the suggested alternatives?
Not sure when this became supported, but I tried it in ASP.NET Core 2.0 with Hangfire 1.7.0. The following code schedules a job every 20 seconds:
RecurringJob.AddOrUpdate<SomeJob>(
    x => x.DoWork(),
    "*/20 * * * * *");
If I am not mistaken, 6 tokens (as opposed to the standard 5) are supported because Hangfire uses NCrontab, which allows cron expressions with 6 tokens (second granularity instead of minute granularity).
The Hangfire dashboard also nicely shows the small time interval between runs.
I think anyone who is against allowing a recurring trigger of less than 1 minute is short-sighted. After all, is 55 seconds any less efficient than 1 minute? It seems so arbitrary! As much as I love Hangfire, I've encountered situations where I've had to steer a client to Quartz.NET simply because there was a business requirement for a job to run every 55 seconds or so.
Anyone who makes the counter-argument that running every 1 second would seriously impact performance is again taking a closed view of things. Of course a trigger with a 1-second interval is probably not a good idea, but do we disallow 55 or 45 seconds for the unlikely situation where someone chooses 1 second?
In any case, performance is both subjective and dependent on the host platform and hardware. It really isn't up to the API to enforce opinion when it comes to performance. Just make the polling interval and trigger recurrence configurable; that way the user can determine the best result for themselves.
Although a background process orchestrating a job to run every 55 seconds may be an option, it isn't very satisfactory: the process isn't visible via the Hangfire UI, so it's hidden from the administrator. I feel this approach circumvents one of the major benefits of Hangfire.
If Hangfire were a serious competitor to the likes of Quartz.NET, it would at least match that basic functionality. If Quartz can support triggers with an interval below 1 minute, then why can't Hangfire?
Although Hangfire doesn't allow you to schedule tasks for less than a minute, you can achieve this by having the function recursively schedule itself; i.e., say you want some method to be hit every 2 seconds, you can schedule a background job that calls the method on startup:
BackgroundJob.Schedule(() => PublishMessage(), TimeSpan.FromMilliseconds(2000));
And then in your PublishMessage method, do your stuff and then schedule a job to call the same method:
public void PublishMessage()
{
    /* do your stuff */
    // then schedule a job to exec the same method
    BackgroundJob.Schedule(() => PublishMessage(), TimeSpan.FromMilliseconds(2000));
}
The other thing you need to override is the default SchedulePollingInterval of 15 s; otherwise your method will only be hit every 15 s. To do so, just pass an instance of BackgroundJobServerOptions to UseHangfireServer in your startup, like so:
var options = new BackgroundJobServerOptions
{
    SchedulePollingInterval = TimeSpan.FromMilliseconds(2000)
};
app.UseHangfireServer(options);
I don't know how "foolproof" my solution is, but I managed to achieve my goal with it and everything is "happy" in production.
I had to do the same, but with 5 seconds. The default schedule polling interval is 15 s, so it takes 2 steps to achieve a 5-second job interval.
In Startup.cs:
var options = new BackgroundJobServerOptions
{
    SchedulePollingInterval = TimeSpan.FromMilliseconds(5000)
};
app.UseHangfireDashboard();
app.UseHangfireServer(options);
Your job
RecurringJob.AddOrUpdate(() => YourJob(), "*/5 * * * * *");
Hangfire doesn't support intervals of less than a minute for recurring jobs.
Why? Imagine if they allowed intervals of less than a minute - say, 1 second. How frequently would Hangfire have to check for recurring jobs in the database? This would cause a lot of database I/O.
See this discussion on Hangfire for more information.
I faced the same problem, and here is my solution:
private void TimingExecuteWrapper(Action action, int sleepSeconds, int intervalSeconds)
{
    DateTime beginTime = DateTime.UtcNow, endTime;
    var interval = TimeSpan.FromSeconds(intervalSeconds);
    while (true)
    {
        action();
        Thread.Sleep(TimeSpan.FromSeconds(sleepSeconds));
        endTime = DateTime.UtcNow;
        if (endTime - beginTime >= interval)
            break;
    }
}
intervalSeconds is the minimal NCRON interval, i.e. 1 minute.
action is our job code.
I also suggest using DisableConcurrentExecution to avoid concurrency collisions.
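Wiring the wrapper up might look like this (the job class, job id, and DoWork method are illustrative assumptions, not from the original answer):

```csharp
public class MyJobs
{
    // Prevent overlapping runs while the wrapper loops for a full minute.
    [DisableConcurrentExecution(timeoutInSeconds: 60)]
    public void EveryFiveSeconds()
    {
        // Re-runs DoWork() every 5 seconds until the 1-minute window elapses,
        // so a standard 1-minute recurring job effectively fires every 5 seconds.
        TimingExecuteWrapper(DoWork, sleepSeconds: 5, intervalSeconds: 60);
    }

    public void DoWork() { /* job code */ }
}

// Registered once at startup, at the minimal NCRON interval of one minute.
RecurringJob.AddOrUpdate<MyJobs>("my-job", x => x.EveryFiveSeconds(), Cron.Minutely());
```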
I had a similar requirement: a recurring job that needed to run every 15 seconds.
What I did to get around this limitation was to delay the creation of the scheduled jobs (each set to a 1-minute interval), which seemed to do the trick.
However, I found that, taking into account the polling interval (I set SchedulePollingInterval to my frequency) and the delays in picking up new jobs, this isn't always as accurate as it should be, though it is doing the trick for the moment. A better/proper solution would be good.
I feel a bit dirty having to resort to the approach below, but it helped me out...
So in essence I created 4 jobs doing the same thing, created 15 seconds apart,
along the lines of:
...
new Thread(() =>
{
    // loop from {id=1 through 4}
    //   create a job here with {id} in the name, at an interval of 1 minute
    //   sleep 15 seconds
    // end loop
}).Start();
...

How many async calls can I make in parallel?

In my application
int numberOfTimes = 1; //Or 100, or 100000
//Incorrect, please see update.
var tasks = Enumerable.Repeat(
    (new HttpClient()).GetStringAsync("http://www.someurl.com"),
    numberOfTimes);
var resultArray = await Task.WhenAll(tasks);
With numberOfTimes == 1, it takes 5 seconds.
With numberOfTimes == 100000, it still takes 5 seconds.
That's amazing.
But does that mean I can run an unlimited number of calls in parallel? There has to be some limit at which the calls start to queue up.
What is that limit? Where is it set? What does it depend on?
In other words, how many I/O completion ports are there? Who is competing for them? Does IIS get its own set of I/O completion ports?
-- This is in an ASP.NET MVC action, .NET 4.5.2, IIS
Update: thanks to @Enigmativity, the following is more relevant to the question:
var tasks = Enumerable.Range(1, numberOfTimes).Select(i =>
    (new HttpClient()).GetStringAsync("http://deletewhenever.com/api/default"));
var resultArray = await Task.WhenAll(tasks);
With numberOfTimes == 1, it takes 5 seconds.
With numberOfTimes == 100, it still takes 5 seconds.
I am seeing more believable numbers for higher counts now, though. The question remains: what governs the number?
What is that limit? Where is that set?
There's no explicit limit. However, you will eventually run out of resources. Mark Russinovich has an interesting blog series on probing the limits of common resources.
Asynchronous operations generally increase memory usage in exchange for responsiveness. So, each naturally-async op uses at least memory for its Task, an OVERLAPPED struct, and an IRP for the driver (each of these represents an in-progress asynchronous operation at different levels). At the lower levels, there are lots and lots of different limitations that can come into play to affect system resources (for an example, I have an old blog post where I had to calculate the maximum size of an I/O buffer - something you would think is simple but is really not).
Socket operations require a client port, which are (in theory) limited to 64k connections to the same remote IP. Sockets also have their own more significant memory overhead, with both input and output buffers at the device level and in user space.
The IOCP doesn't come into play until the operations complete. On .NET, there's only one IOCP for your AppDomain. The default maximum number of I/O threads servicing this IOCP is 1000 on the modern (4.5) .NET framework. Note that this is a limit on how many operations may complete at a time, not how many may be in progress at a time.
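If you would rather govern the number yourself than lean on these implicit limits, a common pattern is a SemaphoreSlim throttle. In this sketch the HTTP call is replaced by a simulated delay (a stand-in for HttpClient.GetStringAsync) so the concurrency cap is observable without a network:

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class Throttled
{
    static int inFlight;
    public static int Peak;   // highest observed number of concurrent calls

    // Stand-in for an HTTP call; records peak concurrency while "in flight".
    static async Task<int> FakeCallAsync(int i)
    {
        int now = Interlocked.Increment(ref inFlight);
        int seen;
        while (now > (seen = Volatile.Read(ref Peak)))
            Interlocked.CompareExchange(ref Peak, now, seen);
        await Task.Delay(25);                       // simulated I/O latency
        Interlocked.Decrement(ref inFlight);
        return i;
    }

    public static async Task<int[]> RunAsync(int total, int maxInFlight)
    {
        var gate = new SemaphoreSlim(maxInFlight);  // hard cap on concurrency
        var tasks = Enumerable.Range(1, total).Select(async i =>
        {
            await gate.WaitAsync();
            try { return await FakeCallAsync(i); }
            finally { gate.Release(); }
        });
        return await Task.WhenAll(tasks);
    }

    public static void Main()
    {
        var results = RunAsync(200, 20).GetAwaiter().GetResult();
        Console.WriteLine($"{results.Length} done, peak concurrency {Peak}");
    }
}
```

All 200 operations start immediately, but at most 20 are ever past the semaphore at once; the rest wait asynchronously without tying up threads.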
Here's a test to see what's going on.
Start with this code:
var i = 0;
Func<int> generate = () =>
{
    Thread.Sleep(1000);
    return i++;
};
Now call this:
Enumerable.Repeat(generate(), 5)
After one second you get { 0, 0, 0, 0, 0 }.
But make this call:
Enumerable.Range(0, 5).Select(n => generate())
After five seconds you get { 0, 1, 2, 3, 4 }.
It's only calling the async function once in your code.

Can parallelism in .NET take over the CPU and deny service to other processes?

I am trying to understand how parallelism is implemented in .NET. The following code is taken as an example from Reed Copsey's blog.
This code loops over the customers collection and emails those who haven't been contacted in the last 14 days. My question: if the customer table is very BIG and sending an email takes a few seconds, won't this code take over the CPU and effectively deny service to other important processes?
Is there a way to run the following lines of code in parallel but using only a few cores, so other processes can share the CPU? Or am I approaching the problem in the wrong way?
Parallel.ForEach(customers, (customer, parallelLoopState) =>
{
    // database operation
    DateTime lastContact = theStore.GetLastContact(customer);
    TimeSpan timeSinceContact = DateTime.Now - lastContact;

    // If it's been more than two weeks, send an email, and update...
    if (timeSinceContact.Days > 14)
    {
        // Exit gracefully if we fail to email, since this
        // entire process can be repeated later without issue.
        if (theStore.EmailCustomer(customer) == false)
            parallelLoopState.Break();
        else
            customer.LastEmailContact = DateTime.Now;
    }
});
Accepted answer:
The thought process was right! As Cole Campbell pointed out, one can control and configure how many cores should be used by specifying a ParallelOptions object, as in this specific example. Here is how:
var parallelOptions = new ParallelOptions();
parallelOptions.MaxDegreeOfParallelism =
    Math.Max(Environment.ProcessorCount / 2, 1);
And Parallel.ForEach will be used as follows:
Parallel.ForEach(customers, parallelOptions,
    (customer, parallelLoopState) =>
    {
        // do all the same stuff
    });
The same concept can be applied to PLINQ using .WithDegreeOfParallelism(int degreeOfParallelism).
For more information on how to configure ParallelOptions, read this.
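A minimal PLINQ sketch of the same throttling idea (summing squares stands in for the real per-customer work):

```csharp
using System;
using System.Linq;

class PlinqThrottle
{
    static void Main()
    {
        // Use at most half the machine's cores, mirroring the
        // ParallelOptions configuration above.
        int degree = Math.Max(Environment.ProcessorCount / 2, 1);

        long total = Enumerable.Range(1, 100)
            .AsParallel()
            .WithDegreeOfParallelism(degree)
            .Select(n => (long)n * n)      // placeholder for per-customer work
            .Sum();

        Console.WriteLine(total);          // 338350
    }
}
```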
The Task Parallel Library is designed to take the system workload into account when scheduling tasks, so this shouldn't be an issue. However, if you really need to, you can use the MaxDegreeOfParallelism property on the ParallelOptions class, which can be passed into one of the overloads of ForEach(), to restrict the number of concurrent operations it performs.
