Basically, I'm trying to rate-limit the iteration of a list.
I really like the idea of using Rx, since I can build on top of it and end up with a more elegant solution, but it wouldn't have to be done using Rx.
I've formulated this with the help of people much smarter than I am. My problem is that I'd like to be able to say someCollection.RateLimitedForEach(rate, function) and have it ultimately block until we're done processing... or have it be an async method.
The demo below the function works in a console app, but the call returns immediately, so if the app exits right after the foreach, nothing gets processed.
I'm just at a loss as to whether this is fixable, or if I should go about it completely differently.
public static void RateLimitedForEach<T>(this List<T> list, double minimumDelay, Action<T> action)
{
    list.ToObservable()
        .Zip(Observable.Interval(TimeSpan.FromSeconds(minimumDelay)), (v, _) => v)
        .Do(action)
        .Subscribe();
}
// Rate-limits the iteration of the list. Keep in mind this is not the same thing as just sleeping for a second
// between iterations: at the start of the next iteration, if the minimum delay hasn't passed yet, hold until it has.
var maxRequestsPerMinute = 60;
requests.RateLimitedForEach(60.0 / maxRequestsPerMinute, request => SendRequest(request));
"but it wouldn't have to be done using Rx"
Here is how you can do it synchronously:
public static void RateLimitedForEach<T>(
    this List<T> list,
    double minimumDelay,
    Action<T> action)
{
    foreach (var item in list)
    {
        Stopwatch sw = Stopwatch.StartNew();
        action(item);
        double left = minimumDelay - sw.Elapsed.TotalSeconds;
        if (left > 0)
            Thread.Sleep(TimeSpan.FromSeconds(left));
    }
}
And here is how you can do it asynchronously (only potential waits are asynchronous):
public static async Task RateLimitedForEachAsync<T>(
    this List<T> list,
    double minimumDelay,
    Action<T> action)
{
    foreach (var item in list)
    {
        Stopwatch sw = Stopwatch.StartNew();
        action(item);
        double left = minimumDelay - sw.Elapsed.TotalSeconds;
        if (left > 0)
            await Task.Delay(TimeSpan.FromSeconds(left));
    }
}
Please note that you can change the asynchronous version to make the action itself asynchronous, like this:
public static async Task RateLimitedForEachAsync<T>(
    this List<T> list,
    double minimumDelay,
    Func<T, Task> asyncAction)
{
    foreach (var item in list)
    {
        Stopwatch sw = Stopwatch.StartNew();
        await asyncAction(item);
        double left = minimumDelay - sw.Elapsed.TotalSeconds;
        if (left > 0)
            await Task.Delay(TimeSpan.FromSeconds(left));
    }
}
This is helpful if the action you need to run on each item is asynchronous.
The last version can be used like this:
List<string> list = new List<string>();
list.Add("1");
list.Add("2");
var task = list.RateLimitedForEachAsync(1.0, async str =>
{
    // Do something asynchronous here, e.g.:
    await Task.Delay(500);
    Console.WriteLine(DateTime.Now + ": " + str);
});
Now you should wait for task to finish. If this is the Main method, then you need to synchronously wait like this:
task.Wait();
On the other hand, if you are inside an asynchronous method, then you need to asynchronously wait like this:
await task;
Your code was just about perfect.
Try this instead:
public static void RateLimitedForEach<T>(this List<T> list, double minimumDelay, Action<T> action)
{
    list
        .ToObservable()
        .Zip(Observable.Interval(TimeSpan.FromSeconds(minimumDelay)), (v, _) => v)
        .Do(action)
        .ToArray()
        .Wait();
}
The concept you need to get across is that the main thread is not waiting for your RateLimitedForEach call to complete. Also, in a console app, as soon as the main thread ends, the process ends.
What does that mean? It means that the process will end regardless of whether or not the observer in RateLimitedForEach has finished executing.
Note: The user may still force your app to exit, and that is a good thing. You may use a WinForms app if you want to be able to wait without hanging the UI; you may use a service if you don't want the user closing windows related to the process.
Using Task is a superior solution to what I present below.
Notice that when using Tasks in a console app, you still need to wait on the task to prevent the main thread from finishing before RateLimitedForEach has completed its job. Moving away from a console app is still advised.
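For instance, a minimal sketch (reusing the RateLimitedForEachAsync extension from the other answer, and assuming a requests list and a SendRequest method exist):
// Somewhere in Main (or any other synchronous method):
var task = requests.RateLimitedForEachAsync(1.0, request => SendRequest(request));
task.Wait(); // block until the whole rate-limited loop has finished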
If you insist on continuing to use your code, you can tweak it so that it blocks the calling thread until completion:
public static void RateLimitedForEach<T>
(
    this List<T> list,
    double minimumDelay,
    Action<T> action
)
{
    using (var waitHandle = new ManualResetEventSlim(false))
    {
        var mainObservable = list.ToObservable();
        var intervalObservable = Observable.Interval(TimeSpan.FromSeconds(minimumDelay));
        var zipObservable = mainObservable.Zip(intervalObservable, (v, _) => v);
        zipObservable.Subscribe
        (
            action,
            error => GC.KeepAlive(error), // Ignoring errors, as you already were
            () => waitHandle.Set()        // <-- "Done" signal
        );
        waitHandle.Wait(); // <-- Wait for the observer to complete
    }
}
Does Rx's Throttle not do what you want?
https://msdn.microsoft.com/en-us/library/hh229400(v=vs.103).aspx
I'm currently working on a concurrent file downloader.
For that reason I want to parameterize the number of concurrent tasks. I don't want to wait for all the tasks to complete; instead I want to keep the same number running at all times.
In fact, this thread on Stack Overflow gave me a proper clue, but I'm struggling to make it async:
Keep running a specific number of tasks
Here is my code:
public async Task StartAsync()
{
    var semaphore = new SemaphoreSlim(1, _concurrentTransfers);
    var queueHasMessages = true;
    while (queueHasMessages)
    {
        try
        {
            await Task.Run(async () =>
            {
                await semaphore.WaitAsync();
                await asyncStuff();
            });
        }
        finally
        {
            semaphore.Release();
        }
    }
}
But the tasks just get executed one at a time. I think the await is preventing me from generating the desired number of tasks, but I don't know how to avoid that while still respecting the limit established by the semaphore.
If I add all the tasks to a list and do a WhenAll, the semaphore throws an exception because it has reached its max count.
Any suggestions?
It was brought to my attention that the original solution (the one below, under "Previous Answer") will drop any exceptions that occur during execution. That's bad.
Here is a solution that will not drop exceptions:
Task.Run is a factory method for creating a Task; you can verify the return type with IntelliSense. You can store the returned Task anywhere you like.
"await" is an operator that waits until the task it operates on completes. You can use the await operator on any Task.
public static async Task RunTasksConcurrently()
{
    IList<Task> tasks = new List<Task>();
    for (int i = 1; i < 4; i++)
    {
        tasks.Add(RunNextTask());
    }
    foreach (var task in tasks)
    {
        await task;
    }
}

public static async Task RunNextTask()
{
    // Stand-in for the real work; this example simply loops forever.
    while (true)
    {
        await Task.Delay(500);
    }
}
By adding the Tasks we create to a list, we can await them later on in execution.
Previous Answer below
Edit: With the clarification I think I understand better.
Instead of running every task at once, you want to start 3 tasks, and as soon as a task is finished, run the next one.
I believe this can be done using the .ContinueWith(Action<Task>) method.
See if this gets closer to your intended solution.
public void SpawnInitialTasks()
{
    for (int i = 0; i < 3; i++)
    {
        RunNextTask();
    }
}

public void RunNextTask()
{
    Task.Run(async () => await Task.Delay(500))
        .ContinueWith(t => RunNextTask());
    // Recurse here to keep running tasks whenever we finish one.
}
The idea is that we spawn 3 tasks right away, then whenever one finishes we spawn the next. If you need to keep data flowing between the tasks, you can use parameters:
RunNextTask(DataObject data)
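As a rough sketch of that idea (DataObject and NextData are hypothetical placeholders for whatever state you carry between runs):
public void RunNextTask(DataObject data)
{
    Task.Run(async () =>
    {
        await Task.Delay(500); // stand-in for the real work on "data"
        return NextData(data); // hypothetical helper producing the state for the next run
    })
    .ContinueWith(t => RunNextTask(t.Result));
}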
You can do this easily the old-fashioned way without using await by using Parallel.ForEach(), which lets you specify the maximum number of concurrent threads to use.
For example:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

namespace Demo
{
    class Program
    {
        public static void Main(string[] args)
        {
            IEnumerable<string> filenames = Enumerable.Range(1, 100).Select(x => x.ToString());

            Parallel.ForEach(
                filenames,
                new ParallelOptions { MaxDegreeOfParallelism = 4 },
                download
            );
        }

        static void download(string filepath)
        {
            Console.WriteLine("Downloading " + filepath);
            Thread.Sleep(1000); // Simulate downloading time.
            Console.WriteLine("Downloaded " + filepath);
        }
    }
}
If you run this and observe the output, you'll see that the "files" are being "downloaded" in batches.
A better simulation is to change download() so that it takes a random amount of time to process each "file", like so:
static Random rng = new Random();

static void download(string filepath)
{
    Console.WriteLine("Downloading " + filepath);
    Thread.Sleep(500 + rng.Next(1000)); // Simulate random downloading time.
    Console.WriteLine("Downloaded " + filepath);
}
Try that and see the difference in the output.
However, if you want a more modern way to do this, you could look into the Dataflow part of the TPL (Task Parallel Library) - this works well with async methods.
This is a lot more complicated to get to grips with, but it's a lot more powerful. You could use an ActionBlock to do it, but describing how to do that is a bit beyond the scope of an answer I could give here.
Have a look at this other answer on StackOverflow; it gives a brief example.
Also note that TPL Dataflow is not built into .NET - you have to get it from NuGet (the System.Threading.Tasks.Dataflow package).
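For what it's worth, a minimal ActionBlock sketch might look something like this (assuming the System.Threading.Tasks.Dataflow package is installed, and reusing the filenames and download() from the example above):
// using System.Threading.Tasks.Dataflow;
var block = new ActionBlock<string>(
    filepath => download(filepath),
    new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 4 });

foreach (var filename in filenames)
    block.Post(filename);

block.Complete();        // no more items will be posted
block.Completion.Wait(); // wait until every posted item has been processed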
I have a C# WinForms (.NET 4.5.2) app utilizing the TPL. The tool has a synchronous function which is passed over to a task factory X amount of times (with different input parameters), where X is a number declared by the user before commencing the process. The tasks are started and stored in a List<Task>.
Assuming the user entered 5, we have this in an async button click handler:
for (int i = 0; i < X; i++)
{
    var progress = Progress(); // returns a new IProgress<T>
    var task = Task<int>.Factory.StartNew(() => MyFunction(progress), TaskCreationOptions.LongRunning);
    TaskList.Add(task);
}
Each progress instance updates the UI.
Now, as soon as a task is finished, I want to fire up a new one. Essentially, the process should run indefinitely, having X tasks running at any given time, unless the user cancels via the UI (I'll use cancellation tokens for this). I try to achieve this using the following:
while (TaskList.Count > 0)
{
    var completed = await Task.WhenAny(TaskList.ToArray());
    if (completed.Exception == null)
    {
        // report success
    }
    else
    {
        // flatten AggregateException, print out, etc.
    }

    // update some labels/textboxes in the UI, and then:
    TaskList.Remove(completed);
    var task = Task<int>.Factory.StartNew(() => MyFunction(progress), TaskCreationOptions.LongRunning);
    TaskList.Add(task);
}
This is bogging down the UI. Is there a better way of achieving this functionality, while keeping the UI responsive?
A suggestion was made in the comments to use TPL Dataflow, but due to time constraints and specs, alternative solutions are welcome.
Update
I'm not sure whether the progress reporting might be the problem? Here's what it looks like:
private IProgress<string> Progress()
{
    return new Progress<string>(msg =>
    {
        txtMsg.AppendText(msg);
    });
}
Now, as soon as a task is finished, I want to fire up a new one. Essentially, the process should run indefinitely, having X tasks running at any given time
It sounds to me like you want an infinite loop inside your task:
for (int i = 0; i < X; i++)
{
    var progress = Progress(); // returns a new IProgress<T>
    var task = RunIndefinitelyAsync(progress);
    TaskList.Add(task);
}

private async Task RunIndefinitelyAsync(IProgress<string> progress)
{
    while (true)
    {
        try
        {
            await Task.Run(() => MyFunction(progress));
            // handle success
        }
        catch (Exception ex)
        {
            // handle exceptions
        }
        // update some labels/textboxes in the UI
    }
}
However, I suspect that the "bogging down the UI" is probably in the // handle success and/or // handle exceptions code. If my suspicion is correct, then push as much of the logic into the Task.Run as possible.
As I understand it, you simply need parallel execution with a defined degree of parallelism. There are a lot of ways to implement what you want. I suggest using a blocking collection and the Parallel class instead of tasks.
So when the user clicks the button, you need to create a new blocking collection, which will be your data source:
BlockingCollection<IProgress<string>> queue = new BlockingCollection<IProgress<string>>();
CancellationTokenSource source = new CancellationTokenSource();
Now you need a runner that will execute your function in parallel:
Task.Factory.StartNew(() =>
    Parallel.For(0, X, i =>
    {
        foreach (IProgress<string> p in queue.GetConsumingEnumerable(source.Token))
        {
            MyFunction(p);
        }
    }), source.Token);
Or you can choose the more correct way, with a partitioner. You'll need a partitioner class:
private class BlockingPartitioner<T> : Partitioner<T>
{
    private readonly BlockingCollection<T> _Collection;
    private readonly CancellationToken _Token;

    public BlockingPartitioner(BlockingCollection<T> collection, CancellationToken token)
    {
        _Collection = collection;
        _Token = token;
    }

    public override IList<IEnumerator<T>> GetPartitions(int partitionCount)
    {
        throw new NotImplementedException();
    }

    public override IEnumerable<T> GetDynamicPartitions()
    {
        return _Collection.GetConsumingEnumerable(_Token);
    }

    public override bool SupportsDynamicPartitions
    {
        get { return true; }
    }
}
And the runner will look like this:
ParallelOptions options = new ParallelOptions();
options.MaxDegreeOfParallelism = X;

Task.Factory.StartNew(
    () => Parallel.ForEach(
        new BlockingPartitioner<IProgress<string>>(queue, source.Token),
        options,
        p => MyFunction(p)));
So all you need now is to fill the queue with the necessary data; you can do that whenever you want.
And the final touch: when the user cancels the operation, you have two options:
first, you can break execution with a source.Cancel() call,
or you can stop execution gracefully by marking the collection complete (queue.CompleteAdding()); in that case the runner will process all the already-queued data and then finish.
Of course you need additional code to handle exceptions, progress, state and so on, but the main idea is here.
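For example, a minimal sketch of the producer side and of the two shutdown options (queue and source are the ones defined above; progressInstances is a hypothetical collection of the IProgress<string> items you want processed):
// Producer: feed work items into the queue whenever they become available.
foreach (var progress in progressInstances)
    queue.Add(progress);

// Option 1: hard cancel - the runner stops as soon as it observes the token.
// source.Cancel();

// Option 2: graceful stop - the runner drains what is already queued, then finishes.
queue.CompleteAdding();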
I have an external reference in my .NET console application that does some language translations on large strings for me.
In a loop, I make a bunch of calls to the service. There are probably 5000-8000 calls total.
The service requires that I implement a callback function so that it can give the translated string back to me when the work is completed. In another class which inherits the TranslationService's interface, I have implemented their callback function:
class MyTranslationServiceCallback : TranslationService.ITranslationServiceCallback
{
    public void TranslateTextCallback(string sourceContent, string responseContent)
    {
        UpdateMyDatabase(responseContent);
    }
}
When debugging, I have added Console.ReadKey() at the very end of my Main() to prevent the app from closing so that it can finish getting all of the callbacks. So far, I have just assumed that when it stops entering the callback function for a minute or so, it is "complete" (I know, this is bad).
So it looks like:
class Program
{
    static void Main(string[] args)
    {
        foreach (var item in itemList)
        {
            TranslationService.TranslateText(item.EnglishText, "french");
        }
        Console.ReadKey();
    }
}
What is the proper way to determine whether or not all the callbacks have been completed?
Since the translation service does not have any way of telling you the status of the translations, you will need to keep track of the calls made and the callbacks received. Create a singleton with a counter, increment it with each call, and decrement it in each callback.
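A minimal sketch of that idea, reusing the callback class from the question and assuming the total number of calls is known before the loop starts (a CountdownEvent is a convenient way to wait on such a counter):
class MyTranslationServiceCallback : TranslationService.ITranslationServiceCallback
{
    // Shared counter of outstanding callbacks; set from Main() before the loop starts.
    public static CountdownEvent PendingTranslations;

    public void TranslateTextCallback(string sourceContent, string responseContent)
    {
        UpdateMyDatabase(responseContent);
        PendingTranslations.Signal(); // one fewer outstanding callback
    }
}

// In Main(), before the loop:
MyTranslationServiceCallback.PendingTranslations = new CountdownEvent(itemList.Count);

// At the end of Main(), instead of Console.ReadKey():
MyTranslationServiceCallback.PendingTranslations.Wait(); // blocks until every callback has fired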
Why not use the async framework built into .NET? All you need to do is fire off tasks and keep track of them in a list, then await Task.WhenAll so the program doesn't continue until all the Tasks are complete.
Note: I'm using the Nito.AsyncEx NuGet Package, in order to run async code from Console apps.
class Program
{
    static int Main(string[] args)
    {
        return AsyncContext.Run(() => MainAsync(args));
    }

    static async Task<int> MainAsync(string[] args)
    {
        var taskList = new List<Task>();
        foreach (var item in itemList)
        {
            taskList.Add(Task.Factory.StartNew(() => TranslationService.TranslateText(item.EnglishText, "french")));
        }
        await Task.WhenAll(taskList.ToArray());
        return 0;
    }
}
If you're implementing this in .NET, then async/await is your friend.
It would be great if, rather than returning the result via callbacks, TranslationService returned a Task<string>.
Then you could implement the following:
static async Task TranslateAllItems(IEnumerable<Item> list)
{
    foreach (var item in list)
    {
        string result = await TranslationService.TranslateText(item.EnglishText, "french");
        UpdateMyDatabase(item.EnglishText, result);
    }
}

static void Main(string[] args)
{
    Task task = TranslateAllItems(itemList);
    task.Wait();
    Console.ReadKey();
}
The above solution would perform each translation in sequence, waiting for one translation task to complete before commencing with the next one.
If it would be faster to start all of the translations, then wait for the entire batch to finish:
static Task TranslateAllItems(IEnumerable<Item> list)
{
    List<Task> waitingTasks = new List<Task>();
    foreach (var item in list)
    {
        string englishText = item.EnglishText;
        var task = TranslationService.TranslateText(englishText, "french")
            .ContinueWith(taskResult => UpdateMyDatabase(englishText, taskResult.Result));
        waitingTasks.Add(task);
    }
    return Task.WhenAll(waitingTasks);
}
I have this situation:
var tasks = new List<ITask> ...
Parallel.ForEach(tasks, currentTask => currentTask.Execute() );
Is it possible to instruct PLINQ to wait 500 ms before the next thread is spawned?
System.Threading.Thread.Sleep(5000);
You are using Parallel.ForEach the wrong way. You should make a special enumerator that rate-limits itself to fetching data once every 500 ms.
I made some assumptions about how your DTO works, since you didn't provide any details.
private IEnumerable<SomeResource> GetRateLimitedResource()
{
    SomeResource someResource = null;
    do
    {
        someResource = _remoteProvider.GetData();
        if (someResource != null)
        {
            yield return someResource;
            Thread.Sleep(500);
        }
    } while (someResource != null);
}
Here is how your Parallel.ForEach call should look:
Parallel.ForEach(GetRateLimitedResource(), SomeFunctionToProcessSomeResource);
There are already some good suggestions. I would agree with others that you are using PLINQ in a manner it wasn't meant to be used.
My suggestion would be to use System.Threading.Timer. This is probably better than writing a method that returns an IEnumerable<> that forces a half second delay, because you may not need to wait the full half second, depending on how much time has passed since your last API call.
The timer will invoke the delegate you've provided at the interval you specify, so even if the first task isn't done, half a second later it will invoke your delegate on another thread; there won't be any extra waiting.
From your example code, it sounds like you have a list of tasks. In this case, I would use a System.Collections.Concurrent.ConcurrentQueue<T> to keep track of the tasks. Once the queue is empty, turn off the timer.
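A rough sketch of that idea, assuming the tasks list of ITask from the question (System.Threading.Timer plus a ConcurrentQueue):
var pending = new ConcurrentQueue<ITask>(tasks);

Timer timer = null;
timer = new Timer(_ =>
{
    if (pending.TryDequeue(out var task))
        task.Execute();                                   // runs on a thread pool thread
    else
        timer.Change(Timeout.Infinite, Timeout.Infinite); // queue drained: stop firing
}, null, Timeout.Infinite, Timeout.Infinite);

timer.Change(0, 500); // first tick immediately, then one tick every 500 ms
Note that ticks keep firing every 500 ms even while a previous Execute() is still running, which is exactly the behaviour described above.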
You could use Enumerable.Aggregate instead.
var task = tasks.Aggregate((t1, t2) =>
    t1.ContinueWith(async _ =>
    {
        Thread.Sleep(500);
        return t2.Result;
    }));
If you don't want the tasks chained, then there is also the indexed overload of Select, assuming the tasks are in delay order.
var tasks = Enumerable
    .Range(1, 10)
    .Select(x => Task.Run(() => x * 2))
    .Select((x, i) => Task.Delay(TimeSpan.FromMilliseconds(i * 500))
        .ContinueWith(_ => x.Result));

foreach (var result in tasks.Select(x => x.Result))
{
    Console.WriteLine(result);
}
From the comments, a better option would be to guard the resource instead of using a time delay.
static object Locker = new object();

static int GetResultFromResource(int arg)
{
    lock (Locker)
    {
        Thread.Sleep(500);
        return arg * 2;
    }
}
var tasks = Enumerable
    .Range(1, 10)
    .Select(x => Task.Run(() => GetResultFromResource(x)));

foreach (var result in tasks.Select(x => x.Result))
{
    Console.WriteLine(result);
}
In this case how about a Producer-Consumer pattern with a BlockingCollection<T>?
var tasks = new BlockingCollection<ITask>();
// add tasks, if this is an expensive process, put it out onto a Task
// tasks.Add(x);
// we're done producin' (allows GetConsumingEnumerable to finish)
tasks.CompleteAdding();
RunTasks(tasks);
With a single consumer thread:
static void RunTasks(BlockingCollection<ITask> tasks)
{
    foreach (var task in tasks.GetConsumingEnumerable())
    {
        task.Execute();

        // this may not be as accurate as you would like
        Thread.Sleep(500);
    }
}
If you have access to .Net 4.5 you can use Task.Delay:
static void RunTasks(BlockingCollection<ITask> tasks)
{
    foreach (var task in tasks.GetConsumingEnumerable())
    {
        Task.Delay(500)
            .ContinueWith(_ => task.Execute())
            .Wait();
    }
}
I'm running this thread inside a method from a WCF service library.
The code below is executed at the end of the method. I do this because I don't want the user to wait for a background process to complete that does not affect the output returned from the WCF service to the client.
The problem I have now is that if I start that thread and the client gets the response, the parent thread is killed, killing this thread as well. How do I make the parent thread wait for this thread to finish while still performing the rest of the operations?
class Program
{
    static void Main(string[] args)
    {
        Dictionary<string, string> sampleDict = getPopulatedDictionary();
        var result = run(sampleDict);
    }

    public static int run(Dictionary<string, string> sampleDict_)
    {
        PerformCalculations(sampleDict_);
        if (sampleDict_.Keys.Count > 10)
        {
            System.Threading.Tasks.Task.Factory.StartNew(() =>
            {
                backgroundprocess(sampleDict_);
            });
        }
        // after returning I still want it to run
        return sampleDict_.Keys.Count;
    }

    private static void backgroundprocess(Dictionary<string, string> dict)
    {
        foreach (var k in dict.Keys)
        {
            dict[k] = new Random().Next(2666).ToString();
        }
    }
}
In short, I want this method to kick off that task and move on to return the value, but still wait for that task to finish AFTER it returns the value.
Couldn't you do it as a continuation of the parent task? So execute
FameMappingEntry.SaveFameDBMap(toSaveIdentifiers); as a continuation of the successful completion of the parent task. Then you can wait on the continuation.
var childTask = parentTask.ContinueWith(pt =>
{
    FameMappingEntry.SaveFameDBMap(toSaveIdentifiers);
}, TaskContinuationOptions.OnlyOnRanToCompletion);
And then you can decide if you want to wait on the child task or use another continuation.
If you aren't going to do anything except wait for the background thread to complete, then you might as well just not create the new background thread in the first place and execute the code in-line.
Try this:
var task = System.Threading.Tasks.Task.Factory.StartNew(() =>
{
    lock (toSaveIdentifiers)
    {
        FameMappingEntry.SaveFameDBMap(toSaveIdentifiers);
    }
});

int x = dosomething();
task.Wait();
return x;
You should also lock the objects in the thread that uses them, not in some other random thread.