SemaphoreSlim Works with foreach, but not for - c#

I'm a bit new to using Semaphores, and I'm a bit confused by this. All I'm doing is grabbing the html code from a page, and writing it to a file, over and over. When I use it in a foreach, it works perfectly. When I change it over to a for loop, I get a the file is used by another process exception, and it references the file that is after the max count. What is the difference between these two? They should be processing the same files.
Works perfectly:
private async Task GetFiles()
{
string strDir = ConfigurationManager.AppSettings["location"];
var maxThreads = 4;
var allTasks = new List<Task>();
SemaphoreSlim throttler = new SemaphoreSlim(initialCount: maxThreads);
foreach(FileToProcess ftpFile in lstFiles)
{
await throttler.WaitAsync();
allTasks.Add(
Task.Run(async () =>
{
try
{
await GetHtmlFromUrlAsync(ftpFile, strDir);
}
finally
{
throttler.Release();
}
}));
}
await Task.WhenAll(allTasks);
}
Gets an error after on the 5th item in the list (max thread + 1). If I adjusted the max thread count, the error would change as well:
private async Task GetFiles()
{
string strDir = ConfigurationManager.AppSettings["location"];
var maxThreads = 4;
var allTasks = new List<Task>();
SemaphoreSlim throttler = new SemaphoreSlim(initialCount: maxThreads);
for (int x = 0; x < lstFiles.Count; x++)
{
await throttler.WaitAsync();
allTasks.Add(
Task.Run(async () =>
{
try
{
await GetHtmlFromUrlAsync(lstFiles[x], strDir);
}
finally
{
throttler.Release();
}
}));
}
await Task.WhenAll(allTasks);
}

Related

Why the Task gets Cancelled

I'd like to spawn some threads and in each thread sequentially make calls to an API and aggregate the results (some sort of stress testing). Here is my attempt:
private async Task DoWork()
{
var allResponses = new List<int>();
for (int i = 0; i < 10; i++)
{
await Task.Run(() =>
{
var responses = Enumerable.Range(0, 50).Select(i => CallApiAndGetStatusCode());
allResponses.AddRange(responses);
});
}
// do more work
}
private int CallApiAndGetStatusCode()
{
try
{
var request = new HttpRequestMessage(httpMethod.Get, "some url");
var responseResult = httpClient.SendAsync(request).Result;
return (int)responseResult.StatusCode;
}
catch (Exception e)
{
logger.LogError(e, "Calling API failed");
}
}
However, this code always ends up the catch with the inner exception being {"A task was canceled."}. What am I doing wrong here?
There is no benefit to using either Enumerable.Range or .AddRange in your example, since you do not need the seeded number. Your code must be converted to async/await to avoid deadlocks and in doing so, you can simply loop inside of each task and avoid any odd interactions between Enumerable.Select and await:
private async Task DoWork()
{
var allTasks = new List<Task>(10);
var allResponses = new List<int>();
for (int i = 0; i < 10; i++)
{
allTasks.Add(Task.Run(async () =>
{
var tempResults = new List<int>();
for (int i = 0; i < 50; i++)
{
var result = await CallApiAndGetStatusCode();
if (result > 0) tempResults.Add(result);
}
if (tempResults.Count > 0)
{
lock (allResponses)
{
allResponses.AddRange(tempResults);
}
}
}));
}
await Task.WhenAll(allTasks);
// do more work
}
private async Task<int> CallApiAndGetStatusCode()
{
try
{
var request = new HttpRequestMessage(HttpMethod.Get, "some url");
var responseResult = await httpClient.SendAsync(request);
return (int)responseResult.StatusCode;
}
catch (Exception e)
{
logger.LogError(e, "Calling API failed");
}
return -1;
}
Note that this code is overly protective, locking the overall batch before adding the temp results.
I changed your code to this and work
async Task DoWork()
{
var allResponses = new List<int>();
for (int i = 0; i < 10; i++)
{
await Task.Run(() =>
{
var responses = Enumerable.Range(0, 3).Select(i => CallApiAndGetStatusCodeAsync());
allResponses.AddRange(responses.Select(x => x.Result));
});
}
// do more work
}
async Task<int> CallApiAndGetStatusCodeAsync()
{
try
{
var request = new HttpRequestMessage(HttpMethod.Get, "http://www.google.com");
var responseResult = await httpClient.SendAsync(request);
return (int)responseResult.StatusCode;
}
catch (Exception e)
{
logger.LogError(e, "Calling API failed");
return -1;
}
}

async methods in a foreach and add results in a list

I would like to run several methods asyncron in a foreach. The return value should be written to a list.
The method is executed in a WPF application. The method GetItemPricesFromJsonAsync fetches from the web data.
public async Task LoadBlackMarketListView(List<MarketAnalysisManager.ItemTier> tiers, List<MarketAnalysisManager.ItemLevel> levels,
List<MarketAnalysisManager.ItemQuality> quialityList, string outdatedHours, string profit, Location? location)
{
await Task.Run(async () =>
{
var blackMarketSellObjectList = new List<BlackMarketSellObject>();
var items = await MarketAnalysisManager.GetItemListAsync(tiers, levels);
await Dispatcher.InvokeAsync(() =>
{
PbBlackMarketMode.Minimum = 0;
PbBlackMarketMode.Maximum = items.Count;
PbBlackMarketMode.Value = 0;
GridBlackMarketMode.IsEnabled = false;
LvBlackMarket.Visibility = Visibility.Hidden;
PbBlackMarketMode.Visibility = Visibility.Visible;
});
foreach (var item in items)
{
var allItemPrices = await MarketAnalysisManager.GetItemPricesFromJsonAsync(item.UniqueName, true);
if (allItemPrices.FindAll(a => a.City == Locations.GetName(Location.BlackMarket)).Count <= 0)
{
await IncreaseBlackMarketProgressBar();
continue;
}
blackMarketSellObjectList.AddRange(await GetBlackMarketSellObjectList(item, quialityList, allItemPrices, outdatedHours, profit, location));
await IncreaseBlackMarketProgressBar();
}
await Dispatcher.InvokeAsync(() =>
{
LvBlackMarket.ItemsSource = blackMarketSellObjectList;
PbBlackMarketMode.Visibility = Visibility.Hidden;
LvBlackMarket.Visibility = Visibility.Visible;
GridBlackMarketMode.IsEnabled = true;
});
});
}
Currently it looks like he's only doing one thing at a time.
Run... 0
End... 0
Run... 1
End... 1
Run... 2
End... 2
You will need to store the Tasks, not await them. Then you can wait for all of them.
Try this (replace your foreach with my code).
I would also advise you to use a real method instead of the annonymous one, it's much more readable.
List<Task> tasks = new List<Task>();
foreach (var item in items)
{
tasks.Add(Task.Run(async () =>
{
var allItemPrices = await MarketAnalysisManager.GetItemPricesFromJsonAsync(item.UniqueName, true);
if (allItemPrices.FindAll(a => a.City == Locations.GetName(Location.BlackMarket)).Count <= 0)
{
await IncreaseBlackMarketProgressBar();
return;
}
blackMarketSellObjectList.AddRange(await GetBlackMarketSellObjectList(item, quialityList, allItemPrices, outdatedHours, profit, location));
await IncreaseBlackMarketProgressBar();
}));
}
await Task.WhenAll(tasks);
Note: There is now a return instead of a continue since this is an annonymous function and you just have to end the function there instead of continuing with the foreach.

Task.Whenall(List<Task>) outputs OutOfRangeException

I have two loops that use SemaphoreSlim and a array of strings "Contents"
a foreachloop:
var allTasks = new List<Task>();
var throttle = new SemaphoreSlim(10,10);
foreach (string s in Contents)
{
await throttle.WaitAsync();
allTasks.Add(
Task.Run(async () =>
{
try
{
rootResponse.Add(await POSTAsync(s, siteurl, src, target));
}
finally
{
throttle.Release();
}
}));
}
await Task.WhenAll(allTasks);
a for loop:
var allTasks = new List<Task>();
var throttle = new SemaphoreSlim(10,10);
for(int s=0;s<Contents.Count;s++)
{
await throttle.WaitAsync();
allTasks.Add(
Task.Run(async () =>
{
try
{
rootResponse[s] = await POSTAsync(Contents[s], siteurl, src, target);
}
finally
{
throttle.Release();
}
}));
}
await Task.WhenAll(allTasks);
the first foreach loop runs well, but the for loops Task.WhenAll(allTasks) returns a OutOfRangeException and I want the Contents[] index and List index to match.
Can I fix the for loop? or is there a better approach?
This would fix your current problems
for (int s = 0; s < Contents.Count; s++)
{
var content = Contents[s];
allTasks.Add(
Task.Run(async () =>
{
await throttle.WaitAsync();
try
{
rootResponse[s] = await POSTAsync(content, siteurl, src, target);
}
finally
{
throttle.Release();
}
}));
}
await Task.WhenAll(allTasks);
However this is a fairly messy and nasty piece of code. This looks a bit neater
public static async Task DoStuffAsync(Content[] contents, string siteurl, string src, string target)
{
var throttle = new SemaphoreSlim(10, 10);
// local method
async Task<(Content, SomeResponse)> PostAsyncWrapper(Content content)
{
await throttle.WaitAsync();
try
{
// return a content and result pair
return (content, await PostAsync(content, siteurl, src, target));
}
finally
{
throttle.Release();
}
}
var results = await Task.WhenAll(contents.Select(PostAsyncWrapper));
// do stuff with your results pairs here
}
There are many other ways you could do this, PLinq, Parallel.For,Parallel.ForEach, Or just tidying up your captures in your loops like above.
However since you have an IO bound work load, and you have async methods that run it. The most appropriate solution is the async await pattern which neither Parallel.For,Parallel.ForEach cater for optimally.
Another way is TPL DataFlow library which can be found in the System.Threading.Tasks.Dataflow nuget package.
Code
public static async Task DoStuffAsync(Content[] contents, string siteurl, string src, string target)
{
async Task<(Content, SomeResponse)> PostAsyncWrapper(Content content)
{
return (content, await PostAsync(content, siteurl, src, target));
}
var bufferblock = new BufferBlock<(Content, SomeResponse)>();
var actionBlock = new TransformBlock<Content, (Content, SomeResponse)>(
content => PostAsyncWrapper(content),
new ExecutionDataflowBlockOptions
{
EnsureOrdered = false,
MaxDegreeOfParallelism = 100,
SingleProducerConstrained = true
});
actionBlock.LinkTo(bufferblock);
foreach (var content in contents)
actionBlock.Post(content);
actionBlock.Complete();
await actionBlock.Completion;
if (bufferblock.TryReceiveAll(out var result))
{
// do stuff with your results pairs here
}
}
Basically this creates a BufferBlock And TransformBlock, You pump your work load into the TransformBlock, it has degrees of parallel in its options, and it pushes them into the BufferBlock, you await completion and get your results.
Why Dataflow? because it deals with the async await, it has MaxDegreeOfParallelism, it designed for IO bound or CPU Bound workloads, and its extremely simple to use. Additionally, as most data is generally processed in many ways (in a pipeline), you can then use it to pipe and manipulate streams of data in sequence and parallel or in any way way you choose down the line.
Anyway good luck

Stop semphore task execution if exception occurs in current running task

I have the following test code to simulate the use of a semaphore and throttling task execution. Is there a way to not continue to create new tasks if one of the running tasks throws an exception like the below. I don't need the existing tasks to stop running, I just want no new tasks to start after the exception is encountered.
Currently the tasks will all start in this scenario below. I want it to stop after a few tasks are ran because of the exceptions being thrown.
var testStrings = new List<string>();
for (var i = 0; i < 5000; i++)
{
testStrings.Add($"string-{i}");
}
using (var semaphore = new SemaphoreSlim(10))
{
var tasks = testStrings.Select(async testString =>
{
await semaphore.WaitAsync();
try
{
Console.WriteLine($"{testString}-Start");
await Task.Delay(2000);
throw new Exception("test");
}
finally
{
semaphore.Release();
}
});
await Task.WhenAll(tasks);
}
As per Sinatr's comments, I think this may work for me by adding a cancellation token to be monitored.
var testStrings = new List<string>();
for (var i = 0; i < 5000; i++)
{
testStrings.Add($"string-{i}");
}
var cancellationTokenSource = new CancellationTokenSource();
var cancellationToken = cancellationTokenSource.Token;
using (var semaphore = new SemaphoreSlim(10))
{
var tasks = testStrings.Select(async testString =>
{
await semaphore.WaitAsync();
try
{
if (!cancellationToken.IsCancellationRequested)
{
Console.WriteLine($"{testString}-Start");
await Task.Delay(2000);
throw new Exception("test");
}
}
catch (Exception ex)
{
if (!cancellationTokenSource.IsCancellationRequested)
cancellationTokenSource.Cancel();
throw;
}
finally
{
semaphore.Release();
}
});
await Task.WhenAll(tasks);
}

Task.WaitSubset / Task.WaitN?

There're Task.WaitAll method which waits for all tasks and Task.WaitAny method which waits for one task. How to wait for any N tasks?
Use case: search result pages are downloaded, each result needs a separate task to download and process it. If I use WaitAll to wait for the results of the subtasks before getting next search result page, I will not use all available resources (one long task will delay the rest). Not waiting at all can cause thousands of tasks to be queued which isn't the best idea either.
So, how to wait for a subset of tasks to be completed? Or, alternatively, how to wait for the task scheduler queue to have only N tasks?
This looks like an excellent problem for TPL Dataflow, which will allow you to control parallelism and buffering to process at maximum speed.
Here's some (untested) code to show you what I mean:
static void Process()
{
var searchReader =
new TransformManyBlock<SearchResult, SearchResult>(async uri =>
{
// return a list of search results at uri.
return new[]
{
new SearchResult
{
IsResult = true,
Uri = "http://foo.com"
},
new SearchResult
{
// return the next search result page here.
IsResult = false,
Uri = "http://google.com/next"
}
};
}, new ExecutionDataflowBlockOptions
{
BoundedCapacity = 8, // restrict buffer size.
MaxDegreeOfParallelism = 4 // control parallelism.
});
// link "next" pages back to the searchReader.
searchReader.LinkTo(searchReader, x => !x.IsResult);
var resultActor = new ActionBlock<SearchResult>(async uri =>
{
// do something with the search result.
}, new ExecutionDataflowBlockOptions
{
BoundedCapacity = 64,
MaxDegreeOfParallelism = 16
});
// link search results into resultActor.
searchReader.LinkTo(resultActor, x => x.IsResult);
// put in the first piece of input.
searchReader.Post(new SearchResult { Uri = "http://google/first" });
}
struct SearchResult
{
public bool IsResult { get; set; }
public string Uri { get; set; }
}
I think you should independently limit the number of parallel download tasks and the number of concurrent result processing tasks. I would do it using two SemaphoreSlim objects, like below. This version doesn't use the synchronous SemaphoreSlim.Wait (thanks #svick for making the point). It was only slightly tested, the exception handling can be improved; substitute your own DownloadNextPageAsync and ProcessResults:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
namespace Console_21666797
{
partial class Program
{
// the actual download method
// async Task<string> DownloadNextPageAsync(string url) { ... }
// the actual process methods
// void ProcessResults(string data) { ... }
// download and process all pages
async Task DownloadAndProcessAllAsync(
string startUrl, int maxDownloads, int maxProcesses)
{
// max parallel downloads
var downloadSemaphore = new SemaphoreSlim(maxDownloads);
// max parallel processing tasks
var processSemaphore = new SemaphoreSlim(maxProcesses);
var tasks = new HashSet<Task>();
var complete = false;
var protect = new Object(); // protect tasks
var page = 0;
// do the page
Func<string, Task> doPageAsync = async (url) =>
{
bool downloadSemaphoreAcquired = true;
try
{
// download the page
var data = await DownloadNextPageAsync(
url).ConfigureAwait(false);
if (String.IsNullOrEmpty(data))
{
Volatile.Write(ref complete, true);
}
else
{
// enable the next download to happen
downloadSemaphore.Release();
downloadSemaphoreAcquired = false;
// process this download
await processSemaphore.WaitAsync();
try
{
await Task.Run(() => ProcessResults(data));
}
finally
{
processSemaphore.Release();
}
}
}
catch (Exception)
{
Volatile.Write(ref complete, true);
throw;
}
finally
{
if (downloadSemaphoreAcquired)
downloadSemaphore.Release();
}
};
// do the page and save the task
Func<string, Task> queuePageAsync = async (url) =>
{
var task = doPageAsync(url);
lock (protect)
tasks.Add(task);
await task;
lock (protect)
tasks.Remove(task);
};
// process pages in a loop until complete is true
while (!Volatile.Read(ref complete))
{
page++;
// acquire download semaphore synchrnously
await downloadSemaphore.WaitAsync().ConfigureAwait(false);
// do the page
var task = queuePageAsync(startUrl + "?page=" + page);
}
// await completion of the pending tasks
Task[] pendingTasks;
lock (protect)
pendingTasks = tasks.ToArray();
await Task.WhenAll(pendingTasks);
}
static void Main(string[] args)
{
new Program().DownloadAndProcessAllAsync("http://google.com", 10, 5).Wait();
Console.ReadLine();
}
}
}
Something like this should work. There might be some edge cases, but all in all it should ensure a minimum of completions.
public static async Task WhenN(IEnumerable<Task> tasks, int n, CancellationTokenSource cts = null)
{
var pending = new HashSet<Task>(tasks);
if (n > pending.Count)
{
n = pending.Count;
// or throw
}
var completed = 0;
while (completed != n)
{
var completedTask = await Task.WhenAny(pending);
pending.Remove(completedTask);
completed++;
}
if (cts != null)
{
cts.Cancel();
}
}
Usage:
static void Main(string[] args)
{
var tasks = new List<Task>();
var completed = 0;
var cts = new CancellationTokenSource();
for (int i = 0; i < 100; i++)
{
tasks.Add(Task.Run(async () =>
{
await Task.Delay(temp * 100, cts.Token);
Console.WriteLine("Completed task {0}", i);
completed++;
}, cts.Token));
}
Extensions.WhenN(tasks, 30, cts).Wait();
Console.WriteLine(completed);
Console.ReadLine();
}
Task[] runningTasks = MyTasksFactory.StartTasks();
while(runningTasks.Any())
{
int finished = Task.WaitAny(runningTasks);
Task.Factory.StareNew(()=> {Consume(runningTasks[Finished].Result);})
runningTasks.RemoveAt(finished);
}

Categories

Resources