WaitAll for tasks from dictionary and get result, including Dictionary key - c#

I have to modify the following code. I am given a list of tasks (the httpClient.Run method returns a Task). I have to run them and wait until all of them are done. Later on I need to collect all the results and build a response.
var tasks = new Dictionary<string, Task<string>>();
tasks.Add("CODE1", service.RunTask("CODE1"));
tasks.Add("CODE2", service.RunTask("CODE2"));
tasks.Add("CODE3", service.RunTask("CODE3"));
//...
var result = await Task.WhenAll(tasks.Values); // how to get CODE (dictionary KEY) here
// build response
The problem above is that once we get the results, we no longer know which task produced which one. result is a string array, but I need, for instance, a KeyValuePair array. We need to know which task (CODE) was run so that we can build the response properly.

You can use async in a Select lambda to transform the KeyValuePair<T1, Task<T2>>s into Task<KeyValuePair<T1, T2>>s.
var resultTasks = tasks.Select(async pair => KeyValuePair.Create(pair.Key, await pair.Value));
IReadOnlyCollection<KeyValuePair<string, string>> results = await Task.WhenAll(resultTasks);
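For reference, here is a minimal self-contained sketch of the same idea. RunTask below is a hypothetical stand-in for service.RunTask, and the KeyedResultsDemo and RunAll names are made up for illustration; the only parts taken from the answer are the async Select lambda and Task.WhenAll.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class KeyedResultsDemo
{
    // Hypothetical stand-in for service.RunTask: returns a string after a short delay.
    public static async Task<string> RunTask(string code)
    {
        await Task.Delay(10);
        return "result for " + code;
    }

    // Starts one task per code, then pairs every result with its dictionary key.
    public static async Task<KeyValuePair<string, string>[]> RunAll(IEnumerable<string> codes)
    {
        // ToDictionary starts all tasks up front, so they run concurrently.
        Dictionary<string, Task<string>> tasks =
            codes.ToDictionary(code => code, code => RunTask(code));

        // Each wrapper task completes with the key and the awaited result.
        var resultTasks = tasks.Select(
            async pair => KeyValuePair.Create(pair.Key, await pair.Value));

        return await Task.WhenAll(resultTasks);
    }

    public static async Task Main()
    {
        foreach (var pair in await RunAll(new[] { "CODE1", "CODE2", "CODE3" }))
            Console.WriteLine($"{pair.Key} -> {pair.Value}");
    }
}
```

Because the key travels inside each wrapper task's result, the response can be built without looking anything up in the original dictionary again.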

Something like this ought to do it. This is written assuming you do the Task.WhenAll() call first. Otherwise, you'll block the thread when you start to enumerate over resultsWithKeys.
await Task.WhenAll(tasks.Values);
var resultsWithKeys = tasks.Select(
x => new
{
Key = x.Key,
Result = x.Value.Result
});
foreach (var result in resultsWithKeys)
Console.WriteLine($"{result.Key} - {result.Result}");

The Dictionary<string, Task<string>> contains tasks that do not propagate the key as part of their Result. If you prefer to have tasks that include the key in their result, you'll have to create a new dictionary and fill it with new tasks that wrap the existing tasks. The TResult of these new tasks can be a struct of your choice, for example a KeyValuePair<string, string>. Below is an extension method, WithKeys, that makes it easy to create the new dictionary with the new tasks:
public static Dictionary<TKey, Task<KeyValuePair<TKey, TValue>>> WithKeys<TKey, TValue>(
this Dictionary<TKey, Task<TValue>> source)
{
return source.ToDictionary(e => e.Key,
async e => KeyValuePair.Create(e.Key, await e.Value.ConfigureAwait(false)),
source.Comparer);
}
Usage example:
KeyValuePair<string, string>[] result = await Task.WhenAll(tasks.WithKeys().Values);

Related

How to convert Dictionary<string, Task<int>> to Task<Dictionary<string, int>> [duplicate]

I am working on retrieving about 10 different types of data using http-requests with rather complex dependencies between them. And I am trying to find my way through this in an elegant, readable and maintainable way without waiting unnecessarily.
Let's assume, some method creates a Dictionary<string, Task<int>>. What is the most elegant way to convert this into Task<Dictionary<string, int>>?
The new outer Task should finish, as soon as all Tasks contained in the dictionary are finished.
Of course, I can write this manually:
Dictionary<string, Task<int>> values = GetValues();
Task<Dictionary<string, int>> result = Task.Run(async () => {
Dictionary<string, int> rewrapped = new();
foreach (var entry in values) {
rewrapped.Add(entry.Key, await entry.Value);
}
return rewrapped;
});
But isn't there a better way?
Rule of thumb: never use Task.Run as a substitute for code that is truly async; it will use a thread to mimic asynchrony!
You just have to await the values, then get the Results:
Dictionary<string, Task<int>> values = GetValues();
await Task.WhenAll(values.Values);
Dictionary<string, int> results = values.ToDictionary(p => p.Key, p => p.Value.Result);
You can wrap the last two lines in a method, if you really want a Task<Dictionary<string, int>>.
async Task<Dictionary<string, int>> Unwrap(Dictionary<string, Task<int>> values)
{
await Task.WhenAll(values.Values);
return values.ToDictionary(p => p.Key, p => p.Value.Result);
}
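The same helper can be written generically as an extension method. This is only a sketch: the TaskDictionaryExtensions and WhenAllValues names are made up for illustration and are not part of any library.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class TaskDictionaryExtensions
{
    // Completes once every task in the dictionary has completed,
    // producing a dictionary with the same keys and the unwrapped results.
    public static async Task<Dictionary<TKey, TValue>> WhenAllValues<TKey, TValue>(
        this Dictionary<TKey, Task<TValue>> source)
    {
        // If any task faults or is canceled, this await throws.
        await Task.WhenAll(source.Values).ConfigureAwait(false);

        // Reached only when all tasks succeeded, so Result does not block here.
        return source.ToDictionary(p => p.Key, p => p.Value.Result, source.Comparer);
    }
}
```

Usage would then be a single line such as: Dictionary<string, int> results = await GetValues().WhenAllValues();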
The request is unusual. A Task isn't a value, it's a promise that something will complete in the future. To get the desired result the code will have to await all the tasks, retrieve their results, then return all of them in a Dictionary<string, int>.
There's almost certainly a better way to solve the actual problem.
One quick and dirty example would be :
async Task<ConcurrentDictionary<string,T>> GetValues<T>(CancellationToken token=default)
{
var dict=new ConcurrentDictionary<string,T>();
try
{
await Parallel.ForEachAsync(_urls,token, async (url,tk)=>{
var res=await _httpClient.GetStringAsync(url,tk);
// someResult stands for whatever T value is produced from res
dict[url]=someResult;
});
}
// Cancellation is swallowed so that the partial results are still returned
catch(OperationCanceledException){}
return dict;
}
There are far better ways to solve the actual problem though, which is executing interdependent HttpClient requests. .NET offers several ways to construct asynchronous processing pipelines: Dataflow blocks, Channels, IAsyncEnumerable.
Dataflow Blocks
For example, using Dataflow blocks you can create a pipeline that downloads CSV files, parses them, then inserts the data into a database.
These options specify that 8 CSV files will be downloaded concurrently and two parsed concurrently.
var downloadDOP=8;
var parseDOP=2;
var tableName="SomeTable";
var linkOptions=new DataflowLinkOptions { PropagateCompletion = true};
var downloadOptions =new ExecutionDataflowBlockOptions {
MaxDegreeOfParallelism = downloadDOP,
};
var parseOptions =new ExecutionDataflowBlockOptions {
MaxDegreeOfParallelism = parseDOP,
};
The following code creates the pipeline
HttpClient httpClient=new HttpClient(...);
var downloader=new TransformBlock<(Uri,string),FileInfo>(async pair=>{
// The block receives a single (Uri, path) tuple, so deconstruct it here
var (uri,path)=pair;
var file=new FileInfo(path);
using var stream =await httpClient.GetStreamAsync(uri);
using var fileStream=file.Create();
// Copy the downloaded stream into the file, not into itself
await stream.CopyToAsync(fileStream);
return file;
},downloadOptions);
var parser=new TransformBlock<FileInfo,Foo[]>(file=>{
using var reader = file.OpenText();
using var csv = new CsvReader(reader, CultureInfo.InvariantCulture);
// Materialize the records as an array to match the block's Foo[] output type
var records = csv.GetRecords<Foo>().ToArray();
return records;
},parseOptions);
var importer=new ActionBlock<Foo[]>(async recs=>{
using var bcp=new SqlBulkCopy(connectionString, SqlBulkCopyOptions.TableLock);
bcp.DestinationTableName=tableName;
//Map columns if needed
...
using var reader=ObjectReader.Create(recs);
await bcp.WriteToServerAsync(reader);
});
downloader.LinkTo(parser,linkOptions);
parser.LinkTo(importer,linkOptions);
Once you have the pipeline, you can start posting URLs to it and await the completion of the entire pipeline:
IEnumerable<(Uri,string)> filesToDownload = ...
foreach(var pair in filesToDownload)
{
await downloader.SendAsync(pair);
}
downloader.Complete();
await importer.Completion;
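The pipeline above depends on CsvHelper, SqlBulkCopy and an ObjectReader, so it cannot be run as-is. The sketch below shows the same link, post, complete, await pattern with two trivial stages. It assumes the System.Threading.Tasks.Dataflow library is referenced; PipelineDemo, RunAsync and the string-length "work" are hypothetical and only there for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class PipelineDemo
{
    public static async Task<List<int>> RunAsync(IEnumerable<string> inputs)
    {
        var results = new List<int>();
        var linkOptions = new DataflowLinkOptions { PropagateCompletion = true };

        // Stage 1: up to 4 items are processed concurrently (simulated async work).
        var measure = new TransformBlock<string, int>(async text =>
        {
            await Task.Delay(10);
            return text.Length;
        }, new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 4 });

        // Stage 2: an ActionBlock processes one message at a time by default,
        // so the plain List is only ever touched by one delegate call at a time.
        var collect = new ActionBlock<int>(length => results.Add(length));

        measure.LinkTo(collect, linkOptions);

        foreach (var input in inputs)
            await measure.SendAsync(input);

        // Completing the head block propagates through the link to the tail block.
        measure.Complete();
        await collect.Completion;
        return results;
    }

    public static async Task Main()
    {
        var lengths = await RunAsync(new[] { "a", "bb", "ccc" });
        Console.WriteLine(string.Join(", ", lengths));
    }
}
```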

Is this dictionary operation thread-safe?

I understand that regular dictionary read/write operations aren't threadsafe. What about this case?
class Data{ public int data{get; set;}}
and
var dict = new Dictionary<int, Data>();
Can this be thread safe?
await Task.WhenAll(dict
.Select(async t =>
{
t.Value.data = t.Value.data+1;
await somefunc(t.Value.data);
}));
Is there a better way to rewrite this operation without using a ConcurrentDictionary, since there is just one instance where I need this behaviour supported?
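One way to reason about this, offered as a sketch rather than a definitive answer: the Dictionary itself is only enumerated here, never structurally modified, and Task.WhenAll enumerates the Select on the calling thread. Each async lambda runs synchronously until it reaches an await that has not completed yet, so the increments, which come before any await, happen one after another on that thread. The example below assumes nothing else touches dict or the Data instances concurrently; SomeFunc is a hypothetical stand-in for somefunc and IncrementDemo is an illustrative name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public class Data { public int data { get; set; } }

public static class IncrementDemo
{
    // Hypothetical stand-in for somefunc: any genuinely asynchronous operation.
    static Task SomeFunc(int value) => Task.Delay(10);

    // Increments every entry and returns the ids of the threads that did the increments.
    public static async Task<HashSet<int>> IncrementAll(Dictionary<int, Data> dict)
    {
        var threadIds = new HashSet<int>();

        await Task.WhenAll(dict.Select(async t =>
        {
            // This part runs synchronously on the enumerating thread,
            // before the first await yields control.
            threadIds.Add(Thread.CurrentThread.ManagedThreadId);
            t.Value.data = t.Value.data + 1;
            await SomeFunc(t.Value.data);
        }));

        return threadIds;
    }

    public static async Task Main()
    {
        var dict = Enumerable.Range(0, 5).ToDictionary(i => i, i => new Data { data = i });
        var threadIds = await IncrementAll(dict);
        Console.WriteLine($"Increments ran on {threadIds.Count} thread(s)");
    }
}
```

If the lambda modified the data after an await, or if other code touched the dictionary while this was running, this reasoning would no longer hold and explicit synchronization would be needed.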

How to instantiate a specific number of items in a dictionary?

I'm trying to declare a Dictionary<Task, CancellationToken> with a specific number of items. I tried:
private Dictionary<Task, CancellationToken> bots =
new Dictionary<Task, CancellationToken>(new Task[9], new CancellationToken[9]);
This returns the following error:
you can not convert from 'System.Threading.Tasks.Task []' to 'System.Collections.Generic.IDictionary '
Everything works if I do this with a List:
private List<Task> bots = new List<Task>(new Task[9]);
As the error shows, you're trying to call a constructor that doesn't exist.
The Dictionary constructor overloads that take an existing collection accept the following:
public Dictionary(IDictionary<TKey, TValue> dictionary);
public Dictionary(IDictionary<TKey, TValue> dictionary, IEqualityComparer<TKey> comparer);
Neither of the arguments you gave matches them.
The first argument you gave is a Task[] and the second is a CancellationToken[].
You should create an IDictionary implementation, which is usually a Dictionary, and then pass it to the constructor.
var example1Original = new Dictionary<Task, CancellationToken>();
example1Original.Add(new Task(DoWork), new CancellationToken());
example1Original.Add(new Task(DoWork), new CancellationToken());
// and more (this procedure can be shortened using a loop)
var example1Result = new Dictionary<Task, CancellationToken>(example1Original);
As you can see, we successfully passed our variable into the Dictionary constructor. This is possible because Dictionary implements IDictionary.
But the last line is actually redundant: we can pass it, but we don't need to, because our populated example1Original is already a Dictionary, which is what we're aiming for.
That raises the question of why the Dictionary constructor has this overload in the first place. It brings us back to our original statement: IDictionary can have multiple implementations, and any of them can be passed.
Here are a few of the IDictionary implementations: [picture taken from mscorlib.dll using ILSpy]
So your question is actually how to populate your Dictionary with new instances of Task and CancellationToken.
This can be done with:
The code above (which can be shortened further with a loop).
Or, more concisely, with a few language capabilities.
Capabilities we're going to use:
System.Linq.Enumerable.Range - Generates a sequence of integral numbers within a specified range.
System.Linq.Enumerable.Select - Projects each element of a sequence into a new form.
The power of interfaces - Allows us to use the ToDictionary extension method on anything that implements IEnumerable.
System.Linq.Enumerable.ToDictionary - Extension method from the System.Linq namespace that takes an IEnumerable and generates a Dictionary. Because IDictionary itself implements IEnumerable, it works on dictionaries as well:
public static Dictionary<TKey, TElement> ToDictionary<TSource, TKey, TElement>(this IEnumerable<TSource> source, Func<TSource, TKey> keySelector, Func<TSource, TElement> elementSelector);
Using those capabilities, we can build the following to generate our Dictionary.
var kvpOfTaskCancellation = Enumerable.Range(0, 9) // Generates Enumerable integers sequence.
.Select(i => new KeyValuePair<Task, CancellationToken>(new Task(DoWork), new CancellationToken())) // Iterating and projecting every elements inside the previous generated sequence.
.ToDictionary(kvp => kvp.Key, kvp => kvp.Value); // Mapping each iteration from the previous line KeyValuePair objects to the Dictionary Key and Value.
This can also be shortened to the following:
var kvpOfTaskCancellation2 = Enumerable.Range(0, 9)
.ToDictionary(i => new Task(DoWork), i => new CancellationToken());
This all works if you want new Task and CancellationToken instances.
But if you already have a filled collection of Tasks and you want to generate a Dictionary from it, then you can do the following:
var tasks = new Task[3];
// I'm assuming the tasks have already been populated
tasks.ToDictionary(task => task, task => new CancellationToken());
If you also have a CancellationToken array, then you can use the following:
var tasks = new Task[3];
var cancellationsTokens = new CancellationToken[9];
// I'm assuming the tasks and cancellationsTokens arrays have already been filled.
Enumerable.Range(0, tasks.Length)
.Select(i => new KeyValuePair<Task, CancellationToken>(tasks[i], cancellationsTokens[i]))
.ToDictionary(kvp => kvp.Key, kvp => kvp.Value);
An uninitialized array of Task contains only null elements, and you cannot have null as the key of a dictionary item. In general, though, if you have initialized arrays, you can build a dictionary from them using LINQ. Zip pairs the elements by position, so each task appears as a key exactly once:
var dictionary = initializedTaskArray
.Zip(new CancellationToken[9], (task, token) => (task, token))
.ToDictionary(x => x.task, x => x.token);
You would need to do something like the following (Task has no parameterless constructor, so each task is given a delegate such as DoWork):
private Dictionary<Task, CancellationToken> bots = new Dictionary<Task, CancellationToken>() {
{ new Task(DoWork), new CancellationToken() },
{ new Task(DoWork), new CancellationToken() },
...
};

How is this parallel for not processing all elements?

I've created this normal for loop:
public static Dictionary<string,Dictionary<string,bool>> AnalyzeFiles(IEnumerable<string> files, IEnumerable<string> dependencies)
{
Dictionary<string, Dictionary<string, bool>> filesAnalyzed = new Dictionary<string, Dictionary<string, bool>>();
foreach (var item in files)
{
filesAnalyzed[item] = AnalyzeFile(item, dependencies);
}
return filesAnalyzed;
}
The loop just checks whether each file in the "files" variable has all the dependencies specified in the "dependencies" variable.
The "files" variable should only have unique elements because each one is used as a key in the result dictionary, but I check this before calling the method.
The for loop works correctly and all elements are processed on a single thread, so I wanted to increase performance by changing to a parallel for loop. The problem is that not all the elements from the "files" variable are processed in the parallel for (in my test case I get 30 elements instead of 53).
I've tried increasing the timespan, and removing all the "Monitor.TryEnter" code in favor of just a lock(filesAnalyzed), but I still got the same result.
I'm not very familiar with the parallel for, so it might be something in the syntax that I'm using.
public static Dictionary<string,Dictionary<string,bool>> AnalyzeFiles(IEnumerable<string> files, IEnumerable<string> dependencies)
{
var filesAnalyzed = new Dictionary<string, Dictionary<string, bool>>();
Parallel.For<KeyValuePair<string, Dictionary<string, bool>>>(
//start index
0,
//end index
files.Count(),
// initialization?
()=>new KeyValuePair<string, Dictionary<string, bool>>(),
(index, loop, result) =>
{
var temp = new KeyValuePair<string, Dictionary<string, bool>>(
files.ElementAt(index),
AnalyzeFile(files.ElementAt(index), dependencies));
return temp;
}
,
//finally
(x) =>
{
if (Monitor.TryEnter(filesAnalyzed, new TimeSpan(0, 0, 30)))
{
try
{
filesAnalyzed.Add(x.Key, x.Value);
}
finally
{
Monitor.Exit(filesAnalyzed);
}
}
}
);
return filesAnalyzed;
}
any feedback is appreciated
Assuming the code inside AnalyzeFile and dependencies is thread safe, how about something like this:
var filesAnalyzed = files
.AsParallel()
.Select(x => new{Item = x, File = AnalyzeFile(x, dependencies)})
.ToDictionary(x => x.Item, x=> x.File);
Rewrite your normal loop this way:
Parallel.ForEach(files, item=>
{
filesAnalyzed[item] = AnalyzeFile(item, dependencies);
});
You should also use ConcurrentDictionary instead of Dictionary to make the whole process thread-safe.
You can simplify your code a lot if you use Parallel LINQ instead :
public static Dictionary<string,Dictionary<string,bool>> AnalyzeFiles(IEnumerable<string> files, IEnumerable<string> dependencies)
{
var filesAnalyzed = ( from item in files.AsParallel()
let result=AnalyzeFile(item, dependencies)
select (Item:item,Result:result)
).ToDictionary( it=>it.Item,it=>it.Result);
return filesAnalyzed;
}
I used tuple syntax in this case to avoid noise. It also cuts down on allocations.
Using method syntax, the same can be written as :
var filesAnalyzed = files.AsParallel()
.Select(item=> (Item: item, Result: AnalyzeFile(item, dependencies)))
.ToDictionary( it=>it.Item,it=>it.Result);
Dictionary<> isn't thread-safe for modification. If you wanted to use Parallel.ForEach without locking, you'd have to use ConcurrentDictionary:
var filesAnalyzed = new ConcurrentDictionary<string,Dictionary<string,bool>>();
Parallel.ForEach(files,file => {
filesAnalyzed[file] = AnalyzeFile(file, dependencies);
});
In this case at least, there is no benefit in using Parallel over PLINQ.
It is hard to say exactly what is going wrong without debugging the code. Just looking at it, though, I would have used a ConcurrentDictionary for the filesAnalyzed variable instead of a normal Dictionary and got rid of the Monitor.
I would also check whether the same key already exists in the filesAnalyzed dictionary; it could be that you are trying to add a KeyValuePair with a key that has already been added.
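A likely explanation for the missing elements, given how the thread-local overload of Parallel.For is documented: the localInit delegate runs once per task, the body runs once per iteration and passes its return value to the next iteration on the same task, and the localFinally delegate runs only once per task. The body in the question ignores the incoming result and returns just the latest pair, so each task adds only the last item it processed. The sketch below accumulates into a task-local list instead. The AnalyzeFile here is a hypothetical stand-in, and ParallelAnalysis is an illustrative name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class ParallelAnalysis
{
    // Hypothetical stand-in for AnalyzeFile: marks every dependency as present.
    static Dictionary<string, bool> AnalyzeFile(string file, IEnumerable<string> dependencies) =>
        dependencies.ToDictionary(d => d, d => true);

    public static Dictionary<string, Dictionary<string, bool>> AnalyzeFiles(
        IEnumerable<string> files, IEnumerable<string> dependencies)
    {
        // Materialize once so elements can be indexed directly.
        var fileList = files.ToList();
        var deps = dependencies.ToList();
        var filesAnalyzed = new Dictionary<string, Dictionary<string, bool>>();

        Parallel.For<List<KeyValuePair<string, Dictionary<string, bool>>>>(
            0,
            fileList.Count,
            // localInit: one private list per task.
            () => new List<KeyValuePair<string, Dictionary<string, bool>>>(),
            // body: append to the task-local list and hand it on to the next iteration.
            (index, loopState, local) =>
            {
                local.Add(KeyValuePair.Create(fileList[index], AnalyzeFile(fileList[index], deps)));
                return local;
            },
            // localFinally: merge everything this task collected, under a lock.
            local =>
            {
                lock (filesAnalyzed)
                {
                    foreach (var pair in local)
                        filesAnalyzed.Add(pair.Key, pair.Value);
                }
            });

        return filesAnalyzed;
    }
}
```

The PLINQ and ConcurrentDictionary versions above are shorter; this variant is mainly useful for seeing what the three delegates are for.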

C# Multithreading String Array

I feel super confused... I am trying to implement an asynchronous C# call to a Web API to translate a list of values; the result I expect is another list, in a 1-to-1 fashion. We don't mind about order; we are just interested in speed, and to our knowledge the servers are capable of processing the load.
private object ReadFileToEnd(string filePath)
{
//file read logic and validations...
string[] rowData = new string[4]; //array with initial value
rowData = translateData(rowData);
}
private async Task<List<string>> translateData(string[] Collection)
{
//The resulting string collection.
List<string> resultCollection = new List<string>();
Dictionary dict = new Dictionary();
foreach (string value in Collection)
{
Person person = await Task.Run(() => dict.getNewValue(param1, param2, value.Substring(0, 10)));
value.Remove(0, 10);
resultCollection.Add(person.Property1 + value);
}
return resultCollection;
}
I might have other problems, like the return type; I am just not getting it to work. My main focus is the multithreading and returning a string array. The main thread comes from ReadFileToEnd(...). I already noticed that if I add await there, I will also have to add async to the function, and I am trying not to change too much.
Use a Parallel ForEach to iterate and remove the await call inside each loop iteration.
private IEnumerable<string> translateData(string[] Collection)
{
//The resulting string collection.
var resultCollection = new ConcurrentBag<string>();
Dictionary dict = new Dictionary();
Parallel.ForEach(Collection,
value =>
{
var person = dict.getNewValue(param1, param2, value.Substring(0, 10));
// Strings are immutable: Remove returns a new string, so keep its result
var rest = value.Remove(0, 10);
resultCollection.Add(person.Property1 + rest);
});
return resultCollection;
}
Your attempt at parallelism is not correct. Nothing runs in parallel if, every time you send a request to the translation service, you stop the current iteration and wait for the result without continuing the loop.
Hope this helps!
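If the translation call is I/O-bound, an alternative to Parallel.ForEach is to start all the calls and await them together with Task.WhenAll. This is only a sketch: TranslateAsync is a hypothetical stand-in for the Web API call, TranslationDemo is an illustrative name, and every input value is assumed to be at least 10 characters long, as in the question's Substring(0, 10).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class TranslationDemo
{
    // Hypothetical stand-in for the Web API call that translates a 10-character key.
    static async Task<string> TranslateAsync(string key)
    {
        await Task.Delay(10);
        return key.ToUpperInvariant();
    }

    // Starts one translation per value and awaits them all together.
    public static async Task<string[]> TranslateData(IEnumerable<string> collection)
    {
        var tasks = collection.Select(async value =>
        {
            string translated = await TranslateAsync(value.Substring(0, 10));
            // Remove returns a new string containing everything after the key.
            return translated + value.Remove(0, 10);
        });

        // Task.WhenAll returns the results in the same order as the input tasks.
        return await Task.WhenAll(tasks);
    }

    public static async Task Main()
    {
        var rows = new[] { "abcdefghij-first", "klmnopqrst-second" };
        foreach (var row in await TranslateData(rows))
            Console.WriteLine(row);
    }
}
```

The caller still has to await the result (or block on it), so ReadFileToEnd would need to become async as well for this variant to pay off.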
