How to iterate a dictionary in "await foreach" - C#

Help with a small problem...
I have a method that returns a dictionary. I need to rewrite it so that I can enumerate its result with await foreach, but I can't get it to work.
Here is my method:
public IDictionary<long, long> TransformListInDictionary(IList<string> list)
{
var result = new Dictionary<long, long>();
foreach (var member in list)
{
var idx = member.IndexOf(':');
var key = member.Substring(0, idx);
var value = member.Substring(idx + 1);
result.Add(Convert.ToInt64(key), Convert.ToInt64(value));
}
return result;
}
To use a dictionary in an await foreach loop, this method needs to return IAsyncEnumerable<KeyValuePair<long, long>>.
So my question is: how do I rewrite the method above to return that type?
Why do I need this?
I have some code, posted below, and I'll try to describe my idea.
When the code enters the for loop, it does some work and builds the dictionary, which is then processed in the foreach loop.
var logic = new AllLogic();
var variable = Convert.ToInt32(Console.ReadLine());
var randomSecundForPause = new Random();
for (int i = 0; i < variable; i++)
{
var list = new List<string>();
var dict = logic.TransformListInDictionary(list);
//some code
foreach (var item in dict)
{
try
{
//some code
Thread.Sleep(randomSecundForPause.Next(100000, 150000));
}
catch (Exception e)
{
//some code
Thread.Sleep(randomSecundForPause.Next(100000, 150000));
}
}
}
I would like this foreach loop to run in the background and the main code flow to go to a new iteration of the for loop.
As I understand it, I need to replace foreach with await foreach.

What you want is probably an asynchronous iterator:
#pragma warning disable CS1998
public async IAsyncEnumerable<KeyValuePair<long, long>> ToAsyncEnumerable(
IList<string> list)
{
foreach (var member in list)
{
var idx = member.IndexOf(':');
var key = member.Substring(0, idx);
var value = member.Substring(idx + 1);
yield return KeyValuePair.Create(Convert.ToInt64(key), Convert.ToInt64(value));
}
}
#pragma warning restore CS1998
The method must be async, it must have IAsyncEnumerable<X> as its return type, and it must contain the yield contextual keyword.
The #pragma warning disable CS1998 is needed to suppress the warning about an async method that lacks an await. Without it the program will still compile, but the C# compiler will emit a warning.
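It can then be consumed with await foreach. A minimal usage sketch, assuming ToAsyncEnumerable is added to the question's AllLogic class:
var logic = new AllLogic();
var list = new List<string> { "1:10", "2:20" }; // illustrative input in the "key:value" format the method expects
await foreach (var pair in logic.ToAsyncEnumerable(list))
{
    // pair is a KeyValuePair<long, long>
    Console.WriteLine($"{pair.Key} => {pair.Value}");
}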

I would like this foreach loop to run in the background and the main
code flow to go to a new iteration of the for loop.
You don't need an await foreach. The for loop body will not continue to the next iteration once it hits an await of any kind. Instead, it will asynchronously wait for whatever it is awaiting to finish, and only then will the next iteration of the for loop start.
If you want to understand it better, try running this code:
for(int i = 0; i < 10; i++)
{
Console.WriteLine($"{i} Started");
await Task.Delay(2000);
Console.WriteLine($"{i} Finished");
}
If you want the foreach loop to run in the background, you need to wrap it inside a Task.Run(...) call. This call returns a Task, so store that Task in a collection and await Task.WhenAll(...) afterwards.
It would look something like this:
var tasks = new List<Task>(variable);
for (int i = 0; i < variable; i++)
{
var list = new List<string>();
var dict = logic.TransformListInDictionary(list);
//some code
var task = Task.Run(async () =>
{
foreach (var item in dict)
{
try
{
//some code
await Task.Delay(randomSecundForPause.Next(100000, 150000)); // This is better because the thread is not blocked
}
catch (Exception e)
{
//some code
await Task.Delay(randomSecundForPause.Next(100000, 150000));
}
}
});
tasks.Add(task);
}
await Task.WhenAll(tasks);
Notice that the Dictionary class is thread-safe for multiple readers, but not for multiple writers, so depending on what the //some code sections do you might want to consider a ConcurrentDictionary.
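For example, if the //some code sections write to a shared dictionary from several tasks, ConcurrentDictionary is safe for concurrent writers. A minimal sketch (the names are illustrative, not taken from the question):
using System.Collections.Concurrent;

var shared = new ConcurrentDictionary<long, long>();
await Task.WhenAll(Enumerable.Range(0, 4).Select(n => Task.Run(() =>
{
    // AddOrUpdate is safe to call from multiple threads at once.
    shared.AddOrUpdate(n, 1, (key, existing) => existing + 1);
})));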

Related

Elegant way to get a task for async code without running the task immediately

I have the following code that does what I want but I had to resort to using .GetAwaiter().GetResult() in the middle of asynchronous code to get it. I am wondering if there is an elegant way to achieve this without resorting to such hacks.
This is a simplified version of the code I have.
public async Task<string[]> GetValues(int[] keys)
{
List<int> keysNotYetActivelyRequested = null;
// don't start the task at this point because the
// keysNotYetActivelyRequested is not yet populated
var taskToCreateWithoutStarting = new Task<Dictionary<int, string>>(
() => GetValuesFromApi(keysNotYetActivelyRequested.ToArray())
.GetAwaiter().GetResult() /*not the best idea*/);
(var allTasksToAwait, keysNotYetActivelyRequested) = GetAllTasksToAwait(
keys, taskToCreateWithoutStarting);
if (keysNotYetActivelyRequested.Any())
{
// keysNotYetActivelyRequested will be empty when all keys
// are already part of another active request
taskToCreateWithoutStarting.Start(TaskScheduler.Current);
}
var allResults = await Task.WhenAll(allTasksToAwait);
var theReturn = new string[keys.Length];
for (int i = 0; i < keys.Length; i++)
{
foreach (var result in allResults)
{
if (result.TryGetValue(keys[i], out var value))
{
theReturn[i] = value;
}
}
}
if (keysNotYetActivelyRequested.Any())
{
taskToCreateWithoutStarting.Dispose();
}
return theReturn;
}
// all active requests indexed by the key, used to avoid generating
// multiple requests for the same key
private Dictionary<int, Task<Dictionary<int, string>>> _activeRequests = new();
private (HashSet<Task<Dictionary<int, string>>> allTasksToAwait,
List<int> keysNotYetActivelyRequested) GetAllTasksToAwait(
int[] keys, Task<Dictionary<int, string>> taskToCreateWithoutStarting)
{
var keysNotYetActivelyRequested = new List<int>();
// a HashSet because each task will have multiple keys hence _activeRequests
// will have the same task multiple times
var allTasksToAwait = new HashSet<Task<Dictionary<int, string>>>();
// add cleanup to the task to remove the requested keys from _activeRequests
// once it completes
var taskWithCleanup = taskToCreateWithoutStarting.ContinueWith(_ =>
{
lock (_activeRequests)
{
foreach (var key in keysNotYetActivelyRequested)
{
_activeRequests.Remove(key);
}
}
});
lock (_activeRequests)
{
foreach (var key in keys)
{
// use CollectionsMarshal to avoid a lookup for the same key twice
ref var refToTask = ref CollectionsMarshal.GetValueRefOrAddDefault(
_activeRequests, key, out var exists);
if (exists)
{
allTasksToAwait.Add(refToTask);
}
else
{
refToTask = taskToCreateWithoutStarting;
allTasksToAwait.Add(taskToCreateWithoutStarting);
keysNotYetActivelyRequested.Add(key);
}
}
}
return (allTasksToAwait, keysNotYetActivelyRequested);
}
// not the actual code
private async Task<Dictionary<int, string>> GetValuesFromApi(int[] keys)
{
// request duration dependent on the number of keys
await Task.Delay(keys.Length);
return keys.ToDictionary(k => k, k => k.ToString());
}
And a test method:
[Test]
public void TestGetValues()
{
var random = new Random();
var allTasks = new Task[10];
for (int i = 0; i < 10; i++)
{
var arrayofRandomInts = Enumerable.Repeat(random, random.Next(1, 100))
.Select(r => r.Next(1, 100)).ToArray();
allTasks[i] = GetValues(arrayofRandomInts);
}
Assert.DoesNotThrowAsync(() => Task.WhenAll(allTasks));
Assert.That(_activeRequests.Count, Is.EqualTo(0));
}
Instead of:
Task<Something> coldTask = new(() => GetAsync().GetAwaiter().GetResult());
You can do it like this:
Task<Task<Something>> coldTaskTask = new(() => GetAsync());
Task<Something> proxyTask = coldTaskTask.Unwrap();
The nested task coldTaskTask is the task that you will later Start (or RunSynchronously).
The unwrapped task proxyTask is a proxy that represents both the invocation of the GetAsync method, as well as the completion of the Task<Something> that this method generates.
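Applied to the code in the question, the idea looks roughly like this (a sketch reusing the question's names; note that it is the nested task that gets started, while the unwrapped proxy is what goes into allTasksToAwait and gets awaited):
var coldTaskTask = new Task<Task<Dictionary<int, string>>>(
    () => GetValuesFromApi(keysNotYetActivelyRequested.ToArray()));
var taskToCreateWithoutStarting = coldTaskTask.Unwrap();
// ...later, where the original code called taskToCreateWithoutStarting.Start(...):
coldTaskTask.Start(TaskScheduler.Current);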
You should never use the task constructor.
If you want to refer to some code to execute later, use a delegate. Just like you would with synchronous code. The delegate types for asynchronous code are slightly different, but they're still just delegates.
Func<Task<Dictionary<int, string>>> getValuesAsync = () => GetValuesFromApi(keysNotYetActivelyRequested.ToArray());
...
var result = await getValuesAsync();
Also, I strongly recommend replacing ContinueWith with await.
All links are to my blog.
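For example, the cleanup continuation in GetAllTasksToAwait could be written with await instead of ContinueWith; a sketch reusing the question's _activeRequests field (the method and parameter names here are illustrative):
async Task RemoveFromActiveRequestsWhenDoneAsync(Task requestTask, List<int> requestedKeys)
{
    // Mirror the ContinueWith semantics: run the cleanup whether the request
    // succeeded or failed (its result/exception is observed by the real awaiters).
    try { await requestTask.ConfigureAwait(false); } catch { }
    lock (_activeRequests)
    {
        foreach (var key in requestedKeys)
        {
            _activeRequests.Remove(key);
        }
    }
}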

Async generator, previous iterations await a future iteration?

I want to generate an enumerable of tasks, the tasks will complete at different times.
How can I make a generator in C# that:
yields tasks
every few iterations, resolves previously yielded tasks with results that are only now known
The reason I want to do this is because I am processing a long iterable of inputs, and every so often I accumulate enough data from these inputs to send a batch API request and finalise my outputs.
Pseudocode:
IEnumerable<Task<Output>> Process(IEnumerable<Input> inputs)
{
var queuedInputs = Queue<Input>();
var cumulativeLength = 0;
foreach (var input in inputs)
{
yield return waiting task for this input
queuedInputs.Enqueue(input);
cumulativeLength += input.Length;
if (cumulativeLength > 10)
{
cumulativeLength = 0
GetFromAPI(queue).ContinueWith((apiTask) => {
Queue<BatchResult> batchResults = apiTask.Result;
while (queuedInputs.Count > 0)
{
batchResult = batchResults.Dequeue();
historicalInput = queuedInputs.Dequeue();
var output = MakeOutput(historicalInput, batchResult);
resolve earlier input's task with this output
}
});
}
}
}
The shape of your solution is going to be driven by the shape of your problem. There are a couple of questions I have, because your problem domain seems odd:
Are all your inputs known at the outset? The (synchronous) IEnumerable<Input> implies they are.
Are you sure you want to wait for a batch of inputs before sending any query? What about the "remainder" if you're batching by 10 but have 55 inputs?
Assuming you do have synchronous inputs, and that you want to batch with remainders, you can just accumulate all your inputs immediately, batch them, and walk the batches, asynchronously providing outputs:
async IAsyncEnumerable<Output> Process(IEnumerable<Input> inputs)
{
foreach (var batchedInput in inputs.Batch(10))
{
var batchResults = await GetFromAPI(batchedInput);
for (int i = 0; i != batchedInput.Count; ++i)
yield return MakeOutput(batchedInput[i], batchResults[i]);
}
}
public static IEnumerable<IReadOnlyList<TSource>> Batch<TSource>(this IEnumerable<TSource> source, int size)
{
List<TSource>? batch = null;
foreach (var item in source)
{
batch ??= new List<TSource>(capacity: size);
batch.Add(item);
if (batch.Count == size)
{
yield return batch;
batch = null;
}
}
if (batch?.Count > 0)
yield return batch;
}
Update:
If you want to start the API calls immediately, you can move those out of the loop:
async IAsyncEnumerable<Output> Process(IEnumerable<Input> inputs)
{
var batchedInputs = inputs.Batch(10).ToList();
var apiCallTasks = batchedInputs.Select(GetFromAPI).ToList();
for (int i = 0; i != apiCallTasks.Count; ++i)
{
var batchResults = await apiCallTasks[i];
var batchedInput = batchedInputs[i];
for (int j = 0; j != batchedInput.Count; ++j)
yield return MakeOutput(batchedInput[j], batchResults[j]);
}
}
One approach is to use the TPL Dataflow library. This library offers a variety of components named "blocks" (TransformBlock, ActionBlock etc), where each block is processing its input data, and then propagates the results to the next block. The blocks are linked together so that the completion of the previous block in the pipeline triggers the completion of the next block etc, until the final block which is usually an ActionBlock<T> with no output. Here is an example:
var block1 = new TransformBlock<int, string>(item =>
{
Thread.Sleep(1000); // Simulate synchronous work
return item.ToString();
}, new()
{
MaxDegreeOfParallelism = DataflowBlockOptions.Unbounded,
EnsureOrdered = false
});
var block2 = new BatchBlock<string>(batchSize: 10);
var block3 = new ActionBlock<string[]>(async batch =>
{
await Task.Delay(1000); // Simulate asynchronous work
}); // The default MaxDegreeOfParallelism is 1
block1.LinkTo(block2, new() { PropagateCompletion = true });
block2.LinkTo(block3, new() { PropagateCompletion = true });
// Provide some input in the pipeline
block1.Post(1);
block1.Post(2);
block1.Post(3);
block1.Post(4);
block1.Post(5);
block1.Complete(); // Mark the first block as completed
await block3.Completion; // Await the completion of the last block
The TPL Dataflow library is powerful and flexible, but it has a weak point in the propagation of exceptions. There is no built-in way to instruct block1 to stop working if block3 fails. You can read more about this issue here. It might not be a serious issue if you don't expect your blocks to fail very often.
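One possible workaround (a sketch, not a built-in feature of the library) is to observe the completion of the last block and fault the first block manually:
// Hypothetical sketch: if block3 fails, fault block1 so it stops accepting and processing items.
_ = block3.Completion.ContinueWith(t =>
{
    ((IDataflowBlock)block1).Fault(t.Exception!);
}, TaskContinuationOptions.OnlyOnFaulted);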
Assuming MyGenerator() returns List<Task<T>>, and the number of tasks is relatively small (even in the hundreds is probably fine) then you can use Task.WhenAny(), which returns the first Task that completes. Then remove that Task from the list, process the result, and move on to the next:
var tasks = MyGenerator();
while (tasks.Count > 0) {
var t = await Task.WhenAny(tasks);
tasks.Remove(t);
var result = await t; // this won't actually wait since the task is already done
// Do something with result
}
There is a good discussion of this in an article by Stephen Toub, which explains in more detail, and gives alternatives if your task list is in the thousands: Processing tasks as they complete
There's also this article, but I think Stephen's is better written: Process asynchronous tasks as they complete (C#)
Using TaskCompletionSource:
IEnumerable<Task<Output>> Process(IEnumerable<Input> inputs)
{
var tcss = new List<TaskCompletionSource<Output>>();
var queue = new Queue<(Input, TaskCompletionSource<Output>)>();
var cumulativeLength = 0;
foreach (var input in inputs)
{
var tcs = new TaskCompletionSource<Output>();
queue.Enqueue((input, tcs));
tcss.Add(tcs);
cumulativeLength += input.Length;
if (cumulativeLength > 10)
{
cumulativeLength = 0;
var queueClone = new Queue<(Input, TaskCompletionSource<Output>)>(queue);
queue.Clear();
GetFromAPI(queueClone.Select(x => x.Item1)).ContinueWith((apiTask) => {
Queue<BatchResult> batchResults = apiTask.Result;
while (queueClone.Count > 0)
{
var batchResult = batchResults.Dequeue();
var (queuedInput, queuedTcs) = queueClone.Dequeue();
var output = MakeOutput(queuedInput, batchResult);
queuedTcs.SetResult(output);
}
});
}
}
GetFromAPI(queue.Select(x => x.Item1)).ContinueWith((apiTask) => {
Queue<BatchResult> batchResults = apiTask.Result;
while (queue.Count > 0)
{
var batchResult = batchResults.Dequeue();
var (queuedInput, queuedTcs) = queue.Dequeue();
var output = MakeOutput(queuedInput, batchResult);
queuedTcs.SetResult(output);
}
});
foreach (var tcs in tcss)
{
yield return tcs.Task;
}
}

Run same code multiple times in parallel with different parameter

This very simple example:
int numLanes = 8;
var tasks = new List<Task>();
for (var i = 0; i < numLanes; ++i)
{
var t = new Task(() =>
{
Console.WriteLine($"Lane {i}");
});
tasks.Add(t);
}
tasks.ForEach((t) => t.Start());
Task.WaitAll(tasks.ToArray());
Produces:
Lane 8
Lane 8
Lane 8
Lane 8
Lane 8
Lane 8
Lane 8
Lane 8
Which is not as expected; the parameter i isn't passed correctly. I had thought to use Action<int> to wrap the code, but couldn't see how I would. I don't want to write a dedicated method like Task CreateTask(int i); I'm interested in how to do it using lambdas.
What is the normal way to do this - spin up the same code a bunch of times in parallel with a different parameter value?
You've got a captured loop variable i; try adding a temp variable inside the loop and passing it to the Task:
for (var i = 0; i < numLanes; ++i)
{
var temp = i;
var t = new Task(() =>
{
Console.WriteLine($"Lane {temp}");
});
tasks.Add(t);
}
Further reading: How to capture a variable in C# and not shoot yourself in the foot. The foreach loop had the same behavior before C# 5, but according to the link above:
"with the release of the C# 5.0 standard this behavior was changed by declaring the iterator variable inside every loop iteration, not before it on the compilation stage, but for all other constructions similar behavior remained without any changes"
So you may use foreach without a temp variable.
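A minimal sketch, reusing the question's numLanes and tasks variables:
foreach (var lane in Enumerable.Range(0, numLanes))
{
    // Each iteration gets its own lane variable, so each task captures a distinct value.
    tasks.Add(new Task(() => Console.WriteLine($"Lane {lane}")));
}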
You need to capture the value inside the for loop otherwise all of the Tasks are still referring to the same object:
for (var i = 0; i < numLanes; ++i)
{
var innerI = i; // Copy the variable here
var t = new Task(() =>
{
Console.WriteLine($"Lane {innerI}");
});
tasks.Add(t);
}
See here for more info.
You could use LINQ to create a closure for each lambda you pass to the Task constructor:
var tasks = Enumerable.Range(0, numLanes)
.Select(i => new Task(() => Console.WriteLine($"Lane {i}")))
.ToList();
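Note that Enumerable.Range takes a count as its second argument, and the ToList() matters: without it the Select is evaluated lazily, so each enumeration would create fresh, unstarted Task objects. A usage sketch matching the question's pattern:
tasks.ForEach(t => t.Start());
Task.WaitAll(tasks.ToArray());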
Another approach (without introducing an additional variable inside the for loop) is to use the constructor Task(Action<object>, object):
int numLanes = 8;
var tasks = new List<Task>();
for (int i = 0; i < numLanes; ++i)
{
// Variable "i" is passed as an argument into Task constructor.
var t = new Task(arg =>
{
Console.WriteLine("Lane {0}", arg);
}, i);
tasks.Add(t);
}
tasks.ForEach((t) => t.Start());
Task.WaitAll(tasks.ToArray());
In C# 5 and later the foreach loop introduces a new variable on each iteration. Therefore it is possible to use foreach to create tasks where each task captures its own loop variable (again, there is no need to introduce an additional variable inside the loop):
int numLanes = 8;
var tasks = new List<Task>();
foreach (int i in Enumerable.Range(0, numLanes))
{
// A new "i" variable is introduced on each iteration.
// Therefore each task captures its own variable.
var t = new Task(() =>
{
Console.WriteLine("Lane {0}", i);
});
tasks.Add(t);
}
tasks.ForEach((t) => t.Start());
Task.WaitAll(tasks.ToArray());

Immediately process asynchronous results in the order they were requested

Suppose I kick off 5 async tasks, and I want to print the results in the order they were requested:
public async void RunTasks()
{
var tasks = new List<Task<int>>();
for(int i=1; i<=5; i++)
{
tasks.Add(DoSomething(i));
}
var results = await Task.WhenAll(tasks);
Console.WriteLine(String.Join(',', results));
}
public async Task<int> DoSomething(int taskNumber)
{
var random = new Random();
await Task.Delay(random.Next(5000));
return taskNumber;
}
This will always print "1,2,3,4,5" - because Task.WhenAll() orders the results by the order requested, not by the order in which they finished.
Unfortunately this means I have to wait for ALL Tasks to finish until I can print anything.
How might I instead print the result of each task as soon as it's finished, but still respecting the order they were requested?
So I should always see "1,2,3,4,5" - but it may arrive gradually:
"1"
"1,2,3"
"1,2,3,4"
"1,2,3,4,5"
(no need to worry about the actual reasoning for doing this, treat it as a fun problem)
var tasks = new List<Task<int>>();
for(int i=1; i<=5; i++)
{
tasks.Add(DoSomething(i));
}
foreach (var task in tasks)
{
var result = await task;
Console.WriteLine(result);
}
We kick off all of the tasks first, then loop over them in order, awaiting each in turn. If the task being awaited has previously completed, the await just returns its result. Otherwise we wait until it completes.
Try a TransformBlock: by default it outputs the items it processes one by one, in the order they were received, even if the elements are processed in parallel.
public async Task Order()
{
var tBlock = new TransformBlock<int, string>(async x =>
{
await Task.Delay(100);
return x.ToString();
}, new ExecutionDataflowBlockOptions() { MaxDegreeOfParallelism = 10 });
var sub = tBlock.AsObservable().Subscribe(x => Console.Write(x));
foreach (var num in Enumerable.Range(0, 10))
{
tBlock.Post(num);
}
tBlock.Complete();
await tBlock.Completion;
sub.Dispose();
}
Output:
0123456789

My Thread Function Runs with doubled arguments

I'm creating threads in a foreach loop.
I pass an array value and a count to the threads, and want to see the list.
But my Thread[] runs with the same count argument, seemingly at random.
Also, T[0] doesn't terminate normally. I guess this is related to the argument overlapping problem too.
This causes the result panel to be placed on top of other panels.
Thread[] T = new Thread[VA.Count];
int count = 0;
ThreadEnd = new CountdownEvent(VA.Count);
foreach (var item in VA)
{
T[count] = new Thread(delegate () { SetResultBox(count, item); });
T[count].Start();
count++;
}
ThreadEnd.Wait();
private void SetResultBox(int RunCount, JToken item)
{
VideoJson videoinfo = new VideoJson();
videoinfo.title = item["snippet"]["title"].ToString();
videoinfo.description = item["snippet"]["description"].ToString();
videoinfo.ThumbnailURL = item["snippet"]["url"].ToString();
VideoArray.Add(videoinfo);
SearchResultControl SRC = new SearchResultControl(videoinfo);
SRC.Location = new Point(0, RunCount * 110);
ResultControlList.Add(SRC);
ThreadEnd.Signal();
}
I want to know why the SetResultBox function's argument gets overlapped.
The important thing is that I'd like to avoid the Join method; if the VA array gets bigger, this function becomes too slow with Join.
You are implicitly capturing the variable count, which makes the thread use whatever value count holds at the time the thread actually starts doing its job. The foreach loop most likely manages to iterate faster than the threads start. Declare a local variable to take a copy of the value as it is at that moment, and it should run fine:
foreach (var item in VA)
{
var currentCount = count;
T[count] = new Thread(delegate () { SetResultBox(currentCount, item); });
T[count].Start();
count++;
}
This is a very common problem. You need to create a dedicated variable for count to pass into the delegate / lambda:
foreach (var item in VA)
{
var count1 = count;
T[count] = new Thread(delegate () { SetResultBox(count1, item); });
T[count].Start();
count++;
}
