What would be the correct way to fire a list of tasks in parallel in a fire-and-forget manner?
What I've got below makes me believe that .WhenAll is blocking until everything is done.
I've got quite a few like these. I need to learn how to loop, store all the calls, and then fire them off so they all run at the same time; it does not matter which function gets called first or last.
What is the correct approach for this?
I wish MS would put a little more info into IntelliSense to help us out, because I have more needs for async calls, especially many work calls at one time that are all fire and forget.
Here is what I've got now.
public async static Task UpdateBayPositionAsync(string cadCNN, string bayPositions)
{
List<Task> myTask = new List<Task>();
string[] bps = bayPositions.Split(',');
for (int i = 0; i < bps.Length; i++)
{
// declared inside the loop so that each lambda captures its own bID and pos
int bID = int.Parse(bps[i].Split(':')[0]);
byte pos = byte.Parse(bps[i].Split(':')[1]);
myTask.Add(Task.Run(() => { ElevationManagerDL.UpdateBayPosition(cadCNN, bID, pos); }));
}
await Task.WhenAll(myTask);
}
It looks like you are interested in both asynchronicity and parallelism.
I would recommend handling the asynchronicity with a Task (not awaited) and the parallelism with Parallel.ForEach(..).
Parallel.ForEach partitions the positions across a limited number of worker tasks, which is usually much cheaper than creating one task per position,
especially if there are many positions; see Parallel.ForEach vs Task.Factory.StartNew.
Something like this:
public static void UpdateBayPositions(string cadCnn, string serializedBayPositions)
{
string[] bayPositionsAsStrings = serializedBayPositions.Split(',');
List<BayPosition> bayPositions = bayPositionsAsStrings.Select(bp => new BayPosition(cadCnn, bp)).ToList();
// Fire and forget: the task is deliberately not awaited, so the method returns immediately
// (and any exception thrown by Update() is never observed)
Task.Factory.StartNew( () => Parallel.ForEach(bayPositions, item => item.Update()));
}
public class BayPosition
{
public int BId { get; private set; }
public byte Pos { get; private set; }
public string CadCnn { get; private set; }
public BayPosition(string cadCnn, string bayPosition)
{
string[] parameters = bayPosition.Split(':');
BId = Int32.Parse(parameters[0]);
Pos = Byte.Parse(parameters[1]);
CadCnn = cadCnn;
}
public void Update()
{
ElevationManagerDL.UpdateBayPosition(CadCnn, BId, Pos);
}
}
And if you only want the parallelism and want to block until all updates have run, just replace:
Task.Factory.StartNew( () => Parallel.ForEach(bayPositions, item => item.Update()));
with
Parallel.ForEach(bayPositions, item => item.Update());
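One consequence of not awaiting the task is that an exception thrown by any Update() call is never observed. Below is a minimal sketch of one way to surface it, assuming a logging callback is acceptable; RunInBackground and onError are hypothetical names, not part of the answer above.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class BackgroundWork
{
    // Starts Parallel.ForEach on a thread-pool thread and returns at once.
    // Fire-and-forget callers simply ignore the returned Task; others may await it.
    // onError is a hypothetical logging hook: without a continuation like this,
    // an exception from a task nobody awaits is never observed.
    public static Task RunInBackground<T>(IEnumerable<T> items, Action<T> body, Action<Exception> onError)
    {
        Task work = Task.Run(() => Parallel.ForEach(items, body));
        // This continuation runs only if the work faulted.
        work.ContinueWith(t => onError(t.Exception), TaskContinuationOptions.OnlyOnFaulted);
        return work;
    }
}
```

In the answer's code the call would look roughly like `BackgroundWork.RunInBackground(bayPositions, item => item.Update(), ex => Log(ex));`, where Log is whatever logging you already have.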
As part of best practices for async and await, it is recommended not to use Task.Run. I have a service that makes multiple calls to a third-party service, and we use async to make those calls. I'm looking for advice on improving the code below.
public interface IRouteService
{
Task<IEnumerable<Route>> GetRoute(Coordinates orign, Coordinates destination);
}
public class RouteProvider
{
private readonly IRouteService _routeService;
public RouteProvider(IRouteService routeService)
{
_routeService = routeService;
}
public async Task<IEnumerable<Route>> GetRoutes(IEnumerable<Coordinates> origns, IEnumerable<Coordinates> destinations)
{
ConcurrentBag<Route> routes = new ConcurrentBag<Route>();
List<Task> tasks = new List<Task>();
foreach (var origin in origns)
{
foreach (var destination in destinations)
{
tasks.Add(Task.Run(async () =>
{
var response= await _routeService.GetRoute(origin, destination);
foreach (var item in response)
{
routes.Add(item);
}
}));
}
}
Task.WaitAll(tasks.ToArray());
return routes;
}
}
public class Route
{
public string Distance { get; set; }
public Coordinates Origin { get; set; }
public object Destination { get; set; }
public string OriginName { get; set; }
public string DestinationName { get; set; }
}
public class Coordinates
{
public float Lat { get; set; }
public float Long { get; set; }
}
For a problem like this it is handy to use LINQ. Each call produces its own result and the results are combined at the end, so there is no shared mutable state and you don't need any specialized collections.
In general, using LINQ or similar programming techniques (i.e. thinking like a functional programmer) makes multithreading much easier.
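public async Task<IEnumerable<Route>> GetRoutes(IEnumerable<Coordinates> origins, IEnumerable<Coordinates> destinations)
{
// ToArray() runs the query once; enumerating the lazy query again would call GetRoute again for every pair
var tasks = origins
.SelectMany
(
o => destinations.Select
(
d => _routeService.GetRoute(o, d)
)
)
.ToArray();
// WhenAll returns the results in the same order as the tasks
var results = await Task.WhenAll( tasks );
return results.SelectMany( routes => routes );
}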
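The ToArray() call matters because a LINQ query is lazy: enumerating it a second time runs the selectors again and calls the service again. Here is a small runnable sketch to check that each pair is requested exactly once and that the results keep the input order. FakeRouteService and RouteQueries are hypothetical stand-ins, with strings in place of the Route class.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for IRouteService: returns one string per call and counts calls.
public class FakeRouteService
{
    private int _calls;
    public int Calls => Volatile.Read(ref _calls);

    public async Task<IEnumerable<string>> GetRoute(int origin, int destination)
    {
        Interlocked.Increment(ref _calls);
        await Task.Delay(10); // simulate network latency
        return new[] { $"{origin}->{destination}" };
    }
}

public static class RouteQueries
{
    public static async Task<IEnumerable<string>> GetRoutes(
        FakeRouteService service, IEnumerable<int> origins, IEnumerable<int> destinations)
    {
        // ToArray() starts every call exactly once and keeps the resulting tasks.
        Task<IEnumerable<string>>[] tasks = origins
            .SelectMany(o => destinations.Select(d => service.GetRoute(o, d)))
            .ToArray();
        // The WhenAll result array is in the same order as the tasks array.
        IEnumerable<string>[] results = await Task.WhenAll(tasks);
        return results.SelectMany(r => r);
    }
}
```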
As pointed out in the comments, you could use Task.WhenAll() to wait for all the tasks to complete and get their results with await Task.WhenAll(tasks);. To do that, you can update your code as shown below.
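public async Task<IEnumerable<Route>> GetRoutes(IEnumerable<Coordinates> origns, IEnumerable<Coordinates> destinations)
{
List<Task<IEnumerable<Route>>> tasks = new List<Task<IEnumerable<Route>>>();
foreach (var origin in origns)
{
foreach (var destination in destinations)
{
// start every call without awaiting it, so they all run concurrently
tasks.Add(_routeService.GetRoute(origin, destination));
}
}
// one IEnumerable<Route> per call, in the same order as tasks
IEnumerable<Route>[] responses = await Task.WhenAll(tasks);
// this runs after every call has finished, so a plain List is safe here
List<Route> routes = new List<Route>();
foreach (var response in responses)
{
routes.AddRange(response);
}
return routes;
}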
Since all the calls return the same type, you can collect their results after the loops instead of inside each task. Also, this way you avoid blocking a thread with Task.WaitAll(), and your method runs truly asynchronously. To see the difference between WhenAll() and WaitAll(), you can check this out.
Instead of directly creating tasks using the Task.Run method you can use continuations.
foreach (var origin in origns)
{
foreach (var destination in destinations)
{
tasks.Add(
_routeService.GetRoute(origin, destination)
.ContinueWith(response =>
{
foreach (var item in response.Result)
routes.Add(item);
})
);
}
}
This way, the GetRoute method runs asynchronously without a dedicated thread, and its result is processed in a separate task (typically on a thread-pool thread).
However, this is only necessary if processing the result takes a long time. Otherwise, a separate thread is not needed at all.
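To make the continuation approach concrete, here is a small runnable sketch. GetRoute is a hypothetical stand-in for the real service and strings replace the Route class. Note that the list holds the continuation tasks, so awaiting it guarantees every result is in the bag before the method returns.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class ContinuationDemo
{
    // Hypothetical stand-in for _routeService.GetRoute: completes asynchronously without a dedicated thread.
    private static async Task<IEnumerable<string>> GetRoute(int origin, int destination)
    {
        await Task.Delay(10);
        return new[] { $"{origin}->{destination}" };
    }

    public static async Task<ConcurrentBag<string>> CollectRoutes(IEnumerable<int> origins, IEnumerable<int> destinations)
    {
        var routes = new ConcurrentBag<string>();
        var tasks = new List<Task>();
        foreach (var origin in origins)
        {
            foreach (var destination in destinations)
            {
                // Store the continuation, not the GetRoute task itself.
                tasks.Add(GetRoute(origin, destination).ContinueWith(response =>
                {
                    // response.Result rethrows (wrapped) if GetRoute failed.
                    foreach (var item in response.Result)
                        routes.Add(item);
                }));
            }
        }
        await Task.WhenAll(tasks);
        return routes;
    }
}
```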
I've been measuring how long functions take to execute in my code, as practice, to see where I can optimize. Right now I use a helper class that is essentially a stopwatch with a message. The goal is to be able to wrap whatever method call I want in the helper and get its duration.
public class StopwatcherData
{
public long Time { get; set; }
public string Message { get; set; }
public StopwatcherData(long time, string message)
{
Time = time;
Message = message;
}
}
public class Stopwatcher
{
public delegate void CompletedCallBack(string result);
public static List<StopwatcherData> Data { get; set; }
private static Stopwatch stopwatch { get; set;}
public Stopwatcher()
{
Data = new List<StopwatcherData>();
stopwatch = new Stopwatch();
stopwatch.Start();
}
public static void Click(string message)
{
Data.Add(new StopwatcherData(stopwatch.ElapsedMilliseconds, message));
}
public static void Reset()
{
stopwatch.Reset();
stopwatch.Start();
}
}
Right now, to use this, I have to call Reset before the function I want to measure so that the timer restarts, and then call Click after it.
Stopwatcher.Reset();
MyFunction();
Stopwatcher.Click("MyFunction");
I've read a bit about delegates and actions, but I'm unsure of how to apply them to this situation. Ideally, I would pass the function as part of the Stopwatcher call.
//End Goal:
Stopwatcher.Track(MyFunction, "MyFunction Time");
Any help is welcome.
It's not really a good idea to profile your application like that, but if you insist, you can at least make some improvements.
First, don't reuse a Stopwatch; just create a new one every time you need it.
Second, you need to handle two cases: one where the delegate you pass returns a value and one where it does not.
Since your Track method is static, it's common practice to make it thread-safe; non-thread-safe static methods are quite a bad idea. For that, you can store your messages in a thread-safe collection like ConcurrentBag, or just take a lock every time you add an item to your list.
In the end you can have something like this:
public class Stopwatcher {
private static readonly ConcurrentBag<StopwatcherData> _data = new ConcurrentBag<StopwatcherData>();
public static void Track(Action action, string message) {
var w = Stopwatch.StartNew();
try {
action();
}
finally {
w.Stop();
_data.Add(new StopwatcherData(w.ElapsedMilliseconds, message));
}
}
public static T Track<T>(Func<T> func, string message) {
var w = Stopwatch.StartNew();
try {
return func();
}
finally {
w.Stop();
_data.Add(new StopwatcherData(w.ElapsedMilliseconds, message));
}
}
}
And use it like this:
Stopwatcher.Track(() => SomeAction(param1), "test");
bool result = Stopwatcher.Track(() => SomeFunc(param2), "test");
If you are going to use this with async delegates (which return Task or Task<T>), you need to add two more overloads for that case.
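The two async overloads might look like the sketch below. AsyncStopwatcher and TrackAsync are hypothetical names, and StopwatcherData is repeated from the question so the block compiles on its own. The important detail is that the delegate is awaited inside the try, so the stopwatch stops when the task completes rather than when it is merely returned.

```csharp
using System;
using System.Collections.Concurrent;
using System.Diagnostics;
using System.Threading.Tasks;

public class StopwatcherData
{
    public long Time { get; }
    public string Message { get; }
    public StopwatcherData(long time, string message) { Time = time; Message = message; }
}

public static class AsyncStopwatcher
{
    public static readonly ConcurrentBag<StopwatcherData> Data = new ConcurrentBag<StopwatcherData>();

    // Times an async operation from the call until the returned Task completes.
    public static async Task TrackAsync(Func<Task> action, string message)
    {
        var w = Stopwatch.StartNew();
        try
        {
            await action();
        }
        finally
        {
            w.Stop();
            Data.Add(new StopwatcherData(w.ElapsedMilliseconds, message));
        }
    }

    // Same, for async operations that produce a result.
    public static async Task<T> TrackAsync<T>(Func<Task<T>> func, string message)
    {
        var w = Stopwatch.StartNew();
        try
        {
            return await func();
        }
        finally
        {
            w.Stop();
            Data.Add(new StopwatcherData(w.ElapsedMilliseconds, message));
        }
    }
}
```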
Yes, you can create a timer function that accepts any action as a delegate. Try this block:
public static long TimeAction(Action action)
{
var timer = new Stopwatch();
timer.Start();
action();
timer.Stop();
return timer.ElapsedMilliseconds;
}
This can be used like this:
var elapsedMilliseconds = TimeAction(() => MyFunc(param1, param2));
This is a bit more awkward if your wrapped function returns a value, but you can deal with this by assigning a variable from within the closure, like this:
bool isSuccess = false; // must be initialized: the compiler can't tell that the lambda runs
var elapsedMilliseconds = TimeAction(() => {
isSuccess = MyFunc(param1, param2);
});
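Another way around the closure variable is a second, generic overload that returns the function's result and hands back the duration through an out parameter. A minimal sketch; TimeFunc is a hypothetical name, not part of the answer.

```csharp
using System;
using System.Diagnostics;

public static class Timing
{
    public static long TimeAction(Action action)
    {
        var timer = Stopwatch.StartNew();
        action();
        timer.Stop();
        return timer.ElapsedMilliseconds;
    }

    // Hypothetical overload for functions with a result: the result is returned
    // and the duration comes back through an out parameter, so no closure variable is needed.
    public static T TimeFunc<T>(Func<T> func, out long elapsedMilliseconds)
    {
        var timer = Stopwatch.StartNew();
        T result = func();
        timer.Stop();
        elapsedMilliseconds = timer.ElapsedMilliseconds;
        return result;
    }
}
```

With it, the example above becomes `bool isSuccess = Timing.TimeFunc(() => MyFunc(param1, param2), out long elapsedMilliseconds);`.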
I had this problem a while ago as well and was always afraid that I'd introduce errors when changing Stopwatcher.Track(() => SomeFunc(), "test") (see Evk's answer) back to SomeFunc(). So I thought about something that wraps the code without changing it!
I came up with a using block, which is certainly not its intended purpose.
public class OneTimeStopwatch : IDisposable
{
private string _logPath = "C:\\Temp\\OneTimeStopwatch.log";
private readonly string _itemname;
private System.Diagnostics.Stopwatch sw = new System.Diagnostics.Stopwatch();
public OneTimeStopwatch(string itemname)
{
_itemname = itemname;
sw.Start();
}
public void Dispose()
{
sw.Stop();
System.IO.File.AppendAllText(_logPath, $"{_itemname}: {sw.ElapsedMilliseconds}ms{Environment.NewLine}");
}
}
It can be used like this:
using (new OneTimeStopwatch("test"))
{
//some sensible code not to touch
System.Threading.Thread.Sleep(1000);
}
// the log file gets a line like "test: 1000ms"
I only need to remove 2 lines (and auto-format) to make the code normal again.
Plus, I can easily wrap multiple lines here, which isn't possible in the other approach without defining new functions.
Again, this is not recommended for measuring durations of a few milliseconds.
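The same using trick can be made reusable without a hard-coded log path. A sketch, assuming a callback is an acceptable reporting mechanism; TimingScope and its constructor parameters are hypothetical names.

```csharp
using System;
using System.Diagnostics;

// Variation on OneTimeStopwatch: instead of appending to a fixed file,
// it hands the name and elapsed time to a caller-supplied callback.
public sealed class TimingScope : IDisposable
{
    private readonly string _name;
    private readonly Action<string, long> _report;
    private readonly Stopwatch _sw = Stopwatch.StartNew(); // starts when the scope is created
    private bool _disposed;

    public TimingScope(string name, Action<string, long> report)
    {
        _name = name;
        _report = report;
    }

    public void Dispose()
    {
        if (_disposed) return; // report only once even if Dispose is called twice
        _disposed = true;
        _sw.Stop();
        _report(_name, _sw.ElapsedMilliseconds);
    }
}
```

Usage stays a two-line wrapper, for example `using (new TimingScope("test", (n, ms) => Console.WriteLine($"{n}: {ms}ms"))) { ... }`.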
Here's what I'm trying to do:
Keep a queue in memory of items that need to be processed (i.e. IsProcessed = 0)
Every 5 seconds, get unprocessed items from the db, and if they're not already in the queue, add them
Continuously pull items from the queue, process them, and each time an item is processed, update it in the db (IsProcessed = 1)
Do this all "as parallel as possible"
I have a constructor for my service like
public MyService()
{
Ticker.Elapsed += FillQueue;
}
and I start that timer when the service starts like
protected override void OnStart(string[] args)
{
Ticker.Enabled = true;
Task.Run(() => { ConsumeWork(); });
}
and my FillQueue is like
private static async void FillQueue(object source, ElapsedEventArgs e)
{
var items = GetUnprocessedItemsFromDb();
foreach(var item in items)
{
if(!Work.Contains(item))
{
Work.Enqueue(item);
}
}
}
and my ConsumeWork is like
private static void ConsumeWork()
{
while(true)
{
if(Work.Count > 0)
{
var item = Work.Peek();
Process(item);
Work.Dequeue();
}
else
{
Thread.Sleep(500);
}
}
}
However, this is probably a naive implementation, and I'm wondering whether .NET has a class designed for exactly this kind of situation.
Though @JSteward's answer is a good start, you can improve it by combining TPL Dataflow with the Rx.NET extensions: a dataflow block can easily become an observer of your data, and with the Rx timer it will take much less effort (Rx.Timer explanation).
We can adapt the MSDN article to your needs, like this:
private const int EventIntervalInSeconds = 5;
private const int DueIntervalInSeconds = 60;
var source =
// sequence of Int64 numbers, starting from 0
// https://msdn.microsoft.com/en-us/library/hh229435.aspx
Observable.Timer(
// fire first event after 1 minute waiting
TimeSpan.FromSeconds(DueIntervalInSeconds),
// fire all next events each 5 seconds
TimeSpan.FromSeconds(EventIntervalInSeconds))
// each number will have a timestamp
.Timestamp()
// each time we select some items to process
.SelectMany(GetItemsFromDB)
// filter already added
.Where(i => !_processedItemIds.ContainsKey(i.Id));
var action = new ActionBlock<Item>(ProcessItem, new ExecutionDataflowBlockOptions
{
// process as many items at once as there are processors
MaxDegreeOfParallelism = Environment.ProcessorCount,
});
IDisposable subscription = source.Subscribe(action.AsObserver());
Also, your check for whether an item has already been processed isn't quite accurate: an item can be selected as unprocessed from the db right after you've finished processing it but before you've updated it in the database. In that case the item has already been removed from the Queue<T>, and the producer will add it there again. This is why I've added a thread-safe cache of item ids to this solution (HashSet<T> isn't thread-safe, and ConcurrentBag<T> can't remove a specific item, so a ConcurrentDictionary<Guid, byte> is used as a concurrent set):
private static async Task ProcessItem(Item item)
{
// TryAdd is atomic, so two concurrent calls can't both claim the same item
if (!_processedItemIds.TryAdd(item.Id, 0))
{
return;
}
// actual work here
// save item as processed in database
// wait so that the item can't reappear in the queue before the db update is visible
await Task.Delay(TimeSpan.FromSeconds(EventIntervalInSeconds * 2));
// clear the processed cache to reduce memory usage
_processedItemIds.TryRemove(item.Id, out _);
}
public class Item
{
public Guid Id { get; set; }
}
// temporary cache for items in process (only the keys matter; the byte value is unused)
private static readonly ConcurrentDictionary<Guid, byte> _processedItemIds = new ConcurrentDictionary<Guid, byte>();
private static IEnumerable<Item> GetItemsFromDB(Timestamped<long> time)
{
// log event timing
Console.WriteLine($"Event # {time.Value} at {time.Timestamp}");
// return items from DB
return new[] { new Item { Id = Guid.NewGuid() } };
}
You can implement cache cleanup in another way, for example by starting a "GC" timer that removes processed items from the cache on a regular basis.
To stop events and processing items you should Dispose the subscription and, maybe, Complete the ActionBlock:
subscription.Dispose();
action.Complete();
You can find more information about Rx.Net in their guidelines on github.
You could use an ActionBlock to do your processing; it has a built-in queue that you can post work to. You can read up on TPL Dataflow here: Intro to TPL-Dataflow, and also Introduction to Dataflow, Part 1. Finally, here is a quick sample to get you going. I've left out a lot, but it should at least get you started.
using System;
using System.Threading;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;
namespace MyWorkProcessor {
public class WorkProcessor {
public WorkProcessor() {
Processor = CreatePipeline();
}
public async Task StartProcessing() {
try {
await Task.Run(() => GetWorkFromDatabase());
} catch (OperationCanceledException) {
//handle cancel
}
}
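// initialized here so it already exists when the constructor calls CreatePipeline()
private CancellationTokenSource cts {
get;
} = new CancellationTokenSource();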
private ITargetBlock<WorkItem> Processor {
get;
}
private TimeSpan DatabasePollingFrequency {
get;
} = TimeSpan.FromSeconds(5);
private ITargetBlock<WorkItem> CreatePipeline() {
var options = new ExecutionDataflowBlockOptions() {
BoundedCapacity = 100,
CancellationToken = cts.Token
};
return new ActionBlock<WorkItem>(item => ProcessWork(item), options);
}
private async Task GetWorkFromDatabase() {
while (!cts.IsCancellationRequested) {
var work = await GetWork();
await Processor.SendAsync(work);
await Task.Delay(DatabasePollingFrequency, cts.Token);
}
}
private async Task<WorkItem> GetWork() {
// Context is a placeholder for your own data-access layer
return await Context.GetWork();
}
private void ProcessWork(WorkItem item) {
//do processing
}
}
}
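A stripped-down, runnable illustration of the ActionBlock pattern the sample relies on: send items, call Complete(), then await Completion to know the queue has drained. ProcessAll and the Task.Delay placeholder for real work are assumptions, not part of the answer; depending on your target framework you may need the System.Threading.Tasks.Dataflow NuGet package.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class ActionBlockDemo
{
    // Sends every id to an ActionBlock and waits until all of them have been processed.
    // BoundedCapacity makes SendAsync wait while the internal queue is full (back-pressure),
    // and MaxDegreeOfParallelism lets several items be processed at once.
    public static async Task<int> ProcessAll(int[] ids, int maxParallel)
    {
        int processed = 0;
        var block = new ActionBlock<int>(async id =>
        {
            await Task.Delay(5); // placeholder for the real work
            Interlocked.Increment(ref processed);
        },
        new ExecutionDataflowBlockOptions
        {
            BoundedCapacity = 10,
            MaxDegreeOfParallelism = maxParallel
        });

        foreach (var id in ids)
        {
            await block.SendAsync(id);
        }
        block.Complete();       // no more items will be sent
        await block.Completion; // wait for the queue to drain
        return processed;
    }
}
```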
I wrote a console application for downloading YouTube preview images, but I think this program is running synchronously instead of asynchronously. What did I do wrong, and how do I download multiple files from the web concurrently with async/await?
using System;
using System.IO;
using System.Net.Http;
using System.Text.RegularExpressions;
using System.Threading.Tasks;
namespace YoutubePreviewer
{
class Node
{
public string Path { get; private set; }
public string Title { get; private set; }
public string Source { get; private set; }
public string Id { get; private set; }
public Previews Previews { get; set; }
public Node(string p, string t, string s, string i)
{
Path = p;
Title = t;
Source = s;
Id = i;
}
}
class Previews
{
public string[] Urls { get; private set; }
public static Previews Get(Node n)
{
string[] resolutions = {"default", "hqdefault", "mqdefault", "maxresdefault"};
for (int i = 0; i < resolutions.Length; i++)
{
string end = resolutions[i] + ".jpg";
resolutions[i] = "https://img.youtube.com/vi/" + n.Id + "/" + resolutions[i] + ".jpg";
}
Previews pr = new Previews();
pr.Urls = resolutions;
return pr;
}
}
static class Operations
{
public static async Task<string> DownloadUrl(string address)
{
HttpClient http = new HttpClient();
return await http.GetStringAsync(address);
}
public static async Task<Node> Build(string url)
{
var source = await Operations.DownloadUrl(url);
var title = Regex.Match(source, "<title>(.*)</title>").Groups[1].Value;
var id = Regex.Match(url, @"watch\?v=(.+)").Groups[1].Value;
Node node = new Node(url, title, source, id);
node.Previews = await Task<Previews>.Factory.StartNew(() => Previews.Get(node));
return node;
}
public static async Task WriteToDisk(Node n, string path = "C:/Downloads")
{
Console.WriteLine($"Starting downloading {n.Path} previews");
var securedName = string.Join("_", n.Title.Split(Path.GetInvalidFileNameChars()));
Directory.CreateDirectory(Path.Combine(path, securedName));
HttpClient http = new HttpClient();
foreach (var preview in n.Previews.Urls)
{
try
{
var arr = await http.GetByteArrayAsync(preview);
await Task.Delay(100);
string name = preview.Substring(preview.LastIndexOf("/") + 1);
using (FileStream fs = new FileStream(Path.Combine(path, securedName, name), FileMode.Create,
FileAccess.ReadWrite))
{
await fs.WriteAsync(arr, 0, arr.Length);
}
}
catch (Exception e)
{
Console.WriteLine($"Can't download and save preview {preview}");
Console.WriteLine(e.Message);
Console.WriteLine(new string('*', 12));
}
Console.WriteLine($"{preview} is saved!");
}
}
public static async Task Load(params string[] urls)
{
foreach (var url in urls)
{
Node n = await Build(url);
await WriteToDisk(n);
}
}
}
class Program
{
static void Main(string[] args)
{
Task t= Operations.Load(File.ReadAllLines("data.txt"));
Task.WaitAll(t);
Console.WriteLine("Done");
Console.ReadKey();
}
}
}
Your code is downloading URLs and writing them to disk one at a time. It is operating asynchronously, but serially.
If you want it to run asynchronously and concurrently, then you should be using something like Task.WhenAll:
public static async Task LoadAsync(params string[] urls)
{
var tasks = urls.Select(url => WriteToDisk(Build(url)));
await Task.WhenAll(tasks);
}
(This code assumes that Build is a synchronous method, which it should be).
There are also a number of unrelated issues that jump out:
node.Previews = await Task<Previews>.Factory.StartNew(() => Previews.Get(node)); is sending trivial work to the thread pool for no real reason. It should be node.Previews = Previews.Get(node);.
This means that Operations.Build doesn't need to be async, and indeed it shouldn't be.
You should be using a single shared instance of HttpClient rather than creating a new one for each request.
Task.WaitAll(t); is quite odd. It can be just t.Wait();.
await Task.Delay(100); is also unusual.
To add to @Stephen Cleary's excellent answer: as he said, this is technically running asynchronously, but that isn't actually helping you because it does things serially. It is asynchronous, but the performance is no better than if it were just running synchronously.
The key thing to remember here is that async/await will only help you if it actually allows the machine to do more work than it would have done otherwise in a certain amount of time (or if it allows the machine to finish a certain set of tasks faster).
Just to use my favorite analogy: suppose that you're at a restaurant with 9 other people. When the waiter comes by to take orders, the first guy he calls on isn't ready. Clearly, the most efficient thing to do would be to take the order of the other 9 people and then come back to him. Suppose, however, the first guy said, "it's OK to come back to me later, as long as you wait for me to be ready to order first." (This is essentially what you have above - "it's OK to come back to my method to process the download later, as long as you wait for me to finish the download first"). This analogy isn't perfect by any means, but I think that captures the essence of what needs to happen here.
The key thing to remember is that there's only an improvement here if the waiter can accomplish more in the same amount of time or can accomplish a certain set of tasks faster. In this case, he only saves time if he decreases the total amount of time that he spends taking the table's order.
One other thing to remember: it's acceptable to do something like Task.WaitAll(...) in a console application (as long as you're not using a synchronization context) but you want to make sure you don't do something like that in a WPF application or something else with a synchronization context as that could cause a deadlock.
It's very important to control concurrency, so that you use the network channel efficiently and don't get throttled. I would suggest using the AsyncEnumerator NuGet package with code like this:
using System.Collections.Async;
static class Operations
{
public static async Task Load(params string[] urls)
{
await urls.ParallelForEachAsync(
async url =>
{
Node n = await Build(url);
await WriteToDisk(n);
},
maxDegreeOfParallelism: 10);
}
}
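If you'd rather not take a dependency, the same idea (a cap on how many downloads run at once) can be sketched with SemaphoreSlim and Task.WhenAll. ForEachAsync here is a hypothetical helper, not part of the AsyncEnumerator package; in the question's code the body would be `async url => { Node n = await Build(url); await WriteToDisk(n); }`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class Throttled
{
    // Runs body for every item, but never more than maxDegreeOfParallelism at once.
    public static async Task ForEachAsync<T>(IEnumerable<T> items, Func<T, Task> body, int maxDegreeOfParallelism)
    {
        if (maxDegreeOfParallelism < 1)
            throw new ArgumentOutOfRangeException(nameof(maxDegreeOfParallelism));

        using (var gate = new SemaphoreSlim(maxDegreeOfParallelism))
        {
            // ToList() starts one wrapper per item; each waits for a free slot before running body.
            var tasks = items.Select(async item =>
            {
                await gate.WaitAsync();
                try
                {
                    await body(item);
                }
                finally
                {
                    gate.Release();
                }
            }).ToList();

            await Task.WhenAll(tasks);
        }
    }
}
```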
I'm unsure whether my multithreaded code is correct, because sometimes I get wrong results.
It looks like it might fail. Below is the code.
public class MyKeyValue
{
public double Key { get; set; }
public double Value { get; set; }
}
public class CollMyKeyValue : List<MyKeyValue>
{
public void SumUpValues(CollMyKeyValue collection)
{
int count =0;
Parallel.For(count, this.Count,
(i) =>
{
this[count].Value = this[count].Value + collection[count].Value;
Interlocked.Increment(ref count);
});
}
}
Assume the keys are the same in both collections.
I want to add the values of one collection into the other. Is this thread-safe?
I have not put this[count].Value = this[count].Value + collection[count].Value; in a thread-safe block.
Just remove the Interlocked.Increment and use the loop index i:
public void SumUpValues(CollMyKeyValue collection)
{
//int count =0;
Parallel.For(0, this.Count,
(i) =>
{
this[i].Value = this[i].Value + collection[i].Value;
//Interlocked.Increment(ref count);
});
}
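Your version alters the index variable inside the loop. The For loop does this automatically; in the parallel version each thread gets an i (or a range of i values) to process, so incrementing a shared counter inside the loop makes no sense: several iterations can read the same count, so some elements are updated twice and others not at all.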
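Why the fixed loop is safe can be shown in isolation: each iteration reads and writes only the element at its own index i, so no two iterations touch the same data. A minimal sketch on plain double[] arrays standing in for CollMyKeyValue; ElementwiseSum and AddInto are hypothetical names. The same reasoning applies to the list version as long as no MyKeyValue instance appears at two different indexes.

```csharp
using System;
using System.Threading.Tasks;

public static class ElementwiseSum
{
    // Adds source[i] into target[i] for every i, in parallel and without locks.
    // Safe because each iteration only touches index i.
    public static void AddInto(double[] target, double[] source)
    {
        if (source.Length < target.Length)
            throw new ArgumentException("source must have at least as many elements as target");

        Parallel.For(0, target.Length, i =>
        {
            target[i] += source[i];
        });
    }
}
```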
I'm not sure what you're trying to do, but I guess you mean this:
public void SumUpValues(CollMyKeyValue collection)
{
Parallel.For(0, this.Count, (i) =>
{
this[i].Value += collection[i].Value;
});
}
The first parameter tells Parallel.For where to start, so altering it makes no sense. You get i as the parameter of the loop body, and it tells you which iteration you're in.