There is a method which gets a server response with
var response = (HttpWebResponse)await request.GetResponseAsync();
There's additional code to set up the request. I wrapped all of this in an async method that takes a URL as a parameter. That method is in turn called from another method, which constructs the actual URL.
Imagine the final method call looks like this:
string resultString = await GetResultString(parameter);
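For illustration, GetResultString could be assumed to look roughly like this (a sketch only: BuildUrl and GetResponseBodyAsync are placeholder names, and the real request setup is omitted):
// Rough sketch only: BuildUrl and GetResponseBodyAsync are placeholder names
// for the actual methods; the real request setup code is omitted.
async Task<string> GetResultString(string parameter)
{
    string url = BuildUrl(parameter);        // constructs the actual url
    return await GetResponseBodyAsync(url);
}

async Task<string> GetResponseBodyAsync(string url)
{
    var request = (HttpWebRequest)WebRequest.Create(url);
    // ...additional request setup goes here...
    using (var response = (HttpWebResponse)await request.GetResponseAsync())
    using (var reader = new StreamReader(response.GetResponseStream()))
    {
        return await reader.ReadToEndAsync();
    }
}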
Then the following problem occurred: I want to send multiple requests at the same time. I have a list of parameters, so I did it like this:
var tasks = new List<Task<string>>();
foreach (var parameter in parameters)
{
tasks.Add(GetResultString(parameter));
}
string[] resultStrings = await TaskEx.WhenAll(tasks);
That also works fine. But the number of requests is variable: it could be 10 requests, but it could also be 10,000 or even 100,000. So I thought about monitoring the progress in a progress bar. But how could I do this? I already had a look at this piece of code, but I can't see how I could use it for my code. I need to raise an event every time a task has finished. But how, when there is no callback for a task finishing?
Thanks and regards
PS: Could somebody tell me how to copy & paste code without putting four spaces in front of every single line of code? I couldn't get any help out of the advanced help; I just can't get it to work. And I'm sorry that my English is not perfect. I'm German. :-)
Well, first you have to ask yourself what you want. If you have 10,000 requests, and you are able to actually get them to await properly, then they're all in progress. In this particular case, you don't know how far along each task is, just how many tasks are completed.
If you have 5 tasks, then a progress time line might look like this:
0------------------------------------------1-23-4---5
Do you want it to say 1/5 completed? That seems misleading, because by then you're probably almost done.
But other than that, the link you mentioned should cover what it is you want. Why didn't it work?
@McKay and craig1231: you were right. The code I provided in the link was perfectly fine, except that it uses an old AsyncCtpLib (well, the post was published at the end of 2010). The code needed to be reworked a bit, but I had solved this even before asking this question here.
My problem was just that I didn't understand when the ProgressChanged event gets raised.
Basically, when you instantiate this class [Progress<T>, not EventProgress<T> as of the newer AsyncCtpLib releases], it captures the current thread's SynchronizationContext. Then, each time Report is called from inside the TAP method, it raises the ProgressChanged event on the right thread.
The answer was right there. But thanks, anyway. :)
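To make that concrete, here is a minimal sketch of per-task progress reporting with Progress<T> (hypothetical names: GetAllResultsAsync and progressBar; assumes System.Linq, System.Threading and System.Threading.Tasks are imported, and on the older CTP/Bcl.Async packages TaskEx.WhenAll would replace Task.WhenAll):
async Task<string[]> GetAllResultsAsync(IEnumerable<string> parameters)
{
    // Progress<T> captures the UI thread's SynchronizationContext here,
    // so ProgressChanged is raised on the UI thread.
    // progressBar.Maximum should be set to the number of requests beforehand.
    var progress = new Progress<int>();
    progress.ProgressChanged += (s, completed) => progressBar.Value = completed;

    int done = 0;
    var tasks = parameters.Select(async p =>
    {
        string result = await GetResultString(p);
        // Report is implemented explicitly on IProgress<T>, hence the cast
        ((IProgress<int>)progress).Report(Interlocked.Increment(ref done));
        return result;
    }).ToList();

    return await Task.WhenAll(tasks);
}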
I'm running in circles with this one. I have some tasks on an HttpClient (.NET 4 with the HttpClient package from NuGet). In one of them I'm trying to assign a value to a variable that I declared OUTSIDE the task, at the beginning of the function, but when the execution gets to that point, the variable has lost the assigned value and gone back to its initial value, as if it never changed. But I'm pretty sure it DID change at some moment, when the execution passed through the task.
I've made this screenshot to show it more easily:
What should I do to make my xmlString KEEP the value that was assigned to it inside the task, and use it OUTSIDE the task?
Thanks in advance for your help guys.
Judging by your screenshot (it would be better if you provided the code in your question as well) you are never awaiting your task. Therefore, your last usage where you obtain the value of xmlString happens before your task has finished executing, and presumably before your .ContinueWith() has assigned the variable.
Ideally, your enclosing method should be async as well. Then you can simply await it. Otherwise, you can try calling the .ContinueWith(...).Wait() method first, though at that point you're not leveraging async semantics at all.
Why don't you use await? It makes the code a lot cleaner.
Replace the client.GetAsync() line with the following:
HttpResponseMessage resp = await client.GetAsync(par);
Then move the try-catch code that used to live inside the Task to after the await. It should then work as you originally intended!
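A minimal sketch of what the whole method could then look like (assuming client is an HttpClient and par is the request URL; the names are illustrative, not the asker's exact code):
private async Task<string> LoadXmlStringAsync(string par)
{
    string xmlString = null;
    try
    {
        HttpResponseMessage resp = await client.GetAsync(par);
        xmlString = await resp.Content.ReadAsStringAsync();
    }
    catch (HttpRequestException ex)
    {
        // the error handling that used to live inside the ContinueWith goes here
        Console.WriteLine(ex.Message);
    }
    return xmlString; // the awaiting caller sees the assigned value
}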
EDIT:
Servy is half-right in the comments. Apart from the Microsoft.Net.HttpClient you will most probably need to manually add Microsoft.Bcl.Async too.
I'm trying to wrap my head around the new async/await functionality in combination with webapi.
What I would like to do is the following:
I'm receiving a (POST) request on a controller (A) on a certain endpoint.
This controller does nothing but send this request through to a second controller (B) on another endpoint (don't ask me why - just a thought experiment).
Let's say B does some long-running work (retrieve data from somewhere) and stores the result in the database.
I don't want to wait for the work on B before I can return an OK message for the request that was sent to A. I would like to be able to call A and get OK back ASAP while the work is still being completed on B.
Is this even possible? If so, what would be a correct way to design something like this?
Thanks for any pointers you might give me.
What you want is "fire and forget" semantics for your call.
If you're using .NET 4.5.2, you can use HostingEnvironment.QueueBackgroundWorkItem to register and queue work on a ASP.NET Threadpool thread.
If you're on a previous version, you can use the BackgroundTaskManager by Stephen Cleary.
Note this has nothing to do with async-await.
An example would look like this:
public HttpResponseMessage Foo()
{
HostingEnvironment.QueueBackgroundWorkItem(ct => { /* offloaded code here */ });
return Request.CreateResponse(HttpStatusCode.OK);
}
Just an assumption based on the async/await concept as explained by Jon Skeet. The async/await pattern uses continuations, which maintain a state machine. Basically, the framework executes the instructions up to the point where the method can no longer proceed synchronously.
When it encounters the awaited call, it invokes it, and if the result is readily available it goes on to the next steps and does the processing. If the result is not immediately available, it returns from the method, leaving the rest to be taken care of by the framework. So in your case, I think it would practically return immediately if you have no further code. But if you have any long-running code, the method should still return, leaving the job to the framework, although the result will come back later.
If your objective is to make sure you don't block the foreground, then this is already achieved with the first controller itself. But even in your case, the second controller wouldn't block the UI. As to whether this is right or wrong, I'm sure you have your reasons for considering such a pattern.
There are numerous experts here. Correct me if my interpretation is wrong, please. :)
You start the background work and respond immediately. The async pattern is only needed when you have to respond in some way to the result of the long-running task and want to prevent blocking the calling thread.
i.e:
public bool ControllerAMethod(byte[] somedata)
{
Task.Factory.StartNew(() =>
{
ControllerB.SendData(somedata);
});
return true;
}
You probably want to respond to the caller with a response code of 202 (Accepted) to indicate to the client that some other processing is ongoing.
Then if your client needs additional feedback it could do a long poll, or you could implement a websocket, etc.
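For illustration, here is the example above adjusted to acknowledge with 202 (assuming a Web API ApiController, so Request.CreateResponse is available):
public HttpResponseMessage ControllerAMethod(byte[] somedata)
{
    // kick the work off exactly as before, but acknowledge immediately with 202
    Task.Factory.StartNew(() => ControllerB.SendData(somedata));
    return Request.CreateResponse(HttpStatusCode.Accepted); // 202 Accepted
}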
I am having a bit of a conundrum here, and would like to know a couple of things:
Am I doing this wrong?
What is the expected behaviour of a BackgroundWorker in different scenarios?
If possible, an answer as to why I am getting this specific behaviour would be nice...
For point 1, and ultimately 3 as well, I will explain what I am doing in pseudo-code so that you have the details without me spitting out thousands of lines of code. While I write this post, I will look at the code itself to ensure that the information is accurate as far as when and what is happening. At the very end, I will also detail what is happening and why I am having issues.
Pseudo-Code details:
I have a main UI thread (WinForms form), where after selecting a few configuration options you click a button.
This button's event does some preliminary setup work in memory and on the file system to get things going and once that's done fires off ONE backgroundworker. This backgroundworker initializes 5 other backgroundworkers (form scope variables), sets their "Done" flags (bool - same scope) to true, sets their "Log" vars to a new List<LogEntry> (same scope) and once that's done calls a method called CheckEndConditions. This method call is done within the DoWork() of the initial backgroundworker, and not in the RunWorkerCompleted event.
The CheckEndConditions method does the following logic:
IF ALL "Done" vars are set to True...
Grab the "Log" vars for all 5 BWs and adds their content to a master log.
Reset the "Log" vars for all 5 BWs to a new List<LogEntry>
Reset the "Done" vars for all 5 BWs to False.
Call MoveToNextStep() method which returns an Enum value representative of the next step to perform
Based on the result of (5), grab a List<ActionFileAction> that needs to be processed
Check to ensure (6) has actions to perform
If NO, set ALL "Done" flags to true, and call itself to move to the next step...
If YES, partition this list of actions into 5 lists and place them in an array of List<ActionFileAction> called ThreadActionSets[]
Check EACH partitioned list for content, and if none, sets the "Done" flag for the respective thread to true (this ensures there are no "end race scenarios")
Fire off all 5 threads using RunWorkerAsync() (unless we are at the Finished step of course)
Return
Each BW has the exact same DoWork() code, which basically boils down to the following:
1. Do I have any actions to perform?
2. If NO, set my e.Result var to an empty list of log entries and exit.
3. If YES, loop over each action in the set and perform 4-5-6 below...
4. What context of action am I performing? (Groups, Modules, etc.)
5. Based on (4), what type of action am I performing? (Add, Delete, Modify)
6. Based on (5), perform the right action and log everything I do locally.
7. When all actions are done, set my e.Result var to the "log of everything I've done", and exit.
Each BW has the same RunWorkerCompleted() code, which basically boils down to the following:
TRY
From the e.Result var, grab the List<LogEntry> and put it in my respective thread's "Log" var.
Set my respective "Done" var to true
Call CheckEndConditions()
CATCH
Set my respective "Done" var to true
Call CheckEndConditions()
So that is basically it. In summary, I am splitting a huge number of actions into 5 partitions and sending those off to 5 threads to perform them at a faster rate than on a single thread.
The Problem
The problem I am having is that, regardless of how much thought I put into handling race scenarios (specifically end ones), I often end up with a jammed/non-responsive program.
In the beginning, I had set up my code inefficiently and the problem was with end race scenarios: the threads would complete so fast that the last call made to CheckEndConditions saw one of the "Done" vars still set to false, when in fact the work had completed... So I changed my code to what you see above which, I thought, would fix the problem, but it hasn't. The whole process still jams/falls asleep, and no threads are actually running any processing when this happens, which means (I think, but am not sure) that something went wrong with the last call to CheckEndConditions.
So my 1st question: Am I doing this wrong? What is the standard way of doing what I want to do? The logic of what I've done feels sound to me, but it doesn't behave how I expect it to, so maybe the logic isn't sound?
2nd question: What is the expected behaviour of a BW, when this scenario occurs:
An uncaught error occurred within the DoWork() method... does it still fire the RunWorkerCompleted() event? If not, what happens?
3rd question: Does anyone see something obvious as to why my problem is occurring?
Thanks for the help!
Reposting my comment as answer per OP's request:
The RunWorkerCompleted event will not necessarily be raised on the thread that created the worker (unless it was created on the UI thread). See the BackgroundWorker RunWorkerCompleted Event documentation.
See OP comments for more details.
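As a sketch of why that matters here (names such as workerDone, workerLogs and WorkerIndexOf are hypothetical, based on the question's description), the completion handler has to treat the shared flags as cross-thread state, and an uncaught DoWork exception shows up as e.Error rather than killing the worker:
private readonly object _sync = new object();

private void Worker_RunWorkerCompleted(object sender, RunWorkerCompletedEventArgs e)
{
    int index = WorkerIndexOf(sender); // hypothetical helper mapping a worker to its slot

    lock (_sync) // completions may arrive on different threads, so serialize the checks
    {
        if (e.Error == null)
        {
            workerLogs[index] = (List<LogEntry>)e.Result;
        }
        // an uncaught exception in DoWork is delivered via e.Error;
        // reading e.Result in that case would rethrow it
        workerDone[index] = true;
        CheckEndConditions();
    }
}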
I'm just beginning to learn C# threading and concurrent collections, and am not sure of the proper terminology to pose my question, so I'll describe briefly what I'm trying to do. My grasp of the subject is rudimentary at best at this point. Is my approach below even feasible as I've envisioned it?
I have 100,000 urls in a Concurrent collection that must be tested--is the link still good? I have another concurrent collection, initially empty, that will contain the subset of urls that an async request determines to have been moved (400, 404, etc errors).
I want to spawn as many of these async requests concurrently as my PC and our bandwidth will allow, and was going to start at 20 async-web-request-tasks per second and work my way up from there.
Would it work if a single async task handled both things: it would make the async request and then add the url to the BadUrls collection if it encountered a 4xx error? A new instance of that task would be spawned every 50ms:
class TestArgs {
    public ConcurrentBag<UrlInfo> myCollection { get; set; }
    public System.Uri currentUrl { get; set; }
}
ConcurrentQueue<UrlInfo> Urls = new ConcurrentQueue<UrlInfo>();
// populate the Urls queue
<snip>
// initialize the bad urls collection
ConcurrentBag<UrlInfo> BadUrls = new ConcurrentBag<UrlInfo>();
// timer fires every 50ms, whereupon a new args object is created
// and the timer callback spawns a new task; an autoEvent would
// reset the timer and dispose of it when the queue was empty
void SpawnNewUrlTask(){
// if queue is empty then reset the timer
// otherwise:
TestArgs args = new TestArgs {
    myCollection = BadUrls,
    currentUrl = getNextUrl() // take an item from the queue
};
Task.Factory.StartNew( asyncWebRequestAndConcurrentCollectionUpdater, args);
}
public async Task asyncWebRequestAndConcurrentCollectionUpdater(TestArgs args)
{
//make the async web request
// add the url to the bad collection if appropriate.
}
Feasible? Way off?
The approach seems fine, but there are some issues with the specific code you've shown.
But before I get to that, there have been suggestions in the comments that Task Parallelism is the way to go. I think that's misguided. There's a common misconception that if you want to have lots of work going on in parallel, you necessarily need lots of threads. That's only true if the work is compute-bound. But the work you're doing will be IO bound - this code is going to spend the vast majority of its time waiting for responses. It will do very little computation. So in practice, even if it only used a single thread, your initial target of 20 requests per second doesn't seem like a workload that would cause a single CPU core to break into a sweat.
In short, a single thread can handle very high levels of concurrent IO. You only need multiple threads if you need parallel execution of code, and that doesn't look likely to be the case here, because there's so little work for the CPU in this particular job.
(This misconception predates await and async by years. In fact, it predates the TPL - see http://www.interact-sw.co.uk/iangblog/2004/09/23/threadless for a .NET 1.1 era illustration of how you can handle thousands of concurrent requests with a tiny number of threads. The underlying principles still apply today because Windows networking IO still basically works the same way.)
Not that there's anything particularly wrong with using multiple threads here, I'm just pointing out that it's a bit of a distraction.
Anyway, back to your code. This line is problematic:
Task.Factory.StartNew( asyncWebRequestAndConcurrentCollectionUpdater, args);
While you've not given us all your code, I can't see how that will be able to compile. The overloads of StartNew that accept two arguments require the first to be either an Action, an Action<object>, a Func<TResult>, or a Func<object,TResult>. In other words, it has to be a method that either takes no arguments, or accepts a single argument of type object (and which may or may not return a value). Your 'asyncWebRequestAndConcurrentCollectionUpdater' takes an argument of type TestArgs.
But the fact that it doesn't compile isn't the main problem. That's easily fixed. (E.g., change it to Task.Factory.StartNew(() => asyncWebRequestAndConcurrentCollectionUpdater(args));) The real issue is what you're doing is a bit weird: you're using Task.StartNew to invoke a method that already returns a Task.
Task.StartNew is a handy way to take a synchronous method (i.e., one that doesn't return a Task) and run it in a non-blocking way. (It'll run on the thread pool.) But if you've got a method that already returns a Task, then you didn't really need to use Task.StartNew. The weirdness becomes more apparent if we look at what Task.StartNew returns (once you've fixed the compilation error):
Task<Task> t = Task.Factory.StartNew(
() => asyncWebRequestAndConcurrentCollectionUpdater(args));
That Task<Task> reveals what's happening. You've decided to wrap a method that was already asynchronous with a mechanism that is normally used to make non-asynchronous methods asynchronous. And so you've now got a Task that produces a Task.
One of the slightly surprising upshots of this is that if you were to wait for the task returned by StartNew to complete, the underlying work would not necessarily be done:
t.Wait(); // doesn't wait for asyncWebRequestAndConcurrentCollectionUpdater to finish!
All that will actually do is wait for asyncWebRequestAndConcurrentCollectionUpdater to return a Task. And since asyncWebRequestAndConcurrentCollectionUpdater is already an async method, it will return a task more or less immediately. (Specifically, it'll return a task the moment it performs an await that does not complete immediately.)
If you want to wait for the work you've kicked off to finish, you'll need to do this:
t.Result.Wait();
or, potentially more efficiently, this:
t.Unwrap().Wait();
That says: get me the Task that my async method returned, and then wait for that. This may not be usefully different from this much simpler code:
Task t = asyncWebRequestAndConcurrentCollectionUpdater("foo");
... maybe queue up some other tasks ...
t.Wait();
You may not have gained anything useful by introducing Task.Factory.StartNew.
I say "may" because there's an important qualification: it depends on the context in which you start the work. C# generates code which, by default, attempts to ensure that when an async method continues after an await, it does so in the same context in which the await was initially performed. E.g., if you're in a WPF app and you await while on the UI thread, when the code continues it will arrange to do so on the UI thread. (You can disable this with ConfigureAwait.)
So if you're in a situation in which the context is essentially serialized (either because it's single-threaded, as will be the case in a GUI app, or because it uses something resembling a rental model, e.g. the context of a particular ASP.NET request), it may actually be useful to kick an async task off via Task.Factory.StartNew because it enables you to escape the original context. However, you've just made your life harder - tracking your tasks to completion is somewhat more complex. And you might have been able to achieve the same effect simply by using ConfigureAwait inside your async method.
And it may not matter anyway - if you're only attempting to manage 20 requests a second, the minimal amount of CPU effort required to do that means that you can probably manage it entirely adequately on one thread. (Also, if this is a console app, the default context will come into play, which uses the thread pool, so your tasks will be able to run multithreaded in any case.)
But to get back to your question, it seems entirely reasonable to me to have a single async method that picks a url off the queue, makes the request, examines the response, and if necessary, adds an entry to the bad url collection. And kicking the things off from a timer also seems reasonable - that will throttle the rate at which connections are attempted without getting bogged down with slow responses (e.g., if a load of requests end up attempting to talk to servers that are offline). It might be necessary to introduce a cap for the maximum number of requests in flight if you hit some pathological case where you end up with tens of thousands of URLs in a row all pointing to a server that isn't responding. (On a related note, you'll need to make sure that you're not going to hit any per-client connection limits with whichever HTTP API you're using - that might end up throttling the effective throughput.)
You will need to add some sort of completion handling - just kicking off asynchronous operations and not doing anything to handle the results is bad practice, because you can end up with exceptions that have nowhere to go. (In .NET 4.0, these used to terminate your process, but as of .NET 4.5, by default an unhandled exception from an asynchronous operation will simply be ignored!) And if you end up deciding that it is worth launching via Task.Factory.StartNew remember that you've ended up with an extra layer of wrapping, so you'll need to do something like myTask.Unwrap().ContinueWith(...) to handle it correctly.
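To make that concrete, here is a minimal sketch of the single async method described above (Urls, BadUrls and UrlInfo are from the question; HttpClient, the Url property and the 4xx check are assumptions, and System.Net.Http is required):
private static readonly HttpClient client = new HttpClient();

async Task CheckNextUrlAsync()
{
    UrlInfo info;
    if (!Urls.TryDequeue(out info))
        return; // queue drained

    try
    {
        using (var response = await client.GetAsync(info.Url))
        {
            int status = (int)response.StatusCode;
            if (status >= 400 && status < 500)
                BadUrls.Add(info);
        }
    }
    catch (HttpRequestException)
    {
        BadUrls.Add(info); // unreachable hosts are treated as bad links too
    }
}
The timer callback can then call CheckNextUrlAsync() directly (keeping a reference to the returned Task if you want to observe completion), with no Task.Factory.StartNew wrapper needed.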
Of course you can. Concurrent collections are called 'concurrent' because they can be used... concurrently by multiple threads, with some guarantees about their behaviour.
A ConcurrentQueue will ensure that each element inserted into it is extracted exactly once (concurrent threads will never extract the same item by mistake, and once the queue is empty, all the items have been extracted by some thread).
EDIT: the only thing that could go wrong is that 50ms may not be enough to complete a request, so more and more tasks accumulate in the task queue. If that happens, your memory could fill up, but the thing would work anyway. So yes, it is feasible.
Anyway, I would like to underline the fact that a task is not a thread. Even if you create 100 tasks, the framework will decide how many of them will be actually executed concurrently.
If you want to have more control on the level of parallelism, you should use asynchronous requests.
In your comments, you wrote "async web request", but I can't tell whether you wrote async just because it runs on a different thread or because you intend to use the async API.
If you were using the async API, I'd expect to see some handler attached to the completion event, but I couldn't see it, so I assumed you're using synchronous requests issued from an asynchronous task.
If you're using asynchronous requests, then it's pointless to use tasks, just use the timer to issue the async requests, since they are already asynchronous.
When I say "asynchronous request" I'm referring to methods like WebRequest.GetResponseAsync and WebRequest.BeginGetResponse.
EDIT2: if you want to use asynchronous requests, then you can just make the requests from the timer handler. The BeginGetResponse method takes two arguments. The first one is a callback procedure that will be called to report the status of the request; you can pass the same procedure for all the requests. The second one is a user-provided object which stores state about the request; you can use this argument to differentiate between requests. You can even do it without the timer. Something like:
private readonly int desiredConcurrency = 20;
struct RequestData
{
public UrlInfo url;
public HttpWebRequest request;
}
/// Handles the completion of an asynchronous request
/// When a request has been completed,
/// tries to issue a new request to another url.
private void AsyncRequestHandler(IAsyncResult ar)
{
if (ar.IsCompleted)
{
RequestData data = (RequestData)ar.AsyncState;
HttpWebResponse resp = (HttpWebResponse)data.request.EndGetResponse(ar);
if (resp.StatusCode != HttpStatusCode.OK)
{
BadUrls.Add(data.url);
}
//A request has been completed, try to start a new one
TryIssueRequest();
}
}
/// If urls is not empty, dequeues a url from it
/// and issues a new request to the extracted url.
private bool TryIssueRequest()
{
RequestData rd;
if (urls.TryDequeue(out rd.url))
{
rd.request = CreateRequestTo(rd.url); //TODO implement
rd.request.BeginGetResponse(AsyncRequestHandler, rd);
return true;
}
else
{
return false;
}
}
//Called by a button handler, or something like that
void StartTheRequests()
{
for (int requestCount = 0; requestCount < desiredConcurrency; ++requestCount)
{
if (!TryIssueRequest()) break;
}
}
Good morning,
At the startup of the application I am writing I need to read about 1,600,000 entries from a file to a Dictionary<Tuple<String, String>, Int32>. It is taking about 4-5 seconds to build the whole structure using a BinaryReader (using a FileReader takes about the same time). I profiled the code and found that the function doing the most work in this process is BinaryReader.ReadString(). Although this process needs to be run only once and at startup, I would like to make it as quick as possible. Is there any way I can avoid BinaryReader.ReadString() and make this process faster?
Thank you very much.
Are you sure that you absolutely have to do this before continuing?
I would examine the possibility of hiving off the task to a separate thread which sets a flag when finished. Then your startup code simply kicks off that thread and continues on its merry way, pausing only when both:
the flag is not yet set; and
no more work can be done without the data.
Often, the illusion of speed is good enough, as anyone who has coded up a splash screen will tell you.
Another possibility, if you control the data, is to store it in a more binary form so you can just blat it all in with one hit (i.e., no interpretation of the data, just read in the whole thing). That, of course, makes it harder to edit the data from outside your application but you haven't stated that as a requirement.
If it is a requirement or you don't control the data, I'd still look into my first suggestion above.
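A minimal sketch of that first suggestion, using a Task as the "flag" (LoadLookup is a placeholder for the existing BinaryReader code):
// kick the load off in the background during startup...
Task<Dictionary<Tuple<string, string>, int>> loadTask =
    Task.Factory.StartNew(() => LoadLookup(path));

// ...continue with the rest of the startup, and only block when the data is first needed
Dictionary<Tuple<string, string>, int> lookup = loadTask.Result; // waits only if the load hasn't finished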
If you think that reading the file line by line is the bottleneck, and depending on its size, you can try to read it all at once:
// read the entire file at once
string entireFile = System.IO.File.ReadAllText(path);
If this doesn't help, you can try adding a separate thread with a semaphore, which would start reading in the background immediately when the program is started, but block the requesting thread the moment you try to access the data.
This is called a Future, and you have an implementation in Jon Skeet's miscutil library.
You call it like this at the app startup:
// following line invokes "DoTheActualWork" method on a background thread.
// DoTheActualWork returns an instance of MyData when it's done
Future<MyData> calculation = new Future<MyData>(() => DoTheActualWork(path));
And then, some time later, you can access the value in the main thread:
// following line blocks the calling thread until
// the background thread completes
MyData result = calculation.Value;
If you look at the Future's Value property, you can see that it blocks at the AsyncWaitHandle if the thread is still running:
public TResult Value
{
get
{
if (!IsCompleted)
{
_asyncResult.AsyncWaitHandle.WaitOne();
_lock.WaitOne();
}
return _value;
}
}
If strings are repeated across tuples, you could reorganize your file to list all the distinct strings at the start and use references to those strings (integers) in the body of the file. Your main Dictionary does not have to change, but you would need a temporary Dictionary during startup mapping the references (keys) to the distinct strings (values).
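A sketch of what reading such a reorganized file might look like (the layout here is invented for illustration: a count, the distinct strings, then records of two string indices and an Int32 value):
var lookup = new Dictionary<Tuple<string, string>, int>();
using (var reader = new BinaryReader(File.OpenRead(path)))
{
    int stringCount = reader.ReadInt32();
    var strings = new string[stringCount];
    for (int i = 0; i < stringCount; i++)
        strings[i] = reader.ReadString(); // each distinct string is read only once

    while (reader.BaseStream.Position < reader.BaseStream.Length)
    {
        int first = reader.ReadInt32();
        int second = reader.ReadInt32();
        int value = reader.ReadInt32();
        lookup.Add(Tuple.Create(strings[first], strings[second]), value);
    }
}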