I am attempting to implement some async code in my application. I start a new Task and don't wait for the result. That task in turn starts another Task whose result it does await. The second task needs HttpContext.Current (I need to get the User from the HTTP context) because it fires off an API call that uses HttpContext.Current.User.
I was using this answer in order to pass the current context into the Task.
So my code is as below:
var context = HttpContext.Current;

Task.Factory.StartNew(() =>
{
    HttpContext.Current = context;
    ExecuteMethodAndContinue();
});
private static void ExecuteMethodAndContinue()
{
    var myService = ServiceManager.GetMyService();
    var query = GetQuery();
    var files = myService.GetFiles(query).ToList();
    // Remaining code removed for brevity
}
The implementation of GetFiles, which is also called from other places in the code, is as follows:
public IDictionary<FileName, FileDetails> GetFiles(MyQuery query)
{
    var countries = GetAllCountries();
    var context = HttpContext.Current;

    var taskList = countries.Select(c => Task.Factory.StartNew(() =>
    {
        HttpContext.Current = context;
        return new Dictionary<FileName, FileDetails> { { c, GetFilesInCountry(query, c) } };
    })).ToList();

    try
    {
        // Wait on all queries completing
        Task.WaitAll(taskList.ToArray<Task>());
    }
    catch (AggregateException ae)
    {
        throw new ApplicationException("Failed.", ae);
    }

    // Return collated results
    return taskList.SelectMany(t => t.Result).ToDictionary(kvp => kvp.Key, kvp => kvp.Value);
}
The GetFilesInCountry method is what actually contains the API call that relies on HttpContext.Current.User. When I hit a breakpoint on the return new line in GetFiles, I can see that HttpContext.Current.User is correctly set, as expected. However, when I break inside the GetFilesInCountry method and hover over HttpContext.Current.User, it is null.
I think this is because the HTTP request from which I started the first call (ExecuteMethodAndContinue) has finished, which is why the User on the current context is null.
Is there something straightforward I can do to work around this correctly?
The easiest way of course would be to never use HttpContext.Current. It's not a good practice anyway - you should only access the HttpContext in the request thread it's associated with. Instead, you can just make sure all the methods that require e.g. a user name, get the user name as an argument:
var username = HttpContext.Current.User.Identity.Name;

var taskList = countries.Select(c => Task.Factory.StartNew(() =>
{
    return new Dictionary<FileName, FileDetails> { { c, GetFilesInCountry(query, c, username) } };
})).ToList();
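GetFilesInCountry would then take the user name explicitly instead of reading it off the context - roughly like this (a sketch; I'm guessing at the method's current shape, and _apiClient is just a stand-in for whatever actually makes the API call):

private FileDetails GetFilesInCountry(MyQuery query, FileName country, string username)
{
    // Use the username captured on the request thread,
    // not HttpContext.Current.User (which may be gone by now).
    return _apiClient.GetFiles(query, country, username);
}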
If this is impractical for some reason (it probably isn't a very good reason, but fixing legacy applications to work like this can be a chore), you can replace the HttpContext.Current accesses with something a bit more specific, and not tied to a particular request. And, uh, thread-safe:
public static class UserContext
{
    [ThreadStatic]
    public static string Username;
}
So your calling code would look something like this:
var username = HttpContext.Current.User.Identity.Name;

var taskList = countries.Select(c => Task.Factory.StartNew(() =>
{
    UserContext.Username = username;
    return new Dictionary<FileName, FileDetails> { { c, GetFilesInCountry(query, c) } };
})).ToList();
And whenever you'd usually use HttpContext.Current.User.Identity.Name, you'll use UserContext.Username instead (don't forget to also fill the UserContext in the main request thread).
The huge caveat with this is that it gets completely crazy when you have asynchronous code in there; you're on the thread pool, so you're not the exclusive user of those threads, and any awaits or continuations are free to be performed on any available thread-pool thread (there's no marshalling to a synchronization context). So anywhere you create more tasks, be it through manual Task.Run, await, ContinueWith or whatever, you'll lose this context.

Just as importantly, there's no place where you can clear this information - this can obviously be a huge security hole, as concurrent requests may have different parts of the code executing with different user contexts. If you choose to go this path, you had better read up a lot about making this kind of thing safe. You'll probably have to write your own synchronization context to hold this information, and make sure all the asynchronous code in your application marshals back to that synchronization context.

In short - don't do this. Really. It isn't worth it. You'll have so many obscure bugs that are very hard to reproduce, there's no way it will be worth it.
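For example, a [ThreadStatic] value simply doesn't follow you across an await (this sketch assumes it runs inside an async method on a pool thread; SomeIoBoundWorkAsync is just a stand-in for any awaited call):

UserContext.Username = HttpContext.Current.User.Identity.Name;

await SomeIoBoundWorkAsync();   // the continuation may resume on a different thread-pool thread

// May now be null - or, worse, a value left behind by a different request
// that previously ran on whichever thread the continuation landed on.
var name = UserContext.Username;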
Related
In an application I am experiencing odd behavior due to wrong/unexpected values of AsyncLocal: although I suppressed the flow of the execution context, the AsyncLocal.Value property is sometimes not reset within the execution scope of a newly spawned Task.
Below I created a minimal reproducible sample which demonstrates the problem:
private static readonly AsyncLocal<object> AsyncLocal = new AsyncLocal<object>();

[TestMethod]
public void Test()
{
    Trace.WriteLine(System.Runtime.InteropServices.RuntimeInformation.FrameworkDescription);
    var mainTask = Task.Factory.StartNew(() =>
    {
        AsyncLocal.Value = "1";
        Task anotherTask;
        using (ExecutionContext.SuppressFlow())
        {
            anotherTask = Task.Run(() =>
            {
                Trace.WriteLine(AsyncLocal.Value); // "1" <- ???
                Assert.IsNull(AsyncLocal.Value);   // BOOM - FAILS
                AsyncLocal.Value = "2";
            });
        }
        Task.WaitAll(anotherTask);
    });
    mainTask.Wait(500000, CancellationToken.None);
}
In nine out of ten runs (on my pc) the outcome of the Test-method is:
.NET 6.0.2
"1"
-> The test fails
As you can see, the test fails because within the action executed by Task.Run the previous value is still present in AsyncLocal.Value (Message: 1).
My concrete questions are:
Why does this happen?
I suspect this happens because Task.Run may use the current thread to execute the workload. In that case, I assume the lack of async/await operators does not force the creation of a new/separate ExecutionContext for the action. As Stephen Cleary said, "from the logical call context’s perspective, all synchronous invocations are “collapsed” - they’re actually part of the context of the closest async method further up the call stack". If that’s the case, I do understand why the same context is used within the action.
Is this the correct explanation for this behavior? In addition, why does it work flawlessly sometimes (about 1 run out of 10 on my machine)?
How can I fix this?
Assuming that my theory above is true it should be enough to forcefully introduce a new async "layer", like below:
private static readonly AsyncLocal<object> AsyncLocal = new AsyncLocal<object>();

[TestMethod]
public void Test()
{
    Trace.WriteLine(System.Runtime.InteropServices.RuntimeInformation.FrameworkDescription);
    var mainTask = Task.Factory.StartNew(() =>
    {
        AsyncLocal.Value = "1";
        Task anotherTask;
        using (ExecutionContext.SuppressFlow())
        {
            var wrapper = () =>
            {
                Trace.WriteLine(AsyncLocal.Value);
                Assert.IsNull(AsyncLocal.Value);
                AsyncLocal.Value = "2";
                return Task.CompletedTask;
            };
            anotherTask = Task.Run(async () => await wrapper());
        }
        Task.WaitAll(anotherTask);
    });
    mainTask.Wait(500000, CancellationToken.None);
}
This seems to fix the problem (it consistently works on my machine), but I want to be sure that this is a correct fix for this problem.
Many thanks in advance
Why does this happen? I suspect this happens because Task.Run may use the current thread to execute the work load.
I suspect that it happens because Task.WaitAll will use the current thread to execute the task inline.
Specifically, Task.WaitAll calls Task.WaitAllCore, which will attempt to run it inline by calling Task.WrappedTryRunInline. I'm going to assume the default task scheduler is used throughout. In that case, this will invoke TaskScheduler.TryRunInline, which will return false if the delegate is already invoked. So, if the task has already started running on a thread pool thread, this will return back to WaitAllCore, which will just do a normal wait, and your code will work as expected (1 out of 10).
If a thread pool thread hasn't picked it up yet (9 out of 10), then TaskScheduler.TryRunInline will call TaskScheduler.TryExecuteTaskInline, the default implementation of which will call Task.ExecuteEntryUnsafe, which calls Task.ExecuteWithThreadLocal. Task.ExecuteWithThreadLocal has logic for applying an ExecutionContext if one was captured. Assuming none was captured, the task's delegate is just invoked directly.
So, it seems like each step is behaving logically. Technically, what ExecutionContext.SuppressFlow means is "don't capture the ExecutionContext", and that is what is happening. It doesn't mean "clear the ExecutionContext". Sometimes the task is run on a thread pool thread (without the captured ExecutionContext), and WaitAll will just wait for it to complete. Other times the task will be executed inline by WaitAll instead of a thread pool thread, and in that case the ExecutionContext is not cleared (and technically isn't captured, either).
You can test this theory by capturing the current thread id within your wrapper and comparing it to the thread id doing the Task.WaitAll. I expect that they will be the same thread for the runs where the async local value is (unexpectedly) inherited, and they will be different threads for the runs where the async local value works as expected.
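A quick way to check that is to add trace output for the thread ids to the question's test; nothing else needs to change:

anotherTask = Task.Run(() =>
{
    Trace.WriteLine($"Task body on thread {Environment.CurrentManagedThreadId}");
    Trace.WriteLine(AsyncLocal.Value);
    AsyncLocal.Value = "2";
});
// ...
Trace.WriteLine($"WaitAll on thread {Environment.CurrentManagedThreadId}");
Task.WaitAll(anotherTask);

If the two ids match on a failing run, the task was executed inline by WaitAll rather than on a separate thread-pool thread.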
If you can, I'd first consider whether it's possible to replace the thread-specific caches with a single shared cache. The app likely predates useful types such as ConcurrentDictionary.
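For example, a single ConcurrentDictionary often covers this kind of scenario; the key and value types below, and LoadItem, are placeholders for whatever the app actually caches:

private static readonly ConcurrentDictionary<string, CachedItem> SharedCache =
    new ConcurrentDictionary<string, CachedItem>();

// Safe to call from any thread; no per-thread copies needed.
var item = SharedCache.GetOrAdd(key, k => LoadItem(k));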
If it isn't possible to use a singleton cache, then you can use a stack of async local values. Stacking async local values is a common pattern. I prefer wrapping the stack logic into a separate type (AsyncLocalValue in the code below):
public sealed class AsyncLocalValue
{
    private static readonly AsyncLocal<ImmutableStack<object>> _asyncLocal = new();

    public object Value => _asyncLocal.Value?.Peek();

    public IDisposable PushValue(object value)
    {
        var originalValue = _asyncLocal.Value;
        var newValue = (originalValue ?? ImmutableStack<object>.Empty).Push(value);
        _asyncLocal.Value = newValue;
        return Disposable.Create(() => _asyncLocal.Value = originalValue);
    }
}
private static AsyncLocalValue AsyncLocal = new();

[TestMethod]
public void Test()
{
    Console.WriteLine(System.Runtime.InteropServices.RuntimeInformation.FrameworkDescription);
    var mainTask = Task.Factory.StartNew(() =>
    {
        Task anotherTask;
        using (AsyncLocal.PushValue("1"))
        {
            using (AsyncLocal.PushValue(null))
            {
                anotherTask = Task.Run(() =>
                {
                    Console.WriteLine("Observed: " + AsyncLocal.Value);
                    using (AsyncLocal.PushValue("2"))
                    {
                    }
                });
            }
        }
        Task.WaitAll(anotherTask);
    });
    mainTask.Wait(500000, CancellationToken.None);
}
This code sample uses Disposable.Create from my Nito.Disposables library.
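If you'd rather not take the dependency, a rough stand-in for Disposable.Create could look like this (just a sketch, not the library's actual implementation):

public static class Disposable
{
    public static IDisposable Create(Action dispose) => new AnonymousDisposable(dispose);

    private sealed class AnonymousDisposable : IDisposable
    {
        private Action _dispose;
        public AnonymousDisposable(Action dispose) => _dispose = dispose;

        // Runs the action at most once, even if Dispose is called from several threads.
        public void Dispose() => Interlocked.Exchange(ref _dispose, null)?.Invoke();
    }
}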
I need to use proxies to download a forum. The problem with my code is that it takes only 10% of my internet bandwidth. Also I have read that I need to use a single HttpClient instance, but with multiple proxies I don't know how to do it. Changing MaxDegreeOfParallelism doesn't change anything.
public static IAsyncEnumerable<IFetchResult> FetchInParallelAsync(
    this IEnumerable<Url> urls, FetchContext context)
{
    var fetchBlcock = new TransformBlock<Url, IFetchResult>(
        transform: url => url.FetchAsync(context),
        dataflowBlockOptions: new ExecutionDataflowBlockOptions
        {
            MaxDegreeOfParallelism = 128
        }
    );

    foreach (var url in urls)
        fetchBlcock.Post(url);
    fetchBlcock.Complete();

    var result = fetchBlcock.ToAsyncEnumerable();
    return result;
}
Every call to FetchAsync will create or reuse a HttpClient with a WebProxy.
public static async Task<IFetchResult> FetchAsync(this Url url, FetchContext context)
{
    var httpClient = context.ProxyPool.Rent();
    var result = await url.FetchAsync(httpClient, context.Observer, context.Delay,
        context.isReloadWithCookie);
    context.ProxyPool.Return(httpClient);
    return result;
}
public HttpClient Rent()
{
    lock (_lockObject)
    {
        if (_uninitiliazedDatacenterProxiesAddresses.Count != 0)
        {
            var proxyAddress = _uninitiliazedDatacenterProxiesAddresses.Pop();
            return proxyAddress.GetWebProxy(DataCenterProxiesCredentials).GetHttpClient();
        }
        return _proxiesQueue.Dequeue();
    }
}
I am a novice at software development, but downloading with hundreds or thousands of proxies asynchronously seems like a common enough task that many people must have faced it and found a correct way to do it. So far I have been unable to find any solutions to my problem on the internet. Any thoughts on how to achieve maximum download speed?
Let's take a look at what happens here:
var result = await url.FetchAsync(httpClient, context.Observer, context.Delay, context.isReloadWithCookie);
You are actually awaiting before you continue with the next item. That's why it is asynchronous and not parallel programming. async in Microsoft docs
The await keyword is where the magic happens. It yields control to the caller of the method that performed await, and it ultimately allows a UI to be responsive or a service to be elastic.
In essence, it frees the calling thread to do other stuff but the original calling code is suspended from executing, until the IO operation is done.
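To see the difference, compare the two shapes below (simplified, using the question's FetchAsync extension):

// Asynchronous but sequential: each fetch completes before the next one starts.
var results = new List<IFetchResult>();
foreach (var url in urls)
    results.Add(await url.FetchAsync(context));

// Asynchronous and concurrent: start all fetches, then await them together.
var tasks = urls.Select(url => url.FetchAsync(context)).ToList();
var allResults = await Task.WhenAll(tasks);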
Now to your problem:
You can either use this excellent solution here: foreach async
You can use the Parallel library to execute your code in different threads.
Something like the following from Parallel for example
Parallel.For(0, urls.Count,
    index => fetchBlcock.Post(urls[index]));
I'm trying to "store" an async task for later completion - I've found the async cache example but this is effectively caching task results in a concurrent dictionary so that their results can be reloaded without re-doing the task again (the HTML implementation is here).
Basically what I'm trying to design is a dictionary of tasks, with correlation IDs (GUIDs) as the key. This is for co-ordinating incoming results from another place (XML identified by the GUID correlation ID) and my aim is for the task to suspend execution until the results come in (probably from a queue).
Is this going to work? This is my first foray into proper async coding and I can't find anything similar to my hopeful solution, so I may well be entirely on the wrong track.
Can I effectively "store" a task for later completion, with the task result being set at completion time?
Edit: I've just found out about TaskCompletionSource (based on this question) - is that viable?
If I understand your use-case correctly, you can use TaskCompletionSource.
An example of implementation:
public class AsyncCache
{
    private readonly Dictionary<Guid, Task<string>> _cache = new Dictionary<Guid, Task<string>>();

    public Task<string> GetAsync(Guid guid)
    {
        if (_cache.TryGetValue(guid, out var task))
        {
            // The value is either there or already queued
            return task;
        }

        var tcs = new TaskCompletionSource<string>(TaskCreationOptions.RunContinuationsAsynchronously);
        _queue.Enqueue(() =>
        {
            var result = LoadResult();
            tcs.TrySetResult(result);
        });
        _cache.Add(guid, tcs.Task);
        return tcs.Task;
    }
}
Here, _queue is whatever queuing mechanism you're going to use to process the data.
Of course, you would also have to make that code thread-safe.
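One way to get there (still a sketch, reusing the hypothetical _queue and LoadResult from above) is to let ConcurrentDictionary.GetOrAdd handle the locking:

private readonly ConcurrentDictionary<Guid, Task<string>> _cache =
    new ConcurrentDictionary<Guid, Task<string>>();

public Task<string> GetAsync(Guid guid)
{
    return _cache.GetOrAdd(guid, _ =>
    {
        var tcs = new TaskCompletionSource<string>(TaskCreationOptions.RunContinuationsAsynchronously);
        _queue.Enqueue(() => tcs.TrySetResult(LoadResult()));
        return tcs.Task;
    });
}

Note that GetOrAdd can invoke the factory more than once under a race (only one task wins and gets cached), so duplicate work may occasionally be queued; if that matters, wrap the factory in a Lazy<Task<string>> instead.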
Are you thinking of lazy loading? You could use Lazy<Task> (which will initialise the task but not queue it to run).
var tasks = new Dictionary<Guid, Lazy<Task>>();

tasks.Add(Task1Guid, new Lazy<Task>(() => { whatever the 1st task is }));
tasks.Add(Task2Guid, new Lazy<Task>(() => { whatever the 2nd task is }));

async Task RunTaskAsync(Guid guid)
{
    await tasks[guid].Value;
}
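The underlying task isn't created until the first access of .Value, and Lazy<T> is thread-safe by default, so concurrent callers asking for the same Guid share a single Task. A quick usage sketch (DownloadReportAsync and reportGuid are made up for illustration):

var tasks = new Dictionary<Guid, Lazy<Task<string>>>();

// Nothing runs yet; the work only starts the first time .Value is touched.
tasks.Add(reportGuid, new Lazy<Task<string>>(() => DownloadReportAsync(reportGuid)));

// Later - possibly from several callers, all sharing the same Task instance.
var report = await tasks[reportGuid].Value;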
I have an interface that reads/writes an object to storage. In one case the storage is a database with async methods. In the other case it's just a cookie.
I gather that it's recommended to use async all the way back along the path that ends at an async call, so it seems to make sense for the interface to be async as well. But in the cookie case I'm just setting a couple of fields and sticking it in the response, so there isn't anything async there yet. I can wrap that bit in await Task.Run() to match the new interface, but I don't know if this is advisable or if it has some negative impact on performance.
What to do?
public interface IProfileStore
{
    Task SetProfile(UserProfile profile);
}
public async Task SetProfile(UserProfile profile)
{
    // Look mom, I'm needlessly async
    await Task.Run(() =>
    {
        var cookie = new HttpCookie(AnonymousCookieName);
        cookie["name"] = profile.FullName;
        HttpContext.Current.Response.Cookies.Add(cookie);
    });
}
You should not do that; you're just creating needless threadpool churn.
Instead, remove the async keyword from the method and simply return Task.FromResult(0) to return a synchronously-completed task.
If you're performing a very short quickly completed operation then you're quite right that there is likely no need to use Task.Run to push the work to another thread. The act of scheduling the code in the thread pool is likely going to take longer than just doing it.
As for how to do that, just remove the await Task.Run that you have no need for and voila, you're all set. You have a synchronous operation that is still wrapped in a Task and so still matches the required interface.
Almost as SLaks suggests: if you were actually doing something asynchronous, you would just return the Task, so:
public Task SetProfile(UserProfile profile)
{
    return Task.Run(() =>
    {
        var cookie = new HttpCookie(AnonymousCookieName);
        cookie["name"] = profile.FullName;
        HttpContext.Current.Response.Cookies.Add(cookie);
    });
}
However as he suggests in this case:
public Task SetProfile(UserProfile profile)
{
    var cookie = new HttpCookie(AnonymousCookieName);
    cookie["name"] = profile.FullName;
    HttpContext.Current.Response.Cookies.Add(cookie);
    return Task.FromResult<object>(null);
}
Returning Task.FromResult here just hands back an already-completed Task, so the method still satisfies the interface without scheduling any extra work.
With business logic encapsulated behind synchronous service calls e.g.:
interface IFooService
{
    Foo GetFooById(int id);
    int SaveFoo(Foo foo);
}
What is the best way to extend/use these service calls in an asynchronous fashion?
At present I've created a simple AsyncUtils class:
public static class AsyncUtils
{
    public static void Execute<T>(Func<T> asyncFunc)
    {
        Execute(asyncFunc, null, null);
    }

    public static void Execute<T>(Func<T> asyncFunc, Action<T> successCallback)
    {
        Execute(asyncFunc, successCallback, null);
    }

    public static void Execute<T>(Func<T> asyncFunc, Action<T> successCallback, Action<Exception> failureCallback)
    {
        ThreadPool.UnsafeQueueUserWorkItem(state => ExecuteAndHandleError(asyncFunc, successCallback, failureCallback), null);
    }

    private static void ExecuteAndHandleError<T>(Func<T> asyncFunc, Action<T> successCallback, Action<Exception> failureCallback)
    {
        try
        {
            T result = asyncFunc();
            if (successCallback != null)
            {
                successCallback(result);
            }
        }
        catch (Exception e)
        {
            if (failureCallback != null)
            {
                failureCallback(e);
            }
        }
    }
}
Which lets me call anything asynchronously:
AsyncUtils.Execute(
    () => _fooService.SaveFoo(foo),
    id => HandleFooSavedSuccessfully(id),
    ex => HandleFooSaveError(ex));
Whilst this works in simple use cases, it quickly gets tricky if other processes need to coordinate on the results; for example, if I need to save three objects asynchronously before the current thread can continue, I'd like a way to wait on/join the worker threads.
Options I've thought of so far include:
having AsyncUtils return a WaitHandle
having AsyncUtils use an AsyncMethodCaller and return an IAsyncResult
rewriting the API to include Begin, End async calls
e.g. something resembling:
interface IFooService
{
    Foo GetFooById(int id);
    IAsyncResult BeginGetFooById(int id);
    Foo EndGetFooById(IAsyncResult result);

    int SaveFoo(Foo foo);
    IAsyncResult BeginSaveFoo(Foo foo);
    int EndSaveFoo(IAsyncResult result);
}
Are there other approaches I should consider? What are the benefits and potential pitfalls of each?
Ideally I'd like to keep the service layer simple/synchronous and provide some easy to use utility methods for calling them asynchronously. I'd be interested in hearing about solutions and ideas applicable to C# 3.5 and C# 4 (we haven't upgraded yet but will do in the relatively near future).
Looking forward to your ideas.
Given your requirement to stay .NET 2.0 only, and not work on 3.5 or 4.0, this is probably the best option.
I do have three remarks on your current implementation.
Is there a specific reason you're using ThreadPool.UnsafeQueueUserWorkItem? Unless there is a specific reason this is required, I would recommend using ThreadPool.QueueUserWorkItem instead, especially if you're in a large development team. The Unsafe version can potentially allow security flaws to appear as you lose the calling stack, and as a result, the ability to control permissions as closely.
The current design of your exception handling, using the failureCallback, will swallow all exceptions and provide no feedback unless a callback is defined. It might be better to propagate the exception and let it bubble up if you're not going to handle it properly. Alternatively, you could push this back onto the calling thread in some fashion, though this would require using something more like IAsyncResult.
You currently have no way to tell if an asynchronous call is completed. This would be the other advantage of using IAsyncResult in your design (though it does add some complexity to the implementation).
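For that last point, one option (a sketch along the lines of the question's "return a WaitHandle" idea, not a drop-in replacement for the existing API) is to hand back a handle the caller can block on:

public static WaitHandle Execute<T>(Func<T> asyncFunc, Action<T> successCallback, Action<Exception> failureCallback)
{
    var done = new ManualResetEvent(false);
    ThreadPool.QueueUserWorkItem(_ =>
    {
        try
        {
            T result = asyncFunc();
            if (successCallback != null) successCallback(result);
        }
        catch (Exception e)
        {
            if (failureCallback != null) failureCallback(e);
        }
        finally
        {
            done.Set();   // signal completion whether we succeeded or failed
        }
    });
    return done;
}

The caller can then coordinate several operations with WaitHandle.WaitAll(new[] { h1, h2, h3 }).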
Once you upgrade to .NET 4, however, I would recommend just putting this in a Task or Task<T>, as it was designed to handle this very cleanly. Instead of:
AsyncUtils.Execute(
    () => _fooService.SaveFoo(foo),
    id => HandleFooSavedSuccessfully(id),
    ex => HandleFooSaveError(ex));
You can use the built-in tools and just write:
var task = Task.Factory.StartNew(
    () => _fooService.SaveFoo(foo));

task.ContinueWith(
    t => HandleFooSavedSuccessfully(t.Result),
    TaskContinuationOptions.NotOnFaulted);

task.ContinueWith(
    t => { try { t.Wait(); } catch (Exception e) { HandleFooSaveError(e); } },
    TaskContinuationOptions.OnlyOnFaulted);
Granted, the last line there is a bit odd, but that's mainly because I tried to keep your existing API. If you reworked it a bit, you could simplify it...
Asynchronous interface (based on IAsyncResult) is useful only when you have some non-blocking call under the cover. The main point of the interface is to make it possible to do the call without blocking the caller thread.
This is useful in scenarios when you can make some system call and the system will notify you back when something happens (e.g. when a HTTP response is received or when an event happens).
The price for using IAsyncResult based interface is that you have to write code in a somewhat awkward way (by making every call using callback). Even worse, asynchronous API makes it impossible to use standard language constructs like while, for, or try..catch.
I don't really see the point of wrapping synchronous API into asynchronous interface, because you won't get the benefit (there will always be some thread blocked) and you'll just get more awkward way of calling it.
Of course, it makes perfect sense to run the synchronous code on a background thread somehow (to avoid blocking the main application thread), either using Task<T> on .NET 4.0 or using QueueUserWorkItem on .NET 2.0. However, I'm not sure if this should be done automatically in the service - it feels like doing this on the caller side would be easier, because you may need to perform multiple calls to the service. Using an asynchronous API, you'd have to write something like:
svc.BeginGetFooId(ar1 => {
    var foo = ar1.Result;
    foo.Prop = 123;
    svc.BeginSaveFoo(foo, ar2 => {
        // etc...
    });
});
When using synchronous API, you'd write something like:
ThreadPool.QueueUserWorkItem(_ => {
    var foo = svc.GetFooId();
    foo.Prop = 123;
    svc.SaveFoo(foo);
});
The following is a response to Reed's follow-up question. I'm not suggesting that it's the right way to go.
public static int PerformSlowly(int id)
{
    // Addition isn't so hard, but let's pretend.
    Thread.Sleep(10000);
    return 42 + id;
}

public static Task<int> PerformTask(int id)
{
    // Here's the straightforward approach.
    return Task.Factory.StartNew(() => PerformSlowly(id));
}

public static Lazy<int> PerformLazily(int id)
{
    // Start performing it now, but don't block.
    var task = PerformTask(id);

    // JIT for the value being checked, block and retrieve.
    return new Lazy<int>(() => task.Result);
}
static void Main(string[] args)
{
    int i;

    // Start calculating the result, using a Lazy<int> as the future value.
    var result = PerformLazily(7);

    // Do assorted work, then get result.
    i = result.Value;

    // The alternative is to use the Task as the future value.
    var task = PerformTask(7);

    // Do assorted work, then get result.
    i = task.Result;
}