I've never attempted to use WaitAll() or WhenAll() when running async functionality. After looking at many documentations, SO posts, and tutorials, I haven't found enough information for this, so here I am.
I'm trying to figure out the best/proper way(s) to do the following:
Using EF6, get data as List<Entity>.
Iterate through each Entity and call an external API to perform some action.
External API returns data per Entity which I need to store on the same Entity.
Currently I have built (not tested) the following (without the error handling code):
public IEnumerable<Entity> Process() {
bool hasChanged = false;
var data = _db.Entity.Where(x => !x.IsRegistered);
foreach (var entity in data) {
var result = await CallExternalApi(entity.Id, entity.Name);
entity.RegistrationId = result.RegistrationId;
entity.IsRegistered = true;
_db.Entry(entity).State = EntityState.Modified;
hasChanges = true;
}
if (hasChanges) {
uow.Commit();
}
return data;
}
I feel like I may be able to take advantage of some other functionality/feature in async, but if I can I'm not sure how to implement it here.
Any guidance is really appreciated.
Update
The API I'm calling is the Zoom Api to add Registrants. While they do have an route to batch add Registrants, it does not return the RegistrantId and the Join Url I need.
First, figure out if your external API might have a way to get all the items you want in a batch. If it does, use that instead of sending a whole bunch of requests.
If you need to send a separate request for each item, but want to do it concurrently, you could do this:
public async Task<IReadOnlyCollection<Entity>> Process() {
var data = _db.Entity.Where(x => !x.IsRegistered).ToList();
if(!data.Any()) { return data; }
var entityResultTasks = data
.Select(async entity => new { entity, result = await CallExternalApi(entity.Id, entity.Name) })
.ToList();
var entityResults = await Task.WhenAll(entityResultTasks);
foreach (var entityResult in entityResults) {
var entity = entityResult.entity;
var result = entityResult.result;
entity.RegistrationId = result.RegistrationId;
entity.IsRegistered = true;
_db.Entry(entity).State = EntityState.Modified;
}
uow.Commit();
return data;
}
You will want to watch out for possible concurrency limits on the target source. Consider using Chunk to break your work into batches of acceptable sizes, or leveraging a semaphore or something to throttle the number of calls you're making.
I have a controller which returns a large json object. If this object does not exist, it will generate and return it afterwards. The generation takes about 5 seconds, and if the client sent the request multiple times, the object gets generated with x-times the children. So my question is: Is there a way to block the second request, until the first one finished, independent who sent the request?
Normally I would do it with a Singleton, but because I am having scoped services, singleton does not work here
Warning: this is very oppinionated and maybe not suitable for Stack Overflow, but here it is anyway
Although I'll provide no code... when things take a while to generate, you don't usually spend that time directly in controller code, but do something like "start a background task to generate the result, and provide a "task id", which can be queried on another different call).
So, my preferred course of action for this would be having two different controller actions:
Generate, which creates the background job, assigns it some id, and returns the id
GetResult, to which you pass the task id, and returns either different error codes for "job id doesn't exist", "job id isn't finished", or a 200 with the result.
This way, your clients will need to call both, however, in Generate, you can check if the job is already being created and return an existing job id.
This of course moves the need to "retry and check" to your client: in exchange, you don't leave the connection to the server opened during those 5 seconds (which could potentially be multiplied by a number of clients) and return fast.
Otherwise, if you don't care about having your clients wait for a response during those 5 seconds, you could do a simple:
if(resultDoesntExist) {
resultDoesntExist = false; // You can use locks for the boolean setters or Interlocked instead of just setting a member
resultIsBeingGenerated = true;
generateResult(); // <-- this is what takes 5 seconds
resultIsBeingGenerated = false;
}
while(resultIsBeingGenerated) { await Task.Delay(10); } // <-- other clients will wait here
var result = getResult(); // <-- this should be fast once the result is already created
return result;
note: those booleans and the actual loop could be on the controller, or on the service, or wherever you see fit: just be wary of making them thread-safe in however method you see appropriate
So you basically make other clients wait till the first one generates the result, with "almost" no CPU load on the server... however with a connection open and a thread from the threadpool used, so I just DO NOT recommend this :-)
PS: #Leaky solution above is also good, but it also shifts the responsability to retry to the client, and if you are going to do that, I'd probably go directly with a "background job id", instead of having the first (the one that generates the result) one take 5 seconds. IMO, if it can be avoided, no API action should ever take 5 seconds to return :-)
Do you have an example for Interlocked.CompareExchange?
Sure. I'm definitely not the most knowledgeable person when it comes to multi-threading stuff, but this is quite simple (as you might know, Interlocked has no support for bool, so it's customary to represent it with an integral type):
public class QueryStatus
{
private static int _flag;
// Returns false if the query has already started.
public bool TrySetStarted()
=> Interlocked.CompareExchange(ref _flag, 1, 0) == 0;
public void SetFinished()
=> Interlocked.Exchange(ref _flag, 0);
}
I think it's the safest if you use it like this, with a 'Try' method, which tries to set the value and tells you if it was already set, in an atomic way.
Besides simply adding this (I mean just the field and the methods) to your existing component, you can also use it as a separate component, injected from the IOC container as scoped. Or even injected as a singleton, and then you don't have to use a static field.
Storing state like this should be good for as long as the application is running, but if the hosted application is recycled due to inactivity, it's obviously lost. Though, that won't happen while a request is still processing, and definitely won't happen in 5 seconds.
(And if you wanted to synchronize between app service instances, you could 'quickly' save a flag to the database, in a transaction with proper isolation level set. Or use e.g. Azure Redis Cache.)
Example solution
As Kit noted, rightly so, I didn't provide a full solution above.
So, a crude implementation could go like this:
public class SomeQueryService : ISomeQueryService
{
private static int _hasStartedFlag;
private static bool TrySetStarted()
=> Interlocked.CompareExchange(ref _hasStartedFlag, 1, 0) == 0;
private static void SetFinished()
=> Interlocked.Exchange(ref _hasStartedFlag, 0);
public async Task<(bool couldExecute, object result)> TryExecute()
{
if (!TrySetStarted())
return (couldExecute: false, result: null);
// Safely execute long query.
SetFinished();
return (couldExecute: true, result: result);
}
}
// In the controller, obviously
[HttpGet()]
public async Task<IActionResult> DoLongQuery([FromServices] ISomeQueryService someQueryService)
{
var (couldExecute, result) = await someQueryService.TryExecute();
if (!couldExecute)
{
return new ObjectResult(new ProblemDetails
{
Status = StatusCodes.Status503ServiceUnavailable,
Title = "Another request has already started. Try again later.",
Type = "https://tools.ietf.org/html/rfc7231#section-6.6.4"
})
{ StatusCode = StatusCodes.Status503ServiceUnavailable };
}
return Ok(result);
}
Of course possibly you'd want to extract the 'blocking' logic from the controller action into somewhere else, for example an action filter. In that case the flag should also go into a separate component that could be shared between the query service and the filter.
General use action filter
I felt bad about my inelegant solution above, and I realized that this problem can be generalized into basically a connection number limiter on an endpoint.
I wrote this small action filter that can be applied to any endpoint (multiple endpoints), and it accepts the number of allowed connections:
[AttributeUsage(AttributeTargets.Method, AllowMultiple = false)]
public class ConcurrencyLimiterAttribute : ActionFilterAttribute
{
private readonly int _allowedConnections;
private static readonly ConcurrentDictionary<string, int> _connections = new ConcurrentDictionary<string, int>();
public ConcurrencyLimiterAttribute(int allowedConnections = 1)
=> _allowedConnections = allowedConnections;
public override async Task OnActionExecutionAsync(ActionExecutingContext context, ActionExecutionDelegate next)
{
var key = context.HttpContext.Request.Path;
if (_connections.AddOrUpdate(key, 1, (k, v) => ++v) > _allowedConnections)
{
Close(withError: true);
return;
}
try
{
await next();
}
finally
{
Close();
}
void Close(bool withError = false)
{
if (withError)
{
context.Result = new ObjectResult(new ProblemDetails
{
Status = StatusCodes.Status503ServiceUnavailable,
Title = $"Maximum {_allowedConnections} simultaneous connections are allowed. Try again later.",
Type = "https://tools.ietf.org/html/rfc7231#section-6.6.4"
})
{ StatusCode = StatusCodes.Status503ServiceUnavailable };
}
_connections.AddOrUpdate(key, 0, (k, v) => --v);
}
}
}
I am facing an issue with two different endpoints in my single asp.net app. Basically, the issue is that one of the endpoints does not allow asynchronous methods in the page and the other endpoint does. If I run the app one endpoint will ask me to have an asynchronous asp.net page, but the other one crashes and vice versa.
public async Task<AirtableListRecordsResponse> RetrieveRecord()
{
string MyProductID = ProductID;
string baseId = "00000000000xxxx";
string appKey = "00000000000xxxx";
var records = new List<AirtableRecord>();
using (AirtableBase airtableBase = new AirtableBase(appKey, baseId))
{
Task<AirtableListRecordsResponse> task = airtableBase.ListRecords(tableName: "efls", filterByFormula: ProductID);
AirtableListRecordsResponse response = await task;
if (!response.Success)
{
string errorMessage = null;
if (response.AirtableApiError is AirtableApiException)
{
errorMessage = response.AirtableApiError.ErrorMessage;
}
else
{
errorMessage = "Unknown error";
}
// Report errorMessage
}
else
{
records.AddRange(response.Records.ToList());
var record = response.Records;
//offset = response.Offset;
//var record = response.Record;
foreach (var item in record)
{
foreach (var Fields in item.Fields)
{
if (Fields.Key == "pdfUrl")
{
string link = Fields.Value.ToString();
MyLink = Fields.Value.ToString();
}
}
}
// Do something with your retrieved record.
// Such as getting the attachmentList of the record if you
// know the Attachment field name
//var attachmentList = response.Record.GetAttachmentField(YOUR_ATTACHMENT_FIELD_NAME);
}
return response;
}
}
This is the asynchronous method which asks for an asynchronous page, the other contains a strong structure and it cannot be changed for any reason. Is there any way to make them work together?
I am using airtable.com api by the way.
Thanks in advance.
I solved by my own,
The solution I found is the following:
When a page works with two different endpoints and one of them obligates the page to be asynchronous the best solution is to split the procedures into two different sections and/or pages, one of them will call the asynchronous methods and retrieves the info and other works without being asynchronous.
How can I pass the information between the sites?
Using session variables, there are endpoints which only needs to display simple data as in this case, so the session variables will be called in the page #2 which is the non-asynchronous page.
It is a simple solution but effective.
Thank you very much to all for you answers.
Using Wait on Task, you can use synchronous method
Task<AirtableListRecordsResponse> task = Task.Run(() => airtableBase.ListRecords(tableName: "efls", filterByFormula: ProductID));
task.Wait();
AirtableListRecordsResponse response = task.Result;
Use it only when you cannot use async method.
This method is completely deadlock free as mentioned on msdn blog-
https://blogs.msdn.microsoft.com/jpsanders/2017/08/28/asp-net-do-not-use-task-result-in-main-context/
I was trying to figure out some performance values of two scenarios. I thought I was only going to declare the obvious at first. But when I got the results I got a little confused. And now I am looking for a justification for the case.
I have a library which makes couple of queries through a MongoDb database and Active Directory services, then returns the results to client, which are:
GetUserType - to MongoDb - there is a collection which has username and type fields in its all documents. In the query I give the username and ask for the type field.
LoginCheck - to Active Directory - given the username and the password from the client, I create a PrincipalContext object to access to AD server and call ValidateCredentials upon it.
This job is performing on an existing MVC application at the moment. And we are going to create a new desktop application and employ it with the same job.
We were curios about how different can these two scenarios perform? We thought that a direct call to a library without any http connection would perform better than a service request without an hesitation. But we still wondered how much difference is there, and if it was acceptable we are going to make it work through the rest MVC service - because of reasons :)
Hence we tested out the following architectures:
Scenario 1:
Scenario 2:
Basically, what I do for performance test is this:
For scenario 1:
for(var i = 0; i<10000; i++)
{
new Class1().HeavyMethod();
}
For scenario 2:
// client side
for(var i = 0; i<10000; i++)
{
using ( var client = new HttpClient() )
{
var values = new Dictionary<string, string>();
var content = new FormUrlEncodedContent(values);
var response = client.PostAsync("http://localhost:654/Home/HeavyLift", content).Result;
var responseString = response.Content.ReadAsStringAsync().Result;
}
}
// MVC rest service
public class HomeController : Controller
{
public JsonResult HeavyLift()
{
return Json(new Class1().HeavyMethod(), JsonRequestBehavior.AllowGet);
}
}
Common Class:
public class Class1
{
public string HeavyMethod ()
{
var userName = "asdfasdfasd";
var password = "asdfasdfasdf";
try
{
// this call is to MongoDB
var userType = Personnel.GetPersonnelsType(userName).Result;
// this call is to Active Directory
var user = new ADUser(new Session
{
UserType = userType.Type,
UserName = userName,
Password = password
});
return userType.Type + "-" + user.Auth();
}
catch ( Exception e )
{
return e.Message;
}
}
}
The results for 10000 consecutive calls are confusingly shocking:
Scenario 1: 159181 ms
Scenario 2: 13952 ms
Scenario 1 starts off pretty quicly for the first few dosens of calls, then it starts to slow down.
Scenario 2 though offers a constant response time through 10k calls.
What is actually happening here?
Note: I checked the memory and cpu usages of the server that this scenarios runs on(everything runs on the same server) but there is nothing interesting actually, they are behaving just the same in terms of memory and cpu resources.
I have a slow and expensive method that return some data for me:
public Data GetData(){...}
I don't want to wait until this method will execute. Rather than I want to return a cached data immediately.
I have a class CachedData that contains one property Data cachedData.
So I want to create another method public CachedData GetCachedData() that will initiate a new task(call GetData inside of it) and immediately return cached data and after task will finish we will update the cache.
I need to have thread safe GetCachedData() because I will have multiple request that will call this method.
I will have a light ping "is there anything change?" each minute and if it will return true (cachedData != currentData) then I will call GetCachedData().
I'm new in C#. Please, help me to implement this method.
I'm using .net framework 4.5.2
The basic idea is clear:
You have a Data property which is wrapper around an expensive function call.
In order to have some response immediately the property holds a cached value and performs updating in the background.
No need for an event when the updater is done because you poll, for now.
That seems like a straight-forward design. At some point you may want to use events, but that can be added later.
Depending on the circumstances it may be necessary to make access to the property thread-safe. I think that if the Data cache is a simple reference and no other data is updated together with it, a lock is not necessary, but you may want to declare the reference volatile so that the reading thread does not rely on a stale cached (ha!) version. This post seems to have good links which discuss the issues.
If you will not call GetCachedData at the same time, you may not use lock. If data is null (for sure first run) we will wait long method to finish its work.
public class SlowClass
{
private static object _lock;
private static Data _cachedData;
public SlowClass()
{
_lock = new object();
}
public void GetCachedData()
{
var task = new Task(DoStuffLongRun);
task.Start();
if (_cachedData == null)
task.Wait();
}
public Data GetData()
{
if (_cachedData == null)
GetCachedData();
return _cachedData;
}
private void DoStuffLongRun()
{
lock (_lock)
{
Console.WriteLine("Locked Entered");
Thread.Sleep(5000);//Do Long Stuff
_cachedData = new Data();
}
}
}
I have tested on console application.
static void Main(string[] args)
{
var mySlow = new SlowClass();
var mySlow2 = new SlowClass();
mySlow.GetCachedData();
for (int i = 0; i < 5; i++)
{
Console.WriteLine(i);
mySlow.GetData();
mySlow2.GetData();
}
mySlow.GetCachedData();
Console.Read();
}
Maybe you can use the MemoryCache class,
as explained here in MSDN