I have two internal processes which I use to upload long sdos strings to an API. Process 1 reads these from another stream. Process 1 (client) sends strings to process 2 (server) via a [ServiceContract] and a [MessageContract]. Process 2 then sends this to an API which in turn processes the sdos and uploads to a server.
[MessageContract]
public class CallRequestMessage
{
[MessageHeader]
public string Sdos;
[MessageHeader]
public int ArrayLength;
[MessageBodyMember]
public Stream SdosStream;
}
[MessageContract]
public class CallResponseMessage
{
[MessageHeader]
public Task<ResultCode> Task;
}
Since the bulk of the time processing the string is in the API, I want to try and return a Task<ResultCode> from my server that will get a result from the API once the processing has concluded. Then my threads can work on client-side processing (in this case, reading the sdos strings from a stream input).
My problem is that the tasks returned to the client seem to be different to the ones that I create on the server. On the server I have the code
task = Task<ResultCode>.Factory.StartNew(() =>
{
ResultCode res;
lock (SyncObject)
res = upload(/* input */)
return res;
});
// ...other code
return new CallResponseMessage { Task = task };
where upload is a method in the API, accessed by process 2 by using a [DllImportAttribute].
Using logs I have seen that the task does complete on the server (all sdos are uploaded), however on the client side, all tasks appear to not have started, and so retrieving the results is not possible directly.
An alternative approach that I thought of would be to return nothing from the server, and add a separate method that retrospectively goes to the server, awaits the tasks, and returns an aggregated result. I would like to try and get the task back directly, though, as this implementation may be a model for future services in my projects.
Thank you for any help.
There are no Task instances across process boundaries. The server's task is the Task that sends the data to the client. The client task is the task that receives the data. If you use the asnyc methods on the auto-generated WCF clients, by default, WCF will not stream the data from server to client, so your normal flow will be:
Start client task -> Send request -> Start server task -> End server task -> Send response -> End client task
In order for the server tasks to be performed asynchronously, you can design your service methods with the task asynchronous pattern (TAP). This example is from the official documentation:
public class SampleService:ISampleService
{
// ...
public async Task<string> SampleMethodTaskAsync(string msg)
{
return Task<string>.Factory.StartNew(() =>
{
return msg;
});
}
// ...
}
The benefits of tasks on client and server is not so much that the client can receive while the server sends the data, but to allow the server to process more incoming requests while other requests are waiting for long running operations (e.g. data access) and the client to do something useful while the data is received.
Your options are:
Use seperate asynchronous server and client operations
Unless you are transferring large amounts of data and performance is critical, there is nothing wrong with the situation. You can still use tasks for async programming. However, your approach of returning a task won't work. Use the described combination of async service methods and the auto-generated async client methods. You will essentially achieve the same result, which is that both, client and server will perform the operation asynchronously.
Stream the data
If you must start processing on the client while the server is sending the data (which only brings you a practical benefit for large amounts of data), you can stream the data from the server. This issue is too large to cover here, but a good point to start is the official documentation.
Related
This question already has answers here:
Wraping sync API into Async method
(2 answers)
Closed 1 year ago.
I read some posts and articles on the internet saying that I should not use async until my calls are IO (reading file, sending request, etc.). But if the calling API is actually an IO-related API and does not support async calls, should I wrap them with Task.Run()?
Say, if I have MVC server calling an external server via API to get data, but the API does not have async methods. Is there a benefit wrapping sync calls with async in such case?
As I understand, if not wrapping, then all calls to the API are sync and each call to my MVC app uses a thread from thread pool. In second case Task.Run() will queue my task onto thread pool, but releases main thread. Is released main thread considered to be benefit here, or it is not worth wrapping? Am I correct here at all?
EDIT
Ok, here are some more details as people requested.
I have a third party .NET assembly that I'm forced to use. I can't just call the web-service directly. It looks pretty much like this:
// the API (I not owner of it, can't change it):
public class Service {
/* everything is synchronous like this. but somewhere inside
it still makes HTTP requests to external server, so that's
kinda IO. just the lib is very old and doesn't provide async methods*/
public Data GetData(Paramters parameters);
}
// here comes my code.
// option 1. sync controller
public class MyController {
public ActionResult GetDataById(string id) {
var data = new Service().GetData(new Parameters{Id = id});
// process and return some result ...
}
}
// option 2. async controller
public class MyController {
public async ActionResult GetDataById(string id) {
var result = await Task.Run(() => new Service().GetData(new Parameters{Id = id}));
// process and return some result ...
}
}
So the question is, does it make sense to do option 2?
By wrapping into async, you may get the benefit of making parallel calls to the external server while the actual call would still remain sync , so it would be like multiple threads each waiting on response from external server. I have done this and it could be beneficial when you would like to handle the failures and then the application would not remain stuck in the calling threads.
Let's say if the API itself performs IO related tasks even then having a free main thread would benefit you if the user chooses to perform multiple tasks through the API you would benefit by having the main thread free rather than having it blocked on a single request.
When receiving mail through MailGun they require a response within a limited time. I have two issues with this:
1) After receiving the message I need to process and record it in my CRM which takes some time. This causes MailGun to time out before I get to send a response. Then MailGun resends the message again and again as it continues to time out.
2) MailGun's post is not async but the api calls to my CRM are async.
So I need to send MailGun a 200 response and then continue to process the message. And that process needs to be in async.
The below code shows what I want to have happen. I tried using tasks and couldn't get it working. There are times when many emails can come in a once (like when initializing someone's account) if the solution requires some sort of parallel tasks or threads it would need to handle many of them.
public class HomeController : Controller
{
[HttpPost]
[Route("mail1")]
public ActionResult Mail()
{
var emailObj = MailGun.Receive(Request);
return Content("ok");
_ = await CRM.SendToEmailApp(emailObj);
}
}
Thank you for the help!
The easiest way to do what you are describing (which is not recommended, because you may lose some results if your app crash) is to use a fire & forget task:
var emailObj = MailGun.Receive(Request);
Task.Run(async () => await CRM.SendToEmailApp(emailObj));
return Content("ok");
But, I think what you really want is sort of a Message Queue, by using a message queue you put the message in the queue (which is fast enough) and return immediately, at the same time a processor is processing the message queue and saves the result in the CRM.
This is what it'll look like when you use a message queueing broker.
I have a long running process which performs matches between millions of records I call this code using a Service Bus, However when my process passes the 5 minute limit Azure starts processing the already processed records from the start again.
How can I avoid this
Here is my code:
private static async Task ProcessMessagesAsync(Message message, CancellationToken token)
{
long receivedMessageTrasactionId = 0;
try
{
IQueueClient queueClient = new QueueClient(serviceBusConnectionString, serviceBusQueueName, ReceiveMode.PeekLock);
// Process the message
receivedMessageTrasactionId = Convert.ToInt64(Encoding.UTF8.GetString(message.Body));
// My Very Long Running Method
await DataCleanse.PerformDataCleanse(receivedMessageTrasactionId);
//Get Transaction and Metric details
await queueClient.CompleteAsync(message.SystemProperties.LockToken);
}
catch (Exception ex)
{
Log4NetErrorLogger(ex);
throw ex;
}
}
Messages are intended for notifications and not long running processing.
You've got a fewoptions:
Receive the message and rely on receiver's RenewLock() operation to extend the lock.
Use user-callback API and specify maximum processing time, if known, via MessageHandlerOptions.MaxAutoRenewDuration setting to auto-renew message's lock.
Record the processing started but do not complete the incoming message. Rather leverage message deferral feature, sending yourself a new delayed message with the reference to the deferred message SequenceNumber. This will allow you to periodically receive a "reminder" message to see if the work is finished. If it is, complete the deferred message by its SequenceNumber. Otherise, complete the "reminder" message along with sending a new one. This approach would require some level of your architecture redesign.
Similar to option 3, but offload processing to an external process that will report the status later. There are frameworks that can help you with that. MassTransit or NServiceBus. The latter has a sample you can download and play with.
Note that option 1 and 2 are not guaranteed as those are client-side initiated operations.
Background
We have a service operation that can receive concurrent asynchronous requests and must process those requests one at a time.
In the following example, the UploadAndImport(...) method receives concurrent requests on multiple threads, but its calls to the ImportFile(...) method must happen one at a time.
Layperson Description
Imagine a warehouse with many workers (multiple threads). People (clients) can send the warehouse many packages (requests) at the same time (concurrently). When a package comes in a worker takes responsibility for it from start to finish, and the person who dropped off the package can leave (fire-and-forget). The workers' job is to put each package down a small chute, and only one worker can put a package down a chute at a time, otherwise chaos ensues. If the person who dropped off the package checks in later (polling endpoint), the warehouse should be able to report on whether the package went down the chute or not.
Question
The question then is how to write a service operation that...
can receive concurrent client requests,
receives and processes those requests on multiple threads,
processes requests on the same thread that received the request,
processes requests one at a time,
is a one way fire-and-forget operation, and
has a separate polling endpoint that reports on request completion.
We've tried the following and are wondering two things:
Are there any race conditions that we have not considered?
Is there a more canonical way to code this scenario in C#.NET with a service oriented architecture (we happen to be using WCF)?
Example: What We Have Tried?
This is the service code that we have tried. It works though it feels like somewhat of a hack or kludge.
static ImportFileInfo _inProgressRequest = null;
static readonly ConcurrentDictionary<Guid, ImportFileInfo> WaitingRequests =
new ConcurrentDictionary<Guid, ImportFileInfo>();
public void UploadAndImport(ImportFileInfo request)
{
// Receive the incoming request
WaitingRequests.TryAdd(request.OperationId, request);
while (null != Interlocked.CompareExchange(ref _inProgressRequest, request, null))
{
// Wait for any previous processing to complete
Thread.Sleep(500);
}
// Process the incoming request
ImportFile(request);
Interlocked.Exchange(ref _inProgressRequest, null);
WaitingRequests.TryRemove(request.OperationId, out _);
}
public bool UploadAndImportIsComplete(Guid operationId) =>
!WaitingRequests.ContainsKey(operationId);
This is example client code.
private static async Task UploadFile(FileInfo fileInfo, ImportFileInfo importFileInfo)
{
using (var proxy = new Proxy())
using (var stream = new FileStream(fileInfo.FullName, FileMode.Open, FileAccess.Read))
{
importFileInfo.FileByteStream = stream;
proxy.UploadAndImport(importFileInfo);
}
await Task.Run(() => Poller.Poll(timeoutSeconds: 90, intervalSeconds: 1, func: () =>
{
using (var proxy = new Proxy())
{
return proxy.UploadAndImportIsComplete(importFileInfo.OperationId);
}
}));
}
It's hard to write a minimum viable example of this in a Fiddle, but here is a start that give a sense and that compiles.
As before, the above seems like a hack/kludge, and we are asking both about potential pitfalls in its approach and for alternative patterns that are more appropriate/canonical.
Simple solution using Producer-Consumer pattern to pipe requests in case of thread count restrictions.
You still have to implement a simple progress reporter or event. I suggest to replace the expensive polling approach with an asynchronous communication which is offered by Microsoft's SignalR library. It uses WebSocket to enable async behavior. The client and server can register their callbacks on a hub. Using RPC the client can now invoke server side methods and vice versa. You would post progress to the client by using the hub (client side). In my experience SignalR is very simple to use and very good documented. It has a library for all famous server side languages (e.g. Java).
Polling in my understanding is the totally opposite of fire-and-forget. You can't forget, because you have to check something based on an interval. Event based communication, like SignalR, is fire-and-forget since you fire and will get a reminder (cause you forgot). The "event side" will invoke your callback instead of you waiting to do it yourself!
Requirement 5 is ignored since I didn't get any reason. Waiting for a thread to complete would eliminate the fire and forget character.
private BlockingCollection<ImportFileInfo> requestQueue = new BlockingCollection<ImportFileInfo>();
private bool isServiceEnabled;
private readonly int maxNumberOfThreads = 8;
private Semaphore semaphore = new Semaphore(numberOfThreads);
private readonly object syncLock = new object();
public void UploadAndImport(ImportFileInfo request)
{
// Start the request handler background loop
if (!this.isServiceEnabled)
{
this.requestQueue?.Dispose();
this.requestQueue = new BlockingCollection<ImportFileInfo>();
// Fire and forget (requirement 4)
Task.Run(() => HandleRequests());
this.isServiceEnabled = true;
}
// Cache multiple incoming client requests (requirement 1) (and enable throttling)
this.requestQueue.Add(request);
}
private void HandleRequests()
{
while (!this.requestQueue.IsCompleted)
{
// Wait while thread limit is exceeded (some throttling)
this.semaphore.WaitOne();
// Process the incoming requests in a dedicated thread (requirement 2) until the BlockingCollection is marked completed.
Task.Run(() => ProcessRequest());
}
// Reset the request handler after BlockingCollection was marked completed
this.isServiceEnabled = false;
this.requestQueue.Dispose();
}
private void ProcessRequest()
{
ImportFileInfo request = this.requestQueue.Take();
UploadFile(request);
// You updated your question saying the method "ImportFile()" requires synchronization.
// This a bottleneck and will significantly drop performance, when this method is long running.
lock (this.syncLock)
{
ImportFile(request);
}
this.semaphore.Release();
}
Remarks:
BlockingCollection is a IDisposable
TODO: You have to "close" the BlockingCollection by marking it completed:
"BlockingCollection.CompleteAdding()" or it will loop indeterminately waiting for further requests. Maybe you introduce a additional request methods for the client to cancel and/ or to update the process and to mark adding to the BlockingCollection as completed. Or a timer that waits an idle time before marking it as completed. Or make your request handler thread block or spin.
Replace Take() and Add(...) with TryTake(...) and TryAdd(...) if you want cancellation support
Code is not tested
Your "ImportFile()" method is a bottleneck in your multi threading environment. I suggest to make it thread safe. In case of I/O that requires synchronization, I would cache the data in a BlockingCollection and then write them to I/O one by one.
The problem is that your total bandwidth is very small-- only one job can run at a time-- and you want to handle parallel requests. That means that queue time could vary wildly. It may not be the best choice to implement your job queue in-memory, as it would make your system much more brittle, and more difficult to scale out when your business grows.
A traditional, scaleable way to architect this would be:
An HTTP service to accept requests, load balanced/redundant, with no session state.
A SQL Server database to persist the requests in a queue, returning a persistent unique job ID.
A Windows service to process the queue, one job at a time, and mark jobs as complete. The worker process for the service would probably be single-threaded.
This solution requires you to choose a web server. A common choice is IIS running ASP.NET. On that platform, each request is guaranteed to be handled in a single-threaded manner (i.e. you don't need to worry about race conditions too much), but due to a feature called thread agility the request might end with a different thread, but in the original synchronization context, which means you will probably never notice unless you are debugging and inspecting thread IDs.
Given the constraints context of our system, this is the implementation we ended up using:
static ImportFileInfo _importInProgressItem = null;
static readonly ConcurrentQueue<ImportFileInfo> ImportQueue =
new ConcurrentQueue<ImportFileInfo>();
public void UploadAndImport(ImportFileInfo request) {
UploadFile(request);
ImportFileSynchronized(request);
}
// Synchronize the file import,
// because the database allows a user to perform only one write at a time.
private void ImportFileSynchronized(ImportFileInfo request) {
ImportQueue.Enqueue(request);
do {
ImportQueue.TryPeek(out var next);
if (null != Interlocked.CompareExchange(ref _importInProgressItem, next, null)) {
// Queue processing is already under way in another thread.
return;
}
ImportFile(next);
ImportQueue.TryDequeue(out _);
Interlocked.Exchange(ref _importInProgressItem, null);
}
while (ImportQueue.Any());
}
public bool UploadAndImportIsComplete(Guid operationId) =>
ImportQueue.All(waiting => waiting.OperationId != operationId);
This solution works well for the loads we are expecting. That load involves a maximum of about 15-20 concurrent PDF file uploads. The batch of up to 15-20 files tends to arrive all at once and then to go quiet for several hours until the next batch arrives.
Criticism and feedback is most welcome.
Background: We have import-functions that can take anywhere from a few seconds to 1-2 hours to run depending on the file being imported. We want to expose a new way of triggering imports, via a REST request.
Ideally the REST service would be called, trigger the import and reply with a result when done. My question is: since it can take up to two hours to run, is it possible to reply or will the request timeout for the caller? Is there a better way for this kind of operation?
What I use in these cases is an asynchronous operation that returns no result (void function result in case of c# Web API), then send the result asynchronously using a message queue.
E.g.
[HttpPut]
[Route("update")]
public void Update()
{
var task = Task.Run(() => this.engine.Update());
task.ContinueWith(t => publish(t, "Update()"));
}