Convert synchronous zip operation to async - c#

We have an existing library in which some of the methods need to be converted to async methods.
However, I'm not sure how to do it with the following method (error handling has been removed). The purpose of the method is to zip a file and save it to disk. (Note that the zip class doesn't expose any async methods.)
public static bool ZipAndSaveFile(string fileToPack, string archiveName, string outputDirectory)
{
var archiveNameAndPath = Path.Combine(outputDirectory, archiveName);
using (var zip = new ZipFile())
{
zip.CompressionLevel = Ionic.Zlib.CompressionLevel.BestCompression;
zip.Comment = $"This archive was created at {System.DateTime.UtcNow.ToString("G")} (UTC)";
zip.AddFile(fileToPack);
zip.Save(archiveNameAndPath);
}
return true;
}
An implementation could look like this:
public static async Task<bool> ZipAndSaveFileAsync(string fileToPack, string archiveName, string outputDirectory)
{
var archiveNameAndPath = Path.Combine(outputDirectory, archiveName);
await Task.Run(() =>
{
using (var zip = new ZipFile())
{
zip.CompressionLevel = Ionic.Zlib.CompressionLevel.BestCompression;
zip.Comment = $"This archive was created at {System.DateTime.UtcNow.ToString("G")} (UTC)";
zip.AddFile(fileToPack);
zip.Save(archiveNameAndPath);
}
});
return true;
}
That just seems wrong: the client could simply call the sync method using Task.Run.
Does anyone have any hints on how to transform it into an async method?

Which just seems wrong. The client could just call the sync method
using Task.Run
Spot on. By wrapping synchronous code in Task.Run() the library is now using the client's threadpool resources without it being readily apparent. Imagine what could happen to the client's threadpool if all libraries took this approach? Long story short, just expose the synchronous method and let the client decide if it wants to wrap it in Task.Run().
Having said that, if the ZipFile object had async functionality (e.g. had a SaveAsync() method) then you could make the outer method async as well. Here's an example of how that would look:
public static async Task<bool> ZipAndSaveFileAsync(string fileToPack, string archiveName, string outputDirectory)
{
var archiveNameAndPath = Path.Combine(outputDirectory, archiveName);
using (var zip = new ZipFile())
{
// do stuff
await zip.SaveAsync(archiveNameAndPath);
}
return true;
}
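The first recommendation above (expose only the synchronous method and let the caller decide whether to offload it) could be sketched roughly as follows. This is a minimal sketch, not the original library's code: it swaps in the built-in System.IO.Compression types instead of the Ionic ZipFile class so that it is self-contained, and the class names ZipLibrarySketch and ZipClientSketch are hypothetical.

```csharp
using System.IO;
using System.IO.Compression;
using System.Threading.Tasks;

// Library side (hypothetical name): a plain synchronous method, with no hidden Task.Run.
public static class ZipLibrarySketch
{
    public static bool ZipAndSaveFile(string fileToPack, string archiveName, string outputDirectory)
    {
        var archiveNameAndPath = Path.Combine(outputDirectory, archiveName);
        // Assumption: System.IO.Compression stands in for the Ionic library here.
        // ZipArchiveMode.Create throws if the archive file already exists.
        using (var zip = ZipFile.Open(archiveNameAndPath, ZipArchiveMode.Create))
        {
            zip.CreateEntryFromFile(fileToPack, Path.GetFileName(fileToPack), CompressionLevel.Optimal);
        }
        return true;
    }
}

// Client side (hypothetical name): the caller chooses to spend one of its own
// thread pool threads on the work, and that choice is visible in its own code.
public static class ZipClientSketch
{
    public static Task<bool> ZipInBackgroundAsync(string fileToPack, string archiveName, string outputDirectory)
    {
        return Task.Run(() => ZipLibrarySketch.ZipAndSaveFile(fileToPack, archiveName, outputDirectory));
    }
}
```

A UI client might await ZipInBackgroundAsync to keep its UI thread free, while a server-side client would probably call ZipAndSaveFile directly.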

As a temporary solution, I would introduce an extension method:
public static class ZipFileExtensions
{
public static Task SaveAsync(this ZipFile zipFile, string filePath)
{
zipFile.Save(filePath);
return Task.FromResult(true);
}
}
Then the usage would be:
public static async Task<bool> ZipAndSaveFileAsync(string fileToPack, string archiveName, string outputDirectory)
{
var archiveNameAndPath = Path.Combine(outputDirectory, archiveName);
using (var zip = new ZipFile())
{
...
await zip.SaveAsync(archiveNameAndPath).ConfigureAwait(false);
}
return true;
}
Implementing a task synchronously does not violate anything (I'm talking about Task.FromResult).
Submit a request to https://github.com/jstedfast/Ionic.Zlib asking for async support in the library, since saving is an IO operation.
Hopefully that gets done eventually; then you can upgrade Ionic.Zlib in your app, delete ZipFileExtensions, and continue using the async version of the Save method (this time built into the library).
Alternatively, you can clone the repo from GitHub, add SaveAsync yourself, and then submit a pull request back.
It's just not possible to 'convert' a sync method to an async one if the library does not support it.
From a performance standpoint this might not be the best solution, but from a management point of view you can decouple the stories "Convert everything to async" and "Improve app performance by having Ionic.Zlib async", which makes your backlog more granular.

public static Task<bool> ZipAndSaveFileAsync(string fileToPack, string archiveName, string outputDirectory)
{
return Task.Run(() =>
{
var archiveNameAndPath = Path.Combine(outputDirectory, archiveName);
using (var zip = new ZipFile())
{
zip.CompressionLevel = Ionic.Zlib.CompressionLevel.BestCompression;
zip.Comment = $"This archive was created at {System.DateTime.UtcNow.ToString("G")} (UTC)";
zip.AddFile(fileToPack);
zip.Save(archiveNameAndPath);
}
return true;
});
}
Then use like this
public async Task MyMethod()
{
bool b = await ZipAndSaveFileAsync();
}

Some of the answers suggest that zipping a file is not a process that you should do asynchronously. I don't agree with this.
I can imagine that zipping files is a process that might take some time. During this time you want to keep your UI responsive, or you want to zip several files simultaneously, or you want to upload a zipped file while zipping the next one.
The code you show is the proper way to make your function asynchronous. You question whether it is useful to create such a small method. Why not let the users call Task.Run instead of call your async function?
The reason for this is called information hiding. By creating the async function you're hiding how you zip asynchronously, thus relieving others from knowing how to do this.
Besides, information hiding gives you the freedom to change the internals of the procedure as long as you don't change the pre- and postcondition.
One of the answers said that your function still is not asynchronous. That is not true. Callers may call your async function without awaiting it immediately. While the task is zipping, the caller may do other things. As soon as it needs the boolean result of the task, it can await the task.
Example of usage:
private async Task DoSomethingSimultaneously()
{
var taskZipFileA = ZipAndSaveFileAsync(fileA, ...);
// while this task is zipping do other things,
// for instance start zipping file B:
var taskZipFileB = ZipAndSaveFileAsync(fileB, ...);
// while both tasks are zipping do other things
// after a while you want to wait until both files are finished:
await Task.WhenAll(new Task[] {taskZipFileA, taskZipFileB});
// after the await, the results are known:
if (taskZipFileA.Result)
{
// process the boolean result of taskZipFile A
}
}
Note the difference between Task.WaitAll and Task.WhenAll
In async - await you use Task.WhenAll. The return is a Task, so you can
await Task.WhenAll (...)
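To make the difference concrete, here is a minimal sketch. FakeZipAsync is a hypothetical stand-in for a long-running zip call, and the class name WhenAllSketch is also made up. Task.WhenAll returns a task that can be awaited without blocking the calling thread, whereas Task.WaitAll blocks the calling thread until every task has finished.

```csharp
using System.Threading;
using System.Threading.Tasks;

public static class WhenAllSketch
{
    // Hypothetical stand-in for a long-running zip operation.
    public static Task<bool> FakeZipAsync(int milliseconds)
    {
        return Task.Run(() =>
        {
            Thread.Sleep(milliseconds); // simulate CPU or disk work
            return true;
        });
    }

    // Asynchronous wait: Task.WhenAll returns a Task<bool[]> that is awaited.
    public static async Task<bool[]> ZipBothAsync()
    {
        Task<bool> a = FakeZipAsync(50);
        Task<bool> b = FakeZipAsync(50);
        return await Task.WhenAll(a, b);
    }

    // Blocking wait: Task.WaitAll returns only after both tasks have completed.
    public static bool[] ZipBothBlocking()
    {
        Task<bool> a = FakeZipAsync(50);
        Task<bool> b = FakeZipAsync(50);
        Task.WaitAll(a, b);
        return new[] { a.Result, b.Result };
    }
}
```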
For proper async-await, all functions that call any async function need to be async themselves and return a Task (instead of void) or Task<TResult> instead of TResult. There is one exception: the event handler may return void.
private async void OnButton1_clicked(object sender, ...)
{
bool zipResult = await ZipAndSaveFileAsync(...);
ProcessZipResult(zipResult);
}
Using this method your UI stays responsive. You don't have to call Task.Run.
If you have a non-async function and want to start zipping while doing something else, your non-async function has to call Task.Run. As the function is not async, it can't use await. When it needs the result of Task.Run, it needs to use Task.Wait or Task.WaitAll.
private void NonAsyncZipper()
{
var taskZipFileA = Task.Run(() => ZipAndSaveFileAsync(...));
// while zipping do other things
// after a while when the result is needed:
taskZipFileA.Wait();
ProcessZipResult(taskZipFileA.Result);
}

If it's possible to get the binary data from your Zip library after the compression, then instead of using this library to save the file, use .NET IO libraries to save it.
EDIT:
There is no point in using async for CPU bound operations (such as compression). In your case, the only benefit you can get from async is when you save the file to the disk. I thought that's what you were asking for.
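As a rough illustration of this idea, the sketch below compresses into a MemoryStream and then writes the bytes to disk with the asynchronous FileStream API. It assumes your zip library can write its output to a Stream; System.IO.Compression is used here as a stand-in for it, and the class name InMemoryZipSketch is hypothetical. Note that the compression step and the read of the source file remain synchronous; only the final write is asynchronous.

```csharp
using System.IO;
using System.IO.Compression;
using System.Threading.Tasks;

public static class InMemoryZipSketch
{
    public static async Task<bool> ZipAndSaveFileAsync(string fileToPack, string archiveName, string outputDirectory)
    {
        var archiveNameAndPath = Path.Combine(outputDirectory, archiveName);
        using (var buffer = new MemoryStream())
        {
            // Compress in memory (synchronous). leaveOpen: true keeps the buffer
            // usable after the archive is disposed; disposing writes the zip directory.
            using (var zip = new ZipArchive(buffer, ZipArchiveMode.Create, leaveOpen: true))
            {
                zip.CreateEntryFromFile(fileToPack, Path.GetFileName(fileToPack), CompressionLevel.Optimal);
            }

            buffer.Position = 0;

            // Write to disk asynchronously; useAsync: true requests asynchronous file IO.
            using (var output = new FileStream(archiveNameAndPath, FileMode.Create, FileAccess.Write,
                                               FileShare.None, 4096, useAsync: true))
            {
                await buffer.CopyToAsync(output).ConfigureAwait(false);
            }
        }
        return true;
    }
}
```

The trade-off is memory: the whole archive is held in the buffer before it is written, which may not suit very large files.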

Related

How do you create an async method?

How do I make the simple method below async so that I can call it like await DoSomething?
public void DoDomething()
{
string d = "doing something";
}
Editing Question for what I am actually doing in my method
public void RunValidationScripts()
{
string scriptDirPath = @"D:\ValidationScripts";
string[] psScriptsPath = Directory.GetFiles(scriptDirPath);
Dictionary<string, Collection<PSObject>> result = new Dictionary<string, Collection<PSObject>>();
foreach (string scriptPath in psScriptsPath)
{
result.Add(scriptPath, ExecutePSScript(scriptPath));
}
}
I am using package Microsoft.PowerShell.SDK
private Collection<PSObject> ExecutePSScript(string scriptFilePath)
{
using (var ps = PowerShell.Create())
{
string fileName = Path.GetFileName(scriptFilePath);
var results = ps.AddScript(File.ReadAllText(scriptFilePath)).Invoke();
return results;
}
}
The question is unclear. It looks like the actual problem is trying to execute PowerShell scripts asynchronously. Nothing will be gained by using Task.Run in this case. PowerShell scripts execute on a different thread already.
PowerShell.InvokeAsync can be used to execute a PowerShell script asynchronously. Directory.EnumerateFiles can be used to start enumerating files and processing them without waiting for all files to be retrieved.
Assuming the method executing the scripts looks something like this:
async Task<IEnumerable<PSObject>> ExecutePSScript(string scriptPath)
{
var ps = PowerShell.Create();
var content=await File.ReadAllTextAsync(scriptPath);
ps.AddScript(content);
var results=await ps.InvokeAsync();
return results;
}
The calling method can be :
async Task RunValidationScriptsAsync(string folder)
{
var allResults=new List<PSObject>();
var files=Directory.EnumerateFiles(folder);
foreach(var file in files)
{
var results=await ExecutePSScript(file);
allResults.AddRange(results);
}
//Do something with the results
}
From a developer's perspective, there is one simple practical rule of thumb that applies to most cases.
Invoking asynchronous APIs
The idea is that if you can choose between a synchronous and an asynchronous API in the framework, you should opt for making all your code asynchronous and write your methods asynchronously, so that invocations of those async framework methods are awaited.
If you don't use any async methods, you should write synchronous methods, as async does not provide any benefit.
Why would you make a simple method async if there is nothing to wait for (no IO-bound or CPU-bound operation)? But if there really is a need (a fire-and-forget kind of situation), you could do:
public async Task DoDomething()
{
string d=null;
await Task.Run(()=> {
d = "doing something";
});
}
Having async is not important if you don't want to wait here, so
public Task DoDomething()
{
string d=null;
return Task.Run(()=> {
d = "doing something";
});
}
will work too.
Note: the question detail has changed significantly, and answer might not be relevant for this.

Parallel.ForEach not adding items as expected in ConcurrentBag in C#

In my ASP.NET Core Web API controller, I'm receiving an IFormFile[] files. I need to convert this to a List<DocumentData>. I first used foreach, and it worked fine. But I later decided to change to Parallel.ForEach, as I'm receiving many (>5) files.
Here is my DocumentData Class:
public class DocumentData
{
public byte[] BinaryData { get; set; }
public string FileName { get; set; }
}
Here is my Parallel.ForEach Logic:
var documents = new ConcurrentBag<DocumentData>();
Parallel.ForEach(files, async (currentFile) =>
{
if (currentFile.Length > 0)
{
using (var ms = new MemoryStream())
{
await currentFile.CopyToAsync(ms);
documents.Add(new DocumentData
{
BinaryData = ms.ToArray(),
FileName = currentFile.FileName
});
}
}
});
For example, even with two files as input, documents always contains only one file. Am I missing something?
I initially had List<DocumentData>. I found that it's not thread-safe and changed to ConcurrentBag<DocumentData>, but I'm still getting unexpected results. Where am I going wrong?
I guess it is because Parallel.ForEach doesn't support async/await. It only takes an Action as input and executes it for each item. In the case of async delegates, it will execute them in a fire-and-forget manner.
In that case the passed lambda is treated as an async void function, and async void can't be awaited.
If there were an overload that takes a Func<Task>, then it would work.
I suggest creating the tasks with Select and using Task.WhenAll to run them concurrently.
For example:
var tasks = files.Select(async currentFile =>
{
if (currentFile.Length > 0)
{
using (var ms = new MemoryStream())
{
await currentFile.CopyToAsync(ms);
documents.Add(new DocumentData
{
BinaryData = ms.ToArray(),
FileName = currentFile.FileName
});
}
}
});
await Task.WhenAll(tasks);
Additionally, you can improve that code by returning the DocumentData instance from the lambda, in which case there is no need to modify the documents collection. Task.WhenAll has an overload that takes an IEnumerable<Task<TResult>> as input and produces a Task<TResult[]>. The result looks like this:
var tasks = files.Select(async currentFile =>
{
if (currentFile.Length > 0)
{
using (var ms = new MemoryStream())
{
await currentFile.CopyToAsync(ms);
return new DocumentData
{
BinaryData = ms.ToArray(),
FileName = currentFile.FileName
};
}
}
return null;
});
var documents = (await Task.WhenAll(tasks)).Where(d => d != null).ToArray();
You had the right idea with a concurrent collection, but misused a TPL method.
In short, you need to be very careful with async lambdas, and with whether you are passing them to an Action or a Func<Task>.
Your problem is that Parallel.For / ForEach is not suited to the async and await pattern or to IO-bound tasks. They are suited to CPU-bound workloads, which means they essentially take Action parameters and let the task scheduler create the tasks for you.
If you want to run multiple tasks at the same time, use Task.WhenAll or a TPL Dataflow ActionBlock, which can deal effectively with both CPU-bound and IO-bound workloads; or, said more directly, they can deal with tasks, which is what an async method is.
The fundamental issue is that when you pass an async lambda as an Action, you are essentially creating an async void method, which runs as an unobserved task. That is to say, your TPL method is just running a bunch of delegates in parallel that each start an unobserved task, without waiting for them.
Think of it like this: you ask a bunch of friends to go and get you some groceries; they in turn tell someone else to get your groceries, yet your friends report back to you and say their job is done. It obviously isn't, and you have no groceries.
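The behaviour described in these answers can be reproduced without any web framework. In the minimal sketch below, Task.Delay stands in for CopyToAsync and the class name AsyncLambdaSketch is hypothetical. The Parallel.ForEach version returns as soon as each async void lambda reaches its first await, so the bag is typically still incomplete at that point; the Task.WhenAll version waits for every task.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class AsyncLambdaSketch
{
    // Observed tasks: Select produces one Task per item and WhenAll awaits them all.
    public static async Task<int> CollectWithWhenAllAsync(IEnumerable<int> items)
    {
        var bag = new ConcurrentBag<int>();
        var tasks = items.Select(async item =>
        {
            await Task.Delay(20); // stand-in for an asynchronous copy
            bag.Add(item);
        });
        await Task.WhenAll(tasks);
        return bag.Count; // every item has been added by now
    }

    // Unobserved work: the async lambda is converted to an Action<int> (async void),
    // so Parallel.ForEach cannot wait for the part after the first await.
    public static int CollectWithParallelForEach(IEnumerable<int> items)
    {
        var bag = new ConcurrentBag<int>();
        Parallel.ForEach(items, async item =>
        {
            await Task.Delay(20);
            bag.Add(item);
        });
        return bag.Count; // read too early: usually fewer than the number of items
    }
}
```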

Mixing Async and Sync in same HTTP request ASP.NET Core C#

Is it bad practice to mix async and sync calls in the same ASP.NET Core API call?
For example, in the following code:
The method CropBlackBroderOfAnImageAsync is an async method.
On the other hand, SaveImageForProcessing(file, sourceFolderPath); is a sync method.
Reason: I am calling SaveImageForProcessing synchronously because I want to use its result to execute the code in CropBlackBroderOfAnImageAsync.
The complete code repo
code reviews are welcome: https://github.com/aamir-poswal/ImageCropApp/
public async Task<(string sourceFolderPath, string destinationFolderPath)> CropBlackBorderOfAnImage(IFormFile file)
{
var extension = Path.GetExtension(file.FileName);
var newFileName = Guid.NewGuid().ToString();//Create a new Name for the file due to security reasons.
var fileNameSource = newFileName + extension;
var sourceFolderPath = Path.Combine(Directory.GetCurrentDirectory(), "Images\\Source", fileNameSource);
var fileNameDestination = newFileName + "Result" + extension;
var destinationFolderPath = Path.Combine(Directory.GetCurrentDirectory(), "Images\\Destination", fileNameDestination);
SaveImageForProcessing(file, sourceFolderPath);
await _imageCropBlackBroderService.CropBlackBroderOfAnImageAsync(sourceFolderPath, destinationFolderPath);
return (sourceFolderPath, destinationFolderPath);
}
private void SaveImageForProcessing(IFormFile file, string path)
{
using (var bits = new FileStream(path, FileMode.Create))
{
file.CopyTo(bits);
}
}
Yes, it is bad practice. If you use a long synchronous operation in your asynchronous code, you're going to freeze all the other asynchronous code as well. The reason for this is that asynchronous code usually runs in one thread. If that thread is busy waiting on a long operation, it can't handle other asynchronous code.
Instead, just use CopyToAsync, and turn your SaveImageForProcessing to an async function as well.
Reason: I am calling SaveImageForProcessing synchronously because I want to use its result to execute the code in CropBlackBroderOfAnImageAsync.
I think you misunderstand what asynchronous means. When code runs asynchronously, it means that the current thread is freed to do other things while it is waiting. It might be helpful to read the article Asynchronous programming with async and await. It has a very helpful illustration about cooking breakfast.
It sounds like you believe that running asynchronous means that the code will run somewhere else at the same time that the rest of your code is running. That's called "parallel".
The purpose of async and await is to make it easy to start asynchronous tasks and come back to them when you're ready for the results. In your case, since you want the result right away, you can make SaveImageForProcessing asynchronous, use CopyToAsync and await it. Then you can use the result right away.
Because you use await, when you get to the line calling CropBlackBroderOfAnImageAsync, you are guaranteed that SaveImageForProcessing has entirely completed.
public async Task<(string sourceFolderPath, string destinationFolderPath)> CropBlackBorderOfAnImage(IFormFile file)
{
var extension = Path.GetExtension(file.FileName);
var newFileName = Guid.NewGuid().ToString();//Create a new Name for the file due to security reasons.
var fileNameSource = newFileName + extension;
var sourceFolderPath = Path.Combine(Directory.GetCurrentDirectory(), "Images\\Source", fileNameSource);
var fileNameDestination = newFileName + "Result" + extension;
var destinationFolderPath = Path.Combine(Directory.GetCurrentDirectory(), "Images\\Destination", fileNameDestination);
await SaveImageForProcessing(file, sourceFolderPath);
await _imageCropBlackBroderService.CropBlackBroderOfAnImageAsync(sourceFolderPath, destinationFolderPath);
return (sourceFolderPath, destinationFolderPath);
}
private async Task SaveImageForProcessing(IFormFile file, string path)
{
using (var bits = new FileStream(path, FileMode.Create))
{
await file.CopyToAsync(bits);
}
}
This all depends on your application. It's totally fine to invoke a synchronous function inside an async function, though you might be leaving some performance on the table. If you are already doing things asynchronously in an async function, you might as well use the async file IO methods. The code in the async function can still be asynchronous and still execute in the correct order (which seems to be your concern); you would just await the results and use them in later lines of code.
The bigger issue is attempting to mask an async call to make it seem sync or vice versa.
You might look at these posts:
sync over async
async over sync

Calling Async in a Sync method

I've been reading examples for a long time now, but unfortunately I've been unable to apply the solutions to the code I'm working with. Some quick Facts/Assorted Info:
1) I'm new to C#
2) The code posted below is modified from Amazon Web Services (mostly stock)
3) The purpose of the code is to compare server info to already-downloaded offline info and create a list of files that need to be downloaded. This snippet is for the list made from the server side; the only option with AWS is to call async, but I need this to finish before moving forward.
public void InitiateSearch()
{
UnityInitializer.AttachToGameObject(this.gameObject);
//these are the access key and secret access key for credentials
BasicAWSCredentials credentials = new BasicAWSCredentials("secret key", "very secret key");
AmazonS3Config S3Config = new AmazonS3Config()
{
ServiceURL = ("url"),
RegionEndpoint = RegionEndpoint.blahblah
};
//Setting the client to be used in the call below
AmazonS3Client Client = new AmazonS3Client(credentials, S3Config);
var request = new ListObjectsRequest()
{
BucketName = "thebucket"
};
Client.ListObjectsAsync(request, (responseObject) =>
{
if (responseObject.Exception == null)
{
responseObject.Response.S3Objects.ForEach((o) =>
{
int StartCut = o.Key.IndexOf(SearchType) - 11;
if (SearchType == o.Key.Substring(o.Key.IndexOf(SearchType), SearchType.Length))
{
if (ZipCode == o.Key.Substring(StartCut + 12 + SearchType.Length, 5))
{
AWSFileList.Add(o.Key + ", " + o.LastModified);
}
}
}
);
}
else
{
Debug.Log(responseObject.Exception);
}
});
}
I have no idea how to apply await to the Client.ListObjectsAsync line, I'm hoping you all can give me some guidance and let me keep my hair for a few more years.
You can either mark your method async and await it, or you can call .Wait() or read .Result on the Task you're given back.
I have no idea how to apply await to the Client.ListObjectsAsync line
You probably just put await in front of it:
await Client.ListObjectsAsync(request, (responseObject) => ...
As soon as you do this, Visual Studio will give you an error. Take a good look at the error message, because it tells you exactly what to do next (mark InitiateSearch with async and change its return type to Task):
public async Task InitiateSearchAsync()
(it's also a good idea to add an Async suffix to follow the common pattern).
Next, you'd add an await everywhere that InitiateSearchAsync is called, and so on.
I'm assuming Client.ListObjectsAsync returns a Task object, so a solution for your specific problem would be this:
public async void InitiateSearch()
{
//code
var collection = await Client.ListObjectsAsync(request, (responseObject) =>
{
//code
});
foreach (var item in collection)
{
//do stuff with item
}
}
The variable collection will now be filled with the objects. You may want to change the return type of InitiateSearch() to Task, so you can await it too.
await InitiateSearch(); //like this
If this method is an event handler of some sort (like called by the click of a button), then you can keep using void as return type.
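The answers above assume that Client.ListObjectsAsync returns a Task. If it is callback-based instead, as the call in the question suggests, one common way to make it awaitable is to wrap it with a TaskCompletionSource. The sketch below is only an illustration under that assumption: FakeListingClient and its callback signature are hypothetical stand-ins, not the AWS SDK's actual API.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical callback-style client standing in for an SDK method that
// reports its result through a callback instead of returning a Task.
public class FakeListingClient
{
    public void ListObjectsAsync(string bucketName, Action<List<string>, Exception> callback)
    {
        // Simulate the SDK completing the request on a background thread.
        Task.Run(() => callback(new List<string> { bucketName + "/a.txt", bucketName + "/b.txt" }, null));
    }
}

public static class CallbackAdapterSketch
{
    // Wraps the callback API in a Task so that callers can await it.
    public static Task<List<string>> ListObjectsTaskAsync(this FakeListingClient client, string bucketName)
    {
        var tcs = new TaskCompletionSource<List<string>>();
        client.ListObjectsAsync(bucketName, (keys, error) =>
        {
            if (error != null)
                tcs.TrySetException(error); // the awaiting caller sees the exception
            else
                tcs.TrySetResult(keys);     // the awaiting caller receives the list
        });
        return tcs.Task;
    }
}
```

With such an adapter in place, an async caller could write `var keys = await client.ListObjectsTaskAsync("thebucket");` and continue only once the list is available.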
A simple introduction from an unpublished part of the documentation for async-await:
Three things are needed to use async-await:
The Task object: This object is returned by a method which is executed asynchronously. It allows you to control the execution of the method.
The await keyword: "Awaits" a Task. Put this keyword before the Task to asynchronously wait for it to finish.
The async keyword: All methods which use the await keyword have to be marked as async.
A small example which demonstrates the usage of these keywords:
public async Task DoStuffAsync()
{
var result = await DownloadFromWebpageAsync(); //calls method and waits till execution finished
var task = WriteTextAsync(@"temp.txt", result); //starts saving the string to a file, continues execution right away
Debug.Write("this is executed parallel with WriteTextAsync!"); //executed parallel with WriteTextAsync!
await task; //wait for WriteTextAsync to finish execution
}
private async Task<string> DownloadFromWebpageAsync()
{
using (var client = new WebClient())
{
return await client.DownloadStringTaskAsync(new Uri("http://stackoverflow.com"));
}
}
private async Task WriteTextAsync(string filePath, string text)
{
byte[] encodedText = Encoding.Unicode.GetBytes(text);
using (FileStream sourceStream = new FileStream(filePath, FileMode.Append))
{
await sourceStream.WriteAsync(encodedText, 0, encodedText.Length);
}
}
Some things to note:
You can specify a return value from an asynchronous operation with Task<TResult> (here, Task<string>). The await keyword waits until the execution of the method finishes, and returns the string.
The Task object contains the status of the execution of the method; it can be used like any other variable.
If an exception is thrown (for example by the WebClient), it bubbles up the first time the await keyword is used (in this example at the line var result (...)).
It is recommended to name methods which return a Task object MethodNameAsync.
For more information about this take a look at http://blog.stephencleary.com/2012/02/async-and-await.html.
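The point about exceptions can be illustrated with a small sketch. FailAsync is a hypothetical method that fails after a short delay, and the class name is made up. Calling it does not throw; the exception is stored in the returned Task and rethrown where the task is awaited.

```csharp
using System;
using System.Threading.Tasks;

public static class AwaitExceptionSketch
{
    // Hypothetical operation that fails after a short delay.
    public static async Task<string> FailAsync()
    {
        await Task.Delay(10);
        throw new InvalidOperationException("download failed");
    }

    public static async Task<string> RunAsync()
    {
        Task<string> task = FailAsync(); // no exception is thrown on this line
        try
        {
            return await task;           // the stored exception is rethrown here
        }
        catch (InvalidOperationException ex)
        {
            return "caught: " + ex.Message;
        }
    }
}
```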

Don't wait for Async call inside async method that is being waited on

I am seeing some odd behavior from my async methods. I recently discovered that unzipping my entire zip archive was not performant on all Windows devices. So much so that I've resorted to extracting a single file I need, using it while I wait for the rest of the archive to extract. However, currently the code to extract the single file and the code to extract the entire archive is being called from the same method. This method is async and ultimately originally called on the UI thread by code in App.xaml.cs. When I call this method, I am using the await keyword to wait for it to complete as there is one file in the zip archive I need for the loading of the app.
App.xaml looks like this:
SharedContext.ChangeUniverse("1234");
SharedContext looks like this:
public static void ChangeUniverse(string universe) {
await DownloadArchive(universe);
}
public async Task DownloadArchive(string universe) {
ZipArchive archive = magic; // get it somehow
var someLocalFilePath = magic; // the exact location I need to extract data.json
var someLocalPath = magic; // the exact location I need to extract the zip
archive.GetEntry("data.json").ExtractToFile(someLocalFilePath);
// notice I do NOT await
ExtractFullArchive(archive, someLocalPath);
}
public async Task ExtractFullArchive(ZipArchive archive, string path) {
archive.ExtractToDirectory(path, true); // extracting using an override nice extension method I found on SO.com
}
The problem is that DownloadArchive doesn't return until ExtractFullArchive completes and ExtractFullArchive is what is taking a LONG time. I need ExtractFullArchive to execute asynchronously while DownloadArchive completes. I really don't care when it finishes.
When you don't care when ExtractFullArchive finishes, you can start a new Task to execute the method on another thread. With this approach, the DownloadArchive method finishes, although the ExtractFullArchive has not finished yet. This could look like this for example.
public async Task DownloadArchive(string universe) {
ZipArchive archive = magic; // get it somehow
var someLocalFilePath = magic; // the exact location I need to extract data.json
var someLocalPath = magic; // the exact location I need to extract the zip
archive.GetEntry("data.json").ExtractToFile(someLocalFilePath);
Task.Run(() => ExtractFullArchive(archive, someLocalPath));
}
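A likely reason the original call blocks is that an async method with no await inside runs synchronously on the caller's thread, so leaving out the await at the call site changes nothing. The minimal sketch below uses Thread.Sleep as a stand-in for ExtractToDirectory and measures how long the caller is held up with and without Task.Run; all names in it are hypothetical.

```csharp
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

public static class FireAndForgetSketch
{
#pragma warning disable CS1998 // async method without await, kept to mirror the question
    public static async Task SlowExtractAsync()
    {
        Thread.Sleep(300); // stand-in for the synchronous ExtractToDirectory call
    }
#pragma warning restore CS1998

    // Returns how many milliseconds the caller was blocked by a direct call.
    public static long CallDirectly()
    {
        var stopwatch = Stopwatch.StartNew();
        Task ignored = SlowExtractAsync(); // runs to completion on the calling thread
        return stopwatch.ElapsedMilliseconds;
    }

    // Returns how many milliseconds the caller was blocked when offloading with Task.Run.
    public static long CallViaTaskRun(out Task background)
    {
        var stopwatch = Stopwatch.StartNew();
        background = Task.Run(() => SlowExtractAsync()); // queued to the thread pool
        return stopwatch.ElapsedMilliseconds;
    }
}
```

Keeping a reference to the background task, as CallViaTaskRun does through its out parameter, leaves the caller a way to observe failures later even if it never waits for completion.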
Don't return Task if you don't want to wait; return void instead, like this:
public async void ExtractFullArchive(ZipArchive archive, string path) {
archive.ExtractToDirectory(path, true); // extracting using an override nice extension method I found on SO.com
}
