C# async file operations / eliminate UI Blocking - c#

Maybe I'm just looking at this wrong, I don't normally have to write a UI with progress. I'm trying to do this in a WPF app with .Net 7.0. My code looks something like:
public async void ProcessAllFiles()
{
var fileItemCollection = await GetFilesToProcess(#"\\MyServer\Share");
foreach (var fileItem in fileItemCollection)
{
var result = await ProcessFile(fileItem);
PublishResultsToUI(result);
}
}
public async Task<ResultInfo> ProcessFile(FileInfo item)
{
var result = new ResultInfo();
await Task.Run(() =>
{
// add processing results to result object
....
));
return result;
}
The "GetFilesToProcess" locks the UI 10-15 seconds and each "ProcessFiles" locks the UI a second or 2 for each file.
The timing seems to be about the same with or without the async/await/Task<T> statements.
If the UI didn't freeze every few seconds all would be good.
Any ideas would be appreciated (I've read through a bunch of docs, and I just have not found ones that make sense to me).

Related

Why does async IO block in C#? [duplicate]

This question already has answers here:
Why File.ReadAllLinesAsync() blocks the UI thread?
(2 answers)
Closed 11 months ago.
I've created a WPF app that targets a local document database for fun/practice. The idea is the document for an entity is a .json file that lives on disk and folders act as collections. In this implementation, I have a bunch of .json documents that provide data about a Video to create a sort of an IMDB clone.
I have this class:
public class VideoRepository : IVideoRepository
{
public async IAsyncEnumerable<Video> EnumerateEntities()
{
foreach (var file in new DirectoryInfo(Constants.JsonDatabaseVideoCollectionPath).GetFiles())
{
var json = await File.ReadAllTextAsync(file.FullName); // This blocks
var document = JsonConvert.DeserializeObject<VideoDocument>(json); // Newtonsoft
var domainObject = VideoMapper.Map(document); // A mapper to go from the document type to the domain type
yield return domainObject;
}
// Uncommenting the below lines and commenting out the above foreach loop doesn't lock up the UI.
//await Task.Delay(5000);
//yield return new Video();
}
// Rest of class.
}
Way up the call stack, though the API layer and into the UI layer, I have an ICommand in a ViewModel:
QueryCommand = new RelayCommand(async (query) => await SendQuery((string)query));
private async Task SendQuery(string query)
{
QueryStatus = "Querying...";
QueryResult.Clear();
await foreach (var video in _videoEndpoints.QueryOnTags(query))
QueryResult.Add(_mapperService.Map(video));
QueryStatus = $"{QueryResult.Count()} videos found.";
}
The goal is to show the user a message 'Querying...' while the query is being processed. However, that message is never shown and the UI locks up until the query is complete, at which point the result message shows.
In VideoRepository, if I comment out the foreach loop and uncomment the two lines below it, the UI doesn't lock up and the 'Querying...' message gets shown for 5 seconds.
Why does that happen? Is there a way to do IO without locking up the UI/blocking?
Fortunately, if this were behind a web API and hit a real database, I probably wouldn't see this issue. I'd still like the UI to not lock up with this implementation though.
EDIT:
Dupe of Why File.ReadAllLinesAsync() blocks the UI thread?
Turns out Microsoft didn't make their async method very async. Changing the IO line fixes everything:
//var json = await File.ReadAllTextAsync(file.FullName); // Bad
var json = await Task.Run(() => File.ReadAllText(file.FullName)); // Good
You are probably targeting a .NET version older than .NET 6. In these old versions the file-system APIs were not implemented efficiently, and were not even truly asynchronous. Things have been improved in .NET 6, but still the synchronous file-system APIs are more performant than their asynchronous counterparts. Your problem can be solved simply by switching from this:
var json = await File.ReadAllTextAsync(file.FullName);
to this:
var json = await Task.Run(() => File.ReadAllText(file.FullName));
If you want to get fancy, you could also solve the problem in the UI layer, by using a custom LINQ operator like this:
public static async IAsyncEnumerable<T> OnThreadPool<T>(
this IAsyncEnumerable<T> source,
[EnumeratorCancellation] CancellationToken cancellationToken = default)
{
var enumerator = await Task.Run(() => source
.GetAsyncEnumerator(cancellationToken)).ConfigureAwait(false);
try
{
while (true)
{
var (moved, current) = await Task.Run(async () =>
{
if (await enumerator.MoveNextAsync())
return (true, enumerator.Current);
else
return (false, default);
}).ConfigureAwait(false);
if (!moved) break;
yield return current;
}
}
finally
{
await Task.Run(async () => await enumerator
.DisposeAsync()).ConfigureAwait(false);
}
}
This operator offloads to the ThreadPool all the operations associated with enumerating an IAsyncEnumerable<T>. It can be used like this:
await foreach (var video in _videoEndpoints.QueryOnTags(query).OnThreadPool())
QueryResult.Add(_mapperService.Map(video));

I cannot understand how exactly does await/async work

I started to look into Task, async/await concepts is c# and I'm having big problems understanding it, well at least i don't know how to implement it. I started rewriting an older test program i had written before, but now instead of threading i want to use these new concepts. Basically the layout is as it follows:
I have a simple class where i download the HTML content of a web page.
I process that in another class where i basically just parse the page to my model. Later on i want to display that to my UI.
The problem is that my program is not responsive, it blocks the UI while I'm processing the info.
I started learning this 2 days ago, i have read a lot of stuff online, including MSDN and some blogs but yet I'm unable to figure it out. Maybe someone can provide a look as well
HtmlDOwnloadCOde:
public async Task<string> GetMangaDescriptionPage(string detailUrl)
{
WebClient client = new WebClient();
Stream data = await client.OpenReadTaskAsync(detailUrl);
StreamReader reader = new StreamReader(data);
string s = reader.ReadToEnd();
data.Dispose();
reader.Dispose();
data.Close();
reader.Close();
return s;
}
My parse class code:
public async Task<MangaDetailsModel> ParseMangaDescriptionPage()
{
ParseOneManga pom = new ParseOneManga();
string t1 = await pom.GetMangaDescriptionPage(selectedManga.url);
HtmlDocument htmlDoc = new HtmlDocument();
htmlDoc.LoadHtml(t1);
var divs = htmlDoc.DocumentNode.Descendants("div").Where(x => x.Attributes.Contains("id") &&
x.Attributes["id"].Value.Contains("title")).ToArray();
mangaDetails.mangaName = divs[0].Element("h1").InnerText;
mangaDetails.description = divs[0].Descendants("p").Single().InnerText ?? "DSA";
var tds = divs[0].Descendants("td");
int info = 0;
var chapters = htmlDoc.DocumentNode.Descendants("div").Where(x => x.Attributes.Contains("id") &&
x.Attributes["id"].Value.Contains("chapters")).ToArray();
var chapterUi = chapters[0].Descendants("ul").Where(x => x.Attributes.Contains("class") &&
x.Attributes["class"].Value.Contains("chlist"));
foreach (var li in chapterUi)
{
var liChapter = li.Descendants("li");
foreach (var h3tag in liChapter)
{
var chapterH3 = h3tag.Descendants("a").ToArray();
SingleManagFox chapterData = new SingleManagFox();
chapterData.name = chapterH3[1].InnerHtml;
chapterData.url = chapterH3[1].GetAttributeValue("href", "0");
mangaDetails.chapters.Add(chapterData);
}
};
return mangaDetails;
}
UI code:
private async void mainBtn_Click(object sender, RoutedEventArgs e)
{
if (mangaList.SelectedItem != null)
{
test12((SingleManagFox)mangaList.SelectedItem);
}
}
private async void test12(SingleManagFox selectedManga)
{
selectedManga = (SingleManagFox)mangaList.SelectedItem;
MangaDetails mangaDetails = new MangaDetails(selectedManga);
MangaDetailsModel mdm = await mangaDetails.ParseMangaDescriptionPage();
txtMangaArtist.Text = mdm.artisName;
txtMangaAuthor.Text = mdm.authorName;
chapterList.ItemsSource = mdm.chapters;
}
Sorry if its trivial but i cannot figure it out myself.
When going async you need to try to go async all the way and avoid mixing blocking calls with async calls.
You are using async void in the event handler with no await.
Try to avoid async void unless it is an event handler. test12 should be updated to return Task and awaited in the event handler mainBtn_Click.
private async void mainBtn_Click(object sender, RoutedEventArgs e) {
if (mangaList.SelectedItem != null) {
await test12((SingleManagFox)mangaList.SelectedItem);
}
}
private async Task test12(SingleManagFox selectedManga) {
selectedManga = (SingleManagFox)mangaList.SelectedItem;
MangaDetails mangaDetails = new MangaDetails(selectedManga);
MangaDetailsModel mdm = await mangaDetails.ParseMangaDescriptionPage();
txtMangaArtist.Text = mdm.artisName;
txtMangaAuthor.Text = mdm.authorName;
chapterList.ItemsSource = mdm.chapters;
}
Also consider updating the web call to use HttpClient if available.
class ParseOneManga {
public async Task<string> GetMangaDescriptionPageAsync(string detailUrl) {
using (var client = new HttpClient()) {
string s = await client.GetStringAsync(detailUrl);
return s;
}
}
}
Reference: - Async/Await - Best Practices in Asynchronous Programming
Quite often people think that async-await means that multiple threads are processing your code at the same time. This is not the case, unless you explicitly start a different thread.
A good metaphore that helped me a lot explaining async-await is the restauran metaphor used in this interview with Eric Lippert. Search somewhere in the middle for async-await.
Eric Lipperts compares async-await processing with a cook who has to wait for his water to boil. Instead of waiting, he looks around if he can do other things instead. When finished doing the other thing, he comes back to see if the water is boiling and starts processing the boiling water.
The same is with your process. There is only one thread busy (at a time). This thread keeps processing until he has to await for something. This something is usually a fairly long process that is processed without using your CPU core, like writing a file to disk, loading a web page, or querying information from an external database.
Your thread can only do one thing at a time. So while it is busy calculating something, if can't react on operator input and your UI freezes, until the calculations are done. Async await will only help if there are a lot of times your thread would be waiting for other processes to complete
If you call an async function, you are certain that somewhere in that function is an await. In fact, if you declare your function async, and your forget to await in it, your compiler will warn you.
When your call meets the await in the function, your thread goes up its call stack to see if it can do other things. If you are not awaiting, you can continue processing, until you have to await. The thread goes up its call stack again to see if one of the callers is not awaiting etc.
async Task ReadDataAsync()
{
// do some preparations
using (TextReader textReader = ...)
{
var myReadTask = textReader.ReadToEndAsync();
// while the textReader is waiting for the information to be available
// you can do other things
ProcessSomething();
// after a while you really need the results from the read data,
// so you await for it.
string text = await MyReadTask;
// after the await, the results from ReatToEnd are available
Process(text);
...
There are some rules to follow:
an async function should return Task instead of void and Task<TResult> instead of TResult
There is one exception: the async event handler returns void instead of Task.
Inside your async function you should await somehow. If you don't await, it is useless to declare your function async
The result of await Task is void, and the result of await Task<TResult> is TResult
If you call an async function, see if you can do some processing instead of waiting for the results of the call
Note that even if you call several async functions before awaiting for them, does not mean that several threads are running these functions synchronously. The statement after your first call to the async function is processed after the called function starts awaiting.
async Task DoSomethingAsync()
{
var task1 = ReadAsync(...);
// no await, so next statement processes as soon as ReadAsync starts awaiting
DoSomeThingElse();
var task2 = QueryAsync(...);
// again no await
// now I need results from bothtask1, or from task2:
await Task.WhenAll(new Task[] {task1, task2});
var result1 = Task1.Result;
var result2 = Task2.Result;
Process(result1, result2);
...
Usually all your async functionality is performed by the same context. In practice this means that you can program as if your program is single threaded. This makes the look of your program much easier.
Another article that helped me a lot understanding async-await is Async-Await best practices written by the ever so helpful Stephen Cleary

C# return before await - without starting a new Task

This is for an iOS app written in Xamarin. All my application code runs in the main thread (i.e. the UI thread).
The UI code does something as follows:
public async void ButtonClicked()
{
StartSpinner();
var data = await UpdateData();
StopSpinner();
UpdateScreen(data);
}
The UpdateData function does something as follows:
public Task<Data> UpdateData()
{
var data = await FetchFromServer();
TriggerCacheUpdate();
return data;
}
TriggerCacheUpdate ends up calling the following function defined below
public Task RefreshCache()
{
var data = await FetchMoreDataFromServer();
UpdateInternalDataStructures();
}
My question is how should TriggerCacheUpdate be written? The requirements are:
Can't be async, I don't want UpdateData and consequently
ButtonClicked to wait for RefreshCache to complete before
continuing.
UpdateInternalDataStructures needs to execute on the main (UI) thread, i.e. the thread that all the other code shown above executes on.
Here are a few alternatives I came up with:
public void TriggerCacheUpdate()
{
RefreshCache();
}
The above works but generates a compiler warning. Moreover exception handling from RefreshCache doesn't work.
public void TriggerCacheUpdate()
{
Task.Run(async() =>
{
await RefreshCache();
});
}
The above violates requirement 2 as UpdateInternalDataStructures is not executed on the same thread as everything else.
A possible alternative that I believe works is:
private event EventHandler Done;
public void TriggerCacheUpdate()
{
this.task = RefreshCache();
Done += async(sender, e) => await this.task;
}
Task RefreshCache() {
var data = await FetchMoreDataFromServer();
UpdateInternalDataStructures();
if (Done != null) {
Done(this, EventArgs.Empty);
}
}
Does the above work? I haven't ran into any problems thus far with my limited testing. Is there a better way to write TriggerCacheUpdate?
It's hard to say without being able to test it but it looks like your trigger cache update method is fine, it's your RefreshCache that needs to change. Inside of RefreshCache you are not waiting in the UI thread for the result of "data" to return so set the ConfigureAwait to false.
public async Task RefreshCache()
{
var data = await FetchMoreDataFromServer().ConfigureAwait(false);
UpdateInternalDataStructures();
}
Your event handler is async. That means, that even if you await for a Task to complete, that your UI remains responsive. So even if you would await for the TriggerCacheUpdate to return, your UI would remain responsive.
However, if you are really certain that you are not interested in the result of TriggerCachUpdate, then you could start a Task without waiting for it:
public Task<Data> UpdateData()
{
var data = await FetchFromServer();
Task.Run( () => TriggerCacheUpdate());
return data;
}
Note: careful: you don't know when TriggerCachUpdate is finished, not even if it ended successfully or threw an exception. Also: check what happens if you start a new TriggerCacheUpdate task while the previous one is not finished yet.
For those who want to use Task.Factory.StartNew, see the discussion about it in MSDN:
Task.Run vs Task.Factory.StartNew

Best practice for task/await in a foreach loop

I have some time consuming code in a foreach that uses task/await.
it includes pulling data from the database, generating html, POSTing that to an API, and saving the replies to the DB.
A mock-up looks like this
List<label> labels = db.labels.ToList();
foreach (var x in list)
{
var myLabels = labels.Where(q => !db.filter.Where(y => x.userid ==y.userid))
.Select(y => y.ID)
.Contains(q.id))
//Render the HTML
//do some fast stuff with objects
List<response> res = await api.sendMessage(object); //POST
//put all the responses in the db
foreach (var r in res)
{
db.responses.add(r);
}
db.SaveChanges();
}
Time wise, generating the Html and posting it to the API seem to be taking most of the time.
Ideally it would be great if I could generate the HTML for the next item, and wait for the post to finish, before posting the next item.
Other ideas are also welcome.
How would one go about this?
I first thought of adding a Task above the foreach and wait for that to finish before making the next POST, but then how do I process the last loop... it feels messy...
You can do it in parallel but you will need different context in each Task.
Entity framework is not thread safe, so if you can't use one context in parallel tasks.
var tasks = myLabels.Select( async label=>{
using(var db = new MyDbContext ()){
// do processing...
var response = await api.getresponse();
db.Responses.Add(response);
await db.SaveChangesAsync();
}
});
await Task.WhenAll(tasks);
In this case, all tasks will appear to run in parallel, and each task will have its own context.
If you don't create new Context per task, you will get error mentioned on this question Does Entity Framework support parallel async queries?
It's more an architecture problem than a code issue here, imo.
You could split your work into two separate parts:
Get data from database and generate HTML
Send API request and save response to database
You could run them both in parallel, and use a queue to coordinate that: whenever your HTML is ready it's added to a queue and another worker proceeds from there, taking that HTML and sending to the API.
Both parts can be done in multithreaded way too, e.g. you can process multiple items from the queue at the same time by having a set of workers looking for items to be processed in the queue.
This screams for the producer / consumer pattern: one producer produces data in a speed different than the consumer consumes it. Once the producer does not have anything to produce anymore it notifies the consumer that no data is expected anymore.
MSDN has a nice example of this pattern where several dataflowblocks are chained together: the output of one block is the input of another block.
Walkthrough: Creating a Dataflow Pipeline
The idea is as follows:
Create a class that will generate the HTML.
This class has an object of class System.Threading.Tasks.Dataflow.BufferBlock<T>
An async procedure creates all HTML output and await SendAsync the data to the bufferBlock
The buffer block implements interface ISourceBlock<T>. The class exposes this as a get property:
The code:
class MyProducer<T>
{
private System.Threading.Tasks.Dataflow.BufferBlock<T> bufferBlock = new BufferBlock<T>();
public ISourceBlock<T> Output {get {return this.bufferBlock;}
public async ProcessAsync()
{
while (somethingToProduce)
{
T producedData = ProduceOutput(...)
await this.bufferBlock.SendAsync(producedData);
}
// no date to send anymore. Mark the output complete:
this.bufferBlock.Complete()
}
}
A second class takes this ISourceBlock. It will wait at this source block until data arrives and processes it.
do this in an async function
stop when no more data is available
The code:
public class MyConsumer<T>
{
ISourceBlock<T> Source {get; set;}
public async Task ProcessAsync()
{
while (await this.Source.OutputAvailableAsync())
{ // there is input of type T, read it:
var input = await this.Source.ReceiveAsync();
// process input
}
// if here, no more input expected. finish.
}
}
Now put it together:
private async Task ProduceOutput<T>()
{
var producer = new MyProducer<T>();
var consumer = new MyConsumer<T>() {Source = producer.Output};
var producerTask = Task.Run( () => producer.ProcessAsync());
var consumerTask = Task.Run( () => consumer.ProcessAsync());
// while both tasks are working you can do other things.
// wait until both tasks are finished:
await Task.WhenAll(new Task[] {producerTask, consumerTask});
}
For simplicity I've left out exception handling and cancellation. StackOverFlow has artibles about exception handling and cancellation of Tasks:
Keep UI responsive using Tasks, Handle AggregateException
Cancel an Async Task or a List of Tasks
This is what I ended up using: (https://stackoverflow.com/a/25877042/275990)
List<ToSend> sendToAPI = new List<ToSend>();
List<label> labels = db.labels.ToList();
foreach (var x in list) {
var myLabels = labels.Where(q => !db.filter.Where(y => x.userid ==y.userid))
.Select(y => y.ID)
.Contains(q.id))
//Render the HTML
//do some fast stuff with objects
sendToAPI.add(the object with HTML);
}
int maxParallelPOSTs=5;
await TaskHelper.ForEachAsync(sendToAPI, maxParallelPOSTs, async i => {
using (NasContext db2 = new NasContext()) {
List<response> res = await api.sendMessage(i.object); //POST
//put all the responses in the db
foreach (var r in res)
{
db2.responses.add(r);
}
db2.SaveChanges();
}
});
public static Task ForEachAsync<T>(this IEnumerable<T> source, int dop, Func<T, Task> body) {
return Task.WhenAll(
from partition in Partitioner.Create(source).GetPartitions(dop)
select Task.Run(async delegate {
using (partition)
while (partition.MoveNext()) {
await body(partition.Current).ContinueWith(t => {
if (t.Exception != null) {
string problem = t.Exception.ToString();
}
//observe exceptions
});
}
}));
}
basically lets me generate the HTML sync, which is fine, since it only takes a few seconds to generate 1000's but lets me post and save to DB async, with as many threads as I predefine. In this case I'm posting to the Mandrill API, parallel posts are no problem.

Async/Await - not understanding why one way is Blocking and another is not

Still trying to wrap my head around async/await. I have the following method for drag/drop loading:
private async void p_DragDrop(object sender, DragEventArgs e)
{
string[] files = (string[])e.Data.GetData(DataFormats.FileDrop);
List<string> CurvesToLoad = new List<string>();
List<string> TestsToLoad = new List<string>();
foreach (string file in files)
{
if (file.ToUpper().EndsWith(".CCC"))
CurvesToLoad.Add(file);
else if (file.ToUpper().EndsWith(".TTT"))
TestsToLoad.Add(file);
}
//SNIPPET IN BELOW SECTION
foreach (string CurvePath in CurvesToLoad)
{
Curve c = new Curve(CurvePath);
await Task.Run(() =>
{
c.load();
c.calculate();
});
AddCurveControls(c);
}
//END SNIPPET
foreach (string TestPath in TestsToLoad)
{
Test t = new Test(TestPath);
await Task.Run(() =>
{
t.load();
});
AddTestControls(t);
}
}
It is non-blocking as I expected - I am able to navigate between tabs of the TabControl as multiple items are loaded and I can see each tab pop up as it complete loading.
I then tried to convert to this:
private Task<Curve> LoadAndCalculateCurve(string path)
{
Curve c = new Curve(path);
c.load();
c.calculate();
return Task.FromResult(c);
}
And then replace the marked snippet from the first code block with:
foreach (string CurvePath in CurvesToLoad)
{
Curve c = await LoadAndCalculateCurve(CurvePath);
AddCurveControls(c);
}
And it becomes blocking - I can't navigate through tabs as it's loading, and then all of the loaded items appear at once when they are completed. Just trying to learn and understand the differences at play here - many thanks in advance.
EDIT:
Updated LoadAndCalculateCurve():
private async Task<Curve> LoadAndCalculateCurve(string path)
{
Curve c = new Curve(path);
await Task.Run(() => {
c.load();
c.calculate();
});
return c;
}
Async methods do not execute in a different thread, await does not start a thread. async merely enables the await keyword and await waits for something that already runs.
All of that code is running on the UI thread.
So basically this is what is happening to my knowledge.
In your first code you write
foreach (string CurvePath in CurvesToLoad)
{
Curve c = new Curve(CurvePath);
await Task.Run(() =>
{
c.load();
c.calculate();
});
AddCurveControls(c);
}
this does the async flow as expected because you are using the Task.Run which adds the work to the ThreadPool queue.
In your second try you don't do this and you are using the same task. Try to use the same logic of (Task.Run) in the second try and I think it will work
The implementation of your LoadAndCalculateCurve method is to synchronously create a Curve, synchronously load it and perform the calculation, and then return the result wrapped in a Task. Nothing about this is asynchronously. When you await it it will invoke the method, do all of the synchronous work to get the (already completed) task, and then add a continuation to that (already completed) task that will fire immediately.
When you instead use Task.Run you're scheduling those long running operations to take place in another thread and immediately returning a (not yet completed) Task that you can use to run code when it does eventually finish its work.

Categories

Resources