Task.FromAsync and two threads - c#

I'm using .NET 4.0 and have the following code:
var stream = new FileStream(filename, FileMode.Open, FileAccess.Read, FileShare.Read, bufferSize,
FileOptions.Asynchronous | FileOptions.SequentialScan);
var buffer = new byte[bufferSize];
Debug.Assert(stream.IsAsync, "stream.IsAsync");
var ia = stream.BeginRead(buffer, 0, buffer.Length, t =>
{
var ms = new MemoryStream(buffer);
using (TextReader rdr = new StreamReader(ms, Encoding.ASCII))
{
for (uint iEpoch = 0; iEpoch < FileHeader.NUMBER_OF_ITEMS; iEpoch++)
{
dataList.Add(epochData);
}
}
}, null);
return Task<int>.Factory.FromAsync(ia, t =>
{
var st = stream;
var bytes1 = st.EndRead(t);
var a = EpochDataList.Count;
var b = FileHeader.NUMBER_OF_EPOCHS;
Debug.Assert(a == b);
st.Dispose();
return bytes1;
});
It seems there is a race condition between the execution of the async callback and the end-method lambda (the assert fires). But as I read MSDN, the end method should execute only after the async callback has finished:
Creates a Task that executes an end method function when a specified IAsyncResult completes.
Am I right that I'm confusing the completion of the I/O operation, which triggers the end method, with the completion of the async callback, so that both can potentially execute at the same time?
Meanwhile this code works great:
return Task<int>.Factory.FromAsync(stream.BeginRead, (ai) =>
{
var ms = new MemoryStream(buffer);
using (TextReader rdr = new StreamReader(ms, Encoding.ASCII))
{
for (uint iEpoch = 0; iEpoch < FileHeader.NUMBER_OF_ITEMS; iEpoch++)
{
dataList.Add(epochData);
}
}
var bytesRead = stream.EndRead(ai); // finish the read before disposing the stream
stream.Dispose();
return bytesRead;
}, buffer, 0, buffer.Length, null);
I should also mention that the returned task is used within a continuation.
Thanks in advance.

You're doing this so wrong, I'm almost inclined not to answer - you're going to hurt someone with that code. But since this isn't Code Review...
Your most immediate problem is that the callback you provide to BeginRead isn't part of the IAsyncResult at all. Thus, "when a specified IAsyncResult completes" doesn't refer to your callback; it refers only to the underlying asynchronous operation. You get two separate callbacks launched by the same event, and nothing orders one before the other.
Now, for the other problems:
You need to keep issuing BeginReads over and over again, until EndRead returns 0. Otherwise, you only ever read one buffer's worth at most - if your file is longer than that, you're not going to read the whole file.
You're combining old-school asynchronous API callbacks with Task-based asynchrony. This is bound to give you trouble. Just learn to use Tasks properly, and you'll find the callbacks are 100% unnecessary.
EndRead is telling you how many bytes were actually read in the preceding BeginRead operation - you're ignoring that information.
Doing this correctly isn't all that easy - if possible, I'd suggest upgrading to .NET 4.5 and taking advantage of the await keyword. If that's not possible, you can install the Async Targeting Pack, which adds await to 4.0 as a simple NuGet package.
With await, reading the whole file is as simple as
using (var sr = new StreamReader(stream))
{
string line;
// ReadLineAsync takes no arguments and returns null at the end of the stream
while ((line = await sr.ReadLineAsync()) != null)
{
// Do whatever
}
}
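If you are stuck on plain .NET 4.0 without await, the "keep reading until EndRead returns 0" loop from the list above can be sketched with FromAsync and continuations. Treat this as a rough sketch only: ReadAllAsync, ReadNext and the onChunk callback are names made up for illustration, not framework APIs and not code from the question.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

static class FileReadSketch
{
    // Hypothetical helper: reads the stream chunk by chunk and hands each chunk
    // to onChunk together with the number of bytes actually read.
    // The returned task completes with the total byte count.
    public static Task<long> ReadAllAsync(Stream stream, byte[] buffer, Action<byte[], int> onChunk)
    {
        var tcs = new TaskCompletionSource<long>();
        ReadNext(stream, buffer, onChunk, 0L, tcs);
        return tcs.Task;
    }

    private static void ReadNext(Stream stream, byte[] buffer, Action<byte[], int> onChunk,
                                 long total, TaskCompletionSource<long> tcs)
    {
        // FromAsync pairs BeginRead with EndRead; no separate AsyncCallback is involved.
        Task<int>.Factory
            .FromAsync(stream.BeginRead, stream.EndRead, buffer, 0, buffer.Length, null)
            .ContinueWith(t =>
            {
                if (t.IsFaulted)
                {
                    tcs.TrySetException(t.Exception.InnerExceptions);
                    return;
                }

                int bytesRead = t.Result;   // what EndRead returned
                if (bytesRead == 0)
                {
                    tcs.TrySetResult(total); // end of stream
                    return;
                }

                try
                {
                    onChunk(buffer, bytesRead);                         // use only bytesRead bytes
                    ReadNext(stream, buffer, onChunk, total + bytesRead, tcs); // issue the next read
                }
                catch (Exception ex)
                {
                    tcs.TrySetException(ex);
                }
            });
    }
}
```

All processing happens in the continuation, after EndRead has run, so there is a single ordered chain of work instead of two callbacks racing each other. The caller still owns the stream and should dispose it once the returned task has completed.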

Related

Calling async functions on parallel threads in c#

I have three async functions that I want to call from multiple threads in parallel. So far I have tried the following approach:
int numOfThreads = 4;
var taskList = new List<Task>();
using (var fs = new FileStream(inputFilePath, FileMode.OpenOrCreate, FileAccess.ReadWrite, FileShare.ReadWrite))
{
for(int i=1; i<= numOfThreads ; i++)
{
taskList.Add(Task.Run( async() => {
byte[] buffer = new byte[length]; // length could be upto a few thousand
await Function1Async(); // Reads from the file into a byte array
long result = await Function2Async(); // Does some async operation with that byte array data
await Function3Async(result); // Writes the result into the file
}));
}
}
Task.WaitAll(taskList.ToArray());
However, not all of the tasks complete before the execution reaches an end. I have limited experience with threading in c#. What am I doing wrong in my code? Or should I take an alternative approach?
EDIT -
So I made some changes to my approach. I got rid of the Function3Async for now -
for(int i=1;i<=numOfThreads; i++)
{
using(fs = new FileStream(----))
{
taskList.Add(Task.Run( async() => {
byte[] buffer = new byte[length]; // length could be upto a few thousand
await Function1Async(buffer); // Reads from the file into a byte array
Stream data = new MemoryStream(buffer);
/** Write the Stream into a file and return
* the offset at which the write operation was done
*/
long blockStartOffset = await Function2Async(data);
Console.WriteLine($"Block written at - {blockStartOffset}");
}));
}
}
Task.WaitAll(taskList.ToArray());
Now all threads seem to proceed to completion but the Function2Async seems to randomly write some Japanese characters to the output file. I guess it is some threading issue perhaps?
Here is the implementation of the Function2Async ->
public async Task<long> Function2Async(Stream data)
{
long offset = getBlockOffset();
using(var outputFs = new FileStream(fileName,
FileMode.OpenOrCreate,
FileAccess.ReadWrite,
FileShare.ReadWrite))
{
outputFs.Seek(offset, SeekOrigin.Begin);
await data.CopyToAsync(outputFs);
}
return offset;
}
In your example you have passed neither fs nor buffer into Function1Async but your comment says it reads from fs into buffer, so I will assume that is what happens.
You cannot read from a stream in parallel. It does not support that. If you find one that supports it, it will be horribly inefficient, because that is how hard disk storage works. Even worse if it is a network drive.
Read from the stream into your buffers first and in sequence, then let your threads loose and run your logic. In parallel, on the already existing buffers in memory.
Writing by the way would have the same problem if you wrote to the same file. If you write to one file per buffer, that's fine, otherwise, do it sequentially.
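A rough sketch of "read sequentially first, then process in parallel" could look like the following. It assumes .NET 4.5 or later; RunAsync, blockSize and the processBlockAsync delegate are hypothetical stand-ins for whatever Function1Async and Function2Async really do.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Threading.Tasks;

static class SequentialReadParallelWork
{
    // Hypothetical helper: reads the file block by block on one logical flow,
    // then processes the in-memory blocks concurrently.
    public static async Task<long[]> RunAsync(
        string inputFilePath, int blockSize, Func<byte[], Task<long>> processBlockAsync)
    {
        var blocks = new List<byte[]>();

        // Step 1: a single reader, one read at a time.
        using (var fs = new FileStream(inputFilePath, FileMode.Open, FileAccess.Read,
                                       FileShare.Read, 4096, useAsync: true))
        {
            var buffer = new byte[blockSize];
            int read;
            while ((read = await fs.ReadAsync(buffer, 0, buffer.Length)) > 0)
            {
                // Copy only the bytes that were actually read; a read may return
                // fewer bytes than requested, so blocks can vary in size.
                var block = new byte[read];
                Buffer.BlockCopy(buffer, 0, block, 0, read);
                blocks.Add(block);
            }
        }

        // Step 2: the stream is closed; each task works on its own private buffer.
        var tasks = blocks.Select(b => Task.Run(() => processBlockAsync(b)));
        return await Task.WhenAll(tasks);
    }
}
```

Because the stream is disposed before any worker starts, no task can touch it after the using block ends. If the workers write back to one shared file, those writes still need to be serialized, as noted above.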

Strange dispose behavior while testing

I have an endpoint which uses the handlers of 2 other endpoints. It's probably not best practice, but that's not the point. In these methods I use a lot of MemoryStreams, ZipStreams and the like. Of course I dispose all of them. Everything works fine until I run all the tests together; then tests throw errors like "Input string was not in a correct format.", "Cannot read Zip file" or other weird messages. The failing tests include the tests of the 2 handlers that I use in the previous test.
The solution I found is to add "Thread.Sleep(1);" at the end of the "Handle" method, just before the return. It looks like something needs more time to dispose, but why? Do you have any idea why this 1 ms sleep helps?
ExtractFilesFromZipAndWriteToGivenZipArchive is an async method.
public async Task<MemoryStream> Handle(MultipleTypesExportQuery request, CancellationToken cancellationToken)
{
var stepwiseData = await HandleStepwise(request.RainmeterId, request.StepwiseQueries, cancellationToken);
var periodicData = await HandlePeriodic(request.RainmeterId, request.PeriodicQueries, cancellationToken);
var data = new List<MemoryStream>();
data.AddRange(stepwiseData);
data.AddRange(periodicData);
await using (var ms = new MemoryStream())
using (var archive = new ZipArchive(ms, ZipArchiveMode.Create,false))
{
int i = 0;
foreach (var d in data)
{
d.Open();
d.Position = 0;
var file = ZipFile.Read(d);
ExtractFilesFromZipAndWriteToGivenZipArchive(file, archive, i, cancellationToken);
i++;
file.Dispose();
d.Dispose();
}
//Thread.Sleep(100);
return ms;
}
}
ExtractFilesFromZipAndWriteToGivenZipArchive() is an asynchronous function which means, in this case, that you need to await it:
await ExtractFilesFromZipAndWriteToGivenZipArchive(file, archive, i, cancellationToken);
Otherwise, execution keeps going without waiting for the function to finish.
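To see the difference in isolation, here is a small self-contained sketch. WriteLaterAsync is a made-up stand-in for ExtractFilesFromZipAndWriteToGivenZipArchive, and async Main assumes C# 7.1 or later.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

static class AwaitSketch
{
    // Hypothetical async worker: writes one byte after a short delay.
    static async Task WriteLaterAsync(MemoryStream target)
    {
        await Task.Delay(50);
        target.WriteByte(42);
    }

    public static async Task Main()
    {
        var notAwaited = new MemoryStream();
        Task pending = WriteLaterAsync(notAwaited); // started, but not awaited
        // At this point the write has typically not happened yet,
        // so code that uses the stream now races with the worker.
        Console.WriteLine("Length without await: " + notAwaited.Length);
        await pending; // only now is the work guaranteed to be finished

        var awaited = new MemoryStream();
        await WriteLaterAsync(awaited);             // wait for completion first
        Console.WriteLine("Length with await: " + awaited.Length);
    }
}
```

This is also why a Thread.Sleep seems to help: it merely gives the un-awaited work time to finish before the streams are used or disposed, without guaranteeing anything.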

file writing using blockingcollection

I have a TCP listener which receives data from the server and writes it to a file. I use a BlockingCollection to buffer the data. I don't know when the data ends, so my FileStream is always open.
Part of my code is:
private static BlockingCollection<string> Buffer = new BlockingCollection<string>();
Process()
{
var consumer = Task.Factory.StartNew(() =>WriteData());
while()
{
string request = await reader.ReadLineAsync();
Buffer.Add(request);
}
}
WriteData()
{
FileStream fStream = new FileStream(filename,FileMode.Append,FileAccess.Write,FileShare.Write, 16392);
foreach(var val in Buffer.GetConsumingEnumerable(token))
{
var bytes = Encoding.UTF8.GetBytes(val);
fStream.Write(bytes, 0, bytes.Length); // byte count, not character count
fStream.Flush();
}
}
The problem is that I cannot dispose the FileStream inside the loop, because then I would have to create a new FileStream for each line, and the loop may never end.
This would be much easier in .NET 4.5 if you used a TPL Dataflow ActionBlock. An ActionBlock accepts and buffers incoming messages and processes them asynchronously using one or more Tasks.
You could write something like this:
public static async Task ProcessFile(string sourceFileName,string targetFileName)
{
//Pass the target stream as part of the message to avoid globals
var block = new ActionBlock<Tuple<string, FileStream>>(async tuple =>
{
var line = tuple.Item1;
var stream = tuple.Item2;
var bytes = Encoding.UTF8.GetBytes(line);
//Use the byte count, not the character count
await stream.WriteAsync(bytes, 0, bytes.Length);
});
//Post lines to block
using (var targetStream = new FileStream(targetFileName, FileMode.Append,
FileAccess.Write, FileShare.Write, 16392))
{
using (var sourceStream = File.OpenRead(sourceFileName))
{
await PostLines(sourceStream, targetStream, block);
}
//Tell the block we are done
block.Complete();
//And wait for it to finish
await block.Completion;
}
}
private static async Task PostLines(FileStream sourceStream, FileStream targetStream,
ActionBlock<Tuple<string, FileStream>> block)
{
using (var reader = new StreamReader(sourceStream))
{
while (true)
{
var line = await reader.ReadLineAsync();
if (line == null)
break;
var tuple = Tuple.Create(line, targetStream);
block.Post(tuple);
}
}
}
Most of the code deals with reading each line and posting it to the block. By default, an ActionBlock uses only a single Task to process one message at a time, which is fine in this scenario. More tasks can be used if needed to process data in parallel.
Once all lines are read, we notify the block with a call to Complete and wait for it to finish processing with await block.Completion.
Once the block's Completion task finishes we can close the target stream.
The beauty of the Dataflow library is that you can link multiple blocks together to create a pipeline of processing steps. ActionBlock is typically the final step in such a chain. The library takes care of passing data from one block to the next and, when the links are set up to propagate completion, of passing completion down the chain.
For example, one step can read lines from a log, a second can parse them with a regex to find specific patterns (e.g. error messages) and pass them on, and a third can receive the error messages and write them to another file. Each step runs on its own task, with intermediate messages buffered at each step.
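A minimal sketch of such a linked pipeline might look like this. It assumes the System.Threading.Tasks.Dataflow NuGet package; PipelineSketch, RunAsync and the "ERROR" filter are invented for the example. Note that completion only flows down the chain if the link is created with PropagateCompletion = true.

```csharp
using System;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

static class PipelineSketch
{
    // Hypothetical two-step pipeline: filter lines, then write the matches.
    public static async Task RunAsync(string[] logLines)
    {
        // Step 1: pass on only the lines that look like errors (zero or one output per input).
        var findErrors = new TransformManyBlock<string, string>(
            line => line.Contains("ERROR") ? new[] { line } : new string[0]);

        // Step 2: final step; here it just prints, a real one might write to a file.
        var write = new ActionBlock<string>(line => Console.WriteLine(line));

        // Without PropagateCompletion the second block would never complete.
        findErrors.LinkTo(write, new DataflowLinkOptions { PropagateCompletion = true });

        foreach (var line in logLines)
        {
            findErrors.Post(line);
        }

        findErrors.Complete();   // no more input
        await write.Completion;  // wait until the last step has drained
    }
}
```

If one step needs more than one task, ExecutionDataflowBlockOptions.MaxDegreeOfParallelism can be passed to that block's constructor; for a block that appends to a single file the default of 1 is usually what you want.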

Reading all lines from file asynchronously and safely

I'm trying to read from a file asynchronously, and safely (minimum level of permissions sought). I'm using .NET 3.5 and cannot find a good example for this (all uses async and await).
public string GetLines()
{
var encoding = new UnicodeEncoding();
byte[] allText;
using (FileStream stream =File.Open(_path, FileMode.Open))
{
allText = new byte[stream.Length];
//something like this, but does not compile in .net 3.5
stream.ReadAsync(allText, 0, (int) allText.Length);
}
return encoding.GetString(allText);
}
Question is, how do I do this asynchronously in .net 3.5, wait till the operation is finished and send back all lines to the caller?
The caller can wait till the operation is complete, but the read has to happen in a background thread.
The caller is a UI thread, and I'm using .NET 3.5
There are several options, but the simplest would be to have this method accept a callback, and then call it when it has computed the given value. The caller then needs to pass in the callback method to process the results rather than blocking on the method call:
public static void GetLines(Action<string> callback)
{
var encoding = new UnicodeEncoding();
byte[] allText;
FileStream stream = File.Open(_path, FileMode.Open);
allText = new byte[stream.Length];
stream.BeginRead(allText, 0, allText.Length, result =>
{
// EndRead completes the operation and reports how many bytes were read
int bytesRead = stream.EndRead(result);
callback(encoding.GetString(allText, 0, bytesRead));
stream.Dispose();
}, null);
}
If you want to wait until the operation is complete, why do you need to do it asynchronously?
return File.ReadAllText(_path, new UnicodeEncoding());
would do the trick.
Maybe something like this:
GetLines(path, text =>
{
// here your code...
});
public void GetLines(string _path, Action<string> callback)
{
var result = string.Empty;
new Action(() =>
{
var encoding = new UnicodeEncoding();
byte[] allText;
using (FileStream stream = File.Open(_path, FileMode.Open))
{
allText = new byte[stream.Length];
// synchronous read; the delegate's BeginInvoke runs it on a thread-pool thread
stream.Read(allText, 0, (int)allText.Length);
}
result = encoding.GetString(allText);
}).BeginInvoke(x => callback(result), null);
}

Working with System.Threading.Tasks.Task<Stream> instead of Stream

I was using a method like below on the previous versions of WCF Web API:
// grab the posted stream
Stream stream = request.Content.ContentReadStream;
// write it to
using (FileStream fileStream = File.Create(fullFileName, (int)stream.Length)) {
byte[] bytesInStream = new byte[stream.Length];
stream.Read(bytesInStream, 0, (int)bytesInStream.Length);
fileStream.Write(bytesInStream, 0, bytesInStream.Length);
}
But in preview 6, the HttpRequestMessage.Content.ContentReadStream property is gone. I believe it should now look like this:
// grab the posted stream
System.Threading.Tasks.Task<Stream> stream = request.Content.ReadAsStreamAsync();
But I couldn't figure out what the rest of the code should be like inside the using statement. Can anyone provide me a way of doing it?
You might have to adjust this depending on what code is happening before/after, and there's no error handling, but something like this:
Task task = request.Content.ReadAsStreamAsync().ContinueWith(t =>
{
var stream = t.Result;
using (FileStream fileStream = File.Create(fullFileName, (int) stream.Length))
{
byte[] bytesInStream = new byte[stream.Length];
stream.Read(bytesInStream, 0, (int) bytesInStream.Length);
fileStream.Write(bytesInStream, 0, bytesInStream.Length);
}
});
If, later in your code, you need to ensure that this has completed, you can call task.Wait() and it will block until this has completed (or thrown an exception).
I highly recommend Stephen Toub's Patterns of Parallel Programming to get up to speed on some of the new async patterns (tasks, data parallelism etc) in .NET 4.
Quick and dirty fix:
// grab the posted stream
Task<Stream> streamTask = request.Content.ReadAsStreamAsync();
Stream stream = streamTask.Result; //blocks until Task is completed
Be aware that the fact that the sync version has been removed from the API suggests that you should really be attempting to learn the new async paradigms to avoid gobbling up many threads under high load.
You could for instance:
streamTask.ContinueWith( _ => {
var stream = streamTask.Result; //result already available, so no blocking
//work with stream here
} )
or with new async await features:
//async wait until task is complete
var stream = await request.Content.ReadAsStreamAsync();
Take time to learn async/await. It's pretty handy.
Here is how you can do this better with async and await:
private async void WhatEverMethod()
{
var stream = await response.Content.ReadAsStreamAsync();
using (FileStream fileStream = File.Create(fullFileName, (int)stream.Length))
{
byte[] bytesInStream = new byte[stream.Length];
stream.Read(bytesInStream, 0, (int)bytesInStream.Length);
fileStream.Write(bytesInStream, 0, bytesInStream.Length);
}
}
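If the write itself should be asynchronous too, a variation along these lines may be worth considering. It is only a sketch, assuming .NET 4.5 and System.Net.Http; UploadSketch and SaveContentAsync are made-up names. It avoids stream.Length, which not every content stream supports, and copies in chunks instead of allocating one large array.

```csharp
using System.IO;
using System.Net.Http;
using System.Threading.Tasks;

static class UploadSketch
{
    // Hypothetical helper: streams the request content straight into a file.
    public static async Task SaveContentAsync(HttpContent content, string fullFileName)
    {
        using (Stream stream = await content.ReadAsStreamAsync())
        using (FileStream fileStream = File.Create(fullFileName))
        {
            // CopyToAsync reads and writes chunk by chunk until the source is exhausted.
            await stream.CopyToAsync(fileStream);
        }
    }
}
```

Usage would be something like await UploadSketch.SaveContentAsync(request.Content, fullFileName); from an async method. Returning Task rather than async void lets the caller await the save and observe exceptions.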
