Reading all lines from file asynchronously and safely - c#

I'm trying to read from a file asynchronously and safely (requesting the minimum level of permissions). I'm using .NET 3.5 and cannot find a good example for this (they all use async and await).
public string GetLines()
{
var encoding = new UnicodeEncoding();
byte[] allText;
using (FileStream stream =File.Open(_path, FileMode.Open))
{
allText = new byte[stream.Length];
//something like this, but does not compile in .net 3.5
stream.ReadAsync(allText, 0, (int) allText.Length);
}
return encoding.GetString(allText);
}
The question is: how do I do this asynchronously in .NET 3.5, wait until the operation is finished, and send all the lines back to the caller?
The caller can wait until the operation is complete, but the read has to happen on a background thread.
The caller is a UI thread.

There are several options, but the simplest would be to have this method accept a callback and call it once it has computed the value. The caller then needs to pass in a callback method to process the result rather than blocking on the method call:
public static void GetLines(Action<string> callback)
{
    var encoding = new UnicodeEncoding();
    // FileAccess.Read asks only for the permission the read needs
    FileStream stream = File.Open(_path, FileMode.Open, FileAccess.Read);
    byte[] allText = new byte[stream.Length];
    stream.BeginRead(allText, 0, allText.Length, result =>
    {
        // EndRead completes the operation and returns the number of bytes read
        int bytesRead = stream.EndRead(result);
        stream.Dispose();
        callback(encoding.GetString(allText, 0, bytesRead));
    }, null);
}

If you want to wait until the operation is complete, why do you need to do it asynchronously?
return File.ReadAllText(_path, new UnicodeEncoding());
Would do the trick

Maybe something like this:
GetLines(path, text =>
{
// here your code...
});
public void GetLines(string _path, Action<string> callback)
{
var result = string.Empty;
new Action(() =>
{
var encoding = new UnicodeEncoding();
byte[] allText;
using (FileStream stream = File.Open(_path, FileMode.Open))
{
allText = new byte[stream.Length];
// a synchronous read is fine here: this delegate runs on a thread-pool thread
stream.Read(allText, 0, (int)allText.Length);
}
result = encoding.GetString(allText);
}).BeginInvoke(x => callback(result), null);
}
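For the UI-thread caller described in the question, here is a minimal sketch that stays within .NET 3.5: the read runs on a thread-pool thread and the result is posted back through the SynchronizationContext captured at call time. The names BackgroundFileReader and ReadAllTextInBackground are hypothetical, and error handling is left out, so an exception on the worker thread would go unhandled.

```csharp
using System;
using System.IO;
using System.Text;
using System.Threading;

public static class BackgroundFileReader
{
    // Hypothetical helper: reads the whole file on a thread-pool thread, then
    // hands the text back on the context that was current when it was called.
    public static void ReadAllTextInBackground(string path, Encoding encoding, Action<string> onCompleted)
    {
        // On a WinForms/WPF UI thread this is the UI context; elsewhere it may be null.
        SynchronizationContext context = SynchronizationContext.Current;

        ThreadPool.QueueUserWorkItem(delegate
        {
            string text;
            // FileAccess.Read requests only the permission the operation needs.
            using (FileStream stream = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read))
            using (StreamReader reader = new StreamReader(stream, encoding))
            {
                text = reader.ReadToEnd();
            }

            if (context != null)
                context.Post(delegate { onCompleted(text); }, null);
            else
                onCompleted(text); // no captured context: call back on the worker thread
        });
    }
}
```

A call from a button handler could look like BackgroundFileReader.ReadAllTextInBackground(_path, new UnicodeEncoding(), text => textBox.Text = text); where textBox is a placeholder control.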

Related

Task.FromAsync and two threads

I'm using .net 4.0 and have following code:
var stream = new FileStream(filename, FileMode.Open, FileAccess.Read, FileShare.Read, bufferSize,
FileOptions.Asynchronous | FileOptions.SequentialScan);
var buffer = new byte[bufferSize];
Debug.Assert(stream.IsAsync, "stream.IsAsync");
var ia = stream.BeginRead(buffer, 0, buffer.Length, t =>
{
var ms = new MemoryStream(buffer);
using (TextReader rdr = new StreamReader(ms, Encoding.ASCII))
{
for (uint iEpoch = 0; item < FileHeader.NUMBER_OF_ITEMS; item++)
{
dataList.Add(epochData);
}
}
}, null);
return Task<int>.Factory.FromAsync(ia, t =>
{
var st = stream;
var bytes1 = st.EndRead(t);
var a = EpochDataList.Count;
var b = FileHeader.NUMBER_OF_EPOCHS;
Debug.Assert(a == b);
st.Dispose();
return bytes1;
});
It seems that there is a race condition between the execution of the async callback and the end-method lambda (the assert fires). But according to MSDN, the end method should execute only after the async callback has finished:
Creates a Task that executes an end method function when a specified IAsyncResult completes.
Am I right that I'm confusing the completion of the IO operation, which triggers the end method, with the completion of the async callback, so that both can potentially execute at the same time?
Meanwhile this code works great:
return Task<int>.Factory.FromAsync(stream.BeginRead, (ai) =>
{
var ms = new MemoryStream(buffer);
using (TextReader rdr = new StreamReader(ms, Encoding.ASCII))
{
for (uint iEpoch = 0; item < FileHeader.NUMBER_OF_ITEMS; item++)
{
dataList.Add(epochData);
}
}
stream.Dispose();
return stream.EndRead(ai);
}, buffer, 0, buffer.Length, null);
I should also mention that the returned task is used within a continuation.
Thanks in advance.
You're doing this so wrong, I'm almost inclined not to answer - you're going to hurt someone with that code. But since this isn't Code Review...
Your most immediate problem is that the callback you provide to BeginRead isn't part of the IAsyncResult at all. Thus, "when a specified IAsyncResult completes" doesn't refer to your callback; it only refers to the underlying asynchronous operation. You get two separate callbacks launched by the same event.
Now, for the other problems:
You need to keep issuing BeginReads over and over again, until EndRead returns 0. Otherwise, you only ever read one buffer's worth of data at most. If your file is longer than that, you're not going to read the whole file.
You're combining old-school asynchronous API callbacks with Task-based asynchrony. This is bound to give you trouble. Just learn to use Tasks properly, and you'll find the callbacks are 100% unnecessary.
EndRead is telling you how many bytes were actually read in the preceding BeginRead operation - you're ignoring that information.
Doing this correctly isn't all that easy. If possible, I'd suggest upgrading to .NET 4.5 and taking advantage of the await keyword. If that's not possible, you can install the async targeting pack, which adds await to 4.0 as a simple NuGet package.
With await, reading the whole file is as simple as
using (var sr = new StreamReader(fs))
{
string line;
while ((line = await sr.ReadLineAsync()) != null)
{
// Do whatever
}
}
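The "keep reading until the read returns 0" rule from the list above can be sketched with the Task-based Stream.ReadAsync (available from .NET 4.5). The helper name ReadAllBytesAsync and its class are made up for this illustration.

```csharp
using System.IO;
using System.Threading.Tasks;

public static class StreamReading
{
    // Hypothetical helper: reads every byte of the stream by issuing reads
    // until one of them returns 0.
    public static async Task<byte[]> ReadAllBytesAsync(Stream source, int bufferSize)
    {
        var buffer = new byte[bufferSize];
        using (var collected = new MemoryStream())
        {
            int bytesRead;
            // A read may return fewer bytes than requested; only 0 means end of stream.
            while ((bytesRead = await source.ReadAsync(buffer, 0, buffer.Length)) > 0)
            {
                // Copy only the bytes that were actually read in this round.
                collected.Write(buffer, 0, bytesRead);
            }
            return collected.ToArray();
        }
    }
}
```

The return value of each read is used both to detect the end of the stream and to decide how much of the buffer is valid, which are the two pieces of information the original code ignored.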

file writing using blockingcollection

I have a TCP listener that receives data from the server and writes it to a file. I use a BlockingCollection to store the data. I don't know when the file ends, so my FileStream is always open.
Part of my code is:
private static BlockingCollection<string> Buffer = new BlockingCollection<string>();
Process()
{
var consumer = Task.Factory.StartNew(() =>WriteData());
while()
{
string request = await reader.ReadLineAsync();
Buffer.Add(request);
}
}
WriteData()
{
FileStream fStream = new FileStream(filename,FileMode.Append,FileAccess.Write,FileShare.Write, 16392);
foreach(var val in Buffer.GetConsumingEnumerable(token))
{
fStream.Write(Encoding.UTF8.GetBytes(val), 0, val.Length);
fStream.Flush();
}
}
The problem is that I cannot dispose the FileStream inside the loop (otherwise I would have to create a FileStream for each line), and the loop may never end.
This would be much easier in .NET 4.5 if you used a TPL Dataflow ActionBlock. An ActionBlock accepts and buffers incoming messages and processes them asynchronously using one or more Tasks.
You could write something like this:
public static async Task ProcessFile(string sourceFileName,string targetFileName)
{
//Pass the target stream as part of the message to avoid globals
var block = new ActionBlock<Tuple<string, FileStream>>(async tuple =>
{
var line = tuple.Item1;
var stream = tuple.Item2;
//Use the encoded byte count; it differs from line.Length for non-ASCII text
var bytes = Encoding.UTF8.GetBytes(line);
await stream.WriteAsync(bytes, 0, bytes.Length);
});
//Post lines to block
using (var targetStream = new FileStream(targetFileName, FileMode.Append,
FileAccess.Write, FileShare.Write, 16392))
{
using (var sourceStream = File.OpenRead(sourceFileName))
{
await PostLines(sourceStream, targetStream, block);
}
//Tell the block we are done
block.Complete();
//And wait for it to finish
await block.Completion;
}
}
private static async Task PostLines(FileStream sourceStream, FileStream targetStream,
ActionBlock<Tuple<string, FileStream>> block)
{
using (var reader = new StreamReader(sourceStream))
{
while (true)
{
var line = await reader.ReadLineAsync();
if (line == null)
break;
var tuple = Tuple.Create(line, targetStream);
block.Post(tuple);
}
}
}
Most of the code deals with reading each line and posting it to the block. By default, an ActionBlock uses only a single Task to process one message at a time, which is fine in this scenario. More tasks can be used if needed to process data in parallel.
Once all lines are read, we notify the block with a call to Complete and wait for it to finish processing with await block.Completion.
Once the block's Completion task finishes we can close the target stream.
The beauty of the DataFlow library is that you can link multiple blocks together, to create a pipeline of processing steps. ActionBlock is typically the final step in such a chain. The library takes care to pass data from one block to the next and propagate completion down the chain.
For example, one step can read lines from a log, a second can parse them with a regex to find specific patterns (e.g. error messages) and pass them on, and a third can receive the error messages and write them to another file. Each step runs on its own task, with intermediate messages buffered at each step.
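As a rough sketch of such a chain, the example below links a TransformBlock to a final ActionBlock and lets completion propagate from one to the other. The class and method names are invented for the illustration, the "ERROR" filter stands in for real parsing, and on .NET Framework the System.Threading.Tasks.Dataflow NuGet package is assumed to be referenced.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class ErrorPipeline
{
    // Hypothetical helper: sends lines through a two-step pipeline and
    // returns the trimmed lines that contain "ERROR".
    public static async Task<List<string>> CollectErrorsAsync(IEnumerable<string> lines)
    {
        var errors = new List<string>();

        // Step 1: normalise each line.
        var trim = new TransformBlock<string, string>(line => line.Trim());

        // Step 2 (final): keep matching lines. A default ActionBlock handles
        // one message at a time, so the list needs no extra locking here.
        var collect = new ActionBlock<string>(line =>
        {
            if (line.Contains("ERROR"))
                errors.Add(line);
        });

        // PropagateCompletion makes Complete() on the first block flow to the second.
        trim.LinkTo(collect, new DataflowLinkOptions { PropagateCompletion = true });

        foreach (var line in lines)
            trim.Post(line);

        trim.Complete();          // no more input
        await collect.Completion; // wait until the last block has drained
        return errors;
    }
}
```

Only the first block is completed explicitly; awaiting the last block's Completion is enough to know the whole chain has finished.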

is this correct use of mutex to avoid concurrent modification to file?

I want to put the file-writing code inside a mutex to avoid any concurrent modification of a file. However, I want only the file with a particular name to be treated as a critical section, not other files. Will this code work as expected, or have I missed anything?
private async static void StoreToFileAsync(string filename, object data)
{
IsolatedStorageFile AppIsolatedStorage = IsolatedStorageFile.GetUserStoreForApplication();
if (IsolatedStorageFileExist(filename))
{
AppIsolatedStorage.DeleteFile(filename);
}
if (data != null)
{
string json = await Task.Factory.StartNew<string>(() => JsonConvert.SerializeObject(data));
if (!string.IsNullOrWhiteSpace(json))
{
byte[] buffer = Encoding.UTF8.GetBytes(json);
Mutex mutex = new Mutex(false, filename);
mutex.WaitOne();
IsolatedStorageFileStream ISFileStream = AppIsolatedStorage.CreateFile(filename);
await ISFileStream.WriteAsync(buffer, 0, buffer.Length);
ISFileStream.Close();
mutex.ReleaseMutex();
}
}
}
EDIT 1: or should I replace the async write with synchronous one and run as a separate task?
await Task.Factory.StartNew(() =>
{
if (!string.IsNullOrWhiteSpace(json))
{
byte[] buffer = Encoding.UTF8.GetBytes(json);
lock (filename)
{
IsolatedStorageFileStream ISFileStream = AppIsolatedStorage.CreateFile(filename);
ISFileStream.Write(buffer, 0, buffer.Length);
ISFileStream.Close();
}
}
});
In the general case, using a Mutex with asynchronous code is wrong. Normally, I'd say you should use something like AsyncLock from my AsyncEx library, and if you want a separate lock per file, then you'll need a dictionary or some such.
However, if your method is always called from the UI thread, then it will work. It is not very efficient, but it will work.
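The "separate lock per file via a dictionary" idea can be sketched with SemaphoreSlim, which (unlike Mutex) has no thread affinity and offers WaitAsync from .NET 4.5 on. This is not the AsyncLock type from AsyncEx; it is a stand-in using only framework types. The names PerFileLock and WriteTextAsync are hypothetical, a plain FileStream replaces isolated storage to keep the example self-contained, and the lock only protects callers inside one process.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Text;
using System.Threading;
using System.Threading.Tasks;

public static class PerFileLock
{
    // One semaphore per file name; each one acts as an async-compatible lock.
    private static readonly ConcurrentDictionary<string, SemaphoreSlim> Locks =
        new ConcurrentDictionary<string, SemaphoreSlim>(StringComparer.OrdinalIgnoreCase);

    // Hypothetical helper: overwrites the file, serialising writers per file name.
    public static async Task WriteTextAsync(string filename, string text)
    {
        SemaphoreSlim gate = Locks.GetOrAdd(filename, _ => new SemaphoreSlim(1, 1));

        await gate.WaitAsync(); // does not block a thread while waiting
        try
        {
            byte[] buffer = Encoding.UTF8.GetBytes(text);
            using (var stream = new FileStream(filename, FileMode.Create, FileAccess.Write,
                                               FileShare.None, 4096, useAsync: true))
            {
                await stream.WriteAsync(buffer, 0, buffer.Length);
            }
        }
        finally
        {
            // Release even if the write throws, so later writers are not stuck.
            gate.Release();
        }
    }
}
```

Writers to different files proceed independently, while writers to the same file queue up behind one another.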
There are a few things you should improve:
Running asynchronous code 'inside' a Mutex is not good, as the same thread must acquire and release the Mutex. You will probably get an exception when the method is not called from the UI thread.
filename is not a good name for a Mutex: a named Mutex is a global object (shared among all processes), so its name should be unique.
You should Dispose your Mutex and Release it in a finally clause (what happens if you get an exception?). BTW, here is a good pattern.
You should (must) Dispose the IsolatedStorageFileStream.
You should also consider whether waiting indefinitely is a good choice.
If you aren't accessing the file from multiple processes (Background Agents, other apps), you can use lock or SemaphoreSlim instead.
private async static void StoreToFileAsync(string filename, object data)
{
using (IsolatedStorageFile AppIsolatedStorage = IsolatedStorageFile.GetUserStoreForApplication())
{
if (IsolatedStorageFileExist(filename))
AppIsolatedStorage.DeleteFile(filename);
if (data != null)
{
string json = await Task.Factory.StartNew<string>(() => JsonConvert.SerializeObject(data));
if (!string.IsNullOrWhiteSpace(json)) await Task.Run(() =>
{
byte[] buffer = Encoding.UTF8.GetBytes(json);
using (Mutex mutex = new Mutex(false, filename))
{
// Acquire before the try block so that finally only releases a mutex we own
mutex.WaitOne();
try
{
using (IsolatedStorageFileStream ISFileStream = AppIsolatedStorage.CreateFile(filename))
ISFileStream.Write(buffer, 0, buffer.Length);
}
catch { }
finally { mutex.ReleaseMutex(); }
}
});
}
}
}
EDIT: Removed the asynchronous call from inside the Mutex, as it causes problems when the method is not called from the UI thread.

Downloading simultaneously as many files as possible

I want to download images from randomly generated URIs, and the fastest way I found cannot be stopped.
I'm using a pregenerated List<string> of URIs and reaching about 400 images/minute (about 8 times more than with standard Threads), but I want it to continuously generate URIs and download new images until I tell it to pause. How can I achieve that?
private void StartButton_Click(object sender, EventArgs e)
{
List<string> ImageURI;
GenerateURIs(out ImageURI); // creates list of 1000 uris
ImageNames.AsParallel().WithDegreeOfParallelism(50).Sum(s => DownloadFile(s));
}
private int DownloadFile(string URI)
{
try
{
HttpWebRequest webrequest = (HttpWebRequest)WebRequest.Create(URI);
webrequest.Timeout = 10000;
webrequest.ReadWriteTimeout = 10000;
webrequest.Proxy = null;
webrequest.KeepAlive = false;
HttpWebResponse webresponse = (HttpWebResponse)webrequest.GetResponse();
using (Stream sr = webrequest.GetResponse().GetResponseStream())
{
DownloadedImages++;
using (MemoryStream ms = new MemoryStream())
{
sr.CopyTo(ms);
byte[] ImageBytes = ms.ToArray();
if (ImageBytes.Length == 503)
{
InvalidImages++;
return 0;
}
else
{
ValidImages++;
using (var Writer = new BinaryWriter(new FileStream("images/" + (++FilesIndex).ToString() + ".png", FileMode.Append, FileAccess.Write)))
{
Writer.Write(ImageBytes);
}
}
}
}
}
catch (Exception e)
{
return 0;
}
return 0;
}
First off, your current code isn't thread safe. InvalidImages, DownloadedImages and ValidImages all need synchronization.
That being said, you can do this more efficiently using async instead of threading. Since nearly all of the "work" in this case is IO bound, async will likely be a far better, more scalable approach.
Try this instead:
private async void StartButton_Click(object sender, EventArgs e)
{
List<string> ImageURI;
GenerateURIs(out ImageURI); // creates list of 1000 uris
var requests = ImageURI
.Select(uri => (new WebClient()).DownloadDataTaskAsync(uri))
.Select(SaveImageFile);
await Task.WhenAll(requests);
}
private async Task SaveImageFile(Task<byte[]> data)
{
    try
    {
        byte[] ImageBytes = await data;
        DownloadedImages++;
        if (ImageBytes.Length == 503)
        {
            InvalidImages++;
            return;
        }
        ValidImages++;
        using (var file = new FileStream("images/" + (++FilesIndex).ToString() + ".png", FileMode.Append, FileAccess.Write))
        {
            await file.WriteAsync(ImageBytes, 0, ImageBytes.Length);
        }
    }
    catch (Exception)
    {
        // Failed downloads are ignored, as in the original code
    }
}
Note that, with async/await, you no longer have to worry about synchronization since those values will be set on the main UI thread still.
As for pausing, there are various options - you could add a flag of whether or not to continually execute data, or use CancellationTokenSource to provide cancellation support through the entire operation.
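One possible shape for the CancellationTokenSource option is sketched below: a loop that keeps generating URIs and downloading until the token is cancelled. The class, method, and parameter names are hypothetical, and the download step is passed in as a delegate so the sketch does not depend on any particular HTTP API.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class ContinuousDownloader
{
    // Hypothetical helper: generates and downloads URIs one after another
    // until the token is cancelled. Returns the number of completed downloads.
    public static async Task<int> RunAsync(
        Func<string> generateUri,
        Func<string, Task<byte[]>> download,
        CancellationToken token)
    {
        int completed = 0;
        while (!token.IsCancellationRequested)
        {
            string uri = generateUri();
            try
            {
                await download(uri);
                completed++;
            }
            catch (Exception)
            {
                // A failed download is skipped; the loop carries on with the next URI.
            }
        }
        return completed;
    }
}
```

A pause button would call Cancel() on the CancellationTokenSource whose Token was passed in. For concurrency, several RunAsync calls could be started with the same token and awaited together with Task.WhenAll, with something like uri => new WebClient().DownloadDataTaskAsync(uri) as the download delegate.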
What you're looking for is a producer/consumer model, in which a producer adds items to a queue and a consumer pulls items off it. BlockingCollection makes this very easy. Create a BlockingCollection, have your producer keep adding items to it as you generate them (calling CompleteAdding when done), and have your consumer iterate over GetConsumingEnumerable, running your existing download code on each item.
You'll want both the producer and the consuming code to be moved into non-UI threads, so that they both don't block the UI and can produce/consume the data in parallel.
Also note that your DownloadFile method currently mutates and accesses instance data, even though it is likely to be called from different threads concurrently. Incrementing indexes like this is not safe, because it is not an atomic operation, which can result in race conditions such as lost updates. You either need to avoid sharing state between these threads, or properly synchronize access to that shared state.
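A minimal sketch of that producer/consumer arrangement follows, with the shared counter updated through Interlocked so the increment is atomic. The names ProducerConsumer and Run, the bounded capacity of 100, and the process delegate (standing in for the download code) are assumptions made for the illustration.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ProducerConsumer
{
    // Hypothetical helper: runs one producer and several consumers and
    // returns how many items were processed. It blocks until all are done.
    public static int Run(IEnumerable<string> uris, Action<string> process, int consumerCount)
    {
        int processed = 0;
        using (var queue = new BlockingCollection<string>(boundedCapacity: 100))
        {
            // Producer: adds items as they are generated, then signals the end.
            Task producer = Task.Factory.StartNew(() =>
            {
                try
                {
                    foreach (string uri in uris)
                        queue.Add(uri);
                }
                finally
                {
                    // Always signal completion so consumers cannot wait forever.
                    queue.CompleteAdding();
                }
            });

            // Consumers: each pulls items until the collection is completed and empty.
            Task[] consumers = Enumerable.Range(0, consumerCount)
                .Select(_ => Task.Factory.StartNew(() =>
                {
                    foreach (string uri in queue.GetConsumingEnumerable())
                    {
                        process(uri);
                        // Interlocked makes the shared counter update atomic.
                        Interlocked.Increment(ref processed);
                    }
                }, TaskCreationOptions.LongRunning))
                .ToArray();

            producer.Wait();
            Task.WaitAll(consumers);
        }
        return processed;
    }
}
```

Because Run blocks until everything has been consumed, it would have to be started from a non-UI thread, in line with the advice above.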

Working with System.Threading.Tasks.Task<Stream> instead of Stream

I was using a method like below on the previous versions of WCF Web API:
// grab the posted stream
Stream stream = request.Content.ContentReadStream;
// write it to
using (FileStream fileStream = File.Create(fullFileName, (int)stream.Length)) {
byte[] bytesInStream = new byte[stream.Length];
stream.Read(bytesInStream, 0, (int)bytesInStream.Length);
fileStream.Write(bytesInStream, 0, bytesInStream.Length);
}
But in Preview 6, the HttpRequestMessage.Content.ContentReadStream property is gone. I believe it should now look like this:
// grab the posted stream
System.Threading.Tasks.Task<Stream> stream = request.Content.ReadAsStreamAsync();
But I couldn't figure out what the rest of the code should be like inside the using statement. Can anyone provide me a way of doing it?
You might have to adjust this depending on what code is happening before/after, and there's no error handling, but something like this:
Task task = request.Content.ReadAsStreamAsync().ContinueWith(t =>
{
var stream = t.Result;
using (FileStream fileStream = File.Create(fullFileName, (int) stream.Length))
{
byte[] bytesInStream = new byte[stream.Length];
stream.Read(bytesInStream, 0, (int) bytesInStream.Length);
fileStream.Write(bytesInStream, 0, bytesInStream.Length);
}
});
If, later in your code, you need to ensure that this has completed, you can call task.Wait() and it will block until this has completed (or thrown an exception).
I highly recommend Stephen Toub's Patterns of Parallel Programming to get up to speed on some of the new async patterns (tasks, data parallelism etc) in .NET 4.
Quick and dirty fix:
// grab the posted stream
Task<Stream> streamTask = request.Content.ReadAsStreamAsync();
Stream stream = streamTask.Result; //blocks until Task is completed
Be aware that the fact that the sync version has been removed from the API suggests that you should really be attempting to learn the new async paradigms to avoid gobbling up many threads under high load.
You could for instance:
streamTask.ContinueWith( _ => {
var stream = streamTask.Result; //result already available, so no blocking
//work with stream here
});
or with new async await features:
//async wait until task is complete
var stream = await request.Content.ReadAsStreamAsync();
Take time to learn async/await. It's pretty handy.
Here is how you can do this better with async and await:
private async void WhatEverMethod()
{
    var stream = await request.Content.ReadAsStreamAsync();
    using (FileStream fileStream = File.Create(fullFileName, (int)stream.Length))
    {
        byte[] bytesInStream = new byte[stream.Length];
        stream.Read(bytesInStream, 0, (int)bytesInStream.Length);
        fileStream.Write(bytesInStream, 0, bytesInStream.Length);
    }
}
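The snippets above rely on stream.Length and a single Read call, which may not work for every stream: some network streams do not support Length, and Read can return fewer bytes than requested. Here is a hedged alternative that uses Stream.CopyToAsync (available from .NET 4.5); the names StreamSaving and SaveToFileAsync are made up for this sketch.

```csharp
using System.IO;
using System.Threading.Tasks;

public static class StreamSaving
{
    // Hypothetical helper: copies the whole source stream into a new file
    // without needing source.Length.
    public static async Task SaveToFileAsync(Stream source, string fullFileName)
    {
        using (var fileStream = new FileStream(fullFileName, FileMode.Create, FileAccess.Write,
                                               FileShare.None, 4096, useAsync: true))
        {
            // CopyToAsync keeps reading until the source reports end of stream.
            await source.CopyToAsync(fileStream);
        }
    }
}
```

With the code from the question, a call could look like await StreamSaving.SaveToFileAsync(await request.Content.ReadAsStreamAsync(), fullFileName);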
