Await doesn't give a result - C#

OK, I have some code to present. Here is an extension method for the NetworkStream object.
public async static Task<byte[]> ReadDataAsync(this NetworkStream clientStream)
{
    byte[] data = { };
    var buffer = new byte[1024];
    if (clientStream.CanRead)
    {
        using (var ms = new MemoryStream())
        {
            try
            {
                int bytesRead;
                while (clientStream.DataAvailable &&
                       (bytesRead = await clientStream.ReadAsync(buffer, 0, buffer.Length)) > 0)
                {
                    await ms.WriteAsync(buffer, 0, bytesRead);
                }
            }
            catch (Exception ex)
            {
                Console.WriteLine(ex.ToString());
                return data;
            }
            data = ms.ToArray();
        }
    }
    else
    {
        Console.WriteLine("Closing clientStream.");
        clientStream.Close();
    }
    return data;
}
And here is the code where I am trying to call this method.
public async static Task Preform(Socket client)
{
    var stream = new NetworkStream(client);
    var data = await stream.ReadDataAsync();
    var message = await MessageFabrique.DeserializeMessage(data);
    ServerCollections.Instance.ServerIssueQueue.Add(new ServerIssue
    {
        Message = message,
        ClientStream = stream
    });
}
The ReadDataAsync method always returns an empty array to me. And at the moment when I'm trying to deserialize the data there is an exception, because the data is empty (accessing data[0] fails). Please help me. Why is this happening, if await guarantees me the result when it is needed?

clientStream.DataAvailable does not mean data might show up in the future. It means data is available right now for reading. Get rid of it and just read; the read will block till data shows up, or will return 0 when the stream hits its end.
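For example, the read loop above could drop the DataAvailable check entirely (a sketch reusing the buffer and ms from the method above; it assumes the remote side closes the connection when it has finished sending):
int bytesRead;
// ReadAsync waits for data and returns 0 only when the stream ends.
while ((bytesRead = await clientStream.ReadAsync(buffer, 0, buffer.Length)) > 0)
{
    await ms.WriteAsync(buffer, 0, bytesRead);
}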

Scott's answer is right, but .NET already takes care of this for you...
You might consider Stream.CopyToAsync:
await clientStream.CopyToAsync(ms)
for code with considerably fewer places to go wrong.
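With that, the whole extension method could shrink to something like this sketch (note that, like the plain read loop, it only completes when the remote side closes the stream):
public static async Task<byte[]> ReadDataAsync(this NetworkStream clientStream)
{
    using (var ms = new MemoryStream())
    {
        // CopyToAsync loops and buffers for us until the stream ends.
        await clientStream.CopyToAsync(ms);
        return ms.ToArray();
    }
}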

In addition to the other answers, you might also want to create a synchronization context. See this article for details.
The summary is that async/await works differently in console applications than it does in a UI application. WPF and WinForms applications have a synchronization context by default, but console applications don't. The result (which is actually remarkably "poorly advertised" in the documentation) is that the behavior of async/await is much less predictable in a console application than in a UI application, and this might make it not work "as advertised" under certain circumstances.
For example, in a UI application "async" doesn't necessarily mean that the code runs on a background thread. It's the equivalent of "come back to me later when I'm ready." As an analogy, consider going out to eat with 10 people: when the waiter comes by, the first person he asks to order isn't ready. Two bad solutions here would be to a) bring in a second waiter to either wait for the first guy to become ready or to take the other 9 people's orders, or b) wait until the first guy is ready before taking any orders. The optimal thing is to take the other 9 people's orders and then come back to the first guy, hoping he'll be ready by that time. At the risk of oversimplifying, this is basically how async works in a UI (unless you're explicitly putting the code on a background thread with something like Task.Run). However, in a console application, when you use async there's no guarantee as to where the code will actually run.
If, however, you add a synchronization context as described in the article I link to, it'll behave in a much more predictable manner.
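For illustration, here is a minimal sketch of what the linked article describes, assuming the Nito.AsyncEx package (AsyncContext is its type, not part of the BCL):
class Program
{
    static void Main(string[] args)
    {
        // AsyncContext.Run installs a single-threaded synchronization context
        // and pumps it until MainAsync completes, so awaits inside MainAsync
        // resume on this same thread, as they would in a UI app.
        AsyncContext.Run(() => MainAsync(args));
    }

    static async Task MainAsync(string[] args)
    {
        await Task.Delay(100); // resumes on the same thread
    }
}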

EF core query loop asynchronous

For the past few days I have been looking for a way to make my code fully asynchronous, so that when it is called by a REST API I'll get an immediate response while the process runs in the background.
To do that I simply used
tasks.Add(Task<bool>.Run(() => WholeProcessFunc(parameter)))
where WholeProcessFunc is the function that makes all the calculations (it may be computationally intensive).
It works as expected; however, I read that it is not optimal to wrap the whole process in a Task.Run.
My code needs to compute several Entity Framework queries, each of whose results depends on the previous one, and it also contains foreach loops.
For instance, I can't understand the best practice for making a function like this async:
public async Task<List<float>> func()
{
    List<float> acsi = new List<float>();
    using (var db = new EFContext())
    {
        long[] ids = await db.table1.Join(db.table2 /*,...*/)
            .Where(/*...*/)
            .Select(/*...*/).ToArrayAsync();
        foreach (long id in ids)
        {
            var all = db.table1.Join(/*...*/)
                .Where(/*...*/);
            float acsi_temp = await all.OrderByDescending(/*...*/)
                .Select(/*...*/).FirstAsync();
            if (acsi_temp < 0) { break; }
            acsi.Add(acsi_temp);
        }
    }
    return acsi;
}
In particular I have difficulties with the foreach loop and the fact that the result of one query is used in the next.
Finally, there is the break statement, which I don't know how to translate. I read about cancellation tokens; could that be the way?
Is wrapping this whole function in a Task.Run a solid solution?
For the past few days I have been looking for a way to make my code fully asynchronous, so that when it is called by a REST API I'll get an immediate response while the process runs in the background.
Well, that's one meaning of the word "asynchronous". Unfortunately, it's completely different than the kind of "asynchronous" that async/await does. async yields to the thread pool, not the client (browser).
It works as expected; however, I read that it is not optimal to wrap the whole process in a Task.Run.
It only seems to work as expected. It's likely that once your web site gets higher load, it will start to fail. It's definite that once your web site gets busier and you do things like rolling upgrades, it will start to fail.
Is wrapping this whole function in a Task.Run a solid solution?
Not at all. Fire-and-forget is inherently dangerous.
A proper solution should be a basic distributed architecture:
A durable queue, such as an Azure Queue or Rabbit (if properly configured to be durable).
An independent processor, such as an Azure Function or Win32 Service.
Then the ASP.NET app will encode the work to be done into a queue message, enqueue that to the durable queue, and then return. Some time later, the processor will retrieve the message from that queue and do the actual work.
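As a rough sketch of the ASP.NET side, assuming the Azure.Storage.Queues package (the queue name and the WorkRequest type are hypothetical):
public class WorkController : ControllerBase
{
    // Hypothetical queue name; the connection string would come from configuration.
    private readonly QueueClient _queue =
        new QueueClient("<storage-connection-string>", "work-items");

    [HttpPost]
    public async Task<IActionResult> Enqueue(WorkRequest request)
    {
        // Encode the work to be done and enqueue it; an independent
        // processor dequeues the message and does the actual work later.
        await _queue.SendMessageAsync(JsonSerializer.Serialize(request));
        return Accepted(); // immediate response to the caller
    }
}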
You can translate your code to return an IAsyncEnumerable<...>; that way the caller can process the results as they are obtained. In an ASP.NET 5 MVC endpoint, this includes writing serialised JSON to the browser:
public async IAsyncEnumerable<float> func()
{
    using (var db = new EFContext())
    {
        //...
        foreach (long id in ids)
        {
            //...
            if (acsi_temp < 0) { yield break; }
            yield return acsi_temp;
        }
    }
}

public async Task<IActionResult> ControllerAction()
{
    if (...)
        return NotFound();
    return Ok(func());
}
Note that if your endpoint itself is an async IAsyncEnumerable coroutine, then in ASP.NET 5 your headers would be flushed before your action even started, giving you no way to return any HTTP error codes.
Though for performance, you should try to rework your queries so you can fetch all the data up front.
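For instance, the per-id inner query might collapse into a single grouped query. A sketch with hypothetical column names (Id, Score, Value); whether GroupBy with OrderByDescending().First() translates to SQL depends on your EF Core version:
var candidates = await db.table1
    .Join(db.table2, t1 => t1.Id, t2 => t2.Table1Id, (t1, t2) => new { t1.Id, t2.Score, t2.Value })
    .GroupBy(x => x.Id)
    .Select(g => g.OrderByDescending(x => x.Score).Select(x => x.Value).First())
    .ToListAsync();
// The break then becomes an in-memory stop condition:
var acsi = candidates.TakeWhile(v => v >= 0).ToList();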

Windows Store App how to continuously read from a TCP socket and update the UI when "\n" is received

I am working with this code in order to communicate through TCP with another program, which acts as a server. My app is also a Windows Store App, and I just added the three methods to my code. The server is not made by me; I can't modify it in any way. Connection and sending messages work fine. After I give a specific command to the server, it sends back a continuous stream composed of strings that end in "\r\n" (so you can see where a message ends), something like this: "string1\r\nstring2\r\n" and so on, as long as there is a connection with it. Note that sending the command works, because I get a visual response from the server.
I cannot find a way to display the individual strings in my app's UI. I think my problem lies in the read() method, because the stream is never fully consumed:
public async Task<String> read()
{
    DataReader reader;
    StringBuilder strBuilder;
    using (reader = new DataReader(socket.InputStream))
    {
        strBuilder = new StringBuilder();
        // Set the DataReader to only wait for available data (so that we don't have to know the data size)
        reader.InputStreamOptions = Windows.Storage.Streams.InputStreamOptions.Partial;
        // The encoding and byte order need to match the settings of the writer we previously used.
        reader.UnicodeEncoding = Windows.Storage.Streams.UnicodeEncoding.Utf8;
        reader.ByteOrder = Windows.Storage.Streams.ByteOrder.LittleEndian;
        // Load data from the backing stream into the reader's buffer.
        await reader.LoadAsync(256);
        // Keep reading until we consume the complete stream.
        while (reader.UnconsumedBufferLength > 0)
        {
            strBuilder.Append(reader.ReadString(reader.UnconsumedBufferLength));
            await reader.LoadAsync(256);
        }
        reader.DetachStream();
        return strBuilder.ToString();
    }
}
I have an event on a button that calls send() with the string command I wish to send as a parameter. At first, I simply tried textBox.Text = await read(); after calling the send() method; nothing appeared in the textBox.

Next, I tried making the read() method not return anything and putting textBox.Text = strBuilder.ToString(); in different places inside read(). Finally, I discovered that if I put it inside while (reader.UnconsumedBufferLength > 0), right after strBuilder.Append(reader.ReadString(reader.UnconsumedBufferLength));, the textBox gets updated, although I'm not sure the strings really appear correctly; and my UI becomes unresponsive, probably because it gets stuck in the while loop.

I searched the internet for multiple examples, including how to do it in a separate thread, but unfortunately my experience is entry-level and this is the best I could do; I don't know how to adapt the code any further. I hope I have been explicit enough. Also, I don't mind if you show me a different, better way of updating the UI.
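One common shape for this kind of loop, sketched against the WinRT APIs already used above (socket, textBox and Dispatcher are assumed to be the page's existing members), is to await LoadAsync in an endless loop and dispatch each complete "\r\n"-terminated line to the UI:
private async Task ReadLoopAsync()
{
    var reader = new DataReader(socket.InputStream)
    {
        InputStreamOptions = Windows.Storage.Streams.InputStreamOptions.Partial,
        UnicodeEncoding = Windows.Storage.Streams.UnicodeEncoding.Utf8
    };
    var pending = new StringBuilder();
    while (true)
    {
        // LoadAsync awaits more bytes without blocking the UI thread.
        uint count = await reader.LoadAsync(256);
        if (count == 0) break; // connection closed by the server
        pending.Append(reader.ReadString(reader.UnconsumedBufferLength));
        string text = pending.ToString();
        int pos;
        while ((pos = text.IndexOf('\n')) >= 0)
        {
            string line = text.Substring(0, pos).TrimEnd('\r');
            text = text.Substring(pos + 1);
            // Marshal the update to the UI thread; safe no matter
            // which thread the continuation landed on.
            await Dispatcher.RunAsync(Windows.UI.Core.CoreDispatcherPriority.Normal,
                () => textBox.Text = line);
        }
        pending.Clear();
        pending.Append(text); // keep any incomplete trailing fragment
    }
}
Starting this from the button handler as a fire-and-forget task (e.g. var _ = ReadLoopAsync();) keeps the UI responsive, because every await yields control back to the UI message loop.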

Reading from Stream using Observable through FromAsyncPattern, how to close/cancel properly

Need: long-running program with TCP connections
A C# 4.0 (VS2010, XP) program needs to connect to a host using TCP, send and receive bytes, sometimes close the connection properly, and reopen it later. The surrounding code is written in the Rx.Net Observable style. The volume of data is low, but the program should run continuously (avoiding memory leaks by taking care to properly dispose of resources).
The text below is long because I explain what I searched for and found. It now appears to work.
The overall questions are: since Rx is sometimes unintuitive, are the solutions good? Will they be reliable (say, could the program run for years without trouble)?
Solution so far
Send
The program obtains a NetworkStream like this:
TcpClient tcpClient = new TcpClient();
LingerOption lingerOption = new LingerOption(false, 0); // Make sure that on call to Close(), connection is closed immediately even if some data is pending.
tcpClient.LingerState = lingerOption;
tcpClient.Connect(remoteHostPort);
return tcpClient.GetStream();
Asynchronous sending is easy enough. Rx.Net allows handling this with much shorter and cleaner code than traditional solutions. I created a dedicated thread with an EventLoopScheduler. The operations needing a send are expressed using IObservable. Using ObserveOn(sendRecvThreadScheduler) guarantees that all send operations are done on that thread.
sendRecvThreadScheduler = new EventLoopScheduler(
    ts =>
    {
        var thread = new System.Threading.Thread(ts) { Name = "my send+receive thread", IsBackground = true };
        return thread;
    });
// Loop code for sending not shown (too long and off-topic).
So far this is excellent and flawless.
Receive
It seems that for receiving data, Rx.Net should also allow shorter and cleaner code than traditional solutions.
After reading several resources (e.g. http://www.introtorx.com/ ) and Stack Overflow, it seems that a very simple solution is to bridge the Asynchronous Programming Model to Rx.Net, as in https://stackoverflow.com/a/14464068/1429390 :
public static class Ext
{
    public static IObservable<byte[]> ReadObservable(this Stream stream, int bufferSize)
    {
        // to hold read data
        var buffer = new byte[bufferSize];
        // Step 1: async signature => observable factory
        var asyncRead = Observable.FromAsyncPattern<byte[], int, int, int>(
            stream.BeginRead,
            stream.EndRead);
        return Observable.While(
            // while there is data to be read
            () => stream.CanRead,
            // iteratively invoke the observable factory, which will
            // "recreate" it such that it will start from the current
            // stream position - hence "0" for offset
            Observable.Defer(() => asyncRead(buffer, 0, bufferSize))
                .Select(readBytes => buffer.Take(readBytes).ToArray()));
    }
}
It mostly works. I can send and receive bytes.
Close time
This is when things start to go wrong.
Sometimes I need to close the stream and keep things clean. Basically this means: stop reading, end the byte-receiving observable, and open a new connection with a new one.
For one thing, when the connection is forcibly closed by the remote host, BeginRead()/EndRead() immediately loop, consuming all CPU while returning zero bytes. I let higher-level code notice this (with a Subscribe() to the ReadObservable in a context where high-level elements are available) and clean up (including closing and disposing of the stream). This works well too, and I take care of disposing of the object returned by Subscribe().
someobject.readOneStreamObservableSubscription = myobject.readOneStreamObservable.Subscribe(buf =>
{
    if (buf.Length == 0)
    {
        MyLoggerLog("Read explicitly returned zero bytes. Closing stream.");
        this.pscDestroyIfAny();
    }
});
Sometimes, I just need to close the stream. But apparently this must cause exceptions to be thrown in the asynchronous read (see "Proper way to prematurely abort BeginRead and BeginWrite?" on Stack Overflow).
I added a CancellationToken that causes Observable.While() to end the sequence. This does not help much to avoid these exceptions, since BeginRead() can sleep for a long time.
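One mitigation (a sketch) is to let the token close the stream itself; closing the underlying stream forces a sleeping BeginRead() to complete, albeit with the exception discussed next:
// Hypothetical wiring inside ReadObservable: when cancellation is
// requested, close the stream; the pending BeginRead() then completes
// (by throwing) instead of sleeping indefinitely.
var registration = token.Register(() => stream.Close());
// Dispose the registration when the observable sequence terminates.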
An unhandled exception in the observable caused the program to exit. Searching turned up ".net - Continue using subscription after exception" on Stack Overflow, which suggested adding a Catch that effectively resumes the broken Observable with an empty one.
Code looks like this:
public static IObservable<byte[]> ReadObservable(this Stream stream, int bufferSize, CancellationToken token)
{
    // to hold read data
    var buffer = new byte[bufferSize];
    // Step 1: async signature => observable factory
    var asyncRead = Observable.FromAsyncPattern<byte[], int, int, int>(
        stream.BeginRead,
        stream.EndRead);
    return Observable.While(
        // while there is data to be read
        () =>
        {
            return (!token.IsCancellationRequested) && stream.CanRead;
        },
        // iteratively invoke the observable factory, which will
        // "recreate" it such that it will start from the current
        // stream position - hence "0" for offset
        Observable.Defer(() =>
        {
            if ((!token.IsCancellationRequested) && stream.CanRead)
            {
                return asyncRead(buffer, 0, bufferSize);
            }
            else
            {
                return Observable.Empty<int>();
            }
        })
        .Catch(Observable.Empty<int>()) // When BeginRead() or EndRead() causes an exception, don't choke but just end the Observable.
        .Select(readBytes => buffer.Take(readBytes).ToArray()));
}
What now? Question
This appears to work well. Conditions where the remote host forcibly closes the connection, or is just no longer reachable, are detected, causing higher-level code to close the connection and retry. So far so good.
I'm unsure if things feel quite right.
For one thing, that line:
.Catch(Observable.Empty<int>()) // When BeginRead() or EndRead() causes an exception, don't choke but just end the Observable.
feels like the bad practice of an empty catch block in imperative code. The actual code does log the exception, and higher-level code detects the absence of a reply and handles it correctly, so it should be considered fairly okay (see below)?
.Catch((Func<Exception, IObservable<int>>)(ex =>
{
    MyLoggerLogException("On asynchronous read from network.", ex);
    return Observable.Empty<int>();
})) // When BeginRead() or EndRead() causes an exception, don't choke but just end the Observable.
Also, this is indeed shorter than most traditional solutions.
Are the solutions correct or did I miss some simpler/cleaner ways?
Are there some dreadful problems that would look obvious to wizards of Reactive Extensions?
Thank you for your attention.

HttpContent.ReadAsStringAsync causes request to hang (or other strange behaviours)

We are building a highly concurrent web application, and recently we have started using asynchronous programming extensively (using TPL and async/await).
We have a distributed environment in which apps communicate with each other through REST APIs (built on top of ASP.NET Web API). In one specific app, we have a DelegatingHandler that, after calling base.SendAsync (i.e., after calculating the response), logs the response to a file. We include the response's basic information in the log (status code, headers and content):
public static string SerializeResponse(HttpResponseMessage response)
{
    var builder = new StringBuilder();
    var content = ReadContentAsString(response.Content);
    builder.AppendFormat("HTTP/{0} {1:d} {1}", response.Version.ToString(2), response.StatusCode);
    builder.AppendLine();
    builder.Append(response.Headers);
    if (!string.IsNullOrWhiteSpace(content))
    {
        builder.Append(response.Content.Headers);
        builder.AppendLine();
        builder.AppendLine(Beautified(content));
    }
    return builder.ToString();
}

private static string ReadContentAsString(HttpContent content)
{
    return content == null ? null : content.ReadAsStringAsync().Result;
}
The problem is this: when the code reaches content.ReadAsStringAsync().Result under heavy server load, the request sometimes hangs on IIS. When it does, it sometimes returns a response -- but hangs on IIS as if it didn't -- or at other times it never returns.
I have also tried reading the content using ReadAsByteArrayAsync and then converting it to a String, with no luck.
When I convert the code to use async throughout, I get even weirder results:
public static async Task<string> SerializeResponseAsync(HttpResponseMessage response)
{
    var builder = new StringBuilder();
    var content = await ReadContentAsStringAsync(response.Content);
    builder.AppendFormat("HTTP/{0} {1:d} {1}", response.Version.ToString(2), response.StatusCode);
    builder.AppendLine();
    builder.Append(response.Headers);
    if (!string.IsNullOrWhiteSpace(content))
    {
        builder.Append(response.Content.Headers);
        builder.AppendLine();
        builder.AppendLine(Beautified(content));
    }
    return builder.ToString();
}

private static Task<string> ReadContentAsStringAsync(HttpContent content)
{
    return content == null ? Task.FromResult<string>(null) : content.ReadAsStringAsync();
}
Now HttpContext.Current is null after the call to content.ReadAsStringAsync(), and it stays null for all subsequent requests! I know this sounds unbelievable -- it took me some time and the presence of three coworkers to accept that this was really happening.
Is this some kind of expected behavior? Am I doing something wrong here?
I had this problem. Although I haven't fully tested it yet, using CopyToAsync instead of ReadAsStringAsync seems to fix the problem:
var ms = new MemoryStream();
await response.Content.CopyToAsync(ms);
ms.Seek(0, SeekOrigin.Begin);
var sr = new StreamReader(ms);
responseContent = sr.ReadToEnd();
With regard to your second issue: async/await is syntactic sugar for the compiler building a state machine, where the call to a function preceded by "await" returns immediately on the current thread... one that contains HttpContext.Current in its thread-local storage. The completion of that async call can occur on a different thread... one that does NOT have HttpContext.Current in its thread-local storage.
If you want the completion to execute on the same thread (thus having the same objects in thread-local storage, like HttpContext.Current), then you need to be aware of this behavior. This is especially important for calls from the main UI thread (if you're building a Windows application) or, in ASP.NET, calls from an ASP.NET request thread where you are dependent on HttpContext.Current.
See the reference docs on ConfigureAwait(false). Also, view some Channel 9 tutorials on the TPL. Once the "easy" stuff is grokked, the presenter will invariably talk about this issue, as it causes subtle problems that are not easily understood unless you know what the TPL is doing under the covers.
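For example, the logging helper above could opt out of context capture. A sketch (note this means the code after the await must not touch HttpContext.Current):
private static async Task<string> ReadContentAsStringAsync(HttpContent content)
{
    if (content == null) return null;
    // ConfigureAwait(false) skips capturing the ASP.NET synchronization
    // context, so the continuation may run on a thread-pool thread.
    return await content.ReadAsStringAsync().ConfigureAwait(false);
}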
Good luck.
With regard to your first problem: if the caller gets a result, I'm not convinced that IIS has not completed the request. How are you determining that the ASP.NET request thread initiated by this caller is hung in IIS?

C# Threading - Reading and hashing multiple files concurrently, easiest method?

I've been trying to get what I believe to be the simplest possible form of threading to work in my application but I just can't do it.
What I want to do: I have a main form with a status strip and a progress bar on it. I have to read somewhere between 3 and 99 files and add their hashes to a string[], which I want to add to a list of all files with their respective hashes. Afterwards I have to compare the items on that list to a database (which comes in text files).
Once all that is done, I have to update a textbox in the main form and the progress bar to 33%; mostly I just don't want the main form to freeze during processing.
The files I'm working with always sum up to 1.2 GB (+/- a few MB), meaning I should be able to read them into byte[]s and process them from there (I have to calculate the CRC32, MD5 and SHA1 of each of those files, so that should be faster than reading all of them from the HDD three times).
Also, I should note that some files may be 1 MB while another may be 1 GB. I initially wanted to create 99 threads for 99 files, but that seems unwise; I suppose it would be best to reuse the threads of small files while the threads for bigger files are still running. But that sounds pretty complicated to me, so I'm not sure if that's wise either.
So far I've tried worker threads and BackgroundWorkers, but neither seems to work too well for me; at least the BackgroundWorkers worked SOME of the time, but I can't even figure out why they won't work the other times... either way the main form still froze.
Now I've read about the Task Parallel Library in .NET 4.0, but I thought I'd better ask someone who knows what he's doing before wasting more time on this.
What I want to do looks something like this (without threading):
List<string[]> fileSpecifics = new List<string[]>();
int fileMaxNumber = 42; // something between 3 and 99, depending on file set
for (int i = 1; i <= fileMaxNumber; i++)
{
    string fileName = "C:\\path\\to\\file" + i.ToString("D2") + ".ext"; // file01.ext - file99.ext
    string fileSize = new FileInfo(fileName).Length.ToString();
    byte[] file = File.ReadAllBytes(fileName);
    // hash calculations (using SHA1CryptoServiceProvider() etc., no problems with that so I'll spare you that; they return strings)
    file = null; // I didn't yet check if this made any actual difference but I figured it couldn't hurt
    fileSpecifics.Add(new string[] { fileName, fileSize, fileCRC, fileMD5, fileSHA1 });
}
// look for files in the text database mentioned above, i.e. first check for "file bundles" with the same number of files I have here; then compare file sizes, then hashes
// again, no problems with that so I'll spare you that; the database text files are pretty small so parsing them doesn't need to be done in an extra thread.
Would anybody be kind enough to point me in the right direction? I'm looking for the easiest way to read and hash those files quickly (I believe the hashing takes some time during which other files could already be read) and save the output to a string[], without the main form freezing; nothing more, nothing less.
I'm thankful for any input.
EDIT to clarify: by "BackgroundWorkers working some of the time" I meant that (for the very same set of files) maybe the first and fourth executions of my code produce the correct output and the UI unfreezes within 5 seconds, while for the second, third and fifth executions it freezes the form (and after 60 seconds I get an error message saying some thread didn't respond within that time frame) and I have to stop execution via VS.
Thanks for all your suggestions and pointers, as you all have correctly guessed I'm completely new to threading and will have to read up on the great links you guys posted.
Then I'll give those methods a try and flag the answer that helped me the most. Thanks again!
With .NET Framework 4.X
Use the Directory.EnumerateFiles method for efficient/lazy file enumeration
Use Parallel.For() to delegate the parallelism work to the PLINQ framework, or use the TPL to delegate a single Task per pipeline stage
Use the Pipelines pattern to pipeline the following stages: calculating hash codes, comparing with the pattern, updating the UI
To avoid UI freezes use appropriate techniques: for WPF use Dispatcher.BeginInvoke(), for WinForms use Invoke(); see this SO answer
Considering that all this stuff has a UI, it might be useful to add some cancellation feature to abandon the long-running operation if needed; take a look at CancellationTokenSource.CreateLinkedTokenSource, which allows triggering a CancellationToken from an "external scope"
I could try adding an example, but it's worth doing it yourself so you learn all this stuff rather than simply copy/paste -> got it working -> forgot about it.
PS: Must read - the Pipelines paper on MSDN
TPL specific pipeline implementation
Pipeline pattern implementation: three stages: calculate hash, match, update UI
Three tasks, one per stage
Two Blocking Queues
//
// 1) CalculateHashesImpl() should store all calculated hashes here
// 2) CompareMatchesImpl() should read input hashes from this queue
// Tuple.Item1 - hash, Tuple.Item2 - file path
var calculatedHashes = new BlockingCollection<Tuple<string, string>>();

// 1) CompareMatchesImpl() should store all pattern matching results here
// 2) SyncUiImpl() method should read from this collection and update
//    UI with available results
var comparedMatches = new BlockingCollection<string>();

var factory = new TaskFactory(TaskCreationOptions.LongRunning,
                              TaskContinuationOptions.None);
var calculateHashesWorker = factory.StartNew(() => CalculateHashesImpl(...));
var comparedMatchesWorker = factory.StartNew(() => CompareMatchesImpl(...));
var syncUiWorker = factory.StartNew(() => SyncUiImpl(...));

Task.WaitAll(calculateHashesWorker, comparedMatchesWorker, syncUiWorker);
CalculateHashesImpl():
private void CalculateHashesImpl(string directoryPath)
{
    foreach (var file in Directory.EnumerateFiles(directoryPath))
    {
        var hash = CalculateHashTODO(file);
        calculatedHashes.Add(new Tuple<string, string>(hash, file)); // file is already the full path
    }
}
CompareMatchesImpl():
private void CompareMatchesImpl()
{
    foreach (var hashEntry in calculatedHashes.GetConsumingEnumerable())
    {
        // TODO: obviously the return type is up to you
        string matchResult = GetMatchResultTODO(hashEntry.Item1, hashEntry.Item2);
        comparedMatches.Add(matchResult);
    }
}
SyncUiImpl():
private void SyncUiImpl()
{
    foreach (var matchResult in comparedMatches.GetConsumingEnumerable())
    {
        // TODO: track progress in the UI using UI-framework-specific
        //       features, so as not to freeze it
    }
}
TODO: Consider using a CancellationToken as a parameter for all GetConsumingEnumerable() calls so you can easily stop the pipeline execution when needed.
First off, you should be using a higher level of abstraction to solve this problem. You have a bunch of tasks to complete, so use the "task" abstraction. You should be using the Task Parallel Library to do this sort of thing. Let the TPL deal with the question of how many worker threads to create -- the answer could be as low as one if the work is gated on I/O.
If you do want to do your own threading, some good advice:
Do not ever block the UI thread. That is what is freezing your application. Come up with a protocol by which worker threads can communicate with your UI thread, which then does nothing except respond to UI events. Remember that methods of user-interface controls like progress bars must never be called by any thread other than the UI thread.
Do not create 99 threads to read 99 files. That's like getting 99 pieces of mail and hiring 99 assistants to write responses: an extraordinarily expensive solution to a simple problem. If your work is CPU-intensive then there is no point in "hiring" more threads than you have CPUs to service them. (That's like hiring 99 assistants in an office that only has four desks. The assistants spend most of their time waiting for a desk to sit at instead of reading your mail.) If your work is disk-intensive then most of those threads are going to be idle most of the time waiting for the disk, which is an even bigger waste of resources.
First, I hope you are using a built-in library for calculating hashes. It's possible to write your own, but it's far safer to use something that has been around for a while.
You may need to create only as many threads as CPUs if your process is CPU-intensive. If it is bound by I/O, you might be able to get away with more threads.
I do not recommend loading the entire file into memory. Your hashing library should support updating a chunk at a time. Read a chunk into memory, use it to update the hashes for each algorithm, read the next chunk, and repeat until end of file. The chunked approach will help lower your program's memory demands.
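A sketch of that chunked, single-pass approach with the built-in hash classes (fileName as in the question's loop; CRC32 is not in the BCL, so a custom implementation would be updated in the same loop):
using (var stream = File.OpenRead(fileName))
using (var md5 = MD5.Create())
using (var sha1 = SHA1.Create())
{
    var buffer = new byte[64 * 1024]; // chunk size is arbitrary
    int read;
    while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
    {
        // Feed the same chunk to every algorithm: one pass over the disk.
        md5.TransformBlock(buffer, 0, read, null, 0);
        sha1.TransformBlock(buffer, 0, read, null, 0);
    }
    md5.TransformFinalBlock(buffer, 0, 0);
    sha1.TransformFinalBlock(buffer, 0, 0);
    string fileMD5 = BitConverter.ToString(md5.Hash);
    string fileSHA1 = BitConverter.ToString(sha1.Hash);
}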
As others have suggested, look into the Task Parallel Library, particularly Data Parallelism. It might be as easy as this:
Parallel.ForEach(fileSpecifics, item => CalculateHashes(item));
Check out TPL Dataflow. You can use a throttled ActionBlock which will manage the hard part for you.
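A sketch of that, assuming the System.Threading.Tasks.Dataflow package (ComputeHashes stands in for the hashing code from the question):
var hashBlock = new ActionBlock<string>(
    fileName => ComputeHashes(fileName), // hypothetical per-file hashing method
    new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 4 }); // throttle

foreach (var fileName in fileNames)
    hashBlock.Post(fileName);

hashBlock.Complete();
await hashBlock.Completion; // completes when every posted file has been hashed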
If my understanding is correct that you are looking to perform some tasks in the background without blocking your UI, then the BackgroundWorker would be an appropriate choice. You mentioned that you got it working some of the time, so my recommendation would be to take what you had in a semi-working state and improve upon it by tracking down the failures. If my hunch is correct, your worker was throwing an exception, which it does not appear you are handling in your code. Unhandled exceptions that bubble out of their containing threads make bad things happen.
This code hashes one file (stream) using two tasks: one for reading, a second for hashing. For a more robust approach you should read more chunks ahead.
Because the processor's bandwidth is much higher than the disk's, unless you use some high-speed flash drive you gain nothing from hashing more files concurrently.
public void TransformStream(Stream a_stream, long a_length = -1)
{
    Debug.Assert((a_length == -1 || a_length > 0));
    if (a_stream.CanSeek)
    {
        if (a_length > -1)
        {
            if (a_stream.Position + a_length > a_stream.Length)
                throw new IndexOutOfRangeException();
        }
        if (a_stream.Position >= a_stream.Length)
            return;
    }

    System.Collections.Concurrent.ConcurrentQueue<byte[]> queue =
        new System.Collections.Concurrent.ConcurrentQueue<byte[]>();
    System.Threading.AutoResetEvent data_ready = new System.Threading.AutoResetEvent(false);
    System.Threading.AutoResetEvent prepare_data = new System.Threading.AutoResetEvent(false);

    Task reader = Task.Factory.StartNew(() =>
    {
        long total = 0;
        for (; ; )
        {
            byte[] data = new byte[BUFFER_SIZE];
            int readed = a_stream.Read(data, 0, data.Length);
            if ((a_length == -1) && (readed != BUFFER_SIZE))
                data = data.SubArray(0, readed);
            else if ((a_length != -1) && (total + readed >= a_length))
                data = data.SubArray(0, (int)(a_length - total));
            total += data.Length;
            queue.Enqueue(data);
            data_ready.Set();
            if (a_length == -1)
            {
                if (readed != BUFFER_SIZE)
                    break;
            }
            else if (a_length == total)
                break;
            else if (readed != BUFFER_SIZE)
                throw new EndOfStreamException();
            prepare_data.WaitOne();
        }
    });

    Task hasher = Task.Factory.StartNew((obj) =>
    {
        IHash h = (IHash)obj;
        long total = 0;
        for (; ; )
        {
            data_ready.WaitOne();
            byte[] data;
            queue.TryDequeue(out data);
            prepare_data.Set();
            total += data.Length;
            if ((a_length == -1) || (total < a_length))
            {
                h.TransformBytes(data, 0, data.Length);
            }
            else
            {
                int readed = data.Length;
                readed = readed - (int)(total - a_length);
                h.TransformBytes(data, 0, readed); // only hash the bytes within a_length
            }
            if (a_length == -1)
            {
                if (data.Length != BUFFER_SIZE)
                    break;
            }
            else if (a_length == total)
                break;
            else if (data.Length != BUFFER_SIZE)
                throw new EndOfStreamException();
        }
    }, this);

    reader.Wait();
    hasher.Wait();
}
Rest of code here: http://hashlib.codeplex.com/SourceControl/changeset/view/71730#514336
