I need to check a NamedPipeClientStream to see if there are bytes for it to read before I attempt to read it. The reason is that the thread blocks on any read operation if there's nothing to read, and I simply cannot have that. I must be able to continue even if there are no bytes to read.
I've also tried wrapping it in a StreamReader, which I've seen suggested, but that has the same result.
StreamReader sr = new StreamReader(myPipe);
string temp;
while ((temp = sr.ReadLine()) != null) // Thread stops in ReadLine
{
    Console.WriteLine("Received from server: {0}", temp);
}
I either need for the read operations to not wait until there are bytes to read, or a way to check if there are bytes to read before attempting the read operations.
PipeStream does not support the Length, Position or ReadTimeout properties or Seek...
This is a very bad pattern. Structure your code so that there's a reading thread that always tries to read until the stream has ended. Then, make your threads communicate to achieve the logic and control flow you want.
It is generally not possible to check whether an arbitrary Stream has data available. I think it's possible with named pipes. But even if you do that you need to ensure that incoming bytes will be read in a timely manner. There is no event for that. Even if you manage all of this the code will be quite nasty. It will not be easy to mentally verify.
For that reason, simply keep a reading loop alive. You could make that reading loop enqueue the data into a queue (maybe BlockingCollection). Then other threads can check that queue for data or wait for data to arrive. The stream will always be drained correctly. You can signal the stream end by enqueueing null.
When I say "thread" I mean any primitive that gives you the appearance of a thread. These days you would never use Thread. Rather, use async/await or Task.
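A minimal sketch of such a reading loop, assuming the pipe has been wrapped in a TextReader (for example a StreamReader). The class name LineReaderQueue and its Start method are hypothetical, not part of any library. Instead of enqueueing null, this version signals the end of the stream with CompleteAdding(), which BlockingCollection provides for that purpose.

```csharp
using System.Collections.Concurrent;
using System.IO;
using System.Threading.Tasks;

public static class LineReaderQueue
{
    // Starts a background loop that drains the reader and queues every line.
    // CompleteAdding() tells consumers that the stream has ended.
    public static BlockingCollection<string> Start(TextReader reader)
    {
        var lines = new BlockingCollection<string>();
        Task.Run(async () =>
        {
            try
            {
                string line;
                while ((line = await reader.ReadLineAsync().ConfigureAwait(false)) != null)
                {
                    lines.Add(line);
                }
            }
            finally
            {
                // Runs on normal end of stream and on read errors alike.
                lines.CompleteAdding();
            }
        });
        return lines;
    }
}
```

The consuming side can then poll without blocking via lines.TryTake(out string line), which returns false immediately when nothing is queued, or wait for data with lines.Take() or GetConsumingEnumerable(). lines.IsCompleted becomes true once the stream has ended and the queue is empty.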
I have a service that takes an input Stream containing CSV data that needs to be bulk-inserted into a database, and my application is using async/await wherever possible.
The process is: Parse stream using CsvHelper's CsvParser, add each row to DataTable, use SqlBulkCopy to copy the DataTable to the database.
The data could be any size so I'd like to avoid reading the whole thing into memory at one time - obviously I'll have all that data in the DataTable by the end anyway so would essentially have 2 copies in memory.
I would like to do all of this as asynchronously as possible, but CsvHelper doesn't have any async methods so I've come up with the following workaround:
using (var inputStreamReader = new StreamReader(inputStream))
{
    while (!inputStreamReader.EndOfStream)
    {
        // Read line from the input stream
        string line = await inputStreamReader.ReadLineAsync();
        using (var memoryStream = new MemoryStream())
        using (var streamWriter = new StreamWriter(memoryStream))
        using (var memoryStreamReader = new StreamReader(memoryStream))
        using (var csvParser = new CsvParser(memoryStreamReader))
        {
            await streamWriter.WriteLineAsync(line);
            await streamWriter.FlushAsync();
            memoryStream.Position = 0;
            // Loop through all the rows (should only be one as we only read a single line...)
            while (true)
            {
                var row = csvParser.Read();
                // No more rows to process
                if (row == null)
                {
                    break;
                }
                // Add row to DataTable
            }
        }
    }
}
Are there any issues with this solution? Is it even necessary? I've seen that the CsvHelper devs specifically did not add async functionality (https://github.com/JoshClose/CsvHelper/issues/202) but I don't really follow the reasoning behind not doing so.
EDIT: I've just realised that this solution isn't going to work for instances where a column contains a line break anyway :( Guess I'll just have to copy the whole input Stream to a MemoryStream or something
EDIT2: Some more information.
This is in an async method in a library where I am trying to do async all the way down. It'll likely be consumed by an MVC controller (if I just wanted to offload it from a UI thread I would just Task.Run() it). Mostly the method will be waiting on external sources such as a database / DFS, and I would like for the thread to be freed while it is.
CsvParser.Read() is going to block even if what's blocking is reading the Stream (e.g. if the data I'm attempting to read resides on a server on the other side of the world), whereas if CsvHelper were to implement an async method that uses TextReader.ReadAsync(), then I wouldn't be blocked waiting for my data to arrive from Dubai. As far as I can tell, I'm not asking for an async wrapper around a synchronous method.
EDIT3: Update from far in the future! Async functionality was actually added to CsvHelper back in 2017. I hope someone at the company I was working for has upgraded to the newer version since then!
Eric Lippert explained the usefulness of async-await using a metaphor of cooking a meal in a restaurant. According to his explanation, it is not useful to do something asynchronously if your thread has nothing else to do.
Also, be aware that while your thread is doing something, it cannot do something else. Only while your thread is waiting for something can it do other things. One of the things you wait for in your process is the reading of a file. While the thread is reading the file line by line, it has to wait several times for lines to be read. During this waiting it could do other things, like parsing the CSV data already read and sending the parsed data to your destination.
Parsing the data is not a process where your thread has to wait for some other process to finish, as it does when reading a file or sending data to a database. That's why there is no async version of the parsing process. A normal async-await wouldn't help keep your thread busy, because during the parsing process there is nothing to await, so during the parsing your thread wouldn't have time to do something else.
You could of course convert the parsing process to an awaitable task using Task.Run(() => ParseReadData(...)) and await this task, but in the analogy of Eric Lippert's restaurant this would be defrosting a cook to do the job, while you are sitting behind the counter doing nothing.
However, if your thread has something meaningful to do, while the read CSV-data is being parsed, like responding to user input, then it might be useful to start the parsing in a separate task.
If your complete reading - parsing - updating database process doesn't need interaction with the user, but you need your thread to be free to do other things while the process runs, consider putting the complete process in a separate task, and start the task without awaiting it. In that case you only use your interface thread to start the other task, and your interface thread is free to do other things. Starting this new task is a relatively small cost in comparison to the total time of your process.
Once again: if your thread has nothing else to do, let this thread do the processing, don't start other tasks to do it.
Here is a good article on exposing async wrappers on sync methods, and why CsvHelper didn't do it. http://blogs.msdn.com/b/pfxteam/archive/2012/03/24/10287244.aspx
If you don't want to block the UI thread, run the processing on a background thread.
CsvHelper pulls in a buffer of data. The size of the buffer is a setting that you can change if you like. If your server is on the other side of the world, it'll buffer some data, then read it. More than likely, it'll take several reads before the buffer is used.
CsvHelper also yields records, so if you don't actually get a row, nothing is read. If you only read a couple rows, only that much of the file is read (actually the buffer size).
If you're worried about memory, there are a couple simple options.
Buffer the data. You can bulk copy in 100 or 1000 rows at a time instead of the whole file. Just keep doing that until the file is done.
Use a FileStream. If you need to read the whole file at once for some reason, use a FileStream instead and write the whole thing to disc. It will be slower, but you won't be using a bunch of memory.
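The "buffer the data" option could look roughly like the following sketch. The helper name InBatches is hypothetical; it only groups rows lazily and does not touch CsvHelper or the database. The assumption is that each batch would then be loaded into a DataTable and handed to SqlBulkCopy before the next batch is read.

```csharp
using System.Collections.Generic;

public static class Batching
{
    // Lazily groups a sequence into lists of at most batchSize items,
    // so only one batch needs to be held in memory at a time.
    public static IEnumerable<List<T>> InBatches<T>(IEnumerable<T> source, int batchSize)
    {
        var batch = new List<T>(batchSize);
        foreach (var item in source)
        {
            batch.Add(item);
            if (batch.Count == batchSize)
            {
                yield return batch;
                batch = new List<T>(batchSize);
            }
        }
        // Emit the final, possibly smaller, batch.
        if (batch.Count > 0)
        {
            yield return batch;
        }
    }
}
```

Because the method is an iterator, rows are only pulled from the source when the caller asks for the next batch, which fits a parser that yields records one at a time.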
I'm not even sure how this would work because at its very nature an asynchronous server socket can accept multiple connections.
What I would like to do is capture content if it meets a certain format and pass it outside the server socket so that other classes can reference it. I followed the MSDN code for building an Asynchronous Server Socket. Here is the callback that reads the content.
public static void ReadCallback(IAsyncResult ar)
{
    string content = String.Empty;
    // Retrieve the state object and the handler socket
    // from the asynchronous state object.
    StateObject state = (StateObject)ar.AsyncState;
    Socket handler = state.workSocket;
    // Read data from the client socket
    int bytesRead = handler.EndReceive(ar);
    if (bytesRead > 0)
    {
        // There might be more data, so store the data received so far.
        state.sb.Append(Encoding.ASCII.GetString(state.buffer, 0, bytesRead));
        // Check for end-of-file tag. If it is not there, read more data.
        content = state.sb.ToString();
        if (content.IndexOf('\u001c') > -1)
        {
            // All the data has been read from the
            // client. Display it on the console.
            Console.WriteLine(
                "Read {0} bytes from socket. \n Data : {1}", content.Length, content);
            // Echo the data back to the client.
            Send(handler, content);
        }
        else
        {
            // Not all data received. Get more.
            handler.BeginReceive(state.buffer, 0, StateObject.BufferSize, 0,
                new AsyncCallback(ReadCallback), state);
        }
    }
}
If I understand the question correctly, the basic issue is this: when some data is received on the socket, that results in some new object being created in your program, and you want some code to operate on this object to process it in some way.
So, let's think about it this way: when you want some code to execute, how do you make that happen? Since in C# all code exists in methods, you need to call a method to make the code execute.
Now, in the scenario of an asynchronously handled socket, you have some options. Code always executes in the context of a thread, so you need to think about which thread you want to execute this particular code. That really amounts to there being just two options: 1) execute in the current thread, or 2) execute in a different thread.
Okay, now we're getting somewhere. If we pick option #1, how does that happen? Easy...just call the method from your ReadCallback() method. If you've created an object you want that called method to process, just pass that object to the method when you call it.
And it really is that simple (*).
Now, what if you want to pick #2? Well…that's a bit more complicated. First, you need to find a thread to execute the code, and second you need to get the data to that thread.
I can think of at least three obvious ways to go about this:
Use an existing UI thread. In this case, you'll use e.g. the Control.Invoke() method or Dispatcher.Invoke() (for Winforms or WPF, respectively). A similar mechanism is available in ASP.NET.
Use the thread pool, e.g. via the Task Parallel Library. For example, you might use the Task.Run() method to start a new task.
Use a producer/consumer implementation, in which you've previously started a thread dedicated just to consuming the data objects created when receiving data. For example, you could start a new thread with a method that just uses foreach to pull items from a BlockingCollection<T> instance, while the ReadCallback() method adds items to that same instance.
In the first two options above, data moves to the other thread via an argument to the method being invoked, or as a captured variable in an anonymous method being invoked (I find the latter more convenient than the former, but either works fine). In the third option, obviously the data moves from the socket's thread to the consuming thread via the shared collection.
I hope that the above is enough to get you pointed in the right direction. As asked, the question is fairly broad (possibly too broad), but I think what I've written here is still reasonably concise, with just enough vagueness to remain applicable to whatever your scenario is, without being so vague as to be non-useful. :)
(*) Actually, it's a little more complicated than that, in that you have a number of mechanisms by which you can call a method. The simplest is that the method name is hard-coded into your ReadCallback() method; you just call the one method you know you always want to call. But that limits reusability of the code, and couples it to unrelated code which makes it harder to maintain.
Other options include:
Declaring an event on your socket client object where the ReadCallback() method exists, and have the object that's supposed to actually process the data subscribe to that event. The ReadCallback() method would raise the event, passing the object to process as part of the event's arguments.
Simply passing a callback delegate to the socket client object, very much in the same way you currently pass a delegate representing your ReadCallback() method to the socket class.
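The event-based option could be sketched as follows. The names SocketClient, MessageReceived, MessageReceivedEventArgs and OnMessageComplete are all hypothetical; the assumption is that ReadCallback() lives on an instance of the socket client class and calls OnMessageComplete() once it has assembled a complete message.

```csharp
using System;

public sealed class MessageReceivedEventArgs : EventArgs
{
    public MessageReceivedEventArgs(string message) { Message = message; }

    public string Message { get; }
}

public sealed class SocketClient
{
    // Code that wants to process received data subscribes here.
    public event EventHandler<MessageReceivedEventArgs> MessageReceived;

    // Intended to be called from ReadCallback() when a full message is available.
    // Handlers run on the calling thread, i.e. the socket's callback thread.
    public void OnMessageComplete(string content)
    {
        MessageReceived?.Invoke(this, new MessageReceivedEventArgs(content));
    }
}
```

A consumer would subscribe with something like client.MessageReceived += (sender, e) => Process(e.Message);, where Process stands for whatever processing code is needed. The socket code then has no compile-time knowledge of who handles the data.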
First off, this is not really intended to be an answer, but I find that writing a lot of text as comments is problematic.
I've now actually taken a look at your code, and maybe I'm beginning to understand what you're asking. Your use of the phrase "pass it outside the server socket" is partly what confused me - what you mean is "give the data to a method that is not a dedicated part of the Socket processing code", right?
The simple way to do this, once you've accumulated a complete "logical message", is to call a method to process it from your asynchronous ReadCallback method. So the business logic is actually running as a subroutine of the Socket code. But this is only OK for trivial processing that does not block for any length of time.
A more common technique is to use multi-threading and cross-thread dispatching techniques. Then the business logic is semi-independent of the Socket code. There are many possible ways of doing this. I've written code that explicitly uses an AutoResetEvent and a queue of messages (with a lock), but it is a bit messy. The modern way (which I've not used personally) is via the C# async and await facility.
Finally, a few comments on the code you present. It is safer to not use an end-of-file token, and instead prefix each logical message with a message length, for example an Int32 encoded/decoded via BitConverter to a 4-byte array (be careful of big endian vs. little endian). The problem with the end-of-file marker is that it is conceivable that the marker could be split between two calls to your ReadCallback method.
Similarly, it is best to first accumulate the entire message as raw bytes, and then decode the whole message back to string. Again, the problem could be that a UTF-8 two-byte sequence could get split across two calls to ReadCallback.
Hope this helps.
EDIT:
Just want to mention that the fact that TCP/IP input is considered to be a stream of bytes does make the processing tricky. I've already indicated that a length prefix is safer than an end-of-file token, and that accumulating the whole message before converting from UTF-8 to string is safer than converting individual segments.
But in addition you have to be careful that you have at least 4 bytes before you try to convert it into the length. It is conceivable that you get one message plus the first 2 bytes of the next one, so you only have half of the length prefix for the second message. Then you have to just save those 2 bytes and wait for the next call before you can even convert the length.
Normally you get a whole message on each call, and it is very rare that this streaming causes problems. And it never happens during testing. But according to Murphy's law eventually it will happen, and at the worst possible time.
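To make the length-prefix handling concrete, here is a minimal sketch under these assumptions: each message is preceded by a 4-byte big-endian length, and the body is UTF-8 text. The class name LengthPrefixedFramer is hypothetical. The prefix is decoded with explicit shifts rather than BitConverter so that the result does not depend on the machine's endianness.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

public sealed class LengthPrefixedFramer
{
    // Raw bytes received so far that do not yet form a complete message.
    private readonly List<byte> _pending = new List<byte>();

    // Feed the bytes from one receive call; returns every message completed by them.
    public List<string> Append(byte[] buffer, int count)
    {
        for (int i = 0; i < count; i++)
        {
            _pending.Add(buffer[i]);
        }

        var messages = new List<string>();
        while (_pending.Count >= 4) // otherwise the length prefix itself is incomplete
        {
            int length = (_pending[0] << 24) | (_pending[1] << 16) | (_pending[2] << 8) | _pending[3];
            if (length < 0)
            {
                throw new InvalidDataException("Negative message length.");
            }
            if (_pending.Count < 4 + length)
            {
                break; // body incomplete, wait for the next receive
            }
            byte[] body = _pending.GetRange(4, length).ToArray();
            _pending.RemoveRange(0, 4 + length);
            // Decode only once the whole message is present, so a multi-byte
            // character split across two receives cannot be corrupted.
            messages.Add(Encoding.UTF8.GetString(body));
        }
        return messages;
    }
}
```

In the ReadCallback shown in the question, the call would be something like framer.Append(state.buffer, bytesRead), with one framer instance per connection. A List<byte> keeps the sketch short; production code would likely use a more efficient buffer and an upper bound on the accepted length.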
I am redirecting the output of a process into a streamreader which I read later. My problem is I am using multiple threads which SHOULD have separate instances of this stream. When I go to read this stream in, the threading fudges and starts executing oddly. Is there such a thing as making a thread-safe stream? EDIT: I put locks on the ReadToEnd on the streamreader, and the line where I did: reader = proc.StandardOutput;
There's a SynchronizedStream built into the framework. They just don't expose the class for you to look at, subclass, etc., but you can turn any stream into a SynchronizedStream using
var syncStream = Stream.Synchronized(inStream);
You should pass the syncStream object around to each thread that needs it, and make sure you never try to access inStream elsewhere in code.
The SynchronizedStream just implements a monitor on all read/write operation to ensure that a thread has mutually exclusive access to the stream.
Edit:
It appears they also implement a SynchronizedReader/SynchronizedWriter in the framework.
var reader = TextReader.Synchronized(process.StandardOutput);
A 'thread-safe' stream doesn't really mean anything. If the stream is somehow shared, you must define at what level synchronization/sharing can take place, in terms of the data packets (messages or records) and their allowed/required ordering.
I'm working on an image processing application where I have two threads on top of my main thread:
1 - CameraThread that captures images from the webcam and writes them into a buffer
2 - ImageProcessingThread that takes the latest image from that buffer for filtering.
The reason why this is multithreaded is because speed is critical and I need to have CameraThread to keep grabbing pictures and making the latest capture ready to pick up by ImageProcessingThread while it's still processing the previous image.
My problem is about finding a fast and thread-safe way to access that common buffer and I've figured that, ideally, it should be a triple buffer (image[3]) so that if ImageProcessingThread is slow, then CameraThread can keep on writing on the two other images and vice versa.
What sort of locking mechanism would be the most appropriate for this to be thread-safe ?
I looked at the lock statement but it seems like it would make a thread block-waiting for another one to be finished and that would be against the point of triple buffering.
Thanks in advance for any idea or advice.
J.
This could be a textbook example of the Producer-Consumer Pattern.
If you're going to be working in .NET 4, you can use the IProducerConsumerCollection<T> and associated concrete classes to provide your functionality.
If not, have a read of this article for more information on the pattern, and this question for guidance in writing your own thread-safe implementation of a blocking First-In First-Out structure.
Personally, I think you might want to look at a different approach here. Rather than writing to a centralized "buffer" that you have to manage access to, could you switch to an approach that uses events? Once the camera thread has "received" an image, it could raise an event that passes the image data off to the code that actually handles the image processing.
An alternative would be to use a Queue, which is a FIFO (first in, first out) data structure. It is not thread-safe, so you would have to lock it, but the lock would only be held for the very short time it takes to put an item in the queue. There are also other queue classes out there that are thread-safe that you could use.
Using your approach, there are a number of issues that you would have to contend with: blocking while you access the array, limitations as to what happens after you run out of available array slots, etc.
Given the amount of processing needed for a picture, I don't think that a simple locking scheme would be your bottleneck. Measure before you start wasting time on the wrong problem.
Be very careful with 'lock-free' solutions, they are always more complicated than they look.
And you need a Queue, not an array.
If you can use .NET 4, I would use the ConcurrentQueue.
You will have to run some performance metrics, but take a look at lock free queues.
See this question and its associated answers, for example.
In your particular application, though, your processor is only really interested in the most recent image. In effect this means you only really want to maintain a queue of two items (the new item and the previous item) so that there is no contention between reading and writing. You could, for example, have your producer remove old entries from the queue once a new one is written.
Edit: having said all this, I think there is a lot of merit in what is said in Mitchel Sellers's answer.
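The idea of a short queue where the producer discards old entries could be sketched like this, using ConcurrentQueue<T> from .NET 4. The class name LatestOnlyQueue and its members are hypothetical, and the sketch assumes a single producer thread.

```csharp
using System.Collections.Concurrent;

public sealed class LatestOnlyQueue<T>
{
    private readonly ConcurrentQueue<T> _queue = new ConcurrentQueue<T>();
    private readonly int _capacity;

    public LatestOnlyQueue(int capacity)
    {
        _capacity = capacity;
    }

    // Producer: add the new item, then drop the oldest entries beyond capacity.
    public void Publish(T item)
    {
        _queue.Enqueue(item);
        while (_queue.Count > _capacity && _queue.TryDequeue(out T discarded))
        {
            // Old frame dropped; dispose it here if T holds unmanaged resources.
        }
    }

    // Consumer: never blocks; returns false when no item is waiting.
    public bool TryTake(out T item)
    {
        return _queue.TryDequeue(out item);
    }
}
```

With a capacity of 2, the camera thread would call Publish for every captured frame and the processing thread would call TryTake whenever it is ready for more work. Neither side takes an explicit lock, and stale frames are discarded instead of piling up.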
I would look at using a ReaderWriterLockSlim which allows fast read and upgradable locks for writes.
This isn't a direct answer to your question, but it may be better to rethink your concurrency model. Locks are a terrible way to synchronize anything -- too low level, error prone, etc. Try to rethink your problem in terms of message passing concurrency:
The idea here is that each thread is its own tightly contained message loop, and each thread has a "mailbox" for sending and receiving messages -- we're going to use the term MailboxThread to distinguish these types of objects from plain jane threads.
So instead of having two threads accessing the same buffer, you instead have two MailboxThreads sending and receiving messages between one another (pseudocode):
let filter =
    while true
        let image = getNextMsg() // blocks until the next message is received
        process image
let camera(filterMailbox) =
    while true
        let image = takePicture()
        filterMailbox.SendMsg(image) // sends a message asynchronously
let filterMailbox = Mailbox.Start(filter)
let cameraMailbox = Mailbox.Start(camera(filterMailbox))
Now your processing threads don't know or care about any buffers at all. They just wait for messages and process them whenever they're available. If you send too many messages for the filterMailbox to handle, those messages get enqueued to be processed later.
The hard part here is actually implementing your MailboxThread object. Although it requires some creativity to get right, it's wholly possible to implement these types of objects so that they only hold a thread open while processing a message, and release the executing thread back to the thread pool when there are no messages left to handle (this implementation allows you to terminate your application without dangling threads).
The advantage here is that threads send and receive messages without worrying about locking or synchronization. Behind the scenes, you need to lock your message queue while enqueuing or dequeuing a message, but that implementation detail is completely transparent to your client-side code.
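One possible way to get such a mailbox in C# is sketched below, assuming the System.Threading.Channels package is available (it ships with .NET Core 3.0 and later). This is a swapped-in technique, not the implementation the pseudocode above comes from, and the class name Mailbox<T> and its members are hypothetical.

```csharp
using System;
using System.Threading.Channels;
using System.Threading.Tasks;

public sealed class Mailbox<T>
{
    private readonly Channel<T> _channel = Channel.CreateUnbounded<T>();

    // Completes once the mailbox has been closed and every message handled.
    public Task Completion { get; }

    public Mailbox(Action<T> handler)
    {
        Completion = RunAsync(handler);
    }

    // Non-blocking send; returns false only after Close() has been called.
    public bool Send(T message)
    {
        return _channel.Writer.TryWrite(message);
    }

    public void Close()
    {
        _channel.Writer.Complete();
    }

    private async Task RunAsync(Action<T> handler)
    {
        // While the mailbox is empty the loop is suspended in the await
        // and does not occupy a thread.
        while (await _channel.Reader.WaitToReadAsync().ConfigureAwait(false))
        {
            while (_channel.Reader.TryRead(out T message))
            {
                handler(message);
            }
        }
    }
}
```

The filter from the pseudocode would become new Mailbox<Image>(image => Process(image)), with Image and Process standing in for the real types, and the camera loop would call Send for each captured picture. Messages are handled one at a time in the order they were sent.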
Just an Idea.
Since we're talking about only two threads, we can make some assumptions.
Let's use your triple buffer idea. Assuming there is only 1 writer and 1 reader thread, we can toss a "flag" back and forth in the form of an integer. Both threads will continuously spin but update their buffers.
WARNING: This will only work for 1 reader thread
Pseudo Code
Shared Variables:
int Status = 0; //0 = ready to write; 1 = ready to read
Buffer1 = New bytes[]
Buffer2 = New bytes[]
Buffer3 = New bytes[]
BufferTmp = null
thread1
{
    while(true)
    {
        WriteData(Buffer1);
        if (Status == 0)
        {
            // hand the freshly written buffer over via Buffer2
            BufferTmp = Buffer1;
            Buffer1 = Buffer2;
            Buffer2 = BufferTmp;
            Status = 1;
        }
    }
}
thread2
{
    while(true)
    {
        ReadData(Buffer3);
        if (Status == 1)
        {
            // pick up the fresh buffer from Buffer2
            BufferTmp = Buffer2;
            Buffer2 = Buffer3;
            Buffer3 = BufferTmp;
            Status = 0;
        }
    }
}
Just remember, your WriteData method shouldn't create new byte objects, but update the current one. Creating new objects is expensive.
Also, you may want a Thread.Sleep(1) in an ELSE branch to accompany the IF statements. Otherwise, on a single-core CPU, a spinning thread will increase the latency before the other thread gets scheduled. E.g. the write thread may spin 2-3 times before the read thread gets scheduled, because the scheduler sees the write thread doing "work".
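The same flag hand-off could be written in C# roughly as follows. This is a sketch under the same assumption of exactly one writer thread and one reader thread. The class name FlagSwapBuffers and its members are hypothetical, and the status field is marked volatile so that the buffer swap is visible to the other thread before the flag change is.

```csharp
public sealed class FlagSwapBuffers
{
    private byte[] _writeBuffer;    // touched only by the writer thread
    private byte[] _exchangeBuffer; // owned by whichever side the flag allows
    private byte[] _readBuffer;     // touched only by the reader thread
    private volatile int _status;   // 0 = ready to write; 1 = ready to read

    public FlagSwapBuffers(int size)
    {
        _writeBuffer = new byte[size];
        _exchangeBuffer = new byte[size];
        _readBuffer = new byte[size];
    }

    public byte[] WriteBuffer { get { return _writeBuffer; } }

    public byte[] ReadBuffer { get { return _readBuffer; } }

    // Writer thread: publish the filled write buffer if the exchange slot is free.
    public bool TryPublish()
    {
        if (_status != 0) return false;
        byte[] tmp = _writeBuffer;
        _writeBuffer = _exchangeBuffer;
        _exchangeBuffer = tmp;
        _status = 1;
        return true;
    }

    // Reader thread: take the published buffer if one is waiting.
    public bool TryAcquire()
    {
        if (_status != 1) return false;
        byte[] tmp = _exchangeBuffer;
        _exchangeBuffer = _readBuffer;
        _readBuffer = tmp;
        _status = 0;
        return true;
    }
}
```

The writer fills WriteBuffer and calls TryPublish(); if that returns false, it simply overwrites the same buffer with the next frame. The reader calls TryAcquire() and, on success, processes ReadBuffer. No buffers are allocated after construction.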
Does anyone know where I can find a Stream splitter implementation?
I'm looking to take a Stream, and obtain two separate streams that can be independently read and closed without impacting each other. These streams should each return the same binary data that the original stream would. No need to implement Position or Seek and such... Forward only.
I'd prefer if it didn't just copy the whole stream into memory and serve it up multiple times, which would be fairly simple enough to implement myself.
Is there anything out there that could do this?
I have made a SplitStream available on github and NuGet.
It goes like this.
using (var inputSplitStream = new ReadableSplitStream(inputSourceStream))
using (var inputFileStream = inputSplitStream.GetForwardReadOnlyStream())
using (var outputFileStream = File.OpenWrite("MyFileOnAnyFilestore.bin"))
using (var inputSha1Stream = inputSplitStream.GetForwardReadOnlyStream())
using (var outputSha1Stream = SHA1.Create())
{
    inputSplitStream.StartReadAhead();
    Parallel.Invoke(
        () => {
            // Hash one copy of the data...
            var bytes = outputSha1Stream.ComputeHash(inputSha1Stream);
            // "x2" pads each byte to two hex digits
            var checksumSha1 = string.Join("", bytes.Select(x => x.ToString("x2")));
        },
        () => {
            // ...while the other copy is written to the file.
            inputFileStream.CopyTo(outputFileStream);
        }
    );
}
I have not tested it on very large streams, but give it a try.
github: https://github.com/microknights/SplitStream
Not out of the box.
You'll need to buffer the data from the original stream in a FIFO manner, discarding only data which has been read by all "reader" streams.
I'd use:
A "management" object holding some sort of queue of byte[] holding the chunks to be buffered and reading additional data from the source stream if required
Some "reader" instances which known where and on what buffer they are reading, and which request the next chunk from the "management" and notify it when they don't use a chunk anymore, so that it may be removed from the queue
This could be tricky without risking keeping everything buffered in memory (if the streams are at BOF and EOF respectively).
I wonder whether it isn't easier to write the stream to disk, copy it, and have two streams reading from disk, with self-deletion built into the Close() (i.e. write your own Stream wrapper around FileStream).
The implementation below, called EchoStream, seems to fit:
http://www.codeproject.com/Articles/3922/EchoStream-An-Echo-Tee-Stream-for-NET
It's a very old implementation (2003) but should provide some context.
Found via "Redirect writes to a file to a stream C#".
You can't really do this without duplicating at least part of the source stream, mostly due to the fact that it doesn't sound like you can control the rate at which they are consumed (multiple threads?). You could do something clever regarding one reading ahead of the other (and thereby making the copy at that point only), but the complexity of this sounds like it's not worth the trouble.
I do not think you will be able to find a generic implementation to do just that. A Stream is rather abstract, you don't know where the bytes are coming from. For instance you don't know if it will support seeking; and you don't know the relative cost of operations. (The Stream might be an abstraction of reading data from a remote server, or even off a backup tape !).
If you are able to have a MemoryStream and store the contents once, you can create two separate streams using the same buffer; and they will behave as independent Streams but only use the memory once.
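A minimal sketch of the shared-buffer idea, assuming the whole content fits in memory. The helper name SharedBufferStreams.Split is hypothetical. Both returned streams wrap the same byte array, are read-only, and keep their own Position.

```csharp
using System.IO;

public static class SharedBufferStreams
{
    // Reads the source into memory once, then returns two read-only streams
    // over the same byte array. Closing one does not affect the other.
    public static (Stream First, Stream Second) Split(Stream source)
    {
        byte[] data;
        using (var copy = new MemoryStream())
        {
            source.CopyTo(copy);
            data = copy.ToArray();
        }
        return (new MemoryStream(data, writable: false),
                new MemoryStream(data, writable: false));
    }
}
```

Note that this is exactly the "copy the whole stream into memory" approach the question hoped to avoid, so it only suits inputs of modest size.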
Otherwise, I think you are best off by creating a wrapper class that stores the bytes read from one stream, until they are also read by the second stream. That would give you the desired forward-only behaviour - but in worst case, you might risk storing all of the bytes in memory, if the second Stream is not read until the first Stream has completed reading all content.
With the introduction of async / await, so long as all but one of your reading tasks are async, you should be able to process the same data twice using only a single OS thread.
What I think you want, is a linked list of the data blocks you have seen so far. Then you can have multiple custom Stream instances that hold a pointer into this list. As blocks fall off the end of the list, they will be garbage collected. Reusing the memory immediately would require some other kind of circular list and reference counting. Doable, but more complicated.
When your custom Stream can answer a ReadAsync call from the cache, copy the data, advance the pointer down the list and return.
When your Stream has caught up to the end of the cache list, you want to issue a single ReadAsync to the underlying stream, without awaiting it, and cache the returned Task with the data block. So if any other Stream reader also catches up and tries to read more before this read completes, you can return the same Task object.
This way, both readers will hook their await continuation to the result of the same ReadAsync call. When the single read returns, both reading tasks will sequentially execute the next step of their process.
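The linked list with a cached read Task could be sketched as below. To keep it short, the readers are plain classes with a ReadAsync method rather than full Stream subclasses, and the names Block, BlockReader and CreateHead are hypothetical. Each block starts at most one read for its successor and hands the same Task to every reader that asks.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

// One node of the shared cache: a chunk of data plus the (lazily started) read of the next chunk.
public sealed class Block
{
    private readonly Stream _source;
    private readonly object _gate = new object();
    private Task<Block> _next;

    public byte[] Data { get; }
    public bool IsEnd { get; }

    private Block(Stream source, byte[] data, bool isEnd)
    {
        _source = source;
        Data = data;
        IsEnd = isEnd;
    }

    // Empty starting node; create all readers from it, then drop the reference.
    public static Block CreateHead(Stream source)
    {
        return new Block(source, new byte[0], false);
    }

    // Only the first caller issues a read; later callers get the same Task.
    public Task<Block> GetNextAsync()
    {
        lock (_gate)
        {
            return _next ?? (_next = ReadNextAsync());
        }
    }

    private async Task<Block> ReadNextAsync()
    {
        var buffer = new byte[4096];
        int n = await _source.ReadAsync(buffer, 0, buffer.Length).ConfigureAwait(false);
        Array.Resize(ref buffer, n);
        return new Block(_source, buffer, n == 0);
    }
}

// Forward-only reader holding a pointer into the shared list.
public sealed class BlockReader
{
    private Block _current;
    private int _offset;

    public BlockReader(Block head)
    {
        _current = head;
    }

    public async Task<int> ReadAsync(byte[] buffer, int offset, int count)
    {
        while (_offset == _current.Data.Length)
        {
            if (_current.IsEnd) return 0;
            _current = await _current.GetNextAsync().ConfigureAwait(false);
            _offset = 0;
        }
        int n = Math.Min(count, _current.Data.Length - _offset);
        Buffer.BlockCopy(_current.Data, _offset, buffer, offset, n);
        _offset += n;
        return n;
    }
}
```

Because a block's successor is only requested after that block exists, reads on the underlying stream never overlap. Once every reader has moved past a block and nothing else references the head, the block becomes eligible for garbage collection. If one reader lags far behind, though, everything in between stays in memory.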