Is it possible to somehow (without a huge performance loss) determine whether a Stream (just a normal System.IO.Stream) "contains" a string or not? I have tried to google this, but I haven't found a good solution that doesn't involve try and catch.
Any stream can be a string, as a string is just a series of bytes. If you're asking whether a stream contains a specific sequence of bytes (i.e. you want to confirm that the stream contains $MY_TOKEN$ somewhere), then you'll have to read up to that point or to the end of the stream.
Depending on the nature of your stream, there might be an efficient way to do this and then reset the cursor back to the beginning of the stream.
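As a rough sketch of that idea, the helper below scans a stream for a byte sequence and, if the stream is seekable, puts the cursor back where it started. The names StreamSearch and Contains are made up for illustration, and the matching uses the Knuth-Morris-Pratt failure table so that a token split across two reads is still found. For a non-seekable stream the data that was read is gone, so treat this as a starting point only.

```csharp
using System;
using System.IO;

static class StreamSearch
{
    // Returns true if 'needle' occurs in 'stream' at or after its current position.
    // If the stream is seekable, the original position is restored afterwards.
    public static bool Contains(Stream stream, byte[] needle)
    {
        if (needle.Length == 0) return true;

        // KMP failure table: fail[i] is the length of the longest proper prefix
        // of needle[0..i] that is also a suffix of it.
        var fail = new int[needle.Length];
        for (int i = 1, k = 0; i < needle.Length; i++)
        {
            while (k > 0 && needle[i] != needle[k]) k = fail[k - 1];
            if (needle[i] == needle[k]) k++;
            fail[i] = k;
        }

        long start = stream.CanSeek ? stream.Position : -1;
        try
        {
            var buffer = new byte[8192];
            int matched = 0, read;
            while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
            {
                for (int i = 0; i < read; i++)
                {
                    while (matched > 0 && buffer[i] != needle[matched]) matched = fail[matched - 1];
                    if (buffer[i] == needle[matched]) matched++;
                    if (matched == needle.Length) return true;
                }
            }
            return false;
        }
        finally
        {
            if (start >= 0) stream.Position = start; // reset the cursor when possible
        }
    }
}
```

The needle has to be encoded the same way as the stream contents, for example with Encoding.UTF8.GetBytes("$MY_TOKEN$").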
Related
I have an arbitrary System.IO.Stream with input data and another one to write output to. What's a good way to copy the stream contents while being able to detect certain keywords and modify or remove data from the stream?
Do I have to use .Read and .Write and a buffer and deal with buffer boundaries myself (like only a part of a keyword at the end of the buffer)? Of course that's not too hard, but I hope for something more fashionable, like inheriting some ready-made stream converter class.
For example, since it's for MS Exchange anyway, I tried to inherit Microsoft.Exchange.Data.TextConverters.TextConverter, but it looks like this is not possible?
I know, encoding could be another issue, but let's treat it as raw bytes for this question.
I have a file that I'm opening into a stream and passing to another method. However, I'd like to replace a string in the file before passing the stream to the other method. So:
string path = "C:/...";
Stream s = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read);
//need to replace all occurrences of "John" in the file to "Jack" here.
CallMethod(s);
The original file should not be modified, only the stream. What would be the easiest way to do this?
Thanks...
It's a lot easier to read the file in as lines and deal with those than to force yourself to stick with a Stream. A Stream deals with both text and binary files and needs to be able to read in one character at a time, which makes such a replacement very hard. If you read in a whole line at a time (as long as you don't need multi-line replacement), it's quite easy.
var lines = File.ReadLines(path)
    .Select(line => line.Replace("John", "Jack"));
Note that ReadLines still does stream the data, and Select doesn't need to materialize the whole thing, so you're still not reading the whole file into memory at one time when doing this.
If you don't actually need to stream the data you can easily just load it all as one big string, do the replace, and then create a stream based on that one string:
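// File.ReadAllText assumes UTF-8 when the file has no byte order mark.
// Encoding.ASCII would turn every non-ASCII character into '?', so UTF-8 is used here.
string data = File.ReadAllText(path)
    .Replace("John", "Jack");
byte[] bytes = Encoding.UTF8.GetBytes(data);
Stream s = new MemoryStream(bytes);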
This question probably has many good answers. I'll offer one that I've used and that has always worked for me and my peers.
I suggest you create a separate stream, say a MemoryStream. Read from your FileStream and write into the MemoryStream. You can then extract strings from either one and replace things, and pass the MemoryStream on. That makes doubly sure that you are not messing with the original stream, and you can always read the original values from it whenever you need them, though this method uses roughly twice as much memory.
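A minimal sketch of the copy step, assuming the whole file fits in memory. StreamCopy and ToMemoryStream are illustrative names, not part of any library.

```csharp
using System.IO;

static class StreamCopy
{
    // Copies everything from 'source' (starting at its current position) into a
    // new MemoryStream and rewinds the copy so it can be read from the start.
    public static MemoryStream ToMemoryStream(Stream source)
    {
        var copy = new MemoryStream();
        source.CopyTo(copy);   // the copy is now positioned at its end
        copy.Position = 0;     // rewind before handing it to a reader
        return copy;
    }
}
```

The replacement itself can then be done on the copy (or on its ToArray() result) without touching the original stream.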
If the file has extremely long lines, if the string being replaced may contain a newline, or if other constraints prevent the use of File.ReadLines() while still requiring streaming, there is an alternative solution that uses streams only, although it is not trivial.
Implement your own stream decorator (wrapper) that performs the replacement. I.e. a class based on Stream that takes another stream in its constructor, reads data from the stream in its Read(byte[], int, int) override and performs the replacement in the buffer. See notes to Stream implementers for further requirements and suggestions.
Let's call the string being replaced "needle", the source stream "haystack" and the replacement string "replacement".
Needle and replacement need to be encoded using the encoding of the haystack contents (typically Encoding.UTF8.GetBytes()). Inside streams, the data is not converted to string, unlike in StreamReader.ReadLine(). Thus unnecessary memory allocation is prevented.
Simple cases: If both needle and replacement are just a single byte, the implementation is just a simple loop over the buffer, replacing all occurrences. If needle is a single byte and replacement is empty (i.e. deleting the byte, e.g. deleting carriage return for line ending normalization), it is a simple loop maintaining from and to indexes to the buffer, rewriting the buffer byte by byte.
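The two simple cases could look roughly like this. ByteBufferOps, ReplaceByte and RemoveByte are hypothetical helper names; in a real decorator they would be called on the caller's buffer right after reading from the underlying stream.

```csharp
static class ByteBufferOps
{
    // Replaces every occurrence of 'from' with 'to' in buffer[offset .. offset + count).
    public static void ReplaceByte(byte[] buffer, int offset, int count, byte from, byte to)
    {
        for (int i = offset; i < offset + count; i++)
            if (buffer[i] == from) buffer[i] = to;
    }

    // Deletes every occurrence of 'unwanted' in place by compacting the kept
    // bytes toward 'offset'. Returns the number of bytes that remain.
    public static int RemoveByte(byte[] buffer, int offset, int count, byte unwanted)
    {
        int to = offset;
        for (int from = offset; from < offset + count; from++)
            if (buffer[from] != unwanted) buffer[to++] = buffer[from];
        return to - offset;
    }
}
```

One thing to watch in the deleting case: if the underlying read returned only unwanted bytes, RemoveByte returns 0, and the Read override has to read again rather than return 0, because 0 means end of stream to the caller.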
In more complex cases, implement the KMP algorithm to perform the replacement.
Read the data from the underlying stream (haystack) to an internal buffer that is at least as long as needle and perform the replacement while rewriting the data to the output buffer. The internal buffer is needed so that data from a partial match are not published before a complete match is detected -- then, it would be too late to go back and delete the match completely.
Process the internal buffer byte by byte, feeding each byte into the KMP automaton. With each automaton update, write the bytes it releases to the appropriate position in output buffer.
When a match is detected by KMP, replace it: reset the automaton keeping the position in the internal buffer (which deletes the match) and write the replacement in the output buffer.
When the end of either buffer is reached, keep the unwritten output and the unprocessed part of the internal buffer, including the current partial match, as the starting point for the next call to the method, and return the current output buffer. The next call writes the remaining output and resumes processing the haystack where the current call stopped.
When end of haystack is reached, release the current partial match and write it to the output buffer.
Just be careful not to return an empty output buffer before processing all the data of haystack -- that would signal end of stream to the caller and therefore truncate the data.
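Here is one possible sketch of such a decorator, under some simplifying assumptions: it is read-only, the class name ReplacingStream is made up, released bytes are staged in a Queue<byte> (simple, not the fastest choice), and Read keeps going until the caller's buffer is full or the haystack ends, which also guarantees it never returns 0 early. It needs C# 7 or later for the throw expressions.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical read-only decorator that replaces 'needle' with 'replacement'.
sealed class ReplacingStream : Stream
{
    private readonly Stream haystack;
    private readonly byte[] needle, replacement;
    private readonly int[] fail;                              // KMP failure table
    private readonly byte[] input = new byte[4096];           // internal buffer
    private readonly Queue<byte> pending = new Queue<byte>(); // released, not yet returned
    private int inputPos, inputLen, matched;                  // matched = held partial match length
    private bool eof;

    public ReplacingStream(Stream haystack, byte[] needle, byte[] replacement)
    {
        if (needle.Length == 0) throw new ArgumentException("needle must not be empty");
        this.haystack = haystack; this.needle = needle; this.replacement = replacement;
        fail = new int[needle.Length];
        for (int i = 1, k = 0; i < needle.Length; i++)
        {
            while (k > 0 && needle[i] != needle[k]) k = fail[k - 1];
            if (needle[i] == needle[k]) k++;
            fail[i] = k;
        }
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        int written = 0;
        while (written < count)
        {
            if (pending.Count > 0) { buffer[offset + written++] = pending.Dequeue(); continue; }
            if (inputPos < inputLen) { Feed(input[inputPos++]); continue; }
            if (eof) break;
            inputLen = haystack.Read(input, 0, input.Length);
            inputPos = 0;
            if (inputLen == 0)
            {
                eof = true;
                Release(matched);   // end of haystack: flush the partial match
                matched = 0;
            }
        }
        return written;             // 0 only once the haystack is exhausted
    }

    // Feeds one byte into the KMP automaton and queues whatever it releases.
    private void Feed(byte b)
    {
        while (matched > 0 && b != needle[matched])
        {
            int next = fail[matched - 1];
            Release(matched - next); // these held bytes can no longer start a match
            matched = next;
        }
        if (b != needle[matched]) { pending.Enqueue(b); return; }
        if (++matched == needle.Length)
        {
            foreach (byte r in replacement) pending.Enqueue(r); // match found: emit replacement
            matched = 0;
        }
    }

    // The held partial match always equals needle[0 .. matched), so its first n bytes are needle[0 .. n).
    private void Release(int n) { for (int i = 0; i < n; i++) pending.Enqueue(needle[i]); }

    public override bool CanRead => true;
    public override bool CanSeek => false;
    public override bool CanWrite => false;
    public override long Length => throw new NotSupportedException();
    public override long Position { get => throw new NotSupportedException(); set => throw new NotSupportedException(); }
    public override void Flush() { }
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
    public override void Write(byte[] buffer, int offset, int count) => throw new NotSupportedException();
}
```

Usage would be along the lines of new ReplacingStream(fileStream, Encoding.UTF8.GetBytes("John"), Encoding.UTF8.GetBytes("Jack")). Filling the caller's buffer completely is a design shortcut; on a slow source such as a network stream you would probably want to return as soon as some output is available.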
I am working on improving a stream reader class that uses a BinaryReader. It consists of a while loop that uses .PeekChar() to check if more data exists to continue processing.
The very first operation is a .ReadInt32() which reads 4 bytes. What if PeekChar only "saw" one byte (or one bit)? This doesn't seem like a reliable way of checking for EOF.
The BinaryReader is constructed using its default parameters, which as I understand it, uses UTF8 as the default encoding. I assume that .PeekChar() checks for 8 bits but I really am not sure.
How many bits does .PeekChar() look for? (And what are some alternative ways to check for EOF?)
The documentation for BinaryReader.PeekChar says:
ArgumentException: The current character cannot be decoded into the internal character buffer by using the Encoding selected for the stream.
This makes it clear that the number of bytes read depends on the Encoding applied to that stream.
EDIT
Actually, the definition according to MSDN is:
Returns the next available character and does not advance the byte or character position.
In fact, whether that character is one byte or more depends on the encoding.
Hope this helps.
Making your Read*() calls blindly and handling any exceptions that are thrown is the normal method. I don't believe that the stream position is moved if anything goes wrong.
The PeekChar() method of BinaryReader is very buggy. Even when reading from a memory stream with UTF-8 encoded data, PeekChar() throws an exception after reading a particular length of the stream. The BCL team has acknowledged the issue but has not committed to resolving it. Their only response is to avoid using PeekChar() if you can.
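For a seekable stream, one way to avoid PeekChar() altogether is to compare the stream's Position with its Length. The sketch below assumes the data is nothing but consecutive 32-bit integers; Int32Reader and ReadAll are made-up names.

```csharp
using System.Collections.Generic;
using System.IO;

static class Int32Reader
{
    // Reads consecutive 32-bit integers until fewer than 4 bytes remain.
    // Requires stream.CanSeek. Disposing the reader also closes the stream.
    public static List<int> ReadAll(Stream stream)
    {
        var values = new List<int>();
        using (var reader = new BinaryReader(stream))
        {
            // Position/Length check instead of PeekChar(); a trailing fragment
            // shorter than 4 bytes is ignored rather than causing an exception.
            while (stream.Length - stream.Position >= sizeof(int))
                values.Add(reader.ReadInt32());
        }
        return values;
    }
}
```

For a non-seekable stream, Length is not available, so catching the EndOfStreamException thrown by ReadInt32() is the usual fallback.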
After writing the XML document into the memory stream, when I want to use it via XmlDocument.Load, I have to set the position back to 0.
I am wondering if there is any standard way to do it?
Well the simplest way is just:
stream.Position = 0;
I'm not sure what you're after beyond that. You can use the Seek method, but personally I find the Position property to be far simpler.
Do you definitely need to go via a stream in the first place? If you've already got the XmlDocument, why not just use that?
That's pretty much how you have to do it. The position must be set back to 0, because after writing the document into the stream, the stream is positioned at the end, ready to append more data. Setting the position to 0 effectively "rewinds" the stream, so that you will read it back in from the beginning.
This is a normal and expected usage pattern, if you are doing something like this anyway.
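Put together, the write, rewind, read pattern might look like this. XmlRoundTrip and Copy are illustrative names; the XmlDocument.Save(Stream) and XmlDocument.Load(Stream) calls are the standard ones.

```csharp
using System.IO;
using System.Xml;

static class XmlRoundTrip
{
    // Saves 'source' to a MemoryStream, rewinds it, and loads it into a new document.
    public static XmlDocument Copy(XmlDocument source)
    {
        using (var stream = new MemoryStream())
        {
            source.Save(stream);   // the stream is now positioned at the end
            stream.Position = 0;   // rewind; same effect as stream.Seek(0, SeekOrigin.Begin)
            var copy = new XmlDocument();
            copy.Load(stream);
            return copy;
        }
    }
}
```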
I've been working on something that makes use of streams, and I found myself unclear about some stream concepts (you can also view another question I posted at About redirected stdout in System.Diagnostics.Process).
1. How do you indicate that you have finished writing to a stream? By writing something like an EOF?
2. Following on from the previous question: if I have written an EOF (or something like that) to a stream but didn't close the stream, and I then want to write something else to the same stream, can I just start writing to it with no further setup required?
3. If a procedure tries to read a stream (like stdin) that no one has written to yet, the read blocks. When data finally arrives, the procedure reads until the writing is done, which is indicated by a read returning a count of 0 bytes instead of blocking. If the procedure then issues another read on the same stream, it still gets a count of 0 and returns immediately, whereas I expected it to block, since no one is writing to the stream now. So does a stream hold different states for "opened but nothing written yet" and "someone has finished a writing session"?
I'm using Windows and the .NET Framework, in case anything is platform specific.
Thanks a lot!
This depends on the concrete stream. For example, reading from a MemoryStream would not block as you describe. This is because a MemoryStream has an explicit size, and as you read from the stream the position advances through it until you reach the end, at which point Read returns 0. If there were no data in the MemoryStream, the first Read would return 0 immediately.
What you describe fits a NetworkStream. In that case, reading from the stream blocks until data becomes available, and when the "server" side closes the underlying Socket wrapped by the NetworkStream, Read returns 0.
So the actual details depend on the stream, but at a high level they are all treated the same, i.e. you can read from a stream until Read returns 0.
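The "read until Read returns 0" pattern can be sketched like this. StreamDrain and CountBytes are hypothetical names, and the loop just counts bytes where real code would process them.

```csharp
using System.IO;

static class StreamDrain
{
    // Reads 'stream' until Read returns 0 and returns the total number of bytes seen.
    // On a MemoryStream this never blocks; on a NetworkStream each Read may block
    // until data arrives or the other side closes the connection.
    public static long CountBytes(Stream stream)
    {
        var buffer = new byte[4096];
        long total = 0;
        int read;
        while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
            total += read;   // 'read' may be smaller than the buffer size
        return total;
    }
}
```

Calling it a second time on the same exhausted MemoryStream returns 0 straight away, which matches the behaviour described in the question.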
There is no "EOF" with streams. You write to a stream until you close it, which prevents it from being written further.
Streams just read and write bytes. That's all.