How does XmlReader work?


If I use XmlReader.Create and pass it a stream, the XmlReader appears to read the entire stream before I call any read methods, because the Position property of the stream changes to match the Length of the stream. Is XmlReader then storing the entire XML in memory? It would appear so, as I can call XmlReader.Read and the stream position never changes. Is it possible for XmlReader to not consume the entire stream?

XmlReader does not read the entire stream at once. It reads blocks of up to 8192 bytes at a time from the stream (or more if Async is set to true) and stores them in an internal byte buffer. If your stream has fewer bytes than that, it will read them all in the first block, so the stream position reaches the end of the stream right away. This is likely what you are experiencing.
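One way to check this is to feed XmlReader a document much larger than any plausible buffer and watch the stream position. The sketch below is only an illustration: the class and method names are made up, the document is generated in memory, and the exact positions reported depend on the runtime's internal buffer size.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public static class XmlReaderBufferingDemo
{
    // Builds an XML document of well over a megabyte and reports how far the
    // stream position has advanced after Create and after the first Read.
    public static (long AfterCreate, long AfterFirstRead, long Length) Measure()
    {
        var sb = new StringBuilder("<root>");
        for (int i = 0; i < 100_000; i++)
        {
            sb.Append("<item>").Append(i).Append("</item>");
        }
        sb.Append("</root>");

        using var stream = new MemoryStream(Encoding.UTF8.GetBytes(sb.ToString()));
        using var reader = XmlReader.Create(stream);

        long afterCreate = stream.Position;    // some bytes may already be buffered
        reader.Read();                         // moves to the <root> element
        long afterFirstRead = stream.Position;

        return (afterCreate, afterFirstRead, stream.Length);
    }
}
```

With a document this large, both reported positions should stay well short of Length, which shows that the reader only pulls in a block at a time.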

Related

Passing subsequence of a stream without copying its content into a new instance

Let's assume I have following method:
void Upload(Stream stream)
{
    // uploads the content of the specified stream somewhere
}
And let's further assume I got some binary file f, which contains some data I'd like to upload with the method above.
But: It's not the whole file I want to upload. It's only a certain part of f. More precisely the desired data starts at a certain position s >= 0 and ends at a certain position e <= f.Length.
Is there a way to pass a Stream instance that starts at position s and has a length of e - s, without copying all bytes between s and e into a new stream instance? I'm asking because file f may be quite big and I don't want to make assumptions about the available RAM.

Consider using the Stream.CanSeek property, the Stream.Position property and the Stream.Seek method to access the required part of the stream.
To get a separate Stream instance with the appropriate length, you need to implement a SubStream class: a wrapper that represents a section of another stream. The following references can be useful when implementing such a wrapper:
How to access part of a FileStream or MemoryStream, Social MSDN.
How to expose a sub section of my stream to a user, Stack Overflow.

If modifying the pointers in the original stream before calling the method will work, then use Seek to set the starting position and SetLength to set the end position. Then, you can pass the stream to the method and it should only touch that section (assuming it does not internally seek back to the beginning).
Unfortunately, SetLength will truncate the stream, so you won't be able to later access the rest of it if you needed to for some reason. However, if that is not a requirement, this should work.
Edit: Since you need to preserve the original stream, these are the other options I can think of:
If you have access to the path (and it is not locked by the other stream), you could open a new stream to the file and send a truncated version of that stream.
You could copy the section you need to a new stream, such as a MemoryStream. You won't need to copy the entire file, but you would need to copy the part you are going to upload using Seek and Read.
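byte[] data = new byte[size];
stream.Seek(position, SeekOrigin.Begin);
// Read may return fewer bytes than requested, so loop until the buffer is full.
int total = 0;
while (total < size)
{
    int n = stream.Read(data, total, size - total);
    if (n == 0) throw new EndOfStreamException();
    total += n;
}
using (MemoryStream subStream = new MemoryStream(data))
{
    Upload(subStream);
}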
You could write your own stream implementation that does what you want, accessing only a specific part of another stream.
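A minimal sketch of that last option might look like the following. The SubStream class here is hypothetical (it is not part of .NET), is read-only, and assumes the wrapped stream is seekable and is not used by anyone else while the wrapper is being read.

```csharp
using System;
using System.IO;

// Hypothetical read-only wrapper exposing bytes [start, start + length) of a seekable stream.
public sealed class SubStream : Stream
{
    private readonly Stream _inner;
    private readonly long _start;
    private readonly long _length;
    private long _position; // position relative to _start

    public SubStream(Stream inner, long start, long length)
    {
        if (inner == null) throw new ArgumentNullException(nameof(inner));
        if (!inner.CanSeek || !inner.CanRead)
            throw new ArgumentException("Stream must be readable and seekable.", nameof(inner));
        if (start < 0 || length < 0 || start + length > inner.Length)
            throw new ArgumentOutOfRangeException(nameof(length));
        _inner = inner;
        _start = start;
        _length = length;
    }

    public override bool CanRead => true;
    public override bool CanSeek => true;
    public override bool CanWrite => false;
    public override long Length => _length;

    public override long Position
    {
        get => _position;
        set
        {
            if (value < 0 || value > _length) throw new ArgumentOutOfRangeException(nameof(value));
            _position = value;
        }
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        long remaining = _length - _position;
        if (remaining <= 0) return 0;              // end of the window
        if (count > remaining) count = (int)remaining;
        _inner.Position = _start + _position;      // re-seek in case the inner stream moved
        int n = _inner.Read(buffer, offset, count);
        _position += n;
        return n;
    }

    public override long Seek(long offset, SeekOrigin origin)
    {
        long target = origin switch
        {
            SeekOrigin.Begin => offset,
            SeekOrigin.Current => _position + offset,
            _ => _length + offset
        };
        Position = target;
        return _position;
    }

    public override void Flush() { }
    public override void SetLength(long value) => throw new NotSupportedException();
    public override void Write(byte[] buffer, int offset, int count) => throw new NotSupportedException();
}
```

With this in place, the call from the question would become something like Upload(new SubStream(file, s, e - s)), and no bytes are copied until Upload actually reads them.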

How can I decompress live streaming data with DeflateStream?

I have an event that gets fired when I receive data. The data received is a segment of a large compressed stream.
Currently, I maintain a MemoryStream and am calling Write on it whenever I receive data. However, to decompress this data I need to wrap a DeflateStream around it. The problem is that when you call Read on the DeflateStream, it reads zero bytes!
This is because the DeflateStream.BaseStream is the MemoryStream I just wrote to, so the MemoryStream.Position has been updated, for both reading and writing.
Is there an alternative to calling MemoryStream.Seek(-bytesJustWritten, SeekOrigin.Current) after every single MemoryStream.Write?
I need to be able to read the decompressed bytes in real-time without waiting for closure of the stream.
I thought I could just use two MemoryStream objects, one for buffering my input and then one to copy into and then read from, but I quickly realized that it would present the same problem; you cannot write to a stream and then read from it without first seeking.

I solved this by using one MemoryStream and calling MemoryStream.Seek(-bytesJustWritten, SeekOrigin.Current) after each call to MemoryStream.Write.
This resets the position of the stream so that the data just written can be read (and later overwritten), while preserving the state of the DeflateStream.
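The shared read/write position that causes the problem can be seen without any compression involved. The sketch below uses made-up names and a plain MemoryStream only: a read issued right after a write finds nothing, while a read issued after seeking back over the written bytes returns them.

```csharp
using System;
using System.IO;

public static class SharedPositionDemo
{
    // Returns the byte count from a read attempted right after a write, and
    // from a second read attempted after seeking back over the written bytes.
    public static (int WithoutSeek, int AfterSeek) Run()
    {
        using var ms = new MemoryStream();
        byte[] chunk = { 1, 2, 3, 4 };
        byte[] readBuffer = new byte[16];

        ms.Write(chunk, 0, chunk.Length);
        // The position now sits at the end of the data, so nothing is left to read.
        int withoutSeek = ms.Read(readBuffer, 0, readBuffer.Length);

        // Rewind over the chunk that was just written, then read it.
        ms.Seek(-chunk.Length, SeekOrigin.Current);
        int afterSeek = ms.Read(readBuffer, 0, readBuffer.Length);

        return (withoutSeek, afterSeek);
    }
}
```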

Replacing a string within a stream in C# (without overwriting the original file)

I have a file that I'm opening into a stream and passing to another method. However, I'd like to replace a string in the file before passing the stream to the other method. So:
string path = "C:/...";
Stream s = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read);
//need to replace all occurrences of "John" in the file to "Jack" here.
CallMethod(s);
The original file should not be modified, only the stream. What would be the easiest way to do this?
Thanks...

It's a lot easier if you read the file in as lines and work with those, instead of forcing yourself to stick with a Stream. A Stream deals with raw bytes for both text and binary files, with no notion of lines or strings, which makes such a replacement very hard. If you read in a whole line at a time (and you don't need multi-line replacement) it's quite easy.
var lines = File.ReadLines(path)
    .Select(line => line.Replace("John", "Jack"));
Note that ReadLines still does stream the data, and Select doesn't need to materialize the whole thing, so you're still not reading the whole file into memory at one time when doing this.
If you don't actually need to stream the data you can easily just load it all as one big string, do the replace, and then create a stream based on that one string:
string data = File.ReadAllText(path)
    .Replace("John", "Jack");
// UTF-8 keeps non-ASCII characters that Encoding.ASCII would turn into '?'
byte[] bytes = Encoding.UTF8.GetBytes(data);
Stream s = new MemoryStream(bytes);

This question probably has many good answers. I'll offer one that I've used and that has always worked for me and my peers.
I suggest you create a separate stream, say a MemoryStream. Read from your FileStream and write into the MemoryStream. You can then extract strings from either and replace what you need, and pass the MemoryStream on. That makes doubly sure that you are not altering the original stream, and you can always read the original values from it whenever you need to, though this method uses roughly twice as much memory.

If the file has extremely long lines, if the replaced string may contain a newline, or if other constraints prevent the use of File.ReadLines() while still requiring streaming, there is an alternative solution that uses streams only, although it is not trivial.
Implement your own stream decorator (wrapper) that performs the replacement. I.e. a class based on Stream that takes another stream in its constructor, reads data from the stream in its Read(byte[], int, int) override and performs the replacement in the buffer. See notes to Stream implementers for further requirements and suggestions.
Let's call the string being replaced "needle", the source stream "haystack" and the replacement string "replacement".
Needle and replacement need to be encoded using the encoding of the haystack contents (typically Encoding.UTF8.GetBytes()). Inside streams, the data is not converted to string, unlike in StreamReader.ReadLine(). Thus unnecessary memory allocation is prevented.
Simple cases: If both needle and replacement are just a single byte, the implementation is just a simple loop over the buffer, replacing all occurrences. If needle is a single byte and replacement is empty (i.e. deleting the byte, e.g. deleting carriage return for line ending normalization), it is a simple loop maintaining from and to indexes to the buffer, rewriting the buffer byte by byte.
In more complex cases, implement the KMP algorithm to perform the replacement.
Read the data from the underlying stream (haystack) to an internal buffer that is at least as long as needle and perform the replacement while rewriting the data to the output buffer. The internal buffer is needed so that data from a partial match are not published before a complete match is detected -- then, it would be too late to go back and delete the match completely.
Process the internal buffer byte by byte, feeding each byte into the KMP automaton. With each automaton update, write the bytes it releases to the appropriate position in output buffer.
When a match is detected by KMP, replace it: reset the automaton keeping the position in the internal buffer (which deletes the match) and write the replacement in the output buffer.
When end of either buffer is reached, keep the unwritten output and unprocessed part of the internal buffer including current partial match as a starting point for next call to the method and return the current output buffer. Next call to the method writes the remaining output and starts processing the rest of haystack where the current one stopped.
When end of haystack is reached, release the current partial match and write it to the output buffer.
Just be careful not to return an empty output buffer before processing all the data of haystack -- that would signal end of stream to the caller and therefore truncate the data.
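The steps above can be sketched roughly as follows. This is a simplified illustration rather than a tuned implementation: the ReplacingStream name is made up, pending output is kept in a Queue<byte> instead of being written straight into the caller's buffer, and the internal buffer size of 4096 is arbitrary.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical read-only decorator that replaces every occurrence of needle
// in the wrapped stream with replacement, matching bytes with a KMP automaton.
public sealed class ReplacingStream : Stream
{
    private readonly Stream _haystack;
    private readonly byte[] _needle, _replacement;
    private readonly int[] _fail;                          // KMP failure table
    private readonly Queue<byte> _output = new Queue<byte>();
    private readonly byte[] _input = new byte[4096];
    private int _matched;                                  // needle bytes currently held back
    private bool _eof;

    public ReplacingStream(Stream haystack, byte[] needle, byte[] replacement)
    {
        if (needle == null || needle.Length == 0)
            throw new ArgumentException("Needle must not be empty.", nameof(needle));
        _haystack = haystack ?? throw new ArgumentNullException(nameof(haystack));
        _replacement = replacement ?? throw new ArgumentNullException(nameof(replacement));
        _needle = needle;
        _fail = new int[needle.Length];
        for (int i = 1, k = 0; i < needle.Length; i++)
        {
            while (k > 0 && needle[i] != needle[k]) k = _fail[k - 1];
            if (needle[i] == needle[k]) k++;
            _fail[i] = k;
        }
    }

    private void Feed(byte b)
    {
        while (_matched > 0 && b != _needle[_matched])
        {
            int next = _fail[_matched - 1];
            // The first (_matched - next) held-back bytes can no longer start a match.
            for (int i = 0; i < _matched - next; i++) _output.Enqueue(_needle[i]);
            _matched = next;
        }
        if (b != _needle[_matched]) { _output.Enqueue(b); return; }
        if (++_matched == _needle.Length)
        {
            // Full match: drop the held-back needle and emit the replacement.
            foreach (byte r in _replacement) _output.Enqueue(r);
            _matched = 0;
        }
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        if (count == 0) return 0;
        // Pull input until at least one output byte is available or the source
        // ends, so that 0 is returned only at the true end of the data.
        while (_output.Count == 0 && !_eof)
        {
            int n = _haystack.Read(_input, 0, _input.Length);
            if (n == 0)
            {
                _eof = true;
                // Release the partial match that never completed.
                for (int i = 0; i < _matched; i++) _output.Enqueue(_needle[i]);
                _matched = 0;
            }
            for (int i = 0; i < n; i++) Feed(_input[i]);
        }
        int written = Math.Min(count, _output.Count);
        for (int i = 0; i < written; i++) buffer[offset + i] = _output.Dequeue();
        return written;
    }

    public override bool CanRead => true;
    public override bool CanSeek => false;
    public override bool CanWrite => false;
    public override long Length => throw new NotSupportedException();
    public override long Position
    {
        get => throw new NotSupportedException();
        set => throw new NotSupportedException();
    }
    public override void Flush() { }
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
    public override void Write(byte[] buffer, int offset, int count) => throw new NotSupportedException();
}
```

For the question above, the call could look like CallMethod(new ReplacingStream(s, Encoding.UTF8.GetBytes("John"), Encoding.UTF8.GetBytes("Jack"))), assuming the file is UTF-8 encoded.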

BaseStream underlying stream

I was trying to read a binary file that was written in a certain pattern, for example: string, string, byte.
I searched the web and found this code:
while (br.BaseStream.Position < br.BaseStream.Length)
{
    br.ReadString();
    br.ReadString();
    br.ReadByte();
}
Even though the code is simple, I can't understand what the underlying stream (BaseStream) means. Can somebody give me a brief explanation of it?

.NET offers different objects to read or write data. Basically, there are readers and writers that read from or write to different streams. Streams represent the data flow between a data source (e.g. a file) and your application's memory (or back).
To access the stream in a defined direction you can use readers or writers. BinaryReader is one example of a reader. It is supposed to read binary data out of the stream. Streams usually inherit from a base class called Stream. There are different types of streams representing different data sources. For example, a FileStream reads or writes data in a file on disk, whereas a MemoryStream reads or writes data in RAM. So the implementation of a stream describes where the data is stored.
Readers and writers describe how the data is stored. For example, your BinaryReader reads byte sequences, whereas a TextReader reads text with a given encoding. But both can be used with the same stream.
To come back to your question: your BinaryReader reads binary data from a stream. The BaseStream property returns the instance of the stream the reader reads data from. This is why you need to initialize the BinaryReader with a stream instance. You cannot tell the computer to read binary data from nowhere! ;)
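To make the relationship concrete, here is a small sketch that uses a MemoryStream in place of the file from the question (the class name and the record values are made up). It writes two string, string, byte records with a BinaryWriter and reads them back with the same loop as in the question.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class BaseStreamDemo
{
    // Writes two (string, string, byte) records to a MemoryStream, then reads
    // them back until the underlying stream's position reaches its length.
    public static List<(string First, string Second, byte Flag)> RoundTrip()
    {
        var stream = new MemoryStream();

        using (var writer = new BinaryWriter(stream, Encoding.UTF8, leaveOpen: true))
        {
            writer.Write("alpha"); writer.Write("beta"); writer.Write((byte)1);
            writer.Write("gamma"); writer.Write("delta"); writer.Write((byte)2);
        }

        stream.Position = 0; // rewind so the reader starts at the first record

        var records = new List<(string First, string Second, byte Flag)>();
        using (var reader = new BinaryReader(stream))
        {
            // reader.BaseStream is the same MemoryStream instance passed to the constructor.
            while (reader.BaseStream.Position < reader.BaseStream.Length)
            {
                records.Add((reader.ReadString(), reader.ReadString(), reader.ReadByte()));
            }
        }
        return records;
    }
}
```

The loop condition works because every ReadString and ReadByte call advances the position of the underlying stream, so the loop stops once all bytes have been consumed.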

Something about Stream

I've been working on something that makes use of streams and I found that I'm not clear about some stream concepts (you can also view another question I posted, About redirected stdout in System.Diagnostics.Process).
1. How do you indicate that you have finished writing to a stream? By writing something like an EOF?
2. Following on from the previous question: if I have written an EOF (or something like that) to a stream but didn't close it, and I then want to write something else to the same stream, can I just start writing to it with no further setup required?
3. If a procedure tries to read a stream (like stdin) that no one has written to yet, the read blocks. When data finally arrives, the procedure reads until the writing is done, which is indicated by a read returning a count of 0 bytes instead of blocking. If the procedure then issues another read on the same stream, it still gets a count of 0 and returns immediately, whereas I expected it to block since no one is writing to the stream now. So does a stream hold different states when it has been opened but not yet written to, and when someone has finished a writing session?
I'm using Windows and the .NET Framework, in case anything is platform specific.
Thanks a lot!

This depends on the concrete stream. For example, reading from a MemoryStream would not block as you describe. This is because a MemoryStream has an explicit size, and as you read from the stream the position advances through the stream until you reach the end, at which point Read returns 0. If there was no data in the MemoryStream, the first Read would have immediately returned 0.
What you describe fits a NetworkStream, where reading from the stream blocks until data becomes available. When the "server" side closes the underlying Socket that is wrapped by the NetworkStream, Read returns 0.
So the actual details depend on the stream, but at a high level they are all treated the same, i.e. you can read from a stream until Read returns 0.
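That common pattern can be written as a loop like the one below. It is a sketch with made-up names and an arbitrary buffer size; whether an individual Read call blocks while waiting for data still depends on the concrete stream type.

```csharp
using System;
using System.IO;

public static class ReadLoopDemo
{
    // Reads from any readable stream until Read reports 0 bytes,
    // which is how a stream signals that no more data will arrive.
    public static byte[] ReadToEnd(Stream source)
    {
        var result = new MemoryStream();
        byte[] buffer = new byte[4096];
        int n;
        while ((n = source.Read(buffer, 0, buffer.Length)) > 0)
        {
            // Only the first n bytes of the buffer are valid for this pass.
            result.Write(buffer, 0, n);
        }
        return result.ToArray();
    }
}
```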

There is no "EOF" with streams. You write to a stream until you close it, which prevents it from being written further.
Streams just read and write bytes. That's all.
