How can I decompress live streaming data with DeflateStream?

I have an event that gets fired when I receive data. The data received is a segment of a large compressed stream.
Currently, I maintain a MemoryStream and am calling Write on it whenever I receive data. However, to decompress this data I need to wrap a DeflateStream around it. The problem is that when you call Read on the DeflateStream, it reads zero bytes!
This is because the DeflateStream.BaseStream is the MemoryStream I just wrote to, so the MemoryStream.Position has been updated, for both reading and writing.
Is there an alternative to calling MemoryStream.Seek(-bytesJustWritten, SeekOrigin.Current) after every single MemoryStream.Write?
I need to be able to read the decompressed bytes in real-time without waiting for closure of the stream.
I thought I could just use two MemoryStream objects, one for buffering my input and then one to copy into and then read from, but I quickly realized that it would present the same problem; you cannot write to a stream and then read from it without first seeking.

I solved this by using one MemoryStream and calling MemoryStream.Seek(-bytesJustWritten, SeekOrigin.Current) after each call to MemoryStream.Write.
This rewinds the position of the stream so that the DeflateStream can read the compressed data that was just written (which can later be overwritten), while preserving the state of the DeflateStream.
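
The approach described above can be sketched roughly as follows. This is a minimal illustration, not the original poster's code: the class name LiveInflater and the method Feed are hypothetical. It assumes that DeflateStream keeps working after its base stream has temporarily returned 0 bytes, which is the behaviour the answer relies on rather than a documented guarantee, and it lets the MemoryStream grow without bound.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical helper: feeds compressed chunks in as they arrive and
// returns whatever can be decompressed so far.
public sealed class LiveInflater : IDisposable
{
    private readonly MemoryStream _buffer = new MemoryStream();
    private readonly DeflateStream _inflater;

    public LiveInflater()
    {
        _inflater = new DeflateStream(_buffer, CompressionMode.Decompress);
    }

    public byte[] Feed(byte[] chunk)
    {
        // Append the compressed bytes, then rewind by the same amount so the
        // DeflateStream reads exactly what was just written.
        _buffer.Write(chunk, 0, chunk.Length);
        _buffer.Seek(-chunk.Length, SeekOrigin.Current);

        using (var output = new MemoryStream())
        {
            var scratch = new byte[4096];
            int n;
            // Drain until no more decompressed bytes are available for now.
            while ((n = _inflater.Read(scratch, 0, scratch.Length)) > 0)
            {
                output.Write(scratch, 0, n);
            }
            return output.ToArray();
        }
    }

    public void Dispose()
    {
        // Disposing the DeflateStream also disposes the underlying MemoryStream.
        _inflater.Dispose();
    }
}
```

Calling Feed from the data-received event handler returns the bytes decompressed so far; an empty array simply means the inflater needs more input before it can produce output.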

Related

Passing subsequence of a stream without copying its content into a new instance

Let's assume I have following method:
void Upload(Stream stream)
{
// uploads the content of the specified stream somewhere
}
And let's further assume I have some binary file f, which contains some data I'd like to upload with the method above.
However, it's not the whole file I want to upload, only a certain part of f. More precisely, the desired data starts at a certain position s >= 0 and ends at a certain position e <= f.Length.
Is there a way to pass a Stream instance that starts at position s and ends at position e (so its length is e - s), without copying all bytes between s and e into a new stream instance? I'm asking because file f may be quite big and I don't want to make assumptions about potentially available RAM.

Consider using the Stream.CanSeek property, the Stream.Position property, and the Stream.Seek method to access a certain part of the stream.
To have a separate Stream instance with the appropriate length, it seems necessary to implement a SubStream class: a wrapper that represents a sub-stream. The following references can be useful for implementing such a wrapper:
How to access part of a FileStream or MemoryStream, Social MSDN.
How to expose a sub section of my stream to a user, Stackoverflow.

If it is acceptable to modify the original stream before calling the method, use Seek to set the starting position and SetLength to set the end position. Then you can pass the stream to the method and it should only touch that section (assuming it does not internally seek back to the beginning).
Unfortunately, SetLength truncates the stream (for a FileStream, that means truncating the file itself), so you won't be able to access the rest of it later. However, if that is not a requirement, this should work.
Edit: Since you need to preserve the original stream, these are the other options I can think of:
If you have access to the path (and it is not locked by the other stream), you could open a new stream to the file and send a truncated version of that stream.
You could copy the section you need to a new stream, such as a MemoryStream. You won't need to copy the entire file, but you would need to copy the part you are going to upload using Seek and Read.
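byte[] data = new byte[size];
stream.Seek(position, SeekOrigin.Begin);
// Read can return fewer bytes than requested, so loop until the buffer is full.
int total = 0;
while (total < size)
{
    int n = stream.Read(data, total, size - total);
    if (n == 0) break; // end of stream reached early
    total += n;
}
using (MemoryStream subStream = new MemoryStream(data, 0, total))
{
    Upload(subStream);
}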
You could write your own stream implementation that does what you want, accessing only a specific part of another stream.
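
As a rough illustration of that last option, here is a minimal read-only wrapper. The class name SubStream and its constructor parameters are hypothetical; the sketch assumes the inner stream is readable and seekable and that nothing else moves its position concurrently.

```csharp
using System;
using System.IO;

// Hypothetical read-only view over the byte range [start, start + length) of another stream.
public sealed class SubStream : Stream
{
    private readonly Stream _inner;
    private readonly long _start;
    private readonly long _length;
    private long _position; // position relative to the start of the window

    public SubStream(Stream inner, long start, long length)
    {
        if (inner == null) throw new ArgumentNullException(nameof(inner));
        if (!inner.CanRead || !inner.CanSeek)
            throw new ArgumentException("Stream must be readable and seekable.", nameof(inner));
        if (start < 0 || length < 0 || start + length > inner.Length)
            throw new ArgumentOutOfRangeException(nameof(length));
        _inner = inner;
        _start = start;
        _length = length;
    }

    public override bool CanRead => true;
    public override bool CanSeek => true;
    public override bool CanWrite => false;
    public override long Length => _length;

    public override long Position
    {
        get => _position;
        set
        {
            if (value < 0 || value > _length) throw new ArgumentOutOfRangeException(nameof(value));
            _position = value;
        }
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        long remaining = _length - _position;
        if (remaining <= 0) return 0;                    // end of the window
        if (count > remaining) count = (int)remaining;   // never read past the window
        _inner.Position = _start + _position;            // reposition the inner stream on every read
        int n = _inner.Read(buffer, offset, count);
        _position += n;
        return n;
    }

    public override long Seek(long offset, SeekOrigin origin)
    {
        long target = origin == SeekOrigin.Begin ? offset
                    : origin == SeekOrigin.Current ? _position + offset
                    : _length + offset;
        Position = target;
        return _position;
    }

    public override void Flush() { }
    public override void SetLength(long value) => throw new NotSupportedException();
    public override void Write(byte[] buffer, int offset, int count) => throw new NotSupportedException();
}
```

With the names from the question, this would be used as Upload(new SubStream(fileStream, s, e - s)), where fileStream is a stream opened on f; only the bytes that Upload actually requests are read from the file.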

How does XmlReader work

If I use XmlReader.Create and pass it a stream, the XmlReader appears to read the entire stream even before I call any read methods, because the Position property of the stream changes to match the length of the stream. Is XmlReader then storing the entire XML in memory? It would appear so, as I can call XmlReader.Read and the stream position never changes. Is it possible for XmlReader to not consume the entire stream?

XmlReader does not read the entire stream at once. It only reads blocks of (up to) 8192 bytes at a time from the stream (or more if Async is set to true) and stores them in an internal byte buffer. Obviously, if your stream has fewer bytes than that, it will read them all on the first Read() call. This is likely what you are experiencing.
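
One way to check this yourself is to feed XmlReader a document that is much larger than its buffer and look at the stream position after the first node has been read. The sketch below is only an illustration: the class and method names are hypothetical, and the exact position you observe depends on the internal buffer size.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public static class XmlReaderBufferingDemo
{
    // Builds a well-formed in-memory XML document with itemCount <item> elements.
    public static MemoryStream BuildDocument(int itemCount)
    {
        var sb = new StringBuilder("<root>");
        for (int i = 0; i < itemCount; i++)
        {
            sb.Append("<item>").Append(i).Append("</item>");
        }
        sb.Append("</root>");
        return new MemoryStream(Encoding.UTF8.GetBytes(sb.ToString()));
    }

    // Returns the position of the underlying stream after the reader
    // has been created and the first node has been read.
    public static long PositionAfterFirstRead(Stream xml)
    {
        using (XmlReader reader = XmlReader.Create(xml))
        {
            reader.Read();
            return xml.Position;
        }
    }
}
```

For a document of around a megabyte, such as BuildDocument(100000), the returned position should be well short of the stream length, showing that only the first block has been consumed.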

How to duplicate a stream

Currently I have this code
SessionStream(Request.Content.ReadAsStreamAsync(), new { });
I need to somehow "mirror" the incoming stream so that I have two instances of it.
Something like the following pseudo code:
Task<Stream> stream = Request.Content.ReadAsStreamAsync();
SessionStream(stream, new { });
Stream theOtherStream;
stream.Result.CopyToAsync(theOtherStream);
TheOtherStream(theOtherStream, new { });

A technique that always works is to copy the stream to a MemoryStream and then use that.
Often, it is more efficient to just seek the original stream back to the beginning using the Seek method. That only works if this stream supports seeking.
If you do not want to buffer and cannot seek, you need to push the stream contents blockwise to the two consumers. Read a block, write it two times.
If in addition you need a pull model (i.e. hand a readable stream to some component), it gets really hard and threading is involved. You'd need to write a push-to-pull adapter, which is always hard.
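
The blockwise push variant ("read a block, write it two times") might look roughly like this. It is a minimal sketch under the assumption that both consumers can be modelled as writable streams; the class name StreamTee, the method name CopyToBothAsync, and the default block size are illustrative choices only.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

public static class StreamTee
{
    // Reads the source block by block and pushes every block to both targets,
    // so at most one block is held in memory at a time.
    public static async Task CopyToBothAsync(Stream source, Stream first, Stream second, int blockSize = 81920)
    {
        var block = new byte[blockSize];
        int n;
        while ((n = await source.ReadAsync(block, 0, block.Length)) > 0)
        {
            await first.WriteAsync(block, 0, n);
            await second.WriteAsync(block, 0, n);
        }
    }
}
```

Because the two writes happen one after the other, the faster consumer always waits for the slower one. That is the price of not buffering.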

The answer by usr is still correct in 2020, but for those wondering why this is not trivial, here is a short explanation.
The idea behind streams is that writing to the stream and reading from it are independent. Usually, reading is much faster than writing (think about receiving data over a network: you can read the data as soon as it arrives), so the reader usually waits for a new portion of data, processes it as soon as it arrives, drops it to free the memory, and then waits for the next portion.
This allows processing a potentially infinite data stream (for example, an application log stream) without using much RAM.
Suppose now we have two readers (as required by the question). A data portion arrives, and then we have to wait for both readers to read the data before we can drop it. This means that it must be stored in memory until both readers are done with it. The problem is that the readers can process the data at very different speeds. For example, one may write it to a file while another just counts the symbols in memory. In this case, either the fast one has to wait for the slow one before reading further, or we need to save the data to a buffer in memory and let the readers read from it. In the worst case we end up with a full copy of the input stream in memory, basically creating an instance of a memory stream.
To implement the first option, you would have to implement a stream reader that knows which of your consumers is faster and distributes and drops the data accordingly.
If you are sure you have enough memory, and processing speed is not critical, just use the memory stream:
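using var memStream = new MemoryStream();
await incomingStream.CopyToAsync(memStream);
// CopyToAsync leaves the position at the end, so rewind before each use.
memStream.Seek(0, SeekOrigin.Begin);
UseTheStreamForTheFirstTime(memStream);
memStream.Seek(0, SeekOrigin.Begin);
UseTheStreamAnotherTime(memStream);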

BaseStream underlying stream

I was trying to read a binary file that was written in a certain pattern, for example: string, string, byte.
I searched the web and found this code:
while (br.BaseStream.Position<br.BaseStream.Length)
{
br.ReadString();
br.ReadString();
br.ReadByte();
}
Even though it is simple code, I can't understand what the underlying stream (BaseStream) means. Can somebody give me a brief explanation of it?

.NET offers different objects to read or write data. Basically, there are readers and writers that read from or write to different streams. Streams represent the data flow between a data source (e.g. a file) and your application's memory (or back).
To access the stream in a defined direction you can use readers or writers. BinaryReader is one example of a reader. It is supposed to read binary data out of the stream. Streams usually inherit from a base class called Stream. There are different types of streams representing different data sources. For example, a FileStream reads or writes data in a file on the HDD, whereas a MemoryStream reads or writes data in RAM. So the implementation of a stream describes where the data is stored.
Readers and writers describe how the data is stored. For example, your BinaryReader reads byte sequences, whereas a TextReader reads text with a given encoding. But both can be used with the same stream.
To come back to your question: your BinaryReader reads binary data from a stream. The BaseStream property returns the instance of the stream the reader reads data from. This is why you need to initialize the BinaryReader with a stream instance. You cannot tell the computer to read binary data from nowhere! ;)
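
To make the relationship concrete, here is a small sketch that writes a few string, string, byte records into a MemoryStream and reads them back with the loop from the question. The class and method names are hypothetical and the record contents are made up for illustration.

```csharp
using System;
using System.IO;
using System.Text;

public static class BaseStreamDemo
{
    // Writes recordCount records in the pattern string, string, byte.
    public static MemoryStream WriteRecords(int recordCount)
    {
        var stream = new MemoryStream();
        using (var bw = new BinaryWriter(stream, Encoding.UTF8, leaveOpen: true))
        {
            for (int i = 0; i < recordCount; i++)
            {
                bw.Write("first" + i);
                bw.Write("second" + i);
                bw.Write((byte)i);
            }
        }
        stream.Position = 0; // rewind so a reader starts at the beginning
        return stream;
    }

    // Reads records until the position of the underlying stream reaches its length.
    public static int CountRecords(Stream stream)
    {
        int count = 0;
        using (var br = new BinaryReader(stream, Encoding.UTF8, leaveOpen: true))
        {
            // br.BaseStream is the very same object that was passed to the constructor.
            while (br.BaseStream.Position < br.BaseStream.Length)
            {
                br.ReadString();
                br.ReadString();
                br.ReadByte();
                count++;
            }
        }
        return count;
    }
}
```

The loop condition works because every ReadString and ReadByte call advances the position of the underlying stream, and Length tells the reader where the data ends.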

Something about Stream

I've been working on something that makes use of streams and I found myself unclear about some stream concepts (you can also view another question posted by me at About redirected stdout in System.Diagnostics.Process).
1. How do you indicate that you have finished writing to a stream? By writing something like an EOF?
2. Following on from the previous question: if I have written an EOF (or something like that) to a stream but didn't close the stream, and then I want to write something else to the same stream, can I just start writing to it with no further setup required?
3. If a procedure tries to read a stream (like stdin) that no one has written anything to, the read will block. When some data finally arrives, the procedure reads until the writing is done, which is indicated by a read returning a count of 0 bytes rather than blocking. If the procedure now issues another read on the same stream, it still gets a count of 0 and returns immediately, whereas I was expecting it to block since no one is writing to the stream now. So does the stream hold different states for when it has been opened but no one has written to it yet, and for when someone has finished a writing session?
I'm using Windows and the .NET Framework, in case anything here is platform specific.
Thanks a lot!

This depends on the concrete stream. For example, reading from a MemoryStream would not block as you describe. This is because a MemoryStream has an explicit size, and as you read from the stream the pointer advances through the stream until you reach the end, at which point Read will return 0. If there was no data in the MemoryStream, the first Read would have immediately returned 0.
What you describe fits a NetworkStream, in which case reading from the stream will block until data becomes available. When the "server" side closes the underlying Socket that is wrapped by the NetworkStream, Read will return 0.
So the actual details depend on the stream, but at a high level they are all treated the same, i.e. you can read from a stream until Read returns 0.
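
The "read until Read returns 0" pattern can be sketched like this. The class and method names are hypothetical, and the small buffer size is chosen only to force several iterations.

```csharp
using System;
using System.IO;

public static class ReadUntilZeroDemo
{
    // Reads from any stream until Read reports 0 bytes, which signals the end.
    public static byte[] ReadToEnd(Stream stream)
    {
        using (var collected = new MemoryStream())
        {
            var buffer = new byte[4];
            int n;
            while ((n = stream.Read(buffer, 0, buffer.Length)) > 0)
            {
                collected.Write(buffer, 0, n);
            }
            return collected.ToArray();
        }
    }
}
```

With a MemoryStream the loop never blocks: an empty stream returns 0 on the first Read. With a blocking stream such as a NetworkStream, the same loop would wait inside Read until data arrives or the other side closes the connection.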

There is no "EOF" with streams. You write to a stream until you close it, which prevents it from being written further.
Streams just read and write bytes. That's all.
