Read-once stream for multiple consumers - C#

We have a few (big) image files. We wanted to create thumbnails from those files and at the same time record their MD5 sums.
Ideally we wanted the program to read each file only once, and never seek backwards. However, since the data serves two consumers, even though we can create multiple threads we cannot avoid reading the files more than once.
So the requirement is: assuming a read-only, forward-only stream, how can we use it to feed both a new Bitmap(stream) and a call to md5.ComputeHash(stream)? (The solution should extend to other stream consumers.)
How can we do this?

For your specific case:
Instead of calling md5.ComputeHash(stream), call new CryptoStream(stream, md5, CryptoStreamMode.Read).
This stream will mirror the original stream, but will also pass it through the MD5 hasher.
Once the stream has been read to the end, the md5 instance will hold the hash.
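For example, a minimal sketch of the wiring for the thumbnail/MD5 case (the path and the thumbnail step are illustrative, and it assumes the image decoder tolerates a forward-only stream; GDI+'s Bitmap can be picky about non-seekable streams):

using (var file = File.OpenRead(@"C:\images\big.jpg"))
using (var md5 = MD5.Create())
using (var hashing = new CryptoStream(file, md5, CryptoStreamMode.Read))
{
    // every byte the Bitmap pulls is also fed through the MD5 transform
    using (var bitmap = new Bitmap(hashing))
    {
        // ... create the thumbnail from bitmap here ...
    }

    // drain whatever the decoder did not consume so the whole file is hashed;
    // the hash is finalized once the CryptoStream reaches the end of the stream
    hashing.CopyTo(Stream.Null);
    Console.WriteLine(BitConverter.ToString(md5.Hash));
}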

Related

How to modify/convert raw stream data during copy to another stream?

I have a System.IO.Stream with input data and another one to write output to. What's a good way to copy the stream contents while being able to detect certain keywords and modify/remove data from the stream?
Do I have to use .Read and .Write with a buffer and deal with buffer boundaries myself (such as only part of a keyword arriving at the end of the buffer)? Of course that's not too hard, but I was hoping for something more elegant, like inheriting from some ready-made stream-converter class.
For example, since it's for MS Exchange anyway, I tried to inherit Microsoft.Exchange.Data.TextConverters.TextConverter, but it looks like this is not possible?
I know, encoding could be another issue, but let's treat it as raw bytes for this question.
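A plain Read/Write sketch of that buffer-boundary handling (nothing Exchange-specific; keyword and replacement are raw bytes, and holding back the last keyword.Length - 1 bytes covers matches that straddle buffer boundaries):

static void FilterCopy(Stream input, Stream output, byte[] keyword, byte[] replacement)
{
    byte[] buffer = new byte[8192];
    List<byte> pending = new List<byte>(); // bytes read but not yet written
    int read;
    while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
    {
        for (int i = 0; i < read; i++)
            pending.Add(buffer[i]);
        // bytes before 'safe' can no longer be part of a match that straddles buffers
        int safe = pending.Count - (keyword.Length - 1);
        int pos = 0;
        while (pos < safe)
        {
            if (MatchesAt(pending, pos, keyword))
            {
                output.Write(replacement, 0, replacement.Length);
                pos += keyword.Length;
            }
            else
            {
                output.WriteByte(pending[pos]);
                pos++;
            }
        }
        pending.RemoveRange(0, pos);
    }
    // end of input: the short tail can no longer contain a full keyword
    foreach (byte b in pending)
        output.WriteByte(b);
}

static bool MatchesAt(List<byte> data, int index, byte[] keyword)
{
    for (int k = 0; k < keyword.Length; k++)
        if (data[index + k] != keyword[k])
            return false;
    return true;
}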

Passing subsequence of a stream without copying its content into a new instance

Let's assume I have following method:
void Upload(Stream stream)
{
    // uploads the content of the specified stream somewhere
}
And let's further assume I got some binary file f, which contains some data I'd like to upload with the method above.
But: It's not the whole file I want to upload. It's only a certain part of f. More precisely the desired data starts at a certain position s >= 0 and ends at a certain position e <= f.Length.
Is there a way to pass a Stream instance that starts at position s and has length e - s, without copying all bytes between s and e into a new stream instance? I'm asking because file f may be quite big, and I don't want to make assumptions about the available RAM.
Consider using the Stream.CanSeek property, the Stream.Position property, and the Stream.Seek method to "access" the relevant part of the stream.
To have a separate Stream instance with the appropriate length, it seems necessary to implement a SubStream class: a wrapper that represents the sub-stream. The following references can be useful for implementing such a wrapper:
How to access part of a FileStream or MemoryStream, Social MSDN.
How to expose a sub section of my stream to a user, Stackoverflow.
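For illustration, a minimal read-only SubStream sketch along the lines of those references; it exposes length bytes of the base stream starting at offset, and assumes the base stream is seekable (names are illustrative):

public sealed class SubStream : Stream
{
    private readonly Stream baseStream;
    private readonly long baseOffset;
    private readonly long subLength;
    private long subPosition;

    public SubStream(Stream baseStream, long offset, long length)
    {
        this.baseStream = baseStream;
        this.baseOffset = offset;
        this.subLength = length;
        baseStream.Seek(offset, SeekOrigin.Begin);
    }

    public override bool CanRead { get { return true; } }
    public override bool CanSeek { get { return true; } }
    public override bool CanWrite { get { return false; } }
    public override long Length { get { return subLength; } }

    public override long Position
    {
        get { return subPosition; }
        set
        {
            subPosition = value;
            baseStream.Seek(baseOffset + value, SeekOrigin.Begin);
        }
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        long remaining = subLength - subPosition;
        if (remaining <= 0)
            return 0; // logical end of the sub-stream
        if (count > remaining)
            count = (int)remaining;
        int read = baseStream.Read(buffer, offset, count);
        subPosition += read;
        return read;
    }

    public override long Seek(long offset, SeekOrigin origin)
    {
        switch (origin)
        {
            case SeekOrigin.Begin: Position = offset; break;
            case SeekOrigin.Current: Position = subPosition + offset; break;
            case SeekOrigin.End: Position = subLength + offset; break;
        }
        return subPosition;
    }

    public override void Flush() { baseStream.Flush(); }
    public override void SetLength(long value) { throw new NotSupportedException(); }
    public override void Write(byte[] buffer, int offset, int count) { throw new NotSupportedException(); }
}

With this, the upload becomes Upload(new SubStream(f, s, e - s)) without copying the section into memory.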
If modifying the pointers in the original stream before calling the method will work, then use Seek to set the starting position and SetLength to set the end position. Then, you can pass the stream to the method and it should only touch that section (assuming it does not internally seek back to the beginning).
Unfortunately, SetLength will truncate the stream, so you won't be able to later access the rest of it if you needed to for some reason. However, if that is not a requirement, this should work.
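For example, a minimal sketch of that trick, using s and e from the question (it assumes a writable, seekable stream such as a FileStream):

stream.SetLength(e);              // truncate everything after the end position
stream.Seek(s, SeekOrigin.Begin); // start at the desired position
Upload(stream);                   // the method now sees only bytes s..e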
Edit: Since you need to preserve the original stream, these are the other options I can think of:
If you have access to the path (and it is not locked by the other stream), you could open a new stream to the file and send a truncated version of that stream.
You could copy the section you need to a new stream, such as a MemoryStream. You won't need to copy the entire file, but you would need to copy the part you are going to upload using Seek and Read.
byte[] data = new byte[size];
stream.Seek(position, SeekOrigin.Begin);
// Stream.Read may return fewer bytes than requested, so loop until done
int offset = 0;
while (offset < size)
{
    int read = stream.Read(data, offset, size - offset);
    if (read == 0) break; // premature end of stream
    offset += read;
}
using (MemoryStream subStream = new MemoryStream(data))
{
    Upload(subStream);
}
You could write your own stream implementation that does what you want, accessing only a specific part of another stream (along the lines of the SubStream sketch above).

How to duplicate a stream

Currently I have this code
SessionStream(Request.Content.ReadAsStreamAsync(), new { });
I need to somehow "mirror" the incoming stream so that I have two instances of it.
Something like the following pseudo code:
Task<Stream> stream = Request.Content.ReadAsStreamAsync();
SessionStream(stream, new { });
Stream theOtherStream;
stream.Result.CopyToAsync(theOtherStream);
TheOtherStream(theOtherStream, new { });
A technique that always works is to copy the stream to a MemoryStream and then use that.
Often, it is more efficient to just seek the original stream back to the beginning using the Seek method. That only works if this stream supports seeking.
If you do not want to buffer and cannot seek, you need to push the stream contents blockwise to the two consumers: read a block, write it twice (see the sketch below).
If, in addition, you need a pull model (i.e. hand a readable stream to some component), it gets really hard and threading becomes involved. You'd need to write a push-to-pull adapter, which is always tricky.
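For the blockwise push option, a minimal sketch (names are illustrative):

static void PushToBoth(Stream source, Stream sinkA, Stream sinkB)
{
    byte[] buffer = new byte[81920];
    int read;
    while ((read = source.Read(buffer, 0, buffer.Length)) > 0)
    {
        // each block is read from the source once and written twice
        sinkA.Write(buffer, 0, read);
        sinkB.Write(buffer, 0, read);
    }
}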
usr's answer is still correct in 2020, but for those wondering why this is not trivial, here is a short explanation.
The idea behind streams is that writing to a stream and reading from it are independent. Usually reading is much faster than writing (think of receiving data over the network: you can read the data as soon as it arrives), so the reader typically waits for a new portion of data, processes it as soon as it arrives, drops it to free the memory, and then waits for the next portion.
This allows processing potentially infinite data stream (for example, application log stream) without using much RAM.
Suppose now we have two readers (as the question requires). A data portion arrives, and we have to wait for both readers to read it before we can drop it, which means it must be stored in memory until both readers are done with it. The problem is that the readers can process the data at very different speeds: one might write it to a file while another just counts the symbols in memory. In that case, either the fast one has to wait for the slow one before reading further, or we need to save the data to a buffer in memory and let the readers read from it. In the worst case we end up with a full copy of the input stream in memory, essentially creating a MemoryStream instance.
To implement the first option, you would have to implement a stream reader that knows which of your consumers is faster and, taking that into account, distributes and drops the data accordingly.
If you are sure you have enough memory, and processing speed is not critical, just use the memory stream:
using var memStream = new MemoryStream();
await incomingStream.CopyToAsync(memStream);
memStream.Seek(0, SeekOrigin.Begin); // rewind before the first read
UseTheStreamForTheFirstTime(memStream);
memStream.Seek(0, SeekOrigin.Begin); // rewind again for the second consumer
UseTheStreamAnotherTime(memStream);

How can I decompress live streaming data with DeflateStream?

I have an event that gets fired when I receive data. The data received is a segment of a large compressed stream.
Currently, I maintain a MemoryStream and am calling Write on it whenever I receive data. However, to decompress this data I need to wrap a DeflateStream around it. The problem is that when you call Read on the DeflateStream, it reads zero bytes!
This is because DeflateStream.BaseStream is the MemoryStream I just wrote to, and a MemoryStream uses a single Position for both reading and writing, so after a write the position sits at the end of the freshly written data with nothing left to read.
Is there an alternative to calling MemoryStream.Seek(-bytesJustWritten, SeekOrigin.Current) after every single MemoryStream.Write?
I need to be able to read the decompressed bytes in real-time without waiting for closure of the stream.
I thought I could just use two MemoryStream objects, one to buffer my input and one to copy into and then read from, but I quickly realized it would present the same problem: you cannot write to a stream and then read from it without seeking first.
I solved this by using one MemoryStream and calling MemoryStream.Seek(-bytesJustWritten, SeekOrigin.Current) each time MemoryStream.Write was invoked.
This resets the position of the stream so that the data just written can be read and decompressed, and later overwritten, while preserving the state of the DeflateStream.
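In code, the pattern looks roughly like this (a sketch; names are illustrative, and DeflateStream.Read may return fewer bytes than requested when only part of the compressed stream has arrived):

private readonly MemoryStream compressedBuffer = new MemoryStream();
private DeflateStream inflater;
private readonly byte[] decompressed = new byte[4096];

// called from the data-received event
void OnDataReceived(byte[] data, int count)
{
    if (inflater == null)
        inflater = new DeflateStream(compressedBuffer, CompressionMode.Decompress);

    compressedBuffer.Write(data, 0, count);
    compressedBuffer.Seek(-count, SeekOrigin.Current); // undo the position advance from Write

    int read;
    while ((read = inflater.Read(decompressed, 0, decompressed.Length)) > 0)
    {
        // hand off 'read' decompressed bytes in real time
    }
}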

.NET BinaryWriter.Write() Method -- Writing Multiple Datatypes Simultaneously

I'm using BinaryWriter to write records to a file. The records consist of a class with the following property datatypes:
Int32,
Int16,
Byte[],
Null Character
To write each record, I call BinaryWriter.Write four times--one for each datatype. This works fine but I'd like to know if there's any way to just call the BinaryWriter.Write() method a single time using all of these datatypes. The reasoning for this is that another program is reading my binary file and will occasionally only read part of a record because it starts reading between my write calls. Unfortunately, I don't have control over the code to the other program else I would modify the way it reads.
Add a .ToBinary() method to your class that returns byte[].
public byte[] ToBinary()
{
    byte[] result = new byte[number_of_bytes_you_need];
    // fill result with the serialized field values
    return result;
}
In your calling code (approximate as I haven't compiled this)
writer.Write(myObj.ToBinary()); // writer is your BinaryWriter
You're still writing each value independently, but it cleans up the code a little.
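For instance, a hypothetical fill for the record layout from the question (IntValue, ShortValue, and Payload are assumed property names):

public byte[] ToBinary()
{
    byte[] result = new byte[4 + 2 + Payload.Length + 1];
    BitConverter.GetBytes(IntValue).CopyTo(result, 0);   // Int32
    BitConverter.GetBytes(ShortValue).CopyTo(result, 4); // Int16
    Payload.CopyTo(result, 6);                           // Byte[]
    result[result.Length - 1] = 0;                       // null character
    return result;
}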
Also, as sindre suggested, consider using serialization, as it makes it incredibly easy to recreate your objects from the file in question, and requires less effort than writing the file the way you're attempting to.
Sync: you can't depend on any of these solutions to fix your file-sync issue. Even if you manage to reduce your Write() calls to a single statement, whether through serialization or the .ToBinary() method I've outlined, the framework still writes the bytes sequentially, so a reader can still catch a record half-written. If you have control over the file format, add a record-length field written before any of the record data; in the app that reads the file, make sure record_length bytes are available before attempting to process the next record (a sketch follows below). While you're at it, put this in a database. If you don't have control over the file format, you're kind of out of luck.
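A sketch of that length-prefix idea, using the field layout from the question (names are assumptions):

static void WriteRecord(BinaryWriter writer, int id, short code, byte[] payload)
{
    int recordLength = sizeof(int) + sizeof(short) + payload.Length + 1;
    writer.Write(recordLength);   // length prefix, written first
    writer.Write(id);             // Int32
    writer.Write(code);           // Int16
    writer.Write(payload);        // Byte[]
    writer.Write((byte)0);        // null character
}

The reader first reads the Int32 length, then waits until that many bytes are available before parsing the record.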
In keeping with your use of BinaryWriter, I would have the object build its binary record with a second BinaryWriter over a MemoryStream, and then write that record to the main BinaryWriter. So on your class you could have a method like this:
public void WriteTo(BinaryWriter writer)
{
    using (MemoryStream ms = new MemoryStream())
    using (BinaryWriter bw = new BinaryWriter(ms))
    {
        bw.Write(value1);
        bw.Write(value2);
        bw.Write(value3);
        bw.Write(value4);
        bw.Flush();
        writer.Write(ms.ToArray());
    }
}
This would create a single record with the same format you're already writing to the main BinaryWriter, except that it builds the record all at once and then writes it as a single byte array.
Create a class for the record and use the BinaryFormatter:
FileStream fs = new FileStream("file.dat", FileMode.Create);
BinaryFormatter formatter = new BinaryFormatter();
formatter.Serialize(fs, <insert instance of a class here>);
fs.Close();
I haven't done this myself, so I'm not absolutely sure it would work; the class cannot contain any other data, that's for sure. If you have no luck with a class, you could try a struct.
Edit:
Just came up with another possible solution: create a struct for your data and copy it into a byte array in one go. Note that Buffer.BlockCopy only accepts arrays of primitive types, so for a custom struct you need the Marshal class (System.Runtime.InteropServices) instead:

structure s = new structure();
s.item1 = 0213; // initialize all the members
byte[] writeBuffer = new byte[Marshal.SizeOf(typeof(structure))];
GCHandle handle = GCHandle.Alloc(writeBuffer, GCHandleType.Pinned);
Marshal.StructureToPtr(s, handle.AddrOfPinnedObject(), false);
handle.Free();

Now you can write the writeBuffer to the file in one go.
Second edit:
I don't agree that the sync problems are impossible to solve. First of all, the data is written to the file in whole sectors, not one byte at a time, and the file is not really updated until you flush it, thus writing the data and updating the file length. The best and safest thing to do is to open the file exclusively, write a record (or several), and close the file. That requires the reading application to access the file in a similar manner (open exclusively, read, close), as well as handling "access denied" errors gracefully.
Anyhow, I'm quite sure this will perform better no matter what when you're writing an entire record at a time.
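A sketch of that exclusive-open pattern (the file name and the WriteTo method are illustrative):

// FileShare.None makes concurrent opens fail until the writer closes the file
using (var fs = new FileStream("file.dat", FileMode.Append, FileAccess.Write, FileShare.None))
using (var writer = new BinaryWriter(fs))
{
    record.WriteTo(writer); // write one or more complete records, then close
}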
