How to modify/convert raw stream data during copy to another stream? - c#

I have an arbitrary System.IO.Stream with input data and another one to write output to. What's a good way to copy the stream contents while being able to detect certain keywords and modify or remove data from the stream?
Do I have to use .Read and .Write and a buffer and deal with buffer boundaries myself (like only a part of a keyword at the end of the buffer)? Of course that's not too hard, but I hope for something more fashionable, like inheriting some ready-made stream converter class.
For example, since it's for MS Exchange anyway, I tried to inherit Microsoft.Exchange.Data.TextConverters.TextConverter, but it looks like this is not possible?
I know, encoding could be another issue, but let's treat it as raw bytes for this question.
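A minimal sketch of the manual Read/Write approach described in the question, treating the data as raw bytes. The names StreamRewriter, CopyReplacing and MatchesAt are hypothetical, not part of any library. The copy holds back the last keyword.Length - 1 bytes of each chunk, so a keyword split across two reads is still found:

```csharp
using System;
using System.IO;

public static class StreamRewriter
{
    // Copies input to output, replacing every occurrence of keyword with replacement.
    // Hypothetical helper for illustration; bufferSize is assumed to be positive.
    public static void CopyReplacing(Stream input, Stream output,
                                     byte[] keyword, byte[] replacement,
                                     int bufferSize = 4096)
    {
        if (keyword == null || keyword.Length == 0)
            throw new ArgumentException("keyword must not be empty", nameof(keyword));

        int keep = keyword.Length - 1;                 // bytes that may start a split keyword
        byte[] window = new byte[keep + bufferSize];   // carried tail + fresh chunk
        int filled = 0;                                // valid bytes currently in window
        int read;

        while ((read = input.Read(window, filled, bufferSize)) > 0)
        {
            filled += read;
            int written = 0;                           // everything before this index is handled
            int i = 0;
            while (i + keyword.Length <= filled)
            {
                if (MatchesAt(window, i, keyword))
                {
                    output.Write(window, written, i - written);
                    output.Write(replacement, 0, replacement.Length);
                    i += keyword.Length;
                    written = i;
                }
                else
                {
                    i++;
                }
            }
            // No keyword starts before i, so those bytes are safe to emit.
            output.Write(window, written, i - written);
            // Carry the unchecked tail over to the front of the window.
            filled -= i;
            Array.Copy(window, i, window, 0, filled);
        }
        // End of input: the carried tail cannot complete a keyword any more.
        output.Write(window, 0, filled);
    }

    private static bool MatchesAt(byte[] data, int offset, byte[] keyword)
    {
        for (int k = 0; k < keyword.Length; k++)
        {
            if (data[offset + k] != keyword[k]) return false;
        }
        return true;
    }
}
```

Passing an empty replacement array removes the keyword instead of replacing it. The scan is a naive byte comparison, which is probably fine for short keywords.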

Related

Possible to determine if a stream is a string?

Is it possible to somehow (without a huge performance loss) determine whether a Stream (just a normal System.IO.Stream) "contains" a string or not? I have tried to Google this, but I haven't found a good solution that doesn't involve try and catch.
Any stream can be a string as a string is just a series of bytes. If you're asking if a stream contains a specific sequence of bytes -- i.e. you want to confirm that a stream contains a $MY_TOKEN$ somewhere then you'll have to read up until that point or to the end of the stream.
Depending on the nature of your stream, there might be an efficient way to do this and then reset the cursor back to the beginning of the stream.

Read once stream for multiple consumers

We have a few (big) image files. We wanted to create thumbnails from those files and at the same time record their MD5 sums.
Ideally we wanted the program to read those files only once and never seek back. However, since the data serves two consumers, we cannot avoid reading the files multiple times, even if we create multiple threads.
So the requirement is: assuming a read-only, forward-only stream, how can we use it to feed both a new Bitmap(stream) and a call to md5.ComputeHash(stream)? (The solution should be extensible to other stream consumers.)
How can we do this?
For your specific case:
Instead of calling md5.ComputeHash(stream), call new CryptoStream(stream, md5, CryptoStreamMode.Read).
This stream will mirror the original stream, but will also pass it through the MD5 hasher.
Once the stream has been read to the end, the md5 instance will hold the hash.
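A minimal sketch of the CryptoStream idea from the answer above. The names HashWhileReading and ConsumeAndHash are hypothetical, and the consumer delegate stands in for whatever reads the data (the thumbnail code in the question, for example):

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

public static class HashWhileReading
{
    // Lets 'consumer' read 'source' through a CryptoStream and returns the MD5
    // hash of all bytes that passed through. Hypothetical helper for illustration.
    public static byte[] ConsumeAndHash(Stream source, Action<Stream> consumer)
    {
        using (var md5 = MD5.Create())
        using (var hashing = new CryptoStream(source, md5, CryptoStreamMode.Read))
        {
            // The consumer sees the original bytes; md5 sees them as a side effect.
            consumer(hashing);

            // Drain anything the consumer left unread so the hash covers the whole stream.
            var scratch = new byte[8192];
            while (hashing.Read(scratch, 0, scratch.Length) > 0) { }

            return md5.Hash;
        }
    }
}
```

Note that disposing the CryptoStream also closes the source stream here, so a consumer that needs the stream to stay open afterwards would need a different arrangement.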

BaseStream underlying stream

I was trying to read a binary file, it was written in a certain pattern for example: string, string, byte
I surfed web and found this code:
while (br.BaseStream.Position < br.BaseStream.Length)
{
    br.ReadString();
    br.ReadString();
    br.ReadByte();
}
Even though it is easy code I can't understand what the underlying stream(BaseStream) means? Can somebody give me a brief explanation of it?
.NET offers different objects to read or write data. Basically there are readers and writers that read from or write to different streams. A stream represents the data flow between a data source (e.g. a file) and your application's memory (or back).
To access the stream in a defined direction you can use readers or writers. BinaryReader is one example of a reader. It is supposed to read binary data out of the stream. Streams usually inherit from a base class called Stream. There are different types of streams representing different data sources. For example, a FileStream reads or writes data in a file on disk, whereas a MemoryStream reads or writes data in RAM. So the implementation of a stream describes where the data is stored.
Readers and writers describe how the data is stored. For example, your BinaryReader reads byte sequences, whereas a TextReader reads text with a given encoding. But both can be used with the same stream.
To come back to your question: your BinaryReader reads binary data from a stream. The BaseStream property returns the instance of the stream the reader reads data from. This is why you need to initialize the BinaryReader with a stream instance. You cannot tell the computer to read binary data from nowhere! ;)
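To make the relationship concrete, here is a small sketch that uses a MemoryStream in place of the file. The class name BaseStreamDemo and the sample values are made up for illustration. It writes two string, string, byte records and reads them back with the loop from the question:

```csharp
using System;
using System.IO;
using System.Text;

public static class BaseStreamDemo
{
    // Writes two (string, string, byte) records and counts them while reading back.
    public static int CountRecords()
    {
        var data = new MemoryStream();   // stands in for the file from the question

        // leaveOpen: true keeps 'data' usable after the writer is disposed.
        using (var bw = new BinaryWriter(data, Encoding.UTF8, leaveOpen: true))
        {
            bw.Write("alpha"); bw.Write("beta"); bw.Write((byte)1);
            bw.Write("gamma"); bw.Write("delta"); bw.Write((byte)2);
        }

        data.Position = 0;               // rewind before reading
        int records = 0;
        using (var br = new BinaryReader(data))
        {
            // br.BaseStream is the very stream object passed to the constructor.
            if (!ReferenceEquals(br.BaseStream, data))
                throw new InvalidOperationException("unexpected base stream");

            while (br.BaseStream.Position < br.BaseStream.Length)
            {
                br.ReadString();
                br.ReadString();
                br.ReadByte();
                records++;
            }
        }
        return records;
    }
}
```

The Position < Length check only works for streams that support seeking, such as FileStream and MemoryStream.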

How to specify BinaryReader to interpret as big-endian

Is there a way to tell the BinaryReader to interpret as big-endian? Like just saying "interpret everything big endian" so I don't have to write extra code to manually read in bytes, reverse them, and then convert it to int or float or whatever I need.
UPDATE
I looked around, and it seems you can't.
Which is kind of strange; I figured it's something you'd naturally do when writing a class that will read binary data from arbitrary files.
Try creating the reader with the BinaryReader(stream, encoding) constructor, passing Encoding.BigEndianUnicode.
Since it was pointed out that the encoding affects text only, you will have to write your own code to convert the numeric values manually, or you can use Scott Chamberlain's example at the end of this MSDN Forum Posting.
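A minimal sketch of the manual conversion mentioned above, not the code from the linked forum posting. The class name BigEndian and its method names are hypothetical. It reads the raw bytes and reverses them when the machine is little-endian:

```csharp
using System;
using System.IO;

public static class BigEndian
{
    // Reads a 32-bit signed integer stored in big-endian byte order.
    public static int ReadInt32(BinaryReader reader)
    {
        return BitConverter.ToInt32(ReadReversed(reader, 4), 0);
    }

    // Reads a 32-bit float stored in big-endian byte order.
    public static float ReadSingle(BinaryReader reader)
    {
        return BitConverter.ToSingle(ReadReversed(reader, 4), 0);
    }

    private static byte[] ReadReversed(BinaryReader reader, int count)
    {
        byte[] bytes = reader.ReadBytes(count);
        if (bytes.Length < count)
            throw new EndOfStreamException();

        // BitConverter uses the machine's byte order, so swap only on little-endian hosts.
        if (BitConverter.IsLittleEndian)
            Array.Reverse(bytes);
        return bytes;
    }
}
```

On newer runtimes, System.Buffers.Binary.BinaryPrimitives offers methods such as ReadInt32BigEndian that avoid the manual reversal.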

C# performance methods of receiving data from a socket?

Let's assume we have a simple internet socket, and it's going to send 10 megabytes (because I want to ignore memory issues) of random data through.
Is there any performance difference or a best practice method that one should use for receiving data? The final output data should be represented by a byte[]. Yes I know writing an arbitrary amount of data to memory is bad, and if I was downloading a large file I wouldn't be doing it like this. But for argument's sake let's ignore that and assume it's a smallish amount of data. I also realise that the bottleneck here is probably not the memory management but rather the socket receiving. I just want to know what would be the most efficient method of receiving data.
A few dodgy ways I can think of are:
Have a List<byte> and a buffer; each time the buffer is full, add it to the list, and at the end call list.ToArray() to get the byte[].
Write the buffer to a memory stream; once it's complete, construct a byte[] of stream.Length and read the whole stream into it to get the byte[] output.
Is there a more efficient/better way of doing this?
Just write to a MemoryStream and then call ToArray - that does the business of constructing an appropriately-sized byte array for you. That's effectively what a List<byte> would be like anyway, but using a MemoryStream will be a lot simpler.
Well, Jon Skeet's answer is great (as usual), but there's no code, so here's my interpretation. (Worked fine for me.)
using (var mem = new MemoryStream())
{
    using (var tcp = new TcpClient())
    {
        tcp.Connect(new IPEndPoint(IPAddress.Parse("192.0.0.192"), 8880));
        tcp.GetStream().CopyTo(mem);
    }
    var bytes = mem.ToArray();
}
(Why not combine the two usings? Well, if you want to debug, you might want to release the tcp connection before taking your time looking at the bytes received.)
FYI, CopyTo keeps reading until the remote side closes the connection, so this code receives multiple packets and aggregates their data. That makes it a great way to simply receive all TCP data sent during a connection.
What is the encoding of your data? Is it plain ASCII, or is it something else, like UTF-8/Unicode?
If it is plain ASCII, you could just allocate a StringBuilder of the required size (get the size from the Content-Length header of the response) and keep appending your data to the builder, after converting each chunk into a string using Encoding.ASCII.
If it is Unicode/UTF-8 then you have an issue: you cannot just call GetString(buffer, 0, bytesRead) on the encoding for each chunk, because the bytes read might end in the middle of a multi-byte character and so not form a complete string fragment in that encoding. For this case you will need to buffer the entire entity body in memory (or in a file), then read it back and decode it using the encoding.
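As an alternative to buffering the whole body, a System.Text.Decoder obtained from Encoding.GetDecoder() keeps the trailing bytes of an incomplete character between calls, so chunks can be decoded as they arrive. A minimal sketch, in which the names ChunkedDecoding and DecodeChunks are hypothetical and the chunks parameter stands in for successive socket reads:

```csharp
using System;
using System.Text;

public static class ChunkedDecoding
{
    // Decodes a sequence of byte chunks whose boundaries may split a character.
    public static string DecodeChunks(Encoding encoding, params byte[][] chunks)
    {
        Decoder decoder = encoding.GetDecoder();   // stateful: remembers partial characters
        var text = new StringBuilder();

        foreach (byte[] chunk in chunks)
        {
            // Worst-case number of chars this chunk could produce.
            char[] chars = new char[encoding.GetMaxCharCount(chunk.Length)];
            int count = decoder.GetChars(chunk, 0, chunk.Length, chars, 0);
            text.Append(chars, 0, count);
        }
        return text.ToString();
    }
}
```

For example, the two-byte UTF-8 sequence for "é" can be split across two chunks and still decode to a single character.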
You could write to a memory stream, then use a StreamReader or something like that to get the data. What are you doing with the data? I ask because it would be more efficient from a memory standpoint to write the incoming data to a file or database table as it is received, rather than storing the entire contents in memory.
