Read data in FileStream into a generic Stream - c#

What's the most efficient way to read a stream into another stream? In this case, I'm trying to read data in a Filestream into a generic stream. I know I could do the following:
1. read line by line and write the data to the stream
2. read chunks of bytes and write to the stream
3. etc
I'm just trying to find the most efficient way.
Thanks

Stephen Toub discusses a stream pipeline in his MSDN .NET matters column here. In the article he describes a CopyStream() method that copies from one input stream to another stream. This sounds quite similar to what you're trying to do.

I rolled together a quick extension method (so VS 2008 w/ 3.5 only):
public static class StreamCopier
{
private const long DefaultStreamChunkSize = 0x1000;
public static void CopyTo(this Stream from, Stream to)
{
if (!from.CanRead || !to.CanWrite)
{
return;
}
var buffer = from.CanSeek
? new byte[from.Length]
: new byte[DefaultStreamChunkSize];
int read;
while ((read = from.Read(buffer, 0, buffer.Length)) > 0)
{
to.Write(buffer, 0, read);
}
}
}
It can be used thus:
using (var input = File.OpenRead(#"C:\wrnpc12.txt"))
using (var output = File.OpenWrite(#"C:\wrnpc12.bak"))
{
input.CopyTo(output);
}
You can also swap the logic around slightly and write a CopyFrom() method as well.

Reading a buffer of bytes and then writing it is fastest. Methods like ReadLine() need to look for line delimiters, which takes more time than just filling a buffer.

I assume by generic stream, you mean any other kind of stream, like a Memory Stream, etc.
If so, the most efficient way is to read chunks of bytes and write them to the recipient stream. The chunk size can be something like 512 bytes.

Related

Stuck on second loop through at the reading part of the stream

public static void RemoteDesktopFunction()
{
Task.Run(async() =>
{
while (!ClientSession.noConnection && data != "§Close§")
{
byte[] frameBytes = ScreenShotToByteArray();
byte[] buffer = new byte[900];
using (MemoryStream byteStream = new MemoryStream())
{
await byteStream.WriteAsync(frameBytes, 0, frameBytes.Length);
byteStream.Seek(0, SeekOrigin.Begin);
for (int i = 0; i <= frameBytes.Length; i+= buffer.Length)
{
await byteStream.ReadAsync(buffer, i, buffer.Length);
await ClientSession.SendData(Encoding.UTF8.GetString(buffer).Trim('\0')+ "§RemoteDesktop§");
}
await ClientSession.SendData("§RemoteDesktopFrameDone§§RemoteDesktop§");
};
}
});
}
I'm trying to add a remoteDesktop function to my program by passing chunks of bytes that are read from the byte stream. frameBytes.length is about 20,000b in the debugger. And the chunk is 900b. I expected it to read through and send chunks of data from the frameBytes array to a network stream. But it got stuck on :
await byteStream.ReadAsync(buffer, i, buffer.Length);
On the second loopthrough...
What could cause the issue?
There is no obvious reason why this code should hand on ReadAsync. But an obvious problem is that you are not using the return value that tells you how many bytes are actually read. So the last 'chunk' will likely have a bunch of invalid data from the last chunk at the end.
Note that there is really no reason to use async variants to read/write 900 bytes fro/to a memory stream. Async is mostly meant to hide IO latency, and writing to memory is not an IO operation.
If the goal is to chunk a byte array you can just use the overload of GetString that takes a span.
var chunk = frameBytes.AsSpan().Slice(i, Math.Min(900, frameBytes.Length - i);
At least on any modern c# version, on older versions you can just use Buffer.BlockCopy, no need for a memory stream.
All this assumes your actual idea is sound. I know little about RDP, but it seems odd to convert a array of more or less random data to a string as if it was UTF8 encoded. Normally when sending binary data over a text protocol you would encode it as a base64 string, or possibly prefix it with a command that includes the length. I'm also not sure what the purpose of sending it in chunks is, what is the client supposed to do with 900bytes of screenshot? But again, I know little about RDP.

How to copy a large stream to another without OutOfMemoryException in C# [duplicate]

What is the best way to copy the contents of one stream to another? Is there a standard utility method for this?
From .NET 4.5 on, there is the Stream.CopyToAsync method
input.CopyToAsync(output);
This will return a Task that can be continued on when completed, like so:
await input.CopyToAsync(output)
// Code from here on will be run in a continuation.
Note that depending on where the call to CopyToAsync is made, the code that follows may or may not continue on the same thread that called it.
The SynchronizationContext that was captured when calling await will determine what thread the continuation will be executed on.
Additionally, this call (and this is an implementation detail subject to change) still sequences reads and writes (it just doesn't waste a threads blocking on I/O completion).
From .NET 4.0 on, there's is the Stream.CopyTo method
input.CopyTo(output);
For .NET 3.5 and before
There isn't anything baked into the framework to assist with this; you have to copy the content manually, like so:
public static void CopyStream(Stream input, Stream output)
{
byte[] buffer = new byte[32768];
int read;
while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
{
output.Write (buffer, 0, read);
}
}
Note 1: This method will allow you to report on progress (x bytes read so far ...)
Note 2: Why use a fixed buffer size and not input.Length? Because that Length may not be available! From the docs:
If a class derived from Stream does not support seeking, calls to Length, SetLength, Position, and Seek throw a NotSupportedException.
MemoryStream has .WriteTo(outstream);
and .NET 4.0 has .CopyTo on normal stream object.
.NET 4.0:
instream.CopyTo(outstream);
I use the following extension methods. They have optimized overloads for when one stream is a MemoryStream.
public static void CopyTo(this Stream src, Stream dest)
{
int size = (src.CanSeek) ? Math.Min((int)(src.Length - src.Position), 0x2000) : 0x2000;
byte[] buffer = new byte[size];
int n;
do
{
n = src.Read(buffer, 0, buffer.Length);
dest.Write(buffer, 0, n);
} while (n != 0);
}
public static void CopyTo(this MemoryStream src, Stream dest)
{
dest.Write(src.GetBuffer(), (int)src.Position, (int)(src.Length - src.Position));
}
public static void CopyTo(this Stream src, MemoryStream dest)
{
if (src.CanSeek)
{
int pos = (int)dest.Position;
int length = (int)(src.Length - src.Position) + pos;
dest.SetLength(length);
while(pos < length)
pos += src.Read(dest.GetBuffer(), pos, length - pos);
}
else
src.CopyTo((Stream)dest);
}
.NET Framework 4 introduce new "CopyTo" method of Stream Class of System.IO namespace. Using this method we can copy one stream to another stream of different stream class.
Here is example for this.
FileStream objFileStream = File.Open(Server.MapPath("TextFile.txt"), FileMode.Open);
Response.Write(string.Format("FileStream Content length: {0}", objFileStream.Length.ToString()));
MemoryStream objMemoryStream = new MemoryStream();
// Copy File Stream to Memory Stream using CopyTo method
objFileStream.CopyTo(objMemoryStream);
Response.Write("<br/><br/>");
Response.Write(string.Format("MemoryStream Content length: {0}", objMemoryStream.Length.ToString()));
Response.Write("<br/><br/>");
There is actually, a less heavy-handed way of doing a stream copy. Take note however, that this implies that you can store the entire file in memory. Don't try and use this if you are working with files that go into the hundreds of megabytes or more, without caution.
public static void CopySmallTextStream(Stream input, Stream output)
{
using (StreamReader reader = new StreamReader(input))
using (StreamWriter writer = new StreamWriter(output))
{
writer.Write(reader.ReadToEnd());
}
}
NOTE: There may also be some issues concerning binary data and character encodings.
The basic questions that differentiate implementations of "CopyStream" are:
size of the reading buffer
size of the writes
Can we use more than one thread (writing while we are reading).
The answers to these questions result in vastly different implementations of CopyStream and are dependent on what kind of streams you have and what you are trying to optimize. The "best" implementation would even need to know what specific hardware the streams were reading and writing to.
Unfortunately, there is no really simple solution. You can try something like that:
Stream s1, s2;
byte[] buffer = new byte[4096];
int bytesRead = 0;
while (bytesRead = s1.Read(buffer, 0, buffer.Length) > 0) s2.Write(buffer, 0, bytesRead);
s1.Close(); s2.Close();
But the problem with that that different implementation of the Stream class might behave differently if there is nothing to read. A stream reading a file from a local harddrive will probably block until the read operaition has read enough data from the disk to fill the buffer and only return less data if it reaches the end of file. On the other hand, a stream reading from the network might return less data even though there are more data left to be received.
Always check the documentation of the specific stream class you are using before using a generic solution.
There may be a way to do this more efficiently, depending on what kind of stream you're working with. If you can convert one or both of your streams to a MemoryStream, you can use the GetBuffer method to work directly with a byte array representing your data. This lets you use methods like Array.CopyTo, which abstract away all the issues raised by fryguybob. You can just trust .NET to know the optimal way to copy the data.
if you want a procdure to copy a stream to other the one that nick posted is fine but it is missing the position reset, it should be
public static void CopyStream(Stream input, Stream output)
{
byte[] buffer = new byte[32768];
long TempPos = input.Position;
while (true)
{
int read = input.Read (buffer, 0, buffer.Length);
if (read <= 0)
return;
output.Write (buffer, 0, read);
}
input.Position = TempPos;// or you make Position = 0 to set it at the start
}
but if it is in runtime not using a procedure you shpuld use memory stream
Stream output = new MemoryStream();
byte[] buffer = new byte[32768]; // or you specify the size you want of your buffer
long TempPos = input.Position;
while (true)
{
int read = input.Read (buffer, 0, buffer.Length);
if (read <= 0)
return;
output.Write (buffer, 0, read);
}
input.Position = TempPos;// or you make Position = 0 to set it at the start
Since none of the answers have covered an asynchronous way of copying from one stream to another, here is a pattern that I've successfully used in a port forwarding application to copy data from one network stream to another. It lacks exception handling to emphasize the pattern.
const int BUFFER_SIZE = 4096;
static byte[] bufferForRead = new byte[BUFFER_SIZE];
static byte[] bufferForWrite = new byte[BUFFER_SIZE];
static Stream sourceStream = new MemoryStream();
static Stream destinationStream = new MemoryStream();
static void Main(string[] args)
{
// Initial read from source stream
sourceStream.BeginRead(bufferForRead, 0, BUFFER_SIZE, BeginReadCallback, null);
}
private static void BeginReadCallback(IAsyncResult asyncRes)
{
// Finish reading from source stream
int bytesRead = sourceStream.EndRead(asyncRes);
// Make a copy of the buffer as we'll start another read immediately
Array.Copy(bufferForRead, 0, bufferForWrite, 0, bytesRead);
// Write copied buffer to destination stream
destinationStream.BeginWrite(bufferForWrite, 0, bytesRead, BeginWriteCallback, null);
// Start the next read (looks like async recursion I guess)
sourceStream.BeginRead(bufferForRead, 0, BUFFER_SIZE, BeginReadCallback, null);
}
private static void BeginWriteCallback(IAsyncResult asyncRes)
{
// Finish writing to destination stream
destinationStream.EndWrite(asyncRes);
}
For .NET 3.5 and before try :
MemoryStream1.WriteTo(MemoryStream2);
Easy and safe - make new stream from original source:
MemoryStream source = new MemoryStream(byteArray);
MemoryStream copy = new MemoryStream(byteArray);
The following code to solve the issue copy the Stream to MemoryStream using CopyTo
Stream stream = new MemoryStream();
//any function require input the stream. In mycase to save the PDF file as stream
document.Save(stream);
MemoryStream newMs = (MemoryStream)stream;
byte[] getByte = newMs.ToArray();
//Note - please dispose the stream in the finally block instead of inside using block as it will throw an error 'Access denied as the stream is closed'

DeflateStream.Flush() has no functionality?

I was trying to compress a byte array using DeflateStream. After wring the data, I was looking for a way to close the compression (mark as done). At first, I tried Dispose() then Close(), but those made the result MemoryStream unreadable. Then I thought I may need to Flush(), but the description said "The current implementation of this method has no functionality"
But it seems that without Flush(), the result seems empty. What does it mean that "The current implementation of this method has no functionality"?
static void Main(string[] args)
{
byte[] result = Encoding.UTF8.GetBytes("わたしのこいはみなみのかぜにのってはしるわ");
var output = new MemoryStream();
var dstream = new DeflateStream(output, CompressionLevel.Optimal);
dstream.Write(result, 0, result.Length);
var compressedSize1 = output.Position;
dstream.Flush();
var compressedSize2 = output.Position;
What does it mean that "The current implementation of this method has no functionality" ?
it means: it doesn't do anything, so calling Flush() by itself isn't going to help. You need to close the deflate stream for it to finish writing what it needs, but without closing the underlying stream (blocks will be written as you write to the stream, as the internal buffer fills - but the final partial block cannot be written until the data is ready to be terminated; not all compression sequences support being able to flush at an arbitrary point and then resume compressed data in additional blocks)
What you probably want is:
using (var dstream = new DeflateStream(output, CompressionLevel.Optimal, true))
{
// Your deflate code
}
// Now check the memory stream
Note the extra bool leaveOpen parameter usage.

Convert a Stream to a FileStream in C#

What is the best method to convert a Stream to a FileStream using C#.
The function I am working on has a Stream passed to it containing uploaded data, and I need to be able to perform stream.Read(), stream.Seek() methods which are methods of the FileStream type.
A simple cast does not work, so I'm asking here for help.
Read and Seek are methods on the Stream type, not just FileStream. It's just that not every stream supports them. (Personally I prefer using the Position property over calling Seek, but they boil down to the same thing.)
If you would prefer having the data in memory over dumping it to a file, why not just read it all into a MemoryStream? That supports seeking. For example:
public static MemoryStream CopyToMemory(Stream input)
{
// It won't matter if we throw an exception during this method;
// we don't *really* need to dispose of the MemoryStream, and the
// caller should dispose of the input stream
MemoryStream ret = new MemoryStream();
byte[] buffer = new byte[8192];
int bytesRead;
while ((bytesRead = input.Read(buffer, 0, buffer.Length)) > 0)
{
ret.Write(buffer, 0, bytesRead);
}
// Rewind ready for reading (typical scenario)
ret.Position = 0;
return ret;
}
Use:
using (Stream input = ...)
{
using (Stream memory = CopyToMemory(input))
{
// Seek around in memory to your heart's content
}
}
This is similar to using the Stream.CopyTo method introduced in .NET 4.
If you actually want to write to the file system, you could do something similar that first writes to the file then rewinds the stream... but then you'll need to take care of deleting it afterwards, to avoid littering your disk with files.

How to write super-fast file-streaming code in C#?

I have to split a huge file into many smaller files. Each of the destination files is defined by an offset and length as the number of bytes. I'm using the following code:
private void copy(string srcFile, string dstFile, int offset, int length)
{
BinaryReader reader = new BinaryReader(File.OpenRead(srcFile));
reader.BaseStream.Seek(offset, SeekOrigin.Begin);
byte[] buffer = reader.ReadBytes(length);
BinaryWriter writer = new BinaryWriter(File.OpenWrite(dstFile));
writer.Write(buffer);
}
Considering that I have to call this function about 100,000 times, it is remarkably slow.
Is there a way to make the Writer connected directly to the Reader? (That is, without actually loading the contents into the Buffer in memory.)
I don't believe there's anything within .NET to allow copying a section of a file without buffering it in memory. However, it strikes me that this is inefficient anyway, as it needs to open the input file and seek many times. If you're just splitting up the file, why not open the input file once, and then just write something like:
public static void CopySection(Stream input, string targetFile, int length)
{
byte[] buffer = new byte[8192];
using (Stream output = File.OpenWrite(targetFile))
{
int bytesRead = 1;
// This will finish silently if we couldn't read "length" bytes.
// An alternative would be to throw an exception
while (length > 0 && bytesRead > 0)
{
bytesRead = input.Read(buffer, 0, Math.Min(length, buffer.Length));
output.Write(buffer, 0, bytesRead);
length -= bytesRead;
}
}
}
This has a minor inefficiency in creating a buffer on each invocation - you might want to create the buffer once and pass that into the method as well:
public static void CopySection(Stream input, string targetFile,
int length, byte[] buffer)
{
using (Stream output = File.OpenWrite(targetFile))
{
int bytesRead = 1;
// This will finish silently if we couldn't read "length" bytes.
// An alternative would be to throw an exception
while (length > 0 && bytesRead > 0)
{
bytesRead = input.Read(buffer, 0, Math.Min(length, buffer.Length));
output.Write(buffer, 0, bytesRead);
length -= bytesRead;
}
}
}
Note that this also closes the output stream (due to the using statement) which your original code didn't.
The important point is that this will use the operating system file buffering more efficiently, because you reuse the same input stream, instead of reopening the file at the beginning and then seeking.
I think it'll be significantly faster, but obviously you'll need to try it to see...
This assumes contiguous chunks, of course. If you need to skip bits of the file, you can do that from outside the method. Also, if you're writing very small files, you may want to optimise for that situation too - the easiest way to do that would probably be to introduce a BufferedStream wrapping the input stream.
The fastest way to do file I/O from C# is to use the Windows ReadFile and WriteFile functions. I have written a C# class that encapsulates this capability as well as a benchmarking program that looks at differnet I/O methods, including BinaryReader and BinaryWriter. See my blog post at:
http://designingefficientsoftware.wordpress.com/2011/03/03/efficient-file-io-from-csharp/
How large is length? You may do better to re-use a fixed sized (moderately large, but not obscene) buffer, and forget BinaryReader... just use Stream.Read and Stream.Write.
(edit) something like:
private static void copy(string srcFile, string dstFile, int offset,
int length, byte[] buffer)
{
using(Stream inStream = File.OpenRead(srcFile))
using (Stream outStream = File.OpenWrite(dstFile))
{
inStream.Seek(offset, SeekOrigin.Begin);
int bufferLength = buffer.Length, bytesRead;
while (length > bufferLength &&
(bytesRead = inStream.Read(buffer, 0, bufferLength)) > 0)
{
outStream.Write(buffer, 0, bytesRead);
length -= bytesRead;
}
while (length > 0 &&
(bytesRead = inStream.Read(buffer, 0, length)) > 0)
{
outStream.Write(buffer, 0, bytesRead);
length -= bytesRead;
}
}
}
You shouldn't re-open the source file each time you do a copy, better open it once and pass the resulting BinaryReader to the copy function. Also, it might help if you order your seeks, so you don't make big jumps inside the file.
If the lengths aren't too big, you can also try to group several copy calls by grouping offsets that are near to each other and reading the whole block you need for them, for example:
offset = 1234, length = 34
offset = 1300, length = 40
offset = 1350, length = 1000
can be grouped to one read:
offset = 1234, length = 1074
Then you only have to "seek" in your buffer and can write the three new files from there without having to read again.
Have you considered using the CCR since you are writing to separate files you can do everything in parallel (read and write) and the CCR makes it very easy to do this.
static void Main(string[] args)
{
Dispatcher dp = new Dispatcher();
DispatcherQueue dq = new DispatcherQueue("DQ", dp);
Port<long> offsetPort = new Port<long>();
Arbiter.Activate(dq, Arbiter.Receive<long>(true, offsetPort,
new Handler<long>(Split)));
FileStream fs = File.Open(file_path, FileMode.Open);
long size = fs.Length;
fs.Dispose();
for (long i = 0; i < size; i += split_size)
{
offsetPort.Post(i);
}
}
private static void Split(long offset)
{
FileStream reader = new FileStream(file_path, FileMode.Open,
FileAccess.Read);
reader.Seek(offset, SeekOrigin.Begin);
long toRead = 0;
if (offset + split_size <= reader.Length)
toRead = split_size;
else
toRead = reader.Length - offset;
byte[] buff = new byte[toRead];
reader.Read(buff, 0, (int)toRead);
reader.Dispose();
File.WriteAllBytes("c:\\out" + offset + ".txt", buff);
}
This code posts offsets to a CCR port which causes a Thread to be created to execute the code in the Split method. This causes you to open the file multiple times but gets rid of the need for synchronization. You can make it more memory efficient but you'll have to sacrifice speed.
The first thing I would recommend is to take measurements. Where are you losing your time? Is it in the read, or the write?
Over 100,000 accesses (sum the times):
How much time is spent allocating the buffer array?
How much time is spent opening the file for read (is it the same file every time?)
How much time is spent in read and write operations?
If you aren't doing any type of transformation on the file, do you need a BinaryWriter, or can you use a filestream for writes? (try it, do you get identical output? does it save time?)
Using FileStream + StreamWriter I know it's possible to create massive files in little time (less than 1 min 30 seconds). I generate three files totaling 700+ megabytes from one file using that technique.
Your primary problem with the code you're using is that you are opening a file every time. That is creating file I/O overhead.
If you knew the names of the files you would be generating ahead of time, you could extract the File.OpenWrite into a separate method; it will increase the speed. Without seeing the code that determines how you are splitting the files, I don't think you can get much faster.
No one suggests threading? Writing the smaller files looks like text book example of where threads are useful. Set up a bunch of threads to create the smaller files. this way, you can create them all in parallel and you don't need to wait for each one to finish. My assumption is that creating the files(disk operation) will take WAY longer than splitting up the data. and of course you should verify first that a sequential approach is not adequate.
(For future reference.)
Quite possibly the fastest way to do this would be to use memory mapped files (so primarily copying memory, and the OS handling the file reads/writes via its paging/memory management).
Memory Mapped files are supported in managed code in .NET 4.0.
But as noted, you need to profile, and expect to switch to native code for maximum performance.

Categories

Resources