I've been playing around with what I thought was a simple idea. I want to be able to read in a file from somewhere (website, filesystem, ftp), perform some operations on it (compress, encrypt, etc.) and then save it somewhere (somewhere may be a filesystem, ftp, or whatever). It's a basic pipeline design. What I would like to do is to read in the file and put it onto a MemoryStream, then perform the operations on the data in the MemoryStream, and then save that data in the MemoryStream somewhere. I was thinking I could use the same Stream to do this but ran into a couple of problems:
Every time I use a StreamWriter or StreamReader I need to close it, and that closes the stream so that I cannot use it anymore. It seems like there must be some way to get around that.
Some of these files may be big and so I may run out of memory if I try to read the whole thing in at once.
I was hoping to be able to spin up each of the steps as separate threads and have the compression step begin as soon as there is data on the stream, and then as soon as the compression has some compressed data available on the stream I could start saving it (for example). Is anything like this easily possible with the C# Streams? Does anyone have thoughts as to how best to accomplish this?
Thanks,
Mike
Using a helper method to drive the streaming:
public static void StreamCopy(Stream source, Stream target)
{
    byte[] buffer = new byte[8 * 1024];
    int size;
    // Read returns 0 once the end of the source stream is reached.
    while ((size = source.Read(buffer, 0, buffer.Length)) > 0)
    {
        target.Write(buffer, 0, size);
    }
}
You can easily combine whatever you need:
using (FileStream iFile = new FileStream(...))
using (FileStream oFile = new FileStream(...))
using (DeflateStream oZip = new DeflateStream(oFile, CompressionMode.Compress))
    StreamCopy(iFile, oZip);
Depending on what you are actually trying to do, you'd chain the streams differently. This also uses relatively little memory, because only the data being operated upon is in memory.
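As a rough illustration of chaining the streams in both directions, here is a minimal sketch. The class and method names (PipelineSketch, CompressFile, DecompressFile) are made up for this example, and it uses the built-in Stream.CopyTo (available from .NET 4) in place of the StreamCopy helper above.

```csharp
using System.IO;
using System.IO.Compression;

// Hypothetical helper names; only the chaining pattern matters here.
public static class PipelineSketch
{
    // file -> DeflateStream (compress) -> file
    public static void CompressFile(string inputPath, string outputPath)
    {
        using (FileStream input = File.OpenRead(inputPath))
        using (FileStream output = File.Create(outputPath))
        using (var zip = new DeflateStream(output, CompressionMode.Compress))
        {
            // Only one copy buffer's worth of data is in memory at a time.
            input.CopyTo(zip);
        }
    }

    // file -> DeflateStream (decompress) -> file
    public static void DecompressFile(string inputPath, string outputPath)
    {
        using (FileStream input = File.OpenRead(inputPath))
        using (var unzip = new DeflateStream(input, CompressionMode.Decompress))
        using (FileStream output = File.Create(outputPath))
        {
            unzip.CopyTo(output);
        }
    }
}
```

When compressing, the DeflateStream wraps the destination; when decompressing, it wraps the source. Other wrapping streams (CryptoStream, for example) should slot into the chain the same way.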
StreamReader/StreamWriter shouldn't have been designed to close their underlying stream -- that's a horrible misfeature in the BCL. But they do, they won't be changed (because of backward compatibility), so we're stuck with this disaster of an API.
But there are some well-established workarounds, if you want to use StreamReader/Writer but keep the Stream open afterward.
For a StreamReader: don't Dispose the StreamReader. It's that simple. It's harmless to just let a StreamReader go without ever calling Dispose. The only effect is that your Stream won't get prematurely closed, which is actually a plus.
For a StreamWriter: there may be buffered data, so you can't get away with just letting it go. You have to call Flush, to make sure that buffered data gets written out to the Stream. Then you can just let the StreamWriter go. (Basically, you put a Flush where you normally would have put a Dispose.)
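For completeness: from .NET 4.5 onward, StreamReader and StreamWriter also have constructor overloads with a leaveOpen parameter, so you can Dispose them normally and still keep the Stream. A minimal sketch, assuming a seekable stream; the class and method names are invented for the example:

```csharp
using System.IO;
using System.Text;

public static class LeaveOpenSketch
{
    // Writes text to the stream, rewinds it, and reads the text back,
    // without either the writer or the reader closing the stream.
    public static string WriteThenRead(Stream stream, string text)
    {
        // UTF8Encoding(false) avoids emitting a byte order mark.
        using (var writer = new StreamWriter(stream, new UTF8Encoding(false), 1024, leaveOpen: true))
        {
            writer.Write(text);
        } // Dispose flushes the writer; the underlying stream stays open.

        stream.Position = 0; // requires a seekable stream

        using (var reader = new StreamReader(stream, Encoding.UTF8, true, 1024, leaveOpen: true))
        {
            return reader.ReadToEnd();
        }
    }
}
```

The default behaviour is unchanged: without leaveOpen: true, disposing the reader or writer still closes the stream.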
Unless you're reading in streams bigger than your hard drive, I don't think you'll run out of memory:
http://blogs.msdn.com/ericlippert/archive/2009/06/08/out-of-memory-does-not-refer-to-physical-memory.aspx
If I did this in my MAUI project:
var uploadImage = new UploadImage();
var img = await uploadImage.OpenMediaPickerAsync();
var imagefile = await uploadImage.Upload(img);
var imageBytes = uploadImage.StringToByteBase64(imagefile.byteBase64);
var stream = uploadImage.ByteArrayToStream(imageBytes);
img_profilePic.Source = ImageSource.FromStream(() => stream); //working
I am displaying an image from my iOS simulator. At one point in time I have the stream at my disposal.
If I now add
MemoryStream newStream = new MemoryStream();
stream.CopyTo(newStream);
these two lines of code to my code above, the image I was displaying at that point disappears.
So somehow, when I copy my stream, I for some reason also delete the already existing stream...?
What is going on here...
The point of streams is to be able to process very large amounts of data sequentially without having to load all of that data into memory at once, which is what happens if you work with collections instead. To do that, a stream tracks its current position so it can keep loading more data into memory as the position moves forward and discard data that has already been processed. So using the same stream for two things is not going to work, because the second usage won't get the data that the first usage has already consumed.
That said, it's possible to reset the position of a stream with the seek operation (as long as the type of stream you are using allows it).
Check out this answer https://stackoverflow.com/a/1746108/352826
So somehow, when I copy my stream, I for some reason also delete the already existing stream...?
It's far more likely that you're trying to use the same stream as multiple sources without rewinding it. Streams have a position that increases as you use them.
In short, use this when you want to reuse your stream:
stream.Position = 0;
Assuming, of course, that whatever stream you're using is rewindable in the first place. If not, then copy it to a memory stream first and reuse that as many times as you wish, rewinding between uses.
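To make the position behaviour concrete, here is a small sketch. RewindSketch and CopyAndRewind are names invented for the example; the point is that CopyTo leaves the source positioned at its end, so the source has to be rewound before anything else reads it.

```csharp
using System.IO;

public static class RewindSketch
{
    // Copies the remaining contents of 'stream' and then rewinds it,
    // so a later consumer starts reading from the beginning again.
    public static byte[] CopyAndRewind(Stream stream)
    {
        var copy = new MemoryStream();
        stream.CopyTo(copy);      // advances stream.Position to the end

        if (stream.CanSeek)
        {
            stream.Position = 0;  // without this, the next reader sees no data
        }

        return copy.ToArray();
    }
}
```

If CanSeek is false the stream cannot be rewound; in that case keep the copied bytes and build a fresh MemoryStream from them for each consumer.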
I receive a fair amount of binary data from an external device, about 30-40 MB/s. I need to save it to a file. On the external source device side, I have a very small buffer that I can't enlarge and as soon as the transmission stutters on the C# application side, it quickly gets clogged and I lose data.
In the application, I tried writing using FileStream, but unfortunately it is not fast enough.
_FileStream = new FileStream(FileName, FileMode.Create, FileAccess.Write);
...
void Handler_OnFtdiBytesReceived(object sender, FtdiBytesReceivedEventArgs e)
{
    ...
    Array.Copy(e.Bytes, 0, _ReceivedDataBuffer, _ReceivedDataBufferPosition, e.NumBytesAvailable);
    _ReceivedDataBufferPosition += (int)e.NumBytesAvailable;
    if (_ReceivedDataBufferPosition > 0)
    {
        _FileStream.Write(_ReceivedDataBuffer, 0, (int)e.NumBytesAvailable);
        _ReceivedDataBufferPosition = 0;
    }
    if (_IsOperationFinished == true)
    {
        _FileStream.Flush();
        _FileStream.Close();
    }
    ...
}
I also tried adding a BinaryWriter:
_FileStream = new FileStream(FileName, FileMode.Create, FileAccess.Write);
_BinaryWriter = new BinaryWriter(_FileStream);
and then:
_BinaryWriter.Write(_ReceivedDataBuffer, 0, (int)e.NumBytesAvailable);
instead of the previous _FileStream.Write(...), but the buffer on the transmit side still gets clogged.
Is there any way to deal with this?
I wonder if it might help to somehow buffer the data in the computer's RAM when receiving it and, for example, when it reaches some sizable amount (say, 512 MB), start writing to the file in a separate Task, so that in the meantime, new data can be collected into the buffer continuously. Perhaps I would need to use two buffers and use them alternately, one to receive data continuously and the other from which to write to the file, and swap??
This seems like quite complex code for someone with little experience, and I don't know whether it will help, so I'd like to ask you for a hint first.
I'll just add at the end that I have the ability to watch the buffer fill up in this external device and by commenting out this one line regarding writing:
_FileStream.Write(_ReceivedDataBuffer, 0, (int)e.NumBytesAvailable);
I see that the problem with its clogging disappears. Earlier I also analyzed whether other code fragments might be inefficient, such as Array.Copy(...) or passing parameters via Event, but it had no effect.
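The "collect in RAM, write in a separate Task" idea from the question can be sketched with a bounded queue rather than two hand-swapped buffers. This is only a sketch under assumptions: BackgroundFileWriter, Enqueue and the queue and buffer sizes are made up for illustration, and nothing here has been measured against a 30-40 MB/s source.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading.Tasks;

// Hypothetical helper: the receive handler only copies bytes into a queue,
// and a dedicated task drains the queue to disk.
public sealed class BackgroundFileWriter : IDisposable
{
    private readonly BlockingCollection<byte[]> _queue;
    private readonly Task _writerTask;

    public BackgroundFileWriter(string path, int maxQueuedChunks = 4096)
    {
        // The bound caps memory use; Add blocks only when the queue is full.
        _queue = new BlockingCollection<byte[]>(maxQueuedChunks);
        _writerTask = Task.Factory.StartNew(() => WriteLoop(path), TaskCreationOptions.LongRunning);
    }

    // Call this from the receive handler instead of FileStream.Write.
    public void Enqueue(byte[] data, int count)
    {
        var chunk = new byte[count];
        Buffer.BlockCopy(data, 0, chunk, 0, count); // the caller may reuse 'data'
        _queue.Add(chunk);
    }

    private void WriteLoop(string path)
    {
        // 1 MB FileStream buffer: an illustrative value, not a tuned one.
        using (var file = new FileStream(path, FileMode.Create, FileAccess.Write,
                                         FileShare.None, 1 << 20, FileOptions.SequentialScan))
        {
            // Ends once CompleteAdding has been called and the queue is empty.
            foreach (byte[] chunk in _queue.GetConsumingEnumerable())
            {
                file.Write(chunk, 0, chunk.Length);
            }
        }
    }

    // Signals the end of the data and waits for the remaining chunks to be written.
    public void Dispose()
    {
        _queue.CompleteAdding();
        _writerTask.Wait();
        _queue.Dispose();
    }
}
```

In the handler, the _FileStream.Write(...) line would become a call to Enqueue(e.Bytes, (int)e.NumBytesAvailable), and Dispose would replace the Flush/Close pair. Allocating one array per chunk adds garbage-collector pressure; reusing buffers from a pool is a possible refinement.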
Occasionally we need to copy huge files from one bucket to another in AWS S3. Whenever possible we use the CopyRequest to handle this operation all on AWS (since no round trip required back to the client). But sometimes we do not have the option to do this because we need to copy between 2 completely separate accounts which requires a GET and then a PUT.
Problems:
The response stream returned from the GET is not seekable so it cannot be passed to the PUT request and have it stream seamlessly from one to the other
Copying the response stream to an intermediary stream (MemoryStream) using CopyTo() and then passing that to the PUT operation works well but doesn't scale (large files will throw OutOfMemory exceptions)
So basically I need an intermediary stream that I can read/write to at the same time, basically I would read a chunk from the response stream and write it to my intermediary stream, meanwhile the PUT request is reading out the content and its just a seamless pass-thru sort of scenario.
I found this post on Stack Overflow, ".NET Asynchronous stream read/write", and it seemed promising at first, but it still throws an OutOfMemory exception with large files.
Has anyone ever had to do something similar to this? How would you tackle it? Thanks in advance.
It's not clear why you would want to use MemoryStream. The Stream.CopyTo method in .NET 4 doesn't need to use an intermediate stream - it will just read into a local buffer of a fixed size, then write that buffer to the output stream, then read more data (overwriting the buffer) etc.
If you're not using .NET 4, it's easy to implement something similar, e.g.
public static void CopyTo(this Stream input, Stream output)
{
    byte[] buffer = new byte[64 * 1024]; // 64K buffer
    int bytesRead;
    while ((bytesRead = input.Read(buffer, 0, buffer.Length)) > 0)
    {
        output.Write(buffer, 0, bytesRead);
    }
}
I found this, but it uses a Queue internally, which the author notes is an order of magnitude slower than a MemoryStream.
http://www.codeproject.com/Articles/16011/PipeStream-a-Memory-Efficient-and-Thread-Safe-Stre
I keep hoping I'll find an official MS library solution, but it seems that this wheel hasn't been properly invented yet.
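On newer runtimes, System.IO.Pipelines (in-box from .NET Core 3.0, otherwise a NuGet package) offers something close to the read-while-write intermediary described in the question. A minimal sketch, with PassThroughSketch and its method names invented for the example; whether a particular S3 client will accept the non-seekable reader stream for a PUT is outside what this sketch shows.

```csharp
using System;
using System.IO;
using System.IO.Pipelines;
using System.Threading.Tasks;

public static class PassThroughSketch
{
    // Pumps 'source' into a Pipe while 'destination' is fed from the other end.
    // The pipe applies back-pressure, so only a bounded amount is buffered.
    public static async Task CopyThroughPipeAsync(Stream source, Stream destination)
    {
        var pipe = new Pipe();
        Task producer = ProduceAsync(source, pipe.Writer);
        Task consumer = ConsumeAsync(pipe.Reader, destination);
        await Task.WhenAll(producer, consumer);
    }

    private static async Task ProduceAsync(Stream source, PipeWriter writer)
    {
        try
        {
            await source.CopyToAsync(writer.AsStream());
            await writer.CompleteAsync();   // tells the reader no more data is coming
        }
        catch (Exception ex)
        {
            await writer.CompleteAsync(ex); // unblock the reader with the failure
            throw;
        }
    }

    private static async Task ConsumeAsync(PipeReader reader, Stream destination)
    {
        try
        {
            // In the S3 scenario, reader.AsStream() is what the upload would read from.
            await reader.AsStream().CopyToAsync(destination);
            await reader.CompleteAsync();
        }
        catch (Exception ex)
        {
            await reader.CompleteAsync(ex); // let the writer stop waiting
            throw;
        }
    }
}
```

For a plain source-to-destination copy the pipe adds nothing over Stream.CopyTo; it only earns its place when the consumer insists on pulling from a Stream while the producer pushes.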
I have a binary log file with streaming data from a sensor (Int16).
Every 6 seconds, 6000 samples of type Int16 are added, until the sensor is disconnected.
I need to poll this file at regular intervals, continuing from the last position read.
Is it better to
a) keep a filestream and binary reader open and instantiated between readings
b) instantiate filestream and binary reader each time I need to read (and keep an external variable to track the last position read)
c) something better?
EDIT: Some great suggestions so far, need to add that the "server" app is supplied by an outside source vendor and cannot be modified.
If it's always adding the same amount of data, it may make sense to reopen it. You might want to find out the length before you open it, and then round down to the whole number of "sample sets" available, just in case you catch it while it's still writing the data. That may mean you read less than you could read (if the write finishes between you checking the length and starting the read) but you'll catch up next time.
You'll need to make sure you use appropriate sharing options so that the writer can still write while you're reading though. (The writer will probably have to have been written with this in mind too.)
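The reopen-and-round-down approach might look like the sketch below. LogPoller and ReadNewSamples are hypothetical names, the 6000-sample set size comes from the question, and the byte order is assumed to match the reading machine (little-endian on typical Windows hardware).

```csharp
using System;
using System.IO;

public static class LogPoller
{
    public const int SamplesPerSet = 6000;                        // from the question
    private const int BytesPerSet = SamplesPerSet * sizeof(short);

    // Reads only complete sample sets starting at 'offset', then advances 'offset'.
    public static short[] ReadNewSamples(string path, ref long offset)
    {
        // FileShare.ReadWrite lets the writer keep the file open for writing.
        using (var stream = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
        {
            // Round down so a set that is still being written is left for next time.
            long wholeSets = (stream.Length - offset) / BytesPerSet;
            if (wholeSets <= 0)
            {
                return new short[0];
            }

            int byteCount = checked((int)(wholeSets * BytesPerSet));
            var bytes = new byte[byteCount];
            stream.Seek(offset, SeekOrigin.Begin);

            int read = 0;
            while (read < byteCount)
            {
                int n = stream.Read(bytes, read, byteCount - read);
                if (n == 0) throw new EndOfStreamException();
                read += n;
            }

            // Reinterpret the raw bytes as Int16 samples (native byte order assumed).
            var samples = new short[byteCount / sizeof(short)];
            Buffer.BlockCopy(bytes, 0, samples, 0, byteCount);
            offset += byteCount;
            return samples;
        }
    }
}
```

The caller keeps the offset between polls, which is the "external variable" from option (b) in the question.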
Can you use MemoryMappedFiles?
If you can, map the file into memory and share it between the processes; you will then be able to read the data by simply incrementing the offset of your pointer each time.
If you combine it with an event, you can signal your reader when it can go in and read the information. There will be no need to block anything, as the reader will always read "old" data which has already been written.
I would recommend using pipes. They act just like files, except that they stream data directly between applications, even if the apps run on different PCs (though this is really only an option if you are able to change both applications). Check it out under the "System.IO.Pipes" namespace.
P.S. You would use a "named" pipe for this (pipes are supported in C as well, so basically any half-decent programming language should be able to implement them).
I think that (a) is the best because:
The current Position will be incremented as you read, so you don't need to worry about storing it somewhere;
You don't need to reopen the file and seek to the required position each time you poll it (reopening shouldn't be much slower, but I believe keeping it open gives the OS some hints for optimization);
The other solutions I can think of require P/Invoke calls to system interprocess synchronisation primitives, and they won't be faster than the file operations already in the framework.
You just need to set proper FileShare flags:
Just for example:
Server:
using (var writer = new BinaryWriter(new FileStream(@"D:\testlog.log", FileMode.Append, FileAccess.Write, FileShare.Read)))
{
    int n;
    while (Int32.TryParse(Console.ReadLine(), out n))
    {
        writer.Write(n);
        writer.Flush(); // write cached bytes to file
    }
}
Client:
using (var reader = new BinaryReader(new FileStream(@"D:\testlog.log", FileMode.Open, FileAccess.Read, FileShare.ReadWrite)))
{
    while (Console.ReadLine() != "exit")
    {
        // allocate buffer for new ints
        Int32[] buffer = new Int32[(reader.BaseStream.Length - reader.BaseStream.Position) / sizeof(Int32)];
        Console.WriteLine("Stream length: {0}", reader.BaseStream.Length);
        Console.Write("Ints read: ");
        for (int i = 0; i < buffer.Length; i++)
        {
            buffer[i] = reader.ReadInt32();
            Console.Write((i == 0 ? "" : ", ") + buffer[i].ToString());
        }
        Console.WriteLine();
    }
}
As another alternative, you could stream the data into a database rather than a file; then you wouldn't have to worry about file locking.
But if you're stuck with the file method, you may want to close the file each time you read data from it. It depends a lot on how complicated the process writing to the file is going to be, and whether it can detect a file locking operation and respond appropriately without crashing horribly.
My problem is with file copying performance. We have a media management system that requires a lot of moving files around on the file system to different locations including Windows shares on the same network, FTP sites, Amazon S3, etc. When we were all on one Windows network we could get away with using System.IO.File.Copy(source, destination) to copy a file. Since many times all we have is an input Stream (like a MemoryStream), we tried abstracting the Copy operation to take an input Stream and an output Stream, but we are seeing a massive performance decrease. Below is some code for copying a file to use as a discussion point.
public void Copy(System.IO.Stream inStream, string outputFilePath)
{
    int bufferSize = 1024 * 64;
    using (FileStream fileStream = new FileStream(outputFilePath, FileMode.OpenOrCreate, FileAccess.Write))
    {
        int bytesRead = -1;
        byte[] bytes = new byte[bufferSize];
        while ((bytesRead = inStream.Read(bytes, 0, bufferSize)) > 0)
        {
            fileStream.Write(bytes, 0, bytesRead);
            fileStream.Flush();
        }
    }
}
Does anyone know why this performs so much slower than File.Copy? Is there anything I can do to improve performance? Am I just going to have to put special logic in to see if I'm copying from one windows location to another--in which case I would just use File.Copy and in the other cases I'll use the streams?
Please let me know what you think and whether you need additional information. I have tried different buffer sizes and it seems like a 64k buffer size is optimal for our "small" files and 256k+ is a better buffer size for our "large" files--but in either case it performs much worse than File.Copy(). Thanks in advance!
File.Copy was built around the CopyFile Win32 function, and this function gets a lot of attention from the MS crew (remember the Vista-related threads about slow copy performance).
Several clues to improve the performance of your method:
As many said earlier, remove the Flush call from your loop. You do not need it at all.
Increasing the buffer may help, but only for file-to-file operations; for network shares or FTP servers it will slow things down instead. 60 * 1024 is ideal for network shares, at least before Vista. For FTP, 32k will be enough in most cases.
Help the OS by stating your caching strategy (in your case sequential reading and writing): use the FileStream constructor overload with a FileOptions parameter (FileOptions.SequentialScan).
You can speed up copying by using an asynchronous pattern (especially useful for network-to-file cases), but do not use threads for this. Instead use overlapped I/O (BeginRead, EndRead, BeginWrite, EndWrite in .NET), and do not forget to set the Asynchronous option in the FileStream constructor (see FileOptions).
Example of asynchronous copy pattern:
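// Two equally sized buffers; the size used here is only illustrative.
byte[] ActiveBuffer = new byte[60 * 1024];
byte[] BackBuffer = new byte[60 * 1024];
int Readed = 0;
IAsyncResult ReadResult;
IAsyncResult WriteResult;
// Start the first read.
ReadResult = sourceStream.BeginRead(ActiveBuffer, 0, ActiveBuffer.Length, null, null);
do
{
    // Wait for the pending read, then start writing what it produced.
    Readed = sourceStream.EndRead(ReadResult);
    WriteResult = destStream.BeginWrite(ActiveBuffer, 0, Readed, null, null);
    if (Readed > 0)
    {
        // Read the next chunk into the other buffer while the write is in flight.
        ReadResult = sourceStream.BeginRead(BackBuffer, 0, BackBuffer.Length, null, null);
        BackBuffer = Interlocked.Exchange(ref ActiveBuffer, BackBuffer);
    }
    destStream.EndWrite(WriteResult);
}
while (Readed > 0);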
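On .NET 4.5 and later, the FileOptions advice can also be combined with the task-based API. The following is a sketch only: AsyncCopySketch and CopyFileAsync are invented names and the 64 KB buffer is illustrative. Note that Stream.CopyToAsync awaits each read and write in turn, so it does not overlap them the way the Begin/End pattern above does.

```csharp
using System.IO;
using System.Threading.Tasks;

public static class AsyncCopySketch
{
    public static async Task CopyFileAsync(string sourcePath, string destinationPath)
    {
        const int bufferSize = 64 * 1024; // illustrative; tune for your storage
        // Asynchronous requests overlapped I/O; SequentialScan hints the access pattern.
        const FileOptions options = FileOptions.Asynchronous | FileOptions.SequentialScan;

        using (var source = new FileStream(sourcePath, FileMode.Open, FileAccess.Read,
                                           FileShare.Read, bufferSize, options))
        using (var destination = new FileStream(destinationPath, FileMode.Create, FileAccess.Write,
                                                FileShare.None, bufferSize, options))
        {
            await source.CopyToAsync(destination, bufferSize);
        }
    }
}
```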
Three changes will dramatically improve performance:
Increase your buffer size; try 1MB (well, just experiment)
After you open your fileStream, call fileStream.SetLength(inStream.Length) to allocate the entire block on disk up front (only works if inStream is seekable)
Remove fileStream.Flush() - it is redundant and probably has the single biggest impact on performance as it will block until the flush is complete. The stream will be flushed anyway on dispose.
This seemed about 3-4 times faster in the experiments I tried:
public static void Copy(System.IO.Stream inStream, string outputFilePath)
{
    int bufferSize = 1024 * 1024;
    using (FileStream fileStream = new FileStream(outputFilePath, FileMode.OpenOrCreate, FileAccess.Write))
    {
        // Preallocate only when the source can report its length.
        if (inStream.CanSeek)
        {
            fileStream.SetLength(inStream.Length);
        }
        int bytesRead = -1;
        byte[] bytes = new byte[bufferSize];
        while ((bytesRead = inStream.Read(bytes, 0, bufferSize)) > 0)
        {
            fileStream.Write(bytes, 0, bytesRead);
        }
    }
}
Dusting off reflector we can see that File.Copy actually calls the Win32 API:
if (!Win32Native.CopyFile(fullPathInternal, dst, !overwrite))
Which resolves to
[DllImport("kernel32.dll", CharSet=CharSet.Auto, SetLastError=true)]
internal static extern bool CopyFile(string src, string dst, bool failIfExists);
And here is the documentation for CopyFile
You're never going to be able to beat the operating system at doing something so fundamental with your own code, not even if you crafted it carefully in assembler.
If you need to make sure that your operations occur with the best performance AND you want to mix and match various sources, then you will need to create a type that describes the resource locations. You then create an API that has functions such as Copy that take two such types and, having examined the descriptions of both, choose the best performing copy mechanism. E.g., having determined that both locations are Windows file locations, it would choose File.Copy, OR if the source is a Windows file but the destination is to be an HTTP POST, it would use a WebRequest.
Try to remove the Flush call, and move it to be outside the loop.
Sometimes the OS knows best when to flush the IO.. It allows it to better use its internal buffers.
Here's a similar answer: "How do I copy the contents of one stream to another?"
Your main problem is the call to Flush(), that will bind your performance to the speed of the I/O.
Mark Russinovich would be the authority on this.
He wrote on his blog an entry Inside Vista SP1 File Copy Improvements which sums up the Windows state of the art through Vista SP1.
My semi-educated guess would be that File.Copy would be most robust over the greatest number of situations. Of course, that doesn't mean your own code couldn't beat it in some specific corner case...
One thing that stands out is that you are reading a chunk, writing that chunk, reading another chunk and so on.
Streaming operations are great candidates for multithreading. My guess is that File.Copy implements multithreading.
Try reading in one thread and writing in another thread. You will need to coordinate the threads so that the write thread doesn't start writing away a buffer until the read thread is done filling it up. You can solve this by having two buffers, one that is being read while the other is being written, and a flag that says which buffer is currently being used for which purpose.