When a file is opened in C# with a StreamReader, does the file stay in memory until it is closed?
For example, suppose a program opens a 6 MB file with StreamReader in order to append a single line at the end. Will the program hold the entire 6 MB in its memory until the file is closed? Or does the .NET code internally return a file pointer and append the line at the end, so that the program does not take up 6 MB of memory?
The whole point of a stream is so that you don't have to hold an entire object in memory. You read from it piece by piece as needed.
If you want to append to a file, you should use File.AppendText which will create a StreamWriter that appends to the end of a file.
Here is an example:
string path = @"c:\temp\MyTest.txt";
// This text is always added, making the file longer over time
// if it is not deleted.
using (StreamWriter sw = File.AppendText(path))
{
    sw.WriteLine("This");
    sw.WriteLine("is Extra");
    sw.WriteLine("Text");
}
Again, the whole file will not be stored in memory.
Documentation: http://msdn.microsoft.com/en-us/library/system.io.file.appendtext.aspx
The .NET FileStream will buffer a small amount of data (you can set this amount with some of the constructors).
The Windows OS will do more significant caching of the file; if you have plenty of RAM, this might be the whole file.
A StreamReader uses FileStream to open the file. FileStream stores a Windows handle, returned by the CreateFile() API function; it is 4 bytes on a 32-bit operating system. FileStream also has a byte[] buffer, which is 4096 bytes by default. This buffer avoids having to call the ReadFile() API function for every single read call. StreamReader itself has a small buffer to make decoding the text in the file more efficient; it is 1024 bytes by default (128 bytes is the minimum). It also has some private variables to keep track of the buffer index and whether or not a BOM has been detected.
This all adds up to a few kilobytes. The data you read with StreamReader will of course take space in your program's heap. For a 6 MB file of single-byte characters, that could add up to 12 megabytes if you store every string in, say, a List, because .NET strings use two bytes per character. You usually want to avoid that.
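To illustrate the point about not storing every string, here is a minimal sketch that walks a file line by line and keeps only a running count. The class and method names (LineCounter, CountLines) are made up for this example, and the 4096-byte buffer size is passed explicitly only to show where it can be set.

```csharp
using System;
using System.IO;
using System.Text;

public static class LineCounter
{
    // Counts the lines in a text file. Only the FileStream buffer, the
    // StreamReader buffer and the current line are in memory at any time.
    public static long CountLines(string path)
    {
        // The last argument is the FileStream buffer size in bytes.
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read, 4096))
        using (var reader = new StreamReader(fs, Encoding.UTF8))
        {
            long count = 0;
            while (reader.ReadLine() != null)
            {
                count++; // the line is not stored, so it can be collected right away
            }
            return count;
        }
    }
}
```

Calling LineCounter.CountLines(path) on a large file should keep memory use roughly constant, whereas collecting every line into a List grows with the file.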
StreamReader will not read the 6 MB file into memory. Also, you can't append a line to the end of the file using StreamReader. You might want to use StreamWriter.
update: not counting buffering and OS caching as someone else mentioned
Related
I have a requirement where I need to encrypt a file of size 1-2 GB in an Azure Function. I am using the PGP core library to encrypt the file in memory. The code below throws an out-of-memory exception if the file size is above 700 MB. Note: I am using an Azure Function. Scaling up the App Service plan didn't help.
Is there any alternative to MemoryStream that I can use? After encryption, I upload the file to blob storage.
var privateKeyEncoded = Encoding.UTF8.GetString(Convert.FromBase64String(_options.PGPKeys.PublicKey));
using Stream privateKeyStream = StringToStreamUtility.GenerateStreamFromString(privateKeyEncoded);
privateKeyStream.Position = 0;
var encryptionKeys = new EncryptionKeys(privateKeyStream);
var pgp = new PGP(encryptionKeys);
//encrypt stream
var encryptStream = new MemoryStream();
await pgp.EncryptStreamAsync(streamToEncrypt, encryptStream );
MemoryStream is a Stream wrapper over a byte[] buffer. Every time that buffer is full, a new one with double the size is allocated and the data is copied. This eventually uses double the final buffer size (4GB for a 2GB file) but worse, it results in such memory fragmentation that eventually the memory allocator can't find a new contiguous memory block to allocate. That's when you get an OOM.
While you could avoid OOM errors by specifying a capacity in the constructor, storing 2GB in memory before even starting to write it is very wasteful. With a real FileStream the encrypted bytes would be written out as soon as they were available.
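The buffer growth described above can be observed through the Capacity property. The following is a small sketch, not a benchmark; MemoryStreamGrowth and CountReallocations are names invented for this illustration.

```csharp
using System;
using System.IO;

public static class MemoryStreamGrowth
{
    // Writes at least totalBytes in chunkSize pieces and returns how many
    // times the stream's internal buffer was replaced by a larger one.
    public static int CountReallocations(int totalBytes, int chunkSize)
    {
        var chunk = new byte[chunkSize];
        int reallocations = 0;
        using (var ms = new MemoryStream())
        {
            int lastCapacity = ms.Capacity;
            for (int written = 0; written < totalBytes; written += chunkSize)
            {
                ms.Write(chunk, 0, chunkSize);
                if (ms.Capacity != lastCapacity)
                {
                    // A new, larger byte[] was allocated and the old data copied over.
                    reallocations++;
                    lastCapacity = ms.Capacity;
                }
            }
        }
        return reallocations;
    }
}
```

Each reallocation briefly needs both the old and the new buffer, which is why peak memory use can be well above the final stream length.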
Azure Functions allow temporary storage. This means you can create a temporary file, open a stream on it and use it for encryption.
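var tempPath = Path.GetTempFileName();
try
{
    // GetTempFileName already created an empty file; FileMode.Create truncates it and opens it for writing.
    using (var outputStream = File.Open(tempPath, FileMode.Create))
    {
        await pgp.EncryptStreamAsync(streamToEncrypt, outputStream);
        ...
    }
}
finally
{
    File.Delete(tempPath);
}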
MemoryStream uses a byte[] internally, and any byte[] is going to get a bit brittle as it gets around/above 1GiB (although in theory a byte[] can be nearly 2 GiB, in reality this isn't a good idea, and is rarely seen).
Frankly, MemoryStream simply isn't a good choice here; I'd probably suggest using a temporary file instead, and use a FileStream. This doesn't attempt to keep everything in memory at once, and is more reliable at large sizes. Alternatively: avoid ever needing all the data at once completely, by performing the encryption in a pass-thru streaming way.
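As a rough sketch of what pass-through streaming looks like, the example below encrypts from one Stream to another without ever holding the whole payload. Note that it uses AES via the framework's CryptoStream as a stand-in, not PGP and not the PGP core library from the question; the StreamingEncryption class and its method names are invented for this illustration.

```csharp
using System.IO;
using System.Security.Cryptography;

public static class StreamingEncryption
{
    // Encrypts input into output chunk by chunk. Only CopyTo's internal
    // buffer is held in memory, never the whole payload.
    // Disposing the CryptoStream writes the final block and closes output.
    public static void Encrypt(Stream input, Stream output, byte[] key, byte[] iv)
    {
        using (var aes = Aes.Create())
        using (var encryptor = aes.CreateEncryptor(key, iv))
        using (var crypto = new CryptoStream(output, encryptor, CryptoStreamMode.Write))
        {
            input.CopyTo(crypto);
        }
    }

    // Reverses Encrypt, again without buffering the whole payload.
    public static void Decrypt(Stream input, Stream output, byte[] key, byte[] iv)
    {
        using (var aes = Aes.Create())
        using (var decryptor = aes.CreateDecryptor(key, iv))
        using (var crypto = new CryptoStream(input, decryptor, CryptoStreamMode.Read))
        {
            crypto.CopyTo(output);
        }
    }
}
```

The output argument could be a FileStream on a temporary file or any other writable stream, so the encrypted bytes leave the process as soon as they are produced.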
I'm using OpenXML to generate an Excel spreadsheet.
I'm generating the spreadsheet in a MemoryStream; the caller is writing out the actual file. For example, my .NET Core controller will return the memory stream as a FileResult. At the moment, I've got a standalone console-mode program that's writing to a FileStream.
PROBLEM: I'm getting extra bytes at the end of the file. Since an OpenXml .xlsx file is a .zip file, the extra bytes effectively corrupt the file.
Program.cs:
using (MemoryStream memoryStream = new MemoryStream())
{
OpenXMLGenerate(memoryStream, sampleData);
long msPos = memoryStream.Position; // Position= 1869: Good!
memoryStream.Position = 0;
using (FileStream fs = new FileStream("myfile.xlsx", FileMode.OpenOrCreate))
{
memoryStream.WriteTo(fs);
long fsPos = fs.Position; // Position= 1869: Good!
}
// Myfile.xlsx filesize= 2014, not 1869! Bad!!!
}
When I open the file in 7-Zip, it says:
Warnings: There are some data after the end of the payload data
Physical Size: 1869
Tail Size: 145
When I try to open it as a .zip file, Windows says:
The Compressed (zipped) folder is invalid.
Q: Any idea why I'm getting a 2014 byte file, instead of 1869 bytes?
Q: What can I do about it?
(Documenting per comments.) The issue could be explained by the file replacing an existing file of length 2014 bytes.
Creating a file stream using a mode of FileMode.OpenOrCreate is equivalent to using FileMode.Open if the referenced file exists. If the length of the memory stream is less than the length of the existing file, the existing file will not be truncated to the length of the memory stream; in this case, if N is the length of the memory stream, the first N bytes of the existing file will be overwritten with the contents of the memory stream, and the remaining bytes will persist from the original file.
Creating the file stream with a file mode of FileMode.Create will replace the existing file entirely (if one exists), eliminating any possibility that the new file will contain remnants of the existing file.
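The difference between the two modes can be reproduced with a short sketch. FileModeDemo and WriteWithMode are hypothetical names used only for this illustration; the byte counts mirror the ones in the question.

```csharp
using System.IO;

public static class FileModeDemo
{
    // Writes payload at the start of the file using the given mode
    // and returns the length of the file afterwards.
    public static long WriteWithMode(string path, byte[] payload, FileMode mode)
    {
        using (var fs = new FileStream(path, mode, FileAccess.Write))
        {
            fs.Write(payload, 0, payload.Length);
        }
        return new FileInfo(path).Length;
    }
}
```

If the file already holds 2014 bytes, writing 1869 bytes with FileMode.OpenOrCreate leaves a 2014-byte file, while FileMode.Create leaves exactly 1869 bytes. If the stream must be opened with OpenOrCreate for some other reason, calling fs.SetLength(fs.Position) after the write should also cut off the old tail.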
When I use zlib in C/C++, I have a simple function, uncompress, which only requires two buffers and nothing else. Its definition is like this:
int uncompress (Bytef *dest, uLongf *destLen, const Bytef *source,
uLong sourceLen);
/*
Decompresses the source buffer into the destination buffer. sourceLen is the byte length of the source buffer. Upon entry,
destLen is the total size of the destination buffer, which must be
large enough to hold the entire uncompressed data. (The size of
the uncompressed data must have been saved previously by the
compressor and transmitted to the decompressor by some mechanism
outside the scope of this compression library.) Upon exit, destLen
is the actual size of the uncompressed data.
uncompress returns Z_OK if success, Z_MEM_ERROR if there was not enough memory, Z_BUF_ERROR if there was not enough room in the output
buffer, or Z_DATA_ERROR if the input data was corrupted or incomplete.
In the case where there is not enough room, uncompress() will fill
the output buffer with the uncompressed data up to that point.
*/
I want to know if C# has a similar way. I checked SharpZipLib FAQ as follows but did not quite understand:
How do I compress/decompress files in memory?
Use a memory stream when creating the Zip stream!
MemoryStream outputMemStream = new MemoryStream();
using (ZipOutputStream zipOutput = new ZipOutputStream(outputMemStream)) {
// Use zipOutput stream as normal
...
You can get the resulting data with memory stream methods ToArray or GetBuffer.
ToArray is the cleaner and easiest to use correctly with the penalty
of duplicating allocated memory. GetBuffer returns the raw buffer
and so you need to account for the true length yourself.
See the framework class library help for more information.
I can't figure out whether this block of code is for compression or decompression, or whether outputMemStream means a compressed stream or an uncompressed stream. I really hope there is an easy-to-understand way like in zlib. Thank you very much if you can help me.
Check out the ZipArchive class, which I think has the features you need to accomplish in-memory decompression of zip files.
Assuming you have an array of bytes (byte[]) which represents the ZIP file in memory, you have to instantiate a ZipArchive object which will be used to read that array of bytes and interpret it as the ZIP file you wish to load. If you check the ZipArchive class's available constructors in the documentation, you will see that they require a stream object from which the data will be read. So the first step is to convert your byte[] array to a stream that can be read by the constructors, and you can do this by using a MemoryStream object.
Here's an example of how to list all entries inside a ZIP archive represented in memory as a byte array:
byte [] zipArchiveBytes = ...; // Read the ZIP file in memory as an array of bytes
using (var inputStream = new MemoryStream(zipArchiveBytes))
using (var zipArchive = new ZipArchive(inputStream, ZipArchiveMode.Read))
{
Console.WriteLine("Listing archive entries...");
foreach (var archiveEntry in zipArchive.Entries)
Console.WriteLine($" {archiveEntry.FullName}");
}
Each file in the ZIP archive will be represented as a ZipArchiveEntry instance. This class offers properties which allow you to retrieve information such as the original length of a file from the ZIP archive, its compressed length, its name, etc.
In order to read a specific file which is contained inside the ZIP file, you can use ZipArchiveEntry.Open(). The following exemplifies how to open a specific file from an archive, if you have its FullName inside the ZIP archive:
ZipArchiveEntry archEntry = zipArchive.GetEntry("my-folder-inside-zip/dog-picture.jpg");
byte[] readResult;
using (Stream entryReadStream = archEntry.Open())
{
using (var tempMemStream = new MemoryStream())
{
entryReadStream.CopyTo(tempMemStream);
readResult = tempMemStream.ToArray();
}
}
This example reads the given file contents, and returns them as an array of bytes (stored in the byte[] readResult variable) which you can then use according to your needs.
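Putting the pieces above together, a pair of buffer-in, buffer-out helpers in the spirit of zlib's uncompress could look like the sketch below. It works on ZIP archives through ZipArchive, not on raw zlib streams, and the InMemoryZip class with its Compress and Decompress methods is invented for this illustration.

```csharp
using System.IO;
using System.IO.Compression;

public static class InMemoryZip
{
    // Builds a ZIP archive with a single entry and returns it as a byte array.
    public static byte[] Compress(string entryName, byte[] data)
    {
        using (var output = new MemoryStream())
        {
            // leaveOpen: true keeps the MemoryStream usable after the archive
            // is disposed; disposing the archive is what finishes the ZIP.
            using (var archive = new ZipArchive(output, ZipArchiveMode.Create, leaveOpen: true))
            {
                ZipArchiveEntry entry = archive.CreateEntry(entryName);
                using (Stream entryStream = entry.Open())
                {
                    entryStream.Write(data, 0, data.Length);
                }
            }
            return output.ToArray();
        }
    }

    // Extracts one entry from a ZIP archive held in a byte array.
    public static byte[] Decompress(byte[] zipBytes, string entryName)
    {
        using (var input = new MemoryStream(zipBytes))
        using (var archive = new ZipArchive(input, ZipArchiveMode.Read))
        {
            ZipArchiveEntry entry = archive.GetEntry(entryName);
            if (entry == null)
            {
                throw new FileNotFoundException("Entry not found in archive: " + entryName);
            }
            using (Stream entryStream = entry.Open())
            using (var result = new MemoryStream())
            {
                entryStream.CopyTo(result);
                return result.ToArray();
            }
        }
    }
}
```

With these helpers, Decompress(Compress("a.txt", bytes), "a.txt") should give back the original bytes. Here the MemoryStream passed to the archive in Compress holds the compressed data, and the one filled in Decompress holds the uncompressed data.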
I'm attempting to take a large file, uploaded from a web app, and load it into a MemoryStream for processing later. I was receiving OutOfMemory exceptions when trying to copy the HttpPostedFileBase's InputStream into a new MemoryStream. During troubleshooting, I tried just creating a new MemoryStream and allocating the same amount of space (roughly) as the length of the InputStream (935,638,275), like so:
MemoryStream memStream = new MemoryStream(935700000);
Even doing this results in a System.OutOfMemoryException on this line.
I only slightly understand MemoryStreams, and this seems to be something to do with how MemoryStreams buffer data. Is there a way for me to get all of the data into one MemoryStream without too much fuss?
I am not sure what the processing involves, but the HttpPostedFileBase already contains a stream with the data. You can use that stream to process what you need to do.
If you really need to move back and forth or multiple times over the stream, and the input stream does not support seeking/positioning, you may want to stream the data to a temporary local file first and then use a file stream to do your processing against that file.
If many people are uploading via your web app, MemoryStreams of the size you specified would quickly eat up all available memory.
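The temporary-file approach suggested above might be sketched as follows. The helper takes a plain Stream so it does not depend on ASP.NET types; UploadSpooler and SpoolToTempFile are hypothetical names, and the 81920-byte buffer size is an arbitrary choice for this example.

```csharp
using System.IO;

public static class UploadSpooler
{
    // Copies input to a temporary file and returns a seekable stream over it,
    // positioned at the start. The file is deleted when the stream is disposed.
    public static FileStream SpoolToTempFile(Stream input)
    {
        string tempPath = Path.GetTempFileName();
        var file = new FileStream(tempPath, FileMode.Create, FileAccess.ReadWrite,
                                  FileShare.None, 81920, FileOptions.DeleteOnClose);
        try
        {
            input.CopyTo(file);  // streams in chunks; the upload is never fully in memory
            file.Position = 0;   // rewind so the caller can start reading immediately
            return file;
        }
        catch
        {
            file.Dispose();      // do not leak the temp file if the copy fails
            throw;
        }
    }
}
```

In the scenario from the question, the argument would presumably be the posted file's InputStream, and the returned FileStream can be read and re-read with Seek as often as the processing requires.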
I am developing an application that needs to read a text file which is continuously updated. I need to read the file up to the end of the file (as of that moment) and remember this location for my next read. I am planning to develop this application in C#/.NET. How can I perform these partial reads and remember the locations, given that C# does not provide pointers in file handling?
Edit: The file is updated every 30 seconds, and new data is appended to the old file.
I tried keeping the length of the previous read and then reading the data from that location, but the file cannot be accessed by two applications at the same time.
You can keep track of the last offset of the read pointer in the file. You can do something like this:
long lastOffset = 0;
// FileShare.ReadWrite lets the writing application keep the file open while you read.
using (var fs = new FileStream("myFile.bin", FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
{
    fs.Seek(lastOffset, SeekOrigin.Begin);
    // Read the file here
    // Once the file is read, remember where reading stopped
    lastOffset = fs.Position;
}
Open the file, read everything, and save the count of bytes you read in a variable (let's assume it's read_until_here).
Next time you read the file, just take the new information (whatever comes after the location stored in your read_until_here variable).
I am planning to develop this application in C#/.NET. How can I perform these partial reads and remember the locations, given that C# does not provide pointers in file handling?
Not entirely sure why you'd be concerned about the supposed lack of pointers... I'd look into int FileStream.Read(byte[] array, int offset, int count) myself.
It allows you to read from an offset into a buffer, as many bytes as you want, and tells you how many bytes were actually read... which looks to be all the functionality you'd need.
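A possible way to combine FileStream.Read with a remembered offset is sketched below. FileTail and ReadNew are names made up for this example. It assumes the file is UTF-8 text, that the writer appends whole characters, and that the writer opens the file in a way that permits other readers.

```csharp
using System.IO;
using System.Text;

public sealed class FileTail
{
    private readonly string _path;
    private long _offset; // byte position where the previous read stopped

    public FileTail(string path)
    {
        _path = path;
    }

    // Returns the text appended since the previous call (empty if nothing is new).
    public string ReadNew()
    {
        // FileShare.ReadWrite allows another process to keep writing to the file.
        using (var fs = new FileStream(_path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
        {
            if (fs.Length < _offset)
            {
                _offset = 0; // the file shrank, so it was probably replaced; start over
            }
            fs.Seek(_offset, SeekOrigin.Begin);

            var buffer = new byte[4096];
            using (var collected = new MemoryStream())
            {
                int read;
                // Read returns how many bytes were actually read; 0 means end of file.
                while ((read = fs.Read(buffer, 0, buffer.Length)) > 0)
                {
                    collected.Write(buffer, 0, read);
                }
                _offset = fs.Position; // remember the location for the next call
                return Encoding.UTF8.GetString(collected.ToArray());
            }
        }
    }
}
```

One instance per watched file can be kept alive and ReadNew called on a timer, for example every 30 seconds to match the update interval in the question.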