I have a large file that I need to copy to memory for further processing. The software works fine for files smaller than 2GB, but as soon as they pass this limit I get an exception that ReadAllBytes only supports files smaller than 2GB.
byte[] buffer = System.IO.File.ReadAllBytes(file); // exception if file > 2GB
What is the fastest way to copy a file larger than 2GB to memory?
The process is already 64-bit and the gcAllowVeryLargeObjects flag is already set.
I doubt you can do anything faster than a memory-mapped file: http://msdn.microsoft.com/en-us/library/system.io.memorymappedfiles.memorymappedfile(v=vs.110).aspx
using ( var file = MemoryMappedFile.CreateFromFile( "F:\\VeryLargeFile.data" ) )
{
}
You can then use CreateViewAccessor or CreateViewStream to manipulate the data.
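For example, a view stream gives you sequential Stream access over the mapping, while a view accessor gives you random access at 64-bit offsets. A minimal sketch (the path and offset are just placeholders):

using System.IO.MemoryMappedFiles;

using (var file = MemoryMappedFile.CreateFromFile(@"F:\VeryLargeFile.data"))
{
    // Sequential access: read through the mapping like any other Stream.
    using (var stream = file.CreateViewStream())
    {
        var header = new byte[16];
        int read = stream.Read(header, 0, header.Length);
    }

    // Random access: offsets are Int64, so positions beyond 2GB are fine.
    using (var accessor = file.CreateViewAccessor())
    {
        long offset = 3L * 1024 * 1024 * 1024; // assumes the file is at least this big
        byte value = accessor.ReadByte(offset);
    }
}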
Related
I'm doing operations on images. Some of these operations require me to create 3 different versions of the pixel data from the image and then later on combine them and do operations on the result.
For regular/small images the code works fine, I simply initialize my image raster data as new int[size].
However, for bigger images with a higher resolution (600, 1200, ...), the new int[size] throws an OutOfMemoryException when trying to allocate more than 2GB, even though I've built it as 64-bit (not AnyCPU or 32-bit).
To resolve this issue, I first tried to create a MemoryMappedFile backed by memory itself. This also ran out of resources. Next I tried to create a MemoryMappedFile by first creating a file on disk and then creating an accessor over the complete file.
I'm still getting "not enough resources" with the temporary file on disk and the MemoryMappedFile/ViewAccessor.
Am I doing something wrong in the code below? I thought the MMF and Accessor would handle the virtual memory paging automagically.
mmfPath = Path.GetTempFileName();
// create a file on disk first
using (var fs = File.OpenWrite(mmfPath))
{
var widthBytes = new byte[width * 4];
for (int y = 0; y < height; y++)
{
fs.Write(widthBytes, 0, widthBytes.Length);
}
}
// open the file on disk as a MMF
_RasterData = MemoryMappedFile.CreateFromFile(mmfPath,
FileMode.OpenOrCreate,
Guid.NewGuid().ToString(),
0, // 0 sets the capacity to the size of the file on disk
MemoryMappedFileAccess.ReadWrite);
_RasterDataAccessor = _RasterData.CreateViewAccessor(); // <-- not enough memory resources
Not enough memory resources are available to process this command.
at System.IO.__Error.WinIOError(Int32 errorCode, String maybeFullPath)
at System.IO.MemoryMappedFiles.MemoryMappedView.CreateView(SafeMemoryMappedFileHandle memMappedFileHandle, MemoryMappedFileAccess access, Int64 offset, Int64 size)
at System.IO.MemoryMappedFiles.MemoryMappedFile.CreateViewAccessor(Int64 offset, Int64 size, MemoryMappedFileAccess access)
at System.IO.MemoryMappedFiles.MemoryMappedFile.CreateViewAccessor()
...
Even if I can resolve the problem above, I think I will run into the same issue again later when I need to create a resulting bitmap out of the pixel data (2GB limit).
The goal is to work with big images (and temporary copies of their pixel data for raster/raster operations).
The current issue is that I'm getting "not enough memory resources" with MemoryMappedFile, whereas I thought it would get around the 2GB limit and that Windows/the Framework would handle the virtual memory paging.
(.NET Framework 4.8, 64-bit build.)
I have a requirement where I need to encrypt a file of 1-2 GB in an Azure Function. I am using the PGP core library to encrypt the file in memory. The code below throws an out of memory exception if the file size is above 700 MB. Note: I am using an Azure Function, and scaling up the App Service plan didn't help.
Is there any alternative to MemoryStream that I can use? After encryption, I am uploading the file into blob storage.
var privateKeyEncoded = Encoding.UTF8.GetString(Convert.FromBase64String(_options.PGPKeys.PublicKey));
using Stream privateKeyStream = StringToStreamUtility.GenerateStreamFromString(privateKeyEncoded);
privateKeyStream.Position = 0;
var encryptionKeys = new EncryptionKeys(privateKeyStream);
var pgp = new PGP(encryptionKeys);
//encrypt stream
var encryptStream = new MemoryStream();
await pgp.EncryptStreamAsync(streamToEncrypt, encryptStream);
MemoryStream is a Stream wrapper over a byte[] buffer. Every time that buffer is full, a new one with double the size is allocated and the data is copied. This eventually uses double the final buffer size (4GB for a 2GB file) but, worse, it results in such memory fragmentation that eventually the memory allocator can't find a new contiguous memory block to allocate. That's when you get an OOM.
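A tiny sketch (the sizes are illustrative) that makes the doubling visible by printing the capacity jumps as data is written:

using System;
using System.IO;

// Write 64 MB into a default MemoryStream and watch its buffer double repeatedly.
using var ms = new MemoryStream();
var chunk = new byte[64 * 1024];
int lastCapacity = 0;
for (int i = 0; i < 1024; i++)
{
    ms.Write(chunk, 0, chunk.Length);
    if (ms.Capacity != lastCapacity)
    {
        // Each jump allocates a new buffer and abandons the old one.
        Console.WriteLine($"Capacity grew to {ms.Capacity:N0} bytes");
        lastCapacity = ms.Capacity;
    }
}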
While you could avoid OOM errors by specifying a capacity in the constructor, storing 2GB in memory before even starting to write it is very wasteful. With a real FileStream the encrypted bytes would be written out as soon as they were available.
Azure Functions allow temporary storage. This means you can create a temporary file, open a stream on it and use it for encryption.
var tempPath = Path.GetTempFileName();
try
{
    using (var outputStream = File.Open(tempPath, FileMode.Create))
    {
        await pgp.EncryptStreamAsync(streamToEncrypt, outputStream);
        ...
    }
}
finally
{
    File.Delete(tempPath);
}
MemoryStream uses a byte[] internally, and any byte[] is going to get a bit brittle as it gets around/above 1GiB (although in theory a byte[] can be nearly 2 GiB, in reality this isn't a good idea, and is rarely seen).
Frankly, MemoryStream simply isn't a good choice here; I'd probably suggest using a temporary file instead, and use a FileStream. This doesn't attempt to keep everything in memory at once, and is more reliable at large sizes. Alternatively: avoid ever needing all the data at once, by performing the encryption in a pass-through streaming way.
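A minimal sketch of the temp-file approach, under these assumptions: the pgp and streamToEncrypt objects come from the question, and blobClient is an Azure.Storage.Blobs BlobClient for the destination blob (not shown here):

using System.IO;
using System.Threading.Tasks;
using Azure.Storage.Blobs;

var tempPath = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
try
{
    // Encrypt straight to disk; only small buffers live in memory at any time.
    using (var outputStream = File.Create(tempPath))
    {
        await pgp.EncryptStreamAsync(streamToEncrypt, outputStream);
    }

    // Upload from disk as a stream instead of building a giant byte[].
    using (var uploadStream = File.OpenRead(tempPath))
    {
        await blobClient.UploadAsync(uploadStream, overwrite: true);
    }
}
finally
{
    File.Delete(tempPath);
}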
I am loading small PDF files into the buffer and getting an OutOfMemoryException. A file size of 220 KB works fine; the next size I have tested is 4.50 MB and this file throws the exception. What is the maximum file size, and what can I do to change the max size? 4.5 MB is not that much :-)
This is the related code:
ListViewDataItem dataItem = (ListViewDataItem)e.Item;
int i = dataItem.DisplayIndex;
byte[] buffer = File.ReadAllBytes(Session["pdfFileToSplit"].ToString());
string unique = Guid.NewGuid().ToString();
Session[unique] = buffer;
Panel thumbnailPanel = (Panel)e.Item.FindControl("thumbnails");
Thumbnail thumbnail = new Thumbnail();
thumbnail.SessionKey = unique;
thumbnail.Index = i+1;
thumbnail.DPI = 17;
thumbnail.BorderColor = System.Drawing.Color.Blue;
thumbnailPanel.Controls.Add(thumbnail);
OK, I just saw something really mysterious (for me). I uploaded a file below 10 MB and watched the used memory of the IIS server (w3wp.exe): nothing dramatic happened, a few MB up, a few down, everything worked fine. Then I tried the same thing with a 12 MB file. At the beginning it looked the same, but then, suddenly, out of nowhere, the used memory of w3wp.exe exploded to 1.5 GB and then the server crashed...
Is the OutOfMemoryException on the server side or the client side?
When you use Session[unique] = buffer, you're storing all the files (represented as byte arrays) simultaneously in your session.
That can be a lot of information.
If your session is "InProc", your server will probably run out of memory.
The limit is the memory of the machine.
When your request finishes, the memory stays allocated in the session. That's the problem. You should set Session[unique] = null if this isn't the desired behavior, so the session releases the memory on the server. If you put in 10 files, 10 will be stored in the session simultaneously, even after the requests finish. They will be released only when the session ends.
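For example, a sketch using the same "unique" key from the question, once the thumbnail no longer needs the data:

// Release the buffer as soon as it is no longer needed.
Session[unique] = null;      // or: Session.Remove(unique);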
I want to read a big TXT file, 500 MB in size.
First I use
var file = new StreamReader(_filePath).ReadToEnd();
var lines = file.Split(new[] { '\n' });
but it throws an out of memory exception. Then I tried to read it line by line, but again, after reading around 1.5 million lines, it throws an out of memory exception:
using (StreamReader r = new StreamReader(_filePath))
{
while ((line = r.ReadLine()) != null)
_lines.Add(line);
}
or I used
foreach (var l in File.ReadLines(_filePath))
{
_lines.Add(l);
}
but again I received
An exception of type 'System.OutOfMemoryException' occurred in
mscorlib.dll but was not handled in user code
My machine is a powerful machine with 8 GB of RAM, so it shouldn't be a problem with my machine.
P.S.: I tried to open this file in Notepad++ and I received a "the file is too big to be opened" message.
Just use File.ReadLines, which returns an IEnumerable<string> and doesn't load all the lines into memory at once.
foreach (var line in File.ReadLines(_filePath))
{
//Don't put "line" into a list or collection.
//Just make your processing on it.
}
The cause of the exception seems to be the growing _lines collection, not reading the big file. You are reading each line and adding it to the _lines collection, which keeps consuming memory and causes the out of memory exception. You can apply filters to put only the required lines into the _lines collection.
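A minimal sketch of that idea, where the Contains("ERROR") check is just a placeholder for whatever condition you actually need:

using System;
using System.Collections.Generic;
using System.IO;

// Stream the file and keep only the matching lines instead of all 1.5+ million.
var requiredLines = new List<string>();
foreach (var line in File.ReadLines(_filePath))
{
    if (line.Contains("ERROR"))   // placeholder condition
        requiredLines.Add(line);
}
Console.WriteLine($"Kept {requiredLines.Count} lines");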
I know this is an old post but Google sent me here in 2021..
Just to emphasize igrimpe's comments above:
I've run into an OutOfMemoryException on StreamReader.ReadLine() recently looping through folders of giant text files.
As igrimpe mentioned, you can sometimes encounter this where your input file exhibits a lack of uniformity in line breaks. If you are looping through a text file and encounter this, double-check your input file for unexpected characters / ASCII-encoded hex or binary strings, etc.
In my case, I split the 60 GB problematic file into 256 MB chunks, had my file iterator stash the problematic text files as part of the exception trap, and later remedied the problem text files by removing the problematic lines.
Edit:
Loading the whole file into memory causes objects to grow, and .NET will throw OOM exceptions if it cannot allocate enough contiguous memory for an object.
The answer is still the same: you need to stream the file, not read the entire contents. That may require rearchitecting your application; however, using IEnumerable<> methods you can stack up business processes in different areas of the application and defer processing.
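A rough sketch of what such a deferred pipeline can look like (the method names are illustrative, and _filePath is the path from the question):

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Nothing is read from disk until the final foreach pulls on the pipeline,
// and only one line is held in memory at a time.
IEnumerable<string> ReadRecords(string path) => File.ReadLines(path);

IEnumerable<string> OnlyValid(IEnumerable<string> lines) =>
    lines.Where(l => !string.IsNullOrWhiteSpace(l));

foreach (var record in OnlyValid(ReadRecords(_filePath)))
{
    // process each record here instead of collecting them all first
}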
A "powerful" machine with 8 GB of RAM still isn't guaranteed to hold every line of a 500 MB file in memory as strings: you don't get the full 8 GB (the operating system holds some, and you can't allocate all memory in .NET), a 32-bit process has a 2 GB limit, opening the file and storing the lines holds the data twice (and .NET strings are UTF-16, so the text roughly doubles in size), and there is per-object overhead on top of that...
You can't load the whole thing into memory to process, you will have to stream the file through your processing.
You have to count the lines first.
It is slower, but you can read up to 2,147,483,647 lines.
int intNoOfLines = 0;
using (StreamReader oReader = new StreamReader(MyFilePath))
{
    while (oReader.ReadLine() != null) intNoOfLines++;
}

string[] strArrLines = new string[intNoOfLines];
int intIndex = 0;
using (StreamReader oReader = new StreamReader(MyFilePath))
{
    string strLine;
    while ((strLine = oReader.ReadLine()) != null)
    {
        strArrLines[intIndex++] = strLine;
    }
}
For anyone else having this issue:
If you're running out of memory while using StreamReader.ReadLine(), I'd be willing to bet your file doesn't have multiple lines to begin with. You're just assuming it does. It's an easy mistake to make because you can't just open a 10GB file with Notepad.
One time I received a 10GB file from a client that was supposed to be a list of numbers and instead of using '\n' as a separator, he used commas. The whole file was a single line which obviously caused ReadLine() to blow up.
Try reading a few thousand characters from your stream using StreamReader.Read() and see if you can find a '\n'. Odds are you won't.
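Something along these lines (an illustrative check, not a complete solution, using the question's _filePath):

using System;
using System.IO;

// Peek at the first chunk of the file and see whether it contains any '\n' at all.
using var reader = new StreamReader(_filePath);
var buffer = new char[8192];
int read = reader.Read(buffer, 0, buffer.Length);
bool hasNewline = Array.IndexOf(buffer, '\n', 0, read) >= 0;
Console.WriteLine(hasNewline
    ? "Found a line break in the first 8 KB."
    : "No line break in the first 8 KB - the file may be one giant line.");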
When opening a file in C# using a StreamReader, is the file going to remain in memory until it is closed?
For example, if a file of 6 MB is opened by a program using a StreamReader to append a single line at the end of the file, will the program hold the entire 6 MB in its memory until the file is closed? Or is a file pointer returned internally by the .NET code and the line appended at the end, so the 6 MB of memory will not be taken up by the program?
The whole point of a stream is so that you don't have to hold an entire object in memory. You read from it piece by piece as needed.
If you want to append to a file, you should use File.AppendText which will create a StreamWriter that appends to the end of a file.
Here is an example:
string path = @"c:\temp\MyTest.txt";
// This text is always added, making the file longer over time
// if it is not deleted.
using (StreamWriter sw = File.AppendText(path))
{
sw.WriteLine("This");
sw.WriteLine("is Extra");
sw.WriteLine("Text");
}
Again, the whole file will not be stored in memory.
Documentation: http://msdn.microsoft.com/en-us/library/system.io.file.appendtext.aspx
The .NET FileStream will buffer a small amount of data (you can set this amount with some of the constructors).
The Windows OS will do more significant caching of the file, if you have plenty of RAM this might be the whole file.
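For example, the buffer size can be passed explicitly through one of the FileStream constructors (the path here is just an illustration):

using System.IO;

// Ask for a 64 KB internal buffer instead of the 4 KB default.
using var fs = new FileStream(
    @"c:\temp\MyTest.txt",
    FileMode.Open,
    FileAccess.Read,
    FileShare.Read,
    bufferSize: 64 * 1024);

using var reader = new StreamReader(fs);
string firstLine = reader.ReadLine();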
A StreamReader uses FileStream to open the file. FileStream stores a Windows handle, returned by the CreateFile() API function. It is 4 bytes on a 32-bit operating system. FileStream also has a byte[] buffer, it is 4096 bytes by default. This buffer avoids having to call the ReadFile() API function for every single read call. StreamReader itself has a small buffer to make decoding the text in the file more efficient, it is 128 bytes by default. And it has some private variables to keep track of the buffer index and whether or not a BOM has been detected.
This all adds up to a few kilobytes. The data you read with StreamReader will of course take space in your program's heap. That could add up to 12 megabytes if you store every string in, say, a List. You usually want to avoid that.
StreamReader will not read the 6 MB file into memory. Also, you can't append a line to the end of the file using StreamReader. You might want to use StreamWriter.
update: not counting buffering and OS caching as someone else mentioned