C# increase the size of reading binary data

I am using the code below from Jon Skeet's article. Of late, the binary data that needs to be processed has grown multi-fold: the file I am trying to import is ~900 MB, almost 1 GB. How do I increase the memory stream size?
public static byte[] ReadFully (Stream stream)
{
    byte[] buffer = new byte[32768];
    using (MemoryStream ms = new MemoryStream())
    {
        while (true)
        {
            int read = stream.Read (buffer, 0, buffer.Length);
            if (read <= 0)
                return ms.ToArray();
            ms.Write (buffer, 0, read);
        }
    }
}

Your method returns a byte array, which means it will return all of the data in the file: the entire file will be loaded into memory.
If that is what you want to do, simply use the built-in File methods:
byte[] bytes = System.IO.File.ReadAllBytes(path);
string text = System.IO.File.ReadAllText(path);
If you don't want to load the entire file into memory, take advantage of your Stream:
using (var fs = new FileStream("path", FileMode.Open))
using (var reader = new StreamReader(fs))
{
    var line = reader.ReadLine();
    // do stuff with 'line' here, or use one of the other
    // StreamReader methods.
}

You don't have to increase the size of MemoryStream - by default it expands to fit the contents.
Apparently there can be problems with memory fragmentation, but you can pre-allocate memory to avoid them:
using (MemoryStream ms = new MemoryStream(1024 * 1024 * 1024)) // initial capacity 1GB
{
}
In my opinion 1GB should be no big deal these days, but it's probably better to process the data in chunks if possible. That is what Streams are designed for.
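For example, here is a minimal sketch of processing a large binary file in fixed-size chunks; ProcessChunk is a hypothetical placeholder for whatever you actually do with the data, and the path is assumed:
using (var fs = new FileStream("D:\\data.bin", FileMode.Open, FileAccess.Read))
{
    var buffer = new byte[64 * 1024]; // 64 KB chunks; tune to taste
    int read;
    while ((read = fs.Read(buffer, 0, buffer.Length)) > 0)
    {
        ProcessChunk(buffer, read); // only the first 'read' bytes are valid
    }
}
This keeps memory usage at the size of one buffer instead of the whole ~1 GB file.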

Related

Read large file into byte array and encode it to ToBase64String

I have implemented a POC to read the entire file content into a Byte[] array. I succeeded in reading files whose size is below 100MB, but when I load a file whose size is more than 100MB it throws:
Convert.ToBase64String(mybytearray) Cannot obtain value of the
local variable or argument because there is not enough memory
available.
Below is the code I have tried for reading content from a file into a Byte array:
var sFile = fileName;
var mybytearray = File.ReadAllBytes(sFile);
var binaryModel = new BinaryModel
{
    fileName = binaryFile.FileName,
    binaryData = Convert.ToBase64String(mybytearray),
    filePath = string.Empty
};
My model class is as below
public class BinaryModel
{
    public string fileName { get; set; }
    public string binaryData { get; set; }
    public string filePath { get; set; }
}
I am getting the "Cannot obtain value of the local variable or argument because there is not enough memory available." error at Convert.ToBase64String(mybytearray).
Is there anything which I need to take care to prevent this error?
Note: I do not want to add line breaks to my file content
To save memory you can convert the stream of bytes in 3-byte packs: every three input bytes produce 4 output bytes in Base64, so you don't need the whole file in memory at once.
Here is pseudocode:
Repeat
1. Try to read max 3 bytes from stream
2. Convert to base64, write to output stream
And a simple implementation:
using (var inStream = File.OpenRead("E:\\Temp\\File.xml"))
using (var outStream = File.CreateText("E:\\Temp\\File.base64"))
{
    var buffer = new byte[3];
    int read;
    while ((read = inStream.Read(buffer, 0, 3)) > 0)
    {
        var base64 = Convert.ToBase64String(buffer, 0, read);
        outStream.Write(base64);
    }
}
Hint: any buffer length that is a multiple of 3 is valid. Larger buffers use more memory but perform better; smaller buffers use less memory but perform worse.
Additional info:
The file stream is just an example. For the result stream, use [HttpContext].Response.OutputStream and write directly to it; processing hundreds of megabytes in one chunk will kill you and your server.
Think about the total memory requirements: 100 MB of file data leads to roughly 133 MB of Base64 string, and since you wrote about a model, expect a copy of those 133 MB in the response. And remember, that's just a single request; a few such requests could drain your memory.
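For instance, a rough sketch of streaming the Base64 output straight to the response, assuming classic ASP.NET where context is the current HttpContext (the path and buffer size are illustrative):
using (var inStream = File.OpenRead("E:\\Temp\\File.xml"))
{
    var buffer = new byte[3 * 1024]; // a multiple of 3 keeps the chunked Base64 valid
    int read;
    while ((read = inStream.Read(buffer, 0, buffer.Length)) > 0)
    {
        var base64 = Encoding.ASCII.GetBytes(Convert.ToBase64String(buffer, 0, read));
        context.Response.OutputStream.Write(base64, 0, base64.Length); // 'context' is assumed
    }
}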
I would use two filestreams - one to read the large file, one to write the result back out.
So in chunks you would convert to base 64 ... then convert the resulting string to bytes ... and write.
private static void ConvertLargeFileToBase64()
{
    // buffer length must be a multiple of 3 so each chunk encodes to Base64
    // without padding until the final chunk
    var buffer = new byte[3 * 4096];
    using (var fsIn = new FileStream("D:\\in.txt", FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
    using (var fsOut = new FileStream("D:\\out.txt", FileMode.CreateNew, FileAccess.Write))
    {
        int read;
        while ((read = fsIn.Read(buffer, 0, buffer.Length)) > 0)
        {
            // convert only the bytes actually read, then write the full Base64 output
            var b64 = Encoding.ASCII.GetBytes(Convert.ToBase64String(buffer, 0, read));
            fsOut.Write(b64, 0, b64.Length);
        }
    }
}

Memory stream out of memory exception when file size is greater than 4 GB

I am having a problem with a memory stream: I am getting an OutOfMemoryException. How do I handle this?
var file = VC.ReadStream(filename, true);
var memoryStream = new MemoryStream();
file.CopyTo(memoryStream);
var fileContentBytes = memoryStream.ToArray();
memoryStream = null;
LogUtil.Log(LogUtil.LogType.INFO, String.Format("File size: {0} bytes", fileContentBytes.Length));
var enc = new UTF8Encoding();
var filecontent = enc.GetString(fileContentBytes);
First, the size of a String is restricted to 2 GB, which is why
var filecontent = enc.GetString(fileContentBytes);
will throw an OutOfMemoryException. Next, you've got a giant overhead at
var fileContentBytes = memoryStream.ToArray();
since both memoryStream and the fileContentBytes array are about 4 GB each, they are 8 GB total. Yet another issue: when working with IDisposable you are supposed to dispose the instances:
using (var memoryStream = new MemoryStream()) {
    file.CopyTo(memoryStream);
    var fileContentBytes = memoryStream.ToArray();
    ...
}
If your task is to put down the file's size into a log you don't need to read the file at all:
long length = new System.IO.FileInfo(filename).Length;
LogUtil.Log(LogUtil.LogType.INFO, String.Format("File size: {0} bytes", length));
When working with file content, use the FileStream class instead of MemoryStream, since it works with memory in chunks (KBs in size, usually 4 or 8 KB).
Reading such a big file into memory at once is not a good idea. Since this is a text file, you should read and process it line by line using StreamReader.ReadLine(). If it is a binary file, find some chunking strategy. If you really can't do so, try a memory-mapping technique.
I can also advise you to use the StreamReader.ReadLineAsync() method.
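For example, a minimal line-by-line sketch, where ProcessLine is a hypothetical placeholder for your own per-line logic:
using (var reader = new StreamReader(filename))
{
    string line;
    while ((line = reader.ReadLine()) != null)
    {
        ProcessLine(line); // only one line is held in memory at a time
    }
}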

BinaryReader reading different length of data depending on BufferSize

The issue is as follows, I am using an HttpWebRequest to request some online data from dmo.gov.uk. The response I am reading using a BinaryReader and writing to a MemoryStream. I have packaged the code being used into a simple test method:
public static byte[] Test(int bufferSize)
{
    var request = (HttpWebRequest)WebRequest.Create("http://www.dmo.gov.uk/xmlData.aspx?rptCode=D3B.2");
    request.Method = "GET";
    request.Credentials = CredentialCache.DefaultCredentials;
    var buffer = new byte[bufferSize];
    using (var httpResponse = (HttpWebResponse)request.GetResponse())
    {
        using (var ms = new MemoryStream())
        {
            using (var reader = new BinaryReader(httpResponse.GetResponseStream()))
            {
                int bytesRead;
                while ((bytesRead = reader.Read(buffer, 0, bufferSize)) > 0)
                {
                    ms.Write(buffer, 0, bytesRead);
                }
            }
            return ms.GetBuffer();
        }
    }
}
My real-life code uses a buffer size of 2048 bytes usually, however I noticed today that this file has a huge amount of empty bytes (\0) at the end which bloats the file size. As a test I tried increasing the buffer size to near-on the file size I expected (I was expecting ~80Kb so made the buffer size 79000) and now I get the right file size. But I'm confused, I expected to get the same file size regardless of the buffer size used to read the data.
The following test:
Console.WriteLine(Test(2048).Length);
Console.WriteLine(Test(79000).Length);
Console.ReadLine();
Yields the following output:
131072
81341
The second figure, using the high buffer size, is the exact file size I was expecting (this file changes daily, so expect the size to differ after today's date). The first figure contains \0 for everything after the expected file size.
What's going on here?
You should change ms.GetBuffer(); to ms.ToArray();.
GetBuffer returns the entire underlying MemoryStream buffer, including unused capacity (which is where the trailing \0 bytes come from), while ToArray returns only the bytes actually written to the MemoryStream.
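A small example of the difference (the exact capacity is an implementation detail, so the 256 is only illustrative):
var ms = new MemoryStream();
ms.Write(new byte[100], 0, 100); // write 100 bytes
byte[] raw = ms.GetBuffer();     // the underlying buffer, e.g. 256 bytes, padded with \0
byte[] data = ms.ToArray();      // exactly 100 bytes, only what was written
Console.WriteLine("{0} vs {1}", raw.Length, data.Length);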

Appending to MemoryStream

I'm trying to append some data to a stream. This works well with FileStream, but not for MemoryStream due to the fixed buffer size.
The method which writes data to the stream is separated from the method which creates the stream (I've simplified it greatly in the below example). The method which creates the stream is unaware of the length of data to be written to the stream.
public void Foo()
{
    byte[] existingData = System.Text.Encoding.UTF8.GetBytes("foo");
    Stream s1 = new FileStream("someFile.txt", FileMode.Append, FileAccess.Write, FileShare.Read);
    s1.Write(existingData, 0, existingData.Length);
    Stream s2 = new MemoryStream(existingData, 0, existingData.Length, true);
    s2.Seek(0, SeekOrigin.End); // move to end of the stream for appending
    WriteUnknownDataToStream(s1);
    WriteUnknownDataToStream(s2); // NotSupportedException is thrown as the MemoryStream is not expandable
}
public static void WriteUnknownDataToStream(Stream s)
{
    // this is some example data for this SO query - the real data is generated elsewhere and is of a variable, and often large, size.
    byte[] newBytesToWrite = System.Text.Encoding.UTF8.GetBytes("bar"); // the length of this is not known before the stream is created.
    s.Write(newBytesToWrite, 0, newBytesToWrite.Length);
}
An idea I had was to send an expandable MemoryStream to the function, then append the returned data to the existing data.
public void ModifiedFoo()
{
    byte[] existingData = System.Text.Encoding.UTF8.GetBytes("foo");
    Stream s2 = new MemoryStream(); // expandable capacity memory stream
    WriteUnknownDataToStream(s2);
    // append the data which has been written into s2 to the existingData
    byte[] buffer = new byte[existingData.Length + s2.Length];
    Buffer.BlockCopy(existingData, 0, buffer, 0, existingData.Length);
    Stream merger = new MemoryStream(buffer, true);
    merger.Seek(existingData.Length, SeekOrigin.Begin);
    s2.Position = 0; // rewind s2 before copying, otherwise CopyTo copies nothing
    s2.CopyTo(merger);
}
Any better (more efficient) solutions?
A possible solution is not to limit the capacity of the MemoryStream in the first place.
If you do not know in advance the total number of bytes you will need to write, create a MemoryStream with unspecified capacity and use it for both writes.
byte[] existingData = System.Text.Encoding.UTF8.GetBytes("foo");
MemoryStream ms = new MemoryStream();
ms.Write(existingData, 0, existingData.Length);
WriteUnknownData(ms);
This will no doubt be less performant than initializing a MemoryStream from a byte[], but if you need to continue writing to the stream I believe it is your only option.
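If you do have a rough idea of the final size, you can still hint the initial capacity while keeping the stream expandable; estimatedTotalBytes here is a hypothetical estimate:
int estimatedTotalBytes = existingData.Length + 4096;    // hypothetical guess at the extra data
MemoryStream ms = new MemoryStream(estimatedTotalBytes); // capacity hint only; the stream can still grow
ms.Write(existingData, 0, existingData.Length);
WriteUnknownData(ms);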

.NET Micro Framework, reading files on a device with limited memory

On a ChipworkX device we would read files using:
File.ReadAllBytes(filename);
But if we try that on a NetDuino Plus which has a much smaller amount of memory,
we simply get an OutOfMemoryException.
The files are not that big, but I guess that's all relative in this case (1.5kb max).
What's the correct way to read files on a device like this?
Use a FileStream
using (var fileStream = new FileStream(filename, FileMode.Open))
{
    byte[] block = new byte[1024];
    int readLength;
    while ((readLength = fileStream.Read(block, 0, block.Length)) > 0)
    {
        Process(block, readLength);
    }
}
Write your own Process method. The block length of 1024 is just an example; read chunks as big as you can process at a time, and vary that depending on the data.
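For illustration only, Process could be something as simple as accumulating a running checksum over each chunk (a purely hypothetical example):
static int checksum;
static void Process(byte[] block, int length)
{
    // only the first 'length' bytes of 'block' are valid for this chunk
    for (int i = 0; i < length; i++)
    {
        checksum = (checksum + block[i]) & 0xFFFF;
    }
}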
I am assuming that you believe there should be sufficient memory. If so, I suspect that internal default buffer sizes are blowing things up. Try explicitly stating the buffer size when opening the file to keep it tight to the actual file length:
string path = //some path
byte[] buffer;
int bufferSize = (int)new FileInfo(path).Length;
using (FileStream fs = new FileStream(
    path, FileMode.Open, FileAccess.Read, FileShare.Read, bufferSize))
{
    buffer = new byte[bufferSize];
    fs.Read(buffer, 0, buffer.Length);
}
//do stuff with buffer
When you are using a device with limited memory, it is a good idea to use a buffer that is the size of a sector. What you are doing is trading speed for memory. When you have little memory, you must do things more slowly and a sector is the smallest unit you can use that makes any sense.
I suggest a buffer of 512 bytes.
