Creating a ZIP file larger than 150MB throws OutOfMemoryException in C#

There is a client requirement to create a zip file consisting of multiple files placed inside a tree-like folder structure. It contains at most 150 files. When these files exceed approximately 160MB in the memory stream, an OutOfMemoryException is thrown.
Is there a config setting to increase the memory allocated to this operation?
Are there any alternative ways to solve this?
Sample code
MemoryStream memStream = new MemoryStream();
using (var zipStream = new ZipOutputStream(memStream))
{
    // Keep memStream open after zipStream is disposed so the zip can still be read from it.
    zipStream.IsStreamOwner = false;

    foreach (FileModel fileToBeAddedInZip in listOfFiles)
    {
        byte[] fileBytes;
        fileBytes = //Read the file from DB

        var fileEntry = new ZipEntry(fileToBeAddedInZip.fileName)
        {
            Size = fileBytes.Length
        };

        zipStream.PutNextEntry(fileEntry);
        zipStream.Write(fileBytes, 0, fileBytes.Length);
        zipStream.CloseEntry();
    }

    // Finish() writes the central directory; Dispose() at the end of the using block closes the zip stream.
    zipStream.Finish();
}
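One way around the OutOfMemoryException, in line with the answers to the related questions below, is to avoid holding the whole archive in a MemoryStream at all and write it to a temporary file instead; ZipOutputStream accepts any writable stream. A minimal sketch, assuming the same listOfFiles and a hypothetical GetFileBytesFromDb helper standing in for the "Read the file from DB" step:

// Sketch: stream the archive to a temp file so the ~160MB never has to fit in one contiguous buffer.
// GetFileBytesFromDb is a hypothetical placeholder, not part of the original code.
string zipPath = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName() + ".zip");

using (var fileStream = File.Create(zipPath))
using (var zipStream = new ZipOutputStream(fileStream))
{
    zipStream.SetLevel(3);

    foreach (FileModel fileToBeAddedInZip in listOfFiles)
    {
        byte[] fileBytes = GetFileBytesFromDb(fileToBeAddedInZip); // hypothetical helper

        zipStream.PutNextEntry(new ZipEntry(fileToBeAddedInZip.fileName) { Size = fileBytes.Length });
        zipStream.Write(fileBytes, 0, fileBytes.Length);
        zipStream.CloseEntry();
    }
}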

Related

Out of memory exception when adding data to the database using Entity Framework

I have 27,000 images stored in a folder that need to be added to the database using Entity Framework. My code is:
var files = Directory.GetFiles(path, "*", SearchOption.AllDirectories);
foreach (var file in files)
{
    using (ApplicationContext db = new ApplicationContext())
    {
        Image img = Image.FromFile(file);
        var imgRes = ResizeImage(img, ImageSettings.Width, ImageSettings.Height);
        MemoryStream memoryStream = new MemoryStream();
        img.Save(memoryStream, ImageFormat.Png);
        var label = Directory.GetParent(file).Name;
        var bytes = memoryStream.ToArray();
        memoryStream.Close();
        db.Add(new ImageData { Image = bytes, Label = label });
        img.Dispose();
        memoryStream.Dispose();
        imgRes.Dispose();
    }
}
It only works when there are fewer than 10,000 images; otherwise I get an OutOfMemoryException.
How can I upload all 27,000 images to the database?
First of all, this code doesn't deal with entities or objects, so using an ORM doesn't help at all. That isn't what causes the OOM, though; it only makes the code a lot slower.
The real problem is that MemoryStream is a wrapper around a buffer. Once the buffer is full, a new one is allocated with double the size, the original data are copied over, and the old buffer is discarded. Growing a byte buffer to 50MB this way takes on the order of log2(50M) ≈ 26 reallocations, and it fragments the free memory to the point where the runtime can no longer allocate a large enough contiguous buffer. This causes OOMs with List<T> objects too, not just MemoryStreams.
The quick fix is to pass the expected size as the stream's capacity through the MemoryStream(Int32) constructor. This cuts down on reallocations and saves a lot of CPU cycles. The number doesn't have to be exact, just large enough to avoid too much garbage:
using (Image img = Image.FromFile(file))
using (var imgRes = ResizeImage(img, ImageSettings.Width, ImageSettings.Height))
using (var memoryStream = new MemoryStream(10_000_000))
{
    img.Save(memoryStream, ImageFormat.Png);
    var label = Directory.GetParent(file).Name;
    var bytes = memoryStream.ToArray();
    db.Add(new ImageData { Image = bytes, Label = label });
}
There's no need to close a MemoryStream; it's just a wrapper over an array. That still allocates a big buffer for each file, though.
If we know the maximum file size, we can allocate a single buffer before the loop and reuse it in every iteration. In this case the size matters, because it's no longer possible to resize the buffer:
var buffer = new byte[100_000_000]; // allocated once, before the loop

using (Image img = Image.FromFile(file))
using (var imgRes = ResizeImage(img, ImageSettings.Width, ImageSettings.Height))
using (var memoryStream = new MemoryStream(buffer))
{
    img.Save(memoryStream, ImageFormat.Png);
    var label = Directory.GetParent(file).Name;

    // Copy only the bytes actually written; ToArray() on a fixed-size
    // MemoryStream would copy the entire 100MB buffer.
    var bytes = new byte[memoryStream.Position];
    Array.Copy(buffer, bytes, bytes.Length);

    db.Add(new ImageData { Image = bytes, Label = label });
}
Try to avoid creating the ApplicationContext inside the foreach loop:
using (ApplicationContext db = new ApplicationContext())
{
    foreach (var file in files)
    {
        using (var memoryStream = new MemoryStream())
        using (Image img = Image.FromFile(file))
        using (var imgRes = ResizeImage(img, ImageSettings.Width, ImageSettings.Height))
        {
            img.Save(memoryStream, ImageFormat.Png);
            var label = Directory.GetParent(file).Name;
            var bytes = memoryStream.ToArray();
            db.Add(new ImageData { Image = bytes, Label = label });
        }
    }
}
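Neither snippet above ever calls SaveChanges, so nothing is actually persisted, and with a single long-lived context every added entity also stays in the change tracker. A hedged sketch of saving in batches; the batch size is arbitrary and ChangeTracker.Clear() only exists in EF Core 5 and later, so treat both as assumptions rather than part of the original answers:

int pending = 0;
using (var db = new ApplicationContext())
{
    foreach (var file in files)
    {
        using (var img = Image.FromFile(file))
        using (var imgRes = ResizeImage(img, ImageSettings.Width, ImageSettings.Height))
        using (var memoryStream = new MemoryStream(10_000_000))
        {
            img.Save(memoryStream, ImageFormat.Png);
            db.Add(new ImageData { Image = memoryStream.ToArray(), Label = Directory.GetParent(file).Name });
        }

        if (++pending % 500 == 0)     // arbitrary batch size
        {
            db.SaveChanges();
            db.ChangeTracker.Clear(); // EF Core 5+ only; detaches the already-saved entities
        }
    }

    db.SaveChanges();                 // persist the final partial batch
}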

Zipping a number of potentially large files in chunks to avoid large memory consumption

I am working on an application that takes a list of file keys for files on AWS S3 as input and then creates a zip file back on AWS S3 with all of those files inside. The compression part does not matter - the important part is to have a single zip file containing all of the other files.
To be able to run the application on a server without much memory or file storage space, I was thinking of using the API that allows fetching a byte range from a file on S3 (https://docs.aws.amazon.com/AmazonS3/latest/API/RESTObjectGET.html) to download the files in chunks, then adding them to the zip file and uploading each chunk using the multipart upload API (https://docs.aws.amazon.com/AmazonS3/latest/dev/uploadobjusingmpu.html).
I have tried to make a small sample app that simulates how it could work (without actually calling the S3 APIs yet), but it gets stuck on this line: "await zipStream.WriteAsync(inBuffer, 0, currentChunk);"
public static async Task Main(string[] args)
{
    const int ChunkSize = 5 * 1024 * 1024;

    using (var fileOutputStream = new FileStream("/Users/SPE/Downloads/BG_K01.zip", FileMode.Create))
    using (var fileInputStream = File.Open("/Users/SPE/Downloads/BG_K01.rvt", FileMode.Open))
    {
        long fileSize = new FileInfo("/Users/SPE/Downloads/BG_K01.rvt").Length;
        int readBytes = 0;

        using (AnonymousPipeServerStream pipeServer = new AnonymousPipeServerStream())
        using (AnonymousPipeClientStream pipeClient = new AnonymousPipeClientStream(pipeServer.GetClientHandleAsString()))
        using (var zipArchive = new ZipArchive(pipeServer, ZipArchiveMode.Create, true))
        {
            var zipEntry = zipArchive.CreateEntry("BG_K01.rvt", CompressionLevel.NoCompression);
            using (var zipStream = zipEntry.Open())
            {
                // Simulate receiving and sending a chunk of bytes
                while (readBytes < fileSize)
                {
                    var currentChunk = (int)Math.Min(ChunkSize, fileSize - readBytes);
                    var inBuffer = new byte[currentChunk];
                    var outBuffer = new byte[currentChunk];

                    await fileInputStream.ReadAsync(inBuffer, 0, currentChunk);
                    await zipStream.WriteAsync(inBuffer, 0, currentChunk);
                    await pipeClient.ReadAsync(outBuffer, 0, currentChunk);
                    await fileOutputStream.WriteAsync(outBuffer, 0, currentChunk);

                    readBytes += currentChunk;
                }
            }
        }
    }
}
I am also not sure if using the pipe streams is the best way to do this, but my hope is that they will release any memory consumed once the stream has been read, and thereby keep the memory consumption very low.
Does anybody know why writing to the zipStream hangs?
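The thread excerpt above does not include an answer, but the hang is most likely the pipe itself: an anonymous pipe has a small fixed-size buffer, and nothing reads from pipeClient while zipStream is writing a 5MB chunk into pipeServer, because the read only happens later in the same sequential loop. Once the pipe buffer fills, WriteAsync blocks forever. A hedged sketch of draining the pipe on a separate task so reads and writes overlap; it reuses the paths and entry name from the question, and the details are assumptions rather than a tested answer:

using (var fileOutputStream = new FileStream("/Users/SPE/Downloads/BG_K01.zip", FileMode.Create))
using (var pipeServer = new AnonymousPipeServerStream(PipeDirection.Out))
using (var pipeClient = new AnonymousPipeClientStream(PipeDirection.In, pipeServer.GetClientHandleAsString()))
{
    // Reader task: continuously copy whatever arrives on the pipe to the output file.
    Task pump = pipeClient.CopyToAsync(fileOutputStream);

    using (var zipArchive = new ZipArchive(pipeServer, ZipArchiveMode.Create, true))
    using (var fileInputStream = File.OpenRead("/Users/SPE/Downloads/BG_K01.rvt"))
    using (var zipStream = zipArchive.CreateEntry("BG_K01.rvt", CompressionLevel.NoCompression).Open())
    {
        await fileInputStream.CopyToAsync(zipStream); // the pump keeps draining the pipe, so writes never stall
    }

    pipeServer.Dispose(); // close the write end so the reader sees end-of-stream
    await pump;
}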

Compressing C# objects using SharpZipLib?

Is it possible to compress a List<T> using SharpZipLib?
Turning the List into a byte array gives me around 60,000 bytes (uncompressed).
Compressing this with System.IO.Compression.DeflateStream gives me a compression ratio of about 1/3, but this is far from enough.
The purpose is to store the collections in the (MS SQL) database as a byte[], because saving them as individual rows uses too much space (1 million rows/day).
Thanks
Edit:
List<ItemLog> itemLogs = new List<ItemLog>();
//populate with 1000 ItemLogs

byte[] array = null; //original byte array

BinaryFormatter bf = new BinaryFormatter();
MemoryStream ms = new MemoryStream();
bf.Serialize(ms, itemLogs);
array = ms.ToArray();
The array size is now 60,000 bytes.
Zip the collection using a ZipOutputStream:
MemoryStream outputMemoryStream = new MemoryStream();
ZipOutputStream zipStream = new ZipOutputStream(outputMemoryStream);
zipStream.SetLevel(3);
ZipEntry entry = new ZipEntry("logs");
entry.DateTime = DateTime.Now;
zipStream.PutNextEntry(entry);
StreamUtils.Copy(ms, zipStream, new byte[4096]);
zipStream.CloseEntry();
zipStream.IsStreamOwner = false;
zipStream.Close();
outputMemoryStream.Position = 0;
byte[] compressed = outputMemoryStream.ToArray();
The compressed array is now 164 bytes in size. <- surely that length is not valid/possible?
Uncompressing gives me an empty array. But as the compression is not right, I will skip the decompression code for now.
I do not see any real problem in your code. The only place the problem could be is the copying of the data: is the input stream positioned at the start of the data that should be stored? Try adding the following line:
ms.Seek(0, SeekOrigin.Begin); // added line
StreamUtils.Copy(ms, zipStream, new byte[4096]);
Based on your code I wrote a simple compress function and it works as expected.
private static byte[] Compress(byte[] source)
{
    byte[] compressed;

    using (var memory = new MemoryStream())
    using (var zipped = new ZipOutputStream(memory))
    {
        zipped.IsStreamOwner = false;
        zipped.SetLevel(9);

        var entry = new ZipEntry("data")
        {
            DateTime = DateTime.Now
        };
        zipped.PutNextEntry(entry);

#if true
        zipped.Write(source, 0, source.Length);
#else
        using (var src = new MemoryStream(source))
        {
            StreamUtils.Copy(src, zipped, new byte[4096]);
        }
#endif

        zipped.Close();
        compressed = memory.ToArray();
    }

#if false
    using (var file = new FileStream("test.zip", FileMode.Create, FileAccess.Write, FileShare.Read))
    {
        file.Write(compressed, 0, compressed.Length);
    }
#endif

    return compressed;
}
The #if/#else block gives you two alternatives for writing the data (a direct Write or StreamUtils.Copy), and the disabled block at the end saves the compressed data to a file so you can check the real content of the compressed data in an external application.
My test data was 256 bytes long (data with a low compression rate), and the resulting file was 407 bytes.
Try the direct Write on the array, or save the stream and check its content.
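For the decompression side (the question mentions getting an empty array back), here is a hedged counterpart using ZipInputStream, written to mirror the Compress function above; it is not part of the original answer:

private static byte[] Decompress(byte[] compressed)
{
    using (var input = new MemoryStream(compressed))
    using (var zipped = new ZipInputStream(input))
    using (var output = new MemoryStream())
    {
        ZipEntry entry = zipped.GetNextEntry(); // advances to the "data" entry written by Compress
        if (entry == null)
        {
            return Array.Empty<byte>();
        }

        StreamUtils.Copy(zipped, output, new byte[4096]);
        return output.ToArray();
    }
}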

Zip files and attach them to MailMessage without saving a file

I'm working on a little C# ASP.NET web app that pulls 3 files from my server, creates a zip of those files, and sends the zip file to an e-mail recipient.
The problem I'm having is finding a way to combine those 3 files without creating a zip file on the hard drive of the server. I think I need to use some sort of MemoryStream or FileStream, but I'm a little beyond my understanding when it comes to merging them into one zip file. I've tried SharpZipLib and DotNetZip, but I haven't been able to figure it out.
The reason I don't want the zip saved locally is that there might be a number of users on this app at once, and I don't want to clog up my server with those zips. I'm looking for two answers: how to zip files without saving the zip as a file, and how to attach that zip to a MailMessage.
Check this example for SharpZipLib:
https://github.com/icsharpcode/SharpZipLib/wiki/Zip-Samples#wiki-anchorMemory
using ICSharpCode.SharpZipLib.Zip;

// Compresses the supplied memory stream, naming it as zipEntryName, into a zip,
// which is returned as a memory stream or a byte array.
public MemoryStream CreateToMemoryStream(MemoryStream memStreamIn, string zipEntryName)
{
    MemoryStream outputMemStream = new MemoryStream();
    ZipOutputStream zipStream = new ZipOutputStream(outputMemStream);

    zipStream.SetLevel(3); //0-9, 9 being the highest level of compression

    ZipEntry newEntry = new ZipEntry(zipEntryName);
    newEntry.DateTime = DateTime.Now;
    zipStream.PutNextEntry(newEntry);

    StreamUtils.Copy(memStreamIn, zipStream, new byte[4096]);
    zipStream.CloseEntry();

    zipStream.IsStreamOwner = false; // False stops the Close also Closing the underlying stream.
    zipStream.Close();               // Must finish the ZipOutputStream before using outputMemStream.

    outputMemStream.Position = 0;
    return outputMemStream;

    // Alternative outputs:
    // ToArray is the cleaner and easiest to use correctly, with the penalty of duplicating allocated memory.
    //   byte[] byteArrayOut = outputMemStream.ToArray();
    // GetBuffer returns a raw buffer, so you need to account for the true length yourself.
    //   byte[] byteArrayOut = outputMemStream.GetBuffer();
    //   long len = outputMemStream.Length;
}
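To cover the second half of the question (attaching the zip to a MailMessage), here is a hedged usage sketch for the helper above; the file path, addresses, and SMTP host are placeholders, not from the original answer:

// Build the in-memory zip and attach it; nothing is written to disk.
byte[] fileBytes = File.ReadAllBytes(@"C:\reports\report.pdf"); // placeholder path
MemoryStream zipStream = CreateToMemoryStream(new MemoryStream(fileBytes), "report.pdf");

var message = new MailMessage("from@example.com", "to@example.com", "Your files", "See the attached zip.");
message.Attachments.Add(new Attachment(zipStream, "files.zip", MediaTypeNames.Application.Zip));

using (var smtp = new SmtpClient("smtp.example.com")) // placeholder host
{
    smtp.Send(message);
}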
Try this:
public static Attachment CreateAttachment(string fileNameAndPath, bool zipIfTooLarge = true, int bytes = 1 << 20)
{
    if (!zipIfTooLarge)
    {
        return new Attachment(fileNameAndPath);
    }

    var fileInfo = new FileInfo(fileNameAndPath);

    // Less than 1MB: just attach as is.
    if (fileInfo.Length < bytes)
    {
        return new Attachment(fileNameAndPath);
    }

    byte[] fileBytes = File.ReadAllBytes(fileNameAndPath);

    using (var memoryStream = new MemoryStream())
    {
        string fileName = Path.GetFileName(fileNameAndPath);

        using (var zipArchive = new ZipArchive(memoryStream, ZipArchiveMode.Create))
        {
            ZipArchiveEntry zipArchiveEntry = zipArchive.CreateEntry(fileName, CompressionLevel.Optimal);

            // Write the raw bytes; routing them through a StreamWriter and
            // Encoding.Default would corrupt binary files.
            using (var entryStream = zipArchiveEntry.Open())
            {
                entryStream.Write(fileBytes, 0, fileBytes.Length);
            }
        }

        var attachmentStream = new MemoryStream(memoryStream.ToArray());
        string zipname = $"{Path.GetFileNameWithoutExtension(fileName)}.zip";

        return new Attachment(attachmentStream, zipname, MediaTypeNames.Application.Zip);
    }
}

Create new FileStream out of a byte array

I am attempting to create a new FileStream object from a byte array. I'm sure that made no sense at all, so I will try to explain in further detail below.
Tasks I am completing:
1) Reading the source file, which was previously compressed
2) Decompressing the data using GZipStream
3) Copying the decompressed data into a byte array
What I would like to change:
1) I would like to be able to use File.ReadAllBytes to read the decompressed data.
2) I would then like to create a new FileStream object using this byte array.
In short, I want to do this entire operation using byte arrays. One of the parameters for GZipStream is a stream of some sort, so I figured I was stuck using a FileStream. But if some method exists where I can create a new instance of a FileStream from a byte array, then I should be fine.
Here is what I have so far:
FolderBrowserDialog fbd = new FolderBrowserDialog(); // Shows a browser dialog
fbd.ShowDialog();

// Path to directory of files to compress and decompress.
string dirpath = fbd.SelectedPath;
DirectoryInfo di = new DirectoryInfo(dirpath);

foreach (FileInfo fi in di.GetFiles())
{
    zip.Program.Decompress(fi);
}

// Get the stream of the source file.
using (FileStream inFile = fi.OpenRead())
{
    // Create the decompressed file.
    string outfile = @"C:\Decompressed.exe";

    using (GZipStream Decompress = new GZipStream(inFile, CompressionMode.Decompress))
    {
        byte[] b = new byte[blen.Length];
        Decompress.Read(b, 0, b.Length);
        File.WriteAllBytes(outfile, b);
    }
}
Thanks for any help!
Regards,
Evan
It sounds like you need to use a MemoryStream.
Since you don't know how many bytes you'll be reading from the GZipStream, you can't really allocate an array for it. You need to read it all into a byte array and then use a MemoryStream to decompress.
const int BufferSize = 65536;

byte[] compressedBytes = File.ReadAllBytes("compressedFilename");

// Create a memory stream over the compressed bytes.
using (var mstrm = new MemoryStream(compressedBytes))
using (var inStream = new GZipStream(mstrm, CompressionMode.Decompress))
using (var outStream = File.Create("outputfilename"))
{
    var buffer = new byte[BufferSize];
    int bytesRead;
    while ((bytesRead = inStream.Read(buffer, 0, BufferSize)) != 0)
    {
        outStream.Write(buffer, 0, bytesRead);
    }
}
Here is what I ended up doing. I realize that I did not give sufficient information in my question - and I apologize for that - but I do know the size of the file I need to decompress as I am using it earlier in my program. This buffer is referred to as "blen".
string fi = @"C:\Path To Compressed File";

// Get the stream of the source file.
// using (FileStream inFile = fi.OpenRead())
using (MemoryStream infile1 = new MemoryStream(File.ReadAllBytes(fi)))
{
    // Create the decompressed file.
    string outfile = @"C:\Decompressed.exe";

    using (GZipStream Decompress = new GZipStream(infile1, CompressionMode.Decompress))
    {
        byte[] b = new byte[blen.Length];
        Decompress.Read(b, 0, b.Length);
        File.WriteAllBytes(outfile, b);
    }
}
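One caveat with this approach, not raised in the thread: a single Read call on a GZipStream is not guaranteed to fill the whole buffer, so large files can come out truncated. Copying in a loop, or letting CopyTo do it, avoids that; a minimal sketch using the same paths as above:

using (var input = new MemoryStream(File.ReadAllBytes(fi)))
using (var gzip = new GZipStream(input, CompressionMode.Decompress))
using (var output = File.Create(@"C:\Decompressed.exe"))
{
    gzip.CopyTo(output); // handles partial reads internally, no need to know the decompressed size up front
}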
