Creating a List of objects with byte arrays: OutOfMemoryException - C#

I have a .NET Core 1.1 application that is having a problem when generating a List of objects that have a byte array in them. If there are more than 20 items in the list (arbitrary; I'm not sure of the exact number or size at which it fails), the method throws an OutOfMemoryException. The method is below:
public async Task<List<Blob>> GetBlobsAsync(string container)
{
    List<Blob> retVal = new List<Blob>();
    Blob itrBlob;
    BlobContinuationToken continuationToken = null;
    BlobResultSegment resultSegment = null;
    CloudBlobContainer cont = _cbc.GetContainerReference(container);
    resultSegment = await cont.ListBlobsSegmentedAsync(String.Empty, true, BlobListingDetails.Metadata, null, continuationToken, null, null);
    do
    {
        foreach (var bItem in resultSegment.Results)
        {
            var iBlob = bItem as CloudBlockBlob;
            itrBlob = new Blob()
            {
                Contents = new byte[iBlob.Properties.Length],
                Name = iBlob.Name,
                ContentType = iBlob.Properties.ContentType
            };
            await iBlob.DownloadToByteArrayAsync(itrBlob.Contents, 0);
            retVal.Add(itrBlob);
        }
        continuationToken = resultSegment.ContinuationToken;
    } while (continuationToken != null);
    return retVal;
}
I'm not using anything that can really be disposed in the method. Is there a better way to accomplish this? The ultimate goal is to pull all of these files and then create a ZIP archive. This process works as long as I don't breach some size threshold.
If it helps, the application is accessing Azure Block Blob Storage from an Azure Web Application instance. Maybe there is a setting I need to adjust to increase a threshold?
The exception is thrown when the Blob() object is instantiated.
EDIT:
So the question as posted was admittedly light on detail. The problem container has 30 files (mostly large text files that compress well). The total size of the container is 971MB. The request runs for approximately 40 seconds before reporting an HTTP 500 error and the referenced exception.
When I debug locally and step through the same operation it succeeds, resulting in a 237MB zip file. During the operation I can see the memory usage shoot over 2GB by the time the list is created.
I tried to abstract the interaction with blob storage into its own service, but perhaps I've made this more difficult for myself than necessary.

Found these two code samples that illustrate the concept well and support your use case:
get list of block blobs in blob container and create ZipOutputStream on-the-fly
add each block blob to a ZipOutputStream (SharpZipLib) writing to Response.OutputStream
ZIP compression level:
zipOutputStream.SetLevel(3); //0-9, 9 being the highest level of compression
End-to-end example using ASP.NET WebApi
the ZIP feature can be added to this well-structured application
Further reading
https://www.strathweb.com/2012/09/dealing-with-large-files-in-asp-net-web-api/
https://www.strathweb.com/2013/01/asynchronously-streaming-video-with-asp-net-web-api/
WebAPI StreamContent vs PushStreamContent
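To make the streaming approach above concrete, here is a minimal sketch, assuming the same WindowsAzure.Storage client types as the question and SharpZipLib's ZipOutputStream; the method name and the way the response stream is obtained are placeholders rather than code from the linked samples:
using ICSharpCode.SharpZipLib.Zip;
using Microsoft.WindowsAzure.Storage.Blob;

// Sketch only: stream each block blob straight into a ZipOutputStream wrapped around
// the response body, so no blob is ever held fully in memory.
public async Task StreamContainerAsZipAsync(CloudBlobContainer container, Stream responseStream)
{
    using (var zipOutputStream = new ZipOutputStream(responseStream))
    {
        zipOutputStream.IsStreamOwner = false; // leave the response stream open
        zipOutputStream.SetLevel(3); // 0-9, 9 being the highest level of compression
        BlobContinuationToken continuationToken = null;
        do
        {
            var segment = await container.ListBlobsSegmentedAsync(
                String.Empty, true, BlobListingDetails.None, null, continuationToken, null, null);
            foreach (var item in segment.Results)
            {
                if (item is CloudBlockBlob blob)
                {
                    zipOutputStream.PutNextEntry(new ZipEntry(blob.Name));
                    // The storage client copies the blob into the zip entry in chunks.
                    await blob.DownloadToStreamAsync(zipOutputStream);
                    zipOutputStream.CloseEntry();
                }
            }
            continuationToken = segment.ContinuationToken;
        } while (continuationToken != null);
        zipOutputStream.Finish();
    }
}
Because the archive is written directly to the response, memory usage stays roughly at the size of a single compression buffer instead of the whole container.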

Using Sascha's answer, I was able to make a compromise that seems to perform decently given the parameters. Probably not perfect, but it cuts the memory usage by nearly 70% and allows me to keep some abstraction.
I added a method to my blob service called GetBlobsAsZipAsync that accepts a container name as an argument:
public async Task<Stream> GetBlobsAsZipAsync(string container)
{
    BlobContinuationToken continuationToken = null;
    BlobResultSegment resultSegment = null;
    byte[] buffer = new byte[4194304];
    MemoryStream ms = new MemoryStream();
    CloudBlobContainer cont = _cbc.GetContainerReference(container);
    resultSegment = await cont.ListBlobsSegmentedAsync(String.Empty, true, BlobListingDetails.Metadata, null, continuationToken, null, null);
    using (var za = new ZipArchive(ms, ZipArchiveMode.Create, true))
    {
        do
        {
            foreach (var bItem in resultSegment.Results)
            {
                var iBlob = bItem as CloudBlockBlob;
                var ze = za.CreateEntry(iBlob.Name);
                using (var fs = await iBlob.OpenReadAsync())
                {
                    using (var dest = ze.Open())
                    {
                        int count = await fs.ReadAsync(buffer, 0, buffer.Length);
                        while (count > 0)
                        {
                            await dest.WriteAsync(buffer, 0, count);
                            count = await fs.ReadAsync(buffer, 0, buffer.Length);
                        }
                    }
                }
            }
            continuationToken = resultSegment.ContinuationToken;
        } while (continuationToken != null);
    }
    return ms;
}
This returns the ZIP as a (closed) MemoryStream, which is then returned as an array using a FileResult:
[HttpPost]
public async Task<IActionResult> DownloadFiles(string container, int projectId, int? profileId)
{
    MemoryStream ms = null;
    _ctx.Add(new ProjectDownload() { ProfileId = profileId, ProjectId = projectId });
    await _ctx.SaveChangesAsync();
    using (ms = (MemoryStream)await _blobs.GetBlobsAsZipAsync(container))
    {
        return File(ms.ToArray(), "application/zip", "download.zip");
    }
}
I hope this is useful to someone else who just needs a push in the right direction. I took a lazy way out on this originally and it came back to bite me.

Related

Extract the file header signature as it is being streamed directly to disk in ASP.NET Core

I have an API method that streams uploaded files directly to disk to be scanned with a virus checker. Some of these files can be quite large, so IFormFile is a no go:
Any single buffered file exceeding 64 KB is moved from memory to a temp file on disk.
Source: https://learn.microsoft.com/en-us/aspnet/core/mvc/models/file-uploads?view=aspnetcore-3.1
I have a working example that uses multipart/form-data and a really nice NuGet package that takes the headache out of working with multipart/form-data, and it works well. However, I want to add a file header signature check to make sure that the file type declared by the client is actually what they say it is. I can't rely on the file extension to do this securely, but I can use the file header signature to make it at least a bit more secure. Since I'm streaming directly to disk, how can I extract the first bytes as they go through the file stream?
[DisableFormValueModelBinding] // required to disable form model binding so the body can be streamed
[ValidateMimeMultipartContent] // simple check to make sure this is a multipart form
[FileUploadOperation(typeof(SwaggerFileItem))] // used to define the Swagger schema
[RequestSizeLimit(31457280)] // 30MB
[RequestFormLimits(MultipartBodyLengthLimit = 31457280)]
public async Task<IActionResult> PostAsync([FromRoute] int customerId)
{
    // placeholders
    var uploadLocation = string.Empty;
    var trustedFileNameForDisplay = string.Empty;

    // this is using a NuGet package that does the hard work of reading the multipart form-data.... using UploadStream;
    var model = await this.StreamFiles<FileItem>(async x =>
    {
        // never trust the client
        trustedFileNameForDisplay = WebUtility.HtmlEncode(Path.GetFileName(x.FileName));

        // determine the quarantine location
        uploadLocation = GetUploadLocation(trustedFileNameForDisplay);

        // stream the input stream to the file stream
        // importantly this should never load the file into memory
        // it should be a straight pass through to disk
        await using var fs = System.IO.File.Create(uploadLocation, BufSize);

        // --> How do I extract the file signature? I.e. a copy of the header bytes as it is being streamed??? <--
        await x.OpenReadStream().CopyToAsync(fs);
    });

    // The model state can now be checked
    if (!ModelState.IsValid)
    {
        // delete the file
        DeleteFileIfExists(uploadLocation);

        // return a bad request
        ThrowProblemDetails(ModelState, StatusCodes.Status400BadRequest);
    }

    // map as much as we can
    var request = _mapper.Map<CreateAttachmentRequest>(model);

    // map the remaining properties
    request.CustomerId = customerId;
    request.UploadServer = Environment.MachineName;
    request.uploadLocation = uploadLocation;
    request.FileName = trustedFileNameForDisplay;

    // call mediator with this request to send it over WCF to Pulse Core.
    var result = await _mediator.Send(request);

    // build the response
    var response = new FileResponse { Id = result.FileId, CustomerId = customerId, ExternalId = request.ExternalId };

    // return a 201 with the appropriate response
    return CreatedAtAction(nameof(GetFile), new { fileId = response.Id, customerId = response.CustomerId }, response);
}
The section I'm stuck on is around the line await x.OpenReadStream().CopyToAsync(fs);. I would like to pull out the file header here as the stream is being copied to the FileStream. Is there a way to add some kind of inspector? I don't want to read the entire stream again, just the header.
Update
Based on the answer given by @Ackdari, I have successfully switched the code to extract the header from the uploaded file stream. I don't know if this could be made any more efficient, but it does work:
//...... removed for clarity
var model = await this.StreamFiles<FileItem>(async x =>
{
    trustedFileNameForDisplay = WebUtility.HtmlEncode(Path.GetFileName(x.FileName));
    quarantineLocation = QuarantineLocation(trustedFileNameForDisplay);
    await using (var fs = System.IO.File.Create(quarantineLocation, BufSize))
    {
        await x.OpenReadStream().CopyToAsync(fs);
        fileFormat = await FileHelpers.GetFileFormatFromFileHeader(fs);
    }
});
//...... removed for clarity
and
// using https://github.com/AJMitev/FileTypeChecker
public static async Task<IFileType> GetFileFormatFromFileHeader(FileStream fs)
{
    IFileType fileFormat = null;
    fs.Position = 0;
    var headerData = new byte[40];
    var bytesRead = await fs.ReadAsync(headerData, 0, 40);
    if (bytesRead > 0)
    {
        await using (var ms = new MemoryStream(headerData))
        {
            if (!FileTypeValidator.IsTypeRecognizable(ms))
            {
                return null;
            }
            fileFormat = FileTypeValidator.GetFileType(ms);
        }
    }
    return fileFormat;
}
You may want to consider reading the header yourself, depending on which file type is expected:
int n = 4; // length of the expected header
var headerData = new byte[n];
var bytesRead = 0;

// here, x is the opened read stream of the upload (x.OpenReadStream() in the question's code)
while (bytesRead < n)
    bytesRead += await x.ReadAsync(headerData.AsMemory(bytesRead));

// validate the signature before anything else is written to disk
CheckHeader(headerData);

// write the already-consumed header bytes, then stream the rest of the upload
await fs.WriteAsync(headerData.AsMemory());
await x.CopyToAsync(fs);
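CheckHeader is a placeholder in the snippet above; a hypothetical sketch of what it could look like when a PDF upload is expected (the 4-byte %PDF magic number matches the header length read above, and the method name and exception choice are illustrative, not from the original answer):
// Hypothetical CheckHeader: compares the bytes read above against the 4-byte "%PDF"
// signature and rejects the upload before anything is written to disk.
private static readonly byte[] PdfSignature = { 0x25, 0x50, 0x44, 0x46 }; // "%PDF"

private static void CheckHeader(byte[] headerData)
{
    if (headerData.Length < PdfSignature.Length ||
        !headerData.AsSpan(0, PdfSignature.Length).SequenceEqual(PdfSignature))
    {
        throw new InvalidDataException("File header does not match the expected signature.");
    }
}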

Unable to stream file upload through controller method

I'm trying to effectively proxy a file upload via an ASP.NET Core 5 MVC controller to another API:
[DisableFormValueModelBinding]
[HttpPost]
public async Task<IActionResult> Upload()
{
    var reader = new MultipartReader(Request.GetMultipartBoundary(), Request.Body);
    MultipartSection section;
    while ((section = await reader.ReadNextSectionAsync().ConfigureAwait(false)) != null)
    {
        if (section.ContentType == "application/json")
        {
            await SendFile(section.Body);
        }
    }
    return View("Upload");
}

private async Task SendFile(Stream stream)
{
    var request = new HttpRequestMessage(HttpMethod.Post, "http://blah/upload");
    request.Content = new StreamContent(stream);
    var response = await httpClient.SendAsync(request);
}
However, the receiving API always gets an empty stream.
I can confirm the SendFile method works as the following test works from within the controller method:
using (var fs = new FileStream("test.json", FileMode.Open))
{
    await SendFile(fs);
}
And I can see the uploaded file if I try to read it in the controller:
var buf = new char[256];
using (var sr = new StreamReader(section.Body))
{
    var x = await sr.ReadAsync(buf, 0, buf.Length);
    while (x > 0)
    {
        log.Debug(new string(buf));
        x = await sr.ReadAsync(buf, 0, buf.Length);
    }
}
So both ends seem to work, just not together.
I have EnableBuffering set:
app.Use(next => context =>
{
    context.Request.EnableBuffering();
    return next(context);
});
And I'm disabling binding of the uploaded files to the model using the DisableFormValueModelBindingAttribute example from Upload files in ASP.NET Core.
I've also tried rewinding the stream manually using Seek, but it doesn't make a difference.
It works if I copy it through a MemoryStream:
using (var ms = new MemoryStream())
{
    await section.Body.CopyToAsync(ms);
    await ms.FlushAsync();
    ms.Seek(0, SeekOrigin.Begin);
    await SendFile(ms);
}
However, this buffers the file in memory which is not suitable for large files.
It also works if I read the uploaded file first, rewind and then try:
var buf = new char[256];
using (var sr = new StreamReader(section.Body))
{
    var x = await sr.ReadAsync(buf, 0, buf.Length);
    while (x > 0)
    {
        log.Debug(new string(buf));
        x = await sr.ReadAsync(buf, 0, buf.Length);
    }
}
section.Body.Seek(0, SeekOrigin.Begin);

// this works now:
await SendFile(section.Body);
Again, this is not suitable for large files.
It seems the stream is not in the correct state to be consumed by my SendFile method but I cannot see why.
UPDATE
Based on comments from Jeremy Lakeman I took a closer look at what was happening with the stream length.
I discovered that removing EnableBuffering makes it work as expected, so the issue is sort of resolved by that.
However, I came across this aspnetcore GitHub comment where a contributor states that:
We don't support flowing the Request Body through as a stream to HttpClient.
That and the other comments in that issue support Jeremy's comments about CanSeek and the stream length, and it's unclear (to me) whether this should actually work or whether it's just a coincidence that it now does (i.e. will I get hit with another gotcha later).
In this specific scenario with MIME multipart, where we don't know the stream length without buffering/counting the whole file, is there an alternative to StreamContent or a different way to handle the file upload?
The Microsoft docs page Upload files in ASP.NET Core only advises using an alternative approach. It talks about streaming uploads; however, it stops short of properly consuming the stream and just buffers the file into a MemoryStream (completely defeating the purpose of streaming).
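One option, sketched here under my own assumptions rather than anything the linked issue prescribes, is to wrap the section body in a custom HttpContent that reports no length, so HttpClient falls back to chunked transfer encoding instead of needing to know the stream length up front:
// Sketch of a push-style HttpContent: the multipart section body is copied into the
// outgoing request only when the request is serialized, so nothing is buffered.
public class ForwardingStreamContent : HttpContent
{
    private readonly Stream _source;

    public ForwardingStreamContent(Stream source) => _source = source;

    protected override async Task SerializeToStreamAsync(Stream stream, TransportContext context)
    {
        // Copy straight from the upload stream to the outgoing request body.
        await _source.CopyToAsync(stream);
    }

    protected override bool TryComputeLength(out long length)
    {
        // Unknown length forces chunked transfer encoding.
        length = -1;
        return false;
    }
}
It would be used in place of StreamContent, e.g. request.Content = new ForwardingStreamContent(section.Body);. Whether the request stream cooperates still depends on the buffering behaviour discussed above, so treat this as a starting point rather than a guaranteed fix.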

Executing a process on IIS makes RAM go up really quickly

I built an ASP.NET MVC API hosted on IIS on Windows 10 Pro (a VM on Azure with 4GB RAM and 2 CPUs). Within it I call an .exe (wkhtmltopdf) to convert an HTML page to an image and save it locally. Everything works fine, except that after some calls to the API, RAM usage goes crazy. Investigating the process with Task Manager, I saw a process called IIS Worker Process that adds more RAM every time the API is called. Of course I wrapped my System.Diagnostics.Process instance usage inside a using statement so it gets disposed (it implements IDisposable), but it still consumes more and more RAM, and after a while the server becomes laggy and unresponsive (it has only 4GB of RAM after all). I noticed that after some number of minutes (10-15-20 maybe) this IIS Worker Process calms down in terms of RAM usage. Here is my code, pretty straightforward:
Gets base64 encoded url
Decodes it
Uses wkhtmltoimage.exe to convert it to image
Saves it locally
Reads the byte array
Creates a blob in Azure with the image
Returns json with the url
public async Task<ActionResult> Index(string url)
{
    object oJSON = new { url = string.Empty };
    if (!string.IsNullOrEmpty(value: url))
    {
        try
        {
            byte[] EncodedData = Convert.FromBase64String(s: url);
            string DecodedURL = Encoding.UTF8.GetString(bytes: EncodedData);
            using (Process proc = new Process())
            {
                proc.StartInfo.FileName = wkhtmltopdfExecutablePath;
                proc.StartInfo.Arguments = $"--encoding utf-8 \"{DecodedURL}\" {LocalImageFilePath}";
                proc.Start();
                proc.WaitForExit();
                oJSON = new { procStatusCode = proc.ExitCode };
            }
            if (System.IO.File.Exists(path: LocalImageFilePath))
            {
                byte[] pngBytes = System.IO.File.ReadAllBytes(path: LocalImageFilePath);
                System.IO.File.Delete(path: LocalImageFilePath);
                string ImageURL = await CreateBlob(blobName: $"{BlobName}.png", data: pngBytes);
                oJSON = new { url = ImageURL };
            }
        }
        catch (Exception ex)
        {
            Debug.WriteLine(value: ex);
        }
    }
    return Json(data: oJSON, behavior: JsonRequestBehavior.AllowGet);
}

private async Task<string> CreateBlob(string blobName, byte[] data)
{
    string ConnectionString = "DefaultEndpointsProtocol=https;AccountName=" + AzureStorrageAccountName + ";AccountKey=" + AzureStorageAccessKey + ";EndpointSuffix=core.windows.net";
    CloudStorageAccount cloudStorageAccount = CloudStorageAccount.Parse(connectionString: ConnectionString);
    CloudBlobClient cloudBlobClient = cloudStorageAccount.CreateCloudBlobClient();
    CloudBlobContainer cloudBlobContainer = cloudBlobClient.GetContainerReference(containerName: AzureBlobContainer);
    await cloudBlobContainer.CreateIfNotExistsAsync();
    BlobContainerPermissions blobContainerPermissions = await cloudBlobContainer.GetPermissionsAsync();
    blobContainerPermissions.PublicAccess = BlobContainerPublicAccessType.Container;
    await cloudBlobContainer.SetPermissionsAsync(permissions: blobContainerPermissions);
    CloudBlockBlob cloudBlockBlob = cloudBlobContainer.GetBlockBlobReference(blobName: blobName);
    cloudBlockBlob.Properties.ContentType = "image/png";
    using (Stream stream = new MemoryStream(buffer: data))
    {
        await cloudBlockBlob.UploadFromStreamAsync(source: stream);
    }
    return cloudBlockBlob.Uri.AbsoluteUri;
}
Here are some resources I've been reading that seem related to this issue, but they haven't helped much:
Investigating ASP.Net Memory Dumps for Idiots (like Me)
ASP.NET app eating memory. Application / Session objects the reason?
IIS Worker Process using a LOT of memory?
Run dispose method upon asp.net IIS app restart
IIS: Idle Timeout vs Recycle
UPDATE:
if (System.IO.File.Exists(path: LocalImageFilePath))
{
    string BlobName = Guid.NewGuid().ToString(format: "n");
    string ImageURL = string.Empty;
    using (FileStream fileStream = new FileStream(LocalImageFilePath, FileMode.Open))
    {
        ImageURL = await CreateBlob(blobName: $"{BlobName}.png", dataStream: fileStream);
    }
    System.IO.File.Delete(path: LocalImageFilePath);
    oJSON = new { url = ImageURL };
}
The most likely cause of your pain is the allocation of large byte arrays:
byte[] pngBytes = System.IO.File.ReadAllBytes(path: LocalImageFilePath);
The easiest change to make, to try and encourage the GC to collect the Large Object Heap more often, is to set GCSettings.LargeObjectHeapCompactionMode to CompactOnce at the end of the method. That might help.
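For reference, that setting (from the System.Runtime namespace) looks like this; triggering a collection immediately afterwards is optional:
using System.Runtime;

// Ask the runtime to compact the Large Object Heap on the next full blocking GC.
GCSettings.LargeObjectHeapCompactionMode = GCLargeObjectHeapCompactionMode.CompactOnce;
GC.Collect(); // optional: force that collection right away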
But, a better idea would be to remove the need for the large array altogether. To do this, change:
private async Task<string> CreateBlob(string blobName, byte[] data)
to instead be:
private async Task<string> CreateBlob(string blobName, FileStream data)
And then later use:
await cloudBlockBlob.UploadFromStreamAsync(source: data);
In the caller, you'll need to stop using ReadAllBytes and use a FileStream to read the file instead.
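As a sketch, the stream-based CreateBlob could look like this, adapted from the question's own method (only the signature and the upload call change; the parameter name matches the caller shown in the UPDATE above):
private async Task<string> CreateBlob(string blobName, FileStream dataStream)
{
    string ConnectionString = "DefaultEndpointsProtocol=https;AccountName=" + AzureStorrageAccountName + ";AccountKey=" + AzureStorageAccessKey + ";EndpointSuffix=core.windows.net";
    CloudStorageAccount cloudStorageAccount = CloudStorageAccount.Parse(connectionString: ConnectionString);
    CloudBlobClient cloudBlobClient = cloudStorageAccount.CreateCloudBlobClient();
    CloudBlobContainer cloudBlobContainer = cloudBlobClient.GetContainerReference(containerName: AzureBlobContainer);
    await cloudBlobContainer.CreateIfNotExistsAsync();
    BlobContainerPermissions blobContainerPermissions = await cloudBlobContainer.GetPermissionsAsync();
    blobContainerPermissions.PublicAccess = BlobContainerPublicAccessType.Container;
    await cloudBlobContainer.SetPermissionsAsync(permissions: blobContainerPermissions);
    CloudBlockBlob cloudBlockBlob = cloudBlobContainer.GetBlockBlobReference(blobName: blobName);
    cloudBlockBlob.Properties.ContentType = "image/png";

    // The file contents are streamed to blob storage; no byte[] is ever allocated.
    await cloudBlockBlob.UploadFromStreamAsync(source: dataStream);
    return cloudBlockBlob.Uri.AbsoluteUri;
}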

Asp.Net Core 2 + Google Cloud Storage download Memory Stream

I'm working on an ASP.NET Core 2 Web API and I have to make an endpoint to download a file. This file is not public, so I cannot use the MediaLink property of the Google Storage object. I'm using their C# library.
In the piece of code you will see below, _storageClient was created like this: _storageClient = StorageClient.Create(cred);. The client is working; I'm just showing which class it is.
[HttpGet("DownloadFile/{clientId}/{fileId}")]
public async Task<IActionResult> DownloadFile([FromRoute] long fileId, long clientId)
{
// here there are a bunch of logic and permissions. Not relevant to the quest
var stream = new MemoryStream();
try
{
stream.Position = 0;
var obj = _storageClient.GetObject("bucket name here", "file.png");
_storageClient.DownloadObject(obj, stream);
var response = File(stream, obj.ContentType, "file.png"); // FileStreamResult
return response;
}
catch (Exception ex)
{
throw;
}
}
The variable obj comes back OK, with all properties filled as expected. The stream seems to be filled properly; it has a length and everything, but I get a 500 error that I cannot even catch.
I cannot see what I'm doing wrong; maybe it's how I'm using the MemoryStream, but I can't even catch the error.
Thanks for any help
You're rewinding the stream before you've written anything to it, but you're not rewinding it afterwards. I'd expect that to result in an empty response rather than a 500 error, but I'd at least move the stream.Position call to after the download:
var obj = _storageClient.GetObject("bucket name here", "file.png");
_storageClient.DownloadObject(obj, stream);
stream.Position = 0;
Note that you don't need to fetch the object metadata before downloading it. You can just use:
_storageClient.DownloadObject("bucket name here", "file.png", stream);
stream.Position = 0;
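Putting both fixes together, a minimal sketch of the corrected action might look like this (the bucket, object name and content type are hard-coded placeholders, and FileStreamResult disposes the MemoryStream once the response has been written):
[HttpGet("DownloadFile/{clientId}/{fileId}")]
public IActionResult DownloadFile([FromRoute] long fileId, long clientId)
{
    var stream = new MemoryStream();

    // Download straight into the stream without fetching the object metadata first.
    _storageClient.DownloadObject("bucket name here", "file.png", stream);

    // Rewind after writing so the response starts at the first byte.
    stream.Position = 0;

    return File(stream, "image/png", "file.png"); // FileStreamResult
}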
A solution can look like the one below.
[HttpGet("get-file")]
public ActionResult GetFile()
{
var storageClient = ...;
byte[] buffer;
using (var memoryStream = new MemoryStream())
{
storageClient.DownloadObject("bucket name here"+"/my-file.jpg", memoryStream);
buffer = memoryStream.ToArray();
}
return File(buffer, "image/jpeg", "my-file.jpg");
}

Creating zipFile exhausts memory

I have an ASP.NET Web API project where a user can download some stuff from a database.
My Download controller fetches data from the database instance. Every single result has a blob field which holds some kind of data (1).
I want to add each result to a ZIP file (2). After that, I send the HTTP response with my stream content.
List<Result> results = m_Repository.GetResultsForResultId(given_id_by_request);

// 1
foreach (Result result in results)
{
    string fileName = String.Format("{0}-{1}.bin", id >> 16, result.Id);
    zipFile.AddEntry(fileName, result.Value);
}

// 2
PushStreamContent pushStreamContent = new PushStreamContent((stream, content, context) =>
{
    zipFile.Save(stream);
    stream.Close();
});

response = new HttpResponseMessage(HttpStatusCode.OK) { Content = pushStreamContent };
It works nicely! But on big download requests this exhausts my memory. I need to find a way to put a stream into a ZIP archive without buffering it all. Can someone please help me?
As far as I can see from the code you posted, you are not disposing the streams you create after use. This can lead to a great amount of memory being reserved by your app, which might cause your problems.
I am using ZipArchive to put multiple files into a ZIP file in my web application. The code looks somewhat like this:
using (var compressedFileStream = new MemoryStream())
{
    using (var zipArchive = new ZipArchive(compressedFileStream, ZipArchiveMode.Update, false))
    {
        foreach (Result result in results)
        {
            string fileName = String.Format("{0}-{1}.bin", id >> 16, result.Id);
            var zipEntry = zipArchive.CreateEntry(fileName);
            using (var originalFileStream = new MemoryStream(result.Value))
            {
                using (var zipEntryStream = zipEntry.Open())
                {
                    originalFileStream.CopyTo(zipEntryStream);
                }
            }
        }
    }
    return File(compressedFileStream.ToArray(), "application/zip", string.Format("Download_{0:ddMMyyy_hhmm}.zip", DateTime.Now));
}
I am using that code snippet inside an MVC controller method, so you will have to adapt the return part for your situation.
The above code works fine in my application for up to 300 entries or 50 MB of data (those are the limits set by the requirements for my app).
Hope that helps you.
EDIT: I forgot the closing bracket of the first using block. The return statement has to be inside this using block, or else the stream will be disposed.
