I built an ASP.NET MVC API hosted on IIS on Windows 10 Pro (VM on Azure - 4GB RAM, 2CPU). Within I call an .exe (wkhtmltopdf) that I want to convert an HTML page to image and save it locally. Everything works fine, except I noticed that after some calls to the API, the RAM goes crazy and while investigating the process with Task Manager I saw a process, called IIS Worker Process, that adds more RAM every time the API is called. Of course I wrapped my System.Diagnostics.Process instance usage inside a using statement to be disposed, because IDisposable is implemented, but it still consumes more and more RAM and after a while the server becomes laggy and unresponsive (it has only 4GB of RAM after all). I noticed that after some number of minutes (10-15-20 maybe) this IIS Worker Process calms down in terms of RAM usage... Here is my code, pretty straight forward:
Gets base64 encoded url
Decodes it
Uses wkhtmltoimage.exe to convert it to image
Saves it locally
Reads the byte array
Creates a blob in Azure with the image
Returns json with the url
public async Task<ActionResult> Index(string url)
{
object oJSON = new { url = string.Empty };
if (!string.IsNullOrEmpty(value: url))
{
try
{
byte[] EncodedData = Convert.FromBase64String(s: url);
string DecodedURL = Encoding.UTF8.GetString(bytes: EncodedData);
using (Process proc = new Process())
{
proc.StartInfo.FileName = wkhtmltopdfExecutablePath;
proc.StartInfo.Arguments = $"--encoding utf-8 \"{DecodedURL}\" {LocalImageFilePath}";
proc.Start();
proc.WaitForExit();
oJSON = new { procStatusCode = proc.ExitCode };
}
if (System.IO.File.Exists(path: LocalImageFilePath))
{
byte[] pngBytes = System.IO.File.ReadAllBytes(path: LocalImageFilePath);
System.IO.File.Delete(path: LocalImageFilePath);
string ImageURL = await CreateBlob(blobName: $"{BlobName}.png", data: pngBytes);
oJSON = new { url = ImageURL };
}
}
catch (Exception ex)
{
Debug.WriteLine(value: ex);
}
}
return Json(data: oJSON, behavior: JsonRequestBehavior.AllowGet);
}
private async Task<string> CreateBlob(string blobName, byte[] data)
{
string ConnectionString = "DefaultEndpointsProtocol=https;AccountName=" + AzureStorrageAccountName + ";AccountKey=" + AzureStorageAccessKey + ";EndpointSuffix=core.windows.net";
CloudStorageAccount cloudStorageAccount = CloudStorageAccount.Parse(connectionString: ConnectionString);
CloudBlobClient cloudBlobClient = cloudStorageAccount.CreateCloudBlobClient();
CloudBlobContainer cloudBlobContainer = cloudBlobClient.GetContainerReference(containerName: AzureBlobContainer);
await cloudBlobContainer.CreateIfNotExistsAsync();
BlobContainerPermissions blobContainerPermissions = await cloudBlobContainer.GetPermissionsAsync();
blobContainerPermissions.PublicAccess = BlobContainerPublicAccessType.Container;
await cloudBlobContainer.SetPermissionsAsync(permissions: blobContainerPermissions);
CloudBlockBlob cloudBlockBlob = cloudBlobContainer.GetBlockBlobReference(blobName: blobName);
cloudBlockBlob.Properties.ContentType = "image/png";
using (Stream stream = new MemoryStream(buffer: data))
{
await cloudBlockBlob.UploadFromStreamAsync(source: stream);
}
return cloudBlockBlob.Uri.AbsoluteUri;
}
Here are the resources I'm reading somehow related to this issue IMO, but are not helping much:
Investigating ASP.Net Memory Dumps for Idiots (like Me)
ASP.NET app eating memory. Application / Session objects the reason?
IIS Worker Process using a LOT of memory?
Run dispose method upon asp.net IIS app restart
IIS: Idle Timeout vs Recycle
UPDATE:
if (System.IO.File.Exists(path: LocalImageFilePath))
{
string BlobName = Guid.NewGuid().ToString(format: "n");
string ImageURL = string.Empty;
using(FileStream fileStream = new FileStream(LocalImageFilePath, FileMode.Open)
{
ImageURL = await CreateBlob(blobName: $"{BlobName}.png", dataStream: fileStream);
}
System.IO.File.Delete(path: LocalImageFilePath);
oJSON = new { url = ImageURL };
}
The most likely cause of your pain is the allocation of large byte arrays:
byte[] pngBytes = System.IO.File.ReadAllBytes(path: LocalImageFilePath);
The easiest change to make, to try and encourage the GC to collect the Large Object Heap more often, is to set GCSettings.LargeObjectHeapCompactionMode to CompactOnce at the end of the method. That might help.
But, a better idea would be to remove the need for the large array altogether. To do this, change:
private async Task<string> CreateBlob(string blobName, byte[] data)
to instead be:
private async Task<string> CreateBlob(string blobName, FileStream data)
And then later use:
await cloudBlockBlob.UploadFromStreamAsync(source: data);
In the caller, you'll need to stop using ReadAllBytes, and instead use a FileStream to read the file instead.
Related
public string SavePath { get; set; } = #"I:\files\";
public void DownloadList(List<string> list)
{
var rest = ExcludeDownloaded(list);
var result = Parallel.ForEach(rest, link=>
{
Download(link);
});
}
private void Download(string link)
{
using(var net = new System.Net.WebClient())
{
var data = net.DownloadData(link);
var fileName = code to generate unique fileName;
if (File.Exists(fileName))
return;
File.WriteAllBytes(fileName, data);
}
}
var downloader = new DownloaderService();
var links = downloader.GetLinks();
downloader.DownloadList(links);
I observed the usage of RAM for the project keeps growing
I guess there is something wrong on the Parallel.ForEach(), but I cannot figure it out.
Is there the memory leak, or what is happening?
Update 1
After changed to the new code
private void Download(string link)
{
using(var net = new System.Net.WebClient())
{
var fileName = code to generate unique fileName;
if (File.Exists(fileName))
return;
var data = net.DownloadFile(link, fileName);
Track theTrack = new Track(fileName);
theTrack.Title = GetCDName();
theTrack.Save();
}
}
I still observed increasing memory use after keeping running for 9 hours, it is much slowly growing usage though.
Just wondering, is it because that I didn't free the memory use of theTrack file?
Btw, I use ALT package for update file metadata, unfortunately, it doesn't implement IDisposable interface.
The Parallel.ForEach method is intended for parallelizing CPU-bound workloads. Downloading a file is an I/O bound workload, and so the Parallel.ForEach is not ideal for this case because it needlessly blocks ThreadPool threads. The correct way to do it is asynchronously, with async/await. The recommended class for making asynchronous web requests is the HttpClient, and for controlling the level of concurrency an excellent option is the TPL Dataflow library. For this case it is enough to use the simplest component of this library, the ActionBlock class:
async Task DownloadListAsync(List<string> list)
{
using (var httpClient = new HttpClient())
{
var rest = ExcludeDownloaded(list);
var block = new ActionBlock<string>(async link =>
{
await DownloadFileAsync(httpClient, link);
}, new ExecutionDataflowBlockOptions()
{
MaxDegreeOfParallelism = 10
});
foreach (var link in rest)
{
await block.SendAsync(link);
}
block.Complete();
await block.Completion;
}
}
async Task DownloadFileAsync(HttpClient httpClient, string link)
{
var fileName = Guid.NewGuid().ToString(); // code to generate unique fileName;
var filePath = Path.Combine(SavePath, fileName);
if (File.Exists(filePath)) return;
var response = await httpClient.GetAsync(link);
response.EnsureSuccessStatusCode();
using (var contentStream = await response.Content.ReadAsStreamAsync())
using (var fileStream = new FileStream(filePath, FileMode.Create,
FileAccess.Write, FileShare.None, 32768, FileOptions.Asynchronous))
{
await contentStream.CopyToAsync(fileStream);
}
}
The code for downloading a file with HttpClient is not as simple as the WebClient.DownloadFile(), but it's what you have to do in order to keep the whole process asynchronous (both reading from the web and writing to the disk).
Caveat: Asynchronous filesystem operations are currently not implemented efficiently in .NET. For maximum efficiency it may be preferable to avoid using the FileOptions.Asynchronous option in the FileStream constructor.
.NET 6 update: The preferable way for parallelizing asynchronous work is now the Parallel.ForEachAsync API. A usage example can be found here.
Use WebClient.DownloadFile() to download directly to a file so you don't have the whole file in memory.
This is a bit architecture and code issue. I have a lot of source url's containing huge files that come from many different clients that I have to download and save on filesystem.
I have hardware limits on RAM. So I want to buffer each stream in chunks of bytes and I think it will be good idea to initiate one thread for each downloading of a stream.
I have added a coding for initiating a thread/task using Task Parallel Library as such:
public Task RunTask(Action action)
{
Task task = Task.Run(action);
return task;
}
and I pass for the action parameter the following method:
public void DownloadFileThroughWebStream(WebClient webClient, Uri src, string dest, long buffersize)
{
Stream stream = webClient.OpenRead(src);
byte[] buffer = new byte[buffersize];
int len;
using (BufferedStream bufferedStream = new BufferedStream(stream))
{
using (FileStream fileStream = new FileStream(Path.GetFullPath(dest), FileMode.Create, FileAccess.Write))
{
while ((len = stream.Read(buffer, 0, buffer.Length)) > 0)
{
fileStream.Write(buffer, 0, len);
fileStream.Flush();
}
}
}
}
And I for testing purposes try to download some resources from http uri's as by initiating a thread/task for each specific download:
[Test]
public async Task DownloadSomeStream()
{
Uri uri = new Uri("http://mirrors.standaloneinstaller.com/video-sample/metaxas-keller-Bell.mpeg");
List<Uri> streams = new List<Uri> { uri, uri, uri};
List<Task> tasks = new List<Task>();
var path = "C:\\TMP\\";
//Create task for each of the streams from uri
int c = 1;
foreach (var uri in streams)
{
WebClient webClient = new WebClient();
Task task = taskInitiator.RunTask(() => DownloadFileThroughWebStream(webClient, uri, Path.Combine(path,"File"+c), 8192));
tasks.Add(task);
c++;
}
Task allTasksHaveCompleted = Task.WhenAll(tasks);
await allTasksHaveCompleted;
}
I get the following exception:
System.IO.IOException: 'The process cannot access the file 'D:\TMP\File4' because it is being used by another process'
on line:
using (FileStream fileStream = new FileStream(Path.GetFullPath(dest), FileMode.Create, FileAccess.Write))
So there are two things that i dont understand with this exception:
Why it is not allowed to write? and how another process is allocating the file?
Why do it want to save file4 when I have only added 3 url's, so I only should have files: file1, file2, and file3 ?
Also, other questions that could be nice to get some thoughts on:
Is it right approach what I am doing in regards to what I want to achieve? Am I doing the Task initiations using Task Parallel Library correct?
Any tips and trick, best practices, etc.?
Why it is not allowed to write? and how another process is allocating
the file?
The file is not locked by another process, but by the same process. If you open a file for write, you basically get an exclusive lock for it. When you try to open the file again for writing from another task, it is locked and that is why you get the error.
To handle this case, you should put a lock around writing the data to disk. You should have a separate lock object for every unique file name you are writing to, and be careful to use the proper lock!
Why do it want to save file4 when I have only added 3 url's, so I only
should have files: file1, file2, and file3 ?
This is because you capture the variable c in the delegate you pass to Task.Run. Since these tasks normally start after the loop is over, the value of c is now 4. See here for more information about closures.
We can create download method which can execute downloading:
async Task DownloadFile(string url, string location, string fileName)
{
using (var client = new WebClient())
{
await client.DownloadFileTaskAsync(url, $"{location}{fileName}");
}
}
And the above method can be called by Task.Run() to execute simultaneous download of files:
IList<string> urls = new List<string>()
{
#"http://mirrors.standaloneinstaller.com/video-sample/metaxas-keller-Bell.mpeg",
#"https://...",
#"https://..."
};
string location = "D:";
Directory.CreateDirectory(location);
Task.Run(async () =>
{
var tasks = urls.Select(url =>
{
var fileName = url.Substring(url.LastIndexOf('/'));
return DownloadFile(url, location, fileName);
}).ToArray();
await Task.WhenAll(tasks);
}).GetAwaiter().GetResult();
I have a .NET Core 1.1 Application that is having a problem when generating a List of objects that have a byte array in them. If there are more than 20 items in the list (arbitrary, I'm not sure of the exact number or size at which it fails) the method throws the OutOfMemoryException. The method is below:
public async Task<List<Blob>> GetBlobsAsync(string container)
{
List<Blob> retVal = new List<Blob>();
Blob itrBlob;
BlobContinuationToken continuationToken = null;
BlobResultSegment resultSegment = null;
CloudBlobContainer cont = _cbc.GetContainerReference(container);
resultSegment = await cont.ListBlobsSegmentedAsync(String.Empty, true, BlobListingDetails.Metadata, null, continuationToken, null, null);
do
{
foreach (var bItem in resultSegment.Results)
{
var iBlob = bItem as CloudBlockBlob;
itrBlob = new Blob()
{
Contents = new byte[iBlob.Properties.Length],
Name = iBlob.Name,
ContentType = iBlob.Properties.ContentType
};
await iBlob.DownloadToByteArrayAsync(itrBlob.Contents, 0);
retVal.Add(itrBlob);
}
continuationToken = resultSegment.ContinuationToken;
} while (continuationToken != null);
return retVal;
}
I'm not using anything that can really be disposed in the method. Is there a better way to accomplish this? The ultimate goal is to pull all of these files and then create a ZIP archive. This process works as long as I don't breach some size threshold.
If it helps, the application is accessing Azure Block Blob Storage from an Azure Web Application instance. Maybe there is a setting I need to adjust to increase a threshold?
The exception is thrown when the Blob() object is instantiated.
EDIT:
So the question as posted was admittedly weak in the way of detail. The problem container has 30 files (mostly large text files that compress well). The total size of the container is 971MB. The request runs for approximately 40 seconds before reporting an HTTP 500 error and the referenced exception.
When I debug locally and step through the same operation it succeeds, resulting in a 237MB zip file. During the operation I can see the memory usage shoot over 2GB by the time the list is created.
I tried to abstract the interaction of the blob storage to its own service, but perhaps I've made this more difficult on myself than is necessary.
Found these two code samples that illustrate the concept very well that supports your use case.
get list of block blobs in blob container and create ZipOutputStream on-the-fly
add each block blob to a ZipOutputStream (SharpZipLib) writing to Response.OutputStream
ZIP compression level:
zipOutputStream.SetLevel(3); //0-9, 9 being the highest level of compression
End-to-end example using ASP.NET WebApi
adding Zip feature can be added in this well structured application
Further reading
https://www.strathweb.com/2012/09/dealing-with-large-files-in-asp-net-web-api/
https://www.strathweb.com/2013/01/asynchronously-streaming-video-with-asp-net-web-api/
WebAPI StreamContent vs PushStreamContent
Using Sascha's answer, I was able to make a compromise that seems to perform decently given the parameters. Probably not perfect, but it cuts the memory usage by nearly 70% and allows me to keep some abstraction.
I added a method to my blob service called GetBlobsAsZipAsync that accepts a container name as an argument:
public async Task<Stream> GetBlobsAsZipAsync(string container)
{
BlobContinuationToken continuationToken = null;
BlobResultSegment resultSegment = null;
byte[] buffer = new byte[4194304];
MemoryStream ms = new MemoryStream();
CloudBlobContainer cont = _cbc.GetContainerReference(container);
resultSegment = await cont.ListBlobsSegmentedAsync(String.Empty, true, BlobListingDetails.Metadata, null, continuationToken, null, null);
using (var za = new ZipArchive(ms, ZipArchiveMode.Create, true))
{
do
{
foreach (var bItem in resultSegment.Results)
{
var iBlob = bItem as CloudBlockBlob;
var ze = za.CreateEntry(iBlob.Name);
using (var fs = await iBlob.OpenReadAsync())
{
using (var dest = ze.Open())
{
int count = await fs.ReadAsync(buffer, 0, buffer.Length);
while (count > 0)
{
await dest.WriteAsync(buffer, 0, count);
count = await fs.ReadAsync(buffer, 0, buffer.Length);
}
}
}
}
continuationToken = resultSegment.ContinuationToken;
} while (continuationToken != null);
}
return ms;
}
This returns the Zip as a (closed) MemoryStream that is then returned as an Array using a FileResult:
[HttpPost]
public async Task<IActionResult> DownloadFiles(string container, int projectId, int? profileId)
{
MemoryStream ms = null;
_ctx.Add(new ProjectDownload() { ProfileId = profileId, ProjectId = projectId });
await _ctx.SaveChangesAsync();
using (ms = (MemoryStream)await _blobs.GetBlobsAsZipAsync(container))
{
return File(ms.ToArray(), "application/zip", "download.zip");
}
}
I hope this is useful to someone else who just needs a push in the right direction. I took a lazy way out on this originally and it came back to bite me.
i am trying to get a COMPLETE example of downloading a file form Azure Blob Storage using the .DownloadToStreamAsync() method wired up to a progress bar.
i've found references to old implementations of the azure storage sdk, but they dont compile with the newer sdk (that has implemented these async methods) or don't work with current nuget packages.
https://blogs.msdn.microsoft.com/avkashchauhan/2010/11/03/uploading-a-blob-to-azure-storage-with-progress-bar-and-variable-upload-block-size/
https://blogs.msdn.microsoft.com/kwill/2013/03/05/asynchronous-parallel-blob-transfers-with-progress-change-notification-2-0/
i'm a newbie to async/await threading in .NET, and was wondering if someone could help me out with taking the below (in a windows form app) and showing how i can 'hook' into the progress of the file download... i see some examples dont use the .DownloadToStream method, and they instead download chunks of the blob file.. but i wondered since these new ...Async() methods exist in the newer Storage SDK's, if there was something smarter that could be done?
So assuming the below is working (non async), what additionally would i have to do to use the blockBlob.DownloadToStreamAsync(fileStream); method, is this even the right use of this, and how can i get the progress?
ideally i am after any way i can just hook the progress of the blob download so i can update a Windows Form UI on big downloads.. so if the below is not the right way, please enlighten me :)
// Retrieve storage account from connection string.
CloudStorageAccount storageAccount = CloudStorageAccount.Parse(
CloudConfigurationManager.GetSetting("StorageConnectionString"));
// Create the blob client.
CloudBlobClient blobClient = storageAccount.CreateCloudBlobClient();
// Retrieve reference to a previously created container.
CloudBlobContainer container = blobClient.GetContainerReference("mycontainer");
// Retrieve reference to a blob named "photo1.jpg".
CloudBlockBlob blockBlob = container.GetBlockBlobReference("photo1.jpg");
// Save blob contents to a file.
using (var fileStream = System.IO.File.OpenWrite(#"path\myfile"))
{
blockBlob.DownloadToStream(fileStream);
}
Using the awesome suggested method (downloading 1mb chunks) kindly suggsted by Gaurav, i have implemented using a background worker to do the download so i can update the UI as i go.
The main part inside the do loop that downloads the range to a stream and then writes the stream to the file system I havent touched from the original example, but i have added code to update the worker progress and to listen for the worker cancellation (to abort the download).. not sure if this could be the issue?
For completeness, below is everything inside the worker_DoWork method:
public void worker_DoWork(object sender, DoWorkEventArgs e)
{
object[] parameters = e.Argument as object[];
string localFile = (string)parameters[0];
string blobName = (string)parameters[1];
string blobContainerName = (string)parameters[2];
CloudBlobClient client = (CloudBlobClient)parameters[3];
try
{
int segmentSize = 1 * 1024 * 1024; //1 MB chunk
var blobContainer = client.GetContainerReference(blobContainerName);
var blob = blobContainer.GetBlockBlobReference(blobName);
blob.FetchAttributes();
blobLengthRemaining = blob.Properties.Length;
blobLength = blob.Properties.Length;
long startPosition = 0;
do
{
long blockSize = Math.Min(segmentSize, blobLengthRemaining);
byte[] blobContents = new byte[blockSize];
using (MemoryStream ms = new MemoryStream())
{
blob.DownloadRangeToStream(ms, startPosition, blockSize);
ms.Position = 0;
ms.Read(blobContents, 0, blobContents.Length);
using (FileStream fs = new FileStream(localFile, FileMode.OpenOrCreate))
{
fs.Position = startPosition;
fs.Write(blobContents, 0, blobContents.Length);
}
}
startPosition += blockSize;
blobLengthRemaining -= blockSize;
if (blobLength > 0)
{
decimal totalSize = Convert.ToDecimal(blobLength);
decimal downloaded = totalSize - Convert.ToDecimal(blobLengthRemaining);
decimal blobPercent = (downloaded / totalSize) * 100;
worker.ReportProgress(Convert.ToInt32(blobPercent));
}
if (worker.CancellationPending)
{
e.Cancel = true;
blobDownloadCancelled = true;
return;
}
}
while (blobLengthRemaining > 0);
}
catch (Exception ex)
{
MessageBox.Show(ex.Message);
}
}
This is working, but on bigger files (30mb for example), i sometimes am getting 'can't write to file as open in another process error...' and the process fails..
Using your code:
using (var fileStream = System.IO.File.OpenWrite(#"path\myfile"))
{
blockBlob.DownloadToStream(fileStream);
}
It is not possible to show the progress because the code comes out of this function only when the download is complete. DownloadToStream function will internally split a large blob in chunks and download the chunks.
What you need to do is download these chunks using your code. What you have to do is use DownloadRangeToStream method. I answered a similar question some time back that you may find useful: Azure download blob part.
I have ASP.NET Web API project where a user can download some stuff from a database.
My Download controller fetches data from the database instance. Every single result has a blob field which is some kind of data (1).
I want add each result to a ZIP file (2). After all I send the HTTP response adding my stream content.
List<Result> results = m_Repository.GetResultsForResultId(given_id_by_request);
// 1
foreach (Result result in results)
{
string fileName = String.Format("{0}-{1}.bin", id >> 16, result.Id);
zipFile.AddEntry(fileName, result.Value);
}
// 2
PushStreamContent pushStreamContent = new PushStreamContent((stream, content, context) =>
{
zipFile.Save(stream);
stream.Close();
}
response = new HttpResponseMessage(HttpStatusCode.OK) { Content = pushStreamContent };
It works nice! But on big download requests this exhausts my memory. I need to find a way to put a stream into a zip archive bufferless. Can someone please help me?!
As far as I can see from the code you posted, you are not disposing the streams you create after usage. This can add to a great amount of memory being reserved by your app which might cause your problems.
I am using the ZipArchive to put multiple files into a zip file in my web application. The code Looks somewhat like that:
using (var compressedFileStream = new MemoryStream())
{
using (var zipArchive = new ZipArchive(compressedFileStream, ZipArchiveMode.Update, false))
{
foreach (Result result in results)
{
string fileName = String.Format("{0}-{1}.bin", id >> 16, result.Id);
var zipEntry = zipArchive.CreateEntry(fileName);
using (var originalFileStream = new MemoryStream(result.Value))
{
using (var zipEntryStream = zipEntry.Open())
{
originalFileStream.CopyTo(zipEntryStream);
}
}
}
}
return File(compressedFileStream.ToArray(), "application/zip", string.Format("Download_{0:ddMMyyy_hhmm}.zip", DateTime.Now));
}
I am using that code snippet inside an MVC Controller method so you have to adapt the return part for your situation.
The above code works fine in my application for up to 300 entries or 50MB volume (those are the limits set by the requirements for my app).
Hope that helps you.
EDIT: Forgot the closing bracket of the first using block. the return Statement has to be inside this using-block, else the stream will be disposed.