I'm trying to generate multiple PDFs in parallel using IronPDF's HTML-to-PDF feature, but it appears to deadlock when started from ASP.NET :(
I've recreated the problem here: https://github.com/snebjorn/ironpdf-threading-issue-aspnet
Here's a snippet with the essential parts.
Calling GetSequential() works, but it doesn't execute in parallel.
GetSimple() runs in parallel but deadlocks.
public class TestController : Controller
{
[HttpGet]
[Route("simple")]
public async Task<IActionResult> GetSimple()
{
var tasks = Enumerable
.Range(1, 10)
.Select(i => HtmlToDocumentAsync("hello", i));
var pdfs = await Task.WhenAll(tasks);
using var pdf = PdfDocument.Merge(pdfs);
pdf.SaveAs("output.pdf");
return Ok();
}
[HttpGet]
[Route("seq")]
public async Task<IActionResult> GetSequential()
{
var pdfs = new List<PdfDocument>();
foreach (var i in Enumerable.Range(1, 10))
{
pdfs.Add(await HtmlToDocumentAsync("hello", i));
}
using var pdf = PdfDocument.Merge(pdfs);
pdf.SaveAs("output.pdf");
return Ok();
}
private async Task<PdfDocument> HtmlToDocumentAsync(string html, int i)
{
using var renderer = new HtmlToPdf();
var pdf = await renderer.RenderHtmlAsPdfAsync(html);
return pdf;
}
}
According to https://medium.com/rubrikkgroup/understanding-async-avoiding-deadlocks-e41f8f2c6f5d this happens because the thread executing the controller method isn't a main thread; it's just a thread-pool thread, and at some point we're waiting for it to continue but it never gets scheduled back in. That's the classic symptom of mixing async/await with .Wait()/.Result.
So am I right to assume that there are .Wait/.Result calls happening inside the IronPDF.Threading package?
Is there a workaround?
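For reference, here's a minimal sketch of the sync-over-async pattern the linked article describes. Whether IronPDF actually does this internally is only an assumption on my part; the method below just illustrates the shape of the problem:

public PdfDocument RenderBlocking(string html)
{
    // Blocking on the task with .Result ties up the calling thread-pool thread.
    // Under load there may be no free thread left to run the continuation this
    // call is waiting on, which is the deadlock scenario described above.
    using var renderer = new HtmlToPdf();
    return renderer.RenderHtmlAsPdfAsync(html).Result;
}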
UPDATE:
I updated to IronPdf 2021.9.3737 and it now appears to work 🎉
Also updated https://github.com/snebjorn/ironpdf-threading-issue-aspnet
Just wanted to add to this that IronPdf's multi-threading support on MVC web apps is non-existent. You will end up with indefinite deadlocks if you're rendering in the context of an HTTP request.
We have been strung along with the promise of an updated renderer that would resolve the issue (which we were told should be released in June/July 2021), but that appears to have been pushed back. I tested the updated renderer using their 'early-access package', and the deadlocking has been replaced by 10-second thread blocks and seemingly random C++ exceptions, so it's far from fixed. Performance is better single-threaded.
Darren's reply is incorrect - I've stepped through our render calls countless times trying to fix this, and the deadlock comes on the HtmlToPdf.StaticRenderHtmlAsPdf call, not on a PdfDocument.Merge call. It's a threading issue.
I suggest avoiding IronPdf if you haven't already bought their product. Find another solution.
I used IronPDF on the 2021.9.3737 branch without any threading issues on Windows and Linux today, thanks to Darren's help in another thread.
Documentation: https://ironpdf.com/object-reference/api/
Nuget: https://www.nuget.org/packages/IronPdf/
Source of information: C# PDF Generation (using IronPDF on Azure)
I agree with abagonhishead that StaticRenderHtmlAsPdf used to create a queue of PDF documents to be rendered, and on an under-provisioned server it ended in a thread deadlock... the queue getting longer and longer as the server struggled to render PDFs.
Solution that worked for me:
moving to a well-provisioned server (Azure B1, for example)
(and/or) moving to the latest IronPDF NuGet package, 2021.9.3737
Support for Iron Software here.
Our engineers tested your sample project, increased it to 150 iterations, and saw it run without issue.
Our understanding of your use case is that you are creating multiple threads to generate PDF files and storing them in an array for merging later?
Assuming this is the case, the likely cause of this issue is sending too large an array to the merge method, which requires a large amount of RAM to process. The crash happens when there isn't enough memory to handle the large number of PDFs being merged.
As you can see from the attached image, I tested your code with 1000 iterations and it works without issue. I believe the problem may occur when you increase the iteration count or pass in large HTML input that pushes CPU and memory past what the machine can handle.
Also, I don't agree with abagonhishead, because there is no alternative solution on the market that offers all of these features.
Related
I'm working on an ASP.NET Core Web API project on .NET 5.0, and I'm using Azure.Storage.Queues 12.6.0 for writing and reading queue messages.
Everything works fine, but I was wondering whether the way I read messages is OK or there is a better approach in terms of speed and efficiency.
This is the code I'm using. It's just a piece of code from a Microsoft tutorial, placed inside a while() loop. AzureQueue is an instance of the QueueClient class.
public async Task StartReading()
{
while (true)
{
if (await AzureQueue.ExistsAsync())
{
QueueProperties properties = await AzureQueue.GetPropertiesAsync();
if (properties.ApproximateMessagesCount > 0)
{
QueueMessage[] retrievedMessage = await AzureQueue.ReceiveMessagesAsync(1);
string theMessage = retrievedMessage[0].MessageText;
await AzureQueue.DeleteMessageAsync(retrievedMessage[0].MessageId, retrievedMessage[0].PopReceipt);
}
}
}
}
Honestly, I'm not comfortable using an infinite while() loop, because it feels like something I can't control or stop. But personal feelings aside, is this kind of while() an acceptable approach, or should I use something else? I just need to keep reading messages as soon as they arrive in the queue.
What you can do is place your code in an Azure Function.
Then bind the Azure Function to trigger when there is something on the queue.
See: https://learn.microsoft.com/en-us/azure/azure-functions/functions-bindings-storage-queue-trigger?tabs=csharp
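For example, a queue-triggered function could look roughly like this (in-process Azure Functions model; the queue name "incoming-messages" and the connection setting name are placeholders, not values from your project):

using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;

public static class QueueMessageProcessor
{
    [FunctionName("ProcessQueueMessage")]
    public static void Run(
        [QueueTrigger("incoming-messages", Connection = "AzureWebJobsStorage")] string message,
        ILogger log)
    {
        // The runtime invokes this as soon as a message lands on the queue,
        // so there is no need for a polling loop or ExistsAsync checks.
        log.LogInformation($"Queue message received: {message}");
    }
}

On a successful run the Functions runtime also deletes the message for you, which replaces the manual DeleteMessageAsync call.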
My Angular/C# app makes a call to a web API and hits a stored proc. The C# part executes quickly, but the 'Content Download' gets slower and slower with each call.
I have an Angular service that calls the web API:
getInvestorsToFunds(params): Observable<InvestorToFund[]> {
let body = JSON.stringify({ params });
return this.http.post<InvestorToFund[]>(this.baseUrl + 'api/Investor/getInvestorsToFunds', body)
.pipe(catchError(this.handleError));
}
And I call that from my component:
let x = forkJoin(
this.investorService.getInvestorsToFunds(params)
).subscribe(t => {
this.investorToFunds = t[0] as InvestorToFund[];
});
Any ideas on why each call just gets slower and slower?
OK, I got to the bottom of this and I'll post my answer for any poor soul who faces the same issue.
I read up on memory leaks and the Chrome tools for taking heap snapshots. Sure enough, my memory usage was increasing over time with each page hit. This meant less memory was available to my app, throttling the data coming in from my API.
Turns out one of my plug-ins was causing an issue - https://github.com/inorganik/countUp.js-angular2. I was on version 6 - when I updated to version 7 this stopped the memory leaks and the API call executed in about 3 seconds every time, no matter how many pages I clicked on.
Helpful articles:
https://auth0.com/blog/four-types-of-leaks-in-your-javascript-code-and-how-to-get-rid-of-them/
https://developers.google.com/web/tools/chrome-devtools/memory-problems/
It is not a memory leak. You need to unsubscribe from your subscriptions:
import { OnDestroy } from '@angular/core';
import { Subject } from 'rxjs';

class A implements OnDestroy {
  protected ngUnsubscribe: Subject<void> = new Subject<void>();

  ngOnDestroy() {
    this.ngUnsubscribe.next();
    this.ngUnsubscribe.complete();
  }
}
And on EACH subscription
this.subscription.pipe( takeUntil(this.ngUnsubscribe) ).subscribe( _ => _ ); // requires: import { takeUntil } from 'rxjs/operators';
This way, when you move away from a component, the ngOnDestroy is run and all your subscriptions are cleared from the memory.
PS: I had the same issue when I first started. No issues after I implemented this; everything runs smooth as butter.
Any ideas on why each call just gets slower and slower?
The time you are seeing is the backend response time. The backend is getting slower and slower and any changes to your frontend code will not make it faster.
Fix
Fix the backend 🌹
I am having some severe performance issues in a project I'm working on. It's a standard web application project: users send requests to an API, which triggers some form of computation in various handlers.
The problem right now is that pretty much any request drives the server's CPU usage up significantly, regardless of what internal computation the corresponding function is supposed to do. For example, we have an endpoint to display a game from the database: the user sends a request containing an ID and the server responds with a JSON object. While this request is being processed, CPU usage goes from 5% (with the app just running) to 25-30%. Several concurrent requests will tank the server, with .NET Core using 60-70% of the CPU.
The request chain looks like:
(Controller)
[HttpGet("game/{Id}")]
public async Task<IActionResult> GetPerson(string Id)
{
try
{
var response = await _GameService.GetGameAsync(Id);
return Ok(new FilteredResponse(response, 200));
}
Service
public async Task<PlayerFilteredGameState> GetGameAsync(string gameId, string apiKey)
{
var response = await _ironmanDataHandler.GetGameAsync(gameId);
var filteredGame = _responseFilterHelper.FilterForPlayer(response, apiKey);
return filteredGame;
}
Data handler
public async Task<GameState> GetGameAsync(string gameStateId)
{
using (var db = _dbContextFactory.Create())
{
var specifiedGame = await db.GameStateIronMan.FirstOrDefaultAsync(a => a.gameId == gameStateId);
if (specifiedGame == null)
{
throw new ApiException("There is no game with that ID.", 404);
}
var deserializedGame = JsonConvert.DeserializeObject<GameState>(specifiedGame.GameState);
return deserializedGame;
}
}
I've tried mocking all function return values and database accesses, replacing all computed values with null / new Game(), etc., but it doesn't improve the performance. I've spent lots of time with different performance analysis tools, but there isn't a single function that uses more than 0.5-1% of the CPU.
After a lot of investigation, the only "conclusion" I've reached is that it seems to have something to do with the internals of async/await and the way we use it in our project, because it doesn't matter what we do in the called functions: as soon as we call a function, performance takes a huge hit.
I also tried making the functions synchronous just to see if there was something wrong with my setup; however, performance is massively reduced if I do that (which is good, I suppose).
I really am at a loss here because we aren't really doing anything out of the ordinary and we're still having large issues.
UPDATE
I've performed a performance analysis in ANTS. I'm not really sure how to present the results, so I took a picture of what the call stack looks like.
If your game state is a large object, deserializing it can be quite taxing.
You could create a test where you just deserialize a saved game state and do some profiling with various game states (a fresh start, after some time, ...) to see if there are differences.
If you find that deserializing takes a lot of CPU no matter what, you could look into changing the structure and seeing if you can reduce the amount of data that is saved.
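A minimal sketch of that test, assuming you export one of the stored game-state JSON blobs to a local file (the path below is hypothetical) and reuse the project's GameState class, so it only compiles inside that project:

using System;
using System.Diagnostics;
using System.IO;
using Newtonsoft.Json;

public static class GameStateDeserializationTest
{
    public static void Main()
    {
        // Hypothetical export of the GameState column for one game.
        string json = File.ReadAllText(@"C:\temp\saved-gamestate.json");

        const int iterations = 100;
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            // Same deserialization call the data handler makes.
            var game = JsonConvert.DeserializeObject<GameState>(json);
        }
        sw.Stop();

        Console.WriteLine($"Average deserialization time: {sw.ElapsedMilliseconds / (double)iterations} ms");
    }
}

Running this against game states of different sizes (a fresh start versus a long-running game) should show quickly whether deserialization is where the CPU time goes.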
I have a website on Rackspace that does a calculation, which can take anywhere from 30 seconds to several minutes. Originally I implemented this with SignalR but had to yank it due to excessive CC usage; hosted Rackspace sites are really not designed for that kind of use, and the bill went through the roof.
The basic code is below (old code). It works perfectly on my test server but of course gets a timeout error on Rackspace if the calculation takes more than 30 seconds, because their watcher kills it. I have been told that the operation must write to the stream to keep it alive. In the days of old I would have started a thread and polled the site until the thread was done. If there is a better way, I would prefer to take it.
It seems that with .NET 4.5 I can use HttpTaskAsyncHandler to accomplish this, but I'm not getting it. The new code below is my understanding of the handler: take the old code inside the using block and place it in the ProcessRequestAsync task. When I attempt to call CalcHandler / Calc I get a 404 error, which most likely has to do with routing. I was trying to follow the link below but could not get it to work either. The add name is "myHandler" but the example link is "feed"; how did we get from one to the other? They mention creating a class library, but can the code live in the same project as the current code, and how?
http://codewala.net/2012/04/30/asynchronous-httphandlers-with-asp-net-4-5/
As a side note, will HttpTaskAsyncHandler allow me to keep the request alive until it completes, even if it takes several minutes? Basically, should I use something else for what I am trying to accomplish?
Old code
[Authorize]
[AsyncTimeout(5000)] // does not do anything on RackSpace
public async Task<JsonResult> Calculate(DataModel data)
{
try
{
using (var db = new ApplicationDbContext())
{
var result = await CalcualteResult(data);
return Json(result, JsonRequestBehavior.AllowGet);
}
}
catch (Exception ex)
{
LcDataLink.ProcessError(ex);
}
return Json(null, JsonRequestBehavior.AllowGet);
}
new code
public class CalcHandler : HttpTaskAsyncHandler
{
public override System.Threading.Tasks.Task ProcessRequestAsync(HttpContext context)
{
Console.WriteLine("test");
// Note: new Task(...) is never started, so it never completes; return a started task instead.
return Task.Run(() => System.Threading.Thread.Sleep(5000));
}
}
It's not the best approach. Usually you need to create a separate process (a "worker role" in Azure).
This process handles the long-running operation and saves the result to the database. With SignalR (or by calling an API method every 20 seconds) you then update the status of the operation on the client side (your browser).
If the calculation takes too much time, your server also becomes potentially vulnerable to DDoS attacks.
Moreover, depending on configuration, long-running operations can be killed by the server itself; by default, if I'm not mistaken, after 30 minutes of execution.
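A rough sketch of that start-then-poll pattern, using an in-memory dictionary in place of a real database and Task.Run in place of a proper worker process. The names (CalculationJobController, RunCalculationAsync) are illustrative, and DataModel is reused from the old code above:

using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;
using System.Web.Http;

public class CalculationJobController : ApiController
{
    // In-memory status store; a real implementation would persist this in a database.
    private static readonly ConcurrentDictionary<Guid, string> JobStatus =
        new ConcurrentDictionary<Guid, string>();

    [HttpPost]
    public IHttpActionResult Start(DataModel data)
    {
        var jobId = Guid.NewGuid();
        JobStatus[jobId] = "Running";

        // In production this work would be handed to a worker role / background
        // service, not started inside the web process.
        Task.Run(async () =>
        {
            var result = await RunCalculationAsync(data); // stands in for the real calculation
            JobStatus[jobId] = "Done: " + result;
        });

        // Return immediately; the client polls Status(jobId) every few seconds.
        return Ok(jobId);
    }

    [HttpGet]
    public IHttpActionResult Status(Guid jobId)
    {
        string status;
        if (JobStatus.TryGetValue(jobId, out status))
        {
            return Ok(status);
        }
        return NotFound();
    }

    // Placeholder for the real long-running calculation.
    private static Task<string> RunCalculationAsync(DataModel data)
    {
        return Task.FromResult("result");
    }
}

The request that kicks off the job finishes within the 30-second watcher limit, and the slow work is tracked separately.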
I have the following code that throws an out of memory exception when writing large files. Is there something I'm missing?
I am not sure why it is throwing an out-of-memory error, as I thought the FileStream would only use a maximum of 4096 bytes for its buffer. I am not entirely sure what the buffer really means, to be honest, and any advice would be appreciated.
public static async Task CreateRandomFile(string pathway, int size, IProgress<int> prog)
{
byte[] fileSize = new byte[size];
new Random().NextBytes(fileSize);
await Task.Run(() =>
{
using (FileStream fs = File.Create(pathway, 4096))
{
for (int i = 0; i < size; i++)
{
fs.WriteByte(fileSize[i]);
prog.Report(i);
}
}
}
);
}
public static void p_ProgressChanged(object sender, int e)
{
int pos = Console.CursorTop;
Console.WriteLine("Progress Copied: " + e);
Console.SetCursorPosition (0, pos);
}
public static void Main()
{
Console.WriteLine("Testing CopyLearning");
//CopyFile()
Progress<int> p = new Progress<int>();
p.ProgressChanged += p_ProgressChanged;
Task ta = CreateRandomFile(@"D:\Programming\Testing\RandomFile.asd", 99999999, p);
ta.Wait();
}
Edit: the 99,999,999 was just chosen to create a 99 MB file
Note: I have commented out prog.Report(i) and it will work fine.
It seems that, for some reason, the error occurs at the line
Console.WriteLine("Progress Copied: " + e);
I am not entirely sure why this causes an error. So the error might have been caused by the progress event?
Edit 2: I have followed advice to change the code so that it reports progress every 4000 bytes, using the following:
if (i%4000==0)
prog.Report(i);
For some reason, I am now able to write files up to 900 MB fine.
I guess the question is: why does the code in Edit 2 allow it to write up to 900 MB just fine? Is it because it's reporting progress and writing to the console up to 4000x less often than before? I didn't realize the console would take up so much memory, especially since I assume all it's doing is outputting "Progress Copied".
Edit 3:
For some reason, when I change the code as follows:
for (int i = 0; i < size; i++)
{
fs.WriteByte(fileSize[i]);
Console.WriteLine(i);
prog.Report(i);
}
where there is a Console.WriteLine() before the prog.Report(i), it works fine and copies the file, albeit taking a very long time to do so. This leads me to believe this is a console-related issue for some reason, but I am not sure what.
fs.WriteByte(fileSize[i]);
prog.Report(i);
You created a fire-hose problem. After deadlocks and threading races, probably the 3rd most likely problem caused by threads. And just as hard to diagnose.
Easiest to see by using the debugger's Debug + Windows + Threads window and looking at the thread that is executing CreateRandomFile(). With some luck, you'll see it has completed and has written all 99 MB. But the progress reported on the console is far behind, having only reported about 125 KB written, give or take.
Core issue is the way Progress<>.Report() works. It uses SynchronizationContext.Post() to invoke the ProgressChanged event handler. In a console mode app that will call ThreadPool.QueueUserWorkItem(). That's quite fast, your CreateRandomFile() method won't be bogged down much by it.
But the event handler itself is quite a lot slower, console output is not very fast. So in effect, you are adding threadpool work requests at an enormous rate, 99 million of them in a handful of seconds. No way for the threadpool scheduler to keep up, you'll have roughly 4 of them executing at the same time. All competing to write to the console as well, only one of them can acquire the underlying lock.
So it is the threadpool scheduler that causes OOM, forced to store so many work requests.
And sure, when you call Report() less frequently, the fire-hose problem is much less severe. It's not actually that simple to ensure it never causes a problem, although directly calling Console.Write() is an obvious fix. Ultimately, keep it simple: create a UI that is useful to a human. Nobody likes a crazily scrolling window or a blur of text. Reporting progress no more frequently than 20 times per second is plenty good enough for the user's eyes, and the console has no trouble keeping up with that.
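As one illustration of that advice (a sketch, not the only possible fix), the loop inside CreateRandomFile could throttle the reports with a Stopwatch so the handler is invoked at most about 20 times per second:

using (FileStream fs = File.Create(pathway, 4096))
{
    var reportTimer = System.Diagnostics.Stopwatch.StartNew();
    for (int i = 0; i < size; i++)
    {
        fs.WriteByte(fileSize[i]);

        // Only queue a progress callback every ~50 ms instead of once per byte,
        // so the threadpool isn't flooded with work items.
        if (reportTimer.ElapsedMilliseconds >= 50)
        {
            prog.Report(i);
            reportTimer.Restart();
        }
    }
    prog.Report(size); // final report so the consumer sees completion
}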