I'm working on an existing large enterprise application. This application has a small asynchronous method framework built into its ViewModel base class. These async methods are similar to APM and the event-based asynchronous pattern; bits were borrowed from both established patterns.
I've been assigned to profile the performance of a particularly slow view in the application. I have been given a license to Redgate ANTS Performance Profiler for the job.
From what I've read today, ANTS is normally capable of linking async/await calls to the actual work they perform. However, since this application does not follow the async/await pattern, I believe I am missing out on this automatic linkage of async calls to their execution and their completion handlers.
The actual work being performed is done by a service that is central to the application, so there are hundreds of things that are causing this service to perform work, constantly.
Because of this issue, ANTS shows me that the worker method is extremely slow, but it gives me zero feedback on what inside the view is actually causing that slow work to be done.
I spoke to a coworker about this problem, and he told me this is why he doesn't bother with performance profilers. He told me that what he would do is put timestamped logging calls all over the view and then write a quick-and-dirty tool to filter the data into something consumable by a human. But this is pretty much exactly what the profiler should be doing for me.
We talked about this for a while and concluded that for a tool to be effective with Async calls, it would either have to support a specific standard, or it would have to support something in the actual code, perhaps such as an attribute, that allows you to mark the async call and the completion handler.
Do you agree with what I've said here? If so, are there any such performance profilers for .NET that have custom attributes to annotate your problematic code with for profiling? If not, could you please enlighten me as to how I can interpret this data to determine the actual cause of the issue?
Thank you for any help.
My app uses Entity Framework. As I perform operations on the Context, such as inserts/deletes/updates, I'm sure it occupies more and more memory due to its unit-of-work behavior.
My question here is: Is there a way to find out how much memory the context is holding at a given moment?
Details:
No Lazy Loading being used
No Proxy Creation
EF 4
Needing to monitor the memory and CPU usage of objects within an application or service is practically synonymous with needing a profiler.
There are many options here; just search for ".NET profiler" on Google.
Note that I'm pointing you to Google rather than naming products, to avoid spam.
Answering #Servy's concern:
"Profilers are fine for a debugging tool during development. It's not really something that you can use during the execution of the program outside of development. I get the impression that that's what he's asking about." – Servy
This is where the requirement to implement some kind of load test arises. The OP should implement test cases that mimic real-world scenarios, to get stats as close as possible to the actual production execution of the same code.
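If you want a rough in-code proxy in the meantime, you can count how many entries the EF 4 ObjectContext is tracking. This is only a sketch: it counts tracked entries rather than bytes (a profiler is still needed for real memory numbers), and context stands in for your own ObjectContext instance.

using System;
using System.Data; // EntityState (EF 4 / ObjectContext)
using System.Linq;

// Rough proxy: the number of tracked ObjectStateEntries grows with every
// insert/update/delete the context has seen. It is not a byte count.
int tracked = context.ObjectStateManager
    .GetObjectStateEntries(EntityState.Added
                         | EntityState.Modified
                         | EntityState.Deleted
                         | EntityState.Unchanged)
    .Count();
Console.WriteLine("Tracked entries: " + tracked);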
After reading the MSDN article Using Asynchronous Methods in ASP.NET MVC 4, I have drawn the conclusion that I should always use async/await for I/O-bound operations.
Consider the following code, where movieManager exposes the async methods of an ORM like Entity Framework.
public class MovieController : Controller
{
    // fields and constructors

    public async Task<ActionResult> Index()
    {
        var movies = await movieManager.ListAsync();
        return View(movies);
    }

    public async Task<ActionResult> Details(int id)
    {
        var movie = await movieManager.FindAsync(id);
        return View(movie);
    }
}
Will this always give me better scalability and/or performance?
How can I measure this?
Why isn't this used in the "real world"?
How about context synchronization?
Is it so bad that I shouldn't use async I/O in ASP.NET MVC?
I know these are a lot of questions, but the literature on this topic has conflicting conclusions. Some say you should always use async for I/O-dependent tasks; others say you shouldn't use async in ASP.NET applications at all.
Will this always give me better scalability and/or performance?
It may. If you only have a single database server as your backend, then your database could be your scalability bottleneck, and in that case scaling your web server won't have any effect in the wider scope of your service as a whole.
How can I measure this?
With load testing. If you want a simple proof-of-concept, you can check out this gist of mine.
Why isn't this used in the "real world" a lot?
It is. Asynchronous request handlers before .NET 4.5 were quite painful to write, and a lot of companies just threw more hardware at the problem instead. Now that .NET 4.5 and async/await are gaining a lot of momentum, asynchronous request handling will become much more common.
How about context synchronization?
It's handled for you by ASP.NET. I have an async intro on my blog that explains how await will capture the current SynchronizationContext when you await a task. In this case it's an AspNetSynchronizationContext that represents the request, so things like HttpContext.Current, culture, etc. all get preserved across await points automatically.
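For illustration, here is a minimal sketch of what that means in practice; the action method is hypothetical, and the assertion is there only to make the behavior visible.

using System.Diagnostics;
using System.Threading.Tasks;
using System.Web.Mvc;

public class DemoController : Controller
{
    // The request context survives the await because ASP.NET's
    // AspNetSynchronizationContext is captured and restored by await.
    public async Task<ActionResult> Demo()
    {
        var before = System.Web.HttpContext.Current;
        await Task.Delay(100); // real asynchronous I/O would go here
        var after = System.Web.HttpContext.Current;
        Debug.Assert(ReferenceEquals(before, after)); // same request context
        return View();
    }
}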
Is it so bad that I shouldn't use async I/O in ASP.NET MVC?
As a general rule, if you're on .NET 4.5, you should use async to handle any request that requires I/O. If the request is simple (i.e., does not hit a database or call another service), then just keep it synchronous.
Will this always give me better scalability and/or performance?
You answered it yourself: you need to measure and find out. Typically, async is something to add later on, because it adds complexity, and complexity should be the #1 concern in your code base until you have a specific, demonstrated problem.
How can I measure this?
Build it both ways and see which is faster (preferably for a large number of operations).
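As a crude sketch of what "build it both ways and measure" can look like (the URL and iteration count are placeholders, and a real load test would issue concurrent requests rather than sequential ones):

using System;
using System.Diagnostics;
using System.Net;

class MeasureSketch
{
    static void Main()
    {
        var sw = Stopwatch.StartNew();
        using (var client = new WebClient())
        {
            for (int i = 0; i < 1000; i++)
            {
                // point this at the sync build, then at the async build
                client.DownloadString("http://localhost:12345/movies");
            }
        }
        sw.Stop();
        Console.WriteLine(sw.ElapsedMilliseconds + " ms for 1000 requests");
    }
}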
Why isn't this used in the "real world" a lot?
Because complexity is the biggest problem in software development. If code is complex, it is more error-prone and harder to debug. More, harder-to-fix bugs are not a good trade-off for potential performance advantages.
How about context synchronization?
I am assuming you mean the ASP.NET context; if so, you should not need any synchronization. Make sure only one thread is hitting your context and communicates through it.
Is it so bad that I shouldn't use async I/O in ASP.NET MVC?
Introducing async just to then have to deal with synchronization is a loss unless you really need the performance.
Putting asynchronous code in a website has a lot of negative sides:
You'll get into trouble when there are dependencies between the pieces of data, as you cannot make that asynchronous.
Asynchronous work is often done for things like API requests. Have you considered that you shouldn't be doing these in a webpage? If the external service goes down, so goes your site. That doesn't scale.
Doing things asynchronously may speed up your site in some cases, but you're basically introducing trouble. You always end up waiting for the slowest job, and since resources sometimes slow down for whatever reason, the risk of something slowing down your site increases by a factor equal to the number of asynchronous jobs you're using. You'll have to introduce timeouts to deal with these, then error-handling code, and so on.
When you scale to multiple web servers because the CPU load is getting too heavy, the asynchronous work will hurt you. Everything you used to put in asynchronous code now fires simultaneously the moment the user clicks a link, and then eases down. This applies not only to CPU load, but also to database load and even API requests. You will see a very awful utilization pattern across all system resources: spikes of heavy usage, and then it goes down again. That doesn't scale well. Synchronous code doesn't have this problem: jobs only start after another one is done.
Asynchronous work for websites is a trap: don't go there!
Put your heavy code in a worker (or cron job) that does these things before the user asks for them. You'll have them in a database and you can keep adding features to your site without having to worry about firing too many asynchronous jobs and what not.
Performance for websites is seriously overrated. Sure, it's nice if your page renders in 50ms, but if it takes 250ms people really won't notice (to test this: put a Sleep(200) in your code).
Your code becomes a lot more scalable if you just offload the work to another process and make the website an interface to your database only. Don't make your web server do heavy work that it shouldn't do; it doesn't scale. You can have a hundred machines spending a total of 1 CPU hour per webpage, but at least it scales in a way where the page still loads in 200ms. Good luck achieving that with asynchronous code.
I would like to add a side-note here. While my opinion on asynchronous code might seem strong, it's mostly an opinion about programmers. Asynchronous code is awesome and can make a performance difference that proves all of the points I outlined wrong. However, it needs a lot of fine-tuning in your code to avoid the points I mention in this post, and most programmers just can't handle that.
I have a C# service application which interacts with a database. It was recently migrated from .NET 2.0 to .NET 4.0 so there are plenty of new tools we could use.
I'm looking for pointers to programming approaches or tools/libraries to handle defining tasks, configuring which tasks they depend on, queueing, prioritizing, cancelling, etc.
There are various types of services:
Data (for retrieving and updating)
Calculation (populate some table with the results of a calculation on the data)
Reporting
These services often depend on one another and are triggered on demand, i.e., a Reporting task will probably have code within it such as:
if (IsSomeDependentCalculationRequired())
    PerformDependentCalculation(); // which may trigger further calculations

GenerateRequestedReport();
Also, any Data modification is likely to set the Required flag on some of the Calculation or Reporting services (so the report could be out of date before it's finished generating). The tasks vary in length from a few seconds to a couple of minutes and are performed within transactions.
This has worked OK up until now, but it is not scaling well. There are fundamental design problems and I am looking to rewrite this part of the code. For instance, if two users request the same report at similar times, the dependent tasks will be executed twice. Also, there's currently no way to cancel a task in progress, the dependent tasks are hard to maintain, etc.
I'm NOT looking for suggestions on how to implement a fix. Rather I'm looking for pointers to what tools/libraries I would be using for this sort of requirement if I were starting in .NET 4 from scratch. Would this be a good candidate for Windows Workflow? Is this what Futures are for? Are there any other libraries I should look at or books or blog posts I should read?
Edit: What about Rx Reactive Extensions?
I don't think your requirements fit into any of the built-in stuff. Your requirements are too specific for that.
I'd recommend that you build a task queueing infrastructure around a SQL database. Your tasks are pretty long-running (seconds) so you don't need particularly high throughput in the task scheduler. This means you won't encounter performance hurdles. It will actually be a pretty manageable task from a programming perspective.
You should probably build a Windows service or some other process that continuously polls the database for new tasks or requests. This service can then enforce arbitrary rules on the requested tasks. For example, it can detect that a reporting task is already running and decline to schedule a new computation; a minimal sketch of such a polling loop is below.
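This is only a sketch under assumed names (a TaskQueue table with Status/Priority columns and a Dispatch helper); the real schema and the rules inside Dispatch would encode your own dependencies.

using System;
using System.Data.SqlClient;
using System.Threading;

class TaskPollingService
{
    private volatile bool stopRequested;

    public void Run(string connectionString)
    {
        while (!stopRequested)
        {
            using (var conn = new SqlConnection(connectionString))
            {
                conn.Open();
                // Hypothetical schema: pick the highest-priority pending task.
                var cmd = new SqlCommand(
                    "SELECT TOP 1 Id, TaskType FROM TaskQueue " +
                    "WHERE Status = 'Pending' " +
                    "ORDER BY Priority DESC, CreatedAt", conn);
                using (var reader = cmd.ExecuteReader())
                {
                    if (reader.Read())
                    {
                        int id = reader.GetInt32(0);
                        string taskType = reader.GetString(1);
                        // Here the service can enforce arbitrary rules, e.g.
                        // skip scheduling if the same report is already running.
                        Dispatch(id, taskType);
                    }
                }
            }
            Thread.Sleep(TimeSpan.FromSeconds(5)); // poll interval
        }
    }

    private void Dispatch(int id, string taskType)
    {
        // mark as running, execute inside a transaction, mark as done ...
    }

    public void Stop() { stopRequested = true; }
}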
My main point is that your requirements are so specific that you need C# code to encode them. You cannot make an existing tool fit your needs; you need the Turing completeness of a programming language to do this yourself.
Edit: You should probably separate a task-request from a task-execution. This allows multiple parties to request a refresh of some reports while only one actual computation is running. Once this single computation is completed, all task-requests are marked as completed. When a request is cancelled, the execution does not need to be cancelled; only when the last request is cancelled does the task-execution get cancelled as well.
Edit 2: I don't think workflows are the solution. Workflows usually operate separately from each other. But you don't want that. You want to have rules which span multiple tasks/workflows. You would be working against the system with a workflow based model.
Edit 3: A few words about the TPL (Task Parallel Library); you mentioned it ("Futures"). If you want some inspiration on how tasks could work together, how dependencies could be created, and how tasks could be composed, look at the Task Parallel Library (in particular the Task and TaskFactory classes). You will find some nice design patterns there because it is very well designed. Here is how you model a sequence of tasks: you call Task.ContinueWith, which registers a continuation function as a new task. And here is how you model dependencies: TaskFactory.ContinueWhenAll(Task[], ...) starts a task that only runs when all of its input tasks have completed.
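A tiny self-contained sketch of those two patterns (the work items are just placeholders):

using System;
using System.Threading.Tasks;

class TplCompositionSketch
{
    static void Main()
    {
        // Sequence: ContinueWith registers a continuation as a new task.
        Task<int> load = Task.Factory.StartNew(() => 42);          // pretend: load data
        Task<int> compute = load.ContinueWith(t => t.Result * 2);  // runs after load

        // Dependency: ContinueWhenAll starts only when all inputs have finished.
        Task other = Task.Factory.StartNew(() => Console.WriteLine("other work"));
        Task report = Task.Factory.ContinueWhenAll(
            new Task[] { compute, other },
            _ => Console.WriteLine("report, result = " + compute.Result));

        report.Wait();
    }
}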
BUT: The TPL itself is probably not well suited for you, because its tasks cannot be saved to disk. When you reboot your server or deploy new code, all existing tasks are cancelled and the process is aborted. This is likely to be unacceptable. Please just use the TPL as inspiration: learn from it what a "task/future" is and how tasks can be composed, and then implement your own form of tasks.
Does this help?
I would try to use the state machine package Stateless to model the workflow. Using a package will provide a consistent way to advance the state of the workflow across the various services. Each of your services would hold an internal state machine implementation and expose methods for advancing it. Stateless will be responsible for triggering actions based on the state of the workflow, and it forces you to explicitly set up the various states the workflow can be in. This will be particularly useful for maintenance, and it will probably help you understand the domain better. A small sketch is below.
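Here is what that could look like with the Stateless package; the states, triggers, and helper methods are hypothetical stand-ins for your own workflow.

using Stateless; // the Stateless NuGet package

class ReportWorkflow
{
    enum State { Idle, Calculating, Generating, Done }
    enum Trigger { ReportRequested, CalculationFinished, GenerationFinished }

    private readonly StateMachine<State, Trigger> machine =
        new StateMachine<State, Trigger>(State.Idle);

    public ReportWorkflow()
    {
        machine.Configure(State.Idle)
            .Permit(Trigger.ReportRequested, State.Calculating);

        machine.Configure(State.Calculating)
            .OnEntry(RunDependentCalculations)
            .Permit(Trigger.CalculationFinished, State.Generating);

        machine.Configure(State.Generating)
            .OnEntry(GenerateReport)
            .Permit(Trigger.GenerationFinished, State.Done);
    }

    public void RequestReport() { machine.Fire(Trigger.ReportRequested); }

    private void RunDependentCalculations() { /* ... */ }
    private void GenerateReport() { /* ... */ }
}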
If you want to solve this fundamental problem properly and in a scalable way, you should probably look at the SOA architectural style.
Your services will receive commands and generate events that you can handle in order to react to facts happening in your system.
And, yes, there are tools for it. For example NServiceBus is a wonderful tool to build SOA systems.
You could set up a SQL Server Agent job to run SQL queries at a timed interval. Beyond that, it looks like you would have to write the application yourself: a long-running program that checks the time and does something. I don't think there are clear-cut tools out there to do what you are trying to do. A C# application or WCF service would work, and the data automation can be done in SQL itself.
If I understand you right, you want to cache the generated reports and not do the work again. As other commenters have pointed out, this can be solved elegantly with multiple producer/consumer queues and some caches.
First you enqueue your report request. Based on the report generation parameters, you check the cache to see whether a previously generated report is already available, and simply return that one. If the report becomes obsolete due to changes in the database, you need to take care that the cache is invalidated in a reliable manner.
Now, if the report has not been generated yet, you need to schedule it for generation. The report scheduler needs to check whether the same report is already being generated; if so, register an event to be notified when it completes and return the report once it is finished. Make sure that you do not access the data via the caching layer, since that could produce races (the report is generated, the data changes, and the finished report is immediately discarded by the cache, leaving nothing for you to return). A sketch of this request-coalescing step is below.
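Here is one way to sketch the "same report is already being generated" part, using a Lazy-wrapped task per report key so that concurrent requests share one generation; all names here are hypothetical.

using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

class ReportScheduler
{
    private readonly ConcurrentDictionary<string, Lazy<Task<byte[]>>> inFlight =
        new ConcurrentDictionary<string, Lazy<Task<byte[]>>>();

    public Task<byte[]> GetReportAsync(string reportKey)
    {
        // All concurrent callers for the same key share one Lazy, and
        // therefore one generation task.
        var lazy = inFlight.GetOrAdd(reportKey,
            key => new Lazy<Task<byte[]>>(() => GenerateAsync(key)));
        return lazy.Value;
    }

    private Task<byte[]> GenerateAsync(string key)
    {
        return Task.Factory.StartNew(() =>
        {
            // ... expensive report generation against the raw data,
            // not against the caching layer (see the race described above) ...
            return new byte[0];
        }).ContinueWith(t =>
        {
            Lazy<Task<byte[]>> removed;
            inFlight.TryRemove(key, out removed); // allow future regeneration
            return t.Result;
        });
    }
}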
Or, if you do want to prevent returning outdated reports, you can let the caching layer become your main data provider, generating reports repeatedly until one of them completes before becoming outdated. But be aware that if you have constant changes in your database, you might enter an endless loop here, constantly generating invalid reports, if the report generation time is longer than the average time between two changes to your db.
As you can see, you have plenty of options here without even talking about .NET, the TPL, or SQL Server. First you need to set your goals for how fast, scalable, and reliable your system should be; then you need to choose the appropriate architecture and design, as described above, for your particular problem domain. I cannot do that for you, because I do not have your full domain know-how about what is acceptable and what is not.
The tricky part is the handover between the different queues with the proper reliability and correctness guarantees. Depending on your specific report generation needs, you can put this logic into the cloud, or use a single thread that puts all work into the proper queues and works on them concurrently, one by one, or something in between.
The TPL and SQL Server can certainly help there, but they are only tools. If they are used wrongly, due to insufficient experience with one or the other, it might turn out that a different approach (like using only in-memory queues with reports persisted to the file system) is better suited to your problem.
From my current understanding, I would not misuse SQL Server as a cache; if you want a database for this, I would use something like RavenDB or RaptorDB, which look stable and much more lightweight than a full-blown SQL Server.
But if you already have a SQL server running then go ahead and use it.
I am not sure if I understood you correctly, but you might want to have a look at JAMS Scheduler: http://www.jamsscheduler.com/. It's not free, but it's a very good system for scheduling dependent tasks and for reporting. I used it with success at my previous company. It's written in .NET and there is a .NET API for it, so you can write your own apps that communicate with JAMS. They also have very good support and are eager to implement new features.
I want a certain action request to trigger a set of e-mail notifications. The user does something, and it sends the emails. However, I do not want the user to wait for the page response until the system generates and sends the e-mails. Should I use multithreading for this? Will this even work in ASP.NET MVC? I want the user to get a page response back and the system to finish sending the e-mails at its own pace. I'm not even sure if this is possible or what the code would look like. (PS: Please don't offer me an alternative solution for sending e-mails; I don't have time for that kind of reconfiguration.)
SmtpClient.SendAsync is probably a better bet than manual threading, though multi-threading will work fine with the usual caveats.
http://msdn.microsoft.com/en-us/library/x5x13z6h.aspx
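A minimal sketch of SendAsync (the server and addresses are placeholders); note that failures surface only in the SendCompleted callback, not as exceptions at the call site:

using System.Net.Mail;

// Placeholder server/addresses; configure from web.config in real code.
var client = new SmtpClient("smtp.example.com");
client.SendCompleted += (sender, e) =>
{
    if (e.Error != null)
    {
        // SendAsync does not throw at the call site; log failures here
        // or they will be silent.
    }
};
var message = new MailMessage("noreply@example.com", "user@example.com",
                              "Subject", "Body");
client.SendAsync(message, null); // returns immediately; send continues in background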
As other people have pointed out, success/failure cannot be indicated deterministically when the page returns before the send is actually complete.
A couple of observations when using asynchronous operations:
1) They will come back to bite you in some way or another. It's a risk versus benefit discussion. I like the SendAsync() method I proposed because it means forms can return instantly even if the email server takes a few seconds to respond. However, because it doesn't throw an exception, you can have a broken form and not even know it.
Of course unit testing should address this initially, but what if the production configuration file gets changed to point to a broken mail server? You won't know it, you won't see it in your logs; you only discover it when someone asks you why you never responded to the form they filled out. I speak from experience on this one. There are ways around this, but in practice, async is always more work to test, debug, and maintain.
2) Threading in ASP.Net works in some situations if you understand the ThreadPool, app domain refreshes, locking, etc. I find that it is most useful for executing several operations at once to increase performance where the end result is deterministic, i.e. the application waits for all threads to complete. This way, you gain the performance benefits while still having a clear indication of results.
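A sketch of that deterministic fan-out/wait pattern (LoadCustomers, LoadOrders, and Render are hypothetical methods standing in for your own work):

using System.Threading.Tasks;

// Fan out independent work, then block until all of it is done, so the
// request still produces a deterministic result.
var customers = Task.Factory.StartNew(() => LoadCustomers()); // hypothetical
var orders    = Task.Factory.StartNew(() => LoadOrders());    // hypothetical
Task.WaitAll(customers, orders);
Render(customers.Result, orders.Result);                      // hypothetical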
3) Threading/Async operations do not increase performance, only perceived performance. There may be some edge cases where that is not true (such as processor optimizations), but it's a good rule of thumb. Improperly used, threading can hurt performance or introduce instability.
The better scenario is out-of-process execution. For enterprise applications, I often move things out of the ASP.Net thread pool and into an execution service.
See this SO thread: Designing an asynchronous task library for ASP.NET
I know you are not looking for alternatives, but using a message queue (such as MSMQ) could be a good solution for this problem in the future. Using multithreading in ASP.NET is normally discouraged, but in your current situation I don't see why you shouldn't use it. It is definitely possible, but beware of the pitfalls related to multithreading (stolen here):
• There is a runtime overhead associated with creating and destroying threads. When your application creates and destroys threads frequently, this overhead affects the overall application performance.
• Having too many threads running at the same time decreases the performance of your entire system. This is because your system is attempting to give each thread a time slot to operate inside.
• You should design your application well when you are going to use multithreading, or otherwise your application will be difficult to maintain and extend.
• You should be careful when you implement a multithreading application, because threading bugs are difficult to debug and resolve.
At the risk of violating your no-alternative-solution prime directive, I suggest that you write the email requests to a SQL Server table and use SQL Server's Database Mail feature. You could also write a Windows service that monitors the table and sends emails, logging successes and failures in another table that you view through a separate ASP.Net page.
You can probably use ThreadPool.QueueUserWorkItem:
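For example (BuildAndSendEmails is a hypothetical method; the try/catch matters because an unhandled exception on a pool thread will take down the process):

using System;
using System.Diagnostics;
using System.Threading;

// Queue the slow work so the request can return immediately.
ThreadPool.QueueUserWorkItem(state =>
{
    try
    {
        BuildAndSendEmails(); // hypothetical: composes and sends the e-mails
    }
    catch (Exception ex)
    {
        // Unhandled exceptions on pool threads crash the process, so log them.
        Trace.TraceError(ex.ToString());
    }
});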
Yes, this is an appropriate time to use multi-threading.
One thing to look out for, though, is how you will inform the user when the email sending ultimately fails. Not blocking the user is a good step toward improving your UI, but it still must not provide a false sense of success when the operation ultimately fails at a later time.
I don't know if any of the above links mentioned it, but don't forget to keep an eye on request timeout values; the queued items will still need to complete within that time period.
I'm writing a plug-in for another program in C#.NET, and am having performance issues where commands take a lot longer than I would expect. The plug-in reacts to events in the host program, and also depends on utility methods of the host program's SDK. My plug-in has a lot of recursive functions because I'm doing a lot of reading and writing to a tree structure. Plus, I have a lot of event subscriptions between my plug-in and the host application, as well as event subscriptions between classes within my plug-in.
How can I figure out what is taking so long for a task to complete? I can't use regular breakpoint-style debugging, because it's not that the code doesn't work; it's just that it's too slow. I have set up a static "LogWriter" class that I can reference from all my classes, which lets me write timestamped lines to a log file from my code. Is there another way? Does Visual Studio keep some kind of timestamped log that I could use instead? Is there some way to view the call stack after the application has closed?
You need to use a profiler. Here's a link to a good one: ANTS Performance Profiler.
Update: You can also write messages at control points using Debug.Write. Then load the DebugView application, which displays all your debug strings with precise time stamps. It is freeware and very good for quick debugging and profiling.
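For example (ProcessTree and root are hypothetical stand-ins for whatever operation you are timing; watch the output live in DebugView):

using System.Diagnostics;

var sw = Stopwatch.StartNew();
Debug.WriteLine("ProcessTree: start");
ProcessTree(root); // hypothetical slow operation
Debug.WriteLine("ProcessTree: done after " + sw.ElapsedMilliseconds + " ms");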
My Profiler List includes ANTS, dotTrace, and AQtime.
However, looking more closely at your question, it seems to me that you should do some unit testing at the same time you're doing profiling. Maybe start by doing a quick overall performance scan, just to see which areas need most attention. Then start writing some unit tests for those areas. You can then run the profiler while running those unit tests, so that you'll get consistent results.
In my experience, the best method is also the simplest. Get it running, and while it is being slow, hit the "pause" button in the IDE. Then make a record of the call stack. Repeat this several times. (Here's a more detailed example and explanation.)
What you are looking for is any statement that appears on more than one stack sample that isn't strictly necessary. The more samples it appears on, the more time it takes. The way to tell if the statement is necessary is to look up the stack, because that tells you why it is being done.
Anything that causes a significant amount of time to be consumed will be revealed by this method, and recursion does not bother it.
People seem to tackle problems like this in one of two ways:
Try to get good measurements before doing anything.
Just find something big that you can get rid of, rip it out, and repeat.
I prefer the latter, because it's fast, and because you don't have to know precisely how big a tumor is to know it's big enough to remove. What you do need to know is exactly where it is, and that's what this method tells you.
Sounds like you want a code 'profiler'. http://en.wikipedia.org/wiki/Code_profiler#Use_of_profilers
I'm unfamiliar with which profilers are the best for C#, but I came across this link after a quick Google search; it has a list of free, open-source offerings. I'm sure someone else will know which ones are worth considering :)
http://csharp-source.net/open-source/profilers
Despite the title of this topic, I must argue that the "best" way is subjective; we can only suggest possible solutions.
I have had experience using Redgate ANTS Performance Profiler which will show you where the bottlenecks are in your application. It's definitely worth checking out.
Visual Studio Team System has a profiler baked in. It's far from perfect, but for simple applications you can get it to work reasonably well.
Recently I have had the most success with EQATEC's free profiler, or with rolling my own tiny profiling class where needed.
Also, there have been quite a few questions about profilers in the past; see: http://www.google.com.au/search?hl=en&q=site:stackoverflow.com+.net+profiler&btnG=Google+Search&meta=&aq=f&oq=
Don't ever forget Rico Mariani's advice on how to carry out a good perf investigation.
You can also use performance counters for ASP.NET applications.