Since in a server-side application the work is done by the server, which also needs to serve other requests, I would like to know whether there are any real benefits to using parallel processing in server-side applications. The way I see it, parallel processing usually seems like a bad idea there: by focusing the CPU power on only part of the problem, other requests cannot get served.
If there are advantages, I guess they should be considered only when specific conditions are met. So, what are some good guidelines for when to use the Parallel class in server applications?
You are balancing two concerns: fast response for a given user, and supporting all the users who wish to connect to the server in a given time period.
Before considering parallelism for faster computation for a given user, consider whether precomputation and caching allow you to meet your performance requirements. Perform hotspot analysis and see if there are opportunities to optimize existing code.
If your deployment hardware is a given, observe the CPU load during peak times. If the CPU is busy (rule of thumb 70%+ utilization), parallel computing will be detrimental to both concerns. If the CPU isn't heavily loaded, you might improve response time for a given user without affecting the number of users the server can handle at once (benchmark to be sure).
If you aren't meeting your single-user performance targets, have exhausted the options to precalculate and cache, and have analyzed performance hotspots without finding opportunities to optimize, you can still parallelize workloads that lend themselves to parallel computation, provided you're willing to upgrade your server(s) as needed so that you don't over-tax the CPU during peak periods.
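As a rough illustration of that last point, here is a minimal sketch (the Item and Result types and the square-root "work" are hypothetical) of using the Parallel class with a capped degree of parallelism, so a single request can't saturate every core:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public class Item { public double Value; }
public class Result { public double Score; }

public static class ReportBuilder
{
    // Hypothetical CPU-bound transform over a per-request data set.
    public static IList<Result> Build(IReadOnlyList<Item> items)
    {
        var results = new Result[items.Count];

        // Cap the degree of parallelism so a single request cannot
        // saturate every core and starve other concurrent requests.
        var options = new ParallelOptions { MaxDegreeOfParallelism = 2 };

        Parallel.For(0, items.Count, options, i =>
        {
            // Pure CPU work only; blocking I/O inside a parallel
            // loop wastes threads.
            results[i] = new Result { Score = Math.Sqrt(items[i].Value) };
        });

        return results;
    }
}
```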
As with most performance-related questions: it depends on a lot of factors. Things like:
do you tend to have a lot of requests hitting your server at the same time?
how much of a typical request's turnaround time is spent waiting on I/O, as opposed to actually exercising the CPU?
are you able to have multiple instances of your server code sitting behind a load balancer?
how well does the operation you're looking at lend itself to parallelization?
how important is it for the operation you're performing to return an answer to the user faster than it would without parallelism?
In my experience, most of the time in a typical request is spent waiting for things like database queries and REST API calls to complete, or for files to load from disk. These are not CPU-intensive operations, and insofar as they can be made concurrent, that can usually be done by simply orchestrating async Tasks concurrently rather than by using parallel threads.
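For example, here is a sketch of that kind of orchestration; the database helper and the REST endpoint URL below are stand-ins for whatever your request actually touches:

```csharp
using System.Net.Http;
using System.Threading.Tasks;

public class OrderPageService
{
    private static readonly HttpClient Http = new HttpClient();

    // Hypothetical request handler: the DB query and the REST call
    // are independent, so start both and await them together.
    public async Task<string> LoadPageDataAsync(int orderId)
    {
        Task<string> orderTask = LoadOrderFromDbAsync(orderId);
        Task<string> ratesTask = Http.GetStringAsync("https://example.com/api/rates");

        await Task.WhenAll(orderTask, ratesTask);

        // Both results are available; no thread was blocked while waiting.
        return orderTask.Result + ratesTask.Result;
    }

    private async Task<string> LoadOrderFromDbAsync(int orderId)
    {
        // Placeholder for a real async ADO.NET / ORM query.
        await Task.Delay(10);
        return $"order {orderId}";
    }
}
```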
Also in my experience, most attempts to use the TPL to improve performance of an algorithm end up yielding only marginal performance improvements, whereas other approaches (like using more appropriate data structures, caching, etc.) often yield improvements of orders of magnitude.
And of course, if your application isn't too slow for your needs in the first place, then any optimization counts as premature optimization, which you want to avoid.
But if you for some reason find yourself doing a CPU-intensive operation that responds well to parallelism, in a part of your code that absolutely must perform faster than it currently does, then parallel processing is a good choice.
I'm trying not to panic here, so please, bear with me! :S
I've spent a considerable amount of time rewriting a big chunk of code from sync (i.e. thread-blocking) to async (i.e. async/await, C# 5+). The code in question runs inside an ASP.NET application and spans everything from low-level ADO.NET DB access to a higher-level unit-of-work pattern, and finally a custom async HTTP handler for public API access - the full server-side stack, so to speak. The primary purpose of the rewrite wasn't optimization, but untangling, general clean-up, and bringing the code up to something that resembles a modern and deliberate design. Naturally, some performance gain was implicitly assumed.
Everything in general is great and I'm very satisfied with the overall quality of the new code, as well as the improved scalability it's shown so far in the last couple of weeks of real-world tests. The CPU and memory loads on the server have fallen drastically!
So what's the problem, you might ask?
Well, I've recently been tasked with optimizing a simple data import that is still utilizing the old sync code. Naturally, it didn't take me long before I tried changing it to the new async code-base to see what would happen.
All things considered, the import code is quite simple. It's basically a loop that reads items from a list previously loaded into memory, adds each of them individually to a unit of work, and saves each one to a SQL database by means of an INSERT statement. After the loop is done, the unit of work is committed, which makes the changes permanent (i.e. the DB transaction is committed as well).
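To make the shape of that loop concrete, it looks roughly like this (the IUnitOfWork interface and the names below are illustrative stand-ins, not my actual code):

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

// Minimal stand-ins for the unit-of-work types described above.
public interface IUnitOfWork
{
    void Add(object item);
    Task SaveAsync();     // issues the INSERT
    Task CommitAsync();   // commits the DB transaction
}

public static class Importer
{
    // Illustrative shape of the import loop, not the actual code.
    public static async Task ImportAsync(IEnumerable<object> items, IUnitOfWork unitOfWork)
    {
        foreach (var item in items)
        {
            unitOfWork.Add(item);
            await unitOfWork.SaveAsync(); // awaited once per item, in a tight loop
        }

        await unitOfWork.CommitAsync();
    }
}
```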
The problem is that the new code takes about 20 times as long as the old one, when the expectation was quite the opposite! I've checked and double-checked and there is no obvious overhead in the new code that would warrant such sluggishness.
To be specific: the old code is able to import 1100 items/sec steadily, while the new one manages 40 items/sec AT BEST (on average it's even less, because the rate falls slightly over time)! If I run the same test over a VPN, so that the network cost outweighs everything else, the throughput is somewhere around 25 items/sec for sync and 20 items/sec for async.
I've read about multiple cases here on SO which report a 10-20% slowdown when switching from sync to async in similar situations and I was prepared to deal with that for tight loops such as mine. But a 20-fold penalty in a non-networked scenario?! That's completely unacceptable!
What is my best course of action here? How do I tackle this unexpected problem?
UPDATE
I've run the import under a profiler, as suggested.
I'm not sure what to make of the results, though. It would seem that the process spends more than 80% of its time just... waiting. See for yourselves:
The 14% spent inside the custom HTTP handler corresponds to IDataReader.Read, which is a consequence of a tiny remainder of the old sync API. This is still subject to optimization and is likely to be reduced in the near future. Regardless, it's dwarfed by the WaitAny cost, which definitely isn't there in the all-sync version!
What's curious is that the report isn't showing any direct calls from my code to WaitAny, which makes me think this is probably part of the async/await infrastructure. Am I wrong in this conclusion? I kind of hope I am!
What worries me is that I might be reading this all wrong. I know that async costs are much harder to reason about than single-threaded costs. In the end, the WaitAny might be nothing more than the equivalent of the "System Idle Process" on Windows - an artificial entry that merely represents free CPU capacity.
Can anyone shed some light here for me, please?
I'm designing an accounting application with more than 400 tables in SQL Server.
About 10% of those tables are operational tables, and the others are used for decoding and reference information.
For example, the invoice tables (master and details) use about 10 tables to decode information like buyer, item, marketer, and so on.
I want to know whether it is acceptable to cache the decode tables in the ASP.NET cache and not query them from SQL Server (I know that changes to cached items must be committed to SQL Server too), and to use the cached items for decoding.
I think this would make the application much faster than a regular one.
All of the cached tables together might amount to about 500 MB after some years, because they don't change frequently.
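To make the idea concrete, here is a minimal sketch of the kind of read-through caching I have in mind, using System.Runtime.Caching (the table name, key shape, and expiration below are made up):

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.Caching;

public static class DecodeTableCache
{
    private static readonly MemoryCache Cache = MemoryCache.Default;

    // Read-through lookup: serve the decode table from memory,
    // falling back to SQL Server only on a cache miss.
    public static IDictionary<int, string> Get(string tableName,
        Func<string, IDictionary<int, string>> loadFromSql)
    {
        var cached = Cache.Get(tableName) as IDictionary<int, string>;
        if (cached != null)
            return cached;

        var rows = loadFromSql(tableName);

        // Decode tables change rarely, so a long absolute expiration
        // is acceptable; writes must still go to SQL Server and evict
        // the cached entry.
        Cache.Set(tableName, rows,
            new CacheItemPolicy { AbsoluteExpiration = DateTimeOffset.Now.AddHours(12) });

        return rows;
    }
}
```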
If you've got the RAM then it's fine to use 500 MB.
However, unless you have a performance problem now, caching will only cause problems. Don't fix problems that you haven't encountered; design for performance and optimize only when you have problems, because otherwise the optimization can cause more problems than it solves.
So I would advise that it is usually better to ensure your queries are optimized and well structured, that you have the correct indexes on the tables, and that you issue a minimal number of queries.
Although 500 MB isn't a lot of data to cache, with all due respect, SQL Server will usually do a better job of caching than you can, provided that you use it correctly.
Using a cache will almost always improve performance, at the cost of higher implementation complexity.
For static data that never changes, a cache is useful; but it still needs to be loaded and shared between threads, which in itself can present challenges.
For data that rarely changes, it becomes much more complex, simply because it could have changed. If a single application (process) is the only updater of the cache, then it isn't as difficult, but it's still not a simple task.
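To illustrate the static-data case, Lazy&lt;T&gt; handles the load-once-and-share-between-threads concern; here is a minimal sketch with a hypothetical loader:

```csharp
using System;
using System.Collections.Generic;

public static class CountryCodes
{
    // Lazy<T> with thread safety enabled (the default) ensures the
    // load runs exactly once, even if many request threads hit the
    // cache simultaneously on first use.
    private static readonly Lazy<IReadOnlyDictionary<string, string>> Codes =
        new Lazy<IReadOnlyDictionary<string, string>>(LoadFromDatabase);

    public static string Lookup(string code) => Codes.Value[code];

    private static IReadOnlyDictionary<string, string> LoadFromDatabase()
    {
        // Placeholder for the real one-time database read.
        return new Dictionary<string, string> { ["US"] = "United States" };
    }
}
```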
I have spent months optimizing an offline batch-processing system (where the code has complete control of the database for a period of 12 hours). Part of the optimization was to use various caches and data re-projections. All of the caches are read-only. Memory usage is around the 10 GB mark during execution; the database is around 170 GB, with 60 million records.
Even with the caching, there have been considerable changes to the underlying schema to improve efficiency. The read-only caches exist to eliminate reading during processing, to allow multi-threaded processing, and to improve insert performance.
Processing rate has gone from 6 items processed per second 20 months ago to around 6000 items per second (yesterday) - but there is a genuine need for this optimization as the number of items to process has risen from 100,000 to 8 million in the same period.
If you don't have a need then don't optimize.
In our project we are using async/await for three main purposes (for all of their methods):
Data access layer: fetching from and updating the database (using Dapper).
Cache (Redis): read/write.
ASP.Net MVC 5 controllers.
The question is: how much async/await is OK? Is it OK to use it even when reading or writing small amounts of data? How about the cache and the controllers?
Remarks: the project is a little special, and it may see about 50,000 requests per second for a few hours of the day.
According to an article I've read:
Async/await is great for avoiding blocking while potentially time-consuming work is performed in a .NET application, but there are overheads associated with running an async method. The cost of this is comparatively negligible when the asynchronous work takes a long time, but it's worth keeping in mind.
Based on what you asked - "even when reading or writing small amounts of data?" - it doesn't seem to be a good idea, as there are overheads.
Here is the article: The overhead of async/await in .NET 4.5
And in the article, the author used a profiler to measure the overhead of async/await.
QUOTE:
Despite this async method being relatively simple, ANTS Performance Profiler shows that it's caused over 900 framework methods to be run in order to initialize it and the work it does the first time that it's run.
The question here may be whether you're going to accept these minimal overheads, taking into consideration that they can pile up into something problematic.
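To make that overhead concrete, here is a hedged micro-example: even an async method that never truly yields still gets compiled into a state machine with per-call costs:

```csharp
using System.Threading.Tasks;

public class OverheadDemo
{
    // Compiled into a state machine: extra allocations and framework
    // calls on every invocation, even though this never actually yields.
    public async Task<int> GetValueAsync()
    {
        return await Task.FromResult(42);
    }

    // The synchronous equivalent carries none of that machinery.
    public int GetValue()
    {
        return 42;
    }
}
```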
The question is: how much async/await is OK? Is it OK to use it even when reading or writing small amounts of data? How about the cache and the controllers?
You should use async/await for I/O-bound operations; it doesn't matter if it's a small amount of data. More important is to avoid letting potentially long-running I/O-bound operations, mainly disk and network calls, block threads. ASP.NET has a limited thread pool, and these operations can exhaust it. Using asynchronous calls helps your application scale better and allows it to handle more concurrent requests.
For more info: http://msdn.microsoft.com/en-us/magazine/dn802603.aspx
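As a rough sketch of what this looks like in practice (the connection string name, query, and model type are placeholders), an MVC 5 action can be made async end to end, down through Dapper:

```csharp
using System.Configuration;
using System.Data.SqlClient;
using System.Threading.Tasks;
using System.Web.Mvc;
using Dapper;

public class ProductsController : Controller
{
    // Async end to end: the request thread is returned to the pool
    // while the database call is in flight.
    public async Task<ActionResult> Index()
    {
        var connectionString =
            ConfigurationManager.ConnectionStrings["Default"].ConnectionString;

        using (var connection = new SqlConnection(connectionString))
        {
            await connection.OpenAsync();
            var products = await connection.QueryAsync<Product>(
                "SELECT Id, Name FROM Products");
            return View(products);
        }
    }
}

public class Product
{
    public int Id { get; set; }
    public string Name { get; set; }
}
```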
If an ASP.NET website is designed to be accessed by many users (e.g. 10,000 users) simultaneously, what are the techniques, knowledge, methods, designs, or practices that can be implemented?
Can you please share the names of the designs/practices? I just need to know the names of the techniques, so that I can continue to do further research in Google.
Thanks.
This is a HUGE topic - and as the comments say, there's no magic bullet.
I'll separate the response into two sections: architecture and process.
From an architecture point of view, there are a number of practices. Firstly, there is horizontal scaling - i.e. you add more servers, typically managed by a load balancer. This is a relatively cheap hardware solution, but requires you to know where your bottleneck is. The easiest horizontal scalability trick is adding more web servers; scaling database servers horizontally typically requires significant complexity, using techniques such as sharding. Horizontal scaling can improve your resilience as well as performance.
Vertical scalability basically means upgrading the hardware - more RAM, more CPU, SSD disks, etc. This is often the cheapest solution. It may also mean separating elements of the solution - e.g. separating the web server from the database server.
The next architectural solution is caching - this is a huge topic in its own right. Adding a CDN is a good first step; many CDN providers also offer "application accelerator" options, which effectively act as a reverse caching proxy (much like @Aviatrix recommends). Adding your own reverse caching proxy is often a solution for some weirdness in your own environment, or for offloading static file serving from your ASP.NET servers.
Of course, ASP.Net offers lots of caching options within the framework - make sure you read up on those and understand them; they give huge bang for buck. Also make sure you run your solution through a tool like YSlow to make sure you're setting the appropriate HTTP cache headers.
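As one concrete example of those framework options (the duration and vary-by parameter here are arbitrary choices), MVC's [OutputCache] attribute caches the rendered action and emits the matching HTTP cache headers:

```csharp
using System.Web.Mvc;
using System.Web.UI;

public class CatalogController : Controller
{
    // Cache the rendered output for 10 minutes, varying by page number,
    // on both the server and downstream clients/proxies.
    [OutputCache(Duration = 600, VaryByParam = "page",
        Location = OutputCacheLocation.ServerAndClient)]
    public ActionResult Index(int page = 1)
    {
        var model = LoadCatalogPage(page); // hypothetical data access
        return View(model);
    }

    private object LoadCatalogPage(int page)
    {
        // Placeholder for the real query.
        return new { Page = page };
    }
}
```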
Another architectural solution that may or may not help is invoking external services asynchronously. If your solution depends on an external web service, calling that service synchronously basically limits your site to the capacity and resilience of the external system. For high-traffic solutions, that's not a good idea.
For very high scalability, many web sites use NoSQL for persistence - this is another huge topic, and there are many complex trade-offs.
From a process point of view, if scalability is a primary concern, you need to bake it into your development process. This means conducting regular performance and scalability assessments throughout the project, and building a measurement framework so you can decide which optimizations to pursue.
You need to be able to load test your solution - but load testing at production levels of traffic is usually commercially unrealistic, so you need to find an alternative solution - I regularly use JMeter with representative infrastructure. You also need to be able to find your bottlenecks under load - this may require instrumenting your code, and using a profiler (RedGate do a great one).
Most importantly is to have a process for evaluating trade-offs - nearly every performance/scalability improvement is at the expense of some other thing you care about. Load balancers cost money; reverse caching proxy solutions increase complexity; NoSQL requires new skills from your development team; "clever" coding practices often reduce maintainability. I recommend establishing your required baseline, building a measurement framework to evaluate your solution against that baseline, and profiling to identify the bottleneck. Each solution to improve scalability must address the current bottleneck, and I recommend a proof of concept stage to make sure the solution really does have the expected impact.
Finally, 10,000 concurrent users isn't a particularly large number for most web applications on modern hardware.
Here are my 2c as someone who is currently building a scalable system with an ASP.NET backend:
Use NGINX as a reverse proxy and for caching. Odds are your users will mostly request the same data, and that data can be cached - use that.
Use proper HTTP caching headers and cache as much as you can, both on the server and on the client. Be careful, though: this can result in a delay between when you update something and when the user sees the change.
Have servers with a lot of RAM and SSDs - the SSDs do help a lot!
Use NGINX or something else as a load balancer to spread the load between servers.
I'm currently working on an ASP.NET MVC application where some pages load a lot of data (split across separate LINQ queries).
To increase the performance of these pages, I'm considering using C# 4 Tasks to run the queries simultaneously and save execution time.
But I have one major question: from the server's point of view, which situation is best:
pages that use Tasks and therefore a lot of the server's resources over a small amount of time?
pages that use only synchronous code, consuming fewer server resources but over a longer amount of time?
no difference?
The performance of my pages is important, but the stability of the server is more so!
Thanks in advance for your help.
You don't say whether the LINQ queries are CPU bound (e.g. computed in-memory) or IO bound (e.g. reading across the network or from disk).
If they are CPU bound, then using asynchronous code will improve fairness but reduce throughput - so everyone suffers a little. For example, say you could only process one request at a time, and each request takes 5 seconds. Two requests come in almost at the same time. With synchronous code, the first will complete in 5 seconds while the second is queued, and the second will complete after 10. With asynchronous code, both will start together and finish after slightly more than 10 seconds (due to the overhead of swapping between the two). This is hypothetical, of course, because in reality you have many threads to process requests concurrently.
In reality, you'll find asynchronous code will only help when you have lots of IO-bound operations that take long enough to cause request queuing. If the queue fills up, the server will start issuing 503 Service Unavailable errors. Check your performance counters - if you have few or no requests queued in ASP.NET under typical live loads, then don't bother with the additional complexity.
If the work is IO bound, then using asynchronous code will push the bottleneck towards the network/disk. This is a good thing, because you aren't wasting your web server's memory on idle, blocked request threads that are just waiting for responses - instead you make request throughput dependent on downstream performance and can focus on optimizing that. That is, you'll keep those request threads free to take more work from the queue.
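To tie this back to the question: for I/O-bound queries, awaiting them concurrently overlaps the waits without blocking extra pool threads. Here is a hedged sketch (using the later async/await syntax rather than raw C# 4 Tasks; the query methods are hypothetical):

```csharp
using System.Threading.Tasks;
using System.Web.Mvc;

public class DashboardController : Controller
{
    public async Task<ActionResult> Index()
    {
        // Start both queries, then await them together. No request
        // thread is blocked while the database works; contrast with
        // Parallel.Invoke, which would pin two pool threads for the
        // whole duration of the I/O.
        Task<int> ordersTask = CountOrdersAsync();
        Task<int> customersTask = CountCustomersAsync();

        await Task.WhenAll(ordersTask, customersTask);

        ViewBag.Orders = ordersTask.Result;
        ViewBag.Customers = customersTask.Result;
        return View();
    }

    // Hypothetical async data-access calls standing in for the
    // question's LINQ queries.
    private async Task<int> CountOrdersAsync() { await Task.Delay(50); return 42; }
    private async Task<int> CountCustomersAsync() { await Task.Delay(50); return 7; }
}
```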
EDIT - Nice article on this here: http://blog.stevensanderson.com/2010/01/25/measuring-the-performance-of-asynchronous-controllers/