C# 4 System.Threading.Tasks performance interrogations - c#

I'm currently working on a ASP.NET MVC application with some pages loading a lot of data (repartited in separate LINQ queries).
To increase performance of these pages, i envisage to use the C# 4 Tasks to allow to make queries simultaneouly and, gain execution time.
But I have one major question : from a server side, wich situation is the best :
pages who use Tasks and so, a lot of the server resources in a small amount of time ?
pages who use only synchronous code, less server resource but a bigger amount of time ?
no incidence ?
Performance of my pages is important, but stability of the server is more !
Thank's by advance for your help.

You don't say whether the LINQ queries are CPU bound (e.g. computed in-memory) or IO bound (e.g. reading across the network or from disk).
If they are CPU bound, then using asynchronous code will improve fairness, but reduce throughput - but only so everyone suffers. For example, say you could only process one request at a time, and each request takes 5 seconds. Two requests come in almost at the same time. With synchronous code, the first will complete in 5 seconds while the second is queued, and the second will complete after 10. With asychronous code, both will start together and finish after slightly more than 10 seconds (due to overhead of swapping between the two). This is hypothetical, because you have many threads to process requests concurrently.
In reality, you'll find asynchronous code will only help when you have lots of IO bound operations that take long enough to cause request queuing. If the queue fills up, the server will start issuing Server Unavailable 503 errors. Check your performance counters - if you have little or no requests queued in ASP.NET under typical live loads then don't bother with the additional complexity.
If the work is IO bound, then using asynchronous code will push the bottleneck towards the network/disk. This is a good thing, because you aren't wasting your web server's memory resource on idle blocked request threads just waiting for responses - instead you make request throughput dependent on downstream performance and can focus on optimizing that. i.e. You'll keep those request threads free to take more work from the queue.
EDIT - Nice article on this here: http://blog.stevensanderson.com/2010/01/25/measuring-the-performance-of-asynchronous-controllers/

Related

SSAS Processing via c# (Microsoft.AnalysisServices) Much Slower than SSMS

I am processing my SSAS Cube programmatically. I process the dimensions in parallel (I manage the parallel calls to .Process() myself) and once they're all finished, I process the measure group partitions in parallel (again managing the parallelism myself).
As far as I can see, this is a direct replication of what I would otherwise do in SSMS (same process types etc.) The only difference I can see is that I'm processing ALL of the dimensions in parallel and ALL of the measure group partitions in parallel thereafter. If you right click process several objects within SSMS, it appears to only process 2 in parallel at any one time (inferred from the text that indicates process has not started in all processing windows other than 2). But if anything, I would expect my code to be faster, not slower than SSMS.
I have wrapped the processing action with "starting" and "finishing" debug messages and everything is as expected. It is the work done by .Process() that seems to be much slower than SSMS.
On a Cube that normally takes just under 1 hour to process, it is taking 7.5 hours.
On a cube that normally takes just under 3 minutes to process, it is taking 6.5 minutes.
As far as I can tell, the processing of dimensions is about the same but the measure groups are significantly slower. However, the latter are much much larger of course so it might just be that the difference is not as obvious to me.
I'm at a loss for ideas and would appreciate any help! Am I missing a setting? Is managing the parallelism myself and processing multiple in parallel as opposed to 2 causing a problem?
If you can provide your code I'm happy to look but my guess is that you are calling dimension.Process() in parallel threads expecting it to process in parallel on the server. It won't. It will process in serial due to locking because you are executing separate processing batches and separate transactions.
Any reason not to process everything (rather than incrementally processing just recent partitions or something)? Let's start simple and see if this is all you need. Can you get the database object and just do a ProcessFull? That will properly process in parallel all dimensions and measure groups.
database.Process(ProcessType.ProcessFull)
If you do need incremental processing then review this link for using ExecuteCaptureLog(true,true) to run multiple ProcessUpdate commands in parallel and in a transaction:
https://jesseorosz.wordpress.com/2006/11/20/how-to-process-dimensions-in-parallel-using-amo/
I would recommend including the partitions you want to process in that transactional batch. It will know the right dependencies automatically. Also make sure to include a ProcessIndexes on the cube object in that batch so flexible aggs and indexes on old partitions get rebuilt after the dimension ProcessUpdate.

Best way to throttle an external application's CPU Usage

Ok - here is the scenario:
I host a server application on Amazon AWS hosted windows instances. (I do not have access to the source code - so I cannot resolve the issues from within the applications source code)
These specific instances are able to build up CPU credits during times of idle cpu (less than 10-20% usage) and then spend those CPU credits during times of increased compute requirement.
My server application however, typically runs at around 15-20% cpu usage when no clients are connected- this is time when I would rather lower the cpu usage to around 5% through throttling of the cpu - maintaining enough cpu throughput to accept a TCP Socket from incoming clients.
When a connected client is detected, I would like to remove the throttle and allow full access to the reserve of AWS CPU Credits.
I have got code in place that can Suspend and Resume processes via C# using Windows API calls.
I am however a bit fuzzy on how to accurately attain a target cpu usage for that process.
What I am doing so far, which is having moderate success:
Looping inside another application
check the cpu usage of the server application - using performance counters (dont like these - they require a 100-1000 ms wait in order to return a % value)
I determine if the current value is above or below the target value - if above, I increase an int value called 'sleep' by 10ms
If below - 'sleep' is decreased by 10ms.
Then the application will call
Process.Suspend();
Threads.sleep(sleep);
Process.Resume();
Like I said - this is having moderate success.
But there are several reasons I don't like it:
1. It requires a semi-rapid loop in an external application: This might end up just shifting cpu usage to that application.
2. Im sure there are better mathematical solutions to work out the ideal sleep time.
I came across this application : http://mion.faireal.net/BES/
It seems to do everything I want, except I need to be able to control it, and I am not a c++ developer.
It also seems to be able to achieve accurate cpu throttling without consuming large cpu utself.
Can someone suggest CPU throttle techniques.
Remember - I cannot modify the source code of the application being throttled - at most, I could inject code into it: but it occurs to me that if I inject suspend code into it, then the resume code could not fire etc.
An external agent program might be the best way to go.

Does multi-threading equal less CPU?

I have a small list of rather large files that I want to process, which got me thinking...
In C#, I was thinking of using Parallel.ForEach of TPL to take advantage of modern multi-core CPUs, but my question is more of a hypothetical character;
Does the use of multi-threading in practicality mean that it would take longer time to load the files in parallel (using as many CPU-cores as possible), as opposed to loading each file sequentially (but with probably less CPU-utilization)?
Or to put it in another way (:
What is the point of multi-threading? More tasks in parallel but at a slower rate, as opposed to focusing all computing resources on one task at a time?
In order to not increase latency, parallel computational programs typically only create one thread per core. Applications which aren't purely computational tend to add more threads so that the number of runnable threads is the number of cores (the others are in I/O wait, and not competing for CPU time).
Now, parallelism on disk-I/O bound programs may well cause performance to decrease, if the disk has a non-negligible seek time then much more time will be wasted performing seeks and less time actually reading. This is called "churning" or "thrashing". Elevator sorting helps somewhat, true random access (such as solid state memories) helps more.
Parallelism does almost always increase the total raw work done, but this is only important if battery life is of foremost importance (and by the time you account for power used by other components, such as the screen backlight, completing quicker is often still more efficient overall).
You asked multiple questions, so I've broken up my response into multiple answers:
Multithreading may have no effect on loading speed, depending on what your bottleneck during loading is. If you're loading a lot of data off disk or a database, I/O may be your limiting factor. On the other hand if 'loading' involves doing a lot of CPU work with some data, you may get a speed up from using multithreading.
Generally speaking you can't focus "all computing resources on one task." Some multicore processors have the ability to overclock a single core in exchange for disabling other cores, but this speed boost is not equal to the potential performance benefit you would get from fully utilizing all of the cores using multithreading/multiprocessing. In other words it's asymmetrical -- if you have a 4 core 1Ghz CPU, it won't be able to overclock a single core all the way to 4ghz in exchange for disabling the others. In fact, that's the reason the industry is going multicore in the first place -- at least for now we've hit limits on how fast we can make a single CPU run, so instead we've gone the route of adding more CPUs.
There are 2 reasons for multithreading. The first is that you want to tasks to run at the same time simply because it's desirable for both to be able to happen simultaneously -- e.g. you want your GUI to continue to respond to clicks or keyboard presses while it's doing other work (event loops are another way to accomplish this though). The second is to utilize multiple cores to get a performance boost.
For loading files from disk, this is likely to make things much slower. What happens is the operating system tries to lay out files on disk such that you should only need to do an expensive disk seek once for each file. If you have a lot of threads reading a lot of files, you're gonna have contention over which thread has access to the disk, and you'll have to seek back to the right place in the file every time the next thread gets a turn.
What you can do is use exactly two threads. Set one to load all of the files in the background, and let the other remain available for other tasks, like handling user input. In C# winforms, you can do this easily with a BackgroundWorker control.
Multi-threading is useful for highly parallelizable tasks. CPU intensive tasks are perfect. Your CPU has many cores, many threads can use many cores. They'll use more CPU time, but in the end they'll use less "user" time. If your app is I/O bounded, then multithreading isn't always the solution (but it COULD help)
It might be helpful to first understand the difference between Multithreading and Parallelism, as more often than not I see them being used rather interchangeably. Joseph Albahari has written a quite interesting guide about the subject: Threading in C# - Part 5 - Parallelism
As with all great programming endeavors, it depends. By and large, you'll be requesting files from one physical store, or one physical controller which will serialize the requests anyhow (or worse, cause a LOT of head back-and-forth on a classical hard drive) and slow down the already slow I/O.
OTOH, if the controllers and the medium are separate, multiple cores loading data from them should be improved over a sequential method.

How many threads to use?

I know there are some existing questions and they provide a very good general perspective on things. I'm hoping to get some details on the C#/VB.Net side for the actual implementation (not philosophy) of some of these perspectives.
My Particular Case
I have a WCF Service which, amongst other things, receives files. For most of the service's life this particular area is actually just sat doing nothing - when work does come it arrives in high bursts of greatly varying quantities.
For each file received (which at a max can be thousands per second) the service needs to work on the files for between 1-10 seconds (each) depending on a number of other services, local resources, and network IO wait times.
To aid the service with these burst workloads I implemented a Queue system. Those thousands of files recieved per second are placed onto the Queue. A controller calculates the number of threads to use based on the size of the queue, up until it reaches a "Peak Max Threads" setting which prevents it from creating additional threads. These threads are placed in a thread pool, and reused to cycle through the queue. The controller will; at intervals; recalculate the number of threads required. If the queue size reduces, a relevant number of threads are released.
The age old problem
How many threads should I peak at? Clearly, adding a new thread everytime a file was received would be silly for lack of a better word - the performance, at best, would deteriorate. Capping the threads when CPU utilization is only 10% across each core, also doesn't seem to be the best use of resources.
So, is there an appropriate way to determine how many threads to cap at? I would rather the service could determine this for itself by sampling available resources, but is there a performance hit from doing so? I know the common answer is to monitor workloads, adjust the counts through trial and error until I find a number I like, but due to the nature of this service (long periods of idle followed by high/burst workloads) it could take a long time to get that kind of information.
What then if we move the server's image to a different host which is faster/slower/different to the first? I have to re-sample the process all over again?
Ideally what I'm after, is for the co-ordinator to intelligently increase the size of the threadpool until CPU utilisation is at x% (would 80% be reasonable? 90%? 99%?). Clearly, I want to do this without adding more threads than is necessary to hit x% otherwise all I'll end up with is threads not just waiting on IO resources, but awaiting each other too.
Thanks in advance!
Related questions (if you want some generic ideas):
How many threads to create?
How many threads is too many?
How many threads to create and when?
A Complication for you
Where would be the fun if I didn't make the problem more difficult?
As it currently stands, the service does hit 100% cpu during these bursts, regularly. The issue is the CPU utilisation spikes. It goes from idle (0-10%) to 100%, and back down again. I'm not sure I can help that - ideally I wouldn't take it all the way to 100%. The problem exists because the files mentioned are in fact images, and part of the services' process is to pass the image through to the System.Windows.Media blackbox which does some complex image processing for me.
There are then lulls in between the spikes because of the IO waits and other processing that goes on. If the spikes hitting 100% can't be helped (and I'm all for knowing how to prevent that, or if I should) how should I aim for the CPU utilisation graph to look? Sat constantly at 100%? Bouncing between 50-100? If I do go through the effort of sampling to decide what does seem to work best, is it guaranteed that switching the virtual servers' host will also work best with the same graph?
This added complexity I won't take into consideration for those of you willing to answer. Feel free to ignore this section. However, any answer that also accounts for this complication, or even answers that just provide tips on how to handle it, I'll at the very least upvote!
Heck of a long question - sorry about that - and thanks for reading so much!!
PerformanceCounter allows you to query for processor usage.
However ,have you tried something the framework provides?
foreach (var file in files)
{
var workitem = file;
Task.Factory.StartNew(() =>
{
// do work on workitem
}, TaskCreationOptions.LongRunning | TaskCreationOptions.PreferFairness);
}
You can tune the concurrency level for Tasks in the Task.Factory.
The .NET 4 threadpool by default will schedule the number of threads it finds most performing on the hardware where it runs, but you can change how that works with the previous link.
Probably you need a custom solution but it would be ok to benchmark yours with the standard.
Edit: (comment note):
No links needed, I may have used an invented term since english is not my language. What I mean is: have a variable where you store the variance before the last check (prevDelta), and call it delta. add this to the varuiable avrageDelta and divide by 2, each time you 'check'. You will have the variable averageDelta that will mostly be low since you have no activity. Then have another set of delta variables, one you have already (delta - prevdelta), and store it in a delta variable that is not the average of all deltas but the average of deltas in a small timespan (you will have to come up with an algortihm to calculate accurately this temporal variance). Once done this you can compare the average delta and the 'temporal delta'. The average delta will be mostly low and will slowly go up whjen bursts come. In the same period the temporal delta will go up really fast. Then you have the situation when the burst stops, the average delta goes slowly down, and the 'temporal' goes really fast.
You could use I/O Completion Ports to asynchronously fetch your images without tying up any threads until it comes time to process what you have fetched.
You could then limit your thread pool based on the number of cores on your client PC, making sure to leave a core free for other processes to use.
What about a dynamic thread manager that monitors their overall performance and according to this spawns new threads or kills old ones? The main problem here is only how to define the performance measurement function. The rest can be done with a periodically scheduled job that increases or decreases the number of threads according to the previous number of threads and performance in that case or something like that. Maybe also in connection to resources utilization (CPU, disks, network...).

reducing loading time of 100 pages of google

for my project i need to access entire pages(100) of google at a time for a particular keyword.I used 'for' loop for accessing pages in url written in my c# code.But it is taking more time to access.Some times it showing HttpRequest error.Any way to increase the speed?
Query them in parallel. HTTP is asynchronous by nature, so should be your request code.
In your case, speed is limited by the time it takes to fulfill an I/O request. You can speed up the total task by accessing servers in parallel (i.e. using ThreadPool). A browser will generally use a couple (2-8) parallel I/O requests to a serer, so so could you (for instance usefull if you also need image files or css files referenced by the google result). Since you'll have up to 100 servers, you can do it massively parallel; again a task the Threadpool will help you with.

Categories

Resources