I have a C# MVC application with a WCF service running on Azure. At first it was of course hosted on the Free tier, but once I had it running smoothly I wanted to see how it ran on either Basic or Standard, which as far as I know should be dedicated servers.
To my surprise, the code ran significantly slower once it was changed from Free to either Standard or Basic. I chose the smallest instance, but I still expected them to perform better than the Free option.
From my performance logging I can see that the code that runs especially slowly is started as async from Task.Run. Initially it was old-school Thread.Start(), but I wondered whether that might spawn it on some lower-priority thread, so I changed it to Task.Run - which changed nothing - so perhaps it has nothing to do with it. But it might, so now you know.
The code that runs really slowly basically works on an XML document, through XDocument, XElement etc. It loops through, uses some LINQ, and so on, but nothing too fancy. Still, it is 5-10 times slower on Basic and Standard than on the Free tier: for the exact same request, the Free tier takes around 1000 ms, whereas Basic and Standard take 8000-10000 ms.
In each test I tried 5-10 requests, without any decrease in response times. I wondered whether I need to wait some hours before a Basic/Standard instance is fully functional or something like that, but each time I switch back, the Free tier just outperforms it from the get-go.
Any suggestions? Is the Free version for some strange reason more powerful than Basic or Standard or do I need to configure something differently once I get up and running on Basic or Standard?
The notable difference between the Free and Basic/Standard tiers is that Free uses an undisclosed number of shared cores, whereas Basic/Standard has a defined number of CPU cores (1-4 based on how much you pay). Related to this is the fact that Free is a shared instance while Basic/Standard is a private instance.
My best guess, based on this, is that since the Free servers you would be on house multiple different users and applications, they probably have pretty beefy specs. Their CPUs are probably 8-core Xeons and there might even be multiple CPUs. Most likely, Azure isn't enforcing any caps but rather relying on quotas (60 CPU minutes / day for the Free tier) and overall demand on the server to restrict CPU use. In other words, if your site happened to be the only one doing anything at the moment (unlikely of course, but for the sake of example), you could potentially be utilizing all 8+ cores on the box, whereas when you move over to Basic/Standard you are hard-limited to 1-4. Processing XML is actually very CPU heavy, so this seems to line up with my assumptions.
More than likely, this is a fluke. Perhaps your residency is currently on a relatively newly provisioned server that hasn't been filled up with tenants yet. Maybe you just happen to be sharing with tenants that aren't doing much. Who knows? But if the server is ever actually under real load, I'd imagine you'd see a much worse response time on the Free tier than even Basic/Standard.
We are scraping a Web-based API using Microsoft Azure. The issue is that there is SO much data to retrieve (there are combinations/permutations involved).
If we use a standard Web Job approach, we calculated it would take about 200 years to process all the data we want to get - and we would like our data to be refreshed every week.
Each request/response from the API takes about 0.5-1.0 seconds to process. Request size is on average 20000 bytes and the average response is 35000 bytes. I believe the total number of requests is in the millions.
Another way to think about this question would be: how would you use Azure to Web scrape - and make sure you don't overload (in terms of memory + network) the VM it's running on? (I don't think you need too much CPU processing in this case).
What we have tried so far:
Used Service Bus Queues/Worker Roles scaled to 8 small VMs - but this caused a lot of network errors to occur (there must be some network limit to how much EACH worker role VM can handle).
Used Service Bus Queues/Continuous Web Job scaled to 8 small VMs - but this seems to work slower - and even scaled, it doesn't give us much control over what's happening behind the scenes. (We don't REALLY know how many VMs are up.)
It seems that these things are built for CPU calculation - not for Web/API scraping.
Just to clarify: I throw my requests into a queue - which then get picked up by my multiple VMs for processing to get the responses. That's how I was using the queues. Each VM was using the ServiceBusTrigger class as prescribed by Microsoft.
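For reference, each worker picks up messages with a function shaped roughly like this (a minimal sketch of the classic WebJobs SDK pattern; the queue name and message contents are illustrative):

using System.IO;
using System.Net;
using Microsoft.Azure.WebJobs;

public class Functions
{
    // Invoked once per message on the "scrape-requests" queue (name is illustrative).
    public static void ProcessQueueMessage(
        [ServiceBusTrigger("scrape-requests")] string apiUrl,
        TextWriter log)
    {
        // Each message carries one API request to execute (~20 KB out, ~35 KB back).
        var request = (HttpWebRequest)WebRequest.Create(apiUrl);
        using (var response = request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            string body = reader.ReadToEnd();
            log.WriteLine("Got {0} bytes from {1}", body.Length, apiUrl);
            // ... persist the response ...
        }
    }
}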
Is it better to have a lot of small VMs or a few massive VMs?
What C# classes should we be looking at?
What are the technical best practices when trying to do something like this on Azure?
Actually, a web scraper is something that I have had up and running in Azure for quite some time now :-)
AFAIK there is no 'magic bullet'. Scraping a lot of sources with deadlines is quite hard.
How it works (the most important things):
I use worker roles and C# code for the code itself.
For scheduling, I use queue storage. I put crawling tasks on the queue with a timeout (the 'when to crawl next' moment) and have the scraper pull them off (a sketch of this follows after the benefits list below). You can put triggers on the queue size to ensure you meet deadlines in terms of speed -- personally I don't need them.
SQL Azure is slow, so I don't use that. Instead, I only use table storage for storing the scraped items. Note that updating data might be quite complex.
Don't use too much threading; instead, use async IO for all network traffic.
Also you might have to consider that extra threads require extra memory (parse trees can become quite big) - so there's a trade-off there... I do recall using some threads, but it's really just a few.
Note that this probably requires you to re-design and re-implement your complete web scraper if you're currently using a threaded approach... then again, there are some benefits:
Table storage and queue storage are cheap.
I currently use a single Extra Small VM to scrape well over a thousand web sources.
Inbound network traffic is free.
As such, the result is quite cheap as well; I'm sure it's much less than the alternatives.
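To make the scheduling part concrete, here is a minimal sketch assuming the classic Microsoft.WindowsAzure.Storage SDK (the queue name and connection string are placeholders):

using System;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Queue;

class CrawlScheduler
{
    static void Main()
    {
        var account = CloudStorageAccount.Parse("<storage connection string>");
        var queue = account.CreateCloudQueueClient().GetQueueReference("crawl-tasks");
        queue.CreateIfNotExists();

        // Schedule a crawl: initialVisibilityDelay hides the message until
        // it is due, which is what implements "when to crawl next".
        queue.AddMessage(
            new CloudQueueMessage("http://example.com/feed"),
            timeToLive: null,
            initialVisibilityDelay: TimeSpan.FromHours(1));

        // Scraper side: pull a due task; it stays invisible to other workers
        // for the visibility timeout while we process it.
        CloudQueueMessage task = queue.GetMessage(TimeSpan.FromMinutes(5));
        if (task != null)
        {
            // ... crawl task.AsString ...
            queue.DeleteMessage(task);
        }
    }
}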
As for classes that I use... well, that's a bit of a long list. I'm using HttpWebRequest for the async HTTP requests and the Azure SDK -- but all the rest is hand crafted (and not open source).
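For the async IO point above, the HttpWebRequest side looks roughly like this (a sketch using the standard Task wrapper over the Begin/End pair; everything else is illustrative):

using System.IO;
using System.Net;
using System.Threading.Tasks;

static class AsyncFetch
{
    // Wraps HttpWebRequest's Begin/End pair in a Task so that many downloads
    // can be in flight without one blocked thread per request.
    public static async Task<string> DownloadAsync(string url)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        using (var response = await Task<WebResponse>.Factory.FromAsync(
                   request.BeginGetResponse, request.EndGetResponse, null))
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            return await reader.ReadToEndAsync();
        }
    }
}

Kick off a batch with Task.WhenAll and you get high concurrency from just a few threads.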
P.S.: This doesn't just hold for Azure; most of this also holds for on-premise scrapers.
I have some experience with scraping so I will share my thoughts.
It seems that these things are built for CPU calculation - not for Web/API scraping.
They are built for dynamic scaling, which, given your task, is not something you really need.
How to make sure you don't overload the VM?
Measure the response times and error rates and tune your code to lower them.
I don't think you need too much CPU processing in this case.
Depends on how much data is coming in each second and what you are doing with it. More complex parsing of quickly incoming data (if you decide to do it on the same machine) will eat up CPU pretty quickly.
8 small VMs caused a lot of network errors to occur (there must be some network limit)
The smaller the VMs, the fewer shared resources they get. There are throughput limits, and then there is the issue of neighbors sharing the actual hardware with you. Often, the smaller your instance size, the more trouble you run into.
Is it better to have a lot small VMs or few massive VMs?
In my experience, smaller VMs are too crippled. However, your mileage may vary and it all depends on the particular task and its solution implementation. Really, you have to measure yourself in your environment.
What C# classes should we be looking at?
What are the technical best practices when trying to do something like this on Azure?
With high throughput scraping you should be looking at infrastructure. You will have different latency in different Azure datacenters, and different experience with network latency/sustained throughput at different VM sizes, and depending on who in particular is sharing the hardware with you. The best practice is to try and find what works best for you - change datacenters, VM sizes and otherwise experiment.
Azure may not be the best solution to this problem (unless you are on a spending spree). 8 small VMs cost $450 a month. That is enough to pay for an unmanaged dedicated server with 256 GB of RAM, 40 hardware threads and 500 Mbps - 1 Gbps (or even up to several Gbps in bursts) of quality network bandwidth without latency issues.
For your budget, you will have a dedicated server that you cannot overload. You will have more than enough RAM to deal with async pinning (if you decide to go async), or enough hardware threads for multi-threaded synchronous IO, which gives the best throughput (if you choose to go synchronous with a fixed-size threadpool).
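One concrete knob worth checking whichever way you go: by default .NET caps the number of concurrent connections per host, which can silently throttle a scraper. These are real ServicePointManager settings; the values are just illustrative:

using System.Net;

static class ScraperTuning
{
    public static void Apply()
    {
        // The default of 2 connections per host makes concurrent
        // requests queue up behind each other.
        ServicePointManager.DefaultConnectionLimit = 100;

        // Skip the Expect: 100-continue handshake to save a round
        // trip on every POST.
        ServicePointManager.Expect100Continue = false;

        // Nagle's algorithm can add latency to small requests.
        ServicePointManager.UseNagleAlgorithm = false;
    }
}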
On a sidenote, depending on the API specifics, it might turn out that your main issue will be the API owner simply throttling you down to a crawl when you start to put too much pressure on the API endpoints.
With reference to this post about Booksleeve and to the fact that there is no official Windows Redis distribution, what is the best practice? Is it better to compile on Win32, or is the "unofficial" win32/64 distribution reliable and maintained?
Booksleeve is just any-other-redis-client, and is rather orthogonal to the redis-server version/platform that you choose to use. Personally, I would only currently use the win32 implementations of redis-server as a local developer convenience. Production machines should probably use the linux build (we use ubuntu server, if it matters). The reason for this comes down to the simple fact that redis-server is designed to make use of the cheap linux fork/copy-on-write functionality to perform background saving (and possibly other functionality). Windows does not have such a fork, and the "linux on windows" implementations typically do a memory copy (pretty expensive, and may significantly impact how certain operations perform).
Worse: at least one pure-Windows version of redis-server simply substitutes a SAVE for a BGSAVE request; on a busy server, this is death: SAVE is synchronous, and redis is single-threaded, usually simply exploiting the fact that individual operations are so ridiculously fast that you wouldn't normally notice. However, if you suddenly get a SAVE request that takes 20 seconds, then your redis server is doing nothing else for those 20 seconds. When you are relying on replies that typically take around 0.3 ms, this is a big problem.
Microsoft have been working on a port of redis-server, and it may well be that this is now production ready; however, all things considered, for now I would rather stick with the primary, well-tested implementation on a linux server.
But: for ad-hoc developer usage, any of the win32 builds should be fine.
All I know about performance testing is what its name suggests!
But I have some problems, especially with database querying techniques and how they will affect my application's performance under normal conditions and under stress.
So can performance tests calculate a certain page's performance for me?
Can I do that on the development machine (my own PC / localhost)?
Or do I have to test it on the hosting server? Do I have to own a server, or is shared hosting okay?
What are the available books/articles? And what are good free tools to use?
I know I asked a lot of questions, but they all add up to help anyone who has the same things spinning in their head when trying to decide which technique to use and can't get a definite opinion from the experienced ones!
Thanks in advance for your time and effort =)
First, if you know you have problems with your db architecture, then it sounds like you don't really need to do load testing at this time; you'd be better served figuring out what your db issues are.
As for the overall "how can I load test, and what are some good directions to go?": it depends on a couple of things. First, you could test in your dev environment, though unless it's the same setup as the production environment (server setup / CPU / memory / etc.), it is only going to be an estimate. In general I prefer to use a staging/test environment that mimics the production environment as closely as possible.
If you think you're going to have an application with high usage, you'll want to know what your performance is, period, whether on dedicated or shared hosting. I will say, however, that if you are expecting a high-traffic site / application, you'll probably have a number of reasons to have a dedicated hosting environment (or a cloud-based solution).
There are some decent free tools available; specifically there is http://jmeter.apache.org/, which can plug into a bunch of stuff. The catch is that, while the GUI is better than it was years ago, it's not as good as some of the commercial options available.
You'll ultimately run into an issue where you can only bang on something so much from a single client computer, even with one of these packages, and you'll need to start distributing that load. That is where the commercial packages start to really provide some good benefits.
For C# specifically, and .NET projects in general, Visual Studio (depending on your version) should have something like Test Projects, which you can read more about here: http://msdn.microsoft.com/en-us/library/ms182605(v=vs.80).aspx That may be closer, specifically, to what you were asking in the first place.
The most basic approach, without access to the server, is:

Console.WriteLine("Starting at " + DateTime.Now);
// code under test
Console.WriteLine("Ending at " + DateTime.Now);
Then you can measure which query takes more time.
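A slightly more precise variant of the same idea is System.Diagnostics.Stopwatch, which measures elapsed time directly instead of diffing DateTime values:

using System;
using System.Diagnostics;

class Timing
{
    static void Main()
    {
        var sw = Stopwatch.StartNew();
        // ... code under test, e.g. the database query ...
        sw.Stop();
        Console.WriteLine("Elapsed: {0} ms", sw.ElapsedMilliseconds);
    }
}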
But you need to test with more scenarios; one approach can be better than another in certain cases, and vice versa in others.
It's a tricky subject, and you will need more than just Stack Overflow to work through this - though I'm not aware of any books or web sites. This is just my experience talking...
In general, you want to know 2 things:
how many visitors can my site handle?
what do I need to do to increase that number?
You usually need to manage these concurrently.
My approach is to include performance testing into the development lifecycle, by creating a test environment (a dev machine is usually okay) on which I can control all the variables.
I use JMeter to run performance tests mimicking the common user journeys, and establish the number of users where the system starts to exceed maximum allowed response times (I typically use 1 second as the limit). Once I know where that point is, I will use analysis tools to understand what is causing the system to exceed its response time - is it the database? Should I introduce caching? Tools like PAL make this easy; at a more detailed level, you should use profilers (Redgate do a great one).
I run this process for an afternoon, once every two weeks, so there's no nasty surprise at the end of the project. By doing this, I have a high degree of confidence in my application's performance, and I know what to expect on "production" hardware.
On production, it's much harder to get access to the data that allows you to analyze a bottleneck - and once the site is live, it's usually harder to get permission to run performance tests which can bring the site down. On anything other than a start-up site, the infrastructure requirements mean it's usually too expensive to have a test environment that reflects live.
Therefore, I usually don't run a performance test on production which drives the app to the breaking point - but I do run "smoke tests", and collect log files which allow the PAL reports to be generated. The smoke test pushes the environment to a level which I expect to be around 50% of the breaking point - so if I think we've got a capacity of 100 concurrent users, the smoke test will go to 50 concurrent users.
Basically, I'm wondering if threading is useful or necessary, or more specifically, about the uses and situations in which you would use it. I don't know much about threading and have never used it (I primarily use C#), and I have wondered if there are any gains in performance or stability if you use it. If anyone would be so kind as to explain, I would be grateful.
In the world of desktop applications (my domain), threading is a vital construct in creating responsive user interfaces. Whenever a time- or computationally-intensive operation needs to run, it's almost essential to run that operation in a separate thread. Otherwise, the user interface locks up and, in some cases, Windows will decide that the whole application has become unresponsive.
Threading is also a vital tool in animation, audio and communications. Basically, any situation in which you find yourself needing to do several things at once lends itself to the use of threads.
There are definitely no gains in stability :). I would suggest you get a basic understanding of threading, but don't jump to use it in any real production application until you have a real need. You have C#, so I'm not sure if you are building websites or WinForms apps.
Usually the first threading use case for WinForms is when a user clicks a button and you want to run some expensive operation (a database or web service call), but you don't want the screen to freeze up.
A good way to deal with that situation is to look at the BackgroundWorker class in C#, as this will give you a first flavor of this space, and then you can go from there.
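A minimal sketch of that pattern, assuming a form with a button (goButton) and a label (resultLabel) wired up in the designer:

using System.ComponentModel;
using System.Windows.Forms;

public partial class MainForm : Form
{
    private readonly BackgroundWorker worker = new BackgroundWorker();

    public MainForm()
    {
        InitializeComponent();
        worker.DoWork += (s, e) =>
        {
            // Runs on a threadpool thread, so the UI stays responsive.
            e.Result = CallSlowWebService();
        };
        worker.RunWorkerCompleted += (s, e) =>
        {
            // Back on the UI thread; safe to touch controls here.
            resultLabel.Text = (string)e.Result;
        };
    }

    private void goButton_Click(object sender, System.EventArgs e)
    {
        if (!worker.IsBusy) worker.RunWorkerAsync();
    }

    private static string CallSlowWebService()
    {
        System.Threading.Thread.Sleep(3000); // stand-in for the real call
        return "done";
    }
}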
There was a time when our applications would speed up just by being deployed on a new CPU, and that speed-up was to a large extent because the CPU clock speed increased by large factors.
But several years ago, CPU manufacturers stopped increasing CPU clocks because of physical limits (e.g. heat dissipation), and instead started adding additional cores to CPUs.
Now, if your application runs on only one thread, it cannot take advantage of the complete CPU (e.g. of 4 cores it uses only 1).
So today, to fully utilize the CPU, we must make the effort to divide work across multiple threads.
For ASP.NET this is already done for us by the ASP.NET architecture and IIS.
Look here: The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software
Here is a simple example of how threading can improve performance. You have n numbers that all need to be added together. In a single-threaded application, it will take n time units to add all of the numbers together for the final sum. However, if you break your numbers into 2 groups, you can have the same operation running side by side, each with a group of n/2 numbers. Each takes n/2 time units to find its respective sum, plus an additional unit to find the full sum. By creating two threads, you have effectively cut the compute time roughly in half.
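A sketch of that split-in-two idea (in real code, Parallel.For or PLINQ would do the partitioning for you):

using System;
using System.Linq;
using System.Threading.Tasks;

class ParallelSum
{
    static void Main()
    {
        int[] numbers = Enumerable.Range(1, 1000000).ToArray();
        int half = numbers.Length / 2;

        // Each task sums one half of the array concurrently.
        Task<long> low = Task.Run(() => Sum(numbers, 0, half));
        Task<long> high = Task.Run(() => Sum(numbers, half, numbers.Length));

        long total = low.Result + high.Result; // the extra "combine" step
        Console.WriteLine(total);
    }

    static long Sum(int[] data, int from, int to)
    {
        long sum = 0;
        for (int i = from; i < to; i++) sum += data[i];
        return sum;
    }
}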
Technically, on a single-core processor there is no true parallelism, just the illusion that multiple tasks are happening at the same time, since each task gets a small slice of CPU time in turn.
However, that being said, threading is very useful if you have to do some work that takes a long time but you want your application to be responsive (i.e. be able to do other things) while you wait for that task to finish. A good example is GUI applications.
On multi-core / multi-processor systems, you can have one process doing many things at once so the performance gain there is obvious :)
Are there any tips, tricks and techniques to prevent or minimize slowdowns or temporary freeze of an app because of the .NET GC?
Maybe something along the lines of:
Try to use structs if you can, unless the data is too large or will be mostly used inside other classes, etc.
The description of your app does not fit the usual meaning of "realtime". Realtime is commonly used for software that has a maximum latency in milliseconds or less.
What you have is a requirement for responsiveness to the user, meaning you could probably tolerate an incidental delay of 500 ms or more; 100 ms won't be noticed.
Luckily for you, the GC won't cause delays that long. And if it did, you could use the server (background) version of the GC, but I know little about the details.
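For what it's worth, a couple of real switches can be inspected or set from code; a sketch (exact effects vary by runtime version):

using System;
using System.Runtime;

class GcInfo
{
    static void Main()
    {
        // Server GC is opted into via configuration
        // (<gcServer enabled="true"/> in app.config); from code we can only check.
        Console.WriteLine("Server GC: {0}", GCSettings.IsServerGC);

        // Ask the runtime to avoid blocking collections where it can,
        // around a latency-sensitive stretch of work.
        GCSettings.LatencyMode = GCLatencyMode.SustainedLowLatency;
    }
}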
But if your "user experience" does suffer, it probably won't be the GC.
IMHO, if the performance of your application is being affected noticeably by the GC, something is wrong. The GC is designed to work without intervention and without significantly affecting your application. In other words, you shouldn't have to code with the details of the GC in mind.
I would examine the structure of your application and see where the bottlenecks are, maybe using a profiler. Maybe there are places where you could reduce the number of objects that are being created and destroyed.
If parts of your application really need to be real-time, perhaps they should be written in another language that is designed for that sort of thing.
Another trick is to use GC.RegisterForFullGCNotification on the back end.
Let's say that you have a load-balancing server and N app servers. When the load balancer receives notice of a possible full GC on one of the servers, it forwards requests to the other servers for some time, so the SLA is not affected by the GC (which is especially useful on x64 boxes, where more than 4 GB can be addressed).
Updated
No, unfortunately I don't have code, but there is a very simple example on MSDN, with dummy methods like RedirectRequests and AcceptRequests, which can be found here: Garbage Collection Notifications
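For reference, the shape of that pattern looks roughly like this (the thresholds are illustrative, and the RedirectRequests/AcceptRequests bodies are whatever your load-balancer integration needs; note that full-GC notifications require concurrent GC to be disabled):

using System;
using System.Threading;

class GcNotificationLoop
{
    static void Main()
    {
        // Thresholds 1-99: how early the runtime should warn us.
        GC.RegisterForFullGCNotification(
            maxGenerationThreshold: 10,
            largeObjectHeapThreshold: 10);

        var monitor = new Thread(() =>
        {
            while (true)
            {
                // Blocks until a full GC is about to happen.
                if (GC.WaitForFullGCApproach() == GCNotificationStatus.Succeeded)
                {
                    // RedirectRequests(); // tell the load balancer to drain us
                }
                // Blocks until the full GC has finished.
                if (GC.WaitForFullGCComplete() == GCNotificationStatus.Succeeded)
                {
                    // AcceptRequests();   // take traffic again
                }
            }
        });
        monitor.IsBackground = true;
        monitor.Start();
    }
}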