Silverlight/WCF application is suddenly transferring extremely slowly - c#

I have a Silverlight 4 application that works a lot with a WCF service. The application has normally ran fine, with fast response times for even some hefty queries. Recently however, it's gotten quite slow, and I'm having a hard time troubleshooting why.
My database is hosted on a remote server. The application is hosted on the same server. Here's what I've noted:
When I run the application locally, using the ASP.NET as my server instead of IIS, and I hit the website via localhost, which hits the remote database, speeds are fast.
When I run the application locally, but use the remote WCF service rather than the local service, things are slow.
When I run the application over the web, (i.e. the remote application which is, again, on the same server as the database, so they're local to one another) the application is slow. This is pretty much what the production environment is...
When I log on to the server, and hit the the website from within the server, things are fast.
The queries to the database are fast. Manually running the queries on the database themselves, yields the results in a split second.
Using the WCFTestClient and hitting the remote WCF service is also really fast, and has virtually immediate turn around.
Lastly, when I'm using the expected setup of my local machine hitting the website over the web, which hits the database, etc:
Not all queries react the same way. Some of the heavier queries which result in large data sets actually have a quick response time. Some of the light queries - straight SELECT statements with no JOINS, that generate only a kilobyte of data, takes a lot longer...about 30 seconds. There are a few queries that are sometimes fast, sometimes slow, but the ones that are always slow are the worst.
About the server:
The server is a dedicated server, I've monitored the CPU and it's not being taxed by anything. I'm hosting with IIS 7, on Win Server 2K8, and Sql Server 2K8. The only thing that's changed in the past few weeks have been some Windows updates, and I've been told by one person that they made some Firewall changes - that's my current theory on the cause, but I don't know what else to try at this point, or how to show that it is the firewall..
Any thoughts?

It's hard to find out the reasons according to what you described, I think you should start to profile your application by logging the database time, WCF request processing time etc.
Once you get the data, you can find the real reason. This is what we have been doing on our products.

If I had to guess, you're experiencing a combination of network latency and a less-than-optimal database design. Your description of "small" queries taking longer than queries yielding large result sets is a classic indicator that you need to evaluate your query plans, and ensure that they are using the right indexes (you are using indexes, right?).
I suspect that sorting out your database issues will solve a great deal of the slowness you're experiencing; caching query results in memcached or something like it will solve most of the rest.
Generally, WCF is the last place I look for performance problems - every time I've gone that direction in the past, the trouble ended up being our code; WCF performs admirable well for its size.
I'm sorry that I can't be more specific, but performance questions are quite application-specific and we don't have much information here to go on.

Fiddler. Fiddler was the answer (as it usually turns out to be.)
If you've experienced similar issues, hopefully what I've learned can be of help.
Here's what I saw:
First, when using both the Chrome/IE Profiler, it became clear that the Request itself was causing the lag, while the Response was quite quick.
This lead me down two paths of possibilities: Either the server was causing lag in the requests due to some specific configuration that I wouldn't see when running via localhost, or there was something wrong with the request itself.
After using Fiddler to get a full view of the request, it became apparent that it was the request I was sending. One of the objects I was passing as a parameter to my WCF service had a property that, when serialized, amounted to about 1 megabyte's worth of data - and that was with gzip enabled. Initially this object was a rather small object, but as the application grew, so did this particular object, resulting in the sudden slow down.
The reason why it happened for certain calls and not others was purely determined by whichever call had this object as a parameter.
The reason why it happens when going over the web, vs. going through the localhost is that over the web, you inevitably face your provider's Upload limit, as well as a number of hops until you hit your server, vs. the direct connection from your localhost to your database.
The lesson: Always transmit the least amount of information you can get away with.

Related

What is the best Method for monitoring a large number of clients reliably with good performance

This is more of a programming strategy and direction question, than the actual code itself.
I am programming in C-Sharp.
I have an application that remotely starts processes on many different clients on the network, could be up to 1000 clients in theory.
It then monitors the status of the remote processes by reading a log file on each client.
I currently do this by running one thread that loops through all of the clients in a list, and reading the log file. It works fine for 10 or 20 machines, but 1000 would probably be untenable.
There are several problems with this approach:
First, if the thread doesn’t finish reading all of the client statuses before it’s called again, the client statuses at the end of the list might not be read and updated.
Secondly, if any client in the list goes offline during this period, the updating hangs, until that client is back online again.
So I require a different approach, and have thought up a few possible ways to resolve this.
Spawn a separate thread for each client, to read their log file and update its progress.
a. However, I’m not sure if having 1000 threads running on my machine is something that would be acceptable.
Test the connect for each machine first, before trying to read the file, and if it cannot connect, then just ignore it for that iteration and move on to the next client in the list.
a. This still has the same problem of not getting through the list before the next call, and causes more delay and it tries to test the connection via a port first. With 1000 clients, this would be noticeable.
Have each client send the data to the machine running the application whenever there is an update.
a. This could create a lot of chatter with 1000 machines trying to send data repeatedly.
So I’m trying to figure if there is another more efficient and reliable method, that I haven’t considered, or which one of these would be the best.
Right now I’m leaning towards having the clients send updates to the application, instead of having the application pulling the data.
Looking for thoughts, concerns, ideas and recommendations.
In my opinion, you are doing this (Monitoring) the wrong way. Instead of keeping all logs in a text file, you'd better preserve them in a central data repository that can be of any kind. With respect to the fact that you are monitoring the performance of those system, your design and the mechanism behind it must not impact the performance of the target systems negatively, and with this design the disk and CPU would be involved so much in certain cases that can result in a performance issue itself.
I recommend you to create a log repository server using a fast in-memory database like Redis, and send logged data directly to that server. Keep in mind that this database must be running on a different virtual machine. You can then tune Redis to store received data on physical Disk once a particular number of indexes are reached or a particular interval elapses. The in-memory feature here is advantageous as you may need to query information a lot in a monitoring application like this. On the other hand, the performance of Redis is so high that it efficiently passes processing millions of indexes.
The blueprint for you is that:
1- Centralize all log data in a single repository.
2- Configure clients to send monitored information to the centralized repository.
3- Read the data from the centralized repository by the main server (monitoring system) when required.
I'm not trying to advertise for a particular tool here as I'm only sharing my own experience. There's many more tools that you can use for this purpose such as ElasticSearch.

Slow initial connection to API

I've got an API written in C# (webforms) and an SQL Server 2008 database that accepts JSON POST data on an AWS EC2 VM. My problem is that the "first" use of this API is rather slow to respond.
What I mean by "first" is that if I were to wait for an hour or so, then post some data, that would be the first. Subsequent posts would process rather quickly in comparison, and I would need to wait another hour or so before experiencing the slow "first" transaction again.
Since only the initial post is slow, it makes me wonder if something is "spinning down" after being idle for some time, and then spinning up again upon first use, adding the extra time.
Things I have tried -
Run program through a performance profiler - This didn't really help. As far as I can see, the program itself doesn't have any obvious parts that run very slowly or inefficiently.
Change configuration to persist at least 1 connection to the database at all times. Again, no real change. I did this by adding "Min Pool Size=1;Max Pool Size=100" to my connection string.
Change configuration to use named pipes instead of TCP. Once again, no real change. I did this by adding "np:" before the server specified in my connection string, eg. server=np:MyServer;database=MyDatabase;
Is there anything else I can do to diagnose the problem? What else should I be looking for in this scenario?
Chances are your app pool is shutting down after a designated period of non-use. The first call after the shutdown forces everything to get loaded back into memory which explains the lag.
You could play with these settings: http://technet.microsoft.com/en-us/library/cc771956%28v=ws.10%29.aspx to see if you get the desired effect, or setup a task scheduler job that makes at least one call every 10+/- minutes of so by doing a simulated post - a simple powershell script could handle that for you and will keep everything 'primed' for the next use.

Multi-server n-tier synchronized timing and performance metrics?

[I'm not sure whether to post this in stackoverflow or serverfault, but since this is a C# development project, I'll stick with stackoverflow...]
We've got a multi-tiered application that is exhibiting poor performance at unpredictable times of the day, and we're trying to track down the cause(s). It's particularly difficult to fix because we can't reproduce it on our development environment - it's a sporadic problem on our production servers only.
The architecture is as follows: Load balanced front end web servers (IIS) running an MVC application (C#). A home-grown service bus, implemented with MSMQ running in domain-integration mode. Five 'worker pool' servers, running our Windows Service, which responds to requests placed on the bus. Back end SQL Server 2012 database, mirrored and replicated.
All servers have high spec hardware, running Windows Server 2012, latest releases, latest windows update. Everything bang up to date.
When a user hits an action in the MVC app, the controller itself is very thin. Pretty much all it does is put a request message on the bus (sends an MSMQ message) and awaits the reply.
One of the servers in the worker pool picks up the message, works out what to do and then performs queries on the SQL Server back end and does other grunt work. The result is then placed back on the bus for the MVC app to pick back up using the Correlation ID.
It's a nice architecture to work with in respect to the simplicity of each individual component. As demand increases, we can simply add more servers to the worker pool and all is normally well. It also allows us to hot-swap code in the middle tier. Most of the time, the solution performs extremely well.
However, as stated we do have these moments where performance is a problem. It's proving difficult to track down at which point(s) in the architecture the bottleneck is.
What we have attempted to do is send a request down the bus and roundtrip it back to the MVC app with a whole suite of timings and metrics embedded in the message. At each stop on the route, a timestamp and other metrics are added to the message. Then when the MVC app receives the reply, we can screen dump the timestamps and metrics and try to determine which part of the process is causing the issue.
However, we soon realised that we cannot rely on the Windows time as an accurate measure, due to the fact that many of our processes are down to the 5-100ms level and a message can go through 5 servers (and back again). We cannot synchronize the time across the servers to that resolution. MS article: http://support.microsoft.com/kb/939322/en-us
To compound the problem, each time we send a request, we can't predict which particular worker pool server will handle the message.
What is the best way to get an accurate, coordinated and synchronized time that is accurate to the 5ms level? If we have to call out to an external (web)service at each step, this would add extra time to the process, and how can we guarantee that each call takes the same amount of time on each server? Even a small amount of latency in an external call on one server would skew the results and give us a false positive.
Hope I have explained our predicament and look forward to your help.
Update
I've just found this: http://www.pool.ntp.org/en/use.html, which might be promising. Perhaps a scheduled job every x hours to keep the time synchronised could get me to the sub 5 ms resolution I need. Comments or experience?
Update 2
FWIW, We've found the cause of the performance issue. It occurs when the software tests if a queue has been created before it opens it. So it was essentially looking up the queue twice, which is fairly expensive. So the issue has gone away.
What you should try is using the Performance Monitor that's part of Windows itself. What you can do is create a Data Collector Set on each of the servers and select the metrics you want to monitor. Something like Request Execution Time would be a good one to monitor for.
Here's a tutorial for Data Collector Sets: https://www.youtube.com/watch?v=591kfPROYbs
Hopefully this will give you a start on troubleshooting the problem.

Caching architecture advise for a specific scenario

SETUP:
We have a .Net application that is distributed over 6 local servers each with a local database(ORACLE), 1 main server and 1 load balance machine. Requests come to the load balancer which redirects the incoming requests to one of the 6 local servers. In certain time intervals data is gathered in the main server and redistributed to the 6 local servers to be able to make decisions with the complete data.
Each local server has a cache component that caches the incoming requests based on different parameters (Location, incoming parameters, etc). With each request a local server decides whether to go to the database (ORACLE) or get the response from the cache. However in both cases the local server has to goto the database to do 1 insert and 1 update per request.
PROBLEM:
On a peak day each local server receives 2000 requests per second and system starts slowing down (CPU: 90% ). I am trying to increase the capacity before adding another local server to the mix. After running some benchmarks the bottleneck as it always is, seems to be the inevitable 1 insert and 1 update per request to database.
TRIED METHODS
To be able decrease the frequency I have created a Windows service that sits between the DB and .NET application. It contains a pipe server and receives each insert and update from the main .NET application and saves them in a Hashtable. The new service then at certain time intervals goes to the database once to do batch inserts and updates. The point was to go to the database less frequently. Although this had a positive effect it didn't benefit to the system load as much as I expected. The most of the cpu load comes from oracle.exe as requests per second increase.
I am trying to avoid going to the database as much as I can and the only way to avoid DB seems to be increasing the cache hit ratio other than the above mentioned solution I tried. My cache hit ratio is around 81 % percent currently. Because each local machine has its own cache I am actually missing lots of cacheable requests. When two similar requests redirects to different servers the second request cannot benefit from the cached result of the first one.
I don't have a lot of experience in system architecture so I would appreciate any help to this problem. Any suggestions on different caching architectures or setup, or any tools are welcome.
Thank you in advance, hopefully I made my question clear.
For me this looks like a application for a timesten solution. In that case you can eliminate the local databases and return to just one. Where you now have the local oracle databases, you can implement a cache grid. Most likely this is going to be a AWT (Async, Write Through) cache. See Oracle In-Memory Database Cache Concepts
It's not a cheap option but if could be worth investigating.
You can keep concentrating on the business logic and have no worries about speed. This of course only works good, if the aplication code is already tuned and the sql is performant and scalable. The SQL has to be prepared (using bind variables) to have the best performance.
Your application connects to the cache and no longer to the database. You create the cache tables in the cache group for which you want to have caching. All tables in a SQL should be cached, otherwise, the complete SQL is passed through to the Oracle database. In the grid a cache fusion mechanism is in place so you have no worries about where the data in your grid is located.
In the current release support for .net is included.
The data is consistent and asynchronously updated to the Oracle database. If the data that is needed is in the cache and you take the Oracle database down, the app can keep running. As soon as the database is back again, the synchronization pick up again. Very powerful.
2000 requests per second per server, about 24000 rps to database. It's a HUGE load for DB.
Try to optimize, scaleup or clusterize database.
May be NoSQL DB (Redis\Raven\Mongo) as middleware will be suitable for you. Local server will read\write sharded NoSQL DB, aggregated data will by synchronized with Oracle off-peak times.
I know the question is old now, but I wanted let everyone know how we solved our issue.
After trying many optimizations it turned out that all we needed was Solid State Drives for the 6 local machines. The CPU dropped down to 30% percent immediately after we installed them. This is the first time that I've seen any kind of hardware update contributes this much to performance.
If you have high load setup, before making any software or architecture changes try upgrading to a SSD.
Thanks everyone for your answers.

What is the most cost-effective way to break up a centralised database?

Following on from this question...
What to do when you’ve really screwed up the design of a distributed system?
... the client has reluctantly asked me to quote for option 3 (the expensive one), so they can compare prices to a company in India.
So, they want me to quote (hmm). In order for me to get this as accurate as possible, I will need to decide how I'm actually going to do it. Here's 3 scenarios...
Scenarios
Split the database
My original idea (perhaps the most tricky) will yield the best speed on both the website and the desktop application. However, it may require some synchronising between the two databases as the two "systems" so heavily connected. If not done properly and not tested thouroughly, I've learnt that synchronisation can be hell on earth.
Implement caching on the smallest system
To side-step the sync option (which I'm not fond of), I figured it may be more productive (and cheaper) to move the entire central database and web service to their office (i.e. in-house), and have the website (still on the hosted server) download data from the central office and store it in a small database (acting as a cache)...
Set up a new server in the customer's office (in-house).
Move the central database and web service to the new in-house server.
Keep the web site on the hosted server, but alter the web service URL so that it points to the office server.
Implement a simple cache system for images and most frequently accessed data (such as product information).
... the down-side is that when the end-user in the office updates something, their customers will effectively be downloading the data from a 60KB/s upload connection (albeit once, as it will be cached).
Also, not all data can be cached, for example when a customer updates their order. Also, connection redundancy becomes a huge factor here; what if the office connection is offline? Nothing to do but show an error message to the customers, which is nasty, but a necessary evil.
Mystery option number 3
Suggestions welcome!
SQL replication
I had considered MSSQL replication. But I have no experience with it, so I'm worried about how conflicts are handled, etc. Is this an option? Considering there are physical files involved, and so on. Also, I believe we'd need to upgrade from SQL express to SQL non-free, and buy two licenses.
Technical
Components
ASP.Net website
ASP.net web service
.Net desktop application
MSSQL 2008 express database
Connections
Office connection: 8 mbit down and 1 mbit up contended line (50:1)
Hosted virtual server: Windows 2008 with 10 megabit line
Having just read for the first time your original question related to this I'd say that you may have laid the foundation for resolving the problem simply because you are communicating with the database by a web service.
This web service may well be the saving grace as it allows you to split the communications without affecting the client.
A good while back I was involved in designing just such a system.
The first thing that we identified was that data which rarely changes - and immediately locked all of this out of consideration for distribution. A manual process for administering using the web server was the only way to change this data.
The second thing we identified was that data that should be owned locally. By this I mean data that only one person or location at a time would need to update; but that may need to be viewed at other locations. We fixed all of the keys on the related tables to ensure that duplication could never occur and that no auto-incrementing fields were used.
The third item was the tables that were truly shared - and although we worried a lot about these during stages 1 & 2 - in our case this part was straight-forwards.
When I'm talking about a server here I mean a DB Server with a set of web services that communicate between themselves.
As designed our architecture had 1 designated 'master' server. This was the definitive for resolving conflicts.
The rest of the servers were in the first instance a large cache of anything covered by item1. In fact it wasn't a large cache but a database duplication but you get the idea.
The second function of the each non-master server was to coordinate changes with the master. This involved a very simplistic process of actually passing through most of the work transparently to the master server.
We spent a lot of time designing and optimising all of the above - to finally discover that the single best performance improvement came from simply compressing the web service requests to reduce bandwidth (but it was over a single channel ISDN, which probably made the most difference).
The fact is that if you do have a web service then this will give you greater flexibility about how you implement this.
I'd probably start by investigating the feasability of implementing one of the SQL server replication methods
Usual disclaimers apply:
Splitting the database will not help a lot but it'll add a lot of nightmare. IMO, you should first try to optimize the database, update some indexes or may be add several more, optimize some queries and so on. For database performance tuning I recommend to read some articles from simple-talk.com.
Also in order to save bandwidth you can add bulk processing to your windows client and also add zipping (archiving) to your web service.
And probably you should upgrade to MS SQL 2008 Express, it's also free.
It's hard to recommend a good solution for your problem using the information I have. It's not clear where is the bottleneck. I strongly recommend you to profile your application to find exact place of the bottleneck (e.g. is it in the database or in fully used up channel and so on) and add a description of it to the question.
EDIT 01/03:
When the bottleneck is an up connection then you can do only the following:
1. Add archiving of messages to service and client
2. Implement bulk operations and use them
3. Try to reduce operations count per user case for the most frequent cases
4. Add a local database for windows clients and perform all operations using it and synchronize the local db and the main one on some timer.
And sql replication will not help you a lot in this case. The most fastest and cheapest solution is to increase up connection because all other ways (except the first one) will take a lot of time.
If you choose to rewrite the service to support bulking I recommend you to have a look at Agatha Project
Actually hearing how many they have on that one connection it may be time to up the bandwidth at the office (not at all my normal response) If you factor out the CRM system what else is a top user of the bandwidth? It maybe the they have reached the point of needing more bandwidth period.
But I am still curious to see how much information you are passing that is getting used. Make sure you are transferring efferently any chance you could add some easy quick measures to see how much people are actually consuming when looking at the data.

Categories

Resources