This is a very broad question, but hopefully I can get useful tips. Currently I have an ASP.NET application that runs on a single server. I now need to scale out to accommodate increasing customer loads. So my plan is to:
1) Scale out the ASP.NET and web component onto five servers.
2) Move the database onto a farm.
I don't believe I will have an issue with the database, as it's just a single IP address as far as the application is concerned. However, I am now concerned about the ASP.NET and web tier. Some issues I am already worried about:
Is the easiest model to implement just a load balancer that will farm out requests to each of the five servers in a round-robin fashion?
Is there any problem with HTTPS and SSL connections, now that they can terminate on different physical servers each time a request is made? (for example, performance?)
Is there any concern with regards to session maintenance (logon) via cookies? My guess is no, but I can't quite explain why... ;-)
Is there any concern with session data itself (stored server side)? Obviously I will need to replicate session state between servers, or somehow force a request to only go to a single server. Either way, I see a problem here...
As David notes, much of this question is really more of an Administrative thing, and may be useful on ServerFault. The link he posts has good info to pore over.
For your Session questions: You will want to look at either the ASP.NET Session State Service (a separate Windows service that maintains state shared between multiple servers) or storing ASP.NET session state in a SQL Server database. Both are options you can find at David Stratton's link, I'm sure.
Largely speaking, once you set up your out-of-process session state, it is otherwise transparent. It does require that you store Serializable objects in Session, though.
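As a minimal sketch (the class name and the web.config values are made up), anything placed in Session has to be marked serializable once state moves out of process, and the sessionState element is where you pick the StateServer or SQLServer mode:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical cart type: once session state is out of process, everything put
// into Session is serialized on the way out and deserialized on the way back in,
// so the type must be marked [Serializable].
[Serializable]
public class ShoppingCart
{
    public ShoppingCart()
    {
        ProductIds = new List<int>();
    }

    public List<int> ProductIds { get; set; }
    public decimal Total { get; set; }
}

// Usage in a page or controller:
//   Session["Cart"] = new ShoppingCart();        // serialized to the state store
//   var cart = (ShoppingCart)Session["Cart"];    // fetched and deserialized
//
// The mode itself is configuration, e.g. in web.config:
//   <sessionState mode="StateServer"
//                 stateConnectionString="tcpip=state-server-host:42424" />
// or
//   <sessionState mode="SQLServer"
//                 sqlConnectionString="Data Source=dbserver;Integrated Security=SSPI" />
```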
Round-Robin DNS is the simplest way to load-balance in this situation, yes. It does not take into account the actual load on each server, and also does not have any provision for when one server may be down for maintenance; anyone who got that particular IP would see the site as being 'down', even though four other servers may be running.
Load balancing and handling SSL connections might both benefit from a reverse proxy setup, where the proxy handles all incoming connections but only does the encryption work and balances the actual request load across the web servers. (These issues are more on the administration end, of course, but...)
Cookies will not be a problem provided all the web servers are advertising themselves as the same web site (via the host headers, etc.). Each server will gladly accept cookies set by any other server using the same domain name, without knowing or caring which server set them; cookies are tied to the host name the web browser was connected to when it received the cookie value.
That's a pretty broad question and hard to answer fully in a forum such as this. I'm not even sure if the question belongs here, or if it should be at serverfault.com. However....
Microsoft offers plenty of guidance on the subject. The first Bing result for "scaling asp.net applications" points to this:
http://msdn.microsoft.com/en-us/magazine/cc500561.aspx
I just want to bring up areas you should be concerned about with the database.
First off, most data models built with only a single database server in mind require massive changes in order to support a database farm in multi-master mode.
If you used auto-incrementing integers for your primary keys (which most people do), then you're basically screwed out of the gate. There are a couple of ways to temporarily mitigate this, but even those require a lot of guesswork and have a high potential for collisions. One mitigation involves setting the identity seed value on each server to a sufficiently high number to reduce the likelihood of a collision... This will usually work, for a while.
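A rough sketch of that seed mitigation, assuming SQL Server identity columns (the table name, connection string, and seed values are invented): push each server's identity seed into its own distant range with DBCC CHECKIDENT so locally generated keys are unlikely to collide for a while.

```csharp
using System.Data.SqlClient;

// Hypothetical one-off mitigation: give each server's identity column its own
// distant range so locally generated keys are unlikely to collide for a while.
// This only postpones the problem; it does not make the schema multi-master safe.
public static class IdentitySeeder
{
    public static void Reseed(string connectionString, long seed)
    {
        // The seed comes from deployment configuration, never from user input,
        // so composing the DBCC statement as a string is acceptable here.
        string sql = string.Format(
            "DBCC CHECKIDENT ('dbo.Customers', RESEED, {0})", seed);

        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand(sql, connection))
        {
            connection.Open();
            command.ExecuteNonQuery();
        }
    }
}

// e.g. run once per database server:
//   server 2: IdentitySeeder.Reseed(connStr, 1000000000);
//   server 3: IdentitySeeder.Reseed(connStr, 2000000000);
```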
Of course you have to figure out how to partition users across servers...
My point is that this area shouldn't be brushed off lightly and is almost always more difficult to accomplish than simply scaling "up" the database server by putting it on bigger hardware.
If you purposely built the data model with a multi-master role in mind then kindly ignore. ;)
Regarding sessions: Don't trust "sticky" sessions; stickiness is not a guarantee. Quite frankly, our stuff is usually deployed to server farms, so we completely disable session state from the get-go. Once you move to a farm there is almost no reason to use session state, as the data has to be retrieved from the state server, deserialized, then serialized and stored back to the state server on every single page load.
Considering the DB and network traffic that session state itself generates in a farm, and that its original purpose was to reduce DB and network traffic, you'll understand how it doesn't buy you anything anymore.
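If you do turn session state off, a minimal sketch in ASP.NET MVC (the controller name is invented) is to opt out per controller with SessionStateAttribute, or switch it off for the whole application in web.config:

```csharp
using System.Web.Mvc;
using System.Web.SessionState;

// Hypothetical controller: opting out of session state per controller avoids the
// serialize/deserialize round-trip to the state store on every request.
[SessionState(SessionStateBehavior.Disabled)]
public class CatalogController : Controller
{
    public ActionResult Index()
    {
        // No Session access is possible here; state, if any, must come from
        // cookies, the query string, the cache, or the database.
        return View();
    }
}

// Or disable it for the whole application in web.config:
//   <sessionState mode="Off" />
```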
I have seen some issues related to round-robin HTTP/HTTPS sessions. We used to use in-process sessions and told the load balancers to make the sessions sticky. (I think they use a cookie for this.)
This let us avoid SQL sessions, but it meant that when we switched from HTTP to HTTPS our F5 boxes couldn't keep the stickiness. We ended up changing to SQL sessions.
You could investigate pushing the encryption up to the load balancer. I remember that was a possible solution for our problem, but alas, not one we investigated.
The session database on a SQL server can be scaled out easily with few code and configuration changes. You can point ASP.NET sessions at a session database, and irrespective of which web server in your farm serves the request, your session-ID-based SQL state server mapping works flawlessly. This is probably one of the best ways to scale out ASP.NET session state using SQL Server. For more information, read the link True Scaleout model for session state.
Related
Let's say I am using multiple connections to download one file from an nginx server. Most of the time it's limited to something like 2 connections. Is there a way to ask the server how many it is allowing, or do I have to find it by trial and error myself?
I think you're looking for robots.txt. Perhaps the "Crawl-delay" directive may help you, if the site specifies it.
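If the site publishes one, the directive looks something like this (the value is only an example, and crawler support for Crawl-delay varies):

```
User-agent: *
Crawl-delay: 10
```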
Typically you don't want to expose too much information about the technical setup and configuration of your server to the public, for security reasons. Of course, hiding such information is not going to prevent persistent adversaries from attacking your server, but at least it will make things harder for the script kiddies. For this reason, I doubt that NGINX exposes a public API with such information.
Could you explain the difference in performance between storing session data remotely in a dedicated cache/session server such as Couchbase (which is a different machine) and storing the session in the database (which is also a remote server)?
I read somewhere that the REST architecture was introduced to fill the gap while the session is stored on the server, so REST would not be required if we store the session in the database and the database is running on a remote server.
In both cases you have an external datastore where you hold session data. Whether this is a relational database or some dedicated cache server doesn't make a difference conceptually (though one approach may be more suitable than the other depending on your requirements and deployment environment).
I'm not 100% sure what you mean by the REST architecture having filled a gap (which gap?), but REST is primarily a stateless way of communicating between machines.
If you have state (e.g. a session) you need to store it somewhere. Where that is, again depends on your requirements. If you have a single page webapp for example you could hold session data on the client and you wouldn't need another datastore for it. In general if you can make it work without a session you should strive to do so as session handling complicates things quite a lot when you think about things like failover behaviour and scalability.
Stateless communication makes things a lot easier - this is probably one of the reasons why the REST way of things got so popular.
I've inherited a relatively low traffic web application in which the main website is accessible on intranet, however it gets its data from a wcf service that runs on the same server which is only accessible via localhost. It was explained to me that this design was implemented as a security measure - essentially to ensure that no entity external to the server could potentially have access to our service and hence our data. The database however, is usually located on a different server.
It's been ok for a while but I am looking at ways to improve performance and it seems that running queries against the wcf service and having to serialize the response for transmission etc. is a waste of time - I'd like to just access the db directly from my web app.
Is this current design logical? Wouldn't it be better overall (for security, and performance) to have my site access db directly, and beef up the security between the app and the db?
Thanks in advance.
Rusty
I'm not sure if I have the big picture from your description. However, it is a common practice to have web applications consume content from more than one service to form a graphical interface for the end user. With the growth of SOA, this allows for fast integration, combining sources (from services) and producing rich output. This is called a mashup.
This also, amongst other things, enhances the security of the backend services, as they are only accessed by the backend application server and not directly from the front-end client.
So, from an architectural perspective, it seems your application is trying to do this.
Having the services on the same server, consumed only by the local server (localhost), is a choice taken to prevent anything but the application server running on the same machine from having access. This could be due to a lack of finer control over network access and network zoning.
Your database lives on a different server, where other measures could be implemented to secure the traffic and access.
In general, implementing security requires a budget, as does any other functional or non-functional requirement. This is usually associated with the risk you're exposed to and the sensitivity of the information. The earlier security is built into the overall architecture, the better.
Accessing the database from your web application requires best practices to protect against the many risks of intrusion and vulnerabilities. In general, your web client should NOT access the DB directly; it should always go through server-side services and validation, be it through WCF or other means.
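As a minimal, hypothetical sketch of that server-side access (the table, column, and connection string names are made up), the web app keeps the connection string on the server and always parameterizes its queries:

```csharp
using System.Configuration;
using System.Data.SqlClient;

// Hypothetical server-side data access: the browser never sees a connection
// string; the web app validates input and uses parameters, never string
// concatenation, when it queries the database directly.
public static class CustomerRepository
{
    public static string GetCustomerName(int customerId)
    {
        string connectionString =
            ConfigurationManager.ConnectionStrings["MainDb"].ConnectionString;

        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand(
            "SELECT Name FROM dbo.Customers WHERE Id = @id", connection))
        {
            command.Parameters.AddWithValue("@id", customerId);
            connection.Open();
            // ExecuteScalar returns null if no row matches; fine for a sketch.
            return command.ExecuteScalar() as string;
        }
    }
}
```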
To prevent DoS attacks in my ASP.NET C# application, I have implemented throttling with the help of Jarrod's answer in this post:
Best way to implement request throttling in ASP.NET MVC?
But this uses the IP address, which makes it vulnerable to advanced attackers who can change it easily. Another option for identifying anonymous users is to use their session ID. I think it can't be changed until the user restarts the browser, so it could be a good alternative, but I am not sure about it from a security point of view. Is it safe to use? If not, is there another method to achieve this purpose? Thanks.
Edit:
There are some methods that need a longer throttle. That's why I need a programmatic throttle of about 5 seconds to 2 minutes. I have configured Dynamic IP Restrictions for IIS, but I can't specify such a long interval there.
I think your terminology might be mixed up. DoS is Denial of Service. Someone changing multiple records or requesting functionality is not a DoS attack, and normally most DoS attacks are distributed, hence DDoS.
What you are requesting, based on the link you provided, is called throttling... but as others have suggested, the session ID is simply a value passed up in a cookie and can be easily modified to bypass a check, just as you can put a proxy in front of the request to mask the source IP between requests.
Therefore, if you only wish to throttle then you need to implement authentication in front of the functionality you want to protect, use the throttling code you posted and maybe throw a CSRF token in as well for good measure.
BUT... if you want to stop DDoS, it ain't going to happen at Layer 7 since the data is already at the server.
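A hedged sketch of the combination suggested above (the attribute, cache keys, and durations are invented, and this is not the filter linked in the question): require authentication and throttle per user name instead of per IP or session ID.

```csharp
using System;
using System.Runtime.Caching;
using System.Web.Mvc;

// Hypothetical MVC action filter: requires an authenticated caller and throttles
// per user name rather than per IP address or session ID, both of which the
// client controls. Note that MemoryCache is per server, so on a farm each node
// enforces the limit independently.
[AttributeUsage(AttributeTargets.Method)]
public class UserThrottleAttribute : ActionFilterAttribute
{
    public int Seconds { get; set; }

    public override void OnActionExecuting(ActionExecutingContext filterContext)
    {
        var user = filterContext.HttpContext.User;
        if (user == null || !user.Identity.IsAuthenticated)
        {
            filterContext.Result = new HttpUnauthorizedResult();
            return;
        }

        string key = "throttle:" + user.Identity.Name + ":" +
                     filterContext.ActionDescriptor.ActionName;

        // Add returns false if the key is already cached, i.e. the user has
        // called this action within the last `Seconds` seconds.
        bool allowed = MemoryCache.Default.Add(
            key, true, DateTimeOffset.UtcNow.AddSeconds(Seconds));

        if (!allowed)
        {
            filterContext.Result = new HttpStatusCodeResult(429, "Too many requests");
        }
    }
}

// Usage on an expensive action:  [Authorize, UserThrottle(Seconds = 30)]
```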
I have a scenario that requires me to export a large mailing list (> 1m customers) to an external email system. I have source code and control over both applications.
I need a mechanism for transferring the data from one system to another that is:
Robust
Fast
Secure
So far, I have set up a standard MVC controller method that responds to a request (over https), performs some custom security checks, and then pulls the data from the DB.
As data is retrieved from the DB, the controller method iterates over the results and writes the response in a plain-text format, flushing the response every 100 records or so. The receiver reads each row of the response and performs storage and processing logic.
I have chosen this method because it does not require persisting user data to a permanent file, and a client built in any language will be able to implement receiver logic without a dependency on any proprietary technology (e.g. WCF).
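A rough sketch of the controller method described above (the controller name, query, and connection string are placeholders): write rows as they are read and flush every 100 records so nothing large is buffered on the web server.

```csharp
using System.Configuration;
using System.Data.SqlClient;
using System.Web.Mvc;

// Hypothetical streaming export: rows are written to the response as they are
// read from the database, with a flush every 100 records, so the full result
// set is never buffered in memory on the web server.
public class ExportController : Controller
{
    public void Customers()
    {
        // Custom security checks would run here before anything is written.

        Response.ContentType = "text/plain";
        Response.BufferOutput = false;

        string connectionString =
            ConfigurationManager.ConnectionStrings["MainDb"].ConnectionString;

        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand(
            "SELECT Email, Name FROM dbo.Customers", connection))
        {
            connection.Open();
            using (var reader = command.ExecuteReader())
            {
                int count = 0;
                while (reader.Read())
                {
                    Response.Write(reader.GetString(0) + "\t" +
                                   reader.GetString(1) + "\n");

                    if (++count % 100 == 0)
                        Response.Flush();
                }
            }
        }

        Response.Flush();
    }
}
```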
I am aware of other transport mechanisms that I can use with .NET, but none with an overall advantage, given the requirements listed above.
Any insight into which technologies might be better than my request / response solution?
Two suggestions come to mind; we had something similar happen at our company a little while ago (an acquired website with over 1 million monthly active users and associated data needed a complete datacenter change, including a 180 GB database that was still active).
We ended up setting up a pull replication to it over SSH (SQL Server 2005); this is black magic at best and took us about a month to set up properly between research and failed configurations. There are various blog posts about it, but the key parts are:
1) set up a named server alias in SQL Server configuration manager on the subscriber db machine, specifying localhost:1234 (choose a better number).
2) set up PuTTY to make an SSH tunnel between your subscriber's localhost:1234 from step 1 and the publisher db's port 9876 (again, choose a better number). Also make sure you have an SSH server enabled on the publisher. Also keep the port a secret and figure out a secure password for the SSH permissions.
3) add a server alias on publisher for port 9876 for the replicated db.
4) if your data set is small enough, create the publications and try starting up the subscriber using snapshot initialize. If not, you need to create a publication with "initialize from backup" enabled, and restore a partial backup at the subscriber using ftp to transfer the backup file over. This method is much faster than snapshot initialization for larger datasets.
Pros: You don't need to worry about authentication for the sql server, "just" the ssh tunnel. Publication can be easily modified in case you realize you need more columns or schema changes. You save time writing an api that may be only temporary and might have more security issues.
Cons: It's weird, there's not much official documentation and ssh on windows is finicky. If you have a linux based load balancer, it may be easier. There are a lot of steps.
Second suggestion: use ServiceStack and protobuf.NET to create a very quick webservice and expose it over https. If you know how to use ServiceStack, it should be very quick. If you don't, it would take a little time because it operates on a different design philosophy from Web API and WCF. Protobuf.NET is the most compact and fastest serialization/deserialization wire format widely available currently. Links:
ServiceStack.NET
Protobuf.NET
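A rough sketch of what that could look like (the DTO names, route, and page size are invented; it assumes ServiceStack's ProtoBuf plugin is registered so clients can request the protobuf wire format):

```csharp
using System.Collections.Generic;
using ProtoBuf;
using ServiceStack;

// Hypothetical DTOs: protobuf-net wants explicit member numbering.
[ProtoContract]
public class CustomerDto
{
    [ProtoMember(1)] public int Id { get; set; }
    [ProtoMember(2)] public string Email { get; set; }
    [ProtoMember(3)] public string Name { get; set; }
}

[Route("/customers/{Page}")]
[ProtoContract]
public class GetCustomers : IReturn<List<CustomerDto>>
{
    [ProtoMember(1)] public int Page { get; set; }
}

// Hypothetical service: serves one page per call so the receiver can pull the
// full list in resumable chunks instead of one enormous response.
public class CustomerService : Service
{
    private const int PageSize = 10000;

    public object Any(GetCustomers request)
    {
        var page = new List<CustomerDto>();
        // Placeholder data access: fill `page` with one page of customers
        // (offset request.Page * PageSize, PageSize rows) from the database.
        return page;
    }
}

// In AppHost.Configure, the protobuf format is enabled with something like:
//   Plugins.Add(new ProtoBufFormat());
```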
Pros: You can handle security however you like. This is also a downside since you then have to worry about it. It's much better documented. You get to use or learn a great framework that will speed up the rest of your webservice-related projects for the rest of your life (or until something better comes along).
Cons: You have to write it. You have to test it. You have to support it.