How do crawlers/bots work? Differentiating bot/crawler HTTP requests in C#

I am working on a website.
I need to find out whether my website gets visits from Google's or any other search engine's crawlers/bots.
In my application I am intercepting HTTP requests, and I need to find out whether a given request was made by a crawler/bot that is crawling my site.
How can I do this?

Check the user agent string to see if it's a known robot. An example:
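protected void Page_Load(object sender, EventArgs e)
{
    // Request.UserAgent is null when the client sends no User-Agent header
    string userAgent = Request.UserAgent ?? string.Empty;
    if (userAgent.Contains("Googlebot"))
    {
        // it's one of the Google robots
    }
    else if (...)
    {
        ...
    }
}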
For Google, the list of user agents they use can be found here.
For other search engines, you'll have to find the lists yourself.
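The user-agent check above can be pulled into a small helper so that it is not repeated on every page. This is only a minimal sketch: the class name, the method name and the token list are assumptions made for illustration, and the three tokens are examples rather than a complete or official list. Because the User-Agent header is supplied by the client, a match only tells you what the client claims to be.

```csharp
using System;

public static class CrawlerDetector
{
    // Illustrative tokens only (an assumption, not an official registry).
    private static readonly string[] KnownBotTokens = { "Googlebot", "bingbot", "Slurp" };

    // Returns true when the User-Agent value contains one of the tokens above.
    // A missing header (null or empty) is treated as "not a known crawler".
    public static bool IsKnownCrawler(string userAgent)
    {
        if (string.IsNullOrEmpty(userAgent))
        {
            return false;
        }

        foreach (string token in KnownBotTokens)
        {
            // Case-insensitive substring match on the raw header value.
            if (userAgent.IndexOf(token, StringComparison.OrdinalIgnoreCase) >= 0)
            {
                return true;
            }
        }

        return false;
    }
}
```

In a page or module it would be called as CrawlerDetector.IsKnownCrawler(Request.UserAgent).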

Related

Getting client ip address in Global.asax.cs - is it possible?

Of course Request.UserHostAddress works well, but in Application_Start() the Request object doesn't exist yet.
I want to guess the user's location from his/her IP once, when he/she first enters the web site, and set the default locale accordingly. I'll then manipulate it somewhere else.
I think there must be an event that can be handled in Global.asax in which Request exists, but I can't find that event.
Any alternative trick would also be appreciated.
Update:
In fact, I'm developing a multilingual web site and I use MaxMind GeoIP to get the user's country from the IP address. So I want to find a way to retrieve the country when a user enters the site (on the first request only) and store it in Session or a global variable.
I know that I can achieve my goal anywhere else with Request.UserHostAddress, and I have no problem with that: one extra line per request isn't an issue at all for this small app.
However, I wonder whether it is possible to set that variable once and only once.
Application_Start() is not the right event for this. It does not run for every request from your user; it runs once and does some basic initialization of the ASP.NET application.
A better solution is to use, for example,
protected void Application_BeginRequest(){}
which runs at the beginning of each request from the client.
As for the earliest event in which both the request and the session are available, it seems to be
void Application_AcquireRequestState(object sender, EventArgs e)
{
    if (System.Web.HttpContext.Current.Session != null)
    {
        if (Session["locale"] != null)
        {
            //already set variable.
        }
        else
        {
            //set some variable
            Session["locale"] = "code";
        }
    }
}
But what exactly do you mean by "setting locale based on IP"? Can you clarify this task? In general, this "locale" information must be set for each request that is executed.
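The "look it up only once per session" logic from the handler above can be sketched independently of ASP.NET. In this sketch the dictionary stands in for the Session object, and lookupCountry is a hypothetical delegate wrapping whatever GeoIP lookup the application uses; neither name comes from the original posts.

```csharp
using System;
using System.Collections.Generic;

public static class LocaleInitializer
{
    // Stores a value under "locale" the first time it is called for a given
    // session store and returns the stored value on later calls.
    // 'session' stands in for the ASP.NET Session object (assumption);
    // 'lookupCountry' is a hypothetical wrapper around the GeoIP lookup.
    public static string EnsureLocale(
        IDictionary<string, object> session,
        string ipAddress,
        Func<string, string> lookupCountry)
    {
        object existing;
        if (session.TryGetValue("locale", out existing) && existing != null)
        {
            return (string)existing; // already resolved earlier in this session
        }

        string locale = lookupCountry(ipAddress); // runs only when nothing is stored yet
        session["locale"] = locale;
        return locale;
    }
}
```

Inside Application_AcquireRequestState the same check would run against Session["locale"] with Request.UserHostAddress as the IP, so the GeoIP lookup runs on the first request of a session and is skipped afterwards.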
You should do it like this, in an HttpModule:
public void Init(HttpApplication context)
{
    context.BeginRequest += (Application_BeginRequest);
}
private void Application_BeginRequest(object source, EventArgs e)
{
    var context = ((HttpApplication) source).Context;
    var ipAddress = context.Request.UserHostAddress;
}
In ASP.NET MVC this can be solved with a global filter registered in GlobalFilterCollection: override the OnActionExecuting method of an action filter and add the necessary logic there. This article may help: ASP.NET MVC 4 Custom Action Filters.

How to use session to make a login page

I'm trying to write a website based on ASP.NET. I have made a login page with a username and password, and connected it to a SQL Server database.
But when I type in the right username and password, I need to click login twice before I am logged in. Once I am logged in and go back to the login page, the system always logs me in, no matter what I type into the username and password textboxes. I heard that the session can help, but I don't have any idea how to use it.
Could anyone help me, or show me some usable code samples please?
Thank you
Jimmy
I second @GojiraDeMonstah's suggestion and would also recommend that you use Microsoft's out-of-the-box (OOTB) functionality for handling website security (i.e. authentication, authorization, user management, password reset etc.) as much as possible. There's no reason to reinvent the wheel when it's all there for you. You can even extend the existing functionality to create your own custom authentication provider, but you really want to avoid writing one from scratch, especially if you're new to this stuff.
Microsoft provides plenty of tools and tutorials that make all of this easy to set up. Don't try creating your own database unless you really, really have to. Just use the one they provide and work from that as a starting point.
Here is another great resource that provides a more visual tutorial to show you how easy it is.
Good luck!
The process of supplying a username and password (credentials) and then using the supplied name & password to verify a user is called Authentication. If you google asp.net authentication you will get a zillion results. Here's a good start --> http://support.microsoft.com/kb/301240
Write code like this
FirstPage.aspx(On your first page Login button click)
protected void Login_Click(object sender, EventArgs e)
{
Session["UserName"] = txtUserName.Text;//Store username in session
}
SecondPage.aspx(after login on next page)
protected void Page_Load(object sender, EventArgs e)
{
    // Session["UserName"] is null if the user has not logged in yet
    LabelUserName.Text = (Session["UserName"] as string) ?? string.Empty;//Show username on a label
}
Hope it helps ....
The easiest way I have found is to download the sample pages provided in this example here.
Use the Global.asax file so you don't have to add login code to each and every page in your application.
In the file "Global.asax", define your session in the function "Session_Start()"
protected void Session_Start(Object sender, EventArgs e)
{
    //The first Session "Logged" which is an indicator to the
    //status of the user
    Session["Logged"]="No";
    //The second Session "User" stores the name of the current user
    Session["User"]="";
    //The third Session "URL" stores the URL of the
    //requested WebForm before Logging In
    Session["URL"]="Default.aspx";
}
In each page that should allow only authenticated access, check whether the user is logged in, like this:
private void Page_Load(object sender, System.EventArgs e)
{
    if(Session["Logged"].Equals("No"))
    {
        ....
    }
    else
    {
        ....
    }
}
In your Login.aspx page check the user name and password from your database with a function like:
if(CheckUser(UserNametxt.Text.Trim()) && CheckPassword(Passwordtxt.Text.Trim()))
{
    ....
}
else
{
    ....
}
In your codebehind define the functions CheckUser() and CheckPassword() by connecting to your database and passing the variable from the login page.
Download sample files here.

Calling url on base of incoming url

I am trying to detect the incoming URL in an ASP.NET page and make a decision based on that URL, but I am running into a problem. Here is my C# code that detects the URL, together with the conditions:
<script runat="server">
protected void Page_Load(object sender, EventArgs e)
{
    if (!IsPostBack)
    {
        String url = Request.ServerVariables["HTTP_REFERER"];
        if (url != "http://172.17.0.221:84/CP.aspx")
        {
            Response.Redirect("http://a.sml.com.pk");
        }
        else
        {
            Response.Redirect("http://172.17.0.221:85/MilkSale.aspx");
        }
    }
}
</script>
But when I call the page from http://172.17.0.221:84/CP.aspx, it gives this error:
This webpage has a redirect loop.
The webpage at http://172.17.0.221:85/MilkSale.aspx has resulted in too many redirects. Clearing your cookies for this site or allowing third-party cookies may fix the problem. If not, it is possibly a server configuration issue and not a problem with your computer.
Can anyone tell me what the error in this code may be?
If your script block is also on the MilkSale.aspx page, then it will fire every time that page is hit; in effect, the page will redirect to itself forever (or, in this instance, until the browser detects that it is being sent to the same page over and over again and reports a redirect loop).
To begin with:
protected void Page_Load(object sender, EventArgs e)
{
    if (!IsPostBack)
    {
        String url = Request.ServerVariables["HTTP_REFERER"];
        if (!String.IsNullOrEmpty(url))
        {
            if (!url.ToUpper().Contains("CP.ASPX"))
            {
                Response.Redirect("http://a.sml.com.pk");
            }
            // Redirect only when this page is not already MilkSale.aspx,
            // otherwise the page would redirect to itself
            else if (!Request.Url.AbsolutePath.ToUpper().Contains("MILKSALE.ASPX"))
            {
                Response.Redirect("http://172.17.0.221:85/MilkSale.aspx");
            }
        }
    }
}
This fixes the first issue, the redirect loop. However, you then have to consider some other issues with your code:
1) You are doing case-sensitive string matching.
2) You have IP addresses hard-coded in your URLs.
Issue 1) is pretty easy to fix; you can use String.Compare(url, referrer, StringComparison.InvariantCultureIgnoreCase), for example. In my code I have used .ToUpper(), which is still fraught with issues but makes for a compact example.
Issue 2) is more difficult; you should really decouple your redirect mechanism from the root URL, or else you'll have to change your code every time you change site. Either use the property HttpContext.Current.Request.Url.PathAndQuery or, preferably, look at URL rewriting.
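Both points can be illustrated with a small helper that compares only the path of the referer, ignoring case, so that neither the host nor the port is hard-coded. This is a sketch under assumptions: the class and method names are made up for the example, and it uses an ordinal case-insensitive comparison rather than the culture-based one mentioned above.

```csharp
using System;

public static class RefererCheck
{
    // Returns true when the referer's path ends with "/<pageName>", ignoring case.
    // A missing or malformed referer yields false.
    public static bool CameFromPage(string referer, string pageName)
    {
        Uri uri;
        if (string.IsNullOrEmpty(referer) ||
            !Uri.TryCreate(referer, UriKind.Absolute, out uri))
        {
            return false;
        }

        // AbsolutePath excludes scheme, host, port and query string.
        return uri.AbsolutePath.EndsWith("/" + pageName, StringComparison.OrdinalIgnoreCase);
    }
}
```

In Page_Load it would be called as RefererCheck.CameFromPage(Request.ServerVariables["HTTP_REFERER"], "CP.aspx"). Keep in mind that the Referer header is optional and client-supplied, so it may be absent or altered.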

Will this work when changing all the extensions in a site from .htm to .aspx?

I have been tasked with giving my company's website (~40 pages) a facelift. The original site is written in straight HTML/CSS/JavaScript and every file has the .htm extension. The new site is written in .NET 3.5 and hosted on Windows through IIS.
I am not changing the directory structure at all, but every page will go from a .htm extension to .aspx, and I am concerned about how this will affect my SEO.
From another SO question I found a link to this article detailing a custom HTTP module, from which I have the following code:
public class PermanentRedirectHttpModule : IHttpModule
{
    public void Dispose()
    {
    }
    public void Init(HttpApplication context)
    {
        context.BeginRequest += new EventHandler(context_BeginRequest);
    }
    void context_BeginRequest(object sender, EventArgs e)
    {
        HttpContext context = HttpContext.Current;
        HttpRequest request = context.Request;
        if (request.Url.PathAndQuery.Contains(".htm"))
        {
            string url = request.Url.ToString();
            url = url.Replace(".htm", ".aspx");
            context.Response.Status="301 Moved Permanently";
            context.Response.AddHeader("Location", url);
            context.Response.End();
        }
    }
}
I have implemented this method and everything works as I would expect it to. Is this method acceptable and will it maintain my search engine rankings?
This would work by redirecting all of your .htm paths to .aspx. Because you're doing a 301 redirect, you might notice a temporary drop in search engine rankings while the ranking is transferred to the new URLs. You'll also need to make sure that any links on your site point to the new URLs, otherwise you'll get a lot of internal 301 redirects.
An alternative would be to use URL rewriting. This way you could maintain your .htm URLs, but they would be rewritten to point to the .aspx pages.
Helicon ISAPI (http://www.isapirewrite.com/) is one I often use. If you have just one site on the server, you can get away with using the lite (free) version.
If you're using IIS7, you could use the built in rewrites which are configured in the web.config file of your site.
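One detail of the module in the question is worth noting: Contains(".htm") also matches .html files and occurrences in the query string, and Replace changes every occurrence in the URL. A stricter mapping that only touches a path ending in .htm might look like the sketch below; the class and method names are assumptions made for this example.

```csharp
using System;

public static class ExtensionRedirect
{
    // Returns the .aspx equivalent of a URL whose path ends in ".htm",
    // or null when no redirect is needed. The query string is preserved.
    public static string GetRedirectTarget(Uri requestUrl)
    {
        const string oldExtension = ".htm";
        string path = requestUrl.AbsolutePath;

        if (!path.EndsWith(oldExtension, StringComparison.OrdinalIgnoreCase))
        {
            return null; // e.g. ".html", ".aspx" or extensionless paths
        }

        string newPath = path.Substring(0, path.Length - oldExtension.Length) + ".aspx";

        // scheme://host[:port] + rewritten path + original query string
        return requestUrl.GetLeftPart(UriPartial.Authority) + newPath + requestUrl.Query;
    }
}
```

In context_BeginRequest the module would call GetRedirectTarget(request.Url) and send the 301 status and Location header only when the result is not null.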

Search engine friendly URLs in ASP.NET

Objective was:
To change pages like details.aspx?GUID=903901823908129038 to clean ones like /adrian_seo
Achieved:
By using Response.AddHeader("Location", url); I am now able to redirect all URLs containing uppercase characters to their lowercase form. I use the following function:
protected void Page_Load(object sender, EventArgs e)
{
    string url = Request.Url.ToString();
    if (url != Request.Url.ToString().ToLower())
    {
        url = Request.Url.ToString().ToLower();
        Response.Status = "301 Moved Permanently";
        Response.AddHeader("Location", url);
    }
}
Question is:
How do I change these to clean urls like /adrian_seo
I mean how do I handle requests coming to /adrian_seo and how do I show my old pages with new urls.
Do I need to use Global.asax?
Have a look into ASP.NET routing.
Use an HttpModule:
public void Init(HttpApplication context)
{
    context.BeginRequest += new EventHandler(context_BeginRequest);
}
void context_BeginRequest(object sender, EventArgs e)
{
    HttpContext context = ((HttpApplication)sender).Context;
    if (context.Request.RawUrl.ToLowerInvariant().Equals("YOURSEOURL"))
        context.RewritePath("YOURNONSEOURL");
}
Note that you don't want to hard-code all of this. Use a regular expression that matches your needs instead. For example, if the SEO URL is /page/234234/This-is-my-page-title, you grab the 234234 and rewrite the path to page.aspx?pageid=234234.
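The regex idea can be sketched as a pure mapping function. The pattern, the class name and the leading slash on the internal path are assumptions based on the /page/234234/This-is-my-page-title example; adapt them to your own URL scheme.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SeoUrlMapper
{
    // Matches /page/<digits>/<slug>, with an optional trailing slash.
    private static readonly Regex PagePattern = new Regex(
        @"^/page/(?<id>\d+)/[^/?]+/?$",
        RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);

    // Returns the internal path for a friendly URL path,
    // or null when the path does not follow the pattern.
    public static string ToInternalPath(string friendlyPath)
    {
        if (friendlyPath == null)
        {
            return null;
        }

        Match match = PagePattern.Match(friendlyPath);
        return match.Success
            ? "/page.aspx?pageid=" + match.Groups["id"].Value
            : null;
    }
}
```

In context_BeginRequest you would pass context.Request.Url.AbsolutePath to ToInternalPath and call context.RewritePath with the result when it is not null.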
UPDATE
You can also use the IIS 7 Rewrite Module
I recommend using the UrlRewritingNet component. If you write your own library you will have to overcome a number of difficulties that this library already handles for you:
It is a rewrite module tuned for ASP.NET 2.0, and offers support for
Themes and Masterpages
Regular Expressions
Good Postback-Urls
Cookieless Sessions
Runs in Shared-Hosting or Medium-Trust environments
OutputCache is supported
Redirects possible, even to other Domains
To enable extensionless URLs in ASP.NET with IIS 6 or lower, you also need to configure IIS to let ASP.NET handle all incoming requests.
You could use http://urlrewriter.net/ which can be used on asp.net 1.1 ->
After some reading up on routing in ASP.NET, I found this:
http://blog.eworldui.net/post/2008/04/ASPNET-MVC---Legacy-Url-Routing.aspx
It combines 301 redirects, which preserve SEO page rank, with ASP.NET routing as a permanent, organic SEO solution.
