I'm trying to create a parsing system in C# that blocks my program from fetching images from "banned" websites stored in a list. I have tried writing a bool method that does a Regex.Replace operation, but unfortunately it didn't work out.
To elaborate on exactly what I would like, here is an example:
I have a list:
List<string> BannedSites = new List<string> { "site" };
if (BannedSites.Contains(input))
{
// Don't go to that site
}
else
{
// Go to that site
}
The problem is that I have "site" in the list, but if someone enters "site " with a trailing space, it goes to the else branch because that exact string doesn't exist in the list. Likewise, if someone enters "site?", they bypass the check again, even though a question mark at the end of a URL usually makes no difference to which site is accessed. Is it possible to block the site whenever the input contains "site" anywhere WITHIN the string? Sorry if this is simple code, but I haven't been able to figure it out and Google didn't help.
Thanks in advance!
You can use LINQ's .Any to help with that:
if (Bannedsites.Any(x => input.Contains(x))) {
// Don't go to that site
} else {
// Go to that site
}
Remember to call .ToUpperInvariant() on both the input and the list entries to make the comparison case-insensitive.
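A minimal sketch of the same idea as a self-contained helper, assuming the list holds plain strings. Instead of upper-casing both sides, it swaps in String.IndexOf with StringComparison.OrdinalIgnoreCase, which is another way to get a case-insensitive "contains". The class and method names (SubstringBanFilter, IsBanned) are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SubstringBanFilter
{
    // Returns true when any banned entry occurs anywhere inside the input,
    // ignoring case, so "site ", "site?" and "SITE" all match an entry of "site".
    public static bool IsBanned(string input, IEnumerable<string> bannedSites)
    {
        if (string.IsNullOrEmpty(input))
            return false;

        return bannedSites.Any(banned =>
            input.IndexOf(banned, StringComparison.OrdinalIgnoreCase) >= 0);
    }
}
```

Keep in mind that a substring match over-blocks: with "site" in the list, "http://example.org/?q=site" is blocked too. Comparing only the host avoids that.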
If you make sure that the Bannedsites list contains only domain names (and arguably IP addresses), then you can compare against the domain alone.
To get the domain of a Uri, do as follows:
var uri = new Uri("http://stackoverflow.com/questions/11060418/c-sharp-string-parsing-containing-in-a-list");
Console.WriteLine(uri.DnsSafeHost);
The output is:
stackoverflow.com
Now you can get it to work like this (remember to store the entries in Bannedsites in upper case):
var uri = new Uri(input);
if(Bannedsites.Contains(uri.DnsSafeHost.ToUpper(CultureInfo.InvariantCulture)))
{
//Don't go to that site
}
else
{
//Go to that site
}
This will also ensure that the domain didn't appear as a part of another string by chance, for example as part of a parameter.
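A self-contained sketch of the host comparison, assuming the ban list contains bare domain names. Two things are swapped in relative to the code above: Uri.TryCreate instead of new Uri(input), so bad input doesn't throw, and a HashSet<string> built with StringComparer.OrdinalIgnoreCase, which handles case-insensitivity without upper-casing anything. HostBanFilter and the example.com entry are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class HostBanFilter
{
    // Hypothetical ban list of bare domain names. The comparer makes lookups
    // case-insensitive, so entries need not be stored upper-cased.
    private static readonly HashSet<string> BannedHosts =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "example.com" };

    public static bool IsBanned(string input)
    {
        // Uri.TryCreate avoids the UriFormatException that "new Uri(input)"
        // throws for input that is not an absolute URI.
        if (!Uri.TryCreate(input, UriKind.Absolute, out Uri uri))
        {
            // Assumption: unparseable input is reported as "not banned",
            // matching the "go to that site" branch; return true to block it instead.
            return false;
        }

        // DnsSafeHost is only the host name (no scheme, port, path, query or
        // fragment), so "example.com" appearing in a parameter does not match.
        return BannedHosts.Contains(uri.DnsSafeHost);
    }
}
```

Note that "http://www.example.com/" is not matched by an "example.com" entry, which leads to the subdomain question.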
Also note that DnsSafeHost includes subdomains, so:
var uri = new Uri("http://msdn.microsoft.com/en-US/");
Console.WriteLine(uri.DnsSafeHost);
returns:
msdn.microsoft.com
and not only:
microsoft.com
You may also verify that the URI is well formed with uri.IsWellFormedOriginalString():
var uri = new Uri(input);
if (uri.IsWellFormedOriginalString() && Bannedsites.Contains(uri.DnsSafeHost.ToUpper(CultureInfo.InvariantCulture)))
{
//Don't go to that site
}
else
{
//Go to that site
}
Now, if you also want to block the subdomains of a banned domain, you can do this:
var uri = new Uri(input);
if (uri.IsWellFormedOriginalString() && Bannedsites.Any(x => uri.DnsSafeHost.EndsWith(x)))
{
// Don't go to that site
}
else
{
// Go to that site
}
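One caveat with EndsWith(x): with "microsoft.com" in the list, "notmicrosoft.com" also ends with it and gets blocked. A sketch that only matches the domain itself or a real subdomain checks for a leading dot. It assumes bare domain names in the list; DomainBanFilter is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DomainBanFilter
{
    // True when host equals a banned domain or is a subdomain of it.
    // Requiring a leading "." keeps "notmicrosoft.com" from matching "microsoft.com".
    public static bool IsBannedHost(string host, IEnumerable<string> bannedDomains)
    {
        return bannedDomains.Any(domain =>
            host.Equals(domain, StringComparison.OrdinalIgnoreCase) ||
            host.EndsWith("." + domain, StringComparison.OrdinalIgnoreCase));
    }

    // Input that is not an absolute URI is treated as "not banned",
    // as in the IsWellFormedOriginalString() check above.
    public static bool IsBanned(string input, IEnumerable<string> bannedDomains)
    {
        return Uri.TryCreate(input, UriKind.Absolute, out Uri uri)
            && IsBannedHost(uri.DnsSafeHost, bannedDomains);
    }
}
```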
Lastly, if you are banning particular pages rather than whole sites (in which case caring about subdomains makes no sense), you can do the following:
var uri = new Uri(input);
if (uri.IsWellFormedOriginalString() && Bannedsites.Contains(uri.DnsSafeHost + uri.AbsolutePath))
{
//Don't go to that site
}
else
{
//Go to that site
}
Using AbsolutePath takes care of the "?" (query string) and "#" (fragment) parts often used to pass parameters, since neither changes the requested page.
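A self-contained sketch of the page check, assuming each entry is stored as host plus path without the scheme (for example "example.com/private/page.htm"; PageBanFilter and that entry are hypothetical). Passing in a case-insensitive set is a choice made here; paths can be case-sensitive on some servers, so use the default comparer if that matters to you.

```csharp
using System;
using System.Collections.Generic;

public static class PageBanFilter
{
    // Entries look like "example.com/private/page.htm" (host + path, no scheme).
    public static bool IsBanned(string input, ISet<string> bannedPages)
    {
        if (!Uri.TryCreate(input, UriKind.Absolute, out Uri uri))
            return false;

        // AbsolutePath excludes the query ("?...") and the fragment ("#..."),
        // so "/private/page.htm?x=1#top" reduces to "/private/page.htm".
        return bannedPages.Contains(uri.DnsSafeHost + uri.AbsolutePath);
    }
}
```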
You may also consider using Uri.Compare and storing a list of Uri objects instead of a list of strings.
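A sketch of the Uri.Compare approach, assuming the banned entries are stored as absolute Uri objects. Uri.Compare takes the components to compare, a UriFormat and a StringComparison; here only Host and Path are compared, case-insensitively, so the scheme, port, query and fragment are ignored. UriListBanFilter is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class UriListBanFilter
{
    // Compares host and path only, ignoring case; http vs https, ports,
    // query strings and fragments make no difference.
    public static bool IsBanned(Uri candidate, IEnumerable<Uri> bannedUris)
    {
        return bannedUris.Any(banned => Uri.Compare(
            candidate,
            banned,
            UriComponents.Host | UriComponents.Path,
            UriFormat.SafeUnescaped,
            StringComparison.OrdinalIgnoreCase) == 0);
    }
}
```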
I leave you the task of making the comparisons case-insensitive, as RFC 1035 says:
"For all parts of the DNS that are part of the official protocol, all comparisons between character strings (e.g., labels, domain names, etc.) are done in a case-insensitive manner."
Related
I am developing a classifieds website. It has many subdomains, one per city, such as dubai.sitedomain.com, london.sitedomain.com, newyork.sitedomain.com, etc. I am using AngularJS for the front end and Web API on the server side.
I need to get the subdomain inside my Web API action, like this:
[HttpGet]
[Route("GetAds")]
public IHttpActionResult GetAds()
{
// Here I want to know, from which subdomain the request has been sent
// so I can filter my ads according to the city
var city = "city from subdomain";
var list = _adService.GetAdsByCity(city);
return Ok(list);
}
Use Request.Headers.Referrer.Host.Replace(".sitedomain.com", string.Empty);
Of course this will only work on your live environment, so you may need to modify this to work differently on your local test domains, or provide some sort of default fallback. I would suggest extracting this to a method in a common library, as it's likely you will need it in many places.
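A sketch of such a shared helper, assuming every city lives directly under one known base domain. GetSubdomain, the "sitedomain.com" argument and the fallback value are hypothetical; the fallback covers local test hosts like localhost. It uses Uri.Host, which never includes the port, and strips the ".sitedomain.com" suffix rather than replacing it anywhere in the string.

```csharp
using System;

public static class SubdomainHelper
{
    // Returns the labels in front of baseDomain ("london" for
    // "london.sitedomain.com"), or fallback when the host is not a
    // subdomain of baseDomain (for example "localhost" during testing).
    public static string GetSubdomain(Uri uri, string baseDomain, string fallback)
    {
        if (uri == null)
            return fallback;

        string host = uri.Host;          // never includes the port
        string suffix = "." + baseDomain;

        if (host.EndsWith(suffix, StringComparison.OrdinalIgnoreCase) && host.Length > suffix.Length)
            return host.Substring(0, host.Length - suffix.Length);

        return fallback;
    }
}
```

In the GetAds action this could be called as SubdomainHelper.GetSubdomain(Request.Headers.Referrer, "sitedomain.com", "dubai"). The Referer header can be missing or forged, so treat the result as untrusted input; if the API itself is served on the city subdomain, Request.RequestUri is an alternative source.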
Alternatively, if you know that it will always be the first part, you can use
Request.Headers.Referrer.Host.Substring(0, Request.Headers.Referrer.Host.IndexOf("."));
In ASP.NET Core, you can extract it from:
var x = Request.Host.Value;
This will give you "london.sitedomain.com:port".
You can then use
x.Split('.')[0] // for example
to get the subdomain.
This should get you what you need.
HttpContext.Current.Request.UserHostAddress;
HttpContext.Current.Request.UserAgent;
HttpContext.Current.Request.Url.OriginalString;
There is also:
Request.Url.Scheme
Request.Url.Authority
Request.ApplicationPath
You can get the host from HttpRequestMessage.RequestUri.Host:
string host = Request.RequestUri.Host;
int index = host.IndexOf('.');
return index > 0 ? host.Substring(0, index) : host;
I have the following code:
if (Request.UrlReferrer != null)
{
if (Request.UrlReferrer.PathAndQuery.ToLowerInvariant() == "/test/content.htm")
{
postbacklink = Request.UrlReferrer.AbsoluteUri.Replace("/TEST/Content.htm", "/Testing.aspx?") + Request.QueryString;
}
else
{
postbacklink = Request.UrlReferrer.AbsoluteUri;
}
}
ExtendedLoanView.PostbackLink = postbacklink;
Now this page can be accessed from two different locations, which means this code:
postbacklink = Request.UrlReferrer.AbsoluteUri.Replace("/TEST/Content.htm", "/Test.aspx?") + Request.QueryString;
only works with one page (Test.aspx) and is hard-coded. In IE7, Request.UrlReferrer shows me this:
Request.UrlReferrer = {http://Testing:12345/PPP/Content.htm}
Whereas in IE8+ I am getting this value:
Request.UrlReferrer = {http://Testing:12345/PPP/TestingPage.aspx?Name=Xyz&Address=123 YYY}
How should I solve this issue? It's been bugging me for the past month.
I would definitely advise against basing your logic on request information (any more than on user-entered values). The thing is that it differs across browsers, and it is really hackable.
If you still need to pass information from client to server, make sure to validate it. If you need it to stay in sync and be valid, do not rely on what the browsers give you; set it yourself and then read it from a place in the request you did set (for example, a hidden input, a control, a variable in the ViewState, or whatever the technology you're using allows).
Most sites handle the situation you're trying to solve by passing the destination URL in the URL itself, in a query parameter. For example:
http://www.example.com/Login.aspx?returnUrl=/TEST/content.htm
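A minimal sketch of validating such a returnUrl before redirecting to it, assuming only app-relative paths should be accepted. IsSafeLocalUrl is a hypothetical helper. It rejects anything that doesn't start with a single "/", including "//host" and "/\host", which browsers treat as links to another host (an open redirect).

```csharp
using System;

public static class ReturnUrlValidator
{
    // Accepts only app-relative paths such as "/TEST/content.htm".
    public static bool IsSafeLocalUrl(string returnUrl)
    {
        if (string.IsNullOrEmpty(returnUrl))
            return false;

        // Absolute URLs ("http://...") and relative paths without a leading "/" are rejected.
        if (returnUrl[0] != '/')
            return false;

        // "//host" and "/\host" would send the browser to another host.
        if (returnUrl.Length > 1 && (returnUrl[1] == '/' || returnUrl[1] == '\\'))
            return false;

        return true;
    }
}
```

On Login.aspx you would then redirect to Request.QueryString["returnUrl"] only when IsSafeLocalUrl returns true, and fall back to a fixed page otherwise.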
EDIT: I do realize that everything you send to the client is very hackable anyway, but if you set it yourself, it's easier for you to validate that it hasn't been tampered with. An example is the ViewState validation methods.
Recently my team was asked to implement an HttpModule for an ASP.NET MVC application that handled double-encoded URLs on IIS 7 and .NET 3.5. Here's the crux of the problem:
We sometimes get URLs that have double-encoded forward slashes that look like this:
http://www.example.com/%252fbar%5cbaz/foo
There are other formats that we have to handle as well, but they all have one thing in common: they contain a double-encoded forward slash.
To fix this, we wrote an HttpModule that only acts when a URL has a double-encoded forward slash and redirects it to a sane URL. The details aren't important, but there are two bits that are:
We can't control the fact that these URLs have double-encoded forward slashes.
We have not upgraded to .NET 4.0 yet, nor is it on the immediate horizon.
Here's the problem:
The first request after IIS starts up shows a different URL than the second request does.
If we used the URL from the above example, the first request to IIS would look like:
http://www.example.com/bar/baz/foo
and the second request would look like:
http://www.example.com/%252fbar%5cbaz/foo
This was done by inspecting the Application.Request.Url.AbsolutePath property while debugging.
Here's the smallest code example that should reproduce the problem (create a new MVC application, and register the following HttpModule):
public class ForwardSlashHttpModule : IHttpModule
{
internal IHttpApplication Application { get; set; }
public void Dispose()
{
Application = null;
}
public void Init(HttpApplication context)
{
Initialize(new HttpApplicationAdapter(context));
}
internal void Initialize(IHttpApplication context)
{
Application = context;
context.BeginRequest += context_BeginRequest;
}
internal void context_BeginRequest(object sender, EventArgs e)
{
var url = Application.Request.Url.AbsolutePath; //<-- Problem point
//Do stuff with Url here.
}
}
Then, call the same URL on localhost:
http://www.example.com/%252fbar%5c/foo
NB: Make sure to insert a Debugger.Launch() call before the line in context_BeginRequest so that you'll be able to see it the first time IIS launches.
When you execute the first request, you should see:
http://example.com/bar/foo
on subsequent requests, you should see:
http://example.com//bar/foo
My question is: Is this a bug in IIS? Why does it provide different URLs when calling Application.Request.Url.AbsolutePath the first time, but not for any subsequent request?
Also: It doesn't matter whether the first request is for a double encoded URL or not, the second request will always be handled appropriately by IIS (or at least, as appropriate as handling double-encoded forward slashes can be). It's that very first request that is the problem.
Update
I tried a few different properties to see if one had different values on the first request:
First Request
string u = Application.Request.Url.AbsoluteUri;
"http://example.com/foo/baz/bar/"
string x = Application.Request.Url.OriginalString;
"http://example.com:80/foo/baz/bar"
string y = Application.Request.RawUrl;
"/%2ffoo/baz/bar"
bool z = Application.Request.Url.IsWellFormedOriginalString();
true
The only interesting thing is that Application.Request.RawUrl emits a single-encoded forward slash (%2f) and translates the encoded backslash (%5c) to a forward slash (although everything else does that as well).
The RawUrl is still partially encoded on the first request.
Second Request
string u = Application.Request.Url.AbsoluteUri;
"http://example.com//foo/baz/bar"
string x = Application.Request.Url.OriginalString;
"http://example.com:80/%2ffoo/baz/bar"
string y = Application.Request.RawUrl;
"/%2ffoo/baz/bar"
bool z = Application.Request.Url.IsWellFormedOriginalString();
false
Interesting points from the second request:
IsWellFormedOriginalString() is false. On the first request it was true.
The RawUrl is the same (potentially helpful).
The AbsoluteUri is different. On the second request, it has two forward slashes.
Update
Application.Request.ServerVariables["URL"] = /quotes/gc/v12/CMX
Application.Request.ServerVariables["CACHE_URL"] = http://example.com:80/%2ffoo/baz/bar
Open Questions
This seems like a bug in either IIS or .NET. Is it?
This only matters for the very first request made by an application after an iisreset.
Besides using RawUrl (as we'd have to worry about a lot of other problems if we parsed the Raw Url instead of using the 'safe' URL provided by .NET), what other methods are there for us to handle this?
Keep in mind, the physical impact of this problem is low: For it to be an actual problem, the first request to the web server from a client would have to be for the above specific URL, and the chances of that happening are relatively low.
Request.Url can be decoded already - I wouldn't trust it for what you are doing.
See the internal details at:
Querystring with url-encoded ampersand prematurely decoded within Request.Url
The solution is to access the values directly via Request.RawUrl.
I realize your problem is with the path, but it seems the same thing is going on. Try RawUrl and see if it works for you instead.
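A sketch of checking the raw URL instead of Request.Url, assuming you only care about encoded slashes in the path (not in the query string, where "%2f" is legitimate, e.g. in a returnUrl parameter). HasEncodedSlash is a hypothetical helper; it sidesteps the decoded Request.Url rather than explaining the first-request behaviour.

```csharp
using System;

public static class EncodedSlashDetector
{
    public static bool HasEncodedSlash(string rawUrl)
    {
        if (string.IsNullOrEmpty(rawUrl))
            return false;

        // Only inspect the path part; everything after "?" is the query string.
        int queryStart = rawUrl.IndexOf('?');
        string path = queryStart >= 0 ? rawUrl.Substring(0, queryStart) : rawUrl;

        // "%2f" is a single-encoded "/", "%252f" a double-encoded one
        // ("%25" decodes to "%"). Neither string contains the other.
        return path.IndexOf("%2f", StringComparison.OrdinalIgnoreCase) >= 0
            || path.IndexOf("%252f", StringComparison.OrdinalIgnoreCase) >= 0;
    }
}
```

In the module from the question, that would be EncodedSlashDetector.HasEncodedSlash(Application.Request.RawUrl) inside context_BeginRequest.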
This really isn't an answer, but possibly a step in the right direction. I haven't had time to create a test harness to prove anything.
I followed this.PrivateAbsolutePath through Reflector and it goes on and on. There is a lot of string manipulation when it's accessed.
public string AbsolutePath
{
get
{
if (this.IsNotAbsoluteUri)
{
throw new InvalidOperationException(SR.GetString("net_uri_NotAbsolute"));
}
string privateAbsolutePath = this.PrivateAbsolutePath; //HERE
if (this.IsDosPath && (privateAbsolutePath[0] == '/'))
{
privateAbsolutePath = privateAbsolutePath.Substring(1);
}
return privateAbsolutePath;
}
}
I would like to take the original URL, truncate the query string parameters, and return a cleaned up version of the URL. I would like it to occur across the whole application, so performing through the global.asax would be ideal. Also, I think a 301 redirect would be in order as well.
For example:
in: www.website.com/default.aspx?utm_source=twitter&utm_medium=social-media
out: www.website.com/default.aspx
What would be the best way to achieve this?
System.Uri is your friend here. This has many helpful utilities on it, but the one you want is GetLeftPart:
string url = "http://www.website.com/default.aspx?utm_source=twitter&utm_medium=social-media";
Uri uri = new Uri(url);
Console.WriteLine(uri.GetLeftPart(UriPartial.Path));
This gives the output: http://www.website.com/default.aspx
[The Uri class does require the protocol, http://, to be specified]
GetLeftPart basically says "get the left part of the URI up to and including the part I specify". This can be Scheme (just the http:// bit), Authority (the www.website.com part), Path (the /default.aspx) or Query (the query string).
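To make that concrete, here is each UriPartial value applied to the URL from the question. The expected values in the comments follow the documented GetLeftPart behaviour; LeftPartDemo is just a hypothetical wrapper.

```csharp
using System;

public static class LeftPartDemo
{
    // Returns GetLeftPart for each UriPartial value, shortest to longest.
    // Comments show the values for the question's example URL.
    public static string[] AllLeftParts(string url)
    {
        var uri = new Uri(url);
        return new[]
        {
            uri.GetLeftPart(UriPartial.Scheme),    // "http://"
            uri.GetLeftPart(UriPartial.Authority), // "http://www.website.com"
            uri.GetLeftPart(UriPartial.Path),      // "http://www.website.com/default.aspx"
            uri.GetLeftPart(UriPartial.Query)      // the full URL, query string included
        };
    }
}
```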
Assuming you are on an aspx web page, you can then use Response.Redirect(newUrl) to redirect the caller.
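A sketch of the redirect decision as a small, testable helper, assuming you want to strip every query string. Returning false when there is no query string matters: otherwise the request for the clean URL would be redirected again, forever. CleanUrlHelper is a hypothetical name.

```csharp
using System;

public static class CleanUrlHelper
{
    // Returns true (with the URL minus its query string) when there is
    // something to strip; false when the URL is already clean.
    public static bool TryGetCleanUrl(Uri requestUrl, out string cleanUrl)
    {
        cleanUrl = requestUrl.GetLeftPart(UriPartial.Path);
        return !string.IsNullOrEmpty(requestUrl.Query);
    }
}
```

In Global.asax this could be called from Application_BeginRequest with Request.Url, followed by Response.RedirectPermanent(cleanUrl) on .NET 4.0 or later (on 3.5, set Response.StatusCode = 301, add a Location header with Response.AddHeader, and call Response.End()). Be aware that this drops the query string on every request, including parameters your pages actually need.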
Here is a simple trick (VB.NET):
Dim uri = New Uri(Request.Url.AbsoluteUri)
Dim reqURL = uri.GetLeftPart(UriPartial.Path)
Here is a quick way of getting the root path sans the full path and query.
string path = Request.Url.AbsoluteUri.Replace(Request.Url.PathAndQuery,"");
This may look a little better.
string rawUrl = String.Concat(this.GetApplicationUrl(), Request.RawUrl);
if (rawUrl.Contains("/post/"))
{
bool hasQueryStrings = Request.QueryString.Keys.Count > 0;
if (hasQueryStrings)
{
Uri uri = new Uri(rawUrl);
rawUrl = uri.GetLeftPart(UriPartial.Path);
HtmlLink canonical = new HtmlLink();
canonical.Href = rawUrl;
canonical.Attributes["rel"] = "canonical";
Page.Header.Controls.Add(canonical);
}
}
This relies on a separate function, GetApplicationUrl() (not shown), that properly fetches the application URL.
Works perfectly.
I'm guessing that you want to do this because you want your users to see pretty looking URLs. The only way to get the client to "change" the URL in its address bar is to send it to a new location - i.e. you need to redirect them.
Are the query string parameters going to affect the output of your page? If so, you'll have to look at how to maintain state between requests (session variables, cookies, etc.) because your query string parameters will be lost as soon as you redirect to a page without them.
There are a few ways you can do this globally (in order of preference):
If you have direct control over your server environment, then a configurable server module like ISAPI_Rewrite or the IIS 7.0 URL Rewrite Module is a great approach.
A custom IHttpModule is a nice, reusable roll-your-own approach.
You can also do this in the global.asax, as you suggest.
You should only use the 301 response code if the resource has indeed moved permanently. Again, this depends on whether your application needs to use the query string parameters. If you use a permanent redirect, a browser that respects the 301 response code will skip loading a URL like .../default.aspx?utm_source=twitter&utm_medium=social-media and load .../default.aspx instead; you'll never even know about the query string parameters.
Finally, you can use POST method requests. This gives you clean URLs and lets you pass parameters in, but will only work with <form> elements or requests you create using JavaScript.
Take a look at the UriBuilder class. You can create one with a url string, and the object will then parse this url and let you access just the elements you desire.
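A sketch of the UriBuilder approach: clear Query (and Fragment) and read the rebuilt Uri back. RemoveQuery is a hypothetical name. Reading builder.Uri.AbsoluteUri gives a normalized string in which the default port is omitted.

```csharp
using System;

public static class UriBuilderDemo
{
    public static string RemoveQuery(string url)
    {
        var builder = new UriBuilder(url);
        builder.Query = string.Empty;    // drop "?utm_source=..."
        builder.Fragment = string.Empty; // drop any "#..." as well
        return builder.Uri.AbsoluteUri;  // Uri omits the default port (80 for http)
    }
}
```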
After completing whatever processing you need to do on the query string, just split the url on the question mark:
Dim _CleanUrl As String = Request.Url.AbsoluteUri.Split("?"c)(0)
Response.Redirect(_CleanUrl)
Granted, my solution is in VB.NET, but I'd imagine that it could be ported over pretty easily. And since we are only looking for the first element of the split, it even "fails" gracefully when there is no querystring.
I set cookies based on the referral links, and their names all start with the same letters, let's say "google", but end with _xxx, _yyy, _zzz or whatever the reference is.
Now, when I try to get the cookies later, I don't want to check for each of the different cookies individually; I would like to check for all cookies that start with "google" and, based on that, start a script that continues processing.
if (Request.Cookies["google"] != null)
{
// run other stuff
}
Any idea how I can add StartsWith or something similar to it? I am a newbie, so I'm not really that into C# yet.
Thanks in advance,
Pat
Well... HttpRequest.Cookies is a collection, so use LINQ:
var qry = from cookieName in Request.Cookies.AllKeys
where cookieName.StartsWith("google")
select cookieName;
foreach(var item in qry)
{
// get the cookie and deal with it.
var cookie = Request.Cookies[item];
}
Bottom line: you can't get away from iterating over the entire cookie collection. But you can do it easily using LINQ.
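A self-contained sketch of the prefix filter, assuming you pass in the cookie names (in ASP.NET that would be Request.Cookies.AllKeys, a string[]). WithPrefix is a hypothetical helper; StringComparison.Ordinal is used because cookie names are case-sensitive.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CookieNameFilter
{
    // Returns the names that start with prefix, preserving their order.
    public static List<string> WithPrefix(IEnumerable<string> cookieNames, string prefix)
    {
        return cookieNames
            .Where(name => name != null && name.StartsWith(prefix, StringComparison.Ordinal))
            .ToList();
    }
}
```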
You have to check all the cookies if you want to find ones with a certain prefix (Randolpho's answer will work).
It's not a particularly good idea to do it that way. The problem is that the more cookies you create, the more overhead you put on the server and connection. Say you have 10 cookies: google_aaa, google_bbb, etc. Each request will send all 10 cookies to your server (this includes requests for images, CSS, etc.).
You're better off using a single cookie that holds some sort of key to all the information stored on your server. Something like this:
var cookie = Request.Cookies["google"];
if(cookie!=null)
{
// cookie.Value is a unique key for this user. Lookup this
// key in your database or other store to find out the
// information about this user.
}
If you prefer a lambda expression, you can do it this way:
var cookieName = Request.Cookies.AllKeys.FirstOrDefault(s => s.Contains("yourName"));
Hope this helps!