Newline character in the URL breaks the page - C#

I am using ASP.NET and C# to build my website.
I am facing an issue with a newline character in the URL.
Here is my URL:
http://localhost/TestProject/Login/Login.aspx?ReturnUrl=http://google.com?search=Software+Engineer%0ASalesforce.com
It contains an encoded newline character (%0A).
In my code I decode the URL with System.Web.HttpUtility.UrlDecode to read some parameters, then append a session ID to the URL and redirect to it.
Because the URL contains a newline character, the page breaks and throws an exception:
Value does not fall within the expected range.
If I remove the newline character from the URL, it works as expected.
Any suggestions on how to handle this?
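One possible way to handle this, assuming the exception is triggered by the raw line break reaching the redirect call, is to strip carriage returns and line feeds from the decoded value and re-encode it before appending it to the redirect URL. The sketch below is not from the original post: ReturnUrlHelper and SanitizeAndEncode are hypothetical names, and Uri.EscapeDataString is used as a stand-in encoder so the example does not depend on System.Web.

```csharp
using System;

// Hypothetical helper, not part of ASP.NET.
public static class ReturnUrlHelper
{
    // Takes a value that has already been URL-decoded, removes CR and LF
    // characters, and percent-encodes the result so it can be placed in a
    // query string again.
    public static string SanitizeAndEncode(string decodedValue)
    {
        if (decodedValue == null) throw new ArgumentNullException(nameof(decodedValue));

        // Assumption: dropping the line break is acceptable for this parameter.
        string cleaned = decodedValue
            .Replace("\r", string.Empty)
            .Replace("\n", string.Empty);

        return Uri.EscapeDataString(cleaned);
    }
}
```

Whether the line break should be dropped or replaced with a space depends on what the parameter is used for; the sketch simply drops it.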

Related

Problems with aspx getting url with parentheses

I am having problems reading the URL in ASP.NET when it contains parentheses.
My site has URLs like this one: http://example.com/my-amazing-product-%28online%29
So, I am trying to get that URL using the instruction:
var requestURL = HttpContext.Current.Request.RawUrl;
But that instruction returns the URL: http://example.com/my-amazing-product-(online)
It automatically converts %28 to ( and %29 to ), but I don't want that conversion. I need the original URL with %28 and %29.
I also can't use a string replace to fix the URL. I have to get the original URL directly from the request or something similar.
Can anyone explain this behavior and how to work around it?

Weird return url

I am trying to use oauth2.
I make a GET request and then get redirected to a callback URL that I set up beforehand. The problem is that the URL parameters are preceded by the # sign, so neither PHP nor .NET can read them.
I get redirected in the following url in my browser:
http://localhost:1787/About.aspx?#access_token=f3EToovT2bQNNOQ&token_type=bearer&merchant_id=A6BGD4BH&response_type=token
Request.Params is empty and Request.QueryString is empty. Even when I use PHP and print the $_REQUEST array, it is still empty.
How is this possible?
Whatever comes after the # is not sent to the server as data; it is a fragment (hash) that exists only on the client side.
Try using JavaScript to redirect with the hash turned into a query string:
window.location = window.location.pathname + '?' + window.location.hash.substring(1);

How to control colon (:) URL Query String?

I need to pass an ID in the query string that contains a colon (:), e.g. ABC_PD:123456.
When I use this ID in the query string and the site redirects to another page, the browser shows a 404 Not Found error.
Can anyone suggest how to pass the colon in the query string so that the redirect works without a 404 error?
A solution would be much appreciated.
When you build the URL that you redirect to, you need to encode special characters by using the UrlEncode-method:
var redirectTo = "/mypage.aspx?id=" + HttpUtility.UrlEncode("id123:456");
This will create a query string that looks like this and will be interpreted correctly:
"/mypage.aspx?id=id123%3A456"
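As a minimal sketch of the same idea applied to the ID from the question: the helper below uses Uri.EscapeDataString in place of HttpUtility.UrlEncode so that it runs without a reference to System.Web. QueryStringExample and BuildRedirectUrl are hypothetical names chosen for illustration.

```csharp
using System;

// Hypothetical helper for illustration only.
public static class QueryStringExample
{
    // Builds "<page>?id=<encoded id>", percent-encoding reserved characters
    // such as ':' in the id value.
    public static string BuildRedirectUrl(string page, string id)
    {
        return page + "?id=" + Uri.EscapeDataString(id);
    }
}
```

For example, BuildRedirectUrl("/mypage.aspx", "ABC_PD:123456") yields "/mypage.aspx?id=ABC_PD%3A123456", and Uri.UnescapeDataString turns the encoded value back into "ABC_PD:123456".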

Should a .NET protocol-less URI show as relative?

All,
In HTML, it is my understanding that a url that starts // (e.g. //www.google.com) refers to a protocol-less url that should be requested in the same scheme as that in which the page was served.
However, the following c# code fails
var uri = new Uri("//www.google.com", UriKind.RelativeOrAbsolute);
Assert.IsTrue(uri.IsAbsoluteUri);
Am I missing something here? At the moment I am rolling my own regex to find out whether a URI is absolute:
return Regex.IsMatch(url, @"^(https?:)?//");
It's not absolute. It's relative to the scheme of the source it is accessed from, whether that source is served over HTTP, HTTPS, or something else.
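Since a protocol-less URL only becomes absolute once a scheme is known, one option is to supply the scheme of the referring page explicitly before parsing. The following is a rough sketch under that assumption; ProtocolRelative, IsProtocolRelative and Resolve are hypothetical names, not part of the .NET API.

```csharp
using System;

// Hypothetical helper for illustration only.
public static class ProtocolRelative
{
    // True when the URL starts with "//", i.e. it has a host but no scheme.
    public static bool IsProtocolRelative(string url)
    {
        return url != null && url.StartsWith("//", StringComparison.Ordinal);
    }

    // Prepends the given scheme (e.g. "https") to a protocol-relative URL and
    // parses the result as an absolute URI. Returns null when the input cannot
    // be parsed as an absolute URI.
    public static Uri Resolve(string url, string pageScheme)
    {
        string candidate = IsProtocolRelative(url) ? pageScheme + ":" + url : url;
        Uri result;
        return Uri.TryCreate(candidate, UriKind.Absolute, out result) ? result : null;
    }
}
```

With this sketch, Resolve("//www.google.com", "https") gives a Uri whose IsAbsoluteUri is true and whose AbsoluteUri is "https://www.google.com/".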

How to get the address of a redirected page?

The goal of my program is to grab a webpage and then generate a list of Absolute links with the pages it links to.
The problem I am having is when a page redirects to another page without the program knowing, it makes all the relative links wrong.
For example:
I give my program this link: moodle.pgmb.si/moodle/course/view.php?id=1
On this page, if it finds the link href="signup.php" meaning signup.php in the current directory, it errors because there is no directory above the root.
However this error is invalid because the page's real location is:
moodle.pgmb.si/moodle/login/index.php
Meaning that "signup.php" is linking to moodle.pgmb.si/signup.php which is a valid page, not moodle.pgmb.si/moodle/course/signup.php like my program thinks.
So my question is how is my program supposed to know that the page it received is at another location?
I am doing this in C#, using the following code to get the HTML:
WebRequest wrq = WebRequest.Create(address);
WebResponse wrs = wrq.GetResponse();
StreamReader strdr = new StreamReader(wrs.GetResponseStream());
string html = strdr.ReadToEnd();
strdr.Close();
wrs.Close();
You should be able to use the ResponseUri property of the WebResponse class. It contains the URI of the internet resource that actually provided the response data, as opposed to the resource that was requested. You can then use this URI to build correct links.
http://msdn.microsoft.com/en-us/library/system.net.webresponse.responseuri.aspx
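Once the final address is known, relative links can be resolved against it with the Uri(Uri, String) constructor rather than by string manipulation. The sketch below assumes the base address was taken from wrs.ResponseUri in the question's code; LinkResolver and ResolveLink are hypothetical names.

```csharp
using System;

// Hypothetical helper for illustration only.
public static class LinkResolver
{
    // Resolves an href found on a page against the URI the page was actually
    // served from (for example, the value of WebResponse.ResponseUri).
    // An href that is already absolute is returned unchanged.
    public static Uri ResolveLink(Uri pageUri, string href)
    {
        return new Uri(pageUri, href);
    }
}
```

For instance, resolving "signup.php" against http://moodle.pgmb.si/moodle/login/index.php produces http://moodle.pgmb.si/moodle/login/signup.php, because a relative reference without a leading slash is resolved against the directory of the base URI.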
What I would do is first check if each link is absolute or relative by searching for an "http://" within it. If it's absolute, you're done. If it's relative, then you need to append the path to the page you're scanning in front of it.
There are a number of ways you could get the current path: you could Split() it on the slashes ("/"), then recombine all but the last one. Or you could search for the last occurrence of a slash and then take a substring of up to and including that position.
Edit: Re-reading the question, I'm not sure I am understanding. href="signup.php" is a relative link, which should go to the /signup.php. So the current behavior you mentioned is correct "moodle.pgmb.si/moodle/course/signup.php."
The problem is that, if the URL isn't a relative or absolute URL, then you have no way of knowing where it goes unless you request it. Even then, it might not actually be being served from where you think it is located. This is because it might actually be implemented as a HTTP Redirect or similar server side.
So if you want to be exhaustive, what you can do is:
Use your current technique to grab a list of all links on the page.
Attempt to request each of those pages. Then if you:
Get a 200 response code, then all is good: the page is there.
Get a 404 response code, then you know the page does not exist.
Get a 3XX response code, then you know where the web server expects that content to actually originate from.
Your HttpWebResponse object has a StatusCode property for this. Note that you should also handle any possible WebException errors; these too carry a WebResponse (in the Response property) with a status code, typically 4xx or 5xx.
You can also look at the HttpWebResponse Headers property - the Location header.
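The checks above could be sketched roughly as follows. The helper only classifies a status code and Location value that have already been read from HttpWebResponse.StatusCode and HttpWebResponse.Headers["Location"]; LinkCheck, Describe and the returned strings are hypothetical. It also assumes HttpWebRequest.AllowAutoRedirect was set to false, since otherwise redirects are followed automatically and a 3XX code is not normally seen.

```csharp
using System;
using System.Net;

// Hypothetical helper for illustration only.
public static class LinkCheck
{
    // Maps a response status (and optional Location header value) to a short
    // description of what it means for the link being checked.
    public static string Describe(HttpStatusCode status, string locationHeader)
    {
        int code = (int)status;
        if (code >= 200 && code < 300) return "ok";
        if (code >= 300 && code < 400)
            return "redirect to " + (locationHeader ?? "(no Location header)");
        if (code == 404) return "missing";
        return "error " + code;
    }
}
```

A caller would pass the values from a successful response, or from the response attached to a caught WebException, into Describe.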
