Characters after "#" is not recognized by system.net.webclient.DownloadFile method [duplicate] - c#

How do you properly encode a path that includes a hash (#) in it? Note the hash is not the fragment (bookmark?) indicator but part of the path name.
For example, if there is a path like this:
http://www.contoso.com/code/c#/somecode.cs
It causes problems when you for example try do this:
Uri myUri = new Uri("http://www.contoso.com/code/c#/somecode.cs");
It would seem that it interprets the hash as the fragment indicator.
It feels wrong to manually replace # with %23. Are there other characters that should be replaced?
There are some escaping methods in Uri and HttpUtility but none seem to do the trick.

There are a few characters you are not supposed to use. You can try to work your way through this very dry documentation, or refer to this handy URL summary on Stack Overflow.
If you check out this very website, you'll see that their C# questions are encoded %23.
Stack Overflow C# Questions
You can do this using either (for ASP.NET):
string.Format("http://www.contoso.com/code/{0}/somecode.cs",
Server.UrlEncode("c#")
);
Or for class libraries / desktop:
string.Format("http://www.contoso.com/code/{0}/somecode.cs",
HttpUtility.UrlEncode("c#")
);

Did some more digging friends and found a duplicate question for Java:
HTTP URL Address Encoding in Java
However, the .Net Uri class does not offer the constructor we need, but the UriBuilder does.
So, in order to construct a proper URI where the path contains illegal characters, do this:
// Build Uri by explicitly specifying the constituent parts. This way, the hash is not confused with fragment identifier
UriBuilder uriBuilder = new UriBuilder("http", "www.contoso.com", 80, "/code/c#/somecode.cs");
Debug.WriteLine(uriBuilder.Uri);
// This outputs: http://www.contoso.com/code/c%23/somecode.cs
Notice how it does not unnecessarily escape parts of the URI that does not need escaping (like the :// part) which is the case with HttpUtility.UrlEncode. It would seem that the purpose of this class is actually to encode the querystring/fragment part of the URL - not the scheme or hostname.

Use UrlEncode: System.Web.HttpUtility.UrlEncode(string)
class Program
{
static void Main(string[] args)
{
string url = "http://www.contoso.com/code/c#/somecode.cs";
string enc = HttpUtility.UrlEncode(url);
Console.WriteLine("Original: {0} ... Encoded {1}", url, enc);
Console.ReadLine();
}
}

Related

How to encode a URL using Asp.net?

I have the following line of aspx link that I would like to encode:
Response.Redirect("countriesAttractions.aspx?=");
I have tried the following method:
Response.Redirect(Encoder.UrlPathEncode("countriesAttractions.aspx?="));
This is another method that I tried:
var encoded = Uri.EscapeUriString("countriesAttractions.aspx?=");
Response.Redirect(encoded);
Both redirects to the page without the URL being encoded:
http://localhost:52595/countriesAttractions?=
I tried this third method:
Response.Redirect(Server.UrlEncode("countriesAttractions.aspx?="));
This time the url itself gets encoded:
http://localhost:52595/countriesAttractions.aspx%3F%3D
However I get an error from the UI saying:
HTTP Error 404.0 Not Found
The resource you are looking for has been removed, had its name changed, or
is temporarily unavailable.
Most likely causes:
-The directory or file specified does not exist on the Web server.
-The URL contains a typographical error.
-A custom filter or module, such as URLScan, restricts access to the file.
Also, I would like to encode another kind of URL that involves parsing of session strings:
Response.Redirect("specificServices.aspx?service=" +
Session["service"].ToString().Trim() + "&price=" +
Session["price"].ToString().Trim()));
The method I tried to include the encoding method into the code above:
Response.Redirect(Server.UrlEncode("specificServices.aspx?service=" +
Session["service"].ToString().Trim() + "&price=" +
Session["price"].ToString().Trim()));
The above encoding method I used displayed the same kind of results I received with my previous Server URL encode methods. I am not sure on how I can encode url the correct way without getting errors.
As well as encoding URL with CommandArgument:
Response.Redirect("specificAttractions.aspx?attraction=" +
e.CommandArgument);
I have tried the following encoding:
Response.Redirect("specificAttractions.aspx?attraction=" +
HttpUtility.HtmlEncode(Convert.ToString(e.CommandArgument)));
But it did not work.
Is there any way that I can encode the url without receiving this kind of error?
I would like the output to be something like my second result but I want to see the page itself and not the error page.
I have tried other methods I found on stackoverflow such as self-coded methods but those did not work either.
I am using AntiXSS class library in this case for the methods I tried, so it would be great if I can get solutions using AntiXSS library.
I need to encode URL as part of my school project so it would be great if I can get solutions. Thank you.
You can use the UrlEncode or UrlPathEncode methods from the HttpUtility class to achieve what you need. See documentation at https://msdn.microsoft.com/en-us/library/system.web.httputility.urlencode(v=vs.110).aspx
It's important to understand however, that you should not need to encode the whole URL string. It's only the parameter values - which may contain arbitrary data and characters which aren't valid in a URL - that you need to encode.
To explain this concept, run the following in a simple .NET console application:
string url = "https://www.google.co.uk/search?q=";
//string url = "http://localhost:52595/specificAttractions.aspx?country=";
string parm = "Bora Bora, French Polynesia";
Console.WriteLine(url + parm);
Console.WriteLine(url + HttpUtility.UrlEncode(parm), System.Text.Encoding.UTF8);
Console.WriteLine(url + HttpUtility.UrlPathEncode(parm), System.Text.Encoding.UTF8);
Console.WriteLine(HttpUtility.UrlEncode(url + parm), System.Text.Encoding.UTF8);
You'll get the following output:
https://www.google.co.uk/search?q=Bora Bora, French Polynesia
https://www.google.co.uk/search?q=Bora+Bora%2c+French+Polynesia
https://www.google.co.uk/search?q=Bora%20Bora,%20French%20Polynesia
https%3a%2f%2fwww.google.co.uk%2fsearch%3fq%3dBora+Bora%2c+French+Polynesia
By pasting these into a browser and trying to use them, you'll soon see what is a valid URL and what is not.
(N.B. when pasting into modern browsers, many of them will URL-encode automatically for you, if your parameter is not valid - so you'll find the first output works too, but if you tried to call it via some C# code for instance, it would fail.)
Working demo: https://dotnetfiddle.net/gqFsdK
You can of course alter the values you input to anything you like. They can be hard-coded strings, or the result of some other code which returns a string (e.g. fetching from the session, or a database, or a UI element, or anywhere else).
N.B. It's also useful to clarify that a valid URL is simply a string in the correct format of a URL. It is not the same as a URL which actually exists. A URL may be valid but not exist if you try to use it, or may be valid and really exist.

WebClient.DownloadFile - Invalid URI: The hostname could not be parsed

So I tried this in several different formats and produced different results. I will include all relevant information below.
My company uses a web-based application to schedule the generation of reports. The service emails a URL that can be clicked on and will immediately begin the "Open Save As Cancel" dialogue box. I am trying to automate the process of downloading these reports with a C# script as part of a Visual Studio project (the end goal is to import these reports in SQL Server).
I am encountering terrible difficulty initiating the download of this file using WebClient Here is the closest I have gotten with any of the methods I have tried:
*NOTE: I removed all identifying information from the URL, but left all special characters and the basic architecture intact. Hopefully this will be a happy medium between protecting confidential info and giving you enough to understand my dilemma. The URL does work when manually copied and pasted into the address bar of internet explorer.
Error Message:
"Invalid URI: The hostname could not be parsed."
public void Main()
{
using (var wc = new System.Net.WebClient())
{
wc.DownloadFile(
new Uri(#"http:\\webapp.locality.company.com\scripts\rds\cgigetf.exe?job_id=3058352&file_id=1&format=TAB\report.tab"),
#"\\server\directory\folder1\folder2\folder3\...\...\...\rawfile.tab");
}
}
Note also that I have tried to set:
string sourceUri = #"http:\\webapp.locality.company.com\scripts\rds\cgigetf.exe?job_id=3058352&file_id=1&format=TAB\report.tab\abc123_3058352.tab";
Uri uriPath;
Uri.TryCreate(sourceUri, UriKind.Absolute, out uriPath);
But uriPath remains null - TryCreate fails.
I have attempted doing a webrequest / webresponse / WebStream, but it still cannot find the host.
I have tried including the download URL (as in my first code example) and the download URL + the file name (as in my second code example). I do not need the file name in the URL to initiate the download if I do it manually. I have also tried replacing the "report.tab" portion of the URL with the file name, but to no avail.
Help is greatly appreciated as I have simply run out of thoughts on this one. The only idea I have left is that perhaps one of the special characters in my URL is getting in the way, but I don't know which one that would be or how to handle it properly.
Thanks in advance!
My first thought would be that your URI backslashes are being interpreted as escape characters, leading to a nonsense result after evaluation. I would try a quick test where each backslash is escaped as itself (i.e. "\" instead of "\" in each instance). I'm also a little puzzled as to why your URI is not using forward slashes...?
// Create an absolute Uri from a string.
Uri absoluteUri = new Uri("http://www.contoso.com/");
Ref: Uri Constructor on MSDN

Unable to encode Url properly using HttpUtility.UrlEncode() method

I have created an application in which I need to encode/decode special characters from the url which is entered by user.
For example : if user enters http://en.wikipedia.org/wiki/Å then it's respective Url should be http://en.wikipedia.org/wiki/%C3%85.
I made console application with following code.
string value = "http://en.wikipedia.org/wiki/Å";
Console.WriteLine(System.Web.HttpUtility.UrlEncode(value));
It decodes the character Å successfully and also encodes :// characters. After running the code I am getting output like : http%3a%2f%2fen.wikipedia.org%2fwiki%2f%c3%85 but I want http://en.wikipedia.org/wiki/%C3%85
What should I do?
Uri.EscapeUriString(value) returns the value that you expect. But it might have other problems.
There are a few URL encoding functions in the .NET Framework which all behave differently and are useful in different situations:
Uri.EscapeUriString
Uri.EscapeDataString
WebUtility.UrlEncode (only in .NET 4.5)
HttpUtility.UrlEncode (in System.Web.dll, so intended for web applications, not desktop)
You could use regular expressions to select hostname and then urlencode only other part of string:
var inputString = "http://en.wikipedia.org/wiki/Å";
var encodedString;
var regex = new Regex("^(?<host>https?://.+?/)(?<path>.*)$");
var match = regex.Match(inputString);
if (match.Success)
encodedString = match.Groups["host"] + System.Web.HttpUtility.UrlEncode(match.Groups["path"].ToString());
Console.WriteLine(encodedString);

Using System.Uri to remove redundant slash

I have a condition in my program where I have to combine a server (e.g. http://server1.my.corp/) that may or may not have an ending slash with a relative path (e.g. /Apps/TestOne/). According to the docs, Uri should...
Canonicalizes the path for hierarchical URIs by compacting sequences such as /./, /../, //,...
So when I do something like var url = new Uri(server + relativePath), I'd expect it to take what would otherwise be http://server1.my.corp//Apps/TestOne/ and remove the double slash (i.e // -> /), but ToString, AbsolutePath and various options still show the redundant/duplicate slash. Am I not using Uri right?
Take a look at the constructors for the Uri class. You need to specify a base Uri and a relative path to get the canonized behavior. Try something like this:
var server = new Uri("http://server1.my.corp/");
var resource = new Uri(server, "/Apps/TestOne/");

How to match URL in c#?

I have found many examples of how to match particular types of URL-s in PHP and other languages. I need to match any URL from my C# application. How to do this? When I talk about URL I talk about links to any sites or to files on sites and subdirectiories and so on.
I have a text like this: "Go to my awsome website http:\www.google.pl\something\blah\?lang=5" or else and I need to get this link from this message. Links can start only with www. too.
If you need to test your regex to find URLs you can try this resource
http://gskinner.com/RegExr/
It will test your regex while you're writing it.
In C# you can use regex for example as below:
Regex r = new Regex(#"(?<Protocol>\w+):\/\/(?<Domain>[\w#][\w.:#]+)\/?[\w\.?=%&=\-#/$,]*");
// Match the regular expression pattern against a text string.
Match m = r.Match(text);
while (m.Success)
{
//do things with your matching text
m = m.NextMatch();
}
Microsoft has a nice page of some regular expressions...this is what they say (works pretty good too)
^(ht|f)tp(s?)\:\/\/[0-9a-zA-Z]([-.\w]*[0-9a-zA-Z])*(:(0-9)*)*(\/?)([a-zA-Z0-9\-\.\?\,\'\/\\\+&%\$#_]*)?$
http://msdn.microsoft.com/en-us/library/ff650303.aspx#paght000001_commonregularexpressions
I am not sure exactly what you are asking, but a good start would be the Uri class, which will parse the url for you.
Here's one defined for URL's.
^http(s?)\:\/\/[0-9a-zA-Z]([-.\w]*[0-9a-zA-Z])*(:(0-9)*)*(\/?)([a-zA-Z0-9\-\.\?\,\'\/\\\+&%\$#_]*)?$
http://msdn.microsoft.com/en-us/library/ms998267.aspx
Regex regx = new Regex("http(s)?://([\\w+?\\.\\w+])+([a-zA-Z0-9\\~\\!\\#\\#\\$\\%\\^\\&\\*\\(\\)_\\-\\=\\+\\\\\\/\\?\\.\\:\\;\\'\\,]*)?", RegexOptions.IgnoreCase);
This will return a match collection of all matches found within "yourStringThatHasUrlsInIt":
var pattern = #"((ht|f)tp(s?)\:\/\/|~/|/)?([w]{2}([\w\-]+\.)+([\w]{2,5}))(:[\d]{1,5})?";
var regex = new Regex(pattern);
var matches = regex.Matches(yourStringThatHasUrlsInIt);
The return will be a "MatchCollection" which you can read more about here:
http://msdn.microsoft.com/en-us/library/system.text.regularexpressions.matchcollection.aspx
//This code return (protocol://)host:port from URL
//Commented URL's with different protocols. Just uncomment to test.
//string url = "http://www.contoso.com:8080/letters/readme.html";
//string url = "ftp://www.contoso.com:8080/letters/readme.html";
//string url = "l2tp://1.5.8.6:8080/letters/readme.html";
string url = "l2tp://1.5.8.6:8080/letters/readme.html";
string host = "";//empty string with host from url
//protocol, (ip/domain), port
host = Regex.Match(url, #"^(?<proto>\w+)://+?(?<host>[A-Za-z0-9\-\.]+)+?(?<port>:\d+)?/", RegexOptions.None, TimeSpan.FromMilliseconds(150)).Result("${proto}://${host}${port}");
//(ip/domain):port without protocol. If HTTPS board loading images from HTTP host.
//host = Regex.Match(url, #"^(?<proto>\w+)://+?(?<host>[A-Za-z0-9\-\.]+)+?(?<port>:\d+)?/", RegexOptions.None, TimeSpan.FromMilliseconds(150)).Result("${host}${port}");
Console.WriteLine("url: "+url+"\nhost: "+host); //display host
see https://rextester.com/PVSO54371
u can also use https://github.com/d-kistanov-parc/DotNetUrlPatternMatching
The library allows you to match a URL to a pattern.
How it works:
an url pattern is split into parts
each non-empty part is matched with a similar one from the URL.
You can specify a Wildcard * or ~
Where * is any character set within the group (scheme, host, port, path, parameter, fragment)
Where ~ any character set within a group segment (host, path)
Only supply parts of the URL you care about. Parts which are left out will match anything. E.g. if you don’t care about the host, then leave it out.

Categories

Resources