I have created an application in which I need to encode/decode special characters from the url which is entered by user.
For example : if user enters http://en.wikipedia.org/wiki/Å then it's respective Url should be http://en.wikipedia.org/wiki/%C3%85.
I made console application with following code.
string value = "http://en.wikipedia.org/wiki/Å";
Console.WriteLine(System.Web.HttpUtility.UrlEncode(value));
It decodes the character Å successfully and also encodes :// characters. After running the code I am getting output like : http%3a%2f%2fen.wikipedia.org%2fwiki%2f%c3%85 but I want http://en.wikipedia.org/wiki/%C3%85
What should I do?
Uri.EscapeUriString(value) returns the value that you expect. But it might have other problems.
There are a few URL encoding functions in the .NET Framework which all behave differently and are useful in different situations:
Uri.EscapeUriString
Uri.EscapeDataString
WebUtility.UrlEncode (only in .NET 4.5)
HttpUtility.UrlEncode (in System.Web.dll, so intended for web applications, not desktop)
You could use regular expressions to select hostname and then urlencode only other part of string:
var inputString = "http://en.wikipedia.org/wiki/Å";
var encodedString;
var regex = new Regex("^(?<host>https?://.+?/)(?<path>.*)$");
var match = regex.Match(inputString);
if (match.Success)
encodedString = match.Groups["host"] + System.Web.HttpUtility.UrlEncode(match.Groups["path"].ToString());
Console.WriteLine(encodedString);
Related
How do you properly encode a path that includes a hash (#) in it? Note the hash is not the fragment (bookmark?) indicator but part of the path name.
For example, if there is a path like this:
http://www.contoso.com/code/c#/somecode.cs
It causes problems when you for example try do this:
Uri myUri = new Uri("http://www.contoso.com/code/c#/somecode.cs");
It would seem that it interprets the hash as the fragment indicator.
It feels wrong to manually replace # with %23. Are there other characters that should be replaced?
There are some escaping methods in Uri and HttpUtility but none seem to do the trick.
There are a few characters you are not supposed to use. You can try to work your way through this very dry documentation, or refer to this handy URL summary on Stack Overflow.
If you check out this very website, you'll see that their C# questions are encoded %23.
Stack Overflow C# Questions
You can do this using either (for ASP.NET):
string.Format("http://www.contoso.com/code/{0}/somecode.cs",
Server.UrlEncode("c#")
);
Or for class libraries / desktop:
string.Format("http://www.contoso.com/code/{0}/somecode.cs",
HttpUtility.UrlEncode("c#")
);
Did some more digging friends and found a duplicate question for Java:
HTTP URL Address Encoding in Java
However, the .Net Uri class does not offer the constructor we need, but the UriBuilder does.
So, in order to construct a proper URI where the path contains illegal characters, do this:
// Build Uri by explicitly specifying the constituent parts. This way, the hash is not confused with fragment identifier
UriBuilder uriBuilder = new UriBuilder("http", "www.contoso.com", 80, "/code/c#/somecode.cs");
Debug.WriteLine(uriBuilder.Uri);
// This outputs: http://www.contoso.com/code/c%23/somecode.cs
Notice how it does not unnecessarily escape parts of the URI that does not need escaping (like the :// part) which is the case with HttpUtility.UrlEncode. It would seem that the purpose of this class is actually to encode the querystring/fragment part of the URL - not the scheme or hostname.
Use UrlEncode: System.Web.HttpUtility.UrlEncode(string)
class Program
{
static void Main(string[] args)
{
string url = "http://www.contoso.com/code/c#/somecode.cs";
string enc = HttpUtility.UrlEncode(url);
Console.WriteLine("Original: {0} ... Encoded {1}", url, enc);
Console.ReadLine();
}
}
I have the following line of aspx link that I would like to encode:
Response.Redirect("countriesAttractions.aspx?=");
I have tried the following method:
Response.Redirect(Encoder.UrlPathEncode("countriesAttractions.aspx?="));
This is another method that I tried:
var encoded = Uri.EscapeUriString("countriesAttractions.aspx?=");
Response.Redirect(encoded);
Both redirects to the page without the URL being encoded:
http://localhost:52595/countriesAttractions?=
I tried this third method:
Response.Redirect(Server.UrlEncode("countriesAttractions.aspx?="));
This time the url itself gets encoded:
http://localhost:52595/countriesAttractions.aspx%3F%3D
However I get an error from the UI saying:
HTTP Error 404.0 Not Found
The resource you are looking for has been removed, had its name changed, or
is temporarily unavailable.
Most likely causes:
-The directory or file specified does not exist on the Web server.
-The URL contains a typographical error.
-A custom filter or module, such as URLScan, restricts access to the file.
Also, I would like to encode another kind of URL that involves parsing of session strings:
Response.Redirect("specificServices.aspx?service=" +
Session["service"].ToString().Trim() + "&price=" +
Session["price"].ToString().Trim()));
The method I tried to include the encoding method into the code above:
Response.Redirect(Server.UrlEncode("specificServices.aspx?service=" +
Session["service"].ToString().Trim() + "&price=" +
Session["price"].ToString().Trim()));
The above encoding method I used displayed the same kind of results I received with my previous Server URL encode methods. I am not sure on how I can encode url the correct way without getting errors.
As well as encoding URL with CommandArgument:
Response.Redirect("specificAttractions.aspx?attraction=" +
e.CommandArgument);
I have tried the following encoding:
Response.Redirect("specificAttractions.aspx?attraction=" +
HttpUtility.HtmlEncode(Convert.ToString(e.CommandArgument)));
But it did not work.
Is there any way that I can encode the url without receiving this kind of error?
I would like the output to be something like my second result but I want to see the page itself and not the error page.
I have tried other methods I found on stackoverflow such as self-coded methods but those did not work either.
I am using AntiXSS class library in this case for the methods I tried, so it would be great if I can get solutions using AntiXSS library.
I need to encode URL as part of my school project so it would be great if I can get solutions. Thank you.
You can use the UrlEncode or UrlPathEncode methods from the HttpUtility class to achieve what you need. See documentation at https://msdn.microsoft.com/en-us/library/system.web.httputility.urlencode(v=vs.110).aspx
It's important to understand however, that you should not need to encode the whole URL string. It's only the parameter values - which may contain arbitrary data and characters which aren't valid in a URL - that you need to encode.
To explain this concept, run the following in a simple .NET console application:
string url = "https://www.google.co.uk/search?q=";
//string url = "http://localhost:52595/specificAttractions.aspx?country=";
string parm = "Bora Bora, French Polynesia";
Console.WriteLine(url + parm);
Console.WriteLine(url + HttpUtility.UrlEncode(parm), System.Text.Encoding.UTF8);
Console.WriteLine(url + HttpUtility.UrlPathEncode(parm), System.Text.Encoding.UTF8);
Console.WriteLine(HttpUtility.UrlEncode(url + parm), System.Text.Encoding.UTF8);
You'll get the following output:
https://www.google.co.uk/search?q=Bora Bora, French Polynesia
https://www.google.co.uk/search?q=Bora+Bora%2c+French+Polynesia
https://www.google.co.uk/search?q=Bora%20Bora,%20French%20Polynesia
https%3a%2f%2fwww.google.co.uk%2fsearch%3fq%3dBora+Bora%2c+French+Polynesia
By pasting these into a browser and trying to use them, you'll soon see what is a valid URL and what is not.
(N.B. when pasting into modern browsers, many of them will URL-encode automatically for you, if your parameter is not valid - so you'll find the first output works too, but if you tried to call it via some C# code for instance, it would fail.)
Working demo: https://dotnetfiddle.net/gqFsdK
You can of course alter the values you input to anything you like. They can be hard-coded strings, or the result of some other code which returns a string (e.g. fetching from the session, or a database, or a UI element, or anywhere else).
N.B. It's also useful to clarify that a valid URL is simply a string in the correct format of a URL. It is not the same as a URL which actually exists. A URL may be valid but not exist if you try to use it, or may be valid and really exist.
I'm using the Uri class to request datas using a php script. In my case I need to use URL containing special char like: é or '. Here is my piece of code:
string NomArret = "Université";
uri = new Uri("http://localhost/getdata.php?aarret=" + NomArret);
But this return 0 results. I debugged and I notices that uri encode this URL like:
http://84.75.112.69/getdata.php?aarret=Universit%C3%A9
So he converts the char é to %C3%A9. In this website (www.degraeve.com/reference/urlencoding.php) I've seen that the é char does be convert to %E9.
When I try manually using this encoding:
http://84.75.112.69/getdata.php?aarret=Universit%E9
It works ! So how can I adapt my code to be able to convert correctly the special character ?
Can you use Uri.EscapeDataString ? (I'm not a C# dev so i can't verify it)
I have a problem parsing base64 encoded blob from tool output.
I'm using this regex in c#: #"(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)"
Everything worked fine until the blob I receive looks like following (it does not have even one '='. according to wiki base64 string can have 0-2 "=" signs in the end/)
I cannot work with string before and after the blob because it can be muli-language
Provisioning the computer account... Successfully provisioned
[user-1009-1-] in the domain [testauto.sof]. Provisioning data was
saved successfully to
[C:\Users\user1\AppData\Local\Temp\user-1009-1-.blob]. Provisioning
string (2624l bytes):
ARAIAMzMzMzIAwAAAAAAAAAAAgABAAAAAQAAAAQAAgABAAAAAQAAAKADAAAIAAIAoAMAAAEQCADMzMzMkAMAAAAAAAAAAAIABAACAAgAAgAMAAIAFAAWABAAAgAcAB4AFAACABwAHgAYAAIA5pXFb5WfaEeEDMNZ4eUJVxwAAgAgAAIAJAACAAEAAADmlcVvlZ9oR4QMw1nh5QlXKAACACwAAgD8MQDgMAACADQAAgACAAAADwAAAAAAAAAPAAAAbQBpAHIAYQBnAGUAYQB1AHQAbwAuAHMAbwBmAAAAAAAQAAAAAAAAABAAAAB4AGkAcwBoAGUAbgBnAC0AMQAwADAAOQAtADEALQAAAHkAAAAAAAAAeQAAAF0APwAjAHkAWgBYADoAdwBgAGUANgB2AG8AJQBuAHkAIwBUAE4ALAAxAHUAOQBkAD4AdwA5AG8AaABgAHEAIwBhADYALABwAF0ALwBoAEgAIABqAHMATwA5AGwATgB5ADEAVQAhAE4ASwAiAEUAPQA8AGIAZQBAAC8ARgA1AGIAQAAiAG8AUgA6AGIAawAzAF4AVwBrACYATQA5AE8AdQBvAHYAZQA0AGMAKwBDAHMAUABrAGsAdQBAAF4AYAAgAFsAJQBgAGUAJwAuAG8AOwBBACgATQBuACIARgAlAG0ASwA1ADMAbQBJAHcAXQBkAAAAAAALAAAAAAAAAAoAAABNAEkAUgBBAEcARQBBAFUAVABPAA8AAAAAAAAADgAAAG0AaQByAGEAZwBlAGEAdQB0AG8ALgBzAG8AZgAPAAAAAAAAAA4AAABtAGkAcgBhAGcAZQBhAHUAdABvAC4AcwBvAGYABAAAAAEEAAAAAAAFFQAAAN0GKLrAxnxUyKGAThoAAAAAAAAAGgAAAFwAXABTAE8ARgAtAFEAQQAwADIALgBtAGkAcgBhAGcAZQBhAHUAdABvAC4AcwBvAGYAAAANAAAAAAAAAA0AAABcAFwAMQAwAC4AMgAzAC4ANwAyAC4AMgAAAAAADwAAAAAAAAAPAAAAbQBpAHIAYQBnAGUAYQB1AHQAbwAuAHMAbwBmAAAAAAAPAAAAAAAAAA8AAABtAGkAcgBhAGcAZQBhAHUAdABvAC4AcwBvAGYAAAAAABgAAAAAAAAAGAAAAEQAZQBmAGEAdQBsAHQALQBGAGkAcgBzAHQALQBTAGkAdABlAC0ATgBhAG0AZQAAABgAAAAAAAAAGAAAAEQAZQBmAGEAdQBsAHQALQBGAGkAcgBzAHQALQBTAGkAdABlAC0ATgBhAG0AZQAAAAAAAAAAAAAA
Computer account provisioning completed successfully. The operation
completed successfully.
Anyone can help me to fix the regex?
Here is regex calculator that I using:
http://regex101.com/r/wP3kP9/1
The following should work successfully:
^(?!$)(?:[A-Za-z0-9+\/]{4})*(?:[A-Za-z0-9+\/]{2}==|[A-Za-z0-9+\/]{3}=)?$
regex101
In my understanding, if no = is present, it's because the string length is a multiple of 4.
I also anchored it with ^...$and used the m option so only your base64 string matches. I added (?!$) so empty lines don't match (couldn't simply change the * to + because you may want to match short strings like aa==).
I have found many examples of how to match particular types of URL-s in PHP and other languages. I need to match any URL from my C# application. How to do this? When I talk about URL I talk about links to any sites or to files on sites and subdirectiories and so on.
I have a text like this: "Go to my awsome website http:\www.google.pl\something\blah\?lang=5" or else and I need to get this link from this message. Links can start only with www. too.
If you need to test your regex to find URLs you can try this resource
http://gskinner.com/RegExr/
It will test your regex while you're writing it.
In C# you can use regex for example as below:
Regex r = new Regex(#"(?<Protocol>\w+):\/\/(?<Domain>[\w#][\w.:#]+)\/?[\w\.?=%&=\-#/$,]*");
// Match the regular expression pattern against a text string.
Match m = r.Match(text);
while (m.Success)
{
//do things with your matching text
m = m.NextMatch();
}
Microsoft has a nice page of some regular expressions...this is what they say (works pretty good too)
^(ht|f)tp(s?)\:\/\/[0-9a-zA-Z]([-.\w]*[0-9a-zA-Z])*(:(0-9)*)*(\/?)([a-zA-Z0-9\-\.\?\,\'\/\\\+&%\$#_]*)?$
http://msdn.microsoft.com/en-us/library/ff650303.aspx#paght000001_commonregularexpressions
I am not sure exactly what you are asking, but a good start would be the Uri class, which will parse the url for you.
Here's one defined for URL's.
^http(s?)\:\/\/[0-9a-zA-Z]([-.\w]*[0-9a-zA-Z])*(:(0-9)*)*(\/?)([a-zA-Z0-9\-\.\?\,\'\/\\\+&%\$#_]*)?$
http://msdn.microsoft.com/en-us/library/ms998267.aspx
Regex regx = new Regex("http(s)?://([\\w+?\\.\\w+])+([a-zA-Z0-9\\~\\!\\#\\#\\$\\%\\^\\&\\*\\(\\)_\\-\\=\\+\\\\\\/\\?\\.\\:\\;\\'\\,]*)?", RegexOptions.IgnoreCase);
This will return a match collection of all matches found within "yourStringThatHasUrlsInIt":
var pattern = #"((ht|f)tp(s?)\:\/\/|~/|/)?([w]{2}([\w\-]+\.)+([\w]{2,5}))(:[\d]{1,5})?";
var regex = new Regex(pattern);
var matches = regex.Matches(yourStringThatHasUrlsInIt);
The return will be a "MatchCollection" which you can read more about here:
http://msdn.microsoft.com/en-us/library/system.text.regularexpressions.matchcollection.aspx
//This code return (protocol://)host:port from URL
//Commented URL's with different protocols. Just uncomment to test.
//string url = "http://www.contoso.com:8080/letters/readme.html";
//string url = "ftp://www.contoso.com:8080/letters/readme.html";
//string url = "l2tp://1.5.8.6:8080/letters/readme.html";
string url = "l2tp://1.5.8.6:8080/letters/readme.html";
string host = "";//empty string with host from url
//protocol, (ip/domain), port
host = Regex.Match(url, #"^(?<proto>\w+)://+?(?<host>[A-Za-z0-9\-\.]+)+?(?<port>:\d+)?/", RegexOptions.None, TimeSpan.FromMilliseconds(150)).Result("${proto}://${host}${port}");
//(ip/domain):port without protocol. If HTTPS board loading images from HTTP host.
//host = Regex.Match(url, #"^(?<proto>\w+)://+?(?<host>[A-Za-z0-9\-\.]+)+?(?<port>:\d+)?/", RegexOptions.None, TimeSpan.FromMilliseconds(150)).Result("${host}${port}");
Console.WriteLine("url: "+url+"\nhost: "+host); //display host
see https://rextester.com/PVSO54371
u can also use https://github.com/d-kistanov-parc/DotNetUrlPatternMatching
The library allows you to match a URL to a pattern.
How it works:
an url pattern is split into parts
each non-empty part is matched with a similar one from the URL.
You can specify a Wildcard * or ~
Where * is any character set within the group (scheme, host, port, path, parameter, fragment)
Where ~ any character set within a group segment (host, path)
Only supply parts of the URL you care about. Parts which are left out will match anything. E.g. if you don’t care about the host, then leave it out.