C# string.Split() Matching Both Slashes?

C# string.Split() Matching Both Slashes? - c#

I've got a .NET 3.5 web application written in C# doing some URL rewriting that includes a file path, and I'm running into a problem. When I call string.Split('/') it matches both '/' and '\' characters. Is that... supposed to happen? I assumed that it would notice that the ASCII values were different and skip it, but it appears that I'm wrong.
// url = 'someserver.com/user/token/files\subdir\file.jpg
string[] buffer = url.Split('/');
The above code gives a string[] with 6 elements in it... which seems counter intuitive. Is there a way to force Split() to match ONLY the forward slash? Right now I'm lucky, since the offending slashes are at the end of the URL, I can just concatenate the rest of the elements in the string[], but it's a lot of work for what we're doing, and not a great solution to the underlying problem.
Anyone run into this before? Have a simple answer? I appreciate it!
More Code:
url = HttpContext.Current.Request.Path.Replace("http://", "");
string[] buffer = url.Split('/');
Turns out, Request.Path and Request.RawUrl are both changing my slashes, which is ridiculous. So, time to research that a bit more and figure out how to get the URL from a function that doesn't break my formatting. Thanks everyone for playing along with my insanity, sorry it was a misleading question!

When I try the following:
string url = #"someserver.com/user/token/files\subdir\file.jpg";
string[] buffer = url.Split('/');
Console.WriteLine(buffer.Length);
... I get 4. Post more code.

Something else is happening, paste more code.
string str = "a\\b/c\\d";
string[] ts = str.Split('/');
foreach (string t in ts)
{
Console.WriteLine(t);
}
outputs
a\b
c\d
just like it should.
My guess is that you are converting / into \ somewhere.

You could use regex to convert all \ slashes to a temp char, split on /, then regex the temp chars back to \. Pain in the butt, but one option.

I suspect (without seeing your whole application) that the problem lies in the semantics of path delimiters in URLs. It sounds like you are trying to attach a semantic value to backslashes within your application that is contrary to the way HTTP protocols define and use backslashes.
This is just a guess, of course.
The best way to solve this problem might be modifying the application to encode the path in some other way (such as "%5C" for backslashes, maybe?).

those two functions are probably converting \ to / because \ is not a valid character in a URL (see Which characters make a URL invalid?). The browser (NOT C#, as you are inferring) is assuming that when you are using that invalid character, you mean /, so it is "fixing" it for you. If you want \ in your URL, you need to encode it first.
The browsers themselves are actually the ones that make that change in the request, even if it is behind the scenes. To verify this, just turn on fiddler and look at the URLs that are actually getting sent when you go to a URL like this. IE and Chrome actually change the \ to / in the URL field on the browser itself, FireFox doesn't, but the request goes through that way anyways.

Update:
How about this:
Regex.Split(url, "/");

Related

Regex to match a fragment of the URL

I have URL's like:
http://127.0.0.1:81/controller/verbOne/NXw4fDF8MXwxfDQ1?source=dddd
or
http://127.0.0.1:81/controller/verbTwo/NXw4fDF8MXwxfDQ1
I'd like to extract that part in bold. The host and port can change to anything (when I publish it to a live server it will change). The controller never changes. And for the verb part, there are 2 possibilities.
Can anyone help me with the regex?
Thanks

Instead of using a regex you could use the built in functionality of Uri
Uri uri = new Uri("http://127.0.0.1:81/controller/verbOne/NXw4fDF8MXwxfDQ1?source=dddd");
var lastSegment = uri.Segments.Last();

You're looking for the Uri and Path classes:
Path.GetFileName(new Uri(str).AbsolutePath)

Why do you look for a regex? you can look for the two string elements "verbOne/" or "verbTwo/" and make a substring from the end. And then you can look for the rest and substrakt the part with the '?'
I think this is faster then a regex.
krikit

Though everyone else here is correct that regex is not the best solution, because it could fail when parsers already exist that should never fail due to their specialization, I believe you could use the following regex:
(?<=http://127\.0\.0\.1:81/controller/verb(One|Two)/)[a-zA-Z0-9]*

How to URL encode periods?

I need to URL encode some periods since I have to pass some document path along and it is like this
http://example.com/test.aspx?document=test.docx
So test.docx is causing me an error of an illegal character. So I need to change it to
. --> %2E
I tried to use Server.UrlEncode
string b = Server.UrlEncode("http://example.com/test.aspx?document=test.docx");
but I get
"http%3a%2f%2fexample.com%2ftest.aspx%3fdocument%3dtest.docx"
So do I have to use like a string replace and do it manually and replace all periods with that code?

This is a really old question, but I ran into this searching for a similar problem. I stuck a "/" onto the end of my url's with periods in them and it got around the problem.

The period there isn't he problem (given that %2E doesn't solve the problem). A period is a perfectly valid URL character whatever the problem is it's not the period. Check the stack trace of the error being throw or post the complete error details.
And you shouldn't be URL encoding the entire path. Only the query string parameter value.
string b = "http://example.com/test.aspx?document=" + Server.UrlEncode("test.docx");
Are you still getting the error if you try it that way?
I wouldn't touch SharePoint with a ten foot pole. However, escaping the period wouldn't necessarily stop SharePoint from doing it's shenanigans. But I guess you should at least try it.
Server.UrlEncode("test.docx").Replace(".", "%2E");

Best way to escape javascript string? (json?)

Using C# .net I am parsing some data with some partial html/javascript inside it (i dont know who made that decision) and i need to pull a link. The link looks like this
http:\/\/fc0.site.net\/fs50\/i\/2009\/name.jpg
It came from this which i assume is javascript and looks like json
"name":{"id":"589","src":"http:\/\/fc0.site.net\/fs50\/i\/2009\/name.jpg"}
But anyways how should i escape the first link so i get
http://fc0.site.net/fs50/i/2009/name.jpg
In this case i could just replace '\' with '' since links dont contain \ nor " so i could do that but i am a fan of knowing the right solution and doing things properly. So how might i escape this. After looking at that link for a minute i thought is that valid? does java script or json escape / with \? It doesnt seem like it should?

In your case:
"name":{"id":"589","src":"http://fc0.site.net/fs50/i/2009/name.jpg"}
"/" is a valid escape sequence. However, it is not required that / be escaped. You may escape it if you need to. The reason JSON explicitly allows escaping of slash is because HTML does not allow a string in a to contain "...
Update:
Check out this post

Odd, it doesn’t look like any JavaScript/JSON escaping you’d expect. You can have forward slashes in JavaScript strings just fine.

Why dont you try a regex on the escaped slashes to replace them in the C# code...
String url = #"http:\/\/fc0.site.net\/fs50\/i\/2009\/name.jpg";
String pattern = #"\\/";
String cleanUrl = Regex.Replace(url, pattern, "/");
Hope it helps!

Actually you want to unescape the string. Answered in this question.

I am implementing URL rewriting in ASP.net and my URLs are causing me a world of problems.
The URL is generated from a database of departments & categories. I want employees to be able to add items to the database with whatever special characters are appropriate without it breaking the site.
I am encoding the data before I construct the URLs.
There are several problems...
IIS decodes the URL before it reaches .net making it impossible to properly parse anything with a "/" in it.
ASP.net gets confused by the url making "~" useless within certain pages
I migrated from the built in test server to my local IIS server (XP machine) and any URL containing an encoded & (%26) gives me a "Bad Request" error.
UrlEncode leaves some breaking characters untouched such as '.'
I did have two other related posts on this subject, at the time I only saw the small problems not the big problem upstream. I've found some registry tricks to solve the "Bad Request" issue but I'm going to be deploying to a shared hosting environment making that useless. I also know that this is a fix for some security issue so I don't want to necessarily bypass it without knowing what can of worms I'm opening.
Rather than trying to force .net to pass me the raw url, or override IIS settings i'd like to make truly safe URLs in the first place.
I'll note i've tried AntiXss.URLEncode, HttpUtility.URLEncode, URI.EscapeDataString. I've even tried stupid things like double URLEncodng. Is there a utility that does what I need, or do i really need to roll my own. I'm even considering doing something Hacky like replacing the % with an unusual string of characters. The end result should be at least readable which was the point of using URL rewriting in the first place.
Sorry for the long post- I just wanted to make sure that I've included all the necessary details. I can't seem to find any relevant information on this, and it seems like it would be a common problem - so maybe I'm missing something big. Thanks for your help, and patience with the long explanation!
Edit for clarity:
When I say the urls are being built from a database what I mean is that the directory structure is contstructed from the departments and categories in my database.
Some Example URLS -
Mystore/Refrigeration/Bar+Fridge.aspx
Mystore/Cooking+Equipment.aspx
Mystore/Kitchen/Cutting+Boards.asxpx
The problems come in when I use a department like "Beverage & Bar" or "Pastry/Decorating" to construct my URL. Despite being encoded first these cause the aforementioned issues.
My handlers are already implemented and working fine except for the special character encoding issues.

You should consider having a table off of your category/department table which has a unique URL for each category. Then you can use a special routine to generate the URLs. This can be a SQL scalar function, or a CLR function, but one of the things it would do is normalize the URL for the web. You can convert "Beverage & Bar" to "Beverage-And-Bar" and "Pastry / Decorating" to "Pastry-Decorating". Mainly, the routine needs to replace all invalid HTTP URL characters with something else. An example is this:
public static class URL
{
static readonly Regex feet = new Regex(#"([0-9]\s?)'([^'])", RegexOptions.Compiled);
static readonly Regex inch1 = new Regex(#"([0-9]\s?)''", RegexOptions.Compiled);
static readonly Regex inch2 = new Regex(#"([0-9]\s?)""", RegexOptions.Compiled);
static readonly Regex num = new Regex(#"#([0-9]+)", RegexOptions.Compiled);
static readonly Regex dollar = new Regex(#"[$]([0-9]+)", RegexOptions.Compiled);
static readonly Regex percent = new Regex(#"([0-9]+)%", RegexOptions.Compiled);
static readonly Regex sep = new Regex(#"[\s_/\\+:.]", RegexOptions.Compiled);
static readonly Regex empty = new Regex(#"[^-A-Za-z0-9]", RegexOptions.Compiled);
static readonly Regex extra = new Regex(#"[-]+", RegexOptions.Compiled);
public static string PrepareURL(string str)
{
str = str.Trim().ToLower();
str = str.Replace("&", "and");
str = feet.Replace(str, "$1-ft-");
str = inch1.Replace(str, "$1-in-");
str = inch2.Replace(str, "$1-in-");
str = num.Replace(str, "num-$1");
str = dollar.Replace(str, "$1-dollar-");
str = percent.Replace(str, "$1-percent-");
str = sep.Replace(str, "-");
str = empty.Replace(str, string.Empty);
str = extra.Replace(str, "-");
str = str.Trim('-');
return str;
}
}
You could make this a SQL enhance function, or run URL generation as a separate process. Then to implement mapping, you would map the entire URL directly to a category ID. This approach is better in the long run for several reasons. First, you are not always generating URLs, you do this once and they stay static, you don't have to worry about your procedure changing, and then GoogleBot not being able to find old URLs. Also, if you get a collision, you may notice a potential duplicate category name, because a collision would only be different by special characters. Finally, you can always view your URLs from the database, without having to run the mapping function.

I have a url rewrite i implement in the global.asax file in the begin authenticated request as I have some security. This is where I take the raw url and then do the db look up. this then rewrites the path to the aspx page and all the parameters are passed through the query string. No encoding is necessary.
However if you are using the url to actually change data then i can see that you will have huge problems as you are effectively using the http GET to change database. It is usually concidered a bad idead, and not something i do.
I only use a post request to do any databse manipulation. This keeps the url clean as all the data is in the page form.
The only issue i had was to set the correct url to the page.form.action which in most cases is the raw url.
If its the category names that are causing the issue then perhaps you should restrict the names to alpha numeric characters only and swap spaces for "-". IIS will throw a wobbly with periods "." as it looks for file names.
P.S.
IIS does not understand the tilde "~", this is something that the compiler understands. so if you use it in an anchor tag it will not work as expected and you should use the application root instead of the tilde.
Edit:
OK, it looks like an issue with IIS having issues with certain characters such as . / and &. Even if you do urlencode these IIS will still try to implement its own meanings.
As such consider removing them so:
Beverage & bar becomes BeverageBar
Pastry / decorating becomes PastryDecorating.
This will keep you urls clean, but does mean an extra column in the database so you can cheack the url against this shortened category name.

I'm having the exact same problem. Thanks for writing it up so nicely. It actually helped me to understand the problem better.
I had some other considerations however. One of the goals I have is to support the potential for any characters to be in the url which is based on the title of an article. Additionally I want to ensure uniqueness in the encoding and a two way encode / decode process.
So I did some manual encoding to solve the problem. This won't completely eliminate percent encoding, but will greatly reduce it and keep users from generating an inaccessible url. My process starts with using the Server.URLEncode function. But this doesn't eliminate the problems in the url. Because IIS is decoding the url and then passing it to the application, certain characters will break it with a dangerous request exception. These characters include +, &, /, !, *, ., ( and ). So on those characters plus other characters I would like to make more readable I do a double encoding for a more usable url. Encoding is also hard because of the limited number of characters that are allowed in an url. So prior to encoding I made all letters capital and then did the encoding with lower case. This keeps it from being totally decodable, but I can easily do a match in the database or in code by making the value I wish to match be upper case.
Well, here is my code. Feedback would be appreciated. Oh ya, this is in VB, but things should transfer over to C# easy enough.
Dim strReturn As String = Trim(strStringToEncode)
strReturn = Server.UrlEncode(strReturn)
strReturn = strReturn.Replace("-", "dash").Replace("+", "-")
strReturn = strReturn.Replace("%26", "and").
Replace("%2f", "or").
Replace("!", "excl").
Replace("*", "star").
Replace("%27", "apos").
Replace("(", "lprn").
Replace(")", "rprn").
Replace("%3b", "semi").
Replace("%3a", "coln").
Replace("%40", "at").
Replace("%3d", "eq").
Replace("%2b", "plus").
Replace("%24", "dols").
Replace("%25", "pct").
Replace("%2c", "coma").
Replace("%3f", "query").
Replace("%23", "hash").
Replace("%5b", "lbrk").
Replace("%5d", "rbrk").
Replace(".", "dot").
Replace("%3e", "gt").
Replace("%3c", "lt")
Return strReturn

I guess you are looking for HttpUtility.UrlEncode and HttpUtility.HtmlDecode
string url = "http://www.google.com/search?q=" + HttpUtility.UrlEncode("Example");

How can I deal with ampersands in a mail client's mailto links?

I have an ASP.NET/C# application, part of which converts WWW links to mailto links in an HTML email.
For example, if I have a link such as:
www.site.com
It gets rewritten as:
mailto:my#address.com?Subject=www.site.com
This works extremely well, until I run into URLs with ampersands, which then causes the subject to be truncated.
For example the link:
www.site.com?val1=a&val2=b
Shows up as:
mailto:my#address.com?Subject=www.site.com?val1=a&val2=b
Which is exactly what I want, but then when clicked, it creates a message with:
subject=www.site.com?val1=a
Which has dropped the &val2, which makes sense as & is the delimiter in a mailto command.
So, I have tried various other was to work around this with no success.
I have tried implicitly quoting the subject='' part and that did nothing.
I (in C#) replace '&' with & which Live Mail and Thunderbird just turn back into:
www.site.com?val1=a&val2=b
I replaced '&' with '%26' which resulted in:
mailto:my#address.com?Subject=www.site.com?val1=a%26amp;val2=b
In the mail with the subject:
www.site.com?val1=a&val2=b
EDIT:
In response to how URL is being built, this is much trimmed down but is the gist of it. In place of the att.Value.Replace I have tried System.Web.HtmlUtility.URLEncode calls which also results in a failure
HtmlAgilityPack.HtmlNodeCollection nodes =doc.DocumentNode.SelectNodes("//a[#href]");
foreach (HtmlAgilityPack.HtmlNode link in nodes)
{
HtmlAgilityPack.HtmlAttribute att = link.Attributes["href"];
att.Value = att.Value.Replace("&", "%26");
}

Try mailto:my#address.com?Subject=www.site.com?val1=a%26val2=b
& is an HTML escape code, whereas %26 is a URL escape code. Since it's a URL, that's all you need.
EDIT: I figured that's how you were building your URL. Don't build URLs that way! You need to get the %26 in there before you let anything else parse or escape it. If you really must do it this way (which you really should try to avoid), then you should search for "&" instead of just "&" because the string has already been HTML escaped at this point.
So, ideally, you build your URL properly before it's HTML escaped. If you can't do it properly, at least search for the right string instead of the wrong one. "&" is the wrong one.

You cant put any character as subject. You could try using System.Web.HttpUtility.URLEncode function on the subject´s value...

Using the URL escape code %26 is the right way.
Sadly this is still not working on the Android OS because of bug 8023

What I ended up doing for my case was eliminating the &.
www.site.com/mytest.php?val1=a=b=c. Where the 2nd and 3rd = would be equivalent to www.site.com?val1=a&val2=b&val3=c
In mytest.php I explode on ? and then explode again on =.
A total hack I know but it does work for me.

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.