Making an URL out of modified database fields

I have a button that when the user clicks it, it must go to a specified URL.
But I have to create my URL out of the values coming from database and most importantly, I need to modify the values coming from database before I make a URL out of it.
Suppose the values from the database are
country- France
hotel - Hotel Movenpick
First I have to convert the values above to lowercase, then replace the spaces with '-'.
Then I have to build my URL from these modified values, as below.
http://www.travel.com/france/hotel-movenpick
I have never done this before. Please point me to some reference for doing this task. I am coding in C#.

How about:
string fixedCountry = country.ToLower(CultureInfo.InvariantCulture)
.Replace(" ", "-");
string fixedHotel = hotel.ToLower(CultureInfo.InvariantCulture)
.Replace(" ", "-");
string url = "http://www.travel.com/" + fixedCountry + "/" + fixedHotel;
Note that this won't fix up any accented characters or other symbols. It becomes more complicated if you want to do that. It will depend on how much you trust your data to not contain that sort of thing.
If you need to make this any more complicated, or need to do it anywhere else, I suggest you create a "string fixing" method which munges it appropriately, then call it for each of your fields.
EDIT: Removing accented characters is interesting. .NET makes this fairly easy, but I don't know what it will do for your "ae" situation - you may need to special-case that. Try this though, as a starting point:
static string RemoveAccents (string input)
{
string normalized = input.Normalize(NormalizationForm.FormKD);
Encoding removal = Encoding.GetEncoding
(Encoding.ASCII.CodePage,
new EncoderReplacementFallback(""),
new DecoderReplacementFallback(""));
byte[] bytes = removal.GetBytes(normalized);
return Encoding.ASCII.GetString(bytes);
}
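
As a rough illustration of the "string fixing" method suggested above, here is a minimal sketch that combines the lowercasing, the space replacement and the accent stripping in one place. The names UrlSlug, Fix and Build are hypothetical, and it filters non-ASCII characters in a loop rather than using the Encoding trick shown above. Characters with no ASCII decomposition (such as "æ" or "ß") are simply dropped, so they would still need special-casing.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class UrlSlug
{
    // Hypothetical helper: strips accents, lowercases, turns spaces into '-'.
    public static string Fix(string input)
    {
        // Decompose accented letters into base letter + combining mark.
        string normalized = input.Normalize(NormalizationForm.FormKD);
        var sb = new StringBuilder(normalized.Length);
        foreach (char c in normalized)
        {
            // Drop combining marks and anything else outside ASCII.
            if (c > 127) continue;
            sb.Append(c == ' ' ? '-' : char.ToLowerInvariant(c));
        }
        return sb.ToString();
    }

    // Builds the URL from the question; the host name is taken from the example.
    public static string Build(string country, string hotel)
    {
        return "http://www.travel.com/" + Fix(country) + "/" + Fix(hotel);
    }
}
```

Calling UrlSlug.Build("France", "Hotel Movenpick") should give the URL from the question; other punctuation is passed through untouched, so it only suits data you trust.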

Related

C# ignore part of string split

In my C# application I concatenate form data into a string format to be passed over to a format expected by a webservice.
string firstName = "Test";
string lastName = "Test";
string freeText = "this is some free text, thanks";
string submitString = firstName + "," + lastName + "," + freeText;
Later in the application I need to pick this apart when it is returned from the webservice to be used somewhere else.
string[] returnData = submitString.Split(',');
However if free text contains a comma, the returnData variable splits it as part of the string array and I would like to keep the contents of freeText as one whole string (despite containing a comma).
Is there a quick way to ignore the contents of that field in the string split (rather than stopping the customer from entering a comma)?

If the following two conditions are satisfied:
there is a fixed number of "fields" in your comma-separated string (e.g. 3) and
only the last one can contain additional commas,
Then, yes, you can use the String.Split(char[], int) overload to specify the maximum number of items to return:
var s = "Test,Test,this is some free text, thanks";
var a = s.Split(new[] {','}, 3); // return at most 3 items
Console.WriteLine(a[0]); // prints Test
Console.WriteLine(a[1]); // prints Test
Console.WriteLine(a[2]); // prints this is some free text, thanks
Otherwise, the answer is "no", because String.Split has no way to see a difference between a "field-separating comma" and a "user-entered comma". How would it know to split Test,Test,free text, thanks,Test as Test/Test/free text, thanks/Test or Test/Test/free text/ thanks,Test?
However, there are a few other ways to solve this problem:
What you have is essentially a string with "comma-separated values" (CSV). If you use a professional CSV library (instead of String.Join/String.Split), values that contain commas will be quoted, and those commas will be ignored when extracting the values.
An easier solution might be to use a different string format altogether: If you encode your values in a JSON array instead of a CSV string, the JSON library will take care of encoding/decoding values that include special characters.
Obviously, if you can avoid encoding all values in a single string at all and just use an array or some other data structure instead, the problem would just disappear. However, there is not enough background in your question to know whether this is a viable option.
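
To make the JSON suggestion concrete, here is a minimal sketch assuming System.Text.Json is available (.NET Core 3.0 or later; on older frameworks a library such as Json.NET would play the same role). FormCodec, Pack and Unpack are hypothetical names, and whether the webservice accepts a JSON array instead of the comma-separated string is an assumption.

```csharp
using System;
using System.Text.Json;

public static class FormCodec
{
    // Serializes the fields as a JSON array; commas inside a value stay inside its quotes.
    public static string Pack(params string[] fields)
    {
        return JsonSerializer.Serialize(fields);
    }

    // Reverses Pack; returns an empty array if the payload is the JSON literal null.
    public static string[] Unpack(string payload)
    {
        return JsonSerializer.Deserialize<string[]>(payload) ?? Array.Empty<string>();
    }
}
```

With this, FormCodec.Unpack(FormCodec.Pack(firstName, lastName, freeText)) returns three items regardless of how many commas freeText contains.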

Encode URL querystring from database

I am trying to encode searches that are sent as querystrings (Response.Redirect("/Find/" + TextBoxSearch.Text);). There is a row in the database with names including / and +, and when that enters the URL things stop working properly. I have tried encoding like this:
String encode = HttpUtility.UrlEncode(TextBoxSearch.Text);
Response.Redirect("/Find/" + encode);
But I can't get it to work. What am I missing? Pretend the search value is ex/ex 18+. How could I get this to work as a query string?
I don't know if this is important, but here is how I get the query string in my Find page:
IList<string> segments = Request.GetFriendlyUrlSegments();
string val = "";
for (int i = 0; i < segments.Count; i++)
{
val = segments[i];
}
search = val;
I can't even encode spaces properly.
I try:
String encoded = Uri.EscapeDataString(TextBoxSearch.Text);
Response.Redirect("/Find/" + encoded);
But this does not turn spaces in the query string into %20. It does transform "/" though.
EDIT: At this point I would be happy to just turn the URL localhost/Find/here are spaces into localhost/Find/here+are+spaces
EDIT: I have been searching and trying solutions for over 5 hours now.
Can anyone just tell me this:
If I redirect like this Response.Redirect("/Find/" + search);
And I make a search like this Social media
I then get the query string using the segments code above.
Now I want to display info about Social media from my database
but at the same time I want the url to say Find/Social+media
PS: Do I need to encode every url-string? or just where I use signs and spaces.

Instead of HttpUtility.UrlEncode use HttpUtility.UrlPathEncode. From the documentation of UrlEncode:
You can encode a URL using with the UrlEncode method or the
UrlPathEncode method. However, the methods return different results.
The UrlEncode method converts each space character to a plus character
(+). The UrlPathEncode method converts each space character into the
string "%20", which represents a space in hexadecimal notation. Use
the UrlPathEncode method when you encode the path portion of a URL in
order to guarantee a consistent decoded URL, regardless of which
platform or browser performs the decoding.
The + character does represent a space in query strings but not in the address part.
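
For comparison, here is a small sketch of encoding the search text as a single path segment with Uri.EscapeDataString, which percent-encodes spaces, "/" and "+" alike. FindUrl, ToPath and FromSegment are hypothetical names. Whether the web server then accepts an encoded "/" or "+" in the path is a separate matter that this sketch does not address.

```csharp
using System;

public static class FindUrl
{
    // Percent-encodes the search text so it fits into one path segment.
    public static string ToPath(string search)
    {
        return "/Find/" + Uri.EscapeDataString(search);
    }

    // Reverses the encoding for a segment read back from the URL.
    public static string FromSegment(string segment)
    {
        return Uri.UnescapeDataString(segment);
    }
}
```

For the value from the question, FindUrl.ToPath("ex/ex 18+") produces /Find/ex%2Fex%2018%2B. Note that a browser's address bar may display %20 as a space even though the encoded form is what gets sent.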

0x202A in filename: Why?

I recently needed to do an ISNULL in SQL on a varbinary image.
So far so (ab)normal.
I very quickly wrote a C# program to read in the file no_image.png from my desktop, and output the bytes as hex string.
That program started like this:
byte[] ba = System.IO.File.ReadAllBytes(@"‪D:\UserName\Desktop\no_image.png");
Console.WriteLine(ba.Length);
// From here, change ba to hex string
And as I had used ReadAllBytes countless times before, I figured it was no big deal.
To my surprise, I got a "NotSupported" exception on ReadAllBytes.
I found that the problem was that when I right click on the file, go to tab "Security", and copy-paste the object-name (start marking at the right and move inaccurately to the left), this happens.
And it happens only on Windows 8.1 (and perhaps 8), but not on Windows 7.
When I output the string in question:
public static string ToHexString(string input)
{
string strRetVal = null;
System.Text.StringBuilder sb = new System.Text.StringBuilder();
foreach (char c in input)
{
sb.Append(((int)c).ToString("X2"));
}
strRetVal = sb.ToString();
sb.Length = 0;
sb = null;
return strRetVal;
} // End Function ToHexString
string str = ToHexString(@"‪D:\UserName\Desktop\cookie.png");
string strRight = " (" + ToHexString(@"D:\UserName\Desktop\cookie.png") + ")"; // Correct value, for comparison
string msg = str + Environment.NewLine + " " + strRight;
Console.WriteLine(msg);
I get this:
202A443A5C557365724E616D655C4465736B746F705C636F6F6B69652E706E67
(443A5C557365724E616D655C4465736B746F705C636F6F6B69652E706E67)
First, when I look up 20 2A in ASCII, it's [space] + *.
Since I see neither a space nor a star, I googled 20 2A, and the first thing I got was paragraph 202a of the German penal code:
http://dejure.org/gesetze/StGB/202a.html
But I suppose that is rather an unfortunate coincidence and it is actually the unicode control character 'LEFT-TO-RIGHT EMBEDDING' (U+202A)
http://www.fileformat.info/info/unicode/char/202a/index.htm
Is that a bug, or is that a feature ?
My guess is, it's a buggy feature.

The issue is that the string does not begin with a letter D at all - it just looks like it does.
It appears that the string is hard-coded in your source file.
If that's the case, then you have pasted the string from the security dialog. Unbeknownst to you, the string you pasted begins with the LRE character (LEFT-TO-RIGHT EMBEDDING, U+202A). This is an invisible character which takes no space, but tells the renderer to treat the text that follows as left-to-right.
You just need to delete the character.
To do this, position the cursor AFTER the D in the string. Use the Backspace or Delete to Left key <x] to delete the D. Use the key again to delete the invisible LRE character. One more time to delete the ". Now retype the " and the D.
A similar problem could occur wherever the string came from - e.g. from user input, command line, script file etc.
Note: The security dialog shows the filename beginning with the LRE character to ensure that characters are displayed in left-to-right order, which is necessary to ensure that the hierarchy is correctly understood when using RTL characters. E.g. a filename c:\folder\path\to\file in Arabic might be c:\folder\مسار/إلى/ملف. The "gotcha" is that the Arabic parts read in the other direction, so the word "path" according to Google Translate is مسار, and that is the rightmost word, making it appear as if it were the last element of the path, when in fact it is the element immediately after "c:\folder\".
Because security object paths have a hierarchy which is in conflict with the RTL text layout rules, the security dialog always displays RTL text in LTR mode. That means that the Arabic words will be mangled (letters in the wrong order) on the security tab. (Imagine it as if it said "elif ot htap".) So the meaning is just about discernible, but from the point of view of security, the security semantics are preserved.
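
When the path arrives from user input or a script rather than from source code, deleting the character by hand is not an option. A minimal sketch of a programmatic clean-up follows; PathCleaner and StripFormatChars are hypothetical names. It removes every character in the Unicode "Format" category, which covers U+202A and the other bidi controls, but it would also remove joiners that can be legitimate in some filenames, so treat it as a starting point.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class PathCleaner
{
    // Removes invisible Unicode format characters (category Cf), such as U+202A.
    public static string StripFormatChars(string input)
    {
        var sb = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.Format)
            {
                sb.Append(c);
            }
        }
        return sb.ToString();
    }
}
```

Passing the pasted string through PathCleaner.StripFormatChars before calling File.ReadAllBytes should leave a path that starts with a plain D.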

Filenames that contain RLO/LRO overrides are commonly created by malware. Eg. “exe” read backwards spells “malware”. You probably have an infected host, or the origin of the .png is infected.

This question bothered me a lot, how would it be possible that a deterministic function would give 2 different results for identical input? After some testing, it turns out that the answer is simple.
If you look at it in your debugger, you will see that the 'D' char in your @"‪D:\UserName\Desktop\cookie.png" (first call to ToHexString) is NOT the same as the one in @"D:\UserName\Desktop\cookie.png" (second call).
You must have used some other 'D'-like character, probably through an unwanted keyboard shortcut or by messing with your Visual Studio character encoding.
It looks exactly the same, but in reality it's not even a single char (try watching the c variable in your ToHexString function).
If you change it to the normal 'D' in your first example, it will work fine.

How do I parse a query string with "&" in the value using C#?

I have a C# custom webpart on a sharepoint 2007 page. When clicking on a link in an SSRS report on another page, it sends the user to my custom webpart page with a query string like the following:
?tax4Elem=Docks%20&%20Chargers&ss=EU%20MOVEX&Phase=1&tax3Elem=Play%20IT&tax5Elem=Charger
Take note of the value for "tax4Elem", which is basically "Docks & Chargers". (The ampersand can actually come up in "tax4Elem", "tax3Elem", and "tax5Elem").
I cannot have the ampersand in that value encoded so I will have to work with this.
How do I parse this query string so that it doesn't recognize the "&" in "Docks & Chargers" as the beginning of a key/value pair?
Thanks in Advance!
kate

If you really cannot correct the URL, you can still try to parse it, but you have to make some decisions. For example:
Keys can only contain alphanumeric characters.
There are no empty values, or at least, there is always an equal sign = after the key
Values may contain additional ampersands and question marks.
Values may contain additional equal signs, as long as they don't appear to be part of a new key/value pair (they are not preceded with &\w+)
One possible way to capture these pairs is:
MatchCollection matches = Regex.Matches(s, @"\G[?&](?<Key>\w+)=(?<Value>.*?(?=$|&\w+=))");
var values = matches.Cast<Match>()
.ToDictionary(m => m.Groups["Key"].Value,
m => HttpUtility.UrlDecode(m.Groups["Value"].Value),
StringComparer.OrdinalIgnoreCase);
You can then get the values:
string tax4 = values["tax4Elem"];
Note that if the query string is "invalid" according to our rule, the pattern may not capture all values.

I think you can't parse that string correctly - it has been incorrectly encoded. The ampersand in "Docks & Chargers" should have been encoded as %26 instead of &:
?tax4Elem=Docks%20%26%20Chargers&ss=EU%20MOVEX&Phase=1&tax3Elem=Play%20IT&tax5Elem=Charger
Is it possible to change the code that generated the URL?

Obviously the request is incorrect. However, to work-around it, you can take the original URL, then find the IndexOf of &ss=. Then, find the = sign immediately before that. Decode (with UrlDecode) then reencode (with UrlEncode) the part between the = and &ss= (the value of tax4Elem). Then, reconstruct the query string like this:
correctQueryString = "?tax4Elem=" + reencodedTaxValue + remainderOfQueryString
and decode it normally (e.g. with ParseQueryString) into a NameValueCollection.
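
The workaround described above can be sketched roughly as follows. QueryRepair and ReencodeValueBefore are hypothetical names, and the sketch uses Uri.UnescapeDataString / Uri.EscapeDataString in place of UrlDecode / UrlEncode so that it has no dependency on System.Web (unlike UrlDecode, these do not treat "+" as a space). It assumes the broken value contains no "=" of its own and that the next pair, here "&ss=", is known in advance.

```csharp
using System;

public static class QueryRepair
{
    // Re-encodes the value that sits immediately before nextPair (e.g. "&ss=").
    public static string ReencodeValueBefore(string query, string nextPair)
    {
        int end = query.IndexOf(nextPair, StringComparison.Ordinal);
        if (end < 0) return query;

        // The '=' closest to the left of nextPair ends the key of the broken value.
        int eq = query.LastIndexOf('=', end);
        if (eq < 0) return query;

        string raw = query.Substring(eq + 1, end - eq - 1);
        string reencoded = Uri.EscapeDataString(Uri.UnescapeDataString(raw));

        // Key and '=' + repaired value + untouched remainder.
        return query.Substring(0, eq + 1) + reencoded + query.Substring(end);
    }
}
```

For the query string in the question, the tax4Elem value comes back as Docks%20%26%20Chargers, after which the whole string can be parsed normally.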

Or you can use HttpServerUtility.HtmlDecode method to decode the value to '&' (ampersand) sign

ASP.Net URL Encoding

I am implementing URL rewriting in ASP.net and my URLs are causing me a world of problems.
The URL is generated from a database of departments & categories. I want employees to be able to add items to the database with whatever special characters are appropriate without it breaking the site.
I am encoding the data before I construct the URLs.
There are several problems...
IIS decodes the URL before it reaches .net making it impossible to properly parse anything with a "/" in it.
ASP.net gets confused by the url making "~" useless within certain pages
I migrated from the built in test server to my local IIS server (XP machine) and any URL containing an encoded & (%26) gives me a "Bad Request" error.
UrlEncode leaves some breaking characters untouched such as '.'
I did have two other related posts on this subject, at the time I only saw the small problems not the big problem upstream. I've found some registry tricks to solve the "Bad Request" issue but I'm going to be deploying to a shared hosting environment making that useless. I also know that this is a fix for some security issue so I don't want to necessarily bypass it without knowing what can of worms I'm opening.
Rather than trying to force .NET to pass me the raw URL, or override IIS settings, I'd like to make truly safe URLs in the first place.
I'll note I've tried AntiXss.UrlEncode, HttpUtility.UrlEncode and Uri.EscapeDataString. I've even tried stupid things like double URL encoding. Is there a utility that does what I need, or do I really need to roll my own? I'm even considering doing something hacky like replacing the % with an unusual string of characters. The end result should be at least readable, which was the point of using URL rewriting in the first place.
Sorry for the long post- I just wanted to make sure that I've included all the necessary details. I can't seem to find any relevant information on this, and it seems like it would be a common problem - so maybe I'm missing something big. Thanks for your help, and patience with the long explanation!
Edit for clarity:
When I say the URLs are being built from a database, what I mean is that the directory structure is constructed from the departments and categories in my database.
Some Example URLS -
Mystore/Refrigeration/Bar+Fridge.aspx
Mystore/Cooking+Equipment.aspx
Mystore/Kitchen/Cutting+Boards.aspx
The problems come in when I use a department like "Beverage & Bar" or "Pastry/Decorating" to construct my URL. Despite being encoded first these cause the aforementioned issues.
My handlers are already implemented and working fine except for the special character encoding issues.

You should consider having a table off of your category/department table which has a unique URL for each category. Then you can use a special routine to generate the URLs. This can be a SQL scalar function, or a CLR function, but one of the things it would do is normalize the URL for the web. You can convert "Beverage & Bar" to "Beverage-And-Bar" and "Pastry / Decorating" to "Pastry-Decorating". Mainly, the routine needs to replace all invalid HTTP URL characters with something else. An example is this:
public static class URL
{
static readonly Regex feet = new Regex(@"([0-9]\s?)'([^'])", RegexOptions.Compiled);
static readonly Regex inch1 = new Regex(@"([0-9]\s?)''", RegexOptions.Compiled);
static readonly Regex inch2 = new Regex(@"([0-9]\s?)""", RegexOptions.Compiled);
static readonly Regex num = new Regex(@"#([0-9]+)", RegexOptions.Compiled);
static readonly Regex dollar = new Regex(@"[$]([0-9]+)", RegexOptions.Compiled);
static readonly Regex percent = new Regex(@"([0-9]+)%", RegexOptions.Compiled);
static readonly Regex sep = new Regex(@"[\s_/\\+:.]", RegexOptions.Compiled);
static readonly Regex empty = new Regex(@"[^-A-Za-z0-9]", RegexOptions.Compiled);
static readonly Regex extra = new Regex(@"[-]+", RegexOptions.Compiled);
public static string PrepareURL(string str)
{
str = str.Trim().ToLower();
str = str.Replace("&", "and");
str = feet.Replace(str, "$1-ft-$2"); // $2 keeps the character after the foot mark, which the pattern also consumes
str = inch1.Replace(str, "$1-in-");
str = inch2.Replace(str, "$1-in-");
str = num.Replace(str, "num-$1");
str = dollar.Replace(str, "$1-dollar-");
str = percent.Replace(str, "$1-percent-");
str = sep.Replace(str, "-");
str = empty.Replace(str, string.Empty);
str = extra.Replace(str, "-");
str = str.Trim('-');
return str;
}
}
You could make this a SQL enhance function, or run URL generation as a separate process. Then to implement mapping, you would map the entire URL directly to a category ID. This approach is better in the long run for several reasons. First, you are not always generating URLs, you do this once and they stay static, you don't have to worry about your procedure changing, and then GoogleBot not being able to find old URLs. Also, if you get a collision, you may notice a potential duplicate category name, because a collision would only be different by special characters. Finally, you can always view your URLs from the database, without having to run the mapping function.
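
To illustrate the mapping and collision points, here is a minimal in-memory sketch. SlugMap, TryAdd and TryResolve are hypothetical names, and the dictionary only stands in for the URL table described above; in practice the check would more likely be a unique index on the URL column.

```csharp
using System;
using System.Collections.Generic;

public sealed class SlugMap
{
    // In-memory stand-in for the URL table: generated URL -> category ID.
    private readonly Dictionary<string, int> _idBySlug =
        new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);

    // Returns false when the slug already belongs to a different category,
    // which hints at a near-duplicate category name.
    public bool TryAdd(string slug, int categoryId)
    {
        int existing;
        if (_idBySlug.TryGetValue(slug, out existing))
        {
            return existing == categoryId;
        }
        _idBySlug[slug] = categoryId;
        return true;
    }

    // Maps an incoming URL straight to its category ID.
    public bool TryResolve(string slug, out int categoryId)
    {
        return _idBySlug.TryGetValue(slug, out categoryId);
    }
}
```

Because the slugs are stored once, a later change to the generation routine does not alter URLs that have already been published.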

I have a URL rewrite that I implement in the global.asax file, in the begin authenticated request event, as I have some security. This is where I take the raw URL and then do the DB lookup. This then rewrites the path to the aspx page, and all the parameters are passed through the query string. No encoding is necessary.
However, if you are using the URL to actually change data, then I can see that you will have huge problems, as you are effectively using HTTP GET to change the database. It is usually considered a bad idea, and not something I do.
I only use a POST request to do any database manipulation. This keeps the URL clean as all the data is in the page form.
The only issue I had was setting the correct URL on page.form.action, which in most cases is the raw URL.
If it's the category names that are causing the issue, then perhaps you should restrict the names to alphanumeric characters only and swap spaces for "-". IIS will throw a wobbly with periods "." as it looks for file names.
P.S.
IIS does not understand the tilde "~"; this is something that the compiler understands. So if you use it in an anchor tag it will not work as expected, and you should use the application root instead of the tilde.
Edit:
OK, it looks like the problem is IIS having issues with certain characters such as . / and &. Even if you URL-encode these, IIS will still try to apply its own meanings.
As such, consider removing them, so:
Beverage & bar becomes BeverageBar
Pastry / decorating becomes PastryDecorating.
This will keep your URLs clean, but it does mean an extra column in the database so you can check the URL against this shortened category name.

I'm having the exact same problem. Thanks for writing it up so nicely. It actually helped me to understand the problem better.
I had some other considerations however. One of the goals I have is to support the potential for any characters to be in the url which is based on the title of an article. Additionally I want to ensure uniqueness in the encoding and a two way encode / decode process.
So I did some manual encoding to solve the problem. This won't completely eliminate percent encoding, but will greatly reduce it and keep users from generating an inaccessible url. My process starts with using the Server.URLEncode function. But this doesn't eliminate the problems in the url. Because IIS is decoding the url and then passing it to the application, certain characters will break it with a dangerous request exception. These characters include +, &, /, !, *, ., ( and ). So on those characters plus other characters I would like to make more readable I do a double encoding for a more usable url. Encoding is also hard because of the limited number of characters that are allowed in an url. So prior to encoding I made all letters capital and then did the encoding with lower case. This keeps it from being totally decodable, but I can easily do a match in the database or in code by making the value I wish to match be upper case.
Well, here is my code. Feedback would be appreciated. Oh ya, this is in VB, but things should transfer over to C# easy enough.
Dim strReturn As String = Trim(strStringToEncode)
strReturn = Server.UrlEncode(strReturn)
strReturn = strReturn.Replace("-", "dash").Replace("+", "-")
strReturn = strReturn.Replace("%26", "and").
Replace("%2f", "or").
Replace("!", "excl").
Replace("*", "star").
Replace("%27", "apos").
Replace("(", "lprn").
Replace(")", "rprn").
Replace("%3b", "semi").
Replace("%3a", "coln").
Replace("%40", "at").
Replace("%3d", "eq").
Replace("%2b", "plus").
Replace("%24", "dols").
Replace("%25", "pct").
Replace("%2c", "coma").
Replace("%3f", "query").
Replace("%23", "hash").
Replace("%5b", "lbrk").
Replace("%5d", "rbrk").
Replace(".", "dot").
Replace("%3e", "gt").
Replace("%3c", "lt")
Return strReturn

I guess you are looking for HttpUtility.UrlEncode and HttpUtility.HtmlDecode
string url = "http://www.google.com/search?q=" + HttpUtility.UrlEncode("Example");
