I have a query string which contains the character §, for example /search?q=5§2. This should be encoded as /search?q=5%c2%a72, but HttpContext.Current.Request.QueryString.ToString() gives me q=5%u00a72. For some reason the %c2 is lost.
Encoding is ok, you can read more details here http://en.wikipedia.org/wiki/Percent-encoding ("Non-standard implementations" part)
To not think about it you can just use HttpUtility.UrlDecode() to obtain real q=5§2 string regardless used encoding.
Related
I'm using Convert.FromBase64String() for decoding a base 64 encoded string. The string actually is a XML file, which has base 64 encoded images in it. E.g.
...
I get the following exception:
System.FormatException: The input is not a valid Base-64 string as it contains a non-base 64 character, more than two padding characters, or an illegal character among the padding characters.
Where is the problem? The double base 64 encoding? The string image/png;base64 in the base 64 encoded data? An online tool has no issues at all.
Edit:
Now I tried to remove image/png;base64 part from the XML file and I still get this error. Then I tried to decode the string YWJj with the same error!? If I use this code
byte[] dataBuffer = Convert.FromBase64String(base64string);
I get the above exception. If I use instead
byte[] dataBuffer = Convert.FromBase64String("YWJj");
it does work. Encoding of the file is UTF-8 according to Notepad++. Any ideas?
Edit 2:
String.Equals says that the two strings YWJj are not equal, despite the Locals Window shows that they are:
BTW the above code doesn't throw the exception, because I use string test = "YWJj";. Why does it work with local defined variables, but not with passed strings? I don't think it's a threading problem, because I made the above function, which is only called once.
You should remove data:image/png;base64, part from string to decode.
strind data = "...";
string[] pd = data.Split(',');
string decoded = Convert.FromBase64String(pd[1]);
The part of string data:image/png;base64, isn't base64 data. Real encoded data starts after ,. Base64 description. So function Convert.FromBase64String accepts only encoded data. Therefore, you need to extract the encoded data.
As I've already written I'm reading the base 64 encoded file in and decode it with Convert.FromBase64String(). Now I got it working and the reason is completely unknowable. What I've done?
I renamed the file. That's it.
Before I had a filename like NAME_Something_v1.0.xsl.b64. Now I use NAME_Something.b64. Perhaps it's not the only reason, but I'm accessing the file from an assembly with assembly.GetManifestResourceStream(). I've cleaned the solution before, but I always had the same problem. Now I changed the name back to where it was and it also works ...
1. You shouldn't include the data:image/png;base64, part, as this isn't actually a part of the base64 string.
2. iVBORw0KGgoAA... isn't valid either, this is not the full base64 string.
You can solve this by either splitting the string or using regular expressions to parse it.
Everything after data:image/png;base64, is the actual Base64 string to be decoded.
You can remove the first part of the string like so:
ImageAsString = ImageAsString.Substring(input.IndexOf('data:image/png;base64,') + 1);
I have a file with URLs, one of which is http://en.wikipedia.org/wiki/São_Paulo. Note that 'ã'. When I read the URLs (in C#) and try to print it, it appears as http://en.wikipedia.org/wiki/S?o_Paulo.
I tried reading the URLs as following:
List<string> urls = System.IO.File.ReadAllLines(wikiURL_FilePath, Encoding.UTF8).ToList();
Note that I have passed second argument to read it in UTF8 format, but still the problem is not rectified. How can I read and store the string in correct form?
The data you have shown is simply not UTF-8, despite having a UTF-8 BOM; the UTF-8 for São is 53-C3-A3-6F; you have 53-E3-6F, which is... the right unicode code-points for basic multi-lingual plane data, but incorrectly encoded to disk as UTF-8. You probably need to fix the code that wrote this file, or: agree on what the encoding is (it could be a single-byte code-page, but you need to agree which, else everything falls apart).
Likely looking encodings (if we take away the BOM):
utf-7
windows-1252
windows-1254
iso-8859-1
iso-8859-4
iso-8859-9
iso-8859-15
I have a string converted with encryption. I would like to make this part of a URL. Is that possible if it has been converted to base64 or do I need to do something more?
var going_to_be_part_of_url = System.Convert.ToBase64String(bytOut, 0, i);
Thanks
Yes, but it's not a good idea, Base64 requires that you respect the difference between upper case and lower case. URL's aren't typically case strict.
Then there's the problem of the special characters in Base64 being converted to URL encoded equivalents, making your URL's ugly and less manageable.
You should go with Base36 instead.
You can use a modified base 64 for URL Applications which is just base64 with a couple of the problem characters replaced.
The easiest way is to take your base64 string and encode perform a string replace on the problem characters when building the URL, and reversing the process when interpreting the URL.
I keep getting a Base64 invalid character error even though I shouldn't.
The program takes an XML file and exports it to a document. If the user wants, it will compress the file as well. The compression works fine and returns a Base64 String which is encoded into UTF-8 and written to a file.
When its time to reload the document into the program I have to check whether its compressed or not, the code is simply:
byte[] gzBuffer = System.Convert.FromBase64String(text);
return "1F-8B-08" == BitConverter.ToString(new List<Byte>(gzBuffer).GetRange(4, 3).ToArray());
It checks the beginning of the string to see if it has GZips code in it.
Now the thing is, all my tests work. I take a string, compress it, decompress it, and compare it to the original. The problem is when I get the string returned from an ADO Recordset. The string is exactly what was written to the file (with the addition of a "\0" at the end, but I don't think that even does anything, even trimmed off it still throws). I even copy and pasted the entire string into a test method and compress/decompress that. Works fine.
The tests will pass but the code will fail using the exact same string? The only difference is instead of just declaring a regular string and passing it in I'm getting one returned from a recordset.
Any ideas on what am I doing wrong?
You say
The string is exactly what was written
to the file (with the addition of a
"\0" at the end, but I don't think
that even does anything).
In fact, it does do something (it causes your code to throw a FormatException:"Invalid character in a Base-64 string") because the Convert.FromBase64String does not consider "\0" to be a valid Base64 character.
byte[] data1 = Convert.FromBase64String("AAAA\0"); // Throws exception
byte[] data2 = Convert.FromBase64String("AAAA"); // Works
Solution: Get rid of the zero termination. (Maybe call .Trim("\0"))
Notes:
The MSDN docs for Convert.FromBase64String say it will throw a FormatException when
The length of s, ignoring white space
characters, is not zero or a multiple
of 4.
-or-
The format of s is invalid. s contains a non-base 64 character, more
than two padding characters, or a
non-white space character among the
padding characters.
and that
The base 64 digits in ascending order
from zero are the uppercase characters
'A' to 'Z', lowercase characters 'a'
to 'z', numerals '0' to '9', and the
symbols '+' and '/'.
Whether null char is allowed or not really depends on base64 codec in question.
Given vagueness of Base64 standard (there is no authoritative exact specification), many implementations would just ignore it as white space. And then others can flag it as a problem. And buggiest ones wouldn't notice and would happily try decoding it... :-/
But it sounds c# implementation does not like it (which is one valid approach) so if removing it helps, that should be done.
One minor additional comment: UTF-8 is not a requirement, ISO-8859-x aka Latin-x, and 7-bit Ascii would work as well. This because Base64 was specifically designed to only use 7-bit subset which works with all 7-bit ascii compatible encodings.
string stringToDecrypt = HttpContext.Current.Request.QueryString.ToString()
//change to
string stringToDecrypt = HttpUtility.UrlDecode(HttpContext.Current.Request.QueryString.ToString())
If removing \0 from the end of string is impossible, you can add your own character for each string you encode, and remove it on decode.
One gotcha to do with converting Base64 from a string is that some conversion functions use the preceding "data:image/jpg;base64," and others only accept the actual data.
I have a a string in c# initialised as follows:
string strVal = "£2000";
However whenever I write this string out the following is written:
£2000
It does not do this with dollars.
An example bit of code I am using to write out the value:
System.IO.File.AppendAllText(HttpContext.Current.Server.MapPath("/logging.txt"), strVal);
I'm guessing it's something to do with localization but if c# strings are just unicode surely this should just work?
CLARIFICATION: Just a bit more info, Jon Skeet's answer is correct, however I also get the issue when I URLEncode the string. Is there a way of preventing this?
So the URL encoded string looks like this:
"%c2%a32000"
%c2 = Â
%a3 = £
If I encode as ASCII the £ comes out as ?
Any more ideas?
AppendAllText is writing out the text in UTF-8.
What are you using to look at it? Chances are it's something that doesn't understand UTF-8, or doesn't try UTF-8 first. Tell your editor/viewer that it's a UTF-8 file and all should be well. Alternatively, use the overload of AppendAllText which allows you to specify the encoding and use whichever encoding is going to be most convenient for you.
EDIT: In response to your edited question, the reason it fails when you encode with ASCII is that £ is not in the ASCII character set (which is Unicode 0-127).
URL encoding is also using UTF-8, by the looks of it. Again, if you want to use a different encoding, specify it to the HttpUtility.UrlEncode overload which accepts an encoding.
The default character set of URLs when used in HTML pages and in HTTP headers is called ISO-8859-1 or ISO Latin-1.
It's not the same as UTF-8, and it's not the same as ASCII, but it does fit into one-byte-per-character. The range 0 to 127 is a lot like ASCII, and the whole range 0 to 255 is the same as the range 0000-00FF of Unicode.
So you can generate it from a C# string by casting each character to a byte, or you can use Encoding.GetEncoding("iso-8859-1") to get an object to do the conversion for you.
(In this character set, the UK pound symbol is 163.)
Background
The RFC says that unencoded text must be limited to the traditional 7-bit US ASCII range, and anything else (plus the special URL delimiter characters) must be encoded. But it leaves open the question of what character set to use for the upper half of the 8-bit range, making it dependent on the context in which the URL appears.
And that context is defined by two other standards, HTTP and HTML, which do specify the default character set, and which together create a practically irresistable force on implementers to assume that the address bar contains percent-encodings that refer to ISO-8859-1.
ISO-8859-1 is the character set of text-based content sent via HTTP except where otherwise specified. So by the time a URL string appears in the HTTP GET header, it ought to be in ISO-8859-1.
The other factor is that HTML also uses ISO-8859-1 as its default, and URLs typically originate as links in HTML pages. So when you craft a simple minimal HTML page in Notepad, the URLs you type into that file are in ISO-8859-1.
It's sometimes described as "hole" in the standards, but it's not really; it's just that HTML/HTTP fill in the blank left by the RFC for URLs.
Hence, for example, the advice on this page:
URL encoding of a character consists
of a "%" symbol, followed by the
two-digit hexadecimal representation
(case-insensitive) of the ISO-Latin
code point for the character.
(ISO-Latin is another name for IS-8859-1).
So much for the theory. Paste this into notepad, save it as an .html file, and open it in a few browsers. Click the link and Google should search for UK pound.
<HTML>
<BODY>
Test
</BODY>
</HTML>
It works in IE, Firefox, Apple Safari, Google Chrome - I don't have any others available right now.
Note that %a3 cannot be encoded in ASCII (7 bit, Basic Latin).
The Pound Sign (down the page) is part of Latin-1 encoding.
I have noticed that this is happening only when long strings are used (over 4000) chars. My solution was upon receiving the parameter in database, I simply replace the  sign with nothing.
Be careful, Â may actually be needed, and if that is the case this solution is not appropriate.