Base64 encoded String that contains exclamation points causing exception


Unfortunately, I cannot post the string as it contains sensitive data. I have created an API that is in use at my company, and one of our partners is attempting to use it. In one part of the JSON we expect a Base64-encoded string of a digitally signed XML file. When I parse the JSON and try to decode the Base64 string, the API throws an exception.
System.FormatException occurred HResult=0x80131537
Message=The input is not a valid Base-64 string as it contains a non-base 64 character, more than two padding characters, or an illegal character among the padding characters.
Source=mscorlib
StackTrace:
at System.Convert.FromBase64_Decode(Char* startInputPtr, Int32 inputLength, Byte* startDestPtr, Int32 destLength)
at System.Convert.FromBase64CharPtr(Char* inputPtr, Int32 inputLength)
at System.Convert.FromBase64String(String s)
at Base64Decoding.Program.Main(String[] args) in
I have tried taking the raw XML and encoding it on 3 different machines, including a Linux system using Python, and I have gotten the exact same Base64 string each time. The string I produce does not match the partner's string.
This is the only partner we work with that has ever had an issue with the encoding, and no matter what I have tried, I cannot duplicate his results encoding the signed XML file. When I try to decode his Base64 using an online decoder, it displays an error. But when I click the decode button, it actually downloads the correctly decoded XML!
When I use my encoded string of the same file, it does not display the error; it displays the decoded XML in the 'live view' box and downloads the correct XML when I click the decode button.
Does anyone have any idea what might cause System.Convert.ToBase64String() to output a string with exclamation points in it? To my understanding, that should not be an allowed character in Base64. I have tried 64-bit vs. 32-bit, and I have tried every version of the .NET Framework back to 2.

Try replacing every exclamation point character '!' with a forward slash character '/'.
I have seen base64 encoded data with exclamation marks in it before. Usually it was embedded in HTML or was intended to be embedded in HTML. Presumably, because the forward slash is syntactically meaningful in HTML and base64 encoded data can contain forward slashes, every forward slash had been replaced by an exclamation point.
I've seen people who embed WebAssembly programs in JavaScript tags do this.
As an additional note, if you have a piece of base64 encoded data, especially one that might be truncated or only part of the full string, and the framework throws an exception like the one above or one that says "Invalid length for a Base-64 char array or string", there are two things you can try. Either remove characters from the end of the string one by one, attempting to decode each time (up to about 4 characters; beyond that it's probably not a length/padding problem), or add '=' padding characters one by one, but no more than two (i.e. '==').
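A rough sketch of both suggestions, in case it helps. The class and method names here are made up for illustration, and it assumes that '!' really was substituted for '/' by the sender, which the question does not confirm:

```csharp
using System;

// Hypothetical helper class, not part of the .NET Framework.
static class Base64Repair
{
    // Assumes every '!' stands for '/' and decodes the result.
    public static byte[] DecodeWithBangFix(string input)
    {
        string candidate = input.Replace('!', '/');
        return Convert.FromBase64String(candidate);
    }

    // Tries the string as-is, then with one and then two '=' appended.
    public static bool TryDecodeWithPadding(string input, out byte[] bytes)
    {
        for (int pad = 0; pad <= 2; pad++)
        {
            try
            {
                bytes = Convert.FromBase64String(input + new string('=', pad));
                return true;
            }
            catch (FormatException)
            {
                // Not valid with this much padding; try the next amount.
            }
        }
        bytes = null;
        return false;
    }
}
```

If the decoded bytes are supposed to be signed XML, verify the signature afterwards; a blind character swap can produce something that decodes cleanly but is not the original data.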

Related

Encoding string from reading email

I am using Gmail API to read emails from Gmail account.
In the body I am replacing some characters, which is needed according to what I read in the forums:
String codedBody = body.Replace("-", "+");
codedBody = codedBody.Replace("_", "/");
Problem is that when I try to convert it
byte[] data = Convert.FromBase64String(codedBody);
there is an exception which is firing with some emails:
System.FormatException: 'The input is not a valid Base-64 string as it contains a non-base 64 character, more than two padding characters, or an illegal character among the padding characters.'
The string which is coming from the request is:
"0J7QsdGP0LLQsDogSGVhbHRoY2FyZSBTZXJ2aWNlIFJlcHJlc2VudGF0aXZlIHdpdGggRHV0Y2gsIEdlcm1hbiANCiDQktCw0LbQvdC-ISDQnNC-0LvRjywg0L3QtSDQvtGC0LPQvtCy0LDRgNGP0LnRgtC1INC90LAg0YLQvtC30LggZW1haWwuICANCiAg0KLQvtC30LggZW1haWwg0LUg0LjQt9C_0YDQsNGC0LXQvSDQv9GA0LXQtyBqb2JzLmJnINC-0YIg0LjQvNC10YLQviDQvdCwINCa0YDQuNGB0YLQuNCw0L0g0JrRitC90LXQsiAg0JfQsCDQtNCwINGB0LUg0YHQstGK0YDQttC10YLQtSDRgSDQutCw0L3QtNC40LTQsNGC0LAg0YfRgNC10LcgZW1haWwg0LjQt9C_0L7Qu9C30LLQsNC50YLQtToga3Jpc3RpYW5fdG9uaUBhYnYuYmcgIA0KICDQodGK0L7QsdGJ0LXQvdC40LUg0L7RgiDQutCw0L3QtNC40LTQsNGC0LA6ICANCiAg0LHQu9Cw0LHQu9Cw0LHQu9Cw0LHQu9CwDQoNCg0KDQoNCg0KICA=PEhUTUw-PEJPRFk-DQrQntCx0Y_QstCwOiBIZWFsdGhjYXJlIFNlcnZpY2UgUmVwcmVzZW50YXRpdmUgd2l0aCBEdXRjaCwgR2VybWFuPGRpdj48YnI-PGRpdj7QktCw0LbQvdC-ISDQnNC-0LvRjywg0L3QtSDQvtGC0LPQvtCy0LDRgNGP0LnRgtC1INC90LAg0YLQvtC30LggZW1haWwuPC9kaXY-PGRpdj48YnI-PC9kaXY-PGRpdj7QotC-0LfQuCBlbWFpbCDQtSDQuNC30L_RgNCw0YLQtdC9INC_0YDQtdC3IGpvYnMuYmcg0L7RgiDQuNC80LXRgtC-INC90LAg0JrRgNC40YHRgtC40LDQvSDQmtGK0L3QtdCyPC9kaXY-PGRpdj7Ql9CwINC00LAg0YHQtSDRgdCy0YrRgNC20LXRgtC1INGBINC60LDQvdC00LjQtNCw0YLQsCDRh9GA0LXQtyBlbWFpbCDQuNC30L_QvtC70LfQstCw0LnRgtC1OiBrcmlzdGlhbl90b25pQGFidi5iZzwvZGl2PjxkaXY-PGJyPjwvZGl2PjxkaXY-0KHRitC-0LHRidC10L3QuNC1INC-0YIg0LrQsNC90LTQuNC00LDRgtCwOjwvZGl2PjxkaXY-PGJyPjwvZGl2PjxkaXY-0LHQu9Cw0LHQu9Cw0LHQu9Cw0LHQu9CwPGJyPjxicj48YnI-PGJyPjxicj48YnI-PC9kaXY-PC9kaXY-PC9CT0RZPjwvSFRNTD4NCg=="
What is causing this problem?

Your source Base64 string is not valid. It contains a padding character = at position 604 in the middle of the string.
It appears as if you have two valid Base64 strings that have been concatenated together. Go back to your source and ensure that you're collecting them correctly.
The source has to provide some detail on this as Base64 itself provides no means to determine if you have two values joined like this. If the first source byte array had a length which was a multiple of 3, there would be no padding character in the middle, and it would have decoded successfully and given garbage.
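If fixing the source is not an option, something like the following might work as a stopgap. It is only a sketch: the names are invented, it expects the standard alphabet (so do the - and _ replacement first), and it can only find a boundary where the earlier segment ends in '=' padding, for the reason given above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper class for illustration only.
static class ConcatenatedBase64
{
    // Splits the input after each run of '=' padding and decodes every segment.
    public static List<byte[]> DecodeSegments(string input)
    {
        var parts = new List<byte[]>();
        int start = 0;
        while (start < input.Length)
        {
            int end = input.IndexOf('=', start);
            if (end < 0)
            {
                end = input.Length; // no more padding: the rest is one segment
            }
            else
            {
                // Include the whole padding run in the current segment.
                while (end < input.Length && input[end] == '=') end++;
            }
            parts.Add(Convert.FromBase64String(input.Substring(start, end - start)));
            start = end;
        }
        return parts;
    }
}
```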
For what it's worth, replacing those characters appears to be correct, as there is no de-facto standard for which two symbol characters are used in Base64. However, make sure you've gotten them the right way around.
Update
Having investigated further (learning is fun) there is a defined Base64 standard, which defines two separate Base64 encodings.
The Base 64 Alphabet defines + and / for the two symbols, and = for the padding character.
The same RFC also specifies The "URL and Filename safe" Base 64 Alphabet which uses - and _ for the two symbols, and = (or %3D) for the padding character.
It appears your source data uses the "URL and Filename safe" format, while FromBase64String() only accepts the normal format. Therefore you are quite correct to replace - with + and _ with / to convert from one to the other.
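Putting the conversion in one place could look like this. It is a minimal sketch with an invented class name; the padding step is an extra assumption (URL-safe producers sometimes drop the trailing '='), not something the question's data shows.

```csharp
using System;

// Hypothetical helper class for illustration only.
static class Base64Url
{
    // Converts the "URL and Filename safe" alphabet to the normal one, then decodes.
    public static byte[] Decode(string urlSafe)
    {
        string standard = urlSafe.Replace('-', '+').Replace('_', '/');

        // Assumption: restore padding that the producer may have stripped.
        int remainder = standard.Length % 4;
        if (remainder == 2) standard += "==";
        else if (remainder == 3) standard += "=";

        // A remainder of 1 can never be valid; FromBase64String will throw.
        return Convert.FromBase64String(standard);
    }
}
```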

How do I remove invalid characters from UTF-8 encoded file?

Explanation:
I've come across an edge case when writing my web app. I accept UTF-8 files to be uploaded, and I've got a check in place to confirm it is UTF-8 encoded (or at least the best check possible, apparently there is no silver bullet, I'm aware there are many other questions on Stack Overflow for that specific issue).
As a test, I took an ANSI encoded file and converted it to UTF-8 in two separate tests: by converting it to UTF-8 in Notepad++, and by just decoding it as UTF-8 (even though it is ANSI) on the fly in C# using Encoding.UTF8.GetBytes(inputStream).
Where The Problem Arises:
Later on, I place the raw data of the file as one of the elements in an XML file. This is where the problem arises. It appears that a character has persisted from the ANSI file which (I assume) is not valid in UTF-8. When I try to load the XML using the following command...
XDocument xmlSample = XDocument.Load(outputPath);
I get this exception...
{"Invalid character in the given encoding. Line 10, position 14."}
Which looks like this in Visual Studio...
And like this in Notepad++...
Below is the character copy and pasted.
From NPP: ¡ From Visual Studio String Viewer: �
Question:
How can I remove invalid characters from UTF-8 encoded file, or at least discover them in a sane way so I can reject the file?

First, as to your example, the word “Temperature” suggests that the offending character is in fact the “degree” sign (°, Unicode 176), so that the full text reads “Temperature(°C)”. In this case the character would be coded as a single \260 byte in ANSI and as the two bytes \302\260 in UTF-8. A lone \260 (preceded by the left parenthesis in this case) is not valid UTF-8.
Second – if you are still interested after more than a year – could you clarify how you use Encoding.UTF8.GetBytes() to “decode a file as UTF-8”? GetBytes() reads characters, not bytes, and characters in C# do not have an encoding; the encoding has already been applied when reading the file and converting it into characters. What Encoding.UTF8.GetBytes() does is encode (not decode) the characters into a UTF-8 byte sequence.
In order to check an incoming byte sequence you might use the GetChars() method of a UTF8Encoding instance to decode your byte sequence into characters. Depending on the constructor you use, you either get a “cleaned-up” character string (with data loss if problems occurred) or receive a DecoderFallbackException on offending byte sequences, so you can reject the input.
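A minimal sketch of the strict check, assuming you already have the uploaded file as a byte array. The class and method names are invented, and it calls GetString() rather than GetChars() purely for brevity; both go through the same decoder settings.

```csharp
using System;
using System.Text;

// Hypothetical helper class for illustration only.
static class Utf8Check
{
    // Returns true when the bytes decode as UTF-8 without any invalid sequence.
    public static bool IsValidUtf8(byte[] data)
    {
        // throwOnInvalidBytes: true makes the decoder raise an exception
        // instead of silently substituting U+FFFD for bad bytes.
        var strict = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false,
                                      throwOnInvalidBytes: true);
        try
        {
            strict.GetString(data);
            return true;
        }
        catch (DecoderFallbackException)
        {
            return false;
        }
    }
}
```

With the example above, the ANSI bytes for "(°C)" (0x28 0xB0 0x43 0x29) would be rejected, while the UTF-8 form (0x28 0xC2 0xB0 0x43 0x29) would pass.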

Can't decode a base 64 string with Convert.FromBase64String()

I'm using Convert.FromBase64String() for decoding a base 64 encoded string. The string is actually an XML file, which has base 64 encoded images in it. E.g.
data:image/png;base64,iVBORw0KGgoAA...
I get the following exception:
System.FormatException: The input is not a valid Base-64 string as it contains a non-base 64 character, more than two padding characters, or an illegal character among the padding characters.
Where is the problem? The double base 64 encoding? The string image/png;base64 in the base 64 encoded data? An online tool has no issues at all.
Edit:
Now I tried to remove image/png;base64 part from the XML file and I still get this error. Then I tried to decode the string YWJj with the same error!? If I use this code
byte[] dataBuffer = Convert.FromBase64String(base64string);
I get the above exception. If I use instead
byte[] dataBuffer = Convert.FromBase64String("YWJj");
it does work. Encoding of the file is UTF-8 according to Notepad++. Any ideas?
Edit 2:
String.Equals says that the two strings YWJj are not equal, even though the Locals window shows that they are:
BTW the above code doesn't throw the exception, because I use string test = "YWJj";. Why does it work with local defined variables, but not with passed strings? I don't think it's a threading problem, because I made the above function, which is only called once.

You should remove the data:image/png;base64, part from the string before decoding.
string data = "data:image/png;base64,iVBORw0KGgoAA...";
string[] pd = data.Split(',');
byte[] decoded = Convert.FromBase64String(pd[1]);
The data:image/png;base64, part of the string isn't base64 data. The real encoded data starts after the comma. Convert.FromBase64String accepts only encoded data, so you need to extract it first.

As I've already written, I'm reading the base 64 encoded file in and decoding it with Convert.FromBase64String(). Now I have it working, and the reason is a complete mystery. What did I do?
I renamed the file. That's it.
Before, I had a filename like NAME_Something_v1.0.xsl.b64. Now I use NAME_Something.b64. Perhaps it's not the only reason, but I'm accessing the file from an assembly with assembly.GetManifestResourceStream(). I had cleaned the solution before, but I always had the same problem. Now I have changed the name back to what it was and it also works ...

1. You shouldn't include the data:image/png;base64, part, as this isn't actually a part of the base64 string.
2. iVBORw0KGgoAA... isn't valid either, this is not the full base64 string.
You can solve this by either splitting the string or using regular expressions to parse it.

Everything after data:image/png;base64, is the actual Base64 string to be decoded.
You can remove the first part of the string like so:
ImageAsString = ImageAsString.Substring(ImageAsString.IndexOf(',') + 1);
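For a slightly more defensive version of the same idea, here is a sketch that also accepts strings that have no prefix at all. The class name and the choice of ";base64," as the marker are my own assumptions, not anything from the question's code.

```csharp
using System;

// Hypothetical helper class for illustration only.
static class DataUri
{
    // Decodes the payload of a "data:...;base64,<payload>" string,
    // or the whole string when no such prefix is present.
    public static byte[] DecodePayload(string value)
    {
        const string marker = ";base64,";
        int index = value.IndexOf(marker, StringComparison.Ordinal);
        string payload = index >= 0
            ? value.Substring(index + marker.Length)
            : value;
        return Convert.FromBase64String(payload);
    }
}
```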

Two encodings used in RTF string won't display correct in RichTextBox?

I am trying to parse some RTF that I get back from the server. For most text I get back this works fine (and using a RichTextBox control will do the job), however some of the RTF seems to contain an additional "encoding" and some of the characters get corrupted.
The original string is as follows (and contains some of the characters used in Polish):
ąćęłńóśźż
The RTF string with hex encoded characters that is sent back looks like this:
{\lang1045\langfe1045\f16383 {\'b9\'e6\'ea\'b3{\f7 \'a8\'bd\'a8\'ae}\'9c\'9f\'bf}}
I am having problems decoding the ńó characters in the returned string. They seem to be represented by two hex values each, whereas the rest of the string is represented (as expected) by single hex values.
Using a RichTextBox control to "parse" the RTF results in corrupted text (the two characters in question are displayed as four different unwanted characters).
If I encode the plain string to hex myself using the expected codepage (1250, Latin 2, the ANSI codepage for lcid 1045), I get the following:
\'B9\'E6\'EA\'B3\'F1\'F3\'9C\'9F\'BF
I am lost as to how I can correctly decode the {\f7 \'a8\'bd\'a8\'ae} part of the returned string that should correspond to ńó.
Note that there is no font definition for \f7 in the RTF header, and the string looks fine when viewed directly on the server, meaning that the characters (if they are corrupted) are corrupted somewhere in the conversion before sending.
I am not sure if the problem is on the server side (as I have no control over that), but since the server is used for a lot of translation work I assume that the returned string is OK.
I have been going through the RTF specs but cannot find any hint regarding this type of combination of encodings.

I don't know why it's happening, but the encoding appears to be GBK (or something sufficiently similar).
Perhaps the server tries to do some "clever" matching to find the characters, or the server's default character encoding is GBK or so, and those characters (and only those) also occur in GBK so it prefers that.
I found out by adding the offending hex codes (A8 BD A8 AE) as bytes into a simple HTML file, so I could go through my browser's encodings and see if anything matched:
<html><body>¨½¨®</body></html>
To my surprise, my browser came up with "ńó" straight away.

Base64 String throwing invalid character error

I keep getting a Base64 invalid character error even though I shouldn't.
The program takes an XML file and exports it to a document. If the user wants, it will compress the file as well. The compression works fine and returns a Base64 String which is encoded into UTF-8 and written to a file.
When it's time to reload the document into the program I have to check whether it's compressed or not. The code is simply:
byte[] gzBuffer = System.Convert.FromBase64String(text);
return "1F-8B-08" == BitConverter.ToString(new List<Byte>(gzBuffer).GetRange(4, 3).ToArray());
It checks the beginning of the decoded data to see if it has the GZip header in it.
Now the thing is, all my tests work. I take a string, compress it, decompress it, and compare it to the original. The problem is when I get the string returned from an ADO Recordset. The string is exactly what was written to the file (with the addition of a "\0" at the end, but I don't think that even does anything; even with it trimmed off it still throws). I even copied and pasted the entire string into a test method and compressed/decompressed that. Works fine.
The tests will pass but the code will fail using the exact same string? The only difference is instead of just declaring a regular string and passing it in I'm getting one returned from a recordset.
Any ideas on what I am doing wrong?

You say
"The string is exactly what was written to the file (with the addition of a "\0" at the end, but I don't think that even does anything)."
In fact, it does do something: it causes your code to throw a FormatException ("Invalid character in a Base-64 string"), because Convert.FromBase64String does not consider "\0" to be a valid Base64 character.
byte[] data1 = Convert.FromBase64String("AAAA\0"); // Throws exception
byte[] data2 = Convert.FromBase64String("AAAA"); // Works
Solution: Get rid of the zero termination. (Maybe call .TrimEnd('\0'))
Notes:
The MSDN docs for Convert.FromBase64String say it will throw a FormatException when
"The length of s, ignoring white space characters, is not zero or a multiple of 4.
-or-
The format of s is invalid. s contains a non-base 64 character, more than two padding characters, or a non-white space character among the padding characters."
and that
"The base 64 digits in ascending order from zero are the uppercase characters 'A' to 'Z', lowercase characters 'a' to 'z', numerals '0' to '9', and the symbols '+' and '/'."
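As a small sketch of that fix (the class and method names are invented for the example), strip any trailing NUL characters before handing the string to the decoder:

```csharp
using System;

// Hypothetical helper class for illustration only.
static class NulTrim
{
    // Removes trailing '\0' characters, then decodes as normal Base64.
    public static byte[] DecodeIgnoringTrailingNul(string text)
    {
        return Convert.FromBase64String(text.TrimEnd('\0'));
    }
}
```

Note that this only handles NULs at the end. If the recordset value still fails after trimming, it may be worth dumping the character codes of the string to look for other invisible characters.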

Whether a null char is allowed or not really depends on the base64 codec in question.
The Base64 specifications differ on this point: RFC 4648 tells decoders to reject characters outside the alphabet unless the referring specification says otherwise, while MIME (RFC 2045) tells them to ignore such characters. So many implementations would just ignore it like white space, and others can flag it as a problem. And the buggiest ones wouldn't notice and would happily try decoding it... :-/
But it sounds like the C# implementation does not like it (which is one valid approach), so if removing it helps, that should be done.
One minor additional comment: UTF-8 is not a requirement; ISO-8859-x (aka Latin-x) and 7-bit ASCII would work as well. This is because Base64 was specifically designed to use only a 7-bit subset, which works with all 7-bit ASCII compatible encodings.

string stringToDecrypt = HttpContext.Current.Request.QueryString.ToString();
//change to
string stringToDecrypt = HttpUtility.UrlDecode(HttpContext.Current.Request.QueryString.ToString());

If removing \0 from the end of string is impossible, you can add your own character for each string you encode, and remove it on decode.

One gotcha when converting Base64 from a string is that some conversion functions expect the preceding "data:image/jpg;base64," prefix and others only accept the actual data.
