I'm trying to write a function that converts a string to a base64 byte array. I've tried with this approach:
public byte[] stringToBase64ByteArray(String input)
{
byte[] ret = System.Text.Encoding.Unicode.GetBytes(input);
string s = Convert.ToBase64String(ret);
ret = System.Text.Encoding.Unicode.GetBytes(s);
return ret;
}
Would this function produce a valid result (provided that the string is in unicode)?
Thanks!
You can use:
From byte[] to string:
byte[] array = somebytearray;
string result = Convert.ToBase64String(array);
From string to byte[]:
array = Convert.FromBase64String(result);
Looks okay, although the approach is strange. But use Encoding.ASCII.GetBytes() to convert the base64 string to byte[]. Base64 encoding only contains ASCII characters. Using Unicode gets you an extra 0 byte for each character.
Representing a string as a blob represented as a string is odd... any reason you can't just use the string directly?
The string is always unicode; it is the encoded bytes that change. Since base-64 is always <128, using unicode in the last part seems overkill (unless that is what the wire-format demands). Personally, I'd use UTF8 or ASCII for the last GetBytes so that each base-64 character only takes one byte.
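To make the size difference concrete, here is a minimal sketch. The input "hello" and the choice of UTF-8 for the first step are assumptions for illustration only; the point is the last two lines, where the same Base64 text is turned into bytes with ASCII and with Unicode (UTF-16).

```csharp
using System;
using System.Text;

// Hypothetical input; any string works.
string input = "hello";

// Step 1: pick an encoding for the source text (UTF-8 here, by assumption).
byte[] textBytes = Encoding.UTF8.GetBytes(input);

// Step 2: Base64-encode those bytes; the result contains only ASCII characters.
string base64 = Convert.ToBase64String(textBytes);

// Step 3: turn the Base64 text into bytes for the wire.
byte[] asAscii = Encoding.ASCII.GetBytes(base64);    // one byte per character
byte[] asUtf16 = Encoding.Unicode.GetBytes(base64);  // two bytes per character

Console.WriteLine(base64);
Console.WriteLine($"ASCII: {asAscii.Length} bytes, UTF-16: {asUtf16.Length} bytes");
```

The UTF-16 array is twice as long, and every second byte in it is zero, which is the "extra 0 byte" mentioned above.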
All strings in .NET are Unicode. This code will produce a valid result, but the consumer of the Base64 string must also expect Unicode (UTF-16) bytes.
Yes, it would output a base64-encoded string of the UTF-16 little-endian representation of your source string. Keep in mind that, AFAIK, it's not really common to use UTF-16 in base64; ASCII or UTF-8 is normally used. However, the important thing here is that the sender and the receiver agree on which encoding must be used.
I don't understand why you reconvert the base64 string into an array of bytes: base64 is used to avoid encoding incompatibilities when transmitting, so you should keep it as a string and output it in the format required by the protocol you use to transmit the data. And, as Marc said, it's definitely overkill to use UTF-16 for that purpose, since base64 includes only 64 characters, all under 128.
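A small sketch of why the sender and the receiver have to agree on the text encoding. The string "hi" is a made-up example; the sender uses Encoding.Unicode as in the question, and the receiver decodes the same bytes once with the matching encoding and once with UTF-8.

```csharp
using System;
using System.Text;

string original = "hi";

// Sender: UTF-16LE bytes (what Encoding.Unicode produces), then Base64.
string wire = Convert.ToBase64String(Encoding.Unicode.GetBytes(original));

// Receiver: undo Base64, then decode the bytes as text.
byte[] received = Convert.FromBase64String(wire);
string sameEncoding = Encoding.Unicode.GetString(received);
string otherEncoding = Encoding.UTF8.GetString(received);

Console.WriteLine(sameEncoding == original);   // matching encoding restores the text
Console.WriteLine(otherEncoding == original);  // mismatched encoding does not
```

With the mismatched encoding, the zero bytes of the UTF-16 data end up as NUL characters inside the decoded string, so the comparison fails even though the Base64 step itself worked.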
Related
I get strange results when converting byte array to string and then converting the string back to byte array.
Try this:
byte[] b = new byte[1];
b[0] = 172;
string s = Encoding.ASCII.GetString(b);
byte[] b2 = Encoding.ASCII.GetBytes(s);
MessageBox.Show(b2[0].ToString());
And the result for me is not 172 as I'd expect but... 63.
Why does it happen?
Because ASCII only contains values up to 127.
When faced with binary data which is invalid for the given encoding, Encoding.GetString can provide a replacement character, or throw an exception. Here, it's using a replacement character of ?.
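Both behaviours can be seen side by side in the following sketch. The strict encoding is built with Encoding.GetEncoding and the exception fallbacks; the variable names are only illustrative.

```csharp
using System;
using System.Text;

byte[] data = { 172 };

// Default behaviour: bytes outside the ASCII range are replaced with '?'.
string replaced = Encoding.ASCII.GetString(data);

// Strict variant: request exception fallbacks instead of replacement.
Encoding strictAscii = Encoding.GetEncoding(
    "us-ascii",
    EncoderFallback.ExceptionFallback,
    DecoderFallback.ExceptionFallback);

bool threw = false;
try
{
    strictAscii.GetString(data);
}
catch (DecoderFallbackException)
{
    // The byte 172 cannot be decoded as ASCII, so the strict decoder refuses.
    threw = true;
}

Console.WriteLine($"replaced = \"{replaced}\", strict decoder threw = {threw}");
```

The replacement variant is where the 63 comes from: '?' is character code 63, and that is what gets encoded back into the byte array.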
It's not clear exactly what you're trying to achieve, but:
If you're converting arbitrary binary data to text, use Convert.ToBase64String instead; do not try to use an encoding, as you're not really representing text. You can use Convert.FromBase64String to then decode.
Encoding.ASCII is usually a bad choice, and binary data including a byte of 172 is certainly not ASCII text.
You need to work out which encoding you're actually using. Personally I dislike using Encoding.Default unless you really know the data is in the default encoding for the platform you're working on. If you get the choice, using UTF-8 is a good one.
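For the first point, a minimal sketch of the Base64 round trip using the byte from the question:

```csharp
using System;

byte[] original = { 172 };

// Binary data -> text that is safe to store or transmit.
string text = Convert.ToBase64String(original);

// Text -> the exact same binary data.
byte[] decoded = Convert.FromBase64String(text);

Console.WriteLine($"{text} -> {decoded[0]}");
```

Unlike the ASCII round trip, the value 172 survives here, because Base64 is designed to represent arbitrary bytes rather than text.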
ASCII is a 7-bit encoding. If you look at the generated string, you'll see it contains "?", the placeholder for an unrecognized character. You might choose Encoding.Default instead.
ASCII is a seven-bit character encoding, so 172 falls outside its range. When the byte is converted to a string, it becomes "?", which is used for characters that cannot be represented.
I'm converting the encrypted text using UTF8, yet the resulting string has funny characters that I can't read, and I'm not sure whether I can send this text to the browser.
string message = "hello world";
var rsa = new RSACryptoServiceProvider(2048);
var c = new UTF8Encoding();
byte[] dataToEncrypt = c.GetBytes(message);
byte[] encryptedData = rsa.Encrypt(dataToEncrypt, false);
string output = c.GetString(encryptedData);
Console.WriteLine(output);
Console.ReadLine();
When I run the above, I get the following:
�VJI����J/;�>�:<�M����g�1�7�A.#�`J�s��~��)�Fn�����5�.���o���ҵ���jH3;G�<<��F�͗��~?�Y�#���j���6l{{�Y�$�]���nylz���X8u�\f�V1/�$�n+�\b��\b�fsAh՝G\n�\t���\b���6߇3����Ԕ���4��#هhI���'\0� T�n��|EϺ^7ú l��T\\!�w���QRWA%p��V\f��5�
I need to send this text back to the browser, or save it to a file and currently I'm not sure why I am getting these characters?
The problem is that you are taking an array of bytes that was not created by encoding text and using it as if it were. You can only decode data that was created by encoding; if you decode arbitrary data, you end up with garbage.
If you want the binary data produced by the encryption as a string, use for example base64 encoding:
string output = Convert.ToBase64String(encryptedData);
When you want to decrypt the data, use Convert.FromBase64String to get the byte array back, decrypt it, and use Encoding.UTF8.GetString to turn it back into the original string. Decoding works at that point because the data was created by encoding the string in the first place.
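The whole round trip might look roughly like the sketch below. Note the assumptions: it uses RSA.Create(2048) and RSAEncryptionPadding.Pkcs1 as stand-ins for the question's RSACryptoServiceProvider and Encrypt(data, false), and it encrypts and decrypts with the same key object purely for demonstration.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

string message = "hello world";

// Assumption: RSA.Create and PKCS#1 v1.5 padding stand in for the
// question's RSACryptoServiceProvider and Encrypt(data, false).
using RSA rsa = RSA.Create(2048);

// Text -> bytes (UTF-8) -> ciphertext bytes -> Base64 text for transport.
byte[] encrypted = rsa.Encrypt(Encoding.UTF8.GetBytes(message), RSAEncryptionPadding.Pkcs1);
string transport = Convert.ToBase64String(encrypted);

// Base64 text -> ciphertext bytes -> plaintext bytes -> text (UTF-8).
byte[] received = Convert.FromBase64String(transport);
byte[] decrypted = rsa.Decrypt(received, RSAEncryptionPadding.Pkcs1);
string roundTripped = Encoding.UTF8.GetString(decrypted);

Console.WriteLine(transport.Length);
Console.WriteLine(roundTripped);
```

Each step on the way back undoes exactly one step on the way out, in reverse order. UTF-8 is only ever applied to bytes that really are UTF-8 text.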
These two lines are pretending that the output of an RSA-encrypted UTF-8 sequence is a valid UTF-8 sequence:
var c = new UTF8Encoding();
string output = c.GetString(encryptedData);
But this is simply not the case: the RSA encryption maps byte values to other, (seemingly) arbitrary byte values. The resulting byte sequence doesn’t form a valid UTF-8 sequence (there is no reason to assume that it would), and thus cannot be treated as one.
If you merely want a readable (or HTTP sendable) representation of your data, then Base64 is the way to go, as shown in other answers. Fundamentally, though, you should probably read Joel’s article about The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!).
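To see that arbitrary bytes are not valid UTF-8, consider this sketch. The four bytes are a hand-picked stand-in for ciphertext (an assumption, not real RSA output); the strict UTF8Encoding instance is created with throwOnInvalidBytes set to true.

```csharp
using System;
using System.Linq;
using System.Text;

// Stand-in for ciphertext: the byte 0xFF never occurs in valid UTF-8.
byte[] notText = { 0xFF, 0x56, 0x4A, 0xC0 };

// Lenient decoding substitutes U+FFFD for invalid bytes, so data is lost.
string lossy = Encoding.UTF8.GetString(notText);
byte[] reEncoded = Encoding.UTF8.GetBytes(lossy);
bool survived = reEncoded.SequenceEqual(notText);

// A strict decoder reports the problem instead of hiding it.
var strictUtf8 = new UTF8Encoding(
    encoderShouldEmitUTF8Identifier: false,
    throwOnInvalidBytes: true);

bool threw = false;
try
{
    strictUtf8.GetString(notText);
}
catch (ArgumentException)
{
    // Invalid input surfaces as an ArgumentException (or a subclass of it).
    threw = true;
}

Console.WriteLine($"bytes survived lenient round trip: {survived}");
Console.WriteLine($"strict decoder threw: {threw}");
```

The lenient path is the dangerous one: it silently produces a string, but the original bytes can no longer be recovered from it, so decryption would fail later.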
Encrypting a string will result in a byte array that contains non-printable characters. You'll want to convert it to base64 to have a readable version of it.
I'm trying to convert UTF-8 to base64 string.
Example: I have "abcdef==" in UTF-8. It's in fact a "representation" of a base64 string.
How can I retrieve a "abcdef==" base64 string (note that I don't want a "abcdef==" "translation" from UTF-8, I want to get a string encoded in base64 which is "abcdef==").
EDIT
As my question seems to be unclear, here is a reformulation:
My byte array (let's say I name it A) is represented by a base64 string. Converting A to base64 gives me "abcdef==".
This string representation is sent through a socket in UTF-8 (note that the string representation is exactly the same in UTF-8 and base64). So I receive a UTF-8 message which contains "whatever/abcdef==/whatever" in UTF-8.
So I need to retrieve the base64 "abcdef==" string from this socket message in order to get A.
I hope this is more clear!
It's a little difficult to tell what you're trying to achieve, but assuming you're trying to get a Base64 string that when decoded is abcdef==, the following should work:
byte[] bytes = Encoding.UTF8.GetBytes("abcdef==");
string base64 = Convert.ToBase64String(bytes);
Console.WriteLine(base64);
This will output: YWJjZGVmPT0= which is abcdef== encoded in Base64.
Edit:
To decode a Base64 string, simply use Convert.FromBase64String(). E.g.
string base64 = "YWJjZGVmPT0=";
byte[] bytes = Convert.FromBase64String(base64);
At this point, bytes will be a byte[] (not a string). If we know that the byte array represents a string in UTF8, then it can be converted back to the string form using:
string str = Encoding.UTF8.GetString(bytes);
Console.WriteLine(str);
This will output the original input string, abcdef== in this case.
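For the reformulated question (pulling the Base64 text out of a UTF-8 socket message and recovering the byte array A), a sketch under stated assumptions: the message layout "whatever/.../whatever" is taken from the question, the payload "AQID" is a made-up example that decodes to the bytes 1, 2, 3, and the field position is hypothetical.

```csharp
using System;
using System.Text;

// Hypothetical message as received from the socket: the UTF-8 bytes of
// "whatever/AQID/whatever", where "AQID" is Base64 for the bytes 1, 2, 3.
byte[] socketBytes = Encoding.UTF8.GetBytes("whatever/AQID/whatever");

// 1. Decode the UTF-8 bytes to get the message text.
string messageText = Encoding.UTF8.GetString(socketBytes);

// 2. Cut out the Base64 field (assumed to be the second '/'-separated part).
string base64Field = messageText.Split('/')[1];

// 3. Decode the Base64 text to recover the original byte array A.
byte[] a = Convert.FromBase64String(base64Field);

Console.WriteLine(string.Join(",", a));
```

One caveat about the assumed layout: '/' is itself part of the standard Base64 alphabet, so a real protocol would need a delimiter that cannot occur in the payload, or a length prefix.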
I do not want to encode a string to a byte[]. I want to turn a string of hex numbers to a byte[]. How can I do that?
Note: I again repeat I do not want to use Encoding.UTF8.GetBytes() or any other encoding.
A sample string is detailed below:
0x42A2C6A046057454C2D1AB2CE5A0147ACF1E728E1888367CF3218A1D513C72E582DBDC7F8C4674777CA148E4EFA0B4944BB4998F446724D4F56D96B507EAE619
How can I convert this string to a byte[] of the numbers in the string.
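There is no unambiguous way to convert a string to a byte array; that's why you need to use the Encoding class. In your case, you can use Encoding.ASCII.GetBytes(), because you only have characters from the ASCII charset. Note, however, that this yields the character codes of the hex digits, not the numeric values they spell out; to get those values, each pair of hex digits has to be parsed as one byte.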
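If what is wanted is the numeric value of the hex digits, one possible approach is to parse two digits at a time with Convert.ToByte(string, 16). The helper name HexToBytes is hypothetical, and the input is a shortened prefix of the sample string. On .NET 5 and later, Convert.FromHexString should do the same job for input without the "0x" prefix.

```csharp
using System;

// Hypothetical helper: parses "0x"-prefixed or bare hex text into bytes.
static byte[] HexToBytes(string hex)
{
    // Drop an optional "0x" prefix.
    if (hex.StartsWith("0x", StringComparison.OrdinalIgnoreCase))
        hex = hex.Substring(2);

    if (hex.Length % 2 != 0)
        throw new FormatException("Hex string must contain an even number of digits.");

    var bytes = new byte[hex.Length / 2];
    for (int i = 0; i < bytes.Length; i++)
    {
        // Each pair of hex digits becomes one byte.
        bytes[i] = Convert.ToByte(hex.Substring(2 * i, 2), 16);
    }
    return bytes;
}

byte[] parsed = HexToBytes("0x42A2C6A0");
Console.WriteLine(BitConverter.ToString(parsed));
```

No text encoding is involved here: the string is treated as a written-out number, not as characters to be encoded.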
I'm trying to store a Gzip serialized object into Active Directory's "Extension Attribute", more info here. This field is a Unicode string according to its oMSyntax value of 64.
What is the most efficient way to store a binary blob as Unicode? Once I get this down, the rest is a piece of cake.
There are, of course, many ways of reliably packing an arbitrary byte array into Unicode characters, but none of them are very efficient. It is very unfortunate that ActiveDirectory would choose to use Unicode for data that is not textual in nature. It’s like using a string to represent a 32-bit integer, or like using Nutella to write a love letter.
My recommendation would be to “play it safe” and use an ASCII-based encoding such as base64. The reason I recommend this is because there is already a built-in .NET implementation for this:
var base64Encoded = Convert.ToBase64String(byteArray);
var original = Convert.FromBase64String(base64Encoded);
In theory you could come up with an encoding that is more efficient than this by making use of more of the Unicode character set. However, in order to do so reliably, you would need to know quite a bit about Unicode.
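To put the efficiency trade-off in numbers, here is a rough sketch. The 1000-byte payload is hypothetical, and the two-bytes-per-character figure assumes the string ends up stored as UTF-16, which may or may not match how the directory stores it.

```csharp
using System;

// Hypothetical 1000-byte payload (all zeros; only the length matters here).
byte[] blob = new byte[1000];
string encoded = Convert.ToBase64String(blob);

// Base64 emits 4 characters for every 3 input bytes, rounded up.
int base64Chars = encoded.Length;

// Assumption: each character is stored as one 2-byte UTF-16 code unit.
int utf16Bytes = base64Chars * sizeof(char);

Console.WriteLine($"{blob.Length} bytes -> {base64Chars} chars -> {utf16Bytes} bytes as UTF-16");
```

So under that assumption the blob takes up roughly 2.7 times its original size, which is the price of playing it safe with an ASCII-only alphabet.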
Normally, this would be the way to convert between bytes and Unicode text:
// string from bytes
string text = System.Text.Encoding.Unicode.GetString(bytes);
// bytes from string
byte[] roundTripped = System.Text.Encoding.Unicode.GetBytes(text);
EDIT:
But since not every possible byte sequence is a valid Unicode string, you should use a method that can create a string from an arbitrary byte sequence:
// string from bytes
Convert.ToBase64String(byteArray);
// bytes from string
Convert.FromBase64String(base64Encoded);
(Thanks to @Timwi who pointed this out!)
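A quick way to convince yourself of the difference is to push the same bytes through both routes. The three-byte array below is a made-up example; it was chosen with an odd length because UTF-16 text always occupies an even number of bytes.

```csharp
using System;
using System.Linq;
using System.Text;

// Odd length: not a whole number of UTF-16 code units.
byte[] blob = { 0x41, 0x00, 0x42 };

// Route 1: treat the bytes as UTF-16 text and convert back.
byte[] viaUtf16 = Encoding.Unicode.GetBytes(Encoding.Unicode.GetString(blob));

// Route 2: Base64-encode and decode.
byte[] viaBase64 = Convert.FromBase64String(Convert.ToBase64String(blob));

Console.WriteLine($"UTF-16 round trip intact: {viaUtf16.SequenceEqual(blob)}");
Console.WriteLine($"Base64 round trip intact: {viaBase64.SequenceEqual(blob)}");
```

The UTF-16 route cannot return the original three bytes, since its output always has an even length, whereas the Base64 route returns them unchanged.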