I'm trying to convert UTF-8 to base64 string.
Example: I have "abcdef==" in UTF-8. It's in fact a "representation" of a base64 string.
How can I retrieve a "abcdef==" base64 string (note that I don't want a "abcdef==" "translation" from UTF-8, I want to get a string encoded in base64 which is "abcdef==").
EDIT
As my question seems to be unclear, here is a reformulation:
My byte array (let's say I name it A) is represented by a base64 string. Converting A to base64 gives me "abcdef==".
This string representation is sent through a socket in UTF-8 (note that the string representation is exactly the same in UTF-8 and base64). So I receive an UTF-8 message which contains "whatever/abcdef==/whatever" in UTF-8.
So I need to retrieve the base64 "abcedf==" string from this socket message in order to get A.
I hope this is more clear!
It's a little difficult to tell what you're trying to achieve, but assuming you're trying to get a Base64 string that when decoded is abcdef==, the following should work:
byte[] bytes = Encoding.UTF8.GetBytes("abcdef==");
string base64 = Convert.ToBase64String(bytes);
Console.WriteLine(base64);
This will output: YWJjZGVmPT0= which is abcdef== encoded in Base64.
Edit:
To decode a Base64 string, simply use Convert.FromBase64String(). E.g.
string base64 = "YWJjZGVmPT0=";
byte[] bytes = Convert.FromBase64String(base64);
At this point, bytes will be a byte[] (not a string). If we know that the byte array represents a string in UTF8, then it can be converted back to the string form using:
string str = Encoding.UTF8.GetString(bytes);
Console.WriteLine(str);
This will output the original input string, abcdef== in this case.
Related
I'm using json.net to read data sent in json format from a server. The server encodes all string-type data it sends in json as utf-8.
Now to read the data in c# I do something like this: string s = json.Value<string>("data");
I assume the string s is now in utf-8 format, whereas the default encoding for strings in c# is utf-16 (unicode).
To convert the string to unicode, would this be correct?
byte[] bytes = Encoding.Unicode.GetBytes(s);
string unicode = Encoding.UTF8.GetString(bytes);
What I want (I think) is the raw bytes from s and then pass that to the utf-8 decoder to get unicode, but I'm not sure what exactly Encoding.Unicode.GetBytes gives me, or what I should use instead.
There is no need to convert anything, since string objects in .NET are encoded in UTF-16.
If there is anything to change, you should change something where JSON.NET deserializes the string: you can't double parse it. The incoming JSON string is already interpreted for a specific encoding. You can't go back from there without the original bytes.
I'm converting the encrypted text using UTF8, yet the resulting string has funny characters that I can't read and not sure if I can send this text to the browser.
string message = "hello world";
var rsa = new RSACryptoServiceProvider(2048);
var c = new UTF8Encoding();
byte[] dataToEncrypt = c.GetBytes(message);
byte[] encryptedData = rsa.Encrypt(dataToEncrypt, false);
string output = c.GetString(encryptedData);
Console.WriteLine(output);
Console.ReadLine();
When I run the above, I get the following:
�VJI����J/;�>�:<�M����g�1�7�A.#�`J�s��~��)�Fn�����5�.���o���ҵ���jH3;G�<<��F�͗��~?�Y�#���j���6l{{�Y�$�]���nylz���X8u�\f�V1/�$�n+�\b��\b�fsAh՝G\n�\t���\b���6߇3����Ԕ���4��#هhI���'\0� T�n��|EϺ^7ú l��T\\!�w���QRWA%p��V\f��5�
I need to send this text back to the browser, or save it to a file and currently I'm not sure why I am getting these characters?
The problem is that you are taking an array of bytes that was not created by encoding text, and use it as if it was. You can only decode data that was created by encoding, if you decode any arbitrary data, you end up with garbage.
If you want the binary data produced by the encryption as a string, use for example base64 encoding:
string output = Convert.ToBase64String(encryptedData);
When you want to decrypt the data, use Convert.FromBase64String to get the byte array back, decrypt it, and use Encoding.UTF8.GetString to turn it back into the original string. There it will work do decode the data, because it was created by encoding the string from the beginning.
These two lines are pretending that the output of an RSA-encrypted UTF-8 sequence is a valid UTF-8 sequence:
var c = new UTF8Encoding();
string output = c.GetString(encryptedData);
But this is simply not the case: the RSA encryption maps byte values to other, (seemingly) arbitrary byte values. The resulting byte sequence doesn’t form a valid UTF-8 sequence (there is no reason to assume that it would), and thus cannot be treated as one.
If you merely want a readable (or HTTP sendable) representation of your data, then Base64 is the way to go, as shown in other answers. Fundamentally, though, you should probably read Joel’s article about The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!).
Encrypting a string will result in a byte array that contains non-printable characters. You'll want to convert it to base64 to have a readable version of it.
I do not want to encode a string to a byte[]. I want to turn a string of hex numbers to a byte[]. How can I do that?
Note: I again repeat I do not want to use Encoding.UTF8.GetBytes() or any other encoding.
A sample string is detailed below:
0x42A2C6A046057454C2D1AB2CE5A0147ACF1E728E1888367CF3218A1D513C72E582DBDC7F8C4674777CA148E4EFA0B4944BB4998F446724D4F56D96B507EAE619
How can I convert this string to a byte[] of the numbers in the string.
There is no unambiguous way to convert a string to a byte array, that's why you need to use the Encoding class. In your case, you can use Encoding.ASCII.GetBytes(), because you only have characters from the ASCII charset.
I created a webservice which returns a (binary) file. Unfortunately, I cannot use byte[] so I have to convert the byte array to a string.
What I do at the moment is the following (but it does not work):
Convert file to string:
byte[] arr = File.ReadAllBytes(fileName);
System.Text.UnicodeEncoding enc = new System.Text.UnicodeEncoding();
string fileAsString = enc.GetString(arr);
To check if this works properly, I convert it back via:
System.Text.UnicodeEncoding enc = new System.Text.UnicodeEncoding();
byte[] file = enc.GetBytes(fileAsString);
But at the end, the original byte array and the byte array created from the string aren't equal. Do I have to use another method to read the file to a byte array?
Use Convert.ToBase64String to convert it to text, and Convert.FromBase64String to convert back again.
Encoding is used to convert from text to a binary representation, and from a binary representation of text back to text again. In this case you don't have a binary representation of text - you just have arbitrary binary data... so Encoding is inappropriate. Even if you use an encoding which can "sort of" handle any binary data (e.g. ISO Latin 1) you'll find that many ways of transmitting text will fail when you've got control characters etc.
Base64 encoding will give you text which is just ASCII, and much easier to handle.
I'm trying to write a function that converts a string to a base64 byte array. I've tried with this approach:
public byte[] stringToBase64ByteArray(String input)
{
byte[] ret = System.Text.Encoding.Unicode.GetBytes(input);
string s = Convert.ToBase64String(input);
ret = System.Text.Encoding.Unicode.GetBytes(s);
return ret;
}
Would this function produce a valid result (provided that the string is in unicode)?
Thanks!
You can use:
From byte[] to string:
byte[] array = somebytearray;
string result = Convert.ToBase64String(array);
From string to byte[]:
array = Convert.FromBase64String(result);
Looks okay, although the approach is strange. But use Encoding.ASCII.GetBytes() to convert the base64 string to byte[]. Base64 encoding only contains ASCII characters. Using Unicode gets you an extra 0 byte for each character.
Representing a string as a blob represented as a string is odd... any reason you can't just use the string directly?
The string is always unicode; it is the encoded bytes that change. Since base-64 is always <128, using unicode in the last part seems overkill (unless that is what the wire-format demands). Personally, I'd use UTF8 or ASCII for the last GetBytes so that each base-64 character only takes one byte.
All strings in .NET are unicode. This code will produce valid result but the consumer of the BASE64 string should also be unicode enabled.
Yes, it would output a base64-encoded string of the UTF-16 little-endian representation of your source string. Keep in mind that, AFAIK, it's not really common to use UTF-16 in base64, ASCII or UTF-8 is normally used. However, the important thing here is that the sender and the receiver agree on which encoding must be used.
I don't understand why you reconvert the base64 string in array of bytes: base64 is used to avoid encoding incompatibilities when transmitting, so you should keep is as a string and output it in the format required by the protocol you use to transmit the data. And, as Marc said, it's definitely overkill to use UTF-16 for that purpose, since base64 includes only 64 characters, all under 128.