I do not want to encode a string to a byte[]. I want to turn a string of hex numbers to a byte[]. How can I do that?
Note: to repeat, I do not want to use Encoding.UTF8.GetBytes() or any other encoding.
A sample string is detailed below:
0x42A2C6A046057454C2D1AB2CE5A0147ACF1E728E1888367CF3218A1D513C72E582DBDC7F8C4674777CA148E4EFA0B4944BB4998F446724D4F56D96B507EAE619
How can I convert this string to a byte[] containing the numbers in the string?
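There is no unambiguous way to convert a string to a byte array, which is why the Encoding class exists. Your sample contains only characters from the ASCII charset, so Encoding.ASCII.GetBytes() would accept it, but note that it returns the character codes of the digits (for example 0x34 for '4'), not the numeric values they spell out. To get those values, each pair of hex digits has to be parsed into one byte.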
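If the goal is one byte per pair of hex digits, a minimal sketch could look like the following. HexParser and ParseHex are hypothetical names chosen for illustration, and the sketch assumes the input is an optional "0x" prefix followed by an even number of hex digits.

```csharp
using System;

public static class HexParser
{
    // Hypothetical helper: parses "0x42A2..." (prefix optional) into bytes.
    public static byte[] ParseHex(string hex)
    {
        if (hex.StartsWith("0x", StringComparison.OrdinalIgnoreCase))
            hex = hex.Substring(2);

        if (hex.Length % 2 != 0)
            throw new FormatException("Hex string must contain an even number of digits.");

        byte[] result = new byte[hex.Length / 2];
        for (int i = 0; i < result.Length; i++)
        {
            // Each pair of hex digits becomes one byte (base 16).
            result[i] = Convert.ToByte(hex.Substring(i * 2, 2), 16);
        }
        return result;
    }
}
```

For example, HexParser.ParseHex("0x42A2C6A0") gives the four bytes 0x42, 0xA2, 0xC6, 0xA0. On .NET 5 and later, Convert.FromHexString does the same job for input without the "0x" prefix.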
Related
I'm trying to convert a string like
<Root>á</Root>
to its UTF-8 string representation, like this
<Root>Ã¡</Root>
(Taken from this page: http://www.cafewebmaster.com/online_tools/utf8_encode)
But when I issue Encoding.UTF8.GetBytes(str) I get an array of utf bytes.
How can I convert those bytes to the string representation I'm after?
--
Thanks for pointing out that there is no string representation of a UTF-8 string.
Just to clarify my needs, I have to execute something like this in SQL Server 2008:
xmlAuditoria_Alta
'
<Out>utf8 char: Ã¡</Out>
'
This is the only way I have found so far to have this stored procedure correctly save the value
utf8 char: á
That's why I'm trying to convert from á to Ã¡.
Perhaps there's a more correct way to do it.
Your question is based on an erroneous premise.
<Root>Ã¡</Root>
is not the UTF-8 representation of your string. In fact that string is the UTF-8 bytes re-interpreted in some other single-byte 8 bit character set.
If you want to convert a C# string to UTF-8 then you do indeed write:
Encoding.UTF8.GetBytes(str)
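To illustrate the re-interpretation described above, here is a small sketch. It assumes the "other" character set is ISO-8859-1, which is only a guess (the answer does not say which single-byte set is involved), and MojibakeDemo is a hypothetical name.

```csharp
using System;
using System.Text;

public static class MojibakeDemo
{
    // Returns what the UTF-8 bytes of `text` look like when they are
    // (incorrectly) decoded as ISO-8859-1. Assumption: ISO-8859-1 is the
    // single-byte character set doing the misreading.
    public static string MisreadUtf8AsLatin1(string text)
    {
        byte[] utf8 = Encoding.UTF8.GetBytes(text);            // "á" becomes C3 A1
        Encoding latin1 = Encoding.GetEncoding("ISO-8859-1");
        return latin1.GetString(utf8);                          // C3 A1 becomes "Ã¡"
    }
}
```

Calling MojibakeDemo.MisreadUtf8AsLatin1("á") returns the two characters Ã¡ (U+00C3 followed by U+00A1), which is the garbled form rather than a "UTF-8 string".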
I'm trying to convert UTF-8 to base64 string.
Example: I have "abcdef==" in UTF-8. It's in fact a "representation" of a base64 string.
How can I retrieve a "abcdef==" base64 string (note that I don't want a "abcdef==" "translation" from UTF-8, I want to get a string encoded in base64 which is "abcdef==").
EDIT
As my question seems to be unclear, here is a reformulation:
My byte array (let's say I name it A) is represented by a base64 string. Converting A to base64 gives me "abcdef==".
This string representation is sent through a socket in UTF-8 (note that the string representation is exactly the same in UTF-8 and base64). So I receive a UTF-8 message which contains "whatever/abcdef==/whatever" in UTF-8.
So I need to retrieve the base64 "abcdef==" string from this socket message in order to get A.
I hope this is more clear!
It's a little difficult to tell what you're trying to achieve, but assuming you're trying to get a Base64 string that when decoded is abcdef==, the following should work:
byte[] bytes = Encoding.UTF8.GetBytes("abcdef==");
string base64 = Convert.ToBase64String(bytes);
Console.WriteLine(base64);
This will output: YWJjZGVmPT0= which is abcdef== encoded in Base64.
Edit:
To decode a Base64 string, simply use Convert.FromBase64String(). E.g.
string base64 = "YWJjZGVmPT0=";
byte[] bytes = Convert.FromBase64String(base64);
At this point, bytes will be a byte[] (not a string). If we know that the byte array represents a string in UTF8, then it can be converted back to the string form using:
string str = Encoding.UTF8.GetString(bytes);
Console.WriteLine(str);
This will output the original input string, abcdef== in this case.
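For the reformulated question (a Base64 field embedded in a UTF-8 message), one possible sketch is below. The framing is an assumption: the question uses '/' between fields, but '/' is itself part of the Base64 alphabet, so this sketch uses a hypothetical '|' delimiter instead. MessageDecoder and ExtractPayload are illustrative names.

```csharp
using System;
using System.Text;

public static class MessageDecoder
{
    // Hypothetical framing: "prefix|<base64>|suffix", received as UTF-8 bytes.
    public static byte[] ExtractPayload(byte[] utf8Message)
    {
        // Step 1: the bytes on the wire are UTF-8 text, so decode them.
        string message = Encoding.UTF8.GetString(utf8Message);

        // Step 2: pick out the Base64 field.
        string[] parts = message.Split('|');
        if (parts.Length != 3)
            throw new FormatException("Expected exactly three '|'-separated fields.");

        // Step 3: the field is Base64 text; decode it to the original bytes.
        return Convert.FromBase64String(parts[1]);
    }
}
```

With the message "whatever|AQIDBA==|whatever", ExtractPayload returns the bytes 1, 2, 3, 4.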
I created a webservice which returns a (binary) file. Unfortunately, I cannot use byte[] so I have to convert the byte array to a string.
What I do at the moment is the following (but it does not work):
Convert file to string:
byte[] arr = File.ReadAllBytes(fileName);
System.Text.UnicodeEncoding enc = new System.Text.UnicodeEncoding();
string fileAsString = enc.GetString(arr);
To check if this works properly, I convert it back via:
System.Text.UnicodeEncoding enc = new System.Text.UnicodeEncoding();
byte[] file = enc.GetBytes(fileAsString);
But at the end, the original byte array and the byte array created from the string aren't equal. Do I have to use another method to read the file to a byte array?
Use Convert.ToBase64String to convert it to text, and Convert.FromBase64String to convert back again.
Encoding is used to convert from text to a binary representation, and from a binary representation of text back to text again. In this case you don't have a binary representation of text - you just have arbitrary binary data... so Encoding is inappropriate. Even if you use an encoding which can "sort of" handle any binary data (e.g. ISO Latin 1) you'll find that many ways of transmitting text will fail when you've got control characters etc.
Base64 encoding will give you text which is just ASCII, and much easier to handle.
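A short sketch of the difference, using hypothetical helper names: Base64 round-trips arbitrary bytes, while pushing arbitrary bytes through a text encoding such as UTF-16 can lose data.

```csharp
using System;
using System.Linq;
using System.Text;

public static class BinaryAsText
{
    // Arbitrary bytes -> ASCII-only text, and back again.
    public static string ToText(byte[] data) => Convert.ToBase64String(data);
    public static byte[] FromText(string text) => Convert.FromBase64String(text);

    // Checks whether bytes survive being treated as UTF-16 text.
    // Shown only to demonstrate why Encoding is the wrong tool here.
    public static bool SurvivesUtf16RoundTrip(byte[] data)
    {
        var enc = new UnicodeEncoding();
        byte[] back = enc.GetBytes(enc.GetString(data));
        return data.SequenceEqual(back);
    }
}
```

For instance, the three bytes { 0x41, 0x00, 0x42 } come back unchanged from FromText(ToText(...)), but SurvivesUtf16RoundTrip returns false for them, because an odd number of bytes cannot be a whole sequence of 16-bit code units.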
I am getting a character from an EMF record using Encoding.Unicode.GetString, and the resulting string contains only one character but has two bytes. I don't know anything about the encoding scheme or the multi-byte character set. I want to convert that character to its equivalent single hexadecimal value. Can you help me with this?
It's not clear what you mean. A char in C# is a 16-bit unsigned value. If you've got a binary data source and you want to get Unicode characters, you should use an Encoding to decode the binary data into a string, that you can access as a sequence of char values.
You can convert a char to a hex string by first converting it to an integer, and then using the X format specifier like this:
char c = '\u0123';
string hex = ((int)c).ToString("X4"); // Now hex = "0123"
Now, that leaves one more issue: surrogate pairs. Values which aren't in the Basic Multilingual Plane (U+0000 to U+FFFF) are represented by two UTF-16 code units - a high surrogate and a low surrogate. You can use the char.IsSurrogate* methods to check for surrogate pairs... although it's harder (as far as I can see) to then convert a surrogate pair into a UCS-4 value. If you're lucky, you won't need to deal with this... if you're happy converting your binary data into a sequence of UTF-16 code units instead of strict UCS-4 values, you don't need to worry.
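For the surrogate-pair case, the built-in char.ConvertToUtf32 may be enough. The sketch below (CodePoints is a hypothetical name) walks a string and collects one code point per character or surrogate pair. It assumes well-formed UTF-16; ConvertToUtf32 throws an ArgumentException on an unpaired surrogate.

```csharp
using System;
using System.Collections.Generic;

public static class CodePoints
{
    // Enumerates the Unicode code points (UTF-32 values) of a UTF-16 string.
    public static List<int> FromString(string s)
    {
        var result = new List<int>();
        for (int i = 0; i < s.Length; i++)
        {
            // Combines a surrogate pair starting at index i into one value;
            // for a non-surrogate char it simply returns that char's value.
            result.Add(char.ConvertToUtf32(s, i));

            if (char.IsHighSurrogate(s[i]))
                i++; // skip the low surrogate that was just consumed
        }
        return result;
    }
}
```

Each value can then be formatted with ToString("X4") or ToString("X6") as needed; for "a\uD83D\uDE00" the result is 0x61 followed by 0x1F600.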
EDIT: Given your comments, it's still not entirely clear what you've got to start with. You say you've got two bytes... are they separate, or in a byte array? What do they represent? Text in a particular encoding, presumably... but which encoding? Once you know the encoding, you can convert a byte array into a string easily:
byte[] bytes = ...;
// For example, if your binary data is UTF-8
string text = Encoding.UTF8.GetString(bytes);
char firstChar = text[0];
string hex = ((int)firstChar).ToString("X4");
If you could edit your question to give more details about your actual situation, it would be a lot easier to help you get to a solution. If you're generally confused about encodings and the difference between text and binary data, you might want to read my article about it.
Try this:
System.Text.Encoding.Unicode.GetBytes(theChar.ToString())
.Aggregate("", (agg, val) => agg + val.ToString("X2"));
However, since you don't specify exactly which encoding the character is in, this could fail. Further, you don't make it very clear whether you want the output to be a string of hex chars or bytes. I'm guessing the former, since I'd guess you want to generate HTML. Let me know if any of this is wrong.
I created an extension method to convert a Unicode or non-Unicode string to a hex string.
I'm sharing it here for anyone who may find it useful.
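using System;
using System.Linq;
using System.Text;

public static class StringHelper
{
    public static string ToHexString(this string str)
    {
        // Use UTF-8 when the string has characters above 255, otherwise the system default encoding.
        byte[] bytes = str.IsUnicode() ? Encoding.UTF8.GetBytes(str) : Encoding.Default.GetBytes(str);
        // BitConverter yields "AB-CD-EF"; strip the dashes to get "ABCDEF".
        return BitConverter.ToString(bytes).Replace("-", string.Empty);
    }

    public static bool IsUnicode(this string input)
    {
        const int maxAnsiCode = 255;
        return input.Any(c => c > maxAnsiCode);
    }
}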
Get thee to StringInfo:
http://msdn.microsoft.com/en-us/library/system.globalization.stringinfo.aspx
http://msdn.microsoft.com/en-us/library/8k5611at.aspx
The .NET Framework supports text elements. A text element is a unit of text that is displayed as a single character, called a grapheme. A text element can be a base character, a surrogate pair, or a combining character sequence. The StringInfo class provides methods that allow your application to split a string into its text elements and iterate through the text elements. For an example of using the StringInfo class, see String Indexing.
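As a rough illustration of the StringInfo approach, the sketch below splits a string into text elements. TextElements is a hypothetical wrapper name; StringInfo.GetTextElementEnumerator and TextElementEnumerator.GetTextElement are the framework calls.

```csharp
using System.Collections.Generic;
using System.Globalization;

public static class TextElements
{
    // Splits a string into text elements (base character plus any
    // combining marks, or a surrogate pair, kept together).
    public static List<string> Split(string s)
    {
        var elements = new List<string>();
        TextElementEnumerator e = StringInfo.GetTextElementEnumerator(s);
        while (e.MoveNext())
        {
            elements.Add(e.GetTextElement());
        }
        return elements;
    }
}
```

For "e\u0301x" (an e followed by a combining acute accent, then x), this yields two elements even though the string has three char values.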
I'm trying to write a function that converts a string to a base64 byte array. I've tried with this approach:
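public byte[] stringToBase64ByteArray(String input)
{
    // Encode the text as UTF-16 bytes, then Base64-encode those bytes.
    byte[] ret = System.Text.Encoding.Unicode.GetBytes(input);
    string s = Convert.ToBase64String(ret);
    // Encode the Base64 text itself as UTF-16 bytes.
    ret = System.Text.Encoding.Unicode.GetBytes(s);
    return ret;
}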
Would this function produce a valid result (provided that the string is in unicode)?
Thanks!
You can use:
From byte[] to string:
byte[] array = somebytearray;
string result = Convert.ToBase64String(array);
From string to byte[]:
array = Convert.FromBase64String(result);
Looks okay, although the approach is strange. But use Encoding.ASCII.GetBytes() to convert the base64 string to byte[]. Base64 encoding only contains ASCII characters. Using Unicode gets you an extra 0 byte for each character.
Representing a string as a blob represented as a string is odd... any reason you can't just use the string directly?
The string is always unicode; it is the encoded bytes that change. Since base-64 is always <128, using unicode in the last part seems overkill (unless that is what the wire-format demands). Personally, I'd use UTF8 or ASCII for the last GetBytes so that each base-64 character only takes one byte.
All strings in .NET are Unicode. This code will produce a valid result, but the consumer of the Base64 string should also be Unicode-enabled.
Yes, it would output a base64-encoded string of the UTF-16 little-endian representation of your source string. Keep in mind that, AFAIK, it's not really common to use UTF-16 in base64; ASCII or UTF-8 is normally used. However, the important thing here is that the sender and the receiver agree on which encoding must be used.
I don't understand why you reconvert the base64 string into an array of bytes: base64 is used to avoid encoding incompatibilities when transmitting, so you should keep it as a string and output it in the format required by the protocol you use to transmit the data. And, as Marc said, it's definitely overkill to use UTF-16 for that purpose, since base64 includes only 64 characters, all under 128.
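Pulling the suggestions in these answers together, a variant of the function might look like this. It is only a sketch: the choice of UTF-8 for the source text is an assumption (the original used UTF-16), and Base64Helper and its method names are hypothetical. Whatever encoding is used, sender and receiver must agree on it.

```csharp
using System;
using System.Text;

public static class Base64Helper
{
    // Assumption: UTF-8 for the source text, ASCII for the Base64 text.
    public static byte[] ToBase64Bytes(string input)
    {
        byte[] raw = Encoding.UTF8.GetBytes(input);
        string base64 = Convert.ToBase64String(raw);
        // Base64 characters are all below 128, so ASCII gives one byte per character.
        return Encoding.ASCII.GetBytes(base64);
    }

    // Reverses ToBase64Bytes under the same encoding assumptions.
    public static string FromBase64Bytes(byte[] base64Bytes)
    {
        string base64 = Encoding.ASCII.GetString(base64Bytes);
        return Encoding.UTF8.GetString(Convert.FromBase64String(base64));
    }
}
```

If the transport accepts text, the intermediate Base64 string can be sent as-is and the final GetBytes step dropped.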