Which encoding should I use to write bbb to a file as exact bytes, so if the file were opened in a hex editor, its contents would be "99 59"?
The following methods created incorrect results, as listed:
Byte[] bbb = { 0x99, 0x59 };
string o = System.Text.Encoding.UTF32.GetString(bbb);
UTF32 (above) writes 'EF BF BD', UTF7 writes 'C2 99 59', UTF8 writes 'EF BF BD 59', Unicode writes 'E5 A6 99', ASCII writes '3F 59'
What encoding will produce the un-changed 8-bit bytes?
If you want bytes to be written unencoded to a file/stream, simply write them to the file/stream.
File.WriteAllBytes(@"d:\temp\test.bin", bbb);
or
stream.Write(bbb, 0, bbb.Length);
Don't encode them at all.
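A minimal sketch of that round trip, to check that the bytes really do land in the file unchanged. The class and method names (RawBytesDemo, WriteAndReadBack) are made up for illustration, and the temporary file path is an assumption; any writable path would do.

```csharp
using System;
using System.IO;

public static class RawBytesDemo
{
    // Writes the bytes to a file with no encoding step, reads them back,
    // and returns them as a space-separated hex string.
    public static string WriteAndReadBack(byte[] data)
    {
        string path = Path.GetTempFileName(); // assumption: a throwaway temp file
        try
        {
            File.WriteAllBytes(path, data);              // raw bytes, no Encoding involved
            byte[] roundTrip = File.ReadAllBytes(path);  // read the same bytes back
            return BitConverter.ToString(roundTrip).Replace("-", " ");
        }
        finally
        {
            File.Delete(path); // clean up the temp file
        }
    }
}
```

Calling RawBytesDemo.WriteAndReadBack(new byte[] { 0x99, 0x59 }) should give back "99 59", which is what a hex editor would show for the file.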
I'm wondering how I can find out how long a string is in bytes in C#. Does anyone know?
You can use encoding like ASCII to get a character per byte by using the System.Text.Encoding class.
or try this
System.Text.Encoding.Unicode.GetByteCount(yourString);
System.Text.Encoding.ASCII.GetByteCount(yourString);
From MSDN:
A String object is a sequential collection of System.Char objects that represent a string.
So you can use this:
var howManyBytes = yourString.Length * sizeof(Char);
Or
System.Text.Encoding.Unicode.GetByteCount(yourString);
Or
System.Text.Encoding.ASCII.GetByteCount(yourString);
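A short sketch of what these expressions measure. Length * sizeof(char) is the size of the in-memory UTF-16 form, so it matches Encoding.Unicode.GetByteCount, while other encodings can give a different count for the same string. The helper names (ByteCountDemo, Utf16Bytes, Utf8Bytes) are illustrative only.

```csharp
using System;
using System.Text;

public static class ByteCountDemo
{
    // Byte count of the in-memory UTF-16 representation (2 bytes per char).
    public static int Utf16Bytes(string s) => s.Length * sizeof(char);

    // Byte count the same string would occupy once encoded as UTF-8.
    public static int Utf8Bytes(string s) => Encoding.UTF8.GetByteCount(s);
}
```

For "h\u00E9llo" (five chars, one of them non-ASCII), Utf16Bytes gives 10 and Utf8Bytes gives 6.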
How many bytes a string will take depends on the encoding you choose (or is automatically chosen in the background without your knowledge). This sample code shows the difference:
void Main()
{
string text = "a🡪";
Console.WriteLine("{0,15} length: {1}", "String", text.Length);
PrintInfo(text, Encoding.ASCII); // Note that '🡪' cannot be encoded in ASCII, information loss will occur
PrintInfo(text, Encoding.UTF8); // This should always be your choice nowadays
PrintInfo(text, Encoding.Unicode);
PrintInfo(text, Encoding.UTF32);
}
void PrintInfo(string input, Encoding encoding)
{
byte[] bytes = encoding.GetBytes(input);
var info = new StringBuilder();
info.AppendFormat("{0,16} bytes: {1} (", encoding.EncodingName, bytes.Length);
info.AppendJoin(' ', bytes);
info.Append(')');
string decodedString = encoding.GetString(bytes);
info.AppendFormat(", decoded string: \"{0}\"", decodedString);
Console.WriteLine(info.ToString());
}
Output:
String length: 3
US-ASCII bytes: 3 (97 63 63), decoded string: "a??"
Unicode (UTF-8) bytes: 5 (97 240 159 161 170), decoded string: "a🡪"
Unicode bytes: 6 (97 0 62 216 106 220), decoded string: "a🡪"
Unicode (UTF-32) bytes: 8 (97 0 0 0 106 248 1 0), decoded string: "a🡪"
With respect to this tool, I need to convert hexadecimal data, irrespective of their combination to equivalent text. For example:
"HelloWorld" = 48656c6c6f576f726c64;
The solution needs to take into account that hexadecimal can be grouped in different lengths:
48656c6c 6f576f72 6c64
or
48 65 6c 6c 6f 57 6f 72 6c 64
All of the hexadecimal values supplied above read as HelloWorld when converted to text.
First, I would like to point out that this question has been asked many times on the web (here is one example). However, I am going to break this down step by step for you to hopefully teach you how to not only utilize your resources available on the web, but also how to solve your problem.
Overview: Converting from hexadecimal data to text that is able to be read by human beings is a straight-forward process in modern development languages; you clean the data (ensuring no illegal characters remain), then you convert down to the byte level so that you can work with the raw data. Finally, you'll convert that raw data into readable text utilizing a method that has already been created by Microsoft.
Important: Remember, for the conversion to work, you have to ensure you're converting in the same format that you started with:
ASCII -> ASCII: Works Great!
ASCII -> UTF7: Not so much...
Removing Illegal Characters: One of the first things you'll need to do is ensure the hexadecimal value that you're supplying doesn't contain any illegal characters. The simplest way to do this is to create an array of acceptable characters and then remove anything but these in a loop:
private string GetCleanHex(string hex) {
string legalCharacters = "0123456789ABCDEF";
string result = hex.ToUpper();
// Iterate over the original upper-cased string and strip every character that is not a hex digit.
foreach (char c in result) {
if (!legalCharacters.Contains(c))
result = result.Replace(c.ToString(), string.Empty);
}
return result;
}
Getting The Byte Array: Once you've cleaned out all illegal characters, you can now convert your hexadecimal string into a byte array. This is required to convert from hexadecimal to ASCII. This step was provided by the linked post above:
private byte[] GetBytesFromHex(string hex) {
// Two hex digits make one byte.
byte[] bytes = new byte[hex.Length / 2];
for (int i = 0; i < bytes.Length; i++)
bytes[i] = Convert.ToByte(hex.Substring(i * 2, 2), 16);
return bytes;
}
Converting To Text: Now that you've cleaned your data, and converted it to a byte[], you can now convert that byte data into ASCII. This can be done using a method available in Encoding.ASCII called GetString:
string text = Encoding.ASCII.GetString(bytes);
The Final Result: Plug all of this into your application and you'll have successfully converted hexadecimal data into clean, readable text:
string hex = GetCleanHex("506c 65 61736520 72 656164 20686f77 2074 6f 2061 73 6b 2e");
byte[] bytes = GetBytesFromHex(hex);
string text = Encoding.ASCII.GetString(bytes);
Console.WriteLine(text);
Console.ReadKey();
The code above will print the following text to the console:
Please read how to ask.
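As a more compact variant of the same clean / convert / decode steps, here is a sketch that assumes .NET 5 or later, where Convert.FromHexString is available. The class and method names (HexTextDemo, HexToAscii) are hypothetical, and it assumes the data is ASCII, as in the example above.

```csharp
using System;
using System.Linq;
using System.Text;

public static class HexTextDemo
{
    // Strips everything that is not a hex digit, then decodes the bytes as ASCII.
    public static string HexToAscii(string hex)
    {
        // Keep only 0-9, a-f, A-F so any grouping or separator is accepted.
        string clean = new string(hex.Where(Uri.IsHexDigit).ToArray());

        // Assumption: .NET 5+. Throws FormatException if the digit count is odd.
        byte[] bytes = Convert.FromHexString(clean);

        return Encoding.ASCII.GetString(bytes);
    }
}
```

Both HexToAscii("48656c6c 6f576f72 6c64") and HexToAscii("48 65 6c 6c 6f 57 6f 72 6c 64") should return "HelloWorld".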
I saw this code example:
using (FileStream fStream = File.Open(@"C:\myMessage.dat", FileMode.Create))
{
string msg = "Helloo";
byte[] msgAsByteArray = Encoding.Default.GetBytes(msg);
foreach (var a in msgAsByteArray)
{
Console.WriteLine($"a: {a}");
}
// Write byte[] to file.
fStream.Write(msgAsByteArray, 0, msgAsByteArray.Length);
// Reset internal position of stream.
fStream.Position = 0;
// Read the types from file and display to console.
Console.Write("Your message as an array of bytes: ");
byte[] bytesFromFile = new byte[msgAsByteArray.Length];
for (int i = 0; i < msgAsByteArray.Length; i++)
{
bytesFromFile[i] = (byte)fStream.ReadByte();
Console.Write(bytesFromFile[i]);
}
// Display decoded messages.
Console.Write("\nDecoded Message: ");
Console.WriteLine(Encoding.Default.GetString(bytesFromFile));
}
And the result of Console.WriteLine($"a: {a}") is this:
a: 72
a: 101
a: 108
a: 108
a: 111
a: 111
1.
I thought a byte[] is made up of individual bytes.
But each byte is printed as an integer.
Those numbers must correspond to ASCII characters.
In C#, does a byte array mean data represented in ASCII?
2.
Is the file myMessage.dat binary data made up of only 0s and 1s?
But when I open myMessage.dat in a text editor, it shows the text string Helloo. What's the reason for this?
A byte is an 8-bit integer with values from 0 to 255. Console.WriteLine prints each one as a plain decimal number; by providing a format string (https://learn.microsoft.com/en-us/dotnet/standard/base-types/standard-numeric-format-strings) you can output it as hex. You can use this answer to get the binary representation.
You explicitly converted "Helloo" to bytes with Encoding.Default.GetBytes(). That is kind of like converting each character to its ASCII value, but using the default encoding of your system.
Your text editor interprets the data in the file and displays it as best it can. If you put byte[] myBytes = new byte[] {0,7,12,3,9,30} into a file and open it with your text editor, you will get unreadable text, because "normal text" starts at around 32; the values before that are e.g. tabs, bells, line feeds and other special non-printable characters. See e.g. NonPrintableAscii
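To make the "same byte, different notation" point concrete, here is a small sketch that shows one byte as decimal, hex and binary. ByteViewDemo and Describe are names invented for this example.

```csharp
using System;

public static class ByteViewDemo
{
    // Formats one byte as decimal, two-digit hex, and eight-digit binary.
    public static string Describe(byte b)
    {
        string binary = Convert.ToString(b, 2).PadLeft(8, '0'); // base-2 digits, zero-padded
        return $"{b} 0x{b:X2} {binary}";                        // X2 = upper-case hex, 2 digits
    }
}
```

ByteViewDemo.Describe(72) returns "72 0x48 01001000": the same value that is stored in the file and that a text editor displays as H.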
How can I convert
the hex UTF-8 bytes E0 A4 A4 to the hex code point 0924?
ref: http://www.ltg.ed.ac.uk/~richard/utf-8.cgi?input=e0+a4+a4&mode=bytes
I need this because when I read Unicode data in C#, it is treated as single-byte sequences and displayed as 3 characters instead of 1, but I need it read as a 3-byte sequence (read 3 bytes and display a single character). I tried many solutions but didn't get the result.
If I can display or store a 3-byte UTF-8 character, then I don't need the conversion.
The scenario is like this:
string str=getivrresult();
In str I have a word with each character as a 3-byte UTF-8 sequence.
Edited:
string str="à¤¤";
// I want it as "त" in str.
Character त
Character name DEVANAGARI LETTER TA
Hex code point 0924
Decimal code point 2340
Hex UTF-8 bytes E0 A4 A4
Octal UTF-8 bytes 340 244 244
UTF-8 bytes as Latin-1 characters bytes à ¤ ¤
Thank You.
Use the GetString method in the Encoding class:
byte[] data = { 0xE0, 0xA4, 0xA4 };
string str = Encoding.UTF8.GetString(data);
The string now contains one character with the character code 0x924.
// Input: each UTF-8 byte was decoded as a separate single-byte (Latin-1) character.
string str = "à¤¤";
byte[] data = new byte[str.Length];
for (int i = 0; i < str.Length; i++)
{
// Each char is in the range 0-255 here, so its code is the original byte value.
data[i] = Convert.ToByte(str[i]);
}
// Output: decoding the 3-byte sequence as UTF-8 gives "त" in stp.
string stp = Encoding.UTF8.GetString(data);
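The same repair can be sketched without a manual loop by letting an encoding do the char-to-byte step. This assumes the text was mis-decoded as ISO-8859-1 (Latin-1), which maps characters 0-255 to bytes one-to-one; if it was mis-decoded as Windows-1252 instead, bytes in the 0x80-0x9F range would not survive this round trip. MojibakeDemo and RepairUtf8 are hypothetical names.

```csharp
using System.Text;

public static class MojibakeDemo
{
    // Recovers text whose UTF-8 bytes were each decoded as one Latin-1 character.
    public static string RepairUtf8(string misdecoded)
    {
        // Assumption: the wrong decoding step was ISO-8859-1.
        Encoding latin1 = Encoding.GetEncoding("ISO-8859-1");
        byte[] raw = latin1.GetBytes(misdecoded); // back to the original bytes
        return Encoding.UTF8.GetString(raw);      // decode them as UTF-8 this time
    }
}
```

MojibakeDemo.RepairUtf8("\u00E0\u00A4\u00A4") returns "\u0924", the single character DEVANAGARI LETTER TA.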