How do I convert from unicode to single byte in C#?

How do I convert from unicode to single byte in C#?
This does not work:
int level = 1;
string argument;
// and then argument is assigned
if (argument[2] == Convert.ToChar(level))
{
// does not work
}
And this:
char test1 = argument[2];
char test2 = Convert.ToChar(level);
produces funky results: test1 can be 49 '1' while test2 will be 1 '' (the unprintable control character U+0001).

How do I convert from unicode to single byte in C#?
This question makes no sense, and the sample code just makes things worse.
Unicode is a mapping from characters to code points. The code points are numbered from 0x0 to 0x10FFFF, which is far more values than can be stored in a single byte.
And the sample code has an int, a string, and a char. There are no bytes anywhere.
What are you really trying to do?
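To make the size mismatch concrete, here is a small sketch (the Euro sign is just an assumed example; Windows-1252 is one possible single-byte code page):
char euro = '\u20AC';                      // the Euro sign; its code point is 8364, far too big for a byte
Console.WriteLine((int)euro);              // 8364
// A single-byte code page can hold it only because the code page defines its own mapping:
// Windows-1252 places € at 0x80. (On .NET Core/5+ this code page needs
// Encoding.RegisterProvider(CodePagesEncodingProvider.Instance) first.)
byte[] cp1252 = System.Text.Encoding.GetEncoding(1252).GetBytes(euro.ToString());
Console.WriteLine("0x{0:X2}", cp1252[0]);  // 0x80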

Use UnicodeEncoding.GetBytes().
UnicodeEncoding unicode = new UnicodeEncoding();
Byte[] encodedBytes = unicode.GetBytes(unicodeString);
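A minimal, self-contained sketch of that (the input string is just an assumed example, and Encoding.Unicode is the same UTF-16 encoding UnicodeEncoding uses; needs using System.Text):
string unicodeString = "It will cost €123";                     // assumed example input
byte[] encodedBytes = Encoding.Unicode.GetBytes(unicodeString); // UTF-16 bytes, two per character here
string roundTripped = Encoding.Unicode.GetString(encodedBytes); // decodes back to the original string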

char and string are always Unicode in .NET. You can't do it the way you're trying.
In fact, what are you trying to accomplish?

If you want to test whether the int level matches the char argument[2] then use
if (argument[2] == Convert.ToChar(level + (int)'0'))
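A short sketch of why the '0' offset matters (the argument value is just an assumed example):
int level = 1;
string argument = "-x1";                                             // assumed example: the third character is the digit '1'
bool withoutOffset = argument[2] == Convert.ToChar(level);           // false: compares '1' (49) with the control char U+0001
bool withOffset = argument[2] == Convert.ToChar(level + (int)'0');   // true: compares '1' (49) with '1' (49)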

Related

Convert a Binary String longer than 8 into a single unicode character

I need to convert a string into binary and back.
But when the input is a Unicode character, the resulting binary string is longer than 8 characters, and converting it back using Convert.ToByte causes an OverflowException.
I used the Convert.ToString method to convert the string into binary.
String input = "⌓"; // Unicode character
String binary = Convert.ToString(input[0], 2); // convert the character to binary
//->> Returns "10001100010011"
byte re = Convert.ToByte("10001100010011", 2); //This causes an Overflow Exception
//What should I do here? Is there a better reverse of Convert.ToString than Convert.ToByte?
Is there an alternative or something similar?
I think the problem here is that a byte is only 8 bits, and you are putting in a string with 14 bits in it. Try using Convert.ToUInt16() and declare re as UInt16, like so:
UInt16 re = Convert.ToUInt16("10001100010011", 2);
// UInt16 is large enough to hold the data
Console.WriteLine(re); // prints 8979
//unicode decimal 8979: http://www.codetable.net/decimal/8979
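To go back from that number to the original character, a cast to char is enough for values in the Basic Multilingual Plane, e.g.:
UInt16 re = Convert.ToUInt16("10001100010011", 2); // 8979
char back = (char)re;                              // '⌓', i.e. U+2313
Console.WriteLine(back);                           // prints ⌓ (if the console font has the glyph)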

Converting a string getting Wrong value

I wrote some simple code and need to get each number in a string as an integer,
but when I try to use Convert.ToInt32() it gives me another value.
Example:
string x="4567";
Console.WriteLine(x[0]);
The result will be 4, but if I try to use Convert:
Console.WriteLine(Convert.ToInt32(x[0]));
it gives me 52!
I tried using int.TryParse() and it's the same.
According to the docs:
The ToInt32(Char) method returns a 32-bit signed integer that represents the UTF-16 encoded code unit of the value argument. If value is not a low surrogate or a high surrogate, this return value also represents the Unicode code point of value.
In order to get 4, you'd have to convert it to a string before converting to int32:
Console.WriteLine(Convert.ToInt32(x[0].ToString()));
Alternatively, you can loop through all the characters in the string this way:
foreach (char c in x)
{
Console.WriteLine(c);
}
This will print 4, 5, 6, 7.
And, as suggested in an earlier answer, convert it to a string before converting to an integer so it doesn't return the character code.
A char is internally a 16-bit integer holding its character code (for digits and Latin letters this matches the ASCII value).
If you cast/convert (explicitly or implicitly) a char to int, you will get that code. Example: Convert.ToInt32('4') = 52.
But when you print it to the console, its ToString() method is used implicitly, so you are actually printing the character '4'. Example: Console.WriteLine(x[0]) is equivalent to Console.WriteLine("4").
Try using an ASCII letter so you will notice the difference clearly:
Console.WriteLine((char)('a')); // a
Console.WriteLine((int)('a')); // 97
Now play with char math:
Console.WriteLine((char)('a'+5)); // f
Console.WriteLine((int)('a'+5)); // 102
Bottom line, just use int digit = int.Parse(x[0].ToString());
Or the funny (but less secure) way: int digit = x[0] - '0';
x isn't x[0]; x[0] is the first char, '4', which is 52.
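Putting those suggestions together, a small sketch that extracts every digit of the assumed example string as an int:
string x = "4567";
foreach (char c in x)
{
    int digit = c - '0';          // '4' is 52 and '0' is 48, so this yields 4, 5, 6, 7
    Console.WriteLine(digit);
}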

C# Convert Int64 to String using a cast

How do you convert an integer to a string? It works the other way around but not this way.
string message;
Int64 message2;
message2 = (Int64)message[0];
If the message is "hello", the output is 104 as a number;
If I do
string message3 = (string)message2;
I get an error saying that you can't convert a long to a string. Why is this? The .ToString() method does not work because it just converts the number to a string, so it will still show as "104". Same with Convert.ToString(). How do I make it say "hello" again from 104? In C++ you can do casts like this, but not in C#.
message[0] gives the first letter of the string as a char, so you're casting a char to a long, not a string to a long.
Try casting it back to a char and then concatenating all the chars to get the entire string.
ToString() is working exactly correctly. Your error is in the conversion to integer.
How exactly do you expect to store a string composed of non-numeric characters in a long? You might be interested in BitConverter, if you want to treat numbers as byte arrays.
If you want to convert a numeric ASCII code to a string, try
((char)value).ToString()
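For example, assuming the 104 obtained above:
long message2 = 104;                           // the char code read from message[0]
string letter = ((char)message2).ToString();   // "h"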
Another alternative approach is to use the Encoding.ASCII.GetBytes method, as below:
string msg1 ="hello";
byte[] ba = System.Text.Encoding.ASCII.GetBytes(msg1);
//ba[0] = 104
//ba[1] = 101
//ba[2] = 108
//ba[3] = 108
//ba[4] = 111
string msg2 =System.Text.Encoding.ASCII.GetString(ba);
//msg2 = "hello"
Try this method:
string message3 = char.ConvertFromUtf32((int)message2);
104 is the value of "h" not "hello".
There is no integer representation of a string, only of a char. Therefore, as stated by others, 104 is not the value of "hello" (a string) but of 'h' (a char) (see the ASCII chart here).
I can't entirely think of why you'd want to convert a string to an int array and then back into a string, but the way to do it would be to run through the string, get the int value of each character, then convert the int values back into chars and concatenate them. So something like:
string str = "hello";
List<int> N = new List<int>();
// this creates the list of int values
for (int i = 0; i < str.Length; i++)
    N.Add((int)str[i]);
// and this joins it all back into a string
string newString = "";
for (int i = 0; i < str.Length; i++)
    newString += (char)N[i];
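A more compact sketch of the same round trip, assuming the same str variable (requires using System.Linq):
List<int> N2 = str.Select(c => (int)c).ToList();                    // string -> list of character codes
string newString2 = new string(N2.Select(n => (char)n).ToArray());  // codes -> chars -> string again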

How to prevent conversion of Windows-1252 argument into a Unicode string?

I've written my first COM classes. My unit tests work fine, but my first use of the COM objects has hit a snag.
The COM classes provide methods which accept a string, manipulate it and return a string. The consumer of the COM objects is a dBASE PLUS program.
When the input string contains common keyboard characters (ASCII 127 or lower), the COM methods work fine. However, if the string contains characters beyond the ASCII range, some of them get remapped from Windows-1252 to C#'s Unicode. This table shows the mapping that takes place: http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP1252.TXT
For example, if the dBASE program calls the COM object with:
oMyComObject.MyMethod("It will cost€123") where the € is hex 80,
the C# method receives it as Unicode:
public string MyMethod(string source)
{
// source is Unicode and now the Euro symbol is hex 20AC
...
}
I would like to avoid this remapping because I want the original hex content of the string.
I've tried adding the following to MyMethod to convert the string back to Windows-1252, but the Euro symbol gets lost because it becomes a question mark:
byte[] UnicodeBytes = Encoding.Unicode.GetBytes(source.ToString());
byte[] Win1252Bytes = Encoding.Convert(Encoding.Unicode, Encoding.GetEncoding(1252), UnicodeBytes);
string Win1252 = Encoding.GetEncoding(1252).GetString(Win1252Bytes);
Is there a way to prevent this conversion of the "source" parameter to Unicode? Or, is there a way to convert it 100% from Unicode back to Windows-1252?
Yes, I'm answering my own question. The answer by "Jigsore" put me on the right track, but I want to explain more clearly in case someone else makes the same mistake I made.
I eventually figured out that I had misdiagnosed the problem. dBASE was passing the string fine and C# was receiving it fine. It was how I checked the contents of the string that was in error.
This turnkey example builds on Jigsore's answer:
void Main()
{
string unicodeText = "\u20AC\u0160\u0152\u0161";
byte[] unicodeBytes = Encoding.Unicode.GetBytes(unicodeText);
byte[] win1252bytes = Encoding.Convert(Encoding.Unicode, Encoding.GetEncoding(1252), unicodeBytes);
for (int i = 0; i < win1252bytes.Length; i++)
Console.Write("0x{0:X2} ", win1252bytes[i]); // output: 0x80 0x8A 0x8C 0x9A
// win1252String represents the string passed from dBASE to C#
string win1252String = Encoding.GetEncoding(1252).GetString(win1252bytes);
Console.WriteLine("\r\nWin1252 string is " + win1252String); // output: Win1252 string is €ŠŒš
Console.WriteLine("looking at the code of the first character the wrong way: " + (int)win1252String[0]);
// output: looking at the code of the first character the wrong way: 8364
byte[] bytes = Encoding.GetEncoding(1252).GetBytes(win1252String[0].ToString());
Console.WriteLine("looking at the code of the first character the right way: " + bytes[0]);
// output: looking at the code of the first character the right way: 128
// Warning: if your input contains character codes larger in value than what a byte can hold
// (e.g. multi-byte Chinese characters), then you will need to look at more than just bytes[0].
}
The reason the first method was wrong is that casting (int)win1252String[0] (or the converse of casting an integer j to a character with (char)j) involves an implicit conversion with the Unicode character set C# uses.
I consider this resolved and would like to thank each person who took the time to comment or answer for their time and trouble. It is appreciated!
Actually you're doing the Unicode to Win-1252 conversion correctly, but you're performing an extra step. The original Win1252 codes are in the Win1252Bytes array!
Check the following code:
string unicodeText = "\u20AC\u0160\u0152\u0161";
byte[] unicodeBytes = Encoding.Unicode.GetBytes(unicodeText);
byte[] win1252bytes = Encoding.Convert(Encoding.Unicode, Encoding.GetEncoding(1252), unicodeBytes);
for (int i = 0; i < win1252bytes.Length; i++)
Console.Write("0x{0:X2} ", win1252bytes[i]);
The output shows the Win-1252 codes for the unicodeText string, you can check this by looking at the CP1252.TXT table.

C# Convert Char to Byte (Hex representation)

This seems to be an easy problem but I can't figure it out.
I need to convert the character < into a byte (hex representation), but if I use
byte b = Convert.ToByte('<');
I get 60 (decimal representation) instead of 3c.
60 == 0x3C.
You already have your correct answer but you're looking at it in the wrong way.
0x is the hexadecimal prefix.
3C is 3 × 16 + 12 = 60.
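In other words, the byte already holds the right value; only the textual representation differs, e.g.:
byte b = Convert.ToByte('<');
Console.WriteLine(b);                 // 60
Console.WriteLine(b.ToString("X2"));  // 3C
Console.WriteLine(b == 0x3C);         // True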
You could use the BitConverter.ToString method to convert a byte array to hexadecimal string:
string hex = BitConverter.ToString(new byte[] { Convert.ToByte('<') });
or simply:
string hex = Convert.ToByte('<').ToString("x2");
char ch2 = 'Z';
Console.Write("{0:X} ", Convert.ToUInt32(ch2)); // prints 5A
"I get 60 (decimal representation) instead of 3c."
No, you don't get any representation. You get a byte containing the value 60/3c in some internal representation. When you look at it, i.e., when you convert it to a string (explicitly with ToString() or implicitly), you get the decimal representation 60.
Thus, you have to make sure that you explicitly convert the byte to string, specifying the base you want. ToString("x"), for example will convert a number into a hexadecimal representation:
byte b = Convert.ToByte('<');
String hex = b.ToString("x");
You want to convert the numeric value to hex using ToString("x"):
string asHex = b.ToString("x");
However, be aware that your code to convert the "<" character to a byte will work for that particular character, but it won't work for characters outside the ANSI range (which won't fit in a single byte).
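For such characters, one option is to encode with a specific Encoding and format each resulting byte; a sketch assuming UTF-8:
char ch = '€';
byte[] bytes = System.Text.Encoding.UTF8.GetBytes(ch.ToString()); // { 0xE2, 0x82, 0xAC }
string hex = BitConverter.ToString(bytes);                        // "E2-82-AC"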
