Best way to decode hex sequence of unicode characters to string

I'm working with C# and .NET.
I would like to know how to convert a Unicode escape string like "\u1D0EC"
(note that it's above "\uFFFF") to its symbol: "𝃬".
Thanks in advance!

That value is a UTF-32 code point. .NET and Windows encode Unicode as UTF-16, so you'll have to translate. UTF-16 uses "surrogate pairs" to handle code points above 0xFFFF, an approach similar to the multi-byte sequences of UTF-8. The first code unit of the pair is in the range 0xD800..0xDBFF, the second in 0xDC00..0xDFFF. Try this sample code to see that at work:
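using System;
using System.Text;

class Program {
    static void Main(string[] args) {
        // Parse the hex digits as a 32-bit code point.
        uint utf32 = uint.Parse("1D0EC", System.Globalization.NumberStyles.HexNumber);
        // Decode the four bytes as UTF-32 (assumes a little-endian machine).
        string s = Encoding.UTF32.GetString(BitConverter.GetBytes(utf32));
        // Print each UTF-16 code unit of the resulting surrogate pair.
        foreach (char c in s.ToCharArray()) {
            Console.WriteLine("{0:X}", (uint)c);
        }
        Console.ReadLine();
    }
}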
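The surrogate ranges quoted above follow from simple arithmetic, which can be sketched by hand. This is only an illustration: the class and method names (SurrogateSketch, ToSurrogatePair) are made up for this example, and in real code char.ConvertFromUtf32 already does the same job.

```csharp
using System;

public static class SurrogateSketch
{
    // Splits a supplementary code point (U+10000..U+10FFFF) into its
    // UTF-16 surrogate pair. Hypothetical helper, for illustration only.
    public static (char High, char Low) ToSurrogatePair(int codePoint)
    {
        if (codePoint < 0x10000 || codePoint > 0x10FFFF)
            throw new ArgumentOutOfRangeException(nameof(codePoint));

        int offset = codePoint - 0x10000;             // 20-bit value
        char high = (char)(0xD800 + (offset >> 10));  // top 10 bits
        char low = (char)(0xDC00 + (offset & 0x3FF)); // bottom 10 bits
        return (high, low);
    }
}
```

For 0x1D0EC this yields the pair D834, DCEC, which is the same string that char.ConvertFromUtf32(0x1D0EC) returns.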

Convert each sequence with int.Parse(String, NumberStyles) and char.ConvertFromUtf32 (NumberStyles lives in System.Globalization):
string s = @"\U1D0EC";
string converted = char.ConvertFromUtf32(int.Parse(s.Substring(2), NumberStyles.HexNumber));
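If a string contains several such sequences mixed with ordinary text, the same two calls can be applied per match. The sketch below assumes an escape is a backslash, then u or U, then 4 to 8 hex digits; that pattern and the names EscapeDecoder and Decode are assumptions for this example, not part of any library. Because the digit count is variable, a hex-looking letter directly after an escape will be swallowed by the match, and char.ConvertFromUtf32 throws for values above 10FFFF or inside the surrogate range.

```csharp
using System.Globalization;
using System.Text.RegularExpressions;

public static class EscapeDecoder
{
    // Assumed escape shape: \u or \U followed by 4 to 8 hex digits.
    private static readonly Regex Escape =
        new Regex(@"\\[uU]([0-9A-Fa-f]{4,8})");

    // Replaces every escape in the input with the character it names.
    public static string Decode(string input)
    {
        return Escape.Replace(input, m =>
            char.ConvertFromUtf32(
                int.Parse(m.Groups[1].Value, NumberStyles.HexNumber)));
    }
}
```

It could be called like EscapeDecoder.Decode(@"x\u1D0EC!"), which keeps the surrounding text and replaces only the escape.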

I recently published my FOSS Unicode Converter on CodePlex (http://unicode.codeplex.com).
It converts any character to its hex code and any hex code back to the right character, and it includes a full character information database.
I use this code:
public static char ConvertHexToUnicode(string hexCode)
{
    // Note: a char holds a single UTF-16 code unit, so this only works for code points up to FFFF.
    if (hexCode != string.Empty)
        return (char)int.Parse(hexCode, NumberStyles.AllowHexSpecifier);
    return new char();
}
You can see the entire code at http://unicode.codeplex.com/

It appears you just want this in your code... you can type it as a string literal using the escape code \Uxxxxxxxx (note that this is a capital U, and there must be 8 digits). For this example, it would be: "\U0001D0EC".

Related

Translate string to emoji in c#

How can I turn such a string into an emoji? For example 1F600 => 😀, or 1F600 => \U0001F600, or 1F600 => 0x1F600.
I spent a few days on this but still don't understand how to translate a string like 1F600 into an emoji.

You simply need to convert the value to the code point then get the character at that code point:
var emoji = Char.ConvertFromUtf32(Convert.ToInt32("1F600", 16));

The string "1F600" is the hexadecimal representation of a Unicode code point. As it is not in the BMP, you need either UTF-32 or a UTF-16 surrogate pair to represent it.
Here is some code to perform the requested conversion using the UTF-32 representation.
Parse as a 32-bit integer:
var utf32Char = uint.Parse("1F600", NumberStyles.AllowHexSpecifier);
Convert this to a 4-element byte array in little-endian byte order:
var utf32Bytes = BitConverter.GetBytes(utf32Char);
if (!BitConverter.IsLittleEndian)
    Array.Reverse(utf32Bytes);
Finally, use Encoding.UTF32 to make a string from it:
var str = Encoding.UTF32.GetString(utf32Bytes);
Console.WriteLine(str);

Convert string with special characters to hex - C#

Hi, I'm trying to transform a string containing special characters like û and … into hex.
In my research and tests I almost succeeded using the following function:
public static string ToHex(this string input)
{
    char[] values = input.ToCharArray();
    string hex = "0x";
    string add = "";
    foreach (char c in values)
    {
        int value = Convert.ToInt32(c);
        add = String.Format("{0:X}", value).Length == 1 ?
            "0" + String.Format("{0:X}", value) + "00"
            : String.Format("{0:X}", value) + "00";
        hex += add;
    }
    return hex;
}
If I try to convert ´o¸sçPQ^ûË\u000f±d, it works and produces 0xB4006F00B8007300E700500051005E00FB00CB000F00B1006400.
But when I try to convert ´o¸sçPQ](ÂF\u0012…a, it fails and produces 0xB4006F00B8007300E700500051005D002800C200460012002026006100 instead of
0xB4006F00B8007300E700500051005D002800C2004600120026206100.
With a little debugging I saw that the string is transformed from
´o¸sçPQ](ÂF\u0012…a to ´o¸sçPQ](ÂF.a. I hope that isn't the problem, but I'm not sure.
EDIT
0xB4006F00B8007300E700500051005D002800C2004600120026206100 ´o¸sçPQ](ÂF…a CORRECT
0xB4006F00B8007300E700500051005D002800C200460012002026006100 ´o¸sçPQ](ÂF.a MY OUTPUT
0xB4006F00B8007300E700500051005D003D00CB0042000C00A50061006000AD004500BB00 ´o¸sçPQ]=ËB¥a`E» CORRECT
0xB4006F00B8007300E700500051005D003D00CB0042000C00A50061006000AD004500BB00 ´o¸sçPQ]=ËB¥a`E» MY OUTPUT
0xB4006F00B8007300E700500051005D002F00D30042001900B7006E006100 ´o¸sçPQ]/ÓB·na CORRECT
0xB4006F00B8007300E700500051005D002F00D30042001900B7006E006100 ´o¸sçPQ]/ÓB·na MY OUTPUT
0xB4006F00B8007300E700500051005F001A20BC006B0021003500DD00 ´o¸sçPQ_‚¼k!5Ý CORRECT
0xB4006F00B8007300E700500051005F00201A00BC006B0021003500DD00 ´o¸sçPQ_'¼k!5Ý MY OUTPUT
0xB4006F00B8007300E700500051005D002F00EE006B00290014204E004100 ´o¸sçPQ]/îk)—NA CORRECT
0xB4006F00B8007300E700500051005D002F00EE006B0029002014004E004100 ´o¸sçPQ]/îk)-NA MY OUTPUT
0xB4006F00B8007300E700500051005D003800E600690036001C204C004F00 ´o¸sçPQ]8æi6“LO CORRECT
0xB4006F00B8007300E700500051005D003800E60069003600201C004C004F00 ´o¸sçPQ]8æi6"LO MY OUTPUT
0xB4006F00B8007300E700500051005D002F00F3006200390014204E004700C602 ´o¸sçPQ]/ób9—NGˆ CORRECT
0xB4006F00B8007300E700500051005D002F00F300620039002014004E0047002C600 ´o¸sçPQ]/ób9-NG^ MY OUTPUT
0xB4006F00B8007300E700500051005D003B00EE007200330078014100 ´o¸sçPQ];îr3ŸA CORRECT
0xB4006F00B8007300E700500051005D003B00EE0072003300178004100 ´o¸sçPQ];îr3YA MY OUTPUT
0xB4006F00B8007300E700500051005D003000F20064003E009D004B00 ´o¸sçPQ]0òd>K CORRECT
0xB4006F00B8007300E700500051005D003000F20064003E009D004B00 ´o¸sçPQ]0òd>?K MY OUTPUT
0xB4006F00B8007300E700500051005D002F00E60075003E00 ´o¸sçPQ]/æu> CORRECT
0xB4006F00B8007300E700500051005D002F00E60075003E00 ´o¸sçPQ]/æu> MY OUTPUT
0xB4006F00B8007300E700500051005D002F00EE006A003000DC024500 ´o¸sçPQ]/îj0˜E CORRECT
0xB4006F00B8007300E700500051005D002F00EE006A0030002DC004500 ´o¸sçPQ]/îj0~E MY OUTPUT
I thank you in advance for every reply or comment,
greetings.

This is due to endianness, and to the difference between the integer value of a char and its encoded bytes.
char cc = '…';
Console.WriteLine(cc);
// 2026 <-- note, hex value differs from byte representation shown below
Console.WriteLine(((int)cc).ToString("x"));
// 26200000
Console.WriteLine(BytesToHex(BitConverter.GetBytes((int)cc)));
// 2620
Console.WriteLine(BytesToHex(Encoding.GetEncoding("utf-16").GetBytes(new[] { cc })));
You should not treat chars as integers. There are many ways to encode strings; .NET uses UTF-16 internally. All encodings work with bytes, not integers, so explicitly converting chars to integers can lead to unexpected results like yours. Why not get the encoding you need and work with bytes via Encoding.GetBytes?
void Main()
{
    // output you expect 0xB4006F00B8007300E700500051005D002800C2004600120026206100
    Console.WriteLine(BytesToHex(Encoding.GetEncoding("utf-16").GetBytes("´o¸sçPQ](ÂF\u0012…a")));
}
public static string BytesToHex(byte[] bytes)
{
    // whatever way to convert bytes to hex
    return "0x" + BitConverter.ToString(bytes).Replace("-", "");
}
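To make the byte-order point concrete, here is a sketch of the loop from the question with each char written as two bytes, low byte first, which is the order little-endian UTF-16 uses. The names HexSketch and ToHexPerChar are made up for this example; Encoding.GetBytes as shown above remains the simpler route.

```csharp
using System.Text;

public static class HexSketch
{
    // Writes every UTF-16 code unit as four hex digits, low byte first.
    // Hypothetical helper mirroring the loop in the question.
    public static string ToHexPerChar(string input)
    {
        var sb = new StringBuilder("0x");
        foreach (char c in input)
        {
            sb.Append((c & 0xFF).ToString("X2")); // low byte
            sb.Append((c >> 8).ToString("X2"));   // high byte
        }
        return sb.ToString();
    }
}
```

With this, '…' (U+2026) becomes 2620 rather than 202600, and characters below U+0100 such as 'û' still come out as FB00.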

C# - Replace Chars with its Unicode instance

I'm developing an Android application that reads books in JSON format. To create such books I needed a desktop application for convenience, and I chose C#.
First of all, my native language has lots of characters that must be encoded in Unicode rather than ASCII, for example:
[ə ç ş ğ ö ü and so on]
My problem is that JSON has trouble with some of these characters, so I need to use their Unicode escape form instead. For instance:
string text = "asdsdas";
text = ConvertToUnicode(text); // -> /u231/u213/u123...
I tried many ways to achieve this in JavaScript but couldn't. Please help me solve this in C#. Thanks in advance; any suggestion is welcome :).

You can define an extension method:
public static class Extension {
    public static string ToUnicodeString(this string str) {
        StringBuilder sb = new StringBuilder();
        foreach (var c in str) {
            sb.Append("\\u" + ((int) c).ToString("X4"));
        }
        return sb.ToString();
    }
}
which can be called like myString.ToUnicodeString()
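Since the goal in the question is JSON text, a variant that leaves plain ASCII alone and escapes only the remaining characters may be closer to what is wanted. This is a sketch under that assumption; the names JsonEscapeSketch and EscapeNonAscii are invented here, and it does not handle the other escaping JSON requires (quotes, backslashes, control characters).

```csharp
using System.Text;

public static class JsonEscapeSketch
{
    // Keeps ASCII characters as they are and writes every other UTF-16
    // code unit as \uXXXX. Hypothetical helper, not a full JSON encoder.
    public static string EscapeNonAscii(string str)
    {
        var sb = new StringBuilder(str.Length);
        foreach (char c in str)
        {
            if (c < 0x80)
                sb.Append(c);
            else
                sb.Append("\\u").Append(((int)c).ToString("X4"));
        }
        return sb.ToString();
    }
}
```

Characters outside the BMP are stored as two code units in a .NET string, so they come out as two \u escapes, which is also how JSON expects them.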

Base64 in C# and PL/SQL?

In PL/SQL, how can I convert a string (a long HTML string with newlines, tags, etc.) to Base64 that is easy to decode in C#?
In C# there are:
Convert.ToBase64String()
Convert.ToBase64CharArray()
BitConverter.ToString()
which one is compatible with PL/SQL
utl_encode.base64_encode();
?
I welcome any other suggestions :-)

You'll probably want to use this method:
Convert.ToBase64String()
It returns a Base64-encoded string built from an array of unsigned 8-bit integers (bytes).
As an alternative, you can use Convert.ToBase64CharArray(), which writes the output into a character array; that is a bit odd but may be useful in certain circumstances.
The method BitConverter.ToString() returns a String, but the bytes are represented in Hexadecimal, not Base64 encoded.

I did it :-)
PL/SQL:
s1 varchar2(32767);
s2 varchar2(32767);
-- encode s1 to Base64
s2 := utl_raw.cast_to_varchar2(utl_encode.base64_encode(utl_raw.cast_to_raw(s1)));
-- decode s1 from Base64
s2 := utl_raw.cast_to_varchar2(utl_encode.base64_decode(utl_raw.cast_to_raw(s1)));
are compatible with this C#:
public static string ToBase64(string str)
{
    return Convert.ToBase64String(Encoding.UTF8.GetBytes(str));
}
//++++++++++++++++++++++++++++++++++++++++++++++
public static string FromBase64(string str)
{
    return Encoding.UTF8.GetString(Convert.FromBase64String(str));
}
hope you find it useful :-)
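One detail worth checking when exchanging values between the two sides: as far as I know, utl_encode.base64_encode wraps long output with line breaks, so the Base64 text arriving in C# may contain CR/LF characters. Convert.FromBase64String is documented to ignore white space, so the sketch below (names Base64Sketch, Encode and Decode are made up for this example, and UTF-8 on both sides is an assumption) decodes wrapped input unchanged.

```csharp
using System;
using System.Text;

public static class Base64Sketch
{
    // Encodes text as UTF-8 bytes, then as Base64 (no line breaks added).
    public static string Encode(string text)
    {
        return Convert.ToBase64String(Encoding.UTF8.GetBytes(text));
    }

    // Decodes Base64 to UTF-8 text; white space inside the input,
    // such as CR/LF line wrapping, is ignored by FromBase64String.
    public static string Decode(string base64)
    {
        return Encoding.UTF8.GetString(Convert.FromBase64String(base64));
    }
}
```

If the database character set is not UTF-8, the bytes produced by utl_raw.cast_to_raw will differ from Encoding.UTF8.GetBytes for non-ASCII text, so the encoding used in C# would need to match.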

SHA1 C# method equivalent in Perl?

I was given C# code and I'm trying to generate the equivalent SHA1 using Perl.
public string GetHashedPassword(string passkey)
{
    // Add a timestamp to the passkey and hash it using SHA1.
    // (Assigning to the parameter; redeclaring it as a local would not compile.)
    passkey = passkey + DateTime.UtcNow.ToString("yyyyMMddHH0000");
    using (SHA1 sha1 = new SHA1CryptoServiceProvider())
    {
        byte[] hashedPasskey =
            sha1.ComputeHash(Encoding.UTF8.GetBytes(passkey));
        return ConvertToHex(hashedPasskey);
    }
}
private string ConvertToHex(byte[] bytes)
{
    StringBuilder hex = new StringBuilder();
    foreach (byte b in bytes)
    {
        // Pad single-digit values so every byte yields two hex characters.
        if (b < 16)
        {
            hex.AppendFormat("0{0:X}", b);
        }
        else
        {
            hex.AppendFormat("{0:X}", b);
        }
    }
    return hex.ToString();
}
My Perl attempt at the equivalent:
use Digest::SHA1 qw( sha1_hex );
my $pass = "blahblah";
my ($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) = gmtime();
$year += 1900;
my $date = sprintf("%d%02d%02d%02d0000", $year, $mon+1, $mday, $hour);
my $passSha1 = sha1_hex($pass.$date);
# 9c55409372610f8fb3695d1c7c2e6945164a2578
I don't actually have any C# experience so I'm not able to test what is normally outputted from the C# code.
The code is supposed to be used as a checksum for a website but the one I'm providing is failing.
Edit: it also adds the UTC timestamp (yyyyMMDDHH0000) to the end of the pass before hashing so I've added that code in case the issue is there.

I do not know C# either. However, {0:X} formats hex digits using upper case letters. So, would
my $passSha1 = uc sha1_hex($pass);
help? (Assuming GetHashedPassword makes sense.)

The only difference I can see (from running the code under Visual Studio 2008) is that the C# code is returning the hex string with alphas in uppercase
D3395867D05CC4C27F013D6E6F48D644E96D8241
and the perl code is using lower case for alphas
d3395867d05cc4c27f013d6e6f48d644e96d8241
The format string used in the C# code is asking for uppercase ("X" as opposed to "x"):
hex.AppendFormat("{0:X}", b);
Maybe the code at the website is using a case sensitive comparison? I assume it would be trivial for you to convert the output from the CPAN function to uppercase before you submit it?

Could it be as simple as changing the uppercase 'X' in the AppendFormat call to a lowercase 'x'?

I think you're looking for Digest::SHA1

Your SHA-1 could have also just been:
BitConverter.ToString(SHA.ComputeHash(buffer)).Replace("-", "");
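A compact way to check the case difference discussed in the answers is to produce the digest in either case from one helper. This is a sketch, not the code the website uses: the names Sha1Sketch and Sha1Hex and the upperCase parameter are invented for this example, and it uses the "X2"/"x2" format strings so no manual zero padding is needed.

```csharp
using System.Security.Cryptography;
using System.Text;

public static class Sha1Sketch
{
    // Hashes the UTF-8 bytes of the text with SHA-1 and returns the
    // digest as 40 hex digits, upper or lower case as requested.
    public static string Sha1Hex(string text, bool upperCase)
    {
        using (SHA1 sha1 = SHA1.Create())
        {
            byte[] hash = sha1.ComputeHash(Encoding.UTF8.GetBytes(text));
            var sb = new StringBuilder(hash.Length * 2);
            foreach (byte b in hash)
                sb.Append(b.ToString(upperCase ? "X2" : "x2"));
            return sb.ToString();
        }
    }
}
```

Called with upperCase set to false it should match the output of Perl's sha1_hex for the same input bytes; with true it matches the C# method in the question.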
