I am trying to convert an AS3 (ActionScript 3) function to C#.
This ActionScript function uses a class called ByteArray, which, as far as I know, is basically what it sounds like. I think it's similar to byte[] in C#. Anyway, I have tried my best to convert the code to C# using a MemoryStream, writing the bytes to it, and then returning a UTF-8 string, as you can see in my code below. However, I feel that my version doesn't do exactly what the ActionScript code does, and that is where my question comes in.
With those negative numbers being written into "_loc1_" (the ByteArray) and the call to "_loc1_.uncompress()", that's where I feel I am failing. Could someone help me convert this function so it's fully accurate?
On top of that, I would also like to ask whether what I did with the negative numbers in my C# code is equivalent to what the ActionScript code does. Would mean a lot (:
(Sorry if this isn't fully understandable.)
ActionScript Code:
private function p() : String
{
var _loc1_:ByteArray = new ByteArray();
_loc1_.writeByte(120);
_loc1_.writeByte(-38);
_loc1_.writeByte(99);
_loc1_.writeByte(16);
_loc1_.writeByte(12);
_loc1_.writeByte(51);
_loc1_.writeByte(41);
_loc1_.writeByte(-118);
_loc1_.writeByte(12);
_loc1_.writeByte(50);
_loc1_.writeByte(81);
_loc1_.writeByte(73);
_loc1_.writeByte(49);
_loc1_.writeByte(-56);
_loc1_.writeByte(13);
_loc1_.writeByte(48);
_loc1_.writeByte(54);
_loc1_.writeByte(54);
_loc1_.writeByte(14);
_loc1_.writeByte(48);
_loc1_.writeByte(46);
_loc1_.writeByte(2);
_loc1_.writeByte(0);
_loc1_.writeByte(45);
_loc1_.writeByte(-30);
_loc1_.writeByte(4);
_loc1_.writeByte(-16);
_loc1_.uncompress();
_loc1_.position = 0;
return _loc1_.readUTF();
}
My C# Code:
public string p()
{
MemoryStream loc1 = new MemoryStream();
loc1.WriteByte((byte)120);
loc1.WriteByte((byte)~-38);
loc1.WriteByte((byte)99);
loc1.WriteByte((byte)16);
loc1.WriteByte((byte)12);
loc1.WriteByte((byte)51);
loc1.WriteByte((byte)41);
loc1.WriteByte((byte)~-118);
loc1.WriteByte((byte)12);
loc1.WriteByte((byte)50);
loc1.WriteByte((byte)81);
loc1.WriteByte((byte)73);
loc1.WriteByte((byte)49);
loc1.WriteByte((byte)~-56);
loc1.WriteByte((byte)13);
loc1.WriteByte((byte)48);
loc1.WriteByte((byte)54);
loc1.WriteByte((byte)54);
loc1.WriteByte((byte)14);
loc1.WriteByte((byte)48);
loc1.WriteByte((byte)46);
loc1.WriteByte((byte)2);
loc1.WriteByte((byte)0);
loc1.WriteByte((byte)45);
loc1.WriteByte((byte)~-30);
loc1.WriteByte((byte)4);
loc1.WriteByte((byte)~-16);
loc1.Position = 0;
return Encoding.UTF8.GetString(loc1.ToArray());
}
1) In C#, bytes are unsigned. You cannot convert a signed byte to an unsigned byte with the complement operator (~): your intention is that the bit pattern stays identical, whereas the complement flips every bit. For example, (byte)~-38 is 37, not the 218 (0xDA) you need.
One simple way to convert is to mask with 0xFF: -37 & 0xFF = 219. There are other, mathematically equivalent ways, such as checking for negatives: sbyte sb = -37; byte b = (byte)(sb < 0 ? 256 + sb : sb);
2) The built-in System.IO.Compression namespace is lacking in a number of ways. For one, it doesn't support decompressing zlib data, which is what your byte array holds (at least before .NET 6, which added System.IO.Compression.ZLibStream). The best way is to use a third-party package from NuGet instead. The DotNetZip library does what you need, specifically the Ionic.Zlib.ZlibStream.UncompressBuffer function.
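For comparison, here is the masking from point 1) sketched in C++ (the other language used in this thread), which has the same signed/unsigned byte distinction. It only illustrates the arithmetic; it is not a port of the C# code, and the function names are made up for this example.

```cpp
#include <cassert>
#include <cstdint>

// Keep the bit pattern, drop the sign: -38 (...11011010 in two's complement) becomes 0xDA = 218.
std::uint8_t to_unsigned_byte(int value) {
    return static_cast<std::uint8_t>(value & 0xFF);
}

// Same result via "add 256 to negative values".
std::uint8_t to_unsigned_byte_checked(std::int8_t sb) {
    return static_cast<std::uint8_t>(sb < 0 ? 256 + sb : sb);
}

// What (byte)~value does instead: it flips every bit, so -38 becomes 37, not 218.
std::uint8_t complement_byte(int value) {
    return static_cast<std::uint8_t>(~value & 0xFF);
}
```

Positive values pass through both conversions unchanged, so the mask can be applied to every value in the list, not just the negative ones.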
(1)
@Jimmy has given you a good answer.
This is what he meant when he told you to "mask with 0xFF", so that your -38 becomes:
loc1.WriteByte( (byte)(-38 & 0xFF) );
Apply the same logic to every other value that has a minus sign.
(2)
It might be easier if you just write the values in hex instead of decimal. For example, instead of decimal 255 you write the equivalent hex 0xFF. Hex is only a different notation for the same byte values, but it makes byte-level patterns much easier to recognize; writing decimals (and letting the cast convert them) works, but it doesn't help you see what is going on...
For example, your first two byte values are 120 and -38, but in hex they are 0x78 0xDA.
If you search for the bytes 0x78 0xDA, you will find that they are the header of a zlib stream, i.e. DEFLATE-compressed data inside a zlib wrapper.
This ZLIB detail is important to know for the next step...
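The two header bytes can also be checked programmatically. The sketch below decodes the fields that RFC 1950 (the zlib format specification) defines for them; the struct and function names are hypothetical, and it only inspects the header, it does not decompress anything.

```cpp
#include <cassert>
#include <cstdint>

// Fields of the 2-byte zlib header (CMF, FLG) defined in RFC 1950.
struct ZlibHeader {
    bool valid;              // passes the checks below
    int compression_method;  // CM: 8 means DEFLATE
    int window_bits;         // CINFO + 8, i.e. log2 of the LZ77 window size
    bool preset_dictionary;  // FDICT flag
    int level;               // FLEVEL: 0 (fastest) .. 3 (maximum compression)
};

ZlibHeader parse_zlib_header(std::uint8_t cmf, std::uint8_t flg) {
    ZlibHeader h{};
    h.compression_method = cmf & 0x0F;
    h.window_bits = (cmf >> 4) + 8;
    h.preset_dictionary = (flg & 0x20) != 0;
    h.level = flg >> 6;
    // CMF*256 + FLG must be a multiple of 31; CM must be 8 and CINFO at most 7.
    h.valid = ((cmf * 256 + flg) % 31 == 0) && h.compression_method == 8 && (cmf >> 4) <= 7;
    return h;
}
```

For 0x78 0xDA this reports DEFLATE with a 32 KB window, no preset dictionary and the maximum-compression level flag, which is why 0x78 0xDA shows up so often in search results.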
(3)
Variable names are not always recovered during decompiling. This is why all your code has these generic _locN_ names (the real variable names are unknown, only their data types).
Your _loc1_.uncompress(); call can take a String argument specifying the algorithm:
public function uncompress(algorithm:String) :void //from AS3 documentation
The argument is optional: when it is omitted, the zlib format is used. The valid values are the CompressionAlgorithm constants "zlib", "deflate" and "lzma". From (2) above we can see that the data starts with the zlib header 0x78 0xDA, so the decompiled _loc1_.uncompress() (equivalent to _loc1_.uncompress("zlib")) is already correct; "deflate" would mean raw DEFLATE data without that header.
Solution:
Create a byte array (not a MemoryStream) and fill it with the hex values (e.g. -38 is written 0xDA).
First convert each of your numbers to hex. You can use Windows Calculator in Programmer mode (under the View menu): type a decimal in DEC mode, then press HEX to see the same value in hex format. An online converter works too.
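If you'd rather not convert the values by hand, a few lines of C++ can print the whole list at once. This is a throwaway helper (the name to_hex_listing is made up); it masks each value with 0xFF and formats the result as two uppercase hex digits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Format signed byte values (as passed to AS3 writeByte) as space-separated hex bytes.
std::string to_hex_listing(const std::vector<int>& values) {
    std::string out;
    char buf[4];
    for (std::size_t i = 0; i < values.size(); ++i) {
        std::snprintf(buf, sizeof buf, "%02X", static_cast<unsigned>(values[i] & 0xFF));
        if (i > 0) out += ' ';
        out += buf;
    }
    return out;
}
```

Feeding it the 27 values from the writeByte calls gives the listing to paste into the C# array initializer.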
The final hex values should be 78 DA 63 10 0C 33 29 8A 0C 32 51 49 31 C8 0D 30 36 36 0E 30 2E 02 00 2D E2 04 F0, where the ending F0 equals your ending decimal -16.
Then you can easily do...
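public string p()
{
    // requires: using System.Text; plus the DotNetZip NuGet package (Ionic.Zlib)
    // zlib-compressed bytes from the AS3 code, in hex (-38 -> 0xDA, -16 -> 0xF0, ...)
    byte[] loc_Data = new byte[] {
        0x78, 0xDA, 0x63, 0x10, 0x0C, 0x33, 0x29, 0x8A, 0x0C,
        0x32, 0x51, 0x49, 0x31, 0xC8, 0x0D, 0x30, 0x36, 0x36,
        0x0E, 0x30, 0x2E, 0x02, 0x00, 0x2D, 0xE2, 0x04, 0xF0
    };
    byte[] loc_Uncompressed = Ionic.Zlib.ZlibStream.UncompressBuffer( loc_Data );
    // AS3 readUTF(): 2-byte big-endian length prefix, then that many UTF-8 bytes
    int length = (loc_Uncompressed[0] << 8) | loc_Uncompressed[1];
    return Encoding.UTF8.GetString( loc_Uncompressed, 2, length );
}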
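One detail worth double-checking: AS3's ByteArray.readUTF() does not read the whole buffer as text. It reads an unsigned 16-bit length (big-endian, the ByteArray default) and then that many UTF-8 bytes. The C++ sketch below shows only that framing step; read_utf is a hypothetical helper, and it assumes the decompressed buffer really starts with such a prefix, as readUTF requires.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Mirrors the layout ByteArray.readUTF() expects at pos: an unsigned 16-bit
// big-endian byte count, then that many UTF-8 bytes (returned untranscoded).
std::string read_utf(const std::vector<std::uint8_t>& buf, std::size_t pos = 0) {
    if (buf.size() < 2 || pos > buf.size() - 2)
        throw std::out_of_range("missing 2-byte length prefix");
    const std::size_t len = (static_cast<std::size_t>(buf[pos]) << 8) | buf[pos + 1];
    if (len > buf.size() - pos - 2)
        throw std::out_of_range("string shorter than its length prefix");
    return std::string(reinterpret_cast<const char*>(buf.data()) + pos + 2, len);
}
```

Decoding the whole buffer with GetString instead would put the two length bytes at the front of the result as control characters.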
Related
I ran into this problem when porting my service to .NET 5: I check a password hash, and it is not equal to the hash in the DB. To demonstrate, I wrote a small console program. The results in .NET Core 2.1 and .NET 5 are different:
static void Main(string[] args)
{
var password = "MlsU37z*!";
var bytesOfHash = GetBytesOfHash(password);
Console.WriteLine(BitConverter.ToString(bytesOfHash));
var hashString = Encoding.Default.GetString(bytesOfHash);
Console.WriteLine(string.Join("-", hashString.Select(ch => $"{(ushort)ch:X4}")));
Console.WriteLine($"Hash string length = {hashString.Length}");
Console.ReadKey();
}
Result for .NET Core 2.1:
29-E7-2A-D5-85-F7-ED-5F-3E-E2-F4-23-7E-09-D0-48-7E-0E-E9-B0-70-9E-D5-08-B0-A9-BA-75-30-C7-8E-B9-A9-8B-93-71-FC-7D-E8-B5-24-C2-80-EE-41-58-8C-D2-7E-0D-78-87-30-C1-A2-2D-FF-D9-A4-95-B3-97-95-BF-AA-44-62-06-02-46-3F-96-0E-0F-C3-86-DE-18-97-AB-A9-59-CF-E6-14-F8-DE-66-0D-44-CF-1B-16-5D-5F-8A-75-96-58-FC-FE-A4-14-B1-17-6D-DE-CF-B4-FE-0B-95-DB-96-39-49-48-0C-7B-8A-C1-6F-62-8F-63-E6-B1-77-BC-41-B6-FC-D3-5A-4B-BB-05-E6-02-F4-9D-40-C5-9B-97-33-6A-6D-21-83-D6-F3-56-87-68-56-C3-1A-97-57-95-ED-CC-47-DB-EA-35-73-EE-83-3B-40-E3-95-93-0C-87-10-64-28-8F-39-B6-8B-FD-15-7D-1D-AB-43-AC-97-9F-23-FF-1F-60-E8-8A-21-A1-40-E6-4F-D4-E7-05-BC-52-E9-5B-0F-8D-0E-F4-EA-29-05-B6-2A-10-44-CF-D6-8A-5A-71-36-C6-4C-99-8B-B4-CC-39-89-F9-B2-3A-C3-1D-A3-AD-17-78-FA-E4-5E-89-25-07-55-C1-21-E2-EB-C4-AD-96-76-38-5E-6F-8F-DC-0F-04-E9-DF-20-B5-A1-C2-85-C4-0E-F7-AF-9A-5F-C5-E9-BE-16-98-D9-F0-B1-48-77-0E-1D-E5-05-02-0A-EE-4F-F6-27-0A-B7-3C-20-BA-FA-B3-21-A2-8E-D7-51-C5-14-E1-CB-61-D2-36-1B-47-8D-57-17-CB-61-D9-FE-F3-93-F8-2F-2E-2E-80-B0-D3-F7-F6-3C-06-CE-92-7E-49-88-C8-4D-38-F5-BF-61-4D-33-2E-51-53-DC-33-E4-8E-33-31-57-F4-BF-E1-A9-B8-13-A0-AB-5A-C0-D5-DC-0E-61-26-AD-4C-A3-48-46-C6-05-C5-79-47-CC-4B-CB-C8-21-31-56-34-B5-BA-52-85-87-F2-DB-71-FF-3D-D6-FD-C0-23-E4-6E-23-F2-78-A1-04-B3-1B-B4-A9-A2-AE-EB-
55-75-89-08-01-6D-B8-00-49-1D-44-C4-8E-9B-21-E2-5B-4F-59-41-00-43-4F-C5-B3-C3-53-E5-2F-95-06-24-30-A3-B6-87-0F-DB-31-C4-AE-D7-8C-AE-D9-BD-C8-9F-F1-04-E5-5F-A9-D7-9B-3B-21-51-FC-B4-44-23-D3-75-34-3B-1F-93-99-6C-D5-EE-D4-65-96-B4-16-2D-30-4E-B0-DD-D3-31-CF-53-0E
0029-FFFD-002A-0545-FFFD-FFFD-005F-003E-FFFD-FFFD-0023-007E-0009-FFFD-0048-007E-000E-FFFD-0070-FFFD-FFFD-0008-FFFD-FFFD-FFFD-0075-0030-01CE-FFFD-FFFD-FFFD-FFFD-0071-FFFD-007D-FFFD-0024-0080-FFFD-0041-0058-FFFD-FFFD-007E-000D-0078-FFFD-0030-FFFD-FFFD-002D-FFFD-0664-FFFD-FFFD-FFFD-FFFD-FFFD-FFFD-0044-0062-0006-0002-0046-003F-FFFD-000E-000F-00C6-FFFD-0018-FFFD-FFFD-FFFD-0059-FFFD-FFFD-0014-FFFD-FFFD-0066-000D-0044-FFFD-001B-0016-005D-005F-FFFD-0075-FFFD-0058-FFFD-FFFD-FFFD-0014-FFFD-0017-006D-FFFD-03F4-FFFD-000B-FFFD-06D6-0039-0049-0048-000C-007B-FFFD-FFFD-006F-0062-FFFD-0063-FFFD-0077-FFFD-0041-FFFD-FFFD-FFFD-005A-004B-FFFD-0005-FFFD-0002-FFFD-0040-015B-FFFD-0033-006A-006D-0021-FFFD-FFFD-FFFD-0056-FFFD-0068-0056-FFFD-001A-FFFD-0057-FFFD-FFFD-FFFD-0047-FFFD-FFFD-0035-0073-FFFD-003B-0040-3553-000C-FFFD-0010-0064-0028-FFFD-0039-FFFD-FFFD-FFFD-0015-007D-001D-FFFD-0043-FFFD-FFFD-FFFD-0023-FFFD-001F-0060-FFFD-0021-FFFD-0040-FFFD-004F-FFFD-FFFD-0005-FFFD-0052-FFFD-005B-000F-FFFD-000E-FFFD-FFFD-0029-0005-FFFD-002A-0010-0044-FFFD-058A-005A-0071-0036-FFFD-004C-FFFD-FFFD-FFFD-FFFD-0039-FFFD-FFFD-FFFD-003A-FFFD-001D-FFFD-FFFD-0017-0078-FFFD-FFFD-005E-FFFD-0025-0007-0055-FFFD-0021-FFFD-FFFD-012D-FFFD-0076-0038-005E-006F-FFFD-FFFD-000F-0004-FFFD-FFFD-0020-FFFD-FFFD-0085-
FFFD-000E-FFFD-FFFD-FFFD-005F-FFFD-FFFD-0016-FFFD-FFFD-FFFD-0048-0077-000E-001D-FFFD-0005-0002-000A-FFFD-004F-FFFD-0027-000A-FFFD-003C-0020-FFFD-FFFD-FFFD-0021-FFFD-FFFD-FFFD-0051-FFFD-0014-FFFD-FFFD-0061-FFFD-0036-001B-0047-FFFD-0057-0017-FFFD-0061-FFFD-FFFD-FFFD-FFFD-002F-002E-002E-FFFD-FFFD-FFFD-FFFD-FFFD-003C-0006-0392-007E-0049-FFFD-FFFD-004D-0038-FFFD-FFFD-0061-004D-0033-002E-0051-0053-FFFD-0033-FFFD-0033-0031-0057-FFFD-1A78-0013-FFFD-FFFD-005A-FFFD-FFFD-FFFD-000E-0061-0026-FFFD-004C-FFFD-0048-0046-FFFD-0005-FFFD-0079-0047-FFFD-004B-FFFD-FFFD-0021-0031-0056-0034-FFFD-FFFD-0052-FFFD-FFFD-FFFD-FFFD-0071-FFFD-003D-FFFD-FFFD-FFFD-0023-FFFD-006E-0023-FFFD-0078-FFFD-0004-FFFD-001B-FFFD-FFFD-FFFD-FFFD-FFFD-0055-0075-FFFD-0008-0001-006D-FFFD-0000-0049-001D-0044-010E-FFFD-0021-FFFD-005B-004F-0059-0041-0000-0043-004F-0173-FFFD-0053-FFFD-002F-FFFD-0006-0024-0030-FFFD-FFFD-FFFD-000F-FFFD-0031-012E-05CC-FFFD-067D-021F-FFFD-0004-FFFD-005F-FFFD-05DB-003B-0021-0051-FFFD-FFFD-0044-0023-FFFD-0075-0034-003B-001F-FFFD-FFFD-006C-FFFD-FFFD-FFFD-0065-FFFD-FFFD-0016-002D-0030-004E-FFFD-FFFD-FFFD-0031-FFFD-0053-000E
Hash string length = 478
Result for .NET 5:
29-E7-2A-D5-85-F7-ED-5F-3E-E2-F4-23-7E-09-D0-48-7E-0E-E9-B0-70-9E-D5-08-B0-A9-BA-75-30-C7-8E-B9-A9-8B-93-71-FC-7D-E8-B5-24-C2-80-EE-41-58-8C-D2-7E-0D-78-87-30-C1-A2-2D-FF-D9-A4-95-B3-97-95-BF-AA-44-62-06-02-46-3F-96-0E-0F-C3-86-DE-18-97-AB-A9-59-CF-E6-14-F8-DE-66-0D-44-CF-1B-16-5D-5F-8A-75-96-58-FC-FE-A4-14-B1-17-6D-DE-CF-B4-FE-0B-95-DB-96-39-49-48-0C-7B-8A-C1-6F-62-8F-63-E6-B1-77-BC-41-B6-FC-D3-5A-4B-BB-05-E6-02-F4-9D-40-C5-9B-97-33-6A-6D-21-83-D6-F3-56-87-68-56-C3-1A-97-57-95-ED-CC-47-DB-EA-35-73-EE-83-3B-40-E3-95-93-0C-87-10-64-28-8F-39-B6-8B-FD-15-7D-1D-AB-43-AC-97-9F-23-FF-1F-60-E8-8A-21-A1-40-E6-4F-D4-E7-05-BC-52-E9-5B-0F-8D-0E-F4-EA-29-05-B6-2A-10-44-CF-D6-8A-5A-71-36-C6-4C-99-8B-B4-CC-39-89-F9-B2-3A-C3-1D-A3-AD-17-78-FA-E4-5E-89-25-07-55-C1-21-E2-EB-C4-AD-96-76-38-5E-6F-8F-DC-0F-04-E9-DF-20-B5-A1-C2-85-C4-0E-F7-AF-9A-5F-C5-E9-BE-16-98-D9-F0-B1-48-77-0E-1D-E5-05-02-0A-EE-4F-F6-27-0A-B7-3C-20-BA-FA-B3-21-A2-8E-D7-51-C5-14-E1-CB-61-D2-36-1B-47-8D-57-17-CB-61-D9-FE-F3-93-F8-2F-2E-2E-80-B0-D3-F7-F6-3C-06-CE-92-7E-49-88-C8-4D-38-F5-BF-61-4D-33-2E-51-53-DC-33-E4-8E-33-31-57-F4-BF-E1-A9-B8-13-A0-AB-5A-C0-D5-DC-0E-61-26-AD-4C-A3-48-46-C6-05-C5-79-47-CC-4B-CB-C8-21-31-56-34-B5-BA-52-85-87-F2-DB-71-FF-3D-D6-FD-C0-23-E4-6E-23-F2-78-A1-04-B3-1B-B4-A9-A2-AE-EB-
55-75-89-08-01-6D-B8-00-49-1D-44-C4-8E-9B-21-E2-5B-4F-59-41-00-43-4F-C5-B3-C3-53-E5-2F-95-06-24-30-A3-B6-87-0F-DB-31-C4-AE-D7-8C-AE-D9-BD-C8-9F-F1-04-E5-5F-A9-D7-9B-3B-21-51-FC-B4-44-23-D3-75-34-3B-1F-93-99-6C-D5-EE-D4-65-96-B4-16-2D-30-4E-B0-DD-D3-31-CF-53-0E
0029-FFFD-002A-0545-FFFD-FFFD-005F-003E-FFFD-FFFD-0023-007E-0009-FFFD-0048-007E-000E-FFFD-0070-FFFD-FFFD-0008-FFFD-FFFD-FFFD-0075-0030-01CE-FFFD-FFFD-FFFD-FFFD-0071-FFFD-007D-FFFD-0024-0080-FFFD-0041-0058-FFFD-FFFD-007E-000D-0078-FFFD-0030-FFFD-FFFD-002D-FFFD-0664-FFFD-FFFD-FFFD-FFFD-FFFD-FFFD-0044-0062-0006-0002-0046-003F-FFFD-000E-000F-00C6-FFFD-0018-FFFD-FFFD-FFFD-0059-FFFD-FFFD-0014-FFFD-FFFD-0066-000D-0044-FFFD-001B-0016-005D-005F-FFFD-0075-FFFD-0058-FFFD-FFFD-FFFD-0014-FFFD-0017-006D-FFFD-03F4-FFFD-000B-FFFD-06D6-0039-0049-0048-000C-007B-FFFD-FFFD-006F-0062-FFFD-0063-FFFD-0077-FFFD-0041-FFFD-FFFD-FFFD-005A-004B-FFFD-0005-FFFD-0002-FFFD-FFFD-0040-015B-FFFD-0033-006A-006D-0021-FFFD-FFFD-FFFD-0056-FFFD-0068-0056-FFFD-001A-FFFD-0057-FFFD-FFFD-FFFD-0047-FFFD-FFFD-0035-0073-FFFD-003B-0040-3553-000C-FFFD-0010-0064-0028-FFFD-0039-FFFD-FFFD-FFFD-0015-007D-001D-FFFD-0043-FFFD-FFFD-FFFD-0023-FFFD-001F-0060-FFFD-0021-FFFD-0040-FFFD-004F-FFFD-FFFD-0005-FFFD-0052-FFFD-005B-000F-FFFD-000E-FFFD-FFFD-0029-0005-FFFD-002A-0010-0044-FFFD-058A-005A-0071-0036-FFFD-004C-FFFD-FFFD-FFFD-FFFD-0039-FFFD-FFFD-FFFD-003A-FFFD-001D-FFFD-FFFD-0017-0078-FFFD-FFFD-005E-FFFD-0025-0007-0055-FFFD-0021-FFFD-FFFD-012D-FFFD-0076-0038-005E-006F-FFFD-FFFD-000F-0004-FFFD-FFFD-0020-FFFD-FFFD-
0085-FFFD-000E-FFFD-FFFD-FFFD-005F-FFFD-FFFD-0016-FFFD-FFFD-FFFD-0048-0077-000E-001D-FFFD-0005-0002-000A-FFFD-004F-FFFD-0027-000A-FFFD-003C-0020-FFFD-FFFD-FFFD-0021-FFFD-FFFD-FFFD-0051-FFFD-0014-FFFD-FFFD-0061-FFFD-0036-001B-0047-FFFD-0057-0017-FFFD-0061-FFFD-FFFD-FFFD-FFFD-002F-002E-002E-FFFD-FFFD-FFFD-FFFD-FFFD-003C-0006-0392-007E-0049-FFFD-FFFD-004D-0038-FFFD-FFFD-0061-004D-0033-002E-0051-0053-FFFD-0033-FFFD-0033-0031-0057-FFFD-FFFD-1A78-0013-FFFD-FFFD-005A-FFFD-FFFD-FFFD-000E-0061-0026-FFFD-004C-FFFD-0048-0046-FFFD-0005-FFFD-0079-0047-FFFD-004B-FFFD-FFFD-0021-0031-0056-0034-FFFD-FFFD-0052-FFFD-FFFD-FFFD-FFFD-0071-FFFD-003D-FFFD-FFFD-FFFD-0023-FFFD-006E-0023-FFFD-0078-FFFD-0004-FFFD-001B-FFFD-FFFD-FFFD-FFFD-FFFD-0055-0075-FFFD-0008-0001-006D-FFFD-0000-0049-001D-0044-010E-FFFD-0021-FFFD-005B-004F-0059-0041-0000-0043-004F-0173-FFFD-0053-FFFD-002F-FFFD-0006-0024-0030-FFFD-FFFD-FFFD-000F-FFFD-0031-012E-05CC-FFFD-067D-021F-FFFD-0004-FFFD-005F-FFFD-05DB-003B-0021-0051-FFFD-FFFD-0044-0023-FFFD-0075-0034-003B-001F-FFFD-FFFD-006C-FFFD-FFFD-FFFD-0065-FFFD-FFFD-0016-002D-0030-004E-FFFD-FFFD-FFFD-0031-FFFD-0053-000E
Hash string length = 480
I ran both console programs on the same computer.
Your use of Encoding.GetString on an array of bytes which in no way represents a text string is meaningless and wrong, so never do that! (To turn hash bytes into text, use something like Convert.ToBase64String or BitConverter.ToString instead.) The rest of this answer is about why the same (ill-formed!) byte sequence gives different strings in the two versions of .NET.
After comments, and the addition of the actual data to the question, it looks like the reason is indeed a difference in the UTF-8 decoders when they encounter ill-formed byte sequences. In particular, when they first see a valid byte starting with bits 11110 (the start of a 4-byte encoding), then a byte starting with 10 (the first pure "payload" byte; two more of that type are expected), but then a byte that doesn't fit, one not starting with 10, the decoders disagree on how many U+FFFD REPLACEMENT CHARACTERs (�) to emit.
In particular, the asker has confirmed in comments that Encoding.Default.GetString(new byte[] { 0xF4, 0x9D, 0x40, }) gives "�@" in .NET Core 2.1, but "��@" in .NET 5. The byte F4 promises the start of a 4-byte sequence, the byte 9D looks like a plausible byte 2 (strictly, only 80 to 8F may follow F4, so 9D is already out of range), but then the plain ASCII byte 40 (meaning @) breaks the sequence. So the disagreement is about whether that is shown as "�@" or "��@".
The other case where the strings differ is similar, except that there the 4-byte sequence (F4 BF) is interrupted by a byte starting with 1110 (E1) instead of a byte starting with 0.
More generally, you can use Encoding.UTF8.GetString(new byte[] { 0xF4, 0x9D, 0x40, }), because Encoding.Default may be something else if your OS is set to a language/locale that uses a legacy 1-byte encoding, such as Windows-1252. Addition: the documentation of Default says it gives UTF-8 on all .NET Core variants (including .NET Core 2.1 and .NET 5). It is only on the old .NET Framework (like .NET Framework 4.8) that Default may yield an ANSI code page, like Windows-1252.
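To make the disagreement concrete, here is a C++ sketch of two UTF-8 decoders. decode_utf8_strict follows the Unicode Standard's recommended practice of emitting one U+FFFD per "maximal subpart" of an ill-formed sequence, using the well-formed byte ranges from the standard (e.g. only 80 to 8F may follow F4). decode_utf8_lenient is purely hypothetical: it only checks the 10xxxxxx pattern and swallows the whole broken prefix. It is not .NET Core 2.1's actual code, but on the two byte sequences from this question the strict decoder reproduces the .NET 5 output and the lenient one the .NET Core 2.1 output.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One U+FFFD per maximal subpart of an ill-formed sequence.
std::vector<char32_t> decode_utf8_strict(const std::vector<std::uint8_t>& in) {
    std::vector<char32_t> out;
    std::size_t i = 0;
    while (i < in.size()) {
        const std::uint8_t b = in[i];
        int need = 0;                       // continuation bytes expected
        char32_t cp = 0;
        std::uint8_t lo = 0x80, hi = 0xBF;  // allowed range for the second byte
        if (b < 0x80) { out.push_back(b); ++i; continue; }
        if (b >= 0xC2 && b <= 0xDF) { need = 1; cp = b & 0x1F; }
        else if (b >= 0xE0 && b <= 0xEF) {
            need = 2; cp = b & 0x0F;
            if (b == 0xE0) lo = 0xA0;       // no overlong 3-byte forms
            if (b == 0xED) hi = 0x9F;       // no surrogates
        } else if (b >= 0xF0 && b <= 0xF4) {
            need = 3; cp = b & 0x07;
            if (b == 0xF0) lo = 0x90;       // no overlong 4-byte forms
            if (b == 0xF4) hi = 0x8F;       // nothing above U+10FFFF
        } else {                            // C0, C1, F5..FF or a stray continuation byte
            out.push_back(0xFFFD); ++i; continue;
        }
        std::size_t j = i + 1;
        bool ok = true;
        for (int k = 0; k < need; ++k, ++j) {
            const std::uint8_t lo_k = (k == 0) ? lo : 0x80;
            const std::uint8_t hi_k = (k == 0) ? hi : 0xBF;
            if (j >= in.size() || in[j] < lo_k || in[j] > hi_k) { ok = false; break; }
            cp = (cp << 6) | (in[j] & 0x3F);
        }
        out.push_back(ok ? cp : 0xFFFD);
        i = j;  // on failure j is the offending byte, which is decoded again on its own
    }
    return out;
}

// Hypothetical lenient decoder: one U+FFFD for the whole broken prefix.
std::vector<char32_t> decode_utf8_lenient(const std::vector<std::uint8_t>& in) {
    std::vector<char32_t> out;
    std::size_t i = 0;
    while (i < in.size()) {
        const std::uint8_t b = in[i];
        const int need = b < 0x80 ? 0 : b >= 0xF0 ? 3 : b >= 0xE0 ? 2 : b >= 0xC0 ? 1 : -1;
        if (need == 0) { out.push_back(b); ++i; continue; }
        if (need < 0) { out.push_back(0xFFFD); ++i; continue; }
        char32_t cp = b & (0x3F >> need);   // payload bits of the lead byte
        std::size_t j = i + 1;
        bool ok = true;
        for (int k = 0; k < need; ++k, ++j) {
            if (j >= in.size() || (in[j] & 0xC0) != 0x80) { ok = false; break; }
            cp = (cp << 6) | (in[j] & 0x3F);
        }
        out.push_back(ok ? cp : 0xFFFD);
        i = j;
    }
    return out;
}
```

The two functions only differ on ill-formed input; for valid UTF-8 such as F0 9F 98 80 both return the same code point.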
So the issue is that in C# a char is 4 bytes, so "ABC" is (65 0 66 0 67 0).
When I send that through a socket and read it into a wstring in C++, I get the following output: A
How can I convert such a string to a C++ string?
Sounds like you need ASCII or UTF-8 encoding instead of Unicode (which .NET uses as the name for UTF-16).
65 0 66 0 67 0 is only going to get you the A, since the next zero is interpreted as a null termination character in C++.
Strategies for converting Unicode to ASCII can be found here.
using c# the char is 4 bytes
No, in C# strings are encoded in UTF-16. Each UTF-16 code unit takes two bytes. For simple characters a single code unit represents a code point (e.g. 65 0).
On Windows, wstring is usually UTF-16 encoded too (2 or 4 bytes per character). But on Unix/Linux, wstring usually uses UTF-32 (always 4 bytes).
For ASCII characters, the Unicode code point has the same numerical value as the ASCII code, so UTF-16 (little-endian) encoded ASCII text often looks like this: {num} 0 {num} 0 {num} 0...
See the details here: (https://en.wikipedia.org/wiki/UTF-16)
Could you show us the code you used to construct your wstring object?
The null byte is critical here, because it is the end marker of ASCII / ANSI (C-style) strings.
I have been able to solve the issue by using a std::u16string.
Here is some example code
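std::vector<char> data = { 65, 0, 66, 0, 67, 0 };
std::u16string str;
// combine each little-endian byte pair (C# sends UTF-16LE) into one UTF-16 code unit
for (std::size_t i = 0; i + 1 < data.size(); i += 2)
    str.push_back(char16_t((unsigned char)data[i] | ((unsigned char)data[i + 1] << 8)));
// str now holds u"ABC"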
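Since the question ultimately asks for a normal C++ string, here is a sketch that goes one step further and converts the UTF-16LE bytes straight into a UTF-8 std::string, including surrogate pairs. It assumes the sender used little-endian UTF-16 (what Encoding.Unicode produces in C#); the function names are made up for this example, and unpaired surrogates are replaced with U+FFFD.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Append one code point to a UTF-8 std::string.
void append_utf8(std::string& out, char32_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// UTF-16LE bytes -> UTF-8 text. An odd trailing byte is ignored.
std::string utf16le_to_utf8(const std::vector<std::uint8_t>& bytes) {
    std::string out;
    const std::size_t n = bytes.size() / 2;
    auto unit = [&](std::size_t k) -> char32_t {
        return bytes[2 * k] | (bytes[2 * k + 1] << 8);  // low byte first
    };
    for (std::size_t k = 0; k < n; ++k) {
        const char32_t u = unit(k);
        if (u >= 0xD800 && u <= 0xDBFF && k + 1 < n) {  // high surrogate
            const char32_t v = unit(k + 1);
            if (v >= 0xDC00 && v <= 0xDFFF) {           // followed by a low surrogate
                append_utf8(out, 0x10000 + ((u - 0xD800) << 10) + (v - 0xDC00));
                ++k;
                continue;
            }
        }
        append_utf8(out, (u >= 0xD800 && u <= 0xDFFF) ? 0xFFFD : u);
    }
    return out;
}
```

Because the result is UTF-8 in a plain std::string, it contains no embedded zero bytes for ASCII text, so it also works with code that expects null-terminated strings.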
I'm working on a parser to receive UDP information, parse it, and store it. To do so I'm using a BinaryReader since it will mostly be binary information. Some of it will be strings though. MSDN says for the ReadString() function:
Reads a string from the current stream. The string is prefixed with
the length, encoded as an integer seven bits at a time.
And I completely understand it up until "seven bits at a time" which I tried to simply ignore until I started testing. I'm creating my own byte array before putting it into a MemoryStream and attempting to read it with a BinaryReader. Here's what I first thought would work:
byte[] data = new byte[] { 3, 0, 0, 0, (byte)'C', (byte)'a', (byte)'t' };
BinaryReader reader = new BinaryReader(new MemoryStream(data));
String str = reader.ReadString();
Knowing an int is 4 bytes (and toying around long enough to find out that BinaryReader is Little Endian) I pass it the length of 3 and the corresponding letters. However str ends up holding \0\0\0. If I remove the 3 zeros and just have
byte[] data = new byte[] { 3, (byte)'C', (byte)'a', (byte)'t' };
Then it reads and stores Cat properly. To me this conflicts with the documentation saying that the length is supposed to be an integer. Now I'm beginning to think they simply mean a number with no decimal place and not the data type int. Does this mean that a BinaryReader can never read a string larger than 127 characters (since that would be 01111111 corresponding to the 7 bits part of the documentation)?
I'm writing up a protocol and need to completely understand what I'm getting into before I pass our documentation along to our clients.
I found the source code for BinaryReader. It uses a function called Read7BitEncodedInt() and after looking up that documentation and the documentation for Write7BitEncodedInt() I found this:
The integer of the value parameter is written out seven bits at a
time, starting with the seven least-significant bits. The high bit of
a byte indicates whether there are more bytes to be written after this
one. If value will fit in seven bits, it takes only one byte of space.
If value will not fit in seven bits, the high bit is set on the first
byte and written out. value is then shifted by seven bits and the next
byte is written. This process is repeated until the entire integer has
been written.
Also, Ralf found this link that better displays what's going on.
Unless they specifically say 'int' or 'Int32', they just mean an integer as in a whole number.
By '7 bits at a time', they mean that it implements 7-bit length encoding, which seems a bit confusing at first but is actually rather straightforward. Here are some example values and how they are written out using 7-bit length encoding:
/*
decimal value   binary value                  ->  enc byte 1  enc byte 2  enc byte 3
85              00000000 00000000 01010101    ->  01010101    n/a         n/a
1,365           00000000 00000101 01010101    ->  11010101    00001010    n/a
349,525         00000101 01010101 01010101    ->  11010101    10101010    00010101
*/
The table above shows the binary values in big-endian order (most significant byte first), for no other reason than that I had to pick one and it's what I'm most familiar with. The 7-bit length encoding itself is little-endian by its very nature: the least significant 7 bits are written first.
Note that 85 writes out to 1 byte, 1,365 writes out to 2 bytes, and 349,525 writes out to 3 bytes.
Here's the same table using letters to show how each value's bits were used in the written output (dashes are zero-value bits, and the 0s and 1s are what's added by the encoding mechanism to indicate if a subsequent byte is to be written/read)...
/*
decimal value   binary value                  ->  enc byte 1  enc byte 2  enc byte 3
85              -------- -------- -AAAAAAA    ->  0AAAAAAA    n/a         n/a
1,365           -------- -----BBB AAAAAAAA    ->  1AAAAAAA    0---BBBA    n/a
349,525         -----CCC BBBBBBBB AAAAAAAA    ->  1AAAAAAA    1BBBBBBA    0--CCCBB
*/
So values in the range of 0 to 2^7-1 (127) will write out as 1 byte, values of 2^7 (128) to 2^14-1 (16,383) will use 2 bytes, 2^14 (16,384) to 2^21-1 (2,097,151) will take 3 bytes, and so on and so forth.
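Here is a C++ sketch of the same scheme, plus a reader for a length-prefixed string laid out the way BinaryReader.ReadString expects it. Assumptions: values are handled as unsigned 32-bit, and the string bytes are UTF-8 (BinaryWriter's default encoding); note that the prefix counts bytes, not characters. The function names are hypothetical, not .NET APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Writes value 7 bits at a time, least-significant group first; the high bit of
// each byte says whether another byte follows.
std::vector<std::uint8_t> write_7bit_encoded(std::uint32_t value) {
    std::vector<std::uint8_t> out;
    while (value >= 0x80) {
        out.push_back(static_cast<std::uint8_t>(value | 0x80));  // low 7 bits + "more" flag
        value >>= 7;
    }
    out.push_back(static_cast<std::uint8_t>(value));             // last byte, high bit clear
    return out;
}

// Reads a value written by write_7bit_encoded, advancing pos past it.
std::uint32_t read_7bit_encoded(const std::vector<std::uint8_t>& in, std::size_t& pos) {
    std::uint32_t result = 0;
    for (int shift = 0; shift < 35; shift += 7) {  // a 32-bit value needs at most 5 bytes
        if (pos >= in.size()) throw std::out_of_range("truncated 7-bit integer");
        const std::uint8_t b = in[pos++];
        result |= static_cast<std::uint32_t>(b & 0x7F) << shift;
        if ((b & 0x80) == 0) return result;
    }
    throw std::runtime_error("7-bit integer longer than 5 bytes");
}

// Reads a length-prefixed string: 7-bit encoded byte count, then that many bytes
// (UTF-8 text, returned as-is in a std::string).
std::string read_prefixed_string(const std::vector<std::uint8_t>& in, std::size_t& pos) {
    const std::size_t len = read_7bit_encoded(in, pos);
    if (len > in.size() - pos) throw std::out_of_range("truncated string");
    std::string s(reinterpret_cast<const char*>(in.data()) + pos, len);
    pos += len;
    return s;
}
```

With this layout, { 3, 0, 0, 0, 'C', 'a', 't' } is read as length 3 followed by three zero bytes, which is exactly the \0\0\0 result from the question, and strings longer than 127 bytes simply get a longer prefix.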
I have a method that's supposed to generate a 64-byte (512-bit) salt for me:
public static string GenerateSalt()
{
var rngCrypto = new RNGCryptoServiceProvider();
byte[] saltBytes = new byte[64];
rngCrypto.GetBytes(saltBytes);
string result = Convert.ToBase64String(saltBytes);
return result;
}
This seems to run fine; the saltBytes byte array is 64 bytes long. However, I can't store the result in my MS SQL database table, which has a char(64) column.
My assumption is that I'm using Convert.ToBase64String(saltBytes) wrongly, but I'd like to know how I can improve this. A quick run through System.Text.ASCIIEncoding.Unicode.GetByteCount(secondSalt) reveals a string size of 176 bytes instead of 64 bytes.
A byte array is logically a number in base 256. Converting that to a number in base 64 is going to make it longer, just like converting hex F0 to binary 11110000 makes it longer.
If you want to store the salt in the database in a human-readable base-64-encoded string then it is going to have to be much longer than 64 single-byte characters.
As for running it through the ASCII encoder -- I have no idea what you're trying to do here. That sounds like an odd thing to do to non-textual data. Can you explain?
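When you start with 64 bytes (512 bits) and convert to base 64, each output character carries only 6 bits, so you need ceiling(512/6) = 86 characters for the data, and Base64 padding rounds that up to a multiple of 4: 88 characters. The 176 bytes come from measuring that string with ASCIIEncoding.Unicode, which is really the inherited Encoding.Unicode property (UTF-16, 2 bytes per character): 88 * 2 = 176.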
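A small C++ sketch of standard padded Base64 (RFC 4648, the same alphabet Convert.ToBase64String uses) makes these numbers easy to check: 64 input bytes always give 88 characters, which is 176 bytes once counted as UTF-16. The helper names are made up for illustration; in the C# code you would keep using Convert.ToBase64String and size the column for 88 characters.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Standard Base64 with '=' padding.
std::string base64_encode(const std::vector<std::uint8_t>& data) {
    static const char alphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string out;
    std::size_t i = 0;
    for (; i + 2 < data.size(); i += 3) {  // each full 3-byte group -> 4 characters
        const std::uint32_t n = (data[i] << 16) | (data[i + 1] << 8) | data[i + 2];
        out += alphabet[(n >> 18) & 63];
        out += alphabet[(n >> 12) & 63];
        out += alphabet[(n >> 6) & 63];
        out += alphabet[n & 63];
    }
    const std::size_t rest = data.size() - i;  // 0, 1 or 2 leftover bytes
    if (rest == 1) {
        const std::uint32_t n = data[i] << 16;
        out += alphabet[(n >> 18) & 63];
        out += alphabet[(n >> 12) & 63];
        out += "==";
    } else if (rest == 2) {
        const std::uint32_t n = (data[i] << 16) | (data[i + 1] << 8);
        out += alphabet[(n >> 18) & 63];
        out += alphabet[(n >> 12) & 63];
        out += alphabet[(n >> 6) & 63];
        out += '=';
    }
    return out;
}

// Padded Base64 length for n input bytes: 4 characters per started 3-byte group.
std::size_t base64_length(std::size_t n) { return 4 * ((n + 2) / 3); }
```

For 64 bytes, 21 full groups give 84 characters and the one leftover byte adds "xx==", hence 88.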
I have a very interesting problem, one to which I have yet to find an answer.
The code is as follows:
Console.WriteLine("\xc0\x80");
Console.WriteLine(Encoding.GetEncoding(1252).GetString(new byte[] { 0xC0, 0x80 }));
Console.WriteLine( Encoding.GetEncoding(1252).GetString(Encoding.GetEncoding(1252).GetBytes("\xc0\x80")));
byte[] bt = new byte[2];
bt[0] = (byte)'\xC0';
bt[1] = (byte)'\x80';
Console.WriteLine(Encoding.GetEncoding(1252).GetString(bt));
produces the following output:
À?
À€
À?
À€
When encoded to bytes using code page 1252, "\xc0\x80" produces C0 3F.
However, if I cast the characters straight into a byte array, the bytes are C0 80.
Suggestions?
Also, the same code ran from vs in another machine, produces  on every line...
The problem isn't so much the code page; it's got to be a setting in VS or in Windows causing the lookup from my default code page to 1252.
0x3F is a question mark. It is produced because CP 1252 does not support the character U+0080 (which is a control character); in CP 1252, byte 80 is U+20AC (EURO SIGN).
If you want a EURO SIGN in the 1252 string, put it into the C# string as well (e.g. as \u20ac).
Edit: Going to your examples one by one:
Console.WriteLine("\xc0\x80");
Your terminal doesn't support the character \x80 (PAD - Padding character), so it prints a question mark.
Console.WriteLine(Encoding.GetEncoding(1252).GetString(new byte[] { 0xC0, 0x80 }));
The GetString call gives you "\xc0\u20ac". Encoding this to the terminal's character set gives the EURO SIGN.
Console.WriteLine(Encoding.GetEncoding(1252).GetString(Encoding.GetEncoding(1252).GetBytes("\xc0\x80")));
GetBytes gives you { 0xC0, 0x3f}. GetString then gives you "\xC0?", and that gets printed.
Console.WriteLine(Encoding.GetEncoding(1252).GetString(bt));
This is really the same code as the second example.
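To see where the C0 3F comes from, here is a C++ sketch of a Windows-1252 codec. Bytes 0x80 to 0x9F are the only ones that differ from ISO-8859-1, and 0x80 decodes to U+20AC, so no byte maps back to U+0080 and the encoder falls back to '?' (0x3F). The table and function names are illustrative; mapping the five unassigned bytes (81, 8D, 8F, 90, 9D) to the matching C1 controls follows the WHATWG Encoding Standard and is an assumption about what any particular .NET version does with them.

```cpp
#include <cassert>
#include <cstdint>

// Code points for Windows-1252 bytes 0x80..0x9F.
static const char32_t kCp1252High[32] = {
    0x20AC, 0x0081, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0x008D, 0x017D, 0x008F,
    0x0090, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0x009D, 0x017E, 0x0178,
};

// Every other byte maps to the code point with the same value.
char32_t cp1252_decode(std::uint8_t b) {
    return (b >= 0x80 && b <= 0x9F) ? kCp1252High[b - 0x80] : b;
}

// Characters with no 1252 byte become '?' (0x3F), which is what happens to U+0080.
std::uint8_t cp1252_encode(char32_t cp) {
    if (cp < 0x80 || (cp >= 0xA0 && cp <= 0xFF)) return static_cast<std::uint8_t>(cp);
    for (int i = 0; i < 32; ++i)
        if (kCp1252High[i] == cp) return static_cast<std::uint8_t>(0x80 + i);
    return '?';
}
```

Casting a char straight to a byte skips this lookup entirely, which is why the manual cast gives C0 80 while GetBytes gives C0 3F.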