I'm trying to decode a string like 'BHQsZMaQQok='.
All I know about the string is that it must be a number. I can find more encrypted strings if necessary.
It's a base64 string that decodes to 8 bytes: 04 74 2C 64 C6 90 42 89
Interpreted as an IEEE 754 double, it is 3.3121005957308838680392659232E-287
As a big-endian long: 320930284789842569
As a little-endian long: -8556117160291961852
One can only guess how to interpret it...
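A short C# sketch can reproduce the three readings above. BinaryPrimitives needs .NET Core 2.1 or later. The sketch only shows the candidate interpretations; which one the data actually uses is still a guess.

```csharp
using System;
using System.Buffers.Binary;

// Decode the base64 text into its 8 raw bytes.
byte[] raw = Convert.FromBase64String("BHQsZMaQQok=");
Console.WriteLine(BitConverter.ToString(raw)); // 04-74-2C-64-C6-90-42-89

// The same 8 bytes read as a signed 64-bit integer in each byte order.
long bigEndian = BinaryPrimitives.ReadInt64BigEndian(raw);
long littleEndian = BinaryPrimitives.ReadInt64LittleEndian(raw);
Console.WriteLine(bigEndian);    // 320930284789842569
Console.WriteLine(littleEndian); // -8556117160291961852

// The big-endian bit pattern reinterpreted as an IEEE 754 double.
double asDouble = BitConverter.Int64BitsToDouble(bigEndian);
Console.WriteLine(asDouble.ToString("R"));
```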
How does Guid.ToByteArray() work in C#? Can someone help me understand how the following GUID gets converted to a byte array?
Guid: 35918bc9-196d-40ea-9779-889d79b753f0
guid.toByteArray: C9 8B 91 35 6D 19 EA 40 97 79 88 9D 79 B7 53 F0
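A GUID is essentially just a 128-bit number. Internally it is represented as one 32-bit int, two 16-bit ints and eight 8-bit ints (bytes).
So conversion to a byte array is essentially just creating an array and using shifts to select each byte of the 32-bit and 16-bit ints. ToByteArray() writes those integer fields least significant byte first (little-endian). This is why the first three groups of the text form appear byte-reversed: 35918bc9 becomes C9 8B 91 35, 196d becomes 6D 19 and 40ea becomes EA 40. The last eight bytes (97 79 88 9D 79 B7 53 F0) are copied in order.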
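The following C# sketch rebuilds the byte array by hand from the GUID's text form and compares the result with Guid.ToByteArray(). ManualGuidBytes is a hypothetical helper written only for illustration. It assumes the canonical 8-4-4-4-12 hex format.

```csharp
using System;

string text = "35918bc9-196d-40ea-9779-889d79b753f0";
byte[] manual = ManualGuidBytes(text);
byte[] builtIn = new Guid(text).ToByteArray();

Console.WriteLine(BitConverter.ToString(manual));  // C9-8B-91-35-6D-19-EA-40-97-79-88-9D-79-B7-53-F0
Console.WriteLine(BitConverter.ToString(builtIn)); // same bytes

// Hypothetical helper: rebuild ToByteArray()'s layout from the 8-4-4-4-12 text form.
static byte[] ManualGuidBytes(string guidText)
{
    string[] parts = guidText.Split('-');
    uint a = Convert.ToUInt32(parts[0], 16);   // 32-bit field
    ushort b = Convert.ToUInt16(parts[1], 16); // first 16-bit field
    ushort c = Convert.ToUInt16(parts[2], 16); // second 16-bit field
    string tail = parts[3] + parts[4];         // last 8 bytes, written in order

    var bytes = new byte[16];
    // The integer fields go least significant byte first (little-endian).
    for (int i = 0; i < 4; i++) bytes[i] = (byte)(a >> (8 * i));
    for (int i = 0; i < 2; i++) bytes[4 + i] = (byte)(b >> (8 * i));
    for (int i = 0; i < 2; i++) bytes[6 + i] = (byte)(c >> (8 * i));
    // The remaining 8 bytes are parsed two hex digits at a time.
    for (int i = 0; i < 8; i++) bytes[8 + i] = Convert.ToByte(tail.Substring(2 * i, 2), 16);
    return bytes;
}
```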
I'm communicating with a device that returns uuencoded data:
ASCII: EZQAEgETAhMQIBwIAUkAAABj
HEX: 45-5A-51-41-45-67-45-54-41-68-4D-51-49-42-77-49-41-55-6B-41-41-41-42-6A
The documentation for this device states the above is uuencoded but I can't figure out how to decode it. The final result won't be a human readable string but the first byte reveals the number of bytes for the following product data. (Which would be 23 or 24?)
I've tried using Crypt2 to decode it; it doesn't seem to match 644, 666, 744 modes.
I've tried to hand write it out following the Wiki: https://en.wikipedia.org/wiki/Uuencoding#Formatting_mechanism
Doesn't make sense! How do I decode this uuencoded data?
I agree with @canton7 that it looks like it's base64 encoded. You can decode it like this:
byte[] decoded = Convert.FromBase64String("EZQAEgETAhMQIBwIAUkAAABj");
and if you want, you can print the hex values like this
Console.WriteLine(BitConverter.ToString(decoded));
which prints
11-94-00-12-01-13-02-13-10-20-1C-08-01-49-00-00-00-63
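As @HansKilian says in the comments, this is not uuencoded. Uuencoding only uses the characters from space (0x20) to backtick (0x60), so lowercase letters such as g, h, w, k and j cannot appear in it. All of those are valid base64 characters.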
If you base64-decode it you get (in hex):
11 94 00 12 01 13 02 13 10 20 1c 08 01 49 00 00 00 63
The first number, 17 in decimal, is the same as the number of bytes following it, which matches:
The final result won't be a human readable string but the first byte reveals the number of bytes for the following product data.
(@HansKilian made the original call that it was base64-encoded. This answer confirms it by looking at the first decoded byte, but please accept his answer.)
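Both observations above can be checked in a few lines of C#. First, the text contains characters outside the uuencode alphabet. Second, after base64 decoding, the first byte matches the number of bytes that follow it. The length-prefix interpretation is taken from the device documentation quoted in the question.

```csharp
using System;
using System.Linq;

string text = "EZQAEgETAhMQIBwIAUkAAABj";

// uuencode only uses characters 0x20 (space) through 0x60 (backtick),
// so any lowercase letter rules it out.
bool couldBeUuencoded = text.All(c => c >= ' ' && c <= '`');
Console.WriteLine(couldBeUuencoded); // False

byte[] decoded = Convert.FromBase64String(text);

// Per the device documentation, byte 0 is the length of the data after it.
int declaredLength = decoded[0];
int actualLength = decoded.Length - 1;
Console.WriteLine($"declared {declaredLength}, actual {actualLength}"); // declared 17, actual 17
```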
I understand how values less than 128 are encoded. However, after reading https://learn.microsoft.com/en-us/windows/desktop/seccertenroll/about-object-identifier, I still don't understand how values of 128 or more are encoded. For example:
1.3.6.1.4.1.311.21.20
gets encoded into:
2b 06 01 04 01 82 37 15 14
How is 311 encoded into 82 37? When you convert 8237 to decimal, you get 33335. I don't really understand this part exactly.
This article should help you understand the encoding.
Each byte carries 7 bits of the value, most significant group first. The 8th bit (MSB) is set on every byte except the last, so a byte with the MSB cleared marks the end of the value.
82 37 is 10000010 00110111 in binary. You can see that it is composed of 2 parts. The first byte has its MSB set to 1, but the second (also the last, in this case) has its MSB set to 0, indicating the end of the encoding. Dropping the MSB from each byte leaves the 7-bit groups 0000010 = 2 and 0110111 = 55 (2^0 + 2^1 + 2^2 + 2^4 + 2^5). The first group is worth 2*128 = 256, so the value is 256 + 55 = 311.
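Here is a minimal C# sketch of the base-128 arc encoding and decoding described above. It assumes each arc fits in an unsigned 64-bit value. EncodeArc, DecodeArc and EncodeOid are hypothetical helpers. The rule that packs the first two arcs into one value as 40 * first + second (which is why 1.3 becomes 2b) comes from the ASN.1 BER rules in ITU-T X.690, not from the answer above.

```csharp
using System;
using System.Collections.Generic;

byte[] encoded = EncodeOid("1.3.6.1.4.1.311.21.20");
Console.WriteLine(BitConverter.ToString(encoded)); // 2B-06-01-04-01-82-37-15-14

int offset = 0;
ulong arc = DecodeArc(new byte[] { 0x82, 0x37 }, ref offset);
Console.WriteLine(arc); // 311

// Encode one arc as 7-bit groups, most significant group first.
// Every byte except the last gets the high bit (0x80) set.
static List<byte> EncodeArc(ulong value)
{
    var groups = new List<byte>();
    do
    {
        groups.Insert(0, (byte)(value & 0x7F)); // low 7 bits
        value >>= 7;
    } while (value != 0);
    for (int i = 0; i < groups.Count - 1; i++)
        groups[i] |= 0x80; // "more bytes follow" flag
    return groups;
}

// Decode one arc starting at data[pos]; advances pos past the arc.
static ulong DecodeArc(byte[] data, ref int pos)
{
    ulong value = 0;
    byte b;
    do
    {
        b = data[pos++];
        value = (value << 7) | (ulong)(b & 0x7F);
    } while ((b & 0x80) != 0);
    return value;
}

// Encode a dotted OID; the first two arcs share one value (40 * first + second).
static byte[] EncodeOid(string oid)
{
    string[] arcs = oid.Split('.');
    var result = new List<byte>();
    result.AddRange(EncodeArc(40 * ulong.Parse(arcs[0]) + ulong.Parse(arcs[1])));
    for (int i = 2; i < arcs.Length; i++)
        result.AddRange(EncodeArc(ulong.Parse(arcs[i])));
    return result.ToArray();
}
```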
How to convert a simple string to a null-terminated one?
Example:
Example string: "Test message"
Here are the bytes:
54 65 73 74 20 6D 65 73 73 61 67 65
I need a string with the following bytes:
54 00 65 00 73 00 74 00 20 00 6D 00 65 00 73 00 73 00 61 00 67 00 65 00 00
I could use loops, but the code would be ugly. How can I do this conversion with built-in methods?
It looks like you want a null-terminated Unicode string. If the string is stored in a variable str, this should work:
var bytes = System.Text.Encoding.Unicode.GetBytes(str + "\0");
Note that the resulting array will have three zero bytes at the end. This is because Encoding.Unicode (UTF-16) represents each of these characters with two bytes. The first zero is the second half of the last character in the original string ('e' is 65 00), and the next two are how UTF-16 encodes the null character '\0'. (In other words, my code produces one more zero byte than you originally specified, but this is probably what you actually want.)
A little background on C# strings is a good place to start.
The internal structure of a C# string is different from a C string.
a) It is Unicode (UTF-16), as is a char
b) It does not rely on a null terminator
c) It has many built-in utility methods that in C/C++ you would need library functions for.
How does it get away without null termination? Simple! A .NET string stores its length alongside its characters, so it always knows where it ends. C# arrays likewise know their own length, because they are objects rather than bare pointers as in C/C++. In C/C++ the null terminator is required so that string functions like strcmp() can detect the end of the string in memory.
The null character does exist in C#.
string content = "This is a message!" + '\0';
This gives you a string that ends with a null terminator. The null character is not printable, so Console.WriteLine(content) shows nothing for it. It does appear in the debugger's windows, and it is present when you convert the string to a byte array (for saving to disk and other I/O operations).
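Here is a small C# sketch of the points above. A .NET string's Length counts '\0' like any other character. An embedded '\0' does not cut the string short. The terminator also survives conversion to bytes, as one zero byte in UTF-8 and two in UTF-16.

```csharp
using System;
using System.Text;

string content = "This is a message!" + '\0';
Console.WriteLine(content.Length); // 19: the '\0' is an ordinary char

// An embedded '\0' does not end a .NET string the way it ends a C string.
string embedded = "abc\0def";
Console.WriteLine(embedded.Length); // 7

// The null survives conversion to bytes: one zero byte in UTF-8,
// two zero bytes in UTF-16 (Encoding.Unicode).
byte[] utf8 = Encoding.UTF8.GetBytes(content);
byte[] utf16 = Encoding.Unicode.GetBytes(content);
Console.WriteLine($"{utf8.Length} {utf8[^1]}");               // 19 0
Console.WriteLine($"{utf16.Length} {utf16[^2]} {utf16[^1]}"); // 38 0 0
```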
You should understand why you want that null terminator, and why you want to avoid using a loop to get what you are after. A null-terminated string is fairly useless in C# unless you end up converting it to a byte array. Generally you will only do that if you want to send your string to a native method, over a network or to a USB device.
It is also important to be aware of how you are getting your bytes. In C/C++, a char is stored as 1 byte (8 bits) and the encoding is typically ANSI. In C#, a char is a UTF-16 code unit, which is two bytes (16 bits). Jon Skeet's answer shows you how to get the bytes in Unicode.
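To see the size difference, the C# sketch below converts "Test message" plus a terminator with Encoding.ASCII and Encoding.Unicode. Using ASCII as a stand-in for an ANSI code page is an assumption. The two agree for this text, but not for characters above 0x7F.

```csharp
using System;
using System.Text;

string original = "Test message" + '\0';

// One byte per character, like a C char array holding ASCII text.
byte[] ascii = Encoding.ASCII.GetBytes(original);
// Two bytes per character: UTF-16, the encoding of a C# char.
byte[] utf16 = Encoding.Unicode.GetBytes(original);

Console.WriteLine(BitConverter.ToString(ascii));
// 54-65-73-74-20-6D-65-73-73-61-67-65-00
Console.WriteLine(BitConverter.ToString(utf16));
// 54-00-65-00-73-00-74-00-20-00-6D-00-65-00-73-00-73-00-61-00-67-00-65-00-00-00
```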
Tongue in cheek but potentially useful answer.
If you want the output on your screen in hex, as you have shown, you need to:
Convert the string (with the null character '\0' on the end) to a byte array
Convert each byte to its two-digit hex representation
Interleave them with spaces
Print to the screen
Try this:
using System;
using System.Linq;
using System.Text;

namespace stringlulz
{
    class Program
    {
        static void Main(string[] args)
        {
            string original = "Test message";
            // UTF-16 bytes of the string plus a trailing '\0' character
            byte[] bytes = Encoding.Unicode.GetBytes(original + '\0');
            // Append each byte as two hex digits plus a space, then drop the final space
            var output = bytes.Aggregate(new StringBuilder(), (s, p) => s.Append(p.ToString("x2") + ' '), s => { s.Length--; return s; });
            Console.WriteLine(output.ToString().ToUpper());
            Console.ReadLine();
        }
    }
}
The output is:
54 00 65 00 73 00 74 00 20 00 6D 00 65 00 73 00 73 00 61 00 67 00 65 00 00 00
Here's a tested C# sample that sends a null-terminated XML command; it works well:
string strCmd = @"<?xml version=""1.0"" encoding=""utf-8""?><Command name=""SerialNumber"" />";
byte[] sendB = System.Text.Encoding.UTF8.GetBytes(strCmd + "\0");
sportin.Send = sendB;
I'm parsing a file (which I don't generate) that contains a string. The string is always preceded by 2 bytes which tell me the length of the string that follows.
For example:
05 00 53 70 6F 72 74
would be:
Sport
Using a C# BinaryReader, I read the string using:
string s = new string(binaryReader.ReadChars(size));
Occasionally there's an unusual character that seems to push the stream position further than it should. For example:
0D 00 63 6F 6F 6B 20 E2 80 94 20 62 6F 6F 6B
Should be:
cook - book
and although it reads fine the stream ends up two bytes further along than it should?! (Which then messes up the rest of the parsing.)
I'm guessing it has something to do with the 0xE2 in the middle, but I'm not really sure why or how to deal with it.
Any suggestions greatly appreciated!
My guess is that the string is encoded in UTF-8. The 3-byte sequence E2 80 94 corresponds to the single Unicode character U+2014 (EM DASH).
In your first example
05 00 53 70 6F 72 74
none of the bytes is over 0x7F, which happens to be the limit for 7-bit ASCII. UTF-8 retains compatibility with ASCII by setting the 8th bit only on bytes that belong to a multi-byte character.
0D 00 63 6F 6F 6B 20 E2 80 94 20 62 6F 6F 6B
As Ted noticed, your problems start at 0xE2, because that is not a 7-bit ASCII byte.
The length prefix 0x0D (13) counts bytes, not characters: those 13 bytes encode only 11 characters.
0xE2 tells us that we've found the beginning of a UTF-8 multi-byte sequence, since its most significant bit is set (it's over 127). Its leading bits 1110 mean the sequence is three bytes long (E2 80 94). In this case the sequence represents U+2014 (EM DASH).
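The byte and character counts above can be confirmed with a short C# sketch. It decodes the 13 payload bytes from the question (the bytes after the 0D 00 prefix) as UTF-8.

```csharp
using System;
using System.Text;

// The 13 payload bytes from the question (after the 0D 00 length prefix).
byte[] payload = { 0x63, 0x6F, 0x6F, 0x6B, 0x20, 0xE2, 0x80, 0x94, 0x20, 0x62, 0x6F, 0x6F, 0x6B };

string text = Encoding.UTF8.GetString(payload);
Console.WriteLine(text.Length);    // 11 characters
Console.WriteLine(payload.Length); // 13 bytes

// Character 5 is U+2014, which UTF-8 stores as the three bytes E2 80 94.
Console.WriteLine(((int)text[5]).ToString("X4"));                           // 2014
Console.WriteLine(BitConverter.ToString(Encoding.UTF8.GetBytes("\u2014"))); // E2-80-94
```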
As you correctly stated, the E2 byte is the problem. BinaryReader.ReadChars(n) does not read n bytes but n characters, decoded with the reader's encoding (UTF-8 by default). See Wikipedia for Unicode encodings. The concept you are after is UTF-8 multi-byte sequences. Characters from U+0080 to U+07FF take two bytes, and characters from U+0800 to U+FFFF (including U+2014) take three. So ReadChars(13) consumes these 13 bytes plus at least two more bytes of whatever follows. This is the reason for your offset mismatch.
You need to use BinaryReader.ReadBytes to fix the offset issue and then pass the bytes to an Encoding instance.
To make it work you need to read the bytes with BinaryReader and then decode it with the correct encoding. Assuming you are dealing with UTF-8 then you need to pass the byte array to
Encoding.UTF8.GetString(byte [] rawData)
to get your correctly encoded string back.
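The C# sketch below feeds the question's bytes through a MemoryStream to show both behaviours: ReadChars overshooting, and the ReadBytes plus Encoding.UTF8.GetString fix. The two trailing bytes 41 42 are an assumption. They stand in for whatever follows the string in the real file.

```csharp
using System;
using System.IO;
using System.Text;

// The record from the question, followed by two extra ASCII bytes ("AB")
// standing in for whatever comes next in the file (an assumption for the demo).
byte[] data =
{
    0x0D, 0x00,                                     // length prefix: 13 bytes
    0x63, 0x6F, 0x6F, 0x6B, 0x20, 0xE2, 0x80, 0x94, // "cook " + U+2014
    0x20, 0x62, 0x6F, 0x6F, 0x6B,                   // " book"
    0x41, 0x42                                      // next data in the file
};

// Original approach: ReadChars(13) reads 13 *characters*, overshooting.
long positionAfterReadChars;
using (var reader = new BinaryReader(new MemoryStream(data), Encoding.UTF8))
{
    int size = reader.ReadUInt16();
    reader.ReadChars(size);
    positionAfterReadChars = reader.BaseStream.Position;
}

// Fix: read exactly `size` bytes, then decode them as UTF-8.
string text;
long positionAfterReadBytes;
using (var reader = new BinaryReader(new MemoryStream(data), Encoding.UTF8))
{
    int size = reader.ReadUInt16();
    byte[] raw = reader.ReadBytes(size);
    text = Encoding.UTF8.GetString(raw);
    positionAfterReadBytes = reader.BaseStream.Position;
}

Console.WriteLine($"ReadChars leaves the stream at {positionAfterReadChars}");
Console.WriteLine($"ReadBytes leaves the stream at {positionAfterReadBytes}"); // 15
Console.WriteLine(text.Length); // 11
```

Reading bytes first keeps the stream position tied to the length prefix, whatever characters the string contains.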
Yours,
Alois Kraus