I'm communicating with a device that returns uuencoded data:
ASCII: EZQAEgETAhMQIBwIAUkAAABj
HEX: 45-5A-51-41-45-67-45-54-41-68-4D-51-49-42-77-49-41-55-6B-41-41-41-42-6A
The documentation for this device states the above is uuencoded but I can't figure out how to decode it. The final result won't be a human readable string but the first byte reveals the number of bytes for the following product data. (Which would be 23 or 24?)
I've tried using Crypt2 to decode it; it doesn't seem to match 644, 666, 744 modes.
I've tried to hand write it out following the Wiki: https://en.wikipedia.org/wiki/Uuencoding#Formatting_mechanism
Doesn't make sense! How do I decode this uuencoded data?
I agree with #canton7 that it looks like it's base64 encoded. You can decode it like this
byte[] decoded = Convert.FromBase64String("EZQAEgETAhMQIBwIAUkAAABj");
and if you want, you can print the hex values like this
Console.WriteLine(BitConverter.ToString(decoded));
which prints
11-94-00-12-01-13-02-13-10-20-1C-08-01-49-00-00-00-63
As #HansKilian says in the comments, this is not uuencoded.
If you base64-decode it you get (in hex):
11 94 00 12 01 13 02 13 10 20 1c 08 01 49 00 00 00 63
The first number, 17 in decimal, is the same as the number of bytes following it, which matches:
The final result won't be a human readable string but the first byte reveals the number of bytes for the following product data.
(#HansKilian made the original call that it was base64-encoded. This answer provides confirmation of that by looking at the first decoded byte, but please accept his answer)
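The length check described in these answers can be sketched in a few lines. This is only an illustration: the class and method names are made up for this example, and it assumes the device always puts the byte count in the first byte, as the question states.

```csharp
using System;

public static class LengthPrefixCheck
{
    // Decodes a Base64 payload and reports whether the first byte
    // equals the number of bytes that follow it.
    public static bool FirstByteMatchesLength(string base64, out byte[] decoded)
    {
        decoded = Convert.FromBase64String(base64);
        return decoded.Length > 0 && decoded[0] == decoded.Length - 1;
    }

    public static void Main()
    {
        bool ok = FirstByteMatchesLength("EZQAEgETAhMQIBwIAUkAAABj", out byte[] data);
        Console.WriteLine(BitConverter.ToString(data));
        Console.WriteLine($"Length byte: {data[0]}, bytes following: {data.Length - 1}, match: {ok}");
    }
}
```

For the sample string the decoded array has 18 bytes, so the leading 0x11 (17) matches the 17 bytes after it.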
I have a task to take millions of floats and store them in the database in batches of 5,000, as binary. This is forcing me to learn interesting things about serialization performance.
One of the things that surprises me is the size of the serialized data, which is a factor of ten above what I expected. This test shows me that a four-byte float is serialized to 55 bytes and an eight-byte double to 59 bytes.
What is happening here? I expected it to simply split the float value into its four bytes. What are the other 51 bytes?
private void SerializeFloat()
{
    Random rnd = new Random();
    IFormatter iFormatter = new BinaryFormatter();
    using (MemoryStream memoryStream = new MemoryStream(10000000))
    {
        memoryStream.Capacity = 0;
        iFormatter.Serialize(memoryStream, (Single)rnd.NextDouble());
        iFormatter.Serialize(memoryStream, rnd.NextDouble());
    }
}
Serialization is more than simply blitting bits and bytes to a stream. Serialization is structured output. This structure accounts for your actual differences. The Framework encodes additional information which lets it know the type and number of objects in the serialized data, among many other possibilities. It is an implementation detail best left alone.
If you need unstructured output, you could use BinaryWriter instead.
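As a rough sketch of the BinaryWriter route for a batch of floats: the class and method names here are hypothetical, and the reader has to know the layout in advance because nothing in the output describes it.

```csharp
using System;
using System.IO;

public static class RawFloatWriter
{
    // Writes each float as its raw 4-byte representation, with no type metadata.
    public static byte[] ToBytes(float[] values)
    {
        using (var stream = new MemoryStream(values.Length * sizeof(float)))
        using (var writer = new BinaryWriter(stream))
        {
            foreach (float value in values)
                writer.Write(value);
            writer.Flush();
            return stream.ToArray();
        }
    }

    // Reads the floats back; the element count is derived from the byte length.
    public static float[] FromBytes(byte[] bytes)
    {
        var values = new float[bytes.Length / sizeof(float)];
        using (var reader = new BinaryReader(new MemoryStream(bytes)))
        {
            for (int i = 0; i < values.Length; i++)
                values[i] = reader.ReadSingle();
        }
        return values;
    }

    public static void Main()
    {
        var rnd = new Random();
        var batch = new float[5000];
        for (int i = 0; i < batch.Length; i++)
            batch[i] = (float)rnd.NextDouble();

        byte[] blob = ToBytes(batch);
        Console.WriteLine($"{batch.Length} floats -> {blob.Length} bytes");
    }
}
```

A batch of 5,000 floats comes out at exactly 20,000 bytes, which is the size the question expected.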
Because it may be of interest to someone, I decided to write this post about the question: what does the binary format of serialized .NET objects look like, and how can we interpret it correctly?
I have based all my research on the .NET Remoting: Binary Format Data Structure specification.
Example class:
To have a working example, I have created a simple class called A which contains 2 properties, one string and one integer value, they are called SomeString and SomeValue.
Class A looks like this:
[Serializable()]
public class A
{
    public string SomeString { get; set; }
    public int SomeValue { get; set; }
}
For the serialization I used the BinaryFormatter of course:
BinaryFormatter bf = new BinaryFormatter();
StreamWriter sw = new StreamWriter("test.txt");
bf.Serialize(sw.BaseStream, new A() { SomeString = "abc", SomeValue = 123 });
sw.Close();
As can be seen, I passed a new instance of class A containing abc and 123 as values.
Example result data:
If we look at the serialized result in an hex editor, we get something like this:
Let us interpret the example result data:
According to the above-mentioned specification (here is the direct link to the PDF: [MS-NRBF].pdf), every record within the stream is identified by the RecordTypeEnumeration. Section 2.1.2.1 RecordTypeEnumeration states:
This enumeration identifies the type of the record. Each record (except for MemberPrimitiveUnTyped) starts with a record type enumeration. The size of the enumeration is one BYTE.
SerializationHeaderRecord:
So if we look back at the data we got, we can start interpreting the first byte:
As stated in 2.1.2.1 RecordTypeEnumeration a value of 0 identifies the SerializationHeaderRecord which is specified in 2.6.1 SerializationHeaderRecord:
The SerializationHeaderRecord record MUST be the first record in a binary serialization. This record has the major and minor version of the format and the IDs of the top object and the headers.
It consists of:
RecordTypeEnum (1 byte)
RootId (4 bytes)
HeaderId (4 bytes)
MajorVersion (4 bytes)
MinorVersion (4 bytes)
With that knowledge we can interpret the record containing 17 bytes:
00 represents the RecordTypeEnumeration which is SerializationHeaderRecord in our case.
01 00 00 00 represents the RootId
If neither the BinaryMethodCall nor BinaryMethodReturn record is present in the serialization stream, the value of this field MUST contain the ObjectId of a Class, Array, or BinaryObjectString record contained in the serialization stream.
So in our case this should be the ObjectId with the value 1 (because the data is serialized using little-endian) which we will hopefully see again ;-)
FF FF FF FF represents the HeaderId
01 00 00 00 represents the MajorVersion
00 00 00 00 represents the MinorVersion
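The five fields above can be read back with a BinaryReader, which always reads integers as little-endian. This is a minimal sketch using the 17 bytes listed above; the class name and the tuple shape are just for illustration.

```csharp
using System;
using System.IO;

public static class HeaderRecordSketch
{
    // Field layout taken from the list above: 1 + 4 + 4 + 4 + 4 = 17 bytes.
    public static (byte RecordType, int RootId, int HeaderId, int MajorVersion, int MinorVersion)
        Parse(byte[] record)
    {
        using (var reader = new BinaryReader(new MemoryStream(record)))
        {
            byte recordType = reader.ReadByte();   // 0 = SerializationHeaderRecord
            int rootId = reader.ReadInt32();       // little-endian INT32
            int headerId = reader.ReadInt32();
            int major = reader.ReadInt32();
            int minor = reader.ReadInt32();
            return (recordType, rootId, headerId, major, minor);
        }
    }

    public static void Main()
    {
        byte[] record =
        {
            0x00,
            0x01, 0x00, 0x00, 0x00,
            0xFF, 0xFF, 0xFF, 0xFF,
            0x01, 0x00, 0x00, 0x00,
            0x00, 0x00, 0x00, 0x00
        };
        var header = Parse(record);
        Console.WriteLine($"RecordType={header.RecordType} RootId={header.RootId} " +
                          $"HeaderId={header.HeaderId} Version={header.MajorVersion}.{header.MinorVersion}");
    }
}
```

Read as a signed INT32, the HeaderId bytes FF FF FF FF come out as -1.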
BinaryLibrary:
As specified, each record must begin with the RecordTypeEnumeration. As the last record is complete, we must assume that a new one begins.
Let us interpret the next byte:
As we can see, in our example the SerializationHeaderRecord is followed by the BinaryLibrary record:
The BinaryLibrary record associates an INT32 ID (as specified in [MS-DTYP] section 2.2.22) with a Library name. This allows other records to reference the Library name by using the ID. This approach reduces the wire size when there are multiple records that reference the same Library name.
It consists of:
RecordTypeEnum (1 byte)
LibraryId (4 bytes)
LibraryName (variable number of bytes (which is a LengthPrefixedString))
As stated in 2.1.1.6 LengthPrefixedString...
The LengthPrefixedString represents a string value. The string is prefixed by the length of the UTF-8 encoded string in bytes. The length is encoded in a variable-length field with a minimum of 1 byte and a maximum of 5 bytes. To minimize the wire size, length is encoded as a variable-length field.
In our simple example the length is always encoded using 1 byte. With that knowledge we can continue the interpretation of the bytes in the stream:
0C represents the RecordTypeEnumeration which identifies the BinaryLibrary record.
02 00 00 00 represents the LibraryId which is 2 in our case.
Now the LengthPrefixedString follows:
42 represents the length information of the LengthPrefixedString which contains the LibraryName.
In our case the length information of 42 (decimal 66) tells us that we need to read the next 66 bytes and interpret them as the LibraryName.
As already stated, the string is UTF-8 encoded, so the result of the bytes above would be something like: _WorkSpace_, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null
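To make the variable-length prefix concrete, here is a hedged sketch of a LengthPrefixedString reader. It assumes the usual 7-bits-per-byte scheme (low 7 bits carry data, the high bit means another length byte follows); the class and method names are invented for this example.

```csharp
using System;
using System.IO;
using System.Text;

public static class LengthPrefixedStringSketch
{
    public static string Read(Stream stream)
    {
        // Decode the length: 7 data bits per byte, least significant group first.
        int length = 0;
        int shift = 0;
        while (true)
        {
            int b = stream.ReadByte();
            if (b < 0) throw new EndOfStreamException();
            length |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) break;          // high bit clear: last length byte
            shift += 7;
            if (shift > 28) throw new FormatException("Length prefix longer than 5 bytes.");
        }

        // Read exactly 'length' bytes and decode them as UTF-8.
        var buffer = new byte[length];
        int read = 0;
        while (read < length)
        {
            int n = stream.Read(buffer, read, length - read);
            if (n == 0) throw new EndOfStreamException();
            read += n;
        }
        return Encoding.UTF8.GetString(buffer);
    }

    public static void Main()
    {
        // One-byte prefix: 03 61 62 63 is the string "abc".
        Console.WriteLine(Read(new MemoryStream(new byte[] { 0x03, 0x61, 0x62, 0x63 })));

        // Two-byte prefix: a length of 200 is encoded as C8 01.
        var longer = new MemoryStream();
        longer.Write(new byte[] { 0xC8, 0x01 }, 0, 2);
        longer.Write(Encoding.UTF8.GetBytes(new string('x', 200)), 0, 200);
        longer.Position = 0;
        Console.WriteLine(Read(longer).Length);
    }
}
```

Strings shorter than 128 bytes, like all the ones in this example, need only a single length byte.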
ClassWithMembersAndTypes:
Again, the record is complete so we interpret the RecordTypeEnumeration of the next one:
05 identifies a ClassWithMembersAndTypes record. Section 2.3.2.1 ClassWithMembersAndTypes states:
The ClassWithMembersAndTypes record is the most verbose of the Class records. It contains metadata about Members, including the names and Remoting Types of the Members. It also contains a Library ID that references the Library Name of the Class.
It consists of:
RecordTypeEnum (1 byte)
ClassInfo (variable number of bytes)
MemberTypeInfo (variable number of bytes)
LibraryId (4 bytes)
ClassInfo:
As stated in 2.3.1.1 ClassInfo the record consists of:
ObjectId (4 bytes)
Name (variable number of bytes (which is again a LengthPrefixedString))
MemberCount(4 bytes)
MemberNames (a sequence of LengthPrefixedString values, where the number of items MUST be equal to the value specified in the MemberCount field)
Back to the raw data, step by step:
01 00 00 00 represents the ObjectId. We've already seen this one, it was specified as the RootId in the SerializationHeaderRecord.
0F 53 74 61 63 6B 4F 76 65 72 46 6C 6F 77 2E 41 represents the Name of the class which is represented by using a LengthPrefixedString. As mentioned, in our example the length of the string is defined with 1 byte so the first byte 0F specifies that 15 bytes must be read and decoded using UTF-8. The result looks something like this: StackOverFlow.A - so obviously I used StackOverFlow as name of the namespace.
02 00 00 00 represents the MemberCount; it tells us that 2 members, both represented as LengthPrefixedString values, will follow.
Name of the first member:
1B 3C 53 6F 6D 65 53 74 72 69 6E 67 3E 6B 5F 5F 42 61 63 6B 69 6E 67 46 69 65 6C 64 represents the first MemberName. 1B is again the length of the string, which is 27 bytes, and the result is something like this: <SomeString>k__BackingField.
Name of the second member:
1A 3C 53 6F 6D 65 56 61 6C 75 65 3E 6B 5F 5F 42 61 63 6B 69 6E 67 46 69 65 6C 64 represents the second MemberName, 1A specifies that the string is 26 bytes long. It results in something like this: <SomeValue>k__BackingField.
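The ClassInfo bytes walked through above can be decoded in one pass. This sketch leans on BinaryReader.ReadString, which reads a 7-bit-encoded length followed by UTF-8 bytes; treating that as equivalent to LengthPrefixedString is an assumption that holds for the short strings here. The class name and the FromHex helper are made up for the example.

```csharp
using System;
using System.IO;
using System.Linq;

public static class ClassInfoSketch
{
    // Helper for this example only: turns "01 0F ..." into a byte array.
    public static byte[] FromHex(string hex) =>
        hex.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
           .Select(h => Convert.ToByte(h, 16))
           .ToArray();

    public static (int ObjectId, string Name, string[] MemberNames) Parse(byte[] data)
    {
        using (var reader = new BinaryReader(new MemoryStream(data)))
        {
            int objectId = reader.ReadInt32();        // ObjectId
            string name = reader.ReadString();        // Name
            int memberCount = reader.ReadInt32();     // MemberCount
            var members = new string[memberCount];
            for (int i = 0; i < memberCount; i++)
                members[i] = reader.ReadString();     // MemberNames
            return (objectId, name, members);
        }
    }

    public const string SampleHex =
        "01 00 00 00 " +
        "0F 53 74 61 63 6B 4F 76 65 72 46 6C 6F 77 2E 41 " +
        "02 00 00 00 " +
        "1B 3C 53 6F 6D 65 53 74 72 69 6E 67 3E 6B 5F 5F 42 61 63 6B 69 6E 67 46 69 65 6C 64 " +
        "1A 3C 53 6F 6D 65 56 61 6C 75 65 3E 6B 5F 5F 42 61 63 6B 69 6E 67 46 69 65 6C 64";

    public static void Main()
    {
        var info = Parse(FromHex(SampleHex));
        Console.WriteLine($"ObjectId={info.ObjectId}, Name={info.Name}");
        foreach (string member in info.MemberNames)
            Console.WriteLine($"  member: {member}");
    }
}
```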
MemberTypeInfo:
After the ClassInfo the MemberTypeInfo follows.
Section 2.3.1.2 MemberTypeInfo states that the structure contains:
BinaryTypeEnums (variable in length)
A sequence of BinaryTypeEnumeration values that represents the Member Types that are being transferred. The Array MUST:
Have the same number of items as the MemberNames field of the ClassInfo structure.
Be ordered such that the BinaryTypeEnumeration corresponds to the Member name in the MemberNames field of the ClassInfo structure.
AdditionalInfos (variable in length): depending on the BinaryTypeEnum, additional info may or may not be present.
| BinaryTypeEnum | AdditionalInfos |
|----------------+--------------------------|
| Primitive | PrimitiveTypeEnumeration |
| String | None |
So taking that into consideration we are almost there...
We expect 2 BinaryTypeEnumeration values (because we had 2 members in the MemberNames).
Again, back to the raw data of the complete MemberTypeInfo record:
01 represents the BinaryTypeEnumeration of the first member, according to 2.1.2.2 BinaryTypeEnumeration we can expect a String and it is represented using a LengthPrefixedString.
00 represents the BinaryTypeEnumeration of the second member, and again, according to the specification, it is a Primitive. As stated above, Primitives are followed by additional information, in this case a PrimitiveTypeEnumeration. That's why we need to read the next byte, which is 08, match it with the table in 2.1.2.3 PrimitiveTypeEnumeration, and be surprised to notice that we can expect an Int32, which is represented by 4 bytes, as stated in some other document about basic datatypes.
LibraryId:
After the MemberTypeInfo the LibraryId follows; it is represented by 4 bytes:
02 00 00 00 represents the LibraryId which is 2.
The values:
As specified in 2.3 Class Records:
The values of the Members of the Class MUST be serialized as records that follow this record, as specified in section 2.7. The order of the records MUST match the order of MemberNames as specified in the ClassInfo (section 2.3.1.1) structure.
That's why we can now expect the values of the members.
Let us look at the last few bytes:
06 identifies a BinaryObjectString. It represents the value of our SomeString property (the <SomeString>k__BackingField to be exact).
According to 2.5.7 BinaryObjectString it contains:
RecordTypeEnum (1 byte)
ObjectId (4 bytes)
Value (variable length, represented as a LengthPrefixedString)
So knowing that, we can clearly identify that
03 00 00 00 represents the ObjectId.
03 61 62 63 represents the Value where 03 is the length of the string itself and 61 62 63 are the content bytes that translate to abc.
Hopefully you remember that there was a second member, an Int32. Knowing that an Int32 is represented using 4 bytes, we can conclude that
7B 00 00 00
must be the Value of our second member. In little-endian order this is 0x0000007B, and 7B hexadecimal equals 123 decimal, which fits our example code.
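Putting the last few bytes together, the two member values and the final record can be read as sketched below. The byte sequence is assembled from the values discussed above (06, ObjectId 3, "abc", 123, then 0B); the class name and tuple fields are hypothetical. Note that the Int32 has no record type byte in front of it, which matches the MemberPrimitiveUnTyped exception quoted earlier.

```csharp
using System;
using System.IO;

public static class MemberValuesSketch
{
    public static (int ObjectId, string SomeString, int SomeValue, byte NextRecordType)
        Parse(byte[] data)
    {
        using (var reader = new BinaryReader(new MemoryStream(data)))
        {
            byte recordType = reader.ReadByte();
            if (recordType != 0x06)
                throw new InvalidDataException("Expected a BinaryObjectString record (06).");

            int objectId = reader.ReadInt32();       // 03 00 00 00
            string someString = reader.ReadString(); // 03 61 62 63 -> length 3, then the text
            int someValue = reader.ReadInt32();      // 7B 00 00 00, untyped primitive
            byte next = reader.ReadByte();           // 0B = MessageEnd in this stream
            return (objectId, someString, someValue, next);
        }
    }

    public static void Main()
    {
        byte[] tail =
        {
            0x06, 0x03, 0x00, 0x00, 0x00, 0x03, 0x61, 0x62, 0x63,
            0x7B, 0x00, 0x00, 0x00,
            0x0B
        };
        var values = Parse(tail);
        Console.WriteLine($"ObjectId={values.ObjectId}, SomeString={values.SomeString}, " +
                          $"SomeValue={values.SomeValue}, next record=0x{values.NextRecordType:X2}");
    }
}
```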
So here is the complete ClassWithMembersAndTypes record:
MessageEnd:
Finally the last byte 0B represents the MessageEnd record.
Binary serialization is type safe. It makes sure that when you deserialize the data, you'll get the exact same object back.
To make that work, BinaryFormatter adds additional data about the types of the objects that you serialize. You are seeing that extra overhead. You can see it by serializing to a FileStream and looking at the generated file with a hex viewer. You'll see strings back, like "System.Single", the type name, and "m_value", the name of the field where the value is stored. A good way to cut down on the overhead is to, say, serialize an array instead.
BinaryWriter is the exact opposite, very compact but not type-safe. Plenty of alternatives are available in between.
.NET serialization throws in a bunch of information other than the actual 8 bytes of your double (type information, etc.). You could use a file Stream and then write the bytes gotten by byte[] BitConverter.GetBytes(double) or the BinaryWriter class.
There are many alternatives to .NET serialization:
Text formats
XML
JSON
Binary formats
Google Protocol Buffers
MessagePack
These all have their pros and cons. I especially like MessagePack and encourage you to take a look at it. For example, it will use 9 bytes to store a self-describing double.
I'm parsing a file (which I don't generate) that contains a string. The string is always preceded by 2 bytes which tell me the length of the string that follows.
For example:
05 00 53 70 6F 72 74
would be:
Sport
Using a C# BinaryReader, I read the string using:
string s = new string(binaryReader.ReadChars(size));
Sometimes there's the odd funky character which seems to push the position of the stream on further than it should. For example:
0D 00 63 6F 6F 6B 20 E2 80 94 20 62 6F 6F 6B
Should be:
cook - book
and although it reads fine the stream ends up two bytes further along than it should?! (Which then messes up the rest of the parsing.)
I'm guessing it has something to do with the 0xE2 in the middle, but I'm not really sure why or how to deal with it.
Any suggestions greatly appreciated!
My guess is that the string is encoded in UTF-8. The 3-byte sequence E2 80 94 corresponds to the single Unicode character U+2014 (EM DASH).
In your first example
05 00 53 70 6F 72 74
none of the bytes are over 0x7F, which is the limit for 7-bit ASCII. UTF-8 retains compatibility with ASCII by using the 8th bit to indicate that there will be more information to come.
0D 00 63 6F 6F 6B 20 E2 80 94 20 62 6F 6F 6B
Just as Ted noticed, your "problems" start with 0xE2 because that is not a 7-bit ASCII character.
The length prefix 0D 00 is 13, which is the number of bytes, not characters: the string has 11 characters but occupies 13 bytes because the dash takes three.
0xE2 tells us that we've found the beginning of a UTF-8 sequence since the most significant bit is set (it's over 127). In this case a sequence that represents — (EM Dash).
As you correctly stated, the E2 byte is the problem. BinaryReader.ReadChars(n) does not read n bytes but n characters, decoded with the reader's encoding (UTF-8 by default). See Wikipedia for Unicode encodings. In UTF-8, code points from U+0080 to U+07FF are represented by two bytes and those from U+0800 to U+FFFF by three, so the dash E2 80 94 (U+2014) is one character stored in three bytes. Asking for 13 characters therefore reads past the 13 bytes of the string, which is the reason for your two-byte offset mismatch.
You need to use BinaryReader.ReadBytes to fix the offset issue and then pass the result to an Encoding instance.
To make it work you need to read the bytes with BinaryReader and then decode it with the correct encoding. Assuming you are dealing with UTF-8 then you need to pass the byte array to
Encoding.UTF8.GetString(byte [] rawData)
to get your correctly encoded string back.
Yours,
Alois Kraus
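A small sketch of the ReadBytes approach, run against the bytes from the question. It assumes the 2-byte prefix is a little-endian byte count and that the text is UTF-8, as the answers above guess; the class and method names are illustrative only.

```csharp
using System;
using System.IO;
using System.Text;

public static class PrefixedUtf8Reader
{
    // Reads a 2-byte little-endian byte count, then exactly that many bytes,
    // and decodes them as UTF-8. The stream position advances by 2 + count.
    public static string ReadString(BinaryReader reader)
    {
        ushort byteCount = reader.ReadUInt16();
        byte[] raw = reader.ReadBytes(byteCount);
        if (raw.Length != byteCount)
            throw new EndOfStreamException("String is shorter than its length prefix.");
        return Encoding.UTF8.GetString(raw);
    }

    public static void Main()
    {
        byte[] data =
        {
            0x0D, 0x00,
            0x63, 0x6F, 0x6F, 0x6B, 0x20, 0xE2, 0x80, 0x94, 0x20, 0x62, 0x6F, 0x6F, 0x6B
        };
        using (var stream = new MemoryStream(data))
        using (var reader = new BinaryReader(stream))
        {
            string s = ReadString(reader);
            Console.WriteLine($"{s.Length} characters, stream position {stream.Position}");
        }
    }
}
```

The decoded string has 11 characters and the stream stops at position 15, right after the 13 string bytes, so whatever follows in the file is parsed from the correct offset.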
So here's my issue. I have a binary file that I want to edit. I can use a hex editor to edit it of course, but I need to make a program to edit this particular file. Say that I know a certain hex value I want to edit, and I know its address etc. Let's say that it's a 16-bit binary, and the address is 00000000, it's on row 04 and it has a value of 02. How could I create a program that would change the value of that hex, and only that hex, with the click of a button?
I've found resources that talk about similar things, but I can't for the life of me find help with the exact issue.
Any help would be appreciated, and please, don't just tell me the answer if there is one but try and explain a bit.
I think this is best explained with a specific example. Here are the first 32 bytes of an executable file as shown in Visual Studio's hex editor:
00000000 4D 5A 90 00 03 00 00 00 04 00 00 00 FF FF 00 00
00000010 B8 00 00 00 00 00 00 00 40 00 00 00 00 00 00 00
Now a file is really just a linear sequence of bytes. The rows that you see in a hex editor are just there to make things easier to read. When you want to manipulate the bytes in a file using code, you need to identify the bytes by their 0-based positions. In the above example, the positions of the non-zero bytes are as follows:
Position Value
-------- ------
0 0x4D
1 0x5A
2 0x90
4 0x03
8 0x04
12 0xFF
13 0xFF
16 0xB8
24 0x40
In the hex editor representation shown above, the numbers on the left represent the positions of the first byte in the corresponding line. The editor is showing 16 bytes per line, so they increment by 16 (0x10) at each line.
If you simply want to take one of the bytes in the file and change its value, the most efficient approach that I see would be to open the file using a FileStream, seek to the appropriate position, and overwrite the byte. For example, the following will change the 0x40 at position 24 to 0x04:
using (var stream = new FileStream(path, FileMode.Open, FileAccess.ReadWrite)) {
    stream.Position = 24;
    stream.WriteByte(0x04);
}
Well the first thing would probably be to understand the conversions. Hex to decimal probably isn't as important (unless of course you need to change the value from a decimal first, but that's a simple conversion formula), but hex to binary will be important seeing as each hex character (0-9,A-F) corresponds to a specific binary output.
After understanding that stuff, the next step is to figure out exactly what you are searching for, make the proper conversion, and replace that exact string. I would recommend (if the buffer wouldn't be too large) to take the entire hex dump and replace whatever you're searching for in there to avoid overwriting a duplicate binary sequence.
Hope that helps!
Regards,
Dennis M.
I want to create a very simple piece of software in C# .NET that I can pass a folder's path to and detect all files with a frequency of below a given threshold. Any pointers on how I would do this?
You have to read mp3 files. To do that you have to find specifications for them.
Generally an mp3 file starts with an ID3 tag, so you have to read the tag header, find its length and skip it. Let's take ID3v2.3 for example:
ID3v2/file identifier "ID3"
ID3v2 version $03 00
ID3v2 flags %abc00000
ID3v2 size 4 * %0xxxxxxx
so bytes 6, 7, 8 and 9 store the tag size in big-endian order, with only the low 7 bits of each byte used (the top bit is always 0). Here is a sample from some file:
0 1 2 3 4 5 6 7 8 9 A B C D E F
49 44 33 03 00 00 00 00 07 76 54 43 4f 4e 00 00
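00 00 07 76 is the size field. Because each byte carries only 7 bits, each byte is shifted by a multiple of 7 bits rather than 8: (0x07 << 7) | 0x76 = 0x3F6. Then add 10 (0xA) for the tag header itself to get the offset 0x400. This is the address of the start of the mp3 header.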
Then you take description of mp3 header:
The bits are: AAAAAAAA AAABBCCD EEEEFFGH IIJJKLMM. We need FF, the sampling frequency index, and convert it to the actual frequency (BB, the MPEG version, selects the column):
bits MPEG1 MPEG2 MPEG2.5
00 44100 22050 11025
01 48000 24000 12000
10 32000 16000 8000
11 reserv. reserv. reserv.
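The two calculations above can be sketched as follows. The size decoding and the frequency table come from this answer; the mapping of the BB bits to a version (11 = MPEG1, 10 = MPEG2, 00 = MPEG2.5, 01 reserved) is taken from the usual MP3 frame header description and is not stated above, so treat it as an assumption. All names are hypothetical.

```csharp
using System;

public static class Mp3HeaderSketch
{
    // ID3v2 size: four bytes, 7 significant bits each, most significant byte first.
    public static int TagSize(byte b0, byte b1, byte b2, byte b3) =>
        ((b0 & 0x7F) << 21) | ((b1 & 0x7F) << 14) | ((b2 & 0x7F) << 7) | (b3 & 0x7F);

    // Returns the sampling frequency in Hz for a 4-byte frame header,
    // or 0 if the header has no frame sync or uses a reserved value.
    public static int SamplingFrequency(byte[] header)
    {
        bool hasSync = header[0] == 0xFF && (header[1] & 0xE0) == 0xE0; // 11 'A' bits set
        if (!hasSync) return 0;

        int version = (header[1] >> 3) & 0x03;   // BB
        int index = (header[2] >> 2) & 0x03;     // FF
        int[] mpeg1 = { 44100, 48000, 32000, 0 };

        switch (version)
        {
            case 3: return mpeg1[index];         // MPEG1
            case 2: return mpeg1[index] / 2;     // MPEG2
            case 0: return mpeg1[index] / 4;     // MPEG2.5
            default: return 0;                   // reserved
        }
    }

    public static void Main()
    {
        int size = TagSize(0x00, 0x00, 0x07, 0x76);
        Console.WriteLine($"Tag size 0x{size:X}, first frame at 0x{size + 10:X}");
        Console.WriteLine(SamplingFrequency(new byte[] { 0xFF, 0xFB, 0x90, 0x00 }));
    }
}
```

To filter a folder by frequency, one would read the first 10 bytes of each file, skip the tag if it starts with "ID3", read the next 4 bytes and compare the result with the threshold. Files without a tag, or with padding before the first frame, need extra handling that this sketch leaves out.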
You can use UltraID3Lib to get mp3 metadata (bitrate, frequency)
Check value of frequency bits in a file. There is some info about mp3 format.
How can I write all bits of a file using C#?
For example writing 0 to all bits
Please provide me with a sample
I'm not sure why you'd want to do this, but this will overwrite a file with data that is the same length but contains byte values of zero:
File.WriteAllBytes(filePath, new byte[new FileInfo(filePath).Length]);
Definitely has the foul stench of homework to it.
Hint - Think why someone might want to do this. Just deleting the file and replacing with a file of 0s of the correct length might not be what you're after.
Have a look at System.IO.FileInfo; you'll need to open a writable stream for the file you're interested in and then write however many bytes (with value 0 in your example) to it as there are in the file already (which you can ascertain via FileInfo.Length). Be sure to dispose of the stream once you're done with it – using constructs are useful for this purpose.
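A sketch of that stream-based approach, which overwrites the existing file in place and in chunks so that large files do not need one huge array. The class name and the buffer size are arbitrary choices for this example.

```csharp
using System;
using System.IO;
using System.Linq;

public static class ZeroFiller
{
    // Overwrites every byte of an existing file with 0, keeping its length.
    public static void ZeroFill(string path)
    {
        var zeros = new byte[81920];   // a new byte array is already all zeros
        using (var stream = new FileStream(path, FileMode.Open, FileAccess.Write))
        {
            long remaining = stream.Length;
            while (remaining > 0)
            {
                int count = (int)Math.Min(zeros.Length, remaining);
                stream.Write(zeros, 0, count);
                remaining -= count;
            }
        }
    }

    public static void Main()
    {
        string path = Path.GetTempFileName();
        try
        {
            File.WriteAllBytes(path, new byte[] { 1, 2, 3, 4, 5 });
            ZeroFill(path);
            byte[] result = File.ReadAllBytes(path);
            Console.WriteLine($"Length {result.Length}, all zero: {result.All(b => b == 0)}");
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```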
Consider using the BinaryWriter available in the .NET framework
using (BinaryWriter binWriter =
    new BinaryWriter(File.Open(fileName, FileMode.Create)))
{
    binWriter.Write("Hello world");
}
When you say write all bits to a file I'll assume you mean bits as in nibble, bit, byte. That's just writing an integer to a file. You can't have a 4-bit file as far as I know, so the smallest denomination will be a byte.
You probably don't want to be responsible for serializing yourself, so your easiest option would be to use the BinaryReader and BinaryWriter classes, and then manipulate the bits inside your C#.
Note that an integer literal is an int, so BinaryWriter writes each of the values below as a 4-byte little-endian integer; cast the value to byte if you want a single byte. For example
writer.Write( 1 ); // 01
writer.Write( 10 ); // 0a
writer.Write( 100 ); // 64
writer.Write( 1000 ); // 3e8
writer.Write( 10000 ); // 2710
writer.Write( 123456789 ); // 75BCD15
is written to file as
01 00 00 00 0a 00 00 00 64 00 00 00 e8 03 00 00 10 27 00 00 15 cd 5b 07
To get at the individual bits, read the data into a byte and test it against each power of 2 (1, 2, 4, up to 128) with a bitwise AND.
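A minimal sketch of that bit test, using a bitwise AND against each power of two. The class and method names are made up for the example.

```csharp
using System;

public static class BitInspector
{
    // Returns the 8 bits of a byte as text, most significant bit first.
    public static string ToBitString(byte value)
    {
        var chars = new char[8];
        for (int bit = 7; bit >= 0; bit--)
        {
            int mask = 1 << bit;                       // 128, 64, ..., 2, 1
            chars[7 - bit] = (value & mask) != 0 ? '1' : '0';
        }
        return new string(chars);
    }

    public static void Main()
    {
        foreach (byte b in new byte[] { 0x01, 0x0A, 0x64 })
            Console.WriteLine($"0x{b:X2} = {ToBitString(b)}");
    }
}
```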