Read binary file into a struct - c#

I'm trying to read binary data using C#. I have all the information about the layout of the data in the files I want to read. I'm able to read the data "chunk by chunk", i.e. getting the first 40 bytes of data converting it to a string, get the next 40 bytes.
Since there are at least three slightly different version of the data, I would like to read the data directly into a struct. It just feels so much more right than by reading it "line by line".
I have tried the following approach but to no avail:
StructType aStruct;
int count = Marshal.SizeOf(typeof(StructType));
byte[] readBuffer = new byte[count];
BinaryReader reader = new BinaryReader(stream);
readBuffer = reader.ReadBytes(count);
GCHandle handle = GCHandle.Alloc(readBuffer, GCHandleType.Pinned);
aStruct = (StructType) Marshal.PtrToStructure(handle.AddrOfPinnedObject(), typeof(StructType));
handle.Free();
The stream is an opened FileStream from which I have began to read from. I get an AccessViolationException when using Marshal.PtrToStructure.
The stream contains more information than I'm trying to read since I'm not interested in data at the end of the file.
The struct is defined like:
[StructLayout(LayoutKind.Explicit)]
struct StructType
{
[FieldOffset(0)]
public string FileDate;
[FieldOffset(8)]
public string FileTime;
[FieldOffset(16)]
public int Id1;
[FieldOffset(20)]
public string Id2;
}
The examples code is changed from original to make this question shorter.
How would I read binary data from a file into a struct?

The problem is the strings in your struct. I found that marshaling types like byte/short/int is not a problem; but when you need to marshal into a complex type such as a string, you need your struct to explicitly mimic an unmanaged type. You can do this with the MarshalAs attrib.
For your example, the following should work:
[StructLayout(LayoutKind.Explicit)]
struct StructType
{
[FieldOffset(0)]
[MarshalAs(UnmanagedType.ByValTStr, SizeConst = 8)]
public string FileDate;
[FieldOffset(8)]
[MarshalAs(UnmanagedType.ByValTStr, SizeConst = 8)]
public string FileTime;
[FieldOffset(16)]
public int Id1;
[FieldOffset(20)]
[MarshalAs(UnmanagedType.ByValTStr, SizeConst = 66)] //Or however long Id2 is.
public string Id2;
}

Here is what I am using.This worked successfully for me for reading Portable Executable Format.It's a generic function, so T is your struct type.
public static T ByteToType<T>(BinaryReader reader)
{
byte[] bytes = reader.ReadBytes(Marshal.SizeOf(typeof(T)));
GCHandle handle = GCHandle.Alloc(bytes, GCHandleType.Pinned);
T theStructure = (T)Marshal.PtrToStructure(handle.AddrOfPinnedObject(), typeof(T));
handle.Free();
return theStructure;
}

As Ronnie said, I'd use BinaryReader and read each field individually. I can't find the link to the article with this info, but it's been observed that using BinaryReader to read each individual field can be faster than Marshal.PtrToStruct, if the struct contains less than 30-40 or so fields. I'll post the link to the article when I find it.
The article's link is at: http://www.codeproject.com/Articles/10750/Fast-Binary-File-Reading-with-C
When marshaling an array of structs, PtrToStruct gains the upper-hand more quickly, because you can think of the field count as fields * array length.

I don't see any problem with your code.
just out of my head, what if you try to do it manually? does it work?
BinaryReader reader = new BinaryReader(stream);
StructType o = new StructType();
o.FileDate = Encoding.ASCII.GetString(reader.ReadBytes(8));
o.FileTime = Encoding.ASCII.GetString(reader.ReadBytes(8));
...
...
...
also try
StructType o = new StructType();
byte[] buffer = new byte[Marshal.SizeOf(typeof(StructType))];
GCHandle handle = GCHandle.Alloc(buffer, GCHandleType.Pinned);
Marshal.StructureToPtr(o, handle.AddrOfPinnedObject(), false);
handle.Free();
then use buffer[] in your BinaryReader instead of reading data from FileStream to see whether you still get AccessViolation exception.
I had no luck using the
BinaryFormatter, I guess I have to
have a complete struct that matches
the content of the file exactly.
That makes sense, BinaryFormatter has its own data format, completely incompatible with yours.

I had no luck using the BinaryFormatter, I guess I have to have a complete struct that matches the content of the file exactly. I realised that in the end I wasn't interested in very much of the file content anyway so I went with the solution of reading part of stream into a bytebuffer and then converting it using
Encoding.ASCII.GetString()
for strings and
BitConverter.ToInt32()
for the integers.
I will need to be able to parse more of the file later on but for this version I got away with just a couple of lines of code.

Try this:
using (FileStream stream = new FileStream(fileName, FileMode.Open))
{
BinaryFormatter formatter = new BinaryFormatter();
StructType aStruct = (StructType)formatter.Deserialize(filestream);
}

Reading straight into structs is evil - many a C program has fallen over because of different byte orderings, different compiler implementations of fields, packing, word size.......
You are best of serialising and deserialising byte by byte. Use the build in stuff if you want or just get used to BinaryReader.

I had structure:
[StructLayout(LayoutKind.Explicit, Size = 21)]
public struct RecordStruct
{
[FieldOffset(0)]
public double Var1;
[FieldOffset(8)]
public byte var2
[FieldOffset(9)]
[MarshalAs(UnmanagedType.ByValTStr, SizeConst = 12)]
public string String1;
}
}
and I received "incorrectly aligned or overlapped by non-object".
Based on that I found:
https://social.msdn.microsoft.com/Forums/vstudio/en-US/2f9ffce5-4c64-4ea7-a994-06b372b28c39/strange-issue-with-layoutkindexplicit?forum=clr
OK. I think I understand what's going on here. It seems like the
problem is related to the fact that the array type (which is an object
type) must be stored at a 4-byte boundary in memory. However, what
you're really trying to do is serialize the 6 bytes separately.
I think the problem is the mix between FieldOffset and serialization
rules. I'm thinking that structlayout.sequential may work for you,
since it doesn't actually modify the in-memory representation of the
structure. I think FieldOffset is actually modifying the in-memory
layout of the type. This causes problems because the .NET framework
requires object references to be aligned on appropriate boundaries (it
seems).
So my struct was defined as explicit with:
[StructLayout(LayoutKind.Explicit, Size = 21)]
and thus my fields had specified
[FieldOffset(<offset_number>)]
but when you change your struct to Sequentional, you can get rid of those offsets and the error will disappear. Something like:
[StructLayout(LayoutKind.Sequential, Size = 21)]
public struct RecordStruct
{
public double Var1;
public byte var2;
[MarshalAs(UnmanagedType.ByValTStr, SizeConst = 12)]
public string String1;
}
}

Related

safe fixed size array in struct c#

I have a C struct in an embedded MCU with about 1000 elements, and it contains lots of fixed size arrays and other structs inside, Now I want to bring the data to PC Using C#
Here is a simple preview of my struct elements in C
struct _registers
{
char name[32];
float calibrate[4][16];
float DMTI;
float DMTII;
float DMTIII;
float DMTIE;
float DMTIIE;
....
};
Now I want to convert the Struct into C# using the GCHandle class,
something like this
//The C struct is in this byte array named buffer
byte[] buffer = new byte[4096];
GCHandle handle = GCHandle.Alloc(buffer, GCHandleType.Pinned);
_registers stuff = (protection_registers)Marshal.PtrToStructure(handle.AddrOfPinnedObject(),typeof(_registers));
handle.Free();
the problem is that the Visual studio complains about the "Pointers and fixed size buffers may only be used in an unsafe context"
is there a way to use it normally without unsafe code? I have found Doing something like this
[StructLayout(LayoutKind.Explicit, Size = 56, Pack = 1)]
public struct NewStuff
{
[MarshalAs(UnmanagedType.ByValTStr, SizeConst = 32)]
[FieldOffset(0)]
public string name;
[MarshalAs(UnmanagedType.U4)]
[FieldOffset(32)]
public float calibrate[4][16];
}
but as the Code on the MCU is evolving in the coming years and we will add lots of functionality and parameter to the Struct, and since the struct has already 1000 elements, how we can do it better and more clever way? because keep tracking of all the offsets is very hard and error prone!
Try doing something like this instead (note: using class instead of struct which is more appropriate for C# - still marshals OK to a C++ struct):
[StructLayout(LayoutKind.Sequential, Pack = 1)]
public class NewStuff
{
[MarshalAs(UnmanagedType.ByValTStr, SizeConst = 32)]
public StringBuilder name;
[MarshalAs(UnmanagedType.ByValArray, SizeConst = 4*16)]
public float[,] calibrate;
[MarshalAs(UnmanagedType.R4)]
public float DMTI;
[MarshalAs(UnmanagedType.R4)]
public float DMTII;
// Etc
}
You won't be able to remove the SizeConst since the marshalling needs to know this.
Also, when you intialise the class, you will need to set the array fields to the appropriate sized buffers, and initialise the StringBuilder with the correct buffer size.
Doing it this way means you can avoid using fixed buffers (and hence, you avoid the unsafe code).

Is there a generic way to write a struct to bytes in Big Endian format?

I've found questions such as this one, which have come close to solving my dilemma. However, I've yet to find a clean approach to solving this issue in a generic manner.
I have a project that has a lot of structs that will be used for binary data transmission. This data needs to be Big Endian and, of course, most .Net architecture is Little Endian. This means that when I convert my structs to bytes, the byte order for my values are reversed.
Is there a fairly straight-forward approach to either forcing my structs to contain data in Big Endian format, or is there a way to generically write these structs to byte arrays (and byte arrays to structs) that output Big Endian data?
Here is some sample code for what I've already done.
[StructLayout(LayoutKind.Sequential, Pack = 1)]
public unsafe struct StructType_1
{
short shortVal;
ulong ulongVal;
int intVal;
}
[StructLayout(LayoutKind.Sequential, Pack = 1)]
public unsafe struct StructType_2
{
long longVal_1;
long longVal_2;
long longVal;
int intVal;
}
...
public static class StructHelper
{
//How can I change the following methods so that the struct data
//is converted to and from BigEndian data?
public static byte[] StructToByteArray<T>(T structVal) where T : struct
{
int size = Marshal.SizeOf(structVal);
byte[] arr = new byte[size];
IntPtr ptr = Marshal.AllocHGlobal(size);
Marshal.StructureToPtr(structVal, ptr, true);
Marshal.Copy(ptr, arr, 0, size);
Marshal.FreeHGlobal(ptr);
return arr;
}
public static T StructFromByteArray<T>(byte[] bytes) where T : struct
{
int sz = Marshal.SizeOf(typeof(T));
IntPtr buff = Marshal.AllocHGlobal(sz);
Marshal.Copy(bytes, 0, buff, sz);
T ret = (T)Marshal.PtrToStructure(buff, typeof(T));
Marshal.FreeHGlobal(buff);
return ret;
}
}
If you don't mind reading and writing each field to a stream (which may have performance implications) you could use Jon Skeet's EndianBinaryWriter: https://stackoverflow.com/a/1540269/106159
The code would look something like this:
public unsafe struct StructType_2
{
long longVal_1;
long longVal_2;
long longVal;
int intVal;
}
using (MemoryStream memory = new MemoryStream())
{
using (EndianBinaryWriter writer = new EndianBinaryWriter(EndianBitConverter.Big, stream))
{
writer.Write(longVal_1);
writer.Write(longVal_2);
writer.Write(longVal);
writer.Write(intVal);
}
byte[] buffer = memory.ToArray();
// Use buffer
}
You would use the EndianBinaryReader for data going in the opposite direction.
This does of course have two fairly large drawbacks:
It's fiddly writing code to read and write each field.
It may be too slow to do it this way, depending on performance requirements.
Also have a look at this answers to this similar question: Marshalling a big-endian byte collection into a struct in order to pull out values
Given the example on the BitConverter that shows converting a uint, I would suspect not.

Read non-byte array from file without having to use a loop?

Is there a way to read binary data from file into an array like in C where I can pass a pointer of any type to the i/o functions? I am thinking of something like BinaryReader::ReadBytes(), but that returns a byte[] which I cannot cast to the desired array pointer type.
If you have a fixed size struct
[StructLayout(LayoutKind.Sequential, CharSet = CharSet.Ansi, Pack = 1)]
struct MyFixedStruct
{
//..
}
You can then read it in in one go using this:
public static T ReadStruct<T>(Stream stream)
{
byte[] buffer = new byte[Marshal.SizeOf(typeof(T))];
stream.Read(buffer, 0, Marshal.SizeOf(typeof(T)));
GCHandle handle = GCHandle.Alloc(buffer, GCHandleType.Pinned);
T typedStruct = (T)Marshal.PtrToStructure(handle.AddrOfPinnedObject(), typeof(T));
handle.Free();
return typedStruct;
}
This reads in a byte array covering the size of the struct and then marshals the byte array into the structure. You can use it like this:
MyFixedStruct fixedStruct = ReadStruct<MyFixedStruct>(stream);
The struct may include array types as long as the array length is specified, i.e:
[StructLayout(LayoutKind.Sequential, CharSet = CharSet.Ansi, Pack = 1)]
public struct MyFixedStruct
{
[MarshalAs(UnmanagedType.ByValArray, SizeConst = 5)]
public int[] someInts; // 5 int's
//..
};
Edit:
I see you just want to read in a short array - In this case just read in the byte array and use Buffer.BlockCopy() to convert to the array you want:
byte[] someBytes = ..
short[] someShorts = new short[someBytes.Length/2];
Buffer.BlockCopy(someBytes, 0, someShorts, 0, someBytes.Length);
This is quite efficient, equivalent to a memcpy in C++ under the hood. The only other overhead you have of course is that the original byte array will be allocated and later garbage collected. This approach would work for any other primitive array type as well.
How about storing a serialized array of your struct in the file? You can build the array of structs easily. Not sure how to stream through the file though, as you would do in C.
How about using a Stream Class as it provides a generic view for sequence of bytes.

How to read byte blocks into struct

I have this resource file which I need to process, wich packs a set of files.
First, the resource file lists all the files contained within, plus some other data, such as in this struct:
struct FileEntry{
byte Value1;
char Filename[12];
byte Value2;
byte FileOffset[3];
float whatever;
}
So I would need to read blocks exactly this size.
I am using the Read function from FileStream, but how can I specify the size of the struct?
I used:
int sizeToRead = Marshal.SizeOf(typeof(Header));
and then pass this value to Read, but then I can only read a set of byte[] which I do not know how to convert into the specified values (well I do know how to get the single byte values... but not the rest of them).
Also I need to specify an unsafe context which I don't know whether it's correct or not...
It seems to me that reading byte streams is tougher than I thought in .NET :)
Thanks!
Assuming this is C#, I wouldn't create a struct as a FileEntry type. I would replace char[20] with strings and use a BinaryReader - http://msdn.microsoft.com/en-us/library/system.io.binaryreader.aspx to read individual fields. You must read the data in the same order as it was written.
Something like:
class FileEntry {
byte Value1;
char[] Filename;
byte Value2;
byte[] FileOffset;
float whatever;
}
using (var reader = new BinaryReader(File.OpenRead("path"))) {
var entry = new FileEntry {
Value1 = reader.ReadByte(),
Filename = reader.ReadChars(12) // would replace this with string
FileOffset = reader.ReadBytes(3),
whatever = reader.ReadFloat()
};
}
If you insist having a struct, you should make your struct immutable and create a constructor with arguments for each of your field.
If you can use unsafe code:
unsafe struct FileEntry{
byte Value1;
fixed char Filename[12];
byte Value2;
fixed byte FileOffset[3];
float whatever;
}
public unsafe FileEntry Get(byte[] src)
{
fixed(byte* pb = &src[0])
{
return *(FileEntry*)pb;
}
}
The fixed keyword embeds the array in the struct. Since it is fixed, this can cause GC issues if you are constantly creating these and never letting them go. Keep in mind that the constant sizes are the n*sizeof(t). So the Filename[12] is allocating 24 bytes (each char is 2 bytes unicode) and FileOffset[3] is allocating 3 bytes. This matters if you're not dealing with unicode data on disk. I would recommend changing it to a byte[] and converting the struct to a usable class where you can convert the string.
If you can't use unsafe, you can do the whole BinaryReader approach:
public unsafe FileEntry Get(Stream src)
{
FileEntry fe = new FileEntry();
var br = new BinaryReader(src);
fe.Value1 = br.ReadByte();
...
}
The unsafe way is nearly instant, far faster, especially when you're converting a lot of structs at once. The question is do you want to use unsafe. My recommendation is only use the unsafe method if you absolutely need the performance boost.
Base on this article, only I have made it generic, this is how to marshal the data directly to the struct. Very useful on longer data types.
public static T RawDataToObject<T>(byte[] rawData) where T : struct
{
var pinnedRawData = GCHandle.Alloc(rawData,
GCHandleType.Pinned);
try
{
// Get the address of the data array
var pinnedRawDataPtr = pinnedRawData.AddrOfPinnedObject();
// overlay the data type on top of the raw data
return (T) Marshal.PtrToStructure(pinnedRawDataPtr, typeof(T));
}
finally
{
// must explicitly release
pinnedRawData.Free();
}
}
Example Usage:
[StructLayout(LayoutKind.Sequential)]
public struct FileEntry
{
public readonly byte Value1;
//you may need to play around with this one
[MarshalAs(UnmanagedType.ByValArray, SizeConst = 12)]
public readonly string Filename;
public readonly byte Value2;
[MarshalAs(UnmanagedType.ByValArray, SizeConst = 3)]
public readonly byte[] FileOffset;
public readonly float whatever;
}
private static void Main(string[] args)
{
byte[] data =;//from file stream or whatever;
//usage
FileEntry entry = RawDataToObject<FileEntry>(data);
}
Wrapping your FileStream with a BinaryReader will give you dedicated Read*() methods for primitive types:
http://msdn.microsoft.com/en-us/library/system.io.binaryreader.aspx
Out of my head, you could probably mark your struct with [StructLayout(LayoutKind.Sequential)] (to ensure proper representation in memory) and use a pointer in unsafe block to actually fill the struct C-style. Going unsafe is not recommended if you don't really need it (interop, heavy operations like image processing and so on) however.
Not a full answer (it's been covered I think), but a specific note on the filename:
The Char type is probably not a one-byte thing in C#, since .Net characters are unicode, meaning they support character values far beyond 255, so interpreting your filename data as Char[] array will give problems. So the first step is definitely to read that as Byte[12], not Char[12].
A straight conversion from byte array to char array is also not advised, though, since in binary indices like this, filenames that are shorter than the allowed 12 characters will probably be padded with '00' bytes, so a straight conversion will result in a string that's always 12 characters long and might end on these zero-characters.
However, simply trimming these zeroes off is not advised, since reading systems for such data usually simply read up to the first encountered zero, and the data behind that in the array might actually contain garbage if the writing system doesn't bother to specifically clear its buffer with zeroes before putting the string into it. It's something a lot of programs don't bother doing, since they assume the reading system will only interpret the string up to the first zero anyway.
So, assuming this is indeed such a typical zero-terminated (C-style) string, saved in a one-byte-per-character text encoding (like ASCII, DOS-437 or Win-1252), the second step is to cut off the string on the first zero. You can easily do this with Linq's TakeWhile function. Then the third and final step is to convert the resulting byte array to string with whatever that one-byte-per-character text encoding it's written with happens to be:
public String StringFromCStringArray(Byte[] readData, Encoding encoding)
{
return encoding.GetString(readData.TakeWhile(x => x != 0).ToArray());
}
As I said, the encoding will probably be something like pure ASCII, which can be accessed from Encoding.ASCII, standard US DOS encoding, which is Encoding.GetEncoding(437), or Windows-1252, the standard US / western Europe Windows text encoding, which you can retrieve with Encoding.GetEncoding("Windows-1252").

AccessViolationException when serializing a struct of arrays of structs?

I have a sequential struct that I'd like to serialize to a file, which seems trivial. However, this struct consists of, among other things, 2 arrays of other types of structs. The main struct is defined as follows...
[StructLayout(LayoutKind.Sequential)]
public struct ParentStruct
{
[MarshalAs(UnmanagedType.ByValTStr, SizeConst = 256)]
public const string prefix = "PRE";
public Int32 someInteger;
public DataLocater[] locater; //DataLocater is another struct
public Body[] body; //Body is another struct
};
I can create these structs exactly as intended. However, when trying to serialize with the following method (which seems popular online), I get an AccessViolationException:
public static byte[] RawSerialize(object structure)
{
int size = Marshal.SizeOf(structure);
IntPtr buffer = Marshal.AllocHGlobal(size);
Marshal.StructureToPtr(structure, buffer, true);
byte[] data = new byte[size];
Marshal.Copy(buffer, data, 0, size);
Marshal.FreeHGlobal(buffer);
return data;
}
I'm assuming this is because the structures don't define exactly how large the arrays are, so it cannot explicitly determine the size beforehand? It seems that since it cannot get that, it is not allocating the right amount of space for the structure and it ends up being too short when casting the structure to a pointer. I'm not sure on this. Why might this occur and what are possible alternatives?
Edit: The line throwing the error is
Marshal.StructureToPtr(structure, buffer, true);
Not possible because of the nested struct arrays. See When I try to use a structure containing an array of other structures, I get an exception. What's wrong?.
In C# it makes more sense to implement ISerializable and use the BinaryFormatter class to write the struct to disk.
ISerializable

Categories

Resources