How to read byte blocks into struct - c#

I have this resource file which I need to process, which packs a set of files.
First, the resource file lists all the files contained within, plus some other data, such as in this struct:
struct FileEntry{
byte Value1;
char Filename[12];
byte Value2;
byte FileOffset[3];
float whatever;
}
So I would need to read blocks exactly this size.
I am using the Read function from FileStream, but how can I specify the size of the struct?
I used:
int sizeToRead = Marshal.SizeOf(typeof(Header));
and then pass this value to Read, but then I can only read a set of byte[] which I do not know how to convert into the specified values (well I do know how to get the single byte values... but not the rest of them).
Also I need to specify an unsafe context which I don't know whether it's correct or not...
It seems to me that reading byte streams is tougher than I thought in .NET :)
Thanks!

Assuming this is C#, I wouldn't create a struct as a FileEntry type. I would replace char[12] with a string and use a BinaryReader (http://msdn.microsoft.com/en-us/library/system.io.binaryreader.aspx) to read individual fields. You must read the data in the same order as it was written.
Something like:
class FileEntry {
    public byte Value1;
    public char[] Filename;
    public byte Value2;
    public byte[] FileOffset;
    public float whatever;
}
using (var reader = new BinaryReader(File.OpenRead("path"))) {
    var entry = new FileEntry {
        Value1 = reader.ReadByte(),
        Filename = reader.ReadChars(12), // would replace this with string
        Value2 = reader.ReadByte(),
        FileOffset = reader.ReadBytes(3),
        whatever = reader.ReadSingle() // BinaryReader has ReadSingle, not ReadFloat
    };
}
If you insist on having a struct, you should make your struct immutable and create a constructor with an argument for each of your fields.
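A fuller sketch of the field-by-field approach, for illustration only. It assumes the record is exactly 21 bytes on disk with no padding, that the filename is a zero-terminated ASCII string, and that the 3-byte offset is little-endian; none of this is confirmed by the question. FileEntryRecord and its member names are hypothetical.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;

// Hypothetical managed view of one 21-byte entry.
public sealed class FileEntryRecord
{
    public byte Value1;
    public string Filename;
    public byte Value2;
    public int FileOffset;
    public float Whatever;

    public static FileEntryRecord Read(BinaryReader reader)
    {
        var entry = new FileEntryRecord();
        entry.Value1 = reader.ReadByte();

        // Read the name as raw bytes and stop at the first zero byte.
        byte[] name = reader.ReadBytes(12);
        entry.Filename = Encoding.ASCII.GetString(name.TakeWhile(b => b != 0).ToArray());

        entry.Value2 = reader.ReadByte();

        // Assumption: the 3-byte offset is stored least significant byte first.
        byte[] off = reader.ReadBytes(3);
        entry.FileOffset = off[0] | (off[1] << 8) | (off[2] << 16);

        // ReadSingle reads a 4-byte little-endian IEEE-754 float.
        entry.Whatever = reader.ReadSingle();
        return entry;
    }
}
```

Calling FileEntryRecord.Read in a loop over the same BinaryReader would walk the table entry by entry. ReadBytes can return fewer bytes than requested at the end of the stream, so real code should check the lengths.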

If you can use unsafe code:
unsafe struct FileEntry{
byte Value1;
fixed char Filename[12];
byte Value2;
fixed byte FileOffset[3];
float whatever;
}
public unsafe FileEntry Get(byte[] src)
{
fixed(byte* pb = &src[0])
{
return *(FileEntry*)pb;
}
}
The fixed keyword embeds the array in the struct. Since it is fixed, this can cause GC issues if you are constantly creating these and never letting them go. Keep in mind that the constant sizes are the n*sizeof(t). So the Filename[12] is allocating 24 bytes (each char is 2 bytes unicode) and FileOffset[3] is allocating 3 bytes. This matters if you're not dealing with unicode data on disk. I would recommend changing it to a byte[] and converting the struct to a usable class where you can convert the string.
If you can't use unsafe, you can do the whole BinaryReader approach:
public FileEntry Get(Stream src)
{
FileEntry fe = new FileEntry();
var br = new BinaryReader(src);
fe.Value1 = br.ReadByte();
...
}
The unsafe way is nearly instant, far faster, especially when you're converting a lot of structs at once. The question is do you want to use unsafe. My recommendation is only use the unsafe method if you absolutely need the performance boost.

Based on this article (only I have made it generic), this is how to marshal the data directly to the struct. Very useful on longer data types.
public static T RawDataToObject<T>(byte[] rawData) where T : struct
{
var pinnedRawData = GCHandle.Alloc(rawData,
GCHandleType.Pinned);
try
{
// Get the address of the data array
var pinnedRawDataPtr = pinnedRawData.AddrOfPinnedObject();
// overlay the data type on top of the raw data
return (T) Marshal.PtrToStructure(pinnedRawDataPtr, typeof(T));
}
finally
{
// must explicitly release
pinnedRawData.Free();
}
}
Example Usage:
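// Pack = 1 keeps the float from being padded out to a 4-byte boundary
[StructLayout(LayoutKind.Sequential, Pack = 1, CharSet = CharSet.Ansi)]
public struct FileEntry
{
    public readonly byte Value1;
    // you may need to play around with this one: a string field takes ByValTStr
    // (ByValArray is for arrays), and ByValTStr expects a zero-terminated string,
    // so a name that fills all 12 bytes may come back cut short
    [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 12)]
    public readonly string Filename;
    public readonly byte Value2;
    [MarshalAs(UnmanagedType.ByValArray, SizeConst = 3)]
    public readonly byte[] FileOffset;
    public readonly float whatever;
}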
private static void Main(string[] args)
{
byte[] data = new byte[Marshal.SizeOf(typeof(FileEntry))]; // fill from file stream or whatever
//usage
FileEntry entry = RawDataToObject<FileEntry>(data);
}

Wrapping your FileStream with a BinaryReader will give you dedicated Read*() methods for primitive types:
http://msdn.microsoft.com/en-us/library/system.io.binaryreader.aspx
Off the top of my head, you could probably mark your struct with [StructLayout(LayoutKind.Sequential)] (to ensure proper representation in memory) and use a pointer in an unsafe block to actually fill the struct C-style. However, going unsafe is not recommended if you don't really need it (interop, heavy operations like image processing and so on).

Not a full answer (it's been covered I think), but a specific note on the filename:
The Char type is not a one-byte type in C#: .NET characters are two-byte UTF-16 code units, meaning they support character values far beyond 255, so interpreting your filename data as a Char[] array will give problems. So the first step is definitely to read that as Byte[12], not Char[12].
A straight conversion from byte array to char array is also not advised, though, since in binary indices like this, filenames that are shorter than the allowed 12 characters will probably be padded with '00' bytes, so a straight conversion will result in a string that's always 12 characters long and might end on these zero-characters.
However, simply trimming these zeroes off is not advised, since reading systems for such data usually simply read up to the first encountered zero, and the data behind that in the array might actually contain garbage if the writing system doesn't bother to specifically clear its buffer with zeroes before putting the string into it. It's something a lot of programs don't bother doing, since they assume the reading system will only interpret the string up to the first zero anyway.
So, assuming this is indeed such a typical zero-terminated (C-style) string, saved in a one-byte-per-character text encoding (like ASCII, DOS-437 or Win-1252), the second step is to cut off the string on the first zero. You can easily do this with Linq's TakeWhile function. Then the third and final step is to convert the resulting byte array to string with whatever that one-byte-per-character text encoding it's written with happens to be:
public String StringFromCStringArray(Byte[] readData, Encoding encoding)
{
return encoding.GetString(readData.TakeWhile(x => x != 0).ToArray());
}
As I said, the encoding will probably be something like pure ASCII, which can be accessed from Encoding.ASCII, standard US DOS encoding, which is Encoding.GetEncoding(437), or Windows-1252, the standard US / western Europe Windows text encoding, which you can retrieve with Encoding.GetEncoding("Windows-1252").
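To illustrate why cutting at the first zero differs from trimming trailing zeroes, here is a small sketch. The 12-byte buffer is made up for the example: the name "A.TXT", a terminating zero, then leftover garbage. CStringDemo and TrimOnly are hypothetical names.

```csharp
using System;
using System.Linq;
using System.Text;

public static class CStringDemo
{
    // Same helper as above: keep only the bytes before the first zero.
    public static string StringFromCStringArray(byte[] readData, Encoding encoding)
    {
        return encoding.GetString(readData.TakeWhile(x => x != 0).ToArray());
    }

    // The approach advised against: decode everything, then trim trailing zeroes.
    public static string TrimOnly(byte[] readData, Encoding encoding)
    {
        return encoding.GetString(readData).TrimEnd('\0');
    }

    // Made-up field contents: "A.TXT", 0x00, then garbage "XY", 0x00, 0x00, "Z", 0x00.
    public static readonly byte[] SampleField =
    {
        0x41, 0x2E, 0x54, 0x58, 0x54, 0x00,
        0x58, 0x59, 0x00, 0x00, 0x5A, 0x00
    };
}
```

With Encoding.ASCII, StringFromCStringArray(SampleField, ...) yields "A.TXT", while TrimOnly keeps the garbage and returns an 11-character string with embedded zero characters. On .NET Core and later, Encoding.GetEncoding(437) and Encoding.GetEncoding("Windows-1252") only work after registering the code pages provider (Encoding.RegisterProvider(CodePagesEncodingProvider.Instance)).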

Related

Marshal C# struct containing variable length array of struct (also defined in C#) to byte array

There have been variants on this question, but the ones I found were "I have a struct in a C DLL". In this case, I have 100% C# code. I have a struct that contains a variable length array of structs that I am trying to marshal to a tightly packed byte array. I'm using structs and Marshal.StructureToPtr because I need a tightly packed array without all of the metadata that BinaryReader/Writer use to help it serialize and deserialize.
Here is the struct definition:
[StructLayout(LayoutKind.Sequential, Pack = 1)]
public struct CharacterSelect_Struct
{
public uint CharCount;
public uint TotalChars;
[MarshalAs(UnmanagedType.ByValArray, ArraySubType = UnmanagedType.Struct)]
public CharacterSelectEntry_Struct[] Entries;
public static CharacterSelect_Struct Initialize(uint totalChars, uint charCount)
{
return new CharacterSelect_Struct
{
CharCount = charCount,
TotalChars = totalChars,
Entries = 0 != charCount ? new CharacterSelectEntry_Struct[charCount] : null
};
}
}
This works great if Entries contains 1 element. If it contains 2 or more I still only ever get the contents of the first element.
Is there a way to serialize the above and get all of the contents of Entries, or do I have to manually serialize the above, and if so, other than doing, "Serialize each Entry separately and append to a list, then serialize the outer struct"?
I went with a struct and Marshal.StructureToPtr approach because it seemed easier than going with classes and manually writing all of the serialization and deserialization using BitConverter, but I'm wondering if there's a simpler way to give me what I need.
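One way the "serialize each entry separately" idea could look, as a rough sketch rather than a confirmed solution. A ByValArray field without SizeConst is marshaled as a single element, which would explain the behaviour described. The entry layout below (EntrySketch) is invented, since the question does not show CharacterSelectEntry_Struct, and the header is assumed to be CharCount followed by TotalChars in the machine's byte order.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical entry layout; the real CharacterSelectEntry_Struct is not shown.
[StructLayout(LayoutKind.Sequential, Pack = 1)]
public struct EntrySketch
{
    public uint Id;
    public byte Level;
}

public static class CharSelectSerializer
{
    // Writes CharCount, TotalChars, then every entry back to back.
    public static byte[] Serialize(uint totalChars, EntrySketch[] entries)
    {
        int entrySize = Marshal.SizeOf(typeof(EntrySketch));
        int count = entries == null ? 0 : entries.Length;
        byte[] result = new byte[8 + entrySize * count];

        BitConverter.GetBytes((uint)count).CopyTo(result, 0);
        BitConverter.GetBytes(totalChars).CopyTo(result, 4);

        // Reuse one scratch block of unmanaged memory for all entries.
        IntPtr scratch = Marshal.AllocHGlobal(entrySize);
        try
        {
            for (int i = 0; i < count; i++)
            {
                Marshal.StructureToPtr(entries[i], scratch, false);
                Marshal.Copy(scratch, result, 8 + i * entrySize, entrySize);
            }
        }
        finally
        {
            Marshal.FreeHGlobal(scratch);
        }
        return result;
    }
}
```

The outer struct then no longer needs the array field for marshaling purposes; the element count travels in the header instead.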

Strange unmarshalling behavior with union in C#

I want to export a C-like union into a byte array, like this :
[StructLayout(LayoutKind.Explicit)]
struct my_struct
{
[FieldOffset(0)]
public UInt32 my_uint;
[FieldOffset(0)]
public bool other_field;
}
public static void Main()
{
var test = new my_struct { my_uint = 0xDEADBEEF };
byte[] data = new byte[Marshal.SizeOf(test)];
IntPtr buffer = Marshal.AllocHGlobal(data.Length);
Marshal.StructureToPtr(test, buffer, false);
Marshal.Copy(buffer, data, 0, data.Length);
Marshal.FreeHGlobal(buffer);
foreach (byte b in data)
{
Console.Write("{0:X2} ", b);
}
Console.WriteLine();
}
The output we get (https://dotnetfiddle.net/gb1wRf) is 01 00 00 00 instead of the expected EF BE AD DE.
Now, what do we get if we change the other_field type to byte (for instance) ?
Oddly, we get the output we wanted in the first place, EF BE AD DE (https://dotnetfiddle.net/DnXyMP)
Moreover, if we swap the original two fields, we again get the same output we wanted (https://dotnetfiddle.net/ziSQ5W)
Why is this happening? Why would the order of the fields matter ? Is there a better (reliable) solution for doing the same thing ?
This is an inevitable side-effect of the way a structure is marshaled. Starting point is that the structure value is not blittable, a side-effect of it containing a bool. Which takes 1 byte of storage in the managed struct but 4 bytes in the marshaled struct (UnmanagedType.Bool).
So the struct value cannot just be copied in one fell swoop, the marshaller needs to convert each individual member. So the my_uint is first, producing 4 bytes. The other_field is next, also producing 4 bytes at the exact same address. Which overwrites everything that my_uint produced.
The bool type is an oddity in general, it never produces a blittable struct. Not even when you apply [MarshalAs(UnmanagedType.U1)]. Which in itself has an interesting effect on your test, you'll now see that the 3 upper bytes produced by my_uint are preserved. But the result is still junk since the members are still converted one-by-one, now producing a single byte of value 0x01 at offset 0.
You can easily get what you want by declaring it as a byte instead, now the struct is blittable:
[StructLayout(LayoutKind.Explicit)]
struct my_struct {
[FieldOffset(0)]
public UInt32 my_uint;
[FieldOffset(0)]
private byte _other_field;
public bool other_field {
get { return _other_field != 0; }
set { _other_field = (byte)(value ? 1 : 0); }
}
}
I admit that I don't have an authoritative answer for why Marshal.StructureToPtr() behaves this way, other than that it is clearly doing more than just copying bytes. Rather, it must be interpreting the struct itself, marshaling each field individually to the destination via the normal rules for interpreting that field. Since bool is defined to only ever be one of two values, the non-zero value gets mapped to true, which marshals to raw bytes as 0x00000001.
Note that if you really just want the raw bytes from the struct value, you can do the copying yourself instead of going through the Marshal class. For example:
var test = new my_struct { my_uint = 0xDEADBEEF };
byte[] data = new byte[Marshal.SizeOf(test)];
unsafe
{
byte* pb = (byte*)&test;
for (int i = 0; i < data.Length; i++)
{
data[i] = pb[i];
}
}
Console.WriteLine(string.Join(" ", data.Select(b => b.ToString("X2"))));
Of course, for that to work you will need to enable unsafe code for your project. You can either do that for the project in question, or build the above into a separate helper assembly where unsafe is less risky (i.e. where you don't mind enabling it for other code, and/or don't care about the assembly being verifiable, etc.).
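If only the bytes of the integer member are of interest, a sketch without unsafe code or the Marshal class is possible using BitConverter. This is a different technique from the pointer copy above and covers only the single field, not the whole union. RawBytesDemo and HexOf are hypothetical names.

```csharp
using System;
using System.Linq;

public static class RawBytesDemo
{
    // Formats the in-memory bytes of a uint as space-separated hex pairs.
    // The byte order follows the machine (see BitConverter.IsLittleEndian).
    public static string HexOf(uint value)
    {
        byte[] data = BitConverter.GetBytes(value);
        return string.Join(" ", data.Select(b => b.ToString("X2")));
    }
}
```

On a little-endian machine, RawBytesDemo.HexOf(0xDEADBEEF) gives "EF BE AD DE", the output the question expected.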

Marshaling struct with dynamic array size (incorrect size) [duplicate]

How do I marshal this C++ type?
The ABS_DATA structure is used to associate an arbitrarily long data block with the length information. The declared length of the Data array is 1, but the actual length is given by the Length member.
typedef struct abs_data {
ABS_DWORD Length;
ABS_BYTE Data[ABS_VARLEN];
} ABS_DATA;
I tried the following code, but it's not working. The data variable is always empty and I'm sure it has data in there.
[System.Runtime.InteropServices.StructLayoutAttribute(System.Runtime.InteropServices.LayoutKind.Sequential, CharSet = System.Runtime.InteropServices.CharSet.Ansi)]
public struct abs_data
{
/// ABS_DWORD->unsigned int
public uint Length;
/// ABS_BYTE[1]
[System.Runtime.InteropServices.MarshalAsAttribute(System.Runtime.InteropServices.UnmanagedType.ByValTStr, SizeConst = 1)]
public string Data;
}
Old question, but I recently had to do this myself and all the existing answers are poor, so...
The best solution for marshaling a variable-length array in a struct is to use a custom marshaler. This lets you control the code that the runtime uses to convert between managed and unmanaged data. Unfortunately, custom marshaling is poorly-documented and has a few bizarre limitations. I'll cover those quickly, then go over the solution.
Annoyingly, you can't use custom marshaling on an array element of a struct or class. There's no documented or logical reason for this limitation, and the compiler won't complain, but you'll get an exception at runtime. Also, there's a function that custom marshalers must implement, int GetNativeDataSize(), which is obviously impossible to implement accurately (it doesn't pass you an instance of the object to ask its size, so you can only go off the type, which is of course variable size!) Fortunately, this function doesn't matter. I've never seen it get called, and the custom marshaler works fine even if it returns a bogus value (one MSDN example has it return -1).
First of all, here's what I think your native prototype might look like (I'm using P/Invoke here, but it works for COM too):
// Unmanaged C/C++ code prototype (guess)
//void DoThing (ABS_DATA *pData);
// Guess at your managed call with the "marshal one-byte ByValArray" version
//[DllImport("libname.dll")] public extern void DoThing (ref abs_data pData);
Here's the naïve version of how you might have used a custom marshaler (which really ought to have worked). I'll get to the marshaler itself in a bit...
[StructLayout(LayoutKind.Sequential)]
public struct abs_data
{
// Don't need the length as a separate field; managed arrays know it.
[MarshalAs(UnmanagedType.CustomMarshaler, MarshalTypeRef=typeof(ArrayMarshaler<byte>))]
public byte[] Data;
}
// Now you can just pass the struct but it takes arbitrary sizes!
[DllImport("libname.dll")] public static extern void DoThing (ref abs_data pData);
Unfortunately, at runtime, you apparently can't marshal arrays inside data structures as anything except SafeArray or ByValArray. SafeArrays are counted, but they look nothing like the (extremely common) format that you're looking for here. So that won't work. ByValArray, of course, requires that the length be known at compile time, so that doesn't work either (as you ran into). Bizarrely, though, you can use custom marshaling on array parameters. This is annoying because you have to put the MarshalAsAttribute on every parameter that uses this type, instead of just putting it on one field and having that apply everywhere you use the type containing that field, but c'est la vie. It looks like this:
[StructLayout(LayoutKind.Sequential)]
public struct abs_data
{
// Don't need the length as a separate field; managed arrays know it.
// This isn't an array anymore; we pass an array of this instead.
public byte Data;
}
// Now you pass an arbitrary-sized array of the struct
[DllImport("libname.dll")] public static extern void DoThing (
// Have to put this huge stupid attribute on every parameter of this type
[MarshalAs(UnmanagedType.CustomMarshaler, MarshalTypeRef=typeof(ArrayMarshaler<abs_data>))]
// Don't need to use "ref" anymore; arrays are ref types and pass as pointer-to
abs_data[] pData);
In that example, I preserved the abs_data type, in case you want to do something special with it (constructors, static functions, properties, inheritance, whatever). If your array elements consisted of a complex type, you would modify the struct to represent that complex type. However, in this case, abs_data is basically just a renamed byte - it's not even "wrapping" the byte; as far as the native code is concerned it's more like a typedef - so you can just pass an array of bytes and skip the struct entirely:
// Actually, you can just pass an arbitrary-length byte array!
[DllImport("libname.dll")] public static extern void DoThing (
// Have to put this huge stupid attribute on every parameter of this type
[MarshalAs(UnmanagedType.CustomMarshaler, MarshalTypeRef=typeof(ArrayMarshaler<byte>))]
byte[] pData);
OK, so now you can see how to declare the array element type (if needed), and how to pass the array to an unmanaged function. However, we still need that custom marshaler. You should read "Implementing the ICustomMarshaler Interface" but I'll cover this here, with inline comments. Note that I use some shorthand conventions (like Marshal.SizeOf<T>()) that require .NET 4.5.1 or higher.
// The class that does the marshaling. Making it generic is not required, but
// will make it easier to use the same custom marshaler for multiple array types.
public class ArrayMarshaler<T> : ICustomMarshaler
{
// All custom marshalers require a static factory method with this signature.
public static ICustomMarshaler GetInstance (String cookie)
{
return new ArrayMarshaler<T>();
}
// This is the function that builds the managed type - in this case, the managed
// array - from a pointer. You can just return null here if only sending the
// array as an in-parameter.
public Object MarshalNativeToManaged (IntPtr pNativeData)
{
// First, sanity check...
if (IntPtr.Zero == pNativeData) return null;
// Start by reading the size of the array ("Length" from your ABS_DATA struct)
int length = Marshal.ReadInt32(pNativeData);
// Create the managed array that will be returned
T[] array = new T[length];
// For efficiency, only compute the element size once
int elSiz = Marshal.SizeOf<T>();
// Populate the array
for (int i = 0; i < length; i++)
{
array[i] = Marshal.PtrToStructure<T>(pNativeData + sizeof(int) + (elSiz * i));
}
// Alternate method, for arrays of primitive types only:
// Marshal.Copy(pNativeData + sizeof(int), array, 0, length);
return array;
}
// This is the function that marshals your managed array to unmanaged memory.
// If you only ever marshal the array out, not in, you can return IntPtr.Zero
public IntPtr MarshalManagedToNative (Object ManagedObj)
{
if (null == ManagedObj) return IntPtr.Zero;
T[] array = (T[])ManagedObj;
int elSiz = Marshal.SizeOf<T>();
// Get the total size of unmanaged memory that is needed (length + elements)
int size = sizeof(int) + (elSiz * array.Length);
// Allocate unmanaged space. For COM, use Marshal.AllocCoTaskMem instead.
IntPtr ptr = Marshal.AllocHGlobal(size);
// Write the "Length" field first
Marshal.WriteInt32(ptr, array.Length);
// Write the array data
for (int i = 0; i < array.Length; i++)
{ // Newly-allocated space has no existing object, so the last param is false
Marshal.StructureToPtr<T>(array[i], ptr + sizeof(int) + (elSiz * i), false);
}
// If you're only using arrays of primitive types, you could use this instead:
//Marshal.Copy(array, 0, ptr + sizeof(int), array.Length);
return ptr;
}
// This function is called after completing the call that required marshaling to
// unmanaged memory. You should use it to free any unmanaged memory you allocated.
// If you never consume unmanaged memory or other resources, do nothing here.
public void CleanUpNativeData (IntPtr pNativeData)
{
// Free the unmanaged memory. Use Marshal.FreeCoTaskMem if using COM.
Marshal.FreeHGlobal(pNativeData);
}
// If, after marshaling from unmanaged to managed, you have anything that needs
// to be taken care of when you're done with the object, put it here. Garbage
// collection will free the managed object, so I've left this function empty.
public void CleanUpManagedData (Object ManagedObj)
{ }
// This function is a lie. It looks like it should be impossible to get the right
// value - the whole problem is that the size of each array is variable!
// - but in practice the runtime doesn't rely on this and may not even call it.
// The MSDN example returns -1; I'll try to be a little more realistic.
public int GetNativeDataSize ()
{
return sizeof(int) + Marshal.SizeOf<T>();
}
}
Whew, that was long! Well, there you have it. I hope people see this, because there's a lot of bad answers and misunderstanding out there...
It is not possible to marshal structs containing variable-length arrays (but it is possible to marshal variable-length arrays as function parameters). You will have to read your data manually:
IntPtr nativeData = ... ;
var length = Marshal.ReadInt32 (nativeData) ; // Marshal has no ReadUInt32
var bytes = new byte[length] ;
Marshal.Copy (new IntPtr ((long)nativeData + 4), bytes, 0, length) ;
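The manual read can be tried end to end with a small round trip. This is only a sketch: the unmanaged block is built by the example itself to stand in for whatever the native library would return, and AbsDataReader, Read and RoundTrip are hypothetical names. It assumes a 4-byte length followed directly by the data bytes.

```csharp
using System;
using System.Runtime.InteropServices;

public static class AbsDataReader
{
    // Reads a length-prefixed block: a 4-byte length, then that many bytes.
    public static byte[] Read(IntPtr nativeData)
    {
        int length = Marshal.ReadInt32(nativeData);
        byte[] bytes = new byte[length];
        Marshal.Copy(IntPtr.Add(nativeData, 4), bytes, 0, length);
        return bytes;
    }

    // Builds such a block in unmanaged memory, only to exercise Read.
    public static byte[] RoundTrip(byte[] payload)
    {
        IntPtr block = Marshal.AllocHGlobal(4 + payload.Length);
        try
        {
            Marshal.WriteInt32(block, payload.Length);
            Marshal.Copy(payload, 0, IntPtr.Add(block, 4), payload.Length);
            return Read(block);
        }
        finally
        {
            Marshal.FreeHGlobal(block);
        }
    }
}
```

Real code should validate the length before allocating, since a corrupt or negative value read from native memory would otherwise throw or over-read.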
If the data being saved isn't a string, you don't have to store it in a string. I usually do not marshal to a string unless the original data type was a char*. Otherwise a byte[] should do.
Try:
[MarshalAs(UnmanagedType.ByValArray, SizeConst=[whatever your size is])]
byte[] Data;
If you need to convert this to a string later, use:
System.Text.Encoding.UTF8.GetString(your byte array here).
Obviously, you need to vary the encoding to what you need, though UTF-8 usually is sufficient.
I see the problem now, you have to marshal a VARIABLE length array. The MarshalAs does not allow this and the array will have to be sent by reference.
If the array length is variable, your byte[] needs to be an IntPtr, so you would use,
IntPtr Data;
Instead of
[MarshalAs(UnmanagedType.ByValArray, SizeConst=[whatever your size is])]
byte[] Data;
You can then use the Marshal class to access the underlying data.
Something like:
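int length = (int)yourABSObject.Length;
byte[] buffer = new byte[length];
// Copy from the unmanaged pointer into the managed buffer
Marshal.Copy(yourABSObject.Data, buffer, 0, length);
You may need to clean up your memory when you are finished to avoid a leak. The GC does not track unmanaged memory, so it will not release the block when yourABSObject goes out of scope; whoever allocated it has to free it. If it was allocated with Marshal.AllocHGlobal, here is the cleanup code: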
Marshal.FreeHGlobal(yourABSObject.Data);
You are trying to marshal something that is a byte[ABS_VARLEN] as if it were a string of length 1. You'll need to figure out what the ABS_VARLEN constant is and marshal the array as:
[MarshalAs(UnmanagedType.ByValArray, SizeConst = 1024)]
public byte[] Data;
(The 1024 there is a placeholder; fill in whatever the actual value of ABS_VARLEN is. An array embedded in a struct has to be marshaled as ByValArray; LPArray is only valid on parameters.)
In my opinion, it's simpler and more efficient to pin the array and take its address.
Assuming you need to pass abs_data to myNativeFunction(abs_data*):
public struct abs_data
{
public uint Length;
public IntPtr Data;
}
[DllImport("myDll.dll")]
static extern void myNativeFunction(ref abs_data data);
void CallNativeFunc(byte[] data)
{
    GCHandle pin = GCHandle.Alloc(data, GCHandleType.Pinned);
    try
    {
        abs_data tmp;
        tmp.Length = (uint)data.Length; // Length is uint, data.Length is int
        tmp.Data = pin.AddrOfPinnedObject();
        myNativeFunction(ref tmp);
    }
    finally
    {
        pin.Free(); // unpin even if the native call throws
    }
}

Deserialize an array of struct

Once again I'm receiving structs via UDP from a C++ program.
Now I ported the structs to C#, Example:
[Serializable]
struct sample
{
public int @in;
public byte[] arr;
public int[] arr2;
public float fl;
}
Ok so how does the Deserializer know when one array ends and the other begins?
Can I somehow specify how big the array is?
I don't want to use fixed, since this makes my code unsafe, and I also can't use a Constructor since structs are not allowed to contain constructors without parameters.
Any suggestions?
//edit:
the arrays are known to be 32 and 4 long.
the problem is that I don't know how to pass this information to the deserialiser
the sender is C++ and works like this:
char* pr = (char*)&sample;
int i=0;
while (i<sizeof(sample))
{
udp.send(*(pr+i));
i++;
}
Now that you have told us that the arrays are of pre-defined length, the following statement becomes clearer:
I don't know how to pass this information to the deserialiser
In fact, it becomes moot. There is no pre-defined serializer that is going to help you here. You have two options:
A: write your own serializer, and process the data now that you know the format - perhaps using BinaryReader:
using(var reader = new BinaryReader(source)) {
    int @in = reader.ReadInt32(); // "in" is a C# keyword, hence the @ prefix
    byte[] arr = reader.ReadBytes(32);
    int[] arr2 = new int[4];
    for(int i = 0 ; i < 4 ; i++) arr2[i] = reader.ReadInt32();
    float fl = reader.ReadSingle();
    // var obj = something involving the values above
}
B: buffer 56 bytes, and use really nasty unsafe / fixed / pointer-banging code
I strongly suggest the first. In particular, this will also allow you to address endianness if required.
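Option A, applied to one received datagram, might look like the sketch below. It assumes the C++ struct has no padding (4 + 32 + 16 + 4 = 56 bytes), that the sender is little-endian, and that int and float are 4 bytes there; none of this is confirmed in the question. SampleRecord and Parse are hypothetical names.

```csharp
using System;
using System.IO;

// Hypothetical managed counterpart of the C++ "sample" struct.
public sealed class SampleRecord
{
    public int In;
    public byte[] Arr;
    public int[] Arr2;
    public float Fl;

    public static SampleRecord Parse(byte[] datagram)
    {
        if (datagram == null || datagram.Length < 56)
            throw new ArgumentException("Expected at least 56 bytes.", "datagram");

        // BinaryReader always reads multi-byte values as little-endian.
        using (var reader = new BinaryReader(new MemoryStream(datagram)))
        {
            var s = new SampleRecord();
            s.In = reader.ReadInt32();
            s.Arr = reader.ReadBytes(32);
            s.Arr2 = new int[4];
            for (int i = 0; i < 4; i++) s.Arr2[i] = reader.ReadInt32();
            s.Fl = reader.ReadSingle();
            return s;
        }
    }
}
```

If the sender turned out to be big-endian, each ReadInt32 result could be passed through IPAddress.NetworkToHostOrder, which is the kind of fix the pointer-cast approach makes awkward.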
IN THE NAME OF EVERYTHING SACRED TO YOU, DO NOT DO THIS:
using System;
using System.Runtime.InteropServices;
[StructLayout(LayoutKind.Explicit)]
unsafe struct sample
{
[FieldOffset(0)] public int @in;
[FieldOffset(4)] public fixed byte arr[32];
[FieldOffset(36)] public fixed int arr2[4];
[FieldOffset(52)] public float fl;
}
static class Program
{
unsafe static void Main()
{
byte[] buffer = new byte[56];
new Random().NextBytes(buffer); // some data...
sample result;
fixed(byte* tmp = buffer)
{
sample* ptr = (sample*) tmp;
result = ptr[0];
}
Console.WriteLine(result.@in);
Console.WriteLine(result.fl);
}
}
For larger buffers, you can treat ptr as an unsafe array of multiple sample, accessed by index:
int @in = ptr[i].@in;
(etc)
But honestly... there are so many things "evil" with that, I honestly don't know where to begin... just... unless you know absolutely what every line in there is doing, have done it before, and understand all the traps... DON'T EVEN THINK ABOUT IT
It depends on the format used to pass the structure over the wire.
If say it's json, then each field will have a key and the array will be surrounded by [].
If say it's xml, then you would expect an arr node with child nodes.
If it's some arbitrary format, you need to know the format.
Deserializers have some default behavior but if the passed data is not in default format, you need to tell them exactly how to deserialize.
And how is the raw data documented? I would expect there to be something to tell you here, for example, I might expect it to say the format is (purely an example)
4 bytes NBO Int32 (in)
4 bytes NBO Int32 (length of arr)
len bytes (arr)
4 bytes NBO (length of arr2)
4 * len bytes (arr2, each in NBO Int32)
4 bytes IEEE-754 (fl)
You need to know the format.
Edit: if the C++ arrays are of a known fixed length, then you simply need to know those lengths in advance.

Read binary file into a struct

I'm trying to read binary data using C#. I have all the information about the layout of the data in the files I want to read. I'm able to read the data "chunk by chunk", i.e. getting the first 40 bytes of data converting it to a string, get the next 40 bytes.
Since there are at least three slightly different version of the data, I would like to read the data directly into a struct. It just feels so much more right than by reading it "line by line".
I have tried the following approach but to no avail:
StructType aStruct;
int count = Marshal.SizeOf(typeof(StructType));
byte[] readBuffer = new byte[count];
BinaryReader reader = new BinaryReader(stream);
readBuffer = reader.ReadBytes(count);
GCHandle handle = GCHandle.Alloc(readBuffer, GCHandleType.Pinned);
aStruct = (StructType) Marshal.PtrToStructure(handle.AddrOfPinnedObject(), typeof(StructType));
handle.Free();
The stream is an opened FileStream from which I have begun to read. I get an AccessViolationException when using Marshal.PtrToStructure.
The stream contains more information than I'm trying to read since I'm not interested in data at the end of the file.
The struct is defined like:
[StructLayout(LayoutKind.Explicit)]
struct StructType
{
[FieldOffset(0)]
public string FileDate;
[FieldOffset(8)]
public string FileTime;
[FieldOffset(16)]
public int Id1;
[FieldOffset(20)]
public string Id2;
}
The example code is changed from the original to make this question shorter.
How would I read binary data from a file into a struct?
The problem is the strings in your struct. I found that marshaling types like byte/short/int is not a problem; but when you need to marshal into a complex type such as a string, you need your struct to explicitly mimic an unmanaged type. You can do this with the MarshalAs attrib.
For your example, the following should work:
[StructLayout(LayoutKind.Explicit)]
struct StructType
{
[FieldOffset(0)]
[MarshalAs(UnmanagedType.ByValTStr, SizeConst = 8)]
public string FileDate;
[FieldOffset(8)]
[MarshalAs(UnmanagedType.ByValTStr, SizeConst = 8)]
public string FileTime;
[FieldOffset(16)]
public int Id1;
[FieldOffset(20)]
[MarshalAs(UnmanagedType.ByValTStr, SizeConst = 66)] //Or however long Id2 is.
public string Id2;
}
Here is what I am using. This worked successfully for me for reading the Portable Executable format. It's a generic function, so T is your struct type.
public static T ByteToType<T>(BinaryReader reader)
{
byte[] bytes = reader.ReadBytes(Marshal.SizeOf(typeof(T)));
GCHandle handle = GCHandle.Alloc(bytes, GCHandleType.Pinned);
T theStructure = (T)Marshal.PtrToStructure(handle.AddrOfPinnedObject(), typeof(T));
handle.Free();
return theStructure;
}
As Ronnie said, I'd use BinaryReader and read each field individually. I can't find the link to the article with this info, but it's been observed that using BinaryReader to read each individual field can be faster than Marshal.PtrToStructure if the struct contains fewer than 30-40 or so fields. I'll post the link to the article when I find it.
The article's link is at: http://www.codeproject.com/Articles/10750/Fast-Binary-File-Reading-with-C
When marshaling an array of structs, PtrToStructure gains the upper hand more quickly, because you can think of the field count as fields * array length.
I don't see any problem with your code.
Just off the top of my head, what if you try to do it manually? Does it work?
BinaryReader reader = new BinaryReader(stream);
StructType o = new StructType();
o.FileDate = Encoding.ASCII.GetString(reader.ReadBytes(8));
o.FileTime = Encoding.ASCII.GetString(reader.ReadBytes(8));
...
...
...
also try
StructType o = new StructType();
byte[] buffer = new byte[Marshal.SizeOf(typeof(StructType))];
GCHandle handle = GCHandle.Alloc(buffer, GCHandleType.Pinned);
Marshal.StructureToPtr(o, handle.AddrOfPinnedObject(), false);
handle.Free();
then use buffer[] in your BinaryReader instead of reading data from FileStream to see whether you still get AccessViolation exception.
"I had no luck using the BinaryFormatter, I guess I have to have a complete struct that matches the content of the file exactly."
That makes sense, BinaryFormatter has its own data format, completely incompatible with yours.
I had no luck using the BinaryFormatter, I guess I have to have a complete struct that matches the content of the file exactly. I realised that in the end I wasn't interested in very much of the file content anyway so I went with the solution of reading part of stream into a bytebuffer and then converting it using
Encoding.ASCII.GetString()
for strings and
BitConverter.ToInt32()
for the integers.
I will need to be able to parse more of the file later on but for this version I got away with just a couple of lines of code.
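A sketch of that buffer-based approach, using the offsets from the StructType definition in the question (FileDate at 0, FileTime at 8, Id1 at 16). It assumes the text fields are ASCII and that Id1 is stored in the machine's byte order; HeaderSketch and its method names are hypothetical.

```csharp
using System;
using System.Text;

public static class HeaderSketch
{
    // Decodes a fixed-width ASCII field from the buffer.
    public static string ReadText(byte[] buffer, int offset, int count)
    {
        return Encoding.ASCII.GetString(buffer, offset, count);
    }

    // Reads the 4-byte integer that follows the two 8-byte text fields.
    // BitConverter uses the machine's byte order.
    public static int ReadId1(byte[] buffer)
    {
        return BitConverter.ToInt32(buffer, 16);
    }
}
```

With the first 20 bytes of the file in a buffer, ReadText(buffer, 0, 8) and ReadText(buffer, 8, 8) give the date and time strings, and ReadId1(buffer) gives the integer.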
Try this:
using (FileStream stream = new FileStream(fileName, FileMode.Open))
{
BinaryFormatter formatter = new BinaryFormatter();
StructType aStruct = (StructType)formatter.Deserialize(stream);
}
Reading straight into structs is evil - many a C program has fallen over because of different byte orderings, different compiler implementations of fields, packing, word size.......
You are best off serialising and deserialising byte by byte. Use the built-in stuff if you want or just get used to BinaryReader.
I had structure:
[StructLayout(LayoutKind.Explicit, Size = 21)]
public struct RecordStruct
{
    [FieldOffset(0)]
    public double Var1;
    [FieldOffset(8)]
    public byte var2;
    [FieldOffset(9)]
    [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 12)]
    public string String1;
}
and I received "incorrectly aligned or overlapped by non-object".
Based on that I found:
https://social.msdn.microsoft.com/Forums/vstudio/en-US/2f9ffce5-4c64-4ea7-a994-06b372b28c39/strange-issue-with-layoutkindexplicit?forum=clr
OK. I think I understand what's going on here. It seems like the
problem is related to the fact that the array type (which is an object
type) must be stored at a 4-byte boundary in memory. However, what
you're really trying to do is serialize the 6 bytes separately.
I think the problem is the mix between FieldOffset and serialization
rules. I'm thinking that structlayout.sequential may work for you,
since it doesn't actually modify the in-memory representation of the
structure. I think FieldOffset is actually modifying the in-memory
layout of the type. This causes problems because the .NET framework
requires object references to be aligned on appropriate boundaries (it
seems).
So my struct was defined as explicit with:
[StructLayout(LayoutKind.Explicit, Size = 21)]
and thus my fields had specified
[FieldOffset(<offset_number>)]
but when you change your struct to Sequential, you can get rid of those offsets and the error will disappear. Something like:
[StructLayout(LayoutKind.Sequential, Size = 21)]
public struct RecordStruct
{
public double Var1;
public byte var2;
[MarshalAs(UnmanagedType.ByValTStr, SizeConst = 12)]
public string String1;
}
