Reading binary file data into List of Structs - c#

I have a simple procedure to write a list of library books (of type TBook) to a binary file as follows:
static void SaveToFile(List<TBook> lib)
{
FileStream currentFile;
BinaryWriter writerToFile;
currentFile = new FileStream("MyLibrary.bin", FileMode.Create);
writerToFile = new BinaryWriter(currentFile);
foreach (TBook book in lib)
{
writerToFile.Write(book.Title);
writerToFile.Write(book.Author);
writerToFile.Write(book.Genre);
writerToFile.Write(book.BookID);
}
writerToFile.Close();
currentFile.Close();
}
However, when trying to read the binary file and load its contents into a list, I get an error:
An unhandled exception of type 'System.IO.EndOfStreamException' occurred in mscorlib.dll
Additional information: Unable to read beyond the end of the stream.
Here is my subroutine that attempts to read the Binary File back into a Struct again:
static List<TBook> LoadDataFromFile (List<TBook>library)
{
FileStream currentFile;
BinaryReader readerFromFile;
currentFile = new FileStream("MyLibrary.bin", FileMode.Open);
readerFromFile= new BinaryReader(currentFile);
while (currentFile.Position < currentFile.Length)
{
TBook CurrentRecord = new TBook();
CurrentRecord.Title = readerFromFile.ReadString();
CurrentRecord.Author = readerFromFile.ReadString();
CurrentRecord.Genre = readerFromFile.ReadString();
CurrentRecord.BookID = readerFromFile.ReadInt16();
library.Add(CurrentRecord);
}
readerFromFile.Close();
currentFile.Close();
return library;
}
I assume the issue is with the line:
while (currentFile.Position < currentFile.Length)
Note: The Struct is setup as follows:
struct TBook
{
public string Title;
public string Author;
public string Genre;
public int BookID;
}

When you are serializing data as binary, your deserialization code must follow serialization code exactly; otherwise your deserializer starts reading junk from adjacent positions, eventually causing an exception or silently populating your structures with wrong data.
This pair of calls is mismatched:
writerToFile.Write(book.BookID);
....
CurrentRecord.BookID = readerFromFile.ReadInt16();
It is hard to see this problem, because BinaryWriter overloads the Write method. Since book.BookID is of type int, an alias for Int32, the call to Write is resolved to Write(Int32). Therefore, the corresponding read must also read Int32, not Int16:
CurrentRecord.BookID = readerFromFile.ReadInt32();
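A minimal round-trip sketch of the matched pair, assuming the TBook struct from the question. The BookFile class and its Save/Load method names are hypothetical, introduced only for illustration. The `using` statements close the streams even if an exception is thrown part-way through.

```csharp
using System.Collections.Generic;
using System.IO;

struct TBook
{
    public string Title;
    public string Author;
    public string Genre;
    public int BookID;
}

// Hypothetical helper class, not part of the original code.
static class BookFile
{
    public static void Save(string path, List<TBook> lib)
    {
        using (var writer = new BinaryWriter(File.Open(path, FileMode.Create)))
        {
            foreach (TBook book in lib)
            {
                writer.Write(book.Title);   // Write(string): length-prefixed string
                writer.Write(book.Author);
                writer.Write(book.Genre);
                writer.Write(book.BookID);  // Write(int): 4 bytes
            }
        }
    }

    public static List<TBook> Load(string path)
    {
        var library = new List<TBook>();
        using (var stream = File.OpenRead(path))
        using (var reader = new BinaryReader(stream))
        {
            while (stream.Position < stream.Length)
            {
                TBook book = new TBook();
                book.Title = reader.ReadString();
                book.Author = reader.ReadString();
                book.Genre = reader.ReadString();
                book.BookID = reader.ReadInt32(); // 4 bytes, mirrors Write(int)
                library.Add(book);
            }
        }
        return library;
    }
}
```

With the reads mirroring the writes one for one, the Position < Length loop condition from the question ends exactly at the last record.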

Related

Convert.ToBase64String throws 'System.OutOfMemoryException' for byte [] (file: large size)

I am trying to convert a byte[] to a base64 string so that I can send that information to a third party. My code is below:
byte[] ByteArray = System.IO.File.ReadAllBytes(path);
string base64Encoded = System.Convert.ToBase64String(ByteArray);
I am getting the error below:
Exception of type 'System.OutOfMemoryException' was thrown.
Can you help me please?
Update
I just spotted @PanagiotisKanavos' comment pointing to "Is there a Base64Stream for .NET?". That does essentially what my code below attempts (it lets you process the file without holding the whole thing in memory at once), but it uses a standard .NET library method for the job rather than incurring the overhead and risk of self-rolled code.
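One way to sketch the stream-based approach is with CryptoStream and ToBase64Transform from System.Security.Cryptography. This may or may not be exactly what the linked question recommends. The Base64Files class and its Encode method are hypothetical names used only for illustration.

```csharp
using System.IO;
using System.Security.Cryptography;

// Hypothetical helper class, not taken from the linked question.
static class Base64Files
{
    // Streams inputPath through a Base64 encoder into outputPath,
    // so only small buffers are held in memory at any time.
    public static void Encode(string inputPath, string outputPath)
    {
        using (var input = File.OpenRead(inputPath))
        using (var output = File.Create(outputPath))
        using (var base64 = new CryptoStream(output, new ToBase64Transform(), CryptoStreamMode.Write))
        {
            input.CopyTo(base64);
            // Disposing the CryptoStream writes the final (padded) block.
        }
    }
}
```

The output file contains the Base64 text as ASCII with no line breaks, so it matches what Convert.ToBase64String would produce for the same bytes.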
Original
The below code will create a new temporary file containing the Base64 encoded version of your input file.
This should have a lower memory footprint, since rather than doing all data at once, we handle it several bytes at a time.
To avoid holding the output in memory, I've pushed that back to a temp file, which is returned. When you later need to use that data for some other process, you'd need to stream it (i.e. so that again you're not consuming all of this data at once).
You'll also notice that I've used WriteLine instead of Write; which will introduce non base64 encoded characters (i.e. the line breaks). That's deliberate, so that if you consume the temp file with a text reader you can easily process it line by line.
However, you can amend per your needs.
void Main()
{
var inputFilePath = @"c:\temp\bigfile.zip";
var convertedDataPath = ConvertToBase64TempFile(inputFilePath);
Console.WriteLine($"Take a look in {convertedDataPath} for your converted data");
}
//inputFilePath = where your source file can be found. This is not impacted by the below code
//bufferSizeInBytesDiv3 = how many bytes to read at a time (divided by 3); the larger this value the more memory is required, but the better the performance. The Div3 part is because we later multiply this by 3, which ensures we never have to deal with remainders (since 3 bytes = 4 base64 chars)
public string ConvertToBase64TempFile(string inputFilePath, int bufferSizeInBytesDiv3 = 1024)
{
var tempFilePath = System.IO.Path.GetTempFileName();
using (var fileStream = File.Open(inputFilePath,FileMode.Open))
{
using (var reader = new BinaryReader(fileStream))
{
using (var writer = new StreamWriter(tempFilePath))
{
byte[] data;
while ((data = reader.ReadBytes(bufferSizeInBytesDiv3 * 3)).Length > 0)
{
writer.WriteLine(System.Convert.ToBase64String(data)); //NB: using WriteLine rather than Write; so when consuming this content consider removing line breaks (I've used this instead of write so you can easily stream the data in chunks later)
}
}
}
}
return tempFilePath;
}

Binary file read pre-defined number of Byte/ Bytes using C#

Scenario
I have a binary file which is the output of a certain system. The vendor has provided us with a description of the file encoding. It's very complicated because the encoding follows a certain methodology. For example, the first byte is ISO coded; we need to decode it first, and if the value matches the provided list then it has some meaning. The next 15 bytes are also ISO encoded; we need to decode and compare them. Similarly, after a certain position, a few bytes are binary encoded, and so on.
Action so far
I will be using a C# WinForms application. So far I have looked at various documents, and all point to the FileStream/BinaryReader combination, since my files are in the range of 1 GB to 1.8 GB. I cannot put the whole file in a byte[] either.
Problem
I am having trouble reading the file byte by byte. According to the above scenario, I first need to read only 1 byte, then 15 bytes, then 10 bytes, and so on. How can I accomplish this? Thanks in advance for your help.
BinaryReader is the way to go: because it reads from a stream, memory usage stays low.
You can do something like the following:
internal struct MyHeader
{
public byte FirstByte;
// etc
}
internal class MyFormat
{
private readonly string _fileName;
private MyFormat(string fileName)
{
_fileName = fileName;
}
public MyHeader Header { get; private set; }
public string FileName
{
get { return _fileName; }
}
public static MyFormat FromFileName(string fileName)
{
if (fileName == null) throw new ArgumentNullException("fileName");
// read the header of your file
var header = new MyHeader();
using (var reader = new BinaryReader(File.OpenRead(fileName)))
{
byte b1 = reader.ReadByte();
if (b1 != 0xAA)
{
// return null or throw an exception
}
header.FirstByte = b1;
// you can also read block of bytes with a BinaryReader
var readBytes = reader.ReadBytes(10);
// etc ... whenever something's wrong return null or throw an exception
}
// when you're done reading your header create and return the object
var myFormat = new MyFormat(fileName);
myFormat.Header = header;
// the rest of the object is delivered only when needed, see method below
return myFormat;
}
public object GetBigContent()
{
var content = new object();
// use FileName and Header property to get your big content and return it
// again, use a BinaryReader with 'using' statement here
return content;
}
}
Explanations
Call MyFormat.FromFileName to create one of these objects. Inside it:
you parse the header, whenever an error occurs return null or throw an exception
once your header is parsed you create the object and return it and that's it
Since you just read the header, provide a way for reading the bigger parts in the file.
Pseudo-example:
Use GetBigContent or whatever you want to call it whenever you need to read a large part of it.
Using Header and FileName inside that method you will have everything you need to return a content from this file on-demand.
By using this approach,
you quickly return a valid object by only parsing its header
you do not consume 1.8Gb at first call
you return only what the user needs, on-demand
For your encoding-related stuff the Encoding class will probably be helpful to you :
http://msdn.microsoft.com/en-us/library/system.text.encoding.aspx
BinaryReader.ReadBytes Method
Reads the specified number of bytes from the current stream into a byte array and advances the current position by that number of bytes.
public virtual byte[] ReadBytes(int count)
http://msdn.microsoft.com/en-us/library/system.io.binaryreader.readbytes(v=vs.110).aspx
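Reading a fixed-width text field then comes down to ReadBytes plus a decode step. The sketch below assumes that "ISO coded" in the question means ISO-8859-1 (Latin-1); the vendor's description may specify something else. RecordReader and ReadIsoField are hypothetical names.

```csharp
using System.IO;
using System.Text;

// Hypothetical helper class for reading fixed-width fields.
static class RecordReader
{
    // Assumption: "ISO coded" means ISO-8859-1; swap in the vendor's real encoding.
    private static readonly Encoding Iso = Encoding.GetEncoding("iso-8859-1");

    // Reads exactly `count` bytes and decodes them as text.
    public static string ReadIsoField(BinaryReader reader, int count)
    {
        byte[] raw = reader.ReadBytes(count);
        // ReadBytes returns fewer bytes than requested at end of stream.
        if (raw.Length != count)
            throw new EndOfStreamException("File ended inside a field.");
        return Iso.GetString(raw);
    }
}
```

For the layout in the question, this would be called as ReadIsoField(reader, 1), then ReadIsoField(reader, 15), and so on, with the binary-encoded parts read through ReadBytes, ReadInt32 and similar methods as the format requires.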

google's protocol buffer from c# to java - Protocol message tag had invalid wire type

I am creating a stream in C# and trying to read it in Java, but I receive the error "Protocol message tag had invalid wire type." when my Java code reads the object created in C#.
Details:
I started from the same .proto file (see below) to create the corresponding .java file and .cs file (compiling with protoc from "protobuf-2.4.1" for Java and with protobuf-csharp-port-2.4.1.473-full-binaries for C#).
I succeeded in creating addressbook.java and addressbook.cs.
The object is created in c# and written to a file using the following c# code:
[...]
byte[] bytes;
//Create a builder to start building a message
Person.Builder newContact = Person.CreateBuilder();
//Set the primitive properties
newContact.SetId(1)
.SetName("Foo")
.SetEmail("foo@bar");
//Now add an item to a list (repeating) field
newContact.AddPhone(
//Create the child message inline
Person.Types.PhoneNumber.CreateBuilder().SetNumber("555-1212").Build()
);
//Now build the final message:
Person person = newContact.Build();
newContact = null;
using(MemoryStream stream = new MemoryStream())
{
//Save the person to a stream
person.WriteTo(stream);
bytes = stream.ToArray();
//save this to a file (by me)
ByteArrayToFile("personStreamFromC#", bytes);
[...]
I copy the created file "personStreamFromC#" to my java solution and try to read it using the following java code:
AddressBook.Builder addressBook = AddressBook.newBuilder();
// Read the existing address book.
try {
FileInputStream input = new FileInputStream(args[0]);
byte[] data = IOUtils.toByteArray(input);
addressBook.mergeFrom(data);
// Read the existing address book.
AddressBook addressBookToReadFrom =
AddressBook.parseFrom(new FileInputStream(args[0]));
Print(addressBookToReadFrom);
}
But I get the following message:
Exception in thread "main" com.google.protobuf.InvalidProtocolBufferException: Protocol message tag had invalid wire type.
at com.google.protobuf.InvalidProtocolBufferException.invalidWireType(InvalidProtocolBufferException.java:78)
at com.google.protobuf.UnknownFieldSet$Builder.mergeFieldFrom(UnknownFieldSet.java:498)
at com.google.protobuf.GeneratedMessage$Builder.parseUnknownField(GeneratedMessage.java:438)
at com.example.tutorial.AddressBookProtos$Person$Builder.mergeFrom(AddressBookProtos.java:1034)
at com.example.tutorial.AddressBookProtos$Person$Builder.mergeFrom(AddressBookProtos.java:1)
at com.google.protobuf.CodedInputStream.readMessage(CodedInputStream.java:275)
at com.example.tutorial.AddressBookProtos$AddressBook$Builder.mergeFrom(AddressBookProtos.java:1715)
at com.example.tutorial.AddressBookProtos$AddressBook$Builder.mergeFrom(AddressBookProtos.java:1)
at com.google.protobuf.AbstractMessage$Builder.mergeFrom(AbstractMessage.java:300)
at com.google.protobuf.AbstractMessage$Builder.mergeFrom(AbstractMessage.java:238)
at com.google.protobuf.AbstractMessageLite$Builder.mergeFrom(AbstractMessageLite.java:162)
at com.google.protobuf.AbstractMessage$Builder.mergeFrom(AbstractMessage.java:716)
at com.google.protobuf.AbstractMessage$Builder.mergeFrom(AbstractMessage.java:238)
at com.google.protobuf.AbstractMessageLite$Builder.mergeFrom(AbstractMessageLite.java:153)
at com.google.protobuf.AbstractMessage$Builder.mergeFrom(AbstractMessage.java:709)
at AddPerson.main(test.java:104)
Below the .proto file:
package tutorial;
message Person {
required string name = 1;
required int32 id = 2; // Unique ID number for this person.
optional string email = 3;
enum PhoneType {
MOBILE = 0;
HOME = 1;
WORK = 2;
}
message PhoneNumber {
required string number = 1;
optional PhoneType type = 2 [default = HOME];
}
repeated PhoneNumber phone = 4;
}
message AddressBook {
repeated Person person = 1;
}
Any ideas ??
You write a Person object to the file in C#, but then read an AddressBook in Java; I don't think this is correct. Try the following in your Java code:
Person.parseFrom(new FileInputStream(args[0]));
One common mistake that can cause invalid wire-type errors (especially when using files) is: over-writing an existing file without truncating it. We can't see your ByteArrayToFile, but frankly File.WriteAllBytes may be an easier option. The problem is that if the new data is smaller than the original contents, any remaining extra bytes are essentially garbage.
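The truncation pitfall can be reproduced in a few lines. In this sketch the FileWriteModes class and both method names are hypothetical. The first method mimics a writer that opens an existing file without truncating it, and the second uses File.WriteAllBytes, which replaces the file's contents.

```csharp
using System.IO;

// Hypothetical helper class contrasting the two ways of writing.
static class FileWriteModes
{
    // Overwrites in place WITHOUT truncating: if the file was longer than
    // `data`, the old trailing bytes stay in the file as garbage.
    public static void WriteWithoutTruncate(string path, byte[] data)
    {
        using (var fs = new FileStream(path, FileMode.OpenOrCreate, FileAccess.Write))
        {
            fs.Write(data, 0, data.Length);
        }
    }

    // Replaces the whole file: File.WriteAllBytes creates the file,
    // or truncates and overwrites it if it already exists.
    public static void WriteTruncating(string path, byte[] data)
    {
        File.WriteAllBytes(path, data);
    }
}
```

Writing 4 bytes over a 10-byte file with the first method leaves a 10-byte file whose last 6 bytes are stale, which a protobuf parser will try to interpret as further fields.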
My advice:
check if you can deserialize it in c#; if you can't, the error is certainly in the file handling
if it works in c#, check how you are getting the file to the java code: are you copying it around anywhere?
and check you are using binary (not text) processing at all stages

Highly customized serializing

I have an object I'd like to serialize to a memory buffer, which is then sent via UART to an embedded device.
I'm working in a C# environment on windows.
What I'd like to do is to create two classes that look like this:
class StatusElement
{
byte statusPart1;
byte statusPart2;
}
class DeviceCommand
{
byte Address;
byte Length;
StatusElement[] statusElements; // Can have an arbitrary number of elements in it
}
I'd like to use a serializer, preferably something based on C# serialization, to convert the second class to a byte stream.
The problem is that the embedded device is hard-coded to accept an exact sequence (AddressByte, LengthByte .... ErrorCorrectionByte), so I cannot use the regular C# serialization, which adds serialization metadata to the stream. This also rules out other serializers like Protobuf.
So my question is:
Is it possible to customize the c# serialization to get the output I need? How?
--- Update ---
Thanks everyone for the help.
After consideration I’ve decided to implement my own mini-serializer, using reflection and per-type handler. More complex but gives me more flexibility and automation capabilities.
Use a MemoryStream to manually serialize your object.
private byte[] Serialize()
{
using (var ms = new MemoryStream())
{
ms.WriteByte(Address);
ms.WriteByte(Length);
foreach (var element in statusElements)
{
ms.WriteByte(element.statusPart1);
ms.WriteByte(element.statusPart2);
}
return ms.ToArray();
}
}
Likewise for deserialization:
private static DeviceCommand Deserialize(byte[] input)
{
DeviceCommand result = new DeviceCommand();
using (var ms = new MemoryStream(input))
{
// Stream.ReadByte returns an int (-1 at end of stream), so cast to byte
result.Address = (byte)ms.ReadByte();
result.Length = (byte)ms.ReadByte();
//assuming .Length contains the number of statusElements:
result.statusElements = new StatusElement[result.Length];
for (int i = 0; i < result.Length; i++)
{
result.statusElements[i] = new StatusElement();
result.statusElements[i].statusPart1 = (byte)ms.ReadByte();
result.statusElements[i].statusPart2 = (byte)ms.ReadByte();
}
}
return result;
}
If you only need to write bytes or byte arrays, you can use the MemoryStream directly. If you want to use other .NET base types, access your Stream with a System.IO.BinaryWriter / BinaryReader. This class is used by the System.Runtime.Serialization.Formatters.Binary.BinaryFormatter for binary serialization and deserialization.
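A short sketch of the BinaryWriter variant, for fields wider than one byte. The frame layout here (address, length, payload, 16-bit trailer) is an assumption for illustration only, not the device's real protocol, and FrameBuilder is a hypothetical name. Note that BinaryWriter writes multi-byte values in little-endian order, which may or may not be what the embedded device expects.

```csharp
using System.IO;

// Hypothetical helper class; the frame layout is assumed, not the real protocol.
static class FrameBuilder
{
    // Builds: address (1 byte), payload length (1 byte), payload bytes,
    // then a 16-bit trailer that BinaryWriter emits low byte first.
    public static byte[] Build(byte address, byte[] payload, ushort trailer)
    {
        using (var ms = new MemoryStream())
        using (var writer = new BinaryWriter(ms))
        {
            writer.Write(address);
            writer.Write((byte)payload.Length);
            writer.Write(payload);   // Write(byte[]): raw bytes, no length prefix
            writer.Write(trailer);   // Write(ushort): 2 bytes, little-endian
            writer.Flush();
            return ms.ToArray();
        }
    }
}
```

No metadata is added to the stream: the output contains exactly the bytes written, in the order they were written.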

Check output size using .NET XmlTextWriter

I need to generate an XML file and I need to stick as much data into it as possible, BUT there is a file size limit. So I need to keep inserting data until something says no more. How do I figure out the XML file size without repeatedly writing it to file?
I agree with John Saunders. Here's some code that basically does what he's talking about, but as an XmlSerializer wrapper rather than as a FileStream; it uses a MemoryStream as intermediate storage. It may be more effective to extend Stream, though.
public class PartitionedXmlSerializer<TObj>
{
private readonly int _fileSizeLimit;
public PartitionedXmlSerializer(int fileSizeLimit)
{
_fileSizeLimit = fileSizeLimit;
}
public void Serialize(string filenameBase, TObj obj)
{
using (var memoryStream = new MemoryStream())
{
// serialize the object in the memory stream
using (var xmlWriter = XmlWriter.Create(memoryStream))
new XmlSerializer(typeof(TObj))
.Serialize(xmlWriter, obj);
memoryStream.Seek(0, SeekOrigin.Begin);
var extensionFormat = GetExtensionFormat(memoryStream.Length);
var buffer = new char[_fileSizeLimit];
var i = 0;
// split the stream into files
using (var streamReader = new StreamReader(memoryStream))
{
int readLength;
while ((readLength = streamReader.Read(buffer, 0, _fileSizeLimit)) > 0)
{
var filename
= Path.ChangeExtension(filenameBase,
string.Format(extensionFormat, i++));
using (var fileStream = new StreamWriter(filename))
fileStream.Write(buffer, 0, readLength);
}
}
}
}
/// <summary>
/// Gets a file extension format string based on the
/// length of the file and the max file length.
/// </summary>
/// <param name="fileLength">length of the file</param>
private string GetExtensionFormat(long fileLength)
{
var numFiles = fileLength / _fileSizeLimit;
var extensionLength = Math.Ceiling(Math.Log10(numFiles));
var zeros = string.Empty;
for (var j = 0; j < extensionLength; j++)
{
zeros += "0";
}
return string.Format("xml.part{{0:{0}}}", zeros);
}
}
To use it, you'd initialize it with the max file length and then serialize using the base file path and then the object.
public class MyType
{
public int MyInt;
public string MyString;
}
public void Test()
{
var myObj = new MyType { MyInt = 42,
MyString = "hello there this is my string" };
new PartitionedXmlSerializer<MyType>(2)
.Serialize("myFilename", myObj);
}
This particular example will generate an xml file partitioned into
myFilename.xml.part001
myFilename.xml.part002
myFilename.xml.part003
...
myFilename.xml.part110
In general, you cannot break XML documents at arbitrary locations, even if you close all open tags.
However, if what you need is to split an XML document over multiple files, each of no more than a certain size, then you should create your own subtype of the Stream class. This "PartitionedFileStream" class could write to a particular file, up to the size limit, then create a new file, and write to that file, up to the size limit, etc.
This would leave you with multiple files which, when concatenated, make up a valid XML document.
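A minimal sketch of such a Stream subtype, under some assumptions: it is write-only, it names parts by appending ".part0", ".part1", and so on to a base path, and it splits purely on byte count (so a multi-byte character may straddle two files). The PartCount property and the naming scheme are illustrative choices, not part of the original suggestion.

```csharp
using System;
using System.IO;

// Hypothetical write-only stream that rolls over to a new file at a size limit.
class PartitionedFileStream : Stream
{
    private readonly string _basePath;
    private readonly long _limit;
    private FileStream _current;
    private long _currentBytes; // bytes written to the current part
    private long _total;        // bytes written across all parts
    private int _index;         // number of parts created so far

    public PartitionedFileStream(string basePath, long limit)
    {
        if (limit <= 0) throw new ArgumentOutOfRangeException("limit");
        _basePath = basePath;
        _limit = limit;
    }

    public int PartCount { get { return _index; } }

    public override void Write(byte[] buffer, int offset, int count)
    {
        while (count > 0)
        {
            // Open the first part, or roll over once the current part is full.
            if (_current == null || _currentBytes >= _limit)
            {
                if (_current != null) _current.Dispose();
                _current = File.Create(_basePath + ".part" + _index++);
                _currentBytes = 0;
            }
            int chunk = (int)Math.Min(count, _limit - _currentBytes);
            _current.Write(buffer, offset, chunk);
            offset += chunk;
            count -= chunk;
            _currentBytes += chunk;
            _total += chunk;
        }
    }

    public override void Flush() { if (_current != null) _current.Flush(); }

    protected override void Dispose(bool disposing)
    {
        if (disposing && _current != null) { _current.Dispose(); _current = null; }
        base.Dispose(disposing);
    }

    public override bool CanRead { get { return false; } }
    public override bool CanSeek { get { return false; } }
    public override bool CanWrite { get { return true; } }
    public override long Length { get { return _total; } }
    public override long Position
    {
        get { return _total; }
        set { throw new NotSupportedException(); }
    }
    public override int Read(byte[] buffer, int offset, int count) { throw new NotSupportedException(); }
    public override long Seek(long offset, SeekOrigin origin) { throw new NotSupportedException(); }
    public override void SetLength(long value) { throw new NotSupportedException(); }
}
```

An XmlWriter created over this stream writes as usual, and the parts, concatenated in index order, reproduce the byte stream it produced.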
In the general case, closing tags will not work. Consider an XML format that must contain one element A followed by one element B. If you closed the tags after writing element A, then you do not have a valid document - you need to have written element B.
However, in the specific case of a simple site map file, it may be possible to just close the tags.
You can ask the XmlTextWriter for its BaseStream and check its Position.
As the others pointed out, you may need to reserve some headroom to properly close the XML.
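A sketch of the Position check, under stated assumptions: the output stream is seekable (such as a FileStream or MemoryStream), the document is a flat list of item elements, and the caller picks a headroom at least as large as the biggest single element plus the closing tags. SizeLimitedXml and its Write method are hypothetical names. The writer buffers internally, so it is flushed before each check to keep Position accurate.

```csharp
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical helper class; element names and parameters are illustrative.
static class SizeLimitedXml
{
    // Writes <item> elements until fewer than `headroom` bytes remain below
    // `maxBytes`, then closes the document. Returns the number of items written.
    public static int Write(Stream output, string[] items, long maxBytes, long headroom)
    {
        // The writer is deliberately not closed, so the caller keeps the stream.
        var writer = new XmlTextWriter(output, new UTF8Encoding(false));
        writer.WriteStartDocument();
        writer.WriteStartElement("items");
        int written = 0;
        foreach (string item in items)
        {
            writer.Flush(); // push buffered text to the stream before measuring
            if (writer.BaseStream.Position + headroom > maxBytes)
                break;      // not enough room left for another item plus the closing tags
            writer.WriteElementString("item", item);
            written++;
        }
        writer.WriteEndElement();
        writer.WriteEndDocument();
        writer.Flush();
        return written;
    }
}
```

Items that did not fit can be written to the next file by calling the method again with the remaining slice of the array.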
