I need to generate an XML file and I need to put as much data into it as possible, BUT there is a file size limit. So I need to keep inserting data until something says no more. How do I figure out the XML file size without repeatedly writing it to file?
I agree with John Saunders. Here is some code that does roughly what he describes, but written as a serializer class rather than as a Stream subclass: it serializes with XmlSerializer into a MemoryStream as intermediate storage and then splits that buffer across several files. It may be more effective to extend Stream, though.
using System;
using System.IO;
using System.Xml;
using System.Xml.Serialization;

public class PartitionedXmlSerializer<TObj>
{
    private readonly int _fileSizeLimit;

    public PartitionedXmlSerializer(int fileSizeLimit)
    {
        _fileSizeLimit = fileSizeLimit;
    }

    public void Serialize(string filenameBase, TObj obj)
    {
        using (var memoryStream = new MemoryStream())
        {
            // serialize the object into the memory stream
            using (var xmlWriter = XmlWriter.Create(memoryStream))
                new XmlSerializer(typeof(TObj))
                    .Serialize(xmlWriter, obj);
            memoryStream.Seek(0, SeekOrigin.Begin);
            var extensionFormat = GetExtensionFormat(memoryStream.Length);
            // copy raw bytes so that the limit applies to the file size,
            // not to a character count
            var buffer = new byte[_fileSizeLimit];
            var i = 0;
            // split the stream into files
            int readLength;
            while ((readLength = memoryStream.Read(buffer, 0, _fileSizeLimit)) > 0)
            {
                var filename
                    = Path.ChangeExtension(filenameBase,
                        string.Format(extensionFormat, i++));
                using (var fileStream = File.Create(filename))
                    fileStream.Write(buffer, 0, readLength);
            }
        }
    }

    /// <summary>
    /// Gets a file extension format string based on the length of the
    /// serialized data and the max file length.
    /// </summary>
    /// <param name="fileLength">length of the serialized data in bytes</param>
    private string GetExtensionFormat(long fileLength)
    {
        // round up so that a partial last file is counted
        var numFiles = (fileLength + _fileSizeLimit - 1) / _fileSizeLimit;
        // pad with as many zeros as the highest file index has digits
        var highestIndex = Math.Max(numFiles - 1, 0);
        var zeros = new string('0', highestIndex.ToString().Length);
        return string.Format("xml.part{{0:{0}}}", zeros);
    }
}
To use it, you'd initialize it with the max file length and then serialize using the base file path and then the object.
public class MyType
{
public int MyInt;
public string MyString;
}
public void Test()
{
var myObj = new MyType { MyInt = 42,
MyString = "hello there this is my string" };
new PartitionedXmlSerializer<MyType>(2)
.Serialize("myFilename", myObj);
}
This particular example will generate an XML file partitioned into two-byte parts, numbered from zero:
myFilename.xml.part000
myFilename.xml.part001
myFilename.xml.part002
...
(and so on, for a little over a hundred parts)
In general, you cannot break XML documents at arbitrary locations, even if you close all open tags.
However, if what you need is to split an XML document over multiple files, each of no more than a certain size, then you should create your own subtype of the Stream class. This "PartitionedFileStream" class could write to a particular file, up to the size limit, then create a new file, and write to that file, up to the size limit, etc.
This would leave you with multiple files which, when concatenated, make up a valid XML document.
In the general case, closing tags will not work. Consider an XML format that must contain one element A followed by one element B. If you closed the tags after writing element A, then you do not have a valid document - you need to have written element B.
However, in the specific case of a simple site map file, it may be possible to just close the tags.
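The "PartitionedFileStream" idea could look roughly like the sketch below. It is only an illustration under assumptions: the class name comes from the description above, but the `.partN` file naming, the constructor parameters and the `PartCount` property are made up for this example.

```csharp
using System;
using System.IO;

// Hypothetical sketch: a write-only stream that rolls over to a new file
// whenever the current part has reached the size limit.
public sealed class PartitionedFileStream : Stream
{
    private readonly string _basePath;  // e.g. "sitemap.xml" (assumed naming)
    private readonly long _limit;       // maximum number of bytes per part
    private FileStream _current;
    private long _writtenInPart;
    private int _partIndex;

    public PartitionedFileStream(string basePath, long limit)
    {
        if (limit <= 0) throw new ArgumentOutOfRangeException(nameof(limit));
        _basePath = basePath;
        _limit = limit;
    }

    // Number of part files created so far.
    public int PartCount => _partIndex;

    public override void Write(byte[] buffer, int offset, int count)
    {
        while (count > 0)
        {
            // Open the next part when there is none yet or the current one is full.
            if (_current == null || _writtenInPart >= _limit)
            {
                _current?.Dispose();
                _current = File.Create(_basePath + ".part" + _partIndex++);
                _writtenInPart = 0;
            }
            // Write only as much as still fits into the current part.
            var chunk = (int)Math.Min(count, _limit - _writtenInPart);
            _current.Write(buffer, offset, chunk);
            _writtenInPart += chunk;
            offset += chunk;
            count -= chunk;
        }
    }

    public override void Flush() => _current?.Flush();

    protected override void Dispose(bool disposing)
    {
        if (disposing) _current?.Dispose();
        base.Dispose(disposing);
    }

    // Write-only, non-seekable stream: the remaining members are not supported.
    public override bool CanRead => false;
    public override bool CanSeek => false;
    public override bool CanWrite => true;
    public override long Length => throw new NotSupportedException();
    public override long Position
    {
        get => throw new NotSupportedException();
        set => throw new NotSupportedException();
    }
    public override int Read(byte[] buffer, int offset, int count) => throw new NotSupportedException();
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
}
```

An instance can be handed to XmlWriter.Create(stream) like any other stream. The writer does not know about the partitioning, so the individual parts are not valid XML on their own; only their concatenation is.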
You can ask the XmlTextWriter for its BaseStream, and check its Position.
As the others pointed out, you may need to reserve some headroom to properly close the XML.
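A minimal sketch of that check, under a few assumptions: the element names `items` and `item`, the method name and the `headroom` parameter are invented for illustration, and the caller is expected to pick a headroom that covers the largest single element plus the closing tags.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Xml;

public static class SizeLimitedXml
{
    // Writes items as <item> elements until another one might push the
    // document past maxBytes. Returns the number of items written.
    // headroom must cover the largest single element plus the closing tags
    // (an assumption the caller has to guarantee).
    public static int WriteUntilFull(Stream output, IEnumerable<string> items,
                                     long maxBytes, long headroom)
    {
        // The writer is deliberately not closed here, because closing an
        // XmlTextWriter also closes the underlying stream.
        var writer = new XmlTextWriter(output, new UTF8Encoding(false));
        writer.WriteStartDocument();
        writer.WriteStartElement("items");
        var count = 0;
        foreach (var item in items)
        {
            // Flush first: the writer buffers text, so Position is only
            // accurate once the buffer has been pushed to the stream.
            writer.Flush();
            if (writer.BaseStream.Position + headroom > maxBytes) break;
            writer.WriteElementString("item", item);
            count++;
        }
        writer.WriteEndElement();
        writer.WriteEndDocument();
        writer.Flush();
        return count;
    }
}
```

Calling Flush before every check costs a little throughput, but without it Position lags behind what has actually been written.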
I have a simple procedure to write a list of library books (of type TBook) to a binary file as follows:
static void SaveToFile(List<TBook> lib)
{
FileStream currentFile;
BinaryWriter writerToFile;
currentFile = new FileStream("MyLibrary.bin", FileMode.Create);
writerToFile = new BinaryWriter(currentFile);
foreach (TBook book in lib)
{
writerToFile.Write(book.Title);
writerToFile.Write(book.Author);
writerToFile.Write(book.Genre);
writerToFile.Write(book.BookID);
}
writerToFile.Close();
currentFile.Close();
}
However, when trying to read the binary file and load its contents into a list, I get an error:
An unhandled exception of type 'System.IO.EndOfStreamException' occurred in mscorlib.dll
Additional information: Unable to read beyond the end of the stream.
Here is my subroutine that attempts to read the Binary File back into a Struct again:
static List<TBook> LoadDataFromFile (List<TBook>library)
{
FileStream currentFile;
BinaryReader readerFromFile;
currentFile = new FileStream("MyLibrary.bin", FileMode.Open);
readerFromFile= new BinaryReader(currentFile);
while (currentFile.Position < currentFile.Length)
{
TBook CurrentRecord = new TBook();
CurrentRecord.Title = readerFromFile.ReadString();
CurrentRecord.Author = readerFromFile.ReadString();
CurrentRecord.Genre = readerFromFile.ReadString();
CurrentRecord.BookID = readerFromFile.ReadInt16();
library.Add(CurrentRecord);
}
readerFromFile.Close();
currentFile.Close();
return library;
}
I assume the issue is with the line:
while (currentFile.Position < currentFile.Length)
Note: The struct is set up as follows:
struct TBook
{
public string Title;
public string Author;
public string Genre;
public int BookID;
}
When you are serializing data as binary, your deserialization code must follow serialization code exactly; otherwise your deserializer starts reading junk from adjacent positions, eventually causing an exception or silently populating your structures with wrong data.
This pair of calls is mismatched:
writerToFile.Write(book.BookID);
....
CurrentRecord.BookID = readerFromFile.ReadInt16();
It is hard to see this problem, because BinaryWriter overloads the Write method. Since book.BookID is of type int, an alias for Int32, the call to Write is resolved to Write(Int32). Therefore, the corresponding read must also read Int32, not Int16:
CurrentRecord.BookID = readerFromFile.ReadInt32();
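To make the pairing concrete, here is a small round trip through a MemoryStream. It is a sketch only: the cut-down `Book` struct and the `RoundTrip` helper are made up for this example and are not the asker's code.

```csharp
using System.IO;
using System.Text;

public static class BookRoundTrip
{
    // Reduced version of the struct from the question (illustrative only).
    public struct Book
    {
        public string Title;
        public int BookID;
    }

    public static Book RoundTrip(Book book)
    {
        using (var stream = new MemoryStream())
        {
            using (var writer = new BinaryWriter(stream, Encoding.UTF8, leaveOpen: true))
            {
                writer.Write(book.Title);   // resolves to Write(string)
                writer.Write(book.BookID);  // resolves to Write(int), 4 bytes
            }

            stream.Position = 0;
            using (var reader = new BinaryReader(stream))
            {
                // Read in exactly the same order and with the same widths.
                var result = new Book();
                result.Title = reader.ReadString();
                result.BookID = reader.ReadInt32(); // ReadInt16 would consume only 2 of the 4 bytes
                return result;
            }
        }
    }
}
```

With ReadInt16 in place of ReadInt32, every record after the first would start two bytes too early, which is how the reader ends up running past the end of the stream.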
As you can imagine, I'm addressing my question to you because I wasn't able to solve my problem through research. I'm also rather inexperienced, so maybe you can help.
At work I have a simulation program that produces a huge amount of data, so I want to write out every time increment as a VTK file in binary format. For reasons related to how the data is acquired, I want to implement this vtkWrite myself.
Therefore, as you might know, I need to write some lines in ASCII format that declare the data part, which I then need to write in binary format.
Up to now my code looks like this:
public void WriteVTKBin(int increment, string output_dir, bool format)
{
int data_structure = 12;
int[,] ElementTypes = new int[hex8Elements.numberOfElements, 1];
for (int ii = 0; ii < ElementTypes.Length; ii++) ElementTypes[ii, 0] = data_structure;
string File_name = "output_bin_" + increment + ".vtk";
var output = new FileStream(@output_dir + "/" + File_name, FileMode.Create);
var output_Stream = new StreamWriter(output);
using (output)
{
output_Stream.WriteLine("# vtk DataFile Version 3.0");
output_Stream.WriteLine("vtk-Output for increment");
if (format) output_Stream.WriteLine("BINARY");
else output_Stream.WriteLine("ASCII");
output_Stream.WriteLine("DATASET UNSTRUCTURED_GRID");
output_Stream.WriteLine("POINTS " + nodes.numberOfNodes + " double");
}
var output_rest = new FileStream(@output_dir + "/" + File_name, FileMode.Append);
var BinWriter = new BinaryWriter(output_rest);
using (output_rest)
{
BinWriter.Write(GetBytesDouble(GetPointsOutput()));
}
}
}
The argument taken from the BinaryWriter is a ByteArray that I produce using a different method.
My idea was to initialize the file with FileMode.Create to overwrite potentially existing old files and then write the header section. Afterwards closing the file, open it again using FileMode.Append and writing my binary data. This I want to repeat until all fields I want to write out are contained in the vtk-file.
The problem is: the BinaryWriter overwrites my header even though the file is opened in Append mode, and when I want to close it and write another ASCII line, it tells me that it cannot access an already closed file.
So is there a solution for my approach, or is there a more sophisticated way to deal with this type of output?
Thank you very much in advance,
RR
You can simply convert your ASCII strings to bytes and write them to the file as such:
byte[] byteArray = Encoding.XXX.GetBytes(text);
where XXX is the encoding you want (for example ASCII), and then
binaryWriter.Write(byteArray);
Any program that opens the file and decodes it as ASCII will find your header strings there in readable form.
You are missing using blocks for the StreamWriter and the BinaryWriter. Your code disposes only the FileStream, so the text buffered in the StreamWriter is never flushed to the file. Change your code like this and it will work:
using (var fstream = new FileStream(..., FileMode.Create))
using (var writer = new StreamWriter(fstream))
{
    // Write headers
}
using (var fstream = new FileStream(..., FileMode.Append))
using (var writer = new BinaryWriter(fstream))
{
    // Write binary data
}
You can also encode your headers to byte[] and use only a BinaryWriter.
using (var fstream = new FileStream(..., FileMode.Create))
using (var writer = new BinaryWriter(fstream))
{
    // Write headers
    writer.Write(Encoding.ASCII.GetBytes("..."));
    // Write binary data
}
Or use BinaryWriter to write the string directly. Note that Write(string) prefixes the string with its length, so this only suits data that is read back with BinaryReader.ReadString, not a plain-text header.
using (var fstream = new FileStream(..., FileMode.Create))
using (var writer = new BinaryWriter(fstream))
{
    // Write headers
    writer.Write("...");
    // Write binary data
}
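Put together, the single-writer variant might look like the following sketch. The class and method names are invented for illustration. The byte swap reflects my understanding that the legacy VTK format expects big-endian binary data; treat that as an assumption and check it against the VTK file format documentation.

```csharp
using System;
using System.IO;
using System.Text;

public static class VtkLikeWriter
{
    // Writes ASCII header lines followed by raw binary doubles to one stream.
    public static void Write(Stream output, string[] headerLines, double[] values)
    {
        using (var writer = new BinaryWriter(output, Encoding.ASCII, leaveOpen: true))
        {
            foreach (var line in headerLines)
            {
                // Raw ASCII bytes plus a newline, without a length prefix.
                writer.Write(Encoding.ASCII.GetBytes(line + "\n"));
            }

            foreach (var value in values)
            {
                var bytes = BitConverter.GetBytes(value);
                // Assumption: the target format wants big-endian values.
                if (BitConverter.IsLittleEndian) Array.Reverse(bytes);
                writer.Write(bytes);
            }
        }
    }
}
```

Because everything goes through one writer on one open stream, there is no need to close the file and reopen it in Append mode between the text and binary sections.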
I'm attempting to split a PDF file page by page, and get each page file's byte array. However, I'm having trouble converting each page to a byte array in iText version 7.0.4 for C#.
Methods referenced in other solutions rely on PdfWriter.GetInstance or PdfCopy, which seem to no longer exist in iText version 7.0.4.
I've gone through iText's sample code and API documents, but I have not been able to extract any useful information out of them.
using (Stream stream = new MemoryStream(pdfBytes))
using (PdfReader reader = new PdfReader(stream))
using (PdfDocument pdfDocument = new PdfDocument(reader))
{
PdfSplitter splitter = new PdfSplitter(pdfDocument);
// My Attempt #1 - None of the document's functions seem to be of help.
foreach (PdfDocument splitPage in splitter.SplitByPageCount(1))
{
// ??
}
// My Attempt #2 - GetContentBytes != pdf file bytes.
for (int i = 1; i <= pdfDocument.GetNumberOfPages(); i++)
{
PdfPage page = pdfDocument.GetPage(i);
byte[] bytes = page.GetContentBytes();
}
}
Any help would be much appreciated.
Your approach of using PdfSplitter is one of the best ways to approach your task. Not much is available out of the box, perhaps, but PdfSplitter is highly customizable, and if you take a look at the implementation or simply the API, it becomes clear which are the correct points for injecting your own customized behavior.
You should override GetNextPdfWriter to provide whatever output medium you want the documents to be written to. You can also use IDocumentReadyListener to define the action that will be performed once another document is ready.
I am attaching one of the implementations that can achieve your goal:
class ByteArrayPdfSplitter : PdfSplitter {
private MemoryStream currentOutputStream;
public ByteArrayPdfSplitter(PdfDocument pdfDocument) : base(pdfDocument) {
}
protected override PdfWriter GetNextPdfWriter(PageRange documentPageRange) {
currentOutputStream = new MemoryStream();
return new PdfWriter(currentOutputStream);
}
public MemoryStream CurrentMemoryStream {
get { return currentOutputStream; }
}
public class DocumentReadyListender : IDocumentReadyListener {
private ByteArrayPdfSplitter splitter;
public DocumentReadyListender(ByteArrayPdfSplitter splitter) {
this.splitter = splitter;
}
public void DocumentReady(PdfDocument pdfDocument, PageRange pageRange) {
pdfDocument.Close();
byte[] contents = splitter.CurrentMemoryStream.ToArray();
String pageNumber = pageRange.ToString();
}
}
}
The calls would be basically as you did, but with custom document ready event:
PdfDocument docToSplit = new PdfDocument(new PdfReader(path));
ByteArrayPdfSplitter splitter = new ByteArrayPdfSplitter(docToSplit);
splitter.SplitByPageCount(1, new ByteArrayPdfSplitter.DocumentReadyListender(splitter));
I'm working on a project in .NET 4 and Web API 2, adding a file upload field to an already-implemented controller. I've found that Web API doesn't support multipart/form-data POST requests by default, and I need to write my own formatter class to handle them. Fine.
Ideally, what I'd like to do is use the existing formatter to populate the model, then add the file data before returning the object. This file upload field is being attached to six separate models, all of which are very complex (classes containing lists of classes, enums, guids, etc.). I've run into a few snags...
I tried implementing it manually using the source code for FormUrlEncodedMediaTypeFormatter.cs as an example. I found that it constructs a list of KeyValue pairs for each field (which I can easily do), then parses them using FormUrlEncodedJson.Parse(). I can't use FormUrlEncodedJson, because it's (for some reason?) marked Internal.
I started implementing my own parser, but when I hit about line 50, I thought to myself: I must be doing something wrong. There must be some way to populate the object with one of the existing Formatters, right? Surely they didn't expect us to write a new formatter for every single model or, even worse, writing our own more-fragile version of FormUrlEncodedJson.Parse()?
What am I missing here? I'm stumped.
// Multipart/form-data formatter adapted from:
// http://stackoverflow.com/questions/17924655/how-create-multipartformformatter-for-asp-net-4-5-web-api
public class MultipartFormFormatter : FormUrlEncodedMediaTypeFormatter
{
private const string StringMultipartMediaType = "multipart/form-data";
//private const string StringApplicationMediaType = "application/octet-stream";
public MultipartFormFormatter()
{
this.SupportedMediaTypes.Add(new MediaTypeHeaderValue(StringMultipartMediaType));
//this.SupportedMediaTypes.Add(new MediaTypeHeaderValue(StringApplicationMediaType));
}
public override bool CanReadType(Type type)
{
return true;
}
public override bool CanWriteType(Type type)
{
return false;
}
public override async Task<object> ReadFromStreamAsync(Type type, Stream readStream, HttpContent content, IFormatterLogger formatterLogger)
{
var parts = await content.ReadAsMultipartAsync();
var obj = Activator.CreateInstance(type);
var propertiesFromObj = obj.GetType().GetRuntimeProperties().ToList();
// *****
// * Populate obj using FormUrlEncodedJson.Parse()? How do I do this?
// *****
foreach (var property in propertiesFromObj.Where(x => x.PropertyType == typeof(AttachedDocument)))
{
var file = parts.Contents.FirstOrDefault(x => x.Headers.ContentDisposition.Name.Contains(property.Name));
if (file == null || file.Headers.ContentLength <= 0) continue;
try
{
var fileModel = new AttachedDocument()
{
ServerFilePath = ReadToTempFile(file.ReadAsStreamAsync().Result),
};
property.SetValue(obj, fileModel);
}
catch (Exception e)
{
// TODO: proper error handling
}
}
return obj;
}
/// <summary>
/// Reads a file from the stream and writes it to a temporary directory
/// </summary>
/// <param name="input"></param>
/// <returns>The path of the written temporary file</returns>
private string ReadToTempFile(Stream input)
{
var fileName = Path.GetTempFileName();
var fileInfo = new FileInfo(fileName);
fileInfo.Attributes = FileAttributes.Temporary;
var buffer = new byte[16 * 1024];
using (var writer = File.OpenWrite(fileName))
{
int read;
while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
{
writer.Write(buffer, 0, read);
}
}
return fileName;
}
}
EDIT: after stewing on this for way too many hours, I've come to the conclusion that what I want to do is basically impossible. After talking to my boss, we've decided the best alternative is to make a second controller that accepts a file and then associates it to the rest of the form data, and put the onus on the front-end developers to do much more work to support that scenario.
I'm extremely disappointed in the designers of Web API for making such a common use-case so difficult (if at all possible!) to pull off.
It actually does support multipart/form-data posts:
It's all about using HttpContext from the Web API controller: the request's Files collection is filled with the uploaded files, and when data is posted along with them, you can access that data as well.
Below is an example that I use to upload a profile picture and the data object to go with it:
[Route("UploadUserImage", Name = "UploadUserImage")]
[HttpPost]
public async Task<dynamic> PostUploadUserImage(UserInfo userInformation)
{
foreach (string fileKey in HttpContext.Current.Request.Files.Keys)
{
HttpPostedFile file = HttpContext.Current.Request.Files[fileKey];
if (file.ContentLength <= 0)
continue; //Skip unused file controls.
//The resizing settings can specify any of 30 commands.. See http://imageresizing.net for details.
//Destination paths can have variables like <guid> and <ext>, or
//even a sanitized version of the original filename, like <filename:A-Za-z0-9>
ImageResizer.ImageJob i = new ImageResizer.ImageJob(file, "~/image-uploads/<guid>.<ext>", new ImageResizer.ResizeSettings(
"width=2000;height=2000;format=jpg;mode=max"));
i.CreateParentDirectory = true; //Auto-create the uploads directory.
i.Build();
var fileNameArray = i.FinalPath.Split(@"\".ToCharArray());
var fileName = fileNameArray[fileNameArray.Length - 1];
userInformation.profilePictureUrl = String.Format("/services/image-uploads/{0}",fileName);
return userInformation;
}
return null;
}
I have an object I'd like to serialize to a memory buffer, which is then sent via UART to an embedded device.
I'm working in a C# environment on windows.
What I'd like to do is to create two classes that look like this:
class StatusElement
{
byte statusPart1;
byte statusPart2;
}
class DeviceCommand
{
byte Address;
byte Length;
StatusElement[] statusElements; // Can have an arbitrary number of elements in it
}
I'd like to use a serializer, preferably something based on C# serialization, to convert the second class to a byte stream.
The problem is that the embedded device is hard-coded to accept an exact sequence (AddressByte, LengthByte .... ErrorCorrectionByte), so I cannot use the regular C# serialization, which adds serialization metadata to the stream. This also rules out other serializers like Protobuf.
So my question is:
Is it possible to customize the C# serialization to get the output I need? How?
--- Update ---
Thanks everyone for the help.
After consideration I’ve decided to implement my own mini-serializer, using reflection and per-type handler. More complex but gives me more flexibility and automation capabilities.
Use a MemoryStream to manually serialize your object.
private byte[] Serialize()
{
    using (var ms = new MemoryStream())
    {
        ms.WriteByte(Address);
        ms.WriteByte(Length);
        foreach (var element in statusElements)
        {
            ms.WriteByte(element.statusPart1);
            ms.WriteByte(element.statusPart2);
        }
        return ms.ToArray();
    }
}
Likewise for deserialization:
private static DeviceCommand Deserialize(byte[] input)
{
    DeviceCommand result = new DeviceCommand();
    using (var ms = new MemoryStream(input))
    {
        // ReadByte returns an int (-1 at the end of the stream), so cast to byte
        result.Address = (byte)ms.ReadByte();
        result.Length = (byte)ms.ReadByte();
        //assuming .Length contains the number of statusElements:
        result.statusElements = new StatusElement[result.Length];
        for (int i = 0; i < result.Length; i++)
        {
            result.statusElements[i] = new StatusElement();
            result.statusElements[i].statusPart1 = (byte)ms.ReadByte();
            result.statusElements[i].statusPart2 = (byte)ms.ReadByte();
        }
    }
    return result;
}
If you only need to write bytes or byte arrays, you can use the MemoryStream directly. If you want to write other .NET base types, wrap your Stream in a System.IO.BinaryWriter / BinaryReader. These classes are also used by the System.Runtime.Serialization.Formatters.Binary.BinaryFormatter for binary serialization and deserialization.
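As a rough illustration of the BinaryWriter route, the sketch below builds a frame with a multi-byte field. The frame layout (address, count, then 16-bit readings) and the names `FrameBuilder` and `BuildFrame` are hypothetical and not taken from the device protocol in the question.

```csharp
using System.IO;

public static class FrameBuilder
{
    // Builds: [address][reading count][reading 0 low, high][reading 1 low, high]...
    public static byte[] BuildFrame(byte address, ushort[] readings)
    {
        using (var ms = new MemoryStream())
        using (var writer = new BinaryWriter(ms))
        {
            writer.Write(address);                // Write(byte): 1 byte
            writer.Write((byte)readings.Length);  // 1 byte, assumes fewer than 256 readings
            foreach (var reading in readings)
            {
                writer.Write(reading);            // Write(ushort): 2 bytes, little-endian
            }
            writer.Flush();
            return ms.ToArray();
        }
    }
}
```

BinaryWriter emits multi-byte values in little-endian order, so a device that expects big-endian fields would need the bytes swapped before writing.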