I need to generate a huge XML file from different sources (functions). I decided to use XmlTextWriter, since it uses less memory than XmlDocument.
First, I initialize an XmlTextWriter over an underlying MemoryStream:
MemoryStream ms = new MemoryStream();
XmlTextWriter xmlWriter = new XmlTextWriter(ms, new UTF8Encoding(false, false));
xmlWriter.Formatting = Formatting.Indented;
Then I pass the XmlWriter (note: the writer is kept open until the very end) to a function to generate the beginning of the XML file:
xmlWriter.WriteStartDocument();
xmlWriter.WriteStartElement("Root"); // WriteStartElement needs an element name
// xmlWriter.WriteEndElement(); // Do not write the end of the root element in the first function, so more XML elements can be added in the following functions
xmlWriter.WriteEndDocument();
xmlWriter.Flush();
But I found that the underlying memory stream is empty (I checked by converting its byte array to a string and printing it). Any ideas why?
Also, I have a general question about how to generate a huge XML file from different sources (functions). What I do now is keep the XmlWriter open (I assume the underlying memory stream stays open as well), pass it to each function, and write. In the first function, I do not write the end of the root element. After the last function, I manually add the end of the root element by:
string endRoot = "</Root>";
byte[] byteEndRoot = Encoding.ASCII.GetBytes(endRoot);
ms.Write(byteEndRoot, 0, byteEndRoot.Length);
Not sure if this works or not.
Thanks a lot!
Technically you should only ask one question per question, so I'm only going to answer the first one because this is just a quick visit to SO for me at the moment.
You need to call Flush before attempting to read from the Stream, I think.
Edit
Just bubbling up my second hunch from the comments below to justify the accepted answer here.
In addition to the call to Flush, if reading from the Stream is done using the Read method and its brethren, then the position in the stream must first be reset back to the start. Otherwise no bytes will be read.
ms.Position = 0; /*reset Position to start*/
StreamReader reader = new StreamReader(ms);
string text = reader.ReadToEnd();
Console.WriteLine(text);
Perhaps you need to call Flush() on the XML writer before checking the memory stream.
Make sure you call Flush on the XmlTextWriter before checking the memory stream.
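To tie the answers together, here's a minimal sketch of the whole flow, assuming a root element named "Root" as implied by the question (the "Item" element is just a hypothetical stand-in for whatever the generator functions write):
// Keep one writer open across all generator functions, close the root with
// WriteEndElement instead of writing "</Root>" bytes by hand, then Flush
// and rewind the stream before reading it back.
MemoryStream ms = new MemoryStream();
XmlTextWriter xmlWriter = new XmlTextWriter(ms, new UTF8Encoding(false, false));
xmlWriter.Formatting = Formatting.Indented;

xmlWriter.WriteStartDocument();
xmlWriter.WriteStartElement("Root");

// ... each generator function writes its elements here ...
xmlWriter.WriteElementString("Item", "value from one of the functions"); // hypothetical

xmlWriter.WriteEndElement();   // closes <Root>
xmlWriter.WriteEndDocument();
xmlWriter.Flush();             // push buffered output into ms

ms.Position = 0;               // rewind before reading
string xml = new StreamReader(ms).ReadToEnd();
Console.WriteLine(xml);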
Related
I have some product data that I want to write into a CSV file. First, I have a function that writes the header into the CSV file:
using (StreamWriter streamWriter = new StreamWriter(path))
{
    string[] headerContent = { "banana", "apple", "orange" };
    string header = string.Join(",", headerContent);
    streamWriter.WriteLine(header);
}
Another function goes over the products and writes their data into the csv file:
using (StreamWriter streamWriter = new StreamWriter(new FileStream(path, FileMode.Open), Encoding.UTF8))
{
    foreach (var product in products)
    {
        await streamWriter.WriteLineAsync(product.ToString());
    }
}
When I write the products into the CSV file with FileMode.Open and Encoding.UTF8, the encoding is set correctly in the file, meaning that special characters in German or French show up correctly. But the problem is that doing it like this overwrites my header.
The solution I tried was to use FileMode.Append instead of FileMode.Open, which works, but then for some reason the encoding just gets ignored.
What could I do to append the data while maintaining the encoding? And why is this happening in the first place?
EDIT:
Example with FileMode.Open:
Fußpflegecreme
Example with FileMode.Append:
Fußpflegecreme
The important question here is: what does the file actually contain; for example, if I use the following:
using System.Text;
string path = "my.txt";
using (StreamWriter streamWriter = new StreamWriter(new FileStream(path, FileMode.Create), Encoding.UTF8))
{
    streamWriter.WriteLine("Fußpflegecreme 1");
}
using (StreamWriter streamWriter = new StreamWriter(new FileStream(path, FileMode.Append), Encoding.UTF8))
{
    streamWriter.WriteLine("Fußpflegecreme 2");
}
// this next line is lazy and inefficient; only good for quick tests
Console.WriteLine(BitConverter.ToString(File.ReadAllBytes(path)));
then the output is (re-formatted a little):
EF-BB-BF-
46-75-C3-9F-70-66-6C-65-67-65-63-72-65-6D-65-20-31-0D-0A-
46-75-C3-9F-70-66-6C-65-67-65-63-72-65-6D-65-20-32-0D-0A
The first line (note: there aren't any "lines" in the original hex) is the UTF-8 BOM; the second and third lines are the correctly UTF-8 encoded payloads. It would help if you could show the exact bytes that get written in your case. I wonder if the real problem here is that in your version there is no BOM, but the rest of the data is correct. Some tools, in the absence of a BOM, will choose the wrong encoding. But also, some tools, in the presence of a BOM, will incorrectly show some garbage at the start of the file (and may also, because they're clearly not using the BOM, use the wrong encoding). The preferred option is to specify the encoding explicitly when reading the file, and to use a tool that can handle the presence or absence of a BOM.
Whether or not to include a BOM (especially in the case of UTF-8) is a complex question, and there are pros/cons of each - and there are tools that will work better, or worse, with each. A lot of UTF-8 text files do not include a BOM, but: there is no universal answer. The actual content is still correctly UTF-8 encoded whether or not there is a BOM - but how that is interpreted (in either case) is up to the specific tool that you're using to read the data (and how that tool is configured).
I think this will be solved once you explicitly choose the UTF-8 encoding when writing the header. That will prefix the file with a BOM.
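In other words, something like this sketch, assuming path and products come from the question (and that the appending code runs in an async method, as in the original):
// Write the header with an explicit UTF-8 encoding and FileMode.Create;
// this emits the BOM once, at the start of the file.
using (StreamWriter streamWriter = new StreamWriter(new FileStream(path, FileMode.Create), Encoding.UTF8))
{
    string[] headerContent = { "banana", "apple", "orange" };
    streamWriter.WriteLine(string.Join(",", headerContent));
}

// Append the rows with the same encoding; StreamWriter only writes the
// preamble (BOM) when the stream is at position 0, so appending does not
// insert a second BOM.
using (StreamWriter streamWriter = new StreamWriter(new FileStream(path, FileMode.Append), Encoding.UTF8))
{
    foreach (var product in products)
    {
        await streamWriter.WriteLineAsync(product.ToString());
    }
}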
I am writing an XML node into a MemoryStream using XmlWriter.WriteNode(reader, false), under a parent element named "root". For some of the nodes, the stream length is always zero. My code looks like this:
MemoryStream ms = new MemoryStream();
XmlTextWriter writer = new XmlTextWriter(ms, encoding);
writer.WriteStartElement("root");
writer.WriteNode(reader, false);
if (ms.Length != 0)
{
    // ...
}
writer.WriteEndElement();
writer.Flush();
I can get the length if I flush the writer after writing the node. Is there some buffering based on node size that explains why the stream length differs between small and large nodes?
XmlTextWriter uses a StreamWriter internally, which has a default buffer size of 1 KB.
As you write your XML, the contents are written through this StreamWriter, which only flushes its buffer to the underlying Stream once the buffer is full or when Flush() is called.
Writing a small node may not fill this buffer, whereas writing a large node may. This is why, with large nodes, you see data flushed to the MemoryStream before you explicitly call Flush().
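So one fix is simply to flush before checking. A sketch, assuming encoding and reader are set up as in the question:
MemoryStream ms = new MemoryStream();
XmlTextWriter writer = new XmlTextWriter(ms, encoding);
writer.WriteStartElement("root");
writer.WriteNode(reader, false);
writer.Flush();           // force the internal buffer out to ms
if (ms.Length != 0)       // now reflects what has been written so far
{
    // ...
}
writer.WriteEndElement();
writer.Flush();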
I need to serialize to an XML document without overwriting the data that is currently in there. I have a method that saves to the XML file, but it deletes whatever is currently in that file upon serializing. Below is the code.
public void SaveSubpart()
{
    SOSDocument doc = new SOSDocument();
    doc.ID = 1;
    doc.Subpart = txtSubpart.Text;
    doc.Title = txtTitle.Text;
    doc.Applicability = txtApplicability.Text;
    doc.Training = txtTraining.Text;
    doc.URL = txtUrl.Text;

    StreamWriter writer = new StreamWriter(Server.MapPath("~/App_Data/Contents.xml"));
    System.Xml.Serialization.XmlSerializer serializer;
    try
    {
        serializer = new System.Xml.Serialization.XmlSerializer(doc.GetType());
        serializer.Serialize(writer, doc);
    }
    catch (Exception ex)
    {
        // e-mail admin - serialization failed
    }
    finally
    {
        writer.Close();
    }
}
The contract for the StreamWriter constructor taking only a filename says that if the named file exists, it is overwritten. So this has nothing to do with serializing to XML, per se. You would get the same result if you wrote to the stream through some other means.
The way to do what you are looking for is to read the old XML file into memory, make whatever changes are necessary, and then serialize and write the result to disk.
And even if it were possible to transparently modify an on-disk XML file, read/modify/write is almost certainly what would happen under the hood, because it's the only way to really do it. Yes, you could probably fiddle around with seeking and writing directly on disk, but what if something caused the file to change on disk while you were doing that? With the read/modify/write sequence, you lose the changes that were made after you read the file into memory; but if you modified the file directly on disk by seeking and writing, you would be almost guaranteed to end up with the file in an inconsistent state.
And of course, you could only do it if you could fit whatever changes you wanted to make into the bytes that were already on disk...
If concurrency is a problem, either use file locking or use a proper database with transactional support.
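A hedged sketch of that read/modify/write cycle, assuming SOSDocument is XML-serializable and that the file is changed to hold a List<SOSDocument> rather than a single document:
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

// ...

string path = Server.MapPath("~/App_Data/Contents.xml");
var serializer = new XmlSerializer(typeof(List<SOSDocument>));

// Read: load the existing documents, if any.
List<SOSDocument> docs;
if (File.Exists(path))
{
    using (var readStream = File.OpenRead(path))
    {
        docs = (List<SOSDocument>)serializer.Deserialize(readStream);
    }
}
else
{
    docs = new List<SOSDocument>();
}

// Modify: add the newly built document ("doc" as in SaveSubpart above).
docs.Add(doc);

// Write: serialize the whole list back, replacing the file.
using (var writeStream = File.Create(path))
{
    serializer.Serialize(writeStream, docs);
}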
try this:
StreamWriter writer = new StreamWriter(Server.MapPath("~/App_Data/Contents.xml"),true);
The second constructor argument determines whether to append to the file: true = append, false = overwrite.
more info http://msdn.microsoft.com/en-us/library/36b035cb.aspx
So what you want is to serialize an object without overwriting an existing file.
Thus
XmlSerializer s = new XmlSerializer(doc.GetType());
TextWriter w = new StringWriter();
s.Serialize(w, doc);
var yourXMLstring = w.ToString();
Then you can process this xml string and append it to existing xml file if you want to.
XmlDocument xml = new XmlDocument();
xml.LoadXml(yourXMLstring);
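For the appending step, one possible sketch, assuming the existing Contents.xml already has a document element to append under ("path" is a hypothetical stand-in for its location):
// Load the existing file, import the root of the freshly serialized
// fragment, append it under the existing document element, and save.
XmlDocument existing = new XmlDocument();
existing.Load(path);

XmlNode imported = existing.ImportNode(xml.DocumentElement, true);
existing.DocumentElement.AppendChild(imported);
existing.Save(path);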
I've encountered an unusual problem with .NET Framework 3.5 and the System.Xml.XmlReader class.
Before my application calls the XmlReader.Read method, it first reads the content of the stream for logging purposes using the Stream.Read method. It then seeks back to the beginning of the stream before calling XmlReader.Read. When I do this, I get the following error:
Unhandled Exception: System.Xml.XmlException: Unexpected end of file while parsing Name has occurred. Line 1, position 4097.
If, however, I call XmlReader.Read first, seek back to the beginning of the stream, and then call the Stream.Read method, it all works fine. This only appears to happen on large streams; I've just seen one of about 2000 characters go through the system and it worked fine.
I've included a code sample below to give an idea of what I'm doing.
XmlReaderSettings readerSettings = new XmlReaderSettings();
readerSettings.Schemas.Add(null, args[1]);
readerSettings.ValidationType = ValidationType.Schema;
readerSettings.ValidationEventHandler += new ValidationEventHandler(XmlValidatingReaderValidationEventHandler);
XmlReader reader = XmlReader.Create(fileReader, readerSettings);
byte[] buffer = new byte[fs.Length];
fs.Read(buffer, 0, buffer.Length);
string content = System.Text.UTF8Encoding.UTF8.GetString(buffer, 0, buffer.Length);
fs.Seek(0, SeekOrigin.Begin);
while(reader.Read());
Console.WriteLine("Done");
Thanks
Messing around with the stream that is backing something like XmlReader is generally a bad idea. If you want to do two different things with the same file, I suggest you open two different streams; that way they won't interfere with each other.
Note that File.ReadAllText is a simpler way of loading the contents of a text file into a string.
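A sketch of the two-stream approach, keeping the validation settings from the question ("xmlPath" is a hypothetical stand-in for the file's path):
// One stream (or File.ReadAllText) for logging, a separate one for the
// XmlReader, so neither disturbs the other's position.
string content = File.ReadAllText(xmlPath);
Console.WriteLine(content);                   // logging

using (FileStream xmlStream = File.OpenRead(xmlPath))
using (XmlReader reader = XmlReader.Create(xmlStream, readerSettings))
{
    while (reader.Read()) { }
}
Console.WriteLine("Done");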
That's because XmlReader buffers the data from the Stream. If you mess with the current position of the Stream, you mess with the XmlReader too...
I am writing a program for formatting hundreds of MB of string data (nearing a gig) into XML, and I am required to return it as the response to an HTTP (GET) request.
I am using a StringWriter/XmlWriter to build the XML of the records in a loop and returning the result of writer.ToString():
using (StringWriter writer = new StringWriter())
using (XmlWriter xmlWriter = XmlWriter.Create(writer, settings)) // where settings are the xml props
{
    // ... write the records in a loop, then return writer.ToString() ...
}
During testing I saw a few OutOfMemoryExceptions and am quite clueless about how to find a solution. Do you have any suggestions for a memory-optimized delivery of the response?
Is there a memory-efficient way of encoding the data, or maybe of chunking it?
I just cannot think of how to return it without building the whole thing into one HUGE string object.
thanks
--
a few clarifications --
This is an ASP.NET web services app over a gigabit Ethernet link, as josh noted. I am not very familiar with it, so there is still a bit of a learning curve.
I am using XmlWriter to create the XML and make a string out of it using StringWriter.
some stats --
response XML size = about 385 MB (my data size will grow very quickly to way more than this)
string object size as measured by a memory profiler = peaked at 605 MB
and thanks to everyone who responded...
Use an XmlTextWriter wrapped around Response.OutputStream to send the XML to the client, and periodically flush the response. This way you never have to hold more than a few MB in memory at any one time (at least for sending to the client).
Can't you just stream the response to the client? XmlWriter doesn't require its underlying stream to be buffered in memory. If it's ASP.NET you can use the Response.OutputStream or if it's WCF, you can use response streaming.
An HTTP GET for a gig? That's a lot! Perhaps you should reconsider.
At least gzipping the output could help.
You should not create XML using string manipulation.
Instead, you should use the XmlTextWriter, XmlDocument, or (in .Net 3.5) XElement classes to build an XML tree in memory, then write it directly to Response.OutputStream using an XmlTextWriter.
Writing directly to an XmlTextWriter that wraps Response.OutputStream will be most efficient (you'll never have an entire element tree in memory at once), but will be somewhat more complicated.
By doing it this way, you will never have a single string (or array) containing the entire object, and should thus avoid OutOfMemoryExceptions.
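A hedged sketch of what that looks like in classic ASP.NET (GetRecords and WriteRecordTo are hypothetical stand-ins for the app's own data access and per-record writing code):
// Wrap Response.OutputStream in an XmlWriter, write records as they are
// produced, and flush periodically so only a small buffer stays in memory.
Response.ContentType = "text/xml";
Response.BufferOutput = false;

XmlWriterSettings settings = new XmlWriterSettings { Indent = true };
using (XmlWriter writer = XmlWriter.Create(Response.OutputStream, settings))
{
    writer.WriteStartElement("Records");
    int count = 0;
    foreach (var record in GetRecords())   // hypothetical data source
    {
        WriteRecordTo(writer, record);     // hypothetical per-record writer
        if (++count % 1000 == 0)
        {
            writer.Flush();                // push buffered XML to the response
            Response.Flush();              // push the response to the client
        }
    }
    writer.WriteEndElement();
}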
Had a similar problem, hope this will help someone. My initial code was:
var serializer = new XmlSerializer(type);
string xmlString;
using (var writer = new StringWriter())
{
    serializer.Serialize(writer, objectData, sn); // OutOfMemoryException here
    xmlString = writer.ToString();
}
I ended up replacing the StringWriter with a MemoryStream, and this solved my problem:
using (var mem = new MemoryStream())
{
    serializer.Serialize(mem, objectData, sn);
    xmlString = Encoding.UTF8.GetString(mem.ToArray());
}
You'll have to return each record (or a small group of records) on their own individual GETs.