I'm writing xml with XmlWriter. My code has lots of sections like this:
xml.WriteStartElement("payload");
ThirdPartyLibrary.Serialise(results, xml);
xml.WriteEndElement(); // </payload>
The problem is that the ThirdPartyLibrary.Serialise method is unreliable. Depending on the value of results, it sometimes fails to close all the tags it opens. As a consequence, my WriteEndElement call is hijacked: it closes one of the library's dangling tags rather than writing </payload>.
Thus I'd like to make a checked call to WriteEndElement that checks the element name, and throws an exception unless the cursor is at the expected element.
xml.WriteEndElement("payload");
You can think of this like XmlReader.ReadStartElement(name) which throws unless the cursor is at the expected place in the document.
How can I achieve this?
Edit: A second use case for this extension method would be to make my own code more readable and reliable.
XmlWriter just writes the XML information it is given to the stream, without this kind of validation. If it validated every tag while writing, performance problems would arise when creating large XML files.
Creating an XML file with XmlWriter is at the developer's own risk. If you want this kind of validation, you can use XmlDocument.
If you really want to do this validation with XmlWriter, you have to create the writer over a String or StringBuilder, because with a Stream or TextWriter you can't read back what has been written while writing is still in progress. After every update to the XML you would have to read the string and run your own method to validate what has been written.
I suggest using XmlDocument to create this type of XML.
In the end, I wrote an extension method WriteSubtree that gives this usable API:
using (var resultsXml = xml.WriteSubtree("Results"))
{
    ThirdPartyLibrary.Serialise(results, resultsXml);
}
The extension method XmlWriter.WriteSubtree is analogous to .NET's XmlReader.ReadSubtree. It returns a special XmlWriter that checks against funny business. Its dispose method closes any tags left open.
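The source of WriteSubtree is not shown, so the following is not that implementation. It is a rough alternative sketch of the same goal (stop an unreliable callee from leaving tags open in the outer document), assuming it is acceptable to buffer the payload in memory. The names XmlWriterIsolation and WriteIsolated are hypothetical. The callee writes to a separate fragment writer, disposing that writer closes any elements left open, and only then is the finished fragment copied into the real writer.

```csharp
using System;
using System.Text;
using System.Xml;

public static class XmlWriterIsolation
{
    // Hypothetical helper: writes <elementName>...</elementName> to `xml`,
    // letting `writeBody` produce the content on a separate, buffered writer.
    public static void WriteIsolated(this XmlWriter xml, string elementName, Action<XmlWriter> writeBody)
    {
        var buffer = new StringBuilder();
        var settings = new XmlWriterSettings
        {
            // Fragment mode allows zero or several top-level nodes.
            ConformanceLevel = ConformanceLevel.Fragment,
            OmitXmlDeclaration = true
        };

        using (XmlWriter inner = XmlWriter.Create(buffer, settings))
        {
            writeBody(inner);
        } // Disposing the inner writer closes any elements the callee left open.

        xml.WriteStartElement(elementName);
        xml.WriteRaw(buffer.ToString()); // the buffer is already well-formed markup
        xml.WriteEndElement();           // now guaranteed to close elementName
    }
}
```

With this sketch the call site becomes xml.WriteIsolated("payload", w => ThirdPartyLibrary.Serialise(results, w)). Two caveats: the whole payload is held in a string, and namespace declarations made on the outer writer are not visible to the inner one.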
Related
In my search, I have seen a lot of examples of reading and writing XML files. All of them need parameters or classes to be set up for every read and write.
Is it possible to read and write an XML file with subroutines that take the filename, node and function as parameters?
For example, given a file named xmlExample:
<node0>
  <node1><name>a</name><number>b</number></node1>
  <node2><name>aa</name><number>bb</number><extra>cc</extra></node2>
  <node3><another>aa</another><sample>bb</sample></node3>
</node0>
string filename = @"C:\Documents and Settings\Administrator\Desktop\xmlExample .xml";
And then addressing the wanted object hierarchically:
Read( xmlExample, node0, node1 , name)
Or addressing that object with id-like unique node:
Read(xmlExample, sample)//there will be just one "sample".
My question is specifically about non-standard read and write approaches. Do we have to set up the same boilerplate every time, or can we write the read and write functions once and afterwards just call them with parameters?
I don't know of anything ready-made. However, you can quite easily create something like that. Take a look at the XmlReader class, and especially the XmlReader.ReadToFollowing method.
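As a minimal sketch of that suggestion, a helper along these lines would give the Read(file, node0, node1, name) style of call asked for. XmlLookup and Read are hypothetical names, and the helper takes a TextReader rather than a file name so that it can be used with strings as well as files. Note that ReadToFollowing searches forward in document order, so the "path" is a sequence of elements to skip to, not a strictly validated hierarchy.

```csharp
using System.IO;
using System.Xml;

public static class XmlLookup
{
    // Hypothetical helper: skips forward to each named element in turn and
    // returns the text content of the last one, or null if any is missing.
    public static string Read(TextReader source, params string[] path)
    {
        using (XmlReader reader = XmlReader.Create(source))
        {
            foreach (string name in path)
            {
                // Advances to the next element with this name, in document order.
                if (!reader.ReadToFollowing(name))
                {
                    return null;
                }
            }
            // Assumes the target element contains only text, no child elements.
            return reader.ReadElementContentAsString();
        }
    }
}
```

For a file on disk, the call could look like XmlLookup.Read(File.OpenText(filename), "node0", "node1", "name"), or XmlLookup.Read(File.OpenText(filename), "sample") when the element name is unique.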
I have a function that is very small, but is called so many times that my profiler marks it as time consuming. It is the following one:
private static XmlElement SerializeElement(XmlDocument doc, String nodeName, String nodeValue)
{
    XmlElement newElement = doc.CreateElement(nodeName);
    newElement.InnerXml = nodeValue;
    return newElement;
}
The second line (where it sets the nodeValue) is the one that takes the time.
I don't think the code itself can be optimized, although I'm still open to suggestions on that.
However, I remember reading or hearing somewhere that you can tell the compiler to flag a function so that it is loaded into memory when the program starts and therefore runs faster.
Is this just my imagination, or does such a flag exist?
Thanks,
FB.
There are ways you can cause it to be jitted early, but it's not the jit time that's hurting you here.
If you're having performance problems related to Xml serialization, you might consider using XmlWriter rather than XmlDocument, which is fairly heavy. Also, most automatic serialization systems (including the built-in .NET XML Serialization) will emit code dynamically to perform the serialization, which can then be cached and re-used. Most of this has to do with avoiding the overhead of reflection, however, rather than the overhead of the actual XML writing/parsing.
I don't think this can be solved with any kind of caching or inlining, and I believe the flag is your imagination, at least as far as performance goes. What you have in mind is pre-JIT-ing your code. That technique removes the wait for the JIT compiler the first time your function is called, but only the first time. It has no effect on the performance of subsequent calls.
As the documentation states, setting InnerXml parses the assigned string as XML. Parsing an XML string can be an expensive operation, especially if the XML in the string is complex. The documentation even has this line:
InnerXml is not an efficient way to modify the DOM. There may be performance issues when replacing complex nodes. It is more efficient to construct nodes and use methods such as InsertBefore, InsertAfter, AppendChild, and RemoveChild to modify the Xml document.
So, if you are creating a complex XML structure this way, it would be wise to build the nodes by hand instead.
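A minimal sketch of the "by hand" variant follows, assuming nodeValue is plain text rather than markup. That assumption matters: unlike InnerXml, a text node is escaped rather than parsed, so the two are only interchangeable when the value contains no XML. ElementBuilder and SerializeTextElement are hypothetical names, and no timing is claimed here.

```csharp
using System.Xml;

public static class ElementBuilder
{
    // Hypothetical variant of SerializeElement that avoids the XML parse
    // triggered by InnerXml. Assumes nodeValue is plain text, not markup.
    public static XmlElement SerializeTextElement(XmlDocument doc, string nodeName, string nodeValue)
    {
        XmlElement newElement = doc.CreateElement(nodeName);
        // The text node stores the value as-is; it is escaped on output.
        newElement.AppendChild(doc.CreateTextNode(nodeValue));
        return newElement;
    }
}
```

If the values really are XML fragments, the nested elements would have to be created with CreateElement and AppendChild in the same way.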
Aloha,
I have an 8 MB XML file that I wish to deserialize.
I'm using this code:
public static T Deserialize<T>(string xml)
{
    XmlSerializer serializer = new XmlSerializer(typeof(T));
    using (TextReader reader = new StringReader(xml))
    {
        return (T)serializer.Deserialize(reader);
    }
}
This code runs in about a minute, which seems rather slow to me. I've tried to use sgen.exe to precompile the serialization dll, but this didn't change the performance.
What other options do I have to improve performance?
[edit] I need the object that is created by the deserialization to perform (basic) transformations on. The XML is received from an external webservice.
The XmlSerializer uses reflection and is therefore not the best choice if performance is an issue.
You could build a DOM of your XML document using the XmlDocument or XDocument classes and work with that or, even faster, use an XmlReader. The XmlReader, however, requires you to write any object mapping yourself, if you need one.
Which approach is best depends strongly on what you want to do with the XML data. Do you simply need to extract certain values, or do you have to work with and edit the whole document object model?
Yes, it does use reflection, but performance is a gray area. With an 8 MB file, yes, it will be much slower. With a small file it will not be.
I would NOT say that reading the file via XmlReader or XPath would be easier, or really any faster. What is easier than telling something to turn your XML into an object, or your object into XML? Not much.
Now, if you need fine-grained control, then maybe you need to do it by hand.
Personally the choice is like this. I am willing to give up a bit of speed to save a TON of ugly nasty code.
Like everything else in software development there are trade offs.
You can try implementing IXmlSerializable in your "T" class and writing custom logic to process the XML.
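As an illustration of what that involves, here is a small sketch with a made-up Item type (the element names Name and Number are assumptions, not taken from the question's data). XmlSerializer calls ReadXml and WriteXml instead of mapping the members itself. Whether this is faster for a particular 8 MB document would have to be measured.

```csharp
using System.Xml;
using System.Xml.Schema;
using System.Xml.Serialization;

// Hypothetical type standing in for the question's T.
public class Item : IXmlSerializable
{
    public string Name { get; set; }
    public int Number { get; set; }

    // Returning null is the documented convention for this method.
    public XmlSchema GetSchema()
    {
        return null;
    }

    public void ReadXml(XmlReader reader)
    {
        reader.ReadStartElement(); // consume the wrapper start tag (<Item>)
        Name = reader.ReadElementContentAsString("Name", "");
        Number = reader.ReadElementContentAsInt("Number", "");
        reader.ReadEndElement();   // consume </Item>
    }

    public void WriteXml(XmlWriter writer)
    {
        // The serializer writes the wrapper element; only the content goes here.
        writer.WriteElementString("Name", Name);
        writer.WriteElementString("Number", XmlConvert.ToString(Number));
    }
}
```

The existing Deserialize<T> method can then be called unchanged with T = Item. The hand-written ReadXml expects the child elements in exactly this order.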
I'm just curious about this. It strikes me that the behavior of a StringBuilder is functionally (if not technically) the same as a Stream -- it's a bin of data to which other data can be added.
Again, just curious.
A Stream is for input and output of binary data.
A StringBuilder is a means of building up text data.
Beyond that, there's the issue of state - a StringBuilder just has the current value, with no idea of "position". It allows you to access and mutate data anywhere within it. A stream, on the other hand, is logically a potentially infinite stream of data, with a cursor somewhere in the middle to say where you've got to. You generally just read/write forwards, with Seek/Position to skip to a specific part of the data stream.
Try to imagine implementing the Stream API with StringBuilder... it just doesn't fit. You could sort of do it, but you'd end up with StringReader and StringWriter, basically.
StringBuilder has more than just Append functions. It also has Insert functions, which are unnatural for a stream. Use the StringWriter class if you want a stream-like TextWriter that wraps a StringBuilder.
A stream normally refers to an external input/output source (file, network). StringBuilder has no such characteristic.
Because it's not really a stream. It's more of a buffer that grows.
While both can have data added to them, the functionality as a whole is different.
A Stream is for inputting or outputting data from or to some source, not for building something. StringBuilder does not need the functionality that Stream provides, such as buffering, to build a resource.
On the other hand, you will find the classes StringReader and StringWriter in System.IO. StringWriter, for example, implements TextWriter against an underlying StringBuilder.
Personally I've never used it, but if you have a text file writing routine you could make it work against a TextWriter. Then in your test, instead of instantiating a StreamWriter, you instantiate a StringWriter, and you can check what was written by looking at the underlying StringBuilder.
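That testing idea can be sketched as follows. ReportWriter, WriteReport and RenderToString are hypothetical names invented for the illustration. The routine only knows about TextWriter, so production code can hand it a StreamWriter while a test hands it a StringWriter over a StringBuilder.

```csharp
using System.IO;
using System.Text;

public static class ReportWriter
{
    // Hypothetical routine written against TextWriter, so it does not care
    // whether the output ends up in a file or in memory.
    public static void WriteReport(TextWriter output, string title, int count)
    {
        output.Write(title);
        output.Write(": ");
        output.Write(count);
    }

    // What a test might do: capture the output in a StringBuilder.
    public static string RenderToString(string title, int count)
    {
        var captured = new StringBuilder();
        using (var writer = new StringWriter(captured))
        {
            WriteReport(writer, title, count);
        }
        return captured.ToString();
    }
}
```

In production the same WriteReport call would receive something like new StreamWriter(path) instead.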
Now I'm dizzy...
I need advice on how to have my C# console application display text to the user through the standard output while still being able access it later on. The actual feature I would like to implement is to dump the entire output buffer to a text file at the end of program execution.
The workaround I'm using until I find a cleaner approach is to subclass TextWriter, overriding the writing methods so that they both write to a file and call the original stdout writer. Something like this:
using System;
using System.IO;
using System.Text;

public class DirtyWorkaround {
    private class DirtyWriter : TextWriter {
        private readonly TextWriter stdoutWriter;
        private readonly StreamWriter fileWriter;

        public DirtyWriter(string path, TextWriter stdoutWriter) {
            this.stdoutWriter = stdoutWriter;
            this.fileWriter = new StreamWriter(path);
        }

        // TextWriter.Encoding is abstract, so it has to be overridden.
        public override Encoding Encoding {
            get { return stdoutWriter.Encoding; }
        }

        public override void Write(string s) {
            stdoutWriter.Write(s);
            fileWriter.Write(s);
            fileWriter.Flush();
        }

        // Same as above for WriteLine() and WriteLine(string).
    }

    public static void Main(string[] args) {
        using (DirtyWriter dw = new DirtyWriter("path", Console.Out)) {
            Console.SetOut(dw);
            // Teh codez
        }
    }
}
Note that it writes to and flushes the file all the time. I'd love to do that only at the end of execution, but I couldn't find any way to access the output buffer.
Also, excuse inaccuracies with the above code (had to write it ad hoc, sorry ;).
The perfect solution for this is to use log4net with a console appender and a file appender. There are many other appenders available as well. It also allows you to turn the different appenders off and on at runtime.
I don't think there's anything wrong with your approach.
If you want reusable code, consider implementing a class called MultiWriter or some such that takes as input two (or N?) TextWriter streams and distributes all writes, flushes, etc. to those streams. Then you can do this file/console thing, but just as easily you can split any output stream. Useful!
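A minimal sketch of such a MultiWriter could look like this. Only the class name comes from the suggestion above; the rest is an assumption about how one might build it. It overrides Write(char), which TextWriter documents as the minimum a derived class must implement, plus Write(string) to avoid per-character forwarding for strings. It deliberately does not dispose the target writers, leaving ownership with the caller.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public class MultiWriter : TextWriter
{
    private readonly List<TextWriter> targets;

    public MultiWriter(params TextWriter[] targets)
    {
        if (targets == null || targets.Length == 0)
        {
            throw new ArgumentException("At least one target writer is required.", "targets");
        }
        this.targets = new List<TextWriter>(targets);
    }

    // Reports the encoding of the first target; the others may differ.
    public override Encoding Encoding
    {
        get { return targets[0].Encoding; }
    }

    // Overloads that are not overridden here end up in Write(char) or Write(string).
    public override void Write(char value)
    {
        foreach (TextWriter target in targets) target.Write(value);
    }

    public override void Write(string value)
    {
        foreach (TextWriter target in targets) target.Write(value);
    }

    public override void Flush()
    {
        foreach (TextWriter target in targets) target.Flush();
    }
}
```

For the console-plus-file case, something like Console.SetOut(new MultiWriter(Console.Out, new StreamWriter(path))) would then duplicate everything written through Console.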
Probably not what you want, but just in case... Apparently, PowerShell implements a version of the venerable tee command. Which is pretty much intended for exactly this purpose. So... smoke 'em if you got 'em.
I would say mimic the diagnostics that .NET itself uses (Trace and Debug).
Create an "output" class that can hold different classes adhering to a text output interface. You report to the output class, and it automatically forwards the output to the classes you have added (ConsoleOutput, TextFileOutput, WhateverOutput), and so on. This also leaves you free to add other "output" types (such as XML/XSLT to get a nicely formatted report?).
Check out the Trace Listeners Collection to see what I mean.
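For reference, the built-in mechanism can be used directly along these lines. TraceOutput, Configure and Report are hypothetical names. The sketch assumes the project is compiled with the TRACE symbol defined, because calls to Trace.WriteLine are compiled out otherwise.

```csharp
using System.Diagnostics;
using System.IO;

public static class TraceOutput
{
    // Hypothetical setup: every registered listener receives each trace message.
    public static void Configure(TextWriter console, TextWriter log)
    {
        Trace.Listeners.Add(new TextWriterTraceListener(console));
        Trace.Listeners.Add(new TextWriterTraceListener(log));
        Trace.AutoFlush = true; // flush listeners after every write
    }

    public static void Report(string message)
    {
        // Only emitted when the TRACE compilation symbol is defined.
        Trace.WriteLine(message);
    }
}
```

A call such as TraceOutput.Configure(Console.Out, new StreamWriter("output.log")) at startup would send each reported line to both places.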
Consider refactoring your application to separate the user-interaction portions from the business logic. In my experience, such a separation is quite beneficial to the structure of your program.
For the particular problem you're trying to solve here, it becomes straightforward for the user-interaction part to change its behavior from Console.WriteLine to file I/O.
I'm working on implementing a similar feature to capture output sent to the Console and save it to a log while still passing the output in real time to the normal Console so it doesn't break the application (eg. if it's a console application!).
If you're still trying to do this in your own code by saving the console output (as opposed to using a logging system to save just the information you really care about), I think you can avoid the flush after each write, as long as you also override Flush() and make sure it flushes the original stdoutWriter you saved as well as your fileWriter. You want to do this in case the application is trying to flush a partial line to the console for immediate display (such as an input prompt, a progress indicator, etc), to override the normal line-buffering.
If that approach has problems with your console output being buffered too long, you might need to make sure that WriteLine() flushes stdoutWriter (but probably doesn't need to flush fileWriter except when your Flush() override is called). But I would think that the original Console.Out (actually going to the console) would automatically flush its buffer upon a newline, so you shouldn't have to force it.
You might also want to override Close() to (flush and) close your fileWriter (and probably stdoutWriter as well), but I'm not sure if that's really needed or if a Close() in the base TextWriter would issue a Flush() (which you would already override) and you might rely on application exit to close your file. You should probably test that it gets flushed on exit, to be sure. And be aware that an abnormal exit (crash) likely won't flush buffered output. If that's an issue, flushing fileWriter on newline may be desirable, but that's another tricky can of worms to work out.
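The flushing behaviour discussed in the last three paragraphs might be sketched like this. TeeWriter is a hypothetical name. The sketch makes one particular choice among the options above: it never flushes per write, Flush() flushes both writers, and disposing flushes and closes only the log writer, leaving the original console writer open.

```csharp
using System.IO;
using System.Text;

public class TeeWriter : TextWriter
{
    private readonly TextWriter primary; // e.g. the original Console.Out
    private readonly TextWriter log;     // e.g. a StreamWriter over the log file

    public TeeWriter(TextWriter primary, TextWriter log)
    {
        this.primary = primary;
        this.log = log;
    }

    public override Encoding Encoding
    {
        get { return primary.Encoding; }
    }

    // No Flush here: buffering is left to the underlying writers.
    public override void Write(char value)
    {
        primary.Write(value);
        log.Write(value);
    }

    public override void Write(string value)
    {
        primary.Write(value);
        log.Write(value);
    }

    // An explicit flush (for a prompt or progress indicator) reaches both writers.
    public override void Flush()
    {
        primary.Flush();
        log.Flush();
    }

    protected override void Dispose(bool disposing)
    {
        if (disposing)
        {
            Flush();
            log.Dispose(); // closes the log; the primary writer stays open
        }
        base.Dispose(disposing);
    }
}
```

As noted above, anything still buffered is lost if the process crashes before Flush or Dispose runs.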