How to deserialize only part of an XML document in C# - c#

Here's a fictitious example of the problem I'm trying to solve. If I'm working in C#, and have XML like this:
<?xml version="1.0" encoding="utf-8"?>
<Cars>
<Car>
<StockNumber>1020</StockNumber>
<Make>Nissan</Make>
<Model>Sentra</Model>
</Car>
<Car>
<StockNumber>1010</StockNumber>
<Make>Toyota</Make>
<Model>Corolla</Model>
</Car>
<SalesPerson>
<Company>Acme Sales</Company>
<Position>
<Salary>
<Amount>1000</Amount>
<Unit>Dollars</Unit>
... and on... and on....
</SalesPerson>
</Cars>
The XML inside SalesPerson can be very long, megabytes in size. I want to deserialize the document, but leave the SalesPerson XML element undeserialized, keeping it in raw form "for later on".
Essentially I would like to be able to use this as the object representation of the XML:
[System.Xml.Serialization.XmlRootAttribute("Cars", Namespace = "", IsNullable = false)]
public class Cars
{
[XmlElement("Car")] // each <Car> is a direct child of <Cars>, so there is no wrapper element
public Car[] Car { get; set; }
public Stream SalesPerson { get; set; }
}
public class Car
{
[System.Xml.Serialization.XmlElementAttribute("StockNumber")]
public string StockNumber{ get; set; }
[System.Xml.Serialization.XmlElementAttribute("Make")]
public string Make{ get; set; }
[System.Xml.Serialization.XmlElementAttribute("Model")]
public string Model{ get; set; }
}
where the SalesPerson property on the Cars object would contain a stream with the raw xml that is within the <SalesPerson> xml element after being run through an XmlSerializer.
Can this be done? Can I choose to only deserialize "part of" an xml document?
Thanks!
-Mike
p.s. example xml stolen from How to Deserialize XML document

This might be a bit of an old thread, but I will post anyway. I had the same problem (I needed to deserialize about 10 KB of data from a file that was larger than 1 MB). In the main object (which has an InnerObject that needs to be deserialized) I implemented the IXmlSerializable interface, then changed the ReadXml method.
We get an XmlReader as input; the first line reads forward until it reaches the XML tag we want:
reader.ReadToDescendant("InnerObjectTag"); //tag which matches the InnerObject
Then create an XmlSerializer for the type of the object we want to deserialize, and deserialize it:
XmlSerializer serializer = new XmlSerializer(typeof(InnerObject));
this.innerObject = (InnerObject)serializer.Deserialize(reader.ReadSubtree()); //this gives the serializer only the part of the XML that holds the innerObject data
reader.Close(); //now skip the rest
This saved me a lot of deserialization time and lets me read just a part of the XML (just some details that describe the file, which might help the user decide whether it is the file they want to load).
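For anyone who wants to try the ReadToDescendant / ReadSubtree idea outside of an IXmlSerializable implementation, here is a minimal standalone sketch against the Cars XML from the question. The PartialXml class and the ReadFirstCar helper are names I made up for illustration; they are not from the answer above.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Serialization;

public class Car
{
    public string StockNumber { get; set; }
    public string Make { get; set; }
    public string Model { get; set; }
}

// Hypothetical helper class, for illustration only.
public static class PartialXml
{
    // Deserializes only the first <Car> element; the rest of the
    // document (including <SalesPerson>) is never handed to the serializer.
    public static Car ReadFirstCar(string xml)
    {
        using (var stringReader = new StringReader(xml))
        using (var reader = XmlReader.Create(stringReader))
        {
            // Position the reader on the root element.
            reader.MoveToContent();

            // Advance to the first descendant named "Car"; false means none exists.
            if (!reader.ReadToDescendant("Car"))
                return null;

            var serializer = new XmlSerializer(typeof(Car));

            // ReadSubtree exposes just the current element and its children.
            using (var subtree = reader.ReadSubtree())
            {
                return (Car)serializer.Deserialize(subtree);
            }
        }
    }
}
```

Calling PartialXml.ReadFirstCar(xml) returns a Car populated from the first Car element, or null when the document has none. No XmlRootAttribute is needed here because the element name happens to match the class name.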

The accepted answer from user271807 is a great solution, but I found that I also needed to set the XML root of the fragment to avoid an exception with an inner exception saying something like this:
...xmlns=''> was not expected
This exception was thrown when I tried to deserialize only the inner Authentication element of this XML document:
<?xml version="1.0" encoding="UTF-8"?>
<Api>
<Authentication>
<sessionid>xxx</sessionid>
<errormessage>xxx</errormessage>
</Authentication>
</Api>
So I ended up creating this extension method as a reusable solution. Warning: it contains a memory leak, see below:
public static T DeserializeXml<T>(this string @this, string innerStartTag = null)
{
using (var stringReader = new StringReader(@this))
using (var xmlReader = XmlReader.Create(stringReader)) {
if (innerStartTag != null) {
xmlReader.ReadToDescendant(innerStartTag);
var xmlSerializer = new XmlSerializer(typeof(T), new XmlRootAttribute(innerStartTag));
return (T)xmlSerializer.Deserialize(xmlReader.ReadSubtree());
}
return (T)new XmlSerializer(typeof(T)).Deserialize(xmlReader);
}
}
Update 20th March 2017: As the comment below points out, there is a memory leak when using some of the XmlSerializer constructors (such as the one taking an XmlRootAttribute), so I ended up using a caching solution as shown below:
/// <summary>
/// Deserialize XML string, optionally only an inner fragment of the XML, as specified by the innerStartTag parameter.
/// </summary>
public static T DeserializeXml<T>(this string @this, string innerStartTag = null) {
using (var stringReader = new StringReader(@this)) {
using (var xmlReader = XmlReader.Create(stringReader)) {
if (innerStartTag != null) {
xmlReader.ReadToDescendant(innerStartTag);
var xmlSerializer = CachingXmlSerializerFactory.Create(typeof (T), new XmlRootAttribute(innerStartTag));
return (T) xmlSerializer.Deserialize(xmlReader.ReadSubtree());
}
return (T) CachingXmlSerializerFactory.Create(typeof (T)).Deserialize(xmlReader); // no fragment requested: use the type's default root
}
}
}
/// <summary>
/// A caching factory to avoid memory leaks in the XmlSerializer class.
/// See http://dotnetcodebox.blogspot.dk/2013/01/xmlserializer-class-may-result-in.html
/// </summary>
public static class CachingXmlSerializerFactory {
private static readonly ConcurrentDictionary<string, XmlSerializer> Cache = new ConcurrentDictionary<string, XmlSerializer>();
public static XmlSerializer Create(Type type, XmlRootAttribute root) {
if (type == null) {
throw new ArgumentNullException(nameof(type));
}
if (root == null) {
throw new ArgumentNullException(nameof(root));
}
var key = string.Format(CultureInfo.InvariantCulture, "{0}:{1}", type, root.ElementName);
return Cache.GetOrAdd(key, _ => new XmlSerializer(type, root));
}
public static XmlSerializer Create<T>(XmlRootAttribute root) {
return Create(typeof (T), root);
}
public static XmlSerializer Create<T>() {
return Create(typeof (T));
}
public static XmlSerializer Create<T>(string defaultNamespace) {
return Create(typeof (T), defaultNamespace);
}
public static XmlSerializer Create(Type type) {
return new XmlSerializer(type);
}
public static XmlSerializer Create(Type type, string defaultNamespace) {
return new XmlSerializer(type, defaultNamespace);
}
}

You can control how your serialization is done by implementing the ISerializable interface in your class. Note this will also imply a constructor with the method signature (SerializationInfo info, StreamingContext context) and sure you can do what you are asking with that.
However, have a close look at whether you really need to do this with streaming. If you don't have to use the streaming mechanism, achieving the same thing with LINQ to XML will be easier and simpler to maintain in the long term (IMO).

I think the previous commenter is correct in his comment that XML might not be the best choice of a backing store here.
If you are having issues of scale and aren't taking advantage of some of the other niceties you get with XML, like transforms, you might be better off using a database for your data. The operations you are doing really seem to fit more into that model.
I know this doesn't really answer your question, but I thought I would highlight an alternate solution you might use. A good database and an appropriate OR mapper like .netTiers, NHibernate, or more recently LINQ to SQL / Entity Framework would probably get you back up and running with minimal changes to the rest of your codebase.

Typically XML deserialization is an all-or-nothing proposition out of the box, so you'll probably need to customize. If you don't do a full deserialization, you run the risk that the xml is malformed within the SalesPerson element, and so the document is invalid.
If you are willing to accept that risk, you'll probably want to do some basic text parsing to break out the SalesPerson elements into a different document using plain text processing facilities, then process the XML.
This is a good example of why XML is not always the correct answer.

Please try defining the SalesPerson property as type XmlElement. This works for output from ASMX web services, which use XML Serialization. I would think it would work on input as well. I would expect the entire <SalesPerson> element to wind up in the XmlElement.
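A rough sketch of this idea, using the classes from the question. Note that I have additionally marked the property with [XmlAnyElement("SalesPerson")], which is my own choice rather than part of the suggestion above, and CarsLoader is a made-up helper name. Also keep in mind that the SalesPerson subtree is still parsed into DOM nodes, so this avoids mapping it to typed objects but not the cost of reading it.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Serialization;

public class Car
{
    public string StockNumber { get; set; }
    public string Make { get; set; }
    public string Model { get; set; }
}

[XmlRoot("Cars")]
public class Cars
{
    // Each <Car> is a direct child of <Cars>.
    [XmlElement("Car")]
    public Car[] Car { get; set; }

    // Captures the whole <SalesPerson> element as an untyped DOM node.
    // The XmlAnyElement attribute is an assumption of this sketch.
    [XmlAnyElement("SalesPerson")]
    public XmlElement SalesPerson { get; set; }
}

// Hypothetical helper, for illustration only.
public static class CarsLoader
{
    public static Cars Load(string xml)
    {
        var serializer = new XmlSerializer(typeof(Cars));
        using (var reader = new StringReader(xml))
        {
            return (Cars)serializer.Deserialize(reader);
        }
    }
}
```

After CarsLoader.Load(xml), the raw markup is available through cars.SalesPerson.OuterXml whenever it is needed later.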

You can control which parts of the Cars class are deserialized by implementing the IXmlSerializable interface on the Cars class. Within the ReadXml(XmlReader) method you would read and deserialize the Car elements, but when you reach the SalesPerson element you would read its subtree as a string and then construct a Stream over the textual content using a StreamWriter.
If you never want the XmlSerializer to write out the SalesPerson element, use the [XmlIgnore] attribute. I am not sure what you want to happen when you serialize the Cars class to its XML representation. Are you trying only to prevent deserialization of the SalesPerson element while still being able to serialize the XML representation of the SalesPerson held by the Stream?
I could probably provide a code example of this if you want a concrete implementation.
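In case it helps, here is roughly what such an implementation could look like. It is only a sketch under a few assumptions of mine: the list property is called Items, the raw text is wrapped in a MemoryStream rather than written through a StreamWriter, and serialization (WriteXml) is left out entirely.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Schema;
using System.Xml.Serialization;

public class Car
{
    public string StockNumber { get; set; }
    public string Make { get; set; }
    public string Model { get; set; }
}

[XmlRoot("Cars")]
public class Cars : IXmlSerializable
{
    public List<Car> Items { get; } = new List<Car>();

    // <SalesPerson> markup as re-emitted by the reader (not guaranteed
    // to be byte-identical to the source), exposed as a UTF-8 stream.
    public Stream SalesPerson { get; private set; }

    public XmlSchema GetSchema() => null;

    public void ReadXml(XmlReader reader)
    {
        var carSerializer = new XmlSerializer(typeof(Car));

        if (reader.IsEmptyElement) { reader.Read(); return; }
        reader.ReadStartElement();                       // consume <Cars>

        while (reader.MoveToContent() == XmlNodeType.Element)
        {
            if (reader.LocalName == "Car")
            {
                using (var subtree = reader.ReadSubtree())
                    Items.Add((Car)carSerializer.Deserialize(subtree));
                reader.Read();                           // step past the Car element
            }
            else if (reader.LocalName == "SalesPerson")
            {
                // Returns the element as text and moves past </SalesPerson>.
                string raw = reader.ReadOuterXml();
                SalesPerson = new MemoryStream(Encoding.UTF8.GetBytes(raw));
            }
            else
            {
                reader.Skip();                           // ignore unknown elements
            }
        }
        reader.ReadEndElement();                         // consume </Cars>
    }

    public void WriteXml(XmlWriter writer)
    {
        throw new NotSupportedException("This sketch covers deserialization only.");
    }
}
```

With this in place a plain new XmlSerializer(typeof(Cars)).Deserialize(...) call fills Items and leaves the SalesPerson content untouched in the stream. Note that ReadOuterXml still materializes the whole element as one string, so for very large elements you may want to copy it to a temporary file instead.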

If all you want to do is parse out the SalesPerson element but keep it as a string, you should use Xsl Transform rather than "Deserialization". If, on the other hand, you want to parse out the SalesPerson element and only populate an object in memory from all the other non-SalesPerson elements, then Xsl Transform might also be the way to go. If the files are way big, you may consider separating them and using Xsl to combine different xml files so that the SalesPerson I/O only occurs when you need it to.

I would suggest reading the XML manually, using a lightweight API such as XmlReader, XPathDocument or LINQ to XML.
When you only have to read 3 properties, I suppose you can write code that reads them from that node by hand, which gives you full control over how it is executed instead of relying on serialization/deserialization.
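As an illustration of the manual route, here is a small LINQ to XML sketch for the question's document. ManualCarReader and ReadCars are hypothetical names. Be aware that XDocument loads the whole document into memory, so for really large files an XmlReader-based loop would be the lighter option.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public class Car
{
    public string StockNumber { get; set; }
    public string Make { get; set; }
    public string Model { get; set; }
}

// Hypothetical helper, for illustration only.
public static class ManualCarReader
{
    public static List<Car> ReadCars(string xml, out string rawSalesPerson)
    {
        XElement root = XDocument.Parse(xml).Root;

        // Keep the SalesPerson element as text; null when it is absent.
        rawSalesPerson = root.Element("SalesPerson")?.ToString(SaveOptions.DisableFormatting);

        // Map each <Car> by hand; the string cast yields null for missing children.
        return root.Elements("Car")
            .Select(c => new Car
            {
                StockNumber = (string)c.Element("StockNumber"),
                Make = (string)c.Element("Make"),
                Model = (string)c.Element("Model")
            })
            .ToList();
    }
}
```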

Related

Serialize an object to XML without using attributes

Is it possible to control XmlSerializer/DataContractSerializer behavior without specifying any serialization attributes?
To be more specific, I want to control the serialization (XmlIgnore, custom attribute name etc.), but without decorating my properties.
Use cases:
A large existing class, which I don't wish to pollute with serialization attributes
Serializing a class for which no source code is available
Switching from using XmlSerializer to DataContractSerializer to JSON without changing class code
For example, how would I serialize the following without uncommenting the attributes:
// [Serializable]
public class MyClass
{
// [XmlIgnore] - The following property should not be serialized, without uncommenting this line
public int DontSerializeMeEvenThoughImPublic { get; set; }
// [XmlAttribute("another_name")] - should be serialized as 'another_name', not 'SerializeMeAsXmlAttribute'
public double SerializeMeAsXmlAttribute { get; set; }
// [DataMember] - should be included
private string IWantToBeSerializedButDontDecorateMeWithDataMember { get; set; }
}
You can't (do it elegantly).
The only way to modify the way the XmlSerializer serializes classes is by using attributes (by the way SerializableAttribute is not required). The DataContractSerializer is even worse.
One possible solution is to create intermediate classes for XML serialization/deserialization and use AutoMapper to copy the data between the "real" class and the mediator.
I've used this approach to keep the front end XML serialization concerns outside of my business logic data types.
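To make the intermediate-class idea concrete, here is a minimal sketch. The MyClassXml and MyClassXmlWriter types are hypothetical, and the mapping is written by hand instead of with AutoMapper to keep the example free of external dependencies.

```csharp
using System.IO;
using System.Xml.Serialization;

// Business type, left without any serialization attributes.
public class MyClass
{
    public int DontSerializeMeEvenThoughImPublic { get; set; }
    public double SerializeMeAsXmlAttribute { get; set; }
}

// Hypothetical mediator that carries all the XML concerns.
[XmlRoot("MyClass")]
public class MyClassXml
{
    [XmlAttribute("another_name")]
    public double Value { get; set; }

    // Hand-written mapping; only the members that should appear in the XML are copied.
    public static MyClassXml From(MyClass source)
    {
        return new MyClassXml { Value = source.SerializeMeAsXmlAttribute };
    }
}

public static class MyClassXmlWriter
{
    public static string Serialize(MyClass source)
    {
        var serializer = new XmlSerializer(typeof(MyClassXml));
        using (var writer = new StringWriter())
        {
            serializer.Serialize(writer, MyClassXml.From(source));
            return writer.ToString();
        }
    }
}
```

The ignored property simply never makes it into the mediator, so nothing has to be marked with [XmlIgnore] on the real class.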
I know this is an old question, but for the XmlSerializer part, it's interesting that no one has suggested using attribute overrides.
This does not solve the private property, but AFAIK you can't do that with attributes either, so the only route there would be the IXmlSerializable interface.
But whatever you can do by adding attributes should be possible with overrides as well.
The following should work for the changes requested by the commented-out XML attributes:
public class Program
{
public static void Main()
{
XmlAttributeOverrides overrides = new XmlAttributeOverrides();
overrides.Add(typeof(MyClass), "DontSerializeMeEvenThoughImPublic", new XmlAttributes { XmlIgnore = true });
overrides.Add(typeof(MyClass), "SerializeMeAsXmlAttribute", new XmlAttributes { XmlAttribute = new XmlAttributeAttribute("another_name") });
XmlSerializer serializer = new XmlSerializer(typeof(MyClass), overrides);
using (var writer = new StringWriter())
{
serializer.Serialize(writer, new MyClass());
Console.WriteLine(writer.ToString());
}
}
}
Serialization via XmlSerializer should work without the attributes.

How to XMLSerialize member Object inside a container object

I've been searching on Google about this for an hour, but I think I'm not using the right words because I can't find a very simple example of what I'm trying to do. People always use complex structures like List or derived objects in their samples.
All I want to do is to XML-serialize my main object, called SuperFile, to a file. This SuperFile class contains 2 members, and these 2 members are not serialized, so the resulting XML file is empty (containing only the header).
Here is my code, what am I doing wrong?
SuperFile
public class SuperFile
{
private NetworkInfo _networkInfo;
private Planification _planification;
public NetworkInfo NI
{
get
{
return _networkInfo;
}
}
public Planification Planif
{
get
{
return _planification;
}
}
}
NetworkInfo and Planification are very ordinary classes with mostly double members, and they serialize perfectly on their own if I want. But now I want them to serialize inside the SuperFile object.
Finally, here is my code to do the serialization
public void Save(string strFilename)
{
System.Xml.Serialization.XmlSerializer x = new System.Xml.Serialization.XmlSerializer(typeof(SuperFile));
TextWriter WriteFileStream = new StreamWriter(strFilename);
x.Serialize(WriteFileStream, this);
WriteFileStream.Close();
}
If I put this inside SuperFile, it gets serialized, but the 2 other members get skipped. I think it gets serialized because it's not a complex type...
public int _nDummy;
Hope it's clear!
Thanks!
XmlSerializer has some limitations, one of which is that a property needs a public setter to be serialized. (It also doesn't serialise private fields or indexers.) It's not an obvious gotcha, and it has had me scratching my head in the past :)
Here's an answer with some details: Why isn't my public property serialized by the XmlSerializer?
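A minimal sketch of the fix, assuming the two member classes look something like the stand-ins below (the Latency and Duration members and the ToXml helper are made up for illustration). The only change that matters is that NI and Planif now have public setters.

```csharp
using System.IO;
using System.Xml.Serialization;

// Stand-ins for the real classes; their members are hypothetical.
public class NetworkInfo { public double Latency { get; set; } }
public class Planification { public double Duration { get; set; } }

public class SuperFile
{
    // Public getter AND setter, so XmlSerializer includes both members.
    public NetworkInfo NI { get; set; } = new NetworkInfo();
    public Planification Planif { get; set; } = new Planification();

    // Hypothetical helper that serializes to a string instead of a file.
    public string ToXml()
    {
        var serializer = new XmlSerializer(typeof(SuperFile));
        using (var writer = new StringWriter())
        {
            serializer.Serialize(writer, this);
            return writer.ToString();
        }
    }
}
```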

XmlSerializer doesn't pick up some arrays when deserializing

XmlSerializer has been pretty good so far, but it seems to be breaking on a situation which doesn't seem too complicated. Here is the structure of my XML (not the real stuff, but I think I've captured the basic structure):
<RootNode>
<SomeNodeNames>
<SomeNodeName>
<anelement>avalue</anelement>
</SomeNodeName>
</SomeNodeNames>
<TheseOnesDontWork>
<ThisOneDoesntWork>
<elementWhichDoesWork>8</elementWhichDoesWork>
<collection1>
<itemrow>
<text>text.......</text>
</itemrow>
<itemrow>
<text>more text........</text>
</itemrow>
</collection1>
</ThisOneDoesntWork>
</TheseOnesDontWork>
</RootNode>
So, I then have a number of classes that correspond to each of these elements.
public class RootNode
{
public SomeNodeName[] SomeNodeNames;
public ThisOneDoesntWork[] TheseOnesDontWork;
}
public class ThisOneDoesntWork
{
public int elementWhichDoesWork;
public itemrow[] collection1;
}
public class itemrow
{
public string text;
}
The XmlSerializer invocation is pretty straight-forward.
XmlSerializer serializer = new XmlSerializer(typeof(RootNode), attrOverrides);
FileStream stream = File.Open("filename.xml", FileMode.Open);
RootNode obj = (RootNode)serializer.Deserialize(stream);
So, this loads correctly, except for 'collection1' not being created at all. I put in an UnknownNode event handler, and sure enough, it comes up with a flag saying that itemrow is of unknown type. I'm not sure why this is. There are a couple of collections, and one of the collections contains an element which is itself a collection. Does this situation really require writing my own deserializer, or am I simply missing a simple fix?

C# XmlSerializer: keep the value, override the element label

I am currently using a LINQ query to read an XML file e.g.
<MyObjects>
<MyObject>
<MyElement>some_text</MyElement>
<MyOtherElement>some_more_text</MyOtherElement>
</MyObject>
</MyObjects>
into a list of custom objects containing custom HistoryString properties. HistoryString contains 2 strings, a currentValue and a previousValue.
This all works great except when using XmlSerializer to write the custom objects back to an XML file, the output fairly obviously contains additional tags i.e.
<MyObjects>
<MyObject>
<MyElement>
<currentValue>some_text</currentValue>
<previousValue>some_text</previousValue>
</MyElement>
<MyOtherElement>
<currentValue>some_more_text</currentValue>
<previousValue>some_more_text</previousValue>
</MyOtherElement>
</MyObject>
</MyObjects>
Q: What would be the neatest and/or most efficient way of reading and writing XML in the same format, based on this fundamental difference?
Some initial ideas:
1) Mark the previousValue property with [System.Xml.Serialization.XmlIgnore] then sweep through the XML string that is to be written removing all traces of <currentValue> and </currentValue>
2) Open the existing file and manually make any updates/deletes/additions - this is surely more long winded.
3) Any way of having a HistoryString automatically resolve to its currentValue rather than serialize each of its properties, similar to how ToString() works?
I have done some research into this, including the useful MSDN articles here and here but I can't see any other attributes that would solve this problem, I am still unsure whether this is possible. Any ideas?
Here is another idea. If you define your class like so:
[Serializable]
public class MyObject
{
[XmlElement(ElementName = "MyElement")]
public string CurrentValueElement
{
get
{
return Element.CurrentValue;
}
set
{
Element = new MyElement
{
CurrentValue = value, PreviousValue = value
};
}
}
[XmlElement(ElementName = "MyOtherElement")]
public string CurrentValueOtherElement
{
get
{
return OtherElement.CurrentValue;
}
set {}
}
[XmlIgnore]
public MyElement Element { get; set; }
[XmlIgnore]
public MyElement OtherElement { get; set; }
}
Then, when the object is serialized, the output XML will look exactly like your example.
Also, if you extend the CurrentValueElement/CurrentValueOtherElement setter like this:
[XmlElement(ElementName = "MyElement")]
public string CurrentValueElement
{
get
{
return Element.CurrentValue;
}
set
{
Element = new MyElement
{
CurrentValue = value, PreviousValue = value
};
}
}
Then you'll be able to use the XmlSerializer to deserialize your objects directly without needing to resort to LINQ.
Well, why not serialize back using the original schema, feeding it a list of objects projected from the history entries using only the current value?
e.g.
from h in HistoryEntryList
select new OriginalEntry{ field = h.field.current_value, ... };

Tweak output from XmlSerializer in C#

I am wondering if there is a way to get XmlSerializer in C# to change what it outputs for one property of one object. For example, if I have something like:
public class MyClass
{
public string prop1{get;}
public uint prop2{get;}
public MyClass2 class2{get;}
public MyClass3 class3{get;}
}
public class MyClass2
{
public string prop3{get;}
}
public class MyClass3
{
//Lots of properties that I want to serialize as normal
}
Now, somewhere in my code, I have something like this:
private void SerializeStuff(List<MyClass> list, string path)
{
XmlSerializer serialize = new XmlSerializer(typeof(List<MyClass>));
TextWriter writer = new StreamWriter(path);
serialize.Serialize(writer, list);
writer.Close();
}
What I want is for the serialization to work as normal, but with prop3 replaced with some other stuff. Example:
<MyClass>
<prop1>whatever</prop1>
<prop2>345</prop2>
<class2>
<somethingNotProp3>whatever</somethingNotProp3>
<somethingElseNotProp3>whatever</somethingElseNotProp3>
</class2>
<class3>
...
</class3>
</MyClass>
Is there a way to customize XmlSerializer so I don't have to write the entire Xml file manually, or is there no way to do that?
Edit:
I am pretty sure that the solution could have something to do with implementing ISerializable's GetObjectData method; however, I am not exactly sure how to implement it. I tried making MyClass2 inherit from ISerializable and implementing GetObjectData, but nothing changed. prop3 was still output in the XML file.
Use the attributes in the System.Xml.Serialization namespace to affect the way that the instance is serialized.
If you need very specialized serialization for MyClass3 that the attributes cannot handle, then implement the IXmlSerializable interface which will give you complete control over the serialization process (you can even decide on an instance-by-instance basis how to serialize the content/property).
Based on the comments, it would seem that you want to implement IXmlSerializable, as you want to change the name of the property and the value; the attributes will let you change the name of the property/element/attribute, but not allow you to perform transformations on the value (and I assume you don't want to corrupt your representation and add an extra property/value for this purpose).
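As a rough sketch of what that could look like for MyClass2: the element names come from the desired output in the question, while the upper-casing of the second value is just a placeholder transformation I invented. I have also given prop3 a setter so the type can be read back.

```csharp
using System.Xml;
using System.Xml.Schema;
using System.Xml.Serialization;

public class MyClass2 : IXmlSerializable
{
    public string prop3 { get; set; }

    public XmlSchema GetSchema() => null;

    public void WriteXml(XmlWriter writer)
    {
        // The serializer has already written the wrapping element;
        // only the content is emitted here, in place of <prop3>.
        writer.WriteElementString("somethingNotProp3", prop3);
        // Placeholder transformation, purely for illustration.
        writer.WriteElementString("somethingElseNotProp3", (prop3 ?? "").ToUpperInvariant());
    }

    public void ReadXml(XmlReader reader)
    {
        if (reader.IsEmptyElement) { reader.Read(); return; }
        reader.ReadStartElement();                    // consume the wrapping element
        while (reader.MoveToContent() == XmlNodeType.Element)
        {
            if (reader.LocalName == "somethingNotProp3")
                prop3 = reader.ReadElementContentAsString();
            else
                reader.Skip();                        // derived values are not read back
        }
        reader.ReadEndElement();
    }
}
```

MyClass and MyClass3 stay as they are; XmlSerializer calls WriteXml whenever it meets a MyClass2 instance, so only that one part of the output changes.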
