Flattening an XML document - c#

I am currently trying to flatten a deeply nested XML document in C# so that the value of every element is converted to an attribute.
The XML structure is as follows:
<members>
<member xmlns="mynamespace" id="1" status="1">
<sensitiveData>
<notes/>
<url>someurl</url>
<altUrl/>
<date1>somedate</date1>
<date2>someotherdate</date2>
<description>some description</description>
<tags/>
<category>some category</category>
</sensitiveData>
<contacts>
<contact contactId="1">
<contactPerson>some contact person</contactPerson>
<phone/>
<mobile>mobile number</mobile>
<email>some@email.com</email>
</contact>
</contacts>
</member>
</members>
What I want it to look like is this:
<members>
<member xmlns="mynamespace" id="1" status="1" notes="" url="someurl" altUrl="" date1="somedate" date2="someotherdate" description="some description" tags="" category="some category" contactId="1" contactPerson="some contact person" phone="" mobile="mobile number" email="some@email.com" />
</members>
I could just parse away on the element names and their attributes, but since this XML comes from a webservice that I can't control, I have to create some sort of dynamic parser to flatten this as the structure can change at some point.
It is worth noting that the XML arrives from the webservice as an XElement.
Has anyone done this before and would be willing to share how? :-) It would be greatly appreciated!
Thanks a lot in advance.
All the best,
Bo

Try this:
var doc = XDocument.Parse(@"<members>...</members>");
var result = new XDocument(
new XElement(doc.Root.Name,
from x in doc.Root.Elements()
select new XElement(x.Name,
from y in x.Descendants()
where !y.HasElements
select new XAttribute(y.Name.LocalName, y.Value))));
Result:
<members>
<member notes="" url="someurl" altUrl="" date1="somedate" date2="someotherdate" description="some description" tags="" category="some category" contactPerson="some contact person" phone="" mobile="mobile number" email="some@email.com" xmlns="mynamespace" />
</members>

You could use this XSLT 1.0 stylesheet. You might want to modify how it handles multiple <contact> elements.
Input XML
<members>
<member xmlns="mynamespace" id="1" status="1">
<sensitiveData>
<notes/>
<url>someurl</url>
<altUrl/>
<date1>somedate</date1>
<date2>someotherdate</date2>
<description>some description</description>
<tags/>
<category>some category</category>
</sensitiveData>
<contacts>
<contact contactId="1">
<contactPerson>some contact person</contactPerson>
<phone/>
<mobile>mobile number</mobile>
<email>some@email.com</email>
</contact>
<contact contactId="2">
<contactPerson>second contact person</contactPerson>
<phone/>
<mobile>second mobile number</mobile>
<email>second some@email.com</email>
</contact>
</contacts>
</member>
</members>
XSLT 1.0
<xsl:stylesheet version="1.0" xmlns:my="mynamespace" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="node()|@*">
<xsl:apply-templates select="node()|@*"/>
</xsl:template>
<xsl:template match="members|my:member">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
<xsl:template match="node()[text()][ancestor::my:member]|@*[ancestor::my:member]">
<xsl:variable name="vContact">
<xsl:if test="ancestor-or-self::my:contact">
<xsl:value-of select="count(ancestor-or-self::my:contact/preceding-sibling::my:contact) + 1"/>
</xsl:if>
</xsl:variable>
<xsl:attribute name="{name()}{$vContact}">
<xsl:value-of select="."/>
</xsl:attribute>
</xsl:template>
</xsl:stylesheet>
XML Output
<members>
<member xmlns="mynamespace" id="1" status="1" url="someurl" date1="somedate"
date2="someotherdate"
description="some description"
category="some category"
contactId1="1"
contactPerson1="some contact person"
mobile1="mobile number"
email1="some@email.com"
contactId2="2"
contactPerson2="second contact person"
mobile2="second mobile number"
email2="second some@email.com"/>
</members>

I think dtb's answer is the best way to do it. However, there is one important issue to note: as soon as you add a second contact, dtb's code throws, because a member can have more than one contact but an element cannot have duplicate attribute names. To work around that, I updated the code to select only distinct attributes by implementing IEqualityComparer<XAttribute>. Be aware that this keeps only the first value for each attribute name; the values from any later contacts are dropped.
The updated LINQ expression looks like this:
var result = new XDocument(
    new XElement(doc.Root.Name,
        from x in doc.Root.Elements()
        select new XElement(x.Name,
            (from y in x.Descendants()
             where !y.HasElements
             select new XAttribute(y.Name.LocalName, y.Value))
            .Distinct(new XAttributeEqualityComparer())
        )));
As you can see, a Distinct call was added that takes a custom equality comparer (XAttributeEqualityComparer):
class XAttributeEqualityComparer : IEqualityComparer<XAttribute>
{
public bool Equals(XAttribute x, XAttribute y)
{
return x.Name == y.Name;
}
public int GetHashCode(XAttribute obj)
{
return obj.Name.GetHashCode();
}
}
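If dropping the later contacts is not acceptable, one alternative is to suffix repeated names with a counter instead of discarding them, and to carry over the attributes that already exist on the member and its descendants (id, status, contactId). The following is only a sketch under those assumptions; XmlFlattener, Flatten and Unique are made-up names, and the suffix scheme (contactId, contactId2, ...) is an arbitrary choice.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class XmlFlattener
{
    // Returns a new element with the same name as 'source' whose attributes
    // are the source's own attributes plus one attribute per descendant
    // attribute and per leaf descendant element.
    public static XElement Flatten(XElement source)
    {
        var target = new XElement(source.Name);
        var used = new HashSet<string>();

        // Keep the element's own attributes (including xmlns declarations) as they are.
        foreach (var attribute in source.Attributes())
        {
            target.Add(new XAttribute(attribute));
            used.Add(attribute.Name.LocalName);
        }

        foreach (var descendant in source.Descendants())
        {
            // Attributes on nested elements, e.g. contactId on <contact>.
            foreach (var attribute in descendant.Attributes().Where(a => !a.IsNamespaceDeclaration))
                target.Add(new XAttribute(Unique(attribute.Name.LocalName, used), attribute.Value));

            // Leaf elements become attributes; empty leaves become "".
            if (!descendant.HasElements)
                target.Add(new XAttribute(Unique(descendant.Name.LocalName, used), descendant.Value));
        }

        return target;
    }

    // Appends 2, 3, ... to a name until it has not been used yet.
    private static string Unique(string name, HashSet<string> used)
    {
        var candidate = name;
        for (var i = 2; !used.Add(candidate); i++)
            candidate = name + i;
        return candidate;
    }
}
```

It would be called once per member, for example new XElement(doc.Root.Name, doc.Root.Elements().Select(XmlFlattener.Flatten)).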

You could write an XSLT transform to convert the elements to attributes.

Are you doing this to create another XML document, or is it just to make your processing simpler? If the former, you just have to put each value in a map whenever you come across a leaf node. You can then iterate over the key-value pairs in the map to reconstruct an XML tag that has only attributes.

Related

XSL Using Params in a contains function

Is it possible to use params as the data for a contains function?
I have a C# file that passes information to an XSL sheet in the form of params, in order to build an HTML page that prints out the data. If I hard-code the information it works, but if I use params instead it returns nothing. Yet if I print the params out using a text tag they show up, so I know the values being passed in should be correct.
<xsl:param name="type"/>
<xsl:param name="filter"/>
<xsl:for-each select="london-schools/school [contains($type, '$filter')]">
That is what I am trying to do, and it just returns the table headings instead of the information.
Thanks, Brandon.
Perhaps you meant:
<xsl:for-each select="london-schools/school [contains(type, $filter)]">
It's hard to tell for sure without seeing your input and the expected output - but certainly, if type is the name of a node, then it should not be prefixed by $, and if $filter is a parameter, then it should not be quoted.
Note also that XML is case-sensitive; you mention both type and Type - they are not the same.
Added:
I'm really guessing here, but consider the following:
XML
<records>
<record>
<name>Alpha</name>
<type>Bravo</type>
</record>
<record>
<name>Bravo</name>
<type>Bravo</type>
</record>
<record>
<name>Charlie</name>
<type>Alpha</type>
</record>
<record>
<name>Delta</name>
<type>Alpha</type>
</record>
</records>
XSLT 1.0
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:param name="property"/>
<xsl:param name="value"/>
<xsl:template match="/records">
<xsl:copy>
<xsl:for-each select="record[contains(*[name()=$property], $value)]">
<xsl:copy-of select="."/>
</xsl:for-each>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
When the above stylesheet is applied to the input with parameters:
$property = "name"
$value = "Bravo"
the result will be:
<?xml version="1.0" encoding="UTF-8"?>
<records>
<record>
<name>Bravo</name>
<type>Bravo</type>
</record>
</records>
When the parameters are:
$property = "type"
$value = "Bravo"
the result will be:
<?xml version="1.0" encoding="UTF-8"?>
<records>
<record>
<name>Alpha</name>
<type>Bravo</type>
</record>
<record>
<name>Bravo</name>
<type>Bravo</type>
</record>
</records>
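On the C# side, parameters like these are typically supplied through XsltArgumentList. The following is a minimal sketch, assuming the stylesheet and the input are available as strings; ParameterizedTransform and Run are made-up names, and the parameter names match the example stylesheet above rather than the one in the question.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class ParameterizedTransform
{
    // Applies 'stylesheet' to 'xml', passing two top-level xsl:param values.
    public static string Run(string xml, string stylesheet, string property, string value)
    {
        var xslt = new XslCompiledTransform();
        using (var styleReader = XmlReader.Create(new StringReader(stylesheet)))
            xslt.Load(styleReader);

        // The names must match the <xsl:param name="..."/> declarations;
        // the empty string is the parameter's namespace URI.
        var arguments = new XsltArgumentList();
        arguments.AddParam("property", "", property);
        arguments.AddParam("value", "", value);

        using (var input = XmlReader.Create(new StringReader(xml)))
        using (var output = new StringWriter())
        {
            xslt.Transform(input, arguments, output);
            return output.ToString();
        }
    }
}
```

Inside the stylesheet the values are then referenced as $property and $value, without quotes.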

Convert from XML to CSV

I have been searching for the solution to convert XML into CSV, but I cannot find one which matches my case as XML structure is different
XML structure looks like
<VWSRecipeFile>
<EX_Extrusion User="ABC" Version="1.0" Description="" LastChange="41914.7876341204">
<Values>
<C22O01_A_TempFZ1_Set Item="A_TempFZ1_Set" Type="4" Hex="42700000" Value="60"/>
<C13O02_A_TempHZ2_Set Item="A_TempHZ2_Set" Type="4" Hex="43430000" Value="195"/>
<C13O03_A_TempHZ3_Set Item="A_TempHZ3_Set" Type="4" Hex="43430000" Value="195"/>
</Values>
</EX_Extrusion>
</VWSRecipeFile>
Expected CSV Format
A_TempFZ1_Set,A_TempHZ2_Set,A_TempHZ3_Set
60,195,195
I can produce the expected CSV format with the stylesheet below, but I don't know whether it is the best way to do it. Any suggestions are appreciated.
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text" indent="no"/>
<xsl:template match="/VWSRecipeFile">
<xsl:for-each select="EX_Extrusion/Values/*">
<xsl:value-of select="concat(@Item,',')" />
</xsl:for-each>
<xsl:text>
</xsl:text>
<xsl:for-each select="EX_Extrusion/Values/*">
<xsl:value-of select="concat(@Value,',')" />
</xsl:for-each>
<xsl:text>
</xsl:text>
</xsl:template>
</xsl:stylesheet>
Thanks
One way to do this is to use XSLT, a language designed for working with XML. You can certainly parse the XML with C#, but I prefer XSLT because it's cleaner.
You define an external XSLT file, then call it within C# to do the transform.
Edit: added new columns based on new requirements.
File C:\XmlToCSV.xslt (&#xA; is the newline character)
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text" indent="no"/>
<xsl:template match="/VWSRecipeFile">
<xsl:variable name="User" select="EX_Extrusion/@User"/>
<xsl:variable name="Version" select="EX_Extrusion/@Version"/>
<xsl:variable name="Description" select="EX_Extrusion/@Description"/>
<xsl:variable name="LastChange" select="EX_Extrusion/@LastChange"/>
<xsl:text>Item,Type,Hex,Value,User,Version,Description,LastChange&#xA;</xsl:text>
<xsl:for-each select="EX_Extrusion/Values/*">
<xsl:value-of select="concat(@Item,',',@Type,',',@Hex,',',@Value,',',$User,',',$Version,',',$Description,',',$LastChange,'&#xA;')"/>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
Apply the transform with XslCompiledTransform:
XslCompiledTransform xslt = new XslCompiledTransform();
xslt.Load("C:\\XmlToCSV.xslt");
xslt.Transform("InputFile.xml", "OutputFile.csv");
Adjust it based on your needs.
The basic idea is to iterate through the child nodes of Values, select the attributes you want from each node, and write them to a file separated by commas. Simply give the file a .csv extension. If you want something ready made, check this out.
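That idea can be sketched with LINQ to XML, producing the two-line layout from the question (one header line of Item names, one line of Values). RecipeCsv and ToCsv are made-up names, and the sketch assumes the values contain no commas, quotes or line breaks, so it does no CSV escaping.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class RecipeCsv
{
    // Builds "Item1,Item2,...<newline>Value1,Value2,..." from every child of <Values>.
    public static string ToCsv(string xml)
    {
        var values = XDocument.Parse(xml)
            .Descendants("Values")
            .Elements()
            .ToList();

        var header = string.Join(",", values.Select(v => (string)v.Attribute("Item")));
        var row = string.Join(",", values.Select(v => (string)v.Attribute("Value")));

        return header + Environment.NewLine + row;
    }
}
```

The returned string can be written out with File.WriteAllText to a path ending in .csv.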
XSLT is one way to do it. Alternatively, you can use Cinchoo ETL, an open source library that can parse the XML and produce CSV the way you want it.
string xml = @"<VWSRecipeFile>
<EX_Extrusion User=""ABC"" Version=""1.0"" Description="""" LastChange=""41914.7876341204"">
<Values>
<C22O01_A_TempFZ1_Set Item=""A_TempFZ1_Set"" Type=""4"" Hex=""42700000"" Value=""60""/>
<C13O02_A_TempHZ2_Set Item=""A_TempHZ2_Set"" Type=""4"" Hex=""43430000"" Value=""195""/>
<C13O03_A_TempHZ3_Set Item=""A_TempHZ3_Set"" Type=""4"" Hex=""43430000"" Value=""196""/>
</Values>
</EX_Extrusion>
</VWSRecipeFile>";
StringBuilder sb = new StringBuilder();
using (var p = ChoXmlReader.LoadText(xml).WithXPath("/Values/*"))
{
using (var w = new ChoCSVWriter(sb)
.WithFirstLineHeader()
)
w.Write(p.ToDictionary(r => r.Item, r => r.Value).ToDynamic());
}
Console.WriteLine(sb.ToString());
Output:
A_TempFZ1_Set,A_TempHZ2_Set,A_TempHZ3_Set
60,195,196
Disclaimer: I'm the author of this library.

Create a new XMLDocument by filtering an existing document in c# using xpath

I receive an XML document from an external company and need to filter it to remove all the data I am not interested in.
The file is about 500 KB, but it will be requested very often.
Let's say the file looks like this:
<dvdlist>
<dvd>
<title>title 1</title>
<director>directory 2</director>
<price>1</price>
<location>
<city>denver</city>
</location>
</dvd>
<dvd>
<title>title 2</title>
<director>directory 2</director>
<price>2</price>
<location>
<city>london</city>
</location>
</dvd>
<dvd>
<title>title 3</title>
<director>directory 3</director>
<price>3</price>
<location>
<city>london</city>
</location>
</dvd>
</dvdlist>
What I need is simply to filter the document on city = london, so that I end up with this new XML document:
<dvdlist>
<dvd>
<title>title 2</title>
<director>directory 2</director>
<price>2</price>
<location>
<city>london</city>
</location>
</dvd>
<dvd>
<title>title 3</title>
<director>directory 3</director>
<price>3</price>
<location>
<city>london</city>
</location>
</dvd>
</dvdlist>
I have tried the following
XmlDocument doc = new XmlDocument();
doc.Load(@"C:\Development\Website\dvds.xml");
XmlNode node = doc.SelectSingleNode("dvdlist/dvd/location/city[text()='london']");
Any help or links would be appreciated.
Thanks
XPath is a selection expression language -- it never modifies the XML document(s) it operates on.
Therefore, in order to obtain the desired new XML document, you need to either use XML DOM (not recommended) or apply an XSLT transformation to the XML document. The latter is the recommended way to go, since XSLT is a language especially designed for tree transformations.
In .NET one can use the XslCompiledTransform class and its Transform() method. Read more about these in the relevant MSDN documentation.
The XSLT transformation itself is extremely simple:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
<xsl:template match="dvd[not(location/city='london')]"/>
</xsl:stylesheet>
Here, you can find a complete code example how to obtain the result of the transformation as an XmlDocument (or if desired, as an XDocument).
Here's an example using LINQ to XML.
//load the document
var document = XDocument.Load(@"C:\Development\Website\dvds.xml");
//get all dvd nodes
var dvds = document.Descendants().Where(node => node.Name == "dvd");
//get all dvd nodes that have a city node with a value of "london"
var londonDVDs = dvds.Where(dvd => dvd.Descendants().Any(child => child.Name == "city" && child.Value == "london"));
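The query above only selects the matching nodes; it does not yet produce the new document the question asks for. One way to get there, sketched under the assumption that every dvd sits directly under the root and keeps its city in location/city, is to copy the document and remove the dvd elements that do not match. DvdFilter and FilterByCity are made-up names.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class DvdFilter
{
    // Returns a copy of 'source' that keeps only the <dvd> elements
    // whose location/city equals 'city'. The source is left untouched.
    public static XDocument FilterByCity(XDocument source, string city)
    {
        var result = new XDocument(source); // deep copy

        result.Root.Elements("dvd")
            .Where(dvd => !dvd.Elements("location")
                              .Elements("city")
                              .Any(c => c.Value == city))
            .Remove();

        return result;
    }
}
```

For example, DvdFilter.FilterByCity(document, "london").Save(path) would write the filtered file.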

Optimal Method to Minify an XML in C# 3.0

Coding Platform: ASP.NET C#
I have an XML like this.
<Items>
<Map id="35">
<Terrains>
<Item id="1" row="0" column="0"/>
<Item id="1" row="0" column="1"/>
<Item id="1" row="0" column="2"/>
<Item id="1" row="0" column="3"/>
<Item id="1" row="0" column="4"/>
</Terrains>
</Map>
</Items>
I would like to minify this to
<Its>
<Map id="30">
<Te>
<It id="1" r="0" c="0"/>
<It id="1" r="0" c="1"/>
<It id="1" r="0" c="2"/>
<It id="1" r="0" c="3"/>
<It id="1" r="0" c="4"/>
</Te>
</Map>
</Its>
Then I am converting this to JSON using James Newton-King's JSON Converter.
The idea is to minify the xml data to the maximum as it contains tens of thousands of lines.
My questions are
What is the optimal method to minify the xml as mentioned above?
Right now it is done in three steps: XML -> minify XML -> convert to JSON. Can I do it in two steps (minify the XML while converting to JSON)?
Is James Newton-King's JSON converter a bit overkill for this simple conversion?
Please provide code snippets also if possible.
I suspect GZIP (via GZipStream, or simply via IIS, noting that you need to enable dynamic compression for the JSON MIME type) would be both simpler and smaller, but if you are using serialization, simply adding some [XmlElement(...)] / [XmlAttribute(...)] attributes should do it. Of course, if size is your concern, can I also suggest something like protobuf-net, which gives an extremely dense binary output.
If you aren't using serialization, then this looks like an ideal fit for some XSLT:
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes"/>
<xsl:template match="@* | node()">
<xsl:copy><xsl:apply-templates select="@* | node()"/></xsl:copy>
</xsl:template>
<xsl:template match="/Items">
<Its><xsl:apply-templates/></Its>
</xsl:template>
<xsl:template match="/Items/Map/Terrains">
<Te><xsl:apply-templates/></Te>
</xsl:template>
<xsl:template match="/Items/Map/Terrains/Item">
<It id="{@id}" r="{@row}" c="{@column}"><xsl:apply-templates select="*"/></It>
</xsl:template>
</xsl:stylesheet>
(with C#:)
XslCompiledTransform xslt = new XslCompiledTransform();
xslt.Load("Condense.xslt"); // cache and re-use this object; don't Load each time
xslt.Transform("Data.xml", "Smaller.xml");
Console.WriteLine("{0} vs {1}",
new FileInfo("Data.xml").Length,
new FileInfo("Smaller.xml").Length);
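If you would rather stay in C# than maintain a stylesheet, the same renaming can be sketched with LINQ to XML and two lookup tables. This is only an illustration: XmlShortener and Shorten are made-up names, the short names are whatever you put in the dictionaries, and any name missing from them is left unchanged.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class XmlShortener
{
    // Renames elements and attributes in place according to the two maps.
    public static void Shorten(XElement root,
        IDictionary<string, string> elementNames,
        IDictionary<string, string> attributeNames)
    {
        foreach (var element in root.DescendantsAndSelf().ToList())
        {
            string shortName;
            if (elementNames.TryGetValue(element.Name.LocalName, out shortName))
                element.Name = element.Name.Namespace + shortName;

            // XAttribute.Name is read-only, so the attributes are rebuilt.
            var rebuilt = element.Attributes()
                .Select(a => Rename(a, attributeNames))
                .ToList();
            element.ReplaceAttributes(rebuilt);
        }
    }

    private static XAttribute Rename(XAttribute attribute, IDictionary<string, string> names)
    {
        string shortName;
        if (!attribute.IsNamespaceDeclaration &&
            names.TryGetValue(attribute.Name.LocalName, out shortName))
            return new XAttribute(attribute.Name.Namespace + shortName, attribute.Value);

        return new XAttribute(attribute); // unchanged copy
    }
}
```

For the sample above, the element map would contain Items -> Its, Terrains -> Te and Item -> It, and the attribute map row -> r and column -> c.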

Comparing 2 XML docs and applying the changes to source document

Here's my problem. I have two XML files with identical structure, but the second one contains only a few of the nodes found in the first.
File1
<root>
<alpha>111</alpha>
<beta>22</beta>
<gamma></gamma>
<delta></delta>
</root>
File2
<root>
<beta>XX</beta>
<delta>XX</delta>
</root>
This is what the result should look like:
<root>
<alpha>111</alpha>
<beta>22</beta>
<gamma></gamma>
<delta>XX</delta>
</root>
Basically, if the content of a node in File1 is blank, its value should be read from File2 (if it exists there, that is).
I tried my luck with the Microsoft XmlDiff API, but it didn't work out for me (the patch process didn't apply the changes to the source document). I'm also a bit worried about the DOM approach it uses, because of the size of the XML I'll be dealing with.
Can you please suggest a good way of doing this?
I'm using C# 2.
Here is a slightly simpler and more efficient solution than the one proposed by Alastair (see my comment on his solution).
This transformation:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform" >
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:variable name="vFile2"
select="document('File2.xml')"/>
<xsl:template match="*">
<xsl:copy>
<xsl:copy-of select="@*"/>
<xsl:apply-templates/>
</xsl:copy>
</xsl:template>
<xsl:template match="*[not(text())]">
<xsl:copy>
<xsl:copy-of
select="$vFile2/*/*[name() = name(current())]/text()"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
when applied on this XML document:
<root>
<alpha>111</alpha>
<beta>22</beta>
<gamma></gamma>
<delta></delta>
</root>
produces the wanted result:
<root>
<alpha>111</alpha>
<beta>22</beta>
<gamma></gamma>
<delta>XX</delta>
</root>
In XSLT you can use the document() function to retrieve nodes from File2 if you encounter an empty node in File1. Something like:
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:template match="root/*[.='']">
<xsl:variable name="file2node">
<xsl:copy-of select="document('File2.xml')/root/*[name()=name(current())]"/>
</xsl:variable>
<xsl:choose>
<xsl:when test="$file2node != ''">
<xsl:copy-of select="$file2node"/>
</xsl:when>
<xsl:otherwise>
<xsl:copy/>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
<xsl:template match="*">
<xsl:copy>
<xsl:copy-of select="@*"/>
<xsl:apply-templates/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
This merge seems very specific.
If that is the case, just write some code to load both xml files and apply the changes as you described.
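A sketch of what such code might look like with LINQ to XML follows. Note the assumptions: it needs C# 3 or later rather than the C# 2 mentioned in the question, it loads both documents into memory, and it only looks at direct children of the root. XmlMerge and FillBlanks are made-up names.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class XmlMerge
{
    // Returns a copy of 'primary' in which every blank child of the root
    // takes its value from the same-named child of 'fallback', if present.
    public static XDocument FillBlanks(XDocument primary, XDocument fallback)
    {
        var result = new XDocument(primary); // leave the input untouched

        var blanks = result.Root.Elements()
            .Where(e => !e.HasElements && e.Value == "")
            .ToList();

        foreach (var blank in blanks)
        {
            var replacement = fallback.Root.Element(blank.Name);
            if (replacement != null)
                blank.Value = replacement.Value;
        }

        return result;
    }
}
```

With the two sample files, delta becomes XX, gamma stays empty because File2 has no gamma, and beta keeps 22 because it was not blank.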
