Parsing XML using C# [duplicate] - c#

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
Amazon Marketplace XML parsing
I am new to parsing XML in C# and I have some data from Amazon MWS library that is displayed below. I need to parse out various ItemAttributes such as ItemDimensions. I am use to JSON responses so I am not sure how to apply this to XML. Would it be possible from someone to point me in the right direction? I have Googled XML Parsing with C# but not valuable results were found to help me complete my tasks.
<GetMatchingProductResponse xmlns="http://mws.amazonservices.com/schema/Products/2011-10-01">
<GetMatchingProductResult ASIN="1430225491" status="Success">
<Product>
<Identifiers>
<MarketplaceASIN>
<MarketplaceId>ATVPDKIKX0DER</MarketplaceId>
<ASIN>1430225491</ASIN>
</MarketplaceASIN>
</Identifiers>
<AttributeSets>
<ns2:ItemAttributes xml:lang="en-US">
<ns2:Author>Troelsen, Andrew</ns2:Author>
<ns2:Binding>Paperback</ns2:Binding>
<ns2:Brand>Apress</ns2:Brand>
<ns2:Edition>5</ns2:Edition>
<ns2:ItemDimensions>
<ns2:Height Units="inches">9.21</ns2:Height>
<ns2:Length Units="inches">7.48</ns2:Length>
<ns2:Width Units="inches">2.52</ns2:Width>
<ns2:Weight Units="pounds">5.80</ns2:Weight>
</ns2:ItemDimensions>
<ns2:IsAutographed>false</ns2:IsAutographed>
<ns2:IsEligibleForTradeIn>true</ns2:IsEligibleForTradeIn>
<ns2:IsMemorabilia>false</ns2:IsMemorabilia>
<ns2:Label>Apress</ns2:Label>
<ns2:Languages>
<ns2:Language>
<ns2:Name>english</ns2:Name>
<ns2:Type>Unknown</ns2:Type>
</ns2:Language>
<ns2:Language>
<ns2:Name>english</ns2:Name>
<ns2:Type>Original Language</ns2:Type>
</ns2:Language>
<ns2:Language>
<ns2:Name>english</ns2:Name>
<ns2:Type>Published</ns2:Type>
</ns2:Language>
</ns2:Languages>
<ns2:ListPrice>
<ns2:Amount>59.99</ns2:Amount>
<ns2:CurrencyCode>USD</ns2:CurrencyCode>
</ns2:ListPrice>
<ns2:Manufacturer>Apress</ns2:Manufacturer>
<ns2:NumberOfItems>1</ns2:NumberOfItems>
<ns2:NumberOfPages>1752</ns2:NumberOfPages>
<ns2:PackageDimensions>
<ns2:Height Units="inches">2.60</ns2:Height>
<ns2:Length Units="inches">9.20</ns2:Length>
<ns2:Width Units="inches">7.50</ns2:Width>
<ns2:Weight Units="pounds">5.80</ns2:Weight>
</ns2:PackageDimensions>
<ns2:PartNumber>9781430225492</ns2:PartNumber>
<ns2:ProductGroup>Book</ns2:ProductGroup>
<ns2:ProductTypeName>ABIS_BOOK</ns2:ProductTypeName>
<ns2:PublicationDate>2010-05-14</ns2:PublicationDate>
<ns2:Publisher>Apress</ns2:Publisher>
<ns2:SmallImage>
<ns2:URL>http://ecx.images-amazon.com/images/I/51h9Sju5NKL._SL75_.jpg</ns2:URL>
<ns2:Height Units="pixels">75</ns2:Height>
<ns2:Width Units="pixels">61</ns2:Width>
</ns2:SmallImage>
<ns2:Studio>Apress</ns2:Studio>
<ns2:Title>Pro C# 2010 and the .NET 4 Platform</ns2:Title>
</ns2:ItemAttributes>
</AttributeSets>
<Relationships/>
<SalesRankings>
<SalesRank>
<ProductCategoryId>book_display_on_website</ProductCategoryId>
<Rank>43011</Rank>
</SalesRank>
<SalesRank>
<ProductCategoryId>697342</ProductCategoryId>
<Rank>36</Rank>
</SalesRank>
<SalesRank>
<ProductCategoryId>3967</ProductCategoryId>
<Rank>53</Rank>
</SalesRank>
<SalesRank>
<ProductCategoryId>4013</ProductCategoryId>
<Rank>83</Rank>
</SalesRank>
</SalesRankings>
</Product>
</GetMatchingProductResult>
<ResponseMetadata>
<RequestId>440cdde0-fa76-4c48-bdd1-d51a3b467823</RequestId>
</ResponseMetadata>
</GetMatchingProductResponse>

I find "Linq To Xml" easier to use
var xDoc = XDocument.Parse(xml); //or XDocument.Load(filename);
XNamespace ns = "http://mws.amazonservices.com/schema/Products/2011-10-01";
var items = xDoc.Descendants(ns + "ItemAttributes")
.Select(x => new
{
Author = x.Element(ns + "Author").Value,
Brand = x.Element(ns + "Brand").Value,
Dimesions = x.Element(ns+"ItemDimensions").Descendants()
.Select(dim=>new{
Type = dim.Name.LocalName,
Unit = dim.Attribute("Units").Value,
Value = dim.Value
})
.ToList()
})
.ToList();

You could reinvent the wheel, or you could use Amazon's wheel (see #George Duckett's answer for the direct link):
Amazon Marketplace API
One option to address your question: if you want a tool that will enable you to work with your xml file, I would look at xsd.exe. MSDN for xsd.exe
This tool is able to generate classes from xml.
Otherwise, you can create a parser from the XDocument class that will allow you to use linq to build a parser such as #L.B noted in his post.

You have not made clear exactly what you need from the XML, so I cannot give you an objective answer. I'll begin by stating that there are many different ways to parse XML using .Net (and C# in your case, albeit they are similar with VB and C#).
The first one that I would look into is actually modeling your XML Data into .Net objects, more specifically, POCOs. To that class model you could add attributes that would bind or relate them to the XML and then all you'd need to do is pass the data and the class to a XML deserializer.
Now, if you don't need to retrieve the whole object, you can either use XDocument or XmlDocument. The fun part of XDocument is that its syntax in LINQ friendly, so you can parse you XML very simply.
XmlDocument is more old-style sequential method invocation, but achieves the same thing.
Let me illustrate. Consider a simpler XML, for simplicity sake's:
<body>
<h1>This is a text.</h1>
<p class="SomeClass">This is a paragraph</p>
</body>
(see what I did there? That HTML is a valid XML!)
I. Using A Deserializer:
First you model the classes:
[XmlRoot]
public class body
{
[XmlElement]
public h1 h1 { get; set; }
[XmlElement]
public p p { get; set; }
}
public class h1
{
[XmlText]
public string innerXML { get; set; }
}
public class p
{
[XmlAttribute]
public string id { get; set; }
[XmlText]
public string innerXML { get; set; }
}
Once you have your class model, you call the serializer.
void Main()
{
string xml =
#"<body>
<h1>This is a text.</h1>
<p id=""SomeId"">This is a paragraph</p>
</body>";
// Creates a stream that reads from the string
MemoryStream stream = new MemoryStream();
StreamWriter writer = new StreamWriter(stream);
writer.Write(xml);
writer.Flush();
stream.Position = 0;
// Check the classes below before proceding.
XmlSerializer serializer = new XmlSerializer(typeof(body));
var obj = (body)serializer.Deserialize(stream);
// Check obj here with the debugger. All fields are filled.
}
II. Using XDocument
The example above makes for a very neat code, since you access everything typed. However, it demands a lot of setup work since you must model the classes. Maybe some simpler will suffice in your case. Let's say you want to get the attribute of the p element:
void Main()
{
string xml =
#"<body>
<h1>This is a text.</h1>
<p id=""SomeId"">This is a paragraph</p>
</body>";
// Creates a stream that reads from the string
MemoryStream stream = new MemoryStream();
StreamWriter writer = new StreamWriter(stream);
writer.Write(xml);
writer.Flush();
stream.Position = 0;
// Using XDocument
var pAttrib = XDocument.Load(stream).Element("body").Element("p").Attribute("id").Value;
Console.Writeline(pAttrib);
}
Simple, huh? You can do more complex stuff throwing LINQ there... Let's try to find the element with id named "SomeId":
void Main()
{
string xml =
#"<body>
<h1>This is a text.</h1>
<p id=""SomeId"">This is a paragraph</p>
</body>";
// Creates a stream that reads from the string
MemoryStream stream = new MemoryStream();
StreamWriter writer = new StreamWriter(stream);
writer.Write(xml);
writer.Flush();
stream.Position = 0;
// Using XDocument
var doc = XDocument.Load(stream);
var found = from body in doc.Elements("body")
from elem in body.Elements()
from attrib in elem.Attributes()
where attrib.Name == "id" && attrib.Value == "SomeId"
select elem;
foreach (var e in found) Console.WriteLine(e);
}
Hope it helps.

Related

Linq to XML data structure

This is my XML:
<home>
<contents>
<row>
<content>
<idContent>1</idContent>
<title>title1</title>
</content>
<content>
<idContent>2</idContent>
<title>title2</title>
</content>
</row>
<row>
<content>
<idContent>3</idContent>
<title>title3</title>
</content>
<content>
<idContent>4</idContent>
<title>title4</title>
</content>
</row>
</contents>
I want to store this information in a list of objects
List myList = ...
Where a Content could be:
int idContent;
string title;
int row_number;
Each Content object has to store the row where it is located in the XML.
What's the best way for doing this?
Presuming row_number is simply a sequence relating to the order it appears in the XML, then you can do something like this:
var doc = XDocument.Parse(xml);
var contents = doc.Descendants("row")
.Select((e, index) => new {Row = e, RowIndex = index})
.SelectMany(x => x.Row.Elements("content").Select(e => new {Content = e, x.RowIndex}))
.Select(x => new Content
{
IdContent = (int)x.Content.Element("idContent"),
Title = (string)x.Content.Element("title"),
RowNumber = x.RowIndex + 1
}).ToList();
I use this method for similar scenarios:
public static Object CreateObject(string XMLString, Object YourClassObject)
{
System.Xml.Serialization.XmlSerializer oXmlSerializer = new System.Xml.Serialization.XmlSerializer(YourClassObject.GetType());
//The StringReader will be the stream holder for the existing XML file
YourClassObject = oXmlSerializer.Deserialize(new System.IO.StringReader(XMLString));
//initially deserialized, the data is represented by an object without a defined type
return YourClassObject;
}
With this method you can create a class object from a XML String. I haven't tested it on your scenario, but you can use it on object of following home class:
public class home
{
public List<row> contents;
}
public class row
{
public List<content> content;
}
public class content
{
public int idContent;
public string title;
}
USAGE:
home h = new home();
h = (home)CreateObject(xml, h);
Remember that the variable and class names must be exactly same as that of XML nodes.
EXTRA:
If you want to convert a class object into XML String, use this method:
string CreateXML(Object YourClassObject)
{
XmlDocument xmlDoc = new XmlDocument(); //Represents an XML document,
// Initializes a new instance of the XmlDocument class.
XmlSerializer xmlSerializer = new XmlSerializer(YourClassObject.GetType());
// Creates a stream whose backing store is memory.
using (MemoryStream xmlStream = new MemoryStream())
{
xmlSerializer.Serialize(xmlStream, YourClassObject);
xmlStream.Position = 0;
//Loads the XML document from the specified string.
xmlDoc.Load(xmlStream);
return xmlDoc.InnerXml;
}
}
The best solution I could think of - although it is not using Linq to XML - is to feed your XML document into one of those various XML Schema generators like http://www.freeformatter.com/xsd-generator.html and pipe this generated Schema right into xsd.exe (see https://msdn.microsoft.com/en-us/library/x6c1kb0s.aspx). The generated code can be used to read (and write) the XML document such as the one you provided and you can of course apply Linq to Objects on this collection.
However, since you mentioned the row_number field which does not yet appear in your example XML, you would have to add this to the XML or manually edit the XML Schema afterwards.

Modifying element of Json string (in C#)

I'm trying to modify an attribute of an XML string using Json in C#. Currently I'm doing the following:
XmlDocument serializedFormXml = new XmlDocument();
serializedFormXml.LoadXml(mySerializedForm);
string formJsonString = JsonConvert.SerializeXmlNode(serializedFormXml, Newtonsoft.Json.Formatting.None, true);
JObject formJsonObj = JObject.Parse(formJsonString);
formJsonObj["#code"] = "myNewValue";
var xml = JsonConvert.DeserializeXmlNode(formJsonObj.ToString()).ToString();
When I do this I get get an exception on the last line:
Unable to cast object of type 'Newtonsoft.Json.Converters.XmlDocumentWrapper' to type 'Newtonsoft.Json.Converters.IXmlElement'
Any ideas what I'm doing wrong and how I can fix modify my form attribute "code"?
This is the XML I'm using:
<Form code="XYZ">
<Info>Data</Info>
.....
Thanks!
That's going to be way, way easier with Linq-to-XML:
var doc = XDocument.Parse(mySerializedForm);
doc.Root.SetAttributeValue(doc.Root.Name.Namespace + "code", "myNewValue");
var xml = doc.ToString();
This drops the XML declaration. If you need the XML declaration included, you can use the following extension method:
public static class XObjectExtensions
{
public static string ToXml(this XDocument xDoc)
{
using (var writer = new StringWriter())
{
xDoc.Save(writer);
return writer.ToString();
}
}
}
And then write:
var xml = doc.ToXml();
If specifically you need to make the encoding string say "UTF-8", use Utf8StringWriter from this answer.
Update
The reason you code fails is that you stripped the XML root element name away when you converted to json by passing true here:
string formJsonString = JsonConvert.SerializeXmlNode(serializedFormXml, Newtonsoft.Json.Formatting.None, true);
Thus you need to add it back when converting back:
var xml = JsonConvert.DeserializeXmlNode(formJsonObj.ToString(), serializedFormXml.DocumentElement.Name).ToString();

XML Deserialization of many types

I am deserializing a large xml doc into a C# object.
I've run into an issue where there are multiple xml elements on the same line, and am having trouble re-constructing them properly in code.
A snippet example as so:
<parent>
<ce:para view="all">
Text <ce:cross-ref refid="123">[1]</ce:cross-ref> More Text <ce:italic>Italicized text</ce:italic> and more text here
</ce:para>
<ce:para>...</ce:para>
</parent>
The generated C# class looks like this
[XmlRoot(ElementName = "para", Namespace = "namespace")]
public class Para
{
[XmlElement(ElementName = "cross-ref", Namespace = "namespace")]
public List<Crossref> Crossref { get; set; }
[XmlText]
public List<string> Text { get; set; }
[XmlElement(ElementName = "italic", Namespace = "namespace")]
public List<Italic> Italic { get; set; }
}
I want to be able to loop over this object and re-construct the sentence as a plain string.
Text [1] More Text Italicized Text and more text here
The only problem is though when the deserialization happens, the order is lost as each bit is stuck into it's respective object. This means I have no way of knowing how to reconstruct the string back to how it is supposed to be.
Text: {"Text", "More Text", "and more text here"}
Crossref: {"[1]"}
Italic: {"Italicized Text"}
I've thought about bringing in the whole element in as a string, and then scrubbing the tags out of it, but I'm not sure how to properly get it deserialized. Or if there is a better way to go about it.
Disclaimer: I am not able to alter the XML document as it is coming in from a 3rd party.
Thanks
Once you have deserialized the 3rd party XML into an object that directly matches the XML's schema (as you have done already in your example above) you should be able to use XmlNode.InnerText() on the <ce:para node to extract what you're looking for without having to write any parsing code.
At that point, you could do a translation from the object you deserialized into from the raw 3rd party XML into an object which flattens out the <ce:para node into a simple string.
As per Chris' request, I'm posting my solution. It probably could used refining as I'm not very experienced with linq queries.
XDocument xdoc = xmlAdapter.GetAsXDoc(xmlstring);
IEnumerable<XElement> body = from b in xdoc.Descendants()
where b.Name.LocalName == "body"
select b;
IEnumerable<XElement> sections = from s in body.Descendants()
where s.Name.LocalName == "sections"
select s;
IEnumerable<XElement> paragraphs = from p in sections.Descendants()
where p.Name.LocalName == "para"
select p;
string bodytext = "";
if (paragraphs.Count() > 0)
{
StringBuilder text = new StringBuilder();
foreach (XElement p in paragraphs)
{
text.AppendFormat("{0} ", p.Value);
}
}
bodytext = text.ToString();

XmlSerializer - Present an array of items

I have an XML File that looks like this :
<ROOT><DOC> ... </DOC><DOC> ... </DOC><DOC> ... </DOC></ROOT>
I want to put all the DOC in an array.
How do I do that in C# (XmlSerializer) ?
In essence, you need a string that contains your XML, a StringReader to read the string, an XMLReader to read the feed from the StringReader and an XDocument to store the feed from the XMLReader. This can be done in a single line of code, like this:
XDocument xDoc = XDocument.Load (XmlReader.Create (new StringReader (xmlString)));
The xmlString is the path (and name) of the file you're reading. You should use a List to store the data you'll get (unless it's a set number, then you can just use a string[]).
List<string> docList = new List<string>();
Then it's a matter of using a foreach loop to go through the XML elements and adding them to your list:
foreach (var element in xDoc.Descendants("ROOT"))
{
string doc = element.Element ("DOC").Value;
docList.Add (doc);
}
to make it an array, use:
docList.ToArray();
I hope this helps! Good luck.
Maybe it depends on the framework version. I have .net v4 and would use the following class with XmlSerializer.
Thanks to #Reniuz for the hint of the error. Here is a full working example:
public class Document
{
[XmlAttribute]
public string Value { get; set; }
}
[XmlRoot("ROOT")]
public class Root
{
[XmlElement("DOC")]
public List<Document> Documents { get; set; }
}
Using this code to load:
string data = "<ROOT><DOC Value=\"adhfjasdhf\"></DOC><DOC Value=\"asldfhalsdh\"></DOC></ROOT>";
XmlSerializer serializer = new XmlSerializer(typeof(Root));
using (StringReader sr = new StringReader(data))
{
Root root = serializer.Deserialize(sr) as Root;
}
Keep attantion that the tags are case sensitive.
This is the right answer, based on Magicbjorn answer :
First of all, i'm getting my string from a StreamReader.
using(StreamReader read = new StreamReader("FilePath.xml"))
{
XDocument xDoc = XDocument.Load(XmlReader.Create(read));
List<string> docList = new List<string>();
var root = xDoc.Element("ROOT");
foreach (var element in root.Elements("DOC"))
{
string s = element.Value;
docList.Add(s);
}
}

C# newbie: reading repetitive XML to memory

I'm new to C#. I'm building an application that persists an XML file with a list of elements. The structure of my XML file is as follows:
<Elements>
<Element>
<Name>Value</Name>
<Type>Value</Type>
<Color>Value</Color>
</Element>
<Element>
<Name>Value</Name>
<Type>Value</Type>
<Color>Value</Color>
</Element>
<Element>
<Name>Value</Name>
<Type>Value</Type>
<Color>Value</Color>
</Element>
</Elements>
I have < 100 of those items, and it's a single list (so I'm considering a DB solution to be overkill, even SQLite). When my application loads, I want to read this list of elements to memory. At present, after browsing the web a bit, I'm using XmlTextReader.
However, and maybe I'm using it in the wrong way, I read the data tag-by-tag, and thus expect the tags to be in a certain order (otherwise the code will be messy). What I would like to do is read complete "Element" structures and extract tags from them by name. I'm sure it's possible, but how?
To clarify, the main difference is that the way I'm using XmlTextReader today, it's not tolerant to scenarios such as wrong order of tags (e.g. Type comes before Name in a certain Element).
What's the best practice for loading such structures to memory in C#?
It's really easy to do in LINQ to XML. Are you using .NET 3.5? Here's a sample:
using System;
using System.Xml.Linq;
using System.Linq;
class Test
{
[STAThread]
static void Main()
{
XDocument document = XDocument.Load("test.xml");
var items = document.Root
.Elements("Element")
.Select(element => new {
Name = (string)element.Element("Name"),
Type = (string)element.Element("Type"),
Color = (string)element.Element("Color")})
.ToList();
foreach (var x in items)
{
Console.WriteLine(x);
}
}
}
You probably want to create your own data structure to hold each element, but you just need to change the "Select" call to use that.
Any particular reason you're not using XmlDocument?
XmlDocument myDoc = new XmlDocument()
myDoc.Load(fileName);
foreach(XmlElement elem in myDoc.SelectNodes("Elements/Element"))
{
XmlNode nodeName = elem.SelectSingleNode("Name/text()");
XmlNode nodeType = elem.SelectSingleNode("Type/text()");
XmlNode nodeColor = elem.SelectSingleNode("Color/text()");
string name = nodeName!=null ? nodeName.Value : String.Empty;
string type = nodeType!=null ? nodeType.Value : String.Empty;
string color = nodeColor!=null ? nodeColor.Value : String.Empty;
// Here you use the values for something...
}
It sounds like XDocument, and XElement might be better suited for this task. They might not have the absolute speed of XmlTextReader, but for your cases they sound like they would be appropriate and it would make dealing with fixed structures a lot easier. Parsing out elements would work like so:
XDocument xml;
foreach (XElement el in xml.Element("Elements").Elements("Element")) {
var name = el.Element("Name").Value;
// etc.
}
You can even get a bit fancier with Linq:
XDocument xml;
var collection = from el in xml.Element("Elements").Elements("Element")
select new { Name = el.Element("Name").Value,
Color = el.Element("Color").Value,
Type = el.Element("Type").Value
};
foreach (var item in collection) {
// here you can use item.Color, item.Name, etc..
}
You could use XmlSerializer class (http://msdn.microsoft.com/en-us/library/system.xml.serialization.xmlserializer.aspx)
public class Element
{
public string Name { get; set; }
public string Type { get; set; }
public string Color { get; set; }
}
class Program
{
static void Main(string[] args)
{
string xml =
#"<Elements>
<Element>
<Name>Value</Name>
<Type>Value</Type>
<Color>Value</Color>
</Element>(...)</Elements>";
XmlSerializer serializer = new XmlSerializer(typeof(Element[]), new XmlRootAttribute("Elements"));
Element[] result = (Element[])serializer.Deserialize(new StringReader(xml));}
You should check out Linq2Xml, http://www.hookedonlinq.com/LINQtoXML5MinuteOverview.ashx

Categories

Resources