I'm new to C#. I'm building an application that persists an XML file with a list of elements. The structure of my XML file is as follows:
<Elements>
<Element>
<Name>Value</Name>
<Type>Value</Type>
<Color>Value</Color>
</Element>
<Element>
<Name>Value</Name>
<Type>Value</Type>
<Color>Value</Color>
</Element>
<Element>
<Name>Value</Name>
<Type>Value</Type>
<Color>Value</Color>
</Element>
</Elements>
I have < 100 of those items, and it's a single list (so I'm considering a DB solution to be overkill, even SQLite). When my application loads, I want to read this list of elements to memory. At present, after browsing the web a bit, I'm using XmlTextReader.
However, and maybe I'm using it in the wrong way, I read the data tag-by-tag, and thus expect the tags to be in a certain order (otherwise the code will be messy). What I would like to do is read complete "Element" structures and extract tags from them by name. I'm sure it's possible, but how?
To clarify, the main difference is that the way I'm using XmlTextReader today, it's not tolerant to scenarios such as wrong order of tags (e.g. Type comes before Name in a certain Element).
What's the best practice for loading such structures to memory in C#?
It's really easy to do in LINQ to XML. Are you using .NET 3.5? Here's a sample:
using System;
using System.Xml.Linq;
using System.Linq;
class Test
{
[STAThread]
static void Main()
{
XDocument document = XDocument.Load("test.xml");
var items = document.Root
.Elements("Element")
.Select(element => new {
Name = (string)element.Element("Name"),
Type = (string)element.Element("Type"),
Color = (string)element.Element("Color")})
.ToList();
foreach (var x in items)
{
Console.WriteLine(x);
}
}
}
You probably want to create your own data structure to hold each element, but you just need to change the "Select" call to use that.
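As a rough sketch of that last point, here is the same query projecting into a named class instead of an anonymous type. The `ElementInfo` class and the `ElementLoader.Parse` helper are names made up for this illustration. The string casts yield null for a missing child rather than throwing, and the order of the child tags inside each Element does not matter.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical container type for this example; not part of any library.
public class ElementInfo
{
    public string Name { get; set; }
    public string Type { get; set; }
    public string Color { get; set; }
}

public static class ElementLoader
{
    // Parses XML text shaped like the question's <Elements> document.
    public static List<ElementInfo> Parse(string xml)
    {
        return XDocument.Parse(xml).Root
            .Elements("Element")
            .Select(e => new ElementInfo
            {
                // Casting an XElement to string gives null when the child is absent.
                Name = (string)e.Element("Name"),
                Type = (string)e.Element("Type"),
                Color = (string)e.Element("Color")
            })
            .ToList();
    }
}
```

To read from a file instead, XDocument.Load(path) can replace XDocument.Parse(xml).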
Any particular reason you're not using XmlDocument?
XmlDocument myDoc = new XmlDocument();
myDoc.Load(fileName);
foreach(XmlElement elem in myDoc.SelectNodes("Elements/Element"))
{
XmlNode nodeName = elem.SelectSingleNode("Name/text()");
XmlNode nodeType = elem.SelectSingleNode("Type/text()");
XmlNode nodeColor = elem.SelectSingleNode("Color/text()");
string name = nodeName!=null ? nodeName.Value : String.Empty;
string type = nodeType!=null ? nodeType.Value : String.Empty;
string color = nodeColor!=null ? nodeColor.Value : String.Empty;
// Here you use the values for something...
}
It sounds like XDocument and XElement might be better suited for this task. They might not have the absolute speed of XmlTextReader, but for your case they sound appropriate, and they would make dealing with fixed structures a lot easier. Parsing out elements would work like so:
XDocument xml = XDocument.Load("test.xml"); // file name is only an example
foreach (XElement el in xml.Element("Elements").Elements("Element")) {
var name = el.Element("Name").Value;
// etc.
}
You can even get a bit fancier with Linq:
XDocument xml = XDocument.Load("test.xml"); // file name is only an example
var collection = from el in xml.Element("Elements").Elements("Element")
select new { Name = el.Element("Name").Value,
Color = el.Element("Color").Value,
Type = el.Element("Type").Value
};
foreach (var item in collection) {
// here you can use item.Color, item.Name, etc..
}
You could use XmlSerializer class (http://msdn.microsoft.com/en-us/library/system.xml.serialization.xmlserializer.aspx)
using System.IO;
using System.Xml.Serialization;
public class Element
{
public string Name { get; set; }
public string Type { get; set; }
public string Color { get; set; }
}
class Program
{
static void Main(string[] args)
{
string xml =
@"<Elements>
<Element>
<Name>Value</Name>
<Type>Value</Type>
<Color>Value</Color>
</Element>(...)</Elements>";
XmlSerializer serializer = new XmlSerializer(typeof(Element[]), new XmlRootAttribute("Elements"));
Element[] result = (Element[])serializer.Deserialize(new StringReader(xml));
}
}
You should check out Linq2Xml, http://www.hookedonlinq.com/LINQtoXML5MinuteOverview.ashx
How do I read and parse an XML file in C#?
Use XmlDocument to read XML from a string or from a file.
using System.Xml;
XmlDocument doc = new XmlDocument();
doc.Load("c:\\temp.xml");
or
doc.LoadXml("<xml>something</xml>");
then find a node below it, like this
XmlNode node = doc.DocumentElement.SelectSingleNode("/book/title");
or
foreach(XmlNode node in doc.DocumentElement.ChildNodes){
string text = node.InnerText; //or loop through its children as well
}
then read the text inside that node like this
string text = node.InnerText;
or read an attribute
string attr = node.Attributes["theattributename"]?.InnerText;
Always check for null on Attributes["something"] since it will be null if the attribute does not exist.
LINQ to XML Example:
// Loading from a file, you can also load from a stream
var xml = XDocument.Load(@"C:\contacts.xml");
// Query the data and write out a subset of contacts
var query = from c in xml.Root.Descendants("contact")
where (int)c.Attribute("id") < 4
select c.Element("firstName").Value + " " +
c.Element("lastName").Value;
foreach (string name in query)
{
Console.WriteLine("Contact's Full Name: {0}", name);
}
Reference: LINQ to XML at MSDN
Here's an application I wrote for reading xml sitemaps:
using System;
using System.Collections.Generic;
using System.Windows.Forms;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.IO;
using System.Data;
using System.Xml;
namespace SiteMapReader
{
class Program
{
static void Main(string[] args)
{
Console.WriteLine("Please Enter the Location of the file");
// get the location we want to get the sitemaps from
string dirLoc = Console.ReadLine();
// get all the sitemaps
string[] sitemaps = Directory.GetFiles(dirLoc);
StreamWriter sw = new StreamWriter(Application.StartupPath + @"\locs.txt", true);
// loop through each file
foreach (string sitemap in sitemaps)
{
try
{
// new xdoc instance
XmlDocument xDoc = new XmlDocument();
//load up the xml from the location
xDoc.Load(sitemap);
// cycle through each child node
foreach (XmlNode node in xDoc.DocumentElement.ChildNodes)
{
// first node is the url ... have to go to the nested loc node
foreach (XmlNode locNode in node)
{
// there are a couple of child nodes here, so only take data from the node named loc
if (locNode.Name == "loc")
{
// get the content of the loc node
string loc = locNode.InnerText;
// write it to the console so you can see its working
Console.WriteLine(loc + Environment.NewLine);
// write it to the file
sw.Write(loc + Environment.NewLine);
}
}
}
}
catch { }
}
// close the writer so buffered output is flushed to locs.txt
sw.Close();
Console.WriteLine("All Done :-)");
Console.ReadLine();
}
static void readSitemap()
{
}
}
}
Code on Paste Bin
http://pastebin.com/yK7cSNeY
There are lots of ways, for example:
XmlSerializer: define a class matching the target schema you want to read, then use XmlSerializer to load the XML data into an instance of that class.
Linq 2 xml
XmlTextReader.
XmlDocument
XPathDocument (read-only access)
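Since the last option in that list gets no example, here is a minimal sketch of read-only access with XPathDocument, assuming input shaped like the `<Elements>` file from the question. The `XPathNames` class and `ReadNames` method are names invented for this example.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml.XPath;

public static class XPathNames
{
    // Returns the text of every /Elements/Element/Name node, in document order.
    public static List<string> ReadNames(string xml)
    {
        var names = new List<string>();
        // XPathDocument builds a read-only in-memory tree from the reader.
        var doc = new XPathDocument(new StringReader(xml));
        XPathNavigator nav = doc.CreateNavigator();
        XPathNodeIterator it = nav.Select("/Elements/Element/Name");
        while (it.MoveNext())
        {
            names.Add(it.Current.Value);
        }
        return names;
    }
}
```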
You could use a DataSet to read XML strings.
var xmlString = File.ReadAllText(FILE_PATH);
var stringReader = new StringReader(xmlString);
var dsSet = new DataSet();
dsSet.ReadXml(stringReader);
Posting this for the sake of information.
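To show what you might do after ReadXml, here is a small sketch that pulls the Name column back out. It assumes the input looks like the `<Elements>` file from the question and that ReadXml's schema inference turns each repeated Element into a row of a table named "Element"; the `DataSetNames` class is a made-up name, and the code guards against the table or column not existing.

```csharp
using System.Collections.Generic;
using System.Data;
using System.IO;

public static class DataSetNames
{
    public static List<string> ReadNames(string xml)
    {
        var names = new List<string>();
        var ds = new DataSet();
        ds.ReadXml(new StringReader(xml));

        // Assumption: inference produced a table called "Element" with a "Name" column.
        DataTable table = ds.Tables["Element"];
        if (table == null || !table.Columns.Contains("Name"))
        {
            return names;
        }
        foreach (DataRow row in table.Rows)
        {
            // DBNull (a missing Name in one row) becomes null here.
            names.Add(row["Name"] as string);
        }
        return names;
    }
}
```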
You can either:
Use XmlSerializer class
Use XmlDocument class
Examples are on the msdn pages provided
Linq to XML.
Also, VB.NET has much better xml parsing support via the compiler than C#. If you have the option and the desire, check it out.
Check out XmlTextReader class for instance.
There are different ways, depending on where you want to get.
XmlDocument is lighter than XDocument, but if you only want a minimal check that a string contains XML, then a regular expression is possibly the fastest and lightest choice you can make. For example, I have implemented smoke tests with SpecFlow for my API, and when I want to test whether one of the results is valid XML, I use a regular expression. But if I need to extract values from the XML, I parse it with XDocument, which is faster to write and takes less code. Or I use XmlDocument if I have to work with a big XML file (sometimes I work with XML files of around 1M lines, or even more); then I could even read it line by line. Why? Try opening more than 800MB in private bytes in Visual Studio; even in production you should not have objects bigger than 2GB. You can with a tweak, but you should not. If you had to parse a document containing A LOT of lines, that document would probably be CSV.
I have written this comment because I see a lot of examples with XDocument. XDocument is not good for big documents, or when you only want to verify whether the content is valid XML. If you wish to check whether the XML itself makes sense, then you need a schema.
I also downvoted the suggested answer, because I believe it needs the above information in it. Imagine I need to verify, 10 times an hour, whether 200M of XML is valid. XDocument will waste a lot of resources.
prasanna venkatesh also states that you could try loading the string into a DataSet; that will indicate valid XML as well.
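For the "is this well-formed XML at all" case on large inputs, one option (my suggestion, not something the approaches above use) is to stream through the input with XmlReader, which does not build a document tree in memory. A minimal sketch, with `XmlCheck.IsWellFormed` as a made-up helper name; it checks well-formedness only, not validity against a schema, and with DTD processing prohibited a document containing a DOCTYPE is reported as not well-formed.

```csharp
using System.IO;
using System.Xml;

public static class XmlCheck
{
    // Returns true if the input can be read to the end as well-formed XML.
    public static bool IsWellFormed(TextReader input)
    {
        var settings = new XmlReaderSettings { DtdProcessing = DtdProcessing.Prohibit };
        try
        {
            using (XmlReader reader = XmlReader.Create(input, settings))
            {
                // Read() throws XmlException at the first well-formedness error.
                while (reader.Read()) { }
            }
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }
}
```

For a file, you would pass something like a StreamReader over the path so the content is never held in memory as a single string.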
public void ReadXmlFile()
{
string path = HttpContext.Current.Server.MapPath("~/App_Data"); // Finds the location of App_Data on server.
XmlTextReader reader = new XmlTextReader(System.IO.Path.Combine(path, "XMLFile7.xml")); //Combines the location of App_Data and the file name
while (reader.Read())
{
switch (reader.NodeType)
{
case XmlNodeType.Element:
break;
case XmlNodeType.Text:
columnNames.Add(reader.Value); // columnNames is assumed to be a List<string> declared elsewhere
break;
case XmlNodeType.EndElement:
break;
}
}
}
You can avoid the first statement and just specify the path name in constructor of XmlTextReader.
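A self-contained variant of the same loop, as a sketch: it uses XmlReader.Create over a TextReader instead of the XmlTextReader constructor, and returns the text nodes in a local list rather than the undeclared columnNames. `TextNodeReader` is a name I made up.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class TextNodeReader
{
    // Collects the value of every text node, in document order.
    public static List<string> ReadTextNodes(TextReader input)
    {
        var values = new List<string>();
        using (XmlReader reader = XmlReader.Create(input))
        {
            while (reader.Read())
            {
                // Whitespace between tags is reported as Whitespace, not Text.
                if (reader.NodeType == XmlNodeType.Text)
                {
                    values.Add(reader.Value);
                }
            }
        }
        return values;
    }
}
```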
If you want to retrieve a particular value from an XML file:
XmlDocument _LocalInfo_Xml = new XmlDocument();
_LocalInfo_Xml.Load(fileName);
XmlElement _XmlElement;
_XmlElement = _LocalInfo_Xml.GetElementsByTagName("UserId")[0] as XmlElement;
string Value = _XmlElement.InnerText;
Here is another approach using Cinchoo ETL - an open source library to parse xml file with few lines of code.
using (var r = ChoXmlReader<Item>.LoadText(xml)
.WithXPath("//item")
)
{
foreach (var rec in r)
rec.Print();
}
public class Item
{
public string Name { get; set; }
public string ProtectionLevel { get; set; }
public string Description { get; set; }
}
Sample fiddle: https://dotnetfiddle.net/otYq5j
Disclaimer: I'm author of this library.
I am deserializing a large xml doc into a C# object.
I've run into an issue where there are multiple xml elements on the same line, and am having trouble re-constructing them properly in code.
A snippet example as so:
<parent>
<ce:para view="all">
Text <ce:cross-ref refid="123">[1]</ce:cross-ref> More Text <ce:italic>Italicized text</ce:italic> and more text here
</ce:para>
<ce:para>...</ce:para>
</parent>
The generated C# class looks like this
[XmlRoot(ElementName = "para", Namespace = "namespace")]
public class Para
{
[XmlElement(ElementName = "cross-ref", Namespace = "namespace")]
public List<Crossref> Crossref { get; set; }
[XmlText]
public List<string> Text { get; set; }
[XmlElement(ElementName = "italic", Namespace = "namespace")]
public List<Italic> Italic { get; set; }
}
I want to be able to loop over this object and re-construct the sentence as a plain string.
Text [1] More Text Italicized Text and more text here
The only problem, though, is that when the deserialization happens the order is lost, as each bit is stuck into its respective object. This means I have no way of knowing how to reconstruct the string back to how it is supposed to be.
Text: {"Text", "More Text", "and more text here"}
Crossref: {"[1]"}
Italic: {"Italicized Text"}
I've thought about bringing in the whole element in as a string, and then scrubbing the tags out of it, but I'm not sure how to properly get it deserialized. Or if there is a better way to go about it.
Disclaimer: I am not able to alter the XML document as it is coming in from a 3rd party.
Thanks
Once you have deserialized the 3rd party XML into an object that directly matches the XML's schema (as you have done already in your example above), you should be able to use the XmlNode.InnerText property on the <ce:para> node to extract what you're looking for without having to write any parsing code. InnerText returns the concatenated text of the node and all of its child nodes, in document order.
At that point, you could translate from the object you deserialized the raw 3rd party XML into to an object which flattens the <ce:para> node into a simple string.
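A minimal sketch of the InnerText idea using XmlDocument. The question's snippet does not show the namespace URI bound to the ce prefix, so "urn:example:ce" below is a placeholder you would replace with the real one; `ParaFlattener` is also a made-up name.

```csharp
using System;
using System.Xml;

public static class ParaFlattener
{
    // Placeholder namespace URI: substitute the URI your document binds to "ce".
    private const string CeNamespace = "urn:example:ce";

    // Returns the plain text of the first ce:para element, or "" if there is none.
    public static string FlattenFirstPara(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        var ns = new XmlNamespaceManager(doc.NameTable);
        ns.AddNamespace("ce", CeNamespace);

        XmlNode para = doc.SelectSingleNode("//ce:para", ns);
        // InnerText joins the text of the node and its descendants in document order.
        return para == null ? string.Empty : para.InnerText.Trim();
    }
}
```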
As per Chris' request, I'm posting my solution. It probably could use refining as I'm not very experienced with LINQ queries.
XDocument xdoc = xmlAdapter.GetAsXDoc(xmlstring);
IEnumerable<XElement> body = from b in xdoc.Descendants()
where b.Name.LocalName == "body"
select b;
IEnumerable<XElement> sections = from s in body.Descendants()
where s.Name.LocalName == "sections"
select s;
IEnumerable<XElement> paragraphs = from p in sections.Descendants()
where p.Name.LocalName == "para"
select p;
// Build the text outside any conditional block so it is still in scope afterwards;
// with no paragraphs the result is simply an empty string.
StringBuilder text = new StringBuilder();
foreach (XElement p in paragraphs)
{
text.AppendFormat("{0} ", p.Value);
}
string bodytext = text.ToString();
I have an XML File that looks like this :
<ROOT><DOC> ... </DOC><DOC> ... </DOC><DOC> ... </DOC></ROOT>
I want to put all the DOC in an array.
How do I do that in C# (XmlSerializer) ?
In essence, you need a string that contains your XML, a StringReader to read the string, an XmlReader to read the feed from the StringReader, and an XDocument to store the feed from the XmlReader. This can be done in a single line of code, like this:
XDocument xDoc = XDocument.Load (XmlReader.Create (new StringReader (xmlString)));
Here xmlString holds the XML text itself, not a file path; to read from a file, pass the path straight to XDocument.Load. You should use a List to store the data you'll get (unless it's a set number, then you can just use a string[]).
List<string> docList = new List<string>();
Then it's a matter of using a foreach loop to go through the XML elements and adding them to your list:
// Iterate over every DOC child of the root, not just the first one
foreach (var element in xDoc.Root.Elements("DOC"))
{
string doc = element.Value;
docList.Add (doc);
}
to make it an array, use:
string[] docs = docList.ToArray();
I hope this helps! Good luck.
Maybe it depends on the framework version. I have .net v4 and would use the following class with XmlSerializer.
Thanks to @Reniuz for pointing out the error. Here is a full working example:
public class Document
{
[XmlAttribute]
public string Value { get; set; }
}
[XmlRoot("ROOT")]
public class Root
{
[XmlElement("DOC")]
public List<Document> Documents { get; set; }
}
Using this code to load:
string data = "<ROOT><DOC Value=\"adhfjasdhf\"></DOC><DOC Value=\"asldfhalsdh\"></DOC></ROOT>";
XmlSerializer serializer = new XmlSerializer(typeof(Root));
using (StringReader sr = new StringReader(data))
{
Root root = serializer.Deserialize(sr) as Root;
}
Note that the tags are case sensitive.
This is the right answer, based on Magicbjorn's answer:
First of all, I'm getting my string from a StreamReader.
using(StreamReader read = new StreamReader("FilePath.xml"))
{
XDocument xDoc = XDocument.Load(XmlReader.Create(read));
List<string> docList = new List<string>();
var root = xDoc.Element("ROOT");
foreach (var element in root.Elements("DOC"))
{
string s = element.Value;
docList.Add(s);
}
}
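If you want the array directly, the loop can also be written as a single LINQ query. This is only a sketch, assuming the `<ROOT><DOC>...</DOC></ROOT>` shape from the question; `DocReader.ReadDocs` is a name made up here, and it takes the XML text rather than a file path.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class DocReader
{
    // Returns the text content of every DOC child of the root element.
    public static string[] ReadDocs(string xml)
    {
        return XDocument.Parse(xml).Root
            .Elements("DOC")
            .Select(doc => doc.Value)
            .ToArray();
    }
}
```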
I am new to parsing XML in C# and I have some data from the Amazon MWS library that is displayed below. I need to parse out various ItemAttributes such as ItemDimensions. I am used to JSON responses, so I am not sure how to apply this to XML. Would it be possible for someone to point me in the right direction? I have Googled XML parsing with C#, but found no results valuable enough to help me complete my task.
<GetMatchingProductResponse xmlns="http://mws.amazonservices.com/schema/Products/2011-10-01">
<GetMatchingProductResult ASIN="1430225491" status="Success">
<Product>
<Identifiers>
<MarketplaceASIN>
<MarketplaceId>ATVPDKIKX0DER</MarketplaceId>
<ASIN>1430225491</ASIN>
</MarketplaceASIN>
</Identifiers>
<AttributeSets>
<ns2:ItemAttributes xml:lang="en-US">
<ns2:Author>Troelsen, Andrew</ns2:Author>
<ns2:Binding>Paperback</ns2:Binding>
<ns2:Brand>Apress</ns2:Brand>
<ns2:Edition>5</ns2:Edition>
<ns2:ItemDimensions>
<ns2:Height Units="inches">9.21</ns2:Height>
<ns2:Length Units="inches">7.48</ns2:Length>
<ns2:Width Units="inches">2.52</ns2:Width>
<ns2:Weight Units="pounds">5.80</ns2:Weight>
</ns2:ItemDimensions>
<ns2:IsAutographed>false</ns2:IsAutographed>
<ns2:IsEligibleForTradeIn>true</ns2:IsEligibleForTradeIn>
<ns2:IsMemorabilia>false</ns2:IsMemorabilia>
<ns2:Label>Apress</ns2:Label>
<ns2:Languages>
<ns2:Language>
<ns2:Name>english</ns2:Name>
<ns2:Type>Unknown</ns2:Type>
</ns2:Language>
<ns2:Language>
<ns2:Name>english</ns2:Name>
<ns2:Type>Original Language</ns2:Type>
</ns2:Language>
<ns2:Language>
<ns2:Name>english</ns2:Name>
<ns2:Type>Published</ns2:Type>
</ns2:Language>
</ns2:Languages>
<ns2:ListPrice>
<ns2:Amount>59.99</ns2:Amount>
<ns2:CurrencyCode>USD</ns2:CurrencyCode>
</ns2:ListPrice>
<ns2:Manufacturer>Apress</ns2:Manufacturer>
<ns2:NumberOfItems>1</ns2:NumberOfItems>
<ns2:NumberOfPages>1752</ns2:NumberOfPages>
<ns2:PackageDimensions>
<ns2:Height Units="inches">2.60</ns2:Height>
<ns2:Length Units="inches">9.20</ns2:Length>
<ns2:Width Units="inches">7.50</ns2:Width>
<ns2:Weight Units="pounds">5.80</ns2:Weight>
</ns2:PackageDimensions>
<ns2:PartNumber>9781430225492</ns2:PartNumber>
<ns2:ProductGroup>Book</ns2:ProductGroup>
<ns2:ProductTypeName>ABIS_BOOK</ns2:ProductTypeName>
<ns2:PublicationDate>2010-05-14</ns2:PublicationDate>
<ns2:Publisher>Apress</ns2:Publisher>
<ns2:SmallImage>
<ns2:URL>http://ecx.images-amazon.com/images/I/51h9Sju5NKL._SL75_.jpg</ns2:URL>
<ns2:Height Units="pixels">75</ns2:Height>
<ns2:Width Units="pixels">61</ns2:Width>
</ns2:SmallImage>
<ns2:Studio>Apress</ns2:Studio>
<ns2:Title>Pro C# 2010 and the .NET 4 Platform</ns2:Title>
</ns2:ItemAttributes>
</AttributeSets>
<Relationships/>
<SalesRankings>
<SalesRank>
<ProductCategoryId>book_display_on_website</ProductCategoryId>
<Rank>43011</Rank>
</SalesRank>
<SalesRank>
<ProductCategoryId>697342</ProductCategoryId>
<Rank>36</Rank>
</SalesRank>
<SalesRank>
<ProductCategoryId>3967</ProductCategoryId>
<Rank>53</Rank>
</SalesRank>
<SalesRank>
<ProductCategoryId>4013</ProductCategoryId>
<Rank>83</Rank>
</SalesRank>
</SalesRankings>
</Product>
</GetMatchingProductResult>
<ResponseMetadata>
<RequestId>440cdde0-fa76-4c48-bdd1-d51a3b467823</RequestId>
</ResponseMetadata>
</GetMatchingProductResponse>
I find "Linq To Xml" easier to use
var xDoc = XDocument.Parse(xml); //or XDocument.Load(filename);
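// Assumes ItemAttributes and its children live in this namespace; if the response binds the ns2 prefix to a different URI, use that URI here.
XNamespace ns = "http://mws.amazonservices.com/schema/Products/2011-10-01";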
var items = xDoc.Descendants(ns + "ItemAttributes")
.Select(x => new
{
Author = x.Element(ns + "Author").Value,
Brand = x.Element(ns + "Brand").Value,
Dimensions = x.Element(ns+"ItemDimensions").Descendants()
.Select(dim=>new{
Type = dim.Name.LocalName,
Unit = dim.Attribute("Units").Value,
Value = dim.Value
})
.ToList()
})
.ToList();
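If you need the dimensions as numbers rather than strings, the explicit conversions on XElement can help. A sketch under a loud assumption: the sample response above does not show which URI the ns2 prefix is bound to, so "urn:example:ns2" is a placeholder, and `DimensionReader` is a made-up name.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class DimensionReader
{
    // Placeholder URI: substitute whatever the response binds the ns2 prefix to.
    private static readonly XNamespace Ns2 = "urn:example:ns2";

    // Returns the first ItemDimensions/Height value, or null if it is absent.
    public static decimal? ReadHeight(string xml)
    {
        XElement height = XDocument.Parse(xml)
            .Descendants(Ns2 + "ItemDimensions")
            .Elements(Ns2 + "Height")
            .FirstOrDefault();

        // The explicit conversion returns null for a missing element instead of throwing.
        return (decimal?)height;
    }
}
```

Unlike `.Value`, the cast does not throw a NullReferenceException when the element is missing.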
You could reinvent the wheel, or you could use Amazon's wheel (see @George Duckett's answer for the direct link):
Amazon Marketplace API
One option to address your question: if you want a tool that will enable you to work with your xml file, I would look at xsd.exe. MSDN for xsd.exe
This tool is able to generate classes from xml.
Otherwise, you can build a parser on the XDocument class, which lets you use LINQ, as @L.B noted in his post.
You have not made clear exactly what you need from the XML, so I cannot give you an objective answer. I'll begin by stating that there are many different ways to parse XML using .Net (and C# in your case, albeit they are similar with VB and C#).
The first one that I would look into is actually modeling your XML Data into .Net objects, more specifically, POCOs. To that class model you could add attributes that would bind or relate them to the XML and then all you'd need to do is pass the data and the class to a XML deserializer.
Now, if you don't need to retrieve the whole object, you can either use XDocument or XmlDocument. The fun part of XDocument is that its syntax is LINQ friendly, so you can parse your XML very simply.
XmlDocument is more old-style sequential method invocation, but achieves the same thing.
Let me illustrate. Consider a simpler XML, for simplicity's sake:
<body>
<h1>This is a text.</h1>
<p id="SomeId">This is a paragraph</p>
</body>
(see what I did there? That HTML is a valid XML!)
I. Using A Deserializer:
First you model the classes:
[XmlRoot]
public class body
{
[XmlElement]
public h1 h1 { get; set; }
[XmlElement]
public p p { get; set; }
}
public class h1
{
[XmlText]
public string innerXML { get; set; }
}
public class p
{
[XmlAttribute]
public string id { get; set; }
[XmlText]
public string innerXML { get; set; }
}
Once you have your class model, you call the serializer.
void Main()
{
string xml =
@"<body>
<h1>This is a text.</h1>
<p id=""SomeId"">This is a paragraph</p>
</body>";
// Creates a stream that reads from the string
MemoryStream stream = new MemoryStream();
StreamWriter writer = new StreamWriter(stream);
writer.Write(xml);
writer.Flush();
stream.Position = 0;
// See the class model above before proceeding.
XmlSerializer serializer = new XmlSerializer(typeof(body));
var obj = (body)serializer.Deserialize(stream);
// Check obj here with the debugger. All fields are filled.
}
II. Using XDocument
The example above makes for very neat code, since everything you access is typed. However, it demands a lot of setup work since you must model the classes. Maybe something simpler will suffice in your case. Let's say you want to get the attribute of the p element:
void Main()
{
string xml =
@"<body>
<h1>This is a text.</h1>
<p id=""SomeId"">This is a paragraph</p>
</body>";
// Creates a stream that reads from the string
MemoryStream stream = new MemoryStream();
StreamWriter writer = new StreamWriter(stream);
writer.Write(xml);
writer.Flush();
stream.Position = 0;
// Using XDocument
var pAttrib = XDocument.Load(stream).Element("body").Element("p").Attribute("id").Value;
Console.WriteLine(pAttrib);
}
Simple, huh? You can do more complex stuff throwing LINQ there... Let's try to find the element with id named "SomeId":
void Main()
{
string xml =
@"<body>
<h1>This is a text.</h1>
<p id=""SomeId"">This is a paragraph</p>
</body>";
// Creates a stream that reads from the string
MemoryStream stream = new MemoryStream();
StreamWriter writer = new StreamWriter(stream);
writer.Write(xml);
writer.Flush();
stream.Position = 0;
// Using XDocument
var doc = XDocument.Load(stream);
var found = from body in doc.Elements("body")
from elem in body.Elements()
from attrib in elem.Attributes()
where attrib.Name == "id" && attrib.Value == "SomeId"
select elem;
foreach (var e in found) Console.WriteLine(e);
}
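For comparison, here is roughly what the same attribute lookup might look like with XmlDocument, which was mentioned earlier but not illustrated. This is just a sketch; `XmlDocumentSample.ReadParagraphId` is a made-up name, and it returns null if the p element or its id attribute is missing.

```csharp
using System.Xml;

public static class XmlDocumentSample
{
    // Returns the id attribute of /body/p, or null when it is not present.
    public static string ReadParagraphId(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        // XPath lookup from the document root.
        XmlNode p = doc.SelectSingleNode("/body/p");
        XmlAttribute id = p?.Attributes?["id"];
        return id?.Value;
    }
}
```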
Hope it helps.