I want to compare 2 XML files.
My xml1 is:
<ROOT><NODE><BOOK><ID>1234</ID><NAME isbn="dafdfad">Numbers: Language of Science</NAME><AUTHOR>Tobias Dantzig</AUTHOR></BOOK></NODE></ROOT>
I have another XML document, from a database, which is:
<Book xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"><Id>12345</Id><Name isbn="31231223">Numbers: Language of Science</Name><Author>Tobias Dantzig</Author></Book>
I want to compare the "BOOK" node from XML1 with the "Book" node from the database XML.
The XML obtained from the database declares a namespace.
The node names are in mixed case.
I want to compare these 2 XML files node by node, for both text and attribute values.
I am using C# and wanted to know if this is possible using LINQ.
Any help would be really appreciated.
P.S. I searched for similar posts but couldn't find exactly what I am looking for.
Thanks a lot in advance
Cheers,
Karthik
In xml, case and namespace are fundamentally important, and whitespace and attribute-order aren't (making direct string compares incorrect).
So IMO you should parse it; perhaps with XmlSerializer, but (as you note) both are trivially parsed with LINQ-to-XML:
string xml1 = @"<ROOT><NODE><BOOK><ID>1234</ID><NAME isbn=""dafdfad"">Numbers: Language of Science</NAME><AUTHOR>Tobias Dantzig</AUTHOR></BOOK></NODE></ROOT>";
// xml1: the book is nested under ROOT/NODE and uses upper-case names
var book1 = (from book in XElement.Parse(xml1).Elements("NODE").Elements("BOOK")
             let nameEl = book.Element("NAME")
             select new
             {
                 Id = (int)book.Element("ID"),
                 Name = nameEl.Value,
                 Isbn = (string)nameEl.Attribute("isbn"),
                 Author = (string)book.Element("AUTHOR")
             }).Single();

string xml2 = @"<Book xmlns:rdf=""http://www.w3.org/1999/02/22-rdf-syntax-ns#""><Id>12345</Id><Name isbn=""31231223"">Numbers: Language of Science</Name><Author>Tobias Dantzig</Author></Book>";
// xml2: Book is the root; the rdf prefix is declared but unused, so unqualified names work
var el = XElement.Parse(xml2);
var book2 = new
{
    Id = (int)el.Element("Id"),
    Name = el.Element("Name").Value,
    // cast to string so both objects hold plain values that can be compared
    Isbn = (string)el.Element("Name").Attribute("isbn"),
    Author = (string)el.Element("Author")
};
Then it is just a case of comparing the values.
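If you would rather not hand-write a projection for each layout, a more general node-by-node comparison could look roughly like the sketch below. XmlCompare and its method names are made up for illustration. It matches elements and attributes by local name, ignoring case and namespaces, and it assumes sibling element names are unique within a parent and that text only needs comparing on leaf elements.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper, not part of LINQ to XML itself.
public static class XmlCompare
{
    private static readonly StringComparer IgnoreCase = StringComparer.OrdinalIgnoreCase;

    // Returns one message per difference; an empty list means the elements match.
    // The names of the two starting elements themselves are not compared.
    public static List<string> Differences(XElement left, XElement right)
    {
        var diffs = new List<string>();
        Compare(left, right, left.Name.LocalName, diffs);
        return diffs;
    }

    private static void Compare(XElement left, XElement right, string path, List<string> diffs)
    {
        // Compare attribute values by local name, skipping xmlns declarations.
        var leftAttrs = AttributeMap(left);
        var rightAttrs = AttributeMap(right);
        foreach (string name in leftAttrs.Keys.Union(rightAttrs.Keys, IgnoreCase))
        {
            leftAttrs.TryGetValue(name, out string l);
            rightAttrs.TryGetValue(name, out string r);
            if (l != r)
                diffs.Add(path + "/@" + name + ": '" + l + "' vs '" + r + "'");
        }

        var leftKids = left.Elements().ToList();
        var rightKids = right.Elements().ToList();

        // Leaf elements: compare the text content.
        if (leftKids.Count == 0 && rightKids.Count == 0)
        {
            if (left.Value != right.Value)
                diffs.Add(path + ": '" + left.Value + "' vs '" + right.Value + "'");
            return;
        }

        // Assumption: sibling names are unique (ToDictionary throws otherwise).
        var rightByName = rightKids.ToDictionary(e => e.Name.LocalName, IgnoreCase);
        foreach (XElement child in leftKids)
        {
            string name = child.Name.LocalName;
            if (rightByName.TryGetValue(name, out XElement match))
            {
                Compare(child, match, path + "/" + name, diffs);
                rightByName.Remove(name);
            }
            else
            {
                diffs.Add(path + "/" + name + ": missing from the second document");
            }
        }
        foreach (XElement extra in rightByName.Values)
            diffs.Add(path + "/" + extra.Name.LocalName + ": missing from the first document");
    }

    private static Dictionary<string, string> AttributeMap(XElement element)
    {
        return element.Attributes()
            .Where(a => !a.IsNamespaceDeclaration)
            .ToDictionary(a => a.Name.LocalName, a => a.Value, IgnoreCase);
    }
}
```

You would call it as XmlCompare.Differences(bookFromXml1, bookFromDb), passing the BOOK element of the first document and the Book root of the second. For the two samples in the question it should report the differing ID text and the differing isbn attribute.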
An alternative is to use something like xslt to pre-process one of the files to match the expected layout of the other, so you can share parsing code. It depends whether you are already familiar with xslt, I guess.
This can be done quite easily using LINQ to XML, or even the plain XML DOM.
That said, I would bravely do it with regular expressions:
one regex to find all the books, and a couple more to dismantle each record.
I want to extract an SVG element by its class name with a C# regex.
For example I have this:
<path fill="none" ... class="highcharts-tracker highcharts-tracker" ... stroke-width="22" zIndex="2" style=""/>
And I want to delete every path element that has highcharts-tracker as a class name, by using:
new Regex("");
Does anybody know how?
In LINQ to XML, this is pretty straightforward:
var classToRemove = "highcharts-tracker";
var doc = XDocument.Parse(svg);
var elements = doc.Descendants("path")
                  .Where(x => x.Attribute("class") != null &&
                              x.Attribute("class")
                               .Value.Split(' ')
                               .Contains(classToRemove));
// Remove all the elements which match the query
elements.Remove();
You should not use regular expressions to try to parse XML... XML is very well handled by existing APIs, and regular expressions are not an appropriate tool.
EDIT: If it's malformed (which you should have said to start with) you should try to work out why it's malformed and fix it before you try to do any other processing. There's really no excuse for XML being malformed these days... there are plenty of good XML APIs for just about every platform in existence.
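One caveat with the LINQ to XML query above: Descendants("path") only matches elements in no namespace. If the real document declares the usual SVG default namespace on its root (the fragment in the question does not show the root, so this is an assumption), matching on the local name is one way around it. A minimal sketch, with SvgCleaner and RemovePathsWithClass as made-up names:

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper for illustration.
public static class SvgCleaner
{
    // Removes every <path> element whose class attribute lists className,
    // whatever namespace the element is in, and returns the resulting markup.
    public static string RemovePathsWithClass(string svg, string className)
    {
        XDocument doc = XDocument.Parse(svg);

        doc.Descendants()
           .Where(e => e.Name.LocalName == "path")
           .Where(e => ((string)e.Attribute("class") ?? "")
               .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
               .Contains(className))
           .Remove();

        return doc.ToString(SaveOptions.DisableFormatting);
    }
}
```

Calling SvgCleaner.RemovePathsWithClass(svg, "highcharts-tracker") leaves all other path elements in place.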
I'm trying to create a C# application that extracts data from pages like this one. It's basically an XML file that stores information about a music album. Here's the relevant code:
<resp stat="ok" version="2.0">
  <release id="368116" status="Accepted">
    <title>The Bends</title>
    <tracklist>
      <track>
        <position>1</position>
        <title>Planet Telex</title>
        <duration>4:18</duration>
      </track>
    </tracklist>
  </release>
</resp>
I'd like to extract all the track titles from the album (in the above code "Planet Telex") and output them in a list like this:
Planet Telex
The Bends
...
What would be the best/most elegant way to do this? From what I've read, XmlTextReader is a good class to use. I've also seen many mentions of LINQ to XML... Thanks in advance!
BTW, I've posted this question again (albeit formulated differently). I'm not sure why it was removed last time.
If you can, go with LINQ to XML:
XDocument doc = XDocument.Load(xml);
var titles = doc.Descendants("title").Select(x => x.Value);
A more sophisticated version that distinguishes between the album and the track title is the following:
var titles = doc.Descendants("release")
                .Select(x => new
                {
                    AlbumTitle = x.Element("title").Value,
                    Tracks = x.Element("tracklist")
                              .Descendants("title")
                              .Select(y => y.Value)
                });
It returns a lazily evaluated sequence of anonymous-type objects, each with a property AlbumTitle of type string and a property Tracks of type IEnumerable<string> holding the track titles.
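If all you need is the flat list of track titles from the question, without the album title mixed in, selecting the title child of each track element is probably enough. A small sketch, assuming the XML is already in a string; TrackTitles and FromXml are names invented here:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper for illustration.
public static class TrackTitles
{
    // Returns the <title> text of every <track>, skipping tracks without a title.
    public static List<string> FromXml(string xml)
    {
        return XDocument.Parse(xml)
            .Descendants("track")
            .Select(track => (string)track.Element("title"))
            .Where(title => title != null)
            .ToList();
    }
}
```

Each entry can then be written out with Console.WriteLine to get one title per line.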
Use xsd.exe to generate a class structure from your XML file, then deserialize your XML into that class structure. It should be pretty straightforward.
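For a document this small you could also write the classes by hand instead of generating them. The sketch below is my guess at a minimal shape for the sample in the question, not the output of xsd.exe; the class names (Resp, Release, Track, ReleaseReader) are invented, and elements or attributes without a matching property are simply skipped by XmlSerializer.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

// Hand-written classes mirroring the sample XML (names are illustrative).
[XmlRoot("resp")]
public class Resp
{
    [XmlElement("release")]
    public Release Release { get; set; }
}

public class Release
{
    [XmlAttribute("id")]
    public string Id { get; set; }

    [XmlElement("title")]
    public string Title { get; set; }

    // <tracklist> wraps a sequence of <track> elements.
    [XmlArray("tracklist")]
    [XmlArrayItem("track")]
    public List<Track> Tracks { get; set; }
}

public class Track
{
    [XmlElement("position")]
    public string Position { get; set; }

    [XmlElement("title")]
    public string Title { get; set; }

    [XmlElement("duration")]
    public string Duration { get; set; }
}

public static class ReleaseReader
{
    // Deserializes the XML text into the object graph above.
    public static Resp Parse(string xml)
    {
        var serializer = new XmlSerializer(typeof(Resp));
        using (var reader = new StringReader(xml))
        {
            return (Resp)serializer.Deserialize(reader);
        }
    }
}
```

After ReleaseReader.Parse(xml), the track titles are available as resp.Release.Tracks[i].Title.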
Check out this simpleXml library
https://bitbucket.org/kberridge/simplexml
It's on NuGet by the way!
Install-package simpleXml
Although LINQ is certainly a valid approach, I figured I would mention at least one quick alternative: XPath. Here is an example:
XPathDocument doc = new XPathDocument("http://api.discogs.com/release/368116?f=xml");
XPathNavigator nav = doc.CreateNavigator();
XPathNodeIterator iter = (XPathNodeIterator)nav.Evaluate("//tracklist/track/title");
while (iter.MoveNext())
{
    Console.WriteLine(iter.Current.Value);
}
Output is as follows:
Planet Telex
The Bends
High And Dry
Fake Plastic Trees
Bones
(Nice Dream)
Just
My Iron Lung
Bullet Proof..I Wish I Was
Black Star
Sulk
Street Spirit (Fade Out)
Note that I added ?f=xml on your sample URL, since the default output from the API is JSON.
I have experience serializing/deserializing XML files, but I have never had to parse just a single statement, so I'm not sure how to go about this.
I have a string that holds this:
<Vol Model_Type="Flat">102.14</Vol>
And, I want to extract just the 102.14.
Should I use XPath, or is there a simpler option?
If you're using .NET 3.5 or above, use LINQ to XML. For example:
string x = "<Vol Model_Type=\"Flat\">102.14</Vol>";
XElement element = XElement.Parse(x);
decimal value = (decimal) element;
XML handling doesn't get much simpler than that :)
Of course, that's assuming you don't care about the element name or the attribute. If you do, LINQ to XML will still make it easy for you.
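If you do care about them, something along these lines might do. It is only a sketch: VolParser and ParseVol are made-up names, and it assumes you want to reject anything that is not a Vol element.

```csharp
using System;
using System.Xml.Linq;

// Hypothetical helper for illustration.
public static class VolParser
{
    // Parses a single <Vol> element, returning its numeric content and
    // handing back the Model_Type attribute (null if it is absent).
    public static decimal ParseVol(string xml, out string modelType)
    {
        XElement element = XElement.Parse(xml);

        if (element.Name.LocalName != "Vol")
            throw new FormatException("Expected <Vol> but found <" + element.Name.LocalName + ">.");

        modelType = (string)element.Attribute("Model_Type");

        // The explicit conversion parses the text using XML (invariant) number rules.
        return (decimal)element;
    }
}
```

For the string in the question, this gives 102.14 as the value and "Flat" as the model type.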
string x = "<Vol Model_Type=\"Flat\">102.14</Vol>";
XElement element = XElement.Parse(x);
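// element.Value is a string, so parse it (needs: using System.Globalization;)
decimal value = decimal.Parse(element.Value, CultureInfo.InvariantCulture);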
I have an incoming file with data like this:
<root><![CDATA[<defs><elements>
<element><item>aa</item><int>1</int></element>
<element><item>bb</item><int>2</int></element>
<element><item>cc</item><int>3</int></element>
</elements></defs>]]></root>
Writing multiple foreach (XElement x in root.Elements()) loops seems superfluous!
I'm looking for a less verbose method, preferably using C#.
UPDATE: yes, the input is in a CDATA section. Rest assured it's not my design and I have ZERO control over it!
Assuming that nasty CDATA section is intentional, and you're only interested in the text content of your leaf elements, you can do something like:
XElement root = XElement.Load(yourFile);
var data = from element in XElement.Parse(root.Value).Descendants("element")
           select new {
               Item = element.Elements("item").First().Value,
               Value = element.Elements("int").First().Value
           };
That said, if the code that generates your input file is under your control, consider getting rid of the CDATA section. Storing XML within XML that way is not the way to go most of the time, as it defeats the purpose of the markup language (and requires multiple parser passes, as shown above).
I have tons of XML files, all containing the same XML document structure but with different values.
Inside this file I have a datetime field.
What is the best, most efficient way to query these XML files? So I can retrieve for example... All files where the datetime field = today's date?
I'm using C# and .net v2. Should I be using XML objects to achieve this or text in file search routines?
Some code examples would be great... or just the general theory, anything would help, thanks...
This depends on the size of those files, and how complex the data actually is. As far as I understand the question, for this kind of XML data, using an XPath query and going through all the files might be the best approach, possibly caching the files in order to lessen the parsing overhead.
Have a look at:
XPathDocument, XmlDocument classes and XPath queries
http://support.microsoft.com/kb/317069
Something like this should do (not tested though):
XmlNamespaceManager nsmgr = new XmlNamespaceManager(new NameTable());
// if required, add your namespace prefixes here to nsmgr
XPathExpression expression = XPathExpression.Compile("//element[@date='20090101']", nsmgr); // your query as XPath
foreach (string fileName in Directory.GetFiles("PathToXmlFiles", "*.xml")) {
    XPathDocument doc;
    using (XmlTextReader reader = new XmlTextReader(fileName, nsmgr.NameTable)) {
        doc = new XPathDocument(reader);
    }
    if (doc.CreateNavigator().SelectSingleNode(expression) != null) {
        // matching document found
    }
}
Note: while you can also load an XPathDocument directly from a URI/path, using the reader makes sure that the document uses the same name table as the one used to compile the XPath query. If a different name table were used, you would not get results from the query.
You might look into running XSL queries. See also XSLT Tutorial, XML transformation using Xslt in C#, How to query XML with an XPath expression by using Visual C#.
This question also relates to another on Stack Overflow: Parse multiple XML files with ASP.NET (C#) and return those with particular element. The accepted answer there, though, suggests using Linq.
If it is at all possible to move to C# 3.0 / .NET 3.5, LINQ-to-XML would be by far the easiest option.
With .NET 2.0, you're stuck with either XML objects or XSL.
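For comparison, here is roughly what the LINQ-to-XML version could look like on .NET 3.5 or later. It is a sketch under assumptions the question does not confirm: the datetime lives in an element named date, and it is stored in the standard XML format (for example 2009-01-01T10:00:00). XmlFileFinder and FilesDatedOn are invented names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper for illustration.
public static class XmlFileFinder
{
    // Returns the paths of all *.xml files in the directory that contain a
    // <date> element (assumed name) falling on the given calendar day.
    public static List<string> FilesDatedOn(string directory, DateTime day)
    {
        return Directory.GetFiles(directory, "*.xml")
            .Where(path => XDocument.Load(path)
                .Descendants("date")
                // The cast throws FormatException if the text is not a valid XML dateTime.
                .Any(d => ((DateTime)d).Date == day.Date))
            .ToList();
    }
}
```

XmlFileFinder.FilesDatedOn("PathToXmlFiles", DateTime.Today) would then list today's files. Every file is still loaded and parsed in full, so with a very large number of files the caching advice above applies just as well.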