Select xml element by class name with a regex - c#

I want to extract an svg element by its class name with a C# regex.
For example I have this:
<path fill="none" ... class="highcharts-tracker highcharts-tracker" ... stroke-width="22" zIndex="2" style=""/>
And I want to delete every path element that has highcharts-tracker as a class name, using something like:
new Regex("");
Does anybody know how?

In LINQ to XML, this is pretty straightforward:
var classToRemove = "highcharts-tracker";
var doc = XDocument.Parse(svg);
var elements = doc.Descendants("path")
.Where(x => x.Attribute("class") != null &&
x.Attribute("class")
.Value.Split(' ')
.Contains(classToRemove));
// Remove all the elements which match the query
elements.Remove();
You should not use regular expressions to try to parse XML... XML is very well handled by existing APIs, and regular expressions are not an appropriate tool.
EDIT: If it's malformed (which you should have said to start with) you should try to work out why it's malformed and fix it before you try to do any other processing. There's really no excuse for XML being malformed these days... there are plenty of good XML APIs for just about every platform in existence.
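As a rough, self-contained sketch of the same idea: real SVG usually declares the http://www.w3.org/2000/svg default namespace, in which case Descendants("path") with a plain name finds nothing. The version below matches on the local name instead. The class name SvgCleaner, the method RemovePathsByClass and the sample markup are made up for illustration, and the input is assumed to be well-formed.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class SvgCleaner
{
    // Removes every <path> element whose class attribute lists the given class.
    // Matching on Name.LocalName works with or without the SVG default namespace.
    public static string RemovePathsByClass(string svg, string classToRemove)
    {
        XDocument doc = XDocument.Parse(svg);

        doc.Descendants()
           .Where(e => e.Name.LocalName == "path")
           .Where(e => ((string)e.Attribute("class") ?? "")
               .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
               .Contains(classToRemove))
           .Remove();

        return doc.ToString(SaveOptions.DisableFormatting);
    }

    public static void Main()
    {
        // Hypothetical sample input, not taken from the question.
        string svg =
            "<svg xmlns=\"http://www.w3.org/2000/svg\">" +
            "<path class=\"highcharts-tracker\" d=\"M 0 0\"/>" +
            "<path class=\"highcharts-series\" d=\"M 1 1\"/>" +
            "</svg>";

        Console.WriteLine(RemovePathsByClass(svg, "highcharts-tracker"));
    }
}
```

Splitting the attribute on spaces (rather than using a substring test) avoids removing elements whose class merely contains the name, such as highcharts-tracker-line.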

Related

Complex regex or string parse

We are trying to use urls for complex querying and filtering.
I managed to get some of the simpler parts working using expression trees and a mix of regex and string manipulation, but then we looked at a more complex string example:
var filterstring="(|(^(categoryid:eq:1,2,3,4)(categoryname:eq:condiments))(description:lk:”*and*”))";
I'd like to be able to parse this out into parts but also allow it to be recursive. I'd like to get the output looking like:
item[0] (^(categoryid:eq:1,2,3,4)(categoryname:eq:condiments)
item[1] description:lk:”*and*”
From there I could Strip down the item[0] part to get
categoryid:eq:1,2,3,4
categoryname:eq:condiments
At the minute I'm using Regex and string operations to find the | or ^ that tells me whether it's an AND or an OR. The regex matches brackets and works well for a single item; it's when we nest the values that I'm struggling.
The regex looks like
@"\((.*?)\)"
I need some way of using Regex to match the nested brackets; any help would be appreciated.
You could transform the string into valid XML (just some simple replace, no validation):
var output = filterstring
.Replace("(","<node>")
.Replace(")","</node>")
.Replace("|","<andNode/>")
.Replace("^","<orNode/>");
Then, you could parse the XML nodes by using, for example, System.Xml.Linq.
XDocument doc = XDocument.Parse(output);
Based on your comment, here's how you rearrange the XML in order to get the wrapping you need:
foreach (var item in doc.Root.Descendants())
{
if (item.Name == "orNode" || item.Name == "andNode")
{
item.ElementsAfterSelf()
.ToList()
.ForEach(x =>
{
x.Remove();
item.Add(x);
});
}
}
Here's the resulting XML content:
<node>
<andNode>
<node>
<orNode>
<node>categoryid:eq:1,2,3,4</node>
<node>categoryname:eq:condiments</node>
</orNode>
</node>
<node>description:lk:”*and*”</node>
</andNode>
</node>
I understand that you want the values specified in the filterstring.
My solution would be something like this:
NameValueCollection values = new NameValueCollection();
foreach(Match pair in Regex.Matches(filterstring, @"\((?<name>\w+):(?<operation>\w+):(?<value>[^)]*)\)"))
{
if (pair.Groups["operation"].Value == "eq")
values.Add(pair.Groups["name"].Value, pair.Groups["value"].Value);
}
The regex matches each (name:operation:value) group; it doesn't care about all the other stuff.
After this code has run you can get the values like this:
values["categoryid"]
values["categoryname"]
values["description"] // null here: description uses "lk", and only "eq" pairs are added
I hope this will help you in your quest.
I think you should just make a proper parser for that — it would actually end up simpler, more extensible and save you time and headaches in the future. You can use any existing parser generator such as Irony or ANTLR.
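For a grammar this small, a hand-written recursive-descent parser is also an option. The following is only a sketch under some assumptions: every node is wrapped in brackets, a group starts with | or ^ followed by one or more bracketed nodes, and leaf values never contain a closing bracket. The names FilterNode and FilterParser are invented for the example, and the curly quotes from the sample string are left out.

```csharp
using System;
using System.Collections.Generic;

// A node is either a group (Operator is '|' or '^') or a leaf condition.
public sealed class FilterNode
{
    public char? Operator;
    public string Condition;
    public List<FilterNode> Children = new List<FilterNode>();
}

public static class FilterParser
{
    public static FilterNode Parse(string text)
    {
        int pos = 0;
        FilterNode node = ParseNode(text, ref pos);
        if (pos != text.Length)
            throw new FormatException("Unexpected trailing text at position " + pos);
        return node;
    }

    private static FilterNode ParseNode(string text, ref int pos)
    {
        Expect(text, ref pos, '(');
        var node = new FilterNode();

        if (pos < text.Length && (text[pos] == '|' || text[pos] == '^'))
        {
            node.Operator = text[pos++];
            // A group holds one or more nested nodes, each in its own brackets.
            while (pos < text.Length && text[pos] == '(')
                node.Children.Add(ParseNode(text, ref pos));
        }
        else
        {
            // A leaf runs up to the next closing bracket.
            int end = text.IndexOf(')', pos);
            if (end < 0)
                throw new FormatException("Missing ')' after position " + pos);
            node.Condition = text.Substring(pos, end - pos);
            pos = end;
        }

        Expect(text, ref pos, ')');
        return node;
    }

    private static void Expect(string text, ref int pos, char expected)
    {
        if (pos >= text.Length || text[pos] != expected)
            throw new FormatException("Expected '" + expected + "' at position " + pos);
        pos++;
    }

    public static void Main()
    {
        FilterNode root = Parse(
            "(|(^(categoryid:eq:1,2,3,4)(categoryname:eq:condiments))(description:lk:*and*))");
        Console.WriteLine(root.Operator);                          // |
        Console.WriteLine(root.Children[0].Children[0].Condition); // categoryid:eq:1,2,3,4
        Console.WriteLine(root.Children[1].Condition);             // description:lk:*and*
    }
}
```

Because each nested bracket is handled by a recursive call, the nesting depth does not matter, which is the part a single non-recursive regex struggles with.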

how to compare 2 XML using LINQ in C#

I want to compare 2 XML files.
My xml1 is:
<ROOT><NODE><BOOK><ID>1234</ID><NAME isbn="dafdfad">Numbers: Language of Science</NAME><AUTHOR>Tobias Dantzig</AUTHOR></BOOK></NODE></ROOT>
I have another XML from database which is
<Book xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"><Id>12345</Id><Name isbn="31231223">Numbers: Language of Science</Name><Author>Tobias Dantzig</Author></Book>
I want to compare the "BOOK" node from XML1 and "Book" node from db XML
I have a Name-space in the XML which is obtained from database
The node names are in mixed cases
I want to compare these 2 XML files node by node for Text and attributes value
I am using C# and wanted to know if this is possible using LINQ
Any help would be really appreciated
P.S. I searched for similar posts but couldn't find what i am exactly looking for.
Thanks a lot in advance
Cheers,
Karthik
In xml, case and namespace are fundamentally important, and whitespace and attribute-order aren't (making direct string compares incorrect).
So IMO you should parse it; perhaps with XmlSerializer, but (as you note) both are trivially parsed with LINQ-to-XML:
string xml1 = @"<ROOT><NODE><BOOK><ID>1234</ID><NAME isbn=""dafdfad"">Numbers: Language of Science</NAME><AUTHOR>Tobias Dantzig</AUTHOR></BOOK></NODE></ROOT>";
var book1 = (from book in XElement.Parse(xml1).Elements("NODE").Elements("BOOK")
let nameEl = book.Element("NAME")
select new
{
Id = (int)book.Element("ID"),
Name = nameEl.Value,
Isbn = (string)nameEl.Attribute("isbn"),
Author = (string)book.Element("AUTHOR")
}).Single();
string xml2 = @"<Book xmlns:rdf=""http://www.w3.org/1999/02/22-rdf-syntax-ns#""><Id>12345</Id><Name isbn=""31231223"">Numbers: Language of Science</Name><Author>Tobias Dantzig</Author></Book>";
var el = XElement.Parse(xml2);
var book2 = new
{
Id = (int)el.Element("Id"),
Name = el.Element("Name").Value,
Isbn = (string)el.Element("Name").Attribute("isbn"),
Author = (string)el.Element("Author")
};
Then it is just a case of comparing the values.
An alternative is to use something like xslt to pre-process one of the files to match the expected layout of the other, so you can share parsing code. It depends whether you are already familiar with xslt, I guess.
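If the set of fields is not fixed, the comparison can also be done generically. The sketch below flattens each book into a case-insensitive map of local names to values and reports the keys that differ. It assumes the book elements are only one level deep and that child names are unique; BookComparer, Flatten and Differences are made-up names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class BookComparer
{
    // Flattens an element into "name" -> text and "name@attr" -> value pairs,
    // keyed case-insensitively by local name so BOOK/Book and NAME/Name line up.
    public static Dictionary<string, string> Flatten(XElement book)
    {
        var map = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        foreach (XElement child in book.Elements())
        {
            string name = child.Name.LocalName;
            map[name] = child.Value;
            foreach (XAttribute attr in child.Attributes().Where(a => !a.IsNamespaceDeclaration))
                map[name + "@" + attr.Name.LocalName] = attr.Value;
        }
        return map;
    }

    // Returns the keys whose values differ or which exist on one side only.
    public static List<string> Differences(XElement left, XElement right)
    {
        Dictionary<string, string> a = Flatten(left), b = Flatten(right);
        return a.Keys.Union(b.Keys, StringComparer.OrdinalIgnoreCase)
            .Where(k => !a.ContainsKey(k) || !b.ContainsKey(k) || a[k] != b[k])
            .OrderBy(k => k, StringComparer.OrdinalIgnoreCase)
            .ToList();
    }

    public static void Main()
    {
        XElement book1 = XElement.Parse(
            "<ROOT><NODE><BOOK><ID>1234</ID><NAME isbn=\"dafdfad\">Numbers: Language of Science</NAME>" +
            "<AUTHOR>Tobias Dantzig</AUTHOR></BOOK></NODE></ROOT>").Descendants("BOOK").Single();
        XElement book2 = XElement.Parse(
            "<Book xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\"><Id>12345</Id>" +
            "<Name isbn=\"31231223\">Numbers: Language of Science</Name><Author>Tobias Dantzig</Author></Book>");

        foreach (string key in Differences(book1, book2))
            Console.WriteLine(key); // prints ID, then NAME@isbn
    }
}
```

Values are still compared exactly; only element and attribute names are treated case-insensitively here.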
This can be done quite easily using LINQ to XML or even the simple XML DOM,
though I would bravely do it with regular expressions:
one regex to find all the books, and a couple or so to dismantle each record.

How to iterate over xml using linq2xml or Xquery

I have an incoming file with data as
<root><![CDATA[<defs><elements>
<element><item>aa</item><int>1</int></element>
<element><item>bb</item><int>2</int></element>
<element><item>cc</item><int>3</int></element>
</elements></defs>]]></root>
Writing multiple foreach( xElement x in root.Elements ) loops seems superfluous!
I'm looking for a less verbose method, preferably using C#.
UPDATE - yes - the input is in a CDATA; rest assured it's not my design and I have ZERO control over it!
Assuming that nasty CDATA section is intentional, and you're only interested in the text content of your leaf elements, you can do something like:
XElement root = XElement.Load(yourFile);
var data = from element in XElement.Parse(root.Value).Descendants("element")
select new {
Item = element.Elements("item").First().Value,
Value = element.Elements("int").First().Value
};
That said, if the code that generates your input file is under your control, consider getting rid of the CDATA section. Storing XML within XML that way is not the way to go most of the time, as it defeats the purpose of the markup language (and requires multiple parser passes, as shown above).

How do I work with an XML tag within a string?

I'm working in Microsoft Visual C# 2008 Express.
Let's say I have a string and the contents of the string is: "This is my <myTag myTagAttrib="colorize">awesome</myTag> string."
I'm telling myself that I want to do something to the word "awesome" - possibly call a function that does something called "colorize".
What is the best way in C# to go about detecting that this tag exists and getting that attribute? I've worked a little with XElements and such in C#, but mostly to do with reading in and out XML files.
Thanks!
-Adeena
Another solution:
var myString = "This is my <myTag myTagAttrib='colorize'>awesome</myTag> string.";
try
{
var document = XDocument.Parse("<root>" + myString + "</root>");
var matches = ((System.Collections.IEnumerable)document.XPathEvaluate("/root/myTag|/root/myTag2")).Cast<XElement>();
foreach (var element in matches)
{
switch (element.Name.ToString())
{
case "myTag":
//do something with myTag like lookup attribute values and call other methods
break;
case "myTag2":
//do something else with myTag2
break;
}
}
}
catch (Exception e)
{
//string was not well-formed xml
}
I also took into account your comment to Dabblernl where you want parse multiple attributes on multiple elements.
You can extract the XML with a regular expression, load the extracted xml string in a XElement and go from there:
string text=@"This is my <myTag myTagAttrib='colorize'>awesome</myTag> text.";
Match match=Regex.Match(text,@"(<myTag.*</myTag>)");
string xml=match.Captures[0].Value;
XElement element=XElement.Parse(xml);
XAttribute attribute=element.Attribute("myTagAttrib");
if(attribute.Value=="colorize") DoSomethingWith(element.Value);// Value=awesome
This code will throw an exception if no MyTag element was found, but that can be remedied by inserting a line of:
if(match.Captures.Count!=0)
{...}
It gets even more interesting if the string could hold more than just the MyTag Tag...
I'm a little confused about your example, because you switch between the string (text content), tags, and attributes. But I think what you want is XPath.
So if your XML stream looks like this:
<adeena><parent><child x="this is my awesome string">This is another awesome string</child></parent></adeena>
You'd use an XPath expression that looks like this to find the attribute:
//child/@x
and one like this to find the text value under the child tag:
//child
I'm a Java developer, so I don't know what XML libraries you'd use to do this. But you'll need a DOM parser to create a W3C Document class instance for you by reading in the XML file and then using XPath to pluck out the values.
There's a good XPath tutorial at W3Schools if you need it.
UPDATE:
If you're saying that you already have an XML stream as String, then the answer is to not read it from a file but from the String itself. Java has abstractions called InputStream and Reader that handle streams of bytes and chars, respectively. The source can be a file, a string, etc. Check your C# DOM API to see if it has something similar. You'll pass the string to a parser that will give back a DOM object that you can manipulate.
Since the input is not well-formed XML you won't be able to parse it with any of the built in XML libraries. You'd need a regular expression to extract the well-formed piece. You could probably use one of the more forgiving HTML parsers like HtmlAgilityPack on CodePlex.
This is my solution to match any type of xml using Regex:
C# Better way to detect XML?
The XmlTextReader can parse XML fragments with a special constructor which may help in this situation, but I'm not positive about that.
There's an in-depth article here:
http://geekswithblogs.net/kobush/archive/2006/04/20/75717.aspx
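As a hedged illustration of the fragment idea: XmlReader.Create accepts an XmlReaderSettings whose ConformanceLevel is set to Fragment, which lets the reader accept text and elements at the top level without a wrapping root. The sketch below collects the attribute value and text of each tagged element. It assumes the tags are well-formed and not nested inside each other; FragmentScanner and FindTagged are invented names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class FragmentScanner
{
    // Returns (attribute value, element text) pairs for every element
    // that carries the given attribute. Handles flat, non-nested tags only.
    public static List<KeyValuePair<string, string>> FindTagged(string text, string attributeName)
    {
        var settings = new XmlReaderSettings { ConformanceLevel = ConformanceLevel.Fragment };
        var found = new List<KeyValuePair<string, string>>();
        string pending = null; // attribute value of the element currently open, if any

        using (XmlReader reader = XmlReader.Create(new StringReader(text), settings))
        {
            while (reader.Read())
            {
                switch (reader.NodeType)
                {
                    case XmlNodeType.Element:
                        // An empty element has no text content and no end tag to wait for.
                        pending = reader.IsEmptyElement ? null : reader.GetAttribute(attributeName);
                        break;
                    case XmlNodeType.Text:
                        if (pending != null)
                            found.Add(new KeyValuePair<string, string>(pending, reader.Value));
                        break;
                    case XmlNodeType.EndElement:
                        pending = null;
                        break;
                }
            }
        }
        return found;
    }

    public static void Main()
    {
        string input = "This is my <myTag myTagAttrib='colorize'>awesome</myTag> string.";
        foreach (KeyValuePair<string, string> hit in FindTagged(input, "myTagAttrib"))
            Console.WriteLine(hit.Key + ": " + hit.Value); // colorize: awesome
    }
}
```

The reader still throws an XmlException on markup that is not well-formed, so a try/catch around the call is advisable for untrusted input.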

XML Parsing with C#?

I'm working on a project for school that involves a heavy amount of XML Parsing. I'm coding in C#, but I have yet to find a "suitable" method of parsing this XML out. There's several different ways I've looked at, but haven't gotten it right yet; so I have come to you. Ideally, I'm looking for something kind of similar to Beautiful Soup in Python (sort of).
I was wondering if there was any way to convert XML like this:
<config>
<bgimg>C:\\background.png</bgimg>
<nodelist>
<node>
<oid>012345</oid>
<image>C:\\image.png</image>
<label>EHRV</label>
<tooltip>
<header>EHR Viewer</header>
<body>Version 1.0</body>
<icon>C:\\ico\ehrv.png</icon>
</tooltip>
<msgSource>8181:iqLog</msgSource>
</node>
</nodelist>
</config>
Into an Array/Hashtable/Dictionary/Other like this:
Array
(
["config"] => array
(
["bgimg"] => "C:\\background.png"
["nodelist"] => array
(
["node"] => array
(
["oid"] => "012345"
["image"] => "C:\\image.png"
["label"] => "EHRV"
["tooltip"] => array
(
["header"] => "EHR Viewer"
["body"] => "Version 1.0"
["icon"] => "C:\\ico\ehrv.png"
)
["msgSource"] => "8181:iqLog"
)
)
)
)
Even just giving me a decent resource to look through would be really helpful. Thanks a ton.
I would look into Linq to Xml. This gives you an object structure similar to the Xml file that is fairly easy to traverse.
XmlDocument + XPath is pretty much all you ever need in .NET to parse XML.
There must be 1/2 dozen different ways to do this in C#. My favorite uses the System.Xml namespace, particularly System.Xml.Serialization.
You use a command line tool called xsd.exe to turn an xml sample into an xsd schema file (tip: make sure your nodelist has more than one node in the sample), and then use it again on the schema to turn that into a C# class file you can load into your project and easily use with the System.Xml.Serialization.XmlSerializer class.
There's no shame in using an old-fashioned XmlDocument:
var xml = "<config>hello world</config>";
var doc = new System.Xml.XmlDocument();
doc.LoadXml(xml);
var nodes = doc.SelectNodes("/config");
You should definitely use LINQ to XML, a.k.a. XLINQ. There is a nice tool called LINQPad that you should check out. It has nice features, from a comprehensive examples library to allowing you to directly query an SQL database via LINQ to SQL. Best of all, it lets you test your queries before putting them into code.
The best approach will be dictated by what you actually want to do with the data once you've parsed it out.
If you want to pass it around in a structured-but-not-tied-to-XML fashion, XML Serialization is probably your best bet. This will also get you closest to what you've described, though you'll be dealing with an object graph rather than nested maps.
If you are just looking for a convenient format to query for specific bits of data, your best option would be LINQ to Xml. Alternatively, you could use the more traditional classes in the System.Xml namespace (starting with XmlDocument) and query using XPath.
You could also use any of these techniques (or an XmlTextReader) as building blocks to create the data structure you've described but, barring some special need, I don't think it'll give you any more versatility than the other approaches will.
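If the nested-map shape from the question really is what you want, it can be built on top of LINQ to XML with a short recursive function. This is a sketch rather than a complete solution: attributes are ignored, mixed content is not handled, and repeated sibling names are collected into a list. XmlToDictionary and ToValue are made-up names, and the sample XML is a trimmed version of the one in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class XmlToDictionary
{
    // Leaf elements become strings; elements with children become nested
    // dictionaries. Repeated sibling names are collected into a List<object>.
    public static object ToValue(XElement element)
    {
        if (!element.HasElements)
            return element.Value;

        var map = new Dictionary<string, object>();
        foreach (IGrouping<string, XElement> group in element.Elements().GroupBy(e => e.Name.LocalName))
        {
            List<object> values = group.Select(e => ToValue(e)).ToList();
            map[group.Key] = values.Count == 1 ? values[0] : values;
        }
        return map;
    }

    public static void Main()
    {
        string xml =
            "<config><bgimg>bg.png</bgimg><nodelist><node>" +
            "<oid>012345</oid><label>EHRV</label>" +
            "<tooltip><header>EHR Viewer</header></tooltip>" +
            "</node></nodelist></config>";

        // The result describes the children of <config>; the root name itself is not a key.
        var config = (Dictionary<string, object>)ToValue(XElement.Parse(xml));
        var nodelist = (Dictionary<string, object>)config["nodelist"];
        var node = (Dictionary<string, object>)nodelist["node"];
        var tooltip = (Dictionary<string, object>)node["tooltip"];

        Console.WriteLine(node["label"]);     // EHRV
        Console.WriteLine(tooltip["header"]); // EHR Viewer
    }
}
```

The casts in Main show the main cost of this shape: every lookup has to guess whether it gets a string, a dictionary or a list, which is why the typed approaches above are usually more comfortable.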
You can also use serialization to convert the XML text back into a strongly typed class instance.
I personally like to map XML elements to classes and vice versa using the System.Xml.Serialization.XmlSerializer class.
http://msdn.microsoft.com/es-es/library/system.xml.serialization.xmlserializer(VS.80).aspx
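A minimal sketch of that mapping, assuming hand-written classes instead of ones generated by xsd.exe. Config, Node and ConfigLoader are hypothetical names, and only a few elements of the question's sample are mapped; elements without a matching member are skipped by the serializer.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

// Hypothetical classes; only some of the sample's elements are mapped.
[XmlRoot("config")]
public class Config
{
    [XmlElement("bgimg")]
    public string BackgroundImage;

    [XmlArray("nodelist")]
    [XmlArrayItem("node")]
    public List<Node> Nodes = new List<Node>();
}

public class Node
{
    [XmlElement("oid")]
    public string Oid;

    [XmlElement("label")]
    public string Label;
}

public static class ConfigLoader
{
    // Deserializes the XML text into a strongly typed Config instance.
    public static Config Load(string xml)
    {
        var serializer = new XmlSerializer(typeof(Config));
        using (var reader = new StringReader(xml))
            return (Config)serializer.Deserialize(reader);
    }

    public static void Main()
    {
        Config config = Load(
            "<config><bgimg>C:\\background.png</bgimg><nodelist>" +
            "<node><oid>012345</oid><label>EHRV</label></node>" +
            "</nodelist></config>");

        Console.WriteLine(config.Nodes[0].Label); // EHRV
    }
}
```

Element names are matched case-sensitively, so the attributes have to use the exact spelling found in the file (nodelist, not nodeList).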
I personally use XPathDocument, XPathNavigator and XPathNodeIterator e.g.
XPathDocument xDoc = new XPathDocument(CHOOSE SOURCE!);
XPathNavigator xNav = xDoc.CreateNavigator();
XPathNodeIterator iterator = xNav.Select("nodes/node[@SomePredicate = 'SomeValue']");
while (iterator.MoveNext())
{
string val = iterator.Current.SelectSingleNode("nodeWithValue").Value;
// etc etc
}
Yeah, I agree.
The LINQ way is very nice.
And I especially like the way you write XML using it.
It is much simpler using the "objects in objects" way.
