I have an HTML document and I'm selecting elements by class. Once I have them, I loop over each element and select further elements inside it:
var doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(content);
var rows = doc.DocumentNode.SelectNodes("//tr[contains(@class, 'row')]");
foreach (var row in rows)
{
    var name = row.SelectSingleNode("//span[contains(@class, 'name')]").InnerText;
    var surname = row.SelectSingleNode("//span[contains(@class, 'surname')]").InnerText;
    customers.Add(new Customer(name, surname));
}
However, although the loop iterates over every row, it always retrieves the text from the first row.
Is the XPath wrong?
This is an XPath FAQ. Whenever an XPath expression starts with /, it ignores the context element (here, the element referenced by the row variable) and searches for matching elements from the document root. That's why your SelectSingleNode() always returns the same element: the first match in the entire document.
You only need to prepend a dot (.) to make the expression relative to the current context element:
foreach (var row in rows)
{
    var name = row.SelectSingleNode(".//span[contains(@class, 'name')]").InnerText;
    var surname = row.SelectSingleNode(".//span[contains(@class, 'surname')]").InnerText;
    customers.Add(new Customer(name, surname));
}
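To see the difference between the two forms in isolation, here is a minimal sketch. It uses System.Xml's XmlDocument rather than HtmlAgilityPack so that it runs without extra packages; the sample markup and the ReadNames helper are made up for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

static class RelativeXPathDemo
{
    // Hypothetical sample markup, used only for this illustration.
    const string Sample =
        "<table>" +
        "<tr class='row'><td><span class='name'>Ann</span></td></tr>" +
        "<tr class='row'><td><span class='name'>Bob</span></td></tr>" +
        "</table>";

    // Evaluates spanXPath once per row, with the row as the context node.
    public static List<string> ReadNames(string spanXPath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(Sample);

        var names = new List<string>();
        foreach (XmlNode row in doc.SelectNodes("//tr[@class='row']"))
        {
            names.Add(row.SelectSingleNode(spanXPath).InnerText);
        }
        return names;
    }

    static void Main()
    {
        // Absolute path: searched from the document root for every row.
        Console.WriteLine(string.Join(", ", ReadNames("//span[@class='name']")));  // Ann, Ann
        // Relative path: searched within the current row only.
        Console.WriteLine(string.Join(", ", ReadNames(".//span[@class='name']"))); // Ann, Bob
    }
}
```

The same rule should apply to HtmlAgilityPack's SelectSingleNode and SelectNodes, since the leading-slash behaviour comes from XPath itself rather than from a particular library.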
What about using LINQ? (Note that the XPath still has to start with a dot.)
var customers = rows.Select(row => new Customer(
    row.SelectSingleNode(".//span[contains(@class, 'name')]").InnerText,
    row.SelectSingleNode(".//span[contains(@class, 'surname')]").InnerText)).ToList();
I am putting together a map of all the inline styles on elements in a large project. I would like to show the line number where each one is located, similar to the example below.
Is it possible to get the line number of an element in AngleSharp?
foreach (var file in allFiles)
{
string source = File.ReadAllText(file.FullName);
var parser = new HtmlParser();
var doc = parser.ParseDocument(source);
var items = doc.QuerySelectorAll("*[style]");
sb.AppendLine($"{file.Name} - inline styles({items.Count()})");
foreach (var item in items)
{
sb.AppendLine($"\t\tstyle (**{item.LineNumber}**): {item.GetAttribute("style")}");
}
}
Yes, this is possible.
Quick example:
var parser = new HtmlParser(new HtmlParserOptions
{
IsKeepingSourceReferences = true,
});
var document = parser.ParseDocument("<html><head></head><body>&foo</body></html>");
document.QuerySelector("body").SourceReference?.Position.Dump();
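This prints the element's position in the source text, including its line number. (Dump() is a LINQPad helper; outside LINQPad, print the value yourself.)
The important part is to set the IsKeepingSourceReferences option, as this is what populates SourceReference. Some elements (those inserted by the parser as the spec requires, rather than written in the markup) may not have a source reference, so keep in mind that it may be null.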
I'm trying to iterate over a single table row and its a href links, but it doesn't work as expected: instead of finding the links in the selected row, it finds all the links in the table. What am I doing wrong?
var allRows = doc.DocumentNode.SelectNodes("//table[@id='sortingTable']/tr");
var i = 0;
var rowNumber = 0;
foreach (var row in allRows)
{
if (row.InnerText.Contains("Text in cell for which row I want to use"))
{
rowNumber = i+1;
break;
}
i += 1;
}
var list = new List<SortFile>();
var rowToRead = allRows[rowNumber]; // One specific row
var numberOfLinks = rowToRead.SelectNodes("//a[@href]"); // this finds all the links in the whole table, not just the 2 links in this row
foreach (HtmlNode link in rowToRead.SelectNodes("//a[@href]"))
{
//HtmlAttribute att = link.Attributes["href"];
//var text = link.OuterHtml;
}
The XPath you are using (//a[@href]) gets all of the links in the document, because a leading // means to search from the document root.
Use .//a[@href] instead to start from the current node. That selects only the links underneath the tr node you have selected.
I'm having a problem with my XML document.
I want my program to find the values of all the items in my XML file, but only for items whose handlingType matches a certain string.
Code (C#) :
string path = "//files//handling.meta";
var doc = XDocument.Load(path);
var items = doc.Descendants("HandlingData").Elements("Item");
var query = from i in items
select new
{
HandlingName = (string)i.Element("handlingName"),
HandlingType = (string)i.Element("HandlingType"),
Mass = (decimal?)i.Element("fMass")
};
foreach (var HandlingType in items)
{
if (HandlingType.ToString() == "HANDLING_TYPE_FLYING")
{
MessageBox.Show(HandlingType.ToString());
}
}
The above code demonstrates a short version of what I want to happen, but it fails to find this handlingType (the MessageBox is never shown).
Here's the XML :
<CHandlingDataMgr>
<HandlingData>
<Item type="CHandlingData">
<handlingName>Plane</handlingName>
<fMass value="380000.000000"/>
<handlingType>HANDLING_TYPE_FLYING</handlingType>
</Item>
<Item type="CHandlingData">
<handlingName>Car1</handlingName>
<fMass value="150000.000000"/>
<handlingType>HANDLING_TYPE_DRIVING</handlingType>
</Item>
</HandlingData>
</CHandlingDataMgr>
I would like the output to show the handlingName of an item if it has a certain handlingType.
For example:
if (handlingType == "HANDLING_TYPE_FLYING")
{
messageBox.Show(this.HandlingName);
}
My problem in short: the program does not find the item's handling type. It finds the tag, but when asked to display it, it shows nothing.
Edit: Also, in the XML, HANDLING_TYPE_FLYING items contain extra elements such as thrust that are not present in every item (such as a car). I would like the program to find these elements too. (This is a second problem I'm facing; maybe I should ask a second question?)
Several things need fixing.
1. You are not using your query in your foreach loop. Use: foreach (var item in query)
2. Your element name has an uppercase "H" but should be lowercase "handlingType": HandlingType = (string)i.Element("handlingType"),
3. You are not reading the attribute value of your fMass element: Mass = i.Element("fMass").Attribute("value").Value
4. Once you iterate over query in your foreach loop, you need to adjust the loop body to work with the newly created anonymous objects.
NOTE that I removed the (decimal?) cast, so Mass = i.Element("fMass").Attribute("value").Value is now a string.
Here is the code with all the fixes.
class Program
{
static void Main()
{
const string path = "//files//handling.meta";
var doc = XDocument.Load(path);
var items = doc.Descendants("HandlingData").Elements("Item");
var query = from i in items
select new
{
HandlingName = (string)i.Element("handlingName"),
HandlingType = (string)i.Element("handlingType"),
Mass = i.Element("fMass").Attribute("value").Value
};
foreach (var item in query)
{
if (item.HandlingType == "HANDLING_TYPE_FLYING")
{
//Remove messagebox if consoleapp
MessageBox.Show(item.HandlingType);
MessageBox.Show(item.HandlingName);
Console.WriteLine(item.HandlingType);
Console.WriteLine(item.HandlingName);
}
}
}
}
I would also recommend looking into deserializing your XML into an object.
If you look at http://msdn.microsoft.com/en-us/library/system.xml.linq.xelement(v=vs.110).aspx you will see that the ToString() method doesn't return the text inside the tag, but the indented XML of the whole element.
You should use the Value property instead. Comparing the result with == is fine, because == compares string contents in C#; use String.Equals only if you need a specific comparison type, such as a case-insensitive one.
if (handlingType.Value == "HANDLING_TYPE_FLYING")
{
    MessageBox.Show(this.HandlingName);
}
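The difference between ToString() and Value, and the case sensitivity of element names, can be checked with a short sketch. It parses one Item copied from the question's XML; the ElementValueDemo class and SampleItem helper are names invented for this example, and the decimal? conversion on the attribute is just one possible way to read fMass.

```csharp
using System;
using System.Xml.Linq;

static class ElementValueDemo
{
    // One <Item> copied from the question's XML.
    public static XElement SampleItem() => XElement.Parse(
        "<Item type=\"CHandlingData\">" +
        "<handlingName>Plane</handlingName>" +
        "<fMass value=\"380000.000000\"/>" +
        "<handlingType>HANDLING_TYPE_FLYING</handlingType>" +
        "</Item>");

    static void Main()
    {
        XElement item = SampleItem();
        XElement type = item.Element("handlingType");

        Console.WriteLine(type.ToString()); // the element's markup, tags included
        Console.WriteLine(type.Value);      // the text content only

        // Element names are case-sensitive, so this lookup finds nothing.
        Console.WriteLine(item.Element("HandlingType") == null); // True

        // fMass keeps its number in an attribute; the explicit conversion
        // yields null instead of throwing when the attribute is missing.
        decimal? mass = (decimal?)item.Element("fMass")?.Attribute("value");
        Console.WriteLine(mass.HasValue); // True
    }
}
```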
How can I test whether a node contains a particular string or character using C#?
example:
<abc>
<foo>data testing</foo>
<foo>test data</foo>
<bar>data value</bar>
</abc>
Now I need to test whether a particular node's value contains the string "testing".
The output would be "foo[1]".
You can also load that into an XPathDocument and then use a query:
var xPathDocument = new XPathDocument("myfile.xml");
var query = XPathExpression.Compile(@"/abc/foo[contains(text(),""testing"")]");
var navigator = xPathDocument.CreateNavigator();
var iterator = navigator.Select(query);
while(iterator.MoveNext())
{
Console.WriteLine(iterator.Current.Name);
Console.WriteLine(iterator.Current.Value);
}
This will determine whether any child elements (not just foo) contain the desired value, and will print each matching element's name and its entire value. You didn't specify what the exact result should be, but this should get you started. If loading from a file, use XElement.Load(filename).
var xml = XElement.Parse(@"<abc>
<foo>data testing</foo>
<foo>test data</foo>
<bar>data value</bar>
</abc>");
// or to load from a file use this
// var xml = XElement.Load("sample.xml");
var query = xml.Elements().Where(e => e.Value.Contains("testing"));
if (query.Any())
{
foreach (var item in query)
{
Console.WriteLine("{0}: {1}", item.Name, item.Value);
}
}
else
{
Console.WriteLine("Value not found!");
}
You can use LINQ to XML:
string someXml = @"<abc>
<foo>data testing</foo>
<foo>test data</foo>
</abc>";
XDocument doc = XDocument.Parse(someXml);
bool containTesting = doc
    .Descendants("abc")
    .Descendants("foo")
    .Any(i => i.Value.Contains("testing"));
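Neither snippet above produces the "foo[1]" form the question asks for. As a rough sketch of one way to get it, the position can be computed by counting the preceding siblings that share the element's name. The NodePositionDemo class and FindLabels helper are names invented for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

static class NodePositionDemo
{
    // Returns labels such as "foo[1]" for each child of root whose text contains needle.
    public static List<string> FindLabels(XElement root, string needle)
    {
        return root.Elements()
            .Where(e => e.Value.Contains(needle))
            // XPath positions are 1-based and count same-named preceding siblings.
            .Select(e => $"{e.Name.LocalName}[{e.ElementsBeforeSelf(e.Name).Count() + 1}]")
            .ToList();
    }

    static void Main()
    {
        var xml = XElement.Parse(
            "<abc><foo>data testing</foo><foo>test data</foo><bar>data value</bar></abc>");
        foreach (string label in FindLabels(xml, "testing"))
        {
            Console.WriteLine(label); // foo[1]
        }
    }
}
```

Searching for "test" instead would yield foo[1] and foo[2], since both foo elements contain that substring.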
I need to store all the information from the XML in an array. My code doesn't work, because I always get just the first item from the XML.
Does anyone know how to fix this?
XDocument xdoc = XDocument.Load("http://www.thefaxx.de/xml/nano.xml");
var items = from item in xdoc.Descendants("items")
select new
{
Title = item.Element("item").Element("title").Value,
Description = item.Element("item").Element("description").Value
};
foreach (var item in items)
{
listView1.Items.Add(item.Title);
}
How about:
var items = from item in xdoc.Descendants("item")
select new
{
Title = item.Element("title").Value,
// *** NOTE: xml has "desc", not "description"
Description = item.Element("desc").Value
};
It is a little hard to be sure without sample xml - but it looks like you intend to loop over all the <item>...</item> elements - which is what the above does. Your original code loops over the (single?) <items>...</items> element(s), then fetches the first <item>...</item> from within it.
Edit, after looking at the XML: this would be more efficient:
var items = from item in xdoc.Root.Elements("item")
select new {
Title = item.Element("title").Value,
Description = item.Element("desc").Value
};
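The effect of querying items versus item can be reproduced on a small in-memory document. This is only a sketch: the XML below is a made-up stand-in (the real nano.xml layout is assumed, not checked), and ItemsVersusItemDemo and Sample are names invented for the example.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

static class ItemsVersusItemDemo
{
    // Hypothetical feed; the real nano.xml layout is only assumed here.
    public static XDocument Sample() => XDocument.Parse(
        "<items>" +
        "<item><title>First</title><desc>One</desc></item>" +
        "<item><title>Second</title><desc>Two</desc></item>" +
        "</items>");

    static void Main()
    {
        XDocument xdoc = Sample();

        // One <items> container, so a query over it yields a single result.
        Console.WriteLine(xdoc.Descendants("items").Count()); // 1
        // Two <item> elements, so a query over them yields one result each.
        Console.WriteLine(xdoc.Descendants("item").Count());  // 2

        foreach (XElement item in xdoc.Root.Elements("item"))
        {
            // The string cast returns null rather than throwing if a child is missing.
            string title = (string)item.Element("title");
            string desc = (string)item.Element("desc");
            Console.WriteLine($"{title}: {desc}");
        }
    }
}
```

Casting the element to string instead of reading .Value is a design choice for feeds where some items may lack a child element; .Value would throw a NullReferenceException in that case.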