I have XElement xDoc =
<div id="item123">
<div id="item456">
<h3 id="1483538342">
<span>Dessuten møtte</span>
</h3>
<p>Test!</p>
</div>
</div>
When I try to remove an item with id = "item456", I get an error:
System.NullReferenceException: Object reference not set to an instance of an object.
var item = "456";
xDoc.Descendants("div").Where(s => s.Attribute("id").Value == "item" + item).Remove();
I can't understand what is wrong here.
You need to check if the current element (inside the where iteration) has an id attribute, otherwise you will access a null object and get an exception.
var item = "456";
xDoc.Descendants("div").Where(s => s.Attribute("id") != null && s.Attribute("id").Value == "item" + item).Remove();
Your error means that some of the div elements do not have an id attribute, so s.Attribute("id") returns null, and trying to read its Value throws the exception. If you cast the attribute to string instead of accessing its Value, you will not get the error (the cast returns null when the attribute is not found):
xDoc.Descendants("div")
.Where(d => (string)d.Attribute("id") == "item" + item)
.Remove();
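To see the cast behave as described, here is a minimal sketch using LINQ to XML. The sample markup is made up for illustration (it adds a div without an id, which is the case that triggers the exception with .Value):

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical sample: the last div has no id attribute.
var xDoc = XElement.Parse(
    "<div id='item123'>" +
    "<div id='item456'><p>Test!</p></div>" +
    "<div><p>No id here</p></div>" +
    "</div>");

var item = "456";

// Casting a missing (null) XAttribute to string yields null instead of throwing,
// so elements without an id simply fail the comparison.
xDoc.Descendants("div")
    .Where(d => (string)d.Attribute("id") == "item" + item)
    .Remove();

Console.WriteLine(xDoc.Descendants("div").Count()); // 1 (only the div without an id is left)
```

The same query written with d.Attribute("id").Value would throw on the div without an id.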
Also, since you are dealing with HTML, I suggest using an appropriate tool: HtmlAgilityPack. Removing your div nodes will look like:
HtmlDocument doc = new HtmlDocument();
doc.Load(path_to_file);
foreach (var div in doc.DocumentNode.SelectNodes("//div[@id='item456']"))
div.Remove();
I have a list on a website that stores the part number and the order number.
The list is made up of several div elements, and I would like to export every part number in it.
The list looks like this:
<div class="spareValue">
<span class="label">OrderNumber:</span>
<span class="PartNumber">180011</span>
</div>
<div class="spareValue">
<span class="label">SparePartNumber:</span>
<span class="PartNumber">01002523</span>
</div>
How can I export every OrderNumber and put them into a list in C# so that I can work with the values?
There are many ways to do that:
var spans = driver.FindElements(By.CssSelector("div.spareValue span"));
var nbrspans = spans.Count;
var Listordernumber = new List<string>();
for(int i = 0; i < nbrspans; i +=2)
{
if (spans[i].GetAttribute("textContent") != "OrderNumber:") continue;
Listordernumber.Add(spans[i + 1].GetAttribute("textContent"));
}
Listordernumber now contains the result.
If you prefer LINQ, you could use this:
string path = "//div[@class='spareValue' and ./span[text()='OrderNumber:']]/span[@class='PartNumber']";
var Listordernumber = driver.FindElements(By.XPath(path)).Select(s => s.GetAttribute("textContent")).ToList();
OK, so you want every part number that belongs to an order number. That leads me to this XPath: find the div that has an "OrderNumber:" label in it, then find the PartNumber element inside it.
Then find them all (or none).
Finally, put them all in a neat list, selecting the part number text:
string path = "//div[@class='spareValue' and ./span[text()='OrderNumber:']]/span[@class='PartNumber']";
var elements = driver.FindElements(By.XPath(path));
var listOrderNumber = elements.Select(e => e.Text).ToList();
var els = driver.FindElements(By.XPath("//*[text()='OrderNumber:']"));
foreach (var el in els) {
// Go up to the parent div, then down to the sibling span that holds the number.
var partNumber = el.FindElement(By.XPath("./../span[@class='PartNumber']"));
Console.WriteLine("OrderNumber: " + partNumber.Text);
}
First, find all elements whose text is "OrderNumber:"; everything else is ignored.
Then iterate over those elements, go to each one's parent node, and find the element inside that parent whose class name is "PartNumber".
I am new in Webscraping and trying to get data from a website with HTMLAgilityPack using ASP.NET C#. HTML structure which I am trying to parse is:
<li class='subsubnav' id='new-women-clothing'>
<span class='cat-name'>CLOTHING</span>
<ul>
<li>Just In</li>
<li>Exclusives</li>
<li>Dresses & Gowns</li>
<li>Coats</li>
<li>Jackets</li>
<li>Shirts & Blouses</li>
<li>Tops</li>
<li>Knitwear</li>
<li>Sweatshirts</li>
<li>Skirts & Shorts</li>
<li>Trousers</li>
<li>Jumpsuits</li>
<li>Jeans</li>
<li>Swimwear</li>
<li>Lingerie</li>
<li>Nightwear</li>
<li>Sportswear</li>
<li>Ski Wear</li>
</ul>
</li>
I am getting the parent categories (in this case CLOTHING) perfectly, but I am unable to get the elements inside the ul.
Here is my C# code:
var html = new HtmlDocument();
html.LoadHtml(new WebClient().DownloadString("http://www.harrods.com/men/t-shirts?icid=megamenu_MW_clothing_t_shirts"));
var root = html.DocumentNode;
var nodes = root.Descendants();
var totalNodes = nodes.Count();
var dt = root.Descendants().Where(n => n.GetAttributeValue("class", "").Equals("cat-name"));
foreach(var x in dt)
{
foreach (var element in x.Descendants("ul"))
{
child_data.Add(new cat_childs(element.InnerText));
}
data.Add(new Categories(x.InnerText,child_data));
}
test.DataSource = data;
test.DataBind();
So how can I get the link and text of anchor tags inside <ul>?
If you want to base the iteration on the span with class='cat-name', note that the target ul is a following sibling of the span, not a descendant. You can use SelectNodes() to get the following-sibling elements from the current span, like so:
foreach (var x in dt)
{
foreach (var element in x.SelectNodes("following-sibling::ul/li/a"))
{
child_data.Add(new cat_childs(element.InnerText));
}
data.Add(new Categories(x.InnerText,child_data));
}
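To illustrate the sibling-versus-descendant point in isolation, here is a small sketch. It swaps HtmlAgilityPack for LINQ to XML with System.Xml.XPath so it runs without extra packages, and the markup (including the href values) is a made-up, well-formed stand-in for the real page:

```csharp
using System;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

// Hypothetical, simplified version of the menu markup.
var li = XElement.Parse(
    "<li class='subsubnav'>" +
    "<span class='cat-name'>CLOTHING</span>" +
    "<ul><li><a href='/coats'>Coats</a></li><li><a href='/jackets'>Jackets</a></li></ul>" +
    "</li>");

var span = li.Elements("span").First(s => (string)s.Attribute("class") == "cat-name");

// The ul is not inside the span, so searching the span's descendants finds nothing.
var ulInsideSpan = span.Descendants("ul").Count();

// The ul sits next to the span, so the following-sibling axis reaches it.
var links = span.XPathSelectElements("following-sibling::ul/li/a")
                .Select(a => (Text: a.Value, Href: (string)a.Attribute("href")))
                .ToList();

Console.WriteLine(ulInsideSpan);  // 0
Console.WriteLine(links.Count);   // 2
```

The same XPath expression is what gets passed to SelectNodes() in the HtmlAgilityPack version above.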
UPDATE :
It seems that the actual problem is that the child_data variable is declared outside the outer loop. That means you keep adding items to the same child_data instance. Try declaring it inside the outer loop, right after foreach (var x in dt){. Alternatively, you can write the entire code as a LINQ expression, something like this:
var data = (from x in dt
let child_data = x.SelectNodes("following-sibling::ul/li/a")
.Select(o => new cat_childs(o.InnerText))
.ToList()
select new Categories(x.InnerText, child_data)
).ToList();
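The effect of sharing one list across iterations can be reproduced without any HTML at all. In this sketch the category names and children are invented, and plain tuples stand in for the Categories and cat_childs classes:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in data: category name plus its child link texts.
var source = new (string Name, string[] Children)[]
{
    ("CLOTHING", new[] { "Coats", "Jackets" }),
    ("SHOES", new[] { "Boots" }),
};

// List declared outside the loop: every category ends up holding the same instance.
var shared = new List<string>();
var wrong = new List<(string Name, List<string> Children)>();
foreach (var category in source)
{
    shared.AddRange(category.Children);
    wrong.Add((category.Name, shared));
}

// List declared inside the loop: each category gets its own children.
var right = new List<(string Name, List<string> Children)>();
foreach (var category in source)
{
    var children = new List<string>(category.Children);
    right.Add((category.Name, children));
}

Console.WriteLine(wrong[0].Children.Count); // 3 (CLOTHING also shows the SHOES entry)
Console.WriteLine(right[0].Children.Count); // 2
```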
Use this XPath. It selects every <li> that contains a <span> with class='cat-name', and then picks all the <a> elements inside the <li> items of its <ul>.
//If the span has no influence on what you want you can simply use:
//HtmlNodeCollection hNC = htmlDoc.DocumentNode.SelectNodes("//ul/li/a");
HtmlNodeCollection hNC = htmlDoc.DocumentNode.SelectNodes("//li/span[@class='cat-name']/parent::*/ul/li/a");
foreach (HtmlNode h in hNC)
{
Console.Write(h.InnerText+" ");
Console.WriteLine(h.GetAttributeValue("href", ""));
}
I'm having trouble figuring out how to traverse the DOM with HTML Agility Pack.
For example let's say that I wanted to find an element with id="gbqfsa".
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(Url);
var foo = from bar in doc.DocumentNode.DescendantNodes()
where bar.Attributes["id"].Value == "gbqfsa"
select bar.InnerText;
Right now I'm doing this (above), but foo is coming out as null. What am I doing wrong?
EDIT: This is the if statement I was using. I was just testing to see if the element's InnerText equaled "Google Search".
if (foo.Equals("Google Search"))
{
HasSucceeded = 1;
MessageBox.Show(yay);
}
else
{
MessageBox.Show("kms");
}
return HasSucceeded;
What you should do is:
var foo = (from bar in doc.DocumentNode.DescendantNodes()
where bar.GetAttributeValue("id", null) == "gbqfsa"
select bar.InnerText).FirstOrDefault();
You forgot FirstOrDefault(), which selects the first element that satisfies the where condition. Without it, foo is the query itself (a sequence), not a string.
I also replaced Attributes["id"].Value with GetAttributeValue("id", null) so that no exception is thrown when an element does not have an id attribute.
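The difference between the query and its first result can be shown with plain LINQ, no HtmlAgilityPack needed. The node data below is invented for the example:

```csharp
using System;
using System.Linq;

// Hypothetical stand-in for parsed nodes: an id (possibly missing) and the inner text.
var nodes = new (string Id, string Text)[]
{
    (null, "header"),
    ("gbqfsa", "Google Search"),
};

var query = from n in nodes
            where n.Id == "gbqfsa"
            select n.Text;

// The query is a sequence object, so comparing it to a string is never true.
bool queryEquals = query.Equals("Google Search");

// FirstOrDefault() pulls out the first matching string (or null if there is none).
var foo = query.FirstOrDefault();
bool valueEquals = foo == "Google Search";

Console.WriteLine(queryEquals); // False
Console.WriteLine(valueEquals); // True
```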
I don't think foo is coming out as null. More likely, bar.Attributes["id"] is null for some of the nodes in the tree, since not all descendant nodes have an "id" attribute. I would recommend using the GetAttributeValue method, which returns a default value if the attribute is not found.
var foo = from bar in doc.DocumentNode.DescendantNodes()
where bar.GetAttributeValue("id", null) == "gbqfsa"
select bar.InnerText;
I was trying to remove a descendant element from an XElement (using .Remove()) and I seem to get a null object reference, and I'm not sure why.
Having looked at the previous question with this title (see here), I found a way to remove it, but I still don't see why the way I tried first didn't work.
Can someone enlighten me ?
String xml = "<things>"
+ "<type t='a'>"
+ "<thing id='100'/>"
+ "<thing id='200'/>"
+ "<thing id='300'/>"
+ "</type>"
+ "</things>";
XElement bob = XElement.Parse(xml);
// this doesn't work...
var qry = from element in bob.Descendants()
where element.Attribute("id").Value == "200"
select element;
if (qry.Count() > 0)
qry.First().Remove();
// ...but this does
bob.XPathSelectElement("//thing[@id = '200']").Remove();
Thanks,
Ross
The problem is that the collection you are iterating over contains some elements that don't have the id attribute (here, things and type). For them, element.Attribute("id") is null, so trying to access the Value property throws a NullReferenceException.
One way to solve this is to use a cast instead of Value:
var qry = from element in bob.Descendants()
where (string)element.Attribute("id") == "200"
select element;
If an element doesn't have the id attribute, the cast will return null, which works fine here.
And if you're doing a cast, you can just as well cast to an int?, if you want.
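As a rough sketch of the int? variant, using the XML from the question: elements without an id yield null, which never equals 200. One caveat worth keeping in mind is that the numeric cast throws a FormatException if an id is present but is not a valid integer.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

var bob = XElement.Parse(
    "<things><type t='a'>" +
    "<thing id='100'/><thing id='200'/><thing id='300'/>" +
    "</type></things>");

// (int?) on a missing attribute gives null, so <things> and <type> are skipped safely.
var match = bob.Descendants()
               .FirstOrDefault(e => (int?)e.Attribute("id") == 200);
match?.Remove();

var remainingIds = bob.Descendants("thing")
                      .Select(e => (int)e.Attribute("id"))
                      .ToList();

Console.WriteLine(string.Join(",", remainingIds)); // 100,300
```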
Try the following:
var qry = bob.Descendants()
.Where(el => el.Attribute("id") != null)
.Where(el => el.Attribute("id").Value == "200");
if (qry.Count() > 0)
qry.First().Remove();
You need to test for the presence of the id attribute before getting its value.
Consider the following html code:
<div id='x'><div id='y'>Y content</div>X content</div>
I'd like to extract only the content of 'x'. However, its innerText property includes the content of 'y' as well. I tried iterating over its children and all properties but they only return the inner tags.
How can I access through the IHTMLElement interface only the actual data of 'x'?
Thanks
Use something like:
function getText(el) {
var txt = el.innerHTML;
// Greedy match: strips everything from the first "<" to the last ">".
txt = txt.replace(/<.*>/g, "");
return txt;
}
Since el.innerHTML returns
<div id='y'>Y content</div>X content
the function getText would return
X content
Maybe this'll help.
Use the childNodes collection to get the child elements and text nodes.
You need to QI (QueryInterface) for IHTMLDOMNode from the IHTMLElement for that.
Here is the final code as suggested by Sheng (just a part of the sample, of course):
mshtml.IHTMLElementCollection c = ((mshtml.HTMLDocumentClass)(wbBrowser.Document)).getElementsByTagName("div");
foreach (IHTMLElement div in c)
{
if (div.className == "lyricbox")
{
IHTMLDOMNode divNode = (IHTMLDOMNode)div;
IHTMLDOMChildrenCollection children = (IHTMLDOMChildrenCollection)divNode.childNodes;
foreach (IHTMLDOMNode child in children)
{
Console.WriteLine(child.nodeValue);
}
}
}
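The idea of keeping only the direct text nodes can also be tried out without a browser control. This sketch is an analogue in LINQ to XML rather than mshtml, using the markup from the question:

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

var x = XElement.Parse("<div id='x'><div id='y'>Y content</div>X content</div>");

// Value concatenates the text of all descendants, much like innerText.
var allText = x.Value;

// Nodes() returns only the direct children; keep the text nodes and skip child elements.
var ownText = string.Concat(x.Nodes().OfType<XText>().Select(t => t.Value));

Console.WriteLine(allText); // Y contentX content
Console.WriteLine(ownText); // X content
```

In the mshtml code above, the equivalent filter would be to look only at the child nodes that are text nodes.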
Since innerText() doesn't work with IE, there is no real way, I guess.
Maybe try solving the issue server-side by generating the content the following way:
<div id='x'><div id='y'>Y content</div>X content</div>
<div id='x-plain'>_plain X content_</div>
"Plain X content" represents your C#-generated content for the element.
Now you can access the element by referring to getObject('x-plain').innerHTML.