How to parse string with regular expression - c#

How can I get a List<string> from this text:
For the iPhone do the following:<ul><li>Go to AppStore</li><li>Search by him</li><li>Download</li></ul>
The list should consist of:
Go to AppStore
Search by him
Download

Load the string into the HTML Agility Pack, then select the inner text of all li elements:
using System;
using System.Linq;
using HtmlAgilityPack;

var doc = new HtmlDocument();
doc.LoadHtml("For the iPhone do the following:<ul><li>Go to AppStore</li><li>Search by him</li><li>Download</li></ul>");
// InnerText of every <li> element, collected into a List<string>
var items = doc.DocumentNode.Descendants("li").Select(li => li.InnerText).ToList();
foreach (var item in items)
{
    Console.WriteLine(item);
}
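Because the question title asks about regular expressions, here is a minimal regex-based sketch using System.Text.RegularExpressions instead of an HTML parser. It assumes the <li> tags have no attributes and are not nested; the class name ListItemRegex is illustrative. For real-world HTML, the parser-based answers are more robust.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Text.RegularExpressions;

public static class ListItemRegex
{
    // Non-greedy match of everything between an opening <li> and the next </li>.
    // Assumes plain <li> tags with no attributes and no nested lists.
    private static readonly Regex LiPattern =
        new Regex(@"<li>(.*?)</li>", RegexOptions.IgnoreCase | RegexOptions.Singleline);

    public static List<string> Extract(string html)
    {
        return LiPattern.Matches(html)
            .Cast<Match>()
            // Decode entities such as &amp; in the captured text.
            .Select(m => WebUtility.HtmlDecode(m.Groups[1].Value))
            .ToList();
    }
}
```

Calling ListItemRegex.Extract on the question's string returns the three items in document order.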

Wrap in an XML root element and use LINQ to XML:
using System.Linq;
using System.Xml.Linq;

var xml = "For the iPhone do the following:<ul><li>Go to AppStore</li><li>Search by him</li><li>Download</li></ul>";
// wrap in a dummy root so the leading text is legal XML
xml = "<r>" + xml + "</r>";
var ele = XElement.Parse(xml);
var lists = ele.Descendants("li").Select(e => e.Value).ToList();
lists then contains:
Go to AppStore
Search by him
Download
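The LINQ to XML approach only works when the wrapped fragment is well-formed XML; HTML such as an unclosed <br> or an &nbsp; entity makes XElement.Parse throw an XmlException. The sketch below (the class ListItemXml and method TryExtract are hypothetical names) runs the same steps and returns null in that case, so a caller could fall back to an HTML parser.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

public static class ListItemXml
{
    // Wraps the fragment in a dummy root so text outside <ul> is legal XML,
    // then collects the value of every <li>. Returns null when the fragment
    // is not well-formed XML (e.g. an unclosed <br> or an &nbsp; entity).
    public static List<string> TryExtract(string fragment)
    {
        try
        {
            var root = XElement.Parse("<r>" + fragment + "</r>");
            return root.Descendants("li").Select(e => e.Value).ToList();
        }
        catch (XmlException)
        {
            return null;
        }
    }
}
```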

Related

HtmlAgilityPack doesn't get nodes

I am using HtmlAgilityPack in C#. When I try to select the images in a div on the page at the URL below, the selector finds nothing, although I think my selector is right.
The code is in the fiddle. Thanks.
https://dotnetfiddle.net/NNIC3X
var url = "https://fotogaleri.haberler.com/unlu-sarkici-imaj-degistirdi-gorenler-gozlerine/";
// I want the src values of the images inside the .col-big div at this URL.
var web = new HtmlWeb();
var doc = web.Load(url);
var htmlNodes = doc.DocumentNode.SelectNodes("//div[@class='col-big']//*/img");
// This should select all images inside div.col-big, but nothing is found.
foreach (var node in htmlNodes)
{
    Console.WriteLine(node.Attributes["src"].Value);
}
Your XPath is wrong because there is no div element whose class attribute has the value 'col-big'. There is, however, a div element whose class attribute has the value 'col-big pull-left', and @class='...' compares the whole attribute value. So try:
var htmlNodes = doc.DocumentNode.SelectNodes("//div[@class='col-big pull-left']//*/img");
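Matching one class among several needs a different predicate, because @class='...' tests the entire attribute string. A common XPath 1.0 idiom pads the normalized class attribute with spaces and searches for the token. The sketch below demonstrates it with LINQ to XML's XPath extensions on a made-up, well-formed snippet rather than the live page; HtmlAgilityPack's SelectNodes should accept the same expression. It uses //img instead of //*/img, because //*/img skips an img that is a direct child of the div. The class ClassTokenXPath is an illustrative name.

```csharp
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

public static class ClassTokenXPath
{
    // @class='col-big' compares the WHOLE attribute string, so it does not match
    // class="col-big pull-left".
    public const string ExactMatch = "//div[@class='col-big']//img";

    // Padding the normalized attribute with spaces and searching for " col-big "
    // matches the class token wherever it appears in the list.
    public const string TokenMatch =
        "//div[contains(concat(' ', normalize-space(@class), ' '), ' col-big ')]//img";

    // Counts the elements an XPath expression selects in a well-formed document.
    public static int CountMatches(string xhtml, string xpath)
    {
        var doc = XDocument.Parse(xhtml);
        return doc.XPathSelectElements(xpath).Count();
    }
}
```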

Parse Complete Web Page

How can I parse a complete HTML web page, not just specific nodes, using HTML Agility Pack or any other technique?
I am using this code, but it only parses specific nodes; I need the complete page parsed into neat, clean content.
List<string> list = new List<string>();
string url = "https://www.google.com";
HtmlWeb web = new HtmlWeb();
HtmlAgilityPack.HtmlDocument doc = web.Load(url);
foreach (HtmlNode node in doc.DocumentNode.SelectNodes("//a"))
{
    list.Add(node.InnerText);
}
To get all descendant text nodes, use something like:
var textNodes = doc.DocumentNode.SelectNodes("//text()").
Select(t=>t.InnerText);
To get all non-empty descendant text nodes:
var textNodes = doc.DocumentNode.
SelectNodes("//text()[normalize-space()]").
Select(t=>t.InnerText);
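To see what the [normalize-space()] predicate does, here is a small sketch using LINQ to XML's XPathEvaluate on an invented XHTML string. That is an assumption: real pages are often not well-formed XML, which is why HtmlAgilityPack is used for them. The XPath expression is the same one passed to SelectNodes above; the class name VisibleTextXPath is illustrative.

```csharp
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

public static class VisibleTextXPath
{
    // Parses a well-formed (XHTML) page, keeping the whitespace between tags as text
    // nodes, then returns the trimmed value of every text node with visible content.
    public static List<string> Extract(string xhtml)
    {
        var doc = XDocument.Parse(xhtml, LoadOptions.PreserveWhitespace);
        // The [normalize-space()] predicate drops text nodes that contain only whitespace.
        var result = (IEnumerable)doc.XPathEvaluate("//text()[normalize-space()]");
        return result.OfType<XText>().Select(t => t.Value.Trim()).ToList();
    }
}
```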
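Use SelectNodes("//*"). The * (asterisk) is the wildcard node test; combined with // it selects every element on the page, while a bare "*" only selects the element children of the current node. Use "//node()" if you also want text and comment nodes.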
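The difference between * and //* can be checked with a short sketch. It uses LINQ to XML on an invented document as a stand-in for HtmlAgilityPack's DocumentNode, which should behave the same way as the document root; the class WildcardXPath is a hypothetical name.

```csharp
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

public static class WildcardXPath
{
    // "*" is evaluated relative to the context node: from the document it returns only
    // the top-level element. "//*" returns every element at any depth, in document order.
    public static string[] Names(XDocument doc, string xpath) =>
        doc.XPathSelectElements(xpath).Select(e => e.Name.LocalName).ToArray();
}
```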

Finding a node using HTML Agility Pack

Here is the Google Chrome DevTools view of the element I'm looking for.
Here are all the different ways I have tried to get the nodes:
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(webObject.Html);
// HtmlNode footer = doc.DocumentNode.Descendants().SingleOrDefault(y => y. == "boardPickerInner");
// "//div[@class='boardPickerInner']"
//var y = (from HtmlNode node in doc.DocumentNode.SelectNodes("//")
// where node.InnerText == "boardPickerInner"
// select node.InnerHtml);
HtmlAgilityPack.HtmlNode bodyNode = doc.DocumentNode.SelectSingleNode("//nameAndIcons");
var xq = doc.DocumentNode.SelectSingleNode("//td[@class='nameAndIcons']");
var x = doc.DocumentNode.SelectSingleNode("");
HtmlNode nodes = doc.DocumentNode.SelectSingleNode("//[@class='nameAndIcons']");
var boards = nodes.SelectNodes("//*[@class='nameAndIcons']");
Can someone explain what I am doing wrong?
It looks like you have multiple span elements with class="nameAndIcons", so to get them all you could use the SelectNodes function:
var nodes = doc.DocumentNode.SelectNodes("//span[@class='nameAndIcons']");
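The attempts in the question fail for different reasons: //nameAndIcons looks for elements named nameAndIcons, and //[@class='nameAndIcons'] has no node test, so it is not valid XPath. A small sketch with LINQ to XML's XPath extensions on invented markup (the Count helper and class name are hypothetical) shows which expressions match:

```csharp
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

public static class NameAndIconsXPath
{
    // Returns how many elements the expression selects, or -1 when the
    // expression is not valid XPath (e.g. "//[@class='x']" has no node test).
    public static int Count(XDocument doc, string xpath)
    {
        try
        {
            return doc.XPathSelectElements(xpath).Count();
        }
        catch (XPathException)
        {
            return -1;
        }
    }
}
```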

HtmlAgilityPack: create HTML text node

In HtmlAgilityPack, I want to create an HtmlTextNode (which inherits from HtmlNode) that has a custom InnerText.
HtmlTextNode CreateHtmlTextNode(string name, string text)
{
    HtmlDocument doc = new HtmlDocument();
    HtmlTextNode textNode = doc.CreateTextNode(text);
    textNode.Name = name;
    return textNode;
}
The problem is that after the method above, textNode.OuterHtml and textNode.InnerHtml are both equal to the text argument.
For example, CreateHtmlTextNode("title", "blabla") produces
textNode.OuterHtml = "blabla" instead of <title>blabla</title>
Is there a better way to create an HtmlTextNode?
The following lines create an element whose outer HTML includes its content:
using HtmlAgilityPack;

var doc = new HtmlDocument();
// create the html document skeleton
var html = HtmlNode.CreateNode("<html><head></head><body></body></html>");
doc.DocumentNode.AppendChild(html);
// select the <head>
var head = doc.DocumentNode.SelectSingleNode("/html/head");
// create a <title> element
var title = HtmlNode.CreateNode("<title>Hello world</title>");
// append <title> to <head>
head.AppendChild(title);
// returns Hello world
var inner = title.InnerHtml;
// returns <title>Hello world</title>
var outer = title.OuterHtml;
Hope it helps.
An HtmlTextNode contains just text, no tags.
It's like the following:
<div> - HTML Node
<span>text</span> - HTML Node
This is the Text Node - Text Node
<span>text</span> - HTML Node
</div>
You're looking for a standard HtmlNode.
HtmlDocument doc = new HtmlDocument();
HtmlNode titleNode = doc.CreateElement("title"); // an element node, so OuterHtml includes the tags
titleNode.InnerHtml = HtmlDocument.HtmlEncode(text); // text: the string you want inside <title>
Be sure to call HtmlDocument.HtmlEncode() on the text you're adding. That ensures that special characters are properly encoded.
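The element/text-node distinction is not specific to HtmlAgilityPack. As an analogy, LINQ to XML's XText and XElement serialize the same way: a text node is only its escaped text, while an element wraps the text in its tags and escapes special characters. This sketch uses those BCL types, not the HtmlAgilityPack API; the class NodeKinds is an illustrative name.

```csharp
using System.Xml.Linq;

public static class NodeKinds
{
    // A text node serializes as just its (escaped) text: no tags.
    public static string TextNodeMarkup(string text) => new XText(text).ToString();

    // An element node serializes with start and end tags around its content;
    // characters such as < in the content are escaped automatically.
    public static string ElementMarkup(string name, string text) =>
        new XElement(name, text).ToString();
}
```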

Html Agility Pack C#: Expression must evaluate to a node-set

I'm using Html Agility Pack to fetch a webpage.
I want to collect all the TEXT I AM LOOKING FOR of the following form:
<li></li>
I tried this code:
var web = new HtmlWeb();
var doc = web.Load(url);
var nodes1 = doc.DocumentNode.SelectNodes("//[@data-address]");
var nodes2 = doc.DocumentNode.SelectNodes("//[@data-address={0}]");
Both threw an exception: "Expression must evaluate to a node-set."
How can I correct my selector?
I'm not an XPath expert by any means, but I suspect you want:
// Note the *
var nodes1 = doc.DocumentNode.SelectNodes("//*[@data-address]");
In other words: "any element with a data-address attribute".
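A sketch of both selectors, using LINQ to XML's XPath extensions on invented markup with made-up data-address values. The second expression also needs quotes around the value; the helper (a hypothetical name, like the class) assumes the value contains no single quote.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

public static class DataAddressXPath
{
    // "//*[@data-address]": any element that has a data-address attribute.
    public static List<string> WithAttribute(XDocument doc) =>
        doc.XPathSelectElements("//*[@data-address]").Select(e => e.Value).ToList();

    // "//*[@data-address='...']": only elements whose attribute equals the given value.
    // The value is wrapped in single quotes, so it must not contain one itself.
    public static List<string> WithValue(XDocument doc, string value) =>
        doc.XPathSelectElements(string.Format("//*[@data-address='{0}']", value))
           .Select(e => e.Value)
           .ToList();
}
```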
