c# HtmlAgilityPack for on nodes array

I'm using Html Agility Pack, and I have an array of nodes:
HtmlNode[] nodes = document.DocumentNode.SelectNodes("//tbody[@class='table']").ToArray();
Now I want to run a for loop over each nodes[i]. I've tried this:
for (int i = 0; i < 1; i++)
{
    if (t == null)
        t = new Model.Track();
    HtmlNode[] itemText = nodes[i].SelectNodes("//td[@class='artist']").ToArray();
    for (int x = 0; x < itemText.Length; x++)
    {
        // doing something
    }
}
The problem is that the itemText array isn't scoped to nodes[i]. Instead, it contains every match for "//td[@class='artist']" in the whole HTML document.
How can I fix this?

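Using //td[@class='artist'] fetches every column with the artist class in the whole document, because an XPath expression that starts with // is evaluated from the document root (document.DocumentNode), regardless of which node you call SelectNodes on.
Using .//td[@class='artist'] (notice the dot at the beginning) fetches only the artist columns under the currently selected node, which in your case is nodes[i].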
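The difference between the two expressions can be sketched without Html Agility Pack. The snippet below uses System.Xml's XmlDocument as a stand-in, on the assumption that its XPath scoping behaves the same way here; the sample markup and variable names are made up for illustration.

```csharp
using System;
using System.Xml;

// Stand-in document: two tables, each with its own "artist" cells.
var doc = new XmlDocument();
doc.LoadXml(
    "<root>" +
    "<tbody class='table'><td class='artist'>A</td><td class='artist'>B</td></tbody>" +
    "<tbody class='table'><td class='artist'>C</td></tbody>" +
    "</root>");

// Take the first table as the context node, like nodes[i] in the question.
XmlNode first = doc.SelectNodes("//tbody[@class='table']")[0];

// "//" starts at the document root, even when called on a child node.
int absoluteCount = first.SelectNodes("//td[@class='artist']").Count;

// ".//" starts at the context node, so only its descendants match.
int relativeCount = first.SelectNodes(".//td[@class='artist']").Count;

Console.WriteLine($"absolute: {absoluteCount}, relative: {relativeCount}");
```

With this sample, the absolute query sees the cells of both tables, while the relative query sees only the cells inside the first one.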

Related

Split PDF by chapters from Table Of Contents

I'm using GemBox.Pdf and I need to extract the individual chapters of a PDF file as separate PDF files.
The first page (and maybe the second page as well) contains the TOC (Table Of Contents), and I need to split the rest of the PDF pages based on it.
Also, the split PDF documents should be named after the chapters they contain.
I can split the PDF based on a fixed number of pages per document (I figured that out using this example):
using (var source = PdfDocument.Load("Chapters.pdf"))
{
    int pagesPerSplit = 3;
    int count = source.Pages.Count;
    // Start at index 1 to skip the TOC page.
    for (int index = 1; index < count; index += pagesPerSplit)
    {
        using (var destination = new PdfDocument())
        {
            // Stop early on the last chunk so we never read past the final page.
            for (int splitIndex = 0; splitIndex < pagesPerSplit && index + splitIndex < count; splitIndex++)
                destination.Pages.AddClone(source.Pages[index + splitIndex]);
            destination.Save("Chapter " + index + ".pdf");
        }
    }
}
But I can't figure out how to read and process the TOC and split the chapters based on its items.

You should iterate through the document's bookmarks (outlines) and split the document based on the bookmarks' destination pages.
For instance, try this:
// Requires: using System.Collections.Generic; using System.Linq; using GemBox.Pdf;
using (var source = PdfDocument.Load("Chapters.pdf"))
{
    PdfOutlineCollection outlines = source.Outlines;
    PdfPages pages = source.Pages;

    // Map each page to its zero-based index in the document.
    Dictionary<PdfPage, int> pageIndexes = pages
        .Select((page, index) => new { page, index })
        .ToDictionary(item => item.page, item => item.index);

    for (int index = 0, count = outlines.Count; index < count; ++index)
    {
        PdfOutline outline = outlines[index];
        PdfOutline nextOutline = index + 1 < count ? outlines[index + 1] : null;

        // A chapter runs from its own bookmark's page up to the next bookmark's page
        // (or to the end of the document for the last chapter).
        int pageStartIndex = pageIndexes[outline.Destination.Page];
        int pageEndIndex = nextOutline != null ?
            pageIndexes[nextOutline.Destination.Page] :
            pages.Count;

        using (var destination = new PdfDocument())
        {
            while (pageStartIndex < pageEndIndex)
            {
                destination.Pages.AddClone(pages[pageStartIndex]);
                ++pageStartIndex;
            }
            destination.Save($"{outline.Title}.pdf");
        }
    }
}
Note: from the screenshot, it seems that your chapter bookmarks include an ordinal number (in Roman numerals). If needed, you can easily remove it with something like this:
destination.Save($"{outline.Title.Substring(outline.Title.IndexOf(' ') + 1)}.pdf");
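As a minimal sketch of that title clean-up, the helper below drops everything up to the first space and also replaces characters that are not allowed in file names. ToFileName is a hypothetical name introduced for illustration (it is not part of GemBox.Pdf), and the sample titles are assumptions about what the bookmarks might look like.

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical helper: strips the leading numeral (everything up to the first
// space), then replaces characters that are invalid in file names with '_'.
string ToFileName(string title)
{
    // IndexOf returns -1 when there is no space, so the whole title is kept.
    string name = title.Substring(title.IndexOf(' ') + 1);
    char[] invalid = Path.GetInvalidFileNameChars();
    return new string(name.Select(c => invalid.Contains(c) ? '_' : c).ToArray()) + ".pdf";
}

Console.WriteLine(ToFileName("IV. Results"));
Console.WriteLine(ToFileName("Appendix"));
```

The result could then be passed to destination.Save in place of the interpolated string.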

Xml searching recursively for a specific value

In my C# app I am reading an XML document that contains tags holding the paths to where .png and .jpg files are kept. These tags are <Image>, <BackgroundImage>, and <ForegroundImage>.
I could simply create an XmlNodeList object for each of these tags, such as
XmlNodeList image = _doc.GetElementsByTagName("Image");
XmlNodeList background = _doc.GetElementsByTagName("BackgroundImage");
XmlNodeList foreground = _doc.GetElementsByTagName("ForegroundImage");
for (int i = 0; i < image.Count; i++)
{
    //..code
}
for (int i = 0; i < background.Count; i++)
{
    //..code
}
for (int i = 0; i < foreground.Count; i++)
{
    //..code
}
Clunky, I know. But is there a way to have the application recursively find every tag whose name contains the word "Image" and return the results as a single XmlNodeList? Can it be done? Would this be the best approach? Many thanks in advance.
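One possible approach, sketched here as an assumption rather than an answer from the thread, is a single XPath query that matches any element whose name contains "Image". The sample XML below is made up for illustration.

```csharp
using System;
using System.Xml;

// Made-up sample: image tags at different depths, plus one unrelated tag.
var doc = new XmlDocument();
doc.LoadXml(
    "<Page>" +
    "<Image>a.png</Image>" +
    "<Panel><BackgroundImage>b.jpg</BackgroundImage></Panel>" +
    "<ForegroundImage>c.png</ForegroundImage>" +
    "<Title>ignored</Title>" +
    "</Page>");

// "//*" visits every element at any depth; contains(local-name(), 'Image')
// keeps only those whose tag name includes "Image".
XmlNodeList imageNodes = doc.SelectNodes("//*[contains(local-name(), 'Image')]");

foreach (XmlNode node in imageNodes)
    Console.WriteLine($"{node.Name}: {node.InnerText}");
```

Note that contains also matches names such as ImageList, so a stricter test may be needed if the document has other tags with "Image" in their names.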

trouble with adding child node C#

I am adding a child node to the current parent node in a TreeView. My problem is that the new node is added at the end of the current parent instead of at the position where the if condition is true.
Here is my code:
for (int i = 0; i < num; i++)
{
    if (action_type1 != action_type2)
    {
        TreeNode new_node = treeView1.Nodes[0].Nodes[position];
        string new_name = "";
        new_node.Nodes.Add(new_name);
    }
}
Here num, position, action_type1, and action_type2 are variables in my code, and they hold different integers and strings on each pass through the loop. action_type1 is the name of a node in the TreeView and action_type2 is a fixed string. The if statement checks the whole tree: if a node matches the given string it is left alone; otherwise an empty node is inserted into the tree, and this is repeated recursively.
To keep it simple, assume we have:
int num = 2;
int position = 4;
string action_type1;
string action_type2;

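Is this what you want? Nodes.Add always appends to the end of the collection, whereas Nodes.Insert places the new node at the given index:
for (int i = 0; i < num; i++)
{
    if (action_type1 != action_type2)
    {
        // virtual_name is the text of the new node (an empty string in the question).
        treeView1.Nodes[i].Nodes.Insert(position - 1, virtual_name);
    }
}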
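The append-versus-insert distinction can be illustrated without WinForms. The sketch below uses a List<string> as a stand-in for a node collection; the values and the position are made up, and the assumption is that TreeNodeCollection's Add and Insert follow the same ordering rules.

```csharp
using System;
using System.Collections.Generic;

// Stand-in for the children of a tree node.
var children = new List<string> { "a", "b", "c", "d" };
int position = 2;

children.Add("appended");               // always goes to the end
children.Insert(position, "inserted");  // goes to index 2, shifting later items

Console.WriteLine(string.Join(", ", children));
```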

for loop "index was out of range" c# webdriver

I am getting "index out of range" from this loop. But I need to use the new elements that the loop finds. How do I do that? Please help me fix the problem.
int linkCount = driver.FindElements(By.CssSelector("a[href]")).Count;
string[] links = new string[linkCount];
for (int i = 0; i < linkCount; i++)
{
    List<IWebElement> linksToClick = driver.FindElements(By.CssSelector("a[href]")).ToList();
    links[i] = linksToClick[i].GetAttribute("href");
}

I think that you could refactor your code:
var linkElements = driver.FindElements(By.CssSelector("a[href]")).ToList();
var links = new List<string>();
foreach (var elem in linkElements)
{
    links.Add(elem.GetAttribute("href"));
}
If that works, you could simplify the query:
var instantLinks = driver.FindElements(By.CssSelector("a[href]"))
    .Select(e => e.GetAttribute("href"))
    .ToList();

You can rewrite your code to bypass the for loop:
string[] links = driver.FindElements(By.CssSelector("a[href]")).Select(l => l.GetAttribute("href")).ToArray();
This should also avoid the index out of range problem, and cut down the amount of code you have to write.

First of all, I don't see the point of assigning linksToClick inside the loop. The reason for the error must be that the linksToClick list ends up shorter than linkCount, so linksToClick[i] no longer exists.

int linkCount = driver.FindElements(By.CssSelector("a[href]")).Count;
List<string> links = new List<string>();
for (int i = 0; i < linkCount; i++)
{
    List<IWebElement> linksToClick = driver.FindElements(By.CssSelector("a[href]")).ToList();
    // Only read index i if the freshly fetched list actually has that many elements.
    if (i < linksToClick.Count)
        links.Add(linksToClick[i].GetAttribute("href"));
}
This might help with the out of range exception.
Using a List<string> also means you don't have to define the size of the collection up front.

The first call gets all of your elements by tag name. Let's assume there are 5.
In the loop, your driver gets all the elements by CSS selector, and you might get a different number here. Let's say 4.
Then you might be trying to read the fifth element of a four-element list.
Boom.
Easiest way to debug it:
int linkCount = driver.FindElements(By.TagName("a")).Count;
string[] links = new string[linkCount];
// Write out how many links you have
for (int i = 0; i < linkCount; i++)
{
    List<IWebElement> linksToClick = driver.FindElements(By.CssSelector("a[href]")).ToList();
    // Assert that you have the same number here
    if (links.Length != linksToClick.Count)
    {
        // your logic here
    }
    links[i] = linksToClick[i].GetAttribute("href");
}

Why do I need to count the number of XmlNodes before iterating through and deleting some of them?

I believe I have found a weird bug, as follows.
I want to delete the first two nodes in an XmlNodeList.
I know there are surely other ways of doing this. What I am interested in is why one of the code segments below works and the other doesn't, when the only difference is the Count line.
var strXml = @"<food><fruit type=""apple""/><fruit type=""pear""/><fruit type=""banana""/></food>";
XmlDocument doc = new XmlDocument();
doc.LoadXml(strXml);
XmlNodeList nlFruit = doc.SelectNodes("food/fruit");
for (int i = 0; i < 2; i++)
{
    // This produces a null reference exception:
    nlFruit[i].ParentNode.RemoveChild(nlFruit[i]);
}
However, if I count the number of nodes in the XmlNodeList first, it works and I am left with the desired outcome:
var strXml = @"<food><fruit type=""apple""/><fruit type=""pear""/><fruit type=""banana""/></food>";
XmlDocument doc = new XmlDocument();
doc.LoadXml(strXml);
XmlNodeList nlFruit = doc.SelectNodes("food/fruit");
// Count the nodes..
Debug.WriteLine(nlFruit.Count);
for (int i = 0; i < 2; i++)
{
    nlFruit[i].ParentNode.RemoveChild(nlFruit[i]);
}
// doc is now: <food><fruit type="banana" /></food>

Both are wrong. You should delete from the end:
for (int i = 1; i >= 0; i--)
{
    nlFruit[i].ParentNode.RemoveChild(nlFruit[i]);
}
This is because when you remove element 0, element 1 becomes the new element 0. On the next iteration you then try to remove element 1, which is null.
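Another way to sidestep the problem, sketched here under assumptions, is to copy the nodes you want to delete into an ordinary list before modifying the tree. A possible explanation for the Count difference (an assumption, not something the answers confirm) is that the list returned by SelectNodes is filled lazily, and calling Count forces it to be read in full before anything is removed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml;

var doc = new XmlDocument();
doc.LoadXml(@"<food><fruit type=""apple""/><fruit type=""pear""/><fruit type=""banana""/></food>");

// Copy the first two matches into a plain list before touching the tree,
// so later removals cannot affect which nodes we hold.
List<XmlNode> toRemove = doc.SelectNodes("food/fruit")
    .Cast<XmlNode>()
    .Take(2)
    .ToList();

foreach (XmlNode node in toRemove)
    node.ParentNode.RemoveChild(node);

Console.WriteLine(doc.OuterXml);
```

Because the snapshot is taken up front, the loop order no longer matters.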

Maybe this will help:
Halloween Problem: http://blogs.msdn.com/mikechampion/archive/2006/07/20/672208.aspx
