Select single table row using HtmlAgilityPack and iterate its links


I am trying to iterate over a single table row and its a href links, but it does not work as expected: instead of finding only the links in the selected row, it finds all links in the table. What am I doing wrong?
var allRows = doc.DocumentNode.SelectNodes("//table[@id='sortingTable']/tr");
var i = 0;
var rowNumber = 0;
foreach (var row in allRows)
{
    if (row.InnerText.Contains("Text in cell for which row I want to use"))
    {
        rowNumber = i + 1;
        break;
    }
    i += 1;
}
var list = new List<SortFile>();
var rowToRead = allRows[rowNumber]; // One specific row
var numberOfLinks = rowToRead.SelectNodes("//a[@href]"); // this does not find the 2 links in the table row but all links in the whole table?
foreach (HtmlNode link in rowToRead.SelectNodes("//a[@href]"))
{
    //HtmlAttribute att = link.Attributes["href"];
    //var text = link.OuterHtml;
}

The XPath you are using (//a[@href]) gets all of the links in the document: a path that starts with // searches from the document root, no matter which node you call SelectNodes on.
You should use .//a[@href] instead, which starts from the current node and selects only the links underneath the tr node you have selected.
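The absolute-versus-relative rule is part of the XPath language itself, so it can be reproduced without HtmlAgilityPack. Here is a minimal sketch that uses the standard library's XmlDocument as a stand-in; the well-formed markup and the XPathContextDemo name are hypothetical. It takes one row as the context node and runs both expressions from it.

```csharp
using System.Xml;

public static class XPathContextDemo
{
    // Loads the markup, takes the second row of the table as the context node,
    // and counts the links each XPath form finds from that row.
    public static (int Absolute, int Relative) CountLinksFromRow(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        XmlNode row = doc.SelectNodes("//table[@id='sortingTable']/tr")[1];

        // Starts with "//": evaluated from the document root, so every link matches.
        int absolute = row.SelectNodes("//a[@href]").Count;

        // Starts with ".//": evaluated from the context node, so only this row's links match.
        int relative = row.SelectNodes(".//a[@href]").Count;

        return (absolute, relative);
    }
}
```

For a three-row table whose middle row holds two of the four links, this returns 4 for the absolute path and 2 for the relative one.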

Related

Foreach not iterating through elements

I have an HTML document and I'm getting elements based on a class. Once I have them, I go through each element and get further elements from it:
var doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(content);
var customers = new List<Customer>();
var rows = doc.DocumentNode.SelectNodes("//tr[contains(@class, 'row')]");
foreach (var row in rows)
{
    var name = row.SelectSingleNode("//span[contains(@class, 'name')]").InnerText;
    var surname = row.SelectSingleNode("//span[contains(@class, 'surname')]").InnerText;
    customers.Add(new Customer(name, surname));
}
However, the above iterates through the rows but always retrieves the text of the first row.
Is the XPath wrong?

This is a FAQ in XPath. Whenever your XPath starts with /, it ignores the context element (the element referenced by the row variable in this case) and searches for matching elements starting from the root document node. That's why your SelectSingleNode() always returns the same element: the first matching element in the entire document.
You only need to prepend a dot (.) to make it relative to the current context element:
foreach (var row in rows)
{
    var name = row.SelectSingleNode(".//span[contains(@class, 'name')]").InnerText;
    var surname = row.SelectSingleNode(".//span[contains(@class, 'surname')]").InnerText;
    customers.Add(new Customer(name, surname));
}

What about using LINQ?
var customers = rows.Select(row => new Customer(
        row.SelectSingleNode(".//span[contains(@class, 'name')]").InnerText,
        row.SelectSingleNode(".//span[contains(@class, 'surname')]").InnerText))
    .ToList();
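A LINQ projection has the same pitfall as the loop: without the leading dot, every row reads the first match in the whole document. The sketch below shows the projection end to end, again with the standard library's XmlDocument standing in for HtmlAgilityPack; the Customer record, CustomerParser and HasClass names are hypothetical. It also swaps contains(@class, 'name') for a whole-token class test, because contains(@class, 'name') is also true for class="surname" and would pick the wrong span whenever the surname comes first.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml;

public record Customer(string Name, string Surname);

public static class CustomerParser
{
    // Builds an XPath predicate that matches the whole class token `cls`,
    // so "name" does not also match class="surname".
    static string HasClass(string cls) =>
        $"contains(concat(' ', normalize-space(@class), ' '), ' {cls} ')";

    public static List<Customer> Parse(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        return doc.SelectNodes("//tr[contains(@class, 'row')]")
            .Cast<XmlNode>() // XmlNodeList is non-generic, so cast before using LINQ
            .Select(row => new Customer(
                // The leading '.' keeps each lookup inside the current row.
                row.SelectSingleNode($".//span[{HasClass("name")}]").InnerText,
                row.SelectSingleNode($".//span[{HasClass("surname")}]").InnerText))
            .ToList();
    }
}
```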

Using Selenium to click on several links one after another

I have a table on a web page with many repeating values, like this:
Description    App Name    Information
Some Desc1     App1        Some Info
Some Desc2     App2        Some Info
Some Desc3     App2        Some Info
Some Desc4     App3        Some Info
Some Desc5     App4        Some Info
At startup, my app asks the user to enter an app name of their choice. If I choose App2, it should first select "Some Desc2", which leads to another page where I do something. Then it should go back to the previous page and this time select "Some Desc3", which also leads to another page. This should repeat until Selenium can no longer find the specified app name.
I have tried the following:
//Finding table, its rows and columns
int rowcount = driver.FindElements(By.Id("someid")).Count;
for (int i = 0; i < rowcount; i++)
{
    //Finding app name based on user-entered text
    var elems = driver.FindElements(By.PartialLinkText(text));
    IList<IWebElement> list = elems;
    for (int j = 0; j < list.Count; j++)
    {
        var table = driver.FindElement(By.Id("someid"));
        IList<IWebElement> rows = table.FindElements(By.TagName("tr"));
        IList<IWebElement> cells = rows[i].FindElements(By.TagName("td"));
        //Again finding element based on user-entered text
        var elem = driver.FindElements(By.PartialLinkText(text));
        list = elem;
        if (list[1].Text.Equals(text))
        {
            list[0].Click();
            string duration;
            string price;
            var elements = driver.FindElements(By.Id("SPFieldNumber"));
            IList<IWebElement> lists = elements;
            duration = lists.First().Text.ToString();
            price = lists.ElementAt(1).Text.ToString();
            MessageBox.Show(duration);
            MessageBox.Show(price);
            driver.Navigate().Back();
        }
    }
}
Running this code selects "Some Desc2" correctly and everything works. But after returning to the previous page, C# throws an exception: "element not found in the cache - perhaps the page has changed since it was looked up".

For this particular issue: you find the table and row elements before the loop, and then you call driver.Navigate().Back() inside the loop. Navigating changes the page, so the DOM is rebuilt and the table and row elements you found earlier are no longer part of it; the table on the page is not the one you found outside the loop anymore.
Try putting those lookups inside the loop so they are repeated after every navigation:
int rowCount = driver.FindElements(By.CssSelector("#table_id tr")).Count; // replace table_id with the id of your table
for (int i = 0; i < rowCount; i++)
{
    // Re-find the table and its rows on every iteration, because Navigate().Back() reloads the page
    var table = driver.FindElement(By.Id("table_id"));
    IList<IWebElement> rows = table.FindElements(By.TagName("tr"));
    // the rest of the code
}
However, apart from solving this problem, I really suggest you read the Selenium documentation and learn some basic C# programming first; this will save you a lot of time asking questions here.
Why are you doing this every time?
var elems = driver.FindElements(By.PartialLinkText(text));
IList<IWebElement> list = elems;
// IList<IWebElement> list = driver.FindElements(By.PartialLinkText(text));
element.Text is already the string you want, so there is no need to call ToString():
lists.First().Text.ToString();
// lists.First().Text;
You don't need this if there are no frames involved:
driver.SwitchTo().DefaultContent();
(from your earlier post) A list of IWebElement will never be equal to a string, and list.Equals(text) returns a bool, not an element. Avoid using var if you don't know what type you want, as it may give you something completely different:
IList<IWebElement> list = elems;
var elem= list.Equals(text);
(from your earlier post) element.ToString() and element.Text are different:
string targetele = elem.ToString(); // you want elem.Text;

How to read tables from a particular place in a document?

When I use the line below, it reads all the tables in the document:
foreach (Microsoft.Office.Interop.Word.Table tableContent in document.Tables)
But I want to read only the tables in a particular part of the document, for example from one identifier to another.
The identifiers look like [SRS oraganisation_123] (start) and [SRS Oraganisation_456] (end).
I want to read only the tables between these two identifiers.
Suppose page 34 contains my first identifier: I want to read all tables from that point until I come across my second identifier, and I don't want to read the remaining tables.
Please ask if anything in the question needs clarification.

Say the start and end identifiers are stored in variables called myStartIdentifier and myEndIdentifier:
Range myRange = doc.Range();
int iTagStartIdx = 0;
int iTagEndIdx = 0;
if (myRange.Find.Execute(myStartIdentifier))
iTagStartIdx = myRange.Start;
myRange = doc.Range();
if (myRange.Find.Execute(myEndIdentifier))
iTagEndIdx = myRange.Start;
foreach (Table tbl in doc.Range(iTagStartIdx,iTagEndIdx).Tables)
{
// Your code goes here
}

Not sure how your program is structured... but if you can access the identifier in tableContent then you should be able to write a LINQ query.
var identifiers = new List<string>();
identifiers.Add("myIdentifier");
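var tablesWithOnlyTheIdentifiersIWant = document.Tables
    .Cast<Microsoft.Office.Interop.Word.Table>() // Tables is a non-generic COM collection
    .Where(tableContent => identifiers.Contains(tableContent.Identifier)); // "Identifier" is a placeholder for however you read a table's identifier; it is not a Word Table property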
foreach(var tableContent in tablesWithOnlyTheIdentifiersIWant)
{
//Do something
}

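Go through the following code and see if it helps.
// Assumption: newTable is the Word table you are reading and its first row is the header row (Word collections are 1-based)
Microsoft.Office.Interop.Word.Row r = newTable.Rows[1];
System.Data.DataTable dt = new System.Data.DataTable();
foreach (Microsoft.Office.Interop.Word.Cell c in r.Cells)
{
    // Cell text ends with the end-of-cell marker "\r\a", so strip it before comparing
    string header = c.Range.Text.TrimEnd('\r', '\a');
    if (header == "Content you want to compare")
        dt.Columns.Add(header);
}
foreach (Microsoft.Office.Interop.Word.Row row in newTable.Rows)
{
    System.Data.DataRow dr = dt.NewRow();
    int i = 0;
    foreach (Microsoft.Office.Interop.Word.Cell cell in row.Cells)
    {
        string text = cell.Range.Text.TrimEnd('\r', '\a');
        if (!string.IsNullOrEmpty(text) && text == "Text you want to compare with" && i < dt.Columns.Count)
        {
            dr[i] = text;
        }
        i++; // advance the column index for every cell, not once per row
    }
    dt.Rows.Add(dr);
}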
See also the third answer to "Replace bookmark text in Word file using Open XML SDK".

Adding text to multiple rows in Word with a single bookmark

Is it possible to add several rows with the help of Bookmarks and openXML to a word document?
We have a Word document that serves as a report template.
In that template we need to add several transaction rows.
The problem is that the number of rows isn't fixed. It could be 0, 1 or 42, for example.
In the current template (which we can change) we have added three bookmarks:
TransactionPart, TransactionPart2 and TransactionPart3.
The three transaction parts form a single row with three different data fields (ID, Description, Amount).
If we have just one transaction row we have no problem adding the data to those bookmarks, but what do we do when we need to add a second row? There are no bookmarks for more rows.
Is there a smart way of doing this?
Or should we change the Word document so that the rows end up in a table? Would that solve the problem in a better way?

I would put a single bookmark, let's call it "transactions", inside a three-column table.
When you know the design of the table but not the number of rows you'll need, the simplest way is to add a row for each line of data you have.
You could accomplish that with code like this:
using System;
using System.Collections.Generic;
using System.Linq;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

//make some data.
List<String[]> data = new List<string[]>();
for (int i = 0; i < 10; i++)
    data.Add(new String[] { "this", "is", "sparta" });
using (WordprocessingDocument wordDoc = WordprocessingDocument.Open("yourDocument.docx", true))
{
    var mainPart = wordDoc.MainDocumentPart;
    var bookmarks = mainPart.Document.Body.Descendants<BookmarkStart>();
    var bookmark =
        from n in bookmarks
        where n.Name == "transactions"
        select n;
    OpenXmlElement elem = bookmark.First().Parent;
    //walk up to the table that contains the bookmark
    while (!(elem is Table))
        elem = elem.Parent;
    var table = elem; //found
    //save the row you want to copy for each line of data
    var oldRow = table.Elements<TableRow>().Last();
    TableRow row = (TableRow)oldRow.Clone();
    //remove the old (template) row
    table.RemoveChild<TableRow>(oldRow);
    foreach (String[] s in data)
    {
        TableRow newrow = (TableRow)row.Clone();
        var cells = newrow.Elements<TableCell>();
        //we know we have 3 cells; each template cell must already contain a run with text
        for (int i = 0; i < cells.Count(); i++)
        {
            var c = cells.ElementAt(i);
            var run = c.Elements<Paragraph>().First().Elements<Run>().First();
            var text = run.Elements<Text>().First();
            text.Text = s[i];
        }
        table.AppendChild(newrow);
    }
    mainPart.Document.Save();
}
You end up with one table row per line of data.
I've tested this code on a pretty basic document and know it works.
Good luck and let me know if I can clarify further.
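To see the clone-the-template-row idea without the Open XML SDK, here is a minimal sketch that applies the same technique with LINQ to XML (System.Xml.Linq) directly on WordprocessingML markup, i.e. the contents of a .docx's word/document.xml. The TemplateRowFiller and Fill names are hypothetical, and it assumes every template cell already contains at least one w:t text element. It also strips bookmarkStart/bookmarkEnd from each copy, so that if the bookmark sits in the template row, its name is not duplicated in every new row.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class TemplateRowFiller
{
    // WordprocessingML main namespace: the "w:" prefix used in word/document.xml.
    static readonly XNamespace W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Finds the table containing the named bookmark, uses its last row as a template,
    // removes that row, and appends one filled-in copy per data item.
    public static void Fill(XDocument doc, string bookmarkName, IEnumerable<string[]> data)
    {
        XElement bookmark = doc.Descendants(W + "bookmarkStart")
            .First(b => (string)b.Attribute(W + "name") == bookmarkName);
        XElement table = bookmark.Ancestors(W + "tbl").First();

        XElement templateRow = table.Elements(W + "tr").Last();
        templateRow.Remove();

        foreach (string[] values in data)
        {
            var newRow = new XElement(templateRow); // deep copy of the template row

            // Drop bookmark markers from the copy so the bookmark name is not duplicated.
            newRow.Descendants()
                .Where(e => e.Name == W + "bookmarkStart" || e.Name == W + "bookmarkEnd")
                .Remove();

            List<XElement> cells = newRow.Elements(W + "tc").ToList();
            for (int i = 0; i < cells.Count && i < values.Length; i++)
            {
                // Assumes each template cell holds at least one w:t element.
                cells[i].Descendants(W + "t").First().Value = values[i];
            }
            table.Add(newRow);
        }
    }
}
```

Working on the raw XML makes the structure visible (table, rows, cells, text runs), but for real documents the SDK code in the answer is the safer route, since it handles opening and saving the .docx package for you.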

How to read rows from Lucene.Net's index files

I am using Lucene.Net version 2.0.5.
I want to open and display all the records in the index file in a grid (table) format in an ASP.NET web application, and also provide an edit option for each cell in that grid.
But I don't know how to read each row from the index file.
I used the code below:
private List<String> GetIndexTerms(string indexFolder)
{
List<String> termlist = new List<string>();
IndexReader reader = IndexReader.Open(indexFolder, false);
TermEnum terms = reader.Terms();
while (terms.Next())
{
Term term = terms.Term();
String termText = term.Text();
int frequency = reader.DocFreq(term);
termlist.Add(termText);
}
reader.Close();
return termlist;
}
But it returns a list of individual terms, and I am unable to aggregate the data by row (record).
Is there a way to read the index row by row, or do I need to update the version of Lucene I am using?
Also, please provide links to better documentation for Lucene.Net.

You can read all the records/rows (documents, in Lucene terminology) directly from the index without searching:
var reader = IndexReader.Open(dir);
for (int i = 0; i < reader.MaxDoc(); i++)
{
if (reader.IsDeleted(i)) continue;
Document d = reader.Document(i);
var fieldValuePairs = d.GetFields()
.Select(f => new {
Name = f.Name(),
Value = f.StringValue() })
.ToArray();
}
PS: v2.0.5 is very old; try the latest Lucene.Net release.
