How to get Page Count per section in a PDF

I'm rendering a PDF document with MigraDoc.
Each section has one or more paragraphs of text.
This is how I currently create the document:
var document = new Document();
var pdfRenderer = new PdfDocumentRenderer(true);
pdfRenderer.Document = document;
for (int i = 0; i < 10; i++)
{
    Section section = document.AddSection();
    section.PageSetup.PageFormat = PageFormat.A4;
    for (int j = 0; j < 5; j++)
    {
        var paragraphText = GetParaText(i, j); // some large text can span multiple pages
        section.AddParagraph(paragraphText);
        // Want page count per section?
        // Section 1 -> 5, Section 2 -> 3 etc.
        // int count = CalculateCurrentPageCount(); //*EDIT*
    }
}
// Create the PDF document
pdfRenderer.RenderDocument();
pdfRenderer.Save(filename);
Edit: Currently I use the following code to get the page count.
It takes a lot of time, though, possibly because every page is rendered twice.
public int CalculateCurrentPageCount()
{
    var tempDocument = document.Clone();
    tempDocument.BindToRenderer(null);
    var pdfRenderer = new PdfDocumentRenderer(true);
    pdfRenderer.Document = tempDocument;
    pdfRenderer.RenderDocument();
    int count = pdfRenderer.PdfDocument.PageCount;
    Console.WriteLine("-- Count :" + count);
    return count;
}
Some of the sections can span multiple pages, depending on the content added.
Is it possible to find out how many pages (in the PDF) a section took to render?
Edit 2: Is it possible to tag a section and find out which page it starts on?

Thanks for the help. This is how I got the count in code.
First I tagged each section with a running count of the sections created so far:
newsection.Tag = num_sections_in_doc; // count changes every time I add a section
Then I used GetDocumentObjectsFromPage:
var x = new Dictionary<int, int>();
int numpages = pdfRenderer.PdfDocument.PageCount;
for (int idx = 0; idx < numpages; idx++)
{
    DocumentObject[] docObjects = pdfRenderer.DocumentRenderer.GetDocumentObjectsFromPage(idx + 1);
    if (docObjects != null && docObjects.Length > 0)
    {
        Section section = docObjects[0].Section;
        int sectionTag = -1;
        if (section != null)
            sectionTag = (int)section.Tag;
        if (sectionTag >= 0)
        {
            // count a section only once
            if (!x.ContainsKey(sectionTag))
                x.Add(sectionTag, idx + 1);
        }
    }
}
x.Keys are the section tags, and x.Values are the pages on which each section starts.
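Once the start page of each section is known, the page count per section follows by subtraction: a section ends on the page before the next section starts, and the last section runs to the last page of the document. Below is a minimal sketch of that step. It assumes that every section starts on a new page and that the dictionary maps section tag to 1-based start page, as in the lookup above. SectionPageCounter and CountPagesPerSection are hypothetical names, not part of MigraDoc.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SectionPageCounter
{
    // startPages: section tag -> 1-based page on which the section starts.
    // totalPages: page count of the rendered document.
    // Returns: section tag -> number of pages the section occupies.
    public static Dictionary<int, int> CountPagesPerSection(
        IDictionary<int, int> startPages, int totalPages)
    {
        // Order the sections by the page they start on.
        var ordered = startPages.OrderBy(kv => kv.Value).ToList();
        var counts = new Dictionary<int, int>();

        for (int i = 0; i < ordered.Count; i++)
        {
            // A section ends right before the next one starts;
            // the last section runs to the end of the document.
            int nextStart = i + 1 < ordered.Count
                ? ordered[i + 1].Value
                : totalPages + 1;
            counts[ordered[i].Key] = nextStart - ordered[i].Value;
        }
        return counts;
    }

    public static void Main()
    {
        // Hypothetical lookup result: three sections in a 10-page PDF.
        var startPages = new Dictionary<int, int> { { 0, 1 }, { 1, 6 }, { 2, 9 } };
        foreach (var kv in CountPagesPerSection(startPages, 10))
            Console.WriteLine($"Section {kv.Key}: {kv.Value} page(s)");
    }
}
```

With the start pages 1, 6 and 9 in a 10-page document, the sections get 5, 3 and 2 pages. The total page count can be taken from pdfRenderer.PdfDocument.PageCount after rendering.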

If you want to display the page count in the PDF, use paragraph.AddSectionPagesField().
See also:
https://stackoverflow.com/a/19499231/162529
To get the count in code: you can add a tag to any document object (e.g. to any paragraph) and then use docRenderer.GetDocumentObjectsFromPage(...) to query the objects for a specific page. This allows you to find out which section the objects on this page belong to.
Or create each section in a separate document and then combine them to one PDF using docRenderer.RenderPage(...) as shown here:
http://www.pdfsharp.net/wiki/MixMigraDocAndPdfSharp-sample.ashx
The sample scales pages down to thumbnail size - you would draw them 1:1, each on a new page.

Related

Getting Error for td : stale element reference: element is not attached to the page document

I am new to Selenium, and I have the code below, which reads values from a table that spans multiple pages.
The first time through, it reads all values from the table and moves to the next page. There I get the error "stale element reference: element is not attached to the page document".
When I step through the code in the debugger I don't get any error, but when I run it normally it fails on the line where tdCollection is assigned.
Please guide me on this.
var ReportCount = Convert.ToInt32(_driver.FindElement(By.Id("Reporter_TotalPages")).Text);
for (int i = 0; i < ReportCount; i++)
{
    // XPath attributes are written @id, not #id
    IList<IWebElement> _records = _driver.FindElements(By.XPath("//*[contains(@id,'ReportViewerControl')]//div//table//tbody//tr[position()>2]"));
    IList<IWebElement> tdCollection;
    for (int j = 0; j < _records.Count; j++)
    {
        tdCollection = _records[j].FindElements(By.TagName("td"));
        var Patientdemolist = new XPatientDemographicsList();
        Patientdemolist.PatientID = tdCollection[0].Text;
        Patientdemolist.LastName = tdCollection[1].Text;
        Patientdemolist.FirstName = tdCollection[2].Text;
        PatientDemographicsList.Add(Patientdemolist);
        tdCollection = null;
    }
    if (ReportCount - 1 > i)
    {
        // For Next Page
        _driver.FindElement(By.Id("Report_Next")).Click();
    }
}

Try adjusting your conditional to this:
if (ReportCount - 1 > i)
{
    // For Next Page
    _driver.FindElement(By.Id("Report_Next")).Click();
    Thread.Sleep(5000);
}
It's possible that you are getting a reference to the rows before the page has finished loading after the .Click() call.
If that works, you can refine the test to use implicit or fluent waits instead of sleeping for a fixed 5 seconds.
https://www.selenium.dev/documentation/webdriver/waits/

Unable to fetch value of web element even though it exists

I am trying to fetch the text of all table rows in a web page using Selenium with C#.
Below is the code for retrieving all tr elements from the web page:
var trElements = driver.FindElements(By.TagName("tr"));
This returns the elements correctly. For example, tr element number 35 has the text 'canara bank' in it.
Now I am trying to extract only the text of all the tr elements, either with LINQ or with a for loop:
string[] strrr = trElements.Select(t => t.Text).ToArray();
Surprisingly, the Text property of most of the elements does not contain the data that the web element shows. The data of some elements appears and disappears at random.
I want to make sure that the data of the web elements is correctly converted to a string array. How can I achieve this?

I think there are 3 possibilities.
1. The rows are not visible. So element.Text can't give you the text. In this case, you need to use element.GetAttribute("innerText") instead of element.Text.
string[] strrr = trElements.Select(t => t.GetAttribute("innerText")).ToArray();
2. The script does not wait long enough. In this case, add a wait that checks the text length.
var trElements = driver.FindElements(By.TagName("tr"));
List<string> strrr = new List<string>();
foreach (var tr in trElements)
{
    IWait<IWebElement> wait = new DefaultWait<IWebElement>(tr);
    wait.Timeout = TimeSpan.FromSeconds(10);
    try
    {
        wait.Until(element => element.Text.Trim().Length > 1);
        // 'element' only exists inside the lambda, so read the text from 'tr'
        strrr.Add(tr.Text.Trim());
    }
    catch (WebDriverTimeoutException)
    {
        strrr.Add("");
    }
}
3. The text is only displayed once you scroll down. In this case, scroll through the page before reading the rows.
int SCROLL_PAUSE_TIME = 1; // seconds
int SCROLL_LENGTH = 500;
var jsExecutor = driver as IJavaScriptExecutor;
// ExecuteScript returns a numeric object here, not a string, so convert instead of casting to string
int pageHeight = Convert.ToInt32(jsExecutor.ExecuteScript("return document.body.scrollHeight"));
int scrollPosition = 0;
while (scrollPosition < pageHeight)
{
    scrollPosition = scrollPosition + SCROLL_LENGTH;
    jsExecutor.ExecuteScript("window.scrollTo(0, " + scrollPosition + ");");
    System.Threading.Thread.Sleep(TimeSpan.FromSeconds(SCROLL_PAUSE_TIME));
}
var trElements = driver.FindElements(By.TagName("tr"));

How to Define a PDF Outline Using MigraDoc

I noticed when using MigraDoc that if I add a paragraph with any of the heading styles (e.g., "Heading1"), an entry is automatically placed in the document outline. My question is, how can I add entries in the document outline without showing the text in the document? Here is an example of my code:
var document = new Document();
var section = document.AddSection();
// The following line adds an entry to the document outline, but it also
// adds a line of text to the current section. How can I add an
// entry to the document outline without adding any text to the page?
var paragraph = section.AddParagraph("TOC Level 1", "Heading1");

I used a hack: added white text on white ground with a font size of 0.001 or so to get outlines that are actually invisible to the user.
For a perfect solution, mix PDFsharp and MigraDoc code. The hack works for me and is much easier to implement.

I realized after reading ThomasH's answer that I am already mixing PDFSharp and MigraDoc code. Since I am utilizing a PdfDocumentRenderer, I was able to add a custom outline to the PdfDocument property of that renderer. Here is an example of what I ended up doing to create a custom outline:
var document = new Document();
// Populate the MigraDoc document here
...
// Render the document
var renderer = new PdfDocumentRenderer(false, PdfFontEmbedding.Always)
{
Document = document
};
renderer.RenderDocument();
// Create the custom outline
var pdfSharpDoc = renderer.PdfDocument;
var rootEntry = pdfSharpDoc.Outlines.Add(
"Level 1 Header", pdfSharpDoc.Pages[0]);
rootEntry.Outlines.Add("Level 2 Header", pdfSharpDoc.Pages[1]);
// Etc.
// Save the document
pdfSharpDoc.Save(outputStream);

I've got a method that is slightly less of a hack. Here's the basic approach:
1) Add a bookmark, and save the bookmark field object and the name of the outline entry in a list. Do not set the paragraph's OutlineLevel (or set it to BodyText).
// Defined previously
List<dynamic> Bookmarks = new List<dynamic>();
// In your bookmarking method; P is a Paragraph already created somewhere
Bookmarks.Add(new { Bookmark = P.AddBookmark("C1"), Name = "Chapter 1", Depth = 0 });
2) At the end of your MigraDoc layout, before rendering, prepare the pages:
pdfwriter.PrepareRenderPages();
3) Build a dictionary that maps each bookmark's parent's parent (this will be a paragraph) to a page index, initialized to -1:
var Pages = Bookmarks.Select(x => ((BookmarkField)x.Bookmark).Parent.Parent).ToDictionary(x => x, x => -1);
4) Now fill in the page indexes by iterating through the objects on each page and finding the matches:
// GetDocumentObjectsFromPage takes 1-based page numbers; PdfDocument.Pages is 0-based
for (int i = 1; i <= pdfwriter.PageCount; i++)
    foreach (var s in pdfwriter.DocumentRenderer.GetDocumentObjectsFromPage(i).Where(x => Pages.ContainsKey(x)))
        Pages[s] = i - 1;
5) You now have a dictionary from each bookmark's parent's parent to its page index. With this you can add your outlines directly to the PDFsharp document. The loop also walks down the depth tree, so you can have nested outlines:
foreach (dynamic d in Bookmarks)
{
    var o = pdfwriter.PdfDocument.Outlines;
    for (int i = 0; i < d.Depth; i++)
        o = o.Last().Outlines;
    BookmarkField BK = d.Bookmark;
    int PageNumber = Pages[BK.Parent.Parent];
    o.Add(d.Name, pdfwriter.PdfDocument.Pages[PageNumber], true, PdfOutlineStyle.Regular);
}

selenium to click on several links one after other

I have a table over a webpage having many values repeating like this:
Description    App Name    Information
Some Desc1     App1        Some Info
Some Desc2     App2        Some Info
Some Desc3     App2        Some Info
Some Desc4     App3        Some Info
Some Desc5     App4        Some Info
At the start of my app, it will ask the user to enter an appname of their choice. What I want is if I choose APP2 it should select "Some Desc2" first, that will lead to another page and there I will do something. Then again it should come back to previous page and this time it should select "Some Desc3", that will lead to another page. This should be repeated n number of times until selenium can't find an appname specified.
I have tried as shown below:
// Finding the table, its rows and columns
int rowcount = driver.FindElements(By.Id("someid")).Count;
for (int i = 0; i < rowcount; i++)
{
    // Finding the app name based on the text the user entered
    var elems = driver.FindElements(By.PartialLinkText(text));
    IList<IWebElement> list = elems;
    for (int j = 0; j < list.Count; j++)
    {
        var table = driver.FindElement(By.Id("someid"));
        IList<IWebElement> rows = table.FindElements(By.TagName("tr"));
        IList<IWebElement> cells = rows[i].FindElements(By.TagName("td"));
        // Again finding the element based on the text the user entered
        var elem = driver.FindElements(By.PartialLinkText(text));
        list = elem;
        if (list[1].Text.Equals(text))
        {
            list[0].Click();
            string duration;
            string price;
            var elements = driver.FindElements(By.Id("SPFieldNumber"));
            IList<IWebElement> lists = elements;
            duration = lists.First().Text.ToString();
            price = lists.ElementAt(1).Text.ToString();
            MessageBox.Show(duration);
            MessageBox.Show(price);
            driver.Navigate().Back();
        }
    }
}
Running this code selects "Some Desc2" correctly and everything goes fine. But after returning to the previous page, Selenium throws the exception "element not found in the cache - perhaps the page has changed since it was looked up".

For this particular issue: you find the table and row elements before the loop and then call driver.Navigate().Back() inside the loop. After navigating, the page and its DOM change, so the table and row elements you found outside the loop are no longer in the DOM.
Try finding them inside the loop instead:
int rowCount = driver.FindElements(By.CssSelector("#table_id tr")).Count; // replace table_id with the id of your table
for (int i = 0; i < rowCount; i++)
{
    var table = driver.FindElement(By.Id("some ID"));
    var rows = table.FindElements(By.TagName("tr"));
    // the rest of the code
}
However, apart from solving this problem, I really suggest you read the Selenium documentation and learn some basic C# programming first. This will save you a lot of time asking questions here.
Why are you doing this in two steps every time?
var elems = driver.FindElements(By.PartialLinkText(text));
IList<IWebElement> list = elems;
// IList<IWebElement> list = driver.FindElements(By.PartialLinkText(text));
element.Text is already the string you want, so there is no need to call ToString():
lists.First().Text.ToString();
// lists.First().Text;
You don't need this if there are no frames involved:
driver.SwitchTo().DefaultContent();
(From your earlier post) A list of IWebElement will never be equal to a string, and the result of Equals can't be an element. Avoid using var if you don't know what type you are getting, as it may give you something totally different:
IList<IWebElement> list = elems;
var elem = list.Equals(text);
(From your earlier post) element.ToString() and element.Text are different:
string targetele = elem.ToString(); // you want elem.Text;

Reasonable method to store and read constant values

So I have an XML structure with the following format:
<Smart>
    <Attribute>
        <id>1</id>
        <name>name</name>
        <description>description</description>
    </Attribute>
    .
    .
    .
</Smart>
I then need to get user input to produce a datatable, depending on what the user inputs different constants will be used. The ID is used to distinguish between the different constants. All these constants are predefined before startup. The following is my code to find desired constants and store them into a datatable
for (int row = 0; row < rowcount; row++)
{
    found = false;
    XmlTextReader textReader = new XmlTextReader("Smart_Attributes.xml");
    textReader.ReadStartElement("Smart");
    while (!found)
    {
        textReader.ReadStartElement("Attribute");
        DataId = Convert.ToByte(textReader.ReadElementString("id"));
        if (DataId > id)
        {
            dataView[count][5] = "Unknown";
            dataView[count][7] = "Unknown";
            found = true;
        }
        if (DataId == id)
        {
            dataView[count][5] = textReader.ReadElementString("name");
            dataView[count][7] = textReader.ReadElementString("description");
            found = true;
        }
        else
        {
            textReader.ReadElementString("name");
            textReader.ReadElementString("description");
        }
        textReader.ReadEndElement(); // </Attribute>
    }
    count++;
}
This does work: the desired constants are found for their corresponding rows. However, it seems like a lot of work for not much gain. Could this be done better using something like a DataSet? Any other suggestions would be super helpful.

The benefit of the XmlTextReader lies in its forward-only nature. This allows you to read a very large file in just one sweep. If your XML file is smaller, and you're happy to hold the whole structure in memory at once, you can use LINQ to XML and read it into an XDocument.
var doc = XDocument.Load("Smart_Attributes.xml");
// The element value is a string, so convert it before comparing it with the numeric id
var rows = doc.Descendants("Attribute").Where(e => (int)e.Element("id") == id)
    .Select(e => new
    {
        Name = e.Element("name").Value,
        Description = e.Element("description").Value
    });
// Load rows into your datatable
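
Because the constants are fixed before startup, the file only needs to be parsed once, whereas the loop in the question reopens it for every row. The sketch below illustrates that idea under a few assumptions: the element names are those from the question, the ids are integers, and a missing id falls back to "Unknown" as in the question's code. SmartAttributes, LoadAttributes and Lookup are hypothetical names, and the XML is inlined only to keep the example self-contained (in practice it would come from XDocument.Load("Smart_Attributes.xml")).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class SmartAttributes
{
    // Parses the <Smart> document once into an id -> (name, description) map.
    public static Dictionary<int, (string Name, string Description)> LoadAttributes(XDocument doc)
    {
        return doc.Descendants("Attribute").ToDictionary(
            e => (int)e.Element("id"),
            e => (Name: (string)e.Element("name"),
                  Description: (string)e.Element("description")));
    }

    // Returns the constants for an id, or "Unknown" values when the id is not defined.
    public static (string Name, string Description) Lookup(
        IReadOnlyDictionary<int, (string Name, string Description)> attributes, int id)
    {
        if (attributes.TryGetValue(id, out var found))
            return found;
        return ("Unknown", "Unknown");
    }

    public static void Main()
    {
        // Placeholder data in the same shape as the question's file.
        var doc = XDocument.Parse(
            "<Smart>" +
            "<Attribute><id>1</id><name>name1</name><description>description1</description></Attribute>" +
            "<Attribute><id>5</id><name>name5</name><description>description5</description></Attribute>" +
            "</Smart>");

        var attributes = LoadAttributes(doc);
        foreach (int id in new[] { 1, 3, 5 })
        {
            var (name, description) = Lookup(attributes, id);
            Console.WriteLine($"{id}: {name} / {description}");
        }
    }
}
```

Each row of the datatable can then be filled with a single Lookup call, and the dictionary can be kept for the lifetime of the application.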
