How to Identify the Page overflow in Aspose PDF

How to Identify the Page overflow in Aspose PDF - c#

I am converting a HTML content to PDF using Aspose .Net API's. HTML content contains table which has dynamic data, so if the data is more date, then the table is overflown into second page.
Is there a way to Identify the page overflown in my document, because I want insert an empty page in between, whenever an overflow happens.
Code Snippet:
// Creating Doc
Aspose.Pdf.Document document = new Document();
for(var contentModel in listcontentModel)
{
Page page = document.Pages.Add();
string HTMLContent = //HTMLContent -- using contentModel to build HTML;
-- Sometime page overflow happens because of huge HTML content
HtmlFragment printHtml = new HtmlFragment(text);
page.Paragraphs.Add(printHtml);
}
document.Save(filePath);
Thanks in Advance.

Related

Calculate number of pages in docs file by dot net and C# and Syncfusion.DocIO.Net.Core Package

I have a .net Core 3.1 webapi.
I want to get the number of pages are in docs and doc file.
Am using the Syncfusion.DocIO.Net.Core package to perform operation on docs/doc file. But it does't provide feature to update stat of file and display only the PageCount: document.BuiltinDocumentProperties.PageCount;
This is not updated by files. Would you someone suggest me how i can calculate arithmetically.

You can get the page count from BuiltInProperties of the document using DocIO as below. It shows the page count in the document while creating the document using Microsoft Word application and also it still returns the same page count if we manipulate the document using DoIO.
int count = document.BuiltinDocumentProperties.PageCount;
If you want to get the page count, after manipulating the Word document using DocIO, we suggest you to convert the word document to PDF, and then you can retrieve the page count as like below.
WordDocument wordDocument = new WordDocument(fileStream, FormatType.Docx);
DocIORenderer render = new DocIORenderer();
//Sets Chart rendering Options.
render.Settings.ChartRenderingOptions.ImageFormat = ExportImageFormat.Jpeg;
//Converts Word document into PDF document
PdfDocument pdfDocument = render.ConvertToPDF(wordDocument);
int pageCount = pdfDocument.PageCount;
Since Word document is a flow document in which contents will not be preserved page by page; instead the contents will be preserved sequentially section by section. Each section may extend to various pages based on its contents like table, text, images etc.
Whereas Essential DocIO is a non-UI component that provides a full-fledged document object model to manipulate the Word document contents. Hence it is not feasible to get the page count directly from Word document using DocIO.
We have prepared the sample application to get the page counts from the word document and it can be downloaded from the below link.
https://www.syncfusion.com/downloads/support/directtrac/general/ze/GetPageCount16494613

Split Doc file Pages And Convert To PDF with gembox Document

I want to convert the entire content of that page to PDF by searching for a specific word on each page (which may be on one page or more).
For example, we have a file that has three pages, there is a special word on the first page, and the next special word on the third page. I want to save the PDF from the first to the second page and then save the third page separately. The PDF files will be named according to the specific word on that page.
My problem is that I don't know how to loop for each page and read the content of that page to get to the special word and save the pages as a PDF.
Thank You

Here is how you can do it.
Paginate your Word document using DocumentModel.GetPaginator method.
Read the text content of each page using FrameworkElement.ToText extension method.
Save selected pages to PDF using DocumentModelPage.Save method.
In other words, try the following:
string search = "Your Specific Word";
string inputPath = "input.docx";
// Load Word document.
var document = DocumentModel.Load(inputPath);
// 1. Get document's pages.
var pages = document.GetPaginator().Pages;
for (int i = 0, count = pages.Count; i < count; ++i)
{
// 2. Read page's text content.
DocumentModelPage page = pages[i];
string pageTextContent = page.PageContent.ToText();
// 3. Save page as PDF.
if (pageTextContent.Contains(search))
{
string outputPath = $"{search}_{i}.pdf";
page.Save(outputPath);
}
}

Extract pages from a PDF file using ITextSharp

Is it possible using IText to copy PDF pages from a full PDF document and return partial document based on a form field name? For example I need to copy the beginning of a pdf document and stop at a certain text field called [STOP_HERE], so whatever contents before this fields need to be extracted, the [STOP_HERE] field could be located on a different page for each document, so using page numbers wouldn't help here.
I searched online and all I can find is a way to copy only form fields from a document but not the whole document elements including images texts with their exact location and style.
Can IText do the job here?
EDIT: More details
[STOP_HERE] is an AcroForms text field which has been placed in a document by the PDF design person to indicate that everything before this element should be copied as is into a different document. The field itself is not important, I don't want to fill or do anything with it, it's just used as a signal to let the document parser stop there and copy all previous (upper) contents, I just don't know how to read all contents (without changing style, contents, etc) before this field.

Is it possible using IText to copy PDF pages from a full PDF document and return partial document based on a form field name? For example I need to copy the beginning of a pdf document and stop at a certain text field called [STOP_HERE]
Unfortunately the OP didn't tell whether the page containing the form field [STOP_HERE] is to be included or not. As that is a mere +/-1 matter, though, I simply assumed the page is to be included.
Thus, the task can be implemented like this:
PdfReader reader = new PdfReader(srcFile);
AcroFields.Item field = reader.AcroFields.Fields["[STOP_HERE]"];
if (field != null)
{
int firstPage = reader.NumberOfPages + 1;
for (int index = 0; index < field.Size; index++)
{
int page = field.GetPage(index);
if (page > 0 && page < firstPage)
firstPage = page;
}
if (firstPage <= reader.NumberOfPages)
{
reader.SelectPages("1-" + firstPage);
PdfStamper stamper = new PdfStamper(reader, new FileStream(dstFile, FileMode.Create, FileAccess.Write));
stamper.Close();
}
}
reader.Close();
The code opens the source file in a PdfReader and first looks for the field. If it exists, it iterates over all appearances of that field and determines the earliest page with an appearance of the field. If there is such a page, the code restricts the reader to the pages up to that page and stores this restriction using a PdfStamper.

Get Document OuterHTML of MVC Application in C#

We need to export the entire page of MVC Application to PDF for that purpose need to get all the HTML contents (i.e. including dynamic content too)
To get the contents of page we used following code
string contents = File.ReadAllText(path);
but it will give only static content of page(i.e. it gives page source code) not new nodes added in DOM.
Then tried following code but this also gives static content
// WebClient object
WebClient client = new WebClient();
// Retrieve resource as a stream
Stream data = client.OpenRead(new Uri("xxxx.html"));
// Retrieve the text
StreamReader reader = new StreamReader(data);
string htmlContent = reader.ReadToEnd();
So i want to get enitre outerHTML of document in C# with out using any third party DLL . i googled so many links and everyone updated like use webbrowser control and get the content.
i don't how this will be useful for our application. Our Application is MVC4. we need to export the enitre page to PDF so we need enitre content OF HTML (dynamic content too)
How can i use this below code in ourt MVC Application to get document outerHTML
mshtml.HTMLDocument doc = webBrowser1.Document.DomDocument as mshtml.HTMLDocument;
string html = doc.documentElement.outerHTML;
or
var documentAsIHtmlDocument3 = (mshtml.IHTMLDocument3)webBrowser.Document.DomDocument;
StringReader sr = new StringReader(documentAsIHtmlDocument3.documentElement.outerHTML);
htmlDoc.Load(sr)
Any help on this.

You haven't mentioned what the PDF is intended for. Most likely it is for the visitor of the page to download. If that is true, maybe you could use jsPDF. That way you get around the problem with not having access to the entire page serverside.

Appending PDFs to generated PDF using iTextSharp

I am generating a pdf using iTextSharp. If certain properties are true then I also want to insert an existing pdf with static content.
private byte[] GeneratePdf(DraftOrder draftOrder)
// create a pdf document
var document = new Document();
// set the page size, set the orientation
document.SetPageSize(PageSize.A4);
// create a writer instance
var pdfWriter = PdfWriter.GetInstance(document, new FileStream(file, FileMode.Create));
document.Open();
if(draftOrder.hasProperty){
//add these things to the pdf
var textToBeAdded = "<table><tr>....</table>";
}
FormatHtml(document, textToBeAdded , css);
if(someOtherProperty){
//add static pdf from file
document.NewPage();
var reader = new PdfReader("myPath/existing.pdf");
PdfImportedPage page;
for(var i = 0; i < reader.NumberOfPages; i++){
//It's this bit I don't really understand
//**how can I add the page read to the document being created?**
}
I can load the pdf from the source but when I iterate over the pages I can't seem to be able to add them to the document I am creating.
Cheers

Please read http://manning.com/lowagie2/samplechapter6.pdf
If you don't mind losing all interactivity, you can get the template from the writer object with the GetImportedPage() method and add it to the document with AddTemplate ().
This question has been answered many times on StackOverflow and you'll notice that I always warn about some dangers: you need to realize that the dimensions of the imported page can be different from the page size you initially defined. Because of this invisible parts of the imported page can become visible; visible parts can become invisible.
I'd prefer adding the extra page in a second ho using PdfCopy, but maybe that's just me.

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.