PDFsharp trim page white-space, merge pages

I am merging some PDF documents using PDFsharp, and I want to do the following:
-> the LAST page of the FIRST document contains just 2-3 lines of text, so I would like the FIRST page of the SECOND document to begin immediately after those 2-3 lines.
Right now it starts on a new page, which wastes a lot of space.

PDFsharp neither renders nor parses the content of PDF pages, so it has no way of telling where the text on the last page ends.
When using PDFsharp there is no easy way to achieve what you want, and you would have to add a lot of code to get it done.
It might be easier with another library, but I cannot give you any names.

Related

PDF First page image preview in DIV on website

My website lists multiple PDFs, and I need to show a preview image of the first page of each one.
There are two previews I want to display:
One small preview
One big preview on mouse hover
What I am doing now:
We use a few third-party preview generators to create JPEG images, and we use those images on the website as previews.
What I tried instead:
I used the EvoPDFtoHTML tool to display HTML instead of images, but for many files the generated HTML is not usable.
Also, both processes take a lot of time and make the website slow to respond.
Is there a better way to achieve this?
Image attached below for better understanding:

An approach worth exploring:
Parse the PDF and extract the first page. You can use command-line tools such as PDFtk or Ghostscript, or implement your own class in C# to extract the first page.
Then use the Google Docs viewer and embed an iframe that points to the PDF.
Example with PDFtk:
pdftk input.pdf cat 1 output page-1-of-input.pdf
Example with Ghostscript:
gs -o page-1-of-input.pdf -sDEVICE=pdfwrite -dPDFLastPage=1 input.pdf
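If you would rather render the preview images yourself instead of relying on third-party generators, Ghostscript can also write page 1 straight to a JPEG instead of to a new PDF. The sketch below assumes Ghostscript is installed on the server and uses its standard `-sDEVICE=jpeg`, `-r` (resolution in DPI) and `-dFirstPage`/`-dLastPage` switches; the `PdfPreview` class and its method names are hypothetical. Rendering twice at different resolutions gives both the small preview and the larger hover preview.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

public static class PdfPreview
{
    // Ghostscript arguments that render only page 1 of pdfPath to a JPEG file.
    public static List<string> FirstPageJpegArgs(string pdfPath, string jpegPath, int dpi) => new()
    {
        "-dSAFER",          // restrict file access while interpreting the PDF
        "-dBATCH",          // exit when done instead of waiting for input
        "-dNOPAUSE",        // don't prompt between pages
        "-sDEVICE=jpeg",
        $"-r{dpi}",         // output resolution; a lower value gives a smaller image
        "-dFirstPage=1",
        "-dLastPage=1",
        $"-sOutputFile={jpegPath}",
        pdfPath,
    };

    // gsExe (assumption): "gs" on Linux/macOS, typically "gswin64c.exe" on Windows.
    public static void Render(string gsExe, string pdfPath, string jpegPath, int dpi)
    {
        var psi = new ProcessStartInfo(gsExe) { UseShellExecute = false };
        foreach (var arg in FirstPageJpegArgs(pdfPath, jpegPath, dpi))
            psi.ArgumentList.Add(arg);   // .NET escapes each argument as needed

        using var proc = Process.Start(psi)
            ?? throw new InvalidOperationException("could not start Ghostscript");
        proc.WaitForExit();
        if (proc.ExitCode != 0)
            throw new InvalidOperationException($"Ghostscript exited with code {proc.ExitCode}");
    }
}
```

For example, `PdfPreview.Render("gs", "report.pdf", "report-small.jpg", 36)` for the thumbnail and a second call with a higher DPI (say 150) for the hover image; both DPI values are arbitrary. Generating the images once, when a PDF is added, and serving the cached JPEGs keeps the conversion cost out of page requests.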
References:
Display first page of PDF as Image
You can also look at Fahim's answer there for the C# snippet he tried.

How can I read page number 2 of a docx file with C# or .NET?

I'm trying to iterate over the pages of a docx file and read the content page by page. Any help?

You can't. Word documents don't work like that. A Word document is more like a web page (a bunch of XML tags) than a set of fixed pages with elements attached. The programs that open it (Microsoft Office, OpenOffice) lay the content out into pages when they display it, which gives you the illusion of actual pages.
You could do this if you knew every page would be exactly x in length and never deviate, but that's rarely the case.
I could be wrong, but I've looked at two libraries in depth and neither of them can do that.
Have a look at the documentation for Aspose.Words
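There is a partial workaround when the file was last saved by Word itself. WordprocessingML stores manual page breaks as `<w:br w:type="page"/>`, and Word also writes `<w:lastRenderedPageBreak/>` where its own layout placed page breaks when it last paginated the document. These markers are only hints (files produced by other tools usually lack the second kind, and later edits can make them stale), so the sketch below approximates pages rather than reproducing Word's layout. It uses only `System.IO.Compression` and `System.Xml.Linq`; the class and method names are hypothetical.

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Text;
using System.Xml.Linq;

public static class DocxPageSplitter
{
    static readonly XNamespace W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Splits the text of word/document.xml at page-break markers stored in the file.
    // Not handled: w:pageBreakBefore, section breaks, and pagination Word never saved.
    public static List<string> SplitByStoredPageBreaks(Stream docx)
    {
        using var zip = new ZipArchive(docx, ZipArchiveMode.Read, leaveOpen: true);
        var entry = zip.GetEntry("word/document.xml")
            ?? throw new InvalidDataException("word/document.xml not found");
        XDocument doc;
        using (var s = entry.Open()) doc = XDocument.Load(s);

        var pages = new List<string>();
        var current = new StringBuilder();
        bool newParagraph = false;

        void EndPage()
        {
            // Consecutive markers (e.g. a manual break plus a rendered break) collapse into one.
            if (current.Length > 0) pages.Add(current.ToString());
            current.Clear();
            newParagraph = false;
        }

        foreach (var el in doc.Descendants()) // visited in document order
        {
            if (el.Name == W + "p")
                newParagraph = current.Length > 0;      // separate paragraphs with '\n'
            else if (el.Name == W + "t")
            {
                if (newParagraph) { current.Append('\n'); newParagraph = false; }
                current.Append(el.Value);
            }
            else if ((el.Name == W + "br" && el.Attribute(W + "type")?.Value == "page")
                     || el.Name == W + "lastRenderedPageBreak")
                EndPage();
        }
        EndPage();
        return pages;
    }
}
```

Page 2 would then be `pages[1]`, with the caveat above that it is only as reliable as the markers Word left in the file.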

C#: How to convert HTML5/CSS3 into PDF document?

It's obvious from the title what I want to do. I know it is possible to convert HTML to a PDF document using the popular iTextSharp library. However, I learned from this post that iTextSharp cannot render HTML5 and CSS3 styles correctly. Is there a free library that can do this?
Background:
I am using DevExtreme for report generation. It supports exporting charts to PDF, but my client wants some extra content in the PDF besides the charts. DevExtreme doesn't support that, so I decided to write my own custom PDF exporter.
There are some libraries available, but I cannot rely on them since I can't predict what issues they might cause in production. Correct me if I am wrong, but Microsoft provides no API for manipulating PDF files. We can create and manipulate Excel and Word files using Microsoft.Office.Interop.Excel.dll and Microsoft.Office.Interop.Word.dll, but I didn't find anything for PDF manipulation.
Please suggest what options I have.
Hope this makes sense!

A few years back I used iTextSharp to convert our HTML manuals (XHTML/CSS/wiki) to PDF. It was... painful, and a lot of work. So, the first news is: unless you only have a few HTML pages, you will need quite a few weeks (2, 3 or 4 weeks, depending on the degree of perfection you want).
If you only have a very limited number of pages, the quickest and dirtiest way is to take screenshots of your rendered pages and add those images to the PDF. Not very high-tech, but quickly done.
If you can sacrifice your style sheets and don't need the formatting to be identical, you can convert your HTML5 pages to XHTML so you can load them as XmlDocuments. Then you write a program that maps XML elements, such as <h1>MyTitle</h1>, to code that creates the corresponding PDF element using iTextSharp. That is basically how I did it. I also mapped some CSS style classes to specific PDF formatting, but not to the extreme.
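The element-mapping step could look roughly like the sketch below. It uses only `System.Xml.Linq` and produces a neutral list of blocks; the `PdfBlock` record, the mapping table and the block kinds are hypothetical, and turning each block into an actual iTextSharp (or other library) element is left out because that part depends on the library and version you use. Matching on `LocalName` keeps the XHTML namespace from getting in the way.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;
using System.Xml.Linq;

// Hypothetical intermediate representation: one entry per block-level element.
public record PdfBlock(string Kind, string Text);

public static class XhtmlMapper
{
    // Element name -> block kind; extend together with your PDF-writing code.
    static readonly Dictionary<string, string> BlockKinds = new()
    {
        ["h1"] = "Heading1",
        ["h2"] = "Heading2",
        ["p"] = "Paragraph",
        ["li"] = "ListItem",
    };

    public static List<PdfBlock> Map(string xhtml)
    {
        var blocks = new List<PdfBlock>();
        Walk(XDocument.Parse(xhtml).Root!, blocks);
        return blocks;
    }

    static void Walk(XElement el, List<PdfBlock> blocks)
    {
        if (BlockKinds.TryGetValue(el.Name.LocalName, out var kind))
        {
            // Collapse runs of whitespace roughly the way a browser would.
            var text = Regex.Replace(el.Value, @"\s+", " ").Trim();
            blocks.Add(new PdfBlock(kind, text));
            return; // the whole element becomes one block; its children are not visited
        }
        foreach (var child in el.Elements()) Walk(child, blocks);
    }
}
```

Keeping the mapping in a table also leaves room for the CSS-class mapping mentioned above, for example by checking an element's `class` attribute before choosing its kind.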
Converters from HTML (or XML) to TeX/LaTeX are also worth trying. If you are lucky, you will find one that does a good enough job; then you can run pdftex to get your PDF.
You could also print your documents to an XPS printer and then convert the XPS to PDF. Or simply convince your customer that XPS is what they want.

Merging word documents and preserve their formatting, header and footer

I am having trouble merging multiple Word documents into a single one. I generate Word documents from HTML, each with a header and footer. I have around 10-15 such documents, and generating them individually works fine.
Now I need to generate the HTML for all 10 pages and combine them into a single Word report that preserves each individual report's formatting, header and footer.
I have tried two approaches without success:
Combining the HTML of all pages into one HTML page and saving that as a Word file.
Creating a Word report for each of the 10 HTML files and merging them using Microsoft.Office.Interop.
I was able to merge the documents, but I could not keep the header, footer and formatting of each individual document.
I have also read about section breaks, but I'm not sure how to use them.
Can anyone guide me toward a possible solution, or suggest anything else that might help?
Thanks in advance.

You could try merging with DocumentBuilder.
If that doesn't give you enough control, see whether docx4j.NET (commercial edition) might help, with its demo merge webapp. Docx4j's MergeDocx provides fine-grained control over header/footer behaviour.

How to generate an RTF document server side in C#

I have tried using System.Windows.Documents.FlowDocument server side, but ran into a problem with images.
What I need to produce is a document with headings, section breaks, page breaks, images (with text wrapping around them from the left or the right), tables and ideally some kind of table of contents.
I use C# and ASP.NET.
Is there a library that will do most of this?
RTF has been chosen because the document needs to be openable in older versions of Word and editable, and we can't run Word on the server.
Thank you.

I have used MigraDoc in the past; it is a free library that can create PDFs as well as RTFs. Just Google it.

I have started using .NET RTF Writer.
It produces clean RTF, but doesn't do everything I need.
There is pretty good documentation for RTF here.
I am working some things out for myself. For example, I needed to be able to wrap text around an image. While the RTF writer above lets you add images to documents, it does so by putting each image in its own paragraph. What I need is a shape element.
In the RTF it ends up looking something like this (some of the numbers define the size and position of the image in twips):
{\shp{\*\shpinst\shpleft3801\shptop1\shpright8300\shpbottom4500\shpfhdr0\shpbxcolumn\shpbxignore\shpbypara\shpbyignore\shpwr2\shpwrk0\shpfblwtxt0\shpz0
{\sp
{\sn pib}
{\sv
{\pict\pngblip\pichgoal4499\picwgoal4499
-- image binary data goes here --
}}}
{\sp
{\sn fLine}
{\sv 0}}}}
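Writing that shape group from code is mostly string building. The sketch below reproduces the control words of the sample above with the position, size and image bytes filled in; the `RtfShape` class and its parameter names are hypothetical. In the sample, `\picwgoal`/`\pichgoal` equal the width and height of the shape box (8300 - 3801 = 4499 and 4500 - 1 = 4499 twips), so the code derives them the same way, and the PNG bytes are written as hex, which is how RTF picture data is normally embedded. A twip is 1/1440 inch.

```csharp
using System;
using System.Text;

public static class RtfShape
{
    // 1 inch = 1440 twips (1 point = 20 twips).
    public static int InchesToTwips(double inches) => (int)Math.Round(inches * 1440);

    // Floating PNG shape built from the control words in the sample above.
    // left/top/right/bottom are twips relative to the anchoring column/paragraph.
    // wrapSide is written to \shpwrk: 0 (both sides) as in the sample; per the RTF
    // spec other values restrict wrapping to one side (assumption: verify in Word).
    public static string PngShape(byte[] png, int left, int top, int right, int bottom, int wrapSide = 0)
    {
        var sb = new StringBuilder();
        sb.Append(@"{\shp{\*\shpinst")
          .Append($@"\shpleft{left}\shptop{top}\shpright{right}\shpbottom{bottom}")
          .Append(@"\shpfhdr0\shpbxcolumn\shpbxignore\shpbypara\shpbyignore")
          .Append($@"\shpwr2\shpwrk{wrapSide}\shpfblwtxt0\shpz0")
          .Append(@"{\sp{\sn pib}{\sv{\pict\pngblip")
          .Append($@"\pichgoal{bottom - top}\picwgoal{right - left} ")
          .Append(Convert.ToHexString(png))           // image data as hex digits
          .Append(@"}}}")                             // close \pict, \sv, \sp
          .Append(@"{\sp{\sn fLine}{\sv 0}}}}");      // no border; close \shpinst, \shp
        return sb.ToString();
    }
}
```

The returned string would be written inside the paragraph whose text should flow around the picture, just as Word places the group in its own output.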
I sometimes just save something in Word and try to understand what it did (but Word seems to add a lot of noise).
