I'm stuck in a project for school.
I have a word template, this is a empty document but it has some formatting, for example, each page has 25 lines, the font is Times New Roman and its size 12.
What I have to do is to open this template, write information and save it. When I open the template I should see the text displayed respecting the format.
How can I do this? I tried DocX library but I couldn't make work Line Spacing feature.
You can user whatever you want to make it work. It must be in C#.
Does it need to be C#? Maybe they want you to use vba?
Related
I was wondering if it is possible to find the coordinates of a specific Run (text, no drawing or other elements that have offset parameter) on a page in a Word document using OpenXML SDK. I know that OpenXML is basically .. well XML, and simple runs have no relative, numerical position embedded in them.
I was reading through OpenXML SDK API and found no clues but maybe I have missed something. By coordinates I mean any tuple that can be mapped to pixels if I would generated an image out of the page (imagine you made a screenshot of page)
I suspect, if this is possible, it is not trivial.
Appreciate your help!
The Open XML SDK does not include this functionality. This would require a layout engine, which is not part of the SDK.
Word is not a page layout program, it's a word processor. Therefore:
No, it's not possible because...
The Word application dynamically lays out a page when it's opened in the Word application. Exactly how it's layed out and where things appear on-screen (or on the printed page) depends on how Word calculates font size as well as line, character and paragraph spacing (in all directions) for the currently selected printer driver. So it can vary and thus cannot be saved in the Open XML file.
Is there a way to paste a particular word document's contents into any winforms controls? I need the exact formatting of the word document along with any printscreens it has.
I don't believe there is any Visual Studio native control to achieve what you want to do. The RichText Box will not provide 100% formatting.
In this case you will have to find a 3rd party control.
The following is free and you have the sourcecode but it looks a bit old:
https://www.codeproject.com/articles/3582/word-control-for-net
A quick google search yielded these products.
https://www.devexpress.com/products/net/controls/winforms/rich_editor/
https://marketplace.visualstudio.com/items?itemName=NevronSoftware.RichTextEditorforWinForms-12056
Currently, I can extract all the text chunks with their location data from a PDF. The problem is that the PDF contains images with text annotations which I do not want including in the extraction.
However, for whatever reason whenever I search the PDF for images, it only finds 1 of the images and usually throws the exception: The colour space is not supported. It's as if it doesn't recognise them as images?
I am not wishing to extract the images, just locate where they start and end in relation to the PDF so I can exempt the text that is on top of the images.
For example:
Where the numbers on the graph are unwanted and need to be removed from the extracted text.
Im just not sure how to:
A) Locate all the images and store the coordinates of where it starts and ends
B) Ignore the text that is on top of the images in the PDF document
(I am using iTextSharp to try and achieve this, but so far I am not having much luck)
I'm not exactly sure how iTextSharp works but the PostScript language reference or the PDF Reference manuals may be a good place to start figuring out what you need to know.
I just cracked open a PDF file in a text editor to check out the format because I haven't seen it in a while and then realized what the problem might be.
PDFs support "Images", and "Stream Objects" which can contain image data. Stream objects actually declare enough information that you can know where they begin and end and write something to manually ignore them.
A Stream Object Header looks like this:
<</Intent/RelativeColorimetric/Subtype/Image/Length 19678/Filter/DCTDecode/Name/X/Metadata 4314 0 R/BitsPerComponent 8/ColorSpace 5247 0 R/Width 290/Height 372/Type/XObject>>stream
It's entirely possible that your particular PDF has only one "Image" and then the rest of it is "Streams".
I suggest cracking it open to take a look. It would also be beneficial if you included some sample code with on the library you're using.
I also found by opening a PDF in a text editor this string /Type /Page which seems to create new pages, so you there's a chance you could count those to determine which page you're currently on.
The header at the top of the document I'm reviewing is %PDF-1.2 and the latest version is 1.7, so there may be some disparity here because of that.
Any chance you can share the PDF file you're working with?
I have tried using the System.Windows.Documents.FlowDocument server side, but ran into a problem with images.
What I need to produce is a document with headings, section breaks, page breaks, images (with text wrapping around from the left or the right), tables and ideally some kind of table of contents.
I use c# and asp.net.
Is there a library that will do most of this?
RTF has been chosen because the document needs to be openable in older versions of word, be editable, and we can't run word on the server.
Thank-you
I used MigraDoc in the past, it is a free library. You can create PDFs or RTFs. Just Google it.
I have started using .net rtf writer.
It produces clean rtf, but doesn't do everything I need.
There is pretty good documentation for rtf here.
I am working some things out for my self. For example, I needed to be able to wrap text around an image. Whilst the rtf writer above enables you to add images to documents, it does so by putting the image in its own paragraph. What I need is a shape element.
In the rtf it ends up looking something like this (some of the numbers define the size and position of the image in twips):
{\shp{\*\shpinst\shpleft3801\shptop1\shpright8300\shpbottom4500\shpfhdr0\shpbxcolumn\shpbxignore\shpbypara\shpbyignore\shpwr2\shpwrk0\shpfblwtxt0\shpz0
{\sp
{\sn pib}
{\sv
{\pict\pngblip\pichgoal4499\picwgoal4499
-- image binary data goes here --
}}}
{\sp
{\sn fLine}
{\sv 0}}}}
I sometimes just save something in word and try and understand what it did (but word seems to add a lot of noise).
There're a data list with hundreds of data items (suppose each item is a customer), and a predefined word document as template, the requirement is - for each data item, fill corresponding data into template fields, and generate a readonly PDF file as result.
Prefered platform is ASP.NET with C#.
I found two solutions:
Change the word document into a PDF form, and use iTextSharp to fill the form fields. But create the PDF form with correct format (font, layout, etc.) is a difficult work, and it needs particular tool and new skill when system user wants to add new template (unless the PDF form is always created by developer).
Add text placeholder in the word file, and the program can read word file, replace text, and convert into PDF. But I'm not sure which components should be used.
I'd like to get some advices on this problem. tks.
Update 20130416:
After some searching & experiments, my conclusion is below:
Client solution: use Microsoft.Office.Interop.Word (Office2007+plugin or Office2012) to read data, convert to pdf, etc. But this method running on server side may be unsafe.
Server solution:
Make PDF form, and use iTextSharp to fill the form fields. The disadvantage has been mentioned above.
Make HTML template, and replace field placeholders, and use iTextSharp+XMLWorker to convert HTML to PDF. The difficulty is create the HTML template manually and optimize the PDF effect.
MS SharePoint Office Automation Service is a server solution based on MS Office, perhaps this method will be easier, but it needs license and SharePoint server cost.
Finally, I chose the HTML template solution for this request. QED.
Another option would be to use Tx Text Control for ASP.NET. They have a
mailmerge feature that allows you to fill data into a word template.
The merged document can easily be saved as a pdf.
For the second option you can use iTextsharp or Aspose which supports the placeholder replacement and generation PDF, it supports creating files based on templates of MSWord and Openoffice which could be usefull for user who do not want to buy MSWord only to create a template.
Another option, you can use nustache templates, fill them with list data and then use xmlworker from ItextSharp to render to pdf.