Is there a way to do form filling with XPS documents in C#?
It wouldn't be an ideal solution. The reason is that text in XPS is represented by Glyphs, in which characters (roughly speaking) are specified as indexes into a particular typeface's table of character shapes, rather than in something handy like Unicode. And the Glyphs have to be specifically positioned, each character offset from the string origin. If a string has to wrap onto multiple lines, that has to be done by putting in many separate Glyphs objects, all individually positioned. It's not a logical format, but a physical one, and it isn't particularly amenable to being altered after it's generated.
You'd be much better off defining the template document as a FlowDocument. That's a much more logically manipulable format, rather than a physical "frozen" format. (An XPS document is technically a FixedDocument). You can then freeze a FlowDocument into a FixedDocument to export it as XPS, when it has been filled in.
Read more about Flow Documents.
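To make the "fill in a FlowDocument, then freeze it to XPS" idea above concrete, here is a minimal sketch. The class name, method names and the two fields (customer name, order number) are made up for illustration; a real template would more likely be authored in XAML and loaded. It assumes a WPF project (references to PresentationFramework, PresentationCore, WindowsBase and ReachFramework) and that the code runs on an STA thread.

```csharp
using System;
using System.IO;
using System.Windows;
using System.Windows.Documents;
using System.Windows.Xps;
using System.Windows.Xps.Packaging;

public static class FlowFormExporter
{
    // Builds a tiny "filled-in" FlowDocument. The field names are hypothetical.
    public static FlowDocument BuildDocument(string customerName, string orderNumber)
    {
        var doc = new FlowDocument
        {
            PageWidth = 816,                      // 8.5in at 96 units per inch
            PageHeight = 1056,                    // 11in at 96 units per inch
            PagePadding = new Thickness(72),
            ColumnWidth = double.PositiveInfinity // force a single column
        };
        doc.Blocks.Add(new Paragraph(new Run("Customer: " + customerName)));
        doc.Blocks.Add(new Paragraph(new Run("Order: " + orderNumber)));
        return doc;
    }

    // Paginates the flow content and writes the resulting fixed pages as XPS.
    public static void SaveAsXps(FlowDocument doc, string path)
    {
        if (File.Exists(path)) File.Delete(path);

        var xps = new XpsDocument(path, FileAccess.ReadWrite);
        try
        {
            XpsDocumentWriter writer = XpsDocument.CreateXpsDocumentWriter(xps);
            writer.Write(((IDocumentPaginatorSource)doc).DocumentPaginator);
        }
        finally
        {
            xps.Close();
        }
    }
}
```

Usage would be something like FlowFormExporter.SaveAsXps(FlowFormExporter.BuildDocument("Ada", "42"), "order.xps"). The text only becomes positioned Glyphs at the moment of writing, which is why the template stays easy to edit up to that point.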
Here's how you do it.
First, you create an object (a POCO, or plain old CLR object, is fine) that contains the form fields you want the user to fill in. Next, you create your form as a FixedDocument. It should have Bindings that work against your POCO. Then you create a Window that is set up so that the user can enter values for those fields.
Bind the POCO to the TextBoxes in the Window. After the user has entered the form data, take your now filled-in object and set it as the DataContext of the FixedDocument. Create an XpsDocumentWriter and Write your FixedDocument to your output (file, printer, etc.).
I'd give you code, but since I'm actually doing this (or something relatively similar) for work I don't think I can. I can tell you that the hardest part is dealing with the XPS document. You can judge relative to your experiences there.
If you want to fill an XPS form, yes. An XPS file is a ZIP package of XML parts, so you can change it.
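As a rough illustration of what "XPS is XML" means in practice, the sketch below opens an XPS package as a ZIP archive and lists the text stored in the Glyphs elements of each page part. XpsInspector and ReadGlyphText are names invented for this example. It assumes the page parts are stored as ordinary *.fpage entries (not split into interleaved pieces) and that the Glyphs elements carry a UnicodeString attribute, which is common but not guaranteed.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

public static class XpsInspector
{
    // Returns the UnicodeString of every Glyphs element found in the
    // fixed-page parts (*.fpage) of the package.
    public static List<string> ReadGlyphText(Stream xpsStream)
    {
        var result = new List<string>();
        using (var zip = new ZipArchive(xpsStream, ZipArchiveMode.Read, leaveOpen: true))
        {
            foreach (ZipArchiveEntry entry in zip.Entries)
            {
                if (!entry.FullName.EndsWith(".fpage", StringComparison.OrdinalIgnoreCase))
                    continue;

                using (Stream part = entry.Open())
                {
                    XDocument page = XDocument.Load(part);
                    result.AddRange(
                        page.Descendants()
                            .Where(e => e.Name.LocalName == "Glyphs")
                            .Select(e => (string)e.Attribute("UnicodeString"))
                            .Where(s => s != null));
                }
            }
        }
        return result;
    }
}
```

Calling it with File.OpenRead("form.xps") gives a quick feel for how the text is broken up, which matters before deciding whether editing the XML directly is realistic.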
Modifying XPS is certainly possible. It depends a bit on how you want this to work and the amount of programming you are willing to do (but this might be true for every programming issue ;-)).
Anyway: feel free to e-mail me directly on info-at-nixps-dot-com.
I never found a really great solution, so I made one. Check out www.flexsoftinc.com to see my program XPS Express.
Related
I have a Word document template that only changes in several fields.
I remember looking around using google-fu and found that you can bind an xml to content controls and dynamically generate word documents through that method.
Unfortunately it's been a while since I revisited this particular problem, and all I remember about it was that it was unnecessarily clunky and hard to manage.
Are there any open-source solutions that are more elegant? Or a better way to go at this?
I think it is fair to describe content control XML data binding as the latest in a series of techniques which enable Word document generation/automation.
Content control data binding was introduced in Word 2007, so is now not all that new. When you do document automation, you generally also want support for:
conditional inclusion/exclusion of content
repeating data (eg table rows, list items)
There are ways to do both of the above with content controls, and Microsoft has recently released dedicated support for repeating data.
Content control data binding is less brittle than older approaches, but if you don't like XML and XPath, you might not call it elegant :-)
The legacy approaches "baked-in" to Word include:
fields: MERGEFIELD, form fields, or custom fields
use of bookmarks
A problem with these is that it is easy for the author user to mistakenly delete them. Another problem is that they aren't as easy for a developer to work with at the XML level (though maybe OK via VBA or VSTO).
The other approach is to include magic text strings on the document surface, and there are various tools around which do this. This has an initial appeal, but the problem with this is that what looks like a contiguous text string on the document surface can be split up at the XML level, because of:
spelling/grammar correction
rsid (edits made at different times or by different people)
formatting (eg font size, appearance changes)
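To show why the splitting matters, here is a small sketch. The paragraph XML is hand-written for the example (the {{FirstName}} token and the rsid values are invented), but it has the shape Word tends to produce after a spell-check pass. Searching run by run misses the token, while searching the joined text finds it.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class MagicStringDemo
{
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // True if any single w:t element contains the whole token.
    public static bool FoundInSingleRun(XElement paragraph, string token)
    {
        return paragraph.Descendants(W + "t").Any(t => t.Value.Contains(token));
    }

    // True if the token appears once the text of all runs is joined.
    public static bool FoundInJoinedText(XElement paragraph, string token)
    {
        string joined = string.Concat(paragraph.Descendants(W + "t").Select(t => t.Value));
        return joined.Contains(token);
    }

    // A paragraph whose visible text is "Dear {{FirstName}}", split into two runs.
    public static XElement SampleParagraph()
    {
        return XElement.Parse(
            @"<w:p xmlns:w=""http://schemas.openxmlformats.org/wordprocessingml/2006/main"">
                <w:r w:rsidR=""00A11111""><w:t xml:space=""preserve"">Dear {{First</w:t></w:r>
                <w:proofErr w:type=""spellStart""/>
                <w:r w:rsidR=""00B22222""><w:t>Name}}</w:t></w:r>
                <w:proofErr w:type=""spellEnd""/>
              </w:p>");
    }
}
```

Finding the token in the joined text is the easy half. Replacing it means deciding which run keeps the new text and which runs get trimmed, and that is where tools built on magic strings tend to get complicated.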
In short, yeah, there are alternatives. Choose your poison.
Is it possible to create an RTF template document and programmatically insert elements like tables or populate variables? If so, what do I need to do this?
I understand that there is a RichTextBox control that can represent the document, but I need to define a template (or more specifically, the sales guys will be defining the template and giving it to me) which I need to then populate with the data. The RichTextBox control is no good to them. We need RTF because not all clients have Word.
As @ToastyMallows suggested, it's come down to a dirty game of replacing strings.
I'm using the following to generate tables and formatting and stuff:
http://www.codeproject.com/Articles/30902/RichText-Builder-StringBuilder-for-RTF
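For what it's worth, the string-replacement game mentioned above can be sketched roughly like this. The [[Name]] placeholder syntax and the RtfTemplate class are invented for the example, not something RTF defines. The one part that is easy to get wrong is escaping: backslashes and braces are control characters in RTF, and non-ASCII characters are written as \uN? escapes with N as a signed 16-bit number.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class RtfTemplate
{
    // Escapes a plain-text value so it can be spliced into RTF safely.
    public static string Escape(string value)
    {
        var sb = new StringBuilder();
        foreach (char c in value)
        {
            if (c == '\\' || c == '{' || c == '}')
                sb.Append('\\').Append(c);          // RTF control characters
            else if (c == '\n')
                sb.Append("\\line ");               // line break
            else if (c == '\r')
                continue;                           // dropped; '\n' carries the break
            else if (c > 127)
                sb.Append("\\u").Append(unchecked((short)c)).Append('?');
            else
                sb.Append(c);
        }
        return sb.ToString();
    }

    // Replaces every [[Key]] token in the template with the escaped value.
    public static string Fill(string rtf, IDictionary<string, string> values)
    {
        foreach (KeyValuePair<string, string> pair in values)
            rtf = rtf.Replace("[[" + pair.Key + "]]", Escape(pair.Value));
        return rtf;
    }
}
```

This only works if each token survives as one contiguous string in the RTF source. If whoever edits the template changes formatting halfway through a token, the editor may split it with control words and the replace will silently miss it, so it is worth checking for leftover [[ after filling.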
I've done this, but in an old Windows Forms client-side application using Word automation. It works but not elegantly. We defined the templates in WordPad I think, then extracted the RTF which gets stored in a database. There are pre-defined fields that get populated on document creation.
One thing I have discovered in doing RTF stuff is that RichText (the format) and RichText (as in controls) do not necessarily mean the same thing.
I would like to know how I can edit an existing PDF document in C#. The document is already created and has fields like the ones in the image below:
I want to know whether there is code that can check the desired checkbox or enter text on the lines. Please let me know.
I looked at iTextSharp but I don't know if that tool can help me achieve that.
There are ways to do it, but it requires external tools. I use the ActivePDF library; it provides form-filling routines and works quite well.
You can do that with iTextSharp, BUT first you should find out more about the document.
If the PDF contains an actual AcroForm form definition, filling it is fairly easy. There are many examples in the documentation and on the iText Web site.
If it does not contain such a form definition, though, and the check boxes and text fields merely are some lines drawn somewhere, it gets a bit more difficult: you have to measure where to put your entries.
Additionally you should find out whether the document is signed or encrypted which might limit what you are allowed to do with the document.
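For the AcroForm case described above, a minimal sketch with iTextSharp 5.x could look like the following. The field names "applicantName" and "agreeToTerms" and the checkbox value "Yes" are placeholders I made up; the real names and on-state values depend entirely on the form, so the sketch prints the field names first. It assumes the iTextSharp 5 package is referenced and that the document is not encrypted or signed in a way that forbids changes.

```csharp
using System;
using System.IO;
using iTextSharp.text.pdf;

public static class PdfFormFiller
{
    public static void Fill(string inputPath, string outputPath)
    {
        var reader = new PdfReader(inputPath);
        try
        {
            using (var output = new FileStream(outputPath, FileMode.Create))
            {
                var stamper = new PdfStamper(reader, output);
                AcroFields form = stamper.AcroFields;

                // Print the field names the form actually defines.
                foreach (string name in form.Fields.Keys)
                    Console.WriteLine(name);

                // Hypothetical field names; SetField returns false if a field is missing.
                form.SetField("applicantName", "Jane Doe");

                // A checkbox is set by the name of its "on" state, which varies per form;
                // form.GetAppearanceStates("agreeToTerms") lists the valid values.
                form.SetField("agreeToTerms", "Yes");

                // Optional: turn the fields into static page content.
                stamper.FormFlattening = true;
                stamper.Close();
            }
        }
        finally
        {
            reader.Close();
        }
    }
}
```

If the loop prints nothing, the PDF most likely has no AcroForm at all and you are in the "measure where to put your entries" situation instead.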
I have a PDF and want to extract the text contained in it. I've tried a few different PDF libraries and they all return basically the same results. When extracting the text from a two-page document with literally hundreds of words, only a dozen or so words from the header are returned.
Is there any way to tell if the text I'm after is actually text or a raster image of the text? I'm thinking something along the lines of Firebug's "Inspect Element" but at this point I'll take any solution that tells what I'm really looking at.
This project really doesn't justify attempting to use OCR. And, although a simple solution, using fields in the PDF is not an option since the generator of the file is a third party.
If Acrobat/Reader can select the text, then it Is Text.
Reasons your library might not be able to find the text in question:
Complex/bad fonts or encodings. Adobe can be very forgiving of garbage in, somehow managing to get Good Info out.
The text could be in an annotation rather than the page contents. It won't matter what program parses the content stream if you need to look in the annot array instead.
You didn't name a particular library, so it's possible that the library you're using doesn't look inside XObject Forms. That's unlikely in an even remotely mature API, but stranger things have happened.
If you can get away with copy/pasta from Reader, then just go that route.
Have you tried Amyuni PDF Creator .Net? It allows you to enumerate all components from a specified rectangular region of a page and inspect their type from a predefined types list. You could run a quick test using the trial version and the following code sample for text extraction:
// open a PDF file from the current directory
axPDFCreactiveX1.Open(System.IO.Path.Combine(System.IO.Directory.GetCurrentDirectory(), "sampleBookmarks.pdf"), "");
axPDFCreactiveX1.Refresh();
// extract the raw text of page 1 and display it
string text = axPDFCreactiveX1.GetRawPageText(1);
MessageBox.Show(text);
Additionally, it provides Tesseract OCR integration in case you need it.
Disclaimer: I am part of the development team of this product.
Check this site out. It may contain some helpful code snippets. http://www.codeproject.com/KB/cs/PDFToText.aspx
I have an existing XPS file that I would like to use as a template and possibly bind data to it. I have tried several methods, but cannot seem to get it to work.
Does anyone have any experience altering an existing XPS file to add data at runtime and then print or save?
Any help is appreciated.
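XPS documents are packaged using the Open Packaging Conventions (OPC), the same ZIP-based container format that Office Open XML files use, so the package structure will look familiar. Note that the Open XML SDK itself targets Office documents rather than XPS; for XPS parts, System.IO.Packaging or the System.Windows.Xps.Packaging classes are the closer fit. For background on working with package parts, here is a how-to article by Beth Massi: "Accessing Open XML Document Parts with the Open XML SDK".
Since you are working with the internal doc structure, you might also check out the "Open XML Package Editor", which lets you explore the doc within Visual Studio. Here is another how-to by Beth Massi: "Handy Visual Studio Add-In to View Office 2007 Files".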
+tom
It's a bit of a challenge to do this with XPS, but it is possible.
You can do this with our NiXPS SDK.
I've posted an example on my blog a while ago:
XPS variable data example
Regards,
Nick
Bindings are evaluated during the process of writing to an XPS document. So you can't set up a {Binding} in a FixedDocument, Write that FD to an XpsDocument, and expect to get that original FD back again when you next open that saved doc.
Also, the standard XpsDocumentWriter converts everything into Glyphs on canvases, so you can't, say, put a TextBox in the original and expect to be able to find it after it's been saved to a document.
I've never used the NiXPS libraries, so if Nick says it can be done you might want to check it out.
One last possibility: you can create placeholders in a form that you will be able to find later. They'd have to be text (something like [[{{FORMFIELDHERELOL}}]]) with some kind of delimiter scheme to differentiate the text from everything else. You could then go spelunking in the XML looking for text that fits the delimiter pattern and switch out those glyphs for your binding text. Of course, the issue with THAT is that if you aren't putting X chars in place of X chars you might find you have to do some repositioning. As it's all glyphs on canvas, this might be slightly harder than, say, threading a needle with a shoelace.
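A rough sketch of that placeholder hunt, working on the XML of a single page part that has already been loaded into an XElement. GlyphPlaceholderFiller is an invented name, and the code assumes each placeholder sits whole inside one Glyphs element's UnicodeString attribute. It does no repositioning at all. It drops the Indices attribute on changed elements, because the old glyph indices and advance widths no longer match the new text.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class GlyphPlaceholderFiller
{
    // Replaces [[{{Key}}]] tokens inside Glyphs/@UnicodeString values.
    // Returns the number of Glyphs elements that were changed.
    // Not handled: a resulting string that starts with '{' needs the
    // "{}" escape prefix in XPS markup.
    public static int Fill(XElement fixedPage, IDictionary<string, string> values)
    {
        int changed = 0;
        foreach (XElement glyphs in fixedPage.DescendantsAndSelf()
                     .Where(e => e.Name.LocalName == "Glyphs"))
        {
            XAttribute text = glyphs.Attribute("UnicodeString");
            if (text == null) continue;

            string updated = text.Value;
            foreach (KeyValuePair<string, string> pair in values)
                updated = updated.Replace("[[{{" + pair.Key + "}}]]", pair.Value);
            if (updated == text.Value) continue;

            text.Value = updated;

            // The stored indices and advances describe the old string only.
            XAttribute indices = glyphs.Attribute("Indices");
            if (indices != null) indices.Remove();
            changed++;
        }
        return changed;
    }
}
```

Two caveats beyond the repositioning problem already mentioned: the embedded font may be subsetted to the characters used in the original, so replacement text can need glyphs the file doesn't contain, and a placeholder that the writer split across several Glyphs elements will not be found by this approach.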