C# to read XFA form data

I am trying to programmatically read the field values from a form created with LiveCycle. I tried opening the document using the Acrobat COM component, which seemed to work, and with some reflection I managed to get the actual field names. Getting the value of each field, however, is the hard part.
Furthermore, I now believe that I actually need a different approach to extract the values, since it is an XFA form PDF.
(Please don't tell me to look into the examples provided in the Adobe PDF SDK, because they are very poor and absolutely useless for my issue. I have already read all I could from the Adobe documentation.)
Thank you all.

I use iText/iTextSharp when working with both Acrobat and LiveCycle (XFA) forms. You need to get access to the LiveCycle XML DOM as the starting point:
iTextSharp Example:
using System.Xml;
using iTextSharp.text.pdf;

string sourcePdf = @"c:\livecycle.pdf";
PdfReader reader = new PdfReader(sourcePdf);
XmlDocument xmlDoc = reader.AcroFields.Xfa.DomDocument;
You'll need to familiarize yourself with the XFA specification to work with the DOM.
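Once the XML DOM is available, the field values normally live in the data packet under xfa:datasets/xfa:data. The following is a minimal sketch, not part of the original answer, that walks that packet with System.Xml only and collects every leaf element as a dotted path. The class name XfaDataReader, the method ReadFieldValues and the dotted-path format are made up for illustration; the element names under xfa:data depend entirely on how the form was designed.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class XfaDataReader
{
    // Namespace of the datasets packet as defined by the XFA specification.
    private const string XfaDataNs = "http://www.xfa.org/schema/xfa-data/1.0/";

    // Returns leaf element paths under <xfa:data> mapped to their text values.
    public static Dictionary<string, string> ReadFieldValues(XmlDocument xfaDom)
    {
        var values = new Dictionary<string, string>();
        var ns = new XmlNamespaceManager(xfaDom.NameTable);
        ns.AddNamespace("xfa", XfaDataNs);

        XmlNode data = xfaDom.SelectSingleNode("//xfa:datasets/xfa:data", ns);
        if (data == null) return values; // no data packet in this document

        foreach (XmlNode child in data.ChildNodes)
            Collect(child, child.LocalName, values);
        return values;
    }

    private static void Collect(XmlNode node, string path, Dictionary<string, string> values)
    {
        if (node.NodeType != XmlNodeType.Element) return;

        bool hasChildElements = false;
        foreach (XmlNode child in node.ChildNodes)
        {
            if (child.NodeType == XmlNodeType.Element)
            {
                hasChildElements = true;
                Collect(child, path + "." + child.LocalName, values);
            }
        }

        // Only leaf elements carry a field value; repeated paths overwrite each other.
        if (!hasChildElements)
            values[path] = node.InnerText;
    }
}
```

With the iTextSharp snippet above, this would be called as XfaDataReader.ReadFieldValues(xmlDoc). Forms with repeating subforms produce the same path more than once, so a real implementation would need to add an index to the path.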

Perhaps you could use a third-party library such as ABCpdf to extract the field values (this is not an advertisement; I used this library in a similar case, though some time ago).
Another option, if the PDF in question is under your control, is to use the HTTP POST facility of LiveCycle-generated PDF files (AFAIK, they can send the values of the fields to a pre-configured Web address once the user pushes the Send button).

Related

read the content of the current page opened page of PDF file

I'm trying to read data and attributes from an opened PDF file which is on screen.
Is there a way of attaching to a running Acrobat Reader and manipulating data from it?

Attaching to another process means that you will have to handle a lot of inter-process communication (IPC). Apart from that, you don't know what Acrobat Reader looks like inside, so you cannot simply ask it to deliver you some bytes.
Instead, you should use one of the many libraries that open, display and read PDF files, such as iTextSharp. I am certain that these will serve your purposes well.
There are many more libraries available; you should also have a look at PDFSharp.

I have never done it myself, but after a quick look around I found that Acrobat Reader (assuming that is what you are talking about) has an API which, according to its documentation, includes an IPC module. That will be the closest to what you are asking for.

edit pdf document checkboxes and fields in c#

I would like to know how I can edit an existing PDF document in C#. The document is already created and has fields like the ones in the image below:
I want to know whether there is code that can check the desired checkbox or enter text on the lines. Please let me know.
I looked at iTextSharp, but I don't know if that tool can help me achieve that.

There are ways to do it, but it requires external tools. I use the ActivePDF library; it provides form-filling routines and works quite well.

You can do that with iTextSharp, BUT first you should find out more about the document.
If the PDF contains an actual AcroForm form definition, filling it is fairly easy. There are many examples in the documentation and on the iText Web site.
If it does not contain such a form definition, though, and the check boxes and text fields are merely some lines drawn somewhere, it gets a bit more difficult: you have to measure where to put your entries.
Additionally, you should find out whether the document is signed or encrypted, which might limit what you are allowed to do with the document.

Printing documents contained in a sharepoint document library

I need a way to print selected documents in a library from the ribbon. As I understand it, this will have to be done through scripting, so the main difficulty seems to be that everything will need to be done client-side. Another not so tiny factor is that it has to be a generic document printing button, not just for PDF or Office documents.
Is there any way to tell the OS/browser to launch, for each of the selected items/documents, the printing options/window of its appropriate program?
I also want this to be generic to document libraries, so I can't just map one as a remote disk.
Another solution that springs to mind is to transfer the selected documents to another document library under the care of an event handler that runs the necessary C# commands for printing when a new item is added to said (hidden) document library.
Another thing that comes to mind is to basically repeat what most print drivers already do with the Print to file option, where they create a PDF document of the intended material; then it would be just a matter of printing the created PDF. But the printing applications already know that the file they have is in the correct/accepted format and merely have to make the conversion to PDF.
Is there any way to use the OS to just open/print-to-file my document with the correct application?
Any help with the script or C# part would be great. Thank you.
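For the C# part (for example inside the event handler idea described above), one possible sketch is to ask the Windows shell to run the "print" verb of whatever application is associated with the file type. This is only an illustration under several assumptions: it runs on Windows, the document has already been saved to a local path, the associated application registers a print verb, and the code runs on the machine where printing should happen, not in the browser. The class and method names (ShellPrinter, BuildPrintStartInfo, Print) are hypothetical.

```csharp
using System.Diagnostics;

public static class ShellPrinter
{
    // Builds the start info without launching anything, so it can be inspected first.
    public static ProcessStartInfo BuildPrintStartInfo(string documentPath)
    {
        return new ProcessStartInfo
        {
            FileName = documentPath,
            Verb = "print",            // shell verb registered by the associated application
            UseShellExecute = true,    // verbs are only honored when the shell starts the process
            CreateNoWindow = true,
            WindowStyle = ProcessWindowStyle.Hidden
        };
    }

    // Asks the Windows shell to print the document with its associated application.
    public static void Print(string documentPath)
    {
        using (Process process = Process.Start(BuildPrintStartInfo(documentPath)))
        {
            // Process.Start can return null when no new process was created.
            process?.WaitForExit();
        }
    }
}
```

Whether the application prints silently or shows its own dialog is up to that application, so this does not give uniform behaviour across document types.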

Replacing contents inside docx and pdf file using asp.net c#

In my application I am using some templates in docx and PDF format. I am storing these docs in the DB as bytes.
Before showing/sending these docs back to the user or application, I need to replace some content inside the doc. E.g., if the doc contains ##username##, I need to replace this with the actual username of the customer. I have not found a proper solution for this. Any good ideas?

For the docx file, your best bet is to use OpenXML, and instead of having special text like ##username##, replace it with a content control that you can fill in.
Since you specified docx, you can use the OpenXML API, which is great. If it has to work with older doc files, then you'll have to automate Word (which should be avoided if at all possible).
For the PDF, your best bet is to create a PDF form and fill it in at runtime (using a tool like iTextSharp).
HTH,
Brian

For DOC / DOCX:
You should use the MS Word object model through the MS Word assembly reference (this will work only on machines which have MS Office installed; otherwise you can use something like the Aspose word libraries, which won't need an MS Office installation on the server). You can programmatically trigger Word's Find-Replace through the library's API.
For PDF: You will need a third-party library for editing PDF files. Third-party libraries like ABCpdf are available (I am not sure whether Adobe itself has something for this).
The mechanism is the same as for the Word library, but I am not sure whether you will be able to trigger a Find-Replace here or will have to do something else. I have not used a PDF generation library.
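Independent of the file format, the ##username## substitution from the question can be sketched on plain text as follows. This is an illustration only, using System.Text.RegularExpressions; the TemplateFiller class and its Fill method are hypothetical names, and the token syntax is assumed to be ##name## with letters, digits or underscores in between.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TemplateFiller
{
    // Matches tokens of the form ##name## where name is letters, digits or underscores.
    private static readonly Regex Token = new Regex(@"##(\w+)##", RegexOptions.Compiled);

    // Replaces every known token with its value from the dictionary.
    public static string Fill(string template, IDictionary<string, string> values)
    {
        return Token.Replace(template, match =>
        {
            string key = match.Groups[1].Value;
            // Unknown tokens are left untouched so they are easy to spot.
            return values.TryGetValue(key, out string value) ? value : match.Value;
        });
    }
}
```

Applying this directly to the XML inside a docx is fragile, because Word may split a token such as ##username## across several runs. That is one reason the content control and PDF form approaches suggested above tend to be more robust.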

How to find out what "kind" of document is being displayed in IE

I am using C# to create a button for IE, and this button performs certain actions that all depend on the document being a PDF document. I am trying to set up a guard to prevent any action from taking place if the document type is not PDF, but I am not sure how, as IE hands the document over to Adobe and Reader takes charge. I am using SHDocVw and have looked at the WebBrowserClass objects, but I am not sure how to figure this out. Any suggestions?

It's a little bit problematic to do this AFAIK.
The value of the IWebBrowser2::Type property depends on which plug-in you have installed to handle PDFs, because some plug-ins create an HTML wrapper for the PDF file (like Adobe), so you get "HTML Document" as the type, and some plug-ins don't do this (like Foxit). So you can't rely on this exclusively.
If you get a PDF with an HTML wrapper, you can use IHTMLDocument2::mimeType to find out the exact type of the document (JPEG/GIF/PNG/etc. files are all wrapped in HTML by the browser). But as far as I know it is unreliable too; for instance, on my machine it returns "Firefox Document" for HTML documents because .html files are associated with Firefox :s But I didn't test whether this is also the case with PDFs.
Another option is to use the GetUrlCacheEntryInfoEx API call to obtain the file in the local browser cache which stores the document, then read it (only the beginning of the file; I think only the first 256 bytes are important) and call FindMimeFromData with the data you just read, and it will return the MIME type.
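If PDF is the only type that matters, the "read the beginning of the cached file" step can be sketched without FindMimeFromData by checking for the "%PDF-" signature that PDF files start with. Note that this is a simpler substitute for the API call described above, not the same technique, and the PdfSniffer class and its method names are hypothetical. It assumes the path of the cached file has already been obtained.

```csharp
using System;
using System.IO;
using System.Text;

public static class PdfSniffer
{
    private static readonly byte[] PdfSignature = Encoding.ASCII.GetBytes("%PDF-");

    // True when the buffer starts with the PDF header signature.
    public static bool LooksLikePdf(byte[] head)
    {
        if (head == null || head.Length < PdfSignature.Length) return false;
        for (int i = 0; i < PdfSignature.Length; i++)
        {
            if (head[i] != PdfSignature[i]) return false;
        }
        return true;
    }

    // Reads up to 256 bytes from the start of a file and checks them.
    public static bool FileLooksLikePdf(string path)
    {
        using (FileStream stream = File.OpenRead(path))
        {
            byte[] head = new byte[256];
            int read = stream.Read(head, 0, head.Length);
            Array.Resize(ref head, read); // keep only the bytes actually read
            return LooksLikePdf(head);
        }
    }
}
```

The check is deliberately strict: a file with stray bytes before the header would be rejected here even though some viewers tolerate it.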

Check the MIME type of the document or look at the window.location.href of the web browser. If a PDF is being displayed, you should be able to tell from that.

Another good way is to do the following:
1] Cast the Document object to IPersist and then extract the CLSID using .GetClassID(..).
2] P/Invoke ProgIDFromCLSID to extract the ProgID.
3] Match the ProgID against known COM objects / applications.
